Saj Sense Protocol (SSP) v1.0

Unified multimodal neural codec transport protocol. 12 modalities. Per-modality encryption with independent forward-secrecy ratchets. AI-native discrete token streams. Sub-100ms end-to-end latency.

6 provisional patent applications filed at IP Australia, March 2026.

SSP Frame Wire Format

Header layout, drawn as 32-bit rows (bit positions 0-31):

| Ver | Flags | Sequence Number (16) |
| Timestamp (32 us) |
| Modality Bitmap (16) | Slots | Priority | SID |
| HMAC-SHA256 (64-bit truncated) |

22-byte frame header. 1-255 modality slots per frame. Each slot: independent codec, encryption key, and sync anchor.

Design Goals

G1. Sub-100ms end-to-end latency
Real-time multimodal transport with bounded latency guarantees across all modality types.

G2. Graceful degradation
Perceptual impact scoring determines which modalities degrade first under bandwidth pressure.

G3. Extensible modality slots
Register custom modality IDs (0x10+) without protocol changes. Future-proof by design.

G4. Per-modality encryption
Independent encryption keys per modality. Share audio without exposing biometric data.

G5. AI-native token stream
Discrete token payloads designed for direct consumption by transformer architectures.

G6. Bandwidth-adaptive
Dynamic bitrate allocation across modalities based on perceptual importance and available bandwidth.

G7. Cross-modal prediction
Modality slots declare prediction dependencies enabling cross-modal compression gains.

G8. E2E verification
HMAC-SHA256 integrity verification on every frame. Truncated to 64 bits for wire efficiency.

G9. MPEG-I MIHS interop
Designed for compatibility with ISO/IEC 23090-31:2025 Multimodal Information Handling System.
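
Goals G2 and G6 describe bitrate allocation driven by perceptual importance, but this page does not define the algorithm. The following is only a sketch of one possible policy, under stated assumptions: each modality has a perceptual weight and a minimum useful bitrate, modalities are admitted in descending weight order while their minimum fits the budget, and the leftover is split in proportion to weight. The `Demand` type, its fields, and `allocate` are hypothetical names, not part of the SSP specification or the reference implementation.

```rust
/// Hypothetical per-modality demand; fields are illustrative, not from the SSP spec.
#[derive(Debug, Clone)]
pub struct Demand {
    pub modality_id: u8,
    /// Assumed perceptual impact score; higher means "degrade later".
    pub weight: u32,
    /// Lowest useful bitrate in bits per second.
    pub min_bps: u64,
}

/// Returns (modality_id, allocated_bps) pairs; dropped modalities get 0.
pub fn allocate(demands: &[Demand], budget_bps: u64) -> Vec<(u8, u64)> {
    // Consider modalities in descending perceptual weight.
    let mut order: Vec<&Demand> = demands.iter().collect();
    order.sort_by(|a, b| b.weight.cmp(&a.weight));

    // Admit each modality whose minimum still fits; the rest are dropped.
    let mut remaining = budget_bps;
    let mut admitted: Vec<&Demand> = Vec::new();
    let mut out: Vec<(u8, u64)> = Vec::new();
    for d in order {
        if d.min_bps <= remaining {
            remaining -= d.min_bps;
            admitted.push(d);
        } else {
            out.push((d.modality_id, 0));
        }
    }

    // Split the leftover among admitted modalities in proportion to weight.
    let total_weight: u64 = admitted.iter().map(|d| d.weight as u64).sum();
    for d in admitted {
        let extra = if total_weight == 0 {
            0
        } else {
            remaining * d.weight as u64 / total_weight
        };
        out.push((d.modality_id, d.min_bps + extra));
    }
    out
}
```

With a 100 kbit/s budget, an audio stream weighted above a video stream whose minimum does not fit would keep its allocation while the video stream drops to zero, which is the degradation order G2 describes.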

Gap Analysis

SSP fills capabilities absent from existing transport protocols. No existing standard supports 7+ modality framing, per-modality E2E encryption, or AI-native token output.

Capability                     RTP   WebRTC  MIHS  MoQT  SSP
7+ modality framing            -     -       -     -     Yes
Per-modality E2E encryption    -     -       -     -     Yes
AI-native token output         -     -       -     -     Yes
Cross-modal prediction         -     -       -     -     Yes
Perceptual bitrate allocation  -     -       -     -     Yes
Latent-space watermarking      -     -       -     -     Yes
Selective disclosure           -     -       -     -     Yes
Audio/video streaming          Yes   Yes     Yes   Yes   Yes

Per-Modality Encryption

Each modality slot carries an independent encryption_key_id referencing a per-modality key established during SSP_KEY_EXCHANGE. Each modality MAY use an independent forward-secrecy ratchet chain.

Selective Disclosure

Audio: Key 1
Video: Key 2
Biometric: Key 3

Recipient A holds Key 1 only:
Audio: decrypted
Video: encrypted, no access
Biometric: encrypted, no access
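
The access rule in this example can be sketched in a few lines. This is an illustrative model only: it assumes a recipient is represented by the set of key IDs it holds, and it uses the convention stated elsewhere on this page that key ID 0 means unencrypted. The `Access` enum and `access_for` function are hypothetical names, not the reference implementation's API.

```rust
use std::collections::HashSet;

/// What a recipient can do with one modality slot (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Access {
    /// Key ID 0: the slot is not encrypted.
    Plaintext,
    /// The recipient holds the slot's key.
    Decryptable,
    /// The slot is encrypted under a key the recipient lacks.
    NoAccess,
}

/// Classifies one slot by its encryption key ID.
pub fn access_for(key_id: u8, held_keys: &HashSet<u8>) -> Access {
    if key_id == 0 {
        Access::Plaintext
    } else if held_keys.contains(&key_id) {
        Access::Decryptable
    } else {
        Access::NoAccess
    }
}
```

For Recipient A, `held_keys` would contain only `1`, so the audio slot is decryptable and the video and biometric slots are not.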

Encryption Architecture

Cipher: ChaCha20-Poly1305 AEAD
Key Derivation: Per-modality ratchet chain (SSP-ENC-001)
Key ID Field: Byte 11 of slot header (0 = unencrypted)
Forward Secrecy: Independent ratchet per modality
Frame Integrity: HMAC-SHA256 (64-bit truncated)
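
As a rough illustration of the 64-bit truncated integrity tag, the sketch below takes an already computed 32-byte HMAC-SHA256 output, keeps 8 bytes, and compares truncated tags. It assumes the leftmost 8 bytes are the ones kept, which is the usual HMAC truncation convention but is not stated on this page. Computing the HMAC itself is out of scope, and `truncate_tag` and `tags_equal` are hypothetical helper names.

```rust
/// Keeps the leftmost 8 bytes of a 32-byte HMAC-SHA256 output.
/// Assumption: SSP truncates from the left; the page does not say which bytes are kept.
pub fn truncate_tag(full: &[u8; 32]) -> [u8; 8] {
    let mut out = [0u8; 8];
    out.copy_from_slice(&full[..8]);
    out
}

/// Compares two truncated tags without returning early on the first mismatch.
/// Production code would normally rely on a vetted constant-time comparison.
pub fn tags_equal(a: &[u8; 8], b: &[u8; 8]) -> bool {
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

A receiver would recompute the HMAC over the frame, truncate it the same way, and reject the frame when `tags_equal` returns false.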

Modality Registry

12 registered modality IDs. Custom modalities from 0x10. Each modality slot carries independent codec, encryption, and synchronization configuration.

ID    Name               Description
0x01  AudioSpeech        Human vocal content
0x02  AudioAmbient       Environmental audio
0x03  VideoFace          Facial video stream
0x04  VideoScene         Scene/environment video
0x05  HapticVibro        Vibrotactile feedback
0x06  HapticKinesthetic  Force/resistance feedback
0x07  Spatial3D          3D spatial/positional data
0x08  Biometric          Physiological signals
0x09  MotionBody         Full-body motion capture
0x0A  MotionHand         Hand/finger tracking
0x0B  Thermal            Infrared/thermal imaging
0x0C  Emotion            Affective state inference

0x10+ reserved for custom modality registration
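
The registry maps naturally onto an enum. The sketch below is one way to model it, not the reference implementation's type: the `Modality` enum and its `TryFrom<u8>` conversion are hypothetical, and rejecting the unlisted values 0x00 and 0x0D-0x0F is an assumption, since the page does not say how those IDs are treated.

```rust
use std::convert::TryFrom;

/// The 12 registered modality IDs plus the custom range (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Modality {
    AudioSpeech,
    AudioAmbient,
    VideoFace,
    VideoScene,
    HapticVibro,
    HapticKinesthetic,
    Spatial3D,
    Biometric,
    MotionBody,
    MotionHand,
    Thermal,
    Emotion,
    /// Custom modality in the 0x10+ range; carries the raw ID.
    Custom(u8),
}

impl TryFrom<u8> for Modality {
    /// The unrecognized ID is returned as the error.
    type Error = u8;

    fn try_from(id: u8) -> Result<Self, u8> {
        Ok(match id {
            0x01 => Modality::AudioSpeech,
            0x02 => Modality::AudioAmbient,
            0x03 => Modality::VideoFace,
            0x04 => Modality::VideoScene,
            0x05 => Modality::HapticVibro,
            0x06 => Modality::HapticKinesthetic,
            0x07 => Modality::Spatial3D,
            0x08 => Modality::Biometric,
            0x09 => Modality::MotionBody,
            0x0A => Modality::MotionHand,
            0x0B => Modality::Thermal,
            0x0C => Modality::Emotion,
            0x10..=0xFF => Modality::Custom(id),
            // Assumption: 0x00 and 0x0D-0x0F are unassigned and rejected.
            _ => return Err(id),
        })
    }
}
```

Keeping the raw byte inside `Custom` lets a parser pass unknown custom modalities through without a protocol change, in line with goal G3.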

Token Stream

SSP frames carry AI-native discrete token payloads. Each modality produces tokens at its own rate. Cross-modal synchronization anchors align streams in time.

[Figure: token timelines for AudioSpeech, VideoFace, Biometric, and MotionBody. Legend: discrete token, sync anchor. Caption: Different rates per modality. Shared temporal alignment.]
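
To make the idea of shared temporal alignment concrete, here is a minimal sketch of mapping tokens from streams with different rates onto one microsecond timeline. It rests on assumptions the page does not confirm: that a slot's sync anchor is the timestamp of its first token, and that tokens within a slot are evenly spaced by a per-modality period known out of band. `TokenRun`, `period_us`, `time_of`, and `token_at` are hypothetical names. The sketch also ignores wraparound of the 32-bit microsecond counter, which occurs roughly every 71.6 minutes.

```rust
/// Hypothetical view of one slot's tokens on a shared microsecond timeline.
pub struct TokenRun {
    /// Sync anchor: assumed to be the timestamp of the first token, in microseconds.
    pub anchor_us: u32,
    /// Assumed fixed spacing between tokens, in microseconds (not a wire field).
    pub period_us: u32,
    /// Number of tokens in the slot.
    pub token_count: u16,
}

impl TokenRun {
    /// Timestamp of token `i`, or None if `i` is out of range.
    pub fn time_of(&self, i: u16) -> Option<u64> {
        if i < self.token_count {
            Some(self.anchor_us as u64 + i as u64 * self.period_us as u64)
        } else {
            None
        }
    }

    /// Index of the token whose interval covers time `t_us`, if any.
    pub fn token_at(&self, t_us: u64) -> Option<u16> {
        let start = self.anchor_us as u64;
        if t_us < start || self.period_us == 0 {
            return None;
        }
        let i = (t_us - start) / self.period_us as u64;
        if i < self.token_count as u64 {
            Some(i as u16)
        } else {
            None
        }
    }
}
```

Given an audio run at 20,000 us per token and a video run at 40,000 us per token with the same anchor, `audio.time_of(3)` gives a timestamp that `video.token_at` resolves to video token 1, so a consumer can pair tokens across modalities.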

Intellectual Property

6 provisional patent applications filed at IP Australia, March 2026. PCT international phase deadline: March 2027. All assessed at novelty 8-10/10. Freedom-to-operate: GREEN across all 6.

SSP-PROTO-001 (Filed): Unified Multimodal Transport with Dynamic Modality Registration
Novelty: 8/10. FTO: GREEN.

SSP-TOKEN-001 (Filed): AI-Native Discrete Token Stream for Heterogeneous Sensory Modalities
Novelty: 9/10. FTO: GREEN.

SSP-ENC-001 (Filed): Per-Modality Cryptographic Key Management with Independent Forward-Secrecy Ratchets
Novelty: 8/10. FTO: GREEN.

SSP-FRAME-001 (Filed): Multimodal Packet Frame Format with Cross-Modal Prediction Dependencies
Novelty: 8/10. FTO: GREEN.

PROV-037 (Filed): Domain-Adaptive Neural Codec Tokenizer for Heterogeneous Sensory Modalities
Novelty: 9/10. FTO: GREEN.

PROV-038 (Filed): Latent-Space Codec Watermarking for Neural Audio/Video Provenance
Novelty: 10/10. FTO: GREEN.

Enterprise & Defense

SSP provides the multimodal sensing infrastructure layer for organizations that cannot trust third-party processing of biometric, spatial, or classified audio/video streams.

Selective Disclosure

Per-modality encryption with independent forward-secrecy ratchets enables granular access control. Share audio transcription without video access. Share motion tracking without biometric data. Isolate thermal/spatial from emotional analysis. Each modality's key chain ratchets independently per SSP-ENC-001.

EU AI Act Compliance

PROV-038 (Latent-Space Codec Watermarking) embeds imperceptible provenance markers in the latent space of neural codec tokens. Survives re-encoding, transcoding, and adversarial extraction attempts. Addresses Article 50 watermarking requirements for AI-generated audio/video content.

Target Compliance

  • EU AI Act (Article 50 watermarking)
  • FIPS 140-3 (via AES-256-GCM; ChaCha20-Poly1305 is not a NIST-approved algorithm)
  • ISO/IEC 23090-31:2025 (MPEG-I MIHS)
  • RFC 2119 requirement language

Implementation

Reference implementation in Rust. Full frame serialization, wire format roundtrip, HMAC verification, modality slot encoding, and token stream parsing.

main.rs (Rust)

use saj_sense::{ModalitySlot, SspFrame};

// Assumption: the crate's parse error implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder payload (assumed to be a byte vector); substitute real encoded audio.
    let audio_data: Vec<u8> = Vec::new();

    // Build a frame with audio + biometric modalities
    let audio = ModalitySlot {
        modality_id: 0x01,    // AudioSpeech
        codec_id: 0x03,       // SajCodecMed
        encryption_key_id: 1,
        payload: audio_data,
        ..Default::default()
    };

    let bio = ModalitySlot {
        modality_id: 0x08,    // Biometric
        encryption_key_id: 2, // Separate key
        ..Default::default()
    };

    // Selective disclosure: key 1 != key 2
    let frame = SspFrame::new(vec![audio, bio]);
    let wire = frame.to_bytes();

    // Roundtrip verification: both slots survive and the parser consumed every byte
    let (parsed, n) = SspFrame::from_bytes(&wire)?;
    assert_eq!(parsed.slots.len(), 2);
    assert_eq!(n, wire.len());
    Ok(())
}

Install: cargo add saj-sense

476+ tests passing, 0 failures. 22-byte frame header. 12 modalities.

Modality Slot Wire Format

Each SSP frame carries 1-255 modality slots. Each slot has a 12-byte fixed header followed by variable payload bytes.

Offset (bytes)  Size     Field              Description
0               1 byte   Modality ID        Registered modality identifier (0x01-0x0C, 0x10+)
1               1 byte   Codec ID           Codec used for this modality's payload
2               2 bytes  Payload Length     Length of variable payload in bytes
4               4 bytes  Sync Anchor        Microsecond timestamp for cross-modal sync
8               4 bits   QoS / LOD          Quality-of-service level for bandwidth adaptation
8.5             4 bits   Prediction Deps    Cross-modal prediction dependency bitmap
9               2 bytes  Token Count        Number of discrete tokens in payload
11              1 byte   Encryption Key ID  Per-modality key reference (0 = unencrypted)
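
The field table above is enough to sketch a serializer and parser for the 12-byte slot header. The sketch makes two assumptions the table does not settle: multi-byte fields are big-endian (network byte order), and QoS / LOD occupies the high nibble of byte 8 with Prediction Deps in the low nibble. `SlotHeader` is a hypothetical type for illustration and is not the `ModalitySlot` type exported by the reference implementation.

```rust
/// Fixed 12-byte modality slot header, following the field table (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SlotHeader {
    pub modality_id: u8,
    pub codec_id: u8,
    pub payload_len: u16,
    pub sync_anchor_us: u32,
    /// 4-bit value; only the low 4 bits are written to the wire.
    pub qos_lod: u8,
    /// 4-bit bitmap; only the low 4 bits are written to the wire.
    pub prediction_deps: u8,
    pub token_count: u16,
    /// 0 means the payload is unencrypted.
    pub encryption_key_id: u8,
}

impl SlotHeader {
    pub const LEN: usize = 12;

    /// Serializes the header. Assumes big-endian fields and QoS in the high nibble.
    pub fn to_bytes(&self) -> [u8; 12] {
        let mut b = [0u8; 12];
        b[0] = self.modality_id;
        b[1] = self.codec_id;
        b[2..4].copy_from_slice(&self.payload_len.to_be_bytes());
        b[4..8].copy_from_slice(&self.sync_anchor_us.to_be_bytes());
        b[8] = ((self.qos_lod & 0x0F) << 4) | (self.prediction_deps & 0x0F);
        b[9..11].copy_from_slice(&self.token_count.to_be_bytes());
        b[11] = self.encryption_key_id;
        b
    }

    /// Parses the first 12 bytes of `buf`; returns None if the buffer is too short.
    pub fn from_bytes(buf: &[u8]) -> Option<SlotHeader> {
        if buf.len() < Self::LEN {
            return None;
        }
        Some(SlotHeader {
            modality_id: buf[0],
            codec_id: buf[1],
            payload_len: u16::from_be_bytes([buf[2], buf[3]]),
            sync_anchor_us: u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]),
            qos_lod: buf[8] >> 4,
            prediction_deps: buf[8] & 0x0F,
            token_count: u16::from_be_bytes([buf[9], buf[10]]),
            encryption_key_id: buf[11],
        })
    }
}
```

A parser would call `from_bytes` on the start of each slot, then read `payload_len` bytes of payload before moving to the next slot.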

Stay Updated on SSP

The Saj Sense Protocol specification is under active development. Join the waitlist to receive updates on new versions, reference implementations, and early access to the SDK.
