
Video Layers Allocation

The goal of this extension is for a video sender to provide information about the target bitrate, resolution and frame rate of each scalability layer, to help a selective forwarding middlebox decide which layer to relay.

Name: "Video layers allocation version 0"

Formal name: http://www.webrtc.org/experiments/rtp-hdrext/video-layers-allocation00

Status: This extension is defined here to allow for experimentation.

In a conference scenario, video from a single sender may be received by several recipients with different downlink bandwidth constraints and UI requirements. To allow this, a sender can send video with several scalability layers and a middlebox can choose a layer to relay for each receiver.

This extension supports temporal layers, multiple spatial layers sent on a single RTP stream (SVC), or independent spatial layers sent on multiple RTP streams (simulcast).

RTP header extension format

Note: when including the optional width, height and maximum framerate fields, the total data length of the extension can exceed the 16 bytes available to a one-byte header extension, in which case it is sent as a two-byte header extension [1].
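To illustrate why the optional fields can push the extension past the one-byte limit, the sketch below tallies a lower bound on the payload size for a given configuration. The function name and its parameters are hypothetical, and it assumes every bitrate fits in a single leb128 byte (below 128 kbps), so real payloads are usually larger.

```python
ONE_BYTE_HEADER_MAX = 16  # maximum data length of an RFC 8285 one-byte header extension


def min_vla_size(num_streams, spatial_per_stream, temporal_per_layer,
                 shared_bitmask=True, with_resolution=True):
    """Lower bound on the extension payload size in bytes (hypothetical helper)."""
    active_layers = num_streams * spatial_per_stream
    size = 1  # first byte: RID, NS, sl_bm
    if not shared_bitmask:
        # slX_bm nibbles: one byte when NS < 2, two bytes otherwise
        size += 1 if num_streams <= 2 else 2
    size += (2 * active_layers + 7) // 8         # 2-bit #tl fields, zero-padded
    size += active_layers * temporal_per_layer   # assumes 1 leb128 byte per bitrate
    if with_resolution:
        size += 5 * active_layers                # width-1, height-1, max framerate
    return size
```

For three simulcast streams with one spatial layer and three temporal layers each, this gives at least 26 bytes with the optional fields and 11 bytes without them.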

Data layout

//                           +-+-+-+-+-+-+-+-+
//                           |RID| NS| sl_bm |
//                           +-+-+-+-+-+-+-+-+
// Spatial layer bitmask     |sl0_bm |sl1_bm |
//   up to 2 bytes           |---------------|
//   when sl_bm == 0         |sl2_bm |sl3_bm |
//                           +-+-+-+-+-+-+-+-+
// Number of temporal layers |#tl|#tl|#tl|#tl|
// per spatial layer         |   |   |   |   |
//                           +-+-+-+-+-+-+-+-+
//  Target bitrate in kbps   |               |
//   per temporal layer      :      ...      :
//    leb128 encoded         |               |
//                           +-+-+-+-+-+-+-+-+
// Resolution and framerate  |               |
// 5 bytes per spatial layer + width-1 for   +
//      (optional)           | rid=0, sid=0  |
//                           +---------------+
//                           |               |
//                           + height-1 for  +
//                           | rid=0, sid=0  |
//                           +---------------+
//                           | max framerate |
//                           +-+-+-+-+-+-+-+-+
//                           :      ...      :
//                           +-+-+-+-+-+-+-+-+

RID: RTP stream index this allocation is sent on, numbered from 0. 2 bits.

NS: Number of RTP streams minus one. 2 bits, thus allowing up to 4 RTP streams.

sl_bm: Bitmask of the active spatial layers when it is the same for all RTP streams, or 0 otherwise. 4 bits, thus allowing up to 4 spatial layers per RTP stream.
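A minimal sketch of splitting the first byte into its three fields. It assumes the fields are packed most-significant-bit first, as the data layout diagram draws them; the function name is hypothetical.

```python
def parse_first_byte(first_byte):
    """Split the first payload byte into (rid, num_streams, sl_bm)."""
    rid = (first_byte >> 6) & 0x3                 # RTP stream index, 2 bits
    num_streams = ((first_byte >> 4) & 0x3) + 1   # NS stores the count minus one
    sl_bm = first_byte & 0xF                      # shared bitmask, 0 if per-stream
    return rid, num_streams, sl_bm
```

For example, the byte `0b01100011` would describe stream index 1 out of 3 streams, with a shared bitmask of `0b0011`.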

slX_bm: Bitmask of the active spatial layers for the RTP stream with index=X. Present only when sl_bm == 0. When NS < 2, takes one byte, otherwise uses two bytes. Zero-padded to byte alignment.
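The per-stream bitmasks could be read as in the sketch below. The helper names are hypothetical. Two things are assumptions rather than statements from this document: that the first nibble of each byte is the high nibble (as the diagram suggests), and that bit i of a bitmask corresponds to spatial id i.

```python
def parse_stream_bitmasks(data, num_streams, sl_bm):
    """Return (bitmask per stream, bytes consumed) for the bytes after the first byte."""
    if sl_bm != 0:
        # Shared bitmask: no slX_bm bytes are present.
        return [sl_bm] * num_streams, 0
    n_bytes = 1 if num_streams <= 2 else 2
    if len(data) < n_bytes:
        raise ValueError("truncated spatial layer bitmasks")
    nibbles = []
    for byte in data[:n_bytes]:
        nibbles += [byte >> 4, byte & 0xF]  # assumed: high nibble first
    return nibbles[:num_streams], n_bytes


def active_spatial_ids(bitmask):
    """Spatial ids set in a 4-bit mask (assumed: bit i stands for spatial id i)."""
    return [sid for sid in range(4) if bitmask & (1 << sid)]
```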

#tl: 2-bit value holding the number of temporal layers minus one, thus allowing up to 4 temporal layers. Values are stored in ascending order of spatial id. Zero-padded to byte alignment.
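A sketch of unpacking the 2-bit #tl fields, one per active spatial layer. It assumes the fields fill each byte starting from the most significant bits, matching the left-to-right order in the diagram; the function name is hypothetical.

```python
def parse_temporal_layer_counts(data, num_active_layers):
    """Return (temporal layer count per active spatial layer, bytes consumed)."""
    n_bytes = (2 * num_active_layers + 7) // 8  # four 2-bit fields per byte
    if len(data) < n_bytes:
        raise ValueError("truncated temporal layer counts")
    counts = []
    for i in range(num_active_layers):
        shift = 6 - 2 * (i % 4)  # assumed: first field sits in the top two bits
        counts.append(((data[i // 4] >> shift) & 0x3) + 1)  # stored as count minus one
    return counts, n_bytes
```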

Target bitrate in kbps. Values are stored using leb128 encoding [2]. One value per temporal layer. Values are stored in (RTP stream id, spatial id, temporal id) ascending order. Each bitrate is the total bitrate required to receive the corresponding layer: in simulcast mode it includes only the corresponding spatial layer, while in full SVC it also includes all lower spatial layers. All lower temporal layers are always included.
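For reference, unsigned leb128 stores 7 bits of the value per byte, least significant group first, with the top bit of each byte set when more bytes follow. A minimal sketch of both directions, with hypothetical function names:

```python
def encode_leb128(value):
    """Encode a non-negative integer as unsigned leb128 bytes."""
    if value < 0:
        raise ValueError("leb128 here encodes unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_leb128(data, offset=0):
    """Decode one unsigned leb128 value; return (value, offset after it)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated leb128 value")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7
```

A target bitrate of 300 kbps encodes to the two bytes `0xAC 0x02`, while any value below 128 kbps fits in one byte.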

Resolution and framerate. Optional. Presence is inferred from the RTP header extension size. Encoded as (width - 1), 16 bits; (height - 1), 16 bits; and max frame rate, 8 bits, per spatial layer per RTP stream. Values are stored in (RTP stream id, spatial id) ascending order. Sent only when the resolution differs from the last sent values, when the framerate has changed by more than 5 fps, or on key frames.
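A sketch of reading the optional trailing block once the bitrates have been consumed. The function name is hypothetical, and the 16-bit fields are assumed to be in network (big-endian) byte order, which this document does not state explicitly.

```python
import struct


def parse_resolutions(data, num_active_layers):
    """Return a list of (width, height, max_fps), or [] when the block is absent."""
    if len(data) == 0:
        return []  # presence is inferred from the remaining extension size
    if len(data) < 5 * num_active_layers:
        raise ValueError("truncated resolution and framerate block")
    layers = []
    for i in range(num_active_layers):
        # Assumed big-endian: two 16-bit fields followed by one 8-bit field.
        width_minus_1, height_minus_1, fps = struct.unpack_from(">HHB", data, 5 * i)
        layers.append((width_minus_1 + 1, height_minus_1 + 1, fps))
    return layers
```

Storing the dimensions minus one lets a 16-bit field represent sizes from 1 to 65536.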

An empty layer allocation (i.e. nothing is sent on the SSRC) is encoded as a special case with a single 0 byte.
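Because a first byte of 0 with sl_bm == 0 would otherwise be followed by per-stream bitmasks, a parser likely needs to check for this special case before reading any further fields. A minimal sketch, with a hypothetical function name:

```python
def is_empty_allocation(payload):
    """True when the payload is the single zero byte that signals an empty allocation."""
    return bytes(payload) == b"\x00"
```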

[1] https://www.rfc-editor.org/rfc/rfc8285#section-4.3

[2] https://aomediacodec.github.io/av1-spec/#leb128