Internet-Draft CARP - a CMAF compliant implementation o September 2025
Law Expires 28 March 2026 [Page]
Workgroup:
Media Over QUIC
Internet-Draft:
draft-law-moq-carp-latest
Published:
Intended Status:
Informational
Expires:
Author:
W. Law
Akamai

CARP - a CMAF compliant implementation of WARP

Abstract

This document updates [WARP] by defining a new optional feature for the streaming format. It specifies the syntax and semantics for adding CMAF-packaged media [CMAF] to WARP.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at https://wilaw.github.io/carp/draft-law-moq-carp.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-law-moq-carp/.

Discussion of this document takes place on the Media Over QUIC Working Group mailing list (mailto:moq@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/moq/. Subscribe at https://www.ietf.org/mailman/listinfo/moq/.

Source for this draft and an issue tracker can be found at https://github.com/wilaw/carp.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 28 March 2026.

1. Introduction

CARP Streaming Format (CARP) is a media format designed to deliver CMAF [CMAF] and LOC [LOC] compliant media content over MOQ Transport (MOQT) [MoQTransport]. CARP extends WARP and retains all the scope, capabilities and features of WARP including the catalog format, timeline, ABR switching and LOC support. CARP is targeted at real-time and interactive levels of live latency, as well as VOD content.

This document describes version 1 of the CARP streaming format.

2. WARP Extension

All of the specifications, requirements, and terminology defined in [WARP] apply to implementations of this extension unless explicitly noted otherwise in this document.

3. CMAF Packaging

3.1. Initialization headers

A CMAF header is a sequence of CMAF constrained ISO BMFF boxes that do not reference any media samples, but are associated with a CMAF track and are necessary for initializing the decoding of the subsequent CMAF fragments.

The header for a given MOQT Track MUST be packaged by encoding the header using [BASE64] and then inserting that payload as the value of the Initialization data "initData" field in the catalog entry for that Track.
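As an illustration, the packaging step above could be sketched as follows. This is a minimal sketch, assuming the standard Base64 alphabet with padding from [BASE64]; the helper names make_catalog_entry and read_cmaf_header are hypothetical, and the header bytes are a placeholder ftyp box rather than a decodable CMAF header.

```python
import base64
import json


def make_catalog_entry(name, cmaf_header):
    """Build a minimal catalog track entry carrying a Base64-encoded CMAF header."""
    return {
        "name": name,
        "packaging": "cmaf",
        # Assumption: standard Base64 alphabet with padding (RFC 4648, Section 4).
        "initData": base64.b64encode(cmaf_header).decode("ascii"),
    }


def read_cmaf_header(entry):
    """Recover the raw CMAF header bytes from a catalog track entry."""
    return base64.b64decode(entry["initData"], validate=True)


# Placeholder: a 28-byte ftyp box only. A real CMAF header also carries a moov box.
header = bytes.fromhex("0000001c66747970") + b"iso5" + bytes(16)
entry = make_catalog_entry("hd", header)
print(json.dumps(entry, indent=2))
```

A subscriber would call read_cmaf_header on the catalog entry and pass the resulting bytes to its decoder before appending any Object payloads for that Track.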

3.2. Switching sets and tracks

This specification defines a direct mapping between CMAF Tracks ([CMAF] Sect 3.2.1) and MOQT tracks ([MoQTransport] Sect 2.3).

A CMAF switching set is a set of one or more CMAF tracks ([CMAF] Sect 3.2.1), where each track is an alternative encoding of the same source content and is constrained to enable seamless track switching ([CMAF] Sect 3.3.9).

Each CMAF track in a switching set MUST be transmitted as a separate MOQT Track. The catalog entry for each track in the switching set MUST carry an Alternate group (altGroup) key with a common value.

The MOQT Groups within these switching set tracks MUST be media time-aligned. Media time-alignment requires that, for a given Group number, the presentation time of the first media sample contained within the first MOQT Object of the Group is identical in every track of the switching set.
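The alignment requirement can be checked mechanically. The following is a minimal sketch, assuming a publisher or test tool has already extracted, for each track, the presentation time of the first media sample of each Group in a common timescale (milliseconds here). The function name and the input layout are hypothetical.

```python
from collections import defaultdict


def misaligned_groups(first_pts):
    """Return Group numbers whose first-sample presentation time differs across tracks.

    first_pts maps track name -> {group number: presentation time of the first
    media sample in the first Object of that Group}.
    """
    times_by_group = defaultdict(set)
    for groups in first_pts.values():
        for group_id, pts in groups.items():
            times_by_group[group_id].add(pts)
    # A Group is aligned only if every track reports the same start time for it.
    return sorted(g for g, times in times_by_group.items() if len(times) > 1)


observed = {
    "hd": {0: 0, 1: 2000, 2: 4000},
    "md": {0: 0, 1: 2000, 2: 4000},
    "sd": {0: 0, 1: 2000, 2: 4033},  # Group 2 starts one frame late
}
print(misaligned_groups(observed))
```

In this example only Group 2 is reported, because the "sd" track starts that Group at a different presentation time than the other two tracks.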

3.3. Object Packaging

The payload of each Object is subject to the following requirements:

  • MUST contain at least one Movie Fragment Box (moof) followed by a Media Data Box (mdat). This is equivalent to requiring that each Object hold at least one CMAF Chunk. The Movie Fragment Box (moof) MUST contain a Movie Fragment Header Box (mfhd) and a Track Fragment Box (traf) with a Track ID (track_ID) matching a Track Box (trak) in the CMAF header (Section 3.1).

  • MAY contain multiple successive CMAF Chunks.

  • MUST contain a single [ISOBMFF] track.

  • MUST contain media content encoded in decode order.
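The box-level part of these requirements can be illustrated with a small ISO BMFF box walker. This is a sketch only: it checks that the payload holds at least one moof immediately followed by an mdat and that every moof has mfhd and traf children (in ISO BMFF the track_ID inside a moof is carried by the traf box). It does not compare track_ID against the header or verify decode order. The function names are hypothetical.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # 64-bit largesize follows the type field
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box runs to the end of the enclosing container
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError(f"malformed box at offset {offset}")
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def check_object_payload(payload):
    """True if every moof has mfhd and traf, and at least one moof precedes an mdat."""
    boxes = list(iter_boxes(payload))
    found_pair = False
    for i, (box_type, start, end) in enumerate(boxes):
        if box_type != "moof":
            continue
        children = {t for t, _, _ in iter_boxes(payload, start, end)}
        if not {"mfhd", "traf"} <= children:
            return False
        if i + 1 < len(boxes) and boxes[i + 1][0] == "mdat":
            found_pair = True
    return found_pair


def box(box_type, body=b""):
    """Build a box with a 32-bit size field (test helper)."""
    return struct.pack(">I4s", 8 + len(body), box_type) + body


# Synthetic chunk: moof(mfhd, traf(tfhd)) followed by mdat, with zeroed bodies.
chunk = box(b"moof", box(b"mfhd", bytes(8)) + box(b"traf", box(b"tfhd", bytes(8))))
chunk += box(b"mdat", bytes(4))
print(check_object_payload(chunk), check_object_payload(chunk + chunk))
```

Concatenating two chunks still passes, which matches the allowance for multiple successive CMAF Chunks in one Object.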

3.4. Group Packaging

Each MOQT Group

  • MUST begin with an Object containing a stream access point (SAP type 1 or 2).

  • MUST contain one or more contiguous Groups of Pictures (GOPs).

  • The Group boundary MUST align with a CMAF Fragment boundary. CMAF Fragments and CMAF Chunks MUST NOT span Groups.
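A publisher-side sanity check for the start of each Group might look like the sketch below. It assumes per-Object facts (SAP type, and whether the Object opens a new CMAF Fragment) have already been derived by the packager. The ObjectInfo record and function name are hypothetical, and the GOP contiguity requirement is not checked.

```python
from dataclasses import dataclass


@dataclass
class ObjectInfo:
    group_id: int
    object_id: int
    sap_type: int          # 0 means the Object does not start with a SAP
    starts_fragment: bool  # True if the Object opens a new CMAF Fragment


def group_start_errors(objects):
    """Check the first Object of every Group against the Group packaging rules."""
    first = {}
    for obj in sorted(objects, key=lambda o: (o.group_id, o.object_id)):
        first.setdefault(obj.group_id, obj)  # keep the lowest Object ID per Group
    errors = []
    for group_id, obj in first.items():
        if obj.sap_type not in (1, 2):
            errors.append(f"group {group_id}: first Object has SAP type {obj.sap_type}")
        if not obj.starts_fragment:
            errors.append(f"group {group_id}: does not start on a CMAF Fragment boundary")
    return errors


objects = [
    ObjectInfo(0, 0, 1, True),
    ObjectInfo(0, 1, 0, False),
    ObjectInfo(1, 0, 3, True),   # SAP type 3 is not allowed at a Group start
    ObjectInfo(2, 0, 2, False),  # Group begins in the middle of a Fragment
]
for line in group_start_errors(objects):
    print(line)
```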

3.5. Catalog description

3.5.1. CMAF packaging type

This specification extends the allowed packaging values defined in WARP Section 5.2.10 to include a new entry, as defined in Table 1 below:

Table 1
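+======+=======+===========+
| Name | Value | Reference |
+======+=======+===========+
| CMAF | cmaf  | This RFC  |
+------+-------+-----------+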

Every Track entry in a CARP catalog MUST declare a "packaging" type value of "cmaf".

3.5.2. Max SAP starting types

This specification adds two track-level catalog fields, as defined in Table 2 below:

Table 2
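+==============================+=======================+=================+
| Field                        | Name                  | Definition      |
+==============================+=======================+=================+
| Max Group SAP starting type  | maxGrpSapStartingType | Section 3.5.2.1 |
+------------------------------+-----------------------+-----------------+
| Max Object SAP starting type | maxObjSapStartingType | Section 3.5.2.2 |
+------------------------------+-----------------------+-----------------+

3.5.2.1. Max Group SAP starting type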

Location: T Required: Optional JSON Type: Number

A number indicating the maximum SAP type with which any MOQT Group in the track starts. [Ed.Note: This field, when the SAP terminology is translated to video codec terminology of Random Access Point (RAP) pictures such as IDR, CRA, etc, would also apply to WARP.]

3.5.2.2. Max Object SAP starting type

Location: T Required: Optional JSON Type: Number

A number indicating the maximum SAP type with which any MOQT Object in the track starts.
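One way a publisher could derive the two optional fields is sketched below. The text does not state how Objects that do not start with a SAP count toward maxObjSapStartingType, so this sketch simply takes the maximum of whatever SAP types it is given. The function name and sample values are hypothetical.

```python
import json


def sap_summary(group_start_saps, object_start_saps):
    """Return the optional track-level SAP fields for the observed SAP types."""
    fields = {}
    if group_start_saps:
        fields["maxGrpSapStartingType"] = max(group_start_saps)
    if object_start_saps:
        fields["maxObjSapStartingType"] = max(object_start_saps)
    return fields


track = {"name": "hd", "packaging": "cmaf"}
# Assumed observations: SAP types at Group starts and at Object starts.
track.update(sap_summary([1, 2, 1], [1, 2, 3, 1]))
print(json.dumps(track, indent=2))
```

Both fields are optional, so the sketch omits a field entirely when no observations are available rather than emitting a placeholder value.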

3.6. Timeline description

This specification extends the METADATA element of the timeline track, defined in WARP Section 7, in the following four aspects:

  • Specification of a general scheme for metadata signalled through the METADATA element of the timeline track [Ed.Note: This aspect should also be applied to WARP, thus should be moved to WARP later on.]

  • Definition of a namespace for CARP-specific metadata signalled through the METADATA element of the timeline track

  • Addition of metadata signalling of SAP type

  • Addition of metadata signalling of earliest presentation time

3.6.1. General metadata scheme

When not empty, the string in the METADATA field MUST contain one or more comma-separated key=value pairs, formatted as Strings as specified in Section 3.3.3 of [RFC9651]. For example, the METADATA field can be "key1=1,key2=3268". As another example, the METADATA field can be "key1=1,key2=""hello-world"",key3=3268".

A key name MAY be prefixed with a namespace. When a namespace is present, the separator between the namespace prefix and the key name is '.'.
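A receiver could parse such a field along the following lines. This is a minimal sketch, not a full [RFC9651] parser: it assumes the field has already been unescaped from any outer quoting applied by the timeline track encoding, treats unquoted integer tokens as numbers, and accepts double-quoted values with backslash escapes. The function names are hypothetical.

```python
import re


def parse_metadata(field):
    """Parse 'k=v,k="quoted, value"' into a dict; quoted values may contain commas."""
    result = {}
    i, n = 0, len(field)
    while i < n:
        eq = field.index("=", i)  # raises ValueError if a pair has no '='
        key = field[i:eq]
        i = eq + 1
        if i < n and field[i] == '"':
            i += 1
            chars = []
            while i < n and field[i] != '"':
                if field[i] == "\\":  # backslash escapes the next character
                    i += 1
                chars.append(field[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted value")
            i += 1  # skip the closing quote
            value = "".join(chars)
        else:
            end = field.find(",", i)
            end = n if end == -1 else end
            token = field[i:end]
            value = int(token) if re.fullmatch(r"-?\d+", token) else token
            i = end
        result[key] = value
        if i < n:
            if field[i] != ",":
                raise ValueError(f"expected ',' at position {i}")
            i += 1
    return result


def split_key(key):
    """Split 'namespace.name' at the last '.'; the namespace is '' when absent."""
    namespace, _, name = key.rpartition(".")
    return namespace, name


print(parse_metadata('key1=1,key2="hello-world",key3=3268'))
print(split_key("timeline:metadata:carp.SAP_TYPE"))
```

Splitting at the last '.' keeps a namespace such as "timeline:metadata:carp" intact, on the assumption that key names themselves do not contain '.'.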

3.6.2. Namespace for CARP-specific metadata signalled through the METADATA element

For CARP-specific metadata signalled through the METADATA element, the namespace is "timeline:metadata:carp".

3.6.3. Metadata signalling of SAP type

When the key name of a key=value pair is "SAP_TYPE", the value indicates the SAP type the Object begins with. The namespace-prefixed key is "timeline:metadata:carp.SAP_TYPE".

The value 0 indicates that the Object does not start with an ISOBMFF stream access point. A value of 1, 2, or 3 indicates that the Object begins with a stream access point of SAP type 1, 2, or 3, respectively. When the Object is the first Object in the Group, the value MUST be equal to 1 or 2.

3.6.4. Metadata signalling of earliest presentation time

When the key name of a key=value pair is "EARLIEST_PTS", the value indicates the earliest media presentation timestamp, rounded to the nearest millisecond, of all media samples in the Object. The namespace-prefixed key is "timeline:metadata:carp.EARLIEST_PTS".

When the SAP type the Object begins with is 2 or 3, the EARLIEST_PTS key SHOULD be present.
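The SAP_TYPE and EARLIEST_PTS rules can be combined into a single per-Object check. The sketch below assumes the METADATA field has already been parsed into a dictionary, accepts either the bare or the namespace-prefixed key, and assumes the prefixed EARLIEST_PTS key follows the same pattern as the SAP_TYPE key. MUST violations are reported as errors and the SHOULD as a warning. The function names are hypothetical.

```python
NAMESPACE = "timeline:metadata:carp"


def lookup(metadata, name):
    """Return the value under the namespace-prefixed or bare key, else None."""
    for key in (f"{NAMESPACE}.{name}", name):
        if key in metadata:
            return metadata[key]
    return None


def check_object_metadata(metadata, first_in_group):
    """Return (errors, warnings) for one Object's parsed METADATA."""
    errors, warnings = [], []
    sap = lookup(metadata, "SAP_TYPE")
    if sap is None:
        return errors, warnings  # nothing to check without a SAP_TYPE key
    if sap not in (0, 1, 2, 3):
        errors.append(f"SAP_TYPE {sap!r} is not one of 0, 1, 2, 3")
    elif first_in_group and sap not in (1, 2):
        errors.append(f"first Object in Group has SAP_TYPE {sap}; expected 1 or 2")
    elif sap in (2, 3) and lookup(metadata, "EARLIEST_PTS") is None:
        warnings.append(f"SAP_TYPE {sap} without EARLIEST_PTS")
    return errors, warnings


print(check_object_metadata({"SAP_TYPE": 2}, first_in_group=True))
print(check_object_metadata({"SAP_TYPE": 2, "EARLIEST_PTS": 1200}, first_in_group=True))
```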

4. Catalog Examples

The following section provides non-normative JSON examples of various catalogs compliant with this draft.

4.1. Simulcast video tracks - 3 alternate video qualities along with audio

This example shows a catalog for a media producer capable of sending three time-aligned video tracks at high-definition, medium-definition and low-definition video qualities, along with an audio track.

{
  "version": 1,
  "generatedAt": 1746104606044,
  "tracks":[
    {
      "name": "hd",
      "renderGroup": 1,
      "packaging": "cmaf",
      "isLive": true,
      "initData": "AAAAIGZ0eXBpc281AAA...AAAAAAAAAAAAA",
      "role": "video",
      "codec":"avc1.640028",
      "width":1920,
      "height":1080,
      "bitrate":5000000,
      "framerate":30,
      "altGroup":1
    },
    {
      "name": "md",
      "renderGroup": 1,
      "packaging": "cmaf",
      "isLive": true,
      "initData": "AAAAHGZ0eXBpc281AAA...AAAAAAAAAAAAAA",
      "role": "video",
      "codec":"avc1.64001e",
      "width":720,
      "height":640,
      "bitrate":3000000,
      "framerate":30,
      "altGroup":1
    },
    {
      "name": "sd",
      "renderGroup": 1,
      "packaging": "cmaf",
      "isLive": true,
      "initData": "AAAAHGZ0eXBpc281AAA...AAAAAAAAAAAAAA",
      "role": "video",
      "codec":"avc1.64000d",
      "width":192,
      "height":144,
      "bitrate":500000,
      "framerate":30,
      "altGroup":1
    },
    {
      "name": "audio",
      "renderGroup": 1,
      "packaging": "cmaf",
      "isLive": true,
      "initData": "AAAAHGZ0eXBpc281AAA...AAAAAAAAAAAAAA",
      "role": "audio",
      "codec":"mp4a.40.5",
      "samplerate":48000,
      "channelConfig":"2",
      "bitrate":67071
    }
   ]
}
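A consumer of this catalog could verify the packaging requirement and recover the switching set as sketched below. The catalog literal is an abbreviated copy of the example above, with initData and codec details omitted for brevity. The function names are hypothetical.

```python
import json
from collections import defaultdict

CATALOG = json.loads("""
{
  "version": 1,
  "tracks": [
    {"name": "hd", "packaging": "cmaf", "role": "video", "bitrate": 5000000, "altGroup": 1},
    {"name": "md", "packaging": "cmaf", "role": "video", "bitrate": 3000000, "altGroup": 1},
    {"name": "sd", "packaging": "cmaf", "role": "video", "bitrate": 500000, "altGroup": 1},
    {"name": "audio", "packaging": "cmaf", "role": "audio", "bitrate": 67071}
  ]
}
""")


def non_cmaf_tracks(catalog):
    """Names of tracks that do not declare a packaging value of "cmaf"."""
    return [t["name"] for t in catalog["tracks"] if t.get("packaging") != "cmaf"]


def switching_sets(catalog):
    """Map each altGroup value to the names of the tracks that carry it."""
    sets = defaultdict(list)
    for track in catalog["tracks"]:
        if "altGroup" in track:
            sets[track["altGroup"]].append(track["name"])
    return dict(sets)


print(non_cmaf_tracks(CATALOG))
print(switching_sets(CATALOG))
```

The audio track carries no altGroup key, so it does not appear in any switching set.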

5. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

6. Security Considerations

TODO Security

7. IANA Considerations

This document has no IANA actions.

8. Normative References

[BASE64]
Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, <https://www.rfc-editor.org/rfc/rfc4648>.
[CMAF]
International Organization for Standardization, "Information technology — Multimedia application format (MPEG-A) — Part 19: Common media application format (CMAF) for segmented media".
[MoQTransport]
Curley, L., Pugin, K., Nandakumar, S., Vasiliev, V., and I. Swett, "Media over QUIC Transport", Work in Progress, Internet-Draft, draft-ietf-moq-transport-10, <https://datatracker.ietf.org/doc/html/draft-ietf-moq-transport-10>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/rfc/rfc8174>.
[WARP]
Law, W., Curley, L., Vasiliev, V., Nandakumar, S., and K. Pugin, "WARP Streaming Format", Work in Progress, Internet-Draft, draft-ietf-moq-warp-01, <https://datatracker.ietf.org/doc/html/draft-ietf-moq-warp-01>.

Acknowledgments

TODO acknowledge.

Author's Address

Will Law
Akamai