Support ID3/EMSG metadata in HLS audio renditions #3043

@SvanGestel-dss

Summary

Currently, Media3 only captures EMSG-wrapped ID3 metadata from HLS "variant" tracks (streams declared via EXT-X-STREAM-INF). ID3 metadata present in audio renditions declared via EXT-X-MEDIA is silently dropped.
This enhancement request asks for ID3 metadata extraction from audio renditions. Such metadata is common in CMAF HLS streams, where Nielsen watermarks or other timed metadata are embedded in audio segments rather than video.

Use Case

Many HLS streams using CMAF (fMP4) packaging place timed metadata (Nielsen watermarks, ad markers, etc.) in audio rendition segments rather than video variant segments. This is a valid and common packaging approach, but Media3 currently doesn't surface this metadata to applications.

Example multivariant playlist structure:

#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac-128k",NAME="English",URI="audio-aac/128_slide.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=3598000,CODECS="avc1.4d4020,mp4a.40.2",AUDIO="aac-128k"
video-2400K/slide.m3u8

In this structure, EMSG boxes containing ID3 data may be in audio-aac/128_slide.m3u8 segments, not in the video variant segments.
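
To check whether a given audio segment actually carries EMSG boxes, a small diagnostic along these lines may help. This is only a sketch with no Media3 dependency: EmsgBoxScanner and its helper methods are hypothetical names, it assumes the whole segment fits in memory, and it only counts top-level boxes by type without inspecting the scheme_id_uri or the ID3 payload.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic helper; not part of Media3. */
final class EmsgBoxScanner {

  /** Counts top-level 'emsg' boxes in a buffer holding one fMP4 segment. */
  static int countEmsgBoxes(byte[] segment) {
    ByteBuffer buf = ByteBuffer.wrap(segment); // big-endian by default, as ISO BMFF uses
    int count = 0;
    while (buf.remaining() >= 8) {
      int boxStart = buf.position();
      long size = buf.getInt() & 0xFFFFFFFFL; // 32-bit unsigned box size
      byte[] typeBytes = new byte[4];
      buf.get(typeBytes);
      String type = new String(typeBytes, StandardCharsets.US_ASCII);
      int headerSize = 8;
      if (size == 1) { // a 64-bit largesize follows the type field
        if (buf.remaining() < 8) {
          break;
        }
        size = buf.getLong();
        headerSize = 16;
      } else if (size == 0) { // box extends to the end of the data
        size = segment.length - boxStart;
      }
      // Stop on malformed or truncated boxes instead of guessing.
      if (size < headerSize || size > segment.length - boxStart) {
        break;
      }
      if ("emsg".equals(type)) {
        count++;
      }
      buf.position((int) (boxStart + size));
    }
    return count;
  }

  /** Builds a box with the given type and a zero-filled payload (demo data only). */
  static byte[] box(String type, int payloadLength) {
    ByteBuffer b = ByteBuffer.allocate(8 + payloadLength);
    b.putInt(8 + payloadLength);
    b.put(type.getBytes(StandardCharsets.US_ASCII));
    return b.array();
  }

  /** Concatenates byte arrays in order. */
  static byte[] concat(byte[]... parts) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] p : parts) {
      out.write(p, 0, p.length);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    // Synthetic segment layout: styp, emsg, moof, mdat.
    byte[] segment = concat(box("styp", 4), box("emsg", 32), box("moof", 16), box("mdat", 64));
    System.out.println("emsg boxes found: " + countEmsgBoxes(segment));
  }
}
```

Running the same count over a real segment from audio-aac/128_slide.m3u8 and one from video-2400K/slide.m3u8 would show which of the two carries the metadata.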

Current Behavior

In DefaultHlsExtractorFactory.createFragmentedMp4Extractor():

// Only enable the EMSG TrackOutput if this is the 'variant' track (i.e. the main one) to avoid
// creating a separate EMSG track for every audio track in a video stream.
@FragmentedMp4Extractor.Flags
int flags = isFmp4Variant(format) ? FragmentedMp4Extractor.FLAG_ENABLE_EMSG_TRACK : 0;

The isFmp4Variant() check returns false for audio renditions (they have empty variantInfos), so FLAG_ENABLE_EMSG_TRACK is not set, and EMSG boxes are silently ignored.

/** Returns true if this {@code format} represents a 'variant' track (i.e. the main one). */
private static boolean isFmp4Variant(Format format) {
    Metadata metadata = format.metadata;
    if (metadata == null) {
        return false;
    }
    return metadata.getFirstMatchingEntry(
            HlsTrackMetadataEntry.class, trackMetadata -> !trackMetadata.variantInfos.isEmpty())
        != null;
}

Proposed Enhancement

Change 1: Enable EMSG track for all fMP4 extractors
In DefaultHlsExtractorFactory.createFragmentedMp4Extractor(), always enable EMSG track output for fMP4 HLS segments:

// Enable EMSG track for all fMP4 HLS to capture ID3 metadata from any track
int flags = FragmentedMp4Extractor.FLAG_ENABLE_EMSG_TRACK;

Considerations:

  • This creates an additional TrackOutput for each fMP4 extractor (including audio renditions)
  • In limited testing, the added overhead was minimal
  • The trade-off is a small memory increase for broader metadata support
  • An alternative would be to make this configurable via HlsMediaSource.Factory
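
The configurable alternative in the last bullet could look roughly like the following. This is a sketch under stated assumptions, written without any Media3 dependency: EmsgFlagPolicy and enableEmsgForRenditions are hypothetical names, the constant is a stand-in for FragmentedMp4Extractor.FLAG_ENABLE_EMSG_TRACK, and how the option would be plumbed from HlsMediaSource.Factory into the extractor factory is left open.

```java
/** Hypothetical sketch of an opt-in policy; none of these names are Media3 API. */
final class EmsgFlagPolicy {

  // Stand-in for FragmentedMp4Extractor.FLAG_ENABLE_EMSG_TRACK; value chosen for illustration.
  static final int FLAG_ENABLE_EMSG_TRACK = 1 << 2;

  // Hypothetical option: when true, audio renditions also get an EMSG track output.
  private final boolean enableEmsgForRenditions;

  EmsgFlagPolicy(boolean enableEmsgForRenditions) {
    this.enableEmsgForRenditions = enableEmsgForRenditions;
  }

  /**
   * Returns the fMP4 extractor flags for one track.
   *
   * @param isVariant the result of the existing isFmp4Variant(format) check
   */
  int fmp4Flags(boolean isVariant) {
    return (isVariant || enableEmsgForRenditions) ? FLAG_ENABLE_EMSG_TRACK : 0;
  }

  public static void main(String[] args) {
    EmsgFlagPolicy current = new EmsgFlagPolicy(/* enableEmsgForRenditions= */ false);
    EmsgFlagPolicy optIn = new EmsgFlagPolicy(/* enableEmsgForRenditions= */ true);
    System.out.println("audio rendition, current behavior: " + current.fmp4Flags(false));
    System.out.println("audio rendition, opt-in enabled:   " + optIn.fmp4Flags(false));
  }
}
```

With the option off, the result matches the current behavior, so existing applications would see no extra TrackOutput unless they opt in.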

Change 2: Add ID3 TrackGroup for audio renditions during chunkless preparation
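When allowChunklessPreparation = true, audio rendition wrappers are prepared without downloading segments. Currently, no ID3 TrackGroup is declared for audio renditions during this process, so when EMSG is later discovered during playback, there's no TrackGroup to map it to.

For reference, this is how the main (variant) wrapper currently builds its muxed track groups during chunkless preparation (excerpt, truncated):
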
if (allowChunklessPreparation && codecsStringAllowsChunklessPreparation) {
    List<TrackGroup> muxedTrackGroups = new ArrayList<>();
    if (numberOfVideoCodecs > 0) {
        Format[] videoFormats = new Format[selectedVariantsCount];
        for (int i = 0; i < videoFormats.length; i++) {
            videoFormats[i] = deriveVideoFormat(selectedPlaylistFormats[i]);
        }
        muxedTrackGroups.add(new TrackGroup(sampleStreamWrapperUid, videoFormats));
        if (numberOfAudioCodecs > 0
                && (multivariantPlaylist.muxedAudioFormat != null
                    || multivariantPlaylist.audios.isEmpty())) {
            muxedTrackGroups.add(
                new TrackGroup(
                    /* id= */ sampleStreamWrapperUid + ":audio",
                    deriveAudioFormat(
                        selectedPlaylistFormats[0],
                        multivariantPlaylist.muxedAudioFormat,
                        /* isPrimaryTrackInVariant= */ false)));
        }
        List<Format> ccFormats = multivariantPlaylist.muxedCaptionFormats;
        if (ccFormats != null) {
            for (int i = 0; i < ccFormats.size(); i++) {
                String ccId = sampleStreamWrapperUid + ":cc:" + i;
                muxedTrackGroups.add(
                    new TrackGroup(ccId, extractorFactory.getOutputTextFormat(ccFormats.get(i))));
            }
        }

Proposed change in HlsMediaPeriod.buildAndPrepareAudioSampleStreamWrappers():

if (allowChunklessPreparation && codecStringsAllowChunklessPreparation) {
    Format[] renditionFormats = scratchPlaylistFormats.toArray(new Format[0]);
    // Add ID3 track group for audio renditions to capture potential EMSG metadata
    TrackGroup id3TrackGroup = new TrackGroup(
        sampleStreamWrapperUid + ":id3",
        new Format.Builder()
            .setId("ID3")
            .setSampleMimeType(MimeTypes.APPLICATION_ID3)
            .build());
    sampleStreamWrapper.prepareWithMultivariantPlaylistInfo(
        new TrackGroup[] {
            new TrackGroup(sampleStreamWrapperUid, renditionFormats),
            id3TrackGroup
        },
        /* primaryTrackGroupIndex= */ 0,
        /* optionalTrackGroupsIndices...= */ 1);
}

Considerations:

  • This mirrors the existing behavior for the main (variant) wrapper, which already adds an ID3 TrackGroup during chunkless preparation
  • The ID3 track is marked as optional, so it won't affect track selection
  • Without this change, EMSG metadata in audio works with allowChunklessPreparation = false but not with true

Environment
Media3 version: 1.8.0 (also reproduced on latest main, 1.9.1)
Affected streams: CMAF HLS with EMSG-wrapped ID3 in audio renditions

Additional Context
HLS doesn't signal metadata track availability in the playlist, so Media3 makes upfront assumptions. The current assumption (metadata only in variants) doesn't cover the valid use case of metadata in audio renditions. These proposed changes expand support while maintaining backward compatibility.
