Support ID3/EMSG metadata in HLS audio renditions #3043
Description
Summary
Currently, Media3 only captures EMSG-wrapped ID3 metadata from HLS "variant" tracks (streams declared via EXT-X-STREAM-INF). ID3 metadata present in audio renditions declared via EXT-X-MEDIA is silently dropped.
This enhancement request is to support ID3 metadata extraction from audio renditions, which is common in CMAF HLS streams where Nielsen watermarks or other timed metadata are embedded in audio segments rather than video.
Use Case
Many HLS streams using CMAF (fMP4) packaging place timed metadata (Nielsen watermarks, ad markers, etc.) in audio rendition segments rather than video variant segments. This is a valid and common packaging approach, but Media3 currently doesn't surface this metadata to applications.
Example multivariant playlist structure:
#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac-128k",NAME="English",URI="audio-aac/128_slide.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=3598000,CODECS="avc1.4d4020,mp4a.40.2",AUDIO="aac-128k"
video-2400K/slide.m3u8
In this structure, EMSG boxes containing ID3 data may be in audio-aac/128_slide.m3u8 segments, not in the video variant segments.
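To make the variant/rendition distinction concrete, here is a minimal Java sketch that separates variant URIs from audio rendition URIs in a multivariant playlist like the one above. `PlaylistUris` and its members are hypothetical names for illustration, not Media3 APIs. The parsing is deliberately naive: it assumes one tag per line and ignores most of the HLS specification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a multivariant playlist into variant and audio rendition URIs. */
final class PlaylistUris {
  // Matches URI="..." inside an EXT-X-MEDIA attribute list.
  private static final Pattern URI_ATTR = Pattern.compile("URI=\"([^\"]*)\"");

  final List<String> variantUris = new ArrayList<>();
  final List<String> audioRenditionUris = new ArrayList<>();

  static PlaylistUris parse(String playlist) {
    PlaylistUris result = new PlaylistUris();
    boolean nextLineIsVariantUri = false;
    for (String rawLine : playlist.split("\\R")) {
      String line = rawLine.trim();
      if (line.isEmpty()) {
        continue;
      }
      if (nextLineIsVariantUri && !line.startsWith("#")) {
        // The first non-tag line after EXT-X-STREAM-INF is the variant playlist URI.
        result.variantUris.add(line);
        nextLineIsVariantUri = false;
      } else if (line.startsWith("#EXT-X-STREAM-INF:")) {
        nextLineIsVariantUri = true;
      } else if (line.startsWith("#EXT-X-MEDIA:") && line.contains("TYPE=AUDIO")) {
        // Audio renditions carry their playlist URI as an attribute on the tag itself.
        Matcher matcher = URI_ATTR.matcher(line);
        if (matcher.find()) {
          result.audioRenditionUris.add(matcher.group(1));
        }
      }
    }
    return result;
  }
}
```

For the example playlist, `parse` puts video-2400K/slide.m3u8 in `variantUris` and audio-aac/128_slide.m3u8 in `audioRenditionUris`. Segments of the latter are where the EMSG boxes may live.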
Current Behavior
In DefaultHlsExtractorFactory.createFragmentedMp4Extractor():
Lines 329 to 332 in 01c3757
// Only enable the EMSG TrackOutput if this is the 'variant' track (i.e. the main one) to avoid
// creating a separate EMSG track for every audio track in a video stream.
@FragmentedMp4Extractor.Flags
int flags = isFmp4Variant(format) ? FragmentedMp4Extractor.FLAG_ENABLE_EMSG_TRACK : 0;
The isFmp4Variant() check returns false for audio renditions (they have empty variantInfos), so FLAG_ENABLE_EMSG_TRACK is not set, and EMSG boxes are silently ignored.
Lines 349 to 358 in 01c3757
/** Returns true if this {@code format} represents a 'variant' track (i.e. the main one). */
private static boolean isFmp4Variant(Format format) {
  Metadata metadata = format.metadata;
  if (metadata == null) {
    return false;
  }
  return metadata.getFirstMatchingEntry(
          HlsTrackMetadataEntry.class, trackMetadata -> !trackMetadata.variantInfos.isEmpty())
      != null;
}
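The effect of this check can be reproduced outside Media3 with a small stand-in model. The sketch below is not library code: `EmsgFlagModel`, `TrackMetadata`, and the flag value are assumptions made for illustration, and only the `variantInfos` list is modelled.

```java
import java.util.List;

/** Stand-in model (not Media3 code) of how the EMSG flag is currently chosen per extractor. */
final class EmsgFlagModel {
  // Placeholder bit for illustration; stands in for FragmentedMp4Extractor.FLAG_ENABLE_EMSG_TRACK.
  static final int FLAG_ENABLE_EMSG_TRACK = 1;

  /** Simplified stand-in for HlsTrackMetadataEntry: only the variantInfos list is modelled. */
  static final class TrackMetadata {
    final List<String> variantInfos;

    TrackMetadata(List<String> variantInfos) {
      this.variantInfos = variantInfos;
    }
  }

  /** Mirrors isFmp4Variant(): true only when metadata exists and lists at least one variant. */
  static boolean isFmp4Variant(TrackMetadata metadata) {
    return metadata != null && !metadata.variantInfos.isEmpty();
  }

  /** Mirrors the current flag selection: EMSG output is enabled for variant tracks only. */
  static int currentFlags(TrackMetadata metadata) {
    return isFmp4Variant(metadata) ? FLAG_ENABLE_EMSG_TRACK : 0;
  }
}
```

In this model, a track whose metadata lists at least one variant gets the flag, while an audio rendition (empty `variantInfos`) or a track without metadata gets 0. That matches the behavior described above.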
Proposed Enhancement
Change 1: Enable EMSG track for all fMP4 extractors
In DefaultHlsExtractorFactory.createFragmentedMp4Extractor(), always enable EMSG track output for fMP4 HLS segments:
// Enable EMSG track for all fMP4 HLS to capture ID3 metadata from any track
int flags = FragmentedMp4Extractor.FLAG_ENABLE_EMSG_TRACK;
Considerations:
- This creates an additional TrackOutput for each fMP4 extractor (including audio renditions)
- In limited testing, the added overhead was minimal
- The trade-off is a small memory increase for broader metadata support
- An alternative would be to make this configurable via HlsMediaSource.Factory
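As a rough illustration of the configurable alternative, the sketch below models an opt-in policy with plain Java stand-ins. Every name here (`EmsgPolicySketch`, `EmsgTrackPolicy`, `Options`, `setEmsgTrackPolicy`) is hypothetical and does not exist in Media3, and the flag value is a placeholder. How such an option would actually be threaded through HlsMediaSource.Factory is left open.

```java
/** Hypothetical sketch of a configurable EMSG policy. None of these names exist in Media3. */
final class EmsgPolicySketch {
  // Placeholder bit; stands in for FragmentedMp4Extractor.FLAG_ENABLE_EMSG_TRACK.
  static final int FLAG_ENABLE_EMSG_TRACK = 1;

  enum EmsgTrackPolicy {
    /** Current behavior: EMSG output only for the variant (main) track. */
    VARIANT_ONLY,
    /** Proposed behavior: EMSG output for every fMP4 extractor, including audio renditions. */
    ALL_FMP4_TRACKS
  }

  static final class Options {
    // Defaults to the current behavior so existing apps are unaffected.
    private EmsgTrackPolicy policy = EmsgTrackPolicy.VARIANT_ONLY;

    Options setEmsgTrackPolicy(EmsgTrackPolicy policy) {
      this.policy = policy;
      return this;
    }

    /** Returns the extractor flags for a track, given whether it is a variant track. */
    int flagsFor(boolean isVariantTrack) {
      boolean enable = isVariantTrack || policy == EmsgTrackPolicy.ALL_FMP4_TRACKS;
      return enable ? FLAG_ENABLE_EMSG_TRACK : 0;
    }
  }
}
```

Defaulting to `VARIANT_ONLY` would keep the extra TrackOutput opt-in, so only apps that expect metadata in audio renditions would pay for it.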
Change 2: Add ID3 TrackGroup for audio renditions during chunkless preparation
When allowChunklessPreparation = true, audio rendition wrappers are prepared without downloading segments. Currently, no ID3 TrackGroup is declared for audio renditions during this process, so when EMSG is later discovered during playback, there's no TrackGroup to map it to.
media/libraries/exoplayer_hls/src/main/java/androidx/media3/exoplayer/hls/HlsMediaPeriod.java
Lines 655 to 682 in 01c3757
if (allowChunklessPreparation && codecsStringAllowsChunklessPreparation) {
  List<TrackGroup> muxedTrackGroups = new ArrayList<>();
  if (numberOfVideoCodecs > 0) {
    Format[] videoFormats = new Format[selectedVariantsCount];
    for (int i = 0; i < videoFormats.length; i++) {
      videoFormats[i] = deriveVideoFormat(selectedPlaylistFormats[i]);
    }
    muxedTrackGroups.add(new TrackGroup(sampleStreamWrapperUid, videoFormats));
    if (numberOfAudioCodecs > 0
        && (multivariantPlaylist.muxedAudioFormat != null
            || multivariantPlaylist.audios.isEmpty())) {
      muxedTrackGroups.add(
          new TrackGroup(
              /* id= */ sampleStreamWrapperUid + ":audio",
              deriveAudioFormat(
                  selectedPlaylistFormats[0],
                  multivariantPlaylist.muxedAudioFormat,
                  /* isPrimaryTrackInVariant= */ false)));
    }
    List<Format> ccFormats = multivariantPlaylist.muxedCaptionFormats;
    if (ccFormats != null) {
      for (int i = 0; i < ccFormats.size(); i++) {
        String ccId = sampleStreamWrapperUid + ":cc:" + i;
        muxedTrackGroups.add(
            new TrackGroup(ccId, extractorFactory.getOutputTextFormat(ccFormats.get(i))));
      }
    }
Proposed change in HlsMediaPeriod.buildAndPrepareAudioSampleStreamWrappers():
if (allowChunklessPreparation && codecStringsAllowChunklessPreparation) {
  Format[] renditionFormats = scratchPlaylistFormats.toArray(new Format[0]);
  // Add ID3 track group for audio renditions to capture potential EMSG metadata
  TrackGroup id3TrackGroup =
      new TrackGroup(
          sampleStreamWrapperUid + ":id3",
          new Format.Builder()
              .setId("ID3")
              .setSampleMimeType(MimeTypes.APPLICATION_ID3)
              .build());
  sampleStreamWrapper.prepareWithMultivariantPlaylistInfo(
      new TrackGroup[] {
        new TrackGroup(sampleStreamWrapperUid, renditionFormats),
        id3TrackGroup
      },
      /* primaryTrackGroupIndex= */ 0,
      /* optionalTrackGroupsIndices...= */ 1);
}
Considerations:
- This mirrors the existing behavior for the main (variant) wrapper, which already adds an ID3 TrackGroup during chunkless preparation
- The ID3 track is marked as optional, so it won't affect track selection
- Without this change, EMSG metadata in audio works with allowChunklessPreparation = false but not with true
Environment
Media3 version: 1.8.0 (also reproduced on the latest main, 1.9.1)
Affected streams: CMAF HLS with EMSG-wrapped ID3 in audio renditions
Additional Context
HLS doesn't signal metadata track availability in the playlist, so Media3 makes upfront assumptions. The current assumption (metadata only in variants) doesn't cover the valid use case of metadata in audio renditions. These proposed changes expand support while maintaining backward compatibility.