Update ISO Base Media Format draft to version 0.8.1.

- Switch to 'Opus' file type identification.
- Revise channel mapping to better support ambisonics.
Ralph Giles 2018-09-12 18:42:51 -07:00
parent 5cbd7d5f7d
commit f689e05227
GPG key ID: 9259A8F2D2D44C84

@@ -7,12 +7,12 @@
 </head>
 <body bgcolor="0x333333" text="#60B0C0">
 <b><u>Encapsulation of Opus in ISO Base Media File Format</u></b><br>
-<font size="2">last updated: April 28, 2016</font><br>
+<font size="2">last updated: August 28, 2018</font><br>
 <br>
 <div class="normal_link pre frame_box">
 Encapsulation of Opus in ISO Base Media File Format
-Version 0.6.8 (incomplete)
+Version 0.8.1 (incomplete)
 Table of Contents
@@ -20,7 +20,7 @@ Table of Contents
 <a href="#2">2</a> Normative References
 <a href="#3">3</a> Terms and Definitions
 <a href="#4">4</a> Design Rules of Encapsulation
-<a href="#4.1">4.1</a> File Type Indentification
+<a href="#4.1">4.1</a> File Type Identification
 <a href="#4.2">4.2</a> Overview of Track Structure
 <a href="#4.3">4.3</a> Definitions of Opus sample
 <a href="#4.3.1">4.3.1</a> Sample entry format
@@ -32,7 +32,9 @@ Table of Contents
 <a href="#4.3.6.1">4.3.6.1</a> Random Access Point
 <a href="#4.3.6.2">4.3.6.2</a> Pre-roll
 <a href="#4.4">4.4</a> Trimming of Actual Duration
-<a href="#4.5">4.5</a> Channel Layout (informative)
+<a href="#4.5">4.5</a> Channel Mapping
+<a href="#4.5.1">4.5.1</a> ISO Base Media native Channel Mapping
+<a href="#4.5.2">4.5.2</a> Composition on all active tracks (informative)
 <a href="#4.6">4.6</a> Basic Structure (informative)
 <a href="#4.6.1">4.6.2</a> Initial Movie
 <a href="#4.6.2">4.6.3</a> Movie Fragments
@@ -53,7 +55,7 @@ Table of Contents
 [2] RFC 6716
 Definition of the Opus Audio Codec
-[3] draft-ietf-codec-oggopus-06
+[3] RFC 7845
 Ogg Encapsulation for the Opus Audio Codec
 <a name="3"></a>
@@ -83,8 +85,8 @@ Table of Contents
 <a name="4"></a>
 4 Design Rules of Encapsulation
-4.1 File Type Indentification<a name="4.1"></a>
-This specification does not define any brand to declare files are conformant to this specification. However,
+4.1 File Type Identification<a name="4.1"></a>
+This specification defines the brand 'Opus' to declare files are conformant to this specification. Additionally,
 files conformant to this specification shall contain at least one brand, which supports the requirements and the
 requirements described in this clause without contradiction, in the compatible brands list of the File Type Box.
 As an example, the minimal support of the encapsulation of Opus bitstreams in ISO Base Media file format requires
@@ -117,15 +119,14 @@ Table of Contents
 The syntax and semantics of the OpusSampleEntry is shown as follows.
-class OpusSampleEntry() extends AudioSampleEntry ('Opus'){
+class OpusSampleEntry() extends AudioSampleEntry ('Opus') {
 OpusSpecificBox();
 }
 + channelcount:
-The channelcount field shall be set to the sum of the total number of Opus bitstreams and the number
-of Opus bitstreams producing two channels. This value is indentical with (M+N), where M is the value of
-the *Coupled Stream Count* field and N is the value of the *Stream Count* field in the *Channel Mapping
-Table* in the identification header defined in Ogg Opus [3].
+The channelcount field indicates the number of output channels and shall be set to the same value of
+the OutputChannelCount in the OpusDecoderConfigurationRecord. The value of this field may be used in
+the ChannelLayout if any as described in 4.5.1.
 + samplesize:
 The samplesize field shall be set to 16.
 + samplerate:
@@ -135,20 +136,21 @@ Table of Contents
 4.3.2 Opus Specific Box<a name="4.3.2"></a>
 Exactly one Opus Specific Box shall be present in each OpusSampleEntry.
-The Opus Specific Box contains the Version field and this specification defines version 0 of this box.
-If incompatible changes occured in the fields after the Version field within the OpusSpecificBox in the
-future versions of this specification, another version will be defined.
+The Opus Specific Box contains an OpusDecoderConfigurationRecord which contains the Version field and
+this specification defines version 0 of this record. If incompatible changes occured in the fields after
+the Version field within the OpusDecoderConfigurationRecord in the future versions of this specification,
+another version will be defined.
 This box refers to Ogg Opus [3] at many parts but all the data are stored as big-endian format.
 The syntax and semantics of the Opus Specific Box is shown as follows.
-class ChannelMappingTable (unsigned int(8) OutputChannelCount){
+class ChannelMappingTable (unsigned int(8) OutputChannelCount) {
 unsigned int(8) StreamCount;
 unsigned int(8) CoupledCount;
 unsigned int(8 * OutputChannelCount) ChannelMapping;
 }
-aligned(8) class OpusSpecificBox extends Box('dOps'){
+aligned(8) class OpusDecoderConfigurationRecord {
 unsigned int(8) Version;
 unsigned int(8) OutputChannelCount;
 unsigned int(16) PreSkip;
@@ -160,6 +162,10 @@ Table of Contents
 }
 }
+class OpusSpecificBox extends Box('dOps') {
+OpusDecoderConfigurationRecord() OpusConfig;
+}
 + Version:
 The Version field shall be set to 0.
 In the future versions of this specification, this field may be set to other values. And without support
@@ -181,7 +187,8 @@ Table of Contents
 header define in Ogg Opus [3]. Note that the value is stored as 8.8 fixed-point.
 + ChannelMappingFamily:
 The ChannelMappingFamily field shall be set to the same value as the *Channel Mapping Family* field in
-the identification header defined in Ogg Opus [3].
+the identification header defined in Ogg Opus [3]. Note that the value 255 may be used for an alternative
+to map channels by ISO Base Media native mapping. The details are described in 4.5.1.
 + StreamCount:
 The StreamCount field shall be set to the same value as the *Stream Count* field in the identification
 header defined in Ogg Opus [3].
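Because every field of the OpusDecoderConfigurationRecord is big-endian and byte-aligned, the whole dOps box can be read and written with a few struct calls. The sketch below is a minimal Python illustration, not a reference implementation. `OpusConfig`, `build_dops`, `parse_dops` and `channelcount_matches` are hypothetical names. The fields between PreSkip and ChannelMappingFamily are not shown in this diff, so the sketch assumes they mirror the Ogg Opus identification header [3]: InputSampleRate (unsigned 32-bit) followed by OutputGain (signed 16-bit, 8.8 fixed point). It also assumes that the ChannelMappingTable is present only when ChannelMappingFamily is nonzero.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional

# Version, OutputChannelCount, PreSkip, InputSampleRate, OutputGain,
# ChannelMappingFamily. ">" means big-endian with no padding (11 bytes).
# InputSampleRate and OutputGain are assumed to follow the Ogg Opus layout.
_FIXED = struct.Struct(">BBHIhB")


@dataclass
class OpusConfig:
    version: int
    output_channel_count: int
    pre_skip: int
    input_sample_rate: int
    output_gain_q8: int  # raw 8.8 fixed-point value; divide by 256 for dB
    channel_mapping_family: int
    stream_count: Optional[int] = None
    coupled_count: Optional[int] = None
    channel_mapping: List[int] = field(default_factory=list)


def build_dops(cfg: OpusConfig) -> bytes:
    """Serialize cfg as a complete 'dOps' box (8-byte box header + record)."""
    payload = _FIXED.pack(cfg.version, cfg.output_channel_count, cfg.pre_skip,
                          cfg.input_sample_rate, cfg.output_gain_q8,
                          cfg.channel_mapping_family)
    if cfg.channel_mapping_family != 0:
        # Assumed: the ChannelMappingTable is only written for nonzero families.
        payload += struct.pack(">BB", cfg.stream_count, cfg.coupled_count)
        payload += bytes(cfg.channel_mapping)
    return struct.pack(">I4s", 8 + len(payload), b"dOps") + payload


def parse_dops(box: bytes) -> OpusConfig:
    """Parse a complete 'dOps' box into an OpusConfig."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"dOps" or size != len(box):
        raise ValueError("not a complete dOps box")
    payload = box[8:]
    cfg = OpusConfig(*_FIXED.unpack_from(payload, 0))
    if cfg.version != 0:
        # Only version 0 of the record is defined; later layouts are unknown.
        raise ValueError(f"unsupported record version {cfg.version}")
    if cfg.channel_mapping_family != 0:
        offset = _FIXED.size
        cfg.stream_count, cfg.coupled_count = struct.unpack_from(">BB", payload, offset)
        mapping = payload[offset + 2:offset + 2 + cfg.output_channel_count]
        if len(mapping) != cfg.output_channel_count:
            raise ValueError("truncated ChannelMapping")
        cfg.channel_mapping = list(mapping)
    return cfg


def channelcount_matches(sample_entry_channelcount: int, cfg: OpusConfig) -> bool:
    """The sample entry's channelcount shall equal OutputChannelCount."""
    return sample_entry_channelcount == cfg.output_channel_count
```

With illustrative values for six output channels and a mapping table, `build_dops` produces a 27-byte box. That matches the size of the dOps box in the example of 4.7. A mapping-family-0 configuration without a table would come out at 19 bytes.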
@@ -270,7 +277,24 @@ Table of Contents
 the duration of the last Opus sample may be helpful by setting zero to the segment_duration field since the
 value 0 represents implicit duration equal to the sum of the duration of all samples.
 <a name="4.5"></a>
-4.5 Channel Layout (informative)
+4.5 Channel Mapping
+4.5.1 ISO Base Media native Channel Mapping<a name="4.5.1"></a>
+ISO Base Media File Format, that is ISO/IEC 14496-12 [1], defines an extension ChannelLayout to the
+AudioSampleEntry, which conveys information of mapping channels to loudspeaker positions. The ChannelLayout
+enables to specify the channel layout more flexibly than the predefined layouts of the ChannelMappingFamily.
+To utilize the ChannelLayout for OpusSampleEntry, the ChannelMappingFamily field should be set to 255.
+Even when the ChannelMappingFamily field is set to another value, the assignment of each output channel to
+loudspeaker position specified by the ChannelMappingFamily would be changed as specified by the ChannelLayout.
+The procedure of the assignment is the following.
+1. Decoded channels are mapped to output channels according to the ChannelMappingTable.
+2. Output channels are mapped to loudspeaker positions according to the ChannelLayout.
+In this way, the parameters of the Opus Specific Box are processed before the ChannelLayout, and the
+ChannelLayout shall follow the Opus Specific Box.
+4.5.2 Composition on all active tracks (informative)<a name="4.5.2"></a>
 By the application of alternate_group in the Track Header Box, whole audio channels in all active tracks from
 non-alternate group and/or different alternate group from each other are composited into the presentation. If
 an Opus sample consists of multiple Opus bitstreams, it can be splitted into individual Opus bitstreams and
@@ -282,30 +306,33 @@ Table of Contents
 OutputChannelCount = 6;
 StreamCount = 4;
 CoupledCount = 2;
-ChannelMapping = {0, 4, 1, 2, 3, 5}; // front left, front center, front right, rear left, rear right, LFE
+ChannelMapping = {0, 4, 1, 2, 3, 5}; // front left, front center, front right,
+// rear left, rear right, LFE
 Here, to couple front left to front right channels into the first stream, and couple rear left to rear right
-channels into the second stream, reordering is needed since coupled streams must precede any non-coupled stream.
-You extract the four Opus bitstreams from this track and you encapsulate two of the four into a track and the
-others into another track. The former track is as follows.
+channels into the second stream, reordering is needed since coupled streams must precede any non-coupled
+stream. You extract the four Opus bitstreams from this track and you encapsulate two of the four into a track
+and the others into another track. The former track is as follows.
 OutputChannelCount = 6;
 StreamCount = 2;
 CoupledCount = 2;
-ChannelMapping = {0, 255, 1, 2, 3, 255}; // front left, front center, front right, rear left, rear right, LFE
+ChannelMapping = {0, 255, 1, 2, 3, 255}; // front left, front center, front right,
+// rear left, rear right, LFE
 And the latter track is as follows.
 OutputChannelCount = 6;
 StreamCount = 2;
 CoupledCount = 0;
-ChannelMapping = {255, 0, 255, 255, 255, 1}; // front left, front center, front right, rear left, rear right, LFE
+ChannelMapping = {255, 0, 255, 255, 255, 1}; // front left, front center, front right,
+// rear left, rear right, LFE
 In addition, the value of the alternate_group field in the both tracks is set to 0. As the result, the player
 may play as if channels with 255 are not present, and play the presentation constructed from the both tracks
 in the same channel layout as the one of the original track. Keep in mind that the way of the composition, i.e.
 the mixing for playback, is not defined here, and maybe different results could occur except for the channel
 layout of the original, depending on an implementation or the definition of a derived file format.
-Note that some derived file formats may specify the restriction to ignore alternate grouping. In the context of
-such file formats, this application is not available. This unavailability does not mean incompatibilities among
-file formats unless the restriction to the value of the alternate_group field is specified and brings about
-any conflict among their definitions.
+Note that some derived file formats may specify the restriction to ignore alternate grouping. In the context
+of such file formats, this application is not available. This unavailability does not mean incompatibilities
+among file formats unless the restriction to the value of the alternate_group field is specified and brings
+about any conflict among their definitions.
 <a name="4.6"></a>
 4.6 Basic Structure (informative)
 4.6.1 Initial Movie<a name="4.6.1"></a>
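The per-track ChannelMapping tables in the 4.5.2 example can be derived mechanically from the original table. Here is a minimal Python sketch. It assumes the Ogg Opus [3] multistream convention: the decoded channels of the CoupledCount coupled streams come first, two per stream, followed by one channel for each remaining stream, and 255 marks an output channel with no source. `MappingTable`, `source_stream` and `split_table` are hypothetical names. An output channel whose source stream is carried by the other track is set to 255 in this track.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

SILENT = 255  # ChannelMapping value for an output channel with no source


@dataclass(frozen=True)
class MappingTable:
    output_channel_count: int
    stream_count: int
    coupled_count: int
    channel_mapping: Tuple[int, ...]


def source_stream(table: MappingTable, decoded: int) -> Tuple[int, int]:
    """Return (stream index, channel within that stream) for a decoded channel.

    Coupled streams come first and decode to two channels each;
    every remaining stream decodes to a single channel.
    """
    pairs = 2 * table.coupled_count
    if decoded < pairs:
        return decoded // 2, decoded % 2
    return table.coupled_count + (decoded - pairs), 0


def split_table(table: MappingTable, streams: Iterable[int]) -> MappingTable:
    """Build the mapping table for a track that carries only `streams`."""
    # Keeping the original stream order keeps coupled streams first.
    kept = sorted(set(streams))
    coupled = sum(1 for s in kept if s < table.coupled_count)
    # First decoded-channel index of each kept stream inside the new track.
    first = {}
    for i, s in enumerate(kept):
        first[s] = 2 * i if i < coupled else 2 * coupled + (i - coupled)
    mapping = []
    for decoded in table.channel_mapping:
        if decoded == SILENT:
            mapping.append(SILENT)
            continue
        stream, ch = source_stream(table, decoded)
        mapping.append(first[stream] + ch if stream in first else SILENT)
    return MappingTable(table.output_channel_count, len(kept), coupled, tuple(mapping))
```

Splitting the 5.1 example into streams 0 and 1 (the coupled front and rear pairs) and streams 2 and 3 (front center and LFE) reproduces the former and latter tables given in 4.5.2. Sorting the kept streams is what preserves the rule that coupled streams precede non-coupled ones in each track.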
@@ -395,7 +422,7 @@ Table of Contents
 +----+----+----+----+----+----+----+----+------------------------------+
 |    |    |sgpd|*   |    |    |    |    | Sample Group Description Box |
 +----+----+----+----+----+----+----+----+------------------------------+
-|    |    |sbgp|*   |    |    |    |    | Sample to Group Box          |
+|    |    |sbgp|    |    |    |    |    | Sample to Group Box          |
 +----+----+----+----+----+----+----+----+------------------------------+
 Figure 3 - Basic structure of Movie Fragment Box
@@ -407,14 +434,14 @@ Table of Contents
 <a name="4.7"></a>
 4.7 Example of Encapsulation (informative)
 [File]
-size = 17790
+size = 17757
 [ftyp: File Type Box]
 position = 0
 size = 24
-major_brand = mp42 : MP4 version 2
+major_brand = Opus : Opus audio coding
 minor_version = 0
 compatible_brands
-brand[0] = mp42 : MP4 version 2
+brand[0] = Opus : Opus audio coding
 brand[1] = iso2 : ISO Base Media file format version 2
 [moov: Movie Box]
 position = 24
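The File Type Box in this example is small enough to build by hand. It consists of an 8-byte box header, the four-character major_brand, a 32-bit minor_version, and one four-character code per compatible brand. That gives 8 + 4 + 4 + 2 × 4 = 24 bytes, matching the size in the listing. The Python sketch below is minimal, and `build_ftyp`, `parse_ftyp` and `declares_opus` are hypothetical helpers. Treating a file as Opus-branded when 'Opus' appears either as the major brand or among the compatible brands is an assumption of this sketch.

```python
import struct
from typing import List, Tuple


def build_ftyp(major: bytes, minor_version: int, compatible: List[bytes]) -> bytes:
    """Serialize a File Type Box: size, 'ftyp', major_brand, minor_version, brands."""
    if any(len(brand) != 4 for brand in [major, *compatible]):
        raise ValueError("brands are four-character codes")
    body = major + struct.pack(">I", minor_version) + b"".join(compatible)
    return struct.pack(">I4s", 8 + len(body), b"ftyp") + body


def parse_ftyp(box: bytes) -> Tuple[bytes, int, List[bytes]]:
    """Return (major_brand, minor_version, compatible_brands) of a whole ftyp box."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"ftyp" or size != len(box) or size < 16 or (size - 16) % 4:
        raise ValueError("not a well-formed ftyp box")
    major, minor = struct.unpack_from(">4sI", box, 8)
    # The compatible brands run from offset 16 to the end of the box.
    compatible = [box[i:i + 4] for i in range(16, size, 4)]
    return major, minor, compatible


def declares_opus(box: bytes) -> bool:
    """True if 'Opus' is the major brand or one of the compatible brands."""
    major, _, compatible = parse_ftyp(box)
    return major == b"Opus" or b"Opus" in compatible
```

For example, `build_ftyp(b"Opus", 0, [b"Opus", b"iso2"])` yields the 24-byte box described in the listing.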
@@ -444,30 +471,11 @@ Table of Contents
 pre_defined = 0x00000000
 pre_defined = 0x00000000
 next_track_ID = 2
-[iods: Object Descriptor Box]
-position = 140
-size = 33
-version = 0
-flags = 0x000000
-[tag = 0x10: MP4_IOD]
-expandableClassSize = 16
-ObjectDescriptorID = 1
-URL_Flag = 0
-includeInlineProfileLevelFlag = 0
-reserved = 0xf
-ODProfileLevelIndication = 0xff
-sceneProfileLevelIndication = 0xff
-audioProfileLevelIndication = 0xfe
-visualProfileLevelIndication = 0xff
-graphicsProfileLevelIndication = 0xff
-[tag = 0x0e: ES_ID_Inc]
-expandableClassSize = 4
-Track_ID = 1
 [trak: Track Box]
-position = 173
+position = 140
 size = 608
 [tkhd: Track Header Box]
-position = 181
+position = 148
 size = 92
 version = 0
 flags = 0x000007
@@ -492,7 +500,7 @@ Table of Contents
 width = 0.000000
 height = 0.000000
 [edts: Edit Box]
-position = 273
+position = 240
 size = 36
 [elst: Edit List Box]
 position = 281
@@ -505,10 +513,10 @@ Table of Contents
 media_time = 312
 media_rate = 1.000000
 [mdia: Media Box]
-position = 309
+position = 276
 size = 472
 [mdhd: Media Header Box]
-position = 317
+position = 284
 size = 32
 version = 0
 flags = 0x000000
@@ -519,7 +527,7 @@ Table of Contents
 language = und
 pre_defined = 0x0000
 [hdlr: Handler Reference Box]
-position = 349
+position = 316
 size = 51
 version = 0
 flags = 0x000000
@@ -530,41 +538,41 @@ Table of Contents
 reserved = 0x00000000
 name = Xiph Audio Handler
 [minf: Media Information Box]
-position = 400
+position = 367
 size = 381
 [smhd: Sound Media Header Box]
-position = 408
+position = 375
 size = 16
 version = 0
 flags = 0x000000
 balance = 0.000000
 reserved = 0x0000
 [dinf: Data Information Box]
-position = 424
+position = 391
 size = 36
 [dref: Data Reference Box]
-position = 432
+position = 399
 size = 28
 version = 0
 flags = 0x000000
 entry_count = 1
 [url : Data Entry Url Box]
-position = 448
+position = 415
 size = 12
 version = 0
 flags = 0x000001
 location = in the same file
 [stbl: Sample Table Box]
-position = 460
+position = 427
 size = 321
 [stsd: Sample Description Box]
-position = 468
+position = 435
 size = 79
 version = 0
 flags = 0x000000
 entry_count = 1
 [Opus: Audio Description]
-position = 484
+position = 451
 size = 63
 reserved = 0x000000000000
 data_reference_index = 1
@@ -577,7 +585,7 @@ Table of Contents
 reserved = 0
 samplerate = 48000.000000
 [dOps: Opus Specific Box]
-position = 520
+position = 487
 size = 27
 Version = 0
 OutputChannelCount = 6
@@ -595,7 +603,7 @@ Table of Contents
 4 -> 3: side right
 5 -> 5: rear center
 [stts: Decoding Time to Sample Box]
-position = 547
+position = 514
 size = 24
 version = 0
 flags = 0x000000
@@ -604,7 +612,7 @@ Table of Contents
 sample_count = 18
 sample_delta = 1920
 [stsc: Sample To Chunk Box]
-position = 571
+position = 538
 size = 40
 version = 0
 flags = 0x000000
@@ -618,7 +626,7 @@ Table of Contents
 samples_per_chunk = 5
 sample_description_index = 1
 [stsz: Sample Size Box]
-position = 611
+position = 578
 size = 92
 version = 0
 flags = 0x000000
@@ -643,7 +651,7 @@ Table of Contents
 entry_size[16] = 962
 entry_size[17] = 848
 [stco: Chunk Offset Box]
-position = 703
+position = 670
 size = 24
 version = 0
 flags = 0x000000
@@ -651,7 +659,7 @@ Table of Contents
 chunk_offset[0] = 797
 chunk_offset[1] = 13096
 [sgpd: Sample Group Description Box]
-position = 727
+position = 694
 size = 26
 version = 1
 flags = 0x000000
@@ -660,7 +668,7 @@ Table of Contents
 entry_count = 1
 roll_distance[0] = -2
 [sbgp: Sample to Group Box]
-position = 753
+position = 720
 size = 28
 version = 0
 flags = 0x000000
@@ -670,10 +678,10 @@ Table of Contents
 sample_count = 18
 group_description_index = 1
 [free: Free Space Box]
-position = 781
+position = 748
 size = 8
 [mdat: Media Data Box]
-position = 789
+position = 756
 size = 17001
 <a name="5"></a>
 5 Authors' Address
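Removing the 33-byte Object Descriptor Box moves every later box 33 bytes toward the start of the file. The trak box moves from 173 to 140, and the file shrinks from 17790 to 17757 bytes. Every position in the example can be recomputed from the box sizes alone. Each box starts where its previous sibling ends. Its first child starts a fixed distance in:

- 8 bytes for a plain box
- 16 for stsd and dref, which carry version, flags and entry_count
- 36 for the Opus sample entry, whose AudioSampleEntry fields take 28 bytes after the box header

The Python sketch below is minimal. `Box` and `lay_out` are hypothetical, the elst size of 28 is inferred from the 36-byte edts, and the first chunk is assumed to start right after the 8-byte mdat header, as it did in the previous layout (789 + 8 = 797).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Box:
    kind: str
    size: int
    child_offset: int = 8  # bytes from the start of the box to its first child
    children: List["Box"] = field(default_factory=list)


def lay_out(boxes: List[Box], start: int, positions: Dict[str, int]) -> int:
    """Record the file position of every box; return the offset after the last one."""
    pos = start
    for box in boxes:
        positions[box.kind] = pos
        lay_out(box.children, pos + box.child_offset, positions)
        pos += box.size  # the next sibling starts where this box ends
    return pos


# Box sizes from the example listing; elst (28) is inferred as edts (36) - 8.
stsd = Box("stsd", 79, 16, [Box("Opus", 63, 36, [Box("dOps", 27)])])
stbl = Box("stbl", 321, 8, [stsd, Box("stts", 24), Box("stsc", 40), Box("stsz", 92),
                            Box("stco", 24), Box("sgpd", 26), Box("sbgp", 28)])
dinf = Box("dinf", 36, 8, [Box("dref", 28, 16, [Box("url ", 12)])])
minf = Box("minf", 381, 8, [Box("smhd", 16), dinf, stbl])
mdia = Box("mdia", 472, 8, [Box("mdhd", 32), Box("hdlr", 51), minf])
trak = Box("trak", 608, 8, [Box("tkhd", 92), Box("edts", 36, 8, [Box("elst", 28)]), mdia])

positions: Dict[str, int] = {}
# trak starts at 140 in the listing; free and mdat follow it in file order,
# so the offsets do not depend on whether free sits inside moov.
end_of_file = lay_out([trak, Box("free", 8), Box("mdat", 17001)], 140, positions)
# Assumption: the first chunk begins right after the 8-byte mdat header.
first_chunk_offset = positions["mdat"] + 8
```

This arithmetic reproduces every changed position in the listing and the new file size of 17757. It also puts the Edit List Box at 248 and the first chunk at 764. By contrast, the elst position and chunk_offset[0] lines in this diff are unchanged context at 281 and 797.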