Updates from mailing list and other small fixes.
* Bump the document date.
* Mandate that the ID header must complete on the first page (to
remove any ambiguities about this requirement in RFC 3533).
* Remove redundant wording that rillian forgot to remove in 360a4117.
* Split the "Granule Position" section into subsections.
* Move the first paragraph of the "Other Implementation Notes"
section into the "Granule Position" section, add general seeking
implementation guidance, and be specific about the interaction
between pre-roll and pre-skip.
* Retitle the remaining contents of the "Other Implementation Notes"
section to "Packet Size Limits"
* Specify that all the header fields are REQUIRED (and add a
description of the Channel Mapping Table as a whole, so we can
say when it is REQUIRED).
* Specify that implementations MUST NOT reject headers with extra
data if they have an unknown minor version number.
* Add a reference to RFC 3629 (UTF-8).
* Minor formatting adjustments to vorbis-trim and vorbis-mapping
cites.
* Eliminate semicolons and terrible "Else, if" constructs.
parent
3527f9d4c4
commit
b3744613b7
1 changed files with 94 additions and 28 deletions
|
@ -51,7 +51,7 @@
|
||||||
</address>
|
</address>
|
||||||
</author>
|
</author>
|
||||||
|
|
||||||
<date day="3" month="July" year="2012"/>
|
<date day="16" month="July" year="2012"/>
|
||||||
<area>RAI</area>
|
<area>RAI</area>
|
||||||
<workgroup>codec</workgroup>
|
<workgroup>codec</workgroup>
|
||||||
|
|
||||||
|
@ -141,7 +141,7 @@ The first packet in the logical Ogg bitstream MUST contain the identification
|
||||||
(ID) header, which uniquely identifies a stream as Opus audio.
|
(ID) header, which uniquely identifies a stream as Opus audio.
|
||||||
The format of this header is defined in <xref target="id_header"/>.
|
The format of this header is defined in <xref target="id_header"/>.
|
||||||
It MUST be placed alone (without any other packet data) on the first page of
|
It MUST be placed alone (without any other packet data) on the first page of
|
||||||
the logical Ogg bitstream.
|
the logical Ogg bitstream, and MUST complete on that page.
|
||||||
This page MUST have its 'beginning of stream' flag set.
|
This page MUST have its 'beginning of stream' flag set.
|
||||||
</t>
|
</t>
|
||||||
<t>
|
<t>
|
||||||
|
@ -164,9 +164,9 @@ The value N is specified in the ID header (see
|
||||||
logical Ogg bitstream.
|
logical Ogg bitstream.
|
||||||
</t>
|
</t>
|
||||||
<t>
|
<t>
|
||||||
The first N-1 Opus packets, if any, are packed one after another in sequence
|
The first N-1 Opus packets, if any, are packed one after another into the Ogg
|
||||||
into the Ogg packet, using the self-delimiting framing from Appendix B
|
packet, using the self-delimiting framing from Appendix B of
|
||||||
of <xref target="RFCOpus"/>.
|
<xref target="RFCOpus"/>.
|
||||||
The remaining Opus packet is packed at the end of the Ogg packet using the
|
The remaining Opus packet is packed at the end of the Ogg packet using the
|
||||||
regular, undelimited framing from Section 3 of <xref target="RFCOpus"/>.
|
regular, undelimited framing from Section 3 of <xref target="RFCOpus"/>.
|
||||||
All of the Opus packets in a single Ogg packet MUST be constrained to have the
|
All of the Opus packets in a single Ogg packet MUST be constrained to have the
|
||||||
|
@ -244,6 +244,7 @@ In order to support capturing a stream that uses discontinuous transmission
|
||||||
not transmitted.
|
not transmitted.
|
||||||
</t>
|
</t>
|
||||||
|
|
||||||
|
<section anchor="preskip" title="Pre-skip">
|
||||||
<t>
|
<t>
|
||||||
There is some amount of latency introduced during the decoding process, to
|
There is some amount of latency introduced during the decoding process, to
|
||||||
allow for overlap in the MDCT modes, stereo mixing in the LP modes, and
|
allow for overlap in the MDCT modes, stereo mixing in the LP modes, and
|
||||||
|
@ -269,7 +270,9 @@ It may also be used to perform sample-accurate cropping of existing encoded
|
||||||
This amount need not be a multiple of 2.5 ms, may be smaller than a single
|
This amount need not be a multiple of 2.5 ms, may be smaller than a single
|
||||||
packet, or may span the contents of several packets.
|
packet, or may span the contents of several packets.
|
||||||
</t>
|
</t>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section anchor="pcm_sample_position" title="PCM Sample Position">
|
||||||
<t>
|
<t>
|
||||||
The PCM sample position is determined from the granule position using the
|
The PCM sample position is determined from the granule position using the
|
||||||
formula
|
formula
|
||||||
|
@ -306,7 +309,7 @@ In this case, the PCM sample position of the first audio sample to be played
|
||||||
<t>
|
<t>
|
||||||
Vorbis streams use a granule position smaller than the number of audio samples
|
Vorbis streams use a granule position smaller than the number of audio samples
|
||||||
contained in the first audio data page to indicate that some of those samples
|
contained in the first audio data page to indicate that some of those samples
|
||||||
must be trimmed from the output. See <xref target="vorbis-trim"/>.
|
must be trimmed from the output (see <xref target="vorbis-trim"/>).
|
||||||
However, to do so, Vorbis requires that the first audio data page contains
|
However, to do so, Vorbis requires that the first audio data page contains
|
||||||
exactly two packets, in order to allow the decoder to perform PCM position
|
exactly two packets, in order to allow the decoder to perform PCM position
|
||||||
adjustments before needing to return any PCM data.
|
adjustments before needing to return any PCM data.
|
||||||
|
@ -315,7 +318,9 @@ Opus uses the pre-skip mechanism for this purpose instead, since the encoder
|
||||||
large packets in streams with a very large number of channels might not fit on
|
large packets in streams with a very large number of channels might not fit on
|
||||||
a single page.
|
a single page.
|
||||||
</t>
|
</t>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section anchor="end_trimming" title="End Trimming">
|
||||||
<t>
|
<t>
|
||||||
The page with the 'end of stream' flag set MAY have a granule position that
|
The page with the 'end of stream' flag set MAY have a granule position that
|
||||||
indicates the page contains less audio data than would normally be returned by
|
indicates the page contains less audio data than would normally be returned by
|
||||||
|
@ -330,7 +335,10 @@ The remaining samples are discarded.
|
||||||
The number of discarded samples SHOULD be no larger than the number decoded
|
The number of discarded samples SHOULD be no larger than the number decoded
|
||||||
from the last packet.
|
from the last packet.
|
||||||
</t>
|
</t>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section anchor="start_granpos_restrictions"
|
||||||
|
title="Restrictions on the Initial Granule Position">
|
||||||
<t>
|
<t>
|
||||||
The granule position of the first audio data page with a completed packet MAY
|
The granule position of the first audio data page with a completed packet MAY
|
||||||
be larger than the number of samples contained in packets that complete on
|
be larger than the number of samples contained in packets that complete on
|
||||||
|
@ -367,6 +375,32 @@ This would indicate that more samples should be skipped from the initial
|
||||||
</t>
|
</t>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
<section anchor="seeking_and_preroll" title="Seeking and Pre-roll">
|
||||||
|
<t>
|
||||||
|
Seeking in Ogg files is best performed using a bisection search for a page
|
||||||
|
whose granule position corresponds to a PCM position at or before the seek
|
||||||
|
target.
|
||||||
|
With appropriately weighted bisection, accurate seeking can be performed with
|
||||||
|
just three or four bisections even in multi-gigabyte files.
|
||||||
|
See <xref target="seeking"/> for general implementation guidance.
|
||||||
|
</t>
|
||||||
|
|
||||||
|
<t>
|
||||||
|
When seeking within an Ogg Opus stream, the decoder SHOULD start decoding (and
|
||||||
|
discarding the output) at least 3840 samples (80 ms) prior to the
|
||||||
|
seek target in order to ensure that the output audio is correct by the time it
|
||||||
|
reaches the seek target.
|
||||||
|
This 'pre-roll' is separate from, and unrelated to, the 'pre-skip' used at the
|
||||||
|
beginning of the stream.
|
||||||
|
If the point 80 ms prior to the seek target comes before the initial PCM
|
||||||
|
sample position, the decoder SHOULD start decoding from the beginning of the
|
||||||
|
stream, applying pre-skip as normal, regardless of whether the pre-skip is
|
||||||
|
larger or smaller than 80 ms.
|
||||||
|
</t>
|
||||||
|
</section>
|
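The pre-roll rule added in the "Seeking and Pre-roll" section can be illustrated with a minimal sketch. This is not part of the commit. The names decode_start, PRE_ROLL_SAMPLES, and initial_pcm_position are hypothetical, and positions are assumed to be PCM sample positions on the same scale as the "3840 samples (80 ms)" figure in the text.

```python
PRE_ROLL_SAMPLES = 3840  # 80 ms of pre-roll, the figure given in the text


def decode_start(seek_target, initial_pcm_position=0):
    """Return (start_position, from_beginning) for a seek request.

    start_position is the PCM sample position at which decoding (with the
    output discarded) should begin. from_beginning is True when the point
    80 ms before the target comes before the initial PCM sample position,
    in which case the decoder starts from the beginning of the stream and
    applies pre-skip as normal.
    """
    start = seek_target - PRE_ROLL_SAMPLES
    if start < initial_pcm_position:
        # Not enough audio precedes the target for a full pre-roll:
        # decode from the first audio packet instead.
        return initial_pcm_position, True
    return start, False
```

A caller would then bisect for a page at or before start_position, decode forward, and discard output until the seek target is reached.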
||||||
|
|
||||||
|
</section>
|
||||||
|
|
||||||
<section anchor="headers" title="Header Packets">
|
<section anchor="headers" title="Header Packets">
|
||||||
<t>
|
<t>
|
||||||
An Opus stream contains exactly two mandatory header packets.
|
An Opus stream contains exactly two mandatory header packets.
|
||||||
|
@ -473,12 +507,12 @@ The original sample rate of the encoder input is not preserved by the lossy
|
||||||
An Ogg Opus player SHOULD select the playback sample rate according to the
|
An Ogg Opus player SHOULD select the playback sample rate according to the
|
||||||
following procedure:
|
following procedure:
|
||||||
<list style="numbers">
|
<list style="numbers">
|
||||||
<t>If the hardware supports 48 kHz playback, decode at 48 kHz;</t>
|
<t>If the hardware supports 48 kHz playback, decode at 48 kHz.</t>
|
||||||
<t>Else, if the hardware's highest available sample rate is a supported
|
<t>Otherwise, if the hardware's highest available sample rate is a supported
|
||||||
rate, decode at this sample rate;</t>
|
rate, decode at this sample rate.</t>
|
||||||
<t>Else, if the hardware's highest available sample rate is less than
|
<t>Otherwise, if the hardware's highest available sample rate is less than
|
||||||
48 kHz, decode at the highest supported rate above this and resample;</t>
|
48 kHz, decode at the highest supported rate above this and resample.</t>
|
||||||
<t>Else, decode at 48 kHz and resample.</t>
|
<t>Otherwise, decode at 48 kHz and resample.</t>
|
||||||
</list>
|
</list>
|
||||||
However, the 'Input Sample Rate' field allows the encoder to pass the sample
|
However, the 'Input Sample Rate' field allows the encoder to pass the sample
|
||||||
rate of the original input stream as metadata.
|
rate of the original input stream as metadata.
|
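The four-step playback rate procedure in the hunk above can be sketched as follows. This is an illustration only, not part of the commit. The function name select_decode_rate is hypothetical, and the list of rates the decoder supports (8, 12, 16, 24, and 48 kHz) is an assumption taken from the Opus codec rather than from this diff. Step 3 is interpreted here as "the next supported rate above the hardware rate", since a literal reading of "highest supported rate above this" would always give 48 kHz.

```python
# Assumption: decoder output rates supported by Opus, in Hz.
OPUS_RATES = (8000, 12000, 16000, 24000, 48000)


def select_decode_rate(hardware_rates):
    """Return (decode_rate, needs_resampling) for the given hardware rates."""
    highest = max(hardware_rates)
    # Step 1: the hardware can play 48 kHz directly.
    if 48000 in hardware_rates:
        return 48000, False
    # Step 2: the highest hardware rate is itself a supported decode rate.
    if highest in OPUS_RATES:
        return highest, False
    # Step 3: hardware tops out below 48 kHz; decode at the next supported
    # rate above it (an interpretation, see the note above) and resample.
    if highest < 48000:
        return min(r for r in OPUS_RATES if r > highest), True
    # Step 4: decode at 48 kHz and resample to whatever the hardware offers.
    return 48000, True
```

For example, hardware limited to 22050 Hz would decode at 24000 Hz and resample, while hardware offering only 44100 Hz would decode at 48000 Hz and resample.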
||||||
|
@ -542,9 +576,28 @@ Each possible value of this octet indicates a mapping family, which defines a
|
||||||
allowed channel count.
|
allowed channel count.
|
||||||
The details are described in <xref target="channel_mapping"/>.
|
The details are described in <xref target="channel_mapping"/>.
|
||||||
</t>
|
</t>
|
||||||
|
<t><spanx style="strong">Channel Mapping Table</spanx>:
|
||||||
|
This table defines the mapping from encoded streams to output channels.
|
||||||
|
It is omitted when the channel mapping family is 0, but REQUIRED otherwise.
|
||||||
|
Its contents are specified in <xref target="channel_mapping"/>.
|
||||||
|
</t>
|
||||||
</list>
|
</list>
|
||||||
</t>
|
</t>
|
||||||
|
|
||||||
|
<t>
|
||||||
|
All fields in the ID header are REQUIRED, except for the channel mapping
|
||||||
|
table, which is omitted when the channel mapping family is 0.
|
||||||
|
Implementations SHOULD reject ID headers which do not contain enough data for
|
||||||
|
these fields, even if they contain a valid Magic Signature.
|
||||||
|
Future versions of this specification, even backwards-compatible versions,
|
||||||
|
might include additional fields in the ID header.
|
||||||
|
If an ID header has a compatible major version, but a larger minor version,
|
||||||
|
an implementation MUST NOT reject it for containing additional data not
|
||||||
|
specified here.
|
||||||
|
However, implementations MAY reject streams in which the ID header does not
|
||||||
|
complete on the first page.
|
||||||
|
</t>
|
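The length checks described in the paragraph above might look like the following sketch. It is not part of the commit and makes several assumptions that this diff does not show: the "OpusHead" magic signature, a 19-octet fixed portion laid out as version, channel count, pre-skip, input sample rate, output gain, and mapping family, a mapping table of two octets plus one octet per channel, and a version octet whose upper four bits carry the major version. The function name id_header_is_acceptable is hypothetical.

```python
import struct

# Assumed layout of the fixed fields: magic, version, channel count,
# pre-skip, input sample rate, output gain, mapping family (little-endian).
FIXED_FORMAT = "<8sBBHIhB"
FIXED_SIZE = struct.calcsize(FIXED_FORMAT)  # 19 octets


def id_header_is_acceptable(packet):
    """Return True if the packet holds every REQUIRED ID header field."""
    if len(packet) < FIXED_SIZE:
        return False
    magic, version, channels, _pre_skip, _rate, _gain, family = \
        struct.unpack_from(FIXED_FORMAT, packet)
    if magic != b"OpusHead":
        return False
    # Assumption: the upper four bits are the major version, and 0 is the
    # only major version this sketch understands.
    if version >> 4 != 0:
        return False
    required = FIXED_SIZE
    if family != 0:
        # Channel mapping table: stream count, coupled count, one octet
        # per output channel (assumed layout).
        required += 2 + channels
    if len(packet) < required:
        return False
    # Trailing data beyond the known fields is not a reason to reject,
    # so headers with a larger minor version still pass.
    return True
```

The sketch accepts trailing data unconditionally, which is one simple way to satisfy the MUST NOT for larger minor versions.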
||||||
|
|
||||||
<section anchor="channel_mapping" title="Channel Mapping">
|
<section anchor="channel_mapping" title="Channel Mapping">
|
||||||
<t>
|
<t>
|
||||||
An Ogg Opus stream allows mapping one number of Opus streams (N) to a possibly
|
An Ogg Opus stream allows mapping one number of Opus streams (N) to a possibly
|
||||||
|
@ -658,9 +711,8 @@ When the 'channel mapping family' octet has this value, the channel mapping
|
||||||
<vspace blankLines="1"/>
|
<vspace blankLines="1"/>
|
||||||
Allowed numbers of channels: 1...8.<vspace/>
|
Allowed numbers of channels: 1...8.<vspace/>
|
||||||
Channel meanings depend on the number of channels.
|
Channel meanings depend on the number of channels.
|
||||||
See <xref target="vorbis-mapping">the
|
See <xref target="vorbis-mapping"/> for the assignments from output channel
|
||||||
Vorbis mapping</xref> for the assignments from output channel number to
|
number to specific speaker locations.
|
||||||
specific speaker locations.
|
|
||||||
<vspace blankLines="1"/>
|
<vspace blankLines="1"/>
|
||||||
</t>
|
</t>
|
||||||
<t>Family 255 (no defined channel meaning):
|
<t>Family 255 (no defined channel meaning):
|
||||||
|
@ -756,13 +808,13 @@ It MUST NOT indicate that the vendor string is longer than the rest of the
|
||||||
<t><spanx style="strong">Vendor String</spanx> (variable length, UTF-8 vector):
|
<t><spanx style="strong">Vendor String</spanx> (variable length, UTF-8 vector):
|
||||||
<vspace blankLines="1"/>
|
<vspace blankLines="1"/>
|
||||||
This is a simple human-readable tag for vendor information, encoded as a UTF-8
|
This is a simple human-readable tag for vendor information, encoded as a UTF-8
|
||||||
string.
|
string <xref target="RFC3629"/>.
|
||||||
No terminating NUL octet is required.
|
No terminating NUL octet is required.
|
||||||
<vspace blankLines="1"/>
|
<vspace blankLines="1"/>
|
||||||
This tag is intended to identify the codec encoder and encapsulation
|
This tag is intended to identify the codec encoder and encapsulation
|
||||||
implementations, for tracing differences in technical behavior. The
|
implementations, for tracing differences in technical behavior.
|
||||||
user-facing encoding application can use the 'ENCODER' user comment
|
The user-facing encoding application can use the 'ENCODER' user comment tag
|
||||||
tag name to identify themselves.
|
name to identify itself.
|
||||||
<vspace blankLines="1"/>
|
<vspace blankLines="1"/>
|
||||||
</t>
|
</t>
|
||||||
<t><spanx style="strong">User Comment List Length</spanx> (32 bits, unsigned,
|
<t><spanx style="strong">User Comment List Length</spanx> (32 bits, unsigned,
|
||||||
|
@ -794,6 +846,17 @@ There is one for each user comment indicated by the 'user comment list length'
|
||||||
</list>
|
</list>
|
||||||
</t>
|
</t>
|
||||||
|
|
||||||
|
<t>
|
||||||
|
The vendor string length and user comment list length are REQUIRED, and
|
||||||
|
implementations SHOULD reject comment headers that do not contain enough data
|
||||||
|
for these fields, or that do not contain enough data for the corresponding
|
||||||
|
vendor string or user comments they describe.
|
||||||
|
Making this check before allocating the associated memory to contain the data
|
||||||
|
may help prevent a possible Denial-of-Service (DoS) attack from small comment
|
||||||
|
headers that claim to contain strings longer than the entire packet or more
|
||||||
|
user comments than could possibly fit in the packet.
|
||||||
|
</t>
|
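The check-before-allocate advice in the paragraph above can be sketched as follows. This is an illustration, not part of the commit. It assumes an "OpusTags" magic signature and 32-bit little-endian length fields, neither of which is shown in this diff, and the function name parse_comment_header is hypothetical.

```python
import struct


def parse_comment_header(packet):
    """Return (vendor, comments), raising ValueError on a short packet."""

    def read_u32(pos):
        if len(packet) - pos < 4:
            raise ValueError("not enough data for a length field")
        return struct.unpack_from("<I", packet, pos)[0], pos + 4

    def read_string(pos):
        length, pos = read_u32(pos)
        # Compare the claimed length with what is left in the packet
        # before copying or allocating anything of that size.
        if length > len(packet) - pos:
            raise ValueError("string length exceeds remaining packet data")
        return packet[pos:pos + length].decode("utf-8"), pos + length

    if packet[:8] != b"OpusTags":  # assumed magic signature
        raise ValueError("missing magic signature")
    vendor, pos = read_string(8)
    count, pos = read_u32(pos)
    # Every user comment needs at least its own 4-octet length field, so
    # a count larger than this cannot possibly fit in the packet.
    if count > (len(packet) - pos) // 4:
        raise ValueError("user comment count exceeds remaining packet data")
    comments = []
    for _ in range(count):
        comment, pos = read_string(pos)
        comments.append(comment)
    return vendor, comments
```

Because each claimed length is bounded by the remaining packet size before any data is read, a tiny header claiming gigabytes of strings is rejected without a large allocation.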
||||||
|
|
||||||
<t>
|
<t>
|
||||||
The user comment strings follow the NAME=value format described by
|
The user comment strings follow the NAME=value format described by
|
||||||
<xref target="vorbis-comment"/> with the same recommended tag names.
|
<xref target="vorbis-comment"/> with the same recommended tag names.
|
||||||
|
@ -836,19 +899,11 @@ There is no Opus comment tag corresponding to REPLAYGAIN_ALBUM_GAIN.
|
||||||
That information should instead be stored in the ID header's 'output gain'
|
That information should instead be stored in the ID header's 'output gain'
|
||||||
field.
|
field.
|
||||||
</t>
|
</t>
|
||||||
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section anchor="other_implementation_notes"
|
<section anchor="packet_size_limits" title="Packet Size Limits">
|
||||||
title="Other Implementation Notes">
|
|
||||||
<t>
|
|
||||||
When seeking within an Ogg Opus stream, the decoder should start decoding (and
|
|
||||||
discarding the output) at least 3840 samples (80 ms) prior to the
|
|
||||||
seek point in order to ensure that the output audio is correct at the seek
|
|
||||||
point.
|
|
||||||
</t>
|
|
||||||
<t>
|
<t>
|
||||||
Technically valid Opus packets can be arbitrarily large due to the padding
|
Technically valid Opus packets can be arbitrarily large due to the padding
|
||||||
format, although the amount of non-padding data they can contain is bounded.
|
format, although the amount of non-padding data they can contain is bounded.
|
||||||
|
@ -978,6 +1033,7 @@ The authors agree to grant third parties the irrevocable right to copy, use,
|
||||||
<references title="Normative References">
|
<references title="Normative References">
|
||||||
|
|
||||||
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>
|
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?>
|
||||||
|
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml"?>
|
||||||
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3533.xml"?>
|
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3533.xml"?>
|
||||||
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.5334.xml"?>
|
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.5334.xml"?>
|
||||||
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6381.xml"?>
|
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.6381.xml"?>
|
||||||
|
@ -1034,6 +1090,16 @@ The authors agree to grant third parties the irrevocable right to copy, use,
|
||||||
<!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml"?-->
|
<!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml"?-->
|
||||||
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4732.xml"?>
|
<?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.4732.xml"?>
|
||||||
|
|
||||||
|
<reference anchor="seeking"
|
||||||
|
target="http://wiki.xiph.org/Seeking">
|
||||||
|
<front>
|
||||||
|
<title>Granulepos Encoding and How Seeking Really Works</title>
|
||||||
|
<author initials="S." surname="Pfeiffer" fullname="Silvia Pfeiffer"/>
|
||||||
|
<author initials="C." surname="Parker" fullname="Conrad Parker"/>
|
||||||
|
<author initials="G." surname="Maxwell" fullname="Greg Maxwell"/>
|
||||||
|
</front>
|
||||||
|
</reference>
|
||||||
|
|
||||||
<reference anchor="replay-gain"
|
<reference anchor="replay-gain"
|
||||||
target="http://wiki.xiph.org/VorbisComment#Replay_Gain">
|
target="http://wiki.xiph.org/VorbisComment#Replay_Gain">
|
||||||
<front>
|
<front>
|
||||||
|
|