Gen-art changes
parent
e134dc4785
commit
3fe9cca1fb
1 changed file with 154 additions and 98 deletions
|
@ -98,7 +98,7 @@ Only the decoder portion of this software is normative, though a
|
|||
significant amount of code is shared by both the encoder and decoder.
|
||||
<xref target="conformance"/> provides a decoder conformance test.
|
||||
The decoder contains a great deal of integer and fixed-point arithmetic which
|
||||
must be performed exactly, including all rounding considerations, so any
|
||||
needs to be performed exactly, including all rounding considerations, so any
|
||||
useful specification requires domain-specific symbolic language to adequately
|
||||
define these operations.
|
||||
Additionally, any
|
||||
|
@ -136,8 +136,8 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
|
|||
interpreted as described in RFC 2119 <xref target="rfc2119"></xref>.
|
||||
</t>
|
||||
<t>
|
||||
Even when using floating-point, various operations in the codec require
|
||||
bit-exact fixed-point behavior.
|
||||
Various operations in the codec require bit-exact fixed-point behavior, even
|
||||
when writing a floating-point implementation.
|
||||
The notation "Q<n>", where n is an integer, denotes the number of binary
|
||||
digits to the right of the binary point in a fixed-point number.
|
||||
For example, a signed Q14 value in a 16-bit word can represent values from
|
||||
|
@ -191,6 +191,41 @@ sign(x) = < 0, x == 0 ,
|
|||
</t>
|
||||
</section>
|
||||
|
||||
<section anchor="abs" toc="exclude" title="abs(x)">
|
||||
<t>
|
||||
The absolute value of x, i.e.,
|
||||
<figure align="center">
|
||||
<artwork align="center"><![CDATA[
|
||||
abs(x) = sign(x)*x .
|
||||
]]></artwork>
|
||||
</figure>
|
||||
</t>
|
||||
</section>
|
||||
|
||||
<section anchor="floor" toc="exclude" title="floor(f)">
|
||||
<t>
|
||||
The largest integer z such that z <= f.
|
||||
</t>
|
||||
</section>
|
||||
|
||||
<section anchor="ceil" toc="exclude" title="ceil(f)">
|
||||
<t>
|
||||
The smallest integer z such that z >= f.
|
||||
</t>
|
||||
</section>
|
||||
|
||||
<section anchor="round" toc="exclude" title="round(f)">
|
||||
<t>
|
||||
The integer z nearest to f, with ties rounded towards negative infinity,
|
||||
i.e.,
|
||||
<figure align="center">
|
||||
<artwork align="center"><![CDATA[
|
||||
round(f) = ceil(f - 0.5) .
|
||||
]]></artwork>
|
||||
</figure>
|
||||
</t>
|
||||
</section>
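As an illustration of the abs(x), floor(f), ceil(f), and round(f) definitions in this hunk, here is a minimal Python sketch. It is not part of the draft: the names `sign`, `spec_abs`, and `spec_round` are hypothetical, and `sign` is assumed to be the conventional three-valued sign function. The point worth checking is that round(f) sends ties towards negative infinity, which differs from Python's built-in `round` (ties to even).

```python
import math


def sign(x):
    """Conventional sign function (assumed): -1, 0, or 1."""
    return (x > 0) - (x < 0)


def spec_abs(x):
    """abs(x) = sign(x)*x, as written in the added section."""
    return sign(x) * x


def spec_round(f):
    """round(f) = ceil(f - 0.5): nearest integer, ties towards -infinity."""
    return math.ceil(f - 0.5)


# floor(f) and ceil(f) map directly onto math.floor and math.ceil.
examples = [spec_round(v) for v in (2.5, 3.5, -2.5, 2.6)]
```

For instance, `spec_round(3.5)` yields 3, whereas Python's built-in `round(3.5)` yields 4.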
|
||||
|
||||
<section anchor="log2" toc="exclude" title="log2(f)">
|
||||
<t>
|
||||
The base-two logarithm of f.
|
||||
|
@ -221,12 +256,6 @@ Examples:
|
|||
</t>
|
||||
</section>
|
||||
|
||||
<section anchor="floor" toc="exclude" title="floor(x)">
|
||||
<t>
|
||||
Largest integer z such that z <= x.
|
||||
</t>
|
||||
</section>
|
||||
|
||||
</section>
|
||||
|
||||
</section>
|
||||
|
@ -312,10 +341,9 @@ On the other hand, non-speech signals are not always adequately coded using
|
|||
<t>
|
||||
A "Hybrid" mode allows the use of both layers simultaneously with a frame size
|
||||
of 10 or 20 ms and a SWB or FB audio bandwidth.
|
||||
Each frame is split into a low frequency signal and a high frequency signal,
|
||||
with a cutoff of 8 kHz.
|
||||
The LP layer then codes the low frequency signal, followed by the MDCT layer
|
||||
coding the high frequency signal.
|
||||
The LP layer codes the low frequencies by resampling the signal down to WB.
|
||||
The MDCT layer follows, coding the high frequency portion of the signal.
|
||||
The cutoff between the two lies at 8 kHz, the maximum WB audio bandwidth.
|
||||
In the MDCT layer, all bands below 8 kHz are discarded, so there is no
|
||||
coding redundancy between the two layers.
|
||||
</t>
|
||||
|
@ -528,6 +556,10 @@ Support for that variant is OPTIONAL.
|
|||
All bit diagrams in this document number the bits so that bit 0 is the most
|
||||
significant bit of the first byte, and bit 7 is the least significant.
|
||||
Bit 8 is thus the most significant bit of the second byte, etc.
|
||||
Well-formed Opus packets obey certain requirements, marked [R1] through [R7]
|
||||
below.
|
||||
These are summarized in <xref target="malformed-packets"/> along with
|
||||
appropriate means of handling malformed packets.
|
||||
</t>
|
||||
|
||||
<section anchor="toc_byte" title="The TOC Byte">
|
||||
|
@ -606,9 +638,10 @@ This draft refers to a packet as a code 0 packet, code 1 packet, etc., based on
|
|||
the value of "c".
|
||||
</t>
|
||||
|
||||
<t>
|
||||
<t anchor="R1">
|
||||
A well-formed Opus packet MUST contain at least one byte with the TOC
|
||||
information, though the frame(s) within a packet MAY be zero bytes long.
|
||||
information [R1], though the frame(s) within a packet MAY be zero bytes
|
||||
long.
|
||||
</t>
|
||||
</section>
|
||||
|
||||
|
@ -649,12 +682,13 @@ It is also roughly the maximum useful rate of the MDCT layer, as shortly
|
|||
on the codebook sizes.
|
||||
</t>
|
||||
|
||||
<t>
|
||||
<t anchor="R2">
|
||||
No length is transmitted for the last frame in a VBR packet, or for any of the
|
||||
frames in a CBR packet, as it can be inferred from the total size of the
|
||||
packet and the size of all other data in the packet.
|
||||
However, the length of any individual frame MUST NOT exceed 1275 bytes, to
|
||||
allow for repacketization by gateways, conference bridges, or other software.
|
||||
However, the length of any individual frame MUST NOT exceed
|
||||
1275 bytes [R2], to allow for repacketization by gateways,
|
||||
conference bridges, or other software.
|
||||
</t>
|
||||
</section>
|
||||
|
||||
|
@ -681,13 +715,13 @@ For code 0 packets, the TOC byte is immediately followed by N-1 bytes
|
|||
</section>
|
||||
|
||||
<section title="Code 1: Two Frames in the Packet, Each with Equal Compressed Size">
|
||||
<t>
|
||||
<t anchor="R3">
|
||||
For code 1 packets, the TOC byte is immediately followed by the
|
||||
(N-1)/2 bytes of compressed data for the first frame, followed by
|
||||
(N-1)/2 bytes of compressed data for the second frame, as illustrated in
|
||||
<xref target="code1_packet"/>.
|
||||
The number of payload bytes available for compressed data, N-1, MUST be even
|
||||
for all code 1 packets.
|
||||
for all code 1 packets [R3].
|
||||
</t>
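The code 1 rule marked [R3] can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function name `split_code1_packet` and the constant `MAX_FRAME_BYTES` are hypothetical, and the caller is assumed to have already determined from the TOC byte that this is a code 1 packet.

```python
MAX_FRAME_BYTES = 1275  # per-frame limit from [R2]


def split_code1_packet(packet: bytes):
    """Split a code 1 packet of N bytes into two frames of (N-1)/2 bytes each.

    Assumes the caller already knows the packet is code 1.
    """
    n = len(packet)
    if n < 1:
        raise ValueError("[R1] packet must contain at least the TOC byte")
    payload = n - 1  # bytes available for compressed data
    if payload % 2 != 0:
        raise ValueError("[R3] N-1 must be even for code 1 packets")
    half = payload // 2
    if half > MAX_FRAME_BYTES:
        raise ValueError("[R2] frame length exceeds 1275 bytes")
    # First frame follows the TOC byte, second frame follows the first.
    return packet[1:1 + half], packet[1 + half:]
```

A packet consisting of the TOC byte alone is accepted and yields two zero-length frames, consistent with the statement that frames MAY be zero bytes long.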
|
||||
<figure anchor="code1_packet" title="A Code 1 Packet" align="center">
|
||||
<artwork align="center"><![CDATA[
|
||||
|
@ -709,7 +743,7 @@ The number of payload bytes available for compressed data, N-1, MUST be even
|
|||
</section>
|
||||
|
||||
<section title="Code 2: Two Frames in the Packet, with Different Compressed Sizes">
|
||||
<t>
|
||||
<t anchor="R4">
|
||||
For code 2 packets, the TOC byte is followed by a one- or two-byte sequence
|
||||
indicating the length of the first frame (marked N1 in <xref target='code2_packet'/>),
|
||||
followed by N1 bytes of compressed data for the first frame.
|
||||
|
@ -720,7 +754,7 @@ A code 2 packet MUST contain enough bytes to represent a valid length.
|
|||
For example, a 1-byte code 2 packet is always invalid, and a 2-byte code 2
|
||||
packet whose second byte is in the range 252...255 is also invalid.
|
||||
The length of the first frame, N1, MUST also be no larger than the size of the
|
||||
payload remaining after decoding that length for all code 2 packets.
|
||||
payload remaining after decoding that length for all code 2 packets [R4].
|
||||
This makes, for example, a 2-byte code 2 packet with a second byte in the range
|
||||
1...251 invalid as well (the only valid 2-byte code 2 packet is one where the
|
||||
length of both frames is zero).
|
||||
|
@ -745,17 +779,17 @@ This makes, for example, a 2-byte code 2 packet with a second byte in the range
|
|||
</section>
|
||||
|
||||
<section title="Code 3: A Signaled Number of Frames in the Packet">
|
||||
<t>
|
||||
<t anchor="R5">
|
||||
Code 3 packets signal the number of frames, as well as additional
|
||||
padding, called "Opus padding" to indicate that this padding is added at the
|
||||
Opus layer, rather than at the transport layer.
|
||||
Code 3 packets MUST have at least 2 bytes.
|
||||
Code 3 packets MUST have at least 2 bytes [R6,R7].
|
||||
The TOC byte is followed by a byte encoding the number of frames in the packet
|
||||
in bits 2 to 7 (marked "M" in <xref target='frame_count_byte'/>), with bit 1 indicating whether
|
||||
or not Opus padding is inserted (marked "p" in <xref target='frame_count_byte'/>), and bit 0
|
||||
indicating VBR (marked "v" in <xref target='frame_count_byte'/>).
|
||||
M MUST NOT be zero, and the audio duration contained within a packet MUST NOT
|
||||
exceed 120 ms.
|
||||
exceed 120 ms [R5].
|
||||
This limits the maximum frame count for any frame size to 48 (for 2.5 ms
|
||||
frames), with lower limits for longer frame sizes.
|
||||
<xref target="frame_count_byte"/> illustrates the layout of the frame count
|
||||
|
@ -777,7 +811,7 @@ Values from 0...254 indicate that 0...254 bytes of padding are included,
|
|||
in addition to the byte(s) used to indicate the size of the padding.
|
||||
If the value is 255, then the size of the additional padding is 254 bytes,
|
||||
plus the padding value encoded in the next byte.
|
||||
There MUST be at least one more byte in the packet in this case.
|
||||
There MUST be at least one more byte in the packet in this case [R6,R7].
|
||||
The additional padding bytes appear at the end of the packet, and MUST be set
|
||||
to zero by the encoder to avoid creating a covert channel.
|
||||
The decoder MUST accept any value for the padding bytes, however.
|
||||
|
@ -795,17 +829,17 @@ To add 256 bytes to a packet, set the padding bit to 1, insert two bytes after
|
|||
By using the value 255 multiple times, it is possible to create a packet of any
|
||||
specific, desired size.
|
||||
Let P be the number of header bytes used to indicate the padding size plus the
|
||||
total amount of padding bytes (i.e., the total number of bytes added to the
|
||||
packet).
|
||||
Then P MUST be no more than N-2.
|
||||
number of padding bytes themselves (i.e., P is the total number of bytes added
|
||||
to the packet).
|
||||
Then P MUST be no more than N-2 [R6,R7].
|
||||
</t>
|
||||
<t>
|
||||
In the CBR case, the compressed length of each frame in bytes is equal to the
|
||||
number of remaining bytes R in the packet after subtracting the (optional)
|
||||
padding, (R=N-2-P), divided by M.
|
||||
The value R MUST be a non-negative integer multiple of M.
|
||||
The compressed data for all M frames then follows, each of size
|
||||
(N-2-P)/M bytes, as illustrated in <xref target="code3cbr_packet"/>.
|
||||
<t anchor="R6">
|
||||
In the CBR case, let R=N-2-P be the number of bytes remaining in the packet
|
||||
after subtracting the (optional) padding.
|
||||
Then the compressed length of each frame in bytes is equal to R/M.
|
||||
The value R MUST be a non-negative integer multiple of M [R6].
|
||||
The compressed data for all M frames follows, each of size
|
||||
R/M bytes, as illustrated in <xref target="code3cbr_packet"/>.
|
||||
</t>
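The CBR code 3 constraints ([R5], [R6]) combine the frame count byte, the padding length, and the relation R=N-2-P. The following Python sketch shows one way to apply them. It is illustrative only: `parse_code3_cbr` is a hypothetical name, and `frame_ms` (the frame duration implied by the TOC byte) is passed in by the caller as an assumption, since TOC parsing is outside this hunk. Bit 0 is taken as the most significant bit, following the bit-diagram convention stated earlier.

```python
def parse_code3_cbr(packet: bytes, frame_ms: float):
    """Return (M, frame_size, P) for a CBR code 3 packet, or raise ValueError."""
    n = len(packet)
    if n < 2:
        raise ValueError("code 3 packets need at least 2 bytes")
    fc = packet[1]            # frame count byte
    vbr = (fc >> 7) & 1       # bit 0 (MSB): "v"
    pad = (fc >> 6) & 1       # bit 1: "p"
    m = fc & 0x3F             # bits 2..7: "M"
    if vbr:
        raise ValueError("VBR packets are not handled by this sketch")
    if m == 0 or m * frame_ms > 120:
        raise ValueError("[R5] need at least one frame and at most 120 ms")
    p = 0                     # padding-size bytes plus padding bytes
    pos = 2
    if pad:
        while True:
            if pos >= n:
                raise ValueError("missing padding length byte")
            b = packet[pos]
            pos += 1
            p += 1            # the length byte itself counts towards P
            if b == 255:
                p += 254      # 254 bytes, plus the value in the next byte
            else:
                p += b
                break
    r = n - 2 - p             # bytes left for compressed frame data
    if r < 0 or r % m != 0:
        raise ValueError("[R6] R must be a non-negative multiple of M")
    if r // m > 1275:
        raise ValueError("[R2] frame length exceeds 1275 bytes")
    return m, r // m, p
```

For example, an 11-byte packet with M=2, the padding bit set, and a padding length byte of 2 gives P=3, R=6, and a frame size of 3 bytes.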
|
||||
|
||||
<figure anchor="code3cbr_packet" title="A CBR Code 3 Packet" align="center">
|
||||
|
@ -816,11 +850,11 @@ The compressed data for all M frames then follows, each of size
|
|||
| config |s|1|1|0|p| M | Padding length (Optional) :
|
||||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
| |
|
||||
: Compressed frame 1 ((N-2-P)/M bytes)... :
|
||||
: Compressed frame 1 (R/M bytes)... :
|
||||
| |
|
||||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
| |
|
||||
: Compressed frame 2 ((N-2-P)/M bytes)... :
|
||||
: Compressed frame 2 (R/M bytes)... :
|
||||
| |
|
||||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
| |
|
||||
|
@ -828,7 +862,7 @@ The compressed data for all M frames then follows, each of size
|
|||
| |
|
||||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
| |
|
||||
: Compressed frame M ((N-2-P)/M bytes)... :
|
||||
: Compressed frame M (R/M bytes)... :
|
||||
| |
|
||||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
||||
: Opus Padding (Optional)... |
|
||||
|
@ -836,13 +870,13 @@ The compressed data for all M frames then follows, each of size
|
|||
]]></artwork>
|
||||
</figure>
|
||||
|
||||
<t>
|
||||
<t anchor="R7">
|
||||
In the VBR case, the (optional) padding length is followed by M-1 frame
|
||||
lengths (indicated by "N1" to "N[M-1]" in <xref target='code3vbr_packet'/>), each encoded in a
|
||||
one- or two-byte sequence as described above.
|
||||
The packet MUST contain enough data for the M-1 lengths after removing the
|
||||
(optional) padding, and the sum of these lengths MUST be no larger than the
|
||||
number of bytes remaining in the packet after decoding them.
|
||||
number of bytes remaining in the packet after decoding them [R7].
|
||||
The compressed data for all M frames follows, each frame consisting of the
|
||||
indicated number of bytes, with the final frame consuming any remaining bytes
|
||||
before the final padding, as illustrated in <xref target="code3vbr_packet"/>.
|
||||
|
@ -944,7 +978,7 @@ Four FB stereo 20 ms CELT frames of the same compressed size:
|
|||
</figure>
|
||||
</section>
|
||||
|
||||
<section title="Receiving Malformed Packets">
|
||||
<section anchor="malformed-packets" title="Receiving Malformed Packets">
|
||||
<t>
|
||||
A receiver MUST NOT process packets which violate any of the rules above as
|
||||
normal Opus packets.
|
||||
|
@ -956,15 +990,16 @@ Packets which violate these constraints may cause implementations of
|
|||
</t>
|
||||
<t>
|
||||
These constraints are summarized here for reference:
|
||||
<list style="symbols">
|
||||
<list style="format [R%d]">
|
||||
<t>Packets are at least one byte.</t>
|
||||
<t>No implicit frame length is larger than 1275 bytes.</t>
|
||||
<t>Code 1 packets have an odd total length, N, so that (N-1)/2 is an
|
||||
integer.</t>
|
||||
<t>Code 2 packets have enough bytes after the TOC for a valid frame length, and
|
||||
that length is no larger than the number of bytes remaining in the packet.</t>
|
||||
<t>Code 3 packets contain at least one frame, but no more than 120 ms of
|
||||
audio total.</t>
|
||||
<t>Code 2 packets have enough bytes after the TOC for a valid frame
|
||||
length, and that length is no larger than the number of bytes remaining in the
|
||||
packet.</t>
|
||||
<t>Code 3 packets contain at least one frame, but no more than 120 ms
|
||||
of audio total.</t>
|
||||
<t>The length of a CBR code 3 packet, N, is at least two bytes, the number of
|
||||
bytes added to indicate the padding size plus the trailing padding bytes
|
||||
themselves, P, is no more than N-2, and the frame count, M, satisfies
|
||||
|
@ -1078,14 +1113,22 @@ The range decoder maintains an internal state vector composed of the two-tuple
|
|||
current range and the actual coded value, minus one, and the size of the
|
||||
current range, respectively.
|
||||
Both val and rng are 32-bit unsigned integer values.
|
||||
The decoder initializes rng to 128 and initializes val to 127 minus the top 7
|
||||
bits of the first input octet.
|
||||
It saves the remaining bit for use in the renormalization procedure described
|
||||
in <xref target="range-decoder-renorm"/>, which the decoder invokes
|
||||
immediately after initialization to read additional bits and establish the
|
||||
invariant that rng > 2**23.
|
||||
</t>
|
||||
|
||||
<section anchor="range-decoder-init" title="Range Decoder Initialization">
|
||||
<t>
|
||||
Let b0 be the first input octet (or zero if there are no octets in this Opus
|
||||
frame).
|
||||
The decoder initializes rng to 128 and initializes val to
|
||||
(127 - (b0>>1)), where (b0>>1) is the top 7 bits of the
|
||||
first input octet.
|
||||
It saves the remaining bit, (b0&1), for use in the renormalization
|
||||
procedure described in <xref target="range-decoder-renorm"/>, which the
|
||||
decoder invokes immediately after initialization to read additional bits and
|
||||
establish the invariant that rng > 2**23.
|
||||
</t>
|
||||
</section>
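The initialization described in this new section, together with the way the renormalization text forms each 8-bit value sym, can be sketched in Python as follows. This is not the reference implementation: `range_decoder_init` and `next_sym` are hypothetical names, and the subsequent update of val and rng during renormalization is deliberately left out because it is not shown in this hunk.

```python
def range_decoder_init(frame: bytes):
    """Return (rng, val, leftover_bit) after reading the first octet."""
    b0 = frame[0] if len(frame) > 0 else 0  # zero if the frame is empty
    rng = 128
    val = 127 - (b0 >> 1)    # 127 minus the top 7 bits of b0
    leftover_bit = b0 & 1    # saved for the renormalization step
    return rng, val, leftover_bit


def next_sym(frame: bytes, pos: int, leftover_bit: int):
    """Form the next 8-bit sym from the buffered bit and the octet at pos.

    Returns (sym, new_leftover_bit). Zero bits are used past the end of
    the frame.
    """
    octet = frame[pos] if pos < len(frame) else 0
    sym = (leftover_bit << 7) | (octet >> 1)  # leftover bit becomes bit 7
    return sym, octet & 1
```

A first octet of 0xFF gives val = 0 with a buffered bit of 1, and an empty frame gives val = 127 with a buffered bit of 0.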
|
||||
|
||||
<section anchor="decoding-symbols" title="Decoding Symbols">
|
||||
<t>
|
||||
Decoding a symbol is a two-step process.
|
||||
|
@ -1103,7 +1146,7 @@ fs = ft - min(------ + 1, ft) .
|
|||
rng/ft
|
||||
]]></artwork>
|
||||
</figure>
|
||||
The divisions here are exact integer division.
|
||||
The divisions here are integer division.
|
||||
</t>
|
||||
<t>
|
||||
The decoder then identifies the symbol in the current context corresponding to
|
||||
|
@ -1159,13 +1202,14 @@ To normalize the range, the decoder repeats the following process, implemented
|
|||
by ec_dec_normalize() (entdec.c), until rng > 2**23.
|
||||
If rng is already greater than 2**23, the entire process is skipped.
|
||||
First, it sets rng to (rng<<8).
|
||||
Then it reads the next octet of the payload and combines it with the left-over
|
||||
bit buffered from the previous octet to form the 8-bit value sym.
|
||||
It takes the left-over bit as the high bit (bit 7) of sym, and the top 7 bits
|
||||
of the octet it just read as the other 7 bits of sym.
|
||||
Then it reads the next octet of the Opus frame and forms an 8-bit value sym,
|
||||
using the left-over bit buffered from the previous octet as the high bit
|
||||
and the top 7 bits of the octet just read as the other 7 bits of sym.
|
||||
The remaining bit in the octet just read is buffered for use in the next
|
||||
iteration.
|
||||
If no more input octets remain, it uses zero bits instead.
|
||||
See <xref target="range-decoder-init"/> for the initialization used to process
|
||||
the first octet.
|
||||
Then, it sets
|
||||
<figure align="center">
|
||||
<artwork align="center"><![CDATA[
|
||||
|
@ -1771,6 +1815,8 @@ In order to properly produce LBRR frames under all conditions, an encoder might
|
|||
transitions.
|
||||
However, the reference implementation opts to disable LBRR frames at the
|
||||
transition point for simplicity.
|
||||
Since transitions are relatively infrequent in normal usage, this does not have
|
||||
a significant impact on packet loss robustness.
|
||||
</t>
|
||||
|
||||
<t>
|
||||
|
@ -1849,11 +1895,11 @@ The quantized excitation signal (see <xref target="silk_excitation"/>) follows
|
|||
<c><xref target="silk_gains"/></c>
|
||||
<c/>
|
||||
|
||||
<c>Normalized LSF Stage 1 Index</c>
|
||||
<c>Normalized LSF Stage-1 Index</c>
|
||||
<c><xref target="silk_nlsf_stage1_pdfs"/></c>
|
||||
<c/>
|
||||
|
||||
<c>Normalized LSF Stage 2 Residual</c>
|
||||
<c>Normalized LSF Stage-2 Residual</c>
|
||||
<c><xref target="silk_nlsf_stage2"/></c>
|
||||
<c/>
|
||||
|
||||
|
@ -1978,7 +2024,7 @@ wi0 = i0 + 3*(n/5)
|
|||
wi1 = i2 + 3*(n%5)
|
||||
]]></artwork>
|
||||
</figure>
|
||||
where the division is exact integer division.
|
||||
where the division is integer division.
|
||||
The range of these indices is 0 to 14, inclusive.
|
||||
Let w[i] be the i'th weight from <xref target="silk_stereo_weights_table"/>.
|
||||
Then the two prediction weights, w0_Q13 and w1_Q13, are
|
||||
|
@ -1994,6 +2040,9 @@ w0_Q13 = w_Q13[wi0]
|
|||
</figure>
|
||||
N.b., w1_Q13 is computed first here, because w0_Q13 depends on it.
|
||||
The constant 6554 is approximately 0.1 in Q16.
|
||||
Although wi0 and wi1 only have 15 possible values,
|
||||
<xref target="silk_stereo_weights_table"/> contains 16 entries to allow
|
||||
interpolation between entry wi0 and (wi0 + 1) (and likewise for wi1).
|
||||
</t>
|
||||
|
||||
<texttable anchor="silk_stereo_weights_table"
|
||||
|
@ -2064,6 +2113,7 @@ In that case, if this flag is zero (indicating that there should be a side
|
|||
channel), then Packet Loss Concealment (PLC, see
|
||||
<xref target="Packet Loss Concealment"/>) SHOULD be invoked to recover a
|
||||
side channel signal.
|
||||
Otherwise, the stereo image will collapse.
|
||||
</t>
|
||||
|
||||
<texttable anchor="silk_mid_only_pdf" title="Mid-only Flag PDF">
|
||||
|
@ -2171,7 +2221,7 @@ The 3 least significant bits are decoded using a uniform PDF:
|
|||
</texttable>
|
||||
|
||||
<t>
|
||||
These 6 bits are combined to form a gain index between 0 and 63.
|
||||
These 6 bits are combined to form a value, gain_index, between 0 and 63.
|
||||
When the gain for the previous subframe is available, then the current gain is
|
||||
limited as follows:
|
||||
<figure align="center">
|
||||
|
@ -2182,11 +2232,10 @@ log_gain = max(gain_index, previous_log_gain - 16) .
|
|||
This may help some implementations limit the change in precision of their
|
||||
internal LTP history.
|
||||
The indices to which this clamp applies cannot simply be removed from the
|
||||
codebook, because the previous gain index will not be available after packet
|
||||
loss.
|
||||
This step is skipped after a decoder reset, and in the side channel if the
|
||||
previous frame in the side channel was not coded, since there is no previous
|
||||
gain index.
|
||||
codebook, because previous_log_gain will not be available after packet loss.
|
||||
The clamping is skipped after a decoder reset, and in the side channel if the
|
||||
previous frame in the side channel was not coded, since there is no value for
|
||||
previous_log_gain available.
|
||||
It MAY also be skipped after packet loss.
|
||||
</t>
|
||||
|
||||
|
@ -2195,7 +2244,7 @@ For subframes which do not have an independent gain (including the first
|
|||
subframe of frames not listed as using independent coding above), the
|
||||
quantization gain is coded relative to the gain from the previous subframe (in
|
||||
the same channel).
|
||||
The PDF in <xref target="silk_delta_gain_pdf"/> yields a delta gain index
|
||||
The PDF in <xref target="silk_delta_gain_pdf"/> yields a delta_gain_index value
|
||||
between 0 and 40, inclusive.
|
||||
</t>
|
||||
<texttable anchor="silk_delta_gain_pdf"
|
||||
|
@ -2212,8 +2261,8 @@ The following formula translates this index into a quantization gain for the
|
|||
current subframe using the gain from the previous subframe:
|
||||
<figure align="center">
|
||||
<artwork align="center"><![CDATA[
|
||||
log_gain = clamp(0, max(2*gain_index - 16,
|
||||
previous_log_gain + gain_index - 4), 63) .
|
||||
log_gain = clamp(0, max(2*delta_gain_index - 16,
|
||||
previous_log_gain + delta_gain_index - 4), 63) .
|
||||
]]></artwork>
|
||||
</figure>
|
||||
</t>
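The two gain formulas in this part of the diff (the independent case limited by previous_log_gain, and the delta-coded case) can be written out as a short Python sketch. The helper names `clamp`, `independent_log_gain`, and `delta_log_gain` are hypothetical, and passing `None` for a missing previous_log_gain is an assumption made here to model the cases where the clamping is skipped.

```python
def clamp(lo, x, hi):
    """clamp(lo, x, hi): limit x to the range [lo, hi]."""
    return max(lo, min(x, hi))


def independent_log_gain(gain_index, previous_log_gain=None):
    """log_gain = max(gain_index, previous_log_gain - 16).

    The limit is skipped when no previous_log_gain is available
    (modelled here as None).
    """
    if previous_log_gain is None:
        return gain_index
    return max(gain_index, previous_log_gain - 16)


def delta_log_gain(delta_gain_index, previous_log_gain):
    """Delta-coded gain: delta_gain_index is between 0 and 40, inclusive."""
    return clamp(0,
                 max(2 * delta_gain_index - 16,
                     previous_log_gain + delta_gain_index - 4),
                 63)
```

With previous_log_gain = 30, a delta_gain_index of 4 leaves the gain unchanged at 30, while a delta_gain_index of 40 saturates at 63.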
|
||||
|
@ -2251,10 +2300,10 @@ A set of normalized Line Spectral Frequency (LSF) coefficients follow the
|
|||
Coding (LPC) coefficients for the current SILK frame.
|
||||
Once decoded, the normalized LSFs form an increasing list of Q15 values between
|
||||
0 and 1.
|
||||
These represent the interleaved zeros on the unit circle between 0 and pi
|
||||
(hence "normalized") in the standard decomposition of the LPC filter into a
|
||||
symmetric part and an anti-symmetric part (P and Q in
|
||||
<xref target="silk_nlsf2lpc"/>).
|
||||
These represent the interleaved zeros on the upper half of the unit circle
|
||||
(between 0 and pi, hence "normalized") in the standard decomposition
|
||||
<xref target="line-spectral-pairs"/> of the LPC filter into a symmetric part
|
||||
and an anti-symmetric part (P and Q in <xref target="silk_nlsf2lpc"/>).
|
||||
Because of non-linear effects in the decoding process, an implementation SHOULD
|
||||
match the fixed-point arithmetic described in this section exactly.
|
||||
An encoder SHOULD also use the same process.
|
||||
|
@ -2275,7 +2324,7 @@ After reconstructing the normalized LSFs
|
|||
All of this is necessary to ensure the reconstruction process is stable.
|
||||
</t>
|
||||
|
||||
<section anchor="silk_nlsf_stage1" title="Stage 1 Normalized LSF Decoding">
|
||||
<section anchor="silk_nlsf_stage1" title="Normalized LSF Stage 1 Decoding">
|
||||
<t>
|
||||
The first VQ stage uses a 32-element codebook, coded with one of the PDFs in
|
||||
<xref target="silk_nlsf_stage1_pdfs"/>, depending on the audio bandwidth and
|
||||
|
@ -2291,7 +2340,7 @@ The actual codebook elements are listed in
|
|||
</t>
|
||||
|
||||
<texttable anchor="silk_nlsf_stage1_pdfs"
|
||||
title="PDFs for Normalized LSF Index Stage-1 Decoding">
|
||||
title="PDFs for Normalized LSF Stage-1 Index Decoding">
|
||||
<ttcol align="left">Audio Bandwidth</ttcol>
|
||||
<ttcol align="left">Signal Type</ttcol>
|
||||
<ttcol align="left">PDF</ttcol>
|
||||
|
@ -2327,7 +2376,7 @@ The actual codebook elements are listed in
|
|||
|
||||
</section>
|
||||
|
||||
<section anchor="silk_nlsf_stage2" title="Stage 2 Normalized LSF Decoding">
|
||||
<section anchor="silk_nlsf_stage2" title="Normalized LSF Stage 2 Decoding">
|
||||
<t>
|
||||
A total of 16 PDFs are available for the LSF residual in the second stage: the
|
||||
8 (a...h) for NB and MB frames given in
|
||||
|
@ -2341,7 +2390,7 @@ Which PDF is used for which coefficient is driven by the index, I1,
|
|||
</t>
|
||||
|
||||
<texttable anchor="silk_nlsf_stage2_nbmb_pdfs"
|
||||
title="PDFs for NB/MB Normalized LSF Index Stage-2 Decoding">
|
||||
title="PDFs for NB/MB Normalized LSF Stage-2 Index Decoding">
|
||||
<ttcol align="left">Codebook</ttcol>
|
||||
<ttcol align="left">PDF</ttcol>
|
||||
<c>a</c> <c>{1, 1, 1, 15, 224, 11, 1, 1, 1}/256</c>
|
||||
|
@ -2355,7 +2404,7 @@ Which PDF is used for which coefficient is driven by the index, I1,
|
|||
</texttable>
|
||||
|
||||
<texttable anchor="silk_nlsf_stage2_wb_pdfs"
|
||||
title="PDFs for WB Normalized LSF Index Stage-2 Decoding">
|
||||
title="PDFs for WB Normalized LSF Stage-2 Index Decoding">
|
||||
<ttcol align="left">Codebook</ttcol>
|
||||
<ttcol align="left">PDF</ttcol>
|
||||
<c>i</c> <c>{1, 1, 1, 9, 232, 9, 1, 1, 1}/256</c>
|
||||
|
@ -2369,7 +2418,7 @@ Which PDF is used for which coefficient is driven by the index, I1,
|
|||
</texttable>
|
||||
|
||||
<texttable anchor="silk_nlsf_nbmb_stage2_cb_sel"
|
||||
title="Codebook Selection for NB/MB Normalized LSF Index Stage 2 Decoding">
|
||||
title="Codebook Selection for NB/MB Normalized LSF Stage-2 Index Decoding">
|
||||
<ttcol>I1</ttcol>
|
||||
<ttcol>Coefficient</ttcol>
|
||||
<c/>
|
||||
|
@ -2441,7 +2490,7 @@ Which PDF is used for which coefficient is driven by the index, I1,
|
|||
</texttable>
|
||||
|
||||
<texttable anchor="silk_nlsf_wb_stage2_cb_sel"
|
||||
title="Codebook Selection for WB Normalized LSF Index Stage 2 Decoding">
|
||||
title="Codebook Selection for WB Normalized LSF Stage-2 Index Decoding">
|
||||
<ttcol>I1</ttcol>
|
||||
<ttcol>Coefficient</ttcol>
|
||||
<c/>
|
||||
|
@ -2763,7 +2812,7 @@ w2_Q18[k] = (1024/(cb1_Q8[k] - cb1_Q8[k-1])
|
|||
</artwork>
|
||||
</figure>
|
||||
where cb1_Q8[-1] = 0 and cb1_Q8[d_LPC] = 256, and the
|
||||
division is exact integer division.
|
||||
division is integer division.
|
||||
This is reduced to an unsquared, Q9 value using the following square-root
|
||||
approximation:
|
||||
<figure align="center">
|
||||
|
@ -2786,7 +2835,7 @@ The reference implementation already requires code to compute these weights on
|
|||
</t>
|
||||
|
||||
<texttable anchor="silk_nlsf_nbmb_codebook"
|
||||
title="Codebook Vectors for NB/MB Normalized LSF Stage 1 Decoding">
|
||||
title="NB/MB Normalized LSF Stage-1 Codebook Vectors">
|
||||
<ttcol>I1</ttcol>
|
||||
<ttcol>Codebook (Q8)</ttcol>
|
||||
<c/>
|
||||
|
@ -2858,7 +2907,7 @@ The reference implementation already requires code to compute these weights on
|
|||
</texttable>
|
||||
|
||||
<texttable anchor="silk_nlsf_wb_codebook"
|
||||
title="Codebook Vectors for WB Normalized LSF Stage 1 Decoding">
|
||||
title="WB Normalized LSF Stage-1 Codebook Vectors">
|
||||
<ttcol>I1</ttcol>
|
||||
<ttcol>Codebook (Q8)</ttcol>
|
||||
<c/>
|
||||
|
@ -2939,7 +2988,7 @@ NLSF_Q15[k] = clamp(0,
|
|||
(cb1_Q8[k]<<7) + (res_Q10[k]<<14)/w_Q9[k], 32767) ,
|
||||
]]></artwork>
|
||||
</figure>
|
||||
where the division is exact integer division.
|
||||
where the division is integer division.
|
||||
However, nothing in either the reconstruction process or the
|
||||
quantization process in the encoder thus far guarantees that the coefficients
|
||||
are monotonically increasing and separated well enough to ensure a stable
|
||||
|
@ -3010,16 +3059,16 @@ For all other values of i, both NLSF_Q15[i-1] and NLSF_Q15[i] are updated as
|
|||
follows:
|
||||
<figure align="center">
|
||||
<artwork align="center"><![CDATA[
|
||||
                                     i-1
                                     __
min_center_Q15 = (NDeltaMin[i]>>1) + \  NDeltaMin[k]
                                     /_
                                     k=0
                                            d_LPC
                                             __
max_center_Q15 = 32768 - (NDeltaMin[i]>>1) - \  NDeltaMin[k]
                                             /_
                                            k=i+1
                                         i-1
                                         __
min_center_Q15 = (NDeltaMin_Q15[i]>>1) + \  NDeltaMin_Q15[k]
                                         /_
                                         k=0
                                                d_LPC
                                                 __
max_center_Q15 = 32768 - (NDeltaMin_Q15[i]>>1) - \  NDeltaMin_Q15[k]
                                                 /_
                                                k=i+1
|
||||
center_freq_Q15 = clamp(min_center_Q15[i],
|
||||
(NLSF_Q15[i-1] + NLSF_Q15[i] + 1)>>1,
|
||||
max_center_Q15[i])
|
||||
|
@ -3353,7 +3402,7 @@ sc_Q16[0] = 65470 - -------------------------- ,
|
|||
(maxabs_Q12 * (k+1)) >> 2
|
||||
]]></artwork>
|
||||
</figure>
|
||||
where the division here is exact integer division.
|
||||
where the division here is integer division.
|
||||
This is an approximation of the chirp factor needed to reduce the target
|
||||
coefficient to 32767, though it is both less than 0.999 and, for
|
||||
k > 0 when maxabs_Q12 is much greater than 32767, still slightly
|
||||
|
@ -6035,7 +6084,7 @@ rng = rng - --- * (fh - fl) .
|
|||
ft
|
||||
]]></artwork>
|
||||
</figure>
|
||||
The divisions here are exact integer division.
|
||||
The divisions here are integer division.
|
||||
</t>
|
||||
|
||||
<section anchor="range-encoder-renorm" title="Renormalization">
|
||||
|
@ -7605,7 +7654,7 @@ Robust and Efficient Quantization of Speech LSP Parameters Using Structured Vect
|
|||
<reference anchor="Martin79">
|
||||
<front>
|
||||
<title>Range encoding: An algorithm for removing redundancy from a digitised message</title>
|
||||
<author initials="N." surname="Martin" fullname=""><organization/></author>
|
||||
<author initials="G.N.N." surname="Martin" fullname="G. Nigel N. Martin"><organization/></author>
|
||||
<date year="1979" />
|
||||
</front>
|
||||
<seriesInfo name="Proc. Institution of Electronic and Radio Engineers International Conference on Video and Data Recording" value="" />
|
||||
|
@ -7693,6 +7742,13 @@ Robust and Efficient Quantization of Speech LSP Parameters Using Structured Vect
|
|||
</front>
|
||||
</reference>
|
||||
|
||||
<reference anchor="line-spectral-pairs" target="http://en.wikipedia.org/wiki/Line_spectral_pairs">
|
||||
<front>
|
||||
<title>Line Spectral Pairs</title>
|
||||
<author><organization>Wikipedia</organization></author>
|
||||
</front>
|
||||
</reference>
|
||||
|
||||
<reference anchor="range-coding" target="http://en.wikipedia.org/wiki/Range_coding">
|
||||
<front>
|
||||
<title>Range Coding</title>
|
||||
|