diff --git a/doc/draft-ietf-codec-opus.xml b/doc/draft-ietf-codec-opus.xml index 37514aa6..89209ed4 100644 --- a/doc/draft-ietf-codec-opus.xml +++ b/doc/draft-ietf-codec-opus.xml @@ -38,6 +38,20 @@ + +Mozilla Corporation +
+ + + + + + + + +tterriberry@mozilla.com +
+
@@ -72,7 +86,10 @@ Thus a codec with both layers available can operate over a wider range than The primary normative part of this specification is provided by the source code in . -The codec contains significant amounts of integer and fixed-point arithmetic +In general, only the decoder portion of this software is normative, though a + significant amount of code is shared by both the encoder and decoder. + +The decoder contains significant amounts of integer and fixed-point arithmetic which must be performed exactly, including all rounding considerations, so any useful specification must make extensive use of domain-specific symbolic language to adequately define these operations. @@ -87,6 +104,7 @@ For these reasons this RFC uses the reference implementation as the sole symbolic representation of the codec. + While the symbolic representation is unambiguous and complete it is not always the easiest way to understand the codec's operation. For this reason this document also describes significant parts of the codec in English and @@ -142,6 +160,30 @@ The largest of two values x and y. +
+
+ +
+ +With this definition, if lo>hi, the lower bound is the one that is enforced. + +
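The lower-bound behavior noted above can be illustrated with a minimal C sketch. It assumes clamp(lo,x,hi) is max(lo,min(x,hi)), which is consistent with the statement that the lower bound wins when lo>hi; the helper name clamp_sketch is hypothetical and does not come from the reference code.

```c
/* Hypothetical helper (name is an assumption, not from the reference code).
 * Clamps x to [lo, hi]; if lo > hi, the lower bound is the one enforced. */
static int clamp_sketch(int lo, int x, int hi)
{
    int upper_limited = (x < hi) ? x : hi;            /* min(x, hi) */
    return (lo > upper_limited) ? lo : upper_limited; /* max(lo, min(x, hi)) */
}
```

Applying the upper bound first and the lower bound last is what makes the lower bound take precedence for an inverted range.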
+ +
+ +The sign of x, i.e., +
+ 0 . +]]> +
+
+
+
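As an illustration of the sign() definition, a minimal C sketch follows. It assumes the conventional three-way definition (-1 for negative x, 0 for zero, 1 for positive x); the name sign_sketch is hypothetical.

```c
/* Hypothetical helper: three-way sign of x, assuming the conventional
 * definition of -1 for x < 0, 0 for x == 0, and 1 for x > 0. */
static int sign_sketch(int x)
{
    return (x > 0) - (x < 0);
}
```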
The base-two logarithm of f. @@ -152,16 +194,13 @@ The base-two logarithm of f. The minimum number of bits required to store a positive integer n in two's complement notation, or 0 for a non-positive integer n. -
0 -]]> - +]]>
- Examples: ilog(-1) = 0 @@ -254,6 +293,12 @@ At the decoder, the two decoder outputs are simply added together. To compensate for the different look-aheads required by each layer, the CELT encoder input is delayed by an additional 2.7 ms. This ensures that low frequencies and high frequencies arrive at the same time. +This extra delay MAY be reduced by an encoder by using less lookahead for noise + shaping or using a simpler resampler in the LP layer, but this will reduce + quality. +However, the base 2.5 ms look-ahead in the CELT layer cannot be reduced in + the encoder because it is needed for the MDCT overlap, whose size is fixed by + the decoder. @@ -348,6 +393,10 @@ When a packet contains multiple VBR frames, the compressed length of one or meaning of the first byte as follows: 0: No frame (DTX or lost packet) + 1...251: Size of the frame in bytes 252...255: A second byte is needed. The total size is (size[1]*4)+size[0] @@ -690,15 +739,13 @@ Then the three-tuple corresponding to the kth symbol is given by
- - - +]]>
The range decoder extracts the symbols and integers encoded using the range @@ -804,8 +851,8 @@ The reference implementation uses three additional decoding methods that are
-The first is ec_decode_bin (entdec.c), defined using the parameter ftb instead - of ft. +The first is ec_decode_bin() (entdec.c), defined using the parameter ftb + instead of ft. It is mathematically equivalent to calling ec_decode() with ft = (1<<ftb), but avoids one of the divisions. @@ -852,6 +899,25 @@ Combining the search with the update allows the division to be replaced by a This is the primary interface with the range decoder in the SILK layer, though it is used in a few places in the CELT layer as well. + +Although icdf[k] is more convenient for the code, the frequency counts, f[k], + are a more natural representation of the probability distribution function + (PDF) for a given symbol. +Therefore this draft lists the latter, not the former, when describing the + context in which a symbol is coded as a list, e.g., {4, 4, 4, 4}/16 for a + uniform context with four possible values and ft=16. +The value of ft after the slash is always the sum of the entries in the PDF, + but is included for convenience. +Contexts with identical probabilities, f[k]/ft, but different values of ft + (or equivalently, ftb) are not the same, and cannot, in general, be used in + place of one another. +An icdf table is also not capable of representing a PDF where the first symbol + has 0 probability. +In such contexts, ec_dec_icdf() can decode the symbol by using a table that + drops the entries for any initial zero-probability values and adding the + constant offset of the first value with a non-zero probability to its return + value. +
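The relationship between a listed PDF and an icdf table can be sketched in C as follows. This is only an illustration: it assumes icdf[k] is ft minus the cumulative frequency of symbols 0 through k (a convention not spelled out in this excerpt), and pdf_to_icdf_sketch is a hypothetical helper, not a function in the reference implementation. Leading zero-probability symbols are dropped and reported as an offset, as described above.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the reference code.
 * Converts frequency counts f[0..n-1], which sum to ft, into an
 * inverse-CDF table. Assumption: icdf[k] = ft - (f[0] + ... + f[k]).
 * Leading zero-probability symbols are skipped; *offset receives the
 * index of the first symbol with non-zero probability, which a caller
 * would add to the value returned by the icdf decoder.
 * Returns the number of icdf entries written. */
static size_t pdf_to_icdf_sketch(const unsigned *f, size_t n, unsigned ft,
                                 unsigned *icdf, size_t *offset)
{
    size_t first = 0;
    size_t k;
    unsigned cumulative = 0;

    while (first < n && f[first] == 0) {
        first++;
    }
    *offset = first;
    for (k = first; k < n; k++) {
        cumulative += f[k];
        icdf[k - first] = ft - cumulative;
    }
    return n - first;
}
```

Under these assumptions, the uniform context {4, 4, 4, 4}/16 maps to the table {12, 8, 4, 0} with offset 0. The sketch does not trim trailing zero-probability symbols, which would simply repeat the final 0 entry.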
@@ -887,7 +953,7 @@ The limit of 8 bits in the range coded symbol is a trade-off between itself (which gets larger as more bits are included). Using raw bits reduces the maximum number of divisions required in the worst case, but means that it may be possible to decode a value outside the range - 0 to ft-1. + 0 to ft-1, inclusive. @@ -983,8 +1049,8 @@ This is the bit reserved for termination of the encoder.
-For ec_tell_frac(), the number of bits rng represents must be computed to - fractional precision. +ec_tell_frac() estimates the number of bits buffered in rng to fractional + precision. Since rng must be greater than 2**23 after renormalization, l must be at least 24. Let r = rng>>(l-16), so that 32768 <= r < 65536, an unsigned Q15 @@ -1005,17 +1071,61 @@ ec_tell_frac() then returns (nbits_total*8 - l).
-
- - At the receiving end, the received packets are by the range decoder split into a number of frames contained in the packet. Each of which contains the necessary information to reconstruct a 20 ms frame of the output signal. - -
- - An overview of the decoder is given in . - -
- - + +The LP layer uses a modified version of the SILK codec (herein simply called + "SILK"), which has a relatively traditional Code-Excited Linear Prediction + (CELP) structure. +It runs in NB, MB, and WB modes internally. +When used in a hybrid frame in SWB or FB mode, the LP layer itself still only + runs in WB mode. + + +Internally, the LP layer of a single Opus frame is composed of either a single + 10 ms SILK frame or between one and three 20 ms SILK frames. +Each SILK frame is in turn composed of either two or four 5 ms subframes. +Optional Low Bit-Rate Redundancy (LBRR) frames, which are redundant copies of + the previous SILK frames, may appear to aid in recovery from packet loss. +If present, these appear before the regular SILK frames. +All of these frames and subframes are decoded from the same range coder, with + no padding between them. +Thus packing multiple SILK frames in a single Opus frame saves, on average, + half a byte per SILK frame. +It also allows some parameters to be predicted from prior SILK frames in the + same Opus frame, since this does not degrade packet loss robustness (beyond + any penalty for merely using larger packets). + + + +Stereo support in SILK uses a variant of mid-side coding, allowing a mono + decoder to simply decode the mid channel. +However, the data for the two channels is interleaved, so a mono decoder must + still unpack the data for the side channel. +It would be required to do so anyway for hybrid Opus frames, or to support + decoding individual 20 ms frames. + + + +Symbol(s) +PDF +Condition +VAD flags {1, 1}/2 +LBRR flag {1, 1}/2 +Per-frame LBRR flags +Frame Type +Gain index + +Order of the symbols in the SILK section of the bit-stream. + + + +
+ +An overview of the decoder is given in . + +
+ +| Range |--->| Decode |---------------------------+ @@ -1035,9 +1145,9 @@ ec_tell_frac() then returns (nbits_total*8 - l). 5: LPC coefficients 6: Decoded signal ]]> - - Decoder block diagram. -
+ +Decoder block diagram. +
@@ -1071,7 +1181,7 @@ ec_tell_frac() then returns (nbits_total*8 - l). @@ -1091,7 +1201,7 @@ e_LPC(n) = e(n) + \ e(n - L - i) * b_i, @@ -1101,7 +1211,1407 @@ y(n) = e_LPC(n) + \ e_LPC(n - i) * a_i,
-
+ + + +
+ +The LP layer begins with two to eight header bits, decoded in silk_Decode() + (silk_dec_API.c). +These consist of one Voice Activity Detection (VAD) bit per frame (up to 3), + followed by a single flag indicating the presence of LBRR frames. +For a stereo packet, these flags correspond to the mid channel, and a second + set of flags is included for the side channel. + + +Because these are the first symbols decoded by the range coder, they can be + extracted directly from the upper bits of the first byte of compressed data. +Thus, a receiver can determine if an Opus frame contains any active SILK frames + or if it contains LBRR frames without the overhead of using the range decoder. + +
+ +
+ +If an Opus frame contains more than one SILK frame, then for each channel that + has its LBRR flag set, a set of per-frame LBRR flags is decoded. +When there are two SILK frames present, the 2-frame LBRR flag PDF from + is used, and when there are three SILK frames + the 3-frame LBRR flag PDF is used. +For each channel, the resulting 2- or 3-bit integer contains the corresponding + LBRR flag for each frame, packed in order from the LSb to the MSb. + + +LBRR frames do not include their own separate VAD flags. +An LBRR frame is only meant to be transmitted for active speech, thus all LBRR + frames are treated as active. + +
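The LSb-to-MSb packing of the per-frame LBRR flags can be illustrated with a short C sketch. It takes the already-decoded 2- or 3-bit integer as input; unpack_lbrr_flags_sketch is a hypothetical name, and the range decoding step itself is not shown.

```c
/* Hypothetical helper: splits the packed per-frame LBRR integer into one
 * flag per SILK frame. Frame 0 is stored in the least significant bit. */
static void unpack_lbrr_flags_sketch(unsigned packed, int num_frames,
                                     int flags[3])
{
    int i;
    for (i = 0; i < num_frames && i < 3; i++) {
        flags[i] = (int)((packed >> i) & 1u);
    }
}
```

For example, a packed value of 5 (binary 101) with three SILK frames marks the first and third frames as having LBRR data.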
+ +
+ + +Each SILK frame or LBRR frame includes a set of side information... + +
+ +Each SILK frame or LBRR frame begins with a single + frame type symbol that jointly codes the signal + type and quantization offset type of the corresponding frame. +If the current frame is a normal SILK frame whose VAD bit was not set (an + inactive frame), then the frame type symbol takes + on the value either 0 or 1 and is decoded using the first PDF in + . +If the frame is an LBRR frame or a normal SILK frame whose VAD flag was set (an + active frame), then the symbol ranges from 2 to 5, + inclusive, and is decoded using the second PDF in + . + translates between the value of the + frame type symbol and the corresponding signal type and quantization offset + type. + + + +VAD Flag +PDF +Inactive {26, 230, 0, 0, 0, 0}/256 +Active or LBRR {0, 0, 24, 74, 148, 10}/256 + + + +Frame Type +Signal Type +Quantization Offset Type +0 Non-speech 0 +1 Non-speech 1 +2 Unvoiced 0 +3 Unvoiced 1 +4 Voiced 0 +5 Voiced 1 + + +
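The frame type table above has a regular structure that a small C sketch can capture. The numeric values given to the signal types (0 for non-speech, 1 for unvoiced, 2 for voiced) and the names used here are assumptions made for illustration only.

```c
/* Hypothetical numbering of the signal types; an assumption of this sketch. */
enum signal_type_sketch {
    SIGNAL_NON_SPEECH_SKETCH = 0,
    SIGNAL_UNVOICED_SKETCH   = 1,
    SIGNAL_VOICED_SKETCH     = 2
};

/* Splits a frame type symbol (0..5) into its two components, following
 * the table: the upper part selects the signal type and the lowest bit
 * is the quantization offset type. */
static void split_frame_type_sketch(int frame_type, int *signal_type,
                                    int *quant_offset_type)
{
    *signal_type = frame_type >> 1;
    *quant_offset_type = frame_type & 1;
}
```

With this numbering, frame type 3 yields an unvoiced signal type with quantization offset type 1, matching the corresponding table row.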
+ +
+ +A separate quantization gain is coded for each 5 ms subframe. +These gains control the step size between quantization levels of the excitation + signal and, therefore, the quality of the reconstruction. +They are independent of the pitch gains coded for voiced frames. +The quantization gains are themselves uniformly quantized to 6 bits on a + log scale, giving them a resolution of approximately 1.369 dB and a range + of approximately 1.94 dB to 88.21 dB. +For the first SILK frame, the first LBRR frame, or an LBRR frame where the + previous LBRR frame was not coded, an independent coding method is used for + the first subframe. +The 3 most significant bits of the quantization gain are decoded using a PDF + selected from based on the + decoded signal type. + + + +Signal Type +PDF +Non-speech {32, 112, 68, 29, 12, 1, 1, 1}/256 +Unvoiced {2, 17, 45, 60, 62, 47, 19, 4}/256 +Voiced {1, 3, 26, 71, 94, 50, 9, 2}/256 + + + +The 3 least significant bits are decoded using a uniform PDF: + + +PDF +{32, 32, 32, 32, 32, 32, 32, 32}/256 + + + +For all other subframes (including the first subframe of the frame when + not using independent coding), the quantization gain is coded relative to the + gain from the previous subframe. +The PDF in yields a delta gain index + between 0 and 40, inclusive. + + +PDF +{6, 5, 11, 31, 132, 21, 8, 4, + 3, 2, 2, 2, 1, 1, 1, 1, + 1, 1, 1, 1, 1, 1, 1, 1, + 1, 1, 1, 1, 1, 1, 1, 1, + 1, 1, 1, 1, 1, 1, 1, 1, 1}/256 + + +The following formula translates this index into a quantization gain for the + current subframe using the gain from the previous subframe: + +
+ +
+ +silk_gains_dequant() (silk_gain_quant.c) dequantizes the gain for the + kth subframe and converts it into a linear Q16 + scale factor via + +
+>16) + 2090) +]]> +
+ +The function silk_log2lin() (silk_log2lin.c) computes an approximation of + 2**(inLog_Q7/128.0), where inLog_Q7 is its Q7 input. +Let i = inLog_Q7>>7 be the integer part of inLog_Q7 and + f = inLog_Q7&127 be the fractional part. +Then, if i < 16, +
+>16)+f)>>7)*(1< +
+ yields the approximate exponential. +Otherwise, silk_log2lin uses +
+>16)+f)*((1<>7) . +]]> +
+
+
+ +
+ + +Normalized Line Spectral Frequencies (LSFs) follow the quantization gains in + the bitstream, and represent the Linear Prediction Coefficients (LPCs) for the + current SILK frame. +Once decoded, they form an increasing list of Q15 values between 0 and 1. +These represent the interleaved zeros on the unit circle between 0 and pi + (hence "normalized") in the standard decomposition of the LPC filter into a + symmetric part and an anti-symmetric part (P and Q in + ). +Because of non-linear effects in the decoding process, an implementation SHOULD + match the fixed-point arithmetic described in this section exactly. +The reference decoder uses fixed-point arithmetic for this even when running in + floating point mode, for this reason. +An encoder SHOULD also use the same process. + + +The normalized LSFs are coded using a two-stage vector quantizer (VQ). +NB and MB frames use an order-10 predictor, while WB frames use an order-16 + predictor, and thus have different sets of tables. +The first VQ stage uses a 32-element codebook, coded with one of the PDFs in + , depending on the audio bandwidth and + the signal type of the current SILK or LBRR frame. +This yields a single index, I1, for the entire + frame. +This indexes an element in a coarse codebook, selects the PDFs for the + second stage of the VQ, and selects the prediction weights used to remove + intra-frame redundancy from the second stage. +The actual codebook elements are listed in + and + , but they are not needed until the last + stages of reconstructing the LSF coefficients. 
+ + + +Audio Bandwidth +Signal Type +PDF +NB or MB Non-speech or unvoiced + +{44, 34, 30, 19, 21, 12, 11, 3, + 3, 2, 16, 2, 2, 1, 5, 2, + 1, 3, 3, 1, 1, 2, 2, 2, + 3, 1, 9, 9, 2, 7, 2, 1}/256 + +NB or MB Voiced + +{1, 10, 1, 8, 3, 8, 8, 14, +13, 14, 1, 14, 12, 13, 11, 11, +12, 11, 10, 10, 11, 8, 9, 8, + 7, 8, 1, 1, 6, 1, 6, 5}/256 + +WB Non-speech or unvoiced + +{31, 21, 3, 17, 1, 8, 17, 4, + 1, 18, 16, 4, 2, 3, 1, 10, + 1, 3, 16, 11, 16, 2, 2, 3, + 2, 11, 1, 4, 9, 8, 7, 3}/256 + +WB Voiced + +{1, 4, 16, 5, 18, 11, 5, 14, +15, 1, 3, 12, 13, 14, 14, 6, +14, 12, 2, 6, 1, 12, 12, 11, +10, 3, 10, 5, 1, 1, 1, 3}/256 + + + + +A total of 16 codebooks, each with a different PDF, are available for the LSF + residual in the second stage: the 8 (a...h) for NB and MB frames given in + , and the 8 (i...p) for WB frames + given in . +Which PDF is used for which coefficient is driven by the index, I1, + decoded in the first stage. + lists the letter of the + corresponding PDF for each normalized LSF coefficient for NB and MB, and + lists them for WB. 
+ + + +Codebook +PDF +a {1, 1, 1, 15, 224, 11, 1, 1, 1}/256 +b {1, 1, 2, 34, 183, 32, 1, 1, 1}/256 +c {1, 1, 4, 42, 149, 55, 2, 1, 1}/256 +d {1, 1, 8, 52, 123, 61, 8, 1, 1}/256 +e {1, 3, 16, 53, 101, 74, 6, 1, 1}/256 +f {1, 3, 17, 55, 90, 73, 15, 1, 1}/256 +g {1, 7, 24, 53, 74, 67, 26, 3, 1}/256 +h {1, 1, 18, 63, 78, 58, 30, 6, 1}/256 + + + +Codebook +PDF +i {1, 1, 1, 9, 232, 9, 1, 1, 1}/256 +j {1, 1, 2, 28, 186, 35, 1, 1, 1}/256 +k {1, 1, 3, 42, 152, 53, 2, 1, 1}/256 +l {1, 1, 10, 49, 126, 65, 2, 1, 1}/256 +m {1, 4, 19, 48, 100, 77, 5, 1, 1}/256 +n {1, 1, 14, 54, 100, 72, 12, 1, 1}/256 +o {1, 1, 15, 61, 87, 61, 25, 4, 1}/256 +p {1, 7, 21, 50, 77, 81, 17, 1, 1}/256 + + + +I1 +Coefficient + +0 1 2 3 4 5 6 7 8 9 + 0 +a a a a a a a a a a + 1 +b d b c c b c b b b + 2 +c b b b b b b b b b + 3 +b c c c c b c b b b + 4 +c d d d d c c c c c + 5 +a f d d c c c c b b + 6 +a c c c c c c c c b + 7 +c d g e e e f e f f + 8 +c e f f e f e g e e + 9 +c e e h e f e f f e +10 +e d d d c d c c c c +11 +b f f g e f e f f f +12 +c h e g f f f f f f +13 +c h f f f f f g f e +14 +d d f e e f e f e e +15 +c d d f f e e e e e +16 +c e e g e f e f f f +17 +c f e g f f f e f e +18 +c h e f e f e f f f +19 +c f e g h g f g f e +20 +d g h e g f f g e f +21 +c h g e e e f e f f +22 +e f f e g g f g f e +23 +c f f g f g e g e e +24 +e f f f d h e f f e +25 +c d e f f g e f f e +26 +c d c d d e c d d d +27 +b b c c c c c d c c +28 +e f f g g g f g e f +29 +d f f e e e e d d c +30 +c f d h f f e e f e +31 +e e f e f g f g f e + + + +I1 +Coefficient + +0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 + 0 +i  i  i  i  i  i  i  i  i  i  i  i  i  i  i  i + 1 +k  l  l  l  l  l  k  k  k  k  k  j  j  j  i  l + 2 +k  n  n  l  p  m  m  n  k  n  m  n  n  m  l  l + 3 +i  k  j  k  k  j  j  j  j  j  i  i  i  i  i  j + 4 +i  o  n  m  o  m  p  n  m  m  m  n  n  m  m  l + 5 +i  l  n  n  m  l  l  n  l  l  l  l  l  l  k  m + 6 +i  i  i  i  i  i  i  i  i  i  i  i  i  i  i  i + 7 +i  k  o  l  p  k  n  l  m  n  n  m  l 
 l  k  l + 8 +i  o  k  o  o  m  n  m  o  n  m  m  n  l  l  l + 9 +k  j  i  i  i  i  i  i  i  i  i  i  i  i  i  i +10 +i  j  i  i  i  i  i  i  i  i  i  i  i  i  i  j +11 +k  k  l  m  n  l  l  l  l  l  l  l  k  k  j  l +12 +k  k  l  l  m  l  l  l  l  l  l  l  l  k  j  l +13 +l  m  m  m  o  m  m  n  l  n  m  m  n  m  l  m +14 +i  o  m  n  m  p  n  k  o  n  p  m  m  l  n  l +15 +i  j  i  j  j  j  j  j  j  j  i  i  i  i  j  i +16 +j  o  n  p  n  m  n  l  m  n  m  m  m  l  l  m +17 +j  l  l  m  m  l  l  n  k  l  l  n  n  n  l  m +18 +k  l  l  k  k  k  l  k  j  k  j  k  j  j  j  m +19 +i  k  l  n  l  l  k  k  k  j  j  i  i  i  i  i +20 +l  m  l  n  l  l  k  k  j  j  j  j  j  k  k  m +21 +k  o  l  p  p  m  n  m  n  l  n  l  l  k  l  l +22 +k  l  n  o  o  l  n  l  m  m  l  l  l  l  k  m +23 +j  l  l  m  m  m  m  l  n  n  n  l  j  j  j  j +24 +k  n  l  o  o  m  p  m  m  n  l  m  m  l  l  l +25 +i  o  j  j  i  i  i  i  i  i  i  i  i  i  i  i +26 +i  o  o  l  n  k  n  n  l  m  m  p  p  m  m  m +27 +l  l  p  l  n  m  l  l  l  k  k  l  l  l  k  l +28 +i  i  j  i  i  i  k  j  k  j  j  k  k  k  j  j +29 +i  l  k  n  l  l  k  l  k  j  i  i  j  i  i  j +30 +l  n  n  m  p  n  l  l  k  l  k  k  j  i  j  i +31 +k  l  n  l  m  l  l  l  k  j  k  o  m  i  i  i + + + +Decoding the second stage residual proceeds as follows. +For each coefficient, the decoder reads a symbol using the PDF corresponding to + I1 from either or + , and subtracts 4 from the result + to give an index in the range -4 to 4, inclusive. +If the index is either -4 or 4, it reads a second symbol using the PDF in + , and adds the value of this second symbol + to the index, using the same sign. +This gives the index, I2[k], a total range of -10 to 10, inclusive. + + + +PDF +{156, 60, 24, 9, 4, 2, 1}/256 + + + +The decoded indices from both stages are translated back into normalized LSF + coefficients in silk_NLSF_decode() (silk_NLSF_decode.c). 
+The stage-2 indices represent residuals after both the first stage of the VQ + and a separate backwards-prediction step. +The backwards prediction process in the encoder subtracts a prediction from + each residual formed by a multiple of the coefficient that follows it. +The decoder must undo this process. + contains lists of prediction weights + for each coefficient. +There are two lists for NB and MB, and another two lists for WB, giving two + possible prediction weights for each coefficient. + + + +Coefficient +A +B +C +D + 0 179 116 175 68 + 1 138 67 148 62 + 2 140 82 160 66 + 3 148 59 176 60 + 4 151 92 178 72 + 5 149 72 173 117 + 6 153 100 174 85 + 7 151 89 164 90 + 8 163 92 177 118 + 9 174 136 +10 196 151 +11 182 142 +12 198 160 +13 192 142 +14 182 155 + + + +The prediction is undone using the procedure implemented in + silk_NLSF_residual_dequant() (silk_NLSF_decode.c), which is as follows. +Each coefficient selects its prediction weight from one of the two lists based + on the stage-1 index, I1. + gives the selections for each + coefficient for NB and MB, and gives + the selections for WB. +Let d_LPC be the order of the codebook, i.e., 10 for NB and MB, and 16 for WB, + and let pred_Q8[k] be the weight for the kth + coefficient selected by this process for + 0 <= k < d_LPC-1. +Then, the stage-2 residual for each coefficient is computed via +
+res_Q10[k] = (k+1 < d_LPC ? (res_Q10[k+1]*pred_Q8[k])>>8 : 0)
+             + ((((I2[k]<<10) + sign(I2[k])*102)*qstep)>>16) ,
+
+ where qstep is the Q16 quantization step size, which is 11796 for NB and MB + and 9830 for WB (representing step sizes of approximately 0.18 and 0.15, + respectively). +
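For reference, the way the stage-2 index I2[k] is formed from its one or two decoded symbols, as described earlier in this section, can be sketched in C. The sketch takes the raw symbol values as inputs and does not model the range decoder; stage2_index_sketch is a hypothetical name.

```c
/* Hypothetical helper: forms the stage-2 index from decoded symbols.
 * first_symbol is in 0..8. extension_symbol is in 0..6 and is only
 * meaningful (and only read from the bitstream) when first_symbol is
 * 0 or 8, i.e., when the initial index is -4 or 4. */
static int stage2_index_sketch(int first_symbol, int extension_symbol)
{
    int index = first_symbol - 4;
    if (index == 4) {
        index += extension_symbol;   /* extend in the positive direction */
    } else if (index == -4) {
        index -= extension_symbol;   /* extend in the negative direction */
    }
    return index;
}
```

The extreme inputs reproduce the stated total range of -10 to 10, inclusive.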
+ + +I1 +Coefficient + +0 1 2 3 4 5 6 7 8 + 0 +A B A A A A A A A + 1 +B A A A A A A A A + 2 +A A A A A A A A A + 3 +B B B A A A A B A + 4 +A B A A A A A A A + 5 +A B A A A A A A A + 6 +B A B B A A A B A + 7 +A B B A A B B A A + 8 +A A B B A B A B B + 9 +A A B B A A B B B +10 +A A A A A A A A A +11 +A B A B B B B B A +12 +A B A B B B B B A +13 +A B B B B B B B A +14 +B A B B A B B B B +15 +A B B B B B A B A +16 +A A B B A B A B A +17 +A A B B B A B B B +18 +A B B A A B B B A +19 +A A A B B B A B A +20 +A B B A A B A B A +21 +A B B A A A B B A +22 +A A A A A B B B B +23 +A A B B A A A B B +24 +A A A B A B B B B +25 +A B B B B B B B A +26 +A A A A A A A A A +27 +A A A A A A A A A +28 +A A B A B B A B A +29 +A A A B A A A A A +30 +A A A B B A B A B +31 +B A B B A B B B B + + + +I1 +Coefficient + +0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 + 0 +C  C  C  C  C  C  C  C  C  C  C  C  C  C  D + 1 +C  C  C  C  C  C  C  C  C  C  C  C  C  C  C + 2 +C  C  D  C  C  D  D  D  C  D  D  D  D  C  C + 3 +C  C  C  C  C  C  C  C  C  C  C  C  D  C  C + 4 +C  D  D  C  D  C  D  D  C  D  D  D  D  D  C + 5 +C  D  C  C  C  C  C  C  C  C  C  C  C  C  C + 6 +D  C  C  C  C  C  C  C  C  C  C  D  C  D  C + 7 +C  D  D  C  C  C  D  C  D  D  D  C  D  C  D + 8 +C  D  C  D  D  C  D  C  D  C  D  D  D  D  D + 9 +C  C  C  C  C  C  C  C  C  C  C  C  C  C  D +10 +C  D  C  C  C  C  C  C  C  C  C  C  C  C  C +11 +C  C  D  C  D  D  D  D  D  D  D  C  D  C  C +12 +C  C  D  C  C  D  C  D  C  D  C  C  D  C  C +13 +C  C  C  C  D  D  C  D  C  D  D  D  D  C  C +14 +C  D  C  C  C  D  D  C  D  D  D  C  D  D  D +15 +C  C  D  D  C  C  C  C  C  C  C  C  D  D  C +16 +C  D  D  C  D  C  D  D  D  D  D  C  D  C  C +17 +C  C  D  C  C  C  C  D  C  C  D  D  D  C  C +18 +C  C  C  C  C  C  C  C  C  C  C  C  C  C  D +19 +C  C  C  C  C  C  C  C  C  C  C  C  D  C  C +20 +C  C  C  C  C  C  C  C  C  C  C  C  C  C  C +21 +C  D  C  D  C  D  D  C  D  C  D  C  D  D  C +22 +C  C  D  D  D  D  C  D  D  C  C  D  D  C  C +23 +C  D  D  C  D  C  D 
 C  D  C  C  C  C  D  C +24 +C  C  C  D  D  C  D  C  D  D  D  D  D  D  D +25 +C  C  C  C  C  C  C  C  C  C  C  C  C  C  D +26 +C  D  D  C  C  C  D  D  C  C  D  D  D  D  D +27 +C  C  C  C  C  D  C  D  D  D  D  C  D  D  D +28 +C  C  C  C  C  C  C  C  C  C  C  C  C  C  D +29 +C  C  C  C  C  C  C  C  C  C  C  C  C  C  D +30 +D  C  C  C  C  C  C  C  C  C  C  D  C  C  C +31 +C  C  D  C  C  D  D  D  C  C  D  C  C  D  C + + + +The spectral distortion introduced by the quantization of each LSF coefficient + varies, so the stage-2 residual is weighted accordingly, using the + low-complexity weighting function proposed in . +The weights are derived directly from the stage-1 codebook vector. +Let cb1_Q8[k] be the kth entry of the stage-1 + codebook vector from or + . +Then for 0 <= k < d_LPC the following expression + computes the square of the weight as a Q18 value: +
+ + + +
+ where cb1_Q8[-1] = 0 and cb1_Q8[d_LPC] = 256, and the + division is exact integer division. +This is reduced to an unsquared, Q9 value using the following square-root + approximation: +
+>(i-8)) & 127 +y = ((i&1) ? 32768 : 46214) >> ((32-i)>>1) +w_Q9[k] = y + ((213*f*y)>>16) +]]> +
+The cb1_Q8[] vector completely determines these weights, and they may be + tabulated and stored as 13-bit unsigned values (with a range of 1819 to 5227) + to avoid computing them when decoding. +The reference implementation computes them on the fly in + silk_NLSF_VQ_weights_laroia() (silk_NLSF_VQ_weights_laroia.c) and its + caller, to reduce the amount of ROM required. +
+ + +I1 +Codebook + + 0   1   2   3   4   5   6   7   8   9 +0 +12  35  60  83 108 132 157 180 206 228 +1 +15  32  55  77 101 125 151 175 201 225 +2 +19  42  66  89 114 137 162 184 209 230 +3 +12  25  50  72  97 120 147 172 200 223 +4 +26  44  69  90 114 135 159 180 205 225 +5 +13  22  53  80 106 130 156 180 205 228 +6 +15  25  44  64  90 115 142 168 196 222 +7 +19  24  62  82 100 120 145 168 190 214 +8 +22  31  50  79 103 120 151 170 203 227 +9 +21  29  45  65 106 124 150 171 196 224 +10 +30  49  75  97 121 142 165 186 209 229 +11 +19  25  52  70  93 116 143 166 192 219 +12 +26  34  62  75  97 118 145 167 194 217 +13 +25  33  56  70  91 113 143 165 196 223 +14 +21  34  51  72  97 117 145 171 196 222 +15 +20  29  50  67  90 117 144 168 197 221 +16 +22  31  48  66  95 117 146 168 196 222 +17 +24  33  51  77 116 134 158 180 200 224 +18 +21  28  70  87 106 124 149 170 194 217 +19 +26  33  53  64  83 117 152 173 204 225 +20 +27  34  65  95 108 129 155 174 210 225 +21 +20  26  72  99 113 131 154 176 200 219 +22 +34  43  61  78  93 114 155 177 205 229 +23 +23  29  54  97 124 138 163 179 209 229 +24 +30  38  56  89 118 129 158 178 200 231 +25 +21  29  49  63  85 111 142 163 193 222 +26 +27  48  77 103 133 158 179 196 215 232 +27 +29  47  74  99 124 151 176 198 220 237 +28 +33  42  61  76  93 121 155 174 207 225 +29 +29  53  87 112 136 154 170 188 208 227 +30 +24  30  52  84 131 150 166 186 203 229 +31 +37  48  64  84 104 118 156 177 201 230 + + + +I1 +Codebook + + 0  1  2  3  4   5   6   7   8   9  10  11  12  13  14  15 +0 + 7 23 38 54 69  85 100 116 131 147 162 178 193 208 223 239 +1 +13 25 41 55 69  83  98 112 127 142 157 171 187 203 220 236 +2 +15 21 34 51 61  78  92 106 126 136 152 167 185 205 225 240 +3 +10 21 36 50 63  79  95 110 126 141 157 173 189 205 221 237 +4 +17 20 37 51 59  78  89 107 123 134 150 164 184 205 224 240 +5 +10 15 32 51 67  81  96 112 129 142 158 173 189 204 220 236 +6 + 8 21 37 51 65  79  98 113 126 138 155 168 179 192 209 218 +7 +12 15 34 55 63 
 78  87 108 118 131 148 167 185 203 219 236 +8 +16 19 32 36 56  79  91 108 118 136 154 171 186 204 220 237 +9 +11 28 43 58 74  89 105 120 135 150 165 180 196 211 226 241 +10 + 6 16 33 46 60  75  92 107 123 137 156 169 185 199 214 225 +11 +11 19 30 44 57  74  89 105 121 135 152 169 186 202 218 234 +12 +12 19 29 46 57  71  88 100 120 132 148 165 182 199 216 233 +13 +17 23 35 46 56  77  92 106 123 134 152 167 185 204 222 237 +14 +14 17 45 53 63  75  89 107 115 132 151 171 188 206 221 240 +15 + 9 16 29 40 56  71  88 103 119 137 154 171 189 205 222 237 +16 +16 19 36 48 57  76  87 105 118 132 150 167 185 202 218 236 +17 +12 17 29 54 71  81  94 104 126 136 149 164 182 201 221 237 +18 +15 28 47 62 79  97 115 129 142 155 168 180 194 208 223 238 +19 + 8 14 30 45 62  78  94 111 127 143 159 175 192 207 223 239 +20 +17 30 49 62 79  92 107 119 132 145 160 174 190 204 220 235 +21 +14 19 36 45 61  76  91 108 121 138 154 172 189 205 222 238 +22 +12 18 31 45 60  76  91 107 123 138 154 171 187 204 221 236 +23 +13 17 31 43 53  70  83 103 114 131 149 167 185 203 220 237 +24 +17 22 35 42 58  78  93 110 125 139 155 170 188 206 224 240 +25 + 8 15 34 50 67  83  99 115 131 146 162 178 193 209 224 239 +26 +13 16 41 66 73  86  95 111 128 137 150 163 183 206 225 241 +27 +17 25 37 52 63  75  92 102 119 132 144 160 175 191 212 231 +28 +19 31 49 65 83 100 117 133 147 161 174 187 200 213 227 242 +29 +18 31 52 68 88 103 117 126 138 149 163 177 192 207 223 239 +30 +16 29 47 61 76  90 106 119 133 147 161 176 193 209 224 240 +31 +15 21 35 50 61  73  86  97 110 119 129 141 175 198 218 237 + + + +Given the stage-1 codebook entry cb1_Q8[], the stage-2 residual res_Q10[], and + their corresponding weights, w_Q9[], the reconstructed normalized LSF + coefficients are +
+ +
+ where the division is exact integer division. +However, nothing thus far in the reconstruction process, nor in the + quantization process in the encoder, guarantees that the coefficients are + monotonically increasing and separated well enough to ensure a stable filter. +When using the reference encoder, roughly 2% of frames violate this constraint. +The next section describes a stabilization procedure used to make these + guarantees. +
+ +
+ +The normalized LSF stabilization procedure is implemented in + silk_NLSF_stabilize() (silk_NLSF_stabilize.c). +This process ensures that consecutive values of the normalized LSF + coefficients, NLSF_Q15[], are spaced some minimum distance apart + (predetermined to be the 0.01 percentile of a large training set). + gives the minimum spacings for NB and MB + and those for WB, where row k is the minimum allowed value of + NLSF_Q15[k]-NLSF_Q15[k-1]. +For the purposes of computing this spacing for the first and last coefficient, + NLSF_Q15[-1] is taken to be 0, and NLSF_Q15[d_LPC] is taken to be 32768. + + + +Coefficient +NB and MB +WB + 0 250 100 + 1 3 3 + 2 6 40 + 3 3 3 + 4 3 3 + 5 3 3 + 6 4 5 + 7 3 14 + 8 3 14 + 9 3 10 +10 461 11 +11 3 +12 8 +13 9 +14 7 +15 3 +16 347 + + + +The procedure starts off by trying to make small adjustments which attempt to + minimize the amount of distortion introduced. +After 20 such adjustments, it falls back to a more direct method which + guarantees the constraints are enforced but may require large adjustments. + + +Let NDeltaMin_Q15[k] be the minimum required spacing for the current audio + bandwidth from . +First, the procedure finds the index i where + NLSF_Q15[i] - NLSF_Q15[i-1] - NDeltaMin_Q15[i] is the + smallest, breaking ties by using the lower value of i. +If this value is non-negative, then the stabilization stops; the coefficients + satisfy all the constraints. +Otherwise, if i == 0, it sets NLSF_Q15[0] to NDeltaMin_Q15[0], and if + i == d_LPC, it sets NLSF_Q15[d_LPC-1] to + (32768 - NDeltaMin_Q15[d_LPC]). +For all other values of i, both NLSF_Q15[i-1] and NLSF_Q15[i] are updated as + follows: +
+                                          i-1
+                                          __
+ min_center_Q15 = (NDeltaMin_Q15[i]>>1) + \  NDeltaMin_Q15[k]
+                                          /_
+                                          k=0
+                                                 d_LPC
+                                                  __
+ max_center_Q15 = 32768 - (NDeltaMin_Q15[i]>>1) - \  NDeltaMin_Q15[k]
+                                                  /_
+                                                 k=i+1
+center_freq_Q15 = clamp(min_center_Q15[i],
+                        (NLSF_Q15[i-1] + NLSF_Q15[i] + 1)>>1,
+                        max_center_Q15[i])
+
+  NLSF_Q15[i-1] = center_freq_Q15 - (NDeltaMin_Q15[i]>>1)
+
+    NLSF_Q15[i] = NLSF_Q15[i-1] + NDeltaMin_Q15[i] .
+
+The procedure then repeats until it has executed 20 times or until
+ it stops because the coefficients satisfy all the constraints.
+
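One interior adjustment step of the stabilization (the case 0 < i < d_LPC) can be sketched in C as follows. This is an illustration under stated assumptions rather than the reference code: min_center_Q15 is taken to mirror max_center_Q15 (half of the spacing at i plus the sum of all spacings below i), clamp() is assumed to be max(lo,min(x,hi)), and the function names are hypothetical.

```c
/* Hypothetical helper: clamp with the lower bound taking precedence. */
static int clamp_center_sketch(int lo, int x, int hi)
{
    int upper_limited = (x < hi) ? x : hi;
    return (lo > upper_limited) ? lo : upper_limited;
}

/* Hypothetical sketch of one interior stabilization step.
 * nlsf_q15 has d_lpc entries; ndelta_min_q15 has d_lpc+1 entries.
 * Only valid for 0 < i < d_lpc. */
static void stabilize_pair_sketch(int *nlsf_q15, const int *ndelta_min_q15,
                                  int d_lpc, int i)
{
    int half = ndelta_min_q15[i] >> 1;
    int min_center = half;          /* assumed form, by symmetry */
    int max_center = 32768 - half;
    int center;
    int k;

    for (k = 0; k < i; k++) {
        min_center += ndelta_min_q15[k];
    }
    for (k = i + 1; k <= d_lpc; k++) {
        max_center -= ndelta_min_q15[k];
    }
    /* Rounded midpoint of the two offending coefficients, kept far enough
     * from both ends to leave room for all other minimum spacings. */
    center = clamp_center_sketch(min_center,
                                 (nlsf_q15[i - 1] + nlsf_q15[i] + 1) >> 1,
                                 max_center);
    nlsf_q15[i - 1] = center - half;
    nlsf_q15[i] = nlsf_q15[i - 1] + ndelta_min_q15[i];
}
```

A full implementation would wrap this step in the search for the worst index i, handle the i == 0 and i == d_LPC cases separately, and stop after 20 iterations as described above.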
+ +After the 20th repetition of the above, the following fallback procedure + executes once. +First, the values of NLSF_Q15[k] for 0 <= k < d_LPC + are sorted in ascending order. +Then for each value of k from 0 to d_LPC-1, NLSF_Q15[k] is set to +
+ +
+Next, for each value of k from d_LPC-1 down to 0, NLSF_Q15[k] is set to +
+ +
+
+ +
+ +
+ +For 20 ms SILK frames, the first half of the frame (i.e., the first two + sub-frames) may use normalized LSF coefficients that are interpolated between + the decoded LSFs for the previous frame and the current frame. +A Q2 interpolation factor follows the LSF coefficient indices in the bitstream, + which is decoded using the PDF in . +This happens in silk_decode_indices() (silk_decode_indices.c). +For the first frame after a decoder reset, when no prior LSF coefficients are + available, the decoder still decodes this factor, but ignores its value and + always uses 4 instead. +For 10 ms SILK frames, this factor is not stored at all. + + + +PDF +{13, 22, 29, 11, 181}/256 + + + +Let n2_Q15[k] be the normalized LSF coefficients decoded by the procedure in + , n0_Q15[k] be the LSF coefficients + decoded for the prior frame, and w_Q2 be the interpolation factor. +Then the normalized LSF coefficients used for the first half of a 20 ms + frame, n1_Q15[k], are +
+n1_Q15[k] = n0_Q15[k] + (w_Q2*(n2_Q15[k] - n0_Q15[k]) >> 2) .
+
+This interpolation is performed in silk_decode_parameters() + (silk_decode_parameters.c). +
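The interpolation can be sketched in C as shown below. The sketch assumes plain linear interpolation with the Q2 factor, n0 + (w_Q2*(n2 - n0) >> 2), which agrees with the statement that a factor of 4 selects the current frame's coefficients; interpolate_nlsf_sketch is a hypothetical name. It also assumes that right-shifting a negative int is an arithmetic shift, which the C standard leaves implementation-defined.

```c
/* Hypothetical sketch of the first-half LSF interpolation.
 * n0_q15: coefficients of the prior frame; n2_q15: coefficients of the
 * current frame; w_q2: interpolation factor in 0..4 (Q2).
 * Assumes >> on negative values is an arithmetic shift. */
static void interpolate_nlsf_sketch(const int *n0_q15, const int *n2_q15,
                                    int w_q2, int d_lpc, int *n1_q15)
{
    int k;
    for (k = 0; k < d_lpc; k++) {
        n1_q15[k] = n0_q15[k] + ((w_q2 * (n2_q15[k] - n0_q15[k])) >> 2);
    }
}
```

With w_q2 equal to 4 the output equals the current frame's coefficients, and with w_q2 equal to 0 it equals the prior frame's.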
+
+ +
+ +Any LPC filter A(z) can be split into a symmetric part P(z) and an + anti-symmetric part Q(z) such that +
+ +
+with +
+ +
+The even normalized LSF coefficients correspond to a pair of conjugate roots of + P(z), while the odd coefficients correspond to a pair of conjugate roots of + Q(z), all of which lie on the unit circle. +In addition, P(z) has a root at pi and Q(z) has a root at 0. +Thus, they may be reconstructed mathematically from a set of normalized LSF + coefficients, n[k], as +
+ +
+
+ +However, SILK performs this reconstruction using a fixed-point approximation + that can be reproduced in a bit-exact manner in all decoders to avoid + prediction drift. +The function silk_NLSF2A() (silk_NLSF2A.c) implements this procedure. + + +To start, it approximates cos(pi*n[k]) using a table lookup with linear + interpolation. +The encoder SHOULD use the inverse of this piecewise linear approximation, + rather than the true inverse of the cosine function, when deriving the + normalized LSF coefficients. + + +The top 7 bits of each normalized LSF coefficient index a value in the table, + and the next 8 bits interpolate between it and the next value. +Let i = n[k]>>8 be the integer index and + f = n[k]&255 be the fractional part of a given coefficient. +Then the approximated cosine, c_Q17[k], is +
+> 4 , +]]> +
+ where cos_Q13[i] is the corresponding entry of + . +
+ + + +0 +1 +2 +3 +0 + 8192 8190 8182 8170 +4 + 8152 8130 8104 8072 +8 + 8034 7994 7946 7896 +12 + 7840 7778 7714 7644 +16 + 7568 7490 7406 7318 +20 + 7226 7128 7026 6922 +24 + 6812 6698 6580 6458 +28 + 6332 6204 6070 5934 +32 + 5792 5648 5502 5352 +36 + 5198 5040 4880 4718 +40 + 4552 4382 4212 4038 +44 + 3862 3684 3502 3320 +48 + 3136 2948 2760 2570 +52 + 2378 2186 1990 1794 +56 + 1598 1400 1202 1002 +60 + 802 602 402 202 +64 + 0 -202 -402 -602 +68 + -802 -1002 -1202 -1400 +72 + -1598 -1794 -1990 -2186 +76 + -2378 -2570 -2760 -2948 +80 + -3136 -3320 -3502 -3684 +84 + -3862 -4038 -4212 -4382 +88 + -4552 -4718 -4880 -5040 +92 + -5198 -5352 -5502 -5648 +96 + -5792 -5934 -6070 -6204 +100 + -6332 -6458 -6580 -6698 +104 + -6812 -6922 -7026 -7128 +108 + -7226 -7318 -7406 -7490 +112 + -7568 -7644 -7714 -7778 +116 + -7840 -7896 -7946 -7994 +120 + -8034 -8072 -8104 -8130 +124 + -8152 -8170 -8182 -8190 +128 + -8192 + + + +Given the list of cosine values, silk_NLSF2A_find_poly() (silk_NLSF2A.c) + computes the coefficients of P and Q, described here via a simple recurrence. +Let p_Q16[k][j] and q_Q16[k][j] be the coefficients of the products of the + first (k+1) root pairs for P and Q, with j indexing the coefficient number. +Only the first (k+2) coefficients are needed, as the products are symmetric. +Let p_Q16[0][0] = q_Q16[0][0] = 1<<16, + p_Q16[0][1] = -c_Q17[0], q_Q16[0][1] = -c_Q17[1], and + d2 = d_LPC/2. +As boundary conditions, assume + p_Q16[k][j] = q_Q16[k][j] = 0 for all + j < 0. +Also, assume p_Q16[k][k+2] = p_Q16[k][k] and + q_Q16[k][k+2] = q_Q16[k][k] (because of the symmetry). +Then, for 0 < k < d2 and 0 <= j <= k+1, +
+>16) , + +q_Q16[k][j] = q_Q16[k-1][j] + q_Q16[k-1][j-2] + - ((c_Q17[2*k+1]*q_Q16[k-1][j-1] + 32768)>>16) . +]]> +
+The use of Q17 values for the cosine terms in an otherwise Q16 expression + implicitly scales them by a factor of 2. +The multiplications in this recurrence may require up to 48 bits of precision + in the result to avoid overflow. +In practice, each row of the recurrence only depends on the previous row, so an + implementation does not need to store all of them. +
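The recurrence above can be sketched in C with only two rows of storage. This is an illustrative sketch rather than the reference silk_NLSF2A_find_poly(): the name find_poly_sketch(), the MAX_D2 bound, and the offset argument (0 selects the even cosines for P, 1 selects the odd cosines for Q) are all hypothetical, and the P recurrence is assumed to mirror the Q recurrence with c_Q17[2*k] in place of c_Q17[2*k+1].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_D2 8  /* assumed bound: d_LPC/2 for a 16th-order filter */

/* Hypothetical helper. Writes coefficients j = 0..d2 of the last row
 * (k = d2-1) into out[]. Assumes >> on a negative 64-bit value is an
 * arithmetic shift, which C leaves implementation-defined. */
static void find_poly_sketch(int32_t out[MAX_D2 + 1], const int32_t *c_Q17,
                             int offset, int d2)
{
    int32_t prev[MAX_D2 + 2] = {0};
    int32_t cur[MAX_D2 + 2] = {0};
    int k, j;
    assert(d2 >= 1 && d2 <= MAX_D2);
    assert(offset == 0 || offset == 1);
    prev[0] = (int32_t)1 << 16;      /* row 0, coefficient 0 */
    prev[1] = -c_Q17[offset];        /* row 0, coefficient 1 */
    for (k = 1; k < d2; k++) {
        /* Symmetry boundary: row k-1 has [k+1] equal to [k-1]. */
        prev[k + 1] = prev[k - 1];
        for (j = 0; j <= k + 1; j++) {
            int32_t pm1 = j >= 1 ? prev[j - 1] : 0;  /* zero for j < 0 */
            int32_t pm2 = j >= 2 ? prev[j - 2] : 0;
            /* The product may need up to 48 bits, so use 64. */
            int64_t prod = (int64_t)c_Q17[2 * k + offset] * pm1;
            cur[j] = prev[j] + pm2 - (int32_t)((prod + 32768) >> 16);
        }
        memcpy(prev, cur, (size_t)(k + 2) * sizeof(prev[0]));
    }
    memcpy(out, prev, (size_t)(d2 + 1) * sizeof(out[0]));
}
```

As a sanity check, cosines of 0.5 (c_Q17 = 65536) for both P root pairs give the Q16 coefficients of (1 - z**-1 + z**-2)**2, namely 1, -2, 3 for j = 0, 1, 2.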
+ +silk_NLSF2A() uses the values from the last row of this recurrence to + reconstruct a 32-bit version of the LPC filter (without the leading 1.0 + coefficient), a32_Q17[k], 0 <= k < d2: +
+ +
+The sum and difference of two terms from each of the p_Q16 and q_Q16 + coefficient lists reflect the (z**-1 + 1) and (z**-1 - 1) + factors of P and Q, respectively. +The promotion of the expression from Q16 to Q17 implicitly scales the result + by 1/2. +
+
+ +
+ +The a32_Q17[] coefficients are too large to fit in a 16-bit value, which + significantly increases the cost of applying this filter in fixed-point + decoders. +Reducing them to Q12 precision doesn't incur any significant quality loss, + but still does not guarantee they will fit. +silk_NLSF2A() applies up to 10 rounds of bandwidth expansion to limit + the dynamic range of these coefficients. +Even floating-point decoders SHOULD perform these steps, to avoid mismatch. + + +For each round, the process first finds the index k such that abs(a32_Q17[k]) + is the largest, breaking ties by using the lower value of k. +Then, it computes the corresponding Q12 precision value, maxabs_Q12, subject to + an upper bound to avoid overflow when computing the chirp factor: +
+> 5, 163838) . +]]> +
+If this is larger than 32767, the procedure derives the chirp factor, + sc_Q16[0], to use in the bandwidth expansion as +
+> 2 +]]> +
+ where the division here is exact integer division. +This is an approximation of the chirp factor needed to reduce the target + coefficient to 32767, though it is both less than 0.999 and, for + k > 0 when maxabs_Q12 is much greater than 32767, still slightly + too large. +
+ +silk_bwexpander_32() (silk_bwexpander_32.c) performs the bandwidth expansion + (again, only when maxabs_Q12 is greater than 32767) using the following + recurrence: +
+> 16 + +sc_Q16[k+1] = (sc_Q16[0]*sc_Q16[k] + 32768) >> 16 +]]> +
+The first multiply may require up to 48 bits of precision in the result to + avoid overflow. +The second multiply must be unsigned to avoid overflow with only 32 bits of + precision. +The reference implementation uses a slightly more complex formulation that + avoids the 32-bit overflow using signed multiplication, but is otherwise + equivalent. +
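A minimal C sketch of this bandwidth expansion follows. It is not the reference silk_bwexpander_32(), which uses a different but equivalent formulation; the name bwexpand_sketch() is hypothetical, and the coefficient update is assumed to be an unrounded (a32_Q17[k]*sc_Q16[k]) >> 16, since only the chirp update is fully legible in the recurrence above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper. Scales a32_Q17[k] by successive powers of the
 * chirp factor sc0_Q16. Assumes >> on a negative 64-bit value is an
 * arithmetic shift, which C leaves implementation-defined. */
static void bwexpand_sketch(int32_t a32_Q17[], int d_LPC, uint32_t sc0_Q16)
{
    uint32_t sc_Q16 = sc0_Q16;
    int k;
    assert(d_LPC >= 1);
    assert(sc0_Q16 < 65536u);  /* chirp factor strictly below 1.0 */
    for (k = 0; k < d_LPC; k++) {
        /* First multiply: signed, may need up to 48 bits, so use 64. */
        a32_Q17[k] = (int32_t)(((int64_t)a32_Q17[k] * (int64_t)sc_Q16) >> 16);
        /* Second multiply: unsigned, fits in 32 bits because both
         * factors are below 65536. */
        sc_Q16 = (sc0_Q16 * sc_Q16 + 32768u) >> 16;
    }
}
```

For example, a chirp factor of 0.5 (sc0_Q16 = 32768) scales successive coefficients by 1/2, 1/4, 1/8, and so on.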
+ +After 10 rounds of bandwidth expansion are performed, the coefficients are + simply saturated to 16 bits: +
+> 5, 32767) << 5 . +]]> +
+Because this performs the actual saturation in the Q12 domain, but converts the + coefficients back to the Q17 domain for the purposes of prediction gain + limiting, this step must be performed after the 10th round of bandwidth + expansion, regardless of whether or not the Q12 version of any of the + coefficients still overflow a 16-bit integer. +This saturation is not performed if maxabs_Q12 drops to 32767 or less prior to + the 10th round. +
+
+ +
+ +Even if the Q12 coefficients would fit, the resulting filter may still have a + significant gain (especially for voiced sounds), making the filter unstable. +silk_NLSF2A() applies up to 18 additional rounds of bandwidth expansion to + limit the prediction gain. +Instead of controlling the amount of bandwidth expansion using the prediction + gain itself (which may diverge to infinity for an unstable filter), + silk_NLSF2A() uses LPC_inverse_pred_gain_QA() (silk_LPC_inv_pred_gain.c) + to compute the reflection coefficients associated with the filter. +The filter is stable if and only if the magnitude of these coefficients is + sufficiently less than one. +The reflection coefficients can be computed using a simple Levinson recurrence, + initialized with the LPC coefficients a[d_LPC-1][n] = a[n], and then + updated via +
+ +
+
+ +However, LPC_inverse_pred_gain_QA() approximates this using fixed-point + arithmetic to guarantee reproducible results across platforms and + implementations. +It is important to run on the real Q12 coefficients that will be used during + reconstruction, because small changes in the coefficients can make a stable + filter unstable, but increasing the precision back to Q16 allows more accurate + computation of the reflection coefficients. +Thus, let +
+> 5) << 4 +]]> +
+ be the Q16 representation of the Q12 version of the LPC coefficients that will + eventually be used. +Then for each k from d_LPC-1 down to 0, if + abs(a32_Q16[k][k]) > 65520, the filter is unstable and the + recurrence stops. +Otherwise, the row k-1 of a32_Q16 is computed from row k as +
+> 32) , + + b1[k] = ilog(div_Q30[k]) - 16 , + + (1<<29) - 1 + inv_Qb1[k] = ----------------------- , + div_Q30[k] >> (b1[k]+1) + + err_Q29[k] = (1<<29) + - ((div_Q30[k]<<(15-b1[k]))*inv_Qb1[k] >> 16) , + + mul_Q16[k] = ((inv_Qb1[k] << 16) + + (err_Q29[k]*inv_Qb1[k] >> 13)) >> b1[k] , + + b2[k] = ilog(mul_Q16[k]) - 15 , + + t_Q16[k-1][n] = a32_Q16[k][n] + - ((a32_Q16[k][k-n-1]*rc_Q31[k] >> 32) << 1) , + +a32_Q16[k-1][n] = ((t_Q16[k-1][n] * + (mul_Q16[k] << (16-b2[k]))) >> 32) << b2[k] . +]]> +
+Here, rc_Q31[k] are the reflection coefficients. +div_Q30[k] is the denominator for each iteration, and mul_Q16[k] is its + multiplicative inverse. +inv_Qb1[k], which ranges from 16384 to 32767, is a low-precision version of + that inverse (with b1[k] fractional bits, where b1[k] ranges from 3 to 14). +err_Q29[k] is the residual error, ranging from -32392 to 32763, which is used + to improve the accuracy. +t_Q16[k-1][n], 0 <= n < k, are the numerators for the + next row of coefficients in the recursion, and a32_Q16[k-1][n] is the final + version of that row. +Every multiply in this procedure except the one used to compute mul_Q16[k] + requires more than 32 bits of precision, but otherwise all intermediate + results fit in 32 bits or less. +In practice, because each row only depends on the next one, an implementation + does not need to store them all. +If abs(a32_Q16[k][k]) <= 65520 for + 0 <= k < d_LPC, then the filter is considered stable. +
+ +On round i, 1 <= i <= 18, if the filter passes this + stability check, then this procedure stops, and +
+a_Q12[k] = (a32_Q17[k] + 16) >> 5 +]]> +
+are the final LPC coefficients to use for + reconstruction. +Otherwise, a round of bandwidth expansion is applied using the same procedure + as in , with +
+ +
+If, after the 18th round, the filter still fails the stability check, then + a_Q12[k] is set to 0 for all k. +
+
+ +
+ +
+ +After the normalized LSF indices and, for 20 ms frames, the LSF + interpolation index, voiced frames (see ) + include additional Long-Term Prediction (LTP) parameters. + + +
+ +
+ +
+ +The Low Bit-Rate Redundancy (LBRR) information, if present, immediately follows + the header bits. +Each frame whose LBRR flag was set includes a separate set of data for each + channel. + +
+ + + +
@@ -1115,26 +2625,26 @@ Insert decoder figure. Symbol(s) PDF Condition -silence [32767, 1]/32768 -post-filter [1, 1]/2 +silence {32767, 1}/32768 +post-filter {1, 1}/2 octave uniform (6)post-filter period raw bits (4+octave)post-filter gain raw bits (3)post-filter -tapset [2, 1, 1]/4post-filter -transient [7, 1]/8 -intra [7, 1]/8 +tapset {2, 1, 1}/4post-filter +transient {7, 1}/8 +intra {7, 1}/8 coarse energy tf_change -tf_select [1, 1]/2 -spread [7, 2, 21, 2]/32 +tf_select {1, 1}/2 +spread {7, 2, 21, 2}/32 dyn. alloc. -alloc. trim [2, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2]/128 -skip [1, 1]/2 +alloc. trim {2, 2, 5, 10, 22, 46, 22, 10, 5, 2, 2}/128 +skip {1, 1}/2 intensity uniform -dual [1, 1]/2 +dual {1, 1}/2 fine energy residual -anti-collapse[1, 1]/2 +anti-collapse{1, 1}/2 finalize Order of the symbols in the CELT section of the bit-stream. @@ -1561,7 +3071,7 @@ within the octave is decoded using 4+octave raw bits. The final pitch period is equal to (16<<octave)+fine_pitch-1 so it is bounded between 15 and 1022, inclusively. Next, the gain is decoded as three raw bits and is equal to G=3*(int_gain+1)/32. The set of post-filter taps is decoded last using -a pdf equal to [2, 1, 1]/4. Tapset zero corresponds to the filter coefficients +a pdf equal to {2, 1, 1}/4. Tapset zero corresponds to the filter coefficients g0 = 0.3066406250, g1 = 0.2170410156, g2 = 0.1296386719. Tapset one corresponds to the filter coefficients g0 = 0.4638671875, g1 = 0.2680664062, g2 = 0, and tapset two uses filter coefficients g0 = 0.7998046875, @@ -2119,7 +3629,7 @@ c_tilt = 0.04 + 0.06 * C
- For a frame of voiced speech the pitch pulses will remain dominant in the pre-whitened input signal. Further whitening is desirable as it leads to higher quality at the same available bitrate. To achieve this, a Long-Term Prediction (LTP) analysis is carried out to estimate the coefficients of a fifth order LTP filter for each of four sub-frames. The LTP coefficients are used to find an LTP residual signal with the simulated output signal as input to obtain better modelling of the output signal. This LTP residual signal is the input to an LPC analysis where the LPCs are estimated using Burgs method, such that the residual energy is minimized. The estimated LPCs are converted to a Line Spectral Frequency (LSF) vector, and quantized as described in . After quantization, the quantized LSF vector is converted to LPC coefficients and hence by using these quantized coefficients the encoder remains fully synchronized with the decoder. The LTP coefficients are quantized using a method described in . The quantized LPC and LTP coefficients are now used to filter the high-pass filtered input signal and measure a residual energy for each of the four subframes. + For a frame of voiced speech the pitch pulses will remain dominant in the pre-whitened input signal. Further whitening is desirable as it leads to higher quality at the same available bitrate. To achieve this, a Long-Term Prediction (LTP) analysis is carried out to estimate the coefficients of a fifth order LTP filter for each of four subframes. The LTP coefficients are used to find an LTP residual signal with the simulated output signal as input to obtain better modelling of the output signal. This LTP residual signal is the input to an LPC analysis where the LPCs are estimated using Burgs method, such that the residual energy is minimized. The estimated LPCs are converted to a Line Spectral Frequency (LSF) vector, and quantized as described in . 
After quantization, the quantized LSF vector is converted to LPC coefficients and hence by using these quantized coefficients the encoder remains fully synchronized with the decoder. The LTP coefficients are quantized using a method described in . The quantized LPC and LTP coefficients are now used to filter the high-pass filtered input signal and measure a residual energy for each of the four subframes.