Also using the same int->float conversion functions for SILK as for CELT
and changed encoder implementation default to constrained VBR just to
be safe when VBR gets more aggressive.
decoder:
- fixed incorrect scaling of filter states for the smallest quantization
step sizes
- NLSF2A now limits the prediction gain of LPC filters
encoder:
- increased damping of LTP coefficients in LTP analysis
- increased white noise fraction in noise shaping LPC analysis
- introduced maximum total prediction gain. Used by Burg's method to
exit early if prediction gain is exceeded. This improves packet
loss robustness and numerical robustness in Burg's method
- Prefiltered signal is now in int32 Q10 domain, from int16 Q0
- Increased max number of iterations in CBR gain control loop from 5 to 6
- Removed useless code from LTP scaling control
- Optimization: smarter LPC loop unrolling
- Switched default win32 compile mode to be floating-point
resampler:
- made resampler have constant delay of 0.75 ms; removed delay
compensation from silk code.
- removed obsolete table entries (~850 Bytes)
- increased downsampling filter order from 16 to 18/24/36 (depending on
frequency ratio)
- reoptimized filter coefficients
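
The maximum total prediction gain mentioned in the encoder notes can be illustrated with a minimal sketch. It relies on the standard relation that the prediction gain of an LPC filter equals 1 / prod(1 - k[i]^2) over its reflection coefficients k[i]. The function name order_within_gain and the use of double precision are assumptions made for illustration; this is not the SILK Burg implementation, which works in fixed point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the SILK code): returns how many reflection
 * coefficients can be used before the cumulative prediction gain
 * 1 / prod(1 - k[i]^2) would exceed max_gain. */
static size_t order_within_gain(const double *k, size_t order, double max_gain)
{
    double inv_gain = 1.0;  /* residual energy relative to input energy */
    size_t i;

    assert(max_gain >= 1.0);
    for (i = 0; i < order; i++) {
        inv_gain *= 1.0 - k[i] * k[i];
        /* gain > max_gain  <=>  inv_gain * max_gain < 1 */
        if (inv_gain * max_gain < 1.0) {
            break;  /* exit early: this stage would exceed the cap */
        }
    }
    return i;
}
```

Stopping at the stage that would exceed the cap keeps the filter from becoming extremely resonant, which is the situation where packet loss and limited numerical precision hurt most.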
C reserves identifiers of the form _[A-Z]+ and we have a number of
those in the code. This patch renames the various function arguments,
MACROS and preprocessor symbols to avoid the reserved form.
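
As a rough illustration of the rename, with made-up names (MAX_FRAME, clamp_len and N are not taken from the patch):

```c
#include <assert.h>

/* Before (reserved form: underscore followed by an uppercase letter):
 *     #define _MAX_FRAME 960
 *     static int clamp_len(int _N) { ... }
 * After: the same code with non-reserved spellings.
 */
#define MAX_FRAME 960

static int clamp_len(int N)
{
    assert(N >= 0);
    return N > MAX_FRAME ? MAX_FRAME : N;
}
```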
It also removes the CHANNELS() macro altogether. This was a
minor optimization for TI DSP to force a mono-only build,
as were the associated local 'const' versions. Since stereo
support is mandatory, it wasn't worth keeping.
Thanks to John Ridges for raising the issue, and Jean-Marc Valin
and Greg Maxwell for reviewing the changes.
This is achieved by running the encoding process in a loop and
padding when we don't reach the exact rate. It also implements
VBR-with-cap, which means we no longer need to artificially decrease
the SILK bandwidth when it's close to the cap.
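
The loop-and-pad idea can be sketched as follows. Everything here is hypothetical: encode_pass_fn stands in for one encoding pass, toy_pass is a fake pass whose size grows with a "quality" knob, and zero bytes stand in for whatever padding the real bitstream uses.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for one encoding pass. Returns the number of
 * bytes the frame needs at this quality and writes at most buf_size. */
typedef int (*encode_pass_fn)(int quality, unsigned char *buf, int buf_size);

/* Toy pass for demonstration: needs 10 bytes per quality step. */
static int toy_pass(int quality, unsigned char *buf, int buf_size)
{
    int needed = quality * 10;
    memset(buf, 0xAA, (size_t)(needed < buf_size ? needed : buf_size));
    return needed;
}

/* Re-encode at decreasing quality until the frame fits, then pad up to
 * exactly target_bytes. Returns target_bytes, or -1 if no pass fit. */
static int encode_cbr(encode_pass_fn pass, int max_iter, int quality,
                      unsigned char *out, int target_bytes)
{
    int used = 0, iter;

    assert(target_bytes > 0 && max_iter > 0);
    for (iter = 0; iter < max_iter; iter++) {
        used = pass(quality, out, target_bytes);
        if (used <= target_bytes) {
            break;
        }
        quality--;  /* too big: try again with a coarser setting */
    }
    if (used > target_bytes) {
        return -1;
    }
    memset(out + used, 0, (size_t)(target_bytes - used));  /* padding */
    return target_bytes;
}
```

With a 64-byte target and a starting quality of 8, the toy pass needs 80, then 70, then 60 bytes, so the third pass is kept and 4 bytes of padding are appended.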
b24e5746 introduced changes to LastGainIndex which broke
conditional coding for side frames after a mid-only frame (i.e.,
in a 60 ms frame where the side is coded, not coded, then coded
again).
These rules were a mess in general, however, because the side
channel state kept a different nFramesDecoded count from the mid
channel state, and had no way to tell if the prior side frame was
coded.
This patch attempts to rationalize them by moving the conditional
coding decision up to the top level, where all this information is
available.
The first coded side frame after an uncoded side frame now always
uses independent coding.
If such a frame is also not the first side frame in an Opus frame,
then it doesn't include an LTP scaling parameter (because the LTP
state is well-defined).
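
One way to read the rules above as a top-level decision is sketched below. The enum and function names are illustrative rather than copied from the patch, and the sketch assumes that "first side frame in an Opus frame" means frame index 0.

```c
#include <assert.h>

/* Illustrative coding modes for a side-channel frame. */
typedef enum {
    SKETCH_CODE_INDEPENDENTLY,                 /* includes LTP scaling */
    SKETCH_CODE_INDEPENDENTLY_NO_LTP_SCALING,  /* independent, no LTP scaling */
    SKETCH_CODE_CONDITIONALLY                  /* depends on the previous frame */
} sketch_coding_mode;

/* frame_idx: index of this frame within the Opus frame (0, 1 or 2 for 60 ms).
 * prev_side_coded: nonzero if the previous side frame was coded. */
static sketch_coding_mode side_coding_mode(int frame_idx, int prev_side_coded)
{
    assert(frame_idx >= 0);
    if (frame_idx == 0) {
        return SKETCH_CODE_INDEPENDENTLY;
    }
    if (!prev_side_coded) {
        /* First coded side frame after an uncoded one: independent,
         * but without an LTP scaling parameter. */
        return SKETCH_CODE_INDEPENDENTLY_NO_LTP_SCALING;
    }
    return SKETCH_CODE_CONDITIONALLY;
}
```

For the coded, not coded, coded case described earlier, the third frame would be called with frame_idx = 2 and prev_side_coded = 0, and so is coded independently without LTP scaling.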
- There was a bug where the decoder resampler was not properly initialized
when fs_kHz == API_fs_kHz. In that case the resampler would continue to
upsample, and the output was corrupt.
- The delay value in the decoder was taken from the state before it was
potentially updated. This caused the decoder to apply the new delay value one
frame late.
- The encoder and decoder states are now updated more consistently when
the sampling rate changes (pesq liked these changes)
- Properly resetting the side channel encoder and decoder for the first
frame with side coding active again
- Faster updating of the "ratio" value in the LR_to_MS() code for large
prediction values means that for certain extreme/artificial input
signals the output looks better
- increases the max pitch lag by 1 (the thing Tim pointed out). This brings the decoder in sync with the old one
- prevents the first stereo frame from being collapsed to mono
Simplifies mono/stereo switching in SILK
Fixes a quantization mismatch between encoder and decoder
Constrains the pitch lags in the same way in the encoder and decoder
The API permits the caller to freely copy the codec state on their
own, but this can't work if there are any position-dependent pointers
in the codec state.
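
A minimal sketch of the idea, using a toy state rather than the real codec structures: the state stores an offset into its own buffer instead of a pointer, so a plain memcpy() yields a copy that refers to its own storage. All names here (toy_state, toy_history, toy_copy) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOY_LEN 8

typedef struct {
    int   history_offset;    /* index into samples[], not a float* */
    float samples[TOY_LEN];
} toy_state;

/* Resolve the offset against whichever copy of the state is passed in. */
static float *toy_history(toy_state *st)
{
    assert(st->history_offset >= 0 && st->history_offset < TOY_LEN);
    return st->samples + st->history_offset;
}

/* Copying is a flat byte copy; nothing needs to be fixed up afterwards. */
static void toy_copy(toy_state *dst, const toy_state *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```

Had the struct held a float * pointing into samples[], the copy's pointer would still refer to the original's buffer.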
On MacOS, stdlib.h ends up including sys/signal.h, generating
warnings about the local variables called 'signal' shadowing
the global symbol signal(3). Tested with XCode 4.1 on
MacOS X 10.7.0.
The signal buffers passed in are generally frames being processed,
and the code already uses the term frame and frame_length elsewhere,
so I've resolved the warning by renaming signal and signal_* locals
and parameters to frame and frame_*.
This is a tentative fix for a bug found in fuzzing where the encoder
switched from mono to stereo while in the process of changing bandwidth.
The result was that the newly added side would use the new sampling
rate, while the mid hadn't switched yet, causing an encoder/decoder
mismatch. The fix is that the side rate selection gets overridden
to use the mid rate.
The bug would occur when compiling with fuzzing enabled and using:
./test_opus 0 48000 2 24000 input.sw output.sw
- Merged the LPC stabilization from NLSF2A_stable.c into NLSF2A.c
- The bandwidth expansion in NLSF2A() now operates on int32 LPC coefficients in
Q17 domain (instead of int16 Q12 coefficients)
- The function bwexpander_32() has a more precise way of updating the chirp
variable (round to nearest, instead of round down)
- Changed a few variables in NLSF_stabilize() from int16 to int32 to avoid signed
wrap-around (no difference in results as the wrap-around would always be reversed
later)
- The LSF codebook for WB speech has a quantization stepsize of 0.15 (was 0.16).
This doesn't break the bitstream, although it slightly limits quality of signals
encoded with the old version and decoded with the new one (I can't really hear it
and PESQ gives high scores as well). It does improve handling of tonal signals.
- As discussed: the Q-domain of the poly function is now in Q16 (was Q20)
- As discussed: limiting the LSFs in NLSF_decode() to 0...32767
- The silk_NLSF_DELTA_MIN values were lowered to deal with a possible future situation with less or no input HP filtering.
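
The round-to-nearest chirp update mentioned for bwexpander_32() can be sketched roughly as below. This is an approximation written for illustration, not the library source: the names bwexpand_sketch and floor_q16 are made up, and 64-bit intermediates are used in place of the fixed-point macros. Coefficient i is scaled by chirp^(i+1), with the chirp factor held in Q16.

```c
#include <assert.h>
#include <stdint.h>

/* floor(a / 2^16), written so it does not depend on how the compiler
 * right-shifts negative values. */
static int64_t floor_q16(int64_t a)
{
    int64_t q = a / 65536;
    if (a % 65536 < 0) {
        q -= 1;
    }
    return q;
}

/* Scale ar[i] by chirp^(i+1); chirp_Q16 is the factor in Q16 (65536 = 1.0). */
static void bwexpand_sketch(int32_t *ar, int d, int32_t chirp_Q16)
{
    int i;
    int64_t chirp = chirp_Q16;
    const int64_t chirp_minus_one = chirp - 65536;

    assert(d > 0 && chirp_Q16 > 0 && chirp_Q16 <= 65536);
    for (i = 0; i < d; i++) {
        ar[i] = (int32_t)floor_q16(chirp * ar[i]);
        /* chirp *= chirp_Q16, rounded to nearest: adding 2^15 before
         * the floor turns round-down into round-to-nearest. */
        chirp += floor_q16(chirp * chirp_minus_one + 32768);
    }
}
```

With a chirp of 0.5 (32768 in Q16), three coefficients equal to 65536 become 32768, 16384 and 8192.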