This makes all remaining large stack allocations use the vararray
macros.
This continues the work of 6f2d9f50 to allow compiling with
NONTHREADSAFE_PSEUDOSTACK to move the memory for large buffers
off the stack for devices where it is very limited.
It also does this for some additional large buffers used by the
PLC in the decoder.
1) The memcpy's were using sizeof(opus_int32), but the type of the
local buffer was opus_int16.
2) Because the size was wrong, this potentially allowed the source
and destination regions of the memcpy overlap.
I _believe_ that nSamplesIn is at least fs_in_khZ, which is at
least 8.
Since RESAMPLER_ORDER_FIR_12 is only 8, I don't think that's a
problem once you fix the type size.
3) The size of the buffer used RESAMPLER_MAX_BATCH_SIZE_IN, but the
data stored in it was actually _twice_ the input batch size
(nSamplesIn<<1).
Because this never blew up in testing, I suspect that in practice
the batch sizes are reasonable enough that none of these things
was ever a problem, but proving that seems non-obvious.
This patch just converts the whole thing to use CELT's vararrays.
This fixes the buffer size problems (since we allocate a buffer
with the actual size we use) and gets these large buffers off the
stack on devices using the pseudo-stack.
It also fixes the memcpy problems by changing the sizeof to
opus_int16.
It turns out sFIR, which saved state between calls, was being used
elsewhere as opus_int32, so this converts it to a union to make
this sharing explicit.
decoder:
- fixed incorrect scaling of filter states for the smallest quantization
step sizes
- NLSF2A now limits the prediction gain of LPC filters
encoder:
- increased damping of LTP coefficients in LTP analysis
- increased white noise fraction in noise shaping LPC analysis
- introduced maximum total prediction gain. Used by Burg's method to
exit early if prediction gain is exceeded. This improves packet
loss robustness and numerical robustness in Burg's method
- Prefiltered signal is now in int32 Q10 domain, from int16 Q0
- Increased max number of iterations in CBR gain control loop from 5 to 6
- Removed useless code from LTP scaling control
- Optimization: smarter LPC loop unrolling
- Switched default win32 compile mode to be floating-point
resampler:
- made resampler have constant delay of 0.75 ms; removed delay
compensation from silk code.
- removed obsolete table entries (~850 Bytes)
- increased downsampling filter order from 16 to 18/24/36 (depending on
frequency ratio)
- reoptimized filter coefficients