Compare the output of xcorr_kernel() against the results of
xcorr_kernel_c() when configured with --enable-check-asm.
Currently this is only checked in fixed point, as a float check
requires more sophisticated error analysis and may need to be
customized for each vector implementation.
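
A minimal sketch of the kind of comparison this enables, assuming the
(x, y, sum[4], len) kernel signature and bit-exact fixed-point results;
the wrapper name is illustrative and this is not the actual check-asm
plumbing:

    /* Illustrative only: run the optimized kernel and the reference C
       kernel on the same input and require identical accumulators. */
    static void check_xcorr_kernel(const opus_val16 *x, const opus_val16 *y,
                                   opus_val32 sum[4], int len)
    {
       opus_val32 sum_c[4];
       int i;
       for (i=0;i<4;i++)
          sum_c[i] = sum[i];               /* same starting state        */
       xcorr_kernel_c(x, y, sum_c, len);   /* reference C implementation */
       xcorr_kernel(x, y, sum, len);       /* optimized implementation   */
       for (i=0;i<4;i++)
          celt_assert(sum[i] == sum_c[i]); /* must match exactly         */
    }
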
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
The LPCs are computed in 32-bit, so increase the allowed range from +/-8
to +/-64 to avoid overflows caught during fuzzing. Before downshifting
back to the +/-8 range in the final 16-bit output, perform bandwidth
extension to avoid any additional overflow issues.
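
A minimal sketch of the downshift, in plain C with made-up Q formats
(Q24 in, Q12 out); the chirp constant and the shift amounts are
illustrative, not the values used in celt_lpc.c:

    #include <stdint.h>

    /* Bandwidth extension: attenuate the i-th 32-bit coefficient by
       c^(i+1) (c slightly below 1), then round and saturate down to the
       16-bit output range. */
    static void lpc_bwe_and_round(int32_t *lpc_q24, int16_t *lpc_q12, int order)
    {
       const int32_t c_q15 = 32440;   /* ~0.99 in Q15, illustrative  */
       int32_t g_q15 = 32768;         /* running gain, starts at 1.0 */
       int i;
       for (i=0;i<order;i++)
       {
          g_q15 = (int32_t)(((int64_t)g_q15*c_q15)>>15);
          lpc_q24[i] = (int32_t)(((int64_t)lpc_q24[i]*g_q15)>>15);
       }
       for (i=0;i<order;i++)
       {
          int32_t v = (lpc_q24[i] + (1<<11)) >> 12;   /* Q24 -> Q12 */
          if (v >  32767) v =  32767;
          if (v < -32768) v = -32768;
          lpc_q12[i] = (int16_t)v;
       }
    }
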
The "mem" in celt_fir_c() either is contained in the head of input "x"
in reverse order already, or can be easily attached to the head of "x"
before calling the function. Removing argument "mem" can eliminate the
redundant buffer copies inside.
Update celt_fir_sse4_1() accordingly.
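
A sketch of why the separate buffer is unnecessary, in plain float C
with simplified conventions (the direct x[i] term, the argument order
and the scaling in the real celt_fir() are assumptions here): with the
ord history samples stored at x[-ord..-1], every output can be formed
directly from x, so no internal copy into a scratch "mem" buffer is
needed.

    /* Illustrative FIR that reads its history straight from x[-ord..-1]. */
    static void fir_no_mem(const float *x,    /* x[-ord..N-1] must be valid */
                           const float *num,  /* ord filter coefficients    */
                           float *y, int N, int ord)
    {
       int i, j;
       for (i=0;i<N;i++)
       {
          float sum = x[i];
          for (j=0;j<ord;j++)
             sum += num[j]*x[i-j-1];   /* history read directly from x */
          y[i] = sum;
       }
    }

A caller then lays out [ord history samples | N new samples] in one
buffer and passes a pointer to the first new sample.
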
1. Only for fixed point on x86 platforms (32-bit and 64-bit); uses SIMD
intrinsics up to SSE4.2.
2. Use "configure --enable-fixed-point --enable-intrinsics" to enable the
optimizations; they are disabled by default.
3. The official test cases have been run and pass.
Signed-off-by: Timothy B. Terriberry <tterribe@xiph.org>
Computes most of the auto-correlation by reusing pitch_xcorr(). We only
need lag*(lag-1)/2 MACs to complete the calculations.
To do this, pitch_xcorr() was modified so that it no longer truncates the
length to a multiple of 4. Also, the xcorr didn't need the floor at -1.
As a side benefit, this speeds up the PLC, which uses a higher order LPC
filter.
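
A simplified plain-C sketch of the split, with a stand-in loop in place
of the real pitch_xcorr() and bounds that may differ slightly from the
actual code: the bulk of every lag's sum is computed over the first
n-lag samples, and only a short triangular tail of MACs remains per lag.

    /* ac[k] = sum_i x[i]*x[i+k] for k = 0..lag */
    static void autocorr_via_xcorr(const float *x, float *ac, int n, int lag)
    {
       int i, k;
       int fastN = n - lag;      /* portion every lag can share */
       for (k=0;k<=lag;k++)      /* stand-in for pitch_xcorr(x, x, ac, fastN, lag+1) */
       {
          ac[k] = 0;
          for (i=0;i<fastN;i++)
             ac[k] += x[i]*x[i+k];
       }
       for (k=0;k<=lag;k++)      /* remaining triangular tail */
          for (i=fastN;i<n-k;i++)
             ac[k] += x[i]*x[i+k];
    }
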
This splits out the non-arch-specific portions of a patch written
by Aurélien Zanelli <aurelien.zanelli@parrot.com>
http://lists.xiph.org/pipermail/opus/2013-May/002088.html
I also added support for odd n, for custom modes.
0.25% speedup on 96 kbps stereo encode+decode on a Cortex A8.