This lets us cut out a bunch of work in the large _n, small _k case
where most of the dimensions won't have any pulses.
It also gets rid of all remaining usage of CELT_PVQ_U() in cwrsi(),
leaving just a single test instead of lots of mins and maxes, and
makes a bunch of the jump threading more obvious.
This is a 1.6% decoder speedup on a 96 kbps comp48-stereo encode on
a Cortex A8.
This does not change the behaviour of the VBR code in most cases. The only
exception is that the VBR offset is now taken into account in the base_rate,
which will have a (very minor) impact on CVBR at low rates.
There's no CPU detection for it; it only gets enabled by __SSE__,
which gcc (other compilers?) defines automatically when supported
by -march=, which means at least all x86-64. For ia32, the user needs to
enable it in the CFLAGS.
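
A minimal sketch of what compile-time selection on __SSE__ looks like,
assuming a hypothetical helper dot_prod() that is not taken from the
codebase. With gcc on ia32, the SSE path would only be compiled in if the
CFLAGS contain something like -msse or a suitable -march=; otherwise the
scalar loop handles everything.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper for illustration only; not a function from the codebase. */
static float dot_prod(const float *x, const float *y, size_t n)
{
    size_t i = 0;
    float sum = 0.0f;
#if defined(__SSE__)
    /* SSE path: only compiled when the compiler defines __SSE__. */
    __m128 acc = _mm_setzero_ps();
    float lanes[4];
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(x + i),
                                         _mm_loadu_ps(y + i)));
    _mm_storeu_ps(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar path: handles the tail, or the whole input without SSE. */
    for (; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Because the choice is made by the preprocessor, a binary built this way
will not run on a CPU without SSE; that is the trade-off of having no
run-time detection.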
Run-time CPU detection (RTCD) is enabled by default if the target platform
supports it.
It can be disabled at compile time with the --disable-rtcd option.
Add RTCD support for ARM architecture.
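
As a rough illustration of the RTCD idea (not the actual implementation),
the sketch below dispatches through a table of function pointers indexed
by a detected architecture level. All names here (dot_c, dot_opt,
select_arch, SKETCH_DISABLE_RTCD) are hypothetical, and the "optimized"
routine is a plain C stand-in for a platform-specific one.

```c
#include <stddef.h>

typedef float (*dot_fn)(const float *x, const float *y, size_t n);

/* Portable reference implementation. */
static float dot_c(const float *x, const float *y, size_t n)
{
    size_t i;
    float sum = 0.0f;
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Stand-in for an optimized routine (unrolled by two, still plain C). */
static float dot_opt(const float *x, const float *y, size_t n)
{
    size_t i = 0;
    float s0 = 0.0f, s1 = 0.0f;
    for (; i + 2 <= n; i += 2) {
        s0 += x[i] * y[i];
        s1 += x[i + 1] * y[i + 1];
    }
    if (i < n)
        s0 += x[i] * y[i];
    return s0 + s1;
}

enum { ARCH_C = 0, ARCH_OPT = 1, ARCH_COUNT };

static const dot_fn DOT_IMPL[ARCH_COUNT] = { dot_c, dot_opt };

/* Hypothetical probe: has_opt stands for the result of a real CPU query. */
static int select_arch(int has_opt)
{
#ifdef SKETCH_DISABLE_RTCD
    /* Assumed effect of disabling RTCD: always use the portable code. */
    (void)has_opt;
    return ARCH_C;
#else
    return has_opt ? ARCH_OPT : ARCH_C;
#endif
}

/* Callers pass the arch value chosen once at initialization. */
static float dot(int arch, const float *x, const float *y, size_t n)
{
    return DOT_IMPL[arch](x, y, n);
}
```

The detection runs once and its result is passed along, so the per-call
cost is a single indirect call.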
Thanks to Timothy B. Terriberry for help and code review
Signed-off-by: Timothy B. Terriberry <tterribe@xiph.org>
With gcc-4.4 at least, the raw asm.s files will always successfully
compile even if the default -march for the compiler would not support
those instructions. So switch to testing the inline asm versions,
where the compiler will barf if they aren't supported by the default
arch if no -march is explicitly given, or if they aren't supported by
the requested -march when it is.
If opus_compare doesn't exist or isn't executable, the tests fail in the
same way as a genuine test failure, which could be misleading.
So test for existence and mode to avoid this ambiguity.
Rename y0 and y1 because of the name clash with Bessel functions.
Initialize y_3 to zero because gcc is too dumb to realize it can't
be used uninitialized.
Computes most of the auto-correlation by reusing pitch_xcorr(). We only
need lag*(lag-1)/2 MACs to complete the calculations.
To do this, pitch_xcorr() was modified so that it no longer truncates the
length to a multiple of 4. Also, the xcorr didn't need the floor at -1.
As a side benefit, this speeds up the PLC, which uses a higher order LPC
filter.
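
The idea can be sketched as follows, assuming a simplified floating-point
stand-in for pitch_xcorr() (xcorr_sketch() below is hypothetical and not
optimized). The cross-correlation of the signal with itself covers the
first n-lag samples for every lag at once, and a short triangular tail
loop, whose size depends only on lag, completes each value.

```c
#include <stddef.h>

/* Simplified stand-in for pitch_xcorr():
   xcorr[k] = sum over i in [0, len) of x[i]*y[i+k], for k in [0, max_pitch).
   Works for any len, not just multiples of 4. */
static void xcorr_sketch(const float *x, const float *y, float *xcorr,
                         int len, int max_pitch)
{
    int i, k;
    for (k = 0; k < max_pitch; k++) {
        float sum = 0.0f;
        for (i = 0; i < len; i++)
            sum += x[i] * y[i + k];
        xcorr[k] = sum;
    }
}

/* ac[k] = sum over i in [0, n-k) of x[i]*x[i+k], for k = 0..lag.
   Requires n > lag; ac must hold lag+1 values. */
static void autocorr_sketch(const float *x, float *ac, int lag, int n)
{
    int i, k;
    int fast_n = n - lag;

    /* Bulk of the work: one xcorr call covering i in [0, fast_n). */
    xcorr_sketch(x, x, ac, fast_n, lag + 1);

    /* Remaining products: lag-k terms for each k. */
    for (k = 0; k <= lag; k++) {
        float tail = 0.0f;
        for (i = fast_n; i < n - k; i++)
            tail += x[i] * x[i + k];
        ac[k] += tail;
    }
}
```

Since the bulk of the multiply-accumulates goes through the xcorr routine,
any optimization of that routine also benefits the autocorrelation.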
I've done some editing for clarity, but more needs to be done.
The language needs clean-up; we should forward-reference the LPC
Extrapolation section, and we need a reference for actually
computing linear prediction coefficients.