When using OPUS_CUSTOM, `CELTDecoder->end` can be larger than 21.
Assert against 25 instead in OPUS_CUSTOM builds.
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
The "mem" in celt_fir_c() either is contained in the head of input "x"
in reverse order already, or can be easily attached to the head of "x"
before calling the function. Removing argument "mem" can eliminate the
redundant buffer copies inside.
Update celt_fir_sse4_1() accordingly.
Reordering the add with VERY_SMALL changes the dependencies cycle from 2 add + 1 mul
(11 cycles on haswell) to 1 add + 1 mul (8 cycles). This makes the entire decoder about
1.5% faster.
Optimize opus decode (float only) use case using ARM NE10.
Mainly effects opus_ifft and ctl_mdct_backward and related
functions.
Work based on previous Encode optimization using ARM NE10
library. See previous commit for details on how to enable
this.
Signed-off-by: Timothy B. Terriberry <tterribe@xiph.org>
Enable x86 intrinsics when building in floating-point mode.
Support SSE as an arch value.
Use RTCD to conditionally enable existing floating-point Celt SSE code.
Call functions directly (without RTCD) when their architecture can be presumed.
Use SSE4.1 intrinsics optimized code for Silk even in floating-point mode.
1. Only for fixed point on x86 platform (32bit and 64bit, uses SIMD
intrinsics up to SSE4.2)
2. Use "configure --enable-fixed-point --enable-intrinsics" to enable
optimization, default is disabled.
3. Official test cases are verified and passed.
Signed-off-by: Timothy B. Terriberry <tterribe@xiph.org>
Newer versions of MSVC are unhappy with the strategy of the build
environment redefining "inline" (even though they don't support the
actual keyword). Instead we define OPUS_INLINE to the right thing
in opus_defines.h.
This is the same approach we use for restrict.
Run-time CPU detection (RTCD) is enabled by default if target platform support
it.
It can be disable at compile time with --disable-rtcd option.
Add RTCD support for ARM architecture.
Thanks to Timothy B. Terriberry for help and code review
Signed-off-by: Timothy B. Terriberry <tterribe@xiph.org>
This makes all remaining large stack allocations use the vararray
macros.
This continues the work of 6f2d9f50 to allow compiling with
NONTHREADSAFE_PSEUDOSTACK to move the memory for large buffers
off the stack for devices where it is very limited.
It also does this for some additional large buffers used by the
PLC in the decoder.