Computes most of the auto-correlation by reusing pitch_xcorr(). We only
need lag*(lag-1)/2 MACs to complete the calculations.
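
As a rough illustration of the split described above, here is a minimal
floating-point sketch. It is not the actual code from this change:
xcorr_sketch() is a hypothetical stand-in for pitch_xcorr(), and
autocorr_sketch() and its argument order are assumptions made for the
example. The cross-correlation covers the first n-lag products of every
lag, and a short triangular tail loop adds the remaining ones.

```c
/* Hypothetical stand-in for pitch_xcorr() (not the real function):
   xcorr[k] = sum over i in [0, len) of x[i]*y[i+k], for k in [0, max_lag). */
static void xcorr_sketch(const float *x, const float *y, float *xcorr,
                         int len, int max_lag)
{
    for (int k = 0; k < max_lag; k++) {
        float sum = 0;
        for (int i = 0; i < len; i++)
            sum += x[i] * y[i + k];
        xcorr[k] = sum;
    }
}

/* Assumed interface: ac[k] = sum over i in [k, n) of x[i]*x[i-k],
   for k in [0, lag]. Requires n > lag. */
static void autocorr_sketch(const float *x, float *ac, int lag, int n)
{
    int fast_n = n - lag;

    /* Bulk of the work: products that exist for every lag. */
    xcorr_sketch(x, x, ac, fast_n, lag + 1);

    /* Tail: the few products per lag that the bulk pass did not cover. */
    for (int k = 0; k <= lag; k++) {
        float d = 0;
        for (int i = k + fast_n; i < n; i++)
            d += x[i] * x[i - k];
        ac[k] += d;
    }
}
```

The bulk pass is the part that an optimized cross-correlation routine
can accelerate; the tail loop only touches the last few samples.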
To do this, pitch_xcorr() was modified so that it no longer truncates the
length to a multiple of 4. Also, the xcorr didn't need the floor at -1.
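
To show what handling a length that is not a multiple of 4 can look
like, here is a hedged sketch of an inner product unrolled by 4 with a
remainder loop. dot_unrolled_sketch() is a hypothetical name, and this
is not the actual kernel used by pitch_xcorr(); it only illustrates the
idea of finishing the leftover samples instead of dropping them.

```c
/* Hypothetical inner product, unrolled by 4, that accepts any len >= 0. */
static float dot_unrolled_sketch(const float *x, const float *y, int len)
{
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i;

    /* Main loop: four products per iteration. */
    for (i = 0; i + 3 < len; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    /* Remainder loop: the 0 to 3 samples left over when len % 4 != 0. */
    for (; i < len; i++)
        s0 += x[i] * y[i];

    return (s0 + s1) + (s2 + s3);
}
```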
As a side benefit, this speeds up the PLC, which uses a higher order LPC
filter.
This splits out the non-arch-specific portions of a patch written
by Aurélien Zanelli <aurelien.zanelli@parrot.com>:
http://lists.xiph.org/pipermail/opus/2013-May/002088.html
I also added support for odd n, for custom modes.
0.25% speedup on 96 kbps stereo encode+decode on a Cortex A8.