This means we're "time-ordered" in all cases except when increasing
the time resolution on frames that already use short blocks.
There's no reordering when increasing the frequency resolution
on short blocks.
Dynalloc becomes 2x more likely every time we use it, until it
reaches a probability of 1/4. Allocation increments now have
a floor of 1/8 bit/sample and a ceiling of 1 bit/sample.
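The two rules above can be sketched as follows. This is only an illustration, not the code from the patch: the helper names are hypothetical, BITRES is assumed to be 3 (so allocations are counted in 1/8-bit units), and the probability of signalling a boost is assumed to be stored as logp, with probability 2^-logp.

```c
#define BITRES 3 /* assumed: allocations are counted in 1/8-bit units */

/* Hypothetical helper: new logp after one boost has been used.
   Lowering logp by one doubles the probability 2^-logp; the clamp
   at 2 stops it at a probability of 1/4. */
int dynalloc_next_logp(int logp)
{
    return logp > 2 ? logp - 1 : 2;
}

/* Hypothetical helper: clamp a boost increment (in 1/8-bit units) for
   a band of n samples to the range [1/8 bit/sample, 1 bit/sample]. */
int dynalloc_clamp_quanta(int quanta, int n)
{
    int lo = n;           /* n samples * 1/8 bit, in 1/8-bit units */
    int hi = n << BITRES; /* n samples * 1 bit, in 1/8-bit units */
    if (quanta < lo) return lo;
    if (quanta > hi) return hi;
    return quanta;
}
```

For a 16-sample band, for example, any increment is kept between 16 and 128 eighth-bits.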
The modeline-bisection and interpolator have used different criteria
for the minimum coding threshold since the introduction of the
"backwards done" change in 405e6a99. This meant that a lower modeline could be
selected which the interpolator was never able to get under the maximum
allocation. This patch makes the modeline selection search use the same
criteria as the interpolator.
This removes an XOR, an ADD, and an AND, and replaces them with
an AND NOT in ec_dec_normalize().
Also, simplify the loop structure of ec_dec_cdf() and eliminate a
CMOV.
All of our usage of ec_{enc|dec}_bit_prob had the probability of a
"one" being a power of two.
This adds a new ec_{enc|dec}_bit_logp() function that takes this
explicitly into account.
It introduces less rounding error than the bit_prob version, does not
require 17-bit integers to be emulated by ec_{encode|decode}_bin(),
and does not require any multiplies or divisions at all.
It is exactly equivalent to
ec_encode_bin(enc,_val?0:(1<<_logp)-1,(1<<_logp)-(_val?1:0),1<<_logp)
The old ec_{enc|dec}_bit_prob functions are left in place for now,
because I am not sure if SILK is still using them or not when
combined in Opus.
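To illustrate why no multiplies or divisions are needed when a probability is a power of two, here is a toy sketch of how the current range could be split. It is not the actual ec_enc_bit_logp() code: the function name is hypothetical, and it only computes the size of the sub-interval, leaving out how low is updated and which bit value is the less probable one.

```c
#include <stdint.h>

/* Toy illustration only: size of the sub-interval given to a binary
   symbol when the less probable value has probability 2^-logp.
   rare != 0 selects the less probable value. A single shift and a
   subtraction replace the multiply/divide of a general coder. */
uint32_t bit_logp_subrange(uint32_t rng, unsigned logp, int rare)
{
    uint32_t s = rng >> logp; /* roughly rng * 2^-logp */
    return rare ? s : rng - s;
}
```

The two sub-intervals always sum to exactly rng, so no part of the range is lost to rounding.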
It turns out to be more convenient to store dif=low+rng-code-1
instead of dif=low+rng-code.
This gets rid of a decrement in the normal decode path, replaces a
decrement and an "and" in the normalization loop with a single
add, and makes it clear that the new ec_dec_cdf() will not result
in an infinite loop.
This does not change the bitstream.
This decodes a value encoded with ec_encode_bin() without using any
divisions.
It is only meant for small alphabets.
If a symbol can take on a large number of possible values, a binary
search would be better.
This patch also converts spread_decision to use it, since it is
faster and introduces less rounding error to encode a single
decision for the entire value than to encode it a bit at a time.
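The idea of decoding a small alphabet by linear search, with no division, can be sketched as below. This is a simplified toy and not the real ec_dec_cdf(): the function name and the nsyms parameter are hypothetical, and offset is simply a position inside the current range rather than the decoder's internal dif value. The total frequency is assumed to be 1<<ftb, which is what lets a shift replace the division.

```c
#include <stdint.h>

/* Toy linear-search decode. cdf has nsyms+1 entries with cdf[0] == 0
   and cdf[nsyms] == 1<<ftb. Returns the symbol whose scaled interval
   contains offset; any part of the range left over by the truncating
   shift is given to the last symbol. */
int toy_dec_cdf(uint32_t rng, uint32_t offset, const unsigned *cdf,
                int nsyms, unsigned ftb)
{
    uint32_t r = rng >> ftb; /* scale factor: a shift, not a division */
    int val = 0;
    while (val + 1 < nsyms && offset >= r * cdf[val + 1])
        val++;
    return val;
}
```

The loop runs at most nsyms-1 times, which is why this approach only suits small alphabets.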
These were stored internally in one order and in the bitstream in a
different order.
Both used bare constants, making it unclear what either actually
meant.
This changes them to use the same order, gives them named constants,
and renames all the "fold" decision stuff to "spread" instead,
since that is what it is really controlling.
The bisection search in compute_allocation() was not using the same
method to count psum as interp_bits2pulses, i.e., it did not
include the 64*C<<BITRES<<LM allocation ceiling (this adds at most
84 max operations/frame, and so should have a trivial CPU cost).
Again, I wouldn't want to try to explain why these are different in
a spec, so let's make them the same.
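A minimal sketch of a bisection search that counts psum with the ceiling applied is shown below. It is an illustration under assumptions, not the code in compute_allocation(): the function names and the flat table layout are hypothetical, BITRES is assumed to be 3, and trim and dynalloc offsets are left out.

```c
#define BITRES 3 /* assumed: allocations are counted in 1/8-bit units */

/* Sum one allocation line over all bands, applying the
   64*C<<BITRES<<LM per-band ceiling to each entry. */
int capped_line_sum(const int *line, int nbands, int C, int LM)
{
    int cap = (64 * C) << BITRES << LM;
    int psum = 0;
    int j;
    for (j = 0; j < nbands; j++)
        psum += line[j] < cap ? line[j] : cap;
    return psum;
}

/* lines holds nlines rows of nbands entries each. Returns the lower
   line of the final bracket (the upper one is the return value + 1). */
int bisect_lines(const int *lines, int nlines, int nbands,
                 int C, int LM, int total)
{
    int lo = 0;
    int hi = nlines - 1;
    while (hi - lo > 1) {
        int mid = (lo + hi) >> 1;
        if (capped_line_sum(lines + mid * nbands, nbands, C, LM) > total)
            hi = mid;
        else
            lo = mid;
    }
    return lo;
}
```

The ceiling costs one extra comparison per band per iteration, which is the source of the small operation count quoted above.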
In addition, the procedure used to fill in bits1 and bits2 after the
bisection search was not the same as the one used during the
bisection search.
I.e., the
  if (bits1[j] > 0)
    bits1[j] += trim_offset[j];
step was not also done for bits2, so bits1[j] + bits2[j] would not
be equal to what was computed earlier for the hi line, and would
not be guaranteed to be larger than total.
We now compute both allocation lines in the same manner, and then
obtain bits2 by subtracting them, instead of trying to compute the
offset from bits1 up front.
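The "compute both lines the same way, then subtract" approach can be sketched like this. The function names and array parameters are hypothetical, and other adjustments made by the real code (such as dynalloc offsets) are omitted.

```c
/* Apply the trim offset only to non-zero entries, identically for
   both allocation lines. */
int trimmed(int base, int trim)
{
    return base > 0 ? base + trim : base;
}

/* Fill bits1 from the lo line and bits2 as the difference between the
   hi and lo lines, so bits1[j] + bits2[j] always equals the hi-line
   value that the search itself would have computed. */
void fill_bits(const int *lo_line, const int *hi_line,
               const int *trim_offset, int nbands,
               int *bits1, int *bits2)
{
    int j;
    for (j = 0; j < nbands; j++) {
        int b1 = trimmed(lo_line[j], trim_offset[j]);
        int b2 = trimmed(hi_line[j], trim_offset[j]);
        bits1[j] = b1;
        bits2[j] = b2 - b1;
    }
}
```

Note the first band in a case such as lo = 0, hi = 5, trim = 3: bits2 becomes 8, because the trim applies to the hi line but not to the zero entry of the lo line.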
Finally, there was nothing to stop a bitstream from boosting a band
beyond the number of bits remaining, which means that bits1 would
not produce an allocation less than or equal to total, which means
that some bands would receive a negative allocation in the decoder
when the "left over" negative bits were redistributed to other
bands.
This patch only adds the dynalloc offset to allocation lines greater
than 0, so that an all-zeros floor still exists; the effect is that
a dynalloc boost gets linearly scaled between allocation lines 0 and
1, and is constant (like it was before) after that.
We don't have to add the extra condition to the bisection search,
because it never examines allocation line 0.
This re-writes the indexing in the search to make that explicit;
it was tested and gives exactly the same results in exactly the
same number of iterations as the old search.
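The linear scaling of a boost between allocation lines 0 and 1 follows directly from interpolating between a line without the boost and a line with it. The sketch below shows this for a single band; the function names are hypothetical, and the 64-step interpolation (ALLOC_STEPS of 6) is an assumption made for the example.

```c
#define ALLOC_STEPS 6 /* assumed: 64 interpolation steps between lines */

/* The dynalloc boost is added only to allocation lines above 0. */
int line_with_boost(int line, int base, int boost)
{
    return line > 0 ? base + boost : base;
}

/* Allocation for one band, step/64 of the way from line 0 to line 1. */
int interp01(int base0, int base1, int boost, int step)
{
    int b1 = line_with_boost(0, base0, boost);
    int b2 = line_with_boost(1, base1, boost) - b1;
    return b1 + ((step * b2) >> ALLOC_STEPS);
}
```

With both base values at 0 and a boost of 64, steps 0, 32, and 64 yield 0, 32, and 64, so the all-zeros floor is still reachable.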
Commit 8e447678 increased the number of cases where we end skipping
without explicit signaling.
Before, this would cause the bit we reserved for this purpose to
either a) get grabbed by some N=1 band to code its sign bits or
b) wind up as part of the fine energy at the end.
This patch gives it back to the band where we stopped skipping,
which is either the first band, or a band that was boosted by
dynalloc.
This allows the bit to be used for shape coding in that band, and
allows better computation of the fine offset, since the band
knows in advance that it will get that bit.
With this change, we now guarantee that the number of bits allocated
by compute_allocation() is exactly equal to the input total, less
the bits consumed by skip flags during allocation itself (assuming
total was non-negative; for negative total, no bits are emitted,
and no bits are allocated).
Excess fractions of a bit can't be re-used in N=1 bands during
quant_all_bands() because there's no shape, only a sign bit.
This meant that all the fractional bits in these bands accumulated,
often up to 5 or 6 bits for stereo, until the first band with N>1,
where they were dumped all at once.
This patch moves the rebalancing for N=1 bands to
interp_bits2pulses() instead, where excess bits still have a
chance to be moved into fine energy.
In commit ffe10574 JM added a "done" flag to the allocation
interpolation loop: whenever a band did not have enough bits to
pass its threshold for receiving PVQ pulses, all of the rest of
the bands were given just enough bits for fine energy only.
This patch implements JM's "backwards done" idea: instead work
backwards, dropping bands until the first band that is over the
threshold is encountered, and don't artificially reduce the
allocation any more after that.
This is much more stable: we can continue to signal manual skips if
we want to, but we aren't forced to skip a large number of bands
because of an isolated hole in the allocation.
This makes low-bitrate 120-sample frames much less rough.
It also reduces the force skip threshold from
alloc_floor+(1<<BITRES)+1 to just alloc_floor+(1<<BITRES), because
the former can now cascade to cause many bands to be skipped.
The difference here is subtle, and increases signaling overhead by
0.11% of the total bitrate, but Monty confirmed that removing the
+1 reduces noise in the bass (i.e., in N=1 bands where such a skip
could cascade).
Finally the 64*C<<BITRES<<LM ceiling is moved into the bisection
search, instead of just being imposed afterwards, again because I
wouldn't want to try to explain in a spec why they're different.
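The "backwards done" idea can be sketched as a loop that walks down from the last band. This is a simplified illustration and not the code in interp_bits2pulses(): the function name and the thresh and fine_max arrays are hypothetical, and the bits freed by dropping a band are simply discarded here rather than redistributed.

```c
/* Work backwards from the last band, dropping bands that are under
   their PVQ threshold, and stop at the first band that is over it.
   Dropped bands keep at most their fine-energy budget. Returns the
   number of bands still coded; bands below that index are left
   untouched even if they are under the threshold. */
int backwards_done(int *bits, const int *thresh, const int *fine_max,
                   int start, int end)
{
    int coded = end;
    while (coded > start && bits[coded - 1] < thresh[coded - 1]) {
        coded--;
        if (bits[coded] > fine_max[coded])
            bits[coded] = fine_max[coded];
    }
    return coded;
}
```

With allocations {20, 3, 30, 2, 1} and a threshold of 10 in every band, only the last two bands are dropped; the isolated hole in the second band no longer forces the third band to be skipped.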
1) Continue to update left and percoeff if we skip all the way to the
first band.
This doesn't actually matter for correctness, but I don't want to
try to explain in a spec why we aren't doing this.
2) Force all the bits in skipped bands to go to fine energy.
Before, some of them could continue to be given to pulses, even though no
pulses would actually be allocated for them.
The margin of safety was supposed to be 1/8th bit, not 1 bit, and the
bit we reserved to terminate skip signalling before was actually 8
bits.
This patch updates the margin of safety to the correct value and
accounts for the one bit (not 8) needed for skip signalling.
It also fixes the remainder calculation in the skip loop to work
correctly when start>0.
Now that manual skipping is in the same loop as forced skipping, there
is no reason to do all of one, then all of the other.
This ensures we won't propagate bits to bands that have almost nothing
later in quant_all_bands() because we didn't have enough bits to
signal them skipped.
This allows us to a) not pay a coding cost to avoid skipping bands that are
stupid to skip (e.g., the first band, or bands that have so few bits that we
wouldn't redistribute anything) and b) not reserve bits to pay that cost.
This moves more of the decisions about when to stop skipping bands into the
encoder-specific branch, so they are not forced in the decoder (because there
is currently no bit-savings from forcing them).
It also no longer requires an extra bit to code the fine energy in a skipped
band: that was meant to account for the skip flag, but we already subtracted
that.