Use the minimum energy of the last two non-transient frames rather
than the minimum of just the last two frames. Also slightly increase
the "thresh" upper bound coefficient to 0.5.
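Below is a minimal sketch of the bookkeeping this implies. It assumes per-band energy history arrays that advance only on non-transient frames. `energy_history`, `NBANDS` and both functions are hypothetical names, not the codec's own.

```c
#include <assert.h>

#define NBANDS 4 /* hypothetical band count, for illustration only */

/* Per-band energy of the two most recent non-transient frames. */
typedef struct {
  float prev1[NBANDS]; /* most recent non-transient frame */
  float prev2[NBANDS]; /* the non-transient frame before that */
} energy_history;

/* Shift the current frame's band energies into the history, but only
   for non-transient frames; transient frames leave it untouched. */
static void history_update(energy_history *h, const float *bandE,
                           int is_transient) {
  int i;
  if (is_transient) return;
  for (i = 0; i < NBANDS; i++) {
    h->prev2[i] = h->prev1[i];
    h->prev1[i] = bandE[i];
  }
}

/* Floor for band i: minimum over the last two non-transient frames. */
static float history_floor(const energy_history *h, int i) {
  assert(i >= 0 && i < NBANDS);
  return h->prev1[i] < h->prev2[i] ? h->prev1[i] : h->prev2[i];
}
```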
By moving the energy floor to the encoder, we can use a different
floor for prediction than for the decay level. Also, the fixed-point
dynamic range has been increased to avoid overflows when a fixed-point
decoder is used on a stream encoded in floating-point.
Jean-Marc's original anti-collapse patch used a threshold on the
content of a decoded band to determine whether or not it should
be filled with random noise.
Since this is highly sensitive to the accuracy of the
implementation, it could lead to significant decoder output
differences even if decoding error up to that point was relatively
small.
This patch detects collapsed bands from the output of the vector
quantizer, using exact integer arithmetic.
It makes two simplifying assumptions:
a) If either input to haar1() is non-zero during TF resolution
adjustments, then the output will be non-zero.
b) If the content of a block is non-zero in any of the bands that
are used for folding, then the folded output will be non-zero.
b) in particular is likely to be false when SPREAD_NONE is used.
It also ignores the case where mid and side are orthogonal in
stereo_merge, but this is relatively unlikely.
This misses just over 3% of the cases that Jean-Marc's anti-collapse
detection strategy would catch, but does not mis-classify any (all
detected collapses are true collapses).
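The final detection step can be sketched as follows. The sketch assumes a band's quantized pulses are stored as `b` contiguous blocks of `n/b` integers. The layout and the name `collapse_mask()` are assumptions, loosely modeled on libopus's later extract_collapse_mask(). The sketch does not model the propagation through haar1() or folding covered by assumptions a) and b).

```c
#include <assert.h>

/* Returns a mask with bit i set when block i of the quantized pulse
   vector iy[] (n coefficients split into b contiguous blocks) holds at
   least one non-zero pulse. Only integer OR and compare are used, so
   the result is bit-exact on every implementation. */
static unsigned collapse_mask(const int *iy, int n, int b) {
  unsigned mask = 0;
  int n0, i, j;
  assert(b >= 1 && b <= 16 && n % b == 0);
  n0 = n / b;
  for (i = 0; i < b; i++) {
    int any = 0;
    for (j = 0; j < n0; j++) any |= iy[i * n0 + j]; /* non-zero iff a pulse exists */
    if (any != 0) mask |= 1u << i;
  }
  return mask; /* a clear bit marks a collapsed block */
}
```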
This patch overloads the "fill" parameter to mark which blocks have
non-zero content for folding.
As a consequence, if a set of blocks on one side of a split has
collapsed, _no_ folding is done: the result would be zero anyway.
The exception is short blocks with SPREAD_AGGRESSIVE that are split
down to a single block, but a) that means a lot of bits were available,
so a collapse is unlikely, and b) anti-collapse can fill the block
anyway, if it's used.
This also means that if itheta==0 or itheta==16384, we no longer
fold at all on that side (even with long blocks), since we'd be
multiplying the result by zero anyway.
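The mask bookkeeping can be sketched as follows. It assumes a fill mask with one bit per block, where the low `b` bits belong to the first half of a split and the next `b` bits to the second. This layout and the helpers `split_fill()` and `should_fold()` are hypothetical.

```c
#include <assert.h>

/* Split a fill mask covering 2*b blocks into the masks of the two
   halves of a band split. With itheta == 0 the second half has zero
   gain, and with itheta == 16384 the first half does, so that half's
   mask is cleared and it is never folded. */
static void split_fill(unsigned fill, int b, int itheta,
                       unsigned *fill_lo, unsigned *fill_hi) {
  unsigned half;
  assert(b >= 1 && b < 16);
  half = (1u << b) - 1;
  *fill_lo = fill & half;
  *fill_hi = (fill >> b) & half;
  if (itheta == 0) *fill_hi = 0;          /* all energy in the first half */
  else if (itheta == 16384) *fill_lo = 0; /* all energy in the second half */
}

/* An empty mask means every source block collapsed, so folding would
   only produce zeros and is skipped. */
static int should_fold(unsigned fill) { return fill != 0; }
```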
This looks for bands in each short block that have no energy. For
each of these "collapsed" bands, noise is injected to have an
energy equal to the minimum of the two previous frames for that band.
The mechanism can be used whenever there are 4 or more MDCTs (otherwise
no complete collapse is possible) and is signalled with one bit just
before the final fine energy bits.
This patch makes all symbols conditional on whether or not there's
enough space left in the buffer to code them, and eliminates much
of the redundancy in the side information.
A summary of the major changes:
* The isTransient flag is moved up to before the coarse energy.
Otherwise, if there are not enough bits to code the coarse energy,
the flag would get forced to 0, meaning the energy values that were
coded would get interpreted incorrectly.
This might not be the end of the world, and I'd be willing to
move it back given a compelling argument.
* Coarse energy switches coding schemes when there are fewer than 15
bits left in the packet:
- With at least 2 bits remaining, the change in energy is forced
to the range [-1...1] and coded with 1 bit (for 0) or 2 bits
(for +/-1).
- With only 1 bit remaining, the change in energy is forced to
the range [-1...0] and coded with one bit.
- If there is less than 1 bit remaining, the change in energy is
forced to -1.
This effectively low-passes bands whose energy is consistently
starved; this might be undesirable, but letting the default be
zero is unstable, which is worse.
* The tf_select flag is moved back after the per-band tf_res flags,
and is now skipped entirely when none of the tf_res flags are set
and the default value is the same for either alternative (i.e., when
it could have no effect).
* dynalloc boosting is now limited so that it stops once it's given
a band all the remaining bits in the frame, or when it hits the
"stupid cap" of (64<<LM)*(C<<BITRES) used during allocation.
* If dynalloc boosting has allocated all the remaining bits in the
frame, the alloc trim parameter does not get encoded (it would
have no effect).
* The intensity stereo offset is now limited to the range
[start...codedBands], and thus doesn't get coded until after
all of the skip decisions.
Some space is reserved for it up front, and gradually given back
as each band is skipped.
* The dual stereo flag is coded only if intensity>start, since
otherwise it has no effect.
It is now coded after the intensity stereo offset.
* The space reserved for the final skip flag, the intensity stereo
offset, and the dual stereo flag is now redistributed to all
bands equally if it is unused.
Before, the skip flag's bit was given to the band that stopped
skipping without it (usually a dynalloc boosted band).
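The three coarse-energy fallback ranges can be sketched as a clamp applied before coding. The sketch assumes `bits_left` counts whole bits still available in the packet. The helper names are hypothetical. Only the range clamp and the short code lengths are shown, not the entropy coding itself.

```c
#include <assert.h>

/* Clamp the coarse-energy delta qi to what the remaining space can code. */
static int clamp_coarse_delta(int qi, int bits_left) {
  if (bits_left >= 15) return qi;       /* normal scheme, no clamp */
  if (bits_left >= 2)                   /* [-1, 1] */
    return qi < -1 ? -1 : (qi > 1 ? 1 : qi);
  if (bits_left >= 1)                   /* [-1, 0] */
    return qi < -1 ? -1 : (qi > 0 ? 0 : qi);
  return -1;                            /* nothing codable: forced to -1 */
}

/* Length of the short code used with at least 2 bits remaining:
   1 bit for 0, 2 bits for +/-1. */
static int small_delta_cost(int qi) {
  assert(qi >= -1 && qi <= 1);
  return qi == 0 ? 1 : 2;
}
```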
In order to enable simple interaction between VBR and these limits
enforced by the packet size, many of which are encountered before
VBR is run, the maximum packet size VBR will allow is computed at
the beginning of the encoding function, and the buffer is reduced to
that size immediately.
Later, when it is time to make the VBR decision, the minimum packet
size is set high enough to ensure that no decision made thus far
will have been affected by the packet size.
As long as this is smaller than the up-front maximum, all of the
encoder's decisions will remain in-sync with the decoder.
If it is larger than the up-front maximum, the packet size is kept
at that maximum, also ensuring sync.
The minimum used now is slightly larger than it used to be, because
it also includes the bits added for dynalloc boosting.
Such boosting is shut off by the encoder at low rates, and so
should not cause any serious issues at the rates where we would
actually run out of room before compute_allocation().
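The resulting VBR size decision reduces to a clamp. This sketch assumes sizes are in whole bytes and that `min_bytes` already covers everything coded so far, including dynalloc boosts. `vbr_packet_bytes()` is a hypothetical name.

```c
#include <assert.h>

/* max_bytes is fixed before any symbols are coded; min_bytes covers all
   decisions made so far. Staying inside that range keeps the encoder
   and decoder in sync. */
static int vbr_packet_bytes(int target_bytes, int min_bytes, int max_bytes) {
  assert(max_bytes > 0);
  if (min_bytes > max_bytes) return max_bytes; /* keep the up-front maximum */
  if (target_bytes < min_bytes) return min_bytes;
  if (target_bytes > max_bytes) return max_bytes;
  return target_bytes;
}
```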
This renames ec_dec_cdf() to ec_dec_icdf(), and changes the
functionality to use an "inverse" CDF table, where
icdf[i]=ft-cdf[i+1].
The first entry is omitted entirely.
It also adds a corresponding ec_enc_icdf() to the encoder, which uses
the same table.
One could use ec_encode_bin() by converting the values in the tables
back to normal CDF values, but the icdf[] table already has them in
the form ec_encode_bin() wants to use them, so there's no reason to
translate them and then translate them back.
This is done primarily to allow SILK to use the range coder with
8-bit probability tables containing cumulative frequencies that
span the full range 0...256.
With an 8-bit table, the final 256 of a normal CDF becomes 0 in the
"inverse" CDF.
It's the 0 at the start of a normal CDF which would become 256, but
this is the value we omit, as it already has to be special-cased in
the encoder, and is not used at all in the decoder.
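Here is a sketch of the table conversion. It assumes a CDF of `n+1` unsigned entries with `cdf[0]==0` and `cdf[n]==ft`. `cdf_to_icdf()` is a hypothetical helper, not part of the range coder API.

```c
#include <assert.h>

/* icdf[i] = ft - cdf[i+1] for i = 0..n-1. The leading 0 of the CDF,
   which would become ft, is dropped. The final ft becomes 0, so an
   8-bit table with ft == 256 fits in unsigned char. */
static void cdf_to_icdf(const unsigned *cdf, int n, unsigned char *icdf) {
  unsigned ft = cdf[n];
  int i;
  assert(cdf[0] == 0 && ft <= 256u);
  for (i = 0; i < n; i++) {
    assert(ft - cdf[i + 1] <= 255u); /* needs cdf[1] > 0 when ft == 256 */
    icdf[i] = (unsigned char)(ft - cdf[i + 1]);
  }
}
```

Symbol k then has frequency icdf[k-1]-icdf[k], taking icdf[-1] to be ft. That is exactly the special case the omitted first entry would have covered.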
The band where intensity stereo begins was being coded as an
absolute value, rather than relative to start, even though the
range of values in the bitstream was limited as if it was being
coded relative to start (meaning there would be desync if
intensity was sufficiently large).
The valid bands range from [start,end) everywhere, with start<end.
Therefore end should never be 0, and should be allowed to extend
all the way to mode->nbEBands.
This patch does _not_ enforce that start<end, and it does _not_
handle clearing oldBandE[] when the valid range changes, which
are separate issues.
For our current usage, this doesn't matter, but it is more consistent
with the rest of the API.
We may want to reduce this to an unsigned char[], but I'd rather
coordinate that optimization with SILK's planned reduction to
8-bit CDFs, as we may be able to use the same code.
This simplifies a good bit of the error handling, and should make it
impossible to overrun the buffer in the encoder or decoder, while
still allowing tell() to operate correctly after a bust.
The encoder now tries to keep the range coder data intact after a
bust instead of corrupting it with extra bits data, though this is
not a guarantee (too many extra bits may have already been flushed).
It also now correctly reports errors when the bust occurs while merging
the last byte of range coder data with the extra bits.
A number of abstraction barrier violations were cleaned up, as well.
This patch also includes a number of minor performance improvements:
ec_{enc|dec}_bits() in particular should be much faster.
Finally, tf_select was changed to be coded with the range coder
rather than extra bits, so that it is at the front of the packet
(for unequal error protection robustness).
Dynalloc becomes 2x more likely every time we use it, until it
reaches a probability of 1/4. Allocation increments now have
a floor of 1/8 bit/sample and a ceiling of 1 bit/sample.
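Both rules can be sketched as follows. The sketch assumes a starting log-probability of 6 (probability 1/64) and allocation counted in 1/8th-bit units with BITRES=3. The function names and the `desired` input are hypothetical.

```c
#include <assert.h>

#define BITRES 3 /* one bit is 1<<BITRES allocation units */

/* Each boost halves logp (doubling the probability of another boost),
   stopping at logp == 2, i.e. a probability of 1/4. */
static int next_dynalloc_logp(int logp) {
  return logp - 1 > 2 ? logp - 1 : 2;
}

/* Per-boost increment for a band of `width` samples, clamped to
   [1/8, 1] bit per sample in 1/8th-bit units. */
static int dynalloc_quanta(int width, int desired) {
  int lo = width;           /* 1/8 bit per sample */
  int hi = width << BITRES; /* 1 bit per sample */
  assert(width > 0);
  return desired < lo ? lo : (desired > hi ? hi : desired);
}
```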
All of our usage of ec_{enc|dec}_bit_prob had the probability of a
"one" being a power of two.
This adds a new ec_{enc|dec}_bit_logp() function that takes this
explicitly into account.
It introduces less rounding error than the bit_prob version, does not
require 17-bit integers to be emulated by ec_{encode|decode}_bin(),
and does not require any multiplies or divisions at all.
It is exactly equivalent to
ec_encode_bin(enc,_val?0:(1<<_logp)-1,(1<<_logp)-(_val?1:0),_logp)
The old ec_{enc|dec}_bit_prob functions are left in place for now,
because I am not sure if SILK is still using them or not when
combined in Opus.
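For illustration, here is a toy version of the shift-only interval split, with normalization and carry propagation omitted. The `toy_enc` type and function are hypothetical. Giving the "one" the top `rng>>logp` of the interval follows libopus's later ec_enc_bit_logp(), which may not match the symbol ordering of the expression above.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  uint32_t low; /* bottom of the current interval */
  uint32_t rng; /* width of the current interval */
} toy_enc;

/* Code one bit whose "one" has probability 1/2^logp using a shift and a
   subtraction only: no multiply or divide. */
static void toy_enc_bit_logp(toy_enc *e, int val, unsigned logp) {
  uint32_t s, r;
  assert(logp >= 1 && logp < 32);
  s = e->rng >> logp; /* width given to "one" */
  r = e->rng - s;     /* width given to "zero" */
  if (val) {
    e->low += r;      /* "one": top of the interval */
    e->rng = s;
  } else {
    e->rng = r;       /* "zero": bottom of the interval */
  }
}
```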
This decodes a value encoded with ec_encode_bin() without using any
divisions.
It is only meant for small alphabets.
If a symbol can take on a large number of possible values, a binary
search would be better.
This patch also converts spread_decision to use it, since it is
faster and introduces less rounding error to encode a single
decision for the entire value than to encode it a bit at a time.
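Below is a toy version of the division-free linear search, modeled on libopus's later ec_dec_icdf(). The function name, passing the state as two integers, and the omitted renormalization are simplifications for illustration. Here `val` is the decoder's offset into the interval, and larger values select earlier symbols.

```c
#include <assert.h>
#include <stdint.h>

/* icdf[] is an inverse CDF with total 1<<ftb whose last entry is 0.
   Returns the decoded symbol and narrows *rng and *val to its
   sub-interval. */
static int toy_dec_icdf(uint32_t *rng, uint32_t *val,
                        const unsigned char *icdf, unsigned ftb) {
  uint32_t r, s, t;
  int sym = -1;
  assert(ftb < 32);
  r = *rng >> ftb; /* width of one frequency unit: a shift, not a divide */
  s = *rng;
  do {
    t = s;               /* upper edge of the candidate symbol */
    s = r * icdf[++sym]; /* lower edge of symbol sym */
  } while (*val < s);    /* terminates: the last entry is 0 */
  *val -= s;
  *rng = t - s;
  return sym;
}
```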
These were stored internally in one order and in the bitstream in a
different order.
Both used bare constants, making it unclear what either actually
meant.
This changes them to use the same order, gives them named constants,
and renames all the "fold" decision stuff to "spread" instead,
since that is what it is really controlling.
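Below is a sketch of named spread constants that share one order internally and in the bitstream. SPREAD_NONE and SPREAD_AGGRESSIVE appear in the text above. The two middle names and all numeric values follow libopus and are assumptions here, as is the helper.

```c
#include <assert.h>

enum spread_decision {
  SPREAD_NONE = 0,
  SPREAD_LIGHT = 1,
  SPREAD_NORMAL = 2,
  SPREAD_AGGRESSIVE = 3
};

/* With one shared order, the coded symbol is the internal value itself,
   so no translation table is needed. */
static int spread_to_symbol(enum spread_decision d) {
  assert(d >= SPREAD_NONE && d <= SPREAD_AGGRESSIVE);
  return (int)d;
}
```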
Commit 8e447678 increased the number of cases where we end up skipping
without explicit signaling.
Before, this would cause the bit we reserved for this purpose to
either a) get grabbed by some N=1 band to code its sign bits or
b) wind up as part of the fine energy at the end.
This patch gives it back to the band where we stopped skipping,
which is either the first band, or a band that was boosted by
dynalloc.
This allows the bit to be used for shape coding in that band, and
allows a better computation of the fine offset, since the band
knows it will get that bit in advance.
With this change, we now guarantee that the number of bits allocated
by compute_allocation() is exactly equal to the input total, less
the bits consumed by skip flags during allocation itself (assuming
total was non-negative; for negative total, no bits are emitted,
and no bits are allocated).
The margin of safety was supposed to be 1/8th bit, not 1 bit, and the
bit we reserved to terminate skip signalling before was actually 8
bits.
This patch updates the margin of safety to the correct value and
accounts for the one bit (not 8) needed for skip signalling.
It also fixes the remainder calculation in the skip loop to work
correctly when start>0.
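The unit bookkeeping behind this fix can be sketched as follows, assuming allocation is counted in 1/8th-bit units (BITRES=3). `bits_after_reservations()` is a hypothetical helper, not the actual allocation code.

```c
#include <assert.h>

#define BITRES 3 /* one bit is 1<<BITRES = 8 allocation units */

/* Subtract the safety margin (1/8 bit, i.e. one unit) and the
   skip-termination flag (one bit, i.e. 1<<BITRES units, not 8 bits). */
static int bits_after_reservations(int total_eighths) {
  const int margin = 1;
  const int skip_flag = 1 << BITRES;
  int left = total_eighths - margin - skip_flag;
  assert(margin < skip_flag);
  return left > 0 ? left : 0;
}
```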
This allows us to a) not pay a coding cost to avoid skipping bands that are
stupid to skip (e.g., the first band, or bands that have so few bits that we
wouldn't redistribute anything) and b) not reserve bits to pay that cost.