This stores the caps array in units of 1/32 bit/sample instead of 1/2 bit
scaled by LM and the channel count, which is slightly less
accurate for the last two bands, and much more accurate for
all the other bands.
A constant offset is subtracted to allow it to represent values
larger than 255 in 8 bits (the range of unoffset values is
77...304).
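
A minimal sketch of the representation, assuming an offset of 64 (the exact
constant is not spelled out here; anything of at least 49 would fit the range):

    #define CAPS_OFFSET 64   /* assumed value */

    static unsigned char store_cap(int cap)      /* cap in 1/32 bit/sample, 77..304 */
    {
        return (unsigned char)(cap - CAPS_OFFSET);   /* 13..240 fits in 8 bits */
    }

    static int load_cap(unsigned char stored)
    {
        return (int)stored + CAPS_OFFSET;
    }
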
In addition, this replaces the last modeline in the allocation table
with the caps array, allowing the initial interpolation to
allocate 8 bits/sample or more, which was otherwise impossible.
The first version of the mono decoder with stereo output collapsed
the historic energy values stored for anti-collapse down to one
channel (by taking the max).
This means that a subsequent switch back would continue using
the maximum of the two values instead of the original history,
which would make anti-collapse produce louder noise (and
potentially more pre-echo than otherwise).
This patch moves the max into the anti_collapse function itself,
and does not store the values back into the source array, so the
full stereo history is maintained if subsequent frames switch
back.
It also fixes an encoder mismatch, which never took the max
(assuming, apparently, that the output channel count would never
change).
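
A sketch of the idea, using plain floats and assumed names rather than the
codec's fixed-point types:

    static float maxf(float a, float b) { return a > b ? a : b; }

    /* When producing mono output from a stereo history, take the per-band max
     * of both channels' stored log energies locally inside anti_collapse(),
     * without writing it back, so the full stereo history survives a later
     * switch.  The layout (second channel at offset nbEBands) is assumed. */
    static void history_for_band(const float *prev1logE, const float *prev2logE,
                                 int nbEBands, int band, int mono_output,
                                 float *prev1, float *prev2)
    {
        *prev1 = prev1logE[band];
        *prev2 = prev2logE[band];
        if (mono_output) {
            *prev1 = maxf(*prev1, prev1logE[nbEBands + band]);
            *prev2 = maxf(*prev2, prev2logE[nbEBands + band]);
        }
        /* prev1logE/prev2logE are left untouched */
    }
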
Instead of just dumping excess bits into the first band after
allocation, use them to initialize the rebalancing loop in
quant_all_bands().
This allows these bits to be redistributed over several bands, like
normal.
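
A rough sketch of the intent, with an assumed share rule (the real rebalancing
lives inside quant_all_bands() and also rolls each band's own unused bits
forward):

    static void seed_rebalance(int *pulses, int start, int end, int leftover)
    {
        int i;
        int balance = leftover;            /* previously: pulses[start] += leftover; */
        for (i = start; i < end; i++) {
            int left  = end - i;
            int share = balance / (left < 3 ? left : 3);   /* assumed: at most 1/3 per band */
            pulses[i] += share;
            balance   -= share;
        }
        pulses[end - 1] += balance;        /* any rounding remainder goes to the last band */
    }
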
The previous "dumb cap" of (64<<LM)*(C<<BITRES) was not actually
achievable by many (most) bands, and did not take the cost of
coding theta for splits into account, and so was too small for some
bands.
This patch adds code to compute a fairly accurate estimate of the
real maximum per-band rate (an estimate only because of rounding
effects and the fact that the bit usage for theta is variable),
which is then truncated and stored in an 8-bit table in the mode.
This gives improved quality at all rates over 160 kbps/channel,
prevents bits from being wasted all the way up to 255 kbps/channel
(the maximum rate allowed, and approximately the maximum number of
bits that can usefully be used regardless of the allocation), and
prevents dynalloc and trim from producing enormous waste
(eliminating the need for encoder logic to prevent this).
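
A sketch of how the stored table might be applied during allocation; the names
and the expansion formula (offset of 64, 1/32-bit-per-sample units converted to
1/8-bit units) are assumptions:

    static void clamp_to_caps(int *bits1_8, const unsigned char *caps8,
                              const int *bandN, int nbBands, int C)
    {
        int i;
        for (i = 0; i < nbBands; i++) {
            /* bandN[i]: MDCT bins in band i at this frame size; C: channels */
            int cap = (caps8[i] + 64) * C * bandN[i] >> 2;
            if (bits1_8[i] > cap)
                bits1_8[i] = cap;
        }
    }
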
This changes folding so that the LCG is never used on transients
(either short blocks or long blocks with increased time
resolution), except in the case that there's not enough decoded
spectrum to fold yet.
It also now only subtracts the anti-collapse bit from the total
allocation in quant_all_bands() when space has actually been
reserved for it.
Finally, it cleans up some of the fill and collapse_mask tracking
(this tracking was originally made intentionally sloppy to save
work, but was then converted to replace the existing fill flag at
the last minute, a change with a number of subtle consequences).
The changes, in particular:
1) Splits of less than a block now correctly mark the second half
as filled only if the whole block was filled (previously it
would also mark it filled if the next block was filled).
2) Splits of less than a block now correctly mark a block as
un-collapsed if either half was un-collapsed, instead of marking
the next block as un-collapsed when the high half was.
3) The N=2 stereo special case now keeps its fill mask even when
itheta==16384; previously this would have gotten cleared,
despite the fact that we fold into the side in this case.
4) The test against fill for folding now only considers the bits
corresponding to the current set of blocks.
Previously it would still fold if any later block was filled.
5) The collapse mask used for the LCG fold data is now correctly
initialized when B=16 on platforms with a 16-bit int.
6) The high bits on a collapse mask are now cleared after the TF
resolution changes and interleaving at level 0, instead of
waiting until the very end.
This prevents extraneous high flags set on mid from being mixed
into the side flags for mid-side stereo.
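
A small sketch for items 5) and 6), with assumed names:

    /* Item 5: (1<<B)-1 overflows when B == 16 and int is 16 bits; splitting
     * the shift keeps the "all blocks" mask well-defined for 1 <= B <= 16
     * (whether this matches the exact expression used is an assumption). */
    static unsigned all_blocks_mask(int B)
    {
        return (((unsigned)1 << (B - 1)) << 1) - 1;
    }

    /* Item 6: clear flags for blocks that no longer exist right after the TF
     * resolution changes and interleaving at level 0, so stray high bits set
     * on mid cannot later be mixed into the side flags. */
    static unsigned trim_collapse_mask(unsigned cm, int B)
    {
        return cm & all_blocks_mask(B);
    }
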
Using the min energy of the last two non-transient frames rather
than the min of just the last two frames. Also slightly increasing
the "thresh" upper bound coefficient to 0.5.
By moving the energy floor to the encoder, we can use a different
floor for prediction than for the decay level. Also, the fixed-point
dynamic range has been increased to avoid overflows when a fixed-point
decoder is used on a stream encoded in floating-point.
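
A toy illustration only (the constants are placeholders, not the codec's
actual floors): with the floor applied in the encoder, the value fed to the
predictor and the value used for the decay level no longer need to share a
limit.

    #define PRED_FLOOR  (-9.0f)    /* assumed floor for prediction      */
    #define DECAY_FLOOR (-28.0f)   /* assumed floor for the decay level */

    static float floored(float logE, float floor_val)
    {
        return logE > floor_val ? logE : floor_val;
    }
    /* e.g. floored(bandLogE[i], PRED_FLOOR) for prediction,
     *      floored(bandLogE[i], DECAY_FLOOR) for the decay level */
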
Jean-Marc's original anti-collapse patch used a threshold on the
content of a decoded band to determine whether or not it should
be filled with random noise.
Since this is highly sensitive to the accuracy of the
implementation, it could lead to significant decoder output
differences even if decoding error up to that point was relatively
small.
This patch detects collapsed bands from the output of the vector
quantizer, using exact integer arithmetic.
It makes two simplifying assumptions:
a) If either input to haar1() is non-zero during TF resolution
adjustments, then the output will be non-zero.
b) If the content of a block is non-zero in any of the bands that
are used for folding, then the folded output will be non-zero.
b) in particular is likely to be false when SPREAD_NONE is used.
It also ignores the case where mid and side are orthogonal in
stereo_merge, but this is relatively unlikely.
This misses just over 3% of the cases that Jean-Marc's anti-collapse
detection strategy would catch, but does not mis-classify any (all
detected collapses are true collapses).
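
A sketch of the detection, working on the integer pulse vector so the result
is bit-exact across implementations; the names and block layout are
assumptions:

    /* Build a B-bit mask with bit k set iff short block k of this band got at
     * least one non-zero pulse.  iy[] holds the N integer pulses, with block k
     * assumed to occupy samples k*N0 .. k*N0 + N0-1 (N0 = N/B). */
    static unsigned collapse_mask_from_pulses(const int *iy, int N, int B)
    {
        unsigned mask = 0;
        int k, j, N0;
        if (B <= 1)
            return 1;
        N0 = N / B;
        for (k = 0; k < B; k++) {
            int nonzero = 0;
            for (j = 0; j < N0; j++)
                nonzero |= iy[k * N0 + j];
            mask |= (unsigned)(nonzero != 0) << k;
        }
        return mask;
    }
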
This patch overloads the "fill" parameter to mark which blocks have
non-zero content for folding.
As a consequence, if a set of blocks on one side of a split has
collapsed, _no_ folding is done: the result would be zero anyway,
except for short blocks with SPREAD_AGGRESSIVE that are split down
to a single block, but a) that means a lot of bits were available
so a collapse is unlikely and b) anti-collapse can fill the block
anyway, if it's used.
This also means that if itheta==0 or itheta==16384, we no longer
fold at all on that side (even with long blocks), since we'd be
multiplying the result by zero anyway.
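
A minimal sketch of that consequence, with assumed names (the normal folding
path itself is not shown):

    static void fold_or_zero(float *x, int N, unsigned fill,
                             unsigned side_blocks, int theta_is_zero)
    {
        int j;
        if (theta_is_zero || (fill & side_blocks) == 0) {
            /* nothing non-zero to fold from, or this side's gain is zero:
               skip folding entirely and just zero the side */
            for (j = 0; j < N; j++)
                x[j] = 0;
        }
        /* else: proceed with normal folding */
    }
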
This looks for bands in each short block that have no energy. For
each of these "collapsed" bands, noise is injected to have an
energy equal to the minimum of the two previous frames for that band.
The mechanism can be used whenever there are 4 or more MDCTs (otherwise
no complete collapse is possible) and is signalled with one bit just
before the final fine energy bits.
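
A per-band sketch of the injection, with assumed names and block layout; a
real implementation also renormalises the band afterwards and applies the
"thresh" clamp mentioned earlier:

    #include <math.h>

    static unsigned lcg_next(unsigned seed) { return seed * 1664525u + 1013904223u; }

    static void fill_collapsed_blocks(float *X, int N0, int B, unsigned collapse,
                                      float logE, float prev1, float prev2,
                                      unsigned *seed)
    {
        int k, j;
        float prev = prev1 < prev2 ? prev1 : prev2;   /* min of the two previous frames */
        float r = exp2f(prev - logE);                 /* target amplitude vs. current   */
        if (r > 1.0f)
            r = 1.0f;                                 /* never amplify                  */
        r /= sqrtf((float)N0);                        /* spread the energy over N0 bins */
        for (k = 0; k < B; k++) {
            if (collapse & (1u << k))
                continue;                             /* this short MDCT has energy     */
            for (j = 0; j < N0; j++) {
                *seed = lcg_next(*seed);
                X[j * B + k] = (*seed & 0x8000u) ? r : -r;   /* interleaving assumed    */
            }
        }
    }
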
This patch makes all symbols conditional on whether or not there's
enough space left in the buffer to code them, and eliminates much
of the redundancy in the side information.
A summary of the major changes:
* The isTransient flag is moved up to before the coarse energy.
If there are not enough bits to code the coarse energy, the flag
would get forced to 0, meaning that whatever energy values were
coded would be interpreted incorrectly.
This might not be the end of the world, and I'd be willing to
move it back given a compelling argument.
* Coarse energy switches coding schemes when there are fewer than
  15 bits left in the packet (the fallback is sketched in code
  after this list):
- With at least 2 bits remaining, the change in energy is forced
to the range [-1...1] and coded with 1 bit (for 0) or 2 bits
(for +/-1).
- With only 1 bit remaining, the change in energy is forced to
the range [-1...0] and coded with one bit.
- If there is less than 1 bit remaining, the change in energy is
forced to -1.
This effectively low-passes bands whose energy is consistently
starved; this might be undesirable, but letting the default be
zero is unstable, which is worse.
* The tf_select flag gets moved back after the per-band tf_res
flags again, and is now skipped entirely when none of the
tf_res flags are set, and the default value is the same for
either alternative.
* dynalloc boosting is now limited so that it stops once it's given
a band all the remaining bits in the frame, or when it hits the
"stupid cap" of (64<<LM)*(C<<BITRES) used during allocation.
* If dynalloc boosting has allocated all the remaining bits in the
frame, the alloc trim parameter does not get encoded (it would
have no effect).
* The intensity stereo offset is now limited to the range
[start...codedBands], and thus doesn't get coded until after
all of the skip decisions.
Some space is reserved for it up front, and gradually given back
as each band is skipped.
* The dual stereo flag is coded only if intensity>start, since
otherwise it has no effect.
It is now coded after the intensity flag.
* The space reserved for the final skip flag, the intensity stereo
offset, and the dual stereo flag is now redistributed to all
bands equally if it is unused.
Before, the skip flag's bit was given to the band that stopped
skipping without it (usually a dynalloc boosted band).
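
A sketch of the low-rate coarse-energy fallback from the list above; the bit
mappings and the toy bit reader are assumptions, and the normal path with 15
or more bits uses the Laplace coder and is not shown:

    typedef struct { const unsigned char *buf; int pos; } BitReader;

    static int read_bit(BitReader *br)
    {
        int bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        br->pos++;
        return bit;
    }

    /* Forced energy delta for one band, given how many whole bits remain. */
    static int coarse_delta_fallback(BitReader *br, int bits_left)
    {
        if (bits_left >= 2) {
            if (!read_bit(br))
                return 0;                      /* "0"  -> delta 0    (1 bit)  */
            return read_bit(br) ? 1 : -1;      /* "1x" -> delta +/-1 (2 bits) */
        }
        if (bits_left == 1)
            return read_bit(br) ? -1 : 0;      /* range limited to [-1...0]   */
        return -1;                             /* no bits left: force decay   */
    }
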
In order to enable simple interaction between VBR and these
packet-size enforced limits, many of which are encountered before
VBR is run, the maximum packet size VBR will allow is computed at
the beginning of the encoding function, and the buffer reduced to
that size immediately.
Later, when it is time to make the VBR decision, the minimum packet
size is set high enough to ensure that no decision made thus far
will have been affected by the packet size.
As long as this is smaller than the up-front maximum, all of the
encoder's decisions will remain in-sync with the decoder.
If it is larger than the up-front maximum, the packet size is kept
at that maximum, also ensuring sync.
The minimum used now is slightly larger than it used to be, because
it also includes the bits added for dynalloc boosting.
Such boosting is shut off by the encoder at low rates, and so
should not cause any serious issues at the rates where we would
actually run out of room before compute_allocation().
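
A sketch of that interaction with assumed names and simple byte rounding: the
maximum is fixed (and the coder's buffer shrunk to it) before any symbols are
coded, and the later minimum covers everything already committed, including
the dynalloc boosts, so the final clamp cannot contradict any earlier
decision.

    static int vbr_payload_bytes(int target_bytes, int bits_used_so_far,
                                 int dynalloc_boost_bits, int max_bytes)
    {
        int min_bytes = (bits_used_so_far + dynalloc_boost_bits + 7) / 8;
        int bytes = target_bytes;
        if (bytes < min_bytes) bytes = min_bytes;   /* protect earlier decisions */
        if (bytes > max_bytes) bytes = max_bytes;   /* the up-front cap wins     */
        return bytes;
    }
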