Commit graph

64 commits

Author SHA1 Message Date
Jean-Marc Valin
83e95a5ce6 Don't compute linear activation in-place
saves a few cycles
2023-08-01 21:26:16 -04:00
Jean-Marc Valin
e62fd5c5c9 C implementation of FWGAN 2023-08-01 19:19:13 -04:00
Jean-Marc Valin
e9f8402a71 Handle float matrices with multiple of 8 rows 2023-08-01 19:16:27 -04:00
Jean-Marc Valin
5eaa4a504f Add Gated Linear Unit (GLU) 2023-08-01 17:52:49 -04:00
Jean-Marc Valin
b1f94b1e92 Add compute_generic_dense()
And missing prototypes
2023-07-27 19:54:10 -04:00
Jean-Marc Valin
60d67b1112 New compute_generic_conv1d() 2023-07-27 19:54:10 -04:00
Jean-Marc Valin
62cd1c963b Transition to LinearLayer and remove unused code 2023-07-20 01:01:34 -04:00
Jean-Marc Valin
f5a68a41b0 Add generic linear layer
Should be able to handle all previous GRU variants and more.
2023-07-20 01:01:32 -04:00
Jean-Marc Valin
8423ef1de2 Remove unused code 2023-07-20 01:01:29 -04:00
Jean-Marc Valin
f9f35904f4 No longer need to #include "common.h" 2023-06-27 17:13:06 -04:00
Jean-Marc Valin
9f4fc8bbfa Replacing RNN_ macros with existing OPUS_ ones 2023-06-23 00:02:12 -04:00
Marcus Asteborg
f36685fc97 Remove trailing whitespace in dnn 2023-06-22 13:58:37 -07:00
xnorpx
5b96946277 Use pragma message instead of warning on MSVC
Signed-off-by: Jean-Marc Valin <jmvalin@amazon.com>
2023-05-23 02:31:09 -04:00
Jan Buethe
e6390e34c7 removed compute_dense function (conflict with opus mlp) 2022-10-21 12:33:34 +00:00
Jan Buethe
1978cc6094 refactoring 2022-10-21 12:13:38 +00:00
Jan Buethe
c1b357ed47 first attempt of C implementation of fec encoder (not tested yet due to NEON/DOT_PROD not being separable) 2022-10-18 19:30:23 +02:00
Jean-Marc Valin
f3bc6bacd2 Avoiding tmp buffer overflows 2022-02-03 00:27:20 -05:00
Jean-Marc Valin
2f5b51c94a Avoiding symbol clashes with Opus 2022-01-24 23:21:31 -05:00
Jean-Marc Valin
805fed733a Fix warnings 2022-01-24 16:33:32 -05:00
Jean-Marc Valin
57f5681987 Add swish activation support 2022-01-24 16:22:29 -05:00
Jean-Marc Valin
60a009b457 Making codebase C90-compliant 2022-01-19 18:10:44 -05:00
Jean-Marc Valin
3a47548536 Using KISS99 (taken from Daala) as RNG 2021-11-10 17:58:51 -05:00
Jean-Marc Valin
e4b4613d05 Fix signed-unsigned biases 2021-09-02 02:34:08 -04:00
Jean-Marc Valin
51ef273e06 Using 8-bit recurrent weights for GRU B 2021-09-02 02:33:55 -04:00
Jean-Marc Valin
8bdbbfa18d Support for sparse GRU B input matrices
Only on the C side, no sparse GRU B training yet
2021-07-16 03:07:26 -04:00
Jean-Marc Valin
c74330e850 Pre-compute GRU B conditioning
Adapted from PR: https://github.com/mozilla/LPCNet/pull/134
by zhuxiaoxu <zhuxiaoxu@ainirobot.com>
but had to be reworked due to previous weight quantization changes.
2021-07-15 16:06:56 -04:00
Jean-Marc Valin
7d8b00f11d Sampling directly from the logit
Avoids having to compute a sigmoid
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
7cef98ec8c Minor optimization: merging all 3 embeddings 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
006556036a Cleaning up the sparse GRU
It no longer overwrites its input vector
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
d332100808 Representing output pdf as binary probability tree
Saves on the MDense/softmax computation since we only need to compute
8 values instead of 256.
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
8c4b88cfab Using a bisection search for sampling 2021-06-30 18:14:12 -04:00
Jean-Marc Valin
e35441f2cc Faster activation functions for AVX
Using rational function approximation for tanh() and sigmoid.
2021-06-29 04:05:48 -04:00
Jean-Marc Valin
5571ef1b8e minor optimization: removing some copying 2021-06-26 01:27:03 -04:00
Jean-Marc Valin
83657d0e43 Dot product AVX2 code for non-sparse multiply 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
40b309d92b WIP: 8-bit SIMD for GRU B 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
06489b42dd oops, fix number of columns 2021-01-16 02:11:20 -05:00
Jean-Marc Valin
bce779886d WIP: signed*unsigned arithmetic 2021-01-16 02:11:20 -05:00
Jean-Marc Valin
73a05f55c7 wip 8x4 2021-01-16 02:11:19 -05:00
Jean-Marc Valin
14fb264a0f Fix sampling bug for 16-bit rand()
According to David Rowe, when rand() returns RAND_MAX (which is likely
for 16-bit output), we end up producing a click.
2020-06-20 23:47:04 -04:00
David Rowe
7dc696b9a4 refactored for different machines, sgemv_accum16 using NEON intrinsics
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
2018-12-10 21:28:29 -05:00
Jean-Marc Valin
771cc7868a Support for plain AVX with no FMA 2018-12-04 07:58:13 -05:00
Jean-Marc Valin
b05f950e38 Using the right name: s/gemm/sgemv/ 2018-11-30 10:56:44 -05:00
Jean-Marc Valin
c395a68b7d moving code around 2018-11-30 10:46:32 -05:00
Jean-Marc Valin
05f4851dcd Making the code work even without AVX2/FMA 2018-11-30 10:32:04 -05:00
Jean-Marc Valin
d7f0abcd19 Delaying the softmax() to avoid the pow()
Now at 5x real-time, with all the low-hanging fruit done.
2018-11-29 20:09:36 -05:00
Jean-Marc Valin
faf3fe3d24 gemm_accum16() doesn't need a multiple of 16 columns (just lines). 2018-11-29 19:50:09 -05:00
Jean-Marc Valin
7ee79b63df Add AVX versions of exp(), tanh() and sigmoid()
Now 3x faster than real-time
2018-11-29 19:43:59 -05:00
Jean-Marc Valin
4de3e53a73 Adding some sparse GRU support
Still need to properly dump as sparse.
2018-11-28 18:49:19 -05:00
Jean-Marc Valin
ec671ed90e Quick and dirty AVX2 implementation of gemm_accum
Brings us very close to real-time
2018-11-28 14:57:22 -05:00
Jean-Marc Valin
732fce9ab2 Pre-computing GRU_A's input contribution. 2018-11-28 14:05:36 -05:00