Commit graph

47 commits

Author SHA1 Message Date
Jean-Marc Valin
2f5b51c94a Avoiding symbol clashes with Opus 2022-01-24 23:21:31 -05:00
Jean-Marc Valin
805fed733a Fix warnings 2022-01-24 16:33:32 -05:00
Jean-Marc Valin
57f5681987 Add swish activation support 2022-01-24 16:22:29 -05:00
Jean-Marc Valin
60a009b457 Making codebase C90-compliant 2022-01-19 18:10:44 -05:00
Jean-Marc Valin
3a47548536 Using KISS99 (taken from Daala) as RNG 2021-11-10 17:58:51 -05:00
Jean-Marc Valin
e4b4613d05 Fix signed-unsigned biases 2021-09-02 02:34:08 -04:00
Jean-Marc Valin
51ef273e06 Using 8-bit recurrent weights for GRU B 2021-09-02 02:33:55 -04:00
Jean-Marc Valin
8bdbbfa18d Support for sparse GRU B input matrices
Only on the C side, no sparse GRU B training yet
2021-07-16 03:07:26 -04:00
Jean-Marc Valin
c74330e850 Pre-compute GRU B conditioning
Adapted from PR: https://github.com/mozilla/LPCNet/pull/134
by zhuxiaoxu <zhuxiaoxu@ainirobot.com>
but had to be reworked due to previous weight quantization changes.
2021-07-15 16:06:56 -04:00
Jean-Marc Valin
7d8b00f11d Sampling directly from the logit
Avoids having to compute a sigmoid
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
7cef98ec8c Minor optimization: merging all 3 embeddings 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
006556036a Cleaning up the sparse GRU
It no longer overwrites its input vector
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
d332100808 Representing output pdf as binary probability tree
Saves on the MDense/softmax computation since we only need to compute
8 values instead of 256.
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
8c4b88cfab Using a bisection search for sampling 2021-06-30 18:14:12 -04:00
Jean-Marc Valin
e35441f2cc Faster activation functions for AVX
Using rational function approximation for tanh() and sigmoid.
2021-06-29 04:05:48 -04:00
Jean-Marc Valin
5571ef1b8e minor optimization: removing some copying 2021-06-26 01:27:03 -04:00
Jean-Marc Valin
83657d0e43 Dot product AVX2 code for non-sparse multiply 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
40b309d92b WIP: 8-bit SIMD for GRU B 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
06489b42dd oops, fix number of columns 2021-01-16 02:11:20 -05:00
Jean-Marc Valin
bce779886d WIP: signed*unsigned arithmetic 2021-01-16 02:11:20 -05:00
Jean-Marc Valin
73a05f55c7 wip 8x4 2021-01-16 02:11:19 -05:00
Jean-Marc Valin
14fb264a0f Fix sampling bug for 16-bit rand()
According to David Rowe, when rand() returns RAND_MAX (which is likely
for 16-bit output), we end up producing a click.
2020-06-20 23:47:04 -04:00
David Rowe
7dc696b9a4 refactored for different machines, sgemv_accum16 using NEON intrinsics
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
2018-12-10 21:28:29 -05:00
Jean-Marc Valin
771cc7868a Support for plain AVX with no FMA 2018-12-04 07:58:13 -05:00
Jean-Marc Valin
b05f950e38 Using the right name: s/gemm/sgemv/ 2018-11-30 10:56:44 -05:00
Jean-Marc Valin
c395a68b7d moving code around 2018-11-30 10:46:32 -05:00
Jean-Marc Valin
05f4851dcd Making the code work even without AVX2/FMA 2018-11-30 10:32:04 -05:00
Jean-Marc Valin
d7f0abcd19 Delaying the softmax() to avoid the pow()
Now at 5x real-time, with all the low-hanging fruit done.
2018-11-29 20:09:36 -05:00
Jean-Marc Valin
faf3fe3d24 gemm_accum16() doesn't need a multiple of 16 columns (just lines). 2018-11-29 19:50:09 -05:00
Jean-Marc Valin
7ee79b63df Add AVX versions of exp(), tanh() and sigmoid()
Now 3x faster than real-time
2018-11-29 19:43:59 -05:00
Jean-Marc Valin
4de3e53a73 Adding some sparse GRU support
Still need to properly dump as sparse.
2018-11-28 18:49:19 -05:00
Jean-Marc Valin
ec671ed90e Quick and dirty AVX2 implementation of gemm_accum
Brings us very close to real-time
2018-11-28 14:57:22 -05:00
Jean-Marc Valin
732fce9ab2 Pre-computing GRU_A's input contribution. 2018-11-28 14:05:36 -05:00
Jean-Marc Valin
040aa437c3 Simpler GRU implementation just for reset_after.
Jean-Marc Valin
36a0bf8c75 Wow, managed two bugs in a 25-character line 2018-11-27 14:50:38 -05:00
Jean-Marc Valin
c7b978b923 Fix reset_after GRU 2018-11-27 14:37:10 -05:00
Jean-Marc Valin
4ccfbdff04 Frame network seems to be working 2018-11-26 18:41:54 -05:00
Jean-Marc Valin
538f25565a Starting to actually test this -- fix a few OOB reads 2018-11-26 16:02:49 -05:00
Jean-Marc Valin
575d8d6fa4 Adding sampling 2018-11-26 11:04:41 -05:00
Jean-Marc Valin
7119eaf33b Plumbing for the frame rate network 2018-11-25 17:20:24 -05:00
Jean-Marc Valin
141830ce5a Fixing includes 2018-11-24 16:00:30 -05:00
Jean-Marc Valin
37fbcaee0b mdense max size 2018-11-24 15:51:08 -05:00
Jean-Marc Valin
94ac0841df Precomputing sizes 2018-11-24 15:47:48 -05:00
Jean-Marc Valin
c025744e34 Fix conv1d, default to size 384 2018-11-24 15:30:17 -05:00
Jean-Marc Valin
66486004ba Implement MDense 2018-11-24 12:23:11 -05:00
Jean-Marc Valin
d4046036a9 Dump Conv1D (didn't check weight ordering at all) 2018-11-24 11:32:01 -05:00
Jean-Marc Valin
b9cd61be8b Work in progress translation to C 2018-11-23 19:43:58 -05:00
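
Note on d332100808 (binary probability tree): the commit message says that only 8 values need to be computed instead of 256. The sketch below illustrates why that can work for an 8-bit output. It is a minimal illustration under stated assumptions, not the code from the commit: the function name sample_from_tree, the heap-style node layout, and the convention that each node stores the probability of a 1 bit are all hypothetical choices made for this example.

```c
/* Illustrative sketch only; names and layout are assumptions, not taken
   from the repository. Written to stay within C90. */

#define TREE_LEVELS 8                    /* 8 binary decisions -> 256 leaves */
#define TREE_NODES (1 << TREE_LEVELS)    /* slots 1..255 hold internal nodes */

/* Walk the tree from the root to a leaf.
   p_one[n] is the probability that the bit decided at node n is 1.
   Heap layout: the children of node n are 2n (bit 0) and 2n+1 (bit 1).
   u[] supplies one uniform draw in [0,1) per level.
   Only TREE_LEVELS entries of p_one are read, one per level. */
int sample_from_tree(const float *p_one, const float *u)
{
    int node = 1;
    int level;
    for (level = 0; level < TREE_LEVELS; level++) {
        int bit = u[level] < p_one[node];   /* 1 with probability p_one[node] */
        node = 2 * node + bit;
    }
    return node - TREE_NODES;               /* leaf index in 0..255 */
}
```

Because each level reads a single node probability, a model that can evaluate node probabilities on demand would need only the 8 values along the chosen path, where a flat 256-way softmax needs all 256 outputs. Passing the uniform draws in as an array is a choice made here to keep the sketch deterministic and testable; it says nothing about how the repository draws its random numbers.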