Commit graph

456 commits

Jean-Marc Valin
8bdbbfa18d Support for sparse GRU B input matrices
Only on the C side, no sparse GRU B training yet
2021-07-16 03:07:26 -04:00
Jean-Marc Valin
4c0e224865 Model update
Weights are the same, but they are dumped to C differently.
2021-07-15 16:12:42 -04:00
Jean-Marc Valin
c74330e850 Pre-compute GRU B conditioning
Adapted from PR: https://github.com/mozilla/LPCNet/pull/134
by zhuxiaoxu <zhuxiaoxu@ainirobot.com>
but had to be reworked due to previous weight quantization changes.
2021-07-15 16:06:56 -04:00
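A note on the commit above: GRU B's input includes per-frame conditioning features that stay constant for all samples of a frame, so that part of the input-side matrix product can be computed once per frame instead of once per sample. A minimal sketch of the idea, with hypothetical sizes and names (gru_b_input_weights, frame_features) that are not the project's actual identifiers:

    import numpy as np

    # Hypothetical sizes: GRU B hidden size H, conditioning size C, 160 samples per frame.
    H, C, samples_per_frame = 16, 128, 160
    rng = np.random.default_rng(0)
    gru_b_input_weights = rng.standard_normal((C, 3 * H))  # input weights for the 3 gates
    frame_features = rng.standard_normal(C)                # conditioning, constant over the frame

    # Naive: recompute the same product for every sample of the frame.
    naive = [frame_features @ gru_b_input_weights for _ in range(samples_per_frame)]

    # Pre-computed: one matrix product per frame, reused for every sample.
    precomputed = frame_features @ gru_b_input_weights
    print(all(np.allclose(s, precomputed) for s in naive))  # True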
Jean-Marc Valin
0d53fad50d Using np.memmap() to load the training data
Makes loading faster
2021-07-14 13:47:23 -04:00
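np.memmap maps the training file into memory lazily, so batches are paged in on demand rather than the whole file being read up front. A hedged sketch of that loading pattern; the file name and feature count are placeholders, not the training script's real values:

    import numpy as np

    features_per_frame = 55  # placeholder, not the script's actual value
    data = np.memmap('features.f32', dtype='float32', mode='r')  # mapped lazily, not read up front
    nb_frames = len(data) // features_per_frame
    features = data[:nb_frames * features_per_frame].reshape((nb_frames, features_per_frame))

    # Only the slices actually touched get paged in, e.g. one minibatch at a time:
    batch = np.array(features[:32])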
Jean-Marc Valin
5a51e2eed1 Adding command-line options to training script 2021-07-13 03:09:04 -04:00
Jean-Marc Valin
1edf5d7986 README.md update 2021-07-11 03:46:25 -04:00
Jean-Marc Valin
4298f2f9e1 Adding support for SSE2 and SSSE3 2021-07-11 03:36:20 -04:00
Jean-Marc Valin
116bcb38fb Adding SSE 4.1 for older platforms
AVX without AVX2 should now work again too.
2021-07-10 14:08:01 -04:00
Jean-Marc Valin
3e223e6015 Fixes Python inference for the binary probability tree 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
f8f12e7f3c NEON float->char conversion (same as the AVX2 version) 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
a1079c2ce3 Again, same conversion as 3206cec, for NEON 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
7d8b00f11d Sampling directly from the logit
Avoids having to compute a sigmoid
2021-07-10 01:59:49 -04:00
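The trick in the commit above: drawing a Bernoulli sample by comparing sigmoid(x) to a uniform u is equivalent to comparing x to logit(u), because logit is strictly increasing, so the per-sample sigmoid can be skipped. A small numerical sketch of the equivalence (illustrative only):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100000)    # logits
    u = rng.uniform(size=100000)

    via_sigmoid = u < 1.0 / (1.0 + np.exp(-x))   # compute sigmoid(x), compare to u
    via_logit = np.log(u / (1.0 - u)) < x        # compare x to logit(u); no sigmoid needed

    print(np.mean(via_sigmoid == via_logit))     # 1.0 up to floating-point rounding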
Jean-Marc Valin
e8f70128d5 same conversion cleanup as 3206cec for sgemv_accum8x4() 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
7cef98ec8c Minor optimization: merging all 3 embeddings 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
714380e71b More manual unrolling 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
006556036a Cleaning up the sparse GRU
It no longer overwrites its input vector
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
44fe055682 cleanup float<->int conversions 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
60d6eab63d Doing a bit of unrolling to speed things up 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
3e7ab9ff87 update model 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
54abdb6f5d Sparse matrix indexing optimization
The 4* is now stored in the table to avoid computing it in the loop
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
2681822c18 update model 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
d332100808 Representing output pdf as binary probability tree
Saves on the MDense/softmax computation since we only need to compute
8 values instead of 256.
2021-07-10 01:59:49 -04:00
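With 256 output levels, a full softmax needs all 256 probabilities; organizing the output as a binary probability tree lets each sample be drawn with 8 successive binary decisions, one per bit, so only the probabilities along the path are needed. A toy sketch of that sampling walk; the p_bit layout is purely illustrative, not the project's actual tree representation:

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical layout: p_bit[d][node] = P(next bit is 1) at depth d for that node.
    p_bit = [rng.random(1 << d) for d in range(8)]

    node = 0
    for d in range(8):                             # 8 binary decisions instead of a 256-way softmax
        bit = int(rng.random() < p_bit[d][node])   # only the probability on the path is used
        node = (node << 1) | bit
    print(node)                                    # sampled index in [0, 255]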
Jean-Marc Valin
c151fc1853 Merge branch 'exp_improved_simd2' 2021-06-30 18:56:04 -04:00
Jean-Marc Valin
f0ce43389a Update test_lpcnet.py, remove old TF1 code 2021-06-30 18:54:27 -04:00
Jean-Marc Valin
d428b0d32a Update model 2021-06-30 18:27:31 -04:00
Jean-Marc Valin
8c4b88cfab Using a bisection search for sampling 2021-06-30 18:14:12 -04:00
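Sampling an index from a cumulative distribution by bisection takes about log2(256) = 8 comparisons instead of a linear scan of up to 256 entries. A minimal sketch under that assumption (not the project's actual sampling code):

    import numpy as np

    rng = np.random.default_rng(0)
    pdf = rng.random(256)
    pdf /= pdf.sum()
    cdf = np.cumsum(pdf)

    u = rng.random()
    lo, hi = 0, 255
    while lo < hi:                    # about 8 bisection steps for 256 entries
        mid = (lo + hi) // 2
        if cdf[mid] < u:
            lo = mid + 1
        else:
            hi = mid
    print(lo)                         # sampled index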
Jean-Marc Valin
e35441f2cc Faster activation functions for AVX
Using rational function approximation for tanh() and sigmoid.
2021-06-29 04:05:48 -04:00
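A rational (polynomial-over-polynomial) approximation replaces the exp()-based tanh and sigmoid with a few multiply-adds and one division, which vectorizes well. A sketch of the general idea; the coefficients below are a textbook Pade-style example, not the ones used in the AVX code:

    import numpy as np

    def tanh_rational(x):
        # Illustrative approximation of tanh; treat the coefficients and the
        # clamping range as placeholders, not the project's actual constants.
        x = np.clip(x, -3.0, 3.0)
        x2 = x * x
        return x * (27.0 + x2) / (27.0 + 9.0 * x2)

    def sigmoid_rational(x):
        return 0.5 * (1.0 + tanh_rational(0.5 * x))   # sigmoid(x) = (1 + tanh(x/2)) / 2

    x = np.linspace(-4.0, 4.0, 9)
    print(np.max(np.abs(tanh_rational(x) - np.tanh(x))))   # approximation error stays small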
Jean-Marc Valin
5571ef1b8e minor optimization: removing some copying 2021-06-26 01:27:03 -04:00
Jean-Marc Valin
d61f7e00f8 Fix missing transpose in the sparsity code

CuDNNGRU and GRU don't use the same weight format
2021-06-25 13:43:37 -04:00
Jean-Marc Valin
ca0a43bee9 Update README.md 2021-06-24 17:47:51 -04:00
Jean-Marc Valin
c1535c8ccf Adding option to disable int8 dot products 2021-06-24 17:31:05 -04:00
Jean-Marc Valin
0b9f6bab81 Remove unnecessary mask in exp() approximation
This isn't necessary since valid exponents can't flip the sign bit
2021-06-21 01:34:38 -04:00
Jean-Marc Valin
ae2ae5ead6 Remove useless multiply by one
See bffdcee95 (commitcomment-46372726)
2021-06-21 01:30:51 -04:00
Jean-Marc Valin
c7ba313a67 Adding extra constraint to avoid saturation for SSE/AVX2
When implemented using SSSE3 or AVX2, our dot products can saturate
if two adjacent weights sum to more than 127.
2021-06-18 17:39:35 -04:00
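The concern in the commit above is the SSSE3/AVX2 style of multiply-accumulate that multiplies unsigned 8-bit activations by signed 8-bit weights and adds adjacent pairs into a signed 16-bit result. A short worked check of why bounding the sum of adjacent weights avoids saturation; the pairing shown here is a simplified illustration of that arithmetic:

    # Worst case without the constraint: activation 255 times two adjacent weights of 127.
    act = 255
    w1, w2 = 127, 127
    print(act * w1 + act * w2, act * w1 + act * w2 > 32767)   # 64770 > 32767: would saturate

    # With the constraint that adjacent weights sum to at most 127:
    w1, w2 = 100, 27
    print(act * w1 + act * w2, act * w1 + act * w2 <= 32767)  # 32385 <= 32767: safe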
Jean-Marc Valin
237245f815 Support for multi-GPU training
Not sure why CuDNNGRU doesn't get used by default, but we need
to explicitly use it to get things to run fast.
2021-06-18 13:20:43 -04:00
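For context on the commit above, a hedged TF2/Keras sketch of the general pattern: multi-GPU training wrapped in tf.distribute.MirroredStrategy, with the GRU configured so the fast cuDNN kernel can be used. Layer sizes and shapes are placeholders; this is not the project's actual training code, which selects the cuDNN GRU explicitly:

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()       # replicates the model over visible GPUs
    with strategy.scope():
        inputs = tf.keras.Input(shape=(None, 128))
        # With the default tanh/sigmoid activations, no recurrent dropout and
        # reset_after=True, Keras can dispatch this GRU to the cuDNN kernel on GPU.
        x = tf.keras.layers.GRU(384, return_sequences=True, reset_after=True)(inputs)
        outputs = tf.keras.layers.Dense(256, activation='softmax')(x)
        model = tf.keras.Model(inputs, outputs)
        model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')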
Jean-Marc Valin
ebc9483b4c update model 2021-02-01 01:07:35 -05:00
Jean-Marc Valin
79980b2044 Minor update to training scripts 2021-01-18 02:13:52 -05:00
Jean-Marc Valin
20fea538c2 more reasonable noise
was increased too much in 713d53e8a
2021-01-17 21:39:42 -05:00
Jean-Marc Valin
b9c230b346 Add NEON intrinsics 2021-01-16 02:11:22 -05:00
Jean-Marc Valin
b214e684c1 Neon WIP: Compiles but very slow 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
8c3fe6f31d Cleaning up float version 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
40b9fd0a75 Fix some quantization issues 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
83657d0e43 Dot product AVX2 code for non-sparse multiply 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
1707b960de cleanup, add signed-unsigned biases 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
40b309d92b WIP: 8-bit SIMD for GRU B 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
e695355ba5 some cleanup 2021-01-16 02:11:20 -05:00
Jean-Marc Valin
06489b42dd oops, fix number of columns 2021-01-16 02:11:20 -05:00
Jean-Marc Valin
d87f974431 Vectorizing conversion 2021-01-16 02:11:20 -05:00
Jean-Marc Valin
6b582edbed WIP: remove scalar code from AVX2 code 2021-01-16 02:11:20 -05:00
Jean-Marc Valin
be392e3857 WIP: Got some AVX2 code working 2021-01-16 02:11:20 -05:00