Jean-Marc Valin
|
8bdbbfa18d
|
Support for sparse GRU B input matrices
Only on the C side, no sparse GRU B training yet
|
2021-07-16 03:07:26 -04:00 |
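A rough sketch (in Python, for illustration only) of the kind of block-sparse matrix-vector product the C side can use for GRU B's input matrices; the block size and storage layout here are made up, not the actual LPCNet format:

```python
import numpy as np

def block_sparse_matvec(block_cols, blocks, x, bs=4):
    """block_cols[i]: column-block indices of the non-zero blocks in output row-block i
       blocks[i]:     the matching (bs x bs) dense weight blocks"""
    out = np.zeros(len(block_cols) * bs, dtype=np.float32)
    for i, (cols, ws) in enumerate(zip(block_cols, blocks)):
        acc = np.zeros(bs, dtype=np.float32)
        for c, w in zip(cols, ws):
            acc += w @ x[c * bs:(c + 1) * bs]   # only the stored blocks are visited
        out[i * bs:(i + 1) * bs] = acc
    return out
```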
|
Jean-Marc Valin
|
4c0e224865
|
Model update
Weights are the same, but they are dumped to C differently.
|
2021-07-15 16:12:42 -04:00 |
|
Jean-Marc Valin
|
c74330e850
|
Pre-compute GRU B conditioning
Adapted from PR: https://github.com/mozilla/LPCNet/pull/134
by zhuxiaoxu <zhuxiaoxu@ainirobot.com>
but had to be reworked due to previous weight quantization changes.
|
2021-07-15 16:06:56 -04:00 |
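The idea in a nutshell: the conditioning part of GRU B's input is constant for all samples in a frame, so its contribution to the gate inputs can be computed once per frame rather than once per sample. A minimal numpy sketch of that reuse (names and shapes are illustrative):

```python
import numpy as np

def gru_b_gate_input(W, sample_part, cond_part, cond_contrib=None):
    """Gate input = W @ concat(sample_part, cond_part).

    Splitting W by columns lets the conditioning term be pre-computed
    once per frame and reused for every sample in that frame."""
    n = len(sample_part)
    W_sample, W_cond = W[:, :n], W[:, n:]
    if cond_contrib is None:
        cond_contrib = W_cond @ cond_part      # once per frame
    return W_sample @ sample_part + cond_contrib, cond_contrib
```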
|
Jean-Marc Valin
|
0d53fad50d
|
Using np.memmap() to load the training data
Makes loading faster
|
2021-07-14 13:47:23 -04:00 |
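For reference, a minimal sketch of memory-mapping a raw training-data file instead of reading it all up front; the file name, dtype and feature width are placeholders:

```python
import numpy as np

# Pages are only read from disk when the corresponding slice is accessed,
# so start-up is much faster than loading the whole file into memory.
features = np.memmap('features.f32', dtype='float32', mode='r')
features = features.reshape(-1, 36)      # illustrative feature dimension

batch = features[:128]                   # only this slice is actually read
```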
|
Jean-Marc Valin
|
5a51e2eed1
|
Adding command-line options to training script
|
2021-07-13 03:09:04 -04:00 |
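A minimal sketch of what command-line options for a training script can look like with argparse; the flag names and defaults here are hypothetical, not the script's actual options:

```python
import argparse

parser = argparse.ArgumentParser(description='Train an LPCNet model')
parser.add_argument('features', help='binary feature file')
parser.add_argument('data', help='binary audio data file')
parser.add_argument('--batch-size', type=int, default=128)    # hypothetical flag
parser.add_argument('--epochs', type=int, default=120)        # hypothetical flag
args = parser.parse_args()
print(args)
```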
|
Jean-Marc Valin
|
1edf5d7986
|
README.md update
|
2021-07-11 03:46:25 -04:00 |
|
Jean-Marc Valin
|
4298f2f9e1
|
Adding support for SSE2 and SSSE3
|
2021-07-11 03:36:20 -04:00 |
|
Jean-Marc Valin
|
116bcb38fb
|
Adding SSE 4.1 for older platforms
AVX without AVX2 should now work again too.
|
2021-07-10 14:08:01 -04:00 |
|
Jean-Marc Valin
|
3e223e6015
|
Fixes Python inference for the binary probability tree
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
f8f12e7f3c
|
NEON float->char conversion (same as the AVX2 version)
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
a1079c2ce3
|
Again, same conversion as 3206cec, for NEON
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
7d8b00f11d
|
Sampling directly from the logit
Avoids having to compute a sigmoid
|
2021-07-10 01:59:49 -04:00 |
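The underlying equivalence: for a binary decision with logit x, drawing u ~ Uniform(0,1) and testing u < sigmoid(x) gives the same result as testing logit(u) < x, so the sigmoid can be dropped in favour of transforming the random number (or using a pre-tabulated threshold). A small numpy illustration, not the actual C code:

```python
import numpy as np

rng = np.random.default_rng(0)
x = 0.3                                  # logit of one branch decision
u = rng.random()

p = 1.0 / (1.0 + np.exp(-x))             # with sigmoid
decision_sigmoid = u < p

logit_u = np.log(u / (1.0 - u))          # without sigmoid: compare in logit space
decision_logit = logit_u < x

assert decision_sigmoid == decision_logit
```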
|
Jean-Marc Valin
|
e8f70128d5
|
same conversion cleanup as 3206cec for sgemv_accum8x4()
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
7cef98ec8c
|
Minor optimization: merging all 3 embeddings
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
714380e71b
|
More manual unrolling
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
006556036a
|
Cleaning up the sparse GRU
It no longer overwrites its input vector
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
44fe055682
|
cleanup float<->int conversions
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
60d6eab63d
|
Doing a bit of unrolling to speed things up
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
3e7ab9ff87
|
update model
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
54abdb6f5d
|
Sparse matrix indexing optimization
The 4* factor is now stored in the table to avoid computing it in the loop
|
2021-07-10 01:59:49 -04:00 |
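Conceptually the change just pre-multiplies the stride into the index table so the inner loop is a plain lookup; a toy Python rendering of the before/after (the real code is C):

```python
weights = list(range(100))
idx = [3, 7, 12]

# Before: the 4* is computed on every loop iteration.
acc = sum(weights[4 * i] for i in idx)

# After: the 4* is stored in the table itself.
idx4 = [4 * i for i in idx]
acc2 = sum(weights[col] for col in idx4)

assert acc == acc2
```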
|
Jean-Marc Valin
|
2681822c18
|
update model
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
d332100808
|
Representing output pdf as binary probability tree
Saves on the MDense/softmax computation since we only need to compute
8 values instead of 256.
|
2021-07-10 01:59:49 -04:00 |
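The idea: instead of evaluating a 256-way softmax, the output byte is produced one bit at a time down a depth-8 binary tree, so only 8 conditional probabilities are needed per sample. A small Python sketch of sampling from such a tree (the probability layout is illustrative):

```python
import numpy as np

def sample_from_tree(p_right, rng):
    """p_right[d][node] = P(bit d is 1 | the first d bits led to `node`)."""
    node = 0
    for d in range(8):                     # 8 binary decisions instead of 256 outputs
        bit = int(rng.random() < p_right[d][node])
        node = (node << 1) | bit
    return node                            # value in 0..255

rng = np.random.default_rng(0)
p_right = [np.full(1 << d, 0.5) for d in range(8)]   # uniform toy distribution
print(sample_from_tree(p_right, rng))
```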
|
Jean-Marc Valin
|
c151fc1853
|
Merge branch 'exp_improved_simd2'
|
2021-06-30 18:56:04 -04:00 |
|
Jean-Marc Valin
|
f0ce43389a
|
Update test_lpcnet.py, remove old TF1 code
|
2021-06-30 18:54:27 -04:00 |
|
Jean-Marc Valin
|
d428b0d32a
|
Update model
|
2021-06-30 18:27:31 -04:00 |
|
Jean-Marc Valin
|
8c4b88cfab
|
Using a bisection search for sampling
|
2021-06-30 18:14:12 -04:00 |
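A minimal sketch of sampling from a discrete distribution with a bisection search on its cumulative sum instead of a linear scan (illustrative only):

```python
import numpy as np

def sample_bisect(pdf, rng):
    cdf = np.cumsum(pdf)
    u = rng.random() * cdf[-1]
    lo, hi = 0, len(pdf) - 1
    while lo < hi:                         # O(log N) rather than O(N)
        mid = (lo + hi) // 2
        if cdf[mid] < u:
            lo = mid + 1
        else:
            hi = mid
    return lo

rng = np.random.default_rng(1)
print(sample_bisect(np.array([0.1, 0.2, 0.3, 0.4]), rng))
```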
|
Jean-Marc Valin
|
e35441f2cc
|
Faster activation functions for AVX
Using rational function approximation for tanh() and sigmoid.
|
2021-06-29 04:05:48 -04:00 |
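For flavour, a small numpy sketch of a rational (Padé-style) tanh approximation and a sigmoid derived from it; the coefficients and clamping actually used in the AVX code may well differ:

```python
import numpy as np

def tanh_approx(x):
    # Rational approximation of tanh; clamping keeps the output in [-1, 1].
    x = np.clip(x, -3.0, 3.0)
    return x * (27.0 + x * x) / (27.0 + 9.0 * x * x)

def sigmoid_approx(x):
    return 0.5 * (1.0 + tanh_approx(0.5 * x))

x = np.linspace(-4.0, 4.0, 9)
print(np.max(np.abs(tanh_approx(x) - np.tanh(x))))    # rough error check
```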
|
Jean-Marc Valin
|
5571ef1b8e
|
minor optimization: removing some copying
|
2021-06-26 01:27:03 -04:00 |
|
Jean-Marc Valin
|
d61f7e00f8
|
Fix missing transpose in the sparsity code
CuDNNGRU and GRU don't use the same weight format
|
2021-06-25 13:43:37 -04:00 |
|
Jean-Marc Valin
|
ca0a43bee9
|
Update README.md
|
2021-06-24 17:47:51 -04:00 |
|
Jean-Marc Valin
|
c1535c8ccf
|
Adding option to disable int8 dot products
|
2021-06-24 17:31:05 -04:00 |
|
Jean-Marc Valin
|
0b9f6bab81
|
Remove unnecessary mask in exp() approximation
This isn't necessary since valid exponents can't flip the sign bit
|
2021-06-21 01:34:38 -04:00 |
|
Jean-Marc Valin
|
ae2ae5ead6
|
Remove useless multiply by one
See bffdcee95 (commitcomment-46372726)
|
2021-06-21 01:30:51 -04:00 |
|
Jean-Marc Valin
|
c7ba313a67
|
Adding extra constraint to avoid saturation for SSE/AVX2
When implemented using SSSE3 or AVX2, our dot products can saturate
if two adjacent weights sum to more than 127.
|
2021-06-18 17:39:35 -04:00 |
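Background: the SSSE3/AVX2 8-bit path multiplies an unsigned activation (up to 255) by two adjacent signed weights and adds the pair into a 16-bit accumulator, so adjacent weight pairs whose magnitudes sum too high can overflow the int16 range. A hedged Python sketch of enforcing the pairwise constraint at quantization time; this is not the project's actual quantization code:

```python
import numpy as np

def constrain_pairs(w_int8, limit=127):
    """Scale down any adjacent (even, odd) weight pair whose absolute values
    sum to more than `limit`, so the 8-bit dot product cannot saturate."""
    w = w_int8.astype(np.float32).reshape(-1, 2)
    pair_sum = np.abs(w).sum(axis=1, keepdims=True)
    scale = np.minimum(1.0, limit / np.maximum(pair_sum, 1.0))
    return np.trunc(w * scale).astype(np.int8).reshape(-1)

w = np.array([120, 30, -5, 10], dtype=np.int8)
print(constrain_pairs(w))        # the (120, 30) pair gets scaled down
```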
|
Jean-Marc Valin
|
237245f815
|
Support for multi-GPU training
Not sure why CuDNNGRU doesn't get used by default, but we need
to explicitly use it to get things to run fast.
|
2021-06-18 13:20:43 -04:00 |
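A hedged sketch of explicit multi-GPU Keras training with tf.distribute.MirroredStrategy; the model below is a stand-in, and the GRU settings shown are the ones that make the cuDNN kernel eligible in TF2-style Keras (the commit itself selects the CuDNNGRU layer explicitly):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()            # one replica per visible GPU
print('replicas:', strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored across GPUs.
    inp = tf.keras.Input(shape=(None, 128))
    x = tf.keras.layers.GRU(384, return_sequences=True,
                            recurrent_activation='sigmoid',
                            reset_after=True)(inp)      # cuDNN-compatible settings
    out = tf.keras.layers.Dense(256, activation='softmax')(x)
    model = tf.keras.Model(inp, out)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```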
|
Jean-Marc Valin
|
ebc9483b4c
|
update model
|
2021-02-01 01:07:35 -05:00 |
|
Jean-Marc Valin
|
79980b2044
|
Minor update to training scripts
|
2021-01-18 02:13:52 -05:00 |
|
Jean-Marc Valin
|
20fea538c2
|
more reasonable noise
was increased too much in 713d53e8a
|
2021-01-17 21:39:42 -05:00 |
|
Jean-Marc Valin
|
b9c230b346
|
Add NEON intrinsics
|
2021-01-16 02:11:22 -05:00 |
|
Jean-Marc Valin
|
b214e684c1
|
Neon WIP: Compiles but very slow
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
8c3fe6f31d
|
Cleaning up float version
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
40b9fd0a75
|
Fix some quantization issues
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
83657d0e43
|
Dot product AVX2 code for non-sparse multiply
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
1707b960de
|
cleanup, add signed-unsigned biases
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
40b309d92b
|
WIP: 8-bit SIMD for GRU B
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
e695355ba5
|
some cleanup
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
06489b42dd
|
oops, fix number of columns
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
d87f974431
|
Vectorizing conversion
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
6b582edbed
|
WIP: remove scalar code from AVX2 code
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
be392e3857
|
WIP: Got some AVX2 code working
|
2021-01-16 02:11:20 -05:00 |
|