Jean-Marc Valin
|
60a009b457
|
Making codebase C90-compliant
|
2022-01-19 18:10:44 -05:00 |
|
Jean-Marc Valin
|
4298f2f9e1
|
Adding support for SSE2 and SSSE3
|
2021-07-11 03:36:20 -04:00 |
|
Jean-Marc Valin
|
116bcb38fb
|
Adding SSE 4.1 for older platforms
AVX without AVX2 should now work again too.
|
2021-07-10 14:08:01 -04:00 |
|
Jean-Marc Valin
|
54abdb6f5d
|
Sparse matrix indexing optimization
The 4* is now stored in the table to avoid computing it in the loop
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
d332100808
|
Representing output pdf as binary probability tree
Saves on the MDense/softmax computation since we only need to compute
8 values instead of 256.
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
c1535c8ccf
|
Adding option to disable int8 dot products
|
2021-06-24 17:31:05 -04:00 |
|
Jean-Marc Valin
|
b214e684c1
|
Neon WIP: Compiles but very slow
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
8c3fe6f31d
|
Cleaning up float version
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
83657d0e43
|
Dot product AVX2 code for non-sparse multiply
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
1707b960de
|
cleanup, add signed-unsigned biases
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
40b309d92b
|
WIP: 8-bit SIMD for GRU B
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
e695355ba5
|
some cleanup
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
be392e3857
|
WIP: Got some AVX2 code working
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
bce779886d
|
WIP: signed*unsigned arithmetic
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
11736ca9e3
|
WIP: 8-bit mul
|
2021-01-16 02:11:19 -05:00 |
|
Jean-Marc Valin
|
73a05f55c7
|
wip 8x4
|
2021-01-16 02:11:19 -05:00 |
|
Jean-Marc Valin
|
a8fb25f11c
|
Remove NaN checks
|
2019-03-20 13:36:42 -04:00 |
|
David Rowe
|
7dc696b9a4
|
refactored for different machines, sgemv_accum16 using NEON intrinsics
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
|
2018-12-10 21:28:29 -05:00 |
|