Commit graph

18 commits

Author SHA1 Message Date
Jean-Marc Valin
60a009b457 Making codebase C90-compliant 2022-01-19 18:10:44 -05:00
Jean-Marc Valin
4298f2f9e1 Adding support for SSE2 and SSSE3 2021-07-11 03:36:20 -04:00
Jean-Marc Valin
116bcb38fb Adding SSE 4.1 for older platforms
AVX without AVX2 should now work again too.
2021-07-10 14:08:01 -04:00
Jean-Marc Valin
54abdb6f5d Sparse matrix indexing optimization
The 4* is now stored in the table to avoid computing it in the loop
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
d332100808 Representing output pdf as binary probability tree
Saves on the MDense/softmax computation since we only need to compute
8 values instead of 256.
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
c1535c8ccf Adding option to disable int8 dot products 2021-06-24 17:31:05 -04:00
Jean-Marc Valin
b214e684c1 Neon WIP: Compiles but very slow 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
8c3fe6f31d Cleaning up float version 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
83657d0e43 Dot product AVX2 code for non-sparse multiply 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
1707b960de cleanup, add signed-unsigned biases 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
40b309d92b WIP: 8-bit SIMD for GRU B 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
e695355ba5 some cleanup 2021-01-16 02:11:20 -05:00
Jean-Marc Valin
be392e3857 WIP: Got some AVX2 code working 2021-01-16 02:11:20 -05:00
Jean-Marc Valin
bce779886d WIP: signed*unsigned arithmetic 2021-01-16 02:11:20 -05:00
Jean-Marc Valin
11736ca9e3 WIP: 8-bit mul 2021-01-16 02:11:19 -05:00
Jean-Marc Valin
73a05f55c7 wip 8x4 2021-01-16 02:11:19 -05:00
Jean-Marc Valin
a8fb25f11c Remove NaN checks 2019-03-20 13:36:42 -04:00
David Rowe
7dc696b9a4 refactored for different machines, sgemv_accum16 using NEON intrinsics
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
2018-12-10 21:28:29 -05:00
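
Note on d332100808 (binary probability tree): the commit message says only 8 values need to be computed instead of 256. The sketch below shows one way such a tree could work for an 8-bit output, and is not taken from the repository. The node layout (root at index 1, children of node n at 2n and 2n+1), the names tree_sample and tree_prob, and the p_one table are all assumptions made for illustration. Each node stores the probability that the next bit is 1, so drawing a sample only visits the 8 nodes on one root-to-leaf path.

```c
#define TREE_DEPTH 8
#define NUM_LEAVES (1 << TREE_DEPTH)   /* 256 possible output values */

/* Hypothetical layout: p_one[n] is the probability that the bit decided
   at node n is 1. Nodes 1..255 are used and index 0 is unused, so the
   table has NUM_LEAVES entries. */

/* Draw one 8-bit value. u[] holds TREE_DEPTH uniform random numbers in
   [0,1), supplied by the caller. Only TREE_DEPTH entries of p_one are read. */
static int tree_sample(const float *p_one, const float *u)
{
    int node = 1;
    int level;
    for (level = 0; level < TREE_DEPTH; level++) {
        int bit = u[level] < p_one[node];   /* 1 with probability p_one[node] */
        node = 2 * node + bit;              /* descend to the chosen child */
    }
    return node - NUM_LEAVES;               /* leaf index maps to the value 0..255 */
}

/* Probability of one specific value: the product of the branch
   probabilities along its path, most significant bit first. */
static float tree_prob(const float *p_one, int value)
{
    float p = 1.f;
    int node = 1;
    int level;
    for (level = TREE_DEPTH - 1; level >= 0; level--) {
        int bit = (value >> level) & 1;
        p *= bit ? p_one[node] : 1.f - p_one[node];
        node = 2 * node + bit;
    }
    return p;
}
```

In a real network the 8 node probabilities on the path would be evaluated on demand (for example one sigmoid output per visited node) rather than read from a precomputed table. The table here only keeps the example self-contained.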