Commit graph

10 commits

Author SHA1 Message Date
Jean-Marc Valin
f8f12e7f3c NEON float->char conversion (same as the AVX2 version) 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
a1079c2ce3 Again, same conversion as 3206cec, for NEON 2021-07-10 01:59:49 -04:00
Jean-Marc Valin
54abdb6f5d Sparse matrix indexing optimization
The 4* is now stored in the table to avoid computing it in the loop
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
d332100808 Representing output pdf as binary probability tree
Saves on the MDense/softmax computation since we only need to compute
8 values instead of 256.
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
c1535c8ccf Adding option to disable int8 dot products 2021-06-24 17:31:05 -04:00
Jean-Marc Valin
b9c230b346 Add NEON intrinsics 2021-01-16 02:11:22 -05:00
Jean-Marc Valin
b214e684c1 Neon WIP: Compiles but very slow 2021-01-16 02:11:21 -05:00
Jean-Marc Valin
a09815925a Neon: Make gcc actually generate VMLA instructions for sparse mul
Otherwise it was splitting the mla into a mul and an add
2019-03-20 12:58:39 -04:00
Jean-Marc Valin
492ef9b362 Neon implementation of the activation functions 2019-03-20 03:03:44 -04:00
David Rowe
7dc696b9a4 refactored for different machines, sgemv_accum16 using NEON intrinsics
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
2018-12-10 21:28:29 -05:00
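
As a rough illustration of the binary probability tree mentioned in commit d332100808, the sketch below samples an 8-bit value by making one binary decision per tree level, so only 8 node probabilities are evaluated instead of a full 256-way distribution. This is not the code from the commit: `sample_tree`, `node_prob_fn`, `toy_prob_one`, the heap-style node numbering, and the most-significant-bit-first order are all assumptions made for the example.

```c
#include <assert.h>

#define TREE_BITS  8                        /* 8 decisions -> 256 leaves */
#define TREE_NODES ((1 << TREE_BITS) - 1)   /* 255 internal nodes */

/* Hypothetical callback: returns P(bit = 1) at one internal node.
   In a real model this would be a single output unit; here it is
   left abstract so that only visited nodes are ever evaluated. */
typedef float (*node_prob_fn)(int node, void *ctx);

/* Walk the tree from the root, one decision per level.
   Nodes are numbered in heap order (assumption): the root is 0 and
   the children of node n are 2n+1 (bit 0) and 2n+2 (bit 1).
   u[] holds TREE_BITS uniform random draws in [0, 1). */
static int sample_tree(node_prob_fn prob_one, void *ctx, const float *u)
{
    int node = 0;
    int value = 0;
    int i;
    for (i = 0; i < TREE_BITS; i++) {
        int bit;
        assert(node < TREE_NODES);
        /* Only this node's probability is needed at this level. */
        bit = u[i] < prob_one(node, ctx);
        value = (value << 1) | bit;   /* most significant bit first */
        node = 2 * node + 1 + bit;    /* descend to the chosen child */
    }
    return value;
}

/* Toy stand-in for the network output (assumption): counts how many
   nodes were evaluated and makes odd-numbered nodes always emit 1. */
static float toy_prob_one(int node, void *ctx)
{
    int *calls = (int *)ctx;
    (*calls)++;
    return (node & 1) ? 1.0f : 0.0f;
}
```

Calling `sample_tree(toy_prob_one, &calls, u)` with a zero-initialised `int calls` and an array `u` of 8 uniform draws evaluates exactly 8 nodes per sample, which is the saving the commit message describes relative to computing all 256 outputs.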