Jean-Marc Valin
|
f8f12e7f3c
|
NEON float->char conversion (same as the AVX2 version)
|
2021-07-10 01:59:49 -04:00 |
|
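The f8f12e7f3c message only says the NEON conversion matches the AVX2 one. The portable C sketch below illustrates what this kind of float-to-byte activation conversion usually does. It assumes inputs lie roughly in [-1, 1] and are mapped to unsigned bytes as round(127*x + 127), saturated to [0, 255]. The function name `float_to_u8` and the exact scale and offset are assumptions; the commit does not state them.

```c
/* Hypothetical scalar reference: map floats (assumed roughly in [-1, 1])
   to unsigned 8-bit values centred on 127, saturating to [0, 255]. */
static void float_to_u8(unsigned char *out, const float *in, int len)
{
    for (int i = 0; i < len; i++) {
        float v = 127.f * in[i] + 127.f;                /* scale and offset */
        int q = (int)(v >= 0.f ? v + 0.5f : v - 0.5f);  /* round half away from zero */
        if (q < 0) q = 0;                               /* saturate low */
        if (q > 255) q = 255;                           /* saturate high */
        out[i] = (unsigned char)q;
    }
}
```

A SIMD version applies the same multiply-add, rounding, and saturating narrowing to several lanes per instruction.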
Jean-Marc Valin
|
a1079c2ce3
|
Again, same conversion as 3206cec, for NEON
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
54abdb6f5d
|
Sparse matrix indexing optimization
The 4* is now stored in the table to avoid computing it in the loop
|
2021-07-10 01:59:49 -04:00 |
|
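Commit 54abdb6f5d moves a multiply by 4 out of the sparse inner loop by storing it in the index table. The following minimal sketch shows that idea using an assumed, simplified layout: each block covers 4 consecutive inputs for one output row, and the `sparse_accum_1x4` name and table format are hypothetical. Each row stores a block count followed by column offsets that are already multiplied by 4, so the loop can index `x` directly.

```c
/* Hypothetical layout: for each output row, idx holds a block count n,
   then n column offsets. Each offset is the first input of a block of 4
   and is stored pre-multiplied by 4 (block 1 is stored as 4, not 1), so
   the inner loop never computes 4*j. w holds 4 weights per block. */
static void sparse_accum_1x4(float *out, const float *w, int rows,
                             const int *idx, const float *x)
{
    for (int i = 0; i < rows; i++) {
        int n = *idx++;                 /* number of nonzero blocks in row i */
        for (int b = 0; b < n; b++) {
            int pos = *idx++;           /* already 4 * block_index */
            out[i] += w[0] * x[pos] + w[1] * x[pos + 1]
                    + w[2] * x[pos + 2] + w[3] * x[pos + 3];
            w += 4;                     /* next block's weights */
        }
    }
}
```

With this layout, the multiply happens once when the table is built rather than on every pass through the loop.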
Jean-Marc Valin
|
d332100808
|
Representing output pdf as binary probability tree
Saves on the MDense/softmax computation since we only need to compute
8 values instead of 256.
|
2021-07-10 01:59:49 -04:00 |
|
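Commit d332100808 replaces a 256-way softmax with a binary probability tree. The sketch below shows why each sample needs only 8 values. It assumes heap-style node indexing, where node n has children 2n and 2n+1, and uses a hypothetical array `p1` that holds, for each internal node, the probability that the next bit is 1. Sampling descends one level per bit. The probability of a full 8-bit symbol is the product of the 8 branch probabilities on its path.

```c
/* p1[n] for n in 1..255: probability that the next bit is 1 at node n.
   Node n has children 2n (bit 0) and 2n+1 (bit 1); after 8 levels the
   node index lies in 256..511 and encodes the symbol. */

/* Probability of an 8-bit symbol: product of the 8 branch probabilities
   on its root-to-leaf path, most significant bit first. */
static double symbol_prob(const double *p1, int symbol, int *evals)
{
    double p = 1.0;
    int node = 1;
    *evals = 0;
    for (int bit = 7; bit >= 0; bit--) {
        int b = (symbol >> bit) & 1;
        p *= b ? p1[node] : 1.0 - p1[node];
        (*evals)++;                   /* one tree node consulted per bit */
        node = 2 * node + b;
    }
    return p;
}

/* Draw a symbol by descending the tree; r[level] are uniform values in
   [0, 1) supplied by the caller (a random source in practice). */
static int sample_tree(const double *p1, const double *r)
{
    int node = 1;
    for (int level = 0; level < 8; level++) {
        int b = r[level] < p1[node];  /* take the 1 branch with prob p1 */
        node = 2 * node + b;
    }
    return node - 256;                /* leaf index -> symbol 0..255 */
}
```

In a network, each `p1[node]` could come from a sigmoid on one output. A sample then needs 8 such outputs instead of a softmax over 256.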
Jean-Marc Valin
|
c1535c8ccf
|
Adding option to disable int8 dot products
|
2021-06-24 17:31:05 -04:00 |
|
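Commit c1535c8ccf adds an option to turn off int8 dot products. The sketch below illustrates the two paths such a switch chooses between, using a hypothetical macro `DISABLE_INT8_DOT`; the log does not give the project's real option name. By default, the input is quantized to int8, multiplied by int8 weights, and accumulated in a 32-bit integer. With the switch defined, the function uses float weights and float accumulation instead. The quantization scales are illustrative assumptions.

```c
/* Hypothetical compile-time switch; not necessarily the project's name. */
/* #define DISABLE_INT8_DOT */

/* w: float weights; wq: int8 weights with wq[i] ~= w[i] / wscale. */
static float dot_product(const float *w, const signed char *wq, float wscale,
                         const float *x, int n)
{
#ifdef DISABLE_INT8_DOT
    /* Float path: exact weights, float accumulation. */
    (void)wq; (void)wscale;
    float sum = 0.f;
    for (int i = 0; i < n; i++) sum += w[i] * x[i];
    return sum;
#else
    /* Int8 path: quantize x to [-127, 127] (scale 1/127), multiply by int8
       weights, accumulate in 32-bit integers, then rescale once. */
    (void)w;
    int acc = 0;
    for (int i = 0; i < n; i++) {
        float v = 127.f * x[i];
        int xi = (int)(v >= 0.f ? v + 0.5f : v - 0.5f);
        if (xi > 127) xi = 127;
        if (xi < -127) xi = -127;
        acc += wq[i] * xi;
    }
    return (float)acc * wscale / 127.f;
#endif
}
```

Both paths return approximately the same value. The int8 path accepts a small quantization error in exchange for integer arithmetic, which SIMD units can typically process more elements of per instruction.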
Jean-Marc Valin
|
b9c230b346
|
Add NEON intrinsics
|
2021-01-16 02:11:22 -05:00 |
|
Jean-Marc Valin
|
b214e684c1
|
Neon WIP: Compiles but very slow
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
a09815925a
|
Neon: Make gcc actually generate VMLA instructions for sparse mul
Otherwise it was splitting the mla into a mul and an add
|
2019-03-20 12:58:39 -04:00 |
|
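Commit a09815925a is about getting a single multiply-accumulate instruction for the sparse product instead of a separate multiply and add. The sketch below shows the accumulation pattern involved. Under `__ARM_NEON` it uses the NEON intrinsic `vmlaq_f32`, which computes a + b*c per lane, and it falls back to a scalar loop elsewhere. The sparse layout (4-row blocks, a column count followed by column indices, 4 weights per column) and the name `sparse_accum4` are hypothetical simplifications. Whether the compiler emits one VMLA for `vmlaq_f32` depends on the compiler version, flags, and target. The commit's specific workaround is not reproduced here.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical layout: for each block of 4 output rows, idx holds a column
   count n followed by n column indices; w holds 4 weights per listed
   column, stored contiguously. rows is assumed to be a multiple of 4. */
static void sparse_accum4(float *out, const float *w, int rows,
                          const int *idx, const float *x)
{
    for (int i = 0; i < rows; i += 4) {
        int n = *idx++;
#if defined(__ARM_NEON)
        float32x4_t y = vld1q_f32(&out[i]);
        for (int k = 0; k < n; k++) {
            float32x4_t xj = vdupq_n_f32(x[*idx++]);
            /* y += w * xj, written as one multiply-accumulate intrinsic */
            y = vmlaq_f32(y, vld1q_f32(w), xj);
            w += 4;
        }
        vst1q_f32(&out[i], y);
#else
        /* Portable fallback computing the same per-lane sums. */
        for (int k = 0; k < n; k++) {
            float xj = x[*idx++];
            for (int r = 0; r < 4; r++) out[i + r] += w[r] * xj;
            w += 4;
        }
#endif
    }
}
```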
Jean-Marc Valin
|
492ef9b362
|
Neon implementation of the activation functions
|
2019-03-20 03:03:44 -04:00 |
|
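For commit 492ef9b362, the sketch below shows one way to build activations that vectorize easily without calling libm. It uses a simple rational approximation of tanh, x(27 + x^2)/(27 + 9x^2), clamped at |x| >= 3, where the formula reaches exactly +/-1. This approximation is an illustrative choice and not necessarily the one the NEON code uses. The sigmoid follows from the exact identity sigmoid(x) = 0.5 + 0.5 tanh(x/2). The function names are hypothetical.

```c
/* Illustrative rational approximation of tanh (a swapped-in technique):
   x*(27 + x^2) / (27 + 9*x^2) is monotonic on [-3, 3] and equals +/-1 at
   the ends, so inputs beyond that range are clamped. */
static float tanh_approx(float x)
{
    if (x >= 3.f) return 1.f;
    if (x <= -3.f) return -1.f;
    float x2 = x * x;
    return x * (27.f + x2) / (27.f + 9.f * x2);
}

/* Exact identity sigmoid(x) = 0.5 + 0.5*tanh(x/2): one approximation
   serves both activations. */
static float sigmoid_approx(float x)
{
    return 0.5f + 0.5f * tanh_approx(0.5f * x);
}

/* Elementwise application over a vector; a SIMD version would process
   several lanes of this loop per instruction. */
static void vec_tanh(float *y, const float *x, int n)
{
    for (int i = 0; i < n; i++) y[i] = tanh_approx(x[i]);
}
```

The approximation is coarse, with errors of a few hundredths around |x| of 1.5 to 2. Every step is a multiply, add, divide, or clamp, and each of these has a vector counterpart.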
David Rowe
|
7dc696b9a4
|
refactored for different machines, sgemv_accum16 using NEON intrinsics
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
|
2018-12-10 21:28:29 -05:00 |
|
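Commit 7dc696b9a4 introduces a NEON `sgemv_accum16`. The sketch below assumes a conventional signature and layout: column-major weights with stride `col_stride`, `rows` a multiple of 16, and results accumulated into `out`. These are assumptions, not details from the log. On NEON, the sketch keeps 16 outputs in four `float32x4_t` accumulators and updates them with `vmlaq_f32`. On other targets, a scalar loop computes the same sums.

```c
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i..i+15] += W[:, j] * x[j] for every column j, 16 rows at a time.
   Weight (row r, column j) is assumed to live at weights[j*col_stride + r]. */
static void sgemv_accum16(float *out, const float *weights, int rows,
                          int cols, int col_stride, const float *x)
{
    for (int i = 0; i < rows; i += 16) {
#if defined(__ARM_NEON)
        /* Four 4-lane accumulators hold the 16 outputs in registers. */
        float32x4_t y0 = vld1q_f32(&out[i]);
        float32x4_t y1 = vld1q_f32(&out[i + 4]);
        float32x4_t y2 = vld1q_f32(&out[i + 8]);
        float32x4_t y3 = vld1q_f32(&out[i + 12]);
        for (int j = 0; j < cols; j++) {
            const float *w = &weights[j * col_stride + i];
            float32x4_t xj = vdupq_n_f32(x[j]);   /* broadcast x[j] */
            y0 = vmlaq_f32(y0, vld1q_f32(&w[0]), xj);
            y1 = vmlaq_f32(y1, vld1q_f32(&w[4]), xj);
            y2 = vmlaq_f32(y2, vld1q_f32(&w[8]), xj);
            y3 = vmlaq_f32(y3, vld1q_f32(&w[12]), xj);
        }
        vst1q_f32(&out[i], y0);
        vst1q_f32(&out[i + 4], y1);
        vst1q_f32(&out[i + 8], y2);
        vst1q_f32(&out[i + 12], y3);
#else
        /* Scalar reference for the same 16-row block. */
        for (int j = 0; j < cols; j++) {
            const float *w = &weights[j * col_stride + i];
            for (int k = 0; k < 16; k++) out[i + k] += w[k] * x[j];
        }
#endif
    }
}
```

Keeping the accumulators in registers across the whole column loop means each output is loaded and stored once per 16-row block instead of once per column.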