Jean-Marc Valin
|
60a009b457
|
Making codebase C90-compliant
|
2022-01-19 18:10:44 -05:00 |
|
Jean-Marc Valin
|
4298f2f9e1
|
Adding support for SSE2 and SSSE3
|
2021-07-11 03:36:20 -04:00 |
|
Jean-Marc Valin
|
116bcb38fb
|
Adding SSE 4.1 for older platforms
AVX without AVX2 should now work again too.
|
2021-07-10 14:08:01 -04:00 |
|
Jean-Marc Valin
|
e8f70128d5
|
same conversion cleanup as 3206cec for sgemv_accum8x4()
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
714380e71b
|
More manual unrolling
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
44fe055682
|
cleanup float<->int conversions
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
60d6eab63d
|
Doing a bit of unrolling to speed things up
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
54abdb6f5d
|
Sparse matrix indexing optimization
The 4* is now stored in the table to avoid computing it in the loop
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
d332100808
|
Representing output pdf as binary probability tree
Saves on the MDense/softmax computation since we only need to compute
8 values instead of 256.
|
2021-07-10 01:59:49 -04:00 |
|
Jean-Marc Valin
|
e35441f2cc
|
Faster activation functions for AVX
Using rational function approximation for tanh() and sigmoid.
|
2021-06-29 04:05:48 -04:00 |
|
Jean-Marc Valin
|
c1535c8ccf
|
Adding option to disable int8 dot products
|
2021-06-24 17:31:05 -04:00 |
|
Jean-Marc Valin
|
0b9f6bab81
|
Remove unnecessary mask in exp() approximation
This isn't necessary since valid exponents can't flip the sign bit
|
2021-06-21 01:34:38 -04:00 |
|
Jean-Marc Valin
|
ae2ae5ead6
|
Remove useless multiply by one
See bffdcee95 (commitcomment-46372726)
|
2021-06-21 01:30:51 -04:00 |
|
Jean-Marc Valin
|
8c3fe6f31d
|
Cleaning up float version
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
83657d0e43
|
Dot product AVX2 code for non-sparse multiply
|
2021-01-16 02:11:21 -05:00 |
|
Jean-Marc Valin
|
e695355ba5
|
some cleanup
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
d87f974431
|
Vectorizing conversion
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
6b582edbed
|
WIP: remove scalar code from AVX2 code
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
be392e3857
|
WIP: Got some AVX2 code working
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
2b4652f9f6
|
WIP: cleanup
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
bce779886d
|
WIP: signed*unsigned arithmetic
|
2021-01-16 02:11:20 -05:00 |
|
Jean-Marc Valin
|
11736ca9e3
|
WIP: 8-bit mul
|
2021-01-16 02:11:19 -05:00 |
|
Jean-Marc Valin
|
c045702e51
|
Add non-dot-product AVX code
|
2021-01-16 02:11:19 -05:00 |
|
Jean-Marc Valin
|
8e405b44e0
|
Improve accuracy of AVX sigmoid
Reciprocal approximation could cause the sigmoid output to be
greater than 1.0.
|
2021-01-16 01:51:39 -05:00 |
|
David Rowe
|
7dc696b9a4
|
refactored for different machines, sgemv_accum16 using NEON intrinsics
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
|
2018-12-10 21:28:29 -05:00 |
|
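The "Representing output pdf as binary probability tree" entry (d332100808) says only 8 values need computing instead of 256. A minimal sketch of that idea follows, assuming a heap-ordered array of per-node probabilities; the function name `sample_bit_tree`, the array layout, and the caller-supplied random numbers are all hypothetical and are not taken from the repository.

```c
/* Sketch only: the node layout below is an assumption, not the repository's. */

#define TREE_BITS 8  /* 2^8 = 256 possible output values */

/*
 * node_p1: hypothetical array of 256 floats; node_p1[n] is P(bit = 1) at
 *          tree node n in heap order (root at 1, children of n at 2n and
 *          2n + 1). Index 0 is unused.
 * u:       TREE_BITS uniform random numbers in [0, 1), one per tree level.
 * Returns the sampled value in 0..255.
 */
int sample_bit_tree(const float *node_p1, const float *u)
{
    int node = 1;
    int level;
    for (level = 0; level < TREE_BITS; level++) {
        /* Only the node on the current path is read: 8 reads in total. */
        int bit = u[level] < node_p1[node];
        node = 2 * node + bit;
    }
    /* After 8 levels node is in 256..511; drop the leading 1 bit. */
    return node - (1 << TREE_BITS);
}
```

Because each sample touches one node per level, a model that can evaluate node probabilities on demand would only need the 8 on the chosen path rather than a full 256-way softmax. In this sketch the array is precomputed purely to keep the example short.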
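The "Faster activation functions for AVX" entry (e35441f2cc) mentions a rational function approximation for tanh() and sigmoid. The scalar sketch below illustrates the general technique with a textbook Pade-style approximant; the coefficients and the names `tanh_rational` and `sigmoid_rational` are assumptions chosen for illustration and are not the ones used in that commit.

```c
/* Sketch only: a textbook Pade-style approximant, not the commit's coefficients. */

/* tanh(x) ~= x * (27 + x^2) / (27 + 9 * x^2), clamped to [-1, 1]. */
float tanh_rational(float x)
{
    float x2 = x * x;
    float y = x * (27.0f + x2) / (27.0f + 9.0f * x2);
    /* The ratio reaches 1 at |x| = 3 and keeps growing, so clamp it. */
    if (y > 1.0f) y = 1.0f;
    if (y < -1.0f) y = -1.0f;
    return y;
}

/* Uses the exact identity sigmoid(x) = 0.5 + 0.5 * tanh(x / 2). */
float sigmoid_rational(float x)
{
    return 0.5f + 0.5f * tanh_rational(0.5f * x);
}
```

This particular approximant stays within a few hundredths of tanh() and needs only multiplies, adds, and one divide, which is the kind of operation mix that maps well onto SIMD lanes. The clamp also keeps the derived sigmoid inside [0, 1].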