Timothy B. Terriberry
59dc75fa97
Rework 32-bit SSE loads yet again.
The existing code in vec_avx.h produced
warning: dereferencing type-punned pointer will break strict-aliasing rules
with gcc 6.4.0.
We already had a macro to work around this within the rules of the
C standard, but trying to use that here does not get optimized
into a single MOVD like we were hoping.
Replacing it with memcpy() instead does get optimized correctly,
but requires switching from a macro to an inline function in order
to be able to declare a local variable and return a value.
We already have such an inline function in NSQ_del_dec_avx2.c, so
hoist that out and use it everywhere, and then convert vec_avx.h
to use it also.
2024-02-23 02:23:37 -05:00
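Here is a minimal sketch of the memcpy()-based 32-bit load described above. It assumes SSE2 and a 32-bit `int`. The helper name `load_si32_sketch` is hypothetical, and the helper hoisted from NSQ_del_dec_avx2.c may be named and declared differently. `_mm_cvtsi32_si128` is the standard SSE2 intrinsic that maps to MOVD.

```c
#include <assert.h>
#include <string.h>    /* memcpy */
#include <emmintrin.h> /* SSE2: __m128i, _mm_cvtsi32_si128 */

/* Hypothetical helper: load 32 bits from any (possibly unaligned,
   differently typed) buffer into the low lane of an __m128i, zeroing
   the upper lanes. memcpy() reinterprets the bytes without violating
   strict aliasing; GCC and Clang usually reduce it to a single MOVD. */
static inline __m128i load_si32_sketch(const void *p)
{
   int v;
   assert(p != NULL);
   memcpy(&v, p, sizeof(v));
   return _mm_cvtsi32_si128(v);
}
```

Because this is a function and not a macro, it can declare the temporary `v` and still return a value. That is the reason given above for switching away from the macro.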
Jean-Marc Valin
2e034f6f31
Adding RTCD for DNN code
Starting with compute_linear()
2023-11-15 23:45:32 -05:00
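RTCD (run-time CPU detection) usually routes each kernel through a table of function pointers, indexed by the architecture level detected at startup. The sketch below is illustrative only. `compute_linear_c`, `compute_linear_simd`, `detect_arch`, the `ARCH_*` constants and the column-major layout are all assumptions. The "SIMD" entry simply reuses the C code so the example stays portable.

```c
#include <assert.h>

/* Hypothetical kernel signature: out = W*x, W is rows x cols, column-major. */
typedef void (*linear_fn)(float *out, const float *w, const float *x,
                          int rows, int cols);

static void compute_linear_c(float *out, const float *w, const float *x,
                             int rows, int cols)
{
   int i, j;
   for (i = 0; i < rows; i++) out[i] = 0.f;
   for (j = 0; j < cols; j++)
      for (i = 0; i < rows; i++)
         out[i] += w[j*rows + i]*x[j];
}

/* Placeholder for an SSE/AVX2 version; here it reuses the C code. */
static void compute_linear_simd(float *out, const float *w, const float *x,
                                int rows, int cols)
{
   compute_linear_c(out, w, x, rows, cols);
}

enum { ARCH_C = 0, ARCH_SIMD = 1, ARCH_COUNT = 2 };

/* One entry per architecture level, filled at compile time. */
static const linear_fn COMPUTE_LINEAR_IMPL[ARCH_COUNT] = {
   compute_linear_c,
   compute_linear_simd
};

/* Stub: a real build would query CPUID (or the OS) once at startup. */
static int detect_arch(void)
{
   return ARCH_C;
}

/* Dispatch through the table using the previously detected level. */
static void compute_linear(float *out, const float *w, const float *x,
                           int rows, int cols, int arch)
{
   assert(arch >= 0 && arch < ARCH_COUNT);
   COMPUTE_LINEAR_IMPL[arch](out, w, x, rows, cols);
}
```

The architecture level is detected once and passed down. Each call then costs one indirect call, not a repeated feature check.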
Jean-Marc Valin
58923f61c2
Fix non-AVX builds
2023-11-11 03:24:21 -05:00
Jean-Marc Valin
1ada7d4d6f
Vectorizing sgemv for multiples of 4 with SSE
2023-11-03 02:48:38 -04:00
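Here is a rough illustration of an SSE sgemv for row counts that are a multiple of 4. It assumes a column-major weight matrix with `col_stride` floats between columns; both the layout and the function name are hypothetical. Each group of 4 output rows accumulates one broadcast input value per column.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_loadu_ps, _mm_set1_ps, ... */

/* out = W*x for a column-major W; handles 4 output rows per iteration,
   so rows must be a multiple of 4. Unaligned loads/stores are used. */
static void sgemv4_sse_sketch(float *out, const float *weights, int rows,
                              int cols, int col_stride, const float *x)
{
   int i, j;
   assert(rows % 4 == 0);
   for (i = 0; i < rows; i += 4) {
      __m128 acc = _mm_setzero_ps();
      for (j = 0; j < cols; j++) {
         __m128 w = _mm_loadu_ps(&weights[j*col_stride + i]); /* 4 rows of column j */
         __m128 xj = _mm_set1_ps(x[j]);                       /* broadcast x[j] */
         acc = _mm_add_ps(acc, _mm_mul_ps(w, xj));
      }
      _mm_storeu_ps(&out[i], acc);
   }
}
```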
Jean-Marc Valin
62b546436f
Speed up general case for float matrix multiply
2023-10-30 00:08:53 -04:00
Jean-Marc Valin
88c58cfaf3
nnet.h no longer needs to #include "vec.h"
2023-10-20 17:25:27 -04:00
Jean-Marc Valin
81624caf9c
Silencing alignment warnings on x86 intrinsics
Those intrinsics don't actually require alignment so we're OK
2023-10-07 17:45:39 -04:00
Michael Klingbeil
d431c321f1
Fixes vnni macro redefinition with clang
2023-09-01 23:18:21 -04:00
Jean-Marc Valin
e9f8402a71
Handle float matrices with multiple of 8 rows
2023-08-01 19:16:27 -04:00
Jean-Marc Valin
8f7c72a662
Always define USE_SU_BIAS in vec_avx.h
2023-07-22 14:56:05 -04:00
Jean-Marc Valin
4710bdf712
Add SSE2 support
Not so much for old machines, as for getting decent performance
when not setting -march= (SSE2 is part of the amd64 ABI).
2023-07-22 14:56:05 -04:00
Jean-Marc Valin
9261eb5c37
Refactoring to make VNNI and SSE2 easier
2023-07-22 14:56:04 -04:00
Jean-Marc Valin
62cd1c963b
Transition to LinearLayer and remove unused code
2023-07-20 01:01:34 -04:00
Jean-Marc Valin
f5a68a41b0
Add generic linear layer
Should be able to handle all previous GRU variants and more.
2023-07-20 01:01:32 -04:00
xnorpx
7122abde59
Rename celt_exp to lpcnet_exp
Depending on which defines are set, there are collisions with the ones
in Opus. To avoid these errors, we rename the exp functions and
macros.
Signed-off-by: Jean-Marc Valin <jmvalin@amazon.com>
2023-05-24 00:46:20 -04:00
xnorpx
879084f6f0
Fix some of C4244 double to float warnings
2023-05-24 00:30:19 -04:00
xnorpx
702fffb70a
Include math.h to make header self-contained.
Signed-off-by: Jean-Marc Valin <jmvalin@amazon.com>
2023-05-23 11:24:35 -04:00
xnorpx
5b96946277
Use pragma message instead of warning on MSVC
Signed-off-by: Jean-Marc Valin <jmvalin@amazon.com>
2023-05-23 02:31:09 -04:00
Jan Buethe
d80f99f78b
added void to shut up missing prototype warning
2022-10-21 15:33:41 +00:00
Jean-Marc Valin
60a009b457
Making codebase C90-compliant
2022-01-19 18:10:44 -05:00
Jean-Marc Valin
4298f2f9e1
Adding support for SSE2 and SSSE3
2021-07-11 03:36:20 -04:00
Jean-Marc Valin
116bcb38fb
Adding SSE 4.1 for older platforms
AVX without AVX2 should now work again too.
2021-07-10 14:08:01 -04:00
Jean-Marc Valin
e8f70128d5
same conversion cleanup as 3206cec for sgemv_accum8x4()
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
714380e71b
More manual unrolling
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
44fe055682
cleanup float<->int conversions
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
60d6eab63d
Doing a bit of unrolling to speed things up
2021-07-10 01:59:49 -04:00
Jean-Marc Valin
54abdb6f5d
Sparse matrix indexing optimization
The 4* factor is now stored in the index table to avoid computing it in the loop.
2021-07-10 01:59:49 -04:00
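Here is a scalar sketch of the idea behind this change. It uses a hypothetical block-sparse layout: 8x4 weight blocks, and an index table that holds, for each group of 8 rows, a block count followed by each block's first input column. The table stores that column already multiplied by 4, so the inner loop reads `x[pos + k]` directly without computing `4*block` each time. The names and layout are assumptions, not the actual vec_avx.h format.

```c
#include <assert.h>

/* Hypothetical layout: rows are grouped by 8. For each group, idx holds a
   block count, then that many starting input columns (already 4*block).
   Each non-zero block is 8 rows x 4 columns, stored column by column
   (32 consecutive floats). */
static void sparse_sgemv8x4_sketch(float *out, const float *w, const int *idx,
                                   int rows, const float *x)
{
   int i, j, k, r;
   assert(rows % 8 == 0);
   for (i = 0; i < rows; i += 8) {
      int nblocks = *idx++;          /* non-zero 8x4 blocks in this row group */
      for (r = 0; r < 8; r++) out[i + r] = 0.f;
      for (j = 0; j < nblocks; j++) {
         int pos = *idx++;           /* first input column, premultiplied by 4 */
         for (k = 0; k < 4; k++)
            for (r = 0; r < 8; r++)
               out[i + r] += w[k*8 + r]*x[pos + k];
         w += 32;                    /* next stored block */
      }
   }
}
```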
Jean-Marc Valin
d332100808
Representing output pdf as binary probability tree
Saves on the MDense/softmax computation since we only need to compute
8 values instead of 256.
2021-07-10 01:59:49 -04:00
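The saving comes from drawing an 8-bit sample as 8 binary decisions, walking a binary tree from the most significant bit down. Only the 8 node probabilities on the chosen path are needed, not a full 256-way softmax. The sketch assumes heap-style node numbering (root 1, children 2n and 2n+1), a hypothetical callback returning P(bit = 1) at a node, and caller-supplied uniform random numbers. `table_node_prob` is only a test helper that reads probabilities from an array.

```c
#include <assert.h>

/* Hypothetical callback: probability that the next bit is 1 at tree node
   `node` (1..255). In a real model this is where the network output for
   that node would be computed. */
typedef float (*node_prob_fn)(int node, void *ctx);

/* Draw an 8-bit value MSB first; u[b] are uniform randoms in [0,1).
   Only the 8 nodes on the chosen path are ever evaluated. */
static int sample_tree8_sketch(node_prob_fn prob, void *ctx, const float *u)
{
   int node = 1;   /* root; children of n are 2n (bit 0) and 2n+1 (bit 1) */
   int b;
   for (b = 0; b < 8; b++) {
      int bit = u[b] < prob(node, ctx);
      node = 2*node + bit;
   }
   assert(node >= 256 && node < 512);
   return node - 256;   /* leaves 256..511 map to values 0..255 */
}

/* Test helper: reads per-node probabilities from a 256-entry array. */
static float table_node_prob(int node, void *ctx)
{
   const float *p = (const float *)ctx;
   return p[node];
}
```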
Jean-Marc Valin
e35441f2cc
Faster activation functions for AVX
Using rational function approximation for tanh() and sigmoid.
2021-06-29 04:05:48 -04:00
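Here is a sketch of what a rational approximation looks like, using the classic Padé-style form tanh(x) ≈ x(27 + x²)/(27 + 9x²). This is a swapped-in textbook approximation, not the coefficients used in the commit. It reaches exactly 1 at |x| = 3, so larger inputs are clamped, and its maximum absolute error is roughly 0.024. The sigmoid follows from sigmoid(x) = 0.5 + 0.5·tanh(x/2). The function names are hypothetical.

```c
#include <assert.h>

/* Rational approximation of tanh(); odd in x, clamped to [-1, 1]. */
static float tanh_rational_sketch(float x)
{
   float x2 = x*x;
   float y = x*(27.f + x2)/(27.f + 9.f*x2);
   if (y > 1.f) y = 1.f;     /* formula exceeds 1 for |x| > 3 */
   if (y < -1.f) y = -1.f;
   return y;
}

/* sigmoid(x) = 0.5 + 0.5*tanh(x/2) */
static float sigmoid_rational_sketch(float x)
{
   return .5f + .5f*tanh_rational_sketch(.5f*x);
}
```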
Jean-Marc Valin
c1535c8ccf
Adding option to disable int8 dot products
2021-06-24 17:31:05 -04:00
Jean-Marc Valin
0b9f6bab81
Remove unnecessary mask in exp() approximation
This isn't necessary since valid exponents can't flip the sign bit
2021-06-21 01:34:38 -04:00
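This kind of exp() approximation works by adding the integer part of the exponent directly into the float's exponent bits. The sketch assumes IEEE-754 binary32 floats and inputs within the range checked by the assert. The cubic coefficients are an illustrative fit of 2^f on [0,1), and the function names are hypothetical. The comment on the bit addition spells out why no sign-bit mask is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 2^x: split x = i + f (i = floor(x), f in [0,1)), approximate 2^f with
   a cubic, then scale by 2^i by adding i to the exponent field. */
static float exp2_sketch(float x)
{
   int i;
   float f, p;
   uint32_t bits;
   assert(x >= -125.f && x < 127.f);   /* keep the result a normal float */
   i = (int)x;                         /* truncates toward zero... */
   if ((float)i > x) i--;              /* ...so fix up to floor() for x < 0 */
   f = x - (float)i;
   /* p is roughly in [0.9999, 2), so its exponent field is 126 or 127 */
   p = 0.99992522f + f*(0.69583354f + f*(0.22606716f + f*0.078024523f));
   memcpy(&bits, &p, sizeof(bits));
   /* For i in [-125, 126] the exponent field stays within 1..253, so the
      addition never carries into or borrows from the sign bit, and no
      0x7fffffff mask is needed. Unsigned wraparound handles negative i. */
   bits += (uint32_t)i << 23;
   memcpy(&p, &bits, sizeof(p));
   return p;
}

/* e^x = 2^(x*log2(e)) */
static float exp_sketch(float x)
{
   return exp2_sketch(1.44269504f*x);
}
```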
Jean-Marc Valin
ae2ae5ead6
Remove useless multiply by one
See bffdcee95 (commitcomment-46372726)
2021-06-21 01:30:51 -04:00
Jean-Marc Valin
8c3fe6f31d
Cleaning up float version
2021-01-16 02:11:21 -05:00
Jean-Marc Valin
83657d0e43
Dot product AVX2 code for non-sparse multiply
2021-01-16 02:11:21 -05:00
Jean-Marc Valin
e695355ba5
some cleanup
2021-01-16 02:11:20 -05:00
Jean-Marc Valin
d87f974431
Vectorizing conversion
2021-01-16 02:11:20 -05:00
Jean-Marc Valin
6b582edbed
WIP: remove scalar code from AVX2 code
2021-01-16 02:11:20 -05:00
Jean-Marc Valin
be392e3857
WIP: Got some AVX2 code working
2021-01-16 02:11:20 -05:00
Jean-Marc Valin
2b4652f9f6
WIP: cleanup
2021-01-16 02:11:20 -05:00
Jean-Marc Valin
bce779886d
WIP: signed*unsigned arithmetic
2021-01-16 02:11:20 -05:00
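The signed*unsigned arithmetic can be checked with plain integers. x86 offers a fast unsigned-by-signed 8-bit multiply (PMADDUBSW, `_mm256_maddubs_epi16` in AVX2). One common way to use it with signed inputs is to offset them by 128 and subtract a precomputed, weight-dependent bias afterwards. This sketch shows only that identity and ignores the int16 saturation of the real instruction. The function names are hypothetical, and it is not necessarily how USE_SU_BIAS is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: both operands signed. */
static int32_t dot_s8s8(const int8_t *x, const int8_t *w, int n)
{
   int32_t acc = 0;
   int k;
   for (k = 0; k < n; k++) acc += (int32_t)x[k]*w[k];
   return acc;
}

/* Same result via unsigned*signed products:
   sum((x+128)*w) = sum(x*w) + 128*sum(w), so subtract 128*sum(w),
   which depends only on the weights and can be precomputed. */
static int32_t dot_via_u8s8(const int8_t *x, const int8_t *w, int n)
{
   int32_t acc = 0, su_bias = 0;
   int k;
   assert(n >= 0);
   for (k = 0; k < n; k++) su_bias += 128*w[k];   /* once per weight row */
   for (k = 0; k < n; k++) {
      uint8_t xu = (uint8_t)(x[k] + 128);          /* [-128,127] -> [0,255] */
      acc += (int32_t)xu*w[k];                     /* u8 * s8 product */
   }
   return acc - su_bias;
}
```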
Jean-Marc Valin
11736ca9e3
WIP: 8-bit mul
2021-01-16 02:11:19 -05:00
Jean-Marc Valin
c045702e51
Add non-dot-product AVX code
2021-01-16 02:11:19 -05:00
Jean-Marc Valin
8e405b44e0
Improve accuracy of AVX sigmoid
Reciprocal approximation could cause the sigmoid output to be
greater than 1.0.
2021-01-16 01:51:39 -05:00
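One way to keep an approximate-reciprocal sigmoid from exceeding 1.0 is to refine the RCPPS estimate with one Newton-Raphson step and then clamp with MINPS. This is a hedged sketch of that idea, not necessarily the fix in the commit. It assumes exp(-x) has already been computed elsewhere (and is non-negative), and the function names are hypothetical.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: _mm_rcp_ps, _mm_min_ps, ... */

/* 1/d per lane: RCPPS gives about 12 bits; one Newton-Raphson step
   r' = r*(2 - d*r) roughly doubles the number of correct bits. */
static __m128 recip_nr_sketch(__m128 d)
{
   __m128 r = _mm_rcp_ps(d);
   return _mm_mul_ps(r, _mm_sub_ps(_mm_set1_ps(2.f), _mm_mul_ps(d, r)));
}

/* sigmoid = 1/(1 + e) with e = exp(-x) >= 0 computed elsewhere. The
   approximate reciprocal can still land slightly above 1, so clamp. */
static __m128 sigmoid_from_exp_sketch(__m128 e)
{
   __m128 one = _mm_set1_ps(1.f);
   __m128 s = recip_nr_sketch(_mm_add_ps(one, e));
   return _mm_min_ps(s, one);
}
```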
David Rowe
7dc696b9a4
refactored for different machines, sgemv_accum16 using NEON intrinsics
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
2018-12-10 21:28:29 -05:00