David Rowe
|
7dc696b9a4
|
refactored for different machines, sgemv_accum16 using NEON intrinsics
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
|
2018-12-10 21:28:29 -05:00 |
|
Jean-Marc Valin
|
771cc7868a
|
Support for plain AVX with no FMA
|
2018-12-04 07:58:13 -05:00 |
|
Jean-Marc Valin
|
b05f950e38
|
Using the right name: s/gemm/sgemv/
|
2018-11-30 10:56:44 -05:00 |
|
Jean-Marc Valin
|
c395a68b7d
|
moving code around
|
2018-11-30 10:46:32 -05:00 |
|
Jean-Marc Valin
|
05f4851dcd
|
Making the code work even without AVX2/FMA
|
2018-11-30 10:32:04 -05:00 |
|
Jean-Marc Valin
|
d7f0abcd19
|
Delaying the softmax() to avoid the pow()
Now at 5x real-time, with all the low-hanging fruit done.
|
2018-11-29 20:09:36 -05:00 |
|
Jean-Marc Valin
|
faf3fe3d24
|
gemm_accum16() doesn't need a multiple of 16 columns (just lines).
|
2018-11-29 19:50:09 -05:00 |
|
Jean-Marc Valin
|
7ee79b63df
|
Add AVX versions of exp(), tanh() and sigmoid()
Now 3x faster than real-time
|
2018-11-29 19:43:59 -05:00 |
|
Jean-Marc Valin
|
4de3e53a73
|
Adding some sparse GRU support
Still need to properly dump as sparse.
|
2018-11-28 18:49:19 -05:00 |
|
Jean-Marc Valin
|
ec671ed90e
|
Quick and dirty AVX2 implementation of gemm_accum
Brings us very close to real-time
|
2018-11-28 14:57:22 -05:00 |
|
Jean-Marc Valin
|
732fce9ab2
|
Pre-computing GRU_A's input contribution.
|
2018-11-28 14:05:36 -05:00 |
|
Jean-Marc Valin
|
040aa437c3
|
Simpler GRU implementation just for reset_after.
|
2018-11-28 12:37:18 -05:00 |
|
Jean-Marc Valin
|
36a0bf8c75
|
Wow, managed two bugs in a 25-character line
|
2018-11-27 14:50:38 -05:00 |
|
Jean-Marc Valin
|
c7b978b923
|
Fix reset_after GRU
|
2018-11-27 14:37:10 -05:00 |
|
Jean-Marc Valin
|
4ccfbdff04
|
Frame network seems to be working
|
2018-11-26 18:41:54 -05:00 |
|
Jean-Marc Valin
|
538f25565a
|
Starting to actually test this -- fix a few OOB reads
|
2018-11-26 16:02:49 -05:00 |
|
Jean-Marc Valin
|
575d8d6fa4
|
Adding sampling
|
2018-11-26 11:04:41 -05:00 |
|
Jean-Marc Valin
|
7119eaf33b
|
Plumbing for the frame rate network
|
2018-11-25 17:20:24 -05:00 |
|
Jean-Marc Valin
|
141830ce5a
|
Fixing includes
|
2018-11-24 16:00:30 -05:00 |
|
Jean-Marc Valin
|
37fbcaee0b
|
mdense max size
|
2018-11-24 15:51:08 -05:00 |
|
Jean-Marc Valin
|
94ac0841df
|
Precomputing sizes
|
2018-11-24 15:47:48 -05:00 |
|
Jean-Marc Valin
|
c025744e34
|
Fix conv1d, default to size 384
|
2018-11-24 15:30:17 -05:00 |
|
Jean-Marc Valin
|
66486004ba
|
Implement MDense
|
2018-11-24 12:23:11 -05:00 |
|
Jean-Marc Valin
|
d4046036a9
|
Dump Conv1D (didn't check weight ordering at all)
|
2018-11-24 11:32:01 -05:00 |
|
Jean-Marc Valin
|
b9cd61be8b
|
Work in progress translation to C
|
2018-11-23 19:43:58 -05:00 |
|