LPCNet

Low-complexity implementation of the WaveRNN-based LPCNet algorithm, as described in:

J.-M. Valin, J. Skoglund, LPCNet: Improving Neural Speech Synthesis Through Linear Prediction, Submitted for ICASSP 2019, arXiv:1810.11846.

Introduction

This is work-in-progress software for researching low-CPU-complexity algorithms for speech synthesis and compression by applying linear prediction techniques to WaveRNN. High-quality speech can be synthesised on regular CPUs (around 3 GFLOPS) with SIMD support (AVX, AVX2/FMA and NEON are currently supported).

The BSD-licensed software is written in C and Python/Keras. For training, a GTX 1080 Ti or better is recommended.

This software is an open source starting point for WaveRNN-based speech synthesis and coding.

Quickstart

  1. Set up a Keras system with GPU.

  2. Generate training data:

    make dump_data
    ./dump_data -train input.s16 features.f32 data.u8
    

    where input.s16 contains 16 kHz 16-bit raw PCM audio (no header) and features.f32 and data.u8 are output files. This program makes several passes over the data with different filters to generate a large amount of training data.

  3. Now that you have your files, train with:

    ./train_lpcnet.py features.f32 data.u8
    

    and it will generate an lpcnet*.h5 file for each iteration. If it stops with a "Failed to allocate RNN reserve space" message, try reducing the batch_size variable in train_lpcnet.py.

  4. You can synthesise speech with Python and your GPU card:

    ./dump_data -test test_input.s16 test_features.f32
    ./test_lpcnet.py test_features.f32 test.s16
    

    Note that the .h5 file name is hard coded in test_lpcnet.py; modify it to match your .h5 file.

  5. Or with C on a CPU. First extract the model files nnet_data.h and nnet_data.c:

    ./dump_lpcnet.py lpcnet15_384_10_G16_64.h5
    

    Then you can make the C synthesiser and try synthesising from a test feature file:

    make test_lpcnet
    ./dump_data -test test_input.s16 test_features.f32
    ./test_lpcnet test_features.f32 test.s16
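
To listen to a synthesised file in an ordinary audio player, the headerless output can be wrapped in a WAV container. The sketch below is not part of the repository; it assumes test.s16 uses the same format as the training input (16 kHz, 16-bit, mono, little-endian), and the function name raw_s16_to_wav is hypothetical.

```python
import wave


def raw_s16_to_wav(raw_path, wav_path, sample_rate=16000, channels=1):
    """Wrap headerless 16-bit PCM in a WAV container; the samples are not modified.

    Assumes little-endian samples, which is what WAV files store.
    Returns the number of frames written.
    """
    with open(raw_path, "rb") as f:
        pcm = f.read()
    frame_bytes = 2 * channels  # 2 bytes per 16-bit sample
    if len(pcm) % frame_bytes != 0:
        raise ValueError("file size is not a whole number of 16-bit frames")
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return len(pcm) // frame_bytes
```

For example, raw_s16_to_wav("test.s16", "test.wav") would produce a file that standard players can open.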
    

Speech Material for Training

Suitable training material can be obtained from the McGill University Telecommunications & Signal Processing Laboratory. Download the ISO and extract the 16k-LP7 directory. The src/concat.sh script can then be used to generate a headerless file of training samples:

cd 16k-LP7
sh /path/to/concat.sh
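
If a shell is not available, the same idea (joining many recordings into one headerless 16 kHz 16-bit file) can be sketched in Python. This is not a reproduction of concat.sh; it assumes the extracted recordings are mono 16-bit WAV files already sampled at 16 kHz, and the function name concat_wavs_to_raw is hypothetical.

```python
import glob
import os
import wave


def concat_wavs_to_raw(src_dir, out_path, sample_rate=16000):
    """Append the samples of every .wav file under src_dir to one headerless file.

    Files are processed in sorted path order. Any file that is not mono,
    16-bit and at the expected sample rate raises ValueError, since no
    resampling or format conversion is attempted here.
    Returns the total number of samples written.
    """
    pattern = os.path.join(src_dir, "**", "*.wav")
    paths = sorted(glob.glob(pattern, recursive=True))
    total = 0
    with open(out_path, "wb") as out:
        for path in paths:
            with wave.open(path, "rb") as w:
                fmt = (w.getframerate(), w.getsampwidth(), w.getnchannels())
                if fmt != (sample_rate, 2, 1):
                    raise ValueError("unexpected format in %s: %r" % (path, fmt))
                frames = w.getnframes()
                out.write(w.readframes(frames))  # raw samples only, no header
                total += frames
    return total
```

For example, concat_wavs_to_raw("16k-LP7", "input.s16") would write a file in the format that dump_data -train expects as its first argument.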

Reading Further

  1. LPCNet: DSP-Boosted Neural Speech Synthesis
  2. Sample model files: https://jmvalin.ca/misc_stuff/lpcnet_models/