mirror of https://github.com/xiph/opus.git synced 2025-05-23 11:49:12 +00:00

Jean-Marc Valin 12f16df6b7 More work on making freq.[ch] more generic		2018-12-13 17:40:05 -05:00
_kiss_fft_guts.h	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
arch.h	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
causalconv.py	first wavenet implementation	2018-07-13 02:44:43 -04:00
celt_lpc.c	fix pitch	2018-06-25 02:10:31 -04:00
celt_lpc.h	fix pitch	2018-06-25 02:10:31 -04:00
common.h	Compute LPC from features	2018-12-07 18:16:19 -05:00
compile.sh	Moving the frame out of lpcnet.c and into test_lpcnet.c	2018-12-11 16:59:07 -05:00
COPYING	add license	2018-10-10 17:28:14 -04:00
dump_data.c	Avoiding an infinite loop	2018-12-12 11:00:33 -05:00
dump_lpcnet.py	Managing to actually use sparse matrices	2018-11-28 20:20:17 -05:00
freq.c	More work on making freq.[ch] more generic	2018-12-13 17:40:05 -05:00
freq.h	More work on making freq.[ch] more generic	2018-12-13 17:40:05 -05:00
gatedconv.py	wip...	2018-07-23 17:05:21 -04:00
kiss_fft.c	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
kiss_fft.h	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
lpcnet.c	Moving the frame out of lpcnet.c and into test_lpcnet.c	2018-12-11 16:59:07 -05:00
lpcnet.h	Moving the frame out of lpcnet.c and into test_lpcnet.c	2018-12-11 16:59:07 -05:00
lpcnet.py	Controlling per-gate sparsity	2018-12-10 16:15:50 -05:00
mdense.py	initial commit	2018-06-21 20:45:54 -04:00
nnet.c	refactored for different machines, sgemv_accum16 using NEON intrisics	2018-12-10 21:28:29 -05:00
nnet.h	Adding some sparse GRU support	2018-11-28 18:49:19 -05:00
opus_types.h	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
pitch.c	Fix NaN issue	2018-07-11 17:41:35 -04:00
pitch.h	fix pitch	2018-06-25 02:10:31 -04:00
README.md	Remove the need for useless exc and pred files	2018-12-01 12:05:23 -05:00
tansig_table.h	Work in progress translation to C	2018-11-23 19:43:58 -05:00
test_lpcnet.c	More work on making freq.[ch] more generic	2018-12-13 17:40:05 -05:00
test_lpcnet.py	Fix flooring of the pitch period	2018-12-10 11:23:31 -05:00
test_vec.c	Vectorization testing code	2018-12-11 01:41:27 -05:00
train_lpcnet.py	Controlling per-gate sparsity	2018-12-10 16:15:50 -05:00
ulaw.py	mu-law code cleanup	2018-10-09 02:39:12 -04:00
vec.h	refactored for different machines, sgemv_accum16 using NEON intrisics	2018-12-10 21:28:29 -05:00
vec_avx.h	refactored for different machines, sgemv_accum16 using NEON intrisics	2018-12-10 21:28:29 -05:00
vec_neon.h	refactored for different machines, sgemv_accum16 using NEON intrisics	2018-12-10 21:28:29 -05:00

README.md

LPCNet

Low complexity WaveRNN-based speech coding by Jean-Marc Valin

Introduction

Work-in-progress software for researching low-CPU-complexity algorithms for speech compression by applying linear prediction techniques to WaveRNN. The goal is to reduce CPU complexity so that high-quality speech can be synthesised on regular CPUs (around 1 GFLOP).

The BSD-licensed software is written in C and Python (Keras) and currently requires a GPU (e.g. GTX 1060) to run. For training models, a GTX 1080 Ti or better is recommended.

This software is also a useful resource as an open source starting point for WaveRNN-based speech coding.

Quickstart

Set up a Keras system with GPU.
In the src/ directory, run ./compile.sh to compile the data processing program.
Then, run the resulting executable:
```
./dump_data input.s16 features.f32 pcm.s16
```
where the first file contains 16 kHz, 16-bit raw PCM audio (no header) and the other two files are outputs. The input file currently used is 6 hours long, but you may be able to get away with less (and you can always resample by ±5% or ±10% to augment your data).
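The resampling-based augmentation mentioned above can be sketched as follows. This is only an illustration, not part of the LPCNet tools: the function names `resample_linear` and `augment_file` are hypothetical, the raw file is assumed to hold native-endian 16-bit samples, and plain linear interpolation is used for brevity (a proper resampler with anti-aliasing would give better quality).

```python
from array import array


def resample_linear(samples, factor):
    """Resample a sequence of 16-bit samples by linear interpolation.

    factor > 1 produces more samples (lower pitch when played back at the
    same rate); factor < 1 produces fewer samples (higher pitch).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) * factor)
    out = array("h")
    for i in range(n_out):
        pos = i / factor                      # position in the input signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        value = (1.0 - frac) * samples[left] + frac * samples[right]
        out.append(int(round(value)))
    return out


def augment_file(in_path, out_path, factor):
    """Read headerless 16-bit PCM, resample it, and write headerless PCM."""
    samples = array("h")                      # native byte order (assumption)
    with open(in_path, "rb") as f:
        samples.frombytes(f.read())
    with open(out_path, "wb") as f:
        resample_linear(samples, factor).tofile(f)
```

For example, `augment_file("input.s16", "input_105.s16", 1.05)` would write a copy stretched by 5%; the augmented files can then be concatenated with the original before running `dump_data`.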
Now that you have your files, you can do the training with:
```
./train_lpcnet.py features.f32 pcm.s16
```
and it will generate a wavenet*.h5 file for each iteration. If it stops with a "Failed to allocate RNN reserve space" message, try reducing the batch_size variable in train_lpcnet.py.
You can synthesise speech with:

    ./test_lpcnet.py features.f32 > pcm.txt

The output file pcm.txt contains ASCII PCM samples that need to be converted to WAV for playback.
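One possible way to do that conversion is sketched below with Python's standard `wave` module. The exact layout of pcm.txt is not documented here, so this assumes whitespace-separated sample values at 16 kHz, mono; the function name `ascii_pcm_to_wav` is hypothetical.

```python
import struct
import wave


def ascii_pcm_to_wav(txt_path, wav_path, sample_rate=16000):
    """Convert a text file of PCM sample values into a 16-bit mono WAV file."""
    samples = []
    with open(txt_path) as txt:
        for token in txt.read().split():
            value = int(round(float(token)))
            # Clip to the 16-bit range so struct.pack cannot overflow.
            samples.append(max(-32768, min(32767, value)))
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)           # mono (assumption)
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return len(samples)
```

Calling `ascii_pcm_to_wav("pcm.txt", "pcm.wav")` returns the number of samples written; the resulting file should play in any standard audio player.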

Speech Material for Training

Suitable training material can be obtained from the McGill University Telecommunications & Signal Processing Laboratory. Download the ISO and extract the 16k-LP7 directory; the src/concat.sh script can then be used to generate a headerless file of training samples:

    cd 16k-LP7
    sh ~/CELP/src/concat.sh
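The contents of concat.sh are not shown here. As a rough illustration of what producing a headerless training file involves, the sketch below assumes the extracted directory holds 16 kHz, 16-bit, mono WAV files and simply appends their sample data to one output file. The function name `concat_wavs_to_raw` is hypothetical, and this is not necessarily how the script itself works.

```python
import os
import wave


def concat_wavs_to_raw(src_dir, out_path, rate=16000):
    """Append the samples of every WAV file under src_dir to one raw file."""
    wav_paths = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            if name.lower().endswith(".wav"):
                wav_paths.append(os.path.join(root, name))
    wav_paths.sort()                  # deterministic order
    total = 0
    with open(out_path, "wb") as out:
        for path in wav_paths:
            with wave.open(path, "rb") as wav:
                fmt = (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())
                if fmt != (rate, 2, 1):
                    raise ValueError("unexpected format in %s: %r" % (path, fmt))
                frames = wav.readframes(wav.getnframes())
            out.write(frames)         # sample data only, no header
            total += len(frames) // 2
    return total
```

For instance, `concat_wavs_to_raw("16k-LP7", "input.s16")` returns the total number of samples written, which gives a quick check on how much training audio is available (divide by 16000 for seconds).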

Reading Further

If you're lucky, you may be able to get the current model at: https://jmvalin.ca/misc_stuff/lpcnet_models/
WaveNet and Codec 2