mirror of https://github.com/xiph/opus.git synced 2025-05-19 09:58:30 +00:00

Jean-Marc Valin 77d02dbd2f Using macros for sizes in the demo		2019-03-27 14:12:52 -04:00
doc	Fixing Makefile	2019-03-18 21:54:31 -04:00
include	Using macros for sizes in the demo	2019-03-27 14:12:52 -04:00
m4	Copied from RNNoise directly	2019-03-18 19:57:40 -04:00
_kiss_fft_guts.h	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
arch.h	Remove NaN checks	2019-03-20 13:36:42 -04:00
AUTHORS	Copied from RNNoise directly	2019-03-18 19:57:40 -04:00
autogen.sh	Making autogen.sh download and unpack the model	2019-03-18 21:43:36 -04:00
causalconv.py	first wavenet implementation	2018-07-13 02:44:43 -04:00
celt_lpc.c	fix pitch	2018-06-25 02:10:31 -04:00
celt_lpc.h	fix pitch	2018-06-25 02:10:31 -04:00
ceps_vq_train.c	minor update to training code	2019-03-12 14:43:13 -04:00
common.c	Add LPCNet decoder object	2019-03-18 14:13:07 -04:00
common.h	Using log approximations	2019-01-01 14:37:19 -05:00
compile.sh	Moving the frame out of lpcnet.c and into test_lpcnet.c	2018-12-11 16:59:07 -05:00
concat.sh	added concat.sh script	2018-12-16 09:31:50 +10:30
configure.ac	Add dump_data	2019-03-19 14:42:23 -04:00
COPYING	add license	2018-10-10 17:28:14 -04:00
dump_data.c	Split off decoder code	2019-03-17 13:25:43 -04:00
dump_lpcnet.py	Use a single u-law embedding	2019-01-21 16:52:57 -05:00
freq.c	More work on making freq.[ch] more generic	2018-12-13 17:40:05 -05:00
freq.h	WIP: Splitting off the encoder	2019-03-15 02:44:56 -04:00
gatedconv.py	wip...	2018-07-23 17:05:21 -04:00
kiss_fft.c	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
kiss_fft.h	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
lpcnet-uninstalled.pc.in	s/rnnoise/lpcnet/ (untested)	2019-03-18 20:05:14 -04:00
lpcnet.c	Make param ordering consistent for lpcnet_synthesize()	2019-03-27 14:06:46 -04:00
lpcnet.pc.in	s/rnnoise/lpcnet/ (untested)	2019-03-18 20:05:14 -04:00
lpcnet.py	20-bit VQ	2019-02-15 15:13:14 -05:00
lpcnet_dec.c	Add LPCNet decoder object	2019-03-18 14:13:07 -04:00
lpcnet_demo.c	Using macros for sizes in the demo	2019-03-27 14:12:52 -04:00
lpcnet_enc.c	Fixing dynamic libraries	2019-03-18 21:53:28 -04:00
lpcnet_private.h	Add LPCNet decoder object	2019-03-18 14:13:07 -04:00
Makefile.am	Add dump_data	2019-03-19 14:42:23 -04:00
mdense.py	initial commit	2018-06-21 20:45:54 -04:00
nnet.c	refactored for different machines, sgemv_accum16 using NEON intrisics	2018-12-10 21:28:29 -05:00
nnet.h	Adding some sparse GRU support	2018-11-28 18:49:19 -05:00
opus_types.h	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
pitch.c	Fix NaN issue	2018-07-11 17:41:35 -04:00
pitch.h	fix pitch	2018-06-25 02:10:31 -04:00
README	Add README	2019-03-19 04:08:12 -04:00
README.md	README.md update	2019-03-22 14:37:01 -04:00
tansig_table.h	Work in progress translation to C	2018-11-23 19:43:58 -05:00
test_lpcnet.c	Make param ordering consistent for lpcnet_synthesize()	2019-03-27 14:06:46 -04:00
test_lpcnet.py	Use real features at the chunk edges rather than zeros	2019-01-24 14:16:30 -05:00
test_vec.c	Vectorization testing code	2018-12-11 01:41:27 -05:00
train_lpcnet.py	Making it easier to adapt (or not) a model	2019-03-24 03:48:26 -04:00
ulaw.py	mu-law code cleanup	2018-10-09 02:39:12 -04:00
update_version	Copied from RNNoise directly	2019-03-18 19:57:40 -04:00
vec.h	Remove NaN checks	2019-03-20 13:36:42 -04:00
vec_avx.h	refactored for different machines, sgemv_accum16 using NEON intrisics	2018-12-10 21:28:29 -05:00
vec_neon.h	Neon: Make gcc actually generate VMLA instructions for sparse mul	2019-03-20 12:58:39 -04:00

README.md

LPCNet

Low complexity implementation of the WaveRNN-based LPCNet algorithm, as described in:

J.-M. Valin, J. Skoglund, LPCNet: Improving Neural Speech Synthesis Through Linear Prediction, Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), arXiv:1810.11846, 2019.

Introduction

Work-in-progress software for researching low-CPU-complexity algorithms for speech synthesis and compression by applying linear prediction techniques to WaveRNN. High-quality speech can be synthesised on regular CPUs (around 3 GFLOPS) with SIMD support (AVX, AVX2/FMA, and NEON are currently supported). The code also supports very low bitrate compression at 1.6 kb/s.

The BSD-licensed software is written in C and Python/Keras. For training, a GTX 1080 Ti or better is recommended.

This software is an open source starting point for LPCNet/WaveRNN-based speech synthesis and coding.

Using the existing software

You can build the code using:

./autogen.sh
./configure
make

Note that the autogen.sh script is used when building from Git and will automatically download the latest model (models are too large to put in Git).

It is highly recommended to set the CFLAGS environment variable to enable AVX or NEON before running configure. Otherwise no vectorization will take place and the code will be very slow. On a recent x86 CPU, something like

export CFLAGS='-O3 -g -mavx2 -mfma'

should work. On ARM, you can enable NEON with:

export CFLAGS='-O3 -g -mfpu=neon'
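
As a rough aid for choosing between the two settings above, the sketch below scans CPU capability flags and returns the matching CFLAGS string from this README. It assumes a Linux system that exposes `/proc/cpuinfo`; the function name `suggest_cflags` is hypothetical and not part of LPCNet. 64-bit ARM kernels typically report `asimd` rather than `neon`, which this sketch does not handle.

```python
from typing import Optional

# CFLAGS strings quoted from the README text above.
X86_CFLAGS = "-O3 -g -mavx2 -mfma"
ARM_CFLAGS = "-O3 -g -mfpu=neon"


def suggest_cflags(cpuinfo_text: str) -> Optional[str]:
    """Return a README-suggested CFLAGS string, or None if nothing matches.

    Hypothetical helper: it only inspects the "flags" (x86) and
    "Features" (32-bit ARM) lines of /proc/cpuinfo-style text.
    """
    tokens = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            tokens.update(value.split())
    if {"avx2", "fma"} <= tokens:
        return X86_CFLAGS
    if "neon" in tokens:
        return ARM_CFLAGS
    return None


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # assumption failed: not a Linux-style system
    flags = suggest_cflags(text)
    if flags:
        print(f"export CFLAGS='{flags}'")
    else:
        print("No AVX2/FMA or NEON flags detected; choose CFLAGS manually.")
```

The output is only a suggestion; the compiler and target still have to accept the flags.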

You can test the capabilities of LPCNet using the lpcnet_demo application. To encode a file:

./lpcnet_demo -encode input.pcm compressed.bin

where input.pcm is a 16-bit (machine endian) PCM file sampled at 16 kHz. The raw compressed data (no header) is written to compressed.bin and consists of 8 bytes per 40-ms packet.
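
The figures above can be cross-checked with a little arithmetic: 8 bytes every 40 ms corresponds to the 1.6 kb/s rate quoted in the introduction. The sketch below works that out and estimates the size of compressed.bin from the size of input.pcm. The helper names are hypothetical, and it assumes that only complete 40-ms packets are encoded; how lpcnet_demo treats a trailing partial packet is not stated here.

```python
SAMPLE_RATE_HZ = 16000  # 16 kHz input, from the text above
BYTES_PER_SAMPLE = 2    # 16-bit PCM
PACKET_MS = 40          # one packet per 40 ms
PACKET_BYTES = 8        # 8 bytes per packet


def bitrate_bps() -> float:
    """Bits per second implied by the packet size and duration."""
    return PACKET_BYTES * 8 * 1000 / PACKET_MS


def pcm_bytes_per_packet() -> int:
    """Bytes of raw input PCM covered by one 40-ms packet."""
    return SAMPLE_RATE_HZ * PACKET_MS // 1000 * BYTES_PER_SAMPLE


def expected_compressed_size(pcm_size_bytes: int) -> int:
    """Estimated compressed size, counting complete packets only (assumption)."""
    return (pcm_size_bytes // pcm_bytes_per_packet()) * PACKET_BYTES


if __name__ == "__main__":
    print(bitrate_bps())                     # 1600.0 bits/s, i.e. 1.6 kb/s
    print(pcm_bytes_per_packet())            # 1280 bytes of PCM per packet
    print(expected_compressed_size(320000))  # 10 s of audio -> 2000 bytes
```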

To decode:

./lpcnet_demo -decode compressed.bin output.pcm

where output.pcm is also 16-bit, 16 kHz PCM.
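
Because output.pcm has no header, most audio players will not open it directly. A minimal sketch of wrapping it in a WAV container with Python's standard `wave` module follows. The function name `pcm_to_wav` is hypothetical, and a single (mono) channel is an assumption, since the text above specifies only the sample width and rate.

```python
import array
import sys
import wave


def pcm_to_wav(pcm_path: str, wav_path: str, sample_rate: int = 16000) -> None:
    """Wrap headerless 16-bit machine-endian PCM in a WAV container."""
    with open(pcm_path, "rb") as f:
        data = f.read()
    samples = array.array("h")  # signed 16-bit, native byte order
    # Drop a trailing odd byte, if any, so the data is a whole number of samples.
    data = data[: len(data) - len(data) % samples.itemsize]
    samples.frombytes(data)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)   # assumption: mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(samples.tobytes())


if __name__ == "__main__" and len(sys.argv) == 3:
    pcm_to_wav(sys.argv[1], sys.argv[2])
```

For example, saving the block as a script and running it with `output.pcm output.wav` as arguments would produce a file that ordinary players can open.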

The same functionality is available in the form of a library. See include/lpcnet.h for the API.

Training a new model

This codebase is also meant for research and it is possible to train new models. These are the steps to do that:

1. Set up a Keras system with GPU.

2. Generate training data:
   ```
   ./dump_data -train input.s16 features.f32 data.u8
   ```
   where the first file contains 16 kHz 16-bit raw PCM audio (no header) and the other files are output files. This program makes several passes over the data with different filters to generate a large amount of training data.

3. Now that you have your files, train with:
   ```
   ./src/train_lpcnet.py features.f32 data.u8
   ```
   and it will generate an lpcnet*.h5 file for each iteration. If it stops with a "Failed to allocate RNN reserve space" message, try reducing the batch_size variable in train_lpcnet.py.

4. You can synthesise speech with Python and your GPU card:
   ```
   ./dump_data -test test_input.s16 test_features.f32
   ./src/test_lpcnet.py test_features.f32 test.s16
   ```
   Note that the .h5 filename is hard-coded in test_lpcnet.py; edit it to point to your own .h5 file.

5. Or with C on a CPU: first extract the model files nnet_data.h and nnet_data.c
   ```
   ./dump_lpcnet.py lpcnet15_384_10_G16_64.h5
   ```
   and move the generated nnet_data.* files to the src/ directory. Then rebuild the software and use lpcnet_demo as explained above.
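
Since training writes one lpcnet*.h5 file per iteration, and the later steps need a specific filename, a small helper for locating a checkpoint can be handy. The sketch below returns the most recently modified match. The name `latest_checkpoint` is hypothetical, and picking the newest file is an assumption: the most recent checkpoint is not necessarily the best model.

```python
import glob
import os
from typing import Optional


def latest_checkpoint(directory: str = ".") -> Optional[str]:
    """Return the most recently modified lpcnet*.h5 file, or None if absent."""
    candidates = glob.glob(os.path.join(directory, "lpcnet*.h5"))
    if not candidates:
        return None
    # Assumption: modification time reflects training order.
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    path = latest_checkpoint()
    print(path if path else "no lpcnet*.h5 files found")
```

The printed path can then be passed to dump_lpcnet.py or written into test_lpcnet.py by hand.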

Speech Material for Training

Suitable training material can be obtained from the McGill University Telecommunications & Signal Processing Laboratory. Download the ISO and extract the 16k-LP7 directory. The src/concat.sh script can then be used to generate a headerless file of training samples:

cd 16k-LP7
sh /path/to/concat.sh
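
For readers without a shell environment, the idea of building one headerless training file can be sketched in Python. This is not a transcription of concat.sh, whose contents are not shown here. It assumes the extracted material consists of `.wav` files, keeps only 16 kHz, 16-bit mono files, and writes samples in native byte order (the byte order dump_data expects is not stated above). The function name `concat_wavs` is hypothetical.

```python
import array
import glob
import os
import sys
import wave


def concat_wavs(root: str, out_path: str, sample_rate: int = 16000) -> int:
    """Append the samples of every suitable WAV under root to one raw file.

    Returns the number of files that were used.
    """
    pattern = os.path.join(root, "**", "*.wav")
    paths = sorted(glob.glob(pattern, recursive=True))
    used = 0
    with open(out_path, "wb") as out:
        for path in paths:
            with wave.open(path, "rb") as w:
                suitable = (w.getnchannels() == 1
                            and w.getsampwidth() == 2
                            and w.getframerate() == sample_rate)
                if not suitable:
                    continue  # skip anything that is not 16 kHz, 16-bit mono
                samples = array.array("h")
                samples.frombytes(w.readframes(w.getnframes()))
            if sys.byteorder == "big":
                samples.byteswap()  # WAV data is little-endian
            out.write(samples.tobytes())  # header is dropped, samples only
            used += 1
    return used


if __name__ == "__main__" and len(sys.argv) == 3:
    print(concat_wavs(sys.argv[1], sys.argv[2]), "files concatenated")
```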

Reading Further

LPCNet: DSP-Boosted Neural Speech Synthesis
Sample model files: https://jmvalin.ca/misc_stuff/lpcnet_models/