mirror of https://github.com/xiph/opus.git synced 2025-05-17 17:08:29 +00:00

doc	Fixing Makefile	2019-03-18 21:54:31 -04:00
include	FEC hooks in the PLC code	2022-09-27 02:24:21 -04:00
m4	Copied from RNNoise directly	2019-03-18 19:57:40 -04:00
training_tf2	Move back to tanh for frame rate network	2022-09-24 03:22:57 -04:00
_kiss_fft_guts.h	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
arch.h	Remove NaN checks	2019-03-20 13:36:42 -04:00
AUTHORS	Copied from RNNoise directly	2019-03-18 19:57:40 -04:00
autogen.sh	Decreasing look-ahead of default model to 1 frame	2022-09-24 03:25:33 -04:00
burg.c	Adding Burg spectral estimation code	2022-02-03 00:28:23 -05:00
burg.h	Adding Burg spectral estimation code	2022-02-03 00:28:23 -05:00
ceps_vq_train.c	minor update to training code	2019-03-12 14:43:13 -04:00
common.c	Add LPCNet decoder object	2019-03-18 14:13:07 -04:00
common.h	Using log approximations	2019-01-01 14:37:19 -05:00
compile.sh	Moving the frame out of lpcnet.c and into test_lpcnet.c	2018-12-11 16:59:07 -05:00
concat.sh	added concat.sh script	2018-12-16 09:31:50 +10:30
configure.ac	Adding option to disable int8 dot products	2021-06-24 17:31:05 -04:00
COPYING	add license	2018-10-10 17:28:14 -04:00
datasets.txt	Update model	2021-08-02 19:02:29 -04:00
download_model.sh	Merge branch 'plc_challenge' into master	2022-09-07 00:38:55 -04:00
dump_data.c	Using Burg cepstrum for feature prediction	2022-02-04 22:04:23 -05:00
freq.c	Merge branch 'plc_challenge' into master	2022-09-07 00:38:55 -04:00
freq.h	Merge branch 'plc_challenge' into master	2022-09-07 00:38:55 -04:00
kiss99.c	Minor fixes to kiss99	2021-11-10 18:01:42 -05:00
kiss99.h	Minor fixes to kiss99	2021-11-10 18:01:42 -05:00
kiss_fft.c	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
kiss_fft.h	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
lpcnet-uninstalled.pc.in	s/rnnoise/lpcnet/ (untested)	2019-03-18 20:05:14 -04:00
lpcnet.c	Fix causal PLC for models with non-zero lookahead	2022-09-16 01:44:53 -04:00
lpcnet.pc.in	s/rnnoise/lpcnet/ (untested)	2019-03-18 20:05:14 -04:00
LPCNet.yml	Add conda env file with working tensorflow and keras version for LPCNet	2020-07-31 17:25:29 -04:00
lpcnet_dec.c	Avoiding more symbol clashes with Opus	2022-01-25 00:08:27 -05:00
lpcnet_demo.c	Add delay-compensation for non-causal PLC	2022-02-21 22:52:39 -05:00
lpcnet_enc.c	Enable pitch xcorr refining	2022-02-16 23:09:27 -05:00
lpcnet_plc.c	FEC hooks in the PLC code	2022-09-27 02:24:21 -04:00
lpcnet_private.h	FEC hooks in the PLC code	2022-09-27 02:24:21 -04:00
Makefile.am	Adding Burg spectral estimation code	2022-02-03 00:28:23 -05:00
nnet.c	Avoiding tmp buffer overflows	2022-02-03 00:27:20 -05:00
nnet.h	Avoiding symbol clashes with Opus	2022-01-24 23:21:31 -05:00
opus_types.h	Importing DSP code from RNNoise	2018-06-24 02:41:36 -04:00
pitch.c	Avoiding more symbol clashes with Opus	2022-01-25 00:08:27 -05:00
pitch.h	Making codebase C90-compliant	2022-01-19 18:10:44 -05:00
README	Add README	2019-03-19 04:08:12 -04:00
README.md	Merge branch 'plc_challenge' into master	2022-09-07 00:38:55 -04:00
tansig_table.h	WIP: signed*unsigned arithmetic	2021-01-16 02:11:20 -05:00
test_lpcnet.c	Removing the unused features	2021-07-29 03:20:59 -04:00
test_vec.c	Vectorization testing code	2018-12-11 01:41:27 -05:00
update_version	Copied from RNNoise directly	2019-03-18 19:57:40 -04:00
vec.h	Making codebase C90-compliant	2022-01-19 18:10:44 -05:00
vec_avx.h	Making codebase C90-compliant	2022-01-19 18:10:44 -05:00
vec_neon.h	NEON float->char conversion (same as the AVX2 version)	2021-07-10 01:59:49 -04:00

README.md

LPCNet

Low complexity implementation of the WaveRNN-based LPCNet algorithm, as described in:

J.-M. Valin, J. Skoglund, LPCNet: Improving Neural Speech Synthesis Through Linear Prediction, Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), arXiv:1810.11846, 2019.
J.-M. Valin, U. Isik, P. Smaragdis, A. Krishnaswamy, Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet, Proc. ICASSP, arXiv:2106.04129, 2022.
K. Subramani, J.-M. Valin, U. Isik, P. Smaragdis, A. Krishnaswamy, End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation, Proc. INTERSPEECH, arXiv:2106.04129, 2022.

For coding/PLC applications of LPCNet, see:

J.-M. Valin, J. Skoglund, A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet, Proc. INTERSPEECH, arXiv:1903.12087, 2019.
J. Skoglund, J.-M. Valin, Improving Opus Low Bit Rate Quality with Neural Speech Synthesis, Proc. INTERSPEECH, arXiv:1905.04628, 2020.
J.-M. Valin, A. Mustafa, C. Montgomery, T.B. Terriberry, M. Klingbeil, P. Smaragdis, A. Krishnaswamy, Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model, Proc. INTERSPEECH, arXiv:2205.05785, 2022.

Introduction

Work-in-progress software for researching low-CPU-complexity algorithms for speech synthesis and compression by applying linear prediction techniques to WaveRNN. High-quality speech can be synthesised on regular CPUs (around 3 GFLOPS) with SIMD support (SSE2, SSSE3, AVX, AVX2/FMA, and NEON are currently supported). The code also supports very low bitrate compression at 1.6 kb/s.
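
As a rough illustration of the linear prediction idea mentioned above (this is not code from the repository), each sample can be predicted as a weighted sum of previous samples, so that only the prediction residual (the excitation) remains to be modelled. The function name `lpc_residual` and the toy second-order coefficients are assumptions made for this sketch.

```python
def lpc_residual(signal, coeffs):
    """Return (prediction, residual) for a list of samples.

    prediction[t] = sum(coeffs[k] * signal[t - 1 - k]) over the available past
    residual[t]   = signal[t] - prediction[t]
    Samples before the start of the signal are treated as zero.
    """
    prediction = []
    residual = []
    for t in range(len(signal)):
        p = 0.0
        for k, a in enumerate(coeffs):
            if t - 1 - k >= 0:
                p += a * signal[t - 1 - k]
        prediction.append(p)
        residual.append(signal[t] - p)
    return prediction, residual


# Toy autoregressive signal (hypothetical): s[t] = 1.6*s[t-1] - 0.8*s[t-2],
# started from a single impulse at t = 0.
coeffs = [1.6, -0.8]
signal = [1.0]
for t in range(1, 50):
    prev1 = signal[t - 1]
    prev2 = signal[t - 2] if t >= 2 else 0.0
    signal.append(1.6 * prev1 - 0.8 * prev2)

prediction, residual = lpc_residual(signal, coeffs)
# With matching coefficients, only the initial impulse is left in the residual.
print(max(abs(e) for e in residual[1:]))
```

In this toy case the predictor matches the signal model exactly, so the residual after the first sample is negligible. Real speech is only approximately predictable, which is why a network is still needed to model the excitation.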

The BSD-licensed software is written in C and Python/Keras. For training, a GTX 1080 Ti or better is recommended.

This software is an open source starting point for LPCNet/WaveRNN-based speech synthesis and coding.

Using the existing software

You can build the code using:

./autogen.sh
./configure
make

Note that the autogen.sh script is used when building from Git and will automatically download the latest model (models are too large to put in Git). By default, LPCNet will attempt to use 8-bit dot product instructions on AVX*/NEON to speed up inference. To disable that (e.g. to avoid quantization effects when retraining), pass --disable-dot-product to the configure script. LPCNet does not yet have a complete implementation of some of the integer operations on the ARMv7 architecture, so for now you will also need --disable-dot-product to compile successfully on 32-bit ARM.

It is highly recommended to set the CFLAGS environment variable to enable AVX or NEON prior to running configure; otherwise, no vectorization will take place and the code will be very slow. On a recent x86 CPU, something like

export CFLAGS='-Ofast -g -march=native'

should work. On ARM, you can enable NEON with:

export CFLAGS='-Ofast -g -mfpu=neon'

While not strictly required, the -Ofast flag will help with auto-vectorization, especially for dot products that cannot be optimized without -ffast-math (which -Ofast enables). Additionally, -falign-loops=32 has been shown to help on x86.

You can test the capabilities of LPCNet using the lpcnet_demo application. To encode a file:

./lpcnet_demo -encode input.pcm compressed.bin

where input.pcm is a 16-bit (machine endian) PCM file sampled at 16 kHz. The raw compressed data (no header) is written to compressed.bin and consists of 8 bytes per 40-ms packet.
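
If your audio is stored as WAV rather than headerless PCM, a minimal conversion sketch using only the Python standard library could look like the following. The function name `wav_to_raw_pcm` and the file names are hypothetical, and the check for mono audio is an assumption (the text above only specifies 16-bit samples at 16 kHz).

```python
import array
import sys
import wave


def wav_to_raw_pcm(wav_path, pcm_path):
    """Strip the WAV header and write headerless 16-bit PCM in machine byte order.

    Returns the number of samples written.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio (assumption of this sketch)")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != 16000:
            raise ValueError("expected a 16 kHz sampling rate")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)   # WAV sample data is little-endian
    if sys.byteorder == "big":
        samples.byteswap()      # convert the values to machine endianness
    with open(pcm_path, "wb") as out:
        samples.tofile(out)
    return len(samples)


# Example (hypothetical file names):
# wav_to_raw_pcm("speech.wav", "input.pcm")
```

The sketch refuses files with a different sampling rate instead of resampling them, so any resampling has to be done beforehand with a separate tool.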

To decode:

./lpcnet_demo -decode compressed.bin output.pcm

where output.pcm is also 16-bit, 16 kHz PCM.
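
The packet layout described above (8 bytes per 40-ms packet) can be checked against the advertised 1.6 kb/s with a little arithmetic. The sketch below is only an illustration; the helper names are hypothetical, and the sample count is the nominal amount of audio covered by the packets, not a guarantee about the exact length of the decoder output.

```python
PACKET_BYTES = 8      # bytes per packet, as stated above
PACKET_MS = 40        # milliseconds of audio per packet
SAMPLE_RATE = 16000   # Hz


def bitrate_bps():
    """Bitrate implied by the packet size and packet duration."""
    return PACKET_BYTES * 8 * 1000 // PACKET_MS


def describe_compressed(num_bytes):
    """Summarise a headerless compressed file of the given size in bytes."""
    packets, leftover = divmod(num_bytes, PACKET_BYTES)
    return {
        "packets": packets,
        "leftover_bytes": leftover,  # non-zero suggests a truncated file
        "duration_s": packets * PACKET_MS / 1000,
        "samples": packets * PACKET_MS * SAMPLE_RATE // 1000,
    }


print(bitrate_bps())             # 1600 bits per second, i.e. 1.6 kb/s
print(describe_compressed(800))  # 100 packets covering 4.0 s of audio
```

To apply this to a real file, pass `os.path.getsize("compressed.bin")` to `describe_compressed`.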

Alternatively, you can run the uncompressed analysis/synthesis using -features instead of -encode and -synthesis instead of -decode. The same functionality is available in the form of a library. See include/lpcnet.h for the API.

To try packet loss concealment (PLC), you first need a PLC model, which you can get with:

./download_model.sh plc-3b1eab4

or (for the PLC challenge submission):

./download_model.sh plc_challenge

PLC can be tested with:

./lpcnet_demo -plc_file noncausal_dc error_pattern.txt input.pcm output.pcm

where error_pattern.txt is a text file with one entry per 20-ms packet, with 1 meaning "packet lost" and 0 meaning "packet not lost". noncausal_dc selects the non-causal mode (5-ms look-ahead) with special handling for DC offsets. It is also possible to use "noncausal", "causal", or "causal_dc".
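
For quick experiments, an error pattern can be generated rather than written by hand. The sketch below assumes one entry per line and independent random losses (real network losses tend to come in bursts); the function names and the 10% loss rate are illustrative choices, not part of LPCNet.

```python
import random


def make_error_pattern(num_packets, loss_rate, seed=0):
    """Return a list of 0/1 flags, where 1 means "packet lost"."""
    rng = random.Random(seed)
    return [1 if rng.random() < loss_rate else 0 for _ in range(num_packets)]


def write_error_pattern(path, pattern):
    """Write one flag per line (assumed layout of the pattern file)."""
    with open(path, "w") as f:
        for flag in pattern:
            f.write(f"{flag}\n")


def packets_for_pcm(num_bytes, sample_rate=16000, packet_ms=20):
    """Number of 20-ms packets needed to cover a 16-bit PCM file of num_bytes."""
    samples = num_bytes // 2
    samples_per_packet = sample_rate * packet_ms // 1000
    return -(-samples // samples_per_packet)  # ceiling division


# Example: a pattern for 10 seconds of audio with roughly 10% loss.
pattern = make_error_pattern(packets_for_pcm(10 * 16000 * 2), loss_rate=0.1)
write_error_pattern("error_pattern.txt", pattern)
```

Fixing the seed keeps the pattern reproducible, which makes it easier to compare the different PLC modes on the same losses.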

Training a new model

This codebase is also meant for research and it is possible to train new models. These are the steps to do that:

1. Set up a Keras system with GPU.
2. Generate training data:
   ```
   ./dump_data -train input.s16 features.f32 data.s16
   ```
   where the first file contains 16 kHz 16-bit raw PCM audio (no header) and the other files are output files. This program makes several passes over the data with different filters to generate a large amount of training data.
3. Now that you have your files, train with:
   ```
   python3 training_tf2/train_lpcnet.py features.f32 data.s16 model_name
   ```
   and it will generate an h5 file for each iteration, with model_name as prefix. If it stops with a "Failed to allocate RNN reserve space" message, try specifying a smaller --batch-size for train_lpcnet.py.

You can synthesise speech with Python and your GPU card (very slow):

./dump_data -test test_input.s16 test_features.f32
./training_tf2/test_lpcnet.py lpcnet_model_name.h5 test_features.f32 test.s16

Or with C on a CPU (C inference is much faster). First, extract the model files nnet_data.h and nnet_data.c:
```
./training_tf2/dump_lpcnet.py lpcnet_model_name.h5
```
and move the generated nnet_data.* files to the src/ directory. Then you just need to rebuild the software and use lpcnet_demo as explained above.
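
The data generation, training, and model export commands above can be chained from a small driver script. This is only a sketch: the helper names are hypothetical, the commands are assumed to be run from the repository root, and the name of the h5 file to export has to be supplied by you, since it depends on which training iteration you pick.

```python
import subprocess


def dump_data_cmd(pcm_in, features_out, data_out):
    """Command that generates training data from raw PCM."""
    return ["./dump_data", "-train", pcm_in, features_out, data_out]


def train_cmd(features, data, model_name, batch_size=None):
    """Training command; batch_size is optional and only added when given."""
    cmd = ["python3", "training_tf2/train_lpcnet.py", features, data, model_name]
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    return cmd


def export_cmd(h5_path):
    """Command that extracts nnet_data.h and nnet_data.c from a trained model."""
    return ["./training_tf2/dump_lpcnet.py", h5_path]


def plan(pcm_in, model_name, h5_path, batch_size=None):
    """Return the three commands in the order they should be run."""
    return [
        dump_data_cmd(pcm_in, "features.f32", "data.s16"),
        train_cmd("features.f32", "data.s16", model_name, batch_size),
        export_cmd(h5_path),
    ]


def run(steps, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Dry run: prints the commands without executing anything.
run(plan("input.s16", "model_name", "lpcnet_model_name.h5"))
```

Keeping `dry_run=True` by default makes it easy to inspect the commands before committing to a long training run.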

Speech Material for Training

Suitable training material can be obtained from Open Speech and Language Resources. See the datasets.txt file for details on suitable training data.

Reading Further

LPCNet: DSP-Boosted Neural Speech Synthesis
A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet
Sample model files (check compatibility): https://media.xiph.org/lpcnet/data/