mirror of
https://github.com/xiph/opus.git
synced 2025-06-06 15:30:48 +00:00
Update DRED REDME.md
parent
b7e1c4f0aa
commit
ddbe483839
2 changed files with 41 additions and 14 deletions
|
@ -1,6 +1,9 @@
|
||||||
# Framewise Auto-Regressive GAN (FARGAN)
|
# Framewise Auto-Regressive GAN (FARGAN)
|
||||||
|
|
||||||
Implementation of FARGAN, a low-complexity neural vocoder.
|
Implementation of FARGAN, a low-complexity neural vocoder. Pre-trained models
|
||||||
|
are provided as C code in the dnn/ directory with the corresponding model in
|
||||||
|
dnn/models/ directory (name starts with fargan_). If you don't want to train
|
||||||
|
a new FARGAN model, you can skip straight to the Inference section.
|
||||||
|
|
||||||
## Data preparation
|
## Data preparation
|
||||||
|
|
||||||
|
|
|
@ -1,24 +1,48 @@
|
||||||
# Rate-Distortion-Optimized Variational Auto-Encoder
|
# Deep REDundancy (DRED) with RDO-VAE
|
||||||
|
|
||||||
## Setup
|
This is a rate-distortion-optimized variational autoencoder (RDO-VAE) designed
|
||||||
The python code requires python >= 3.6 and has been tested with python 3.6 and python 3.10. To install requirements run
|
for coding redundancy information. Pre-trained models are provided as C code
|
||||||
|
in the dnn/ directory with the corresponding model in dnn/models/ directory
|
||||||
|
(name starts with rdovae_). If you don't want to train a new DRED model, you can
|
||||||
|
skip straight to the Inference section.
|
||||||
|
|
||||||
|
## Data preparation
|
||||||
|
|
||||||
|
For data preparation you need to build Opus as detailed in the top-level README.
|
||||||
|
You will need to use the --enable-dred configure option.
|
||||||
|
The build will produce an executable named "dump_data".
|
||||||
|
To prepare the training data, run:
|
||||||
```
|
```
|
||||||
python -m pip install -r requirements.txt
|
./dump_data -train in_speech.pcm out_features.f32 out_speech.pcm
|
||||||
```
|
```
|
||||||
|
Here, in_speech.pcm is a raw 16-bit PCM speech file sampled at 16 kHz.
|
||||||
|
The speech data used for training the model can be found at:
|
||||||
|
https://media.xiph.org/lpcnet/speech/tts_speech_negative_16k.sw
|
||||||
|
The out_speech.pcm file isn't needed for DRED, but it is needed to train
|
||||||
|
the FARGAN vocoder (see dnn/torch/fargan/ for details).
|
||||||
|
|
||||||
## Training
|
## Training
|
||||||
To generate training data use dump date from the main LPCNet repo
|
|
||||||
```
|
|
||||||
./dump_data -train 16khz_speech_input.s16 features.f32 data.s16
|
|
||||||
```
|
|
||||||
|
|
||||||
To train the model, simply run
|
To perform training, run the following command:
|
||||||
```
|
```
|
||||||
python train_rdovae.py features.f32 output_folder
|
python ./train_rdovae.py --cuda-visible-devices 0 --sequence-length 400 --split-mode random_split --state-dim 80 --batch-size 512 --epochs 400 --lambda-max 0.04 --lr 0.003 --lr-decay-factor 0.0001 out_features.f32 output_dir
|
||||||
```
|
```
|
||||||
|
The final model will be in output_dir/checkpoints/checkpoint_400.pth.
|
||||||
|
|
||||||
To train on CUDA device add `--cuda-visible-devices idx`.
|
The model can be converted to C using:
|
||||||
|
```
|
||||||
|
python export_rdovae_weights.py output_dir/checkpoints/checkpoint_400.pth dred_c_dir
|
||||||
|
```
|
||||||
|
which will create a number of C source and header files in the dred_c_dir directory.
|
||||||
|
Copy these files to the opus/dnn/ directory (replacing the existing ones) and recompile Opus.
|
||||||
|
|
||||||
|
## Inference
|
||||||
|
|
||||||
## ToDo
|
DRED is integrated within the Opus codec and can be evaluated using the opus_demo
|
||||||
- Upload checkpoints and add URLs
|
executable. For example:
|
||||||
|
```
|
||||||
|
./opus_demo voip 16000 1 64000 -loss 50 -dred 100 -sim_loss 50 input.pcm output.pcm
|
||||||
|
```
|
||||||
|
This tells the encoder to encode a 16 kHz raw audio file at 64 kb/s using up to 1 second
|
||||||
|
of redundancy (the -dred value is in units of 10 ms) and then simulate 50% loss. Refer to `opus_demo --help`
|
||||||
|
for more details.
|
||||||
|
|
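The data preparation step in the new README expects in_speech.pcm to be a raw 16-bit PCM file sampled at 16 kHz. If the source speech is stored as WAV instead, a conversion along the following lines may help. This is only a sketch: the function name `wav_to_raw_pcm` and the file names are hypothetical, and the mono check is an assumption, since the README does not state the channel count that dump_data expects.

```python
import wave


def wav_to_raw_pcm(wav_path, pcm_path, expected_rate=16000):
    """Copy the samples of a 16-bit WAV file into a headerless PCM file.

    Returns the number of 16-bit samples written.
    """
    with wave.open(wav_path, "rb") as wav:
        # The README describes the input as raw 16-bit PCM sampled at 16 kHz.
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wav.getframerate()} Hz"
            )
        # Assumption: mono input (not stated in the README).
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        # WAV stores 16-bit samples as little-endian signed integers,
        # so the frame bytes can be written out unchanged.
        frames = wav.readframes(wav.getnframes())

    with open(pcm_path, "wb") as pcm:
        pcm.write(frames)
    return len(frames) // 2
```

For example, `wav_to_raw_pcm("speech.wav", "in_speech.pcm")` would produce a file that can be passed to `./dump_data -train` as shown in the diff above. The function deliberately rejects files with a different sample rate or width rather than resampling them silently.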