Opus Speech Coding Enhancement

This folder hosts models for enhancing Opus SILK.

Environment setup

The code is tested with Python 3.11. A conda environment is set up via

conda create -n osce python=3.11

conda activate osce

python -m pip install -r requirements.txt

Generating training data

The first step is to convert all training items to 16 kHz, 16-bit PCM and concatenate them. A convenient way to do this is to create a file list and then run

python scripts/concatenator.py filelist 16000 dataset/clean.s16 --db_min -40 --db_max 0

which additionally applies random scaling within the range given by --db_min and --db_max.
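As a rough sketch of what this scaling step does (a hypothetical re-implementation for illustration, not the actual scripts/concatenator.py API; function names and the per-item gain model are assumptions):

```python
import random

def scale_db(samples, gain_db):
    """Apply a dB gain to 16-bit PCM samples, clipping to the int16 range."""
    g = 10.0 ** (gain_db / 20.0)
    return [max(-32768, min(32767, int(round(s * g)))) for s in samples]

def concatenate(items, db_min=-40.0, db_max=0.0, seed=0):
    """Concatenate PCM items, applying an independent random gain to each."""
    rng = random.Random(seed)
    out = []
    for samples in items:
        out.extend(scale_db(samples, rng.uniform(db_min, db_max)))
    return out
```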

The second step is to run a patched version of opus_demo in the dataset folder, which produces the coded output and the corresponding feature files. To build the patched opus_demo binary, check out the exp-neural-silk-enhancement branch and build opus_demo the usual way. Then run

cd dataset && <path_to_patched_opus_demo>/opus_demo voip 16000 1 9000 -silk_random_switching 249 clean.s16 coded.s16

The argument to -silk_random_switching specifies the number of frames after which the encoder parameters are switched randomly.
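The .s16 files produced above are headerless 16-bit mono PCM; a minimal reader for inspecting the clean/coded pair (assuming little-endian sample order, which matches typical x86 builds):

```python
import struct
from pathlib import Path

def read_s16(path):
    """Read a headerless little-endian 16-bit mono PCM file into a list of ints."""
    data = Path(path).read_bytes()
    return list(struct.unpack(f"<{len(data) // 2}h", data))

def duration_seconds(path, sample_rate=16000):
    """Length of an .s16 file in seconds at the given sample rate (2 bytes/sample)."""
    return Path(path).stat().st_size / (2 * sample_rate)
```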

Generating inference data

Generating inference data is analogous to generating training data. Given an item 'item1.wav', run

mkdir item1.se && sox item1.wav -r 16000 -e signed-integer -b 16 item1.raw && cd item1.se && <path_to_patched_opus_demo>/opus_demo voip 16000 1 <bitrate> ../item1.raw noisy.s16

The folder item1.se then serves as input for the test_model.py script, or for the --testdata argument of train_model.py and adv_train_model.py, respectively.
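To process many items, the one-liner above can be scripted. A small helper that just builds the commands for one item (the helper itself is illustrative; the command layout mirrors the example above):

```python
from pathlib import Path

def inference_commands(wav, bitrate, opus_demo="<path_to_patched_opus_demo>/opus_demo"):
    """Return the shell commands that turn <item>.wav into an <item>.se folder,
    mirroring: mkdir ... && sox ... && cd ... && opus_demo ..."""
    stem = Path(wav).stem
    return [
        f"mkdir {stem}.se",
        f"sox {wav} -r 16000 -e signed-integer -b 16 {stem}.raw",
        f"cd {stem}.se && {opus_demo} voip 16000 1 {bitrate} ../{stem}.raw noisy.s16",
    ]
```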

Regression loss based training

Create a default setup for LACE or NoLACE via

python make_default_setup.py model.yml --model lace/nolace --path2dataset <path2dataset>

Then run

python train_model.py model.yml <output folder> --no-redirect

to run the training script in the foreground, or

nohup python train_model.py model.yml <output folder> &

to run it in the background. In the latter case the output is written to <output folder>/out.txt.

Adversarial training (NoLACE only)

Create a default setup for NoLACE via

python make_default_setup.py nolace_adv.yml --model nolace --adversarial --path2dataset <path2dataset>

Then run

python adv_train_model.py nolace_adv.yml <output folder> --no-redirect

to run the training script in the foreground, or

nohup python adv_train_model.py nolace_adv.yml <output folder> &

to run it in the background. In the latter case the output is written to <output folder>/out.txt.