Opus Speech Coding Enhancement

This folder hosts models for enhancing Opus SILK.

Environment setup

The code is tested with Python 3.11. A conda environment is set up via

conda create -n osce python=3.11

conda activate osce

python -m pip install -r requirements.txt

Generating training data

The first step is to convert all training items to 16 kHz, 16-bit PCM and concatenate them. A convenient way to do this is to create a file list and then run

python scripts/concatenator.py filelist 16000 dataset/clean.s16 --db_min -40 --db_max 0

which additionally applies random scaling within the range given by --db_min and --db_max.
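As a rough sketch of what this scaling step does (a hypothetical re-implementation for illustration, not the actual scripts/concatenator.py API; function names and the per-item gain model are assumptions):

```python
import random

def scale_db(samples, gain_db):
    """Apply a dB gain to 16-bit PCM samples, clipping to the int16 range."""
    g = 10.0 ** (gain_db / 20.0)
    return [max(-32768, min(32767, int(round(s * g)))) for s in samples]

def concatenate(items, db_min=-40.0, db_max=0.0, seed=0):
    """Concatenate PCM items, applying an independent random gain to each."""
    rng = random.Random(seed)
    out = []
    for samples in items:
        out.extend(scale_db(samples, rng.uniform(db_min, db_max)))
    return out
```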

The second step is to run a patched version of opus_demo in the dataset folder, which produces the coded output and the corresponding feature files. To build the patched opus_demo binary, check out the exp-neural-silk-enhancement branch and build opus_demo the usual way. Then run

cd dataset && <path_to_patched_opus_demo>/opus_demo voip 16000 1 9000 -silk_random_switching 249 clean.s16 coded.s16

The argument to -silk_random_switching specifies the number of frames after which the encoder parameters are switched randomly.
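The .s16 files produced above are headerless 16-bit mono PCM; a minimal reader for inspecting the clean/coded pair (assuming little-endian sample order, which matches typical x86 builds):

```python
import struct
from pathlib import Path

def read_s16(path):
    """Read a headerless little-endian 16-bit mono PCM file into a list of ints."""
    data = Path(path).read_bytes()
    return list(struct.unpack(f"<{len(data) // 2}h", data))

def duration_seconds(path, sample_rate=16000):
    """Length of an .s16 file in seconds at the given sample rate (2 bytes/sample)."""
    return Path(path).stat().st_size / (2 * sample_rate)
```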

Generating inference data

Generating inference data is analogous to generating training data. Given an item 'item1.wav', run

mkdir item1.se && sox item1.wav -r 16000 -e signed-integer -b 16 item1.raw && cd item1.se && <path_to_patched_opus_demo>/opus_demo voip 16000 1 <bitrate> ../item1.raw noisy.s16

The folder item1.se then serves as input for the test_model.py script, or for the --testdata argument of train_model.py and adv_train_model.py, respectively.
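To process many items, the one-liner above can be scripted. A small helper that just builds the commands for one item (the helper itself is illustrative; the command layout mirrors the example above):

```python
from pathlib import Path

def inference_commands(wav, bitrate, opus_demo="<path_to_patched_opus_demo>/opus_demo"):
    """Return the shell commands that turn <item>.wav into an <item>.se folder,
    mirroring: mkdir ... && sox ... && cd ... && opus_demo ..."""
    stem = Path(wav).stem
    return [
        f"mkdir {stem}.se",
        f"sox {wav} -r 16000 -e signed-integer -b 16 {stem}.raw",
        f"cd {stem}.se && {opus_demo} voip 16000 1 {bitrate} ../{stem}.raw noisy.s16",
    ]
```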

Regression loss based training

Create a default setup for LACE or NoLACE via

python make_default_setup.py model.yml --model lace/nolace --path2dataset <path2dataset>

Then run

python train_model.py model.yml <output folder> --no-redirect

to run the training script in the foreground, or

nohup python train_model.py model.yml <output folder> &

to run it in the background. In the latter case the output is written to <output folder>/out.txt.

Adversarial training (NoLACE only)

Create a default setup for NoLACE via

python make_default_setup.py nolace_adv.yml --model nolace --adversarial --path2dataset <path2dataset>

Then run

python adv_train_model.py nolace_adv.yml <output folder> --no-redirect

to run the training script in the foreground, or

nohup python adv_train_model.py nolace_adv.yml <output folder> &

to run it in the background. In the latter case the output is written to <output folder>/out.txt.