mirror of
https://github.com/xiph/opus.git
synced 2025-05-15 07:58:29 +00:00
Neural Pitch Estimation

Dataset Installation
- Download and unzip the PTDB-TUG dataset:
      wget https://www2.spsc.tugraz.at/databases/PTDB-TUG/SPEECH_DATA_ZIPPED.zip
      unzip SPEECH_DATA_ZIPPED.zip
- Inside the "SPEECH DATA" directory extracted above, run ptdb_process.sh to combine the male and female data.
- To download and combine the DEMAND dataset, run download_demand.sh.
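
The download-and-unzip step can also be scripted. The following is a minimal Python sketch and not part of the repository: `fetch`, `unpack`, and `install_ptdb` are hypothetical helper names, and the standard-library `urllib.request` and `zipfile` modules stand in for `wget` and `unzip`. It skips the download when the archive is already present.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL copied from the instructions above.
PTDB_URL = "https://www2.spsc.tugraz.at/databases/PTDB-TUG/SPEECH_DATA_ZIPPED.zip"


def fetch(url, dest):
    """Download url to dest unless dest already exists (stand-in for wget)."""
    dest = Path(dest)
    if not dest.exists():
        tmp = dest.with_name(dest.name + ".part")
        urllib.request.urlretrieve(url, tmp)
        tmp.replace(dest)  # rename only once the download has finished
    return dest


def unpack(archive, out_dir):
    """Extract archive into out_dir (stand-in for unzip); return member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
        return zf.namelist()


def install_ptdb(work_dir="."):
    """Hypothetical wrapper: fetch the PTDB archive and unpack it in work_dir."""
    archive = fetch(PTDB_URL, Path(work_dir) / "SPEECH_DATA_ZIPPED.zip")
    return unpack(archive, work_dir)
```

Calling `install_ptdb()` from the working directory would mirror the two shell commands; ptdb_process.sh and download_demand.sh still need to be run afterwards as described.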

LPCNet preparation
- To extract the cross-correlation (xcorr) features, add lpcnet_extractor.c to the LPCNet sources, add the relevant functions to lpcnet_enc.c, list the new header and C source files in Makefile.am, and compile to generate ./lpcnet_xcorr_extractor.
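
This README does not describe the feature format that lpcnet_xcorr_extractor writes. As a generic illustration of what a pitch cross-correlation feature is, the sketch below computes the normalized cross-correlation between a frame and earlier copies of the signal over a range of candidate lags. The function name `normalized_xcorr` and all of its parameters are hypothetical, and the code is not taken from LPCNet.

```python
import math


def normalized_xcorr(x, frame_start, frame_len, min_lag, max_lag):
    """Return one correlation value per lag in min_lag..max_lag (inclusive).

    Each value compares the frame x[frame_start:frame_start + frame_len]
    with the same-length segment that starts `lag` samples earlier.
    """
    if frame_start < max_lag or frame_start + frame_len > len(x):
        raise ValueError("frame and lag range must fit inside the signal")
    cur = x[frame_start:frame_start + frame_len]
    e_cur = sum(v * v for v in cur)
    out = []
    for lag in range(min_lag, max_lag + 1):
        past = x[frame_start - lag:frame_start - lag + frame_len]
        dot = sum(a * b for a, b in zip(cur, past))
        e_past = sum(v * v for v in past)
        denom = math.sqrt(e_cur * e_past)
        # Silent segments have zero energy; report zero correlation for them.
        out.append(dot / denom if denom > 0 else 0.0)
    return out
```

For a periodic signal, the correlation peaks at lags equal to the period, which is why such features are useful as input to a pitch estimator.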

Dataset augmentation and training (check the command-line arguments of each script below before running it)
- Run data_augmentation.py
- Run training.py on the augmented data
- Run experiments.py
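
Since each script takes its own arguments, it can help to print their usage text before running the steps in order. The sketch below is one way to do that, under the assumption that the scripts accept a `--help` flag (as argparse-based scripts do); this README does not confirm that. `PIPELINE`, `help_command`, and `show_arguments` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

# Script order taken from the list above.
PIPELINE = ["data_augmentation.py", "training.py", "experiments.py"]


def help_command(script, python=sys.executable):
    """Build the command that asks a script for its usage text."""
    return [python, script, "--help"]  # assumes the script supports --help


def show_arguments(script_dir=".", scripts=PIPELINE):
    """Print the usage text of each script, in pipeline order.

    Raises FileNotFoundError if a script is missing and
    subprocess.CalledProcessError if a script exits with an error.
    """
    for name in scripts:
        path = Path(script_dir) / name
        if not path.is_file():
            raise FileNotFoundError(path)
        subprocess.run(help_command(str(path)), check=True)
```

Running `show_arguments()` from the directory that contains the scripts prints the options of each step; the actual runs then use whatever arguments those listings report.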