SINGING TO SPEECH CONVERSION WITH GENERATIVE FLOW

Jiawen Huang, Emmanouil Benetos


Samples used in the subjective evaluation (dataset: NHSS)

Singing ALT-TTS WORLD-nodur WORLD-dur M2-nodur M2-dur
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
Sample 6
Sample 7
Sample 8

Comparing M1 and M2 (dataset: DSing)

Due to limited resources, we could not include both M1 and M2 in the subjective evaluation. The two models show different strengths in the objective evaluation. The audio samples below compare M1 and M2 and illustrate why we excluded M1.
Singing M1-nodur M1-dur M2-nodur M2-dur
Sample 1
Sample 2
Sample 3

Inference on other languages (datasets: MIR-1K and PJS)

This section showcases the potential of the proposed model when applied to unseen languages. These results are not discussed in the paper.
Singing M2-nodur M2-dur
Mandarin sample
Japanese sample