SINGING TO SPEECH CONVERSION WITH GENERATIVE FLOW

Jiawen Huang, Emmanouil Benetos

Samples used in the subjective evaluation (dataset: NHSS)

	Singing	ALT-TTS	WORLD-nodur	WORLD-dur	M2-nodur	M2-dur
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
Sample 6
Sample 7
Sample 8

Comparing M1 and M2 (dataset: DSing)

Due to limited resources, we could not include both M1 and M2 in the subjective evaluation. The two models show different strengths in the objective evaluation. The audio samples below compare M1 and M2 and illustrate why we excluded M1.

	Singing	M1-nodur	M1-dur	M2-nodur	M2-dur
Sample 1
Sample 2
Sample 3

Inference on other languages (datasets: MIR-1K and PJS)

This section showcases the potential of the proposed model when applied to unseen languages. These results are not discussed in the paper.

	Singing	M2-nodur	M2-dur
Mandarin sample
Japanese sample