Samples used in the subjective evaluation (dataset: NHSS)
Columns: Singing, ALT-TTS, WORLD-nodur, WORLD-dur, M2-nodur, M2-dur
Rows: Sample 1 to Sample 8
Comparing M1 and M2 (dataset: DSing)
Because of limited resources, we could not include both M1 and M2 in the subjective evaluation. The two models show different strengths in the objective evaluation. The audio samples below compare M1 and M2 and illustrate why we excluded M1.
Columns: Singing, M1-nodur, M1-dur, M2-nodur, M2-dur
Rows: Sample 1 to Sample 3
Inference on other languages (datasets: MIR-1K and PJS)
This section showcases the potential of the proposed model when applied to unseen languages. It is not discussed in the paper.