ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios

ArXiv: arXiv:2305.12200


  • Yuyue Wang (Gaoling School of Artificial Intelligence, Renmin University of China)
  • Huan Xiao* (School of Statistics, Renmin University of China)
  • Yihan Wu (Gaoling School of Artificial Intelligence, Renmin University of China)
  • Ruihua Song# (Gaoling School of Artificial Intelligence, Renmin University of China)

* Equally contributed authors.

# Corresponding authors.


Text-to-speech (TTS) models can generate natural, high-quality speech, but they are not expressive enough when synthesizing speech with dramatic expressiveness, such as stand-up comedy. Comedians have diverse personal speech styles, including personal prosody, rhythm, and fillers, so synthesizing their speech requires real-world datasets and strong speech style modeling capabilities, both of which are challenging. In this paper, we construct a new dataset and develop ComedicSpeech, a TTS system tailored for stand-up comedy synthesis in low-resource scenarios. First, we extract a prosody representation with a prosody encoder and condition the TTS model on it in a flexible way. Second, we enhance personal rhythm modeling with a conditional duration predictor. Third, we model personal fillers by introducing comedian-related special tokens. Experiments show that ComedicSpeech achieves better expressiveness than the baselines with only ten minutes of training data for each comedian.
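The third component, comedian-related special tokens for fillers, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `tag_fillers`, the `FILLERS` table, the `<filler_B_0>` token format, and the per-character tokenization are hypothetical and are not taken from the paper. Only the filler "你知道吧" ("you know") for Li Yang comes from the case study on this page. A real system would most likely operate on phonemes rather than characters.

```python
from typing import Dict, List

# Hypothetical filler inventory. Only Li Yang's "你知道吧" ("you know") is
# taken from the case study on this page; the layout is an assumption.
FILLERS: Dict[str, List[str]] = {
    "B": ["你知道吧"],  # Comedian B (Li Yang)
}


def tag_fillers(text: str, comedian: str,
                fillers: Dict[str, List[str]] = FILLERS) -> List[str]:
    """Tokenize text per character, replacing the given comedian's fillers
    with comedian-specific special tokens (token format is hypothetical)."""
    # Keep each filler's original index as its id, skip empty entries, and
    # try longer fillers first so a short one cannot shadow a longer one.
    indexed = sorted(
        [(idx, p) for idx, p in enumerate(fillers.get(comedian, [])) if p],
        key=lambda pair: len(pair[1]),
        reverse=True,
    )
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for idx, phrase in indexed:
            if text.startswith(phrase, i):
                tokens.append(f"<filler_{comedian}_{idx}>")
                i += len(phrase)
                break
        else:
            # No filler starts here: emit the character as an ordinary token.
            tokens.append(text[i])
            i += 1
    return tokens


print(tag_fillers("我觉得挺好的,你知道吧", "B"))
```

In this sketch the same sentence yields a special token only for the comedian who owns the filler; for any other comedian the filler is left as ordinary characters.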


Baseline Comparison and Audio Samples Synthesized by ComedicSpeech
1.1 Lan Hu (Comedian A)
1.2 Li Yang (Comedian B)
1.3 Bo Yang (Comedian C)
1.4 Niao Niao (Comedian D)
Ablation Studies
2.1 Lan Hu (Comedian A)
2.2 Li Yang (Comedian B)
2.3 Bo Yang (Comedian C)
2.4 Niao Niao (Comedian D)
Case Study

Architecture of ComedicSpeech

Baseline Comparison and Audio Samples Synthesized by ComedicSpeech

1.1 Lan Hu

Demo1: 而且他这高考出问题,我看着都心慌,这字儿我都不认识,我也是高材生啊

English translation: And he had a problem with the college entrance examination. Just looking at it makes me panic; I can't even read these characters, and I'm a top student too.

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

Demo2: 而且热搜上面是有标签的,就是那个什么营销号,就是那个一个井字儿对吧。

English translation: And the trending topics have tags on them, you know, the marketing accounts, that hash symbol, right?

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

Demo3: 还拉了个群,叫“抄底小队”,后来才知道,那是“自杀小队”。

English translation: There was even a group chat called the "Bottom-Fishing Squad"; only later did it turn out to be the "Suicide Squad."

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

Demo4: 聊遗憾,我真的觉得遗憾挺好的,就相比你啥都留不下来,你至少还留下了遗憾嘛。

English translation: Speaking of regret, I really think regret is a good thing. Compared with leaving nothing behind at all, at least you have left regret behind.

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

1.2 Li Yang

Demo1: 然后我就给那个男的打电话,我说首先是这样的,我非常敬重你的人品。

English translation: And I called the guy, and I said, first of all, I have a lot of respect for your character.

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

Demo2: 不是的,这是一种偏见。我们家不重男轻女,在我们家是这样。

English translation: No, it's a form of prejudice. We don't have a preference for boys, not in our family.

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

Demo3: 但是没有一个孩子,把同学领回家写作业写着写着说,是不是累了?啊?

English translation: But no kid brings a classmate home to do homework and, partway through, says, "Are you tired? Huh?"

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

Demo4: 我真正开始有一点自信以后,是我开始说脱口秀以后,我觉得我长得挺好的,你知道吧。

English translation: I only really started to feel a little confident after I started doing stand-up comedy. I think I look pretty good, you know.

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

1.3 Bo Yang

Demo1: 有一天我睡的正香,我女朋友把我叫醒。

English translation: One day when I was fast asleep, my girlfriend woke me up.

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

Demo2: 后来,老板把我开了,可能他觉得,工作压力太大了。

English translation: Later, my boss fired me; maybe he felt the pressure of work was too much.

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

Demo3: 看见了吗,你们加一起也斗不过我。

English translation: See? Even all of you together can't beat me.

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

Demo4: 最后我想说,我去过很多公司。

English translation: Finally, I want to say that I've been to a lot of companies.

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

1.4 Niao Niao

Demo1: 谁能想到这其中最煎熬的竟然是中间的那十分钟。

English translation: Who would have thought that the most agonizing part would be those ten minutes in the middle.

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

Demo2: 我来之前还想,我的稿子里有地狱,会不会太沉重了呢。

English translation: Before I came here, I was wondering whether my script, with hell in it, would be too heavy.

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

Demo3: 自打办了健身卡,身体没有变健康,灵魂得到了升华。

English translation: Ever since I got a gym membership, my body hasn't become any healthier, but my soul has been elevated.

GT | GT mel + MelGAN | Baseline (Spk Emb) | Baseline (GST) | ComedicSpeech

Ablation Studies
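
The ablation labels in this section refer to "CLN", which this page does not expand. Assuming it stands for conditional layer normalization, a common way to condition a module such as a duration predictor on a speaker or style vector, the idea can be sketched in plain Python as follows. The function names, shapes, and weights are hypothetical and are not taken from the paper; a trained model would learn the two projection matrices.

```python
import math
from typing import List


def matvec(w: List[List[float]], x: List[float]) -> List[float]:
    """Multiply matrix w (rows x cols) by vector x (cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def conditional_layer_norm(h: List[float], style: List[float],
                           w_scale: List[List[float]],
                           w_bias: List[List[float]],
                           eps: float = 1e-5) -> List[float]:
    """Layer-normalize h, then apply a scale and bias predicted from style.

    Assumption: scale and bias are linear projections of the style vector,
    replacing the fixed learned gain and bias of ordinary layer norm.
    """
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    normed = [(v - mean) / math.sqrt(var + eps) for v in h]
    scale = matvec(w_scale, style)  # one scale per hidden unit
    bias = matvec(w_bias, style)    # one bias per hidden unit
    return [s * n + b for s, n, b in zip(scale, normed, bias)]


# Toy example: 4 hidden units, 2-dimensional style vector (made-up weights).
hidden = [1.0, 2.0, 3.0, 4.0]
w_scale = [[1.0, 0.0]] * 4
w_bias = [[0.0, 0.5]] * 4
print(conditional_layer_norm(hidden, [1.0, 0.0], w_scale, w_bias))
print(conditional_layer_norm(hidden, [0.0, 1.0], w_scale, w_bias))
```

The two calls feed the same hidden vector with different style vectors and produce different outputs, which is the property a conditional module relies on to express speaker-dependent behavior.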

2.1 Lan Hu

compare1: 而且他这高考出问题,我看着都心慌,这字儿我都不认识,我也是高材生啊。

English translation: And he had a problem with the college entrance examination. Just looking at it makes me panic; I can't even read these characters, and I'm a top student too.

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

compare2: 而且热搜上面是有标签的,就是那个什么营销号,就是那个一个井字儿对吧。

English translation: And the trending topics have tags on them, you know, the marketing accounts, that hash symbol, right?

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

compare3: 还拉了个群,叫“抄底小队”,后来才知道,那是“自杀小队”。

English translation: There was even a group chat called the "Bottom-Fishing Squad"; only later did it turn out to be the "Suicide Squad."

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

compare4: 聊遗憾,我真的觉得遗憾挺好的,就相比你啥都留不下来,你至少还留下了遗憾嘛。

English translation: Speaking of regret, I really think regret is a good thing. Compared with leaving nothing behind at all, at least you have left regret behind.

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

2.2 Li Yang

compare1: 然后我就给那个男的打电话,我说首先是这样的,我非常敬重你的人品。

English translation: And I called the guy, and I said, first of all, I have a lot of respect for your character.

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

compare2: 不是的,这是一种偏见。我们家不重男轻女,在我们家是这样。

English translation: No, it's a form of prejudice. We don't have a preference for boys, not in our family.

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

compare3: 但是没有一个孩子,把同学领回家写作业写着写着说,是不是累了?啊?

English translation: But no kid brings a classmate home to do homework and, partway through, says, "Are you tired? Huh?"

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

compare4: 我真正开始有一点自信以后,是我开始说脱口秀以后,我觉得我长得挺好的,你知道吧。

English translation: I only really started to feel a little confident after I started doing stand-up comedy. I think I look pretty good, you know.

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

2.3 Bo Yang

compare1: 有一天我睡的正香,我女朋友把我叫醒。

English translation: One day when I was fast asleep, my girlfriend woke me up.

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

compare2: 后来,老板把我开了,可能他觉得,工作压力太大了。

English translation: Later, my boss fired me; maybe he felt the pressure of work was too much.

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

compare3: 看见了吗,你们加一起也斗不过我。

English translation: See? Even all of you together can't beat me.

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

compare4: 最后我想说,我去过很多公司。

English translation: Finally, I want to say that I've been to a lot of companies.

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

2.4 Niao Niao

compare1: 谁能想到这其中最煎熬的竟然是中间的那十分钟。

English translation: Who would have thought that the most agonizing part would be those ten minutes in the middle.

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

compare2: 我来之前还想,我的稿子里有地狱,会不会太沉重了呢。

English translation: Before I came here, I was wondering whether my script, with hell in it, would be too heavy.

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

compare3: 自打办了健身卡,身体没有变健康,灵魂得到了升华。

English translation: Ever since I got a gym membership, my body hasn't become any healthier, but my soul has been elevated.

ComedicSpeech | ComedicSpeech - duration CLN | ComedicSpeech + pitch and energy CLN | ComedicSpeech - spc

Case Study: Diversity of ComedicSpeech styles for the same text (containing a personal filler)



English translation: Using text with a personal filler (Li Yang's "you know"):

English translation: I only really started to feel a little confident after I started doing stand-up comedy. I think I look pretty good, you know.

GT | Baseline (Spk Emb), Li Yang's tone
ComedicSpeech, Lan Hu's tone | ComedicSpeech, Li Yang's tone | ComedicSpeech, Bo Yang's tone | ComedicSpeech, Niao Niao's tone