
Generate TTS Using Your Terminal

It’s not a new hack, but it works well when you have short texts (1-3 sentences) that you want to convert to speech.

After all, when returning a response to the Google Assistant, you can use a subset of the Speech Synthesis Markup Language (SSML).

Why?

Because it can make your agent’s responses seem more lifelike.

How?

Open your terminal and try something like this:

curl "http://www.google.com/speech-api/v1/synthesize?lang=en-us&text=actions+on+google+rock" -o aog-rock-hack.mp3

That’s it.
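If the text comes from a script rather than being typed by hand, the query string has to be encoded properly. Here is a minimal sketch in Python, assuming the same endpoint as the curl command above (it is not an officially documented API and may stop responding at any time). The helper name `build_tts_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint copied from the curl example in this post; treat it as an assumption,
# not as a supported API.
SYNTHESIZE_URL = "http://www.google.com/speech-api/v1/synthesize"


def build_tts_url(text, lang="en-us"):
    """Return the synthesize URL with `text` safely encoded (hypothetical helper)."""
    # urlencode turns spaces into '+' and escapes characters such as '&' or '?'.
    return SYNTHESIZE_URL + "?" + urlencode({"lang": lang, "text": text})


print(build_tts_url("actions on google rock"))
```

For the sample phrase, this prints the same URL that the curl command uses, so you can pass the result straight to curl with -o to save the MP3.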

If you want to get fancy and use SSML, the request supports that too:

https://www.google.com/speech-api/v1/synthesize?lang=es-US&ssml=<speak>Hey<prosody%20rate="slow">how+are+you+doing+this+morning</prosody></speak>
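The URL above contains raw angle brackets and quotes, which a shell will happily mangle unless you quote everything carefully. One way around that, sketched here under the assumption that the endpoint accepts a normally percent-encoded `ssml` parameter, is to let Python encode the markup. The name `build_ssml_url` is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Same unofficial endpoint as in the URL above (an assumption, not a documented API).
SYNTHESIZE_URL = "https://www.google.com/speech-api/v1/synthesize"


def build_ssml_url(ssml, lang="es-US"):
    """Return a synthesize URL carrying percent-encoded SSML (hypothetical helper)."""
    return SYNTHESIZE_URL + "?" + urlencode({"lang": lang, "ssml": ssml})


ssml = '<speak>Hey<prosody rate="slow">how are you doing this morning</prosody></speak>'
url = build_ssml_url(ssml)
print(url)

# Decoding the query string gives back the original markup unchanged.
assert parse_qs(urlsplit(url).query)["ssml"][0] == ssml
```

The encoded URL has no characters that need shell quoting beyond the usual double quotes around the whole argument.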

Later, you can use it in your Action on Google like this:

<speak>
Here are <say-as interpret-as="characters">SSML</say-as> samples.
I can pause <break time="3s"/>.
I can play a sound
<audio src="https://www.example.com/MY_MP3_FILE.mp3">
didn't get your MP3 audio file
</audio>.
I can speak in cardinals. Your number is
<say-as interpret-as="cardinal">
10
</say-as>.
Or I can speak in ordinals. You are
<say-as interpret-as="ordinal">
10
</say-as> in line.
Or I can even speak in digits. The digits for ten are
<say-as interpret-as="characters">
10
</say-as>.
I can also substitute phrases, like the
<sub alias="World Wide Web Consortium">
W3C
</sub>.
Finally, I can speak a paragraph with two sentences.
<p>
<s>This is sentence one.</s>
<s>This is sentence two.</s>
</p>
</speak>
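When a response like this is assembled in code, a single stray character can break the markup. Here is a rough sketch of one way to guard against that in Python: escape any dynamic text, then check that the result parses as XML. The helpers `say_as` and `is_well_formed` are made up for this example, and the check only covers XML well-formedness, not which SSML tags the Assistant actually supports.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def say_as(text, interpret_as):
    """Wrap `text` in a <say-as> element, escaping &, < and > (hypothetical helper)."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


def is_well_formed(ssml):
    """Return True if `ssml` parses as XML; this does not validate SSML semantics."""
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False


response = f"<speak>Your number is {say_as('10', 'cardinal')}.</speak>"
print(response)
print(is_well_formed(response))
```

Running the check before sending the response catches problems such as an unclosed tag or an unescaped ampersand early.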

For more hacks and tips, check these slides on VUI Design or the official docs on SSML.

