Overview
Cepstral voices support the Speech Synthesis Markup Language (SSML). SSML provides a wide range of control over how input text is read by a TTS engine. For example, with SSML, one can alter prosody attributes, such as rate, pitch, and volume, insert pauses of any length, change the speaking voice while reading, and control many other aspects of how the text is read by the synthetic voice.
When can SSML be used?
The Cepstral Swift TTS engine supports SSML natively and, by default, parses all input text for SSML. Whether the markup is honored, however, depends on the context in which the Cepstral voice is used: if the application using the voice does not support SSML, the markup never reaches the Swift TTS engine for parsing. In particular, SSML does not work in the following widely used contexts:
- Microsoft SAPI 5.1
If you are using Cepstral voices under Microsoft Windows via the SAPI5 interface, you cannot use SSML. Instead, you can use Microsoft's own SAPI XML to achieve similar results, if the application supports SAPI XML.
There is good news on this front: the next version of SAPI, version 5.3, will support SSML. SAPI 5.3 will be built into Windows Vista. At this point, it appears that SAPI 5.3 will not be released for Windows XP or other versions of Microsoft Windows prior to Windows Vista.
For more on SAPI XML, please see this page.
- Apple Speech Manager
If you are using Cepstral voices under Apple Macintosh OS X through the Speech Manager interface, you cannot use SSML. Instead, you can use Apple's own speech markup language, called Embedded Speech Commands.
For more on Embedded Speech Commands, please see this page.
SSML does work with Cepstral voices in any application written to access the Cepstral Swift TTS engine directly, without going through SAPI 5.1 or the Apple Speech Manager. SSML can be used with Cepstral voices in the following contexts:
- swift - The Cepstral command-line interface
A command-line utility called "swift" is installed with every Cepstral voice for Microsoft Windows, Apple Macintosh OS X, Linux, and Solaris. By default, any text arguments or input text files are parsed for SSML content.
- SwiftTalker
The SwiftTalker application that is bundled with Cepstral voices for Microsoft Windows and Windows CE supports SSML.
- Cepstral Tools
SSML can be used in the text you provide to test a voice in the "Voices" tab of the Cepstral Tools applet for the Windows Control Panel.
- Asterisk PBX
SSML can be used with Cepstral voices in Asterisk by simply embedding the markup into the input text.
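When driving swift from a program rather than a terminal, the markup can be passed as an ordinary argument. The Python sketch below assumes only what is stated above: that `swift` accepts the text to speak as an argument and parses it for SSML. It uses no other swift options, and the function names are hypothetical. On Mac OS X systems with Apple's developer tools installed, a different program named `swift` (the Swift compiler) may come first on the PATH, so passing the full path to Cepstral's binary may be necessary; where that binary is installed is not covered here.

```python
import shutil
import subprocess


def build_swift_command(text, swift_path="swift"):
    """Return the argument list that passes `text` (with inline SSML) to swift.

    Assumption: swift accepts the text to speak as a plain argument, as the
    description above says; no other swift options are used.
    """
    return [swift_path, text]


def speak(text, swift_path="swift"):
    """Run swift on `text`; return its exit status, or None if it is not found."""
    if shutil.which(swift_path) is None:
        return None
    # An argument list (no shell) keeps the quotes and angle brackets intact.
    return subprocess.run(build_swift_command(text, swift_path)).returncode
```

For example, `speak("This is a <break time='3s' /> three second pause.")` hands the whole string, markup included, to swift in a single argument.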
Common Usage Examples
This section lists many of the most common uses of SSML with Cepstral voices. The examples are shown as context-free text containing SSML markup and can be used in any context in which SSML works with Cepstral voices (see "When can SSML be used?"). For more detailed descriptions of how the elements and attributes used in these examples work, see the official W3C SSML Specification:
http://www.w3.org/TR/speech-synthesis/
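Because the examples below are embedded directly in ordinary text, a stray `&`, an unclosed element, or clashing quote marks can make the markup malformed. How Cepstral's parser handles malformed input is not described here, so a quick well-formedness check before sending text may help. This Python sketch uses the standard `xml.etree.ElementTree` module and only checks XML well-formedness, not SSML validity. The wrapper element and the `urn:example:cepstral` namespace URI are placeholders (assumptions), needed only because the `cepstral:` prefix used in example 8 is not declared in the examples themselves.

```python
import xml.etree.ElementTree as ET

# Placeholder wrapper: binds the "cepstral:" prefix to a made-up URI so the
# XML parser accepts it. The URI is hypothetical, not Cepstral's own.
_WRAPPER = "<root xmlns:cepstral='urn:example:cepstral'>{}</root>"


def is_well_formed(fragment: str) -> bool:
    """Return True if `fragment` (text with inline SSML) parses as XML content."""
    try:
        ET.fromstring(_WRAPPER.format(fragment))
    except ET.ParseError:
        return False
    return True
```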
1. Inserting silence / pauses
"This is not <break strength='none' /> a pause."
"This is a <break strength='x-weak' /> phrase break."
"This is a <break strength='weak' /> phrase break."
"This is a <break strength='medium' /> sentence break."
"This is a <break strength='strong' /> paragraph break."
"This is a <break strength='x-strong' /> paragraph break."
"This is a <break time='3s' /> three second pause."
"This is a <break time='4500ms' /> 4.5 second pause."
"This is a <break /> sentence break."
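If pauses are generated by a program, a small helper keeps the attribute values within the forms shown above. This hypothetical Python function accepts either one of the strength labels listed or a duration in milliseconds, writing whole seconds with the `s` unit and other durations with `ms`.

```python
from typing import Optional

# The strength labels used in the examples above.
STRENGTHS = ("none", "x-weak", "weak", "medium", "strong", "x-strong")


def break_tag(strength: Optional[str] = None, ms: Optional[int] = None) -> str:
    """Build an SSML <break /> from a strength label or a duration in milliseconds."""
    if strength is not None and ms is not None:
        raise ValueError("give a strength or a duration, not both")
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"unknown break strength: {strength!r}")
        return f"<break strength='{strength}' />"
    if ms is not None:
        if ms < 0:
            raise ValueError("duration must not be negative")
        # Whole seconds use the 's' form ('3s'); anything else uses 'ms' ('4500ms').
        if ms % 1000 == 0:
            return f"<break time='{ms // 1000}s' />"
        return f"<break time='{ms}ms' />"
    # No arguments: a bare <break />, described above as a sentence break.
    return "<break />"
```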
2. Changing Voices
"This is the default voice. <voice name='David'>This is David.</voice> This is the default again. <voice name='Callie'>Callie here.</voice>"
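When a script alternates between speakers, the voice switches can be generated from a list of turns. In this Python sketch (function names hypothetical), plain text outside any `<voice>` element keeps the default voice, as in the example above, and `xml.sax.saxutils.escape` protects characters such as `&` that would otherwise break the markup.

```python
from xml.sax.saxutils import escape


def quote_attr(value: str) -> str:
    """Quote an attribute value with single quotes, as the examples do."""
    return "'" + escape(value, {"'": "&apos;"}) + "'"


def dialog(turns) -> str:
    """Join (voice_name, text) pairs; a voice_name of None keeps the default voice."""
    parts = []
    for voice, text in turns:
        body = escape(text)  # escape &, < and > in the spoken text
        if voice is None:
            parts.append(body)
        else:
            parts.append(f"<voice name={quote_attr(voice)}>{body}</voice>")
    return " ".join(parts)
```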
3. Adjusting Speech Rate
"I am now <prosody rate='x-slow'>speaking at half speed.</prosody>"
"I am now <prosody rate='slow'>speaking at 2/3 speed.</prosody>"
"I am now <prosody rate='medium'>speaking at normal speed.</prosody>"
"I am now <prosody rate='fast'>speaking 33% faster.</prosody>"
"I am now <prosody rate='x-fast'>speaking twice as fast.</prosody>"
"I am now <prosody rate='default'>speaking at normal speed.</prosody>"
"I am now <prosody rate='.42'>speaking at 42% of normal speed.</prosody>"
"I am now <prosody rate='2.8'>speaking 2.8 times as fast.</prosody>"
"I am now <prosody rate='-0.3'>speaking 30% more slowly.</prosody>"
"I am now <prosody rate='+0.3'>speaking 30% faster.</prosody>"
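The rate examples above imply a simple arithmetic rule: labels map to fixed multipliers, an unsigned number is itself a multiplier, and a signed number is a relative change added to 1. This Python sketch models that reading of the examples; it is not Cepstral's implementation, and treating `fast` as exactly 4/3 is an approximation of "33% faster".

```python
# Multipliers for the rate labels, as described in the examples above.
RATE_LABELS = {
    "x-slow": 0.5,
    "slow": 2 / 3,
    "medium": 1.0,
    "fast": 4 / 3,  # "33% faster" (approximation)
    "x-fast": 2.0,
    "default": 1.0,
}


def rate_multiplier(value: str) -> float:
    """Speaking-rate multiplier (1.0 = normal) implied by a prosody rate value."""
    if value in RATE_LABELS:
        return RATE_LABELS[value]
    number = float(value)  # raises ValueError for anything else
    if value.startswith(("+", "-")):
        # Signed: a relative change, so '+0.3' -> 1.3 and '-0.3' -> 0.7.
        return 1.0 + number
    # Unsigned: the multiplier itself, so '.42' -> 0.42 and '2.8' -> 2.8.
    return number
```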
4. Adjusting Voice Pitch
"<prosody pitch='x-low'>This is half-pitch.</prosody>"
"<prosody pitch='low'>This is 3/4 pitch.</prosody>"
"<prosody pitch='medium'>This is normal pitch.</prosody>"
"<prosody pitch='high'>This is twice as high.</prosody>"
"<prosody pitch='x-high'>This is three times as high.</prosody>"
"<prosody pitch='default'>This is normal pitch.</prosody>"
"<prosody pitch='-50%'>This is 50% lower.</prosody>"
"<prosody pitch='+50%'>This is 50% higher.</prosody>"
"<prosody pitch='-6st'>This is six semitones lower.</prosody>"
"<prosody pitch='+6st'>This is six semitones higher.</prosody>"
"<prosody pitch='-25Hz'>This has a pitch mean 25 Hertz lower.</prosody>"
"<prosody pitch='+25Hz'>This has a pitch mean 25 Hertz higher.</prosody>"
"<prosody pitch='75Hz'>This has a pitch mean of 75 Hertz.</prosody>"
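The relative forms above act on a voice's baseline pitch in different ways. A percentage scales the mean, a signed Hz value shifts it, an unsigned Hz value replaces it, and a semitone is a frequency ratio of 2^(1/12), so +6st multiplies the mean by 2^(6/12), about 1.414. The Python sketch below applies these rules, with the label multipliers taken from the examples; the 100 Hz baseline in the usage note is a made-up figure, not a property of any Cepstral voice.

```python
# Multipliers for the pitch labels, as described in the examples above.
PITCH_LABELS = {"x-low": 0.5, "low": 0.75, "medium": 1.0,
                "high": 2.0, "x-high": 3.0, "default": 1.0}


def target_pitch(base_hz: float, value: str) -> float:
    """Pitch mean in Hz after applying a prosody pitch value to a base mean."""
    if value in PITCH_LABELS:
        return base_hz * PITCH_LABELS[value]
    if value.endswith("%"):
        return base_hz * (1 + float(value[:-1]) / 100)
    if value.endswith("st"):
        # One semitone is a frequency ratio of 2 ** (1/12).
        return base_hz * 2 ** (float(value[:-2]) / 12)
    if value.endswith("Hz"):
        number = float(value[:-2])
        # Signed values shift the mean ('+25Hz'); unsigned ones replace it ('75Hz').
        return base_hz + number if value.startswith(("+", "-")) else number
    raise ValueError(f"unrecognised pitch value: {value!r}")
```

For example, with a hypothetical 100 Hz baseline, `target_pitch(100.0, '+6st')` gives about 141.4 Hz and `target_pitch(100.0, '-25Hz')` gives 75.0 Hz.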
5. Adjusting Output Volume
"<prosody volume='silent'>This is silent.</prosody>"
"<prosody volume='x-soft'>This is 25% as loud.</prosody>"
"<prosody volume='soft'>This is 50% as loud.</prosody>"
"<prosody volume='medium'>This is the default volume.</prosody>"
"<prosody volume='loud'>This is 50% louder.</prosody>"
"<prosody volume='x-loud'>This is 100% louder.</prosody>"
"<prosody volume='default'>This is the default volume.</prosody>"
"<prosody volume='-33%'>This is 33% softer.</prosody>"
"<prosody volume='+33%'>This is 33% louder.</prosody>"
"<prosody volume='33%'>This is 33% louder.</prosody>"
"<prosody volume='33'>This is 33% of normal volume.</prosody>"
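The volume examples follow the same pattern as rate: labels map to fixed factors, a percentage is a relative change (and, per the `33%` example, an unsigned percentage still means louder), and a bare number is a percentage of normal volume. This Python sketch encodes that reading of the examples. The `apply_gain` helper, which treats the factor as a linear amplitude multiplier on 16-bit samples, is an assumption for illustration, not a description of how the engine applies volume.

```python
from typing import List

# Factors relative to normal volume, as described in the examples above.
VOLUME_LABELS = {
    "silent": 0.0, "x-soft": 0.25, "soft": 0.5, "medium": 1.0,
    "loud": 1.5, "x-loud": 2.0, "default": 1.0,
}


def volume_gain(value: str) -> float:
    """Volume factor (1.0 = normal) implied by a prosody volume value."""
    if value in VOLUME_LABELS:
        return VOLUME_LABELS[value]
    if value.endswith("%"):
        # Relative change: '-33%' -> 0.67, '+33%' and '33%' -> 1.33.
        return 1.0 + float(value[:-1]) / 100
    # Bare number: a percentage of normal volume, so '33' -> 0.33.
    return float(value) / 100


def apply_gain(samples: List[int], gain: float) -> List[int]:
    """Scale 16-bit PCM samples by `gain`, clipping to the int16 range (illustration only)."""
    return [max(-32768, min(32767, round(s * gain))) for s in samples]
```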
6. Adding Emphasis to Speech
"This is <emphasis level='strong'>stronger</emphasis> than the rest."
"This is <emphasis level='moderate'>stronger</emphasis> than the rest."
"This is <emphasis level='none'>the same as</emphasis> the rest."
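To emphasize one word or phrase in a sentence that is built elsewhere, the element can be inserted by pattern matching. This Python sketch (hypothetical names) wraps the first whole-word match and accepts only the three levels shown above.

```python
import re
from xml.sax.saxutils import escape

LEVELS = ("strong", "moderate", "none")  # the levels used in the examples above


def emphasize(sentence: str, phrase: str, level: str = "strong") -> str:
    """Wrap the first whole-word occurrence of `phrase` in an SSML <emphasis> element."""
    if level not in LEVELS:
        raise ValueError(f"unsupported emphasis level: {level!r}")
    text = escape(sentence)
    pattern = r"\b" + re.escape(escape(phrase)) + r"\b"
    return re.sub(
        pattern,
        lambda m: f"<emphasis level='{level}'>{m.group(0)}</emphasis>",
        text,
        count=1,
    )
```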
7. Inserting Recorded Audio Files
"Please leave your message after the tone <audio src='beep.wav' />"
"<audio src='non_existing_file.au'>File could not be played.</audio>"
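The second example shows that text inside `<audio>` acts as a fallback, spoken when the file cannot be played. A hypothetical Python helper can emit either form, escaping the file name and the fallback text:

```python
from typing import Optional
from xml.sax.saxutils import escape


def audio_tag(src: str, fallback: Optional[str] = None) -> str:
    """Build an SSML <audio> element; `fallback` is spoken if `src` cannot be played."""
    quoted = "'" + escape(src, {"'": "&apos;"}) + "'"
    if fallback is None:
        return f"<audio src={quoted} />"
    return f"<audio src={quoted}>{escape(fallback)}</audio>"
```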
8. Applying Cepstral Special Effects
"Hello. <cepstral:sfx file='/path/to/my_sfx.sfx'>Howdy, sir. How are you?</cepstral:sfx> I am fine."
"Sit! <voice name='Dog' sfx_file='/path/to/my_sfx.sfx'>Woof!</voice> Good boy."
9. Inserting Bookmarks
"Place a bookmark <mark name='mark37' /> here."
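Bookmarks are most useful when placed systematically, for example one per sentence, so that an application can tell how far playback has progressed. How bookmark events are reported back depends on the interface in use and is not covered here. This Python sketch uses a naive split after `.`, `!` or `?`, and the `s0`, `s1`, ... names are an arbitrary, hypothetical choice.

```python
import re
from xml.sax.saxutils import escape


def add_sentence_marks(text: str, prefix: str = "s") -> str:
    """Put <mark name='s0' />, <mark name='s1' />, ... before each sentence of `text`."""
    if not prefix.isalnum():
        raise ValueError("prefix should contain only letters and digits")
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return " ".join(
        f"<mark name='{prefix}{i}' /> {escape(s)}" for i, s in enumerate(sentences)
    )
```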
10. Spelling Words Phonetically
"You say <phoneme ph='t ah0 m ey1 t ow0'>tomato</phoneme>, I say <phoneme ph='t ah0 m aa1 t ow0'>tomato</phoneme>"
For a complete list of available phonemes for your language, please see the "Editing the Lexicon" page.
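When the phone sequence comes from a lexicon lookup or a list, the `ph` attribute is simply the phones joined by single spaces, as in the example above. In this Python sketch (hypothetical function name), the only check is that each phone is a non-empty token without whitespace; it does not validate phones against your language's phone set.

```python
from typing import List
from xml.sax.saxutils import escape


def phoneme(written: str, phones: List[str]) -> str:
    """Wrap `written` in an SSML <phoneme> element whose ph attribute lists `phones`."""
    if not phones or any(not p or p.split() != [p] for p in phones):
        raise ValueError("phones must be non-empty symbols without whitespace")
    ph = escape(" ".join(phones), {"'": "&apos;"})
    return f"<phoneme ph='{ph}'>{escape(written)}</phoneme>"
```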
11. Defining a Spoken Form that Differs from the Written Form
"SAPI is the <sub alias='Microsoft Speech API'>SAPI</sub>."
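For text that contains many abbreviations, the `<sub>` elements can be applied from a table. The entries in `ALIASES` below are hypothetical examples, and the regular expression matches whole words only, so `SAPI` inside a longer word is left alone.

```python
import re
from typing import Dict, Optional
from xml.sax.saxutils import escape

# Hypothetical abbreviation table: written form -> spoken form.
ALIASES = {"SAPI": "Microsoft Speech API", "TTS": "text to speech"}


def expand_aliases(text: str, aliases: Optional[Dict[str, str]] = None) -> str:
    """Wrap each whole-word abbreviation found in `aliases` in an SSML <sub> element."""
    if aliases is None:
        aliases = ALIASES
    escaped = escape(text)
    if not aliases:
        return escaped
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, aliases)) + r")\b")

    def wrap(match):
        written = match.group(1)
        spoken = escape(aliases[written], {"'": "&apos;"})
        return f"<sub alias='{spoken}'>{written}</sub>"

    return pattern.sub(wrap, escaped)
```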
Known Issues
Please see the "Known Issues" area for your platform on the Support Page for any known issues with Cepstral's SSML implementation.
Acknowledgements
Much of the information used in developing Cepstral's SSML implementation was taken from the official W3C SSML Specification cited above.
I have a question that isn't covered here.
For all other technical support inquiries, please use our Contact Request Form. Please provide as much technical information as possible. Thank you!