SSML, or Speech Synthesis Markup Language, provides a standardized method for controlling different aspects of speech synthesis output. For example, with SSML one can alter prosody attributes such as rate, pitch, and volume, insert pauses of any length, change the speaking voice while reading, and control many other aspects of how the text is read by the synthetic voice. More information can be found on the W3C's SSML 1.0 specification page.
There are several ways to affect pronunciation, and which one to use depends on how you are using the application.
If you are using the Swift command line application to process text, or almost any application that calls Swift directly, you are using our native interface. Swift supports the Speech Synthesis Markup Language (SSML) as the default input mode for the synthesizer, with our own phoneme set for specifying pronunciations. This lets you specify pronunciations in-line, along with any other mark-up defined in SSML.
Our phonetic alphabet is the same one you use when making entries in a Swift voice dictionary (lexicon.txt). You can find more about this here.
Example:
-
Welcome to <phoneme ph="k eh1 p s t r ah0 l">Cepstral</phoneme>.
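When in-line pronunciations are generated by a program rather than typed by hand, it can help to build the element with XML escaping applied. The following is a minimal Python sketch and is not part of Swift: the helper name `phoneme` is hypothetical, and it only assembles text like the example above.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(text, ph):
    """Wrap text in an SSML <phoneme> element carrying the phoneme string ph."""
    # quoteattr adds the surrounding quotes and escapes special characters;
    # escape protects characters such as & and < in the spoken text.
    return "<phoneme ph=%s>%s</phoneme>" % (quoteattr(ph), escape(text))


sentence = "Welcome to " + phoneme("Cepstral", "k eh1 p s t r ah0 l") + "."
print(sentence)
```

The printed line has the same shape as the example above and can be used wherever SSML input is accepted.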
The Cepstral Swift TTS engine supports SSML natively, and by default it parses all input text for SSML. However, whether or not SSML is honored depends greatly on the context in which the Cepstral voice is used. If the application that is using the voices does not support SSML, the SSML markup will not make it through to the Swift TTS Engine for parsing. In particular, SSML does not work in the following widely used contexts:
- Microsoft SAPI 5.1
If you are using Cepstral voices under Microsoft Windows via the SAPI5 interface, you cannot use SSML. Instead, you can use Microsoft's own SAPI XML to achieve similar results, if the application supports SAPI XML. SAPI versions 5.3 and above will support SSML.
For more on SAPI XML, please see this page.
- Apple Speech Manager
If you are using Cepstral voices under Apple Macintosh OS X through the Speech Manager interface, you cannot use SSML. Instead, you can use Apple's own speech markup language, called Embedded Speech Commands.
For more on Embedded Speech Commands, please see this page.
SSML does work in the following contexts:
- Swift - The Cepstral command-line interface
Installed with every Cepstral voice for Microsoft Windows, Apple Macintosh OS X, and Linux is a command-line utility called "Swift." By default, any text arguments or input text files sent through Swift are parsed for SSML content.
- SwiftTalker
The SwiftTalker application that is bundled with Cepstral voices for Microsoft Windows and Windows CE supports SSML.
- Cepstral Tools
SSML can be used in the text you provide to test a voice in the "Voices" tab of the Cepstral Tools applet for the Windows Control Panel.
- Asterisk PBX
SSML can be used with Cepstral voices in Asterisk by simply embedding the markup into the input text.
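For the Swift command-line case listed above, markup is easiest to pass intact when the text is handed over as a single argument without going through a shell. A minimal Python sketch follows. It assumes the executable is named `swift` and is on the PATH (on Windows it may be `swift.exe`, and on a Mac with Apple's developer tools the name `swift` may instead resolve to Apple's Swift compiler). The function names are hypothetical.

```python
import shutil
import subprocess

SWIFT_EXECUTABLE = "swift"  # assumption: name of the Cepstral command-line utility


def build_swift_command(ssml_text, executable=SWIFT_EXECUTABLE):
    """Return the argument list that passes ssml_text as one text argument."""
    # A list of arguments (no shell) keeps quotes and angle brackets intact.
    return [executable, ssml_text]


def speak(ssml_text):
    """Run the command if the executable is found; return True if it was run."""
    if shutil.which(SWIFT_EXECUTABLE) is None:
        return False
    subprocess.run(build_swift_command(ssml_text), check=True)
    return True


command = build_swift_command("This is a <break time='3s' /> three second pause.")
print(command)
```

Only `build_swift_command` is exercised here. `speak` is left uncalled so that nothing is executed until you have confirmed which program the name resolves to on your system.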
This section lists many of the most common uses of SSML with Cepstral Voices. The examples are shown as context-free text containing SSML markup. These examples can be used in any context in which SSML works with Cepstral Voices (see "When can SSML be used?"). For more detailed descriptions of how the elements and attributes used in these examples work, see the official W3C SSML Specification:
http://www.w3.org/TR/speech-synthesis/
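Because the examples are plain text with embedded markup, a common mistake is an unclosed or mistyped tag. The sketch below is one way to catch that before the text reaches the engine. It checks only XML well-formedness by wrapping the fragment in a dummy root element, and it says nothing about whether the engine accepts a given element or attribute. The function name `is_well_formed` and the `wrapper` element are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a dummy root."""
    try:
        ET.fromstring("<wrapper>" + fragment + "</wrapper>")
    except ET.ParseError:
        return False
    return True


print(is_well_formed("This is a <break time='3s' /> three second pause."))
print(is_well_formed("I am now <prosody rate='slow'>missing a closing tag."))
```

Note that this parser is namespace-aware, so a prefixed element such as `cepstral:sfx` is reported as not well-formed unless the prefix is declared on the wrapper.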
1. Inserting silence / pauses
-
"This is not <break strength='none' /> a pause."
"This is a <break strength='x-weak' /> phrase break."
"This is a <break strength='weak' /> phrase break."
"This is a <break strength='medium' /> sentence break."
"This is a <break strength='strong' /> paragraph break."
"This is a <break strength='x-strong' /> paragraph break."
"This is a <break time='3s' /> three second pause."
"This is a <break time='4500ms' /> 4.5 second pause."
"This is a <break /> sentence break."
-
"This is the default voice. <voice name="David">This is David.</voice> This is the default again. <voice name="Callie">Callie here.</voice>"
-
"I am now <prosody rate='x-slow'>speaking at half speed.</prosody>"
"I am now <prosody rate='slow'>speaking at 2/3 speed.</prosody>"
"I am now <prosody rate='medium'>speaking at normal speed.</prosody>"
"I am now <prosody rate='fast'>speaking 33% faster.</prosody>"
"I am now <prosody rate='x-fast'>speaking twice as fast</prosody>"
"I am now <prosody rate='default'>speaking at normal speed.</prosody>"
"I am now <prosody rate='.42'>speaking at 42% of normal speed.</prosody>"
"I am now <prosody rate='2.8'>speaking 2.8 times as fast</prosody>"
"I am now <prosody rate='-0.3'>speaking 30% more slowly.</prosody>"
"I am now <prosody rate='+0.3'>speaking 30% faster.</prosody>"
-
"<prosody pitch='x-low'>This is half-pitch</prosody>"
"<prosody pitch='low'>This is 3/4 pitch.</prosody>"
"<prosody pitch='medium'>This is normal pitch.</prosody>"
"<prosody pitch='high'>This is twice as high.</prosody>"
"<prosody pitch='x-high'>This is three times as high.</prosody>"
"<prosody pitch='default'>This is normal pitch.</prosody>"
"<prosody pitch='-50%'>This is 50% lower.</prosody>"
"<prosody pitch='+50%'>This is 50% higher.</prosody>"
"<prosody pitch='-6st'>This is six semitones lower.</prosody>"
"<prosody pitch='+6st'>This is six semitones higher.</prosody>"
"<prosody pitch='-25Hz'>This has a pitch mean 25 Hertz lower.</prosody>"
"<prosody pitch='+25Hz'>This has a pitch mean 25 Hertz higher.</prosody>"
"<prosody pitch='75Hz'>This has a pitch mean of 75 Hertz.</prosody>"
-
"<prosody volume='silent'>This is silent.</prosody>"
"<prosody volume='x-soft'>This is 25% as loud.</prosody>"
"<prosody volume='soft'>This is 50% as loud.</prosody>"
"<prosody volume='medium'>This is the default volume.</prosody>"
"<prosody volume='loud'>This is 50% louder.</prosody>"
"<prosody volume='x-loud'>This is 100% louder.</prosody>"
"<prosody volume='default'>This is the default volume.</prosody>"
"<prosody volume='-33%'>This is 33% softer.</prosody>"
"<prosody volume='+33%'>This is 33% louder.</prosody>"
"<prosody volume='33%'>This is 33% louder.</prosody>"
"<prosody volume='33'>This is 33% of normal volume.</prosody>"
-
"This is <emphasis level='strong'>stronger</emphasis> than the rest."
"This is <emphasis level='moderate'>stronger</emphasis> than the rest."
"This is <emphasis level='none'>the same as</emphasis> than the rest."
-
"Please leave your message after the tone <audio src='beep.wav' />"
"<audio src='non_existing_file.au'>File could not be played.</audio>"
-
"Hello. <cepstral:sfx file='/path/to/my_sfx.sfx'>Howdy, sir. How are you?</cepstral:sfx> I am fine."
"Sit! <voice name='Dog' sfx_file='/path/to/my_sfx.sfx'>Woof!</voice> Good boy."
-
"Place a bookmark <mark name='mark37' /> here."
-
"You say <phoneme ph='t ah0 m ey1 t ow0'>tomato</phoneme>, I say <phoneme ph='t ah0 m aa1 t ow0'>tomato</phoneme>"
For a complete list of available phonemes for your language, please see the "Lexicon Tutorial".