Cepstral - Tutorials

What is SSML?

SSML, or Speech Synthesis Markup Language, provides users with a standardized method for controlling different aspects of speech synthesis output. For example, with SSML, one can alter prosody attributes such as rate, pitch, and volume, insert pauses of any length, change the speaking voice while reading, and control many other aspects of how the text is read by the synthetic voice. More information can be found on the W3C's SSML 1.0 specification page.

How do I use SSML with Cepstral TTS products?

There are several ways to affect pronunciation, and which one to use depends on how you are using the application.

If you are using the Swift command line application to process text, or almost any application that calls Swift directly, you are using our native interface. Swift supports the Speech Synthesis Markup Language (SSML) as the default input mode for the synthesizer, with our own phoneme set for specifying pronunciations. This lets you place in-line pronunciations, and any other markup defined in SSML, directly in the input text.

Our phonetic alphabet is the same one you use when making entries in a Swift voice dictionary (lexicon.txt). You can find more about this here.

Example:

Welcome to <phoneme ph="k eh1 p s t r ah0 l">Cepstral</phoneme>.

Of course, this example is contrived, because our engine already says "Cepstral" properly.

When can SSML be used?

The Cepstral Swift TTS engine supports SSML natively, and by default it parses all input text for SSML. However, whether or not SSML is honored depends greatly on the context in which the Cepstral voice is used. If the application that is using the voices does not support SSML, the SSML markup will not make it through to the Swift TTS Engine for parsing. In particular, SSML does not work in the following widely used contexts:

Microsoft SAPI 5.1
If you are using Cepstral voices under Microsoft Windows via the SAPI5 interface, you cannot use SSML. Instead, you can use Microsoft's own SAPI XML to achieve similar results, if the application supports SAPI XML. SAPI versions 5.3 and above will support SSML.

For more on SAPI XML, please see this page.
Apple Speech Manager
If you are using Cepstral voices under Apple Macintosh OS X through the Speech Manager interface, you cannot use SSML. Instead, you can use Apple's own speech markup language, called Embedded Speech Commands.

For more on Embedded Speech Commands, please see this page.

SSML does work with Cepstral voices in any application that has been written to access the Cepstral Swift TTS Engine directly, without interacting with SAPI 5.1 or the Apple Speech Manager. SSML can be used with Cepstral voices in the following contexts:

Swift - The Cepstral command-line interface
Installed with every Cepstral voice for Microsoft Windows, Apple Macintosh OS X, and Linux is a command-line utility called "Swift." By default, any text arguments or input text files sent through Swift are parsed for SSML content.
SwiftTalker
The SwiftTalker application that is bundled with Cepstral voices for Microsoft Windows and Windows CE supports SSML.
Cepstral Tools
SSML can be used in the text you provide to test a voice in the "Voices" tab of the Cepstral Tools applet for the Windows Control Panel.
Asterisk PBX
SSML can be used with Cepstral voices in Asterisk by simply embedding the markup into the input text.

Common Usage Examples

This section lists many of the most common uses of SSML with Cepstral voices. The examples are shown as context-free text containing SSML markup. These examples can be used in any context in which SSML works with Cepstral voices (see "When can SSML be used?"). For more detailed descriptions of how the elements and attributes used in these examples work, see the official W3C SSML Specification:

http://www.w3.org/TR/speech-synthesis/

1. Inserting silence / pauses
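
A minimal sketch of pause insertion using the standard SSML break element. The sentence text and durations are illustrative only, and it is an assumption here that the engine accepts time values given in both seconds and milliseconds, as the W3C specification allows.

```xml
Take a deep breath <break time="2s"/> and continue. A shorter pause <break time="500ms"/> fits between phrases.
```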

2. Changing Voices

"This is the default voice. <voice name="David">This is David.</voice> This is the default again. <voice name="Callie">Callie here.</voice>"

3. Adjusting Speech Rate

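Speech rate is normally controlled with the rate attribute of the standard SSML prosody element. The following is a sketch only: the label values come from the W3C specification, and how strongly a given voice speeds up or slows down for each label is not stated in this tutorial.

```xml
This is the normal rate. <prosody rate="slow">This sentence is read slowly.</prosody> <prosody rate="fast">This one is read quickly.</prosody>
```
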
4. Adjusting Voice Pitch
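
Pitch can be sketched the same way with the prosody element's pitch attribute. The labels "high" and "low" are taken from the W3C specification; the audible result for any particular voice is an assumption.

```xml
This is the normal pitch. <prosody pitch="high">This is a higher pitch.</prosody> <prosody pitch="low">This is a lower pitch.</prosody>
```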

5. Adjusting Output Volume
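
Volume uses the volume attribute of prosody. This example assumes the engine accepts the W3C label values "soft" and "loud"; the text itself is purely illustrative.

```xml
This is the normal volume. <prosody volume="soft">This is spoken softly.</prosody> <prosody volume="loud">This is spoken loudly.</prosody>
```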

6. Adding Emphasis to Speech
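
Emphasis is marked with the standard SSML emphasis element, whose level attribute takes values such as "strong", "moderate", or "reduced" in the W3C specification. How each level is rendered depends on the voice, so treat this as a sketch rather than a guaranteed result.

```xml
I said <emphasis level="strong">do not</emphasis> press that button.
```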

7. Inserting Recorded Audio Files
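
Recorded audio is referenced with the standard SSML audio element. In this sketch the file name beep.wav is hypothetical, and which audio formats and path styles the engine accepts is not stated here. Per the W3C specification, the text inside the element is spoken only if the file cannot be played.

```xml
Please leave a message after the tone. <audio src="beep.wav">beep</audio>
```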

8. Applying Cepstral Special Effects

9. Inserting Bookmarks

"Place a bookmark <mark name='mark37' /> here."

10. Spelling Words Phonetically

Lexicon Tutorial