
C#/SAPI: .WAV files written with missing/incorrect data

Posted: Thu Aug 07, 2014 7:12 pm
by BrightArrow-Tech
I'm using a Cepstral voice through SAPI in a C#-based telephony application. I've noticed that when many text-to-speech requests arrive in rapid succession, the resulting .WAV file often has a correct WAV header but no valid audio data. Instead, it looks as if the SAPI SpVoice.Speak() call is writing the registry path of a Cepstral voice token where the PCM data should be.

I haven't found reports of similar problems with Cepstral voices. I did purchase an additional concurrency port for our license (we previously had only one), and that seems to have reduced the problem, but failures still occur.
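Since adding a port helped, one option is to make sure the application never starts more simultaneous syntheses than the license allows. A minimal sketch, assuming that each licensed port corresponds to one concurrent Speak() call (an assumption about Cepstral licensing, not something confirmed in this thread): a SemaphoreSlim sized to the port count makes extra requests wait for a free slot instead of competing for a port. `TtsThrottle` is a hypothetical name.

```csharp
using System;
using System.Threading;

// Hypothetical helper: caps how many text-to-speech calls run at the same time.
// The cap is meant to equal the number of licensed engine ports
// (an assumption about how the license is enforced).
public sealed class TtsThrottle
{
    private readonly SemaphoreSlim _ports;

    public TtsThrottle(int licensedPorts)
    {
        if (licensedPorts < 1)
            throw new ArgumentOutOfRangeException(nameof(licensedPorts));
        _ports = new SemaphoreSlim(licensedPorts, licensedPorts);
    }

    // Blocks the calling thread until a slot is free, runs the synthesis
    // delegate, and always returns the slot, even if synthesis throws.
    public void Run(Action synthesize)
    {
        _ports.Wait();
        try
        {
            synthesize();
        }
        finally
        {
            _ports.Release();
        }
    }
}
```

A single shared instance (for example `new TtsThrottle(2)` for two ports) would wrap the existing synthesis code as `throttle.Run(() => { ... })`. This only helps if the engine actually frees its port by the time the synthesis code returns.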

Could this be caused by SAPI not releasing the Cepstral ports after the Speak() call? That seems the most likely suspect, since adding a port helped and the "empty file" problem only occurs when many calls arrive in rapid succession. I didn't find a method for closing the SpVoice object, so I close the file stream it writes to and then set the voice object to null. I know garbage collection in C# has its quirks, but I was under the impression that setting an object to null would release it immediately.
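For comparison with the null-assignment approach: in .NET, setting a reference to null only drops that reference. The runtime callable wrapper (RCW) around a COM object such as SpVoice keeps the underlying COM object alive until the garbage collector finalizes the wrapper, which happens at an unpredictable time. A minimal sketch of releasing the wrappers deterministically with Marshal.FinalReleaseComObject follows; `ComRelease` and `ReleaseAll` are hypothetical names, and whether releasing early actually frees a Cepstral port sooner is an untested assumption.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ComRelease
{
    // Hypothetical helper: immediately releases the RCWs behind COM objects
    // (e.g. SpVoice, SpFileStream) instead of waiting for garbage collection.
    // Non-COM objects and nulls are skipped. Returns how many were released.
    public static int ReleaseAll(params object[] objects)
    {
        if (objects == null)
            return 0;

        int released = 0;
        foreach (object o in objects)
        {
            if (o != null && Marshal.IsComObject(o))
            {
                // Drops the RCW's reference count to zero; the wrapper
                // must not be used after this call.
                Marshal.FinalReleaseComObject(o);
                released++;
            }
        }
        return released;
    }
}
```

In the code posted below, this would mean calling `voiceStream.Close()` in a `finally` block and then `ComRelease.ReleaseAll(voice, voiceStream)` rather than only assigning null. The token returned by `GetVoices(...).Item(0)` and the token collection returned by `GetVoices(...)` are COM objects too and could be released the same way.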

Here is an example of a failed "empty" WAV file (about 300 bytes), as shown in the text pane of a hex viewer:
RIFF....WAVEfmt ........@...€>........data....EVNTì...........................................¢.......H.K.E.Y._.L.O.C.A.L._.M.A.C.H.I.N.E.\.S.O.F.T.W.A.R.E.\.M.i.c.r.o.s.o.f.t.\.S.p.e.e.c.h.\.V.o.i.c.e.s.\.T.o.k.e.n.s.\.C.e.p.s.t.r.a.l._.W.i.l.l.i.a.m.-.8.k.H.z...·.........................
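The dump is consistent with a `data` chunk whose size field is zero, followed by a separate `EVNT` chunk that holds the voice token's registry path. If that reading is right, no audio samples were written at all, and the token path is part of a saved event record rather than overwritten PCM (the third argument passed to `SpFileStream.Open` in the code below is SAPI's DoEvents flag, which asks SAPI to store events in the file). To catch such files automatically, here is a minimal sketch that walks the top-level RIFF chunks and reports whether the `data` chunk is non-empty; `WavInspector` is a hypothetical name, and the stream is assumed to be seekable (e.g. a FileStream or MemoryStream).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class WavInspector
{
    // Lists (chunk ID, declared size) for each top-level chunk after the
    // RIFF/WAVE header, e.g. "fmt ", "data", "EVNT".
    public static List<KeyValuePair<string, uint>> ListChunks(Stream stream)
    {
        var chunks = new List<KeyValuePair<string, uint>>();
        stream.Position = 0;
        using (var reader = new BinaryReader(stream, Encoding.ASCII, true))
        {
            string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
            reader.ReadUInt32();                      // overall RIFF size, not needed here
            string wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
            if (riff != "RIFF" || wave != "WAVE")
                throw new InvalidDataException("Not a RIFF/WAVE file.");

            // Each chunk: 4-byte ID, 4-byte little-endian size, then the payload.
            while (stream.Length - stream.Position >= 8)
            {
                string id = Encoding.ASCII.GetString(reader.ReadBytes(4));
                uint size = reader.ReadUInt32();
                chunks.Add(new KeyValuePair<string, uint>(id, size));

                // Payloads with an odd size are followed by one pad byte.
                long next = stream.Position + size + (size & 1);
                if (next > stream.Length)
                    break;                            // truncated chunk, stop here
                stream.Position = next;
            }
        }
        return chunks;
    }

    // True when the file contains a "data" chunk with at least one byte of audio.
    public static bool HasAudio(Stream stream)
    {
        foreach (var chunk in ListChunks(stream))
        {
            if (chunk.Key == "data" && chunk.Value > 0)
                return true;
        }
        return false;
    }
}
```

A check like `WavInspector.HasAudio(File.OpenRead(path))` after each synthesis would let the application log or retry a failed file instead of handing it to the telephony side.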


Here is a code snippet showing how I'm calling the Cepstral voice through SAPI:

Code:
// Assumes: using System.Text; using SpeechLib; (COM reference to the "Microsoft Speech Object Library")
SpVoice voice = new SpVoice();
SpFileStream voiceStream = new SpFileStream();

if (language == SPOKEN_LANGUAGE.AMERICAN_SPANISH)
{
    byte[] bytes = Encoding.Default.GetBytes(strInputString); //IWM060714 borrowed - re-encode translated text to UTF-8 to remove accented characters
    string strTranslatedString = Encoding.UTF8.GetString(bytes);
    strTranslatedString = strTranslatedString.Normalize();
    strInputString = strTranslatedString;

    CTrace.TraceLog(TRACE.ALWAYS, "GenerateCepstralAudio Re-Encoded Text: " + strInputString);

    voice.Voice = voice.GetVoices("name = " + DatabaseServices.ReadRegistry.GetRegistryValue("TTS Spanish Male", "Cepstral Miguel")).Item(0);
}
else
{
    voice.Voice = voice.GetVoices("name = " + DatabaseServices.ReadRegistry.GetRegistryValue("TTS English Male", "Cepstral William")).Item(0);
}

// Write 8 kHz, 16-bit mono PCM to the target file.
// The third Open() argument is DoEvents: true also saves SAPI events into the file.
voiceStream.Format.Type = SpeechAudioFormatType.SAFT8kHz16BitMono;
voiceStream.Open(strTrimPath, SpeechStreamFileMode.SSFMCreateForWrite, true);

voice.AudioOutputStream = voiceStream;
voice.Speak(strInputString);   // synchronous with the default flags
voice.WaitUntilDone(-1);       // -1 = wait indefinitely for speech to finish

// Clean up
voiceStream.Close();
voiceStream = null;
voice = null;
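On the Spanish branch: if the goal of the re-encoding step is to strip accents, the Encoding.Default to UTF-8 round trip probably does not do that. On a machine whose ANSI code page is Windows-1252, "é" encodes to the single byte 0xE9, which is not valid UTF-8 on its own, so decoding yields the replacement character U+FFFD rather than "e". A common alternative, sketched below under the assumption that plain unaccented text is really what is wanted (it may change how a Spanish voice pronounces some words), decomposes each letter into base character plus combining marks and drops the marks. `TextCleanup.StripDiacritics` is a hypothetical name.

```csharp
using System.Globalization;
using System.Text;

public static class TextCleanup
{
    // Removes diacritics: "canción año" becomes "cancion ano".
    public static string StripDiacritics(string input)
    {
        // FormD splits "ó" into "o" + U+0301 and "ñ" into "n" + U+0303.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        // Recompose whatever remains into the standard composed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```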


Does anyone here have any suggestions? Thanks!