Set up a control that renders an AUDIO tag;
Generate a voice sound from the passed text parameter on the server and save it into an in-memory stream;
Convert the stream’s contents to a Data URI;
Return the generated Data URI to the client as the response to the client callback.
So, first of all, my markup looks like this:
As you can see, the SpeechSynthesizer control features a few optional properties:
- Age: the age for the generated voice (default is the one of the system’s default language);
- Gender: gender of the generated voice (same default as per Age);
- Culture: the culture of the generated voice (system default);
- Rate: the speaking rate, from –10 (fastest) to 10 (slowest), where the default is 0 (normal rate);
- Ssml: if the text is to be considered SSML or not (default is false);
- Volume: the volume %, between 0 and 100 (default);
- VoiceName: the name of a voice that is installed on the server machine.
The Age, Gender and Culture properties and the VoiceName are exclusive, you either specify one or the other. If you want to know the voices installed on your machine, have a look at the GetInstalledVoices method. If no property is specified, the speech will be synthesized with the operating system’s default (Microsoft Anna on Windows 7, Microsoft Dave, Hazel and Zira on Windows 8, etc). By the way, you can get additional voices, either commercially or for free, just look them up in Google.
Without further delay, here is the code:
As you can see, the SpeechSynthesizer control inherits from HtmlGenericControl, this is the simplest out-of-the-box class that will allow me to render my tag of choice (in this case, AUDIO); by the way, this class requires that I decorate it with a ConstructorNeedsTagAttribute, but you don’t have to worry about it. It implements ICallbackEventHandler for the client callback mechanism. I make sure that all of AUDIO’s attributes are removed from the output, because I don’t want them around.
Inside of it, I have an instance of the SpeechSynthesizer class, the one that will be used to do the actual work. Because this class is disposable, I make sure it is disposed at the end of the control’s life cycle. Based on the parameters being supplied, I either call the SelectVoiceByHints or the SelectVoice methods. One thing to note is, we need to set up a synchronization context, because the SpeechSynthesizer works asynchronously, so that we can wait for its result.
A full markup example would be:
And that’s it! Have fun with speech on your web apps!