Speech Synthesis with ASP.NET and HTML5

The .NET Framework includes the SpeechSynthesizer class (in the System.Speech.Synthesis namespace), which can be used to access the Windows speech synthesis engine. The problem with web applications is, of course, that this class runs on the server. Because I wanted a mechanism for triggering speech synthesis (text-to-speech) from JavaScript, without requiring any plugin, I decided to implement one myself.

Once again, I will be using client callbacks, my favorite out-of-the-box ASP.NET AJAX technique. I will also be using HTML5’s AUDIO tag and Data URIs. What I’m going to do is:

  • Set up a control that renders an AUDIO tag;
  • Add to it a JavaScript function that takes a string parameter and causes a client callback;
  • Generate a voice sound from the passed text parameter on the server and save it into an in-memory stream;
  • Convert the stream’s contents to a Data URI;
  • Return the generated Data URI to the client as the response to the client callback.

Of course, all of this works cross-browser, provided your browser supports the AUDIO tag, Data URIs and WAV playback. All modern browsers support the first two; Internet Explorer 11, for one, cannot play WAV (see the comments below).
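
Whether a particular browser can play the generated sound can be checked at run time. The following is a minimal sketch and not part of the control: canPlayWav is a hypothetical helper name, and it relies only on the standard canPlayType method of media elements, which returns an empty string when a format is not supported.

```javascript
// Hypothetical helper (not part of the control): reports whether an
// <audio>-like element claims that it can play WAV data.
// canPlayType() returns "", "maybe" or "probably".
function canPlayWav(audio) {
    if (!audio || typeof audio.canPlayType !== "function") {
        return false; // no AUDIO support at all
    }
    return audio.canPlayType("audio/wav") !== "";
}

// In a page, one would call: canPlayWav(document.createElement("audio"))
```

If the check fails, the page can hide the Speak button or show a message instead of failing silently.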

So, first of all, my markup looks like this:

    <web:SpeechSynthesizer runat="server" ID="synthesizer" Ssml="false" VoiceName="Microsoft Anna" Age="Adult" Gender="Male" Culture="en-US" Rate="0" Volume="100" />

As you can see, the SpeechSynthesizer control features a few optional properties:

  • Age: the age of the generated voice (defaults to that of the system’s default voice);
  • Gender: the gender of the generated voice (same default as for Age);
  • Culture: the culture of the generated voice (system default);
  • Rate: the speaking rate, from –10 (slowest) to 10 (fastest), where the default is 0 (normal rate);
  • Ssml: whether the text is to be treated as SSML or not (default is false);
  • Volume: the volume as a percentage, between 0 and 100 (the default is 100);
  • VoiceName: the name of a voice that is installed on the server machine.

The Age, Gender and Culture properties, on the one hand, and VoiceName, on the other, are mutually exclusive: you specify either one or the other (if VoiceName is set, it takes precedence and the hints are ignored). If you want to know which voices are installed on your machine, have a look at the GetInstalledVoices method of the SpeechSynthesizer class. If no property is specified, the speech will be synthesized with the operating system’s default voice (Microsoft Anna on Windows 7; Microsoft David, Hazel and Zira come with Windows 8, etc). By the way, you can get additional voices, either commercially or for free; just search for them.
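
Going back to the Ssml property: when it is set to true, the text sent from the client has to be a complete SSML document rather than plain text. As an illustration only, here is a small JavaScript helper that wraps plain text in a minimal speak element. The names toSsml and escapeXml are made up for this sketch, and the attributes follow the W3C SSML 1.0 format. Be aware that ASP.NET request validation may reject posted values that contain markup, which this sketch does not address.

```javascript
// Hypothetical helper: escapes the characters that are special in XML.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical helper: wraps plain text in a minimal SSML 1.0 document.
// The language defaults to en-US when none is given.
function toSsml(text, lang) {
    return '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="' +
        escapeXml(lang || "en-US") + '">' + escapeXml(text) + "</speak>";
}
```

The result would then be passed to the control’s speak function in place of the plain text.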

Without further delay, here is the code:

    using System;
    using System.ComponentModel;
    using System.Globalization;
    using System.IO;
    using System.Speech.Synthesis;
    using System.Threading;
    using System.Web.UI;
    using System.Web.UI.HtmlControls;

    [ConstructorNeedsTag(false)]
    public class SpeechSynthesizer : HtmlGenericControl, ICallbackEventHandler
    {
        // The actual engine; fully qualified because the control has the same name
        private readonly System.Speech.Synthesis.SpeechSynthesizer synth = new System.Speech.Synthesis.SpeechSynthesizer();

        public SpeechSynthesizer() : base("audio")
        {
            this.Age = VoiceAge.NotSet;
            this.Gender = VoiceGender.NotSet;
            this.Culture = CultureInfo.CurrentCulture;
            this.VoiceName = String.Empty;
            this.Ssml = false;
            // DefaultValue is only metadata, so the real default has to be set here
            this.Volume = 100;
        }

        [DefaultValue("")]
        public String VoiceName { get; set; }

        [DefaultValue(100)]
        public Int32 Volume { get; set; }

        [DefaultValue(0)]
        public Int32 Rate { get; set; }

        [TypeConverter(typeof(CultureInfoConverter))]
        public CultureInfo Culture { get; set; }

        [DefaultValue(VoiceGender.NotSet)]
        public VoiceGender Gender { get; set; }

        [DefaultValue(VoiceAge.NotSet)]
        public VoiceAge Age { get; set; }

        [DefaultValue(false)]
        public Boolean Ssml { get; set; }

        protected override void OnInit(EventArgs e)
        {
            // The synthesizer works asynchronously, so give it a plain synchronization context
            AsyncOperationManager.SynchronizationContext = new SynchronizationContext();

            var sm = ScriptManager.GetCurrent(this.Page);
            // Client callback: sends "text" to the server, then plays the returned Data URI
            var reference = this.Page.ClientScript.GetCallbackEventReference(this, "text", String.Format("function(result){{ document.getElementById('{0}').src = result; document.getElementById('{0}').play(); }}", this.ClientID), String.Empty, true);
            // Adds a speak(text) function to the AUDIO element
            var script = String.Format("\ndocument.getElementById('{0}').speak = function(text){{ {1} }};\n", this.ClientID, reference);

            if (sm != null)
            {
                // With a ScriptManager on the page, wait for the pageLoaded event
                this.Page.ClientScript.RegisterStartupScript(this.GetType(), String.Concat("speak", this.ClientID), String.Format("Sys.WebForms.PageRequestManager.getInstance().add_pageLoaded(function() {{ {0} }});\n", script), true);
            }
            else
            {
                this.Page.ClientScript.RegisterStartupScript(this.GetType(), String.Concat("speak", this.ClientID), script, true);
            }

            base.OnInit(e);
        }

        protected override void OnPreRender(EventArgs e)
        {
            // Strip any AUDIO attributes set in markup and keep the element hidden
            this.Attributes.Remove("class");
            this.Attributes.Remove("src");
            this.Attributes.Remove("preload");
            this.Attributes.Remove("loop");
            this.Attributes.Remove("autoplay");
            this.Attributes.Remove("controls");

            this.Style[HtmlTextWriterStyle.Display] = "none";
            this.Style[HtmlTextWriterStyle.Visibility] = "hidden";

            base.OnPreRender(e);
        }

        public override void Dispose()
        {
            this.synth.Dispose();

            base.Dispose();
        }

        #region ICallbackEventHandler Members

        String ICallbackEventHandler.GetCallbackResult()
        {
            using (var stream = new MemoryStream())
            {
                this.synth.Rate = this.Rate;
                this.synth.Volume = this.Volume;
                this.synth.SetOutputToWaveStream(stream);

                // An explicit voice name takes precedence over the hints
                if (String.IsNullOrWhiteSpace(this.VoiceName) == false)
                {
                    this.synth.SelectVoice(this.VoiceName);
                }
                else
                {
                    this.synth.SelectVoiceByHints(this.Gender, this.Age, 0, this.Culture);
                }

                if (this.Ssml == false)
                {
                    this.synth.Speak(this.Context.Items["data"] as String);
                }
                else
                {
                    this.synth.SpeakSsml(this.Context.Items["data"] as String);
                }

                // Return the WAV bytes as a Base64 Data URI
                return (String.Concat("data:audio/wav;base64,", Convert.ToBase64String(stream.ToArray())));
            }
        }

        void ICallbackEventHandler.RaiseCallbackEvent(String eventArgument)
        {
            // Keep the text sent by the client until GetCallbackResult is called
            this.Context.Items["data"] = eventArgument;
        }

        #endregion
    }

As you can see, the SpeechSynthesizer control inherits from HtmlGenericControl, which is the simplest out-of-the-box class that allows me to render my tag of choice (in this case, AUDIO). Because HtmlGenericControl normally expects the tag name to be passed to its constructor, and my constructor takes no parameters, I have to decorate the class with [ConstructorNeedsTag(false)], but you don’t have to worry about it. The control implements ICallbackEventHandler for the client callback mechanism. I also make sure that AUDIO’s usual attributes (class, src, preload, loop, autoplay and controls) are removed from the output and that the element is hidden, because I don’t want them around.

Inside it, I have an instance of the System.Speech.Synthesis.SpeechSynthesizer class, the one that does the actual work. Because this class is disposable, I make sure it is disposed of at the end of the control’s life cycle. Depending on the supplied properties, I call either SelectVoice or SelectVoiceByHints. One thing to note: because the SpeechSynthesizer works asynchronously, we need to set up a synchronization context so that we can wait for its result.

The generated sound will be output to an in-memory buffer and then converted into a WAV Data URI, which is basically a Base64 encoded string with an associated mime-type.
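
To make the shape of such a Data URI concrete, here is a minimal sketch that takes one apart. The function name describeDataUri is hypothetical, and the sketch only handles the Base64 form that the control returns. It does not decode the payload; it only reports the MIME type and the size the decoded sound would have.

```javascript
// Hypothetical helper: splits a Base64 Data URI into its parts and
// computes the size of the decoded payload without decoding it.
function describeDataUri(uri) {
    var match = /^data:([^;,]+);base64,([A-Za-z0-9+\/]*={0,2})$/.exec(uri);
    if (!match) {
        throw new Error("Not a Base64 Data URI");
    }
    var payload = match[2];
    var padding = (payload.match(/=+$/) || [""])[0].length;
    return {
        mimeType: match[1],
        // every 4 Base64 characters encode 3 bytes, minus the padding
        byteLength: (payload.length / 4) * 3 - padding
    };
}
```

Since four characters carry three bytes, the Data URI is roughly a third larger than the raw WAV, so long texts produce large callback responses.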

Finally, on the client side, all it takes is to set the returned Data URI as the AUDIO element’s src property and call its play method, and that's it.
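
Here is a minimal sketch of that client-side wiring, with the server call abstracted away. The names attachSpeak and requestSpeech are hypothetical: in the control itself, the request is the client callback script generated by GetCallbackEventReference.

```javascript
// Hypothetical sketch: adds a speak(text) method to an <audio> element.
// requestSpeech(text, onResult) stands in for the ASP.NET client callback;
// it must eventually call onResult with the Data URI returned by the server.
function attachSpeak(audio, requestSpeech) {
    audio.speak = function (text) {
        requestSpeech(text, function (dataUri) {
            audio.src = dataUri; // point the element at the generated sound
            audio.play();        // and start playback
        });
    };
    return audio;
}
```

Passing the request in as a function keeps the sketch independent of ASP.NET, so the same wiring could sit in front of any endpoint that returns a Data URI.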

A full markup example would be (in the Register directive, MyWebApp is a placeholder; use the assembly and namespace that contain your compiled control, not System.Speech):

    <%@ Register Assembly="MyWebApp" Namespace="MyWebApp" TagPrefix="web" %>
    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head runat="server">
        <script type="text/javascript">

            function onSpeak(text)
            {
                document.getElementById('synthesizer').speak(text);
            }

        </script>
    </head>
    <body>
        <form runat="server">
        <div>
            <web:SpeechSynthesizer runat="server" ID="synthesizer" Age="Adult" Gender="Male" Culture="en-US" Rate="0" Volume="100" />
            <input type="text" id="text" name="text"/>
            <input type="button" value="Speak" onclick="onSpeak(this.form.text.value)"/>
        </div>
        </form>
    </body>
    </html>

And that’s it! Have fun with speech on your web apps! ;-)

                             

21 Comments

  • Hi,

    I am trying to recreate what you have done above, but I keep running into issues. They usually involve the System.Speech library, but digging deeper, it may be an issue with the client callback control. I am not sure whether I haven't followed the steps correctly, have implemented something wrong, or don't have all the prerequisites for setting this up. Do you have a complete example solution that can be downloaded, so I can work through it side by side and understand where I went wrong?

    Thanks

  • Hi, Si!
    Why don't you send me your code and I'll have a look?
    Address is rjperes at hotmail.

  • Hi.
    Something is wrong with your code: <web:SpeechSynthesizer gives an error about missing web.config configuration even though the configuration is done!
    That may mean a different way of implementing a custom control is needed to make it work. I'm working on it.

  • Hi, I'm not able to run your program; I would greatly appreciate it if you could help me with this.

    Default.aspx code
    <%@ Page Title="Home Page" Language="C#" MasterPageFile="~/Site.Master" AutoEventWireup="true" CodeBehind="Default.aspx.cs" Inherits="speechrecognition._Default" %>
    <%@ Register Assembly="System.Speech, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" Namespace="System.Speech" TagPrefix="web" %>

    <asp:Content runat="server" ID="HeadConent1" ContentPlaceHolderID="HeadContent>
    <script type="text/javascript">


    function onSpeak(text) {
    document.getElementById('synthesizer').speak(text);
    }
    </script>
    </asp:Content>
    <asp:Content runat="server" ID="BodyContent" ContentPlaceHolderID="MainContent">

    <h3>We suggest the following:</h3>
    <div>
    <web:SpeechSynthesizer runat="server" ID="synthesizer" Age="Adult" Gender="Male" Culture="en-US" Rate="0" Volume="100" /> - See more at: http://weblogs.asp.net/ricardoperes/speech-synthesis-with-asp-net-and-html5#sthash.3qau41Of.dpuf
    <input type="text" id="text" name="text"/>
    <input type="button" value="Speak" onclick="onSpeak(this.form.text.value)"/>
    </div>

    </asp:Content>


    getting this error:

    Parser Error

    Description: An error occurred during the parsing of a resource required to service this request. Please review the following specific parse error details and modify your source file appropriately.

    Parser Error Message: Unknown server tag 'web:SpeechSynthesizer'.

    Source Error:


    Line 16: <h3>We suggest the following:</h3>
    Line 17: <div>
    Line 18: <web:SpeechSynthesizer runat="server" ID="synthesizer" Age="Adult" Gender="Male" Culture="en-US" Rate="0" Volume="100" /> - See more at: http://weblogs.asp.net/ricardoperes/speech-synthesis-with-asp-net-and-html5#sthash.3qau41Of.dpuf
    Line 19: <input type="text" id="text" name="text"/>
    Line 20: <input type="button" value="Speak" onclick="onSpeak(this.form.text.value)"/>

    Source File: /Default.aspx Line: 18

    Version Information: Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.0.30319.34280




    SpeechSynthesizer.cs class

    using System;
    using System.ComponentModel;
    using System.Globalization;
    using System.IO;
    using System.Speech.Synthesis;
    using System.Threading;
    using System.Web.UI;
    using System.Web.UI.HtmlControls;
    namespace speechrecognition
    {
    [ConstructorNeedsTag(false)]
    public class SpeechSynthesizer : HtmlGenericControl, ICallbackEventHandler
    {
    private readonly System.Speech.Synthesis.SpeechSynthesizer synth = new System.Speech.Synthesis.SpeechSynthesizer();

    public SpeechSynthesizer()
    : base("audio")
    {
    this.Age = VoiceAge.NotSet;
    this.Gender = VoiceGender.NotSet;
    this.Culture = CultureInfo.CurrentCulture;
    this.VoiceName = String.Empty;
    this.Ssml = false;
    }

    [DefaultValue("")]
    public String VoiceName { get; set; }

    [DefaultValue(100)]
    public Int32 Volume { get; set; }

    [DefaultValue(0)]
    public Int32 Rate { get; set; }

    [TypeConverter(typeof(CultureInfoConverter))]
    public CultureInfo Culture { get; set; }

    [DefaultValue(VoiceGender.NotSet)]
    public VoiceGender Gender { get; set; }

    [DefaultValue(VoiceAge.NotSet)]
    public VoiceAge Age { get; set; }

    [DefaultValue(false)]
    public Boolean Ssml { get; set; }

    protected override void OnInit(EventArgs e)
    {
    AsyncOperationManager.SynchronizationContext = new SynchronizationContext();

    var sm = ScriptManager.GetCurrent(this.Page);
    var reference = this.Page.ClientScript.GetCallbackEventReference(this, "text", String.Format("function(result){{ document.getElementById('{0}').src = result; document.getElementById('{0}').play(); }}", this.ClientID), String.Empty, true);
    var script = String.Format("\ndocument.getElementById('{0}').speak = function(text){{ {1} }};\n", this.ClientID, reference);

    if (sm != null)
    {
    this.Page.ClientScript.RegisterStartupScript(this.GetType(), String.Concat("speak", this.ClientID), String.Format("Sys.WebForms.PageRequestManager.getInstance().add_pageLoaded(function() {{ {0} }});\n", script), true);
    }
    else
    {
    this.Page.ClientScript.RegisterStartupScript(this.GetType(), String.Concat("speak", this.ClientID), script, true);
    }

    base.OnInit(e);
    }

    protected override void OnPreRender(EventArgs e)
    {
    this.Attributes.Remove("class");
    this.Attributes.Remove("src");
    this.Attributes.Remove("preload");
    this.Attributes.Remove("loop");
    this.Attributes.Remove("autoplay");
    this.Attributes.Remove("controls");

    this.Style[HtmlTextWriterStyle.Display] = "none";
    this.Style[HtmlTextWriterStyle.Visibility] = "hidden";

    base.OnPreRender(e);
    }

    public override void Dispose()
    {
    this.synth.Dispose();

    base.Dispose();
    }

    #region ICallbackEventHandler Members

    String ICallbackEventHandler.GetCallbackResult()
    {
    using (var stream = new MemoryStream())
    {
    this.synth.Rate = this.Rate;
    this.synth.Volume = this.Volume;
    this.synth.SetOutputToWaveStream(stream);

    if (String.IsNullOrWhiteSpace(this.VoiceName) == false)
    {
    this.synth.SelectVoice(this.VoiceName);
    }
    else
    {
    this.synth.SelectVoiceByHints(this.Gender, this.Age, 0, this.Culture);
    }

    if (this.Ssml == false)
    {
    this.synth.Speak(this.Context.Items["data"] as String);
    }
    else
    {
    this.synth.SpeakSsml(this.Context.Items["data"] as String);
    }

    return (String.Concat("data:audio/wav;base64,", Convert.ToBase64String(stream.ToArray())));
    }
    }

    void ICallbackEventHandler.RaiseCallbackEvent(String eventArgument)
    {
    this.Context.Items["data"] = eventArgument;
    }

    #endregion
    }
    }

  • Hi, I sent you the speechrecognition.zip file to your email provided rjperes@hotmail.com

    Thank you for your help.
    Ahmed
    440 622 1531

  • Hi Ahmed Khaja, I am facing the same issue as you, in the code below. Were you able to run this?

    //------------------------------------------------------------------------------
    // <auto-generated>
    // This code was generated by a tool.
    //
    // Changes to this file may cause incorrect behavior and will be lost if
    // the code is regenerated.
    // </auto-generated>
    //------------------------------------------------------------------------------

    namespace VoiceReconitionWebAppDemo {


    public partial class _Default {

    /// <summary>
    /// processor control.
    /// </summary>
    /// <remarks>
    /// Auto-generated field.
    /// To modify move field declaration from designer file to code-behind file.
    /// </remarks>
    protected global::System.Speech.SpeechRecognition processor;

    /// <summary>
    /// synthesizer control.
    /// </summary>
    /// <remarks>
    /// Auto-generated field.
    /// To modify move field declaration from designer file to code-behind file.
    /// </remarks>
    protected global::System.Speech.SpeechSynthesizer synthesizer;
    }
    }

    Error 3 The type or namespace name 'SpeechRecognition' does not exist in the namespace 'System.Speech' (are you missing an assembly reference?) C:\D\Work\Project\SpeachReconition\VoiceReconitionWebAppDemo\Default.aspx.designer.cs 22 41 VoiceReconitionWebAppDemo

  • What is this
    <%@ Register Assembly="System.Speech, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" Namespace="System.Speech" TagPrefix="web" %>
    This makes no sense at all! You should register your project, not System.Speech!

  • Hi RicardoPeres,

    Thank you for your post. I am trying to reproduce your work in my project and I am not able to complete it properly.
    Can you please upload your project somewhere, so I can download it and work with it?
    If you do, it will be really helpful to me and my project.

  • Chirag: please see https://github.com/rjperes/DevelopmentWithADot.AspNetSpeechSynthesizer.

  • This code does not work in IE 11. It works OK in Chrome and Edge.

  • Hi, Amer!
    What is not working? Is it a JavaScript problem?

  • I picked the generated data and put into this html file.

    You will see this html page works perfectly fine in Chrome and Edge. In IE 11 it says invalid source.

  • Just in case you missed the url, here is the link:

    http://www.pmnet.co.uk/iic/texttovoice.html

  • I found out the actual problem is that IE11 cannot play the WAV audio format. You need to convert it to MP3 to make it work in IE11.

    Is there anything built into the .NET library to convert WAV to MP3?

    There is the NAudio.Lame package for converting audio formats, but it is built for .NET 4.5. I need a solution for .NET 3.5.

    Any suggestions?

  • Hi, Amer!
    Thanks for letting me (us) know!
    AFAIK, the MP3 format is patent-encumbered, so you may not be able to freely convert to it. But browsers nowadays can play a myriad of file formats, like OGG Vorbis; check which of them IE11 accepts.
    Cheers!

  • Hi, I am trying to run the code and it raises an error for
    <web:SpeechSynthesizer runat="server" ID="synthesizer" Age="Adult" Gender="Male" Culture="en-US" Rate="0" Volume="100" />
    saying that something is missing in the web.config.

    Could you send me the code by email as a ZIP file?

    Thanks

    Luis Mariño

  • Thanks for your post. I am trying to reproduce your work in my project and I am not able to complete it correctly.
    Could you leave it in a ZIP file to my mail, I would appreciate it a lot

    Luis Mariño

  • Luis: the code is available at https://github.com/rjperes/DevelopmentWithADot.AspNetSpeechSynthesizer.
    You need to disable code signing, as I didn't provide my certificate.

  • To go step by step: your code worked properly on my development laptop. When I moved it to my IIS server, it did not work until I changed the identity of the application pool to LocalSystem, so that it can access the system32 folder.
    Now, when I moved it to a public hosting server (A2 Hosting, for example), where I cannot change the identity, what is the solution?
    Currently, on any server that does not have the required access, initializing the synthesizer returns null:
    private readonly System.Speech.Synthesis.SpeechSynthesizer synth = new System.Speech.Synthesis.SpeechSynthesizer();
    I would appreciate a solution for public hosting.

  • Hi, Omar!
    I'm afraid I don't have a solution for you as I have never come across it. If you happen to find one, please do share it here! Thanks!
