Contents tagged with HTML
Speech synthesis and recognition were both introduced in .NET 3.0. They both live in System.Speech.dll. In the past, I already talked about speech synthesis in the context of ASP.NET Web Form applications, this time, I’m going to talk about speech recognition.
.NET has in fact two APIs for that:
System.Speech, which comes with the core framework;
Microsoft.Speech, offering a somewhat similar API, but requiring a separate download: Microsoft Speech Platform Software Development Kit.
I am going to demonstrate a technique that makes use of HTML5 features, namely, Data URIs and the getUserMedia API and also ASP.NET Client Callbacks, which, if you have been following my blog, should know that I am a big fan of.
First, because we have two APIs that we can use, let’s start by creating an abstract base provider class:
It basically features one method, RecognizeFromWav, which takes a physical path and returns a SpeechRecognitionResult (code coming next). For completeness, it also implements the Dispose Pattern, because some provider may require it.
In a moment we will be creating implementations for the built-in .NET provider as well as Microsoft Speech Platform.
The SpeechRecognition property refers to our Web Forms control, inheriting from HtmlGenericControl, which is the one that knows how to instantiate one provider or the other:
The control exposes some custom properties:
Culture: an optional culture name, such as “pt-PT” or “en-US”; if not specified, it defaults to the current culture in the server machine;
Mode: one of the two providers: Desktop (for System.Speech) or Server (for Microsoft.Speech, of Microsoft Speech Platform);
SampleRate: a sample rate, by default, it is 44100;
Grammars: an optional collection of additional grammar files, with extension .grxml (Speech Recognition Grammar Specification), to add to the engine;
Choices: an optional collection of choices to recognize, if we want to restrict the scope, such as “yes”/”no”, “red”/”green”, etc.
The mode enumeration looks like this:
The SpeechRecognition control also has an event, SpeechRecognized, which allows overriding the detected phrases. Its argument is this simple class that follows the regular .NET event pattern:
Which in turn holds a SpeechRecognitionResult:
This class receives the phrase that the speech recognition engine understood plus an array of additional alternatives, in descending order.
You need to add an assembly-level attribute, WebResourceAttribute, possibly in AssemblyInfo.cs, of course, replacing MyNamespace for your assembly’s default namespace:
This attribute registers a script file with some content-type so that it can be included in a page by the RegisterClientScriptResource method.
And here is it:
OK, let’s move on the the provider implementations; first, Desktop:
What this provider does is simply receive the location of a .WAV file and feed it to SpeechRecognitionEngine, together with some parameters of SpeechRecognition (Culture, AudioRate, Grammars and Choices)
Finally, the code for the Server (Microsoft Speech Platform Software Development Kit) version:
As you can see, it is very similar to the Desktop one. Keep in mind, however, that for this provider to work you will have to download the Microsoft Speech Platform SDK, the Microsoft Speech Platform Runtime and at least one language from the Language Pack.
Here is a sample markup declaration:
If you want to add specific choices, add the Choices attribute to the control declaration and separate the values by commas:
Or add a grammar file:
By the way, grammars are not so difficult to create, you can find a good documentation in MSDN: http://msdn.microsoft.com/en-us/library/ee800145.aspx.
And that’s it. You start recognition by calling startRecording(), get results in onSpeechRecognized() (or any other function set in the OnClientSpeechRecognized property) and stop recording with stopRecording(). The values passed to onSpeechRecognized() are those that may have been filtered by the server-side SpeechRecognized event handler.
A final word of advisory: because generated sound files may become very large, do keep the recordings as short as possible.
Of course, this offers several different possibilities, I am looking forward to hearing them from you!
I have already talked about SignalR in this blog. I think it is one of the most interesting technologies that Microsoft put out recently, not because it is something substantially new – AJAX, long polling and server-sent events have been around for quite some time -, but because of how easy and extensive they made it.
Most of the examples of SignalR usually are about chat. You know that I have been digging into HTML5 lately, and I already posted on media acquisition using HTML5’s getUserMedia API. This time, however, I’m going to talk about video streaming!
I wrote a simple ASP.NET Web Forms control that leverages SignalR to build a real-time communication channel, and this channel transmits images as Data URIs. The source of the feed comes from getUserMedia and is therefore available in all modern browsers, except, alas, one that shall remain unnamed. Any browser, however, can be used to display the streaming feed.
So, the architecture is quite simple:
One SignalR Hub for the communication channel;
One ASP.NET Web Forms control, that renders an HTML5 VIDEO tag that is used to display the video being acquired, on any compatible browser.
Here are some nice screenshots of my now famous home wall, where one of the browser instances, Chrome, is broadcasting to Firefox, Opera and Safari. Unfortunately, IE is not supported.
So, all we need is a control declaration on an ASP.NET Web Forms page, which could look like this:
The non-ordinary properties that the VideoStreaming control supports are:
Interval: the rate in milliseconds that the control broadcasts that the video is being broadcast; do not set it to less than 200, from my experience, it will not work well;
Source: if the control is set as a streaming source, or just as a receiver (default);
And a simple CANVAS element for Target:
The reason for the two main streaming modes, Target and Event are flexibility: with Target, you directly draw the streamed picture on a CANVAS or IMG (it will detect automatically what to use), which needs to be already present, and with Event, you can do your own custom processing.
The source for the VideoStreaming control is this:
Renders a VIDEO tag element with the specified Width and Height;
Register a SignalR hub;
Removes any eventual VIDEO-related attribute that may be present on the control declaration;
Of course, we now need the hub code, which is as simple as it could be, just a method that takes the image as a Data URI and its original dimensions and broadcasts it to the world:
And finally the simple enumerations used by the VideoStreaming control:
And there you have it! Just drop the VideoStreaming control on an ASPX page and you’re done! If you want to have it broadcast, you need to set its Source property to true, for example, on the containing page:
That’s it. Very simple video streaming without the need for any plugin or server. Hope you enjoy it!
I wanted to be able to capture a snapshot and to upload it in AJAX style. Of course, if you know me you can guess I wanted this with ASP.NET Web Forms and client callbacks!
OK, so I came up with this markup:
The properties and events worth notice are:
Width & Height: should be self explanatory;
PictureTaken: a handler for a server-side .NET event that is raised when the picture is sent to the server asynchronously.
startCapture: when called, starts displaying the output of the camera in real time;
stopCapture: pauses the update of the camera;
takeSnapshot: takes a snapshot of the current camera output (as visible on screen) and sends it asynchronously to the server, raising the PictureTaken event.
When the startCapture method is called, the PictureSnapshot control (or, better, its underlying VIDEO tag) starts displaying whatever the camera is pointing to (if we so authorize it).
Here is an example of one of my home walls… yes, I deliberately got out of the way!
OK, enough talk, the code for PictureSnapshot looks like this:
As you can see, it inherits from WebControl. This is the simplest class in ASP.NET that allows me to output the tag I want, and also includes some nice stuff (width, height, style, class, etc). Also, it implements ICallbackEventHandler for the client callback stuff.
You may find it strange that I am setting explicitly the WIDTH and HEIGHT attributes, that is because VIDEO requires it in the tag, not just on the STYLE. In the way, I am removing any CROSSORIGIN, SRC, MUTED, PRELOAD, LOOP, AUTOPLAY, MEDIAGROUP, POSTER and CONTROLS attributes that may be present in the markup, because we don’t really want them.
The PictureTakenEventArgs class:
Finally, a simple event handler for PictureTaken:
Pictures arrive as instances of the Image class, and you can do whatever you want with it.
As always, feedback is greatly appreciated!
Set up a control that renders an AUDIO tag;
Generate a voice sound from the passed text parameter on the server and save it into an in-memory stream;
Convert the stream’s contents to a Data URI;
Return the generated Data URI to the client as the response to the client callback.
So, first of all, my markup looks like this:
As you can see, the SpeechSynthesizer control features a few optional properties:
- Age: the age for the generated voice (default is the one of the system’s default language);
- Gender: gender of the generated voice (same default as per Age);
- Culture: the culture of the generated voice (system default);
- Rate: the speaking rate, from –10 (fastest) to 10 (slowest), where the default is 0 (normal rate);
- Ssml: if the text is to be considered SSML or not (default is false);
- Volume: the volume %, between 0 and 100 (default);
- VoiceName: the name of a voice that is installed on the server machine.
The Age, Gender and Culture properties and the VoiceName are exclusive, you either specify one or the other. If you want to know the voices installed on your machine, have a look at the GetInstalledVoices method. If no property is specified, the speech will be synthesized with the operating system’s default (Microsoft Anna on Windows 7, Microsoft Dave, Hazel and Zira on Windows 8, etc). By the way, you can get additional voices, either commercially or for free, just look them up in Google.
Without further delay, here is the code:
As you can see, the SpeechSynthesizer control inherits from HtmlGenericControl, this is the simplest out-of-the-box class that will allow me to render my tag of choice (in this case, AUDIO); by the way, this class requires that I decorate it with a ConstructorNeedsTagAttribute, but you don’t have to worry about it. It implements ICallbackEventHandler for the client callback mechanism. I make sure that all of AUDIO’s attributes are removed from the output, because I don’t want them around.
Inside of it, I have an instance of the SpeechSynthesizer class, the one that will be used to do the actual work. Because this class is disposable, I make sure it is disposed at the end of the control’s life cycle. Based on the parameters being supplied, I either call the SelectVoiceByHints or the SelectVoice methods. One thing to note is, we need to set up a synchronization context, because the SpeechSynthesizer works asynchronously, so that we can wait for its result.
A full markup example would be:
And that’s it! Have fun with speech on your web apps!
A lot has changed with regard to events and event handling since the old days of HTML. Let’s have a brief look at it.
I have been playing recently with HTML5, and one thing that I got to understand really well was the new upload mechanisms available. Specifically, I wanted to understand how SkyOneDrive, Google Drive, Dropbox, etc, all support dropping files from the local machine, and how to use it in an ASP.NET Web Forms (sorry!) project, and I got it!
So, I want to have a panel (a DIV) that allows dropping files from the local machine; I want to be able to filter these files by their content type, maximum length, or by any custom condition. If the dropped files pass all conditions, they are sent asynchronously (read, AJAX) and raise a server-side event. After that, if they are multimedia files, I can preview them on the client-side.
My markup looks like this:
The UploadPanel control inherits from the Panel class, so it can have any of its properties. In this case, I am setting a specific width, height and border, so that it is easier to target.
Out of the box, it supports the following validations:
MaximumFiles: The maximum number of files to drop; if not set, any number is accepted;
MaximumLength: The maximum length of any individual file, in bytes; if not set, any file size is accepted;
ContentTypes: A comma-delimited list of content types; can take a precise content type, such as “image/gif” or a content type part, such as “image/”; if not specified, any content type is accepted.
When a file is dropped into the UploadPanel, the following client-side events are raised:
OnValidationFailure: Raised whenever any of the built-in validations (maximum files, maximum length and content types) fails, for any of the dropped files;
OnBeforeUpload: Raised before the upload starts, and after all built-in validations (maximum files, maximum length and content types) succeed; this gives developers a chance to analyze the files to upload and to optionally cancel the upload, or to add additional custom parameters that will be posted together with the files;
OnUploadFailure: Raised if the upload fails for some reason;
OnUploadCanceled: Raised if the upload is canceled;
OnUploadProgress: Raised possibly several times as the file is being uploaded, providing an indication of the total upload progress;
OnUploadSuccess: Raised when the upload terminates successfully;
OnUploadComplete: Raised when the upload completes, either successfully or not;
OnPreviewFile: Raised for each multimedia file uploaded (images, videos, sound), to allow previewing it on the client-side; the handler function receives the file as a data URL.
The Upload event takes a parameter of type UploadEventArgs which looks like this:
This class receives a list of all the files uploaded and also any custom properties assigned on the OnBeforeUpload event. It also allows the return of a response string, that will be received by the OnUploadSuccess or OnUploadComplete client-side events:
Finally, the code:
The UploadPanel class inherits from Panel and implements ICallbackEventHandler, for client callbacks. If you are curious, the __CALLBACKID, __CALLBACKPARAM, __EVENTTARGET and __EVENTARGUMENT are required for ASP.NET to detect a request as a callback, but only __CALLBACKID needs to be set with the unique id of the UploadPanel control.
HTML5 offers a lot of exciting new features. Stay tuned for some more examples of its integration with ASP.NET!
A nice post: IE8 and HTML 5