Dragon NaturallySpeaking vs Microsoft Speech Recognition

I recently heard from a large reseller of Dragon NaturallySpeaking who thinks ScanSoft Nuance should start giving away Dragon NaturallySpeaking Preferred as a response to the competitive threat from the next version of Microsoft speech recognition, to be released with Windows Vista. The theory is that now is the time to get the general market "hooked" on DNS. While it's certainly an easy approach to take, it suffers from a few problems.

 

The first problem is that switching costs for non-macro versions of speech recognition systems are not very high: just a few minutes of training and an import of custom vocabularies. The primary speech user interface remains essentially the same – more on this later. In other markets, competitors against Microsoft might be able to rely on users' knee-jerk dislike of “M$”, but it doesn't appear that ScanSoft has earned any customer loyalty. The second problem is that speech recognition is one of the remaining product domains where quality really matters. 98% recognition accuracy might be great from a speech recognition research perspective, but it's really just not very good from a user's perspective: at 98%, a 500-word page still leaves about ten errors to find and correct. Lots of work still needs to be done, and I expect users will go towards the product with greater accuracy and more responsiveness.

 

Let me instead offer my own suggestions for how Nuance can compete.

 

Nuance needs to open up. While I understand that they have significant intellectual property protection issues, listening and talking to customers can only help. Allowing employees to have their own weblogs and to participate in online discussion forums would go a long way towards reducing the enmity many of its most active users feel towards the company. Microsoft has at least 5 speech-based weblogs (including Sprague WebLog, Robert Brown, SpeechLeadJen's Weblog, Rob's Rhapsody). In contrast, the silence from Nuance is deafening.

 

And of course the product needs to be improved. Some of the needed changes are architectural; dictation support for Word and Rich Text edit controls is no longer sufficient. Certainly RTF had its use back in the day, but the world has moved on. Adding support for the Windows Text Services Framework would make it easier for application vendors to have free, built-in support for speech recognition. Without this, application vendors have a choice to make when developing their applications. If they support the TSF, they automatically support Windows speech recognition, tablet input, virtual keyboard input and whatever other devices Windows ends up supporting. Or they can build specifically for DNS - an easy choice, I would think, for any application vendor.

 

The GUI needs improvement. This is well-covered territory, but specifically the command browser is in desperate need of a mercy killing. I would follow up with a redesign of the DragonBar, throw in some new icons, and allow users to change the font size of the correction menu.

 

API - $2000 for the documentation of an ActiveX-based API is just silly. This is an area where Nuance needs to meet Microsoft's pricing. With Windows Vista, not only will users get a free speech recognition product, but developers will get a free speech API. A native .NET API and more exposed functionality would help.

 

Innovation - this is needed on two different levels. Real improvements are needed in accuracy and responsiveness, but more than that, Nuance needs to take a lead in speech user interfaces. As I mentioned before, the basic command structures for Dragon NaturallySpeaking and Microsoft Speech Recognition are quite similar. Users trained in one will really have no difficulty migrating to the other. But there's plenty of room for improvement in speech user interfaces.

 

Speech user interfaces are, in essence, the command line all over again - the user is expected to know in advance how to use the system.

One of the design philosophies at Applied Recognition is that the speech user interface is backed up with a graphical user interface - there's no need to guess what the command might be. In a sense this is a start towards an inductive user interface for speech. Innovation in this area from Nuance is needed to improve the user experience for its customers, and make it more difficult for its customers to switch to competing products.
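To make the idea concrete, here is a minimal sketch in Python of a speech command set that doubles as a visible list, so the user can read what is available instead of guessing. Everything in it is hypothetical and for illustration only: the `Command` and `CommandRegistry` names, the example phrases, and the assumption that the recognizer hands back plain text. It is not how Applied Recognition, Dragon NaturallySpeaking or Microsoft Speech Recognition actually implement this.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Command:
    phrase: str                  # what the user says
    description: str             # shown next to the phrase in the visible list
    action: Callable[[], str]    # what runs when the phrase is heard


class CommandRegistry:
    """Hypothetical registry: one source of truth for both voice and GUI."""

    def __init__(self) -> None:
        self._commands: Dict[str, Command] = {}

    def register(self, phrase: str, description: str,
                 action: Callable[[], str]) -> None:
        # Keys are lower-cased so matching ignores capitalisation.
        self._commands[phrase.lower()] = Command(phrase, description, action)

    def visible_list(self) -> List[str]:
        # The GUI would render these lines; the user never has to guess.
        return [f"{c.phrase}: {c.description}" for c in self._commands.values()]

    def dispatch(self, heard: str) -> str:
        # 'heard' is assumed to be the text returned by a recognizer.
        command = self._commands.get(heard.strip().lower())
        if command is None:
            phrases = ", ".join(c.phrase for c in self._commands.values())
            return "Not recognized. Try one of: " + phrases
        return command.action()


registry = CommandRegistry()
registry.register("Open Mail", "switch to the mail window", lambda: "mail opened")
registry.register("New Message", "start a blank message", lambda: "message started")
```

Because the spoken phrases and the on-screen list come from the same registry, an unrecognized utterance can point the user back to commands that do exist, which is the opposite of the command-line experience described above.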

7 Comments

  • Couldn't agree more, especially with the "opening up" part. While working on an interface between NaturallySpeaking and XEmacs I was rather surprised to find so little information available and no way whatsoever to talk to the people behind the product.

    Nuance should realize that these kinds of addons and interfaces will actually drive the sales of their products. I'm not there to compete with them, I just want to give people more ways of using NaturallySpeaking.

    I really, really hope this will change.

  • I have used Dragon Dictate since at least version 6. [I am a very poor typist and need all the help I can get.]

    I just purchased version 10 and found it to be less accurate with more glitches than version 9, which in turn was less accurate and slower than version 8.

    Over the last several days I have reinstalled version 8 on my PCs and it is relatively quick and accurate. I am much happier.

    I have not tried the Vista voice recognition. Is it better than Dragon version 8?

    I'm also troubled at how Nuance purchased and then suppressed the IBM voice-recognition product.

    Does anyone know a current user blog that discusses these voice-recognition issues?

  • I'm currently using Dragon version 10 to write this. I found it has become much more accurate after I've used it for a while. It seems that, as Nuance has stated, the program becomes much better adapted to the user after being used for a while. Sure, it makes a few mistakes here and there, but the more you use it the better it becomes and the better you become at using it. At least, that was the case with me.

  • SDK API - $2000? I'd like to know more things like:
    Are there videos included in this SDK?
    What apps out there have been made using this DNS SDK?
    The Vista speech engine is embedded? Why is that?
    I see man-machine communication through voice as a bit far into the future?
    Speech recognition technology monopoly? Why?
    We're still tied to mouse and keyboard?
    The $1 million question: why use a speech recognition system that is less accurate than a human? This tool should naturally surpass human skill to justify its mass use. Fast and accurate is the human formula...


  • For some people with repetitive strain injury, speech recognition can be the difference between continuing to work and having to change jobs.

  • So what you guys are saying is Dragon is good at accuracy. But I tried Microsoft SDK 5.1 and that's too weak in accuracy. Do we have any open source option for voice recognition?
