Dragon NaturallySpeaking vs Microsoft Speech Recognition
I recently heard from a large reseller of Dragon NaturallySpeaking who thinks ScanSoft Nuance should start giving away Dragon NaturallySpeaking Preferred - as a response to the competitive threat from the next version of Microsoft speech recognition, to be released with Windows Vista. The theory is that now is the time to get the general market "hooked" on DNS. While it's certainly an easy approach to take, it suffers from a few problems.
The first problem is that switching costs for non-macro versions of speech recognition systems are not very high: just a few minutes of training and an import of custom vocabularies. The primary speech user interface remains essentially the same (more on this later). In other markets, competitors against Microsoft might be able to rely on users' knee-jerk dislike of “M$”, but it doesn't appear that ScanSoft has earned any customer loyalty.
The second problem is that speech recognition is one of the remaining product domains where quality really matters. 98% recognition accuracy might be great from a speech recognition research perspective, but it's really just not very good from a user's perspective. Lots of work still needs to be done, and I expect users will go towards the product with greater accuracy and more responsiveness.
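To put the 98% figure in perspective, here is a rough back-of-the-envelope sketch in Python. The document lengths are illustrative assumptions, not measurements, and the calculation treats every word as equally likely to be misrecognized, which real dictation does not guarantee.

```python
def expected_errors(word_count: int, accuracy: float) -> int:
    """Expected number of misrecognized words, rounded to the nearest word.

    Assumes a flat per-word error rate (a simplification).
    """
    return round(word_count * (1.0 - accuracy))


# Hypothetical document sizes, chosen only for illustration.
documents = {
    "short email": 100,
    "one-page letter": 500,
    "ten-page report": 5000,
}

for name, words in documents.items():
    errors = expected_errors(words, 0.98)
    print(f"{name}: about {errors} corrections needed")
```

Under these assumptions, a single dictated page already needs around ten corrections, which is why a number that looks excellent in a research paper can still feel poor to the person doing the correcting.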
Let me instead offer my own suggestions for how Nuance can compete.
Nuance needs to open up. While I understand that they have significant intellectual property protection issues, listening and talking to customers can only help. Allowing employees to have their own weblogs and to participate in online discussion forums would go a long way towards reducing the enmity many of its most active users feel towards the company. Microsoft has at least 5 speech-based weblogs (Sprague WebLog, Robert Brown, SpeechLead, Jen's Weblog, Rob's Rhapsody). In contrast, the silence from Nuance is deafening.
And of course the product needs to be improved. Some of the needed changes are architectural; dictation support for Word and Rich Text edit controls is no longer sufficient. Certainly RTF had its use back in the day, but the world has moved on. Adding support for the Windows Text Services Framework (TSF) would make it easier for application vendors to get built-in, free support for speech recognition. Without this, application vendors have a choice to make when developing their applications. If they support the TSF, they automatically support Windows speech recognition, tablet input, virtual keyboard input and whatever other devices Windows ends up supporting. Or they can build specifically for DNS - an easy choice, I would think, for any application vendor.
The GUI needs improvement. This is well-covered territory, but specifically the command browser is in desperate need of a mercy killing. I would follow up with a redesign of the DragonBar, throw in some new icons, and allow users to change the font size of the correction menu.
API - $2000 for the documentation of an ActiveX-based API is just silly. This is an area where Nuance needs to meet Microsoft's pricing. With Windows Vista, not only will users get a free speech recognition product, but developers will get a free speech API. A native .NET API and more exposed functionality would help.
Innovation - this is needed on two different levels. Real improvements are needed in accuracy and responsiveness, but more than that, Nuance needs to take the lead in speech user interfaces. As I mentioned before, the basic command structures of Dragon NaturallySpeaking and Microsoft Speech Recognition are quite similar. Users trained in one will really have no difficulty migrating to the other. But there's plenty of room for improvement in speech user interfaces.
Speech user interfaces are, in essence, the command line all over again - the user is expected to know in advance how to use the system.
One of the design philosophies at Applied Recognition is that the speech user interface is backed up with a graphical user interface - there's no need to guess what the command might be. In a sense this is a start towards an inductive user interface for speech. Innovation in this area from Nuance is needed to improve the user experience for its customers, and make it more difficult for its customers to switch to competing products.
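As a rough illustration of that idea, here is a minimal Python sketch in which one command table drives both what the user can say and what the screen can show. Every name in it (`Command`, `CommandRegistry`, the sample phrases) is hypothetical and invented for this example; it does not reflect the API of Dragon NaturallySpeaking, Microsoft's speech platform, or any Applied Recognition product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Command:
    phrase: str                  # what the user says
    description: str             # what the on-screen list shows
    action: Callable[[], str]    # what happens when it is recognized


class CommandRegistry:
    """Single source of truth for both the spoken and the visible interface."""

    def __init__(self) -> None:
        self._commands: Dict[str, Command] = {}

    def register(self, phrase: str, description: str,
                 action: Callable[[], str]) -> None:
        # Store under a lowercased key so matching ignores case.
        self._commands[phrase.lower()] = Command(phrase, description, action)

    def menu(self) -> List[str]:
        # The visible list is generated from the same table the recognizer
        # uses, so the screen never advertises a command that cannot be spoken.
        return [f"{c.phrase}: {c.description}" for c in self._commands.values()]

    def handle(self, recognized_text: str) -> str:
        command = self._commands.get(recognized_text.strip().lower())
        if command is None:
            # Instead of failing silently, show what can be said.
            phrases = ", ".join(c.phrase for c in self._commands.values())
            return "Unrecognized. Try one of: " + phrases
        return command.action()


registry = CommandRegistry()
registry.register("Open Mail", "Open the mail client", lambda: "mail opened")
registry.register("New Note", "Start a new note", lambda: "note created")

print(registry.menu())
print(registry.handle("open mail"))
print(registry.handle("send fax"))
```

The point of the design is that the user never has to guess: anything that can be spoken is also listed, and a missed command leads back to the list rather than to a dead end.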