Conversing with your computer: challenges, solutions and the road ahead
Pranav Lal
Introduction • When we think of conversing with our computers, images of HAL from 2001: A Space Odyssey come to mind. However, the technology for doing this already exists and is heavily used in specialist applications. Specifically, speech-recognition engines have been connected to screen readers so that you can converse with your computer and control it at the same time. This presentation highlights the unique challenges of connecting screen readers to speech-recognition applications and explains how they have been overcome.
Screen Readers • Convert the text on a computer screen to speech. • Must monitor the computer for significant events such as error messages and other informational pop-ups. • Also read content on the World Wide Web.
Using a screen reader • The user issues screen reader commands through keystrokes. • E.g. in Jaws, the ‘Insert’ key + up arrow reads the current line of text, while the ‘Insert’ key + down arrow reads continuously from the cursor. • Special keystrokes exist for other tasks such as reading tables. • The user can control what information is spoken via these keystrokes and other configuration options.
Speech-recognition program • Converts speech to text. • Offers speech commands that the user can issue to carry out actions in the operating system and in various applications. • Uses statistical models to compute word probabilities in context, then outputs the recognized text to the screen.
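The idea of computing word probabilities in context can be illustrated with a toy bigram model. This is only a minimal sketch under stated assumptions: the corpus, the add-one smoothing, and the names `bigram_probability` and `pick_word` are illustrative choices, not taken from any particular recognition engine. It shows how context can decide between words that sound alike.

```python
from collections import Counter

# Tiny illustrative corpus (an assumption for this sketch); a real engine
# is trained on vastly more text and also uses acoustic evidence.
CORPUS = (
    "please read the mail . please write the letter . "
    "read the next line . i read the mail"
).split()

unigrams = Counter(CORPUS)                  # how often each word occurs
bigrams = Counter(zip(CORPUS, CORPUS[1:]))  # how often each adjacent word pair occurs
VOCAB_SIZE = len(unigrams)


def bigram_probability(previous: str, word: str) -> float:
    """Estimate P(word | previous) with add-one smoothing."""
    return (bigrams[(previous, word)] + 1) / (unigrams[previous] + VOCAB_SIZE)


def pick_word(previous: str, candidates: list[str]) -> str:
    """Choose the candidate that is most probable after the previous word."""
    return max(candidates, key=lambda w: bigram_probability(previous, w))
```

For example, `pick_word("the", ["male", "mail"])` selects "mail", because "the mail" occurs in the corpus and "the male" does not.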
Integrating screen readers with speech-recognition programs • Seems simple, since a speech-recognition program is, after all, just another program running on the computer.
Integration challenges • Technical - Controlling the screen reader from the speech-recognition program, monitoring the speech-recognition program in real time, etc. • Usability - Integration complexities need to be masked to facilitate smooth usage. • Environmental - The computer on which such technology runs must have sufficient system resources to support it.
Integration challenges continued • Development - Developers must keep up with updates to both screen readers and speech-recognition technology, and cater to multiple customer needs and applications. • Market related - A strong business model is needed because this is a niche market. A lot of research is required to develop this technology. Staff need to be trained to support it.
A solution • The discussion of solutions is necessarily product-based. There is no single framework or strategy that can be used to marry speech-recognition applications and screen readers. The current solution on the market is the J-ware line of products from T&T Consultancy Ltd. Their flagship product is called J-Say.
J-Say • J-Say combines Dragon NaturallySpeaking and Jaws for Windows into a seamless and consistent solution. Jaws for Windows is one of the foremost screen readers on the market. It lends itself well to integration with speech recognition because of its extremely powerful scripting language.
J-Say and integration challenges • Technical - Uses Jaws scripts, Microsoft Active Accessibility and other program APIs to bridge Jaws and Dragon. • Usability - Follows a minimal-speech approach for input and output and uses natural-language commands. • Environmental - A computer with a sufficiently robust configuration must be used to run J-Say. • Development - Jaws and Dragon are very extensible. Jaws scripts can be called from Dragon using a single routine. • Market - T&T Consultancy has a portfolio of products, so it is not tied to a single set of customers. Developers are encouraged and users are free to ask questions. The chairperson himself monitors the user discussion forum and participates in beta testing.
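The "single routine" idea from the Development point can be sketched as a small dispatcher. This is a hypothetical illustration and not J-Say's actual implementation: the phrase table, the script names, and the `run_script` callable are all assumptions. In a real bridge, `run_script` would be whatever mechanism the screen reader exposes for running a script by name.

```python
from typing import Callable, Dict

# Hypothetical table mapping recognized phrases to script names.
# Both the phrases and the script names are illustrative assumptions.
COMMAND_TO_SCRIPT: Dict[str, str] = {
    "read line": "SayLine",
    "read window title": "SayWindowTitle",
}


def make_bridge(run_script: Callable[[str], bool]) -> Callable[[str], bool]:
    """Build a handler that routes every voice command through one routine.

    run_script is injected by the caller, so the speech side never needs
    to know how the screen reader is actually driven.
    """

    def handle(phrase: str) -> bool:
        # Normalize the recognized phrase before looking it up.
        script = COMMAND_TO_SCRIPT.get(phrase.strip().lower())
        if script is None:
            return False  # not a known command; the caller may treat it as dictation
        return run_script(script)

    return handle
```

Because every command passes through the one injected routine, supporting a new voice command in this sketch only means adding a row to the table.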
Conclusion • Speech recognition is gradually being applied in various devices. We see it most often when we dial a contact by speaking the name assigned to it on our mobile phones. Speech synthesis is catching up as well: a number of mobile phones announce the name of the caller in a synthetic voice. In the long run, speech as a mode of input will probably replace the keyboard, since it is far more natural for a person. For this to happen, though, a significant number of technical challenges need to be met. In terms of economics, small companies with innovative products will succeed in the market. Some of these products may be niche products initially, so the scale of operations could be small, but over time, as the word spreads and the technology improves, they will become commonplace.
References • http://www.tandt-consultancy.com - developers of J-Say • http://www.ngtvoice.com - J-Say distributor for the USA and Canada • http://www.freedomscientific.com - makers of Jaws for Windows • http://www.nuance.com - makers of Dragon NaturallySpeaking