Speech Recognition, Dyslexia and Disabilities
"Dictating to your computer is so easy. No typing, no more spelling mistakes, it's the dyslexic person's dream."
A few years ago computer dictation was widely advertised on TV. The message was that it is dead easy. Many people bought it and many were disappointed. Most speech recognition software packages ended up as “shelfware”: after the initial enthusiasm they stayed in the box on the shelf.
What is the reality today? Is dictation software the panacea which cures all ills? Or is it useless? “I tried that and it doesn’t work”.
The truth of course is in between. Since this article first appeared on the web in 1997 we have consistently been saying the same thing, although most of the rest has changed.
In those days we called it Voice Recognition Software (VR or VRS), and you may also see VAS for Voice Activated Software and ASR for Automatic Speech Recognition or Automated Speech Recognition. Confusingly, all these terms may also be used for other processes such as commanding machinery as well as dictating to a computer. But it’s purely computer dictation that we are talking about here.
Q What’s the most important factor for success? The hardware? The software?
A No; it’s the person
The latest software (Dragon NS Preferred 10 Software) has really solved the technical problems. A normally clear speaker, using a recent computer with a decent microphone and with a little experience should get very good recognition results and gain real productivity benefits. We outline later the technical problems which can still arise and that still need to be avoided.
However, Speech Recognition can still lead to frustration and a lack of success. Today, the main reasons for this will be human, not technical.
Speech recognition software is more likely to be successful if you are motivated because, for instance:
- You have a disability: RSI makes typing difficult, dyslexia makes spelling difficult, dyspraxia makes handwriting and using the keyboard difficult. This is why speech recognition is widely used as an Assistive Technology.
- You need to write whilst using your hands for something else (e.g. radiologists or pathologists)
- You have patience to put up with some inevitable initial frustration
- You have support (see below).
- You do a lot of writing. As with any software, you need to spend some time learning to use it effectively. If you do not use it often you will need to relearn it each time, and you will probably stop bothering. So it suits lawyers, academics, authors, journalists and students.
- You can speak clearly (although people with severe speech impediments, such as dysarthria, have also persevered to get results which are satisfactory to them in overcoming severe physical problems using the keyboard).
In addition, you are more likely to be a “great dictator” if:
- You can speak fluently;
- You use a wide vocabulary;
- You can find the words you need easily (you have good word retrieval).
- You already understand word processing & punctuation.
- You can multi-task, that is, you can use the software whilst composing text.
- You have as much privacy as you feel you need to dictate confidently
None of those bullet points are essential. Individual ones can be overcome and some contradict each other; dyslexic people often have problems with word retrieval, for example. But the more you can tick, the more likely it is that speech recognition will work for you.
It helps a lot to have somebody who knows speech recognition to guide you through the early stages. You can save a lot of wasted and frustrating time with a quiet word in your ear (“Slow down a little.” “Don’t shout.” “Move the microphone a little.” “Speak like a newsreader.” “This is the best way to make that correction.”)
Ideally, if you can afford it, a professional one-to-one trainer will save you time and give you the best start.
If you are looking for productivity gains from dictation software (and you should be), accuracy is hugely important. Each mistake that you make takes many times longer to correct compared with dictating a word correctly. So it is worth going to a lot of trouble to improve accuracy by one or two percentage points. This is particularly important for dyslexic people who are liable to have more difficulty finding and correcting an error than somebody who reads and spells well.
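To see why a point or two of accuracy matters so much, here is a rough back-of-envelope sketch. The figures in it are our own assumptions for illustration, not measurements: a dictation speed of 100 words per minute, and a correction that costs ten times as long as dictating one word correctly. The function name `effective_wpm` is likewise just a label we have made up.

```python
def effective_wpm(dictation_wpm: float, accuracy: float, correction_cost: float) -> float:
    """Estimate net words per minute once correction time is included.

    dictation_wpm   -- raw speaking speed in words per minute (assumed figure)
    accuracy        -- fraction of words recognised correctly, between 0 and 1
    correction_cost -- time to fix one wrong word, measured in "word-times",
                       i.e. multiples of the time taken to dictate one word
                       (an assumed figure, not a measurement)
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    error_rate = 1.0 - accuracy
    # Each word costs one word-time to say, plus its share of correction time.
    time_per_word = 1.0 + error_rate * correction_cost
    return dictation_wpm / time_per_word


if __name__ == "__main__":
    # Illustrative values only: 100 wpm speaking speed, corrections cost 10 word-times.
    for accuracy in (0.95, 0.97, 0.99):
        net = effective_wpm(100, accuracy, 10)
        print(f"{accuracy:.0%} accurate: about {net:.0f} words per minute")
```

With these assumed figures, going from 95% to 97% accuracy lifts the net rate from roughly 67 to roughly 77 words per minute, and 99% gives roughly 91. If corrections take you longer than that, as they may well do for a dyslexic user, the gain from each extra point of accuracy is larger still.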
It is absolutely critical to have the microphone (“mic” or “mike”) properly adjusted, and we suspect that poor adjustment is the single most likely cause of frustration and failure at dictation. Unfortunately the setup programs which attempt to check the adjustment may still tell you that your microphone is adjusted properly when it is far from optimal. For the new user this is a Catch 22: until you have had the system running with a properly adjusted microphone you do not know how well it should work, so you do not know that it is not properly adjusted. If it is not properly adjusted the training process can be very frustrating and ultimately pointless.
Discrete versus continuous speech:
In the early days of speech recognition, programs like Dragon Dictate required discrete speech, where you separate each word with a short pause. Until quite recently people occasionally still recommended obsolete discrete speech software for particular circumstances, e.g. for somebody with severe speech difficulties. But the normal continuous speech technology is now so much better in every way that we doubt there are any circumstances where we would still recommend discrete speech.
Performance and system requirements:
Any computer that you buy new today is likely to be powerful enough for speech recognition. But what about an older one? The official “minimum system requirements” for processor speed and RAM are, in our opinion, the absolute minimum, although some people may find performance on even lower spec machines acceptable. However, we find that on the minimum spec performance can be unusably slow, particularly if you are trying to dictate into a large application like Microsoft Word. Recognition accuracy also suffers.
On a less powerful machine performance is more likely to be OK if you satisfy yourself with just using the simple word processor that comes with most dictation programs (e.g. DragonPad), and do not try to use the direct dictation facilities. Note that later versions of the same package usually have significantly better recognition, although older versions may still be available, so keep an eye on version numbers. Dragon NaturallySpeaking Version 9 is our current recommendation. So we suggest at least a 1.8 GHz Pentium with 1 GB RAM.
Notebooks versus desktops:
It is generally the case that a notebook or laptop computer will be slower than a desktop machine of the same specification. It follows that it is all the more important (and, alas, all the more expensive) to have more than the minimum spec if you want a notebook to perform well. The worse performance is partly because of the different chips, the power saving capabilities and the smaller components on the notebook. With voice recognition it is also often caused by the fact that the sound input circuitry is electronically “noisier” than on a desktop, so that the recognition engine has trouble getting a clear signal from your voice. The noise comes from the fact that the sound circuitry is very close to electronically noisy components like disk drive motors, the power supply etc. True fan noise (i.e. audible rather than electronic) can itself be an issue if the fan cuts in and out unpredictably.
Up until now computer and sound card manufacturers have been much more concerned about the quality of the sound coming out of their computers for games and music than the quality of the sound going in. Most new computers now give adequate sound quality, but there will be some “rogues” and the sound input quality may not have been tested. We have never seen any comparative tests of computer sound input quality, and do not have the resources to do them ourselves. So we cannot say that any particular system is “best”. (Such research would be out of date within days, anyway.) But we aim to make clear which computers we have tested to work well with speech recognition. Your best policy is to buy a certified speech recognition ready machine.
Most desktop computers now have sound built in to the motherboard, like notebooks, rather than on a separate card. This can still be less good for voice recognition, for the same reason: electronic noise. So again it is safer to choose a machine that has been certified for use with speech recognition. You should make it clear to the person selling you a computer that you want it for dictation, and that you will take it back if the sound input quality is not good enough.
If all else fails, whether with a notebook or desktop, if the sound quality is not good enough you can add a USB microphone. Once again, however, it is helpful to have an expert on hand to help you work out that the computer’s sound processing isn’t good enough.
For obvious reasons the microphones that are usually provided with the software have to be cheap, and although they may be adequate for sound input quality under favourable conditions, they are often lousy for other reasons. They may be impossible to adjust adequately for many head shapes (“Change your head!” we hear them cry). People will often do better with the ear piece behind their ear rather than over it. They may be uncomfortable. And the mics often, in consequence, refuse to keep in position. This is important. If the microphone moves too far away, so that the signal is weaker than it was adjusted for, or if it moves too close so that you have to hold it (sending noises up it) or so that it brushes your face, or so that you breathe and spit your plosives at it, all will significantly spoil the sound quality and cause rotten recognition.
So you will usually get better results with a better microphone than that supplied in the box. Sometimes it may make the difference between success and failure. We recommend the Andrea-NC-181VM, particularly for classroom use. It also has a reputation for:
- being as accurate as any;
- holding its position;
- minimizing interference from external noises.
Proof reading, especially from a computer screen, is a difficult skill. We all tend to read what we want to read, rather than what is actually written there. Proof reading is particularly difficult for dyslexic people. Even the best dictation system, after you have spent a long time training it and working with it, will make recognition mistakes. Some people will find some of the mistakes easy to spot, as the wrong word will be quite different. But mistakes will often be in quite small items, like using the wrong one of two common short words (longer words are much easier for a dictation system to get right) or putting the wrong ending on a word.
To spot these errors, speech feedback is useful, where the computer reads back to you what it has written. Dyslexic people often find that speech feedback helps with grammar as well, helping you to realize, for example, that a sentence has no verb. The classic program to do this with is Texthelp Read & Write. Dragon NaturallySpeaking includes its own text to speech synthesis program, which can be useful. However, beware of thinking that it is a full and adequate substitute for Texthelp Read & Write. Texthelp Read & Write’s other features, like its spell checker and word prediction, can also be useful even in dictation: they can make up for difficulties in entering a correction when the program hasn’t supplied the correct alternative. But most of all the text reader window, which highlights each word as it reads it, really helps a dyslexic person by making it much easier to follow the reading and focus on the mistakes. ClaroRead is a similar but simpler product which has given careful thought to integration with Dragon NaturallySpeaking.
Training and support can be very valuable, both to know how to use the system and, more fundamentally, to be able to recognise when it is working properly and when it is not.
For somebody new to dictation there are a lot of things to get right: diction style, microphone adjustment and positioning, making corrections, punctuation and the voice commands. Dictation is a bit like riding a bike, and it can be very useful indeed to have the help of someone who knows the ropes and can guide you through getting everything right at once. Training from somebody who knows their stuff will help you make the small modifications to speech style (pace, clarity, particularly of unstressed words, evenness of volume) which make a big difference. If they know how well a system should work, it also overcomes the Catch 22 of not being able to tell whether the microphone is properly adjusted. Time spent training with a non-performing system is wasted. Having somebody who knows their stuff sit with you when you are starting off can save a huge amount of time and frustration. So please avoid the blind leading the blind.
People who are not familiar with computers are often learning dictation at the same time as they are learning the basics of computing, of Windows and of word processing, all of which add up to considerable information overload. Not to mention students who are trying to follow a course as well! So some one-to-one training, support, and help with the training reading for poor readers will make a lot of difference to how easily you get on. On the other hand, somebody who is already clued up on computers will probably find running a dictation system very straightforward (or as straightforward as anything ever is with computers!).
Training the software
Traditionally you have had to spend some time training the software to recognise your voice. You read a prepared script from the screen and the software then adjusts itself to better recognise your voice. In addition the program learns as it goes along from what it gets right and from your corrections, so that its accuracy should improve as time goes on. You can skip the training (which normally takes about 15 minutes) with the latest NaturallySpeaking, and just pitch straight in to dictating. This can be useful with people who have reading problems, for whom the training can be difficult, although for most people we would still recommend doing the training. Another strategy is for a support person to whisper phrase by phrase into the user’s ear.
Keep it simple:
Even the most sophisticated and expensive dictation system has its own simple version of the WordPad word processor to dictate into. We strongly suggest that you start just by using this, on its own. Get that working well, fluently and with confidence, before you go on to use the other features of your dictation package, if you wish.
It is fun to control your editing by voice and to navigate by voice around menus and windows. It may even be quicker than using mouse and keyboard (but usually isn’t!). But it introduces more complication and confusion, and there is more to go wrong and to frustrate you.
It is dictation of passages of text, working well, which will give you a major increase in productivity, with the least to learn. Don’t forget that dictating directly into, say, MS Word will make recognition go slower, unless you have loads of spare power and RAM. Of course if you have a mobility problem, such as RSI, then not using mouse and keyboard will also be important to you.
Working with children:
There is still a lot to discover about using dictation systems with children. The points made above about motivation apply in buckets with children. On the whole children are not producing masses of written work, so are less likely to have the motivation to persevere with speech recognition. But where spelling, handwriting and composing are major problems, then SR can be hugely liberating and allow children to express their ideas on paper fluently for the first time in their lives.
Dictation systems can encourage children to speak clearly. It is important to make sure that you are familiar with the program and that it is recognising you well before you try it with a child, particularly if that child already has too much experience of failure. It is often a good idea to make corrections for the child to start off with. This lessens the cognitive load of dealing with the new system, and allows them to see what they have achieved without the extra learning and possible frustrations of correcting errors.
Studies have shown that students with learning difficulties who use speech recognition:
- Use longer and richer words
- Write more creatively
- Organise work better
- Complete more work
Counterintuitively (but it makes sense if you think about it), they also:
- Improve reading
- Improve spelling
- Produce better hand-written work
Which is the best speech recognition system?
Five years ago there were 13 different speech recognition programs to choose from, from four different families. Today the choice is very much easier as there is only one family that we would recommend on PCs, and that is Dragon NaturallySpeaking Version 9. IBM ViaVoice is still available but has not been developed recently and has fallen behind. Its remaining distribution is being handled by Nuance, the publishers of Dragon NaturallySpeaking. We have a matrix showing the different versions of Dragon available. If you have to use an older computer then an earlier version may be justified, but if you put any value on your time and productivity we would strongly recommend the latest Dragon and up to date hardware. You may see older versions going very cheap as promotions or on eBay. They are cheap for a reason, and probably not worth the hassle!
Windows Vista has speech recognition built in. We haven’t used it enough to form a judgement, but early indications are that the accuracy of Microsoft’s recognition has improved since Windows XP, while the convenience and correction mechanism are still some way behind Dragon. It is also possible that the UK English version lags behind the US version of Vista.
For current Macs the only choice that works is MacSpeech iListen. IBM ViaVoice is still available for older Macs but hasn’t been upgraded to work on the latest versions. iListen on the Mac is not as good as Dragon on the PC. Although Macs are now often a good choice outside their traditional architecture/graphics/design specialism, if you are producing a lot of text and need speech recognition, then we would still recommend a PC and Dragon.
Article last updated: 7 July 2010