Voice Command Technology for Vehicle Control: Is it Feasible?

  • Thread starter Phrak
  • #1
Phrak
Does anyone know the current state of technology?

I'd like to be able to build some hardware capable of identifying a small number of voice commands--only eight to twelve, more or less. But speed is an issue. The faster the better, but anything over a ~0.3 second delay would probably be too much. It could use hardwired commands, or be trainable to a single operator's voice, if that makes it easier or better. Commands can be single syllables.

The incoming audio arrives over Bluetooth (bandwidth of 30 Hz to 3 kHz?)

I can devote as much as 10 square inches of real estate to the hardware. With my small understanding of this topic, I'm not getting a lot of info from the internet. Anything could help...
 
  • #2
I was thinking of doing something similar but never got to it. You probably need a fast DSP processor.

What are you trying to do?
 
  • #3
david90 said:
I was thinking of doing something similar but never got to it. You probably need a fast DSP processor.

What are you trying to do?

Hi david, thanks for answering! I'm trying to control an occupant-ridden motorized vehicle. I think you're right about a DSP. I've been getting a few more Google hits when including 'DSP'.

I've been thinking. 10 square inches is a lot of real estate. It's not the area that concerns me, but that would be way too much hardware.
 
  • #4
I remember doing research on this topic and found out that there is a cheap voice recognition module. You could train it. I can't remember what it's called.

I didn't go the pre-made module route because I wanted to learn. It turned out that it was too complex for me to handle.
 
  • #5
I had some students that worked with a module that could do voice-dependent command recognition, and I seem to recall that it was the HM2007:
http://www.imagesco.com/speech/speech-recognition-components.html

(The VD-01 they're hawking on the same page looks to be a voice-independent speech-to-text)

These guys were the first hit when I Google'd speech recognition IC, and looks like they've got a module that can recognize command words (either dependent or independent):
http://www.sensoryinc.com/products/modules_and_boards.html

Not sure if you'd want to digitize, transmit, and then recreate the voice signal for whichever module you end up using, or if it'd be easier to just put the module in the headset or whatever it is that you're using and then transmit the output of these ICs (Sensory claims 'low' power operation at 12 mA or so)
 
  • #6
MATLABdude said:
I had some students that worked with a module that could do voice-dependent command recognition, and I seem to recall that it was the HM2007:
http://www.imagesco.com/speech/speech-recognition-components.html

(The VD-01 they're hawking on the same page looks to be a voice-independent speech-to-text)

These guys were the first hit when I Google'd speech recognition IC, and looks like they've got a module that can recognize command words (either dependent or independent):
http://www.sensoryinc.com/products/modules_and_boards.html

Not sure if you'd want to digitize, transmit, and then recreate the voice signal for whichever module you end up using, or if it'd be easier to just put the module in the headset or whatever it is that you're using and then transmit the output of these ICs (Sensory claims 'low' power operation at 12 mA or so)

I want to digitize and pass it to a ucontroller. It could be 4 bits plus controls, or UART--doesn't matter.

Thanks for the links. I'll look them over.
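
For illustration only, here is one way the "4 bits plus controls" idea could be framed as a single UART byte. The bit layout (a valid flag in bit 7, an even-parity bit in bit 4, the command index in bits 3..0) and the names `encode_command` and `decode_command` are assumptions made up for this sketch; they do not come from any of the modules linked above.

```python
VALID_BIT = 0x80   # hypothetical "command valid" flag (bit 7)
PARITY_BIT = 0x10  # hypothetical even-parity bit over the 4-bit index (bit 4)


def encode_command(index):
    """Pack a command index (0-15) into one byte for a UART or parallel bus."""
    if not 0 <= index <= 15:
        raise ValueError("command index must fit in 4 bits")
    parity = bin(index).count("1") % 2
    return VALID_BIT | (PARITY_BIT if parity else 0) | index


def decode_command(byte):
    """Return the command index, or None if the byte fails the checks."""
    if not byte & VALID_BIT:
        return None
    index = byte & 0x0F
    parity = (byte >> 4) & 1
    if parity != bin(index).count("1") % 2:
        return None  # a flipped bit was detected; ignore the command
    return index


if __name__ == "__main__":
    frame = encode_command(7)
    print(hex(frame), decode_command(frame))
```

Eight to twelve commands fit comfortably in four bits, and the parity check lets the microcontroller drop a frame corrupted by a single bit error rather than act on it.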
 
  • #7
It may be covered by MATLAB's post, but I think the current mainstream uses of speech recognition technology are over-the-telephone commands (untrained), as used on many customer service lines, and cell phone commands (trained). Not sure of the best way to get pointers to those technologies and chipsets, though.
 
  • #8
berkeman said:
It may be covered by MATLAB's post, but I think the current mainstream uses of speech recognition technology are over-the-telephone commands (untrained), as used on many customer service lines, and cell phone commands (trained). Not sure of the best way to get pointers to those technologies and chipsets, though.

My wife uses voice dialing on her cell phone. I timed about 1.2 seconds from the time she finishes speaking a command until it repeats it back. From what I can gather the voice recognition processor is in the cell phone itself rather than remotely connected.

I'm a bit surprised to find no neural networks (yet) but a few DSPs. I should expect the delay in a neural network would be on the order of a few microseconds.
 
  • #9
Phrak said:
My wife uses voice dialing on her cell phone. I timed about 1.2 seconds from the time she finishes speaking a command until it repeats it back. From what I can gather the voice recognition processor is in the cell phone itself rather than remotely connected.

I'm pretty sure it's in the phone. And it sounds like it's probably too slow for what you want. What resources/datasheets have you found so far? If you can permit learning and just a few directed phrases, you should be able to get it down to pretty quick, it would seem. Heck, just count syllables worst case...
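
A rough sketch of the "just count syllables" fallback, treating each burst of short-time energy as one syllable. The frame length, energy threshold, and gap length are placeholder values assumed for an 8 kHz sample rate, and the helper names are made up for this example; real speech would need tuning and probably some smoothing.

```python
import math


def count_syllables(samples, frame_len=80, threshold=0.01, min_gap_frames=3):
    """Count energy bursts separated by at least `min_gap_frames` quiet frames."""
    count = 0
    quiet = min_gap_frames  # start out in the "silence" state
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len  # mean power of the frame
        if energy >= threshold:
            if quiet >= min_gap_frames:
                count += 1  # loud frame after enough silence: a new burst
            quiet = 0
        else:
            quiet += 1
    return count


def tone(n, freq=200.0, rate=8000.0, amp=0.5):
    """Stand-in for a voiced syllable: a plain sine burst."""
    return [amp * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


def silence(n):
    return [0.0] * n


if __name__ == "__main__":
    two_bursts = silence(400) + tone(800) + silence(400) + tone(800) + silence(400)
    print(count_syllables(two_bursts))
```

With 80-sample frames at 8 kHz, each decision covers 10 ms of audio, so the counting itself adds little delay beyond the length of the spoken word.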
 
  • #10
berkeman said:
I'm pretty sure it's in the phone. And it sounds like it's probably too slow for what you want. What resources/datasheets have you found so far? If you can permit learning and just a few directed phrases, you should be able to get it down to pretty quick, it would seem. Heck, just count syllables worst case...

I've been looking over the hardware links provided by MATLABdude (thanks MATLABdude!). The first one quotes a 300 millisecond delay max. Marginal. The second is Sensory Inc, who seem to be selling a RISC processor with various burned-in voice firmware options. I've emailed them for recognition speeds.

I'm not sure that counting syllables would be good for my app. Instant command recognition almost begs for simple, short, barked monosyllables. Too short though, and the 'distance' between commands is small, and decoding error rates large, I'd imagine. I need a very, very small error rate. A 'go' command shouldn't be confused with 'stop'.
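
To make the 'distance' idea concrete, here is a minimal sketch of nearest-template matching with a rejection rule: a command is accepted only if it is close to one template and clearly farther from every other. The three-element feature vectors, the thresholds, and the names `classify` and `TEMPLATES` are invented for illustration; a trained module would use its own features.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(features, templates, max_dist, min_margin):
    """Return the best-matching command, or None if the match is weak or ambiguous."""
    ranked = sorted((distance(features, t), name) for name, t in templates.items())
    (best_d, best_name), (second_d, _) = ranked[0], ranked[1]
    if best_d > max_dist:
        return None  # nothing is close enough: probably not a command
    if second_d - best_d < min_margin:
        return None  # two commands are nearly tied: refuse to guess
    return best_name


# Hypothetical templates; the values are placeholders, not measured data.
TEMPLATES = {
    "go": [1.0, 0.0, 0.0],
    "stop": [0.0, 1.0, 0.0],
    "left": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    print(classify([0.9, 0.1, 0.0], TEMPLATES, max_dist=0.5, min_margin=0.3))
    print(classify([0.5, 0.5, 0.0], TEMPLATES, max_dist=0.5, min_margin=0.3))
```

Refusing to decide when 'go' and 'stop' are nearly tied trades some missed commands for fewer wrong ones, which seems the right direction for a vehicle.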
 
  • #11
berkeman said:
I'm pretty sure it's in the phone. And it sounds like it's probably too slow for what you want. What resources/datasheets have you found so far? If you can permit learning and just a few directed phrases, you should be able to get it down to pretty quick, it would seem. Heck, just count syllables worst case...

Berkeman, your ham radio byline and syllable-counting comment have shaken a few brain cells loose. The ear's front-end hardware preprocesses sound into the frequency domain (the cochlea thing). Vowels are interesting objects constructed of three frequencies. Your vocal cords provide a fundamental frequency, f0. The shape of the mouth generates a modulation envelope such that at least two frequency peaks are created: f1 and f2. The two ratios f1/f0 and f2/f0 decode to specific vowels.

Vowels or vowel pairs could be commands. Command recognition would consist of doing FFTs.
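
A minimal sketch of the FFT idea, in plain Python so it runs anywhere. It builds a synthetic "vowel" from three sine waves (the f0, f1, and f2 values are arbitrary test numbers, not measured speech), runs a radix-2 FFT, and reads back the three strongest spectral peaks and the two ratios. The 8 kHz sample rate and 512-sample window are assumptions. Real vowels are harmonics of f0 under a broad envelope, and speech texts usually classify vowels by the envelope peaks themselves, so treat this as a toy.

```python
import cmath
import math

FS = 8000  # assumed sample rate in Hz
N = 512    # window length: 512 / 8000 = 64 ms of audio


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def strongest_peaks(samples, sample_rate, count=3):
    """Return the `count` strongest local spectral maxima in Hz, ascending."""
    n = len(samples)
    mags = [abs(c) for c in fft(samples)[: n // 2]]
    peaks = [k for k in range(1, n // 2 - 1)
             if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    peaks.sort(key=lambda k: mags[k], reverse=True)
    return sorted(k * sample_rate / n for k in peaks[:count])


def synth_vowel(f0, f1, f2):
    """Synthetic test signal: three sines standing in for f0, f1, and f2."""
    return [math.sin(2 * math.pi * f0 * i / FS)
            + 0.8 * math.sin(2 * math.pi * f1 * i / FS)
            + 0.6 * math.sin(2 * math.pi * f2 * i / FS)
            for i in range(N)]


if __name__ == "__main__":
    f0, f1, f2 = strongest_peaks(synth_vowel(125, 750, 1250), FS)
    print(f0, f1, f2, f1 / f0, f2 / f0)
```

A 64 ms window leaves most of the 0.3 second budget for everything else, but the bin spacing at this size is 8000 / 512 = 15.625 Hz, so window length and frequency resolution trade off directly.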
 
  • #12
Good stuff, Phrak. BTW, I didn't get to give my safety-weenie speech yet in this thread...

Phrak said:
I'm trying to control an occupant-ridden motorized vehicle.

Now, there's a reason that Detroit hasn't yet embraced voice recognition to replace the steering wheel and foot controls. So, what back-ups do you plan for the controls, for when the voice commands aren't working or are slow to respond? Especially with the Bluetooth RF layer in the middle, it isn't going to be 100% accurate and timely all the time.

Now, if it were for remote-controlled cars, with no occupant involved, then voice recognition could be a cool thing. Kind of like what the Wii player did for video games, but in a different way.

BTW, I remember back in an old NOVA TV episode (I think it was a NOVA) about advances in fighter plane technology... They were discussing the use of voice recognition for pilot control of subsystems (not flying), and how the recognition had trouble in real-life situations because the stress of the situation changed the pilot's voice. And then trying to talk under high-g maneuvers didn't work so well...

Finally, remember that there will be noise in any vehicle environment, so a good noise cancelling microphone is a must. Good luck!
 
  • #13
berkeman said:
Good stuff, Phrak. BTW, I didn't get to give my safety-weenie speech yet in this thread...

That's understandable. What some experimenters want to do with 120VAC is shocking. (haha)

Now, there's a reason that Detroit hasn't yet embraced voice recognition to replace the steering wheel and foot controls. So, what back-ups do you plan for the controls, for when the voice commands aren't working or are slow to respond? Especially with the Bluetooth RF layer in the middle, it isn't going to be 100% accurate and timely all the time.

Now, if it were for remote-controlled cars, with no occupant involved, then voice recognition could be a cool thing. Kind of like what the Wii player did for video games, but in a different way.

BTW, I remember back in an old NOVA TV episode (I think it was a NOVA) about advances in fighter plane technology... They were discussing the use of voice recognition for pilot control of subsystems (not flying), and how the recognition had trouble in real-life situations because the stress of the situation changed the pilot's voice. And then trying to talk under high-g maneuvers didn't work so well...

Finally, remember that there will be noise in any vehicle environment, so a good noise cancelling microphone is a must. Good luck!

I've been thinking along similar lines. A one-ppm error rate could be too much. But a noise-cancelling microphone is something I hadn't thought of.

Even with perfect decoding, the operator could say something mistaken as a command. It all seems to be a design killer.
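
A back-of-envelope sketch of what a given error rate means in operating time. The 500 commands per hour figure is purely an assumed usage rate for illustration, and the function name is made up here.

```python
def hours_between_errors(error_rate, commands_per_hour):
    """Mean operating hours between misrecognized commands."""
    return 1.0 / (error_rate * commands_per_hour)


if __name__ == "__main__":
    # One part per million, at an assumed 500 spoken commands per hour.
    print(hours_between_errors(1e-6, 500))
```

Under that assumption, one ppm works out to roughly one wrong command per 2000 hours of use. This counts only decoding errors, not the operator saying something that gets mistaken for a command.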
 

1. What is voice command recognition?

Voice command recognition is a technology that allows computers, devices, and applications to interpret spoken commands and carry out corresponding actions. It uses natural language processing algorithms to understand spoken words and phrases and convert them into text.

2. How does voice command recognition work?

Voice command recognition works by using a combination of speech recognition, natural language processing, and artificial intelligence algorithms. When a user speaks a command, the device or application first converts the audio into text, then uses language models and contextual information to interpret the command and carry out the requested action.

3. What are the benefits of voice command recognition?

Voice command recognition offers several benefits, including hands-free operation, improved accessibility for individuals with disabilities, and increased efficiency and productivity. It also allows for a more natural and intuitive interaction with technology, making it easier for users to perform tasks and access information.

4. What are some common applications of voice command recognition?

Voice command recognition is used in a wide range of applications, including virtual assistants like Siri and Alexa, smart home devices, dictation software, and in-car navigation systems. It is also increasingly being integrated into mobile apps and websites to enhance user experience.

5. What are the limitations of voice command recognition?

While voice command recognition has advanced significantly in recent years, it still has some limitations. Accents, background noise, and speech impediments can affect the accuracy of recognition. Systems that rely on cloud-based processing also require an internet connection, and voice control may not be suitable for sensitive or private tasks due to potential security risks.
