
Voice Command Recognition

  1. Jan 1, 2009 #1
    Does anyone know the current state of the technology?

    I'd like to be able to build some hardware capable of identifying a small number of voice commands, only eight to twelve or so. But speed is an issue: the faster the better, and anything over a ~0.3 second delay would probably be too much. The commands could be hardwired, or trainable to a single operator's voice, if that makes it easier or better. Commands can be single syllables.

    The incoming audio bandwidth is that of Bluetooth (30 Hz to 3 kHz?).

    I can devote as much as 10 square inches of real estate to the hardware. With my small understanding of this topic, I'm not getting a lot of info from the internet. Anything could help...
    Last edited: Jan 1, 2009
  3. Jan 2, 2009 #2
    I was thinking of doing something similar but never got to it. You probably need a fast DSP.

    What are you trying to do?
  4. Jan 2, 2009 #3
    Hi david, thanks for answering! I'm trying to control an occupant-ridden motorized vehicle. I think you're right about a DSP. I've been getting a few more Google hits when including 'DSP'.

    I've been thinking. 10 square inches is a lot of real estate. It's not the area that concerns me, but that would be way too much hardware.
  5. Jan 2, 2009 #4
    I remember doing research on this topic and found out that there is a cheap voice recognition module. You could train it. I can't remember what it's called.

    I didn't go the pre-made module route because I wanted to learn. It turned out to be too complex for me to handle.
  6. Jan 2, 2009 #5


    Science Advisor

    I had some students that worked with a module that could do voice-dependent command recognition, and I seem to recall that it was the HM2007:

    (The VD-01 they're hawking on the same page looks to be a voice-independent speech-to-text)

    These guys were the first hit when I Googled "speech recognition IC", and it looks like they've got a module that can recognize command words (either dependent or independent):
    http://www.sensoryinc.com/products/modules_and_boards.html

    Not sure if you'd want to digitize, transmit, and then recreate the voice signal for whichever module you end up using, or if it'd be easier to just put the module in the headset (or whatever it is that you're using) and then transmit the output of these ICs. (Sensory claims 'low' power operation at 12 mA or so.)
    Last edited by a moderator: May 3, 2017
  7. Jan 2, 2009 #6
    I want to digitize and pass it to a microcontroller. It could be 4 bits plus controls, or UART; it doesn't matter.

    Thanks for the links. I'll look them over.
    Last edited by a moderator: May 3, 2017
  8. Jan 2, 2009 #7



    Staff: Mentor

    It may be covered by MATLABdude's post, but I think the current mainstream uses of speech recognition technology would be over-the-telephone commands (untrained), like those used on many customer service lines, and cell phone commands (trained). I'm not sure of the best way to get pointers to those technologies and chipsets, though.
  9. Jan 2, 2009 #8
    My wife uses voice dialing on her cell phone. I timed about 1.2 seconds from the time she finishes speaking a command until it repeats it back. From what I can gather the voice recognition processor is in the cell phone itself rather than remotely connected.

    I'm a bit surprised to find no neural networks (yet), only a few DSPs. I would expect the delay in a neural network to be on the order of a few microseconds.
    Last edited: Jan 2, 2009
  10. Jan 2, 2009 #9



    Staff: Mentor

    I'm pretty sure it's in the phone. And it sounds like it's probably too slow for what you want. What resources/datasheets have you found so far? If you can permit learning and just a few directed phrases, it would seem you should be able to get it pretty quick. Heck, just count syllables, worst case...
  11. Jan 2, 2009 #10
    I've been looking over the hardware links provided by MATLABdude (thanks MATLABdude!). The first one quotes a 300 millisecond delay max. Marginal. The second is Sensory Inc, who seem to be selling a RISC processor with various burned-in voice firmware options. I've emailed them for recognition speeds.

    I'm not sure that counting syllables would be good for my app. Instant command recognition almost begs for simple, short, barked monosyllables. Too short though, and the 'distance' between commands is small, and decoding error rates large, I'd imagine. I need a very, very small error rate. A 'go' command shouldn't be confused with 'stop'.
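
    To picture the 'distance' concern, here is a minimal Python sketch. It assumes each command has already been reduced to a short feature vector. The two-number vectors in `TEMPLATES`, the `min_margin` threshold, and the function name `classify` are all made up for illustration and don't come from any of the modules mentioned in this thread. It picks the nearest stored template and refuses to decide when the runner-up is nearly as close.

```python
import math

# Hypothetical per-command feature vectors (illustrative values only).
TEMPLATES = {
    "go": (5.5, 8.5),
    "stop": (3.0, 6.0),
}

def classify(features, templates=TEMPLATES, min_margin=0.25):
    """Return the nearest command name, or None if the decision is too close to call."""
    # Rank every template by Euclidean distance to the observed features.
    ranked = sorted((math.dist(features, vec), name) for name, vec in templates.items())
    (best_dist, best_name), (second_dist, _) = ranked[0], ranked[1]
    # Reject when the two closest commands are nearly equidistant.
    if second_dist - best_dist < min_margin:
        return None
    return best_name

clear_case = classify((5.4, 8.4))        # close to "go", far from "stop"
ambiguous_case = classify((4.25, 7.25))  # exactly halfway between the two
```

    Returning "no decision" rather than guessing trades a slower response (the operator has to repeat the command) for a lower chance of confusing 'go' with 'stop'.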
  12. Jan 2, 2009 #11
    Berkeman. Your Ham Radio byline and syllable counting comment have shaken a few brain cells loose. The front-end ear hardware preprocesses sound to the frequency domain (the cochlear thing). Vowels are interesting objects constructed of 3 frequencies. Your vocal cords provide a fundamental frequency, f0. The shape of the mouth generates a modulation envelope such that there are at least two frequency peaks created: f1 and f2. The two ratios f1/f0 and f2/f0 decode to specific vowels.

    Vowels or vowel pairs could be commands. Command recognition would consist of doing FFTs.
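
    A rough sketch of that idea in plain Python, under heavy assumptions: the "vowel" is a synthetic sum of three sine tones, and the sample rate, frame length, tone frequencies, and amplitudes are all invented so that each tone lands exactly on an FFT bin. Real speech would need windowing and proper peak picking, since its energy smears across neighbouring bins. The helper names `fft` and `strongest_frequencies` are my own.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out

def strongest_frequencies(samples, sample_rate, count=3):
    """Return the `count` strongest bin frequencies (Hz), lowest first."""
    n = len(samples)
    spectrum = fft(samples)
    # Keep only positive-frequency bins and skip DC (bin 0).
    magnitudes = [abs(spectrum[k]) for k in range(1, n // 2)]
    top = sorted(range(len(magnitudes)), key=lambda i: magnitudes[i], reverse=True)[:count]
    return sorted((i + 1) * sample_rate / n for i in top)

# Assumed test signal: f0 plus two higher peaks, all on exact 8 Hz bin centres.
SAMPLE_RATE = 8192   # Hz (assumption)
FRAME_LEN = 1024     # samples, i.e. a 125 ms frame
TONES = [(128.0, 1.0), (704.0, 0.6), (1088.0, 0.4)]  # (frequency in Hz, amplitude)

frame = [
    sum(amp * math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for freq, amp in TONES)
    for n in range(FRAME_LEN)
]

f0, f1, f2 = strongest_frequencies(frame, SAMPLE_RATE)
ratios = (f1 / f0, f2 / f0)
```

    Note that the frame length alone sets a floor on the delay: a 1024-sample frame at 8192 Hz is 125 ms of audio before any processing starts, which already eats a good part of a 0.3 second budget.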
  13. Jan 3, 2009 #12



    Staff: Mentor

    Good stuff, Phrak. BTW, I didn't get to give my safety-weenie speech yet in this thread...

    Now, there's a reason that Detroit hasn't yet embraced voice recognition to replace the steering wheel and foot controls. So, what back-ups do you plan for the controls, for when the voice commands aren't working or are slow to respond? Especially with the Bluetooth RF layer in the middle, it isn't going to be 100% accurate and timely all the time.

    Now, if it were for remote-controlled cars, with no occupant involved, then voice recognition could be a cool thing. Kind of like what the Wii player did for video games, but in a different way.

    BTW, I remember an old NOVA TV episode (I think it was a NOVA) about advances in fighter plane technology... They were discussing the use of voice recognition for pilot control of subsystems (not flying), and how the recognition had trouble in real-life situations because the stress of the situation changed the pilot's voice. And then trying to talk under high-g maneuvers didn't work so well...

    Finally, remember that there will be noise in any vehicle environment, so a good noise cancelling microphone is a must. Good luck!
  14. Jan 3, 2009 #13
    That's understandable. What some experimenters want to do with 120VAC is shocking. (haha)

    I've been thinking along similar lines. One PPM error rate could be too much. But a noise cancelling microphone is something I didn't think of.

    Even with perfect decoding, the operator could say something mistaken as a command. It all seems to be a design killer.