Learn Speech Recognition Programming from Scratch

  • Thread starter: Gianluca
AI Thread Summary
The discussion centers on learning computer programming from scratch to create a speech recognition program that identifies four specific words: pear, flower, bee, and apple. The user prefers to work on Windows but is open to Linux. They aim to develop a program that recognizes spoken words using sound wave analysis, emphasizing the need for a foundational approach rather than relying on pre-built packages, as this is part of a physics thesis. Key challenges include determining how to store the recognized words, whether in a database or a custom file format. Suggestions include utilizing built-in speech recognition features in Windows and the .NET Framework, with references to MSDN articles for guidance. The conversation also touches on using statistical methods, specifically Hidden Markov Models (HMM), for classifying spoken words, highlighting that while the process may be complex, resources are available to assist in learning and implementation.
Gianluca
Hello guys!
Programming language: any; I'm willing to learn from scratch. Platform: preferably Windows, but I'm prepared to fall back on Linux (though I don't know it very well).
The end result I want is a program (or script) that lets the computer determine which of four words it already knows (e.g. pear, flower, bee, apple) I have just said.
Something like this: I start the program (it already knows the four words), I say "apple" into the mic, and it answers: apple.
Nothing fancy, and completely useless, I know, but it will not only considerably expand my coding skills, it will also add a significant piece to a physics thesis I'm working on. And that is precisely the problem.
For Linux there are a lot of packages that, if used from a script, help in this area. Unfortunately I can't let those packages do the dirty work for me, because this is a physics thesis, not a computer science one. I have to find a way to teach the PC to recognize what I'm saying based on the waveform of the sound, or on some other parameter that can be physically and objectively measured with a mic.
Another problem: where do I save the four words it has learned? A database? An ad hoc file format? hum :|

I realize this is incredibly difficult to achieve, maybe impossible, but perhaps you know some sites, papers, or books (author and title are enough for me). Would you do me a favor and point me to them?
Even better, could you point me to another section of this forum that is more suitable for this question?

If something is not clear, just say so... I'm Italian, and it's quite possible I've made some silly mistakes somewhere :P

THX! :D
 
So are you more interested in classifying the signal (as one of your four words) or in processing the signal into something useful to a classifier? If you want to classify it, I think you'll have the most luck doing it statistically with an HMM. This has been the predominant approach for long enough that you can find resources explaining how to do it. I've never done the type of signal processing that's required for speech recognition, but the HMM is pretty simple to write and train, and implementing Viterbi is short and straightforward. You can certainly write these parts yourself from scratch.
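To give a feel for how short the Viterbi part can be, here is a minimal sketch in plain Python. It assumes a discrete HMM, i.e. that the audio has already been turned into a sequence of symbol indices (for example by quantizing per-frame features), which is a step the thread does not specify. The names `viterbi`, `classify`, and `models` are made up for illustration, and training the model parameters is not shown.

```python
import math


def _log(p):
    """Log of a probability, mapping 0 to -infinity instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_state_path) for a discrete HMM.

    obs     : list of observation symbol indices (assumed non-empty)
    start_p : start_p[i]    = P(first state is i)
    trans_p : trans_p[i][j] = P(next state is j | current state is i)
    emit_p  : emit_p[i][k]  = P(symbol k is emitted | state i)
    """
    n = len(start_p)
    # score[i] = best log-probability of any state path ending in state i
    score = [_log(start_p[i]) + _log(emit_p[i][obs[0]]) for i in range(n)]
    back = []  # one list of best predecessors per time step after the first

    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + _log(trans_p[i][j]))
            score.append(prev[best_i] + _log(trans_p[best_i][j]) + _log(emit_p[j][o]))
            ptr.append(best_i)
        back.append(ptr)

    # Follow the stored predecessors backwards from the best final state.
    last = max(range(n), key=lambda i: score[i])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path


def classify(obs, models):
    """Pick the word whose HMM gives the observations the best Viterbi score.

    models : dict mapping word -> (start_p, trans_p, emit_p), one HMM per word
             (a hypothetical layout chosen for this sketch)
    """
    return max(models, key=lambda word: viterbi(obs, *models[word])[0])
```

With one trained model per word (pear, flower, bee, apple), `classify(symbols, models)` would return the word whose model best explains the symbol sequence. Working in log-probabilities keeps long sequences from underflowing to zero.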
 