Maths needed for voice simulation

greeniekin · Jun 17, 2011

I am a programmer and I was thinking about making a program to simulate a voice. To be clear I'm not looking to make a text-to-speech program, and I'm only really expecting to do vowels.

My idea is basically to generate a frequency (the vibrating vocal cords) in a pipe (the throat and mouth), and to vary the shape of the pipe.

Unfortunately, when I search for this I mostly come across speech recognition or text-to-speech (and the best text-to-speech systems don't generate sounds; they use recordings).

I have no idea what I should be looking for. If anyone could point me in the right direction or provide any insight, it would be much appreciated.

I know a few things, such as that vowels have several frequencies sounding at once and that these are referred to as formants, though I don't know how they are determined.

maverick_starstrider · Jun 18, 2011

I really don't know anything about this, but I'd assume the approach you're suggesting will go nowhere. Attempting to simulate the physics of the vocal cords is probably the backwards way to do it. A better approach would be to get a microphone, sound out letters and sounds into it, do a Fourier decomposition of each sound, strip all but the most dominant modes, and replay the result to see how close it sounds to correct. In this way you could very quickly establish templates for mouth sounds. Obviously there are far more advanced steps you could take from there, which would relate to signal analysis.
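
A rough sketch of that idea in Python with NumPy might look like the following. It is only an illustration under some assumptions: the "recording" is a synthetic stand-in (three arbitrary tones plus a little noise, not measured vowel data), and the names `keep_dominant_modes`, `dominant_frequencies` and `SAMPLE_RATE` are made up for this example. With a real microphone recording you would load the samples into the `recording` array instead.

```python
import numpy as np

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def keep_dominant_modes(signal, n_modes):
    """Zero every Fourier mode except the n_modes strongest, then resynthesise."""
    spectrum = np.fft.rfft(signal)
    strongest = np.argsort(np.abs(spectrum))[-n_modes:]
    stripped = np.zeros_like(spectrum)
    stripped[strongest] = spectrum[strongest]
    return np.fft.irfft(stripped, n=len(signal))


def dominant_frequencies(signal, sample_rate, n_modes):
    """Return the frequencies (Hz) of the n_modes strongest Fourier modes."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    strongest = np.argsort(np.abs(spectrum))[-n_modes:]
    return sorted(float(f) for f in freqs[strongest])


# Stand-in for one second of recorded sound: three arbitrary tones plus noise.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
rng = np.random.default_rng(0)
recording = (
    1.0 * np.sin(2 * np.pi * 200 * t)
    + 0.6 * np.sin(2 * np.pi * 700 * t)
    + 0.3 * np.sin(2 * np.pi * 1200 * t)
    + 0.05 * rng.standard_normal(len(t))
)

# Strip all but the three most dominant modes; `approx` is what you would replay.
approx = keep_dominant_modes(recording, n_modes=3)
print(dominant_frequencies(approx, SAMPLE_RATE, 3))
```

Playing `approx` back next to the original and raising `n_modes` until the two sound alike would give a feel for how many modes a given sound needs.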

atyy · Jun 18, 2011

I think these are more interested in singing than speech, but maybe you'll find them useful.

http://soundlab.cs.princeton.edu/research/phymod/
http://darwin.bth.rwth-aachen.de/opus/volltexte/2002/393/pdf/Kob_Malte.pdf
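
For a feel of what the simplest physical model looks like, here is a minimal sketch in Python with NumPy. It is not taken from either link. It assumes the vocal tract is a uniform tube closed at the vocal cords and open at the lips, whose resonances are f_n = (2n - 1)c / (4L), and it feeds a pulse train (a crude stand-in for the vibrating vocal cords) through one two-pole resonator per resonance. The tube length, pitch, bandwidth and all function names are assumptions chosen for illustration. A uniform tube only gives a neutral vowel; other vowels need a tube whose cross-section varies along its length.

```python
import numpy as np

SAMPLE_RATE = 16000     # Hz (assumed)
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C
TRACT_LENGTH = 0.17     # m, assumed length of the tube


def tube_resonances(length, count=3, c=SPEED_OF_SOUND):
    """Resonances of a uniform tube closed at one end: (2n - 1) c / (4 L)."""
    n = np.arange(1, count + 1)
    return (2 * n - 1) * c / (4 * length)


def pulse_train(pitch_hz, duration_s, sample_rate=SAMPLE_RATE):
    """One unit impulse per pitch period, a crude model of the vocal cords."""
    source = np.zeros(int(duration_s * sample_rate))
    period = int(round(sample_rate / pitch_hz))
    source[::period] = 1.0
    return source


def resonator(signal, centre_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Two-pole filter: y[i] = x[i] + a1*y[i-1] + a2*y[i-2]."""
    r = np.exp(-np.pi * bandwidth_hz / sample_rate)
    theta = 2 * np.pi * centre_hz / sample_rate
    a1, a2 = 2 * r * np.cos(theta), -r * r
    out = np.zeros(len(signal))
    for i in range(len(signal)):
        out[i] = signal[i]
        if i >= 1:
            out[i] += a1 * out[i - 1]
        if i >= 2:
            out[i] += a2 * out[i - 2]
    return out


def synthesise_vowel(pitch_hz=120.0, duration_s=0.5):
    """Pass the pulse train through one resonator per tube resonance."""
    sound = pulse_train(pitch_hz, duration_s)
    for centre_hz in tube_resonances(TRACT_LENGTH):
        sound = resonator(sound, centre_hz, bandwidth_hz=80.0)
    return sound / np.max(np.abs(sound))  # normalise to the range [-1, 1]


vowel = synthesise_vowel()
print(np.round(tube_resonances(TRACT_LENGTH), 1))
```

The array `vowel` could be written to a WAV file and played back. Changing the resonator centre frequencies is the filter-based counterpart of changing the shape of the pipe.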
