Why don't AI systems have good voices?

  • Thread starter: IntegrateMe
  • Tags: AI Systems

Discussion Overview

The discussion revolves around the challenges faced by AI systems in generating human-like voices, particularly focusing on the complexities of speech synthesis and recognition. Participants explore the difficulties in engineering AI to replicate the nuances of human speech.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant questions the relationship between AI and voice synthesizers, suggesting that both recognizing and creating human speech are significantly challenging tasks.
  • Another participant emphasizes the importance of inflection in human speech, noting that variations in pitch, amplitude, and speed convey different meanings, which AI systems struggle to replicate.
  • It is proposed that once AI can understand meaning in the same way humans do, generating speech may become simpler.
  • A link is shared by a participant, presumably to provide additional resources or information related to the topic.

Areas of Agreement / Disagreement

Participants express differing views on the relationship between AI and voice synthesis, and while some agree on the challenges of replicating human speech, there is no consensus on the solutions or the feasibility of achieving human-like voices.

Contextual Notes

The discussion does not address specific technical limitations or assumptions underlying the claims made about AI and speech synthesis.

IntegrateMe
Why don't AI systems have "good" voices?

Is it difficult to engineer an AI system that actually sounds like a human (in terms of speaking)?
 

I'm not sure what AI and voice synthesizers have to do with each other, but I can tell you that recognizing and creating human speech, different though they are, are both VERY difficult.

It's one of many areas where humans' abilities at pattern recognition are WAY ahead of anything computers are currently able to do.
 


Human speech is "inflected" with changes of pitch, amplitude, and speed (relative length of vowels, etc) depending on the MEANING of what is being said.

For example:
"JOHN ran down these stairs" (i.e. John ran down, but somebody else did not).
"John RAN down these stairs" (i.e. he didn't walk).
"John ran DOWN these stairs" (i.e. he didn't run up them).
"John ran down THESE stairs" (i.e. not some other stairs).
"John ran down these STAIRS" (i.e. not down the street).

Once you can get a computer to understand meaning the same way that a human does, speech generation should be pretty simple IMO. :smile:
 


Well, try this link: click here
Hope it will help you.
 
