Acoustic Model and Language Model

This thread discusses an exercise involving a vowel (V), a feature vector (O), and two models: an acoustic model P_AM(O|V) and a language model P_LM(V). The goal is to find the vowel V that maximizes P(V|O), using the log likelihoods given in a table. The thread does not settle the exact steps or the role of the numbers in the log table, and the original poster is advised to ask the instructor for clarification.
  • #1
nao113
Homework Statement
Suppose ##V## is a vowel and ##O## is a feature vector.
Suppose that ##P_{AM}(O|V)## is an acoustic model and ##P_{LM}(V)## is a language model. Obtain a vowel ##V## that maximizes ##P(V|O)## when the acoustic and language model log likelihoods are given in the following table.
Relevant Equations
W: a vowel v (v ∊ {a,i,u,e,o})
O: a feature vector
Question:
Screenshot 2023-04-25 at 19.26.03.png


My Answer:
WhatsApp Image 2023-04-25 at 19.32.30.jpeg


Is it correct? Thank you
 
  • #2
nao113 said:
Homework Statement: Suppose ##V## is a vowel and ##O## is a feature vector.
Suppose that ##P_{AM}(O|V)## is an acoustic model and ##P_{LM}(V)## is a language model. Obtain a vowel ##V## that maximizes ##P(V|O)## when the acoustic and language model log likelihoods are given in the following table.
Relevant Equations: W: a vowel v (v ∊ {a,i,u,e,o})
O: a feature vector

Question:
View attachment 325473

My Answer:
View attachment 325474

Is it correct? Thank you
No idea without some more context.
Is P(V|O) a conditional probability?
What does argmax mean?
How did you go from ##P(V|O)## to ##\frac{P(O|V)P(V)}{P(O)}## in the 2nd line of your work and similar for the 3rd line?
What role do the numbers in the log table play?
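
For readers following along, the step being asked about is most likely Bayes' rule. The chain below is a sketch under two assumptions: that ##P(V|O)## is a conditional probability, and that the exercise intends the usual decision rule in which the log-table entries are ##\log P_{AM}(O|V)## and ##\log P_{LM}(V)##. The symbol ##\hat{V}## is introduced here for the maximizing vowel and does not appear in the original posts.

$$
\hat{V} = \arg\max_{V} P(V|O)
= \arg\max_{V} \frac{P_{AM}(O|V)\,P_{LM}(V)}{P(O)}
= \arg\max_{V} P_{AM}(O|V)\,P_{LM}(V)
= \arg\max_{V} \left[ \log P_{AM}(O|V) + \log P_{LM}(V) \right]
$$

The denominator ##P(O)## drops out because it does not depend on ##V##, and taking the logarithm does not change which ##V## attains the maximum because the logarithm is strictly increasing.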
 
  • #3
This is the reference that I was given. I don't know what argmax means here, so I assumed it has the same meaning as ##\log_e P(V|O)##.
Screenshot 2023-04-26 at 17.05.46.png

Screenshot 2023-04-26 at 17.06.12.png

Screenshot 2023-04-26 at 17.05.55.png
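
Since the meaning of argmax is the sticking point here, a minimal Python sketch may help: max returns the largest value, while argmax returns the argument (here, the key) at which that value occurs. It is not the same thing as taking a logarithm. The scores below are made up for illustration and are not taken from the attached table.

```python
# Hypothetical scores for illustration only; NOT the values from the thread's table.
scores = {"a": -3.0, "i": -1.5, "u": -4.2}

# max: the largest score itself.
best_value = max(scores.values())

# argmax: the key whose score is the largest.
best_key = max(scores, key=scores.get)

print("max    =", best_value)
print("argmax =", best_key)
```

In the homework, the answer being asked for is the argmax (a vowel), not the max (a number).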
 
  • #4
What you've posted so far doesn't give any definition of "argmax". In your work that you showed in post #1, you added the numbers in the first row of the table to get one sum, and then added the numbers in the second row to get another sum. You then multiplied the two sums.

Given that I know nothing more about this than what you posted, I think your work is incorrect. My guess, and this is only a guess, is that to maximize ##P(O|W)P(W)## what you need to do is to look at the five separate products of the numbers in the five columns, and pick whichever one is the largest. You might get better advice by contacting your instructor.
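
The column-by-column idea above can be sketched in Python. One caveat: because the table lists log likelihoods, the per-vowel product of the two probabilities corresponds to the sum of the two entries in each column, so the sketch adds rather than multiplies. The numbers, the dictionary names, and the function `best_vowel` are all placeholders assumed for illustration; the real values are in the attached screenshot and are not reproduced here.

```python
VOWELS = ["a", "i", "u", "e", "o"]

# Placeholder log likelihoods, NOT the values from the attached table.
log_p_am = {"a": -2.1, "i": -1.3, "u": -3.0, "e": -1.9, "o": -2.6}  # log P_AM(O|V)
log_p_lm = {"a": -1.2, "i": -2.5, "u": -1.8, "e": -1.0, "o": -1.6}  # log P_LM(V)


def best_vowel(log_am, log_lm):
    """Return the vowel with the largest combined log score, plus all totals."""
    # log(P_AM * P_LM) = log P_AM + log P_LM, so add the two rows column by column.
    totals = {v: log_am[v] + log_lm[v] for v in VOWELS}
    # argmax over vowels: pick the key with the largest total.
    return max(totals, key=totals.get), totals


vowel, totals = best_vowel(log_p_am, log_p_lm)
print("combined log scores:", totals)
print("chosen vowel:", vowel)
```

Note that this differs from the work in post #1, which summed each row across all vowels and then multiplied the two row sums; here each vowel gets its own score and the scores are compared.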
 

1. What is an acoustic model?

An acoustic model is a statistical model used in speech recognition systems to represent the relationship between audio signals and phonemes (the smallest units of sound in a language). It is trained on a large dataset of audio recordings and their corresponding transcriptions to learn the patterns and variations in speech sounds.

2. What is a language model?

A language model is a statistical model used in natural language processing to predict the probability of a sequence of words occurring in a given language. It is trained on a large corpus of text data to learn the patterns and rules of a language, and is used to help a computer understand and generate human language.

3. How are acoustic and language models used in speech recognition?

An acoustic model scores how well the audio signal matches candidate phonemes, while a language model scores how likely a candidate word sequence is in the language. A recognizer combines the two scores to choose the most probable transcription of the spoken words.

4. What is the difference between an acoustic model and a language model?

The main difference between an acoustic model and a language model is the type of input they analyze. An acoustic model processes audio signals, while a language model processes text. Additionally, an acoustic model focuses on the relationship between audio signals and phonemes, while a language model focuses on the relationship between words and their context in a language.

5. How are acoustic and language models improved?

Acoustic and language models are improved by training on larger and more diverse datasets and by advances in machine learning algorithms. Incorporating contextual information and combining multiple models can further improve their accuracy.
