44100 Hz signal sampling, matlab question

  • #1
Bassalisk
So I was playing around with signals in MATLAB.

I recorded myself saying a sample sentence, imported it as a vector (it was mono), played it back, and it worked.

But then I told MATLAB to skip every other element.

Basically I wrote this:

soundsc(a(1:2:end),44100)

What I EXPECTED was to hear some snipping. But what I heard was my voice going really fast and ending after 4 seconds (the original clip was 8 seconds).

Why?

If MATLAB skipped every other sample and we still kept the sampling rate (frequency) the same, why did it make the clip shorter?

I mean, yes, I have half as many samples to play back as in the original clip. But shouldn't those still be spread across the full 8 seconds?

Shouldn't I just end up with a lower-"quality" voice?

I didn't tell it to delete every second sample and then compress; I only told it to skip every other one.

Is MATLAB essentially "seeing" this expression, a(1:2:end), as a new vector that is half as long, and that's why it sounds faster?
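(For illustration, a minimal sketch of what that indexing does; the vector below is just a stand-in for the recorded audio:)

% a stands in for an 8-second mono recording at 44100 samples/s
a = randn(8*44100, 1);
b = a(1:2:end);     % keep every other sample
numel(a)            % 352800 samples
numel(b)            % 176400 samples -- a new vector half as long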
 
  • #2
What I EXPECTED was to hear some snipping. But what I heard was my voice going really fast and ending after 4 seconds (the original clip was 8 seconds).

Why?

Because you were skipping every other sample, so an 8-second clip takes only 4 seconds to play. What you need to do is cut the playback sample rate in half, since you are skipping every other sample, so try:

soundsc(a(1:2:end),44100/2)


If MATLAB skipped every other sample and we still kept the sampling rate (frequency) the same, why did it make the clip shorter?

Because you kept the frequency the same. Before, you had 8*44100 samples and played them at a rate of 44100 samples per second, so you had 8 seconds of audio. Now you have 4*44100 samples and are still playing them at 44100 samples per second, so you have 4 seconds of audio.

Skipping every other sample is equivalent to sampling at 22050 Hz, so you need to play it back at that rate if you are going to skip every other sample.
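(A quick way to check that arithmetic in MATLAB, assuming a still holds the original recording:)

Fs = 44100;
numel(a)/Fs                  % about 8 s of audio at the original rate
numel(a(1:2:end))/Fs         % the kept samples played at 44100 samples/s -> about 4 s
numel(a(1:2:end))/(Fs/2)     % the same samples played at 22050 samples/s -> about 8 s again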


Shouldn't I just end up with a lower-"quality" voice?

Well, what I think you will find is that there isn't much change. Lowering the sample rate makes you miss higher-frequency content, but the fundamental frequency of the human voice is only around 100-250 Hz or so. Telephone systems sample voice at around 8 kHz.

It is interesting to try, though: keep cranking down the sample rate until you hear a change and see whether it correlates.
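(A minimal sketch of that experiment, assuming a is the recording from above; note that very low sample rates may not be supported by every audio device:)

Fs = 44100;
for k = [2 4 8 16]                 % keep every k-th sample
    soundsc(a(1:k:end), Fs/k);     % play back at the correspondingly reduced rate
    pause(numel(a)/Fs + 1);        % each clip still lasts ~numel(a)/Fs seconds; wait for it to finish
end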
 
  • #3
Floid said:
Because you were skipping every other sample, so an 8-second clip takes only 4 seconds to play. What you need to do is cut the playback sample rate in half, since you are skipping every other sample, so try:

soundsc(a(1:2:end),44100/2)

Because you kept the frequency the same. Before, you had 8*44100 samples and played them at a rate of 44100 samples per second, so you had 8 seconds of audio. Now you have 4*44100 samples and are still playing them at 44100 samples per second, so you have 4 seconds of audio.

Skipping every other sample is equivalent to sampling at 22050 Hz, so you need to play it back at that rate if you are going to skip every other sample.

Well, what I think you will find is that there isn't much change. Lowering the sample rate makes you miss higher-frequency content, but the fundamental frequency of the human voice is only around 100-250 Hz or so. Telephone systems sample voice at around 8 kHz.

It is interesting to try, though: keep cranking down the sample rate until you hear a change and see whether it correlates.

Thank you. I was confused by all that sampling business, as I only learned it today.
 
  • #4
The most important frequency band for understanding human speech is around 3 kHz. And that is also the frequency band where human hearing is most sensitive. Isn't evolution wonderful!

Vowel sounds contain frequencies down to about 100 Hz, but if you filter out the frequencies around 3 kHz you start to lose the consonant sounds, and you won't be able to hear the difference between words like "pig", "big", and "dig", for example.

Going from 44100 samples/sec to 22050, you are only throwing away frequencies above 11.025 kHz. There was probably very little in that frequency range anyway, apart from a bit of random noise.

If you output only one sample in 8 or one in 16, you will start to hear an effect.
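(If you want to see this rather than just hear it, a rough sketch for comparing the spectra before and after skipping samples; it assumes a is the recording at 44100 samples/s:)

Fs = 44100;
b  = a(1:2:end);                               % every other sample, effective rate Fs/2
A  = abs(fft(a));  fA = (0:numel(a)-1)*Fs/numel(a);
B  = abs(fft(b));  fB = (0:numel(b)-1)*(Fs/2)/numel(b);
plot(fA(1:floor(end/2)), A(1:floor(end/2)));  hold on
plot(fB(1:floor(end/2)), B(1:floor(end/2)))
xlabel('Frequency (Hz)'); legend('original', 'every other sample')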
 
  • #5
AlephZero said:
The most important frequency band for understanding human speech is around 3 kHz. And that is also the frequency band where human hearing is most sensitive. Isn't evolution wonderful!

Vowel sounds contain frequencies down to about 100 Hz, but if you filter out the frequencies around 3 kHz you start to lose the consonant sounds, and you won't be able to hear the difference between words like "pig", "big", and "dig", for example.

Going from 44100 samples/sec to 22050, you are only throwing away frequencies above 11.025 kHz. There was probably very little in that frequency range anyway, apart from a bit of random noise.

If you output only one sample in 8 or one in 16, you will start to hear an effect.


I found your post very interesting. I will definitely experiment with this and look into the frequency content of the human voice. As a future telecommunications engineer, I should know this :)
 
  • #6
Bassalisk said:
I found your post very interesting. I will definitely experiment with this and look into the frequency content of the human voice. As a future telecommunications engineer, I should know this :)

If you want to see something really interesting, research Pulse-Code Modulation (PCM). It's how they originally compressed voices into ludicrously low bandwidth when the telephone system was just converting from analog to digital in the 60s through the 80s. Good stuff. Also look up vocoding and Linear Predictive Coding (LPC). In LPC they basically transmitted only a compact "kernel" of the speech over the network and then synthesized a "close-enough" version at the receiving end. Amazing. It's absolutely fascinating what they were able to do with the limitations of their hardware.
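(As a tiny taste of that world, here is the mu-law companding curve used alongside telephone PCM (G.711); this is only the compression characteristic, not a full codec:)

mu = 255;                                          % mu-law parameter used in North American telephony
x  = linspace(-1, 1, 1001);                        % normalized input amplitudes
y  = sign(x) .* log(1 + mu*abs(x)) / log(1 + mu);  % companded output, also in [-1, 1]
plot(x, y); xlabel('input amplitude'); ylabel('companded output')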
 
  • #7
carlgrace said:
If you want to see something really interesting, research Pulse-Code Modulation (PCM). It's how they originally compressed voices into ludicrously low bandwidth when the telephone system was just converting from analog to digital in the 60s through the 80s. Good stuff. Also look up vocoding and Linear Predictive Coding (LPC). In LPC they basically transmitted only a compact "kernel" of the speech over the network and then synthesized a "close-enough" version at the receiving end. Amazing. It's absolutely fascinating what they were able to do with the limitations of their hardware.

Thank you very much. I will definitely check it out.
 
