44100 Hz signal sampling, MATLAB question


Discussion Overview

The discussion revolves around the effects of sampling a recorded audio signal in MATLAB, specifically focusing on the implications of skipping samples and how it affects playback speed and audio quality. Participants explore concepts related to audio sampling rates, human speech frequencies, and the technical aspects of sound reproduction.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Exploratory

Main Points Raised

  • One participant describes their experience with MATLAB, noting that skipping every other sample resulted in their voice playing back faster and ending sooner than expected.
  • Another participant explains that skipping every other sample effectively halves the sample rate, leading to a shorter playback duration, and suggests adjusting the playback rate accordingly.
  • Some participants discuss the implications of lowering the sample rate on audio quality, particularly regarding the frequencies relevant to human speech.
  • There is mention of the importance of the frequency band around 3 kHz for understanding speech and how filtering out certain frequencies can affect clarity.
  • Participants express interest in further experimentation with audio sampling and its effects, as well as related topics like Pulse-Code Modulation and Linear Predictive Coding.

Areas of Agreement / Disagreement

Participants generally agree on the technical explanation of how skipping samples affects playback speed and audio quality, but there is no consensus on the extent of the impact on speech intelligibility or the specific frequencies that are critical for understanding speech.

Contextual Notes

Some participants note the limitations of their understanding of sampling and its effects, indicating that they are still learning about these concepts.

Who May Find This Useful

This discussion may be useful for individuals interested in audio processing, telecommunications, or those studying the technical aspects of sound reproduction and human speech frequencies.

Bassalisk
So I was playing in MATLAB with signals.

I recorded myself saying a sample sentence. I imported it as a vector (it was mono), played it back, and it worked.

But then I told MATLAB to skip every other element.

Basically I wrote this:

soundsc(a(1:2:end), 44100)

What I EXPECTED was to hear some snipping. But what I heard was my voice going really fast and ending after 4 seconds (the original clip was 8 seconds).

Why?

If MATLAB skipped every other sample, and we still kept the period (frequency) the same, why did it make the clip shorter?

I mean, yes, I have half as many samples to reproduce as in the original clip. But shouldn't those still be distributed throughout those 8 seconds?

Shouldn't I just end up with less "quality" voice?

I didn't tell it to delete every second sample and then compress; I only told it to skip every other one.

Is MATLAB "seeing" this command (1:2:end) as essentially a new vector, which is then half as long and would therefore sound faster?
 
What I EXPECTED was to hear some snipping. But what I heard was my voice going really fast and ending after 4 seconds (the original clip was 8 seconds).

Why?

Because you were skipping every other sample, so an 8 second sample would have taken 4 sec to play. What you need to do is cut the sample rate in half since you are skipping every other sample, so try:

soundsc(a(1:2:end),44100/2)


If MATLAB skipped every other sample, and we still kept the period (frequency) the same, why did it make the clip shorter?

Because you kept the frequency the same. So before you had 8*44100 samples and you played them at a rate of 44100 samples per second so you had 8 seconds of audio. Now you have 4*44100 samples and are playing them at 44100 samples per second so you have 4 seconds of audio.
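The arithmetic works out exactly as described. Here is a minimal sketch (in Python/NumPy rather than MATLAB, purely for illustration — NumPy's a[::2] corresponds to MATLAB's a(1:2:end)):

```python
import numpy as np

rate = 44100                   # original sample rate (Hz)
duration = 8                   # seconds, as in the original clip
a = np.zeros(duration * rate)  # stand-in for the recorded mono signal

# a[::2] builds a NEW vector holding every other sample,
# half the length of the original.
b = a[::2]

print(len(a))                  # 352800 samples
print(len(b))                  # 176400 samples
print(len(b) / rate)           # played at 44100 Hz -> 4.0 seconds
print(len(b) / (rate // 2))    # played at 22050 Hz -> 8.0 seconds
```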

Skipping every other sample is equivalent to sampling at 22050 Hz, so you need to play it back at that rate if you are going to skip every other sample.
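That equivalence is easy to check numerically. A small sketch (again Python/NumPy for illustration): sampling a tone directly at 22050 Hz yields exactly the samples you keep when taking every other sample of the 44100 Hz version.

```python
import numpy as np

rate = 44100
n = rate                                  # one second of samples
t = np.arange(n) / rate                   # time stamps at 44100 Hz
x = np.sin(2 * np.pi * 440 * t)           # a 440 Hz tone

# Direct sampling at half the rate...
t_half = np.arange(n // 2) / (rate // 2)  # time stamps at 22050 Hz
x_half = np.sin(2 * np.pi * 440 * t_half)

# ...matches keeping every other sample of the 44100 Hz recording.
print(np.allclose(x[::2], x_half))        # True
```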


Shouldn't I just end up with less "quality" voice?

Well, what I think you will find is that there isn't much change. Lowering your sample rate makes you miss higher-frequency content, but the fundamental of the human voice is only around 250 Hz or so. Telephones sample voice at around 8 kHz.

It is interesting to try though, keep cranking down the samples until you hear a change and see if that correlates.
 
Floid said:
Skipping every other sample is equivalent to sampling at 22050 Hz, so you need to play it back at that rate if you are going to skip every other sample.

Thank you. I was confused about sampling, as I only learned it today.
 
The most important frequency band for understanding human speech is around 3 kHz. And that is also the frequency band where human hearing is most sensitive. Isn't evolution wonderful!

Vowel sounds contain frequencies down to about 100 Hz, but if you filter out the frequencies around 3 kHz you start to lose the consonant sounds, and you won't be able to hear the difference between words like "pig", "big", and "dig", for example.

Going from 44100 samples/sec to 22050, you are only throwing away frequencies above 11.025 kHz. There was probably very little in that frequency range anyway, apart from a bit of random noise.

If you output only one sample in 8 or one in 16, you will start to hear an effect.
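The numbers behind that prediction can be tabulated. A small sketch (Python for illustration): keeping one sample in N divides the sample rate, and hence the Nyquist limit, by N. Note that naive skipping like a(1:N:end) applies no anti-alias filter, so content above the new Nyquist limit will alias rather than simply vanish.

```python
rate = 44100  # original sample rate (Hz)

# Keeping one sample in N divides the effective sample rate by N;
# the Nyquist limit (highest representable frequency) is half of that.
for keep_one_in in (2, 8, 16):
    new_rate = rate / keep_one_in
    nyquist = new_rate / 2
    print(f"1 in {keep_one_in:2d}: {new_rate:7.1f} Hz rate, "
          f"Nyquist {nyquist:8.2f} Hz")

# 1 in 2 still keeps everything up to 11025 Hz, but 1 in 8 drops the
# Nyquist limit to 2756.25 Hz -- below the ~3 kHz consonant band.
```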
 
AlephZero said:
The most important frequency band for understanding human speech is around 3 kHz.


I found your post very interesting. I will definitely experiment with this and research the frequency content of the human voice. As a future telecommunication engineer, I should know this :)
 
Bassalisk said:
I found your post very interesting. I will definitely experiment with this and research the frequency content of the human voice. As a future telecommunication engineer, I should know this :)

If you want to see something really interesting, research Pulse-Code Modulation. It's how they originally compressed voices into ludicrously low bandwidth when the telephone system was converting from analog to digital in the 60s through the 80s. Good stuff. Also look up vocoding and Linear Predictive Coding (LPC). In LPC they basically only transmitted a kernel of the speech on the network and then synthesized a "close-enough" version locally at the receiver. Amazing. It's absolutely fascinating what they were able to do with the limitations of their hardware.
 
carlgrace said:
If you want to see something really interesting, research Pulse-Code Modulation.

Thank you very much. I will definitely check it out.
 
