B Variance & Standard Deviation

Agent Smith
TL;DR Summary
What are the uses of variance and standard deviation and how do they differ?
Going through my notes ... and I see the following:

1. Variance = ##\displaystyle \text{Var}(X) = \sigma^2 = \frac{1}{n - 1} \sum_{i = 1}^n \left(x_i - \overline x \right)^2##
2. Standard Deviation = ##\sigma = \sqrt{\text{Var}(X)} = \sqrt{\sigma^2}##

Both variance and standard deviation are measures of dispersion (colloquially, the spread in the data). The higher their values, the more spread out the data are.
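As a quick sanity check of these formulas, here is a minimal Python sketch using the standard library's statistics module, which applies the same ##n - 1## convention for the sample variance (the data set is made up):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

var = statistics.variance(data)  # sample variance: divides by n - 1
sd = statistics.stdev(data)      # sample standard deviation: square root of the variance

print(var)                       # ~4.5714
print(sd)                        # ~2.1381
print(math.isclose(sd**2, var))  # True: sd is just the square root of var
```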

Statement B: The square root function is not linear and so standard deviation is biased when compared to variance.

Questions:
1. Do high variance and standard deviation mean greater variability in the data?
2. What does statement B mean?
 
The answer to 1 is "Yes".
The answer to 2 is "nothing". It is a meaningless statement because the concept of 'bias' only applies to estimators. A correct statement would be "The standard deviation of a sample from a population is a downwards-biased estimator of the standard deviation of the population from which the sample is taken". For more detail see https://en.wikipedia.org/wiki/Standard_deviation#Uncorrected_sample_standard_deviation.
 
  • Like
Likes Agent Smith and FactChecker
andrewkirk said:
A correct statement would be "The standard deviation of a sample from a population is a downwards-biased estimator of the standard deviation of the population from which the sample is taken".
That is true if the sum is divided by n, but the OP has divided by n-1, giving an unbiased estimator.
 
  • Like
Likes Agent Smith
@andrewkirk , by downward bias, do you mean it underestimates the variability in the data?

@FactChecker I didn't know dividing by ##n - 1## instead of ##n## corrects the bias. Gracias.

Statistics is hard! And I'm merely scratching the surface.

Can someone tell me what "square root function is not linear" means? I know that, if ##\text{Var}(X) = 100## and ##\text{Var}(Y) = 81##, then ##100 - 81 = 19##, but ##\sqrt{100} - \sqrt{81} = 10 - 9 = 1##. A ##19##-point difference in variance becomes only a ##1##-point difference in standard deviation.
 
Agent Smith said:
Can someone tell me what "square root function is not linear" means?
The graph is not a straight line.
[Graph of the square root function]

Agent Smith said:
I know that, if ##\text{Var}(X) = 100## and ##\text{Var}(Y) = 81##, then ##100 - 81 = 19##, but ##\sqrt{100} - \sqrt{81} = 10 - 9 = 1##. A ##19##-point difference in variance becomes only a ##1##-point difference in standard deviation.
That is not a good way to judge if a function is linear. Consider the function, ##y=100 x##. It is linear, but a 1 unit change in ##x## becomes a 100 unit change in ##y##.
 
  • Like
Likes Agent Smith
Agent Smith said:
TL;DR Summary: What are the uses of variance and standard deviation and how do they differ?

The square root function is not linear and so standard deviation is biased when compared to variance.
The big difference between variance and standard deviation is that the standard deviation has the same units as the mean.
 
Last edited:
  • Like
Likes Agent Smith
I'd say that they measure the same thing. The advantage of variance is that the variances of independent random variables add meaningfully while their standard deviations don't.
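For instance, for independent random variables ##X## and ##Y##, $$\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y), \qquad \sigma_{X+Y} = \sqrt{\sigma_X^2 + \sigma_Y^2} \ne \sigma_X + \sigma_Y \text{ in general}.$$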
 
  • Skeptical
Likes Agent Smith
@FactChecker , gracias for the graph, it makes it clearer. I was trying to show how a given difference in variance becomes a much smaller difference between the corresponding standard deviations.

Dale said:
The big difference between variance and standard deviation is that the standard deviation has the same units as the mean.
:thumbup:
 
FactChecker said:
That is true if the sum is divided by n, but the OP has divided by n-1, giving an unbiased estimator.
This is true of the variance, but not of the standard deviation. Sample standard deviation is always a biased estimator of population standard deviation.
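To illustrate the distinction, here is a minimal simulation sketch (assuming NumPy; the population, sample size, and trial count are arbitrary choices): draw many small samples from a normal population with known ##\sigma = 2##, and average the sample variances and sample standard deviations, both computed with Bessel's correction.

```python
import numpy as np

rng = np.random.default_rng(0)
true_sigma = 2.0        # population standard deviation, so sigma^2 = 4
n, trials = 5, 100_000  # small samples make the bias easy to see

samples = rng.normal(loc=0.0, scale=true_sigma, size=(trials, n))
s2 = samples.var(axis=1, ddof=1)  # sample variances (n - 1 in the denominator)
s = samples.std(axis=1, ddof=1)   # sample standard deviations

print(s2.mean())  # ~4.00 : unbiased for sigma^2
print(s.mean())   # ~1.88 : systematically below sigma = 2
```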
 
  • Like
Likes hutchphd and FactChecker
  • #10
Agent Smith said:
Statistics is hard! And I'm merely scratching the surface.
groan. Actually very nicely done. groan.
 
  • Haha
Likes Agent Smith
  • #11
mjc123 said:
This is true of the variance, but not of the standard deviation. Sample standard deviation is always a biased estimator of population standard deviation.
I stand corrected. Thanks. I learned something I have had wrong all my life. The relevant part of this backs up what you said.
 
  • Like
Likes Agent Smith
  • #12
Agent Smith said:
@FactChecker , gracias for the graph, it makes it clearer. I was trying to show how a given difference in variance becomes a much smaller difference between the corresponding standard deviations.


:thumbup:
This is not correct. If the variance is less than one then the standard deviation is larger than the variance. And this depends entirely on what units one is using.
 
  • Like
Likes Agent Smith
  • #13
Hornbein said:
This is not correct. If the variance is less than one then the standard deviation is larger than the variance. And this depends entirely on what units one is using.
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt{0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
 
Last edited:
  • #14
@FactChecker

So if we take a linear function ##f(X) = 2X## then ##E[f(X)] = f(E[X])##?
 
  • #16
No, expectation itself is linear: ##E[2X] = 2E[X]##.
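More generally, expectation is linear: $$E[aX + b] = a\,E[X] + b$$ for any constants ##a, b##, but it does not commute with nonlinear functions; in general ##E[\sqrt{X}] \ne \sqrt{E[X]}##, which is exactly why taking the square root of an unbiased variance estimator does not give an unbiased standard deviation estimator.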
 
  • #17
Agent Smith said:
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt{0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
What do you mean by variance contradicting the standard deviation?
Re linearity, I suspect that both unbiasedness and the maximum-likelihood property may only be preserved by linear transformations. Maybe @statdad can confirm or deny this?
 
  • #18
WWGD said:
What do you mean by variance contradicting the standard deviation?
Well, ##\text{Standard Deviation } \sigma > \text{Variance } \sigma^2##

The standard deviation suggests there's high variation while the variance suggests there's low variation. The same problem arises when ##\sigma^2 > 1##.
 
  • #19
WWGD said:
No, expectation itself is linear. E[2X]=2E[X].
You mean ##f(X) = 2X##? Must the other function/operation be linear?
 
  • #20
Agent Smith said:
##f(X) = 2X##? The other function/operation must be linear?
What I mean is that if X is a random variable, then the expectation of the RV 2X is twice the expectation of X.
 
  • Like
Likes Agent Smith
  • #21
Agent Smith said:
Well, ##\text{Standard Deviation } \sigma > \text{Variance } \sigma^2##

The standard deviation suggests there's high variation while the variance suggests there's low variation. The same problem arises when ##\sigma^2 > 1##.
@WWGD ☝️
 
  • #22
Well, you can argue, too, that when the variance increases so does the SD, by their functional dependence, at least when the variance is ##> 1##. But what notion or measure, other than those two, would you use for variability? And I was wrong above: there are nonlinear transformations that preserve unbiasedness and the maximum-likelihood property.
 
  • Like
Likes Agent Smith
  • #23
@WWGD it seems I'm unable to read statistical results. What does a standard deviation ##\sigma = 2## mean? Assuming a normal distribution and a mean ##\mu = 35##, I can infer that ##95\%## of the data lie in the range ##35 \pm 2(2)##. Is that how we interpret standard deviation?
 
  • #24
Agent Smith said:
@WWGD it seems I'm unable to read statistical results. What does a standard deviation ##\sigma = 2## mean? Assuming a normal distribution and a mean ##\mu = 35##, I can infer that ##95\%## of the data lie in the range ##35 \pm 2(2)##. Is that how we interpret standard deviation?
That's close, yes.
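More precisely, ##\mu \pm 2\sigma## covers about ##95.45\%## of a normal distribution; the exact central ##95\%## interval is ##\mu \pm 1.96\sigma##. A quick check, assuming SciPy is available:

```python
from scipy.stats import norm

# Probability within mu +/- 2 sigma for a normal distribution:
print(norm.cdf(2) - norm.cdf(-2))  # ~0.9545

# Number of sigmas giving an exact central 95% interval:
print(norm.ppf(0.975))  # ~1.96
```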
 
  • Haha
Likes Agent Smith
  • #25
Agent Smith said:
Point! But can the variance ever be < 1? I know variance can be 0 e.g. for the data set 4, 4, 4, 4, 4.

Say we take this data set: 4, 4, 4, 5.
The mean = 4.25
The variance = ##\displaystyle \frac{1}{n} \sum_{i = 1}^4 \left(x_i - 4.25 \right)^2 = 0.1875##
The standard deviation = ##\sqrt{0.1875} \approx 0.433##

##\sigma > \sigma^2##

Is the variance contradicting the standard deviation?
Say your data set 4, 4, 4, 5 is in liters.

This is exactly the same as 4000, 4000, 4000, 5000 milliliters. Calculate again and you get a larger variance, which is greater than the standard deviation. As you can see, comparing the magnitudes of the two isn't meaningful.
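A quick numerical check of this point (a sketch assuming NumPy, and using the population formulas, i.e. dividing by ##n##, to match the quoted post):

```python
import numpy as np

liters = np.array([4.0, 4.0, 4.0, 5.0])
ml = liters * 1000  # the same data in milliliters

print(liters.var(), liters.std())  # 0.1875   ~0.433 -> sd > variance
print(ml.var(), ml.std())          # 187500.0 ~433   -> variance > sd
```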
 
  • Like
Likes Agent Smith and Vanadium 50
  • #26
The units are different, so it makes no sense to compare the size of the variance and the standard deviation. It does make sense to compare the size of the mean and the standard deviation, since they have the same units.
 
  • Like
Likes Agent Smith, Hornbein, Vanadium 50 and 2 others
  • #27
There are two numbers. One is the square of the other. All the usual rules apply.
There is no statistical significance to any of this. Am I missing something??
 
  • #28
All this is why statistics are standardized/normalized to the unitless "Z-score."
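For a value ##x## from a distribution with mean ##\mu## and standard deviation ##\sigma##, the z-score is $$z = \frac{x - \mu}{\sigma},$$ which is dimensionless: the units of ##x - \mu## and ##\sigma## cancel.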
 
  • Like
Likes Agent Smith and Dale
  • #29
hutchphd said:
There are two numbers. One is the square of the other. All the usual rules apply.
Including the most important - they have different units. What is bigger, a gallon or a calorie?
 
  • Like
Likes hutchphd and Dale
  • #30
@Dale Gracias, si, it maketh no sense to compare ##m^2## with ##m##, but which is better, the variance or the standard deviation, at giving us an accurate measure of variability, which I presume both are measuring? By squaring (variance), we're amplifying the magnitude of the variability, and by taking the square root (standard deviation) we're downsizing the variability (both in terms of pure magnitude). However, the magnitude itself conveys no information (vide infra 👇 )
Hornbein said:
Say your data set is 4, 4, 4, 5 is in liters.

This is exactly the same as 4000, 4000, 4000, 5000 milliliters. Calculate again and you get a larger variance, which is greater than the standard deviation. As you can see, comparing the magnitudes of the two isn't meaningful.
 
  • #31
Agent Smith said:
which is better
Which is better, momentum or energy? Temperature or pressure? Volume or area?
 
  • Haha
Likes Agent Smith
  • #32
Agent Smith said:
which is better, the variance or the standard deviation, at giving us an accurate measure of variability
The distinctions between them are not in terms of accuracy. Variance is nice because it is additive and you can partition a total variance into separate portions. The standard deviation is nice because it has the same units as the random variable itself and can be meaningfully compared to the mean.

Agent Smith said:
By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability
Neither of these statements is true.
 
  • Like
  • Wow
Likes Agent Smith, Vanadium 50 and hutchphd
  • #33
Agent Smith said:
@Dale Gracias, si, it maketh no sense to compare ##m^2## with ##m##, but which is better, the variance or the standard deviation, at giving us an accurate measure of variability, which I presume both are measuring? By squaring (variance), we're amplifying the magnitude of the variability, and by taking the square root (standard deviation) we're downsizing the variability (both in terms of pure magnitude). However, the magnitude itself conveys no information (vide infra 👇 )
@Dale & @Vanadium 50 ☝️ 🤔

For ##\sigma = 9## (say) meters, the ##9## meters by itself is meaningless, oui?
 
  • #34
One can deduce that the variance is a measure of the spread or dispersion of a probability distribution through Chebyshev's inequality. E. Parzen: "Modern Probability Theory and Its Applications"
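For reference, Chebyshev's inequality says that for any random variable ##X## with mean ##\mu## and finite standard deviation ##\sigma##, $$P\big(|X - \mu| \ge k\sigma\big) \le \frac{1}{k^2} \qquad \text{for all } k > 0,$$ so ##\sigma## bounds the spread of any distribution, with no normality assumption needed.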

Since I am not well versed in this subject I leave it to the cognoscenti to elaborate.
 
  • Like
Likes Agent Smith
  • #35
gleem said:
One can deduce that the variance is a measure of the spread or dispersion of a probability distribution through Chebyshev's inequality. E. Parzen: "Modern Probability Theory and Its Applications"

Since I am not well versed in this subject I leave it to the cognoscenti to elaborate.
Gracias.
 
  • #36
Agent Smith said:
@Dale & @Vanadium 50 ☝️ 🤔

For ##\sigma = 9## (say) meters, the ##9## meters by itself is meaningless, oui?
##9 \mathrm{\ m}## is meaningful since it is never "by itself" but is in the context of the SI system. What I was objecting to was your claim that "By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability". Neither of these is true.

The numerical point you are alluding to is numerically wrong. Yes, ##9^2 = 81 > 9##, but ##0.9^2 = 0.81 < 0.9##. So it is not always true that squaring a number "amplifies" the number. Nor is it always true that taking the square root always "downsizes" the number. Yesterday I was working with a data set with means in the ##-0.010## range and standard deviations in the ##0.005## range, so variances had even smaller numbers that were annoying to format. The variances were decidedly not "amplified".

Also, the units don't work out. ##(9\mathrm{\ m})^2 = 81 \mathrm{\ m^2}## cannot be compared to ##9\mathrm{\ m}## at all. So you cannot say that ##81 \mathrm{\ m^2}## is "amplified" from ##9 \mathrm{\ m}##.

It doesn't make sense to me to talk about the magnitude or the size of the variability as something different from the standard deviation and the variance.
 
  • Like
Likes Vanadium 50 and Agent Smith
  • #37
Dale said:
##9 \mathrm{\ m}## is meaningful since it is never "by itself" but is in the context of the SI system. What I was objecting to was your claim that "By squaring (variance), we're amplifying the magnitude of the variability and by taking the square root (standard deviation) we're downsizing the variability". Neither of these is true.

The numerical point you are alluding to is numerically wrong. Yes, ##9^2 = 81 > 9##, but ##0.9^2 = 0.81 < 0.9##. So it is not always true that squaring a number "amplifies" the number. Nor is it always true that taking the square root always "downsizes" the number. Yesterday I was working with a data set with means in the ##-0.010## range and standard deviations in the ##0.005## range, so variances had even smaller numbers that were annoying to format. The variances were decidedly not "amplified".

Also, the units don't work out. ##(9\mathrm{\ m})^2 = 81 \mathrm{\ m^2}## cannot be compared to ##9\mathrm{\ m}## at all. So you cannot say that ##81 \mathrm{\ m^2}## is "amplified" from ##9 \mathrm{\ m}##.

It doesn't make sense to me to talk about the magnitude or the size of the variability as something different from the standard deviation and the variance.
Gracias for the clarification, but it's still non liquet.

Say we have a statistical report on whale shark lengths: ##\mu = 20 \text{ m}## and ##\sigma = 3 \text{ m}##. For me, ##\sigma## only makes sense if linked to ##\mu##, and one also has to have a fair idea of what a ##\text{meter}## is. I think someone mentioned that we need to keep our units in mind before interpreting statistical information.

Correct?
 
  • #38
Agent Smith said:
for me ##\sigma## only makes sense if linked to ##\mu##
Why? I could have a random variable with ##\sigma = 3 \mathrm{\ m}## regardless of ##\mu##. The only thing that ##\sigma=3\mathrm{\ m}## tells us about ##\mu## is that ##\mu## has dimensions of length.
 
  • Like
Likes Agent Smith
  • #39
A follow up question.
Watched a video on Pearson's correlation coefficient: ##r = \frac{\sum_i (x_i - \overline x)(y_i - \overline y)}{\sqrt{\sum_i (x_i - \overline x)^2 \sum_i (y_i - \overline y)^2}}##. The denominator, says the author, performs standardization. I don't know what that means. Is this ##\frac{x - \overline x}{\sigma}## (the z-score) also standardization?
 
  • #40
Agent Smith said:
A follow up question.
Watched a video on Pearson's correlation coefficient: ##r = \frac{\sum_i (x_i - \overline x)(y_i - \overline y)}{\sqrt{\sum_i (x_i - \overline x)^2 \sum_i (y_i - \overline y)^2}}##. The denominator, says the author, performs standardization. I don't know what that means. Is this ##\frac{x - \overline x}{\sigma}## (the z-score) also standardization?
It normalizes the result to lie within ##[-1, 1]##, where ##1## is perfectly positively correlated, ##-1## is perfectly negatively correlated, and anything else is in between. When you want to know how strongly two things are related, you want to be able to ignore the scale of variation of each one, so you divide each one by its standard deviation.
Consider what happens without the denominators to normalize the calculations. Suppose I wanted to know how strongly the length of a person's left toe tended to imply other size characteristics.
In one case, I want to see how well it implied the length of the same person's right toe. Without the denominators, all the numbers and variations from the mean would be small and the calculation would be a small number even though you know the relationship is very strong.
In another case, I want to see how well it implied the heights of the same people. You know that the relationship is not as strong, but the calculation will give a larger number because the heights are larger and have more variation.
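A minimal sketch of this scale-invariance (assuming NumPy; the data are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
left_toe = rng.normal(5.0, 0.3, size=100)            # hypothetical toe lengths, cm
height = 30 * left_toe + rng.normal(0, 5, size=100)  # noisy linear relation, cm

r1 = np.corrcoef(left_toe, height)[0, 1]
r2 = np.corrcoef(left_toe * 10, height / 100)[0, 1]  # change units: mm and m
print(r1, r2)  # identical (up to floating point): r ignores the scale of each variable
```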
 
Last edited:
  • Like
Likes Dale and Agent Smith
  • #41
The words "standardized" and "normalized" get tossed around quite a bit in these discussions, rather like "derivative" and "integral" get tossed around in analysis: the words can mean different procedures while describing things that are quite similar in concept.

Think about the correlation coefficient you have posted. The expression $$\frac 1 n \sum{\left(x-\overline{x}\right)\left(y-\overline{y}\right)}$$ is the covariance of the variables, and its size depends on the units of both variables, so it can be arbitrarily large or small, depending on the scales of those variables.

Remember that the (biased) standard deviation for x is $$\sqrt{\frac 1 n \sum{\left(x-\overline{x}\right)^2}}$$, with a similar expression for y. The correlation coefficient is the covariance divided by the product of the two standard deviations: this has the effect of cancelling out the units of both variables so that correlation is a pure number. That process is usually referred to as normalization, but unfortunately standardization is also used.

How does it ensure correlation is between -1 and 1? It comes from the Cauchy-Schwarz inequality, which says $$\left(\sum{ab}\right)^2 \le \left(\sum{a^2}\right)\left(\sum{b^2}\right).$$ If you divide both sides by the right-hand side you get $$\frac{\left(\sum{ab}\right)^2}{\left(\sum{a^2}\right)\left(\sum{b^2}\right)} \le 1,$$ and taking square roots gives the ##-1 \le r \le 1## part. Use ##x - \overline{x}## for ##a## and ##y - \overline y## for ##b##.
 
  • Like
Likes Agent Smith and FactChecker
  • #42
What is this?
[attached screenshot]
 
  • #43
  • Like
Likes Agent Smith