Standard deviation is wrong whatever I do?

Click For Summary

Homework Help Overview

The discussion revolves around the calculation of standard deviation in a weighted dataset. Participants are examining discrepancies between their manual calculations and results obtained from Excel, particularly focusing on the mean and standard deviation values derived from their data.

Discussion Character

  • Exploratory, Assumption checking, Mathematical reasoning

Approaches and Questions Raised

  • Participants discuss their calculations of standard deviation and mean, questioning the accuracy of their results. Some suggest that rounding errors may be influencing the outcomes, while others explore the implications of using weights in their calculations.

Discussion Status

The discussion is active, with participants sharing their calculations and identifying potential errors. Some have provided guidance on how to correctly apply weights in Excel, while others are clarifying the correct denominator to use in the standard deviation formula. Multiple interpretations of the calculations are being explored without a clear consensus.

Contextual Notes

There are mentions of confusion regarding the number of observations versus the sum of weights in the context of calculating standard deviation. Participants are also reflecting on how to accurately represent uncertainty in their results.

santoki
Messages
34
Reaction score
0
0AcVc67.jpg


After filling out the table, I used it to finally calculate the standard deviation as 0.105. Just to make sure, I ran it through Microsoft Excel and got 0.942809042. I put two together and figured I'm doing something wrong after the xw column because Excel agrees that the mean is also 1.7.

Sample calculations:
d = x - <x>
d = (1.50 m) - (1.7 m)
d = -0.2 m

d2 = (-0.2 m)2 = 0.04

wd2 = (1)(0.04) = 0.04

SD = [itex]\sqrt{[0.04+0.0225+0.09+0.0075+0+0+0.02+0+0.04]/(21-1)}[/itex]
SD = 0.1048808848
 
Physics news on Phys.org
If you take the weights seriously, you end up with ##{\Sigma x_i w_i \over \Sigma w_i }= 1.652##
 
I think your problem is rounding error. 1.7 is not a sufficiently accurate value for the mean.
 
  • Like
Likes   Reactions: 1 person
In practice, we avoid having to correct all di (in case of a change or correction) by keeping track of x and x2 and later using
< (x - <x>)2 > = <x2> - 2 < x <x> > + (<x>)2= <x2> - 2 <x><x> > + (<x>)2 = <x2> - <x>2
 
changed mean as suggested

L2MTD6y.png


Actually it wasn't just the mean, I didn't round anything till I solved for the standard deviation. Though what I ended up with was in the hundredths place!

SD = [itex]\sqrt{0.172381853/20}[/itex]
SD = 0.0928390686

I really don't understand what kind of math error I'm doing or if I'm missing something?
 
santoki said:
SD = 0.0928390686
That's the right answer (now). You must have made a mistake in how you put it into Excel.
 
haruspex said:
That's the right answer (now). You must have made a mistake in how you put it into Excel.

I've rechecked my data on Excel and it matched up. I've only highlighted the x and w columns to make a descriptive statistics analysis on Excel and came up with that.

Do you have any idea what could have caused the two answers to be a decimal place away from each other?
 
[strike]Ah, the denominator in the square root for SD is not the sum of the weights -1, but the number of observations - 1, so 8. Then you get 0.147, so you report <x> = 1.65 +/- 0.05 with a standard deviation of 0.15[/strike]

Edited: no, the ∑w -1 should be in the denominator.

I had a hard time finding the smaller discrepancies, but: there seems to be a 0 missing in the d for x=1.65.
Code:
		x	w	xw		d	d^2		wd^2
							
		1.5	1	1.5	-0.152380952	0.023219955	0.023219955
		1.55	1	1.55	-0.102380952	0.010481859	0.010481859
		1.6	9	14.4	-0.052380952	0.002743764	0.024693878
		1.65	3	4.95	[B]-0.002380952[/B]	0.000005669	0.000017007
		1.7	4	6.8	0.047619048	0.002267574	0.009070295
		1.75	0	0	0.097619048	0.009529478	0.000000000
		1.8	2	3.6	0.147619048	0.021791383	0.043582766
		1.85	0	0	0.197619048	0.039053288	0.000000000
		1.9	1	1.9	0.247619048	0.061315193	0.061315193
							
	Sum	15.3	21	34.7	0.428571429	0.170408163	0.172380952
							
	XL mean	1.70					
	XL stdev	0.136930639
					
weighted actual
	<x>		[B]1.652[/B]		(from 34.7/21)
[strike]	SD		0.147		(from sqrt( 0.17238/(9-1) )[/strike]
	SD		0.09283		(from sqrt( 0.17238/([b]20[/b]-1) )
Edited: Sorry to make a mess of things. Now I really have to think what the best estimate for the uncertainty in the mean is... probably sd/√∑w = 0.0202, so you would still report something like

1.65 +/- 0.02 with σ = 0.09
 
Last edited:
BvU said:
Ah, the denominator inder the square root for SD is not the sum of the weights -1, but the number of observations - 1, so 8.[/code]

I checked my calculation for x = 1.65 and you were right, I three zeroes instead of two. My last question is that why did the N (21) from N = ∑w and <x> = (∑xw)/N changed to the N (9) in the SD formula. Wouldn't that be unclear for students and would all make the same mistake of using 21 as N for each column?
 
  • #10
So your SD 0.093 was correct. As Haruspex already stated. :redface: I was so embarrassed I forgot the ##\Sigma##w that I edited my last post. Somewhat confusing, I'm sorry to say.

Your 3 zeros instead of 2 surprised me: why not let XL do the work ?

Then: why XL Descriptive statistics doesn't do what you expect: well for the height because it isn't weighed. For the frequency it calculates the average weight (21/9) which isn't what you want. I don't know how to seduce XL to take the weights into account as weights, so I had to trick it by simply typing 1.6 nine times, 1.65 three times, etc. Then Descriptive statistics does do the remainder of the work:

Code:
Test	
	
Mean			1.652380952
Standard Error		0.020259093
Median			1.6
Mode			1.6
Standard Deviation	0.092838826
Sample Variance		0.008619048
...
Sum	34.7
Count	21
 
  • #11
BvU said:
why XL Descriptive statistics doesn't do what you expect: well for the height because it isn't weighed. For the frequency it calculates the average weight (21/9) which isn't what you want. I don't know how to seduce XL to take the weights into account as weights, so I had to trick it by simply typing 1.6 nine times, 1.65 three times, etc. Then Descriptive statistics does do the remainder of the work[/code]

I think I get what you mean now that I can compare two different outcomes. Thank you very much for your help!
 

Similar threads

  • · Replies 4 ·
Replies
4
Views
3K
  • · Replies 5 ·
Replies
5
Views
2K
Replies
8
Views
4K
  • · Replies 4 ·
Replies
4
Views
2K
Replies
1
Views
3K
  • · Replies 3 ·
Replies
3
Views
4K
  • · Replies 14 ·
Replies
14
Views
8K
  • · Replies 7 ·
Replies
7
Views
3K
  • · Replies 14 ·
Replies
14
Views
3K
  • · Replies 18 ·
Replies
18
Views
3K