'Reverse' derive correct dataset when given mean + std dev

fieldmusic123
Messages
7
Reaction score
0
If I know the mean and standard deviation, but I don't have an original data set, is it possible to reverse calculate a data set that has this known mean and std dev? Is there an excel function I could use to do this easily? I know there could be multiple correct datasets and that's fine, I'm just wondering if it's possible to reverse derive a correct one that has the known mean and std dev.

So, for a MEAN = 100 and STD. DEV= 15, can anyone tell me how to calculate a data set with n=3?

Many thanks.
 
Physics news on Phys.org
Yes, but there are often other properties that you want your data set to have (always positive, from a uniform, normal, or Poison process, etc.) So you usually want more than that. The RAND() EXCEL function gives a number from a uniform distribution between 0 and 1. It has a mean of 0.5 and a variance of 1/12. So multiply and add to get your desired SD and mean. To get a number from a normal distribution, combine RAND and NORMINV ( NORMINV(RAND(), desired_mean, desired_sd) )
 
  • Like
Likes fieldmusic123
FactChecker said:
Yes, but there are often other properties that you want your data set to have (always positive, from a uniform, normal, or Poison process, etc.) So you usually want more than that. The RAND() EXCEL function gives a number from a uniform distribution between 0 and 1. It has a mean of 0.5 and a variance of 1/12. So multiply and add to get your desired SD and mean. To get a number from a normal distribution, combine RAND and NORMINV ( NORMINV(RAND(), desired_mean, desired_sd) )

Thanks for your reply. I used " NORMINV(RAND(), 100, 15) " in Excel to generate a set of random numbers that look like they would individually fall within a normal distribution for a mean of 100. However, these values, when combined, do not have a mean of 100 (eg. 80, 93, 102). Is there a way I could perhaps use Excel to modify this data set to force it to have a mean of 100 and std dev of 15? Apologies if I am missing something in your answer.
 
Call your three data points x, y, and z. Then the mean is (x+ y+ z)/3= 100 so x+ y+ z= 300. The variance is (x- 100)^2+ (y- 100)^2+ (z- 100)^2= x^2+ y^2+ z^2- 200(x+ y+ z)+ 30000= 225. That gives two equations to solve for the three unknown values. There are infinite number of such collections of data points. If you assume a normal distribution, which depends only on the two parameters, mean and standard deviation, then, of course, you can do this.
 
  • Like
Likes fieldmusic123
fieldmusic123 said:
Is there a way I could perhaps use Excel to modify this data set to force it to have a mean of 100 and std dev of 15?
Yes, you could scale it and add an offset (two independent variables) to satisfy the two constraints. But note that the expected value of the sd of a sample is smaller than the sd of the population. For a sample size of 3 a correction factor of approximately ## \sqrt{\frac 23} ## is appropriate if I remember correctly.
 
  • Like
Likes fieldmusic123
MrAnchovy said:
Yes, you could scale it and add an offset (two independent variables) to satisfy the two constraints. But note that the expected value of the sd of a sample is smaller than the sd of the population. For a sample size of 3 a correction factor of approximately ## \sqrt{\frac 23} ## is appropriate if I remember correctly.

Thanks for your reply.

I used NORMINV(RAND(), 100, 15) in Excel to generate 80, 93, 102. Could you please show me how to scale and add a suitable offset as I'm kind of lost with this?

Huge thanks.
 
Use the formulas for the mean and standard deviation. So if you scale x, y and z by k and add an offset a, the mean is ## \frac{kx+a + ky+a + kz+a}3 = \frac k3 (x+y+z) + a ##. Use the appropriate formula for the sd as well and substitute in your numerical values and you will have two simultaneous equations that are easily solved for k and a. I'm not sure the end result has any statistical merit though.
 
  • Like
Likes fieldmusic123
fieldmusic123 said:
I used NORMINV(RAND(), 100, 15) in Excel to generate 80, 93, 102. Could you please show me how to scale and add a suitable offset as I'm kind of lost with this?
Oh. I misunderstood.
1) Calculate your sample mean sm = (80+93+102)/3 = 91.6666666666667 and sample standard deviation ssd;
2) Change each data point Xi to X'i = (15/ssd)(Xi - sm) + 100

That should have the correct sample mean and standard deviation.
Note: the standard deviation of the data will have the specified sample standard deviation. If you want to match the true standard deviation, multiply ssd by ((n-1)/n)0.5, where n is the number of data points. It is not clear to me which value would be more appropriate.
 
  • Like
Likes fieldmusic123
FactChecker said:
If you want to match the true standard deviation, multiply ssd by ((n-1)/n)0.5, where n is the number of data points.
Are you sure this is an accurate correction factor for n = 3?
 
  • Like
Likes fieldmusic123 and FactChecker
  • #10
MrAnchovy said:
Are you sure this is an accurate correction factor for n = 3?
No, I was wrong. Thanks for the correction. The standard equation for the sample variance (dividing by n-1) is an unbiased estimator for the distribution variance. Taking the square root introduces a bias, so the standard equation for the sample standard deviation (still dividing by n-1) is a biased estimator. But it underestimates σ, so multiplying it by ((n-1)/n)1/2 would make it even worse. (see https://en.wikipedia.org/?title=Unbiased_estimation_of_standard_deviation )
 
  • Like
Likes fieldmusic123
  • #11
Thanks for you help guys, now I understand
 
Back
Top