'Reverse' derive correct dataset when given mean + std dev

  • Context: Undergrad 
  • Thread starter Thread starter fieldmusic123
  • Start date Start date
  • Tags Tags
    Derive Mean Reverse
Click For Summary

Discussion Overview

The discussion revolves around the feasibility of reverse calculating a dataset given its mean and standard deviation, specifically focusing on generating a dataset with a mean of 100 and a standard deviation of 15 using Excel. Participants explore various methods and considerations related to the properties of the dataset.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant inquires about the possibility of deriving a dataset from known mean and standard deviation, acknowledging that multiple datasets could satisfy these conditions.
  • Another participant suggests using the Excel functions RAND() and NORMINV to generate numbers from uniform and normal distributions, respectively, to achieve the desired mean and standard deviation.
  • There is a discussion about the need for additional properties in the dataset, such as positivity or adherence to specific distributions.
  • One participant provides a mathematical approach to derive three data points that meet the mean and standard deviation requirements, indicating that there are infinite combinations possible.
  • Another participant mentions the possibility of scaling and adding an offset to adjust the generated dataset to meet the specified mean and standard deviation.
  • There is a correction regarding the use of a correction factor for the standard deviation when working with small sample sizes, with some participants expressing uncertainty about its appropriateness.
  • A later reply clarifies the method for adjusting the dataset to achieve the correct mean and standard deviation, including a formula for scaling and offsetting the data points.
  • One participant questions the accuracy of the correction factor for a sample size of 3, leading to a discussion about the bias in standard deviation estimations.

Areas of Agreement / Disagreement

Participants generally agree that it is possible to generate a dataset with the specified mean and standard deviation, but there are multiple methods and considerations involved. Disagreement exists regarding the appropriate correction factor for standard deviation in small samples, and the discussion remains unresolved on this point.

Contextual Notes

Limitations include the assumptions made about the distribution of the data and the potential biases introduced when estimating standard deviation from small samples.

fieldmusic123
Messages
7
Reaction score
0
If I know the mean and standard deviation, but I don't have an original data set, is it possible to reverse calculate a data set that has this known mean and std dev? Is there an excel function I could use to do this easily? I know there could be multiple correct datasets and that's fine, I'm just wondering if it's possible to reverse derive a correct one that has the known mean and std dev.

So, for a MEAN = 100 and STD. DEV= 15, can anyone tell me how to calculate a data set with n=3?

Many thanks.
 
Physics news on Phys.org
Yes, but there are often other properties that you want your data set to have (always positive, from a uniform, normal, or Poison process, etc.) So you usually want more than that. The RAND() EXCEL function gives a number from a uniform distribution between 0 and 1. It has a mean of 0.5 and a variance of 1/12. So multiply and add to get your desired SD and mean. To get a number from a normal distribution, combine RAND and NORMINV ( NORMINV(RAND(), desired_mean, desired_sd) )
 
  • Like
Likes   Reactions: fieldmusic123
FactChecker said:
Yes, but there are often other properties that you want your data set to have (always positive, from a uniform, normal, or Poison process, etc.) So you usually want more than that. The RAND() EXCEL function gives a number from a uniform distribution between 0 and 1. It has a mean of 0.5 and a variance of 1/12. So multiply and add to get your desired SD and mean. To get a number from a normal distribution, combine RAND and NORMINV ( NORMINV(RAND(), desired_mean, desired_sd) )

Thanks for your reply. I used " NORMINV(RAND(), 100, 15) " in Excel to generate a set of random numbers that look like they would individually fall within a normal distribution for a mean of 100. However, these values, when combined, do not have a mean of 100 (eg. 80, 93, 102). Is there a way I could perhaps use Excel to modify this data set to force it to have a mean of 100 and std dev of 15? Apologies if I am missing something in your answer.
 
Call your three data points x, y, and z. Then the mean is (x+ y+ z)/3= 100 so x+ y+ z= 300. The variance is (x- 100)^2+ (y- 100)^2+ (z- 100)^2= x^2+ y^2+ z^2- 200(x+ y+ z)+ 30000= 225. That gives two equations to solve for the three unknown values. There are infinite number of such collections of data points. If you assume a normal distribution, which depends only on the two parameters, mean and standard deviation, then, of course, you can do this.
 
  • Like
Likes   Reactions: fieldmusic123
fieldmusic123 said:
Is there a way I could perhaps use Excel to modify this data set to force it to have a mean of 100 and std dev of 15?
Yes, you could scale it and add an offset (two independent variables) to satisfy the two constraints. But note that the expected value of the sd of a sample is smaller than the sd of the population. For a sample size of 3 a correction factor of approximately ## \sqrt{\frac 23} ## is appropriate if I remember correctly.
 
  • Like
Likes   Reactions: fieldmusic123
MrAnchovy said:
Yes, you could scale it and add an offset (two independent variables) to satisfy the two constraints. But note that the expected value of the sd of a sample is smaller than the sd of the population. For a sample size of 3 a correction factor of approximately ## \sqrt{\frac 23} ## is appropriate if I remember correctly.

Thanks for your reply.

I used NORMINV(RAND(), 100, 15) in Excel to generate 80, 93, 102. Could you please show me how to scale and add a suitable offset as I'm kind of lost with this?

Huge thanks.
 
Use the formulas for the mean and standard deviation. So if you scale x, y and z by k and add an offset a, the mean is ## \frac{kx+a + ky+a + kz+a}3 = \frac k3 (x+y+z) + a ##. Use the appropriate formula for the sd as well and substitute in your numerical values and you will have two simultaneous equations that are easily solved for k and a. I'm not sure the end result has any statistical merit though.
 
  • Like
Likes   Reactions: fieldmusic123
fieldmusic123 said:
I used NORMINV(RAND(), 100, 15) in Excel to generate 80, 93, 102. Could you please show me how to scale and add a suitable offset as I'm kind of lost with this?
Oh. I misunderstood.
1) Calculate your sample mean sm = (80+93+102)/3 = 91.6666666666667 and sample standard deviation ssd;
2) Change each data point Xi to X'i = (15/ssd)(Xi - sm) + 100

That should have the correct sample mean and standard deviation.
Note: the standard deviation of the data will have the specified sample standard deviation. If you want to match the true standard deviation, multiply ssd by ((n-1)/n)0.5, where n is the number of data points. It is not clear to me which value would be more appropriate.
 
  • Like
Likes   Reactions: fieldmusic123
FactChecker said:
If you want to match the true standard deviation, multiply ssd by ((n-1)/n)0.5, where n is the number of data points.
Are you sure this is an accurate correction factor for n = 3?
 
  • Like
Likes   Reactions: fieldmusic123 and FactChecker
  • #10
MrAnchovy said:
Are you sure this is an accurate correction factor for n = 3?
No, I was wrong. Thanks for the correction. The standard equation for the sample variance (dividing by n-1) is an unbiased estimator for the distribution variance. Taking the square root introduces a bias, so the standard equation for the sample standard deviation (still dividing by n-1) is a biased estimator. But it underestimates σ, so multiplying it by ((n-1)/n)1/2 would make it even worse. (see https://en.wikipedia.org/?title=Unbiased_estimation_of_standard_deviation )
 
  • Like
Likes   Reactions: fieldmusic123
  • #11
Thanks for you help guys, now I understand
 

Similar threads

  • · Replies 5 ·
Replies
5
Views
3K
  • · Replies 29 ·
Replies
29
Views
3K
  • · Replies 7 ·
Replies
7
Views
2K
  • · Replies 4 ·
Replies
4
Views
3K
  • · Replies 6 ·
Replies
6
Views
4K
  • · Replies 1 ·
Replies
1
Views
1K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 6 ·
Replies
6
Views
2K
  • · Replies 23 ·
Replies
23
Views
4K