'Reverse' derive correct dataset when given mean + std dev

fieldmusic123 · Jun 17, 2015

If I know the mean and standard deviation, but I don't have an original data set, is it possible to reverse calculate a data set that has this known mean and std dev? Is there an excel function I could use to do this easily? I know there could be multiple correct datasets and that's fine, I'm just wondering if it's possible to reverse derive a correct one that has the known mean and std dev.

So, for a MEAN = 100 and STD. DEV= 15, can anyone tell me how to calculate a data set with n=3?

Many thanks.

FactChecker · Jun 17, 2015

Yes, but there are often other properties that you want your data set to have (always positive, from a uniform, normal, or Poison process, etc.) So you usually want more than that. The RAND() EXCEL function gives a number from a uniform distribution between 0 and 1. It has a mean of 0.5 and a variance of 1/12. So multiply and add to get your desired SD and mean. To get a number from a normal distribution, combine RAND and NORMINV ( NORMINV(RAND(), desired_mean, desired_sd) )

fieldmusic123 · Jun 17, 2015

FactChecker said:

Yes, but there are often other properties that you want your data set to have (always positive, from a uniform, normal, or Poison process, etc.) So you usually want more than that. The RAND() EXCEL function gives a number from a uniform distribution between 0 and 1. It has a mean of 0.5 and a variance of 1/12. So multiply and add to get your desired SD and mean. To get a number from a normal distribution, combine RAND and NORMINV ( NORMINV(RAND(), desired_mean, desired_sd) )

Thanks for your reply. I used " NORMINV(RAND(), 100, 15) " in Excel to generate a set of random numbers that look like they would individually fall within a normal distribution for a mean of 100. However, these values, when combined, do not have a mean of 100 (eg. 80, 93, 102). Is there a way I could perhaps use Excel to modify this data set to force it to have a mean of 100 and std dev of 15? Apologies if I am missing something in your answer.

HallsofIvy · Jun 17, 2015

Call your three data points x, y, and z. Then the mean is (x+ y+ z)/3= 100 so x+ y+ z= 300. The variance is (x- 100)^2+ (y- 100)^2+ (z- 100)^2= x^2+ y^2+ z^2- 200(x+ y+ z)+ 30000= 225. That gives two equations to solve for the three unknown values. There are infinite number of such collections of data points. If you assume a normal distribution, which depends only on the two parameters, mean and standard deviation, then, of course, you can do this.

pbuk · Jun 17, 2015

fieldmusic123 said:

Is there a way I could perhaps use Excel to modify this data set to force it to have a mean of 100 and std dev of 15?

Yes, you could scale it and add an offset (two independent variables) to satisfy the two constraints. But note that the expected value of the sd of a sample is smaller than the sd of the population. For a sample size of 3 a correction factor of approximately ## \sqrt{\frac 23} ## is appropriate if I remember correctly.

fieldmusic123 · Jun 17, 2015

MrAnchovy said:

Yes, you could scale it and add an offset (two independent variables) to satisfy the two constraints. But note that the expected value of the sd of a sample is smaller than the sd of the population. For a sample size of 3 a correction factor of approximately ## \sqrt{\frac 23} ## is appropriate if I remember correctly.

Thanks for your reply.

I used NORMINV(RAND(), 100, 15) in Excel to generate 80, 93, 102. Could you please show me how to scale and add a suitable offset as I'm kind of lost with this?

Huge thanks.

pbuk · Jun 17, 2015

Use the formulas for the mean and standard deviation. So if you scale x, y and z by k and add an offset a, the mean is ## \frac{kx+a + ky+a + kz+a}3 = \frac k3 (x+y+z) + a ##. Use the appropriate formula for the sd as well and substitute in your numerical values and you will have two simultaneous equations that are easily solved for k and a. I'm not sure the end result has any statistical merit though.

FactChecker · Jun 17, 2015

fieldmusic123 said:

I used NORMINV(RAND(), 100, 15) in Excel to generate 80, 93, 102. Could you please show me how to scale and add a suitable offset as I'm kind of lost with this?

Oh. I misunderstood.
1) Calculate your sample mean sm = (80+93+102)/3 = 91.6666666666667 and sample standard deviation ssd;
2) Change each data point X_i to X'_i = (15/ssd)(X_i - sm) + 100

That should have the correct sample mean and standard deviation.
Note: the standard deviation of the data will have the specified sample standard deviation. If you want to match the true standard deviation, multiply ssd by ((n-1)/n)^0.5, where n is the number of data points. It is not clear to me which value would be more appropriate.

pbuk · Jun 17, 2015

FactChecker said:

If you want to match the true standard deviation, multiply ssd by ((n-1)/n)^0.5, where n is the number of data points.

Are you sure this is an accurate correction factor for n = 3?

FactChecker · Jun 17, 2015

MrAnchovy said:

Are you sure this is an accurate correction factor for n = 3?

No, I was wrong. Thanks for the correction. The standard equation for the sample variance (dividing by n-1) is an unbiased estimator for the distribution variance. Taking the square root introduces a bias, so the standard equation for the sample standard deviation (still dividing by n-1) is a biased estimator. But it underestimates σ, so multiplying it by ((n-1)/n)^1/2 would make it even worse. (see https://en.wikipedia.org/?title=Unbiased_estimation_of_standard_deviation )

fieldmusic123 · Jun 18, 2015

Thanks for you help guys, now I understand

'Reverse' derive correct dataset when given mean + std dev

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Graduate Hypothesis testing: Defining H0, HA hypotheses so that ( H_A)_A' makes sense

Undergrad My basic understanding of set theory

Undergrad The problem of points

Graduate Expected numbers of cards of a last color remaining

Undergrad How does axiom of foundation prevent infinite sequence of elements?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect