Deriving conclusion from a small data set

musicgold · Nov 4, 2013

Hi

Please see the table below, which shows GDP growth rates for a group of 5 countries. I am trying to derive some conclusions from this small sample.

1. Can I conclude that the countries in the group have very similar growth rates and that there is no significant difference between their growth rates?

2. As shown, the highest growth rate is just one std. dev. higher than the sample average. However, I am not sure whether the std. dev. of such a small sample means anything.

3. What would I need to do to be able to say that there is no statistically significant difference between these growth rates?

Thanks.

--------------------------------
Country GDP growth rates
A ---- 3.3%
B ---- 3.0%
C ---- 2.9%
D ---- 2.8%
E ---- 2.4%

Average 2.9%
Std. dev 0.3%
Range 0.9%
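
For reference, the summary figures in the table can be reproduced with a short Python sketch. The variable names are illustrative, and the post does not say whether the quoted 0.3% is the sample (n - 1) or population (n) standard deviation, so both are computed here; either one rounds to 0.3.

```python
import statistics

# GDP growth rates from the table, in percent.
rates = {"A": 3.3, "B": 3.0, "C": 2.9, "D": 2.8, "E": 2.4}
values = list(rates.values())

mean = statistics.mean(values)
sample_sd = statistics.stdev(values)       # divides by n - 1
population_sd = statistics.pstdev(values)  # divides by n
value_range = max(values) - min(values)

# How many sample standard deviations the highest rate sits above the mean.
z_highest = (max(values) - mean) / sample_sd

print(f"mean          = {mean:.2f}%")
print(f"sample sd     = {sample_sd:.2f} pp")
print(f"population sd = {population_sd:.2f} pp")
print(f"range         = {value_range:.1f} pp")
print(f"highest rate is {z_highest:.2f} sample sd above the mean")
```

With only five values, either standard deviation is itself a rough estimate, which is the concern raised in question 2.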

Simon Bridge · Nov 4, 2013

I imagine each of the individual figures has its own uncertainty, based on how it was calculated and the data used to do so.

The std.dev. of the mean is not the same as the std.dev. of the dataset.

If this were physics, then I'd take the uncertainty on each quoted rate as 0.1pp (percentage points)
Therefore the uncertainty on the mean will be about 0.04 pp ... so they are all significantly off the mean growth.
But so what?

Does country E have the same growth rate as country A?
Their difference is 0.7pp ... that's outside 2x0.3 right?

musicgold · Nov 5, 2013

Thanks Simon.

Simon Bridge said:

The std.dev. of the mean is not the same as the std.dev. of the dataset.

I am not sure what you mean by 'std dev of the mean'. Do you mean 'std dev. of the sample' or 'std error'?

Simon Bridge said:

If this were physics, then I'd take the uncertainty on each quoted rate as 0.1pp (percentage points)
Therefore the uncertainty on the mean will be about 0.04 pp ... so they are all significantly off the mean growth.

This is not clear to me. I mean I know what a pp is, but I don't understand how the uncertainty of the mean will be 0.04pp.

Simon Bridge · Nov 5, 2013

musicgold said:

Thanks Simon.

I am not sure what you mean by 'std dev of the mean'. Do you mean 'std dev. of the sample' or 'std error'?

The std.dev. of the mean is the std.error of the value you got for the mean.
When each of the values in a data set has an uncertainty, the mean of those values will also be uncertain.
Recall "hypothesis testing" in statistics?

This is not clear to me. I mean I know what a pp is, but I don't understand how the uncertainty of the mean will be 0.04pp.

That is the uncertainty on one value divided by the square root of the number of terms.
But I don't think the mean of the values is going to tell you what you want to know.

Consider: if you only had two values, how would you see if they were significantly similar?
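
The arithmetic behind the 0.04 pp figure can be sketched in Python as follows. The 0.1 pp per-value uncertainty is the assumption made earlier in the thread, not a measured quantity, and the variable names are illustrative. For comparison, the sketch also computes the standard error obtained from the scatter of the five values themselves, which is a different quantity and answers a different question.

```python
import math
import statistics

values = [3.3, 3.0, 2.9, 2.8, 2.4]  # growth rates in percent
n = len(values)

# Assumed measurement uncertainty on each quoted rate, in percentage points.
sigma_each = 0.1

# Uncertainty on the mean when every value carries the same independent
# uncertainty: sigma / sqrt(n).
sem_from_measurement = sigma_each / math.sqrt(n)

# Standard error estimated from the spread of the sample instead:
# sample standard deviation / sqrt(n).
sem_from_scatter = statistics.stdev(values) / math.sqrt(n)

print(f"sigma_each / sqrt(n) = {sem_from_measurement:.3f} pp")
print(f"sample sd / sqrt(n)  = {sem_from_scatter:.3f} pp")
```

The first number describes how well the mean is known if the only noise is measurement error on each rate; the second treats the five countries as draws from a common population.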

Office_Shredder · Nov 5, 2013

There are two distinct questions that you can be asking:

1.) I measured five data points, but it's impossible to measure the GDP of a nation with 100% accuracy. Are these measurements statistically different given the noise in the measurements?

2.) I have good measurements, but a nation's GDP growth can be affected by temporary effects. Are these five economies growing at significantly different rates, statistically speaking?

Which one are you looking for?

musicgold · Nov 11, 2013

Office_Shredder said:

There are two distinct questions that you can be asking:

1.) I measured five data points, but it's impossible to measure the GDP of a nation with 100% accuracy. Are these measurements statistically different given the noise in the measurements?

2.) I have good measurements, but a nation's GDP growth can be affected by temporary effects. Are these five economies growing at significantly different rates, statistically speaking?

Which one are you looking for?

Thanks. I think I am looking for the first one. How should I figure that out?

Simon Bridge · Nov 11, 2013

E.g., let's say you have two data points ##x\pm\sigma_x## and ##y\pm\sigma_y## and you want to know if they are statistically different from each other.

That would be like asking if the difference ##x-y## is within some confidence interval of zero.

You know how to do "hypothesis testing" right?

In your case, you need some way of estimating the statistical uncertainty in each individual measurement.
A common estimator is to take half the smallest quoted place-value in the measurement.
i.e. A: (3.3±0.05)%

You also have more than two data points.
It is possible that countries A and B have similar enough growth to be statistically the same but A and E are statistically different.
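
A minimal sketch of that pairwise comparison in Python is given below. It rests on several assumptions that are not in the original data: every rate carries the same uncertainty of 0.05 pp (half the last quoted digit), the uncertainties are independent so they add in quadrature, and "statistically different" means the difference exceeds 2 combined standard deviations. The function name `compare` and the threshold are illustrative choices.

```python
import itertools
import math

# GDP growth rates from the table, in percent.
rates = {"A": 3.3, "B": 3.0, "C": 2.9, "D": 2.8, "E": 2.4}

sigma = 0.05      # assumed uncertainty on each rate, in percentage points
threshold = 2.0   # assumed cut-off, in units of the combined uncertainty


def compare(x, y, sigma_x, sigma_y):
    """Return the difference x - y, its uncertainty, and their ratio.

    Assumes the two uncertainties are independent, so they add in quadrature.
    """
    diff = x - y
    sigma_diff = math.hypot(sigma_x, sigma_y)
    return diff, sigma_diff, abs(diff) / sigma_diff


indistinguishable = []
for (a, x), (b, y) in itertools.combinations(rates.items(), 2):
    diff, sigma_diff, z = compare(x, y, sigma, sigma)
    different = z > threshold
    if not different:
        indistinguishable.append((a, b))
    verdict = "different" if different else "not distinguishable"
    print(f"{a}-{b}: diff = {diff:+.1f} pp, z = {z:.2f} -> {verdict}")

print("pairs within the threshold:", indistinguishable)
```

Under these particular assumptions only pairs that differ by 0.1 pp fall within the threshold. A larger assumed uncertainty on each rate would make more pairs indistinguishable, so the conclusion depends heavily on how the measurement uncertainty is estimated.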
