I decided to learn statistics because of my interest in Climate Change. In a 2010 BBC interview, a prominent climate scientist admitted that the warming trend from 1995 to “the present” was “not statistically significant”. This led to headlines saying that warming since 1995 was “insignificant”. Now that you have taken this class you should know that in statistics, the term “statistically significant” has a special meaning, and “not statistically significant” does not mean “insignificant”. It means the evidence does not meet the 95% confidence level. The headlines were false. Furthermore, collecting additional evidence may increase our confidence level. In fact, one year later the trend reached the 95% confidence level, so even the correctly stated claim was no longer true.
For this project I wanted to collect some temperature data and see what I could find. I wanted a sample that represented the whole US. There are 3,144 counties and county equivalents in the US. My initial plan was to randomly pick counties from this list and collect the average temperature for each in the years 1964 and 2014. There are two things you should notice immediately about this plan. First, it only includes the US, which will not tell us much about the rest of the world. Second, it only compares two years, and to identify a long-term trend we would want to compare data over a period of many years.
It turns out that you can’t look up average annual temperature by county. Each county has multiple weather stations, and furthermore weather stations come and go. Because I wanted to compare the data for the same location from 1964 and 2014, I needed dependent samples. So I ended up looking for one weather station in each county that was present in both 1964 and 2014. This is a form of stratified sampling.
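For anyone who wants to try the selection step themselves, here is a minimal sketch of one way it could be done. It is not necessarily how the stations in this project were chosen. The function name `pick_stations`, the county and station names, and the layout of the records are all made up for illustration, and the sketch assumes that counties with no station covering both years are simply dropped before the random pick.

```python
import random

def pick_stations(stations_by_county, n_counties, seed=None):
    """Randomly pick counties, then one station per county that has both years."""
    rng = random.Random(seed)
    eligible = {}
    for county, stations in stations_by_county.items():
        # Keep only stations that reported in both 1964 and 2014
        both = [name for name, years in stations.items() if {1964, 2014} <= years]
        if both:
            eligible[county] = both
    # Assumption: counties without an eligible station are excluded up front
    counties = rng.sample(sorted(eligible), min(n_counties, len(eligible)))
    return {county: rng.choice(eligible[county]) for county in counties}

# Hypothetical records: county -> {station name -> set of years with data}
stations_by_county = {
    "County A": {"A1": {1964, 2014}, "A2": {2014}},
    "County B": {"B1": {1964}},
    "County C": {"C1": {1964, 2014}, "C2": {1964, 2014}},
}
picked = pick_stations(stations_by_county, n_counties=2, seed=1)
print(picked)
```

Because each picked station supplies both a 1964 value and a 2014 value, the two samples are paired by location, which is what makes them dependent.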
Here are the locations:
[slide]
I want to point out that there are fewer dots in the Western US. This is because counties are larger in the West. Since I only picked one station in each county, the stations are spread out more in the West. I will come back to this point later.
Now that I have my data, the first thing I want to do is use the differences from the two dependent samples to test a claim about the mean of the population of all such differences.
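The arithmetic behind a test on paired differences can be sketched in a few lines. This is only an illustration: the temperatures below are invented, not the project data, and the function name `paired_t` is hypothetical. The differences are taken as 1964 minus 2014. The sketch stops at the test statistic and degrees of freedom; the p-value would still come from a t table or a calculator, as in class.

```python
import math
import statistics

def paired_t(before, after):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)        # mean of the differences
    s_d = statistics.stdev(diffs)         # sample standard deviation of the differences
    t = d_bar / (s_d / math.sqrt(n))      # test statistic for H0: mean difference = 0
    return d_bar, t, n - 1

# Invented annual averages (degrees F) for five stations, same order in both lists
temps_1964 = [52.1, 48.3, 60.2, 55.0, 49.8]
temps_2014 = [50.9, 47.5, 58.1, 54.2, 48.0]
d_bar, t, df = paired_t(temps_1964, temps_2014)
print(f"mean difference = {d_bar:.2f}, t = {t:.2f}, df = {df}")
```

A positive mean difference here means the 1964 values are higher on average, so the sign of the statistic tells you which one-tailed claim the data could support.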
[slide]
So the p-value is far too high to reject the null hypothesis. Actually, the sample mean for population 1 (average temperature from 1964) is higher than the sample mean for population 2 (average temperature from 2014). So I should be testing the opposite claim.
[slide]
So this is a surprise. It is backwards. Why is that?
Remember that I have fewer dots in the Western US. Maybe this is affecting the data. Also remember that I am only looking at 1964 and 2014. Maybe there is something special about one of those years.
[slide]
Well, it turns out there is something special about 2014. In 2014 the contiguous United States experienced extremes of both hot and cold. The West had record heat, while the states in blue were exceptionally cold. Since I have few dots in the West, most of my data is from counties in areas that were exceptionally cold.
This means that whether the average temperature at a location went up or down depends on whether it is in the West or not. The formal way to test for independence is a chi-square test on a contingency table.
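Here is a rough sketch of how a test of independence on a two-by-two table works. The counts are invented for illustration and are not the project data, and `chi_square_2x2` is a hypothetical helper name. It uses the plain chi-square statistic with no continuity correction, and the p-value shortcut is only valid for a table with one degree of freedom.

```python
import math

def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if region and direction of change were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Invented counts. Rows: West, rest of US. Columns: warmer in 2014, colder in 2014.
table = [[12, 3], [10, 35]]
chi2, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.5f}")
```

A small p-value would lead us to reject the claim that direction of change is independent of region.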
[slide]
Now I know that I should be looking at the Western states separately from the rest of the US, and I need to collect more data from the West.
I selected some counties in the West, and this time I included all the weather stations in them that were present in both 1964 and 2014, although I threw out some that were too close together. I split my data into Western states and the rest of the US. Here are the locations:
[slide]
Now I test the claim that, in the West, average temperatures from 1964 are less than average temperatures from 2014:
[slide]
We have 95% confidence that the true value of the mean of the differences in temperature is between −3.319 and −1.758 degrees F.
And I test the claim that, in the rest of the US, average temperatures from 1964 are greater than average temperatures from 2014:
[slide]
We have 95% confidence that the true value of the mean of the differences in temperature is between 1.403 and 3.418 degrees F.
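Intervals like these follow the usual recipe: the mean of the differences, plus or minus a critical t value times the standard error. Below is a minimal sketch of that recipe. The ten differences are invented, not the project data, the name `mean_diff_ci` is hypothetical, and the critical value 2.262 is the standard table value for 95% confidence with 9 degrees of freedom, so it only applies to a sample of ten.

```python
import math
import statistics

def mean_diff_ci(diffs, t_crit):
    """Confidence interval for the mean of paired differences."""
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    # Margin of error: critical value times the standard error of the mean difference
    margin = t_crit * statistics.stdev(diffs) / math.sqrt(n)
    return d_bar - margin, d_bar + margin

# Invented differences (1964 minus 2014, degrees F) for ten stations
diffs = [-2.1, -3.0, -1.8, -2.9, -2.4, -3.3, -1.9, -2.6, -2.2, -2.8]
# Assumption: t critical value for 95% confidence with 10 - 1 = 9 degrees of freedom
low, high = mean_diff_ci(diffs, t_crit=2.262)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

When the whole interval sits below zero, as in this made-up example, the 1964 values are lower than the 2014 values on average. An interval entirely above zero points the other way.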
It is claimed that Climate Change will lead to more extreme weather, higher highs and lower lows. The map of extreme temperatures seems to show that. We measure the relative spread between highs and lows with the standard deviation. Here are the histograms:
[slide]
Actually the standard deviation for the whole country did not change much between 1964 and 2014. Why is that? If you take a closer look at the data, you will see that the mean temperature of the Western states is lower than the mean temperature of the whole country, and the mean temperature of the rest of the country is higher than the mean temperature of the whole country. Since the Western states got warmer and the rest of the country got colder, the extremes canceled out. But if you look at the standard deviation within the Western states and the standard deviation within the rest of the country separately, you can see that they both went up.
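The canceling effect is easy to reproduce with toy numbers. In this sketch every temperature is invented and the groups are tiny, so it only illustrates the mechanism and says nothing about the real data. The cooler group warms by two degrees and the warmer group cools by two degrees, and both groups spread out.

```python
import statistics

# Invented annual averages (degrees F). "West" starts cooler than "Rest".
west_1964 = [48, 50, 52]
rest_1964 = [58, 60, 62]
west_2014 = [47, 52, 57]   # mean up by 2, wider spread
rest_2014 = [53, 58, 63]   # mean down by 2, wider spread

for label, west, rest in [("1964", west_1964, rest_1964),
                          ("2014", west_2014, rest_2014)]:
    sd_west = statistics.stdev(west)
    sd_rest = statistics.stdev(rest)
    sd_all = statistics.stdev(west + rest)   # both groups combined
    print(f"{label}: West {sd_west:.2f}, Rest {sd_rest:.2f}, combined {sd_all:.2f}")
```

The within-group standard deviations more than double, but the combined standard deviation barely moves, because the two group means moved toward each other while each group spread out.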
It is also claimed that Climate Change is warming the Arctic faster than anywhere else. I interpret this to mean that colder regions will warm faster. So we can see that both claims are true: within smaller regions there will be more extremes, but they will average out over larger regions.
I wanted to do a more precise analysis of the variance of the data, but the tools we have require normal distributions, and you can see in the histograms that the data is far from normal.
I enjoyed this project because it let me discover something I didn’t know before. And now that I have taken this class I have some new tools for checking the facts behind the headlines.