# Increasing monthly temperatures - don't understand the question

1. May 1, 2009

### Shukie

1. The problem statement, all variables and given/known data
I'm doing an assignment in Mathematica and I need some help understanding the question. I have a datasheet with temperatures for each month for the last 108 years:

http://www.knmi.nl/klimatologie/maandgegevens/datafiles/mndgeg_260_tg.txt [Broken]

Question 3 was:

"Determine the average temperature of each month over the period 1901-2008. Make a table with two columns and twelve rows, showing the average temperature for each month."

That was easy enough. Now comes question 4:

"Same as question three, but now we assume a linear warming. That means for each month the average temperature will be fit by the function $$t_{average,i}[x] = a_i + b_i(x -2008)$$. In this function, i goes from 1 to 12 (from january to december) and x from 1901 to 2008. Your table will now have three columns. What is the meaning of $$a_i$$ and $$b_i$$? What are their dimensions?"

This question confuses me and if it confuses you too it might be due to my shoddy translation. Anyway, what exactly am I supposed to put in the third column? I assume I will have to calculate the values of $$a_i$$ and $$b_i$$? So the third column will have twelve rows of something like: $$5 + 6(1901 - 2008)$$? If so, how do I calculate those values for $$a_i$$ and $$b_i$$?

Last edited by a moderator: May 4, 2017
2. May 1, 2009

### sylas

The temperature over a year shows a strong seasonal cycle, repeating year by year with cold in winter and warm in summer.

The average over a year is thus a bit dubious, but an average temperature for a month makes sense. You calculated that in the first question.

The followup is suggesting you don't just get one average, but a line that matches the trend. The question is oddly worded. It does not seem to be asking you to actually calculate a_i and b_i, just describe what they mean.

There are methods to calculate these numbers; just like there is a method to calculate a mean.

Try this. Plot all the temperatures for January, in sequence, from 1901 to 2008. Now get a ruler, and draw a line that gets as close as you can to as many of the points as you can. That line has an equation: you should be able to express the line in the form

t = a + b (x - 2008)

You can repeat this for each month; that's where the i subscript comes in.

You should be able to look at the equation, and figure out the units for a and b. As for what they mean; a should be easy. For b, when you get the units it might make more sense. But try it with the ruler and see what values you get for a and for b, and maybe that will help clarify what they mean.

As background, the mathematical method for actually calculating a and b, rather than guessing with a ruler lined up to the points, is called "linear regression".
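
For reference, the least-squares line has a closed form. This is a sketch in notation that is not part of the assignment: $$x_k$$ are the years, $$t_k$$ the temperatures of one month in those years, and a bar denotes the average over all years.

$$b = \frac{\sum_k (x_k - \bar{x})(t_k - \bar{t})}{\sum_k (x_k - \bar{x})^2}, \qquad a = \bar{t} - b\,(\bar{x} - 2008)$$

The second formula follows from the fact that the fitted line always passes through the point $$(\bar{x}, \bar{t})$$, so $$\bar{t} = a + b(\bar{x} - 2008)$$.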

Cheers -- sylas

Last edited by a moderator: May 4, 2017
3. May 1, 2009

### Shukie

Thank you for your answer, that clarified things a bit. Perhaps my translation was poorly done or the question was just poorly worded in the first place, but the next question asks me to calculate the mean and standard deviation of the values of $$a_i$$ and $$b_i$$, so I assume I will have to calculate them here. So basically what I want to do is perform a linear regression for each month, and that will give me twelve different values for $$a_i$$ and twelve for $$b_i$$?

Now, in Mathematica, I found an example of a linear regression for a straight line:

Regress[{2, 3, 4, 6}, x, x]

I suspect one of the x's means that the variable is x and the other x implies that it's a straight line. Should I try to modify the above example for my own function or do I have to do something completely different?

4. May 1, 2009

### sylas

Yes. The question is asking for 12 different trends, one for each month.

You should have documentation for functions in Mathematica; but that seems to be right -- except possibly the second argument should be {1, x}. I've not used it, myself. The first argument there is your y-values. The x-values are assumed to be 1, 2, 3, 4 or something like that, or you can give a vector of points with both x and y values, I think.

If you fit a regression to (1,2), (2,3), (3,4), (4,6) you get the line 0.5 + 1.3*x

That is, you estimate y (data) as a linear function of x and a constant. You can use the Mathematica function to do more than only linear regression. I don't really know how "Regress" works, but the second argument is the functions you can use, and the third are the independent variables. You've only got one variable (x). I would have thought the second argument should be {1, x}, so that you get a line as a linear combination with a constant term.
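
The line quoted above for the four points can be checked by hand. The following is a minimal Python sketch (not the Mathematica `Regress` call itself; the helper name `fit_line` is made up for illustration) that applies the ordinary least-squares formulas to (1,2), (2,3), (3,4), (4,6).

```python
# Ordinary least-squares fit of y = intercept + slope * x, standard library only.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean   # the line passes through (x_mean, y_mean)
    return intercept, slope

# The four points from the example: x is just the sequence number 1..4.
xs = [1, 2, 3, 4]
ys = [2, 3, 4, 6]

intercept, slope = fit_line(xs, ys)
print(f"y = {intercept:.1f} + {slope:.1f} * x")
```

The constant term and the slope are estimated together, which is what the "linear combination of 1 and x" wording refers to.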

I think you are on the right track.

Cheers -- sylas

PS. Actually, just x in the second argument is probably okay. I think Mathematica includes a constant term by default, unless you explicitly tell it not to with an extra argument.

PPS. See if you can make Regress use "x-2008" as the function in the second argument. The question is asking you for a linear combination of a constant and "x-2008". I can think of various ways you might do this; but keep track of what Mathematica is using for "x" as well.

Last edited: May 1, 2009
5. May 2, 2009

### Shukie

Thanks a ton for your help. I'm not really comfortable with Mathematica, and this assignment will probably determine whether I pass this year or not, so your help is very much appreciated!

I took your advice and tried to use (x-2008) as a function, and I ran the regression for the month of January. The Transpose[data]/10 gives me the temperatures for each month in a separate list. I also made a plot for clarity's sake and it seems a little steep:

http://img510.imageshack.us/img510/4840/regressr.jpg [Broken]

I think there is still a problem with this. I think x is supposed to run from 1901 to 2008, not from 1 to 108, so I need to add that into my regression, but I have no idea how. That might take care of the 'steepness'. However, is the rest okay? Because I believe I have now calculated $$a_i$$ and $$b_i$$ for January, like the question asks. Then I just need to do the rest as well.

Last edited by a moderator: May 4, 2017
6. May 2, 2009

### sylas

You are doing very well!

All that is missing is that you are supplying y values, but not the x values. So Regress is still taking the x as 1,2,3... rather than 1901,1902,1903...

You can give a vector of X-values in the first argument as well. I think it will allow you to pass a list consisting of two vectors: something like [Xlist, Ylist]

Try and see.

Alternatively, a cheat would be to use the function "x-108". Think about it. But supplying x values will be better.

I obtained exactly the same slope with regression functions in Excel. You've got that correct; and now you just need to include the years in your plot, rather than the sequence number. That is, you've got b_i. Looking at your graph you can see what the a_i is going to be.

Cheers -- sylas

7. May 2, 2009

### Shukie

I finally managed to combine the temperatures of a given month and their corresponding years into separate lists. So now I have a list:

TY = {{{1901, -0.3}, {1902, 4.6} ... {2008, 6.1}}, {{1901, -0.9}, {1902, -0.4} ...

So TY[[1]] is all the (year, temperature) pairs for the month of January, which I put into my regression:

http://img254.imageshack.us/img254/1681/regress1d.jpg [Broken]

Not a whole lot seems to have changed, except the value for $$a_i$$. This confuses me a bit, because it's now around 3.0 as opposed to the ~31.0 I found earlier. Yet the graph still looks pretty much the same. In fact, it seems completely the same. Can this be correct? (The graph isn't really part of the question as you know, but it seems like a useful tool to check my answers.)

Last edited by a moderator: May 4, 2017
8. May 2, 2009

### sylas

It's now correct. All that you needed to change was the a_i, and now you have precisely what I calculated using Excel.

Now you are in a good position to explain what the a_i and b_i actually mean, and give their units.

Previously you had 31 degrees for a_i. That's a bit too much for January in the Netherlands! 3 degrees sounds much more sensible!

The graph is exactly the same; that's what you wanted. The only difference is that the X axis is now labeled correctly, and the a_i value is properly adjusted for the new X axis labels. Previously you were using the number 108 to represent the year 2008, and so previously the x value of 2008 was actually the year 3908!

Basically, what you had before was the temperature projected 1900 years into the future. 31 degrees is what you should expect in the year 3908, assuming current trends continue unchanged for 1900 years. That won't happen. The current rate of warming is very rapid by comparison with what is normal for our planet. It won't continue like this for 2000 years; something will break long before then.
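
The effect of labelling the x axis with 1..108 instead of 1901..2008 can be reproduced with a short Python sketch. The temperatures below are made up for illustration (they are not the KNMI data), and `fit_line` is a hypothetical helper. The slope comes out the same either way, while the constant term moves by exactly 1900 times the slope.

```python
# Fit t = a + b * (x - 2008) twice: once with real years, once with 1..108.

def fit_line(xs, ys, x0):
    """Least-squares (a, b) for the model a + b * (x - x0)."""
    n = len(xs)
    us = [x - x0 for x in xs]
    u_mean = sum(us) / n
    y_mean = sum(ys) / n
    sxx = sum((u - u_mean) ** 2 for u in us)
    sxy = sum((u - u_mean) * (y - y_mean) for u, y in zip(us, ys))
    b = sxy / sxx
    a = y_mean - b * u_mean
    return a, b

years = list(range(1901, 2009))   # 1901 .. 2008
index = list(range(1, 109))       # 1 .. 108, what Regress assumes by default

# Made-up series: a slow trend plus a deterministic wiggle (NOT real data).
temps = [1.5 + 0.015 * (y - 1901) + 0.5 * (-1) ** y for y in years]

a_year, b_year = fit_line(years, temps, 2008)
a_idx, b_idx = fit_line(index, temps, 2008)

print(f"years : a = {a_year:.3f}, b = {b_year:.5f}")
print(f"index : a = {a_idx:.3f}, b = {b_idx:.5f}")
print(f"difference in a = {a_idx - a_year:.3f}, 1900 * b = {1900 * b_year:.3f}")
```

With the numbers from this thread, a slope of roughly 0.0147 degrees per year times 1900 years is about 28 degrees, which accounts for the jump from about 3 to about 31.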

This problem is touching on the whole issue of climate projections, on which all kinds of people have strong feelings. I'd love to sit in on your classes, just to see how you all talk about it! I'll resist commenting on the background science of this -- but I have written a fair bit about it in the Earth science subforum here.

Well done -- sylas

9. May 2, 2009

### Shukie

Thanks a lot for your help, you really helped me through this question. All that is left for me to do is figure out a way to do the regressions for all months without repeating the same command twelve times. One of the criteria is 'simple and elegant use of Mathematica' after all. However, perhaps that is a question for a different topic, but if anyone knows, I'd love to hear it.

Also, $$b_i$$ seems to be the average increase of the monthly temperature. Is that correct? Then, since x is dimensionless, the dimensions of both $$a_i$$ and $$b_i$$ should be the temperature in Celsius. As for $$a_i$$, my guess was that it would simply be the average temperature of a given month over the period 1901-2008, but that isn't correct. I'll have to think that one over some more.

Thanks again!

Edit: A simple Table managed to take care of my first problem.

http://img127.imageshack.us/img127/6365/regress2.jpg [Broken]

This is interesting, because if my guess about the meaning of $$b_i$$ is correct, it would appear that the temperature of some months is changing at nearly only half the rate of some others.
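
The "one regression per month without repeating the command" idea looks like this as a Python sketch. It mirrors the shape of the TY list (twelve lists of year and temperature pairs) but fills it with made-up straight-line data, so the numbers mean nothing. The names `ty`, `MONTHS` and `fit_line` are assumptions for illustration only.

```python
# One least-squares fit per month, collected into a three-column table.

def fit_line(xs, ys, x0):
    """Least-squares (a, b) for the model a + b * (x - x0)."""
    n = len(xs)
    us = [x - x0 for x in xs]
    u_mean = sum(us) / n
    y_mean = sum(ys) / n
    sxx = sum((u - u_mean) ** 2 for u in us)
    sxy = sum((u - u_mean) * (y - y_mean) for u, y in zip(us, ys))
    b = sxy / sxx
    return y_mean - b * u_mean, b

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Made-up data (NOT the KNMI file): month m is an exact straight line.
ty = [[(y, 2.0 + m + 0.01 * (m + 1) * (y - 2008)) for y in range(1901, 2009)]
      for m in range(12)]

table = []
for name, series in zip(MONTHS, ty):
    xs = [year for year, _ in series]
    ys = [temp for _, temp in series]
    a, b = fit_line(xs, ys, 2008)
    table.append((name, a, b))

for name, a, b in table:
    print(f"{name}  a = {a:6.3f}  b = {b:7.4f}")
```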

Last edited by a moderator: May 4, 2017
10. May 2, 2009

### sylas

x is not dimensionless, and so b_i does not have units of temperature.

You're right that you're wrong about a_i. Look at your plot, and see where a_i can be found.

Yes; I noticed that also! It is interesting, good observation.

There are two possible reasons for this, and with a bit of checking I'm pretty sure only one holds up. Here are two alternatives:
• There really is a difference in the trend at different months. This could happen, for example, if there was a trend to more extreme seasonal variation. Summer gets hotter, and winter heats up rather less. If this is the case, you should expect a roughly cyclic pattern in the values for b_i. A "radar" plot would be a good way to look at the b_i values!
• The trend is uncertain, because of the natural year to year variations going on all the time. You can check if this is the case by getting a bit more information from your regressions -- like the standard error on the regression slopes. If the differences between b_i in different months are of a magnitude comparable to the standard error in the estimate of b_i, then the differences are not significant, and just show up because this is a statistical estimate with noisy data. Trying to get Regress to give you information about standard errors would be a good way to check this out.
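
The standard error of a regression slope mentioned in the second alternative can be computed directly. This is a Python sketch of the textbook formula (residual variance with n - 2 degrees of freedom, divided by the spread of the x values), applied to the small four-point example from earlier in the thread rather than to the temperature data. The function name is made up.

```python
import math

def fit_with_slope_error(xs, ys):
    """Return (intercept, slope, standard error of the slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Sum of squared residuals around the fitted line.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Residual variance uses n - 2 because two parameters were fitted.
    se_b = math.sqrt(ss_res / (n - 2) / sxx)
    return a, b, se_b

a, b, se_b = fit_with_slope_error([1, 2, 3, 4], [2, 3, 4, 6])
print(f"slope = {b:.3f} +/- {se_b:.3f} (one standard error)")
```

Two monthly slopes that differ by less than about one or two of these standard errors cannot really be told apart.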

Answering this question probably goes a bit beyond your assignment requirements! But hey.

You could try the "Thread" function in Mathematica, as an alternative to the "Table" function for building up the arguments to Regress. It might be a bit more efficient/elegant; but what you've got is working very well.

Cheers -- sylas

11. May 3, 2009

### Shukie

So, $$a_i$$ is a temperature in degrees Celsius and, since x is in years, $$b_i$$ has units $$\frac{C}{T}$$ (degrees Celsius per year). Also, $$a_i$$ represents the temperature that it would be in 2008 if the temperature obeyed this linear model?

Actually, the next question is to find the mean and standard deviation of my list of $$b_i$$. I did just that and found $$B_{mean}$$ = 0.0128915 and $$B_{SD}$$ = 0.00292541. This means that only four of the slopes lie outside one standard deviation (0.00853316, 0.00917098, 0.0159241, 0.0190084) and only one lies outside two standard deviations (0.0190084).

Now the question asks me to draw conclusions about the warming. These differences are not very significant, are they? In that case I wouldn't be able to draw any conclusions.

I tried this one as well, but I couldn't get it to work. I'll try it again later.

12. May 3, 2009

### sylas

Exactly so.

Well, the number of months that lie inside or outside the limits of a number of standard deviations is not particularly interesting. The standard deviation is defined so that you would tend to get about 2/3 of your sample within a standard deviation. So four months outside a single standard deviation is just what you should expect from any sample of twelve.
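
The "about 2/3" figure can be made concrete. As a sketch, and assuming the twelve slopes behave like draws from a normal distribution (an assumption, not something established in this thread), Python's `statistics.NormalDist` gives the expected count outside one standard deviation:

```python
from statistics import NormalDist

# Fraction of a normal distribution within one standard deviation of the mean.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
inside = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

# Expected number of the twelve monthly slopes falling outside that band.
n_months = 12
expected_outside = n_months * (1.0 - inside)

print(f"fraction inside one SD : {inside:.4f}")
print(f"expected outside of 12 : {expected_outside:.2f}")
```

So roughly four of twelve outside one standard deviation is the unsurprising outcome.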

What's more relevant is the relative magnitude of the standard deviation and the mean.

Think of it this way. Suppose I just tell you that January is warming at 0.0146960473 C/year.

That's a bit too much accuracy, yes?

So the question arises: how much confidence can you place in the trend, given the data? There are actually better ways to do this, using what is called the "standard error" in a regression analysis; but the method you are trying gives you a reasonable stab at the accuracy.

The warming trend for a given month might have several contributing factors. Some of it might be due to an overall trend of higher temperatures. Some of it might be due to a trend of more or less extreme seasons. (More extreme seasons would give extra warming in summer months and reduced warming in winter months; less extreme seasons would reverse that.) And some might be just the chaotic effects of weather, which mean different months differ somewhat at random.

You can use the mean and standard deviation to give a rough idea of whether there is any significance to overall warming; and to quantify how accurately you can estimate that warming.

13. May 3, 2009

### Shukie

Thanks for your help. Unfortunately, I don't quite get it yet. I'm not sure how I can use the mean and standard deviation to see if there is any significance to the warming. Do I need to use the standard deviation of the mean? $$\frac{\sigma_b}{\sqrt{N}}$$

14. May 3, 2009

### sylas

No.... consider the mean and standard deviation of your twelve b_i values, which you have already obtained correctly. (I got the same values.)

If the mean is 0, then there's no warming trend, right?

If the mean is very small, then there's no significant warming trend, right?

Riddle me this: how small is small, in this case?

Cheers -- sylas

15. May 3, 2009

### Shukie

The mean lies about 4 standard deviations from 0. This would mean that it's highly unlikely that it's not warming up, right? So we can safely assume that there is indeed some amount of warming going on. The question is then, how much. Can I answer that question with just the mean and standard deviation?

16. May 4, 2009

### sylas

I believe so. This is a form of "hypothesis testing". Basically, you assume that there is no warming trend, and then estimate how surprising it is to get the observations you have. If the observations are surprising under the no warming hypothesis, then you can reject that hypothesis with high confidence.

I'm not sure how much they expect from you; but look up "hypothesis testing" anyway. It's a big subject, so go with the simple introductions.

Note that you've got a lot better than a single month that is several standard deviations away from zero. You've got 12 months, ALL of which are several standard deviations above zero. This increases your confidence in the warming.

There's a whole statistical theory of how this works, and I'm not expert at it.

A really quick and easy notion would be to argue as follows.
1. Assume, for the sake of argument, that there is no warming.
2. The mean you have obtained is surprising, under this assumption.
3. Let's assume that the standard deviation, however, is about right. That is, assume that the trends for a given month are zero on average (no warming) and also that the standard deviation is what you have measured, which is about 0.003.
4. Under this no-warming assumption, what is the probability that 12 randomly selected months will have a mean which is what you have measured, or more?
5. The standard deviation for the mean of a sample of size 12 is the standard deviation for a single sample, divided by $$\sqrt{12}$$. This would be more like 0.00087.
6. Given your assumption, the mean value for your 12 months should be expected to come from a normal distribution with mean 0 and with standard deviation 0.00087. You obtained about 0.0129. What is the probability you would obtain a mean as high as that, under these assumptions?
7. If this probability is less than 1%, you can reject the null-hypothesis with 99% confidence.
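
The steps above can be sketched in Python. The upper tail is written with `math.erfc` rather than as one minus a cumulative probability, because the latter rounds to exactly zero in floating point for a tail this small. The helper name `upper_tail` is made up, and the inputs are the rounded values from the list (0.003, 12 months, mean 0.0129).

```python
import math

def upper_tail(x, mean, sd):
    """P(X >= x) for a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

sd_single = 0.003          # rough SD of the twelve monthly slopes
n_months = 12
observed_mean = 0.0129     # mean of the twelve slopes, degrees C per year

# Step 5: SD of the mean of 12 values (treats the months as independent).
sd_mean = sd_single / math.sqrt(n_months)

# Steps 6 and 7: probability of a mean at least this large with no warming.
p_value = upper_tail(observed_mean, 0.0, sd_mean)

print(f"SD of the mean = {sd_mean:.5f}")
print(f"z = {observed_mean / sd_mean:.1f}, tail probability = {p_value:.1e}")
print("reject no-warming at 99% confidence:", p_value < 0.01)
```

The independence assumption in step 5 is the weak point of this argument, so the tail probability should be read as an order-of-magnitude indication only.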

17. May 4, 2009

### Shukie

Thank you, that really cleared things up. I don't know how to calculate it in Mathematica, but I can do it with my graphic calculator.

normalcdf(0.0129, 10^9, 0, 0.00087) = 5*10^-50

I don't know if I calculated it correctly, because that would mean a mean of 0.0129 or higher would have a ridiculously low chance of happening if there is no warming, pretty much zero chance really. I don't know how realistic that is.

Last edited: May 4, 2009
18. May 4, 2009

### sylas

That sounds about right. Just check that the calculator is not actually overflowing.

You've calculated sdev for b_i as 0.00292541
The standard deviation for 12 items, therefore, should be 0.000844493
(You've used 0.00087, which was a really crude low accuracy value by me, starting from 0.003)

You've got the mean as 0.0128915
That's about 15.3 standard deviations! Something around 10^-50 sounds like the right ballpark.

In other words, there's a ridiculously low chance of no warming trend. You've definitely got warming happening; there is a statistically significant increase in temperature going on.

Now it just so happens that I play around with this kind of data quite a lot. I've loaded your De Bilt data into a spreadsheet I have ready to hand, and used that to calculate the trend with a linear regression over the data. But I used another slightly more complicated method for getting the significance of the trend.

I gave each year a single temperature as the mean of all twelve months. (This is already in your data as column 13). I did a regression over that, and found the slope. Then I used the "standard error" method, with the student-t distribution to get bounds from a confidence level. Here's what I got:

Warming trend is De Bilt: 108.000 year trend = 0.129 C/decade, +/- 0.039 (95% conf)

(My spreadsheet calculates trend in degrees per decade. You can divide by 10 to get degrees per year.)

The +/- 0.039 means that I am 95% confident the trend is somewhere from 0.090 to 0.168. There's too much noise to nail down an accurate slope better than that; but it's definitely positive.

The confidence limit is a parameter, and the spreadsheet cannot actually handle the extreme probability that would allow for a negative trend to be consistent with this data.

Here's how it works.
De Bilt: 108.000 year trend = 0.129 C/decade, +/- 0.051 (99% conf)
De Bilt: 108.000 year trend = 0.129 C/decade, +/- 0.066 (99.9% conf)
De Bilt: 108.000 year trend = 0.129 C/decade, +/- 0.079 (99.99% conf)
De Bilt: 108.000 year trend = 0.129 C/decade, +/- 0.091 (99.999% conf)
De Bilt: 108.000 year trend = 0.129 C/decade, +/- 0.102 (99.9999% conf)
De Bilt: 108.000 year trend = 0.129 C/decade, +/- 97957.433 (99.99999% conf) {Oops. Overflow}

The method you are using, with the mean and standard deviation for individual months, is a bit quicker and easier, and it does establish that the warming trend is unambiguous.

Cheers -- sylas

PS. You are in the Netherlands, I guess? You speak English like a native. Very impressive! I've been to Eindhoven and Amsterdam, back in 1997.

Added in edit. Thinking about this, I may have given you some bad advice. All the different months are not really "independent" samples, and so dividing by square root of 12 might have been inappropriate. I'm not sure; I am not a statistician. You could ask your instructor, or perhaps another reader here might correct me on this.

Suppose you just stick with 0.00292541 as the standard deviation, rather than 0.00087 or whatever. The method I used ends up with 95% confidence limits at 0.0039; that suggests to me that dividing by sqrt(12) might have given just a bit TOO much confidence in the warming.

In this case, you would find that the mean is about 4.4 standard deviations above zero. This is still enough to give very high confidence for warming, but it might be the more sensible bound, statistically. I'm really not sure! Anyhow, plug that into your normalcdf as well.

Last edited: May 4, 2009
19. May 4, 2009

### Shukie

Thanks for your extensive explanation, very interesting.

Even without dividing the standard deviation by the square root of 12, I still get normalcdf(0.0129, 10^9, 0, 0.00292541) = 5*10^-6

So we can still safely assume that there is warming going on. In your method, you actually quantify your answer, but I guess that's a little out of reach with my method and perhaps not necessary either, because the next question asks:

What is the mean for $$b_i$$ that a student would have found 7 years ago when he only had data up till 2001? Can you draw any further conclusions about the warming in De Bilt?

I calculated this, and the mean for the period 1901-2001 is 0.0100872, as opposed to the mean for the period 1901-2008, which is 0.0128915. Can I conclude from this that the warming is speeding up?

Thank you, it's one of the positive benefits of watching too much TV, I guess. I'm 99% confident that if you come back now you'll find the temperatures a bit more pleasant.

Last edited: May 4, 2009
20. May 4, 2009

### sylas

Compared with your accuracy bounds on the long term trend, that's a weak conclusion. The additional trend is within natural variation. Compare the difference with the standard deviation of trends.
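
The comparison suggested here is a one-line calculation. A small Python sketch using only numbers already quoted in this thread (the two means of the monthly slopes, and the standard deviation of the twelve slopes for 1901-2008):

```python
# Means of the twelve monthly slopes, in degrees C per year.
mean_to_2001 = 0.0100872   # data for 1901-2001
mean_to_2008 = 0.0128915   # data for 1901-2008

# Standard deviation of the twelve monthly slopes (1901-2008 fit).
sd_trends = 0.00292541

difference = mean_to_2008 - mean_to_2001
ratio = difference / sd_trends

print(f"difference = {difference:.7f} C/year")
print(f"that is {ratio:.2f} standard deviations of the monthly trends")
```

A shift of about one standard deviation is the kind of change that noise alone produces routinely, which is why it does not support a claim of acceleration by itself.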

There's a better way to look at this... use a sliding window!

The great thing about a tool like Mathematica is that you'll be able to do this very easily. Pick some non-trivial number of years... at least 10. You can make this a parameter and try different values.

Now calculate the trend over 10 consecutive years. Compare 1901-1910 with 1930-1939 with 1999-2008. It's probably best to use the column 13 data for the annual average temperature, rather than worry about particular months, just to keep it simpler for the moment.

In fact, use an odd number, like 11. For every year, obtain the trend over a range of up to 5 years either side. Do it like this.

• Let R be your "range" for the sliding window. Try R = 5, and R = 10, and anything else you like.
• For each year X, consider the years X-R up to X+R, and calculate the trend B over this window of (2R+1) years. Note that if R is 5, you can only go up to 2003 (which is the middle of the window 1998-2008).
• Now plot B against X. This is a plot of how the short term trend changes over time!
• Calculate the mean and standard deviation of your B values -- the short term trends. The mean should be something tolerably close to the long term trend, but the standard deviation indicates how much the short term trend can vary from the long term trend.
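
The sliding-window recipe above can be sketched as follows, in Python rather than Mathematica. The annual temperatures are randomly generated for illustration (a made-up trend plus noise, NOT the column 13 data), and the function names are assumptions.

```python
import random
from statistics import mean, stdev

def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx

def sliding_trends(years, temps, r):
    """Trend over the window [X - r, X + r] for every year X that has a full window."""
    trends = []
    for i in range(r, len(years) - r):
        window_years = years[i - r:i + r + 1]   # 2r + 1 consecutive years
        window_temps = temps[i - r:i + r + 1]
        trends.append((years[i], slope(window_years, window_temps)))
    return trends

# Made-up annual means: 0.013 C/year trend plus random noise (NOT real data).
rng = random.Random(0)
years = list(range(1901, 2009))
temps = [9.0 + 0.013 * (y - 1901) + rng.gauss(0.0, 0.6) for y in years]

R = 5
trends = sliding_trends(years, temps, R)
short_term = [b for _, b in trends]

print(f"window centres run from {trends[0][0]} to {trends[-1][0]}")
print(f"mean short-term trend = {mean(short_term):.4f} C/year")
print(f"SD of short-term trend = {stdev(short_term):.4f} C/year")
```

Plotting the pairs in `trends` gives the B against X picture. Rerunning with a larger `R` shows how much the scatter of the short-term trends shrinks as the window grows.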

Once you've done this, there are quite a number of other ways to try and draw a few conclusions. You'll find that there is a lot more variation with a shorter window. An 11 year window gives quite noisy results, but a 21 year window starts to show a few rather suggestive features.

This is pretty relevant at present. A lot of people are making a claim that "global warming has stopped". They do this by looking at a short term trend, which is at present below the longer term trend. But in fact, there's nothing particularly significant about the slow down involved. It's easily within natural variation. Statistically, the natural variation of weather is observed to have a magnitude which implies that any long term trend is bound to have decades where the short term trend is substantially above or below a long term trend. Ten years is just not enough, given noisy data, to conclude much about whether warming is speeding up or slowing down. Twenty years would make more sense, given the observed variation.

Wouldn't be hard.... actually I liked the Netherlands very much, and I certainly hope I will be back one day for a visit.

Cheers -- sylas