sylas said: "Engineering is, of course, rather different to science. "Validation" is a normal part of working for designed or engineered constructions; but it does not have quite so central a position in empirical science.
For example... what would it mean to "validate" a climate model? We already know that they are not complete and only give a partial picture of climate. To quote a common phrase: climate models are always wrong but often useful."
Sorry, I can't accept that. If you want to say that it is against my religion, fine. But in this case my religion is what I learned from professors and colleagues about what it takes to do statistics right. The difference can be summed up in two books: Lies, Damned Lies and Statistics by Michael Wheeler, and The Visual Display of Quantitative Information by Edward Tufte. Go read both books, then go look at the original hockey stick again and decide which author describes it...
Terry Oldberg said: "According to Gray, he urged description by the IPCC of how the IPCC's models could be statistically validated. He says that the IPCC blew him off on this issue and implies that the IPCC's models are neither validated nor susceptible to validation. If Gray is correct, then the IPCC's models are not "scientific" models under Karl Popper's criterion of falsifiability."
Amen Terry! For me this sums up perfectly why I count myself an anthropogenic global warming skeptic even though I think there are good reasons for reducing CO2 emissions.
Although much of my work has been in the area of programming languages and algorithms, I have an MS in Statistics. I recently made a post on a completely different topic in which I said: "Beware of EDA (exploratory data analysis) without validation. There are times when you must split the sample available to you in half, one part for EDA, the other for validation. Anything else is junk science, or at least junk statistical analysis." (Since I am quoting myself here, I felt I could make some minor edits without adding brackets. ;-)
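Since I am asking others to do it, here is a minimal Python sketch of what I mean by splitting the sample. The data and names are made up by me purely for illustration; the point is the discipline, not the particulars:

```python
# A minimal sketch of the split-half discipline: half the sample for exploration
# (pick the most promising predictor), the other half for an independent test of
# that one choice. All data here are synthetic noise, made up for illustration.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
n, p = 200, 20
X = rng.normal(size=(n, p))          # 20 candidate predictors, pure noise here
y = rng.normal(size=n)               # response unrelated to any predictor

idx = rng.permutation(n)
explore, confirm = idx[: n // 2], idx[n // 2 :]

# Exploration half: dredge freely -- pick the predictor most correlated with y.
best = max(range(p), key=lambda j: abs(pearsonr(X[explore, j], y[explore])[0]))

# Validation half: test only that one choice, on data it has never seen.
r, pval = pearsonr(X[confirm, best], y[confirm])
print(f"best-looking predictor on the EDA half: x{best}")
print(f"held-out correlation r = {r:.3f}, p = {pval:.3f}")
# With pure noise the held-out p-value is usually unremarkable, even though
# the EDA half almost always "finds" something that looks interesting.
```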
So the problems I have with climate science as a whole are the incestuous sharing and editing of data, which makes anything like an independent test almost impossible, and the attitude that even when falsified, climate models should be kept, the parameters juggled, new data added, and so on. No one should be surprised that climate models have no predictive power, given the culture in which they have been created.
Will there be a real climate science at some point in the future? Probably. But I can't see it evolving from the current climate experts. They are not real scientists, although they do claim that on TV.
Am I being too harsh? I don't think I am being harsh enough. As a statistician who finds EDA to be fun, I often ask for the data behind papers in areas such as cosmology, superconductivity, and the solar neutrino problem, to mention a few. (I have also asked for such data in fields where I am known, but I am ignoring that.) In every area except climate science, the only problem I have is that the scientists are so glad to have someone interested who is a statistician and programmer who can help with validation that my validation (or falsification) is not independent.* But in climate science the story is much different.
Yes, I have looked at the data which is now publicly available, and I am regularly shocked at how poor it is. What use is data from sites where measurements were only taken when it wasn't too cold, snowing, or raining? Or where normal and unavailable are represented by the same symbol (0)? Climate researchers have often further processed these data to substitute proxies where original data is missing, but then any validation is impossible. At best, attempts to duplicate results will end up either accepting the researcher's assumptions or working with a much different data set. This is the "one tree in Siberia" problem.** If you have access to the original data you can run tests to determine the possible degree of measurement error and the amount of (unbiased) random error present. But when the only data available have been merged, outliers removed, and otherwise "cleaned up," you have to either accept them on religious grounds or reject them as unverifiable.
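To make the "normal and unavailable share the symbol 0" complaint concrete, here is a tiny hypothetical in Python. The station values and the flag column are my own inventions, not any real record:

```python
# Hypothetical illustration of the "0 means both zero and missing" problem.
# The numbers are made up; the point is only what the coding does to a mean.
import numpy as np

# Daily mean temperatures in deg C. The first 0.0 is a genuine reading of zero;
# the later 0.0 entries mark days when no measurement was taken at all.
reported = np.array([0.0, -3.2, 0.0, -7.8, 0.0, -1.1, 0.0, -4.6])
was_observed = np.array([True, True, False, True, False, True, False, True])

naive_mean = reported.mean()                 # treats every 0 as a real reading
honest_mean = reported[was_observed].mean()  # drops the coded-missing days

print(f"naive mean:  {naive_mean:+.2f} C")
print(f"honest mean: {honest_mean:+.2f} C")
# The naive mean is pulled toward zero, i.e. warm-biased in a cold month,
# and once the flag column is gone there is no way to undo it.
```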
The other way to test models, of course, is to use them for prediction. Yes, I have seen models which did predict colder temperatures in 2008 and 2009--but they are based on sunspots and cosmic rays. They are definitely not part of any IPCC consensus.
Finally, Bill Illis has linked to some (decent) NOAA data showing CO2 levels over 6000 ppm, 20 times current levels, millions of years ago. A simple application of the Stefan-Boltzmann law would call for something like 16 degrees C of warming from the added radiative forcing, which was clearly not the case. I certainly understand why the simple thermodynamics doesn't work. There are windows in the CO2 absorption spectrum which remain open no matter how many doublings of CO2 partial pressures occur. It doesn't take much data analysis either to realize that the answer is that water vapor and clouds have a complex response to temperature. But the shouting down by "climate scientists" of weathermen who use statistics to develop and validate complex models of how water vapor actually works is pretty shameful. (They probably watch the predictions of those models every night on TV to see what to wear in the morning, while shouting the models down during the day.)
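For what it's worth, here is the kind of back-of-the-envelope I mean, coded up so the arithmetic is checkable. The logarithmic forcing approximation and the 255 K emission temperature are my own assumptions for illustration, not anything taken from Bill Illis or the NOAA data:

```python
# Back-of-envelope only, all assumptions mine: the commonly quoted logarithmic
# forcing approximation dF ~ 5.35 * ln(C/C0) W/m^2 and the no-feedback
# Stefan-Boltzmann response dT ~ dF / (4 * sigma * Te^3), just to show how
# quickly the per-ppm effect saturates.
import math

SIGMA = 5.670e-8             # Stefan-Boltzmann constant, W m^-2 K^-4
TE = 255.0                   # assumed effective emission temperature, K
planck = 4 * SIGMA * TE**3   # ~3.8 W m^-2 per K of no-feedback response

for ratio in (2, 4, 8, 20):
    dF = 5.35 * math.log(ratio)   # forcing for CO2 at `ratio` x a baseline
    dT = dF / planck              # no-feedback temperature response
    print(f"{ratio:>3}x CO2: dF ~ {dF:4.1f} W/m^2, no-feedback dT ~ {dT:3.1f} C")
# Each doubling adds roughly the same ~3.7 W/m^2, so 20x CO2 is only about
# 4.3 doublings' worth of forcing, nowhere near 20 times the effect, and the
# water vapor / cloud response is the genuinely hard part on top of this.
```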
I'd love to share some of those weather models with you, but there is an entirely different problem involved. The weather data itself is available from NOAA to anybody, in more detail than most desktop computers can handle. But there is a tremendous competition, part of it commercial, part of it military, to come up with better long-term weather prediction models. (And when long-term is long enough, it becomes climate.) Most of this work is done within a few miles of Bedford, MA, either at commercial weather forecasting companies, or by government agencies headquartered at Hanscom AFB.
I worked at MITRE Bedford for over a decade, and as I said at the beginning, my involvement in these projects usually meant things like programming language support, distributed systems development, or (complexity) analysis of algorithms. So the data and models were not mine to distribute. However, the in-house development effort put roughly three times as much work into validating models as into creating new ones. When the models were turned over to the Air Force, the additional validation costs were huge. The additional work created by running multiple models side by side for a year or so is appalling. Back then it was basically one model run per supercomputer, with multiple runs of the same model necessary every day. Part of the pain is that the data used in the run are usually twelve to twenty-four hours stale by the time the run finishes--and it often took two supercomputers to produce one result every twelve hours. If you read the forecasts from the National Hurricane Center you will find that today (well, this year) they usually run four different models against the data from each hurricane every few hours.
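For readers who have never seen a side-by-side comparison, here is a toy sketch of the scoring that this kind of validation boils down to. The "models" and observations are invented, and real verification uses far more than RMSE and bias, but the shape of the exercise is the same:

```python
# Toy sketch of scoring several candidate forecast models against the same
# verification observations. Model names and data are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
truth = 15 + 5 * np.sin(np.linspace(0, 6 * np.pi, 120))   # verifying obs, deg C

forecasts = {
    "model_A": truth + rng.normal(0.0, 1.0, truth.size),  # small random error
    "model_B": truth + rng.normal(2.0, 1.0, truth.size),  # warm bias
    "model_C": np.full_like(truth, truth.mean()),          # climatology baseline
}

for name, fc in forecasts.items():
    err = fc - truth
    rmse = np.sqrt(np.mean(err ** 2))
    bias = err.mean()
    print(f"{name}: RMSE = {rmse:4.2f} C, bias = {bias:+4.2f} C")
# Only a sustained run like this, against observations the models never saw,
# tells you whether a "better" model is actually better.
```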
The big prediction problem, incidentally, is still the rain/snow line in three dimensions. It is easy to see how snow vs. rain on the ground results in a different albedo once the storm is over, but the same line is very important in heat transport within summer thunderstorms. And validation there is just a bit harder. I've seen pictures of aircraft that came back with (fortunately only slightly) bent wings. And with lots of dents from flying into hail.
*I remember one case where I suggested non-parametric statistics as a way to tease a signal out of the background. The Wilcoxon rank sum test pulled out a very clear and statistically significant result--which after a few more days of data turned out to be from the heating system in the building. Sigh! At least I helped him track down what would have been an embarrassing systematic error, even if it did mean he had to start data collection all over again.
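If you want to see how easy it is to fall into that trap, here is a contrived reconstruction in Python. The numbers are invented, but the mechanism, a slow systematic drift aliased into the sampling schedule, is exactly the sort of thing that bit us:

```python
# Contrived reconstruction (made-up numbers): the rank-sum test happily flags a
# "signal" that is really a slow systematic drift (the building heating up)
# aliased into the on/off sampling schedule.
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(7)
hours = np.arange(48)
drift = 0.05 * hours                 # slow warming of the lab, not the source
noise = rng.normal(0.0, 0.5, hours.size)
readings = drift + noise             # no real signal anywhere in here

# "Source on" runs just happened to be scheduled on the second day...
source_off = readings[hours < 24]
source_on = readings[hours >= 24]

stat, pval = ranksums(source_on, source_off)
print(f"rank-sum statistic = {stat:.2f}, p = {pval:.4g}")
# p comes out tiny, i.e. "highly significant" -- yet the only thing detected
# is the heating system. Randomizing or interleaving the schedule kills it.
```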
** Yes, I know that the tree ring data from Siberia was not based on data from just one tree. The problem is that the processed data wipes out the normal random variations that can be used to test for significance.