Proton Soup
from what i remember on the CRU thread, CRU had a bunch of data sets in printed form. at some point, persons at CRU had to manually enter all this data into a computer. but what CRU apparently did not have was the computing resources to store (and maybe more accurately, process) all this data. so they produced a roughened, coarser, "homogenized" data set from the original data to work with. at some point, the printed data was deemed either burdensome or unnecessary and destroyed. presumably, either the computing resources to handle large data sets were still not available, or perhaps rough estimates were simply deemed "good enough"; that part seems unclear. waiting on simulations to complete is certainly time-consuming, so it is at least one factor to consider.
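just to make concrete what i mean by "coarsening": here's a rough python sketch of the kind of reduction i'm imagining (daily station readings averaged into monthly means, then stations averaged into coarse grid cells). the station names, sample values, 5-degree cell size, and averaging rules are all my own made-up assumptions for illustration; this is not CRU's actual procedure, which is exactly what's unknown.

    # purely illustrative: coarsen daily station readings into monthly means per
    # station, then average stations into 5-degree grid cells. station names,
    # sample values, cell size, and averaging rules are my assumptions, not CRU's method.
    from collections import defaultdict
    from statistics import mean

    # (station, lat, lon, year, month, day, temp_C) -- made-up sample records
    records = [
        ("stn_a", 52.2,  0.1, 1987, 1, 1, 3.4),
        ("stn_a", 52.2,  0.1, 1987, 1, 2, 2.9),
        ("stn_b", 51.8, -1.3, 1987, 1, 1, 4.1),
    ]

    def grid_cell(lat, lon, size=5.0):
        # bucket a station into a coarse lat/lon cell (size in degrees)
        return (int(lat // size) * size, int(lon // size) * size)

    # step 1: daily readings -> monthly mean per station
    monthly = defaultdict(list)
    for stn, lat, lon, yr, mo, _day, temp in records:
        monthly[(stn, lat, lon, yr, mo)].append(temp)
    station_monthly = {key: mean(vals) for key, vals in monthly.items()}

    # step 2: station monthly means -> coarse grid-cell means
    gridded = defaultdict(list)
    for (stn, lat, lon, yr, mo), t in station_monthly.items():
        gridded[(grid_cell(lat, lon), yr, mo)].append(t)
    coarse = {key: mean(vals) for key, vals in gridded.items()}

    print(coarse)

once a step like this has been run and the source records destroyed, the daily detail is gone for good; only the coarse averages survive.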
as for what i don't remember reading: what is this mysterious homogenization algorithm? has anyone ever reproduced the homogenized data set? one would assume data is shared between colleagues and that the basic tenet of science, reproducibility, is adhered to. if not, then we have no validation, a broken chain of custody, and no assurance that results produced from the data (or the homogenized data) are valid.
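and here's the kind of check i'd want to see, if the raw records and the algorithm were ever published: re-derive the homogenized set independently and compare it to the archived one. again just a sketch; the key structure and the tolerance are my assumptions, not any published criterion.

    def reproduces(recomputed, archived, tol=0.05):
        # true only if the re-derived set matches the archived one key-for-key
        # within tol degrees C; the 0.05 tolerance is my guess, not a published criterion
        if set(recomputed) != set(archived):
            return False
        return all(abs(recomputed[k] - archived[k]) <= tol for k in archived)

    print(reproduces({("stn_a", 1987, 1): 3.15}, {("stn_a", 1987, 1): 3.17}))  # True

without the original records, of course, no such check is even possible.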
another thing that's curious to me: how many cubic feet of printed hardcopies of original data are we talking about here? an entire room, a broom closet, or a bit of space on one of the researchers' bookshelves?