Vanadium 50 said:
Are you suggesting that the work done by SDSS in handling large amounts of data doesn't work?
Not sure, but I suspect they are running into the same issues that computational astrophysicists are, and I've seen nothing suggesting they've made any progress in data handling that the computational astrophysicists haven't.
They have 50 TB, maybe 100, and have been at it for a decade.
Sure, and that will give you raw data. The trouble with raw data is that it's pretty much useless without tools for data mining and visualization. Suppose you have 50 TB of data that is the result of a simulation, and you want to run statistics. You end up spending a few weeks writing a program that grinds through the raw data files, the program takes two days to run, and after another two weeks of pulling your hair out, you finally get a graph.
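To make the pain concrete, here's a minimal sketch of the kind of one-off script I mean. Everything here is hypothetical (the file layout, the record format of little-endian 64-bit floats, the `stream_mean` name); the point is that each new statistic means writing and babysitting another program like this:

```python
import struct

def stream_mean(paths, record_size=8):
    """Running mean over raw files of little-endian 64-bit floats,
    read in chunks so the full dataset never has to fit in memory.
    Assumed layout: flat binary, no header, no metadata."""
    total, count = 0.0, 0
    for path in paths:
        with open(path, "rb") as f:
            # Read ~32 KB at a time; 50 TB of this is what takes two days.
            while chunk := f.read(record_size * 4096):
                n = len(chunk) // record_size
                values = struct.unpack(f"<{n}d", chunk[:n * record_size])
                total += sum(values)
                count += n
    return total / count if count else float("nan")
```

And note what the script can't tell you: what the numbers are, what units they're in, or which calibration applies. That's the metadata problem in miniature.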
Except then you want to run some other statistics, and you have to go through it all over again. And then you find that the raw data is on one server, the analysis program is on another, and you are not going to FTP 50 TB of data over.
And then you find that all of the data is scattered across three or four files, with no metadata, and in order to do calibrations you have to spend a few weeks e-mailing people trying to find out what the data means.
Again, it's possible that SDSS has totally licked the problem, but I really, really, really, really doubt it. What they seem to be doing is making the best of the available tools and processing the data so that it's generally usable by other scientists. What they don't seem to have is a mechanism for giving general access to the original raw data, so that outside groups could totally reproduce their data reduction.
You might reply: but that means giving people access to 50 TB of data, and that's impossible! And my point is that making exactly those things possible is what astroinformatics is all about.