Astroinformatics: Learn More at AAS10

  • Thread starter: Simfish
  • Tags: Field
AI Thread Summary
Astroinformatics is emerging as a new field, paralleling the development of bioinformatics, driven by the need to manage and analyze vast amounts of astrophysical data. The discussion highlights that traditional methods of data handling are becoming inadequate as researchers face challenges with terabytes of data. While the Sloan Digital Sky Survey (SDSS) has been successful in managing large data sets, concerns remain about the accessibility of raw data and the tools necessary for effective data mining and visualization. The conversation emphasizes the importance of developing new computational skills and collaborative efforts between astrophysicists and computer scientists to advance data analysis capabilities. Despite some skepticism about the novelty of astroinformatics, the potential for innovative applications and improved data access is recognized as crucial for future research in astrophysics.


Looks like a buzzword attached to what people have been doing for years.
 


Yeah, seems like applied math or statistics would do the job for you well enough.
 


I think it came out of "bioinformatics." What happened was that with the genome mapping project, the biologists got whomped with massive amounts of data that they couldn't deal with, and so they needed to create this new subfield that combined CS and biology.

Looks like the same thing is happening with astrophysics. Also I think it's becoming obvious to people that we need some new research since the way that people have been handling data for years just doesn't work. If you have terabytes of data, you need some non-trivial CS skills to deal with it.
 


Are you suggesting that the work done by SDSS in handling large amounts of data doesn't work? They have 50 TB, maybe 100, and have been at it for a decade. I might have said that this is largely a solved problem.
 


Vanadium 50 said:
Are you suggesting that the work done by SDSS in handling large amounts of data doesn't work?

Not sure, but I think they are likely running into the same issues that computational astrophysicists are running into, and I've seen nothing that suggests they've made any progress in data handling that the computational astrophysicists haven't.

They have 50 TB, maybe 100, and have been at it for a decade.

Sure, and that will give you raw data. The trouble with raw data is that it's pretty much useless without tools to do data mining and visualization. Suppose you have 50 TB of data that is the result of a simulation, and you want to run statistics. You end up spending a few weeks writing a program that hits the raw data files. This program takes two days to run, and after another two weeks of pulling your hair out, you finally get a graph.

Except then you want to run some other statistics, and you have to go through it all over again. And then you find that the raw data is on one server, the analysis program is on another, and you are not going to FTP 50 TB of data over.

And then you find that all of the data is scattered across three or four files, with no metadata, and in order to do calibrations, you have to spend a few weeks e-mailing people trying to get information about what the data means.
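As a rough illustration of the "program that hits the raw data files" step described above, here is a minimal Python sketch of a single-pass (Welford) mean and standard deviation, so the data never has to fit in memory at once. The names `read_in_chunks` and `streaming_mean_std` are hypothetical, and the in-memory list is only a stand-in for blocks read from real files; nothing here reflects how SDSS or any simulation code actually stores its output.

```python
import math
from typing import Iterable, Iterator, List, Tuple


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield lists of at most chunk_size values (stand-in for reading file blocks)."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def streaming_mean_std(chunks: Iterable[List[float]]) -> Tuple[int, float, float]:
    """Return (count, mean, population std) using Welford's one-pass update."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for chunk in chunks:
        for x in chunk:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
    std = math.sqrt(m2 / count) if count else float("nan")
    return count, mean, std


if __name__ == "__main__":
    sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(streaming_mean_std(read_in_chunks(sample, chunk_size=3)))
```

This only addresses the memory side of the problem. It does nothing about the data living on a different server from the analysis code, or about missing metadata.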

Again, it's possible that SDSS has totally licked the problem, but I really, really, really, really doubt it. What they seem to be doing is the best they can with the tools that are available, processing the data so that it's generally available to other scientists. What they don't seem to have is a mechanism that allows general access to the original raw data, so that outside groups can totally reproduce their data reduction.

You might reply, "But that means giving people access to 50 TB of data, and that's impossible!" My point is that making those things possible is exactly what astroinformatics is all about.
 


I am aware of the problem. HEP has it as well. But like I said, SDSS has had this problem for a decade, and they are a successful experiment. This seems to me to be a largely solved problem.
 


Vanadium 50 said:
I am aware of the problem. HEP has it as well. But like I said, SDSS has had this problem for a decade, and they are a successful experiment. This seems to me to be a largely solved problem.

When you've solved the old problems, the next step is to find new problems to solve.

For example, it would be really neat if you could put SDSS on a server and turn it into Google Sky on steroids, where you can write an n-body simulation and pull the initial conditions from the SDSS server. Or be able to do a database query like "give me the distribution of Mg++ line widths for all of the F2 class stellar objects within the Milky Way." That's not going to be possible without a lot of cooperation between astrophysicists and CS people.
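To make the kind of query mentioned above concrete, here is a minimal sketch using Python's built-in `sqlite3` module on a toy in-memory table. Everything about the schema is an assumption made for illustration: the table `stellar_objects`, its columns, the helper `line_width_histogram`, and the sample rows are all made up and are not SDSS's actual catalog layout or data.

```python
import sqlite3

# Toy in-memory database; the schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE stellar_objects (
        obj_id         INTEGER PRIMARY KEY,
        spectral_class TEXT,
        in_milky_way   INTEGER,  -- 1 if the object is in the Milky Way
        mg_line_width  REAL      -- Mg++ line width, arbitrary units
    )
    """
)
conn.executemany(
    "INSERT INTO stellar_objects VALUES (?, ?, ?, ?)",
    [
        (1, "F2", 1, 0.42),
        (2, "F2", 1, 0.47),
        (3, "F2", 1, 0.91),
        (4, "G2", 1, 0.60),
        (5, "F2", 0, 1.30),
    ],
)


def line_width_histogram(conn, spectral_class, bin_width):
    """Count Milky Way objects of one spectral class per line-width bin."""
    query = """
        SELECT CAST(mg_line_width / ? AS INTEGER) AS width_bin, COUNT(*)
        FROM stellar_objects
        WHERE spectral_class = ? AND in_milky_way = 1
        GROUP BY width_bin
        ORDER BY width_bin
    """
    cursor = conn.execute(query, (bin_width, spectral_class))
    # Report each bin by its lower edge.
    return [(b * bin_width, n) for b, n in cursor]


if __name__ == "__main__":
    for lower_edge, n in line_width_histogram(conn, "F2", 0.5):
        print(f"width >= {lower_edge}: {n} objects")
```

The query itself is the easy part. The hard part the post is pointing at is getting calibrated line widths, classifications, and membership flags into one queryable place to begin with.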
 


Yes, but now you have moved away from "a whole new field of study", which is what this is being billed as, to "looking for interesting new things to do in an existing field". Not that the latter is bad - but I think that my description of a new buzzword being applied to what people have been doing for years fits.
 
