http://nedwww.ipac.caltech.edu/level5/Sept03/Feigelson/paper.pdf
PhyStat 2003: Statistical Problems in Particle Physics, Astrophysics, and Cosmology
arXiv:astro-ph/0401404 v1 20 Jan 2004

Statistical Challenges in Modern Astronomy

E. D. Feigelson
Department of Astronomy & Astrophysics, Penn State University, University Park PA 16802, USA

G. J. Babu
Department of Statistics, Penn State University, University Park PA 16802, USA

Despite centuries of close association, statistics and astronomy are surprisingly distant today. Most observational astronomical research relies on an inadequate toolbox of methodological tools. Yet the needs are substantial: astronomy encounters sophisticated problems involving sampling theory, survival analysis, multivariate classification and analysis, time series analysis, wavelet analysis, spatial point processes, nonlinear regression, bootstrap resampling and model selection. We review the recent resurgence of astrostatistical research, and outline new challenges raised by the emerging Virtual Observatory. Our essay ends with a list of research challenges and infrastructure for astrostatistics in the coming decade.

1. The glorious history of astronomy and statistics

Astronomy is perhaps the oldest observational science¹. The effort to understand the mysterious luminous objects in the sky has been an important element of human culture for at least 10^4 years. Quantitative measurements of celestial phenomena were carried out by many ancient civilizations. The classical Greeks were not active observers but were unusually creative in the applications of mathematical principles to astronomy. The geometric models of the Platonists with crystalline spheres spinning around the static Earth were elaborated in detail, and this model endured in Europe for 15 centuries. But it was another Greek natural philosopher, Hipparchus, who made one of the first applications of mathematical principles that we now consider to be in the realm of statistics.
Finding scatter in Babylonian measurements of the length of a year, defined as the time between solstices, he took the middle of the range – rather than the mean or median – for the best value. This is but one of many discussions of statistical issues in the history of astronomy. Ptolemy estimated parameters of a non-linear cosmological model using a minimax goodness-of-fit method. Al-Biruni discussed the dangers of propagating errors from inaccurate instruments and inattentive observers. While some Medieval scholars advised against the acquisition of repeated measurements, fearing that errors would compound rather than compensate for each other, the usefulness of the mean to increase precision was demonstrated with great success by Tycho Brahe.

¹ The historical relationship between astronomy and statistics is described in references [15], [38] and elsewhere. Our Astrostatistics monograph gives more detail and contemporary examples of astrostatistical problems [3].

During the 19th century, several elements of modern mathematical statistics were developed in the context of celestial mechanics, where the application of Newtonian theory to solar system phenomena gave astonishingly precise and self-consistent quantitative inferences. Legendre developed L_2 least squares parameter estimation to model cometary orbits. The least squares method became an instant success in European astronomy and geodesy. Other astronomers and physicists contributed to statistics: Huygens wrote a book on probability in games of chance; Newton developed an interpolation procedure; Halley laid foundations of actuarial science; Quetelet worked on statistical approaches to social sciences; Bessel first used the concept of “probable error”; and Airy wrote a volume on the theory of errors.

But the two fields diverged in the late 19th and 20th centuries.
Astronomy leaped onto the advances of physics – electromagnetism, thermodynamics, quantum mechanics and general relativity – to understand the physical nature of stars, galaxies and the Universe as a whole. A subfield called “statistical astronomy” was still present but concentrated on rather narrow issues involving star counts and Galactic structure [30]. Statistics concentrated on analytical approaches. It found its principal applications in social sciences, biometrical sciences and in practical industries (e.g., Sir R. A. Fisher’s employment by the British agricultural service).

2. Statistical needs of astronomy today

Contemporary astronomy abounds in questions of a statistical nature. In addition to exploratory data analysis and simple heuristic (usually linear) modeling common in other fields, astronomers also often interpret data in terms of complicated non-linear models based on deterministic astrophysical processes. The phenomena studied must obey known behaviors of atomic and nuclear physics, gravitation and mechanics, thermodynamics and radiative processes, and so forth. ‘Modeling’ data may thus involve both the selection of a model family based on an astrophysical