Using big data to identify astronomocal data bias

Chronos · Apr 18, 2017

I've been following, albeit loosely, the use of big data to refine astronomical data. It has been frequently noted that astronomy is an excellent test ground for big data approaches. I'm led to wonder what kind of results have been achieved to date and how effective are these methods for detecting bias in data sets such as foreground contamination? Can it be used to test the parameter space of assumptions applied to data sets?

Jacob Mybiz · Apr 18, 2017

This is the most thorough review of this issue I am aware of http://theconversation.com/cosmology-is-in-crisis-but-not-for-the-reason-you-may-think-52349.

Chronos · Apr 19, 2017

The article evokes a sense of pareidolia, regarding big data. Intelligent life forms are predisposed to make associations between the unknown and familiar. It serves a vital survival role to anticipate danger by drawing parallels between new, unfamiliar data and data from past experience. Unanticipated and/or intense sensory inputs [e.g., loud noises] are autonomically processed as potential threats. Inputs reminiscent of pleasant experiences [e.g., music] are similarly processed. You cannot divorce the observer from this kind of bias because it is a hardwired response. I prefer to think the data can speak for itself. Correlations only suggest the possibility a data set varies due to something other than random noise. It could be a known factor, an unknown factor, or merely systematic error. That is what I expect big data analysis should be capable of discerning.

Jacob Mybiz · Apr 19, 2017

Chronos said:

The article evokes a sense of pareidolia, regarding big data. Intelligent life forms are predisposed to make associations between the unknown and familiar. It serves a vital survival role to anticipate danger by drawing parallels between new, unfamiliar data and data from past experience. Unanticipated and/or intense sensory inputs [e.g., loud noises] are autonomically processed as potential threats. Inputs reminiscent of pleasant experiences [e.g., music] are similarly processed. You cannot divorce the observer from this kind of bias because it is a hardwired response. I prefer to think the data can speak for itself. Correlations only suggest the possibility a data set varies due to something other than random noise. It could be a known factor, an unknown factor, or merely systematic error. That is what I expect big data analysis should be capable of discerning.

I agree but that's a pretty thorough meta analysis it's been awhile since I read the article but as I recall the primary issue is confirmation bias not in assumptions but reflected in the data. They don't have sufficient variability over a large sample size. The logical conclusion being the data is being reported in a biased manner which supports the presumed hypothesis. There is a very important distinction between producing data and choosing post hoc a theory which is familiar to you and, as the author suggests due to a lack of experimental control allowing bias, most likely without intent, to influence the data reported. Simple experimental controls where the scientists are blind to the data they are analyzing and what it represents protects the integrity and validity of research. Humans as you eloquently stated are biased by nature which is why experiments should always be designed to control for it.

Chronos · Apr 20, 2017

I concur, but, see constraining the effects of human bias on the output as a motive for using big data analysis tools. I am less concerned with raw data than data which has been 'cleaned' to filter out systematics.

Using big data to identify astronomocal data bias

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Undergrad Milne vs. Minkowski metric

Undergrad Comparing different scale factor functions in the same graph

Graduate Strong Progenitor Age Bias in Supernova Cosmology

Undergrad Does the scale factor need to be normalized?

High School A Mind-Boggling Number Comparison

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight