Using big data to identify astronomical data bias

  • Context: Undergrad
  • Thread starter: Chronos
  • Tags: Bias, Big data, Data

Discussion Overview

The discussion centers on the use of big data in astronomy, particularly regarding its effectiveness in identifying biases in astronomical data sets, such as foreground contamination. Participants explore the implications of human bias in data interpretation and the potential of big data methods to refine data analysis and enhance the integrity of research findings.

Discussion Character

  • Exploratory
  • Debate/contested
  • Technical explanation

Main Points Raised

  • Some participants note that astronomy serves as a valuable test ground for big data approaches and question the results achieved in detecting biases in data sets.
  • One participant references an article discussing a crisis in cosmology, suggesting that confirmation bias may be reflected in the data due to insufficient variability over large sample sizes.
  • Another participant discusses the concept of pareidolia, arguing that human observers cannot be divorced from bias, which complicates data interpretation.
  • There is a suggestion that big data analysis should discern between known factors, unknown factors, and systematic errors in data sets.
  • Some participants express the importance of experimental controls to mitigate human bias in data reporting and analysis.
  • One participant emphasizes the need for 'cleaned' data to filter out systematic errors rather than focusing solely on raw data.

Areas of Agreement / Disagreement

Participants express a mix of agreement and disagreement regarding the impact of human bias on data interpretation and the effectiveness of big data methods. While there is a shared concern about bias, the specific mechanisms and implications remain contested.

Contextual Notes

Participants highlight limitations related to the variability of data samples and the influence of human bias on data interpretation, but these aspects remain unresolved within the discussion.

Chronos
Science Advisor
Gold Member
I've been following, albeit loosely, the use of big data to refine astronomical data. It has been frequently noted that astronomy is an excellent test ground for big data approaches. I'm led to wonder what kind of results have been achieved to date, and how effective these methods are at detecting bias in data sets, such as foreground contamination. Can they be used to test the parameter space of assumptions applied to data sets?
 
The article evokes a sense of pareidolia regarding big data. Intelligent life forms are predisposed to make associations between the unknown and the familiar. It serves a vital survival role to anticipate danger by drawing parallels between new, unfamiliar data and data from past experience. Unanticipated and/or intense sensory inputs [e.g., loud noises] are autonomically processed as potential threats. Inputs reminiscent of pleasant experiences [e.g., music] are similarly processed. You cannot divorce the observer from this kind of bias because it is a hardwired response. I prefer to think the data can speak for itself. Correlations only suggest the possibility that a data set varies due to something other than random noise. It could be a known factor, an unknown factor, or merely systematic error. That is what I expect big data analysis should be capable of discerning.
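A minimal sketch of the kind of discernment I have in mind, in Python with numpy. The foreground template, the injected systematic, and all amplitudes are invented purely for illustration: fit out the known factor first, then test whether the residual still carries structure beyond random noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "observations": a known foreground (linear gradient), an injected
# systematic (slow sinusoidal drift), and random noise.
x = np.linspace(0.0, 1.0, 500)
known_foreground = 2.0 * x
systematic = 0.3 * np.sin(2.0 * np.pi * 3.0 * x)
noise = rng.normal(0.0, 0.1, x.size)
data = known_foreground + systematic + noise

# Step 1: remove the *known* factor by fitting the foreground template.
coeffs = np.polyfit(x, data, deg=1)
residual = data - np.polyval(coeffs, x)

# Step 2: test whether the residual is consistent with pure noise.
# White noise has lag-1 autocorrelation near zero; a leftover systematic
# (known or unknown) leaves correlated structure behind.
def lag1_autocorr(r):
    r = r - r.mean()
    return float(np.dot(r[:-1], r[1:]) / np.dot(r, r))

print(f"lag-1 autocorrelation of residual: {lag1_autocorr(residual):.2f}")
```

A high residual autocorrelation does not identify the cause; it only flags that something other than random noise remains after the known factor is removed, which is exactly where the human interpretation (and its biases) enters.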
 
Likes: Buzz Bloom and Jacob Mybiz
Chronos said:
The article evokes a sense of pareidolia regarding big data. Intelligent life forms are predisposed to make associations between the unknown and the familiar. It serves a vital survival role to anticipate danger by drawing parallels between new, unfamiliar data and data from past experience. Unanticipated and/or intense sensory inputs [e.g., loud noises] are autonomically processed as potential threats. Inputs reminiscent of pleasant experiences [e.g., music] are similarly processed. You cannot divorce the observer from this kind of bias because it is a hardwired response. I prefer to think the data can speak for itself. Correlations only suggest the possibility that a data set varies due to something other than random noise. It could be a known factor, an unknown factor, or merely systematic error. That is what I expect big data analysis should be capable of discerning.
I agree, but that's a pretty thorough meta-analysis. It's been a while since I read the article, but as I recall the primary issue is confirmation bias: not in the assumptions, but reflected in the data itself. The data sets don't have sufficient variability over a large sample size, and the logical conclusion is that the data is being reported in a biased manner that supports the presumed hypothesis. There is a very important distinction between producing data and choosing post hoc a theory which is familiar to you; as the author suggests, a lack of experimental control allows bias, most likely without intent, to influence the data reported. Simple experimental controls, where the scientists are blind to the data they are analyzing and what it represents, protect the integrity and validity of research. Humans, as you eloquently stated, are biased by nature, which is why experiments should always be designed to control for it.
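The blinding control described above can be sketched in a few lines. This is a toy illustration, not any specific collaboration's procedure; the additive-offset scheme and all numbers are made up for the example: a secret offset is applied before anyone analyzes the data, and it is removed only after the analysis choices are frozen.

```python
import numpy as np

rng = np.random.default_rng(42)

# Raw measurements (e.g., a parameter estimated from many data subsets).
measurements = rng.normal(10.0, 0.5, 100)

# Blinding: a secret additive offset, generated once and kept sealed,
# so analysts cannot steer cuts or fits toward an expected value.
secret_offset = float(rng.uniform(-5.0, 5.0))
blinded = measurements + secret_offset

# Analysts develop and freeze their pipeline on `blinded` only.
# Unblinding happens exactly once, after the analysis is locked.
unblinded_mean = blinded.mean() - secret_offset

print(f"unblinded mean: {unblinded_mean:.2f}")
```

The point of the design is that no analysis decision can depend on whether the result agrees with the presumed hypothesis, because nobody knows the true value until the offset is revealed.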
 
I concur, but I see constraining the effects of human bias on the output as a motive for using big data analysis tools. I am less concerned with raw data than with data which has been 'cleaned' to filter out systematics.
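For concreteness, here is a toy sketch of what I mean by 'cleaning': subtract a modeled systematic, then reject outliers by iterative sigma-clipping. The drift model, the glitches, and every number are invented for illustration, using numpy only.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy raw data: a flat true signal plus a slow instrumental drift
# (the systematic) and a handful of glitch outliers.
t = np.linspace(0.0, 10.0, 1000)
drift = 0.05 * t
raw = 1.0 + drift + rng.normal(0.0, 0.05, t.size)
raw[::100] += 2.0                      # inject 10 glitches

# Step 1: fit and subtract the modeled systematic (here a linear drift).
trend = np.polyval(np.polyfit(t, raw, deg=1), t)
residual = raw - trend

# Step 2: iterative 3-sigma clipping to reject the glitch outliers.
cleaned = residual.copy()
for _ in range(3):
    mu, sigma = cleaned.mean(), cleaned.std()
    cleaned = cleaned[np.abs(cleaned - mu) < 3.0 * sigma]

print(f"rejected {residual.size - cleaned.size} of {residual.size} samples")
```

Of course the cleaning choices themselves (which trend to fit, what sigma threshold to use) are exactly where bias can creep back in, which is why they should be fixed before looking at the science result.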
 
Likes: Buzz Bloom
