How to calculate the right chi-square

  • Context: Undergrad 
  • Thread starter: BillKet

Discussion Overview

The discussion revolves around calculating the chi-square statistic and its associated p-value when analyzing simulated data sets, specifically focusing on how the choice of binning affects these calculations. Participants explore the implications of binning on statistical analysis in the context of background and signal data.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant seeks guidance on how to account for the number of bins when calculating the chi-square statistic and p-value.
  • Another participant emphasizes that the chi-square statistic depends on the number of cells in the contingency table and asks how the expected number of observations is computed.
  • A participant queries the meaning of "expected number," suggesting it may refer to the number of events.
  • Clarification is provided that "expected number" means the expected value of a random variable, with a reference to Pearson's chi-square test.
  • There is a request for clarification on what is meant by "the number of cuts" and the nature of the data being analyzed.

Areas of Agreement / Disagreement

Participants express differing views on the interpretation of expected values and the implications of binning on statistical results. The discussion remains unresolved regarding the best approach to account for the number of bins in chi-square calculations.

Contextual Notes

Participants have not fully defined the assumptions regarding the data sets or the specific methods used for calculating expected values, leaving some aspects of the discussion open to interpretation.

BillKet
Hello! I have some simulated background-only data and some simulated background-plus-signal data. After some cuts I end up with a histogram for each of the two sets and I want to calculate the chi-square (and hence the p-value for a signal actually being present in the background-plus-signal data). However, it seems like the p-value changes quite a lot when changing the number of bins I use to bin my data. How do I take the number of bins into account when calculating the chi-square? Thank you!
 
BillKet said:
How do I take the number of bins into account when calculating the chi-square?
Are you actually asking that question? The chi-square statistic is a function of the number of "cells" in the contingency table.

However, it seems like the p-value changes quite a lot when changing the number of bins I use to bin my data.

A usable null hypothesis requires that you be able to compute the expected number of observations that fall in each cell. How are you computing this expected number? How does it change when you change the number of bins? (Perhaps this calculation is done by using the simulated data. If so, exactly how is the simulated data used?)
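
To make the dependence on binning concrete, here is a minimal sketch; it is not the original poster's analysis. It assumes a hypothetical uniform background shape on [0, 1) whose expected counts are known exactly and are normalised to the observed total, so the degrees of freedom are the number of bins minus one. The toy sample, the Gaussian bump standing in for a signal, and all function names (chi2_sf, histogram, pearson_chi2, binned_test) are made up for illustration.

```python
import math
import random


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer dof >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-y) * sum_{j=0}^{dof/2 - 1} y^j / j!
        term, total = 1.0, 1.0
        for j in range(1, dof // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd dof: erfc(sqrt(y)) plus a finite sum over half-integer powers of y
    total = math.erfc(math.sqrt(y))
    term = math.exp(-y) * math.sqrt(y) / math.gamma(1.5)
    for j in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= y / (j + 0.5)
    return total


def histogram(values, n_bins, lo=0.0, hi=1.0):
    """Count values in n_bins equal-width bins on [lo, hi); values outside are dropped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts


def pearson_chi2(observed, expected):
    """Pearson's statistic: sum over bins of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def binned_test(values, n_bins):
    """Test the sample against the assumed uniform null with the given binning."""
    observed = histogram(values, n_bins)
    n_total = sum(observed)
    # Assumption: uniform null, expected counts normalised to the observed total
    expected = [n_total / n_bins] * n_bins
    chi2 = pearson_chi2(observed, expected)
    dof = n_bins - 1  # one constraint from fixing the total
    return chi2, dof, chi2_sf(chi2, dof)


# Toy data (hypothetical): flat background plus a narrow bump near 0.5
rng = random.Random(1)
sample = [rng.random() for _ in range(1900)]
sample += [rng.gauss(0.5, 0.02) for _ in range(100)]

for n_bins in (5, 10, 20, 50):
    chi2, dof, p = binned_test(sample, n_bins)
    print(f"bins={n_bins:3d}  chi2={chi2:8.2f}  dof={dof:3d}  p={p:.3g}")
```

Under these assumptions the number of bins enters twice: it changes the statistic itself and it sets the degrees of freedom of the reference distribution. A narrow feature can be diluted by wide bins, while very fine binning leaves few expected counts per bin, where the chi-square approximation is usually considered unreliable (a common rule of thumb asks for at least about 5 expected counts per bin).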
 
Stephen Tashi said:
Are you actually asking that question? The chi-square statistic is a function of the number of "cells" in the contingency table.
A usable null hypothesis requires that you be able to compute the expected number of observations that fall in each cell. How are you computing this expected number? How does it change when you change the number of bins? (Perhaps this calculation is done by using the simulated data. If so, exactly how is the simulated data used?)
What do you mean by expected number? Isn't that the number of events?
 
BillKet said:
What do you mean by expected number? Isn't that the number of events?

I mean the "expected number" in the sense of the expected value of a random variable.

Have you read an article about Pearson's chi-square test or whatever variant of the chi-square test you want to use? For example, in the Wikipedia article https://en.wikipedia.org/wiki/Pearson's_chi-squared_test, the expected number of counts in a cell is denoted as "##E_i##".
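
For reference, the standard form of Pearson's statistic is written out below. The symbols ##O_i## (observed count in cell ##i##) and ##k## (number of cells) are notation introduced here and do not appear in the thread.

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

If the ##E_i## are fully specified by the null hypothesis and normalised to the observed total, this statistic approximately follows a chi-square distribution with ##k - 1## degrees of freedom, so the reference distribution itself depends on the number of bins.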
 
BillKet said:
After some cuts I end up with a histogram for each of the two sets and I want to calculate the chi-square (and hence the p-value for a signal actually being present in the background-plus-signal data).

What do you mean by the number of cuts? And what is the nature of the data?
 
