How to calculate the right chisquare

BillKet · Jan 12, 2020

Hello! I have some simulated background-only data and some simulated background-plus-signal data. After some cuts I end up with a histogram for each of the two sets, and I want to calculate the chi-square (and hence the p-value for a signal actually being present in the background-plus-signal data). However, it seems like the p-value changes quite a lot when changing the number of bins I use to bin my data. How do I take the number of bins into account when calculating the chi-square? Thank you!

Stephen Tashi · Jan 12, 2020

BillKet said:

How do I take the number of bins into account when calculating the chi-square?

Are you actually asking that question? The chi-square statistic is a function of the number of "cells" in the contingency table.

BillKet said:

However, it seems like the p-value changes quite a lot when changing the number of bins I use to bin my data.

A usable null hypothesis requires that you be able to compute the expected number of observations that fall in each cell. How are you computing this expected number? How does it change when you change the number of bins? (Perhaps this calculation is done by using the simulated data. If so, exactly how is the simulated data used?)
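To make the question concrete, here is a minimal sketch of one possible setup, not necessarily what the original poster is doing. It assumes the background-only histogram supplies the expected counts, that those expected counts are treated as exactly known, that the bins are independent, and that no parameters are fitted, so the degrees of freedom equal the number of bins. The counts and the helper names (`chi2_sf`, `pearson_chi2`, `rebin`) are hypothetical and chosen only for illustration.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability P(X > x) for a chi-square variable with dof degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values for dof = 2 (even case) and dof = 1 (odd case).
    if dof % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: sf(k + 2) = sf(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < dof:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def pearson_chi2(observed, expected):
    """Return (chi2, dof), assuming fixed expected counts and no fitted parameters."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    chi2 = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("every bin needs a positive expected count")
        chi2 += (o - e) ** 2 / e
    return chi2, len(observed)

def rebin(counts, factor):
    """Merge each run of `factor` adjacent bins into one bin."""
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]

# Hypothetical histograms: expected counts from background-only simulation,
# observed counts from background-plus-signal simulation.
expected = [50.0, 40.0, 30.0, 20.0, 12.0, 8.0, 5.0, 5.0]
observed = [52, 38, 33, 31, 20, 9, 4, 6]

for factor in (1, 2, 4):
    obs_r, exp_r = rebin(observed, factor), rebin(expected, factor)
    chi2, dof = pearson_chi2(obs_r, exp_r)
    print(f"bins={dof}  chi2={chi2:.3f}  p={chi2_sf(chi2, dof):.4g}")
```

Merging bins changes the expected count in each cell, the value of the statistic, and the degrees of freedom all at once, so under these assumptions the p-value is expected to move with the binning. If the expected counts were instead normalized to the observed total, the degrees of freedom would be one fewer than the number of bins.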

BillKet · Jan 12, 2020

Stephen Tashi said:

Are you actually asking that question? The chi-square statistic is a function of the number of "cells" in the contingency table.
A usable null hypothesis requires that you be able to compute the expected number of observations that fall in each cell. How are you computing this expected number? How does it change when you change the number of bins? (Perhaps this calculation is done by using the simulated data. If so, exactly how is the simulated data used?)

What do you mean by expected number? Isn't that the number of events?

Stephen Tashi · Jan 12, 2020

BillKet said:

What do you mean by expected number? Isn't that the number of events?

I mean the "expected number" in the sense of the expected value of a random variable.

Have you read an article about Pearson's chi-square test, or whatever variant of the chi-square test you want to use? For example, in the Wikipedia article https://en.wikipedia.org/wiki/Pearson's_chi-squared_test, the expected number of counts in a cell is denoted as "##E_i##".
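For reference, in the notation of that article, with ##O_i## the observed count and ##E_i## the expected count in cell ##i## out of ##n## cells, Pearson's statistic is

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

Both the sum and the number of degrees of freedom used to convert it into a p-value depend on ##n##, so the number of bins enters the calculation directly.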

gleem · Jan 12, 2020

BillKet said:

After some cuts I end up with a histogram for each of the two sets, and I want to calculate the chi-square (and hence the p-value for a signal actually being present in the background-plus-signal data).

What do you mean by the number of cuts? And what is the nature of the data?

