How to calculate the right chi-square

  • Thread starter BillKet
In summary: The thread starter wants to calculate the chi-square and the p-value for a signal in simulated background-plus-signal data, but finds that the p-value changes when the number of bins used to histogram the data is changed, and asks how to take the number of bins into account. The replies ask how the expected number of observations in each cell is computed and how it changes with the number of bins, clarify that "expected number" means the expected value of a random variable, point to Pearson's chi-square test, and ask about the cuts and the nature of the data.
  • #1
BillKet
Hello! I have some simulated background-only data and some simulated background-plus-signal data. After some cuts I end up with a histogram for each of the two sets, and I want to calculate the chi-square (and hence the p-value for a signal actually being present in the background-plus-signal simulated data). However, it seems like the p-value changes quite a lot when I change the number of bins I use to bin my data. How do I take the number of bins into account when calculating the chi-square? Thank you!
 
  • #2
BillKet said:
How do I take the number of bins into account when calculating the chi-square?
Are you actually asking that question? The chi-square statistic is a function of the number of "cells" in the contingency table.

BillKet said:
However, it seems like the p-value changes quite a lot when changing the number of bins I use to bin my data.

A usable null hypothesis requires that you be able to compute the expected number of observations that fall in each cell. How are you computing this expected number? How does it change when you change the number of bins? (Perhaps this calculation is done by using the simulated data. If so, exactly how is the simulated data used?)
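To make the question concrete, here is a minimal sketch of one common way expected counts could be obtained from a background-only simulation. It is an assumption about the setup, not something stated in the thread: the null hypothesis is taken to be "background only", the simulated background histogram is rescaled to the number of events in the sample being tested, and the statistical uncertainty of the simulation itself is neglected. The function names and the toy numbers are hypothetical.

```python
from bisect import bisect_right

def histogram(values, edges):
    """Count how many values fall in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts

def expected_counts(background_sim, n_observed, edges):
    """Scale the background-only histogram to the size of the tested sample.

    Assumption: "background only" is the null hypothesis, and the simulated
    sample is large enough that its own fluctuations can be ignored.
    """
    sim_counts = histogram(background_sim, edges)
    scale = n_observed / sum(sim_counts)
    return [scale * c for c in sim_counts]

# Hypothetical toy values, not taken from the thread.
background_sim = [0.1, 0.2, 0.4, 0.6, 0.7, 0.9, 1.1, 1.3, 1.6, 1.9]
edges_2 = [0.0, 1.0, 2.0]            # two bins
edges_4 = [0.0, 0.5, 1.0, 1.5, 2.0]  # four bins over the same range

expected_2 = expected_counts(background_sim, 50, edges_2)
expected_4 = expected_counts(background_sim, 50, edges_4)
```

The expected counts change with the bin edges, so the same edges have to be used for the observed histogram. With this kind of normalisation to the observed total, the number of degrees of freedom is usually taken as the number of bins minus one, which is where the binning enters the p-value.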
 
  • #3
Stephen Tashi said:
Are you actually asking that question? The chi-square statistic is a function of the number of "cells" in the contingency table.
A usable null hypothesis requires that you be able to compute the expected number of observations that fall in each cell. How are you computing this expected number? How does it change when you change the number of bins? (Perhaps this calculation is done by using the simulated data. If so, exactly how is the simulated data used?)
What do you mean by expected number? Isn't that the number of events?
 
  • #4
BillKet said:
What do you mean by expected number? Isn't that the number of events?

I mean the "expected number" in the sense of the expected value of a random variable.

Have you read an article about Pearson's chi-square test or whatever variant of the chi-square test you want to use? For example, in the Wikipedia article https://en.wikipedia.org/wiki/Pearson's_chi-squared_test, the expected number of counts in a cell is denoted as "##E_i##".
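For reference, Pearson's statistic written with that symbol is shown below. The symbols ##O_i## (observed count in cell ##i##) and ##n## (number of cells) are introduced here for illustration:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

Each ##E_i## is the expected value of the count in cell ##i## under the null hypothesis, so it need not be an integer and in general differs from the number of events actually observed in that cell.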
 
  • #5
BillKet said:
After some cuts I end up with a histogram for each of the two sets, and I want to calculate the chi-square (and hence the p-value for a signal actually being present in the background-plus-signal simulated data).

What do you mean by the number of cuts? And what is the nature of the data?
 

1. What is a chi-square test?

A chi-square test is a statistical method used to determine whether there is a significant difference between the observed frequencies and the frequencies expected under a null hypothesis, for one or more categorical variables. It is commonly used to check whether data are consistent with an assumed distribution or whether two categorical variables are related.

2. How do I calculate the chi-square value?

The chi-square value is calculated by subtracting the expected frequency from the observed frequency, squaring the difference, and dividing it by the expected frequency. This process is repeated for each category and then the values are summed to get the overall chi-square value.
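As a minimal sketch of that recipe, the function below sums (observed - expected)^2 / expected over the categories. The function name and the counts are hypothetical and chosen only for illustration.

```python
def chi_square(observed, expected):
    """Pearson's chi-square: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for four categories with equal expected frequencies.
observed = [18, 22, 27, 33]
expected = [25.0, 25.0, 25.0, 25.0]

# (49 + 9 + 4 + 64) / 25 = 5.04
statistic = chi_square(observed, expected)
```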

3. What is the significance level in a chi-square test?

The significance level, also known as alpha (α), is the predetermined threshold against which the p-value is compared to decide whether the result is statistically significant. It is typically set at 0.05, meaning that, if the null hypothesis is true, there is a 5% chance of rejecting it anyway (a false positive).

4. How do I interpret the results of a chi-square test?

The results of a chi-square test are typically presented in a chi-square table, which shows the calculated chi-square value, degrees of freedom, and the associated p-value. If the p-value is less than the significance level, it indicates that there is a significant difference between the observed and expected frequencies, and the null hypothesis can be rejected.
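As a rough illustration of how the p-value follows from the chi-square value and the degrees of freedom, the sketch below evaluates the upper-tail probability of the chi-square distribution using only the Python standard library. The function name `chi2_sf` is hypothetical, and the example numbers are made up. In practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-square variable with
    `dof` degrees of freedom (a positive integer)."""
    if dof < 1 or x < 0:
        raise ValueError("need dof >= 1 and x >= 0")
    half = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, half)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < dof / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Hypothetical example: chi-square value 5.04 from four categories with no
# fitted parameters, so 4 - 1 = 3 degrees of freedom.
alpha = 0.05
p_value = chi2_sf(5.04, 3)
reject_null = p_value < alpha
```

This simple recurrence is adequate for a few hundred degrees of freedom at most, because `math.gamma` overflows for large arguments.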

5. What are the assumptions of a chi-square test?

The assumptions of a chi-square test include: 1) the data are collected from a random sample, 2) the expected frequency for each category is large enough for the chi-square approximation to hold (a common rule of thumb is at least 5), and 3) the observations are independent of each other. If these assumptions are not met, the results of the chi-square test may not be reliable.
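One common way to deal with categories whose expected frequency is too small is to merge them with their neighbours before computing the statistic. The sketch below is only one possible scheme (merging left to right and folding any leftover into the last merged bin). The function name, the threshold default, and the counts are hypothetical.

```python
def merge_sparse_bins(observed, expected, min_expected=5.0):
    """Merge adjacent bins, left to right, until every merged bin has an
    expected frequency of at least `min_expected`."""
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp = 0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp = 0, 0.0
    # Fold whatever is left over into the last merged bin, if there is one.
    if acc_obs > 0 or acc_exp > 0:
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp

# Hypothetical histogram with a sparse tail.
observed = [12, 9, 3, 1, 0, 2]
expected = [11.0, 8.0, 4.0, 2.0, 1.5, 0.5]
obs_merged, exp_merged = merge_sparse_bins(observed, expected)
```

After merging, the degrees of freedom should be counted from the merged bins, not from the original ones.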
