Statistical uncertainties of sub-backgrounds

In summary, the code in question computes the statistical or systematic uncertainty on the total background from the uncertainties of its sub-backgrounds. It does so by summing the individual relative uncertainties, each weighted by that sub-background's share of the total number of events, which amounts to adding the absolute uncertainties linearly. The alternative discussed in the thread is to take the square root of the sum of the squared absolute uncertainties, which is the standard result when the sub-background uncertainties are independent standard deviations.
  • #1
ChrisVer
I was given a code that generates the statistical or systematic uncertainty of different sub-backgrounds to the total background...
Let's say that the total is [itex]N[/itex] and that each sub-background has [itex]N_i[/itex] events and a relative uncertainty [itex]\delta N_i[/itex] (= err/Nevt).
What the code does is evaluate the uncertainty on the total bkg by:
[itex]\delta N = \delta N_1 \frac{N_1}{N} + ... + \delta N_M \frac{N_M}{N}[/itex]
Is that correct?
why not:
[itex]\delta N = \frac{\sqrt{N_1^2 \delta N_1^2 + ... + N_M^2 \delta N_M^2}}{N}[/itex]
?
 
  • #2
Let me quantify it... let's take [itex]\delta N_i = \delta N =0.05[/itex] (so 5% uncertainty on all sub-backgrounds),
and let's only take the case of 2 sub-backgrounds B1, B2, with [itex]N_1 = 50[/itex] and [itex]N_2=70[/itex].
The total background will then be [itex]N_T=120[/itex].

Following the algorithm in the given code, I calculate that the total uncertainty is:
[itex]\delta N_T= \frac{\delta N}{N_T} (N_1 + N_2) = 0.05[/itex] or 5%,
while using mine:
[itex] \delta N_T =\frac{\sqrt{\delta N^2 (N_1^2+N_2^2)}}{N_T} = 0.036[/itex] or 3.6%.
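The two numbers above can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration and are not taken from the code under discussion.

```python
import math

def relative_uncertainty_linear(yields, rel_uncs):
    """Linear sum of the absolute errors N_i * dN_i, divided by the total yield."""
    total = sum(yields)
    return sum(n * d for n, d in zip(yields, rel_uncs)) / total

def relative_uncertainty_quadrature(yields, rel_uncs):
    """Quadrature sum of the absolute errors N_i * dN_i, divided by the total yield."""
    total = sum(yields)
    return math.sqrt(sum((n * d) ** 2 for n, d in zip(yields, rel_uncs))) / total

yields = [50.0, 70.0]     # N_1, N_2
rel_uncs = [0.05, 0.05]   # 5% relative uncertainty on each sub-background

print(round(relative_uncertainty_linear(yields, rel_uncs), 3))      # 0.05
print(round(relative_uncertainty_quadrature(yields, rel_uncs), 3))  # 0.036
```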
 
  • #3
ChrisVer said:
I was given a code that generates the statistical or systematic uncertainty of different sub-backgrounds to the total background...

"Uncertainty" is an ambiguous term. Are you calculating "uncertainty" in the sense of the standard deviation of the sum of two random variables? Or are you calculating "worst case bounds" for a measurement of a sum ?
 
  • #4
my background uncertainties correspond to 1-standard deviation.
 
  • #5
ChrisVer said:
my background uncertainties correspond to 1-standard deviation.

It would help me if we state the question using terminology that is standard for statistics. What one can do with sample data is "estimate" population parameters. For example, we can't "calculate" the population mean from a sample of data. We "calculate" an estimator of the population mean.

I think you are asking about how to estimate something, but posing the question in terms of population parameters. First, let's clarify what that question is in terms of population parameters.

Assume we have a random variable ##X## with standard deviation ##\sigma_X## and mean ##\mu_X##, which I'll also write as ##\bar{X}##. We can define a random variable ##E_X = \frac{X - \bar{X}} {\bar{X}}##. The standard deviation of ##E_X## is ##\sigma_{E_X} = \frac{\sigma_X}{\bar X}##. The mean of ##E_X## is ##\bar{E_X} = 0 ##. (If we were to define ##E_X## as ##\frac{|X - \bar{X}|} {\bar{X}} ## then the mean wouldn't be zero.)

Does ##\delta_X## denote ##\sigma_{E_X}## ? - or does it denote ##\sigma_X##?

In a similar manner, define a random variable ##Y## with associated parameters ##\mu_Y, \sigma_Y## and an associated variable ##E_Y##.

Are you trying to estimate the standard deviation of the random variable ##T = X + Y## (i.e. a single realization of both ##X## and ##Y##) ?

Or are you trying to find the standard deviation of the random variable ##E_T = \frac{T - \bar{T}}{\bar{T}}## ?

Or are you trying to find the standard deviation of a random variable that involves summing more than one realization of ##X## and ##Y## ?
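The first two options can be compared numerically with a small Monte Carlo sketch. It assumes, purely for illustration, that ##X## and ##Y## are independent and Gaussian with the means and standard deviations of the earlier 5% example (50 ± 2.5 and 70 ± 3.5); the function name `simulate_total` is hypothetical.

```python
import random
import statistics

def simulate_total(mu_x, sigma_x, mu_y, sigma_y, n_trials=100_000, seed=1):
    """Draw independent Gaussian X and Y, form T = X + Y, and return
    (standard deviation of T, standard deviation of E_T = (T - mean_T) / mean_T)."""
    rng = random.Random(seed)
    totals = [rng.gauss(mu_x, sigma_x) + rng.gauss(mu_y, sigma_y)
              for _ in range(n_trials)]
    mean_t = mu_x + mu_y  # population mean of T
    sigma_t = statistics.pstdev(totals)
    sigma_e_t = statistics.pstdev([(t - mean_t) / mean_t for t in totals])
    return sigma_t, sigma_e_t

sigma_t, sigma_e_t = simulate_total(50.0, 2.5, 70.0, 3.5)
print(sigma_t, sigma_e_t)
```

Under these assumptions the two standard deviations differ only by the constant factor ##\bar{T}##, so knowing one fixes the other.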
 
  • #6
I don't quite understand the discussion on the mean but let's go with that... in fact I am a little afraid of talking about random variables when it comes to yields, but I suppose there is nothing wrong with regarding them as such...
So in that discussion the numbers [itex]N_i[/itex] I give are the means, and the relative uncertainties are the standard deviations divided by the means. In particular, the relative uncertainty of 0.05 I mentioned for the 50 events corresponds to a std of 0.05*50 = 2.5 events. So my [itex]\delta_X[/itex] denotes [itex]\sigma_{E_X}[/itex]... but in general it can be messy to deal with this uncertainty when the backgrounds are weighted by their number of events (in particular, 0.05 for a bkg of 1000 events does not have the same effect as for a bkg of 10,000 events)... for that reason it's easier to move to [itex]\sigma_X[/itex] and do the calculation with that... I think that there is nothing wrong with the equation I'm proposing; I am trying to see if the one that was in the code makes sense and how it's motivated (to be honest, up to now, I see it as a bug).

Stephen Tashi said:
Are you trying to estimate the standard deviation of the random variable ##T = X + Y## (i.e. a single realization of both ##X## and ##Y##) ?

Or are you trying to find the standard deviation of the random variable ##E_T = \frac{T - \bar{T}}{\bar{T}}## ?
I don't understand the difference between those two questions... if I knew the answer to the first, I would know the answer to the second and vice versa... But I guess it's the second that I am looking for... and that's why I go through the first:
I get the standard deviation [itex]\sigma_T[/itex] and then divide by [itex]\bar{T}[/itex]. In particular, if I knew [itex]\sigma_T[/itex] I would know my uncertainty as a number of events, so I could say 120+/-3 ... By dividing by [itex]\bar{T}[/itex] I get the relative uncertainty and can say 120+/- 2.5%.

Stephen Tashi said:
Or are you trying to find the standard deviation of a random variable that involves summing more than one realization of ##X## and ##Y## ?
I don't think so, since I have a value for X and Y ?
 
  • #7
ChrisVer said:
I am trying to see if the one that was in the code makes sense and how it's motivated

It is correct for giving the bound ##\delta N ## on the total observed count if we regard the ##\delta N_i ## as bounds on the observed counts of the ##N_i##. Being given that the bound on a count is (definitely) plus or minus 10 is not the same as being given that the standard deviation of the count as a random variable is 10. For example, if the observed count is 100 and you know the bound is 10, then the "true" count is definitely between 90 and 110. However, if the observed count is 100 and you regard it as a random variable, then you can't state a definite probability for it being between 90 and 110 unless you also assume 100 is the population (i.e. "true") mean value of the random variable. So you have to assume you "lucked out" and observed a count that was exactly equal to the population mean.
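To illustrate the distinction, here is a minimal sketch that treats the errors as hard bounds and checks every corner case in which each count is off by exactly plus or minus its bound. The function names and the numbers (2.5 and 3.5 events, from the 5% example earlier in the thread) are illustrative assumptions.

```python
import itertools
import math

def worst_case_bound(bounds):
    """Largest possible |error on the total| when each individual error
    is known to lie within [-b_i, +b_i]; found by checking all sign corners."""
    return max(
        abs(sum(sign * b for sign, b in zip(signs, bounds)))
        for signs in itertools.product((-1, 1), repeat=len(bounds))
    )

def quadrature_sum(sigmas):
    """Standard deviation of a sum of independent variables."""
    return math.sqrt(sum(s * s for s in sigmas))

errors = [2.5, 3.5]
print(worst_case_bound(errors))          # 6.0, the linear sum
print(round(quadrature_sum(errors), 2))  # 4.3, the independent-std result
```

Read as bounds, the errors combine linearly; read as independent standard deviations, the same numbers combine in quadrature.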
 
  • #8
If sub-backgrounds arise from normally distributed random phenomena then the uncertainty in the total background as expressed by the square root of the variance should be

##\text{s.d.} = \sigma = \sqrt{\sum_i dN_i^2}##

I do not understand the utility of this program that you are using. Is it for simulation or what?
 
  • #9
It's for generating a latex table that contains the error values for the several backgrounds and the total.
In particular, it reads the background yields and errors and generates the table of them, and it also determines the total background yield and its error ... I found discrepancies between that last error and errors I had previously calculated myself, and ended up looking into it.
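For readers who have not seen such a tool, a rough sketch of what a table generator of this kind might look like is given below. It is not the code discussed in this thread: the function name `make_yield_table`, the input layout, and the choice of a quadrature sum for the total (which assumes independent errors) are all assumptions made for illustration.

```python
import math

def make_yield_table(backgrounds):
    """Build a LaTeX tabular from (name, yield, absolute_error) tuples.
    The total error is the quadrature sum, i.e. errors assumed independent."""
    lines = [r"\begin{tabular}{lr}", r"Background & Yield \\", r"\hline"]
    for name, n, err in backgrounds:
        lines.append(rf"{name} & ${n:.1f} \pm {err:.1f}$ \\")
    total = sum(n for _, n, _ in backgrounds)
    total_err = math.sqrt(sum(err ** 2 for _, _, err in backgrounds))
    lines.append(r"\hline")
    lines.append(rf"Total & ${total:.1f} \pm {total_err:.1f}$ \\")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)

print(make_yield_table([("B1", 50.0, 2.5), ("B2", 70.0, 3.5)]))
```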
 
  • #10
That is what it does, but what is the ultimate usefulness of this? Why would I need this info?
 
  • #11
I answered to
gleem said:
is it for simulation or what?
which is no.
In post 1 I write how it does what it's supposed to do.
 
  • #12
I basically disagree with the method it uses for determining uncertainties of the net background, based on my assumptions and the information you presented.
 
  • #13
gleem said:
If sub-backgrounds arise from normally distributed random phenomena then the uncertainty in the total background as expressed by the square root of the variance should be

##\text{s.d.} = \sigma = \sqrt{\sum_i dN_i^2}##

True, but relative errors aren't additive.

##\frac{ (X + Y) - (\mu_X + \mu_Y)}{\mu_X + \mu_Y} \ne \frac{X-\mu_X}{\mu_X} + \frac{Y - \mu_Y}{\mu_Y} ##
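A short derivation may make this concrete. It is a sketch that assumes ##X## and ##Y## are independent, and the weights ##w_X, w_Y## are shorthand introduced here rather than notation from the thread:

```latex
E_T = \frac{(X+Y) - (\mu_X+\mu_Y)}{\mu_X+\mu_Y}
    = w_X \frac{X-\mu_X}{\mu_X} + w_Y \frac{Y-\mu_Y}{\mu_Y},
\qquad
w_X = \frac{\mu_X}{\mu_X+\mu_Y},\quad
w_Y = \frac{\mu_Y}{\mu_X+\mu_Y}

\sigma_{E_T}^2 = w_X^2\,\sigma_{E_X}^2 + w_Y^2\,\sigma_{E_Y}^2
\quad\text{(independent } X, Y\text{)}
```

So the relative error of the sum is a yield-weighted combination of the individual relative errors rather than their plain sum.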

So a basic question is how to interpret the ##\delta## notation. Is ##\delta N_i## the standard deviation of the i-th count or is it the standard deviation of the relative error in the i-th count ?

gleem said:
I do not understand the utility of this program that you are using. Is it for simulation or what?

If even ChrisVer doesn't know exactly how the data is generated, we can't give good advice.
 
  • #14
Stephen Tashi said:
If even ChrisVer doesn't know exactly how the data is generated, we can't give good advice.
I know how the data is generated [they are Monte Carlo]. But I find this of low importance to mention, since what I am referring to is a separate thing.
The numbers I am using are entries in several bins of some distribution and their corresponding systematic and statistical errors.
So if you have some backgrounds (like photons, jets, W etc), the total number of events in each bin is supposed to be the sum of the "mean" numbers of expected events of each background, and the error on it is supposed to be the sum in quadrature of the separate errors, not just the sum of the errors.

Stephen Tashi said:
is how to interpret the ##\delta## notation. Is ##\delta N_i## the standard deviation of the i-th count or is it the standard deviation of the relative error in the i-th count ?
I am not sure I understand this question very well, but I will try to explain what [itex]\delta N_i[/itex] stands for...
For the statistical uncertainty, the relative uncertainty is [itex]\delta N_i = \frac{\sigma_i}{N_i}[/itex], where [itex]\sigma_i[/itex] is the standard deviation (or the error) of the [itex]N_i[/itex] yield ([itex]i[/itex] runs over the several backgrounds, so i=photon, jet, W etc)... that means that if you repeated the experiment many times, the number of events of background [itex]i[/itex] you'd measure would be found (with about 68% probability) in the range [itex][N_i - \sigma_i , N_i +\sigma_i][/itex]. The relative deviation from [itex]N_i[/itex], instead of following a Gaussian distribution with mean [itex]N_i[/itex] and std [itex]\sigma_i[/itex], follows a Gaussian distribution with mean 0 and standard deviation [itex]\delta N_i[/itex].
The question then becomes what can you say about the total background, given that you have [itex]N_i,\sigma_i[/itex] (or [itex]\delta N_i[/itex]).
What I say:
The total number of events will obviously be the sum of the individual components: [itex]N_T= \sum_i N_i[/itex].
The standard deviation of the sum of the individual components is supposed to be [itex]\sigma_T = \sqrt{\sum_i \sigma_i^2}[/itex]
Or the relative uncertainty [itex]\delta N_T = \frac{\sigma_T}{N_T}= \frac{\sqrt{\sum_i N_i^2 \delta N_i^2} }{N_T}[/itex]

What I read in the code under question:
The total number of events is the sum of the individual components: [itex]N_T= \sum_i N_i[/itex].
The standard deviation of the sum of the individual components is supposed to be [itex]\sigma_T = \sum_i \sigma_i[/itex]
Which gives the relative uncertainty [itex]\delta N_T = \frac{\sigma_T}{N_T} = \frac{\sum_i N_i \delta N_i}{N_T}[/itex]
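One possible reading of the difference between the two recipes, offered as an assumption rather than as the documented intent of the code: the general variance formula [itex]\sigma_T^2 = \sum_{i,j} \rho_{ij} \sigma_i \sigma_j[/itex] reduces to the quadrature sum when the sub-background errors are uncorrelated, and to the linear sum when they are fully correlated. The sketch below uses a single common correlation coefficient `rho` for all pairs, which is a simplification, and the function name `total_sigma` is hypothetical.

```python
import math

def total_sigma(sigmas, rho):
    """Standard deviation of a sum whose terms all share one pairwise
    correlation coefficient rho (rho_ii = 1, rho_ij = rho for i != j)."""
    variance = 0.0
    for i, s_i in enumerate(sigmas):
        for j, s_j in enumerate(sigmas):
            variance += s_i * s_j * (1.0 if i == j else rho)
    return math.sqrt(variance)

yields = [50.0, 70.0]
sigmas = [0.05 * n for n in yields]   # absolute errors: 2.5 and 3.5 events
n_total = sum(yields)

print(round(total_sigma(sigmas, 0.0) / n_total, 3))  # 0.036, uncorrelated
print(round(total_sigma(sigmas, 1.0) / n_total, 3))  # 0.05, fully correlated
```

Whether the linear sum is a bug therefore depends on whether the errors being combined are independent (typical for statistical errors) or share a common source (possible for some systematic errors).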
 

1. What are sub-backgrounds in statistics?

Sub-backgrounds refer to the components of a data set that are considered as noise or background signal, rather than the main signal of interest. These sub-backgrounds can include instrumental noise, background radiation, or other sources of uncertainty that may affect the accuracy of the data.

2. How are sub-backgrounds quantified?

Sub-backgrounds can be quantified using statistical methods such as hypothesis testing, confidence intervals, and standard deviation. These techniques allow scientists to determine the range of uncertainty associated with the sub-backgrounds in a data set.

3. Why is it important to consider statistical uncertainties of sub-backgrounds?

It is important to consider the statistical uncertainties of sub-backgrounds because they can affect the accuracy and reliability of the data. By understanding and quantifying these uncertainties, scientists can make more informed decisions about the validity of their results and the conclusions drawn from them.

4. How can sub-backgrounds be minimized?

Sub-backgrounds can be minimized through various methods such as increasing sample sizes, improving instrumentation, and implementing data filtering techniques. These strategies can help reduce the impact of sub-backgrounds on the data and improve the overall accuracy of the results.

5. Are there any limitations to quantifying statistical uncertainties of sub-backgrounds?

Yes, there are limitations to quantifying statistical uncertainties of sub-backgrounds. These uncertainties are often based on assumptions and may not accurately represent the true uncertainties in the data. Additionally, some sub-backgrounds may be difficult to measure or quantify, making it challenging to accurately determine their impact on the data.
