A Statistical uncertainties of sub-backgrounds

ChrisVer · Jul 29, 2016

I was given code that computes the statistical or systematic uncertainty of the total background from those of the different sub-backgrounds.
Let's say that the total is ##N## and that each sub-background has ##N_i## events and relative uncertainty ##\delta N_i## (= error / number of events).
The code evaluates the uncertainty on the total background as:
$$\delta N = \delta N_1 \frac{N_1}{N} + \dots + \delta N_M \frac{N_M}{N}$$
Is that correct?
Why not
$$\delta N = \frac{\sqrt{N_1^2 \delta N_1^2 + \dots + N_M^2 \delta N_M^2}}{N}$$
instead?

ChrisVer · Jul 29, 2016

Let me quantify it. Take ##\delta N_i = \delta N = 0.05## for every ##i## (a 5% uncertainty on all sub-backgrounds),
and consider only the case of 2 sub-backgrounds, B1 and B2, with ##N_1 = 50## and ##N_2 = 70##.
The total background is then ##N_T = 120##.

Following the algorithm in the given code, the total relative uncertainty is:
$$\delta N_T = \frac{\delta N}{N_T} (N_1 + N_2) = 0.05 \text{ or } 5\%$$
while using mine:
$$\delta N_T = \frac{\sqrt{\delta N^2 (N_1^2 + N_2^2)}}{N_T} \approx 0.036 \text{ or } 3.6\%$$
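A minimal Python sketch of the two candidate formulas, applied to the two-background example above. The function names `rel_unc_linear` and `rel_unc_quadrature` are hypothetical, chosen for illustration; they are not taken from the code under discussion.

```python
import math

def rel_unc_linear(yields, rel_uncs):
    """Formula found in the code: each relative uncertainty weighted by N_i / N."""
    total = sum(yields)
    return sum(n * d for n, d in zip(yields, rel_uncs)) / total

def rel_unc_quadrature(yields, rel_uncs):
    """Proposed formula: absolute uncertainties N_i * dN_i added in quadrature."""
    total = sum(yields)
    return math.sqrt(sum((n * d) ** 2 for n, d in zip(yields, rel_uncs))) / total

yields = [50, 70]
rel_uncs = [0.05, 0.05]
print(f"linear: {rel_unc_linear(yields, rel_uncs):.4f}")          # linear: 0.0500
print(f"quadrature: {rel_unc_quadrature(yields, rel_uncs):.4f}")  # quadrature: 0.0358
```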

Stephen Tashi · Jul 29, 2016

ChrisVer said:

I was given code that computes the statistical or systematic uncertainty of the total background from those of the different sub-backgrounds.

"Uncertainty" is an ambiguous term. Are you calculating "uncertainty" in the sense of the standard deviation of the sum of two random variables? Or are you calculating "worst case bounds" for a measurement of a sum?

ChrisVer · Jul 29, 2016

My background uncertainties correspond to 1 standard deviation.

Stephen Tashi · Jul 30, 2016

ChrisVer said:

My background uncertainties correspond to 1 standard deviation.

It would help me if we state the question using terminology that is standard for statistics. What one can do with sample data is "estimate" population parameters. For example, we can't "calculate" the population mean from a sample of data. We "calculate" an estimator of the population mean.

I think you are asking about how to estimate something, but posing the question in terms of population parameters. First, let's clarify what that question is in terms of population parameters.

Assume we have a random variable ##X## with standard deviation ##\sigma_X## and mean ##\mu_X## (also written ##\bar{X}##). We can define a random variable ##E_X = \frac{X - \bar{X}}{\bar{X}}##. The standard deviation of ##E_X## is ##\sigma_{E_X} = \frac{\sigma_X}{\bar{X}}##. The mean of ##E_X## is ##\bar{E_X} = 0##. (If we were to define ##E_X## as ##\frac{|X - \bar{X}|}{\bar{X}}##, then the mean wouldn't be zero.)
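A quick Monte Carlo sketch of the relations ##\bar{E_X} = 0## and ##\sigma_{E_X} = \sigma_X / \bar{X}##. It assumes, for illustration only, that ##X## is Gaussian with mean 50 and standard deviation 2.5; the relations themselves do not depend on the Gaussian assumption, only on ##\bar{X} \neq 0## and a finite variance.

```python
import random
import statistics

random.seed(1)
mu_x, sigma_x = 50.0, 2.5  # assumed example values: mean 50, standard deviation 2.5

# Draw realizations of X and form the relative deviation E_X = (X - mu_X) / mu_X
samples = [random.gauss(mu_x, sigma_x) for _ in range(200_000)]
e_x = [(x - mu_x) / mu_x for x in samples]

print(statistics.fmean(e_x))   # close to 0
print(statistics.pstdev(e_x))  # close to sigma_x / mu_x = 0.05
```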

Does ##\delta_X## denote ##\sigma_{E_X}##, or does it denote ##\sigma_X##?

In a similar manner, define a random variable ##Y## with associated parameters ##\mu_Y, \sigma_Y## and an associated variable ##E_Y##.

Are you trying to estimate the standard deviation of the random variable ##T = X + Y## (i.e. a single realization of both ##X## and ##Y##)?

Or are you trying to find the standard deviation of the random variable ##E_T = \frac{T - \bar{T}}{\bar{T}}##?

Or are you trying to find the standard deviation of a random variable that involves summing more than one realization of ##X## and ##Y##?

ChrisVer · Jul 30, 2016

I don't quite understand the discussion about the mean, but let's go with that. In fact, I am a little wary of talking about random variables when it comes to yields, but I suppose there is nothing wrong with regarding them as such.
So in that discussion, the numbers ##N_i## I give are the means, and the relative uncertainties are the standard deviations divided by the means. In particular, the relative uncertainty of 0.05 I mentioned for the 50 events corresponds to a standard deviation of ##0.05 \times 50 = 2.5## events. So my ##\delta_X## denotes ##\sigma_{E_X}##. In general, though, this relative uncertainty can be messy to work with when the backgrounds are weighted by their numbers of events (in particular, 0.05 on a background of 1,000 events does not have the same effect as 0.05 on a background of 10,000 events). For that reason it's easier to move to ##\sigma_X## and do the calculation with it. I think there is nothing wrong with the equation I'm proposing; I am trying to see if the one that was in the code makes sense and how it's motivated (to be honest, so far I see it as a bug).
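The conversion from a relative to an absolute uncertainty is just ##\sigma_i = \delta N_i \, N_i##. A small sketch (the helper name `absolute_sigma` is hypothetical) showing why the same 5% corresponds to very different numbers of events for backgrounds of different sizes:

```python
def absolute_sigma(n_events, rel_unc):
    """Convert a relative uncertainty (sigma / N) into an absolute one, in events."""
    return rel_unc * n_events

print(absolute_sigma(50, 0.05))      # -> 2.5
print(absolute_sigma(1_000, 0.05))   # -> 50.0
print(absolute_sigma(10_000, 0.05))  # -> 500.0
```

Working with the absolute ##\sigma_i## keeps each background's contribution proportional to its size, which is why the quadrature sum is written in terms of ##N_i \delta N_i##.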

Stephen Tashi said:

Are you trying to estimate the standard deviation of the random variable ##T = X + Y## (i.e. a single realization of both ##X## and ##Y##)?

Or are you trying to find the standard deviation of the random variable ##E_T = \frac{T - \bar{T}}{\bar{T}}##?

I don't see a difference between those two questions: if I knew the answer to the first, I would know the answer to the second, and vice versa. But I guess it's the second one I am looking for, and that's why I go through the first:
I get the standard deviation ##\sigma_T## and then divide it by ##\bar{T}##. If I knew ##\sigma_T##, I would know my uncertainty in number of events, so I could say ##120 \pm 3##. By dividing by ##\bar{T}## I get the relative uncertainty and can say ##120 \pm 2.5\%##.

Stephen Tashi said:

Or are you trying to find the standard deviation of a random variable that involves summing more than one realization of ##X## and ##Y##?

I don't think so, since I have a single value for ##X## and one for ##Y##.

Stephen Tashi · Jul 31, 2016

ChrisVer said:

I am trying to see if the one that was in the code makes sense and how it's motivated

It is correct for giving the bound ##\delta N## on the total observed count if we regard the ##\delta N_i## as bounds on the observed counts ##N_i##. Being told that the bound on a count is (definitely) plus or minus 10 is not the same as being told that the standard deviation of the count, as a random variable, is 10. For example, if the observed count is 100 and you know the bound is 10, then the "true" count is definitely between 90 and 110. However, if the observed count is 100 and you regard it as a random variable, then you can't state a definite probability for it being between 90 and 110 unless you also assume that 100 is the population (i.e. "true") mean of the random variable. So you have to assume you "lucked out" and observed a count exactly equal to the population mean.
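The linear addition of hard bounds can be checked directly. This sketch assumes bounds of ±2.5 and ±3.5 events on the two counts from the earlier example (values chosen for illustration) and evaluates the sum at every corner of the allowed region; because the sum is linear in the counts, its extremes occur at those corners.

```python
from itertools import product

def worst_case_bound(deltas):
    """If each count lies within +/- delta_i of its observed value,
    the sum lies within +/- sum(delta_i): hard bounds add linearly."""
    return sum(deltas)

# Brute-force check: evaluate the sum at every corner of the box of allowed values
observed = [50, 70]
deltas = [2.5, 3.5]
corner_sums = [sum(o + s * d for o, s, d in zip(observed, signs, deltas))
               for signs in product((-1, 1), repeat=len(observed))]
print(min(corner_sums), max(corner_sums))  # 114.0 126.0
print(worst_case_bound(deltas))            # 6.0
```

This is the sense in which the code's linear sum is a valid answer: it is a guaranteed bound, not a one-standard-deviation uncertainty.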

gleem · Jul 31, 2016

If the sub-backgrounds arise from independent, normally distributed random phenomena, then the uncertainty in the total background, expressed as the square root of the variance, should be

$$\text{s.d.} = \sigma = \sqrt{\sum_i \delta N_i^2}$$

I do not understand the utility of this program you are using. Is it for simulation, or what?

ChrisVer · Jul 31, 2016

It's for generating a LaTeX table that contains the error values for the several backgrounds and for the total.
In particular, it reads the background yields and errors, generates a table of them, and determines the total background yield and its error. I found discrepancies between its total-error results and errors I had previously calculated myself, and ended up looking into it.
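For reference, a minimal sketch of what such a table generator might look like if it combined independent uncertainties in quadrature. The function `make_latex_table` and its input format (a mapping from background name to yield and absolute 1σ error) are hypothetical; the actual code under discussion is not shown in the thread.

```python
import math

def make_latex_table(backgrounds):
    """Build a LaTeX tabular of background yields and uncertainties.

    `backgrounds` maps a name to (yield, absolute 1-sigma uncertainty).
    The total uncertainty is the quadrature sum, which assumes the
    sub-background uncertainties are independent.
    """
    lines = [r"\begin{tabular}{lr}", r"\hline", r"Background & Yield \\", r"\hline"]
    for name, (n, err) in backgrounds.items():
        lines.append(rf"{name} & ${n:.1f} \pm {err:.1f}$ \\")
    total = sum(n for n, _ in backgrounds.values())
    total_err = math.sqrt(sum(err ** 2 for _, err in backgrounds.values()))
    lines += [r"\hline", rf"Total & ${total:.1f} \pm {total_err:.1f}$ \\",
              r"\hline", r"\end{tabular}"]
    return "\n".join(lines)

print(make_latex_table({"B1": (50.0, 2.5), "B2": (70.0, 3.5)}))
```

With the example yields, the total row reads ##120.0 \pm 4.3##; a linear sum of the errors would give ##120.0 \pm 6.0## instead.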

gleem · Jul 31, 2016

That is what it does, but what is its ultimate usefulness? Why would I need this information?

ChrisVer · Jul 31, 2016

I was answering

gleem said:

is it for simulation or what?

and the answer is no.
In post #1 I describe how it does what it's supposed to do.

gleem · Jul 31, 2016

Based on my assumptions and the information you presented, I basically disagree with the method it uses to determine the uncertainty of the net background.

Stephen Tashi · Jul 31, 2016

gleem said:

If the sub-backgrounds arise from independent, normally distributed random phenomena, then the uncertainty in the total background, expressed as the square root of the variance, should be

$$\text{s.d.} = \sigma = \sqrt{\sum_i \delta N_i^2}$$

True, but relative errors aren't additive.

##\frac{ (X + Y) - (\mu_X + \mu_Y)}{\mu_X + \mu_Y} \ne \frac{X-\mu_X}{\mu_X} + \frac{Y - \mu_Y}{\mu_Y} ##
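A numeric illustration, with assumed means ##\mu_X = 50##, ##\mu_Y = 70## and one hypothetical realization ##X = 55##, ##Y = 70##. It also checks the identity that does hold: the relative error of the sum is the ##\mu##-weighted average of the individual relative errors, ##E_T = \frac{\mu_X E_X + \mu_Y E_Y}{\mu_X + \mu_Y}##, which has the same form as the formula in the code. (That form carries over to standard deviations only if the errors are fully correlated.)

```python
mu_x, mu_y = 50.0, 70.0   # assumed population means
x, y = 55.0, 70.0         # one hypothetical realization

lhs = ((x + y) - (mu_x + mu_y)) / (mu_x + mu_y)   # relative error of the sum
rhs = (x - mu_x) / mu_x + (y - mu_y) / mu_y        # plain sum of relative errors
weighted = (mu_x * (x - mu_x) / mu_x + mu_y * (y - mu_y) / mu_y) / (mu_x + mu_y)

print(lhs, rhs, weighted)  # lhs ~ 0.0417, rhs = 0.1, weighted ~ 0.0417
```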

So a basic question is how to interpret the ##\delta## notation. Is ##\delta N_i## the standard deviation of the i-th count, or is it the standard deviation of the relative error in the i-th count?

gleem said:

I do not understand the utility of this program you are using. Is it for simulation, or what?

If even ChrisVer doesn't know exactly how the data is generated, we can't give good advice.

ChrisVer · Jul 31, 2016

Stephen Tashi said:

If even ChrisVer doesn't know exactly how the data is generated, we can't give good advice.

I know how the data are generated (they are Monte Carlo). But I considered this of little importance to mention, since what I am asking about is a separate issue.
The numbers I am using are entries in several bins of some distribution and their corresponding systematic and statistical errors.
So if you have several backgrounds (like photons, jets, W, etc.), the total number of events in each bin is supposed to be the sum of the "mean" expected numbers of events from each background, and its error is supposed to be the sum in quadrature of the separate errors, not just the sum of the errors.

Stephen Tashi said:

is how to interpret the ##\delta## notation. Is ##\delta N_i## the standard deviation of the i-th count, or is it the standard deviation of the relative error in the i-th count?

I am not sure I understand this question very well, but I will try to explain what ##\delta N_i## stands for.
For the statistical uncertainty, the relative uncertainty is ##\delta N_i = \frac{\sigma_i}{N_i}##, where ##\sigma_i## is the standard deviation (or the error) of the yield ##N_i## (##i## runs over the several backgrounds, so ##i## = photon, jet, W, etc.). That means that if you repeated the experiment many times, the number of events of background ##i## you measured would fall in the range ##[N_i - \sigma_i, N_i + \sigma_i]## about 68% of the time. The relative deviation ##(X_i - N_i)/N_i##, where ##X_i## is the measured number of events, instead follows a Gaussian distribution with mean 0 and standard deviation ##\delta N_i## (rather than one with mean ##N_i## and standard deviation ##\sigma_i##).
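The 68% figure is the probability that a Gaussian variable falls within one standard deviation of its mean, ##\operatorname{erf}(1/\sqrt{2})##. A one-function check using Python's standard `math.erf` (the helper name `gaussian_coverage` is illustrative):

```python
import math

def gaussian_coverage(k):
    """Probability that a Gaussian variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

print(round(gaussian_coverage(1), 4))  # 0.6827
```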
The question then becomes: what can you say about the total background, given ##N_i## and ##\sigma_i## (or ##\delta N_i##)?
What I say:
The total number of events is obviously the sum of the individual components: ##N_T = \sum_i N_i##.
The standard deviation of the sum of the individual components (assuming they are independent) is supposed to be ##\sigma_T = \sqrt{\sum_i \sigma_i^2}##,
or, as a relative uncertainty, ##\delta N_T = \frac{\sigma_T}{N_T} = \frac{\sqrt{\sum_i N_i^2 \delta N_i^2}}{N_T}##.
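A Monte Carlo sketch of this quadrature rule, assuming for illustration that the two sub-backgrounds from the earlier example (##N_1 = 50##, ##N_2 = 70##, 5% each) fluctuate independently and as Gaussians:

```python
import math
import random
import statistics

random.seed(2)
means = [50.0, 70.0]    # N_i from the example in the thread
sigmas = [2.5, 3.5]     # sigma_i = 0.05 * N_i

# Sample each sub-background independently and add them
totals = [sum(random.gauss(m, s) for m, s in zip(means, sigmas)) for _ in range(200_000)]

sigma_t_mc = statistics.pstdev(totals)
sigma_t_quad = math.sqrt(sum(s ** 2 for s in sigmas))
print(sigma_t_mc, sigma_t_quad)   # both close to 4.30
print(sigma_t_quad / sum(means))  # relative uncertainty, about 0.036
```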

What I read in the code under question:
The total number of events is the sum of the individual components: ##N_T = \sum_i N_i##.
The standard deviation of the sum of the individual components is taken to be ##\sigma_T = \sum_i \sigma_i##,
which gives the relative uncertainty ##\delta N_T = \frac{\sigma_T}{N_T} = \frac{\sum_i N_i \delta N_i}{N_T}##.
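One way to see when the code's linear formula would be right, offered as an interpretation rather than as the code author's stated intent: for a sum of components whose uncertainties share a common correlation coefficient ##\rho##, the variance is ##\sigma_T^2 = \sum_i \sigma_i^2 + 2\rho \sum_{i<j} \sigma_i \sigma_j##. With ##\rho = 0## (independent) this reduces to the quadrature sum; with ##\rho = 1## (fully correlated, e.g. a systematic shared by all backgrounds) it reduces to ##\sigma_T = \sum_i \sigma_i##, the linear sum. The helper `total_sigma` and the single common ##\rho## (taken in ##[0, 1]##) are simplifying assumptions.

```python
import math
from itertools import combinations

def total_sigma(sigmas, rho):
    """Standard deviation of a sum of components with absolute uncertainties
    `sigmas`, assuming the same correlation coefficient `rho` between every pair."""
    variance = sum(s ** 2 for s in sigmas)
    variance += 2 * rho * sum(a * b for a, b in combinations(sigmas, 2))
    return math.sqrt(variance)

sigmas = [2.5, 3.5]
print(total_sigma(sigmas, 0.0))  # independent: quadrature sum, about 4.30
print(total_sigma(sigmas, 1.0))  # fully correlated: linear sum, 6.0
```

For independent statistical uncertainties, the linear sum therefore overestimates the total uncertainty.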
