A Statistical uncertainties of sub-backgrounds

1. Jul 29, 2016

ChrisVer

I was given a code that generates the statistical or systematic uncertainty of different sub-backgrounds to the total background...
Let's say that the total is $N$ and each sub-background has $N_i, \delta N_i$ number of events and relative uncertainty (=err/Nevt) respectively.
What the code does is it evaluates the uncertainty on the total bkg by:
$\delta N = \delta N_1 \frac{N_1}{N} + ... + \delta N_M \frac{N_M}{N}$
Is that correct?
why not:
$\delta N = \frac{\sqrt{N_1^2 \delta N_1^2 + ... + N_M^2 \delta N_M^2}}{N}$
?

Last edited: Jul 29, 2016
2. Jul 29, 2016

ChrisVer

Let me quantify it... let's take $\delta N_i = \delta N =0.05$ (so 5% uncertainty on all sub-backgrounds).
and let's only take the case of 2 sub-backgrounds B1, B2, with $N_1 = 50$ and $N_2=70$
The total background will then be $N_T=120$

following the given-code algo, I calculate that the total uncertainty is:
$\delta N_T= \frac{\delta N}{N_T} (N_1 + N_2) = 0.05$ or 5%
while using mine:
$\delta N_T =\frac{\sqrt{\delta N^2 (N_1^2+N_2^2)}}{N_T} = 0.036$ or 3.6%
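The two numbers above can be reproduced with a short script. This is only a sketch of the two combination rules as written in this thread; the function names `linear_relative` and `quadrature_relative` are made up for illustration and are not taken from the code under discussion.

```python
import math

def linear_relative(yields, rel_unc):
    """Relative uncertainty on the total when absolute errors are added linearly."""
    total = sum(yields)
    return sum(n * d for n, d in zip(yields, rel_unc)) / total

def quadrature_relative(yields, rel_unc):
    """Relative uncertainty on the total when absolute errors are added in quadrature."""
    total = sum(yields)
    return math.sqrt(sum((n * d) ** 2 for n, d in zip(yields, rel_unc))) / total

# Example from this post: two sub-backgrounds, each with 5% relative uncertainty.
yields = [50.0, 70.0]
rel_unc = [0.05, 0.05]

lin = linear_relative(yields, rel_unc)
quad = quadrature_relative(yields, rel_unc)
print(f"linear sum: {lin:.3f}")
print(f"quadrature: {quad:.3f}")
```

The linear rule always returns a value at least as large as the quadrature rule, and the two agree only when a single sub-background carries all of the uncertainty.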

Last edited: Jul 29, 2016
3. Jul 29, 2016

Stephen Tashi

"Uncertainty" is an ambiguous term. Are you calculating "uncertainty" in the sense of the standard deviation of the sum of two random variables? Or are you calculating "worst case bounds" for a measurement of a sum ?

4. Jul 29, 2016

ChrisVer

my background uncertainties correspond to 1-standard deviation.

5. Jul 30, 2016

Stephen Tashi

It would help me if we state the question using terminology that is standard for statistics. What one can do with sample data is "estimate" population parameters. For example, we can't "calculate" the population mean from a sample of data. We "calculate" an estimator of the population mean.

I think you are asking about how to estimate something, but posing the question in terms of population parameters. First, let's clarify what that question is in terms of population parameters.

Assume we have a random variable $X$ with standard deviation $\sigma_X$ and mean $\mu_X = \bar{X}$. We can define a random variable $E_X = \frac{X - \bar{X}} {\bar{X}}$. The standard deviation of $E_X$ is $\sigma_{E_X} = \frac{\sigma_X}{\bar X}$. The mean of $E_X$ is $\bar{E_X} = 0$. (If we were to define $E_X$ as $\frac{|X - \bar{X}|} {\bar{X}}$ then the mean wouldn't be zero.)

Does $\delta_X$ denote $\sigma_{E_X}$ ? - or does it denote $\sigma_X$?

In a similar manner, define a random variable $Y$ with associated parameters $\mu_Y, \sigma_Y$ and an associated variable $E_Y$.

Are you trying to estimate the standard deviation of the random variable $T = X + Y$ (i.e. a single realization of both $X$ and $Y$) ?

Or are you trying to find the standard deviation of the random variable $E_T = \frac{T - \bar{T}}{\bar{T}}$ ?

Or are you trying to find the standard deviation of a random variable that involves summing more than one realization of $X$ and $Y$ ?

6. Jul 30, 2016

ChrisVer

I don't quite understand the discussion on the mean but let's go with that... in fact I am a little afraid of talking about random variables when it comes to yields, but I suppose there is nothing wrong with regarding them as such...
So in that discussion the numbers $N_i$ I give are the means, and the relative uncertainties are the standard deviations divided by the mean. In particular, the relative uncertainty of 0.05 I mentioned for the 50 events corresponds to a std of 0.05*50 = 2.5 events. So my $\delta_X$ denotes the $\sigma_{E_X}$... but in general it can be messy to deal with this uncertainty when the backgrounds are weighted by the number of their events (in particular the 0.05 for a bkg of 1000 events does not have the same effect as for a bkg of 10,000 events)... for that reason it's easier to move to $\sigma_X$ and do the calculation with this.... I think that there is nothing wrong with the equation I'm proposing; I am trying to see if the one that was in the code makes sense and how it's motivated (to be honest, up to now, I see it as a bug).

I don't understand a difference between those two questions... if I knew the answer to the first, I would know the answer to the second and vice versa.... But I guess it's the 2nd what I am looking for.... and that's why I go through the first:
I get the standard deviation $\sigma_T$ and then divide with $\bar{T}$. In particular if I knew the $\sigma_T$ I would know the number of events of my uncertainty, so I could say 120+/-3 ... By dividing with $\bar{T}$ I get the relative uncertainty and can say 120+/- 2.5%.

I don't think so, since I have a value for X and Y ?

Last edited: Jul 30, 2016
7. Jul 31, 2016

Stephen Tashi

It is correct for giving the bound $\delta N$ on the total observed count if we regard the $\delta N_i$ as bounds on the observed counts of the $N_i$. Being given that the bound on a count is (definitely) plus or minus 10 is not the same as being given that the standard deviation of the count as a random variable is 10. For example if the observed count is 100 and you know the bound is 10 then the "true" count is definitely between 90 and 110. However if the observed count is 100 and you regard it as a random variable, then you can't state a definite probability for it being between 90 and 110 unless you also assume 100 is the population (i.e. "true") mean value of the random variable. So you have to assume you "lucked out" and observed a count that was exactly equal to the population mean.

8. Jul 31, 2016

gleem

If the sub-backgrounds arise from normally distributed random phenomena, then the uncertainty in the total background, as expressed by the square root of the variance, should be

s.d. $= \sigma = \sqrt{\sum_i (dN_i)^2}$

I do not understand the utility of this program that you are using. Is it for simulation or what?

9. Jul 31, 2016

ChrisVer

It's for generating a latex table that contains the error values for the several backgrounds and the total.
In particular it reads the background yields and errors and generates the table of them + determines the total background yield + its error ... I found discrepancies between the last error results and errors I had previously calculated myself, and ended up looking into it.

Last edited: Jul 31, 2016
10. Jul 31, 2016

gleem

That is what it does, but what is the ultimate usefulness of this? Why would I need this info?

11. Jul 31, 2016

ChrisVer

which is no.
In post 1 I write how it does what it's supposed to do.

12. Jul 31, 2016

gleem

I basically disagree with the method it uses for determining uncertainties of the net background, based on my assumptions and the information you presented.

13. Jul 31, 2016

Stephen Tashi

True, but relative errors aren't additive.

$\frac{ (X + Y) - (\mu_X + \mu_Y)}{\mu_X + \mu_Y} \ne \frac{X-\mu_X}{\mu_X} + \frac{Y - \mu_Y}{\mu_Y}$

So a basic question is how to interpret the $\delta$ notation. Is $\delta N_i$ the standard deviation of the i-th count or is it the standard deviation of the relative error in the i-th count?
If even ChrisVer doesn't know exactly how the data is generated, we can't give good advice.
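The non-additivity of relative errors noted above can be made explicit with a short derivation. It uses the definitions $E_X = (X-\mu_X)/\mu_X$ and $E_Y = (Y-\mu_Y)/\mu_Y$ from post 5; the last line additionally assumes that $X$ and $Y$ are independent, which is an assumption of this sketch and not something established in the thread.

```latex
E_T = \frac{(X + Y) - (\mu_X + \mu_Y)}{\mu_X + \mu_Y}
    = \frac{\mu_X E_X + \mu_Y E_Y}{\mu_X + \mu_Y}

% For independent X and Y the variances of the weighted terms add:
\sigma_{E_T} = \frac{\sqrt{\mu_X^2 \sigma_{E_X}^2 + \mu_Y^2 \sigma_{E_Y}^2}}{\mu_X + \mu_Y}
             = \frac{\sqrt{\sigma_X^2 + \sigma_Y^2}}{\mu_X + \mu_Y}
```

So the relative error of the total is a yield-weighted average of the individual relative errors, not their plain sum, and its standard deviation depends on which interpretation of $\delta N_i$ is used.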

14. Jul 31, 2016

ChrisVer

I know how the data is generated [they are Monte Carlo]. But I find this of low importance to mention, since what I am referring to is a separate thing.
The numbers I am using are entries in several bins of some distribution and their corresponding systematic and statistical errors.
So if you have some backgrounds (like photons, jets, W etc), the total number of events in each bin is supposed to be the sum of the "mean" numbers of expected events of each background, and the error is supposed to be the sum in quadrature of the separate errors, not just the sum of errors.

I am not sure I understand this question very well, but I will try to explain what $\delta N_i$ stands for...
For the statistical uncertainty, the relative uncertainty is $\delta N_i = \frac{\sigma_i}{N_i}$ where $\sigma_i$ is the standard deviation (or the error) of the $N_i$ yield ($i$ runs over the several backgrounds, so i=photon, jet, W etc)... that means that if you repeatedly did the experiment, the number of events of the background $i$ you'd measure would be found (within 68% certainty) in the range $[N_i - \sigma_i , N_i +\sigma_i]$. The relative uncertainty, instead of describing a Gaussian distribution with mean $N_i$ and std $\sigma_i$, describes a Gaussian distribution of the relative deviation from $N_i$, with mean 0 and standard deviation $\delta N_i$.
The question then becomes what can you say about the total background, given that you have $N_i,\sigma_i$ (or $\delta N_i$).
What I say:
The total number of events will obviously be the sum of the individual components: $N_T= \sum_i N_i$.
The standard deviation of the sum of the individual components is supposed to be $\sigma_T = \sqrt{\sum_i \sigma_i^2}$
Or the relative uncertainty $\delta N_T = \frac{\sigma_T}{N_T}= \frac{\sqrt{\sum_i N_i^2 \delta N_i^2} }{N_T}$
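A quick toy check of the quadrature rule, as a sketch only: it assumes each sub-background fluctuates as an independent Gaussian around its mean, which is an assumption of this example and not something read from the code in question. The helper `toy_total_std` and the toy count are hypothetical.

```python
import math
import random
import statistics

def toy_total_std(means, sigmas, n_toys=100_000, seed=12345):
    """Spread of the summed yield over many toys with independent Gaussian components."""
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(m, s) for m, s in zip(means, sigmas))
        for _ in range(n_toys)
    ]
    return statistics.pstdev(totals)

# Example from post 2: yields 50 and 70, each with 5% relative uncertainty.
means = [50.0, 70.0]
sigmas = [0.05 * m for m in means]  # absolute errors: 2.5 and 3.5 events

toy = toy_total_std(means, sigmas)
quad = math.sqrt(sum(s ** 2 for s in sigmas))  # quadrature sum
lin = sum(sigmas)                              # linear sum

print(f"toy spread of total: {toy:.2f}")
print(f"quadrature sum:      {quad:.2f}")
print(f"linear sum:          {lin:.2f}")
```

Under the independence assumption the toy spread should land close to the quadrature sum and clearly below the linear sum.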

What I read in the code under question:
The total number of events is the sum of the individual components: $N_T= \sum_i N_i$.
The standard deviation of the sum of the individual components is supposed to be $\sigma_T = \sum_i \sigma_i$
Which gives the relative uncertainty $\delta N_T = \frac{\sigma_T}{N_T} = \frac{\sum_i N_i \delta N_i}{N_T}$
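One way to relate the two expressions, offered as a sketch and not as a statement of what the code intends: for correlated components the variance of a sum is $\sigma_T^2 = \sum_i \sigma_i^2 + 2\sum_{i<j} \rho_{ij}\sigma_i\sigma_j$. With all $\rho_{ij}=0$ this is the quadrature sum, and with all $\rho_{ij}=1$ it reduces to the linear sum. The function name `total_sigma` and the use of a single common correlation coefficient `rho` are simplifications assumed here.

```python
import math

def total_sigma(sigmas, rho):
    """Std of a sum when every pair of components shares the correlation coefficient rho."""
    variance = sum(s ** 2 for s in sigmas)
    for i in range(len(sigmas)):
        for j in range(i + 1, len(sigmas)):
            variance += 2.0 * rho * sigmas[i] * sigmas[j]
    return math.sqrt(variance)

# Absolute errors from the two-background example (5% of 50 and 70 events).
sigmas = [2.5, 3.5]
total_yield = 120.0

for rho in (0.0, 0.5, 1.0):
    sigma_t = total_sigma(sigmas, rho)
    print(f"rho={rho:.1f}: sigma_T={sigma_t:.2f}, relative={sigma_t / total_yield:.3f}")
```

Read this way, the linear sum is the fully correlated limit and an upper bound on $\sigma_T$, while the quadrature sum applies to uncorrelated errors.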