Subtracting background from data

In summary, the conversation discusses two possible ways to estimate the standard deviation of the difference between the counts for background only and for signal+background. One method uses the square root of the absolute difference between the two counts, while the other uses the square root of their sum. The thread concludes that the second method is the correct one: it has the better properties as an estimator, and it matches how readers accustomed to normal-distribution thinking will interpret the result. In practice, however, it is important to design experiments where the error in the background is small enough not to affect the subtracted error significantly.
  • #1
kelly0303
Hello! I have some counts measured at fixed points, both for background only and signal+background. Say that for a given x I have 16 counts with background only and 100 for background+signal. So the background is ##16\pm 4## and for the signal+background ##100\pm 10## (assuming poisson statistics). If I want to subtract the background, I get 84 counts. I am not sure what error to put on this number. If I use Poisson, I would have ##84 \pm \sqrt{84}##. If I use the error propagation for taking the difference I get an error of ##\sqrt{4^2+10^2}=\sqrt{116}## so I get ##84 \pm \sqrt{116}##. Which one of these is the right way?
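A minimal Monte Carlo sketch makes the two candidates concrete, under the assumption that the observed counts (100 and 16) are the true Poisson rates:

```python
# Minimal sketch: take the observed counts as the true Poisson rates
# (an assumption) and simulate many repeated experiments.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
sig_plus_bg = rng.poisson(100, n)  # signal+background runs
bg = rng.poisson(16, n)            # background-only runs

diff = sig_plus_bg - bg            # background-subtracted counts
print(f"empirical std of the difference: {diff.std():.3f}")
print(f"sqrt(100 + 16) = {np.sqrt(116):.3f}  (error propagation)")
print(f"sqrt(100 - 16) = {np.sqrt(84):.3f}   (Poisson on the difference)")
```

The empirical spread comes out near ##\sqrt{116} \approx 10.77##, not ##\sqrt{84} \approx 9.17##.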
 
  • #2
The second way is the right way. It is easier to convince yourself that this is correct when the signal and background counts are close together.
All of this is under the condition that the errors are independent.
 
  • #3
kelly0303 said:
If I use Poisson, I would have ##84 \pm \sqrt{84}##
But why would the difference of two Poisson variables be a Poisson variable?
 
  • #4
kelly0303 said:
Which one of these is the right way?

It's possible to present information in terms of vague concepts like "errors" or "uncertainties" and obey certain traditional methods for computing such quantities. However, if publishing statistics is going to be a frequent part of your work, I suggest you get a clear idea of the basic scenario for statistical estimation - or use a more precise vocabulary if you already have a clear idea.

Numbers that can be calculated from sample data are not the parameters of the distribution from which the sample is taken. Instead, they can only be estimators of those parameters. Estimators are not right or wrong in the same sense that a proposed solution to the equation ##2x + 3 = 5## is right or wrong. For most distributions, the odds are that any estimator will be wrong in that sense. Since samples are random variables, estimators are also random variables. Estimators have probability distributions. The goodness and/or badness of an estimator is mathematically defined by the properties of its distribution. The notion that an estimator is "right" may be interpreted as saying it has certain desirable properties (e.g. unbiased, minimum variance, maximum likelihood, etc.) - or it can be interpreted as saying that the person using the word "right" doesn't understand the concept of estimators!

I think what @Dale alludes to is the distinction between a Poisson distribution and a Skellam distribution:
https://en.wikipedia.org/wiki/Skellam_distribution.
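A short sketch of the distinction, using SciPy's skellam distribution and taking the observed counts as the true rates (an assumption):

```python
# Sketch: the difference of two independent Poisson variables is Skellam
# distributed, not Poisson. Rates below assume the observed counts are
# the true means.
from scipy.stats import skellam

mu1, mu2 = 100, 16                 # signal+background rate, background rate
print(skellam.mean(mu1, mu2))      # 84.0 = mu1 - mu2
print(skellam.std(mu1, mu2))       # 10.770... = sqrt(mu1 + mu2) = sqrt(116)
print(skellam.pmf(0, mu1, mu2))    # P(difference is exactly 0); support includes <= 0
```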

Rephrasing the questions in more precise language:

If I want to subtract the background, I get 84 counts.
Let ##S_b =## the observed counts in the sample of signal+background. Let ##B = ## the observed counts in the sample of background-only. You intend to estimate the mean of the distribution of the signal-only as ##\hat{\mu} = S_b - B##.

##\hat{\mu} = S_b - B## is an intuitively pleasing idea for an estimator. We should check what properties this estimator has. (What would we do if ##S_b - B < 0##? That would illustrate Dale's point.)
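A sketch of that parenthetical point: with a hypothetical weak signal, the estimator ##S_b - B## goes negative with non-negligible probability.

```python
# Sketch: with a hypothetical weak signal, the estimate S_b - B is
# negative with appreciable probability, so it cannot itself be Poisson.
from scipy.stats import skellam

mu_sb, mu_b = 20, 16                  # hypothetical rates: weak signal over background
p_neg = skellam.cdf(-1, mu_sb, mu_b)  # P(S_b - B < 0)
print(f"P(estimate < 0) = {p_neg:.3f}")
```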

I am not sure what error to put on this number.
The distribution of ##\hat{\mu}## has a standard deviation ##\sigma_{\hat{\mu}}##. You want a good estimator for it. (I.e., this is not a question about the standard deviation of the distribution of the signal alone. Instead, it's a question about the standard deviation of an estimator for the mean of that distribution.)

If I use Poisson, I would have ##84 \pm \sqrt{84}##.
One possible estimator for the standard deviation of ##\hat{\mu}## is ##\sqrt{|S_b - B|}##.

If I use the error propagation for taking the difference I get an error of ##\sqrt{4^2+10^2}=\sqrt{116}## so I get ##84 \pm \sqrt{116}##.

Another possible estimator for the standard deviation of ##\hat{\mu}## is ##\sqrt{ S_b + B}##.

Which one of these is the right way?
Better phrased as: how do the properties of these two estimators compare?

----------
Most readers of a published estimate for the standard deviation of ##\hat{\mu}## (or any other statistical estimate!) will interpret the value as if it were an estimate for a normal distribution. So they will think that deviations of one sigma, two sigma, etc. have the probabilities that hold for a normal distribution. Often this way of thinking is approximately correct. However, it's worth checking whether it is approximately correct for Poisson and Skellam distributions.
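A sketch of that check, comparing the 1- and 2-sigma coverage of a Skellam variable (rates 100 and 16 assumed) with the normal values:

```python
# Sketch: 1- and 2-sigma coverage of a Skellam variable (rates 100 and 16
# assumed) versus the normal-distribution values 0.6827 and 0.9545.
import numpy as np
from scipy.stats import norm, skellam

mu1, mu2 = 100, 16
mean, sd = mu1 - mu2, np.sqrt(mu1 + mu2)
for k in (1, 2):
    lo, hi = mean - k * sd, mean + k * sd
    # P(lo <= X <= hi) for the integer-valued Skellam variable
    cov = skellam.cdf(np.floor(hi), mu1, mu2) - skellam.cdf(np.ceil(lo) - 1, mu1, mu2)
    print(f"{k} sigma: Skellam {cov:.4f} vs normal {2 * norm.cdf(k) - 1:.4f}")
```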
 
  • #5
Of course, in practice one would not design an experiment where the subtracted error is substantially affected by the error in the background. One would take a sufficiently robust measurement of the background to make that error small enough to ignore. Otherwise it gets messy (and there is far too much work involved!).
 
  • #6
hutchphd said:
Of course, in practice one would not design an experiment where the subtracted error is substantially affected by the error in the background. One would take a sufficiently robust measurement of the background to make that error small enough to ignore.

For that situation, a mathematical model is that ##S_b## and ##B## are not independent random samples. Instead, the sampling process is to take independent random samples ##S'_b## and ##B'##. Then, if ##S'_b - B' \le 0##, discard those samples and take another pair of samples to get different values to use for ##S_b, B##.

Of course, in practice, you'd discard ##S'_b## and ##B'## and try to find out what was wrong with your equipment!
 
  • #7
Perhaps I misunderstand.
Often one can integrate the background counts over a much longer time, allowing a much more precise measure of the mean of the background with arbitrarily small error.
If the background and measurement sources are independent (say, an external background source) and each is Poisson distributed, the distribution of the total ##S_b + B## for the actual measurement will also be Poisson, and the error in this total measurement will be ##\sqrt{S_b + B}##.
This number will also be the absolute error in the "corrected (subtracted) counts", because the background average has been measured to sufficient accuracy.
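A sketch of this scheme, with hypothetical rates: counting the background ##T## times longer and scaling it down drives the subtracted error toward the error of the signal run alone.

```python
# Sketch with hypothetical rates: count the background T times longer,
# scale it down, and the subtracted error approaches sqrt(total counts
# of the signal run), here sqrt(100) = 10.
import numpy as np

rng = np.random.default_rng(1)
n, rate_total, rate_bg = 1_000_000, 100, 16  # per one signal-run exposure

for T in (1, 10, 100):
    total = rng.poisson(rate_total, n)       # signal run (signal + background)
    bg_long = rng.poisson(T * rate_bg, n)    # background run, T times longer
    corrected = total - bg_long / T          # scaled background subtraction
    expected = np.sqrt(rate_total + rate_bg / T)  # propagated error
    print(f"T={T:3d}: empirical std {corrected.std():.3f}, propagated {expected:.3f}")
```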
 
  • #8
hutchphd said:
Perhaps I misunderstand.
Often one can integrate the background counts over a much longer time, allowing a much more precise measure of the mean of the background with arbitrarily small error.
The mathematical model of that is that you continue counting until ##S_b \gg B##. Technically, this makes the samples ##S_b## and ##B## dependent, but treating them as independent may be approximately OK.
the error in this total measurement will be ##\sqrt{S_b + B}##.
Better said as: ##\sqrt{ S_b + B}## is an estimator for the standard deviation of ##S_b + B##.
 
  • #9
In this particular case, it would seem that the OP's counts are sufficiently high to make the Gaussian approximation of the Poisson distribution valid, so one need go no further than taking the uncertainty as ##\sqrt{N_{signal} + N_{bg}}##. In any case, the derivation of the propagation-of-errors formula makes no assumptions about the probability distributions.
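A sketch of that point, comparing Poisson tail probabilities at ##\lambda = 100## with the continuity-corrected normal approximation:

```python
# Sketch: at lambda = 100 the Poisson distribution is already close to
# a Gaussian with mean 100 and standard deviation 10.
import numpy as np
from scipy.stats import norm, poisson

lam = 100
for k in (80, 90, 100, 110, 120):
    p_pois = poisson.cdf(k, lam)
    p_norm = norm.cdf(k + 0.5, loc=lam, scale=np.sqrt(lam))  # continuity corrected
    print(f"P(X <= {k}): Poisson {p_pois:.4f}, normal {p_norm:.4f}")
```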
 
  • #10
In the Gaussian case with simple subtraction of the background, one would still need to include any error in the background measurement when taking the difference. So the error in the background alone would be added in quadrature (rms) to this result.
 
  • #11
gleem said:
In any case, the derivation of the propagation-of-errors formula makes no assumptions about the probability distributions.

Yes, the derivation of the propagation-of-errors formula for the sum or difference of random variables is a statement about how the population standard deviation of the sum or difference is a function of the population standard deviations of the independent random variables involved. But it is not a result that tells us how a "good" estimator of that standard deviation must be related to estimators for the distributions involved in the sum or difference.

If the question is "How does the propagation-of-errors formula say to work the problem?" then I agree that it says to compute the "uncertainty" as ##\sqrt{S_b + B}##, because the propagation-of-errors formula is simply a procedure. It makes no claim that its answer is an optimal estimate (in any of the various ways of defining optimal).

However, the question in the OP seems to be: "Which is a better (in some sense) estimator of the standard deviation of ##\hat{\mu}##: ##\sqrt{|S_b - B|}##, ##\sqrt{S_b + B}##, or some third possibility?"

It wouldn't surprise me if ##\sqrt{S_b + B}## wins, but can we justify this mathematically? Or can we only advocate it because of tradition?
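A simulation sketch of exactly that comparison, assuming true rates 100 and 16 so that the true standard deviation of ##\hat{\mu}## is ##\sqrt{116}##:

```python
# Sketch of the comparison: assume true rates 100 and 16, so the true
# standard deviation of mu-hat = S_b - B is sqrt(116) = 10.770, and
# measure the bias and rms error of the two candidate estimators.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
s = rng.poisson(100, n)  # signal+background counts S_b
b = rng.poisson(16, n)   # background counts B

true_sigma = np.sqrt(116)
for name, est in (("sqrt|S_b - B|", np.sqrt(np.abs(s - b))),
                  ("sqrt(S_b + B)", np.sqrt(s + b))):
    bias = est.mean() - true_sigma
    rmse = np.sqrt(((est - true_sigma) ** 2).mean())
    print(f"{name}: bias {bias:+.3f}, rms error {rmse:.3f}")
```

In this setup ##\sqrt{S_b + B}## comes out nearly unbiased, while ##\sqrt{|S_b - B|}## is biased low by well over a unit.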
 
  • #12
There is mathematical validity in the propagation-of-uncertainty formalism, particularly for uncertainty in the vicinity of a point on a well-behaved function, one in which the first derivatives are relatively constant over the range of the uncertainty.

I am surprised at the OP's dilemma. Both measurements have assumed statistical uncertainties that are uncorrelated. The experimenter knows that from sample to sample the readings vary and the background varies. Taking ##\sqrt{84}## as the proper estimate of uncertainty amounts to assuming that the statistical variation of the background has no effect on the sample-minus-background result.
 
  • #13
gleem said:
There is mathematical validity in the propagation-of-uncertainty formalism, particularly for uncertainty in the vicinity of a point on a well-behaved function, one in which the first derivatives are relatively constant over the range of the uncertainty.

I completely agree. However, that mathematics concerns parameters of the distributions involved, not estimators for those parameters. Parameters are constants. Estimators are random variables.

The analogous proposition for estimators could go something like this: if ##A## and ##B## are independent random variables and ##E_A, E_B## are, respectively, unbiased minimum-variance estimators for the standard deviations of ##A## and ##B##, then ##\sqrt{E_A^2 + E_B^2}## is an unbiased minimum-variance estimator for the standard deviation of the random variable ##A - B##.

Is that conjecture true? Even if it is, it is very difficult to find unbiased estimators for the standard deviation. In particular, taking the square root of an unbiased estimator of the variance doesn't work (e.g. https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation).
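A sketch of that pitfall: the usual sample standard deviation (the square root of the unbiased variance estimate) comes out low on average, shown here for a small normal sample.

```python
# Sketch: the sample variance (ddof=1) is unbiased for sigma^2, but its
# square root systematically underestimates sigma (Jensen's inequality).
import numpy as np

rng = np.random.default_rng(3)
sigma, n, trials = 1.0, 5, 200_000
x = rng.normal(0.0, sigma, size=(trials, n))
s = x.std(axis=1, ddof=1)  # square root of the unbiased variance estimate
print(f"mean of s = {s.mean():.4f}  vs  sigma = {sigma}")  # noticeably below 1
```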

The original post asks which of two estimators is "the right way". Perhaps the definition of "right way" is sociological and not mathematical. If the goal is to put numbers in a paper that won't arouse the ire of referees for a journal, I agree that applying the propagation-of-errors formula is a safe and traditional method for that purpose.
 
  • #14
As usual, my point was a bit more prosaic.
The estimator ##E_A## in this case is ##\sqrt{S_b + B}##, and so the rms estimator for the difference would in fact be ##\sqrt{S_b + 2B}##, unless the background alone is measured much more precisely, which is usually possible.
 
  • #15
Yes. As suggested in post #2: ##\ \ S+B=100,\ \ B = 16##, so ##\sqrt{S + 2B} = \sqrt{84 + 2\cdot 16} = \sqrt{116}##.
 
  • #16
kelly0303 said:
If I use the error propagation for taking the difference I get an error of ##\sqrt{4^2+10^2}=\sqrt{116}##
And quite pleasingly, the Skellam distribution for the difference gives exactly that result as well. So everyone agrees, perhaps for various reasons.
 
  • #17
hutchphd said:
And quite pleasingly the Skellam distribution for the difference gives exactly that result also.

If we assume the standard deviations of the two Poisson random variables involved in the Skellam distribution are exactly 4 and 10, then the standard deviation of the Skellam distribution is exactly ##\sqrt{116}##. However, this isn't a result that says anything about the standard deviation of an estimator of the mean of the Skellam distribution. If a paper publishes the information ##84 \pm \sqrt{116}##, this should give an estimate ##84## together with the standard deviation of the estimator that produced it.

Pretending that the data tell us the exact values of the parameters of the distribution is an interesting daydream, but it is unlikely to come true. The "propagation of errors" procedure calculates as if this daydream came true.
 
  • #18
The point is well taken, and I believe I understand the nuance. But to me, the fact that the different estimators are mutually consistent is better than the alternative.
 

1. What is background subtraction and why is it important?

Background subtraction is a process of removing unwanted signals or noise from a dataset in order to isolate and analyze the desired signal. It is important because it helps to improve the accuracy and reliability of the data, allowing for more accurate interpretations and conclusions to be drawn.

2. What are some common methods for subtracting background from data?

There are several methods for subtracting background from data, including baseline subtraction, statistical subtraction, and spectral subtraction. Baseline subtraction involves removing a baseline signal that is present in all data points. Statistical subtraction involves using statistical methods to identify and remove outliers or noise from the data. Spectral subtraction involves subtracting a reference spectrum from the dataset to isolate the desired signal.

3. When should background subtraction be performed?

Background subtraction should be performed when there is a significant amount of noise or unwanted signals present in the data that could affect the accuracy of the analysis. It is also important to perform background subtraction if the background signal is changing over time or if it is interfering with the desired signal.

4. Are there any limitations to background subtraction?

Yes, there are some limitations to background subtraction. It can be difficult to accurately identify and remove all unwanted signals, especially if they are similar to the desired signal. Additionally, background subtraction can alter the overall shape of the data, which may affect the interpretation of the results.

5. How can I determine if background subtraction was successful?

The success of background subtraction can be determined by comparing the data before and after the subtraction process. If the desired signal is more prominent and the unwanted signals are significantly reduced, then the background subtraction was likely successful. Additionally, statistical tests or visualizations can be used to assess the accuracy and reliability of the data after background subtraction.
