Confused about magnitude of error


Discussion Overview

The discussion revolves around a counting experiment where participants analyze the probability of observing an event based on a given formula. The focus is on understanding the magnitude of error in simulations that involve sampling a variable with a Gaussian distribution. Participants explore the implications of their methods for estimating uncertainty and the observed discrepancies in results.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant presents a formula for probability, P, and describes their simulation process, noting confusion about underestimating the true value and the associated uncertainties.
  • Another participant simplifies the expression for P and questions the method of estimating uncertainty, suggesting that the uncertainty should be calculated differently.
  • Some participants point out that the probability function P is skewed, leading to a lower average than expected when sampling from a Gaussian distribution centered around a specific value.
  • There is a discussion about the correct approach to summing probabilities versus sampling from a Bernoulli random variable, with one participant asserting that the two methods are not equivalent.
  • Concerns are raised about how to properly estimate uncertainty in the context of the observed bias in results, particularly as the number of events increases.
  • One participant suggests exploring Bayesian methods as a potential approach to address the uncertainty and bias issues.

Areas of Agreement / Disagreement

Participants express differing views on the correct method for estimating uncertainty and the implications of their simulation results. There is no consensus on the best approach to resolve the discrepancies observed in the experiments.

Contextual Notes

Participants note that the equation for P may be correct, but there is uncertainty regarding the interpretation of the results and the appropriate statistical methods to apply. The discussion highlights the complexity of estimating uncertainties in probabilistic models.

kelly0303
Hello! I am trying to simulate the following experiment. It is a counting experiment where the probability of getting an event after a given trial is given by:

$$P = 2\left(\frac{a}{x}+\frac{b}{c}\right)^2\left(1-\cos\left(\frac{\pi x}{x_0}\right)\right)$$

where ##a = 4\pi##, ##b = 2000\pi##, ##c = 20000\pi## and ##x_0 = 200\pi## are fixed parameters (we can assume there is no uncertainty on them), and ##x## is an experimental parameter whose value changes from one measurement instance to the next according to a Gaussian with mean ##200\pi## and standard deviation ##20\pi##. In my case this probability is about ##0.058##. In my simulations I sample ##x## 1000 times and compute P for each of these, then I sum up all the P values, which is basically the number of events I would see in the experiment (0.058 probability × 1000 experiments ≈ 58 measured events), call it N. The uncertainty on N, in my simulations, I assume to be ##\sqrt{N}##. I do this 100 times and in the end I get the plot attached below, showing the number of counts for each of the 100 simulations. The black line is the expected number of counts for ##x = x_0##. I am confused by the fact that I am consistently under-estimating the true value and by the fact that I am so much closer to the true value than expected from the errors. Can someone help me understand what I am doing wrong? Thank you!

Screenshot 2023-04-14 at 12.55.42 AM.png
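For concreteness, here is a minimal NumPy sketch of the procedure described above. The original code was not posted, so the variable names and the use of numpy.random.default_rng are illustrative assumptions rather than the actual implementation:

```python
import numpy as np

# Fixed parameters (assumed exact)
a, b, c, x0 = 4*np.pi, 2000*np.pi, 20000*np.pi, 200*np.pi

def P(x):
    # Probability of an event for a given value of the experimental parameter x
    return 2*(a/x + b/c)**2 * (1 - np.cos(np.pi*x/x0))

rng = np.random.default_rng()
n_trials = 1000        # measurements per simulated experiment
n_experiments = 100    # number of simulated experiments

counts = []
for _ in range(n_experiments):
    x = rng.normal(loc=200*np.pi, scale=20*np.pi, size=n_trials)  # Gaussian x
    counts.append(P(x).sum())   # "sum up all the P values" -> number of events N

counts = np.array(counts)
print("expected counts at x = x0:", n_trials*P(x0))   # ~57.6, the black line
print("mean simulated counts    :", counts.mean())
```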
 
Hi,

To get a feeling for the values, I simplified your expression for P by dividing numerators and denominators by ##x_0##, giving $$
P = 2\left(\frac{4\pi/(200\pi)}{y} + \frac{1}{10}\right)^2\Bigl(1 - \cos(\pi y)\Bigr)$$ where now ##\ y = x/x_0\ ## has a Gaussian distribution with mean 1 and standard deviation ##\sigma = 0.1##.

If there is no error in your expression, P varies from 0.0583 to 0.0545 when ##x## varies from ##180\pi## to ##220\pi##.
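A quick numerical check of these values, evaluating the original expression (with its overall factor of 2) directly; this short script is an illustration added for reference, not from the thread:

```python
import numpy as np

x0 = 200*np.pi

def P(x):
    # original expression with a = 4*pi, b = 2000*pi, c = 20000*pi
    return 2*(4*np.pi/x + 0.1)**2 * (1 - np.cos(np.pi*x/x0))

for x in (180*np.pi, 200*np.pi, 220*np.pi):
    print(f"x = {x/np.pi:.0f} pi  ->  P = {P(x):.4f}")
# prints roughly 0.0583, 0.0576 and 0.0545
```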

Perhaps nothing wrong with all that.

But I have a hard time following
kelly0303 said:
In my simulations I sample x 1000 times and compute P for each of these, then I sum up all the P values, which is basically the number of events I would see in the experiment (0.058 probability × 1000 experiments ≈ 58 measured events), call it N. The uncertainty on N, in my simulations, I assume to be ##\sqrt N##.
Isn't ##\sqrt{N/1000}## the estimated uncertainty?

 
##P## is not symmetric with respect to ##x = 200 \pi##, but is skewed towards lower values. Normally distributed ##x## centered around ##200 \pi## will thus lead to a lower average ##P## than ##P(x = 200\pi)##, so this is normal (pun intended).
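A minimal sketch to see this numerically, comparing ##P## at the central value with the average of ##P## over Gaussian-distributed ##x##; the seed and sample size here are arbitrary choices for illustration:

```python
import numpy as np

x0 = 200*np.pi

def P(x):
    return 2*(4*np.pi/x + 0.1)**2 * (1 - np.cos(np.pi*x/x0))

rng = np.random.default_rng(0)            # seed chosen arbitrarily
x = rng.normal(200*np.pi, 20*np.pi, size=1_000_000)

print("P at the central value   :", P(200*np.pi))   # ~0.0576
print("mean of P over Gaussian x:", P(x).mean())     # comes out below P(200*pi)
```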

Is the equation for ##P## correct? Is it really ##\pi x## in the numerator in the cosine, not simply ##x##?
 
 
BvU said:
Hi,

To get a feeling for the values, I simplified your expression for P by dividing numerators and denominators by ##x_0##, giving $$
P = 2\left(\frac{4\pi/(200\pi)}{y} + \frac{1}{10}\right)^2\Bigl(1 - \cos(\pi y)\Bigr)$$ where now ##\ y = x/x_0\ ## has a Gaussian distribution with mean 1 and standard deviation ##\sigma = 0.1##.

If there is no error in your expression, P varies from 0.0583 to 0.0545 when ##x## varies from ##180\pi## to ##220\pi##.

Perhaps nothing wrong with all that.

But I have a hard time following

Isn't ##\sqrt{N/1000}## the estimated uncertainty?

I definitely don't know much about statistics, but usually when I was assigning an uncertainty to a given number of counts (for example to a bin in a histogram, assuming more than ~10 counts or so), I was using ##\sqrt{N}## as the statistical uncertainty. So in this case, after 1000 measurements I would have ~58 counts, so I assumed the uncertainty would be directly on the counts that I physically measured, i.e. ##\sqrt{58}##, and the value of the probability, for example, would be ~##0.058 \pm \sqrt{58}/1000##. Why would I divide by 1000 under the square root and then once again when moving from counts to the actual probability?
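As a side check of this reasoning: for a genuine counting experiment with ##n## independent trials of probability ##p##, the standard deviation of the count is ##\sqrt{np(1-p)}##, which for small ##p## is indeed close to ##\sqrt{N}##. A tiny illustration with the numbers quoted above (added for reference, not from the thread):

```python
import numpy as np

n, p = 1000, 0.058
N = n*p                                    # expected number of counts, ~58
print("sqrt(N)         =", np.sqrt(N))           # ~7.6
print("sqrt(n p (1-p)) =", np.sqrt(n*p*(1-p)))   # ~7.4, nearly the same for small p
```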
 
Test it out: instead of 1000 do 4000 or 9000 and see what happens to the scatter
 
BvU said:
Test it out: instead of 1000 do 4000 or 9000 and see what happens to the scatter
I am attaching below the results for 10000 and 100000 events per experiment. It seems like the more events I have, the further the result falls below the true value relative to the error bars... Now I am really confused.
Screenshot 2023-04-14 at 10.00.46 AM.png
Screenshot 2023-04-14 at 10.00.53 AM.png
 
DrClaude said:
##P## is not symmetric with respect to ##x = 200 \pi##, but is skewed towards lower values. Normally distributed ##x## centered around ##200 \pi## will thus lead to a lower average ##P## than ##P(x = 200\pi)##, so this is normal (pun intended).

Is the equation for ##P## correct? Is it really ##\pi x## in the numerator in the cosine, not simply ##x##?
But why are the uncertainties so strange, and why do I start to underestimate the true value more and more as I go to higher N? Also, how should I deal with this shift in one direction in practice?

The equation is correct; the term in the second bracket needs to equal ##2## in the ideal case.
 
kelly0303 said:
In my simulations I sample x 1000 times
using some library function ?
 
  • #10
kelly0303 said:
then I sum up all the P values, which is basically the number of events I would see
This is the issue.

You need to not sum up the P values. You need to sample a Bernoulli RV with each P value. In the figure below the blue dots are using the "sum up all the P values" approach, while the orange dots sample a Bernoulli RV with each P value.

random.jpg
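A minimal sketch of the two approaches compared in the figure, assuming NumPy (this is not the code used to produce the plot): summing the ##P## values removes the Bernoulli counting noise, so its run-to-run scatter is far smaller than ##\sqrt N##, whereas drawing an actual 0/1 outcome for each trial reproduces it.

```python
import numpy as np

x0 = 200*np.pi

def P(x):
    return 2*(4*np.pi/x + 0.1)**2 * (1 - np.cos(np.pi*x/x0))

rng = np.random.default_rng(1)             # seed chosen arbitrarily
n_trials, n_experiments = 1000, 100

sum_counts, bern_counts = [], []
for _ in range(n_experiments):
    x = rng.normal(200*np.pi, 20*np.pi, size=n_trials)
    p = P(x)
    sum_counts.append(p.sum())                             # blue dots: sum of the P values
    bern_counts.append((rng.random(n_trials) < p).sum())   # orange dots: one 0/1 draw per trial

print("scatter of summed-P counts :", np.std(sum_counts))   # far below sqrt(N)
print("scatter of Bernoulli counts:", np.std(bern_counts))  # close to sqrt(N) ~ 7.6
```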
 
  • #11
kelly0303 said:
It is a counting experiment where the probability of getting an event after a given trial is given by
Can you explain this? A probability that depends on a sample value?

 
  • #12
Dale said:
This is the issue.

You need to not sum up the P values. You need to sample a Bernoulli RV with each P value. In the figure below the blue dots are using the "sum up all the P values" approach, while the orange dots sample a Bernoulli RV with each P value.

View attachment 324884
Thanks a lot! I see, I initially wanted to do that, but it seemed more complicated to implement, so I went for the sum. But why are they not equivalent (in the limit of a large number of events)?

EDIT: That still seems to underestimate the true value for a high number of events. I am attaching below what I am getting for ##10^6## events per point. I am still not sure why this is happening. Shouldn't my uncertainty account for that, even if I might underestimate the true value?

Screenshot 2023-04-14 at 1.14.45 PM.png
 
Last edited:
  • #13
kelly0303 said:
That still seems to underestimate the true value for a high number of events.
I am not sure where your “true value” comes in.
 
  • #14
Dale said:
I am not sure where your “true value” comes in.
That is achieved when ##x = x_0##. Basically it would be the value I would measure all the time if I had no uncertainty on ##x##.
 
  • #15
kelly0303 said:
That is achieved when ##x = x_0##. Basically it would be the value I would measure all the time if I had no uncertainty on ##x##.
If you look at ##P## you will see that its range is substantially asymmetric about ##0.058##. It is close to the maximum, so there are very few values of ##x## that produce ##P>0.058## and there are many values of ##x## that produce ##P<0.058##. So the overall result will be naturally biased low.
 
  • #16
Dale said:
If you look at ##P## you will see that its range is substantially asymmetric about ##0.058##. It is close to the maximum, so there are very few values of ##x## that produce ##P>0.058## and there are many values of ##x## that produce ##P<0.058##. So the overall result will be naturally biased low.
I see, but how can I properly estimate the uncertainty such that the result is still consistent with the true value, given the error bars? Shouldn't that be the case for a properly estimated value? Or how else should I proceed with my analysis, given this setup, so that I can experimentally extract the correct value, i.e. the true probability?
 
  • #17
kelly0303 said:
how else should I proceed with my analysis, given this setup, so that I can experimentally extract the correct value
You may be able to use a Bayesian approach, although I have never done one quite like this. So I am not certain of the details, but that would be the direction I would go.

Edit: I think maybe a GLM (generalized linear model) would work.
 
Last edited:
  • #18
Dale said:
You may be able to use a Bayesian approach, although I have never done one quite like this. So I am not certain of the details, but that would be the direction I would go.

Edit: I think maybe a GLM (generalized linear model) would work.
Thank you! I am still confused about the result I am getting. Why is applying the normal error propagation formula not working in this case? Especially since we have only one variable, and hence no issues with correlations between variables, I expected it to work. Are there cases where that formula is simply not usable? I assumed it was quite universal.
 
  • #19
kelly0303 said:
Why is applying the normal error propagation formula not working in this case?
Sorry, I missed this. Where did you apply the propagation of uncertainty?

In general, the standard uncertainty propagation formula uses a first-order Taylor series approximation, so it will only be exact for a linear function. The ##P## in this problem is quite nonlinear; you would probably need at least a second-order expansion.
 
Last edited:
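To make the Taylor-expansion point concrete, here is a rough numerical sketch (added for illustration, not from the thread) comparing the first-order propagated spread ##|P'(x_0)|\,\sigma_x## and the second-order mean shift ##\tfrac12 P''(x_0)\sigma_x^2## with a direct Monte Carlo average; the finite-difference step and the seed are arbitrary choices:

```python
import numpy as np

x0, sigma_x = 200*np.pi, 20*np.pi

def P(x):
    return 2*(4*np.pi/x + 0.1)**2 * (1 - np.cos(np.pi*x/x0))

# numerical first and second derivatives of P at x0 (central differences)
h = 1e-3*x0
dP  = (P(x0 + h) - P(x0 - h)) / (2*h)
d2P = (P(x0 + h) - 2*P(x0) + P(x0 - h)) / h**2

print("1st order: |P'(x0)|*sigma_x      =", abs(dP)*sigma_x)      # spread of P
print("2nd order: 0.5*P''(x0)*sigma_x^2 =", 0.5*d2P*sigma_x**2)   # mean shift (negative)

rng = np.random.default_rng(2)             # seed chosen arbitrarily
x = rng.normal(x0, sigma_x, size=1_000_000)
print("Monte Carlo: mean(P) - P(x0) =", P(x).mean() - P(x0))
print("Monte Carlo: std(P)          =", P(x).std())
```

The second-order term is negative, which reflects the same bias discussed earlier: the average of ##P## over the Gaussian in ##x## sits below ##P(x_0)##, something the first-order formula cannot capture.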
