• A
• kelly0303
In summary, the experimenter is trying to understand why the expected number of events in a given scenario (x=200pi) is much lower than what they observe in practice. They assume there is an error in the expression for the probability, and that the uncertainty is on the number of events counted. When they increase the number of measured events, the uncertainty decreases, but it does not stay at the same value.
kelly0303
Hello! I am trying to simulate the following experiment. It is a counting experiment where the probability of getting an event after a given trial is given by:

$$P = 2\left(\frac{a}{x}+\frac{b}{c}\right)^2\left(1-\cos\left(\frac{\pi x}{x_0}\right)\right)$$

where, ##a = 4\pi##, ##b = 2000\pi##, ##c = 20000\pi## and ##x_0 = 200\pi## are fixed parameters (we can assume there is no uncertainty on them), and ##x## is an experimental parameter whose value from one measurement instance to the next changes based on a gaussian with mean ##200\pi## and standard deviation ##20\pi##. In my case this probability is about ##0.058##. In my simulations I sample ##x## 1000 times and compute P for each of these, then I sum up all the P values, which is basically the number of events I would see in the experiment (0.058 probability x 1000 experiments ~ 58 measured event), call it N. The uncertainty on N, in my simulations, I assume to be ##\sqrt{N}##. I do this 100 times and in the end I get the plot attached below, showing the number of counts for each of the 100 simulations. The black line is the expected number of counts for ##x = x_0##. I am confused about the fact that I am constantly under-estimating the true value and by the fact that I am so much closer to the true value than expected from the errors. Can someone help me understand what am I doing wrong? Thank you!

Hi,

To get a feeling for the values, I simplified your expression for P by dividing numerators and denominators by ##x_0## to $$P = \left ({4/(200\pi)\over y} + {1\over 10}\right )^2 \Bigl(1 - cos(\pi y)\Bigr)$$ where now ##\ y=x/x_0\ ## has a gaussian distribution with average 1 and ##\sigma## 0.1

If there is no error in your expression, P varies from 0.0583 to 0.0545 when ##x## varies from ##180\pi## to ##220\pi##.

Perhaps nothing wrong with all that.

But I have a hard time following
kelly0303 said:
In my simulations I sample x 1000 times and compute P for each of these, then I sum up all the P values, which is basically the number of events I would see in the experiment (0.058 probability x 1000 experiments ~ 58 measured event), call it N. The uncertainty on N, in my simulations, I assume to be ##\sqrt N##.
Isn't ##\sqrt {N\over1000} ## the estimated uncertainty ?

##\ ##

##P## is not symmetric wrt ##x = 200 \pi##, but is skewed towards lower values. Normally distributed ##x## centered around ##200 \pi## will thus lead to lower average ##P## than ##P(x = 200\pi)##, so this is normal (pun intended).

Is the equation for ##P## correct? Is it really ##\pi x## in the numerator in the cosine, not simply ##x##?

hutchphd

BvU said:
Hi,

To get a feeling for the values, I simplified your expression for P by dividing numerators and denominators by ##x_0## to $$P = \left ({4/(200\pi)\over y} + {1\over 10}\right )^2 \Bigl(1 - cos(\pi y)\Bigr)$$ where now ##\ y=x/x_0\ ## has a gaussian distribution with average 1 and ##\sigma## 0.1

If there is no error in your expression, P varies from 0.0583 to 0.0545 when ##x## varies from ##180\pi## to ##220\pi##.

Perhaps nothing wrong with all that.

But I have a hard time following

Isn't ##\sqrt {N\over1000} ## the estimated uncertainty ?

##\ ##
I definitely don't know much about statistics, but usually when I was assigning an uncertainty on a given number of counts (for example on a bin in a histogram and assuming more than ~10 counts or so), I was using ##\sqrt{N}## as the statistical uncertainty. So in this case, after 1000 measurements I would have ~58 counts, so I assumed the uncertainty would be directly on the counts that I physically measured i.e. ##\sqrt{58}## and the value of the probability for example would be ~##0.058 \pm \sqrt{58}/1000##. Why would I divide by 1000 under the integral and then once again when moving from counts to actual probability?

Test it out: instead of 1000 do 4000 or 9000 and see what happens to the scatter

BvU said:
Test it out: instead of 1000 do 4000 or 9000 and see what happens to the scatter
I am attaching below the result for 10000 and 100000 events per experiment. It seems like the more events I have the more I underestimate the uncertainty... Now I am really confused

DrClaude said:
##P## is not symmetric wrt ##x = 200 \pi##, but is skewed towards lower values. Normally distributed ##x## centered around ##200 \pi## will thus lead to lower average ##P## than ##P(x = 200\pi)##, so this is normal (pun intended).

Is the equation for ##P## correct? Is it really ##\pi x## in the numerator in the cosine, not simply ##x##?
But why are the uncertainties so weird, and why they start to underestimate the value as I go to higher N. Also how should I deal with this shift in one direction in practice?

The equation is correct, I need to get ##2## in the ideal case for the term in the second bracket.

kelly0303 said:
In my simulations I sample x 1000 times
using some library function ?

kelly0303 said:
then I sum up all the P values, which is basically the number of events I would see
This is the issue.

You need to not sum up the P values. You need to sample a Bernoulli RV with each P value. In the figure below the blue dots are using the "sum up all the P values" approach, while the orange dots sample a Bernoulli RV with each P value.

BvU
kelly0303 said:
It is a counting experiment where the probability of getting an event after a given trial is given by
Can you explain this ? A probability that depends on a sample value ?

##\ ##

Dale said:
This is the issue.

You need to not sum up the P values. You need to sample a Bernoulli RV with each P value. In the figure below the blue dots are using the "sum up all the P values" approach, while the orange dots sample a Bernoulli RV with each P value.

View attachment 324884
Thanks a lot! I see, I initially wanted to do that, but it seemed more complicated to implement and I went for the sum. But why are they not equivalent (in the limit of large number of events)?

EDIT: That still seems to underestimate the true value for high number of event. I am attaching below what I am getting for ##10^6## event per point. I am still not sure why this is happening. Shouldn't my uncertainty account for that, even if I might underestimate the true value?

Last edited:
kelly0303 said:
That still seems to underestimate the true value for high number of event.
I am not sure where your “true value” comes in.

Dale said:
I am not sure where your “true value” comes in.
That is achieved when ##x = x_0##. Basically it would be the value I would measure all the time if I had no uncertainty on ##x##.

kelly0303 said:
That is achieved when ##x = x_0##. Basically it would be the value I would measure all the time if I had no uncertainty on ##x##.
If you look at ##P## you will see that its range is substantially asymmetric about ##0.058##. It is close to the maximum, so there are very few values of ##x## that produce ##P>0.058## and there are many values of ##x## that produce ##P<0.058##. So the overall result will be naturally biased low.

Dale said:
If you look at ##P## you will see that its range is substantially asymmetric about ##0.058##. It is close to the maximum, so there are very few values of ##x## that produce ##P>0.058## and there are many values of ##x## that produce ##P<0.058##. So the overall result will be naturally biased low.
I see, but how can I properly estimate the uncertainty such that the result is still consistent with the true value, given the errorbars? Shouldn't that be the case for a properly estimated value? Or how else I should proceed with my analysis given this setup such that I can extract experimentally the correct value i.e. the true probability?

kelly0303 said:
how else I should proceed with my analysis given this setup such that I can extract experimentally the correct value
You may be able to use a Bayesian approach, although I have never done one quite like this. So I am not certain with the details, but that would be the direction I would go.

Edit: I think maybe a GLM would work.

Last edited:
Dale said:
You may be able to use a Bayesian approach, although I have never done one quite like this. So I am not certain with the details, but that would be the direction I would go.

Edit: I think maybe a GLM would work.
Thank you! I am still confused about the result I am getting. Why is applying the normal error propagation formula not working in this case? Especially that we have only one variable, so no issues with correlations between variables, I expected that to work. Are there cases where that formula is simply not usable? I assumed that is quite universal.

kelly0303 said:
Why is applying the normal error propagation formula not working in this case?
Sorry, I missed this. Where did you apply the propagation of uncertainty?

In general, the standard uncertainty propagation formula uses a first order Taylor series approximation. So it will only be exact for a linear function. This problem P is quite nonlinear. Probably you would need at least a second order expansion.

Last edited:

• Set Theory, Logic, Probability, Statistics
Replies
22
Views
1K
• Set Theory, Logic, Probability, Statistics
Replies
12
Views
2K
• Set Theory, Logic, Probability, Statistics
Replies
2
Views
1K
• Set Theory, Logic, Probability, Statistics
Replies
3
Views
482
• Set Theory, Logic, Probability, Statistics
Replies
1
Views
1K
• Set Theory, Logic, Probability, Statistics
Replies
4
Views
1K
• Set Theory, Logic, Probability, Statistics
Replies
5
Views
2K
• Set Theory, Logic, Probability, Statistics
Replies
7
Views
1K
• Set Theory, Logic, Probability, Statistics
Replies
2
Views
1K
• Set Theory, Logic, Probability, Statistics
Replies
13
Views
1K