How to Bin Data for Spectrum Fitting with Poisson Errors?

  • Thread starter: kelly0303
  • Tags: Bin Data

Summary
Binning data for spectrum fitting with Poisson errors requires careful handling of measurement times and count rates. The recommended approach is to sum the counts and the measurement times within each bin and divide to obtain a rate, using the mean energy of the bin for the x-axis. Error estimation is the hard part at low counts: the standard square-root-of-counts uncertainty does not work well when a bin contains very few counts, so a larger bin size is suggested, ideally giving at least 10 counts per bin. The goal is a smooth spectrum that can be fitted with a Voigt profile despite the irregular energy sampling.
  • #31
kelly0303 said:
Thank you for your reply. How should I do time normalization? Assuming I have 1 count in 0.01 seconds and 1 count in 0.02 seconds, should I normalize by taking 2 counts in 0.02 seconds and 1 count in 0.02 seconds, getting an average of 1.5 counts in 0.01 seconds? I am not sure if I can just multiply the counts by 2 (or any integer). That 1 count is the result of a Poisson process so I am not sure that having 1 count in 0.01 seconds implies 2 in 0.02 seconds.

This is a problem with low or zero counts. For zero counts, does that mean actually zero counts, or that you didn't count long enough? In calculating a rate from very low counts over a short time you have a large error. One count in 0.01 seconds gives a rate of 100 cps ± 100 cps, as you know, and adding such data only makes the total error worse. That is why I think adding the counts in a bin and dividing by the total time for that bin is better (and more correct). The more points in a bin, the more confidence you have in the numbers. IMO.

In your example of adding the rates in a bin vs. calculating a rate from the total bin counts divided by the total time, you ignored that the latter method produces an average count rate for the bin. To properly compare, you need to average the former by dividing the summed rates by the number of them. Thus in your example you should compare 55 to 40. But, you say, they do not agree. So what is the problem? Well, first you assumed that the two rates were partially concurrent: 100 cps happening at the same time as the 10 cps for 10 sec. In fact this is not true. They are sequential: 100 cps for 10 sec followed by 10 cps for 20 sec, so the whole interval is 30 sec. How do you average this? You take the time-weighted average: 100 cps for 1/3 of the time and 10 cps for 2/3 of the time. Summing these, you get 40 cps. This is the same reasoning you would use for averaging the rate of a pulsed signal.
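A minimal Python sketch of the comparison above (just an illustration with these example numbers, not code from the thread):

```python
import numpy as np

# Example from this post: 100 cps for 10 s followed by 10 cps for 20 s.
rates = np.array([100.0, 10.0])   # counts per second
times = np.array([10.0, 20.0])    # seconds
counts = rates * times            # 1000 and 200 counts

# Total counts divided by total time for the bin.
rate_from_totals = counts.sum() / times.sum()     # 1200 / 30 = 40 cps

# Time-weighted average of the individual rates.
rate_weighted = np.average(rates, weights=times)  # (100*10 + 10*20) / 30 = 40 cps

# A plain (unweighted) average gives the misleading 55 cps.
print(rate_from_totals, rate_weighted, rates.mean())  # 40.0 40.0 55.0
```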
 
  • #32
kelly0303 said:
Sorry for not coming back to that. I tried it, but it seems like I have some data points (I guess random fluctuations) for which the measurement time is very small (around 0.001 seconds) but I have 1 count in that time. This gives a huge rate, making that bin significantly bigger than the others, even if there is no way for that to be the case physically. Should I somehow do some time weighting? Thank you!
I did wonder! Thanks.
If the data you show is representative, you could first just sum ~10 successive measurements without worrying about the error, and report the total time, total counts, and average energy. Then do #12 as described. Any inaccuracy will be negligible (unless the data is very different); see the sketch below.
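A rough sketch of that pre-grouping step in Python (the group size of 10 and the array names are assumptions for illustration):

```python
import numpy as np

def pre_group(energy, counts, time, group_size=10):
    """Sum every `group_size` successive measurements: total counts, total time,
    and the average energy of each group. An incomplete trailing group is dropped."""
    n = (len(counts) // group_size) * group_size
    e = np.asarray(energy[:n]).reshape(-1, group_size)
    c = np.asarray(counts[:n]).reshape(-1, group_size)
    t = np.asarray(time[:n]).reshape(-1, group_size)
    return e.mean(axis=1), c.sum(axis=1), t.sum(axis=1)
```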
 
  • #33
gleem said:
This is a problem with low or zero counts. For zero counts, does that mean actually zero counts, or that you didn't count long enough? In calculating a rate from very low counts over a short time you have a large error. One count in 0.01 seconds gives a rate of 100 cps ± 100 cps, as you know, and adding such data only makes the total error worse. That is why I think adding the counts in a bin and dividing by the total time for that bin is better (and more correct). The more points in a bin, the more confidence you have in the numbers. IMO.

In your example of adding the rates in a bin vs. calculating a rate from the total bin counts divided by the total time, you ignored that the latter method produces an average count rate for the bin. To properly compare, you need to average the former by dividing the summed rates by the number of them. Thus in your example you should compare 55 to 40. But, you say, they do not agree. So what is the problem? Well, first you assumed that the two rates were partially concurrent: 100 cps happening at the same time as the 10 cps for 10 sec. In fact this is not true. They are sequential: 100 cps for 10 sec followed by 10 cps for 20 sec, so the whole interval is 30 sec. How do you average this? You take the time-weighted average: 100 cps for 1/3 of the time and 10 cps for 2/3 of the time. Summing these, you get 40 cps. This is the same reasoning you would use for averaging the rate of a pulsed signal.
Thanks a lot for this! So the approaches are equivalent, as long as I use a time-weighted average, not just a simple average. What is the error on the rate in this case? If I add the counts and divide by the time, I would get ##\sqrt{1200}/30## as the rate error. If I do the time-weighted average, I get ##\sqrt{1000}/10 \cdot 1/3 + \sqrt{200}/20 \cdot 2/3 = (\sqrt{1000}+\sqrt{200})/30##. Which one is the right error? Thank you!
 
  • #34
Your error calcs are incorrect. For rates the uncertainty is ##\sqrt{\text{rate}/\text{time}}##.
EDIT: Sorry, you are correct here.

The error of the weighted average of the rates is ##\sqrt{(1/3)^2\sigma_{r_1}^{2} + (2/3)^2\sigma_{r_2}^{2}}##, using propagation of errors for the function ##r_1/3 + 2r_2/3##, which gives the same result as ##\sqrt{40/30} = 1.15## cps.
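A quick numerical check of that propagation (a NumPy sketch using the example numbers from earlier in the thread, not code from the original posts):

```python
import numpy as np

# 1000 counts in 10 s (100 cps) and 200 counts in 20 s (10 cps).
counts = np.array([1000.0, 200.0])
times = np.array([10.0, 20.0])
sigma_rates = np.sqrt(counts) / times               # sqrt(N)/t per interval

weights = times / times.sum()                       # 1/3 and 2/3
# Propagation of errors for the time-weighted average r1/3 + 2*r2/3:
sigma_weighted = np.sqrt(np.sum((weights * sigma_rates) ** 2))

# The same error computed directly from the pooled counts:
sigma_pooled = np.sqrt(counts.sum()) / times.sum()  # sqrt(1200)/30 = sqrt(40/30)

print(sigma_weighted, sigma_pooled)                 # both ≈ 1.155 cps
```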
 
  • #35
gleem said:
Your error calcs are incorrect. For rates the uncertainty is ##\sqrt{\text{rate}/\text{time}}##.
EDIT: Sorry, you are correct here.

The error of the weighted average of the rates is ##\sqrt{(1/3)^2\sigma_{r_1}^{2} + (2/3)^2\sigma_{r_2}^{2}}##, using propagation of errors for the function ##r_1/3 + 2r_2/3##, which gives the same result as ##\sqrt{40/30} = 1.15## cps.
Sorry, is the first formula correct? Taking the square root of the time would mean that the error on the rate doesn't have the same units as the rate itself.
 
  • #36
Remember the rate has the time in it already, so the units work out: ##\sqrt{\text{rate}/\text{time}} = \sqrt{N/t^2} = \sqrt{N}/t##, which is in cps.
I agree with everything @gleem has said. Just to reiterate: because you do in fact want the "interval weighted" rate values when you bin (I screwed this up because I didn't understand the relative energy widths), the binned rate is just
##N_{total}/t_{total}##.
Similarly the weighted RMS error sum for the rate ends up being
##\sqrt{N_{total}}/t_{total}##,
using the propagation-of-errors formula in #35. All self-consistent... and simple.
 
  • #37
I don't know if it is cheating, but in a similar data analysis I built a spreadsheet that let me choose the bin size and then looked at the results. I was analyzing microarray data, and there, if you look at signal strength vs. the number of occurrences, the data has to be placed in bins. There are very few with a very strong signal and very many with a very weak signal. Each signal is unique, but if you sort them into bins, you can see the difference between how often a signal response of 0-50 occurs, then how often 50-100, then 100-150, etc.

I would choose the bin size based on the data. It might be cheating, but sometimes a change in the bin size changed the "noise" in the data. I had a spreadsheet where I imported the data and changed the bin size in a single cell, which then recalculated and plotted the results.

You should be able to sort your data, then apply a set of columns that do the binning based on that separate cell. I don't think it matters much whether the plot you generate is vs. the bin midpoint or some other reference point, but I would use the midpoint.
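That spreadsheet workflow translates directly into a few lines of Python; here is a rough sketch (the input file name and the bin widths are placeholders, not from the thread):

```python
import numpy as np
import matplotlib.pyplot as plt

def binned_counts(signal, bin_width):
    """Count how many measurements fall into each [k*w, (k+1)*w) bin."""
    edges = np.arange(0, signal.max() + bin_width, bin_width)
    counts, _ = np.histogram(signal, bins=edges)
    centers = edges[:-1] + bin_width / 2        # plot against the bin midpoint
    return centers, counts

signal = np.loadtxt("signal_strengths.txt")     # hypothetical data file
for width in (25, 50, 100):                     # try a few bin sizes and compare
    centers, counts = binned_counts(signal, width)
    plt.step(centers, counts, where="mid", label=f"bin width {width}")
plt.xlabel("signal strength")
plt.ylabel("occurrences")
plt.legend()
plt.show()
```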
 
  • #38
Adding counts and measurement times in a bin is the right approach; it is the best you can do with the given information. Making a bin includes the assumption that the value doesn't change much within the bin (otherwise the bin is too wide), and in that case you can just add counts and times. If you want to be fancy with the x-values, you can take the weighted average of the x-values going into that bin as the x-value (with the measurement times as weights), but with bins as fine as in your example this shouldn't matter. A sketch of this binning is below.
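A rough Python sketch of that procedure (the function and variable names and the fixed bin width are assumptions for illustration):

```python
import numpy as np

def bin_spectrum(wavenumber, counts, times, bin_width):
    """Sum counts and measurement times per wavenumber bin; return the
    time-weighted mean wavenumber, the rate, and its sqrt(N)/t uncertainty."""
    order = np.argsort(wavenumber)                  # sort by increasing wavenumber
    x, n, t = wavenumber[order], counts[order], times[order]

    edges = np.arange(x.min(), x.max() + bin_width, bin_width)
    idx = np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2)

    N_tot = np.bincount(idx, weights=n)             # total counts per bin
    t_tot = np.bincount(idx, weights=t)             # total time per bin
    x_sum = np.bincount(idx, weights=x * t)         # for the time-weighted mean x

    keep = t_tot > 0                                # skip empty bins
    rate = N_tot[keep] / t_tot[keep]
    rate_err = np.sqrt(N_tot[keep]) / t_tot[keep]
    x_mean = x_sum[keep] / t_tot[keep]
    return x_mean, rate, rate_err
```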
The calculations of the uncertainties in the previous posts are good, too.

Bin the measurements after sorting them by increasing wavenumber, of course.

A direct one-dimensional likelihood fit to 10,000 or even 100,000 data points shouldn't be an issue, by the way, unless your degrees of freedom are really excessive.
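For reference, a minimal sketch of what such an unbinned Poisson likelihood fit could look like in Python, here for a single Voigt peak on a flat background (the parameter names and starting values are placeholders, not from the thread):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import voigt_profile

def neg_log_likelihood(params, x, counts, times):
    """Poisson negative log-likelihood; expected counts per measurement are
    mu_i = (A * Voigt(x_i - x0; sigma, gamma) + background) * t_i."""
    amp, x0, sigma, gamma, bkg = params
    mu = (amp * voigt_profile(x - x0, sigma, gamma) + bkg) * times
    mu = np.clip(mu, 1e-12, None)             # guard against log(0)
    return np.sum(mu - counts * np.log(mu))   # constant log(n!) terms dropped

# x, counts, times: the raw per-measurement data (unbinned).
# start = [amplitude, centre, Gaussian sigma, Lorentzian gamma, background rate]
# result = minimize(neg_log_likelihood, start, args=(x, counts, times),
#                   method="Nelder-Mead")
```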
 
