Calculating Bin Center for Rate vs Energy Plot

In summary, calculating bin centers for a rate vs energy plot involves finding the midpoint of each bin so that the data are represented accurately. For uniform bins this is done by dividing the range of energy values by the number of bins to get the bin width, and adding half that width to each bin's lower edge. The resulting bin centers can then be used to create a more precise and informative visualization of the relationship between rate and energy.
  • #1
BillKet
Hello! I want to plot a rate vs energy plot, but I am not sure how to assign the bin center. Assume I have 2 data points in the bin (so 2 energy values), ##E_1## and ##E_2##, and the first one has 100 counts measured over 10 seconds (so a rate of 10 counts/s) while the second one has 2000 counts measured over 100 seconds (so a rate of 20 counts/s). If I combine them, I get a rate of 2100 counts per 110 seconds, i.e. 19.1 counts per second. But I am not sure what energy to assign to this bin. Would it be ##(E_1+E_2)/2##? Or a time-weighted average ##(t_1E_1+t_2E_2)/(t_1+t_2)##? Or something else (maybe related to the error on each bin, i.e. ##\sqrt{N}##)? Thank you!
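To make the options concrete, here is a minimal Python sketch of the quantities I mean (the energies ##E_1##, ##E_2## are placeholder values):

```python
# Toy numbers from above: two measurements falling in one merged bin.
N1, t1 = 100, 10       # 100 counts in 10 s   -> r1 = 10 counts/s
N2, t2 = 2000, 100     # 2000 counts in 100 s -> r2 = 20 counts/s
E1, E2 = 1.0, 2.0      # placeholder energies (arbitrary units)

rate_merged = (N1 + N2) / (t1 + t2)              # 2100/110 ~ 19.1 counts/s

center_midpoint = (E1 + E2) / 2                  # plain midpoint
center_time_wtd = (t1*E1 + t2*E2) / (t1 + t2)    # time-weighted average

print(rate_merged, center_midpoint, center_time_wtd)
```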
 
  • #2
Hi,
BillKet said:
I want to plot a rate vs energy plot
Doesn't that mean you want to keep the two energies separated?

BillKet said:
19.1 counts per second
That's an unweighted average. The math, if you insist on merging, is simple:

you have ##r_1 = {100\pm\sqrt{100}\over 10}## events with ##E_1## per unit of time and ##r_2 = {2000\pm \sqrt{2000}\over 100}## with ##E_2##, so $$ r_E = {\sum {r_i\over \sigma_{r_i}^2} \over \sum {1\over\sigma_{r_i}^2} }$$ which is somewhat less than 19.1 (18.3 ##\pm## 0.4).
(In other words: the value and the error are mainly determined by the more accurate counting rate, which weighs 5 times heavier.)

For ##E## you get $$E={\sum {r_i E_i\over \sigma_{r_i}^2} \over \sum {r_i\over \sigma_{r_i}^2}}$$ In other words: as above.
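A minimal numerical sketch of the two formulas (assuming Poisson errors ##\sigma_{r_i} = \sqrt{N_i}/t_i##; the energies are placeholders):

```python
import math

# (counts, seconds, energy) for the two measurements; energies are made up.
data = [(100, 10, 1.0), (2000, 100, 2.0)]

num_r = den = num_E = 0.0
for N, t, E in data:
    r = N / t                  # rate
    var = N / t**2             # Poisson variance of the rate: (sqrt(N)/t)**2
    num_r += r / var
    den += 1 / var
    num_E += r * E / var

r_E = num_r / den              # inverse-variance weighted rate, ~18.3
sigma_rE = math.sqrt(1 / den)  # its standard error, ~0.41
E_w = num_E / num_r            # weighted energy from the second formula
print(r_E, sigma_rE, E_w)
```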
 
  • #3
BillKet said:
I want to plot a rate vs energy plot,
Assume I have 2 data points in the bin (so 2 energy values), ##E_1## and ##E_2##
But I am not sure what energy to assign to this bin.

I don't know what physical interpretation your graph has. Let's suppose there is a smooth function ##R(x)## that gives "rate" ##R## as a function of energy ##x##. You want to accurately represent that function on some interval ("bin") of energy given by ##a \le E_1 \lt E_2 \le b##. You intend to plot one point ##(x_0, R(x_0))## for some point ##x_0## in the interval ##[a,b]##.

With only that goal in mind, it doesn't matter what energy ##x_0## in ##[a,b]## you select as long as you plot the correct ##R(x_0)## that goes with that energy.

@BvU has suggested a method that treats both the rates and the energies in the data as realizations of random variables. @BvU seems to recognize problems in physics from scanty descriptions, so perhaps he guessed what you are doing.

I myself don't understand your data. For example, are the energies ##E_1## and ##E_2## precisely measured values?

(so 2 energy values), ##E_1## and ##E_2##, and the first one has 100 counts measured over 10 seconds

Was the energy ##E_1## constant over the 10 seconds? Or is ##E_1## an average energy for the 10-second interval?
 
  • #4
Consider exact data that varies linearly over the interval of the two bins. I think the energy to assign to the merged bin is best determined by a weighted average of the energies with respect to the respective bin widths. The values of the data should not influence the averaging. Of course, you must ensure that each bin is measured in the same units. The sum of the areas of the individual bins must equal the area of the merged bin, where an area is the product of the bin width and the value of the data at the bin energy.

$$E_{\text{new bin}} = \frac{E_i\,\Delta E_i + E_{i+1}\,\Delta E_{i+1}}{\Delta E_i + \Delta E_{i+1}}$$
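A minimal sketch of this formula, with made-up bin centers and widths:

```python
def merged_bin_center(E1, dE1, E2, dE2):
    """Bin-width-weighted center for two merged bins (formula above)."""
    return (E1 * dE1 + E2 * dE2) / (dE1 + dE2)

# Equal widths reduce to the plain midpoint:
print(merged_bin_center(10.0, 1.0, 20.0, 1.0))   # 15.0
# Unequal widths pull the center toward the wider bin:
print(merged_bin_center(10.0, 1.0, 20.0, 3.0))   # 17.5
```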
 
  • #5
Let me elaborate. Suppose ##r = aE + b## describes the data variation in the region of the bins to be merged, where ##r## is the rate per unit energy interval, ##a## and ##b## are constants, and ##E## is the energy at the bin position. The actual datum for a bin is ##r\,\Delta E##, the area of the bin:

$$\text{Area}_{r_1}=r_1\,\Delta E_1$$
$$\text{Area}_{r_2}=r_2\,\Delta E_2$$
$$\text{Area}_{r_1}+\text{Area}_{r_2} = r_1\Delta E_1+r_2\Delta E_2$$
$$\text{Area}_{r_1+r_2} = (aE_1+b)\Delta E_1+(aE_2+b)\Delta E_2 = a(E_1\Delta E_1 +E_2\Delta E_2) +b(\Delta E_1+\Delta E_2)$$
$$\frac{\text{Area}_{r_1+r_2}}{\Delta E_1+\Delta E_2} = a\,\frac{E_1\Delta E_1 +E_2\Delta E_2}{\Delta E_1+\Delta E_2}+b$$

The coefficient of ##a## is then the value of the energy associated with the merged bins.

Energy of the merged bins: $$E_{\text{merged}} = \frac{E_1\Delta E_1 + E_2\Delta E_2}{\Delta E_1 + \Delta E_2}$$
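A quick numerical check of this result, with made-up values for ##a##, ##b## and the bins:

```python
a, b = 2.0, 5.0        # arbitrary linear model r = a*E + b
E1, dE1 = 10.0, 1.0    # made-up bin centers and widths
E2, dE2 = 20.0, 3.0

r1, r2 = a * E1 + b, a * E2 + b
area_merged = r1 * dE1 + r2 * dE2
E_w = (E1 * dE1 + E2 * dE2) / (dE1 + dE2)

# The averaged rate of the merged bin equals the model evaluated at E_w:
print(area_merged / (dE1 + dE2))   # 40.0
print(a * E_w + b)                 # 40.0, identical by construction
```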
 
  • #6
BvU said:
Hi,
Doesn't that mean you want to keep the two energies separated?

That's an unweighted average. The math, if you insist on merging, is simple:

you have ##r_1 = {100\pm\sqrt{100}\over 10}## events with ##E_1## per unit of time and ##r_2 = {2000\pm \sqrt{2000}\over 100}## with ##E_2##, so $$ r_E = {\sum {r_i\over \sigma_{r_i}^2} \over \sum {1\over\sigma_{r_i}^2} }$$ which is somewhat less than 19.1 (18.3 ##\pm## 0.4).
(In other words: the value and the error are mainly determined by the more accurate counting rate, which weighs 5 times heavier.)

For ##E## you get $$E={\sum {r_i E_i\over \sigma_{r_i}^2} \over \sum {r_i\over \sigma_{r_i}^2}}$$ In other words: as above.
Thank you for your reply! I am a bit confused about the way you calculate the rate when rebinning. Why do you use the error in that calculation? To use toy numbers: if I know that I have 100 counts in 10 seconds between 0 and 50 (in some arbitrary energy units) and 2000 counts in 100 seconds between 50 and 100, then I know for sure, based on my experiment, that I have 2100 events in 110 seconds between 0 and 100. Why can't I just assign to that 0-to-100 bin a rate of 2100/110? What I am trying to say is that, if I had decided from the beginning to use bins of size 100, that is exactly what I would get: 2100/110. So your formula, applied to bins of size 50 that are then merged into one of double the size, would give a different result than binning directly with size 100. That doesn't really make sense to me. Am I missing something?
 
  • #7
Ah, I should have heeded @Stephen Tashi 's mildly formulated (thanks, Steve!) criticism: my interpretation of #1 was that you had exact (i.e. relatively accurate) energies ##E_1## and ##E_2## somewhere in the spectrum, where you measured counting rates ##r_1## and ##r_2##, and that there is only one ##E## in the phenomenon. That is indeed a (few) assumption(s) too many. Sorry.

If it's just a matter of merging two adjacent bins with centers ##E_1## and ##E_2##, then your new counting rate is indeed ##2100/110## at ##{E_1+E_2\over 2}##. But why do such a thing and throw away information? ##100\pm 10## counts per unit of time next to ##200\pm 5## is reason for alarm. Merging the bins sweeps that under the rug.
 
  • #8
BillKet said:
if I know that I have 100 counts in 10 seconds between 0 and 50 (in some arbitrary energy units) and 2000 counts in 100 seconds between 50 and 100, then I know for sure, based on my experiment, that I have 2100 events in 110 seconds between 0 and 100. Why can't I just assign to that 0-to-100 bin a rate of 2100/110?

You must be careful. Look at an example where the rates in the two bins are actually the same. You count for 100 sec in one bin and 10 sec in the other, yielding, say, 2000 counts and 200 counts; if you combine the counts into 2200 and the times into 110 sec, you get a combined rate of 20 cps, which is great. Now repeat for 2000 counts in 100 sec and 100 counts in 10 sec, so that the second rate is not the same as in the first bin, and combine: 2100 counts in 110 sec gives 19 cps. If we had counted the second bin for a full 100 sec, we would have expected about 1000 counts. Combining those two measurements gives 3000 total counts in 200 sec, an average rate of 15 cps!

The lesson is that you should normalize the counting times. That is, the rates must be computed from the same counting time.
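A minimal sketch of both scenarios (illustrative numbers as above):

```python
def pool_naive(measurements):
    """Pool counts and times directly: sum(N) / sum(t)."""
    return sum(N for N, t in measurements) / sum(t for N, t in measurements)

# Same true rate (20 cps) in both bins: naive pooling works.
print(pool_naive([(2000, 100), (200, 10)]))   # 20.0 cps

# Different true rates (20 cps and 10 cps): naive pooling ...
print(pool_naive([(2000, 100), (100, 10)]))   # ~19.1 cps
# ... versus normalizing both bins to a full 100 s of counting first:
print((2000 + 1000) / 200)                    # 15.0 cps
```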
 
  • #9
BvU said:
Ah, I should have heeded @Stephen Tashi 's mildly formulated (thanks, Steve!) criticism: my interpretation of #1 was that you had exact (i.e. relatively accurate) energies ##E_1## and ##E_2## somewhere in the spectrum, where you measured counting rates ##r_1## and ##r_2##, and that there is only one ##E## in the phenomenon. That is indeed a (few) assumption(s) too many. Sorry.

If it's just a matter of merging two adjacent bins with centers ##E_1## and ##E_2##, then your new counting rate is indeed ##2100/110## at ##{E_1+E_2\over 2}##. But why do such a thing and throw away information? ##100\pm 10## counts per unit of time next to ##200\pm 5## is reason for alarm. Merging the bins sweeps that under the rug.
Thank you for the clarification! You're right, merging bins that are so different is not a great idea. My question was mostly about bins with similar counts, in case I want to combine them (for example 98, 100, 103 counts per second); I chose that example just to make the computations easy. So I agree with the rate you obtain when combining the bins, but I am still not 100% sure about the center of the bin. Is the plain average enough? Shouldn't I weight by the time there, the way I mentioned in the original post? Given that the rate after merging is around 19, and the second bin was 20 and was measured for longer, I feel more confident that a rate of 19 should correspond to an energy much closer to ##E_2## than to ##E_1##. For example, assuming a linear increase in the number of counts with energy, that would be the right thing to do. What do you think?
 
  • #10
My feeling is that you would then mix up observation and interpretation. And what do you do with the energy resolution? If it's smaller than the bin width you might have a case, but then merging isn't indicated. If it's bigger, then you need to worry about your observations ...
 
  • #11
It is standard, when merging bins of equal width, to take the position of the new bin as the average of the positions of the merged bins, no matter what the values of the function at the bin positions are.

You should not combine the counts and times for the two bins, because they measure two different situations, one at ##E_1## and one at ##E_2##. The counts in each bin come from a different parent population. Combining them would only be permissible if you were counting the same thing, e.g. counts related to ##E_1## for two different times.

As for determining the new rate value for the merged bins, forget about the statistical nature of the data. Rates automatically normalize the data if all the rates are counts per the same unit of time. The proper method is to add the rates and associate the sum with the merged bin's assigned energy. Why? If you had two counters, one set for ##E_1## and the other for ##E_2##, you would be accumulating counts at a rate of ##r_1 + r_2##.
 
  • #12
gleem said:
It is standard, when merging bins of equal width, to take the position of the new bin as the average of the positions of the merged bins, no matter what the values of the function at the bin positions are.

You should not combine the counts and times for the two bins, because they measure two different situations, one at ##E_1## and one at ##E_2##. The counts in each bin come from a different parent population. Combining them would only be permissible if you were counting the same thing, e.g. counts related to ##E_1## for two different times.

As for determining the new rate value for the merged bins, forget about the statistical nature of the data. Rates automatically normalize the data if all the rates are counts per the same unit of time. The proper method is to add the rates and associate the sum with the merged bin's assigned energy. Why? If you had two counters, one set for ##E_1## and the other for ##E_2##, you would be accumulating counts at a rate of ##r_1 + r_2##.
I am not sure I understand why I would add the rates together. If both my rates are 100 counts in 10 seconds (i.e. each of them is 10 counts per second), then if I combine them I get 200 counts in 20 seconds, which is also 10 counts per second. Based on what you said, adding the rates I would get 20 counts per second. Am I misunderstanding what you said?
 
  • #13
BillKet said:
If both my rates are 100 counts in 10 seconds (i.e. each of them is 10 counts per second), then if I combine them I get 200 counts in 20 seconds, which is also 10 counts per second.

##\frac{a}{b} + \frac{c}{b} \ne \frac{a+c}{b+b}##
 
  • #14
Stephen Tashi said:
##\frac{a}{b} + \frac{c}{b} \ne \frac{a+c}{b+b}##
Yeah, that's what I am saying. Adding the rates directly doesn't make sense to me.
 
  • #15
Rates are not only per unit of time, but also per bin width of energy. So adding rates is perfectly correct: you get twice the rate for twice the interval of energy when you merge bins.
 
  • #16
BillKet said:
I am not sure I understand why I would add the rates together. If both my rates are 100 counts in 10 seconds (i.e. each of them is 10 counts per second), then if I combine them I get 200 counts in 20 seconds, which is also 10 counts per second. Based on what you said, adding the rates I would get 20 counts per second. Am I misunderstanding what you said?

My reasoning is this: when you merge two adjacent bins you are in fact determining the net counts from the two bins simultaneously, for the same time, not sequentially (i.e. for 10 sec, not 20 sec). That said, you can renormalize to the former energy intervals by averaging the count rates; this does not affect the relative uncertainties associated with the rates. It is important to make sure that your data reflect the counts for the same energy interval to start with, since the interval determines the net counts and thus the rates.

Look at post #8 carefully. When count rates are from the same parent population, having the same mean and variance (if you count long enough in each bin, you get rates that approach one another), you can add the counts you get over any time intervals and you get the same rate. However, if the means (and therefore variances) are different, adding the counts over the combined time gives a result that weights the longer count time more. You find that adding counts and dividing by the combined times only gives the correct rate if the rates are actually the same.

Take another example: suppose you get 2000 counts in 100 sec and 4000 in 10 sec (this is reasonable, since you would count longer at a lower count rate). Adding counts and dividing by the summed times gives 38.2 cps. How can that be, if one bin alone gives 400 cps? If you normalize to the same counting time, say 100 sec, then the second measurement above becomes 40000 counts in 100 sec; adding the counts and dividing by 100 gives 420 cps. Adding the rates also gives 420 cps.
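A minimal sketch of the normalization step (note that scaling counts to a common time treats each measured rate as if it were the true mean):

```python
bins = [(2000, 100), (4000, 10)]   # (counts, seconds): 20 cps and 400 cps
t_ref = 100.0                      # normalize both bins to 100 s

counts_at_ref = [N * (t_ref / t) for N, t in bins]   # [2000.0, 40000.0]
print(sum(counts_at_ref) / t_ref)                    # 420.0 cps
print(sum(N / t for N, t in bins))                   # 420.0 cps: adding rates
```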
 
  • #17
BillKet said:
Yeah, that's what I am saying. Adding the rates directly doesn't make sense to me.

Are you saying that "adding rates" means something different from arithmetic addition of quantities that have the same physical units? For example, if we perform arithmetic addition on the rates 10 counts per second and 15 counts per second, we get 25 counts per second. Similarly, if we add 10 kg/m to 15 kg/m we get 25 kg/m.

BvU said:
Rates are not only per unit of time, but also per bin width of energy.

You might be correct. I don't know why the original poster can't describe a specific situation. ("I want to plot a rate vs energy plot" doesn't attribute a specific meaning to the data or the graph, any more than "I want to plot velocity versus dollars" would describe a specific physical situation.)

Imagine a situation where an experimenter sets the internal energy of an object to certain values ##E_i## and measures counts ##C_i## of particles emitted from that object for ##T_i## seconds (without regard to what energies those particles have). In such a situation it is not possible to have a pair of detectors that independently make counts at different energies ##E_1, E_2## on the same object in the same time interval. And it is not necessarily true that the rate of counts at energy ##E_1+E_2## would be the sum of rates ##C_1/T_1 + C_2/T_2##.

Imagine a different situation where an object is kept at a constant state and emits particles at various energies. It is possible to have two detectors set to detect particles that have different energies ##E_1## and ##E_2## and to have these detectors in operation over exactly the same time interval. A slightly different situation is to have the two detectors in operation over different time intervals. If they operate in different time intervals, then there can be variation in the "background" count of particles emitted from sources other than the object.

I found the third edition of Bevington and Robinson's Data Reduction and Error Analysis for the Physical Sciences online. In section 9.3 "Composite plots", it gives several considerations for bin sizes.

Multiple Peaks

Separation of closely spaced peaks is an important problem in many research fields. Although we should not attempt to extract information from our data by sorting it in bins smaller than the uncertainties of our measurements, and should not use bin widths so narrow that the numbers of events in the bins are too small to satisfy Gaussian statistics, we also should not err in the other direction and risk suppressing important details. Selecting optimum bin sizes is critical. For some data samples, different bin widths for different regions of the data sample may be appropriate.

The book doesn't define "optimum" bin sizes mathematically. It does give a specific and detailed example. I don't know if Bevington and Robinson is regarded as an authoritative text. If it is, this may explain why researchers who consult it agonize over bin sizes.
 
  • #18
gleem said:
My reasoning is this: when you merge two adjacent bins you are in fact determining the net counts from the two bins simultaneously, for the same time, not sequentially (i.e. for 10 sec, not 20 sec). That said, you can renormalize to the former energy intervals by averaging the count rates; this does not affect the relative uncertainties associated with the rates. It is important to make sure that your data reflect the counts for the same energy interval to start with, since the interval determines the net counts and thus the rates.

Look at post #8 carefully. When count rates are from the same parent population, having the same mean and variance (if you count long enough in each bin, you get rates that approach one another), you can add the counts you get over any time intervals and you get the same rate. However, if the means (and therefore variances) are different, adding the counts over the combined time gives a result that weights the longer count time more. You find that adding counts and dividing by the combined times only gives the correct rate if the rates are actually the same.

Take another example: suppose you get 2000 counts in 100 sec and 4000 in 10 sec (this is reasonable, since you would count longer at a lower count rate). Adding counts and dividing by the summed times gives 38.2 cps. How can that be, if one bin alone gives 400 cps? If you normalize to the same counting time, say 100 sec, then the second measurement above becomes 40000 counts in 100 sec; adding the counts and dividing by 100 gives 420 cps. Adding the rates also gives 420 cps.
But you also have 20 counts/sec in one of the bins, so I don't see the problem. Let's take a very basic example. Assume that I have a bin between 5 and 15 (in some energy units), centered at 10, with 100 counts in 1 second. The next bin is between 15 and 25, centered at 20, also with 100 counts in 1 second. Now, assuming that I want to fit a curve to these points (which is usually what one does in a counting experiment), I have 2 points at coordinates (10, 100) and (20, 100). If (for some reason) I decide to double the size of my bins, the 2 bins mentioned above become one bin between 5 and 25, centered at 15. Based on what you say, if I add the rates I get 100 + 100 = 200 counts per second, so the new point would be (15, 200). If I do it the way I suggest, I get 200/2 = 100 counts per second, so the new point would be (15, 100). Having fit a curve to (10, 100) and (20, 100), I would be much more confident that at 15 the number of counts is 100 rather than 200. So can you please explain exactly what you mean (in this case) by adding the rates? More specifically, what would your point (x, y) on the plot be after rebinning?
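Here is a minimal sketch of the two candidate points in this example:

```python
# Two equal-width bins, each measured for 1 s.
bins = [(10, 100.0, 1.0), (20, 100.0, 1.0)]     # (E, rate in cps, t in s)

x_new = sum(E for E, r, t in bins) / len(bins)  # merged center: 15

y_sum = sum(r for E, r, t in bins)              # add the rates: 200 cps
y_avg = (sum(r * t for E, r, t in bins)         # pool counts over pooled
         / sum(t for E, r, t in bins))          # time: 100 cps

print((x_new, y_sum))   # (15, 200.0)
print((x_new, y_avg))   # (15, 100.0)
```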
 
  • #19
It would be useful to define a problem before discussing solutions to it. @gleem, what is your interpretation of the problem? What does the data represent? What is the "rate vs energy" curve supposed to represent? How do we determine whether one proposed plot of the data is better than another?
 
  • #20
Stephen Tashi said:
It would be useful to define a problem before discussing solutions to it. @gleem, what is your interpretation of the problem? What does the data represent? What is the "rate vs energy" curve supposed to represent? How do we determine whether one proposed plot of the data is better than another?
This is the point I was trying to make with my example. Do you, OP, have a theory in mind that you want to test about the distribution, or otherwise? That would narrow down or even determine the binning format. Otherwise, if you're trying to go bottom-up, i.e. trying to find a pattern from the data alone, it is a whole new approach. Edit: maybe more clearly, without a goal in mind you have a problem of unsupervised learning/classification, which is different from supervised learning.
 
  • #21
@BillKet In the case you have given, the rates are the same, which does not produce any problems when you add the counts and divide by the sum of the times for the two bins: you get the average of the rates. If the counts in one bin occur over one time and the counts in the other over a different time, and the rates they represent are significantly different, you get into a problem doing it your way.

Reread my example:

Take another example: suppose you get 2000 counts in 100 sec and 4000 in 10 sec (this is reasonable, since you would count longer at a lower count rate). Adding counts and dividing by the summed times gives 38.2 cps. How can that be, if one bin alone gives 400 cps? If you normalize to the same counting time, say 100 sec, then the second measurement above becomes 40000 counts in 100 sec; adding the counts and dividing by 100 gives 420 cps. Adding the rates also gives 420 cps.

So, in merging the two bins in my example, would you feel more comfortable quoting 38.2 cps or 420 cps for the merged bin? You see, I normalized the data-acquisition times to a single time. I know you do not like the rate coming out, in your example, twice that of each bin; you can get around that by averaging the two rates, i.e. dividing the sum by two. The rates depend on an experimental condition (count time) that has nothing to do with the physics. So when fitting, you must make sure that the rates you plot have been normalized to the same counting time if you want to use your method.

Another reason why your method would be incorrect: suppose you need the real count rate for a whole peak. You could set an energy window to straddle the peak and collect all the data in one counting period, say 100 sec. That would be the same as counting smaller energy intervals over the peak sequentially for 100 sec each. But if you wanted the proper count rate for all the counts in the peak, you could not add all the counts in each interval and then divide by the sum of the times of all the individual energy intervals. You would sum all the counts and divide by the common count time used for each energy interval.
 
  • #22
gleem said:
@BillKet In the case you have given, the rates are the same, which does not produce any problems when you add the counts and divide by the sum of the times for the two bins: you get the average of the rates. If the counts in one bin occur over one time and the counts in the other over a different time, and the rates they represent are significantly different, you get into a problem doing it your way.

Reread my example. So, in merging the two bins in my example, would you feel more comfortable quoting 38.2 cps or 420 cps for the merged bin? You see, I normalized the data-acquisition times to a single time. I know you do not like the rate coming out, in your example, twice that of each bin; you can get around that by averaging the two rates, i.e. dividing the sum by two. The rates depend on an experimental condition (count time) that has nothing to do with the physics. So when fitting, you must make sure that the rates you plot have been normalized to the same counting time if you want to use your method.

Another reason why your method would be incorrect: suppose you need the real count rate for a whole peak. You could set an energy window to straddle the peak and collect all the data in one counting period, say 100 sec. That would be the same as counting smaller energy intervals over the peak sequentially for 100 sec each. But if you wanted the proper count rate for all the counts in the peak, you could not add all the counts in each interval and then divide by the sum of the times of all the individual energy intervals. You would sum all the counts and divide by the common count time used for each energy interval.
Sorry, I am still confused. First of all, I might not have explained my approach well: given your example, I would get (2000+4000)/(100+10) = 54.5 counts/second. I am not sure how you get 38.2. And given that I have 20 counts on the left and 400 on the right (so it is probably the rising edge of a peak), 54.5 cps at a value of the energy in between seems reasonable; 420 cps definitely seems too much.

Secondly, I don't think your approach of just multiplying the counts by 10 is right (I might be wrong, though). The counts have a Poisson error associated with them. You can't just say that if in ##t## seconds you had ##N## counts, then in ##kt## seconds you would have ##kN## counts, as there would be statistical fluctuations.

Lastly, I think I am still misunderstanding your approach, so let me be more specific; assuming you are right, your approach should apply to my simple example, too. Assume that 2 people measure the background for an experiment, at the same time, over the same range, and that the true value (based on theory, for example) is 100 cps for all the energies within a certain range. The first person measures at an energy of 10 (in some energy units) and gets 100, while the other one measures at 5 and 15 and gets, let's say, 99 and 101. In order to compare the measurements, the second one wants an estimate of his rate at the energy of 10. If he just adds the rates he gets 200; if he does what I suggest he gets 100. I think 100 is the right answer. I am not sure why you would divide by 2 here, given that in your example you didn't divide by 2 (you got 420 cps, not 210 cps). And basically, if you divide by 2, you do what I am doing already (i.e. (99+101)/(1+1)). So I am not sure what you mean by adding rates, which would be, in this case, 99/1 + 101/1 = 200 cps.
 
  • #23
BillKet said:
I am not sure how you get 38.2.

Oops, neither do I, you are correct.

BillKet said:
Secondly, I don't think your approach of just multiplying the counts by 10 is right (I might be wrong, though). The counts have a Poisson error associated with them. You can't just say that if in ##t## seconds you had ##N## counts, then in ##kt## seconds you would have ##kN## counts, as there would be statistical fluctuations.

That is true, but your measurement is an estimate of the mean, albeit a crude one, so roughly you can do that. There would be some error, yes, but it is the best one can do given the data.

BillKet said:
The first person measures at an energy of 10 (in some energy units) and gets 100, while the other one measures at 5 and 15 and gets, let's say, 99 and 101.

You're setting up the example to meet your requirements; not fair. Are your numbers rates or total counts, and what are the counting times? Are they the same? If they are, there is no problem. You're assuming that the middle rate is the same as the rates bordering it. If the counting times were the same for all measurements, then you would be correct. The problem comes when the counting times are different for different energies. This can happen when the count rates get low and you count much longer to get better statistics.
Take two intervals next to one another, one with ##N_1## counts obtained in ##t_1## sec and the other with ##N_2## counts obtained in ##t_2## sec; the rates are expected to be different. If I had measured both intervals simultaneously, my rate should have been ##r_1 + r_2##. It is like two faucets filling the same pail at the same time. If I had two pails, each filled at a different rate ##r_1## and ##r_2##, and combined them without knowing how long each faucet had been running, how could I produce a combined rate of filling without knowing those times? Even if you told me the sum of the times, that would in general not be accurate unless you also told me the ratio of the times: I could fill both pails to the same level, one slowly for a long time and the other quickly for a short time. If you took the combined contents of the pails and divided by the sum of the times, you would err on the low side, because you would inadvertently be weighting the low rate more.

$$V_1/t_1 + V_2/t_2 = r_1 + r_2$$
BillKet said:
I am not sure why you would divide by 2 here, given that in your example you didn't divide by 2 (you got 420 cps, not 210 cps). And basically, if you divide by 2, you do what I am doing already (i.e. (99+101)/(1+1)). So I am not sure what you mean by adding rates, which would be, in this case, 99/1 + 101/1 = 200 cps.


As for the factor of two: adding two rates gives you the rate for a bin width twice the original. If you want to use the original bin width as the reference, then divide by two; if you merged three bins, you would divide by three, etc.

Make up examples for yourself with different rates and times in each bin. Do it your way and then mine (you have to divide the net rate you get using my way by 2 to compare it to yours).
 
  • #24
So I agree that if you measured them separately you would get ##r_1 + r_2##, but this is not what binning is. When you rebin a rate vs energy plot, you basically show the values of the rates at different points than before (the midpoints of the old points, if you double the bin size). So instead of measuring the rate at ##E_1## and ##E_2##, you are asking what the rate is at ##(E_1+E_2)/2##. Why would the rate in the middle be the sum of the rates on the left and right? Applying my logic to your example (maybe mine was too trivial, I agree), I would get, as I said before, 54.5 cps. Assuming the energies are 100 and 200 (in some units), the (time-weighted) value of the energy at which I would place that rate would be ##\frac{100\cdot 100 + 200 \cdot 10}{100+10} \approx 109##. So with 20 cps at 100 and 400 cps at 200, I would get 54.5 cps at 109, which seems really reasonable to me. My method is basically a linear interpolation between the 2 points that you gave. However, I still don't understand how you would get a higher count in the middle. That would imply the existence of a peak there, probably. Why would you have a peak?
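A minimal sketch checking that this combination is the straight-line interpolation (numbers from your example):

```python
bins = [(100.0, 2000, 100), (200.0, 4000, 10)]   # (E, counts, seconds)

t_tot = sum(t for E, N, t in bins)
rate = sum(N for E, N, t in bins) / t_tot        # 6000/110 ~ 54.5 cps
E_tw = sum(E * t for E, N, t in bins) / t_tot    # ~109.1

# Straight line through (100, 20 cps) and (200, 400 cps):
r1, r2 = 20.0, 400.0
E_interp = 100.0 + (rate - r1) / (r2 - r1) * (200.0 - 100.0)
print(rate, E_tw, E_interp)                      # E_tw equals E_interp
```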
 
  • #25
BillKet said:
you are asking what the rate is at ##(E_1+E_2)/2##
No. You are asking what the rate is for the interval ## E_1 < E < E_2 ##.

[Edit] Sorry: I mean from the lower end of the ##E_1## bin to the upper end of the ##E_2## bin.
 
  • #26
BillKet said:
Assuming the energies are 100 and 200 (in some units), the (time-weighted) value of the energy at which I would place that rate would be ##\frac{100\cdot 100 + 200 \cdot 10}{100+10} \approx 109##.
I do not understand; I already discussed this in post #4. You should weight the energies with the bin widths, not the counting times, since the times are arbitrary and would change your result if you used different times. Counting times should be irrelevant except for their effect on the uncertainty of the rates.

gleem said:
$$E_{\text{new bin}} = \frac{E_i\,\Delta E_i + E_{i+1}\,\Delta E_{i+1}}{\Delta E_i + \Delta E_{i+1}}$$

Now, I agree that 54.5 cps at 109 seems good, but why are you satisfied with a data point that is more reflective of the low rate, with its higher relative uncertainty, near the lower energy, while neglecting the higher rate? Wouldn't it be better for the new bin to have an energy more representative of the two energies, and a rate more representative of the two rates?
BillKet said:
I still don't understand how you would get a higher count in the middle. That would imply the existence of a peak there, probably. Why would you have a peak?

The data give the rate at each energy interval, of which there are, say, ##N##. When you rebin two consecutive intervals, you now have ##N/2## bins. The total rate indicated by the original data is

## R_{total}^{orig} = \sum_{i=1}^{N}r_{i} ##

When you rebin you get
## R_{total}^{rebin} = \sum_{j=1}^{N/2}(r_{2j-1}+r_{2j} ) ##

Rebinning should conserve the total rate, ##R_{total}^{orig} = R_{total}^{rebin}##, which is accomplished by adding the rates of the consecutive bins.
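A minimal sketch of the conservation claim, with made-up rates:

```python
rates = [10.0, 30.0, 20.0, 40.0]   # per-bin rates for N = 4 original bins

rebin_sum = [rates[i] + rates[i + 1] for i in range(0, len(rates), 2)]
rebin_avg = [(rates[i] + rates[i + 1]) / 2 for i in range(0, len(rates), 2)]

print(sum(rates), sum(rebin_sum))  # 100.0 100.0 -> total rate conserved
print(sum(rebin_avg))              # 50.0 -> off by the constant factor 2
```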

Now, you can change the rates of all the bins by a constant factor (as in averaging) and it will not affect the parameters obtained from the fit, apart from the absolute rates; the relative rates are unchanged. It would also not affect the relative uncertainties of the data.

What if you were given the data in terms of rates only; what would you do?
 
  • #27
gleem said:
I do not understand; I already discussed this in post #4. You should weight the energies with the bin widths, not the counting times, since the times are arbitrary and would change your result if you used different times. Counting times should be irrelevant except for their effect on the uncertainty of the rates.
Now, I agree that 54.5 cps at 109 seems good, but why are you satisfied with a data point that is more reflective of the low rate, with its higher relative uncertainty, near the lower energy, while neglecting the higher rate? Wouldn't it be better for the new bin to have an energy more representative of the two energies, and a rate more representative of the two rates?

The data give the rate at each energy interval, of which there are, say, ##N##. When you rebin two consecutive intervals, you now have ##N/2## bins. The total rate indicated by the original data is

## R_{total}^{orig} = \sum_{i=1}^{N}r_{i} ##

When you rebin you get
## R_{total}^{rebin} = \sum_{j=1}^{N/2}(r_{2j-1}+r_{2j} ) ##

Rebinning should conserve the total rate, ##R_{total}^{orig} = R_{total}^{rebin}##, which is accomplished by adding the rates of the consecutive bins.

Now, you can change the rates of all the bins by a constant factor (as in averaging) and it will not affect the parameters obtained from the fit, apart from the absolute rates; the relative rates are unchanged. It would also not affect the relative uncertainties of the data.

What if you were given the data in terms of rates only; what would you do?
What confuses me about adding the rates directly the way you suggest is that the final rate (i.e. the sum) reflects something of the form counts per second per bin width (I think). So doubling the bin width and measuring everything that comes into that bin in a given amount of time would indeed lead to adding the rates directly. But this is not what my question is about. I am asking only about the rate you associate with the center of the bin, which has units of counts per second, not counts per second per electronvolt (or some energy unit). So if I have a rate at ##E_1## and another at ##E_2## and I combine them, the total rate for all the energies in that bin is the sum of the rates. But (usually) you associate just one point (probably the center) with the height of a bin, so again, using the sum at the edges to get the value at the center doesn't seem right to me.
 
  • #28
BillKet said:
What confuses me about adding the rates directly the way you suggest is that the final rate (i.e. the sum) reflects something of the form counts per second per bin width (I think).

Yes, and that's what your data mean, too; it is implied. There is some window of acceptance at each energy. If it varies, you have to know how it varies.

BillKet said:
So if I have a rate at ##E_1## and another at ##E_2## and I combine them, the total rate for all the energies in that bin is the sum of the rates. But (usually) you associate just one point (probably the center) with the height of a bin, so again, using the sum at the edges to get the value at the center doesn't seem right to me.

If you average them (divide by 2), will that make you happy? It is OK to do this if you keep in mind that you have implicitly reduced the total number of events your data represent by a factor of 2.

Before we go any further, could you show some representative data that you are working with: say four energies, the net events and time of data accumulation at each energy, and a statement about the energy interval at each energy? I should have asked for this earlier.
 
  • #29
gleem said:
Yes, and that's what your data mean, too; it is implied. There is some window of acceptance at each energy. If it varies, you have to know how it varies.
If you average them (divide by 2), will that make you happy? It is OK to do this if you keep in mind that you have implicitly reduced the total number of events your data represent by a factor of 2.

Before we go any further, could you show some representative data that you are working with: say four energies, the net events and time of data accumulation at each energy, and a statement about the energy interval at each energy? I should have asked for this earlier.
Here is some of the raw data (also, thanks a lot for helping me with this). The columns are measurement time in seconds, number of counts, and wavenumber in ##cm^{-1}##.

1.10898922   385.035301   13252.7038
1.10137726   322.323705   13252.7067
1.01061749   384.913186   13252.7291
1.10074832   336.134968   13252.7409
1.11675192   320.572539   13252.7487
1.09975126   355.53494    13252.7604

The measurement is done at a fixed wavenumber for about 1 second and the number of events is recorded. To give a bit more detail: I want to plot rate vs freq (I will call the wavenumber "freq") and fit a curve to it to get the center of the peak (if you plot the data like this there is a clear peak). Assuming that I want to rebin the data for some reason (you might lose some information, but that's another story), I decide to use, say, 100 bins, such that I end up with 100 data points to use for the fit. The way I do it is to divide the x-range into 100 equal parts and associate an (x, y) with each of these 100 bins. The freq values are not equally spaced, so the bins will contain different numbers of points. So, for the sake of argument, let's assume that my binning puts the first 2 points in one bin and the next 4 points in another bin (and assume there are no other points in these 2 bins). What would be the x and y associated with these 2 bins that you would use for the fit (assume you need to fit a Lorentzian curve)?
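To make the question concrete, here is a minimal Python sketch of the procedure I have in mind (pooled-counts rate and plain-average center, exactly the choices under discussion), with the bin count reduced to 2 for the six sample points above:

```python
import numpy as np

# The six sample rows: time (s), rate (counts/s; see my correction in a
# later post -- the second column is a rate), wavenumber (cm^-1).
t = np.array([1.10898922, 1.10137726, 1.01061749,
              1.10074832, 1.11675192, 1.09975126])
r = np.array([385.035301, 322.323705, 384.913186,
              336.134968, 320.572539, 355.53494])
nu = np.array([13252.7038, 13252.7067, 13252.7291,
               13252.7409, 13252.7487, 13252.7604])

n_bins = 2                                  # stand-in for the 100 bins
edges = np.linspace(nu.min(), nu.max(), n_bins + 1)
idx = np.clip(np.digitize(nu, edges) - 1, 0, n_bins - 1)

for b in range(n_bins):
    m = idx == b
    if not m.any():
        continue                            # skip empty bins
    counts = (r[m] * t[m]).sum()            # recover counts = rate * time
    y = counts / t[m].sum()                 # pooled rate for the bin
    dy = np.sqrt(counts) / t[m].sum()       # Poisson error on that rate
    x = nu[m].mean()                        # plain average as the center
    print(x, y, dy)
```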
 
  • #30
BillKet said:
if you plot the data like this there is a clear peak
??

Total confusion. How do you count to 385.035301? In my book, 385 is followed by 386.

[edit] Ah! When you say "counts" you mean "rate" -- next time, add a dimension to your data columns!

[attached plot: counting rate vs wavenumber for the sample points]
What on earth can move you to merge "bins" here? You don't have bins, you have "energy settings".
 
  • #31
Events are supposed to be integers. Your wavenumbers differ by 0.000022%, really?

OK, for me to proceed I have to have some confidence in these data. Can you explain exactly what this data is and how you got it, and confirm the validity of the significant figures you present?
 
  • #32
BvU said:
??

Total confusion. How do you count to 385.035301? In my book, 385 is followed by 386.

[edit] Ah! When you say "counts" you mean "rate" -- next time, add a dimension to your data columns!

[attached plot: counting rate vs wavenumber for the sample points]

What on earth can move you to merge "bins" here? You don't have bins, you have "energy settings".
I am sorry, yes, I meant rate (to get counts, multiply the first 2 columns; my bad!). Also, I have a few tens of thousands of points between 13252.3231 and 13293.9497 ##cm^{-1}##. Here I provided some randomly selected points just to give an example of the kind of data that I have, but there is a peak when you plot everything (with around 2000 counts around that freq value). So one of the reasons to merge bins is that there might be an error associated with the freq measurement: if I say 13252.74, and there was a misreading, that might actually be a count belonging to 13252.75 or 13252.73. If we merge several bins, average the frequencies and quote the average as the center of the new bin (I am still not sure what kind of average that should be), we might be able to neglect that error when doing the fit. The main problem is that we know the issue is there, but we are not sure how to quantify it exactly yet. So, averaging over several energies (assuming that the error goes like ##1/\sqrt{N}##), the error on each freq should be small enough that we can ignore it. Also, we are pretty sure that there is no fine structure in the peak, so doing this average should not hide any physics from us.
 
  • #33
BillKet said:
some randomly selected points
Ever see the end of the movie "A Few Good Men"? You could play Nicholson's role without having to audition!

Post your plot of ALL the data: counting rate vertically, ##cm^{-1}## horizontally.

If your counting times don't differ by more than 10%, don't even bother worrying about the effect on the error bars of individual measurements, let alone trying to compensate.

BTW, what's your estimate of the energy resolution? Judging from your first two "some of the data" points it might be really good -- but by now I have trouble believing anything I read in this thread.

BillKet said:
Assuming that I want to rebin the data for some reason
Is that the lame reason, 'misreading the frequency'?

No sensible argument for rebinning can be concocted from that. Or do you want to artificially 'improve' your observations in order to get a peak?
 
  • #34
gleem said:
Events are supposed to be integers. Your wavenumbers differ by 0.000022%, really?

OK, for me to proceed I have to have some confidence in these data. Can you explain exactly what this data is and how you got it, and confirm the validity of the significant figures you present?
I made some clarifications in a later post: those are rates, not counts, I am sorry. We used a laser to ionize some molecules and counted the number of ionized molecules at each laser wavelength. The (Ti:Sapphire) laser wavelength was adjusted on a predefined schedule, and when the wavelength was changed, the change was about 0.02 ##cm^{-1}##. As I said above, we have some issues with reading the exact wavelength of the laser, so 13252.7038 and 13252.7067 are probably the same value (which of them is right, we don't know for sure), while 13252.7291 is the one after the increase by 0.02 ##cm^{-1}##. This is the raw data that we have from the software (including the decimals provided). As I said, this uncertainty about the actual freq is the main reason why we want to rebin.
 
  • #35
Don't rebin -- no reason. And show the plot -- I sure hope the peak isn't at 13252.7 :rolleyes:
 
