Extracting data the right way

kelly0303 · Jun 15, 2021

Hello! I have a relationship of the form ##y_i=ax_i##. In my case ##y_i## is a frequency and ##x_i## is a mass. For each mass ##x_i##, I measure ##y_i## (and I get a central value and an error). The difference between consecutive ##x_i##'s is about 1 (in some arbitrary units). Is it possible to extract the exact value of ##x_i## for each mass with this information, without knowing anything about ##a##? For a bit of background, we know ##x_i## well enough to be able to separate a given mass from the rest in the experiment, hence we know for sure that a given ##y_i## corresponds to a given mass. However, we are able to measure ##y_i## extremely well (the error is very small), so I was wondering if we can use the measured ##y_i##'s to extract the masses ##x_i## with a smaller error than we currently have.

Twigg · Jun 15, 2021

There is some hope given that the masses are discrete. Based on past threads, am I right to think you're looking at a laser spectrum over multiple isotopes? If so, can you make some guesses about a few values of ##x_i## by comparing the heights of the spectral peaks and natural abundances?

You need some outside information. If you know at least one value of ##x_i##, you can estimate the slope ##a## by evaluating ##a_{est} = \frac{y_i}{x_i}## with error ##\sigma_a = \frac{\sigma_{y,i}}{x_i}## (since you're assuming you know the integer value of ##x_i## exactly). With that estimate of ##a##, you can estimate the rest of the ##x_j## (##j \neq i##) by doing ##x_{j,est} = \frac{y_j}{a_{est}}## with error ##\sigma_{x,j} = \sqrt{ \left( \frac{\sigma_{y,j}}{a_{est}} \right) ^2 + \left( \frac{y_j \sigma_a}{a_{est}^2} \right)^2 }##. If you end up with ##\sigma_{x,j} \leq 1##, then your estimate of each ##x_j## lands within one unit (the spacing between neighboring masses) of the true value with at least 68% (##1 \sigma##) confidence.
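
A minimal sketch of the single-reference estimate above, in Python. The function name `estimate_masses`, its arguments, and the numbers in the usage line are made up for illustration; it assumes the reference mass is known exactly and that the errors on the ##y_i## are independent.

```python
import math

def estimate_masses(y, sigma_y, ref_index, ref_mass):
    """Estimate x_j = y_j / a_est using one exactly known reference mass.

    Returns (a_est, sigma_a, [(x_est, sigma_x), ...]).
    """
    # Slope from the reference isotope: a_est = y_i / x_i, sigma_a = sigma_{y,i} / x_i
    a_est = y[ref_index] / ref_mass
    sigma_a = sigma_y[ref_index] / ref_mass

    results = []
    for j, (yj, sj) in enumerate(zip(y, sigma_y)):
        if j == ref_index:
            # The reference mass is taken as exact by assumption.
            results.append((ref_mass, 0.0))
            continue
        x_est = yj / a_est
        # Propagate the errors on y_j and on a_est in quadrature.
        sigma_x = math.sqrt((sj / a_est) ** 2 + (yj * sigma_a / a_est ** 2) ** 2)
        results.append((x_est, sigma_x))
    return a_est, sigma_a, results

if __name__ == "__main__":
    # Hypothetical frequencies and errors, with isotope 0 as the known reference.
    print(estimate_masses([200.0, 202.0], [0.2, 0.2], ref_index=0, ref_mass=100.0))
```

Comparing each returned `sigma_x` with the spacing between neighboring masses tells you whether the assignment is unambiguous.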

If you can't identify masses by natural abundance, you might be in luck if you have any odd isotopes mixed in. You can identify them by their nuclear spin by applying a magnetic field and seeing how many Zeeman levels pop out (~~or by looking at the population decay versus time with a small detuning, you'll see beats in the exponential decay corresponding to Zeeman sublevel splittings~~).

Let me know if I'm dead wrong about the context of the experiment o:)

Edit: I crossed out a suggestion because I forgot that the method only works when the excited state is the one with the Zeeman levels.

kelly0303 · Jun 15, 2021

You're right about the experiment! We are looking at several isotopes. However, the setup is a bit different. We do know the masses quite well (from Penning trap mass measurements). I was wondering if there is a way to use the measured frequencies to extract the masses better than what we already have, using that formula. For example, if we know the masses with a relative error of ##10^{-3}## (0.1%), and the frequencies with a relative error of, say, ##10^{-6}##, and we know that the parameter ##a## is the same for all isotopes, can we use all this information to extract the masses with a relative error smaller than ##10^{-3}##?

For example (I am just throwing this out, not sure if it makes sense from a statistics point of view), if I fit ##y=ax## to the data I have, together with the errors on x and y, I would get ##a## with a given error. Then, using this ##a##, I would get the individual masses by doing ##x_i=y_i/a##. My hope was that if we have enough isotopes and the error on ##y_i## is small enough, the error on ##a## would be small enough that the resulting error on ##x_i## from ##x_i=y_i/a## would be smaller than the initial error.

As I said, this is just an idea. Intuitively, I would expect that using all the information about all the isotopes at once would allow us to constrain the masses better than using ##a## and one isotope at a time. What do you think?

Twigg · Jun 15, 2021

Ahh ok this makes sense. My gut instinct is that you should be able to constrain the masses, but probably not down to the ##10^{-6}## level. If you have ##N## data points, my gut feeling is that you'd be able to constrain the masses down to the ##\frac{10^{-3}}{\sqrt{N}}## level.

My thoughts:
With the N measurements ##(x_i, y_i)##, you can generate a sequence of N-1 ratios: ##\left( \frac{x_i}{x_1}, \frac{y_i}{y_1} \right)## for ##i = 2, 3, \dots, N##. You know that, up to noise, these ratios have to be equal: ##\frac{x_i}{x_1} = \frac{y_i}{y_1}## because ##y_i = a x_i##. You can use this constraint and the ##10^{-6}## precision on y to constrain ratios of the x's down to the ##10^{-6}## level. However, there is still a floating degree of freedom: the baseline mass ##x_1##. This makes sense, because the frequency spectrum only tells you the relative masses; it doesn't give you a kilogram standard. In other words, you constrain the relative masses, but you still need a reference mass ##x_1##.

I believe you can use the information in all N of the mass measurements ##x_i## by minimizing the mean squared error ##\left\langle \left(x_i - \frac{y_i}{y_1} \hat{x}_1 \right)^2 \right\rangle## (averaged over ##i##), where ##\hat{x}_1## is your estimate of the mass ##x_1## (this is the variable you vary to minimize the squared error). My hunch is that you will be limited to a ##\frac{1}{\sqrt{N}}## reduction in the uncertainty of individual masses because this procedure is like averaging over the information contained in the N mass measurements. That means there's a ##\frac{10^{-3}}{\sqrt{N}}## limit on the relative uncertainty in ##\hat{x}_1##.
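
For what it's worth, the minimization has a closed form. Writing ##r_i = \frac{y_i}{y_1}## and setting the derivative of ##\sum_i (x_i - r_i \hat{x}_1)^2## with respect to ##\hat{x}_1## to zero gives ##\hat{x}_1 = \frac{\sum_i r_i x_i}{\sum_i r_i^2}##, with the sum running over all N isotopes (##r_1 = 1##). Here is a rough Python sketch of that unweighted version; the name `refine_masses` and the numbers are hypothetical, and a fit weighted by the individual mass errors may be more appropriate for real data.

```python
def refine_masses(x, y):
    """Unweighted least-squares estimate of the reference mass x_1.

    Minimizes sum_i (x_i - r_i * x1_hat)^2 with r_i = y_i / y_1, then
    rescales by the precisely known frequency ratios to get every mass.
    """
    # Frequency ratios relative to the first isotope (r[0] is exactly 1).
    r = [yi / y[0] for yi in y]
    # Closed-form minimizer of the squared error.
    x1_hat = sum(ri * xi for ri, xi in zip(r, x)) / sum(ri * ri for ri in r)
    # Refined masses inherit the ratio structure of the frequencies.
    return x1_hat, [ri * x1_hat for ri in r]

if __name__ == "__main__":
    # Hypothetical measured masses (slightly off) and frequencies for a = 2.
    x_measured = [100.1, 100.9, 102.05]
    y_measured = [200.0, 202.0, 204.0]
    print(refine_masses(x_measured, y_measured))
```

The refined masses all share the single uncertainty of ##\hat{x}_1##, scaled by ratios that are known far more precisely.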

I could be wrong on this. This is just my gut reaction.

Edit: Maybe it should be ##\frac{10^{-3}}{\sqrt{N-1}}##? I'm not sure. I'd put this question to a Monte Carlo test.
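
A Monte Carlo check along those lines could look something like this. It is only a sketch under stated assumptions: Gaussian, independent errors, a relative error of ##10^{-3}## on each mass and ##10^{-6}## on each frequency, made-up true masses 100, 101, ..., and an arbitrary slope ##a = 2##. The function name `simulate_spread` and its parameters are hypothetical.

```python
import math
import random
import statistics

def simulate_spread(n_isotopes, n_trials=4000, rel_x=1e-3, rel_y=1e-6, seed=0):
    """Relative scatter of the least-squares reference mass over many fake experiments."""
    rng = random.Random(seed)
    a = 2.0  # arbitrary slope, an assumption for the simulation
    x_true = [100.0 + i for i in range(n_isotopes)]  # made-up masses spaced by 1
    y_true = [a * xi for xi in x_true]

    estimates = []
    for _ in range(n_trials):
        # Draw one noisy set of mass and frequency measurements.
        x = [rng.gauss(xi, rel_x * xi) for xi in x_true]
        y = [rng.gauss(yi, rel_y * yi) for yi in y_true]
        # Unweighted least-squares estimate of x_1 using the frequency ratios.
        r = [yi / y[0] for yi in y]
        x1_hat = sum(ri * xi for ri, xi in zip(r, x)) / sum(ri * ri for ri in r)
        estimates.append(x1_hat)

    # Sample standard deviation of the estimates, relative to the true x_1.
    return statistics.stdev(estimates) / x_true[0]

if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        # Compare the simulated scatter with the 1e-3 / sqrt(N) guess.
        print(n, simulate_spread(n), 1e-3 / math.sqrt(n))
```

Printing the simulated scatter next to ##\frac{10^{-3}}{\sqrt{N}}## for a few values of N is a quick way to see which scaling the procedure follows.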

kelly0303 · Jun 16, 2021

Thanks a lot! I was actually guessing that the error would go like ##\frac{10^{-3}}{\sqrt{N}}##. That was helpful, it gave me confidence to put this to a test!
