Least Squares Method - What is the measured mean value of y?

nerdy_hottie · Jan 27, 2013

Homework Statement

So I'm doing a Least Squares Analysis and I'm wondering about what the 'measured mean value of y for replicate measurements of the unknown' value is supposed to be. I have no idea in the world what it's asking for. The value it is speaking of is not the same as the average value in y. I will post the example so you can see what I'm talking about.

Least-Squares Spreadsheet

X Y
1 2
3 3
4 4
6 5

m 0.615384615 1.346153846 b
sm 0.054392829 0.214144783 sb
R2 0.984615385 0.196116135 sy

n= 4
Mean y= 3.5
Σ(xi-mean x)2 13

*********Measured y= 2.72
k= number of replicate measurements of y= 1
Derived x= 2.2325
sx= 0.373502805
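
For reference, the fit statistics in the spreadsheet above (m, b, sm, sb, R2, sy) can be reproduced from the four (X, Y) pairs with the standard formulas for a straight-line fit. This is a minimal Python sketch, assuming an ordinary unweighted least-squares fit; the function name `linear_fit` and the dictionary keys are illustrative only and do not come from the textbook.

```python
import math

def linear_fit(xs, ys):
    """Unweighted least-squares line y = m*x + b with its standard errors."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)           # sum of (xi - mean x)^2
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    syy = sum((y - y_mean) ** 2 for y in ys)
    m = sxy / sxx                                       # slope
    b = y_mean - m * x_mean                             # intercept
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    sy = math.sqrt(sse / (n - 2))                       # std. dev. of residuals
    sm = sy / math.sqrt(sxx)                            # std. error of slope
    sb = sy * math.sqrt(sum(x * x for x in xs) / (n * sxx))  # std. error of intercept
    r2 = 1 - sse / syy
    return {"m": m, "b": b, "sm": sm, "sb": sb, "r2": r2, "sy": sy,
            "n": n, "y_mean": y_mean, "sxx": sxx}

fit = linear_fit([1, 3, 4, 6], [2, 3, 4, 5])
for name, value in fit.items():
    print(name, value)
```

Note that the measured y of 2.72 does not appear anywhere in this calculation; it is a separate input.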

Homework Equations

I'm looking for an equation, or an explanation as to how to obtain the value.

The Attempt at a Solution

I have asterisks (********) next to the measured y value in the spreadsheet. (The value is 2.72). The only reason I know what it is in this case is because this is an example from my textbook. I have no idea where it comes from, but I need it for an equation to be able to do my lab and I don't know how to find the value.
As far as I can make sense of it, I have no means of calculating 'measured mean value of y for replicate measurements of the unknown', as there are no replicate measurements of the y values. Right?
Just in case it helps, this is for an analytical chemistry lab, but it's pertaining to statistics, so I asked it here.

Thanks.

I like Serena · Jan 27, 2013

Welcome to PF, nerdy_hottie!

It looks like your measured y value of 2.72 is given and not calculated.
The purpose is to find the corresponding x.
The derived x is found by applying the inverse of the found linear relationship.

The 2.72 appears to be the result of a set of k=1 measurements.
This is relevant for the estimated sx, the standard deviation of the derived x.
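
As a small illustration of "applying the inverse of the found linear relationship", the sketch below solves y = m·x + b for x using the slope and intercept from the spreadsheet. The variable names are illustrative only.

```python
# Slope and intercept from the textbook spreadsheet above
m = 0.615384615
b = 1.346153846

measured_y = 2.72  # given value for the unknown, not computed from the X/Y table

# Invert y = m*x + b to get the x that corresponds to the measured y
derived_x = (measured_y - b) / m
print(round(derived_x, 4))  # prints 2.2325
```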

nerdy_hottie · Jan 27, 2013

So how would I find the given value for another set of values? Is there a table or something based on the number of k?

I like Serena · Jan 27, 2013

The idea is that a new set of y measurements is done for a fixed unknown value of x.

The more measurements you take, the more accurate y will be, the more accurate the linear relationship will be, and the more accurate the resulting x will be.

I don't have the formulas at hand, but typically the standard deviations will decrease by a factor of about √k.
I guess what you would need is those formulas.
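
As a rough sketch of where the √k comes from (a standard result, not taken from the poster's textbook): if single readings of y scatter with standard deviation ##s_y##, and the k replicate readings are assumed to be independent, their mean has standard deviation

$$s_{\bar y} = \frac{s_y}{\sqrt{k}}$$

In the sx formula quoted further down in this thread, this shows up as the 1/k term under the square root, so only that term shrinks as k grows; the 1/n term and the last term depend on the calibration data.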

nerdy_hottie · Jan 27, 2013

I have the formulas for finding sx (uncertainty in x), and all other corresponding formulas for all the values I have listed. I have another set of data which I want to find the value for 'measured y', but I don't know what it is to proceed. So is what you're saying that if I have another set of data with only one replicate measurement of y, the value will always be the given 2.72?

I like Serena · Jan 27, 2013

I'm saying that the y measurement of 2.72 (and its number of measurements k=1) is not calculated from the data you have shown.
It is drawn from elsewhere.

nerdy_hottie · Jan 27, 2013

Yes, but is it constant across all data sets with number of measurements, k=1 ? I mean, if I don't have to calculate it from the data given, and it's a given value for k=1, then isn't it a constant?

I like Serena · Jan 27, 2013

I have seen only 1 dataset with only 4 measurements.
I guess it's a constant across this dataset...

For which purpose do you need it?

nerdy_hottie · Jan 27, 2013

Okay sorry for any confusion but I didn't want to take the time to post all the data. I was just trying to find out the meaning of that measured y value, and apply it to the data I have now and all other data sets in the future. Right now the data set I'm working with is as follows:

Determination of Cu in Brass Using AA Spec.

Conc. (ppm) Abs.
0.000 0.000
2.044 0.268
4.088 0.509
6.132 0.723

m 0.118 0.014 b
sm 0.004 0.016 sb
R2 0.997 0.019 sy

n= 3
Mean y= 0.500
Σ(xi-mean x)2= 8.355872

Measured y= ?
sx ? (need measured y)
Hope that's a little clearer.

I like Serena · Jan 27, 2013

Seems to me you are supposed to measure the absorption yourself a couple of times.
And then fill in that value.
Didn't you say this is for a chem lab?

From that you can find the copper concentration and its associated uncertainty.

You would use the relation:
$$Absorption = (0.118\pm 0.004) \times Concentration + (0.014 \pm 0.016)$$

nerdy_hottie · Jan 27, 2013

I have measured the absorbance.. the values are above.
"Abs
0.000
0.268
0.509
0.723 "
for the corresponding values of concentration.
I have calculated average, "Mean y=0.500", and other such parameters, as seen above. I am performing a least squares analysis, and am as far as calculating sx using the formula
$$s_x=\frac{s_y}{|m|}\sqrt{\frac{1}{k}+\frac{1}{n}+\frac{(y-\overline{y})^2}{m^2\,\Sigma(x_i-\overline{x}^2)}}$$

I just need that value for measured y.

Ray Vickson · Jan 27, 2013

I like Serena said:

Seems to me you are supposed to measure the absorption yourself a couple of times.
And then fill in that value.
Didn't you say this is for a chem lab?

From that you can find the copper concentration and its associated uncertainty.

You would use the relation:
$$Absorption = (0.118\pm 0.004) \times Concentration + (0.014 \pm 0.016)$$

This might not be true *exactly* as written. In regression analysis there are expressions available that give "prediction intervals" for y(x) and "confidence intervals" for Ey(x) in terms of x, so the width of an uncertainty bracket is different for different values of x. See, e.g.,
http://www.weibull.com/DOEWeb/confidence_intervals_in_simple_linear_regression.htm .

Since the estimates of m and b are correlated, we cannot just use the two intervals separately, as your expression does, although that might give a pretty good approximation in some cases.
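
For orientation, the standard textbook forms of these two standard errors are given below, written in the symbols used earlier in this thread (##s_y##, n, ##\bar x##). They are not copied from the linked page, and ##\hat y(x) = mx + b## denotes the fitted value.

$$s_{\hat y(x)} = s_y\sqrt{\frac{1}{n}+\frac{(x-\bar x)^2}{\Sigma(x_i-\bar x)^2}} \qquad \text{(fitted mean } Ey(x)\text{)}$$

$$s_{\text{pred}}(x) = s_y\sqrt{1+\frac{1}{n}+\frac{(x-\bar x)^2}{\Sigma(x_i-\bar x)^2}} \qquad \text{(a single new } y \text{ at } x\text{)}$$

Both grow as x moves away from ##\bar x##, which is why the width of the bracket depends on x.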

I like Serena · Jan 27, 2013

nerdy_hottie said:

I have measured the absorbance.. the values are above.
"Abs
0.000
0.268
0.509
0.723 "
for the corresponding values of concentration.
I have calculated average, "Mean y=0.500", and other such parameters, as seen above. I am performing a least squares analysis, and am as far as calculating sx using the formula
$$s_x=\frac{s_y}{|m|}\sqrt{\frac{1}{k}+\frac{1}{n}+\frac{(y-\overline{y})^2}{m^2\,\Sigma(x_i-\overline{x}^2)}}$$

I just need that value for measured y.

Yes, so you did 3 measurements to calibrate, using known concentrations.
Next you would pick substance X with an unknown concentration of copper.
Do k absorption measurements and fill the result into your formula to find the standard deviation of the concentration.

Btw, be careful to put the last square outside the parentheses. It should be ##(x_i-\bar x)^2##.
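
With the square placed outside the parentheses, the formula can be checked against the textbook example from the first post. This is a minimal Python sketch; the function name `s_x` and its argument names are illustrative only.

```python
import math

def s_x(sy, m, k, n, y_measured, y_mean, sxx):
    """Standard deviation of x derived from a measured y.

    sxx is the sum of (x_i - mean x)^2 over the calibration points.
    """
    return (sy / abs(m)) * math.sqrt(
        1 / k + 1 / n + (y_measured - y_mean) ** 2 / (m ** 2 * sxx)
    )

# Values from the textbook spreadsheet in the first post
sx = s_x(sy=0.196116135, m=0.615384615, k=1, n=4,
         y_measured=2.72, y_mean=3.5, sxx=13)
print(round(sx, 4))  # prints 0.3735
```

The result agrees with the sx of 0.373502805 listed in the spreadsheet.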

I like Serena · Jan 27, 2013

Ray Vickson said:

This might not be true *exactly* as written. In regression analysis there are expressions available that give "prediction intervals" for y(x) and "confidence intervals" for Ey(x) in terms of x, so the width of an uncertainty bracket is different for different values of x. See, e.g.,
http://www.weibull.com/DOEWeb/confidence_intervals_in_simple_linear_regression.htm .

Since the estimates of m and b are correlated, we cannot just use the two intervals separately, as your expression does, although that might give a pretty good approximation in some cases.

Yep, those were the formulas I was looking for in post #4.
It appears the OP is supposed to use a version that is even more advanced than the ones mentioned.

Actually, this is pretty advanced for a chem lab.

nerdy_hottie · Jan 27, 2013

Oh my gosh I'm sorry I don't know what I'm talking about. Yes, I have other values for 'substance x', as you called it (samples of brass) which I have other absorbance values for. I mixed up three different samples (dilutions) of brass using three different masses of the same brass solid, and have absorbances corresponding to these three solutions. These absorbances (of which I have three corresponding to the three different samples) are actually a measurement of the average of three absorbance values, because the machine I used (an atomic absorption spectrometer-AA spec.) actually takes three separate readings of a sample and reports the average value of that sample (which I have listed below)
So I have three values for brass, each is an average the machine took.
Does this mean that the k value is 3 (because the number of replicate measurements, or times the machine took an absorbance value, is 3)? Am I understanding this at all right or totally wrong?
And if the value of k is 3, what then is the corresponding value of 'measured y' for that set of samples?
(I don't think you need it, but the absorbances for my three separate brass samples are:
0.521, 0.511, 0.524)

I like Serena · Jan 27, 2013

Good!

Your 'measured y' would be 0.521 for the first brass sample.
And indeed you would have k=3 replicate measurements.
Fill that in your formula, and you'll get the standard deviation for the concentration of copper in this brass sample.

Repeat for the other 2 samples to find the sx in their copper concentrations as well.

Or am I misunderstanding and are all those measurements for the same sample of unknown brass?
If that is the case, you should average them and use k=9.
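
The per-sample calculation can be sketched in Python as follows. This is only an illustration under stated assumptions: it refits the four calibration points listed earlier in the thread with an unweighted least-squares line, treats each brass absorbance as the mean of k = 3 machine readings, and uses the illustrative helper name `unknown_conc`. Because n, mean y and Σ(xi − mean x)² are derived from those four points, they need not match the summary values posted earlier. The results are concentrations of the measured solutions in ppm, before any dilution or mass correction.

```python
import math

# Calibration data posted earlier in the thread
conc = [0.000, 2.044, 4.088, 6.132]          # ppm Cu
absorbance = [0.000, 0.268, 0.509, 0.723]

n = len(conc)
x_mean = sum(conc) / n
y_mean = sum(absorbance) / n
sxx = sum((x - x_mean) ** 2 for x in conc)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(conc, absorbance))
m = sxy / sxx                                 # slope
b = y_mean - m * x_mean                       # intercept
sse = sum((y - (m * x + b)) ** 2 for x, y in zip(conc, absorbance))
sy = math.sqrt(sse / (n - 2))                 # std. dev. of residuals

def unknown_conc(y_measured, k):
    """Return (derived x, s_x) for a mean absorbance from k replicate readings."""
    x = (y_measured - b) / m
    sx = (sy / abs(m)) * math.sqrt(
        1 / k + 1 / n + (y_measured - y_mean) ** 2 / (m ** 2 * sxx)
    )
    return x, sx

# One (x, s_x) pair per brass sample
results = [unknown_conc(y, k=3) for y in (0.521, 0.511, 0.524)]
for x, sx in results:
    print(f"x = {x:.3f} ppm, s_x = {sx:.3f} ppm")
```

Each sample gets its own s_x because the measured y enters the last term under the square root.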

nerdy_hottie · Jan 27, 2013

No, you're understanding correctly.
But now I see the place of my confusion in the first place. I thought that s_x would be only one value for the whole data set. Now I see that (for this data set), there will be three separate values of s_x.
However, going back to the first sample example, where the measured value of y was 2.72. I know that it's a given number and not calculated in any way, but where does the value come from? It is not a value in the list of y values (only values are 2,3,4,5), so where does it come from, you know what I mean? And in this sample example in the book, there is only one value for s_x, for the range of all the data. So I'm not saying you're wrong, but it's just that my book only has one value for the whole set.

I like Serena · Jan 27, 2013

Well, I can only assume that the example in your book had 4 measurements with known concentrations, and 1 measurement for an unknown concentration.
The sx would be for that one unknown concentration.

But... there may be more than one sx mentioned.
When you do a linear regression, you can also determine another sx.
For instance ##s_x=\sqrt{\sum (x_i - \bar x)^2 \over n-1}##.
This could be part of the calculation of m.
Either way, this sx would have no purpose for you.

nerdy_hottie · Jan 27, 2013

Alright thanks for all the help, I know what I'm going to do now, whether it's right or not, I don't care much at this point. Most of our mark is based on how accurate our results are, so if this is wrong or not it shouldn't affect my mark a whole lot.

I like Serena · Jan 27, 2013

Would you mind letting me know how it turns out?

nerdy_hottie · Jan 27, 2013

I pass it in tomorrow, and I won't get it back until at least a week after that, but if I remember by then, then sure I will !

I like Serena · Jan 27, 2013

Please do check if your resulting sx has a reasonable value.
Otherwise you have likely made a calculation error (which is not uncommon ;)).
It would be a shame to lose marks over something like that.

I like Serena · Feb 7, 2013

Did you get it back?

Ray Vickson · Feb 7, 2013

nerdy_hottie said:

Homework Statement

So I'm doing a Least Squares Analysis and I'm wondering about what the 'measured mean value of y for replicate measurements of the unknown' value is supposed to be. I have no idea in the world what it's asking for. The value it is speaking of is not the same as the average value in y. I will post the example so you can see what I'm talking about.

Least-Squares Spreadsheet

X Y
1 2
3 3
4 4
6 5

m 0.615384615 1.346153846 b
sm 0.054392829 0.214144783 sb
R2 0.984615385 0.196116135 sy

n= 4
Mean y= 3.5
Σ(xi-mean x)2 13

*********Measured y= 2.72
k= number of replicate measurements of y= 1
Derived x= 2.2325
sx= 0.373502805

Homework Equations

I'm looking for an equation, or an explanation as to how to obtain the value.

The Attempt at a Solution

I have asterisks (********) next to the measured y value in the spreadsheet. (The value is 2.72). The only reason I know what it is in this case is because this is an example from my textbook. I have no idea where it comes from, but I need it for an equation to be able to do my lab and I don't know how to find the value.
As far as I can make sense of it, I have no means of calculating 'measured mean value of y for replicate measurements of the unknown', as there are no replicate measurements of the y values. Right?
Just in case it helps, this is for an analytical chemistry lab, but it's pertaining to statistics, so I asked it here.

Thanks.

I think that, in principle, you may have a problem that is difficult to solve exactly. You did a least-squares fit of y to x, so the statistical output is valid if the model was of the form
[tex] y = \alpha + \beta x +\epsilon,[/tex] where ##\alpha, \; \beta## are unknown constants and ##\epsilon## is a mean-0 random variable with variance that does not depend on x. If, further, the distribution of ##\epsilon## is NORMAL, you can develop confidence intervals, etc. You have a series of observations
[tex] y_i = \alpha + \beta x_i + \epsilon_i, \: i = 1,2, \ldots, n,[/tex] where the different ##\epsilon_i## are mutually independent and have the same distribution.

You do not know ##\alpha## and ##\beta##, but instead you estimate them as 'a' and 'b' using least-squares formulas. Assuming correctness of the form of statistical model, a and b will be unbiased estimates of the underlying parameters, and the computed total squared error ##S^2## will be related to ##\sigma^2 = \text{Var}(\epsilon)##, via standard formulas.

Standard formulas allow us to give confidence intervals on Ey(x) and on y(x) at some future-measured value of x. These formulas are a bit complicated, but can be found in many sources. However, what you seem to want to do is almost the opposite: you measure y and want to know about x. So, if the original model is valid, what you have is
[tex] y = \alpha + \beta x + \epsilon \: \Longrightarrow x = \frac{y - \alpha - \epsilon}{\beta} = \frac{y}{\beta} - \frac{\alpha}{\beta} - \frac{\epsilon}{\beta}.[/tex] Now the problem you face is that the expected value of ##y/ \beta## is not ##y/b## (although it may be close, sometimes) and that the expected value of ##\alpha / \beta## is not ##a/b##. I suspect that getting exact formulas is somewhere between difficult and impossible, although, of course, one can always resort to Monte-Carlo simulation to get rough estimates.

However, I suspect you are supposed to put ##a## instead of ##\alpha## and ##b## instead of ##\beta## to get an estimate for the mean of x. Whether or not that is of any use is not at all clear.
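
A Monte-Carlo check of the kind mentioned above could look like the following Python sketch. Everything in it is an assumption made for illustration: the "true" values of α, β and σ, the calibration x values, the unknown's true x, the number of trials and the function names are all hypothetical, and the errors are taken to be normal. It repeatedly simulates a calibration plus one new measurement of y, plugs the estimates a and b into x = (y − a)/b, and looks at the spread of the resulting x values.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares estimates a (intercept) and b (slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    a = y_mean - b * x_mean
    return a, b

def simulate(alpha, beta, sigma, xs, x_true, trials, seed=0):
    """Return mean and standard deviation of the inverse estimates of x."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = []
    for _ in range(trials):
        # Simulated calibration data y_i = alpha + beta*x_i + eps_i
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        a, b = fit_line(xs, ys)
        # One new measurement of y at the unknown's true x
        y_new = alpha + beta * x_true + rng.gauss(0, sigma)
        estimates.append((y_new - a) / b)
    mean = sum(estimates) / trials
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (trials - 1))
    return mean, sd

# Hypothetical parameter values, chosen only to resemble the textbook example
mean_x, sd_x = simulate(alpha=1.35, beta=0.615, sigma=0.2,
                        xs=[1, 3, 4, 6], x_true=2.23, trials=20000)
print(mean_x, sd_x)
```

Comparing `mean_x` with the assumed true x gives a feel for the bias of the plug-in estimate, and `sd_x` can be compared with the s_x formula used earlier in the thread.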

I like Serena · Feb 7, 2013

@RGV: Please! Did you bother to read the comments in this thread? Or did you just want to leave your mark?
I only triggered this thread because I was curious for the results.

Ray Vickson · Feb 7, 2013

I like Serena said:

@RGV: Please! Did you bother to read the comments in this thread? Or did you just want to leave your mark?
I only triggered this thread because I was curious for the results.

Well, yes, I tried to, but I lost track of what was going on, and how it related to the OP's original question. In part the problem is that the thread is diffusing and introducing different strains, so it is no longer easy to follow.

Besides, if you don't think my remarks are relevant, just ignore them. Maybe the OP will find them helpful, or maybe not---I hoped they were. He/she can be the judge of that.
