# Least Squares Method: What is the measured mean value of y?

• nerdy_hottie
In summary, the 'measured y' is the mean of k replicate measurements made on the unknown sample. It is an input to the calculation, not something derived from the calibration data, and it is not a constant: each unknown has its own measured y, its own k, and its own sx.
nerdy_hottie

## Homework Statement

So I'm doing a Least Squares Analysis and I'm wondering about what the 'measured mean value of y for replicate measurements of the unknown' value is supposed to be. I have no idea in the world what it's asking for. The value it is speaking of is not the same as the average value in y. I will post the example so you can see what I'm talking about.

X Y
1 2
3 3
4 4
6 5

m 0.615384615 1.346153846 b
sm 0.054392829 0.214144783 sb
R² 0.984615385 0.196116135 sy

n= 4
Mean y= 3.5
Σ(xi-mean x)²= 13

*********Measured y= 2.72
k= number of replicate measurements of y= 1
Derived x= 2.2325
sx= 0.373502805
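
For readers who want to check the spreadsheet block above, here is a minimal Python sketch that recomputes m, b, sm, sb, R² and sy from the four (X, Y) pairs. It uses the standard simple-linear-regression formulas; the function name `linear_fit` and the dictionary keys are illustrative choices, not anything from the textbook.

```python
import math

def linear_fit(xs, ys):
    """Fit y = m*x + b by ordinary least squares and return summary statistics."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)  # sum of (xi - mean x)^2
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)

    m = sxy / sxx              # slope
    b = y_bar - m * x_bar      # intercept
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

    sy = math.sqrt(ss_res / (n - 2))   # standard error of the fit
    sm = sy / math.sqrt(sxx)           # standard error of the slope
    sb = sy * math.sqrt(sum(x * x for x in xs) / (n * sxx))  # std. error of intercept
    r2 = 1 - ss_res / syy
    return {"m": m, "b": b, "sm": sm, "sb": sb, "r2": r2, "sy": sy,
            "n": n, "y_bar": y_bar, "sxx": sxx}

# Calibration data from the textbook example in this post
fit = linear_fit([1, 3, 4, 6], [2, 3, 4, 5])
for name, value in fit.items():
    print(f"{name} = {value:.9g}")
```

Note that nothing in this calculation produces the 2.72: the measured y does not come out of the calibration fit.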

## Homework Equations

I'm looking for an equation, or an explanation as to how to obtain the value.

## The Attempt at a Solution

I have asterisks (********) next to the measured y value in the spreadsheet. (The value is 2.72). The only reason I know what it is in this case is because this is an example from my textbook. I have no idea where it comes from, but I need it for an equation to be able to do my lab and I don't know how to find the value.
As far as I can make sense of it, I have no means of calculating 'measured mean value of y for replicate measurements of the unknown', as there are no replicate measurements of the y values. Right?
Just in case it helps, this is for an analytical chemistry lab, but it's pertaining to statistics, so I asked it here.

Thanks.

Welcome to PF, nerdy_hottie!

It looks like your measured y value of 2.72 is given and not calculated.
The purpose is to find the corresponding x.
The derived x is found by applying the inverse of the found linear relationship.

The 2.72 appears to be the result of a set of k=1 measurements.
This is relevant for the estimated sx, the standard deviation of the derived x.
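
As a quick check of the "inverse of the linear relationship" step, here is a small sketch using the slope and intercept from the textbook example. The helper name `derived_x` is made up for illustration.

```python
# Slope and intercept as posted in the textbook example
m = 0.615384615
b = 1.346153846
measured_y = 2.72  # given reading for the unknown, not computed from the table

def derived_x(y, slope, intercept):
    """Invert y = slope * x + intercept to recover x."""
    return (y - intercept) / slope

x_unknown = derived_x(measured_y, m, b)
print(round(x_unknown, 4))  # 2.2325
```

This reproduces the "Derived x" value in the spreadsheet.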

So how would I find the given value for another set of values? Is there a table or something based on the number of k?

The idea is that a new set of y measurements is done for a fixed unknown value of x.

The more replicate measurements you take, the more accurate the mean y will be, and the more accurate the corresponding derived x will be.

I don't have the formulas at hand, but typically the standard deviations will decrease by a factor of about √k.
I guess what you would need is those formulas.

I have the formulas for finding sx (uncertainty in x), and all other corresponding formulas for all the values I have listed. I have another set of data which I want to find the value for 'measured y', but I don't know what it is to proceed. So is what you're saying that if I have another set of data with only one replicate measurement of y, the value will always be the given 2.72?

I'm saying that the y measurement of 2.72 (and its number of measurements k=1) is not calculated from the data you have shown.
It is drawn from elsewhere.

Yes, but is it constant across all data sets with number of measurements, k=1 ? I mean, if I don't have to calculate it from the data given, and it's a given value for k=1, then isn't it a constant?

I have seen only 1 dataset with only 4 measurements.
I guess it's a constant across this dataset...

For which purpose do you need it?

Okay sorry for any confusion but I didn't want to take the time to post all the data. I was just trying to find out the meaning of that measured y value, and apply it to the data I have now and all other data sets in the future. Right now the data set I'm working with is as follows:

Determination of Cu in Brass Using AA Spec.

Conc. (ppm) Abs.
0.000 0.000
2.044 0.268
4.088 0.509
6.132 0.723

m 0.118 0.014 b
sm 0.004 0.016 sb
R2 0.997 0.019 sy

n= 3
Mean y= 0.500
Σ(xi-mean x)²= 8.355872

Measured y= ?
sx= ? (need measured y)

Hope that's a little clearer.

Seems to me you are supposed to measure the absorption yourself a couple of times.
And then fill in that value.
Didn't you say this is for a chem lab?

From that you can find the copper concentration and its associated uncertainty.

You would use the relation:
$$\text{Absorption} = (0.118 \pm 0.004) \times \text{Concentration} + (0.014 \pm 0.016)$$

I have measured the absorbance.. the values are above.
"Abs
0.000
0.268
0.509
0.723 "
for the corresponding values of concentration.
I have calculated the average, "Mean y=0.500", and other such parameters, as seen above. I am performing a least squares analysis, and have got as far as calculating sx using the formula
$$s_x = \frac{s_y}{|m|}\sqrt{\frac{1}{k}+\frac{1}{n}+\frac{(y-\bar{y})^2}{m^2 \sum (x_i-\bar{x}^2)}}$$

I just need that value for measured y.

I like Serena said:
Seems to me you are supposed to measure the absorption yourself a couple of times.
And then fill in that value.
Didn't you say this is for a chem lab?

From that you can find the copper concentration and its associated uncertainty.

You would use the relation:
$$\text{Absorption} = (0.118 \pm 0.004) \times \text{Concentration} + (0.014 \pm 0.016)$$

This might not be true *exactly* as written. In regression analysis there are expressions available that give "prediction intervals" for y(x) and "confidence intervals" for E[y(x)] in terms of x, so the width of an uncertainty bracket is different for different values of x. See, e.g.,
http://www.weibull.com/DOEWeb/confidence_intervals_in_simple_linear_regression.htm .

Since the estimates of m and b are correlated, we cannot just use the two intervals separately--as your expression does--although that might give a pretty good approximation in some cases.

nerdy_hottie said:
I have measured the absorbance.. the values are above.
"Abs
0.000
0.268
0.509
0.723 "
for the corresponding values of concentration.
I have calculated the average, "Mean y=0.500", and other such parameters, as seen above. I am performing a least squares analysis, and have got as far as calculating sx using the formula
$$s_x = \frac{s_y}{|m|}\sqrt{\frac{1}{k}+\frac{1}{n}+\frac{(y-\bar{y})^2}{m^2 \sum (x_i-\bar{x}^2)}}$$

I just need that value for measured y.

Yes, so you did 3 measurements to calibrate, using known concentrations.
Next you would pick substance X with an unknown concentration of copper.
Do k absorption measurements and fill that in in your formula to find the standard deviation of the concentration.

Btw, be careful to put the last square outside the parentheses. It should be ##(x_i-\bar x)^2##.
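
With the square placed outside the parentheses, the formula can be checked against the textbook example from the first post. The sketch below is only that, a check; the function name `sx_unknown` and its argument names are illustrative.

```python
import math

def sx_unknown(sy, m, k, n, y_measured, y_bar, sxx):
    """Standard deviation of the derived x for an unknown.

    sy          standard error of the fit
    m           slope of the calibration line
    k           number of replicate measurements of the unknown
    n           number of calibration points
    y_measured  mean of the k replicate readings of the unknown
    y_bar       mean of the calibration y values
    sxx         sum of (xi - mean x)^2 over the calibration points
    """
    return (sy / abs(m)) * math.sqrt(
        1 / k + 1 / n + (y_measured - y_bar) ** 2 / (m ** 2 * sxx)
    )

# Textbook example: measured y = 2.72 from a single reading (k = 1)
sx = sx_unknown(sy=0.196116135, m=0.615384615, k=1, n=4,
                y_measured=2.72, y_bar=3.5, sxx=13)
print(round(sx, 4))  # 0.3735
```

The result agrees with the sx = 0.3735 listed in the spreadsheet, which supports reading 'measured y' as the reading taken on the unknown.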

Ray Vickson said:
This might not be true *exactly* as written. In regression analysis there are expressions available that give "prediction intervals" for y(x) and "confidence intervals" for E[y(x)] in terms of x, so the width of an uncertainty bracket is different for different values of x. See, e.g.,
http://www.weibull.com/DOEWeb/confidence_intervals_in_simple_linear_regression.htm .

Since the estimates of m and b are correlated, we cannot just use the two intervals separately--as your expression does--although that might give a pretty good approximation in some cases.

Yep, those were the formulas I was looking for in post #4.
It appears the OP is supposed to use a version that is even more advanced than the ones mentioned.

Actually, this is pretty advanced for a chem lab.

Oh my gosh, I'm sorry, I don't know what I'm talking about. Yes, I have other values for 'substance X', as you called it (samples of brass), which I have other absorbance values for. I prepared three different samples (dilutions) of brass using three different masses of the same brass solid, and I have an absorbance for each of these three solutions. Each of these absorbances is actually the average of three readings, because the machine I used (an atomic absorption spectrometer, AA spec.) takes three separate readings of a sample and reports the average value for that sample (which I have listed below).
So I have three values for brass, and each is an average the machine took.
Does that mean that the k value is 3 (because the number of replicate measurements, or times the machine took an absorbance value, is 3)? Am I understanding this at all right or totally wrong?
And if the value of k is 3, what then is the corresponding value of 'measured y' for that set of samples?
(I don't think you need it, but the absorbances for my three separate brass samples are:
0.521, 0.511, 0.524)

Good!

Your 'measured y' would be 0.521 for the first brass sample.
And indeed you would have k=3 replicate measurements.
Fill that in your formula, and you'll get the standard deviation for the concentration of copper in this brass sample.

Repeat for the other 2 samples to find the sx in their copper concentrations as well.

Or am I misunderstanding, and are all those measurements for the same sample of unknown brass?
If that is the case, you should average them and use k=9.
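
Both cases can be sketched in a few lines of Python. The calibration summary values (m, b, sy, n, mean y, Σ(xi-mean x)²) are copied exactly as posted earlier in the thread and are not re-derived here; note that n = 3 is used as posted even though four calibration rows are listed. The function name `concentration_with_sx` is illustrative.

```python
import math

# Calibration summary, copied as posted (not recomputed)
m, b = 0.118, 0.014
sy = 0.019
n = 3
y_bar = 0.500
sxx = 8.355872

def concentration_with_sx(y_measured, k):
    """Return (derived concentration, its standard deviation) for one unknown."""
    x = (y_measured - b) / m
    sx = (sy / abs(m)) * math.sqrt(
        1 / k + 1 / n + (y_measured - y_bar) ** 2 / (m ** 2 * sxx)
    )
    return x, sx

readings = [0.521, 0.511, 0.524]

# Case 1: three different brass samples, each reading is a mean of k = 3
results = []
for y in readings:
    x, sx = concentration_with_sx(y, k=3)
    results.append((y, x, sx))
    print(f"y = {y:.3f}  ->  x = {x:.3f} ppm, sx = {sx:.3f} ppm")

# Case 2: all readings belong to the same sample, so average them and use k = 9
y_mean = sum(readings) / len(readings)
x_all, sx_all = concentration_with_sx(y_mean, k=9)
print(f"pooled: y = {y_mean:.4f}  ->  x = {x_all:.3f} ppm, sx = {sx_all:.3f} ppm")
```

In the first case there is one sx per brass sample, so three values in total. In the second case there is a single sx, and it is smaller because the 1/k term shrinks.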

No, you're understanding correctly.
But now I see the place of my confusion in the first place. I thought that sx would be only one value for the whole data set. Now I see that (for this data set), there will be three separate values of sx.
However, going back to the first sample example, where the measured value of y was 2.72. I know that it's a given number and not calculated in any way, but where does the value come from? It is not a value in the list of y values (only values are 2,3,4,5), so where does it come from, you know what I mean? And in this sample example in the book, there is only one value for sx, for the range of all the data. So I'm not saying you're wrong, but it's just that my book only has one value for the whole set.

Well, I can only assume that the example in your book had 4 measurements with known concentrations, and 1 measurement for an unknown concentration.
The sx would be for that one unknown concentration.

But... there may be more than one sx mentioned.
When you do a linear regression, you can also determine another sx.
For instance ##s_x=\sqrt{\sum (x_i - \bar x)^2 \over n-1}##.
This could be part of the calculation of m.
Either way, this sx would have no purpose for you.

Alright thanks for all the help, I know what I'm going to do now, whether it's right or not, I don't care much at this point. Most of our mark is based on how accurate our results are, so if this is wrong or not it shouldn't affect my mark a whole lot.

Would you mind letting me know how it ends?

I pass it in tomorrow, and I won't get it back until at least a week after that, but if I remember by then, then sure I will !

Otherwise you have likely made a calculation error (which is not uncommon ;)).
It would be a shame to lose marks over something like that.

Did you get it back?

I think that, in principle, you may have a problem that is difficult to solve exactly. You did a least-squares fit of y to x, so the statistical output is valid if the model was of the form
$$y = \alpha + \beta x +\epsilon,$$ where ##\alpha, \; \beta## are unknown constants and ##\epsilon## is a mean-0 random variable with variance that does not depend on x. If, further, the distribution of ##\epsilon## is NORMAL, you can develop confidence intervals, etc. You have a series of observations
$$y_i = \alpha + \beta x_i + \epsilon_i, \: i = 1,2, \ldots, n,$$ where the different ##\epsilon_i## are mutually independent and have the same distribution.

You do not know ##\alpha## and ##\beta##, but instead you estimate them as 'a' and 'b' using least-squares formulas. Assuming correctness of the form of statistical model, a and b will be unbiased estimates of the underlying parameters, and the computed total squared error ##S^2## will be related to ##\sigma^2 = \text{Var}(\epsilon)##, via standard formulas.

Standard formulas allow us to give confidence intervals on Ey(x) and on y(x) at some future-measured value of x. These formulas are a bit complicated, but can be found in many sources. However, what you seem to want to do is almost the opposite: you measure y and want to know about x. So, if the original model is valid, what you have is
$$y = \alpha + \beta x + \epsilon \: \Longrightarrow x = \frac{y - \alpha - \epsilon}{\beta} = \frac{y}{\beta} - \frac{\alpha}{\beta} - \frac{\epsilon}{\beta}.$$ Now the problem you face is that the expected value of ##y/ \beta## is not ##y/b## (although it may be close, sometimes) and that the expected value of ##\alpha / \beta## is not ##a/b##. I suspect that getting exact formulas is somewhere between difficult and impossible, although, of course, one can always resort to Monte-Carlo simulation to get rough estimates.

However, I suspect you are supposed to put ##a## instead of ##\alpha## and ##b## instead of ##\beta## to get an estimate for the mean of x. Whether or not that is of any use is not at all clear.
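
The Monte-Carlo idea mentioned above can be sketched roughly as follows. This is only an illustration under strong assumptions: the fitted values from the first example (a ≈ 1.346, b ≈ 0.615, s ≈ 0.196) are treated as if they were the true ##\alpha##, ##\beta## and ##\sigma##, the errors are taken to be normal, and the true x of the unknown is set to 2.2325. All function names are made up for the sketch.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar

def simulate_inverse_estimates(alpha, beta, sigma, xs, x_true,
                               k=1, trials=20000, seed=0):
    """Simulate calibration + unknown measurement; return the derived x values."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # New noisy calibration run at the same x positions
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        slope, intercept = fit_line(xs, ys)
        # Mean of k noisy readings of the unknown
        readings = [alpha + beta * x_true + rng.gauss(0, sigma) for _ in range(k)]
        y_unknown = statistics.fmean(readings)
        estimates.append((y_unknown - intercept) / slope)
    return estimates

# Assumed "true" parameters, borrowed from the fitted values of the first example
est = simulate_inverse_estimates(alpha=1.346, beta=0.615, sigma=0.196,
                                 xs=[1, 3, 4, 6], x_true=2.2325, k=1)
print("mean of derived x:", statistics.fmean(est))
print("spread of derived x:", statistics.stdev(est))
```

The simulated spread can then be compared with the sx from the textbook formula to see how good the approximation is for a given data set.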

I only triggered this thread because I was curious for the results.

I like Serena said:
I only triggered this thread because I was curious for the results.

Well, yes, I tried to, but I lost track of what was going on, and how it related to the OP's original question. In part the problem is that the thread is diffusing and introducing different strains, so it is no longer easy to follow.

Besides, if you don't think my remarks are relevant, just ignore them. Maybe the OP will find them helpful, or maybe not---I hoped they were. He/she can be the judge of that.

## 1. What is the purpose of using the least squares method?

The purpose of using the least squares method is to find the best fitting line or curve through a set of data points. This method minimizes the sum of the squared distances between the data points and the line or curve, making it a useful tool for analyzing and predicting trends in data.
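
For a straight line, the quantity being minimized and the resulting estimates can be written out explicitly. This is the standard textbook form, with m and b denoting the slope and intercept as in the spreadsheet output in the thread:

$$S(m, b) = \sum_{i=1}^{n} \left(y_i - m x_i - b\right)^2$$

$$m = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}, \qquad b = \bar y - m \bar x$$

For the first data set in the thread (X = 1, 3, 4, 6 and Y = 2, 3, 4, 5), this gives m = 8/13 ≈ 0.6154 and b = 3.5 - (8/13)(3.5) ≈ 1.3462, matching the posted values.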

## 2. How is the measured mean value of y calculated using least squares?

The 'measured mean value of y' is not calculated from the calibration data. It is the mean of the k replicate measurements made on the unknown sample: add up the k readings and divide by k. It should not be confused with the mean of the calibration y values (the mean y that appears in the sx formula), and neither quantity is the intercept of the best fitting line.

## 3. What does the "least squares" part of the method name refer to?

The "least squares" part of the method name refers to the fact that the method minimizes the sum of the squared distances between the data points and the line or curve. This means that it finds the line or curve that best fits the data by minimizing the overall error.

## 4. Can the least squares method be used for any type of data set?

The least squares method can be applied to many kinds of data, provided the chosen model (for example, a straight line) is a reasonable description of the relationship between the x and y values. It is commonly used in regression analysis to determine the relationship between two variables and make predictions based on that relationship.

## 5. Are there any limitations to using the least squares method?

One limitation is that a straight-line least squares fit assumes a linear relationship between the variables being analyzed. If the relationship is actually non-linear, a linear fit may not produce accurate results. The usual uncertainty formulas also assume that the random errors are independent, have mean zero, and have a variance that does not depend on x. Additionally, outliers in the data set can affect the accuracy of the method.
