Linear Least-Squares Regression: a_0 = ?

In summary: Adding more (0,0) points to the regression pulls the fitted intercept toward a0 = 0, but it also changes the slope and worsens the fit to the measured points. So it's up to you to decide whether matching the known intercept is more important than the overall fit of the line.
  • #1
s3a

Homework Statement


Hello to everyone that's reading this. :)

For this linear least-squares regression problem (typed below and also available as a PDF), I correctly found the value of g, which is what the problem statement asks for, but I was curious about the value of ##a_0##, and that is what this entire thread is about.

Problem statement (Alternatively, one can view this PDF: http://docdro.id/GmeGXNr):
"To measure g (the acceleration due to gravity) the following experiment is carried out. A ball is dropped from the top of a 30-m-tall building. As the object is falling down, its speed v is measured at various heights by sensors that are attached to the building. The data measured in the experiment is given in the table.

x (m), v (m/s)
0, 0
5, 9.85
10, 14.32
15, 17.63
20, 19.34
25, 22.41

In terms of the coordinates shown in the figure (positive down), the speed of the ball v as a function of the distance x is given by ##v^2 = 2gx##. Using linear regression, determine the experimental value of g."

Homework Equations


##a_1 = \frac{n S_{xy} - S_x S_y}{n S_{xx} - (S_x)^2}##
##a_0 = \frac{S_{xx} S_y - S_{xy} S_x}{n S_{xx} - (S_x)^2}##
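
For reference, here is a short sketch of where these two formulas come from. It assumes the usual definitions ##S_x = \sum x_i##, ##S_y = \sum y_i##, ##S_{xy} = \sum x_i y_i## and ##S_{xx} = \sum x_i^2##, which match the sums computed in the script further down. The function ##E## is just a label for the sum of squared residuals and does not appear in the original problem.

```latex
\begin{align}
E(a_0, a_1) &= \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2 \\
\frac{\partial E}{\partial a_0} = 0 &\;\Rightarrow\; n\,a_0 + S_x\,a_1 = S_y \\
\frac{\partial E}{\partial a_1} = 0 &\;\Rightarrow\; S_x\,a_0 + S_{xx}\,a_1 = S_{xy}
\end{align}
```

Solving this 2×2 system for ##a_1## and ##a_0## gives the two formulas above. Nothing in the minimization constrains ##a_0## to be zero, so the fit is free to return a nonzero intercept.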

The Attempt at a Solution


The solution in the PDF:
"The equation v^2 = 2gx can be transformed into linear form by setting Y = v^2. The resulting equation, Y = 2gx, is linear in Y and x with m = 2g and b = 0. Therefore, once m is determined, g can be calculated using g = m/2. The calculations are done by executing the following MATLAB program (script file):
Code:
clear all; clc;
x=[0 5 10 15 20 25];
y=[0 9.85 14.32 17.63 19.34 22.41];
Y=y.^2;
X=x;
% Equation 5-13
SX=sum(X);
SY=sum(Y);
SXY=sum(X.*Y);
SXX=sum(X.*X);
% Equation 5-14
n=length(X);
a1=(n*SXY-SX*SY)/(n*SXX-SX^2)
a0=(SXX*SY-SXY*SX)/(n*SXX-SX^2)
m=a1
b=a0
g=m/2

When the program is executed, the following values are displayed in the Command Window:
a1 = 19.7019
a0 = 1.9170
m = 19.7019
b = 1.9170
g = 9.8510

Thus, the measured value of g is 9.8510 m/s^2."

Basically, what I'd like to know is:
Should the value of ##a_0## be 0 or 1.9170380952380952381? Which "wins": the ##a_0 = \frac{S_{xx} S_y - S_{xy} S_x}{n S_{xx} - (S_x)^2}## formula or the zero term in ##v^2 = 2gx + 0##?

Any input would be greatly appreciated!
 
  • #2
We know from what a0 represents (the value of v^2 at x = 0, where the ball is dropped) that it must be 0. I can't immediately think of a nice way to force that, but I tried adding more (0,0) points into the regression. That should do more to force the regression toward a0 = 0 without invalidating the other data points. Pulling the regression line toward the (0,0) point should also give it a steeper slope, so that it still passes through the other data points. With 10 such (0,0) points added, the regression gave a0 = 0.3073 and m = 19.7897. With 20 such (0,0) points added, the regression gave a0 = 0.1598 and m = 19.7978.
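
The padding experiment described above can be sketched as follows. This is a Python re-implementation rather than the MATLAB used in the PDF, and the helper names `fit_line` and `fit_with_extra_zeros` are made up for illustration. It reuses the summation formulas from the problem and simply appends k extra (0, 0) points before fitting.

```python
# Data from the problem statement: height x (m) and measured speed v (m/s).
x = [0, 5, 10, 15, 20, 25]
v = [0, 9.85, 14.32, 17.63, 19.34, 22.41]
Y = [vi ** 2 for vi in v]  # linearize: Y = v^2 = 2 g x


def fit_line(xs, ys):
    """Return (a0, a1) using the summation formulas quoted in the thread."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(xi * yi for xi, yi in zip(xs, ys))
    sxx = sum(xi * xi for xi in xs)
    denom = n * sxx - sx ** 2
    a1 = (n * sxy - sx * sy) / denom
    a0 = (sxx * sy - sxy * sx) / denom
    return a0, a1


def fit_with_extra_zeros(k):
    """Refit after appending k hypothetical extra (0, 0) points."""
    return fit_line(x + [0] * k, Y + [0.0] * k)


for k in (0, 10, 100):
    a0, a1 = fit_with_extra_zeros(k)
    print(f"k={k:3d}  a0={a0:.4f}  m={a1:.4f}  g={a1 / 2:.4f}")
```

With k = 0 this reproduces the original a0 = 1.9170 and m = 19.7019, and with k = 10 it reproduces a0 = 0.3073 and m = 19.7897. Larger k pushes a0 further toward zero while the slope keeps creeping upward.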

Forcing a0=0 is bringing the value of g farther from the standard value of 9.807. That suggests the possibility that the ball was actually thrown down with some velocity like your original a0 = 1.9170, and that the original (0,0) data point is wrong. In that case, your original results would be more legitimate.
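
One standard way to force the intercept to exactly zero is regression through the origin: fit Y = m x with m as the only free parameter. Minimizing the sum of (Y_i - m x_i)^2 over m gives m = Sxy / Sxx. Nobody in this thread computed it this way, so treat the following Python snippet as an illustrative sketch; the names `m_origin` and `g_origin` are made up here.

```python
# Data from the problem statement: height x (m) and measured speed v (m/s).
x = [0, 5, 10, 15, 20, 25]
v = [0, 9.85, 14.32, 17.63, 19.34, 22.41]
Y = [vi ** 2 for vi in v]  # linearize: Y = v^2 = 2 g x

# One-parameter least squares for Y = m * x (intercept fixed at zero):
# d/dm sum((Y_i - m * x_i)^2) = 0  gives  m = Sxy / Sxx.
sxy = sum(xi * yi for xi, yi in zip(x, Y))
sxx = sum(xi * xi for xi in x)

m_origin = sxy / sxx
g_origin = m_origin / 2

print(f"m = {m_origin:.4f}, g = {g_origin:.4f} m/s^2")
```

This is also the limit that the added-(0,0)-points approach tends toward as more points are added. The resulting g is larger than the 9.8510 from the unconstrained fit, which is consistent with the observation that forcing a0 = 0 moves g away from 9.807.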

Gravity does change from place to place, so it is hard to say what is right. I don't know how much it might change.
 
  • #3
Okay, so it now makes sense to me why the question doesn't ask for the line as a whole; it only asks for linear regression to be used as an indirect way to find g. If we change the slope (for example, by adding a lot of (0,0) points), we increase the overall error of the line with respect to the measured data. It seems to me that the reliability of the line, in general, is more important than forcing ##a_0## to equal 0, especially since we already know with 100% accuracy that the square of the velocity is 0 when the distance traveled is 0. We also want ##a_1## to be as accurate as possible so that we get a value for g that's as accurate as possible, and ##a_1## is most accurate when ##a_0## is ≈ 1.92 instead of 0.

Please do let me know if anything I just said is wrong, but I'm optimistic about it all being correct.

Thanks for your input! :)
 
  • #4
In an ideal world, a0 = 0. Since the data doesn't exactly support that, you have to use your best judgment.
 

FAQ: Linear Least-Squares Regression: a_0 = ?

1. What is the definition of "Linear Least-Squares Regression"?

Linear Least-Squares Regression is a statistical method used to find the best-fitting line for a set of data points. It calculates the line that minimizes the sum of the squared vertical distances between the data points and the line.

2. How is the y-intercept, or a0, calculated in Linear Least-Squares Regression?

The y-intercept, or a0, is calculated by finding the point where the regression line intersects the y-axis. It is determined by the formula ##a_0 = \bar{y} - a_1 \bar{x}##, where ##\bar{y}## is the mean of the dependent variable and ##\bar{x}## is the mean of the independent variable.
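
As a quick check that the mean-based formula agrees with the summation formula used in the thread, here is a small Python sketch on the thread's own data. The variable names are chosen for illustration, and the slope is computed in its centered form, which is algebraically equivalent to the summation formula for a_1.

```python
# Data from the problem statement: height x (m) and measured speed v (m/s).
x = [0, 5, 10, 15, 20, 25]
v = [0, 9.85, 14.32, 17.63, 19.34, 22.41]
Y = [vi ** 2 for vi in v]  # linearize: Y = v^2 = 2 g x

n = len(x)
x_bar = sum(x) / n
y_bar = sum(Y) / n

# Slope from centered sums: a1 = sum((x - x_bar)(Y - y_bar)) / sum((x - x_bar)^2)
a1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, Y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)

# Intercept from the means: the fitted line always passes through (x_bar, y_bar).
a0 = y_bar - a1 * x_bar

print(f"a1 = {a1:.4f}, a0 = {a0:.4f}")
```

Both values match the a1 = 19.7019 and a0 = 1.9170 reported by the MATLAB script.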

3. What is the significance of the y-intercept, or a0, in Linear Least-Squares Regression?

The y-intercept, or a0, represents the starting point of the regression line on the y-axis. It can provide insight into the relationship between the dependent and independent variables and can be used to make predictions about the dependent variable when the independent variable is 0.

4. Can the y-intercept, or a0, ever be a negative value in Linear Least-Squares Regression?

Yes, the y-intercept, or a0, can be a negative value in Linear Least-Squares Regression. This means only that the regression line intersects the y-axis below 0; it says nothing about the slope, which can be positive or negative.

5. How can the accuracy of the y-intercept, or a0, be determined in Linear Least-Squares Regression?

The accuracy of the y-intercept, or a0, can be assessed through its standard error, which is derived from the standard error of the estimate. The standard error of the estimate measures the typical distance between the actual data points and the predicted values from the regression line. A smaller standard error of the intercept indicates a more precisely determined y-intercept.
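
To make this concrete for the data in this thread, the following Python sketch computes the standard error of the estimate and the standard error of the intercept. It assumes the textbook formulas for simple linear regression with n - 2 degrees of freedom, and the names `s_est` and `se_a0` are made up for illustration.

```python
import math

# Data from the problem statement: height x (m) and measured speed v (m/s).
x = [0, 5, 10, 15, 20, 25]
v = [0, 9.85, 14.32, 17.63, 19.34, 22.41]
Y = [vi ** 2 for vi in v]  # linearize: Y = v^2 = 2 g x

n = len(x)
x_bar = sum(x) / n
y_bar = sum(Y) / n
sxx_centered = sum((xi - x_bar) ** 2 for xi in x)

# Ordinary least-squares slope and intercept.
a1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, Y)) / sxx_centered
a0 = y_bar - a1 * x_bar

# Standard error of the estimate: sqrt(SSE / (n - 2)).
sse = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, Y))
s_est = math.sqrt(sse / (n - 2))

# Standard error of the intercept: s * sqrt(sum(x^2) / (n * sum((x - x_bar)^2))).
se_a0 = s_est * math.sqrt(sum(xi * xi for xi in x) / (n * sxx_centered))

print(f"a0 = {a0:.3f}, s_est = {s_est:.3f}, se_a0 = {se_a0:.3f}")
```

In this sketch the fitted intercept comes out much smaller than its own standard error, which suggests the data cannot distinguish a0 = 1.92 from a0 = 0.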
