The line of best fit. Need help?

  • Thread starter sutupidmath
  • Tags
    Fit Line
In summary, the line of best fit can be determined using the equation y=mx+b, where m is the slope and b is the y-intercept. The expressions for m and b are SS(xy)/SS(x) and [SUM(y)-m*SUM(x)]/n, respectively. The proof for these expressions involves using calculus and minimizing the sum of squared errors.
  • #1
sutupidmath
Well, I am taking a first course in elementary statistics, so we do not actually prove anything at all. So I was wondering: how does one determine the line of best fit?
y=mx+b, where m is the slope of the line of best fit and b is the y-intercept.
I know that the slope is equal to

m=SS(xy)/SS(x), where SS(x) is the sum of the squared deviations of x from its mean, while SS(xy) is the sum of the products of the deviations of x and y from their means. Also,

b=[SUM(y)-m*SUM(x)]/n, where n is the number of data points, but I have no idea how one would come up with these expressions.
Can somebody show a proof for this, or just point me in the right direction?
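
For anyone who wants to check these formulas numerically, here is a minimal sketch in Python. The function name `best_fit_line` and the five data points are made up for illustration; only the formulas for m and b come from the post above.

```python
def best_fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # SS(x): sum of squared deviations of x from its mean
    ss_x = sum((x - x_mean) ** 2 for x in xs)
    # SS(xy): sum of products of the x and y deviations from their means
    ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if ss_x == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    m = ss_xy / ss_x
    b = (sum(ys) - m * sum(xs)) / n
    return m, b


# Hypothetical data, chosen only to keep the arithmetic easy to follow
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = best_fit_line(xs, ys)
print(f"slope m = {m:.3f}, intercept b = {b:.3f}")
```

For this data SS(x) = 10 and SS(xy) = 6, so the slope is 6/10 and the intercept is (20 - 0.6*15)/5.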
 
  • #2
The proof requires a knowledge of elementary calculus. The basic idea is to assume a straight-line fit with m and b unknown. Set up the expression for the sum of the squares of the vertical distances of the points from the line, then find the m and b that minimize that expression.
 
  • #3
Let μ and β be the least squares estimators of m and b in y = mx + b.

The estimated equation is then y[t] = β + μ x[t] + u[t] where u is the residual error and t indexes "data row" (e.g., observation).

u[t] = y[t] - (β + μ x[t])

u[t]^2 = (y[t] - (β + μ x[t]))^2

Sum over t:
Σt u[t]^2 = Σt (y[t] - (β + μ x[t]))^2

Now minimize with respect to μ and β by differentiating Σt u[t]^2 with respect to μ and β separately, setting each derivative to zero, then solving for μ and β that satisfy these two equations simultaneously:

∂Σt u[t]^2/∂μ = ∂Σt u[t]^2/∂β = 0.
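
Carrying out those two derivatives is not done in the post itself, so the following is a sketch of the remaining algebra in the same notation, writing x_t for x[t], n for the number of observations, and x̄, ȳ for the sample means (these last three symbols are introduced here, not in the post):

```latex
\begin{align*}
\frac{\partial}{\partial\beta}\sum_t u_t^2
  &= -2\sum_t \bigl(y_t - \beta - \mu x_t\bigr) = 0
  &&\Rightarrow\quad
  \beta = \frac{\sum_t y_t - \mu\sum_t x_t}{n}, \\
\frac{\partial}{\partial\mu}\sum_t u_t^2
  &= -2\sum_t x_t\bigl(y_t - \beta - \mu x_t\bigr) = 0
  &&\Rightarrow\quad
  \sum_t x_t y_t = \beta\sum_t x_t + \mu\sum_t x_t^2 .
\end{align*}
% Substitute the expression for beta into the second equation and solve for mu:
\begin{align*}
\mu
  = \frac{\sum_t x_t y_t - \frac{1}{n}\sum_t x_t \sum_t y_t}
         {\sum_t x_t^2 - \frac{1}{n}\bigl(\sum_t x_t\bigr)^2}
  = \frac{\sum_t (x_t - \bar{x})(y_t - \bar{y})}{\sum_t (x_t - \bar{x})^2}
  = \frac{SS(xy)}{SS(x)} .
\end{align*}
```

The first line reproduces b=[SUM(y)-m*SUM(x)]/n from post #1, and the last line reproduces m=SS(xy)/SS(x).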
 
  • #4
EnumaElish said:
Let μ and β be the least squares estimators of m and b in y = mx + b. [...]

Thank you for your replies! I totally forgot to let you know that I had managed to get to the result I wanted. My approach was almost identical to what you did, apart from the notation, and I got to the result!

Thanks, both of you!
 

Related to The line of best fit. Need help?

1. What is the line of best fit?

The line of best fit is a straight line that represents the relationship between two variables in a scatter plot. It is determined by calculating the slope and intercept of the line that minimizes the sum of the squared vertical distances between the line and the data points.

2. Why is the line of best fit important?

The line of best fit is important because it helps us to visualize and understand the relationship between two variables in a scatter plot. It can also be used to make predictions and analyze trends in the data.

3. How is the line of best fit calculated?

The line of best fit is calculated using a method called linear regression (least squares). This involves finding the slope and intercept of the line that minimize the sum of the squared vertical distances (residuals) between the line and the data points.
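
As an informal check of that minimizing property, the sketch below computes the sum of squared errors for the least-squares line and for a few nearby lines. The helper name `sse`, the data, and the perturbation size of 0.1 are all assumptions chosen for illustration.

```python
def sse(m, b, xs, ys):
    """Sum of squared vertical distances between the points and y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares slope and intercept from the closed-form expressions
n = len(xs)
x_mean, y_mean = sum(xs) / n, sum(ys) / n
m_hat = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
b_hat = (sum(ys) - m_hat * sum(xs)) / n

best = sse(m_hat, b_hat, xs, ys)

# Nudge the slope and/or intercept by 0.1 in each direction and recompute
steps = (-0.1, 0.0, 0.1)
nearby = [sse(m_hat + dm, b_hat + db, xs, ys)
          for dm in steps for db in steps
          if (dm, db) != (0.0, 0.0)]

all_worse = all(s > best for s in nearby)
print(f"SSE at the fitted line: {best:.4f}")
print(f"every nearby line has a larger SSE: {all_worse}")
```

This only samples eight neighbouring lines, so it illustrates the claim rather than proving it; the proof is the calculus argument.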

4. Can the line of best fit be used for any type of data?

The line of best fit is most commonly used for linear data, where the relationship between the two variables can be represented by a straight line. For non-linear data, a straight line can sometimes still be fitted after transforming the data (for example, by taking logarithms); otherwise a non-linear regression method is needed.
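
One common transformation is taking logarithms: if y grows roughly like a*exp(k*x), then ln(y) is linear in x, and a line can be fitted to the points (x, ln y). The sketch below assumes that exponential model and uses synthetic, noise-free data generated with a = 2 and k = 0.5; the function name `fit_line` is hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept of y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    ss_x = sum((x - x_mean) ** 2 for x in xs)
    ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = ss_xy / ss_x
    # Same intercept as (SUM(y) - m*SUM(x)) / n, written with the means
    return m, y_mean - m * x_mean


# Synthetic data that follows y = 2 * exp(0.5 * x) exactly (an assumption)
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Fit a straight line to (x, ln y): ln y = ln a + k * x
k, log_a = fit_line(xs, [math.log(y) for y in ys])
a = math.exp(log_a)
print(f"k = {k:.4f}, a = {a:.4f}")
```

With real, noisy data the recovered a and k minimize the squared error in ln(y) rather than in y itself, which is a different criterion from fitting the curve directly.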

5. What do the slope and intercept of the line of best fit tell us?

The slope of the line of best fit is the average change in the dependent variable for a one-unit increase in the independent variable, while the intercept is the predicted value of the dependent variable when the independent variable is equal to zero. These values can help us to interpret the relationship between the variables and make predictions based on the data.
