Least Squares Method to fit a line to a set of data points

SUMMARY

The discussion focuses on the least squares method for fitting a line of the form y = a_0 + a_1x to a set of data points (x_i, y_i). The fit minimizes the total squared error E = ∑(y_i - a_0 - a_1x_i)². Beyond finding the coefficients a_0 (intercept) and a_1 (slope), participants discuss how to derive the uncertainties of these coefficients, which requires assumptions about the residuals, specifically normality and homoscedasticity. Links to earlier threads are provided for further reading on calculating uncertainties in linear regression.

PREREQUISITES
  • Understanding of linear regression concepts
  • Familiarity with error minimization techniques
  • Knowledge of statistical terms such as homoscedasticity and normality
  • Basic proficiency in calculus for deriving equations
NEXT STEPS
  • Study the derivation of uncertainties for linear regression coefficients a_0 and a_1
  • Learn about the implications of homoscedasticity in regression analysis
  • Explore the application of the Least Squares Method in Python using libraries like NumPy or SciPy
  • Investigate the differences between homoscedasticity and heteroscedasticity in statistical modeling
USEFUL FOR

Students in introductory physics courses, data analysts, statisticians, and anyone involved in linear regression analysis seeking to understand error estimation and coefficient uncertainties.

MatinSAR
Homework Statement
Given a set of data points, we aim to fit a line to these points by minimizing the total error and finding the coefficients ##a_0## (intercept) and ##a_1## (slope).

Each of these data points has an associated error. Derive the expressions that give the errors (uncertainties) of ##a_0## and ##a_1##.
Relevant Equations
I'll mention the relevant equations later in my solution.
We are given a set of points ##(x_i, y_i)##. To fit a line of the form ##y=a_0+a_1x## to these points, we choose the coefficients that minimize the total squared error ##E##:$$E = \sum_{i=1}^n (y_i - a_0 - a_1x_i)^2$$So we set ##\frac{\partial E}{\partial a_0} = 0## and ##\frac{\partial E}{\partial a_1} = 0## and solve the resulting system of equations. Then we get:

[Two attached images, 1734639701853.png and 1734639731835.png, are not reproduced here.]
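
Since the attachments are not available as text, here is the usual textbook result, offered as a sketch and not as a transcription of the images. Setting the two partial derivatives to zero gives the normal equations:
$$n\,a_0 + a_1\sum_{i=1}^n x_i = \sum_{i=1}^n y_i, \qquad a_0\sum_{i=1}^n x_i + a_1\sum_{i=1}^n x_i^2 = \sum_{i=1}^n x_i y_i$$
Solving this pair for the two unknowns, with the shorthand ##\Delta = n\sum x_i^2 - \left(\sum x_i\right)^2## (a symbol introduced here for convenience), gives:
$$a_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\Delta}, \qquad a_0 = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}$$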


My problem: I have no idea how to start from the errors of the data points to find the uncertainties of ##a_0## and ##a_1##.
 
I believe you need to make some assumptions on the residuals (IIRC, normality and homostadicity) in order to find the distribution of the slope and intercept. Under "reasonable" conditions, both estimates are normally distributed and converge to the true values.
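
A minimal sketch of what those assumptions buy you, in Python with NumPy. It assumes the ##x_i## are exact and the errors in ##y_i## are independent with one common variance, estimated from the residuals with ##n-2## degrees of freedom. Under those assumptions the usual results are ##\sigma_{a_1}^2 = s^2 / \sum (x_i-\bar{x})^2## and ##\sigma_{a_0}^2 = s^2\left(1/n + \bar{x}^2 / \sum (x_i-\bar{x})^2\right)##. The function name `fit_line` is made up for this illustration.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares fit of y = a0 + a1*x.

    Returns (a0, a1, se_a0, se_a1). The standard errors assume exact x
    and independent y errors sharing one variance (homoscedasticity).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 points to estimate the residual variance")

    x_mean = x.mean()
    y_mean = y.mean()
    sxx = np.sum((x - x_mean) ** 2)  # spread of the x values

    # Slope and intercept from the normal equations (centered form)
    a1 = np.sum((x - x_mean) * (y - y_mean)) / sxx
    a0 = y_mean - a1 * x_mean

    # Residual variance with n - 2 degrees of freedom (two fitted parameters)
    residuals = y - (a0 + a1 * x)
    s2 = np.sum(residuals ** 2) / (n - 2)

    se_a1 = np.sqrt(s2 / sxx)
    se_a0 = np.sqrt(s2 * (1.0 / n + x_mean ** 2 / sxx))
    return a0, a1, se_a0, se_a1

if __name__ == "__main__":
    a0, a1, se_a0, se_a1 = fit_line([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8])
    print(f"a0 = {a0:.3f} +/- {se_a0:.3f}")
    print(f"a1 = {a1:.3f} +/- {se_a1:.3f}")
```

If the measurement uncertainty of the ##y_i## is already known, it can replace the residual-based estimate `s2`.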
 
WWGD said:
homostadicity
homoschedasticity ?
 
haruspex said:
homoschedasticity ?
Something like that.
 
According to this Wikipedia article,
In statistics, a sequence of random variables is homoscedastic (/ˌhoʊmoʊskəˈdæstɪk/) if all its random variables have the same finite variance; this is also known as homogeneity of variance. The complementary notion is called heteroscedasticity, also known as heterogeneity of variance. The spellings homoskedasticity and heteroskedasticity are also frequently used. “Skedasticity” comes from the Ancient Greek word “skedánnymi”, meaning “to scatter”.
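
When the data are heteroscedastic, i.e. each ##y_i## comes with its own known uncertainty ##\sigma_i##, a common approach is a weighted fit with weights ##w_i = 1/\sigma_i^2##. The sketch below assumes exact ##x_i## and independent errors. The function name `weighted_fit_line` and the `sigma` argument are hypothetical names chosen for this illustration.

```python
import numpy as np

def weighted_fit_line(x, y, sigma):
    """Weighted least-squares fit of y = a0 + a1*x.

    sigma holds the known standard deviation of each y value.
    Returns (a0, a1, sigma_a0, sigma_a1), assuming exact x and
    independent errors in y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2  # weights w_i = 1 / sigma_i^2

    # Weighted sums that appear in the normal equations
    s = np.sum(w)
    sx = np.sum(w * x)
    sy = np.sum(w * y)
    sxx = np.sum(w * x * x)
    sxy = np.sum(w * x * y)
    delta = s * sxx - sx ** 2

    a0 = (sxx * sy - sx * sxy) / delta
    a1 = (s * sxy - sx * sy) / delta

    # Propagating the sigma_i through the linear expressions for a0 and a1
    sigma_a0 = np.sqrt(sxx / delta)
    sigma_a1 = np.sqrt(s / delta)
    return a0, a1, sigma_a0, sigma_a1

if __name__ == "__main__":
    print(weighted_fit_line([0, 1, 2], [1, 3, 5], [1, 1, 1]))
```

With equal ##\sigma_i## this reduces to the unweighted fit, and doubling every ##\sigma_i## doubles both coefficient uncertainties while leaving ##a_0## and ##a_1## unchanged.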
 
MatinSAR said:
Homework Statement: Given a set of data points, we aim to fit a line to these points by minimizing the total error and finding the coefficients ##a_0## (intercept) and ##a_1## (slope).

Each of these data points has an associated error. Derive the expressions that give the errors (uncertainties) of ##a_0## and ##a_1##.
...
My problem: I have no idea how to start from the errors of the data points to find the uncertainties of ##a_0## and ##a_1##.

Here are a few links in an old thread (with a link to an even older thread, etc. etc. -- sigh -- turtles all the way down).

I especially recommend Kirchner.
 
BvU said:
Here are a few links in an old thread (with a link to an even older thread, etc. etc. -- sigh -- turtles all the way down).

I especially recommend Kirchner.
Thank you for providing the links. Of course, I'll eventually find a post that doesn't link to an older one.
 
Why is this thread placed in the Introductory Physics Homework Help?
 
Gavran said:
Why is this thread placed in the Introductory Physics Homework Help?
Because students in introductory physics courses with a lab component are sometimes required to use linear regression to analyze their data?
 