Question about creating a regression model

  • Thread starter: celery1
  • Tags: Model, Regression

Discussion Overview

The discussion revolves around creating regression models, specifically focusing on polynomial regression for a plasma physics research project. Participants explore methods for fitting data points to polynomial functions, including linear, quadratic, and higher-order polynomials, while also considering the implications of model selection.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant seeks code and formulas for polynomial regression beyond linear regression, specifically up to sixth order.
  • Another participant explains the relationship between polynomial regression and linear regression, introducing the concept of the Vandermonde matrix and its role in solving for coefficients.
  • A third participant questions the purpose of fitting a polynomial model, emphasizing the importance of context in model selection and the potential risks of fitting data without a specific theoretical framework.
  • The original poster clarifies that they are looking to describe a change in velocity with respect to time using polynomial functions and intends to use the correlation coefficient (r value) to determine the best fit.
  • The original poster expresses uncertainty about how to apply the least squares regression technique using the Vandermonde matrix described by the second participant.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the best approach to polynomial regression, with differing views on the importance of context in model fitting and the implications of using polynomial models without a theoretical basis.

Contextual Notes

There are unresolved questions regarding the application of least squares regression using the Vandermonde matrix, as well as the assumptions underlying the choice of polynomial models for data fitting.

celery1
Hey, so I've started a plasma physics research project, and one of the things I have to do is design a function that approximates a curve from the data points it's fed. So far I've found the formula for linear regression, but I'm having trouble finding the formulas for quadratic and higher-order polynomial regressions.
I know that a calculator can do it, but I can't find source code in any language to compare it to.

More or less what I'm looking for is either code in some language like this one

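import math

def linear_regression(pairs):  # placeholder name; the original def line was lost
    """Fit y = mx + b to (x, y) pairs by least squares; return (m, b, r)."""
    # Accumulate the sums used by the closed-form formulas
    Sx = Sy = Sxx = Sxy = Syy = 0.0
    n = len(pairs)
    for x, y in pairs:
        Sx += x
        Sy += y
        Sxx += x * x
        Sxy += x * y
        Syy += y * y
    # Slope and intercept of the best-fit line
    m = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)
    b = (Sy - m * Sx) / n
    # Pearson correlation coefficient
    r = (n * Sxy - Sx * Sy) / (math.sqrt(n * Sxx - Sx ** 2) *
                               math.sqrt(n * Syy - Sy ** 2))
    print("y = %sx + %s" % (m, b))
    print("r = %s" % r)
    return m, b, r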

Something like that, which I can just translate, except that it handles polynomial functions such as

y = ax^2 + bx + c

(the quadratic case). Or simply the formulas for calculating quadratic regression, cubic regression, and so on. I would like to go as high as sixth order, but if you have anything that would help, please post it.
 
I gather you are looking to solve for the coefficients a_n in

y = \sum_{n=0}^N a_n\,x^n

Note that the equation is linear in the coefficients a_n, so it's still a linear regression. Applying least squares techniques leads to

\begin{bmatrix}
v_0 & v_1 & \cdots & v_N \\
v_1 & v_2 & \cdots & v_{N+1} \\
\vdots & \vdots & \ddots & \vdots \\
v_N & v_{N+1} & \cdots & v_{2N}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix} =
\begin{bmatrix} u_0 \\ u_1 \\ \vdots \\ u_N \end{bmatrix}

The matrix on the left is built from the Vandermonde matrix V, whose entries are V_{ik} = x_i^k: it equals V^T V, so its entries are the power sums

v_k = \sum_{i=1}^M x_i^k

where M is the number of observations and x_i is the i-th observation. The vector on the right is formed by

u_k = \sum_{i=1}^M y_i x_i^k

There are special techniques for solving the above. Google Vandermonde matrix for more.
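
The system above can also be set up and solved directly. The following is a minimal Python sketch, not code from the thread: the function name `polyfit_normal_equations` and the plain Gaussian-elimination solver are assumptions chosen for illustration. It builds the v_k and u_k sums defined above and solves for a_0 through a_N.

```python
def polyfit_normal_equations(pairs, degree):
    """Least-squares polynomial fit via the normal equations.

    pairs  -- list of (x, y) observations
    degree -- polynomial order N
    Returns the coefficients [a_0, a_1, ..., a_N].
    """
    n = degree + 1
    # Power sums v_k = sum(x_i^k) for k = 0 .. 2N
    v = [sum(x ** k for x, _ in pairs) for k in range(2 * degree + 1)]
    # Right-hand side u_k = sum(y_i * x_i^k) for k = 0 .. N
    u = [sum(y * x ** k for x, y in pairs) for k in range(n)]

    # Augmented matrix [A | u] with A[j][k] = v[j + k]
    aug = [[v[j + k] for k in range(n)] + [u[j]] for j in range(n)]

    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("singular system: need more distinct x values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]

    # Back substitution
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (aug[row][n] - tail) / aug[row][row]
    return coeffs


# Points lying exactly on y = 1 + 2x + 3x^2
pairs = [(x, 1 + 2 * x + 3 * x ** 2) for x in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0)]
print(polyfit_normal_equations(pairs, 2))  # coefficients close to [1, 2, 3]
```

For higher orders, such as the sixth order asked about, the power sums span many orders of magnitude and the system becomes ill-conditioned, so a library least-squares routine is generally a safer choice than this hand-rolled solver.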
 
Hey celery1 and welcome to the forums.

Just a question for you that I feel is important: do you want to fit the data to a particular model for a reason, or are you just trying out different polynomial models?

The reason I bring this up is because finding a polynomial that gives the best fit may not be a good idea if you want to explain the data.

If on the other hand you had a particular model in mind for a reason (like for example an inverse-square relationship in a gravitational or electromagnetic experiment) and you wanted to test the fit to that particular model, that is one thing because there is context in this scenario.

If you want to just find the best polynomial that fits, then I would have to ask what you are trying to find out, because fitting data to a polynomial without context is dangerous and might give the wrong conclusion.
 
Pretty much, it's the output from my project, which is essentially a change in velocity with respect to time. So it's just a bunch of (x, y) points, and I want to determine which function fits them best, to describe the data that I'm seeing.
It's the stuff that I posted for the linear regression curve, but expanded to other polynomials, and then I'm thinking of using the r value to pick the best polynomial of them all.
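
The r value from the linear code generalizes to any fitted polynomial as the coefficient of determination, R^2 = 1 - SS_res / SS_tot, which for a straight-line least-squares fit equals r^2. Below is a minimal sketch, assuming the coefficients are already available as a list [a_0, a_1, ...]; the name `r_squared` is hypothetical.

```python
def r_squared(coeffs, pairs):
    """Coefficient of determination for y = sum(coeffs[k] * x**k).

    Assumes the y values are not all identical (otherwise SS_tot is zero).
    """
    mean_y = sum(y for _, y in pairs) / len(pairs)
    ss_res = 0.0  # squared residuals about the fitted polynomial
    ss_tot = 0.0  # squared deviations about the mean of y
    for x, y in pairs:
        predicted = sum(a * x ** k for k, a in enumerate(coeffs))
        ss_res += (y - predicted) ** 2
        ss_tot += (y - mean_y) ** 2
    return 1.0 - ss_res / ss_tot


# Points lying exactly on y = 1 + 2x
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
print(r_squared([1.0, 2.0], pairs))  # exact line
print(r_squared([4.0], pairs))       # constant equal to the mean of y
```

One caveat when using this to pick the degree: for least-squares fits, R^2 never decreases when a higher-order term is added, so comparing raw R^2 alone always favours the highest order tried.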

I would like to use the matrix that DH posted but I don't fully understand how I would get the least squares regression from it.
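
The link between the matrix equation and least squares can be sketched as follows, using the same symbols as DH's post. The name S for the sum of squared residuals is introduced here for illustration.

```latex
% Sum of squared residuals for the polynomial model
S(a_0, \dots, a_N) = \sum_{i=1}^M \Bigl( y_i - \sum_{n=0}^N a_n x_i^n \Bigr)^2

% Setting each partial derivative to zero (k = 0, ..., N)
\frac{\partial S}{\partial a_k}
  = -2 \sum_{i=1}^M x_i^k \Bigl( y_i - \sum_{n=0}^N a_n x_i^n \Bigr) = 0

% Rearranged: row k of the matrix equation
\sum_{n=0}^N a_n \underbrace{\sum_{i=1}^M x_i^{k+n}}_{v_{k+n}}
  = \underbrace{\sum_{i=1}^M y_i x_i^k}_{u_k}

% Quadratic case (N = 2), with v_0 = M
\begin{bmatrix} v_0 & v_1 & v_2 \\ v_1 & v_2 & v_3 \\ v_2 & v_3 & v_4 \end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} =
\begin{bmatrix} u_0 \\ u_1 \\ u_2 \end{bmatrix}
```

Solving this 3-by-3 system gives c = a_0, b = a_1 and a = a_2 for y = ax^2 + bx + c. For N = 1 the same system reduces to the m and b formulas in the linear regression code.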
 
