Question about creating a regression model

  • Thread starter: celery1
  • Tags: Model, Regression
AI Thread Summary
The discussion centers on the need for formulas and code to perform polynomial regression, specifically quadratic and higher-order polynomial regressions, in the context of a plasma physics research project. The user has successfully implemented linear regression but is seeking guidance on how to extend this to polynomial functions, ideally up to sixth order. They express interest in understanding the mathematical foundations, including the use of the Vandermonde matrix and least squares techniques for solving polynomial regression problems.

A key point raised is the importance of context when fitting data to polynomial models. While the user aims to find the best polynomial fit for a dataset representing changes in velocity over time, there is a cautionary note about the potential pitfalls of fitting data without a specific model in mind, as it may lead to misleading conclusions. The user is also looking for practical code examples to facilitate the implementation of these polynomial regression techniques.
celery1
Hey, so I've started a plasma physics research project, and one of the things I have to do is design a function that approximates a curve from the data points it's fed. So far I've found the formula for a linear regression, but I'm having trouble finding the formulas for quadratic and higher-order polynomial regressions.
I know that a calculator can do it, but I can't find source code in any language to compare against.

More or less what I'm looking for is either code in some language like this one

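import math

# The original "def" line was lost from the post; the name linreg is a placeholder.
def linreg(pairs):
    """
    Least-squares straight-line fit to a list of (x, y) pairs:
    y = mx + b
    """
    Sx = Sy = Sxx = Sxy = Syy = 0.0
    n = len(pairs)
    # Accumulate the sums that the closed-form formulas need.
    for x, y in pairs:
        Sx = Sx + x
        Sy = Sy + y
        Sxx = Sxx + x*x
        Sxy = Sxy + x*y
        Syy = Syy + y*y
    # Slope, intercept and correlation coefficient.
    m = ((n * Sxy) - (Sx * Sy)) / ((n * Sxx) - Sx ** 2)
    b = (Sy - (m * Sx)) / n
    r = ((n * Sxy) - (Sx * Sy)) / (math.sqrt((n * Sxx) - (Sx ** 2)) *
                                   math.sqrt((n * Syy) - (Sy ** 2)))
    print("y = %sx + %s" % (m, b))
    print("r = %s" % r)
    return m, b, r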

which I can just translate,

or something which gives the formulas for polynomial functions like

y = ax^2 + bx + c

where this gives the quadratic regression formula, or simply the formulas for calculating quadratic regression, cubic regression and so on. I would like to go as high as sixth order, but if you have anything that would help, please post it.
 
I gather you are looking to solve for the a_n in

y = \sum_{n=0}^N a_n\,x^n

Note the equation is linear in terms of the coefficients a_n: it's still a linear regression. Applying least squares techniques leads to

\begin{bmatrix}
v_0 & v_1 & \cdots & v_N \\
v_1 & v_2 & \cdots & v_{N+1} \\
\vdots & \vdots & \ddots & \vdots \\
v_N & v_{N+1} & \cdots & v_{2N}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix} =
\begin{bmatrix} u_0 \\ u_1 \\ \vdots \\ u_N \end{bmatrix}

The matrix on the left is V^T V, where V is the Vandermonde matrix of the x_i (with entries V_{ik} = x_i^k). Its entries are the power sums

v_k = \sum_{i=1}^M x_i^k

where M is the number of observations and x_i is the ith observation. The vector on the right is formed by

u_k =\sum_{i=1}^M y_ix_i^k

There are special techniques for solving the above. Google Vandermonde matrix for more.
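
As a concrete illustration of the system above, here is a minimal Python sketch that builds the sums v_k and u_k and solves for the coefficients. It is only one way to do it: it uses plain Gaussian elimination with partial pivoting rather than any of the special techniques, and the function name polyfit_normal_equations and its arguments are made up for this example. It sticks to basic loops and lists so it should be easy to translate to another language.

```python
def polyfit_normal_equations(xs, ys, degree):
    """Return [a_0, ..., a_N] minimizing sum_i (y_i - sum_k a_k x_i^k)^2.

    Hypothetical helper for illustration; `degree` is N in the post's notation.
    """
    n = degree + 1  # number of unknown coefficients
    if len(xs) != len(ys) or len(xs) < n:
        raise ValueError("need at least degree + 1 matching (x, y) points")

    # Power sums: v_k for k = 0..2N and u_k for k = 0..N.
    v = [sum(x ** k for x in xs) for k in range(2 * degree + 1)]
    u = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(n)]

    # Augmented matrix [A | u], where A[r][c] = v_{r+c}.
    aug = [[v[r + c] for c in range(n)] + [u[r]] for r in range(n)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("singular system: too few distinct x values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution, from the last coefficient up to a_0.
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (aug[r][n] - s) / aug[r][r]
    return a


# Points generated from y = 1 + 2x + 3x^2, so the fit should come out
# close to [1, 2, 3] (a_0, a_1, a_2) up to rounding.
print(polyfit_normal_equations([0, 1, 2, 3, 4], [1, 6, 17, 34, 57], 2))
```

One caveat for sixth order: the matrix then contains sums of powers up to x^12, so it can be badly conditioned in floating point. Rescaling x to roughly the range -1 to 1 before fitting usually helps.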
 
Hey celery1 and welcome to the forums.

Just a question for you that I feel is important: do you want to fit the data to a particular model for a reason, or are you just fitting the data to different polynomial models?

The reason I bring this up is because finding a polynomial that gives the best fit may not be a good idea if you want to explain the data.

If on the other hand you had a particular model in mind for a reason (like for example an inverse-square relationship in a gravitational or electromagnetic experiment) and you wanted to test the fit to that particular model, that is one thing because there is context in this scenario.

If you want to just find the best polynomial that fits, then I would have to ask what you are trying to find out, because fitting data to a polynomial without context is dangerous and might give the wrong conclusion.
 
Pretty much it's the output from my project, which is essentially a change in velocity with respect to time. So it's just a bunch of (x, y) points, and I want to determine which function fits them best, to describe the data that I'm seeing.
It's the stuff that I posted for the linear regression, but expanded to other polynomials, and then I'm thinking of using the r value to pick the best polynomial of them all.
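
A note on picking by r: for a least-squares polynomial, the usual generalization of r^2 is the coefficient of determination R^2 = 1 - SS_res / SS_tot, and for a straight-line fit it equals r^2. R^2 can never go down when the order is raised, so choosing the highest value will always favor the highest order tried. The sketch below computes R^2 and, as one common correction (an assumption here, not something from this thread), the adjusted R^2 that penalizes extra coefficients. The function names are illustrative, and the coefficients are assumed to be ordered a_0, a_1, ..., a_N.

```python
def evaluate_poly(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_N x^N using Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def r_squared(coeffs, xs, ys):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - evaluate_poly(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(coeffs, xs, ys):
    """R^2 penalized for the number of non-constant terms p."""
    m = len(xs)          # number of observations
    p = len(coeffs) - 1  # polynomial order
    if m - p - 1 <= 0:
        raise ValueError("need more observations than coefficients")
    return 1.0 - (1.0 - r_squared(coeffs, xs, ys)) * (m - 1) / (m - p - 1)


# Same five points fitted by a line (y = -5 + 14x) and by the quadratic
# that generated them (y = 1 + 2x + 3x^2).
xs = [0, 1, 2, 3, 4]
ys = [1, 6, 17, 34, 57]
for coeffs in ([-5.0, 14.0], [1.0, 2.0, 3.0]):
    print(len(coeffs) - 1, r_squared(coeffs, xs, ys),
          adjusted_r_squared(coeffs, xs, ys))
```

Even the adjusted value is only a rough guide. It says nothing about whether a polynomial is a physically meaningful model for the data.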

I would like to use the matrix that DH posted but I don't fully understand how I would get the least squares regression from it.
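
For reference, here is a sketch of where that matrix comes from, using the same v_k and u_k as in DH's post. Least squares means choosing the coefficients that minimize the sum of squared residuals S; setting each partial derivative of S to zero gives one row of the matrix equation. The mapping to a, b, c and to the Sx, Sy, Sxx, Sxy sums from the first post is spelled out at the end.

```latex
% Sum of squared residuals for an order-N polynomial over M observations
S(a_0,\dots,a_N) = \sum_{i=1}^{M} \Bigl( y_i - \sum_{n=0}^{N} a_n x_i^n \Bigr)^2

% Setting the derivative with respect to each a_k to zero
\frac{\partial S}{\partial a_k}
  = -2 \sum_{i=1}^{M} x_i^k \Bigl( y_i - \sum_{n=0}^{N} a_n x_i^n \Bigr) = 0
\quad\Longrightarrow\quad
\sum_{n=0}^{N} v_{n+k}\, a_n = u_k , \qquad k = 0,\dots,N

% Quadratic case (N = 2), with v_0 = M and y = a_2 x^2 + a_1 x + a_0,
% i.e. a = a_2, b = a_1, c = a_0
\begin{bmatrix}
M   & v_1 & v_2 \\
v_1 & v_2 & v_3 \\
v_2 & v_3 & v_4
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} =
\begin{bmatrix} u_0 \\ u_1 \\ u_2 \end{bmatrix}

% Linear case (N = 1) recovers the formulas in the first post,
% with S_x = v_1, S_y = u_0, S_{xx} = v_2, S_{xy} = u_1
a_1 = \frac{M u_1 - v_1 u_0}{M v_2 - v_1^2}, \qquad
a_0 = \frac{u_0 - a_1 v_1}{M}
```

So the linear formulas are just the 2x2 version of the same system; for higher orders the system is usually solved numerically rather than written out in closed form.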
 