Data analysis: 2D Surface fitting from raw data

In summary: These provide pre-written and optimized functions for a variety of mathematical tasks such as curve fitting and regression.In summary, the conversation revolves around generating data and fitting a multivariate polynomial to it without using expensive software like Mathematica. The approach discussed involves minimizing the square of the error between observed and predicted values using differentiation and solving linear equations. The use of free computer algebra systems and pre-written functions in python is also suggested.
  • #1
lepatterso
1
0
Hey everybody

I've got a question for the programming savvy. I'm generating data which can generally be described by..

f(x,y) = Ʃn(an * xn)*Ʃn(bn * yn)

Basically what I'm looking at is a data set which shows a surface which looks like about 1/4th of a wonky bowl.

My trouble is that I need to calculate derivatives of f wRT x&y, and do some mathematical magic. To do that, I need to first generate f(x,y) from my raw data. So far, I've always used a canned package like Mathematica or MATLAB to generate that function, but this time I've got to build it from scratch. (Those licenses are expensive!)

Does anyone have any good examples of a fitting routine worked out for a 2d surface? I'm working in python, and am relatively new to no training wheels scientific computing.

Thanks!
 
Technology news on Phys.org
  • #2
If you have some confidence that your model f(x,y) is good then it isn't too difficult to fit a multivariate polynomial to your data without needing Mathematica.

Your goal is to minimize the square of the error between your observed and predicted z values.

To minimize a square you differentiate the square, set equal to zero and solve for the variables.

Suppose a simplified example where I am going to generate some data points {x,y,2x^2+3y+5} and then pretend I haven't seen that 2,3 or 5: {{3, -3, 14}, {-4, -3, 28}, {-3, -5, 8}, {2, 2, 19}, {-5, -5, 40}}

You want to minimize the sum of (zi - (a xi^2+b yi+c))^2.

Your xi,yi and zi are now constants, your variables are a,b,c.

To differentiate (zi - (a xi^2+b yi+c))^2 with respect to a gives 2(zi - (a xi^2+b yi+c))(-xi^2) and similarly for b and c.

Thus you want to solve

{0 == Sum(2 (zi - (a xi^2 + b yi + c)) (-xi^2)),
0 == Sum(2 (zi - (a xi^2 + b yi + c)) (-yi)),
0 == Sum(2 (zi - (a xi^2 + b yi + c)) (-1))}

where you sum over all your {xi,yi,zi}

That partially simplifies to

{0 == -8(19-4 a-2 b-c)-32(28-16 a+3 b-c)-18(14-9 a+3 b-c)-50(40-25 a+5 b-c)-18(8-9 a+5 b-c),
0 == -4(19-4 a-2 b-c)+6(28-16 a+3 b-c)+6 (14-9 a+3 b-c)+10(40-25 a+5 b-c)+10(8-9 a+5 b-c),
0 == -2(19-4 a-2 b-c)-2(28-16 a+3 b-c)-2 (14-9 a+3 b-c)-2(40-25 a+5 b-c)-2(8-9 a+5 b-c)}

which simplifies to

{0 == -3444 + 2118 a - 474 b + 126 c,
0 == 656 - 474 a + 144 b - 28 c,
0 == -218 + 126 a - 28 b + 10 c}

for the given data set.

Notice that these are n linear equations in n unknowns. That makes solving easy.

Solving that gives a==2, b==3, c==5 which were exactly the coefficients I used to generate the data set in the beginning and then waited to see if they would correctly reappear.

Check every detail of this carefully to make certain I have not made any errors.

If you study this for a while and remember some of your calculus and try this on a few examples where you know what the answer should be so you can flush out any errors or misunderstandings then you should be able to do multivariate polynomial regression on your data without needing Mathematica. In the process you may gain a deeper understanding of why regression works.

I urge you to not think that if a cubic polynomial sort of fit the data then a tenth degree polynomial must be better. But this has gone on long enough.

You could later consider trying Reduce, Maxima, Axiom, Sage or any of the other available "free" computer algebra systems. But you pay for those in trying to figure out where to get them, how to install them, how to use them, where to find people who can help answer questions, etc.
 
Last edited:
  • #3
Bill's lesson looks good.

But, if you want to start taking advantage of popular algorithms in python, you may want to familiarize yourself with scipy and other packages.
 

1. What is data analysis and why is it important?

Data analysis is the process of examining, cleaning, and organizing data in order to draw meaningful conclusions and make informed decisions. It is important because it allows us to gain insights and make predictions based on the data, which can be used to improve processes, make better decisions, and drive growth.

2. What is 2D surface fitting in data analysis?

2D surface fitting is a data analysis technique used to find the best-fit curve or surface that represents a set of data points in a two-dimensional space. This is typically done by using mathematical algorithms to minimize the error between the fitted curve or surface and the actual data points.

3. What are the steps involved in 2D surface fitting?

The steps involved in 2D surface fitting include: 1) selecting the appropriate mathematical model for the data, 2) identifying and removing any outliers or erroneous data points, 3) determining the best-fit curve or surface using mathematical algorithms, and 4) evaluating the fit of the curve or surface to the data.

4. What are the common challenges in 2D surface fitting?

Some common challenges in 2D surface fitting include dealing with noisy or incomplete data, choosing the appropriate mathematical model for the data, and determining the best-fit curve or surface with a reasonable level of accuracy. Additionally, overfitting (when the curve or surface fits the data too closely and does not accurately represent the underlying trend) can also be a challenge.

5. How is 2D surface fitting used in real-world applications?

2D surface fitting is used in a variety of real-world applications, such as image and signal processing, computer graphics, financial forecasting, and scientific research. It is also commonly used in engineering and manufacturing to analyze and improve product design and performance. Additionally, it is used in data visualization to create smooth and visually appealing representations of data.

Similar threads

  • STEM Educators and Teaching
Replies
5
Views
647
  • Programming and Computer Science
Replies
2
Views
1K
  • Programming and Computer Science
Replies
4
Views
2K
  • Programming and Computer Science
Replies
15
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
1K
Replies
22
Views
6K
  • Set Theory, Logic, Probability, Statistics
Replies
8
Views
893
  • DIY Projects
Replies
33
Views
2K
  • MATLAB, Maple, Mathematica, LaTeX
Replies
1
Views
1K
  • General Math
Replies
19
Views
1K
Back
Top