Finding periodic best-fit equation for data set?

In summary: B, U, V, D, E and requires the use of non-linear regression. The method of the paper "Régressions et équations intégrales" explains how to proceed, but it is not a plug-in for Excel. This method is based on numerical integration, so I am not sure that it is compatible with "Solver". Furthermore, non-linear regression requires good estimates of the starting values of the unknowns. Regards.

In summary, the conversation revolves around finding a simple trigonometric function that can accurately represent a given data set. The person asking for help has some basic knowledge in
  • #1
DyslexicHobo
Hello,

I have a data set that follows an equation similar to sin(x)+x. Just from eyeballing the data, it seems like there should be a pretty simple trigonometric function A*sin(B*x)+C*x. I went to school for engineering so I have some basic/intermediate knowledge of mathematics but it's been a while since I've applied it. Is there any software (freeware or Excel) that I can use? How about any cool math tricks?

Thanks!
 
  • #2
Just an update with an idea I had... I set up an Excel document. The first four columns are the following: Y values from data, Y = A*sin(B*x)+C*x+D, X values from data, and (B-A)^2, where B and A here are the spreadsheet columns (model minus data, squared), not the fit parameters. I plotted the first two columns on the same graph and am attempting to eyeball approximate values for A, B, C, and D. Then I'll write a macro to minimize r^2 (the squared residuals in the fourth column) by adjusting A, B, C, and D.

Will this work?
 
  • #3
Hi!
The methods used to solve this kind of problem are called "non-linear regression".
Trigonometric functions mixed with other usual functions are known to be difficult to fit in practice.
How difficult depends a lot on the experimental data (the number of measurements, whether they are spread over few or many periods, etc.).
Since I am about to leave for several days, I will let someone else advise you on the available statistical software. I suppose that people with experience in the subject will give you the information.
Nevertheless, if you have not found a method that gives good results with your data by the time I come back, I suggest trying another method. The general principle is explained in the paper "Régressions et équations intégrales":
http://www.scribd.com/JJacquelin/documents
(Pages 25-36: "Régression sinusoïdale")
There is no real need to read the paper, which is not translated yet. Moreover, the function considered there is A+B*sin(W*x)+C*cos(W*x), so some modifications have to be made in order to apply it to the function A*sin(B*x)+C*x. Then, if you still need it, I could write the algorithm corresponding to your function. It's rather simple because it is a straightforward computation (i.e. non-iterative).
 
  • #4
Jacquelin,

Thank you for your response. I've actually just found a reasonable method using these steps (which I came up with myself, so maybe it is not the most accurate method):
1. Created 4 columns in Excel
i. Y values from data.
ii. Y = A*sin(B*x + C)+D*x+E
iii. X values from data
iv. (B-A)^2, i.e. (column ii minus column i)^2, the squared residual r^2

2. Used an educated guess for the values of A, B, C, D, and E
i. A: I wasn't sure how to guess the amplitude, so I left this for last and just eyeballed different values
ii. B: I know the period is 365 days and my unit for x is days, so I used 2π/365
iii. C: Again, I wasn't sure how to estimate the phase shift, so I just eyeballed this after the other constants were approximated
iv. D: I used Excel's linear line of best fit for my actual data set and plugged in the slope value
v. E: Same as above; used Excel's linear line of best fit for my actual data set and plugged in the y-intercept value

3. Created another cell that was equal to the average of all the r^2 values

4. Used Excel's "Solver" add-in to find the minimum value of the cell in step 3 by changing the values of A, B, C, D, and E.
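The spreadsheet steps above can be sketched in Python. This is only an illustration under assumptions: the data are synthetic (the thread's real data set was never posted), and the names `model`, `mean_squared_residual` and `trendline` are made up for this sketch. It reproduces the model column, the squared-residual column, the averaged cell from step 3, and the trendline-based starting guesses for D and E. The minimization that Solver performs in step 4 is not reproduced here.

```python
import math

def model(x, A, B, C, D, E):
    """Column ii of the spreadsheet: the candidate curve."""
    return A * math.sin(B * x + C) + D * x + E

def mean_squared_residual(params, xs, ys):
    """Step 3: average of the squared residuals (column iv)."""
    A, B, C, D, E = params
    squared = [(model(x, A, B, C, D, E) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared) / len(squared)

def trendline(xs, ys):
    """Slope and intercept of the ordinary least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic daily data over two years (assumed values, for illustration only).
true_params = (3.0, 2 * math.pi / 365, 0.7, 0.01, 2.0)
xs = list(range(730))
ys = [model(x, *true_params) for x in xs]

# Step 2: starting guesses. A and C are placeholders to be eyeballed.
D0, E0 = trendline(xs, ys)
start_params = (1.0, 2 * math.pi / 365, 0.0, D0, E0)

print("objective at starting guess:", mean_squared_residual(start_params, xs, ys))
print("objective at true values:   ", mean_squared_residual(true_params, xs, ys))
```

Any general-purpose minimizer could then take the place of Solver by adjusting the five parameters to reduce `mean_squared_residual`. As with Solver, the result may depend on how good the starting guess is.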

Although this line of best fit doesn't actually fit quite as well as I'd hoped, I'm very excited I was able to figure this out on my own. It feels like solving a difficult homework problem. It's not a good enough equation to use as a predictor of future behavior like I was hoping. I'm fairly certain this is just because of the unpredictable nature of my data and not actually my method of finding an equation. I'm certainly open to suggestions for improvements to my method, though.

I've attached an image of my Excel plot for reference. The red is my equation and the blue is my data set.
 

Attachment: untitled.bmp
  • #5
Hi DyslexicHobo!

Sorry for my late answer. I have been away for a week.
I am surprised by the result shown in your image: the red and blue curves are rather far from each other.
Generally the fit is much better. What is going wrong? I cannot say without knowing the data set more precisely.
The blue curve doesn't give the numerical values (especially the number of points and how they are distributed; also, the scatter is hidden by the width of the line). Moreover, I cannot accurately convert a curve to numerical data.
To go further, I suggest attaching a file with the data set in numerical form.
I could test it using available software. I am confident that the fit would be good. Then I could explain how to proceed by yourself.
 
  • #6
This looks like an application for a DFT.
Knowing the number of points available over how many years would be helpful.
Seeing the raw numerical data would be good.
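As a rough illustration of the DFT suggestion, the sketch below removes a straight-line trend and then looks for the strongest frequency bin. It assumes evenly spaced samples and uses synthetic data, since the real data set was not posted. The function names `detrend` and `dominant_period` are invented for this example. The DFT is written out directly with the standard library, so it is slow (quadratic in the number of points) but has no dependencies.

```python
import cmath
import math

def detrend(ys):
    """Subtract the least-squares straight line fitted against the sample index."""
    n = len(ys)
    mean_t = (n - 1) / 2
    mean_y = sum(ys) / n
    stt = sum((t - mean_t) ** 2 for t in range(n))
    sty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(ys))
    slope = sty / stt
    return [y - mean_y - slope * (t - mean_t) for t, y in enumerate(ys)]

def dominant_period(ys, dt=1.0):
    """Period (in units of dt) of the strongest DFT bin, ignoring bin 0."""
    n = len(ys)
    resid = detrend(ys)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        # Direct evaluation of the k-th DFT coefficient.
        coeff = sum(r * cmath.exp(-2j * math.pi * k * t / n)
                    for t, r in enumerate(resid))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n * dt / best_k

# Synthetic example: two years of daily samples with a yearly cycle and a trend.
ys = [3.0 * math.sin(2 * math.pi * t / 365 + 0.7) + 0.01 * t + 2.0
      for t in range(730)]
period = dominant_period(ys)
print("estimated period in days:", period)
```

The estimate can only take the values n*dt/k for integer k, so with few cycles in the record the resolution is coarse. That is one reason the number of points and the number of years covered matter.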
 
  • #7
Hi DyslexicHobo!

In your post of Oct10-13, 06:18 PM you wrote: << I know the period is 365 days >>
Therefore, the value of B is known in the equation:
Y = A*sin(B*x + C)+D*x+E
In this case, there is no need for guessed starting values or trial and error, because a classical linear regression directly gives the optimum values of the unknown parameters.
A*sin(B*x+C) = U*sin(B*x)+V*cos(B*x) where U=A*cos(C) and V=A*sin(C)
The function to be fitted is
Y = U*sin(B*x)+V*cos(B*x)+D*x+E
The equation is linear in the coefficients U, V, D, E, so you can compute them directly with the linear regression method.
Then, A=sqrt(U²+V²) and C=arctan(V/U) if U>0, or C=arctan(V/U)+pi if U<0.
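A minimal sketch of this linear regression, assuming B is known, could look like the following. The data are synthetic and the helper names `solve_linear_system` and `fit_sine_plus_line` are invented for illustration. The least-squares problem is solved through the normal equations with a small Gaussian elimination, which is adequate for this example. A library least-squares routine would be the more robust choice for real data.

```python
import math

def solve_linear_system(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - partial) / aug[i][i]
    return z

def fit_sine_plus_line(xs, ys, B):
    """Least-squares fit of y = U*sin(B*x) + V*cos(B*x) + D*x + E, B known."""
    rows = [[math.sin(B * x), math.cos(B * x), x, 1.0] for x in xs]
    # Normal equations: (F^T F) z = F^T y, with z = [U, V, D, E].
    FtF = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    Fty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(4)]
    U, V, D, E = solve_linear_system(FtF, Fty)
    A = math.hypot(U, V)
    # atan2 covers both the U>0 and U<0 cases, up to a multiple of 2*pi.
    C = math.atan2(V, U)
    return A, C, D, E

# Synthetic data: three years sampled every 5 days (assumed values).
B = 2 * math.pi / 365
xs = list(range(0, 1095, 5))
ys = [3.0 * math.sin(B * x + 0.7) + 0.01 * x + 2.0 for x in xs]

A, C, D, E = fit_sine_plus_line(xs, ys, B)
print("A, C, D, E =", A, C, D, E)
```

With noise-free synthetic data the fit recovers the parameters used to generate it. No starting guess is involved at any point, which is the advantage of fixing B in advance.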

If the period of the sinusoidal term is not known (B has to be optimised along with A, C, D, E), the regression is non-linear. Non-linear regression methods usually require guessed starting values and an iterative process.

A different method consists of first transforming the non-linear equation into a linear equation by means of a suitable integral equation. The advantage is that there is no need for guessed starting values and no need for iterative computation.
The case of a function of the form Y = A*sin(B*x + C)+D*x+E is covered in the chapter "Mixed linear and sinusoidal regression", an addition to the paper "Régressions et équations intégrales", pp. 47-48. The computation process is shown in full detail on pp. 49-50 and is followed by a numerical example.
The paper "Régressions et équations intégrales" is available on Scribd:
http://www.scribd.com/JJacquelin/documents



What is a periodic best-fit equation?

A periodic best-fit equation is a mathematical representation of a set of data that shows a repeated pattern or cycle. It is used to predict future data points within the same pattern.

How do you find the best-fit equation for a data set?

There are several methods for finding the best-fit equation for a data set, including regression analysis, Fourier series, and least squares method. These methods involve analyzing the data and finding the equation that best fits the pattern of the data points.

Why is it important to find a periodic best-fit equation for a data set?

Finding a periodic best-fit equation allows for better understanding and prediction of future data points within the same pattern. It can also help identify any underlying trends or patterns in the data.

What factors should be considered when choosing a best-fit equation for a data set?

When choosing a best-fit equation, it is important to consider the type of data (e.g. continuous or discrete), the shape of the data (e.g. linear or nonlinear), and the desired level of accuracy. It is also important to check for any outliers or anomalies in the data that may affect the choice of equation.

Can a periodic best-fit equation be used for all types of data?

No, a periodic best-fit equation is only suitable for data sets that exhibit a repeating pattern or cycle. If the data does not have a periodic nature, other methods of finding a best-fit equation should be used.
