Finding periodic best-fit equation for data set?

  • Thread starter: DyslexicHobo
  • Tags: Data, Periodic, Set

Discussion Overview

The discussion revolves around finding a periodic best-fit equation for a data set that appears to follow a function similar to sin(x) + x. Participants explore methods for non-linear regression, particularly involving trigonometric functions combined with linear terms, and share their experiences with software tools like Excel for data analysis and fitting.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant suggests using a trigonometric function of the form A*sin(B*x) + C*x for fitting the data and inquires about software options for this purpose.
  • Another participant describes their approach using Excel to set up a model and estimate the parameters with a macro that minimizes the squared residuals (referred to in the thread as r²).
  • A participant notes that fitting trigonometric functions mixed with linear terms can be challenging and suggests that the difficulty may depend on the distribution and number of data points.
  • One participant shares their method of estimating parameters and expresses excitement about their progress, despite acknowledging that the fit is not as accurate as hoped.
  • Another participant questions the quality of the fit based on a provided image and suggests that more information about the data set would be necessary for further analysis.
  • A suggestion is made that a Discrete Fourier Transform (DFT) might be applicable, contingent on the number of data points and the time span of the data.
  • One participant explains that if the period is known, a linear regression approach could yield optimal values for the parameters without needing initial guesses, contrasting with non-linear regression methods.
  • Another participant mentions a method to transform the non-linear equation into a linear one, which could simplify the fitting process.

Areas of Agreement / Disagreement

Participants express differing views on the effectiveness of the methods used for fitting the data. While some suggest that linear regression could provide better results given known parameters, others emphasize the challenges of fitting mixed functions and the need for more data to assess the fitting quality. The discussion remains unresolved regarding the best approach to take.

Contextual Notes

Limitations include the lack of detailed information about the data set, such as the number of points and their distribution, which could affect the fitting process. The discussion also highlights the dependence on initial parameter estimates in non-linear regression methods.

DyslexicHobo
Hello,

I have a data set that follows an equation similar to sin(x)+x. Just from eyeballing the data, it seems like there should be a pretty simple trigonometric function A*sin(B*x)+C*x. I went to school for engineering so I have some basic/intermediate knowledge of mathematics but it's been a while since I've applied it. Is there any software (freeware or Excel) that I can use? How about any cool math tricks?

Thanks!
 
Just an update with an idea I had... I set up an Excel document. The first four columns are the following: Y values from data, Y = A*sin(B*x)+C*x+D, X values from data, (B-A)^2 (here B and A are the spreadsheet columns, i.e. the squared difference between the model and the data, not the fit parameters). I plotted both on the same graph and am attempting to eyeball approximate values for A, B, C, and D. Then I'll write a macro to minimize r^2 (the squared differences) by adjusting the parameters A, B, C, and D.

Will this work?
 
Hi !
The methods used to solve this kind of problem are called "non-linear regression".
Fitting trigonometric functions mixed with other common functions is known to be difficult in practice.
How difficult depends a lot on the experimental data (the number of measurements, whether they are spread over few or many periods, etc.).
Since I am about to leave for several days, I will let someone else advise you on the available statistical software. I expect that people with experience on the subject will give you that information.
Nevertheless, if you have not found a method that gives good results with your data by the time I come back, I suggest trying another method. The general principle is explained in the paper "Régressions et équations intégrales":
http://www.scribd.com/JJacquelin/documents
(Pages 25-36: "Régression sinusoïdale")
There is no real need to read the paper, which has not been translated yet. Moreover, the function considered there is A+B*sin(W*x)+C*cos(W*x), so some modifications have to be made to apply it to the function A*sin(B*x)+C*x. Then, if you still need it, I could write out the algorithm for your function. It's rather simple because it is a straightforward computation (i.e. non-recursive).
 
Jacquelin,

Thank you for your response. I've actually just found a reasonable method using these steps (which I came up with myself, so maybe it is not the most accurate method):
1. Created 4 columns in Excel
i. Y values from data
ii. Y = A*sin(B*x + C)+D*x+E
iii. X values from data
iv. (B-A)^2, where B and A here are the spreadsheet columns (ii and i above), not the fit parameters

2. Used educated guesses for the values of A, B, C, D, and E
i. A (amplitude): I wasn't sure how to guess this, so I left it for last and just eyeballed different values
ii. B: I know the period is 365 days and my unit for x is days, so I used 2π/365
iii. C (phase shift): again, I wasn't sure how to estimate this, so I just eyeballed it after the other constants were approximated
iv. D: I used Excel's linear line of best fit for my actual data set and plugged in the slope value
v. E: same as above; used Excel's linear line of best fit for my actual data set and plugged in the y-intercept value

3. Created another cell equal to the average of all the squared residuals from column iv (what I've been calling r^2)

4. I used Excel's "Solver" add-in to find the minimum value of the cell in step 3 by parametrically changing the values for A, B, C, D, and E.
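
The four steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the data are synthetic, the parameter values and step sizes are made up, and `pattern_search` is a crude hypothetical stand-in for Solver (it is not the algorithm Excel actually uses).

```python
import math

def model(x, A, B, C, D, E):
    """Y = A*sin(B*x + C) + D*x + E, the form used in step 1.ii."""
    return A * math.sin(B * x + C) + D * x + E

def mean_sq_residual(params, xs, ys):
    """Average of the squared residuals (the cell from step 3)."""
    return sum((model(x, *params) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def line_fit(xs, ys):
    """Ordinary least-squares line, used for the initial D and E (steps 2.iv, 2.v)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def pattern_search(params, steps, xs, ys, sweeps=200):
    """Nudge one parameter at a time, keep any improvement, halve steps when stuck."""
    params, steps = list(params), list(steps)
    best = mean_sq_residual(params, xs, ys)
    for _ in range(sweeps):
        improved = False
        for i in range(len(params)):
            for delta in (steps[i], -steps[i]):
                trial = params[:]
                trial[i] += delta
                cost = mean_sq_residual(trial, xs, ys)
                if cost < best:
                    params, best, improved = trial, cost, True
                    break
        if not improved:
            steps = [s / 2 for s in steps]
    return params, best

# Assumed synthetic data: two years sampled every 5 days, annual cycle plus trend.
xs = list(range(0, 730, 5))
ys = [model(x, 8.0, 2 * math.pi / 365, 0.7, 0.02, 5.0) for x in xs]

slope, intercept = line_fit(xs, ys)
start = [5.0, 2 * math.pi / 365, 0.0, slope, intercept]  # A and C are "eyeballed"
steps = [1.0, 1e-4, 0.2, 1e-3, 0.5]                      # assumed initial step sizes
start_cost = mean_sq_residual(start, xs, ys)
fitted, fitted_cost = pattern_search(start, steps, xs, ys)
print("mean squared residual before/after:", start_cost, fitted_cost)
```

Like Solver, a local search of this kind only improves on the starting guess, so the quality of the result depends on how good the initial values are.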

Although this line of best fit doesn't actually fit quite as well as I'd hoped, I'm very excited I was able to figure this out on my own. It feels like solving a difficult homework problem. It's not a good enough equation to use as a predictor of future behavior like I was hoping. I'm fairly certain this is just because of the unpredictable nature of my data and not actually my method of finding an equation. I'm certainly open to suggestions for improvements to my method, though.

I've attached an image of my Excel plot for reference. The red is my equation and the blue is my data set.
 


Hi DyslexicHobo !

Sorry for my late answer. I have been away for a week.
I am surprised by the result shown in your image: the red and blue curves are rather far from each other.
Generally the fit is much better. What is going wrong? I cannot say without knowing the data set more precisely.
The blue curve doesn't give the numerical values (especially the number of points and how they are distributed; also, the scatter is hidden by the width of the line). Moreover, I cannot accurately convert a curve to numerical data.
To go further, I suggest attaching a file with the data set in numerical format.
I could test it using available software. I am confident that the fit would be good. Then I could explain how to proceed by yourself.
 
This looks like an application for a DFT.
Knowing the number of points available over how many years would be helpful.
Seeing the raw numerical data would be good.
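
As a rough illustration of the DFT suggestion, the sketch below removes a straight-line trend and then reports the period of the strongest DFT bin. It assumes evenly spaced samples and uses synthetic data; the helper names `detrend` and `dominant_period` and all numeric values are hypothetical.

```python
import cmath
import math

def detrend(xs, ys):
    """Subtract the ordinary least-squares line from the data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

def dominant_period(xs, ys):
    """Period of the strongest nonzero DFT bin, assuming evenly spaced xs."""
    n = len(xs)
    dt = xs[1] - xs[0]
    resid = detrend(xs, ys)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        # Direct O(n^2) DFT; fine for a few hundred points.
        coeff = sum(r * cmath.exp(-2j * math.pi * k * i / n)
                    for i, r in enumerate(resid))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n * dt / best_k

# Assumed synthetic data: four years sampled every 5 days.
xs = list(range(0, 1460, 5))
ys = [8.0 * math.sin(2 * math.pi * x / 365 + 0.7) + 0.02 * x + 5.0 for x in xs]
period = dominant_period(xs, ys)
print("dominant period in days:", period)
```

The DFT can only report periods of the form (total span)/k, which is one reason the number of points and the number of years covered matter here.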
 
Hi DyslexicHobo !

In your post of Oct10-13, 06:18 PM you wrote: << I know the period is 365 days >>
Therefore, the value of B is known in the equation:
Y = A*sin(B*x + C)+D*x+E
In this case, there is no need for guessed starting values or trial and error, because a classical linear regression directly gives the optimum values of the unknown parameters.
A*sin(B*x+C) = U*sin(B*x)+V*cos(B*x) where U=A*cos(C) and V=A*sin(C)
The function to be fitted is
Y = U*sin(B*x)+V*cos(B*x)+D*x+E
The equation is linear in the coefficients U, V, D, E, so you can compute them directly by linear regression.
Then, A=sqrt(U²+V²) and C=arctan(V/U) if U>0, or C=arctan(V/U)+pi if U<0.
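
A minimal sketch of this known-period linear regression in plain Python follows. It solves the normal equations for U, V, D, E and then converts back to A and C. The synthetic data and the helper names `solve_linear_system` and `fit_known_period` are assumptions made for illustration.

```python
import math

def solve_linear_system(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - s) / aug[r][r]
    return sol

def fit_known_period(xs, ys, B):
    """Least-squares fit of Y = U*sin(B*x) + V*cos(B*x) + D*x + E with B fixed."""
    rows = [[math.sin(B * x), math.cos(B * x), x, 1.0] for x in xs]
    # Normal equations: (F^T F) beta = F^T y
    FtF = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    Fty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(4)]
    U, V, D, E = solve_linear_system(FtF, Fty)
    A = math.hypot(U, V)   # A = sqrt(U^2 + V^2)
    C = math.atan2(V, U)   # same angle as the arctan(V/U) rule, modulo 2*pi
    return A, C, D, E

# Assumed synthetic data: known period of 365 days, x in days.
B = 2 * math.pi / 365
xs = list(range(0, 730, 5))
ys = [8.0 * math.sin(B * x + 0.7) + 0.02 * x + 5.0 for x in xs]
A, C, D, E = fit_known_period(xs, ys, B)
print(A, C, D, E)
```

No starting guesses are needed here because the problem is linear in U, V, D, E. In practice a library routine such as `numpy.linalg.lstsq` would replace the hand-written solver.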

If the period of the sinusoidal term is not known (B has to be optimised along with A, C, D, E), the regression is non-linear. Usually, non-linear regression methods require guessed starting values and an iterative process.

A different method consists of first transforming the non-linear equation into a linear equation by means of a suitable integral equation. The advantage is that there is no need for guessed starting values and no need for iterative computation.
The case of a function of the form Y = A*sin(B*x + C)+D*x+E is covered in the chapter "Mixed linear and sinusoidal regression", an addition to the paper "Régressions et équations intégrales", pp. 47-48. The computation process is shown in full detail on pp. 49-50, followed by a numerical example.
The paper "Régressions et équations intégrales" can be accessed on Scribd:
http://www.scribd.com/JJacquelin/documents


 
