How can I approximate a data set with an exponential curve?

  • Thread starter: davee123
  • Tags: Data Set
In summary: A user is seeking help approximating the next 50 or so data points in a set that appears to follow an exponential curve. They tried the basic forms e^Ax+B and x^A+B, but neither gave the desired curve, and they suspect there may be two growth formulas at play. Replies suggest fitting ln(y) = ax + b by least squares, while noting that this minimizes the squared error in ln(y) and does not guarantee the smallest sum of squares for y itself.
  • #1
davee123
Not sure if "General Math" is the best place for this, although I'm honestly not sure which sub-forum would be right.

So, I've got a data set. It looks like it's a standard exponential curve, but I honestly don't remember how to figure out an equation that will approximate it well. Actually, I guess I DO remember how to do an Nth degree polynomial given N data points, but I don't trust the standard polynomial form to do the job here, since I want to predict the data set a ways out.

The 125 data points I have currently are:

33.67
36.8
39.6
50.92
52.8
54.72
55.2
64.68
72.52
76.72
85.47
87.2
99.96
106.78
123.2
132
145.36
147.2
166.1
175.95
204.37
212.38
226.6
230.42
271.22
283.08
315.1
358.6
391.6
416.9
440
461.1
532.4
565.8
622.4
652
697.23
789.95
813.78
832
912
957.84
1155.08
1255.8
1277.3
1474
1601.3
1676.22
1782.73
2034.12
2097.6
2307.24
2647.84
2683.64
2964
3402.6
3622.6
4040.4
4296.4
4605.3
4803.5
5863.7
6259
6509
7378.4
7711.2
8432
8903
9694.2
10488
11144.1
12198
13727
14739.2
16148.2
18921
20608.9
21128
21660
25281
26319.7
30084
32050.8
32554.2
35431.2 <=== It's possible that somewhere around here, the function changes!
36432
40404
40510.2
44484
47424
51604
55624
61759
66670
72228
78880
85042
94242
100080
111240
121040
129456
139840
152613
171600
181440
197776
215644
233280
258750
279900
302820
328510
357280
388750
429000
462300
506350
535300
590400
638400
701800
753960
810250
980900

I'd like to be able to approximate the next 50 or so points (the next 47 to be precise). I've tried playing around with the basics of e^Ax+B or x^A+B, but these don't seem to give me the right curve. Also, there may be TWO growth formulas, I'm not sure. The first two-thirds or so might follow one pattern, and the latter one-third or so might follow another pattern. So really, I'm more interested in the latter one-third, in the event that there really ARE two different formulas.

Ideas anyone on how to go about approximating this? Is my best bet really to do some crazy 40th order polynomial (I sure hope not)?

DaveE
 
  • #2
Some general comments:

1. Extrapolation is very tricky business unless you have a very good handle on the functional relationship that describes the data and have confidence that the relationship holds outside the range where you have actual data.

2. Clues can often be obtained by knowing the source of the data and what it represents. If some physical phenomenon, there may be known or accepted functional relationships that can be used.

3. Accuracy of fit vs simplicity of the function and consequence of inaccuracy are also considerations.

Having said that, I took a quick look and it sure looks exponential to me. If you plot the data on a log scale, it is remarkably linear, which suggests a function of the form ln(y) = ax + b or y = exp(ax + b) would come pretty close. You can get a 1st order approximation by just using the first and last data points. A least squares approach would result in a better approximation for a & b that minimizes the square error.
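The first-and-last-point estimate mentioned above can be sketched in a few lines. This is only an illustration: it assumes the points are indexed 1 to 125 in the order posted (the thread does not give x values), and the helper names `two_point_exponential` and `predict` are made up for this example.

```python
import math

def two_point_exponential(x1, y1, x2, y2):
    """Return (a, b) so that y = exp(a*x + b) passes through both points."""
    a = (math.log(y2) - math.log(y1)) / (x2 - x1)  # slope of the line in log space
    b = math.log(y1) - a * x1                      # intercept of the line in log space
    return a, b

def predict(a, b, x):
    """Evaluate the fitted curve y = exp(a*x + b)."""
    return math.exp(a * x + b)

# First and last values from the posted list; the 1..125 indexing is an assumption.
a, b = two_point_exponential(1, 33.67, 125, 980900.0)
print(f"a = {a:.5f}, b = {b:.5f}")
print(f"rough estimate for point 126: {predict(a, b, 126):.0f}")
```

Since this uses only two of the 125 values, any noise in those two points goes straight into a and b, which is why a least squares fit over all the points is the better next step.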
 
  • #3
hotvette said:
1. Extrapolation is very tricky business unless you have a very good handle on the functional relationship that describes the data and have confidence that the relationship holds outside the range where you have actual data.

2. Clues can often be obtained by knowing the source of the data and what it represents. If some physical phenomenon, there may be known or accepted functional relationships that can be used.

In this case, we're pretty sure that the pattern holds for the first 80-or-so data points and holds similarly for 80-or-so data points beyond that. The system in question is actually data that's been collected for an online game. The first iteration of the game featured about 80-or-so different "monsters" of increasing difficulty whose stats are represented first. Later, the game was expanded with an additional 80-or-so monsters with additional stats. So we (the players) know a bit what to expect, but we're curious how difficult the monsters are GOING to get in the future. It takes quite a while for people to progress, and the question is now: can anyone ever even hope to get to the top tier of monsters?

Anyway, suffice to say that there's a good chance that the math involved will be relatively basic and consistent. It's not a real world system that's subject to some crazy system dynamics model or anything that would crash and burn after experiencing exponential growth or anything like that. It's entirely theoretical.

However, I don't expect it to be perfect. There are two components to the data given, which individually sort of rise randomly, but when multiplied together they produce this set of data, which is VERY suggestive of a simpler mathematical formula. That's what I'm hoping to find, but each of the two sub-components may suffer from some rounding errors or other slight human-level tweaking.

hotvette said:
Having said that, I took a quick look and it sure looks exponential to me. If you plot the data on a log scale, it is remarkably linear, which suggests a function of the form ln(y) = ax + b or y = exp(ax + b) would come pretty close. You can get a 1st order approximation by just using the first and last data points. A least squares approach would result in a better approximation for a & b that minimizes the square error.

Ahhh, thanks! I had played around with exp(ax)+b, but not with exp(ax+b), since I guess it's been too long for me to remember which constants are significant in which form. I'll give that a try and see if I can get something that works.

DaveE
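For reference, the difference between the two forms can be written out as a short worked identity (standard algebra, with c and B used here only as labels):

```latex
\begin{aligned}
y = e^{ax+b} &= e^{b}\,e^{ax} = c\,e^{ax}, \quad c = e^{b}
  &&\Rightarrow\quad \ln y = ax + b \quad \text{(a straight line in } x\text{)} \\
y = e^{ax} + B &
  &&\Rightarrow\quad \ln (y - B) = ax \quad \text{(so } \ln y \text{ is linear in } x \text{ only if } B = 0\text{)}
\end{aligned}
```

So the b inside the exponent rescales the whole curve, while a B added outside only shifts it vertically and leaves the starting scale fixed at e^0 = 1.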
 
  • #4
You can solve ln(y) = ax+b using discrete least squares.
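Written out, the discrete least squares solution is the usual straight-line fit applied to the logged values. With z_i = ln(y_i) and n data points (the symbols z_i and n are introduced here just for the formula):

```latex
a = \frac{n\sum_{i} x_i z_i - \sum_{i} x_i \sum_{i} z_i}
         {n\sum_{i} x_i^{2} - \left(\sum_{i} x_i\right)^{2}},
\qquad
b = \frac{\sum_{i} z_i - a\sum_{i} x_i}{n},
\qquad
z_i = \ln y_i
```

The fitted curve is then y = exp(a*x + b).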
 
  • #5
daniel_i_l said:
You can solve ln(y) = ax+b using discrete least squares.

True and easy to do, but recognize that it solves a different problem than the nonlinear version y = c*exp(a*x), where c = exp(b). The linear version will give a worse fit in the later data points than the nonlinear version.
 
  • #6
daniel_i_l said:
You can solve ln(y) = ax+b using discrete least squares.

hotvette said:
True and easy to do, but recognize that it solves a different problem than the nonlinear version y = c*exp(a*x), where c = exp(b). The linear version will give a worse fit in the later data points than the nonlinear version.

Why would that be true? y = c exp(ax) and ln(y) = ax + b are exactly the same equation.
 
  • #7
Even though the equations are mathematically equivalent, the least squares formulations aren't. In one case, f1 = a*x + b and the objective is to minimize F1 = sum((ln(y) - f1)^2), whereas in the other case, f2 = c*exp(a*x) and the objective is to minimize F2 = sum((y - f2)^2). They are different problems with different results.
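A small sketch can make the difference between F1 and F2 concrete. It uses the last ten values from the original post, relabelled x = 0..9 (an assumption, since the thread gives no x values). The nonlinear fit here is a deliberately simple stand-in, not a library routine: for each trial a it takes the best c in closed form and then nudges a with a shrinking step. All function names are hypothetical.

```python
import math

def fit_loglinear(xs, ys):
    """Minimize F1 = sum((ln(y) - (a*x + b))^2); returns (a, b)."""
    n = len(xs)
    zs = [math.log(y) for y in ys]
    mx, mz = sum(xs) / n, sum(zs) / n
    a = (sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
         / sum((x - mx) ** 2 for x in xs))
    return a, mz - a * mx

def best_c(a, xs, ys):
    """For a fixed a, the c that minimizes sum((y - c*exp(a*x))^2)."""
    es = [math.exp(a * x) for x in xs]
    return sum(y * e for y, e in zip(ys, es)) / sum(e * e for e in es)

def F1(a, b, xs, ys):
    return sum((math.log(y) - (a * x + b)) ** 2 for x, y in zip(xs, ys))

def F2(a, c, xs, ys):
    return sum((y - c * math.exp(a * x)) ** 2 for x, y in zip(xs, ys))

def fit_nonlinear(xs, ys, a_start, tol=1e-12, max_iter=10000):
    """Reduce F2 starting from a_start; only improving steps are accepted."""
    a = a_start
    best = F2(a, best_c(a, xs, ys), xs, ys)
    h = 0.1 * abs(a_start) or 0.1          # initial step size for a
    for _ in range(max_iter):
        if h < tol:
            break
        for cand in (a + h, a - h):
            val = F2(cand, best_c(cand, xs, ys), xs, ys)
            if val < best:                 # accept the first improving move
                a, best = cand, val
                break
        else:
            h /= 2                         # no improvement: shrink the step
    return a, best_c(a, xs, ys)

# Last ten values from the posted list; x = 0..9 is an assumed relabelling.
ys = [429000, 462300, 506350, 535300, 590400,
      638400, 701800, 753960, 810250, 980900]
xs = list(range(len(ys)))

a_lin, b_lin = fit_loglinear(xs, ys)
a_nl, c_nl = fit_nonlinear(xs, ys, a_lin)

print("log-linear fit: F1 =", F1(a_lin, b_lin, xs, ys),
      " F2 =", F2(a_lin, math.exp(b_lin), xs, ys))
print("nonlinear fit:  F1 =", F1(a_nl, math.log(c_nl), xs, ys),
      " F2 =", F2(a_nl, c_nl, xs, ys))
```

Each fit wins on its own objective: the log-linear parameters give the smaller F1, while the nonlinear parameters give the smaller F2. F2 is dominated by the largest y values, which is why the two fits tend to disagree most at the later data points.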
 
  • #8
davee123,

This might interest you.

http://ccsl.mae.cornell.edu/eureqa
 

1. What is the purpose of approximating a data set?

The purpose of approximating a data set is to find a simpler representation of the data that still captures its essential features. This can make the data easier to interpret and use for further analysis.

2. What methods can be used to approximate a data set?

There are several methods that can be used to approximate a data set, including linear regression, polynomial regression, and moving averages. Each method has its own advantages and is suitable for different types of data.

3. How accurate is the approximation of a data set?

The accuracy of the approximation of a data set depends on the method used and the quality of the data. A model that is too simple may miss the trend in the data, while an overly complex one (such as a high-order polynomial) may overfit the noise and predict poorly outside the fitted range.

4. Can approximating a data set be used for predictions?

Approximating a data set can be used for predictions, but it is important to note that the predictions may not be accurate if the data does not follow a clear pattern. Additionally, the accuracy of the predictions will depend on the quality of the data and the chosen approximation method.

5. How can I determine which approximation method is best for my data set?

The best approximation method for a data set will depend on the type of data and the desired level of accuracy. It is recommended to try multiple methods and compare their results to determine which one best captures the essential features of the data.
