
Approximating a data set

  1. Dec 15, 2009 #1
    Not sure if "General Math" is the best place for this, although I'm honestly not sure which sub-forum would be right.

    So, I've got a data set. It looks like it's a standard exponential curve, but I honestly don't remember how to figure out an equation that will approximate it well. Actually, I guess I DO remember how to do an Nth degree polynomial given N data points, but I don't trust the standard polynomial form to do the job here, since I want to predict the data set a ways out.

    The 125 data points I have currently are:

    33.67
    36.8
    39.6
    50.92
    52.8
    54.72
    55.2
    64.68
    72.52
    76.72
    85.47
    87.2
    99.96
    106.78
    123.2
    132
    145.36
    147.2
    166.1
    175.95
    204.37
    212.38
    226.6
    230.42
    271.22
    283.08
    315.1
    358.6
    391.6
    416.9
    440
    461.1
    532.4
    565.8
    622.4
    652
    697.23
    789.95
    813.78
    832
    912
    957.84
    1155.08
    1255.8
    1277.3
    1474
    1601.3
    1676.22
    1782.73
    2034.12
    2097.6
    2307.24
    2647.84
    2683.64
    2964
    3402.6
    3622.6
    4040.4
    4296.4
    4605.3
    4803.5
    5863.7
    6259
    6509
    7378.4
    7711.2
    8432
    8903
    9694.2
    10488
    11144.1
    12198
    13727
    14739.2
    16148.2
    18921
    20608.9
    21128
    21660
    25281
    26319.7
    30084
    32050.8
    32554.2
    35431.2 <=== It's possible that somewhere around here, the function changes!
    36432
    40404
    40510.2
    44484
    47424
    51604
    55624
    61759
    66670
    72228
    78880
    85042
    94242
    100080
    111240
    121040
    129456
    139840
    152613
    171600
    181440
    197776
    215644
    233280
    258750
    279900
    302820
    328510
    357280
    388750
    429000
    462300
    506350
    535300
    590400
    638400
    701800
    753960
    810250
    980900

    I'd like to be able to approximate the next 50 or so points (the next 47, to be precise). I've tried playing around with the basics of e^(Ax)+B or x^A+B, but these don't seem to give me the right curve. Also, there may be TWO growth formulas, I'm not sure. The first two-thirds or so might follow one pattern, and the latter one-third or so might follow another pattern. So really, I'm more interested in the latter one-third, in the event that there really ARE two different formulas.

    Ideas anyone on how to go about approximating this? Is my best bet really to do some crazy 40th order polynomial (I sure hope not)?

    DaveE
     
  3. Dec 15, 2009 #2

    hotvette

    Homework Helper

    Some general comments:

    1. Extrapolation is very tricky business unless you have a very good handle on the functional relationship that describes the data and have confidence that the relationship holds outside the range where you have actual data.

    2. Clues can often be obtained by knowing the source of the data and what it represents. If some physical phenomenon, there may be known or accepted functional relationships that can be used.

    3. Accuracy of fit vs simplicity of the function and consequence of inaccuracy are also considerations.

    Having said that, I took a quick look and it sure looks exponential to me. If you plot the data on a log scale, it is remarkably linear, which suggests a function of the form ln(y) = ax + b or y = exp(ax + b) would come pretty close. You can get a rough first estimate by just using the first and last data points. A least squares fit of ln(y) against x would give better values for a & b, namely the ones that minimize the squared error in ln(y).
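    As a rough illustration of the two estimates just described, here is a minimal Python sketch. It uses only the last 41 values from the original post (from the point flagged as a possible change of formula onward). Treating each value's position in the list (85 through 125) as x is an assumption, since the thread gives no explicit x values, and the function names are made up for this example.

```python
import math

# Last 41 values from the original post (list positions 85..125).
# ASSUMPTION: x is simply the position of the value in the list.
TAIL = [
    35431.2, 36432, 40404, 40510.2, 44484, 47424, 51604, 55624, 61759,
    66670, 72228, 78880, 85042, 94242, 100080, 111240, 121040, 129456,
    139840, 152613, 171600, 181440, 197776, 215644, 233280, 258750,
    279900, 302820, 328510, 357280, 388750, 429000, 462300, 506350,
    535300, 590400, 638400, 701800, 753960, 810250, 980900,
]
FIRST_INDEX = 85


def fit_two_points(xs, ys):
    """Rough estimate of ln(y) = a*x + b from the first and last points only."""
    a = (math.log(ys[-1]) - math.log(ys[0])) / (xs[-1] - xs[0])
    b = math.log(ys[0]) - a * xs[0]
    return a, b


def fit_log_linear(xs, ys):
    """Least-squares fit of ln(y) = a*x + b over all points."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    a = sxl / sxx
    b = mean_log - a * mean_x
    return a, b


def predict(a, b, x):
    """Evaluate y = exp(a*x + b)."""
    return math.exp(a * x + b)


xs = [FIRST_INDEX + i for i in range(len(TAIL))]
a2, b2 = fit_two_points(xs, TAIL)
a, b = fit_log_linear(xs, TAIL)
print("two-point estimate:  a =", a2, " b =", b2)
print("least-squares fit:   a =", a, " b =", b)

# Extrapolate the next 47 positions (126..172) with the least-squares fit.
forecast = [predict(a, b, x) for x in range(126, 173)]
print("position 126:", forecast[0], " position 172:", forecast[-1])
```

    The forecast is only as good as the assumption that the same exponential keeps holding, which is exactly the extrapolation caveat in point 1.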
     
  4. Dec 15, 2009 #3
    In this case, we're pretty sure that the pattern holds for the first 80-or-so data points and holds similarly for 80-or-so data points beyond that. The system in question is actually data that's been collected for an online game. The first iteration of the game featured about 80-or-so different "monsters" of increasing difficulty whose stats are represented first. Later, the game was expanded with an additional 80-or-so monsters with additional stats. So we (the players) know a bit what to expect, but we're curious how difficult the monsters are GOING to get in the future. It takes quite a while for people to progress, and the question is now: can anyone ever even hope to get to the top tier of monsters?

    Anyway, suffice to say that there's a good chance that the math involved will be relatively basic and consistent. It's not a real world system that's subject to some crazy system dynamics model or anything that would crash and burn after experiencing exponential growth or anything like that. It's entirely theoretical.

    However, I don't expect it to be perfect. There are two components to the data given which, individually, rise sort of randomly, but when multiplied together they give this data set, which is VERY suggestive of a simpler mathematical formula. That's what I'm hoping to find, but each of the two sub-components may suffer from some rounding errors or other slight human-level tweaking.

    Ahhh, thanks! I had played around with exp(ax)+b, but not with exp(ax+b), since I guess it's been too long for me to remember which constants are significant in which form. I'll give that a try and see if I can get something that works.

    DaveE
     
  5. Dec 16, 2009 #4

    daniel_i_l

    Gold Member

    You can solve ln(y) = ax+b using discrete least squares.
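    For reference, the discrete least squares solution has a closed form. Writing z_i = ln(y_i) for the n data points (the symbols z_i and n are introduced here for illustration and do not appear in the thread), the standard normal equations give:

```latex
\min_{a,b}\ \sum_{i=1}^{n} \bigl(z_i - (a x_i + b)\bigr)^2,
\qquad z_i = \ln y_i
\\[6pt]
a = \frac{n \sum_i x_i z_i - \sum_i x_i \sum_i z_i}
         {n \sum_i x_i^2 - \bigl(\sum_i x_i\bigr)^2},
\qquad
b = \frac{\sum_i z_i - a \sum_i x_i}{n}
```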
     
  6. Dec 16, 2009 #5

    hotvette

    Homework Helper

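    True and easy to do, but recognize that it solves a different problem than the nonlinear version y = c*exp(a*x), where c = exp(b). Taking logs compresses the large values, so the linear version effectively weights relative rather than absolute error and will give a worse fit (in absolute terms) in the later data points than the nonlinear version.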
     
  7. Dec 17, 2009 #6

    HallsofIvy

    Science Advisor

    Why would that be true? y = c exp(ax) and ln(y) = ax + b are exactly the same equation.
     
  8. Dec 17, 2009 #7

    hotvette

    Homework Helper

    Even though the equations are mathematically equivalent, the least squares formulations aren't. In one case, f1 = ax + b and the objective is to minimize F1 = sum((ln(y) - f1)^2), whereas in the other case, f2 = c*exp(a*x) and the objective is to minimize F2 = sum((y - f2)^2). They are different problems with different results.
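    To make the difference concrete, here is a hedged Python sketch that fits the same points both ways and evaluates both objectives. The five sample values are taken from the original post (list positions 85, 95, 105, 115 and 125); using x = position - 85 is an assumption made to keep the numbers well scaled. The nonlinear fit uses a plain Gauss-Newton iteration with step halving, started from the log-linear solution; that is one common choice for this kind of problem, not necessarily what anyone in the thread used.

```python
import math

# Five values from the original post (positions 85, 95, 105, 115, 125).
# ASSUMPTION: x = position - 85, which keeps exp(a*x) well scaled.
XS = [0, 10, 20, 30, 40]
YS = [35431.2, 72228, 171600, 388750, 980900]


def sse_linear(xs, ys, a, b):
    """F1: squared error measured in ln(y)."""
    return sum((math.log(y) - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def sse_nonlinear(xs, ys, a, c):
    """F2: squared error measured in y itself."""
    return sum((y - c * math.exp(a * x)) ** 2 for x, y in zip(xs, ys))


def fit_log_linear(xs, ys):
    """Minimize F1 exactly via ordinary linear least squares on ln(y)."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    a = sxl / sxx
    return a, mean_log - a * mean_x


def fit_nonlinear(xs, ys, iterations=50):
    """Reduce F2 for y = c*exp(a*x) by Gauss-Newton with step halving."""
    a, b = fit_log_linear(xs, ys)  # starting guess
    c = math.exp(b)
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J^T J) d = J^T r.
        saa = sac = scc = ga = gc = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(a * x)
            r = y - c * e          # residual
            ja = c * x * e         # d(model)/da
            jc = e                 # d(model)/dc
            saa += ja * ja
            sac += ja * jc
            scc += jc * jc
            ga += ja * r
            gc += jc * r
        det = saa * scc - sac * sac
        if det == 0.0:
            break
        da = (ga * scc - sac * gc) / det
        dc = (saa * gc - sac * ga) / det
        # Halve the step until F2 does not increase.
        current = sse_nonlinear(xs, ys, a, c)
        step = 1.0
        while step > 1e-10 and sse_nonlinear(xs, ys, a + step * da, c + step * dc) > current:
            step /= 2.0
        if step <= 1e-10:
            break
        a, c = a + step * da, c + step * dc
    return a, c


a_lin, b_lin = fit_log_linear(XS, YS)
a_nl, c_nl = fit_nonlinear(XS, YS)
print("log-linear fit: a =", a_lin, " c =", math.exp(b_lin))
print("nonlinear fit:  a =", a_nl, " c =", c_nl)
print("F1 (log error): ", sse_linear(XS, YS, a_lin, b_lin), "vs",
      sse_linear(XS, YS, a_nl, math.log(c_nl)))
print("F2 (abs error): ", sse_nonlinear(XS, YS, a_lin, math.exp(b_lin)), "vs",
      sse_nonlinear(XS, YS, a_nl, c_nl))
```

    By construction, the log-linear parameters give the smaller F1 and the nonlinear parameters give an F2 no larger than the starting point, so each fit is best only by its own measure.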
     
    Last edited: Dec 18, 2009
  9. Dec 21, 2009 #8

    hotvette

    Homework Helper
