# Best model to fit data

1. Jul 14, 2011

### kdbnlin78

Hey all.

I have some data, approximately 6 months' worth. The values do not depend on time but are represented by a pair (x, y) such that the value x is measured at a point in time y. The data is therefore equivalent to measuring how much stock I have in a given time period, say. The measurements occur once a day for 6 months.

I need to take this data and create a model that would allow me to predict values of x in (say) one month, three months and six months time.

My question is: what is the best method to use? Should I suppose that a power law exists and use a regression model, or would a simple plot of moving averages allow me to make an accurate prediction?

I guess I don't necessarily need to end up with a function representing x as a function of y (x = f(y)); rather, I want to use some statistical inference to accurately forecast the values of x_{i} in the future.

Any help on this matter would be very appreciated.

Regards,
kdbnlin

2. Jul 14, 2011

### pmsrw3

There is no good general answer to this question. There are lots of techniques, but forecasting is an art. (I had an entire one-semester course on it in biz school.)

Fitting to a power law is a good strategy if there is indeed reason to think a power law describes the data. That might be because you have a theory that says it should be a power law, or because a power law fits past performance well. Absent such reason, there are lots of other possible models.

Finally, I don't understand this:

As well as I can make this out, you are saying that you measure x as a function of time. Then, for some mysterious reason, you decide to call time y, and, most mysterious of all, you claim the "values that do not depend on time". Could you make this a little clearer? If the values don't depend on time, how can you possibly think you could build a model that will make predictions for future times? In fact, if the values don't depend on time, doesn't that mean that the value is always the same, so no prediction is necessary?

3. Jul 14, 2011

### kdbnlin78

Hi pmsrw3 thanks for the reply.

Apologies for the (non) use of English!

What I mean is this: A measurement of (say) x does not depend on time. We capture the value of x at some point in time. So for example, a shop sells cans of soup. The shop owner measures how many cans of soup he sells each week. He measures this on Friday evening every week. He can assume that the sales of the soup do not (necessarily) depend on time.

So I have a pair of values (x, t) such that x represents the number of cans sold and t is the time (of the day) at which the shop owner counted them.

I mean then that x = x(t) is not really what I am looking for, but I would like to know (using the analogy) how many cans of soup I will sell on the next 26 Fridays.

I hope that makes (some kind of) sense.

Regards,
kdbnlin.

4. Jul 14, 2011

### pmsrw3

So what you have is a series of sales values, each with a date. There is in addition a completely uninteresting and uninformative datum that gives the time of day, y, at which each report was made. Is that right?

Assuming the answer is yes, you're obviously just going to throw out y. Then you have a series of (t, x) values. Here I'm using t for the date, since a date is also a time measurement. (Sorry if that's strange to you, but it's the way such data are usually handled.)

I can't think of any particular reason to expect this to follow a power law. I would probably just fit it to a constant, a line, a quadratic, etc, stopping when the fit stopped improving (as determined by an F-test, for instance). I'd be very surprised if the data justify going beyond a quadratic. If you had at least a year's worth of data it might also make sense to include a seasonal correction, but you don't have a long enough time series for that. (You might, however, be able to find something useful in BLS sales stats.)
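The stepwise procedure described above (constant, then line, then quadratic, stopping when the fit stops improving) could be sketched roughly as follows. This is only an illustration: the function names, the `max_degree` cap, and the `f_threshold` default of 4.0 are assumptions made here, with 4.0 standing in for the 5% critical value of the F distribution with one numerator degree of freedom. A real analysis would look up the exact critical value.

```python
import numpy as np
from numpy.polynomial import Polynomial


def rss_for_degree(t, x, degree):
    """Residual sum of squares of a least-squares polynomial fit."""
    poly = Polynomial.fit(t, x, degree)
    residuals = x - poly(t)
    return float(residuals @ residuals)


def choose_degree(t, x, max_degree=3, f_threshold=4.0):
    """Raise the degree one step at a time while the partial F statistic
    for the added term stays at or above f_threshold.

    f_threshold is an assumed stand-in for a proper critical value.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(x)
    degree = 0
    rss_current = rss_for_degree(t, x, degree)
    while degree < max_degree:
        rss_next = rss_for_degree(t, x, degree + 1)
        # A polynomial of degree d has d + 1 coefficients.
        dof_next = n - (degree + 2)
        if dof_next <= 0:
            break
        # Partial F statistic for one extra coefficient.
        f_stat = (rss_current - rss_next) / max(rss_next / dof_next, 1e-300)
        if f_stat < f_threshold:
            break
        degree += 1
        rss_current = rss_next
    return degree
```

Calling `choose_degree(t, x)` on the (t, x) series returns the lowest degree after which adding another term no longer helps by this criterion.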

5. Jul 15, 2011

### chiro

Your question is nearly the same as this: given a function, find a representation of the function that describes the data, and then use that to transform the data back to something linear that can be operated on and analyzed through standard techniques.

As pmsrw3 stated above, there is not a single or easy way to do this.

By your post it seems like you are looking for parametrization of your data.

One suggestion you could try is to break up your data into smaller chunks and try a parametrization of each subsection of your data, and then, chunk by chunk, try to unify the different parametrized sections.

If the data follows some simple power law, you may not have to do this, but if it's complicated, this may help (and I emphasize "may").

One other possible way is to transform your data by time series analysis to make it "smooth" and then use a variety of integral transform techniques to get the required information about its functional structure. Depending on the transform used, you will get specific information about the function in some form or another. This kind of method is more systematic than what I said above, but it's more complex.

6. Jul 18, 2011

### kdbnlin78

This is the route I think I will take. It seems very straightforward to smooth out the data by a simple moving average technique. One question comes to mind: how do I know what "step distance" to take in calculating the moving averages?

I suppose I mean what value of $n$ should I take in $\frac{1}{n}\sum_{j=i}^{i+n-1} a_{j}$, where the $a_{j}$ are my sequence of data?
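A minimal sketch of the moving average, plus one possible way to compare window lengths: score each candidate n by how well the mean of the previous n values predicts the next value. That scoring rule and the names `one_step_error` and `pick_window` are assumptions for illustration, not something established in this thread.

```python
def moving_average(values, n):
    """Mean of each run of n consecutive values."""
    if n < 1 or n > len(values):
        raise ValueError("n must be between 1 and len(values)")
    window_sum = sum(values[:n])
    averages = [window_sum / n]
    for i in range(n, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        window_sum += values[i] - values[i - n]
        averages.append(window_sum / n)
    return averages


def one_step_error(values, n):
    """Mean squared error when the mean of the previous n values
    is used as the forecast for the next value."""
    errors = [
        (values[i] - sum(values[i - n:i]) / n) ** 2
        for i in range(n, len(values))
    ]
    return sum(errors) / len(errors)


def pick_window(values, candidates):
    """Candidate window length with the smallest one-step error.
    Every candidate must be smaller than len(values)."""
    return min(candidates, key=lambda n: one_step_error(values, n))
```

A short window tracks recent changes closely, while a long one smooths more but lags behind a trend, so the "best" n by this score depends on the data.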

Further to this, I should imagine a discrete Fourier transform may work in this case, given that my measurements are specifically taken at regular intervals over time.
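For regularly spaced samples, a discrete Fourier transform can at least show whether one period stands out. The sketch below uses NumPy's `rfft` and `rfftfreq`; the function name `dominant_period` and the choice to report only the single strongest component are assumptions made for illustration.

```python
import numpy as np


def dominant_period(values, sample_spacing=1.0):
    """Period (in units of sample_spacing) of the strongest
    non-constant Fourier component of an evenly sampled series."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()  # remove the constant level
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=sample_spacing)
    # Skip bin 0 (zero frequency) when looking for the peak.
    k = 1 + int(np.argmax(spectrum[1:]))
    return 1.0 / freqs[k]
```

With daily data, a result near 7 would suggest a weekly cycle. A strong trend in the series will swamp the spectrum, so it is usually removed first.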

7. Jul 18, 2011

### Stephen Tashi

My personal philosophy on the generalities of fitting models to data is that you should actually have a model. If you hypothesize a probability model (one that is detailed enough to let you write a computer simulation of how the data is generated) then even if this model has unknown parameters, you can answer many questions about the "best" way to fit equations to the data. If you hypothesize such a model you are using subjective judgement, but you can explicitly know and state what you have assumed. If you don't hypothesize a model and instead make a whole collection of subjective judgements based on "I like this transform" or "I'll group these things together" etc., then you have still made subjective judgements, but they have no coherent structure and they don't imply that any particular mathematical methods are optimal.

8. Jul 18, 2011

### pmsrw3

I like that attitude, but I have some sympathy with kdbnlin78. Sometimes (and this happens a lot in biz forecasting), you may have absolutely no good explanation for why a thing changes in the way it does. And yet you still want to make forecasts. In that case, making up a model is an exercise in subjectivity no better than choosing a Fourier series because you just love Fourier series. I think in that case one is justified in trying a bunch of random stuff and seeing what works. In fact, that may actually lead to a model.

9. Jul 18, 2011

### D H

Staff Emeritus
Not necessarily. A model is a nice thing (very nice thing!) to have, but lack of a model does not mean you can't go forward. This is similar in a sense to the distinction between supervised and unsupervised machine learning.

One thing to beware of in model-free fitting (or unsupervised learning) is the danger of overfitting. Throw 2001 data points at a fitting algorithm and it will gladly come up with a 2000th order polynomial that matches every single one of those data points to a T. That 2000th order polynomial almost certainly has zero predictive capability. Shoot, it almost certainly doesn't even have good interpolative capabilities.
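A scaled-down illustration of the same effect, using 10 made-up points and a 9th degree polynomial instead of 2001 points and a 2000th order one. The data values here are invented purely for the demonstration: a straight line plus small fixed perturbations.

```python
import numpy as np
from numpy.polynomial import Polynomial


def rmse(poly, t, x):
    """Root mean squared error of poly evaluated at t against x."""
    return float(np.sqrt(np.mean((poly(t) - x) ** 2)))


# Invented training data: the line x = 1 + 2t plus small fixed noise.
t_train = np.arange(10.0)
noise = np.array([0.3, -0.5, 0.8, -0.2, 0.1, -0.7, 0.4, 0.6, -0.9, 0.2])
x_train = 1.0 + 2.0 * t_train + noise

# "Future" points on the same underlying line, just past the data.
t_future = np.array([10.0, 11.0, 12.0])
x_future = 1.0 + 2.0 * t_future

line = Polynomial.fit(t_train, x_train, 1)    # 2 coefficients
wiggly = Polynomial.fit(t_train, x_train, 9)  # 10 coefficients, one per point

print("training error:", rmse(line, t_train, x_train), rmse(wiggly, t_train, x_train))
print("forecast error:", rmse(line, t_future, x_future), rmse(wiggly, t_future, x_future))
```

The 9th degree fit passes through every training point, yet its forecast error just beyond the data is far larger than the straight line's.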

10. Jul 18, 2011

### kdbnlin78

I agree with your sentiment and accept that my philosophy here is somewhat flawed and is certainly not perfect. My motivation for using the approach in my previous post is based on a "best case scenario" given some time series data that isn't really appropriate to answer the questions I am being asked.

However, in this case, like all good Scientists and Mathematicians, I'll proceed and exhaust all possible avenues until one gives me a "best case answer".

As a Mathematician this approach pains me somewhat but needs must.

11. Jul 18, 2011

### pmsrw3

Exactly. And the same danger exists in a more opaque form when you try lots and lots of different models until you find the one that seems to fit best. You may find one that fits the points you have, but that's not a good reason to think it'll fit the next point you get. Nate Silver at FiveThirtyEight.com is good on this.

The guy's got six months of data sampled weekly -- call it 26 data points. I really think smoothing + integral transforms is overkill (and probably overfitting!). Such a short series probably only tells you where you are and how fast things have been changing lately.

EDIT: Sorry, I went back and read again, and I may have gotten this wrong. In the OP, kdbnlin78 says the data are sampled daily. But then later he says they're sampled on Fridays. So I'm not sure if he has 26 or 180 data points. But the point stands.

12. Jul 18, 2011

### kdbnlin78

13. Jul 18, 2011

### pmsrw3

Oh, 1100... Well, that puts a bit of a different complexion on things.

You know, I think you should try http://en.wikipedia.org/wiki/LOESS.
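To give a rough idea of what local regression does, here is a minimal sketch of a local linear fit with tricube weights. It is a simplified illustration, not a full LOESS implementation (there are no robustness iterations), and the names `loess_point`, `loess` and the `frac` default are assumptions made here.

```python
import numpy as np


def loess_point(t, x, t0, frac=0.3):
    """Local linear estimate at t0, using the nearest fraction `frac`
    of the points, weighted by the tricube function."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    k = min(len(t), max(3, int(np.ceil(frac * len(t)))))
    dist = np.abs(t - t0)
    nearest = np.argsort(dist)[:k]
    d_max = dist[nearest].max()
    u = dist[nearest] / d_max if d_max > 0 else np.zeros(k)
    w = (1.0 - u ** 3) ** 3  # tricube: weight 1 at t0, 0 at the farthest point
    # Weighted least squares for x ~ a + b * (t - t0); a is the value at t0.
    sw = np.sqrt(w)
    design = np.column_stack([np.ones(k), t[nearest] - t0])
    coef, *_ = np.linalg.lstsq(design * sw[:, None], x[nearest] * sw, rcond=None)
    return float(coef[0])


def loess(t, x, frac=0.3):
    """Smoothed value at every sample time."""
    return np.array([loess_point(t, x, t0, frac) for t0 in t])
```

This smooths within the sampled range. Using the local fit at the last few points to project forward is extrapolation and deserves the same caution as any other forecast.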

Last edited by a moderator: May 5, 2017
14. Jul 18, 2011

### kdbnlin78

OK, local regression. I wasn't aware of it as a technique. I think you may be right though; it looks like a good technique for my problem: an unknown function representing the data, and my need for a forecast.

(Edit: I see that the technique is local since all we are doing are Taylor expansions - makes sense now)

Thank you very much for that - I'll search the term and do some reading!

Regards,
kdbnlin
