# Want to (roughly) predict future behavior of a system

1. Jan 24, 2014

### oneamp

Hello

I am building a set of data. It is composed of events that occur at discrete times throughout a day, with some tolerance (e.g. 5 seconds). I want to predict the probability that an event will occur at a given time on a future day, using the probabilities accumulated over past days. Then I want to compare each probability to a random number and simulate a future day.

So, if I use one day, my prediction will be the same as that day, since the probability of each event is one (it all occurred). If I include two days, I can superimpose them and get a new aggregate probability to compare my random numbers against. I think the more days I use, the more accurate a picture of typical behavior I will have, and the more closely my prediction will match a real future day.
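A minimal Python sketch of the superposition idea, assuming each day is stored as a list of event times in seconds since midnight and the day is cut into 5-second bins. The per-bin probability is the fraction of observed days with at least one event in that bin; a simulated day keeps each bin whose uniform random draw falls below that probability. The function names and the two-day history are hypothetical.

```python
import random

SECONDS_PER_DAY = 24 * 60 * 60

def bin_probabilities(days, bin_seconds=5):
    """Fraction of observed days with at least one event in each time bin.

    `days` is a list of days; each day is a list of event times in seconds
    since midnight (assumed to satisfy 0 <= t < SECONDS_PER_DAY).
    """
    n_bins = SECONDS_PER_DAY // bin_seconds
    counts = [0] * n_bins
    for events in days:
        # A bin is counted at most once per day, however many events fell in it.
        for b in {int(t) // bin_seconds for t in events}:
            counts[b] += 1
    return [c / len(days) for c in counts]

def simulate_day(probs, bin_seconds=5, rng=random):
    """Emulate one day: bin b fires when a uniform draw in [0, 1) is below probs[b].

    Returns the start time (seconds since midnight) of each bin that fired.
    """
    return [b * bin_seconds for b, p in enumerate(probs) if rng.random() < p]

# Made-up history: two days, kitchen at about 01:00 and exactly at 12:00.
history = [[3600, 43200], [3605, 43200]]
probs = bin_probabilities(history)
print(simulate_day(probs, rng=random.Random(1)))
```

With a single day every probability is 0 or 1, so `simulate_day` reproduces that day's bins exactly, as described above. Note that each bin is drawn independently of the others.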

Note that I don't want to predict the future. I want to emulate a future day that seems statistically probable, not abnormal.

The behavior is not entirely random. An example is 'what time I went to the kitchen that day'. It happens several times a day, mostly around the same times.

1) Is my approach meaningful? If not, what direction do I need to look in instead?

2) How can I correlate the days to provide an aggregate probability that an event will happen between (time - window) and time?

3) How can I measure how accurate my prediction is, as a function of number of days included in the aggregate probability?
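One possible way to approach question 3 (a swapped-in technique, not something the thread prescribes) is to hold out the last day, estimate per-bin probabilities from the n days just before it, and score them with the Brier score: the mean squared difference between the predicted probability and the 0/1 outcome in each bin. The sketch assumes the same list-of-seconds representation, 5-minute bins, and hypothetical function names.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def occupied_bins(events, bin_seconds):
    """Set of bin indices holding at least one event (times in seconds since midnight)."""
    return {int(t) // bin_seconds for t in events}

def brier_score(history, held_out, bin_seconds=300):
    """Mean squared error between per-bin probabilities estimated from `history`
    and the 0/1 outcome observed in `held_out`. Lower is better."""
    n_bins = SECONDS_PER_DAY // bin_seconds
    counts = [0] * n_bins
    for events in history:
        for b in occupied_bins(events, bin_seconds):
            counts[b] += 1
    observed = occupied_bins(held_out, bin_seconds)
    total = 0.0
    for b in range(n_bins):
        p = counts[b] / len(history)
        outcome = 1.0 if b in observed else 0.0
        total += (p - outcome) ** 2
    return total / n_bins

def score_by_history_length(days, bin_seconds=300):
    """For n = 1 .. len(days) - 1, score the last day using the n days just before it."""
    target = days[-1]
    return {n: brier_score(days[-1 - n:-1], target, bin_seconds)
            for n in range(1, len(days))}
```

Looking at the score as a function of n suggests whether adding more days keeps helping. A single held-out day is noisy, so repeating this over many held-out days and averaging would give a steadier picture.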

Thank you

2. Jan 25, 2014

### Stephen Tashi

A hard part of applying probability to real life is to define the events precisely. For example, if you ask "What is the probability that I go to the kitchen between 12:00 and 12:05 PM", this doesn't define a precise event because it doesn't state the population of outcomes that is involved. You'd have to say something like "On a day selected at random from all the days of the year, what is the probability that I go to the kitchen between 12:00 and 12:05 PM" or "On a weekday selected at random from all the days in the year, what is the probability that I go to the kitchen between 12:00 and 12:05 PM".
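To make the population explicit, here is a sketch assuming observations are keyed by `datetime.date`, each mapping to a list of event times in seconds since midnight. The same 12:00 to 12:05 PM window gets a different probability depending on which days count as the population. `window_probability` and `is_weekday` are hypothetical helpers, and the data is made up.

```python
from datetime import date

def window_probability(days, start_s, end_s, population=lambda d: True):
    """Estimate P(at least one event in [start_s, end_s)) on a day drawn at
    random from the observed days that satisfy `population`."""
    chosen = [events for d, events in days.items() if population(d)]
    if not chosen:
        raise ValueError("population contains no observed days")
    hits = sum(any(start_s <= t < end_s for t in events) for events in chosen)
    return hits / len(chosen)

def is_weekday(d):
    return d.weekday() < 5  # Monday == 0 ... Friday == 4

# Made-up observations: a visit at 12:01 PM on Monday, none at the weekend.
obs = {date(2014, 1, 20): [43260], date(2014, 1, 25): [], date(2014, 1, 26): []}
noon, five_past = 12 * 3600, 12 * 3600 + 300
print(window_probability(obs, noon, five_past))              # all days: 1/3
print(window_probability(obs, noon, five_past, is_weekday))  # weekdays only: 1.0
```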

If you are interested in anything involving the relation between two events, you must define the population so the events can have that relation. For example, if you are interested in the wear on the rug between the kitchen and the living room, you can't investigate this by simulating the events of the day by drawing an event every 5 minutes at random from a population defined by "On a day selected at random, ...". In a real day, the event at 12:00-12:05 PM and the event at 12:05-12:10 PM aren't selected at random as if they were from completely different days.
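One way to keep within-day relations, offered as an illustration rather than as the method proposed in this reply, is to resample whole observed days instead of filling each 5-minute bin from a different random day. The sketch assumes each day is stored as a set of occupied 5-minute bin indices; the two-day history and function names are hypothetical.

```python
import random

BINS_PER_DAY = 24 * 60 // 5  # 5-minute bins

def sample_bins_independently(days, rng):
    """Fill each bin from a different, randomly chosen observed day.
    This ignores any relation between bins on the same real day."""
    return sorted(b for b in range(BINS_PER_DAY) if b in rng.choice(days))

def sample_whole_day(days, rng):
    """Replay one randomly chosen observed day, keeping its within-day structure."""
    return sorted(rng.choice(days))

# Hypothetical history: one kitchen visit per day, at 12:00 (bin 144) or 12:30 (bin 150).
history = [{144}, {150}]
rng = random.Random(0)
independent_counts = {len(sample_bins_independently(history, rng)) for _ in range(1000)}
whole_day_counts = {len(sample_whole_day(history, rng)) for _ in range(1000)}
print(sorted(independent_counts))  # may include 0 and 2 visits, not just 1
print(sorted(whole_day_counts))    # always exactly 1 visit
```

Replaying whole days can only produce days that were actually observed, so it trades variety for realism.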

3. Jan 25, 2014

### oneamp

Thanks, that's enough.

4. Jan 26, 2014

### FactChecker

Sure. Your approach is reasonable. If you have several days of statistics, you can calculate the standard deviation of the number of events that happen in a fixed time period. That will give you an idea of how much variation there is from day to day. One thing to think about is the non-random aspects of the events (e.g. does eating a late breakfast tend to imply eating a late lunch? Does the arrival of the morning paper rule out the arrival of an evening paper?). If you choose to ignore those relationships and assume that every event is independent of the others, your simulated day may be unrealistic (eating 6 meals in a day?). But mimicking those relationships can quickly get out of hand, so you will have to be judicious in what you assume. There are entire computer languages and systems that people use to simulate complicated things.
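A sketch of the day-to-day spread, assuming each day is a list of event times in seconds since midnight (helper names and data are hypothetical): count the events in a fixed window on each day, then take the mean and sample standard deviation with Python's `statistics` module.

```python
import statistics

def period_counts(days, start_s, end_s):
    """Number of events in [start_s, end_s) on each observed day."""
    return [sum(start_s <= t < end_s for t in events) for events in days]

def day_to_day_spread(days, start_s, end_s):
    """Mean and sample standard deviation of the per-day counts (needs >= 2 days)."""
    counts = period_counts(days, start_s, end_s)
    return statistics.mean(counts), statistics.stdev(counts)

# Made-up data: kitchen visits between 12:00 and 13:00 on three days.
days = [[43300], [43300, 44000], []]
print(day_to_day_spread(days, 12 * 3600, 13 * 3600))
```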

Last edited: Jan 26, 2014