# Given a series of outputs, how do I determine the function?

1. Apr 12, 2014

### iamsmooth

1. The problem statement, all variables and given/known data
I am a dumb programmer trying to figure out the relationship for a sequence of output. I can't seem to figure it out by guessing, so I assume there's a way to mathematically work this out.
Anyway, I wrote a program to do discrete event simulation, and for each input x I have the following outputs f(x), where f(x) is the fraction of time the system spends in a particular state. The numbers are not exactly what they should be because they're simulated, but the sample is huge, so they should be very close.

x = 1, f(x) = 0
x = 2, f(x) = 0.063
x = 3, f(x) = 0.156
x = 4, f(x) = 0.26
x = 5, f(x) = 0.366
x = 6, f(x) = 0.467

2. Relevant equations
This may have something to do with the fraction 1/4.

My system has events that spend 25% of the time doing one thing, then 75% of the time doing another.

3. The attempt at a solution

I tried guessing and checking, the only way I know how. I figured 0.25^0 = 1, 0.25^1 = 0.25, 0.25^2 = 0.0625 (which rounds to 0.063), then 0.25^3 = 0.015625, which has the same digits as 0.156 but is off by a decimal place, and finally 0.25^4 = 0.00390625, which isn't even close to anything above.

I also tried dividing each output by the previous output to look for some sort of pattern, and the ratios seem sporadic. I'm so confused >< Please point me in the right direction!

2. Apr 12, 2014

### Simon Bridge

Nope - there is no way to uniquely determine the generating function from a series of discrete samples.
You have to start out with some idea about what the function could be, then test that idea by doing some sort of fitness test.

I can fit your data really well to a quartic or a quintic polynomial.

... it sounds more like a stochastic process. If it is statistical - involving some randomness - then f(x) is not going to be smooth.

Presumably you know the generating algorithm?

Last edited: Apr 12, 2014
3. Apr 12, 2014

### iamsmooth

The algorithm is that each event will perform an action A followed by action B. Action A is performed for an exponentially distributed amount of time with a mean of 10 minutes, and B is performed for an exponentially distributed amount of time with a mean of 30 minutes.

So what I'm trying to figure out is the percentage of time the system has 2 or more events performing action A simultaneously. So when there is 1 event, the answer is 0%, because the system can't spend any time with 2 or more events doing action A. 2 events = 0.063, 3 events = 0.156, 4 events = 0.26. Eventually it approaches 100% when you have a lot of events, because it's going to be very rare that only 0 or 1 events are performing action A.

So basically that's my problem. I simulated the output and received this, and I don't know how to analyze what it means. It definitely has to do with the 0.25 but I'm really unsure because my math skills are VERY rusty (I'm planning on taking some time over the next few months to relearn some of the university-level math I learned a while back).

4. Apr 12, 2014

### Simon Bridge

What makes you so sure that 0.25 has anything to do with it?

Using a least-squares fit for a polynomial of order 5 (six coefficients), the coefficient vector is:
c = [-6.6667e-05 1.4167e-03 -1.3000e-02 6.3583e-02 -5.5933e-02 4.0000e-03]

$f(x)=c(1)x^5+c(2)x^4+c(3)x^3+c(4)x^2+c(5)x+c(6)$
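
A quick way to check this fit is to evaluate the polynomial at x = 1..6 and compare with the simulated values. The sketch below assumes the six entries of c are the coefficients of a degree-5 polynomial listed from the highest power down; the ordering is an assumption, though it is the one that gives f(1) ≈ 0. The helper name `poly_eval` is made up for illustration.

```python
# Coefficients copied from the post; assumed to be highest power first.
c = [-6.6667e-05, 1.4167e-03, -1.3000e-02, 6.3583e-02, -5.5933e-02, 4.0000e-03]

# Simulated values from the first post.
data = {1: 0.0, 2: 0.063, 3: 0.156, 4: 0.26, 5: 0.366, 6: 0.467}

def poly_eval(coeffs, x):
    """Evaluate a polynomial with Horner's rule; coeffs are highest power first."""
    result = 0.0
    for coeff in coeffs:
        result = result * x + coeff
    return result

# Print x, fitted value, and simulated value side by side.
for x, observed in data.items():
    print(x, round(poly_eval(c, x), 4), observed)
```

Six coefficients through six points is an exact interpolation, so a close match here says nothing about whether the polynomial is the "real" function.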

But I don't understand your description of the generating algorithm.
What do you mean by an "event"?
How does the "event" decide which action to perform?

5. Apr 12, 2014

### iamsmooth

Well, I am simulating events. Each event starts out doing action B for an amount of time that's exponentially distributed with a mean of 30 minutes. To get this number I compute: -(actionTime) * log(randomNumber)
where actionTime is the amount of time the event spends performing the action and randomNumber is just a randomly generated floating point number from 0 to 1.
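
For reference, the formula above is inverse-transform sampling of an exponential distribution, and the multiplier in front of the log is the mean of the resulting times. A minimal Python sketch, with a made-up helper name `exp_sample` and the assumption that the random number is uniform on (0, 1]:

```python
import math
import random

def exp_sample(mean, rng=random):
    """Draw one exponentially distributed duration with the given mean."""
    u = 1.0 - rng.random()      # rng.random() is in [0, 1), so u is in (0, 1]
    return -mean * math.log(u)  # natural log; u > 0 keeps log() defined

rng = random.Random(0)          # fixed seed so the run is repeatable
samples = [exp_sample(30.0, rng) for _ in range(100_000)]
print(sum(samples) / len(samples))  # sample mean, should land near 30
```

Python's standard library also offers `random.expovariate(1 / mean)`, which draws from the same distribution.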

Say there are 2 events. They both start out performing action B from time 0 for -(30) * log(randomNumber) minutes, and then, once that time is up, they switch to action A for -(10) * log(randomNumber) minutes. I simulate a million minutes and record the result for each input, where the input is the number of events I start the system with. Hopefully that helps?
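
The setup described here can be sketched roughly as follows. This is not the original program, only one possible reading of it: every process starts in B, alternates B and A with exponential durations, and the code accumulates the time during which two or more processes are in A. The function name, the heap-based event queue, and the shorter run length are all choices made for this illustration.

```python
import heapq
import math
import random

MEAN_A = 10.0  # mean minutes spent in action A (from the thread)
MEAN_B = 30.0  # mean minutes spent in action B (from the thread)

def exp_sample(mean, rng):
    """Exponential duration via inverse-transform sampling."""
    return -mean * math.log(1.0 - rng.random())

def fraction_two_or_more_in_a(n, total_time, seed=0):
    """Fraction of [0, total_time] with at least two of n processes in A."""
    rng = random.Random(seed)
    in_a = [False] * n  # every process starts in B
    # Heap of (time of next switch, process index).
    heap = [(exp_sample(MEAN_B, rng), i) for i in range(n)]
    heapq.heapify(heap)
    now = 0.0
    count_a = 0
    time_two_plus = 0.0
    while heap:
        t, i = heapq.heappop(heap)
        t = min(t, total_time)       # do not count time past the end
        if count_a >= 2:
            time_two_plus += t - now
        now = t
        if now >= total_time:
            break
        in_a[i] = not in_a[i]        # process i switches action
        count_a += 1 if in_a[i] else -1
        mean = MEAN_A if in_a[i] else MEAN_B
        heapq.heappush(heap, (now + exp_sample(mean, rng), i))
    return time_two_plus / total_time

for n in range(1, 7):
    print(n, round(fraction_two_or_more_in_a(n, 200_000), 3))
```

A run of 200,000 simulated minutes is shorter than the million minutes mentioned above, so the printed values are noisier than the ones in the first post.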

6. Apr 12, 2014

### haruspex

I think that what you are calling events others might call processes. You start x processes at once. Each process independently is in state A for an exponentially distributed random time, then in state B for an exponentially distributed random time, then ... returns to state A, picking a new random time, ad nauseam?
I assume the two times are independent. For all processes, the mean sojourn in state A is 10 minutes, in state B 30 minutes.
And you want to know the probability that at any instant there are two or more processes in state A. Is that it?

Assuming the system has been running long enough that it has ceased to matter whether all started in the same state, I don't see that the distribution of time periods matters. Doesn't it just come down to each independently being in state A with probability 1/4? That gives an excellent fit to your data.
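
To make that concrete: if each of x processes is independently in state A with probability p = 10/(10+30) = 1/4, the number in A is binomial, so the probability of two or more is $1-(1-p)^x-xp(1-p)^{x-1}$. A short Python sketch that compares this with the simulated numbers from the first post (the function name `prob_two_or_more` is invented for this example):

```python
from math import comb

P_A = 10 / (10 + 30)  # long-run fraction of time one process spends in A

def prob_two_or_more(n, p=P_A):
    """P(at least two of n independent processes are in state A)."""
    p_zero = (1 - p) ** n                        # nobody in A
    p_one = comb(n, 1) * p * (1 - p) ** (n - 1)  # exactly one in A
    return 1 - p_zero - p_one

# Simulated values from the first post.
simulated = {1: 0.0, 2: 0.063, 3: 0.156, 4: 0.26, 5: 0.366, 6: 0.467}

for n, observed in simulated.items():
    print(n, round(prob_two_or_more(n), 4), observed)
```

For x = 2 this reduces to p^2 = 0.0625, which is where the 0.25^2 match in the first post comes from.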

7. Apr 12, 2014

### Simon Bridge

How does that relate to the values of x - is x the number of "events"?

i.e.
We could consider an event to be a fisherman, action A is "fish" and action B is "cut bait" and x is the number of fishermen in the boat. Each starts out cutting bait, and when that's done, they start fishing for a bit - alternating fish and cut-bait?

So actionTime is 30 for action B and 10 for action A.
Is randomNumber taken from a rand() function - i.e. equally likely to be any real number between 0 and 1?

I'm still a bit confused:
You said each starts out doing action B for a time that is exponentially distributed with a mean of 30 minutes.
You then tell me that this time is determined by: Time = -actionTime*log(randomNumber)
But then you say that actionTime is the time spent doing the action.

... or is it the mean-time doing the action?
You have a reference that this is actually the mean?

You have been running the simulation to gather statistics on what exactly?
i.e. perhaps you want to know how much time out of a set period (1 million minutes, or a day, or whatever) the fishermen spend actually fishing ... you would need that to determine how many fishermen you need to hire to get a certain size catch in the time period.

See how making things concrete helps think about it?
Keeping the goals a secret will just make life harder for you.

BTW: I agree with haruspex (above)