Given a series of output, how to determine function?

In summary, the conversation discusses a programmer trying to find the relationship for a sequence of output in their program. They have simulated the output and have the numbers for a given input, representing the percentage of time the system spends in a certain state. The algorithm involves events spending a certain amount of time doing action A followed by action B, with action A having a mean of 10 minutes and action B having a mean of 30 minutes. The goal is to determine the percentage of time the system has 2 or more events performing action A simultaneously. There is a discussion about using a polynomial fit for the data and the use of randomness in the algorithm.
  • #1
iamsmooth

Homework Statement


I am a dumb programmer trying to figure out the relationship for a sequence of output. I can't seem to figure it out by guessing, so I assume there's a way to mathematically work this out.
Anyways I wrote a program to do discrete event simulation, and for a given input x I have the following numbers f(x), which represent the fraction of time the system spends in a certain state. The numbers are not exactly what they should be because it's simulated, but the sample is huge so they should be very close.

x = 1, f(x) = 0
x = 2, f(x) = 0.063
x = 3, f(x) = 0.156
x = 4, f(x) = 0.26
x = 5, f(x) = 0.366
x = 6, f(x) = 0.467

Homework Equations


This may have something to do with the fraction 1/4.

My system has events that spend 25% of the time doing one thing, then 75% of the time doing another.


The Attempt at a Solution



I tried guessing and checking, the only way I know how. I figured 0.25^0 = 1, 0.25^1 = 0.25, 0.25^2 = 0.0625 (which rounds to 0.063, matching x = 2), then 0.25^3 = 0.015625, which has the same digits as 0.156 but is off by a decimal place, and finally 0.25^4 = 0.00390625, which isn't even close to anything above.

I also tried dividing each output by the previous one to look for some sort of pattern, but the ratios seem sporadic. I'm so confused >< Please point me in the right direction!
 
  • #2
"I am a dumb programmer trying to figure out the relationship for a sequence of output. I can't seem to figure it out by guessing, so I assume there's a way to mathematically work this out."
Nope - there is no way to uniquely determine the generating function from a series of discrete samples.
You have to start out with some idea about what the function could be, then test that idea by doing some sort of fitness test.

I can fit your data really well to a quartic or a quintic polynomial.

"My system has events that spend 25% of the time doing one thing, then 75% of the time doing another."
... it sounds more like a stochastic process. If it is statistical - involving some randomness - then f(x) is not going to be smooth.

Presumably you know the generating algorithm?
 
  • #3
The algorithm is that each event performs action A followed by action B. Action A is performed for an exponentially distributed amount of time with a mean of 10 minutes, and B is performed for an exponentially distributed amount of time with a mean of 30 minutes.

So what I'm trying to figure out is the percentage of time the system has 2 or more events performing action A simultaneously. When there is 1 event, the answer is 0%, because the system can't spend any time with 2 or more events doing action A. 2 events = 0.063, 3 events = 0.156, 4 events = 0.26. Eventually it approaches 100% when you have a lot of events, because it's going to be very rare that only 0 or 1 events are performing action A.

So basically that's my problem. I simulated the output and received this, and I don't know how to analyze what it means. It definitely has to do with the 0.25 but I'm really unsure because my math skills are VERY rusty (I'm planning on taking some time over the next few months to relearn some of the university-level math I learned a while back).
 
  • #4
What makes you so sure that 0.25 has anything to do with it?

Using a least-squares fit for a polynomial of order 5 (six coefficients), the coefficient vector is:
c = [-6.6667e-05 1.4167e-03 -1.3000e-02 6.3583e-02 -5.5933e-02 4.0000e-03]

Giving your function as:
##f(x)=c(1)x^5+c(2)x^4+c(3)x^3+c(4)x^2+c(5)x+c(6)##
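As a rough check, the six coefficients can be read as a polynomial in descending powers of x (degree 5, which is an assumption about the convention used for c) and evaluated with Horner's rule. The sketch below, with the hypothetical helper name `poly`, compares the result with the simulated values and then evaluates it at x = 20, outside the sampled range.

```python
# Coefficients quoted in the post, assumed to be in descending powers of x.
c = [-6.6667e-05, 1.4167e-03, -1.3000e-02, 6.3583e-02, -5.5933e-02, 4.0000e-03]

# Simulated values from the first post.
data = {1: 0.0, 2: 0.063, 3: 0.156, 4: 0.26, 5: 0.366, 6: 0.467}


def poly(x, coeffs=c):
    """Evaluate the polynomial at x with Horner's rule."""
    result = 0.0
    for coeff in coeffs:
        result = result * x + coeff
    return result


if __name__ == "__main__":
    for x, fx in data.items():
        print(f"x={x}  simulated={fx:.3f}  polynomial={poly(x):.3f}")
    # A fraction of time must stay within [0, 1]; the polynomial need not.
    print(f"x=20  polynomial={poly(20):.1f}")
```

Six coefficients can pass through six points whatever the true function is, so agreement inside the sampled range says little about the underlying process. The value at x = 20 comes out negative, which is not a valid fraction of time.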

But I don't understand your description of the generating algorithm.
What do you mean by an "event"?
How does the "event" decide which action to perform?
 
  • #5
Well, I am simulating events. Each event starts out doing action B for an amount of time that's exponentially distributed with a mean of 30 minutes. To get this number I do: -(actionTime) * log(randomNumber)
where actionTime is the amount of time the event spends performing the action and randomNumber is just a randomly generated floating point number from 0-1.

So say there are 2 events. They both start out performing action B from time 0 for -(30) * log(randomNumber) minutes, then, once that time is up, they switch to action A for -(10) * log(randomNumber) minutes. I simulate a million minutes and receive the sequence of results, where the input is the number of events I want to start the system with. Hopefully that helps?
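A minimal sketch of the simulation as described, not the original program: the function name, the seeding and the event loop are all assumptions. Each event starts in action B, durations are drawn with -mean * log(u), and the function returns the fraction of simulated time during which 2 or more events are in action A.

```python
import math
import random


def fraction_two_or_more_in_a(n, mean_a=10.0, mean_b=30.0,
                              horizon=1_000_000.0, seed=0):
    """Fraction of [0, horizon] during which at least 2 of n events are in A."""
    rng = random.Random(seed)

    def draw(mean):
        # 1 - random() lies in (0, 1], so log() never receives zero.
        return -mean * math.log(1.0 - rng.random())

    in_a = [False] * n                            # every event starts in B
    next_switch = [draw(mean_b) for _ in range(n)]
    now = 0.0
    count_a = 0
    time_two_plus = 0.0

    while True:
        # Find the event whose current action ends first.
        i = min(range(n), key=next_switch.__getitem__)
        t = min(next_switch[i], horizon)
        if count_a >= 2:
            time_two_plus += t - now              # credit the elapsed interval
        now = t
        if now >= horizon:
            break
        # Toggle that event between A and B and schedule its next switch.
        in_a[i] = not in_a[i]
        count_a += 1 if in_a[i] else -1
        next_switch[i] = now + draw(mean_a if in_a[i] else mean_b)

    return time_two_plus / horizon


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, round(fraction_two_or_more_in_a(n, horizon=200_000.0), 3))
```

The demo uses a shorter horizon than the million minutes mentioned above only to keep the run quick. Results vary with the seed, so expect agreement with the posted numbers to a couple of decimal places rather than exactly.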
 
  • #6
I think that what you are calling events others might call processes. You start x processes at once. Each process independently is in state A for an exponentially distributed random time, then in state B for an exponentially distributed random time, then ... returns to state A, picking a new random time, ad nauseam?
I assume the two times are independent. For all processes, the mean sojourn in state A is 10 minutes, in state B 30 minutes.
And you want to know the probability that at any instant there are two or more processes in state A. Is that it?

Assuming the system has been running long enough that it has ceased to matter whether all started in the same state, I don't see that the distribution of time periods matters. Doesn't it just come down to each independently being in state A with probability 1/4? That gives an excellent fit to your data.
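That suggestion can be checked numerically. Assuming each of the x processes is independently in state A with probability p = 10/(10+30) = 1/4, the number in state A is binomial, and the chance of two or more is 1 minus the chances of zero and of exactly one: ##P(x) = 1 - (3/4)^x - x(1/4)(3/4)^{x-1}##. The function name below is made up for illustration.

```python
# Simulated values from the first post.
simulated = {1: 0.0, 2: 0.063, 3: 0.156, 4: 0.26, 5: 0.366, 6: 0.467}


def prob_at_least_two_in_a(x, p=0.25):
    """P(2 or more of x independent processes are in state A)."""
    p_none = (1.0 - p) ** x                    # nobody in A
    p_one = x * p * (1.0 - p) ** (x - 1)       # exactly one in A
    return 1.0 - p_none - p_one


if __name__ == "__main__":
    for x, fx in simulated.items():
        print(f"x={x}  simulated={fx:.3f}  binomial={prob_at_least_two_in_a(x):.4f}")
```

For x = 2 this reduces to (1/4)^2 = 0.0625, which is why the 0.25^2 guess matched the second data point and nothing else.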
 
  • #7
How does that relate to the values of x - is x the number of "events"?

i.e.
We could consider an event to be a fisherman, action A is "fish" and action B is "cut bait" and x is the number of fishermen in the boat. Each starts out cutting bait, and when that's done, they start fishing for a bit - alternating fish and cut-bait?

So actionTime is 30 for action B and 10 for action A.
Is randomNumber taken from a rand() function - i.e. equally likely to be any real number between 0 and 1?

I'm still a bit confused:
You said each starts out doing action B for a time that is exponentially distributed with a mean of 30mins.
You then tell me that this time is determined by: Time = -(actionTime) * log(randomNumber)
But then you say that actionTime is the time spent doing the action.

... or is it the mean-time doing the action?
You have a reference that this is actually the mean?
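One way to settle this empirically: if u is uniform on (0, 1], then -m * log(u) is exponentially distributed with mean m (the standard inverse-transform method), so the sample average should come out close to m. A small sketch, where `draw_time` is a hypothetical stand-in for the line in the simulation:

```python
import math
import random


def draw_time(mean, rng):
    """Exponential sample via inverse transform: -mean * log(u), u in (0, 1]."""
    return -mean * math.log(1.0 - rng.random())


def sample_mean(mean, n=100_000, seed=1):
    """Average of n draws; should approach `mean` as n grows."""
    rng = random.Random(seed)
    return sum(draw_time(mean, rng) for _ in range(n)) / n


if __name__ == "__main__":
    print(sample_mean(30.0))  # should be close to 30
    print(sample_mean(10.0))  # should be close to 10
```

If the averages land near 30 and 10, then actionTime in the formula is playing the role of the mean duration, not the duration itself.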

You have been running the simulation to gather statistics on what exactly?
i.e. perhaps you want to know how much time out of a set period (1 million minutes, or a day, or whatever) that the fishermen spend actually fishing ... you would need that to determine how many fishermen you need to hire to get a certain size catch in the time period.

See how making things concrete helps think about it?
Keeping the goals a secret will just make life harder for you.

BTW: I agree with haruspex (above)
 

1. How do I determine the function if I only have the output?

To determine the function from a series of output, you will need to observe the patterns and relationships between the inputs and outputs. Look for any common differences or ratios between the input and output values. You can also plot the points on a graph and see if there is a clear trend or shape.

2. Is it possible to determine the function if I only have a few output values?

Not uniquely. Any finite set of output values is consistent with infinitely many functions, so a few values can only test or narrow down a candidate function that you already have in mind. The more output values you have, the easier it is to identify patterns and to rule out candidates that do not fit.

3. Can I use any mathematical method to determine the function?

There are various mathematical methods that can be used to determine a function, such as linear regression, polynomial regression, and exponential regression. The best method to use will depend on the nature of the data and the type of function you are trying to find.

4. What if the function is not a simple one, like a polynomial or exponential function?

If the function is not a simple, recognizable one, you may need to use more advanced mathematical techniques or software to determine the function. This could include using calculus or machine learning algorithms to find the best fit function for the given data.

5. Is it important to know the function if I already have the output values?

Knowing the function can be beneficial in understanding the relationship between the inputs and outputs, predicting future outputs, and making informed decisions based on the data. However, in some cases, it may not be necessary to know the function if the output values provide enough information for the desired purpose.
