Transition Rate Matrix for 5 Processing Units

CTK · Jun 16, 2022

Summary: Finding the transition rate matrix for a computer with five processing units.

A computer has five processing units (PUs). The lifetimes of the PUs are independent and have the Exp(µ) law. When a PU fails, the computer tries to reconfigure itself to work with the remaining PUs. This succeeds with probability p (called the coverage factor). If the reconfiguration succeeds, the computer goes on working; if it fails, the computer crashes, i.e., it is ‘down’. If no PUs are functional, the computer cannot function, i.e., it is ‘down’. Assume reconfigurations occur instantaneously and that the computer never recovers from a whole-system crash. Let X(t) = 0 if the computer is down at time t; otherwise X(t) denotes the number of working PUs.
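To make the model concrete, here is a minimal Monte Carlo sketch of X(t), assuming only what the problem states: independent Exp(µ) lifetimes, an independent coverage draw that succeeds with probability p at each failure, instantaneous reconfiguration, and no recovery after a crash. The function `simulate_state` and its parameter names are illustrative, not part of the problem.

```python
import random
from typing import Optional


def simulate_state(t: float, mu: float, p: float, n_units: int = 5,
                   rng: Optional[random.Random] = None) -> int:
    """Sample X(t): 0 if the computer is down at time t, else the number of working PUs."""
    if rng is None:
        rng = random.Random()
    # Independent Exp(mu) lifetimes; expovariate takes the rate as its argument.
    lifetimes = sorted(rng.expovariate(mu) for _ in range(n_units))
    working = n_units
    for failure_time in lifetimes:
        if failure_time > t:
            break                  # no further failures before time t
        working -= 1
        if working == 0:
            return 0               # no PUs left: the computer is down
        if rng.random() >= p:
            return 0               # reconfiguration failed: crash, never recovers
    return working


if __name__ == "__main__":
    rng = random.Random(1)
    samples = [simulate_state(t=0.2, mu=1.0, p=0.9, rng=rng) for _ in range(10_000)]
    for k in range(6):
        print(k, samples.count(k) / len(samples))
```

Comparing these sample frequencies with exact probabilities is a handy sanity check for any transition rate matrix derived for the problem.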

Now, to find the transition rates between the states, is the following correct?

We have a state space S = {0, 1}, where 0 is down and 1 is up. Then:

transition from 0 to 0 is: 0
transition from 0 to 1 is: µ*p
transition from 1 to 0 is: µ*(1-p)
transition from 1 to 1 is: µ

And so, the transition rate matrix:

$$R = \begin{pmatrix} 0 & \mu p \\ \mu(1-p) & \mu \end{pmatrix}$$

Is that correct? I appreciate any input from anyone, thanks.

PeroK · Jun 16, 2022

Do you not have to take into account multiple processor failures over time?

The number of active processors can never increase.

CTK · Jun 16, 2022

PeroK said:

Do you not have to take into account multiple processor failures over time?

The number of active processors can never increase.

The question says that if one PU fails and can't be reconfigured, then the computer crashes.
So first of all, is my state space even correct? In S = {0, 1}, 0 is down, which means one (or more) of the PUs failed and couldn't be fixed, and 1 means all are working well OR one has failed but was reconfigured (which happens with probability p).
So I am just struggling to put all of this information into a matrix. Any help would be appreciated.

PeroK · Jun 16, 2022

I think the problem only requires a state space of up or down.

The first thing you need to do is model the CPU failure process. You have a distribution for each processor. Don't you need the probability that ##n## processors have failed at time ##t##?
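As a sketch of what that probability looks like (assuming, as the problem states, that the five lifetimes are independent Exp(##\mu##)): each PU has failed by time ##t## with probability ##1 - e^{-\mu t}##, so the number of failed PUs ##N(t)## is binomial:

$$P(N(t) = n) = \binom{5}{n}\left(1 - e^{-\mu t}\right)^{n} e^{-(5-n)\mu t}, \qquad n = 0, 1, \dots, 5.$$

This counts failures only; the coverage factor ##p## still has to be brought in to decide whether the computer is up after each failure.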

PeroK · Jun 16, 2022

Sorry, it requires you to model the number of working processors at time ##t##. Not just whether the computer is still running.

Your state space needs to cover that.
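A minimal sketch of where such a state space leads, assuming S = {0, 1, ..., 5}, with state k ≥ 1 meaning k working PUs and 0 meaning down. In state k, each working PU fails at rate µ, so the next failure comes at total rate kµ; with probability p the reconfiguration succeeds (move to k − 1), otherwise the computer crashes (move to 0). With one PU left, its failure always leads to 0, and 0 is absorbing. The name `generator_matrix` is only an illustrative choice.

```python
def generator_matrix(mu: float, p: float, n_units: int = 5) -> list[list[float]]:
    """Transition rate (generator) matrix Q for states 0..n_units.

    State 0 = computer down (absorbing); state k >= 1 = k working PUs.
    Off-diagonal Q[i][j] is the rate of jumping i -> j; each row sums to 0.
    """
    size = n_units + 1
    q = [[0.0] * size for _ in range(size)]
    for k in range(1, size):
        total_rate = k * mu                  # first of k independent Exp(mu) lifetimes
        if k == 1:
            q[1][0] = total_rate             # last PU fails: nothing left, so down
        else:
            q[k][k - 1] = total_rate * p     # reconfiguration succeeds
            q[k][0] = total_rate * (1 - p)   # reconfiguration fails: crash
        q[k][k] = -total_rate                # diagonal makes the row sum to 0
    return q


if __name__ == "__main__":
    for row in generator_matrix(mu=1.0, p=0.9):
        print(" ".join(f"{x:6.2f}" for x in row))
```

Each row summing to zero is a quick check for any generator matrix, and row 0 is all zeros because the down state is absorbing.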

CTK · Jun 16, 2022

PeroK said:

Sorry, it requires you to model the number of working processors at time ##t##. Not just whether the computer is still running.

Your state space needs to cover that.

Could you please explain a bit more what you mean? Sorry about that, but I am still struggling with this question. Thanks for your help.

PeroK · Jun 16, 2022

CTK said:

Could you please explain a bit more what you mean? Sorry about that, but I am still struggling with this question. Thanks for your help.

What does it take for all five processors to be working at time ##t##?

What does it take for precisely four processors to be working at time ##t##?
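One way to write these down, assuming the Exp(##\mu##) lifetimes and coverage factor ##p## from the problem statement. All five working means no failure by time ##t##. Exactly four working means the first failure happened at some time ##s \le t##, the reconfiguration succeeded, and the other four PUs survived until ##t## (their remaining lifetimes are again Exp(##\mu##) by memorylessness):

$$P(X(t) = 5) = \left(e^{-\mu t}\right)^{5} = e^{-5\mu t}$$

$$P(X(t) = 4) = \int_0^t 5\mu e^{-5\mu s}\, p\, e^{-4\mu (t-s)}\, ds = 5p\, e^{-4\mu t}\left(1 - e^{-\mu t}\right)$$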

CTK · Jun 16, 2022

PeroK said:

What does it take for all five processors to be working at time ##t##?

That none of them fail.

PeroK said:

What does it take for precisely four processors to be working at time ##t##?

That one fails and can't be repaired? If so, the whole computer crashes and we enter state 0, right?

But I still don't understand how that would be put into a matrix.

PeroK · Jun 16, 2022

CTK said:

That none of them fail.

Yes. You should be able to calculate that.

CTK said:

That one fails and can't be repaired? If so, the whole computer crashes and we enter state 0, right?

No. That one fails and the recovery process is successful. There is nothing about processor repair in the question.
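In rate terms (a sketch, using the standard fact that the minimum of independent exponential lifetimes is exponential with the summed rate): with all five PUs working, the first failure arrives at rate ##5\mu##, and it is covered with probability ##p##, so that rate splits between state 4 (success) and state 0 (crash):

$$\min(T_1, \dots, T_5) \sim \text{Exp}(5\mu), \qquad q_{5,4} = 5\mu p, \qquad q_{5,0} = 5\mu(1-p), \qquad q_{5,5} = -5\mu.$$

The diagonal entry is minus the total rate of leaving state 5, so the row sums to zero.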

CTK said:

But I still don't understand how that would be put into a matrix.

One step at a time.

CTK · Jun 16, 2022

PeroK said:

Yes. You should be able to calculate that.

So if nothing is failing, shouldn't that be zero?

PeroK said:

No. That one fails and the recovery process is successful. There is nothing about processor repair in the question.

So is that represented as µ*p, where µ is the lifetime (so failing means its lifetime is over) and p refers to the successful recovery of the process? Is that correct?

PeroK said:

One step at a time.

PeroK · Jun 16, 2022

CTK said:

So if nothing is failing, shouldn't that be zero?

No. There is a non-zero probability of zero failures.

CTK said:

So is that represented as µ*p, where µ is the lifetime (so failing means its lifetime is over) and p refers to the successful recovery of the process? Is that correct?

No. You have five processors, four of which have not failed and one has failed.

Also, ##\mu## is a parameter in an exponential distribution, not a probability.

I get the feeling you are perhaps a little out of your depth on this one!

I'll be offline for a couple of hours now.

CTK · Jun 16, 2022

PeroK said:

No. There is a non-zero probability of zero failures.

So is that (1-p)^5?

PeroK said:

No. You have five processors, four of which have not failed and one has failed.

Is that p*(1-p)^4?

PeroK said:

Also, ##\mu## is a parameter in an exponential distribution, not a probability.

I get the feeling you are perhaps a little out of your depth on this one!

This question is killing me.

PeroK said:

I'll be offline for a couple of hours now.

Unlucky me, haha. No worries, thanks for your help.

PeroK · Jun 16, 2022

CTK said:

So is that (1-p)^5?

No. The probability ##p## relates to the machine recovering from a processor failure.

The probability of failure is calculated from the failure distribution: in this case an exponential distribution with parameter ##\mu##.
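To keep the two probabilities apart, here is a small sketch (the function name is hypothetical). A single PU has failed by time t with probability 1 − e^(−µt), which comes from the Exp(µ) law; p enters only once per failure, as the chance that the reconfiguration succeeds. For k ≥ 1, X(t) = k requires exactly 5 − k failures by time t and all 5 − k reconfigurations to succeed; the remaining probability is that of being down.

```python
from math import comb, exp


def state_probabilities(t: float, mu: float, p: float, n_units: int = 5) -> list[float]:
    """P(X(t) = k) for k = 0..n_units, starting with all n_units PUs working.

    Assumes independent Exp(mu) lifetimes and coverage probability p per failure.
    """
    fail = 1.0 - exp(-mu * t)     # P(a given PU has failed by time t), from Exp(mu)
    survive = exp(-mu * t)        # P(a given PU still works at time t)
    probs = [0.0] * (n_units + 1)
    for k in range(1, n_units + 1):
        failures = n_units - k
        # choose which PUs failed, then require every reconfiguration to succeed
        probs[k] = comb(n_units, k) * survive**k * fail**failures * p**failures
    probs[0] = 1.0 - sum(probs[1:])   # everything else: the computer is down
    return probs


if __name__ == "__main__":
    for k, prob in enumerate(state_probabilities(t=0.5, mu=1.0, p=0.9)):
        print(f"P(X(0.5) = {k}) = {prob:.4f}")
```

One can check that these expressions satisfy the forward equations of the rate model: state k is fed by state k + 1 at rate (k + 1)µp and drained at rate kµ.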

CTK said:

Is that p*(1-p)^4?

No. As above.

CTK said:

This question is killing me.

I can see that.

CTK · Jun 16, 2022

PeroK said:

No. The probability ##p## relates to the machine recovering from a processor failure.

The probability of failure is calculated from the failure distribution: in this case an exponential distribution with parameter ##\mu##.

No. As above.

I can see that.

To be honest, I am just giving up. I really appreciate your time and effort in helping me, though; that was very nice of you. Have a great day.

PeroK · Jun 16, 2022

The exponential distribution is quite common and applies where the failure rate is constant, i.e., the chance of failing in the next instant does not depend on how long the component has already been working. It applies to radioactive decay and is often applied to light bulb and processor failures. It means that a light bulb has as much chance of failing on any given day, whether it's the first day it's used, the 100th day or the 1000th day (conditional, of course, on its surviving the first 99 or 999 days).

Then you can apply that to any number of independent light bulbs and calculate the probability that a given number are still working after a given number of days.

You need to study that theory, as it is the basis for this problem, which has the added complication of recovery from such failures.
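A small numerical illustration of both points (a sketch; the rate below is a made-up value per day, not from the thread): the memoryless property says ##P(T > s + t \mid T > s) = P(T > t)##, and with ##n## independent bulbs the number still working after ##t## days is Binomial(##n##, ##e^{-\mu t}##).

```python
from math import comb, exp


def survival(t: float, rate: float) -> float:
    """P(T > t) for an exponential lifetime T with the given rate."""
    return exp(-rate * t)


def working_count_pmf(n_bulbs: int, t: float, rate: float) -> list[float]:
    """P(exactly j of n_bulbs independent bulbs still work at time t), j = 0..n_bulbs."""
    s = survival(t, rate)
    return [comb(n_bulbs, j) * s**j * (1 - s) ** (n_bulbs - j) for j in range(n_bulbs + 1)]


if __name__ == "__main__":
    rate = 0.001  # hypothetical failure rate per day
    # Memorylessness: lasting another 100 days is equally likely
    # for a new bulb and for one that has already lasted 999 days.
    fresh = survival(100, rate)
    old = survival(999 + 100, rate) / survival(999, rate)
    print(f"new bulb: {fresh:.6f}, after 999 days: {old:.6f}")
    print(working_count_pmf(n_bulbs=5, t=1000, rate=rate))
```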

Office_Shredder · Jun 16, 2022

Let's simplify things and skip the time process at first. Can you write down the transition matrix for the discrete process where each step is a processor failure, after which the reconfiguration either succeeds or fails?
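A minimal sketch of that discrete (jump) chain, assuming states 0 (down) and 1..5 (number of working PUs): from k ≥ 2 the next failure is covered with probability p (go to k − 1) or not (go to 0); from 1 the last failure always leads to 0; 0 is absorbing, so it stays put. The function name is illustrative only.

```python
def jump_chain_matrix(p: float, n_units: int = 5) -> list[list[float]]:
    """One-step transition probabilities P[i][j], one step = one PU failure.

    State 0 = down (absorbing), state k >= 1 = k working PUs.
    """
    size = n_units + 1
    P = [[0.0] * size for _ in range(size)]
    P[0][0] = 1.0              # down is absorbing: nothing more happens
    P[1][0] = 1.0              # last PU fails: nothing left to reconfigure
    for k in range(2, size):
        P[k][k - 1] = p        # reconfiguration succeeds
        P[k][0] = 1.0 - p      # reconfiguration fails: crash
    return P
```

Adding time back in: the chain stays in state k ≥ 1 for an Exp(kµ) holding time, so multiplying row k of this matrix by kµ and putting −kµ on the diagonal gives the transition rates.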

CTK · Jun 17, 2022

PeroK said:

The exponential distribution is quite common and applies where the failure rate is constant, i.e., the chance of failing in the next instant does not depend on how long the component has already been working. It applies to radioactive decay and is often applied to light bulb and processor failures. It means that a light bulb has as much chance of failing on any given day, whether it's the first day it's used, the 100th day or the 1000th day (conditional, of course, on its surviving the first 99 or 999 days).

Then you can apply that to any number of independent light bulbs and calculate the probability that a given number are still working after a given number of days.

You need to study that theory, as it is the basis for this problem, which has the added complication of recovery from such failures.

I certainly need to look into that because I am encountering quite a lot of problems that are very similar in nature. Thanks.

CTK · Jun 17, 2022

Office_Shredder said:

Let's simplify things and skip the time process at first. Can you write down the transition matrix for the discrete process where each step is a processor failure, after which the reconfiguration either succeeds or fails?

Thanks for your input and help, I really appreciate it. But as I said earlier, I have given up on it for now and may take another look at a later time, so I will try to close this thread so that others won't waste time on it. Thanks again for your time and help.
