# Hazard Function, and how to 'turn a rate into a probability'

## Main Question or Discussion Point

So I have a question relating to how the negative exponential distribution arises when talking about probability, more specifically, when deriving a probability from a rate. I believe it involves concepts related to the hazard function, as well as Taylor series expansions. Hopefully someone can recognize what I'm trying to say and point out what assumptions I am forgetting or what concepts I need to know to better articulate the problem and solution. (Please excuse my fast and loose 'verbal' math).

My question is thus:

So we have an initial amount k0 and we wish to calculate the new amount k1 after time t. The change in amount equals the initial amount times the rate of change over the interval, λt. This is a stochastic process. So we have

k1 = k0 - k0*λt

k1 = k0(1 - λt)

So the change in amount is λt*k0. Another way of putting this: the change equals (k0 × the rate at which the event occurs exactly once per time interval t × the time interval), plus a term for the event occurring twice per interval, which is proportional to the square of the first term (since a second event can only follow a first), plus a term for three events proportional to the first two terms multiplied by the first term again, and so on. Now if we shrink t toward zero, dividing by a factor equal to the number of terms in the series [what says we have to do that, by the way??], the higher powers of the first term drop out, and so the event either occurs once during the interval or not at all. The proportion of change λt then equals the probability of the event occurring, and the complement 1 - λt is the probability of its not occurring.

Now using the geometric distribution, we can calculate the probability of the event occurring in the (n+1)th interval t after n intervals of failure: (λt)(1 - λt)^n. After substituting a variable and applying the limit (1 - x/n)^n → e^(-x) as n → ∞, we get e^(-λt). [errors?]
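A quick numerical sanity check of that limit (my own sketch, not part of the original question; the values of lam and t are arbitrary):

```python
import math

# Check that the geometric survival probability (1 - lam*t/n)**n
# approaches exp(-lam*t) as the interval is cut into n smaller pieces.
# The values of lam and t below are arbitrary choices for illustration.
lam, t = 2.0, 1.5
for n in (10, 100, 10000):
    approx = (1 - lam * t / n) ** n
    print(n, approx, math.exp(-lam * t))
```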

So a few questions here. One, I'm not sure I quite understand what I'm saying :) about the rate at which the event occurs once per time interval; does this mean that the amount would necessarily be less than 1? If it equals 1, then the series wouldn't converge. Also, do the rates for the 'higher powers' of the first term of the series necessarily have to be the same, at least if the initial rate is not the instantaneous rate? If you understand what I am trying to say and can point out the mathematical concepts I need to get there, I'd greatly appreciate it. Am I mismatching concepts from different definitions in ways that don't make sense?

(and btw, I had to rewrite this when physicsforums signed me out initially but I don't think I left anything out!)

Stephen Tashi
> So we have an initial amount k0 and we wish to calculate the new amount k1 after time t. The change in amount equals the initial amount times the rate of change over the interval, λt. This is a stochastic process.

By that description, it isn't a stochastic process unless there is something stochastic about the rate.

If you want a stochastic process, perhaps you want "the rate" to be the expected rate of decay. Then you need to define the stochastic process for which "the rate" is an average rate.

If we assume $\lambda$ is the mean number of decay events per unit time, then can we justify saying that the probability of exactly one event in the small time interval $dt$ is $\lambda dt$ ?

I think your question focuses on how to form a clear model of the situation. Are you trying to describe the traditional model of radioactive decay?

Thanks for the response.

>By that description, It isn't a stochastic process unless there is something stochastic about the rate.

Yes, that's what I meant, the expected rate; thanks for that clarification. Just to follow up on that: how can that expected rate be broken down into a series, basically of derivatives...?

Radioactive decay would be one example. But I am not just trying to form a model but understand how we get to the model, so I really don't want a derivation of radioactive decay but more an explication of what I was saying below, if it makes enough sense to do so. Really I am trying to understand what it "means" to see the negative exponential, and how one can get there from first principles, i.e. without using the natural log/exponential to start with.

Stephen Tashi
I can't see what your idea is yet, but by free-association it raises some interesting questions.

Suppose the mean number of occurrences of an event in time $\triangle t$ is $q$.

One question is "Can we compute the probability that it will happen $k$ times in the time interval $\triangle t$ by subdividing $\triangle t$ into $n$ smaller intervals of length $\delta t = \triangle t / n$ and using the binomial distribution to compute that probability?" We set the number of trials to be $n$. The mean of the binomial is $np$, and thus we set $np = q$, so the probability of success in an interval of $\delta t$ would be $q/n$.

You can't do this for an arbitrary stochastic process, but you can ask: "What kind of stochastic process would justify my doing that?".
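Here is a minimal sketch of that subdivision (my own illustration; the values q = 5 and k = 2 are arbitrary): holding the mean count q fixed while the number of trials n grows, the binomial probability of k successes settles toward a limiting value, the Poisson pmf.

```python
import math

# Binomial probability of k successes in n trials with success
# probability p; n*p is held equal to the mean count q throughout.
def binom_pmf(n, k, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

q, k = 5.0, 2  # arbitrary mean count and event count for illustration
for n in (20, 200, 20000):
    print(n, binom_pmf(n, k, q / n))
# The limiting value is the Poisson pmf exp(-q) * q**k / k!
print("Poisson:", math.exp(-q) * q ** k / math.factorial(k))
```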

The negative exponential is associated with times between events, not with numbers of events. Your title mentions the negative exponential and "hazard function". My free-association on that subject is that I've always found discussions of hazard functions disagreeable because they focus on the anti-cumulative (survival) distribution of random variables instead of the cumulative distribution. My stab at the forwards way of doing things is this:

Suppose we have a distribution of times given by the cumulative $F(t)$. We want to pretend that these times represent the elapsed time to the very first occurrence of some event. How can we simulate this process as a series of discrete random draws of a 0-or-1 that begin at time $t = 0$ and happen at intervals of $\triangle t$ until the first draw of 1 happens?

At the step that begins at time $n \triangle t$ you just use Bayes' theorem to compute the probability that a time drawn from the distribution of $F(t)$ will land in the interval $[n \triangle t, (n+1) \triangle t]$ given the condition that it didn't land in the interval $[0, n \triangle t]$. You use this result as the probability of drawing a 1 in that interval.

If we take the limit of the above calculation as $\triangle t$ approaches 0, I think we get an equivalent of the theory of "hazard functions", just phrased differently.
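A rough simulation of that recipe (a sketch of mine; the function name and parameters are assumptions, and an exponential cdf is used only to have a concrete F(t) to test against):

```python
import math
import random

# Discrete-hazard simulation: at each step, the Bernoulli success
# probability is P(time lands in [n*dt, (n+1)*dt] | it didn't land
# in [0, n*dt]), computed from the cdf F via Bayes' rule.
def first_event_time(F, dt, rng):
    n = 0
    while True:
        p = (F((n + 1) * dt) - F(n * dt)) / (1.0 - F(n * dt))
        if rng.random() < p:
            return (n + 0.5) * dt  # midpoint of the winning interval
        n += 1

lam = 2.0  # arbitrary rate for the test case
F = lambda t: 1.0 - math.exp(-lam * t)  # exponential cdf
rng = random.Random(0)
times = [first_event_time(F, 0.01, rng) for _ in range(10000)]
print(sum(times) / len(times))  # should be near 1/lam
```

For the exponential cdf the conditional step probability is constant (memorylessness), so the draw count is geometric, which is the discrete shadow of the exponential waiting time.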

> Suppose the mean number of occurrences of an event in time $\triangle t$ is $q$. [...] We set the number of trials to be $n$. The mean of the binomial is $np$ and thus we set $np = q$ so the probability of success in an interval of $\delta t$ would be $q/n$. [...] You can't do this for an arbitrary stochastic process, but you can ask: "What kind of stochastic process would justify my doing that?"
OK, this is making sense (it's a derivation of the Poisson distribution?).
I have

$\frac{n!}{k!\,(n-k)!}\,(q/n)^k\,(1-q/n)^n\,(1-q/n)^{-k}$

but I can't figure out how to properly take the limits... not sure what the limit as $n \rightarrow \infty$ of the binomial coefficient would be (1?), nor of $(1-q/n)^{-k}$, although I got that $(1-q/n)^n$ is $e^{-q}$ and $(q/n)^k$ is 1. (right?)

will try to get this before moving onto the second. I think these are basically the concepts I was trying to sort out, along with how the negative exponential arises from times between events. Thanks again.

Stephen Tashi
> OK, this is making sense (it's a derivation of the Poisson distribution?). I have
> $\frac{n!}{k!\,(n-k)!}\,(q/n)^k\,(1-q/n)^n\,(1-q/n)^{-k}$

Perhaps my question involves that expression but my question isn't about approximating a process, it's about getting exactly consistent results when you subdivide the process into other processes. Think about it this way:

Your boss tells you to write a program that simulates a process averaging 5 "events" per second. You decide to do this by drawing from a binomial distribution with parameters N = 20, p = 5/20 once per second. So, on average, you get 5 "successes" per second and you declare these to be "events".

Then your boss says, "We need a higher resolution simulation. Change the program so it simulates what happens at half-second intervals." You change the program so it draws from a binomial distribution with parameters N = 10, p = 2.5/10 every half second.

Your boss comes to you and says, "The statisticians are complaining because you changed the distribution of events. Look at these results. You kept the average number of successes per second constant, but you've done something to change how often we get 2 successes per second, and so forth."

You see that by your first program, the probability of 2 successes per second is:
Pr(S=2 in 1 second ) = $\binom{20}{2}(.25)^2 (.75)^{18}$
$= \frac{(20)(19)}{2} (.25)^2 (.75)^{18}$
$= 190 (.25)^2 (.75)^{18}$

By your second program, the probability of 2 successes per second is Pr(S =2 in 1 second)

= Pr(S = 2 in 1st half-second and S = 0 in second half-second)
+ Pr(S = 1 is 1st half-second and S = 1 in second half-second)
+ Pr(S = 0 in first half-second and S = 2 in second half-second)

$= \binom{10}{2}(.25)^2(.75)^8 \binom{10}{0}(.25)^0 (.75)^{10}$
$+ \binom{10}{1}(.25)^1 (.75)^9 \binom{10}{1}(.25)^1(.75)^9$
$+ \binom{10}{0}(.25)^0(.75)^{10} \binom{10}{2}(.25)^2 (.75)^8$

$= \frac{(10)(9)}{2} (.25)^2 (.75)^8 (.75)^{10}$
$+ \frac{10}{1} (.25)^1 (.75)^9 \frac{10}{1}(.25)^1 (.75)^9$
$+ (.75)^{10} \frac{(10)(9)}{2} (.25)^2 (.75)^8$

$= ( 45 + 100 + 45 ) (.25)^2 (.75)^{18}$

So it's the statisticians who are wrong, not your program. But were you just lucky? Will your programs give consistent answers for the probabilities of 3 events per second? Suppose you had chosen a different kind of distribution than the binomial. Would it have been possible to "cut it in half"?
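For what it's worth, the same check can be run for every k at once (a quick script of mine, not from the thread); the two programs agree exactly because a sum of independent binomials with the same p is again binomial:

```python
import math

def binom_pmf(n, k, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(k successes per second): one Binomial(20, 0.25) draw versus the
# convolution of two independent Binomial(10, 0.25) half-second draws.
for k in range(6):
    whole = binom_pmf(20, k, 0.25)
    halves = sum(binom_pmf(10, j, 0.25) * binom_pmf(10, k - j, 0.25)
                 for j in range(k + 1))
    print(k, whole, halves)
```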

Suppose that in your first program you used a uniform distribution on the integers 0 through 10 every second. The mean number of events would be (1/11)(10)(10+1)/2 = 5. The probability of getting 2 events in a second would be 1/11. Suppose you subdivide that distribution into two uniform distributions, each on the integers 0 through 5 every half second. The mean number of events would be 2.5 + 2.5 = 5, so that's OK. But the probability of getting 2 events per second is (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) = 3/36 = 1/12, so the two versions disagree.
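The uniform counter-example can be checked exactly with rational arithmetic (my sketch):

```python
from fractions import Fraction

# One uniform draw on {0, ..., 10} per second: P(2 events) = 1/11.
p_whole = Fraction(1, 11)

# Two uniform draws on {0, ..., 5}, one per half second: the outcomes
# (0,2), (1,1), (2,0) each have probability (1/6)*(1/6).
p_halves = sum(Fraction(1, 6) * Fraction(1, 6) for j in range(3))

print(p_whole, p_halves)  # 1/11 versus 1/12: the subdivided process differs
```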

> not sure what the limit as n → ∞ of the binomial coefficient would be

$\binom{n}{k} = \frac{(n)(n-1)(n-2) \cdots (n-k+1)}{k!}$, so it approaches infinity.

$\lim_{n \rightarrow \infty} (q/n)^k = 0$, not 1. My guess is that you must look at the binomial coefficient and that factor together. At any rate, it's something we can look up. I suppose we can look up the answer to my question too, but where to look is more obscure.
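Taking the two factors together does give a finite limit; a quick numeric check (my own, with arbitrary q and k) of $\binom{n}{k}(q/n)^k \rightarrow q^k/k!$:

```python
import math

# The binomial coefficient blows up and (q/n)**k shrinks to zero, but
# their product converges: C(n, k) * (q/n)**k -> q**k / k!.
q, k = 3.0, 4  # arbitrary values for illustration
for n in (10, 100, 100000):
    print(n, math.comb(n, k) * (q / n) ** k)
print("limit:", q ** k / math.factorial(k))
```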

So the only thing I can think of regarding your question, as much as I understand it, is some invocation of the central limit theorem... i.e. the more you subsample the interval, the closer the probabilities for a given event become, even between different distributions? Does that bear on the question you are asking, and if so, how?

Regarding limits: yes, I goofed there; (q/n)^k obviously limits to 0, not 1. But where do we look up these answers? Is there a book?

Also, I realized one thing that clarifies my thinking on the negative exponential:
Basically, you have to make use of the definition of expectation.
So if we have

λ = average amount of pings per time, pings occur randomly

Δt = time interval

then λΔt = the average number of pings that occur in the interval Δt, units: # pings

So if X = number of pings, with X taking values 0, 1, 2, 3, ..., then the expected value of X in an interval Δt would be the sum (from x = 0 to ∞) of

x[Pr(X=1)]^x

assuming that each ping is an independent event, so that the probability of multiple pings is simply a product of probabilities of one ping.

So now, λΔt = sum over x of x[Pr(X=1)]^x.

Then, as Δt limits to zero, we assume that we can ignore higher powers of the probability Pr(X=1), since as Δt shrinks so does Pr(X=1) in that interval. Thus we are left with

λΔt = 1·[Pr(X=1)] = Pr(X=1)

as Δt limits to 0, and thus λΔt is equivalent to the probability of 1 ping and (1-λΔt) is the probability of 0 pings, during interval Δt. We can then substitute the parameter λΔt into e.g. a geometric distribution to derive an expression for the probability of the waiting time for the first ping. Or other distributions to simulate other kinds of processes? Hopefully that's correct.
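That derivation can be spot-checked by simulation (a sketch of mine; lam and dt are arbitrary small-step parameters): run Bernoulli trials with success probability lam*dt every dt and record the waiting time to the first success, whose mean should approach 1/lam.

```python
import random

lam, dt = 2.0, 0.001  # arbitrary rate and (small) step size
rng = random.Random(1)

# Waiting time until the first success when each step of length dt
# succeeds independently with probability lam * dt.
def waiting_time():
    t = 0.0
    while rng.random() >= lam * dt:
        t += dt
    return t

times = [waiting_time() for _ in range(5000)]
print(sum(times) / len(times))  # should be near 1/lam
```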

Stephen Tashi
The CLT involves what happens to the sample mean as more and more independent samples are taken from the same distribution. My question asks about the consistency of two related, but not identical distributions.

But, back to your question. Your last post looks like an attempt to show that the waiting times between events in a Poisson process are exponentially distributed. The original post seems to ask why a Poisson distribution is a good approximation for the binomial when the probability of "success" is small. Am I interpreting your questions correctly?

Right. I understand now. For me the key step was how the expectation for the number of successes in a subinterval is, for a very short subinterval, the same as the probability of 1 success over that subinterval.

λe^(-λt)dt is the function obtained if we let Δt → dt, or the number of trials n → ∞, in the geometric distribution with n trials. Since λ is a constant for a given process, this is a function only of time t. Thus it describes the probability of the waiting time until the first success lying in the interval "dt". This fulfills the requirements of a probability density function (Pr[a <= X <= b] = integral from a to b of f(x)dx). Since for a probability density function the "dt" is implied, we can drop the dt, and λe^(-λt) is left as the pdf. Or something along those lines.

If λe^(-λt) can be understood as the pdf for the first success after failure through the interval t, then the probability of n successes during time t, event order not important, is the limit of the binomial with p = λt/n as the number of Bernoulli trials n → ∞. Which is the Poisson distribution, as in your original reply.

I still need to take a look at what you are saying about a 'hazard function' derivation using Bayes' theorem, but hopefully the above is correct.

I don't really have anything to add about your question about the consistency of two related but not identical sample distributions. However, here are a couple of papers that may come at your question from a different direction, or at least might be of interest:
http://arxiv.org/abs/0906.3507
http://www.mdpi.com/1099-4300/12/3/289

Edit: they probably are unrelated to your point, but if you do come up with anything I'd like to hear it.
