# P(X=x) in continuous distributions

1. Oct 26, 2016

### mertcan

Hi. I am aware that for continuous distributions, P(X=x) always equals zero, but when I look at some derivations (see the attachment), I see that for an exponential variable they use the exponential pdf when they want to find P(X1=x). My question is: if we say that P(X=x) always equals zero for continuous distributions, why does P(X1=x) equal the exponential pdf here? Why is it not 0?

Last edited: Oct 26, 2016
2. Oct 26, 2016

### FactChecker

There is no P(X1=x) in the calculations. There are probabilities of other things given X1=x. That is different. Although the probability of a continuous random variable equaling any pre-specified value is zero, a sample of the variable must have a value, say x0. Given that the value of X is x0, it is legitimate to draw conclusions regarding other related probabilities.
As a simple example, suppose X is a uniform random variable on [0, 10]. Define the related random variable Y = 0 if X < 1 and Y = 1 if X ≥ 1. Then even though P(X = π) = 0, we can legitimately say that P(Y = 1 | X = π) = 1.
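The point can also be checked numerically. Since conditioning exactly on X = π is a measure-zero event, the Monte Carlo sketch below (an illustration of my own, not part of the reply; the window half-width `eps` and the sample count are arbitrary choices) conditions on X landing in a narrow window around π:

```python
import random

random.seed(0)

# X is uniform on [0, 10]; Y = 0 if X < 1, else Y = 1.
# P(X = pi) is zero, but conditioning on X landing in a narrow
# window around pi still makes sense, and there P(Y = 1 | ...) = 1.
PI = 3.141592653589793
eps = 0.01                      # half-width of the conditioning window (arbitrary)
hits = 0                        # samples falling in the window
ones = 0                        # samples in the window with Y = 1
for _ in range(1_000_000):
    x = random.uniform(0.0, 10.0)
    if abs(x - PI) < eps:
        hits += 1
        y = 0 if x < 1 else 1
        ones += y

print(hits, ones / hits)
```

Every sample that lands near π satisfies X ≥ 1, so the conditional frequency of Y = 1 is exactly 1, even though the unconditional event {X = π} has probability zero.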

Last edited: Oct 26, 2016
3. Oct 28, 2016

### Stephen Tashi

In doing calculations, it is often possible to think informally of a continuous density function f(x) as giving a probability P(X = x), even though it does not. This invalid way of thinking is helpful for remembering certain formulas, even though it doesn't prove them.

The derivation you showed didn't quote any theorems to justify its steps. So let's ask how your text justified the first step:

$P\{ X_1 < X_2\} = \int_{0}^\infty P\{X_1 < X_2| X_1 = x \} \lambda_1 e^{-\lambda_1 x} dx$

If the author of your text is very careful, you will be able to find a theorem or definition that justifies this step. If the author is giving an informal treatment of probability, then he may expect the reader to justify the first step by thinking:

$P\{ X_1 < X_2\} = \int_{0}^\infty P\{X_1 < X_2| X_1 = x \} P\{X_1=x\} dx$
$= \int_{0}^\infty P\{X_1 < X_2| X_1 = x \}\lambda_1 e^{-\lambda_1 x} dx$

This informal way of thinking applies several ideas. One idea is that the probability of an event can be computed by partitioning it into mutually exclusive events and adding up the probabilities of the events in the partition. So we partition the event $\{X_1 < X_2\}$ by considering all possible values of $X_1$. Informally, we are thinking of the set $\{X_1 < X_2\}$ as a union of sets like:

$\{X_1 < X_2\} =( \{X_1 < X_2\}\cap\{X_1 = 0.3\}) \cup ( \{X_1 < X_2\}\cap\{ X_1 = 0.62\} ) \cup (\{X_1 < X_2\} \cap \{X_1 = 5.7\}) \cup ....$

$P\{X_1 < X_2\} = P ( \{X_1 < X_2\}\cap\{X_1 = 0.3\}) + P( \{X_1 < X_2\}\cap\{ X_1 = 0.62\}) + P (\{X_1 < X_2\} \cap \{X_1 = 5.7\} ) + ....$

(Since there are as many possible values of $X_1$ to consider as there are non-negative real numbers, we can't effectively express these thoughts using a "$...$" notation.)

The next idea is applying the theorem $P\{A \cap B\} = P\{A | B\} P\{B\}$ to each term in the sum, obtaining:

$P\{X_1 < X_2\} = P \{X_1 < X_2| X_1 = 0.3\}P\{X_1 = 0.3\} + P \{X_1 < X_2| X_1 = 0.62\}P\{ X_1 = 0.62\} + P \{X_1 < X_2| X_1 = 5.7\} P\{ X_1 = 5.7\} + ....$

Next we use the hazy notion that "An integral is an infinite sum", obtaining

$P\{X_1 < X_2\} = \int_0^\infty P\{X_1 < X_2| X_1 = x\} P\{X_1 = x\} dx$

Using the misinterpretation $P \{X_1 = x\} = \lambda_1 e^{-\lambda_1 x}$ we get the first step shown in your text:
$P\{X_1 < X_2\}= \int_{0}^\infty P\{X_1 < X_2| X_1 = x \}\lambda_1 e^{-\lambda_1 x} dx$
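As an aside (a sketch of my own, not from the text): this last integral evaluates in closed form to $\lambda_1/(\lambda_1 + \lambda_2)$, since $P\{X_1 < X_2 | X_1 = x\} = e^{-\lambda_2 x}$ for independent exponentials, and a quick Monte Carlo with arbitrarily chosen rates agrees:

```python
import random

random.seed(0)

# For independent exponentials, P(X1 < X2) = lam1 / (lam1 + lam2);
# the integral above evaluates to exactly that.
lam1, lam2 = 2.0, 3.0           # arbitrary example rates
n = 200_000
count = 0
for _ in range(n):
    x1 = random.expovariate(lam1)
    x2 = random.expovariate(lam2)
    if x1 < x2:
        count += 1

estimate = count / n
exact = lam1 / (lam1 + lam2)    # = 0.4 for these rates
print(estimate, exact)
```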

This way of thinking is analogous to the type of play known as a "comedy of errors": many mishaps lead to a happy conclusion.

It is valid to think of a set as being partitioned into as many subsets as there are real numbers. Instead of a "$...$" notation we should use notation like:

$\{X_1 < X_2\} = \cup_{x \in [0,\infty)} ( \{X_1 < X_2\} \cap \{X_1 = x \})$

However, it is not valid to apply the idea that "The probability of an event is the sum of the probabilities of the mutually exclusive events that partition it" to such a partition. When we study formal probability theory in terms of "measure theory" we use axioms that apply this idea only to certain partitions composed of a countably infinite or finite number of subsets. (For example, the length of [0,1] is not the sum of the "lengths" of each point in [0,1]. But we can say the length of [0,1] = length of [0,1/2) + length of [1/2, 3/4) + length of [3/4, 7/8) + ... .)
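The countable version of this is easy to check numerically: the lengths of the pieces $[0,1/2), [1/2,3/4), [3/4,7/8), \dots$ are $1/2, 1/4, 1/8, \dots$, and their partial sums approach 1 (a small illustrative script of my own):

```python
# Lengths of the countable partition [0,1/2), [1/2,3/4), [3/4,7/8), ...
# are 1/2, 1/4, 1/8, ...; countable additivity says they sum to 1.
total = 0.0
left = 0.0
for k in range(1, 60):
    right = 1.0 - 0.5 ** k      # right endpoints: 1/2, 3/4, 7/8, ...
    total += right - left       # length of the half-open piece [left, right)
    left = right

print(total)
```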

The hazy idea that "An integral is an infinite sum" doesn't do justice to the concept of an integral. An integral can be viewed as a limit of sums, but it is not simply a sum of the values of a function. For example, $\int_0^1 f(x) dx$ isn't $\lim_{n\rightarrow \infty} \sum_{i = 0}^n f( i/n)$. The sums involved in an integral use the values of the function multiplied by another factor, e.g. $\sum_{i=0}^n f(i/n) (1/n)$.
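To illustrate the difference with a concrete example of my own, take $f(x) = x^2$ on $[0,1]$: summing bare function values grows without bound as $n$ increases, while the Riemann sum $\sum f(i/n)(1/n)$ converges to $\int_0^1 x^2\, dx = 1/3$:

```python
# For f(x) = x**2 on [0, 1]: summing bare function values grows with n,
# while the Riemann sum f(i/n) * (1/n) converges to the integral 1/3.
def f(x):
    return x * x

n = 100_000
bare_sum = sum(f(i / n) for i in range(n))             # ~ n / 3, not an integral
riemann = sum(f(i / n) * (1.0 / n) for i in range(n))  # -> 1/3 as n grows

print(bare_sum, riemann)
```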

If we wanted to straighten out the wrong steps in the above comedy of errors, we could improve them by using the type of "$dx$" reasoning used in physics. That type of reasoning does not give a rigorous proof, but it is fairly reliable in deducing correct conclusions. If we applied that type of reasoning, you would find that we are only dealing with probabilities like $P\{ x - dx/2 < X_1 \leq x + dx/2 \}$, and we are approximating such a probability by $\lambda_1 e^{-\lambda_1 x} dx$. We would not be considering the event $\{X_1 = x\}$.
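That approximation is easy to check against the exponential CDF $F(t) = 1 - e^{-\lambda t}$ (a numerical sketch of my own; the values of $\lambda$, $x$, and $dx$ are arbitrary):

```python
import math

# Exponential CDF: F(t) = 1 - exp(-lam * t) for t >= 0.
lam = 2.0                        # arbitrary rate
x = 0.7                          # arbitrary point
dx = 1e-4                        # small window width

def cdf(t):
    return 1.0 - math.exp(-lam * t)

# Exact probability of the small window around x ...
exact_prob = cdf(x + dx / 2) - cdf(x - dx / 2)
# ... versus the "dx reasoning" approximation pdf(x) * dx.
approx = lam * math.exp(-lam * x) * dx

print(exact_prob, approx)
```

The two agree to high relative accuracy, and the agreement improves as dx shrinks.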

It would be interesting to carry out the proof in your text using the "$dx$" type of reasoning, but it's late at night and I still have other things I need to do!

4. Oct 28, 2016

### mertcan

@Stephen Tashi thanks for your remarkable and nice answer, I am so glad. But I am really eager to examine the derivation, so I hope you can provide the proof with the "$dx$" type of reasoning if you have plenty of time.