georg gill said:
I don't get $$\frac{P[x<X<x+dx|N=n]}{dx}=f_{X|N}(x|n)$$ Can someone derive why?
The attachment you gave involves taking a limit. The attachment's use of the notation "##(x|n)##" as the argument of a function is often seen, but somewhat confusing. The usual notation for a function of two variables, "##(x,n)##", would be satisfactory. Some people prefer to use ##(x|n)## to emphasize that a function gives a conditional probability. If we use "##f_{(X,N)}##" to denote the joint density for ##(X,N)## and "##f_{X|N}##" to denote the conditional density of ##X## given ##N##, then we don't need the notation "##(x|n)##" in the argument.
A correct notation for the equation you ask about is: ## f_{X|N}(x,n) = \lim_{dx \rightarrow 0} \frac{P(x < X < x + dx| N= n)}{dx}##
This equation is true "by definition" when we use the concept of a conditional density that (I'm guessing) your text materials use. However, unless the author of your text has been very ingenious, the definition of conditional density in the text may not include the above equation as a special case.
Suppose we have a vector ##X## of random variables whose outcomes are in a set ##S##. Let ##A## be a subset of ##S##. The general concept of the conditional density ##f_{X|A}## is that ##f_{X|A}(x)## is the limit of ##P(X \in w \mid X \in A)##, divided by the size (length, area, volume) of ##w##, as the set ##w## "approaches" the single vector ##x##. It is complicated to give a precise definition for a limit where a set ##w## is said to "approach" a single point ##x##. I don't know how much detail your text materials attempt. The equation you ask about involves the special situation where the vector of variables is ##(X,N)## and the set ##A## is ##\{(x,k):-\infty < x < \infty,\ k = n\}##.
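The limiting definition above can be sanity-checked numerically. Below is a minimal Monte Carlo sketch using a hypothetical mixed model of my own choosing, not anything from the thread: ##N## is Bernoulli(1/2), and given ##N = n##, ##X## is normal with mean ##n## and standard deviation 1. The frequency estimate of ##P(x < X < x + dx \mid N = n)/dx## for a small ##dx## lands near the known conditional density.

```python
import math
import random

# Hypothetical mixed model (my choice, not from the thread):
# N ~ Bernoulli(1/2), and given N = n, X ~ Normal(mean = n, sd = 1).
random.seed(0)

def sample():
    n = 1 if random.random() < 0.5 else 0
    return random.gauss(n, 1.0), n

samples = [sample() for _ in range(200_000)]

def cond_density_estimate(x, n, dx=0.05):
    """Frequency estimate of P(x < X < x + dx | N = n) / dx."""
    xs_given_n = [xx for xx, nn in samples if nn == n]
    hits = sum(1 for xx in xs_given_n if x < xx < x + dx)
    return (hits / len(xs_given_n)) / dx

def normal_pdf(x, mu):
    return math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

x, n = 0.3, 1
est = cond_density_estimate(x, n)
exact = normal_pdf(x + 0.025, mu=n)  # density near the middle of the interval
print(est, exact)  # agree to within Monte Carlo error
```

Shrinking ##dx## further (with correspondingly more samples) tightens the agreement, which is exactly what the limit in the definition asserts.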
I would believe that $$f_{X|N}(x|n)=\frac{f(x,N)}{p_n(N)}$$ but I don't get how that would be the same.
A correct notation for that is ## f_{X|N}(x,n)=\frac{f(x,n)}{p_N(n)}## with the understanding that ##f## is ##f_{(X,N)}##, the joint density of ##(X,N)##.
I assume you are using "##p_N##" to denote the marginal probability mass function of ##N##.
You probably believe that equation for the same reason that I believe it. It can be (incorrectly) deduced by thinking about a density function ##f_X(x)## as representing "the probability that ##X = x##". This way of thinking is technically incorrect, since a "probability density" is not a "probability". (By analogy, a rod with a mass density of 5 kg/meter does not have a mass of 5 kg located at each single point ##x##. Each point has zero mass, but the mass density at each point is 5 kg/meter.)
Nevertheless, this incorrect way of thinking often leads to correct conclusions, so it is useful as an intuitive way of understanding formulae.
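As a concrete illustration of the density-vs-probability distinction (my example, not from the thread): for ##X## uniform on ##[0,1]##, the interval probability divided by the interval length approaches the density value 1, while the probability of any single value is exactly zero.

```python
# My example, not from the thread: X ~ Uniform(0, 1) has density 1 on the
# interval, yet "the probability that X = x" is zero for every single x.
def p_interval(a, b):
    """Exact P(a < X < b) for X ~ Uniform(0, 1): length of overlap with [0, 1]."""
    lo, hi = max(a, 0.0), min(b, 1.0)
    return max(hi - lo, 0.0)

x = 0.4
for dx in (0.1, 0.01, 0.001):
    print(p_interval(x, x + dx) / dx)  # approaches the density value 1

print(p_interval(x, x))  # 0.0 -- a single point carries no probability
```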
Attempting to reason correctly, begin with the definition
eq. 1) ## f_{X|N}(x,n) = \lim_{dx \rightarrow 0} \frac{P(x < X < x + dx| N= n)}{dx}##
We can set ##P(x < X< x + dx \mid N=n)## equal to ##\frac{ P( (X,N) \in ([x,x+dx],n) )} {P( (X,N) \in ((-\infty,\infty),n) )}## by definition of a conditional probability.
(This is an application of the definition ##P(A|B) = \frac{ P(A \cap B)}{ P(B)}## with ##A= \{(y,k): x < y < x + dx, k = n\} ## and ##B = \{(y,k): -\infty < y < \infty, k=n\}##.)
The right hand side of eq. 1 becomes: ## \lim_{dx \rightarrow 0} \frac{ P( (X,N) \in ([x,x+dx],n))} {dx\; P( (X,N) \in ((-\infty,\infty),n))} ##.
The factor ##P((X,N) \in ((-\infty,\infty),n))## in the denominator does not depend on ##dx##, so for the purposes of taking the limit it is a constant. Its value is, by definition, the marginal probability ##p_N(n)##. The limit of ##\frac{P((X,N) \in ([x,x+dx],n))}{dx}## is, by definition, the joint density ##f_{(X,N)}(x,n)##. Hence the limit equals ##\frac{f_{(X,N)}(x,n)}{p_N(n)}##.
Of course, to justify the limit of the numerator by material in your text, we would have to know how your text defines a joint density for a vector containing both continuous and discrete random variables. In terms of the general concept described above, to let the set ##w## "approach" a single value ##x## we set the discrete random variables equal to the values they have in ##x## and take the limit as the continuous variables approach the values specified in ##x##. So the joint density defined as ##\lim_{ (dx,dn) \rightarrow (0,0) } \frac{P( x < X < x + dx,\ n \le N < n + dn)}{dx} ## is evaluated by taking ##\lim_{dx \rightarrow 0} \frac{P( x < X < x + dx,\ N =n)}{dx}##.
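The chain of equalities (conditional probability equals joint over marginal) can also be checked at the level of empirical frequencies, where it holds exactly because the same counting identity ##P(A \mid B) = P(A \cap B)/P(B)## holds for counts. The sketch below again uses a hypothetical model of my own choosing (##N## Bernoulli(1/2), ##X \mid N = n## normal with mean ##n##).

```python
import random

# Hypothetical mixed model (my choice, not from the thread):
# N ~ Bernoulli(1/2), and given N = n, X ~ Normal(mean = n, sd = 1).
random.seed(1)
samples = []
for _ in range(200_000):
    n = 1 if random.random() < 0.5 else 0
    samples.append((random.gauss(n, 1.0), n))

x, n, dx = 0.3, 1, 0.05
total = len(samples)

# "Joint density" estimate: P(x < X < x + dx, N = n) / dx
hits = sum(1 for xx, nn in samples if nn == n and x < xx < x + dx)
joint = hits / total / dx

# Marginal probability p_N(n) = P(N = n)
count_n = sum(1 for _, nn in samples if nn == n)
p_n = count_n / total

# Conditional density estimated directly: P(x < X < x + dx | N = n) / dx
cond = hits / count_n / dx

print(joint / p_n, cond)  # equal up to rounding: P(A|B) = P(A and B) / P(B)
```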
And I don't get that $$\frac{P[x<X<x+dx|N=n]}{dx}=\frac{P[N=n|x<X<x+dx]}{P[N=n]}\frac{P[x<X<x+dx]}{dx}$$
As
@StoneTemplePython pointed out, the general idea is ##P(A \cap B) = P(A|B)P(B) = P(B|A) P(A)##
So ##P(A|B) = \frac{P(B|A)}{P(B)} P(A) ##. Apply this with ##A = \{(y,k): x < y < x + dx,\ 0 \le k < \infty\},\ B=\{(y,k): -\infty < y < \infty,\ k = n\}##.
Then divide both sides of the equation by ##dx##.
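The identity divided by ##dx## can likewise be verified on empirical frequencies, where it holds exactly for the same counting reason. Again the model is a hypothetical one of my own choosing (##N## Bernoulli(1/2), ##X \mid N = n## normal with mean ##n##), not anything from the thread.

```python
import random

# Hypothetical mixed model (my choice, not from the thread):
# N ~ Bernoulli(1/2), and given N = n, X ~ Normal(mean = n, sd = 1).
random.seed(2)
samples = []
for _ in range(200_000):
    n = 1 if random.random() < 0.5 else 0
    samples.append((random.gauss(n, 1.0), n))

x, n, dx = 0.3, 1, 0.05
slab = [(xx, nn) for xx, nn in samples if x < xx < x + dx]  # event x < X < x + dx
count_n = sum(1 for _, nn in samples if nn == n)            # event N = n

# Left side: P(x < X < x + dx | N = n) / dx
lhs = (sum(1 for _, nn in slab if nn == n) / count_n) / dx

# Right side: [P(N = n | x < X < x + dx) / P(N = n)] * [P(x < X < x + dx) / dx]
p_n_given_slab = sum(1 for _, nn in slab if nn == n) / len(slab)
p_n = count_n / len(samples)
p_slab = len(slab) / len(samples)
rhs = (p_n_given_slab / p_n) * (p_slab / dx)

print(lhs, rhs)  # equal up to rounding: P(A|B) = P(B|A) P(A) / P(B), divided by dx
```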
It is tedious to use the methods of calculus in a logically precise manner to deal with a mix of continuous and discrete random variables. It's even more tedious to deal with a single random variable that has both continuous and discrete values. (e.g. Define the random variable ##Y## by: flip a fair coin. If the coin lands heads then ##Y = 0.5##. If the coin lands tails then pick ##Y## from a uniform distribution on ##[0,1]##.) As
@StoneTemplePython mentioned, a typical education in probability theory begins by treating continuous and discrete random variables separately. Then it jumps to the advanced topic of measure theory. The great generality of measure theory can then be used to handle a mixture of discrete and continuous random variables as a particular case.