Undergrad Rewriting of equality in conditional probability distribution

Summary:
The discussion centers on deriving the equality $$\frac{P[x<X<x+dx|N=n]}{dx}=f_{X|N}(x|n)$$ and clarifying the relationship between conditional probability distributions. Participants highlight that this expression is a limit definition of conditional density, emphasizing the need for careful handling of continuous and discrete cases. The conversation also touches on the application of Bayes' Rule and the challenges of mixing different types of random variables, suggesting that measure theory may be necessary for rigorous treatment. Overall, the discussion reveals the complexities involved in understanding conditional probability distributions and the notation used in their representation.
georg gill
[Attachment: bank stat.png]
I don't get $$\frac{P[x<X<x+dx|N=n]}{dx}=f_{X|N}(x|n)$$ Can someone show how to derive it? I would believe that $$f_{X|N}(x|n)=\frac{f(x,N)}{p_n(N)}$$ but I don't get how that would be the same. And I don't get that $$\frac{P[x<X<x+dx|N=n]}{dx}=\frac{P[N=n|x<X<x+dx]}{P[N=n]}\frac{P[x<X<x+dx]}{dx}$$

Can someone show how to rewrite that equality this way?
 

StoneTemplePython
I haven't seen things shown this way before, but a couple of thoughts.

1.) You realize they are just trying to extend Bayes' Rule, right? I.e., the bottom line in your image, after you clear the denominator, is

##p_N(n) f_{X|N}(x|n) = f_X(x) p_{N|X}(n|x)##

which is the mixed form of Bayes' Rule.
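If it helps to see it with numbers, here is a minimal Monte Carlo sketch of that identity. The model is my own toy example, not the one from your attachment: ##X \sim \mathrm{Uniform}(0,1)## and ##N \mid X = x \sim \mathrm{Bernoulli}(x)##, so that ##f_X(x) = 1##, ##p_{N|X}(1|x) = x##, and ##p_N(1) = 1/2##.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 10_000_000

# Toy model (my assumption, not from the problem):
# X ~ Uniform(0,1) is continuous, N | X=x ~ Bernoulli(x) is discrete.
X = rng.uniform(0.0, 1.0, M)
N = rng.uniform(0.0, 1.0, M) < X          # N = 1 with probability x

x, dx = 0.7, 0.01

# Left side: p_N(1) * f_{X|N}(x|1), with the conditional density
# estimated from a thin slab of the N = 1 slice.
p_n = np.mean(N)
f_cond = np.mean((X[N] > x) & (X[N] < x + dx)) / dx

# Right side: f_X(x) * p_{N|X}(1|x); here f_X = 1 and p_{N|X}(1|x) = x.
print(p_n * f_cond, 1.0 * x)              # both ~ 0.7
```

Both sides agree up to Monte Carlo noise and an ##O(dx)## discretization error.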

2.) The point of this problem is that in general people get antsy when you mix continuous and discrete cases. One of two approaches then gets used -- either extensive use of measure theory, or working at the level of the CDF. Your problem is doing the latter, though maybe it isn't so easy to tell. The notation ##P\{x \lt X \lt x + dx\vert N = n \}## is rather hard for me to interpret (for starters, it appears that only the dx is being conditioned on), but what I think they are actually doing is ##F_{X|N}(x + \delta \mid n) - F_{X|N}(x \mid n)## for some small ##\delta \gt 0##. That is, they are working at the CDF level and looking at some small chunk of probability. To avoid scaling issues, you divide out this ##\delta## and get

##\frac{F_{X|N}(x + \delta \mid n) - F_{X|N}(x \mid n)}{\delta}##.

From here the interest is in what happens for very small ##\delta##, so they pass to the limit ##\delta \rightarrow 0##. You should recognize this as a difference quotient for a derivative, and recall that when you differentiate a CDF you recover the underlying PDF. Since it is a conditional CDF, you recover a conditional PDF.
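For a concrete picture of that difference quotient converging, here is a tiny numeric sketch with an Exponential(1) CDF standing in for the conditional CDF (the distribution is my choice, not the problem's):

```python
from math import exp

# For Exponential(1), F(x) = 1 - exp(-x), so the PDF is F'(x) = exp(-x).
F = lambda x: 1.0 - exp(-x)

x = 1.0
for delta in (0.1, 0.01, 0.001, 1e-6):
    # Difference quotient of the CDF over a shrinking chunk of probability.
    print(delta, (F(x + delta) - F(x)) / delta)   # -> exp(-1) ~ 0.36788
```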

I don't really like the way this is shown in your text though. Hopefully the above is helpful.
 
georg gill said:
I don't get $$\frac{P[x<X<x+dx|N=n]}{dx}=f_{X|N}(x|n)$$ Can someone show how to derive it?

The attachment you gave involves taking a limit. The attachment's use of the notation "##(x|n)##" as the argument of a function is often seen, but somewhat confusing. The usual notation for a function of two variables, "##(x,n)##", would be satisfactory. Some people prefer to use ##(x|n)## to emphasize that a function gives a conditional probability. If we use "##f_{(X,N)}##" to denote the joint density for ##(X,N)## and ##f_{X|N}## to denote the conditional density of ##X## given ##N##, then we don't need the notation "##(x|n)##" in the argument.

A correct notation for the equation you ask about is: ## f_{X|N}(x,n) = \lim_{dx \rightarrow 0} \frac{P(x < X < x + dx| N= n)}{dx}##

This equation is true "by definition" when we use the concept of a conditional density that (I'm guessing) your text materials use. However, unless the author of your text has been very ingenious, the definition of conditional density in the text may not include the above equation as a special case.

Suppose we have a vector ##X## of random variables whose outcomes are in a set ##S##. Let ##A## be a subset of ##S##. The general concept of the conditional density ##f_{X|A}## is that ##f_{X|A}(x)## is the limit of ##P(X \in w \mid X \in A)##, divided by the size of ##w##, as the set ##w## "approaches" the single vector ##x##. It is complicated to give a precise definition for a limit where a set ##w## is said to "approach" a single member ##x## of the set. I don't know how much detail your text materials attempt. The equation you ask about involves the special situation where the vector of variables is ##(X,N)## and the set ##A## is ##\{(x,k): -\infty < x < \infty,\ k = n\}##.
georg gill said:
I would believe that $$f_{X|N}(x|n)=\frac{f(x,N)}{p_n(N)}$$ but I don't get how that would be the same.

A correct notation for that is ## f_{X|N}(x,n)=\frac{f(x,n)}{p_N(n)}## with the understanding that ##f## is ##f_{(X,N)}##, the joint density of ##(X,N)##.

I assume you are using "##p_N##" to denote the marginal probability mass function of ##N##.

You probably believe that equation for the same reason that I believe it. It can be (incorrectly) deduced by thinking of a density function ##f_X(x)## as representing "the probability that ##X = x##". This way of thinking is technically incorrect, since a "probability density" is not a "probability". (By analogy, a rod that has a mass density of 5 kg/meter does not have a mass of 5 kg located at each single point ##X = x##. Each point has zero mass, but the mass density at each point is 5 kg/meter.)

Nevertheless, this incorrect way of thinking often leads to correct conclusions, so it is useful as an intuitive way of understanding formulae.

Attempting to reason correctly, begin with the definition
eq. 1) ## f_{X|N}(x,n) = \lim_{dx \rightarrow 0} \frac{P(x < X < x + dx| N= n)}{dx}##
We can set ##P(x < X< x + dx \mid N=n)## equal to ##\frac{ P( (X,N) \in ([x,x+dx],n) )}{P( (X,N) \in ((-\infty,\infty),n) )}## by definition of a conditional probability.
(This is an application of the definition ##P(A|B) = \frac{ P(A \cap B)}{ P(B)}## with ##A= \{(y,k): x < y < x + dx, k = n\} ## and ##B = \{(y,k): -\infty < y < \infty, k=n\}##.)

The right hand side of eq. 1 becomes: ##\lim_{dx \rightarrow 0} \frac{ P( (X,N) \in ([x,x+dx],n) )}{dx\, P( (X,N) \in ((-\infty,\infty),n) )}##.
The factor ##P((X,N) \in ((-\infty,\infty),n))## in the denominator does not depend on ##dx##, so for the purposes of taking the limit it is a constant; its value is, by definition of a marginal, ##p_N(n) = P(N=n)##. The limit of ##P((X,N) \in ([x,x+dx],n))/dx## is, by definition, the joint density ##f_{(X,N)}(x,n)##.

Of course, to justify the limit of the numerator by material in your text, we would have to know how your text defines a joint density for a vector containing both continuous and discrete random variables. In terms of the general concept described above, to let the set ##w## "approach" a single value ##x## we set the discrete random variables equal to the values they have in ##x## and take the limit of the continuous variables as they approach the values specified in ##x##. So the joint density defined as ##\lim_{(dx,dn) \rightarrow (0,0)} \frac{P( x < X < x + dx,\ n \le N < n + dn)}{dx}## is evaluated by taking ##\lim_{dx \rightarrow 0} \frac{P( x < X < x + dx,\ N = n)}{dx}##.
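As a numeric sanity check on this derivation, here is a small simulation using the same toy model as the sketch in the earlier reply (##X \sim \mathrm{Uniform}(0,1)##, ##N \mid X = x \sim \mathrm{Bernoulli}(x)##, again my own example, not the problem's): the slice estimate of the conditional density and the ratio of the joint density to ##p_N(n)## should agree.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 10_000_000

# Hypothetical toy model: X ~ Uniform(0,1), N | X=x ~ Bernoulli(x).
X = rng.uniform(0.0, 1.0, M)
N = rng.uniform(0.0, 1.0, M) < X

x, dx = 0.7, 0.01
in_slab = (X > x) & (X < x + dx)

joint = np.mean(in_slab & N) / dx        # ~ f_{(X,N)}(x, 1) = x
marginal = np.mean(N)                    # ~ p_N(1) = 1/2
conditional = np.mean(in_slab[N]) / dx   # ~ f_{X|N}(x, 1) = 2x

print(joint / marginal, conditional)     # both ~ 1.4
```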
georg gill said:
And I don't get that $$\frac{P[x<X<x+dx|N=n]}{dx}=\frac{P[N=n|x<X<x+dx]}{P[N=n]}\frac{P[x<X<x+dx]}{dx}$$

As @StoneTemplePython pointed out, the general idea is ##P(A \cap B) = P(A|B)P(B) = P(B|A) P(A)##.
So ##P(A|B) = \frac{P(B|A)}{P(B)} P(A)##. Apply this with ##A = \{(y,k): x < y < x + dx,\ 0 \le k \lt \infty\}## and ##B=\{(y,k): -\infty < y < \infty,\ k = n\}##.
Then divide both sides of the equation by ##dx##.
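Written out in full, the chain of equalities and the ##dx \to 0## limit give:

$$\frac{P[x<X<x+dx\mid N=n]}{dx}=\frac{P[N=n\mid x<X<x+dx]}{P[N=n]}\cdot\frac{P[x<X<x+dx]}{dx}\ \xrightarrow{dx \to 0}\ \frac{p_{N|X}(n|x)}{p_N(n)}\,f_X(x)=f_{X|N}(x|n)$$

Multiplying through by ##p_N(n)## recovers the mixed form of Bayes' Rule quoted in the first reply.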

It is tedious to use the methods of calculus in a logically precise manner to deal with a mix of continuous and discrete random variables. It's even more tedious to deal with a single random variable that has both continuous and discrete values. (E.g., define the random variable ##Y## by: flip a fair coin. If the coin lands heads then ##Y = 0.5##. If the coin lands tails then pick ##Y## from a uniform distribution on ##[0,1]##.) As @StoneTemplePython mentioned, a typical education in probability theory begins by treating continuous and discrete random variables separately. Then it jumps to the advanced topic of measure theory. The great generality of measure theory can then be used to handle a mixture of discrete and continuous random variables as a particular case.
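A quick simulation of that coin-flip example (a sketch of my own; the point is just that ##Y## carries an atom of probability ##1/2## at ##0.5## plus a continuous part, so it has neither a plain PMF nor a plain PDF):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 1_000_000

# Mixed random variable from the example above:
# heads -> Y = 0.5 exactly; tails -> Y ~ Uniform(0,1).
heads = rng.random(M) < 0.5
Y = np.where(heads, 0.5, rng.uniform(0.0, 1.0, M))

print(np.mean(Y == 0.5))   # ~ 0.5: an atom, so no ordinary PDF exists at 0.5
print(np.mean(Y <= 0.25))  # ~ 0.125: only the continuous part contributes (0.5 * 0.25)
```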
 