Undergrad Rewriting of equality in conditional probability distribution

Summary:
The discussion centers on deriving the equality $$\frac{P[x<X<x+dx|N=n]}{dx}=f_{X|N}(x|n)$$ and clarifying the relationship between conditional probability distributions. Participants highlight that this expression is a limit definition of conditional density, emphasizing the need for careful handling of continuous and discrete cases. The conversation also touches on the application of Bayes' Rule and the challenges of mixing different types of random variables, suggesting that measure theory may be necessary for rigorous treatment. Overall, the discussion reveals the complexities involved in understanding conditional probability distributions and the notation used in their representation.
georg gill
[Attachment: bank stat.png]
I don't get $$\frac{P[x<X<x+dx|N=n]}{dx}=f_{X|N}(x|n)$$ Can someone show how to derive it? I would believe that $$f_{X|N}(x|n)=\frac{f(x,N)}{p_n(N)}$$ but I don't get how that would be the same. And I don't get that $$\frac{P[x<X<x+dx|N=n]}{dx}=\frac{P[N=n|x<X<x+dx]}{P[N=n]}\frac{P[x<X<x+dx]}{dx}$$

Can someone show how to rewrite that equality this way?
 

StoneTemplePython
I haven't seen things shown this way before, but a couple of thoughts.

1.) You realize they are just trying to extend Bayes' Rule, right? I.e., the bottom line in your image, after you clear the denominator, is

##p_N(n) f_{X|N}(x|n) = f_X(x) p_{N|X}(n|x)##

which is the mixed form of Bayes' Rule.
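If it helps to see it with numbers, here is a minimal Monte Carlo sketch of that identity. The model is my own toy example, not the one from your attachment: ##X \sim \mathrm{Uniform}(0,1)## and ##N \mid X = x \sim \mathrm{Bernoulli}(x)##, so that ##f_X(x) = 1##, ##p_{N|X}(1|x) = x##, and ##p_N(1) = 1/2##.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 10_000_000

# Toy model (my assumption, not from the problem):
# X ~ Uniform(0,1) is continuous, N | X=x ~ Bernoulli(x) is discrete.
X = rng.uniform(0.0, 1.0, M)
N = rng.uniform(0.0, 1.0, M) < X          # N = 1 with probability x

x, dx = 0.7, 0.01

# Left side: p_N(1) * f_{X|N}(x|1), with the conditional density
# estimated from a thin slab of the N = 1 slice.
p_n = np.mean(N)
f_cond = np.mean((X[N] > x) & (X[N] < x + dx)) / dx

# Right side: f_X(x) * p_{N|X}(1|x); here f_X = 1 and p_{N|X}(1|x) = x.
print(p_n * f_cond, 1.0 * x)              # both ~ 0.7
```

Both sides agree up to Monte Carlo noise and an ##O(dx)## discretization error.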

2.) The point of this problem is that in general people get antsy when you mix continuous and discrete cases. One of two approaches then gets used -- either extensive use of measure theory, or working at the level of the CDF. Your problem is doing the latter, though maybe it isn't so easy to tell. The notation ##P\{x \lt X \lt x + dx\vert N = n \}## is rather hard for me to interpret (for starters, it appears that only the dx is being conditioned on), but what I think they are actually doing is ##F_{X|N}(x + \delta \mid n) - F_{X|N}(x \mid n)## for some small ##\delta \gt 0##. That is, they are working at the CDF level and looking at some small chunk of probability. To avoid scaling issues, you divide out this ##\delta## and get

##\frac{F_{X|N}(x + \delta \mid n) - F_{X|N}(x \mid n)}{\delta}##.

From here the interest is in what happens for very small ##\delta##, so they pass to the limit ##\delta \rightarrow 0##. You should recognize this as a difference quotient for a derivative, and recall that when you differentiate a CDF you recover the underlying PDF. Since it is a conditional CDF, you recover a conditional PDF.
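For a concrete picture of that difference quotient converging, here is a tiny numeric sketch with an Exponential(1) CDF standing in for the conditional CDF (the distribution is my choice, not the problem's):

```python
from math import exp

# For Exponential(1), F(x) = 1 - exp(-x), so the PDF is F'(x) = exp(-x).
F = lambda x: 1.0 - exp(-x)

x = 1.0
for delta in (0.1, 0.01, 0.001, 1e-6):
    # Difference quotient of the CDF over a shrinking chunk of probability.
    print(delta, (F(x + delta) - F(x)) / delta)   # -> exp(-1) ~ 0.36788
```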

I don't really like the way this is shown in your text though. Hopefully the above is helpful.
 
georg gill said:
I don't get $$\frac{P[x<X<x+dx|N=n]}{dx}=f_{X|N}(x|n)$$ Can someone show how to derive it?

The attachment you gave involves taking a limit. The attachment's use of the notation "##(x|n)##" as the argument of a function is often seen, but somewhat confusing. The usual notation for a function of two variables, "##(x,n)##", would be satisfactory. Some people prefer to use ##(x|n)## to emphasize that a function gives a conditional probability. If we use "##f_{(X,N)}##" to denote the joint density for ##(X,N)## and ##f_{X|N}## to denote the conditional density of ##X## given ##N##, then we don't need the notation "##(x|n)##" in the argument.

A correct notation for the equation you ask about is: ## f_{X|N}(x,n) = \lim_{dx \rightarrow 0} \frac{P(x < X < x + dx| N= n)}{dx}##

This equation is true "by definition" when we use the concept of a conditional density that (I'm guessing) your text materials use. However, unless the author of your text has been very ingenious, the definition of conditional density in the text may not include the above equation as a special case.

Suppose we have a vector ##X## of random variables whose outcomes are in a set ##S##. Let ##A## be a subset of ##S##. The general concept of the conditional density ##f_{X|A}## is that ##f_{X|A}(x)## is the limit of ##P(X \in w \mid X \in A)##, divided by the size of ##w##, as the set ##w## "approaches" the single vector ##x##. It is complicated to give a precise definition for a limit where a set ##w## is said to "approach" a single member ##x## of the set. I don't know how much detail your text materials attempt. The equation you ask about involves the special situation where the vector of variables is ##(X,N)## and the set ##A## is ##\{(x,k): -\infty < x < \infty,\ k = n\}##.
georg gill said:
I would believe that $$f_{X|N}(x|n)=\frac{f(x,N)}{p_n(N)}$$ but I don't get how that would be the same.

A correct notation for that is ## f_{X|N}(x,n)=\frac{f(x,n)}{p_N(n)}## with the understanding that ##f## is ##f_{(X,N)}##, the joint density of ##(X,N)##.

I assume you are using "##p_N##" to denote the marginal probability mass function of ##N##.

You probably believe that equation for the same reason that I believe it. It can be (incorrectly) deduced by thinking of a density function ##f_X(x)## as representing "the probability that ##X = x##". This way of thinking is technically incorrect, since a "probability density" is not a "probability". (By analogy, a rod that has a mass density of 5 kg/meter does not have a mass of 5 kg located at each single point ##X = x##. Each point has zero mass, but the mass density at each point is 5 kg/meter.)

Nevertheless, this incorrect way of thinking often leads to correct conclusions, so it is useful as an intuitive way of understanding formulae.

Attempting to reason correctly, begin with the definition
eq. 1) ## f_{X|N}(x,n) = \lim_{dx \rightarrow 0} \frac{P(x < X < x + dx| N= n)}{dx}##
We can set ##P(x < X< x + dx \mid N=n)## equal to ##\frac{ P( (X,N) \in ([x,x+dx],n) )}{P( (X,N) \in ((-\infty,\infty),n) )}## by definition of a conditional probability.
(This is an application of the definition ##P(A|B) = \frac{ P(A \cap B)}{ P(B)}## with ##A= \{(y,k): x < y < x + dx, k = n\} ## and ##B = \{(y,k): -\infty < y < \infty, k=n\}##.)

The right hand side of eq. 1 becomes: ##\lim_{dx \rightarrow 0} \frac{ P( (X,N) \in ([x,x+dx],n) )}{dx\, P( (X,N) \in ((-\infty,\infty),n) )}##.
The factor ##P((X,N) \in ((-\infty,\infty),n))## in the denominator does not depend on ##dx##, so for the purposes of taking the limit it is a constant; its value is, by definition of a marginal, ##p_N(n) = P(N=n)##. The limit of ##P((X,N) \in ([x,x+dx],n))/dx## is, by definition, the joint density ##f_{(X,N)}(x,n)##.

Of course, to justify the limit of the numerator by material in your text, we would have to know how your text defines a joint density for a vector containing both continuous and discrete random variables. In terms of the general concept described above, to let the set ##w## "approach" a single value ##x## we set the discrete random variables equal to the values they have in ##x## and take the limit of the continuous variables as they approach the values specified in ##x##. So the joint density defined as ##\lim_{(dx,dn) \rightarrow (0,0)} \frac{P( x < X < x + dx,\ n \le N < n + dn)}{dx}## is evaluated by taking ##\lim_{dx \rightarrow 0} \frac{P( x < X < x + dx,\ N = n)}{dx}##.
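As a numeric sanity check on this derivation, here is a small simulation using the same toy model as the sketch in the earlier reply (##X \sim \mathrm{Uniform}(0,1)##, ##N \mid X = x \sim \mathrm{Bernoulli}(x)##, again my own example, not the problem's): the slice estimate of the conditional density and the ratio of the joint density to ##p_N(n)## should agree.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 10_000_000

# Hypothetical toy model: X ~ Uniform(0,1), N | X=x ~ Bernoulli(x).
X = rng.uniform(0.0, 1.0, M)
N = rng.uniform(0.0, 1.0, M) < X

x, dx = 0.7, 0.01
in_slab = (X > x) & (X < x + dx)

joint = np.mean(in_slab & N) / dx        # ~ f_{(X,N)}(x, 1) = x
marginal = np.mean(N)                    # ~ p_N(1) = 1/2
conditional = np.mean(in_slab[N]) / dx   # ~ f_{X|N}(x, 1) = 2x

print(joint / marginal, conditional)     # both ~ 1.4
```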
georg gill said:
And I don't get that $$\frac{P[x<X<x+dx|N=n]}{dx}=\frac{P[N=n|x<X<x+dx]}{P[N=n]}\frac{P[x<X<x+dx]}{dx}$$

As @StoneTemplePython pointed out, the general idea is ##P(A \cap B) = P(A|B)P(B) = P(B|A) P(A)##.
So ##P(A|B) = \frac{P(B|A)}{P(B)} P(A)##. Apply this with ##A = \{(y,k): x < y < x + dx,\ 0 \le k \lt \infty\}## and ##B=\{(y,k): -\infty < y < \infty,\ k = n\}##.
Then divide both sides of the equation by ##dx##.
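Written out in full, the chain of equalities and the ##dx \to 0## limit give:

$$\frac{P[x<X<x+dx\mid N=n]}{dx}=\frac{P[N=n\mid x<X<x+dx]}{P[N=n]}\cdot\frac{P[x<X<x+dx]}{dx}\ \xrightarrow{dx \to 0}\ \frac{p_{N|X}(n|x)}{p_N(n)}\,f_X(x)=f_{X|N}(x|n)$$

Multiplying through by ##p_N(n)## recovers the mixed form of Bayes' Rule quoted in the first reply.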

It is tedious to use the methods of calculus in a logically precise manner to deal with a mix of continuous and discrete random variables. It's even more tedious to deal with a single random variable that has both continuous and discrete values. (E.g., define the random variable ##Y## by: flip a fair coin. If the coin lands heads then ##Y = 0.5##. If the coin lands tails then pick ##Y## from a uniform distribution on ##[0,1]##.) As @StoneTemplePython mentioned, a typical education in probability theory begins by treating continuous and discrete random variables separately. Then it jumps to the advanced topic of measure theory. The great generality of measure theory can then be used to handle a mixture of discrete and continuous random variables as a particular case.
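A quick simulation of that coin-flip example (a sketch of my own; the point is just that ##Y## carries an atom of probability ##1/2## at ##0.5## plus a continuous part, so it has neither a plain PMF nor a plain PDF):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 1_000_000

# Mixed random variable from the example above:
# heads -> Y = 0.5 exactly; tails -> Y ~ Uniform(0,1).
heads = rng.random(M) < 0.5
Y = np.where(heads, 0.5, rng.uniform(0.0, 1.0, M))

print(np.mean(Y == 0.5))   # ~ 0.5: an atom, so no ordinary PDF exists at 0.5
print(np.mean(Y <= 0.25))  # ~ 0.125: only the continuous part contributes (0.5 * 0.25)
```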
 