Conceptual Problems with Random Variables and Sample Theory

In summary: inside "P(X=2)", the expression "X=2" normally means the set of all outcomes w such that X(w) = 2. If X really were a constant function, that set would be either the whole sample space S (if 2 is in the range of X) or the empty set (if 2 isn't in the range of X), and so P(X=2) would be either 1 or 0.
  • #1
siddharth5129
Hi
I'm having a few conceptual difficulties with random variables and I was hoping someone could clear up a few things for me:

1) Firstly, what exactly do we mean when we say that two random variables X and Y are equal? I understand what identically distributed means, but my difficulty is with equality.
My professor says that equality of X and Y means that for every outcome ω in the sample space, X(ω) = Y(ω). Now, if these variables are continuously distributed, isn't it also true that P(X=Y) = 0 and that P(X and Y ∈ (a,b)) < 1 for (a,b) ⊂ ℝ? I don't see any inconsistency here, but it seems off. Is this really the definition of equality?

2) Also, I'm not entirely sure what it means to add two random variables. Can I go with the above and say that Z = X + Y if for every outcome ω of the sample space, Z(ω) = X(ω) + Y(ω)?

3) My final conceptual difficulty is with large sample theory. Why do we look at N observations in a population as N random variables in their own right and not as N instances of a single random variable (which is what they intuitively seem to be)? Is this just a convenient starting point or is there a solid rationale behind it? Surely the random variable is random and variable across the population studied, not for every individual. Does it make sense, for example, to talk about disease frequency being a random variable in the context of a single person?

I'd appreciate any sort of clarification. Thanks :)
 
  • #2
siddharth5129 said:
Hi
I'm having a few conceptual difficulties with random variables and I was hoping someone could clear up a few things for me:

1) Firstly, what exactly do we mean when we say that two random variables X and Y are equal? I understand what identically distributed means, but my difficulty is with equality.
My professor says that equality of X and Y means that for every outcome ω in the sample space, X(ω) = Y(ω). Now, if these variables are continuously distributed, isn't it also true that P(X=Y) = 0 and that P(X and Y ∈ (a,b)) < 1 for (a,b) ⊂ ℝ? I don't see any inconsistency here, but it seems off. Is this really the definition of equality?
If the variables X and Y describe the same thing, they are equal. For example, X(w) might be a binary variable like "is male": X(w) is 1 if w is male and 0 otherwise. Y equals X by the definition above if Y(w) likewise equals 1 exactly when w is male, even if Y is described differently.

P(X=Y) would equal zero if, for example, the two are independent and continuously distributed. If X and Y are equal as random variables, then X(w) = Y(w) for every outcome w and P(X=Y) = 1.
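
A quick simulation may help separate the two situations. This is only a sketch under assumed choices that are not from the thread: the outcome w is uniform on [0, 1), X(w) = 2w, Y(w) = w + w (a different description of the same function), and V is a hypothetical second variable built from an independent draw.

```python
import random

def estimate_p_equal(n_trials=10_000, seed=0):
    """Estimate P(X=Y) and P(X=V) by sampling outcomes."""
    rng = random.Random(seed)
    equal_as_functions = 0
    independent_copies = 0
    for _ in range(n_trials):
        w = rng.random()      # outcome w, uniform on [0, 1) (assumed)
        x = 2 * w             # X(w) = 2w
        y = w + w             # Y(w) = w + w: same function, different description
        # V uses a second, independent draw, so the outcome is really the
        # pair of draws; V has the same distribution as X but is not equal to X.
        v = 2 * rng.random()
        equal_as_functions += (x == y)
        independent_copies += (x == v)
    return equal_as_functions / n_trials, independent_copies / n_trials

p_same, p_indep = estimate_p_equal()
print("X and Y equal as functions:  P(X=Y) estimate =", p_same)
print("X and V independent:         P(X=V) estimate =", p_indep)
```

X and V are identically distributed here, yet the estimated probability that they coincide is essentially zero, while X and Y agree on every sampled outcome.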

2) Also, I'm not entirely sure what it means to add two random variables. Can I go with the above and say that Z = X + Y if for every outcome ω of the sample space, Z(ω) = X(ω) + Y(ω)?
That looks right.
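
To make the pointwise definition concrete, here is a minimal sketch on a finite sample space. The two-dice setup and the helper `prob` are illustrative assumptions, not something from the thread.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all ordered pairs from two fair dice, each equally likely.
S = list(product(range(1, 7), repeat=2))

def X(w):
    return w[0]          # value of the first die

def Y(w):
    return w[1]          # value of the second die

def Z(w):
    return X(w) + Y(w)   # Z = X + Y, defined outcome by outcome

def prob(event):
    """Probability of the set of outcomes w for which event(w) is true."""
    return Fraction(sum(1 for w in S if event(w)), len(S))

p_seven = prob(lambda w: Z(w) == 7)
print(p_seven)  # 1/6
```

X, Y and Z are ordinary functions on the same sample space, and the sum is just the function that adds their values at each outcome.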
3) My final conceptual difficulty is with large sample theory. Why do we look at N observations in a population as N random variables in their own right and not as N instances of a single random variable (which is what they intuitively seem to be)? Is this just a convenient starting point or is there a solid rationale behind it? Surely the random variable is random and variable across the population studied, not for every individual. Does it make sense, for example, to talk about disease frequency being a random variable in the context of a single person?

I'd appreciate any sort of clarification. Thanks :)
You probably wouldn't discuss the frequency of disease in a single person, but the probability of a random person having a disease seems fair. Then, if you select 10 random people, you have selected 10 random (binary) variables, each one being p% likely to indicate the disease. The frequency is the observed proportion, which you would use to make inferences about the true probability within the population at large.
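
One way to picture the N variables is to treat a whole sample as a single outcome w and let the i-th variable read off the i-th person. The sketch below assumes a true disease probability of 0.3 and independent draws; both are illustrative assumptions, and the names `sample_population` and `X` are hypothetical.

```python
import random

def sample_population(n_people, p_disease, seed=0):
    """Draw one outcome w: a list of 0/1 disease indicators, one per person."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_disease else 0 for _ in range(n_people)]

def X(i, w):
    """X_i: the random variable that reports the status of the i-th person."""
    return w[i]

true_p = 0.3                       # assumed value, for illustration only
w = sample_population(100_000, true_p)

# The observed frequency is a function of all N variables X_1, ..., X_N.
frequency = sum(X(i, w) for i in range(len(w))) / len(w)
print("observed frequency:", frequency)
```

Each X_i is a different function of the outcome, although all of them share the same distribution, which is why the sample is modelled as N random variables rather than one.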
 
  • #3
siddharth5129 said:
isn't it also true that P(X=Y) = 0 and that P( X and Y ∈ (a,b)) < 1 for (a,b) ⊂ ℝ .

The notation P(X=Y) is ambiguous. It would be unusual to interpret it to mean "The probability that X(w) = Y(w) for each w in the sample space S". For that to make sense, you'd need to consider a different sample space than S. You'd be considering a sample space where the event (X=Y) is defined by "we pick two random variables at random and find they are equal as random variables". Using that interpretation, you can't say whether P(X=Y) = 0 without more information.

You might be thinking of a situation where we are given that X=Y (as random variables), we sample two possibly different outcomes w1 and w2 from the space S, and we define the event "X=Y" to mean X(w1) = Y(w2).

In that case, even given that X and Y are continuous random variables, we can't conclude P(X=Y)=0 unless we have more information, for example information about the joint distribution of w1 and w2. Perhaps you are thinking of a special situation, such as letting w1 and w2 be two independent random samples (i.e. single numbers) taken from a normal distribution.

To be clear, notation has to distinguish between "a random variable" and "a realization of a random variable", but it's common to be careless about notation and leave it to the reader to figure things out. For example, if X is a random variable, then the notation "X=2" would literally say "X is the constant function X(w) = 2 for each outcome w". But what most people mean by "X=2" when used inside "P(X=2)" is "The set of all outcomes w such that X(w) = 2".
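
The two readings of "X=2" can be compared on a small example. This is a sketch under assumed choices: the sample space is two fair coin flips, X counts heads, and `event_set` and `prob` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

# Sample space S: two fair coin flips, four equally likely outcomes.
S = list(product("HT", repeat=2))

def event_set(X, value):
    """The set of all outcomes w such that X(w) == value."""
    return {w for w in S if X(w) == value}

def prob(event):
    return Fraction(len(event), len(S))

def num_heads(w):
    return w.count("H")

def constant_two(w):
    return 2    # X really is the constant function 2

def constant_three(w):
    return 3    # a constant function whose range does not contain 2

p_heads = prob(event_set(num_heads, 2))       # only ('H', 'H') qualifies
p_const = prob(event_set(constant_two, 2))    # the event is all of S
p_empty = prob(event_set(constant_three, 2))  # the event is the empty set
print(p_heads, p_const, p_empty)
```

For a non-constant X the event is a proper subset of S, while for a constant function it can only be all of S or empty, so the probability is forced to be 1 or 0.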
 
