Conceptual Problems with Random Variables and Sample Theory

SUMMARY

This discussion addresses conceptual challenges related to random variables and sample theory. Two random variables X and Y are equal if X(ω) = Y(ω) for every outcome ω in the sample space, in which case P(X=Y) = 1; the result P(X=Y) = 0 applies to independent, continuously distributed variables, not to equal ones. The sum Z = X + Y is defined pointwise: Z(ω) = X(ω) + Y(ω) for all ω. The discussion also clarifies that N observations drawn from a population are modeled as N random variables, one per randomly selected individual, so that the observed disease frequency can be used to infer the probability of disease in the population.

PREREQUISITES
  • Understanding of random variables and their properties
  • Familiarity with probability theory, particularly continuous distributions
  • Knowledge of statistical inference and sample theory
  • Basic concepts of joint distributions and independence in probability
NEXT STEPS
  • Study the properties of continuous random variables in detail
  • Explore the concept of joint distributions and their implications for random variables
  • Learn about statistical inference techniques using large sample theory
  • Investigate the differences between random variables and their realizations in probability notation
USEFUL FOR

Students and professionals in statistics, data science, and mathematics who seek to deepen their understanding of random variables, probability theory, and statistical inference methods.

siddharth5129
Hi
I'm having a few conceptual difficulties with random variables and I was hoping someone could clear up a few things for me:

1) Firstly, what exactly do we mean when we say that two random variables X and Y are equal? I understand what identically distributed means, but my difficulty is with equality.
My professor says that equality of X and Y means that for every outcome ω in the sample space, X(ω) = Y(ω). Now, if these variables are continuously distributed, isn't it also true that P(X=Y) = 0 and that P(X and Y ∈ (a,b)) < 1 for (a,b) ⊂ ℝ? I don't see any inconsistency here, but it seems off. Is this really the definition of equality?

2) Also, I'm not entirely sure what it means to add two random variables. Can I go with the above and say that Z = X + Y if for every outcome ω of the sample space, Z(ω) = X(ω) + Y(ω)?

3) My final conceptual difficulty is with large sample theory. Why do we look at N observations in a population as N random variables in their own right and not as N instances of a single random variable (which is what they intuitively seem to be)? Is this just a convenient starting point or is there a solid rationale behind it? Surely the random variable is random and variable across the population studied, not for every individual. Does it make sense, for example, to talk about disease frequency being a random variable in the context of a single person?

I'd appreciate any sort of clarification. Thanks :)
 
siddharth5129 said:
Hi
I'm having a few conceptual difficulties with random variables and I was hoping someone could clear up a few things for me:

1) Firstly, what exactly do we mean when we say that two random variables X and Y are equal? I understand what identically distributed means, but my difficulty is with equality.
My professor says that equality of X and Y means that for every outcome ω in the sample space, X(ω) = Y(ω). Now, if these variables are continuously distributed, isn't it also true that P(X=Y) = 0 and that P(X and Y ∈ (a,b)) < 1 for (a,b) ⊂ ℝ? I don't see any inconsistency here, but it seems off. Is this really the definition of equality?
If the variables X and Y describe the same thing, they are equal. For example, X(w) might be a binary variable like "is male". X(w) is 1 if w is a male and 0 otherwise. Y equals X by the definition above if Y(w) also equals 1 if and only if w is a male, even if the description of Y might be different.

If X and Y are equal in this sense, then P(X=Y) = 1. P(X=Y) equals zero in a different situation, for example when the two are independent and continuously distributed.
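
To make the difference between "equal" and "identically distributed" concrete, here is a minimal sketch on a small finite sample space. The sample space (a fair die) and the names `X`, `Y`, `W`, `equal_as_random_variables`, `prob_of_agreement` and `distribution` are assumptions chosen only for illustration; they are not from the thread.

```python
from fractions import Fraction

# Hypothetical sample space: the six faces of a fair die, each with probability 1/6.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}

def X(w):
    return w

def Y(w):
    return w          # the same function as X, merely described separately

def W(w):
    return 7 - w      # same distribution as X, but a different function

def equal_as_random_variables(A, B):
    """True if A(w) == B(w) for every outcome w in the sample space."""
    return all(A(w) == B(w) for w in omega)

def prob_of_agreement(A, B):
    """P(A = B): total probability of the outcomes w where A(w) == B(w)."""
    return sum(prob[w] for w in omega if A(w) == B(w))

def distribution(A):
    """Probability mass function of A as a dict {value: probability}."""
    dist = {}
    for w in omega:
        dist[A(w)] = dist.get(A(w), Fraction(0)) + prob[w]
    return dist

print(equal_as_random_variables(X, Y), prob_of_agreement(X, Y))
print(equal_as_random_variables(X, W), prob_of_agreement(X, W))
print(distribution(X) == distribution(W))
```

Here X and Y are equal, so they agree on every outcome, while X and W have the same distribution yet never take the same value on the same outcome.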

2) Also, I'm not entirely sure what it means to add two random variables. Can I go with the above and say that Z = X + Y if for every outcome ω of the sample space, Z(ω) = X(ω) + Y(ω)?
That looks right.
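
As a rough illustration of the pointwise definition, the sketch below builds Z = X + Y on a hypothetical sample space of two fair dice. The sample space and the helper `pmf` are assumptions made up for this example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: ordered pairs (first die, second die), all equally likely.
omega = list(product(range(1, 7), repeat=2))
prob = Fraction(1, 36)

def X(w):
    return w[0]            # value shown by the first die

def Y(w):
    return w[1]            # value shown by the second die

def Z(w):
    return X(w) + Y(w)     # Z = X + Y, defined outcome by outcome

def D(w):
    return X(w) + X(w)     # X + X, also defined outcome by outcome

def pmf(A):
    """Probability mass function of A as a dict {value: probability}."""
    dist = {}
    for w in omega:
        dist[A(w)] = dist.get(A(w), Fraction(0)) + prob
    return dist

pmf_Z = pmf(Z)
pmf_D = pmf(D)
print(pmf_Z[7], pmf_D.get(7, Fraction(0)))
```

X and Y have the same distribution here, yet X + Y and X + X are distributed differently. That is one reason the sum is defined on outcomes rather than on distributions alone.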
3) My final conceptual difficulty is with large sample theory. Why do we look at N observations in a population as N random variables in their own right and not as N instances of a single random variable (which is what they intuitively seem to be)? Is this just a convenient starting point or is there a solid rationale behind it? Surely the random variable is random and variable across the population studied, not for every individual. Does it make sense, for example, to talk about disease frequency being a random variable in the context of a single person?

I'd appreciate any sort of clarification. Thanks :)
You probably wouldn't discuss the frequency of disease in a single person, but the probability that a randomly chosen person has the disease is a sensible thing to talk about. If you then select 10 people at random, you have 10 random (binary) variables, one per selection, each equal to 1 with probability p (the proportion of the population that has the disease). The frequency is the observed proportion in your sample, which you would use to make inferences about the true probability within the population at large.
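
A small simulation may help here. It is only a sketch under assumed values: the prevalence `p_true = 0.1`, the sample size, and the function names `sample_disease_indicators` and `observed_frequency` are all hypothetical, and the draws are taken to be independent.

```python
import random

def sample_disease_indicators(n, p, seed=0):
    """Draw n independent binary variables, each equal to 1 with probability p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

def observed_frequency(xs):
    """Observed proportion of ones in the sample."""
    return sum(xs) / len(xs)

p_true = 0.1                                  # hypothetical prevalence, for illustration only
xs = sample_disease_indicators(10_000, p_true)
freq = observed_frequency(xs)

print(len(xs), freq)
```

Each entry of `xs` is one realization of its own binary random variable, and `freq` is a function of all of them together. Treating the observations as N separate random variables is what lets you talk about the distribution of `freq`.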
 
siddharth5129 said:
isn't it also true that P(X=Y) = 0 and that P( X and Y ∈ (a,b)) < 1 for (a,b) ⊂ ℝ .

The notation P(X=Y) is ambiguous. It would be unusual to interpret it to mean "The probability that X(w) = Y(w) for each w in the sample space S". For that to make sense, you'd need to consider a different sample space than S. You'd be considering a sample space where the event (X=Y) is defined by "we pick two random variables at random and find they are equal as random variables". Using that interpretation, you can't say whether P(X=Y) = 0 without more information.

You might be thinking of a situation where we are given that X=Y ( as random variables) and we sample two possibly different outcomes w1 and w2 from the space S and define the event "X=Y" to mean X(w1) = Y(w2).

In that case, even given that X and Y are continuous random variables, we can't conclude P(X=Y)=0 unless we have more information, for example information about the joint distribution of w1 and w2. Perhaps you are thinking of a special situation, such as letting w1 and w2 be two independent random samples (i.e. single numbers) taken from a normal distribution.
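
Here is a rough simulation of that point, assuming X and Y are both the identity function on a standard normal outcome. The setup, the seed and the names `indep` and `same` are assumptions for illustration only. It compares drawing w1 and w2 independently with forcing w2 = w1.

```python
import random

rng = random.Random(0)
n = 20_000

def X(w):
    return w

def Y(w):
    return w      # X = Y as random variables (same function of the outcome)

# Case 1: w1 and w2 are independent draws from a standard normal distribution.
hits = 0
for _ in range(n):
    w1 = rng.gauss(0.0, 1.0)
    w2 = rng.gauss(0.0, 1.0)
    hits += X(w1) == Y(w2)
indep = hits / n

# Case 2: w2 is the same outcome as w1 (a degenerate joint distribution).
hits = 0
for _ in range(n):
    w1 = rng.gauss(0.0, 1.0)
    w2 = w1
    hits += X(w1) == Y(w2)
same = hits / n

print(indep, same)
```

The estimated probability of the event X(w1) = Y(w2) is essentially zero in the first case and exactly one in the second, even though X and Y are the same pair of random variables in both.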

To be clear, notation has to distinguish between "a random variable" and "a realization of a random variable", but it's common to be careless about notation and leave it to the reader to figure things out. For example, if X is a random variable then the notation "X=2" would literally say "X is the constant function X(w) = 2 for each outcome w". But what most people mean by "X=2" when used inside "P(X=2)" is "The set of all outcomes w such that X(w) = 2".
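
The two readings of "X=2" can be spelled out on a toy sample space. This is just a sketch: the two-coin-toss sample space and the helpers `event`, `P` and `is_constant` are hypothetical names introduced for this example.

```python
from fractions import Fraction

# Hypothetical sample space: two tosses of a fair coin, all outcomes equally likely.
omega = ["HH", "HT", "TH", "TT"]
prob = {w: Fraction(1, 4) for w in omega}

def X(w):
    return w.count("H")      # number of heads in the outcome

def event(A, value):
    """The event {A = value}: the set of outcomes w with A(w) == value."""
    return {w for w in omega if A(w) == value}

def P(event_set):
    """Probability of an event, i.e. of a set of outcomes."""
    return sum(prob[w] for w in event_set)

def is_constant(A, value):
    """The literal reading of 'A = value': A(w) == value for every outcome w."""
    return all(A(w) == value for w in omega)

print(event(X, 2), P(event(X, 2)), is_constant(X, 2))
```

Under the event reading, "X=2" is the set containing only "HH" and has probability 1/4. Under the literal reading, the statement "X is the constant 2" is simply false.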
 
