# Conceptual Problems with Random Variables and Sample Theory

1. Sep 29, 2015

### siddharth5129

Hi
I'm having a few conceptual difficulties with random variables and I was hoping someone could clear up a few things for me:

1) Firstly, what exactly do we mean when we say that two random variables X and Y are equal? I understand what identically distributed means, but my difficulty is with equality.
My professor says that equality of X and Y means that for every outcome ω in the sample space, X(ω) = Y(ω). Now, if these variables are continuously distributed, isn't it also true that P(X = Y) = 0 and that P(X ∈ (a,b) and Y ∈ (a,b)) < 1 for (a,b) ⊂ ℝ? I don't see any inconsistency here, but it seems off. Is this really the definition of equality?

2) Also, I'm not entirely sure what it means to add two random variables. Can I go with the above and say that Z = X + Y if for every outcome ω of the sample space, Z(ω) = X(ω) + Y(ω)?

3) My final conceptual difficulty is with large sample theory. Why do we look at N observations in a population as N random variables in their own right and not as N instances of a single random variable (which is what they intuitively seem to be)? Is this just a convenient starting point, or is there a solid rationale behind it? Surely the random variable is random and variable across the population studied, not for every individual. Does it make sense, for example, to talk about disease frequency being a random variable in the context of a single person?

I'd appreciate any sort of clarification. Thanks :)

2. Sep 30, 2015

### RUber

If the variables X and Y describe the same thing, they are equal. For example, X(w) might be a binary variable like "is male". X(w) is 1 if w is a male and 0 otherwise. Y equals X by the definition above if Y(w) also equals 1 if and only if w is a male, even if the description of Y might be different.
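A small sketch may make the pointwise definition concrete. The sample space, the names in it, and the helper functions are all made up for illustration; the only idea taken from the discussion is that a random variable is a function on the sample space, so equality (and the sum asked about in question 2) is checked outcome by outcome.

```python
# Hypothetical finite sample space: each outcome w is a person with a recorded sex.
sample_space = {
    "alice": "F",
    "bob": "M",
    "carol": "F",
    "dave": "M",
}

def X(w):
    """Indicator 'is male': 1 if w is male, 0 otherwise."""
    return 1 if sample_space[w] == "M" else 0

def Y(w):
    """Described differently ('is not female'), but the same function on this space."""
    return 0 if sample_space[w] == "F" else 1

def equal_as_random_variables(U, V, outcomes):
    """X = Y as random variables means U(w) == V(w) for every outcome w."""
    return all(U(w) == V(w) for w in outcomes)

def add(U, V):
    """Pointwise sum: (U + V)(w) = U(w) + V(w)."""
    return lambda w: U(w) + V(w)

Z = add(X, Y)

print(equal_as_random_variables(X, Y, sample_space))
print({w: Z(w) for w in sample_space})
```

Note that nothing here involves probabilities: equality and addition are statements about functions, while "identically distributed" is a statement about the distributions those functions induce.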

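P(X=Y) would equal zero if, for example, the two are independent and continuously distributed. That is not the situation when X and Y are equal in the sense above, where X(w) = Y(w) for every w.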

Your definition of the sum in 2) looks right.
You probably wouldn't discuss frequency of disease in a single person, but the probability of a random person having a disease seems fair. Then, if you select 10 random people, you have selected 10 random (binary) variables, each one with a p% chance of indicating the disease. The frequency is the observed proportion, which you would use to make inferences about what the true probability is within the population at large.
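One way to picture the "10 random variables" is to treat a single outcome of the experiment as the whole sample of 10 people, with the i-th variable reading off the i-th person. The sketch below does this under assumed values: the prevalence `P_DISEASE`, the sample size `N`, and the independence of the draws are illustrative choices, not something given in the thread.

```python
import random

P_DISEASE = 0.3  # assumed true prevalence, purely illustrative
N = 10           # number of people selected

def draw_outcome(rng, n=N, p=P_DISEASE):
    """One outcome w of the sampling experiment: the disease status of all n people."""
    return tuple(1 if rng.random() < p else 0 for _ in range(n))

def make_X(i):
    """X_i is the random variable that reads off the status of the i-th selected person."""
    return lambda w: w[i]

# Ten different functions on the same sample space of 10-tuples.
X = [make_X(i) for i in range(N)]

rng = random.Random(0)
w = draw_outcome(rng)                  # one realization of the whole experiment
observed = [X_i(w) for X_i in X]       # realizations of X_1, ..., X_10
frequency = sum(observed) / N          # observed proportion, used to estimate the prevalence

print(observed, frequency)
```

Seen this way, the observed frequency is itself a random variable, namely the average of the X_i, which is what lets large sample theory talk about its distribution.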

3. Oct 1, 2015

### Stephen Tashi

The notation P(X=Y) is ambiguous.

It would be unusual to interpret it to mean "The probability that X(w) = Y(w) for each w in the sample space S". For that to make sense, you'd need to consider a different sample space than S. You'd be considering a sample space where the event (X=Y) is defined by "we pick two random variables at random and find they are equal as random variables". Using that interpretation, you can't say whether P(X=Y) = 0 without more information.

You might be thinking of a situation where we are given that X=Y (as random variables) and we sample two possibly different outcomes w1 and w2 from the space S and define the event "X=Y" to mean X(w1) = Y(w2).

In that case, even given that X and Y are continuous random variables, we can't conclude P(X=Y) = 0 unless we have more information, for example information about the joint distribution of w1 and w2. Perhaps you are thinking of a special situation, such as letting w1 and w2 be two independent random samples (i.e. single numbers) taken from a normal distribution.
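A quick simulation can illustrate the difference between the two readings. It is only a sketch of the special situation mentioned above: the sample space is taken to be the real line, X is assumed to be the identity function, and w1, w2 are assumed to be independent standard normal draws.

```python
import random

def X(w):
    """Hypothetical random variable: the identity on the sample space (the real line)."""
    return w

Y = X  # Y equals X as a random variable: Y(w) == X(w) for every w

rng = random.Random(1)
trials = 10_000
same_outcome = 0          # counts X(w1) == Y(w1): both evaluated at the same outcome
independent_outcomes = 0  # counts X(w1) == Y(w2): evaluated at two independent outcomes

for _ in range(trials):
    w1 = rng.gauss(0.0, 1.0)
    w2 = rng.gauss(0.0, 1.0)
    same_outcome += int(X(w1) == Y(w1))
    independent_outcomes += int(X(w1) == Y(w2))

print(same_outcome / trials)          # fraction of trials with X(w1) == Y(w1)
print(independent_outcomes / trials)  # fraction of trials with X(w1) == Y(w2)
```

Evaluated at the same outcome, the two always agree. Evaluated at independent continuous draws, a match is essentially never observed, which is the "P(X=Y) = 0" intuition from the original question.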

To be clear, notation has to distinguish between "a random variable" and "a realization of a random variable", but it's common to be careless about notation and leave it to the reader to figure things out. For example, if X is a random variable, then the notation "X=2" would literally say "X is the constant function X(w) = 2 for each outcome w". But what most people mean by "X=2" when used inside "P(X=2)" is "the set of all outcomes w such that X(w) = 2".
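The two readings of "X=2" can be put side by side in code. This is a minimal sketch on an assumed example: a fair six-sided die as the sample space, and a made-up random variable X(w) = w mod 3.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]  # assumed: outcomes of a fair die, all equally likely

def X(w):
    """Hypothetical random variable: remainder of the die roll modulo 3."""
    return w % 3

def event(predicate):
    """The set of all outcomes w for which the predicate holds."""
    return {w for w in sample_space if predicate(w)}

def prob(ev):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(ev), len(sample_space))

# Reading used inside P(X=2): the set of outcomes w with X(w) == 2.
X_equals_2 = event(lambda w: X(w) == 2)

# Literal reading: X is the constant function 2 on the whole sample space.
X_is_constant_2 = all(X(w) == 2 for w in sample_space)

print(X_equals_2, prob(X_equals_2), X_is_constant_2)
```

Here the event {X=2} is the set {2, 5} with probability 1/3, while the literal statement "X is the constant 2" is simply false.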