Physics Forums (http://www.physicsforums.com/index.php)
-   Set Theory, Logic, Probability, Statistics (http://www.physicsforums.com/forumdisplay.php?f=78)

 musicgold Feb9-12 09:29 AM

I think I understand the concept of random variable (for example, the number of heads when three coins are tossed together or the temperature of a place at 6.00am every morning).

I am, however, confused, as I have seen some material that refers to even the values taken by a random variable (its instances) as random variables. For example, consider the text below from a PowerPoint presentation. The second part, for example, calls the members of a sample independent variables.

Thanks.

Text from a presentation.
“Suppose we are given a random variable X with some unknown probability distribution. We want to estimate the basic parameters of this distribution, like the expectation of X and the variance of X. The usual way to do this is to observe n independent variables all with the same distribution as X”

“Let X1,X2,…,Xn be independent and identically distributed random variables having c. d. f. F and expected value μ. Such a sequence of random variables is said to constitute a sample from the distribution F.”

 mathman Feb9-12 04:35 PM

The language is a little faulty. He seems to be using "random variable" to mean both the variable and a sample of the variable. For example, the outcome of a coin toss is a random variable with two possible outcomes. Once you toss a coin, you are taking a sample.

 Stephen Tashi Feb10-12 12:49 AM

If you have a random variable X and you consider the process of taking n independent samples of it (as opposed to taking one definite sample with fixed numerical values) then you have a random vector. Random vectors are sometimes called random variables (just as in vector math, a "variable" could represent a vector.)

When you think about statistics, it is a mistake to try to think about a typical problem in terms of a single random variable. Anything that is a function of a random variable is another random variable to worry about. Thus a random sample of n independent realizations of the random variable X is a random vector. The mean of this sample is another random variable. The variance of the sample is another random variable. The unbiased estimator of the population variance is another random variable. A statistic, such as the t-statistic, is a function of the sample values, so it becomes another random variable. (This is particularly confusing if you are used to thinking of "a statistic" as a definite numerical value, such as 78.3 years. In statistics, a statistic is any function of the sample values and hence it is a random variable. Adding further to the confusion is the fact that terms like "sample variance" and "sample mean" are sometimes used to refer to specific numerical results instead of functions of random variables.)
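A minimal sketch of this point, assuming X is the roll of a fair die and the sample size is 10 (both choices, the seed, and the helper name `draw_sample` are just for illustration). Each repetition of the sampling process produces a different value of the sample mean and sample variance, which is what it means for those statistics to be random variables.

```python
import random
import statistics

def draw_sample(n, rng):
    """One realization of the random vector (X1, ..., Xn): n fair die rolls."""
    return [rng.randint(1, 6) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Repeat the whole sampling process many times. Each repetition gives
# one realized value of the sample mean and one of the sample variance.
sample_means = []
sample_variances = []
for _ in range(2000):
    sample = draw_sample(10, rng)
    sample_means.append(statistics.mean(sample))
    sample_variances.append(statistics.variance(sample))  # n - 1 denominator

print("distinct values taken by the sample mean:", len(set(sample_means)))
print("average of the sample means:", statistics.mean(sample_means))
print("average of the sample variances:", statistics.mean(sample_variances))
```

For a fair die the population mean is 3.5 and the population variance is 35/12, so the two averages should land near those values while the individual sample means vary from run to run.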

 musicgold Feb10-12 09:27 AM

Thanks folks.

Ok. Now I get it. But I have a follow-up question. The term IID (independent and identically distributed) is a commonly used qualifier for most random variables.

If I am taking samples of a random variable, I am picking points from one distribution. Why do I have to use the qualifier 'identically distributed'?

Also, what does a non-identically distributed sample of a random variable look like?

 lavinia Feb10-12 10:02 AM

Quote:
 Quote by musicgold (Post 3754533) I think I understand the concept of random variable (for example, the number of heads when three coins are tossed together or the temperature of a place at 6.00am every morning). I am, however, confused as I have seen some material which refers even the values taken by a random variable (or instances) as random variables. For example, consider the text from a PowerPoint presentation. The second part, for example, calls the members of a sample as independent variables. How should I think about this? Thanks. Text from a presentation. “Suppose we are given a random variable X with some unknown probability distribution. We want to estimate the basic parameters of this distribution, like the expectation of X and the variance of X. The usual way to do this is to observe n independent variables all with the same distribution as X” “Let X1,X2,…,Xn be independent and identically distributed random variables having c. d. f. F and expected value μ. Such a sequence of random variables is said to constitute a sample from the distribution F.”
When one has a large number of independent samples of a distribution, then the average of the sample is a sample from a nearly normally distributed random variable, assuming that the original distribution has finite variance. Further, the mean of the nearly normal distribution is the mean of the original distribution, and its variance converges to zero for increasingly large samples. This is the Central Limit Theorem.

http://en.wikipedia.org/wiki/Central...#Classical_CLT

Classical statistics is possible because large averages are close to normally distributed even when the original distribution is unknown. All you need is finite variance and mean. So these two parameters can be accurately estimated from the averages of independent samples because normal distributions are well understood.
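A rough simulation of this claim, assuming an Exponential(1) distribution as the "unknown" original (mean 1, variance 1, clearly skewed). The distribution, seed, repetition count, and function names are illustrative choices, not anything from the posts above.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed, chosen arbitrarily

def averages(n, repetitions=2000):
    """Return `repetitions` sample averages, each over n Exponential(1) draws."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]

def summarize(n):
    means = averages(n)
    center = statistics.fmean(means)      # should stay near the true mean, 1
    spread = statistics.pvariance(means)  # should shrink roughly like 1/n
    # A symmetric (e.g. normal) distribution puts half its mass below its mean.
    below = sum(m < 1.0 for m in means) / len(means)
    return center, spread, below

for n in (1, 10, 100):
    center, spread, below = summarize(n)
    print(f"n={n:3d}  mean={center:.3f}  variance={spread:.4f}  "
          f"(1/n={1 / n:.4f})  fraction below 1: {below:.3f}")
```

The variance column should track 1/n, and the fraction of averages below the true mean should drift toward one half as n grows, which is one crude sign of the averages becoming more symmetric and bell-shaped.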

The crux of this line of reasoning is the idea of independent sampling. Independent samples from a single random variable are equivalent to samples of different random variables with the same distribution. Independence means that nothing is changed by the sampling process. The samples are the same as if they were taken from different random variables.

It is not unfair to say that the thing that differentiates probability theory from analysis is the idea of independence. This in my opinion is what you should try to understand. Then everything else will make sense.

 alan2 Feb10-12 10:10 AM

A sequence of random variables, X1, X2, ... is identically distributed if all have the same distribution function. Then they all have the same set of possible values. For example, X1 is the first flip of a coin, X2 is the second flip, etc. You could have a bunch of random variables all with different distributions. For example, X1 is the flip of a coin, X2 is the roll of a die, etc. They are different random variables. But if you repeatedly sample the same random variable then your results are necessarily identically distributed. That is, X1 is one point drawn from the given distribution, X2 is another point drawn from the same distribution, etc. I think it may be semantics. I know about probability but a statistician may use different language. I think mathman said it well above.
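The two cases can be sketched in a few lines, assuming a fair coin and a fair die (the names `flip`, `roll`, and `frequencies` and the trial count are made up for this illustration):

```python
import random
from collections import Counter

rng = random.Random(2)  # fixed seed, chosen arbitrarily
TRIALS = 6000

def flip():
    return rng.choice("HT")

def roll():
    return rng.randint(1, 6)

def frequencies(draw):
    """Empirical distribution of a random variable from repeated draws."""
    counts = Counter(draw() for _ in range(TRIALS))
    return {value: counts[value] / TRIALS for value in sorted(counts)}

# Identically distributed: X1 and X2 are both coin flips.
x1 = frequencies(flip)
x2 = frequencies(flip)

# Not identically distributed: Y1 is a coin flip, Y2 is a die roll.
y1 = frequencies(flip)
y2 = frequencies(roll)

print("X1:", x1)
print("X2:", x2)
print("Y1:", y1)
print("Y2:", y2)
```

X1 and X2 show the same possible values with nearly the same frequencies, while Y1 and Y2 do not even share a set of possible values.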

 musicgold Feb13-12 05:06 PM

Thanks folks.

But I have a follow-up question. The term IID (independent and identically distributed) is a commonly used qualifier for most random variables.

If I am taking samples of a random variable, I am picking points from one distribution. Why do I have to use the qualifier 'identically distributed'?

Also, what does a non-identically distributed sample of a random variable look like?

 HallsofIvy Feb13-12 06:31 PM

If you are picking points "from one distribution" then they are "identically distributed".

As for "non-identically distributed", consider this- flip a coin and roll a single die. The set of "outcomes" is
(H, 1), (H, 2), (H, 3), (H, 4), (H, 5), (H, 6), (T, 1), (T, 2), (T, 3), (T, 4), (T, 5), (T, 6).

 MrAnchovy Feb13-12 06:36 PM

Quote:
 Quote by musicgold (Post 3762261) Why do I have to use the qualifier 'identically distributed'
Because if the distributions are not identical the steps that follow would not be valid.

Quote:
 Quote by musicgold (Post 3762261) how does a non-identically distributed sample of a random variable look
The face value of a playing card drawn from a pack without replacement is a simple example.

(Crossposted with HallsofIvy, but I think my example is better ;) so I will let it stand)

 MrAnchovy Feb14-12 02:35 PM

Quote:
 Quote by MrAnchovy (Post 3762383) The face value of a playing card drawn from a pack without replacement is a simple example.
... but of course that is an example of dependent random variables (the distribution of each draw changes once the earlier draws are known), so perhaps HallsofIvy's example is better after all.
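An exact check of the card example, assuming a standard 52-card pack with face values 1 to 13 (1 standing for the ace) and two draws without replacement. The enumeration below is only a sketch, but it suggests where the trouble lies: taken on their own, the first and second draws have the same distribution; what changes is the distribution of the second draw once the first is known, which is dependence.

```python
from fractions import Fraction
from itertools import permutations
from collections import Counter

# 52 cards, identified by index; deck[i] is the face value of card i.
deck = [value for value in range(1, 14) for _suit in range(4)]

# Every ordered pair of distinct cards is an equally likely (first, second) draw.
pairs = list(permutations(range(52), 2))
total = len(pairs)

first = Counter(deck[i] for i, _ in pairs)
second = Counter(deck[j] for _, j in pairs)

p_first_ace = Fraction(first[1], total)
p_second_ace = Fraction(second[1], total)

both_aces = sum(1 for i, j in pairs if deck[i] == 1 and deck[j] == 1)
first_aces = sum(1 for i, _ in pairs if deck[i] == 1)
p_second_ace_given_first_ace = Fraction(both_aces, first_aces)

print("P(first is an ace)  =", p_first_ace)
print("P(second is an ace) =", p_second_ace)
print("P(second is an ace | first is an ace) =", p_second_ace_given_first_ace)
```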

 musicgold Feb15-12 12:13 AM

Quote:
 As for "non-identically distributed", consider this- flip a coin and roll a single die. The set of "outcomes" is (H, 1), (H, 2), (H, 3), (H, 4), (H, 5), (H, 6), (T, 1), (T, 2), (T, 3), (T, 4), (T, 5), (T, 6).
I am not sure, but it seems the distribution of these outcomes will be identical. They will have a uniform distribution with 8.3% chance for each outcome. I actually ran a simulation and I was getting an almost uniform distribution. Is that right?
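The simulation result can be checked exactly, assuming a fair coin and a fair die that do not influence each other. The sketch below reads "identically distributed" as a statement about the two component variables (the coin and the die), not about the twelve combined outcomes. That reading is an interpretation of the example, not something spelled out in the post.

```python
from fractions import Fraction
from itertools import product

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Joint distribution of the pair (coin, die) under independence.
joint = {(c, d): coin[c] * die[d] for c, d in product(coin, die)}

for outcome, p in joint.items():
    print(outcome, p, f"{float(p):.1%}")

print("pairs are equally likely:", len(set(joint.values())) == 1)
print("coin and die have the same distribution:", coin == die)
```

Each of the twelve pairs has probability 1/12, about 8.3%, which agrees with the nearly uniform simulation. The coin and the die themselves still have different distributions (two values at 1/2 each versus six values at 1/6 each), and that is the sense in which they are not identically distributed.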

 chiro Feb15-12 12:15 AM

Hey musicgold.

The easiest way to think about a random variable in any context is basically that you have a function that maps a value to a corresponding probability. It's not the most rigorous way of defining it, but for most purposes this is what a random variable is.

You basically associate an event with a probability. In a continuous distribution your event is actually a non-zero simple interval (i.e. [a,b] where a < b) and with discrete portions you associate one particular value with a probability.

If these probabilities follow all the Kolmogorov axioms (all probabilities add up to 1, all are greater than or equal to 0, etc.), then you have a random variable.
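A small sketch using the three-coin example from the first post, assuming fair coins. In the more formal picture, the random variable is the function from outcomes to numbers (here `number_of_heads`, a name invented for this sketch), and the value-to-probability map described above is its distribution.

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

# Sample space: all 8 equally likely results of tossing three fair coins.
sample_space = list(product("HT", repeat=3))
prob = {outcome: Fraction(1, 8) for outcome in sample_space}

def number_of_heads(outcome):
    """The random variable: a function from outcomes to numbers."""
    return outcome.count("H")

# Its distribution: the map from each value to the probability of that value.
pmf = defaultdict(Fraction)
for outcome, p in prob.items():
    pmf[number_of_heads(outcome)] += p

for value in sorted(pmf):
    print(value, pmf[value])

# The checks mentioned above: no negative probabilities, total equal to 1.
print(all(p >= 0 for p in pmf.values()), sum(pmf.values()) == 1)
```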

 alan2 Feb15-12 12:46 AM