What is the expected (squared) coefficient of variation of a sample?

  • #1
NotEuler
I've been trying to figure this out, but am not getting anywhere and was hoping someone here might know.
Say I have a distribution of which I know the variance and mean. I then take samples of n random variables from this distribution.
Without knowing anything more about the distribution, can I calculate the expectation of the coefficient of variation of the samples?
What about the squared coefficient of variation?
If I understand correctly, the expected value of the sample mean is just the mean of the distribution. And the variance of the sample mean is the variance of the distribution divided by n.
But I haven't been able to get much further than that.

Perhaps another way to ask the same is: If I have n iid random variables, what is their expected (squared) coefficient of variation?
 
  • #2
The coefficient of variation, and also its square, will depend on the distribution of the random variable. That's because it's a nonlinear function of the samples, so things don't cancel out as they do for the mean and the variance, which are respectively linear in the sampled values and the squared sampled values.
Consider for example the following two random variables:
1. X is chosen from {1, 2, 3, 4, 5} with equal probability for each one.
2. Y is chosen from {2.293, 5.828} with probability of 0.8 on the lower number.
You can verify that X and Y have the same mean and variance. But if you simulate enough samples of size 10 and take the average of the sample coefficients of variation observed you should get something like 0.47 for X and 0.42 for Y.
In short, in most cases, distribution matters and we can't just use the mean and std dev. The cases where we can ignore distribution are generally special cases, or are linear combinations of random variables so we can use linearities to just add means and variances.
Even more concisely, the answer to the "can I?" question is No.
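For anyone who wants to reproduce this, here is a minimal Monte Carlo sketch (NumPy assumed; the sample CV below uses the Bessel-corrected standard deviation, ddof=1, one of the two conventions discussed later in the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 200_000

# X: uniform on {1, 2, 3, 4, 5}; Y: {2.293, 5.828} with P = 0.8 on the lower value.
# Both have mean 3 and variance 2 (up to rounding of Y's support points).
x = rng.integers(1, 6, size=(trials, n)).astype(float)
y = np.where(rng.random((trials, n)) < 0.8, 2.293, 5.828)

def mean_sample_cv(samples):
    # Average, over many size-n samples, of (sample std dev) / (sample mean).
    return (samples.std(axis=1, ddof=1) / samples.mean(axis=1)).mean()

cv_x, cv_y = mean_sample_cv(x), mean_sample_cv(y)
print(round(cv_x, 2), round(cv_y, 2))  # compare with the ~0.47 and ~0.42 quoted above
```

Despite equal means and variances, the two averages differ, which is the point: the expected sample CV depends on more than the first two moments.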
 
  • #3
Excellent, many thanks! Makes a lot of sense.

So does this mean that the 'best' one can do (for an expression of the expectation of the coefficient of variation of the samples) is an abstract expression like E(cv), or E(cv^2) for its square?

Or can something more be done if one knows more about the distribution, like skewness, kurtosis and so on?
Or perhaps this is a silly question. Maybe knowing 'all the central moments' is ultimately the same thing as knowing the whole distribution...
But I suppose there could be an expression that involves only a limited number of the central moments, like the first 3, 4 or 5...
 
  • #4
NotEuler said:
So does this mean that the 'best' one can do (for an expression of the expectation of the coefficient of variation of the samples) is an abstract expression like E(cv), or E(cv^2) for its square?
That can't be answered until you define precisely what you want "best" to mean. In the theory of statistical estimation, some terminologies for possible meanings of "best" for an estimator are:

Maximum likelihood
Minimum Variance
Unbiased
Least squares
Asymptotically efficient

Perhaps you can find a technical article involving some of this terminology and "square of the coefficient of variation".
 
  • #5
Stephen Tashi said:
That can't be answered until you define precisely what you want "best" to mean. In the theory of statistical estimation, some terminologies for possible meanings of "best" for an estimator are:

Maximum likelihood
Minimum Variance
Unbiased
Least squares
Asymptotically efficient

Perhaps you can find a technical article involving some of this terminology and "square of the coefficient of variation".

But is this really the same thing? I don't want to estimate anything, I want to find an expression for something (cv^2) in terms of exactly known properties of a distribution. I just don't know if that is actually possible (although I know from andrewkirk's reply that it is not possible in terms of variance and mean alone).
 
  • #6
NotEuler said:
But is this really the same thing? I don't want to estimate anything, I want to find an expression for something (cv^2) in terms of exactly known properties of a distribution.

I see what you mean. Does what you call "coefficient of variation of the samples" have a precise definition?

For example, to some people, the terminology "standard deviation of a sample" of N outcomes involves division by N and to others it involves division by N-1. We can use different estimators for the ##\sigma## of the distribution.

It seems to me that however you define cv^2, its purpose is to estimate something. So you might find an answer by looking at the literature of estimators.
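To make the N versus N-1 ambiguity concrete, here is a small sketch using Python's standard library (the data set is hypothetical); the two conventions give different sample CVs from the same data:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample, mean 5

# "Population" convention: divide the sum of squared deviations by N.
sd_n = statistics.pstdev(data)   # sqrt(32 / 8) = 2.0
# Bessel-corrected "sample" convention: divide by N - 1.
sd_n1 = statistics.stdev(data)   # sqrt(32 / 7), about 2.138

mean = statistics.mean(data)
print(sd_n / mean, sd_n1 / mean)  # 0.4 vs about 0.428: two different "sample CVs"
```

So any statement about "the" expected sample CV has to fix one of these conventions first.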
 
  • #7
I don't know of a general statement for the distribution of a sample coefficient of variation. I can say that in the case where the samples come from a normal distribution then

$$
\sqrt n \left(\frac{s}{\overline{x}} - \frac {\sigma}{\mu}\right) \xrightarrow{d} \mathcal{N}\left(0,\frac{\sigma^2}{2\mu^2} + \frac{\sigma^4}{\mu^4}\right)
$$

That might give you some insight for a [very special] case.
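A quick Monte Carlo sketch (hypothetical parameters mu = 10, sigma = 2, NumPy assumed) that checks this limit numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, trials = 10.0, 2.0, 500, 20_000  # hypothetical parameters

# Draw many normal samples and form sqrt(n) * (sample CV - population CV).
samples = rng.normal(mu, sigma, size=(trials, n))
cv_hat = samples.std(axis=1, ddof=1) / samples.mean(axis=1)
z = np.sqrt(n) * (cv_hat - sigma / mu)

# The asymptotic variance from the displayed limit:
asym_var = sigma**2 / (2 * mu**2) + sigma**4 / mu**4  # = 0.0216 here
print(z.var(), asym_var)  # the two should be close for large n
```

The empirical variance of z should land near 0.0216, in line with the limit above.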
 

1. What is the coefficient of variation (CV)?

The coefficient of variation (CV) is a measure of relative variability that is used to compare the variability of different data sets. It is calculated by dividing the standard deviation by the mean, and is often multiplied by 100 to express it as a percentage.
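As a hypothetical worked example (plain Python, population standard deviation):

```python
data = [12.0, 15.0, 9.0, 14.0, 10.0]  # hypothetical data set
mean = sum(data) / len(data)                                   # 12.0
std = (sum((v - mean) ** 2 for v in data) / len(data)) ** 0.5  # sqrt(5.2)
cv_percent = std / mean * 100
print(round(cv_percent, 1))  # 19.0
```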

2. How is the squared coefficient of variation (CV2) calculated?

The squared coefficient of variation (CV2) is calculated by squaring the coefficient of variation (CV); equivalently, it is the variance divided by the square of the mean. Squaring removes the sign, so CV2 is non-negative even for data sets whose mean (and hence whose CV) is negative.

3. What does the squared coefficient of variation (CV2) tell us about a sample?

The squared coefficient of variation (CV2) measures the variance of a sample relative to the square of its mean. A lower CV2 indicates that the sample has a smaller relative variability, while a higher CV2 indicates a larger relative variability.

4. How does the squared coefficient of variation (CV2) differ from the coefficient of variation (CV)?

The squared coefficient of variation (CV2) differs from the coefficient of variation (CV) in scale, although both are dimensionless. While the CV is often expressed as a percentage, the CV2 is usually expressed as a proportion or decimal value.

5. Why is the squared coefficient of variation (CV2) important in statistical analysis?

The squared coefficient of variation (CV2) is important in statistical analysis because it provides a measure of relative variability that can be used to compare different data sets. It is especially useful when comparing data sets with different means, as it measures the spread relative to the magnitude of the mean. Additionally, squaring removes the sign, which simplifies interpretation when means can be negative.
