Would measures like skewness or kurtosis magnify uncertainty?

JoAuSc
#1
Apr25-07, 01:16 PM
P: 200
Let's say you've given out surveys where people have to respond whether they think the technology is "not significant/little significance/moderately significant/..." so that there are six choices total, on a scale from 1 to 6. After collecting a few dozen or so of these surveys you get a distribution.

Obviously, there's a large amount of uncertainty in the measurement, probably around + or -1. If you try to take statistical measures such as the skewness or kurtosis, which involve cubing or taking the fourth power of the deviations from the mean, would that magnify the resulting uncertainty to the point where these measures would be unusable?
EnumaElish
#2
Apr25-07, 02:10 PM
Sci Advisor
HW Helper
P: 2,481
That depends on your objective. When defining uncertainty, do you think that "a few people erring significantly" tells you more about the extent of the uncertainty than "a lot of people erring slightly"? Then, skewness and kurtosis should be part of your definition. (I do not fully understand when you say "they magnify uncertainty." Do you have in mind some kind of additive uncertainty function which adds together the 2nd through the 4th moments, unweighted?) Another issue is whether we are talking about the uncertainty of x (usu. measured as standard deviation = s), or the uncertainty of "x bar" (usu. measured as standard error = s/√n)? If the latter, then each additional moment will have to be weighted by a function of the sample size.
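
To make the distinction between the uncertainty of x and the uncertainty of "x bar" concrete, here is a minimal Python sketch. The responses are made-up numbers on the 1-6 scale and the variable names are illustrative assumptions, not data from the survey.

```python
import math
import statistics

# Hypothetical responses on the 1-6 scale (made-up data, for illustration only).
responses = [2, 3, 3, 4, 4, 4, 5, 5, 6, 3, 4, 2]

n = len(responses)
s = statistics.stdev(responses)   # sample standard deviation: spread of individual answers
se = s / math.sqrt(n)             # standard error: uncertainty of the sample mean

print(f"n = {n}, s = {s:.3f}, s/sqrt(n) = {se:.3f}")
```

The standard error shrinks as more surveys are collected, while s itself settles toward the spread of opinion in the population.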
JoAuSc
#3
Apr26-07, 12:14 PM
P: 200
Quote by EnumaElish:
That depends on your objective. When defining uncertainty, do you think that "a few people erring significantly" tells you more about the extent of the uncertainty than "a lot of people erring slightly"? Then, skewness and kurtosis should be part of your definition. (I do not fully understand when you say "they magnify uncertainty." Do you have in mind some kind of additive uncertainty function which adds together the 2nd through the 4th moments, unweighted?) Another issue is whether we are talking about the uncertainty of x (usu. measured as standard deviation = s), or the uncertainty of "x bar" (usu. measured as standard error = s/√n)? If the latter, then each additional moment will have to be weighted by a function of the sample size.
Let me try to clarify. When people are polled, if their opinions are measured exactly, then there should be a certain spread of data. In addition, there's uncertainty in that 1.) people must choose from six discrete data points rather than a continuum, and 2.) that people may not agree on whether a certain view of the future is "moderately significant" or "slightly significant", even if they agree exactly on the specific forecast. I'm guessing the additional uncertainty (that is, in addition to the natural spread of opinion) is about plus or minus one. I'm also guessing that everyone's slight uncertainty is more important than a few people's out-of-the-ballpark guesses. Thus, according to what you said, skewness and kurtosis would not be necessary; however, I'm trying to analyze the natural spread, not the uncertainties I just mentioned. My question is that in the process of calculating skewness, kurtosis, etc., would the additional errors propagate themselves enough so that the end result is too uncertain to be useful?

EnumaElish
#4
Apr26-07, 01:46 PM
Sci Advisor
HW Helper
P: 2,481
The technical term for the problem(s) you described is measurement error, or errors-in-variables.

Let's take each of the problems in turn. For the "rounding problem," suppose 20 people circled option 2. In reality, they might be distributed uniformly from 1.5 to 2.4 with an increment of 0.1. Under this assumption, the observed responses aren't going to be especially skewed or kurtic relative to the true responses, even though they have a lower variance.

For the "interpretation problem" (I'll rename this the "trembling hand problem") suppose that 5 of the 20 people who circled "2" meant to circle "1" while another 5 meant to circle "3" but they all ended up circling 2 because "their hands trembled." (Is this a fair interpretation of your description?) For this problem, too, the observed responses aren't going to be especially skewed or kurtic relative to the true responses, even though they have a lower variance.

What is important is how you think the true responses are distributed relative to the observed ones. If the true responses are more or less evenly distributed with respect to the observed ones, then they are not going to make much of a difference for the moments greater than the 2nd.

Finally, you can easily run some simulations in Excel, and use the "SKEW" and the "KURT" functions to compare "true" responses with "observed" responses (of hypothetical respondents).
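
The same kind of simulation can be sketched in Python instead of Excel. Everything below is an assumption made for illustration: the "true" opinions are drawn from a normal distribution centred at 4, the rounding problem is modelled by rounding to the nearest option, and the trembling-hand problem by shifting a fraction of answers up or down by one. The skewness and kurtosis here are the plain moment-based versions; Excel's SKEW and KURT apply small-sample corrections, so their values will differ somewhat.

```python
import random
import statistics

def skewness(xs):
    """Moment-based skewness: mean cubed deviation divided by sd**3."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)

def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3 (zero for a normal distribution)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (len(xs) * sd ** 4) - 3.0

def simulate(n=40, tremble=0.25, seed=0):
    """Return (true, observed) responses for n hypothetical respondents."""
    rng = random.Random(seed)
    # Assumed true opinions: continuous, centred near 4, kept on the scale.
    true = [min(6.5, max(0.5, rng.gauss(4.0, 1.2))) for _ in range(n)]
    observed = []
    for t in true:
        r = min(6, max(1, round(t)))          # rounding problem
        if rng.random() < tremble:            # trembling-hand problem
            r = min(6, max(1, r + rng.choice([-1, 1])))
        observed.append(r)
    return true, observed

true, observed = simulate()
for name, xs in (("true", true), ("observed", observed)):
    print(f"{name:9s} skew = {skewness(xs):+.3f}  "
          f"excess kurtosis = {excess_kurtosis(xs):+.3f}")
```

Rerunning with different seeds, sample sizes, and tremble rates gives a feel for how far the observed moments wander from the true ones.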
EnumaElish
#5
Apr26-07, 01:50 PM
Sci Advisor
HW Helper
P: 2,481
From a purely technical point of view, as long as the additional uncertainty is between -1 and +1, shouldn't increasing the power term reduce, not increase, its effect?
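
One way to examine this is to expand the powers. The symbols are assumptions introduced here: d is a respondent's true deviation from the mean and e is the additional error, ignoring any small shift the errors cause in the sample mean itself.

```latex
(d + e)^3 = d^3 + 3d^2 e + 3d e^2 + e^3
(d + e)^4 = d^4 + 4d^3 e + 6d^2 e^2 + 4d e^3 + e^4
```

The pure error terms e^3 and e^4 do shrink when |e| < 1, but the cross terms carry powers of d as well. Assuming e has mean zero and is independent of d, the averaged third moment picks up only E[e^3], while the averaged fourth moment picks up 6 E[d^2] E[e^2] + E[e^4], so the cross term with the true spread, rather than e^4 alone, sets the size of the effect.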
matt grime
#6
Apr26-07, 01:53 PM
Sci Advisor
HW Helper
P: 9,396
Of course, the idea of assigning numeric values to opinions is utterly flawed in the first place. Why 1-6 for the 6 choices? Why not 1,2,4,6,7,8 as the values?
EnumaElish
#7
Apr26-07, 02:02 PM
Sci Advisor
HW Helper
P: 2,481
Excellent point. Matt, as for the quotation, the farthest I could get is:

To say that Gell-Mann "discovered" the quark is not quite right. All of his great breakthroughs came from playing with symbols on paper and chalkboards. His most important tools, he liked to say, were pencil, paper, and wastebasket.
from http://www.randomhouse.com/knopf/cat...0&view=excerpt
JoAuSc
#8
Apr26-07, 04:08 PM
P: 200
Quote by EnumaElish:
The technical terms for the problem(s) you described is measurement error, or errors-in-variable.

Let's take each of the problems in turn. For the "rounding problem," suppose 20 people circled option 2. In reality, they might be distributed uniformly from 1.5 to 2.4 with an increment of 0.1. Under this assumption, the observed responses aren't going to be especially skewed or kurtic relative to the true responses, even though they have a lower variance.

For the "interpretation problem" (I'll rename this the "trembling hand problem") suppose that 5 of the 20 people who circled "2" meant to circle "1" while another 5 meant to circle "3" but they all ended up circling 2 because "their hands trembled." (Is this a fair interpretation of your description?)

I think it's a good way of thinking about it, though it's more of a trembling-mind problem.

Quote by EnumaElish:
For this problem, too, the observed responses aren't going to be especially skewed or kurtic relative to the true responses, even though they have a lower variance.

What is important is how you think the true responses are distributed relative to the observed ones. If the true responses are more or less evenly distributed with respect to the observed ones, then they are not going to make much of a difference for the moments greater than the 2nd.

Finally, you can easily run some simulations in Excel, and use the "SKEW" and the "KURT" functions to compare "true" responses with "observed" responses (of hypothetical respondents).

Thanks. I'll try that.

Quote by matt grime:
Of course the idea that assigning numeric values to opinions is utterly flawed in the first place. Why 1-6 for the 6 choices? Why not 1,2,4,6,7,8 as the values?
That's a good point. (I don't really have a choice in the matter, it wasn't my team that took the data, but I'll try to address this anyway.) I wouldn't say "utterly flawed", because it's obvious we can order the responses like we can order numbers, i.e. there's probably some kind of 1D continuum when it comes to the answer to "how significant would a breakthrough in solar sails be?", at least if we defined "breakthrough" precisely enough. FYI, here's an example of the key which was on the survey:

significance:
1 - trivial
2 - marginal significance
3 - small significance
4 - moderate significance
5 - major significance
6 - revolutionary

Nevertheless, I agree that there's no good reason that (small - marginal) should equal (marginal - trivial), for example. We don't know how large the distances between successive data values are, so we can't do reliable math on this set, making figures such as the mean and the standard deviation (as well as the skewness and the kurtosis) less meaningful.
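
The dependence on the numeric coding can be illustrated with a short Python sketch. The counts per category are made up, and the two codings are the 1-6 scale and the 1,2,4,6,7,8 alternative suggested above; the helper names are assumptions chosen for this example.

```python
import statistics

labels = ["trivial", "marginal significance", "small significance",
          "moderate significance", "major significance", "revolutionary"]

# Hypothetical number of respondents per category (made-up data).
counts = [2, 5, 9, 12, 8, 4]

codings = {
    "1-6": [1, 2, 3, 4, 5, 6],
    "alternative": [1, 2, 4, 6, 7, 8],
}

def expand(values, counts):
    """Turn per-category counts into one numeric score per respondent."""
    return [v for v, c in zip(values, counts) for _ in range(c)]

def skewness(xs):
    """Moment-based skewness: mean cubed deviation divided by sd**3."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)

def median_category(counts):
    """0-based index of the category that contains the median respondent."""
    half = sum(counts) / 2
    running = 0
    for i, c in enumerate(counts):
        running += c
        if running >= half:
            return i

for name, values in codings.items():
    scores = expand(values, counts)
    print(f"{name:12s} mean = {statistics.fmean(scores):.3f}  "
          f"skew = {skewness(scores):+.3f}")

# The median category uses only the ordering, so it ignores the coding.
print("median category:", labels[median_category(counts)])
```

The mean and skewness change with the coding, while the median category depends only on the order of the options.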

Perhaps I'm looking at analyzing this data the wrong way. I'd rather not do just the median and mode, since those indicators don't take into account most of the data values, but maybe something like quantiles. Let me know if you guys have any further ideas.
EnumaElish
#9
Apr26-07, 04:37 PM
Sci Advisor
HW Helper
P: 2,481
Quote by matt grime:
Of course the idea that assigning numeric values to opinions is utterly flawed in the first place. Why 1-6 for the 6 choices? Why not 1,2,4,6,7,8 as the values?
http://en.wikipedia.org/wiki/Categor...ication_scheme
EnumaElish
#10
Apr26-07, 04:38 PM
Sci Advisor
HW Helper
P: 2,481
Quote by JoAuSc:
Let me know if you guys have any further ideas.
http://en.wikipedia.org/wiki/Analysi...tegorical_data
JoAuSc
#11
Apr30-07, 05:24 PM
P: 200
Thanks for the help. Looking at one of the Wikipedia articles EnumaElish posted, I saw a paper mentioned called "On The Statistical Treatment of Football Numbers", which led me to an old book called "Readings in Statistics", which had a lot of helpful information on this sort of stuff. Interestingly, a paper in there by C. Alan Boneau claims that, assuming you can definitely match a number to an opinion without having to make comparisons, then you can compare two populations with some altered variable by comparing their means, std. dev.'s, etc. In other words, sometimes calculating the mean of ordinal numbers is helpful. I don't know if it's legitimate enough in our case, but at least there's some evidence on our side.

My project partner and I plan to take the mean, std. dev., skewness, median, mode, and the absolute deviation from the scores we ourselves get from filling out the survey. As much as I want to try some methods more appropriate to ordinal data, we're trying to get this thing done by Thursday, so we don't really have time.
EnumaElish
#12
May1-07, 10:27 AM
Sci Advisor
HW Helper
P: 2,481
This prompted me to read Section 7.4, "Ordered Responses," in K. E. Train's book Discrete Choice Methods with Simulation.[1] Suppose each respondent's answer is based on how much income (or utility, or "progress"), denoted Y, he or she expects to get out of the technological innovation being surveyed. The standard assumption in a discrete choice model is that the respondent chooses the k'th response if y_k < Y < y_{k+1}, where y_1, ..., y_{n+1} are the cutoff values of Y for n choices. This leads me to think that as long as the cutoff values are well-specified (even if only conceptually), the ordinal (i.e. the nominal) values of the responses do not matter. That is, the underlying "latent" distribution of the Y's (predicted from or "fitted to" the observed responses) will be independent of the nominal values of the responses.

To be even more concrete, suppose 10 people out of 100 replied "third highest response". Then Prob(y_3 < Y < y_4) = 0.1. The nominal value of the "third highest response" does not matter.
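
As a rough illustration of why the labels drop out, the sketch below recovers cutoffs on a latent scale from response shares alone. It assumes the latent Y is standard normal (an ordered-probit style assumption that the post does not specify), and the shares are made-up numbers.

```python
from statistics import NormalDist

# Hypothetical share of respondents in each category, lowest to highest.
shares = [0.05, 0.10, 0.25, 0.30, 0.20, 0.10]

def latent_cutoffs(shares):
    """Interior cutoffs of a standard-normal latent variable.

    Only the cumulative shares enter the calculation; the numeric labels
    attached to the categories (1-6 or anything else) never appear.
    """
    z = NormalDist()
    cumulative = 0.0
    cutoffs = []
    for p in shares[:-1]:          # the top category has no upper cutoff
        cumulative += p
        cutoffs.append(z.inv_cdf(cumulative))
    return cutoffs

cutoffs = latent_cutoffs(shares)
print([round(c, 3) for c in cutoffs])

# Check: the probability mass between the first two cutoffs is the second share.
z = NormalDist()
print(round(z.cdf(cutoffs[1]) - z.cdf(cutoffs[0]), 3))
```

Relabelling the six options changes nothing in this calculation, which is the sense in which the fitted latent distribution does not depend on the nominal values.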

[1] See Multinomial logit.

