EnumaElish said:
The technical term for the problem(s) you described is measurement error, or errors-in-variables.
Let's take each of the problems in turn. For the "rounding problem," suppose 20 people circled option 2. In reality, they might be distributed uniformly from 1.5 to 2.4 with an increment of 0.1. Under this assumption, the observed responses aren't going to be especially skewed or kurtic relative to the true responses, even though they have a lower variance.
For the "interpretation problem" (I'll rename this the "trembling hand problem") suppose that 5 of the 20 people who circled "2" meant to circle "1" while another 5 meant to circle "3" but they all ended up circling 2 because "their hands trembled." (Is this a fair interpretation of your description?)
I think it's a good way of thinking about it, though it's more of a trembling-mind problem. For this problem, too, the observed responses aren't going to be especially skewed or kurtic relative to the true responses, even though they have a lower variance.
What is important is how you think the true responses are distributed relative to the observed ones.
EnumaElish said:
If the true responses are more or less evenly distributed with respect to the observed ones, then they are not going to make much of a difference for the moments greater than the 2nd.
Finally, you can easily run some simulations in Excel, and use the "SKEW" and the "KURT" functions to compare "true" responses with "observed" responses (of hypothetical respondents).
Thanks. I'll try that.
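For anyone who would rather script it than use a spreadsheet, here is a minimal Python sketch of that kind of simulation. Everything in it is an assumption for illustration: the "true" opinions are drawn from a normal distribution centred on 3.5, the sample size is 2000, and each respondent circles the nearest option (so 1.5 to 2.4 all become 2, as in the rounding example). The two helper functions follow the bias-corrected sample formulas that Excel documents for SKEW and KURT, so the numbers should be comparable to what the spreadsheet would give.

```python
import math
import random


def excel_skew(xs):
    """Sample skewness, using the bias-corrected formula behind Excel's SKEW."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))  # sample std. dev.
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def excel_kurt(xs):
    """Sample excess kurtosis, using the formula behind Excel's KURT."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    fourth = sum(((x - m) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def to_observed(true_value):
    """Circle the nearest option on the 1..6 scale (1.5 up to 2.4 becomes 2)."""
    return min(6, max(1, math.floor(true_value + 0.5)))


# Hypothetical respondents: the distribution and its parameters are assumptions.
rng = random.Random(0)
true_responses = [rng.gauss(3.5, 1.0) for _ in range(2000)]
observed_responses = [to_observed(t) for t in true_responses]

for name, data in (("true", true_responses), ("observed", observed_responses)):
    print(f"{name:>8}: skew = {excel_skew(data):+.3f}, kurt = {excel_kurt(data):+.3f}")
```

Swapping in a different distribution for the true responses (skewed, bimodal, and so on) is the interesting part, since the whole question is how the true responses sit relative to the observed ones.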
matt grime said:
Of course the idea of assigning numeric values to opinions is utterly flawed in the first place. Why 1-6 for the 6 choices? Why not 1,2,4,6,7,8 as the values?
That's a good point. (I don't really have a choice in the matter, it wasn't my team that took the data, but I'll try to address this anyway.) I wouldn't say "utterly flawed", because it's obvious we can order the responses the way we can order numbers, i.e. there's probably some kind of 1D continuum when it comes to the answer to "how significant would a breakthrough in solar sails be?", at least if we defined "breakthrough" precisely enough. FYI, here's an example of the key which was on the survey:
significance:
1 - trivial
2 - marginal significance
3 - small significance
4 - moderate significance
5 - major significance
6 - revolutionary
Nevertheless, I agree that there's no good reason that (small - marginal) should equal (marginal - trivial), for example. We don't know how large the distances between successive data values are, so we can't do reliable math on this set, making figures such as the mean and the standard deviation (as well as the skewness and the kurtosis) less meaningful.
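To make the coding issue concrete, here is a small Python sketch that scores the same responses under both codings (1 to 6, and the 1,2,4,6,7,8 alternative). The response counts are made up purely for illustration; they are not the survey's data. The mean moves when the coding changes, while the median still lands on the same label, because the median only depends on the ordering.

```python
from statistics import mean, median_low

labels = ["trivial", "marginal", "small", "moderate", "major", "revolutionary"]

# Hypothetical number of respondents per option (NOT the real survey data).
counts = [2, 5, 8, 12, 9, 4]

coding_a = [1, 2, 3, 4, 5, 6]   # the coding used on the survey
coding_b = [1, 2, 4, 6, 7, 8]   # the alternative coding from the quoted post


def expand(coding, counts):
    """Turn per-option counts into one numeric score per respondent."""
    return [value for value, c in zip(coding, counts) for _ in range(c)]


def summarize(coding, counts):
    """Return (mean score, label of the median response) under a coding."""
    data = expand(coding, counts)
    median_value = median_low(data)  # always an actual option value
    return mean(data), labels[coding.index(median_value)]


mean_a, median_label_a = summarize(coding_a, counts)
mean_b, median_label_b = summarize(coding_b, counts)

print(f"coding A: mean = {mean_a:.3f}, median = {median_label_a}")
print(f"coding B: mean = {mean_b:.3f}, median = {median_label_b}")
```

The same holds for any order-preserving recoding: statistics built on ranks survive it, statistics built on distances do not.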
Perhaps I'm looking at analyzing this data the wrong way. I'd rather not do just the median and mode, since those indicators don't take into account most of the data values, but maybe something like quantiles. Let me know if you guys have any further ideas.
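As a rough idea of what the quantile approach might look like, here is a Python sketch that reports cumulative proportions and the category in which each quartile falls. The counts are again hypothetical, and `ordinal_quantile` is just a helper name I made up. It uses only the ordering of the options, so it doesn't depend on how far apart the options "really" are.

```python
labels = ["trivial", "marginal", "small", "moderate", "major", "revolutionary"]

# Hypothetical number of respondents per option (NOT the real survey data).
counts = [2, 5, 8, 12, 9, 4]


def cumulative_proportions(counts):
    """Fraction of respondents at or below each option."""
    total = sum(counts)
    running = 0
    result = []
    for c in counts:
        running += c
        result.append(running / total)
    return result


def ordinal_quantile(counts, q):
    """Index of the first option whose cumulative count reaches fraction q."""
    total = sum(counts)
    running = 0
    for index, c in enumerate(counts):
        running += c
        if running >= q * total:
            return index
    return len(counts) - 1


for label, p in zip(labels, cumulative_proportions(counts)):
    print(f"{label:>13}: {p:.3f} of respondents at or below")

for q in (0.25, 0.50, 0.75):
    print(f"quantile {q:.2f} falls in: {labels[ordinal_quantile(counts, q)]}")
```

Unlike the median and mode alone, the cumulative table uses every response, which was my worry about those two indicators.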