statdad
Homework Helper
"I did not generate multiple data sets and measure m and b across them, so it is not valid to say that m and b are statistics in my analysis; they are simply dependent variables which were chosen such that, when treated as constants in a linear equation, a certain criterion is minimized."
Not true at all. Whenever you have a set of random data, collected or generated, the slope and intercept calculated by least squares are statistics. It may be awkward, or extremely difficult, to define a population, but they are statistics nevertheless. If you are saying they don't have a distribution because they are based on a single sample, remember that we think of them just as we do every statistic: as specific realizations of a random quantity.
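To make the "specific realization" point concrete, here is a minimal Python sketch. It assumes data generated from Y = mx + b + ε with normal noise; the true values (m = 2, b = 1, σ = 1), the x grid, and the function names are illustrative choices, not anything from the thread. The least-squares (m, b) is just a function of the data, so each new data set yields a different value of that function.

```python
import random

def least_squares(xs, ys):
    """Return (m, b) minimizing the sum of squared residuals for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b

def draw_sample(xs, m_true, b_true, sigma, rng):
    """Generate one data set from Y = m*x + b + eps, eps ~ N(0, sigma^2) (assumed model)."""
    return [m_true * x + b_true + rng.gauss(0.0, sigma) for x in xs]

rng = random.Random(1)
xs = [float(i) for i in range(10)]

# Three independent data sets give three different realizations of the statistic (m, b).
for _ in range(3):
    ys = draw_sample(xs, 2.0, 1.0, 1.0, rng)
    print(least_squares(xs, ys))
```

Analyzing only one data set just means you observe one of these realizations; the statistic itself is still random.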
"Although one could choose to generate multiple data sets, and look at the distribution of the m and b statistics across those data sets, this would not be useful in any way ..."
The distributions of the slope and intercept are conceptualized in exactly the same way as the distributions of sample means, standard deviations, etc. In textbooks these distributions are always normal or t (under the standard assumptions); in real life not so much, but the idea is the same.
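As a rough illustration, the Python sketch below repeatedly generates data from the same assumed model and looks at the least-squares slope across data sets, just as one would look at the sampling distribution of a mean. Under the textbook assumptions (fixed x values, independent N(0, σ²) errors) the slope is normal with mean m and standard deviation σ/√Sxx, where Sxx = Σ(xᵢ − x̄)². The parameter values, number of replications, and function names are illustrative assumptions.

```python
import math
import random
import statistics

def least_squares_slope(xs, ys):
    """Least-squares slope: Sxy / Sxx."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

def simulate_slopes(xs, m_true, b_true, sigma, reps, seed=0):
    """Slope estimates from `reps` independent data sets, Y = m*x + b + N(0, sigma^2) noise."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        ys = [m_true * x + b_true + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(least_squares_slope(xs, ys))
    return slopes

xs = [float(i) for i in range(10)]
sigma = 1.0
slopes = simulate_slopes(xs, 2.0, 1.0, sigma, reps=20000)

x_bar = sum(xs) / len(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)

print(statistics.mean(slopes))   # should be close to the true slope, 2.0
print(statistics.stdev(slopes))  # should be close to the theoretical value below
print(sigma / math.sqrt(sxx))    # textbook standard deviation of the slope
```

If the errors are not normal, the histogram of `slopes` can look different from the textbook picture, but it is still the sampling distribution of the slope.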
"Therefore, the full class of available data sets has infinite range, and therefore cannot be randomly sampled and has no distribution. Even if you restricted the analysis to a fixed class of problems, say, where $Y = mx + b + \epsilon$, then you still have an infinite range of parameters which cannot be sampled!"
I'm not even sure what you mean here; it makes no sense.