- #1
DaleSwanson
I took a linear algebra course in the spring and became interested in the least squares method for building models. To practice, I tried to build a model that predicts ticket sales for the Mega Millions lottery from the jackpot amount. I have 249 data pairs of jackpot and ticket sales from past drawings. I tried a number of possible models and came up with two that work reasonably well:
Model 1: [itex]y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2[/itex]
Model 2: [itex]y = \beta_0 e^{\beta_1 x_1}[/itex]
where [itex]x_1[/itex] is the jackpot and [itex]y[/itex] is ticket sales.
I found the coefficients for both as well as the normal distance from the vector representing the actual data. After that I put the models into a spreadsheet and had it calculate the predicted sales for all the past drawings I had data for. I then took the average of the percent errors of all those predictions.
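Written out, with [itex]\hat{y}_i[/itex] as the predicted sales for drawing [itex]i[/itex] and [itex]n = 249[/itex] (notation introduced here), and taking the normal distance to mean the Euclidean norm of the residual vector (an assumption), the two measures would be:
Normal distance: [itex]\|y - \hat{y}\| = \sqrt{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}[/itex]
Average percent error: [itex]\frac{100}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{|y_i|}[/itex]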
The normal distance for model 1 was 52.94, and for model 2 was 0.7446. The percent errors were reversed though. Model 1 had the lower average error of 5.83%, and model 2 had an average error of 6.97%.
I realize a model is best when its normal distance is minimized. However, can I compare that distance between models of different dimensions? If not, that would explain why model 1 seems to do a better job based on percent errors but has a much larger normal distance. On the other hand, if comparisons between models of different dimensions are valid, why do these two methods of evaluating model accuracy disagree?
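To make the comparison concrete, here is a minimal Python sketch of both computations using NumPy. It runs on made-up data, not the 249 real drawings, and the function names are hypothetical. It also assumes model 2 is fit by taking the logarithm of [itex]y[/itex] so that ordinary least squares applies. That is only one way the exponential fit might have been done; under that assumption, the residual norm for model 2 is measured in log units rather than in ticket-sales units, so the sketch reports it both ways.

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x^2; returns (coefficients, predictions)."""
    A = np.column_stack([np.ones_like(x), x, x**2])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, A @ beta

def fit_exponential_log(x, y):
    """Fit y = b0 * exp(b1*x) by least squares on ln(y).

    Assumption: the exponential model is linearized with a logarithm.
    Returns (coefficients, predictions in the original units of y).
    """
    A = np.column_stack([np.ones_like(x), x])
    c, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    b0, b1 = np.exp(c[0]), c[1]
    return np.array([b0, b1]), b0 * np.exp(b1 * x)

def residual_norm(actual, predicted):
    """Euclidean norm of the residual vector (actual - predicted)."""
    return float(np.linalg.norm(np.asarray(actual) - np.asarray(predicted)))

def mean_percent_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / np.abs(actual)) * 100.0)

# Synthetic stand-in data (NOT the real lottery data): jackpot x, sales y.
rng = np.random.default_rng(0)
x = np.linspace(12.0, 300.0, 50)
y = 15.0 * np.exp(0.006 * x) * rng.lognormal(0.0, 0.05, x.size)

beta_q, pred_q = fit_quadratic(x, y)
beta_e, pred_e = fit_exponential_log(x, y)

# Residual norms: model 1 in original units, model 2 in log units and original units.
norm_q = residual_norm(y, pred_q)
norm_e_log = residual_norm(np.log(y), np.log(pred_e))
norm_e_orig = residual_norm(y, pred_e)

print("model 1 norm (original units):", norm_q)
print("model 2 norm (log units):     ", norm_e_log)
print("model 2 norm (original units):", norm_e_orig)
print("model 1 mean percent error:   ", mean_percent_error(y, pred_q))
print("model 2 mean percent error:   ", mean_percent_error(y, pred_e))
```

In this sketch, only the two norms computed in the original units of [itex]y[/itex] are measured on the same scale. The percent errors are both computed from predictions in the original units, so they are directly comparable.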