Least squares approximation: Is smaller normal distance always better?

SUMMARY

The discussion focuses on the application of the least squares method for modeling ticket sales based on jackpot amounts in the Mega Millions lottery. Two models were evaluated: Model 1 (y = β_0 + β_1 x_1 + β_2 x_1^2) and Model 2 (y = β_0 * e^{β_1 * x_1}). Despite Model 1 having a larger normal distance of 52.94 compared to Model 2's 0.7446, it achieved a lower average percent error of 5.83% versus 6.97%. The conversation highlights the complexities of comparing models of different dimensions and the implications of using various norms in regression analysis.

PREREQUISITES
  • Understanding of least squares regression
  • Familiarity with regression coefficients and model fitting
  • Knowledge of statistical norms (1-norm, 2-norm)
  • Basic concepts of numerical analysis and interpolation
NEXT STEPS
  • Explore singular value decomposition and its applications in regression
  • Learn about principal component analysis (PCA) for dimensionality reduction
  • Study advanced interpolation techniques such as BSPLINES and NURBS
  • Investigate optimization problems in the context of general linear models
USEFUL FOR

Data scientists, statisticians, and anyone involved in predictive modeling and regression analysis will benefit from this discussion, particularly those interested in understanding model accuracy and the implications of different fitting techniques.

DaleSwanson
I took a linear algebra course in the spring and was interested in the least squares method for building models. I decided to practice this concept by attempting to build a model that would predict ticket sales for the Mega Millions lottery given the jackpot amount. I have 249 data pairs of jackpot amount and ticket sales from past drawings. I tried a bunch of possible models and came up with two that work reasonably well:
Model 1: y = β_0 + β_1 x_1 + β_2 x_1^2
Model 2: y = β_0 * e^{β_1 * x_1}
where x_1 is the jackpot and y is ticket sales.

I found the coefficients for both as well as the normal distance from the vector representing the actual data. After that I put the models into a spreadsheet and had it calculate the predicted sales for all the past drawings I had data for. I then took the average of the percent errors of all those predictions.

The normal distance for model 1 was 52.94, and for model 2 was 0.7446. The percent errors were reversed though. Model 1 had the lower average error of 5.83%, and model 2 had an average error of 6.97%.

I realize the model is best when the normal distance is minimized; however, I'm wondering if I can compare that distance between models of different dimensions? If not, then that would explain why model 1 seems to do a better job based on percent errors but has a much larger normal distance. On the other hand, if comparisons between models of different dimensions are valid, then why the discrepancy between what these two methods of evaluating model accuracy tell me?
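For readers who want to reproduce the comparison, here is a minimal sketch in Python with NumPy (not the original spreadsheet work). The `jackpot` and `sales` arrays are hypothetical placeholders for the 249 real data pairs, and Model 2 is fitted by one common route, taking logs so that ordinary least squares applies; note that the residual norm of that fit is then measured in log units, so it lives on a different scale than Model 1's.

```python
# Minimal sketch: fit both candidate models by least squares and compute the
# two accuracy measures discussed above.  The data below are placeholders.
import numpy as np

jackpot = np.array([12.0, 25.0, 41.0, 68.0, 105.0, 160.0])  # hypothetical jackpots
sales = np.array([18.0, 21.0, 27.0, 38.0, 61.0, 110.0])     # hypothetical ticket sales

# Model 1: y = b0 + b1*x + b2*x^2 is linear in its coefficients,
# so ordinary least squares applies directly.
A1 = np.column_stack([np.ones_like(jackpot), jackpot, jackpot**2])
beta, *_ = np.linalg.lstsq(A1, sales, rcond=None)
pred1 = A1 @ beta
dist1 = np.linalg.norm(sales - pred1)               # "normal distance" = 2-norm of the residual

# Model 2: y = b0 * exp(b1*x).  Taking logs gives ln(y) = ln(b0) + b1*x,
# which is again linear, so the least-squares fit is done in log space.
A2 = np.column_stack([np.ones_like(jackpot), jackpot])
gamma, *_ = np.linalg.lstsq(A2, np.log(sales), rcond=None)
pred2 = np.exp(A2 @ gamma)
dist2 = np.linalg.norm(np.log(sales) - A2 @ gamma)  # residual norm, but in *log* units

# Average absolute percent error for each model.
ape1 = np.mean(np.abs(sales - pred1) / sales) * 100
ape2 = np.mean(np.abs(sales - pred2) / sales) * 100
print(dist1, dist2, ape1, ape2)
```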
 
DaleSwanson said:
I'm wondering if I can compare that distance between models of different dimensions?

Hey DaleSwanson.

For fitting models in a regression, there are different ways of calculating the regression coefficients.

The different ways depend on the decomposition used. Some methods use what is called a pseudo-inverse.

The different kinds of decomposition include the singular value decomposition, and some others I can't recall offhand. The pseudo-inverse comes into play in situations where the random variables are linearly dependent, or close enough to it that the matrix becomes ill-conditioned.
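As a rough illustration (my own sketch with synthetic data, not a method prescribed in the thread), the pseudo-inverse can be built explicitly from the SVD of the design matrix and reproduces the ordinary least-squares coefficients:

```python
# Pseudo-inverse via the SVD: A+ = V diag(1/s) U^T, and beta = A+ y solves
# the least-squares problem.  Synthetic data, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(10, 200, size=50)
y = 5.0 + 0.3 * x + 0.002 * x**2 + rng.normal(0, 2.0, size=50)

A = np.column_stack([np.ones_like(x), x, x**2])     # quadratic design matrix

# Explicit SVD-based pseudo-inverse (small singular values would be truncated
# in an ill-conditioned problem).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
beta_svd = Vt.T @ np.diag(1.0 / s) @ U.T @ y

# It matches NumPy's built-in pinv and lstsq up to rounding.
beta_pinv = np.linalg.pinv(A) @ y
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(beta_svd, beta_lstsq), np.allclose(beta_pinv, beta_lstsq))
```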

You can also do regression using principal components, in what is known as principal component analysis (PCA). PCA effectively creates an orthogonal basis for the random variables, where the first linear combination maximizes variance and each subsequent one does the same while remaining orthogonal to (uncorrelated with) the earlier ones.
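A small synthetic sketch of that idea, assuming NumPy: center the predictors, take the SVD, and the right singular vectors give orthogonal directions ordered by the variance they capture; a regression could then be run on the leading component scores.

```python
# PCA via the SVD of the centered data matrix.  Synthetic, nearly collinear
# predictors, purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)    # nearly collinear with x1
X = np.column_stack([x1, x2])

Xc = X - X.mean(axis=0)                       # center each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                            # principal component scores (orthogonal)
explained = s**2 / np.sum(s**2)               # fraction of variance per component
print(explained)                              # the first component dominates here
```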

The key question is what you choose to minimize. With least squares, the idea is to minimize the total error of the fit, but that is not the only option, since there are many possible norms: the standard 1-norm and 2-norm, or other norms whose particular characteristics shape the resulting fit.

By engineering a custom norm, you may get features tied to that particular choice of norm, and an interpretation you would not get with the standard 1-norm or 2-norm. If you can construct such a norm with the properties you have in mind, then you are dealing with a more general fitting algorithm.
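To make the choice-of-norm point concrete, here is a hedged sketch (synthetic data, SciPy's general-purpose optimizer rather than any closed form) that fits the same straight line by minimizing the 1-norm and the 2-norm of the residual vector; the 1-norm fit is noticeably less sensitive to the injected outliers.

```python
# Fitting a line under two different residual norms with a generic optimizer.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, size=100)
y[::10] += 15.0                                   # a few large outliers

A = np.column_stack([np.ones_like(x), x])

def residual_norm(beta, p):
    """Norm of the residual vector under the chosen p-norm."""
    return np.linalg.norm(y - A @ beta, ord=p)

# Nelder-Mead is derivative-free, so it copes with the non-smooth 1-norm.
beta_2 = minimize(residual_norm, x0=np.zeros(2), args=(2,), method="Nelder-Mead").x
beta_1 = minimize(residual_norm, x0=np.zeros(2), args=(1,), method="Nelder-Mead").x
print(beta_2, beta_1)   # the 1-norm fit is pulled around less by the outliers
```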
 
As chiro said, there are many different ways to approximate a value, and they may well give different answers. It is, after all, only an "approximation". The reason we typically use "least squares" is that it relates to our usual idea of "distance" via the formula $\sqrt{x^2 + y^2 + z^2}$.

Other commonly used distance measures are $|x| + |y| + |z|$ (the 1-norm) and $\max(|x|, |y|, |z|)$ (the infinity norm).
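For concreteness, these three measures correspond to NumPy's vector norms with `ord=2`, `ord=1`, and `ord=np.inf` (a tiny illustrative example, not from the thread):

```python
# The three distance measures above, evaluated for one residual vector.
import numpy as np

r = np.array([3.0, -4.0, 1.0])
print(np.linalg.norm(r, ord=2))       # sqrt(x^2 + y^2 + z^2)  ->  about 5.10
print(np.linalg.norm(r, ord=1))       # |x| + |y| + |z|        ->  8.0
print(np.linalg.norm(r, ord=np.inf))  # max(|x|, |y|, |z|)     ->  4.0
```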
 
Well, thanks for both replies. It would seem my exposure to this is just the tip of a much larger iceberg.

I find the concept of building models to fit data very interesting. What sort of course would cover this in more detail? Would a calc-based intro to statistics course be good?
 
DaleSwanson said:
What sort of course would cover this in more detail?

For fitting deterministic models, the ideas of numerical analysis are very useful. Within that area there is the study of interpolation.

Interpolation, at its simplest, is the practice of finding a function that passes through the data points, given certain conditions. The simplest scheme is the Lagrange polynomial, which constructs a polynomial of degree at most n - 1 that passes through all n data points.
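A quick sketch of that (using SciPy and made-up points): for n points there is a unique interpolating polynomial of degree at most n - 1, and `scipy.interpolate.lagrange` builds it directly.

```python
# Lagrange interpolation through four made-up data points.
import numpy as np
from scipy.interpolate import lagrange

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 0.0, 5.0])

p = lagrange(x, y)           # degree-3 polynomial through the 4 points
print(np.allclose(p(x), y))  # True: it reproduces the data exactly
```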

The more advanced interpolation schemes are things like B-splines and NURBS, which are more flexible but harder to work with. Some schemes can also be supplied with extra information, such as derivatives.
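As a minimal B-spline sketch (SciPy, made-up data): `scipy.interpolate.make_interp_spline` fits a cubic spline through the points, and boundary or derivative information can be supplied through its `bc_type` argument.

```python
# Cubic B-spline interpolation of made-up data.
import numpy as np
from scipy.interpolate import make_interp_spline

x = np.linspace(0, 10, 11)
y = np.sin(x)

spl = make_interp_spline(x, y, k=3)   # cubic B-spline through the points
xs = np.linspace(0, 10, 101)
print(spl(xs[:5]))                    # evaluate the spline on a finer grid
```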

One thing you can do is take sample data, smooth it using time-series and expectation techniques, and then use the above framework to create a fit. I wouldn't do this as a general principle, though, because understanding not only the data output but also the context of the data and the underlying processes is far more important than trying to find the best fit.

In terms of the general statistical field, you can look at regression and generalized linear models.

If you wanted to use more general norm conditions, with some more exotic norm rather than the 1-norm or 2-norm, then you will end up with an optimization problem under the constraint of that norm. This means you will be doing an optimization problem, with its own nomenclature and ideas about whether global minima exist. Recall that you are finding a fit with minimum residual, so essentially you are solving an optimization problem.

Tensor calculus is also worth looking at if you are using exotic norms, since under certain conditions you can convert between coordinate systems, which means you can build a bridge between the Euclidean norms (like the 2-norm) and your exotic norm.

So, for a general understanding of finding fits, look at linear and generalized linear models, optimization (so you can understand how general optimization problems are set up and solved under general conditions), numerical analysis, and tensor theory. All of these in combination will give an idea of how to solve more general kinds of residual minimization and thus obtain a specific model fit given your general requirements.
 
