Least squares approximation: Is smaller normal distance always better?

In summary, the replies explain that there are different ways to calculate regression coefficients, and that by minimizing a custom norm you can get a fit with features that relate to that particular choice of norm.
  • #1
DaleSwanson
I took a linear algebra course in the spring and was intrigued by the least squares method for building models. I decided to practice the concept by attempting to build a model that would predict ticket sales for the Mega Millions lottery given the jackpot amount. I have 249 data pairs of jackpot and ticket sales from past drawings. I tried a number of possible models and came up with two that work reasonably well:
Model 1: [itex]y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2[/itex]
Model 2: [itex]y = \beta_0 e^{\beta_1 x_1}[/itex]
where [itex]x_1[/itex] is the jackpot and [itex]y[/itex] is ticket sales.

I found the coefficients for both models, as well as the normal distance from the vector of fitted values to the vector of actual data. I then put the models into a spreadsheet, had it calculate the predicted sales for all the past drawings I had data for, and took the average of the percent errors of those predictions.

The normal distance for model 1 was 52.94; for model 2 it was 0.7446. The percent errors were reversed, though: model 1 had the lower average error at 5.83%, while model 2 had an average error of 6.97%.

I realize a model fits best when the normal distance is minimized; however, I'm wondering whether that distance can be compared between models of different dimensions (numbers of parameters). If not, that would explain why model 1 seems to do a better job by percent error yet has a much larger normal distance. On the other hand, if comparisons between models of different dimensions are valid, then why the discrepancy between what these two measures of model accuracy tell me?
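
For concreteness, here is a minimal Python sketch of the procedure (placeholder data stands in for the real 249 pairs, and model 2 is fit by linearizing with a log transform, which is one common way such a fit is done; if the solver worked in log space, that alone would explain why its normal distance comes out so much smaller):

[code]
import numpy as np

# Placeholder data standing in for the real 249 (jackpot, sales) pairs.
rng = np.random.default_rng(0)
x = rng.uniform(12, 350, 249)                             # jackpot, $ millions
y = 20 + 0.3 * x + 0.002 * x**2 + rng.normal(0, 5, 249)   # ticket sales, millions

# Model 1: ordinary least squares on the columns [1, x, x^2].
X = np.column_stack([np.ones_like(x), x, x**2])
beta1, *_ = np.linalg.lstsq(X, y, rcond=None)
pred1 = X @ beta1

# Model 2: linearized fit, ln(y) = ln(b0) + b1 * x.
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
pred2 = np.exp(coef[0] + coef[1] * x)

# Residual norms in the SAME (original) units are comparable;
# a norm computed on the log-space residuals is not.
print("model 1 residual norm:", np.linalg.norm(y - pred1))
print("model 2 residual norm (original units):", np.linalg.norm(y - pred2))
print("model 2 residual norm (log units):", np.linalg.norm(np.log(y) - A @ coef))

# Mean absolute percent error, as in the spreadsheet comparison.
for name, pred in [("model 1", pred1), ("model 2", pred2)]:
    print(name, "mean percent error: %.2f%%" % (100 * np.mean(np.abs((y - pred) / y))))
[/code]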
 
  • #2
Hey DaleSwanson.

For fitting models in a regression, there are different ways of calculating the regression coefficients.

The different ways depend on the matrix decomposition used. Some methods use what is called a pseudo-inverse.

The kinds of decomposition include the singular value decomposition, among others I can't recall offhand. The pseudo-inverse comes into play in situations where the random variables are linearly dependent, or close enough to it that inverting the matrix blows up.
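
As a rough illustration of that blow-up (made-up data, numpy assumed), compare the conditioning of the normal equations with the SVD-based pseudo-inverse when two predictors are nearly dependent:

[code]
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
# Third column is a near-copy of the second: nearly linearly dependent.
X = np.column_stack([np.ones(100), x, x + 1e-8 * rng.normal(size=100)])
y = 2 + 3 * x + rng.normal(0, 0.1, 100)

# Inverting X^T X directly is badly ill-conditioned here.
print("cond(X^T X):", np.linalg.cond(X.T @ X))

# The SVD-based pseudo-inverse still returns the minimum-norm solution.
beta = np.linalg.pinv(X) @ y
print("coefficients:", beta)
[/code]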

You can also do regression using principal components, in what is known as principal component analysis (PCA). PCA effectively creates an orthogonal basis for the random variables: the first linear combination maximizes variance, and each subsequent one does the same while maintaining orthogonality (the covariance terms are set to zero).
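
A minimal sketch of that construction via the SVD (illustrative data only, not a full principal component regression):

[code]
import numpy as np

rng = np.random.default_rng(2)
# Three correlated random variables, 200 samples.
Z = rng.normal(size=(200, 3)) @ np.array([[2.0, 0, 0], [1.0, 1.0, 0], [0, 0, 0.1]])

Zc = Z - Z.mean(axis=0)                  # center each variable
U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
scores = Zc @ Vt.T                       # orthogonal (uncorrelated) combinations
print("component variances:", scores.var(axis=0))    # in decreasing order
print("cross-covariances ~ 0:", np.cov(scores.T).round(6))
[/code]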

I guess the main thing is asking yourself what you want to minimize. With least squares, the idea is to minimize the total error of the model, but that is not the end of the story, since you can use many different norms. There are the standard 1-norm and 2-norm, and there are other, more exotic norms that give a fit reflecting the characteristics of the norm itself.

By engineering a custom norm, you may get features that relate to that particular choice of norm and that give an explanation you would not get with the standard 1-norm or 2-norm. If you can construct such a norm with the properties you have in mind, then you are dealing with a more general algorithm.
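
As an illustration (a sketch with made-up data; scipy assumed available), the same line can be fit by minimizing the 2-norm, the 1-norm, or the infinity-norm of the residual vector, and an outlier pulls each fit differently:

[code]
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 50)
y[45] += 20                               # one outlier so the norms disagree

def residual_norm(beta, ord):
    return np.linalg.norm(y - (beta[0] + beta[1] * x), ord=ord)

for ord in (2, 1, np.inf):
    res = minimize(residual_norm, x0=[0.0, 0.0], args=(ord,), method="Nelder-Mead")
    print(f"{ord}-norm fit: intercept={res.x[0]:.3f}, slope={res.x[1]:.3f}")
[/code]

The 1-norm fit largely ignores the outlier, while the infinity-norm fit chases it: exactly a feature tied to the choice of norm.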
 
  • #3
As chiro said, there are many different ways to approximate a value, and they may well give different answers. It is, after all, only an "approximation". The reason we typically use "least squares" is that it matches our usual idea of "distance": [itex]\sqrt{x^2 + y^2 + z^2}[/itex].

Other commonly used measures are [itex]|x| + |y| + |z|[/itex] (the 1-norm) and [itex]\max(|x|, |y|, |z|)[/itex] (the infinity norm).
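
For a concrete comparison, the three measures applied to the same vector (a quick numpy check):

[code]
import numpy as np

r = np.array([3.0, -4.0, 1.0])
print(np.linalg.norm(r, 2))        # sqrt(x^2 + y^2 + z^2) = sqrt(26) ~ 5.10
print(np.linalg.norm(r, 1))        # |x| + |y| + |z| = 8.0
print(np.linalg.norm(r, np.inf))   # max(|x|, |y|, |z|) = 4.0
[/code]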
 
  • #4
Well thanks for both the replies. It would seem my exposure to this is just the tip of a much larger iceberg.

I find the concept of building models to fit data very interesting. What sort of course would cover this in more detail? Would a calc-based intro statistics course be good?
 
  • #5
For fitting deterministic models, the ideas of numerical analysis are very useful. Within that area there is the study of interpolation.

Interpolation, at the simplest level, is the practice of finding a fit that passes through the data points, given certain conditions. The simplest scheme is the Lagrange polynomial, which constructs the unique polynomial of degree n - 1 through the data points, where n is the number of points.

More advanced interpolation schemes include B-splines and NURBS, which are more flexible but harder to work with. There are also schemes that are supplied with extra information such as derivatives.
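
As a small illustration of the Lagrange construction (scipy's implementation on made-up points):

[code]
import numpy as np
from scipy.interpolate import lagrange

# Four points -> the unique cubic (degree n - 1 = 3) through them.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 0.0, 5.0])

p = lagrange(x, y)     # returns a numpy.poly1d
print(p)               # the interpolating polynomial
print(p(x))            # reproduces y exactly at the data points
[/code]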

One thing you can do is take sample data, smooth it using time-series and expectation techniques, and then use the above framework to create a fit. I wouldn't do that as a general principle, though: understanding not only the data output but also the context of the data and the underlying processes is far more important than trying to find the best fit.

In terms of the general statistical field, you can look at regression and generalized linear models.

If you want to use more general norm conditions, with some more exotic norm rather than the 1-norm or 2-norm, then you end up with an optimization problem under the constraint of the norm. Optimization has its own nomenclature and its own ideas about whether global minima exist. Recall that you are finding a fit with minimum residual, so essentially you are solving an optimization problem.

Tensor calculus is also worth looking at if you are using exotic norms, since under certain conditions you can convert between coordinate systems, which lets you build a bridge between the Euclidean norms (like the 2-norm) and your exotic norm.

So for a general understanding of finding fits, look at linear and generalized linear models, optimization (so you understand how general optimization problems are posed under general conditions), numerical analysis, and tensor theory. In combination, these will give you an idea of how to solve more general kinds of residual minimization and thus get a specific model fit from your general requirements.
 

1. What is least squares approximation?

Least squares approximation is a statistical method used to find the best fit line or curve for a set of data points. It minimizes the sum of squared distances between the data points and the line or curve, hence the name "least squares".
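
In the linear case this has a closed form: for a design matrix [itex]X[/itex] and data vector [itex]y[/itex], and assuming [itex]X^T X[/itex] is invertible, the minimizing coefficients solve the normal equations:

[tex]\hat{\beta} = (X^T X)^{-1} X^T y[/tex]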

2. How is least squares approximation used in science?

Least squares approximation is commonly used in science to analyze and interpret data, particularly in fields such as physics, engineering, and economics. It allows scientists to make predictions and draw conclusions based on the relationship between variables in their data.

3. What is the normal distance in least squares approximation?

In the linear-algebra view, the normal distance is the distance from the data vector [itex]y[/itex] to its orthogonal projection onto the column space of the model: the residual vector is normal (perpendicular) to that subspace, which is where the name comes from. For an individual point, the residual is the difference between the actual y-value and the y-value predicted by the best fit line or curve; note that this pointwise difference is vertical, not perpendicular to the curve.
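
A quick numerical check of that perpendicularity (illustrative numpy with made-up data):

[code]
import numpy as np

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = rng.normal(size=30)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ beta
print(X.T @ residual)   # ~ [0, 0]: the residual is normal to the column space
[/code]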

4. Is a smaller normal distance always better in least squares approximation?

In most cases, a smaller normal distance is desirable, since it indicates a closer fit between the model and the data it was fit on. It is not automatically better, though: a model can achieve a small residual by chasing outliers or noise (overfitting), so a fit with a somewhat larger normal distance may generalize better to new data.

5. Are there any limitations to least squares approximation?

Yes, there are limitations. Ordinary least squares assumes the model is linear in its parameters, and the standard inferences built on it assume roughly normally distributed errors. The fit is also sensitive to outliers and does not, by itself, account for measurement errors in the inputs, both of which can affect the accuracy of the results.
