Comparing discrete data to a continuous model (1D)

SUMMARY

This discussion focuses on the comparison of discrete data points to a continuous model using the least squares method. The traditional approach involves calculating the average squared residuals for discrete data, defined as average residuals squared = {[f(x1) - y1]^2 + ... + [f(x10) - y10]^2}/10. The conversation explores whether a similar method can be applied to continuous data by minimizing the integral of the squared residuals, expressed as average residual squared = integral of (f(x) - g(x))^2 dx. The concept of defining the "best" fit is emphasized, highlighting that it requires clear definitions and assumptions about the data and model.

PREREQUISITES
  • Understanding of least squares fitting
  • Familiarity with residual analysis
  • Knowledge of integral calculus
  • Concept of probability distributions in data analysis
NEXT STEPS
  • Research the application of least squares fitting in continuous data analysis
  • Study the properties of Gaussian noise in statistical modeling
  • Explore the concept of cumulative distribution functions in data fitting
  • Learn about alternative fitting methods beyond least squares
USEFUL FOR

Data scientists, statisticians, mathematicians, and anyone involved in modeling and fitting data to mathematical functions.

mikeph
Say I have a model, y = f(x), and ten discrete data points to compare to this model, (x1, y1)...(x10, y10). The normal way would then be to take the residuals and square them to get a quality of fit, i.e.

average residuals squared = {[f(x1) - y1]^2 + ... + [f(x10) - y10]^2}/10
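As a minimal sketch of the formula above, the average squared residual can be computed as follows. The function name `mean_squared_residual`, the straight-line `model`, and the four data points are illustrative assumptions, not part of the original question.

```python
def mean_squared_residual(f, xs, ys):
    """Average of [f(x_i) - y_i]^2 over all data points."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def model(x):
    # Hypothetical model y = f(x); any callable works here.
    return 2.0 * x + 1.0


# Made-up data for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.5, 4.5, 7.0]

msr = mean_squared_residual(model, xs, ys)
print(msr)  # 0.125
```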

I also remember being told that if this value is minimised then the model f(x) is the best estimate of the data, assuming the data contains only Gaussian noise?
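The standard argument behind that statement can be sketched as follows. It assumes more than "only Gaussian noise": the noise terms are taken to be independent, zero-mean, and of equal variance, and the model is written with a parameter vector \theta (a symbol introduced here for illustration).

```latex
% Assumed data model: y_i = f(x_i;\theta) + \varepsilon_i,
% with \varepsilon_i \sim \mathcal{N}(0,\sigma^2) independent.
\begin{align*}
L(\theta) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
   \exp\!\left(-\frac{[f(x_i;\theta) - y_i]^2}{2\sigma^2}\right), \\
-\ln L(\theta) &= \frac{n}{2}\ln(2\pi\sigma^2)
   + \frac{1}{2\sigma^2}\sum_{i=1}^{n} [f(x_i;\theta) - y_i]^2 .
\end{align*}
% The first term does not depend on \theta, so maximising L(\theta)
% is the same as minimising \sum_i [f(x_i;\theta) - y_i]^2.
```

Under those assumptions the least squares fit coincides with the maximum likelihood fit; with unequal variances or correlated noise the sum would need weights.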

Say instead my data were continuous (for whatever reason). Is it an equally rigorous idea to try to minimise the continuous sum of the residual squared? For example if my data is y = g(x), then the continuous version of the residual is

average residual squared = integral of (f(x) - g(x))^2 dx.
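A rough numerical sketch of this integral is shown below, using a composite trapezoid rule from the standard library only. The interval [a, b], the functions `model` and `data`, and the helper name `integrated_squared_residual` are assumptions made for illustration. Note that the integral alone is a total; dividing by the interval length (b - a) is what turns it into an average comparable to the discrete formula.

```python
def integrated_squared_residual(f, g, a, b, n=10_000):
    """Trapezoid-rule estimate of the integral of (f(x) - g(x))^2 over [a, b]."""
    if n < 1 or b <= a:
        raise ValueError("need n >= 1 and b > a")
    h = (b - a) / n

    def sq(x):
        return (f(x) - g(x)) ** 2

    total = 0.5 * (sq(a) + sq(b))
    for k in range(1, n):
        total += sq(a + k * h)
    return total * h


def model(x):
    # Hypothetical model f(x).
    return x


def data(x):
    # Hypothetical continuous "data" g(x).
    return x * x


total = integrated_squared_residual(model, data, 0.0, 2.0)
mean = total / (2.0 - 0.0)  # average squared residual over the interval
print(total, mean)  # exact values are 16/15 and 8/15
```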

Does this make sense? Is this the correct approach to comparing a continuous data set with a model?

Thanks

Edit: I can maybe put this a better way. Rather than only comparing the data to f(x) at the points where we have measured data, which seems a bit biased to me, why don't we measure it over the entire range of x, and then say "the most we can obtain from our data is that the function looks like a stepwise function with step heights equal to y1, y2,...", and then compute the residual in terms of the area between the model and the stepwise function.
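One possible reading of this stepwise idea is sketched below. The post does not say where each step begins and ends, so the code assumes each y value holds on the interval of x closest to its own x (step edges at midpoints between neighbouring points). The helper names and the sample data are hypothetical.

```python
from bisect import bisect_right


def make_step_function(xs, ys):
    """Piecewise-constant function through the data.

    Assumption: xs is sorted, and y_i applies to every x nearer to x_i
    than to any other data point (edges at midpoints).
    """
    edges = [(x0 + x1) / 2 for x0, x1 in zip(xs, xs[1:])]

    def step(x):
        return ys[bisect_right(edges, x)]

    return step


def area_between(f, g, a, b, n=20_000):
    """Midpoint-rule estimate of the integral of |f(x) - g(x)| over [a, b]."""
    h = (b - a) / n
    return h * sum(abs(f(a + (k + 0.5) * h) - g(a + (k + 0.5) * h))
                   for k in range(n))


def model(x):
    # Hypothetical model f(x).
    return x


# Made-up data lying exactly on the model.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]

step = make_step_function(xs, ys)
area = area_between(model, step, xs[0], xs[-1])
print(area)  # approximately 0.5
```

In this example every discrete residual is zero, yet the area is about 0.5, because the flat steps are themselves an assumption about what happens between the measured points.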
 
Asking what the "best" way to fit a model to data is like asking for the best color to paint a room. It isn't a mathematical question unless you precisely define what "best" means to you.

If you precisely define the meaning of "best", then you need a lot of information (or a lot of assumptions) to solve the problem. Otherwise, finding the best way is as futile as trying to find the missing sides and angles of a triangle when all you know is one side and one angle.

People often define "best" fit to mean a model that minimizes the sum of the squares of the "errors" or "residuals" between the fitting equation and the data. In the continuous case, some people define "best" to mean a fit that minimizes the integral of the square of the difference between the fit and a continuous version of the data.

In the case where the data is assumed to come from a probability distribution, people sometimes define the "best" fit to be the one that minimizes the sum of the squared residuals between the fitted cumulative distribution and the cumulative distribution of the data. This is the method that you proposed in your Edit.
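A minimal sketch of that cumulative-distribution comparison is given below. It assumes a normal distribution as the candidate model and uses the plotting position (i - 0.5)/n for the empirical cumulative distribution; both choices, along with the function names and the sample values, are illustrative assumptions rather than anything stated in the thread.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cdf_squared_residuals(samples, mu, sigma):
    """Sum of squared differences between the fitted CDF and the
    empirical CDF, evaluated at the sorted sample points."""
    ordered = sorted(samples)
    n = len(ordered)
    total = 0.0
    for i, x in enumerate(ordered, start=1):
        empirical = (i - 0.5) / n  # assumed plotting position
        total += (normal_cdf(x, mu, sigma) - empirical) ** 2
    return total


# Made-up samples for illustration only.
samples = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]

near = cdf_squared_residuals(samples, mu=0.0, sigma=1.0)
far = cdf_squared_residuals(samples, mu=5.0, sigma=1.0)
print(near < far)  # True: the candidate centred on the data scores lower
```

Minimizing this quantity over mu and sigma would give one definition of "best"; it need not agree with the least squares or maximum likelihood answer.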

The above facts are facts about human behavior and culture, not mathematical theorems. People have written mathematical articles about why least squares turns out to be a good way of defining "best" in real world problems. These articles argue that particular goals and particular assumptions are reasonable models for many real world problems and they show that least squares fitting is best according to those goals and assumptions.
 
