Comparing discrete data to a continuous model (1D)

mikeph
Say I have a model, y = f(x), and ten discrete data points to compare to this model, (x1, y1)...(x10, y10). The normal way would then be to take the residuals, square them and average them to get a measure of the quality of fit, i.e.

average residuals squared = {[f(x1) - y1]^2 + ... + [f(x10) - y10]^2}/10
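For concreteness, here is a minimal Python sketch of the quantity above. The model `model` (a straight line) and the four data points are made up purely for illustration; they are not from any real data set.

```python
def model(x):
    # Hypothetical model f(x): a straight line chosen only for illustration.
    return 2.0 * x + 1.0

def mean_squared_residual(f, xs, ys):
    """Average of [f(x_i) - y_i]^2 over the measured points."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data points (x_i, y_i).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.5, 5.0, 6.5]

msr = mean_squared_residual(model, xs, ys)
print(msr)
```

A smaller value means the model passes closer to the points in the squared-error sense.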

I also remember being told that if this value is minimised then the model f(x) is the best estimate of the data, assuming the data contains only Gaussian noise?

Say instead my data were continuous (for whatever reason). Is it an equally rigorous idea to try to minimise the continuous sum of the residual squared? For example if my data is y = g(x), then the continuous version of the residual is

average residual squared = integral of (f(x) - g(x))^2 dx.
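As a rough sketch of how this integral could be evaluated numerically, the Python below uses the composite trapezoidal rule. The functions `f` and `g`, the interval [0, 1] and the number of subintervals are all assumptions chosen for illustration. Dividing the result by the interval length (b - a) turns the integral into an average.

```python
def f(x):
    # Hypothetical model.
    return x

def g(x):
    # Hypothetical continuous "data".
    return x * x

def integrated_squared_residual(f, g, a, b, n=1000):
    """Trapezoidal estimate of the integral of (f(x) - g(x))^2 over [a, b]."""
    if n < 1 or b <= a:
        raise ValueError("need n >= 1 and b > a")
    h = (b - a) / n
    # End points carry half weight in the trapezoidal rule.
    total = 0.5 * ((f(a) - g(a)) ** 2 + (f(b) - g(b)) ** 2)
    for i in range(1, n):
        x = a + i * h
        total += (f(x) - g(x)) ** 2
    return total * h

a, b = 0.0, 1.0
integral = integrated_squared_residual(f, g, a, b)
average = integral / (b - a)  # mean squared residual over the interval
print(integral, average)
```

For this particular pair the integral can be done by hand (it equals 1/30), which gives a quick check on the numerics.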

Does this make sense, is this the correct approach to comparing a continuous data set and a model?

Thanks

Edit: I can maybe put this a better way. Rather than only comparing the data to f(x) at the points where we have measured data, which seems a bit biased to me, why don't we compare over the entire range of x? We would then say "the most we can obtain from our data is that the function looks like a stepwise function with step heights equal to y1, y2, ...", and compute the residual in terms of the area between the model and the stepwise function.
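A possible sketch of this stepwise idea in Python follows. Where the steps change is not specified above, so the code assumes each step switches at the midpoint between neighbouring x values; that choice, the function names and the sample data are all hypothetical. With `power=1` the result is the literal area between the model and the steps, and with `power=2` it is the integrated squared difference.

```python
def step_residual(f, xs, ys, power=2, n_per_step=200):
    """Integrate |f(x) - step(x)|^power over [xs[0], xs[-1]].

    step(x) equals y_i on the interval around x_i; interval edges are the
    midpoints between neighbouring x values (an assumption). xs must be
    sorted in ascending order. Each step is integrated with the midpoint rule.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mids = [(xs[i] + xs[i + 1]) / 2.0 for i in range(len(xs) - 1)]
    edges = [xs[0]] + mids + [xs[-1]]
    total = 0.0
    for y, left, right in zip(ys, edges[:-1], edges[1:]):
        h = (right - left) / n_per_step
        for k in range(n_per_step):
            x_mid = left + (k + 0.5) * h
            total += abs(f(x_mid) - y) ** power * h
    return total

def flat_model(x):
    # Hypothetical model: a constant, so the answer is easy to check by hand.
    return 1.0

# Made-up data: the middle point sits one unit above the model.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 1.0]

area = step_residual(flat_model, xs, ys, power=1)
print(area)
```

Note that the result depends on how the step edges are chosen, which is an extra assumption the pointwise sum of squares does not need.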
 
Asking what the "best" way to fit a model to data is like asking for the best color to paint a room. It isn't a mathematical question unless you precisely define what "best" means to you.

If you precisely define the meaning of "best", then you need a lot of information (or a lot of assumptions) to solve the problem. Otherwise, finding the best way is as futile as trying to find the missing sides and angles of a triangle when all you know is one side and one angle.

People often define "best" fit to mean a model that minimizes the sum of the squares of the "errors" or "residuals" between the fitting equation and the data. In the continuous case, some people define "best" to mean a fit that minimizes the integral of the square of the difference between the fit and a continuous version of the data.

In the case where the data are assumed to come from a probability distribution, people sometimes define the "best" fit to be the one that minimizes the sum of the squared residuals between the fitted cumulative distribution and the cumulative distribution of the data. This is the method that you proposed in your Edit.
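One way the cumulative-distribution criterion might look in Python is sketched below. The normal model, the sample values and the plotting position (i + 0.5)/n used for the empirical cumulative distribution are all assumptions made for illustration; other conventions exist.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cdf_squared_residual(sample, cdf):
    """Sum of squared differences between a fitted CDF and the empirical one.

    The empirical CDF at the i-th smallest value (i = 0, 1, ...) is taken
    to be (i + 0.5) / n, which is one common convention among several.
    """
    xs = sorted(sample)
    n = len(xs)
    return sum((cdf(x) - (i + 0.5) / n) ** 2 for i, x in enumerate(xs))

# Made-up sample and two candidate fits.
sample = [-1.2, -0.4, 0.1, 0.5, 1.3]
good = cdf_squared_residual(sample, lambda x: normal_cdf(x, 0.0, 1.0))
bad = cdf_squared_residual(sample, lambda x: normal_cdf(x, 3.0, 1.0))
print(good, bad)
```

The candidate centred near the sample gives the smaller value, so under this definition of "best" it would be preferred.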

The above facts are facts about human behavior and culture, not mathematical theorems. People have written mathematical articles about why least squares turns out to be a good way of defining "best" in real world problems. These articles argue that particular goals and particular assumptions are reasonable models for many real world problems and they show that least squares fitting is best according to those goals and assumptions.
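As one standard example of such an argument, suppose (this is an assumption, not something the data can tell you) that each measurement is y_i = f(x_i; θ) + ε_i, where the ε_i are independent Gaussian errors with zero mean and the same variance σ². The symbol θ for the model parameters is introduced here only for illustration. The likelihood of the data and its logarithm are then:

```latex
% Assumption: y_i = f(x_i;\theta) + \varepsilon_i, with independent
% \varepsilon_i \sim N(0, \sigma^2) and \sigma the same for every point.
\begin{align*}
L(\theta) &= \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}
             \exp\!\left(-\frac{[y_i - f(x_i;\theta)]^2}{2\sigma^2}\right) \\
\ln L(\theta) &= -\frac{N}{2}\ln\!\left(2\pi\sigma^2\right)
                 - \frac{1}{2\sigma^2}\sum_{i=1}^{N} [y_i - f(x_i;\theta)]^2
\end{align*}
```

The first term does not depend on θ, so maximizing the likelihood over θ is the same as minimizing the sum of squared residuals. If "best" is defined as "most likely under this noise model", least squares is best; under a different noise model or a different goal it need not be.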
 