How to fit a given function to blurred data points?


Discussion Overview

The discussion revolves around methods for fitting parameters of a function family to data represented by probability distributions rather than precise coordinates. The focus is on finding a general approach that accommodates various types of probability distributions, not limited to Gaussian distributions, and emphasizes a fully Bayesian solution.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant seeks a general method for fitting parameters to data given by probability distributions, expressing a preference for a fully Bayesian approach without unnecessary estimation.
  • Another participant suggests using classical statistics by setting up the likelihood for parameters and maximizing it, noting similarities with Bayesian statistics where the likelihood is multiplied by prior distributions to obtain posterior probabilities.
  • A participant requests practical examples with equations and references, questioning the feasibility of calculations when distributions are not Gaussian and expressing concern about potential exponential complexity with many data point distributions.
  • It is mentioned that while likelihood and minimization are relevant keywords, the problem is typically linear with respect to data points, though many free parameters can complicate the process, especially if they are highly correlated.
  • A concept of an "error-in-variables" model is introduced, which deals with observed points from a normal distribution measured with normally distributed error. This model allows for different levels of error modeling and suggests that any error structure can be assumed.

Areas of Agreement / Disagreement

Participants express varying levels of familiarity with the principles of fitting models to data with uncertainty. There is no consensus on a specific method or example, and concerns about the complexity of calculations remain unresolved.

Contextual Notes

Participants highlight limitations regarding the assumptions of distributions and the potential for increased complexity with numerous data points and parameters. The discussion does not resolve these complexities.

sceptic
Is there any elaborated theory or method for fitting the parameters of a function family to data given as probability distributions of the data points, rather than as precisely known, error-free coordinates? I think this is a very general problem, so I hope it has already been solved.

Important:

I would like a general method that works with any kind of probability distribution around the data points, not just a Gaussian, which can be summarized by a single error value such as its variance.

I would like to use all the information available, so a fully Bayesian solution without unnecessary estimation.
 
In classical statistics, you would set up the likelihood for your parameters and maximize it.
Bayesian statistics is similar: you multiply the likelihood by the prior distributions of the parameters to obtain the posterior probability distribution of the parameters.
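As a minimal sketch of setting up such a likelihood (everything here is invented for illustration: the data, the Laplace choice of point distribution, and the crude grid search are assumptions, not anything from the thread), the likelihood of the parameters is a product of one density evaluation per blurred point:

```python
import math

# Hypothetical data: each "blurred" point i is given by a probability
# density for its y-value at a known x_i.  Here the densities are
# Laplace (deliberately non-Gaussian), centred on y_c[i] with scale b[i].
xs  = [0.0, 1.0, 2.0, 3.0]
y_c = [1.0, 3.1, 4.9, 7.0]   # centres of the point distributions
b   = [0.3, 0.2, 0.4, 0.3]   # Laplace scale of each point

def log_density(i, y):
    """Log of the i-th point's density evaluated at y (Laplace)."""
    return -math.log(2.0 * b[i]) - abs(y - y_c[i]) / b[i]

def log_likelihood(slope, intercept):
    """log L(theta) = sum_i log p_i(f(x_i; theta)):
    one additive term per blurred data point."""
    return sum(log_density(i, slope * x + intercept)
               for i, x in enumerate(xs))

# Maximize by a coarse grid search, just to show the structure;
# a real fit would use an iterative optimizer.
best = max(((s / 100.0, c / 100.0)
            for s in range(100, 300) for c in range(0, 200)),
           key=lambda p: log_likelihood(*p))
print(best)  # maximum-likelihood (slope, intercept)
```

The same skeleton works for any point distribution: only `log_density` changes; a Bayesian version would simply add the log-prior of the parameters to `log_likelihood` before maximizing or sampling.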
 
Yes, I know all the principles. But I need a practical example with equations, perhaps a book chapter or a paper dealing with this kind of problem. For example, what keywords should I search for? The distributions can all be of the same type, but not Gaussian. Is it practical to calculate at all? Might the problem explode exponentially with many data-point distributions?
 
Likelihood and minimization are good keywords. Usually the point of maximal likelihood is found with iterative approximations.
sceptic said:
Might the problem explode exponentially with many data-point distributions?
No, it is typically linear in the number of data points (you have to evaluate the likelihood once for each data point). Many free parameters can make the problem time-consuming, especially if they are highly correlated.
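A one-parameter illustration of both remarks (all numbers and the logistic choice of blur are made-up assumptions): the log-likelihood costs one term per data point, so evaluation is O(n), and its maximum is found iteratively, here by plain gradient ascent with a numerical derivative:

```python
import math

# Each point's value is "blurred" by a logistic density (smooth and
# non-Gaussian); we fit a single location parameter mu.
y_c = [1.2, 0.8, 1.1, 0.9, 1.0]   # centres of the point densities
s = 0.2                            # common logistic scale

def log_lik(mu):
    # One additive term per data point, so evaluation cost is O(n).
    total = 0.0
    for y in y_c:
        z = (mu - y) / s
        # log of the logistic pdf: -z - 2*log(1 + e^{-z}) - log(s)
        total += -z - 2.0 * math.log1p(math.exp(-z)) - math.log(s)
    return total

mu, step = 0.0, 0.01
for _ in range(200):                 # fixed-step gradient ascent
    grad = (log_lik(mu + 1e-6) - log_lik(mu - 1e-6)) / 2e-6
    mu += step * grad
print(round(mu, 3))                  # → 1.0
```

With the centres symmetric about 1.0, the iteration settles on ##\mu = 1##; in practice one would use Newton's method or a library optimizer, but the per-iteration cost stays linear in the data.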
 
There is a concept of an "error-in-variables" model that deals with this kind of thing, although I'd probably just take a hierarchical approach. As an example, suppose that we have observed points ##(x_1,\dots,x_n)## from a normal distribution ##N(\mu,\tau^2)## which we assume are actually measured with normally distributed error ##N(0,\sigma^2_i)##. If ##x_i## has true value ##\mu_i## (which is unobserved), then we have ##x_i \sim N(\mu_i, \sigma^2_i)##, so the full model for the mean is
$$x_i = \mu_i + e_i, \qquad e_i \sim N(0, \sigma^2_i)$$
or, writing ##\mu_i = \mu + \epsilon_i## with ##\epsilon_i \sim N(0, \tau^2)##,
$$x_i = \mu + \epsilon_i + e_i.$$
Basically, we just model the error at two different levels.

A similar regression model might take the form

$$y_i = \alpha + \beta \mu_i + \epsilon_i$$

Note that you can assume any kind of error structure you want; it doesn't have to be normal. The same general approach would still apply.
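A small simulation sketch of the two-level normal model above (all numbers invented for illustration): with normal errors at both levels the marginal distribution is ##x_i \sim N(\mu, \tau^2 + \sigma^2_i)##, so the maximum-likelihood estimate of ##\mu## is a precision-weighted mean in which both error levels enter each weight:

```python
import math, random

random.seed(1)
mu_true, tau = 5.0, 0.5                 # true mean, population spread
sigmas = [0.1, 0.3, 0.2, 0.4] * 100     # per-point measurement error

# Simulate x_i = mu + eps_i + e_i from the two-level model.
xs = [mu_true + random.gauss(0, tau) + random.gauss(0, s)
      for s in sigmas]

# MLE of mu under the marginal model x_i ~ N(mu, tau^2 + sigma_i^2):
# weight each point by 1 / (tau^2 + sigma_i^2).
w = [1.0 / (tau**2 + s**2) for s in sigmas]
mu_hat = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
print(round(mu_hat, 2))
```

With a non-normal error structure there is generally no such closed form, and one would instead maximize the marginal likelihood numerically or sample the hierarchical posterior with MCMC; the model specification stays the same.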
 
