How to fit a given function to blurred data points?


Discussion Overview

The discussion revolves around methods for fitting parameters of a function family to data represented by probability distributions rather than precise coordinates. The focus is on finding a general approach that accommodates various types of probability distributions, not limited to Gaussian distributions, and emphasizes a fully Bayesian solution.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant seeks a general method for fitting parameters to data given by probability distributions, expressing a preference for a fully Bayesian approach without unnecessary estimation.
  • Another participant suggests using classical statistics by setting up the likelihood for parameters and maximizing it, noting similarities with Bayesian statistics where the likelihood is multiplied by prior distributions to obtain posterior probabilities.
  • A participant requests practical examples with equations and references, questioning the feasibility of calculations when distributions are not Gaussian and expressing concern about potential exponential complexity with many data point distributions.
  • It is mentioned that while likelihood and minimization are relevant keywords, the problem is typically linear with respect to data points, though many free parameters can complicate the process, especially if they are highly correlated.
  • A concept of an "error-in-variables" model is introduced, which deals with observed points from a normal distribution measured with normally distributed error. This model allows for different levels of error modeling and suggests that any error structure can be assumed.

Areas of Agreement / Disagreement

Participants express varying levels of familiarity with the principles of fitting models to data with uncertainty. There is no consensus on a specific method or example, and concerns about the complexity of calculations remain unresolved.

Contextual Notes

Participants highlight limitations regarding the assumptions of distributions and the potential for increased complexity with numerous data points and parameters. The discussion does not resolve these complexities.

sceptic
Is there any elaborated theory or method for fitting the parameters of a function family to data given as probability distributions of the data points, rather than as precisely known, error-free coordinates? I think this is a very general problem, so I hope it has already been solved.

Important:

I would like a general method that works with any kind of probability distribution around the data points, not just a Gaussian, which can be summarized by a single error value such as its variance.

I would like to use all the information available, so a fully Bayesian solution without unnecessary estimation.
 
In classical statistics, you would set up the likelihood for your parameters and maximize it.
Bayesian statistics is similar: you multiply the likelihood by the prior distributions of the parameters to obtain the posterior probability distribution of the parameters.
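As a minimal sketch of setting up such a likelihood (everything here is invented for illustration: the data, the Laplace choice of point distribution, and the crude grid search are assumptions, not anything from the thread), the likelihood of the parameters is a product of one density evaluation per blurred point:

```python
import math

# Hypothetical data: each "blurred" point i is given by a probability
# density for its y-value at a known x_i.  Here the densities are
# Laplace (deliberately non-Gaussian), centred on y_c[i] with scale b[i].
xs  = [0.0, 1.0, 2.0, 3.0]
y_c = [1.0, 3.1, 4.9, 7.0]   # centres of the point distributions
b   = [0.3, 0.2, 0.4, 0.3]   # Laplace scale of each point

def log_density(i, y):
    """Log of the i-th point's density evaluated at y (Laplace)."""
    return -math.log(2.0 * b[i]) - abs(y - y_c[i]) / b[i]

def log_likelihood(slope, intercept):
    """log L(theta) = sum_i log p_i(f(x_i; theta)):
    one additive term per blurred data point."""
    return sum(log_density(i, slope * x + intercept)
               for i, x in enumerate(xs))

# Maximize by a coarse grid search, just to show the structure;
# a real fit would use an iterative optimizer.
best = max(((s / 100.0, c / 100.0)
            for s in range(100, 300) for c in range(0, 200)),
           key=lambda p: log_likelihood(*p))
print(best)  # maximum-likelihood (slope, intercept)
```

The same skeleton works for any point distribution: only `log_density` changes; a Bayesian version would simply add the log-prior of the parameters to `log_likelihood` before maximizing or sampling.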
 
Yes, I know all the principles. But I need a practical example with equations, perhaps a book chapter or a paper dealing with this kind of problem. For example, what keywords should I search for? The distributions can all be of the same type, but not Gaussian. Is it practical to calculate at all? Might the problem explode exponentially with many data-point distributions?
 
Likelihood and minimization are good keywords. Usually the point of maximal likelihood is found with iterative approximations.
sceptic said:
Might the problem explode exponentially with many data-point distributions?
No, it is typically linear in the number of data points (you have to evaluate the likelihood once for each data point). Many free parameters can make the problem time-consuming, especially if they are highly correlated.
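A one-parameter illustration of both remarks (all numbers and the logistic choice of blur are made-up assumptions): the log-likelihood costs one term per data point, so evaluation is O(n), and its maximum is found iteratively, here by plain gradient ascent with a numerical derivative:

```python
import math

# Each point's value is "blurred" by a logistic density (smooth and
# non-Gaussian); we fit a single location parameter mu.
y_c = [1.2, 0.8, 1.1, 0.9, 1.0]   # centres of the point densities
s = 0.2                            # common logistic scale

def log_lik(mu):
    # One additive term per data point, so evaluation cost is O(n).
    total = 0.0
    for y in y_c:
        z = (mu - y) / s
        # log of the logistic pdf: -z - 2*log(1 + e^{-z}) - log(s)
        total += -z - 2.0 * math.log1p(math.exp(-z)) - math.log(s)
    return total

mu, step = 0.0, 0.01
for _ in range(200):                 # fixed-step gradient ascent
    grad = (log_lik(mu + 1e-6) - log_lik(mu - 1e-6)) / 2e-6
    mu += step * grad
print(round(mu, 3))                  # → 1.0
```

With the centres symmetric about 1.0, the iteration settles on ##\mu = 1##; in practice one would use Newton's method or a library optimizer, but the per-iteration cost stays linear in the data.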
 
There is a concept of an "error-in-variables" model that deals with this kind of thing, although I'd probably just take a hierarchical approach. As an example, suppose that we have observed points ##(x_1,\dots,x_n)## from a normal distribution ##N(\mu,\tau^2)## which we assume are actually measured with normally distributed error ##N(0,\sigma^2_i)##. If ##x_i## has true value ##\mu_i## (which is unobserved), then we have ##x_i \sim N(\mu_i, \sigma^2_i)##, so the full model for the mean is
$$x_i = \mu_i + e_i, \qquad e_i \sim N(0, \sigma^2_i)$$
or, writing ##\mu_i = \mu + \epsilon_i## with ##\epsilon_i \sim N(0, \tau^2)##,
$$x_i = \mu + \epsilon_i + e_i.$$
Basically, we just model the error at two different levels.

A similar regression model might take the form

$$y_i = \alpha + \beta \mu_i + \epsilon_i$$

Note that you can assume any kind of error structure you want; it doesn't have to be normal. The same general approach would still apply.
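A small simulation sketch of the two-level normal model above (all numbers invented for illustration): with normal errors at both levels the marginal distribution is ##x_i \sim N(\mu, \tau^2 + \sigma^2_i)##, so the maximum-likelihood estimate of ##\mu## is a precision-weighted mean in which both error levels enter each weight:

```python
import math, random

random.seed(1)
mu_true, tau = 5.0, 0.5                 # true mean, population spread
sigmas = [0.1, 0.3, 0.2, 0.4] * 100     # per-point measurement error

# Simulate x_i = mu + eps_i + e_i from the two-level model.
xs = [mu_true + random.gauss(0, tau) + random.gauss(0, s)
      for s in sigmas]

# MLE of mu under the marginal model x_i ~ N(mu, tau^2 + sigma_i^2):
# weight each point by 1 / (tau^2 + sigma_i^2).
w = [1.0 / (tau**2 + s**2) for s in sigmas]
mu_hat = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
print(round(mu_hat, 2))
```

With a non-normal error structure there is generally no such closed form, and one would instead maximize the marginal likelihood numerically or sample the hierarchical posterior with MCMC; the model specification stays the same.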
 
