Fitting distributions that have a singular component

Discussion Overview

The discussion revolves around the challenges of fitting distributions that include a singular component, specifically focusing on data derived from a mixture of a uniform distribution and the Cantor distribution. Participants explore estimation methods for the unknown probability parameter p, as well as the implications of finite precision in data representation.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that traditional maximum likelihood estimation (MLE) methods may not be applicable for estimating p in the context of mixed distributions.
  • Others discuss the philosophical implications of claiming to have data from continuous distributions, suggesting that such claims often involve assumptions about precision and representation.
  • A participant introduces the idea of representing mixed distributions through a continuous random variable, using a dart game analogy to illustrate how a continuous function can yield discrete outcomes.
  • There is a suggestion that Monte Carlo methods could be employed to sample from the mixed distribution, contingent on the precision of the numbers used.
  • Some participants mention the inverse cumulative distribution function (CDF) approach as a potential method for estimating the mixed distribution, raising questions about the properties of such estimators.
  • Alternative estimation methods, such as the Method of Moments, are proposed as viable options for cases where MLE is problematic.

Areas of Agreement / Disagreement

Participants express a range of views on the appropriate methods for estimating parameters in mixed distributions, with no consensus reached on a single approach. The discussion includes both agreement on the limitations of MLE and differing opinions on alternative methods and their applicability.

Contextual Notes

Participants note the challenges posed by finite precision in data representation and the implications for statistical estimation, highlighting the need for careful consideration of definitions and assumptions in the context of mixed distributions.

bpet
For example, suppose you have some data where each point takes its value from U(0,1) with probability p and the Cantor distribution with probability (1-p) where p is fixed but unknown.

Here the standard MLE approach falls over, so how would you go about estimating p?
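To make the setup concrete, here is a minimal Python sketch of how such data could be simulated. A Cantor-distributed value can be written as ##\sum_{k\ge 1} 2 b_k/3^k## with i.i.d. fair bits ##b_k##; truncating the sum at a finite depth only approximates the singular distribution, which is exactly the finite-precision issue discussed below. The depth and sample size are arbitrary illustrative choices.

```python
import random

def cantor_sample(rng, depth=40):
    # A Cantor-distributed value is sum_{k>=1} 2*b_k / 3**k with the b_k
    # i.i.d. fair bits; truncating at `depth` ternary digits approximates
    # the singular distribution to within 3**(-depth).
    return sum(2 * rng.getrandbits(1) / 3 ** k for k in range(1, depth + 1))

def mixture_sample(rng, p):
    # With probability p draw from U(0,1), otherwise from the Cantor distribution.
    return rng.random() if rng.random() < p else cantor_sample(rng)

rng = random.Random(0)
data = [mixture_sample(rng, 0.3) for _ in range(1000)]  # p = 0.3 here, but treated as unknown
```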
 
bpet said:
For example, suppose you have some data

Imagining how we could have data from a singular distribution is an interesting challenge. Even saying we have data from a non-singular continuous distribution is usually a lie, although apparently a harmless one.

When we say that we "have data" from a garden variety distribution, such as the uniform distribution on [0,1], we mean that we have data that consists of some truncated values, so we have only rational numbers and only limited precision.

We could imagine having data with infinite precision if we imagine that the numbers are expressed in some symbolic form, such as [itex]\frac{\pi}{3} , \frac{e}{3},\frac{\sqrt{2}}{7}[/itex] etc. However, are we getting into some subtle logical contradiction by doing that? This has to do with imagining that there is a process that can sample a distribution, develop a system of symbolic representation that is adequate to exactly represent the value of each observation and express the data in that form.
 
Stephen Tashi said:
Imagining how we could have data from a singular distribution is an interesting challenge. Even saying we have data from a non-singular continuous distribution is usually a lie, although apparently a harmless one.

When we say that we "have data" from a garden variety distribution, such as the uniform distribution on [0,1], we mean that we have data that consists of some truncated values, so we have only rational numbers and only limited precision.

We could imagine having data with infinite precision if we imagine that the numbers are expressed in some symbolic form, such as [itex]\frac{\pi}{3} , \frac{e}{3},\frac{\sqrt{2}}{7}[/itex] etc. However, are we getting into some subtle logical contradiction by doing that? This has to do with imagining that there is a process that can sample a distribution, develop a system of symbolic representation that is adequate to exactly represent the value of each observation and express the data in that form.

Interesting point. I guess that, when working with finite-precision data, we are only observing events with non-zero probability, e.g. "X is in the interval (x-dx,x+dx)".

Applying MLE to data drawn from absolutely continuous distributions would also require that the dx be sufficiently small yet equal for every x - but wouldn't that lead to problems when the data are a mix of continuous and singular components, e.g. sensitivity to the magnitude of dx?
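The binned view can in fact be carried out directly: for any fixed bin width every bin has positive probability p·(bin width) + (1-p)·(Cantor mass of the bin), so the discretized likelihood is well defined and can be maximized over p. A rough sketch follows; the digit-based Cantor CDF, the depth of 40, the 3^5 = 243 bins, and the grid over p are all illustrative choices, not anything canonical.

```python
import math
import random

def cantor_cdf(x, depth=40):
    # Cantor function via ternary digits of x: a digit 1 ends the recursion,
    # digits 0/2 contribute binary digits of the output.
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)
        x -= digit
        if digit == 1:
            return value + scale
        value += scale * (digit // 2)
        scale *= 0.5
    return value

def binned_mle(data, n_bins=243):
    # Each bin i has model probability p/n_bins + (1-p) * (Cantor mass of bin i);
    # maximize the resulting multinomial log-likelihood over a grid of p.
    counts = [0] * n_bins
    for x in data:
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    edges = [i / n_bins for i in range(n_bins + 1)]
    cantor_mass = [cantor_cdf(edges[i + 1]) - cantor_cdf(edges[i])
                   for i in range(n_bins)]
    def loglik(p):
        return sum(c * math.log(p / n_bins + (1 - p) * m)
                   for c, m in zip(counts, cantor_mass) if c)
    # grid excludes 0 and 1 so log() stays finite on gap bins
    return max((k / 100 for k in range(1, 100)), key=loglik)

# synthetic data with true p = 0.3
rng = random.Random(0)
def cantor_sample():
    return sum(2 * rng.getrandbits(1) / 3 ** k for k in range(1, 41))
data = [rng.random() if rng.random() < 0.3 else cantor_sample()
        for _ in range(2000)]
p_hat = binned_mle(data)
```

Note that the gap bins (zero Cantor mass, positive uniform mass) carry most of the information about p, which is why the binned likelihood behaves well here even though the unbinned likelihood does not exist.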
 
The way I imagine a mixed continuous and discrete distribution of the random variable Y is to think about it being generated by a continuous random variable X. For example there could be a dart game where one's score is a function Y(X) of the distance X that the dart lands from the center of the board. The function could be a continuously varying function of X on some parts of the board. The board could also have a few circular rings of finite area and if the dart lands on such a ring then Y takes on some value that is constant over the entire ring.

I wonder if some of that measure theory that I was supposed to learn once upon a time says when a mixed distribution can be represented that way. If a mixed discrete and continuous distribution can be represented as Y(X) for some continuous distribution X, then my intuition says that in a practical situation one could Monte Carlo X using numbers of finite precision and get a good representation of sampling Y.

I suppose to do actual math, we'd have to formulate a precise definition of our goals - something to the effect that an "approximable" distribution of Y is one that is the limit (in distribution) of a sequence of random variables Y_i, where the Y_i have discrete distributions (thinking of them as samples from continuous distributions taken with a limited precision that becomes more precise as i increases).
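The dart-board picture is easy to simulate: take X continuous and push it through a function that is flat on some interval (creating an atom of Y) and varies continuously elsewhere. The particular scoring rule below is made up purely for illustration.

```python
import random

def dart_score(x):
    # Hypothetical scoring rule Y(X): constant on the "ring" [0.2, 0.4),
    # giving Y an atom at 50 with probability 0.2; continuously varying
    # in the distance x everywhere else.
    if 0.2 <= x < 0.4:
        return 50.0
    return 100.0 * (1.0 - x)

rng = random.Random(1)
scores = [dart_score(rng.random()) for _ in range(10_000)]
atom_freq = sum(s == 50.0 for s in scores) / len(scores)  # close to 0.2
```

Rounding X to finite precision before applying Y changes the scores only slightly except near the ring edges, which matches the intuition that Monte Carlo with finite-precision X gives a good representation of sampling Y.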
 
Stephen Tashi said:
...If a mixed discrete and continuous distribution can be represented as Y(X) for some continuous distribution X, then my intuition says that in a practical situation one could Monte Carlo X using numbers of finite precision and get a good representation of sampling Y.

We could use the inverse cdf approach, i.e. X ~ U(0,1) and set Y=inf{x:F(x)>=X} (this works for any distribution with known cdf, mixture or otherwise).

I suppose to do actual math, we'd have to formulate a precise definition of our goals - something to the effect that an "approximable" distribution of Y is one that is the limit (in distribution) of a sequence of random variables Y_i, where the Y_i have discrete distributions (thinking of them as samples from continuous distributions taken with a limited precision that becomes more precise as i increases).


I wonder if explicit discretization can be avoided by directly fitting the cdf to the empirical cdf, for example by minimizing the KS distance. Is much known about the properties of this sort of estimator?
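One concrete version of this idea: take the model cdf F_p(x) = p·x + (1-p)·C(x), with C the Cantor function, and grid-search p to minimize the KS distance to the empirical cdf. Minimum-distance estimators of this kind are generally consistent when p is identifiable, though this sketch makes no claim about efficiency; the digit-based Cantor cdf, the grid resolution, and the sample size are illustrative choices.

```python
import random

def cantor_cdf(x, depth=40):
    # Cantor function via ternary digits of x: a digit 1 ends the recursion,
    # digits 0/2 contribute binary digits of the output.
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)
        x -= digit
        if digit == 1:
            return value + scale
        value += scale * (digit // 2)
        scale *= 0.5
    return value

def ks_distance(sorted_data, p):
    # sup-distance between the model cdf p*x + (1-p)*C(x) and the empirical cdf
    n = len(sorted_data)
    d = 0.0
    for i, x in enumerate(sorted_data):
        f = p * x + (1 - p) * cantor_cdf(x)
        d = max(d, abs(f - (i + 1) / n), abs(f - i / n))
    return d

def ks_estimate(data, grid=51):
    xs = sorted(data)
    return min((k / (grid - 1) for k in range(grid)),
               key=lambda p: ks_distance(xs, p))

# synthetic data with true p = 0.3
rng = random.Random(0)
def cantor_sample():
    return sum(2 * rng.getrandbits(1) / 3 ** k for k in range(1, 41))
data = [rng.random() if rng.random() < 0.3 else cantor_sample()
        for _ in range(4000)]
p_hat = ks_estimate(data)
```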
 
bpet said:
For example, suppose you have some data where each point takes its value from U(0,1) with probability p and the Cantor distribution with probability (1-p) where p is fixed but unknown.

Here the standard MLE approach falls over, so how would you go about estimating p?

Hey bpet.

There are other estimators besides MLE.

For the uniform, one estimator is obtained using the Method of Moments. Basically this boils down to using moment information (the first moment is the mean) and solving equations for the estimators, with the number of moments used equal to the number of parameters being estimated.

You can use these in cases where, say, a continuous distribution has a constant density, as in the uniform case.
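For this particular mixture the first moment happens to be uninformative: both U(0,1) and the Cantor distribution have mean 1/2 (the Cantor distribution is symmetric about 1/2). The second moment does identify p, since Var U(0,1) = 1/12 and Var Cantor = 1/8; with equal component means the mixture variance is p/12 + (1-p)/8, so the method-of-moments estimate is p̂ = 3 - 24·s². A sketch, with the truncated sampler and sample size as illustrative choices:

```python
import random

def cantor_sample(rng, depth=40):
    # Cantor-distributed value: sum of 2*b_k / 3**k with i.i.d. fair bits b_k,
    # truncated at `depth` ternary digits.
    return sum(2 * rng.getrandbits(1) / 3 ** k for k in range(1, depth + 1))

def mom_estimate(data):
    # Mixture variance = p * (1/12) + (1-p) * (1/8), since both component
    # means are 1/2; solving for p gives p = 3 - 24 * variance.
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return 3.0 - 24.0 * var

rng = random.Random(0)
data = [rng.random() if rng.random() < 0.3 else cantor_sample(rng)
        for _ in range(100_000)]
p_hat = mom_estimate(data)  # close to the true p = 0.3 for large samples
```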
 
