Density and distribution

1. Jun 17, 2010

Rasalhague

I'm trying to learn how the names density and distribution, and related terms, are used in statistics and probability theory. Here are four concepts which I've labelled by pairs:

$$(0, 0) = f:\mathbb{R} \to [0,1] \; : \; f(x)=P(X=x)=P(\{s \in S : X(s)=x\})$$

where s is an element of a sample space, S.
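To make the definition of f concrete, here is a small Python sketch. It assumes a hypothetical sample space of two fair coin tosses, with X counting heads. f(x) is computed literally as the probability of the set of outcomes s with X(s) = x.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space S: two fair coin tosses, every outcome equally likely.
S = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ('T','H'), ('T','T')
P_outcome = {s: Fraction(1, len(S)) for s in S}

def X(s):
    """Random variable: number of heads in outcome s."""
    return s.count("H")

def f(x):
    """PMF: f(x) = P({s in S : X(s) = x}), summed over matching outcomes."""
    return sum((P_outcome[s] for s in S if X(s) == x), Fraction(0))

for x in range(-1, 4):
    print(x, f(x))  # f is 0 for values X never takes, such as -1 and 3
```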

$$(0, 1) = F(x)=P(X \leq x)=\sum_{t=-\infty}^{x}f(t)$$
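Reading the sum as "add f(t) over every value t ≤ x that X can take", a sketch for a hypothetical fair six-sided die shows F as a step function that is defined for every real x, not just the integers:

```python
from fractions import Fraction

SUPPORT = range(1, 7)  # hypothetical example: the faces of a fair die

def f(x):
    """PMF of a fair die: 1/6 on {1, ..., 6}, 0 elsewhere."""
    return Fraction(1, 6) if x in SUPPORT else Fraction(0)

def F(x):
    """CDF: F(x) = P(X <= x) = sum of f(t) over support points t <= x."""
    return sum((f(t) for t in SUPPORT if t <= x), Fraction(0))

for x in (0, 1, 3, 3.5, 6, 100):
    print(x, F(x))  # flat between integers, reaching 1 at x = 6
```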

$$(1, 0) = g(x) :$$

$$(i) \, g(x) \geq 0;$$

$$(ii) \, \int_{-\infty}^{\infty}g(x) \, dx = 1;$$

$$(iii) \, \int_{a}^{b} g(x) \, dx = P(a < X < b)$$

where a and b are any two values of X such that a < b.

$$(1, 1) = G(x)=P(X\leq x) = \int_{-\infty}^{x}g(t) \, dt$$
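As a numerical check of properties (ii) and (iii) and of the relation between g and G, here is a sketch assuming an exponential density with a hypothetical rate LAM = 2. The helper integrate is a plain midpoint rule written for illustration. Because X is continuous, P(a < X < b) and P(a ≤ X ≤ b) coincide, so both equal G(b) − G(a).

```python
import math

LAM = 2.0  # hypothetical rate parameter

def g(x):
    """Exponential PDF with rate LAM (zero for x < 0)."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0

def G(x):
    """Closed-form CDF: G(x) = P(X <= x) = 1 - exp(-LAM * x) for x >= 0."""
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0

def integrate(h, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    width = (b - a) / n
    return width * sum(h(a + (i + 0.5) * width) for i in range(n))

a, b = 0.5, 1.5
print(integrate(g, a, b), G(b) - G(a))  # property (iii), two ways
print(integrate(g, 0.0, 50.0))          # property (ii), upper limit truncated at 50
```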

Hoel, in Introduction to Mathematical Statistics, calls concepts (n, 0) probability densities and concepts (n, 1) probability distributions. Wolfram MathWorld gives the same definitions, but adds that some people use the term "cumulative distribution function" (CDF) in place of probability distribution. Wikipedia calls (0, 0) a probability mass function (discrete only), (1, 0) a probability density function (continuous only), and (0, 1) and (1, 1) cumulative distribution functions, discrete and continuous respectively.

However, MathWorld seems to use the term distribution in a different sense when referring to, for example, the binomial and normal "distributions". The formula by which they define the binomial distribution here [ http://mathworld.wolfram.com/BinomialDistribution.html ] is called a "probability density function" by Wolfram Alpha and a "probability mass function" by Wikipedia. Hoel also seems to depart from his earlier definitions and use the terms distribution and density interchangeably with respect to the binomial and normal distributions (or densities).

I get the impression that, for everyone in the context of the binomial and normal distributions, and for Wikipedia in general, "(probability) distribution" may refer to something broader than (inclusive of? related to?) Hoel's densities and distributions. On that reading, the bell curve is only the graph of the probability density function associated with the normal distribution, and neither this density nor the corresponding cumulative distribution function is to be identified with the normal distribution itself. Is that right? Could someone explain more exactly what a distribution is here?

I'm afraid I don't understand enough of the supporting terminology to understand Wikipedia's definition of probability distribution [ http://en.wikipedia.org/wiki/Probability_distribution ], but I notice that it avoids identifying it with either probability mass function / probability density, or cumulative distribution function. Instead it just says that probability distributions are "characterized by" these. What exactly is the relationship?

This article uses the notation Pr[X = x] and Pr[X ≤ x]. So their probability distribution seems to be a function of two variables, Pr(x, r), where x is a real number and r is one of two relations, either "=" or "≤", so that, depending on the relation, it could manifest as Hoel's/MathWorld's density or Hoel's/MathWorld's (cumulative) distribution. Is that anywhere near the mark?

2. Jun 18, 2010

Rasalhague

Would this be a fair dictionary-entry style answer to the question "What does distribution mean?" in terms of f, g, F, G, as these are defined in #1?

(1) A function $$Q:\mathbb{R}\times\{=,\le\}\times\{\text{discrete},\text{nondiscrete}\}\to[0,1]$$ such that

(a) $$Q(x,=,\text{discrete}) = f(x)$$
PMF (probability mass function), "discrete density"

(b) $$Q(x,\le,\text{discrete}) = F(x)$$
CDF (cumulative distribution function, discrete)

(c) $$Q(x,=,\text{nondiscrete}) = g(x)$$
PDF (probability density function), "continuous density"

(d) $$Q(x,\le,\text{nondiscrete}) = G(x)$$
CDF (cumulative distribution function, continuous)

(Or more generally, $$Q:\mathbb{R}^n\times\{=,\le\}\times\{\text{discrete},\text{nondiscrete}\}\to[0,1]$$, with the appropriate generalisations of f, g, F, G to multivariable functions.)

(2) A synonym for CDFs, i.e. functions of the form F(x) and G(x).
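Sense (1) can be sketched as a lookup that selects one of the four functions. The stand-ins below are hypothetical: a fair die for f and F, and a uniform distribution on [0, 0.5] for g and G. That uniform example was chosen to show one caveat: a density value g(x) is not a probability (for continuous X, P(X = x) = 0), and it can exceed 1, so case (c) does not actually map into [0,1].

```python
from fractions import Fraction

# Hypothetical stand-ins: f, F for a fair die; g, G for Uniform[0, 0.5].
def f(x):
    return Fraction(1, 6) if x in range(1, 7) else Fraction(0)

def F(x):
    return sum((f(t) for t in range(1, 7) if t <= x), Fraction(0))

def g(x):
    return 2.0 if 0.0 <= x <= 0.5 else 0.0  # density 2 > 1 on its support

def G(x):
    return min(max(2.0 * x, 0.0), 1.0)

def Q(x, relation, kind):
    """Sense (1): pick f, F, g or G according to (relation, kind)."""
    table = {
        ("=", "discrete"): f,
        ("<=", "discrete"): F,
        ("=", "nondiscrete"): g,
        ("<=", "nondiscrete"): G,
    }
    return table[(relation, kind)](x)

print(Q(3, "=", "discrete"), Q(3, "<=", "discrete"))
print(Q(0.25, "=", "nondiscrete"), Q(0.25, "<=", "nondiscrete"))
```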

*

I see that Excel does something similar: you can enter formulas of the form DISTRIBUTION1-NAME(PARAMETER1,...,PARAMETERn,FALSE) or DISTRIBUTION1-NAME(___,...,___,TRUE) to select the PMF/PDF/density (FALSE) or the CDF/distribution2 (TRUE), as in BINOMDIST(number_s, trials, probability_s, cumulative).
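The same "one name, a flag selects PMF or CDF" pattern can be sketched in Python. The function binom_dist below is hypothetical; it only mirrors the argument order of Excel's BINOMDIST and is not an Excel API.

```python
from math import comb

def binom_dist(k, n, p, cumulative):
    """Mirrors the argument order of Excel's BINOMDIST(number_s, trials, probability_s, cumulative).

    cumulative=False -> PMF, P(X = k)
    cumulative=True  -> CDF, P(X <= k)
    """
    def pmf(j):
        return comb(n, j) * p**j * (1 - p) ** (n - j)

    if cumulative:
        return sum(pmf(j) for j in range(0, k + 1))
    return pmf(k)

print(binom_dist(2, 4, 0.5, False))  # P(X = 2) for 4 fair coin tosses
print(binom_dist(2, 4, 0.5, True))   # P(X <= 2)
```

Here the boolean flag plays the role of the relation "=" versus "≤" in Q.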

3. Jun 18, 2010

statdad

This discussion doesn't rely on a measure-theoretic approach. You can find a fuller treatment in a text on probability theory (Chung's "A Course in Probability Theory" is a good one).

If $$X$$ is a continuous random variable, its (cumulative) distribution function $$F$$ satisfies

$$P(X \le x) = F(x)$$

The density function in the case of a continuous random variable is a function $$f$$ that is non-negative and satisfies

$$\int_{-\infty}^\infty f(x) \, dx = 1, \qquad F' = f$$
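The relation F' = f can be checked numerically with Python's statistics.NormalDist (Python 3.8+), which provides both cdf and pdf. The standard normal and the central-difference helper derivative are choices made for this illustration only.

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=1.0)  # standard normal, chosen for illustration

def derivative(F, x, h=1e-5):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

for x in (-1.0, 0.0, 0.7, 2.0):
    print(x, derivative(X.cdf, x), X.pdf(x))  # the two columns should agree closely
```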

Either the cdf or the density can be used for calculation. The classic example is

\begin{align*} P(a \le X \le b) & = F(b) - F(a) \\ P(a \le X \le b) & = \int_a^b f(x) \, dx \end{align*}
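Both routes can be compared in a short sketch. It assumes a normal random variable with hypothetical parameters (mean 10, standard deviation 2) and approximates the integral of the density with a midpoint rule.

```python
from statistics import NormalDist

X = NormalDist(mu=10.0, sigma=2.0)  # hypothetical parameters
a, b = 9.0, 13.0

# Route 1: difference of the cdf.
via_cdf = X.cdf(b) - X.cdf(a)

# Route 2: midpoint-rule integral of the density over [a, b].
n = 10_000
width = (b - a) / n
via_density = width * sum(X.pdf(a + (i + 0.5) * width) for i in range(n))

print(via_cdf, via_density)
```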

In the discrete case the cumulative distribution function $$G$$ still satisfies

$$P(X \le x) = G(x)$$

The function that is in some sense analogous to the density is often called the probability mass function. It satisfies

$$P(X = x) = g(x)$$
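Where the continuous case has F' = f, the discrete analogue is that G is a step function that jumps by exactly g(x) at each value x that X can take. A sketch assuming a Poisson random variable with a hypothetical mean LAM = 3:

```python
import math

LAM = 3.0  # hypothetical Poisson mean

def g(k):
    """Poisson PMF: P(X = k), zero off the non-negative integers."""
    if k < 0 or k != int(k):
        return 0.0
    k = int(k)
    return math.exp(-LAM) * LAM**k / math.factorial(k)

def G(x):
    """CDF: P(X <= x) = sum of g(k) over integers 0 <= k <= x."""
    if x < 0:
        return 0.0
    return sum(g(k) for k in range(0, math.floor(x) + 1))

# Jump size at each integer k equals g(k); G is flat between integers.
for k in range(5):
    print(k, G(k) - G(k - 1), g(k), G(k + 0.5) == G(k))
```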

Neither $$F$$ nor $$G$$ is the actual distribution: that name refers to the "rule" that governs the assignment of probabilities. The normal, exponential, binomial, and Poisson distributions are some examples.
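Python's standard library happens to model a distribution this way: statistics.NormalDist (Python 3.8+) is a single object standing for a normal distribution with given parameters, and its pdf, cdf and inv_cdf methods are some of the functions that characterize it. A sketch with made-up parameters:

```python
from statistics import NormalDist

# One distribution (the "rule"), several functions that characterize it.
heights = NormalDist(mu=170.0, sigma=10.0)  # hypothetical parameters

print(heights.pdf(170.0))          # density at the mean
print(heights.cdf(170.0))          # P(X <= 170), which is 0.5 by symmetry
print(heights.inv_cdf(0.975))      # the x with P(X <= x) = 0.975
print(heights.samples(3, seed=1))  # draws governed by the same rule
```

Neither the pdf nor the cdf method is the object itself; each is one way of reading off the same assignment of probabilities.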

4. Jun 18, 2010

Rasalhague

Okay, so a distribution, in this sense, is not even a function, just the probability-assigning rule associated with these various related functions (if discrete: pmf and cdf; if continuous: pdf and cdf). Thanks for your comments, statdad, and for the book recommendation. The Hoel book was originally published in 1947, though revised in 1983. But given the MathWorld articles, presumably some people still use that terminology ("distribution function" for cdf); still, I suppose if they always include the word "function" in that expression, it wouldn't be ambiguous, only confusing to a novice :~)

5. Jun 18, 2010