Density and distribution

  1. Jun 17, 2010 #1
    I'm trying to learn how the names density and distribution, and related terms, are used in statistics and probability theory. Here are four concepts which I've labelled by pairs:

    [tex](0, 0) = f:\mathbb{R} \to [0,1] \; : \; f(x)=P(X=x)=P(\{s \in S : X(s)=x\})[/tex]

    where s is an element of a sample space, S.

    [tex](0, 1) = F(x)=P(X \leq x)=\sum_{t=-\infty}^{x}f(t)[/tex]

    [tex](1, 0) = g(x) : [/tex]

    [tex](i) \, g(x) \geq 0;[/tex]

    [tex](ii) \, \int_{-\infty}^{\infty}g(x) \, dx = 1;[/tex]

    [tex](iii) \, \int_{a}^{b} g(x) \, dx = P(a < X < b)[/tex]

    where a and b are any two values of X such that a < b.

    [tex](1, 1) = G(x)=P(X\leq x) = \int_{-\infty}^{x}g(t) \, dt[/tex]

    Hoel, in Introduction to Mathematical Statistics, calls concepts (n, 0) probability densities. He calls concepts (n, 1) probability distributions. Wolfram Mathworld gives the same definitions, but adds that some people use the term "cumulative distribution function", CDF, in place of probability distribution. Wikipedia calls (0, 0) a probability mass function (only discrete), (1, 0) a probability density function (only continuous), and (n, 1) cumulative distribution functions, discrete and continuous.
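To make the four labelled concepts concrete, here is a minimal Python sketch. The choice of a fair six-sided die for the discrete case and an exponential variable with rate 2 for the continuous case is purely an illustrative assumption, as is the helper `integrate`; none of these come from Hoel or the other sources named above.

```python
import math
from fractions import Fraction

# (0, 0): probability mass function of a fair six-sided die (illustrative choice).
def f(x):
    return Fraction(1, 6) if x in {1, 2, 3, 4, 5, 6} else Fraction(0)

# (0, 1): cumulative distribution function, summing f over support points t <= x.
def F(x):
    return sum(f(t) for t in range(1, 7) if t <= x)

# (1, 0): probability density function of an exponential variable with rate 2.
def g(x):
    return 2.0 * math.exp(-2.0 * x) if x >= 0 else 0.0

# (1, 1): its cumulative distribution function, the integral of g up to x.
def G(x):
    return 1.0 - math.exp(-2.0 * x) if x >= 0 else 0.0

def integrate(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

# Property (iii): the integral of g over (a, b) matches G(b) - G(a).
prob_interval = integrate(g, 0.5, 2.0)
```

One thing the sketch makes visible: here g(0) = 2, so a value of a density is not itself a probability, whereas every value of f, F and G lies in [0, 1].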

    However, Mathworld seems to use the term distribution in a different sense with reference to, for example, the binomial and normal "distributions". The formula they define the binomial distribution by here [ http://mathworld.wolfram.com/BinomialDistribution.html ] is called a "probability density function" by Wolfram Alpha and a "probability mass function" by Wikipedia. Hoel also seems to neglect his earlier definitions and use the terms distribution and density interchangeably with respect to the binomial and normal distribution (or density).

    I get the impression that for everyone in the context of the binomial and normal distributions, and for Wikipedia in general, "(probability) distribution" may refer to something broader than (inclusive of? related to?) Hoel's densities and distributions, so that the bell curve is only the graph of the probability density function associated with the normal distribution, and neither this density nor the corresponding cumulative distribution function is to be identified as the normal distribution itself. Is that right? Could someone explain more exactly what distribution is here?

    I'm afraid I don't understand enough of the supporting terminology to understand Wikipedia's definition of probability distribution [ http://en.wikipedia.org/wiki/Probability_distribution ], but I notice that it avoids identifying it with either probability mass function / probability density, or cumulative distribution function. Instead it just says that probability distributions are "characterized by" these. What exactly is the relationship?

    This article uses the notation Pr[X = x] and Pr[X < or = x]. So their probability distribution seems to be a function of two variables, Pr(x, r), where x is a real number, and r is one of two relations, either "=" or "< or =", so that, depending on the relation, it could manifest as Hoel's/Mathworld's density, or Hoel's/Mathworld's (cumulative) distribution. Is that anywhere near the mark?
    Last edited: Jun 17, 2010
  2. jcsd
  3. Jun 18, 2010 #2
    Would this be a fair dictionary-entry style answer to the question "What does distribution mean?" in terms of f, g, F, G, as these are defined in #1?

    (1) A function Q:Rx{ =, <or= }x{ discrete, nondiscrete } --> [0,1], such that

    (a) Q(x,=,discrete) = f(x)
    PMF (probability mass function), "discrete density"

    (b) Q(x,<or=,discrete) = F(x)
    CDF (cumulative distribution function, discrete)

    (c) Q(x,=,nondiscrete) = g(x)
    PDF (probability density function), "continuous density"

    (d) Q(x,<or=,nondiscrete) = G(x)
    CDF (cumulative distribution function, continuous)

    (Or more generally, Q:R^n x{ =, <or= }x{ discrete, nondiscrete } --> [0,1], with the appropriate generalisations of f, g, F, G to multivariable functions.)

    (2) A synonym for CDF functions, i.e. functions of the form F(x) and G(x).


    I see that Excel does something similar, in that you can enter formulas of the form DISTRIBUTION1-NAME[PARAMETER1,...,PARAMETERn,FALSE] or DISTRIBUTION1-NAME[___,...,___,TRUE] to select for PMF/PDF/density (false), or CDF/distribution2 (true).
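The spreadsheet-style pattern of one name plus a true/false switch can be sketched in Python as follows. The function name `binom_dist` and its argument order are hypothetical, chosen only to mirror the pattern described above; this is not a reproduction of any particular spreadsheet function.

```python
from math import comb

def binom_dist(k, n, p, cumulative):
    """Hypothetical helper: one name, with a flag choosing PMF or CDF.

    k: number of successes, n: number of trials, p: success probability.
    cumulative=False returns P(X = k); cumulative=True returns P(X <= k).
    """
    def pmf(j):
        return comb(n, j) * p**j * (1 - p) ** (n - j)

    if cumulative:
        return sum(pmf(j) for j in range(0, k + 1))
    return pmf(k)

# Three fair coin flips, two heads: mass at 2, and cumulative probability up to 2.
mass_at_2 = binom_dist(2, 3, 0.5, False)
cum_up_to_2 = binom_dist(2, 3, 0.5, True)
```

Under this reading, the name picks out the distribution (binomial with given n and p), and the flag only selects which of its two associated functions is evaluated.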
  4. Jun 18, 2010 #3


    Homework Helper

    This discussion doesn't rely on a measure-theoretic approach. You can find a measure-theoretic treatment in a text on probability theory (Chung's "A Course in Probability Theory" is a good one).

    If [tex] X [/tex] is a continuous random variable, its (cumulative) distribution function [tex] F [/tex] satisfies

    [tex]P(X \le x) = F(x)[/tex]

    The density function in the case of a continuous random variable is a function [tex] f [/tex] that is non-negative and satisfies

    [tex]\int_{-\infty}^\infty f(x) \, dx = 1, \qquad F' = f[/tex]

    Either the cdf or the density can be used for calculation. The classic example is

    [tex]\begin{aligned} P(a \le X \le b) &= F(b) - F(a) \\ P(a \le X \le b) &= \int_a^b f(x) \, dx \end{aligned}[/tex]

    In the discrete case the cumulative distribution function [tex] G [/tex] still satisfies

    [tex]P(X \le x) = G(x)[/tex]

    The function that is in some sense analogous to the density is often called the probability mass function. It satisfies

    [tex]P(X = x) = g(x)[/tex]

    Neither [tex] F [/tex] nor [tex] G [/tex] is the actual distribution: that name refers to the "rule" that governs the assignment of probabilities: normal distribution, exponential distribution, binomial distribution, Poisson, are some examples.
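As a rough numerical check of the relations in this post, the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The standard normal, the interval endpoints, the evaluation point 0.5 and the helper `integrate` are all illustrative assumptions, not part of the post.

```python
from statistics import NormalDist

# One object stands for the distribution; pdf and cdf are two functions attached to it.
Z = NormalDist(mu=0.0, sigma=1.0)

def integrate(func, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

a, b = -1.0, 2.0

# P(a <= X <= b) computed from the cdf and from the density.
via_cdf = Z.cdf(b) - Z.cdf(a)
via_density = integrate(Z.pdf, a, b)

# F' = f, checked with a central difference at x = 0.5.
h = 1e-5
slope = (Z.cdf(0.5 + h) - Z.cdf(0.5 - h)) / (2 * h)
density_at_half = Z.pdf(0.5)
```

The two routes to P(a <= X <= b) should agree up to the error of the numerical integration, and `slope` should be close to `density_at_half`. The design also echoes the point above: the object `Z` plays the role of the distribution, while `Z.pdf` and `Z.cdf` are functions that describe it.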
  5. Jun 18, 2010 #4
    Okay, so a distribution, in this sense, is not even a function, just the probability-assigning rule associated with these various related functions (if discrete: pmf and cdf, if continuous: pdf, cdf). Thanks for your comments, statdad, and for the book recommendation. The Hoel book was originally published in 1947, although revised in 1983. But given the Mathworld articles, presumably some people still use that terminology, distribution function for cdf; still, I suppose if they always use the word function in that expression, it wouldn't be ambiguous, only confusing to a novice :~)
  6. Jun 18, 2010 #5


    Homework Helper

    Yes, distribution function and cdf are often used interchangeably. Terminology is always fun - the most obvious difference in language is the use of "normal distribution" in the US for what much, if not most, of the rest of the world refers to as the "Gaussian distribution".
  7. Jun 19, 2010 #6
    Ha ha, tooth-grindingly fun! Thanks for that tip too. Just to be more specific, say the experiment is to flip a coin 3 times. Is the binomial distribution for this experiment the rule that assigns an equal probability of 1/8 to each simple event, i.e. each set containing one element of the underlying set, S, of a sample space? Or is it the function which uses this rule to map subsets of S to the interval [0,1]? Or is it a rule that also specifies what counts as a success, e.g. getting heads? Is it a rule, or a function, that depends on the random variable? Hoel describes the random variable as generating a new sample space. In this case, if the random variable X:S-->R such that X(s), for each s in S, is the number of heads obtained, would the binomial distribution perhaps be the probability rule that depends both on X and on the probabilities assigned to subsets of S and that assigns unequal probabilities to the integers {0, 1, 2, 3} (i.e. the rule of the PMF)?
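One way to lay out the pieces of the three-flip example is sketched below in Python. It assumes a fair coin and takes the usual convention that the distribution of X is the probability law that X induces on its values; the names `S`, `P`, `X`, `pmf` and `formula` are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Sample space S: the 8 equally likely sequences of three flips.
S = ["".join(flips) for flips in product("HT", repeat=3)]

# Probability assigned to each simple event {s}.
P = {s: Fraction(1, 8) for s in S}

# Random variable X: S -> R, counting heads (the choice of "success").
def X(s):
    return s.count("H")

# PMF induced on {0, 1, 2, 3}: add up P over all outcomes that X maps to each value.
pmf = {}
for s in S:
    pmf[X(s)] = pmf.get(X(s), Fraction(0)) + P[s]

# The binomial formula with n = 3 and p = 1/2, for comparison.
formula = {k: Fraction(comb(3, k), 8) for k in range(4)}
```

In this sketch the equal probabilities of 1/8 live on S, while the unequal probabilities 1/8, 3/8, 3/8, 1/8 on {0, 1, 2, 3} arise only after X is applied, so the induced PMF depends both on X and on the probabilities assigned on S.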