
Stats text error?

  1. Dec 4, 2011 #1
    ...or more likely, I'm just being retarded. So here's the offending statement:

    Let X be our random variable with a countable range of {x_1, x_2, .... , x_n, ...}. Let f(x) be a discrete density function and F(x) the corresponding cumulative distribution function. Then,

    f(x_j) = F(x_j) - lim_(h→0+) F(x_j - h).

    I expected the text to say,

    f(x_j) = F(x_j) - F(x_(j-1))

    Keep in mind this is a respected stats text, namely Mood's Introduction to the Theory of Statistics. What the heck is going on? I'm so confused.

    P.S. Sorry if my formatting is horrible. I'm not entirely sure yet how best to write math in ASCII. (I just joined.)
  2. jcsd
  3. Dec 4, 2011 #2

    It is slightly odd but it doesn't seem wrong to me. That looks like the general definition of a CDF. I guess the author saw no need to introduce a special definition for discrete variables.
  4. Dec 4, 2011 #3

    Science Advisor

    It makes sense to me.

    What you should do is draw a graph of a discrete density function and its CDF, and then look at what the formula translates to pictorially.
  5. Dec 4, 2011 #4

    Stephen Tashi

    Science Advisor

    It's interesting that it says a "countable range". An example of a countable range would be all rational numbers in the interval [0,1]. In that case, if [itex] x_j = 1/2 [/itex] then what would [itex] x_{j-1} [/itex] be? Mood's definition doesn't require that we know the answer to that.

    It does say "discrete density function". So how does the text define "discrete density function"? Can we define a ramp shaped distribution on the rational numbers in the interval [0,1] ? Would that count as a "discrete" distribution?

    See the thread:
    Physics Forums > PF Lounge > Forum Feedback & Announcements
    LaTeX Guide: Include mathematical symbols and equations in a post
  6. Dec 4, 2011 #5

    Stephen Tashi

    Science Advisor

    Here's an interesting example:

    Let the possible values of the random variable X be {1, 1/2, 1/2+1/4, 1/2+1/4+1/8, 1/2+1/4+1/8+1/16, ... etc.}

    Define the cumulative distribution F as follows:
    Let F(x) = 0 for x < 1/2
    Let F(x) = 1 for x >= 1
    For x in [1/2,1) define F as follows:
    Let F(x) = x if x is a possible value of X. If x is not a possible value of X then let F(x) = y where y is the largest possible value of X that is less than x.

    Evaluate the density f(1). We can't find it by using F(1) - F(the largest possible value of X that is just smaller than 1).
  7. Dec 7, 2011 #6
    Thank you all. As it turns out, I was an idiot. His definition was correct (seemingly). It turns out that for any point x_j, if we move an infinitesimal amount 'left' (in the negative direction), the CDF takes on the value of F( x_(j-1) ), at least when x_(j-1) is the next possible value below x_j. A friend showed me graphically.
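
    For anyone who wants to check this numerically rather than graphically, here is a minimal sketch. The three values and their probabilities are made up for illustration (they are not from Mood), and the names `cdf` and `left_limit_jump` are hypothetical. It only covers the simple case where the possible values are isolated and can be listed in increasing order.

```python
from bisect import bisect_right
from fractions import Fraction

# Hypothetical discrete distribution, chosen only for illustration.
values = [1, 2, 3]                                        # x_1 < x_2 < x_3
probs = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]  # f(x_1), f(x_2), f(x_3)


def cdf(x):
    """F(x) = P[X <= x]: sum the probabilities of all values not exceeding x."""
    n = bisect_right(values, x)  # number of possible values <= x
    return sum(probs[:n], Fraction(0))


def left_limit_jump(j, h=Fraction(1, 1000)):
    """F(x_j) - F(x_j - h) for a small h > 0 (j is a 0-based index into values)."""
    x_j = values[j]
    return cdf(x_j) - cdf(x_j - h)


if __name__ == "__main__":
    for j, x in enumerate(values):
        print(x, left_limit_jump(j), probs[j])
```

    Because F is a step function here, any h smaller than the gap between neighbouring values gives the same difference, so the limit is just the size of the step at x_j.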

    @Stephen You are right. Mood has an incredibly liberal definition, where the range of the random variable X needn't consist of integer values. It only needs to be a set with the same cardinality as some subset of the integers. In other words, if you can make a bijection between {x_1, x_2, ... x_n, ...} and S ⊆ ℤ, it is a valid range for a discrete random variable.

    ALSO, tremendous example! :D

    I feel like I can intuit what f(1) should be. Mood defines F( X = x ) = P[ {x' : x' < x, x' ∊ R(X) } ], where R(X) is the range of the discrete random variable. So it seems the P-function's argument, the set {x' : x' < x, x' ∊ R(X) }, should equal [ {1, 1/2, 1/2 + 1/4, 1/2 + 1/4 + 1/8, ...} - {1} ].

    But oddly, with Mood's limit CDF definition, 1 would be x_k where k ⟶ ∞. Not sure how the limit definition handles it.
  8. Dec 7, 2011 #7

    Stephen Tashi

    Science Advisor

    I don't know whether you mean the P-function to be the cumulative distribution or the density. It looks like you are trying to subtract a set from another set.

    Mood's definition is:

    [itex] f(x_j) = \lim_{h \rightarrow 0^+} ( F(x_j) - F(x_j - h)) [/itex]

    I think this would imply (after some detailed arguing) that
    [itex] f(1) = F(1) - \lim_{k \rightarrow \infty} \sum_{j=1}^k \frac{1}{2^j} = 1 - 1 = 0 [/itex]

    and 0 is the correct answer.
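
    A quick numerical check of that limit, as a rough sketch rather than a proof. It assumes the possible values below 1 are exactly the partial sums 1 - 2^(-k), as in the example from the earlier post; the function names `cdf` and `jump` are just illustrative.

```python
def cdf(x: float) -> float:
    """Cumulative distribution F from the example in the earlier post."""
    if x < 0.5:
        return 0.0
    if x >= 1.0:
        return 1.0
    # Assumption: the possible values below 1 are the partial sums 1 - 2**-k.
    # Find the largest k with 1 - 2**-k <= x; that partial sum is F(x).
    k = 1
    while 1.0 - 2.0 ** -(k + 1) <= x:
        k += 1
    return 1.0 - 2.0 ** -k


def jump(x: float, h: float) -> float:
    """F(x) - F(x - h), which approaches f(x) as h shrinks to 0 from above."""
    return cdf(x) - cdf(x - h)


if __name__ == "__main__":
    for h in (1e-1, 1e-3, 1e-6, 1e-9):
        # First column of results shrinks toward f(1) = 0;
        # second column stays at the step size of F at 3/4.
        print(h, jump(1.0, h), jump(0.75, h))
```

    At x = 1 the difference keeps shrinking with h, while at an isolated value such as 3/4 it settles on the size of the step there, which is what the limit definition is designed to pick out.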


    There are several interesting technicalities here. To apply Mood's definition (using the ordinary definition of limit) we must define the cumulative F(x) on entire intervals of real numbers, not merely at isolated points. Otherwise the limit he requires doesn't exist. I wonder if other books on probability do this for the cumulatives of discrete distributions?

    The problem facing writers on introductory probability theory is that they are trying to dance around having to teach advanced theories of integration. I'm glad they do this since I find advanced theories of integration hard to keep in mind.

    Consider this example. We have an idealized dart game with a circular board of radius 10. If the dart lands on the board at distance x from the center, the points you get are given by these rules: You get [itex] \frac{1}{x-0.5} [/itex] points if the dart lands with [itex] x \gt 1 [/itex]. You get 2 points if [itex] x \le 1 [/itex].

    Suppose [itex] x [/itex] has a uniform distribution on the interval [0,10]. What is the probability density [itex] f(s) [/itex] of the random variable s that gives the points that are scored on a throw?

    There is a problem getting f(s) to integrate to 1 by any elementary theory of integration. If you use Riemann integration, you can't detect the fact that s = 2 is a special situation. In a manner of speaking, 2 is only "one point wide", so no matter what value you give f(2), it won't change the value of the Riemann integral. On the other hand, if you try to use discrete summation instead of integration, you can't deal with f(s) on the interval [itex] [\frac{1}{9.5}, 2.0) [/itex].

    The rough and ready way is to declare f(2) to be a "point mass" and do a hybrid integral combining Riemann integration plus a summation of f(2) when you integrate over any interval containing s = 2. I have not seen this method advocated in any introductory probability courses. I suppose it would be embarrassing to put it in a textbook since the "proper" way to do things is by measure theory and/or fancier methods of integration.
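
    Here is a minimal sketch of that hybrid approach for the dart example. Two things in it are derived here rather than stated in the post: the continuous part of the density, f(s) = 1/(10 s^2) on [1/9.5, 2), obtained by the change of variables x = 1/s + 0.5 with x uniform on [0,10], and the point mass P(s = 2) = P(x <= 1) = 1/10. The function names and the midpoint-rule integration are illustrative choices only.

```python
S_MIN = 1 / 9.5    # score when the dart lands at x = 10
S_MAX = 2.0        # score for every x <= 1, carrying the point mass
POINT_MASS = 0.1   # P(s = 2) = P(x <= 1) for x uniform on [0, 10]


def density(s: float) -> float:
    """Continuous part of the density on [S_MIN, S_MAX), derived via x = 1/s + 0.5."""
    return 1.0 / (10.0 * s * s)


def prob(a: float, b: float, n: int = 200_000) -> float:
    """P(a <= s <= b): a Riemann (midpoint) sum plus the point mass if 2 is in [a, b]."""
    lo, hi = max(a, S_MIN), min(b, S_MAX)
    total = 0.0
    if hi > lo:
        width = (hi - lo) / n
        total = sum(density(lo + (i + 0.5) * width) for i in range(n)) * width
    if a <= S_MAX <= b:
        total += POINT_MASS  # the part an ordinary Riemann integral cannot see
    return total


if __name__ == "__main__":
    print(prob(0.0, 3.0))   # total probability, continuous part plus point mass
    print(prob(1.0, 1.9))   # interval that misses s = 2, so no point mass
    print(prob(1.0, 2.0))   # interval that includes s = 2
```

    The continuous part alone integrates to 0.9, so leaving out the point mass is exactly what makes the total fall short of 1.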