Continuous random variable: Zero probability

In summary, the conversation discusses the concept of zero probability in a continuous cumulative probability distribution. It is pointed out that assigning a finite value or zero to a point in the distribution can lead to difficulties, and instead, a range or set of values is used to represent a positive probability. The conversation also includes a discussion on the interpretation of the cumulative probability function and the limitations of applying transitivity over an uncountable set.
  • #36
Stephen Tashi said:
No definition of probability in terms of measure theory deals with whether events happen. If there is a definition saying an event with probability zero never happens, it isn't a definition from mathematical probability theory.

And, do you agree that:

If you choose two real numbers, ##b, c## at random from a uniform distribution on ##\mathbb{R}##, then the probability that the quadratic equation ##x^2 + 2bx + c = 0## has complex roots is 0?
 
  • #37
PeroK said:
How would you make something happen that has 0 probability?

How to do that is irrelevant to the content of mathematical probability theory. That's my point.

Mathematical probability simply doesn't deal with the happening or non-happening of events. It merely says that a collection of events can be assigned a number called a "probability". The question of whether an event happens given that it has been assigned a probability (be that probability zero, or one, or some other number - be it assigned by a probability density or by some other form of "measure") is a question of physics or of whatever science governs the particular problem being studied. The happening or non-happening of events involves the interpretation of how probability serves as a model for some physical phenomena.
 
  • #38
PeroK said:
And, do you agree that:

If you choose two real numbers, ##b, c## at random from a uniform distribution on ##\mathbb{R}##,

What specific function would be a "uniform distribution on ##\mathbb{R}##"?
 
  • #39
Stephen Tashi said:
What specific function would be a "uniform distribution on ##\mathbb{R}##"?

The one implied by this post, which started the whole debate:
StoneTemplePython said:
Sometimes coming at this from a different angle is illuminating. Here's a very relevant problem from Fifty Challenging Problems in Probability

>> What is the probability that the quadratic equation: ##x^2 + 2bx + c = 0## has real roots?

note your domain is reals, b and c are independently sampled from ##(-\infty,\infty)##, though it might be helpful to think of them coming from ##(-3, 3)## or ##(-n, n)## and to then consider the limiting behavior.

I used to not like this "zero probability but still possible" stuff -- but eventually I got over it, and this problem helped that process move along.
 
  • #40
PeroK said:
The one implied by this post, which started the whole debate:

I don't see any specific function mentioned in the passage you quoted.

The passage you quoted attempts to define a uniform distribution on ##\mathbb{R}## as a limit of a sequence of uniform distributions on ##[-n,n]## as ##n \rightarrow \infty##. However, a function that is the limit of a sequence of distribution functions is not necessarily a distribution function itself.
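To make that last point concrete, here is a minimal sketch. The CDF of the uniform distribution on ##[-n,n]## is ##F_n(x) = (x+n)/(2n)## for ##-n \leq x \leq n##. For every fixed ##x## this tends to ##1/2## as ##n \rightarrow \infty##, so the pointwise limit is the constant function ##1/2##, which is not a distribution function (it tends neither to 0 at ##-\infty## nor to 1 at ##+\infty##). The name `uniform_cdf` is only an illustrative choice.

```python
def uniform_cdf(x: float, n: float) -> float:
    """CDF of the uniform distribution on [-n, n], evaluated at x."""
    if n <= 0:
        raise ValueError("n must be positive")
    if x <= -n:
        return 0.0
    if x >= n:
        return 1.0
    return (x + n) / (2.0 * n)


if __name__ == "__main__":
    # For any fixed x the values drift toward 1/2 as n grows.
    for x in (-5.0, 0.0, 5.0):
        values = [uniform_cdf(x, n) for n in (10, 1_000, 100_000, 10_000_000)]
        print(x, values)
```

For instance, `uniform_cdf(5.0, 10)` is 0.75, while for very large `n` the value at the same point is indistinguishable from 0.5.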
 
  • #41
Stephen Tashi said:
I don't see any specific function mentioned in the passage you quoted.

The passage you quoted attempts to define a uniform distribution on ##\mathbb{R}## as a limit of a sequence of uniform distributions on ##[-n,n]## as ##n \rightarrow \infty##. However, a function that is the limit of a sequence of distribution functions is not necessarily a distribution function itself.

That doesn't answer the question about whether it's possible for a quadratic equation to have complex roots, even if the probability that it has complex roots is 0!

I'm certainly willing to retract my statement about "something with probability 0 cannot happen, by definition", as meaningless because it mixes a strictly mathematical concept (probability) with a physical concept (something happening).

I would replace it by:

In any application of probability theory, we should not apply the theory in such a way that events that are physically possible are mapped to sets with zero probability (measure).
 
  • #42
PeroK said:
In any application of probability theory, we should not apply the theory in such a way that events that are physically possible are mapped to sets with zero probability (measure).

That's uncontroversial in many applications. However, physics uses continuous distributions and the probability of Nature realizing a specific value from a continuous distribution like a Gaussian is zero. So, are physically possible events mapped to sets with probability zero? Or are continuous distributions an incorrect model for what is physically possible? (I don't know.)

Probability theory doesn't even have any axiom that says we can take samples of a random variable. This thread discusses the distinction between being able to "realize" a set as the value of a random sample versus being able to assign a set a probability: https://www.physicsforums.com/threads/relative-frequency-and-nonmeasurable-sets.909705/
 
  • #43
PeroK said:
In any application of probability theory, we should not apply the theory in such a way that events that are physically possible are mapped to sets with zero probability (measure).

Among other things, your suggestion would have unfortunate geometric implications. Consider the case of generating two 2-d vectors with each of the (4 total) Cartesian coordinates generated independently, uniformly at random over ##[-1,1]##. Can these two vectors be linearly dependent? Yes, of course. What is the probability of that? Zero. And how much area / 'volume' is enclosed by two linearly dependent vectors? Zero -- i.e. determinant = 0.

This experiment and its results generalize to a (finite) n-dimensional vector space with ##\leq n## vectors in it. (Of course if you had n + 1 vectors in an n dimensional space... the results would be different, with probability one. It is also worth noting that the probability of generating the zero vector during one of these experiments is zero, even though it is technically possible.)
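The 2-d experiment can be sketched in a few lines of Python. This is only a finite-precision stand-in for the continuous experiment: the coordinates are double-precision floats, so exact linear dependence is extremely unlikely rather than a true probability-zero event. The function names `det2` and `count_singular_pairs` are made up for this illustration.

```python
import random


def det2(a: float, b: float, c: float, d: float) -> float:
    """Determinant of the 2x2 matrix with rows (a, b) and (c, d)."""
    return a * d - b * c


def count_singular_pairs(trials: int, seed: int = 0) -> int:
    """Count random pairs of 2-d vectors whose computed determinant is exactly 0.

    Each of the four coordinates is drawn independently from uniform(-1, 1).
    """
    rng = random.Random(seed)
    singular = 0
    for _ in range(trials):
        a, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        c, d = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        # In exact arithmetic the determinant is 0 exactly when the two
        # vectors are linearly dependent; with floats the test is subject
        # to rounding, so treat the count as indicative only.
        if det2(a, b, c, d) == 0.0:
            singular += 1
    return singular


if __name__ == "__main__":
    print(det2(1.0, 2.0, 2.0, 4.0))        # a hand-picked dependent pair
    print(count_singular_pairs(100_000))   # dependent pairs found by sampling
```

Dependent pairs such as (1, 2) and (2, 4) clearly exist, yet random sampling essentially never lands on one, which is the point of the example.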

There is considerably more thought behind these things -- defective random variables certainly come to mind -- but that's probably about as much as I have to say on it.

If you don't like the problem from Mosteller (or to be technically accurate -- my perhaps overly quick re-statement of it), it's a minor point. Trying to re-define the meaning of zero probability on the other hand is quite an undertaking. You may like the nonstandard view from Nelson.
 
  • #44
StoneTemplePython said:
To answer your question -- modern probability theory comes from Kolmogorov, using measure theory. There are other rigorous formulations -- e.g. Nelson's Radically Elementary Probability Theory which does use nonstandard analysis -- the preface humorously suggests that even high school students can understand the book (it is not an easy read). You can read it here: https://web.math.princeton.edu/~nelson/books/rept.pdf

For the most part you will not see people use infinitesimals in standard probability theory -- except when they 'tap out' and feel like it is needed to convey what they want to say. (In particular I've seen a lot of people start talking about infinitesimal generators of continuous Markov chains, but otherwise they won't use infinitesimals.)

Once you get used to the idea that zero probability events can happen (but not zero density ones), I suspect you'll be ok with the terminology. It took me a while.
- - - -
Sometimes coming at this from a different angle is illuminating. Here's a very relevant problem from Fifty Challenging Problems in Probability

>> What is the probability that the quadratic equation: ##x^2 + 2bx + c = 0## has real roots?

note your domain is reals, b and c are independently sampled from ##(-\infty,\infty)##, though it might be helpful to think of them coming from ##(-3, 3)## or ##(-n, n)## and to then consider the limiting behavior.

I used to not like this "zero probability but still possible" stuff -- but eventually I got over it, and this problem helped that process move along.
Discriminant = ##b^2-c##
Discriminant >= 0 is equivalent to real roots.
Case 1: c < 0 -> real roots.
Case 2: Choose b and c uniformly from some large positive interval (0,N). Prob(discr. < 0) ~ 1/sqrt(N). Limiting case: prob. both roots real = 1.
 
  • #45
StoneTemplePython said:
Among other things, your suggestion would have unfortunate geometric implications. Consider the case of generating two 2-d vectors with each of the (4 total) Cartesian coordinates generated independently, uniformly at random over ##[-1,1]##.

Selecting a single value from a uniform distribution over [-1,1] involves the happening of an event that has probability zero. So we have a problem right there.

The other thread I mentioned poses the following apparent paradox. Suppose it is possible to select numbers independently at random from a uniform distribution on [0,1]. Let S be a (well defined) subset of [0,1] that is not measurable. For each number we select, we determine if that number is in S or not. We keep a tally of the fraction of numbers that are selected that are in S. What would such empirical data look like? If it tends to approach a limiting frequency then that would empirically suggest that a probability can be assigned to the event S. If it does not approach a limiting frequency, then what does the data look like? Does it oscillate? How can it "know" to oscillate if we are taking independent samples?
 
  • #46
mathman said:
Discriminant = ##b^2-c##
Discriminant >= 0 is equivalent to real roots.
Case 1: c < 0 -> real roots.
Case 2: Choose b and c uniformly from some large positive interval (0,N). Prob(discr. < 0) ~ 1/sqrt(N). Limiting case: prob. both roots real = 1.

This is the gist of it. I thought it was a fun little problem that gives people a flavor of: probability zero can happen. It also has a nice little picture associated with it -- if people like drawing squares and parabolas.

Stephen Tashi said:
Selecting a single value from a uniform distribution over [-1,1] involves the happening of an event that has probability zero. So we have a problem right there.

I don't really see this as a problem to be honest. It is the generator of the correct (standard) answer for OP. The fact that when generating n, n-dimensional random vectors, with probability one they will be linearly independent is... of interest. (Of course when I do this with finite precision floats on my computer we're just talking about a very very small probability of linear dependence that is quite close to zero.)

With respect to the rest of your posting... it seems like a giant rabbit hole.
 
  • #47
StoneTemplePython said:
I don't really see this as a problem to be honest.
It's a problem in a practical sense, since it can't be done by any known method.

It is the generator of the correct (standard) answer for OP.
The original post concerns two questions. The first is the mathematical question of how a set of points can be assigned a nonzero length or "measure" when each individual point has zero measure. The second is whether an event with zero probability can actually happen - which is not a mathematical question. I agree that there is a standard mathematical answer to the first question.
 
  • #48
Regarding the issue of assigning a 0 probability to a result that physically may happen (or has already happened):

So much use of probability is addressing the question of what result we should guess for an event that has already happened, given the information that we have. Suppose I say that I have already flipped a coin and got a result. If that is all you know, you would legitimately give a 50/50 probability to heads and tails. But in reality, the coin has already been flipped, so in the physical world one result (say heads) has already happened and the other has not. If you don't have any hints about the result, you must keep the 50/50 odds.

Probability addresses the question of what odds we should assume for alternative results given the information we have. This is true whether the experiments have already been done or not. If the only information you are given is that a number has been (or will be) selected from a uniform distribution on [0,1], then you must say that the probability of any exact pre-assigned value is zero. If you know all kinds of details about how the number has been (or will be) selected, then you may be able to give positive values to particular exact values. But without that knowledge, you are left with P(X=x0) = 0. The experiment is done in the physical world and there are many details that may influence your guess, but if you don't have any of that information, the probability must remain 0.
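One way to see why P(X=x0) = 0 for the uniform distribution on [0,1] is to look at a small window around x0 and let it shrink. A minimal sketch, with `interval_probability` as an illustrative name: the probability of landing within `eps` of `x0` is the length of the overlap between [x0 - eps, x0 + eps] and [0,1], which goes to 0 as `eps` does.

```python
def interval_probability(x0: float, eps: float) -> float:
    """P(|X - x0| <= eps) for X uniform on [0, 1]."""
    if eps < 0:
        raise ValueError("eps must be non-negative")
    lo = max(0.0, x0 - eps)
    hi = min(1.0, x0 + eps)
    # Length of the overlap between [x0 - eps, x0 + eps] and [0, 1];
    # the overlap is empty (length 0) when x0 lies too far outside [0, 1].
    return max(0.0, hi - lo)


if __name__ == "__main__":
    for eps in (0.25, 0.01, 1e-6, 0.0):
        print(eps, interval_probability(0.5, eps))
```

Any window of positive width gets positive probability, while the degenerate window `eps = 0` (the single point) gets exactly 0.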
 
  • #49
FactChecker said:
Probability addresses the question of what odds we should assume for alternative results given the information we have.

Attempting to define probability in terms of odds is a circular definition unless we define "odds" using some non-probabilistic concept. However, I agree that probability theory can be applied, as you say, in the Bayesian sense where we assign probabilities differently as different information becomes available.

If we are given the information that a number was selected from a uniform distribution on [0,1] (and we believe that information) then obviously we have accepted that it is possible to select a number from a uniform distribution on [0,1]. Given that information, we must (by the definition of uniform distribution and definition of how it assigns probability) use the uniform distribution to assign zero probability to that event. However, this hypothetical situation does not address whether it is actually possible to select a number from a uniform distribution on [0,1]. As I said before, the question of whether it is possible by any physically realizable process to select a number from a uniform distribution on [0,1] is a question for physics.
 
  • #50
Stephen Tashi said:
Attempting to define probability in terms of odds is a circular definition unless we define "odds" using some non-probabilistic concept. However, I agree that probability theory can be applied, as you say, in the Bayesian sense where we assign probabilities differently as different information becomes available.

If we are given the information that a number was selected from a uniform distribution on [0,1] (and we believe that information) then obviously we have accepted that it is possible to select a number from a uniform distribution on [0,1]. Given that information, we must (by the definition of uniform distribution and definition of how it assigns probability) use the uniform distribution to assign zero probability to that event. However, this hypothetical situation does not address whether it is actually possible to select a number from a uniform distribution on [0,1]. As I said before, the question of whether it is possible by any physically realizable process to select a number from a uniform distribution on [0,1] is a question for physics.
I agree. My point is that without information about the physical process, the true fact is that our knowledge has not improved and the probabilities are unchanged. For all we know, the method used to select the random number may itself have been randomly selected out of a multitude of methods. I believe it is legitimate to talk about a continuous CDF with zero probability at every single point, even for a physical, non-theoretical case. Whether that is physically possible seems to be a "religious" issue.
 
  • #51
Suppose I throw a dart at an (X,Y) grid [0,1]x[0,1], and say that the result is the exact x-coordinate of the dart's center of mass. Then I have done an experiment in the physical world where the result has an infinite accuracy. In the physical world, we will not be able to determine the exact result with infinite accuracy, but nevertheless, it has happened. That exact result had zero probability. (Here I am assuming that there is no quantization in nature of the space-time coordinate system.) The fact that we can not determine and record the result with infinite accuracy is not relevant.
 
  • #52
FactChecker said:
Suppose I throw a dart at an (X,Y) grid [0,1]x[0,1], and say that the result is the exact x-coordinate of the dart's center of mass. Then I have done an experiment in the physical world where the result has an infinite accuracy. In the physical world, we will not be able to determine the exact result with infinite accuracy, but nevertheless, it has happened. That exact result had zero probability. (Here I am assuming that there is no quantization in nature of the space-time coordinate system.) The fact that we can not determine and record the result with infinite accuracy is not relevant.

This assumes that the dart has a well-defined centre of mass and that the centre of mass has a well-defined position without being measured. Mathematically, this is fine. But I don't think these mathematical concepts extend to the real world with "infinite accuracy", whether you consider Quantum Mechanics or not. Classically, the atoms in a dart are never at rest, so its centre of mass is never at rest either.

To me, this is a fundamental difference between mathematics and reality. An experiment cannot generate these mathematical things, like an x-coordinate to infinite accuracy.
 
  • #53
StoneTemplePython said:
If you work through the math, as n grows large, you actually get real roots with probability one (i.e. the probability of complex root is zero)
Please share this working through.

Suppose that we select a, b and c from the uniform distribution over [-1,+1]. We get a non-zero probability of the result having a discriminant ##b^2-4ac## greater than zero and a non-zero probability of the discriminant having a value less than zero.

Now suppose instead that we select A, B and C from a uniform distribution over [-n,+n]. This is equivalent to selecting a, b and c as above from a uniform distribution over [-1,+1] and setting A=na, B=nb, C=nc. The resulting discriminant is then given by ##(nb)^2-4(na)(nc) = n^2(b^2-4ac)##. The distribution of signs of the discriminant is independent of n.

The limiting probability of a complex root is not zero.
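The scaling step can be checked mechanically for this three-coefficient version. The sketch below uses exact rational arithmetic so that no floating-point rounding can interfere; the random rationals are only stand-ins for draws from [-1,+1], and the function names are invented for the illustration. It confirms that multiplying (a, b, c) by any n > 0 leaves the sign of ##b^2-4ac## unchanged.

```python
import random
from fractions import Fraction


def discriminant_sign(a, b, c) -> int:
    """Sign (-1, 0 or +1) of b^2 - 4ac, computed exactly for ints or Fractions."""
    d = b * b - 4 * a * c
    return (d > 0) - (d < 0)


def signs_match_after_scaling(n: int, trials: int, seed: int = 0) -> bool:
    """Check that scaling (a, b, c) by n > 0 never changes the discriminant's sign."""
    if n <= 0:
        raise ValueError("n must be positive")
    rng = random.Random(seed)
    for _ in range(trials):
        # Rational stand-ins for draws from uniform(-1, 1).
        a, b, c = (Fraction(rng.randint(-10**6, 10**6), 10**6) for _ in range(3))
        if discriminant_sign(a, b, c) != discriminant_sign(n * a, n * b, n * c):
            return False
    return True


if __name__ == "__main__":
    print(signs_match_after_scaling(n=7, trials=10_000))
```

Since the scaled discriminant is ##n^2## times the original one, the check can never fail for positive n; the code only makes that visible.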
 
  • #54
PeroK said:
This assumes that the dart has a well-defined centre of mass and that the centre of mass has a well-defined position without being measured. Mathematically, this is fine. But I don't think these mathematical concepts extend to the real world with "infinite accuracy", whether you consider Quantum Mechanics or not. Classically, the atoms in a dart are never at rest, so its centre of mass is never at rest either.

To me, this is a fundamental difference between mathematics and reality. An experiment cannot generate these mathematical things, like an x-coordinate to infinite accuracy.
It doesn't matter if there is an atom there or not or if atoms are moving. It doesn't matter how the center of mass is defined. However it is defined, if that location exists with infinite precision in the time-space coordinate system, then the experiment has a result that has infinite precision. The probability of the infinite precision result is zero. We can never measure or record it with infinite precision, but that is a different issue.
 
  • #55
FactChecker said:
It doesn't matter if there is an atom there or not or if atoms are moving. It doesn't matter how the center of mass is defined. However it is defined, if that location exists with infinite precision in the time-space coordinate system, then the experiment has a result that has infinite precision. The probability of the infinite precision result is zero. We can never measure or record it with infinite precision, but that is a different issue.

The task, I believe, is to choose a random number. Your answer is to throw a dart at a board and take the "x-coordinate". The issue is whether choosing a random number includes identifying it.

Since you can't actually identify the one you have chosen, does that count as choosing one?

In the finite case it wouldn't work. If you have to draw the first round of a tennis tournament, you have to actually produce the players' names. You can't have match 1 between two unknown players who have been chosen by a process that didn't actually identify them!

It would be different if you identified it by some property, like the solution to a transcendental equation. That still identifies the number uniquely.

But, to say that your random number is "the x-coordinate of that dart (whatever it is)" doesn't feel like you've actually chosen a specific number.
 
  • #56
PeroK said:
The task, I believe, is to choose a random number. Your answer is to throw a dart at a board and take the "x-coordinate". The issue is whether choosing a random number includes identifying it.

Since you can't actually identify the one you have chosen, does that count as choosing one?
The question is whether there are always physical-world constraints on the accuracy of experimental results. I believe that there are examples where the only limitation is on our ability to observe and record the result with infinite accuracy, not on the result itself. I think that the dart-throw is a physical experiment where the result had a zero prior probability. The dart landed. The x-coordinate position exists (unless time-space coordinates are quantized). The physical result occurred. The issue of whether I was able to observe it and record it with infinite precision is a separate issue.
 
  • #57
FactChecker said:
The question is whether there are always physical-world constraints on the accuracy of experimental results. I believe that there are examples where the only limitation is on our ability to observe and record the result with infinite accuracy, not on the result itself. I think that the dart-throw is a physical experiment where the result had a zero prior probability. The dart landed. The x-coordinate position exists (unless time-space coordinates are quantized). The physical result occurred. The issue of whether I was able to observe it and record it with infinite precision is a separate issue.

I've had an idea that I'll post on another thread.
 
  • #58
FactChecker said:
I agree. My point is that without information about the physical process, the true fact is that our knowledge has not improved and the probabilities are unchanged. For all we know, the method used to select the random number may itself have been randomly selected out of a multitude of methods. I believe it is legitimate to talk about a continuous CDF with zero probability at every single point, even for a physical, non-theoretical case. Whether that is physically possible seems to be a "religious" issue.

I like this, though I trust you mean PDF, not CDF. (If you actually mean CDF... well then we have a big problem as it seems to mean your PDF doesn't integrate to one and so on... if you are mapping all probability to ##\infty## the random variable would seem to be pathologically defective).

To a large degree, a lot of this discussion reminds me of when I tell someone the convention in mathematics that

##1.99999999999... = 2##

It is a convention and if someone says they don't accept it, well fine, but this creates all kinds of contradictions with basic rules of arithmetic and requires a lot of special, new inventions. You point that out to them and after a bit of thought they either accept the convention, or they respond by saying that ##1.99999999999...## doesn't exist. But existence has nothing to do with it. And even if people don't like the beauty of math, I think people get that using things like infinite series and continuity can be immensely useful.
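For anyone who wants to see the infinite-series reading of ##1.999... = 2## in numbers, here is a small sketch using exact rational arithmetic. It assumes the usual reading of the decimal as the limit of its truncations ##1.9, 1.99, 1.999, ...##; the helper names are invented for the example. The gap between 2 and the truncation with k nines is exactly ##10^{-k}##, which can be made smaller than any positive number.

```python
from fractions import Fraction


def truncated_value(k: int) -> Fraction:
    """Exact value of 1.99...9 with k nines after the decimal point."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1 + sum(Fraction(9, 10**i) for i in range(1, k + 1))


def gap_to_two(k: int) -> Fraction:
    """Exact difference 2 - 1.99...9 (k nines)."""
    return 2 - truncated_value(k)


if __name__ == "__main__":
    for k in (1, 2, 5, 20):
        print(k, gap_to_two(k))
```

No finite truncation equals 2, but the only number the truncations can converge to is 2, which is what the notation is taken to mean.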
 
  • #59
FactChecker said:
Suppose I throw a dart at an (X,Y) grid [0,1]x[0,1], and say that the result is the exact x-coordinate of the dart's center of mass. Then I have done an experiment in the physical world where the result has an infinite accuracy.

We can suppose such a thing can happen, but if we suppose that you (or Nature) can pick an exact mathematical point from a continuous distribution then we have made an assumption about physics.
In the physical world, we will not be able to determine the exact result with infinite accuracy, but nevertheless, it has happened. That exact result had zero probability. (Here I am assuming that there is no quantization in nature of the space-time coordinate system.) The fact that we can not determine and record the result with infinite accuracy is not relevant.

I agree that the following physical situations are different:

1) Nature cannot select an exact result from a continuous probability distribution.

2) Nature can select an exact result from a continuous probability distribution, but we cannot measure what nature has done exactly.

So the fact we cannot measure an exact result from an experiment doesn't tell us whether 1) or 2) is the case.

My point about the mathematical theory of probability is that it does not assert we can do such an experiment with a dart. - i.e. it does not assert that 2) is the case.
 
  • #60
jbriggs444 said:
Please share this working through.

Suppose that we select a, b and c from the uniform distribution over [-1,+1]. We get a non-zero probability of the result having a discriminant ##b^2-4ac## greater than zero and a non-zero probability of the discriminant having a value less than zero.

Now suppose instead that we select A, B and C from a uniform distribution over [-n,+n]. This is equivalent to selecting a, b and c as above from a uniform distribution over [-1,+1] and setting A=na, B=nb, C=nc. The resulting discriminant is then given by ##(nb)^2-4(na)(nc) = n^2(b^2-4ac)##. The distribution of signs of the discriminant is independent of n.

The limiting probability of a complex root is not zero.

First, carefully look at the original problem. Or as I noted in a follow-up post:

StoneTemplePython said:
For avoidance of doubt, there is no ##a## term in ##x^2 + 2bx + c = 0## (or put differently, ##a## is fixed at one).

Now draw a graph with a square from ##[-n, n]## on the X and Y axes. Area of the square is ##2n \cdot 2n = 4n^2##. Now draw the parabola associated with ##b^2 - c \lt 0##. Color blue: all the area 'inside' the parabola (and bounded above by the square's top edge). For ##n \geq 1## the blue area should be ##\frac{4}{3}n^{\frac{3}{2}}##. What is the ratio of blue area to total square? ##\frac{ \frac{4}{3}n^{\frac{3}{2}}}{4n^2} = \frac{1}{3\sqrt n}##, or as I'd call it ##O\big(\frac{1}{\sqrt n}\big)##.
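The area argument can be checked numerically. The sketch below is not from the book; it just compares the area ratio from the picture with a Monte Carlo estimate, taking ##b## and ##c## independent and uniform on ##[-n, n]## and assuming ##n \geq 1## so that the parabola ##c = b^2## exits through the top edge of the square. Function names are illustrative.

```python
import random


def exact_complex_root_probability(n: float) -> float:
    """P(b^2 < c) for b, c independent and uniform on [-n, n], valid for n >= 1."""
    if n < 1:
        raise ValueError("the area formula used here assumes n >= 1")
    blue_area = (4.0 / 3.0) * n ** 1.5   # area between the parabola c = b^2 and c = n
    square_area = 4.0 * n * n            # area of [-n, n] x [-n, n]
    return blue_area / square_area       # simplifies to 1 / (3 * sqrt(n))


def estimated_complex_root_probability(n: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        b = rng.uniform(-n, n)
        c = rng.uniform(-n, n)
        if b * b - c < 0:   # negative discriminant: x^2 + 2bx + c = 0 has complex roots
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 9, 100, 10_000):
        print(n, exact_complex_root_probability(n),
              estimated_complex_root_probability(n, 200_000))
```

For ##n = 9## the exact ratio is ##1/9##, and the value keeps shrinking like ##1/\sqrt n## as the square grows, so the probability of real roots tends to one.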
 
  • #61
StoneTemplePython said:
I trust you mean PDF, not CDF.
Good point. Thanks. I worded my statement badly. I meant that the CDF is continuous, implying that the probability of any single exact resulting value is zero. -- Not that the cumulative probability is zero.
 
  • #62
Stephen Tashi said:
We can suppose such a thing can happen, but if we suppose that you (or Nature) can pick an exact mathematical point from a continuous distribution then we have made an assumption about physics.
Good point. In fact, I may have seen somewhere that in quantum theory time is in fact quantized, so location on an X-axis may also be quantized. I don't know enough to comment more than that. Even if that is true, I think that I would accept the approximation of the discrete physics with a continuous model for the purpose of ignoring any quantization of time-space.
I agree that the following physical situations are different:

1) Nature cannot select an exact result from a continuous probability distribution.

2) Nature can select an exact result from a continuous probability distribution, but we cannot measure what nature has done exactly.

So the fact we cannot measure an exact result from an experiment doesn't tell us whether 1) or 2) is the case.

My point about the mathematical theory of probability is that it does not assert we can do such an experiment with a dart. - i.e. it does not assert that 2) is the case.
I have to agree. At the finest level of detail, we may never know the answer. I will have to resign myself to the realization that, at the quantum level, the continuous CDF may not be possible. It may be an approximation.
 
  • #63
Consider a block of wood whose linear density you know, say 100 g per cm. You acquire mass by spanning a distance. As that distance gets smaller, so does the mass acquired; a span the thickness of a thin sheet of paper would contain very little mass. In the limit as the span approaches zero you would of course have zero mass. The same effect is seen in spectrum analysis: as the bandwidth gets narrower, the energy measured gets less, and if you had a bandwidth of zero (a single frequency) you would have zero energy.
 
  • #64
Calling it zero probability is linguistically misleading -- only the impossible has zero probability -- calling something possible is the same as saying it has more than zero probability. The probability that 1 = 2 is 0. The probability that a number to be chosen from all real numbers will be 1.2 is > 0.
 
