B Continuous random variable: Zero probability

AI Thread Summary
In discussions about continuous random variables, the probability of any specific value is zero, despite the probability density function (PDF) potentially being non-zero at that point. This leads to the understanding that while the area under the curve at a single point is zero, the cumulative distribution function (CDF) can still yield non-zero probabilities over intervals. The confusion arises from interpreting the PDF and CDF, as the PDF does not represent probability directly but rather the slope of the CDF. The mathematical framework of probability does not assert that events with zero probability cannot occur; it merely assigns probabilities based on a measure space. Thus, the concept of "zero almost surely" allows for the possibility of events with zero probability happening in practical applications.
Biker
I just have a couple of questions about how it can be zero probability.

Suppose you have a continuous cumulative distribution function whose derivative is nonzero at every point. This means that every point has a different value from the others, which means that every point contributes to the probability.
Now I know you can't assign a finite value, because the sum would go to infinity, and you can't assign zero, because that would certainly mean that the derivative is zero.
However, they use: zero almost surely...
Which means that an event can happen even if it has zero probability, which is fine, but why not say that it is an infinitesimal (a hyperreal, is that possible?) and keep zero for events that are impossible?

Is it just zero to keep it in the real numbers or is it exactly zero?

Of course the area under the curve of a probability density function at a single point is zero, but that doesn't mean that it has probability zero
 
The probability of a continuous distribution having an exact pre-specified value (like 2.100000000...) is zero. The only thing with a positive probability is a range (e.g. from 2.0 to 2.02) or a large set of values (e.g. irrational numbers). Of course, any particular result DOES have an exact value but that exact value had 0 probability and will not happen again.
 
FactChecker said:
The probability of a continuous distribution having an exact pre-specified value (like 2.100000000...) is zero. The only thing with a positive probability is a range (e.g. from 2.0 to 2.02) or a large set of values (e.g. irrational numbers). Of course, any particular result DOES have an exact value but that exact value had 0 probability and will not happen again.
Sorry I didn't understand the last part.

If the probability at every exact value is zero and the cumulative probability is continuous on R, then how can any point differ from another?
How can I interpret the curve?
 
Biker said:
then how can any point differ from another?
They don't. Every point has the same probability.
 
As I understand the concern, it is that the probability of obtaining any specific result value is zero. So one would expect the value of the cumulative probability immediately before that value to be the same as the value of the cumulative probability immediately after that value. And so on -- so that one would expect the CDF to necessarily be a constant function.

One difficulty with that reasoning is that there is no such thing as a point either immediately before or immediately after another on the real line. For every pair of distinct points, there is a non-zero interval between them. The probability density function integrated over a non-zero interval can have a strictly positive result -- the probability of obtaining a result in that interval can be non-zero.

Another difficulty is with the notion of transitivity being applied over an uncountable set. Ordinary mathematical induction cannot extend it that far.
 
Biker said:
Sorry I didn't understand the last part.

If the probability at every exact value is zero and the cumulative probability is continuous on R, then how can any point differ from another?
How can I interpret the curve?
A continuous PDF does not directly give a "probability" for a single value. You are trying to use the PDF in a way that it can not be used. For one thing, the PDF at a point can be much greater than 1, so clearly it is not a probability. The probabilities are only defined as the integral of the PDF over a measurable set of values. You can get the probability as the limit of integrals that narrow down to a single point, but that is not the same as the value of the PDF at that point. No matter how large the PDF is at a point, when the integral narrows down to that point the integral goes to 0.
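A small numerical sketch of this point. The distribution is an assumption chosen purely for illustration (a normal distribution with mean 0 and standard deviation 0.01; nothing in the thread fixes a particular one). The density at the peak is far above 1, yet the probability of an interval around the peak shrinks toward 0 as the interval narrows.

```python
import math

SIGMA = 0.01  # illustrative choice, not from the thread

def normal_pdf(x, mu=0.0, sigma=SIGMA):
    """Density of N(mu, sigma^2) at x. A density, not a probability."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=SIGMA):
    """CDF of N(mu, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_probability(center, half_width):
    """P(center - half_width <= X <= center + half_width)."""
    return normal_cdf(center + half_width) - normal_cdf(center - half_width)

peak_density = normal_pdf(0.0)  # roughly 39.9, so clearly not a probability

# Probabilities of intervals with half-widths 1e-1, 1e-2, ..., 1e-7 around the peak.
shrinking = [interval_probability(0.0, 10.0 ** -k) for k in range(1, 8)]

print(f"pdf at the peak: {peak_density:.2f}")
for k, p in zip(range(1, 8), shrinking):
    print(f"half-width 1e-{k}: probability {p:.3e}")
```

For small half-widths the probability is approximately the density times the interval width, which is why it vanishes no matter how tall the density is.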
 
Biker said:
Of course the area under the curve of a probability density function in a single point is zero that doesn't mean that it has probability zero
Yes it does. Suppose I tell you that I did an experiment with a continuous PDF and got EXACTLY X=2.11111111111111... How likely would you say that was? An infinite number of '1's? I would say that the likelihood was 0. Of course, the result of the experiment was some EXACT number, so things do happen all the time where the pre-experiment likelihood was 0.
 
jbriggs444 said:
As I understand the concern, it is that the probability of obtaining any specific result value is zero. So one would expect the value of the cumulative probability immediately before that value to be the same as the value of the cumulative probability immediately after that value. And so on -- so that one would expect the CDF to necessarily be a constant function.

One difficulty with that reasoning is that there is no such thing as a point either immediately before or immediately after another on the real line. For every pair of distinct points, there is a non-zero interval between them. The probability density function integrated over a non-zero interval can have a strictly positive result -- the probability of obtaining a result in that interval can be non-zero.

Another difficulty is with the notion of transitivity being applied over an uncountable set. Ordinary mathematical induction cannot extend it that far.

That is exactly what I meant. If you choose a particular value then every point after it should be the same. What led me to this was that I was trying to make a CDF. You take a bunch of data and then approximate it, but approximating with a continuous function strictly says (if the derivative is nonzero) that every point has a different value, which produced this contradiction. How can I correctly interpret this?

It is similar to the problem of a line which is made out of zero-width points

Mfb, Could you please elaborate?

And thank you factchecker
 
You need to be careful here. I think you are implying that a zero probability of X=x0 means that the PDF is 0 at x0. That is not true. The PDF is NOT a probability. The PDF is the slope of the CDF. If the PDF, f(x) at the point x0 is zero, then the slope of the CDF at that point is zero. For a continuous random variable, the probability of any exact single value is zero no matter what the value of the PDF is.
 
  • #10
FactChecker said:
You need to be careful here. I think you are implying that a zero probability of X=x0 means that the PDF is 0 at x0. That is not true. The PDF is NOT a probability. The PDF is the slope of the CDF. If the PDF, f(x) at the point x0 is zero, then the slope of the CDF at that point is zero. For a continuous random variable, the probability of any exact single value is zero no matter what the value of the PDF is.
I am not. I know what each one represents.
I am talking about the cumulative distribution as if it were a sum of points, because it is continuous everywhere. If you somehow say that the exact probability is zero, then it follows that the slope of the CDF must be equal to zero. But no, any interval has some probability. So the probability must be neither zero nor finite. Jbriggs explained what I meant above.
It is a matter of how I can interpret a continuous CDF while the probability of an exact value is zero.
The same thing can be applied to a line: say you take out a point, does that make a difference to the length?
The whole problem is about how a sum of zeros can result in a finite number.

And I want to know how you can interpret a continuous CDF.

PS I know I can't talk about points in a continuous distribution, but it is just a contradiction that you somehow have a continuous CDF but zero probability for each value
 
  • #11
Biker said:
I am not. I know what each one represents.
I am talking about the cumulative distribution as if it were a sum of points, because it is continuous everywhere. If you somehow say that the exact probability is zero, then it follows that the slope of the CDF must be equal to zero.
Not true. The slope does not have to be 0. That is all I can say.
But no, any interval has some probability. So the probability must be neither zero nor finite. Jbriggs explained what I meant above.
It is a matter of how I can interpret a continuous CDF while the probability of an exact value is zero.
The same thing can be applied to a line: say you take out a point, does that make a difference to the length?
The whole problem is about how a sum of zeros can result in a finite number.

And I want to know how you can interpret a continuous CDF.
The interpretation of a continuous CDF is P( X∈[x1, x2] ) = CDF(x2) - CDF(x1) for x1 ≤ x2. So P( X=x1) = P( X∈[x1, x1] ) = CDF(x1) - CDF(x1) = 0, for any continuous CDF with any slope.
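A minimal sketch of that formula, assuming an exponential distribution with rate 1 as the example (an illustrative choice; the helper names are hypothetical). The CDF has a nonzero slope at `x0`, the degenerate interval `[x0, x0]` still has probability exactly 0, and the probability per unit length over `[x0, x0 + h]` approaches the slope as `h` shrinks.

```python
import math

def exp_cdf(x, rate=1.0):
    """CDF of an exponential distribution: P(X <= x)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def prob_between(x1, x2, rate=1.0):
    """P(x1 <= X <= x2) = CDF(x2) - CDF(x1), for x1 <= x2."""
    if x1 > x2:
        raise ValueError("need x1 <= x2")
    return exp_cdf(x2, rate) - exp_cdf(x1, rate)

x0 = 1.0
point_prob = prob_between(x0, x0)   # degenerate interval [x0, x0]
slope_at_x0 = math.exp(-x0)         # derivative of the CDF at x0 (the PDF), nonzero

# Probability per unit length over [x0, x0 + h] for shrinking h.
ratios = [prob_between(x0, x0 + h) / h for h in (1e-1, 1e-2, 1e-3, 1e-4)]

print("P(X = x0)      :", point_prob)
print("CDF slope at x0:", slope_at_x0)
print("P([x0, x0+h])/h:", ratios)
```

The slope is a rate (probability per unit length), so a nonzero slope and a zero point probability do not conflict.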
 
  • #12
Biker said:
I just have a couple of questions about how it can be zero probability.

It should be emphasized that your questions concern the application of probability theory, not the mathematical theory of probability, because you are asking about physical events - whether something actually happens. Mathematical probability theory does not define whether an event that has been assigned a probability will (or can) actually happen. It does not even have an axiom that says it is possible to take random samples from a distribution - in the sense of forcing a random variable to "actually" take on a particular value. The assumption that we can do random sampling is an assumption that is made when applying probability theory to particular problems.

Mathematical probability theory only assumes that there is a "measure space" in which events can be assigned probabilities. It doesn't comment on whether these events actually happen.

For example, if you are measuring people's weights, you may choose to assume that you are sampling from a continuous distribution such as a lognormal distribution. You will only be able to measure a person's weight to a finite precision, so you can't experimentally prove or disprove that people's weights have a continuous distribution. If you assume people's weights have a lognormal distribution and treat your measurement of someone's weight as being exactly the correct weight, then you do have the awkward situation that an event with zero probability has happened. However, that awkwardness can't be resolved by mathematical probability theory - the awkwardness involves the desire of people who apply probability theory to interpret "zero probability" as meaning "can't actually happen".

The mathematical problem of assigning continuous probabilities is similar to the problem of assigning a mass to part of an object that has a continuously varying density. We use a mass density function to describe the varying density in the object, but "at a point" in the object, there is no mass. So we may have a mass density of 120 lbs per cubic inch at a point, but we don't say the point itself has any mass.

People working problems about mass densities don't face the task of taking a sample of an object consisting of a mathematical point and putting it on a slide to examine under a microscope. In contrast, people who apply probability theory often use language that suggests they (or Nature) are accomplishing the feat of having an event with zero probability actually happen. Some people may take that idea seriously and others may regard it as merely a convenient fiction that approximates the way things work. Mathematical probability theory doesn't comment on the issue.
 
  • #13
Biker said:
However, They use: Zero almost surely...
Which means that an event can happen even if it has zero probability, which is fine, but why not say that it is an infinitesimal (a hyperreal, is that possible?) and keep zero for events that are impossible?

Is it just zero to keep it in the real numbers or is it exactly zero?

Of course the area under the curve of a probability density function at a single point is zero, but that doesn't mean that it has probability zero

To answer your question -- modern probability theory comes from Kolmogorov, using measure theory. There are other rigorous formulations -- e.g. Nelson's Radically Elementary Probability Theory which does use nonstandard analysis -- the preface humorously suggests that even high school students can understand the book (it is not an easy read). You can read it here: https://web.math.princeton.edu/~nelson/books/rept.pdf

For the most part you will not see people use infinitesimals in standard probability theory -- except when they 'tap out' and feel like it is needed to convey what they want to say. (In particular I've seen a lot of people start talking about infinitesimal generators of continuous-time Markov chains, but otherwise they won't use infinitesimals.)

Once you get used to the idea that zero probability events can happen (but not zero density ones), I suspect you'll be ok with the terminology. It took me a while.
- - - -
Sometimes coming at this from a different angle is illuminating. Here's a very relevant problem from Fifty Challenging Problems in Probability

>> What is the probability that the quadratic equation: ##x^2 + 2bx + c = 0## has real roots?

note your domain is reals, b and c are independently sampled from ##(-\infty,\infty)##, though it might be helpful to think of them coming from ##(-3, 3)## or ##(-n, n)## and to then consider the limiting behavior.

I used to not like this "zero probability but still possible" stuff -- but eventually I got over it, and this problem helped that process move along.
 
  • #14
One idea, apart from probability, which might clarify things. The unit interval has length 1. Each point on the interval has length 0. Add the points together to get the unit interval, but you cannot add up lengths.
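One way to write down why the lengths do not add up, assuming the standard axioms in which a measure is only required to be additive over countably many disjoint sets:

$$
\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mu(A_i) \quad \text{for pairwise disjoint } A_1, A_2, \dots
$$

The unit interval is an uncountable union of its points, ##[0,1] = \bigcup_{x \in [0,1]} \{x\}##, so this axiom never forces ##1 = \sum 0##. For a countable set such as the rationals in ##[0,1]## the sum does apply, and it gives total length 0.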
 
  • #15
mathman said:
One idea, apart from probability, which might clarify things. The unit interval has length 1. Each point on the interval has length 0. Add the points together to get the unit interval, but you cannot add up lengths.
And then we end up in measure theory...
 
  • #16
The foundation of measure theory and the foundation of probability are quite similar.
 
  • #17
If the probability of something is less than the probability of the Universe existing, that will do for me as a definition of 'Zero'
 
  • #18
StoneTemplePython said:
- - - -
Sometimes coming at this from a different angle is illuminating. Here's a very relevant problem from Fifty Challenging Problems in Probability

>> What is the probability that the quadratic equation: ##x^2 + 2bx + c = 0## has real roots?

note your domain is reals, b and c are independently sampled from ##(-\infty,\infty)##, though it might be helpful to think of them coming from ##(-3, 3)## or ##(-n, n)## and to then consider the limiting behavior.

I used to not like this "zero probability but still possible" stuff -- but eventually I got over it, and this problem helped that process move along.

I guess your argument is that if I am asked to produce a quadratic equation of that form and I go for, say, ##b = 5, c=1##, i.e.:

##x^2 + 10x + 1##

Then, as this quadratic has real roots, something with 0 probability has actually happened?

Not to mention that my picking two integers for the coefficients also had 0 probability!
 
  • #19
The probability is zero because it is meant to be impossible for the event to occur unless you have infinitely many tries of a continuous stochastic process.

A continuous stochastic process has infinitely many values in its state space and you will never actually realize that state with a normal continuous PDF stochastic process [like with a Normal distribution or some other analytic PDF].

However - you can have processes that have many values that can be realized and the way that this is studied is through what is called pure probability.

If you want to understand this more, then you will have to look at graduate statistics which includes measure theory, analysis, and probability related subjects like sigma algebras and stochastic processes.

I'd wait until you get to that if you are in undergraduate, but those subjects will help answer your questions in regard to your original post.
 
  • #20
Throw a dart at a number line to get a number between 0 and 1. The PDF is 1 on [0,1] and 0 otherwise. The CDF is y=x, 0≤x≤1.
You know that some number must occur from your experiment. Suppose the number is 0.1592653589793238462643383279502884197169399375105820974944592307816406286 208998628034825342117067982148086513282306647093844609550582231725359408128481 117450284102701938521105559644622948954930381964428810975665933446128475648233 786783165271201909145648566923460348610454326648213393607260249141273724587006 606315588174881520920962829254091715364367892590360011330530548820466521384146 951941511609433057270365759591953092186117381932611793105118548074462379962749 567351885752724891227938183011949129833673362440656643086021394946395224737190 702179860943702770539217176293176752384674818467669405132000568127145263560827 785771342757789609173637178721468440901224953430146549585371050792279689258923 542019956112129021960864034418159813629774771309960518707211349999998372978049 951059731732816096318595024459455346908302642522308253344685035261931188171010 003137838752886587533208381420617177669147303598253490428755468731159562863882 353787593751957781857780532171226806613001927876611195909216420198938095257201 065485863278865936153381827968230301952035301852968995773622599413891249721775 283479131515574857242454150695950829533116861727855889075098381754637464939319 255060400927701671139009848824012858361603563707660104710181942955596198946767 837449448255379774726847104047534646208046684259069491293313677028989152104752 162056966024058038150193511253382430035587640247496473263914199272604269922796...

I can add another thousand digits, and millions after that.
What were the odds of that number being the result? Some number must occur, but the pre-experiment probability of that exact number occurring was 0. And that exact number will never happen again in a billion billion tries.
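A rough simulation of the dart experiment, under an important caveat: a double-precision float can take only finitely many values, so `random.random()` only approximates a continuous uniform variable. The target value, interval, sample size, and function name below are arbitrary choices for illustration.

```python
import random

def simulate_dart(num_throws, target, low, high, seed=0):
    """Throw `num_throws` uniform darts at [0, 1).

    Returns (exact_hits, interval_fraction): how many throws equal `target`
    exactly, and the fraction of throws that land in [low, high].
    """
    rng = random.Random(seed)
    exact_hits = 0
    in_interval = 0
    for _ in range(num_throws):
        x = rng.random()
        if x == target:
            exact_hits += 1
        if low <= x <= high:
            in_interval += 1
    return exact_hits, in_interval / num_throws

# Pre-specified point 0.3 versus the interval [0.30, 0.32] of width 0.02.
exact_hits, interval_fraction = simulate_dart(100_000, target=0.3, low=0.30, high=0.32)
print("exact hits on 0.3      :", exact_hits)
print("fraction in [0.30,0.32]:", interval_fraction)
```

The interval frequency should sit near its width, 0.02, while exact hits on a pre-specified point are not expected at any feasible sample size.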
 
  • #21
FactChecker said:
I can add another thousand digits, and millions after that.
What were the odds of that number being the result? Some number must occur, but the pre-experiment probability of that exact number occurring was 0. And that exact number will never happen again in a billion billion tries.

In any such experiment, only a finite number of results is possible.
 
  • #22
PeroK said:
In any such experiment, only a finite number of results is possible.
Really? What is that finite number?
EDIT: I am trying to illustrate a mathematical concept with an easily understood physical example. Some abstraction must be applied.
 
  • #23
FactChecker said:
Really? What is that finite number?

It depends on the experiment.

Given any number, you could devise an experiment with more than that number of possible results. But, the number of results of any experiment is finite.
 
  • #24
PeroK said:
I guess your argument is that if I am asked to produce a quadratic equation of that form and I go for, say, ##b = 5, c=1##, i.e.:

##x^2 + 10x + 1##

Then, as this quadratic has real roots, something with 0 probability has actually happened?
I believe that you are misinterpreting the idea being presented. We are asked to pick a, b and c from a uniform distribution over an interval centered on zero. We are asked to assess the a priori probability that the resulting polynomial has real roots. That is to say, we are asked for the probability that the discriminant, ##b^2 - 4ac## will be greater than zero.

In order to avoid concerns with the impossibility of a uniform distribution over the real numbers we are asked to take the limit of this probability as the length of the interval increases without bound.

A key observation is that the computed probability is independent of the size of the interval. So evaluating the limit is trivial. Just evaluate the result for an interval of one's choosing. It is clear that the probability is neither zero nor one. [And, accordingly, has precious little to do with the subject matter of this thread].

Not to mention that my picking two integers for the coefficients also had 0 probability!
It is fairly clear that you did not pick those coefficients at random using a continuous PDF.
 
  • #25
PeroK said:
It depends on the experiment.

Given any number, you could devise an experiment with more than that number of possible results. But, the number of results of any experiment is finite.
Maybe in the physical world. But the mathematical concepts are not limited to that. Some abstract thought must be applied to answer the OP.
 
  • #26
jbriggs444 said:
I believe that you are misinterpreting the idea being presented. We are asked to pick a, b and c from a uniform distribution over an interval centered on zero. We are asked to assess the a priori probability that the resulting polynomial has real roots. That is to say, we are asked for the probability that the discriminant, ##b^2 - 4ac## will be greater than zero.

In order to avoid concerns with the impossibility of a uniform distribution over the real numbers we are asked to take the limit of this probability as the length of the interval increases without bound.

A key observation is that the computed probability is independent of the size of the interval. So evaluating the limit is trivial. Just evaluate the result for an interval of one's choosing. It is clear that the probability is neither zero nor one. [And, accordingly, has precious little to do with the subject matter of this thread].

It is fairly clear that you did not pick those coefficients at random using a continuous PDF.

Exactly. Will post a fuller analysis of the issue.
 
  • #27
FactChecker said:
Maybe in the physical world. But the mathematical concepts are not limited to that. Some abstract thought must be applied to answer the OP.

Yes, but then you cannot use the mathematical result to make a claim about the physical world, unless you can apply that mathematics to the physical world.
 
  • #28
PeroK said:
Yes, but then you cannot use the mathematical result to make a claim about the physical world, unless you can apply that mathematics to the physical world.
The concept should be understood before it can be applied to anything, even as an approximation. The OP did not mention any particular application. The mathematical concept is trivial and the misconception of the OP was basic. Time to move on.
 
  • #29
StoneTemplePython said:
Sometimes coming at this from a different angle is illuminating. Here's a very relevant problem from Fifty Challenging Problems in Probability

>> What is the probability that the quadratic equation: ##x^2 + 2bx + c = 0## has real roots?

note your domain is reals, b and c are independently sampled from ##(-\infty,\infty)##, though it might be helpful to think of them coming from ##(-3, 3)## or ##(-n, n)## and to then consider the limiting behavior.

I used to not like this "zero probability but still possible" stuff -- but eventually I got over it, and this problem helped that process move along.

Here is a better and fuller analysis of the problem with this statement:

1) How can you generate a quadratic equation at random?

You can have a finite number of discrete options for each coefficient and a uniform distribution.

You can have a finite interval of options and a uniform (density) distribution on that interval.

But:

You cannot have an infinite number of discrete options and a uniform distribution.

You cannot have an infinite interval and a uniform density.

2) Given any interval ##[-n, n]## and a uniform density distribution, you can calculate the probability of the quadratic equation having real roots.

3) The limit as ##n \rightarrow \infty## equals 0.

But, there is no probability distribution represented by this limit. In other words, the limit of a sequence of pdf's is not necessarily a pdf.

Therefore, there is no mathematical or physical process by which you can choose any real coefficients at random, uniformly distributed.

In order to allow any real coefficients to be chosen, you must abandon the uniform distribution and then the probability of the quadratic having real roots depends on the distribution.
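The claim that a limit of pdfs need not be a pdf can be checked with a few lines. This is only a sketch with hypothetical helper names: each uniform density on ##[-n, n]## has total mass 1, but its height ##1/(2n)## shrinks to 0 at every fixed point, so the pointwise limit is the zero function, which has total mass 0.

```python
def uniform_pdf(x, n):
    """Density of the uniform distribution on [-n, n]."""
    return 1.0 / (2.0 * n) if -n <= x <= n else 0.0

def total_mass(n):
    """Integral of the uniform density over its support: width times height."""
    return (2.0 * n) * uniform_pdf(0.0, n)

ns = [1, 10, 100, 1_000, 1_000_000]
densities_at_zero = [uniform_pdf(0.0, n) for n in ns]  # shrinks like 1/(2n)
masses = [total_mass(n) for n in ns]                   # stays at 1 for every n

for n, d, m in zip(ns, densities_at_zero, masses):
    print(f"n={n:>9}: pdf(0)={d:.3e}, total mass={m}")
```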
 
  • #30
PeroK said:
I guess your argument is that if I am asked to produce a quadratic equation of that form and I go for, say, ##b = 5, c=1##, i.e.:

##x^2 + 10x + 1##

Then, as this quadratic has real roots, something with 0 probability has actually happened?

Not to mention that my picking two integers for the coefficients also had 0 probability!

I was actually suggesting the opposite of this. If you work through the math, as n grows large, you actually get real roots with probability one (i.e. the probability of complex root is zero). Yet we all know that complex roots are possible over real quadratic equations -- it is where quite a few of us first ran into complex numbers, I think.

- - - -
For avoidance of doubt, there is no ##a## term in ##x^2 + 2bx + c = 0## (or put differently, ##a## is fixed at one).
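A quick check of the claim, assuming ##b## and ##c## are independent and uniform on ##(-n, n)##. The roots are real exactly when ##b^2 \ge c##, and for ##n \ge 1## integrating over the square gives the closed form ##1 - 1/(3\sqrt{n})##. The function names, trial count, and seed below are illustrative choices, and the Monte Carlo estimate is only approximate.

```python
import math
import random

def prob_real_roots_exact(n):
    """P(b^2 >= c) for b, c independent and uniform on (-n, n); valid for n >= 1."""
    if n < 1:
        raise ValueError("this closed form assumes n >= 1")
    return 1.0 - 1.0 / (3.0 * math.sqrt(n))

def prob_real_roots_mc(n, trials=200_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    real = 0
    for _ in range(trials):
        b = rng.uniform(-n, n)
        c = rng.uniform(-n, n)
        # x^2 + 2bx + c has discriminant 4(b^2 - c): real roots iff b^2 >= c.
        if b * b >= c:
            real += 1
    return real / trials

for n in (1, 3, 100):
    print(n, prob_real_roots_exact(n), prob_real_roots_mc(n))
```

The probability of real roots climbs toward 1 as `n` grows, yet for every finite `n` complex roots still occur with positive probability.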
 
  • #31
StoneTemplePython said:
I was actually suggesting the opposite of this. If you work through the math, as n grows large, you actually get real roots with probability one (i.e. the probability of complex root is zero). Yet we all know that complex roots are possible over real quadratic equations -- it is where quite a few of us first ran into complex numbers, I think.

- - - -
For avoidance of doubt, there is no ##a## term in ##x^2 + 2bx + c = 0## (or put differently, ##a## is fixed at one).

Yes, you're right, it's the probability of having complex roots that tends to ##0##. But, the point remains, the limit where this probability is ##0## is not represented by any pdf for the coefficients. Your limiting pdf is identically zero.

Nothing with probability 0 ever happens, by definition.
 
  • #32
PeroK said:
Yes, you're right, it's the probability of having complex roots that tends to ##0##. But, the point remains, the limit where this probability is ##0## is not represented by any pdf for the coefficients. Your limiting pdf is identically zero.

(This may have been mentioned elsewhere in the thread) The distribution works just fine as an improper prior and could be useful as the starting point in a bayesian inference problem. Yes -- without any (satisfactory) likelihood function applied to it, we can't get a satisfactory normalizing constant over the full infinite interval.

If you think of the limiting process, it gives you some interesting asymptotic information: it tells us that as n grows larger and larger, the probability of a complex root gets vanishingly close to zero. (You can always look at the rate of this convergence for a simple problem like this for more info.)
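For reference, the rate can be worked out directly, assuming ##b## and ##c## are independent and uniform on ##(-n, n)## with ##n \ge 1##. The roots are complex exactly when ##c > b^2##, which is the region above the parabola inside the square of area ##4n^2##:

$$
P(\text{complex roots}) = \frac{1}{4n^2}\int_{-\sqrt{n}}^{\sqrt{n}} \left(n - b^2\right)\,db = \frac{1}{4n^2}\cdot\frac{4}{3}\,n^{3/2} = \frac{1}{3\sqrt{n}}
$$

So the probability of complex roots is ##1/3## at ##n = 1## and decays like ##n^{-1/2}##, reaching 0 only in the limit.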

Puzzles can sometimes help people build insight -- if they don't help, then no need to use them.

PeroK said:
Nothing with probability 0 ever happens, by definition.

The whole point of this thread is the exact opposite of this statement. There are well understood and accepted definitions in probability theory that do not agree with this statement. When people come up with their own private definitions this is not helpful. Perhaps you mean that nothing with density 0 ever happens?
 
  • #33
StoneTemplePython said:
(This may have been mentioned elsewhere in the thread) The distribution works just fine as an improper prior and could be useful as the starting point in a bayesian inference problem. Yes -- without any (satisfactory) likelihood function applied to it, we can't get a satisfactory normalizing constant over the full infinite interval.

If you think of the limiting process, it gives you some interesting asymptotic information: it tells us that as n grows larger and larger, the probability of a complex root gets vanishingly close to zero. (You can always look at the rate of this convergence for a simple problem like this for more info.)

Puzzles can sometimes help people build insight -- if they don't help, then no need to use them.
The whole point of this thread is the exact opposite of this statement. There are well understood and accepted definitions in probability theory that do not agree with this statement. When people come up with their own private definitions this is not helpful. Perhaps you mean that nothing with density 0 ever happens?

An event with a probability of 0 cannot happen, by definition. Something in mathematics doesn't "happen". If I say, for example:

Let ##f(x) = \sin(x)##, then nothing has happened. It doesn't mean that we actually have an infinite line anywhere with an infinite sine function.

But, also, your logic of assuming that a property shared by the elements of a sequence must be present in the limit is false. And this false logic leads to paradoxes like the one you have quoted.

Each pdf that represents a uniform distribution on ##[-n, n]## is indeed a pdf. But, the limit of this sequence of pdf's is not itself a pdf: it is the zero function. That's where your paradox about quadratics comes from and why some quadratics have real roots and some have complex roots. It's not because things of zero probability actually happen.
 
  • #34
PeroK said:
Nothing with probability 0 ever happens, by definition.

No definition of probability in terms of measure theory deals with whether events happen. If there is a definition saying an event with probability zero never happens, it isn't a definition from mathematical probability theory.
 
  • #35
Stephen Tashi said:
No definition of probability in terms of measure theory deals with whether events happen. If there is a definition saying an event with probability zero never happens, it isn't a definition from mathematical probability theory.

How would you make something happen that has 0 probability?
 
  • #36
Stephen Tashi said:
No definition of probability in terms of measure theory deals with whether events happen. If there is a definition saying an event with probability zero never happens, it isn't a definition from mathematical probability theory.

And, do you agree that:

If you choose two real numbers, ##b, c##, at random from a uniform distribution on ##\mathbb{R}##, then the probability that the quadratic equation ##x^2 + 2bx + c = 0## has complex roots is 0?
 
  • #37
PeroK said:
How would you make something happen that has 0 probability?

How to do that is irrelevant to the content of mathematical probability theory. That's my point.

Mathematical probability simply doesn't deal with the happening or non-happening of events. It merely says that a collection of events can be assigned a number called a "probability". The question of whether an event happens given that has been assigned a probability (be that probability zero, or one, or some other number - be it assigned by a probability density or by some other form of "measure") is a question of physics or of whatever science governs the particular problem being studied. The happening or non-happening of events involves the interpretation of how probability serves as a model for some physical phenomena.
 
  • #38
PeroK said:
And, do you agree that:

If you choose two real numbers, ##b, c##, at random from a uniform distribution on ##\mathbb{R}##,

What specific function would be a "uniform distribution on ##\mathbb{R}##"?
 
  • #39
Stephen Tashi said:
What specific function would be a "uniform distribution on ##\mathbb{R}##"?

The one implied by this post, which started the whole debate:
StoneTemplePython said:
Sometimes coming at this from a different angle is illuminating. Here's a very relevant problem from Fifty Challenging Problems in Probability

>> What is the probability that the quadratic equation: ##x^2 + 2bx + c = 0## has real roots?

note your domain is reals, b and c are independently sampled from ##(-\infty,\infty)##, though it might be helpful to think of them coming from ##(-3, 3)## or ##(-n, n)## and to then consider the limiting behavior.

I used to not like this "zero probability but still possible" stuff -- but eventually I got over it, and this problem helped that process move along.
 
  • #40
PeroK said:
The one implied by this post, which started the whole debate:

I don't see any specific function mentioned in the passage you quoted.

The passage you quoted attempts to define a uniform distribution on ##\mathbb{R}## as a limit of a sequence of uniform distributions on ##[-n,n]## as ##n \rightarrow \infty##. However, a function that is the limit of a sequence of distribution functions is not necessarily a distribution function itself.
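A small numerical sketch of this point, assuming the usual closed-form cdf of a uniform distribution on ##[-n, n]## (the function name `uniform_cdf` is mine, purely for illustration):

```python
def uniform_cdf(x: float, n: float) -> float:
    """CDF of the uniform distribution on [-n, n], evaluated at x.

    Illustrative helper; the name is hypothetical and not from the thread.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if x <= -n:
        return 0.0
    if x >= n:
        return 1.0
    return (x + n) / (2.0 * n)


if __name__ == "__main__":
    # For any fixed x, the cdf values drift towards 1/2 as n grows.
    for n in (10, 1_000, 100_000, 10_000_000):
        print(n, uniform_cdf(-3.0, n), uniform_cdf(3.0, n))
```

Each `uniform_cdf(x, n)` with fixed n is a genuine cdf, but for every fixed x the values approach 1/2 as n grows. The constant function 1/2 does not tend to 0 at ##-\infty## or to 1 at ##+\infty##, so the pointwise limit is not a distribution function.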
 
  • #41
Stephen Tashi said:
I don't see any specific function mentioned in the passage you quoted.

The passage you quoted attempts to define a uniform distribution on ##\mathbb{R}## as a limit of a sequence of uniform distributions on ##[-n,n]## as ##n \rightarrow \infty##. However, a function that is the limit of a sequence of distribution functions is not necessarily a distribution function itself.

That doesn't answer the question about whether it's possible for a quadratic equation to have complex roots, even if the probability that it has complex roots is 0!

I'm certainly willing to retract my statement about "something with probability 0 cannot happen, by definition", as meaningless because it mixes a strictly mathematical concept (probability) with a physical concept (something happening).

I would replace it by:

In any application of probability theory, we should not apply the theory in such a way that events that are physically possible are mapped to sets with zero probability (measure).
 
  • #42
PeroK said:
In any application of probability theory, we should not apply the theory in such a way that events that are physically possible are mapped to sets with zero probability (measure).

That's uncontroversial in many applications. However, physics uses continuous distributions, and the probability of Nature realizing a specific value from a continuous distribution like a Gaussian is zero. So, are physically possible events mapped to sets with probability zero? Or are continuous distributions an incorrect model for what is physically possible? (I don't know.)

Probability theory doesn't even have any axiom that says we can take samples of a random variable. This thread discusses the distinction between being able to "realize" a set as the value of a random sample versus being able to assign a set a probability: https://www.physicsforums.com/threads/relative-frequency-and-nonmeasurable-sets.909705/
 
  • #43
PeroK said:
In any application of probability theory, we should not apply the theory in such a way that events that are physically possible are mapped to sets with zero probability (measure).

Among other things, your suggestion would have unfortunate geometric implications. Consider the case of generating two 2-d vectors with each of the (4 total) Cartesian coordinates generated independently, uniformly at random over ##[-1,1]##. Can these two vectors be linearly dependent? Yes, of course. What is the probability of that? Zero. And how much area / 'volume' is enclosed by two linearly dependent vectors? Zero -- i.e. determinant = 0.

This experiment and its results generalize to a (finite) n-dimensional vector space with ##\leq n## vectors in it. (Of course if you had n + 1 vectors in an n dimensional space... the results would be different, with probability one. It is also worth noting that the probability of generating the zero vector during one of these experiments is zero, even though it is technically possible.)
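As a rough illustration of the 2-d case, here is a sketch only: `det2` and `smallest_abs_det` are names I made up, and Python's pseudo-random floats merely stand in for real numbers drawn uniformly from ##[-1,1]##.

```python
import random


def det2(u, v):
    """Determinant of the 2x2 matrix whose columns are u and v."""
    return u[0] * v[1] - u[1] * v[0]


def smallest_abs_det(trials=100_000, seed=0):
    """Sample pairs of vectors from [-1, 1]^2 and return the smallest |det| seen.

    Hypothetical helper for illustration; floats only approximate the
    continuous uniform distribution discussed in the thread.
    """
    rng = random.Random(seed)
    smallest = float("inf")
    for _ in range(trials):
        u = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        v = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        smallest = min(smallest, abs(det2(u, v)))
    return smallest


if __name__ == "__main__":
    # A linearly dependent pair: the second vector is twice the first.
    print(det2((1.0, 2.0), (2.0, 4.0)))
    # Random pairs: the smallest |det| observed over many trials.
    print(smallest_abs_det())
```

The dependent pair gives a determinant of exactly zero, while randomly drawn pairs typically give small but nonzero determinants, in line with the dependent configurations forming a set of measure zero.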

There is considerably more thought behind these things -- defective random variables certainly come to mind -- but that's probably about as much as I have to say on it.

If you don't like the problem from Mosteller (or to be technically accurate -- my perhaps overly quick re-statement of it), it's a minor point. Trying to re-define the meaning of zero probability on the other hand is quite an undertaking. You may like the nonstandard view from Nelson.
 
  • #44
StoneTemplePython said:
To answer your question -- modern probability theory comes from Kolmogorov, using measure theory. There are other rigorous formulations -- e.g. Nelson's Radically Elementary Probability Theory which does use nonstandard analysis -- the preface humorously suggests that even high school students can understand the book (it is not an easy read). You can read it here: https://web.math.princeton.edu/~nelson/books/rept.pdf

For the most part you will not see people use infinitesimals in standard probability theory -- except when they 'tap out' and feel like it is needed to convey what they want to say. (In particular I've seen a lot of people start talking about infinitesimal generators of continuous Markov chains, but otherwise they won't use infinitesimals.)

Once you get used to the idea that zero probability events can happen (but not zero density ones), I suspect you'll be ok with the terminology. It took me a while.
- - - -
Sometimes coming at this from a different angle is illuminating. Here's a very relevant problem from Fifty Challenging Problems in Probability

>> What is the probability that the quadratic equation: ##x^2 + 2bx + c = 0## has real roots?

note your domain is reals, b and c are independently sampled from ##(-\infty,\infty)##, though it might be helpful to think of them coming from ##(-3, 3)## or ##(-n, n)## and to then consider the limiting behavior.

I used to not like this "zero probability but still possible" stuff -- but eventually I got over it, and this problem helped that process move along.

Discriminant (divided by 4) ##= b^2 - c##.
Discriminant ##\geq 0## is equivalent to real roots.
Case 1: ##c < 0## -> real roots.
Case 2: Choose b and c uniformly from some large positive interval ##(0, N)##. Prob(discr. ##< 0##) ~ ##1/\sqrt{N}##. Limiting case: prob. both roots real = 1.
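Here is a hedged check of the limiting argument for the version of the problem where b and c are uniform on ##(-n, n)##. The closed form in the code is my own calculation from the area between the parabola ##c = b^2## and the top edge of the square, not something stated in the quoted excerpt, and the function names are illustrative.

```python
import math
import random


def prob_complex_roots(n: float) -> float:
    """Exact P(b^2 < c) for b, c independent and uniform on (-n, n).

    Derived (as an assumption of this sketch) from the area above the
    parabola c = b^2 inside the square of side 2n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if n >= 1:
        # Area (4/3) * n**1.5 divided by the square's area 4 * n**2.
        return 1.0 / (3.0 * math.sqrt(n))
    # For n < 1 the parabola leaves the square through the sides instead.
    return 0.5 - n / 6.0


def estimate_complex_roots(n: float, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        b = rng.uniform(-n, n)
        c = rng.uniform(-n, n)
        if b * b < c:  # negative discriminant: complex roots
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 9, 100, 10_000):
        print(n, prob_complex_roots(n), estimate_complex_roots(n))
```

For n = 9 the exact value is 1/9, and the simulated frequency should land close to it. As n grows the probability of complex roots shrinks to 0, even though such quadratics (for example ##b = 0, c = 1##) clearly exist.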
 
  • #45
StoneTemplePython said:
Among other things, your suggestion would have unfortunate geometric implications. Consider the case of generating two 2-d vectors with each of the (4 total) Cartesian coordinates generated independently, uniformly at random over ##[-1,1]##.

Selecting a single value from a uniform distribution over [-1,1] involves the happening of an event that has probability zero. So we have a problem right there.

The other thread I mentioned poses the following apparent paradox. Suppose it is possible to select numbers independently at random from a uniform distribution on [0,1]. Let S be a (well defined) subset of [0,1] that is not measurable. For each number we select, we determine if that number is in S or not. We keep a tally of the fraction of numbers that are selected that are in S. What would such empirical data look like? If it tends to approach a limiting frequency then that would empirically suggest that a probability can be assigned to the event S. If it does not approach a limiting frequency, then what does the data look like? Does it oscillate? How can it "know" to oscillate if we are taking independent samples?
 
  • #46
mathman said:
Discriminant (divided by 4) ##= b^2 - c##.
Discriminant ##\geq 0## is equivalent to real roots.
Case 1: ##c < 0## -> real roots.
Case 2: Choose b and c uniformly from some large positive interval ##(0, N)##. Prob(discr. ##< 0##) ~ ##1/\sqrt{N}##. Limiting case: prob. both roots real = 1.

This is the gist of it. I thought it was a fun little problem that gives people a flavor of: probability zero can happen. It also has a nice little picture associated with it -- if people like drawing squares and parabolas.

Stephen Tashi said:
Selecting a single value from a uniform distribution over [-1,1] involves the happening of an event that has probability zero. So we have a problem right there.

I don't really see this as a problem to be honest. It is the generator of the correct (standard) answer for OP. The fact that when generating n, n-dimensional random vectors, with probability one they will be linearly independent is... of interest. (Of course when I do this with finite precision floats on my computer we're just talking about a very very small probability of linear dependence that is quite close to zero.)
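To put a number on the finite-precision remark, here is a toy model. It is entirely my own construction, not a description of how floats actually behave: each coordinate is drawn uniformly from the grid ##\{-m, \dots, m\}/m##, and the fraction of linearly dependent pairs of 2-d vectors is counted exactly.

```python
from fractions import Fraction
from itertools import product


def dependent_fraction(m: int) -> Fraction:
    """Exact probability that two 2-d vectors with coordinates drawn
    uniformly from {-m, ..., m} / m are linearly dependent.

    Hypothetical toy model of finite precision, for illustration only.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    values = range(-m, m + 1)
    total = 0
    dependent = 0
    for a, b, c, d in product(values, repeat=4):
        total += 1
        # Scaling every coordinate by 1/m does not change whether det == 0,
        # so the check can be done on the integer numerators.
        if a * d - b * c == 0:
            dependent += 1
    return Fraction(dependent, total)


if __name__ == "__main__":
    for m in (1, 2, 4, 8):
        p = dependent_fraction(m)
        print(m, p, float(p))
```

On a coarse grid dependence is common; refining the grid drives the probability down, which is consistent with the continuous uniform case where it is exactly zero.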

With respect to the rest of your posting... it seems like a giant rabbit hole.
 
  • #47
StoneTemplePython said:
I don't really see this as a problem to be honest.

It's a problem in a practical sense, since it can't be done by any known method.

StoneTemplePython said:
It is the generator of the correct (standard) answer for OP.

The original post concerns two questions. The first is the mathematical question of how a set of points can be assigned a nonzero length or "measure" when each individual point has zero measure. The second is whether an event with zero probability can actually happen - which is not a mathematical question. I agree that there is a standard mathematical answer to the first question.
 
  • #48
Regarding the issue of assigning a 0 probability to a result that physically may happen (or has already happened):

So much use of probability is addressing the question of what result we should guess for an event that has already happened, given the information that we have. Suppose I say that I have already flipped a coin and got a result. If that is all you know, you would legitimately give a 50/50 probability to heads and tails. But in reality, the coin has already been flipped, so in the physical world one result (say heads) has already happened and the other has not. If you don't have any hints about the result, you must keep the 50/50 odds.

Probability addresses the question of what odds we should assume for alternative results given the information we have. This is true whether the experiments have already been done or not. If the only information you are given is that a number has been (or will be) selected from a uniform distribution on [0,1], then you must say that the probability of any exact pre-assigned value is zero. If you know all kinds of details about how the number has been (or will be) selected, then you may be able to give positive probabilities to particular exact values. But without that knowledge, you are left with ##P(X = x_0) = 0##. The experiment is done in the physical world and there are many details that may influence your guess, but if you don't have any of that information, the probability must remain 0.
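The zero can be seen directly from the uniform distribution on [0,1]. For any fixed ##x_0## and any ##\varepsilon > 0## (##\varepsilon## is a symbol introduced here just for this illustration):

```latex
0 \le P(X = x_0)
\le P(x_0 - \varepsilon < X < x_0 + \varepsilon)
\le 2\varepsilon
\quad \text{for every } \varepsilon > 0
\;\Longrightarrow\;
P(X = x_0) = 0 .
```

By contrast, an interval such as ##[0.2, 0.3]## gets probability ##0.1##, which is why only ranges carry positive probability.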
 
  • #49
FactChecker said:
Probability addresses the question of what odds we should assume for alternative results given the information we have.

Attempting to define probability in terms of odds is a circular definition unless we define "odds" using some non-probabilistic concept. However, I agree that probability theory can be applied, as you say, in the Bayesian sense where we assign probabilities differently as different information becomes available.

If we are given the information that a number was selected from a uniform distribution on [0,1] (and we believe that information) then obviously we have accepted that it is possible to select a number from a uniform distribution on [0,1]. Given that information, we must (by the definition of uniform distribution and definition of how it assigns probability) use the uniform distribution to assign zero probability to that event. However, this hypothetical situation does not address whether it is actually possible to select a number from a uniform distribution on [0,1]. As I said before, the question of whether it is possible by any physically realizable process to select a number from a uniform distribution on [0,1] is a question for physics.
 
  • #50
Stephen Tashi said:
Attempting to define probability in terms of odds is a circular definition unless we define "odds" using some non-probabilistic concept. However, I agree that probability theory can be applied, as you say, in the Bayesian sense where we assign probabilities differently as different information becomes available.

If we are given the information that a number was selected from a uniform distribution on [0,1] (and we believe that information) then obviously we have accepted that it is possible to select a number from a uniform distribution on [0,1]. Given that information, we must (by the definition of uniform distribution and definition of how it assigns probability) use the uniform distribution to assign zero probability to that event. However, this hypothetical situation does not address whether it is actually possible to select a number from a uniform distribution on [0,1]. As I said before, the question of whether it is possible by any physically realizable process to select a number from a uniform distribution on [0,1] is a question for physics.

I agree. My point is that without information about the physical process, the fact is that our knowledge has not improved and the probabilities are unchanged. For all we know, the method used to select the random number may itself have been randomly selected out of a multitude of methods. I believe it is legitimate to talk about a continuous CDF with zero probability at every single point, even for a physical, non-theoretical case. Whether that is physically possible seems to be a "religious" issue.
 