# How improbable is impossible?


## Summary:

Realizations of very extreme outcomes from distributions with finite variance and infinite support.

## Main Question or Discussion Point

Distributions with finite variance and infinite support suggest a non-zero but negligible probability of very extreme outcomes. But how small is negligible, and how improbable is actually impossible?

For example, adult male height in the US can be roughly characterized as N(175, 9) in centimeters. The tallest man within the last century was over 250 cm, and the shortest adult male from the same period was only 55 cm tall, the length of a large healthy newborn. The probability of exceeding each in its respective tail-ward direction, according to the distribution assumption, is

P(h>250) = 3e-19,
P(h<55) = 1e-37.

These numbers are in any conventional sense negligible, yet they both occurred within such a small span of human history, and other cases likely existed in the past. 

Even if we compute the probability of at least one such outcome being realized in repeated individual trials, based on the total number of men born during this period (about 6e9 since 1900, generously including a significant number of boys who did not survive into adulthood), we still have the very improbable

P(at least one > 250) = 2e-9
P(at least one <55) = 6e-28
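These tail and "at least one in n trials" figures can be recomputed with only the Python standard library. A sketch, with the caveat that tails this extreme are so sensitive to the assumed σ that slightly different parameter choices move the results by orders of magnitude, so the values need not match the figures quoted above exactly:

```python
from math import erfc, expm1, log1p, sqrt

def upper_tail(x, mu, sigma):
    """P(X > x) for X ~ Normal(mu, sigma), via the complementary error function."""
    return 0.5 * erfc((x - mu) / (sigma * sqrt(2)))

mu, sigma = 175.0, 9.0                          # the post's assumed N(175, 9), in cm
p_tall  = upper_tail(250.0, mu, sigma)          # P(h > 250)
p_short = upper_tail(2 * mu - 55.0, mu, sigma)  # P(h < 55), by symmetry of the normal

# P(at least one such person among n births), computed stably:
# 1 - (1 - p)**n silently rounds to 0 in floating point when p ~ 1e-17,
# so use log1p/expm1 instead.
n = 6e9
p_any_tall = -expm1(n * log1p(-p_tall))

print(p_tall, p_short, p_any_tall)
```

The `log1p`/`expm1` pair matters here: `1 - p_tall` is indistinguishable from `1.0` in double precision, so the naive formula would return exactly 0.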

If we model the number of hurricanes in a year in the Atlantic as Poisson(6), then the hurricane counts comparable to those probabilities, in order of infinitesimal smallness, would be:

P(25) = 4e-9
P(39) = 3e-19
P(49) = 5e-28
P(59) = 1e-37
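A minimal Python check of these Poisson values:

```python
from math import exp, factorial

def poisson_pmf(k, lam=6.0):
    """P(N = k) for N ~ Poisson(lam), using exact integer factorials."""
    return exp(-lam) * lam**k / factorial(k)

for k in (15, 25, 39, 49, 59):
    print(k, poisson_pmf(k))  # 25 -> ~4e-9, 39 -> ~3e-19, 49 -> ~5e-28, 59 -> ~1e-37
```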

The maximum number of hurricanes in the past 50 years was 15, at P(15) ≈ 9e-4, about five orders of magnitude more likely than the chance of 25. At the same probability that a 250 cm giant is born among us in a century, we would never expect 25 hurricanes in a year (at least not yet), let alone 40, 50, or 60.

So for hurricanes, one in 250 million is impossible (without climate change, at least). For human beings, as numerous as we are, one in a billion is a reasonable expectation to occur. But one in a quintillion? Even one in an undecillion is apparently possible.

There are about 1e21 stars in the universe, so these probabilities are tiny even compared to the inverse of a universal scale. Maybe they can be put into the context of the number of elementary particles in the universe, about 1e86.

But we would never expect a 5-meter-tall man, not with 1e-86 probability, not even with 1e-1e100000 probability, both of which are allowed by such distributions.

So where is the line drawn?

(1) Granted, height could be modeled more accurately as log-normal, but the generality of the discussion remains.
(2) Yes, the probability of any point outcome of a continuous distribution is zero, yet such outcomes happen anyway; but let's not split this particular hair, since any point outcome can be modeled as lying within a range.
(3) There is some small dispersion and the events are correlated, but the point is the same.

marcusl
Gold Member
There are formal "lines." In probability theory the term "almost never" refers to an event that happens with probability zero but is not actually outlawed. It is differentiated from "never", which also has P = 0, but because the event is not actually possible.

Dale
Mentor
> Summary: Realizations of very extreme outcomes from distributions with finite variance and infinite support.
>
> The probability of exceeding each in their respective tail-ward direction, according to the distribution assumption, is
>
> P(h>250) = 3e-19,
> P(h<55) = 1e-37.

So would you say that the normal distribution mentioned above fits the tails well?

WWGD
Gold Member
2019 Award
I guess in a measure-theoretic sense, the subsets of the real line containing events several sigma from the mean have measure approaching $0$ the further away from the mean you get, but never quite equaling $0$. For a not-too-tight upper bound, by Chevychase (sp?) inequality, the probability of an event $k \sigma$ from the mean $\mu$ is less than $1/k^2$. But notice this last bound never "tightens" to $0$.
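The looseness of that Chebyshev-style bound is easy to see numerically; a small Python sketch comparing it with the exact normal tail at the height example's deviation:

```python
from math import erfc, sqrt

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k**2, for ANY finite-variance distribution.
# Compare with the exact two-sided normal tail at the "250 cm giant" deviation.
k = (250 - 175) / 9                   # ~8.33 standard deviations out
chebyshev_bound  = 1 / k**2           # ~1.4e-2
normal_two_sided = erfc(k / sqrt(2))  # ~8e-17: many orders of magnitude tighter
print(chebyshev_bound, normal_two_sided)
```

The bound is valid for any distribution with this variance, which is exactly why it is so far from the Gaussian value.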

Stephen Tashi
> Summary: Realizations of very extreme outcomes from distributions with finite variance and infinite support.

Rigorous mathematical probability theory (based on "measure theory") does not deal with whether events are possible or impossible. Whether an event that is assigned a given probability is possible or impossible is a question of how probability theory is applied to a particular situation. By way of analogy, the theory of trigonometry does not deal with ladders leaning against walls. How to analyze a ladder leaning against a wall is a question of applying theory to a particular situation. How well trigonometry applies to real ladders is a question for people who know real ladders. Likewise, how probability theory is interpreted when applied to people's heights is a question for experts who study people's heights.

WWGD
Gold Member
2019 Award
Maybe for insight it may help to consider extreme distributions like the Cauchy, which has no mean and infinite variance.

PeroK
Homework Helper
Gold Member
> Summary: Realizations of very extreme outcomes from distributions with finite variance and infinite support.
>
> Distributions with finite variance and infinite support suggest non-zero, but negligible probability of very extreme outcomes. But how small is negligible, and how improbable is actually impossible?
>
> For example, the adult male height in the US can be roughly characterized as N(175, 9) in centimeters. The tallest man within the last century was over 250cm, and the shortest male adult from the same period was only 55cm tall, the length of a large healthy newborn. The probability of exceeding each in their respective tail-ward direction, according to the distribution assumption, is
>
> P(h>250) = 3e-19,
> P(h<55) = 1e-37.
>
> These numbers are in any conventional sense negligible, yet they both occurred within such a small span of human history, and other cases likely existed in the past.

One problem in analysing extremes is that your distribution may be influenced by other factors. These may be unlikely circumstances that do not actually fit the original distribution model. For example:

Let's take a sporting contest between a top professional and a club player. Tennis, say. You have a model for how likely the players are to win each point and it all works well. The club player wins 1 point in 10, say. Very rarely wins a game. And, effectively, never wins a set, let alone a 3-set match. Although, of course, statistically it could happen.

But, one day the professional injures herself and has to stop, and the club player wins the match by default. Or, maybe loses by default because she got stuck in traffic and missed the match.

The rare event in this case comes not from an outlier in the original distribution, but from a new factor that was not part of the original model: loss of the entire match through illness or injury. Which is unrelated to the original statistical analysis of winning points.

I can't speak about human growth from any specialist knowledge. But, from a data analysis point of view, you would need to look at the assumptions that led to a pure normal distribution. Outliers could be the result of some external factor that was not part of the original assumptions.

Another example is a machine that produces boxes of matches. It dispenses 50 matches at a time, normally distributed with a low variance. Almost always 48-52, say. Then, one day, a component in the machine breaks and thousands of matches come tumbling out!

StoneTemplePython
Gold Member
2019 Award
> For a not-too-tight upper bound, by Chevychase (sp?) inequality, the probability of an event $k \sigma$ from the mean $\mu$ is less than $1/k^2$. But notice this last bound never "tightens" to $0$.

I think he came up with this bound while filming The Three Amigos.

Aside from attribution, I think the point of the thread is that if you pretend something is well approximated by a Gaussian and it isn't, well, you get bad estimates. I'm not sure how human heights have 'infinite support'. I'd bound them to be less than 10 feet, and certainly suggest humans cannot be taller than planet Earth. Bounded distributions admit fairly easy concentration inequalities if the OP wants something tighter than Chebyshev.
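As a sketch of what a bounded support buys: Hoeffding's inequality bounds the deviation of a sample mean using only the support, with no normality assumption. The [50 cm, 300 cm] bounds and the sample size below are made-up illustration values, not claims about real data:

```python
from math import exp

# Hoeffding's inequality: for n iid samples supported on [a, b],
#   P(sample_mean - mu >= t) <= exp(-2 * n * t**2 / (b - a)**2).
a, b = 50.0, 300.0    # hypothetical hard bounds on adult height, in cm
n, t = 10_000, 5.0    # 10,000 samples; sample mean off by 5 cm or more
bound = exp(-2 * n * t**2 / (b - a)**2)
print(bound)          # exp(-8) ~ 3.4e-4, from the support bounds alone
```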

WWGD
Gold Member
2019 Award
Maybe looking at it in terms of quality control: if the process in question is in control, most of the data will fall in a given range. The rest will fall outside of it from random variability, which cannot, maybe by definition itself, be controlled.

WWGD
Gold Member
2019 Award
> I think he came up with this bound while filming The Three Amigos.
>
> Aside from attribution, I think the point of the thread is that if you pretend something is well approximated by a Gaussian and it isn't, well, you get bad estimates. I'm not sure how human heights have 'infinite support'. I'd bound them to be less than 10 feet, and certainly suggest humans cannot be taller than planet Earth. Bounded distributions admit fairly easy concentration inequalities if the OP wants something tighter than Chebyshev.

Do you have examples?

gleem
This sort of question is discussed in the book "The Black Swan: The Impact of the Highly Improbable" by Nassim Nicholas Taleb. Taleb claims that (classical?) statistical inference is justified only when you know all the factors that influence a random outcome. The book criticizes the current use of statistics to predict future financial market events.

On the Ludic fallacy, from Wikipedia:

The fallacy is a central argument in the book and a rebuttal of the predictive mathematical models used to predict the future – as well as an attack on the idea of applying naïve and simplified statistical models in complex domains. According to Taleb, statistics is applicable only in some domains, for instance casinos in which the odds are visible and defined. Taleb's argument centers on the idea that predictive models are based on platonified forms, gravitating towards mathematical purity and failing to take various aspects into account:

• It is impossible to be in possession of the entirety of available information.
• Small unknown variations in the data could have a huge impact. Taleb differentiates his idea from that of mathematical notions in chaos theory (e.g., the butterfly effect).
• Theories or models based on empirical data are claimed to be flawed as they may not be able to predict events which are previously unobserved, but have tremendous impact (e.g., the 9/11 terrorist attacks or the invention of the automobile), a.k.a. black swan theory.

PeroK
Homework Helper
Gold Member
If I might add a point to the above: a particularly dangerous case, when quoting extreme unlikelihood, is where there is a probability that you are wrong!

For example, there was an infamous criminal case in the UK where a woman was convicted on medical expert testimony that estimated the odds at 72 million to one against.

But that failed to take into account the probability that the medical theory on which it was based was at least partially wrong, which eventually was revealed to be the case.

In one sense, nothing like that is ever 72 million to one, as the finite probability of an error in the theory dominates.

WWGD
Gold Member
2019 Award
> If I might add a point to the above: a particularly dangerous case, when quoting extreme unlikelihood, is where there is a probability that you are wrong!
>
> For example, there was an infamous criminal case in the UK where a woman was convicted on medical expert testimony that estimated the odds at 72 million to one against.
>
> But that failed to take into account the probability that the medical theory on which it was based was at least partially wrong, which eventually was revealed to be the case.
>
> In one sense, nothing like that is ever 72 million to one, as the finite probability of an error in the theory dominates.

That was the baby-shaking case, right? Wasn't there also an issue of not using conditional probability correctly?

PeroK
Homework Helper
Gold Member
> That was the baby-shaking case, right? Wasn't there also an issue of not using conditional probability correctly?

Multiple cot deaths in the same family. The quoted probability depends on there being no possible genetic or environmental link. At least three women, for whom there was no other evidence that they would harm their children, were convicted before the medical testimony was questioned.

WWGD
Gold Member
2019 Award
So it seems there was an issue of independence too, right? The baby deaths were assumed to be independent of each other (ignoring the environmental link)?

PeroK
Homework Helper
Gold Member
> So it seems there was an issue of independence too, right? The baby deaths were assumed to be independent of each other (ignoring the environmental link)?

Absolutely, these odds assume no possible correlation.

The final statistical irony, of course, is that the odds of a given woman murdering her two or three children are also extremely remote.

Some basic probability analysis leads to two unlikely scenarios: two murders or two cot deaths. The conditional probability of murder given two deaths is then estimated at less than 50%.

In which case you must look for other evidence of murder.

Tragic.
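The two-scenario reasoning above can be sketched with Bayes' rule. Both rates below are hypothetical placeholders chosen only to illustrate the structure of the argument, not measured values:

```python
# Two competing explanations for the observed "two infant deaths":
p_double_natural = 1 / 72e6    # the testimony's quoted odds, taken at face value
p_double_murder  = 1 / 100e6   # hypothetical prior rate of double murder

# Bayes' rule, treating these as the only two possible explanations:
posterior_murder = p_double_murder / (p_double_murder + p_double_natural)
print(posterior_murder)        # below 0.5: the deaths alone do not favor murder
```

When both explanations are astronomically rare, only their ratio matters; neither tiny number is meaningful on its own.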

StoneTemplePython
Gold Member
2019 Award
> Do you have examples?

The simplest is when we have independence and use e.g. Chernoff bounds. A more sophisticated but still relatively straightforward one is Azuma-Hoeffding.

Vershynin's book is chock full of them. Chapter 2 is somewhat straightforward and a nice introduction to why one may prefer various concentration inequalities over normal approximations. (It quickly gets more difficult from there.)

https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.pdf

> Rigorous mathematical probability theory (based on "measure theory") does not deal with whether events are possible or impossible. Whether an event that is assigned a given probability is possible or impossible is a question of how probability theory is applied to a particular situation. By way of analogy, the theory of trigonometry does not deal with ladders leaning against walls. How to analyze a ladder leaning against a wall is a question of applying theory to a particular situation. How well trigonometry applies to real ladders is a question for people who know real ladders. Likewise, how probability theory is interpreted when applied to people's heights is a question for experts who study people's heights.

This cannot be emphasized enough. Probability theory is not a descriptive theory which conceptually explains probabilities - like e.g. theories of mechanics being descriptive theories which explain motion using a method e.g. the calculus.

In fact, the name probability theory itself is deceiving: probability theory is not a theory, it is a calculus; a more appropriate name would have been probability calculus, and in fact this was the name for quite a while!

Instead of being a descriptive theory, probability theory is - just like the calculus - a method for calculating certain kinds of numbers which can be interpreted as being probabilities if one is willing to make enough assumptions such that the method can produce such numbers.

All of the above applies to practically all known mathematical formulations of probability theory; this is precisely why even a rigorous probability theory - such as Kolmogorov probability theory - is in fact far more barren in an explanatory theoretical sense than many naive utilizers of probability theory tend to presume.

WWGD
Gold Member
2019 Award
> This cannot be emphasized enough.
>
> Probability theory is not a descriptive theory which conceptually explains probabilities - like e.g. theories of mechanics being descriptive theories which explain motion using a method e.g. the calculus. In fact, the name probability theory itself is deceiving: probability theory is not a theory, it is a calculus; a more appropriate name would have been probability calculus.
>
> Instead of being a descriptive theory, probability theory is like the calculus a method for calculating certain kinds of numbers which can be interpreted as being probabilities if one is willing to make enough assumptions such that the method can produce such numbers.
>
> All of the above applies to practically all known mathematical formulations of probability theory; this is precisely why even a rigorous probability theory - such as Kolmogorov probability theory - is in fact far more barren in an explanatory theoretical sense than many naive utilizers of probability theory tend to presume.

Edit: This is true of all of (theoretical) Mathematics, not just probability theory. It is a calculus to be instantiated to specifics, a tool box without intrinsic content/semantics. My $0.02.

Last edited:

> This is true of all of Mathematics, not just probability theory. It is a calculus to be instantiated to specifics, a tool box without intrinsic content/semantics. My $0.02.

It is important to recognize that not everyone realizes this; the cognitive step towards presuming that a more sophisticated version of probability theory can and even does offer intrinsic semantics is a fallacious step that is quite often taken, precisely because it can easily be taken.

The intrinsic content of what probabilities and chance are in fact has multiple explanations, namely in philosophy, logic and the foundations of mathematics, but these explanations, when described as mathematical models, go far beyond known mathematical probability theory. There are multiple new models and they are still works in progress, because they usually literally uproot all of mathematics as well.

This is in essence the entire reason we have had different interpretations in the foundations of probability theory for over a century, and as an extension also in the foundations of QM: the discussion is about which of the new mathematical models can completely subsume the existing mathematical model, which is merely an idealized limiting case, a toy model which has some empirical use.

FactChecker
Gold Member
> In one sense, nothing like that is ever 72 million to one, as the finite probability of an error in the theory dominates.

I think it is safer to say that nothing is proven to be 72 million to one because of flaws in the model. There certainly are probabilities that are, in fact, that small or smaller. I still think that statement is too strong.

Let us please not deprive the word 'zero' of its meaning, and not deny zero its proper place on the number line, or allow zero to ever be said to have a positive value, or allow any number that is definitely positive to be called zero.

The probability that a given real number will be selected from within the unit interval is not uncommonly among mathematicians called zero. More properly, each of the infinite number of such possibilities is sometimes for purposes of utility treated as zero. In fact each such possibility is the infinitesimal, which may be called the least number greater than zero.

Some present the argument that there is no least number greater than zero, because no matter how small the number is, there is always one smaller.

Counter to this runs the argument that every real number in the closed unit interval that is not zero is greater than zero, and therefore cannot be equal to zero. The infinitesimal is never equal to zero.

If the infinitesimal were actually equal to zero, that would lead to absurdities such as that all integrations would sum to zero, wherefore the area under every curve would be zero. To deny that the infinitesimals within the unit are always positive, however small, is to retain a 'pet' inconsistency. It may be useful for non-rigorous parlance, but it is nevertheless incorrect use of language.

The fact that it will always be the case that some number within the unit interval will be selected, along with the definitional premise that 'selected at random' means that all possibilities have an equal and positive probability of being the selected number, entails that no such number has zero probability of being selected.

However, it is impossible that the selected number will be two, wherefore the probability that it will be two is actually zero, and not merely infinitesimal. Only the impossible actually has zero probability.

WWGD
Gold Member
2019 Award
> Let us please not deprive the word 'zero' of its meaning, and not deny zero its proper place on the number line, or allow zero to ever be said to have a positive value, or allow any number that is definitely positive to be called zero.
>
> The probability that a given real number will be selected from within the unit interval is not uncommonly among mathematicians called zero. More properly, each of the infinite number of such possibilities is sometimes for purposes of utility treated as zero. In fact each such possibility is the infinitesimal, which may be called the least number greater than zero.
>
> Some present the argument that there is no least number greater than zero, because no matter how small the number is, there is always one smaller.
>
> Counter to this runs the argument that every real number in the closed unit interval that is not zero is greater than zero, and therefore cannot be equal to zero. The infinitesimal is never equal to zero.
>
> If the infinitesimal were actually equal to zero, that would lead to absurdities such as that all integrations would sum to zero, wherefore the area under every curve would be zero. To deny that the infinitesimals within the unit are always positive, however small, is to retain a 'pet' inconsistency. It may be useful for non-rigorous parlance, but it is nevertheless incorrect use of language.
>
> The fact that it will always be the case that some number within the unit interval will be selected, along with the definitional premise that 'selected at random' means that all possibilities have an equal and positive probability of being the selected number, entails that no such number has zero probability of being selected.
>
> However, it is impossible that the selected number will be two, wherefore the probability that it will be two is actually zero, and not merely infinitesimal. Only the impossible actually has zero probability.

I guess one may say that the degree/level of resolution in probability theory does not allow us to distinguish such small probabilities from 0 itself. Just like a map of the world cannot pinpoint everything in it, so, according to the map, some things are not there. Not sure I am fully making sense. Maybe one may use the hyperreals to assign probabilities, but I don't know if that can be done.

Edit: Re your claim about 2, you may say that there is a real number r > 0 such that the probability assigned to the interval (2-r, 2+r) is 0. You cannot say the same about points in [0,1].

WWGD said:
> Edit: Re your claim about 2, you may say that there is a real number r > 0 such that the probability assigned to the interval (2-r, 2+r) is 0. You cannot say the same about points in [0,1].

Could you please elaborate? Did you mean to use the open interval? What do you mean by "the same"? Are you agreeing with my claim? I was merely distinguishing between the maximally improbable and the impossible, and claiming that only the latter level of probability should be called zero.

Last edited:
PeroK