# Math Challenge - September 2021

• Challenge
• Featured
Mentor
One surprising fact to me is that ##\pi## being open implies that if there is an open set ##U \in gK##, then the entire coset is open, since ##\pi^{-1}(\pi(U)) = gK##. This must be related to 5.5.
##\pi^{-1}(\pi(U)) = U \neq gK## in general

##\pi## is open means that it maps open sets to open sets, so there are open neighborhoods around the images of those points.

sysprog
14. An ellipse, whose semi-axes have lengths ##a## and ##b##, rolls without slipping on the curve ##y = c \sin(x/a)##. How are ##a, b, c## related, given that the ellipse completes one revolution when it traverses one period of the curve?

This is a partial or incomplete answer, but I am posting it because I think the approach is not far off from the correct, complete answer, and I would like to know what mistake, if any, exists in my attempted solution. Since I don't get simple, closed-form expressions relating ##a, b## and ##c##, I suspect there is a mistake in at least one of the intermediate equations, but I am unable to find where it arises.

The ellipse's circumference must equal the length of one period of the mentioned curve (a sinusoidal wave) if the ellipse completes exactly one revolution as it rolls over one period of the wave.

The circumference of the ellipse, assuming ##a## is the semi-major axis and ##b## is the semi-minor axis, is $$C = 4a \int_{0}^{\pi / 2} \sqrt {1 - (1 - b^2/a^2) \sin^2{θ}} \, dθ$$

The length of a curve ##y = f(x)## between ##x=x_1## and ##x=x_2##, assuming that ##f(x)## is differentiable in the range ##[x_1, x_2]##, is given by ##\int_{x_1}^{x_2} \sqrt {1 + (f'(x))^2} \, dx##. For ##y = c \sin(x/a)##, ##y' = f'(x) = \dfrac {c} {a} \cos (x/a)## and therefore the length ##L## of one period of the curve is given by $$L = \int_{\frac{x}{a} = 0}^{2 \pi} \sqrt {1 + \dfrac {c^2} {a^2} \cos^2 (x/a)} \, dx = \int_{\frac{x}{a} = 0}^{2 \pi} \sqrt {1 + \dfrac {c^2} {a^2} (1 - \sin^2 (x/a))} \, dx = \int_{\frac{x}{a} = 0}^{2 \pi} \sqrt {\dfrac{a^2+c^2}{a^2}} \sqrt {1 - \dfrac {c^2} {a^2 + c^2} \sin^2 (x/a)} \, dx$$

Due to the symmetry of the sine wave's shape, the length of the curve between ##\frac{x}{a} = 0## and ##\frac{x}{a} = 2π## equals 4 times the length of the curve between ##\frac{x}{a} = 0## and ##\frac{x}{a} = π/2##. Using this property, defining ##z = x/a##, and substituting for ##x, dx## accordingly in the above equation for ##L##, we get $$L = 4 \sqrt {a^2+c^2} \int_{z = 0}^{π/2} \sqrt {1 - \dfrac {c^2} {a^2 + c^2} \sin^2 z} \, dz$$

Since ##L = C##, we get $$\dfrac {4 \sqrt {a^2+c^2}} {4a} = \dfrac {\int_{0}^{\pi / 2} \sqrt {1 - (1 - b^2/a^2) \sin^2{θ}} \, dθ} {\int_{0}^{π/2} \sqrt {1 - \dfrac {c^2} {a^2 + c^2} \sin^2 z} \, dz}$$

When I first attempted to solve this, I had wrongly used ##\pi## instead of ##2\pi## as the upper bound for one of the integrals, and that led to neat and simple relations between ##a, b, c##. I believed that similarly simple relations would be found even after correcting that error, but that didn't happen. I thought I could try equating the constants outside the integrals in the expressions for ##L## and ##C##, and similarly equate the multipliers of the ##\sin^2(.)## expression inside the 2 integrands, to get at least one of the possibly many valid relations between ##a, b, c##; but equating ##4\sqrt {a^2+c^2}## to ##4a## leads to ##c=0##, and that cannot be a valid solution. Where is the mistake?
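Numerically, the condition ##L = C## does pin down ##c## once ##a## and ##b## are given, which can be checked with a short Python sketch (the midpoint-rule integrators, the bisection helper `solve_c`, and the sample values ##a = 1, b = 2## are all illustrative choices, not part of the problem):

```python
import math

def ellipse_circumference(a, b, n=20000):
    # C = 4a * Int_0^{pi/2} sqrt(1 - (1 - b^2/a^2) * sin(t)^2) dt  (midpoint rule);
    # the formula is valid for any a, b > 0, not only when a is the semi-major axis
    m = 1.0 - (b * b) / (a * a)
    h = (math.pi / 2) / n
    return 4 * a * h * sum(
        math.sqrt(1.0 - m * math.sin(h * (i + 0.5)) ** 2) for i in range(n)
    )

def sine_period_length(a, c, n=20000):
    # L = Int_0^{2*pi} a * sqrt(1 + (c/a)^2 * cos(z)^2) dz  with z = x/a
    h = (2 * math.pi) / n
    return a * h * sum(
        math.sqrt(1.0 + (c / a) ** 2 * math.cos(h * (i + 0.5)) ** 2) for i in range(n)
    )

def solve_c(a, b, lo=0.0, hi=50.0, iters=60):
    # bisect for the c >= 0 with L(a, c) = C(a, b); L is increasing in c,
    # L(a, 0) = 2*pi*a, and C(a, b) > 2*pi*a requires b > a
    target = ellipse_circumference(a, b)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sine_period_length(a, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = solve_c(1.0, 2.0)   # sample case a = 1, b = 2
```

With ##a = 1, b = 2## this gives ##c \approx 1.7##. Note that a root exists only for ##b > a##: ##L(a,c) \geq 2\pi a## with equality at ##c = 0##, while the ellipse's circumference exceeds ##2\pi a## only when ##b > a##. This suggests the relation between ##a, b, c## really is the equality of the two elliptic integrals, not a simpler closed form.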

##\pi^{-1}(\pi(U)) = U \neq gK## in general

##\pi## is open means that it maps open sets to open sets, so there are open neighborhoods around the images of those points.
This doesn't make sense to me...

If ##U \subset \bigcup_{k_i \in K} gk_i## is open in ##G##, then
## \pi(U) = gK## which must be open in ##G/K##.

Therefore, ##\pi^{-1}(gK) = \bigcup_{k_i \in K} gk_i##, which must be open in ##G##.

Where is the above logic incorrect?

Mentor

If I am to check your calculations, you will have to write out many more lines of work.

The solution is actually very easy and short. Write the circumference of the ellipse as
##C=\int_{0}^{2\pi}\sqrt{(-a\sin \left(\theta\right))^2+(b\cos\left(\theta\right))^2}\,d\theta## and keep the polar coordinates for ##L##! Next simply bring those integrals in a comparable form for ##C=L## and conclude the solution.

Mentor
And the construction ##gk_i## in ##G## doesn't make any sense to me. It confuses the distinction between elements of ##G## and elements of ##G/K##.

Why do you look at ##U\subseteq \{g\cdot k\,|\,k\in K\}## at all? Yes, you get ##\pi(U)=gK##, so what?

##\pi^{-1}(gK)=\{g\}## not ##U##.

And the construction ##gk_i## in ##G## doesn't make any sense to me. It confuses the distinction between elements of ##G## and elements of ##G/K##.
I am not sure where the confusion is. A coset in ##G## consists of the set of elements ##gK = \{\,g_i \in G \mid \exists\, k \in K \text{ s.t. } g_i k = g\,\}##. In ##G/K## that set becomes a single element.
Why do you look at ##U\subseteq \{g\cdot k\,|\,k\in K\}## at all? Yes, you get ##\pi(U)=gK##, so what?

##\pi^{-1}(gK)=\{g\}## not ##U##.
I know it isn't ##U##. I never claimed it was. ##U## isn't a saturated open subset unless it is the entire coset. It also isn't a single element ##g## since all elements of the coset in ##G## map to the same element ##gK##.

Either I have a giant misunderstanding or we are having some miscommunication issue here.

Mentor
Either I have a giant misunderstanding or we are having some miscommunication issue here.
I guess misunderstanding.
One surprising fact to me is that ##\pi## being open implies that if there is an open set ##U \in gK##, then the entire coset is open, since ##\pi^{-1}(\pi(U)) = gK##. This must be related to 5.5.
I did not understand ##U\in gK##. This is a mixture of ##G## and ##G/K##. If you say open, then you have to say open in what.

Last edited:
jbergman
I guess misunderstanding.

I did not understand ##U\in gK##. This is a mixture of ##G## and ##G/K##. If you say open, then you have to say open in what.
Sorry for the confusion. I am not sure how people typically notate these things.

Last edited by a moderator:
Mentor
Sorry for the confusion. I am not sure how people typically notate these things.
One convention is to write cosets with an overline: ##gK=\overline{g}##. If you write ##gK## then it is unclear whether you mean ##gK=\{gk\,|\,k\in K\}\subseteq G## or ##gK\in G/K##.

So ##U\subseteq G## and ##\overline{U}\subseteq G/K## is a possibility to distinguish ##G## from ##G/K##. And, of course, we could also write everything in ##G/K## as images of ##\pi\, : \,G \twoheadrightarrow G/K## namely ##gK=\overline{g}=\pi(g) \in G/K\, , \,\overline{U}=\pi(U)\subseteq G/K.##

So ##gK \subseteq G## is actually the misleading notation. It would be better to write ##\pi^{-1}(\overline{g}).## This also has the advantage that it is immediately obvious that ##gK\subseteq G## is closed, as it is the pre-image of a closed set under a continuous function (##\overline{g}\in G/K## is a singleton and therefore closed, at least in ##T_1## topologies).

Last edited:
julian
Gold Member
Problem #7:

Write

\begin{align*}
f (x) = x - (1+x) \log (1+x) + \frac{3x^2}{2 (3+x)}
\end{align*}

We must show that ##f (x) \leq 0## for ##x > -1##.

The first derivative is

\begin{align*}
\frac{d}{dx} f (x) = - \log (1+x) + \frac{3 x (x + 6)}{2(3+x)^2}
\end{align*}

This has a zero at ##x=0##, with ##f (0) = 0##.

The second derivative is

\begin{align*}
\frac{d^2}{dx^2} f (x) = - \frac{x^2 (x + 9)}{(1+x) (3+x)^3}
\end{align*}

which is zero at ##x = 0##, so we may have a maximum, a minimum or an inflection.

As

\begin{align*}
\frac{d}{dx} f (x) = - \int_0^x \frac{y^2 (y + 9)}{(1+y) (3+y)^3} dy
\end{align*}

we have ##\frac{d}{dx} f (x)## is positive for all ##-1 < x < 0## and we have ##\frac{d}{dx} f (x)## is negative for all ##x > 0##.

Which means the function ##f (x)## is monotonically increasing for ##-1 < x < 0## and monotonically decreasing for ##x > 0##. So ##f(x)## has a global maximum of zero at ##x = 0## for ##x > -1##.

ADDITIONAL: The third derivative, ##f''' (0)##, is zero but the fourth derivative, ##f'''' (0)##, is negative, so by the higher derivative test the function ##f(x)## has a local maximum at ##x = 0##.
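The claimed global maximum is easy to cross-check numerically by sampling ##f## on a fine grid (the grid range ##(-1, 19]## and the ##10^{-12}## rounding tolerance are arbitrary choices):

```python
import math

def f(x):
    # f(x) = x - (1 + x) * ln(1 + x) + 3 x^2 / (2 (3 + x)), defined for x > -1
    return x - (1 + x) * math.log(1 + x) + 3 * x * x / (2 * (3 + x))

# sample a fine grid approaching the boundary x = -1 from the right;
# the 1e-12 slack only absorbs floating-point rounding near the maximum at 0
xs = [-0.999 + 0.001 * k for k in range(20001)]
assert max(f(x) for x in xs) <= 1e-12
assert f(0.0) == 0.0
```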

Last edited:
fresh_42
julian
Gold Member
Problem #9 parts 5 and 6.

5.

We prove ##\frac{2^{2n-1}}{n} \leq \binom{2n}{n}## by induction. The base case is trivial: ##\frac{2^1}{1} \leq \binom{2}{1}##.

We assume

\begin{align*}
\frac{2^{2n-1}}{n} \leq \binom{2n}{n} .
\end{align*}

This together with ##4n \leq 2(2n+1)## implies

\begin{align*}
4n \cdot \frac{2^{2n-1}}{n} \leq 2 (2n+1) \binom{2n}{n}
\end{align*}

or

\begin{align*}
\frac{1}{n+1} \cdot 2^{2n+1} \leq \frac{(2n + 2) (2n+1)}{(n+1)^2} \binom{2n}{n}
\end{align*}

or

\begin{align*}
\frac{2^{2(n+1)-1}}{n + 1} \leq \binom{2(n+1)}{n+1}
\end{align*}

completing the inductive argument.

We prove ##\binom{2n}{n} \leq 2^{2n-1}## by induction. The base case is trivial: ##\binom{2}{1} \leq 2^1##.

We assume

\begin{align*}
\binom{2n}{n} \leq 2^{2n-1} .
\end{align*}

This together with ##2n+1 \leq 2 (n+1)## implies

\begin{align*}
2 \frac{2n+1}{n+1} \binom{2n}{n} \leq 4 \cdot 2^{2n-1}
\end{align*}

which easily becomes

\begin{align*}
\binom{2(n+1)}{n+1} \leq 2^{2(n+1)-1}
\end{align*}

completing the inductive argument.
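Both bounds of part 5 are easy to confirm by brute force for small ##n## (a sketch using Python's `math.comb`, available from Python 3.8; the range tested is arbitrary):

```python
from math import comb

# check 2^(2n-1) / n <= C(2n, n) <= 2^(2n-1); the lower bound is multiplied
# through by n so the whole comparison stays in exact integer arithmetic
for n in range(1, 301):
    central = comb(2 * n, n)
    assert 2 ** (2 * n - 1) <= n * central
    assert central <= 2 ** (2 * n - 1)
```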

6.

We prove ##\prod_{p \leq n} p < 4^n## by induction. The base case ##n = 2## obviously works (I guess for the case ##n = 1## you use the convention that ##\prod_{p \leq 1} p = 0## as 1 is not a prime).

Assume ##\prod_{p \leq m} p < 4^m## for all ##m < n##, we wish to prove the result ##\prod_{p \leq n} p < 4^n##.

This is easy if ##n## is not prime because then

\begin{align*}
\prod_{p \leq n} p = \prod_{p \leq n-1} p < 4^{n-1} < 4^n .
\end{align*}

We now consider the case where ##n## is prime. As we are considering primes greater than 2, we can write ##n = 2k+1##.

Consider

\begin{align*}
\binom{2k+1}{k} = \frac{(2k+1) \cdots (q) \cdots (k+2)}{k!} = (2k+1) \cdots \frac{(q)}{k!} \cdots (k+2)
\end{align*}

where ##q## is any prime with ##k+2 \leq q \leq 2k+1##. As none of the factors in ##k!## can divide ##q##, the ##q## cannot be cancelled. Hence ##q## divides ##\binom{2k+1}{k}##. Hence, ##\binom{2k+1}{k}## is divisible by all prime numbers greater than or equal to ##k + 2## and less than or equal to ##2k + 1##, that is, it is divisible by

\begin{align*}
Q = \prod_{k+2 \leq p \leq 2k+1} p = \dfrac{\prod_{p \leq 2k+1} p}{\prod_{p \leq k+1} p}
\end{align*}

So that ##Q \leq \binom{2k+1}{k}##. But

\begin{align*}
\binom{2k+1}{k} & < \binom{2k+1}{0} + \binom{2k+1}{1} + \cdots + \binom{2k+1}{k - 1} + \binom{2k+1}{k}
\nonumber \\
& = \frac{1}{2} \left( \binom{2k+1}{0} + \binom{2k+1}{1} + \cdots + \binom{2k+1}{2k} + \binom{2k+1}{2k + 1} \right)
\nonumber \\
& = \frac{1}{2} (1+1)^{2k + 1} = 4^k .
\end{align*}

Hence ##Q < 4^k##, or in other words

\begin{align*}
\dfrac{\prod_{p \leq 2k+1} p}{\prod_{p \leq k+1} p} < 4^k
\end{align*}

which by the inductive hypothesis implies

\begin{align*}
\prod_{p \leq 2k+1} p < 4^k \prod_{p \leq k+1} p < 4^{2k + 1} .
\end{align*}

Completing the inductive argument.
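Part 6 can also be confirmed by brute force for small ##n## with a sieve (a sketch; it uses the empty-product convention ##\prod_{p\leq 1}p = 1##, and the limit 2000 is an arbitrary choice):

```python
def primorial_bound_holds(limit):
    # sieve of Eratosthenes, then check  prod_{p <= n} p < 4^n  for 1 <= n <= limit
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, limit + 1, i))
    prod, power = 1, 1          # running values of prod_{p <= n} p and 4^n
    for n in range(1, limit + 1):
        if sieve[n]:
            prod *= n
        power *= 4
        if prod >= power:
            return False
    return True

assert primorial_bound_holds(2000)
```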

Mentor
(I guess for the case ##n=1## you use the convention that ##\prod_{p\leq 1}p=0## as ##1## is not a prime).
No. The usual convention for an empty product is to set it equal to ##1##. This is because ##0## isn't part of multiplicative groups, and ##1## is their neutral element. It is the same reason why empty sums are considered to be ##0.## And it preserves the important functional equation ##\exp(0+0)=\exp(0)\cdot\exp(0).##

julian
Gold Member
Problem #9 part 4.

By definition, ##\text{ord}_p (N)## is the exponent of ##p## in the prime decomposition of the number ##N!##. Using part 1 of the question, we have that the exponent of ##p## in the prime decomposition of ##\binom{2n}{n}##, denote it ##\nu_p (n)##, is given by

\begin{align*}
\nu_p (n) & = \text{ord}_p (2n) - 2 \text{ord}_p (n)
\nonumber \\
& = \left( \left\lfloor \frac{2n}{p} \right\rfloor - 2 \left\lfloor \frac{n}{p} \right\rfloor \right) + \left( \left\lfloor \frac{2n}{p^2} \right\rfloor - 2 \left\lfloor \frac{n}{p^2} \right\rfloor \right) + \cdots \quad (*)
\end{align*}

When ##p^k >2n## we have that ##\left( \left\lfloor \frac{2n}{p^k} \right\rfloor - 2 \left\lfloor \frac{n}{p^k} \right\rfloor \right) = 0##. Otherwise each such term in ##(*)## is either 0 or 1 as ##\left\lfloor 2a \right\rfloor - 2 \left\lfloor a \right\rfloor## is either 0 or 1. Let ##k_{max}## be the largest integer such that ##p^{k_{max}} \leq 2n##. Then ##\nu_p (n) \leq k_{max}##.

So if ##p^r | \binom{2n}{n}## then

\begin{align*}
p^r \leq p^{\nu_p (n)} \leq p^{k_{max}} \leq 2n .
\end{align*}

Last edited:
fresh_42
benorin
Homework Helper
#1) "Um, I'll take what is the Bohr-Mollerup Theorem for 200 points, Alex."

I want your honest opinions here: what do you think of my new signature? Simple yet amazing truth, but is it too long?
Who's grading this month, @fresh_42 ? I think I should also get credit for #1) because of the quoted post, the Bohr-Mollerup Theorem:

It is trivial from this theorem's definition of the Gamma function to obtain the form of it given in the problem (by existence + uniqueness of the Gamma function), but if you like I can furnish the leg work.

MathematicalPhysicist
Gold Member
Who's grading this month, @fresh_42 ? I think I should also get credit for #1) because of the quoted post, the Bohr-Mollerup Theorem:


It is trivial from this theorem's definition of the Gamma function to obtain the form of it given in the problem (by existence + uniqueness of the Gamma function), but if you like I can furnish the leg work.
Is there a prize for the highest grades?
Otherwise I wouldn't care if I got graded or not.

Mentor
Is there a prize for the highest grades?
Well, we would have to ask @Greg Bernhardt. Maybe we could create a badge for it.
Otherwise I wouldn't care if I got graded or not.
I have written a TeX-file and created a pdf listing all of my problems and solutions up to the end of the year. It has 530 pages. Additionally, we have had many problems from other members. This means: the counting has to be done by somebody else! I suggest an Excel sheet.

Not anonymous
15. For a partition ##\pi## of ##N := \{1, 2, ..., 9\}##, let ##\pi(x)## be the number of elements in the part containing ##x##. Prove that for any two partitions ##\pi_1## and ##\pi_2##, there are two distinct numbers ##x## and ##y## in ##N## such that ##\pi_j(x) = \pi_j(y)## for ##j=1,2##.

Claim 1: Suppose a partition ##\pi_1## of ##N = \{1,2,...,9\}## has 4 distinct numbers, say ##a_1, a_2, a_3, a_4##, such that ##\pi_1(a_1) = \pi_1(a_2) = \pi_1(a_3) = \pi_1(a_4)##. Then there cannot be another partition ##\pi_2## of ##N## such that ##\pi_2(x) \neq \pi_2(y)## for all distinct ##x, y \in N## with ##\pi_1(x) = \pi_1(y)##. In other words, for any other partition ##\pi_2## of ##N##, we can find distinct ##x, y \in N## such that ##\pi_j(x) = \pi_j(y)## for ##j=1,2##.

Proof by contradiction: Suppose there exists a partition ##\pi_2## of the form claimed above not to exist. Then we must have ##\pi_2(a_i) \neq \pi_2(a_j)## for all ##i, j \in \{1,2,3,4\}## with ##i \neq j##. Hence ##\pi_2(a_1), \pi_2(a_2), \pi_2(a_3), \pi_2(a_4)## must all be distinct, i.e. each of ##a_1, a_2, a_3, a_4## must belong to a part of a different size in ##\pi_2##, and this implies that ##\pi_2## has at least 4 parts of distinct cardinalities. Thus the total number of elements in ##\pi_2## must be at least ##1+2+3+4 = 10##, which contradicts the fact that ##\pi_2## is a partition of ##N## and hence covers only 9 distinct numbers. This proves that any other partition ##\pi_2## of ##N## must be such that there exist at least 2 distinct numbers ##x, y \in \{a_1, a_2, a_3, a_4\}## with ##\pi_2(x) = \pi_2(y)##. And since, by the definition of the ##a_i##'s, ##\pi_1(x) = \pi_1(y)##, we have found ##x, y## such that ##\pi_j(x) = \pi_j(y)## for ##j=1,2##.

Now consider the following cases for ##\pi_1##.

Case 1: ##\pi_1## has at least 1 part with a cardinality of 4 or more (i.e. the part is a subset of ##N## with at least 4 elements) or has at least 2 parts with the same cardinality and these equal-sized parts have 2 or more elements.

In this case, it is easy to see that there are at least 4 distinct numbers in ##N##, say ##a_1, a_2, a_3, a_4##, such that ##\pi_1(a_1) = \pi_1(a_2) = \pi_1(a_3) = \pi_1(a_4)##: either take any 4 numbers from a part with 4 or more elements, or take 2 elements from each of two equal-sized parts. This ##\pi_1## clearly satisfies the condition of claim 1.

Case 2: All parts in ##\pi_1## have a cardinality of 3 or less, with at most one part of cardinality 2 and at most one part of cardinality 3.

In this case, the parts of cardinality greater than 1 contain at most ##2+3=5## elements in total (at most one part of size 2 and at most one part of size 3), so the remaining ##9-5=4## numbers must be covered by single-element parts. Hence ##\pi_1## has at least 4 parts of cardinality 1, and here again ##\pi_1## clearly satisfies the condition of claim 1.

Cases (1) and (2) cover all possible ways to form a partition ##\pi_1## of ##N##. Since in both cases ##\pi_1## meets the condition stated in claim 1, it follows that it is not possible to have another partition ##\pi_2## of ##N## with ##\pi_2(x) \neq \pi_2(y)## for all distinct ##x, y \in N## satisfying ##\pi_1(x) = \pi_1(y)##. In other words, for any other partition ##\pi_2## of ##N##, we can find distinct ##x, y \in N## such that ##\pi_j(x) = \pi_j(y)## for ##j=1,2##. Hence proved.
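The counting step that drives the whole proof, that every partition of ##\{1,\ldots,9\}## has at least four elements sharing the same value of ##\pi(x)##, can be verified exhaustively over all 21147 set partitions (a brute-force sketch; the generator below is a standard recursive construction):

```python
from collections import Counter

def set_partitions(elems):
    # generate all set partitions of the list `elems` recursively:
    # insert the first element into each part of, or as a new part of,
    # every partition of the remaining elements
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def four_share_a_size(partition):
    # pi(x) = len(block) for every x in block; count elements per part-size
    weight = Counter()
    for block in partition:
        weight[len(block)] += len(block)
    return max(weight.values()) >= 4

assert all(four_share_a_size(p) for p in set_partitions(list(range(1, 10))))
```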

fresh_42
The well known Maschke's theorem for Problem 4 :)

Mentor
The well known Maschke's theorem for Problem 4 :)
But Maschke is easy enough to prove, fulfills the requirement to learn something, and last but not least doesn't provide a counterexample for ##\operatorname{char}\mathbb{F}\,\mid\,|G| .##

IIRC the result is that the group algebra is semisimple if and only if ##\operatorname{char}\mathbb{F} \nmid |G|##.

Mentor
IIRC the result is that the group algebra is semisimple if and only if ##\operatorname{char}\mathbb{F} \nmid |G|##.
Sure. Otherwise, there wouldn't be a counterexample.

Office_Shredder
Staff Emeritus
Gold Member
It's funny, number 3 is just a continuous version of number 2 (if your random variables take discrete values with probabilities that can be expressed as integer ratios of each other, then #3 turns into #2, I think), but somehow #3 seemed obviously true to me and #2 seemed like a crazy result when I first read it.

Anyway, here's a proof for #3.

Edit: ahh my dog posted this when I started typing. I'll remove this when I finish writing the proof.

It suffices to show for ##r>1##, that ##P(r-1 < |X-Y| \leq r) \leq 2P(|X-Y| \leq 1)##, since then
$$P(|X-Y| \leq r) = P(| X-Y| \leq 1) + \sum_{k=2}^r P(k-1 < |X-Y| \leq k) \leq P(| X-Y| \leq 1) + 2(r-1)P(| X-Y| \leq 1).$$

Mentor
Yes, and it's a bit short. I'll post my solution, which is a bit more detailed.

Fix an integer ##m,## and let ##S:=(x_1,\ldots,x_m)## be a random sequence of ##m## elements, where each ##x_i## is chosen, randomly and independently, according to the distribution of ##X.## By the previous statement (problem #2)
$$|S_r| <(2r-1)|S_1|.$$
Therefore, the expectation of ##|S_r|## is smaller than that of ##(2r-1)|S_1|.## However, by the linearity of expectation it follows that the expectation of ##|S_b|## is precisely ##m+m(m-1)p_b## for every ##b>0.## Thus
$$m+m(m-1)p_r< (2r-1)(m+m(m-1)p_1),$$
implying that for every integer ##m,##
$$p_r< (2r-1)p_1+\dfrac{2r-2}{m-1} \stackrel{m\to \infty }{\longrightarrow } (2r-1)p_1$$

It can be proven that even the strict inequality
$$\operatorname{prob}\left(|X-Y|\leq 2\right) < 3\cdot \operatorname{prob}\left(|X-Y|\leq 1\right)$$
holds, in which case we speak of the 1-2-3 theorem.
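The conclusion ##p_r \leq (2r-1)p_1## can be checked exactly on a toy discrete distribution (the support points and probabilities below are hypothetical, chosen only for illustration):

```python
# exact check of p_r <= (2r - 1) * p_1, where p_b = P(|X - Y| <= b)
# for i.i.d. X, Y on a small hand-picked discrete distribution
pts = [0.0, 0.7, 1.5, 3.2, 4.0, 9.0]    # hypothetical support
prb = [0.1, 0.25, 0.2, 0.2, 0.15, 0.1]  # probabilities (sum to 1)

def p(b):
    # p_b computed by an exact double sum over ordered pairs
    return sum(
        prb[i] * prb[j]
        for i in range(len(pts))
        for j in range(len(pts))
        if abs(pts[i] - pts[j]) <= b
    )

for r in range(2, 12):
    assert p(r) <= (2 * r - 1) * p(1) + 1e-12
```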

julian
Gold Member
Problem #8:

We have

\begin{align*}
P \left( \frac{1}{n} \sum_{k=1}^n X_k \geq a \right) & = P \left( \sum_{k=1}^n X_k \geq na \right)
\nonumber \\
& = P \left( e^{t \sum_{k=1}^n X_k} \geq e^{tna} \right)
\nonumber \\
& \leq e^{- tna} E (e^{t \sum_{k=1}^n X_k}) \qquad \qquad (\text{Chernoff bound})
\nonumber \\
& = e^{- tna} \prod_{k=1}^n E (e^{t X_k}) \qquad (\text{independent random variables}) \quad (*)
\end{align*}

Let ##E(X_k) = 0##, ##E(X_k^2) = \sigma_k^2 \leq \sigma^2## and that ##| X_k | \leq B##. We have for ##i > 2## that ##E (X_k^i) = E (X_k^{i-2} X_k^2) \leq B^{i-2} \sigma_k^2##, so that

\begin{align*}
R_k & = \sum_{i=2}^\infty \frac{t^{i-2} E(X_k^i)}{i! \sigma_k^2} \leq \sum_{i=2}^\infty \frac{t^{i-2} B^{i-2} \sigma_k^2}{i! \sigma_k^2}
\nonumber \\
& = \frac{1}{(t B)^2} \sum_{i=2}^\infty \frac{t^i B^i}{i!}
\nonumber \\
& = \frac{e^{tB} - 1 - tB}{(t B)^2}
\end{align*}

Then

\begin{align*}
E (e^{t X_k}) & = E \left( 1 + t X_k + \sum_{i=2}^\infty \frac{t^i X_k^i}{i!} \right)
\nonumber \\
& = 1 + t^2 \sigma_k^2 R_k
\nonumber \\
& \leq e^{t^2 \sigma_k^2 R_k}
\nonumber \\
& = \exp \left\{ t^2 \sigma_k^2 \frac{e^{tB} - 1 - tB}{(t B)^2} \right\}
\end{align*}

Using this in ##(*)##

\begin{align*}
P \left( \frac{1}{n} \sum_{k=1}^n X_k \geq a \right) & \leq e^{- tna} \prod_{k=1}^n \exp \left\{ t^2 \sigma_k^2 \frac{e^{tB} - 1 - tB}{(t B)^2} \right\}
\nonumber \\
& = e^{- tna} \exp \left\{ n t^2 \tilde{\sigma}^2 \frac{e^{tB} - 1 - tB}{(t B)^2} \right\} \qquad (**)
\end{align*}

(where we have defined ##\tilde{\sigma}^2 = \frac{1}{n} \sum_{k=1}^n \sigma_k^2 \leq \sigma^2##). We minimise the RHS of ##(**)## with respect to ##t##:

\begin{align*}
& \frac{d}{dt} \exp \left\{ n t^2 \tilde{\sigma}^2 \frac{e^{tB} - 1 - tB}{(tB)^2} - tna \right\}
\nonumber \\
& = \exp \left\{ n t^2 \tilde{\sigma}^2 \frac{e^{tB} - 1 - tB}{(t B)^2} - tna \right\} \, n \left( \tilde{\sigma}^2 \frac{B e^{tB} - B}{B^2} - a \right) = 0
\end{align*}

We have a tighter bound when ##t = \frac{1}{B} \log \left( \frac{aB}{\tilde{\sigma}^2} + 1 \right)##. Using this in ##(**)##:

\begin{align*}
P \left( \frac{1}{n} \sum_{k=1}^n X_k \geq a \right) & \leq \exp \left\{ n \tilde{\sigma}^2 \frac{ (\frac{aB}{\tilde{\sigma}^2} + 1) - 1 - \log \left( \frac{aB}{\tilde{\sigma}^2} + 1 \right)}{B^2} - \frac{na}{B} \log \left( \frac{aB}{\tilde{\sigma}^2} + 1 \right) \right\}
\nonumber \\
& = \exp \left\{ \frac{n \tilde{\sigma}^2}{B^2} \left[ \frac{aB}{\tilde{\sigma}^2} - \log \left( \frac{aB}{\tilde{\sigma}^2} + 1 \right) - \frac{aB}{\tilde{\sigma}^2} \log \left( \frac{aB}{\tilde{\sigma}^2} + 1 \right) \right] \right\}
\nonumber \\
& = \exp \left\{ - \frac{n \tilde{\sigma}^2}{B^2} f \left( \frac{aB}{\tilde{\sigma}^2} \right) \right\}
\end{align*}

where ##f(u) = (1+ u) \log (1+ u) - u \geq 0## for ##u \geq 0## (as ##f(0) = 0## and ##f'(u) = \log (1 + u) \geq 0##). We can use the inequality ##u - (1+u) \log (1+ u) \leq - \dfrac{3u^2}{2(u + 3)}## (##u > -1##), corresponding to problem #7, to obtain

\begin{align*}
P \left( \frac{1}{n} \sum_{k=1}^n X_k \geq a \right) & \leq \exp \left\{ - \frac{a^2}{\dfrac{2 \tilde{\sigma}^2}{n} + \dfrac{2Ba}{3n}}
\right\}
\nonumber \\
& \leq \exp \left\{ - \frac{a^2}{\dfrac{2 \sigma^2}{n} + \dfrac{2Ba}{3n}} \right\} \qquad (\sigma^2 \geq \tilde{\sigma}^2)
\end{align*}

Put

\begin{align*}
a = \sqrt{ \frac{2 \sigma^2 T}{n}} + \frac{2BT}{3n}
\end{align*}

then

\begin{align*}
a^2 = \frac{2 \sigma^2 T}{n} + 2 \sqrt{ \frac{2 \sigma^2}{n}} \frac{2B}{3n} T^{3/2} + \left( \frac{2B}{3n} \right)^2 T^2
\end{align*}

and

\begin{align*}
\frac{2 \sigma^2}{n} + \frac{2B}{3n} a = \frac{2 \sigma^2}{n} + \sqrt{ \frac{2 \sigma^2}{n}} \frac{2B}{3n} \sqrt{T} + \left( \frac{2B}{3n} \right)^2 T
\end{align*}

so that

\begin{align*}
- \frac{a^2}{\frac{2 \sigma^2}{n} + \frac{2B}{3n} a} \leq - \frac{a^2}{\frac{2 \sigma^2}{n} + \sqrt{ \frac{2 \sigma^2}{n}} \frac{2B}{3n} \sqrt{T} + \frac{2B}{3n} a} = - T
\end{align*}

So finally

\begin{align*}
P \left( \frac{1}{n} \sum_{k=1}^n X_k \geq \sqrt{ \frac{2 \sigma^2 T}{n}} + \frac{2BT}{3n} \right) \leq e^{-T} .
\end{align*}

Last edited:
fresh_42
Mentor

Very good!

I knew the Chernoff bound as the Markov inequality ...

The Markov inequality says that for a monotone increasing function ##f: \mathbb{R}\longrightarrow [0,\infty )## and a constant ##a\in \mathbb{R}##
$$f(a)\cdot P(X\geq a) \leq E(f(X))$$
and in particular for ##f(a)>0##
$$P(X\geq a) \leq \dfrac{E(f(X))}{f(a)}.$$

... but I also thought nobody would solve this one. The name of the inequality in the statement (#8) is the Bernstein inequality.
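The Markov inequality above can be verified exactly on a toy distribution, e.g. with ##f(x) = e^x## (the support and probabilities are hypothetical, chosen only for illustration):

```python
import math

# P(X >= a) <= E[f(X)] / f(a) for the monotone increasing, positive f(x) = e^x
vals = [-1.0, 0.0, 0.5, 2.0, 3.0]     # hypothetical support of X
prob = [0.2, 0.3, 0.2, 0.2, 0.1]      # probabilities (sum to 1)
a = 1.0

lhs = sum(p for v, p in zip(vals, prob) if v >= a)                  # P(X >= a)
rhs = sum(p * math.exp(v) for v, p in zip(vals, prob)) / math.exp(a)
assert lhs <= rhs
```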

Last edited: