A decision tree has an expected depth of at least log n!

In summary, the thread works through the proof of a lower bound for comparison-based sorting. A decision tree here is a binary tree that models a sorting algorithm: each internal node is a comparison between two elements and each leaf is a possible ordering of the input. Its expected depth is the average depth of the leaf reached, which equals the expected number of comparisons performed. The theorem states that if all n! permutations of the input are equally likely, then any decision tree that sorts n elements has an expected depth of at least log n!, so no comparison-based sorting algorithm can do better than about n log n comparisons even on average.
  • #1
mathmari
Hey! :eek:

I am looking at the proof of the following theorem and I have some questions.

The theorem is the following:

On the assumption that all permutations of a sequence of $n$ elements are equally likely to appear as input, any decision tree that sorts $n$ elements has an expected depth of at least $\log n!$.
The proof is the following:

Let $D(T)$ be the sum of the depths of the leaves of a binary tree $T$.

Let $D(m)$ be the smallest value of $D(T)$ taken over all binary trees $T$ with $m$ leaves.

We shall show, by induction on $m$, that $D(m) \geq m \log m$.

The basis, $m=1$, is trivial.

Now, assume the inductive hypothesis is true for all values of $m$ less than $k$.

Consider a decision tree $T$ with $k$ leaves.

$T$ consists of a root having a left subtree $T_{i}$ with $i$ leaves and a right subtree $T_{k-i}$ with $k-i$ leaves, for some $i$, $1 \leq i < k$.

Clearly, $$D(T)=i+D(T_{i})+(k-i)+D(T_{k-i})$$

Therefore, the minimum sum is given by $$D(k)=\min_{1 \leq i < k} [k+D(i)+D(k-i)] \ \ \ \ (*)$$

Invoking the inductive hypothesis, we obtain from $(*)$ $$D(k) \geq k+\min_{1 \leq i < k} [i \log i+(k-i) \log (k-i)]$$

It is easy to show that the minimum occurs at $i=k/2$. Thus $$D(k) \geq k+k \log \frac{k}{2}=k \log k$$
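As an aside (this justification is my own addition, not part of the quoted proof), the claim about the minimum follows from the convexity of $x \log x$: by Jensen's inequality applied at the points $i$ and $k-i$, whose average is $k/2$, $$i \log i + (k-i)\log (k-i) \;\geq\; 2 \cdot \frac{k}{2} \log \frac{k}{2} \;=\; k \log \frac{k}{2},$$ and substituting this into the previous bound gives $k + k\log\frac{k}{2} = k + k(\log k - 1) = k \log k$.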

We conclude that $D(m) \geq m \log m$ for all $m \geq 1$.

Now we claim that a decision tree $T$ sorting $n$ random elements has at least $n!$ leaves.

Moreover, exactly $n!$ leaves will have probability $1/n!$ each, and the remaining leaves will have probability zero.

We may remove from $T$ all vertices that are ancestors only of leaves of probability zero, without changing the expected depth of $T$.

We are thus left with a tree $T'$ having $n!$ leaves each of probability $1/n!$.

Since $D(T') \geq n! \log n!$, the expected depth of $T'$ (and hence of $T$) is at least $(1/n!)\, n! \log n!=\log n!$.

My questions are the following:

  1. Why do we want to show, by induction, that $D(m) \geq m \log m$ ??
  2. When we consider a decision tree $T$ with $k$ leaves, where $T_{i}$ is the left subtree with $i$ leaves and $T_{k-i}$ is the right subtree with $k-i$ leaves, why does it hold that $$D(T)=i+D(T_{i})+(k-i)+D(T_{k-i})$$ ??
  3. How do we conclude that the minimum sum is given by $$D(k)=\min_{1 \leq i < k} [k+D(i)+D(k-i)]$$ ??
  4. How do we obtain, from the inductive hypothesis, that $$D(k) \geq k+\min_{1 \leq i < k} [i \log i +(k-i) \log (k-i) ]$$ ??
  5. Could you explain to me the part after "Now, assume the inductive hypothesis is true for all values of $m$ less than $k$." ??

(Wondering)
 
  • #2
The goal of the proof is to show that any decision tree that sorts $n$ elements has an expected depth of at least $\log n!$. The induction on $D(m)$ is the key technical step: once we know $D(m) \geq m \log m$ for every $m$, we can apply it with $m = n!$ to bound the total, and hence the average, leaf depth of the sorting tree. That answers your first question.

To prove that $D(m) \geq m\log m$, consider a decision tree $T$ with $k$ leaves. $T$ consists of a root with a left subtree $T_i$ with $i$ leaves and a right subtree $T_{k-i}$ with $k-i$ leaves, for some $i$ with $1 \leq i < k$. Every leaf of $T_i$ is one edge deeper in $T$ than it is in $T_i$ alone, and similarly for $T_{k-i}$, so the sum of leaf depths grows by $1$ for each of the $i$ leaves on the left and each of the $k-i$ leaves on the right. Hence $$D(T)=i+D(T_i)+(k-i)+D(T_{k-i}).$$ Taking the minimum over all binary trees with $k$ leaves, and noting that the subtrees of an optimal tree must themselves be optimal, gives $$D(k)=\min_{1 \leq i < k} [k+D(i)+D(k-i)]. \ \ \ \ (*)$$ Invoking the inductive hypothesis ($D(i) \geq i \log i$ and $D(k-i) \geq (k-i)\log (k-i)$, which applies because $i$ and $k-i$ are both less than $k$), we obtain from $(*)$ $$D(k) \geq k+\min_{1 \leq i < k} [i \log i+(k-i) \log (k-i)].$$ The expression inside the minimum is smallest at $i=k/2$, so $$D(k) \geq k+k \log \frac{k}{2}=k \log k,$$ which completes the induction. We conclude that $D(m) \geq m \log m$ for all $m \geq 1$.

Now, a decision tree $T$ sorting $n$ random elements has at least $n!$ leaves, because each of the $n!$ possible input orderings must lead to a distinct leaf; for $n=3$, for instance, the tree needs at least $3!=6$ leaves. Exactly $n!$ leaves have probability $1/n!$ each, and the remaining leaves have probability zero. We may remove from $T$ all vertices that are ancestors only of leaves of probability zero without changing the expected depth of $T$. This leaves us with a tree $T'$ having $n!$ leaves, each of probability $1/n!$. Since $D(T') \geq n! \log n!$, the expected depth of $T'$ (and hence of $T$) is at least $(1/n!)\, n! \log n! = \log n!$.
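If it helps to see the recurrence concretely, here is a small Python sketch (my own check, not from the textbook) that evaluates $(*)$ directly for small $m$ and compares the result with $m \log_2 m$:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def D(m: int) -> int:
    """Smallest sum of leaf depths over all binary trees with m leaves,
    computed from the recurrence D(k) = min_{1 <= i < k} [k + D(i) + D(k - i)]."""
    if m == 1:
        return 0  # a single leaf sitting at the root has depth 0
    # i ranges over 1..m-1 so that both subtrees are nonempty
    return min(m + D(i) + D(m - i) for i in range(1, m))

# Check the induction's conclusion D(m) >= m * log2(m) for small m.
for m in range(1, 17):
    assert D(m) >= m * math.log2(m) - 1e-9
    print(f"m={m:2d}   D(m)={D(m):3d}   m*log2(m)={m * math.log2(m):6.2f}")
```

The memoised recursion mirrors the recurrence exactly, and every value it produces sits at or above $m \log_2 m$, as the induction guarantees.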
 

1. What is a decision tree?

In the setting of this theorem, a decision tree (also called a comparison tree) is a binary tree that models a comparison-based sorting algorithm: each internal node represents a comparison between two of the input elements, its two children correspond to the two possible outcomes of that comparison, and each leaf corresponds to a completed run of the algorithm, i.e. to one possible ordering of the input.

2. What is the expected depth of a decision tree?

The expected depth of a decision tree is the average depth of the leaf reached on a random input, that is, the sum over all leaves of the probability of reaching the leaf times its depth. For a sorting tree this is exactly the expected number of comparisons the algorithm performs.

3. How is the expected depth of a decision tree calculated?

When all $n!$ input permutations are equally likely, each of the $n!$ relevant leaves is reached with probability $1/n!$, so the expected depth is $(1/n!)$ times the sum of the leaf depths. Logarithms are taken base 2 because each comparison has two outcomes, so a tree of depth $d$ can distinguish at most $2^d$ different orderings.
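As a tiny concrete illustration (my own example, using one standard comparison tree for three elements, so the specific leaf depths below are an assumption about that particular tree):

```python
import math

# One standard comparison tree for sorting three elements (compare a:b first,
# then b:c, then a:c only when needed) has 3! = 6 leaves at depths 2, 3, 3, 2, 3, 3.
leaf_depths = [2, 3, 3, 2, 3, 3]

# With all 3! input orderings equally likely, each leaf is reached with probability 1/6.
expected_depth = sum(depth / math.factorial(3) for depth in leaf_depths)

print(expected_depth)                  # 2.666..., the average number of comparisons
print(math.log2(math.factorial(3)))    # 2.585..., the log(n!) lower bound from the theorem
```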

4. What does it mean if a decision tree has an expected depth of at least log n!?

It means that, under the assumption that all $n!$ input orderings are equally likely, no comparison-based sorting algorithm can make fewer than $\log n!$ comparisons on average. By Stirling's formula $\log n!$ grows like $n \log n$, so the well-known $\Omega(n \log n)$ bound for comparison sorting holds in the average case, not only the worst case.
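A quick numerical look at how the bound grows (a throwaway sketch, base-2 logarithms assumed):

```python
import math

# Compare the average-case lower bound log2(n!) with n*log2(n) for a few sizes.
for n in (4, 8, 16, 64, 256):
    lower_bound = math.log2(math.factorial(n))
    print(f"n={n:3d}   log2(n!) = {lower_bound:8.1f}   n*log2(n) = {n * math.log2(n):8.1f}")
```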

5. Why is this lower bound useful?

It marks the limit of what any comparison-based sorting method can achieve: since the expected depth of the decision tree equals the expected number of comparisons, sorting algorithms that use $O(n \log n)$ comparisons (such as mergesort) are optimal up to a constant factor, even on average. Doing better requires leaving the comparison model altogether.
