Is it the algorithm the optimal one?

evinda · Nov 1, 2016

Hello! (Wave)

I want to write an optimal algorithm that solves the following problem:

Someone chooses a number from the set $\{ 1, \dots,1000\}$ and I have to find it. I can ask if the number that was chosen is $< , > , \leq $ or $ \geq $ from a number and the possible answers are yes or no.
Also I want to show that my algorithm is optimal.

I have written the following algorithm:

Code:

Algorithm {
 int low=1, high=1000;
 k=Search(low,high);
 printf("The number that was chosen is %d", &k);
 return;
}

Code:

Search(low,high){
 char ans;
 if (high<low) return 0;
 int mid=low+floor((high-low)/2);
 printf("Is the number equal to %d ?", &mid);
 scanf("%c",&ans);
 if (ans=='yes') return mid;
 else {
       printf("Is the number <= %d", &mid);
       scanf("%c",&ans);
       if (ans=='yes') Search(low,mid-1);
       else Search(mid+1,high);
 }
}

But is my algorithm optimal? If so, how can we prove it? (Thinking)

evinda · Nov 3, 2016

Do we have to show that the time complexity is $O(\log n)$ and that there is no algorithm with lower time complexity that solves the problem? (Thinking)

Evgeny.Makarov · Nov 3, 2016

evinda said:

But is my algorithm optimal?

Optimal in what sense? Its working time presumably includes the time it takes the user to answer questions.

If you count the number of questions, it's probably not optimal because it asks roughly $2\lfloor\log_2n\rfloor$ questions instead of $\lfloor\log_2n\rfloor$.

evinda said:

If so, how can we prove it?

The algorithm should be able to return any number between 1 and 1000. If you write its execution as a binary tree where each node corresponds to a question it asks, then this tree must have at least 1000 leaves. This imposes a lower bound on the tree height, i.e., on the number of questions asked along each execution path.

evinda · Nov 3, 2016

Evgeny.Makarov said:

Optimal in what sense? Its working time presumably includes the time it takes the user to answer questions.

If you count the number of questions, it's probably not optimal because it asks roughly $2\lfloor\log_2n\rfloor$ questions instead of $\lfloor\log_2n\rfloor$.

Yes, we count the number of questions and we don't look at the time it takes the user to answer questions.

Evgeny.Makarov said:

The algorithm should be able to return any number between 1 and 1000. If you write its execution as a binary tree where each node corresponds to a question it asks, then this tree must have at least 1000 leaves. This imposes a lower bound on the tree height, i.e., on the number of questions asked along each execution path.

So at the algorithm, we consider that we have a binary tree with $1000$ nodes, the root will be $500$ , the left subtree will contain all numbers smaller than $500$ and greater or equal to $1$ and the right subtree will contain all numbers greater than $500$ and smaller or equal to $1000$. Right?Then the first level will contain $2^0$ leaves, the second $2^1$, the second $2^2$ and so on. How can we find the height of the tree?

Evgeny.Makarov · Nov 3, 2016

evinda said:

So at the algorithm, we consider that we have a binary tree with $1000$ nodes

Not nodes, but leaves.

evinda said:

the root will be $500$ , the left subtree will contain all numbers smaller than $500$ and greater or equal to $1$ and the right subtree will contain all numbers greater than $500$ and smaller or equal to $1000$.

Yes. Paths from the root to leaves represent possible executions of the algorithms and internal nodes represent questions. Each question splits an execution into two. Leaves represent the answer returned by the algorithm. To produce a certain number of different answers (leaves) the algorithm cannot make fewer than a certain number of comparisons (internal nodes).

This idea is used in the book Aho, Hopcroft, Ullman, "The Design and Analysis of Computer Algorithm", section 3.3, to prove the lower bound on the complexity of sorting.

evinda · Nov 3, 2016

Evgeny.Makarov said:

Not nodes, but leaves.
Yes. Paths from the root to leaves represent possible executions of the algorithms and internal nodes represent questions. Each question splits an execution into two. Leaves represent the answer returned by the algorithm. To produce a certain number of different answers (leaves) the algorithm cannot make fewer than a certain number of comparisons (internal nodes).

This idea is used in the book Aho, Hopcroft, Ullman, "The Design and Analysis of Computer Algorithm", section 3.3, to prove the lower bound on the complexity of sorting.

I see. At the algorithm do we assume that our binary tree is sorted or do we sort it ourselves?

So if someone has chosen a number, we start from the root of the binary tree, that is we ask if the number is $\geq 500$. If the answer is yes, we continue at the leaf of the right subtree. Otherwise at the leaf of the left subtree.
These nodes represent again a question and taking into consideration the answer we will continue either with the leaf of the left or of the right subtree. We continue in the same way, so we will visit $h+1$ leaves, and so this will be the time complexity of the algorithm.

Right?

Evgeny.Makarov · Nov 3, 2016

evinda said:

At the algorithm do we assume that our binary tree is sorted or do we sort it ourselves?

No tree exists as a data structure, so there is nothing to sort. The tree we are talking about is an abstract concept. It represents all possible execution paths of the algorithm you wrote. It is determined by the algorithm.

evinda said:

So if someone has chosen a number, we start from the root of the binary tree, that is we ask if the number is $\geq 500$. If the answer is yes, we continue at the leaf of the right subtree.

At the root of the right subtree.

evinda said:

These nodes represent again a question and taking into consideration the answer we will continue either with the leaf of the left or of the right subtree.

Roots of subtrees.

evinda said:

We continue in the same way, so we will visit $h+1$ leaves

Nodes (internal ones). Also, it is not clear what $h$ is, namely, is this phrase the definition of $h$ or some statement to prove?

evinda · Nov 3, 2016

Evgeny.Makarov said:

No tree exists as a data structure, so there is nothing to sort. The tree we are talking about is an abstract concept. It represents all possible execution paths of the algorithm you wrote. It is determined by the algorithm.

A ok. There is not a unique tree that we could use. There are a lot of possible trees. Right?

Evgeny.Makarov said:

At the root of the right subtree.

Roots of subtrees.

Ok.

Evgeny.Makarov said:

Nodes (internal ones). Also, it is not clear what $h$ is, namely, is this phrase the definition of $h$ or some statement to prove?

With $h$ I meant the height of the tree, i.e. the longest path from the root to a leaf.

Evgeny.Makarov · Nov 3, 2016

evinda said:

There is not a unique tree that we could use. There are a lot of possible trees. Right?

No, please re-read my previous response.

evinda said:

With $h$ I meant the height of the tree, i.e. the longest path from the root to a leaf.

Yes, the tree height is the worst-case complexity.

evinda · Nov 3, 2016

Evgeny.Makarov said:

No, please re-read my previous response.

Ok.

Evgeny.Makarov said:

Yes, the tree height is the worst-case complexity.

And we know that $h \geq \log{(n!)}=O(\log{n})$, right?Isn't the algorithm that we have the same as the one I have written at my initial post?

Evgeny.Makarov · Nov 3, 2016

evinda said:

And we know that $h \geq \log{(n!)}=O(\log{n})$, right?

No, where did $n!$ come from? And $\log{(n!)}$ is not $O(\log{n})$.

evinda said:

Isn't the algorithm that we have the same as the one I have written at my initial post?

In this thread, the algorithm from post #1 is the only one I was talking about. I just mentioned the book as having a similar proof.

evinda · Nov 4, 2016

Evgeny.Makarov said:

No, where did $n!$ come from? And $\log{(n!)}$ is not $O(\log{n})$.

In this case, $h$ is such that $\sum_{i=0}^h 2^i=1000$. Right?

Evgeny.Makarov · Nov 4, 2016

evinda said:

In this case, $h$ is such that $\sum_{i=0}^h 2^i=1000$.

$\sum_{i=0}^h 2^i$ is the number of all nodes (vertices) in a tree, and 1000 is the number of leaves (nodes with no children).

Lemma 3.1 and (an analog of) Theorem 3.4 from the book are also true here.

Lemma 3.1. A binary tree of height $h$ contains at most $2^h$ leaves.

Therefore,

Theorem 3.4. The height any decision tree guessing any number out of $n$ possible numbers is at least $\log n$.

evinda · Nov 4, 2016

Evgeny.Makarov said:

$\sum_{i=0}^h 2^i$ is the number of all nodes (vertices) in a tree, and 1000 is the number of leaves (nodes with no children).

Lemma 3.1 and (an analog of) Theorem 3.4 from the book are also true here.

Lemma 3.1. A binary tree of height $h$ contains at most $2^h$ leaves.

Therefore,

Theorem 3.4. The height any decision tree guessing any number out of $n$ possible numbers is at least $\log n$.

So we have $1000$ leaves.

$2^h= 1000 \Rightarrow h=\log_{2}{(1000)}$

And so we deduce that the algorithm needs to ask $h=O(\log_2{(1000)})$ questions in order to find the number, right?

Evgeny.Makarov · Nov 4, 2016

evinda said:

$2^h= 1000 \Rightarrow h=\log_{2}{(1000)}$

As a tree height, $h$ is an integer, so this cannot happen.

evinda said:

And so we deduce that the algorithm needs to ask $h=O(\log_2{(1000)})$ questions in order to find the number, right?

$O(\log_2{(1000)})=O(1)=O(c)$ for any constant $c$.

Let $\mathcal{A}$ be any algorithm that asks yes/no questions in order guess any number between $1$ and $n$. Let $q_{\mathcal{A}}(k)$ be the number of question it takes $\mathcal{A}$ to guess $k$. Then $q_{\mathcal{A}}(k)$ is the length of path from the root of the decision tree to the leaf $k$. The theorem above says that $\max\limits_{k=1,\dots,n}q_{\mathcal{A}}(k)\ge\lceil\log_2n\rceil$. If you write an algorithm that asks at most $\lceil\log_2n\rceil$ questions, then it will be optimal not just in big-O sense.

evinda · Nov 4, 2016

Evgeny.Makarov said:

As a tree height, $h$ is an integer, so this cannot happen.

Applying Lemma 3.1 we have that [m] number of leaves $\leq 2^h \Rightarrow 1000 \leq 2^h \Rightarrow h \geq \log_2{1000}$[/m], right?

Evgeny.Makarov said:

$O(\log_2{(1000)})=O(1)=O(c)$ for any constant $c$.

Oh yes, right. So at the algorithm we have to pick $n$ as the highest number, right?We have to use an other inequality for the height of the decision tree in order to find a lower bound, or not?

Evgeny.Makarov said:

Let $\mathcal{A}$ be any algorithm that asks yes/no questions in order guess any number between $1$ and $n$. Let $q_{\mathcal{A}}(k)$ be the number of question it takes $\mathcal{A}$ to guess $k$. Then $q_{\mathcal{A}}(k)$ is the length of path from the root of the decision tree to the leaf $k$. The theorem above says that $\max\limits_{k=1,\dots,n}q_{\mathcal{A}}(k)\ge\lceil\log_2n\rceil$. If you write an algorithm that asks at most $\lceil\log_2n\rceil$ questions, then it will be optimal not just in big-O sense.

I see.

Evgeny.Makarov · Nov 4, 2016

evinda said:

Applying Lemma 3.1 we have that number of leaves $\leq 2^h \Rightarrow 1000 \leq 2^h \Rightarrow h \geq \log_2{1000}$, right?

Yes. This is (an analog of) theorem 3.4 from post 13.

evinda said:

So at the algorithm we have to pick $n$ as the highest number, right?

I am not sure what you are trying to say. The best way to clear an ambiguity is to use a precise statement. "Have to" is not a mathematical statement.

evinda said:

We have to use an other inequality for the height of the decision tree in order to find a lower bound, or not?

What other inequality? The lower bound on $h$ (the number of questions in the worst case) is established in your first quote above.

evinda · Nov 5, 2016

Evgeny.Makarov said:

Yes. This is (an analog of) theorem 3.4 from post 13.

Ok.

Evgeny.Makarov said:

I am not sure what you are trying to say. The best way to clear an ambiguity is to use a precise statement. "Have to" is not a mathematical statement.

What other inequality? The lower bound on $h$ (the number of questions in the worst case) is established in your first quote above.

I am sorry. I re-looked my algorithm and what I said makes no sense.So we want to find the number of questions that the function [m]Search[/m] of my initial post makes.

Let $n=high-low+1$ and suppose that we denote with $T(n)$ the number of questions that the functions makes.

For $n=1$, i.e. when [m]high=low[/m] , then [m]mid=low[/m], the algorithm asks the question [m]Is the number equal to mid ?[/m] , the answer will be yes and so the function [m]Search[/m] will return the number mid.
So we have $T(1)=1$.

Suppose that [m]high>low[/m].
Then at the worst case the answer of the question [m]Is the number equal to mid ?[/m] is no.
Then the function will ask the question [m]Is the number <=mid? [/m].
So $2$ questions are asked.
Then the function calls itself either with the parameters (low,mid-1) or with the parameters (mid+1, high).
So the new value of $n$ is either $\lfloor \frac{high-low}{2}\rfloor-low$ or $high-\lfloor \frac{high-low}{2}\rfloor$.

So at the first case, the number of questions is given by the recurrence relation $T(n)=2+T{\left( \lfloor \frac{n-1}{2} \rfloor-low\right)}$ and at the second case by the recurrence relation $T(n)=2+T{\left(high- \lfloor \frac{n-1}{2} \rfloor\right)}$.

Right?

Evgeny.Makarov · Nov 6, 2016

I agree, but I think it is better to write the recurrence relation without using $low$ and $high$, i.e., only using $n$.

Also, please re-read post #3 about optimality.

evinda · Nov 6, 2016

Evgeny.Makarov said:

I agree, but I think it is better to write the recurrence relation without using $low$ and $high$, i.e., only using $n$.

Can we substite [m] low [/m] by $1$ and [m]high[/m] by $1000$ ?

Evgeny.Makarov said:

Also, please re-read post #3 about optimality.

Evgeny.Makarov said:

If you count the number of questions, it's probably not optimal because it asks roughly $2\lfloor\log_2n\rfloor$ questions instead of $\lfloor\log_2n\rfloor$.

How do we deduce it that the function asks roughly $2\lfloor\log_2n\rfloor$ questions instead of $\lfloor\log_2n\rfloor$?
Is there an other way to find the number of questions rather than solving the recurrence relations?

We could equivalently write the function as follows, right?

Code:

Search(low,high){
 char ans;
 if (high=low) return low;
 int mid=low+floor((high-low)/2);
 printf("Is the number <= %d", &mid);
 scanf("%c",&ans);
 if (ans=='yes') Search(low,mid);
 else Search(mid+1,high);
 }
}

evinda · Nov 7, 2016

The number of questions of the newest version of the function [m]Search[/m] that I have written at the previous post is given by the following recurrence relation:

$$ T(n)=\left\{\begin{matrix}
0 &, n=1 \\
1+\max\{T{\left( \lfloor \frac{n-1}{2}\rfloor+1 \right )}, T{\left( n-1-\lfloor \frac{n-1}{2}\rfloor \right )}\} & , n>1
\end{matrix}\right.$$
It holds that $T(2^i)=i$.

By setting $i=\log_2 n$ we get that $T(n)=\log_2{n}$. But can we just set it?

Is it the algorithm the optimal one?

Related to Is it the algorithm the optimal one?

1. What is an algorithm?

2. How do you determine if an algorithm is optimal?

3. Can an algorithm be improved or changed to become more optimal?

4. What are some common factors that affect the optimality of an algorithm?

5. How do algorithms impact our daily lives?

Similar threads

Hot Threads

Recent Insights