# Which Fibonacci number would you like me to compute

I'm given a source file:

Code:
#include <iostream>
#include <string>
#include <cstdlib>

using namespace std;

int fibb(int num_in);   // to be defined in a separate file

int main()
{
    bool running = true;
    string pre_sum;
    int fib_th;

    while (running)
    {
        cout << "Which Fibonacci number would you like me to compute? ('q' to quit) ";
        cin >> pre_sum;
        fib_th = atoi(pre_sum.c_str());

        if ( pre_sum[0] == 'q' )
        {
            running = false;
        }
        else if ( fib_th >= 0 )
        {
            if ( fib_th % 10 == 1 )
                cout << "The " << fib_th << "st Fibonacci number is " << fibb(fib_th) << endl;
            else if ( fib_th % 10 == 2 )
                cout << "The " << fib_th << "nd Fibonacci number is " << fibb(fib_th) << endl;
            else if ( fib_th % 10 == 3 )
                cout << "The " << fib_th << "rd Fibonacci number is " << fibb(fib_th) << endl;
            else
                cout << "The " << fib_th << "th Fibonacci number is " << fibb(fib_th) << endl;
        }
    }

    cout << "Goodbye." << endl;

    return 0;
}

I have to define the function:

int fibb(int num_in);

in a new file so that the program computes whatever Fibonacci number is requested by the user.

Any suggestions?
Thanks.

Borek
Mentor

Does it have to be efficient?

No, it just has to work

CRGreathouse
Homework Helper

The definition of the Fibonacci sequence is that f(0) = 0, f(1) = 1, and f(n) = f(n-2) + f(n-1). Can you program that, using if statements to check the first two and a recursive call for the last one?
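That definition translates almost line for line into code. A minimal sketch (assuming the f(0) = 0, f(1) = 1 convention stated above; the assignment's fibb() may be expected to start the sequence differently, in which case adjust the base cases):

```cpp
// Direct transcription of the definition: f(0) = 0, f(1) = 1,
// f(n) = f(n-2) + f(n-1).  Exponential time, but it works.
int fibb(int num_in)
{
    if (num_in == 0)
        return 0;
    if (num_in == 1)
        return 1;
    return fibb(num_in - 2) + fibb(num_in - 1);
}
```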

Thanks, I got it.

You can adopt the common dynamic programming approach to solving this problem iteratively. That's a lot of jargon to say that you can use a loop and no recursion if you want to, and change the complexity from 2^n to n... a fairly substantial savings.
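For instance, the loop version might look like this (a sketch, again assuming the f(0) = 0, f(1) = 1 convention; only the previous two values of the sequence need to be kept around):

```cpp
// Iterative Fibonacci: O(n) time, O(1) space.
int fibb(int num_in)
{
    int prev = 0;                 // f(0)
    int curr = 1;                 // f(1)
    for (int i = 0; i < num_in; ++i)
    {
        int next = prev + curr;   // f(k+1) = f(k-1) + f(k)
        prev = curr;
        curr = next;
    }
    return prev;                  // after num_in steps, prev holds f(num_in)
}
```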

Another technique to improve efficiency (which is applicable to quite a few problems that are harder to optimize by simply converting to a loop of some sort) is using memoization.

The basic idea is to create a data structure that maps problem statements to solutions. In this case, you create a map from integers K to integers V, and when you make a recursive call to solve for an integer K, you first check your mapping: if an entry for K already exists in your map, you simply return its V; otherwise you solve the subproblems as before and record the result.

This prevents you from solving the same problem multiple times and can result in quite a speedup.
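Sketched in C++ (the names are illustrative; a std::map plays the role of the K -> V structure described above):

```cpp
#include <map>

// Memoized recursive Fibonacci: each subproblem K is solved once and its
// answer V is cached, so only O(n) distinct calls do real work.
std::map<int, long long> memo;

long long fibb(int num_in)
{
    if (num_in <= 1)
        return num_in;                      // f(0) = 0, f(1) = 1
    std::map<int, long long>::iterator it = memo.find(num_in);
    if (it != memo.end())
        return it->second;                  // K already solved: return V
    long long v = fibb(num_in - 2) + fibb(num_in - 1);
    memo[num_in] = v;                       // record the new solution
    return v;
}
```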

rcgldr
Homework Helper

I thought the point was to implement a simple example of recursion, even though in this case it's not the optimal implementation. Factorial is another algorithm used to teach recursion, although it's also not optimal to implement it that way.

One case where recursion makes the code easier to implement is a tree scan, for example a file and directory sub-tree scan.

"I thought the point was to implement a simple example of recursion, even though in this case it's not the most optimal implementation."

How not? The recursive algorithm I'm thinking of for the factorial function doesn't duplicate any work, and therefore has time complexity O(n). Comparing...

Code:
factorial(n)
1.   if n = 0
2.      then result := 1
3.   else
4.      result := n * factorial(n-1)
5.   return result

factorial(n)
1.   result := 1
2.   for i = 1...n
3.      do result = result * i
4.   return result

I understand that there is some inefficiency associated with the call stack of a recursive function, but this does not seem like enough of an issue to seriously prefer one algorithm over the other.

rcgldr
Homework Helper

The recursive algorithm I'm thinking of for the factorial function doesn't duplicate any work, and therefore has time complexity O(n). I understand that there is some inefficiency associated with the call stack of a recursive function, but this does not seem like enough of an issue to seriously prefer one algorithm over the other.
The overhead goes beyond just the call stack, depending on the machine. In the recursive version, the local variables are forced to be memory (stack) based, while the iterative version allows the local variables to be register based.

Then again, in the case of modern PCs, performance is so fast and memory so large that it simply doesn't matter. In a thread on sorting algorithms, 4 million 64-bit integers (32 MB of pseudo-random data) were sorted; did it really matter that one sort algorithm took .352 seconds versus another that took .554 seconds, when both finished in under 1 second?

Last edited:

The overhead goes beyond just the call stack, depending on the machine. In the recursive version, the local variables are forced to be memory (stack) based, while the iterative version allows the local variables to be register based.

The bit about the registers is a really good insight. I would have forgotten that because normally I'm more worried about recursive algorithms exhausting memory.

However, if the compiler is worth its salt, it won't actually do the stack push. The resulting assembly language should look like an iterative solution (and be able to exploit registers).

The exception to this is when the instruction set is limited, and tail recursion is not handled well. Such is the case in the Java Virtual Machine, for example. Except for this Achilles' heel, recursion is a lovely way to organize certain programs, and should be preferred in those cases where clarity is enhanced.
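For concreteness, here is the tail-recursive shape being discussed, using factorial (a sketch; fac_helper and the accumulator parameter are illustrative names, and whether a given compiler actually eliminates the call varies):

```cpp
// Tail-recursive factorial: the multiply happens before the recursive
// call, so nothing remains to do afterwards -- a compiler that performs
// tail-call elimination can compile this into a plain loop.
int fac_helper(int n, int acc)
{
    if (n <= 1)
        return acc;
    return fac_helper(n - 1, acc * n);   // tail call
}

int fac(int n)
{
    return fac_helper(n, 1);
}
```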

Borek
Mentor

Then again, in the case of modern PCs, performance is so fast and memory so large that it simply doesn't matter. In a thread on sorting algorithms, 4 million 64-bit integers (32 MB of pseudo-random data) were sorted; did it really matter that one sort algorithm took .352 seconds versus another that took .554 seconds, when both finished in under 1 second?

That's a poor example, as it doesn't say anything about the time complexity - although, as long as you know that your algorithm won't be used on larger data sets, spending time on fine-tuning details is not worth it. I suppose that's what you meant.

Recursive Fibonacci number generation is an easy way of challenging your processor with a lot of work. Here's an over-ten-year-old story on the theme:

http://www.bpp.com.pl/pentium/pentium_ii.htm

rcgldr
Homework Helper

Recursive Fibonacci numbers generation is an easy way of challenging your processor with a lot of work.
http://www.bpp.com.pl/pentium/pentium_ii.htm
Piggy little program - on my Core 2 X6800 (2.93 GHz) it took 207 ticks = 11.37 seconds. Then again, it took longer to reboot my system to enable the floppy drive (I usually keep it disabled), transfer the program to a bootable DOS floppy, reboot into DOS to run the program, reboot to disable the floppy again, and reboot back into XP - plus the time it took to read your post and write my response.

... sort 4 million 64 bit integers in less than a second.
That's a poor example, as it doesn't say anything about the time complexity.
Those sort times were just an example showing that the dataset wasn't large enough for the choice among reasonable sort implementations to matter.

Plus, my main point here was that this usage of recursion appeared to be just for learning purposes, not an example of when recursion is the best choice.

Last edited:

Recursive fib function:

Code:
fib( int x)
{
unsigned int result = 1;
if(x == 0 || x == 1) return 1;
else
result +=  fib( x-2) + fib(n-1)
return result
}

Borek
Mentor

Code:
fib( int x)
{
unsigned int result = 1;
if(x == 0 || x == 1) return 1;
else
result +=  fib( x-2) + fib(n-1)
return result
}

:yuck: There are at least two semicolons missing. You should not return unsigned int when your function is expected to return int. You define a local variable that you don't need. You assign an initial value that you don't need. And finally - it is wrong, as it generates 1, 1, 3 instead of 1, 1, 2.
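Putting those corrections together, the function might look like this (a sketch; still the exponential-time version, and still using this thread's 1, 1, 2, 3, ... convention):

```cpp
// Corrected recursive version: returns int as declared, no redundant
// local variable, and fib(x - 1) rather than fib(n - 1).
int fib(int x)
{
    if (x == 0 || x == 1)
        return 1;
    return fib(x - 2) + fib(x - 1);
}
```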

However, if the compiler is worth its salt, it won't actually do the stack push. The resulting assembly language should look like an iterative solution (and be able to exploit registers).

Nope. Here's what gcc -S -O3 (highest optimization level) yields on x86, OS X:

Code:
_fac:
	pushl	%ebp
	movl	$1, %eax
	movl	%esp, %ebp
	pushl	%edi
	pushl	%esi
	subl	$48, %esp
	movl	8(%ebp), %esi
	cmpl	$1, %esi
	jle	L4
	leal	-1(%esi), %edi
	cmpl	$1, %edi
	jle	L7
	leal	-2(%esi), %eax
	movl	%eax, -36(%ebp)
	movl	$1, %eax
	cmpl	$1, -36(%ebp)
	jle	L10
	leal	-3(%esi), %eax
	movl	%eax, -32(%ebp)
	movl	$1, %eax
	cmpl	$1, -32(%ebp)
	jle	L13
	leal	-4(%esi), %eax
	movl	%eax, -28(%ebp)
	movl	$1, %eax
	cmpl	$1, -28(%ebp)
	jle	L16
	leal	-5(%esi), %eax
	movl	%eax, -24(%ebp)
	movl	$1, %eax
	cmpl	$1, -24(%ebp)
	jle	L19
	leal	-6(%esi), %eax
	movl	%eax, -20(%ebp)
	movl	$1, %eax
	cmpl	$1, -20(%ebp)
	jle	L22
	leal	-7(%esi), %eax
	movl	%eax, -16(%ebp)
	movl	$1, %eax
	cmpl	$1, -16(%ebp)
	jle	L25
	leal	-8(%esi), %eax
	movl	%eax, -12(%ebp)
	movl	$1, %eax
	cmpl	$1, -12(%ebp)
	jle	L28
	leal	-9(%esi), %eax
	movl	%eax, (%esp)
	call	_fac
	imull	-12(%ebp), %eax
L28:
	imull	-16(%ebp), %eax
L25:
	imull	-20(%ebp), %eax
L22:
	imull	-24(%ebp), %eax
L19:
	imull	-28(%ebp), %eax
L16:
	imull	-32(%ebp), %eax
L13:
	imull	-36(%ebp), %eax
L10:
	imull	%edi, %eax
L7:
	imull	%esi, %eax
L4:
	addl	$48, %esp
	popl	%esi
	popl	%edi
	leave
	ret

GCC does not convert it into an iterative algorithm, and it doesn't even move the function call to registers - all arguments and return values go through the stack. It unrolls the loop to nine levels, which takes advantage of multiple dispatch (there are 3 scalar ALUs on the Core architecture), and also reduces the number of function calls by a factor of 9. Multiple dispatch should make little performance difference, as the bottleneck is memory, not CPU time.

Core architecture: http://en.wikipedia.org/wiki/File:Intel_Core2_arch.svg

Code:
#include <stdio.h>

int fac (int n) {
    if (n <= 1) { return 1; }
    return n * fac(n-1);
}

int main(int argc, char* argv[]) {
    int input;
    sscanf(argv[1], "%u", &input);
    printf("%d\n", fac(input));
    return 0;
}

Last edited:

Okay, simple test. I showed the recursive factorial; here's the iterative:

Code:
int fac (int n) {
    int i, prod;
    for (i=1, prod=1; i<=n; i++) {
        prod *= n;
    }
    return prod;
}

Compiling both at -O0 and -O3, and timing:

Code:
bash-3.2$ time ./recfac3 1000000
0

real	0m0.026s
user	0m0.011s
sys	0m0.013s

bash-3.2$ time ./iterfac3 1000000
0

real	0m0.009s
user	0m0.005s
sys	0m0.004s

bash-3.2$ time ./iterfac0 1000000
0

real	0m0.013s
user	0m0.009s
sys	0m0.004s

(Ignore the obviously incorrect answer: the integers overflowed many times, and eventually stabilized on 0).

The recursive version segfaults at -O0 (stack overflows). Here it is with smaller samples (10^5, not 10^6):

Code:
bash-3.2$ time ./recfac0 100000
0

real	0m0.018s
user	0m0.007s
sys	0m0.011s

bash-3.2$ time ./recfac3 100000
0

real	0m0.008s
user	0m0.002s
sys	0m0.005s

Lots of interesting things happening here. You see the huge effect of loop unrolling (recfac0 -> recfac3), vastly reducing the number of memory accesses. You also see a large effect in CPU time ('user') from the CPU scheduling multiple instructions to run in parallel (multiple dispatch); but this is unimportant here, dwarfed by memory time. And there is another vast speedup in making the algorithm iterative and completely in registers (iterfac0 and iterfac3). This second speedup is bigger here than loop unrolling - compare iterfac0 (registers, no loop unrolling) with recfac3 (recursive, memory stack, unrolled loops).

Here is iterfac0, which is completely in registers:

Code:
_fac:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$24, %esp
	movl	$1, -16(%ebp)
	movl	$1, -12(%ebp)
	jmp	L2
L3:
	movl	-12(%ebp), %eax
	imull	8(%ebp), %eax
	movl	%eax, -12(%ebp)
	leal	-16(%ebp), %eax
	incl	(%eax)
L2:
	movl	-16(%ebp), %eax
	cmpl	8(%ebp), %eax
	jle	L3
	movl	-12(%ebp), %eax
	leave
	ret

Last edited:

rcgldr
Homework Helper

Here is iterfac0, which is completely in registers: ...

All of those (%ebp) and (%eax) operands are memory references. Here is Microsoft's compiled output (from the old VC 4.0 in this case) for the same source; other than the initial load of the immediate 1 into edx and the load of _n$ from the stack into ecx, only registers are used (if fastcall had been used, "n" would have been passed in a register):

fixed version of fac:

Code:
int fac (int n) {
int i, prod;
for (i=1, prod=1; i<=n; i++) {
prod *= i;    // i instead of n here
}
return prod;
}

assembly output (note instruction operand format is dst, src):

Code:
_n$ = 8
_fac    PROC NEAR
        mov     edx, 1
        mov     ecx, DWORD PTR _n$[esp-4]
        mov     eax, edx
        cmp     ecx, edx
        jl      SHORT $L81
$L79:   imul    eax, edx
        inc     edx
        cmp     edx, ecx
        jle     SHORT $L79
$L81:   ret     0
_fac    ENDP

I don't think that the Microsoft compilers will attempt to unroll loops. Also, these simple examples might be good as learning tools, but they're not good examples of when to use recursion. One typical example of recursion is searching through a tree - for example, producing a list of all files and subdirectories in a partition. I usually do this with a function that has two main loops (each loop could be a sub-function): the first loop shows all the files in the current directory, and the second loop rescans the current directory looking for subdirectories and makes a recursive call for each one.
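That two-loop structure can be sketched with a toy in-memory tree standing in for the real directory API (which is platform-specific); Entry, scan, and the field names here are all made up for illustration:

```cpp
#include <string>
#include <vector>

// Toy stand-in for a directory tree: an Entry is a file or a directory.
struct Entry {
    std::string name;
    bool is_dir;
    std::vector<Entry> children;   // empty for plain files
};

// First loop: report every file in the current directory.
// Second loop: revisit the directory and recurse into each subdirectory.
void scan(const Entry& dir, const std::string& path,
          std::vector<std::string>& out)
{
    for (size_t i = 0; i < dir.children.size(); ++i)
        if (!dir.children[i].is_dir)
            out.push_back(path + "/" + dir.children[i].name);

    for (size_t i = 0; i < dir.children.size(); ++i)
        if (dir.children[i].is_dir)
            scan(dir.children[i], path + "/" + dir.children[i].name, out);
}
```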

Last edited:

You're completely right. I shouldn't comment in the middle of the night.

rcgldr
Homework Helper

For fac(), you could optimize the code knowing the compiler and machine language, reducing the number of instructions per loop from 4 to 3 - the equivalent of using C as a macro language for assembly:

Code:
int fac (int n) {
    int i = n, prod = 1;
    if (i != 0) {
        do {
            prod *= i;
        } while (--i != 0);
    }
    return prod;
}

Code:
fac     PROC NEAR
_n$ = ecx
        mov     eax, 1
        test    ecx, ecx
        je      SHORT $L81
$L80:   imul    eax, ecx
        dec     ecx
        jne     SHORT $L80
$L81:   ret     0
fac     ENDP

Unless you're doing something like computing pi to 10,000,000 digits in less than 30 seconds (PiFast43 can do this), or real-time programming where response-time requirements are strict, this level of optimization isn't needed.