Which Fibonacci number would you like me to compute

ZakAttk1 · Apr 23, 2009

I'm given a source file:

int main()
{
bool running = true;
string pre_sum;
int fib_th;

while (running)
{
count << "Which Fibonacci number would you like me to compute ? ('q' to quit) ";
cin >> pre_sum;
fib_th = atoi(pre_sum.c_str());

if ( pre_sum[0] == 'q' )
{
running = false;
}
else if ( fib_th >= 0 )
{
if ( fib_th % 10 == 1 )
count << "The " << fib_th << "st Fibonacci number is " << fibb(fib_th) << endl;
else if ( fib_th % 10 == 2 )
count << "The " << fib_th << "nd Fibonacci number is " << fibb(fib_th) << endl;
else if ( fib_th % 10 == 3 )
count << "The " << fib_th << "rd Fibonacci number is " << fibb(fib_th) << endl;
else
count << "The " << fib_th << "th Fibonacci number is " << fibb(fib_th) << endl;
}

}

count << "Goodbye." << endl;

return 0;

}

I have to define the funtion:

int fibb(int num_in);

in a new file to have the program compute whatever fibonacci number is requester by the user.

any suggestions?
Thanks.

Borek · Apr 23, 2009

Does it have to be efficient?

ZakAttk1 · Apr 23, 2009

No, it just has to work

CRGreathouse · Apr 23, 2009

The definition of the Fibonacci sequence is that f(0) = 0, f(1) = 1, and f(n) = f(n-2) + f(n-1). Can you program that, using if statements to check the first two and a recursive call for the last one?

ZakAttk1 · Apr 23, 2009

Thanks I got it

AUMathTutor · Apr 23, 2009

You can adopt the common dynamic programming approach to solving this problem iteratively. That's a lot of jargon to say that you can use a loop and no recursion if you want to, and change the complexity from 2^n to n... a fairly substantial savings.

Mark512 · Apr 24, 2009

Another technique to improve efficiency (which is applicable to quite a few problems that are harder to optimize by simply converting to a loop of some sort) is using memoization.

The basic idea is to create a data structure that maps problem statements to solutions. In this case, you create a map from integers K to integers V and when you make a recursive call to solve for an integer K, you first check your mapping and, if an instance for K already exists in your map, you simply return V; otherwise you solve the subproblems as before.

This prevents you from solving the same problem multiple times and can result in quite a speedup.

rcgldr · Apr 24, 2009

I thought the point was to implement a simple example of recursion, even though in this case it's not the most optimal implementation. Factorial is an other algorithm used to teach recursion, although it's not optimal to implement it that way.

One case where recursion makes the code easier to implement is a tree scan, for example a file and directory sub-tree scan.

AUMathTutor · Apr 25, 2009

"I thought the point was to implement a simple example of recursion, even though in this case it's not the most optimal implementation."

How not? The recursive algorithm I'm thinking of for the factorial function doesn't duplicate any work, and therefore has time complexity O(n). Comparing...

Code:

factorial(n)
1.   if n = 0
2.      then result := 1
3.   else
4.      result := n * factorial(n-1)
5.   return result

factorial(n)
1.   result := 1
2.   for i = 1...n
3.      do result = result * i
4.   return result

I understand that there is some inefficiency associated with the call stack of a recursive function, but this does not seem like enough of an issue to seriously prefer one algorithm over the other.

rcgldr · Apr 25, 2009

AUMathTutor said:

The recursive algorithm I'm thinking of for the factorial function doesn't duplicate any work, and therefore has time complexity O(n). I understand that there is some inefficiency associated with the call stack of a recursive function, but this does not seem like enough of an issue to seriously prefer one algorithm over the other.

The overhead goes beyond just the call stack, depending on the machine. In the recursive version, the local variables are forced to be memory (stack) based, while the iterative version allows the local variables to be register based.

Then again, in the case of modern PC's, performance is so fast and memory space so large that it simply doesn't matter. In a thread on sorting algorithms, 4 million 64 bit integers (32 MB of psuedo random data) were sorted, did it really matter if one sort algorithm took .352 second versus another sorth algorithm that took .554 second, as both were less than 1 second?

Cantab Morgan · Apr 25, 2009

Jeff Reid said:

The overhead goes beyond just the call stack, depending on the machine. In the recursive version, the local variables are forced to be memory (stack) based, while the iterative version allows the local variables to be register based.

The bit about the registers is a really good insight. I would have forgotten that because normally I'm more worried about recursive algorithms exhausting memory.

However, if the compiler is worth its salt, it won't actually do the stack push. The resulting assembly language should look like an iterative solution (and be able to exploit registers).

The exception to this is when the instruction set is limited, and tail recursion is not handled well. Such is the case in the Java Virtual Machine, for example. Except for this Achilles' heel, recursion is a lovely way to organize certain programs, and should be preferred in those cases where clarity is enhanced.

Borek · Apr 26, 2009

Jeff Reid said:

Then again, in the case of modern PC's, performance is so fast and memory space so large that it simply doesn't matter. In a thread on sorting algorithms, 4 million 64 bit integers (32 MB of psuedo random data) were sorted, did it really matter if one sort algorithm took .352 second versus another sorth algorithm that took .554 second, as both were less than 1 second?

That's a poor example, as it doesn't say anything about the time complexity, although - as long as you know that your algorithm won't be used for larger data sets - spending time on fine tuning details is not worth it. I suppose that's what you meant.

Recursive Fibonacci numbers generation is an easy way of challenging your processor with a lot of work

That's an over ten years old story on the theme:

http://www.bpp.com.pl/pentium/pentium_ii.htm

rcgldr · Apr 26, 2009

Borek said:

Recursive Fibonacci numbers generation is an easy way of challenging your processor with a lot of work.
http://www.bpp.com.pl/pentium/pentium_ii.htm

Piggy little program, on my Core 2 X6800 (2.93ghz), it took 207 ticks = 11.37 seconds. Then again, it took longer to reboot my system to enable floppy (I usually keep it disabled) and boot from floppy, transfer the program to a bootable dos floppy, reboot again in dos to run the program, reboot again to turn disable floppy again, and reboot back to XP. Plus the time it took to read your post, and write my response.

... sort 4 million 64 bit integers in less than a second.

Borek said:

That's a poor example, as it doesn't say anything about the time complexity, although - as long as you know that your algorithm won't be used for larger data sets.

Those sort times were just an example that the dataset wasn't large enough to be concerned about reasonable implementations of sort algorithms.

Plus my main point here was that it appeared that this usage of recursion was just for learning purposes, not as a good example for using recursion.

tnutty · Apr 27, 2009

recursive fib function :

Code:

fib( int x)
{
   unsigned int result = 1;
    if(x == 0 || x == 1) return 1;
    else
   result +=  fib( x-2) + fib(n-1) 
   return result
}

Borek · Apr 27, 2009

tnutty said:

There are at least two semicolons missing. You should not return unsigned int when your fuction is expected to return int. You define local variable that you don't need. You assign initial value that you don't need. And finally - it is wrong, as it generates 1, 1, 3 instead of 1, 1, 2.

signerror · Apr 27, 2009

Cantab Morgan said:

However, if the compiler is worth its salt, it won't actually do the stack push. The resulting assembly language should look like an iterative solution (and be able to exploit registers).

Nope. Here's what gcc -S -O3 (highest optimization level) yields on x86, OS X:

Code:

_fac:
	pushl	%ebp
	movl	$1, %eax
	movl	%esp, %ebp
	pushl	%edi
	pushl	%esi
	subl	$48, %esp
	movl	8(%ebp), %esi
	cmpl	$1, %esi
	jle	L4
	leal	-1(%esi), %edi
	cmpl	$1, %edi
	jle	L7
	leal	-2(%esi), %eax
	movl	%eax, -36(%ebp)
	movl	$1, %eax
	cmpl	$1, -36(%ebp)
	jle	L10
	leal	-3(%esi), %eax
	movl	%eax, -32(%ebp)
	movl	$1, %eax
	cmpl	$1, -32(%ebp)
	jle	L13
	leal	-4(%esi), %eax
	movl	%eax, -28(%ebp)
	movl	$1, %eax
	cmpl	$1, -28(%ebp)
	jle	L16
	leal	-5(%esi), %eax
	movl	%eax, -24(%ebp)
	movl	$1, %eax
	cmpl	$1, -24(%ebp)
	jle	L19
	leal	-6(%esi), %eax
	movl	%eax, -20(%ebp)
	movl	$1, %eax
	cmpl	$1, -20(%ebp)
	jle	L22
	leal	-7(%esi), %eax
	movl	%eax, -16(%ebp)
	movl	$1, %eax
	cmpl	$1, -16(%ebp)
	jle	L25
	leal	-8(%esi), %eax
	movl	%eax, -12(%ebp)
	movl	$1, %eax
	cmpl	$1, -12(%ebp)
	jle	L28
	leal	-9(%esi), %eax
	movl	%eax, (%esp)
	call	_fac
	imull	-12(%ebp), %eax
L28:
	imull	-16(%ebp), %eax
L25:
	imull	-20(%ebp), %eax
L22:
	imull	-24(%ebp), %eax
L19:
	imull	-28(%ebp), %eax
L16:
	imull	-32(%ebp), %eax
L13:
	imull	-36(%ebp), %eax
L10:
	imull	%edi, %eax
L7:
	imull	%esi, %eax
L4:
	addl	$48, %esp
	popl	%esi
	popl	%edi
	leave
	ret

GCC does not convert it into an iterative algorithm, and it doesn't even move the function call to registers - all arguments and return values go through the stack. It unrolls the loop to nine levels, which takes advantage of multiple dispatch (there are 3 scalar ALUs on the Core architecture), and also reduces the number of function calls by a factor of 9. Multiple dispatch should make little performance difference, as the bottleneck is memory, not CPU time.Core architecture:

http://en.wikipedia.org/wiki/File:Intel_Core2_arch.svg

Code:

#include <stdio.h>

int fac (int n) {
  if (n <= 1) {return 1;}
  return n * fac(n-1);
}

int main(int argc, char* argv[]) {
  int input;
  sscanf(argv[1], "%u", &input);
  printf("%d\n", fac(input));
  
  return 0;
}

signerror · Apr 27, 2009

Okay, simple test. I showed the recursive factorial: here's the iterative,

Code:

int fac (int n) {
  int i, prod;
  for (i=1, prod=1; i<=n; i++) {
    prod *= n;
  }
  return prod;
}

Compiling both at -O0 and -O3, and timing:

Code:

bash-3.2$ time ./recfac3 1000000
0

real	0m0.026s
user	0m0.011s
sys	0m0.013s

bash-3.2$ time ./iterfac3 1000000
0

real	0m0.009s
user	0m0.005s
sys	0m0.004s

bash-3.2$ time ./iterfac0 1000000
0

real	0m0.013s
user	0m0.009s
sys	0m0.004s

(Ignore the obviously incorrect answer: the integers overflowed many times, and eventually stabilized on 0).

The recursive version segfaults at -O0 (stack overflows). Here it is with smaller samples (10^5, not 10^6):

Code:

bash-3.2$ time ./recfac0 100000
0

real	0m0.018s
user	0m0.007s
sys	0m0.011s
bash-3.2$ time ./recfac3 100000
0

real	0m0.008s
user	0m0.002s
sys	0m0.005s

Lots of interesting things happening here. You see the huge effects of loop unrolling (recfac0 -> recfac3), vastly reducing the number of memory accesses. You also see a large effect in CPU time ('user'), from the CPU scheduling multiple instructions to run at the same time in parallel (multiple dispatch); but this is unimportant here, dwarfed by memory time. And there is another vast speedup in making the algorithm iterative and completely in registers (iterfac0 and interfac3). This second speedup is bigger here than unrolling loops - compare iterfac0 (registers, no unrolling loops) with recfac3 (recursive, memory stack, unrolled loops).

Here is iterfac0, which is completely in registers:

Code:

_fac:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$24, %esp
	movl	$1, -16(%ebp)
	movl	$1, -12(%ebp)
	jmp	L2
L3:
	movl	-12(%ebp), %eax
	imull	8(%ebp), %eax
	movl	%eax, -12(%ebp)
	leal	-16(%ebp), %eax
	incl	(%eax)
L2:
	movl	-16(%ebp), %eax
	cmpl	8(%ebp), %eax
	jle	L3
	movl	-12(%ebp), %eax
	leave
	ret

rcgldr · Apr 27, 2009

signerror said:

Here is iterfac0, which is completely in registers: ...

All of those (%ebp) and (%eax) are memory references. Microsoft (old VC 4.0 version in this case) compiled output of same source other than the initial set edx to immediate 1, and move ecx, _n$ from the stack, only registers are used (if fastcall was used, "n" would have been passed in a register):

fixed version of fac:

Code:

int fac (int n) {
    int i, prod;
    for (i=1, prod=1; i<=n; i++) {
        prod *= i;    // i instead of n here
    }
    return prod;
}

assembly output (note instruction operand format is dst, src):

Code:

_n$ = 8
_fac    PROC NEAR
        mov     edx, 1
        mov     ecx, DWORD PTR _n$[esp-4]
        mov     eax, edx
        cmp     ecx, edx
        jl      SHORT $L81
$L79:   imul    eax, edx
        inc     edx
        cmp     edx, ecx
        jle     SHORT $L79
$L81:   ret     0
_fac    ENDP

I don't think that the Microsoft compilers will attempt to unfold loops. Also these simple examples might be good as learning tools, but not good examples of using recursion. One typical example of recursion is searching through a tree, for example, producing a list of all files and subdirectories in a partition. I usually do this with a function that has two main loops (each loop could be a sub-function), the first loop shows all the files in the current directory, the second loop then rescans the current directory looking for sub-directories and makes a recursive call.

signerror · Apr 27, 2009

You're completely right. I shouldn't comment in the middle of the night.

rcgldr · Apr 27, 2009

For fac(), you could optimize the code knowing compiler and machine language, reducing number of instructions per loop from 4 to 3, the equivalent of using C as a macro language for assembly:

Code:

int fac (int n) {
    int i, prod;
    for (i=n, prod=1; --i != 0;) {
        prod *= i;
    }
    return prod;
}

Code:

fac     PROC NEAR
_n$ = ecx
        mov     eax, 1
        dec     ecx
        je      SHORT $L81
$L80:   imul    eax, ecx
        dec     ecx
        jne     SHORT $L80
$L81:   ret     0
fac     ENDP

Unless you're doing something like computing pi to 10,000,000 digits in less than 30 seconds (PiFast43 can do this), or real time programming where response times aren't arbitrary, this level of optimization isn't needed.

Which Fibonacci number would you like me to compute

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Who May Find This Useful

Similar threads

Sweetspot of data compression

Other than just FizzBuzz to test programmer candidates

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

HTML/CSS Problems with DNS records

PHP My website presents the visitor with the choice of opting out of using cookies....

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect