As Jeff and AlephZero point out, performance depends on hardware, compiler, code, and the algorithm.
If your program targets a relatively recent x86/x86-64 machine and uses vector (packed) SSE instructions, you might see a sharp difference between 32-bit and 64-bit FP. This is because there are two...
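
As a rough illustration of why packed SSE can treat the two widths differently, here is a minimal sketch in C. It relies only on the fact that an SSE register is 128 bits wide, so one packed instruction operates on four 32-bit floats but only two 64-bit doubles. The helper names (`floats_per_register`, `doubles_per_register`, `add4_floats`, `add2_doubles`) are made up for this example, and it assumes a compiler targeting x86/x86-64 with SSE2 enabled (the default on x86-64).

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also brings in the SSE float intrinsics */
#include <stddef.h>

/* How many elements fit in one 128-bit SSE register (hypothetical helpers). */
static size_t floats_per_register(void)  { return sizeof(__m128)  / sizeof(float);  }
static size_t doubles_per_register(void) { return sizeof(__m128d) / sizeof(double); }

/* out[i] = a[i] + b[i] for four floats, using a single packed add. */
static void add4_floats(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);            /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); /* 4 additions in one instruction */
}

/* out[i] = a[i] + b[i] for two doubles, using a single packed add. */
static void add2_doubles(const double *a, const double *b, double *out)
{
    __m128d va = _mm_loadu_pd(a);           /* unaligned load of 2 doubles */
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb)); /* 2 additions in one instruction */
}
```

Each call issues one packed add, but `add4_floats` processes twice as many elements as `add2_doubles`. Whether that shows up as a measurable speed difference in a real program still depends on the hardware, compiler, and memory behavior mentioned above.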