In line vs. In function produces different results in C++

In summary, the conversation discusses the speed difference between using the function Hamiltonian(q,b) and copying the code directly into the main statement. The discussion includes potential reasons for this difference, such as the compiler's optimization and the number of variables in registers. It is suggested to use optimization and to put the garbage collection in a separate function. The conversation also mentions the use of global variables and the issue of bad allocation errors.
  • #1
maverick_starstrider
Hi, I'm working on a code where essentially I have two very large arrays q and b, and I have this function Hamiltonian(double * v1, double * v2) which takes pointers to the two arrays and does all sorts of things with them. Anyway, if I write my code using Hamiltonian(q,b), it runs FASTER than if I just copy and paste everything that was in the function Hamiltonian and plunk it in my main function. Why should this be? Doesn't the compiler just put it inline anyway when it is compiling to assembly?
 
  • #2
No, when it compiles the code it separates out the function and calls it when it's needed. Inlining only saves you the cost of putting the arguments on the stack, doing a jump to the function and reading them back - basically a clock cycle on a modern CPU.

I can't see a reason that putting the code into the main function would speed things up - did you only test it once? It could just have the contents cached?

ps. Unless you are doing something dumb with the addresses you pass, like copying the values into another array one at a time.
 
  • #3
mgb_phys said:
No, when it compiles the code it separates out the function and calls it when it's needed. Inlining only saves you the cost of putting the arguments on the stack, doing a jump to the function and reading them back - basically a clock cycle on a modern CPU.

I can't see a reason that putting the code into the main function would speed things up - did you only test it once? It could just have the contents cached?

ps. Unless you are doing something dumb with the addresses you pass, like copying the values into another array one at a time.

The gist of the relevant part of my main statement is:

void Hamiltonian(double * v1, double * v2){
    // blah blah blah
}

int main(){
    // q and b must be pointers to hold the result of new[]
    double * q = new double[bigNumber];
    double * b = new double[bigNumber];
    Hamiltonian(q,b);
    // *garbage collection*
}
It runs faster than when I take blah blah blah and just plop it in the code (replacing all references to v1 and v2 with q and b, obviously). And I've run both versions about 10 times and there is a noticeable speedup.

P.S. it's when my code uses the function that it SPEEDS UP. When I just plop it in the code IT IS SLOWER. Thus my conundrum.
 
  • #4
I suspect that either there's something funny going on with your compiler's optimization (try using maximum optimization) or there's something going on in the "blah blah blah" part (maybe the inlined version does not do the same thing as the other version, due to some mistake).
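One way to rule out the second possibility is to run both versions on the same input and compare the results before timing anything. The sketch below is only an illustration: hamiltonian_fn, its loop body, and same_results are hypothetical stand-ins, since the real "blah blah blah" is not shown in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the body of Hamiltonian(v1, v2); the real body
// is not shown in the thread.
void hamiltonian_fn(double* v1, const double* v2, std::size_t n) {
    assert(n == 0 || (v1 != nullptr && v2 != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        v1[i] = 0.5 * v1[i] + 2.0 * v2[i];
    }
}

// Runs the function version and a hand-pasted copy of its body on identical
// inputs and reports whether every element of the two results matches.
bool same_results(std::size_t n) {
    std::vector<double> q_fn(n), q_pasted(n), b(n);
    for (std::size_t i = 0; i < n; ++i) {
        q_fn[i] = q_pasted[i] = static_cast<double>(i);
        b[i] = 1.0 / static_cast<double>(i + 1);
    }

    // Version 1: call the function.
    hamiltonian_fn(q_fn.data(), b.data(), n);

    // Version 2: the same body pasted in place, with v1 -> q and v2 -> b.
    for (std::size_t i = 0; i < n; ++i) {
        q_pasted[i] = 0.5 * q_pasted[i] + 2.0 * b[i];
    }

    return q_fn == q_pasted;
}
```

If a check like this fails for the real code, the two versions are not doing the same work, and a speed comparison between them means little.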
 
  • #5
There are only a handful of CPU registers that can be used as variables. When you plop all code in line in the same block, the compiler assumes all variables must be available to all statements, so it can't trade-off the usage of CPU registers. So it runs out of CPU registers and has to start using RAM for variable space (which ends up being L1 cache unless you're messing with enough data to fill it up, then it goes to the L2 cache, then main storage). When you call a function, local variables are pushed on the stack once before the call, then the function has all those CPU registers free to use, then when the function returns the variables are popped off the stack back into the CPU registers. This doesn't mean functions are always faster! It's just that in your case the number of variables needed, and the amount of time it takes to push/pop etc. all worked out that it happened to be faster to push the variables so that the function had more CPU registers to use. That's why it's usually best to let the optimizer decide whether to inline functions.
 
  • #6
Actually, fleem, an optimizing compiler will detect when a variable does not need to be in a register. http://en.wikipedia.org/wiki/Register_allocation

Anyway, in this case that wouldn't be an issue even when no optimization is attempted, since (from what he has written) there does not appear to be a significant number of extra variables in the main() frame. However, he didn't even fill in the arguments to main, so I think there may be a lot he's not telling us.
 
  • #7
Depends on the compiler and the cpu. In this case it appears that you end up with more variables in registers than memory if you use the function. You could try inlining the main function and putting the garbage collection stuff in another function.
 
  • #8
mXSCNT said:
Actually, fleem, an optimizing compiler will detect when a variable does not need to be in a register. http://en.wikipedia.org/wiki/Register_allocation

Anyway, in this case that wouldn't be an issue even when no optimization is attempted, since (from what he has written) there does not appear to be a significant number of extra variables in the main() frame. However, he didn't even fill in the arguments to main, so I think there may be a lot he's not telling us.

Oh ya. There's a lot of variables in my main. And I actually never create any new variables in my functions. All my functions just manipulate global variables and return void. I do this because when I said q and b were really large, they're actually going to be as large as I can possibly allocate on a given node. Thus I can't have my functions need a bunch more space, because there might be no room and my code might throw a bad allocation error in the middle of its run (rather than at the beginning, like it does now).

And yes. My code is actually over 1000 lines so I just wrote down what was relevant to the specific problem.

Anywho, thanks for the help.
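For readers following along, the "allocate everything at the start, then let the functions work on what is already there" approach can be sketched roughly as below. This is an illustration under assumptions, not the actual code: allocate_or_null, scale_in_place and allocate_then_run are hypothetical names, and the arithmetic is a placeholder. Note that on systems that overcommit memory, a successful allocation does not by itself guarantee that the memory is usable later.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical helper: tries to allocate n doubles and returns an empty
// pointer instead of throwing std::bad_alloc when the request fails.
std::unique_ptr<double[]> allocate_or_null(std::size_t n) {
    return std::unique_ptr<double[]>(new (std::nothrow) double[n]);
}

// Hypothetical worker in the style described above: it uses only the
// buffers it is given and allocates nothing itself.
void scale_in_place(double* q, const double* b, std::size_t n) {
    assert(n == 0 || (q != nullptr && b != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        q[i] *= b[i];
    }
}

// Allocates both arrays before any work starts. Returns false if either
// request fails, so the failure shows up at startup rather than mid-run.
bool allocate_then_run(std::size_t n) {
    std::unique_ptr<double[]> q = allocate_or_null(n);
    std::unique_ptr<double[]> b = allocate_or_null(n);
    if (!q || !b) {
        return false;
    }
    for (std::size_t i = 0; i < n; ++i) {
        q[i] = 1.0;
        b[i] = 2.0;
    }
    scale_in_place(q.get(), b.get(), n);
    return true;  // both arrays are released automatically here
}
```

The same effect can be had with globals, as in the post; passing the buffers as arguments simply keeps the "no allocation inside the worker" rule visible in the function signature.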
 
  • #9
mXSCNT said:
Actually, fleem, an optimizing compiler will detect when a variable does not need to be in a register. http://en.wikipedia.org/wiki/Register_allocation

Anyway, in this case that wouldn't be an issue even when no optimization is attempted, since (from what he has written) there does not appear to be a significant number of extra variables in the main() frame. However, he didn't even fill in the arguments to main, so I think there may be a lot he's not telling us.

The optimizer often fails to do this reasonably when there are loops in the code for which the optimizer cannot know how many iterations there will be. Or the code is simply too complex (too much indirection, variable array indexes, etc.) for the optimizer to keep track of usage.
 
  • #10
Well, if any optimization is performed I'm sure the arguments to main() will not go in registers since they are not used at all, and then (assuming the "blah blah blah" part is accurately transcribed and there aren't any extra variables above that point which maverick_starstrider didn't mention) there is the same number of variables in the inlined version as there is in the function call.
 
  • #11
mXSCNT said:
Well, if any optimization is performed I'm sure the arguments to main() will not go in registers since they are not used at all, and then (assuming the "blah blah blah" part is accurately transcribed and there aren't any extra variables above that point which maverick_starstrider didn't mention) there is the same number of variables in the inlined version as there is in the function call.

Yes, all my functions only manipulate global variables that were declared at the very start of the code.
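As an illustration of what a void function that only touches globals can look like, and of a variant that does its work in a local variable, here is a hypothetical sketch. The names g_energy, accumulate_global and accumulate_local are invented for the example, and whether the second form is actually faster depends on the compiler and flags, so it would need to be measured.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical global, standing in for the globals described in the thread.
double g_energy = 0.0;

// Updates the global on every iteration. Because v points to double, the
// compiler may have to assume that v could refer to g_energy and therefore
// keep the running total in memory rather than in a register.
void accumulate_global(const double* v, std::size_t n) {
    assert(n == 0 || v != nullptr);
    g_energy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        g_energy += v[i] * v[i];
    }
}

// Accumulates into a local variable, which the compiler is free to keep in
// a register, and writes the global once at the end.
void accumulate_local(const double* v, std::size_t n) {
    assert(n == 0 || v != nullptr);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += v[i] * v[i];
    }
    g_energy = sum;
}
```

Both functions leave the same value in g_energy; the only difference is where the running total lives during the loop.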
 
  • #12
Could it also be that when the code is plugged in the main, many temporary variables cannot be released, while when tucked into a method, garbage collection is sure to get rid of the unused local variables, and probably faster?
 
  • #13
On running out of memory...

Local variables are often not put into memory at all -- they are frequently completely ephemeral things whose lifespan is spent entirely in registers. Even when that doesn't happen, local variables are placed onto the stack, which may be a different part of memory than the one that new accesses. The same is true of array variables -- ones declared as
int foo[47]; // This goes on the stack, not the heap

Incidentally, this is why on many computers, the declaration
int bigarray[1 << 22]; // I overflow the stack! Haha!
will cause a segmentation fault (because the OS limits how big the stack can be, and this exceeds that limit), whereas
int *bigarray = new int[1 << 22]; // I live on the heap. It's very roomy here
will succeed.

Unless you are doing something very silly -- e.g. allocating lots of large arrays on the stack or extremely deeply nested function calls each with lots of local state -- you will never run out of memory because you allocated local variables. (Unless you intentionally decreased the amount of stack space your program allocates, or are on a peculiar architecture that likes to have tiny stacks)

Aside from the programming drawbacks of global variables being used in that way, you will often get better performance by having your scalar variables -- or even small to medium sized arrays -- be local variables. If your function needs lots of scratch space and you really are concerned about functions allocating their own memory, you can pass in a buffer as an argument, such as this routine to multiply multi-precision integers:
// result and buffer must be arrays of xwords * ywords elements
void poorly_implemented_high_school_multiply(unsigned long *result, size_t *rwords, const unsigned long *x, size_t xwords, const unsigned long *y, size_t ywords, unsigned long *buffer);
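The multiplication routine above is only declared, so as a smaller illustration of the same "caller passes in the scratch space" idea, here is a sketch of a merge step that needs temporary storage but allocates none itself. The function merge_with_buffer is a hypothetical example, not something from the thread.

```cpp
#include <cassert>
#include <cstddef>

// Merges the sorted ranges data[0, mid) and data[mid, n) into one sorted
// range. The caller provides buffer, which must hold at least n elements;
// nothing is allocated inside the function.
void merge_with_buffer(double* data, std::size_t mid, std::size_t n,
                       double* buffer) {
    assert(mid <= n);
    std::size_t left = 0;
    std::size_t right = mid;
    std::size_t out = 0;

    // Repeatedly take the smaller front element of the two ranges.
    while (left < mid && right < n) {
        if (data[right] < data[left]) {
            buffer[out++] = data[right++];
        } else {
            buffer[out++] = data[left++];
        }
    }
    // Copy whatever remains of either range.
    while (left < mid) {
        buffer[out++] = data[left++];
    }
    while (right < n) {
        buffer[out++] = data[right++];
    }
    // Move the merged result back into the original array.
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = buffer[i];
    }
}
```

A caller that allocates one scratch array at startup can reuse it for every call, so the function's memory needs are known before the long-running part of the program begins.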
 
  • #14
maverick_starstrider said:
And yes. My code is actually over 1000 lines so I just wrote down what was relevant to the specific problem.

Are you absolutely, positively sure? It's astonishingly easy to overlook relevant things when it comes to high-performance code... You won't believe how many times I've found errors, slow bits, or other problems in mine and others' code by checking for something that was seemingly impossible.

Possibly silly question -- you are compiling with optimization flags turned on, right?

Another possibly silly question -- how are you timing things? That can be surprisingly tricky to get right as well.
 
  • #15
Incidentally, the drawback (at least, the only one I really know) to inlining (large) functions is bloated code size -- among the negative effects it could potentially have are:

* The optimizer might have a more difficult time optimizing one giant function rather than several smaller functions.
(But do keep in mind the converse problem -- modularity can sometimes make it impossible for the optimizer to do things)

* More memory traffic -- more bandwidth to the L2 cache (or memory) to fetch instructions, and more opportunities for icache misses.
(But it's hard to imagine how this would happen if your function is called exactly once)

* Confuse the icache controller? I know very, very little about how the hardware on modern processors manages instruction data -- so I still find it plausible that some systems might have an easier time when dealing with a function call rather than with inlined instructions.
 
  • #16
Hurkyl said:
Are you absolutely, positively sure? It's astonishingly easy to overlook relevant things when it comes to high-performance code... You won't believe how many times I've found errors, slow bits, or other problems in mine and others' code by checking for something that was seemingly impossible.

Possibly silly question -- you are compiling with optimization flags turned on, right?

Another possibly silly question -- how are you timing things? That can be surprisingly tricky to get right as well.

The compiler arguments are actually hidden from me, although I assume they are running with optimization flags. This is because in the environment my code runs on all the standard compiler commands (for example mpiCC -o blah blah.cpp) are actually just wrappers for a proprietary compiler that is used (pathscale I think) and the wrapper is hidden from me.

I'm timing things by just using MPI's MPI_Wtime() function which according to the specification assures accuracy on all implementations. I just get the MPI_Wtime() at the beginning of my main and then subtract that from the value at the end of my main.
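Timing all of main also counts allocation and startup, so one alternative is to time only the region of interest, several times, and keep the fastest run. The sketch below assumes that approach and uses std::chrono::steady_clock from the standard library in place of MPI_Wtime(), so that it builds without MPI; workload and best_time_seconds are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical workload standing in for the code being measured.
double workload(std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += static_cast<double>(i) * 0.5;
    }
    return sum;
}

// Times `repeats` runs of workload(n) and returns the fastest run in
// seconds. Each result is added to *sink so that the computed value is
// actually used, which makes it less likely the optimizer drops the work.
double best_time_seconds(std::size_t n, int repeats, double* sink) {
    assert(repeats > 0 && sink != nullptr);
    using Clock = std::chrono::steady_clock;
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const Clock::time_point start = Clock::now();
        *sink += workload(n);
        const Clock::time_point stop = Clock::now();
        const double elapsed =
            std::chrono::duration<double>(stop - start).count();
        if (best < 0.0 || elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}
```

Taking the minimum over several runs reduces the influence of other processes and of first-run effects such as cold caches; in an MPI program the same structure works with MPI_Wtime() calls around the region being measured.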
 
  • #17
Run this on another machine and see if the unexpected speedup still occurs. My guess is that it has something to do with how the program gets loaded into memory and paged in and out during execution, especially since large arrays of data are involved. You might not see the same result on another machine, or even on the same machine if many programs are running rather than only one.
 

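What is the difference between inline code and a function call in C++?

An inline function and an ordinary function call are two ways the same code can end up in a compiled program. A function is inline if it is declared with the inline keyword or defined inside a class definition; the compiler may then expand its body at the point of the call, whereas an ordinary call jumps to a separate function body. The keyword is only a request: compilers decide for themselves which calls to expand, and may also expand functions that are not marked inline.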
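A minimal sketch of the forms an inline function can take is given below. The names Vec2, reciprocal and length_squared are hypothetical, and in every case it is the compiler that decides whether a given call is actually expanded.

```cpp
#include <cassert>

struct Vec2 {
    double x;
    double y;

    // Defined inside the class definition, so it is implicitly inline.
    double dot(const Vec2& other) const {
        return x * other.x + y * other.y;
    }
};

// Explicitly declared inline. This permits the definition to appear in a
// header included by several source files; expansion is still optional.
inline double reciprocal(double v) {
    assert(v != 0.0);
    return 1.0 / v;
}

// An ordinary (non-inline) function. An optimizing compiler may still
// expand calls to it when the definition is visible at the call site.
double length_squared(const Vec2& v) {
    return v.dot(v);
}
```

All three are called the same way, for example length_squared(Vec2{3.0, 4.0}) or reciprocal(4.0); the difference lies in how the definitions may be placed and in what the compiler is allowed to do with them.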

What is the advantage of using inline functions in C++?

The main advantage is that expanding a function at the call site removes the overhead of the call (passing arguments, jumping to the function, and returning) and lets the optimizer work across what used to be the call boundary. As this thread shows, that does not guarantee faster code, so the effect should be measured.

Can any function be defined as inline in C++?

Almost any function can be declared inline (main is an exception), but the compiler is not obliged to expand it. Inlining is generally worthwhile for small, simple functions; for functions that are complex or contain a lot of code, the compiler will often keep an ordinary call.

What are the limitations of using inline functions in C++?

One limitation is that inlining can increase the size of the compiled code, which can put more pressure on the instruction cache. Recursive and virtual functions may be declared inline, but the compiler usually cannot expand a recursive call completely, and it can expand a virtual call only when it can determine the target at compile time.

When should I use ordinary function calls instead of inline functions in C++?

Ordinary function calls are the better fit when the function is more complex or contains a larger amount of code, when it is recursive, or when it is called through a virtual interface. In general, it is reasonable to write ordinary functions, compile with optimization turned on, and let the compiler decide what to inline, as suggested earlier in the thread.
