Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Faster please?

  1. Jun 19, 2008 #1
    EDIT: OK sorry, I'll try and be clearer.

    I'm trying to make this program finish more quickly. For this program to give a good result I need the for loop to run at least 10^9 times. Which takes hours...

    Any way to make this program with n=10^9 finish more quickly?

    Code (Text):

    # include <cstdlib>
    # include <iostream>
    # include <iomanip>
    # include <stdio.h>
    # include <math.h>
    # include <cmath>
    # include <string>
    # include <fstream>
    # include <process.h>
    #include <limits>

    using namespace std;

    # include "sobol.H"

    const int dim=7;
    double H=0.000001;

    const double E=exp(1.0);
    const double c=299792458.0;
    const double Pi=3.14159265;

    const double lp=0.0000002;

    const double hbar=1.05457148*pow(10.0,-34);

    const double VLargeL=20.0*H/lp;

    double S2Min=0;
    double x2Min=0; double xd2Min=-VLargeL;
    double y2Min=0; double yd2Min=-VLargeL;
    double z2Min=-VLargeL; double zd2Min=H/lp;

    double S2Max=10.0;
    double x2Max=VLargeL; double xd2Max=VLargeL;
    double y2Max=VLargeL; double yd2Max=VLargeL;
    double z2Max=0; double zd2Max=VLargeL;

    double S2Range=(S2Max - S2Min); double x2Range=x2Max-x2Min; double xd2Range=xd2Max-xd2Min;
    double y2Range=y2Max-y2Min; double yd2Range=yd2Max-yd2Min; double z2Range=z2Max-z2Min;
    double zd2Range=zd2Max-zd2Min;

    double Integrand1(double S2, double x2, double xd2, double y2, double yd2,
                      double z2, double zd2)
    {
        return (1 + 4*Pi*S2*(sqrt(pow(x2 - xd2,2) + pow(y2 - yd2,2) + pow(z2 - zd2,2)) +
            Pi*S2*(pow(x2 - xd2,2) + pow(y2 - yd2,2) + pow(z2 - zd2,2))))/
       (pow(E,4*Pi*S2*sqrt(pow(x2 - xd2,2) + pow(y2 - yd2,2) + pow(z2 - zd2,2)))*
         pow(2 + pow(S2,2),2)*pow(pow(x2 - xd2,2) + pow(y2 - yd2,2) +
           pow(z2 - zd2,2),3));
    }

    //****************************************************************************80

    int main ( void )
    {
        cout << "Define n by entering an integer please:\n";
        long long int n;
        cin >> n;
        cout << "n = " << n <<"\n";
        cout << "\n";

        double r[dim];
        long long int seed=0;

        long double Result1=0;

        long long int PercentagesShown=0;
        long double OnePercent=n/100.0;

        for(long long int i=0;i<n;i++)
        {
            i8_sobol(dim,&seed,r);
           
            Result1+=Integrand1(r[0] * S2Range,x2Min + (r[1]*x2Range),xd2Min + (r[2]*xd2Range),
                y2Min + (r[3]*y2Range), yd2Min + (r[4]*yd2Range),z2Min + (r[5] * z2Range),
                zd2Min + (r[6] * zd2Range));

            if(i>2000000*(PercentagesShown+1))
            {
                system("cls");
                cout << i << " loops completed";
                PercentagesShown++;
            }
        }

        Result1=(-(c*hbar)/(4.*lp*pow(Pi,2)))*2*S2Range*x2Range*xd2Range*y2Range*yd2Range*z2Range*zd2Range*Result1/(n*pow(2*VLargeL,2));
       
        system("cls");
        cout << "Result = " << Result1 << "\n";

        std::ofstream OutputFile("SQR2ndOrderFlatResults.txt", std::ios_base::out | std::ios_base::app);
        OutputFile << "\n\n";
        OutputFile << "2nd Order Flat\n";
        OutputFile << "Very Large Length= " << VLargeL;
        OutputFile << ", N = " << n;
        OutputFile << ", H = " << H;
        OutputFile << ", S2Max = " << S2Max;
        OutputFile << "\nIntegration Result = " << Result1 << "\n";
        OutputFile.close();

        cout <<"\n";
        cout << "Exit program? (y/n)\n";

        std::string BoolStr;
        cin >> BoolStr;
        while(BoolStr!="y")
        {
            cout <<"\n";
            cout << "User chose not to exit program.\n";
            cout << "Exit program? (y/n)\n";
            cin >> BoolStr;
        }
    }
     
     
    Last edited: Jun 19, 2008
  2. jcsd
  3. Jun 19, 2008 #2

    mgb_phys

    User Avatar
    Science Advisor
    Homework Helper

    Coherent question please?
     
  4. Jun 19, 2008 #3
    OK sorry, I'll try and be clearer.

    I'm trying to make this program finish more quickly. For this program to give a good result I need the for loop to run at least 10^9 times. Which takes hours...

    Any way to make this program with n=10^9 finish more quickly?
     
  5. Jun 19, 2008 #4

    mgb_phys

    User Avatar
    Science Advisor
    Homework Helper

    There isn't anything obvious. You are basically just doing an awful lot of floating point calculations!
    Check you are using the optomise flag on your compiler, but thats' about it.

    The only obvious thing would be to try and parallel it.
    As far as I can see the calls to Integrand1() are independant, it doesn't depend on the previous call and doesn't modify any of it's arguements.
    So could you just have 10 machines and run 10^8 iterations on each?
     
  6. Jun 19, 2008 #5

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    So the essential problem is that you want to evaluate a complicated function a billion times, and you want to speed the process up. Other than small concerns, like the above and loop unrolling, all that can be done is to make the function itself more efficient.

    So first, a question. What does i8_sobol do? It's not apparently defined in this code.


    There are some minor things that can be done regardless, though none will give you a huge boost.
    1. There are better ways to display progress that don't involve checking every loop. Make an outer loop that runs a hundred times, then an inner loop that runs n/100 times each. This removes a test from each loop.
    2. Replace pow(E, ...) with exp(...). This is faster to run, more precise, and doesn't require you to pass a double on the stack.
    3. You're duplicating calculation. Try this:
    Code (Text):
    double Integrand1(double S2, double x2, double xd2, double y2, double yd2,
                      double z2, double zd2)
    {
        double sumOfSquares = pow(x2 - xd2,2) + pow(y2 - yd2,2) + pow(z2 - zd2,2);
        double sqrtSOS = sqrt(sumOfSquares);
        double piS2 = Pi * S2;
        return (1 + 4*piS2*(sqrtSOS + piS2*sumOfSquares)) /
            (exp(4*piS2*sqrtSOS)* pow(2 + pow(S2,2),2) * pow(sumOfSquares,3));
    }
    4. I don't know how it's implemented, but I doubt pow(blah, 2) is as efficient as blah * blah since you're only dealing with doubles. It may be faster to do S2 * S2 than pow(S2, 2). For the xyz, you can do x2 = x2 - xd2 then x2 * x2. This would be worth testing for some small value of n.
     
  7. Jun 19, 2008 #6

    mgb_phys

    User Avatar
    Science Advisor
    Homework Helper

    A common way to increment progress in a very big loop is to use the modulus operator.
    it's a very quick operation.

    // at start of program
    long count=0;

    // inside loop
    if ( !(++count%1000) ) {
    // done count loops - will only output every 1000 iterations
    }
     
  8. Jun 19, 2008 #7

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    I guess that's perspective. It takes something like 30 cycles on AMD chips and almost double that on Intels... which I see as expensive. The amortized cost of nested loops like I suggested should be less than one ten-thousandth of a cycle.
     
  9. Jun 19, 2008 #8

    mgb_phys

    User Avatar
    Science Advisor
    Homework Helper

    I knew pow() was expensive but I hadn't realised exp() was any quicker, since they both do a similair task - why the big difference?
     
  10. Jun 19, 2008 #9

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    I hope I didn't claim a big difference, probably less than 30%. But generally speaking pow() is implemented very nearly as "transform -- exp -- transform".

    It was just one of the little things I noticed that can be done better (could be 1 ULP more precise) and faster (a bit).
     
  11. Jun 19, 2008 #10

    DaveC426913

    User Avatar
    Gold Member

    Yeah. Drastically limiting your output will definitely increase speed.
     
  12. Jun 19, 2008 #11

    Dale

    Staff: Mentor

    Have you measured the time consumed in each step? There is absolutely no point in trying to speed up a chunk of code unless you know which part is the bottleneck.
     
  13. Jun 19, 2008 #12

    robphy

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    Lots of good advice in this thread... especially with DaleSpam's comment.

    Here is an example of loop unrolling (mentioned by CRGreathouse)
    http://www.codemaestro.com/reviews/5

    I would also use multiplication x*x over pow(x,2).

    To eliminate the overhead of the function call to Integrand1(), you could write a compiler macro http://en.wikipedia.org/wiki/C_preprocessor#Macro_definition_and_expansion

    Is it better to use (instead of looping with a long long int) nested loops with regular ints?

    Is this your i8_sobol()?
    http://people.scs.fsu.edu/~burkardt/cpp_src/sobol/sobol.C
     
  14. Jun 19, 2008 #13

    mgb_phys

    User Avatar
    Science Advisor
    Homework Helper

    I think it was pretty obvious that calling pow() 12 times in each of 10^9 loop iterations is a limiting step. There is only one function call so not much point in measuring it.

    But in general yes - don't optomise until you have measured!
     
  15. Jun 19, 2008 #14

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    Assuming i8_sobol is fairly fast, my improvements should roughly double the speed of the code. If i8_sobol is slow, then we need to work on that instead of your function.
     
  16. Jun 21, 2008 #15
    Thank you very much for all your replies. I asked a friend for some advice, and although I haven't yet digested all the posts in the thread, it seems a lot of the suggestions are new to me. I will implement all the suggestions on Monday and post back with the re-draft.

    Much appreciated, and many thanks :)
     
  17. Jun 23, 2008 #16
    Hmm I'm a slower worker than I made out :P

    I'll address questions as I read through the posts.

    How please? I'm using Visual C++.

    Thanks.
     
  18. Jun 23, 2008 #17
  19. Jun 23, 2008 #18

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    If you really want it to be fast, I'd use something other than VisualStudio. I like it for making GUIs, but it compiles to MSIL which isn't that fast -- and it doesn't optimize well.

    Having said that, here's the way. Hit Alt + F7, then:
    • Configuration properties, Whole Program Optimization (choose Profile Guided Optimization - Optimize if you have the full version, or No Whole Program Optimization if you have Express)
    • C/C++, General, Debug Information Format (choose Disabled)
    • C/C++, Optimization, Optimization (choose Full Optimization (/Ox))
    • C/C++, Code Generation, Enable Minimal Rebuild (choose No)
    • C/C++, Code Generation, Basic Runtime Checks (choose default)

    You can play around with the other settings also.
     
  20. Jun 23, 2008 #19

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    I can't get i8_sobol to compile. I even stripped it down to two functions (i8_sobol itself and i8_bit_lo0 which it calls) and the compiler just hangs.
     
  21. Jun 23, 2008 #20

    mgb_phys

    User Avatar
    Science Advisor
    Homework Helper

    Visual studio uses the MS version 9 C++ compiler, it only compiles c# to MSIL.
    The latest MS compiler isn't bad (at least it finally compiles STL cleanly!), we found that it sometimes beat both GCC and Intel on certain code, the whole program optimisation is very good.
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook

Have something to add?