Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Faster please?

  1. Jun 19, 2008 #1
    EDIT: OK sorry, I'll try and be clearer.

    I'm trying to make this program finish more quickly. For this program to give a good result I need the for loop to run at least 10^9 times. Which takes hours...

    Any way to make this program with n=10^9 finish more quickly?

    Code (Text):

    # include <cstdlib>
    # include <iostream>
    # include <iomanip>
    # include <stdio.h>
    # include <math.h>
    # include <cmath>
    # include <string>
    # include <fstream>
    # include <process.h>
    #include <limits>

    using namespace std;

    # include "sobol.H"

    const int dim=7;
    double H=0.000001;

    const double E=exp(1.0);
    const double c=299792458.0;
    const double Pi=3.14159265;

    const double lp=0.0000002;

    const double hbar=1.05457148*pow(10.0,-34);

    const double VLargeL=20.0*H/lp;

    double S2Min=0;
    double x2Min=0; double xd2Min=-VLargeL;
    double y2Min=0; double yd2Min=-VLargeL;
    double z2Min=-VLargeL; double zd2Min=H/lp;

    double S2Max=10.0;
    double x2Max=VLargeL; double xd2Max=VLargeL;
    double y2Max=VLargeL; double yd2Max=VLargeL;
    double z2Max=0; double zd2Max=VLargeL;

    double S2Range=(S2Max - S2Min); double x2Range=x2Max-x2Min; double xd2Range=xd2Max-xd2Min;
    double y2Range=y2Max-y2Min; double yd2Range=yd2Max-yd2Min; double z2Range=z2Max-z2Min;
    double zd2Range=zd2Max-zd2Min;

    double Integrand1(double S2, double x2, double xd2, double y2, double yd2,
                      double z2, double zd2)
    {
        return (1 + 4*Pi*S2*(sqrt(pow(x2 - xd2,2) + pow(y2 - yd2,2) + pow(z2 - zd2,2)) +
            Pi*S2*(pow(x2 - xd2,2) + pow(y2 - yd2,2) + pow(z2 - zd2,2))))/
       (pow(E,4*Pi*S2*sqrt(pow(x2 - xd2,2) + pow(y2 - yd2,2) + pow(z2 - zd2,2)))*
         pow(2 + pow(S2,2),2)*pow(pow(x2 - xd2,2) + pow(y2 - yd2,2) +
           pow(z2 - zd2,2),3));
    }

    //****************************************************************************80

    int main ( void )
    {
        cout << "Define n by entering an integer please:\n";
        long long int n;
        cin >> n;
        cout << "n = " << n <<"\n";
        cout << "\n";

        double r[dim];
        long long int seed=0;

        long double Result1=0;

        long long int PercentagesShown=0;
        long double OnePercent=n/100.0;

        for(long long int i=0;i<n;i++)
        {
            i8_sobol(dim,&seed,r);
           
            Result1+=Integrand1(r[0] * S2Range,x2Min + (r[1]*x2Range),xd2Min + (r[2]*xd2Range),
                y2Min + (r[3]*y2Range), yd2Min + (r[4]*yd2Range),z2Min + (r[5] * z2Range),
                zd2Min + (r[6] * zd2Range));

            if(i>2000000*(PercentagesShown+1))
            {
                system("cls");
                cout << i << " loops completed";
                PercentagesShown++;
            }
        }

        Result1=(-(c*hbar)/(4.*lp*pow(Pi,2)))*2*S2Range*x2Range*xd2Range*y2Range*yd2Range*z2Range*zd2Range*Result1/(n*pow(2*VLargeL,2));
       
        system("cls");
        cout << "Result = " << Result1 << "\n";

        std::ofstream OutputFile("SQR2ndOrderFlatResults.txt", std::ios_base::out | std::ios_base::app);
        OutputFile << "\n\n";
        OutputFile << "2nd Order Flat\n";
        OutputFile << "Very Large Length= " << VLargeL;
        OutputFile << ", N = " << n;
        OutputFile << ", H = " << H;
        OutputFile << ", S2Max = " << S2Max;
        OutputFile << "\nIntegration Result = " << Result1 << "\n";
        OutputFile.close();

        cout <<"\n";
        cout << "Exit program? (y/n)\n";

        std::string BoolStr;
        cin >> BoolStr;
        while(BoolStr!="y")
        {
            cout <<"\n";
            cout << "User chose not to exit program.\n";
            cout << "Exit program? (y/n)\n";
            cin >> BoolStr;
        }
    }
     
     
    Last edited: Jun 19, 2008
  2. jcsd
  3. Jun 19, 2008 #2

    mgb_phys

    User Avatar
    Science Advisor
    Homework Helper

    Coherent question please?
     
  4. Jun 19, 2008 #3
    OK sorry, I'll try and be clearer.

    I'm trying to make this program finish more quickly. For this program to give a good result I need the for loop to run at least 10^9 times. Which takes hours...

    Any way to make this program with n=10^9 finish more quickly?
     
  5. Jun 19, 2008 #4

    mgb_phys

    User Avatar
    Science Advisor
    Homework Helper

    There isn't anything obvious. You are basically just doing an awful lot of floating point calculations!
    Check you are using the optomise flag on your compiler, but thats' about it.

    The only obvious thing would be to try and parallel it.
    As far as I can see the calls to Integrand1() are independant, it doesn't depend on the previous call and doesn't modify any of it's arguements.
    So could you just have 10 machines and run 10^8 iterations on each?
     
  6. Jun 19, 2008 #5

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    So the essential problem is that you want to evaluate a complicated function a billion times, and you want to speed the process up. Other than small concerns, like the above and loop unrolling, all that can be done is to make the function itself more efficient.

    So first, a question. What does i8_sobol do? It's not apparently defined in this code.


    There are some minor things that can be done regardless, though none will give you a huge boost.
    1. There are better ways to display progress that don't involve checking every loop. Make an outer loop that runs a hundred times, then an inner loop that runs n/100 times each. This removes a test from each loop.
    2. Replace pow(E, ...) with exp(...). This is faster to run, more precise, and doesn't require you to pass a double on the stack.
    3. You're duplicating calculation. Try this:
    Code (Text):
    double Integrand1(double S2, double x2, double xd2, double y2, double yd2,
                      double z2, double zd2)
    {
        double sumOfSquares = pow(x2 - xd2,2) + pow(y2 - yd2,2) + pow(z2 - zd2,2);
        double sqrtSOS = sqrt(sumOfSquares);
        double piS2 = Pi * S2;
        return (1 + 4*piS2*(sqrtSOS + piS2*sumOfSquares)) /
            (exp(4*piS2*sqrtSOS)* pow(2 + pow(S2,2),2) * pow(sumOfSquares,3));
    }
    4. I don't know how it's implemented, but I doubt pow(blah, 2) is as efficient as blah * blah since you're only dealing with doubles. It may be faster to do S2 * S2 than pow(S2, 2). For the xyz, you can do x2 = x2 - xd2 then x2 * x2. This would be worth testing for some small value of n.
     
  7. Jun 19, 2008 #6

    mgb_phys

    User Avatar
    Science Advisor
    Homework Helper

    A common way to increment progress in a very big loop is to use the modulus operator.
    it's a very quick operation.

    // at start of program
    long count=0;

    // inside loop
    if ( !(++count%1000) ) {
    // done count loops - will only output every 1000 iterations
    }
     
  8. Jun 19, 2008 #7

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    I guess that's perspective. It takes something like 30 cycles on AMD chips and almost double that on Intels... which I see as expensive. The amortized cost of nested loops like I suggested should be less than one ten-thousandth of a cycle.
     
  9. Jun 19, 2008 #8

    mgb_phys

    User Avatar
    Science Advisor
    Homework Helper

    I knew pow() was expensive but I hadn't realised exp() was any quicker, since they both do a similair task - why the big difference?
     
  10. Jun 19, 2008 #9

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    I hope I didn't claim a big difference, probably less than 30%. But generally speaking pow() is implemented very nearly as "transform -- exp -- transform".

    It was just one of the little things I noticed that can be done better (could be 1 ULP more precise) and faster (a bit).
     
  11. Jun 19, 2008 #10

    DaveC426913

    User Avatar
    Gold Member

    Yeah. Drastically limiting your output will definitely increase speed.
     
  12. Jun 19, 2008 #11

    Dale

    Staff: Mentor

    Have you measured the time consumed in each step? There is absolutely no point in trying to speed up a chunk of code unless you know which part is the bottleneck.
     
  13. Jun 19, 2008 #12

    robphy

    User Avatar
    Science Advisor
    Homework Helper
    Gold Member

    Lots of good advice in this thread... especially with DaleSpam's comment.

    Here is an example of loop unrolling (mentioned by CRGreathouse)
    http://www.codemaestro.com/reviews/5 [Broken]

    I would also use multiplication x*x over pow(x,2).

    To eliminate the overhead of the function call to Integrand1(), you could write a compiler macro http://en.wikipedia.org/wiki/C_preprocessor#Macro_definition_and_expansion

    Is it better to use (instead of looping with a long long int) nested loops with regular ints?

    Is this your i8_sobol()?
    http://people.scs.fsu.edu/~burkardt/cpp_src/sobol/sobol.C [Broken]
     
    Last edited by a moderator: May 3, 2017
  14. Jun 19, 2008 #13

    mgb_phys

    User Avatar
    Science Advisor
    Homework Helper

    I think it was pretty obvious that calling pow() 12 times in each of 10^9 loop iterations is a limiting step. There is only one function call so not much point in measuring it.

    But in general yes - don't optomise until you have measured!
     
  15. Jun 19, 2008 #14

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    Assuming i8_sobol is fairly fast, my improvements should roughly double the speed of the code. If i8_sobol is slow, then we need to work on that instead of your function.
     
  16. Jun 21, 2008 #15
    Thank you very much for all your replies. I asked a friend for some advice, and although I haven't yet digested all the posts in the thread, it seems a lot of the suggestions are new to me. I will implement all the suggestions on Monday and post back with the re-draft.

    Much appreciated, and many thanks :)
     
  17. Jun 23, 2008 #16
    Hmm I'm a slower worker than I made out :P

    I'll address questions as I read through the posts.

    How please? I'm using Visual C++.

    Thanks.
     
  18. Jun 23, 2008 #17
    Yes it is.


    -----


    Hmm, that's all the questions I have =) Cheers.
     
    Last edited by a moderator: May 3, 2017
  19. Jun 23, 2008 #18

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    If you really want it to be fast, I'd use something other than VisualStudio. I like it for making GUIs, but it compiles to MSIL which isn't that fast -- and it doesn't optimize well.

    Having said that, here's the way. Hit Alt + F7, then:
    • Configuration properties, Whole Program Optimization (choose Profile Guided Optimization - Optimize if you have the full version, or No Whole Program Optimization if you have Express)
    • C/C++, General, Debug Information Format (choose Disabled)
    • C/C++, Optimization, Optimization (choose Full Optimization (/Ox))
    • C/C++, Code Generation, Enable Minimal Rebuild (choose No)
    • C/C++, Code Generation, Basic Runtime Checks (choose default)

    You can play around with the other settings also.
     
  20. Jun 23, 2008 #19

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    I can't get i8_sobol to compile. I even stripped it down to two functions (i8_sobol itself and i8_bit_lo0 which it calls) and the compiler just hangs.
     
  21. Jun 23, 2008 #20

    mgb_phys

    User Avatar
    Science Advisor
    Homework Helper

    Visual studio uses the MS version 9 C++ compiler, it only compiles c# to MSIL.
    The latest MS compiler isn't bad (at least it finally compiles STL cleanly!), we found that it sometimes beat both GCC and Intel on certain code, the whole program optimisation is very good.
     
  22. Jun 24, 2008 #21

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    My understanding is that all .NET languages compile to MSIL/CIL. Here's a quick Google quote for support:

    "This is, of course, possible because all .NET language compilers are required to emit IL rather than native code. To a certain extent, this IL generated by a .NET compiler is more or less the same, which could lead one to the conclusion that the language used to create the managed type is irrelevant. Certainly, this is an impression that Microsoft drives home often -- people like me even claim "It's the Runtime Stupid!" But the fact of the matter is that this is not strictly true, as the C++ compiler performs some optimization on the IL it produces, resulting in code that performs better than code generated from the C# or VB.NET compilers." -Introduction to Managed C++


    Regardless of the IL issue, my experience seems to differ from yours. Perhaps I should test this more to see how good the compiler really is.
     
  23. Jun 24, 2008 #22

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    OK, I did some optimizations and such to the program, and rolled everything into one file. It takes a pretty long time to compile, but at least it does compile for me now.

    There was a lot to optimize in the i8_solbol function, so I made changes there both for readability and speed. (Readability let me see what was going on so I could further optimize)

    Unfortunately I don't know what the program is really doing, so I can't tell if it's working properly. I may have inadvertently changed program behavior while making it run quickly.

    Code (Text):
    // VC++... grumble.
    #include "stdafx.h"

    # include <cstdlib>
    # include <iostream>
    # include <iomanip>
    # include <stdio.h>
    # include <math.h>
    # include <cmath>
    # include <string>
    # include <fstream>
    # include <process.h>
    #include <limits>

    using namespace std;

    const int dim=7;
    double H=0.000001;

    const double c=299792458.0;
    const double Pi=3.141592653589793; // Improved from 3.14159265 to full double precision

    const double lp=0.0000002;
    const double hbar=1.05457148*pow(10.0,-34); // Doesn't seem quite right... not as precise as stated?
    const double VLargeL=20.0*H/lp;

    double S2Min=0;
    double x2Min=0; double xd2Min=-VLargeL;
    double y2Min=0; double yd2Min=-VLargeL;
    double z2Min=-VLargeL; double zd2Min=H/lp;

    double S2Max=10.0;
    double x2Max=VLargeL; double xd2Max=VLargeL;
    double y2Max=VLargeL; double yd2Max=VLargeL;
    double z2Max=0; double zd2Max=VLargeL;

    double S2Range=(S2Max - S2Min); double x2Range=x2Max-x2Min; double xd2Range=xd2Max-xd2Min;
    double y2Range=y2Max-y2Min; double yd2Range=yd2Max-yd2Min; double z2Range=z2Max-z2Min;
    double zd2Range=zd2Max-zd2Min;

    // Formward declarations
    void i8_sobol ( int dim_num, long long int *seed, double quasi[ ] );
    int i8_bit_lo0 ( long long int n );

    inline int count_bits(long long j){
        int m = 0;
        while (j > 1LL) {
            j <<= 2;
            m += 2;
        }
        return m + (int)j;
    }

    double Integrand1(double S2, double x2, double xd2, double y2, double yd2,
                      double z2, double zd2)
    {
        double sumOfSquares = pow(x2 - xd2,2) + pow(y2 - yd2,2) + pow(z2 - zd2,2);
        double sqrtSOS = sqrt(sumOfSquares);
        double piS2 = Pi * S2;
        return (1 + 4*piS2*(sqrtSOS + piS2*sumOfSquares)) /
            (exp(4*piS2*sqrtSOS)* pow(2 + pow(S2,2),2) * pow(sumOfSquares,3));
    }

    int main ( void )
    {
        cout << "Define n by entering an integer please:\n";
        long long int n;
        cin >> n;
        cout << "n = " << n <<"\n";
        cout << "\n";

        double r[dim];
        long long int seed=0;

        long double Result1=0;

        long long int PercentagesShown=0;
        long double OnePercent=n/100.0;

        for(long long int i=0;i<n;i++)
        {
            i8_sobol(dim,&seed,r);
           
            Result1+=Integrand1(r[0] * S2Range,x2Min + (r[1]*x2Range),xd2Min + (r[2]*xd2Range),
                y2Min + (r[3]*y2Range), yd2Min + (r[4]*yd2Range),z2Min + (r[5] * z2Range),
                zd2Min + (r[6] * zd2Range));

            if(i>2000000*(PercentagesShown+1))
            {
                system("cls");
                cout << i << " loops completed";
                PercentagesShown++;
            }
        }

        Result1=(-(c*hbar)/(4.*lp*pow(Pi,2)))*2*S2Range*x2Range*xd2Range*y2Range*yd2Range*z2Range*zd2Range*Result1/(n*pow(2*VLargeL,2));
       
        system("cls");
        cout << "Result = " << Result1 << "\n";

        std::ofstream OutputFile("SQR2ndOrderFlatResults.txt", std::ios_base::out | std::ios_base::app);
        OutputFile << "\n\n";
        OutputFile << "2nd Order Flat\n";
        OutputFile << "Very Large Length= " << VLargeL;
        OutputFile << ", N = " << n;
        OutputFile << ", H = " << H;
        OutputFile << ", S2Max = " << S2Max;
        OutputFile << "\nIntegration Result = " << Result1 << "\n";
        OutputFile.close();

        cout <<"\n";
        cout << "Exit program? (y/n)\n";

        std::string BoolStr;
        cin >> BoolStr;
        while(BoolStr!="y")
        {
            cout <<"\n";
            cout << "User chose not to exit program.\n";
            cout << "Exit program? (y/n)\n";
            cin >> BoolStr;
        }
        //*/
    }

    //////////////////////////////////////////////////////////////////////////////////////////////////////////
    int i8_bit_lo0 ( long long int n )
    {
      int bit;
      bit = 1;

      while (n&1)
      {
        n >>= 1;
        bit++;
      }

      return bit;
    }
    //////////////////////////////////////////////////////////////////////////////////////////////////////////


    void i8_sobol ( int dim_num, long long int *seed, double quasi[ ] )
    {
    # define DIM_MAX 40
    # define DIM_MAX2 1111
    # define LOG_MAX 62
    //
    //  Here, we have commented out the definition of ATMOST, because
    //  in some cases, a compiler was complaining that the value of ATMOST could not
    //  seem to be properly stored.  We only need ATMOST in order to specify MAXCOL,
    //  so as long as we set MAXCOL (below) to what we expect it should be, we
    //  may be able to get around this difficulty.
    //  JVB, 24 January 2006.
    //
    //static long long int atmost = 4611686018427387903;
    //
      static int dim_num_save = 0;
      long long int i;
      bool includ[LOG_MAX];
      static bool initialized = false;
      long long int j;
      long long int k;
      long long int l;
      static long long int lastq[DIM_MAX2];
      long long int m;
      static long long int maxcol;
      long long int newv;
      static long long int poly[DIM_MAX2] =
      {
            1,    3,    7,   11,   13,   19,   25,   37,   59,   47,
           61,   55,   41,   67,   97,   91,  109,  103,  115,  131,
          193,  137,  145,  143,  241,  157,  185,  167,  229,  171,
          213,  191,  253,  203,  211,  239,  247,  285,  369,  299,
          301,  333,  351,  355,  357,  361,  391,  397,  425,  451,
          463,  487,  501,  529,  539,  545,  557,  563,  601,  607,
          617,  623,  631,  637,  647,  661,  675,  677,  687,  695,
          701,  719,  721,  731,  757,  761,  787,  789,  799,  803,
          817,  827,  847,  859,  865,  875,  877,  883,  895,  901,
          911,  949,  953,  967,  971,  973,  981,  985,  995, 1001,
         1019, 1033, 1051, 1063, 1069, 1125, 1135, 1153, 1163, 1221,
         1239, 1255, 1267, 1279, 1293, 1305, 1315, 1329, 1341, 1347,
         1367, 1387, 1413, 1423, 1431, 1441, 1479, 1509, 1527, 1531,
         1555, 1557, 1573, 1591, 1603, 1615, 1627, 1657, 1663, 1673,
         1717, 1729, 1747, 1759, 1789, 1815, 1821, 1825, 1849, 1863,
         1869, 1877, 1881, 1891, 1917, 1933, 1939, 1969, 2011, 2035,
         2041, 2053, 2071, 2091, 2093, 2119, 2147, 2149, 2161, 2171,
         2189, 2197, 2207, 2217, 2225, 2255, 2257, 2273, 2279, 2283,
         2293, 2317, 2323, 2341, 2345, 2363, 2365, 2373, 2377, 2385,
         2395, 2419, 2421, 2431, 2435, 2447, 2475, 2477, 2489, 2503,
         2521, 2533, 2551, 2561, 2567, 2579, 2581, 2601, 2633, 2657,
         2669, 2681, 2687, 2693, 2705, 2717, 2727, 2731, 2739, 2741,
         2773, 2783, 2793, 2799, 2801, 2811, 2819, 2825, 2833, 2867,
         2879, 2881, 2891, 2905, 2911, 2917, 2927, 2941, 2951, 2955,
         2963, 2965, 2991, 2999, 3005, 3017, 3035, 3037, 3047, 3053,
         3083, 3085, 3097, 3103, 3159, 3169, 3179, 3187, 3205, 3209,
         3223, 3227, 3229, 3251, 3263, 3271, 3277, 3283, 3285, 3299,
         3305, 3319, 3331, 3343, 3357, 3367, 3373, 3393, 3399, 3413,
         3417, 3427, 3439, 3441, 3475, 3487, 3497, 3515, 3517, 3529,
         3543, 3547, 3553, 3559, 3573, 3589, 3613, 3617, 3623, 3627,
         3635, 3641, 3655, 3659, 3669, 3679, 3697, 3707, 3709, 3713,
         3731, 3743, 3747, 3771, 3791, 3805, 3827, 3833, 3851, 3865,
         3889, 3895, 3933, 3947, 3949, 3957, 3971, 3985, 3991, 3995,
         4007, 4013, 4021, 4045, 4051, 4069, 4073, 4179, 4201, 4219,
         4221, 4249, 4305, 4331, 4359, 4383, 4387, 4411, 4431, 4439,
         4449, 4459, 4485, 4531, 4569, 4575, 4621, 4663, 4669, 4711,
         4723, 4735, 4793, 4801, 4811, 4879, 4893, 4897, 4921, 4927,
         4941, 4977, 5017, 5027, 5033, 5127, 5169, 5175, 5199, 5213,
         5223, 5237, 5287, 5293, 5331, 5391, 5405, 5453, 5523, 5573,
         5591, 5597, 5611, 5641, 5703, 5717, 5721, 5797, 5821, 5909,
         5913, 5955, 5957, 6005, 6025, 6061, 6067, 6079, 6081, 6231,
         6237, 6289, 6295, 6329, 6383, 6427, 6453, 6465, 6501, 6523,
         6539, 6577, 6589, 6601, 6607, 6631, 6683, 6699, 6707, 6761,
         6795, 6865, 6881, 6901, 6923, 6931, 6943, 6999, 7057, 7079,
         7103, 7105, 7123, 7173, 7185, 7191, 7207, 7245, 7303, 7327,
         7333, 7355, 7365, 7369, 7375, 7411, 7431, 7459, 7491, 7505,
         7515, 7541, 7557, 7561, 7701, 7705, 7727, 7749, 7761, 7783,
         7795, 7823, 7907, 7953, 7963, 7975, 8049, 8089, 8123, 8125,
         8137, 8219, 8231, 8245, 8275, 8293, 8303, 8331, 8333, 8351,
         8357, 8367, 8379, 8381, 8387, 8393, 8417, 8435, 8461, 8469,
         8489, 8495, 8507, 8515, 8551, 8555, 8569, 8585, 8599, 8605,
         8639, 8641, 8647, 8653, 8671, 8675, 8689, 8699, 8729, 8741,
         8759, 8765, 8771, 8795, 8797, 8825, 8831, 8841, 8855, 8859,
         8883, 8895, 8909, 8943, 8951, 8955, 8965, 8999, 9003, 9031,
         9045, 9049, 9071, 9073, 9085, 9095, 9101, 9109, 9123, 9129,
         9137, 9143, 9147, 9185, 9197, 9209, 9227, 9235, 9247, 9253,
         9257, 9277, 9297, 9303, 9313, 9325, 9343, 9347, 9371, 9373,
         9397, 9407, 9409, 9415, 9419, 9443, 9481, 9495, 9501, 9505,
         9517, 9529, 9555, 9557, 9571, 9585, 9591, 9607, 9611, 9621,
         9625, 9631, 9647, 9661, 9669, 9679, 9687, 9707, 9731, 9733,
         9745, 9773, 9791, 9803, 9811, 9817, 9833, 9847, 9851, 9863,
         9875, 9881, 9905, 9911, 9917, 9923, 9963, 9973,10003,10025,
        10043,10063,10071,10077,10091,10099,10105,10115,10129,10145,
        10169,10183,10187,10207,10223,10225,10247,10265,10271,10275,
        10289,10299,10301,10309,10343,10357,10373,10411,10413,10431,
        10445,10453,10463,10467,10473,10491,10505,10511,10513,10523,
        10539,10549,10559,10561,10571,10581,10615,10621,10625,10643,
        10655,10671,10679,10685,10691,10711,10739,10741,10755,10767,
        10781,10785,10803,10805,10829,10857,10863,10865,10875,10877,
        10917,10921,10929,10949,10967,10971,10987,10995,11009,11029,
        11043,11045,11055,11063,11075,11081,11117,11135,11141,11159,
        11163,11181,11187,11225,11237,11261,11279,11297,11307,11309,
        11327,11329,11341,11377,11403,11405,11413,11427,11439,11453,
        11461,11473,11479,11489,11495,11499,11533,11545,11561,11567,
        11575,11579,11589,11611,11623,11637,11657,11663,11687,11691,
        11701,11747,11761,11773,11783,11795,11797,11817,11849,11855,
        11867,11869,11873,11883,11919,11921,11927,11933,11947,11955,
        11961,11999,12027,12029,12037,12041,12049,12055,12095,12097,
        12107,12109,12121,12127,12133,12137,12181,12197,12207,12209,
        12239,12253,12263,12269,12277,12287,12295,12309,12313,12335,
        12361,12367,12391,12409,12415,12433,12449,12469,12479,12481,
        12499,12505,12517,12527,12549,12559,12597,12615,12621,12639,
        12643,12657,12667,12707,12713,12727,12741,12745,12763,12769,
        12779,12781,12787,12799,12809,12815,12829,12839,12857,12875,
        12883,12889,12901,12929,12947,12953,12959,12969,12983,12987,
        12995,13015,13019,13031,13063,13077,13103,13137,13149,13173,
        13207,13211,13227,13241,13249,13255,13269,13283,13285,13303,
        13307,13321,13339,13351,13377,13389,13407,13417,13431,13435,
        13447,13459,13465,13477,13501,13513,13531,13543,13561,13581,
        13599,13605,13617,13623,13637,13647,13661,13677,13683,13695,
        13725,13729,13753,13773,13781,13785,13795,13801,13807,13825,
        13835,13855,13861,13871,13883,13897,13905,13915,13939,13941,
        13969,13979,13981,13997,14027,14035,14037,14051,14063,14085,
        14095,14107,14113,14125,14137,14145,14151,14163,14193,14199,
        14219,14229,14233,14243,14277,14287,14289,14295,14301,14305,
        14323,14339,14341,14359,14365,14375,14387,14411,14425,14441,
        14449,14499,14513,14523,14537,14543,14561,14579,14585,14593,
        14599,14603,14611,14641,14671,14695,14701,14723,14725,14743,
        14753,14759,14765,14795,14797,14803,14831,14839,14845,14855,
        14889,14895,14909,14929,14941,14945,14951,14963,14965,14985,
        15033,15039,15053,15059,15061,15071,15077,15081,15099,15121,
        15147,15149,15157,15167,15187,15193,15203,15205,15215,15217,
        15223,15243,15257,15269,15273,15287,15291,15313,15335,15347,
        15359,15373,15379,15381,15391,15395,15397,15419,15439,15453,
        15469,15491,15503,15517,15527,15531,15545,15559,15593,15611,
        15613,15619,15639,15643,15649,15661,15667,15669,15681,15693,
        15717,15721,15741,15745,15765,15793,15799,15811,15825,15835,
        15847,15851,15865,15877,15881,15887,15899,15915,15935,15937,
        15955,15973,15977,16011,16035,16061,16069,16087,16093,16097,
        16121,16141,16153,16159,16165,16183,16189,16195,16197,16201,
        16209,16215,16225,16259,16265,16273,16299,16309,16355,16375,
        16381 };
      static double recipd;
      static long long int seed_save = 0;
      long long int seed_temp;
      static long long int v[DIM_MAX2][LOG_MAX];

      if ( !initialized || dim_num != dim_num_save )
      {
        initialized = true;
        for ( i = 0; i < DIM_MAX2; i++ )
          for ( j = 0; j < LOG_MAX; j++ )
            v[i][j] = 0;
    //
    //  Initialize (part of) V.
    //

    //  Warning: long initialization table snipped here!  Add back in before compiling.
    cout << "Add in initialization table and recompile.\n";
    return 1;

    //
    //  Check parameters.
    //
        if ( dim_num < 2 || DIM_MAX2 < dim_num )
        {
          cout << "\n";
          cout << "I8_SOBOL - Fatal error!\n";
          cout << "  The spatial dimension DIM_NUM should satisfy:\n";
          cout << "    2 <= DIM_NUM <= " << DIM_MAX2 << "\n";
          cout << "  But this input value is DIM_NUM = " << dim_num << "\n";
          exit ( 1 );
        }

        dim_num_save = dim_num;
    //
    //  Find the number of bits in ATMOST.
    //
    //  Here, we have short-circuited the computation of MAXCOL from ATMOST, because
    //  in some cases, a compiler was complaining that the value of ATMOST could not
    //  seem to be properly stored.  We only need ATMOST in order to specify MAXCOL,
    //  so if we know what the answer should be we can try to get it this way!
    //  JVB, 24 January 2006.
    //
    //  maxcol = i8_bit_hi1 ( atmost );
    //
        maxcol = 62;
    //
    //  Initialize row 1 of V.
    //
        for ( j = 0; j < maxcol; j++ )
        {
          v[0][j] = 1;
        }
    //
    //  Initialize the remaining rows of V.
    //
        for ( i = 1; i < dim_num; i++ )
        {
    //
    //  The bit pattern of the integer POLY(I) gives the form
    //  of polynomial I.
    //
    //  Find the degree of polynomial I from binary encoding.
    //
          j = poly[i];
          m = count_bits(j);
    //
    //  We expand this bit pattern to separate components
    //  of the logical array INCLUD.
    //
          for ( k = m-1; 0 <= k; k-- )
          {
            includ[k] = j & 1;
            j >>= 1;
          }
    //
    //  Calculate the remaining elements of row I as explained
    //  in Bratley and Fox, section 2.
    //
    //  Some tricky indexing here.  Did I change it correctly?
    //
          for ( j = m; j < maxcol; j++ )
          {
            newv = v[i][j-m];
            l = 1;

            for ( k = 0; k < m; k++ )
            {
              l <<= 1;
              if ( includ[k] )
                newv ^= l * v[i][j-k-1];
            }
            v[i][j] = newv;
          }
        }

    //
    //  Multiply columns of V by appropriate power of 2.
    //
        l = 0;
        for ( j = maxcol - 2; 0 <= j; j-- )
          for (++l, i = 0; i < dim_num; i++ )
            v[i][j] <<= l;
    //
    //  RECIPD is 1/(common denominator of the elements in V).
    //
        recipd = 1.0E+00 / ( ( double ) ( 2 * l ) );    // Faster way of doing this: direct math on exponent field
      }

      if ( *seed < 0 )
        *seed = 0;

      if ( *seed == 0 )
      {
        l = 1;
        for ( i = 0; i < dim_num; i++ )
          lastq[i] = 0;
      }
      else if ( *seed == seed_save + 1 )
      {
        l = i8_bit_lo0 ( *seed );
      }
      else if ( *seed <= seed_save )
      {
        seed_save = 0;
        l = 1;
        for ( i = 0; i < dim_num; i++ )
          lastq[i] = 0;

        for ( seed_temp = seed_save; seed_temp <= (*seed)-1; seed_temp++ )
        {

          l = i8_bit_lo0 ( seed_temp );
          for ( i = 0; i < dim_num; i++ )
            lastq[i] ^= v[i][l-1];
        }
        l = i8_bit_lo0 ( *seed );
      }
      else if ( seed_save+1 < *seed )
      {
        for ( seed_temp = seed_save+1; seed_temp < *seed; seed_temp++ )
        {
          l = i8_bit_lo0 ( seed_temp );
          for ( i = 0; i < dim_num; i++ )
            lastq[i] ^= v[i][l-1];
        }
        l = i8_bit_lo0 ( *seed );
      }
    //
    //  Check that the user is not calling too many times!
    //
      if ( maxcol < l )
      {
        cout << "\n";
        cout << "I8_SOBOL - Fatal error!\n";
        cout << "  The value of SEED seems to be too large!\n";
        cout << "  SEED =   " << *seed  << "\n";
        cout << "  MAXCOL = " << maxcol << "\n";
        cout << "  L =      " << l << "\n";
        exit ( 2 );
      }
    //
    //  Calculate the new components of QUASI.
    //  The caret indicates the bitwise exclusive OR.
    //
      for ( i = 0; i < dim_num; i++ )
      {
        quasi[i] = ( ( double ) lastq[i] ) * recipd;
        lastq[i] ^= v[i][l-1];
      }

      seed_save = *seed;
      *seed = *seed + 1;

      return;
    # undef DIM_MAX
    # undef DIM_MAX2
    # undef LOG_MAX
    }
     
  24. Jun 24, 2008 #23

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    After my improvements to both your function and i8_sobol, and compiler optimization, the program runs in about a third the time: under 17 seconds for ten million. But I still don't know if the results are correct.

    Running the optimized program with n = 10^9 should take about 25 minutes. (Edit: tested at 27 minutes.)
     
    Last edited: Jun 24, 2008
  25. Jun 25, 2008 #24

    CRGreathouse

    User Avatar
    Science Advisor
    Homework Helper

    Putting in basic profiling (and a few more optimizations), I get the following breakdown for 100 million. (Sobol = time in i8_sobol, inmteg = time in Integrand1, other = time for profiling, I/O, etc.)

    Code (Text):
    Time:  183.55s
      Sobol: 18.605s
      Integ: 154.615s
    This means that improving Integrand1 is still the most important part.
     
  26. Jun 26, 2008 #25
    Standard C++ is not a .NET language. Managed C++ is an extension created by Microsoft--not the same thing. (In fact, the update to Managed C++ is C++/CLI which is really its own language and shouldn't even get to have C++ in its name)
     
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook