Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

If/else C Macro

  1. Jul 2, 2015 #1
    So I have a header file containing things like

    Code (C):
    #ifndef _params_h
    #define _params_h

    #define REAL double // precision of data
    #define ALIGN 64    // byte size of alignment
    #define DBG 1       // 1 prints out results
    #define READIN 1    // 1 reads in white paper data, 0 randomizes data
    #define NVAR 128      // ncols
    #define NITER 25000    // nrows
    #define MCITER 10000  // Monte Carlo iterations
    #define RANDMIN -2
    #define RANDMAX 2

    #endif
    and I am using a lot of MKL calls within my program. If I change REAL to float, I get a bunch of errors about those MKL calls being double and not float, so what I would like to do is something like this

    Code (C):
    #ifndef _params_h
    #define _params_h

    #define REAL double // precision of data
    #define ALIGN 64    // byte size of alignment
    #define DBG 1       // 1 prints out results
    #define READIN 1    // 1 reads in white paper data, 0 randomizes data
    #define NVAR 128      // ncols
    #define NITER 25000    // nrows
    #define MCITER 10000  // Monte Carlo iterations
    #define RANDMIN -2
    #define RANDMAX 2

    #if REAL == double
       #define GEMM cblas_dgemm
       #define TRANS mkl_dimatcopy
       #define CHOL dpotrf
       #define INV LAPACKE_dtrtri
    #else if REAL == float
       #define GEMM cblas_sgemm
       #define TRANS mkl_simatcopy
       #define CHOL spotrf
       #define INV LAPACKE_strtri
    #endif

    #endif
    Would this work? Is there a better way to ubiquitously change from single to double precision?

    In the mean time, I'm going to go ahead and try and implement this, but I doubt it will work. Ha!

    Thanks in advance.
     
  2. jcsd
  3. Jul 2, 2015 #2

    jedishrfu

    Staff: Mentor

    This may be related to math library issues where typically all math functions are written with double arguments so that changing the REAL define to float you are breaking the argument rule. Floats are less precise so library developers chose doubles for all arguments and returns.

    Here's some discussion on the use of floats vs doubles that sway you to stay with doubles

    http://programmers.stackexchange.co...n-do-you-use-float-and-when-do-you-use-double

    Here's a related article on the pitfalls common to floating pt usage:

    http://www.codeproject.com/Articles/29637/Five-Tips-for-Floating-Point-Programming
     
  4. Jul 4, 2015 #3
    A lot of the time I found double is faster than float. It's probably due to implicit conversions somewhere in the codebase. Theortically they both execute at the same speed inside the CPU (assumption: modern x86) so execution time differences will depend on data access (which is going to be negligible if they are in the L1 cache) or whether you can be more parallel with floats, e.g. you shove two floats into a single register and do both in a single operation.

    I'd really recommend using double unless have profiling data and you really know what you're doing.
     
  5. Jul 4, 2015 #4

    D H

    User Avatar
    Staff Emeritus
    Science Advisor

    In this case, the OP has a 128×25000 matrix. On my computer, which has 6 MB of L3 cache, that matrix won't fit in L3 cache, let alone L1 cache. Given all the cache hits that will inevitably occur in this case, changing from doubles to floats should result in significantly improved performance.

    I can't disagree with you here. Switching from doubles to floats sometimes results in complete precision loss, where there are zero significant digits in the result. For example, a mildly ill-conditioned matrix is all it takes.
     
  6. Jul 4, 2015 #5

    D H

    User Avatar
    Staff Emeritus
    Science Advisor

    Are you turning off your browser, your email, your chat, *everything*, and spreading those Monte Carlo iterations over every available CPU, or are you running single threaded? If you have a decent laptop, a seven-fold increase in performance is possible. Double that again with a decent desktop machine. And if you have a blade server available, trying to bring it to its knees is always fun.
     
  7. Jul 4, 2015 #6
    Yes, those numbers are probably the array bounds.

    You would be entirely correct if the data was being manipulated in a single block. I didn't provide numbers as I have no way of determining this. If OP is processing individual rows, OP only needs to store 128 values.
     
  8. Jul 4, 2015 #7

    wle

    User Avatar

    Another difference this could make is that if you or your Sufficiently Smart CompilerTM can make effective use of SIMD, you can potentially pack in registers and simultaneously operate on twice as many floats at a time as doubles (e.g. add four pairs of floats instead of two pairs of doubles with just one CPU instruction). (I haven't tested how much of a difference this makes.)
     
  9. Jul 6, 2015 #8
    Sorry for the late response, I was out of town for the 4th. My intent here is to first compare precision of the answer and second to investigate speed increases due to exactly what you guys are talking about with fitting more data into cache. I don't believe my Monte Carlo loop can be effectively parallelized since there is a dependence between the (i+1)th and (i)th iterations and I fear the cost of communication would void the benefit of parallelization. That's just something to test as well. The internal computations parallelize nicely though.

    My exact code listed in the first post works for the double case, but does not work for the float case. I get errors of the following type:

    Code (Text):
    main.c(299): warning #167: argument of type "float *" is incompatible with parameter of type "double *"
         MKL_TRANS('R', 'T', m_dim, n_dim, 1.0, r_data, n_dim, m_dim);
                                                ^

    main.c(342): warning #167: argument of type "float *" is incompatible with parameter of type "double *"
         MKL_CHOL(upper, &n_dim, p_data, &n_dim, &info); // Cholesky factorization of C*
                                 ^

    main.c(425): warning #167: argument of type "float *" is incompatible with parameter of type "double *"
            MKL_CHOL(upper, &n_dim, t_data, &n_dim, &info); // Cholesky factorization of T, stored in T
                                    ^

    main.c(461): warning #167: argument of type "float *" is incompatible with parameter of type "double *"
            MKL_INV(CblasRowMajor, 'L', 'N', n_dim, t_data, n_dim);
                                                    ^

    main.c(465): warning #167: argument of type "float *" is incompatible with parameter of type "const double *"
            MKL_GEMM(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_dim, n_dim, n_dim, 1.0, p_data, n_dim, t_data, n_dim, 0.0, s_data, n_dim);
                                                                                          ^

    main.c(465): warning #167: argument of type "float *" is incompatible with parameter of type "const double *"
            MKL_GEMM(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_dim, n_dim, n_dim, 1.0, p_data, n_dim, t_data, n_dim, 0.0, s_data, n_dim);
                                                                                                         ^

    main.c(465): warning #167: argument of type "float *" is incompatible with parameter of type "double *"
            MKL_GEMM(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_dim, n_dim, n_dim, 1.0, p_data, n_dim, t_data, n_dim, 0.0, s_data, n_dim);
                                                                                                                             ^

    main.c(477): warning #167: argument of type "float *" is incompatible with parameter of type "const double *"
            MKL_GEMM(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_dim, m_dim, n_dim, 1.0, s_data, n_dim, r_data, m_dim, 1.0, rstarb_data, m_dim); // For (RS')'=SR'
                                                                                          ^

    main.c(477): warning #167: argument of type "float *" is incompatible with parameter of type "const double *"
            MKL_GEMM(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_dim, m_dim, n_dim, 1.0, s_data, n_dim, r_data, m_dim, 1.0, rstarb_data, m_dim); // For (RS')'=SR'
                                                                                                         ^

    main.c(477): warning #167: argument of type "float *" is incompatible with parameter of type "double *"
            MKL_GEMM(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_dim, m_dim, n_dim, 1.0, s_data, n_dim, r_data, m_dim, 1.0, rstarb_data, m_dim); // For (RS')'=SR'
    I believe this may be due to the issue jedishrfu brought up.
     
  10. Jul 7, 2015 #9

    wle

    User Avatar

    Try:
    Code (C):

    #define USE_FLOAT        // Comment out this line to use double.

    // ...

    #ifndef USE_FLOAT
        #define REAL double
        #define GEMM cblas_dgemm
        #define TRANS mkl_dimatcopy
        #define CHOL dpotrf
        #define INV LAPACKE_dtrtri
    #else
        #define REAL float
        #define GEMM cblas_sgemm
        #define TRANS mkl_simatcopy
        #define CHOL spotrf
        #define INV LAPACKE_strtri
    #endif

    I did some Googling to try to understand why the attempt in the OP wasn't working (and specifically, why it was failing in the weird way it was) and came across this. It seems that #if can only understand tests involving integers and integer arithmetic. If you do #define REAL float then my understanding is that the following things happen:
    1. The macro REAL in the directive #if REAL == double gets expanded to float, so the test becomes float == double.
    2. float and double aren't macros (or integer or character literals), so they're interpreted as identifiers.
    3. There's a rule saying that identifiers are treated as zero, so the test is treated as 0 == 0, which is 1 (true).
    So it's the macros under #if REAL == double that are defined, even if you #define REAL float.

    (Incidentally, there's no #else if directive. Use #elif for that.)
     
    Last edited: Jul 7, 2015
  11. Jul 7, 2015 #10
    wle,

    Making the modifications you suggested ended up working. Thanks for the help! Now, I'd like to compare the runtimes, but my timing is just printing out zeros for the float case. My timer is

    Code (C):
    #include <time.h>
    #include <sys/time.h>
    #include <assert.h>
    #include <sys/mman.h>
    #include "params.h"

    REAL dtime()
    {
       REAL tseconds = 0.0;
       struct timeval mytime;
       gettimeofday(&mytime, (struct timezone *) 0);
       tseconds = (REAL)(mytime.tv_sec + (REAL)mytime.tv_usec*1.0e-6);
       return(tseconds);
    }
    The parameters header file now looks like

    Code (C):
    #ifndef _params_h
    #define _params_h

    #define USE_FLOAT
    #define ALIGN 64    // byte size of alignment
    #define DBG 0       // 1 prints out results
    #define READIN 0    // 1 reads in white paper data, 0 randomizes data
    #define NVAR 128      // ncols
    #define NITER 25000    // nrows
    #define MCITER 1  // Monte Carlo iterations
    #define RANDMIN -2
    #define RANDMAX 2

    #ifndef USE_FLOAT
       #define LHP 0
       #define REAL double
       #define MKL_GEMM cblas_dgemm
       #define MKL_TRANS mkl_dimatcopy
       #define MKL_CHOL dpotrf
       #define MKL_INV LAPACKE_dtrtri
    #else
       #define LHP 1
       #define REAL float
       #define MKL_GEMM cblas_sgemm
       #define MKL_TRANS mkl_simatcopy
       #define MKL_CHOL spotrf
       #define MKL_INV LAPACKE_strtri
    #endif

    #endif
    and I'm printing from main with

    Code (C):
       ...
       tstart = dtime();
       ...
       tfinish = dtime();

       printf("\nTotal run time:\nNumber of Monte Carlo iterations: %d\nNumber of iterations: %d\nNumber of variables: %d\nTotal time: %f (s)\n\n", MCITER, m_dim, n_dim, tfinish-tstart);
       printf("\nRun time of step 2: %f (s)\n", t2time);
       printf("\nRun time of step 3: %f (s)\n", t3time);
       printf("\nRun time of step 4a: %f (s)\n", t4atime);
       printf("\nRun time of step 4b: %f (s)\n", t4btime);
       printf("\nRun time of step 5: %f (s)\n", t5time);
    Any ideas why I would be getting 0.00000000 for all of my timings?
     
  12. Jul 7, 2015 #11
    I had a quick look then I saw `gettimeofday`.

    You should almost certainly be using a higher resolution, monotonic timer for this. On a lot of systems that thing only has one second resolution and its affected by local timezone changes.

    I can't remember C's library but I think you want `clock_gettime(CLOCK_MONOTONIC)` from `time.h` for timing programs. It has a higher resolution and importantly, monotonics only ever tick upwards regardless of local TZ adjustments. However I am rusty with C so you'd should look in the C library docs for the correct func for your plaform. In C++ you would use Chrono.
     
  13. Jul 7, 2015 #12
  14. Jul 7, 2015 #13
    Turns out that double versus single precision in this case does not make much of a difference.

    Code (Text):
    Double:                                      Single:
    Total run time: 12.210 (s)                      Total run time: 11.240 (s)
       Number of Monte Carlo iterations: 1             Number of Monte Carlo iterations: 1
       Number of iterations: 25000                   Number of iterations: 25000
       Number of variables: 128                      Number of variables: 128
       Data precision: double                         Data precision: single
       
    Total run time: 112.20 (s)                      Total run time: 108.770 (s)
       Number of Monte Carlo iterations: 10          Number of Monte Carlo iterations: 10
       Number of iterations: 25000                   Number of iterations: 25000
       Number of variables: 128                      Number of variables: 128
       Data precision: double                         Data precision: single
       
    Total run time: 1113.580 (s)                   Total run time: 1099.50 (s)
       Number of Monte Carlo iterations: 100          Number of Monte Carlo iterations: 100
       Number of iterations: 25000                   Number of iterations: 25000
       Number of variables: 128                      Number of variables: 128
       Data precision: double                         Data precision: single
     
  15. Jul 7, 2015 #14

    D H

    User Avatar
    Staff Emeritus
    Science Advisor

    In my opinion, that is a rather insignificant performance improvement that comes at the expense of a very significant loss of precision.
     
  16. Jul 7, 2015 #15
    I agree. I was just interested in the experiment.
     
  17. Jul 7, 2015 #16
    You can/should look into compiler options.
    For a course on Monte Carlo simulations I looked into this for a bit and got astonishing results.

    If I'm not mistaken I even got a 10-fold decrease in runtime for a certain simulation.

    Edit:
    I looked into my files because I remember writing a readme for my partner.
    I used the following command to compile my simulation (hard disks fluid in the plane)
    Code (Text):

    gcc Disks.c -o Disks -lm -m64 -Ofast -flto -march=native -funroll-loops
     
    -m64 signifies that I'm on a 64 bit system.
    -Ofast is an automatic optimisation setting (look into this)

    You should find out what these flags do and test what they do for you
     
  18. Jul 7, 2015 #17
    The current compiler options I am using are

    Code (Text):
    icc -c -std=c99 -O3 -opt-streaming-stores auto -qopt-report=3 -opt-multi-version-aggressive -opt-assume-safe-padding -mkl
    Do you have any examples of the compiler options you saw increase performance?
     
  19. Jul 7, 2015 #18
    I edited my post.
    I must say I'm not exactly sure what gave the best results.
    You have to try and find out what works best on your machine since it can be very machine-dependent.
     
  20. Jul 7, 2015 #19
    I would add -DNDEBUG just for good measure.
     
  21. Jul 7, 2015 #20
    What does NDEBUG do? I just read that it preprocesses asserts, but I don't use any.
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook