# If/else C Macro

swartzism
So I have a header file containing things like

Code:
#ifndef _params_h
#define _params_h

#define REAL double // precision of data
#define ALIGN 64    // byte size of alignment
#define DBG 1       // 1 prints out results
#define READIN 1    // 1 reads in white paper data, 0 randomizes data
#define NVAR 128      // ncols
#define NITER 25000    // nrows
#define MCITER 10000  // Monte Carlo iterations
#define RANDMIN -2
#define RANDMAX 2

#endif

and I am using a lot of MKL calls within my program. If I change REAL to float, I get a bunch of errors about those MKL calls expecting double arguments rather than float, so what I would like to do is something like this:

Code:
#ifndef _params_h
#define _params_h

#define REAL double // precision of data
#define ALIGN 64    // byte size of alignment
#define DBG 1       // 1 prints out results
#define READIN 1    // 1 reads in white paper data, 0 randomizes data
#define NVAR 128      // ncols
#define NITER 25000    // nrows
#define MCITER 10000  // Monte Carlo iterations
#define RANDMIN -2
#define RANDMAX 2

#if REAL == double
#define GEMM cblas_dgemm
#define TRANS mkl_dimatcopy
#define CHOL dpotrf
#define INV LAPACKE_dtrtri
#else if REAL == float
#define GEMM cblas_sgemm
#define TRANS mkl_simatcopy
#define CHOL spotrf
#define INV LAPACKE_strtri
#endif

#endif

Would this work? Is there a better way to switch the whole program between single and double precision?

In the meantime, I'm going to go ahead and try to implement this, but I doubt it will work. Ha!

Mentor
This may be related to math library conventions: typically all math functions are written with double arguments, so changing the REAL define to float breaks the expected argument types. Floats are less precise, so library developers chose doubles for all arguments and return values.

Here's some discussion on the use of floats vs. doubles that may sway you to stay with doubles:

http://programmers.stackexchange.co...n-do-you-use-float-and-when-do-you-use-double

Here's a related article on pitfalls common to floating-point usage:

http://www.codeproject.com/Articles/29637/Five-Tips-for-Floating-Point-Programming

Carno Raar
A lot of the time I've found double is faster than float, probably due to implicit conversions somewhere in the codebase. Theoretically they both execute at the same speed inside the CPU (assumption: modern x86), so execution-time differences will come down to data access (negligible if the data is in L1 cache) or to whether you can be more parallel with floats, e.g. shoving two floats into a single register and doing both in a single operation.

I'd really recommend using double unless you have profiling data and you really know what you're doing.

Staff Emeritus
Theoretically they both execute at the same speed inside the CPU (assumption: modern x86), so execution-time differences will come down to data access (negligible if the data is in L1 cache)
In this case, the OP has a 128×25000 matrix. On my computer, which has 6 MB of L3 cache, that matrix won't fit in L3 cache, let alone L1 cache. Given all the cache misses that will inevitably occur, changing from doubles to floats should result in significantly improved performance.

I'd really recommend using double unless you have profiling data and you really know what you're doing.
I can't disagree with you here. Switching from doubles to floats sometimes results in complete precision loss, leaving zero significant digits in the result. A mildly ill-conditioned matrix is all it takes.

Staff Emeritus
So I have a header file containing things like
Code:
#define MCITER 10000  // Monte Carlo iterations
Are you turning off your browser, your email, your chat, *everything*, and spreading those Monte Carlo iterations over every available CPU, or are you running single threaded? If you have a decent laptop, a seven-fold increase in performance is possible. Double that again with a decent desktop machine. And if you have a blade server available, trying to bring it to its knees is always fun.

Carno Raar
In this case, the OP has a 128×25000 matrix. On my computer, which has 6 MB of L3 cache, that matrix won't fit in L3 cache, let alone L1 cache. Given all the cache misses that will inevitably occur, changing from doubles to floats should result in significantly improved performance.

Yes, those numbers are probably the array bounds.

You would be entirely correct if the data were being manipulated in a single block; I didn't provide numbers because I have no way of determining this. If the OP is processing individual rows, they only need to store 128 values at a time.

wle
In this case, the OP has a 128×25000 matrix. On my computer, which has 6 MB of L3 cache, that matrix won't fit in L3 cache, let alone L1 cache. Given all the cache misses that will inevitably occur, changing from doubles to floats should result in significantly improved performance.

Another difference this could make: if you or your Sufficiently Smart Compiler™ can make effective use of SIMD, you can pack twice as many floats as doubles into a register and operate on all of them simultaneously (e.g. add four pairs of floats instead of two pairs of doubles with a single CPU instruction). (I haven't tested how much of a difference this makes.)

swartzism
Sorry for the late response; I was out of town for the 4th. My intent here is first to compare the precision of the answers, and second to investigate the speed increases from exactly what you're all talking about: fitting more data into cache. I don't believe my Monte Carlo loop can be effectively parallelized, since there is a dependence between the (i+1)th and ith iterations, and I fear the cost of communication would void the benefit of parallelization. That's something to test as well. The internal computations parallelize nicely, though.

My exact code listed in the first post works for the double case, but not for the float case. I get warnings of the following type:

Code:
main.c(299): warning #167: argument of type "float *" is incompatible with parameter of type "double *"
MKL_TRANS('R', 'T', m_dim, n_dim, 1.0, r_data, n_dim, m_dim);
^

main.c(342): warning #167: argument of type "float *" is incompatible with parameter of type "double *"
MKL_CHOL(upper, &n_dim, p_data, &n_dim, &info); // Cholesky factorization of C*
^

main.c(425): warning #167: argument of type "float *" is incompatible with parameter of type "double *"
MKL_CHOL(upper, &n_dim, t_data, &n_dim, &info); // Cholesky factorization of T, stored in T
^

main.c(461): warning #167: argument of type "float *" is incompatible with parameter of type "double *"
MKL_INV(CblasRowMajor, 'L', 'N', n_dim, t_data, n_dim);
^

main.c(465): warning #167: argument of type "float *" is incompatible with parameter of type "const double *"
MKL_GEMM(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_dim, n_dim, n_dim, 1.0, p_data, n_dim, t_data, n_dim, 0.0, s_data, n_dim);
^

main.c(465): warning #167: argument of type "float *" is incompatible with parameter of type "const double *"
MKL_GEMM(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_dim, n_dim, n_dim, 1.0, p_data, n_dim, t_data, n_dim, 0.0, s_data, n_dim);
^

main.c(465): warning #167: argument of type "float *" is incompatible with parameter of type "double *"
MKL_GEMM(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_dim, n_dim, n_dim, 1.0, p_data, n_dim, t_data, n_dim, 0.0, s_data, n_dim);
^

main.c(477): warning #167: argument of type "float *" is incompatible with parameter of type "const double *"
MKL_GEMM(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_dim, m_dim, n_dim, 1.0, s_data, n_dim, r_data, m_dim, 1.0, rstarb_data, m_dim); // For (RS')'=SR'
^

main.c(477): warning #167: argument of type "float *" is incompatible with parameter of type "const double *"
MKL_GEMM(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_dim, m_dim, n_dim, 1.0, s_data, n_dim, r_data, m_dim, 1.0, rstarb_data, m_dim); // For (RS')'=SR'
^

main.c(477): warning #167: argument of type "float *" is incompatible with parameter of type "double *"
MKL_GEMM(CblasRowMajor, CblasNoTrans, CblasNoTrans, n_dim, m_dim, n_dim, 1.0, s_data, n_dim, r_data, m_dim, 1.0, rstarb_data, m_dim); // For (RS')'=SR'

I believe this may be due to the issue jedishrfu brought up.

wle
Try:
C:
#define USE_FLOAT        // Comment out this line to use double.

// ...

#ifndef USE_FLOAT
#define REAL double
#define GEMM cblas_dgemm
#define TRANS mkl_dimatcopy
#define CHOL dpotrf
#define INV LAPACKE_dtrtri
#else
#define REAL float
#define GEMM cblas_sgemm
#define TRANS mkl_simatcopy
#define CHOL spotrf
#define INV LAPACKE_strtri
#endif

I did some Googling to try to understand why the attempt in the OP wasn't working (and specifically, why it was failing in the weird way it was) and came across this. It seems that #if can only understand tests involving integers and integer arithmetic. If you do #define REAL float then my understanding is that the following things happen:
1. The macro REAL in the directive #if REAL == double gets expanded to float, so the test becomes float == double.
2. float and double aren't macros (or integer or character literals), so they're interpreted as identifiers.
3. There's a rule saying that identifiers are treated as zero, so the test is treated as 0 == 0, which is 1 (true).
So it's the macros under #if REAL == double that are defined, even if you #define REAL float.

(Incidentally, there's no #else if directive. Use #elif for that.)

swartzism
wle,

Making the modifications you suggested ended up working. Thanks for the help! Now, I'd like to compare the runtimes, but my timing is just printing out zeros for the float case. My timer is

Code:
#include <time.h>
#include <sys/time.h>
#include <assert.h>
#include <sys/mman.h>
#include "params.h"

REAL dtime()
{
    REAL tseconds = 0.0;
    struct timeval mytime;

    gettimeofday(&mytime, (struct timezone *) 0);
    tseconds = (REAL)(mytime.tv_sec + (REAL)mytime.tv_usec*1.0e-6);
    return tseconds;
}

The parameters header file now looks like

Code:
#ifndef _params_h
#define _params_h

#define USE_FLOAT
#define ALIGN 64    // byte size of alignment
#define DBG 0       // 1 prints out results
#define READIN 0    // 1 reads in white paper data, 0 randomizes data
#define NVAR 128      // ncols
#define NITER 25000    // nrows
#define MCITER 1  // Monte Carlo iterations
#define RANDMIN -2
#define RANDMAX 2

#ifndef USE_FLOAT
#define LHP 0
#define REAL double
#define MKL_GEMM cblas_dgemm
#define MKL_TRANS mkl_dimatcopy
#define MKL_CHOL dpotrf
#define MKL_INV LAPACKE_dtrtri
#else
#define LHP 1
#define REAL float
#define MKL_GEMM cblas_sgemm
#define MKL_TRANS mkl_simatcopy
#define MKL_CHOL spotrf
#define MKL_INV LAPACKE_strtri
#endif

#endif

and I'm printing from main with

Code:
   ...
tstart = dtime();
...
tfinish = dtime();

printf("\nTotal run time:\nNumber of Monte Carlo iterations: %d\nNumber of iterations: %d\nNumber of variables: %d\nTotal time: %f (s)\n\n", MCITER, m_dim, n_dim, tfinish-tstart);
printf("\nRun time of step 2: %f (s)\n", t2time);
printf("\nRun time of step 3: %f (s)\n", t3time);
printf("\nRun time of step 4a: %f (s)\n", t4atime);
printf("\nRun time of step 4b: %f (s)\n", t4btime);
printf("\nRun time of step 5: %f (s)\n", t5time);

Any ideas why I would be getting 0.00000000 for all of my timings?

Carno Raar
I had a quick look, then I saw gettimeofday.

You should almost certainly be using a higher-resolution, monotonic timer for this. gettimeofday() reports wall-clock time, so it's affected by clock and timezone adjustments, and its effective resolution varies between systems.

I can't remember C's library exactly, but I think you want clock_gettime(CLOCK_MONOTONIC, ...) from <time.h> for timing programs. It has higher resolution and, importantly, monotonic clocks only ever tick upwards regardless of local TZ adjustments. However, I'm rusty with C, so you should check the library docs for the correct function on your platform. In C++ you would use std::chrono.

swartzism
Turns out that double versus single precision in this case does not make much of a difference.

Code:
Double:                                    Single:
Total run time: 12.210 (s)                 Total run time: 11.240 (s)
Number of Monte Carlo iterations: 1        Number of Monte Carlo iterations: 1
Number of iterations: 25000                Number of iterations: 25000
Number of variables: 128                   Number of variables: 128
Data precision: double                     Data precision: single

Total run time: 112.20 (s)                 Total run time: 108.770 (s)
Number of Monte Carlo iterations: 10       Number of Monte Carlo iterations: 10
Number of iterations: 25000                Number of iterations: 25000
Number of variables: 128                   Number of variables: 128
Data precision: double                     Data precision: single

Total run time: 1113.580 (s)               Total run time: 1099.50 (s)
Number of Monte Carlo iterations: 100      Number of Monte Carlo iterations: 100
Number of iterations: 25000                Number of iterations: 25000
Number of variables: 128                   Number of variables: 128
Data precision: double                     Data precision: single

Staff Emeritus
In my opinion, that is a rather insignificant performance improvement that comes at the expense of a very significant loss of precision.

swartzism
In my opinion, that is a rather insignificant performance improvement that comes at the expense of a very significant loss of precision.

I agree. I was just interested in the experiment.

JorisL
You can/should look into compiler options.
For a course on Monte Carlo simulations I looked into this for a bit and got astonishing results.

If I'm not mistaken I even got a 10-fold decrease in runtime for a certain simulation.

Edit:
I looked through my files because I remembered writing a readme for my partner.
I used the following command to compile my simulation (a hard-disk fluid in the plane):
Code:
gcc Disks.c -o Disks -lm -m64 -Ofast -flto -march=native -funroll-loops

-m64 signifies that I'm on a 64-bit system.
-Ofast is an aggressive automatic optimisation setting (look into this: it enables -ffast-math, which relaxes strict IEEE floating-point behaviour).

You should find out what these flags do and test what they do for you.

swartzism
You can/should look into compiler options.
For a course on Monte Carlo simulations I looked into this for a bit and got astonishing results.

If I'm not mistaken I even got a 10-fold decrease in runtime for a certain simulation.

The current compiler options I am using are

Code:
icc -c -std=c99 -O3 -opt-streaming-stores auto -qopt-report=3 -opt-multi-version-aggressive -opt-assume-safe-padding -mkl

Do you have any examples of the compiler options you saw increase performance?

JorisL
I edited my post.
I must say I'm not exactly sure what gave the best results.
You have to try and find out what works best on your machine since it can be very machine-dependent.

newjerseyrunner
I would add -DNDEBUG just for good measure.

swartzism
I would add -DNDEBUG just for good measure.

What does NDEBUG do? I've read that it strips out asserts during preprocessing, but I don't use any.

newjerseyrunner
What does NDEBUG do? I just read that it preprocesses asserts, but I don't use any.
You may not, but some header libraries may. It's not a huge optimization, but it may remove some unnecessary code.