[C++] Alternatives to multi-dimensional vectors

1. Oct 9, 2012

burritoloco

Hello,

I'm wondering what other alternatives, besides multi-dimensional vectors, are there. Can the look-up speed be improved with other types of multi-containers? I have tried using hash maps with tuples as indices, but at least for my specific case, look-up time was slightly slower (perhaps this has more to do with tuple implementation speed; I tried both the std and boost tuples). Are there any other ideas which could potentially be faster? Thanks!

2. Oct 9, 2012

I like Serena

Hi burritoloco!

Typical alternative is a linear vector in which you calculate a multi-dimensional index.
But before you change anything, read on.

Suppose you have a 2-dimensional matrix with nrows and ncolumns.
If you want to look up the entry at (r,c) in a linear vector, you would calculate its index with:
Code (Text):
index = r * ncolumns + c
If you want to find a value, the best way is indeed a hash map.
But do not look up a tuple, but use the index just mentioned.

Note that you can still use multi-dimensional vectors, but to store a value in a hash map, use the equivalent index the entry would have in a linear vector.
Then, to find the actual row and column, use:
Code (Text):
r = index / ncolumns;
c = index % ncolumns;

3. Oct 9, 2012

D H

Staff Emeritus
For what application? If you're talking about matrices and vectors (matrices and vectors as used in math and physics, not those goofy things in the standard library that aren't vectors), you're looking in the wrong direction. Vectors of vectors are slow at best. Other collections are even slower. C++ out of the box isn't particularly well suited to scientific computing. It can, however be adapted quite nicely to scientific computing.

Here's just a small sampler. Note that under the hood I'm using a one dimensional array. The function operator, operator(), nicely hides this fact. I've also shown addition as an operator overload, which may not be desirable. Multiplication of course is also possible as a member function.

Code (Text):
#include <iostream>
#include <cstring>

template <unsigned int nrows, unsigned int ncols>
class Matrix {
double data[nrows*ncols];

public:
Matrix () {
std::memset (data, 0, sizeof(data));
}

virtual ~Matrix () {}

double operator() (unsigned int irow, unsigned int icol) const {
return data[irow*ncols+icol];
}

double& operator() (unsigned int irow, unsigned int icol) {
return data[irow*ncols+icol];
}

Matrix operator+ (const Matrix & addend) {
Matrix result;

for (unsigned int ii = 0; ii < nrows; ++ii) {
for (unsigned int jj = 0; jj < ncols; ++jj) {
unsigned int idx = ii*ncols + jj;
result.data[idx] = data[idx] + addend.data[idx];
}
}

return result;
}
};

int main () {
Matrix<2,2> ident;
ident(0,0) = ident(1,1) = 1.0;

Matrix<2,2> addend;
addend(0,1) = 1.0;
addend(1,1) = 2.0;

Matrix<2,2> sum = ident + addend;

std::cout << "Identity matrix:\n"
<< ident(0,0) << "  " << ident(0,1) << "\n"
<< ident(1,0) << "  " << ident(1,1) << "\n\n";

std::cout << "Addend matrix:\n"
<< addend(0,0) << "  " << addend(0,1) << "\n"
<< addend(1,0) << "  " << addend(1,1) << "\n\n";

std::cout << "Sum:\n"
<< sum(0,0) << "  " << sum(0,1) << "\n"
<< sum(1,0) << "  " << sum(1,1) << "\n";

#if 0
Matrix<3,3> ident_3x3;
Matrix<3,3> sum_3x3 = ident_3x3 + ident;  // Won't compile!
#endif
}
Note that trying to add a 2x2 and a 3x3 matrix won't compile, which is very nice. You can similarly make it so that multiplication fails to compile for ill-formed products (e.g., a 2x3 matrix by a 2x4 matrix).

One final comment: If you want it to be fast, matrix multiplication is non-trivial, particularly if the matrices are large. You have to know a lot about how cache works, and you have use tiling techniques so that memory isn't constantly be swapped into and out of cache.

4. Oct 10, 2012

MisterX

5. Oct 11, 2012

burritoloco

Thanks guys. Sorry for the late reply; so busy these days. Interesting, your comments seem to imply that look-up in the usual multi-containers is even slower compared to the extra multiplication/divisions operations used to obtain the multi-indices, in the linear vector type. In my case, most of the containers needed are 3-dimensional and are used simply for storage (not mathematical objects). Probably one can generalize your index formula for higher dimensions. I'll take a look as soon as I'm done with these terrible things that need to get done, and I'll let you know how it went! Cheers.

6. Oct 12, 2012

I like Serena

Indexing the vectors won't matter much.
That's not really where performance is lost.

Actually, if you would only deal with 2- or 3-dimensional vectors and matrices, it's most efficient to use regular arrays or structures.
In other words, the best method mostly depends on what you want to use them for.

7. Oct 12, 2012

D H

Staff Emeritus
If you know the dimensionality ahead of time (which is often the case in scientific and engineering), those multiplications are done with registers. That's blazingly fast. If the matrix is a part of the object rather than a pointer inside the object, only one memory lookup is needed to access an element of the array. (One extra lookup is needed if the matrix is a pointer to allocated memory.) Compare to a vector of vector of vectors. One memory lookup to get the outer pointer, a second to resolve the first index, a third to resolve the second index, and a fourth to resolve the final index. Going through memory, even level 1 cache, slows things down. It disrupts the computation pipeline. If you get a miss on the level 1 cache, it slows things down a lot. If that miss has to go all the way through to main memory, well, your computer just took a nice long nap. Or it swapped to another thread. You're going through four pointers to get that element. The memory allocations needed to create that vector of vector of vectors makes it a very good bet that you won't be localized; you'll almost certainly have level 1 cache misses.

Keeping memory localized and in level 1 cache is central to the scientific libraries that MisterX mentioned. The LAPACK and LINPACK packages do their very best to avoid cache misses. These libraries have been built and refined over multiple decades. The code is a bit ugly. There's a price to be paid for optimal performance.

One thing you need to ask yourself is whether you want to pay that price. It can be quite high. Some questions to ask yourself:
• Do you have a performance problem?
If you don't, work toward making your code understandable and maintainable. Optimization goes against the grain of almost every other software metric. It's expensive to develop, hard to understand, hard to maintain, and because it's more complex, more prone to bugs.

• Do you have a CPU performance problem?
Suppose you do have a performance problem (i.e., it's not as fast as you want it to be / need it to be). If your program is I/O bound rather than CPU bound you are chasing down the wrong alley here. Making your program take a bit less time in its CPU computations won't help you if your program is spending the majority of its time waiting on an I/O device. Splitting your program into threads is probably a better bet.

• Is your use of matrices the cause of your CPU performance problems?
Suppose you do have a performance problem and you know it's CPU rather than I/O that is causing the problems. It doesn't matter if you are using the least efficient or most efficient data representation scheme possible if your use of matrices consumes one percent of the total CPU time. It doesn't matter all that much even if its 10% rather than 1%. You need to find the dozen lines of code or so out of the thousands / hundreds of thousands that comprise your program that are consuming the vast majority of the CPU. It's usually just a very small number functions that are responsible for the majority of your CPU consumption, and of that small number, only a handful of lines that are the true culprits.

You're just grasping at straws if you don't know the nature or causes of the bottlenecks in your program. You need to find out what those bottlenecks are before you can fix your performance problems.

8. Oct 13, 2012

burritoloco

Well, you guys are right :). I have just tried linear vectors replacing my 3-dim vectors, and yes, the speed increase is significant; about 1000 iterations more (for my specific case) per 30 seconds. Thank you!

p.s. I also tried similar linear hash maps, but the vectors seem faster here, probably due to the high density of my data. Cheers.

Share this great discussion with others via Reddit, Google+, Twitter, or Facebook