[C++] Alternatives to multi-dimensional vectors

  • C/C++
  • Thread starter burritoloco
  • Start date
  • Tags
    Vectors
In summary, the conversation discusses alternatives to multi-dimensional vectors for improving look-up speed, such as using linear vectors or hash maps. The use of regular arrays or structures is also suggested for efficiency. The focus is on mathematical objects, specifically matrices and vectors. Options for matrix libraries are also mentioned.
  • #1
burritoloco
83
0
Hello,

I'm wondering what other alternatives, besides multi-dimensional vectors, are there. Can the look-up speed be improved with other types of multi-containers? I have tried using hash maps with tuples as indices, but at least for my specific case, look-up time was slightly slower (perhaps this has more to do with tuple implementation speed; I tried both the std and boost tuples). Are there any other ideas which could potentially be faster? Thanks!
 
Technology news on Phys.org
  • #2
Hi burritoloco!

Typical alternative is a linear vector in which you calculate a multi-dimensional index.
But before you change anything, read on.

Suppose you have a 2-dimensional matrix with nrows and ncolumns.
If you want to look up the entry at (r,c) in a linear vector, you would calculate its index with:
Code:
index = r * ncolumns + c

If you want to find a value, the best way is indeed a hash map.
But do not look up a tuple, but use the index just mentioned.

Note that you can still use multi-dimensional vectors, but to store a value in a hash map, use the equivalent index the entry would have in a linear vector.
Then, to find the actual row and column, use:
Code:
r = index / ncolumns; 
c = index % ncolumns;
 
  • #3
burritoloco said:
I'm wondering what other alternatives, besides multi-dimensional vectors, are there. Can the look-up speed be improved with other types of multi-containers?
For what application? If you're talking about matrices and vectors (matrices and vectors as used in math and physics, not those goofy things in the standard library that aren't vectors), you're looking in the wrong direction. Vectors of vectors are slow at best. Other collections are even slower. C++ out of the box isn't particularly well suited to scientific computing. It can, however be adapted quite nicely to scientific computing.

Here's just a small sampler. Note that under the hood I'm using a one dimensional array. The function operator, operator(), nicely hides this fact. I've also shown addition as an operator overload, which may not be desirable. Multiplication of course is also possible as a member function.

Code:
#include <iostream>
#include <cstring>

template <unsigned int nrows, unsigned int ncols>
class Matrix {
   double data[nrows*ncols];

public:
   Matrix () {
      std::memset (data, 0, sizeof(data));
   }   

   virtual ~Matrix () {}

   double operator() (unsigned int irow, unsigned int icol) const {
      return data[irow*ncols+icol];
   }   

   double& operator() (unsigned int irow, unsigned int icol) {
      return data[irow*ncols+icol];
   }   

   Matrix operator+ (const Matrix & addend) {
      Matrix result;

      for (unsigned int ii = 0; ii < nrows; ++ii) {
         for (unsigned int jj = 0; jj < ncols; ++jj) {
            unsigned int idx = ii*ncols + jj; 
            result.data[idx] = data[idx] + addend.data[idx];
         }
      }   

      return result;
   }   
};

int main () {
   Matrix<2,2> ident;
   ident(0,0) = ident(1,1) = 1.0;

   Matrix<2,2> addend;
   addend(0,1) = 1.0;
   addend(1,1) = 2.0;

   Matrix<2,2> sum = ident + addend;


   std::cout << "Identity matrix:\n"
             << ident(0,0) << "  " << ident(0,1) << "\n"
             << ident(1,0) << "  " << ident(1,1) << "\n\n";

   std::cout << "Addend matrix:\n"
             << addend(0,0) << "  " << addend(0,1) << "\n"
             << addend(1,0) << "  " << addend(1,1) << "\n\n";

   std::cout << "Sum:\n"
             << sum(0,0) << "  " << sum(0,1) << "\n"
             << sum(1,0) << "  " << sum(1,1) << "\n";

#if 0
   Matrix<3,3> ident_3x3;
   Matrix<3,3> sum_3x3 = ident_3x3 + ident;  // Won't compile!
#endif
}

Note that trying to add a 2x2 and a 3x3 matrix won't compile, which is very nice. You can similarly make it so that multiplication fails to compile for ill-formed products (e.g., a 2x3 matrix by a 2x4 matrix).

One final comment: If you want it to be fast, matrix multiplication is non-trivial, particularly if the matrices are large. You have to know a lot about how cache works, and you have use tiling techniques so that memory isn't constantly be swapped into and out of cache.
 
  • #5
Thanks guys. Sorry for the late reply; so busy these days. Interesting, your comments seem to imply that look-up in the usual multi-containers is even slower compared to the extra multiplication/divisions operations used to obtain the multi-indices, in the linear vector type. In my case, most of the containers needed are 3-dimensional and are used simply for storage (not mathematical objects). Probably one can generalize your index formula for higher dimensions. I'll take a look as soon as I'm done with these terrible things that need to get done, and I'll let you know how it went! Cheers.
 
  • #6
Indexing the vectors won't matter much.
That's not really where performance is lost.

Actually, if you would only deal with 2- or 3-dimensional vectors and matrices, it's most efficient to use regular arrays or structures.
In other words, the best method mostly depends on what you want to use them for.
 
  • #7
burritoloco said:
Interesting, your comments seem to imply that look-up in the usual multi-containers is even slower compared to the extra multiplication/divisions operations used to obtain the multi-indices, in the linear vector type. In my case, most of the containers needed are 3-dimensional and are used simply for storage (not mathematical objects).
If you know the dimensionality ahead of time (which is often the case in scientific and engineering), those multiplications are done with registers. That's blazingly fast. If the matrix is a part of the object rather than a pointer inside the object, only one memory lookup is needed to access an element of the array. (One extra lookup is needed if the matrix is a pointer to allocated memory.) Compare to a vector of vector of vectors. One memory lookup to get the outer pointer, a second to resolve the first index, a third to resolve the second index, and a fourth to resolve the final index. Going through memory, even level 1 cache, slows things down. It disrupts the computation pipeline. If you get a miss on the level 1 cache, it slows things down a lot. If that miss has to go all the way through to main memory, well, your computer just took a nice long nap. Or it swapped to another thread. You're going through four pointers to get that element. The memory allocations needed to create that vector of vector of vectors makes it a very good bet that you won't be localized; you'll almost certainly have level 1 cache misses.

Keeping memory localized and in level 1 cache is central to the scientific libraries that MisterX mentioned. The LAPACK and LINPACK packages do their very best to avoid cache misses. These libraries have been built and refined over multiple decades. The code is a bit ugly. There's a price to be paid for optimal performance.


One thing you need to ask yourself is whether you want to pay that price. It can be quite high. Some questions to ask yourself:
  • Do you have a performance problem?
    If you don't, work toward making your code understandable and maintainable. Optimization goes against the grain of almost every other software metric. It's expensive to develop, hard to understand, hard to maintain, and because it's more complex, more prone to bugs.

  • Do you have a CPU performance problem?
    Suppose you do have a performance problem (i.e., it's not as fast as you want it to be / need it to be). If your program is I/O bound rather than CPU bound you are chasing down the wrong alley here. Making your program take a bit less time in its CPU computations won't help you if your program is spending the majority of its time waiting on an I/O device. Splitting your program into threads is probably a better bet.

  • Is your use of matrices the cause of your CPU performance problems?
    Suppose you do have a performance problem and you know it's CPU rather than I/O that is causing the problems. It doesn't matter if you are using the least efficient or most efficient data representation scheme possible if your use of matrices consumes one percent of the total CPU time. It doesn't matter all that much even if its 10% rather than 1%. You need to find the dozen lines of code or so out of the thousands / hundreds of thousands that comprise your program that are consuming the vast majority of the CPU. It's usually just a very small number functions that are responsible for the majority of your CPU consumption, and of that small number, only a handful of lines that are the true culprits.

You're just grasping at straws if you don't know the nature or causes of the bottlenecks in your program. You need to find out what those bottlenecks are before you can fix your performance problems.
 
  • #8
Well, you guys are right :). I have just tried linear vectors replacing my 3-dim vectors, and yes, the speed increase is significant; about 1000 iterations more (for my specific case) per 30 seconds. Thank you!

p.s. I also tried similar linear hash maps, but the vectors seem faster here, probably due to the high density of my data. Cheers.
 

1. What are some alternatives to using multi-dimensional vectors in C++?

Some alternatives to using multi-dimensional vectors in C++ include using arrays, nested vectors, or custom data structures such as matrices or linked lists.

2. How do arrays compare to multi-dimensional vectors in terms of performance?

Arrays tend to have better performance than multi-dimensional vectors due to their contiguous memory allocation. However, multi-dimensional vectors offer more flexibility and ease of use.

3. Can I use nested vectors as a replacement for multi-dimensional vectors?

Yes, nested vectors can be used as a replacement for multi-dimensional vectors. However, they may not offer the same level of performance and may require more complex indexing.

4. Are there any drawbacks to using custom data structures instead of multi-dimensional vectors?

Using custom data structures may require more time and effort to implement, and may not have the same level of functionality as multi-dimensional vectors.

5. How do I decide which alternative to use for multi-dimensional vectors?

The best alternative to use for multi-dimensional vectors will depend on the specific needs and requirements of your project. Some factors to consider include performance, ease of use, and the complexity of your data structure.

Similar threads

Replies
7
Views
1K
  • Programming and Computer Science
Replies
5
Views
1K
  • Programming and Computer Science
Replies
3
Views
1K
  • General Math
Replies
11
Views
1K
  • Programming and Computer Science
2
Replies
49
Views
10K
  • Set Theory, Logic, Probability, Statistics
Replies
5
Views
904
  • Programming and Computer Science
Replies
1
Views
1K
  • Special and General Relativity
2
Replies
51
Views
2K
Replies
13
Views
1K
Back
Top