Debugging Lower-Triangular Matrix Multiply with BLAS strmm()

  • Thread starter Thread starter swartzism
  • Start date Start date
  • Tags Tags
    Debugging Matrix
Click For Summary
SUMMARY

The discussion focuses on debugging the lower-triangular matrix multiplication using the BLAS function strmm(). The user encountered an issue where the resulting matrix B remained unchanged after the function call, despite correctly setting up the input matrices A and B. The solution involved recognizing that BLAS routines expect vector input for triangular matrices, necessitating the use of a one-dimensional array representation. Additionally, the user highlighted the importance of understanding Fortran's column-major storage when working with these libraries.

PREREQUISITES
  • Understanding of BLAS (Basic Linear Algebra Subprograms) and its routines, specifically strmm()
  • Familiarity with matrix representation in programming, particularly lower-triangular matrices
  • Knowledge of memory allocation in C, especially using malloc()
  • Basic understanding of Fortran's column-major order and its implications for C programming
NEXT STEPS
  • Research how to implement matrix operations using BLAS in C, focusing on strmm() and its parameters
  • Learn about memory management in C, specifically dynamic memory allocation and deallocation
  • Explore the differences between row-major and column-major storage in programming languages
  • Investigate other BLAS routines for matrix operations, such as dgemm() for general matrix multiplication
USEFUL FOR

Software developers, numerical analysts, and researchers working with linear algebra computations, particularly those utilizing the Intel MKL and BLAS libraries for performance optimization in scientific computing.

swartzism
Messages
103
Reaction score
0
Right now, I'm putting together a program to perform a lower-triangular matrix matrix multiply, which should be straightforward, but I'm getting crap results. I'm using BLAS's strmm (s for single precision) and the call is of the form:

Code:
strmm(&side, &uplo, &transa, &diag, &N, &N, &alpha, *A, &N, *B, &N)

The full documentation of the subroutine's arguments can be found here: http://software.intel.com/sites/pro...GUID-0E088C97-E200-44A2-AA53-7DDBD070F061.htm

For this, the main things you need to know are N (dimension of A and B: NxN), A and B. A and B are the lower-triangular square matrices. So I set those matrices correctly (I checked) and then I call strmm in the exact manner I listed above, and it returns the exact same B matrix.

Code:
Matrix dimension is set to 5 Ingoing matrix B:

  2.000   0.000   0.000   0.000   0.000 
  2.000   2.000   0.000   0.000   0.000 
  2.000   2.000   2.000   0.000   0.000 
  2.000   2.000   2.000   2.000   0.000 
  2.000   2.000   2.000   2.000   2.000 

Resulting matrix B:

  2.000   0.000   0.000   0.000   0.000 
  2.000   2.000   0.000   0.000   0.000 
  2.000   2.000   2.000   0.000   0.000 
  2.000   2.000   2.000   2.000   0.000 
  2.000   2.000   2.000   2.000   2.000

Does anyone have any experience with using ?trmm or any other BLAS routines? If so, any help would be greatly appreciated.

Code:
// System headers 
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <stdint.h>
#include <time.h>

// MKL header

#include "mkl.h"

int main(int argc, char **argv)
{   
   char side = 'L',     // B := alpha*op(A)*B 
        uplo = 'L',     // Matrix is lower triangular
        transa = 'N',   // op(A) = A
        diag = 'U';     // Matrix is not unit triangular
   MKL_INT N = 5, NP;   // M, N, lda, ldb
   float alpha = 1.0;   // Scaling factor
   float A[N][N];       // A matrix (NxN)
   float B[N][N];       // B matrix (NxN)
   int matrix_elements; // Matrix size in elements

   int i, j; // Counters

   time_t t0, t1; // Timers

   printf("\nMatrix dimension is set to %d \n\n", (int)N);

   t0 = time(NULL);

   matrix_elements = N * N;

   // Set A and B to 0.0
   for (i = 0; i < N; i++) 
      for (j = 0; j <= i; j++) 
      {
         A[i][j] = 0.0; 
         B[i][j] = 0.0; 
      }

   // Initialize the matrices to be lower triangluar
   for (i = 0; i < N; i++) 
      for (j = 0; j <= i; j++) 
      {
         A[i][j] = 2.0; 
         B[i][j] = 2.0; 
      }
   
   // Display the input
   printf("\nIngoing matrix B:\n");
   if (N>10) 
   {
      printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
      NP=10;
   } 
   else 
      NP=N;

   printf("\n");
   for (i = 0; i < NP; i++) 
   {
      for (j = 0; j < NP; j++)
         printf("%7.3f ", B[i][j]);
      printf("\n");
   }

   strmm(&side, &uplo, &transa, &diag, &N, &N, &alpha, *A, &N, *B, &N);

   t1 = time(NULL);

   // Display the result
   printf("\nResulting matrix B:\n");
   if (N>10) 
   {
      printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
      NP=10;
   } 
   else 
      NP=N;

   printf("\n");
   for (i = 0; i < NP; i++) 
   {
      for (j = 0; j < NP; j++)
         printf("%7.3f ", B[i][j]);
      printf("\n");
   }

   printf ("\nElapsed wall clock time: %ld\n", (long)(t1 - t0));

   return 0;
}
 
Technology news on Phys.org
I found the answer to my question, and it is kind of funny. So all of these BLAS MM subroutines want vector input, so for triangular matrices, you have to scatter your data along a vector: X 0 0 0 X X 0 0 X X X 0 X X X X (for a 4x4 lower-triangular). Another catch is that a lot of these math libraries like BLAS were written for Fortran. The catch there is that Fortran is column-major, so you have to scatter those values backwards. Like so:

Code:
// System headers
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <stdint.h>
#include <time.h>

// MKL header
#include "mkl.h"

int main(int argc, char **argv)
{
   char transa = 'N', side = 'L', uplo = 'L', diag = 'U'; // Transposition options
   MKL_INT N = 5, NP; // N = M, N, K, lda, ldb, ldc
   float alpha = 1.0; // Scaling factors 
   float *A, *B; // Matrices 
   int matrix_bytes; // Matrix size in bytes 
   int matrix_elements; // Matrix size in elements
   int i, j; // Counters
   time_t t0, t1; // timers

   printf("\nMatrix dimension is set to %d \n\n", (int)N);

   t0 = time(NULL);

   matrix_elements = N * N;
   matrix_bytes = sizeof(float) * matrix_elements;

   // Allocate the matrices
   A = malloc(matrix_bytes);
   if (A == NULL) 
   {
      printf("Could not allocate matrix A\n");
      return -1;
   }

   B = malloc(matrix_bytes);
   if (B == NULL) 
   {
      printf("Could not allocate matrix B\n");
      return -1;
   }

   for (i = 0; i < matrix_elements; i++) 
   {
      A[i] = 0.0;
      B[i] = 0.0;
   }

   // Initialize the matrices
   //for (i = 0; i < matrix_elements; i++) 
   for (i = 0; i < N; i++) 
      for (j = 0; j <= i; j++) 
      {
         A[i+N*j] = 1.0; 
         B[i+N*j] = 2.0; 
      }
   
   // Display the input
   printf("Ingoing matrix B:\n");
   if (N > 10) 
   {
      printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
      NP = 10;
   } 
   else 
      NP = N;

   printf("\n");
   for (i = 0; i < NP; i++) 
   {
      for (j = 0; j < NP; j++)
         printf("%7.3f ", B[i+j*N]);
      printf("\n");
   }

   strmm(&side, &uplo, &transa, &diag, &N, &N, &alpha, A, &N, B, &N);

   t1 = time(NULL);

   // Display the result
   printf("\nResulting matrix B:\n");
   if (N > 10) 
   {
      printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
      NP = 10;
   } 
   else 
      NP = N;

   printf("\n");
   for (i = 0; i < NP; i++) 
   {
      for (j = 0; j < NP; j++)
         printf("%7.3f ", B[i + j * N]);
      printf("\n");
   }

   printf ("\nElapsed wall clock time: %ld\n", (long)(t1 - t0));

   // Free the matrix memory
   free(A); 
   free(B); 

   return 0;
}
 
Last edited by a moderator:

Similar threads

  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 25 ·
Replies
25
Views
3K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 7 ·
Replies
7
Views
5K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 17 ·
Replies
17
Views
3K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 8 ·
Replies
8
Views
3K
  • · Replies 2 ·
Replies
2
Views
3K