# Debugging Lower-Triangular Matrix Multiply with BLAS strmm()

• swartzism
In summary, BLAS strmm() is a function used for debugging lower-triangular matrix multiplication. It is a subroutine of BLAS (Basic Linear Algebra Subprograms) that helps to optimize the performance of matrix operations. This function is particularly useful for debugging as it allows for efficient computation of lower-triangular matrix multiplication by using BLAS routines. It can also handle various data types and sizes, making it a versatile tool for debugging matrix operations. Overall, BLAS strmm() is a valuable tool for improving the efficiency and accuracy of lower-triangular matrix multiplication in a variety of applications.
swartzism
Right now, I'm putting together a program to perform a lower-triangular matrix matrix multiply, which should be straightforward, but I'm getting crap results. I'm using BLAS's strmm (s for single precision) and the call is of the form:

Code:
strmm(&side, &uplo, &transa, &diag, &N, &N, &alpha, *A, &N, *B, &N)

The full documentation of the subroutine's arguments can be found here: http://software.intel.com/sites/pro...GUID-0E088C97-E200-44A2-AA53-7DDBD070F061.htm

For this, the main things you need to know are N (dimension of A and B: NxN), A and B. A and B are the lower-triangular square matrices. So I set those matrices correctly (I checked) and then I call strmm in the exact manner I listed above, and it returns the exact same B matrix.

Code:
Matrix dimension is set to 5 Ingoing matrix B:

2.000   0.000   0.000   0.000   0.000
2.000   2.000   0.000   0.000   0.000
2.000   2.000   2.000   0.000   0.000
2.000   2.000   2.000   2.000   0.000
2.000   2.000   2.000   2.000   2.000

Resulting matrix B:

2.000   0.000   0.000   0.000   0.000
2.000   2.000   0.000   0.000   0.000
2.000   2.000   2.000   0.000   0.000
2.000   2.000   2.000   2.000   0.000
2.000   2.000   2.000   2.000   2.000

Does anyone have any experience with using ?trmm or any other BLAS routines? If so, any help would be greatly appreciated.

Code:
// System headers
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <stdint.h>
#include <time.h>

#include "mkl.h"

int main(int argc, char **argv)
{
char side = 'L',     // B := alpha*op(A)*B
uplo = 'L',     // Matrix is lower triangular
transa = 'N',   // op(A) = A
diag = 'U';     // Matrix is not unit triangular
MKL_INT N = 5, NP;   // M, N, lda, ldb
float alpha = 1.0;   // Scaling factor
float A[N][N];       // A matrix (NxN)
float B[N][N];       // B matrix (NxN)
int matrix_elements; // Matrix size in elements

int i, j; // Counters

time_t t0, t1; // Timers

printf("\nMatrix dimension is set to %d \n\n", (int)N);

t0 = time(NULL);

matrix_elements = N * N;

// Set A and B to 0.0
for (i = 0; i < N; i++)
for (j = 0; j <= i; j++)
{
A[i][j] = 0.0;
B[i][j] = 0.0;
}

// Initialize the matrices to be lower triangluar
for (i = 0; i < N; i++)
for (j = 0; j <= i; j++)
{
A[i][j] = 2.0;
B[i][j] = 2.0;
}

// Display the input
printf("\nIngoing matrix B:\n");
if (N>10)
{
printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
NP=10;
}
else
NP=N;

printf("\n");
for (i = 0; i < NP; i++)
{
for (j = 0; j < NP; j++)
printf("%7.3f ", B[i][j]);
printf("\n");
}

strmm(&side, &uplo, &transa, &diag, &N, &N, &alpha, *A, &N, *B, &N);

t1 = time(NULL);

// Display the result
printf("\nResulting matrix B:\n");
if (N>10)
{
printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
NP=10;
}
else
NP=N;

printf("\n");
for (i = 0; i < NP; i++)
{
for (j = 0; j < NP; j++)
printf("%7.3f ", B[i][j]);
printf("\n");
}

printf ("\nElapsed wall clock time: %ld\n", (long)(t1 - t0));

return 0;
}

I found the answer to my question, and it is kind of funny. So all of these BLAS MM subroutines want vector input, so for triangular matrices, you have to scatter your data along a vector: X 0 0 0 X X 0 0 X X X 0 X X X X (for a 4x4 lower-triangular). Another catch is that a lot of these math libraries like BLAS were written for Fortran. The catch there is that Fortran is column-major, so you have to scatter those values backwards. Like so:

Code:
// System headers
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <stdint.h>
#include <time.h>

#include "mkl.h"

int main(int argc, char **argv)
{
char transa = 'N', side = 'L', uplo = 'L', diag = 'U'; // Transposition options
MKL_INT N = 5, NP; // N = M, N, K, lda, ldb, ldc
float alpha = 1.0; // Scaling factors
float *A, *B; // Matrices
int matrix_bytes; // Matrix size in bytes
int matrix_elements; // Matrix size in elements
int i, j; // Counters
time_t t0, t1; // timers

printf("\nMatrix dimension is set to %d \n\n", (int)N);

t0 = time(NULL);

matrix_elements = N * N;
matrix_bytes = sizeof(float) * matrix_elements;

// Allocate the matrices
A = malloc(matrix_bytes);
if (A == NULL)
{
printf("Could not allocate matrix A\n");
return -1;
}

B = malloc(matrix_bytes);
if (B == NULL)
{
printf("Could not allocate matrix B\n");
return -1;
}

for (i = 0; i < matrix_elements; i++)
{
A[i] = 0.0;
B[i] = 0.0;
}

// Initialize the matrices
//for (i = 0; i < matrix_elements; i++)
for (i = 0; i < N; i++)
for (j = 0; j <= i; j++)
{
A[i+N*j] = 1.0;
B[i+N*j] = 2.0;
}

// Display the input
printf("Ingoing matrix B:\n");
if (N > 10)
{
printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
NP = 10;
}
else
NP = N;

printf("\n");
for (i = 0; i < NP; i++)
{
for (j = 0; j < NP; j++)
printf("%7.3f ", B[i+j*N]);
printf("\n");
}

strmm(&side, &uplo, &transa, &diag, &N, &N, &alpha, A, &N, B, &N);

t1 = time(NULL);

// Display the result
printf("\nResulting matrix B:\n");
if (N > 10)
{
printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
NP = 10;
}
else
NP = N;

printf("\n");
for (i = 0; i < NP; i++)
{
for (j = 0; j < NP; j++)
printf("%7.3f ", B[i + j * N]);
printf("\n");
}

printf ("\nElapsed wall clock time: %ld\n", (long)(t1 - t0));

// Free the matrix memory
free(A);
free(B);

return 0;
}

Last edited by a moderator:

## 1. What is BLAS strmm()?

BLAS strmm() is a function in the Basic Linear Algebra Subprograms (BLAS) library that computes the multiplication of a lower-triangular matrix with another matrix. It stands for "Single precision Triangular Matrix-Matrix multiply".

## 2. What is debugging and why is it necessary for strmm()?

Debugging is the process of identifying and fixing errors or bugs in a software program. It is necessary for strmm() because errors can occur during the execution of the function, leading to incorrect results. Debugging helps to ensure that the function performs as intended and produces accurate results.

## 3. How do I debug strmm()?

To debug strmm(), you can start by checking the inputs and making sure they are in the correct format and size. You can also use debugging tools such as print statements or a debugger to track the execution of the function and identify any errors. Testing the function with different input values can also help in detecting and fixing any bugs.

## 4. What are some common errors that can occur in strmm()?

Some common errors that can occur in strmm() include incorrect input sizes, uninitialized variables, and incorrect indexing. These errors can lead to incorrect results or even crashes in the program. Using proper input validation and careful coding can help prevent these errors.

## 5. How can I optimize the performance of strmm()?

To optimize the performance of strmm(), you can use compiler optimizations such as loop unrolling and vectorization. You can also consider using parallel programming techniques to distribute the workload across multiple processors. Additionally, using efficient algorithms and data structures can also improve the performance of the function.

• Programming and Computer Science
Replies
25
Views
2K
• Programming and Computer Science
Replies
4
Views
1K
• Programming and Computer Science
Replies
17
Views
2K
• Programming and Computer Science
Replies
5
Views
1K
• Programming and Computer Science
Replies
5
Views
2K
• Programming and Computer Science
Replies
7
Views
4K
• Programming and Computer Science
Replies
4
Views
1K
• Programming and Computer Science
Replies
1
Views
2K
• Programming and Computer Science
Replies
8
Views
3K
• Atomic and Condensed Matter
Replies
7
Views
501