Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Strmm Issue

  1. Feb 8, 2013 #1
    Right now, I'm putting together a program to perform a lower-triangular matrix matrix multiply, which should be straightforward, but I'm getting crap results. I'm using BLAS's strmm (s for single precision) and the call is of the form:

    Code (Text):
    strmm(&side, &uplo, &transa, &diag, &N, &N, &alpha, *A, &N, *B, &N)
    The full documentation of the subroutine's arguments can be found here: http://software.intel.com/sites/pro...GUID-0E088C97-E200-44A2-AA53-7DDBD070F061.htm

    For this, the main things you need to know are N (dimension of A and B: NxN), A and B. A and B are the lower-triangular square matrices. So I set those matrices correctly (I checked) and then I call strmm in the exact manner I listed above, and it returns the exact same B matrix.

    Code (Text):
    Matrix dimension is set to 5


    Ingoing matrix B:

      2.000   0.000   0.000   0.000   0.000
      2.000   2.000   0.000   0.000   0.000
      2.000   2.000   2.000   0.000   0.000
      2.000   2.000   2.000   2.000   0.000
      2.000   2.000   2.000   2.000   2.000

    Resulting matrix B:

      2.000   0.000   0.000   0.000   0.000
      2.000   2.000   0.000   0.000   0.000
      2.000   2.000   2.000   0.000   0.000
      2.000   2.000   2.000   2.000   0.000
      2.000   2.000   2.000   2.000   2.000
    Does anyone have any experience with using ?trmm or any other BLAS routines? If so, any help would be greatly appreciated.

    Code (Text):
    // System headers
    #include <stdio.h>
    #include <stdlib.h>
    #include <malloc.h>
    #include <stdint.h>
    #include <time.h>

    // MKL header

    #include "mkl.h"

    int main(int argc, char **argv)
    {  
       char side = 'L',     // B := alpha*op(A)*B
            uplo = 'L',     // Matrix is lower triangular
            transa = 'N',   // op(A) = A
            diag = 'U';     // Matrix is not unit triangular
       MKL_INT N = 5, NP;   // M, N, lda, ldb
       float alpha = 1.0;   // Scaling factor
       float A[N][N];       // A matrix (NxN)
       float B[N][N];       // B matrix (NxN)
       int matrix_elements; // Matrix size in elements

       int i, j; // Counters

       time_t t0, t1; // Timers

       printf("\nMatrix dimension is set to %d \n\n", (int)N);

       t0 = time(NULL);

       matrix_elements = N * N;

       // Set A and B to 0.0
       for (i = 0; i < N; i++)
          for (j = 0; j <= i; j++)
          {
             A[i][j] = 0.0;
             B[i][j] = 0.0;
          }

       // Initialize the matrices to be lower triangluar
       for (i = 0; i < N; i++)
          for (j = 0; j <= i; j++)
          {
             A[i][j] = 2.0;
             B[i][j] = 2.0;
          }
       
       // Display the input
       printf("\nIngoing matrix B:\n");
       if (N>10)
       {
          printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
          NP=10;
       }
       else
          NP=N;

       printf("\n");
       for (i = 0; i < NP; i++)
       {
          for (j = 0; j < NP; j++)
             printf("%7.3f ", B[i][j]);
          printf("\n");
       }

       strmm(&side, &uplo, &transa, &diag, &N, &N, &alpha, *A, &N, *B, &N);

       t1 = time(NULL);

       // Display the result
       printf("\nResulting matrix B:\n");
       if (N>10)
       {
          printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
          NP=10;
       }
       else
          NP=N;

       printf("\n");
       for (i = 0; i < NP; i++)
       {
          for (j = 0; j < NP; j++)
             printf("%7.3f ", B[i][j]);
          printf("\n");
       }

       printf ("\nElapsed wall clock time: %ld\n", (long)(t1 - t0));

       return 0;
    }
     
  2. jcsd
  3. Feb 8, 2013 #2
    I found the answer to my question, and it is kind of funny. So all of these BLAS MM subroutines want vector input, so for triangular matrices, you have to scatter your data along a vector: X 0 0 0 X X 0 0 X X X 0 X X X X (for a 4x4 lower-triangular). Another catch is that a lot of these math libraries like BLAS were written for Fortran. The catch there is that Fortran is column-major, so you have to scatter those values backwards. Like so:

    Code (Text):
    // System headers
    #include <stdio.h>
    #include <stdlib.h>
    #include <malloc.h>
    #include <stdint.h>
    #include <time.h>

    // MKL header
    #include "mkl.h"

    int main(int argc, char **argv)
    {
       char transa = 'N', side = 'L', uplo = 'L', diag = 'U'; // Transposition options
       MKL_INT N = 5, NP; // N = M, N, K, lda, ldb, ldc
       float alpha = 1.0; // Scaling factors
       float *A, *B; // Matrices
       int matrix_bytes; // Matrix size in bytes
       int matrix_elements; // Matrix size in elements
       int i, j; // Counters
       time_t t0, t1; // timers

       printf("\nMatrix dimension is set to %d \n\n", (int)N);

       t0 = time(NULL);

       matrix_elements = N * N;
       matrix_bytes = sizeof(float) * matrix_elements;

       // Allocate the matrices
       A = malloc(matrix_bytes);
       if (A == NULL)
       {
          printf("Could not allocate matrix A\n");
          return -1;
       }

       B = malloc(matrix_bytes);
       if (B == NULL)
       {
          printf("Could not allocate matrix B\n");
          return -1;
       }

       for (i = 0; i < matrix_elements; i++)
       {
          A[i] = 0.0;
          B[i] = 0.0;
       }

       // Initialize the matrices
       //for (i = 0; i < matrix_elements; i++)
       for (i = 0; i < N; i++)
          for (j = 0; j <= i; j++)
          {
             A[i+N*j] = 1.0;
             B[i+N*j] = 2.0;
          }
       
       // Display the input
       printf("Ingoing matrix B:\n");
       if (N > 10)
       {
          printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
          NP = 10;
       }
       else
          NP = N;

       printf("\n");
       for (i = 0; i < NP; i++)
       {
          for (j = 0; j < NP; j++)
             printf("%7.3f ", B[i+j*N]);
          printf("\n");
       }

       strmm(&side, &uplo, &transa, &diag, &N, &N, &alpha, A, &N, B, &N);

       t1 = time(NULL);

       // Display the result
       printf("\nResulting matrix B:\n");
       if (N > 10)
       {
          printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
          NP = 10;
       }
       else
          NP = N;

       printf("\n");
       for (i = 0; i < NP; i++)
       {
          for (j = 0; j < NP; j++)
             printf("%7.3f ", B[i + j * N]);
          printf("\n");
       }

       printf ("\nElapsed wall clock time: %ld\n", (long)(t1 - t0));

       // Free the matrix memory
       free(A);
       free(B);

       return 0;
    }
     
    Last edited by a moderator: Feb 8, 2013
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook




Similar Discussions: Strmm Issue
  1. Ifstream issues (Replies: 8)

  2. Weird C++ Array Issue (Replies: 15)

  3. Issues with C++ errors? (Replies: 13)

Loading...