# Ranking Algorithm

## Main Question or Discussion Point

Not sure if this is the correct place to post this type of question, so I will relocate if need be. What I'm interested in doing is taking an array of numbers and assigning a rank based on ascending order. Explicitly, the ranking of an array would look like

1.534 - 15
-0.887 - 3
-0.489 - 5
0.887 - 13
1.15 - 14
0.157 - 9
-1.15 - 2
0 - 8
0.319 - 10
-0.319 - 6
-1.534 - 1
-0.157 - 7
0.489 - 11
0.674 - 12
-0.674 - 4

So I don't actually want to re-order the array, rather assign a number based on the position's size. I'm coding this in C. It is easy to get the min and max, but how would I go about the rest of the ranking in an efficient manner? Looking ahead, I would like to do this on an array of length 25000!

Related Programming and Computer Science News on Phys.org
I'm thinking of something like

sort array and store in new array sorted
for i from 0 to size(sorted)
for j from 0 to size(sorted)
if sorted(i) == unsorted(j) then rank(i) = j

where unsorted is the original array. Not sure how efficient this is, but it should work.

Svein
In C, you would most likely do it with pointers. Create a matching array with pointers to the elements in the first array and sort them (check your libraries for quicksort).

D H
Staff Emeritus
Not sure how efficient this is, but it should work.
That's not very efficient, and there's a problem if you have duplicate elements.

First things first:
• This is C, not Fortran. The obvious value in C for the smallest value of ranking is zero, not one. Zero works much nicer in C than does one.
• Do you need to know how the array was organized prior to sorting? If you don't, just sort the array and be done with it. In that case, the index numbers *are* the rankings.
Assuming that maintaining the original ordering is important, I'll outline a solution. Oftentimes, it's handy to sort the indices rather than the values. Doing this maintains the original structure. You'll need a comparison function:
Code:
static const double* array_to_be_sorted;
void set_array_to_be_sorted (const double* arr)
{
array_to_be_sorted = arr;
}

int compare_by_index (const void* a, const void* b)
{
double da = array_to_be_sorted[*(size_t*)a];
double db = array_to_be_sorted[*(size_t*)b];
return (da > db) - (da < db);
}
Note that the above uses the global (but static, meaning hidden from the linker) variable array_to_be_sorted. In C++, this need for a global vanishes if one uses a functor or a lambda.

With this, you can sort the array by indices:
Code:
   size_t* sorted = malloc (len*sizeof(size_t));
for (size_t idx = 0; idx < len; idx++) { sorted[idx] = idx; }
set_array_to_be_sorted (arr);
qsort (sorted, len, sizeof(size_t), compare_by_index);
With this, the array sorted gives the index of the elements in arr had that array itself been sorted. You can use these to find the zero-based ranking of elements in arr via
Code:
   size_t* rank = malloc (len*sizeof(size_t));
for (size_t idx = 0; idx < len; idx++) { rank[sorted[idx]] = idx; }

• Silicon Waffle
wle
Note that the above uses the global (but static, meaning hidden from the linker) variable array_to_be_sorted. In C++, this need for a global vanishes if one uses a functor or a lambda.
You can also avoid it by sorting an array of pointers, like Svein suggested.

I could use the practice, so I typed a working example:

C:
#include <stdio.h>
#include <stdlib.h>

typedef int (*qsort_comp)(const void *, const void *);

int dptr_comp(const double **px, const double **py)
{
double x = **px, y = **py;
return (x > y) - (x < y);
}

int rank_elts(const double *base, size_t *rank, size_t n_elts)
{
const double **elts = malloc(sizeof(base) * n_elts);

if (!elts)
return 0;

for (size_t i = 0; i < n_elts; ++i)
elts[i] = base + i;

qsort(elts, n_elts, sizeof(*elts), (qsort_comp)dptr_comp);

for (size_t i = 0; i < n_elts; ++i)
rank[elts[i] - base] = i + 1;

free(elts);
return 1;
}

/* Try it out. */

int main(int argc, char **argv)
{
double my_data[] = { 1.534, -0.887, -0.489,  0.887,  1.15,
0.157, -1.15,   0.0,    0.319, -0.319,
-1.534, -0.157,  0.489,  0.674, -0.674};

size_t rank[sizeof(my_data) / sizeof(*my_data)];
size_t n_elts = sizeof(my_data) / sizeof(*my_data);

if ( !rank_elts(my_data, rank, n_elts) ) {
if (argc > 0)
fprintf(stderr, "%s: ", argv);

fputs("error: unable to allocate memory.\n", stderr);
return EXIT_FAILURE;
}

for (size_t i = 0; i < n_elts; ++i)
printf("%6.3f => %2zu\n", my_data[i], rank[i]);

return 0;
}
This uses that &my_data[n] - &my_data == n in order to recover the index.

• Silicon Waffle
Thanks for the responses! I'll be testing out how quick these are on a 25000x128 size this week and updating with issues/progress.

D H
Staff Emeritus
Thanks for the responses! I'll be testing out how quick these are on a 25000x128 size this week and updating with issues/progress.
If you want to speed things up a bit, take my approach but use an array of int for the indices rather than an array of size_t, and change the types of the loop indices from size_t to int. On a 64 bit machine, an array of pointers will take twice the size as an array of integers. With 3200000 (25000x128) elements, you won't be able to have all of your data in level 2 cache. That reduction in size will result in fewer cache misses. While using an array of pointers (wle's suggestion) does eliminate the need for a global variable, it comes at the expense of double indirection and increased memory consumption.

• Silicon Waffle
Alright boys, what did I do wrong?

C:
#include <stdio.h>
#include <stdlib.h>

static const double* array_to_be_sorted;

void set_array_to_be_sorted(const double* arr)
{
array_to_be_sorted = arr;
}

int compare_by_index(const void* a, const void* b)
{
double da = array_to_be_sorted[*(int*)a];
double db = array_to_be_sorted[*(int*)b];
return (da > db) - (da < db);
}

int main(int argc, char **argv)
{
int len = 15;
double arr[] = { 1.534, -0.887, -0.489, 0.887,  1.15,
0.157, -1.14,  0.0,  0.319, -0.319,
-1.534, -0.157,  0.489, 0.674, -0.674 };
int* sorted = malloc(len*sizeof(int));
int* rank = malloc(len*sizeof(int));

for (int idx = 0; idx < len; idx++)
{
sorted[idx] = idx;
}
set_array_to_be_sorted(arr);
qsort(sorted, len, sizeof(int), compare_by_index);
for (int idx = 0; idx < len; idx++)
{
rank[sorted[idx]] = idx;
}

for (int idx = 0; idx < len; idx++)
{
printf("\t%f\t%d\t%d\n", arr[idx], sorted[idx], rank[idx]);
}

return 0;
}
Gives output

Code:
  1.534000  10  14
-0.887000  6  2
-0.489000  1  4
0.887000  14  12
1.150000  2  13
0.157000  9  8
-1.140000  11  1
0.000000  7  7
0.319000  5  9
-0.319000  8  5
-1.534000  12  0
-0.157000  13  6
0.489000  3  10
0.674000  4  11
-0.674000  0  3

D H
Staff Emeritus
Alright boys, what did I do wrong?
The only thing you did wrong was to pay attention to the array sorted. That array is but an intermediary that helps your get what you want, which is the ranking. Taking your exact output, and discarding the middle item, and prettying things up a bit yields

Code:
  1.534000   14
-0.887000    2
-0.489000    4
0.887000   12
1.150000   13
0.157000    8
-1.140000    1
0.000000    7
0.319000    9
-0.319000    5
-1.534000    0
-0.157000    6
0.489000   10
0.674000   11
-0.674000    3
That's exactly what you want.

• Silicon Waffle
Brain fart! I'm going to attempt to modify things a bit to fit in my overall program. I will update if I run into any issues.

Thanks again for the help.

...
DH,

I have successfully implemented your routine! I modified the above code to be a subroutine taking in a NxK matrix and displaying the sorted columns individually with the following results:

Code:
Time to sort:
Number of iterations: 25000
Number of variables: 128
Total time: 0.481050 (s)
(I have verified the results as well.) The next step for me is to reorder the original column based off of this new rank order and iterate the "guts" of my program a certain number of times. I'm very happy with the sorting time of your method! Thanks again!

I will post my (probably naive) method of reordering the original column sometime this week. Hopefully y'all can help me out with another slick way of doing it!

OK after doing some scribbling on paper, here is a brute force method I've come up with

Code:
read in R, R*, m, n to subroutine sort
/* looking to resort columns of R based off of rank of columns of R */
/* m is number of rows, n is number of columns (R & R* are mxn) */
for i = 0, n
get rank of column i for R, R* with current code (store in rankR, rankR*, respectively)
for j = 0, m
for k = 0, m
if (rankR(k) == rankR*(j))
tmp(j) = R(rankR(k)) /* can use sorted array as tmp to save memory since it is available after ranking is completed */
endif
end k for
end j for
R = tmp
end i for
Is there possibly a slicker way to do this? I had no idea qsort existed, so there may be something clever that I don't even know about!

wle
OK after doing some scribbling on paper, here is a brute force method I've come up with

Code:
read in R, R*, m, n to subroutine sort
/* looking to resort columns of R based off of rank of columns of R */
/* m is number of rows, n is number of columns (R & R* are mxn) */
for i = 0, n
get rank of column i for R, R* with current code (store in rankR, rankR*, respectively)
for j = 0, m
for k = 0, m
if (rankR(k) == rankR*(j))
tmp(j) = R(rankR(k)) /* can use sorted array as tmp to save memory since it is available after ranking is completed */
endif
end k for
end j for
R = tmp
end i for
Is there possibly a slicker way to do this? I had no idea qsort existed, so there may be something clever that I don't even know about!

Assuming rankR and rankR* contain the same integer ranks, shouldn't this bit of your pseudocode:
Code:
      for k = 0, m
if (rankR(k) == rankR*(j))
tmp(j) = R(rankR(k)) /* can use sorted array as tmp to save memory since it is available after ranking is completed */
endif
end k for
just reduce to this:
Code:
      tmp(j) = R(rankR*(j))
?

It's not very clear from your explanations what you actually want to do. Guessing from your pseudocode, it looks like you've got some matrix R and you want to sort its columns so they're in the same order as the columns of another matrix R*. So for example if R*(0,i) is the 5th smallest element in the ith column of R* then, after sorting, you want R(0,i) to be the 5th smallest element in the ith column of R. Is this correct?

Incidentally, if you're using C then you should seriously consider storing your data to be sorted by row instead of by column. In general in C, the array elements a[j][k] and a[j][k+1] are adjacent to one another in memory; the array elements a[j][k] and a[j+1][k] are not. a[j] also is (or evaluates to) a pointer to the first element of the jth row, so you can write code like
C:
double *v = R[j];
v[k] = some_value;  // Same thing as R[j][k] = some_value;
foo(v);             // Call a function that operates on a row.
//   Same thing as foo(R[j]);

wle,

You're exactly right with the row-major bit. I'm so used to Fortran HPC that I often mix these up. I have a Pearsons correlation routine in this program that could possibly be sped up by simply transposing the data.

It's not very clear from your explanations what you actually want to do. Guessing from your pseudocode, it looks like you've got some matrix R and you want to sort its columns so they're in the same order as the columns of another matrix R*. So for example if R*(0,i) is the 5th smallest element in the ith column of R* then, after sorting, you want R(0,i) to be the 5th smallest element in the ith column of R. Is this correct?
This is correct.

wle
This is correct.
In this case, is there a reason you can't simply sort both arrays? What you want to do will involve sorting two arrays anyway.

If you specifically want to sort an array so it's in the same order as another array, then the first point is that the ranks aren't what you want in the end, so you can delete the final loop that assigns them (the loop over rank[sorted[idx]] = idx; from DH's post). You just need a shorter function that orders the indices of a given array:
C:
void sort_indices(const double *arr, size_t *ind, size_t len)
{
for (size_t j = 0; j < len; ++j)
ind[j] = j;

set_array_to_be_sorted(arr);
qsort(ind, len, sizeof(ind), compare_by_index);
}
This sorts an array of indices so that, for example, if ind == 5, it means that arr is the smallest element of the array.

Now let's say you've got two arrays, a and b, and you want to sort a so that its order matches b. You'll want to generate two corresponding arrays of sorted indices, ind_a and ind_b, and use them to relocate the elements of a. You can do this by running tmp[ind_b[j]] = a[ind_a[j]] in a loop* and then copying the contents of tmp to a:
C:
void matching_sort(double *a, const double *b, size_t len)
{
size_t *ind_a = malloc(len * sizeof(size_t));
size_t *ind_b = malloc(len * sizeof(size_t));
double *tmp = malloc(len * sizeof(*a));

sort_indices(a, ind_a, len);
sort_indices(b, ind_b, len);

for (size_t j = 0; j < len; ++j)
tmp[ind_b[j]] = a[ind_a[j]];

for (size_t j = 0; j < len; ++j)
a[j] = tmp[j];

free(ind_a);
free(ind_b);
free(tmp);
}

Finally, since it's actually two tables of numbers you've got, you can simply sort each row in a loop:
C:
int main(void)
{
/* Generate/read your data and store it by row in arrays r and s. */

for (size_t i = 0; i < number_of_rows; ++i)
matching_sort(r[i], s[i], row_length);

/* The rows of r are now ordered the same way as the rows of s. */

return 0;
}

A few remarks:
1. I've omitted error handling for simplicity, to avoid cluttering the example code above. In particular, matching_sort() tries to allocate temporary storage, which can fail (malloc() returns a null pointer to signal this). It's a good habit to explicitly test for and handle this sort of thing (see my post #5 above for example).
2. The way I've structured this code, matching_sort() allocates temporary storage every time you call it on a new row of data, which might significantly affect performance. It's possible to avoid this (e.g., make ind_a, ind_b, and tmp global variables or make matching_sort() take them as additional parameters, and allocate the necessary storage just once in advance, or just write matching_sort() to operate on the entire table) if you find that this slows your code more than you're willing to accept.
3. You can change the size_ts to ints like DH suggested, though my impression is that this is a bit of a hack and a potential bug (I might be wrong and maybe DH can comment on this, but my understanding is that size_t is guaranteed to be large enough to hold any (nonnegative) array index, while there's no such guarantee with int, though in practice this shouldn't be an issue with the array sizes you want to manipulate).

* a[ind_a[j]] is the jth smallest element of a (counting from j = 0) before we've sorted it. b[ind_b[j]] is similarly the jth smallest element of b. If the end goal is for a to be ordered the same way as b, this means that, after reordering, we want a[ind_b[j]] to be the jth smallest element of a, i.e., we want to move a[ind_a[j]] ##\mapsto## a[ind_b[j]].

Last edited:
wle,

This looks great. Per your row-major suggestion, I'm going to run back through my code and transpose the data off the bat so that I'm working with 128x25000 in the end rather than 25000x128. I don't know why I didn't think of that before! I'm hoping to get this rank and sort routine runtime down as low as possible since they may be performed upwards of a million times.

I will let you know when I get through with that and onto your suggestions above. For the time being, the general idea of what I'm doing is this:

Code:
input matrix A
do while some criteria is not met
perform some linear algebra stuff to calculate A*
reshuffle A based on rank of A*
check criteria
end do

...the ranks aren't what you want in the end, so you can delete the final loop that assigns them (the loop over rank[sorted[idx]] = idx; from DH's post).
wle,

I actually do need these rank values (for A*) to calculate the Spearman rho values between rows (I successfully converted to row major) as the Spearmans rank correlation matrix is part of the criteria for convergence.

wle,

I'm having some trouble implementing your method. I'm reading in the R and R* matrices and selecting their rows to operate the sort on, but am running into a segfault somehow. Here is my current code

Code:
#include <stdio.h>

#include <stdlib.h> // qsort
#include <mkl.h>
#include "params.h"

static const REAL* array_to_be_sorted;

int compare_by_indx(const void* a, const void* b)
{
REAL da = array_to_be_sorted[*(int*)a];
REAL db = array_to_be_sorted[*(int*)b];
return (da > db) - (da < db);
}

void set_array_to_be_sorted(const REAL* arr)
{
array_to_be_sorted = arr;
}

void sort_indices(const REAL *arr, int *ind, int len)
{
for (int i = 0; i < len; i++)
{
ind[i] = i;
}
set_array_to_be_sorted(arr);

printf("Printing first row of R\n");
for (int i = 0; i < len; i++)
{
printf("%f ", arr[i]);
}

qsort(ind, len, sizeof(ind), compare_by_indx);
printf("finished that too\n");
}

void matching_sort(REAL *r_data, REAL *rstarb_data, int m_dim, int n_dim)
{
int *ind_a = mkl_malloc(n_dim * sizeof(int), 64);
int *ind_b = mkl_malloc(n_dim * sizeof(int), 64);
REAL *tmp = mkl_malloc(n_dim * sizeof(REAL), 64);
REAL *a = mkl_malloc(n_dim * sizeof(REAL), 64);
REAL *b = mkl_malloc(n_dim * sizeof(REAL), 64);

if (ind_a == NULL || ind_b == NULL || tmp == NULL || a == NULL || b == NULL)
{
printf("\nERROR: Cannot allocate memory for matrices in MATCHING_SORT. Aborting...\n\n");
mkl_free(ind_a);
mkl_free(ind_b);
mkl_free(tmp);
mkl_free(a);
mkl_free(b);
exit(0);
}
else // Zero out arrays
{
for (int i = 0; i < n_dim; i++)
{
ind_a[i] = 0;
ind_b[i] = 0;
tmp[i] = 0.0;
ind_a[i] = 0.0;
ind_b[i] = 0.0;
}
}

for (int i = 0; i < m_dim; i++)
{
int indx = i * n_dim;
for (int j = 0; j < n_dim; j++)
{
a[j] = r_data[indx + j];
b[j] = rstarb_data[indx + j];
printf("%f %f\n", a[j], b[j]);
}

sort_indices(a, ind_a, n_dim);
sort_indices(b, ind_b, n_dim);

for (int j = 0; j < n_dim; j++)
{
tmp[ind_b[j]] = a[ind_a[j]];
}

for (int j = 0; j < n_dim; j++)
{
r_data[j] = tmp[j];
}
}

mkl_free(a);
mkl_free(b);
mkl_free(ind_a);
mkl_free(ind_b);
mkl_free(tmp);
}
and with these debug statements, I get the following output

Code:
1.534000 1.534000
-0.887000 -0.887000
-0.489000 -0.489000
0.887000 0.887000
1.150000 1.150000
0.157000 0.157000
-1.150000 -1.150000
0.000000 0.000000
0.319000 0.319000
-0.319000 -0.319000
-1.534000 -1.534000
-0.157000 -0.157000
0.489000 0.489000
0.674000 0.674000
-0.674000 -0.674000
Printing first row of R
Segmentation fault (core dumped)
Do you have any idea why I'm getting the segfault?

wle,

I fixed the error by basically inlining your sort_indices from my already working ranking routine. This produces what I need

Code:
#include <stdio.h>

#include <stdlib.h> // qsort
#include <mkl.h>
#include "params.h"

static REAL* array_to_be_ranked;

int compare_by_index(const void* a, const void* b)
{
REAL da = array_to_be_ranked[*(int*)a];
REAL db = array_to_be_ranked[*(int*)b];
return (da > db) - (da < db);
}

void set_array_to_be_ranked(REAL* arr)
{
array_to_be_ranked = arr;
}

void matching_sort(REAL *r_data, REAL *rstarb_data, int m_dim, int n_dim)
{
int *ind_a = mkl_malloc(n_dim * sizeof(int), 64);
int *ind_b = mkl_malloc(n_dim * sizeof(int), 64);
int *rank_a = mkl_malloc(n_dim * sizeof(int), 64);
int *rank_b = mkl_malloc(n_dim * sizeof(int), 64);
REAL *tmp = mkl_malloc(n_dim * sizeof(REAL), 64);
REAL *a = mkl_malloc(n_dim * sizeof(REAL), 64);
REAL *b = mkl_malloc(n_dim * sizeof(REAL), 64);

if (ind_a == NULL || ind_b == NULL || tmp == NULL || a == NULL || b == NULL)
{
printf("\nERROR: Cannot allocate memory for matrices in MATCHING_SORT. Aborting...\n\n");
mkl_free(ind_a);
mkl_free(ind_b);
mkl_free(tmp);
mkl_free(a);
mkl_free(b);
exit(0);
}
else // Zero out arrays
{
for (int i = 0; i < n_dim; i++)
{
ind_a[i] = 0;
ind_b[i] = 0;
tmp[i] = 0.0;
a[i] = 0.0;
b[i] = 0.0;
}
}

for (int i = 0; i < m_dim; i++)
{
int indx = i * n_dim;
for (int j = 0; j < n_dim; j++)
{
ind_a[j] = j;
ind_b[j] = j;
a[j] = r_data[indx + j];
b[j] = rstarb_data[indx + j];
}

set_array_to_be_ranked(a);
qsort(ind_a, n_dim, sizeof(int), compare_by_index);

set_array_to_be_ranked(b);
qsort(ind_b, n_dim, sizeof(int), compare_by_index);

for (int j = 0; j < n_dim; j++)
{
tmp[ind_b[j]] = a[ind_a[j]];
}

for (int j = 0; j < n_dim; j++)
{
r_data[indx + j] = tmp[j];
}
}

mkl_free(ind_a);
mkl_free(ind_b);
mkl_free(tmp);
mkl_free(a);
mkl_free(b);
}
I have modified all of my routines (with the exception of this one which I will do later) to read in the arrays they used to allocate to save on memory cost per your suggestion. Once I get that done, put everything in a do-while loop to terminate on a criteria, and verify the results with the white paper I'm testing, it'll be time for me to go to town on vectorization, parallelization, and other optimizations.

You guys have been a great help, I really appreciate it. If you're interested, I'll give initial timings for the 25000x128 case and update with speedups from the optimizations I plan on making.

wle
Do you have any idea why I'm getting the segfault?
I think that was my fault. My guess is that the problem was probably the point where you call qsort() in your sort_indices() function:
C:
   qsort(ind, len, sizeof(ind), compare_by_indx);
The argument sizeof(ind) should have been sizeof(*ind) or sizeof(ind). On my PC (x86-64 architecture), this error doesn't show up if ind is of type size_t *, but an int is 4 bytes while an int * is 8 bytes, which would cause qsort() to think the array of integers to be sorted is spread over twice as much memory as it really is. Using sizeof(int), like in your updated code, also fixes this, though it means that you'll have to remember to also change the argument to qsort() if you later decide to change the type of integer you use in your arrays.

(It's not obvious that the crash occurs earlier since standard output is buffered, meaning that just because you called printf() in the immediately preceding loop doesn't mean the data is immediately displayed. You can call fflush(stdout) to force currently buffered data to be displayed.)

I have modified all of my routines (with the exception of this one which I will do later) to read in the arrays they used to allocate to save on memory cost per your suggestion. Once I get that done, put everything in a do-while loop to terminate on a criteria, and verify the results with the white paper I'm testing, it'll be time for me to go to town on vectorization, parallelization, and other optimizations.
Well the first thing to consider when optimising is whether it's worth it at all. By the sound of things you've already spent at least a week or two developing this code. How often and how long is it actually going to run? If the unoptimised version runs in, say, a couple of hours or even a day or two, is it really a good use of your time to spend many hours or days optimising it?

That said, one thing you can certainly do easily is remove redundant code, since this has the practical benefit of shortening your code (and hence making it clearer) in addition to speeding it up a bit. In your version of matching_sort(),
1. You declare two pointers, int *rank_a and int *rank_b and allocate memory to them, but you never seem to actually use them.
2. This bit of code, which zero-initialises a lot of memory, is redundant:
C:
   else // Zero out arrays
{
for (int i = 0; i < n_dim; i++)
{
ind_a[i] = 0;
ind_b[i] = 0;
tmp[i] = 0.0;
a[i] = 0.0;
b[i] = 0.0;
}
}
You later overwrite all of this memory before doing anything important with it anyway.
3. You also declare and allocate memory for pointers REAL *a and REAL *b which seem unnecessary to me. Since the calls to qsort() don't modify your arrays of REAL numbers, you don't need to copy rows of your initial data. So you can delete all the code involving a and b and change the code using qsort() and the subsequent allocation to tmp to use r_data and rstarb_data directly:
C:
      set_array_to_be_ranked(r_data + indx);
qsort(ind_a, n_dim, sizeof(int), compare_by_index);

set_array_to_be_ranked(rstarb_data + indx);
qsort(ind_b, n_dim, sizeof(int), compare_by_index);

for (int j = 0; j < n_dim; j++)
{
tmp[ind_b[j]] = r_data[indx + ind_a[j]];
}
Alternatively, you can just set a = r_data + indx and b = rstarb_data + indx (so that for example a[j] becomes a synonym for r_data[indx + j]) and don't use mkl_malloc() on or copy to a or b, if you find this clearer.
4. A minor detail, but ++i is a simpler instruction than i++ (semantically, the latter needs to save a copy of the initial value somewhere). In practice your compiler will probably produce identical code if you don't use the return value, but older or less sophisticated compilers might not. I've just checked that both GCC and Clang produce identical code for ++i and i++ (even without -On optimisations), but PCC (an admittedly older and less developed compiler) doesn't unless you explicitly turn optimisation on.

Last edited:
Well the first thing to consider when optimising is whether it's worth it at all. By the sound of things you've already spent at least a week or two developing this code. How often and how long is it actually going to run? If the unoptimised version runs in, say, a couple of hours or even a day or two, is it really a good use of your time to spend many hours or days optimising it?
Absolutely. This is more of a learning process than anything. Plus, it's fun to see something go 7000x faster than an alternate version!
• I have removed the rank_* matrices (thought I needed them when I didn't).
• I initially zero out all arrays in the main program (I have found myself bitten in the butt when I don't do this in the past), so those only happen once and the speed is negligible
• I'll have to test out the pointer point you made. I am so used to avoiding pointers like the plague in coding Fortran that I am not all that familiar with them in C
• I did not know that about ++i versus i++. I'll have to correct that and use it in the future. I am currently using Intel's icc

wle,

I'm running into an issue that I either missed or is new. Matching sort is crapping out after sorting the second row:

Code:
R rank data:
15.0000     3.0000     5.0000    13.0000    14.0000     9.0000     2.0000     8.0000    10.0000     6.0000     1.0000     7.0000    11.0000    12.0000     4.0000
15.0000     5.0000    12.0000     8.0000     6.0000     1.0000     4.0000     3.0000     7.0000     9.0000    13.0000     2.0000    11.0000    10.0000    14.0000
1.0000    13.0000     5.0000     4.0000    11.0000     3.0000     7.0000     9.0000    12.0000     6.0000    14.0000    15.0000     2.0000    10.0000     8.0000
1.0000     3.0000    14.0000    10.0000    12.0000     4.0000     9.0000     6.0000    13.0000     2.0000    15.0000     7.0000    11.0000     8.0000     5.0000
11.0000     7.0000    15.0000     8.0000     9.0000     6.0000     1.0000     4.0000    12.0000    14.0000     5.0000     2.0000     3.0000    13.0000    10.0000
6.0000    12.0000     5.0000     1.0000    14.0000     9.0000     7.0000    13.0000    15.0000     3.0000     2.0000     4.0000     8.0000    10.0000    11.0000

R*B rank data:
15.0000     3.0000     5.0000    13.0000    14.0000     9.0000     2.0000     8.0000    10.0000     6.0000     1.0000     7.0000    11.0000    12.0000     4.0000
15.0000     6.0000    12.0000     8.0000     5.0000     1.0000     4.0000     3.0000     7.0000     9.0000    13.0000     2.0000    11.0000    10.0000    14.0000
5.0000    10.0000     4.0000     7.0000    14.0000     1.0000     2.0000     8.0000    13.0000     6.0000    12.0000    15.0000     3.0000    11.0000     9.0000
2.0000     1.0000    15.0000    10.0000    11.0000     8.0000     9.0000     6.0000    12.0000     3.0000    13.0000     5.0000    14.0000     7.0000     4.0000
1.0000     3.0000    15.0000    12.0000    13.0000    10.0000     5.0000     6.0000    14.0000     8.0000     9.0000     2.0000     7.0000    11.0000     4.0000
15.0000    13.0000     1.0000     2.0000     7.0000     8.0000    11.0000    12.0000     3.0000     6.0000     4.0000    10.0000     9.0000     5.0000    14.0000

Reranked R rank data:
15.0000     3.0000     5.0000    13.0000    14.0000     9.0000     2.0000     8.0000    10.0000     6.0000     1.0000     7.0000    11.0000    12.0000     4.0000
15.0000     6.0000    12.0000     8.0000     5.0000     1.0000     4.0000     3.0000     7.0000     9.0000    13.0000     2.0000    11.0000    10.0000    14.0000
3.0000    12.0000     1.0000     5.0000    10.0000     2.0000     4.0000     8.0000    14.0000     6.0000    11.0000    15.0000     7.0000    13.0000     9.0000
3.0000     2.0000    11.0000    10.0000    13.0000     5.0000     9.0000     6.0000    15.0000     1.0000    14.0000     8.0000    12.0000     4.0000     7.0000
13.0000     3.0000    15.0000    14.0000     5.0000     4.0000    11.0000    10.0000     8.0000    12.0000     1.0000     2.0000     7.0000     9.0000     6.0000
3.0000    13.0000    10.0000     5.0000    11.0000     8.0000    14.0000    12.0000     6.0000    15.0000     1.0000     2.0000     9.0000     4.0000     7.0000
Here is the current code:

Code:
#include <stdio.h>

#include <stdlib.h> // qsort
#include <mkl.h>
#include "params.h"

void show_matrix(REAL *A, int m_dim, int n_dim);

static REAL* array_to_be_sorted;

int compare_by_indx(const void* a, const void* b)
{
REAL da = array_to_be_sorted[*(int*)a];
REAL db = array_to_be_sorted[*(int*)b];
return (da > db) - (da < db);
}

void set_array_to_be_sorted(REAL* arr)
{
array_to_be_sorted = arr;
}

void matching_sort(REAL *r_data, REAL *rstarb_data, int *ind_a, int *ind_b, REAL *tmp, REAL *a, REAL *b, int m_dim, int n_dim)
{
__assume_aligned(r_data, ALIGN);
__assume_aligned(rstarb_data, ALIGN);
__assume_aligned(ind_a, ALIGN);
__assume_aligned(ind_b, ALIGN);
__assume_aligned(tmp, ALIGN);
__assume_aligned(a, ALIGN);
__assume_aligned(b, ALIGN);

for (int i = 0; i < m_dim; ++i)
{
int indx = i * n_dim;
#pragma ivdep
for (int j = 0; j < n_dim; ++j)
{
ind_a[j] = j;
ind_b[j] = j;
}

set_array_to_be_sorted(r_data + indx);
qsort(ind_a, n_dim, sizeof(int), compare_by_indx);

set_array_to_be_sorted(rstarb_data + indx);
qsort(ind_b, n_dim, sizeof(int), compare_by_indx);

for (int j = 0; j < n_dim; ++j)
{
tmp[ind_a[j]] = r_data[ind_b[j] + indx];
}

#pragma ivdep
for (int j = 0; j < n_dim; ++j)
{
r_data[indx + j] = tmp[j];
}
}
}
(I'm in the process of debugging and phasing out a and b, so there's some stuff in here that can be removed) Do you have any idea why this might be happening?

Edit: after some hail mary debugging, it came down to this simple fix

Code:
      for (int j = 0; j < n_dim; ++j)
{
tmp[ind_b[j]] = r_data[ind_a[j] + indx];
}

Last edited:
rcgldr
Homework Helper
Note if making a sorted copy of an array, then either sorted or rank can be used:
Code:
  array     index  sorted   rank
1.534000   0     10       14
-0.887000   1      6        2
-0.489000   2      1        4
0.887000   3     14       12
1.150000   4      2       13
0.157000   5      9        8
-1.140000   6     11        1
0.000000   7      7        7
0.319000   8      5        9
-0.319000   9      8        5
-1.534000  10     12        0
-0.157000  11     13        6
0.489000  12      3       10
0.674000  13      4       11
-0.674000  14      0        3
Code:
    for(i = 0; i < len; i++)
sorted_array[i] = array[sorted[i]];
Code:
    for(i = 0; i < len; i++)
sorted_array[rank[i]] = array[i];
If doing an inplace reorder_according_to, then sorted would be used. Note that this example of reorder_according_to also sorts sorted[], so a copy of sorted would be needed if reordering more than one array according to sorted.

reorder_according_to reorders array[] according to the cycles found in sorted[]. Using the above data, the first cycle is sorted = 10, sorted = 12, sorted = 3, sorted = 14, sorted = 0. So for that cycle, t = array, array = array, array = array, array = array, array = array, array = t, and sorted[{10, 12, 3, 14, 0}] = {10, 12, 3, 14, 0} (which sorts that cycle in sorted). The next cycle: sorted = 6, sorted = 11, sorted = 13, sorted = 4, sorted = 2, sorted = 1. The next cycle: sorted = 9, sorted = 8, sorted = 5. The last cycle is sorted = 7, already sorted. Note that every write puts an element in it's proper location so the time complexity is O(n).

Code:
void reorder_according_to(double array[], int sorted[], int len)
{
int i, j, k;
double t;
for(i = 0; i < len; i++){
if(i != sorted[i]){
t = array[i];
k = i;
while(i != (j = sorted[k])){
/* every store places an element in it's sorted location */
array[k] = array[j];
sorted[k] = k;
k = j;
}
array[k] = t;
sorted[k] = k;
}
}
}

Last edited: