How does radix sort perform on partially sorted data?

  • Thread starter Thread starter rcgldr
  • Start date Start date
  • Tags Tags
    Sorting
Click For Summary
SUMMARY

The discussion focuses on the performance of radix sort compared to other sorting algorithms on partially sorted data. The results indicate that radix sort, implemented in the provided sortv.cpp, achieves a sorting time of 203 ms for 4,194,304 64-bit elements, outperforming hybrid (265 ms), merge (297 ms), and Microsoft sorting algorithms (344 ms and 375 ms). The tests were conducted on a Windows XP64 system using Visual Studio 2005 with an Intel i7 2600K processor. The radix sort implementation utilizes a pre-pass for histogram generation and requires only one temporary array for sorting.

PREREQUISITES
  • Understanding of radix sort algorithm and its implementation
  • Familiarity with C++ programming and Visual Studio 2005
  • Knowledge of sorting algorithm performance metrics
  • Experience with 64-bit data types and memory management
NEXT STEPS
  • Explore the performance of radix sort on partially sorted data
  • Investigate the implementation details of hybrid sorting algorithms
  • Learn about optimizing sorting algorithms for specific data distributions
  • Examine the impact of different data structures on sorting performance
USEFUL FOR

Software developers, algorithm researchers, and data scientists interested in optimizing sorting performance and understanding the efficiency of different sorting algorithms on large datasets.

rcgldr
Homework Helper
Messages
8,948
Reaction score
687
Added radix sort to sort examples in:

http://rcgldr.net/misc/sortv.zip

Results on my system:

Code:
Time to sort 4,194,304 64 bit elements of psuedo random data:

radix sort                 rsortv.cpp    203 ms
hybrid (radix+merge) sort  hsortv.cpp    265 ms
merge sort                 msortv.cpp    297 ms
microsoft sort             sortv.cpp     344 ms
microsoft stable_sort      sortv.cpp     375 ms

64 bit mode, Windows XP64, Visual Studio 2005 compiler.
Intel i7 2600K 3.4ghz, DP67BG motherboard, 4GB ram

Previous threads:

https://www.physicsforums.com/showthread.php?t=218369

https://www.physicsforums.com/showthread.php?t=181589

Code for radix sort function example - it makes one pre-pass over the data array, generating an array of histograms (256 entries for each histogram), then coverts the histograms into the starting indices for each "bucket". Each pass places data into "buckets" based on the 8 bit value of the data, and the passes go from least signficant byte to most significant byte. In this example, 1 pre pass and 8 sort passes are used to sort an array of 64 bit unsigned integers. Only one temporary array (the same size as the original array) is required for the radix sort, using variable size "buckets".

Code:
typedef unsigned __int64    UI64;
typedef unsigned __int64 *  PUI64;

PUI64 RadixSort(PUI64 pData, PUI64 pTemp, size_t count)
{
size_t mIndex[8][256] = {0};            // index matrix
PUI64 pDst, pSrc, pTmp;
size_t i,j,m,n;
UI64 u;

    for(i = 0; i < count; i++){         // generate histograms
        u = pData[i];
        for(j = 0; j < 8; j++){
            mIndex[j][(size_t)(u & 0xff)]++;
            u >>= 8;
        }       
    }
    for(j = 0; j < 8; j++){             // convert to indices
        n = 0;
        for(i = 0; i < 256; i++){
            m = mIndex[j][i];
            mIndex[j][i] = n;
            n += m;
        }       
    }

    pDst = pTemp;                       // radix sort
    pSrc = pData;
    for(j = 0; j < 8; j++){
        for(i = 0; i < count; i++){
            u = pSrc[i];
            m = (size_t)(u >> (j<<3)) & 0xff;
            pDst[mIndex[j][m]++] = u;
        }
        pTmp = pSrc;
        pSrc = pDst;
        pDst = pTmp;
    }

    return(pSrc);
}
 
Last edited:
Technology news on Phys.org
You might want to try comparing how the algorithms perform on partially sorted data. It's just a thought.
 
MisterX said:
You might want to try comparing how the algorithms perform on partially sorted data. It's just a thought.
This was previously done with one of the pointer based text file sorts, msortp.cpp, that optionally takes advantage of ascending or descending sequences (change "#define VARSZGRP 0" to "#define VARSZGRP 1" in msortp.cpp). If the data is already sorted, it determines this with one read pass of the data and completes.

http://rcgldr.net/misc/sortp.zip

This wasn't done with the vector / array test sort programs, since the goal at that time was to benchmark the sorts with pseudo random data.

The previous thread didn't have a proper radix sort, so I added it to the zip file and created this thread.
 

Similar threads

  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 1 ·
Replies
1
Views
4K