Why does insertion sort work better than quick-sort for small data?

  • Thread starter: Avichal
  • Tags: Data, Sort, Works

Discussion Overview

The discussion revolves around the comparative efficiency of insertion sort and quick-sort for small datasets, specifically when the number of elements is around 30-40. Participants explore the reasons why insertion sort may be preferred despite its worse theoretical time complexity in the worst case.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • Some participants note that while quick-sort has an average-case time complexity of n log n, insertion sort is recommended for small datasets due to its lower overhead per operation.
  • Others argue that each step of quick-sort typically requires more work than a step of insertion sort, suggesting that the constant factors in their time complexities can affect performance for small n.
  • A participant compares the worst-case scenarios of insertion sort and merge sort, detailing the number of iterations involved in each algorithm.
  • Another participant explains the merging process in merge sort, emphasizing the number of comparisons and moves required, and questions the constants involved in the time complexity expressions.
  • Some participants discuss the overhead associated with quick-sort, such as the number of temporary variables and the time taken for array division, which may contribute to its slower performance for small datasets.
  • There is a recognition that the constants mentioned in the comparison of algorithms were exaggerated, but the underlying concern about overhead in more complex algorithms remains relevant.
Areas of Agreement / Disagreement

Participants express differing views on the efficiency of insertion sort versus quick-sort for small datasets, with no consensus reached on the superiority of one algorithm over the other in this context.

Contextual Notes

Participants highlight the importance of considering overhead and constant factors in algorithm performance, particularly for small datasets, but do not resolve the implications of these factors on the overall efficiency of the sorting algorithms discussed.

Avichal
I have seen in books that when the number of elements is small, ~30-40, insertion sort is recommended. Why is that? The worst case of insertion sort is n² whereas for quick-sort it is n log n.

n log n beats n² for all values of n, doesn't it? Then why?
 
Avichal said:
n log n beats n² for all values of n, doesn't it?

Each step of quicksort normally requires more "work" than a step of insertion sort. Compare 10000 n log n with 10 n². Of course, this is an exaggeration, but it illustrates the basic idea.
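To see how constant factors like these play out, here is a minimal C sketch that tabulates both cost models (the 10000 and 10 are just the exaggerated constants from above) and reports where the n log n version actually becomes cheaper:

Code:
#include <stdio.h>
#include <math.h>

/* Tabulate the two made-up cost models and report which one is
   cheaper at each n. */
int main(void) {
    for (long n = 2; n <= 1L << 20; n *= 2) {
        double nlogn = 10000.0 * n * log2((double)n);  /* 10000 n log n */
        double nsq   = 10.0 * (double)n * n;           /* 10 n^2        */
        printf("n=%8ld  10000*n*log2(n)=%16.0f  10*n^2=%16.0f  -> %s\n",
               n, nlogn, nsq, nsq < nlogn ? "n^2 cheaper" : "n log n cheaper");
    }
    return 0;
}

With these constants the crossover lands near n ≈ 14,000, so the "worse" 10 n² algorithm wins at every small n.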
 
Merge sort is much easier for me, so let's compare merge sort with insertion sort.
In the worst case, insertion sort takes 1 + 2 + 3 + ... + n = n(n+1)/2 iterations/steps.
There is no worst case in merge sort. Each pass (dividing the array in two) takes n iterations (the merging). This happens log n or log n + 1 times, so the total is n log n or n log n + n iterations/steps.
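A quick numeric check of these counts (a minimal sketch, assuming n is a power of two so log2(n) is exact): raw step counts already favor merge sort even at small n, so the step counts alone can't explain the book recommendation:

Code:
#include <stdio.h>
#include <math.h>

/* Worst-case step counts from the reasoning above:
   insertion sort: n(n+1)/2, merge sort: n log2(n) + n. */
int main(void) {
    for (int n = 8; n <= 64; n *= 2) {
        long ins = (long)n * (n + 1) / 2;
        long mrg = (long)(n * log2((double)n)) + n;
        printf("n=%2d  insertion=%5ld  merge=%5ld\n", n, ins, mrg);
    }
    return 0;
}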

Where does the constant in front come from?
 
Avichal said:
Merge sort is much easier for me, so let's compare merge sort with insertion sort.
Using a bottom-up merge sort is easier to explain. In a top-down merge sort, no merging takes place until sub-arrays of size 1 are produced by recursively dividing the array by 2 (or creating a tree of indexes on the stack).

The initial condition of a merge sort considers an array of size n as n arrays of size 1 (an array of size 1 can be considered sorted) and the first pass merges those n arrays of size 1 into a secondary buffer producing n/2 sorted arrays of size 2. The next pass merges the n/2 sorted arrays of size 2 back into the primary buffer producing n/4 sorted arrays of size 4. This merge process is repeated until there is 1 sorted array. The number of elements moved on each pass is n, and the number of passes is log2(n), so the number of moves is n log2(n). The worst case number of compares is n log2(n) - n + 1. This is explained next:

On the first pass, each pair of arrays of size 1 is merged taking 1 compare, so that's n/2 compares. On the second pass, the worst-case number of compares for each pair of arrays of size 2 is 3, so that's 3n/4 compares. For the third pass, 7n/8. For the final log2(n) pass, the single pair of arrays of size n/2 is merged taking n-1 compares in the worst case, for a total of (n-1)·n/n = n-1. To calculate the sum, subtract 1x the number of compares from 2x the number of compares:

Code:
  2x # compares = 1n/1 + 3n/2 + 7n/4 + ... 
- 1x # compares =        1n/2 + 3n/4 + ... + (n-1)
------------------------------------------------
  1x # compares =    n +    n +    n + ... - (n-1) = n log2(n) - n + 1
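As a sanity check, summing the per-pass worst-case compares numerically reproduces the closed form (a minimal sketch, assuming n is a power of 2):

Code:
#include <stdio.h>
#include <math.h>

/* The pass that produces runs of size 'size' merges n/size pairs,
   each costing at most size - 1 compares.  Summing over all passes
   should match the closed form n log2(n) - n + 1 derived above. */
int main(void) {
    for (long n = 2; n <= 1024; n *= 2) {
        long sum = 0;
        for (long size = 2; size <= n; size *= 2)
            sum += (size - 1) * (n / size);
        long closed = (long)(n * log2((double)n)) - n + 1;
        printf("n=%5ld  summed=%7ld  closed form=%7ld\n", n, sum, closed);
    }
    return 0;
}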
 
rcgldr said:
Using a bottom-up merge sort is easier to explain. In a top-down merge sort, no merging takes place until sub-arrays of size 1 are produced by recursively dividing the array by 2 (or creating a tree of indexes on the stack).

The initial condition of a merge sort considers an array of size n as n arrays of size 1 (an array of size 1 can be considered sorted) and the first pass merges those n arrays of size 1 into a secondary buffer producing n/2 sorted arrays of size 2. The next pass merges the n/2 sorted arrays of size 2 back into the primary buffer producing n/4 sorted arrays of size 4. This merge process is repeated until there is 1 sorted array. The number of elements moved on each pass is n, and the number of passes is log2(n), so the number of moves is n log2(n). The worst case number of compares is n log2(n) - n + 1.

Thanks for the explanation.
jtbell said:
Each step of quicksort normally requires more "work" than a step of insertion sort. Compare 10000 n log n with 10 n². Of course, this is an exaggeration, but it illustrates the basic idea.
Anyway, where do the constants 10000 and 10 come from? As explained above, the number of steps/comparisons is n log2(n) ... there are no big constants in it.
 
Avichal said:
Thanks for the explanation.
I updated my previous post to explain how the worst-case number of compares is calculated.

Avichal said:
Anyway, where do the constants 10000 and 10 come from?
That was an exaggeration. The issue is the number of temporary variables, like indexes and the size of the array, and how often they have to be updated in order to implement a sort algorithm. So the overhead per comparison, swap, or move may be higher for the "better" algorithms, making them slower if the number of elements in an array is very small.
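For illustration, insertion sort's inner loop carries almost no state beyond the data itself; a quicksort partition step, by contrast, also has to track a pivot, two scanning indexes, and the recursion stack. A minimal sketch:

Code:
#include <stddef.h>

/* Insertion sort: the entire per-step state is one saved value and
   two indexes, so each compare/move carries almost no bookkeeping. */
void insertion_sort(int a[], size_t n) {
    for (size_t i = 1; i < n; i++) {
        int key = a[i];            /* the single temporary */
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];       /* shift instead of swap */
            j--;
        }
        a[j] = key;
    }
}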
 
rcgldr said:
That was an exaggeration. The issue is the number of temporary variables, like indexes and the size of the array, and how often they have to be updated in order to implement a sort algorithm. So the overhead per comparison, swap, or move may be higher for the "better" algorithms, making them slower if the number of elements in an array is very small.

Yeah, I knew they were an exaggeration but couldn't see how any constants could arise. I totally forgot about the overhead: the time taken to divide the array into two, calling other functions, swapping, etc.
I was just looking at the comparisons!
Anyway, thank you for clearing that up.
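In practice, the book recommendation is usually applied by combining the two algorithms: a quicksort that hands partitions below some small cutoff to insertion sort. A minimal sketch, with an illustrative cutoff of 16 (real implementations tune this empirically):

Code:
#include <stddef.h>

enum { CUTOFF = 16 };  /* illustrative; tuned empirically in practice */

static void insertion_sort_range(int a[], size_t lo, size_t hi) {
    for (size_t i = lo + 1; i <= hi; i++) {
        int key = a[i];
        size_t j = i;
        while (j > lo && a[j - 1] > key) { a[j] = a[j - 1]; j--; }
        a[j] = key;
    }
}

static size_t partition(int a[], size_t lo, size_t hi) {
    int pivot = a[hi];                  /* simple last-element pivot */
    size_t i = lo;
    for (size_t j = lo; j < hi; j++)
        if (a[j] < pivot) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++;
        }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;
    return i;
}

/* Quicksort that hands small partitions to insertion sort. */
void hybrid_quicksort(int a[], size_t lo, size_t hi) {
    if (lo >= hi) return;
    if (hi - lo + 1 <= CUTOFF) {
        insertion_sort_range(a, lo, hi);
        return;
    }
    size_t p = partition(a, lo, hi);
    if (p > lo) hybrid_quicksort(a, lo, p - 1);
    hybrid_quicksort(a, p + 1, hi);
}

Called as hybrid_quicksort(a, 0, n - 1) for any n ≥ 1.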
 
