Permutation Counting Algorithm for Large Integers

  • Thread starter: DrDu
  • Tags: Permutation
AI Thread Summary
The discussion centers on constructing a vector b from a permuted vector a, where the ith element of b counts how many elements a_j with j < i are larger than a_i. The initial inquiry asks for a solution faster than O(n^2), suggesting a possible variant of mergesort. Participants explore several approaches, including a tree-based method in which each node tracks the number of higher values seen so far, which improves efficiency. A detailed implementation of this tree structure is provided, demonstrating how to count higher numbers while inserting new values. Additionally, a mergesort-based solution is proposed, emphasizing the importance of maintaining a counter during the merge step to achieve O(n log n) complexity. The conversation also touches on theoretical considerations regarding operations on variable-length integers, suggesting that while the algorithm itself can be designed to be O(n), practical hardware limitations affect performance.
DrDu
Science Advisor
Remark: This is not homework

Given a vector a of permuted numbers from 1 to n, I want to construct the vector b which contains in the ith position the number of elements a_j with j < i which are larger than a_i. E.g.
a=(5,2,3,1,4), b=(0,1,1,3,1). Is there an algorithm which scales faster than O(n^2)? I think yes, maybe a variant of mergesort or the like.
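
For concreteness, the direct O(n^2) construction I want to beat is just a double loop, sketched here (the name is only for illustration):
Python:
def count_larger_before_naive(a):
    # Quick illustrative sketch of the O(n^2) baseline:
    # b[i] = number of indices j < i with a[j] > a[i]
    return [sum(1 for j in range(i) if a[j] > a[i]) for i in range(len(a))]

print(count_larger_before_naive([5, 2, 3, 1, 4]))  # [0, 1, 1, 3, 1]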

Thank you
DrDu
 
There is no element with j<i if i=1.
 
Just blue-skying here without a lot of thought.

Make the first array 2-dimensional, with the 2nd dimension containing the element index minus 1.
Sort the full 2-dimensional array based on the original data.
The 2nd-dimension data, carrying the original index, will then be in position <No.-of-smaller-values>.

Cheers,
Tom

EDIT: This will give you the ≤ function rather than <. Hope that's adequate. If not, a second pass over the sorted data to massage the duplicates is needed.
 
Dear Tom,

If I understand correctly, you are constructing the inverse permutation. It cannot coincide with the vector b I am looking for, since b usually contains repeated entries, as in the example.
 
Can you use a tree? Each node has a number and at most two children, one with a lower number and one with a higher number. Each node keeps track of how many higher children it has.

You start with a root node containing 5. Then you add 2 as the "lower child" of 5; then 3 as the "higher child" of 2 (it's lower than 5, but 5 already has a lower child and 2 doesn't yet have a higher child); then 1 as the "lower child" of 2 (5 already has a lower child, 2 doesn't); then 4 as the "higher child" of 3 (5 already has a lower child, 2 already has a higher child, but 3 doesn't).

If you keep track of how many child nodes have been added on the higher subtree of each one, I think you can work out how many higher numbers you've already seen without having to check every one, which ought to be better than ##O(n^2)##.
 
Python:
class TreeNode:
    def __init__(self,v):
        self.v=v # A number from the original list
        self.nGreater=0 # The number of higher numbers we've seen.
        self.lowerChild=None
        self.greaterChild=None
    def addNode(self,newV):
        if newV<self.v:
            if self.lowerChild is None:
                # New number is lower than this node and we have not seen any such numbers before. Return 1 (this node) plus how many greater-than-this-node children
                self.lowerChild=TreeNode(newV)
                return 1+self.nGreater
            else:
                # New number is lower than this node. Return 1 (this node) plus how many greater-than-this-node children plus anything between the new node and this node
                return 1+self.nGreater+self.lowerChild.addNode(newV)
        else:
            self.nGreater+=1
            if self.greaterChild is None:
                # Greater than this node and we haven't seen such a number before - return that there are no larger numbers below this node
                self.greaterChild=TreeNode(newV)
                return 0
            else:
                # Greater than this node - return how many greater-than-the-new-node values are found in the subtree
                return self.greaterChild.addNode(newV)
    def printTree(self):
        chstr=""
        if self.lowerChild is None:
            chstr+="-"
        else:
            chstr+=str(self.lowerChild.v)
        if self.greaterChild is None:
            chstr+=" -"
        else:
            chstr+=" "+str(self.greaterChild.v)
        print "Node:",self.v," Children:",chstr
        if self.lowerChild is not None:
            self.lowerChild.printTree()
        if self.greaterChild is not None:
            self.greaterChild.printTree()

a=[5,2,3,1,4]
rootNode=TreeNode(a[0])
b=[0]
for a_i in a[1:]:
    b.append(rootNode.addNode(a_i))
print(b)
rootNode.printTree()
This has not been extensively tested, and was put together in half an hour on a tiny phone. No guarantees it's right!
 
Thank you Ibix!
In the meanwhile, I figured out how to do this using mergesort:

1. Initialize the vector b to n zeros.
2. When merging two sub-lists that are already sorted in inverse (descending) order, build the new list by always taking the larger of the two leading elements. Keep a counter of how many elements have already been taken from the left sub-list. Whenever an element from the right sub-list is appended to the new list, increase its entry of b by the value of that counter.

Edit: replaced right with left.
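
In code, that merge step could look something like this; a rough sketch only, carrying each value's original index through the merge (names are mine, checked just against the example above):
Python:
def count_larger_before(a):
    # b[i] = number of a[j] with j < i that are larger than a[i]
    b = [0] * len(a)

    def merge_sort_desc(items):
        # items: list of (value, original index); returned sorted in descending order
        if len(items) <= 1:
            return items
        mid = len(items) // 2
        left = merge_sort_desc(items[:mid])
        right = merge_sort_desc(items[mid:])
        merged = []
        taken_from_left = 0  # elements already moved from the left sub-list
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i][0] > right[j][0]:
                merged.append(left[i])
                taken_from_left += 1
                i += 1
            else:
                # every element already taken from the left sub-list is larger
                # than this one and precedes it in the original vector a
                b[right[j][1]] += taken_from_left
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        while j < len(right):
            b[right[j][1]] += taken_from_left
            merged.append(right[j])
            j += 1
        return merged

    merge_sort_desc([(v, i) for i, v in enumerate(a)])
    return b

print(count_larger_before([5, 2, 3, 1, 4]))  # [0, 1, 1, 3, 1]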
 
Btw, a note on this: mergesort is O(n log n).

I do believe this can be done in O(n) if one can use unbounded (i.e. infinitely long) integers, by using 2 bitmasks. Of course this is purely theoretical, as no machine can implement unbounded integers.

One could then consider variable-length integers; the algorithm itself is still O(n) operations, but on current hardware each bitmask operation on a variable-length integer is itself O(n), which makes the algorithm O(n^2) in practice. One could argue that this is a hardware limitation: on (theoretical) hardware where bitmask operations on variable-length integers were O(1), the algorithm would be O(n).
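
For what it's worth, here is one way to read the bitmask idea as a Python sketch: a single big-integer mask of the values seen so far plus a popcount of its high bits (not necessarily the two-bitmask scheme meant above, and of course the shift and popcount are not O(1) on real hardware):
Python:
def count_larger_before_bitmask(a):
    # Maintain a big integer whose set bits mark the values seen so far.
    # For each value v, the bits set at positions above v are exactly the
    # previously seen values that are larger than v.
    b = []
    seen = 0
    for v in a:
        b.append(bin(seen >> v).count("1"))  # popcount of the higher bits
        seen |= 1 << v
    return b

print(count_larger_before_bitmask([5, 2, 3, 1, 4]))  # [0, 1, 1, 3, 1]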
 