MHB Divide Binary Search Tree at Key k: Algorithm & Analysis

AI Thread Summary
The discussion focuses on developing an algorithm to divide a binary search tree (BST) into two smaller trees based on a key value, k. Participants suggest using a recursive approach to extract nodes less than and greater than k, with pointers being adjusted accordingly during the traversal. There are concerns about correctly identifying the nodes to compare with k, as some participants initially misinterpret the problem. A C program is shared that implements the splitting logic, demonstrating how to maintain the properties of the BST while ensuring no nodes are lost during the split. The conversation emphasizes the importance of passing the key value k in recursive calls to facilitate the correct division of the tree.
evinda
Gold Member
MHB
Messages
3,741
Reaction score
0
Hello! (Wave)

Given a binary search tree $B$, I want to write an algorithm, that divides $B$ into two new trees $B_1, B_2$, so that the first one contains all the keys of $B$ that are smaller than $k$ and the second one contains all the keys of $B$ that are greater than $k$.
Hint : Execute a search of $k$, "cutting out" the pointers that "hang" at the left side and the right side of the path. After that, merge the trees of the forest, you get.

I tried to find like that, the position, at which $k$ is:

Code:
if (B==k) p=B;
else if (B>k){
      while (B->left>x){
               B=B->left;
      }
      p=B;
else if (B<x){
      while (B->right<x){
               B=B->right;
      }
      p=B;
}

Is it right? Or have I done something wrong? (Thinking)
 
Technology news on Phys.org
Or do we have to do it in an other way, according to the hint? (Thinking)
 
evinda said:
Hello! (Wave)

Given a binary search tree $B$, I want to write an algorithm, that divides $B$ into two new trees $B_1, B_2$, so that the first one contains all the keys of $B$ that are smaller than $k$ and the second one contains all the keys of $B$ that are greater than $k$.
Hint : Execute a search of $k$, "cutting out" the pointers that "hang" at the left side and the right side of the path. After that, merge the trees of the forest, you get.

I tried to find like that, the position, at which $k$ is:

Code:
if (B==k) p=B;
else if (B>k){
      while (B->left>x){
               B=B->left;
      }
      p=B;
else if (B<x){
      while (B->right<x){
               B=B->right;
      }
      p=B;
}

Is it right? Or have I done something wrong? (Thinking)

Hey! (Smile)

evinda said:
Or do we have to do it in an other way, according to the hint? (Thinking)
I think you are supposed to set up a recursive algorithm.
Something like:
Code:
function algorithm(B, *B1, *B2)
  if B == NULL
    return
  extractLess(B, B->data, B1)
  extractGreater(B, B->data, B2)function extractLess(node, k, *destination)
  if node == NULL
    return
  if node->data < k
    addToTree(node->data, destination)

  extractLess(node->left, k, destination)
  extractLess(node->right, k, destination)

...
(Wasntme)
 
I like Serena said:
Hey! (Smile)

I think you are supposed to set up a recursive algorithm.
Something like:
Code:
function algorithm(B, *B1, *B2)
  if B == NULL
    return
  extractLess(B, B->data, B1)
  extractGreater(B, B->data, B2)function extractLess(node, k, *destination)
  if node == NULL
    return
  if node->data < k
    addToTree(node->data, destination)

  extractLess(node->left, k, destination)
  extractLess(node->right, k, destination)

...
(Wasntme)

If we have, for example, this tree $B$:

View attachment 3614

and we want to divide it into two new trees $B1,B2$, so that the first one contains all the keys of $B$ that are smaller than $5$ and the second one contains all the keys of $B$ that are greater than $5$, won't we compare all the values of the nodes with $12$, instead of $5$? (Worried)
Or have I understood it wrong? (Thinking)
 

Attachments

  • treeee.png
    treeee.png
    3 KB · Views: 95
evinda said:
If we have, for example, this tree $B$:

and we want to divide it into two new trees $B1,B2$, so that the first one contains all the keys of $B$ that are smaller than $5$ and the second one contains all the keys of $B$ that are greater than $5$, won't we compare all the values of the nodes with $12$, instead of $5$? (Worried)
Or have I understood it wrong? (Thinking)

I misread the problem statement, treating $k$ as the value of the root node, which it doesn't have to be. (Blush)
 
I found this to be an interesting exercise. The following somewhat lengthy C program solves the problem. I recommend that you read this, even if you're not familiar with C. There are several binary search tree functions that are applicable in a lot of places.
If anyone can help me, I have noticed empirically, that the heights of the split trees are never more than the height of the original tree. I can't prove this, though.

Code:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#define  MAX (1000)

typedef struct node_tag {
    int key;
    struct node_tag *left_child,*right_child;
}
node;

int insertIntoSearchTree(node* root,int x);
node* buildRandomTree(int n);
node* getNode(void);
int isSearchTree(node* root);
void splitTree(node* root,int x,node** lessRoot,node** greaterRoot);
node* appendToTree(node* root,node* p);
int countNodes(node* root);
int findMinKey(node* root);
int findMaxKey(node* root);
node* copyTree(node* root);

node* getNode(){
    node *p;
    if ((p=(node *) malloc(sizeof(node)))==NULL) {
        fprintf(stderr,"Memory allocation failure\n");
        exit(1);
    }
    p->key=0;
    p->left_child=p->right_child=NULL;
    return(p);
}

// Just frees all the nodes in the tree pointed to by root.

void destroy(node *root) {
    if (root==NULL) {
        return;
    }
    destroy(root->left_child);
    destroy(root->right_child);
    free(root);
}/* Inserts a new node (as a leaf) with key x into the NON-EMPTY binary search tree with root root.
   Return 1 if successful, but 0 if node with key x already in tree.
*/
int insertIntoSearchTree(node* root,int x) {
    node *p,*parent;
    p=root;
    while (p!=NULL && p->key!=x) {
        parent=p;
        if (x<p->key) {
            p=p->left_child;
        }
        else {
            p=p->right_child;
        }
    }
    if (p!=NULL && p->key==x) {
        return(0);
    }
    p=getNode();
    p->key=x;
    if (x<parent->key) {
        parent->left_child=p;
    }
    else {
        parent->right_child=p;
    }
    return(1);
}

/* Following builds a "random" tree with n nodes.  Found by first generating a random permutation and then
inserting the nodes in turn into an initially empty binary tree.
*/
node* buildRandomTree(int n) {
    node* t;
    int i,j,temp;
    int randomPerm[MAX];
    for (i=0;i<n;randomPerm[i]=2*i,i++);
    for (i=n-1;i>0;i--) {
        j=rand()%i;
        temp=randomPerm[i];
        randomPerm[i]=randomPerm[j];
        randomPerm[j]=temp;
    }
    t=getNode();
    t->key=randomPerm[0];
    for (i=1;i<n;i++) {
        insertIntoSearchTree(t,randomPerm[i]); // all components of randomPerm different
    }
    return(t);
}

/*  Finds the minimum key in tree root; artificially returns INT_MAX for
    an empty tree.  Convenient for comparisons made else where.
*/

int findMinKey(node* root) {
    node* p,*parent;
    if (root==NULL) {
        return(INT_MAX);
    }
    for (parent=root,p=root->left_child;p!=NULL;parent=p,p=p->left_child);
    return(parent->key);
}

// Dual of findMinKey

int findMaxKey(node* root) {
    node* p,*parent;
    if (root==NULL) {
        return(INT_MIN);
    }
    for (parent=root,p=root->right_child;p!=NULL;parent=p,p=p->right_child);
    return(parent->key);
}/* isSearchTree returns true iff the binary tree with root root is a binary search tree.
*/
int isSearchTree(node* root) {
    node* p,*parent;
    int min,max;
    if (root==NULL) {
        return(1);
    }
    int valid=isSearchTree(root->left_child) && isSearchTree(root->right_child);
    if (!valid) {
        return(0);
    }
    max=findMaxKey(root->left_child);
    min=findMinKey(root->right_child);
    return(max<root->key && root->key<min);
}

/* SPECIAL append function does NOT work for arbitrary trees.  Here the search
   tree rooted at p is appended to the search tree rooted at root with the result
   a search tree. The condition on p is that ANY node in p with key k would be inserted
   into root at the same place.  This is called only by splitTree.
*/

node* appendToTree(node* root,node* p) {
    int x;
    node *q,*parent;
    if (root==NULL) {
        return(p);
    }
    if (p==NULL) {
        return(root);
    }
    x=p->key;
    parent=NULL;
    q=root;
    while (q!=NULL) {
        parent=q;
        if (x<q->key) {
            q=q->left_child;
        }
        else {
            q=q->right_child;
        }
    }
    if (x<parent->key) {
        parent->left_child=p;
    }
    else {
        parent->right_child=p;
    }
    return(root);
}

/*  The tree at root is split into two trees lessRoot and greaterRoot with the key
    of any node in lessRoot < x and the key of any node in greaterRoot is >= x.  So if
    x is a key in the original tree root, x is the least key in greaterRoot; however,
    it is still valid if x is not a key in root.  No new nodes are allocated, so the original
    tree is invalid after the split.
*/

void splitTree(node* root,int x,node** lessRoot,node** greaterRoot) {
    node* p,*parent;
    p=root;
    while (p!=NULL && p->key!=x) {
        parent=p;
        if (x<p->key) {
            p=p->left_child;
            parent->left_child=NULL;
            *greaterRoot=appendToTree(*greaterRoot,parent);
        }
        else {
            p=p->right_child;
            parent->right_child=NULL;
            *lessRoot=appendToTree(*lessRoot,parent);
        }
    }
    if (p!=NULL) {
        *lessRoot=appendToTree(*lessRoot,p->left_child);
        p->left_child=NULL;
        *greaterRoot=appendToTree(*greaterRoot,p);
    }
}

// Just counts the nodes

int countNodes(node* root) {
    int n;
    if (root==NULL) {
        return(0);
    }
    n=countNodes(root->left_child);
    n+=countNodes(root->right_child);
    return(n+1);
}

/* Debugging aid to make certain above splitTree works. splitTree was called with pivot
   x and the origingal tree had total nodes.
*/

int checkSplitTree(int x,int total,node* lessRoot,node* greaterRoot) {
    int n,max,min;
    n=countNodes(lessRoot)+countNodes(greaterRoot);
    // first make sure no nodes got lost
    if (n!=total) {
        return(0);
    }
    // now the two split trees better be binary search trees
    if (!isSearchTree(lessRoot) || !isSearchTree(greaterRoot)) {
        return(0);
    }
    // finally, the split worked
    max=findMaxKey(lessRoot);
    min=findMinKey(greaterRoot);
    return(max<x && x<=min);
}

// Just copies the tree into a whole new tree.
node* copyTree(node* root) {
    node* p;
    if (root==NULL) {
        return(NULL);
    }
    p=getNode();
    p->key=root->key;
    p->left_child=copyTree(root->left_child);
    p->right_child=copyTree(root->right_child);
    return(p);
}

// Compute the height of the tree:
int height(node* root) {
    int h1,h2;
    if (root==NULL) {
        return(-1);
    }
    h1=height(root->left_child);
    h2=height(root->right_child);
    return((h1<=h2) ? h2+1 : h1+1);
}int main(){
    node *root,*root1;
    node *rootLess,*rootGreater;
    int i,n=800,valid;
    int ht,htLess,htGreater,h;
    //    root=buildComplete(1,n);
    root1=buildRandomTree(n);
    valid=1;
    ht=height(root1);
    htLess=htGreater=INT_MIN;
    for (i=0;i<2*n;i++) {
        root=copyTree(root1);
        rootLess=NULL;
        rootGreater=NULL;
        splitTree(root,i,&rootLess,&rootGreater);
        if (!checkSplitTree(i,n,rootLess,rootGreater)) {
            printf("Oops, something wrong! at %d\n",i);
            valid=0;
            break;
        }
        h=height(rootLess);
        if (h>htLess) {
            htLess=h;
        }
        h=height(rootGreater);
        if (h>htGreater) {
            htGreater=h;
        }
        destroy(rootLess);
        destroy(rootGreater);
    }
    if (valid) {
        printf("everything ok!\n");
        printf("original ht:%d , max of rootLess: %d, max of rootGreater: %d\n",ht,htLess,htGreater);
    }
    return(0);
}
 
Could you describe me what we have to do? (Thinking)
 
evinda said:
Could you describe me what we have to do? (Thinking)

The algorithm I suggested is still a good way to go.
We should just pass $k$ to it, and also pass $k$ on to the recursive calls. (Wasntme)
 
I like Serena said:
The algorithm I suggested is still a good way to go.
We should just pass $k$ to it, and also pass $k$ on to the recursive calls. (Wasntme)

So, does the function have to be of this form? :confused:

Code:
function algorithm(B, k,B1, B2)
  if B == NULL
    return
  extractLess(B, k,B->data, B1)
  extractGreater(B, k,B->data, B2)

If so, what type of parameters should the functions [m] extractLess [/m] and [m] extractGreater [/m] have? (Thinking)
 
  • #10
As I read the original question and the hint, I assumed that the problem was to split the given tree into two new trees with no new nodes allocated. As I understand ILikeSerena's suggestion, his algorithm creates new trees with new nodes. It also appears that his suggestion is of order n (number of nodes in the original tree), whereas the solution I gave is of order the height of the original tree -- for a reasonably balanced tree, this will result in order ln(n).

Btw, here's some unsolicited advice. I think the study of algorithms is just like mathematics; it's not a spectator sport. Surely you must know some programming language (Java, C++, Python etc.). Implement your proposed solution in your favorite language. When I used to teach algorithms, I almost always expected a working program that implemented a given exercise.
 
  • #11
johng said:
As I read the original question and the hint, I assumed that the problem was to split the given tree into two new trees with no new nodes allocated. As I understand ILikeSerena's suggestion, his algorithm creates new trees with new nodes. It also appears that his suggestion is of order n (number of nodes in the original tree), whereas the solution I gave is of order the height of the original tree -- for a reasonably balanced tree, this will result in order ln(n).

Btw, here's some unsolicited advice. I think the study of algorithms is just like mathematics; it's not a spectator sport. Surely you must know some programming language (Java, C++, Python etc.). Implement your proposed solution in your favorite language. When I used to teach algorithms, I almost always expected a working program that implemented a given exercise.

I want the algorithm to be of order the height of the original tree.. (Nod)

So, in pseudocode should it be like that?

Code:
pointer appendToTree(pointer root,poiter p) {
    int x;
    pointer q,parent;
    if (root==NULL) {
        return(p);
    }
    if (p==NULL) {
        return(root);
    }
    x=p->key;
    parent=NULL;
    q=root;
    while (q!=NULL) {
        parent=q;
        if (x<q->key) {
            q=q->left_child;
        }
        else {
            q=q->right_child;
        }
    }
    if (x<parent->key) {
        parent->left_child=p;
    }
    else {
        parent->right_child=p;
    }
    return(root);
}
void splitTree(pointer root,int x,pointer lessRoot,pointer greaterRoot) {
    pointer p,*parent;
    p=root;
    while (p!=NULL && p->key!=x) {
        parent=p;
        if (x<p->key) {
            p=p->left_child;
            parent->left_child=NULL;
            greaterRoot=appendToTree(greaterRoot,parent);
        }
        else {
            p=p->right_child;
            parent->right_child=NULL;
            lessRoot=appendToTree(lessRoot,parent);
        }
    }
    if (p!=NULL) {
        lessRoot=appendToTree(lessRoot,p->left_child);
        p->left_child=NULL;
        greaterRoot=appendToTree(greaterRoot,p);
    }
}

Or is it wrong? (Thinking)

At the algorithm [m] splitTree [/m], why is [m] p [/m] a pointer, but [m]parent[/m] a pointer to a pointer? (Worried)

Can the type of a function in pseudocode be [m] pointer [/m] ? :confused:
 
Back
Top