# Intersection-find Data Structure

1. Aug 23, 2009

### mXSCNT

Is there a data structure like the union-find data structure, but for intersections (and the sets needn't be disjoint)?

I'm looking for a data structure that stores an indexed set of sets A_1, A_2, ..., A_n, and supports the following operations:
replace the sets A_i, A_j with their intersection A_i n A_j, in sub-linear time, and returns the index of the intersection thus formed.
given the index of a set, list its contents in linear time

2. Aug 23, 2009

### junglebeast

You can do it in sub-linear time only if the elements have a position in some K-dimensional space...allowing you to store them in a BST that can be partitioned sub-linearly for intersection. Otherwise, you're going to be stuck with linear time.

3. Aug 23, 2009

### mXSCNT

Well, suppose the elements are integers (therefore they are in a 1 dimensional space). What would you do with that?

Or did you mean that the {A_i} are "simply shaped" regions in a k-dimensional space (such as cubes or intervals)?

4. Aug 23, 2009

### junglebeast

In this case you can store the set as a list of subranges, and the intersection time will be O( (m+n) log (m+n) ), for intersecting subsets that have m and n subranges, respectively.

For example, if your sets are:

A = {1,2,3,4,9,10,11}
B = {3,4,5,6,12,13,14}

Then, represented as subranges you have,
A = { 1-4, 9-11 }
B = {3-6, 12-14 }

Sorting takes O(k log k) time (k=m+n),

1as 3bs 4ae 6be 9as 11ae 12bs 14be

(note: 1as = subrange from A starting with 1)

Then you can perform the final intersection in a final O(k) pass by choosing start point after both subranges have started, and end point as soon as either subrange ends. Result:

AintB = { 3-4 }

5. Aug 23, 2009

### mXSCNT

True, the problem is not difficult if you are only dealing with interval sets. I meant the more general case.

6. Aug 24, 2009

### junglebeast

It is obviously impossible to do this in sub-linear time in the general case because you want to process a linear number of elements. At the very least you will need to consider each element since there is no structure, and that means the algorithm must be at least linear time. QED

However when using integers, you can use Bucket sort to individually sort the two sets of integers in O(N) time, and you can then perform the intersection in one pass in O(N) time using similar method to what I described above. Note that this isn't truly general because it only works on integers and assumes a bounded range on those integers.

If you are repeatedly doing intersections like this, it would make more sense to use a priority queue that keeps them always sorted and then merging can be done in O(N) time for the truly general case (ie, floating point numbers), and with a much lower constant factor.