Intersection-find Data Structure

  • Thread starter Thread starter mXSCNT
  • Start date Start date
  • Tags Tags
    Data Structure
AI Thread Summary
The discussion revolves around finding a data structure for efficiently managing intersections of indexed sets that need not be disjoint. It highlights that while sub-linear intersection time is feasible for sets with elements in a K-dimensional space, achieving this for general cases is inherently linear due to the necessity of processing each element. For integer sets, using Bucket sort allows for O(N) time complexity for both sorting and intersection. The conversation also suggests that employing a priority queue could optimize repeated intersection operations, particularly for floating-point numbers. Overall, the complexity of intersections varies significantly based on the nature of the sets involved.
mXSCNT
Messages
310
Reaction score
1
Is there a data structure like the union-find data structure, but for intersections (and the sets needn't be disjoint)?

I'm looking for a data structure that stores an indexed set of sets A_1, A_2, ..., A_n, and supports the following operations:
replace the sets A_i, A_j with their intersection A_i n A_j, in sub-linear time, and returns the index of the intersection thus formed.
given the index of a set, list its contents in linear time
 
Technology news on Phys.org
You can do it in sub-linear time only if the elements have a position in some K-dimensional space...allowing you to store them in a BST that can be partitioned sub-linearly for intersection. Otherwise, you're going to be stuck with linear time.
 
Well, suppose the elements are integers (therefore they are in a 1 dimensional space). What would you do with that?

Or did you mean that the {A_i} are "simply shaped" regions in a k-dimensional space (such as cubes or intervals)?
 
mXSCNT said:
Well, suppose the elements are integers (therefore they are in a 1 dimensional space). What would you do with that?

Or did you mean that the {A_i} are "simply shaped" regions in a k-dimensional space (such as cubes or intervals)?

In this case you can store the set as a list of subranges, and the intersection time will be O( (m+n) log (m+n) ), for intersecting subsets that have m and n subranges, respectively.

For example, if your sets are:

A = {1,2,3,4,9,10,11}
B = {3,4,5,6,12,13,14}

Then, represented as subranges you have,
A = { 1-4, 9-11 }
B = {3-6, 12-14 }

Sorting takes O(k log k) time (k=m+n),

1as 3bs 4ae 6be 9as 11ae 12bs 14be

(note: 1as = subrange from A starting with 1)

Then you can perform the final intersection in a final O(k) pass by choosing start point after both subranges have started, and end point as soon as either subrange ends. Result:

AintB = { 3-4 }
 
True, the problem is not difficult if you are only dealing with interval sets. I meant the more general case.
 
mXSCNT said:
True, the problem is not difficult if you are only dealing with interval sets. I meant the more general case.

It is obviously impossible to do this in sub-linear time in the general case because you want to process a linear number of elements. At the very least you will need to consider each element since there is no structure, and that means the algorithm must be at least linear time. QED

However when using integers, you can use Bucket sort to individually sort the two sets of integers in O(N) time, and you can then perform the intersection in one pass in O(N) time using similar method to what I described above. Note that this isn't truly general because it only works on integers and assumes a bounded range on those integers.

If you are repeatedly doing intersections like this, it would make more sense to use a priority queue that keeps them always sorted and then merging can be done in O(N) time for the truly general case (ie, floating point numbers), and with a much lower constant factor.
 
Dear Peeps I have posted a few questions about programing on this sectio of the PF forum. I want to ask you veterans how you folks learn program in assembly and about computer architecture for the x86 family. In addition to finish learning C, I am also reading the book From bits to Gates to C and Beyond. In the book, it uses the mini LC3 assembly language. I also have books on assembly programming and computer architecture. The few famous ones i have are Computer Organization and...
I had a Microsoft Technical interview this past Friday, the question I was asked was this : How do you find the middle value for a dataset that is too big to fit in RAM? I was not able to figure this out during the interview, but I have been look in this all weekend and I read something online that said it can be done at O(N) using something called the counting sort histogram algorithm ( I did not learn that in my advanced data structures and algorithms class). I have watched some youtube...
Back
Top