Intersection-find Data Structure

  • Thread starter Thread starter mXSCNT
  • Start date Start date
  • Tags Tags
    Data Structure
Click For Summary

Discussion Overview

The discussion revolves around the concept of a data structure for finding intersections of sets, akin to the union-find data structure, but applicable to non-disjoint sets. Participants explore the feasibility of performing intersection operations efficiently, particularly in sub-linear time, and consider various scenarios including different dimensional spaces and types of elements.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant inquires about a data structure that can replace two sets with their intersection in sub-linear time and return the index of the new set.
  • Another participant suggests that sub-linear time operations may only be feasible if elements are positioned in a K-dimensional space, allowing for the use of a binary search tree (BST) for partitioning.
  • A participant questions the approach if the elements are integers in one-dimensional space and asks whether the sets are "simply shaped" regions in higher dimensions.
  • It is proposed that sets of integers can be represented as lists of subranges, with a specific intersection time complexity provided for intersecting subsets with multiple subranges.
  • Some participants agree that the problem becomes simpler when dealing with interval sets, but emphasize the complexity in more general cases.
  • One participant argues that achieving sub-linear time in the general case is impossible due to the need to process a linear number of elements without structure.
  • Another participant suggests using Bucket sort for integers to achieve linear time intersection and mentions the potential for using a priority queue for repeated intersections in more general cases.

Areas of Agreement / Disagreement

Participants express differing views on the feasibility of achieving sub-linear time for intersections in general cases, with some arguing it is impossible while others propose specific methods for certain types of data. The discussion remains unresolved regarding the optimal approach for non-disjoint sets in various contexts.

Contextual Notes

Limitations include assumptions about the types of elements (e.g., integers vs. general sets) and the dimensionality of the space in which the sets exist. The discussion also highlights the dependency on the structure of the sets for determining intersection efficiency.

mXSCNT
Messages
310
Reaction score
1
Is there a data structure like the union-find data structure, but for intersections (and the sets needn't be disjoint)?

I'm looking for a data structure that stores an indexed set of sets A_1, A_2, ..., A_n, and supports the following operations:
replace the sets A_i, A_j with their intersection A_i n A_j, in sub-linear time, and returns the index of the intersection thus formed.
given the index of a set, list its contents in linear time
 
Technology news on Phys.org
You can do it in sub-linear time only if the elements have a position in some K-dimensional space...allowing you to store them in a BST that can be partitioned sub-linearly for intersection. Otherwise, you're going to be stuck with linear time.
 
Well, suppose the elements are integers (therefore they are in a 1 dimensional space). What would you do with that?

Or did you mean that the {A_i} are "simply shaped" regions in a k-dimensional space (such as cubes or intervals)?
 
mXSCNT said:
Well, suppose the elements are integers (therefore they are in a 1 dimensional space). What would you do with that?

Or did you mean that the {A_i} are "simply shaped" regions in a k-dimensional space (such as cubes or intervals)?

In this case you can store the set as a list of subranges, and the intersection time will be O( (m+n) log (m+n) ), for intersecting subsets that have m and n subranges, respectively.

For example, if your sets are:

A = {1,2,3,4,9,10,11}
B = {3,4,5,6,12,13,14}

Then, represented as subranges you have,
A = { 1-4, 9-11 }
B = {3-6, 12-14 }

Sorting takes O(k log k) time (k=m+n),

1as 3bs 4ae 6be 9as 11ae 12bs 14be

(note: 1as = subrange from A starting with 1)

Then you can perform the final intersection in a final O(k) pass by choosing start point after both subranges have started, and end point as soon as either subrange ends. Result:

AintB = { 3-4 }
 
True, the problem is not difficult if you are only dealing with interval sets. I meant the more general case.
 
mXSCNT said:
True, the problem is not difficult if you are only dealing with interval sets. I meant the more general case.

It is obviously impossible to do this in sub-linear time in the general case because you want to process a linear number of elements. At the very least you will need to consider each element since there is no structure, and that means the algorithm must be at least linear time. QED

However when using integers, you can use Bucket sort to individually sort the two sets of integers in O(N) time, and you can then perform the intersection in one pass in O(N) time using similar method to what I described above. Note that this isn't truly general because it only works on integers and assumes a bounded range on those integers.

If you are repeatedly doing intersections like this, it would make more sense to use a priority queue that keeps them always sorted and then merging can be done in O(N) time for the truly general case (ie, floating point numbers), and with a much lower constant factor.
 

Similar threads

  • · Replies 10 ·
Replies
10
Views
2K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 25 ·
Replies
25
Views
3K
  • · Replies 0 ·
Replies
0
Views
2K
Replies
10
Views
2K
  • · Replies 7 ·
Replies
7
Views
4K
  • · Replies 10 ·
Replies
10
Views
2K
  • · Replies 2 ·
Replies
2
Views
3K
  • · Replies 75 ·
3
Replies
75
Views
7K
Replies
1
Views
14K