Register to reply

Subset problem

by SeventhSigma
Tags: subset
Share this thread:
dodo
#19
May2-12, 03:13 PM
P: 688
Here is some rationale as to why the above program appears to work. Consider, for example, the case N=5, where you want all subsets of {1,2,3,4} whose sum is larger than 5. I have listed all subsets below.

                 Sum
 1    {      1}   1
 2    {    2  }   2
 3    {    2 1}   3
 4    {  3    }   3
 5    {  3   1}   4
 6    {  3 2  }   5
 7    {  3 2 1}   6
--------------------
 8    {4      }   4
--------------------
 9    {4     1}   5
10    {4   2  }   6
11    {4   2 1}   7
12    {4 3    }   7
13    {4 3   1}   8
14    {4 3 2  }   9
15    (4 3 2 1}  10
The middle row (#8) contains a subset with only one element, N-1 = 4.

And notice that, by construction, all sums *after* the middle row are equal to 4 plus the corresponding sums *before* the middle row.

Therefore, the sums larger than 5 are:
- the sums after the middle (rows 9-15) larger than 5, which are as many as the sums before the middle (rows 1-7) that are larger than 5-4 = 1;
- plus 1 if the middle row were larger than 5, which it isn't; so nothing added here;
- plus the sums before the middle (rows 1-7) which are larger than 5.

So we are referring now *only* to the sums in the first rows (1-7).

These sums before the middle can be figured out recursively, as they follow the same pattern:
               Sum
 1    {    1}   1
 2    {  2  }   2
 3    {  2 1}   3
-----------------
 4    {3    }   3
------------------
 5    {3   1}   4
 6    {3 2  }   5
 7    {3 2 1}   6
where the three sums after this new middle (rows 5-7) add to 3 plus the corresponding sums in rows 1-3.

The final line in the 'compare' function
return compare(n - p, p - 1) + (p > n) + compare(n, p - 1)
is the one responsible for subdividing the problem in the three parts I mentioned before (after the middle, middle, and before the middle). The variable 'p' is meant to represent a power of 2.
SeventhSigma
#20
May2-12, 03:17 PM
P: 250
However those results aren't quite right... should be 2, 7, 19, 47, 108, 236, 501, onward
dodo
#21
May2-12, 03:20 PM
P: 688
Quote Quote by SeventhSigma View Post
However those results aren't quite right... should be 2, 7, 19, 47, 108, 236, 501, onward
Hmm, curious... I tried generating the subsets by brute force and I get the same numbers as I listed above, not yours. Let me re-check.
SeventhSigma
#22
May2-12, 03:26 PM
P: 250
some raw output: http://justpaste.it/yc5
dodo
#23
May2-12, 03:34 PM
P: 688
Ahhh... I was always using {1,2,3,...,N} (with all integers from 1 to N) as my primary set of length N. If you choose an arbitrary set of length N, then the count for N will obviously depend on the particular set, not just on N or on the largest integer in the set.

Edit: Oh, wait, you said in post #6 that the list comes from a specific recurrence sequence. Also, I see now that I have the comparison criterion wrong: a subset is valid if the largest element in the subset (not N, nor the largest in the original set) is smaller than the sum of the others; this renders my subdivision strategy useless.
SeventhSigma
#24
May2-12, 05:01 PM
P: 250
right, the subset,sorry for the confusion
dodo
#25
May2-12, 05:20 PM
P: 688
Hmm, not all is lost. Here is a variation of the program; it produces the output below in under a second, but it will get terribly slower for larger N, it is not linear.

Output:
4 2
5 7
6 19
7 47
8 108
9 236
10 501
11 1045
12 2149
13 4378
14 8869
15 17892
16 35988
17 72254
18 144886
19 290271
20 581207
Program:
#!/usr/bin/python

def main():
	sequence = [1,2,3]

	for n in range(4, 21):
		addOneMore(sequence)
		print n, count(sequence, n)

def addOneMore(sequence):
	sequence.append(sequence[-1] + sequence[-3])
	
def count(sequence, n):
	sum = 0
	for largest in range(3, n):
		sum += compare(sequence, sequence[largest], largest)
	return sum

def compare(sequence, top, p):
	if top <= 0:
		return 2**p - 1
	if p <= 0:
		return 0

	return compare(sequence, top - sequence[p - 1], p - 1)	\
	       + (sequence[p - 1] > top)			\
	       + compare(sequence, top, p - 1)

main()
Perhaps the only Python-specifics needed to understand this are that sequence indexes are 0-based (from 0 to length-1), and that negative indexes pick items from the end: sequence[-1] is the last, sequence[-2] the next to last, and so on. The iterator range(a,b) will iterate starting from 'a' and until less than 'b'; that is, up to b-1.
SeventhSigma
#26
May2-12, 05:39 PM
P: 250
sum-k's fall into an arithmetic progression across N's... but it's not clear how to derive them otherwise
dodo
#27
May2-12, 05:47 PM
P: 688
Quote Quote by SeventhSigma View Post
sum-k's fall into an arithmetic progression across N's... but it's not clear how to derive them otherwise
Sorry, I don't know which sum do you mean.
SeventhSigma
#28
May2-12, 05:53 PM
P: 250
say N=5 or [1, 2, 3, 4, 6]

and we count invalid solutions instead:
there are X ways to have an inner sum of k:

there are 4 ways to have an inner sum of 1
(1, 2) (1, 3) (1, 4) (1, 6)
there are 3 ways to have an inner sum of 2
(2, 3) (2, 4) (2, 6)
there are 5 ways to have an inner sum of 3
(1, 2, 3) (1, 2, 4) (1, 2, 6) (3, 4) (3, 6)
there are 3 ways to have an inner sum of 4
(1, 3, 4) (1, 3, 6) (4, 6)
and so forth
SeventhSigma
#29
May4-12, 11:01 AM
P: 250
lots of patterns but they all seem random
SeventhSigma
#30
May7-12, 11:37 AM
P: 250
Does anyone have any further thoughts on this problem?
haruspex
#31
May18-12, 02:34 AM
Homework
Sci Advisor
HW Helper
Thanks ∞
P: 9,643
This is connected with partition numbers, in particular, restricted partitions: http://en.wikipedia.org/wiki/Partiti...ted_partitions.
Let PD(n) be the number of partitions of n into distinct positive integers.
An 'invalid' set in the OP is one in which the largest in the set, Y, exceeds or equals the sum, X, of the rest. For a given X and any N >= Y >= X, there are PD(X) of these. (Except, when Y=X we have to exclude the partition of X as just X, otherwise the invalid set would consist of the same number twice.) So in total there are N.PD(1) - 1 + (N-1).PD(2) - 1 + ... + 1.PD(N) - 1 that are subsets of {1..N}.
Check: N = 5:
5.PD(1) + 4.PD(2) + 3.PD(3) + 2.PD(4) + 1.PD(5) - 5 =
5.1 + 4.1 + 3.2 + 2.2 + 1.3 - 5 = 17
But these are only the invalid sets of 2 or more numbers. Any single member set is necessarily invalid, for a total of 22.
The number of valid sets = 31 - 22 = 9, which I believe is correct.
ClifDavis
#32
May18-12, 09:36 AM
P: 36
Quote Quote by SeventhSigma View Post
I've tried most of these techniques already; in all actuality my set S is more like {1, 2, 3, 4, 6, 9, 13, 19, 28, 41} where the first three terms are hardcoded and the rest: a(n) = a(n-1) + a(n-3)

Problem: When I try to even list out all possible sets by hand, I don't see any patterns, nor do I know how to take advantage of combinatorics. Every time I think I've found a pattern, it diverges. I am aware of the points stated so far in this thread though. I think my issue is understanding the combinatorics behind the counting.
This is a very different and much more tractable problem. It is easier than finding the solutions for sets, S'={1,2,3,4,5,6,7,...,n} because S={a(1),a(2),a(3),...,a(n)} has much more structure.

For starts try to find a formula that will tell you exactly how many subsets of S have totals that are exactly equal to a(n). For any of these subsets, if you add an extra a(i) to it, it will form the basis for one of the subsets you're counting. So for each of these subsets check how many ways you can add to them without creating duplicates.


The case of S'=(1,2,3,...,n) is interesting though.
SeventhSigma
#33
May18-12, 10:51 AM
P: 250
Can you provide an example ? Not sure I understand
ClifDavis
#34
May18-12, 03:38 PM
P: 36
Let's take the general case of count(n) so that S={a(1),a(2),a(3),...a(n-3),a(n-2),a(n-1),a(n)}

Now we know the a(i) are positive and increasing, and you should be able to prove it if needed.

We also know that a(n)=a(n-1)+a(n-3). But they are increasing so a(n-2)>a(n-3). So it follows that after we add a(n-1) to both we get a(n-1)+a(n-2)>a(n-1)+a(n-3)=a(n). In other words any subset of S that contains both a(n-1) and a(n-2) is one we should count. The other smaller elements can either be present or not present (two possibilities) and there are exactly n-3 of these other elements so there are 2^(n-3) different subsets that go into our count so far (where 2^(n-3) should be read as 2 to the power of (n-3)).

Of course that isn't all by a long shot. So far we just have the subsets counted that contain both a(n-1) and a(n-2). Let's stick with a(n-1) and drop a(n-2). Now as we started out noticing a(n)=a(n-1)+a(n-3). So if we take any subset that contains a(n-1) and a(n-3) and at least one more element then we have a subset of S we should count. If the subset contained a(n-2) we would have already counted it in our previous steps, so there are only n-4 other elements that might be present or not. The only possibility we don't want to count is where they all aren't present and just leave us with {a(n-3),a(n-1)} which isn't quite enough and so we have (2^(n-4))-1 additional subsets to count.

So far we have count(n)=(2^(n-3))+(2^(n-4))-1+???

Have we counted everything that contains a(n-1)? We've counted everything that contains a(n-1) and either a(n-2) or a(n-3) or both. Are there other combinations that add up greater than a(n-3) (so that when combined with a(n-1) will be greater than a(n))? Why yes there are. You should be able to see there are count(n-3) of them because of the way count is defined.

So we now have count(n)=(2^(n-3))+(2^(n-4))-1+count(n-3)+???
with just having counted the Subsets containing a(n-1).


Can you take it from there???
SeventhSigma
#35
May19-12, 12:12 AM
P: 250
not so much. I am trying to use a smaller set of S to see if I can count them properly. I am not sure that approach works because it's easier to count the extremities than it is to count the nitty gritty stuff towards the center of S where stuff gets messier
haruspex
#36
May19-12, 12:37 AM
Homework
Sci Advisor
HW Helper
Thanks ∞
P: 9,643
I'm puzzled there's been no response to my post (#31).
Isn't it the answer to the originally posed question? Do I need to explain more?


Register to reply

Related Discussions
Set Theory| Proof if A subset B then f(A) subset f(B) Calculus & Beyond Homework 5
Could someone check this proof?! If c\b subset c\a, then prove a subset b Calculus & Beyond Homework 2
Another countable dense subset problem Calculus & Beyond Homework 2
Maximal disjoint subset problem Programming & Computer Science 2
Subset and proper subset Set Theory, Logic, Probability, Statistics 6