Solving the Subset Problem: Finding the Count of Valid Sets from 1 to N

  • Thread starter SeventhSigma
  • Start date
In summary, the conversation discusses how to count valid subsets of a set of numbers from 1 to N, where a valid subset has the largest element smaller than the sum of the rest of the subset. The conversation provides a method for finding these subsets and discusses the different types of subsets based on the relationship between the largest element and the sum of the rest of the elements. The conversation also mentions that the problem becomes more complex when considering subsets of arbitrary natural numbers.
  • #1
SeventhSigma
257
0
If I have a set of numbers from 1 to N, I want to know how to count sets such that the highest number in the set is smaller than the sum of the rest of the subset.

example - (4,7,8) is OK because 4+7>8
(1,2,3,6) is not OK because 1+2+3<=6

Not even sure how to begin @____@
 
Physics news on Phys.org
  • #2
SeventhSigma said:
If I have a set of numbers from 1 to N, I want to know how to count sets such that the highest number in the set is smaller than the sum of the rest of the subset.

example - (4,7,8) is OK because 4+7>8
(1,2,3,6) is not OK because 1+2+3<=6

Not even sure how to begin @____@



I'm not sure what you meant by "how to count sets such that...". Do you mean you have to characterise these sets, build them...or what?

If you want to build them, take any number of natural numbers [itex]a_1, a_2,...,a_n[/itex] , evaluate their sum [itex]S=a_1+...+a_n[/itex] ,and now

just take ANY number [itex]H > S\Longrightarrow \{a_1,...,a_n,H\}[/itex] is a set as you want.

If you want something else write it down clearly.

DonAntonio
 
  • #3
I'm not trying to build sets; I'm trying to count valid subsets.

The main set, call it S, is a list from 1 to N (N can be anything).

I am trying to figure out a good approach/formula for counting valid subsets of S. A valid subset is one where the largest element of the subset is less than the sum of the rest of the subset.
 
  • #4
SeventhSigma said:
I'm not trying to build sets; I'm trying to count valid subsets.

The main set, call it S, is a list from 1 to N (N can be anything).

I am trying to figure out a good approach/formula for counting valid subsets of S. A valid subset is one where the largest element of the subset is less than the sum of the rest of the subset.

The way to begin is to start by computing the sets by hand for n = 1, 2, etc. After a while you'll either see a pattern or realize that the combinatorics are too complicated for you to figure out.

The other thing to do as you go through these exercises is to develop some notation and start to gain familiarity with how the sets work.

So let's take n = 1.

We only have two possible subsets, [itex]\emptyset[/itex] and {1}.

[itex]\emptyset[/itex] has no largest element so there's nothing we can say about it. From now on we won't consider the empty set.

The largest element of {1} is 1. Let's denote this as L({1}) = 1. For any set, L(A) is the largest element of the set.

What's the sum of the remaining elements? Well the set of remaining elements other than 1 is empty. The empty sum is zero. Let's denote this as S({1}) = 0. In other words for any set A, S(A) is the sum of the elements in A excluding L(A).

Now there are three possibilities for any set A:

* S(A) < L(A). In this case we'll call A an "L-set," for "less than."

* S(A) = L(A). In this case we'll call A an "E-set," for "equal."

* S(A) > L(A). In this case we'll call A a "G-set," for "greater than." These are the ones you're interested in. My notation is backward from your description, my apologies. This seemed natural to me because if you arrange the elements in order, the largest is on the right and the sum of the rest of them is on the left. So S is to the left of L in my mind. You see that a key part of the process is developing your own mental model of the problem.

With this notation, let's just bang out a bunch of examples and see what we see.

Also note that your problem has a constraint that makes it much easier than it could be. For each n we only have to consider subsets of {1, 2, 3, ...n}. In other words we don't have to deal with arbitrary subsets of the natural numbers such as {47, 100, 148}, which is an L-set. But we aren't considering those types of arbitrary sets. For us, the only possible 3-element set is {1,2,3}, which happens to be an E-set. So your problem is a special case of a much harder problem.

Ok for n = 1 we already saw that we have an enumeration of the subsets:

{1}: L

For n = 2 we have:

{1} : L
{2} : L
{1, 2} : L

We now make a couple of handy observations. Every singleton is an L-set. And every 2-element set is an L-set. Handy to know, saves us some thinking as we go. You see the point of these warmup exercises is to start to become familiar with the problem space.

For n = 3 we have

{1} : L
{2} : L
{3} : L -- remember we don't even have to think about singletons, we know they're L's.
{1,2} : L
{1,3} : L
{2,3} : L -- likewise the 2-elt sets are L's.
{1,2,3} : E -- Ok we finally saw something interesting. Our first E-set.

Also note that your table always has 2n - 1 entries. That's the number of nonempty sets in the power set of an n-element set.

Now I won't write out the entire table for n = 4. There are 15 nonempty subsets. Four are singletons, six are 2-elt sets ({1,2}; {1,3}; {1,4}; {2,3}; {2,4}; {3,4)}. All of these are L-sets.

When we get to the 3-elt sets we have

{1,2,3} : E
{1,2,4} : L
{1,3,4} : E
{2,3,4} : G -- Hooray, our first G-set, which is the type you are interested in.

And finally the 4-banger.

{1,2,3,4} : G -- Our second G-set.

So if G(n) is the number of G-sets for n, we have G(1) = G(2) = G(3) = 0; and G(4) = 2.

I'll leave the rest to you. Clearly n = 5 is tedious and n = 6 more tedious still. But at some point you might see the pattern by which G-sets are formed.

I hope I provided an answer to the question, How do we start a problem like this?

What we do is start computing the objects of interest by hand for n = 1, 2, ...

As we go, we are forced to confront some conceptual issues such as remembering that an empty sum is zero. We also develop some notation to serve as a shorthand for ideas that we're using over and over. And as we go, we start to develop some skill at manipulating the objects of interest. We know without having to think that singletons and 2-bangers are L-sets. We know that the smallest G-set is {2,3,4}, and so on.

I think that this is going to get combinatorially tricky as the numbers get larger. But this is how to get started. Roll up your sleeves and look at examples.

ps -- If you know a programming language, this is the kind of problem that would be straightforward to program. The benefit of that approach is that you can produce a table for G(n) for much larger values of n than you could do by hand. However there is also some virtue to writing out a lot of cases by hand so you develop intuition for the problem.
 
Last edited:
  • #5
On the subject of finding shortcuts, it may be useful to realize that, for N>3, the whole set S={1,2,3,...N} is a 'G-set' (thus 'valid'), since the largest element is N and the sum of the others is N(N-1)/2, and (N-1)/2 > 1 when N>3. On these conditions, any subset including N will be 'valid', as the sum of the others will not be higher than in the worst case, S itself.
 
  • #6
I've tried most of these techniques already; in all actuality my set S is more like {1, 2, 3, 4, 6, 9, 13, 19, 28, 41} where the first three terms are hardcoded and the rest: a(n) = a(n-1) + a(n-3)

Problem: When I try to even list out all possible sets by hand, I don't see any patterns, nor do I know how to take advantage of combinatorics. Every time I think I've found a pattern, it diverges. I am aware of the points stated so far in this thread though. I think my issue is understanding the combinatorics behind the counting.
 
  • #7
Dodo said:
On the subject of finding shortcuts, it may be useful to realize that, for N>3, the whole set S={1,2,3,...N} is a 'G-set' (thus 'valid'), since the largest element is N and the sum of the others is N(N-1)/2, and (N-1)/2 > 1 when N>3. On these conditions, any subset including N will be 'valid', as the sum of the others will not be higher than in the worst case, S itself.


And that's precisely the reason why it is not true that any subset containing N will be valid!

For example, for [itex]N\geq 3[/itex] , the set [itex]\{1,2,N\}[/itex] is not valid...

In fact, S is not the "worst case" but the "best"...

DonAntonio
 
  • #8
There are many subsets containing N that work, but not all
 
  • #9
I might also note that I have written a program that will show all the combinations but it finds them all through brute force. OK for small S, not okay for large S.
 
  • #10
Still not finding any pattern... how to use combinatorics here?
 
  • #11
DonAntonio said:
In fact, S is not the "worst case" but the "best"...
Oh boy, what was I smoking... sorry about that! You are right. Post #5 should be read only up to "... when N>3", the last sentence is wrong wrong wrong.
 
  • #12
Is anyone able to help?
 
  • #13
Here's a general suggestion that may or may not help: You say that you have calculated your desired result for small values of N. Search for that sequence in the On-Line Encyclopedia of Integer Sequences. If you get a match, the information might be useful.
 
  • #14
I have had that site open pretty much the entire time I've been working on this and nothing's been particularly helpful, unfortunately. None of the sequences match (or any of the subsequences I get when I try creative ways to partition the results).
 
  • #15
I wonder (without really knowing what I'm talking about) if the problem can be approached from the other end: counting ways to partition an integer K as a sum of distinct terms, all smaller than your given N; and then adding all these counts from K=N+1 to N(N-1)/2.
 
  • #16
perhaps, but for very large N I am not sure if that will work because it would require calculating that for each and every K.
 
  • #17
SeventhSigma said:
perhaps, but for very large N I am not sure if that will work because it would require calculating that for each and every K.



Have you thought of the not-at-all remote possibility that there is no pattern whatsoever for your problem?

Perhaps this is one of those problems that require to know the exact set to work with in order to construct all the valid subsets...

DonAntonio
 
  • #18
SeventhSigma said:
perhaps, but for very large N I am not sure if that will work because it would require calculating that for each and every K.
Plus, having a theoretical answer or a practical method are two very different things; your comment suggests that you are looking for the latter, but the former is a necessary step anyway.

P.S.: Please compare the output of the following Python program (Python is almost pseudo-code, so it should be "understandable"), to see if this is what you expect. (I need some time to explain to myself why this seems to work; my subconscious tends to work faster --though not always right-- than my rational part, I guess.)

At first I thought this program would run in linear time because, while the problem is subdivided in two and recursively solved, I thought the left branch of the tree would extinguish pretty quickly. When trying with larger numbers I see that this is not really true. It's still way faster than the brute-force approach of generating all subsets and counting them.

Output:
Code:
1 0
2 0
3 0
4 2
5 7
6 19
7 46
8 104
9 224
10 470
11 970
12 1979
13 4009
14 8083
15 16248
16 32600
17 65330
18 130820
19 261838
20 523918
Program:
Code:
#!/usr/bin/python

def main():
	for n in range(1, 21):
		print n, count(n)

def count(n):
	return compare(n, n - 1)

def compare(n, p):
	if n <= 0:
		return 2**p - 1
	if p <= 0:
		return 0

	return compare(n - p, p - 1) + (p > n) + compare(n, p - 1)

main()
 
Last edited:
  • #19
Here is some rationale as to why the above program appears to work. Consider, for example, the case N=5, where you want all subsets of {1,2,3,4} whose sum is larger than 5. I have listed all subsets below.

Code:
                 Sum
 1    {      1}   1
 2    {    2  }   2
 3    {    2 1}   3
 4    {  3    }   3
 5    {  3   1}   4
 6    {  3 2  }   5
 7    {  3 2 1}   6
--------------------
 8    {4      }   4
--------------------
 9    {4     1}   5
10    {4   2  }   6
11    {4   2 1}   7
12    {4 3    }   7
13    {4 3   1}   8
14    {4 3 2  }   9
15    (4 3 2 1}  10

The middle row (#8) contains a subset with only one element, N-1 = 4.

And notice that, by construction, all sums *after* the middle row are equal to 4 plus the corresponding sums *before* the middle row.

Therefore, the sums larger than 5 are:
- the sums after the middle (rows 9-15) larger than 5, which are as many as the sums before the middle (rows 1-7) that are larger than 5-4 = 1;
- plus 1 if the middle row were larger than 5, which it isn't; so nothing added here;
- plus the sums before the middle (rows 1-7) which are larger than 5.

So we are referring now *only* to the sums in the first rows (1-7).

These sums before the middle can be figured out recursively, as they follow the same pattern:
Code:
               Sum
 1    {    1}   1
 2    {  2  }   2
 3    {  2 1}   3
-----------------
 4    {3    }   3
------------------
 5    {3   1}   4
 6    {3 2  }   5
 7    {3 2 1}   6

where the three sums after this new middle (rows 5-7) add to 3 plus the corresponding sums in rows 1-3.

The final line in the 'compare' function
Code:
return compare(n - p, p - 1) + (p > n) + compare(n, p - 1)
is the one responsible for subdividing the problem in the three parts I mentioned before (after the middle, middle, and before the middle). The variable 'p' is meant to represent a power of 2.
 
  • #20
However those results aren't quite right... should be 2, 7, 19, 47, 108, 236, 501, onward
 
Last edited:
  • #21
SeventhSigma said:
However those results aren't quite right... should be 2, 7, 19, 47, 108, 236, 501, onward

Hmm, curious... I tried generating the subsets by brute force and I get the same numbers as I listed above, not yours. Let me re-check.
 
  • #23
Ahhh... I was always using {1,2,3,...,N} (with all integers from 1 to N) as my primary set of length N. If you choose an arbitrary set of length N, then the count for N will obviously depend on the particular set, not just on N or on the largest integer in the set.

Edit: Oh, wait, you said in post #6 that the list comes from a specific recurrence sequence. Also, I see now that I have the comparison criterion wrong: a subset is valid if the largest element in the subset (not N, nor the largest in the original set) is smaller than the sum of the others; this renders my subdivision strategy useless.
 
Last edited:
  • #24
right, the subset,sorry for the confusion
 
  • #25
Hmm, not all is lost. Here is a variation of the program; it produces the output below in under a second, but it will get terribly slower for larger N, it is not linear.

Output:
Code:
4 2
5 7
6 19
7 47
8 108
9 236
10 501
11 1045
12 2149
13 4378
14 8869
15 17892
16 35988
17 72254
18 144886
19 290271
20 581207

Program:
Code:
#!/usr/bin/python

def main():
	sequence = [1,2,3]

	for n in range(4, 21):
		addOneMore(sequence)
		print n, count(sequence, n)

def addOneMore(sequence):
	sequence.append(sequence[-1] + sequence[-3])
	
def count(sequence, n):
	sum = 0
	for largest in range(3, n):
		sum += compare(sequence, sequence[largest], largest)
	return sum

def compare(sequence, top, p):
	if top <= 0:
		return 2**p - 1
	if p <= 0:
		return 0

	return compare(sequence, top - sequence[p - 1], p - 1)	\
	       + (sequence[p - 1] > top)			\
	       + compare(sequence, top, p - 1)

main()

Perhaps the only Python-specifics needed to understand this are that sequence indexes are 0-based (from 0 to length-1), and that negative indexes pick items from the end: sequence[-1] is the last, sequence[-2] the next to last, and so on. The iterator range(a,b) will iterate starting from 'a' and until less than 'b'; that is, up to b-1.
 
  • #26
sum-k's fall into an arithmetic progression across N's... but it's not clear how to derive them otherwise
 
  • #27
SeventhSigma said:
sum-k's fall into an arithmetic progression across N's... but it's not clear how to derive them otherwise

Sorry, I don't know which sum do you mean.
 
  • #28
say N=5 or [1, 2, 3, 4, 6]

and we count invalid solutions instead:
there are X ways to have an inner sum of k:

there are 4 ways to have an inner sum of 1
(1, 2) (1, 3) (1, 4) (1, 6)
there are 3 ways to have an inner sum of 2
(2, 3) (2, 4) (2, 6)
there are 5 ways to have an inner sum of 3
(1, 2, 3) (1, 2, 4) (1, 2, 6) (3, 4) (3, 6)
there are 3 ways to have an inner sum of 4
(1, 3, 4) (1, 3, 6) (4, 6)
and so forth
 
  • #29
lots of patterns but they all seem random
 
  • #30
Does anyone have any further thoughts on this problem?
 
  • #31
This is connected with partition numbers, in particular, restricted partitions: http://en.wikipedia.org/wiki/Partition_(number_theory)#Restricted_partitions.
Let PD(n) be the number of partitions of n into distinct positive integers.
An 'invalid' set in the OP is one in which the largest in the set, Y, exceeds or equals the sum, X, of the rest. For a given X and any N >= Y >= X, there are PD(X) of these. (Except, when Y=X we have to exclude the partition of X as just X, otherwise the invalid set would consist of the same number twice.) So in total there are N.PD(1) - 1 + (N-1).PD(2) - 1 + ... + 1.PD(N) - 1 that are subsets of {1..N}.
Check: N = 5:
5.PD(1) + 4.PD(2) + 3.PD(3) + 2.PD(4) + 1.PD(5) - 5 =
5.1 + 4.1 + 3.2 + 2.2 + 1.3 - 5 = 17
But these are only the invalid sets of 2 or more numbers. Any single member set is necessarily invalid, for a total of 22.
The number of valid sets = 31 - 22 = 9, which I believe is correct.
 
  • #32
SeventhSigma said:
I've tried most of these techniques already; in all actuality my set S is more like {1, 2, 3, 4, 6, 9, 13, 19, 28, 41} where the first three terms are hardcoded and the rest: a(n) = a(n-1) + a(n-3)

Problem: When I try to even list out all possible sets by hand, I don't see any patterns, nor do I know how to take advantage of combinatorics. Every time I think I've found a pattern, it diverges. I am aware of the points stated so far in this thread though. I think my issue is understanding the combinatorics behind the counting.

This is a very different and much more tractable problem. It is easier than finding the solutions for sets, S'={1,2,3,4,5,6,7,...,n} because S={a(1),a(2),a(3),...,a(n)} has much more structure.

For starts try to find a formula that will tell you exactly how many subsets of S have totals that are exactly equal to a(n). For any of these subsets, if you add an extra a(i) to it, it will form the basis for one of the subsets you're counting. So for each of these subsets check how many ways you can add to them without creating duplicates.


The case of S'=(1,2,3,...,n) is interesting though.
 
  • #33
Can you provide an example ? Not sure I understand
 
  • #34
Let's take the general case of count(n) so that S={a(1),a(2),a(3),...a(n-3),a(n-2),a(n-1),a(n)}

Now we know the a(i) are positive and increasing, and you should be able to prove it if needed.

We also know that a(n)=a(n-1)+a(n-3). But they are increasing so a(n-2)>a(n-3). So it follows that after we add a(n-1) to both we get a(n-1)+a(n-2)>a(n-1)+a(n-3)=a(n). In other words any subset of S that contains both a(n-1) and a(n-2) is one we should count. The other smaller elements can either be present or not present (two possibilities) and there are exactly n-3 of these other elements so there are 2^(n-3) different subsets that go into our count so far (where 2^(n-3) should be read as 2 to the power of (n-3)).

Of course that isn't all by a long shot. So far we just have the subsets counted that contain both a(n-1) and a(n-2). Let's stick with a(n-1) and drop a(n-2). Now as we started out noticing a(n)=a(n-1)+a(n-3). So if we take any subset that contains a(n-1) and a(n-3) and at least one more element then we have a subset of S we should count. If the subset contained a(n-2) we would have already counted it in our previous steps, so there are only n-4 other elements that might be present or not. The only possibility we don't want to count is where they all aren't present and just leave us with {a(n-3),a(n-1)} which isn't quite enough and so we have (2^(n-4))-1 additional subsets to count.

So far we have count(n)=(2^(n-3))+(2^(n-4))-1+?

Have we counted everything that contains a(n-1)? We've counted everything that contains a(n-1) and either a(n-2) or a(n-3) or both. Are there other combinations that add up greater than a(n-3) (so that when combined with a(n-1) will be greater than a(n))? Why yes there are. You should be able to see there are count(n-3) of them because of the way count is defined.

So we now have count(n)=(2^(n-3))+(2^(n-4))-1+count(n-3)+?
with just having counted the Subsets containing a(n-1).


Can you take it from there?
 
  • #35
not so much. I am trying to use a smaller set of S to see if I can count them properly. I am not sure that approach works because it's easier to count the extremities than it is to count the nitty gritty stuff towards the center of S where stuff gets messier
 
<h2>What is the subset problem?</h2><p>The subset problem is a mathematical problem that involves finding all possible subsets of a given set. In other words, it is the problem of finding all possible combinations of elements from a set.</p><h2>What is the significance of solving the subset problem?</h2><p>The subset problem has many applications in computer science and mathematics, such as in data analysis, graph theory, and algorithm design. It is also a fundamental problem in computer science and has been extensively studied and used in various fields.</p><h2>How is the subset problem solved?</h2><p>The subset problem can be solved using various approaches, such as recursion, dynamic programming, and backtracking. Each approach has its advantages and disadvantages, and the choice of method depends on the specific problem and its constraints.</p><h2>What is the "count of valid sets" in the subset problem?</h2><p>The count of valid sets in the subset problem refers to the number of subsets that satisfy a given condition or criteria. For example, in the problem of finding all subsets of a set that add up to a specific sum, the count of valid sets would be the number of subsets that add up to that sum.</p><h2>What is the range of values for "N" in the subset problem?</h2><p>The range of values for "N" in the subset problem can vary depending on the specific problem. In general, "N" can be any positive integer, but it can also be limited by the available memory and computational resources. It is essential to consider the constraints of the problem when determining the range of values for "N".</p>

What is the subset problem?

The subset problem is a mathematical problem that involves finding all possible subsets of a given set. In other words, it is the problem of finding all possible combinations of elements from a set.

What is the significance of solving the subset problem?

The subset problem has many applications in computer science and mathematics, such as in data analysis, graph theory, and algorithm design. It is also a fundamental problem in computer science and has been extensively studied and used in various fields.

How is the subset problem solved?

The subset problem can be solved using various approaches, such as recursion, dynamic programming, and backtracking. Each approach has its advantages and disadvantages, and the choice of method depends on the specific problem and its constraints.

What is the "count of valid sets" in the subset problem?

The count of valid sets in the subset problem refers to the number of subsets that satisfy a given condition or criteria. For example, in the problem of finding all subsets of a set that add up to a specific sum, the count of valid sets would be the number of subsets that add up to that sum.

What is the range of values for "N" in the subset problem?

The range of values for "N" in the subset problem can vary depending on the specific problem. In general, "N" can be any positive integer, but it can also be limited by the available memory and computational resources. It is essential to consider the constraints of the problem when determining the range of values for "N".

Similar threads

  • Linear and Abstract Algebra
Replies
1
Views
822
  • Calculus and Beyond Homework Help
Replies
1
Views
455
  • Linear and Abstract Algebra
Replies
1
Views
837
  • Set Theory, Logic, Probability, Statistics
Replies
20
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
3
Views
1K
  • General Math
Replies
5
Views
982
  • Precalculus Mathematics Homework Help
Replies
6
Views
768
  • Linear and Abstract Algebra
Replies
1
Views
963
  • Precalculus Mathematics Homework Help
Replies
3
Views
138
Back
Top