Solving the Subset Problem: Finding the Count of Valid Sets from 1 to N

haruspex · May 19, 2012

I'm puzzled there's been no response to my post (#31).
Isn't it the answer to the originally posed question? Do I need to explain more?

ClifDavis · May 19, 2012

haruspex said:

I'm puzzled there's been no response to my post (#31).
Isn't it the answer to the originally posed question? Do I need to explain more?

As you note, you are answering the original question for {1,2,3,...,n} whereas what he actually needs is an answer to the simpler question for {a(1),a(2),a(3),...a(n)} which generally looks like {1,2,3,4,6,9,13,...}.

More importantly your answer is wrong for the question you are answering.

For N=5 you calculate an answer of 9 subsets. But the subsets that fit the criteria are:
{1,2,3,4} 1+2+3+4=13>5
{2,3,4} 2+3+4=9>5
{1,3,4} 1+3+4=8>5
{3,4} 3+4=7>5
{1,2,4} 1+2+4=7>5
{2,4} 2+4=6>5
{1,2,3} 1+2+3=6>5

By my count that makes 7. If you could cough up another two, that would be interesting.

So yes, I think you do need to explain more.

ClifDavis · May 19, 2012

SeventhSigma said:

not so much. I am trying to use a smaller set of S to see if I can count them properly. I am not sure that approach works because it's easier to count the extremities than it is to count the nitty gritty stuff towards the center of S where stuff gets messier

Well the reason that I initially suggested that you start by checking the number of subsets of S that total to n is that those subsets are the key to counting the qualifying subsets that total to more than n. And finding them points out that you only have to count the extremities in that you only have to count subsets that contain a(n-1) which we've already done, or that don't contain a(n-1) and do contain a(n-2).

At some point you do have to look at the simple sets that don't have enough elements to fit the general case, but until you look at the general case, you don't know how many of the simple cases to look at.

Our basic relationship a(n-3)+a(n-1)=a(n) was the key to counting the number of qualifying subsets that contained a(n-1). To look at those that don't contain a(n-1) we want to get rid of the a(n-1) in our equivalence expression, so we use the known fact from our definition of the a function that a(n-1)=a(n-2)+a(n-4). So we have a(n)=a(n-3)+a(n-1)=a(n-3)+a(n-2)+a(n-4) or to rearrange the terms slightly, we have a(n-4)+a(n-3)+a(n-2)=a(n) and so we have another equivalence where a(n-1) does not occur. This one will be our starting point for counting qualifying subsets where a(n-2) is in the subset.

Now counting the qualifying subsets with a(n-2) as the largest element is something we still have to do. But once we are through we will want to move on to get the next equivalence that doesn't contain a(n-2) either. We know that a(n-2)=a(n-3)+a(n-5) so we can use that to get rid of a(n-2).
So, we have
a(n)=a(n-4)+a(n-3)+a(n-2) and applying the definition of a to a(n-2)
=a(n-4)+a(n-3)+a(n-3)+a(n-5)
=a(n-5)+a(n-4)+a(n-3)+a(n-3) which is okay except we have two a(n-3) but as before
=a(n-5)+a(n-4)+a(n-3)+a(n-4)+a(n-6) we get rid of an a(n-3)
=a(n-6)+a(n-5)+a(n-4)+a(n-4)+a(n-3) but now we have two a(n-4)s so get rid of 1
=a(n-7)+a(n-6)+a(n-5)+a(n-5)+a(n-4)+a(n-3) and now two a(n-5)s...
=a(n-8)+a(n-7)+a(n-6)+a(n-6)+a(n-5)+a(n-4)+a(n-3)

and you see what is happening. Every time we use the definition of the function a to get rid of a duplicate, it creates a new duplicate and it will continue to do so until we extend down to an a(1) term at which point we'll have a duplicate a(3) term. In other words the sum
a(1)+a(2)+a(3)+...+a(n-3)<a(n). If we're doing this right, we should prove that by induction on n. And since the sum of a(1) to a(n-3) is less than n, any qualifying subset must contain a(n-2) or a(n-3) or both. So our sets with extreme values really are all we have to count.

Now there's a still a trick to counting qualifying subsets that have a(n-2) as their largest element, but if you start off counting the sets with a(n-2) as the largest element the same way I did for sets with a(n-1) as the largest element, you will run into it and hopefully see the solution.

haruspex · May 19, 2012

ClifDavis said:

For N=5 you calculate an answer of 9 subsets. But the subsets that fit the criteria are:
{1,2,3,4} 1+2+3+4=13>5
{2,3,4} 2+3+4=9>5
{1,3,4} 1+3+4=8>5
{3,4} 3+4=7>5
{1,2,4} 1+2+4=7>5
{2,4} 2+4=6>5
{1,2,3} 1+2+3=6>5

By my count that makes 7. If you could cough up another two, that would be interesting.

The original statement is this:

If I have a set of numbers from 1 to N, I want to know how to count sets such that the highest number in the set is smaller than the sum of the rest of the subset.

Note: highest number in the subset (not necessarily N).
This was made clearer in post #3:

A valid subset is one where the largest element of the subset is less than the sum of the rest of the subset.

Necessarily, no subset of 1 or 2 elements can be valid.
The following are the 9 valid for N=5:
2, 3, 4; 2, 4, 5; 3, 4, 5; 1, 2, 3, 4; 1, 2, 3, 5; 1, 2, 4, 5; 1, 3, 4, 5; 2, 3, 4, 5; 1, 2, 3, 4, 5;

There is an easy connection between the two interpretations. If OP(N) is the answer to the original problem and OPN(N) is the answer when the highest term in each subset is required to be N then
OP(N+1) = OP(N) + OPN(N+1)
E.g. OP(4) = 2.

Presumably the same misunderstanding has arisen regarding the a(n) version of the problem; luckily the corresponding relationship applies: AP(N+1) = AP(N) + APN(N+1) where N is the number of elements in the full set.

I agree this version of the problem looks far more amenable.

SeventhSigma · May 19, 2012

ClifDavis I think we may be talking about different problems. There are possible valid subsets where values aren't from extremities at all, or they may be from all over the place, and so on. You can't just count from extremity cases.

Look at the solution for N=10:

http://pastebin.com/jr6cr7q1

I don't think the approach you are describing emulates the nature of this output. It's looking at a subset and checking the sum of inner elements against the highest element of that subset.

N=10 means S goes up to a(10), or S = [1, 2, 3, 4, 6, 9, 13, 19, 28, 41]
one valid subset might be (1, 6, 13, 19) because 1+6+13 = 20, which is larger than 19.
haruspex: Those solutions aren't right either because your subsets include numbers (namely 5 in this case) that aren't in S.

haruspex · May 19, 2012

SeventhSigma said:

ClifDavis I think we may be talking about different problems. ... It's looking at a subset and checking the sum of inner elements against the highest element of that subset.

As I said, I believe ClifDavis' interpretation ("APN(N)", where a(N) is the highest in the set) is readily mapped to what you intended ("AP(N)") according to: AP(N+1) = AP(N) + APN(N+1). I.e. AP(N) = sum APN(1) to APN(N).

haruspex: Those solutions aren't right either because your subsets include numbers (namely 5 in this case) that aren't in S.

Quite so - at #31 I was answering the problem as originally posted. I came into the thread just at the point where you added the explanation about a(n) and posted my input without reading that update.

SeventhSigma · May 19, 2012

I don't think so, because in his earlier analysis he compares subset sums to 5, which is not a valid comparison. You need to compare the highest number of the subset against the sum of the rest of the subset

haruspex · May 19, 2012

SeventhSigma said:

I don't think so, because in his earlier analysis he compares subset sums to 5, which is not a valid comparison. You need to compare the highest number of the subset against the sum of the rest of the subset

Sorry, you've lost me. Which post is that in? If you mean post #37, that's where ClifD was saying I had the wrong answer to your original question (1, 2... n, rather than a(1), ... a(n)).

ClifDavis · May 19, 2012

SeventhSigma said:

I don't think so, because in his earlier analysis he compares subset sums to 5, which is not a valid comparison. You need to compare the highest number of the subset against the sum of the rest of the subset

haruspex is quite right. I had gotten the statement of the problem confused from looking at other people's posts and programs rather than yours. Essentially I was showing how to count the new elements for n and not all the elements for n. And I wasn't putting the last term in the subset. Which doesn't effect the count. However it means that my explanation of what I was doing is totally confusing.

To rephrase what he was saying into the notation I was using, if count(n) gives the count for all the qualifying subsets of {a(1),...a(n)} and countnew(n) gives the count of just those qualifying subsets that contain a(n), then count(n)=countnew(n)+count(n-1)

Or conversely countnew(n)=count(n)-count(n-1) which is something that is handy to know.

But what it means is if you have an equation for countnew(n)=stuff then you also have a valid recurrence relationship count(n)=count(n-1)+stuff.

But now that we're on the same page, I'll go back and cover what I was doing for the general case and use smaller arguments for illustration which may make it easier to follow. But not tonight. Time will be tight Sunday and Monday, so it may be Tuesday before I can run through a reasonable explanation, though I'll try to slip it in earlier. If you solve it in the meantime or if haruspex can give you useful clues then it won't matter.

But the important points are still the same. Your a(n)'s are positive and increasing. a(1)+a(2)+a(3)+...a(n-3)<a(n) so any subset to be counted that contains a(n) as the largest element must contain a(n-1) or a(n-2) or both. And if you look at the different subsets containing a(n) as the largest element where the other elements add up to a(n), this will guide you to the different categories of subsets that you can count without having to explicitly create them.

haruspex · May 19, 2012

Fwiw, I had a go at this simpler problem:
Sn = {A(1), .. , A(n)}, where the A(n) satisfy the recurrence relation A(n) = A(n-1) + A(n-3), etc.
How many subsets of Sn sum to A(n) exactly?
I can't even solve that yet. The problem is that some such subsets cannot be obtained by repeated application of the recurrence relation to break down the target sum. E.g. A(1)+A(2) = A(3). So instead I'll just go for the lower bound which only counts the subsets that can be obtained in this way.

Let T(n,P) = number of subsets that add up to the same as the terms indicated by the bit pattern P, e.g.
P = 1 stands for just the term A(n)
P = 11 stands for A(n-1)+A(n)
P = 101 stands for A(n-2)+A(n)
etc.
T(n,1) = 1 + T(n-1,101)
T(n,101) = T(n-1,111) + T(n-2,1)
T(n,1111) = T(n-1,111) = T(n-2,11)
T(n,11) = T(n-1,1) + T(n-1,1111)
whence
T(n,1) = 1 + T(n-2,111) + T(n-3,1)
T(n,111) = T(n-2,1) + T(n-3,111)
For the general solution we drop the "1+" and try λ[itex]^{n}[/itex].
This yields the polynomial equation:
λ[itex]^{3}[/itex] +/- λ - 1 = 0
Bringing the "1+" back in, the particular solution to be added on is T(n,1) = 0, T(n,111) = -1.
The general solution is therefore
[itex]\Sigma^{6}_{i=1} k_{i}λ_{i}^n[/itex]
where the λ[itex]_{i}[/itex] are the roots of the polynomial.

haruspex · May 20, 2012

OK, here's my attempt at the actual problem. Methods and results largely the same.

Let C(n) be the number of subsets of Sn that sum to more than A(n).
Let D(n) be the number of subsets of Sn that sum to more than A(n)+A(n-1).

Starting with a summation target of A(n)+1 (or higher):
1. Suppose we use A(n). We now only have to use one other, i.e. any of 2^(n-1)-1 subsets. OTOH,
2. if we don't use A(n) we know we must use either A(n-1) or A(n-2) or both.
2a. If we use both then we can use any combination of A(1) to A(n-3), 2^(n-3) subsets.
2b. If we use A(n-1) but not A(n-2) then we have a remaining target of A(n)+1-A(n-1) = A(n-3)+1, for which there are C(n-3) possibilities.
2c. If we use neither A(n) nor A(n-1) then we have a target of A(n)+1 = A(n-2)+A(n-3)+A(n-4)+1. We must use A(n-2). That leaves a target of A(n-3)+A(n-4)+1, for which there are D(n-3) possibilities.
Putting this together:
C(n) = 2^(n-1)-1 + 2^(n-3) + C(n-3) + D(n-3)

Similarly, with D(n), we can use A(n), leaving a target of A(n-1)+1; or not use A(n), leaving a target of A(n)+1 = A(n-2) + A(n-3) + A(n-4) + 1.
D(n) = C(n-1) + D(n-3)

The homogeneous solution is λ[itex]^{n}[/itex], where λ satisfies the same equation as in the previous post: λ[itex]^{3}[/itex] +/- λ - 1 = 0.
The particular solution is now:
C(n) = (2^n)*7/9; D(n) = (2^n)*4/9 + 1.

SeventhSigma · May 21, 2012

I don't think that's right... doesn't return 501 for N=10 although the logic seems to be headed in the right direction

SeventhSigma · May 21, 2012

Again I think I maybe haven't made it clear what this problem entails:

If we have N=5 that means S has the first five terms a(1) through a(5): [1, 2, 3, 4, 6]
The answer here is 7 because
(2,3,4) because the internal sum of 5 is greater than 4
(3,4,6) because the internal sum of 7 is greater than 6
(1,2,3,4) because the internal sum of 6 is greater than 4
(1,2,4,6) because the internal sum of 7 is greater than 6
(1,3,4,6) because the internal sum of 8 is greater than 6
(2,3,4,6) because the internal sum of 9 is greater than 6
(1,2,3,4,6) because the internal sum of 10 is greater than 6

A valid subset of S is one where the highest element of that subset is less than the sum of the rest of that subset.

"Starting with a summation target of A(n)+1 (or higher):
1. Suppose we use A(n). We now only have to use one other, i.e. any of 2^(n-1)-1 subsets. OTOH, "

Implies the first summation target we look at for N=5 is A(5)+1 = 6+1 = 7
Now suppose we use A(5) or 6 in our subset: (6)
You say that including anything from the rest of S, (1,2,3,4), is valid here, but it isn't. I could use one other (1,2) giving me (1,2,6) which is not valid because 1+2 is < 6. It is true that 1+2+6 >= 7, but that isn't what we're trying to solve for here.

haruspex · May 21, 2012

SeventhSigma, Sorry, I forgot to point out that this is not quite the problem as you stated it, but that it should be mappable to it fairly easily.
First, if you exclude the use in the sum of the top term in the set, that just subtracts 2^(n-1)-1 possibilities. That gets us to the version ClifDavis considered. (I probably could have got straight to that by modifying the argument a little. You might care to try that.) Now you have to sum the counts for n = 1 to N to get the solution to the correct version.

SeventhSigma · May 21, 2012

I tried to do that but was getting erroneous values. Would you be able to show how it works for something like N=10 in terms of setup? (I know the answer is 501 there)

N=1, 2, 3: Answer 0
N=4: Answer 2
N=5: Answer 7
N=6: Answer 19
N=7: Answer 47
N=8: Answer 108
N=9: Answer 236
N=10: Answer 501

haruspex · May 21, 2012

So you found all the roots and coefficients? I haven't tried to do that. Could you post them? Or did you just run through the logic recurrence to see if it gave the right answer?

SeventhSigma · May 21, 2012

No I mean I tried to take the general logic of your explanation and see if I could remap it by changing the way I handle targets and inclusions

[1, 2, 3, 4, 6] = S for N=5
so if summation target = 6 that means we would have (4, 6) but the rest of the subset S only works if we take at least 2 elements because (1, 4, 6) would not work, and so on.

Otherwise yes I just went through the recurrence to see if it returned the right answer (wrote a program)

haruspex · May 21, 2012

I put my recurrence relation, plus the mapping to your problem, into a spreadsheet and it gave 501 for N=10. You have to be careful where you start the recurrence. C(3) = D(4) = 3, D(3) = 1. If you start too soon you pick up the fictitious value A(0) = 1.
(This is a bit confusing because C(n) is in spreadsheet col D, D(n) in col E.)
n A 2^ C D APN AP
1 1 1 0 0 0
2 2 2 1 0 0 0
3 3 4 3 1 0 0
4 4 8 9 3 2 2
5 6 16 20 9 5 7
6 9 32 43 21 12 19
7 13 64 91 46 28 47
8 19 128 188 100 61 108
9 28 256 383 209 128 236
10 41 512 776 429 265 501
The formulae, starting with N=5, look like:
D5 =C5-1+C3+D2+E2 (i.e. C(n), col D)
E5 =D4+E2
F5 =D5-C5+1
G5 =G4+F5

haruspex · May 21, 2012

As I said, the double use of A, C, D for variables and spreadsheet columns is confusing.
In the spreadsheet:
col A is N
col B is A(N)
col C is 2^(N-1)
col D is my C(N)
col E is my D(N)
col F is what I've previously referred to as APN(N)
col G should be the number you're after

The formulae at the end of my previous post refer to spreadsheet columns.
The array starts (N=1) in row 2 of the spreadsheet, row 1 being headings.
I've given the formulae that go in row 5. Above row 5 these don't apply in all columns so you need to plug in the hard numbers.

APN(N) is ClifDavis (mis)reading of the problem, i.e. it takes the target to beat as being A(N), rather than the highest number in the subset. I've kept that in because it is a useful step along the way to the numbers you want.

SeventhSigma · May 21, 2012

I am trying to find another way to simplify everything so I can solve this for large n mod m eventually

using your notation:

c(n) = 2^(n-1) - 1 + 2^(n-3) + c(n-3) + d(n-3)
c(1) = 0, c(2) = 1, c(3) = 3

d(n) = c(n-1) + d(n-3)
d(1) = 0, d(2) = 0, d(3) = 1

g(n) = c(n) - 2^(n-1) + 1 + g(n-1)
g(1) = 0

where g(n) is the answer function

but I can rewrite g(n) as
g(n) = n - 2^n + 1 + X
Where X is c(2) + ... + c(n)

What might be a good way to quickly find sums of c(n) sequences? I tried listing everything out longhand:

Code:

c(1) = 0
c(2) = 1
c(3) = 3
c(4) = 2^(3) - 1 + 2^(1)
c(5) = 2^(4) - 1 + 2^(2) + 1
c(6) = 2^(5) - 1 + 2^(3) + 3 + 1
c(7) = 2^(6) - 1 + 2^(4) + 2^(3) - 1 + 2^(1) + 3
c(8) = 2^(7) - 1 + 2^(5) + 2^(4) - 1 + 2^(2) + 1 + 2^(3) - 1 + 2^(1)
c(9) = 2^(8) - 1 + 2^(6) + 2^(5) - 1 + 2^(3) + 3 + 1 + 2^(4) - 1 + 2^(2) + 1 + 1
c(10) = 2^(9) - 1 + 2^(7) + 2^(6) - 1 + 2^(4) + 2^(3) - 1 + 2^(1) + 3 + 2^(5) - 1 + 2^(3) + 3 + 1 + 3

d(1) = 0
d(2) = 0
d(3) = 1
d(4) = 3 
d(5) = 2^(3) - 1 + 2^(1)
d(6) = 2^(4) - 1 + 2^(2) + 1 + 1
d(7) = 2^(5) - 1 + 2^(3) + 3 + 1 + 3

so g(10) here would be g(10) = 10 - 2^10 + 1 + c(2) + c(3) + c(4) + c(5) + c(6) + c(7) + c(8) + c(9) + c(10) = 501

haruspex · May 22, 2012

Recall that I arrived at:
C(n) = (2^n)*7/9 + [itex]\sum[/itex][itex]^{6}_{i=1}[/itex]k[itex]_{i}[/itex]λ[itex]^{n}_{i}[/itex] = (2^(n-1))*14/9 + [itex]\sum[/itex][itex]^{6}_{i=1}[/itex]k[itex]_{i}[/itex]λ[itex]^{n}_{i}[/itex]
where the λ[itex]^{i}[/itex] are the roots of λ^3 +/- λ - 1 = 0
So G(n) = [itex]\sum[/itex]{(2^(n-1))*5/9 + 1 + [itex]\sum[/itex][itex]^{6}_{i=1}[/itex]k[itex]_{i}[/itex]λ[itex]^{n}_{i}[/itex]}
= ((2^n)-1))*5/9 + n + [itex]\sum[/itex][itex]^{6}_{i=1}[/itex]g[itex]_{i}[/itex]λ[itex]^{n}_{i}[/itex] + some constant
where the g[itex]_{i}[/itex] are related to the k[itex]_{i}[/itex] by g[itex]_{i}[/itex] = k[itex]_{i}[/itex]*λ[itex]_{i}[/itex]/(λ[itex]_{i}[/itex]-1)

It remains to figure out the λ[itex]_{i}[/itex] (two sets of roots will be complex pairs) and use the first so many values of G(N) to find the g[itex]_{i}[/itex]. The constant should be
-[itex]\sum[/itex][itex]^{6}_{i=1}[/itex]g[itex]_{i}[/itex]/λ[itex]_{i}[/itex]
Of course, the g[itex]_{i}[/itex] will be such that all the imaginary terms cancel.

SeventhSigma · May 22, 2012

I don't really understand that method at all... there's no other shortcut?

Mandlebra · May 22, 2012

Some times a generating function approach*is*the short cut

SeventhSigma · May 22, 2012

I guess I am just not understanding how generating functions work here because I am trying to solve this for a very large n with modulus, and I worry that using roots will result at in decimal values that will lack precision (assuming that I even figure out how that approach works)

haruspex · May 22, 2012

It's not a generating function. Generating functions have the values of interest as coefficients in an infinite power series. It is a closed form expression which, once the constants have been figured out, will produce the answer for any n in a fixed number of steps. You can do the same for all sorts of series. E.g. http://en.wikipedia.org/wiki/Fibonacci_number#Closed-form_expression.
Although the computation involves irrationals, the answer, if calculated precisely, always produces an integer. So you only have to calculate it precisely enough to get the error less than 0.5. The snag is that how accurately the constants need to be calculated probably depends on n.
When I get time I'll have a go at finding the constants.

Mandlebra · May 22, 2012

Ah are correct, I misread. Good thread though with a nice topic

SeventhSigma · May 22, 2012

I don't understand how this lambda equation was arrived at or how you solve it. Where does the k term come from? Are all lambdas the same since we're looking at the same equation here (λ^3 +/- λ - 1 = 0)? Trying to piece it together but I am not sure what I'm doing.

haruspex · May 22, 2012

The lambdas are roots of a 6th order polynomial, so there are in principle 6 of them.
The polynomial is (λ^3 + λ - 1)(λ^3 - λ - 1), so the roots are those of the two cubics (λ^3 + λ - 1) and (λ^3 - λ - 1). Each has one real root and a complex pair.
The importance of these numbers is that if C(n) is a solution of the recurrence relation then C(n) + λ^n is also, but only for these 6 values of λ. This means that we can generate all possible solutions for C(n) by starting with anyone solution and adding combinations of terms like λ^n. Then we plug in the boundary conditions (the first few values) to find the right combination of these terms.
The real roots are 0.6823.., 1.3247.. The complex roots are -.3412+/-i*1.1615, -.6624+/-i*0.5623.
I'm not sure how the cancellation of the imaginary parts will happen. I'm going to assume that it happens separately within each pair, i.e. the imaginary parts of λ^n will cancel with the contribution from its conjugate root. This means the coefficients will be the same for both members of the pair. That gets us down to only 4 terms to worry about.
The easiest way to extract the real components is to write the complex roots in r.e^iθ form. Then the real part of λ^n is r^n.cos(n*θ).
The table of lambda powers therefore starts:
n 1 2 4 5
0 1 1 1 1
1 1.32 1.32 0.68 0.68
2 1.75 1.44 0.47 -0.88
3 2.32 1.07 0.32 -2.44
4 3.08 -0.15 0.22 -1.73
5 4.08 -2.61 0.15 2.07
6 5.4 -6.6 0.1 5.97
7 7.16 -12.09 0.07 4.39
8 9.48 -18.35 0.05 -4.85
9 12.56 -23.59 0.03 -14.58
10 16.64 -24.5 0.02 -11.1
It remains to find four constant coefficients to multiply each column by, and one more additive constant. Adding up those 5 terms for each row, plus ((2^n)-1))*5/9 + n, should then give the required answers. We need to use the known values for G(1) to G(5) to find these. So that's an exercise is simultaneous linear equations with 5 unknowns. (As I mentioned, the additive constant can actually be calculated from the four coefficients and the lambdas, so we can reduce it to four unknowns.)
I can have a go at that later if you like.

SeventhSigma · May 22, 2012

Would this method be workable for solutions where n>10^18, modulo 1 billion? I ask because I am wondering how precise the roots would need to be?

ClifDavis · May 22, 2012

haruspex said:

Fwiw, I had a go at this simpler problem:
Sn = {A(1), .. , A(n)}, where the A(n) satisfy the recurrence relation A(n) = A(n-1) + A(n-3), etc.
How many subsets of Sn sum to A(n) exactly?

The value of the question isn't in obtaining the actual value of of the number of such subsets of Sn, but rather in the fact that in order to do so, you have to be able to characterize the subsets that Sn counts and it's what you have to do to accomplish that which gives you the tools to answer the original question.

haruspex said:

I can't even solve that yet. The problem is that some such subsets cannot be obtained by repeated application of the recurrence relation to break down the target sum. E.g. A(1)+A(2) = A(3).

And you put your finger on the significant question. Can all such subsets be obtained by repeated application of the recurrence relation? And as you correctly point out, the answer is no, as a(1)+a(2)=a(3) which certainly cannot be obtained by repeated application of the recurrence relation. Nor is this a lone exception. For example a(1)+a(2)+a(5)=a(6) cannot be obtained by repeated use of a(n)=a(n-1)+a(n-3). But we can go a(6)=a(6-1)+a(6-3)=a(5)+a(3)=a(5)+a(2)+a(1).

So this leads to a new question. Can all such subsets be obtained by repeated application of the recurrence relation combined with the possible substitution of a(1)+a(2) for a(3)?

In order to answer the question we need 4 facts.

(1) The a(i) are positive. Or another words if n>0 then a(n)>0.

(2) The a(i) are increasing, a(1)<a(2)<a(3)<a(4)<... Or in other words if n>1 then a(n-1)<a(n).

(3) For n>0, the sum of the first n values, a(1)+...+a(n) is less than a(n+3).

[[If n=1, the expression on the left is intended to have only one term as a(n) is a(1). I would use summation notation to be clearer, but have no appropriate tool to do that.
Notice that this third item can also be stated as:
For n>3, the sum of the first n-3 values, a(1)+...+a(n-3) < a(n).]]

(4) for n>0, a(1)+...+a(n) <a(n+1)+a(n+2)

[[or for n>2 a(1)+...+a(n-2)<a(n-1)+a(n)]]

ClifDavis · May 22, 2012

The a(n) are positive, ie. For n>0, a(n)>0.

Our definition of the a function is:
a(1)=1
a(2)=2
a(3)=3
and for n>3, a(n)=a(n-1)+a(n-3).

Notice that a(0) and a(-1) etc. are not defined. We might be able to use the recurrence relationship to define them, but we don't. There is also the assumption that n is an integer. We don't define a(1.5).

Suppose the a(n) are not all positive. If for some integer we have a zero or negative result, ie. a value of a(n) which is not positive, then since they are integers, there is a smallest value, let's call it k, for which a(k)>0 is not true.

Now k is not 1 or 2 or 3 as
a(1)=1>0
a(2)=2>0
a(3)=3>0
and so k>3.

But this means a(k)=a(k-1)+a(k-3).

Since k>3 then k-1>3-1=2>0 (or k-1>0 ) and k-3>3-3=0 (k-3>0). So a(k-1) and a(k-3) are defined. But since -1<0 k-1<k. And likewise k-3<k. Now k was the smallest integer for which a(k)>0 is false so we know that a(k-1)>0 and a(k-3)>0. But this means that a(k-1)+a(k-3)>0+0=0. But from the recurrence relationship a(k) is a(k-1)+a(k-1), and so a(k)>0 contrary to our assumption. And this means that a(k) is not the smallest value of k that isn't positive, and so there is no smallest integer k such that a(k) is not positive and so there aren't any. Which is to say they are all positive.

ClifDavis · May 22, 2012

The a(i) are increasing, if n>1 then a(n-1)<a(n).

Well for n=2 then 1<2 or a(1)<a(2 or a(2-1)<a(2) or a(n-1)<a(n) is true for n=2.
For n=3 we know 2<3 so a(2)<a(3) or a(3-1)<a(3) or a(n-1)<a(n) for n=3.

Assume there is an integer n>1 for which a(n-1)<a(n) is not true. Let i be the smallest such that i>1 but a(n-1)<a(n) is false.
We have already seen that i is not 2 or 3 because it would be true for them. So i>3. This means that i-1>2>0 and i-3>0. Which means a(i-1) and a(i-3) are defined and a(i-1)>0 and a(i)-3>0. Or as I prefer, 0<a(i-3). So a(i-1)+0<a(i-1)+a(i-3). But a(i-1)+0=a(i-1) and as we all know since i>3, a(i-1)+a(i-3)=a(i), which gives us a(i-1)<a(i), contrary to our assumption. So there no such smallest i, which means there is no such n and the a(i) are strictly increasing.

ClifDavis · May 22, 2012

For n>0, the sum of the first n values of a(i) is less than a(n+3).

I like to state it that way, though when I use it, its usually in the form, if n>3 then the sum of the first n-3 values of a(i) is less than a(n).

For n=1 the first 1 values of a(i) is a(1) and it's sum is a(1)=1. a(1+3)=a(4)=4. It's certainly the case that 1<4, and so it's true for n=1.

Let's assume that it's true for some integer k>0, ie. the sum of the first k values of a(k) is less than a(k+3). Or as I prefer to write it, a(1)+...+a(k)<a(k+3). Since k>0 then k+1 >0 and so a(k+1) is defined and we can add it to both sides of our inequality to get a(1)+...+a(k)+a(k+1)<a(k+3)+a(k+1). Since k>0, k+4>4>3, and so from our recurrence relationship we know that a(k+3)+a(k+1)=a(k+4).
And so we have the sum of the first k+1 values of a(i), a(1)+...+a(k+1)<a(k+3)+a(k+1)=a(k+3+1).

So we know it's true for k=1 and if its true for k, then it's true for k+1, so by induction it's true for all integer k>0 or if you prefer, for all integer n>0.

ClifDavis · May 22, 2012

For n>0, a(1)+...+a(n) <a(n+1)+a(n+2). For n=1 this is true as a(1)< a(1+1)+a(1+2) or another words 1<2+3. This is another case where a(1)+...+a(n) is intended to show one term when n=1. I apologize for not using summation notation and can see that I'm going to have to learn MathXML.

For n>1 then n+2>3 and so a(1)+...+a(n-1)<a(a+2). But since the a(i) are strictly increasing, a(n)<a(n+1). All of which means a(1)+...+a(n-1)+a(n)<a(n+1). And so it's true for all n>0,

ClifDavis · May 22, 2012

Let Sn={a(1),...,a(n)} for any n>0. How do we characterize the subsets of Sn that add up to a(n)? Can we get a subset of Sn for any n, such that the elements of the subset add up to a(n) but the elements cannot be derived from the recurrence relationship and a(3)=a(2)+a(1) [substituting a(2)+a(1) for a(3)]?

Let i be the smallest integer such that Si has a subset where all the elements of the subset add up to a(n) but the set is not {a(n)} and cannot be derived from {a(n)} by repeated application of the recurrence relationship [that for n>3, a(n-3)+a(n-1)=a(n)] and the fact that a(1)+a(2)=a(3). Then i<6.

Assume that i>5 and i is the smallest integer so that there is a subset S'i of Si such that the elements of S'i add up to a(n), but S'i is not {a(n)} and cannot be derived from {a(n)} with repeated applications of the recurrence relationship and a(1)+a(2)=a(3).

(1) It is not the case that a(n) is an element of S'i.
If a(n) were an element then either a(n) is the only element and S'i={a(n)} contrary to assumption or there are other elements a(r1),...,a(rj), but each of the elements of Si is positive and so a(r1)+...+a(r2) >0 and adding a(n) to both sides of the inequality give us a(r1)+...a(rj)+a(n)>a(n) contrary to the assumption that the elements add up to a(n).

(2) At least one of a(i-1) and a(i-2) is an element of S'i.
Since i>5 then a(i-1) and a(i-2) exist. Assume that neither a(i-1) nor a(i-2) is in S'i. But since a(i) is not in S'i, we see that all the elements of S'i are in S(i-3). (S(i-3) exists since i>5>3.) But even if all the elements of S(i-3) were added up, a(1)+...+a(i-3), since i>3 a(1)+...+a(i-3)<a(i) and so as the elements of S(i-3) are all positive, the elements of S'i cannot add up to a(i). Therefor at least one of a(i-1) or a(i-2) is an element of S'i.

(3) a(i-1) and a(i-2) are not both elements of S'i.
Since i>5, i>3 and so a(i)=a(i-1)+a(i-3). Since i>3, i-2>1 and so a(i-3)<a(i-2). Or the same thing, a(i-2)>a(i-3). Adding a(i-1) to both sides give us a(i-1)+a(i-2)>a(i-1)+a(i-3)=a(i). Since a(i-1) and a(i-2) add up to more than a(i) and all elements of S'i must be positive, if both a(i-1) and a(i-2) were elements of S'i then the sum of its elements must be more than a(i) contrary to assumption. Therefor a(i-1) and a(i-2) are not both elements of S'i.

(4) S'i is not {a(i-3),a(i-1)}.
While this would add up to a(i), it can be derived from {a(i)} with a single application of the recurrence relationship, contrary to assumption.

(5) a(i-1) is not in S'i.
Assume a(i-1) is in S'i and a(i-2) is not. Then as a(i-1)<a(i), it must be the case that there are other elements of S'i, a(r1),...,a(rj) such that a(r1)+...+a(rj)+a(i-1)=a(i). But a(i)=a(i-1)+a(i-3) so a(r1)+...+a(rj)+a(i-1)=a(i-1)+a(i-3) and subtracting a(i-1) from both sides,a(r1)+...+a(rj)=a(i-3). Since neither a(i-2) nor a(i) are in S'i , they're also not in {a(r1),...,a(rj)} and so {a(r1),...,a(rj)} is a subset of S(i-3)={a(1),...,a(i-3)}. Notably it's a subset of S(i-3) whose elements sum to a(i-3). Furthermore {a(r1),...,a(rj)} cannot be derived from {a(i-3)} by the use of the recurrence relationship and the substitution of a(2)+a(1) for a(3), since if it could then {a(r1),...,a(rj),a(i-1)} could be derived in the same way from {a(i-3),a(i-1)}. But S'i={a(r1),...,a(rj),a(i-1)} and {a(i-3),a(i-1)} can be derived from {a(i)} using the recurrence relationship, and so S'i could be derived from {a(i)} contrary to our definition of S'i. So, i-3<i and S(i-3) has a subset {a(r1),...,a(rj)} which cannot be derived from {a(i-3)} using the recurrence relation and possibly substituting a(2)+a(1) for a(3). But this contradicts our assumption that i is the smallest integer with that property, so if we are to preserve that assumption we must conclude that in fact a(i-1) is not in S'i.

(6) a(i-2) is in S'i.
This is straightforward. By (2) a(i-1) or a(i-2) is in S'i. By (5), a(i-1) is not in S'i. And so a(i-2) is in S'i.

(7) {a(i-4),a(i-3),a(i-2)} is not S'i.
Since i>5 a(i-4),a(i-3),and a(i-2) all exist and {a(i-4),a(i-3),a(i-2)} is a subset of Si. It's elements do add up to a(i) [see the derivation that follows immediately]. However from {a(i)} we can derive {a(i-3),a(i-1)} by applying the recurrence relationship to a(i), and then by applying it to a(i-1), we get {a(i-4),a(i-3),a(i-2)} and since it is derivable from {a(i)} it cannot be S'i.

(8) Both a(i-4) and a(i-3) cannot be in S'i.
By (6), a(i-2) is in S'i. If a(i-3) and a(i-4) are as well then either there are no additional elements in S'i which contradicts (7) or there are and since they are positive, the elements of S'i will add up to more than a(i-4)+a(i-3)+a(i-2) = a(i), contrary to the definition of S'i.

(9) a(i-4) and a(i-3) cannot both be absent from S'i.
At this point we know from (1) that a(i) cannot be in S'i. We know from (5) that a(i-1) cannot be in S'i. We know from (6) that a(i-2) is in S'(i). Assume that a(i-4) and a(i-3) are both absent from S'i. Since S'i is a subset of Si the the largest value that S'i could sum to under our assumptions is a(1)+...a(i-5)+a(i-2). But i>5 and so i-3>2 and so a(1)+...a(i-5)<a(i-4)+a(i-3). Adding a(i-2) to both sides we see that a(1)+...a(i-5)+a(i-2)<a(i-4)+a(i-3)+a(i-2)=a(i) and the elements of S'i cannot add up to a(i). But this violates our definition of S'i and so either a(i-4) or a(i-3) must be an element of S'i.

(10) a(i-4) is not an element of S'i.
Assume a(i-4) is in S'i and a(i-3) is not. Then it must be the case that there are other elements of S'i, a(r1),...,a(rj) such that a(r1)+...+a(rj)+a(i-4)+a(i-2)=a(i). But a(i)=a(i-4)+a(i-3)+a(i-2) so a(r1)+...+a(rj)+a(i-4)+a(i-2)=a(i-4)+a(i-3)+a(i-2) or subtracting a(i-4)+a(i-2) from both sides,a(r1)+...+a(rj)=a(i-3). Since neither a(i-2) nor a(i) are in S'i , their also not in {a(r1),...,a(rj)} and so {a(r1),...,a(rj)} is a subset of S(i-3). Once again it's a subset of S(i-3) whose elements sum to a(i-3). Furthermore {a(r1),...,a(rj)} cannot be derived from {a(i-3)} by the use of the recurrence relationship (and/or a(3)=a(2)+a(1)) since if it could then {a(r1),...,a(rj),a(i-1)} could be derived in the same way from {a(i-3),a(i-1)}. But S'i={a(r1),...,a(rj),a(i-1)} and {a(i-3),a(i-1)} can be derived from {a(i)} using the recurrence relationship, and so S'i could be derived from {a(i)} contrary to our definition of S'i. So, i-3<i and S(i-3) has a subset {a(r1),...,a(rj)} which cannot be derived from {a(i-3)} using the recurrence relation and a(3)=a(2)+a(1). But this contradicts our assumption that i is the smallest integer with that property, so if we are to preserve that assumption we must conclude that in fact a(i-4) is not in S'i.

(11) The big contradiction. a(i-3) cannot be an element of S'i either.
Assume a(i-3) is in S'i and a(i-4) is not. Then as we also know a(i-2) is in S'i and a(i-1) and a(i) are not, there must be other elements of S'I, a(r1),...,a(rj) such that a(r1)+...+a(rj)+a(i-3)+a(i-2)=a(i). But a(i)=a(i-4)+a(i-3)+a(i-2) so a(r1)+...+a(rj)+a(i-3)+a(i-2)=a(i-4)+a(i-3)+a(i-2). This time we subtract a(i-3)+a(i-2) from both sides to obtain a(r1)+...+a(rj)=a(i-4). The elements on the left side of the equation all came from S'i and since they were not a(i-3) or a(i-2) and could not be a(i-1) and a(i) because those elements weren't in S'i then {a(r1),...,a(rj)} is a subset of S(i-4) which has elements that add to a(i-4). {a(r1),...,a(rj)} cannot be derived from {a(i-4)} by the use of the recurrence relationship and the substitution of a(1)+a(2) for a(3) since if it could then S'i={a(r1),...a(rj),a(i-3),a(i-2)} could be similarly derived from {a(i-4),a(i-3),a(i-2)} which can be derived from {a(i-3),a(i-1)} by the recurrence relationship and can in turn be derived from {a(i)} by the same relationship. This would violate our definition of S'i and so no such derivation of {a(r1),...,a(rj)} is possible. But now since i-4<i, we have a violation of our assumption that i is the smallest such value and so our assumption that a(i-3) is in S'i must be wrong.

(12) The big finish.
We assumed that i was the smallest integer for which there was a subset, S'i of Si, such that the elements of S'i added up to a(i) but S'i was distinct from {a(i)} and could not be derived from {a(i)} using some combination of the recurrence relationship a(n-3)+a(n-1)=a(n) for n>3 and a(3)=a(1)+a(2) and that i>5.
We deduced that:
(8) a(i-4) and a(i-3) cannot both be in S'i.
(9) Either a(i-4) or a(i-3) must be an element of S'i.
(10) a(i-4) is not an element of S'i.
(11) a(i-3) is not an element of S'i.
Our only possible conclusion? If such an i exists then it is not the case that i>5 and so i<6.

Solving the Subset Problem: Finding the Count of Valid Sets from 1 to N

Similar threads

Hot Threads

Recent Insights