The subset sum problem states:
Given a set of integers, is there a non-empty subset whose sum is zero?
This problem is NP-complete in general. I'm curious if the complexity of this slight variant is known:
Given a set of integers, is there a subset of size k whose sum is zero?
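To make the variant concrete, here is a minimal brute-force sketch of the decision problem. The function name `has_zero_subset_of_size` is made up for illustration; it simply tries every subset of size k, so it is a baseline to compare against, not an efficient solution.

```python
from itertools import combinations

def has_zero_subset_of_size(nums, k):
    """Brute force: test every k-element subset, roughly O(C(n, k) * k) time."""
    return any(sum(c) == 0 for c in combinations(nums, k))

print(has_zero_subset_of_size([-7, -3, 2, 5, 8], 3))  # True, e.g. -7 + 2 + 5
print(has_zero_subset_of_size([1, 2, 4], 2))           # False
```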
For example, if k = 1, you can do a binary search to find the answer in O(log n) (assuming the input is already sorted). If k = 2, then you can get it down to O(n log n) (e.g. see "Find a pair of elements from an array whose sum equals a given number"). If k = 3, then you can do O(n^2) (e.g. see "Finding three elements in an array whose sum is closest to a given number").
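The three small cases can be sketched as follows. This is only an illustration of the standard techniques (binary search, a two-pointer scan after sorting, and a fixed element plus a two-pointer scan); the function names are hypothetical and not taken from the linked questions.

```python
from bisect import bisect_left

def has_zero(sorted_nums):
    """k = 1: binary search for 0 in an already sorted list, O(log n)."""
    i = bisect_left(sorted_nums, 0)
    return i < len(sorted_nums) and sorted_nums[i] == 0

def has_pair_with_sum(sorted_nums, target, lo=0):
    """k = 2: two-pointer scan over sorted_nums[lo:], O(n) once sorted."""
    hi = len(sorted_nums) - 1
    while lo < hi:
        s = sorted_nums[lo] + sorted_nums[hi]
        if s == target:
            return True
        if s < target:
            lo += 1   # sum too small: move the left pointer up
        else:
            hi -= 1   # sum too large: move the right pointer down
    return False

def has_zero_triple(nums):
    """k = 3: fix one element, two-pointer scan on the rest, O(n^2)."""
    a = sorted(nums)
    return any(has_pair_with_sum(a, -a[i], i + 1) for i in range(len(a) - 2))
```

For k = 2 the O(n log n) cost comes from the sort; the scan itself is linear.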
Is there a known bound that can be placed on this problem as a function of k?
As motivation, I was thinking about the question "How do you partition an array into 2 parts such that the two parts have equal average?" and trying to determine whether it is actually NP-complete. The answer lies in whether or not there is a formula as described above.
Barring a general solution, I'd be very interested in knowing an optimal bound for k=4.
For k=4, space complexity O(n), time complexity O(n^2 * log(n))
Sort the array. Starting from the 2 smallest and 2 largest elements, calculate all lesser sums of 2 elements (a[i] + a[j]) in non-decreasing order and all greater sums of 2 elements (a[k] + a[l]) in non-increasing order. Increase the lesser sum if the total sum is less than zero, decrease the greater one if the total sum is greater than zero, and stop when the total sum is zero (success) or a[i] + a[j] > a[k] + a[l] (failure).
The trick is to iterate through all the indexes i and j in such a way that (a[i] + a[j]) never decreases. And for k and l, (a[k] + a[l]) should never increase. A priority queue helps to do this:
1. Put key=(a[i] + a[j]), value=(i = 0, j = 1) to the priority queue.
2. Pop (sum, i, j) from the priority queue.
3. Use sum in the above algorithm.
4. Put (a[i+1] + a[j]), i+1, j and (a[i] + a[j+1]), i, j+1 to the priority queue only if these elements were not already used. To keep track of used elements, maintain an array of the maximal used 'j' for each 'i'. It is enough to use only values of 'j' that are greater than 'i'.
5. Repeat from step 2.
For k>4
If space complexity is limited to O(n), I cannot find anything better than using brute force for k-4 values and the above algorithm for the remaining 4 values. Time complexity is O(n^(k-2) * log(n)).
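The k=4 procedure and its brute-force extension might look roughly like the sketch below. It is written under a few assumptions that are not part of the answer: the names `pair_sums`, `four_sum` and `k_sum_zero` are hypothetical; instead of expanding (i+1, j) and (i, j+1) with a "maximal used j" array, each heap holds one entry per row, which also keeps it at O(n) entries; and pairs with equal sums are collected so that the two pairs can be checked for shared indices, which can use more than O(n) memory when many pair sums coincide.

```python
import heapq
from itertools import combinations

def pair_sums(a, descending=False):
    """Yield (sum, p, q), p != q, over sorted list a in order of sum."""
    n = len(a)
    if descending:
        # Fix the larger index; its partner walks down toward index 0.
        heap = [(-(a[j] + a[j - 1]), j, j - 1) for j in range(1, n)]
        step, sign = -1, -1
    else:
        # Fix the smaller index; its partner walks up toward index n - 1.
        heap = [(a[i] + a[i + 1], i, i + 1) for i in range(n - 1)]
        step, sign = 1, 1
    heapq.heapify(heap)
    while heap:
        key, fixed, other = heapq.heappop(heap)
        yield sign * key, fixed, other
        nxt = other + step
        if 0 <= nxt < n:
            heapq.heappush(heap, (sign * (a[fixed] + a[nxt]), fixed, nxt))

def four_sum(a, target=0):
    """a must be sorted. Return four values summing to target, or None."""
    if len(a) < 4:
        return None
    lo_iter, hi_iter = pair_sums(a), pair_sums(a, descending=True)
    lo, hi = next(lo_iter, None), next(hi_iter, None)
    # Stop once the lesser pair sum exceeds the greater one (failure).
    while lo is not None and hi is not None and lo[0] <= hi[0]:
        total = lo[0] + hi[0]
        if total < target:
            lo = next(lo_iter, None)
        elif total > target:
            hi = next(hi_iter, None)
        else:
            # Gather all pairs tied at these two sums, then look for
            # a lesser/greater pair that shares no index.
            low_sum, high_sum = lo[0], hi[0]
            lows, highs = [], []
            while lo is not None and lo[0] == low_sum:
                lows.append(lo[1:])
                lo = next(lo_iter, None)
            while hi is not None and hi[0] == high_sum:
                highs.append(hi[1:])
                hi = next(hi_iter, None)
            for p in lows:
                for q in highs:
                    if len({*p, *q}) == 4:
                        return [a[x] for x in sorted(p + q)]
    return None

def k_sum_zero(nums, k):
    """k >= 4: brute-force k - 4 elements, run four_sum on the rest."""
    a = sorted(nums)
    for chosen in combinations(range(len(a)), k - 4):
        picked = set(chosen)
        rest = [x for i, x in enumerate(a) if i not in picked]
        found = four_sum(rest, -sum(a[i] for i in chosen))
        if found is not None:
            return [a[i] for i in chosen] + found
    return None
```

For example, `k_sum_zero([-7, -3, 2, 5, 8], 4)` finds -7, -3, 2 and 8. The wrapper re-examines some subsets more than once; that redundancy is kept for brevity.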
For very large k, integer linear programming may give some improvement.
Update
If n is very large (on the same order as the maximum integer value), it is possible to implement an O(1) priority queue, improving the complexities to O(n^2) and O(n^(k-2)).
If n >= k * INT_MAX, a different algorithm with O(n) space complexity is possible. Precalculate a bitset for all possible sums of k/2 values, and use it to check sums of the other k/2 values. Time complexity is O(n^(ceil(k/2))).