Fast solution to Subset sum

Ecir Hana picture Ecir Hana · Mar 21, 2012 · Viewed 14.4k times · Source

Consider this way of solving the Subset sum problem:

def subset_summing_to_zero (activities):
  subsets = {0: []}
  for (activity, cost) in activities.iteritems():
      old_subsets = subsets
      subsets = {}
      for (prev_sum, subset) in old_subsets.iteritems():
          subsets[prev_sum] = subset
          new_sum = prev_sum + cost
          new_subset = subset + [activity]
          if 0 == new_sum:
              new_subset.sort()
              return new_subset
          else:
              subsets[new_sum] = new_subset
  return []

I have it from here:

http://news.ycombinator.com/item?id=2267392

There is also a comment which says that it is possible to make it "more efficient".

How?

Also, are there any other ways to solve the problem which are at least as fast as the one above?

Edit

I'm interested in any kind of idea which would lead to speed-up. I found:

https://en.wikipedia.org/wiki/Subset_sum_problem#cite_note-Pisinger09-2

which mentions a linear time algorithm. But I don't have the paper, perhaps you, dear people, know how it works? An implementation perhaps? Completely different approach perhaps?

Edit 2

There is now a follow-up:
Fast solution to Subset sum algorithm by Pisinger

Answer

MrGomez picture MrGomez · Mar 29, 2012

I respect the alacrity with which you're trying to solve this problem! Unfortunately, you're trying to solve a problem that's NP-complete, meaning that any further improvement that breaks the polynomial time barrier will prove that P = NP.

The implementation you pulled from Hacker News appears to be consistent with the pseudo-polytime dynamic programming solution, where any additional improvements must, by definition, progress the state of current research into this problem and all of its algorithmic isoforms. In other words: while a constant speedup is possible, you're very unlikely to see an algorithmic improvement to this solution to the problem in the context of this thread.

However, you can use an approximate algorithm if you require a polytime solution with a tolerable degree of error. In pseudocode blatantly stolen from Wikipedia, this would be:

initialize a list S to contain one element 0.
 for each i from 1 to N do
   let T be a list consisting of xi + y, for all y in S
   let U be the union of T and S
   sort U
   make S empty 
   let y be the smallest element of U 
   add y to S 
   for each element z of U in increasing order do
      //trim the list by eliminating numbers close to one another
      //and throw out elements greater than s
     if y + cs/N < z ≤ s, set y = z and add z to S 
 if S contains a number between (1 − c)s and s, output yes, otherwise no

Python implementation, preserving the original terms as closely as possible:

from bisect import bisect

def ssum(X,c,s):
    """ Simple impl. of the polytime approximate subset sum algorithm 
    Returns True if the subset exists within our given error; False otherwise 
    """
    S = [0]
    N = len(X)
    for xi in X:
        T = [xi + y for y in S]
        U = set().union(T,S)
        U = sorted(U) # Coercion to list
        S = []
        y = U[0]
        S.append(y)
        for z in U: 
            if y + (c*s)/N < z and z <= s:
                y = z
                S.append(z)
    if not c: # For zero error, check equivalence
        return S[bisect(S,s)-1] == s
    return bisect(S,(1-c)*s) != bisect(S,s)

... where X is your bag of terms, c is your precision (between 0 and 1), and s is the target sum.

For more details, see the Wikipedia article.

(Additional reference, further reading on CSTheory.SE)