Subset sum Problem

Alberto Leal · May 16, 2011 · Viewed 9.8k times

I recently became interested in the subset-sum problem: given a set of integers, find a non-empty subset whose elements sum to zero. I found some solutions on SO, and I also came across one particular solution that uses dynamic programming. I translated that solution into Python based on its author's qualitative description. I'm now trying to optimize it for larger lists, because it eats up a lot of memory. Can someone recommend optimizations or other techniques for this particular problem? Here's my attempt in Python:

import random
from time import time
from itertools import product

time0 = time()

# create a zero matrix of size a (row), b(col)
def create_zero_matrix(a,b):
    return [[0] * b for _ in range(a)]

# generate a list of size num with random integers with an upper and lower bound
def random_ints(num, lower=-1000, upper=1000):
    return [random.randrange(lower,upper+1) for i in range(num)]

# split a list into N and P, where N is the sum of the negative values and P is the sum of the positive values.
# zeros are skipped because 0 is the additive identity and does not change either sum
def split_sum(A):
    N_list = []
    P_list = []
    for x in A:
        if x < 0:
            N_list.append(x)
        elif x > 0:
            P_list.append(x)
    return [sum(N_list), sum(P_list)]

# since the column indexes are in the range from 0 to P - N
# we would like to retrieve them based on the index in the range N to P
# n := row, m := col
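def get_element(table, n, m, N):
    col = m - N
    # check the bounds explicitly: a negative column index would not raise,
    # it would silently wrap around to the end of the row
    if not (0 <= n < len(table) and 0 <= col < len(table[n])):
        return 0
    return table[n][col]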

# same definition as above
def set_element(table, n, m, N, value):
    table[n][m - N] = value

# input array
#A = [1, -3, 2, 4]
A = random_ints(200)

[N, P] = split_sum(A)

# create a zero matrix of size m (row) by n (col)
#
# m := the number of elements in A
# n := P - N + 1 (by definition N <= s <= P)
#
# each element in the matrix will be a value of either 0 (false) or 1 (true)
m = len(A)
n = P - N + 1
table = create_zero_matrix(m, n)

# set first element in index (0, A[0]) to be true
# Definition: Q(1,s) := (x1 == s). Note that index starts at 0 instead of 1.
set_element(table, 0, A[0], N, 1)

# iterate through each table element
#for i in range(1, m): #row
#    for s in range(N, P + 1): #col
for i, s in product(range(1, m), range(N, P + 1)):
    if get_element(table, i - 1, s, N) or A[i] == s or get_element(table, i - 1, s - A[i], N):
        #set_element(table, i, s, N, 1)
        table[i][s - N] = 1

# find zero-sum subset solution
s = 0
solution = []
for i in reversed(range(m)):
    if get_element(table, i - 1, s, N) == 0 and get_element(table, i, s, N) == 1:
        s = s - A[i]
        solution.append(A[i])

print("Solution: %s" % solution)

time1 = time()

print("Execution time: %s" % (time1 - time0))

Answer

Martin Carames Abente · May 17, 2011

I'm not quite sure whether your solution is exact or a PTAS (polynomial-time approximation scheme).

But, as someone pointed out, this problem is indeed NP-complete.

That means every known exact algorithm takes, in the worst case, time exponential in the size of the input.

To put that in numbers: if you can process one operation in 0.1 nanoseconds (10^10 operations per second), then checking every subset of a list of 59 elements takes:

2^59 ops / (10^10 ops per second) ≈ 2^26 seconds
2^26 seconds / (3600 x 24 x 365 seconds per year) ≈ 2 years
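As a rough illustration of that exponential growth, here is a minimal brute-force sketch. The function names are made up for this example, and the rate of 10^10 operations per second is simply the divisor used in the estimate above, not a measured figure.

```python
from itertools import combinations

def brute_force_zero_subset(nums):
    """Try every non-empty subset; return the first one that sums to 0, else None."""
    for size in range(1, len(nums) + 1):
        for combo in combinations(nums, size):
            if sum(combo) == 0:
                return list(combo)
    return None

def years_to_enumerate(n, ops_per_second=10**10):
    """Years needed to visit all 2**n subsets at the assumed rate (one op per subset)."""
    seconds = 2.0 ** n / ops_per_second
    return seconds / (3600 * 24 * 365)

print(brute_force_zero_subset([1, -3, 2, 4]))  # [1, -3, 2]
print(years_to_enumerate(59))                  # a little under 2 years
```

Each extra element doubles the number of subsets, so the estimate doubles as well.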

You can find heuristics, but they only give you a CHANCE of finding an exact solution in polynomial time.
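One very simple heuristic of this kind, shown only as a sketch, is random sampling: draw random subsets for a fixed number of attempts and stop if one happens to sum to zero. The function name, the default attempt count, and the `seed` parameter are assumptions made for this example. A `None` result does not prove that no zero-sum subset exists.

```python
import random

def random_search_zero_subset(nums, attempts=10000, seed=None):
    """Sample random subsets; return one that sums to 0, or None if none was hit."""
    rng = random.Random(seed)
    for _ in range(attempts):
        # include each element independently with probability 1/2
        subset = [x for x in nums if rng.random() < 0.5]
        if subset and sum(subset) == 0:
            return subset
    return None

result = random_search_zero_subset([1, -3, 2, 4], seed=0)
print(result)
```

The running time is bounded by `attempts * len(nums)`, which is polynomial, but success depends on luck.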

On the other hand, if you restrict the problem by bounding the values of the numbers in the set, the complexity drops to polynomial time in the number of elements and the bound. But even then the memory consumed can be a polynomial of high order: the table needs one row per element and one column per possible sum, so it can outgrow the memory you have available.

(That can happen even for fairly small bounds on the values of the elements in the set.)
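One way to reduce the memory footprint, sketched here under the assumption that only a single zero-sum subset is needed, is to store just the sums that are actually reachable in a dictionary instead of a full table. This swaps the table in the question for a different data structure. The function name and the `parent` layout are made up for this example.

```python
def zero_sum_subset(nums):
    """Return a non-empty list of elements of nums summing to 0, or None."""
    # parent maps each reachable sum to (previous_sum, index_of_last_element);
    # previous_sum is None when the subset is that single element.
    parent = {}
    for i, x in enumerate(nums):
        # Collect new sums separately so that element i is used at most once.
        new_entries = {}
        if x not in parent:
            new_entries[x] = (None, i)
        for s in parent:
            t = s + x
            if t not in parent and t not in new_entries:
                new_entries[t] = (s, i)
        parent.update(new_entries)
        if 0 in parent:
            break
    if 0 not in parent:
        return None
    # Follow the parent links back from sum 0 to recover the subset.
    subset = []
    s = 0
    while s is not None:
        s, i = parent[s]
        subset.append(nums[i])
    return subset

print(zero_sum_subset([1, -3, 2, 4]))  # [2, -3, 1]
```

The dictionary holds at most P - N + 1 entries (one per distinct sum), rather than one such row for every element.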

Maybe this is the case with your dynamic programming algorithm.

It looks to me as though you are using a bound of 1000 on the values, which determines how wide your matrix is.

You can try a smaller bound, provided your input consistently consists of small values.
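To get a feel for how the bound affects the table in the question (`len(A)` rows by `P - N + 1` columns), here is a small sketch of the worst-case cell count. It assumes every value lies in `[-bound, bound]`, so `P - N` is at most `len(A) * bound`. The function name is made up for this example.

```python
def worst_case_table_cells(num_elements, bound):
    """Upper limit on table cells when every value lies in [-bound, bound]."""
    # P - N is the sum of absolute values, at most num_elements * bound
    columns = num_elements * bound + 1
    return num_elements * columns

print(worst_case_table_cells(200, 1000))  # 40000200
print(worst_case_table_cells(200, 100))   # 4000200
```

Under that assumption, the cell count scales linearly with the bound and quadratically with the number of elements.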

Good Luck!