Quick way to extend a set if we know elements are unique

Stewart_R picture Stewart_R · Jun 5, 2013 · Viewed 45.6k times · Source

I am performing multiple iterations of the type:

masterSet=masterSet.union(setA)

As the set grows the length of time taken to perform these operations is growing (as one would expect, I guess).

I expect that the time is taken up checking whether each element of setA is already in masterSet?

My question is that if i KNOW that masterSet does not already contain any of elements in setA can I do this quicker?

[UPDATE]

Given that this question is still attracting views I thought I would clear up a few of the things from the comments and answers below:

When iterating though there were many iterations where I knew setA would be distinct from masterSet because of how it was constructed (without having to process any checks) but a few iterations I needed the uniqueness check.

I wondered if there was a way to 'tell' the masterSet.union() procedure not to bother with the uniquness check this time around as I know this one is distinct from masterSet just add these elements quickly trusting the programmer's assertion they were definately distict. Perhpas through calling some different ".unionWithDistinctSet()" procedure or something.

I think the responses have suggested that this isnt possible (and that really set operations should be quick enough anyway) but to use masterSet.update(setA) instead of union as its slightly quicker still.

I have accepted the clearest reponse along those lines, resolved the issue I was having at the time and got on with my life but would still love to hear if my hypothesised .unionWithDistinctSet() could ever exist?

Answer

mgilson picture mgilson · Jun 5, 2013

You can use set.update to update your master set in place. This saves allocating a new set all the time so it should be a little faster than set.union...

>>> s = set(range(3))
>>> s.update(range(4))
>>> s
set([0, 1, 2, 3])

Of course, if you're doing this in a loop:

masterSet = set()
for setA in iterable:
    masterSet = masterSet.union(setA)

You might get a performance boost by doing something like:

masterSet = set().union(*iterable)

Ultimately, membership testing of a set is O(1) (in the average case), so testing if the element is already contained in the set isn't really a big performance hit.