Why does Python's itertools.permutations contain duplicates? (When the original list has duplicates)

ShreevatsaR · Jun 30, 2011

It is universally agreed that a list of n distinct symbols has n! permutations. However, when the symbols are not distinct, the most common convention, in mathematics and elsewhere, seems to be to count only distinct permutations. Thus the permutations of the list [1, 1, 2] are usually considered to be
[1, 1, 2], [1, 2, 1], [2, 1, 1]. Indeed, the following C++ code prints precisely those three:

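#include <algorithm>
#include <iostream>
int main() {
    int a[] = {1, 1, 2};  // must start sorted so that every permutation is visited
    do {
        std::cout << a[0] << " " << a[1] << " " << a[2] << std::endl;
    } while (std::next_permutation(a, a + 3));
}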

On the other hand, Python's itertools.permutations seems to print something else:

import itertools
for a in itertools.permutations([1, 1, 2]):
    print(a)

This prints

(1, 1, 2)
(1, 2, 1)
(1, 1, 2)
(1, 2, 1)
(2, 1, 1)
(2, 1, 1)

As user Artsiom Rudzenka pointed out in an answer, the Python documentation says so:

Elements are treated as unique based on their position, not on their value.
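One way to read that sentence: the output is what you would get by permuting the positions 0, 1, 2 and then looking up the value at each position. A small sketch of that reading (Python 3 syntax; the names `items` and `by_position` are only for illustration):

```python
import itertools

items = [1, 1, 2]

# Permute the positions, then look up the value stored at each position.
by_position = [tuple(items[i] for i in idx)
               for idx in itertools.permutations(range(len(items)))]

# Same tuples, in the same order, as permuting the values directly.
assert by_position == list(itertools.permutations(items))

# Collapsing equal tuples leaves the three distinct arrangements.
print(sorted(set(by_position)))  # [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```

The two 1s sit at different positions, so each value-distinct arrangement shows up twice.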

My question: why was this design decision made?

It seems that following the usual convention would give results that are more useful (and indeed it is usually exactly what I want)... or is there some application of Python's behaviour that I'm missing?

[Or is it some implementation issue? The algorithm as in next_permutation — for instance explained on StackOverflow here (by me) and shown here to be O(1) amortised — seems efficient and implementable in Python, but is Python doing something even more efficient since it doesn't guarantee lexicographic order based on value? And if so, was the increase in efficiency considered worth it?]
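To make "implementable in Python" concrete, here is a rough sketch of the next_permutation idea. This is my own transcription of the well-known algorithm, not code from itertools or from the C++ standard library; the function names are made up for illustration, and it assumes the elements support `<`.

```python
def next_permutation(a):
    """Rearrange list a in place into its next lexicographic permutation.

    Returns False (and leaves a in ascending order) if a was already the
    last permutation, similar in spirit to C++ std::next_permutation.
    """
    # Find the rightmost position i with a[i] < a[i + 1].
    i = len(a) - 2
    while i >= 0 and not (a[i] < a[i + 1]):
        i -= 1
    if i < 0:
        a.reverse()
        return False
    # Find the rightmost element greater than a[i] and swap the two.
    j = len(a) - 1
    while not (a[i] < a[j]):
        j -= 1
    a[i], a[j] = a[j], a[i]
    # Reverse the suffix so that it becomes the smallest possible tail.
    a[i + 1:] = reversed(a[i + 1:])
    return True


def lexicographic_permutations(iterable):
    """Yield each distinct permutation once, in lexicographic order."""
    a = sorted(iterable)
    while True:
        yield tuple(a)
        if not next_permutation(a):
            return


print(list(lexicographic_permutations([1, 1, 2])))
# [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```

Note that the sketch only ever compares elements with `<`, which is exactly the requirement a value-based approach imposes on its input.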

Answer

Gareth Rees · Jun 30, 2011

I can't speak for the designer of itertools.permutations (Raymond Hettinger), but it seems to me that there are a couple of points in favour of the design:

First, if you used a next_permutation-style approach, then you'd be restricted to passing in objects that support a linear ordering, whereas itertools.permutations provides permutations of any kind of object. Imagine how annoying this would be:

>>> list(itertools.permutations([1+2j, 1-2j, 2+1j, 2-1j]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: no ordering relation is defined for complex numbers
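For contrast with that imagined traceback, a quick check of what happens today (Python 3 syntax; `values` and `perms` are just illustrative names). Anything that needs an ordering, such as `sorted`, fails on complex numbers, while itertools.permutations is happy:

```python
import itertools

values = [1+2j, 1-2j, 2+1j, 2-1j]

# Complex numbers cannot be compared with <, so sorting them fails...
try:
    sorted(values)
except TypeError as exc:
    print("sorted() failed:", exc)

# ...but itertools.permutations works by position, so ordering is irrelevant.
perms = list(itertools.permutations(values))
print(len(perms))  # 24, i.e. 4!
```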

Second, by not testing for equality on objects, itertools.permutations avoids paying the cost of calling the __eq__ method in the usual case where it's not necessary.
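A small experiment along these lines, assuming CPython and Python 3 syntax. `Token` is a hypothetical class written only to count equality tests:

```python
import itertools


class Token:
    """Hypothetical element type that counts how often __eq__ is called."""
    eq_calls = 0

    def __init__(self, label):
        self.label = label

    def __eq__(self, other):
        Token.eq_calls += 1
        return isinstance(other, Token) and self.label == other.label

    def __hash__(self):
        return hash(self.label)


tokens = [Token("a"), Token("a"), Token("b")]
perms = list(itertools.permutations(tokens))

print(len(perms))      # 6: arrangements that are equal by value are all kept
print(Token.eq_calls)  # 0 on CPython: no equality test was made
```

Any approach that filters out duplicates would have to call `__eq__` (or `__hash__`, or `<`) at least some of the time.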

Basically, itertools.permutations solves the common case reliably and cheaply. There's certainly an argument to be made that itertools ought to provide a function that avoids duplicate permutations, but such a function should be in addition to itertools.permutations, not instead of it. Why not write such a function and submit a patch?
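As a starting point, here is one possible sketch of such a function. It is a hypothetical helper (the name `unique_permutations` is mine, and nothing like it is claimed to exist in itertools), written in Python 3 syntax. It counts the elements with `collections.Counter`, so it needs them to be hashable but not orderable, which sidesteps the first objection above at the price of hashing and equality tests.

```python
from collections import Counter


def unique_permutations(iterable):
    """Yield each distinct arrangement of the elements exactly once.

    Hypothetical helper, not part of itertools. Elements must be hashable,
    but they do not need to support ordering.
    """
    counts = Counter(iterable)   # how many copies of each value remain
    n = sum(counts.values())
    current = []                 # the arrangement being built

    def extend():
        if len(current) == n:
            yield tuple(current)
            return
        # Try each distinct value that still has unused copies.
        for value in counts:
            if counts[value] > 0:
                counts[value] -= 1
                current.append(value)
                yield from extend()
                current.pop()
                counts[value] += 1

    return extend()


print(list(unique_permutations([1, 1, 2])))
# [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```

Because each level of the recursion picks a distinct value rather than a position, equal elements are never swapped with each other, so no duplicates are produced in the first place.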