Java's TreeSet equivalent in Python?

viksit picture viksit · Apr 26, 2010 · Viewed 27.3k times · Source

I recently came across some Java code that simply put some strings into a Java TreeSet, implemented a distance based comparator for it, and then made its merry way into the sunset to compute a given score to solve the given problem.

My questions,

  • Is there an equivalent data structure available for Python?

    • The Java treeset looks basically to be an ordered dictionary that can use a comparator of some sort to achieve this ordering.
  • I see there's a PEP for Py3K for an OrderedDict, but I'm using 2.6.x. There are a bunch of ordered dict implementations out there - anyone in particular that can be recommended?

PS, Just to add - I could probably import DictMixin or UserDict and implement my own sorted/ordered dictionary, AND make it happen through a comparator function - but that seems to be overkill.

Thanks.


Update. Thanks for the answers. To elaborate a bit, lets say I've got a compare function thats defined like, (given a particular value ln),

def mycmp(x1, y1, ln):
  a = abs(x1-ln)
  b = abs(y1-ln)
  if a<b:
    return -1
  elif a>b:
    return 1
  else:
    return 0

I'm a bit unsure about how I'd integrate this into the ordering given in the ordered dict link given here...

Something like,

OrderedDict(sorted(d.items(), cmp=mycmp(len)))

Ideas would be welcome.

Answer

unutbu picture unutbu · Apr 26, 2010

The Python 2.7 docs for collections.OrderedDict has a link to a OrderedDict recipe that runs on Python 2.4 or better.

Edit: In regard to sorting: Use key= rather than cmp=. It tends to lead to faster code and moreover, the cmp= keyword has been eliminated in Python3.

d={5:6,7:8,100:101,1:2,3:4}
print(d.items())
# [(1, 2), (3, 4), (100, 101), (5, 6), (7, 8)]

The code you posted for mycmp doesn't make it clear what you want passed as x1. Below, I assume x1 is supposed to be the value in each key-value pair. If so, you could do something like this:

length=4
print(sorted(d.items(),key=lambda item: abs(item[1]-length) ))
# [(3, 4), (1, 2), (5, 6), (7, 8), (100, 101)]

key=... is passed a function, lambda item: abs(item[1]-length). For each item in d.items(), the lambda function returns the number abs(item[1]-length). This number acts as proxy for the item as far as sorting is concerned. See this essay for more information on sorting idioms in Python.

PS. len is a Python builtin function. So as to not clobber that len, I've changed the variable name to length.