Python's list comprehension vs .NET LINQ

prosseek picture prosseek · Oct 13, 2010 · Viewed 16.7k times · Source

The following simple LINQ code

string[] words = { "hello", "wonderful", "linq", "beautiful", "world" };

// Get only short words
var shortWords =
  from word in words
  where word.Length <= 5
  select word;

// Print each word out
shortWords.Dump();

can be translated into python using list comprehension as follows.

words = ["hello", "wonderful", "linq", "beautiful", "world"]
shortWords = [x for x in words if len(x) <=5]
print shortWords
  • Is LINQ just another idea to implement list comprehension?
  • What examples might be that LINQ can do but list comprehension can't do.

Answer

user395760 picture user395760 · Oct 13, 2010

(Warning: Mammoth answer ahead. The part up to the first horizontal line makes a good tl;dr section, I suppose)

I'm not sure if I qualify as Python guru... but I have a solid grasp on iteration in Python, so let's try :)

First off: Afaik, LINQ queries are executed lazily - if that's the case, generator expressions are a closer Python concept (either way, list-, dict- and set comprehensions are conceptually just generator expressions fed to the list/dict/set constructor!).

Also, there is a conceptual difference: LINQ is for, as the name says, querying data structures. List-/dict-/set comprehensions are possible application of this (e.g. filtering and projecting the items of a list). So they are in fact less general (as we will see, many things built into LINQ are not built into them). Likewise, generator expressions are a way to formulate an one-time forward iterator in-place (I like to think of it as lambda for generator functions, only without an ugly, long keyword ;) ) and not a way to describe a complex query. They overlap, yes, but they are not identical. If you want all the power of LINQ in Python, you will have to write a fully-fledged generator. Or combine the numerous powerful generators built-in and in itertools.


Now, Python counterparts for the LINQ capabilities Jon Skeet named:

Projections: (x.foo for ...)

Filtering: (... if x.bar > 5)

  • Joins (x join y on x.foo equals y.bar)

The closest thing would be((x_item, next(y_item for y_item in y if x_item.foo == y_item.bar)) for x_item in x), I suppose.

Note that this will not iterate over the whole y for each x_item, it will only get the first match.

  • Group joins (x join y on x.foo equals y.bar into g)

This is harder. Python doesn't have anonymous types, though they are trivial to do yourself if you don't mind messing with __dict__:

class Anonymous(object):
    def __init__(self, **kwargs):
        self.__dict__ = kwargs

Then, we could do (Anonymous(x=x, y=y) for ...) to get a list of objects that are have x and y members with the respective values. The right thing is usually feeding the results to the constructor of an approriate class, say, XY.

  • Grouping (group x.foo by x.bar)

Now it gets hairy... there is no build-in way, afaik. But we can define it ourself if we need it:

from collections import defaultdict

def group_by(iterable, group_func):
    groups = defaultdict(list)
    for item in iterable:
        groups[group_func(item)].append(item)
    return groups

Example:

>>> from operator import attrgetter
>>> group_by((x.foo for x in ...), attrgetter('bar'))
defaultdict(<class 'list'>, {some_value_of_bar: [x.foo of all x where x.bar == some_value_of_bar], some_other_value_of_bar: [...], ...})

This requires whatever we group by to be hashable, though. It's possible to avoid this, and I'll make a stab if there is public demand. But for now, I'm being lazy :)

We can also just return an iterable of groups without the values we grouped by, by calling .values() on the result (of course we can feed that to list to get something we can index and iterate several times). But who knows if we won't need the group values...

  • Ordering (orderby x.foo ascending, y.bar descending)

Sorting needs special syntax? The build-in sorted works for iterables, too: sorted(x % 2 for x in range(10)) or sorted(x for x in xs, key=attrgetter('foo')). Sorted ascending by default, the keyword argument reverse gives descending order.

Alas, afaik sorting by multiple attributes is not that easy, especially when mixing ascending and descending. Hmm... topic for a recipe?

  • Intermediate variables (let tmp = x.foo)

No, not possible in comprehensions or generator expressions - they are, as the name says, supposed to be expressions (and usually only span one or two lines). It's perfectly possible in generator function, though:

(x * 2 for x in iterable)

rewritten as generator with intermediate variable:

def doubles(iterable):
    for x in iterable:
        times2 = x * 2
        yield times2

Flattening: (c for s in ("aa","bb") for c in s )


Note that although LINQ to Objects deals with delegates, other query providers (e.g. LINQ to SQL) can deal in expression trees which describe the query instead of just presenting executable delegates. This allows the query to be translated into SQL (or other query languages) - again, I don't know whether Python supports that sort of thing or not. It's a significant part of LINQ though.

Python definitely does no such thing. List expressions correspond one-to-one to accumulating a plain list in a (possibly nested) for-loop, generator expressions correspond one-to-one to a generator. Given the parser and ast module, it would be possible in theory to write a library for converting a comprehension into e.g. an SQL query. But nobody cares to.