What does Expression.Quote() do that Expression.Constant() can’t already do?

Timwi picture Timwi · Sep 15, 2010 · Viewed 10.8k times · Source

Note: I am aware of the earlier question “What is the purpose of LINQ's Expression.Quote method?, but if you read on you will see that it doesn’t answer my question.

I understand what the stated purpose of Expression.Quote() is. However, Expression.Constant() can be used for the same purpose (in addition to all the purposes that Expression.Constant() is already used for). Therefore, I don’t understand why Expression.Quote() is at all required.

To demonstrate this, I have written a quick example where one would customarily use Quote (see the line marked with exclamation points), but I used Constant instead and it worked equally well:

string[] array = { "one", "two", "three" };

// This example constructs an expression tree equivalent to the lambda:
// str => str.AsQueryable().Any(ch => ch == 'e')

Expression<Func<char, bool>> innerLambda = ch => ch == 'e';

var str = Expression.Parameter(typeof(string), "str");
var expr =
    Expression.Lambda<Func<string, bool>>(
        Expression.Call(typeof(Queryable), "Any", new Type[] { typeof(char) },
            Expression.Call(typeof(Queryable), "AsQueryable",
                            new Type[] { typeof(char) }, str),
            // !!!
            Expression.Constant(innerLambda)    // <--- !!!
        ),
        str
    );

// Works like a charm (prints one and three)
foreach (var str in array.AsQueryable().Where(expr))
    Console.WriteLine(str);

The output of expr.ToString() is the same for both, too (whether I use Constant or Quote).

Given the above observations, it appears that Expression.Quote() is redundant. The C# compiler could have been made to compile nested lambda expressions into an expression tree involving Expression.Constant() instead of Expression.Quote(), and any LINQ query provider that wants to process expression trees into some other query language (such as SQL) could look out for a ConstantExpression with type Expression<TDelegate> instead of a UnaryExpression with the special Quote node type, and everything else would be the same.

What am I missing? Why was Expression.Quote() and the special Quote node type for UnaryExpression invented?

Answer

Eric Lippert picture Eric Lippert · Sep 20, 2010

Short answer:

The quote operator is an operator which induces closure semantics on its operand. Constants are just values.

Quotes and constants have different meanings and therefore have different representations in an expression tree. Having the same representation for two very different things is extremely confusing and bug prone.

Long answer:

Consider the following:

(int s)=>(int t)=>s+t

The outer lambda is a factory for adders that are bound to the outer lambda's parameter.

Now, suppose we wish to represent this as an expression tree that will later be compiled and executed. What should the body of the expression tree be? It depends on whether you want the compiled state to return a delegate or an expression tree.

Let's begin by dismissing the uninteresting case. If we wish it to return a delegate then the question of whether to use Quote or Constant is a moot point:

        var ps = Expression.Parameter(typeof(int), "s");
        var pt = Expression.Parameter(typeof(int), "t");
        var ex1 = Expression.Lambda(
                Expression.Lambda(
                    Expression.Add(ps, pt),
                pt),
            ps);

        var f1a = (Func<int, Func<int, int>>) ex1.Compile();
        var f1b = f1a(100);
        Console.WriteLine(f1b(123));

The lambda has a nested lambda; the compiler generates the interior lambda as a delegate to a function closed over the state of the function generated for the outer lambda. We need consider this case no more.

Suppose we wish the compiled state to return an expression tree of the interior. There are two ways to do that: the easy way and the hard way.

The hard way is to say that instead of

(int s)=>(int t)=>s+t

what we really mean is

(int s)=>Expression.Lambda(Expression.Add(...

And then generate the expression tree for that, producing this mess:

        Expression.Lambda(
            Expression.Call(typeof(Expression).GetMethod("Lambda", ...

blah blah blah, dozens of lines of reflection code to make the lambda. The purpose of the quote operator is to tell the expression tree compiler that we want the given lambda to be treated as an expression tree, not as a function, without having to explicitly generate the expression tree generation code.

The easy way is:

        var ex2 = Expression.Lambda(
            Expression.Quote(
                Expression.Lambda(
                    Expression.Add(ps, pt),
                pt)),
            ps);

        var f2a = (Func<int, Expression<Func<int, int>>>)ex2.Compile();
        var f2b = f2a(200).Compile();
        Console.WriteLine(f2b(123));

And indeed, if you compile and run this code you get the right answer.

Notice that the quote operator is the operator which induces closure semantics on the interior lambda which uses an outer variable, a formal parameter of the outer lambda.

The question is: why not eliminate Quote and make this do the same thing?

        var ex3 = Expression.Lambda(
            Expression.Constant(
                Expression.Lambda(
                    Expression.Add(ps, pt),
                pt)),
            ps);

        var f3a = (Func<int, Expression<Func<int, int>>>)ex3.Compile();
        var f3b = f3a(300).Compile();
        Console.WriteLine(f3b(123));

The constant does not induce closure semantics. Why should it? You said that this was a constant. It's just a value. It should be perfect as handed to the compiler; the compiler should be able to just generate a dump of that value to the stack where it is needed.

Since there is no closure induced, if you do this you'll get a "variable 's' of type 'System.Int32' is not defined" exception on the invocation.

(Aside: I've just reviewed the code generator for delegate creation from quoted expression trees, and unfortunately a comment that I put into the code back in 2006 is still there. FYI, the hoisted outer parameter is snapshotted into a constant when the quoted expression tree is reified as a delegate by the runtime compiler. There was a good reason why I wrote the code that way which I do not recall at this exact moment, but it does have the nasty side effect of introducing closure over values of outer parameters rather than closure over variables. Apparently the team which inherited that code decided to not fix that flaw, so if you are relying upon mutation of a closed-over outer parameter being observed in a compiled quoted interior lambda, you're going to be disappointed. However, since it is a pretty bad programming practice to both (1) mutate a formal parameter and (2) rely upon mutation of an outer variable, I would recommend that you change your program to not use these two bad programming practices, rather than waiting for a fix which does not appear to be forthcoming. Apologies for the error.)

So, to repeat the question:

The C# compiler could have been made to compile nested lambda expressions into an expression tree involving Expression.Constant() instead of Expression.Quote(), and any LINQ query provider that wants to process expression trees into some other query language (such as SQL) could look out for a ConstantExpression with type Expression instead of a UnaryExpression with the special Quote node type, and everything else would be the same.

You are correct. We could encode semantic information that means "induce closure semantics on this value" by using the type of the constant expression as a flag.

"Constant" would then have the meaning "use this constant value, unless the type happens to be an expression tree type and the value is a valid expression tree, in which case, instead use the value that is the expression tree resulting from rewriting the interior of the given expression tree to induce closure semantics in the context of any outer lambdas that we might be in right now.

But why would we do that crazy thing? The quote operator is an insanely complicated operator, and it should be used explicitly if you're going to use it. You're suggesting that in order to be parsimonious about not adding one extra factory method and node type amongst the several dozen already there, that we add a bizarre corner case to constants, so that constants are sometimes logically constants, and sometimes they are rewritten lambdas with closure semantics.

It would also have the somewhat odd effect that constant doesn't mean "use this value". Suppose for some bizarre reason you wanted the third case above to compile an expression tree into a delegate that hands out an expression tree that has a not-rewritten reference to an outer variable? Why? Perhaps because you are testing your compiler and want to just pass the constant on through so that you can perform some other analysis on it later. Your proposal would make that impossible; any constant that happens to be of expression tree type would be rewritten regardless. One has a reasonable expectation that "constant" means "use this value". "Constant" is a "do what I say" node. The constant processor's job is not to guess at what you meant to say based on the type.

And note of course that you are now putting the burden of understanding (that is, understanding that constant has complicated semantics that mean "constant" in one case and "induce closure semantics" based on a flag that is in the type system) upon every provider that does semantic analysis of an expression tree, not just upon Microsoft providers. How many of those third-party providers would get it wrong?

"Quote" is waving a big red flag that says "hey buddy, look over here, I'm a nested lambda expression and I have wacky semantics if I'm closed over an outer variable!" whereas "Constant" is saying "I'm nothing more than a value; use me as you see fit." When something is complicated and dangerous we want to be making it wave red flags, not hiding that fact by making the user dig through the type system in order to find out whether this value is a special one or not.

Furthermore, the idea that avoiding redundancy is even a goal is incorrect. Sure, avoiding unnecessary, confusing redundancy is a goal, but most redundancy is a good thing; redundancy creates clarity. New factory methods and node kinds are cheap. We can make as many as we need so that each one represents one operation cleanly. We have no need to resort to nasty tricks like "this means one thing unless this field is set to this thing, in which case it means something else."