Saturday 14 April 2012

C# Lambda Expressions and Closures

 

A lambda expression is an anonymous function that can contain expressions and statements, and can be used to create delegates or expression tree types.

So lambda's are the coolest things in the world right? I sure love them, however I have discovered one or two pitfalls when working with them that I wanted to share, when I was still fairly new to them I had occasionally discovered weird behaviour and at the time didn't know what was wrong. When I encountered them again I remembered and decided to take a closer look.

Behind the scenes

First let's just look at what a lambda actually looks like behind the scenes.

Consider this simple lambda function. I have a list of actions and for each action i specify a lambda function that prints the value in the enumeration.

var items = new[] { "Foo", "Bar" };
var actions = new List<Action>();
string[] array = items;
foreach (var item in
items)
{
actions.Add(() =>
Console
.WriteLine(item));
}


foreach (var action in
actions)
{
action();
}

This is really just syntactic sugar and not really the code that is generated by the compiler. Let's look at the code that is generated by the compiler for the lambda function:

using System;
using
System.Runtime.CompilerServices;
[System.Runtime.CompilerServices.CompilerGenerated]

private sealed class
<>c__DisplayClass11
{
public string
item;
public void
<ModifiedArrayClosureForeachTest>b__f()
{
System.
Console.WriteLine(this
.item);
}
}

As you can see it is just a class with a 1 or more fields representing the variables you capture and a method that is used at the pointer to the action.

The For-Each loop pitfall

I want to show you how using lambdas in For-Each loops or in fact anywhere where you use variables that are outside of the scope can cause problems, I won't call this a bug but rather a pitfall and the compiler will not give you a warning of this.

If I run the first code snippet I notice a weird result.

Because it is clear that we have an array with 2 items Foo and Bar. I was expecting to see:

image

But instead this is what I got

image

Lets look at the code that is generated for the first code snippet for the For-Each statement loop:

string[] items = new string[]
{
"Foo"
,
"Bar"
};
List<Action> actions = new List<Action
>();
LambdaTests.<>c__DisplayClass11 <>c__DisplayClass =
new LambdaTests.<>c__DisplayClass11();
string[] array = items;
for (int
i = 0; i < array.Length; i++)
{
<>c__DisplayClass.item = array[i];
actions.Add(
new Action
(<>c__DisplayClass.<ModifiedArrayClosureForeachTest>b__f));
}

foreach (Action action in
actions)
{
action();
}

Interestingly enough the C# compiler is being really clever and realizes that this is an array converted this to a for loop to increase performance.

However look more closely to the anonymous function creation. There is only one instance and it is created above and outside of the scope of the for loop and the same item variable is overwritten each time, resulting in only the last item in the enumeration being saved.

Lets do this same test on an actual list with an enumerator.

var items = new[] { "Foo", "Bar" }.ToList();

This time the C# compiler is using an enumerator. However again the creation of the anonymous class is created outside of the For-Each loop and the same value is overwritten each time.

using (List<string>.Enumerator enumerator = items.GetEnumerator())
{
LambdaTests.<>c__DisplayClass15 <>c__DisplayClass =
new
LambdaTests.<>c__DisplayClass15();
while
(enumerator.MoveNext())
{
<>c__DisplayClass.item = enumerator.Current;
actions.Add(
new Action
(<>c__DisplayClass.<ModifiedArrayClosureForeachListTest>b__13));
}
}

Preventing the For-Each pitfall

In both cases this can be solved by assigning a new variable inside the foreach scope.

foreach (var item in items)
{
var
newItemScope = item;
actions.Add(() =>
Console
.WriteLine(newItemScope));
}

image

This time the compiler generates the code that creates a new lambda class for each iteration.

for (int i = 0; i < array.Length; i++)
{
string
item = array[i];
LambdaTests.<>c__DisplayClass10 <>c__DisplayClass =
new
LambdaTests.<>c__DisplayClass10();
<>c__DisplayClass.newItemScope = item;
actions.Add(
new Action
(<>c__DisplayClass.<ModifiedArrayClosureForeachTest>b__f));
}

Closures

Closures are the anonymous classes created for the lambda functions, that contains the methods and variables being referenced.

In the next seemingly harmless code snippet I create 3 actions independent of each other, but notice how i make a modification to sC. Because these are completely different scopes i would expect sC the last action not to be affected by changing the variable sC in a different closure, but it does.

string sA = "A";
string sB = "B";
string sC = "C";
Action
a = () =>
{
Console
.WriteLine(sA);
sC =
"D"
;
};


Action
b = () =>
{
Console
.WriteLine(sB);
};


Action
c = () =>
{
Console
.WriteLine(sC);
};

a();
b();
c();

The output:

image

The first lambda is modifying sC which is reflected in lambda c.

How is C# doing this?

Well because all the lambdas are accessing the variables on the same scope there is actually just one closure class with a method for each lambda and they all share the same fields:

LambdaTests.<>c__DisplayClass23 <>c__DisplayClass = new LambdaTests.<>c__DisplayClass23();
<>c__DisplayClass.sA =
"A"
;
<>c__DisplayClass.sB =
"B"
;
<>c__DisplayClass.sC =
"C";

Action a = new Action(<>c__DisplayClass.<LambdaInstancingTestsSameScope>b__20);
Action b = new Action(<>c__DisplayClass.<LambdaInstancingTestsSameScope>b__21);
Action c = new Action
(<>c__DisplayClass.<LambdaInstancingTestsSameScope>b__22);
a();
b();
c();

Lets restructure the scope and variables

I will move action a into its own scope and declare a local variable localSc and set the value.

string sA = "A";
string sB = "B";
string sC = "C";

var actions = new List<Action
>();

{
string localSc = "C"
;

Action
a = () =>
{
Console
.WriteLine(localSc);
localSc =
"D"
;
sC =
"D"
;
};
actions.Add(a);
}



Action
b = () =>
{
Console
.WriteLine(sB);
};
actions.Add(b);



Action
c = () =>
{
Console.WriteLine("c: "
+ sC);
};
actions.Add(c);



foreach (var action in
actions)
{
action();
}

sC is still being modified

image

However if we look at the code generated by the compiler there is a brand new class for action a (DisplayClass27)

LambdaTests.<>c__DisplayClass25 <>c__DisplayClass = new LambdaTests.<>c__DisplayClass25();
<>c__DisplayClass.sB =
"B"
;
<>c__DisplayClass.sC =
"C";
List<Action> actions = new List<Action
>();
LambdaTests.<>c__DisplayClass27 <>c__DisplayClass2 =
new
LambdaTests.<>c__DisplayClass27();
<>c__DisplayClass2.CS$<>8__locals26 = <>c__DisplayClass;
<>c__DisplayClass2.localSc =
"C";
Action a = new Action
(<>c__DisplayClass2.<LambdaInstancingTestsDiffScope>b__22);
actions.Add(a);

Action b = new Action
(<>c__DisplayClass.<LambdaInstancingTestsDiffScope>b__23);
actions.Add(b);

Action c = new Action
(<>c__DisplayClass.<LambdaInstancingTestsDiffScope>b__24);
actions.Add(c);

foreach (Action action in
actions)
{
action();
}

Now because localSc, it is in a new scope and does not exist in the main method scope it gets a new class of its own containing the field localSc, but interestingly notice the variable 8__locals26 we need to look at the class generated by the compiler to see the definition of locals26:

[System.Runtime.CompilerServices.CompilerGenerated]
private sealed class
<>c__DisplayClass27
{
public
LambdaTests.<>c__DisplayClass25 CS$<>8__locals26;
public string
localSc;
public void
<LambdaInstancingTestsDiffScope>b__22()
{
System.Console.WriteLine(
this
.localSc);
this.localSc = "D"
;
this.CS$<>8__locals26.sC = "D"
;
}
}

See that it is a pointer to the other closure and that i am still able to modify its variables through this pointer.

LINQ

So at this point you might be thinking:

This is all very interesting but why do I care if I use LINQ only.

Because Linq uses lambda expressions too so the same thing could happen:

var items = new[] { "Foo", "Bar" };
var actions = new List<Action>();

var itemsQuery = items.AsQueryable();

foreach (var item in
items)
{
if (item == "Foo"
)
{
itemsQuery = itemsQuery.Where(queryItem => queryItem == item);
}
}


foreach (var item in
itemsQuery)
{
Console
.WriteLine(item);
}

The same issue can be seen.

image

So also in fact anywhere you access a variable outside of the local scope could result in a modified closure.

More on variables and scope

So we found that the variables between these lambdas are shared, but what about the method? The strings are obviously value types to modifying the different variables will not cause the same modification in the variables of the main method.

I wanted to know too so I modified one of the code samples slightly and the results are very interesting.

string sA = "A";
string sB = "B";
string sC = "C";
string sD = "D";
Action
a = () =>
{
Console
.WriteLine(sA);
Console
.WriteLine(sC);
sC =
"D"
;
};


Action
b = () =>
{
Console
.WriteLine(sB);
};


Action
c = () =>
{
Console
.WriteLine(sC);
};

sC =
"X"
;
a();
b();
c();


Console.WriteLine(sC);
Console
.WriteLine(sD);

Notice that I declare the lambdas, then I modify sC after this point, I then run

Action a, b, and then c, finally in the main method I print out sC and sD, and I got these results:

image

So we saw previously that a new class is created for the anonymous (lambda) function and all the fields are set from the local variables, so the fact that the lambdas share the same value is of no surprise.

But how it is that I can still read and write to the same variables as the lambda functions?

NOTE: sD has a value of "D" and is never used in one of the lambda functions.

A look at the compiler generated code reveals the answer:

LambdaTests.<>c__DisplayClass23 <>c__DisplayClass = new LambdaTests.<>c__DisplayClass23();
<>c__DisplayClass.sA =
"A"
;
<>c__DisplayClass.sB =
"B"
;
<>c__DisplayClass.sC =
"C";
string sD = "D";

Action a = new Action(<>c__DisplayClass.<LambdaInstancingTestsSameScope>b__20);
Action b = new Action(<>c__DisplayClass.<LambdaInstancingTestsSameScope>b__21);
Action c = new Action
(<>c__DisplayClass.<LambdaInstancingTestsSameScope>b__22);
<>c__DisplayClass.sC =
"X"
;
a();
b();
c();

Console.WriteLine(<>c__DisplayClass.sC);
Console.WriteLine(sD);

Notice that there are no longer variables sA, sB, and sC by themselves and they are now replaced by one local variable that is the class generated for the lambda function. So the value is shared through the anonymous class instance. Then also NOTE that sD is still kept as a normal variable because it is never used in one of the lambda functions.

This has been some of my findings with lambda functions and closures, got any interesting findings of your own?

1 comment:

  1. Unfortunately I couldn't follow as there are broken images in this page.

    ReplyDelete