Saturday, 14 April 2012

Chain Of Responsibility: Why Can't Programmers Program?

According to CodingHorror he was battling to understand why:

Like me, the author is having trouble with the fact that 199 out of 200 applicants for every programming job can't write code at all. I repeat: they can't write any code whatsoever.

In order to make sure that the applicant could write code they would ask them to perform a simple task:

Write a program that prints the numbers from 1 to 100. But for multiples of three print "Fizz" instead of the number and for the multiples of five print "Buzz". For numbers which are multiples of both three and five print "FizzBuzz".

Most good programmers should be able to write out on paper a program which does this in a under a couple of minutes.

But Apparently:

Want to know something scary? The majority of comp sci graduates can't. I've also seen self-proclaimed senior programmers take more than 10-15 minutes to write a solution.

So apparently a number of companies use this question to quickly eliminate those who can't write any code. When I first heard of this I was surprised and thought what is the catch.

Would the person asking this be expecting some really fancy implementation?

But what the person is really looking for is to write it so that it just works exactly as requested

You can look at this page or search for examples which are available in just about every language out there, but here is the C# code:

public void Main()
{
    for (int
i = 1; i <= 100; i++)
    {
        FizzBuzz(i);
    }
}


public void FizzBuzz(int
value)
{
    if
(value % 3 == 0 && value % 5 == 0)
    {
        Console.WriteLine("FizzBuzz"
);
    }
    else if
(value % 3 == 0)
    {
       Console.WriteLine("Fizz"
);
    }
    else if
(value % 5 == 0)
    {
       Console.WriteLine("Buzz"
);
    }
    else
    {
       Console
.WriteLine(value);
    }
}

Now you can certainly try and get more fancy about this and some have certainly done so, but I thought it would be valuable to be able to write this and then say what fundamental principle are you violating and how to solve it.

The Chain Of Responsibility

The above example can be thought of in a real world application with various conditions for each of the inputs and providing a different implementation for each. And in this case this is a violation of a fundamental principle called the Open/closed principle

And in the above implementation we might want to add a check for the number divisible by 10 and print out Bazz. This certainly means our function is not Closed for modifications as we would have to add another code block and modify the if-else logic.

One solution to this problem would be to create one strategy per implementation and use the Chain-of-responsibility pattern

First create a printer object for any ancestor in the chain:

public abstract class Printer
{
    private Printer
_Next;
    public Printer SetNext(Printer
next)
    {
       this
._Next = next;
       return
next;
    }

    protected abstract bool HandlePrint(int
value);

    public virtual bool Print(int
value)
    {
        if (!this
.HandlePrint(value))
        {
           if (this._Next != null
)
           {
              return this
._Next.Print(value);
           }
           return false
;
        }
        return true
;
    }
}

The abstract method HandlePrint gets a value and it is up to the strategy to decide whether it supports the given value and handles the request. It can either stop processing or let the processing continue further down the chain.

And then we would have an implementation for each of the printing strategies:

public class FizzBuzzPrinter : Printer
{
    protected override bool HandlePrint(int
value)
    {
       if
(value % 3 == 0 && value % 5 == 0)
       {
           Console.WriteLine("FizzBuzz"
);
           return true
;
        }
        return false
;
    }
}


public class FizzPrinter : Printer
{
    protected override bool HandlePrint(int
value)
    {
        if
(value % 3 == 0)
        {
            Console.WriteLine("Fizz"
);
            return true
;
        }
        return false
;
    }
}


public class BuzzPrinter : Printer
{
    protected override bool HandlePrint(int
value)
    {
       if
(value % 5 == 0)
       {
           Console.WriteLine("Buzz"
);
           return true
;
       }
       return false
;
    }
}

Also we can now very easily add the ability to check for 10 and print Bazz by registering a new handler.

public class BazzPrinter : Printer
{
    protected override bool HandlePrint(int
value)
    {
        if
(value % 10 == 0)
        {
            Console.WriteLine("Bazz"
);
            return true
;
        }
        return false
;
     }
}

Now our implementation of our printing engine can be closed to modifications:

public void Main()
{
    Printer printer = new FizzBuzzPrinter
();
        printer
          .SetNext(
new BazzPrinter
())
          .SetNext(
new FizzPrinter
())
          .SetNext(
new BuzzPrinter
());

// Add bazz printer

    for (int
i = 1; i <= 100; i++)
    {
        Console.Write(string.Format("{0}:"
, i));

        if
(!printer.Print(i))
        {
            Console
.WriteLine();
        }
     }
}

image

So there you have it the solution can and is simple initially provided you know what the potential violation is and problems that can be caused down the line. And once you made more than 2-3 modifications to it its time to think about refactoring it.

Live writer was harmed 5 times during the making of this post.

C# Lambda Expressions and Closures

 

A lambda expression is an anonymous function that can contain expressions and statements, and can be used to create delegates or expression tree types.

So lambda's are the coolest things in the world right? I sure love them, however I have discovered one or two pitfalls when working with them that I wanted to share, when I was still fairly new to them I had occasionally discovered weird behaviour and at the time didn't know what was wrong. When I encountered them again I remembered and decided to take a closer look.

Behind the scenes

First let's just look at what a lambda actually looks like behind the scenes.

Consider this simple lambda function. I have a list of actions and for each action i specify a lambda function that prints the value in the enumeration.

var items = new[] { "Foo", "Bar" };
var actions = new List<Action>();
string[] array = items;
foreach (var item in
items)
{
actions.Add(() =>
Console
.WriteLine(item));
}


foreach (var action in
actions)
{
action();
}

This is really just syntactic sugar and not really the code that is generated by the compiler. Let's look at the code that is generated by the compiler for the lambda function:

using System;
using
System.Runtime.CompilerServices;
[System.Runtime.CompilerServices.CompilerGenerated]

private sealed class
<>c__DisplayClass11
{
public string
item;
public void
<ModifiedArrayClosureForeachTest>b__f()
{
System.
Console.WriteLine(this
.item);
}
}

As you can see it is just a class with a 1 or more fields representing the variables you capture and a method that is used at the pointer to the action.

The For-Each loop pitfall

I want to show you how using lambdas in For-Each loops or in fact anywhere where you use variables that are outside of the scope can cause problems, I won't call this a bug but rather a pitfall and the compiler will not give you a warning of this.

If I run the first code snippet I notice a weird result.

Because it is clear that we have an array with 2 items Foo and Bar. I was expecting to see:

image

But instead this is what I got

image

Lets look at the code that is generated for the first code snippet for the For-Each statement loop:

string[] items = new string[]
{
"Foo"
,
"Bar"
};
List<Action> actions = new List<Action
>();
LambdaTests.<>c__DisplayClass11 <>c__DisplayClass =
new LambdaTests.<>c__DisplayClass11();
string[] array = items;
for (int
i = 0; i < array.Length; i++)
{
<>c__DisplayClass.item = array[i];
actions.Add(
new Action
(<>c__DisplayClass.<ModifiedArrayClosureForeachTest>b__f));
}

foreach (Action action in
actions)
{
action();
}

Interestingly enough the C# compiler is being really clever and realizes that this is an array converted this to a for loop to increase performance.

However look more closely to the anonymous function creation. There is only one instance and it is created above and outside of the scope of the for loop and the same item variable is overwritten each time, resulting in only the last item in the enumeration being saved.

Lets do this same test on an actual list with an enumerator.

var items = new[] { "Foo", "Bar" }.ToList();

This time the C# compiler is using an enumerator. However again the creation of the anonymous class is created outside of the For-Each loop and the same value is overwritten each time.

using (List<string>.Enumerator enumerator = items.GetEnumerator())
{
LambdaTests.<>c__DisplayClass15 <>c__DisplayClass =
new
LambdaTests.<>c__DisplayClass15();
while
(enumerator.MoveNext())
{
<>c__DisplayClass.item = enumerator.Current;
actions.Add(
new Action
(<>c__DisplayClass.<ModifiedArrayClosureForeachListTest>b__13));
}
}

Preventing the For-Each pitfall

In both cases this can be solved by assigning a new variable inside the foreach scope.

foreach (var item in items)
{
var
newItemScope = item;
actions.Add(() =>
Console
.WriteLine(newItemScope));
}

image

This time the compiler generates the code that creates a new lambda class for each iteration.

for (int i = 0; i < array.Length; i++)
{
string
item = array[i];
LambdaTests.<>c__DisplayClass10 <>c__DisplayClass =
new
LambdaTests.<>c__DisplayClass10();
<>c__DisplayClass.newItemScope = item;
actions.Add(
new Action
(<>c__DisplayClass.<ModifiedArrayClosureForeachTest>b__f));
}

Closures

Closures are the anonymous classes created for the lambda functions, that contains the methods and variables being referenced.

In the next seemingly harmless code snippet I create 3 actions independent of each other, but notice how i make a modification to sC. Because these are completely different scopes i would expect sC the last action not to be affected by changing the variable sC in a different closure, but it does.

string sA = "A";
string sB = "B";
string sC = "C";
Action
a = () =>
{
Console
.WriteLine(sA);
sC =
"D"
;
};


Action
b = () =>
{
Console
.WriteLine(sB);
};


Action
c = () =>
{
Console
.WriteLine(sC);
};

a();
b();
c();

The output:

image

The first lambda is modifying sC which is reflected in lambda c.

How is C# doing this?

Well because all the lambdas are accessing the variables on the same scope there is actually just one closure class with a method for each lambda and they all share the same fields:

LambdaTests.<>c__DisplayClass23 <>c__DisplayClass = new LambdaTests.<>c__DisplayClass23();
<>c__DisplayClass.sA =
"A"
;
<>c__DisplayClass.sB =
"B"
;
<>c__DisplayClass.sC =
"C";

Action a = new Action(<>c__DisplayClass.<LambdaInstancingTestsSameScope>b__20);
Action b = new Action(<>c__DisplayClass.<LambdaInstancingTestsSameScope>b__21);
Action c = new Action
(<>c__DisplayClass.<LambdaInstancingTestsSameScope>b__22);
a();
b();
c();

Lets restructure the scope and variables

I will move action a into its own scope and declare a local variable localSc and set the value.

string sA = "A";
string sB = "B";
string sC = "C";

var actions = new List<Action
>();

{
string localSc = "C"
;

Action
a = () =>
{
Console
.WriteLine(localSc);
localSc =
"D"
;
sC =
"D"
;
};
actions.Add(a);
}



Action
b = () =>
{
Console
.WriteLine(sB);
};
actions.Add(b);



Action
c = () =>
{
Console.WriteLine("c: "
+ sC);
};
actions.Add(c);



foreach (var action in
actions)
{
action();
}

sC is still being modified

image

However if we look at the code generated by the compiler there is a brand new class for action a (DisplayClass27)

LambdaTests.<>c__DisplayClass25 <>c__DisplayClass = new LambdaTests.<>c__DisplayClass25();
<>c__DisplayClass.sB =
"B"
;
<>c__DisplayClass.sC =
"C";
List<Action> actions = new List<Action
>();
LambdaTests.<>c__DisplayClass27 <>c__DisplayClass2 =
new
LambdaTests.<>c__DisplayClass27();
<>c__DisplayClass2.CS$<>8__locals26 = <>c__DisplayClass;
<>c__DisplayClass2.localSc =
"C";
Action a = new Action
(<>c__DisplayClass2.<LambdaInstancingTestsDiffScope>b__22);
actions.Add(a);

Action b = new Action
(<>c__DisplayClass.<LambdaInstancingTestsDiffScope>b__23);
actions.Add(b);

Action c = new Action
(<>c__DisplayClass.<LambdaInstancingTestsDiffScope>b__24);
actions.Add(c);

foreach (Action action in
actions)
{
action();
}

Now because localSc, it is in a new scope and does not exist in the main method scope it gets a new class of its own containing the field localSc, but interestingly notice the variable 8__locals26 we need to look at the class generated by the compiler to see the definition of locals26:

[System.Runtime.CompilerServices.CompilerGenerated]
private sealed class
<>c__DisplayClass27
{
public
LambdaTests.<>c__DisplayClass25 CS$<>8__locals26;
public string
localSc;
public void
<LambdaInstancingTestsDiffScope>b__22()
{
System.Console.WriteLine(
this
.localSc);
this.localSc = "D"
;
this.CS$<>8__locals26.sC = "D"
;
}
}

See that it is a pointer to the other closure and that i am still able to modify its variables through this pointer.

LINQ

So at this point you might be thinking:

This is all very interesting but why do I care if I use LINQ only.

Because Linq uses lambda expressions too so the same thing could happen:

var items = new[] { "Foo", "Bar" };
var actions = new List<Action>();

var itemsQuery = items.AsQueryable();

foreach (var item in
items)
{
if (item == "Foo"
)
{
itemsQuery = itemsQuery.Where(queryItem => queryItem == item);
}
}


foreach (var item in
itemsQuery)
{
Console
.WriteLine(item);
}

The same issue can be seen.

image

So also in fact anywhere you access a variable outside of the local scope could result in a modified closure.

More on variables and scope

So we found that the variables between these lambdas are shared, but what about the method? The strings are obviously value types to modifying the different variables will not cause the same modification in the variables of the main method.

I wanted to know too so I modified one of the code samples slightly and the results are very interesting.

string sA = "A";
string sB = "B";
string sC = "C";
string sD = "D";
Action
a = () =>
{
Console
.WriteLine(sA);
Console
.WriteLine(sC);
sC =
"D"
;
};


Action
b = () =>
{
Console
.WriteLine(sB);
};


Action
c = () =>
{
Console
.WriteLine(sC);
};

sC =
"X"
;
a();
b();
c();


Console.WriteLine(sC);
Console
.WriteLine(sD);

Notice that I declare the lambdas, then I modify sC after this point, I then run

Action a, b, and then c, finally in the main method I print out sC and sD, and I got these results:

image

So we saw previously that a new class is created for the anonymous (lambda) function and all the fields are set from the local variables, so the fact that the lambdas share the same value is of no surprise.

But how it is that I can still read and write to the same variables as the lambda functions?

NOTE: sD has a value of "D" and is never used in one of the lambda functions.

A look at the compiler generated code reveals the answer:

LambdaTests.<>c__DisplayClass23 <>c__DisplayClass = new LambdaTests.<>c__DisplayClass23();
<>c__DisplayClass.sA =
"A"
;
<>c__DisplayClass.sB =
"B"
;
<>c__DisplayClass.sC =
"C";
string sD = "D";

Action a = new Action(<>c__DisplayClass.<LambdaInstancingTestsSameScope>b__20);
Action b = new Action(<>c__DisplayClass.<LambdaInstancingTestsSameScope>b__21);
Action c = new Action
(<>c__DisplayClass.<LambdaInstancingTestsSameScope>b__22);
<>c__DisplayClass.sC =
"X"
;
a();
b();
c();

Console.WriteLine(<>c__DisplayClass.sC);
Console.WriteLine(sD);

Notice that there are no longer variables sA, sB, and sC by themselves and they are now replaced by one local variable that is the class generated for the lambda function. So the value is shared through the anonymous class instance. Then also NOTE that sD is still kept as a normal variable because it is never used in one of the lambda functions.

This has been some of my findings with lambda functions and closures, got any interesting findings of your own?