Getting Func-y with Lambdas

Tuesday, February 24, 2009

Let's say we've got some information stored somewhere (database, XML, file – it doesn't matter) about individuals. For simplicity, let's look at a class that represents this data:

class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string EmailAddress { get; set; }
  
    public static Person FindByEmailAddress(string emailAddress)
    {
        // implementation omitted
    }
  
    public void Save()
    {
        // implementation omitted
    }
}

We've got some exported CSV data that needs to be merged into this set of data. However, the information from the CSV file has multiple email addresses per person. Here's the class that represents each row of CSV data:

class CSVData
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string HomeEmail { get; set; }
    public string WorkEmail { get; set; }
}

Some people in the CSV data will have only a home email, some will have only a work email, and some will have both. Using LINQ, I can easily get a couple of lists that contain the people with each kind of email address defined (note: the implementation of LoadCSVData is not important here):

IList<CSVData> csvData = LoadCSVData();
var peopleWithHomeEmail = from c in csvData where c.HomeEmail.Length > 0 select c;
var peopleWithWorkEmail = from c in csvData where c.WorkEmail.Length > 0 select c;

Now I want to loop through each list, see if the email address is already in our current data store and add ones that are not.

UpdateEmails(peopleWithHomeEmail);
UpdateEmails(peopleWithWorkEmail);

A simple implementation of UpdateEmails might look like this:

private void UpdateEmails(IEnumerable<CSVData> list)
{
    foreach (var dataItem in list)
    {
        Person person = Person.FindByEmailAddress(dataItem.HomeEmail);
        if (person == null)
        {
            person = new Person()
            {
                FirstName = dataItem.FirstName,
                LastName = dataItem.LastName,
                EmailAddress = dataItem.HomeEmail
            };
            person.Save();
        }
    }
}

The obvious problem with this is that it always accesses the HomeEmail address field from the CSVData. What's going to happen when I pass "peopleWithWorkEmail" to this method? Not good.

Delegates To The Rescue

This is the perfect place for a delegate. We'd like to have some function that accepts a CSVData object and returns a string – either home email or work email:

private delegate string GetEmailAddress(CSVData data);

Now we can re-write our UpdateEmails method to accept a delegate that will determine which email address we'll grab:

private void UpdateEmails(IEnumerable<CSVData> list, GetEmailAddress getEmail)
{
    foreach (var dataItem in list)
    {
        Person person = Person.FindByEmailAddress(getEmail(dataItem));
        if (person == null)
        {
            person = new Person()
            {
                FirstName = dataItem.FirstName,
                LastName = dataItem.LastName,
                EmailAddress = getEmail(dataItem)
            };
            person.Save();
        }
    }
}

Thanks to lambdas, we can make the calling code very clean:

UpdateEmails(peopleWithHomeEmail, p => p.HomeEmail);
UpdateEmails(peopleWithWorkEmail, p => p.WorkEmail);
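For comparison, here is a small sketch of the three ways C# lets you build a GetEmailAddress instance: a named method, a C# 2.0 anonymous method, and a C# 3.0 lambda. The DelegateForms class, the HomeEmailOf helper and the sample row are made up for illustration; only CSVData and the delegate come from the post.

```csharp
using System;

public class CSVData
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string HomeEmail { get; set; }
    public string WorkEmail { get; set; }
}

public static class DelegateForms
{
    // Same delegate as in the post (public here so the sketch stands alone).
    public delegate string GetEmailAddress(CSVData data);

    // Hypothetical named method whose signature matches the delegate.
    public static string HomeEmailOf(CSVData data)
    {
        return data.HomeEmail;
    }

    public static void Main()
    {
        // Illustrative sample row, not real data.
        var row = new CSVData
        {
            FirstName = "Pat",
            LastName = "Example",
            HomeEmail = "pat@home.example",
            WorkEmail = "pat@work.example"
        };

        // Method group conversion.
        GetEmailAddress viaNamedMethod = HomeEmailOf;

        // Anonymous method (C# 2.0 syntax).
        GetEmailAddress viaAnonymousMethod = delegate(CSVData d) { return d.WorkEmail; };

        // Lambda expression (C# 3.0 syntax), as used in the calling code above.
        GetEmailAddress viaLambda = d => d.WorkEmail;

        Console.WriteLine(viaNamedMethod(row));      // pat@home.example
        Console.WriteLine(viaAnonymousMethod(row));  // pat@work.example
        Console.WriteLine(viaLambda(row));           // pat@work.example
    }
}
```

All three produce the same kind of delegate instance; the lambda is simply the shortest way to write it.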

A very clean solution. Except…

Delegate Maintenance

The only issue with this is now we have a delegate sitting around just for this simple lambda expression. At some point in time, we may want to do some date calculations from data found inside CSVData. If we had multiple dates to pick from (like multiple emails in this situation), we may have to create another delegate that accepts a CSVData and returns a DateTime. What we need is a generic way of defining a method that accepts some data type(s) and returns a specific data type (note emphasis on generic!).
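To make that concrete, here is a rough sketch of what the "one delegate per return type" path looks like next to a single hand-rolled generic delegate. The HireDate property, the Selector name and the Apply helper are all hypothetical, invented only for this illustration.

```csharp
using System;

public class CSVData
{
    public string HomeEmail { get; set; }
    public DateTime HireDate { get; set; }   // hypothetical field, not in the original class
}

public static class GenericDelegateSketch
{
    // One delegate per return type gets tedious quickly:
    public delegate string GetEmailAddress(CSVData data);
    public delegate DateTime GetDate(CSVData data);

    // A single generic delegate (hypothetical name) covers both shapes.
    public delegate TResult Selector<T, TResult>(T item);

    // Helper that applies any selector to an item.
    public static TResult Apply<T, TResult>(T item, Selector<T, TResult> selector)
    {
        return selector(item);
    }

    public static void Main()
    {
        var row = new CSVData
        {
            HomeEmail = "pat@home.example",
            HireDate = new DateTime(2009, 2, 24)
        };

        // The compiler infers T and TResult from the argument and the lambda body.
        string email = Apply(row, d => d.HomeEmail);
        DateTime hired = Apply(row, d => d.HireDate);

        Console.WriteLine(email);       // pat@home.example
        Console.WriteLine(hired.Year);  // 2009
    }
}
```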

Since this is such a common scenario, Microsoft has pre-defined a bunch of generic delegates that do exactly what we need.

Getting Func-y

Here's what you can use from the System namespace (in .NET 3.5 these live in the System.Core assembly):

Func<TResult> – This delegate takes no parameters and simply returns an object of type TResult

Func<T, TResult> – Just like Func<TResult>, but this one accepts a single parameter (T). This is exactly the situation we have in our example.

Microsoft also defines three other Func<> delegates – one that accepts 2 parameters, one that accepts 3 and, finally, one that accepts 4. In .NET 3.5, anything more than four and you'd have to define your own delegate. (.NET Framework 4 later extended the family to 16 parameters.)
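As a quick illustration of those shapes, here is a sketch with one lambda per Func<> arity from zero to four parameters. The FuncShapes class and its field names are made up for the example; in every case the last type argument is the return type.

```csharp
using System;

public static class FuncShapes
{
    // No parameters, returns int.
    public static readonly Func<int> Answer = () => 42;

    // One parameter (string), returns int.
    public static readonly Func<string, int> LengthOf = s => s.Length;

    // Two, three and four int parameters, each returning int.
    public static readonly Func<int, int, int> Add2 = (a, b) => a + b;
    public static readonly Func<int, int, int, int> Add3 = (a, b, c) => a + b + c;
    public static readonly Func<int, int, int, int, int> Add4 = (a, b, c, d) => a + b + c + d;

    public static void Main()
    {
        Console.WriteLine(Answer());           // 42
        Console.WriteLine(LengthOf("hello"));  // 5
        Console.WriteLine(Add2(1, 2));         // 3
        Console.WriteLine(Add3(1, 2, 3));      // 6
        Console.WriteLine(Add4(1, 2, 3, 4));   // 10
    }
}
```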

We can now get rid of our GetEmailAddress delegate and replace it with a Func<CSVData, string> (which is the exact same signature – a method that accepts a CSVData and returns a string):

private void UpdateEmails(IEnumerable<CSVData> list, Func<CSVData, string> getEmail)

Our calling code doesn't need to change at all. We're still using the same signature, so the C# compiler can infer the delegate usage for us and generate the anonymous method. Yeah!
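Putting the pieces together, here is a runnable sketch of the Func-based version. The post omits the real Person implementation, so the in-memory Dictionary store, the Count property, the Importer class and the sample rows are stand-ins invented for this example. The sketch also calls getEmail once per row and reuses the result, a small liberty with the original loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CSVData
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string HomeEmail { get; set; }
    public string WorkEmail { get; set; }
}

public class Person
{
    // Hypothetical in-memory stand-in for the real data store.
    private static readonly Dictionary<string, Person> Store = new Dictionary<string, Person>();

    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string EmailAddress { get; set; }

    public static int Count { get { return Store.Count; } }

    public static Person FindByEmailAddress(string emailAddress)
    {
        Person found;
        return Store.TryGetValue(emailAddress, out found) ? found : null;
    }

    public void Save()
    {
        Store[EmailAddress] = this;
    }
}

public static class Importer
{
    public static void UpdateEmails(IEnumerable<CSVData> list, Func<CSVData, string> getEmail)
    {
        foreach (var dataItem in list)
        {
            string email = getEmail(dataItem);   // evaluate the selector once per row
            if (Person.FindByEmailAddress(email) == null)
            {
                var person = new Person
                {
                    FirstName = dataItem.FirstName,
                    LastName = dataItem.LastName,
                    EmailAddress = email
                };
                person.Save();
            }
        }
    }

    public static void Main()
    {
        // Illustrative rows standing in for LoadCSVData().
        var csvData = new List<CSVData>
        {
            new CSVData { FirstName = "Pat", LastName = "Example", HomeEmail = "pat@home.example", WorkEmail = "" },
            new CSVData { FirstName = "Sam", LastName = "Sample", HomeEmail = "sam@home.example", WorkEmail = "sam@work.example" }
        };

        var peopleWithHomeEmail = from c in csvData where c.HomeEmail.Length > 0 select c;
        var peopleWithWorkEmail = from c in csvData where c.WorkEmail.Length > 0 select c;

        UpdateEmails(peopleWithHomeEmail, p => p.HomeEmail);
        UpdateEmails(peopleWithWorkEmail, p => p.WorkEmail);

        Console.WriteLine(Person.Count);   // 3
    }
}
```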

What about a Union?

Why not use a Union along with an anonymous type? This is definitely an idea to consider. We could do this:

var home = from c in csvData where c.HomeEmail.Length > 0 select new { FirstName = c.FirstName, LastName = c.LastName, Email = c.HomeEmail };
var work = from c in csvData where c.WorkEmail.Length > 0 select new { FirstName = c.FirstName, LastName = c.LastName, Email = c.WorkEmail };
var completeList = home.Union(work);

Now we have a single list which contains everything we need. The anonymous type contains a single "Email" field which will contain either the HomeEmail or the WorkEmail. The only issue with this is that an anonymous type has no name you can write in a method signature, so you can't declare a parameter of that type – and I wanted the email update loop to be its own method.
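One partial workaround, offered as a sketch rather than what the post ended up doing: a generic method lets the compiler infer the anonymous type, so the caller never has to name it. The catch is that the method can only reach the data through delegates, which brings Func<> right back into the picture. CollectEmails, UnionSketch and the sample rows are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CSVData
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string HomeEmail { get; set; }
    public string WorkEmail { get; set; }
}

public static class UnionSketch
{
    // T may be an anonymous type: it is inferred at the call site and never written out.
    public static List<string> CollectEmails<T>(IEnumerable<T> rows, Func<T, string> getEmail)
    {
        var emails = new List<string>();
        foreach (var row in rows)
        {
            emails.Add(getEmail(row));
        }
        return emails;
    }

    public static List<string> Run()
    {
        // Illustrative rows standing in for LoadCSVData().
        var csvData = new List<CSVData>
        {
            new CSVData { FirstName = "Pat", LastName = "Example", HomeEmail = "pat@home.example", WorkEmail = "" },
            new CSVData { FirstName = "Sam", LastName = "Sample", HomeEmail = "sam@home.example", WorkEmail = "sam@work.example" }
        };

        // Both queries use the same property names, types and order,
        // so they share one anonymous type and Union compiles.
        var home = from c in csvData where c.HomeEmail.Length > 0
                   select new { FirstName = c.FirstName, LastName = c.LastName, Email = c.HomeEmail };
        var work = from c in csvData where c.WorkEmail.Length > 0
                   select new { FirstName = c.FirstName, LastName = c.LastName, Email = c.WorkEmail };
        var completeList = home.Union(work);

        return CollectEmails(completeList, row => row.Email);
    }

    public static void Main()
    {
        foreach (string email in Run())
        {
            Console.WriteLine(email);
        }
        // pat@home.example
        // sam@home.example
        // sam@work.example
    }
}
```

Note that Union also drops duplicate rows, since anonymous types compare by property values.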

And yes, I could have created an actual type instead of an anonymous type – but then I would have an extra type lying around just to aggregate the CSVData. Kind of sounds like the reason I got rid of the delegate… :)


This is an excellent use of the functional paradigm! Great job - I love seeing the .NET community moving towards functional programming and this is a great example of why we should. Functional programming lets us talk about *what* we want the computer to do instead of *how* it should do it.

You may want to watch your .Length method calls though, they will throw when the string is null. Of course the C# best practice is the hideous String.IsNullOrEmpty() method but that can always be hidden behind an extension method (which will work even if the string is null).

fdumlao - Wednesday, February 25, 2009 2:23:52 AM

fdumlao -- agreed (about the String.Length), but in this particular instance (the "real world scenario" that brought about this blog post) my data was always going to exist. Even someone without an email would have an empty string.

And thanks! I'm glad you like the post.

PSteele - Monday, March 2, 2009 5:07:41 AM
