Visitor Design Pattern: revisited for .NET

This is a review of the Visitor Design Pattern, in the light of .NET and C#, as well as how it can be used and expanded in this platform.
Classes and collections of classes are often used in OO programming. The Visitor pattern allows the developer to define new operations on the elements without changing their interface. We are analyzing the pattern as originally exposed in the GoF book [1].

GoF Review

Let's first have a look at the design pattern in general.

Motivation

Consider the following structure:

Diagram of sample classes for Customer/Orders we will use in the article. My site may be offline temporarily, so be patient. If only we had small uploads for these posts... ;)

This is a very usual object structure. Customers, Orders and Items represent collection objects, and Customer, Order and Item are the leaf components. Suppose we want to perform a common operation on each element, such as BuildReport. Of course each element would generate different information for its own report. We would have to add a BuildReport method to each element in the tree and then traverse the tree, calling that method on each element.
Wait, you might be thinking: why not to centralize the BuildReport method in another class, make it general enough as to work with any collection type, and just iterate the elements and build the report there? Well, that would require a lengthy method with a big if..else statements block to output different information for each element type:
public string BuildReport(ICollection collection)
{
   StringWriter report = new StringWriter();

   foreach (object item in collection)
   { 
      if (item is Customer)
      {
         Customer customer = item as Customer; 
         report.WriteLine(customer.FirstName + ", " + customer.LastName);
         report.WriteLine(BuildReport(customer.Orders));
      } 
      else if (item is Order)
      {
         Order order = item as Order; 
         report.WriteLine(order.Date + " - " + order.Amount);
         report.WriteLine(BuildReport(order.Items));
      } 
      else if (item is Item)
      {
         Item orderitem= item as Item; 
         report.WriteLine(orderitem.Code + " - " + orderitem.Quantity);
      }
   }

   report.ToString();
}

And we are building a very simple report with a quite simple hierarchy. If this were the system's main reporting class, we would have to be prepared to make an else if block for each type of object in the system. And what if the passed collection contains collections in turn? Things get very intricate really fast and we end up with spaghetti code.
Adding the BuildReport method to the element itself simplifies this task, and puts the logic in a place where it is easy to find and maintain. So we decide to add the method to every class in the hierarchy.
Now suppose that we want to add another method to them, such as DumpToFile, which would save the object's information to disk. We would have to add the new method to all the classes in the tree, again.
The conclusion: whenever we want to perform new operations on an existing hierarchy of objects, we need to change all the elements' classes, and recompile everything.

Solution

The solution is to separate the methods from the elements and add them to a separate interface, which is called the Visitor interface. This interface contains a method to process each type of element in the hierarchy, and is implemented by all the visitors, i.e. a BuildReportVisitor, a DumpToFileVisitor, etc.:
public interface IVisitor
{
   void VisitCustomer(Customer customer);
   void VisitOrder(Order order);
   void VisitItem(Item item);
}
Another implementation may use method overloading, calling all the methods Visit alone. The Visit method for the collection elements (Customers, Orders and Items) is a design decision, not a requirement.
In the Node hierarchy (that is, the original object structure), we replace all the methods with a single one, which is abstracted in an interface to be implemented in all the nodes:
public interface IVisitable
{
   void Accept(IVisitor visitor);
}
For a leaf component such as Customer, the method implementation would be:
public virtual void Accept(IVisitor visitor)
{
   visitor.VisitCustomer(this);
   _orders.Accept(visitor)
}
The Customer is calling Accept on the Orders object in turn.
For a composite element such as Customers):
public virtual void Accept(IVisitor visitor)
{
   visitor.VisitCustomers(this);
   foreach (IVisitable visitable in this._children)
      visitable.Accept(visitor);
}
The composite element traverses its own children and calls accept on them in turn. Note that each component is responsible of calling the right VisitXXX method. If we were using method overloading there would be no need for that.
Now the classes interaction is:

Diagram of new class interaction with the visitor pattern implemented. My site may be offline temporarily, so be patient. If only we had small uploads for these posts... ;)

Note that the visitor is responsible for accumulating results and returning them to the calling application. Now the code for building reports (in this example) is centralized in a single class. We can add behavior to classes in the hierarchy by creating additional visitor to pass to them.
This behavior is called a Double-Dispath pattern, because the method being executed at last depends both on the caller's type (the visitor) and the element's type (the visitable element). See the [3].

Consequences

  • Adding new operations is easy
  • Related operations are centralized in a single class
  • Adding a new element is difficult, as it requires changes in all the existing visitors
  • State can be accumulated by the visitor
  • Visitors must implement the complete interface even if they don't use the concrete element (i.e. if BuildReportVisitor doesn't build report information for Orders).

For more information, see the GoF book [1].

Our .NET Implementation

We based our implementation on Jeremy Blosser's proposal [2].
As we have seen, the visitor class can be implemented using two approaches:
  • Use one method for each product type:
    we use a method name according to the product type, for example, VisitCustomer, VisitOrder, VisitItem, etc.
    As a consequence, the IVisitor interface, to be implemented by all the visitors, needs to have one method definition for each product type. This makes it difficult to extend the product tree, as changes to it requires changes in the interface and all the visitor's classes.
  • Use operator overloading:
    we use only one method, and let the method overloading mechanism pick up the right method to use. We still have the problem that all the visitors must implement a method for each product type, to avoid runtime exceptions, even if they don't need the method at all.

Extending the pattern with Reflection

We will use the second approach (operator overloading) and simultaneously provide an implementation that avoids the limitations for descendent classes, that is, the requisite of implementing a method for each product type even when it is not used.
Basically, we are going to provide a single method in the IVisitor interface:
public interface IVisitor 
{ 
   void Visit(object visitable); 
}
This way we dramatically simplify the implementation requisites for visitors: they only have to provide one obligatory method implementation. Then they can implement the methods with the product they want to react to:
public virtual void Visit(Customer customer) 
{ 
   //Process the customer element. 
}
Omiting the other products has no side effect other than ignoring it.
However, as the product receives an object of type IVisitor, when it calls its Accept method passing itself as a parameter, he is actually calling the method that receives an object as parameter, not the typed class type.
The effect is that the method we have just seen, is never called, and the call is always dispatched to the IVisitor.Visit method implementation.
To avoid this, we will provide an abstract implementation of the IVisitor interface which will use reflection to overcome this limitation. Here is the complete class code, which we will analyze in detail:
public abstract class Visitor : IVisitor
{
   private MethodInfo _lastmethod = null;
   private object _lastvisitable = null;
   public void Visit(object visitable)
   {
      try
      {
         MethodInfo method = this.GetType().GetMethod("Visit",  
                        BindingFlags.ExactBinding | BindingFlags.Public | BindingFlags.Instance, 
                        Type.DefaultBinder, new Type[] { visitable.GetType() }, new ParameterModifier[0]);
         if (method != null)
            // Avoid StackOverflow exceptions by executing only if the method and visitable  
            // are different from the last parameters used.
            if (method != _lastmethod || visitable != _lastvisitable)
            {
               _lastmethod = method;
               _lastvisitable = visitable;
               method.Invoke(this, new object[] { visitable });
            }
      }
      catch (Exception ex)
      {
         if (ex.InnerException != null)
            throw ex.InnerException;
         throw ex;
      }
   }
}
The most important line here is the line where we retrieve the MethodInfo instance:
MethodInfo method = this.GetType().GetMethod("Visit",  
                        BindingFlags.ExactBinding | BindingFlags.Public | BindingFlags.Instance, 
                        Type.DefaultBinder, new Type[] { visitable.GetType() }, new ParameterModifier[0]);
First we call GetType in the current object, which returns the Type object corresponding to the visitor being used. The method GetMethod receives the following parameters:
  • name: this is the method name, "Visit".
  • bindingAttr: this is a flag of type BindingFlags which specifies options for the member search. Public and Instance flags must be used together, and ExactBinding is used to avoid reentrancy on this method. This may happen when an specific product is not implemented in the visitor, and thus we could get a match against the IVisitor.Visit implementation, which is the method we are executing. This could result in a stack overflow.
  • binder: this is the binder used to find the method. Passing null in this parameter has the same effect as passing Type.DefaultBinder, which is the default value.
  • types: an array of Type objects representing the number, order, and type of the parameters for the method to get.
  • parameters: this parameter adds information about the types parameter, but isn't used by the DefaultBinder, so we pass a zero-lenght array.
After the call, we have either a null object or the MethodInfo instance. The next lines deal with a special case: if the visitor calls base.Visit in a method which is not an override (of course this is a developer's mistake), the compiler will not complain, because the base class has a method which satisfies the call, the IVisitor.Visit implementation. But this results in the GetMethod call matching the same method we are currently executing, causing a stack overflow. As this is a very subtle mistake of the concrete visitor developer, we decided to prevent this from happening, even when the right solution is to remove the wrong call to the base class method.
This is achieved by saving a reference to the last method called and last visitor object used. If the current ones are the same, we just skip the method call. Beware that a change in any of the two means we have a new method call:
if (method != _lastmethod || visitable != _lastvisitable)
The method call is performed using the MethodInfo instance, passing the current object and the array of parameters (the visitable object):
method.Invoke(this, new object[] { visitable });
Finally, the catch block rethrows the exception, with one detail: when using reflection to invoke a type's member, an exception thrown by it will be wrapped inside a TargetInvocationException. The undelying exception thrown by the method is placed inside the InnerException property. We check for its presence and rethrow it if appropiate.

Samples

NMatrix XGoF (SourceForge site) uses this visitor implementation to build a tree based on an XSD schema. The new hierarchy, which resembles the original schema, is ready to be visited with multiple components which perform automatic code generation based on the element being visited. For example, there are ClassBuilder, CollectionBuilder, PropertyBuilder and MethodBuilder visitors. Each of them is responsible to generate code for an specific part of the output classes.

Bibliography:

[1] - Design Patterns, Elements of Reusable Object-Oriented Software. Amazon, Addison-Wesley
[2] - Reflect on the Visitor design pattern.
[3] - A Pattern Language to Visitors.

5 Comments

  • Just wanted to say thanks for posting this - I found it took me a while to get my head around the benefits of the reflection usage, but now I love it. :) Many thanks.

  • Nice take on the visitor pattern! Do you have any measurements that compares the classic pattern with the reflection-based solution?

  • The interesting thing is that the reflection-based implementation could easily be upgraded to use the .NET 2.0 lighweight reflection emit API (DynamicMethod) to avoid all reflection costs altogether. By emiting the IL that will call the precise method the first time, and caching that dynamic method, you can get performance equal to that of a multi-method visitor interface.

  • Can you modify the article to show how "lighweight reflection emit API (DynamicMethod)" would be done?

  • If you google for DynamicMethod, you will find lots of articles on how to create strong-typed methods on the fly.

Comments have been disabled for this content.