Visitor Design Pattern: revisited for .NET
This is a review of the Visitor Design Pattern, in the light of .NET and C#,
as well as how it can be used and expanded in this platform.
Classes and collections of classes are often used in OO programming. The
Visitor pattern allows the developer to define new operations on the elements
without changing their interface. We are analyzing the pattern as originally
exposed in the GoF book
[1].
GoF Review
Let's first have a look at the design pattern in general.
Motivation
Consider the following structure:
This is a very usual object structure.
Customers,
Orders
and
Items represent collection objects, and
Customer,
Order and
Item are the leaf components. Suppose we want
to perform a common operation on each element, such as
BuildReport.
Of course each element would generate different information for its own report.
We would have to add a
BuildReport method to each element in the
tree and then traverse the tree, calling that method on each element.
Wait, you might be thinking: why not to centralize the
BuildReport
method in another class, make it general enough as to work with any collection
type, and just iterate the elements and build the report there? Well, that
would require a lengthy method with a big
if..else statements
block to output different information for each element type:
public string BuildReport(ICollection collection)
{
StringWriter report = new StringWriter();
foreach (object item in collection)
{
if (item is Customer)
{
Customer customer = item as Customer;
report.WriteLine(customer.FirstName + ", " + customer.LastName);
report.WriteLine(BuildReport(customer.Orders));
}
else if (item is Order)
{
Order order = item as Order;
report.WriteLine(order.Date + " - " + order.Amount);
report.WriteLine(BuildReport(order.Items));
}
else if (item is Item)
{
Item orderitem= item as Item;
report.WriteLine(orderitem.Code + " - " + orderitem.Quantity);
}
}
report.ToString();
}
And we are building a very simple report with a quite simple hierarchy. If this
were the system's main reporting class, we would have to be prepared to make an
else if block for each type of object in the system. And what if the
passed collection contains collections in turn? Things get very intricate
really fast and we end up with spaghetti code.
Adding the
BuildReport method to the element itself simplifies
this task, and puts the logic in a place where it is easy to find and maintain.
So we decide to add the method to every class in the hierarchy.
Now suppose that we want to add another method to them, such as
DumpToFile,
which would save the object's information to disk. We would have to add the new
method to all the classes in the tree, again.
The conclusion: whenever we want to perform new operations on an existing
hierarchy of objects, we need to change all the elements' classes, and
recompile everything.
Solution
The solution is to separate the methods from the elements and add them to a
separate interface, which is called the
Visitor interface. This
interface contains a method to process each type of element in the hierarchy,
and is implemented by all the visitors, i.e. a
BuildReportVisitor,
a
DumpToFileVisitor, etc.:
public interface IVisitor
{
void VisitCustomer(Customer customer);
void VisitOrder(Order order);
void VisitItem(Item item);
}
Another implementation may use method overloading, calling all the methods
Visit
alone. The Visit method for the collection elements (
Customers,
Orders and
Items) is a design decision, not a
requirement.
In the Node hierarchy (that is, the original object structure), we replace all
the methods with a single one, which is abstracted in an interface to be
implemented in all the nodes:
public interface IVisitable
{
void Accept(IVisitor visitor);
}
For a leaf component such as
Customer, the method implementation
would be:
public virtual void Accept(IVisitor visitor)
{
visitor.VisitCustomer(this);
_orders.Accept(visitor)
}
The
Customer is calling
Accept on the
Orders
object in turn.
For a composite element such as
Customers):
public virtual void Accept(IVisitor visitor)
{
visitor.VisitCustomers(this);
foreach (IVisitable visitable in this._children)
visitable.Accept(visitor);
}
The composite element traverses its own children and calls accept on them in
turn. Note that each component is responsible of calling the right
VisitXXX
method. If we were using method overloading there would be no need for that.
Now the classes interaction is:
Note that the visitor is responsible for accumulating results and returning
them to the calling application. Now the code for building reports (in this
example) is centralized in a single class. We can add behavior to classes in
the hierarchy by creating additional visitor to pass to them.
This behavior is called a Double-Dispath pattern, because the method being
executed at last depends both on the caller's type (the visitor) and the
element's type (the visitable element). See the
[3].
Consequences
-
Adding new operations is easy
-
Related operations are centralized in a single class
-
Adding a new element is difficult, as it requires changes in all the existing
visitors
-
State can be accumulated by the visitor
-
Visitors must implement the complete interface even if they don't use the
concrete element (i.e. if BuildReportVisitor doesn't build report information
for Orders).
For more information, see the GoF book
[1].
Our .NET Implementation
We based our implementation on Jeremy Blosser's proposal
[2].
As we have seen, the visitor class can be implemented using two approaches:
-
Use one method for each product type:
we use a method name according to the product type, for example, VisitCustomer,
VisitOrder, VisitItem, etc.
As a consequence, the IVisitor interface, to be implemented by all
the visitors, needs to have one method definition for each product type. This
makes it difficult to extend the product tree, as changes to it requires
changes in the interface and all the visitor's classes.
-
Use operator overloading:
we use only one method, and let the method overloading mechanism pick up the
right method to use. We still have the problem that all the visitors must
implement a method for each product type, to avoid runtime exceptions, even if
they don't need the method at all.
Extending the pattern with Reflection
We will use the second approach (operator overloading) and simultaneously
provide an implementation that avoids the limitations for descendent classes,
that is, the requisite of implementing a method for each product type even when
it is not used.
Basically, we are going to provide a single method in the IVisitor interface:
public interface IVisitor
{
void Visit(object visitable);
}
This way we dramatically simplify the implementation requisites for visitors:
they only have to provide one obligatory method implementation. Then they can
implement the methods with the product they want to react to:
public virtual void Visit(Customer customer)
{
//Process the customer element.
}
Omiting the other products has no side effect other than ignoring it.
However, as the product receives an object of type
IVisitor, when
it calls its
Accept method passing itself as a parameter, he is
actually calling the method that receives an
object as parameter,
not the typed class type.
The effect is that the method we have just seen, is never called, and the call
is always dispatched to the
IVisitor.Visit method implementation.
To avoid this, we will provide an abstract implementation of the IVisitor
interface which will use reflection to overcome this limitation. Here is the
complete class code, which we will analyze in detail:
public abstract class Visitor : IVisitor
{
private MethodInfo _lastmethod = null;
private object _lastvisitable = null;
public void Visit(object visitable)
{
try
{
MethodInfo method = this.GetType().GetMethod("Visit",
BindingFlags.ExactBinding | BindingFlags.Public | BindingFlags.Instance,
Type.DefaultBinder, new Type[] { visitable.GetType() }, new ParameterModifier[0]);
if (method != null)
// Avoid StackOverflow exceptions by executing only if the method and visitable
// are different from the last parameters used.
if (method != _lastmethod || visitable != _lastvisitable)
{
_lastmethod = method;
_lastvisitable = visitable;
method.Invoke(this, new object[] { visitable });
}
}
catch (Exception ex)
{
if (ex.InnerException != null)
throw ex.InnerException;
throw ex;
}
}
}
The most important line here is the line where we retrieve the
MethodInfo
instance:
MethodInfo method = this.GetType().GetMethod("Visit",
BindingFlags.ExactBinding | BindingFlags.Public | BindingFlags.Instance,
Type.DefaultBinder, new Type[] { visitable.GetType() }, new ParameterModifier[0]);
First we call
GetType in the current object, which returns the
Type object corresponding to the visitor being used. The method
GetMethod
receives the following parameters:
-
name: this is the method name, "Visit".
-
bindingAttr: this is a flag of type BindingFlags which
specifies options for the member search. Public and Instance
flags must be used together, and ExactBinding is used to avoid
reentrancy on this method. This may happen when an specific product is not
implemented in the visitor, and thus we could get a match against the IVisitor.Visit
implementation, which is the method we are executing. This could result in a
stack overflow.
-
binder: this is the binder used to find the method. Passing null
in this parameter has the same effect as passing Type.DefaultBinder,
which is the default value.
-
types: an array of Type objects representing the
number, order, and type of the parameters for the method to get.
-
parameters: this parameter adds information about the types
parameter, but isn't used by the DefaultBinder, so we pass a
zero-lenght array.
After the call, we have either a null object or the
MethodInfo instance.
The next lines deal with a special case: if the visitor calls
base.Visit
in a method which is not an override (of course this is a developer's mistake),
the compiler will not complain, because the base class has a method which
satisfies the call, the
IVisitor.Visit implementation. But this
results in the
GetMethod call matching the same method we are
currently executing, causing a stack overflow. As this is a very subtle mistake
of the concrete visitor developer, we decided to prevent this from happening,
even when the right solution is to remove the wrong call to the base class
method.
This is achieved by saving a reference to the last method called and last
visitor object used. If the current ones are the same, we just skip the method
call. Beware that a change in any of the two means we have a new method call:
if (method != _lastmethod || visitable != _lastvisitable)
The method call is performed using the
MethodInfo instance,
passing the current object and the array of parameters (the visitable object):
method.Invoke(this, new object[] { visitable });
Finally, the catch block rethrows the exception, with one detail: when using
reflection to invoke a type's member, an exception thrown by it will be wrapped
inside a
TargetInvocationException. The undelying exception thrown
by the method is placed inside the
InnerException property. We
check for its presence and rethrow it if appropiate.
Samples
NMatrix XGoF (
SourceForge site) uses this visitor implementation to build a tree based on
an XSD schema. The new hierarchy, which resembles the original schema, is ready
to be visited with multiple components which perform automatic code generation
based on the element being visited. For example, there are ClassBuilder,
CollectionBuilder, PropertyBuilder and MethodBuilder visitors. Each of them is
responsible to generate code for an specific part of the output classes.
Bibliography:
[1] - Design Patterns, Elements of Reusable Object-Oriented
Software.
Amazon,
Addison-Wesley
[2] -
Reflect on the Visitor design pattern.
[3] - A Pattern Language to Visitors.