Code Generation, SharedContracts and The Sneaky Bug

Tags: .NET, WCF, WinFX

A short discussion ensued today on the topic of Code Generation tools like CodeSmith.

Like Unit Testing, code generation is a topic that some people swear by and some reject out of hand. I'm sure this is mostly a question of getting used to the concept. Only after I had used unit tests extensively in a real project could I appreciate the real value of having them - not just occasionally running some tests, or writing some cases beforehand. I'm talking a full suite of automated tests that could be run nightly or as part of a continuous integration setup. But I digress.

Code Generation was of course very useful in .NET 1.1 for generating strongly-typed collection classes and the like, but that niche has been pretty much made obsolete by Generics in the 2.0 framework. Code generators are still very useful for producing boilerplate code and for translating metadata (XSD, WSDL and other contract-type information) into strongly typed classes.

WCF uses Code Generation to create a proxy wrapper from WSDL, producing a class with static code based on the contract information. This is called SharedContract mode. The alternative is SharedType mode, where the interface isn't defined as WSDL but as .NET metadata - i.e. an interface or a class - and the proxy is built at runtime from that shared type.
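To make the distinction concrete, here is a minimal SharedType sketch (the IOrderService contract and the endpoint address are invented examples, not from any real project). Both sides compile against the same interface assembly, and the client builds its proxy at runtime with ChannelFactory<T>; in SharedContract mode, a static proxy class would instead be generated from the WSDL by SVCUTIL.

    using System;
    using System.ServiceModel;

    [ServiceContract]
    public interface IOrderService
    {
        [OperationContract]
        string PlaceOrder(int productId);
    }

    public static class SharedTypeClient
    {
        public static void Main()
        {
            // Both client and server reference the assembly containing
            // IOrderService - no WSDL import, no generated proxy class.
            ChannelFactory<IOrderService> factory =
                new ChannelFactory<IOrderService>(
                    new BasicHttpBinding(),
                    new EndpointAddress("http://localhost:8000/orders"));

            IOrderService proxy = factory.CreateChannel();
            Console.WriteLine(proxy.PlaceOrder(42));
            factory.Close();
        }
    }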

In a previous article I expressed a preference for SharedType when we have a closed system where we control both client and server. Ralph Squillace claims that there should be no difference as far as the developer experience is concerned - whether the proxy is generated dynamically at runtime from a shared type, or statically at compile-time by a shared contract.

The reason I disagree with this statement is the same reason I am wary of Code Generation tools in general. It's not because I don't trust them - it's true that code generation bugs can introduce subtle errors into the system, but I assume that a serious Code Generation tool will receive proper attention and QA. I had already filed one bug report on SVCUTIL's proxy generation code and it was promptly fixed.

The reason isn't that I don't trust the tool or even that I don't trust the programmer using the tool, it's that I don't trust any process. The more steps I have, the more things can go wrong. When these steps are manual, even more so.

In a Shared Type scenario, changing the contract involves three steps:

1. Update the interface.
2. Update the service.
3. Update the client.

In a Shared Contract scenario, it's slightly different:

1. Update the interface.
2. Update the service.
3. Regenerate the proxy.
4. Update the client.

(Note that #1 and #2 might be the same step, if the WSDL is generated directly from the service).

I'm leery of step #3. Not because it's hard. Not because it's long or exhausting or particularly annoying to perform - it's not much more than a menu click in Visual Studio. I'm worried about it because it is a manual step, and all manual steps are bound to be forgotten occasionally. No matter how careful we are, sooner or later we'll find ourselves with a mismatched contract between client and server.

If the mismatch is big, it will be quickly noticed. If my client tries to call an operation that doesn't exist, I'll receive an error immediately. If I changed the types of my parameters, I'll get an exception on the server.

But what if my changes are more subtle? What if I added an OperationBehavior on one end that wasn't replicated on the other? What if I added a [KnownType] on one end and forgot to synchronize it on the other?
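To illustrate the second case, here's a sketch with invented types (Shipment, ExpressShipment and IShippingService are hypothetical). Nothing breaks at compile time; the failure only surfaces when the derived type actually travels over the wire to a side whose copy of the contract was never resynchronized:

    using System.Runtime.Serialization;
    using System.ServiceModel;

    [DataContract]
    [KnownType(typeof(ExpressShipment))] // present on the server's copy...
    public class Shipment
    {
        [DataMember] public int Id;
    }

    [DataContract]
    public class ExpressShipment : Shipment
    {
        [DataMember] public decimal Surcharge;
    }

    [ServiceContract]
    public interface IShippingService
    {
        // ...but if the client's copy of Shipment lacks the [KnownType]
        // attribute, everything still compiles and simple calls succeed.
        // The deserializer only throws once an ExpressShipment actually
        // comes back - much later than the change that caused it.
        [OperationContract]
        Shipment GetShipment(int id);
    }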

These errors are hard to catch, and they usually manifest long after they are introduced - precisely because the synchronization process is manual, and therefore liable to fail.

This is true for other Code Generation scenarios too. If my code generation template creates strongly typed classes based on my database schema, I need to make sure I rerun the generation after each change to the database. How many times have I changed a database table during development and started debugging, only to have my Typed DataSet code crash on load because of incompatible schemas - just because I forgot to rerun the generation tool? And what if the change is more subtle (say, a string length limit) and only becomes apparent much later, under a specific set of circumstances?
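One way to catch that kind of drift early - a sketch only; the CustomersDataSet typed DataSet, the table and the connection string are all assumptions - is a boundary test that compares the length the generated code was built with against what the live database currently enforces:

    using System.Data.SqlClient;
    using NUnit.Framework;

    [TestFixture]
    public class SchemaDriftTests
    {
        [Test]
        public void GeneratedDataSetMatchesDatabaseColumnLength()
        {
            // The length baked into the generated typed DataSet (hypothetical).
            int generated = new CustomersDataSet().Customers.NameColumn.MaxLength;

            // The length the live database actually enforces right now.
            using (SqlConnection conn = new SqlConnection(
                "Server=.;Database=Shop;Integrated Security=SSPI"))
            {
                conn.Open();
                SqlCommand cmd = new SqlCommand(
                    "SELECT CHARACTER_MAXIMUM_LENGTH FROM INFORMATION_SCHEMA.COLUMNS " +
                    "WHERE TABLE_NAME = 'Customers' AND COLUMN_NAME = 'Name'", conn);
                int actual = (int)cmd.ExecuteScalar();

                // Fails at test time instead of silently truncating data later.
                Assert.AreEqual(actual, generated);
            }
        }
    }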

 

I'm not saying that SharedContract is bad. It's a necessity, of course, with open systems and interoperability scenarios. I'm not saying these problems are inevitable when generating code. A bit of discipline and common sense will go a long way. I'm just saying that leaving these holes can come back and bite us. And if we can do without them, we should.

3 Comments

  • Stuart Ballard said

    I'm not remotely familiar (heh, pun intended only in retrospect) with WCF but it seems to me that code generation should as a general rule be a standard part of the compile process for any project that uses it. I've been doing database access through a home-grown code-generation-based tool for several years now, and we've never had any problems with people forgetting to regenerate the code, because regeneration happens every single time that a developer hits "build" or "rebuild" (although naturally file timestamps are used to avoid slowing the build with too much redundant work). In an ideal world - and I believe, though I'm not completely sure, that with msbuild this might even be possible - the CG would get redone in realtime fast enough to affect intellisense in the rest of the project. But as long as it happens by compiletime, your concern about forgetting to do it is moot.

  • AvnerK said

    Even if the codegen step is run as part of the build process, it's still a step. One of the reasons the C-style preprocessor was left out of C#, as I understand it, is that it introduced a disconnect between the code you see and the code that's running - added complexity and added bugs. A codegen step as part of the build process does the same. Again, this is more of a general feeling of dread than an actual condemnation.

  • Rune FS said

    I agree with you a long way down the road. Let's avoid manual steps whenever possible - but, and yes, I think there's a but. You state yourself that you use unit tests, and the scenarios you mention (string length, for one) are excellent examples of why to use unit tests and how to construct them. Let's say we have a max length of 50 characters and we later change it to 100. We have a class db that updates the value and returns the value now in the db (don't know why you would do that irl :-) ). Nice unit test code (C#/NUnit) would then be:

        string fifty = new string('a', 50);
        string fiftyone = new string('a', 51);
        string fiftyResult = db.Update(fifty);
        string fiftyoneResult = db.Update(fiftyone);
        Assert.AreEqual(fifty, fiftyResult);
        Assert.AreNotEqual(fiftyone, fiftyoneResult);

    Because we test boundary conditions, we can be sure that the string length example above will be caught, as will every other boundary violation - if we remember that 100% statement coverage doesn't mean 100% tested. My point is that with proper (unit) testing, code generation is safe. Without unit testing of the generated code I wouldn't use it. I'm too paranoid for that :-p
