Miscellaneous Debris

Avner Kashtan's Frustrations and Exultations

August 2006 - Posts

Code Generation, SharedContracts and The Sneaky Bug

A short discussion ensued today on the topic of Code Generation tools like CodeSmith.

Like Unit Testing, code generation is a topic that some people swear by and some reject out of hand. I'm sure this is mostly a question of getting used to the concept. Only after I had used unit tests extensively in a real project could I appreciate the real value of having them - not just occasionally running some tests, or writing some cases beforehand. I'm talking a full suite of automated tests that could be run nightly or as part of a continuous integration setup. But I digress.

Code Generation was of course very useful in .NET 1.1 for generating strongly-typed collection classes and the like, but that use has been made largely obsolete by Generics in the 2.0 framework. Code generators are still very useful for generating boilerplate code and translating metadata information (XSD, WSDL and other contract-type information) into strongly typed classes.

WCF uses Code Generation to create a proxy wrapper from WSDL, creating a class with static code based on the contract information. This is called SharedContract mode. The alternative is SharedType mode, where the interface isn't defined as WSDL but as .NET Metadata - i.e. an interface or a class - and the proxy is based on that interface.

In a previous article I expressed a preference for SharedType when we have a closed system where we control both client and server. Ralph Squillace claims that there should be no difference as far as the developer experience is concerned - whether the proxy is generated dynamically at runtime from a shared type, or statically at compile-time by a shared contract.

The reason I disagree with this statement is the same reason I am wary of Code Generation tools in general. It's not because I don't trust them - it's true that code generation bugs can introduce subtle errors into the system, but I assume that a serious Code Generation tool will receive proper attention and QA. I had already filed one bug report on SVCUTIL's proxy generation code and it was promptly fixed.

The reason isn't that I don't trust the tool or even that I don't trust the programmer using the tool, it's that I don't trust any process. The more steps I have, the more things can go wrong. When these steps are manual, even more so.

In a Shared Type scenario, changing the contract involves three steps:

1. Update the interface.
2. Update the Service.
3. Update the client.

In a Shared Contract scenario, it's slightly different:

1. Update the interface.
2. Update service.
3. Regenerate the proxy.
4. Update the client.

(Note that #1 and #2 might be the same step, if the WSDL is generated directly from the service).

I'm leery of step #3. Not because it's hard. Not because it's long or exhausting or particularly annoying to perform - it's not much more than a menu click in Visual Studio. I'm worried about it because it is a manual step, and all manual steps are bound to be forgotten occasionally. No matter how much we worry, we are that much more likely to find ourselves with a mismatched contract between client and server.

If the mismatch is big, it will be quickly noticed. If my client tries to call an operation that doesn't exist, I'll receive an error immediately. If I changed the types of my parameters, I'll get an exception on the server.

But what if my changes are more subtle? What if I added an OperationBehavior on one end that wasn't replicated on the other? What if I added a [KnownType] on one end and forgot to synchronize it on the other?

These are errors that are hard to catch, and they usually manifest much later than they are introduced. They happen because my synchronization process is manual and therefore more likely to fail.

This is true for other Code Generation scenarios too. If my code generation template creates strongly typed classes based on my database schema, I need to make sure I rerun the generation after each change to my database. How many times have I changed a database table during development and started debugging only to have my Typed Dataset code crash on load because of incompatible schemas, just because I forgot to rerun the generation tool? What if the changes were more subtle (like changing a string length limit) and would only be apparent at some later time in a specific set of circumstances?

 

I'm not saying that SharedContract is bad. It's a necessity, of course, with open systems and interoperability scenarios. I'm not saying these problems are inevitable when generating code. A bit of discipline and common sense will go a long way. I'm just saying that leaving these holes can come back and bite us. And if we can do without them, we should.

Posted Wednesday, August 16, 2006 6:28 PM by AvnerK | 3 comment(s)


Creating CustomBindings programmatically

I had another revelation earlier, when going over Nicholas Allen's explanation of the GetProperty<T> method and BindingContexts:

The Binding objects supplied by the WCF framework are only a wrapper. They have no logic in them. They mean nothing. They do nothing except allow access to the internal properties of their BindingElements. The BindingElements are the ones who do the real work.

There. It's a bit harsh, but it had to be said. Took me a while to wrap my head around it. It's syntactic sugar, since it's easier to do this:

NetTcpBinding binding = new NetTcpBinding();
binding.ReaderQuotas.MaxArrayLength = 512000;

than:

CustomBinding binding = new CustomBinding(new TransactionFlowBindingElement(), new BinaryMessageEncodingBindingElement(), new WindowsStreamSecurityBindingElement(), new TcpTransportBindingElement());
new BindingContext(binding, new BindingParameterCollection()).GetInnerProperty<XmlDictionaryReaderQuotas>().MaxArrayLength = 512000;

The two, however, are equivalent. You can think of the NetTcpBinding class as a pre-selected 'kit' with several predefined selections, whereas the CustomBinding is a do-it-yourself set. The basic building blocks, though, are the same.
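One way to see the 'kit' for yourself is to ask a standard binding for its building blocks. The following is only a rough sketch, assuming a .NET Framework project that references System.ServiceModel; the class and method names (BindingKitDemo, DescribeElements) are made up for illustration, and the exact list printed depends on the binding's settings.

```csharp
using System;
using System.Collections.Generic;
using System.ServiceModel;
using System.ServiceModel.Channels;

static class BindingKitDemo
{
    // Hypothetical helper: returns the type names of the binding elements
    // that a binding is assembled from, in stack order.
    public static List<string> DescribeElements(Binding binding)
    {
        List<string> names = new List<string>();
        foreach (BindingElement element in binding.CreateBindingElements())
        {
            names.Add(element.GetType().Name);
        }
        return names;
    }

    static void Main()
    {
        // The pre-selected kit.
        Binding kit = new NetTcpBinding();
        foreach (string name in DescribeElements(kit))
        {
            Console.WriteLine(name);
        }

        // The do-it-yourself set, seeded from the kit's own elements.
        CustomBinding diy = new CustomBinding(kit);
        Console.WriteLine("CustomBinding element count: " + diy.Elements.Count);
    }
}
```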

The problem with this is that if I want to deviate from the given settings, I have to start from scratch. Let's say I have the NetTcpBinding, which by default uses the BinaryMessageEncodingBindingElement. If I want to replace that with a TextMessageEncodingBindingElement for some reason, or maybe the CompressionMessageEncodingBindingElement that is a part of the SDK samples for WCF, I can't do that with the NetTcpBinding. I have to create my own custom binding based on the NetTcpBinding and modify it there.

Luckily, it's not that hard, and simpler than the ugly bit of code I had earlier. (Note: the CompressionMessageEncodingBindingElement encapsulates the encoder that actually encodes the message.)

// Create a custom binding based on NetTcp
CustomBinding compressingTcpBinding = new CustomBinding(new NetTcpBinding());

// Find the current MessageEncoding binding element and its position in the BindingElement stack.
BinaryMessageEncodingBindingElement currentEncoder = compressingTcpBinding.Elements.Find<BinaryMessageEncodingBindingElement>();
int encoderIndex = compressingTcpBinding.Elements.IndexOf(currentEncoder);

// Create the new Encoder, wrapping the current one
CompressionMessageEncodingBindingElement compressionEncoder = new CompressionMessageEncodingBindingElement(currentEncoder);

// Put it in the stack in place of the current encoder.
// (SetItem is protected on the collection, so we go through the indexer.)
compressingTcpBinding.Elements[encoderIndex] = compressionEncoder;

There - a perfect little NetTcpBinding clone with the Encoder neatly replaced, and all done by code, so we have a better idea of what actually happens there.

Nicholas Allen has promised an upcoming article about the binding element stack - waiting expectantly.

Posted Friday, August 11, 2006 12:49 AM by AvnerK | with no comments


Aggregated Interface Implementation

I've been struggling with the aggregation used in WCF's Binding object model, as implemented in the GetProperty<T> method (see relevant discussions with Nicholas Allen from the WCF team here and here), and I'm struck by how the need for a flexible, late-bound way of extracting information from a composite object forces us to write ugly code.

Let's say I have an object myObject. myObject doesn't implement any interfaces, but it contains a list called elements that contains various other objects, each of which might implement those interfaces. Now let's say I want to get a handle to one of those interfaces - to make things more concrete, let's say we want the ISecurityProvider interface. Using the current WCF model, we would do something like this:

ISecurityProvider sec =  myObject.GetProperty<ISecurityProvider>();

Which in turn would do something like this:

T GetProperty<T>() where T : class
{
   T prop = null;

   foreach (object o in elements)
   {
      if (o is T)
      {
         prop = (T)o;
         break;
      }
   }

   return prop;
}

(Remember, this is a simplified version. The actual WCF implementation is more complicated).
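To make the simplified version above something you can actually run, here is a small self-contained sketch. Everything in it is hypothetical and has nothing to do with the real WCF types: Aggregator stands in for myObject's class, and TransportSecurityElement is just an invented element that happens to implement ISecurityProvider.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical capability interface, mirroring the ISecurityProvider example.
interface ISecurityProvider
{
    string Describe();
}

// Hypothetical element that implements the capability.
class TransportSecurityElement : ISecurityProvider
{
    public string Describe() { return "transport security"; }
}

// The composite: implements no interfaces itself, only aggregates elements.
class Aggregator
{
    private readonly List<object> elements = new List<object>();

    public void Add(object element) { elements.Add(element); }

    // Returns the first element that implements T, or null if none does.
    public T GetProperty<T>() where T : class
    {
        foreach (object o in elements)
        {
            T match = o as T;
            if (match != null)
            {
                return match;
            }
        }
        return null;
    }
}

static class AggregatorDemo
{
    static void Main()
    {
        Aggregator myObject = new Aggregator();
        myObject.Add("just a string");
        myObject.Add(new TransportSecurityElement());

        // Late-bound lookup through the aggregate.
        ISecurityProvider sec = myObject.GetProperty<ISecurityProvider>();
        Console.WriteLine(sec != null ? sec.Describe() : "no security provider");

        // A plain cast looks only at Aggregator itself, so it finds nothing.
        ISecurityProvider direct = myObject as ISecurityProvider;
        Console.WriteLine(direct == null);
    }
}
```

The last two lines show the gap this post is about: the aggregate can hand out an ISecurityProvider, but the ordinary as operator has no idea that it can.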

Now, what I would like to see, as a simplified and cleaner syntax, is the ability to do this:

ISecurityProvider sec = myObject as ISecurityProvider;

This is much clearer, adheres to the interface-implementation paradigm, and still allows me to make late-bound changes to my object's implementation. Ideally, the casting operation would internally call the GetProperty<T> method. The definition would go something like this:

public static explicit operator <T> (Aggregator agg)
{
   return agg.GetProperty<T>();
}

Unfortunately, C#'s syntax doesn't allow for generic type parameters in operator overloading statements, so this can at most be a feature request - and I'm not entirely sure it's a justified one. Mostly a pipe dream. :)

A question that immediately pops up is what benefit does this give us over the GetProperty<T> syntax. The simple answer is uniformity - if I have code right now that checks a list of objects to see if they implement an interface, it will still work without having to make manual changes.

The downsides? I can think of several. First of all, making the feature worthwhile will require much more work than simply allowing generic cast operators. These aggregated implementations are naturally invisible to Reflection since they're relevant only at run-time. But what about the is operator? Currently it's ultra-quick and relies on an intrinsic CLR operation. Expanding that operation to look for aggregated implementations would naturally be a serious performance problem, but leaving it as is will seriously limit the usefulness of the feature, since I have to constantly be aware of different results between is checking and explicit casting.

In short, while I think my idea has a certain amount of charm I don't think it's entirely usable. I won't be opening a language feature request for it on Connect, but I'd love to hear feedback and ideas about it.

Posted Tuesday, August 08, 2006 9:01 AM by AvnerK | 4 comment(s)


WCF: Navigating the Binding maze

A few days ago I ranted a bit about the Binding object model in WCF and how restrictive it feels when I want to imperatively describe my service bindings without being tied down to a specific transport. If I want to disable security or set the maximum allowed size per field in a message, I have to write identical-looking code for each different binding - something that just begs for refactoring which can't be done.

Luckily, the WCF team bloggers are ever alert, and a few days later I got a response from Nicholas Allen describing the reasons behind this behavior. His explanation consisted of two parts, and I'd like to address them separately:

1) Avoiding meaningless abstractions.

Take the example of security configuration on a transport. There are entirely incompatible security objects called NetTcpSecurity and NetNamedPipeSecurity. The only overlap in configuring these two transports happens to be when security is entirely disabled. That's not a very interesting abstraction to make.

Granted, one of my common failings is the urge for over-abstraction. I've read Spolsky and I've read Gunnerson, and I try to avoid excessive generalizations, but sometimes it gets the better of me. Having the security settings share a common base - even if only to allow setting No Security in a shared way - seems intuitive. I'll bow down to the WCF team's decision here - if the binding security objects really don't share a common base, there's no reason to abstract them together.

2) Aggregation, not inheritance.

We have a mechanism for dealing with this problem called GetProperty. For instance, if you want to set an XML reader quota in a generic fashion, you would use:

binding.GetProperty<XmlDictionaryReaderQuotas>(new BindingParameterCollection()).MaxArrayLength = 2;

Instead of relying on inheritance for shared properties, the object model views each Binding object as an aggregate of several binding elements and properties. Since the properties of a binding can differ between two instances of the same transport, the GetProperty<T> mechanism allows us to query the binding for a specific capability - the XmlDictionaryReaderQuotas capability in this case - and access it generically. I would assume the above line would have to be wrapped in some sort of null check in case the binding doesn't support XmlDictionaryReaderQuotas.
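The null check I have in mind would look roughly like the sketch below. Binding.GetProperty<T> and XmlDictionaryReaderQuotas are the real WCF names; the helper class and method (BindingQuotaHelper, TrySetMaxArrayLength) are names I made up. One assumption to flag loudly: I haven't verified whether the quotas object handed back is the binding's live instance or a copy, so the return value only says that a quotas object was found and assigned to.

```csharp
using System.ServiceModel.Channels;
using System.Xml;

static class BindingQuotaHelper
{
    // Hypothetical helper: returns false when the binding doesn't expose
    // reader quotas at all, instead of throwing a NullReferenceException.
    public static bool TrySetMaxArrayLength(Binding binding, int maxArrayLength)
    {
        XmlDictionaryReaderQuotas quotas =
            binding.GetProperty<XmlDictionaryReaderQuotas>(new BindingParameterCollection());

        if (quotas == null)
        {
            return false;
        }

        // Assumption: this assignment is only useful if the binding returned
        // its live quotas object rather than a clone; that is not checked here.
        quotas.MaxArrayLength = maxArrayLength;
        return true;
    }
}
```

Calling it is the same regardless of transport, e.g. BindingQuotaHelper.TrySetMaxArrayLength(new NetTcpBinding(), 512000), which is the whole point of querying for the capability instead of the binding type.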

The first thing that came to my mind about this is the similarity to the way C++ code worked with COM objects. You used the shared IUnknown interface to call QueryInterface and see what interfaces were supported by the object, and then got a handle to those interfaces. This is similar - the GetProperty<T> is the shared interface which allows us to query for further functionality.

This, however, led to this question: why wasn't this functionality also implemented with interfaces? Why couldn't we have an ISupportsReaderQuotas interface which defines the XmlDictionaryReaderQuotas property, and instead of calling GetProperty() I can cast my binding to the interface and work with that? Each binding can implement the interfaces that make sense to it, and I can use the existing .NET mechanisms (the is and as operators, as well as reflection support) to query the object for the required operations.
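For illustration, the interface-based alternative might be sketched like this. None of these types exist in WCF: ISupportsReaderQuotas is the hypothetical interface from the paragraph above, ReaderQuotas is a stand-in for XmlDictionaryReaderQuotas so the sketch has no WCF dependency, and the two binding classes are invented.

```csharp
using System;

// Hypothetical stand-in for XmlDictionaryReaderQuotas.
class ReaderQuotas
{
    private int maxArrayLength;
    public int MaxArrayLength
    {
        get { return maxArrayLength; }
        set { maxArrayLength = value; }
    }
}

// Hypothetical capability interface; WCF has nothing like it.
interface ISupportsReaderQuotas
{
    ReaderQuotas ReaderQuotas { get; }
}

abstract class BindingBase { }

// A binding that chooses to expose reader quotas.
class TcpLikeBinding : BindingBase, ISupportsReaderQuotas
{
    private readonly ReaderQuotas quotas = new ReaderQuotas();
    public ReaderQuotas ReaderQuotas { get { return quotas; } }
}

// A binding for which reader quotas make no sense.
class QuotaLessBinding : BindingBase { }

static class InterfaceQueryDemo
{
    // Uses the ordinary 'as' operator instead of a GetProperty<T> call.
    public static bool TrySetMaxArrayLength(BindingBase binding, int value)
    {
        ISupportsReaderQuotas withQuotas = binding as ISupportsReaderQuotas;
        if (withQuotas == null)
        {
            return false;
        }
        withQuotas.ReaderQuotas.MaxArrayLength = value;
        return true;
    }

    static void Main()
    {
        Console.WriteLine(TrySetMaxArrayLength(new TcpLikeBinding(), 512000));
        Console.WriteLine(TrySetMaxArrayLength(new QuotaLessBinding(), 512000));
    }
}
```

The trade-off is visible right away: the set of interfaces is fixed when TcpLikeBinding is compiled, so an element added later at runtime can't contribute a capability.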

This will also bypass other limitations of this model. As it stands, I have no idea how to access the MaxReceivedMessageSize property on the binding - it's an Int64 separately defined in each binding class. GetProperty<T>'s generic parameter is constrained to be a reference type, but even if I could query for Int64 I have no way of specifying which Int64 parameter I want. If this property was part of an IBindingMessageProvider interface I could access it through there.

The advantage of the GetProperty<T> approach is that it is more flexible. I don't have to know at compile-time what interfaces my binding supports. Glancing at the GetProperty code in Reflector, I could see it go deep into the BindingElements that make up the Binding and call GetProperty<T> on all of them. This means that a binding element added at runtime or through configuration (say, a message encoder) can be retrieved without it being defined as part of the NetTcpBinding class.

Thinking it over (this is a stream-of-consciousness blog, can't you tell?), I can certainly see the logic behind the GetProperty<T> mechanism, but it requires more extensive support for it in the binding object. This means the binding object should have as few "unattached" properties as possible, especially if they can be a part of several different bindings. The MaxMessageSize/MaxPoolSize and related parameters can be extracted to a MessageSizeInformation class, while the TransactionFlow/TransactionProtocol can be extracted to a TransactionInformation class - that way I can always query for the required "interface" at runtime without having to familiarize myself with specific properties of specific bindings.

(This actually gives me an interesting idea about combining interfaces and aggregation, but more on that later)

So, assuming you made it to the end - how do you feel about the current implementation? Am I completely off-base here? Do I make a valid point? Were these issues discussed internally before this implementation was chosen? What were the pros and cons?

I'd love to hear more opinions, both from anyone on the WCF team that might be listening and from anyone else with an opinion.

Posted Monday, August 07, 2006 10:24 AM by AvnerK | 3 comment(s)


WCF Serialization Part 2d: A Solution, a Conclusion and a Contribution

This is the last part of my continuing saga of serializing dictionaries over WCF and beyond.

Quick recap: While WCF allows me to serialize an IDictionary easily, trying to serialize that dictionary later for other uses fails - specifically, caching it to disk using the Enterprise Library. This is because the Enterprise Library relies on the BinaryFormatter, which in turn relies on the type implementing ISerializable. An alternate solution was to use the NameValueCollection, which implements ISerializable but is incompatible with WCF's serialization.

I felt trapped, having to juggle between two incompatible serialization engines - one for communications, one for persistence. Frustrated. Annoyed. Helpless.

But then, as I was whining to my teammates, the solution came to me - there really isn't any reason to jump from one serialization method to the other. Since WCF gives me the most freedom and lets me use the IDictionary that I want, I can simply use WCF's serializer - the NetDataContractSerializer - for the Enterprise Library's cache serialization.

Going over EntLib's code proved very easy - the Caching project has a class called SerializationUtility that exposes two methods - ToBytes() and ToObject(). I'll reproduce the entire method, just to illustrate how simple it is:

public static byte[] ToBytes(object value)
{
   if (value == null)
   {
      return null;
   }

   byte[] inMemoryBytes;

   using (MemoryStream inMemoryData = new MemoryStream())
   {
      new BinaryFormatter().Serialize(inMemoryData, value);
      inMemoryBytes = inMemoryData.ToArray();
   }

   return inMemoryBytes;
}

Given this simple method (and its even simpler brother, ToObject()) it's not hard to see that all the work that needs to be done is adding a reference to System.Runtime.Serialization and replacing BinaryFormatter with NetDataContractSerializer - and that's it. Their methods have identical signatures, so there's no work there, either.
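For those who'd rather see the change spelled out, the swapped version might look roughly like the sketch below. It assumes the .NET Framework with a reference to System.Runtime.Serialization. ToBytes follows the EntLib method shown above; the ToObject body is my own guess at the mirror image, since the original isn't reproduced in this post, and the class name NetDataContractSerializationUtility is invented.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

static class NetDataContractSerializationUtility
{
    // Same shape as EntLib's ToBytes, with the formatter swapped.
    public static byte[] ToBytes(object value)
    {
        if (value == null)
        {
            return null;
        }

        using (MemoryStream inMemoryData = new MemoryStream())
        {
            new NetDataContractSerializer().Serialize(inMemoryData, value);
            return inMemoryData.ToArray();
        }
    }

    // Assumed counterpart: the original ToObject is not shown in the post.
    public static object ToObject(byte[] serializedObject)
    {
        if (serializedObject == null)
        {
            return null;
        }

        using (MemoryStream dataInMemory = new MemoryStream(serializedObject))
        {
            return new NetDataContractSerializer().Deserialize(dataInMemory);
        }
    }

    static void Main()
    {
        Dictionary<string, string> original = new Dictionary<string, string>();
        original["color"] = "red";

        // Round-trip the dictionary through the serializer.
        byte[] bytes = ToBytes(original);
        Dictionary<string, string> copy = (Dictionary<string, string>)ToObject(bytes);
        Console.WriteLine(copy["color"]);
    }
}
```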

The lovely thing about EntLib is that it comes with comprehensive unit tests. The only thing I did after making this change is letting NUnit chew on the 200-odd test methods defined and give me a positive result, and I'm good to go.

I've attached the new SerializationUtility.cs for those too lazy to make the change themselves, and the compiled Microsoft.Practices.EnterpriseLibrary.Caching.dll for those who want to just drop it in. Enjoy.

Posted Thursday, August 03, 2006 4:23 PM by AvnerK | 1 comment(s)

WCF Serialization Part 2c: Hacking the NameValueCollection (unsuccessfully)

As we mentioned here and here, I've been struggling to get the NameValueCollection object to pass through WCF serialization. Please read the first two posts first for some context.

 

In the previous episodes, we saw that we can't get WCF to serialize the NameValueCollection (NVC, from now on) because WCF incorrectly marked it for CollectionDataContract serialization, but then choked because it had no Add(object) method.

 

So to take the most direct approach, I subclassed the NameValueCollection and added my own Add(object) implementation to see if this jedi mind trick will let WCF work with the NVC class:

 

public class NVC : NameValueCollection
{
    public void Add(object obj)
    {
    }
}

 

Once I did this, WCF stopped yelling at me that the contract was invalid, but (naturally) my data would vanish during transfer.

 

So next I tried putting a breakpoint inside my Add() method to see what object is passed to it – I was hoping for something along the lines of a KeyValuePair<K,V> or even a DictionaryEntry, something I can use to manually recreate my NVC.

Turns out there's no such luck – even though the NVC uses an object called NameObjectEntry internally for storage, it's not exposed externally. Even the GetEnumerator() method returns an enumerator that only goes over the Keys, not a key/value dictionary.

This means that when recreating my dictionary, the value passed to my Add() method is a single string with the key name only, and the value is lost in translation.
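You can see the keys-only enumeration without WCF in the picture at all. This is just a small sketch; NvcEnumerationDemo and EnumerateAsObjects are made-up names, and the collection contents are sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

static class NvcEnumerationDemo
{
    // Hypothetical helper: records what a plain foreach over the NVC yields,
    // as "TypeName:value" strings.
    public static List<string> EnumerateAsObjects(NameValueCollection nvc)
    {
        List<string> seen = new List<string>();
        foreach (object item in nvc)
        {
            seen.Add(item.GetType().Name + ":" + item);
        }
        return seen;
    }

    static void Main()
    {
        NameValueCollection nvc = new NameValueCollection();
        nvc.Add("color", "red");
        nvc.Add("size", "large");

        // The enumerator hands back the keys as strings; no values in sight.
        foreach (string line in EnumerateAsObjects(nvc))
        {
            Console.WriteLine(line);
        }

        // The values have to be looked up separately, by key.
        foreach (string key in nvc.AllKeys)
        {
            Console.WriteLine(key + " = " + nvc[key]);
        }
    }
}
```

Anything that rebuilds the collection by enumerating it and feeding each item to an Add(object) method therefore only ever sees "color" and "size".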

 

No luck there, either.

Posted Wednesday, August 02, 2006 4:28 PM by AvnerK | 4 comment(s)


WCF Serialization Part 2b: WCF Collection Serialization

As we mentioned here, I'm currently struggling with getting WCF to serialize my NameValueCollection object on a service operation. The previous post goes over the general details, while here I'll dive a bit deeper. It's recommended to read the first part before continuing.

 

I don't know exactly what logic the WCF serialization engine uses to determine which data contract will be selected for serialization. I tried wading in with Reflector but got a bit lost. I managed to track down some interesting places, though, both in the System.Runtime.Serialization assembly:

 

  • System.Runtime.Serialization.DataContract.CreateDataContract() 

This is apparently called when the serializer needs to decide what datacontract applies to the object, if nothing more explicit is found. We can see in line 38 that the function tries to create an instance of the collection type, which leads us to:

 

  • System.Runtime.Serialization.CollectionDataContract.IsCollectionOrTryCreate() 

This method checks if the type is a collection in various ways (interfaces implemented, etc), and has an explicit check if the type has an Add method that receives an object parameter (line 110):

itemType = type3.IsGenericType ? type3.GetGenericArguments()[0] : Globals.TypeOfObject;
CollectionDataContract.GetCollectionMethods(type, type3, new Type[] { itemType }, false, out info2, out info1);
if (info1 == null)
{             
   // Handle invalid collection
} 

 

I'm not entirely clear on the entire logic flow, but these two places seem to indicate that the WCF engine sees the NVC, flags it as a Collection (maybe due to it implementing ICollection) then makes sure it stands up to its strict requirements (stricter than just implementing ICollection) and throws an exception when it doesn't.

 

The big question here is why the IsCollectionOrTryCreate() method, which should return false if the type is not a collection, instead chooses to throw an exception when it's an incompatible collection rather than returning false and letting the ISerializable handler take it from there.

 

In my next post: Some ugly hacks that also didn't work.

Posted Wednesday, August 02, 2006 4:25 PM by AvnerK | 1 comment(s)


WCF Serialization Part 2a: NameValueCollection Denied!

As we all know, IDictionaries aren't serializable. This has been a cause of much concern and consternation throughout the years. Throughout these trying times, however, we had one shining beacon in the night - the NameValueCollection in the System.Collections.Specialized namespace was a completely serializable implementation of a string/string dictionary, used by the framework for everything from configuration settings to ASP query strings.

Now WCF comes along, and with it much serialization goodness, including the ability to easily serialize IDictionaries over the wire. While this is extremely convenient, we have to remember that IDictionaries are still not really serializable - they're just special-cased in WCF. This means that if we want to transfer a dictionary over the wire and then persist it to disk (using the Caching Application Block, for instance) we're still out of luck.

So I found myself coming back to the familiar old NameValueCollection (NVC from now on, for brevity) and passing that over the wire. Imagine my surprise, then, to realize that WCF fails to serialize an NVC in a service operation.

The error received is a System.Runtime.Serialization.InvalidDataContractException, saying that:
Type 'System.Collections.Specialized.NameObjectCollectionBase' is an invalid collection type since it does not have a valid Add method with parameter of type 'System.Object'

This is true - the NVC (and its base type, NameObjectCollectionBase) doesn't implement IList, only ICollection. Unlike ICollection<T>, ICollection doesn't specify an Add() method - this is specifically added by its children, IList or IDictionary. This means that the NVC is free to add its own Add method, and in our case adds two overloads - one receiving string/string, the other receiving a whole NVC to merge.
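A bit of reflection confirms the shape of the type. This is only a sketch using plain System.Reflection: HasAddTakingObject is a made-up helper that looks for a public one-parameter Add(object), which is my reading of the error message and not WCF's actual check.

```csharp
using System;
using System.Collections;
using System.Collections.Specialized;
using System.Reflection;

static class NvcShapeDemo
{
    // Hypothetical approximation of "has a valid Add method with a parameter
    // of type System.Object"; not the real WCF implementation.
    public static bool HasAddTakingObject(Type type)
    {
        foreach (MethodInfo method in type.GetMethods())
        {
            ParameterInfo[] parameters = method.GetParameters();
            if (method.Name == "Add"
                && parameters.Length == 1
                && parameters[0].ParameterType == typeof(object))
            {
                return true;
            }
        }
        return false;
    }

    static void Main()
    {
        Type nvc = typeof(NameValueCollection);

        // Which of the non-generic collection interfaces does the NVC implement?
        Console.WriteLine("ICollection: " + typeof(ICollection).IsAssignableFrom(nvc));
        Console.WriteLine("IList: " + typeof(IList).IsAssignableFrom(nvc));
        Console.WriteLine("IDictionary: " + typeof(IDictionary).IsAssignableFrom(nvc));

        // List the Add overloads the NVC declares on its own.
        foreach (MethodInfo method in nvc.GetMethods())
        {
            if (method.Name != "Add")
            {
                continue;
            }
            string[] names = Array.ConvertAll(
                method.GetParameters(), p => p.ParameterType.Name);
            Console.WriteLine("Add(" + string.Join(", ", names) + ")");
        }

        Console.WriteLine("Add(object) on NVC: " + HasAddTakingObject(nvc));
        Console.WriteLine("Add(object) on ArrayList: " + HasAddTakingObject(typeof(ArrayList)));
    }
}
```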

Now let's go back to Sowmy Srinivasan's blog entry as linked above. There we see that the WCF serialization engine uses these precedence rules:

  1. CLR built-in types
  2. Byte array, DateTime, TimeSpan, GUID, Uri, XmlQualifiedName, XmlElement and XmlNode array
  3. Enums
  4. Types marked with DataContract or CollectionDataContract attribute
  5. Types that implement IXmlSerializable
  6. Arrays and Collection classes including List<T>, Dictionary<K,V> and Hashtable.
  7. Types marked with Serializable attribute including those that implement ISerializable.

We can see that if I have an ICollection that also implements ISerializable (like the NVC), the built-in support for collections (rule 6) will kick in before rule 7 is ever considered, despite the type being explicitly marked for serialization.

This seems to be a bug in WCF's handling - it first treats the type as a Collection class, then immediately dismisses it as unacceptable, without letting it fall back on the 7th serialization option. Unfortunately, we can't mark a class to explicitly NOT take part in the built-in collection serialization, even if I mark it as [DataContract] instead of [CollectionDataContract].

Right now I have no solution for this problem. I'm using a Dictionary now with explicit implementation of IXmlSerializable, and manually serializing it before passing it on to my Cache manager.
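For reference, a dictionary with an explicit IXmlSerializable implementation can be sketched along these lines. This is not the code I'm actually running, just a minimal illustration of the idea: the class name XmlStringDictionary, the "item" element and the "key" attribute are all invented, and the round trip here goes through XmlSerializer by hand.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical string/string dictionary that writes itself as
// <item key="...">value</item> elements.
public class XmlStringDictionary : Dictionary<string, string>, IXmlSerializable
{
    public XmlSchema GetSchema() { return null; }

    public void WriteXml(XmlWriter writer)
    {
        foreach (KeyValuePair<string, string> pair in this)
        {
            writer.WriteStartElement("item");
            writer.WriteAttributeString("key", pair.Key);
            writer.WriteString(pair.Value);
            writer.WriteEndElement();
        }
    }

    public void ReadXml(XmlReader reader)
    {
        bool wasEmpty = reader.IsEmptyElement;
        reader.ReadStartElement(); // consume the wrapper's start tag
        if (wasEmpty)
        {
            return;
        }

        // Read <item> elements until the wrapper's end tag is reached.
        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            string key = reader.GetAttribute("key");
            this[key] = reader.ReadElementContentAsString();
        }
        reader.ReadEndElement();
    }
}

public static class XmlStringDictionaryDemo
{
    // Manually serializes the dictionary to a string and reads it back.
    public static XmlStringDictionary RoundTrip(XmlStringDictionary source)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(XmlStringDictionary));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, source);
        return (XmlStringDictionary)serializer.Deserialize(new StringReader(writer.ToString()));
    }

    public static void Main()
    {
        XmlStringDictionary settings = new XmlStringDictionary();
        settings["color"] = "red";
        Console.WriteLine(RoundTrip(settings)["color"]);
    }
}
```

The resulting string (or its bytes) is what then gets handed to the Cache manager in place of the dictionary itself.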

In my next post I'll go over some of the deep digging I did to get to these conclusions.

Posted Wednesday, August 02, 2006 3:40 PM by AvnerK | 3 comment(s)

