On Productivity and API Wars

Thursday, June 17, 2004

Joel Spolsky recently posted another essay, this time ranting about Microsoft, platform shifts, runtimes, etc. I'm a big fan of Spolsky's and I strongly encourage anyone who is at all interested in software to read all his essays. He seems to be showing a bit more bitterness than usual these days, though. One of the points he tries to make is that developers are more productive with programming platforms like Visual Basic or the .NET Framework than with traditional programming languages like C and C++ because the former have automatic memory management.

Hmm... Well I'm certainly a big fan of the CLR's garbage collector. But I really don't think you can attribute the huge productivity gains that .NET developers experience purely to automatic memory management. In my mind the reason for the productivity boost comes from two factors. The first is the extensive .NET Framework class libraries and the second is the common type system.

Consider what a C# application would look like if not for the base class libraries. Sure, all the heap-based variables will be automatically freed by the garbage collector at some point in the future, which does simplify the programming model to a degree, but without good libraries you're still stuck talking directly to the operating system in the only language that operating systems ultimately speak: C. After all, what does the ever-popular FileStream class from the .NET Framework do? Internally it calls the CreateFile, CloseHandle, ReadFile and WriteFile functions, the same functions you would call from C or C++ to read and write a file. If you've ever had to do any amount of P/Invoke work in C# you probably realized that calling those functions is just a whole lot easier from native C++. So good libraries are a must. But these libraries can also be written in C++. In fact, C++ is all about creating class libraries of reusable functionality. I think this is an area where C++ will begin to shine even more with the introduction of Visual C++ 2005 and its first-class support for writing managed code using the C++/CLI language design.
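To illustrate the idea of a library class hiding raw file calls, here is a minimal sketch of a thin C++ wrapper. It is an assumption-laden stand-in: it wraps the portable C stdio functions (fopen, fwrite, fread, fclose) rather than the Win32 functions named above, and the File class and round_trip helper are hypothetical names invented for this example, not anything from the .NET Framework or an existing library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical thin wrapper over the C stdio calls, standing in for the
// way a library class can hide the raw file functions underneath it.
class File
{
public:
    File(const std::string& path, const char* mode)
        : handle_(std::fopen(path.c_str(), mode))
    {
        if (handle_ == nullptr)
            throw std::runtime_error("cannot open " + path);
    }

    // The handle is closed automatically, even if an exception is thrown.
    ~File() { std::fclose(handle_); }

    File(const File&) = delete;
    File& operator=(const File&) = delete;

    void write(const std::string& text)
    {
        assert(handle_ != nullptr);   // invariant established by the constructor
        if (std::fwrite(text.data(), 1, text.size(), handle_) != text.size())
            throw std::runtime_error("write failed");
    }

    std::string read_all()
    {
        std::string result;
        char chunk[256];
        std::size_t got = 0;
        while ((got = std::fread(chunk, 1, sizeof chunk, handle_)) > 0)
            result.append(chunk, got);
        return result;
    }

private:
    std::FILE* handle_;
};

// Write text to a file, then read it back through the wrapper.
std::string round_trip(const std::string& path, const std::string& text)
{
    {
        File out(path, "wb");
        out.write(text);
    }                         // the output file is flushed and closed here
    File in(path, "rb");
    return in.read_all();
}
```

The calling code in round_trip never mentions a handle or a close call, which is roughly the convenience a class like FileStream gives a C# developer.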

So what about automatic memory management? Well, it depends what you mean by automatic. If you mean that memory allocated on the heap will be freed at some unpredictable time in the future, then garbage collection will serve you well. But if what you really want is to not have to worry about freeing memory explicitly, well, C++ has had that since day one with destructors. Here is an example of allocating a simple buffer in C# and C++ with its size determined at runtime:

[C#]

byte[] buffer = new byte[size];

[C++]

std::vector<BYTE> buffer(size);

In the C# case, the GC will collect the memory when there is memory pressure and there are no longer any roots referring to the array. In the C++ case, the memory will be released by the vector instance's destructor, which is called when the variable goes out of scope. The C++ code is not any harder to write than the C# code. Sure, there are issues around resource ownership, but there are well-established techniques for solving those problems. Reference counting, which COM popularized in some circles, is a great example. Andrei Alexandrescu also presents some great techniques that you can use to write more flexible and extensible libraries in Modern C++ Design. Brian Harry goes into a lot more detail on the topic of memory management and explains the path to garbage collection in the CLR. But the point is that C++ does provide memory management, and with good libraries the code is just as concise and developers can be almost as productive. The trouble, of course, is that there just isn't a single library near the scale of the .NET Framework for C++. There are lots of smaller libraries that focus on different things and often overlap. That leads me to the second reason for the productivity boost owed to .NET.
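To make the destructor and reference counting points concrete, here is a minimal sketch. It assumes a C++11 or later compiler so that std::shared_ptr can stand in for reference counting (it is not COM's AddRef/Release), and the Buffer type and the two helper functions are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical resource type that counts live instances so the
// effect of the destructor can be observed.
struct Buffer
{
    static int live;                     // number of Buffers currently alive
    std::vector<unsigned char> bytes;    // storage freed by the vector's destructor

    explicit Buffer(std::size_t size) : bytes(size) { ++live; }
    ~Buffer() { --live; }

    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
};

int Buffer::live = 0;

// Scope-based ownership: the buffer is released at the closing brace.
int live_after_scope(std::size_t size)
{
    {
        Buffer local(size);
        assert(Buffer::live == 1);
    }   // ~Buffer runs here, deterministically
    return Buffer::live;
}

// Shared ownership: the buffer is released when the last reference goes away.
int live_after_shared(std::size_t size)
{
    std::shared_ptr<Buffer> first = std::make_shared<Buffer>(size);
    {
        std::shared_ptr<Buffer> second = first;   // reference count is now 2
        assert(first.use_count() == 2);
    }                                             // back to 1, buffer still alive
    assert(Buffer::live == 1);
    first.reset();                                // count reaches 0, ~Buffer runs
    return Buffer::live;
}
```

In both functions the caller never frees anything by hand. The difference from the GC is only in timing: the release happens at a known point rather than whenever the collector gets around to it.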

The common type system is an often overlooked aspect of the CLR and the .NET Framework, yet it is critical to their success. Let's for a moment stroll down memory lane and consider the infamous string. If you know anything about programming, you know that we use strings to allow programs to communicate with humans. C defined the simplest of string types:

char* pString;

pString is a stable pointer into the process' address space. Programs can read the string, character by character, using memory addresses and pointer arithmetic until they reach a character with a value of zero, referred to as the null-terminator. Of course there are many problems with this type of string such as determining who owns the memory, string manipulation woes, security problems, etc. The StrSafe.h header file goes a long way toward helping developers write more robust code using C strings.
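As a small illustration of that character-by-character walk, here is a sketch that counts the characters before the null terminator. The function name count_chars is hypothetical; it does roughly what the standard strlen function does.

```cpp
#include <cassert>
#include <cstddef>

// Walk a C string one character at a time until the null terminator.
std::size_t count_chars(const char* pString)
{
    assert(pString != nullptr);   // the caller must pass a valid string

    const char* p = pString;
    while (*p != '\0')            // stop at the character whose value is zero
    {
        ++p;                      // pointer arithmetic: advance by one char
    }
    return static_cast<std::size_t>(p - pString);
}
```

Nothing in the pointer itself says how long the string is or who owns it, which is where the problems listed above come from: if the terminator is missing, the loop simply keeps reading.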

The next string type is the C++ string. Huh? What string? Do you mean CString or perhaps std::string or perhaps you mean your favorite home-grown string class? I'm sure I could find a dozen different string classes in common use today. The problem of course is that C++ didn't originally provide a native string type. The official C++ string, std::basic_string, was only introduced much later.

Then of course along came object/component technology and the desire to build large applications out of individual components engineered by different vendors. These components would be written in different languages, each with its own string types. How do they interoperate? COM didn't mandate any standard string type. It did, however, popularize the Visual Basic string, or BSTR. Using BSTRs from C++ meant using a wrapper class around functions like SysAllocString and SysFreeString. So component technology in the 1990s suffered from type friction, and guys like me, who loved digging into things like COM marshalling and interception and understood how to build large component-based systems, made a good living.

Clearly, if every programming language and platform used a consistent set of types, building components to interoperate would be significantly simpler. This goes beyond strings. Consider approaches to error handling. COM enforced a strict error handling protocol not natural to either C++ or Visual Basic. So a common type system and a common programming model are essential ingredients in making developers more productive: they reduce the noise involved in using different libraries from different teams and vendors.

So we've talked about the importance of good libraries and the value of a common type system. Both of these are present in the .NET Framework. As developers we need to strive to write our own types in such a way that they work naturally with other libraries. Using common types like System.String goes a long way. Another common type that pervades the .NET Framework is the System.IO.Stream class. If you have some resource that is naturally represented as a stream of bytes, then you should consider implementing a Stream-derived class to extend this library pattern to your types.

I was originally going to blog about Stream implementations today but I got sidetracked and this felt like a good introduction to it anyway. Stay tuned. I have an interesting post coming up on better design for .NET.
