February 2006 - Posts

I received a couple of comments on my posting on how bad RPC can be for your mental health. Although the feedback was positive and agreed with my statements, some expressed concern as to what an alternative to RPC-syntax could be. I thus feel encouraged to elaborate on how I think any software developer can avoid damage to his/her mental health ;-)

If subroutine call syntax like

int r = Calc(a, b);

suggests that a service is guaranteed to be carried out immediately and synchronously by "someone you know intimately", then we should first clarify what situations look like in which this syntax is not appropriate. Let me phrase the following recommendation:

Don´t use subroutine call syntax if

  • a service will not be carried out immediately, or
  • you don´t want to wait for the service to finish its work, or
  • you don´t know the service agent intimately (including where it resides), or
  • you have doubts the service is available right now at all.

In addition there is one more premise of subroutine calls: You pretty much know exactly what service you need. So whenever you don´t know (or don´t want to know) what service or services should work on a piece of information, then you should not try to use subroutine calls.

Ok, so what does that mean? To find out, we need to look at the two sides of a subroutine call, the caller and the callee, or client and service:

This is the usual case. Client and service know each other well and the usual "quality of service" (QoS) criteria are met; the service can fulfill the promise of direct subroutine call syntax.

But this picture of harmony and ease should change in your mind once QoS fulfillment is no longer guaranteed. Once communication between client and service is not stack based anymore, a chasm opens between caller and callee:

Trying to bridge this chasm using the same form (notation) for communication is, I think, plain wrong. Whether client and service are separated by a thread boundary, an AppDomain boundary, or a process boundary does not matter. In any case there needs to be a bridge across the chasm, and this bridge should be obvious to anybody looking at the code.

Hiding the bridge and thereby suggesting "You can fully trust this subroutine service call. All´s well, it will fulfill its promise." would be (self-)deception. And being deceived sure is something nobody really likes.

As a solution I propose to always (!) - sorry, Ingo, for trying to set up a rule again ;-) - use an explicit bridge to cross the chasm, or at least to make very clear the bridge that´s used under the hood of any RPC-style remote service invocation anyway:

Using the term "bridge" here does not mean I´m talking about the Bridge design pattern. Quite the contrary: I don´t want to hide the chasm behind layers of abstraction. (Although the technical details of how the chasm is bridged are not important to client or service and should be hidden from them.)

Rather, if you like to think in patterns, I´m talking about some kind of Mediator, i.e. an entity that "encapsulates how a set of objects interact".

But I´d like to get more general. I´d like to call the "bridge" just a coordination structure. It´s some kind of code which helps to coordinate the work of client and service. Sometimes this coordination structure (or entity) is small, sometimes it´s a large piece of infrastructure. In any case it means an indirection in the communication between client and service and it represents some kind of data structure.

Now, how does a client change when a coordination structure is introduced or just made explicit? The client does not call the service directly anymore; instead it interacts with the coordination structure:

CalcRequest req = new CalcRequest(a, b);
calcCoordQueue.Enqueue(req, new CalcResponseHandlerDelegate(CalcResponseHandler));
...
void CalcResponseHandler(CalcResponse resp) { ... }

From this code you can´t glean where and when the service will fulfill the request. It might run on a different thread or on a different machine - or even in the same thread as the caller. You just don´t know. And that´s a good thing. It´s the prerequisite for a clean distribution of code.
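To make the client side concrete, here is a minimal sketch, written in Python rather than C# for brevity. The class CalcCoordQueue, its methods, and the assumption that Calc adds its two arguments are all hypothetical; they only illustrate the shape of the interaction, not any particular product.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CalcRequest:
    a: int
    b: int


@dataclass
class CalcResponse:
    result: int


class CalcCoordQueue:
    """Hypothetical coordination structure: a FIFO of (request, callback) pairs."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, req, on_response):
        # The client only hands over the request; nothing is computed here.
        self._pending.append((req, on_response))

    def process_all(self, service):
        # Stand-in for whoever serves the requests: it could just as well be
        # another thread, another process, or another machine.
        while self._pending:
            req, on_response = self._pending.popleft()
            on_response(service(req))


results = []


def calc_response_handler(resp):
    results.append(resp.result)


calc_coord_queue = CalcCoordQueue()
calc_coord_queue.enqueue(CalcRequest(2, 3), calc_response_handler)
# The client continues here without a result; the response arrives later.
calc_coord_queue.process_all(lambda req: CalcResponse(req.a + req.b))
print(results)  # [5]
```

Between the call to enqueue and the call to the handler the client has no result, and the code says so.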

It might sound so good when .NET Remoting, Serviced Components, and WCF tell you, "Hey, you can distribute your code transparently. A client does not need to see a difference between local and remote processing." But that´s the song of Ulysses´ sirens! Don´t let yourself be lured into thinking you don´t need to take into account "how far apart" client and service will live in the end.

The contract between a client and a service that will always communicate locally needs to look different from the contract between a client and a service that might, at some time in the future, need to communicate remotely.

To make this distinction clear, I´m saying: Use ordinary subroutine calls to call local services. But always use indirect communication via a coordination structure whenever a service today or at some time in the future cannot be called directly. Make the boundary between a client and service easy to see in your code. This helps to build trust in your code. It makes it easier to maintain and evolve - even though it might mean you need to write a little bit more code today.

So far I´ve been talking about the client side of a subroutine call. But what about the service side? What´s the promise of the subroutine definition syntax?

int Calc(int a, int b)
{
    int result;
    ...
    return result;
}

When writing such code you don´t think about whether the client waits for you to return a result or even if it is still alive at all. You just do whatever any service worker does: you fulfill the request as fast as possible. Whether you do that on your own thread or even on a different machine is of no concern to you. The only thing that´s for sure is, parameters come in on the stack and a result is returned via the stack. Who calls the service when, where the parameters come from, where the result goes to... all this the service does not know.

Hence I´d say: A subroutine is always an event handler. It has no control over when it´s called or who calls it, just like any button click event handler or a SQL Server Service Broker stored procedure.

What does that mean for receiving service requests? I´d say, it does not necessarily need to have an impact on how a service receives and handles requests. When implementing the service you don´t need to see a coordination structure, if a chasm needs to be bridged between you and your clients. Your service can pretty much look the same whether it´s called directly by a client or indirectly by a coordination structure. For the latter case, though, you need to register your service with the coordination structure. Your service then becomes a true event handler:

calcCoordQueue.RegisterRequestHandler(typeof(CalcRequest), new RequestHandlerDelegate(Calc));
...
CalcResponse Calc(CalcRequest req) { ... }
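The registration idea can be sketched as follows, again in Python and under stated assumptions: CoordQueue, register_request_handler, and dispatch are made-up names, and the handler lookup by request type is just one plausible way to route requests.

```python
from dataclasses import dataclass


@dataclass
class CalcRequest:
    a: int
    b: int


@dataclass
class CalcResponse:
    result: int


class CoordQueue:
    """Hypothetical coordination structure that routes requests by their type."""

    def __init__(self):
        self._handlers = {}
        self._pending = []

    def register_request_handler(self, request_type, handler):
        self._handlers[request_type] = handler

    def enqueue(self, req, on_response):
        self._pending.append((req, on_response))

    def dispatch(self):
        # The structure, not the client, decides when the service runs.
        pending, self._pending = self._pending, []
        for req, on_response in pending:
            handler = self._handlers[type(req)]
            on_response(handler(req))


def calc(req):
    # The service looks like an ordinary function (assumed here to add).
    return CalcResponse(req.a + req.b)


coord = CoordQueue()
coord.register_request_handler(CalcRequest, calc)

answers = []
coord.enqueue(CalcRequest(2, 3), answers.append)
coord.dispatch()
print(answers[0].result)  # 5
```

The service function itself does not mention the coordination structure at all; only the registration line ties the two together.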

Instead of giving up control and waiting for events, your service could be written to be "self servicing", i.e. interacting with the coordination structure directly to look for work.

void Calc()
{
    while(true)
    {
        CalcRequest req = calcCoordQueue.Dequeue();
        ...
        CalcResponse resp = ...

        calcCoordQueue.Reply(req, resp);
    }
}
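A runnable variant of this loop might look like the following Python sketch. It uses the standard library queue.Queue as the coordination structure; the STOP sentinel, the per-request reply queue, and the assumption that Calc adds are illustrative choices, not part of the original design.

```python
import queue
import threading

STOP = object()  # sentinel telling the service loop to end


def calc_service(requests):
    """Self-servicing worker: it pulls work itself instead of being called."""
    while True:
        item = requests.get()        # blocks until there is work
        if item is STOP:
            break
        (a, b), reply_to = item
        reply_to.put(a + b)          # assumption: Calc adds its arguments


def request_calc(requests, a, b, timeout=2.0):
    """Client side: enqueue a request, then wait (with a limit) for the reply."""
    reply_to = queue.Queue(maxsize=1)
    requests.put(((a, b), reply_to))
    return reply_to.get(timeout=timeout)  # raises queue.Empty if nobody answers


calc_requests = queue.Queue()
worker = threading.Thread(target=calc_service, args=(calc_requests,), daemon=True)
worker.start()
answer = request_calc(calc_requests, 2, 3)
calc_requests.put(STOP)
worker.join()
print(answer)  # 5
```

The timeout on the client side is the visible part of the bridge: the caller has to decide what happens if no service picks up the request.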

For the usual FIFO service request, this might not be the most intuitive way to go. Fulfilling requests coming in on a queue is the canonical example of event-driven programming. But who says coordination structures need to be FIFO-based? FIFO is still about military-like orders: a client orders the fulfillment of a request - and the service had better fulfill it as fast as possible.

But commands and orders are not the only way results can be achieved. Since SOA values autonomy highly, more peer-like cooperation should enter into how software parts deal with each other.

Clients and services could be viewed as grouped around a common coordination structure, acting more like peers or servents than true clients and services. A coordination structure would then become a cooperation structure that each peer accesses to get data to work on and to insert results into.

So my bottom line is:

  • In distributed software, model communication between clients and services explicitly by using obvious coordination structures. It doesn´t make a difference whether client and service are running on different threads or on different machines. Also try to foresee future changes and stay flexible. If in doubt, go for coordination structures.
  • Use event-driven programming, i.e. implement services as event handlers whenever possible, especially in FIFO scenarios. It´s a way of decoupling the service from the coordination structure.
  • If communication between clients and services becomes more complicated, or at least cannot be modelled using the FIFO pattern, switch to self-servicing services and let them access the coordination structure explicitly. If in doubt, do so even if the coordination structure is still a FIFO.

If all this means giving up some convenience that tools might be offering, I say: don´t worry. You´ll gain so much in code clarity and flexibility, you won´t miss this convenience much. And I promise you: there will be more technologies in the near future which will make it even easier to work with coordination or even cooperation structures on the .NET platform.

What is a subroutine call like the following?

int r = Calc(a, b);

It is an abstraction and a promise!

It is an abstraction of something like this

push a
push b
call 1234
pop r

The 3GL syntax hides all the low level details like allocating a stack frame, pushing actual parameters on the stack, jumping to the subroutine´s address, and later retrieving its result from the stack. Maybe even no stack is used at all. Maybe the parameters are passed in registers. In order to solve your customer´s problem you don´t want to be concerned with all this nitty gritty detail and any machine dependencies.

In addition the subroutine is a promise to return a result pretty fast. It says, "Hey, you´ll get the requested result in just a moment. I´ll be back in a hurry. It´s worthwhile to wait right here." This promise is implicit, though. The usual notation like above does not contain this promise. Rather it stems from your practice with such subroutines. And you become aware of it whenever a subroutine takes an overly long time to return and thus hampers the performance of your program.
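To see what the call syntax abstracts away, the four pseudo-instructions can be replayed by hand. The following Python sketch simulates them with a list as the stack; it assumes that Calc adds its arguments and is not meant to mirror any real calling convention.

```python
def calc_body(stack):
    # The callee finds its arguments on the stack and leaves the result there.
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)  # assumption: Calc adds its arguments


def call_calc(a, b):
    stack = []
    stack.append(a)      # push a
    stack.append(b)      # push b
    calc_body(stack)     # call 1234
    return stack.pop()   # pop r


r = call_calc(2, 3)
print(r)  # 5
```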

Form follows function promise

Now, let´s look at a couple of other well known abstractions and promises:

What are the abstractions? Instead of doing the switching of lines you use a telephone and dial a number. Instead of growing vegetables and milking cows yourself, you go to the supermarket. Instead of going to a theater play or producing a movie yourself you switch on the TV. (Yeah, I know, maybe the "abstractions" are not 100% correct. But in any case, these tokens of civilization hide uncomfortable details, don´t they? They make life easier for you, don´t they? Since making something easier is the purpose of abstraction, I´d say they are abstractions. Ok?)

And what are the promises? The phone promises a direct link and fluent conversation with your loved ones. With today´s mobile phones even "... wherever and whenever you like!" is appended to it. The supermarket promises fresh food and a huge choice almost 24x7. And the TV promises at least moving pictures - sometimes even matching your current mood.

This is all great, I´d say. We love these abstractions and promises. And we are very used to their form. Maybe we even love their form, because it´s so easy to use. Their form therefore is a representation of their services and promises we are very familiar with. Or to turn it around: Whenever we encounter the form, we immediately associate a certain promise with it.

Now, what happens if the promise is not fulfilled? We are disappointed. If you want to talk to your friend on the phone to impart some exciting news and keep hearing just her answering machine, you are frustrated. If you go to the supermarket and they don´t carry the type of bread or beer you want, you are frustrated. If you switched on the TV and all you could see were interviews or stills of an exciting event, you´d be frustrated.

If some technical gadget or service does not keep its promise... you´d rather like to know so you can switch to an alternative that´s better suited to what you want or what is available.

If communication via phone degrades to leaving messages on answering machines, you probably will resort to letters, faxes or emails as means for asynchronous communication. If a supermarket does not provide fresh food next to beer and next to magazines you probably prefer to get really fresh vegetables at a local farmers´ market with the added chance to actually bargain your price. And if a TV program deteriorates to pretty pictures, why not look at them in a nice book at any time and at any place and as often as you want?

Letters, farmers´ markets, and books hold different promises than phones, supermarkets, and TVs. So their form is different, it´s optimized for their kind of services.

Why am I talking about phones, letters, books in a posting on software? Because I believe in "form follows function" - and I guess, you do so too. What strikes me as odd, then, is how little we care about form when it comes to communication in today´s distributed software.

To say it bluntly: If a subroutine call is the accepted form for fulfilling the promise of swift and synchronous execution of a request, then why would anyone use the same form if this promise cannot be kept?

I have to admit, for a long time I liked the programming model of .NET Remoting and RPC-style Web services, or even the former DCOM. But just recently I came to the conclusion that using the syntax of subroutine calls is fundamentally wrong whenever the promise that´s tied to it cannot be kept, i.e. when what happens under the hood is completely different.

I´m not just talking about switching to message oriented "thinking" when designing distributed software. I´m talking about the need to actually seek appropriate manifestations for communication.

If you look at a phone or book you immediately know what kind of service you can expect from it. Also you have a pretty clear idea of the quality of the service.

RPC is the wrong form

When you look at a subroutine call, though, you might have an idea of the service, the subroutine´s function - but nowadays you cannot really know anything about the non-functional aspects of the service. Will it keep the promise of the subroutine call syntax and return fast? Will the service execute synchronously or asynchronously? Will the service run in the same address space, on the same thread, or on a different machine in a far away country? Can I be sure the request reaches the service worker at all? Is the service worker available? Can I be sure to receive the result?

By looking at

int r = myservice.Calc(a, b);

you can´t answer any of these questions. That´s what I find fundamentally wrong with today´s communication offerings - even with WCF. Because it does not really make a difference if I write the above or the following:

RequestMessage req = new RequestMessage(a, b);
ResponseMessage resp = myservice.Calc(req);
int r = resp.result;

The latter form is only slightly different. It´s still a subroutine call promising at least guaranteed, synchronous and immediate processing of my request.

So what I want to say is:

Whenever the promise of a subroutine call syntax - guaranteed, synchronous and immediate processing - cannot be kept by a service, then don´t use subroutine call syntax for communication.

You´re asking why? Well, it´s bad for your mental health ;-) Mental health, I´d say, depends on trust. The less you can trust an environment, the more uneasy you feel - leading to paranoia in extreme cases. Trust is fundamental for your wellbeing. And trust starts with looking at something and immediately being able to categorize it, to infer its properties. But if you look at the above subroutine call you cannot infer its properties, which leads to a lack of trust. You don´t necessarily get what you see. And that´s bad - as you might also understand if you´re a proponent of WYSIWYG ;-)

Fundamental for productive and healthy communication is to be clear and outspoken. Say what you mean, make obvious what you can do and cannot do. And I think that´s also true for any communication related to software. For example, that´s the reason everybody likes understandable and informative subroutine names.

If you subscribe to that, then you should understand why I think RPC-style remote (or non-stack-based) communication is mostly (or maybe even always) bad for your mental health. It violates the above basic requirements for any communication because it suggests non-functional properties it mostly does not have.
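One possible way to let the form carry the non-functional properties is to hand the caller something other than a plain return value. The Python sketch below uses the standard library concurrent.futures module; calc and slow_calc are hypothetical stand-ins for a fast and a slow service, and the timeouts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def calc(a, b):
    return a + b


def slow_calc(a, b):
    time.sleep(0.5)  # stands in for a slow or far-away service
    return a + b


with ThreadPoolExecutor(max_workers=1) as pool:
    # submit() returns a Future, so the call site shows that the result
    # is not there yet and that waiting for it is a separate decision.
    pending = pool.submit(calc, 2, 3)
    r = pending.result(timeout=1.0)

    late = pool.submit(slow_calc, 2, 3)
    try:
        late.result(timeout=0.05)
        timed_out = False
    except FutureTimeout:
        timed_out = True

print(r, timed_out)  # 5 True
```

Whether a future is the right form is a judgment call; the point of the sketch is only that the caller can see from the notation that waiting, and possibly giving up, is part of the deal.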

While pondering how to get a grip on software architecture, I now and again of course stumble upon questions on communication between distributed software parts. WCF then springs up as the state-of-the-art technology to answer those questions. But more and more I´m asking myself: Is WCF all we need to know about communication in distributed software? Is it the holy grail with its beautiful abstraction over SOAP, WSDL, COM+, HTTP, MSMQ etc.?

The longer I think about this, the more I´d say: no, WCF is not the end, it´s hardly the beginning. WCF is the foundation for technologies to come which will make it truly easy to communicate in distributed software systems. WCF is (just) a wrapper around many basic intricacies of message based communication. It hides the ugly details of a communication model that´s very different from local method calls. In that, it will make communication easier to the same degree that sockets made it easier compared to lower level APIs in the OSI stack of communication layers.

WCF thus is a necessary and overdue unification and abstraction. But despite all its features WCF still is in the tradition of basic socket communication. WCF is about FIFO and streams of bytes flowing from here to there and maybe back.

And this made me think. What´s communication in software about anyway?

Communication always is about a data structure. And thus communication is not different from code, meaning: there is only code and no communication. Or to put it differently: Bits flowing back and forth (mostly) can be neglected from an application programmer´s point of view; what is important, though, is the code and the data structure it implements. Also, what´s important is where the control over the data structure lies at each point in time.

Let me illustrate what I mean:

Often our software diagrams contain two different kinds of "entities": software artifacts and communication "lines". We´re describing software using graphs. Code is depicted as vertices, communication is depicted as edges. That´s great and easy to understand.

But my feeling is, this becomes a problem once you forget that communication does not come for free. And also I find this depiction limiting, since it lures you into thinking communication always goes through some kind of pipe. Because what the above picture suggests is something like this:

There are hard working software artifacts at two ends of a pipeline. The pipeline is just some kind of channel to let data flow between the "data factories".

Spaces instead of edges

Well, that´s a nice analogy that is appropriate very often. But unless you´re aware it´s just an analogy, and because of that probably only one of many possible analogies, it´s also a limiting view of software interaction.

Hence let me change the software diagram to show how software communication really works:

Between the communicating parties there is no direct connection. If you like, you can think of this connection being a hardware cable or a RAM chip. That´s fine - but pretty irrelevant from an application programmer´s point of view. Rather communication always manifests itself in another piece of code. So an edge can be represented as a piece of code sitting between the originally connected software artifacts. Of course this edge-code then is again connected with those software artifacts by edges - which in turn can be represented as a piece of code etc.

Since edges don´t seem to go away in the above picture, let me use another depiction to make clear, what I mean:

See, there are no edges anymore. Instead of an edge there is a space where communication takes place. And this space is spanned by some special communication code.

Today´s communication structures

Now, what does this code facilitating communication do? I like to call it a communication structure or even a coordination structure. This code always implements some kind of data structure sitting between the communicating parties. The purpose of its data structure is to enable communication, not to store data permanently. It helps to coordinate the cooperation of the code accessing it.

Sometimes code accesses this coordination structure sequentially, sometimes code accesses it in parallel. (The latter case is the more interesting one ;-)

You might now ask, "But where is the data structure when calling methods?" And the answer is: It´s the stack. And the whole purpose of the method call and method definition syntax is to hide the stack (or any other data structure, like registers) from you. Local communication (within the same address space) usually uses the stack which is invisible - but nevertheless present:

The compiler generates the code to put the actual parameters on the stack, transfer control to the called code, and later on clear the stack. Back in the good ole times when we all were still programming in assembler we were aware of this all the time. We had to think hard about whether to pass parameters on the stack or in registers. The communication structure so to speak always was on our mind. Then came higher level languages and we neither want nor need to see it anymore. Great!

Enter distributed computing! The communication structure between software parts running in parallel or not living in the same address space is not the stack. Instead we´re talking about message orientation, since communication mostly uses streams to pass bytes to and fro. However, RPC and Web services made this fundamental change in the basic communication structure transparent for application programmers. It´s well known how counterproductive this abstraction was in many cases.

But today with WCF this has changed. WCF is not trying to tell you that communication between distributed parts of a software system is like calling code locally. Rather, the WCF message is messages. WCF acknowledges the fundamental difference between a stack and a stream and encourages you to adapt your thinking.

The WCF message is: Be aware of the channel! Because a channel it is, through which communication flows between distributed software parts:

A stack is a LIFO data structure. A channel is a FIFO data structure. This difference alone should make it obvious how different communication in distributed software is from local method calls.

Nevertheless, code using WCF to communicate can pretty much look like code communicating via the stack. Why is that? Because a LIFO and a FIFO have well defined "input and output points". Both are not random access, but impose strict rules. Both say: data can only go in here, and data can leave only over there. The single entry and exit to the data structure makes it possible (and pretty intuitive) to model interaction with it using method calls:

int r = DoSomething(a, b);

This can mean "Push a and b on the stack, later pop the result off the stack and put it into r." or it could mean "Enqueue a, enqueue b, wait for a response on another queue and put it into r."

At the receiving side the code of DoSomething() can pop the actual parameters off the stack or dequeue them. The code can either be explicitly called to do this or can already run in parallel and wait. In any case, since there is only one place to look for the data (top of stack or head of queue) it can be modelled like an ordinary method receiving its input via parameters.
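The two readings of the same call can be written out side by side. In this Python sketch both functions are hypothetical, and DoSomething is assumed to subtract b from a so that the order in which the arguments come off the structure becomes visible.

```python
from collections import deque


def do_something_via_stack(a, b):
    stack = []               # LIFO
    stack.append(a)          # push a
    stack.append(b)          # push b
    # Receiving side: last in, first out, so b comes off before a.
    y = stack.pop()
    x = stack.pop()
    stack.append(x - y)      # push the result
    return stack.pop()       # pop it into r


def do_something_via_queues(a, b):
    requests = deque()       # FIFO towards the service
    responses = deque()      # FIFO back to the client
    requests.append(a)       # enqueue a
    requests.append(b)       # enqueue b
    # Receiving side: first in, first out, so a comes off before b.
    x = requests.popleft()
    y = requests.popleft()
    responses.append(x - y)  # enqueue the response
    return responses.popleft()


print(do_something_via_stack(7, 2), do_something_via_queues(7, 2))  # 5 5
```

Both versions have a single place to put data in and a single place to take it out, which is why the same method-call notation can sit on top of either.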

Tomorrow´s communication structures

Now, here comes the 10,000 dollar question: Is this all there is to communication in distributed systems?

The answer is yes, if you look at current and widely established technology. Code is just using either the stack or a stream/queue to communicate and is happy doing so. Or not? The more I think about it, the more I lean towards no as an answer.

Stack and queue are just the only communication structures we have. So they are our only hammers - and thus every communication problem looks like a nail. Our thinking is constrained by these two (pre)dominant communication structures.

But if we took a look from 30,000 feet we´d see they are just two very common, but nevertheless special, cases of communication structures. Think of the possibilities you would have if you were not limited in your choice of communication structures. Think how intuitively cooperation between distributed software parts could be modeled if you could use not only LIFO and FIFO data structures, but lists, trees, arrays, sets, dictionaries and what not!

And if you like, you can still use a queue in distributed apps. But then you´d always be aware that you have a choice! (Actually you could even use a stack in distributed solutions ;-)

Today you have the choice between passing information between local cooperating pieces of your code via stack or global variables. The general rule is to avoid global variables and use the stack instead because this fosters decoupling. But sometimes global variables are easier and even necessary. Think about how to pass data into another thread. This can only (!) be done using global data structures. (Let´s leave aside the state parameter on ThreadPool.QueueUserWorkItem(). It´s also just hiding a communication structure.)
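The thread case can be sketched in a few lines of Python. The module-level dictionary mailbox and the event result_ready are hypothetical names; together they play the role of the shared structure through which the two threads communicate, and the worker is assumed to add its two inputs.

```python
import threading

# Module-level ("global") structure shared by both threads.
mailbox = {}
result_ready = threading.Event()


def worker():
    # The worker gets no parameters; it finds its input in the shared structure.
    a, b = mailbox["args"]
    mailbox["result"] = a + b
    result_ready.set()


def compute_on_other_thread(a, b):
    result_ready.clear()
    mailbox["args"] = (a, b)  # written before the worker thread starts
    t = threading.Thread(target=worker)
    t.start()
    if not result_ready.wait(timeout=2.0):
        raise TimeoutError("worker did not answer in time")
    t.join()
    return mailbox["result"]


print(compute_on_other_thread(2, 3))  # 5
```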

Virtual Shared Memory

Using global data structures sometimes even today is necessary for communication. And more choices for communication structures sure are a good thing. I guess you agree here; but you might say, "Well, what´s the point? I can use a global tree or use parameters to get data into a method I call."

My point is: We have and require this flexibility for local code, code running in the same address space. But we lack this flexibility in distributed systems!

For local code it´s implicit stack and explicit queue, stack, tree, list, array, set, dictionary etc.

For remote code it´s implicit stream.

That´s it.

I´d say: a poor choice of programming models this is for communication in distributed systems.

Especially if you consider that solutions have long been available. Tuple Space and Virtual Shared Memory (VSM) implementations have been around since the 1980s. But they never really made it onto the Windows/.NET platform. JavaSpaces, on the other hand, has been available on the Java platform for years as part of Jini. Why haven´t we really heard about TSpaces, Linda, Ruple, or Corso? Why have the successes of VSM on Apollo workstations been forgotten?

I don´t know. I really don´t know. But lately I´ve played with Corso for which a .NET binding exists. And I can tell you: it was great fun. Communication in distributed apps suddenly felt much easier. I even did not mind using a queue explicitly - because I felt I had a choice to change the communication structure at any time.
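For readers who have never seen a tuple space, here is a minimal in-process sketch in Python. It is not Corso´s API and not a distributed implementation; the class TupleSpace and its methods are made up for illustration and loosely follow the Linda operations out, rd, and in (called take here because in is a Python keyword). None in a pattern acts as a wildcard.

```python
import threading


class TupleSpace:
    """Minimal in-process tuple space; None in a pattern matches any value."""

    def __init__(self):
        self._tuples = []
        self._changed = threading.Condition()

    def out(self, tup):
        """Put a tuple into the space and wake up waiting peers."""
        with self._changed:
            self._tuples.append(tup)
            self._changed.notify_all()

    def _match(self, pattern):
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def _wait_for_match(self, pattern, timeout):
        # The caller must already hold the condition's lock.
        found = self._changed.wait_for(
            lambda: self._match(pattern) is not None, timeout
        )
        if not found:
            raise TimeoutError(f"no tuple matching {pattern!r}")
        return self._match(pattern)

    def read(self, pattern, timeout=None):
        """Return a matching tuple but leave it in the space."""
        with self._changed:
            return self._wait_for_match(pattern, timeout)

    def take(self, pattern, timeout=None):
        """Return a matching tuple and remove it from the space."""
        with self._changed:
            tup = self._wait_for_match(pattern, timeout)
            self._tuples.remove(tup)
            return tup


space = TupleSpace()


def calc_peer():
    # A peer looks for work in the space and puts its result back.
    _, a, b = space.take(("calc-request", None, None), timeout=2.0)
    space.out(("calc-result", a + b))


peer = threading.Thread(target=calc_peer)
peer.start()
space.out(("calc-request", 2, 3))
_, result = space.take(("calc-result", None), timeout=2.0)
peer.join()
print(result)  # 5
```

Neither side addresses the other; both only know the shape of the tuples they put in and take out.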

So what I´m looking for is a renaissance of VSM on Windows. And I´ll try to help it come about. I´m convinced, more choices are better than fewer.

Having a ubiquitous VSM platform available would make programming in the small (local) and programming in the large (distributed) much more symmetrical. And if you´re in doubt and say, "Hey, but then we´d no longer see the fundamental difference between stack and stream based communication, since there would be just data structures and objects!" I´d answer: No, not necessarily. Like today you can hide a queue behind a method call. Or, on the other hand, yes, sure, and: that´s even the point about VSM!

The bad thing about object orientation in distributed systems in the past was not the objects, but the illusion that method calls were still free. Hiding the data structure of distributed communication was bad. But once the data structure is visible again, interaction with it is explicit. So you can actually know what you´re doing. And then of course such shared communication structures are transactional, meaning you don´t have to fear that interacting with them will drive network traffic through the roof.

I´d say: Let´s give VSM a try and see how far beyond WCF it can take us. WCF will of course stay important as a fundamental unification of several current concepts and technologies. But that doesn´t mean we can´t move up the abstraction ladder a bit, or does it?

 

Posted by ralfw | 3 comment(s)