Async Programming can sometimes complicate algorithms (Part 1 of 2)...
Alrighty, so the algorithm of the day is going to be NNTP, since I happen to have some NNTP classes on my disk that demonstrate the issue at hand. Note there are several ways to read from sockets, which I'll quickly enumerate, along with their pitfalls.
1. Receive. This is a blocking method that doesn't return until data arrives. If you aren't running things on a separate thread, the call will hang your application until some data comes in.
2. Receive using Available. First you check the socket's Available property to see if data exists, then you read it in. This is a bit better, since you can control the read operation so it doesn't block, but you are stuck polling for data.
3. BeginReceive/EndReceive. Okay, this one requires callbacks to process the incoming data as it becomes available. Callbacks can take a linear read/response algorithm and turn it into a bunch of smaller functions that control the asynchronous processes. At times this can obfuscate the underlying algorithm.
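To make the three options a bit more concrete, here is a minimal sketch of each one against a loopback connection. The class and method names (ReadStyles, Connect, ReadBlocking, ReadPolled, ReadWithCallback) are made up for illustration and aren't part of my NNTP classes. Each method also assumes the whole message shows up in a single read, which is only reasonable for a tiny message on the local machine.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

public static class ReadStyles {
    // Hypothetical helper: opens a loopback connection so the sketch can run
    // on its own. Returns the client socket; 'server' is the other end.
    public static Socket Connect(out Socket server) {
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        Socket client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        client.Connect(listener.LocalEndpoint);
        server = listener.AcceptSocket();
        listener.Stop();
        return client;
    }

    // 1. Blocking Receive: the call does not return until some data arrives.
    public static string ReadBlocking(Socket s) {
        byte[] buffer = new byte[256];
        int read = s.Receive(buffer);
        return Encoding.ASCII.GetString(buffer, 0, read);
    }

    // 2. Check Available first, so Receive is only called when data is waiting.
    public static string ReadPolled(Socket s) {
        while (s.Available == 0) {
            Thread.Sleep(10); // nothing yet, give the time slice away
        }
        byte[] buffer = new byte[s.Available];
        int read = s.Receive(buffer);
        return Encoding.ASCII.GetString(buffer, 0, read);
    }

    // 3. BeginReceive/EndReceive: the data is handled in a callback.
    // Waiting on the event here is only so the sketch can return a value;
    // real asynchronous code would carry on from inside the callback.
    public static string ReadWithCallback(Socket s) {
        byte[] buffer = new byte[256];
        string result = null;
        ManualResetEvent done = new ManualResetEvent(false);
        s.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None, delegate(IAsyncResult ar) {
            int read = s.EndReceive(ar);
            result = Encoding.ASCII.GetString(buffer, 0, read);
            done.Set();
        }, null);
        done.WaitOne();
        return result;
    }
}
```

Even in a toy like this, the third version already needs a callback and a signal just to hand back one string, which is the kind of clutter I'm talking about.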
So, ideally, we'd like to establish an algorithm, make it look good, and still use some form of asynchronous programming so we can easily cancel out of operations that appear to be taking too long. To do this, I'll take a single NNTP command, LIST OVERVIEW.FMT, and demonstrate a method of using Receive to process the command's incoming data. What is important here is the while loop and the initial if statement used to control the processing of data. In a network environment you can't treat 0 bytes available on the stream as a sign that the message is complete, and you can't rely on the stream being closed either, since the connection stays open between commands. You have to actually process the incoming data and determine, based on the underlying protocol, that the message has completed. That makes all of the looping constructs and data checks very important. Let me just toss the code in for the command:
using System;
using System.Collections;
using System.Net.Sockets;

public class NewsReaderCommand_ListOverviewFormat : NewsReaderCommand {
    protected string[] headers = null;

    // 'message' and 'success' are members inherited from NewsReaderCommand.
    public override void RunCommand(Socket newsSocket) {
        newsSocket.Send(System.Text.Encoding.ASCII.GetBytes("LIST OVERVIEW.FMT\r\n"));

        bool complete = false;
        string groupText = "";
        ArrayList groups = new ArrayList();

        while (!complete) {
            if (newsSocket.Available > 0) {
                // Read whatever has arrived and decode only the bytes actually received.
                byte[] b = new byte[newsSocket.Available];
                int read = newsSocket.Receive(b);
                groupText += System.Text.Encoding.ASCII.GetString(b, 0, read);
            } else {
                // Nothing to read yet, so give the time back to other threads.
                System.Threading.Thread.Sleep(50);
            }

            if (groupText.Length > 0) {
                // Consume every complete CRLF-terminated line buffered so far,
                // stopping once the terminator has been seen.
                int lineEnd;
                while (!complete && (lineEnd = groupText.IndexOf("\r\n")) > -1) {
                    string response = groupText.Substring(0, lineEnd);
                    groupText = groupText.Substring(lineEnd + 2);

                    if (message == null) {
                        // The first line is the NNTP status response.
                        message = new NewsResponse(response);
                    } else if (response == ".") {
                        // A lone period marks the end of the message.
                        success = true;
                        complete = true;
                    } else {
                        try {
                            groups.Add(response);
                        } catch {
                            Console.WriteLine(response);
                            throw;
                        }
                    }
                }
            }
        }

        headers = (string[]) groups.ToArray(typeof(string));
    }

    public string[] Headers {
        get {
            return headers;
        }
    }
}
Okay, so we poll the news socket for available data, and if any is available we read it in. Note this could be 1 byte or a thousand bytes, so we just allocate a buffer based on what is available. We then translate the bytes into ASCII (1 byte per character, so hopefully no worries about truncated characters) and append the result to our string buffer for later processing. If no data is available we simply sleep the current thread, giving that time back for other threads in our application to run.
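The "1 byte per character" remark is easy to check with a small sketch. ChunkDecoding and DecodeChunks are made-up names; the method decodes a byte array in two separate pieces and concatenates the results, which is what the polling loop effectively does when a response arrives across several reads.

```csharp
using System;
using System.Text;

public static class ChunkDecoding {
    // Decodes data[0..splitAt) and data[splitAt..] independently, then joins
    // the two strings, mimicking two separate reads from the socket.
    public static string DecodeChunks(Encoding encoding, byte[] data, int splitAt) {
        string first = encoding.GetString(data, 0, splitAt);
        string second = encoding.GetString(data, splitAt, data.Length - splitAt);
        return first + second;
    }
}
```

With Encoding.ASCII, splitting the bytes of "Subject:\r\n" at any position gives back the original text. With a multi-byte encoding that isn't guaranteed: the UTF-8 bytes of "caf\u00e9" split at index 4 land in the middle of the two-byte final character, and the joined result no longer matches the original string.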
Next we process any lines that have been made available. Each line in the response should be either the NNTP response message (the first line), part of the response body, or the end-of-message terminator (a lone period). Processing the text of the message isn't hard, and we keep going until we find the end-of-message terminator, making sure to set some flags to break out of our data reading loop.
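The line handling can be pulled out and looked at on its own. Here is a minimal sketch under made-up names (NntpLines, TakeCompleteLines, IsTerminator) showing how a partial line simply stays in the buffer until the rest of it arrives with a later read.

```csharp
using System;
using System.Collections.Generic;

public static class NntpLines {
    // Moves every complete CRLF-terminated line out of 'buffer' into the
    // returned list. Any trailing partial line is left in 'buffer' so the
    // next chunk of data can be appended to it.
    public static List<string> TakeCompleteLines(ref string buffer) {
        List<string> lines = new List<string>();
        int end;
        while ((end = buffer.IndexOf("\r\n", StringComparison.Ordinal)) > -1) {
            lines.Add(buffer.Substring(0, end));
            buffer = buffer.Substring(end + 2);
        }
        return lines;
    }

    // True when the line is the lone period that ends a multi-line response.
    public static bool IsTerminator(string line) {
        return line == ".";
    }
}
```

Feeding it the chunks "215 Order of fields\r\nSubj", "ect:\r\nFrom:\r\n.\r" and "\n", appending each one to the buffer before calling TakeCompleteLines, yields the lines "215 Order of fields", "Subject:", "From:" and ".", with nothing left over in the buffer. Where the chunk boundaries fall doesn't matter.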
If you look at the code, the algorithm for this particular command is readily apparent. Neither the data reading code nor the message processing code overshadows the algorithm or makes it hard to read or understand. At this point we have a pseudo-asynchronous way to process this particular NNTP message. The algorithm processes the returned data in real time, meaning we don't read the entire message into a buffer and then start processing it later (the class handles both the IO and the processing as a pair rather than an IO first, processing later deal). Porting this into asynchronous sockets code won't look pretty, I'm thinking. My original implementation was based around asynchronous IO, and because of the ugliness (I consider it ugliness) I resorted to helper classes for reading an entire message before having it processed. That way I wrote the IO code as a black box that just gave me some bytes back. This resulted in some large byte arrays getting passed back (listing groups on msnews.microsoft.com, for example), and so memory was becoming an issue.
I'm not nearly as stupid this time though, and I'm not just hacking something together like I was when I originally wrote the code. I have a purpose: to clearly write the above NNTP command algorithm using asynchronous sockets without burying the algorithm itself under a bunch of asynchronous IO code. I'll also be adding in the timeout code for controlling command processing time. You'll notice it doesn't exist in the code above. This is because the LIST OVERVIEW.FMT command doesn't really return a good deal of information, so I didn't feel the need for a timeout. However, if I never receive a response, then a timeout would be appropriate. It just wasn't high on the list of commands that needed the extra control code.
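For the polling approach, a timeout doesn't take much. What follows is a rough sketch and not the Part 2 code: TimedRead and ReadMultiLineResponse are made-up names, the deadline covers the whole command rather than each individual read, and to keep it short the sketch buffers the whole response instead of processing lines as they arrive.

```csharp
using System;
using System.Diagnostics;
using System.Net.Sockets;
using System.Text;
using System.Threading;

public static class TimedRead {
    // True once the text ends with CRLF, a lone period, and CRLF.
    static bool EndsWithTerminator(StringBuilder text) {
        const string terminator = "\r\n.\r\n";
        if (text.Length < terminator.Length) {
            return false;
        }
        return text.ToString(text.Length - terminator.Length, terminator.Length) == terminator;
    }

    // Polls 'socket' until a complete multi-line response has arrived and
    // returns its raw text. Throws TimeoutException if 'timeout' elapses
    // while no data is waiting and the response is still incomplete.
    public static string ReadMultiLineResponse(Socket socket, TimeSpan timeout) {
        Stopwatch clock = Stopwatch.StartNew();
        StringBuilder text = new StringBuilder();
        while (true) {
            if (socket.Available > 0) {
                byte[] buffer = new byte[socket.Available];
                int read = socket.Receive(buffer);
                text.Append(Encoding.ASCII.GetString(buffer, 0, read));
                if (EndsWithTerminator(text)) {
                    return text.ToString();
                }
            } else if (clock.Elapsed > timeout) {
                throw new TimeoutException("No complete NNTP response within " + timeout);
            } else {
                Thread.Sleep(50); // nothing yet, let other threads run
            }
        }
    }
}
```

The deadline is only checked when no data is waiting, so a server that keeps sending is never cut off in the middle of a read. Whether that is the right policy depends on the command.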
Well, keep your eyes peeled for Part 2 and a fully asynchronous version of the LIST OVERVIEW.FMT command ;-)