Asynchronous Regular Expressions using the ThreadPool and a cancellation model.

There are probably numerous ways to shoot holes in this class, but this is a 30 minute attempt at providing an asynchronous framework for running regular expressions.  The crux of the model is to allow for cancellation of long-running expressions that might otherwise have a negative impact on your application or web server.

You need to start with an IAsyncResult implementation that supports the general asynchronous infrastructure.  We'll also have a bunch of extra information for the regular expression work, namely an expression and input string.  What I've come up with so far supports all of the features I need including cancellation, full reporting of passed in procedures on the async object for later processing of the work item, and fairly decent protection against the internal members controlling the asynchronous operation.

public class AsynchronousRegexResult : IAsyncResult {
    internal object asyncState = null;
    internal Regex expression = null;
    internal Match match = null;
    internal string inputString = null;
    internal bool complete = false;
    internal bool completedSynchronously = false;
    internal bool cancelled = false;
    internal ManualResetEvent waitHandle = null;
    internal AsyncCallback callback = null;
    internal Thread currentThread = null;

    internal AsynchronousRegexResult(Regex innerExpression, string inputString, AsyncCallback callback, object state) {
        this.asyncState = state;
        this.expression = innerExpression;
        this.inputString = inputString;
        this.callback = callback;

        if ( this.expression == null || this.inputString == null ) {
            this.complete = true;
            this.completedSynchronously = true;
        }
        this.waitHandle = new ManualResetEvent(this.completedSynchronously);
    }

    public object AsyncState { get { return this.asyncState; } }
    public bool CompletedSynchronously { get { return this.completedSynchronously; } }
    public bool IsCompleted { get { return this.complete; } }
    public WaitHandle AsyncWaitHandle { get { return this.waitHandle; } }

    public Regex Expression { get { return this.expression; } }
    public string Input { get { return this.inputString; } }
    public Match Result { get { return this.match; } }

    public bool Cancel() {
        if ( this.cancelled || this.complete ) {
            return false;
        }

        this.cancelled = true;
        if ( this.currentThread != null ) {
            this.currentThread.Abort();
        }
        return true;
    }

    internal void Complete() {
        this.complete = true;
        this.waitHandle.Set();

        if ( callback != null ) {
            callback(this);
        }
    }
}

Three additional methods are needed to make this work.  First, we need a BeginInvoke to launch our operation into the thread pool and an EndInvoke to retrieve the results.  We don't really need the EndInvoke, but it builds in semantics for waiting on results if they aren't already available.  Finally, the ThreadPool_WaitCallback will be the protected method that enables the true cancellation semantics and the ability to run our expressions.

using System;
using System.Threading;
using System.Text.RegularExpressions;

public sealed class AsynchronousRegex {

    // Removed result class for brevity, but it is a public nested class.

    public static IAsyncResult BeginInvoke(Regex innerExpression, string inputString, AsyncCallback callback, object state) {
#if DEBUG
        Console.WriteLine("AsynchronousRegex.BeginInvoke::Enter");
#endif
        AsynchronousRegexResult result = new AsynchronousRegexResult(innerExpression, inputString, callback, state);
       
        if ( !result.complete ) {
#if DEBUG
            Console.WriteLine("AsynchronousRegex.BeginInvoke::ThreadPool Queue");
#endif
            ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadPool_WaitCallback), result);
        }
       
#if DEBUG
        Console.WriteLine("AsynchronousRegex.BeginInvoke::Leave");
#endif
        return result;
    }
   
    private static void ThreadPool_WaitCallback(object state) {
#if DEBUG
        Console.WriteLine("AsynchronousRegex.ThreadPool_WaitCallback::Enter");
#endif
        AsynchronousRegexResult result = state as AsynchronousRegexResult;
       
        if ( result != null && !result.cancelled ) {
#if DEBUG
            Console.WriteLine("AsynchronousRegex.ThreadPool_WaitCallback::Setting Current Thread");
#endif
            result.currentThread = Thread.CurrentThread;
#if DEBUG
            Console.WriteLine("AsynchronousRegex.ThreadPool_WaitCallback::Gathering Match");
#endif
            try {
                result.match = result.expression.Match(result.inputString);
            } catch (ThreadAbortException) {
#if DEBUG
                Console.WriteLine("AsynchronousRegex.ThreadPool_WaitCallback::Thread Aborted, Resetting");
#endif
                Thread.ResetAbort();
            } finally {
                result.currentThread = null;
            }
#if DEBUG
            Console.WriteLine("AsynchronousRegex.ThreadPool_WaitCallback::Completing Asynchronous Result");
#endif
            result.Complete();
        }
#if DEBUG
        Console.WriteLine("AsynchronousRegex.ThreadPool_WaitCallback::Leave");
#endif
    }
    
    public static Match EndInvoke(IAsyncResult ar) {
        AsynchronousRegexResult result = ar as AsynchronousRegexResult;
       
        if ( result != null ) {
            result.AsyncWaitHandle.WaitOne();
            return result.match;
        } else {
            throw new Exception("EndInvoke was called with the wrong async result");
        }
    }
}

As mentioned this class is not 100% thread-safe, but will work for the most part.  I ran it for a couple of hours and didn't run into any troubles.  Not a multi-threaded system though so concurrent access issues wouldn't surface easily (context switching might, but in most cases no).

Published Saturday, May 22, 2004 12:40 AM by Justin Rogers
Filed under:

Comments

Saturday, May 22, 2004 6:55 AM by TrackBack

# Running your expressions asynchronously and making them cancellable.

Saturday, May 22, 2004 5:34 PM by David Levine

# re: Asynchronous Regular Expressions using the ThreadPool and a cancellation model.

An interesting class. I might extend this so that the actual operation was called via a generic interface so that the actual work/cancellable operation can be anything.

I whipped up a little test based on this and found a couple of things you might consider.

First, in the ThreadPool_WaitCallback method you should either add a generic catch block or restructure it so that the call to result.Complete(); is in the finally block. The reason is that if the call to perform the work operation (in your example, the Match(...) call) ever throws an exception other then ThreadAbort, the call to result.Complete will never execute and you will instead have an unhandled exception. This is likely to result in some higher level code hanging.

The other thing is that you might use Thread.Interrupt rather then Thread.Abort to signal the worker thread to interrupt its processing. For one thing, your callback might not have sufficient security privileges to call ResetAbort, and for another, using Abort has undesirable side effects that would argue against using it in a generic manner - unless you know that the code you are interrupting can gracefully handle an abort I'd use some other technique.



Saturday, May 22, 2004 6:15 PM by Justin Rogers

# re: Asynchronous Regular Expressions using the ThreadPool and a cancellation model.

I have developed a generic version of the above class, but it doesn't help people trying to do their operations today. Nor does it solve very many problems.

Generic methods can't accept variable numbers of parameters, so the BeginInvoke/EndInvoke method signatures constantly change.

Since this is for regular expressions, I get more performance having the layout abovce. However, you can add an exception local, and move Complete into the finally block, as well as setting the caught exception (if anything but a ThreadAbort) on the async result. When the user calls EndInvoke you re-throw the exception.

I tested the abort code with and without ResetAbort and it doesn't thrash the system either way.

How does Interrupt buy you anything? Interrupt won't cancel a running thread, only Abort does that. Interrupt is capable of signalling a thread that is in a wait state, not a running state. As for the side-effects, I'm well aware of them. With another 50-100 lines of code we could remove those side effects, and make this completely unsuitable for use on a web server.

The two choices are app-domains or using something closer to a native thread. Both approaches make the code iffy in a server environment while making it much more stable and robust in a client environment.
Saturday, May 22, 2004 11:57 PM by TrackBack

# Generic, Cancellable, Asynchronous operations? Yeah, I'll blog about that.

Sunday, May 23, 2004 7:26 AM by David Levine

# re: Asynchronous Regular Expressions using the ThreadPool and a cancellation model.

My original thought was that the class was nicely layed out and a generic version of it would be useful to have; my comments mainly followed from that starting point. By targeting the class to a single type the design is simplified and the comments do not necessarily apply. e.g. if Regex.Match will never throw an exception then there's no need to move the call to Complete.

Using Interrupt has pros and cons. It will interrupt the target thread when it makes a call that puts it into an alertable wait state so its effects are more predictable, but the downside is that if the target thread does not make such a call it will have no effect on it. Again, this may not be useful with Match.

I've had mixed results using abort; if the thread has wandered off into unmanaged code it wont be delivered until it returns to managed code. I've had better results when unloading appdomains - I could unload an appdomain even when I could not abort a specific thread in it.

I don't quite see how using a native thread would work. Would it use the RegEx class, and would you use native signalling methods to interrupt it?

Sunday, May 23, 2004 5:00 PM by Justin Rogers

# re: Asynchronous Regular Expressions using the ThreadPool and a cancellation model.

Native threading API's can be used to ensure that a given thread gets shut-down. This is the not nice way to go about things and basically has the layout of:

Cancel() -> Abort() -> Oops, we still haven't aborted after some time limit? -> TerminateThread.

The goal is to ensure things get cancelled. The Win32 threading API's provide a fast easy way to cancel the operation, while at the same time introducing a slight chance of data corruption. Again, the concept is that in a server environment, you don't want to be loading a large number of AppDomain's. You could share AppDomain's but then bringing one down would mean cancelling some other code that was running in the domain as well.
Monday, May 24, 2004 7:49 AM by David Levine

# re: Asynchronous Regular Expressions using the ThreadPool and a cancellation model.

I agree, it definitely is not a nice way to go about shutting down a thread :-) I experimented with TerminateThread back in runtime v1.0 and decided to not use it in .NET.

One issue was getting the native system Win32 thread handle - there's no exposed mechanism for getting it. I've seen an undocumented method that uses hardocded offsets into the thread class. I came up with an alternative method that worked well enough for my experiment - get the list of all the system threads in my process before creating my worker thread, create the worker thread, get a new list of system threads and look for the new thread and use its handle. If I could not find it or if there were more then one new thread, I would terminate the worker thread and repeat - eventually I would find the handle.

After that I was able to terminate the underlying system thread, but the problem was that the part of the runtime that controlled threads and execution state has no idea that I had yanked the rug out from underneath it. It still thought the managed thread object was alive and well even though I had terminated the system thread that it was mapped to. Perhaps there's a way to tell the runtime that the thread had died but I didn't find it.

Also, TerminateThread does not clean up after a thread so that the thread's Win32 stack is still allocated (1Meg of virtual address space), owned objects (mutexes, etc.) are still owned, so deadlocks can result, resources may leak, data can get corrupted, etc. I've used TerminateThread in control engines but only as an absolute last resort.

Future (present???) versions of the runtime are supposed to support models built on execution units other then threads, such as fibers. I haven't worked with fibers so I don't know how well they map into using the native API to control them from the managed world.

Even though creating/unloading an appdomain is a somewhat heavy operation it does not have all those issues. From what I've observed it appears to be fairly aggressive at unloading an appdomain even when a thread with stack in the doomed appdomain has become "stuck" and unresponsive. If the task was a very long running operation it might be worthwhile to give it its own appdomain, but as you point out there are other issues to consider on a server - all designs involve tradeoffs.


Monday, July 26, 2004 7:55 AM by Darren Neimke

# re: Asynchronous Regular Expressions using the ThreadPool and a cancellation model.

Here's a pattern and an input string to test against:

pattern: a([^b]+|.)*c$
input: accccccccccccccc.

To crank up the time it takes for the Match to complete just add more padding into the middle of the string.

BONUS POINTS : for the first person to correctly reply with how many operations are performed against the above regex with an NFA engine!
Monday, July 26, 2004 12:20 PM by TrackBack

# Running matches Asynchronously for a bit of protection

Monday, August 09, 2004 10:35 AM by TrackBack

# Running matches Asynchronously for a bit of protection

Leave a Comment

(required) 
(required) 
(optional)
(required)