David Stone's Blog

I'm open to suggestions for a subtitle here! (Really!)

More Compiler Construction stuff...

I realize this will bore those of you who don't go to UCSD or aren't interested in how one would go about writing an XQuery interpreter. Sorry. I promise that I'll come up with some interesting posts about LINQ and stuff in the coming weeks. I've got finals coming up though. So it's not likely to be long stuff.

Anyway. Compilers people. Read on.

Decisions, decisions...

For the interpreter, you really only have two choices:

  1. Build an interpreter that goes through your AST and evaluates it. (Something like the visitor pattern. It's just a class that walks the tree...TreeWalker?)
  2. Define evaluate methods on your AST classes that evaluate their children and return a result.

What I Did

For my interpreter, I did 2. At the time, it seemed like a brilliant idea. I mean all I had to do was make it so that each class overrode the evaluate method and returned the appropriate result. So, for instance, my Range class had an evaluate method that looked like so (without type checking or semantic requirements:

public XSequence evaluate(Context context) throws Exception
    BigInteger one = new BigInteger("1");
    BigInteger from = new BigInteger(args[0].toString());
    BigInteger to = new BigInteger(args[1].toString());
    // Return a sequence of integers
    XSequence s = new XSequence();
    while(from.compareTo(to) <= 0)
        s.add(new XInteger(from));
        from = from.add(one);
    return s;

Now this is great. When I have a Range object, it just gets evaluated (in the given context) and returns an XSequence of XIntegers, right? Fantastic. For the most part, this worked.

What I Should Have Done

Honestly, looking back, it makes more sense to write a treewalker class that goes through your AST and does stuff based on the type of node it's going to find. This is how antlr (a very popular and fully featured compiler construction toolkit) does things and it makes more sense in certain cases. Especially when it comes to the error checking in function call code and type checking in your context code.

Universal Truths

This class is not context free...

Regardless of what you choose to do, you can rest assured that you'll need a Context class. At least, that's what I called it. You could also call it the environment or something like that. In my case, the Context class contained all the global variables, all the global functions, all the namespaces, etc. Methods to add variables and remove them (bind and unbind, essentially, so that you could deal with method parameters with the same name as global variables), etc. I also threw a few static methods in there to deal with type checking for the application so that I always had access to those methods whenever I needed to use them. It's also handy for doing type checking on function calls and return types.

Typing is important

Speaking of types. Build the types first. I swear. You must do this. And define as many of the built in functions in your type classes. For instance, you need to implement set theoretic operators (union, intersect, except) on sequences, right? How about defining:

public XSequence union(XSequence other);
public XSequence intersect(XSequence other);
public XSequence except(XSequence other);

That way you make each type responsible for its share of the built in functions. And it makes your built in function code much much cleaner.

Another nice thing about abstracting your types like this and shoving as much functionality as possible into them is that you can more easily write unit tests for your own type classes than you can for the AST classes. That way you can ensure that your set theoretic operators are working and that all those other built-ins are working just fine. After all, those are the building blocks of every XQuery program.

Built In Functions

I also had a class called BuiltInFunctions. This was pretty cool. So in XQuery, there's a FunctionDeclaration, right? You probably have a node that serves this purpose in your AST. You'll also have a FunctionCall that asks the Context class for a matching function declaration, hands over the arguments, and executes the ExpressionElement that is the body of the FunctionDeclaration. Well, since I was using the evaluate method I described earlier, I was able to use Java's anonymous classes for the built-in functions. Like so:

    new FunctionDeclaration("op:intersect", "sequence",
    new ExpressionElement("intersectBLOCKED EXPRESSION
        public XSequence evaluate(Context context) throws Exception
            XSequence set1 = (XSequence)context.getVariable("set1").evaluate(context);
            XSequence set2 = (XSequence)context.getVariable("set2").evaluate(context);
            return set1.intersect(set2);
    new ArgumentDeclaration("set1", "sequence"),
    new ArgumentDeclaration("set2", "sequence")));

So now my global functions variable has a function declaration for the built in op:intersect function...and the fact that my new intersectExpression element has an evaluate method that will return the set intersection of the two arguments. Two things to note here: 1) It's clean because of the methods I implemented on my types. 2) It negates the need for me to make a ton of classes for each built in function.

If you do decide to go the route of the tree-walker, you'll probably do something like this. But I'm tired and can't decide exactly how I'd work it. Probably every time I saw a built in function being called, I'd just call the method in my treewalker class that executed that built in.

Start Early...

My last comment would be to start this one early. It will take lots of time.

You guys all know my phone number. (If not, Scott has it.) And I'm in the channel enough. So if you've got questions, feel free to ask.


... said:

Gute Arbeit hier! Gute Inhalte.

# March 3, 2009 9:10 PM

weblogs.asp.net said:

More compiler construction stuff.. Outstanding :)

# March 25, 2011 8:53 PM

weblogs.asp.net said:

More compiler construction stuff.. WTF? :)

# April 24, 2011 3:00 AM

weblogs.asp.net said:

More compiler construction stuff.. May I repost it? :)

# June 7, 2011 2:53 AM