Justin's language processor series: Reusable Lexer and TokenEnumerator
Yesterday I mentioned the series of articles which Justin has been writing on language processing. The series has really opened my eyes to a few techniques that will improve my parsing code and let me produce high-performance language parsers in less time.
Firstly, Justin introduced a basic lexer which tokenizes a source input into a fairly generic array of tokens. He then implemented several enumerator classes to read through a token stream and return application-specific, typed tokens. Moving forward these two classes will be the staple items of all my future language processors; the lexer probably wouldn't change much, and the enumerator would only need a minor alteration to the kinds of typed tokens it returns.
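To make that concrete, here's a minimal sketch of what the lexer half might look like. I'm assuming C# since the series uses .NET-style names, and Token, TokenType, Tokenize() and ReadWhile() are my own illustrative choices rather than Justin's actual API:

```csharp
using System;
using System.Collections.Generic;

public enum TokenType { Word, Number, Symbol, EndOfStream }

public class Token
{
    public TokenType Type;
    public string Value;
    public Token(TokenType type, string value) { Type = type; Value = value; }
}

public class Lexer
{
    private readonly string _input;
    private int _position;

    public Lexer(string input) { _input = input; }

    // The monotonous plumbing lives here, written and tested once.
    private bool IsEndOfStream() => _position >= _input.Length;
    private char PeekChar() => _input[_position];
    private char ReadNextChar() => _input[_position++];
    private static bool IsSpecialChar(char c) =>
        !char.IsLetterOrDigit(c) && !char.IsWhiteSpace(c);

    // Tokenize the entire source input into a generic token list.
    public List<Token> Tokenize()
    {
        var tokens = new List<Token>();
        while (!IsEndOfStream())
        {
            char c = PeekChar();
            if (char.IsWhiteSpace(c))
                ReadNextChar();                  // skip whitespace
            else if (char.IsDigit(c))
                tokens.Add(new Token(TokenType.Number, ReadWhile(char.IsDigit)));
            else if (char.IsLetter(c))
                tokens.Add(new Token(TokenType.Word, ReadWhile(char.IsLetter)));
            else if (IsSpecialChar(c))
                tokens.Add(new Token(TokenType.Symbol, ReadNextChar().ToString()));
        }
        tokens.Add(new Token(TokenType.EndOfStream, string.Empty));
        return tokens;
    }

    // Consume characters while the predicate holds and return the lexeme.
    private string ReadWhile(Func<char, bool> predicate)
    {
        int start = _position;
        while (!IsEndOfStream() && predicate(PeekChar()))
            _position++;
        return _input.Substring(start, _position - start);
    }
}
```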
By the time you get to writing a parser you already have all of the common functions for reading and moving through input stored away in these two classes, allowing you to focus on implementing the grammar parsing and state transition algorithms without having to bother with the plumbing. The added bonus is that, while the common plumbing functions - such as ReadNextChar(), IsSpecialChar() and IsEndOfStream() - are monotonous to write, they are also a real source of errors; having common classes to carry forward (the lexer and enumerator classes) means that future parsers are built atop tested code, eliminating those silly off-by-one mistakes.
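Here's a similarly hedged sketch of the enumerator half, building on the Token type above. The enumerator pattern here is .NET's IEnumerator<T>; the typed token classes (KeywordToken, IdentifierToken) and the keyword set are hypothetical stand-ins for whatever the application actually needs:

```csharp
using System.Collections;
using System.Collections.Generic;

public abstract class TypedToken
{
    public string Value;
    protected TypedToken(string value) { Value = value; }
}

public class KeywordToken : TypedToken
{
    public KeywordToken(string value) : base(value) { }
}

public class IdentifierToken : TypedToken
{
    public IdentifierToken(string value) : base(value) { }
}

public class TokenEnumerator : IEnumerator<TypedToken>
{
    private static readonly HashSet<string> Keywords =
        new HashSet<string> { "if", "else", "while" };

    private readonly IList<Token> _tokens;
    private int _index = -1;

    public TokenEnumerator(IList<Token> tokens) { _tokens = tokens; }

    public TypedToken Current { get; private set; }
    object IEnumerator.Current => Current;

    // Advance to the next token, mapping each generic token to a typed one.
    // (A real implementation would map Number and Symbol tokens too.)
    public bool MoveNext()
    {
        if (++_index >= _tokens.Count)
            return false;

        Token token = _tokens[_index];
        if (token.Type == TokenType.EndOfStream)
            return false;

        Current = token.Type == TokenType.Word && Keywords.Contains(token.Value)
            ? (TypedToken)new KeywordToken(token.Value)
            : new IdentifierToken(token.Value);
        return true;
    }

    public void Reset() { _index = -1; Current = null; }
    public void Dispose() { }
}
```

Retargeting a new language processor then comes down to swapping in different typed token classes and a different mapping inside MoveNext(), while the lexer stays untouched.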
My key learnings to this point have been:
- Lexing into tokens first reduces the amount of parsing logic needed
- Implementing the enumerator pattern is an elegant way of reading through an input stream (see the sketch after this list)
- Separating the stages of language processing can improve performance, re-use and maintainability
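To show how the two sketches above would fit together, a hypothetical front-end might drive them like this; note that the loop never touches a single character-level concern:

```csharp
var tokens = new Lexer("while count do stuff").Tokenize();
var enumerator = new TokenEnumerator(tokens);
while (enumerator.MoveNext())
    Console.WriteLine($"{enumerator.Current.GetType().Name}: {enumerator.Current.Value}");
// Prints KeywordToken for "while", then IdentifierToken for "count", "do" and "stuff".
```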
I'm still working through some of the more recent articles, which deal with state transition graphs, state machine algorithms and recursive descent (top-down) parsing. I'll post another summary of the learnings and cool features of that material once I've covered it well enough.