Building a basic parser
Since going through the regex code in Rotor I've become very interested in how to build parsers. I started building my first parser last week, and I can say that it's a truly fascinating experience. It reminds me of riding a flying fox: you hop on and hold on to a handle that zips along the length of the wire, and during the journey you can scan the surrounds and take in the sights. A parser works much the same way. It travels along the text one character at a time, and at each stop it can inspect the current character and decide what to do with it. A basic parser would look something like this:
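```vb
' Output target: every character the parser visits is appended here
Dim lbl As New Label
Me.Controls.Add(lbl)

' The text to parse and the current parsing index
Dim str As String = "This is some text."
Dim idx As Integer = 0

' Visit each character in turn, advancing the index by hand
While idx < str.Length
    lbl.Text &= str.Chars(idx).ToString()
    idx += 1
End While
```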
As you build up your parser you encounter a myriad of useful Helper routines, some of which include:
- CurrentPos : returns the current parsing index within the text.
- ReadNextChar : peeks at the next character without advancing the parsing index.
- GetNextChar : reads the next character and advances the parsing index past it.
- IsWhitespace : evaluates whether a character is a whitespace character.
- IsDigit : evaluates whether a character is a digit.
- IsWordCharacter : evaluates whether a character is a letter or a digit.
- EOF : determines whether the parsing index has reached the end of the text.
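To make the list concrete, here is a minimal sketch of those helpers gathered into one class. The class name TextScanner, its constructor, and the choice to throw InvalidOperationException when peeking past the end are my own assumptions, not part of the original code. IsDigit and IsWordCharacter lean on the framework's Char.IsDigit and Char.IsLetterOrDigit, so they accept any Unicode decimal digit or letter rather than just ASCII.

```vb
Imports System

' Hypothetical class collecting the helper routines from the list above
Public Class TextScanner
    Private ReadOnly text As String
    Private idx As Integer = 0

    Public Sub New(ByVal source As String)
        ' Treat Nothing as empty text so EOF() is immediately True
        If source Is Nothing Then source = String.Empty
        text = source
    End Sub

    ' Returns the current parsing index within the text
    Public Function CurrentPos() As Integer
        Return idx
    End Function

    ' True once the parsing index has moved past the last character
    Public Function EOF() As Boolean
        Return idx >= text.Length
    End Function

    ' Peeks at the next character without advancing the index;
    ' the bounds check lives here so callers never index the string directly
    Public Function ReadNextChar() As Char
        If EOF() Then Throw New InvalidOperationException("Read past end of text.")
        Return text.Chars(idx)
    End Function

    ' Returns the next character and advances the index past it
    Public Function GetNextChar() As Char
        Dim ch As Char = ReadNextChar()
        idx += 1
        Return ch
    End Function

    Public Shared Function IsWhitespace(ByVal ch As Char) As Boolean
        Return Char.IsWhiteSpace(ch)
    End Function

    ' Unicode decimal digit (category Nd)
    Public Shared Function IsDigit(ByVal ch As Char) As Boolean
        Return Char.IsDigit(ch)
    End Function

    ' Letter or decimal digit
    Public Shared Function IsWordCharacter(ByVal ch As Char) As Boolean
        Return Char.IsLetterOrDigit(ch)
    End Function
End Class

Module TextScannerDemo
    Sub Main()
        Dim scanner As New TextScanner("ab 12")
        While Not scanner.EOF()
            Dim pos As Integer = scanner.CurrentPos()
            Dim ch As Char = scanner.GetNextChar()
            Console.WriteLine("{0}: '{1}' word={2}", pos, ch, TextScanner.IsWordCharacter(ch))
        End While
    End Sub
End Module
```

Putting the bounds check inside ReadNextChar (and having GetNextChar reuse it) means there is exactly one place where an out-of-range index can be detected.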
As you build up your library of helper routines, you push the operations that are prone to throwing exceptions, such as reading past the end of the string, down into the lower levels, and you end up with more abstract code such as:
```vb
While Not EOF()
    lbl.Text &= GetNextChar().ToString()
End While
```
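Because this loop only talks to EOF and GetNextChar, the string-plus-index underneath can be swapped out. As an illustration of my own (not something taken from Rotor), .NET's TextReader already exposes the same pair of operations: Peek() looks at the next character without consuming it and returns -1 when nothing is left, and Read() consumes it. Here is a sketch of the same echo loop over a StringReader; the module name ReaderLoop and the function Echo are hypothetical.

```vb
Imports System
Imports System.IO
Imports System.Text

' Hypothetical sketch: the abstracted loop with the helpers backed by a TextReader
Module ReaderLoop
    Public Function Echo(ByVal reader As TextReader) As String
        Dim output As New StringBuilder()
        ' Peek() returning -1 plays the role of EOF()
        While reader.Peek() <> -1
            ' Read() consumes one character, like GetNextChar()
            output.Append(Convert.ToChar(reader.Read()))
        End While
        Return output.ToString()
    End Function

    Sub Main()
        Using reader As New StringReader("This is some text.")
            Console.WriteLine(Echo(reader))
        End Using
    End Sub
End Module
```

One caveat: the base TextReader.Peek() implementation always returns -1, so this EOF test relies on a reader such as StringReader that overrides Peek.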
In addition to the "IsX"-style functions, which tell you whether a character is special in some way, you can also write scanning routines that consume a whole special textual node, such as a run of whitespace. Here's a working demo of a dirt-simple parser that uses some of these helpers (notice how far the driver loop has been abstracted from the code I started with):
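```vb
' Assumes this code sits inside an ASP.NET Page class with
' System.Web.UI.WebControls imported (for Label).
' Page-level fields shared by the driver loop and the helper routines
Private lbl As New Label()
Private str As String = "This is some text."
Private idx As Integer = 0

Protected Overrides Sub OnLoad(ByVal e As System.EventArgs)
    MyBase.OnLoad(e)
    Me.Controls.Add(lbl)

    '''''''''''''''''''''''''''''''''''''''''''''
    ' Driver Loop
    '''''''''''''''''''''''''''''''''''''''''''''
    While Not EOF()
        SlurpSpaces()
        SlurpWords()
    End While
End Sub

'''''''''''''''''''''''''''''''''''''''''''''
' Specific Parsing Routines for special nodes
'''''''''''''''''''''''''''''''''''''''''''''
' Consumes a run of whitespace characters
Private Sub SlurpSpaces()
    While Not EOF() AndAlso IsWhitespaceChar(ReadNextChar())
        lbl.Text &= GetNextChar().ToString()
    End While
End Sub

' Consumes a run of non-whitespace characters (a "word")
Private Sub SlurpWords()
    While Not EOF() AndAlso Not IsWhitespaceChar(ReadNextChar())
        lbl.Text &= GetNextChar().ToString()
    End While
End Sub

'''''''''''''''''''''''''''''''''''''''''''''
' Helper Routines
'''''''''''''''''''''''''''''''''''''''''''''
' True once the parsing index has moved past the last character
Private Function EOF() As Boolean
    Return idx >= str.Length
End Function

' Returns the next character and advances the parsing index
Private Function GetNextChar() As Char
    Dim ch As Char = str.Chars(idx)
    idx += 1
    Return ch
End Function

' Returns the next character without advancing the parsing index
Private Function ReadNextChar() As Char
    Return str.Chars(idx)
End Function

Private Function IsWhitespaceChar(ByVal ch As Char) As Boolean
    Return Char.IsWhiteSpace(ch)
End Function
```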
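The demo simply echoes its input back into the label. A natural next step, sketched here under my own assumptions, is to have each slurp routine return the text it consumed so the driver loop can collect tokens. The sketch also adds a hypothetical SlurpNumber routine that uses a digit test to recognise a second kind of node. The names WordTokenizer and Tokenize are illustrative, and the Label is replaced by a console program so the sketch runs on its own.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical extension of the demo: Slurp routines return what they consume
Module WordTokenizer
    Private str As String = String.Empty
    Private idx As Integer = 0

    ' Driver loop: skip whitespace, then read one number or word token
    Public Function Tokenize(ByVal text As String) As List(Of String)
        str = If(text, String.Empty)
        idx = 0
        Dim tokens As New List(Of String)()
        While Not EOF()
            SlurpSpaces()
            If EOF() Then Exit While
            If Char.IsDigit(ReadNextChar()) Then
                tokens.Add(SlurpNumber())
            Else
                tokens.Add(SlurpWord())
            End If
        End While
        Return tokens
    End Function

    ' Consumes a run of whitespace characters and returns it
    Private Function SlurpSpaces() As String
        Dim start As Integer = idx
        While Not EOF() AndAlso Char.IsWhiteSpace(ReadNextChar())
            GetNextChar()
        End While
        Return str.Substring(start, idx - start)
    End Function

    ' Consumes a run of decimal digits (a "number" node)
    Private Function SlurpNumber() As String
        Dim start As Integer = idx
        While Not EOF() AndAlso Char.IsDigit(ReadNextChar())
            GetNextChar()
        End While
        Return str.Substring(start, idx - start)
    End Function

    ' Consumes everything up to the next whitespace character (a "word" node)
    Private Function SlurpWord() As String
        Dim start As Integer = idx
        While Not EOF() AndAlso Not Char.IsWhiteSpace(ReadNextChar())
            GetNextChar()
        End While
        Return str.Substring(start, idx - start)
    End Function

    Private Function EOF() As Boolean
        Return idx >= str.Length
    End Function

    Private Function ReadNextChar() As Char
        Return str.Chars(idx)
    End Function

    Private Function GetNextChar() As Char
        Dim ch As Char = str.Chars(idx)
        idx += 1
        Return ch
    End Function

    Sub Main()
        For Each token As String In Tokenize("  Parse 42 words")
            Console.WriteLine("[" & token & "]")
        Next
    End Sub
End Module
```

Running it prints [Parse], [42] and [words] on separate lines. A number is only recognised when a token starts with a digit, so "12abc" comes back as two tokens, "12" and "abc", while "abc12" stays a single word.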
I'm really hoping to find the time to write a full article about creating generic parsers, because it's such a useful topic.