Building a basic parser

Since going through the regex code in Rotor I've become very interested in how to build parsers. I started building my first parser last week and I can say that it's a truly fasciniating experience. It reminds me of flying on a flying-fox ride. When you hop on the flying-fox ride you hold on handle which zips along the length of the wire; during your journey you can scan the surrounds and take in many pleasurable sights. Parsers are really no different. A basic parser would look something like this:

Dim lbl As New Label
Me.Controls.Add(lbl)
Dim str As String = "This is some text." Dim idx As Integer = 0
While idx < str.Length lbl.Text &= str.Chars(idx).ToString() idx += 1 End While

 

As you build up your parser you encounter a myriad of useful Helper routines, some of which include:

  • CurrentPos : returns the current parsing index within the text.
  • ReadNextChar : peek ahead at the next character.
  • GetNextChar : peek ahead at the next character and increment the parsing index.
  • IsWhitespace : Evaluate whether a char is a whitespace character
  • IsDigit : Evaluate whether a char is a number character
  • IsWordCharacter : Evaluate whether a char is a number or letter character
  • EOF : Used to determine whether the current parsing index has reached the end of the text.

As you build up your libraries of helper routines you push the operations which are prone to cause exceptions - such as boundary errors - down to lower levels and end up with abstracted code such as:

While Not EOF()
     lbl.Text &= GetNextChar().ToString()
End While

 

In addition to the "IsX" type functions which allow you to determine the "specialness" of a character, you can also create specific scanning/parsing routines which parse special textual nodes - such as whitespace. Here's a working demo of a dirt simple parser which uses some of these helpers (notice how abstracted the code in the final loop is from the code that I started with):

    Protected Overrides Sub OnLoad(ByVal e As System.EventArgs)
        '''''''''''''''''''''''''''''''''''''''''''''
        ' Driver Loop
        '''''''''''''''''''''''''''''''''''''''''''''
        While Not EOF()
            SlurpSpaces()
            SlurpWords()
        End While
    End Sub

    '''''''''''''''''''''''''''''''''''''''''''''
    ' Specific Parsing Routines for special nodes
    '''''''''''''''''''''''''''''''''''''''''''''
    Private Sub SlurpSpaces()
        While Not EOF() AndAlso IsWhitespaceChar(ReadNextChar())
            lbl.Text &= GetNextChar().ToString()
        End While
    End Sub
Private Sub SlurpWords() While Not EOF() AndAlso Not IsWhitespaceChar(ReadNextChar()) lbl.Text &= GetNextChar().ToString() End While End Sub ''''''''''''''''''''''''''''''''''''''''''''' ' Helper Routines ''''''''''''''''''''''''''''''''''''''''''''' Private Function EOF() As Boolean Return idx >= str.Length End Function
Private Function GetNextChar() As Char GetNextChar = str.Chars(idx) idx += 1 End Function
Private Function ReadNextChar() As Char Return str.Chars(idx) End Function
Private Function IsWhitespaceChar(ByVal ch As Char) As Boolean Return ch.ToString.Trim.Length = 0 End Function

 

I'm really hoping to find the time to build up a full article about creating generic parsers because it's a really useful topic.

11 Comments

  • It's been a while since I've created a parser, but I believe what you have here is a lexer, not a parser. A lexer turns free form text into a series of tokens. A parser turns the series of lexicons into meaning.



    For example, take the statement:



    If x = 4 then Response.Write(&quot;foobar&quot;)



    Lexing would identify that the statement consists of the lexicons:



    If

    x

    =

    4

    then

    Response.Write(&quot;foobar&quot;)



    Parsing would then turn the lexicons into, say, an abstract syntax tree corresponding to VB.NET's grammar.



    Again, it's been too long since I've looked at this stuff, so my terminology may be off/incorrect. :-)

  • Thanks so much Frans and Scott for taking the time to comment on my code. As you can probably tell, I'm just a NEWBIE at this stuff trying to learn some new tricks ;-)



    As for the Dragon Book recommendation Frans - you are too late; Scott already mentioned it to me and I've added it to my Amazon Shopping Cart.



    Gee I love learning stuff :D

  • Sir/ Madam,
    I am a Master Student of Information Science at the University Of Ibadan, Nigeria. I am researching on building a parser for Yoruba language, a major language in nigeria. Is there any way you can be of assistance to me? Please treat it urgently. Thank
    e-mail: akibod@yahoo.com


  • United we stand, divided we fall,


  • One lie leads to another.


  • The best things come in small packages.

  • I do not know if it is just me or if everybody else encountering difficulties with your website. It appears as if some of the written text on your content are running off the screen. Can somebody else please supply feedback and let me know if this is happening to them too? This might be a dilemma with my web browser for the reason that I've had this take place just before.

  • Couldn't be written any better. Reading this post reminds me of my old room mate! He normally kept talking about this. I will forward this write-up to him. Fairly positive he will have a fantastic read. Thanks for sharing!

  • Если потерять чувство юмора, что останется?
    мобильные голосовые приколы

  • Hair diversify in length of life and bawl out of growth. Longest-living fraction on his mind - to 4 or uniform 10 years, but the whisker eye the armpits, eyebrows and eyelashes - only 3-4 months. Japanese wife Hiroko Yamaske took 18 years to reach its braid length of 2.6 m common flowering of whisker per date - back 0.35-0.4 mm, and at end of day they reach below par, and preferably in the evening. On the head, beard and underarm trifle grows more actively than in the recess of the body.

  • A colleague referred me to this site. Thnx for the details.

Comments have been disabled for this content.