Not the Way to Introduce XmlTextReader

Thom Robbins is a great guy, but unfortunately for him, his Introducing the XmlTextReader post has bumped into one of my major pet peeves: Viral Coding Examples. It really isn't his fault, since his code is very similar to the example in the XmlTextReader.Read() documentation, and I complained about that example to the System.Xml team at the MVP Summit. I promised them a write-up months ago, and Thom's post finally got me to do it.

The problem is in the structure of the code:

Imports System.IO
Imports System.Xml

Dim xmlFileStream As New FileStream("cust.xml", FileMode.Open)
Dim xmlRead As New XmlTextReader(xmlFileStream)

While xmlRead.Read
    xmlRead.MoveToContent()
    If xmlRead.HasValue Then
        MsgBox(xmlRead.Value)
    End If
End While

xmlRead.Close()
xmlFileStream.Close()

At first glance the code looks perfectly fine. But we know full well that some developer new to System.Xml will use this code as a template for bigger things, so we have an obligation to make it easy to adapt without causing “strange” errors.

Problem #1

No explicit setting of the WhitespaceHandling option. Unless developers are already familiar with XmlTextReader (and the audience of an introduction shouldn't be expected to be), they won't know that the default is WhitespaceHandling.All, which makes the reader return every Whitespace and SignificantWhitespace node. Those nodes will definitely confuse a newcomer. So, right after declaring the xmlRead variable, you should set the WhitespaceHandling property:

xmlRead.WhitespaceHandling = WhitespaceHandling.None

At least now the developer knows that a WhitespaceHandling property exists, and can change it as needed.
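To see what the setting actually changes, here is a minimal sketch (not from the original post) that counts the Whitespace and SignificantWhitespace nodes Read returns under each setting. It assumes .NET 2.0 or later (for the Using block), loads the XML from a string through a StringReader instead of cust.xml, and writes to the console instead of MsgBox; CountWhitespaceNodes is a hypothetical helper name.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module WhitespaceHandlingDemo
    ' Hypothetical helper: counts the Whitespace and SignificantWhitespace
    ' nodes that Read() returns for a document under one WhitespaceHandling setting.
    Function CountWhitespaceNodes(ByVal xml As String, ByVal handling As WhitespaceHandling) As Integer
        Dim count As Integer = 0
        Using xmlRead As New XmlTextReader(New StringReader(xml))
            xmlRead.WhitespaceHandling = handling
            While xmlRead.Read()
                If xmlRead.NodeType = XmlNodeType.Whitespace OrElse _
                   xmlRead.NodeType = XmlNodeType.SignificantWhitespace Then
                    count += 1
                End If
            End While
        End Using
        Return count
    End Function

    Sub Main()
        ' Indented sample: a line break (plus spaces) sits between the tags twice.
        Dim xml As String = "<ROOT>" & Environment.NewLine & _
                            " <LEVEL1>text</LEVEL1>" & Environment.NewLine & _
                            "</ROOT>"
        Console.WriteLine("All:  {0} whitespace node(s)", CountWhitespaceNodes(xml, WhitespaceHandling.All))
        Console.WriteLine("None: {0} whitespace node(s)", CountWhitespaceNodes(xml, WhitespaceHandling.None))
    End Sub
End Module
```

With this sample, WhitespaceHandling.All should report two whitespace nodes (the formatting between the tags) and WhitespaceHandling.None should report none; the elements and the "text" node are returned either way.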

Problem #2

Implicit control of Reads in While loops. My biggest problem with the code examples used to introduce XmlTextReader is the While xmlRead.Read loop. Although it looks harmless, a loop that executes a Read at the beginning (or the end) of every iteration invites bugs, because other methods on XmlTextReader also move the cursor that points to the current node. If the loop's Read is the only thing that moves the cursor, you are perfectly fine. But once you add code inside the loop that moves the cursor itself, you run the risk of accidentally skipping nodes.

Here’s a great example.  You have an XML stream that looks like this:

<ROOT>
 <LEVEL1>
  <LEVEL2>1st Level2 text node</LEVEL2>
  <LEVEL2>2nd Level2 text node </LEVEL2>  
 </LEVEL1>
</ROOT>

Say you want to print out the contents of the LEVEL2 elements, so you modify the standard code example to look like this:

While xmlRead.Read
    xmlRead.MoveToContent()
    If xmlRead.IsStartElement() Then
        ' XML names are case-sensitive, so match the element exactly.
        If xmlRead.Name = "LEVEL2" Then
            MsgBox(xmlRead.ReadInnerXml())
        End If
    End If
End While

And you know what, it works fine. But suppose the XML stream doesn't have all that pretty whitespace, or you took my advice and set the WhitespaceHandling property (in this case to None). Now the code no longer works, because it had a bug all along and you didn't know it. What? How is that? The ReadInnerXml method leaves the cursor on the first node past the element's EndElement. With the nicely formatted stream (and WhitespaceHandling set to All), that next node is a whitespace node, so when the While loop fires the Read method, the cursor simply moves on to the next node. That should be the next StartElement; if it isn't, the MoveToContent method moves you to the next content node (any non-whitespace text, CDATA, Element, EndElement, EntityReference, or EndEntity node). Without the whitespace nodes as a buffer, however, ReadInnerXml leaves the cursor on the next non-whitespace node, which here is the StartElement of the second LEVEL2. The While loop then fires Read again, so by the time we enter the loop body we have already moved past that StartElement, the If condition fails, and the whole element is skipped.
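The skipped element is easy to reproduce. The sketch below is a hypothetical console program (CollectNaive and SampleXml are illustrative names, not part of the post) that loads the sample document from a string and runs the same While xmlRead.Read loop under both WhitespaceHandling settings, collecting the values instead of showing them with MsgBox. It assumes .NET 2.0 or later for List(Of T) and Using.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module SkippedNodeDemo
    ' The sample document from the post, indentation included.
    Function SampleXml() As String
        Return String.Join(Environment.NewLine, New String() { _
            "<ROOT>", _
            " <LEVEL1>", _
            "  <LEVEL2>1st Level2 text node</LEVEL2>", _
            "  <LEVEL2>2nd Level2 text node </LEVEL2>", _
            " </LEVEL1>", _
            "</ROOT>"})
    End Function

    ' Same structure as the modified example: Read() runs at the top of every
    ' iteration, even when ReadInnerXml() has already advanced the cursor.
    Function CollectNaive(ByVal xml As String, ByVal handling As WhitespaceHandling) As List(Of String)
        Dim results As New List(Of String)()
        Using xmlRead As New XmlTextReader(New StringReader(xml))
            xmlRead.WhitespaceHandling = handling
            While xmlRead.Read()
                xmlRead.MoveToContent()
                If xmlRead.IsStartElement() Then
                    If xmlRead.Name = "LEVEL2" Then
                        results.Add(xmlRead.ReadInnerXml())
                    End If
                End If
            End While
        End Using
        Return results
    End Function

    Sub Main()
        For Each handling As WhitespaceHandling In New WhitespaceHandling() {WhitespaceHandling.All, WhitespaceHandling.None}
            Dim found As List(Of String) = CollectNaive(SampleXml(), handling)
            Console.WriteLine("{0}: {1} LEVEL2 element(s) found", handling, found.Count)
        Next
    End Sub
End Module
```

Under WhitespaceHandling.All the loop should collect both LEVEL2 values; under WhitespaceHandling.None it should collect only the first one, because the Read at the top of the loop steps over the second LEVEL2 StartElement that ReadInnerXml had just landed on.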

So, what is a better example for an introduction to the XmlTextReader?  Explicitly control when a Read is executed.

xmlRead.WhitespaceHandling = WhitespaceHandling.None
' Prime the loop with the first Read.
Dim keepReading As Boolean = xmlRead.Read()
While keepReading
    If xmlRead.IsStartElement() Then
        If xmlRead.Name = "LEVEL2" Then
            ' ReadInnerXml already leaves the cursor on the next node,
            ' so this branch does not call Read.
            MsgBox(xmlRead.ReadInnerXml())
        Else
            keepReading = xmlRead.Read()
        End If
    Else
        keepReading = xmlRead.Read()
    End If
End While

Now we have explicit control over when a Read is executed, so when a rogue method has already placed the cursor on the next node (one you haven't tested yet), you can skip the Read.
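Here is a self-contained sketch of the explicit-read loop wrapped in a function and run under both settings. As before, the names (CollectExplicit, SampleXml) are hypothetical, the XML comes from a string rather than a file, values are collected instead of passed to MsgBox, and a Using block (.NET 2.0 or later) replaces the explicit Close calls.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module ExplicitReadDemo
    ' The sample document from the post, indentation included.
    Function SampleXml() As String
        Return String.Join(Environment.NewLine, New String() { _
            "<ROOT>", _
            " <LEVEL1>", _
            "  <LEVEL2>1st Level2 text node</LEVEL2>", _
            "  <LEVEL2>2nd Level2 text node </LEVEL2>", _
            " </LEVEL1>", _
            "</ROOT>"})
    End Function

    ' Read() is only called on branches that have not already moved the cursor.
    Function CollectExplicit(ByVal xml As String, ByVal handling As WhitespaceHandling) As List(Of String)
        Dim results As New List(Of String)()
        Using xmlRead As New XmlTextReader(New StringReader(xml))
            xmlRead.WhitespaceHandling = handling
            Dim keepReading As Boolean = xmlRead.Read()
            While keepReading
                If xmlRead.IsStartElement() AndAlso xmlRead.Name = "LEVEL2" Then
                    ' ReadInnerXml() moves past the EndElement by itself: no Read() here.
                    results.Add(xmlRead.ReadInnerXml())
                Else
                    keepReading = xmlRead.Read()
                End If
            End While
        End Using
        Return results
    End Function

    Sub Main()
        For Each handling As WhitespaceHandling In New WhitespaceHandling() {WhitespaceHandling.All, WhitespaceHandling.None}
            Dim found As List(Of String) = CollectExplicit(SampleXml(), handling)
            Console.WriteLine("{0}: {1} LEVEL2 element(s) found", handling, found.Count)
        Next
    End Sub
End Module
```

Because IsStartElement calls MoveToContent before testing the node, whitespace nodes are stepped over when WhitespaceHandling is All, so both settings should return the same two values.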

If you want, you can download a fully functional example with 5 different test cases.


The preceding blog entry has been syndicated from the DonXML Demsak’s All Things Techie Blog.  Please post all comments on the original post. 
