DonXML Blog

The East Coast Don

November 2004 - Posts

Not the Way to Introduce XmlTextReader

Thom Robbins is a great guy, but unfortunately for him he has bumped into one of my major pet peeves, Viral Coding Examples, with his Introducing the XmlTextReader post.  It really isn’t his fault, since the code he uses is very similar to the code example in the XmlTextReader.Read() documentation, and I complained about that code to the System.Xml team at the MVP Summit.  I did promise to write something up on it, and Thom’s post finally got me to do it (it has been months since I made that promise).

The problem is in the structure of the code:

Dim xmlFileStream As New FileStream("cust.xml", FileMode.Open)
Dim xmlRead As New XmlTextReader(xmlFileStream)

While xmlRead.Read
    xmlRead.MoveToContent()
     If xmlRead.HasValue Then
         MsgBox(xmlRead.Value)
     End If
End While

xmlRead.Close()

xmlFileStream.Close()

At first glance the code looks perfectly fine.  But, knowing full well that some developer new to System.Xml will use this code as a template for bigger things, we have an obligation to make it easier for them to adapt it without causing “strange” errors.

Problem #1

No explicit setting of the WhitespaceHandling option.  Unless the developer is already familiar with XmlTextReader (which the reader of an introductory post shouldn't be expected to be), they would not know that the default is WhitespaceHandling.All, which causes the reader to return all Whitespace and SignificantWhitespace nodes (which will definitely confuse the developer).  So after the declaration of the xmlRead variable you should set the WhitespaceHandling property.

xmlRead.WhitespaceHandling = WhitespaceHandling.None

At least now the developer realizes that there is a property for WhitespaceHandling, and will/can change it as needed.
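
To see what the setting changes, here is a minimal sketch (not from Thom's post or the documentation) that counts the nodes the reader reports for the same small document under both settings.  The module name, the CountNodes helper and the sample XML string are made up for illustration.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module WhitespaceDemo
    ' Hypothetical helper: counts every node the reader reports for the
    ' given XML under one WhitespaceHandling setting.
    Function CountNodes(ByVal xml As String, ByVal handling As WhitespaceHandling) As Integer
        Dim reader As New XmlTextReader(New StringReader(xml))
        reader.WhitespaceHandling = handling
        Dim count As Integer = 0
        While reader.Read()
            count += 1
        End While
        reader.Close()
        Return count
    End Function

    Sub Main()
        ' Illustrative document with line breaks and indentation between the elements.
        Dim xml As String = "<ROOT>" & Environment.NewLine & _
                            " <LEVEL1>text</LEVEL1>" & Environment.NewLine & _
                            "</ROOT>"
        Console.WriteLine("All:  {0} nodes", CountNodes(xml, WhitespaceHandling.All))
        Console.WriteLine("None: {0} nodes", CountNodes(xml, WhitespaceHandling.None))
    End Sub
End Module
```

The All run should report more nodes than the None run, the difference being the Whitespace nodes that sit between the tags.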

Problem #2

Implicit Control of Reads in While Loops.  My biggest problem with the code examples used to introduce XmlTextReader has to do with the While xmlRead.Read loops.  Although it looks harmless, a while loop that executes a Read at the beginning (or the end) of the looping structure will let bugs creep into the code, because other methods on the XmlTextReader also move the cursor that points to the current node.  If all you do is execute Reads via the while loop, you are perfectly fine.  But once you add code that moves the cursor from within the while loop, you run the risk of skipping nodes accidentally.

Here’s a great example.  You have an XML stream that looks like this:

<ROOT>
 <LEVEL1>
  <LEVEL2>1st Level2 text node</LEVEL2>
  <LEVEL2>2nd Level2 text node </LEVEL2>  
 </LEVEL1>
</ROOT>

And you want to print out the contents of the LEVEL2 elements, so you modify the standard code example to look like this:

While xmlRead.Read()
    xmlRead.MoveToContent()
    If xmlRead.IsStartElement() Then
        ' XML names are case-sensitive, so this must match the document's LEVEL2 exactly.
        If xmlRead.Name = "LEVEL2" Then
            MsgBox(xmlRead.ReadInnerXml())
        End If
    End If
End While

And you know what, it works fine.  But say the XML stream does not have all that pretty whitespace, or the developer took my advice and set the WhitespaceHandling property (in this case to None).  Now the code doesn’t work, because there was a bug in it all along and you didn’t know it.  What?  How is that?

Well, the ReadInnerXml method leaves the cursor on the first node past the EndElement.  With the nicely formatted XML stream (and WhitespaceHandling set to All), that next node is a whitespace node.  When the while loop fires the Read method, all is well: the cursor moves to the next node, which should be the next StartElement; if it isn’t, the MoveToContent method moves you to the next content node (any node that is non-whitespace text, CDATA, Element, EndElement, EntityReference, or EndEntity).

But without a whitespace node to act as a buffer, ReadInnerXml leaves the cursor on the next non-whitespace node, which in this case is the StartElement for the second LEVEL2.  Then the while loop fires the Read method, so by the time we enter the loop body we have already read past that StartElement, the If condition is not met, and we skip the whole element.

So, what is a better example for an introduction to the XmlTextReader?  Explicitly control when a Read is executed.

xmlRead.WhitespaceHandling = WhitespaceHandling.None
' Prime the loop with the first Read; keepReading turns False once the stream is exhausted.
Dim keepReading As Boolean = xmlRead.Read()
While keepReading
    If xmlRead.IsStartElement() Then
        If xmlRead.Name = "LEVEL2" Then
            ' ReadInnerXml has already moved the cursor past the EndElement, so no Read here.
            MsgBox(xmlRead.ReadInnerXml())
        Else
            keepReading = xmlRead.Read()
        End If
    Else
        keepReading = xmlRead.Read()
    End If
End While

Now we have explicit control over when a Read is executed, and in the case of rogue methods that leave your cursor on the next node (one you haven’t tested yet), you can skip the Read for that pass.
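
For anyone who wants to watch the skip happen, here is a self-contained console sketch that runs the two loop shapes discussed in this post against a copy of the sample XML with the whitespace removed.  The module and function names, and the use of a StringReader instead of a file, are my own choices for illustration and not part of the original examples.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module ReadLoopDemo
    ' No whitespace between elements, so nothing sits between </LEVEL2> and the next <LEVEL2>.
    Const CompactXml As String = _
        "<ROOT><LEVEL1><LEVEL2>1st</LEVEL2><LEVEL2>2nd</LEVEL2></LEVEL1></ROOT>"

    ' Implicit Read at the top of every pass, as in the commonly copied example.
    Function ImplicitLoop(ByVal xml As String) As List(Of String)
        Dim found As New List(Of String)()
        Dim reader As New XmlTextReader(New StringReader(xml))
        While reader.Read()
            If reader.IsStartElement() AndAlso reader.Name = "LEVEL2" Then
                ' ReadInnerXml leaves the cursor on the node after </LEVEL2>;
                ' the Read at the top of the loop then steps over that node.
                found.Add(reader.ReadInnerXml())
            End If
        End While
        reader.Close()
        Return found
    End Function

    ' Read only when nothing else has already advanced the cursor.
    Function ExplicitLoop(ByVal xml As String) As List(Of String)
        Dim found As New List(Of String)()
        Dim reader As New XmlTextReader(New StringReader(xml))
        Dim keepReading As Boolean = reader.Read()
        While keepReading
            If reader.IsStartElement() AndAlso reader.Name = "LEVEL2" Then
                found.Add(reader.ReadInnerXml())
            Else
                keepReading = reader.Read()
            End If
        End While
        reader.Close()
        Return found
    End Function

    Sub Main()
        Console.WriteLine("Implicit Read loop found {0} LEVEL2 element(s)", ImplicitLoop(CompactXml).Count)
        Console.WriteLine("Explicit Read loop found {0} LEVEL2 element(s)", ExplicitLoop(CompactXml).Count)
    End Sub
End Module
```

Against this compact input the implicit loop should report only the first LEVEL2 element, while the explicit loop should report both.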

If you want, you can download a fully functional example with 5 different test cases.


The preceding blog entry has been syndicated from the DonXML Demsak’s All Things Techie Blog.  Please post all comments on the original post. 
Posted: Nov 29 2004, 09:59 PM by DonXML

Alternate User Group Meeting Arrangements

I’ve been trying (with help from ScottW and the local MS DEs) to start up a new user group in NJ, but I’ve run into a couple of problems, the most important being the lack of a standard meeting date that does not intrude on the pre-existing user groups.  The problem is a good one to have, since it means that the developer community is of sufficient size to support more focused user groups (rather than the typical general purpose groups).  But focusing the user group on one topic also limits its potential audience, so we need to make it available to a larger group of developers.

What I was thinking of doing is to create 3 new user groups (or one big one with 3 different tracks) which all meet at the same location, just in separate rooms.  The 3 groups would be an Asp.Net group, a SQL Development DBA group (for folks who write sprocs and DTS packages, OLAP and such, geared towards the new Yukon dev stuff) and a traditional SQL Server Support DBA group (for the traditional backup/recovery/performance stuff).  Then, instead of meeting every month on a weeknight, meet once a quarter on a Saturday morning and have 3 presentations per group (or track), which would mean 9 sessions in total.  The idea is that most folks can't travel more than 20 miles on a weeknight because that is about an hour of commute time (thanks to traffic), so a Saturday morning (with no traffic) would mean that people could travel further.  But who wants to give up 1 Saturday morning a month?  With 3 sessions per track at each meeting, we could meet once a quarter, still cover the same amount of material, and only have to give up 1 Saturday a quarter.

I’ve tried to find out what other areas are doing to solve this problem, but I haven’t found others that have run into this (I would think that only high density populations of developers, like Silicon Valley, would have hit this yet).  Anyone have feedback for me on this?  I guess keeping a community active in a group that meets only once a quarter may be an issue, but with a proper community site, and support from the monthly user groups, I’m inclined to believe that this shouldn’t be a problem.

Also, does anyone know of a User Group that is a member of PASS and Ineta?  I would think that with Yukon coming next year, more groups will be registered with both user communities.


Posted: Nov 15 2004, 08:30 PM by DonXML

Try Catch Differences between VB.Net and C#

I’ve been extremely busy since coming back from Vegas, and even caught a little flak for not posting more quality stuff (and I agree, the quality isn’t there at the moment, but just wait until you see what I’ve been working on).

I ran across this little tidbit a couple of weeks ago, and I wasn’t going to post it because I didn’t want to start another VB.Net versus C# thread, but I think it shows some of the things done in the name of backward compatibility with VB6 which are helping kill a perfectly good language (VB.Net).  I’ve got a bunch of other examples where VS.Net tries to “help” the VB.Net programmer but only succeeds in making it hard for them to produce enterprise ready code; this “flaw”, though, is in the compiler, not the IDE.

My current client has requested that the code be done in VB.Net, so I’m living in a world of trying to make VB.Net adhere to the same coding styles as C# (no VB only functions, using namespaces, no BAS files, good OO and Domain Driven Design (well, sort of)) and fighting the IDE the whole way.  I was disassembling one of our libraries and noticed a reference to Microsoft.VisualBasic even though I had specifically removed the default import of that namespace.  I was curious as to why that was happening and noticed that it was only in the Try Catch statements.  I thought that maybe it was something I was doing, so I created 2 projects, one in C# and one in VB.Net, each with one class and a simple Try Catch:

C#


using System;
public class Class1
{
 public Class1()
 {
  try
  {
   Array a;
  }
  catch (Exception ex)
  {
   Console.WriteLine(ex.Message);
   throw (ex);
  }
 }
}

VB.Net

Imports System
Public Class Class1
    Public Sub New()
        Try
            Dim a As Array
        Catch ex As Exception
            Console.WriteLine(ex.Message)
            Throw (ex)
        End Try
    End Sub
End Class

You would think that both sets of code would compile down to the same IL, but they don’t.

C# IL


.method public hidebysig specialname rtspecialname
        instance void  .ctor() cil managed
{
  // Code size       23 (0x17)
  .maxstack  2
  .locals init ([0] class [mscorlib]System.Array V_0,
           [1] class [mscorlib]System.Exception ex)
  IL_0000:  ldarg.0
  IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
  .try
  {
    IL_0006:  leave.s    IL_0016
  }  // end .try
  catch [mscorlib]System.Exception
  {
    IL_0008:  stloc.1
    IL_0009:  ldloc.1
    IL_000a:  callvirt   instance string [mscorlib]System.Exception::get_Message()
    IL_000f:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_0014:  ldloc.1
    IL_0015:  throw
  }  // end handler
  IL_0016:  ret
} // end of method Class1::.ctor

VB.Net IL


.method public specialname rtspecialname
        instance void  .ctor() cil managed
{
  // Code size       29 (0x1d)
  .maxstack  2
  .locals init (class [mscorlib]System.Array V_0,
           class [mscorlib]System.Exception V_1)
  IL_0000:  ldarg.0
  IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
  .try
  {
    IL_0006:  leave.s    IL_001c
  }  // end .try
  catch [mscorlib]System.Exception
  {
    IL_0008:  dup
    IL_0009:  call       void [Microsoft.VisualBasic]Microsoft.VisualBasic.CompilerServices.ProjectData::SetProjectError(class [mscorlib]System.Exception)
    IL_000e:  stloc.1
    IL_000f:  ldloc.1
    IL_0010:  callvirt   instance string [mscorlib]System.Exception::get_Message()
    IL_0015:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_001a:  ldloc.1
    IL_001b:  throw
  }  // end handler
  IL_001c:  ret
} // end of method Class1::.ctor


The VB.Net IL has one distinct addition: within the Catch block there is a call into the Microsoft.VisualBasic dll, ProjectData.SetProjectError.  Why would the VB Team add this call to their compiler?  Backward compatibility with VB6.  As per Niklas (from the VB Compiler team):

“The extra two calls are there to support the "On Error" language feature that was retained to make it easier to upgrade from VB6 to VB.NET. … they only cost you time (and very little) if an exception actually happens. The time for the two calls is minor compared to the overhead of propagating exceptions.”

My problem with this is that you get this even if you are not using the old On Error syntax.  There is no reason why this can’t be a compiler option, or even better yet, let the compiler figure out if On Error is used and act accordingly.  Because C# is not (currently) hindered by backward compatibility, it can avoid such issues (for now).  I know this really isn’t that big of a deal in terms of performance; it is a VB mindset issue that helps to promote the idea that VB.Net is a second class language (which is reinforced by little things like this).
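
One way to observe the side effect of that call from VB.Net itself is the sketch below.  It is an illustration I put together, not something from Niklas or the compiler team, and it assumes that SetProjectError is what hands the caught exception to the legacy Err object; the module name and the exception thrown are arbitrary.

```vb
Imports System
Imports Microsoft.VisualBasic

Module ErrDemo
    Sub Main()
        Try
            Throw New InvalidOperationException("boom")
        Catch ex As Exception
            ' There is no On Error statement anywhere in this program.
            ' Assumption: the compiler-inserted SetProjectError call has already
            ' passed ex to the Err object by the time this line runs.
            Console.WriteLine(Err.GetException() Is ex)
        End Try
    End Sub
End Module
```

If the assumption holds, the comparison should print True, which is the bookkeeping the extra IL pays for whether or not the code ever touches Err.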

[Corrected the VB IL code, since it was compiled with the debug option]



Posted: Nov 15 2004, 10:47 AM by DonXML