XMLDocuments and Default Namespaces

Wednesday, December 8, 2004

In my latest project, we have been working a lot with some complex XML documents, which never used to have a default namespace, e.g.

<MyRoot>
<Element>
......

</Element>
</MyRoot>

instead of

<MyRoot xmlns="urn:MyFunkyNamespace">
<Element>
......

</Element>
</MyRoot>

However, we now have to deal with a mix of XML documents, some of which have default namespaces and some of which don't. As you may know, you need to take this into account when constructing your XPath statements: XPath treats an unprefixed element name as belonging to no namespace, so each element name in your XPath needs a prefix that is associated with the document's namespace. So the XPath statement below :-

//Node1/Node2/Element/text()

is OK for documents without a namespace, but as soon as you add a simple default namespace, it won't work. It needs to contain a prefix like so:-

//ns:Node1/ns:Node2/ns:Element/text()

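where the prefix ns is mapped to the namespace URI "urn:YourDefaultNamespace" (the prefix name itself is arbitrary; only the URI has to match the document).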
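As a minimal sketch of what that mapping looks like in .NET, the prefix is registered with an XmlNamespaceManager, which is then passed along with the query. The class name, method name, and sample document below are made up for illustration.

```csharp
using System.Xml;

public static class PrefixDemo
{
    // Hypothetical sample document that declares a default namespace.
    public const string SampleXml =
        "<Node1 xmlns=\"urn:YourDefaultNamespace\"><Node2><Element>hello</Element></Node2></Node1>";

    // Counts the nodes an XPath selects, with the prefix "ns" mapped to the
    // document's default namespace URI.
    public static int Count(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleXml);

        XmlNamespaceManager mgr = new XmlNamespaceManager(doc.NameTable);
        mgr.AddNamespace("ns", "urn:YourDefaultNamespace");

        return doc.SelectNodes(xpath, mgr).Count;
    }
}
```

With this sample, PrefixDemo.Count("//Node1/Node2/Element/text()") selects nothing, while PrefixDemo.Count("//ns:Node1/ns:Node2/ns:Element/text()") selects the single text node.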

This has caused some of our developers some major grief as they were not aware of this behaviour. Also, a large amount of existing XPath code was written without expecting default namespaces. To make matters worse, we need to support both situations: XmlDocuments with and without default namespaces. To that end, I hacked together a bit of code to detect if a default namespace is present, and to "pre-prepare" an XPath statement for use in both situations. Code is below:

To detect a default namespace:

xmlDoc = new XmlDocument();
xmlDoc.LoadXml(someDocument);

// Determine the default namespace if one exists.
XPathNavigator nav = xmlDoc.CreateNavigator();
nav.MoveToRoot();
nav.MoveToFirstChild();

do
{
   if (nav.Name == PREDEFINED_ROOT_ELEMENT || (nav.NamespaceURI != null && nav.NamespaceURI != string.Empty))
   {
      // These two class-level fields are the ones PrepareXPath reads later on.
      useDefaultXmlNamespaceFlag = true;
      xmlNamespaceText = nav.NamespaceURI;
      break;
   }
} while (nav.MoveToNext());
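For comparison, here is a shorter sketch of the same check that asks the document element directly instead of walking the top-level nodes. It is not the code we used. It assumes the root element itself is unprefixed, so that its namespace URI is the default namespace, and the class and method names are hypothetical.

```csharp
using System.Xml;

public static class DefaultNamespaceDetector
{
    // Returns the default namespace URI of the root element, or an empty
    // string when the root element is in no namespace (or carries a prefix).
    public static string GetRootDefaultNamespace(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlElement root = doc.DocumentElement;
        return root.Prefix.Length == 0 ? root.NamespaceURI : string.Empty;
    }
}
```

A non-empty return value plays the role of the flag and namespace text set in the loop above.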

We then run all our XPath statements through a "PrepareXPath" method, which looks something like this :-

public string PrepareXPath(string xpathStatement)
{
   const string nsPrefix = "ns1"; // Our namespace prefix
   string nsNewElementStart = string.Format("{0}:",nsPrefix); // What we prepend to each element name

   string retXPath = null; // our eventual return value
   try
   {
      // Create our Namespace manager if required
      if (useDefaultXmlNamespaceFlag && !nsManagerCreated)
      {
         namespaceMgr = new XmlNamespaceManager(xmlDoc.NameTable);
         namespaceMgr.AddNamespace(String.Empty,xmlNamespaceText); // Note: XPath ignores this entry for unprefixed names
         namespaceMgr.AddNamespace(nsPrefix,xmlNamespaceText);
         nsManagerCreated = true;
      }

      retXPath = xpathStatement;

      // If we are using a default namespace, prepend the prefix defined in our namespace manager to every
      // element name in the XPath statement. We split on "/" and look at each step on its own, because a blind
      // replace of "/" also mangles "//", "/@" and "/text()". Bit of a hack: this only copes with simple
      // location paths (no "/" inside predicates or string literals).
      if (useDefaultXmlNamespaceFlag)
      {
         string[] steps = xpathStatement.Split('/');
         for (int i = 0; i < steps.Length; i++)
         {
            string step = steps[i];
            int bracket = step.IndexOf('[');
            string nodeTest = (bracket >= 0) ? step.Substring(0, bracket) : step;

            // Leave these alone: empty steps (from a leading "/" or from "//"), ".", "..", "*", attributes ("@id"),
            // node tests such as "text()", and anything that already contains a ":" (a prefix or an explicit axis).
            if (nodeTest.Length == 0 || nodeTest == "." || nodeTest == ".." || nodeTest == "*"
                || nodeTest[0] == '@' || nodeTest.IndexOf('(') >= 0 || nodeTest.IndexOf(':') >= 0)
            {
               continue;
            }

            steps[i] = nsNewElementStart + step;
         }
         retXPath = string.Join("/", steps);
      }
   }
   catch (Exception ex)
   {
      string msg = string.Format("Error processing XPath statement [{0}] DefaultNamespace: [{1}]. Execution may not yield expected results. Returning original XPath statement supplied.",xpathStatement,xmlNamespaceText);
      EventReporting.WriteEventLogError("",msg,ex,2);
      retXPath = xpathStatement;
   }

   return retXPath;
}

So now the XPath statements are:-

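string xPath = PrepareXPath("//Node1/Node2/Element/text()");

All the prefixes are put in there if required, and the XPath works in documents with or without namespaces, provided the namespace manager is passed along with the query (e.g. to SelectNodes or SelectSingleNode) when a default namespace is in use.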

Although a bit of a hack, it seems to work well (note: the namespace manager object and XmlDocument object are predefined objects in the class that this method was a part of, but it can easily be adapted to suit). It is a pain to ensure that every XPath statement is run through this, and a more elegant solution could be achieved with the clever use of a facade class, but, architectural issues aside, it did solve the immediate problem, and the code/technique is what is of import here.
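To give a rough idea of what such a facade could look like, here is a minimal sketch. It is not something we built: the class name and its members are hypothetical, it assumes the root element is unprefixed, and it uses a simplified, step-by-step take on the prefixing idea that only handles simple location paths.

```csharp
using System.Xml;

// Hypothetical facade: callers write unprefixed XPath and never see the prefixing.
public class NamespaceAwareDocument
{
    private const string NsPrefix = "ns1";
    private readonly XmlDocument doc = new XmlDocument();
    private readonly XmlNamespaceManager mgr;
    private readonly bool hasDefaultNamespace;

    public NamespaceAwareDocument(string xml)
    {
        doc.LoadXml(xml);
        mgr = new XmlNamespaceManager(doc.NameTable);

        // Assumption: an unprefixed root element with a namespace URI means
        // the document uses a default namespace.
        XmlElement root = doc.DocumentElement;
        hasDefaultNamespace = root.Prefix.Length == 0 && root.NamespaceURI.Length > 0;
        if (hasDefaultNamespace)
        {
            mgr.AddNamespace(NsPrefix, root.NamespaceURI);
        }
    }

    // Runs an unprefixed XPath against the document, prefixing it first if needed.
    public XmlNodeList Select(string xpath)
    {
        return doc.SelectNodes(Prepare(xpath), mgr);
    }

    // Prefixes each element step of a simple location path; returns the
    // XPath unchanged when the document has no default namespace.
    public string Prepare(string xpath)
    {
        if (!hasDefaultNamespace)
        {
            return xpath;
        }

        string[] steps = xpath.Split('/');
        for (int i = 0; i < steps.Length; i++)
        {
            string step = steps[i];
            int bracket = step.IndexOf('[');
            string nodeTest = bracket >= 0 ? step.Substring(0, bracket) : step;

            // Skip empty steps, ".", "..", "*", attributes, node tests such
            // as text(), and anything already prefixed or axis-qualified.
            bool skip = nodeTest.Length == 0 || nodeTest == "." || nodeTest == ".."
                || nodeTest == "*" || nodeTest[0] == '@'
                || nodeTest.IndexOf('(') >= 0 || nodeTest.IndexOf(':') >= 0;
            if (!skip)
            {
                steps[i] = NsPrefix + ":" + step;
            }
        }
        return string.Join("/", steps);
    }
}
```

Calling code would then do something like new NamespaceAwareDocument(someDocument).Select("//Node1/Node2/Element/text()") and get the same result whether or not the document declares a default namespace.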

Hope it helps someone. I will also note that I think a default namespace should be exactly that, a "default" namespace, which is not how it seems to work (from a logical point of view). If no prefix is supplied, then the default should (IMO) be assumed. I understand the technical reasons why the default namespace has this particular behaviour (i.e. if you don't prefix your XPath statements when a default namespace is present, they will fail); it just doesn't seem to fit my practical understanding of a default value.

venom00, what happens is.... you get to fix it :-)

Glav - Wednesday, November 12, 2008 6:06:16 AM

How can I get all code ?

What is PREDEFINED_ROOT_ELEMENT ??

Thanks

alhambraeidos - Thursday, May 28, 2009 6:22:34 AM

Excellent Article...!
Probably you are not aware how much pain this thing was causing to my code. Now with your article its resolved.

But I wonder why Microsoft does not highlight such caveats clearly, I almost burned my eyes reading through the MSDN

Thanks!

Tushar - Thursday, June 18, 2009 1:53:47 PM
