XMLDocuments and Default Namespaces

Published Thursday, December 09, 2004 8:46 AM

In my latest project, we have been working a lot with some complex XML documents which, until now, never had a default namespace, e.g.

<MyRoot>
   <Element>
     ......
   </Element>
</MyRoot>

instead of

<MyRoot xmlns="urn:MyFunkyNamespace">
   <Element>
     ......
   </Element>
</MyRoot>

However, we now have to deal with a mix of XML documents: some have a default namespace and some don't. As you may know, this matters when constructing your XPath statements. In XPath 1.0 an unprefixed element name always means "no namespace", so once a default namespace is present you have to prefix every element in your XPath with an identifier that is associated with that namespace. So the XPath statement below:

//Node1/Node2/Element/text()

is OK for documents without a namespace, but as soon as you add a simple default namespace, it won't match anything. It needs to contain a prefix, like so:

//ns:Node1/ns:Node2/ns:Element/text()

where the prefix ns is mapped (through an XmlNamespaceManager) to "urn:YourDefaultNamespace".
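To make the difference concrete, here is a minimal C# sketch (DefaultNamespaceDemo and its members are made-up names for illustration) that runs the same query against a document with a default namespace, once unprefixed and once with an ns prefix registered in an XmlNamespaceManager. Only the prefixed query finds the element. The prefix itself is arbitrary; only the namespace URI it is mapped to has to match the document.

```csharp
using System.Xml;

public static class DefaultNamespaceDemo
{
    // Hypothetical sample: same shape as the documents above, but with a default namespace.
    public const string SampleXml =
        "<MyRoot xmlns=\"urn:MyFunkyNamespace\"><Element>hello</Element></MyRoot>";

    // Unprefixed XPath: XPath 1.0 reads "MyRoot" and "Element" as names in no namespace,
    // so nothing in the default namespace matches and the result is null.
    public static XmlNode Unprefixed(XmlDocument doc)
    {
        return doc.SelectSingleNode("/MyRoot/Element");
    }

    // Prefixed XPath: "ns" is bound to the document's default namespace URI,
    // so "ns:Element" matches the element.
    public static XmlNode Prefixed(XmlDocument doc)
    {
        var nsMgr = new XmlNamespaceManager(doc.NameTable);
        nsMgr.AddNamespace("ns", "urn:MyFunkyNamespace");
        return doc.SelectSingleNode("/ns:MyRoot/ns:Element", nsMgr);
    }
}
```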

This has caused some of our developers major grief, as they were not aware of this behaviour. Also, a large amount of existing XPath code was written without default namespaces in mind. To make matters worse, we need to support both situations: XmlDocuments with and without default namespaces. To that end, I hacked together a bit of code to detect whether a default namespace is present, and to "pre-prepare" an XPath statement for use in both situations. The code is below.

To detect a default namespace:

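// Requires: using System.Xml; using System.Xml.XPath;
xmlDoc = new XmlDocument();
xmlDoc.LoadXml(someDocument);

// Determine the default namespace if one exists.
// The root node's children can include an XML declaration, comments or processing
// instructions, so walk them until we reach the document (root) element.
XPathNavigator nav = xmlDoc.CreateNavigator();
nav.MoveToRoot();
nav.MoveToFirstChild();

do
{
   if (nav.NodeType == XPathNodeType.Element)
   {
      // Only prefix our XPath statements if the root element actually lives in a namespace.
      useDefaultXmlNamespaceFlag = !string.IsNullOrEmpty(nav.NamespaceURI);
      xmlNamespaceText = nav.NamespaceURI;
      break;
   }
}
while (nav.MoveToNext());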
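If all you need is the root element's namespace, XmlDocument.DocumentElement already skips past the XML declaration, comments and processing instructions, so the navigator loop can be shortened. Here is a minimal sketch (NamespaceDetection and RootNamespace are hypothetical names):

```csharp
using System.Xml;

public static class NamespaceDetection
{
    // Returns the namespace URI of the root element, or an empty string if it has none.
    // DocumentElement is the root element itself, so there is no need to step over the
    // XML declaration, comments or processing instructions by hand.
    public static string RootNamespace(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.NamespaceURI;
    }
}
```

The flag then becomes !string.IsNullOrEmpty(RootNamespace(someDocument)). Like the loop above, this only looks at the root element; a child element that declares its own xmlns="..." would still need a prefix of its own.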

And we run all our XPath statements through a PrepareXPath method, which looks something like this:

public string PrepareXPath(string xpathStatement)
{
   const string nsPrefix = "ns1";                                        // Our namespace prefix
   string nsNewElementStartSlash = string.Format("/{0}:", nsPrefix);    // What we replace a forward slash "/" with
   string nsNewElementStartNoSlash = string.Format("{0}:", nsPrefix);   // What we prepend to a single element name (no slash)

   string retXPath = null; // Our eventual return value

   try
   {
      System.Text.StringBuilder newXpath = new System.Text.StringBuilder(xpathStatement);

      // Create our namespace manager if required
      if (useDefaultXmlNamespaceFlag && !nsManagerCreated)
      {
         namespaceMgr = new XmlNamespaceManager(xmlDoc.NameTable);
         // Note: XPath 1.0 ignores this empty-prefix mapping for unprefixed names;
         // it is the ns1 mapping below that lets the prefixed statements resolve.
         namespaceMgr.AddNamespace(String.Empty, xmlNamespaceText);
         namespaceMgr.AddNamespace(nsPrefix, xmlNamespaceText);
         nsManagerCreated = true;
      }

      // Now perform a text substitution if we are using a default namespace. Here we prepend the
      // namespace prefix defined in our namespace manager to all elements in the XPath statement.
      if (useDefaultXmlNamespaceFlag && xpathStatement != "." && xpathStatement != "..")
      {
         if (xpathStatement.IndexOf('/') >= 0) // If we find a slash, replace it with an appropriate slash-prefix
         {
            // Bit of a hack: temporarily replace "/@" with a character sequence that is very unlikely
            // to appear in XPath. "/@" is valid when querying attributes, but we don't prefix attributes,
            // so letting the replace below touch it would create invalid XPath.
            newXpath.Replace("/@", "<-#->");
            newXpath.Replace("/", nsNewElementStartSlash);
            newXpath.Replace("<-#->", "/@"); // Restore the attribute

            // Undo the prefixes the blanket replace put where they are not allowed.
            newXpath.Replace(nsNewElementStartSlash + "/", "//");             // "//" became "/ns1:/"
            newXpath.Replace(nsNewElementStartSlash + ".", "/.");             // "/." and "/.." take no prefix
            newXpath.Replace(nsNewElementStartNoSlash + "text()", "text()");  // Node-type tests take no prefix
            newXpath.Replace(nsNewElementStartNoSlash + "node()", "node()");

            // --Update--
            // A relative path such as "Node1/Node2" also needs its first step prefixed.
            if (xpathStatement[0] != '/' && xpathStatement[0] != '.')
            {
               newXpath.Insert(0, nsNewElementStartNoSlash);
            }
         }
         else // No "/": assume a single element name was passed, which we still have to prefix
         {
            newXpath.Insert(0, nsNewElementStartNoSlash);
         }
      }

      retXPath = newXpath.ToString();
   }
   catch (Exception ex)
   {
      string msg = string.Format("Error processing XPath statement [{0}] DefaultNamespace: [{1}]. Execution may not yield expected results. Returning original XPath statement supplied.", xpathStatement, xmlNamespaceText);
      EventReporting.WriteEventLogError("", msg, ex, 2); // Event-log helper from our own codebase
      retXPath = xpathStatement;
   }

   return retXPath;
}

So now our XPath statements look like this:

string xPath = PrepareXPath("//Node1/Node2/Element/text()");

All the prefixes are put in where required, and the XPath works against documents with or without a default namespace.
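One step that is easy to miss: a prefixed statement only resolves if the namespace manager is passed to SelectNodes/SelectSingleNode. Without it, .NET throws an XPathException because it cannot resolve the ns1 prefix. A minimal sketch of the calling side, assuming the ns1 prefix, a document whose root element is in a namespace, and a statement that has already been prepared (PreparedXPathUsage is a hypothetical name):

```csharp
using System.Xml;

public static class PreparedXPathUsage
{
    // Runs an already-prefixed XPath statement, binding "ns1" to the root element's namespace.
    // Returns the selected node's value, or null if nothing matches.
    public static string Select(string xml, string preparedXPath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var nsMgr = new XmlNamespaceManager(doc.NameTable);
        nsMgr.AddNamespace("ns1", doc.DocumentElement.NamespaceURI);

        // The manager must be passed here, otherwise the "ns1" prefix cannot be resolved.
        XmlNode node = doc.SelectSingleNode(preparedXPath, nsMgr);
        return node == null ? null : node.Value;
    }
}
```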

Although it's a bit of a hack, it seems to work well. (Note: xmlDoc, namespaceMgr, nsManagerCreated, useDefaultXmlNamespaceFlag and xmlNamespaceText are fields of the class this method was part of, but it can easily be adapted to suit.) It is a pain to ensure every XPath statement is run through this, and a more elegant solution could be achieved with clever use of a facade class, but, architectural issues aside, it did solve the immediate problem, and the technique is what matters here.

Hope it helps someone. I will also note that I think a default namespace should be exactly that, a "default" namespace, which is not how it seems to work (from a logical point of view). If none is supplied, then the default should (IMO) be assumed. I understand the technical issues as to why the default namespace has this particular behaviour (i.e. don't prefix your XPath statements when a default namespace is present and they will fail); it just doesn't seem to fit my practical understanding of a default value.

by Glav

Comments

# George said on Tuesday, December 19, 2006 2:18 PM

You mention 'I will also note that I think a default namespace should be exactly that, a "default" namespace, which is not how it seems to work (from a logical point of view). If none is supplied, then the default should (IMO) be assumed. I understand the technical issues as to why the default namespace has this particular behaviour (i.e. don't prefix your XPath statements when a default namespace is present and they will fail); it just doesn't seem to fit my practical understanding of a default value.'

I agree!!! I just spent the better part of 3 hours hacking at this - until I stumbled upon your post. You helped me, that's for sure. To think, I simply needed a /ns:name instead of /name... Bleh.

Thanks!

George

# Ian said on Friday, January 26, 2007 7:21 PM

I've just been wrestling with the same problem. Unfortunately I've got XPaths like "Node/ChildNode|Node/OtherChild[count(GrandChild)>3]" or "SomeNode/Amount1+SomeNode/Amount2+sum(OtherNodes[starts-with(@att,'test')])"

(they are used in some biggish XSLTs). Without some serious tokenisation and grammar parsing there is no way to auto-fix namespaces into complex XPaths.

How did this situation ever come about? It's madness that XPath 1.0 can't deal with default namespaces (apparently XPath 2.0 will), and that in Microsoft's XML DOM it is impossible to remove the default namespace from a document without serialising, hacking the ASCII and deserialising again.

I realise this comment isn't very helpful, but I just had to let off some steam. Thanks for the post, and for confirming my findings,

Ian.
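For what it's worth, with LINQ to XML (System.Xml.Linq, which arrived in .NET 3.5, well after this post) the namespaces can be removed in memory without serialising and editing the text. A minimal sketch, assuming you can work on a copy of the document and don't need the namespace information afterwards (NamespaceStripper is a hypothetical name):

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceStripper
{
    // Returns a copy of the document in which every element is moved into "no namespace"
    // and all xmlns declarations are dropped, so plain unprefixed XPath matches again.
    // Attributes that are themselves in a namespace (e.g. p:attr) are left as they are.
    public static XDocument StripNamespaces(XDocument source)
    {
        var doc = new XDocument(source); // Deep copy; the caller's document is untouched

        foreach (XElement e in doc.Root.DescendantsAndSelf().ToList())
        {
            // Remove xmlns="..." and xmlns:p="..." declarations on this element.
            e.Attributes().Where(a => a.IsNamespaceDeclaration).Remove();
            // Keep only the local name, i.e. put the element in no namespace.
            e.Name = e.Name.LocalName;
        }
        return doc;
    }
}
```

The trade-off is that elements from different namespaces that share a local name become indistinguishable, so this only suits documents where the namespace carries no real information.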

# Izzeddeen said on Friday, November 16, 2007 3:09 PM

I prefer using the following xPath query:

//*[local-name(.)='Node1']/*[local-name(.)='Node2']/*[local-name(.)='Element']/text()
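The local-name() approach needs no namespace manager and no string rewriting, so the same statement works against both kinds of document. A minimal sketch (LocalNameQuery is a hypothetical name):

```csharp
using System.Xml;

public static class LocalNameQuery
{
    // Namespace-agnostic XPath from the comment above: elements are matched by local name only.
    public const string XPath =
        "//*[local-name(.)='Node1']/*[local-name(.)='Node2']/*[local-name(.)='Element']/text()";

    // Returns the text of the matched Element, or null if nothing matches.
    public static string Select(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode(XPath); // No XmlNamespaceManager required
        return node == null ? null : node.Value;
    }
}
```

The cost is verbosity, and the query ignores namespaces entirely, so an Element from an unrelated namespace would match as well.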

# Luke Breuer said on Saturday, December 29, 2007 10:47 PM

All that gross string manipulation to add your prefix can be reduced to a very simple regex operation:

xpath = Regex.Replace(xpath, @"(?<=/|^)(?=\w)(?!\w+:)", NamespacePrefix + ":");
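Wrapped in a method, the regex looks like the sketch below. As written in the comment it also prefixes node-type tests, turning text() into ns:text(), which is not valid XPath, so this version adds a second negative lookahead, (?![\w-]+\(), that skips any name followed by "(". That lookahead is an addition, not part of the original comment. Like the string-replace version, it does not handle predicates, axes such as child::, or slashes inside string literals. (XPathPrefixer and AddPrefix are hypothetical names.)

```csharp
using System.Text.RegularExpressions;

public static class XPathPrefixer
{
    // Zero-width match at every position that:
    //   (?<=/|^)       follows a "/" or is the start of the expression,
    //   (?=\w)         is followed by a name character (so "@attr", "." and "*" are skipped),
    //   (?!\w+:)       is not already prefixed (this also skips axis names such as child::),
    //   (?![\w-]+\()   is not a node-type test or function call such as text() (added).
    private static readonly Regex StepStart =
        new Regex(@"(?<=/|^)(?=\w)(?!\w+:)(?![\w-]+\()");

    // Inserts "prefix:" at each matching position.
    public static string AddPrefix(string xpath, string prefix)
    {
        return StepStart.Replace(xpath, prefix + ":");
    }
}
```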

# Glav said on Sunday, December 30, 2007 12:04 AM

All Regex is evil :-)

Seriously, I am not a regex fan, but your suggestion is a good one. Thanks for the feedback.


# Yuiry said on Wednesday, April 02, 2008 12:56 PM

You are an XML god.


# venom00 said on Tuesday, November 11, 2008 1:50 PM

Very good article. Just what I was looking for.

# venom00 said on Tuesday, November 11, 2008 5:07 PM

Wait! This isn't very good code! What if there's an XPath expression like this:

/element[condition='the/slash/is/a/problem']

# Glav said on Wednesday, November 12, 2008 1:06 AM

venom00, what happens is.... you get to fix it :-)

# alhambraeidos said on Thursday, May 28, 2009 2:22 AM

How can I get all the code?

What is PREDEFINED_ROOT_ELEMENT?

Thanks

# Tushar said on Thursday, June 18, 2009 9:53 AM

Excellent article!

You are probably not aware how much pain this was causing in my code. Now, with your article, it's resolved.

But I wonder why Microsoft does not highlight such caveats clearly; I almost burned my eyes reading through MSDN.

Thanks!
