Xml Streaming Events: simple streaming Xml handling (and changing) at work.
I presented XSE (Xml Streaming Events) in a previous post. In this post I will show some examples of what can be accomplished with it in a streaming (and therefore performant) way.
IMPORTANT NOTICE: for anyone not playing with Whidbey, all statements that look like delegate { ... //.net code ... } can be replaced with the usual v1.x new EventHandler( yourMethod ), where the method implements the code inside the braces.
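As a small illustration of that equivalence, here is a sketch that subscribes to the same event both ways. The Publisher and HandlerStyles types are hypothetical names made up for this example and are not part of XSE.

```csharp
using System;

// Hypothetical event source used only for this illustration.
public class Publisher
{
    public event EventHandler Changed;

    public void Raise()
    {
        EventHandler handler = Changed;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public static class HandlerStyles
{
    public static int Calls;

    // v1.x style: the handler body lives in a named method.
    private static void OnChanged(object sender, EventArgs e)
    {
        Calls++;
    }

    public static int Run()
    {
        Calls = 0;
        Publisher publisher = new Publisher();

        // v1.x: wrap the named method in an explicit delegate instance.
        publisher.Changed += new EventHandler(OnChanged);

        // Whidbey (v2): the same logic written inline as an anonymous method.
        publisher.Changed += delegate { Calls++; };

        publisher.Raise();
        return Calls; // both handlers ran once each
    }
}
```

Both subscriptions behave the same from the publisher's point of view; the anonymous method only saves declaring a separate method.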
Upgrading namespaces
Sometimes there's a need to perform some on-the-fly change
in a document stream. For example, imagine you have upgraded
your schemas to a different namespace (e.g. from
xmlns:kzu="http://kzu.aspnet2.com/2003/schematron"
to
xmlns:kzu="http://kzu.aspnet2.com/2004/schematron"). I know there are several discussions all over the web
(see
Dare's post,
David Orchard's, etc.) on schema versioning, and most (including me) agree
that changing the namespace name is not versioning at all.
It's a whole new schema. Aside from that, there are concrete
cases where this has happened and will happen, as well as
your own business requirements. Right now I can think of WXS
and SOAP as two concrete examples. With XSE, it can be
achieved easily at the reader level (that is, BEFORE you
even load a SoapMessage,
XPathDocument or whatever):
Note that the transformation feature is layered on top of
the base XseReader so that I only have to pay
the performance cost for what I use. If I don't need
modifications to the InfoSet, I don't have to pay for the
cost of checks for transformations. A document loaded with
this reader will see an infoset complying with the new
namespace. I can hand this reader to an
XmlValidatingReader and have it validated
against the new schema (remember there's a
known bug in v1.x validating reader that
prevents this, but it has been fixed in v2). Note that
because we're matching with a wildcard, this works at any
level in the document. For example, the following document:
Is upgraded as follows:
Note that a root-element namespace change alone is not enough. So, to achieve similar functionality today, the full document would have to be loaded as a string and run through find-and-replace. Again, full streaming support is a top priority for XSE.
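For readers who want to try the idea of a reader-level namespace upgrade without XSE, the following is a rough sketch built only on System.Xml. It is not XSE's API: NamespaceUpgradingReader and NamespaceUpgradeDemo are hypothetical names, and the approach (subclassing XmlTextReader and overriding NamespaceURI) is an assumption chosen for brevity.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of XSE: reports the old namespace as the new one.
public class NamespaceUpgradingReader : XmlTextReader
{
    public const string OldNs = "http://kzu.aspnet2.com/2003/schematron";
    public const string NewNs = "http://kzu.aspnet2.com/2004/schematron";

    public NamespaceUpgradingReader(TextReader input) : base(input) { }

    // Any element or attribute in the old namespace is reported in the new one.
    public override string NamespaceURI
    {
        get { return base.NamespaceURI == OldNs ? NewNs : base.NamespaceURI; }
    }

    // Namespace declarations (xmlns / xmlns:prefix attributes) are upgraded too
    // when the reader is positioned on the attribute itself.
    public override string Value
    {
        get
        {
            bool isNsDeclaration = NodeType == XmlNodeType.Attribute
                && (Prefix == "xmlns" || Name == "xmlns");
            return isNsDeclaration && base.Value == OldNs ? NewNs : base.Value;
        }
    }
}

public static class NamespaceUpgradeDemo
{
    // Streams through the document and records "localName=namespace" per element.
    public static List<string> ElementNamespaces(string xml)
    {
        List<string> result = new List<string>();
        using (NamespaceUpgradingReader reader = new NamespaceUpgradingReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    result.Add(reader.LocalName + "=" + reader.NamespaceURI);
            }
        }
        return result;
    }
}
```

Because the override is applied per node, the upgrade works at any depth of the document, not just on the root element. A consumer that reads attribute values through ReadAttributeValue() may still see the old declaration value, so treat this as a starting point rather than a complete solution.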
Simple element name transformations
Another common use case is simple name changes in a
document. For example, an incoming document may have a
<customer> element when you expect a
<person>, or an
<orderDate> when you need an
<ordered> element. XSE removes the
need for full document loading and XSLT stylesheet creation
and processing that would be required for such a simple
case:
Note that I changed the element name and namespace at the same time.
Simple content adaptation
In the above example, I showed changing a
<customer> element name and namespace to
the expected <person> one. Combined with
node skipping, I can adapt (sort of downgrade, in this case)
the former element to the desired representation. For
example, if the <customer> element
includes a <contact> child that our
<person> element doesn't expect, I can
simply skip it:
Transparent elements and namespaces
James Clark
has proposed what he calls Transparent namespaces in his
Namespace Routing Language (NRL)
proposal, which may make it into the
ISO/IEC 19757 Document Schema Definition Languages
(DSDL). He describes cases where it is useful to have an
element ignored from the stream, as if it didn't exist at
all, but without losing its content. This differs from the
XmlReader.Skip() method in that the latter also discards
the skipped element's children. One of his examples is
an XSLT stylesheet containing XHTML:
It's impossible to validate the XHTML against the corresponding schema, unless you modify it accordingly to include extension points all over the place. The proposed solution is to make the xsl:* elements transparent for the validation process, while retaining their children. James proposes this "transparentizing" at the namespace level. This can be easily achieved with XSE:
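To make the difference between skipping and transparency concrete without relying on XSE's API, the following sketch uses only plain System.Xml. TransparencyDemo and its Filter method are hypothetical names; the method copies a document while either skipping every element in a target namespace (children included) or making those elements transparent (tags dropped, children kept).

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper, not part of XSE.
public static class TransparencyDemo
{
    public static string Filter(string xml, string targetNs, bool transparent)
    {
        StringWriter output = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        // Removing a root element may leave several top-level nodes.
        settings.ConformanceLevel = ConformanceLevel.Fragment;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            reader.Read();
            while (!reader.EOF)
            {
                bool targeted = reader.NamespaceURI == targetNs;
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (targeted && !transparent)
                        {
                            // Skip(): the element AND its children disappear.
                            reader.Skip();
                            continue; // reader is already on the next node
                        }
                        if (!targeted)
                        {
                            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                            writer.WriteAttributes(reader, false);
                            if (reader.IsEmptyElement) writer.WriteEndElement();
                        }
                        // Transparent: the start tag is dropped, children still follow.
                        break;
                    case XmlNodeType.EndElement:
                        if (!targeted) writer.WriteEndElement();
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteString(reader.Value);
                        break;
                    // Comments, processing instructions, etc. are ignored in this sketch.
                }
                reader.Read();
            }
        }
        return output.ToString();
    }
}
```

With transparent set to true, a paragraph wrapped in an element from the target namespace survives in the output while the wrapper vanishes; with false, the paragraph is lost together with its wrapper.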
Note that I'm using a special wildcard supported by XSE. Wildcard options are:
- * : matches any element in any namespace. It is equivalent to *:*.
- *:item : matches an item with a LocalName="item", irrespective of namespace.
- kzu:* : matches any element in the namespace mapped to the "kzu" prefix by the XmlNamespaceManager.
- :* : matches any element with a NamespaceURI="". Note that this is not the same as *:* (first option).
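The four wildcard forms above can be expressed as a small matching function. This is only a sketch of the rules as listed, not XSE's actual implementation: WildcardMatcher is a hypothetical name, and rejecting a bare name other than * is an assumption, since the list does not say how such a pattern should behave.

```csharp
using System;
using System.Xml;

// Hypothetical helper, not part of XSE.
public static class WildcardMatcher
{
    public static bool Matches(string pattern, string localName,
        string namespaceUri, XmlNamespaceManager namespaces)
    {
        // "*" is shorthand for "*:*".
        if (pattern == "*") pattern = "*:*";

        int colon = pattern.IndexOf(':');
        if (colon < 0)
            throw new NotSupportedException("Only '*' and 'prefix:name' forms are handled in this sketch.");

        string prefix = pattern.Substring(0, colon);
        string name = pattern.Substring(colon + 1);

        bool namespaceOk;
        if (prefix == "*")
            namespaceOk = true;                       // any namespace
        else if (prefix == "")
            namespaceOk = namespaceUri == "";         // ":*" means no namespace
        else
            namespaceOk = namespaces.LookupNamespace(prefix) == namespaceUri;

        bool nameOk = name == "*" || name == localName;
        return namespaceOk && nameOk;
    }
}
```

An unmapped prefix makes LookupNamespace return null, so such a pattern simply never matches.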
Therefore, making elements transparent is supported at a
more granular level than that proposed by James. If a
document is loaded (or a ReadOuterXml()
is performed on the reader), the following infoset is seen:
And of course, as the implementation supports streaming scenarios, you can pass it to the next processing hop without ever loading the entire stream. Another example is processing the body of a SOAP message:
Handing this reader to the processing phase results in
only the contents of the soap:Body being seen.
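For comparison, a similar effect can be approximated with the stock XmlReader by positioning on the Body element and handing out a subtree reader. This sketch is not XSE's API; SoapBodyDemo is a hypothetical name and the SOAP 1.1 envelope namespace is assumed.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of XSE.
public static class SoapBodyDemo
{
    // SOAP 1.1 envelope namespace (assumed for this example).
    public const string SoapNs = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns the local names of the direct children of soap:Body,
    // reading forward only and never loading the whole envelope.
    public static List<string> BodyChildren(string envelopeXml)
    {
        List<string> names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(envelopeXml)))
        {
            reader.MoveToContent(); // position on soap:Envelope
            if (!reader.ReadToDescendant("Body", SoapNs)) return names;

            // The subtree reader exposes Body at depth 0 and its children at depth 1.
            using (XmlReader body = reader.ReadSubtree())
            {
                while (body.Read())
                {
                    if (body.NodeType == XmlNodeType.Element && body.Depth == 1)
                        names.Add(body.LocalName);
                }
            }
        }
        return names;
    }
}
```

Unlike the transparent approach, the subtree reader still exposes the Body element itself, so the consumer has to step past it.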
Skip irrelevant content
Finally, if we're processing XML with mixed namespaces, it may be the case that our application only cares about elements from our own namespace. In such cases, loading irrelevant nodes in a document is a clear waste of resources. We can choose to make those other nodes transparent or skip them altogether:
I still have to decide what syntax would be the most convenient way to say "match everything that is NOT in this namespace". Options I can think of are:
- ^kzu:*
- !kzu:*
- Create another strategy factory that interprets the matches as negative assertions instead of positive ones, e.g.: IMatchStrategy nonblank = new NegativeRelativePath().Create(":*"); Instead of matching anything with a blank namespace, this would match anything with a non-blank NamespaceURI.
As usual, I look forward to your feedback as I finish setting up the open source project for this.