April 2004 - Posts
Note: this entry has moved.
As Dare noticed, this is the month of the XmlReader. Here's a new player: the XPathIteratorReader. This time, the scenario is the following: you have an XPathDocument or XPathNavigator, and need to get a reader containing a subset resulting from an XPath query. For example, you may need to return all feed items from an XML file that contain a certain word in the title:
public XmlReader GetFeedsContaining(string theWord)
{
    XPathDocument doc = new XPathDocument(theFeed);
    XPathNodeIterator it = doc.CreateNavigator().Select(
        "/rss/channel/item[contains(title,'" + theWord + "')]");
    return new XPathIteratorReader(it);
}
Just like the XmlFragmentStream, this reader fakes a root node for the iterator. Additional constructor overloads allow you to change the default root node, which is <root>:
public XPathIteratorReader(XPathNodeIterator iterator, string rootName)
public XPathIteratorReader(XPathNodeIterator iterator, string rootName, string ns)
This reader also inherits from XmlTextReader, using the same technique as the XPathNavigatorReader. This means you can validate subsets of nodes against selective XML Schemas too. It also implements IXmlSerializable, so you can, for example, return this subset of nodes directly from a web service.
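One caveat about the GetFeedsContaining query above: it concatenates theWord straight into the XPath expression, so a word containing an apostrophe (say, "Don't") would produce an invalid expression. Here is a minimal sketch of one way to guard against that. The ToXPathLiteral helper and the sample feed are my own illustration, not part of the Mvp.Xml project; the sketch relies only on the standard System.Xml.XPath classes.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.XPath;

public static class XPathLiteral
{
    // Hypothetical helper: turns an arbitrary string into an XPath 1.0 string literal.
    public static string ToXPathLiteral(string value)
    {
        // No apostrophes: a single-quoted literal is enough.
        if (value.IndexOf('\'') < 0) return "'" + value + "'";
        // Apostrophes but no double quotes: use a double-quoted literal.
        if (value.IndexOf('"') < 0) return "\"" + value + "\"";
        // Both kinds of quotes: glue the pieces together with concat().
        StringBuilder sb = new StringBuilder("concat(");
        string[] parts = value.Split('\'');
        for (int i = 0; i < parts.Length; i++)
        {
            if (i > 0) sb.Append(", \"'\", ");
            sb.Append("'").Append(parts[i]).Append("'");
        }
        return sb.Append(")").ToString();
    }

    public static void Main()
    {
        // Made-up feed, just for the demonstration.
        string xml = "<rss><channel>" +
            "<item><title>Don't panic</title></item>" +
            "<item><title>Other</title></item>" +
            "</channel></rss>";
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        string query = "/rss/channel/item[contains(title," + ToXPathLiteral("Don't") + ")]";
        XPathNodeIterator it = doc.CreateNavigator().Select(query);
        Console.WriteLine(it.Count); // prints 1
    }
}
```

The resulting iterator can then be handed to the XPathIteratorReader constructor exactly as before.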
Subsets can be written to disk easily using the XmlWriter.WriteNode() method:
public void SaveFeedsContaining(string theWord, string toFile)
{
    XPathDocument doc = new XPathDocument(theFeed);
    XPathNodeIterator it = doc.CreateNavigator().Select(
        "/rss/channel/item[contains(title,'" + theWord + "')]");
    using (StreamWriter sw = new StreamWriter(toFile, false))
    {
        XmlTextWriter tw = new XmlTextWriter(sw);
        tw.WriteNode(new XPathIteratorReader(it), false);
        tw.Close();
    }
}
There are a couple of interesting things inside this class:
- It leverages the XPathNavigatorReader for each item in the iterator. So it basically passes through property and method calls to it.
- Depth is increased by one all the time, except for the faked root element.
- Instead of having ifs on all XmlTextReader overrides checking whether it's at the faked root or not, I decided to go for the more elegant approach of creating a FakedRootReader class. So the code in XPathIteratorReader becomes drastically simpler. It's mostly passing calls down to the current reader and that's it. Therefore, the only branching code exists in the Read method, and it's really trivial, basically checking the current ReadState and creating the FakedRootReader if necessary:
public override bool Read()
{
    // Return fast if state is not appropriate.
    if (_current.ReadState == ReadState.Closed || _current.ReadState == ReadState.EndOfFile)
        return false;
    bool read = _current.Read();
    if (!read)
    {
        read = _iterator.MoveNext();
        if (read)
        {
            // Just move to the next node and create the reader.
            _current = new XPathNavigatorReader(_iterator.Current);
            return _current.Read();
        }
        else
        {
            if (_current is FakedRootReader && _current.NodeType == XmlNodeType.EndElement)
            {
                // We're done!
                return false;
            }
            else
            {
                // We read all nodes in the iterator. Return to faked root end element.
                _current = new FakedRootReader(_rootname.Name, _rootname.Namespace, XmlNodeType.EndElement);
                return true;
            }
        }
    }
    return read;
}
- The IXmlSerializable implementation uses the following trick: it loads the incoming document, moves to the root and makes this the new "faked" root, and uses an iterator over all root node children as its new internal state :D. Here it is:
void IXmlSerializable.ReadXml(XmlReader reader)
{
    XPathDocument doc = new XPathDocument(reader);
    XPathNavigator nav = doc.CreateNavigator();
    // Pull the faked root out.
    nav.MoveToFirstChild();
    _rootname = new XmlQualifiedName(nav.LocalName, nav.NamespaceURI);
    // Get iterator for all child nodes.
    _iterator = nav.SelectChildren(XPathNodeType.All);
}
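To see what that ReadXml trick gives you, here is a small stand-alone sketch that performs the same navigator steps on a made-up document, using only the standard System.Xml classes. The ReadXmlSketch class, its method name and the sample XML are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class ReadXmlSketch
{
    // Mirrors the steps of ReadXml: peel off the root element, then iterate its children.
    public static int CountChildrenOfRoot(string xml, out XmlQualifiedName rootName)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        // The navigator starts at the document root node; move to the root element.
        nav.MoveToFirstChild();
        rootName = new XmlQualifiedName(nav.LocalName, nav.NamespaceURI);
        // All child nodes of the root element become the new iterator.
        XPathNodeIterator children = nav.SelectChildren(XPathNodeType.All);
        return children.Count;
    }

    public static void Main()
    {
        XmlQualifiedName rootName;
        int count = CountChildrenOfRoot("<root><item>a</item><item>b</item></root>", out rootName);
        Console.WriteLine(rootName.Name); // prints root
        Console.WriteLine(count);         // prints 2
    }
}
```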
As usual, if you just want the full class code to copy-paste into your project, here it is. As usual too, though, I strongly encourage you to take a look at the Mvp.Xml project ;)
#region using
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;
using System.Xml.XPath;
#endregion using
namespace Mvp.Xml.XPath
{
    /// <summary>
    /// Provides an <see cref="XmlReader"/> over an
    /// <see cref="XPathNodeIterator"/>.
    /// </summary>
    /// <remarks>
    /// The reader exposes a new root element enclosing all navigators from the
    /// iterator. This root node is configured in the constructor, by
    /// passing the desired name and optional namespace for it.
    /// <para>Author: Daniel Cazzulino, kzu@aspnet2.com</para>
    /// See: http://weblogs.asp.net/cazzu/archive/2004/04/26/120684.aspx
    /// </remarks>
    public class XPathIteratorReader : XmlTextReader, IXmlSerializable
    {
        #region Fields
        // Holds the current child being read.
        XmlReader _current;
        // Holds the iterator passed to the ctor.
        XPathNodeIterator _iterator;
        // The name for the root element.
        XmlQualifiedName _rootname;
        #endregion Fields

        #region Ctor
        /// <summary>
        /// Parameterless constructor for XML serialization.
        /// </summary>
        /// <remarks>Supports the .NET serialization infrastructure. Don't use this
        /// constructor in your regular application.</remarks>
        [System.ComponentModel.EditorBrowsable(System.ComponentModel.EditorBrowsableState.Never)]
        public XPathIteratorReader()
        {
        }

        /// <summary>
        /// Initializes the reader, using the default <root> element.
        /// </summary>
        /// <param name="iterator">The iterator to expose as a single reader.</param>
        public XPathIteratorReader(XPathNodeIterator iterator)
            : this(iterator, "root", String.Empty)
        {
        }

        /// <summary>
        /// Initializes the reader.
        /// </summary>
        /// <param name="iterator">The iterator to expose as a single reader.</param>
        /// <param name="rootName">The name to use for the enclosing root element.</param>
        public XPathIteratorReader(XPathNodeIterator iterator, string rootName)
            : this(iterator, rootName, String.Empty)
        {
        }

        /// <summary>
        /// Initializes the reader.
        /// </summary>
        /// <param name="iterator">The iterator to expose as a single reader.</param>
        /// <param name="rootName">The name to use for the enclosing root element.</param>
        /// <param name="ns">The namespace URI of the root element.</param>
        public XPathIteratorReader(XPathNodeIterator iterator, string rootName, string ns)
            : base(new StringReader(String.Empty))
        {
            _iterator = iterator.Clone();
            _current = new FakedRootReader(rootName, ns, XmlNodeType.Element);
            _rootname = new XmlQualifiedName(rootName, ns);
        }
        #endregion Ctor

        #region Private members
        /// <summary>
        /// Returns the XML representation of the current node and all its children.
        /// </summary>
        private string Serialize()
        {
            StringWriter sw = new StringWriter();
            XmlTextWriter tw = new XmlTextWriter(sw);
            tw.WriteNode(this, false);
            sw.Flush();
            return sw.ToString();
        }
        #endregion Private members

        #region Properties
        /// <summary>See <see cref="XmlReader.AttributeCount"/></summary>
        public override int AttributeCount { get { return _current.AttributeCount; } }
        /// <summary>See <see cref="XmlReader.BaseURI"/></summary>
        public override string BaseURI { get { return _current.BaseURI; } }
        /// <summary>See <see cref="XmlReader.Depth"/></summary>
        public override int Depth { get { return _current.Depth + 1; } }
        /// <summary>See <see cref="XmlReader.EOF"/></summary>
        public override bool EOF
        {
            get { return _current.ReadState == ReadState.EndOfFile || _current.ReadState == ReadState.Closed; }
        }
        /// <summary>See <see cref="XmlReader.HasValue"/></summary>
        public override bool HasValue { get { return _current.HasValue; } }
        /// <summary>See <see cref="XmlReader.IsDefault"/></summary>
        public override bool IsDefault { get { return false; } }
        /// <summary>See <see cref="XmlReader.IsEmptyElement"/></summary>
        public override bool IsEmptyElement { get { return _current.IsEmptyElement; } }
        /// <summary>See <see cref="XmlReader.this"/></summary>
        public override string this[string name, string ns] { get { return _current[name, ns]; } }
        /// <summary>See <see cref="XmlReader.this"/></summary>
        public override string this[string name] { get { return _current[name, String.Empty]; } }
        /// <summary>See <see cref="XmlReader.this"/></summary>
        public override string this[int i] { get { return _current[i]; } }
        /// <summary>See <see cref="XmlReader.LocalName"/></summary>
        public override string LocalName { get { return _current.LocalName; } }
        /// <summary>See <see cref="XmlReader.Name"/></summary>
        public override string Name { get { return _current.Name; } }
        /// <summary>See <see cref="XmlReader.NamespaceURI"/></summary>
        public override string NamespaceURI { get { return _current.NamespaceURI; } }
        /// <summary>See <see cref="XmlReader.NameTable"/></summary>
        public override XmlNameTable NameTable { get { return _current.NameTable; } }
        /// <summary>See <see cref="XmlReader.NodeType"/></summary>
        public override XmlNodeType NodeType { get { return _current.NodeType; } }
        /// <summary>See <see cref="XmlReader.Prefix"/></summary>
        public override string Prefix { get { return _current.Prefix; } }
        /// <summary>See <see cref="XmlReader.QuoteChar"/></summary>
        public override char QuoteChar { get { return _current.QuoteChar; } }
        /// <summary>See <see cref="XmlReader.ReadState"/></summary>
        public override ReadState ReadState { get { return _current.ReadState; } }
        /// <summary>See <see cref="XmlReader.Value"/></summary>
        public override string Value { get { return _current.Value; } }
        /// <summary>See <see cref="XmlReader.XmlLang"/></summary>
        public override string XmlLang { get { return _current.XmlLang; } }
        /// <summary>See <see cref="XmlReader.XmlSpace"/></summary>
        public override XmlSpace XmlSpace { get { return XmlSpace.Default; } }
        #endregion Properties

        #region Methods
        /// <summary>See <see cref="XmlReader.Close"/></summary>
        public override void Close()
        {
            _current.Close();
        }

        /// <summary>See <see cref="XmlReader.GetAttribute"/></summary>
        public override string GetAttribute(string name, string ns)
        {
            return _current.GetAttribute(name, ns);
        }

        /// <summary>See <see cref="XmlReader.GetAttribute"/></summary>
        public override string GetAttribute(string name)
        {
            return _current.GetAttribute(name);
        }

        /// <summary>See <see cref="XmlReader.GetAttribute"/></summary>
        public override string GetAttribute(int i)
        {
            return _current.GetAttribute(i);
        }

        /// <summary>See <see cref="XmlReader.LookupNamespace"/></summary>
        public override string LookupNamespace(string prefix)
        {
            return _current.LookupNamespace(prefix);
        }

        /// <summary>See <see cref="XmlReader.MoveToAttribute"/></summary>
        public override bool MoveToAttribute(string name, string ns)
        {
            return _current.MoveToAttribute(name, ns);
        }

        /// <summary>See <see cref="XmlReader.MoveToAttribute"/></summary>
        public override bool MoveToAttribute(string name)
        {
            return _current.MoveToAttribute(name);
        }

        /// <summary>See <see cref="XmlReader.MoveToAttribute"/></summary>
        public override void MoveToAttribute(int i)
        {
            _current.MoveToAttribute(i);
        }

        /// <summary>See <see cref="XmlReader.MoveToContent"/></summary>
        public override XmlNodeType MoveToContent()
        {
            return base.MoveToContent();
        }

        /// <summary>See <see cref="XmlReader.MoveToElement"/></summary>
        public override bool MoveToElement()
        {
            return _current.MoveToElement();
        }

        /// <summary>See <see cref="XmlReader.MoveToFirstAttribute"/></summary>
        public override bool MoveToFirstAttribute()
        {
            return _current.MoveToFirstAttribute();
        }

        /// <summary>See <see cref="XmlReader.MoveToNextAttribute"/></summary>
        public override bool MoveToNextAttribute()
        {
            return _current.MoveToNextAttribute();
        }

        /// <summary>See <see cref="XmlReader.Read"/></summary>
        public override bool Read()
        {
            // Return fast if state is not appropriate.
            if (_current.ReadState == ReadState.Closed || _current.ReadState == ReadState.EndOfFile)
                return false;
            bool read = _current.Read();
            if (!read)
            {
                read = _iterator.MoveNext();
                if (read)
                {
                    // Just move to the next node and create the reader.
                    _current = new XPathNavigatorReader(_iterator.Current);
                    return _current.Read();
                }
                else
                {
                    if (_current is FakedRootReader && _current.NodeType == XmlNodeType.EndElement)
                    {
                        // We're done!
                        return false;
                    }
                    else
                    {
                        // We read all nodes in the iterator. Return to faked root end element.
                        _current = new FakedRootReader(_rootname.Name, _rootname.Namespace, XmlNodeType.EndElement);
                        return true;
                    }
                }
            }
            return read;
        }

        /// <summary>See <see cref="XmlReader.ReadAttributeValue"/></summary>
        public override bool ReadAttributeValue()
        {
            return _current.ReadAttributeValue();
        }

        /// <summary>See <see cref="XmlReader.ReadInnerXml"/></summary>
        public override string ReadInnerXml()
        {
            if (this.Read()) return Serialize();
            return String.Empty;
        }

        /// <summary>See <see cref="XmlReader.ReadOuterXml"/></summary>
        public override string ReadOuterXml()
        {
            if (_current.ReadState != ReadState.Interactive) return String.Empty;
            return Serialize();
        }

        /// <summary>See <see cref="XmlReader.ResolveEntity"/></summary>
        public override void ResolveEntity()
        {
            // Not supported.
        }
        #endregion Methods

        #region IXmlSerializable Members
        void IXmlSerializable.WriteXml(XmlWriter writer)
        {
            writer.WriteNode(this, false);
        }

        System.Xml.Schema.XmlSchema IXmlSerializable.GetSchema()
        {
            return null;
        }

        void IXmlSerializable.ReadXml(XmlReader reader)
        {
            XPathDocument doc = new XPathDocument(reader);
            XPathNavigator nav = doc.CreateNavigator();
            // Pull the faked root out.
            nav.MoveToFirstChild();
            _rootname = new XmlQualifiedName(nav.LocalName, nav.NamespaceURI);
            // Get iterator for all child nodes.
            _iterator = nav.SelectChildren(XPathNodeType.All);
        }
        #endregion

        #region Internal classes
        #region FakedRootReader
        private class FakedRootReader : XmlReader
        {
            public FakedRootReader(string name, string ns, XmlNodeType nodeType)
            {
                _name = name;
                _namespace = ns;
                _nodetype = nodeType;
                _state = nodeType == XmlNodeType.Element ?
                    ReadState.Initial : ReadState.Interactive;
            }

            #region Properties
            /// <summary>See <see cref="XmlReader.AttributeCount"/></summary>
            public override int AttributeCount { get { return 0; } }
            /// <summary>See <see cref="XmlReader.BaseURI"/></summary>
            public override string BaseURI { get { return String.Empty; } }
            /// <summary>See <see cref="XmlReader.Depth"/></summary>
            public override int Depth
            {
                // Undo the depth increment of the outer reader.
                get { return -1; }
            }
            /// <summary>See <see cref="XmlReader.EOF"/></summary>
            public override bool EOF { get { return _state == ReadState.EndOfFile; } }
            /// <summary>See <see cref="XmlReader.HasValue"/></summary>
            public override bool HasValue { get { return false; } }
            /// <summary>See <see cref="XmlReader.IsDefault"/></summary>
            public override bool IsDefault { get { return false; } }
            /// <summary>See <see cref="XmlReader.IsEmptyElement"/></summary>
            public override bool IsEmptyElement { get { return false; } }
            /// <summary>See <see cref="XmlReader.this"/></summary>
            public override string this[string name, string ns] { get { return null; } }
            /// <summary>See <see cref="XmlReader.this"/></summary>
            public override string this[string name] { get { return null; } }
            /// <summary>See <see cref="XmlReader.this"/></summary>
            public override string this[int i] { get { return null; } }
            /// <summary>See <see cref="XmlReader.LocalName"/></summary>
            public override string LocalName { get { return _name; } }
            string _name;
            /// <summary>See <see cref="XmlReader.Name"/></summary>
            public override string Name { get { return _name; } }
            /// <summary>See <see cref="XmlReader.NamespaceURI"/></summary>
            public override string NamespaceURI { get { return _namespace; } }
            string _namespace;
            /// <summary>See <see cref="XmlReader.NameTable"/></summary>
            public override XmlNameTable NameTable { get { return null; } }
            /// <summary>See <see cref="XmlReader.NodeType"/></summary>
            public override XmlNodeType NodeType
            {
                get { return _state == ReadState.Initial ? XmlNodeType.None : _nodetype; }
            }
            XmlNodeType _nodetype;
            /// <summary>See <see cref="XmlReader.Prefix"/></summary>
            public override string Prefix { get { return String.Empty; } }
            /// <summary>See <see cref="XmlReader.QuoteChar"/></summary>
            public override char QuoteChar { get { return '"'; } }
            /// <summary>See <see cref="XmlReader.ReadState"/></summary>
            public override ReadState ReadState { get { return _state; } }
            ReadState _state;
            /// <summary>See <see cref="XmlReader.Value"/></summary>
            public override string Value { get { return String.Empty; } }
            /// <summary>See <see cref="XmlReader.XmlLang"/></summary>
            public override string XmlLang { get { return String.Empty; } }
            /// <summary>See <see cref="XmlReader.XmlSpace"/></summary>
            public override XmlSpace XmlSpace { get { return XmlSpace.Default; } }
            #endregion Properties

            #region Methods
            /// <summary>See <see cref="XmlReader.Close"/></summary>
            public override void Close()
            {
                _state = ReadState.Closed;
            }

            /// <summary>See <see cref="XmlReader.GetAttribute"/></summary>
            public override string GetAttribute(string name, string ns)
            {
                return null;
            }

            /// <summary>See <see cref="XmlReader.GetAttribute"/></summary>
            public override string GetAttribute(string name)
            {
                return null;
            }

            /// <summary>See <see cref="XmlReader.GetAttribute"/></summary>
            public override string GetAttribute(int i)
            {
                return null;
            }

            /// <summary>See <see cref="XmlReader.LookupNamespace"/></summary>
            public override string LookupNamespace(string prefix)
            {
                return null;
            }

            /// <summary>See <see cref="XmlReader.MoveToAttribute"/></summary>
            public override bool MoveToAttribute(string name, string ns)
            {
                return false;
            }

            /// <summary>See <see cref="XmlReader.MoveToAttribute"/></summary>
            public override bool MoveToAttribute(string name)
            {
                return false;
            }

            /// <summary>See <see cref="XmlReader.MoveToAttribute"/></summary>
            public override void MoveToAttribute(int i)
            {
            }

            public override XmlNodeType MoveToContent()
            {
                if (_state == ReadState.Initial)
                    _state = ReadState.Interactive;
                return _nodetype;
            }

            /// <summary>See <see cref="XmlReader.MoveToElement"/></summary>
            public override bool MoveToElement()
            {
                return false;
            }

            /// <summary>See <see cref="XmlReader.MoveToFirstAttribute"/></summary>
            public override bool MoveToFirstAttribute()
            {
                return false;
            }

            /// <summary>See <see cref="XmlReader.MoveToNextAttribute"/></summary>
            public override bool MoveToNextAttribute()
            {
                return false;
            }

            /// <summary>See <see cref="XmlReader.Read"/></summary>
            public override bool Read()
            {
                if (_state == ReadState.Initial)
                {
                    _state = ReadState.Interactive;
                    return true;
                }
                if (_state == ReadState.Interactive && _nodetype == XmlNodeType.EndElement)
                {
                    _state = ReadState.EndOfFile;
                    return false;
                }
                return false;
            }

            /// <summary>See <see cref="XmlReader.ReadAttributeValue"/></summary>
            public override bool ReadAttributeValue()
            {
                return false;
            }

            /// <summary>See <see cref="XmlReader.ReadInnerXml"/></summary>
            public override string ReadInnerXml()
            {
                return String.Empty;
            }

            /// <summary>See <see cref="XmlReader.ReadOuterXml"/></summary>
            public override string ReadOuterXml()
            {
                return String.Empty;
            }

            /// <summary>See <see cref="XmlReader.ResolveEntity"/></summary>
            public override void ResolveEntity()
            {
                // Not supported.
            }
            #endregion Methods
        }
        #endregion FakedRootReader
        #endregion Internal classes
    }
}
The full Mvp.Xml project source code can be downloaded from SourceForge.
Enjoy and please give us feedback on the project!
Check out the Roadmap to high performance XML.
Last week I wrote a post about the XPathNavigatorReader. That post has some XML showing the output of a webservice, output generated from the Pubs sample database. As you know, there's a title table. Well, as I already said in a previous post, the <xmp> element is a useful one to me: it's like <pre> but you can have markup that is not parsed (think C# XML documentation tags...). As it's not parsed, you don't need to escape anything. Cool, IMO.
So, the following code showed up in that post, *unescaped* inside that xmp element:
<XPathNavigatorReader>
<titles>
<title_id>BU2075</title_id>
<title>You Can Combat Computer Stress!</title>
...
Due to a very ugly bug in Technorati, I had to escape tags in that post. Why? Because they are parsing the whole page (not just the <head> section) searching for a <title> element to use as the weblog title, and in my case, they found the one in my XML snippet from Pubs!!! Hence, they think this is the title of my weblog :(
Bottom line: escape content to avoid buggy sniffers... Damn, I was so happy to avoid harmful escaped markup...
Special thanks to my friend and partner VGA for warning me about this!
I've just checked the stats of our Mvp.Xml project. Pretty good news for a project born only two months ago:
Page views: 1,421
Downloads: 243
Over the last 30 days we've had a steady rate of increase of 200% a day in page views, and 150% in downloads.
Back at the 2004 MVP Global Summit, I met fellow XML fan Kirk, who was seeking a solution to the following problem: you have a file of several megabytes containing multiple XML fragments, and you want to read it (in his case, specifically through the SgmlReader). The problem is, of course, that the XmlTextReader will throw an exception as soon as it finds the second fragment, unless you use the special constructor overload that takes an XmlParserContext. Dare shows an alternate solution based on XML inclusion techniques, either DTD external entities or XInclude.
These techniques effectively expose a fully well-formed document to your application, which has a number of benefits, including the ability to transform it if you need to. But I was thinking more along the lines of providing a class that could actually read the fragments without resorting to those mechanisms. I couldn't cheat the XmlTextReader, so I decided to go one step lower. The result is the XmlFragmentStream, a class that wraps any System.IO.Stream and fakes the missing root element, so that an XmlTextReader layered on top of it will think the document is well-formed. Here's how to use it:
Given the following XML fragments:
127.0.0.1 GET ... 127.0.0.1 POST ...
... You can read (and even validate with an XmlValidatingReader) using this code:
using (Stream stm = File.OpenRead("events.xml"))
{
    XmlTextReader tr = new XmlTextReader(new XmlFragmentStream(stm));
    // Atomize the name so we can do a fast reference comparison.
    object ev = tr.NameTable.Add("event");
    while (tr.Read())
    {
        // Cast to object to compare references instead of string contents.
        if ((object) tr.LocalName == ev)
        {
            // Process it!
        }
    }
}
The XmlFragmentStream class also contains two constructor overloads that allow you to specify the name and namespace of the enclosing root element (by default <root>):
public XmlFragmentStream(Stream innerStream, string rootName)
public XmlFragmentStream(Stream innerStream, string rootName, string namespaceURI)
This technique has been proven by a real-world (surely happy) customer Kirk helped ;). What's more, he even contributed a fix for a bug he found when using it.
The performance impact of this approach is negligible because the class is basically an intermediary with minimal processing.
As Oleg pointed out in a comment (which motivated a slight edit to this post), and also showed in his weblog, you can do this with the aforementioned special XmlTextReader constructor overload, passing an XmlParserContext. This is more cumbersome, in my opinion, and still leaves you with the problem of not having a valid XML document.
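For comparison, here is a rough sketch of that XmlParserContext route, using only the standard System.Xml classes. The FragmentReadingSketch class, the CountEvents method and the two sample fragments are assumptions made up for illustration; only the event element name comes from the example above.

```csharp
using System;
using System.Xml;

public static class FragmentReadingSketch
{
    // Counts <event> elements in a string holding several root-less fragments.
    public static int CountEvents(string fragments)
    {
        // An empty parser context is enough when the fragments use no prefixes.
        XmlParserContext context = new XmlParserContext(null, null, null, XmlSpace.None);
        // XmlNodeType.Element tells the reader to parse the input as element content.
        XmlTextReader tr = new XmlTextReader(fragments, XmlNodeType.Element, context);
        int count = 0;
        while (tr.Read())
        {
            if (tr.NodeType == XmlNodeType.Element && tr.LocalName == "event")
                count++;
        }
        tr.Close();
        return count;
    }

    public static void Main()
    {
        // Made-up sample data.
        Console.WriteLine(CountEvents("<event>first</event><event>second</event>")); // prints 2
    }
}
```

Note that the reader never reports an enclosing root element here, which is the "not a valid XML document" limitation mentioned above.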
As usual, if you just want the full class code to copy-paste into your project, here it is. I strongly encourage you to take a look at the Mvp.Xml project, as there are other really cool goodies there!
#region using
using System;
using System.IO;
using System.Text;
#endregion using
namespace Mvp.Xml.Common
{
/// <summary>
/// Allows streams without a root element (i.e. multiple document
/// fragments) to be passed to an <see cref="System.Xml.XmlReader"/>.
/// </summary>
/// <remarks>A faked root element is added at the stream
/// level to enclose the fragments, which can be customized
/// using the overloaded constructors.
/// <para>Author: Daniel Cazzulino, kzu@aspnet2.com</para>
/// See: http://weblogs.asp.net/cazzu/archive/2004/04/23/119263.aspx.
/// </remarks>
public class XmlFragmentStream : Stream
{
#region Fields
// Holds the inner stream with the XML fragments.
Stream _stream;
bool _first = true;
bool _done = false;
bool _eof = false;
// TODO: there's a potential encoding issue here.
byte[] _rootstart = UTF8Encoding.UTF8.GetBytes("<root>");
byte[] _rootend = UTF8Encoding.UTF8.GetBytes("</root>");
int _endidx = -1;
#endregion Fields
#region Ctors
/// <summary>
/// Initializes the class with the underlying stream to use, and
/// uses the default <root> container element.
/// </summary>
/// <param name="innerStream">The stream to read from.</param>
public XmlFragmentStream(Stream innerStream)
{
if (innerStream == null)
throw new ArgumentNullException("innerStream");
_stream = innerStream;
}
/// <summary>
/// Initializes the class with the underlying stream to use, with
/// a custom root element.
/// </summary>
/// <param name="innerStream">The stream to read from.</param>
/// <param name="rootName">Custom root element name to use.</param>
public XmlFragmentStream(Stream innerStream, string rootName) : this (innerStream)
{
_rootstart = UTF8Encoding.UTF8.GetBytes("<" + rootName + ">");
_rootend = UTF8Encoding.UTF8.GetBytes("</" + rootName + ">");
}
/// <summary>
/// Initializes the class with the underlying stream to use, with
/// a custom root element.
/// </summary>
/// <param name="innerStream">The stream to read from.</param>
/// <param name="rootName">Custom root element name to use.</param>
/// <param name="ns">The namespace of the root element.</param>
public XmlFragmentStream(Stream innerStream, string rootName, string ns) : this (innerStream)
{
_rootstart = UTF8Encoding.UTF8.GetBytes("<" + rootName + " xmlns=\"" + ns + "\">");
_rootend = UTF8Encoding.UTF8.GetBytes("</" + rootName + ">");
}
#endregion Ctors
#region Stream abstract implementation
/// <summary>See <see cref="Stream.Flush"/>.</summary>
public override void Flush()
{
_stream.Flush();
}
/// <summary>See <see cref="Stream.Seek"/>.</summary>
public override long Seek(long offset, SeekOrigin origin)
{
return _stream.Seek(offset, origin);
}
/// <summary>See <see cref="Stream.SetLength"/>.</summary>
public override void SetLength(long value)
{
_stream.SetLength(value);
}
/// <summary>See <see cref="Stream.Write"/>.</summary>
public override void Write(byte[] buffer, int offset, int count)
{
_stream.Write(buffer, offset, count);
}
/// <summary>See <see cref="Stream.CanRead"/>.</summary>
public override bool CanRead { get { return _stream.CanRead; } }
/// <summary>See <see cref="Stream.CanSeek"/>.</summary>
public override bool CanSeek { get { return _stream.CanSeek; } }
/// <summary>See <see cref="Stream.CanWrite"/>.</summary>
public override bool CanWrite { get { return _stream.CanWrite; } }
/// <summary>See <see cref="Stream.Length"/>.</summary>
public override long Length { get { return _stream.Length; } }
/// <summary>See <see cref="Stream.Position"/>.</summary>
public override long Position
{
get { return _stream.Position; }
set { _stream.Position = value; }
}
#endregion Stream abstract implementation
#region Read method
/// <summary>See <see cref="Stream.Read"/>.</summary>
public override int Read(byte[] buffer, int offset, int count)
{
    if (_done)
    {
        if (!_eof)
        {
            _eof = true;
            return 0;
        }
        else
        {
            throw new System.IO.EndOfStreamException(SR.GetString(SR.XmlFragmentStream_EOF));
        }
    }
    if (count == 0) return 0;
    // If this is the first one, return the wrapper root element.
    if (_first)
    {
        // Assumes count >= _rootstart.Length, which holds for the
        // buffer sizes XML readers typically request.
        _rootstart.CopyTo(buffer, offset);
        int read = _stream.Read(buffer, offset + _rootstart.Length, count - _rootstart.Length);
        _first = false;
        // Report the bytes actually written, not the requested count.
        return _rootstart.Length + read;
    }
    // Keep reading the inner stream until it is exhausted.
    if (_endidx == -1)
    {
        int ret = _stream.Read(buffer, offset, count);
        // A short read is not the end; only zero bytes means we reached it.
        if (ret > 0) return ret;
        _endidx = 0;
    }
    // Emit the (remaining part of the) closing wrapper root element.
    int pending = Math.Min(count, _rootend.Length - _endidx);
    Array.Copy(_rootend, _endidx, buffer, offset, pending);
    _endidx += pending;
    if (_endidx == _rootend.Length) _done = true;
    return pending;
}
#endregion Read method
}
}
The full Mvp.Xml project source code can be downloaded from SourceForge.
Enjoy and please give us feedback on the project!
Check out the Roadmap to high performance XML.
Oleg's IndexingXPathNavigator is now part of the open source Mvp.Xml project. Another good addition to the package...
The full project source code can be downloaded from SourceForge
Check out the Roadmap to high performance XML.
There are many reasons why developers don't use the XPathDocument and XPathNavigator APIs and resort to XmlDocument instead. I outlined some of them with regard to querying functionality in my posts about how to take advantage of XPath expression precompilation, and How to get an XmlNodeList from an XPathNodeIterator (reloaded).
XPathNavigator is a far superior way of accessing and querying data because it offers built-in support for XPath querying independently of the store (so every store gains the feature automatically) and, more importantly, because it abstracts the underlying storage mechanism, which allows multiple data formats to be accessed consistently. The XML WebData team has seriously optimized the internal storage of XPathDocument, which results in important improvements in loading time, memory footprint and general performance. This was possible because the underlying store is completely hidden from the developer behind the XPathNavigator class; therefore, even the most drastic change in internal representation does not affect current applications.
However, some useful features of the XmlDocument and XmlReader classes are not available. To bridge that gap, I've created an XmlReader facade over the XPathNavigator class, which allows you to work against either a streaming or a cursor API. I'll discuss how the missing features are enabled by the use of the new XPathNavigatorReader class, part of the open source Mvp.Xml project.
Examples use an XML document with the structure of the Pubs database.
Serialization as XML
Both the XmlDocument (more properly, the XmlNode) and the XmlReader offer built-in support to get a raw string representing the entire content of any node. XmlNode exposes InnerXml and OuterXml properties, whereas the XmlReader offers ReadInnerXml and ReadOuterXml methods.
Once you go the XPathDocument route, however, you completely lose this feature. The new XPathNavigatorReader is an XmlReader implementation over an XPathNavigator, thus providing the aforementioned ReadInnerXml and ReadOuterXml methods. Basically, you work with the XPathNavigator object, and at the point you need to serialize it as XML, you simply construct this new reader over it, and use it as you would any XmlReader:
XPathDocument doc = new XPathDocument(input);
XPathNavigator nav = doc.CreateNavigator();
// Move navigator, select with XPath, whatever.
XmlReader reader = new XPathNavigatorReader(nav);
// Initialize it.
if (reader.Read())
{
    Console.WriteLine(reader.ReadOuterXml());
    // We can also use reader.ReadInnerXml();
}
Another useful scenario is directly writing a fragment of the document by means of the XmlWriter.WriteNode method:
// Will select the title id.
XPathExpression idexpr = navigator.Compile("string(title_id/text())");
XPathNodeIterator it = navigator.Select("//titles[price > 10]");
while (it.MoveNext())
{
    XmlReader reader = new XPathNavigatorReader(it.Current);
    // Save to a file with the title ID as the name.
    XmlTextWriter tw = new XmlTextWriter(
        (string) it.Current.Evaluate(idexpr) + ".xml",
        System.Text.Encoding.UTF8);
    // Dump it!
    tw.WriteNode(reader, false);
    tw.Close();
}
This code saves each book with a price greater than 10 to a file named after the title id. Note that the reader works in the scope defined by the navigator passed to its constructor, effectively providing a view over a fragment of the entire document. It's also important to observe that even if an evaluation moves the cursor of the navigator in it.Current, the reader we're using will not be affected, as the constructor clones the navigator up-front. Also, note that it's always a good idea to precompile an expression that is going to be executed repeatedly (ideally, application-wide).
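The cloning behavior is easy to check with the standard classes alone. The sketch below is an illustration under my own assumptions (a tiny made-up document and a CloneSketch class); it is not the XPathNavigatorReader code itself, just the XPathNavigator.Clone and precompiled-expression behavior the paragraph above relies on.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class CloneSketch
{
    public static void Main()
    {
        // Made-up document shaped like a Pubs title row.
        string xml = "<titles><title_id>BU2075</title_id><price>2.99</price></titles>";
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        nav.MoveToFirstChild();                 // now positioned on <titles>

        // A clone keeps its own cursor, independent of the original.
        XPathNavigator snapshot = nav.Clone();
        nav.MoveToFirstChild();                 // original moves on to <title_id>

        Console.WriteLine(nav.Name);            // prints title_id
        Console.WriteLine(snapshot.Name);       // prints titles

        // Compile once, evaluate as many times as needed.
        XPathExpression idexpr = snapshot.Compile("string(title_id)");
        Console.WriteLine(snapshot.Evaluate(idexpr)); // prints BU2075
    }
}
```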
XmlSerializer-ready
The reader implements IXmlSerializable, so you can return it directly from web services, for example. You could have a web service returning the result of an XPath query without resorting to hacks like loading XmlDocuments or returning an XML string that will be escaped. XPathDocument is not XML-serializable either. Now you can simply use code like the following:
[WebMethod]
public XPathNavigatorReader GetData()
{
XPathDocument doc = GetDocument();
XPathNodeIterator it = doc.CreateNavigator().Select("//titles[title_id='BU2075']");
if (it.MoveNext())
return new XPathNavigatorReader(it.Current);
return null;
}
This web service response will be:
<XPathNavigatorReader>
<titles>
<title_id>BU2075</title_id>
<title>You Can Combat Computer Stress!</title>
<type>business </type>
<pub_id>0736</pub_id>
<price>2.99</price>
<advance>10125</advance>
<royalty>24</royalty>
<ytd_sales>18722</ytd_sales>
<notes>The latest medical and psychological techniques for living with the electronic office. Easy-to-understand explanations.</notes>
<pubdate>1991-06-30T00:00:00.0000000-03:00</pubdate>
</titles>
</XPathNavigatorReader>
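The XPathNavigatorReader wrapper element in that response is not specific to this class. As far as I can tell, it's simply how the XmlSerializer treats any IXmlSerializable type: it names the wrapper after the type and hands the content over to WriteXml. Here's a minimal sketch with a made-up RawXml type (not part of the project) that shows the same wrapping outside of a web service:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type, for illustration only.
public class RawXml : IXmlSerializable
{
    private string _xml;

    // Parameterless constructor required by the XmlSerializer.
    public RawXml() { }

    public RawXml(string xml) { _xml = xml; }

    public XmlSchema GetSchema() { return null; }

    public void WriteXml(XmlWriter writer)
    {
        // The serializer has already written the wrapper start tag;
        // we only copy the content into it.
        XmlTextReader reader = new XmlTextReader(new StringReader(_xml));
        writer.WriteNode(reader, false);
    }

    public void ReadXml(XmlReader reader)
    {
        // The reader is positioned on the wrapper element:
        // take its content and move past its end tag.
        _xml = reader.ReadInnerXml();
    }
}

public class SerializationDemo
{
    public static void Main()
    {
        XmlSerializer ser = new XmlSerializer(typeof(RawXml));
        StringWriter sw = new StringWriter();
        ser.Serialize(sw, new RawXml("<titles><title_id>BU2075</title_id></titles>"));
        // The titles element ends up inside a <RawXml> wrapper element.
        Console.WriteLine(sw.ToString());
    }
}
```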
XML Schema Validation
Imagine the following scenario: you are processing a document, where only certain elements and their content need to be validated against an XML Schema, such as the contents of an element inside a soap:Body. If you're working with an XmlDocument, a known bug in XmlValidatingReader prevents you from doing the following:
XmlDocument doc = GetDocument(); // Get the doc somehow.
XmlNode node = doc.SelectSingleNode("//titles[title_id='BU2075']");
// Create a validating reader for XSD validation.
XmlValidatingReader vr = new XmlValidatingReader(new XmlNodeReader(node));
The validating reader will throw an exception because it expects an instance of an XmlTextReader object. This will be fixed in Whidbey, but no luck for v1.x. You're forced to do this:
XmlDocument doc = GetDocument(); // Get the doc somehow.
XmlNode node = doc.SelectSingleNode("//titles[title_id='BU2075']");
// Build the reader directly from the XML string taken through OuterXml.
XmlValidatingReader vr = new XmlValidatingReader(
new XmlTextReader(new StringReader(node.OuterXml)));
Of course, you're paying the parsing cost twice here. The XPathNavigatorReader, unlike the XmlNodeReader, derives directly from XmlTextReader and therefore fully supports fragment validation. You can validate against XML Schemas that only define the node where you're standing. The following code validates all expensive books with a narrow schema, instead of a full-blown Pubs schema:
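// Load the schema (assuming expensiveBooksSchemaLocation is a file path or URL).
XmlSchema sch = XmlSchema.Read(new XmlTextReader(expensiveBooksSchemaLocation), null);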
// Select expensive books.
XPathNodeIterator it = navigator.Select("//titles[price > 10]");
while (it.MoveNext())
{
// Create a validating reader over an XPathNavigatorReader for the current node.
XmlValidatingReader vr = new XmlValidatingReader(new XPathNavigatorReader(it.Current));
// Add the schema for the current node.
vr.Schemas.Add(sch);
// Validate it!
while (vr.Read()) {}
}
This opens the possibility for modular validation of documents, which is especially useful when you have generic XML processing layers that validate selectively depending on namespaces, for example. What's more, this feature really starts making the XPathDocument/XPathNavigator combination a more feature-complete alternative to XmlDocument when you only need read-only access to the document.
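One thing to keep in mind with a validation loop like this: with no handler attached, the XmlValidatingReader throws on the first validation error. If you'd rather collect all the problems of a fragment, you can hook the ValidationEventHandler event. The following is only a sketch under some assumptions: the tiny schema and the ValidationDemo class are made up, and a plain XmlTextReader stands in for the XPathNavigatorReader so the sample runs on its own.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ValidationDemo
{
    static int _errors = 0;

    // Called for each validation problem instead of an exception being thrown.
    static void OnValidation(object sender, ValidationEventArgs e)
    {
        _errors++;
        Console.WriteLine(e.Severity + ": " + e.Message);
    }

    public static void Main()
    {
        // Hypothetical narrow schema: a <titles> element holding a decimal <price>.
        string xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "<xs:element name='titles'><xs:complexType><xs:sequence>" +
            "<xs:element name='price' type='xs:decimal'/>" +
            "</xs:sequence></xs:complexType></xs:element>" +
            "</xs:schema>";
        XmlSchema sch = XmlSchema.Read(new StringReader(xsd), null);

        // Stand-in for: new XPathNavigatorReader(it.Current).
        XmlTextReader fragment = new XmlTextReader(
            new StringReader("<titles><price>cheap</price></titles>"));

        XmlValidatingReader vr = new XmlValidatingReader(fragment);
        vr.ValidationType = ValidationType.Schema;
        vr.Schemas.Add(sch);
        vr.ValidationEventHandler += new ValidationEventHandler(OnValidation);

        // Validate it! Problems are reported through OnValidation.
        while (vr.Read()) { }
        Console.WriteLine("Problems found: " + _errors);
    }
}
```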
Implementation Goodies
If you wonder how I implemented it from XmlTextReader instead of XmlReader, read on. If you just want to go straight to downloading and using it, you can safely skip this section.
Even in the face of the XmlValidatingReader bug, I found a workaround that works great. Luckily, the XmlTextReader is not a sealed class, so instead of inheriting from XmlReader, I inherited from it. I basically cheat it at construction time, passing an empty string to it:
public class XPathNavigatorReader : XmlTextReader
{
    public XPathNavigatorReader(XPathNavigator navigator) : base(new StringReader(String.Empty))
    ...
Next, I override all the methods which are abstract on the base XmlReader, basically replacing all the functionality from the XmlTextReader. Next, I also replaced the functionality of ReadInnerXml and ReadOuterXml methods, which are new from the XmlTextReader:
public override string ReadInnerXml()
{
    if (this.Read()) return Serialize();
    return String.Empty;
}

public override string ReadOuterXml()
{
    if (_state != ReadState.Interactive) return String.Empty;
    return Serialize();
}
They are both passthrough methods to the Serialize one that performs actual writing. I think you will be surprised by the following snippet. There's no interesting or complex code here, and I basically use the same node writing feature I explained above:
private string Serialize()
{
    StringWriter sw = new StringWriter();
    XmlTextWriter tw = new XmlTextWriter(sw);
    tw.WriteNode(this, false);
    sw.Flush();
    return sw.ToString();
}
This is a benefit of having a 100% reader implementation.
Another interesting thing in the implementation is that the XPathNavigator class provides separate handling of namespace attributes and regular ones (GetAttribute and GetNamespace), unlike the XmlReader, which exposes both simply as attributes. The reader MoveToFirstAttribute method checks for both cases, moving either to the first regular attribute or the namespace one:
public override bool MoveToFirstAttribute()
{
    if (_isendelement) return false;
    bool moved = _navigator.MoveToFirstAttribute();
    if (!moved) moved = _navigator.MoveToFirstNamespace(XPathNamespaceScope.Local);
    if (moved)
    {
        // Escape faked text node for attribute value.
        if (_attributevalueread) _depth--;
        // Reset attribute value read flag.
        _attributevalueread = false;
    }
    return moved;
}
The same work is done in the MoveToNextAttribute:
public override bool MoveToNextAttribute()
{
    bool moved = false;
    if (_navigator.NodeType == XPathNodeType.Attribute)
    {
        moved = _navigator.MoveToNextAttribute();
        if (!moved)
        {
            // We ended regular attributes. Start with namespaces if appropriate.
            _navigator.MoveToParent();
            moved = _navigator.MoveToFirstNamespace(XPathNamespaceScope.Local);
        }
    }
    else if (_navigator.NodeType == XPathNodeType.Namespace)
    {
        moved = _navigator.MoveToNextNamespace(XPathNamespaceScope.Local);
    }
    if (moved)
    {
        // Escape faked text node for attribute value.
        if (_attributevalueread) _depth--;
        // Reset attribute value read flag.
        _attributevalueread = false;
    }
    return moved;
}
I also take into account that the ReadAttributeValue method call causes a reader to be moved into the attribute value, where the current node type usually becomes Text (there's also the Entity resolution and references stuff). The documentation for the XmlReader.ReadAttributeValue method states that the depth is incremented, so I take that into account too. This is basically a matter of setting a flag:
public override bool ReadAttributeValue()
{
    // If this method hasn't been called yet for the attribute.
    if (!_attributevalueread &&
        (_navigator.NodeType == XPathNodeType.Attribute ||
         _navigator.NodeType == XPathNodeType.Namespace))
    {
        _attributevalueread = true;
        _depth++;
        return true;
    }
    return false;
}
bool _attributevalueread = false;
I came across the need to implement this when I used the XmlWriter.WriteNode method, which uses it intensively. I studied both the XmlValidatingReader and XmlTextWriter usage of the underlying XmlReader, by creating an XmlTextReader descendant that basically logs calls to its methods (yup, I could have used DevPartner Profiler, or any other profiler, I know...), which gave me the following picture of what's used by each:
XmlReader methods | Called by XmlValidatingReader | Called by XmlTextWriter |
AttributeCount | AttributeCount | AttributeCount |
BaseURI | BaseURI | BaseURI |
Close | | |
Depth | Depth | Depth |
EOF | EOF | EOF |
GetAttribute | | |
HasValue | | |
IsDefault | IsDefault | IsDefault |
IsEmptyElement | IsEmptyElement | IsEmptyElement |
Item | | |
LocalName | LocalName | LocalName |
LookupNamespace | | |
MoveToAttribute | MoveToAttribute(int) | MoveToAttribute(int) |
MoveToElement | MoveToElement | MoveToElement |
MoveToFirstAttribute | MoveToFirstAttribute | MoveToFirstAttribute |
MoveToNextAttribute | MoveToNextAttribute | MoveToNextAttribute |
Name | Name | Name |
NamespaceURI | NamespaceURI | NamespaceURI |
NameTable | NameTable | NameTable |
NodeType | NodeType | NodeType |
Prefix | Prefix | Prefix |
QuoteChar | QuoteChar | QuoteChar |
Read | Read | Read |
ReadAttributeValue | | ReadAttributeValue |
ReadState | ReadState | ReadState |
ResolveEntity | | |
Value | Value | Value |
XmlLang | XmlLang | XmlLang |
XmlSpace | XmlSpace | XmlSpace |
There's some interesting information here! For example, neither class uses the XmlReader.HasValue property, or the GetAttribute method or the indexer (Item) to access attributes. Mostly, access to attributes is done either by calling MoveToFirstAttribute/MoveToNextAttribute or by getting the AttributeCount and later using MoveToAttribute(int index) for each, something like:
for (int i = 0; i < reader.AttributeCount; i++)
{
    reader.MoveToAttribute(i);
    // Do something with it.
}
I've seen other attempts at this issue (both for XPathNavigatorReader and NavigatorReader classes) that basically iterate attributes each time AttributeCount is retrieved, and do the same until the desired index is reached in MoveToAttribute(i), by calling MoveToNextAttribute() repeatedly. From the table above, I could see that was a pretty bad idea. Therefore, I store the name and namespace of each attribute of the current node in an ArrayList (so they are accessible by index), cache it and return its length. When MoveToAttribute(i) is executed, I retrieve the name/namespace combination from the list for the specified index, and simply call the navigator's native MoveToAttribute method with these parameters. I think this is better, although I haven't measured the difference.
As a final word on the implementation: I reviewed Aaron Skonnard's attempt at this feature, but I discarded it because it's XmlReader-based, didn't handle attribute/namespace attribute manipulation the way I expected, etc. So I decided to just start from scratch. If you look at his code and mine, you'll see they're quite different. I recall Don Box did something too, but it was XmlReader-based as well.
As usual, if you just want the full class code to copy-paste into your project, here it is. I strongly encourage you to take a look at the Mvp.Xml project, as there are other cool goodies there!
using System;
using System.Collections;
using System.Collections.Specialized;
using System.IO;
using System.Xml;
using System.Xml.Serialization;
using System.Xml.XPath;

namespace Mvp.Xml.XPath
{
    /// <summary>
    /// Provides an <see cref="XmlReader"/> over an
    /// <see cref="XPathNavigator"/>.
    /// </summary>
    /// <remarks>
    /// Reader is positioned at the current navigator position. Reading
    /// it completely is similar to querying for the <see cref="XmlNode.OuterXml"/>
    /// property.
    /// <para>The navigator is cloned at construction time to avoid side-effects
    /// in calling code.</para>
    /// <para>Author: Daniel Cazzulino, kzu@aspnet2.com</para>
    /// <para>See: http://weblogs.asp.net/cazzu/archive/2004/04/19/115966.aspx</para>
    /// </remarks>
    public class XPathNavigatorReader : XmlTextReader, IXmlSerializable
    {
        #region Fields

        // Cursor that will be moved by the reader methods.
        XPathNavigator _navigator;
        // Cursor remaining in the original position, to determine EOF.
        XPathNavigator _original;
        // Will track whether we're at a faked end element
        bool _isendelement = false;

        #endregion Fields

        #region Ctor

        /// <summary>
        /// Parameterless constructor for XML serialization.
        /// </summary>
        /// <remarks>Supports the .NET serialization infrastructure. Don't use this
        /// constructor in your regular application.</remarks>
        [System.ComponentModel.EditorBrowsable(System.ComponentModel.EditorBrowsableState.Never)]
        public XPathNavigatorReader()
        {
        }

        /// <summary>
        /// Initializes the reader.
        /// </summary>
        /// <param name="navigator">The navigator to expose as a reader.</param>
        public XPathNavigatorReader(XPathNavigator navigator) : base(new StringReader(String.Empty))
        {
            _navigator = navigator.Clone();
            _original = navigator.Clone();
        }

        #endregion Ctor

        #region Private members

        /// <summary>
        /// Retrieves and caches node positions and their name/ns
        /// </summary>
        private ArrayList OrderedAttributes
        {
            get
            {
                // List contains the following values: string[] { name, namespaceURI }
                if (_orderedattributes != null) return _orderedattributes;

                // Cache attributes position and names.
                // We do this because when an attribute is accessed by index, it's
                // because of a usage pattern using a for loop as follows:
                // for (int i = 0; i < reader.AttributeCount; i++)
                //     Console.WriteLine(reader[i]);

                // Init list.
                _orderedattributes = new ArrayList();

                // Return empty list for end elements.
                if (_isendelement) return _orderedattributes;

                // Add all regular attributes.
                if (_navigator.HasAttributes)
                {
                    XPathNavigator attrnav = _navigator.Clone();
                    _orderedattributes = new ArrayList();
                    if (attrnav.MoveToFirstAttribute())
                    {
                        _orderedattributes.Add(new string[] { attrnav.LocalName, attrnav.NamespaceURI });
                        while (attrnav.MoveToNextAttribute())
                        {
                            _orderedattributes.Add(new string[] { attrnav.LocalName, attrnav.NamespaceURI });
                        }
                    }
                }

                // Add all namespace attributes declared at the current node.
                XPathNavigator nsnav = _navigator.Clone();
                if (nsnav.MoveToFirstNamespace(XPathNamespaceScope.Local))
                {
                    _orderedattributes.Add(new string[] { nsnav.LocalName, XmlNamespaces.XmlNs });
                    while (nsnav.MoveToNextNamespace(XPathNamespaceScope.Local))
                    {
                        _orderedattributes.Add(new string[] { nsnav.LocalName, XmlNamespaces.XmlNs });
                    }
                }

                return _orderedattributes;
            }
        }
        ArrayList _orderedattributes;

        /// <summary>
        /// Returns the XML representation of the current node and all its children.
        /// </summary>
        private string Serialize()
        {
            StringWriter sw = new StringWriter();
            XmlTextWriter tw = new XmlTextWriter(sw);
            tw.WriteNode(this, false);
            sw.Flush();
            return sw.ToString();
        }

        #endregion Private members

        #region Properties

        /// <summary>See <see cref="XmlReader.AttributeCount"/></summary>
        public override int AttributeCount
        {
            get
            {
                // When the user requests the attribute count, it's usually to
                // use a for iteration pattern for accessing attributes. Therefore,
                // we force loading the attributes positions to prepare for
                // indexed access to them. This is done in the OrderedAttributes getter.
                return OrderedAttributes.Count;
            }
        }

        /// <summary>See <see cref="XmlReader.BaseURI"/></summary>
        public override string BaseURI
        {
            get { return _navigator.BaseURI; }
        }

        /// <summary>See <see cref="XmlReader.Depth"/></summary>
        public override int Depth
        {
            get { return _depth; }
        }
        int _depth = 0;

        /// <summary>See <see cref="XmlReader.EOF"/></summary>
        public override bool EOF
        {
            get { return _eof; }
        }
        bool _eof = false;

        /// <summary>See <see cref="XmlReader.HasValue"/></summary>
        public override bool HasValue
        {
            get
            {
                return (
                    _navigator.NodeType == XPathNodeType.Namespace ||
                    _navigator.NodeType == XPathNodeType.Attribute ||
                    _navigator.NodeType == XPathNodeType.Comment ||
                    _navigator.NodeType == XPathNodeType.ProcessingInstruction ||
                    _navigator.NodeType == XPathNodeType.SignificantWhitespace ||
                    _navigator.NodeType == XPathNodeType.Text ||
                    _navigator.NodeType == XPathNodeType.Whitespace);
            }
        }

        /// <summary>See <see cref="XmlReader.IsDefault"/></summary>
        public override bool IsDefault
        {
            get { return false; }
        }

        /// <summary>See <see cref="XmlReader.IsEmptyElement"/></summary>
        public override bool IsEmptyElement
        {
            get { return _navigator.IsEmptyElement; }
        }

        /// <summary>See <see cref="XmlReader.this"/></summary>
        public override string this[string name, string namespaceURI]
        {
            get
            {
                // Attribute requested may be a namespaces prefix mapping.
                if (namespaceURI == XmlNamespaces.XmlNs)
                {
                    return _navigator.GetNamespace(name);
                }
                else
                {
                    return _navigator.GetAttribute(name, namespaceURI);
                }
            }
        }

        /// <summary>See <see cref="XmlReader.this"/></summary>
        public override string this[string name]
        {
            get { return this[name, String.Empty]; }
        }

        /// <summary>See <see cref="XmlReader.this"/></summary>
        public override string this[int i]
        {
            get
            {
                // List contains the following values: string[] { name, namespaceURI }
                string[] values = (string[]) OrderedAttributes[i];
                return this[values[0], values[1]];
            }
        }

        /// <summary>See <see cref="XmlReader.LocalName"/></summary>
        public override string LocalName
        {
            get { return _navigator.LocalName; }
        }

        /// <summary>See <see cref="XmlReader.Name"/></summary>
        public override string Name
        {
            get { return _navigator.Name; }
        }

        /// <summary>See <see cref="XmlReader.NamespaceURI"/></summary>
        public override string NamespaceURI
        {
            get
            {
                return _navigator.NodeType == XPathNodeType.Namespace ?
                    XmlNamespaces.XmlNs : _navigator.NamespaceURI;
            }
        }

        /// <summary>See <see cref="XmlReader.NameTable"/></summary>
        public override XmlNameTable NameTable
        {
            get { return _navigator.NameTable; }
        }

        /// <summary>See <see cref="XmlReader.NodeType"/></summary>
        public override XmlNodeType NodeType
        {
            get
            {
                // Special states.
                if (_state != ReadState.Interactive) return XmlNodeType.None;
                if (_isendelement) return XmlNodeType.EndElement;
                if (_attributevalueread) return XmlNodeType.Text;

                switch (_navigator.NodeType)
                {
                    case XPathNodeType.Attribute:
                    // Namespaces are exposed by the XmlReader as attributes too.
                    case XPathNodeType.Namespace:
                        return XmlNodeType.Attribute;
                    case XPathNodeType.Comment:
                        return XmlNodeType.Comment;
                    case XPathNodeType.Element:
                        return XmlNodeType.Element;
                    case XPathNodeType.ProcessingInstruction:
                        return XmlNodeType.ProcessingInstruction;
                    case XPathNodeType.Root:
                        return XmlNodeType.Document;
                    case XPathNodeType.SignificantWhitespace:
                        return XmlNodeType.SignificantWhitespace;
                    case XPathNodeType.Text:
                        return XmlNodeType.Text;
                    case XPathNodeType.Whitespace:
                        return XmlNodeType.Whitespace;
                    default:
                        return XmlNodeType.None;
                }
            }
        }

        /// <summary>See <see cref="XmlReader.Prefix"/></summary>
        public override string Prefix
        {
            get { return _navigator.Prefix; }
        }

        /// <summary>See <see cref="XmlReader.QuoteChar"/></summary>
        public override char QuoteChar
        {
            get { return '"'; }
        }

        /// <summary>See <see cref="XmlReader.ReadState"/></summary>
        public override ReadState ReadState
        {
            get { return _state; }
        }
        ReadState _state = ReadState.Initial;

        /// <summary>See <see cref="XmlReader.Value"/></summary>
        public override string Value
        {
            get { return HasValue ? _navigator.Value : String.Empty; }
        }

        /// <summary>See <see cref="XmlReader.XmlLang"/></summary>
        public override string XmlLang
        {
            get { return _navigator.XmlLang; }
        }

        /// <summary>See <see cref="XmlReader.XmlSpace"/></summary>
        public override XmlSpace XmlSpace
        {
            get { return XmlSpace.Default; }
        }

        #endregion Properties

        #region Methods

        /// <summary>See <see cref="XmlReader.Close"/></summary>
        public override void Close()
        {
            _state = ReadState.Closed;
            _eof = true;
        }

        /// <summary>See <see cref="XmlReader.GetAttribute"/></summary>
        public override string GetAttribute(string name, string namespaceURI)
        {
            return this[name, namespaceURI];
        }

        /// <summary>See <see cref="XmlReader.GetAttribute"/></summary>
        public override string GetAttribute(string name)
        {
            return this[name];
        }

        /// <summary>See <see cref="XmlReader.GetAttribute"/></summary>
        public override string GetAttribute(int i)
        {
            return this[i];
        }

        /// <summary>See <see cref="XmlReader.LookupNamespace"/></summary>
        public override string LookupNamespace(string prefix)
        {
            return _navigator.GetNamespace(prefix);
        }

        /// <summary>See <see cref="XmlReader.MoveToAttribute"/></summary>
        public override bool MoveToAttribute(string name, string ns)
        {
            return _navigator.MoveToAttribute(name, ns);
        }

        /// <summary>See <see cref="XmlReader.MoveToAttribute"/></summary>
        public override bool MoveToAttribute(string name)
        {
            return MoveToAttribute(name, String.Empty);
        }

        /// <summary>See <see cref="XmlReader.MoveToAttribute"/></summary>
        public override void MoveToAttribute(int i)
        {
            string[] values = (string[]) OrderedAttributes[i];
            MoveToAttribute(values[0], values[1]);
        }

        /// <summary>See <see cref="XmlReader.MoveToElement"/></summary>
        public override bool MoveToElement()
        {
            if (_navigator.NodeType == XPathNodeType.Attribute ||
                _navigator.NodeType == XPathNodeType.Namespace)
            {
                _navigator.MoveToParent();
                // Escape faked text node for attribute value.
                if (_attributevalueread) _depth--;
                _attributevalueread = false;
                return true;
            }
            return false;
        }

        /// <summary>See <see cref="XmlReader.MoveToFirstAttribute"/></summary>
        public override bool MoveToFirstAttribute()
        {
            if (_isendelement) return false;
            bool moved = _navigator.MoveToFirstAttribute();
            if (!moved) moved = _navigator.MoveToFirstNamespace(XPathNamespaceScope.Local);
            if (moved)
            {
                // Escape faked text node for attribute value.
                if (_attributevalueread) _depth--;
                // Reset attribute value read flag.
                _attributevalueread = false;
            }
            return moved;
        }

        /// <summary>See <see cref="XmlReader.MoveToNextAttribute"/></summary>
        public override bool MoveToNextAttribute()
        {
            bool moved = false;
            if (_navigator.NodeType == XPathNodeType.Attribute)
            {
                moved = _navigator.MoveToNextAttribute();
                if (!moved)
                {
                    // We ended regular attributes. Start with namespaces if appropriate.
                    _navigator.MoveToParent();
                    moved = _navigator.MoveToFirstNamespace(XPathNamespaceScope.Local);
                }
            }
            else if (_navigator.NodeType == XPathNodeType.Namespace)
            {
                moved = _navigator.MoveToNextNamespace(XPathNamespaceScope.Local);
            }
            if (moved)
            {
                // Escape faked text node for attribute value.
                if (_attributevalueread) _depth--;
                // Reset attribute value read flag.
                _attributevalueread = false;
            }
            return moved;
        }

        /// <summary>See <see cref="XmlReader.Read"/></summary>
        public override bool Read()
        {
            // Return fast if state is not appropriate.
            if (_state == ReadState.Closed || _state == ReadState.EndOfFile)
                return false;

            if (_state == ReadState.Initial)
            {
                _state = ReadState.Interactive;
                if (_navigator.NodeType == XPathNodeType.Root)
                {
                    // Sync to the real first node.
                    _original.MoveToFirstChild();
                    return _navigator.MoveToFirstChild();
                }
                return true;
            }

            // Reset temp state.
            _orderedattributes = null;

            // Reading attribute values causes movement to faked Text node.
            if (_attributevalueread) _depth--;
            // Reset the flag afterwards.
            _attributevalueread = false;

            // Reposition if we moved to attributes.
            if (_navigator.NodeType == XPathNodeType.Attribute ||
                _navigator.NodeType == XPathNodeType.Namespace)
                _navigator.MoveToParent();

            if (_isendelement)
            {
                // If we're at the same position we started, it's eof;
                if (_navigator.IsSamePosition(_original))
                {
                    _eof = true;
                    _state = ReadState.EndOfFile;
                    return false;
                }

                // If we're at the faked end element, move to next sibling.
                if (_navigator.MoveToNext())
                {
                    _isendelement = false;
                    return true;
                }
                else
                {
                    // Otherwise, move to the parent and set as the
                    // end element of it (we already read all children therefore).
                    _navigator.MoveToParent();
                    _depth--;
                    // _isendelement remains true.
                    return true;
                }
            }
            else if (_navigator.HasChildren)
            {
                _depth++;
                // Move to child node.
                return _navigator.MoveToFirstChild();
            }
            else
            {
                // Otherwise, try to move to sibling.
                if (_navigator.MoveToNext())
                {
                    return true;
                }
                else
                {
                    // Otherwise, move to the parent and set as the
                    // end element of it (we already read all children therefore).
                    _navigator.MoveToParent();
                    _depth--;
                    _isendelement = true;
                    return true;
                }
            }
        }

        /// <summary>See <see cref="XmlReader.ReadAttributeValue"/></summary>
        public override bool ReadAttributeValue()
        {
            // If this method hasn't been called yet for the attribute.
            if (!_attributevalueread &&
                (_navigator.NodeType == XPathNodeType.Attribute ||
                 _navigator.NodeType == XPathNodeType.Namespace))
            {
                _attributevalueread = true;
                _depth++;
                return true;
            }
            return false;
        }
        bool _attributevalueread = false;

        /// <summary>See <see cref="XmlReader.ReadInnerXml"/></summary>
        public override string ReadInnerXml()
        {
            if (this.Read()) return Serialize();
            return String.Empty;
        }

        /// <summary>See <see cref="XmlReader.ReadOuterXml"/></summary>
        public override string ReadOuterXml()
        {
            if (_state != ReadState.Interactive) return String.Empty;
            return Serialize();
        }

        /// <summary>See <see cref="XmlReader.ResolveEntity"/></summary>
        public override void ResolveEntity()
        {
            // Not supported.
        }

        #endregion Methods

        #region IXmlSerializable Members

        void IXmlSerializable.WriteXml(XmlWriter writer)
        {
            writer.WriteNode(this, false);
        }

        System.Xml.Schema.XmlSchema IXmlSerializable.GetSchema()
        {
            return null;
        }

        void IXmlSerializable.ReadXml(XmlReader reader)
        {
            XPathDocument doc = new XPathDocument(reader);
            _navigator = doc.CreateNavigator();
        }

        #endregion
    }
}
Finally, I imagine you could even think about loading an XmlDocument from an XPathNavigator using the XPathNavigatorReader... although I can't think of any good reason why you would want to do such a thing :S...
The full project source code can be downloaded from SourceForge.
Enjoy and please give us feedback on the project!
Special credits: the idea of a reader over a navigator isn't new. Aaron Skonnard did an implementation quite some time ago, as did Don Box (you'll need to search the page for "XPathNavigatorReader"). Mine is not based on theirs, and has features theirs lack, but they came first, that's for sure ;).
Check out the Roadmap to high performance XML.
Note: this entry has moved.
Finally I got tired of creating Windows Forms applications by mistake. You know,
when you select Add New Project to a solution, it's the first item. I create FAR more class library projects than UI clients!
So, I went to the C:\Program Files\Microsoft Visual Studio .NET
2003\VC#\CSharpProjects folder, opened CSharp.vsdir in Notepad, and switched the
priority of CSharpEXE.vsz with the one for CSharpDLL.vsz. This is the fourth
component of each line. It should look like the following:
CSharpEXE.vsz|{FAE04EC1-301F-11d3-BF4B-00C04F79EFBC}|#2318|20|#2319|{FAE04EC1-301F-11d3-BF4B-00C04F79EFBC}|4554| |WindowsApplication
CSharpDLL.vsz|{FAE04EC1-301F-11d3-BF4B-00C04F79EFBC}|#2322|10|#2323|{FAE04EC1-301F-11d3-BF4B-00C04F79EFBC}|4547| |ClassLibrary
That's it. The class library will be the first item in the list forever... (until I hit
one of those Mondays at least)
BTW, there's no magic hacking here, it's documented.
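If you'd rather not edit the file by hand, the swap itself is easy to script. Here's a minimal sketch of the idea, assuming only what is described above: fields are separated by '|' and the priority is the fourth one. The VsdirDemo class and the shortened sample lines are made up, and the starting values (10 for the EXE, 20 for the DLL) are inferred from the swapped result shown above.

```csharp
using System;

public class VsdirDemo
{
    // Swaps the priority field (fourth '|'-separated component) of two .vsdir lines.
    public static void SwapPriority(ref string first, ref string second)
    {
        string[] a = first.Split('|');
        string[] b = second.Split('|');

        string tmp = a[3];
        a[3] = b[3];
        b[3] = tmp;

        first = String.Join("|", a);
        second = String.Join("|", b);
    }

    public static void Main()
    {
        // Shortened, hypothetical entries; the real lines carry more fields.
        string exe = "CSharpEXE.vsz|{GUID}|#2318|10|#2319";
        string dll = "CSharpDLL.vsz|{GUID}|#2322|20|#2323";

        SwapPriority(ref exe, ref dll);

        Console.WriteLine(exe); // CSharpEXE.vsz|{GUID}|#2318|20|#2319
        Console.WriteLine(dll); // CSharpDLL.vsz|{GUID}|#2322|10|#2323
    }
}
```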
Note: this entry has moved.
The whole week I've been digging into a pretty large configuration file and its
schema: the one for
Shadowfax (Sfx), which I already introduced in a
previous post. I see some points that allow for improvements,
which mainly have to do with namespaces and extensibility.
Let's recap about what namespaces are for. Here's what the
W3C Namespaces in XML specification says in the motivation section:
We envision applications of Extensible Markup Language (XML) where a single
XML document may contain elements and attributes (here referred to as a "markup
vocabulary") that are defined for and used by multiple software modules. One
motivation for this is modularity; if such a markup vocabulary exists which is
well-understood and for which there is useful software available, it is better
to re-use this markup rather than re-invent it.
Such documents, containing multiple markup vocabularies, pose problems of
recognition and collision. Software modules need to be able to recognize the
tags and attributes which they are designed to process, even in the face of
"collisions" occurring when markup intended for some other software package
uses the same element type or attribute name.
So, namespaces should be used when you expect a document to be extended by
aggregating elements from multiple disparate schemas. This motivation has
to drive the design of the schema, to allow for easy extensibility while
retaining XML-friendliness with regards to the format. The following concrete
points could be improved:
-
Element prefixing: this is just a fragment of the file as it is now:
Clearly, using a prefix is not necessary here. The Sfx namespace is the
only one used in the whole document, so a default unprefixed namespace
could be used on the root of the hierarchy, and the namespace rules of XML
would propagate it to its children. Therefore, the fragment above is
absolutely equivalent from the point of view of XML to the following one:
Consistency of namespaces and their use is also desired across the schemas for
<referenceArchitecture>, <businessActionsDefinition>,
<eventConfiguration>, etc.
-
Attributes with namespace: attributes shouldn't be assigned namespaces. It's
common practice (and the W3C default) to leave attributes without namespaces.
This also makes for more readable files. This default is changed in
the Sfx schema by setting
attributeFormDefault="qualified".
What this means is that all attributes in the instance document (the actual
configuration file) must be prefixed, as the attributes are now part of the
targetNamespace:
...
This is pretty cumbersome to read and author, and doesn't really add value
to the extensibility/usability of the schema and the config file. This may be
a valid (and even necessary) approach for a highly composed document
such as a SOAP message, where every WS-* spec defines its own attributes and
elements, and almost everything is prefixed. But I wonder if this is actually
necessary in a config file... Leaving the default
attributeFormDefault (or omitting it) in the schema gives
you the following valid instance:
...
I believe this is far better and more familiar. Extensibility isn't hurt, as
the xs:anyAttribute
can still be used, but now you only force attribute prefixing on
extensions, not built-in values, which are the more commonly used. This
brings us to the last point.
-
Schema and configuration extensibility: Sfx is meant to be flexible and
allow a wide range of applications. With this idea in mind, almost everything
is configurable... to an extent. One of the key pieces in this
architecture (and any other SOA-like) is to provide a platform of common
services where your services (let's call them business actions -BA- as in
Sfx) run. I envision that some BAs may need additional
configuration in order to perform their work. I've worked on such an
architecture and BA developers started developing custom configuration
mechanisms for their libraries because the infrastructure didn't provide it,
which led to serious maintenance and deployment problems.
So, the schema for BA configuration should allow for open content in order to
accommodate extensibility elements/attributes.
-
Configuration versioning: given the current target namespace for the
configuration schema (
http://www.microsoft.com/practices/referencearchitecture/services/03-08-2004/ReferenceArchitectureSection.xsd)
it's only natural to infer that versioning will be handled through namespace
changes, according to the release date. There's a
lot of discussion in the community about schema versioning, but most
agree that versioning through namespace changes is not recommended.
This document explains the available options in a short and concise manner.
To make the future migration path as easy as possible for developers (and for
an optional upgrade tool) when the configuration is upgraded, my suggestion is
to use the optional XSD version attribute in the schema,
together with a new schemaVersion attribute in the configuration file. The
schema would look like the following:
...etc...
While the configuration would include the appropriate version attribute:
Now when v2 comes out, a tool can detect the version in the configuration file,
and perform any relevant upgrade (for example through an XSLT transformation to
accommodate elements to the new format).
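To make the idea a bit more concrete, here's a minimal sketch of the detection step such a tool could perform. Only the schemaVersion attribute name comes from the suggestion above; the ConfigVersionDemo class, the configuration root element, its namespace, the version values and the choice of treating a missing attribute as "1.0" are all assumptions made up for illustration.

```csharp
using System;
using System.Xml;

public class ConfigVersionDemo
{
    // Returns the schemaVersion attribute of the root element.
    // A missing attribute is treated as "1.0" (an assumption of this sketch).
    public static string DetectVersion(string configXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(configXml);

        // GetAttribute returns an empty string when the attribute is absent.
        string version = doc.DocumentElement.GetAttribute("schemaVersion");
        return version.Length == 0 ? "1.0" : version;
    }

    public static void Main()
    {
        // Hypothetical configuration root; element name and namespace are made up.
        string config =
            "<configuration xmlns='urn:example:config' schemaVersion='1.0' />";

        string version = DetectVersion(config);
        if (version != "2.0")
        {
            // This is where an upgrade (for example, an XSLT transformation) would run.
            Console.WriteLine("Needs upgrade from " + version);
        }
        else
        {
            Console.WriteLine("Up to date");
        }
    }
}
```

Because the attribute is unqualified and the namespace stays the same across versions, the check doesn't need to know anything about namespaces, which is the whole point of not versioning through them.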
Finally, special care should be taken to specify the
type attribute on
all attribute declarations.
Of course configuration is just the tip of the iceberg of such a comprehensive
product. Shadowfax is a very interesting architecture to build applications on
top of. MS is very open to feedback from the community, so I expect it to become
more and more polished and sleek over time. These are my 2 cents with regards
to its configuration file.
If I misunderstood some points in the schema design, I'd be glad to hear from
the Sfx dev guys!
Note: this entry has moved.
Recently, I started working with
Shadowfax, a reference architecture coming from the
Patterns & Practices group at Microsoft. You can have a look at an
overview of the architecture at the GDN workspace home. I also recommend
the blog of Hernan de Lahitte, one of the guys working on it. He has a nice
introduction as well as a
closer inspection of the processing pipeline.
I don't want to repeat those intros. I'll just say that if you're developing SOA-like
architectures, or you need one for your project, you should definitely take a
look. It's being developed in a very open manner, with source releases pretty
often, with open feedback through the workspace, and even if it's not ready for
prime time yet, it's a good indication of where MS thinks you should be going
with your projects. It closely follows the advice from Application
Architecture for .NET: Designing Applications and Services, and
generally represents a pretty good compendium of best practices. It makes heavy
use of several application blocks from PAG, such as the
Configuration Management Application Block, the
Authorization and Profile Application Block and the
Logging Application Block.
As this is going to take most of my working day now, I'll start a new category
for these posts. I hope ScottW adds support
for subscribing to a single category soon (as dasBlog
has done for quite a while...). This way, you can subscribe to just this category if you want.
I have the task of helping developers have a smooth experience
when programming against Shadowfax. I'll drop my ideas here, as well as
the dev. aids I think could be useful. Even if you don't use Shadowfax, it
would be cool to get your feedback... because it's possible that some of my
ideas seem useful only to me! Stay tuned, lots of posts are coming ;)
Note: this entry has moved.
My previous post was uploaded more than 4 hours ago and I still don't see it in the feed! (try it at http://weblogs.asp.net/cazzu/Rss.aspx)
Highly disappointing (to say the least).