June 2004 - Posts
Note: this entry has moved.
I must say up front that this is not really a new trick. To get IntelliSense on XSLT documents, copy this XSLT schema file into the folder C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\Packages\schemas\xml.
Restart the IDE and you're done. Enjoy!
Credits:
FeserSoft, who developed the schema.
I've been using FireFox
for a while now. It was still called FireBird when I started, and it's been my
default browser ever since. Every new version still surprises me, as do the
constant availability and improvement of all sorts of
extensions for it and the growing number of
themes that make it look even better.
For example, I stumbled across the
Google Toolbar. It even supports features not available in the
Google-provided IE toolbar! Amazing. And I can't live anymore without the
extensible search functionality:

It has a sleek design, way better than the aged IE one:
And there's an absolute must-have extension that lets me open those
increasingly rare pages that can only be browsed with IE:

And there's even an ActiveX
wrapper that hosts the Gecko rendering engine (the one behind FireFox,
Mozilla and Netscape) and that implements all the interfaces of the IE
WebBrowser control. This means you can get rid of the buggy-non-standard-old
jscript-aged IE browser for your embedded browser needs!
As if that weren't enough, Mono has a
Gecko# project in the works! IE is dead. It's way
too late to do anything useful to help it catch up with the new wave of
internet browsers. I sincerely hope all developers start using FireFox,
deploying it, and making web apps that leverage the power of the latest W3C
standards. For most extranet/intranet applications, you can control the
browser. Go fight for any Gecko-based browser. You'll do yourself a
big favor by avoiding IE quirks. You'll also help force
Microsoft to stop toying with the idea that they can control the internet, its
content, and the platform through XAML, Avalon and whatever they invent for
that purpose. Dynamic and useful websites were largely possible thanks to MS
innovation back in the browser wars days. Don't let them abandon that
path. It's the one that benefits us all, just as it did by freeing us from the
even-more-buggy Netscape Navigator 4.
Update: I forgot to mention the most important thing for newcomers: the FireFox setup for Windows is only 4.7 MB!!! That's even smaller than Netscape Navigator 3.04 was (5.53 MB)!!! Compare that with the incredibly huge 77.51 MB of IE6+SP1...
There's an interesting article on the revival of the browser wars on XML.com.
I may not have stressed enough one of the most important features enabled by
the XPathNavigatorReader: in-memory
(without reparsing) XML Schema validation of arbitrary sources exposed as XPathNavigator.
When XML editing is required, developers typically resort to OuterXml->new
XmlTextReader->new XmlValidatingReader->Validate (and re-parse!):
XmlDocument doc = GetModifiedDocument(); // Get the modified doc somehow.
// Create the reader from the XML string taken through OuterXml.
XmlValidatingReader vr = new XmlValidatingReader(
    new XmlTextReader(new StringReader(doc.OuterXml)));
There is an absolutely unnecessary re-parsing step that degrades
performance. The same scenario can be solved trivially with the
XPathNavigatorReader:
XmlDocument doc = GetModifiedDocument(); // Get the modified doc somehow.
// Create the validating reader with the new reader over the root document navigator
XmlValidatingReader vr = new XmlValidatingReader(
new XPathNavigatorReader(doc.CreateNavigator()));
That "simple" change completely bypasses the need to re-parse the document.
Needless to say, the bigger the document, the higher
the cost of re-parsing. In my tests with a fairly small document (~50 KB) I
could save about 30-40% of the processing time. And if you use
an XPathDocument instead, the savings
skyrocket to more than 60%! As usual, this shows the superiority of
the XPathDocument as a generic in-memory XML store. I can't
wait for the Whidbey release, when it will offer all of XmlDocument's features and
more.
As I explained in
my previous post, there's another interesting story for the XPathNavigatorReader,
and that's about document fragment validation. As the reader treats the
navigator's current position as the root node, you can
validate a subset against a refined schema. Especially with complex documents
and schemas, this can significantly improve performance too.
The full project source code can be downloaded from
SourceForge.
Enjoy and please give us feedback on the project!
Check out the Roadmap to high performance XML.
For my own future reference, the link to the
MS VirtualCD tool.
In a previous post I showed how to load and transform subsets of a document with the XPathNavigatorReader. In the example I used, which follows the MSDN documentation one (under the section "Transforming a Section of an XML Document"), XML parsing is happening once, but in-memory document building is happening for each subtree being transformed, effectively loading those fragments in memory twice. The relevant piece of code is:
XPathNodeIterator books = doc.CreateNavigator().Select("/library/book");
while (books.MoveNext())
{
// There's no XML re-parsing, but a new XPathDocument is loaded!
XPathDocument tmpDoc = new XPathDocument(
new XPathNavigatorReader(books.Current));
...
As Oleg appropriately pointed out, the definitive solution (and the one he used for the Mvp.Xml project XmlNodeNavigator) is to have a wrapper navigator that doesn't allow an XPathNavigator to move outside a certain scope. The Mvp.Xml project now has that solution for all XPathNavigator implementations: the SubtreeXPathNavigator. This class is very similar in nature and implementation to Oleg's. Usage is straightforward: you just pass a newly constructed SubtreeXPathNavigator to the XslTransform class, and it will work on the subtree starting at the navigator received in the constructor, which is considered the new root.
Again, I'll follow the MSDN documentation example. Check my previous post for the original code. In the new version, only a single line of code changes. Inside the while loop, instead of loading a new XPathDocument to perform the transformation, a new SubtreeXPathNavigator instance is constructed:
XslTransform xslt = new XslTransform();
// Always pass evidence!
xslt.Load("print_root.xsl", null, this.GetType().Assembly.Evidence);
// Load the entire doc.
XPathDocument doc = new XPathDocument("library.xml");
// Select every book; each one is transformed as its own subtree.
XPathNodeIterator books = doc.CreateNavigator().Select("/library/book");
while (books.MoveNext())
{
    // Transform the subtree defined by the current navigator scope.
    xslt.Transform(new SubtreeXPathNavigator(books.Current),
        null, Console.Out, null);
}
Ignoring the time it takes to load the stylesheet and compile the XPath expression (both of which should be cached), this yields an amazing 3.5X performance boost, even for this simple example, which uses an XML input of only 200 bytes and a really trivial transformation!
Transforming subtrees may also be useful to reduce the complexity of your stylesheets, and help the XSLT processor in .NET (which is not the fastest in the world) to perform better.
The full Mvp.Xml project source code can be downloaded from SourceForge.
Enjoy and please give us feedback on the project!
Check out the Roadmap to high performance XML.
DonXML pointed out
some issues with regard to the
Checklist: XML Performance article. I believe the checklist (and the
corresponding "full-length" explanations) could have benefited from more space to
cover the topic. I agree with most of Don's comments. The only one I'm not so
sure about is his assertion:
By implementing #1 (Use XPathDocument to process XPath statements),
it forces you to break #2 (Avoid the // operator by reducing the search scope),
since XPathNavigator.Select() always evaluates from the root, not from the
context of the current cursor location.
This observation is partially true. I say partially because you can reduce the
scope of a search by explicitly addressing the full hierarchy of nodes, instead
of using "//", which is a shortcut for "/descendant-or-self::node()/". The real cost of
"//" is that matched nodes must not be duplicated in the
resulting node-set, and this incurs an additional calculation cost. For
example, let's say you have an XHTML document, and you want to process all
links that exist inside a paragraph. The XPath could be something
like: //p//a. Well, as you know, a <p> can be nested in other
<p> elements, so that an <a> can be determined to (initially)
satisfy the "//a" for two <p> that happen to be parent and child. At this
point, the XPath evaluator must skip those <a> that have already been
matched. This is what makes the process much slower.
So, if you positively know that all your <a>s appear as direct children of
<p>, and the <p>s you want to process always appear directly inside
the <body>, you could get an amazing speed boost by replacing the
query with "/html/body/p/a". And I really mean *amazing*. Try it for yourself with
an XHTML version of a long spec, for example the XML Schema part 2.
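The two queries can be compared with a small sketch. It assumes a tiny, namespace-free stand-in for an XHTML page (real XHTML lives in a namespace and would need an XmlNamespaceManager), and the class, method and sample names are made up for illustration. It only checks that both expressions select the same links when every <a> is a direct child of a <p> directly under <body>; it doesn't measure anything.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class XPathScopeDemo
{
    // Hypothetical sample: every <a> is a direct child of a <p>,
    // and every <p> is a direct child of <body>.
    public const string Sample =
        "<html><body>" +
        "<p><a href='#one'>one</a> and <a href='#two'>two</a></p>" +
        "<p><a href='#three'>three</a></p>" +
        "</body></html>";

    // Counts the nodes selected by an XPath expression over the sample.
    public static int Count(string xpath)
    {
        XPathDocument doc = new XPathDocument(new StringReader(Sample));
        XPathNodeIterator nodes = doc.CreateNavigator().Select(xpath);
        return nodes.Count;
    }

    public static void Main()
    {
        // Descendant search: the evaluator has to walk every subtree.
        Console.WriteLine(Count("//p//a"));
        // Fully addressed path: only the named children are visited.
        Console.WriteLine(Count("/html/body/p/a"));
    }
}
```

With this sample both expressions select the same three links, so the explicit path is a safe replacement here. If an <a> could sit deeper inside a <p> (inside a <span>, say), the explicit path would miss it, which is why the rewrite only applies when you know the document's shape.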
But, going back to the main point raised by Don (which reminded me of
those ugly days when I stumbled on it), the core issue is
a conscious design decision that makes the Evaluate overload that
receives an XPathNodeIterator as the context absolutely useless. Let me
explain (what follows is exactly the use case I explained in
the public newsgroup).
Let's say you have the Pubs database as XML. Now you have selected (for
whatever reason) all titles with "//publishers/titles". This will be an
XPathNodeIterator with the results:
XPathNavigator nav = document.CreateNavigator();
XPathNodeIterator nodes = nav.Select("//publishers/titles");
At some point, let's say you need to work with all prices from that set
of nodes. The navigator exposes an overload for the Evaluate method that
receives an XPathNodeIterator object as the context to execute the evaluation
on. It seems natural, then, to think that the following code would yield the
results we expect:
XPathExpression expr = nav.Compile("price");
object allprices = nav.Evaluate(expr, nodes);
The result I expect is a node-set (XPathNodeIterator) for each price child of
the titles I passed as the second argument to Evaluate. Well, that isn't
happening, because the "price" expression is being evaluated from the document
root. So, what's this overload useful for?
The code
Oleg used doesn't test the problem, as he's iterating each node (i.e. the
nodes variable above) and evaluating on each of them without using the other
overload. This works, just as the regular Select method does.
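For reference, the per-node approach that does work can be sketched as follows. The XML is a made-up stand-in for the Pubs data (only the publishers, titles and price element names come from the example above), and the class and method names are hypothetical. Each node returned by the first query becomes the context for the relative "price" expression.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class RelativePriceDemo
{
    // Hypothetical stand-in for the Pubs database exported as XML.
    public const string Sample =
        "<pubs><publishers>" +
        "<titles><title>A</title><price>19.99</price></titles>" +
        "<titles><title>B</title><price>11.95</price></titles>" +
        "</publishers></pubs>";

    // Collects the price of every title by evaluating the relative
    // expression against each selected node in turn.
    public static List<string> PricesPerTitle()
    {
        XPathDocument document = new XPathDocument(new StringReader(Sample));
        XPathNavigator nav = document.CreateNavigator();
        XPathNodeIterator nodes = nav.Select("//publishers/titles");
        XPathExpression expr = nav.Compile("price");

        List<string> prices = new List<string>();
        while (nodes.MoveNext())
        {
            // nodes.Current is positioned on one <titles> element, so
            // "price" is resolved relative to that element.
            XPathNodeIterator found = nodes.Current.Select(expr);
            while (found.MoveNext())
            {
                prices.Add(found.Current.Value);
            }
        }
        return prices;
    }

    public static void Main()
    {
        foreach (string price in PricesPerTitle())
        {
            Console.WriteLine(price);
        }
    }
}
```

The expression is compiled once and reused for every node, so the only extra cost over the overload discussed above is the explicit loop.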
From one of the developers behind GDN:
Btw, Did i tell you how cool and powerful .Net is and the way it lends itself to building reliable production quality applications.
It's news to me that a platform, by itself, "lends itself" to such reliable, production-quality applications. I thought you had to study, read patterns for reliable software design, test and design for performance from the start...
Judging from these words, it seems I've been using a site that isn't GDN... and all those badly designed, buggy, slow-as-a-dog, prototype-quality (at best) .NET apps I see every now and then are just an illusion...
If only I had known I didn't need to study so much to build reliable production quality apps!