June 2004 - Posts

Note: this entry has moved.

This is not really a new trick, I must say up front. To get IntelliSense in XSLT documents, copy this XSLT schema file into the folder C:\Program Files\Microsoft Visual Studio .NET 2003\Common7\Packages\schemas\xml.
Restart the IDE and you're done. Enjoy!

Credits: FeserSoft, who developed the schema.
Posted by Daniel Cazzulino | 4 comment(s)

Note: this entry has moved.

I've been using FireFox for a while now. It was still called FireBird when I started, and it's been my default browser ever since. Yet every new version surprises me, as does the constant availability and improvement of all sorts of extensions for it, and the growing number of themes that make it look even better.

For example, I stumbled across the Google Toolbar. It even supports features not available in the Google-provided IE toolbar! Amazing. And I can no longer live without the extensible search functionality:

[Image: FireFox search box, showing alternative search engines]

It has a sleek design, way better than the aging IE one:

[Image: FireFox options dialog, way better than IE]

And there's an absolute must-have extension that lets me open those ever-rarer pages that can only be browsed with IE:

[Image: FireFox extension that allows you to view a page in IE]

And there's even an ActiveX wrapper that hosts the Gecko rendering engine (the one behind FireFox, Mozilla and Netscape) and implements all the interfaces of the IE WebBrowser control. This means you can get rid of the buggy, non-standard, old-JScript, aged IE browser for your embedded browser needs!

As if that weren't enough, Mono has a Gecko# project in the works! IE is dead. It is way too late to do anything useful to help it catch up with the new wave of internet browsers. I sincerely hope all developers start using FireFox, deploying it, and making web apps that leverage the power of the latest W3C standards. For most extranet/intranet applications, you can control the browser. Go fight for any Gecko-based browser. You'll do yourself a big favor by avoiding IE quirks. You'll also contribute to forcing Microsoft to stop toying with the idea that they can control the internet, its content, and the platform through XAML, Avalon and whatever they invent for that purpose. Dynamic and useful websites were largely possible thanks to MS innovation back in the browser wars days. Don't let them abandon that path. It's the one that benefits us all, just as it did by freeing us from the even-more-buggy Netscape Navigator 4.

Update: I forgot to mention the most important thing for newcomers: the FireFox setup for Windows is only 4.7 MB! That's even smaller than Netscape Navigator 3.04 was (5.53 MB)! Compare that with the incredibly huge 77.51 MB of IE6+SP1...

There's an interesting article on the revival of the browser wars on XML.com.

Posted by Daniel Cazzulino | 15 comment(s)

Note: this entry has moved.

I may not have stressed enough one of the most important features enabled by the XPathNavigatorReader: in-memory (without re-parsing) XML Schema validation of arbitrary sources exposed as XPathNavigator.

When XML editing is required, developers typically resort to OuterXml->new XmlTextReader->new XmlValidatingReader->Validate (and re-parse!):

XmlDocument doc = GetModifiedDocument(); // Get the modified doc somehow.
// Create the reader from the XML string taken through OuterXml.
XmlValidatingReader vr = new XmlValidatingReader(
    new XmlTextReader(new StringReader(doc.OuterXml)));

There is an absolutely unnecessary re-parsing step that degrades performance. The same scenario can be solved trivially with the XPathNavigatorReader:

XmlDocument doc = GetModifiedDocument(); // Get the modified doc somehow.
// Create the validating reader with the new reader over the root document navigator.
XmlValidatingReader vr = new XmlValidatingReader(
    new XPathNavigatorReader(doc.CreateNavigator()));

That "simple" change completely bypasses the need to re-parse the document. Needless to say, the bigger the document, the higher the cost of re-parsing. In my tests with a fairly small document (~50kb) I could save about 30-40% of the processing time. And if you use an XPathDocument instead, the savings skyrocket to more than 60%! As usual, this shows the superiority of the XPathDocument as a generic in-memory XML store. I can't wait for the Whidbey release, when it will offer all of XmlDocument's features and more.
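
Neither snippet above shows how the validation itself is driven, so here is a minimal sketch of the re-parsing variant using only the framework's own classes. The schema, the sample documents and the names ValidationSketch, CountErrors and OnValidationError are hypothetical, made up for illustration. The XPathNavigatorReader variant should differ only in the inner reader passed to the XmlValidatingReader constructor, as in the second snippet above.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ValidationSketch
{
    // Hypothetical schema: an <order> element with a single integer <id> child.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='order'><xs:complexType><xs:sequence>" +
        "<xs:element name='id' type='xs:int'/>" +
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static int errorCount;

    // Called by the validating reader for every validation problem it finds.
    static void OnValidationError(object sender, ValidationEventArgs e)
    {
        errorCount++;
        Console.WriteLine(e.Message);
    }

    // Returns the number of validation errors reported for the given markup.
    public static int CountErrors(string xml)
    {
        errorCount = 0;

        // Stand-in for GetModifiedDocument(): an in-memory, editable document.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlSchema schema = XmlSchema.Read(
            new XmlTextReader(new StringReader(Xsd)), null);

        // The inner XmlTextReader re-parses the serialized document.
        XmlValidatingReader vr = new XmlValidatingReader(
            new XmlTextReader(new StringReader(doc.OuterXml)));
        vr.ValidationType = ValidationType.Schema;
        vr.Schemas.Add(schema);
        vr.ValidationEventHandler += new ValidationEventHandler(OnValidationError);

        // Validation happens as a side effect of reading to the end.
        while (vr.Read()) { }
        vr.Close();

        return errorCount;
    }

    public static void Main()
    {
        // The first document matches the schema; the second has a non-integer id.
        Console.WriteLine(CountErrors("<order><id>42</id></order>"));
        Console.WriteLine(CountErrors("<order><id>abc</id></order>"));
    }
}
```

Registering a ValidationEventHandler is what turns validation failures into callbacks; without one, the reader throws on the first error instead.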

As I explained in my previous post, there's another interesting story for the XPathNavigatorReader, and that's document fragment validation. As the reader considers the navigator's current position as the root node, you can validate a subset against a refined schema. Especially with complex documents and schemas, this can significantly improve performance too.

The full project source code can be downloaded from SourceForge.

Enjoy and please give us feedback on the project!

Check out the Roadmap to high performance XML.

Note: this entry has moved.

For my own future reference, the link to the MS VirtualCD tool.
Posted by Daniel Cazzulino | 1 comment(s)

Note: this entry has moved.

In a previous post I showed how to load and transform subsets of a document with the XPathNavigatorReader. In the example I used, which follows the MSDN documentation one (under the section "Transforming a Section of an XML Document"), XML parsing is happening once, but in-memory document building is happening for each subtree being transformed, effectively loading those fragments in memory twice. The relevant piece of code is:

XPathNodeIterator books = doc.CreateNavigator().Select("/library/book");
while (books.MoveNext())
{
    // There's no XML re-parsing, but a new XPathDocument is loaded!
    XPathDocument tmpDoc = new XPathDocument(
        new XPathNavigatorReader(books.Current));
    ...

As Oleg appropriately pointed out, the definitive solution (and the one he used for the Mvp.Xml project XmlNodeNavigator) is to have a wrapper navigator that doesn't allow an XPathNavigator to go outside a certain scope. Now the Mvp.Xml project has that solution for all XPathNavigator implementations: the SubtreeXPathNavigator. This class is very similar in nature and implementation to Oleg's. Usage is straightforward: you just pass a newly constructed SubtreeXPathNavigator to the XslTransform class, and it will work on the subtree starting at the navigator received in the constructor, which is considered the new root.

Again, I'll follow the MSDN documentation example. Check my previous post for the original code. In the new version, only a single line of code is changed. Inside the while loop, instead of loading a new XPathDocument to perform the transformation, a new SubtreeXPathNavigator instance is constructed:

XslTransform xslt = new XslTransform();
// Always pass evidence!
xslt.Load("print_root.xsl", null, this.GetType().Assembly.Evidence);
// Load the entire doc.
XPathDocument doc = new XPathDocument("library.xml");

// Select each book; no new document is created for it.
XPathNodeIterator books = doc.CreateNavigator().Select("/library/book");
while (books.MoveNext())
{
    // Transform the subtree defined by the current navigator scope.
    xslt.Transform(new SubtreeXPathNavigator(books.Current),
        null, Console.Out, null);
}

Ignoring the time it takes to load the stylesheet and compile the XPath expression (both of which should be cached), this yields an amazing 3.5X performance boost for this simple example. And that's with an XML input of just 200 bytes and a really trivial transformation!

Transforming subtrees may also be useful to reduce the complexity of your stylesheets, and help the XSLT processor in .NET (which is not the fastest in the world) to perform better.

The full Mvp.Xml project source code can be downloaded from SourceForge.

Enjoy and please give us feedback on the project!

Check out the Roadmap to high performance XML.

Note: this entry has moved.

DonXML pointed out some issues with regard to the Checklist: XML Performance article. I believe the checklist (and the corresponding "full-length" explanations) could have benefited from more space to cover the topic. I agree with most of Don's comments. The only one I'm not so sure about is his assertion:

By implementing #1 (Use XPathDocument to process XPath statements), it forces you to break #2 (Avoid the // operator by reducing the search scope), since XPathNavigator.Select() always evaluates from the root, not from the context of the current cursor location. 

This observation is partially true. I say partially because you can reduce the scope of a search by explicitly addressing the full hierarchy of nodes, instead of using "//", which is a shortcut for "/descendant-or-self::node()/". The real cost of "//" is that matched nodes must not be duplicated in the resulting node-set, and this incurs an additional calculation cost. For example, let's say you have an XHTML document, and you want to process all links that exist inside a paragraph. The XPath could be something like: //p//a. Well, as far as the XPath evaluator knows, a <p> can be nested in other <p> elements, so an <a> can (initially) satisfy the "//a" step for two <p> elements that happen to be parent and child. At this point, the XPath evaluator must skip those <a> elements that have already been matched. This is what makes the process much slower.
So, if you positively know that all your <a>s are direct children of a <p>, and the <p>s you want to process always appear directly inside the <body>, you could get an amazing speed boost by replacing the query with "/html/body/p/a". And I really mean *amazing*. Try it yourself with an XHTML version of a long spec, for example the XML Schema part 2.
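
A minimal sketch of the two queries side by side, assuming a tiny made-up document in which every <a> is a direct child of a <p> under <body>. The class name ScopedXPathDemo and the sample markup are hypothetical. The sample deliberately omits the XHTML namespace; with a real XHTML document the element names would need namespace prefixes and an XmlNamespaceManager. Both queries select the same nodes here; the difference the text describes is in how much work the evaluator does to find them, which this small sample does not measure.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public class ScopedXPathDemo
{
    // Hypothetical sample: every <a> sits directly inside a <p> under <body>.
    const string Markup =
        "<html><body>" +
        "<p><a href='one'/></p>" +
        "<p><a href='two'/><a href='three'/></p>" +
        "</body></html>";

    // Returns how many nodes the given XPath expression selects from the root.
    public static int Count(string xpath)
    {
        XPathDocument doc = new XPathDocument(new StringReader(Markup));
        XPathNodeIterator nodes = doc.CreateNavigator().Select(xpath);
        return nodes.Count;
    }

    public static void Main()
    {
        // Descendant search: every <p> anywhere, then every <a> below it.
        Console.WriteLine(Count("//p//a"));          // prints 3
        // Fully scoped path: only the child steps that are known to exist.
        Console.WriteLine(Count("/html/body/p/a"));  // prints 3
    }
}
```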

But, going back to the main point raised by Don (which reminded me of those ugly days when I first stumbled on it), the core issue is that there's a conscious design decision that makes the overload of Evaluate that receives an XPathNodeIterator as the context absolutely useless. Let me explain (and what follows is exactly the use case I explained in the public newsgroup).

Let's say you have the Pubs database as XML. Now you have selected (for whatever reason) all titles with "//publishers/titles". This will be an XPathNodeIterator with the results:

XPathNavigator nav = document.CreateNavigator();
XPathNodeIterator nodes = nav.Select("//publishers/titles");

At some point, let's say you need to work with all prices from that set of nodes. The navigator exposes an overload for the Evaluate method that receives an XPathNodeIterator object as the context to execute the evaluation on. It seems natural, then, to think that the following code would yield the results we expect:

XPathExpression expr = nav.Compile("price");
object allprices = nav.Evaluate(expr, nodes);

The result I expect is a node-set (XPathNodeIterator) for each price child of the titles I passed as the second argument to Evaluate. Well, that isn't happening, because the "price" expression is being evaluated from the document root. So, what's this overload useful for?

The code Oleg used doesn't test the problem, as he's iterating each node (i.e. the nodes variable above) and evaluating on each of them without using the other overload. This works, just as the regular Select method does.
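
For reference, a minimal sketch of that per-node approach, which does resolve "price" relative to each title. The XML below is a made-up stand-in for the Pubs data, not its real structure, and the names PerNodeEvaluation and CollectPrices are hypothetical.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml.XPath;

public class PerNodeEvaluation
{
    // Hypothetical stand-in for the Pubs database exported as XML.
    const string Pubs =
        "<pubs><publishers>" +
        "<titles><title>First</title><price>19.99</price></titles>" +
        "<titles><title>Second</title><price>11.95</price></titles>" +
        "</publishers></pubs>";

    // Collects the text of every price child of every selected titles node.
    public static ArrayList CollectPrices()
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Pubs)).CreateNavigator();
        XPathNodeIterator nodes = nav.Select("//publishers/titles");

        ArrayList prices = new ArrayList();
        while (nodes.MoveNext())
        {
            // Select on nodes.Current, so "price" is relative to this title.
            XPathNodeIterator children = nodes.Current.Select("price");
            while (children.MoveNext())
            {
                prices.Add(children.Current.Value);
            }
        }
        return prices;
    }

    public static void Main()
    {
        foreach (string price in CollectPrices())
        {
            Console.WriteLine(price);  // prints 19.99, then 11.95
        }
    }
}
```

This costs one Select call per title instead of a single evaluation over the whole node-set, which is exactly the overhead the context overload looked like it would avoid.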

Posted by Daniel Cazzulino | 4 comment(s)

Note: this entry has moved.

From one of the developers behind GDN:

Btw, Did i tell you how cool and powerful .Net is and the way it lends itself to building reliable production quality applications.

It's absolutely new to me that a platform by itself "lends" to such reliable and production quality applications. I thought you had to study, read patterns for reliable software design, test and design for performance from the start...
Judging from these words, it seems I've been using a site that isn't GDN... and all those badly designed, buggy, slow-as-a-dog, prototype-quality (at best) .NET apps I see every now and then are just an illusion...

If only I'd known I didn't need to study so much to build reliable production quality apps!
Posted by Daniel Cazzulino | 4 comment(s)