Cool XPathReader, but...

Saturday, May 8, 2004

Note: this entry has moved.

Congrats to Dare and Howard on releasing the XPathReader on the MSDN XML DevCenter!

But (there's always a but), I don't think it will actually reach mainstream usage. The idea and implementation are really interesting, but it is a "derivative work" (in legal terms) of the SSCLI, whose license explicitly states:

You may use this Software for any non-commercial purpose, subject to the restrictions in this license. Some purposes which can be non-commercial are teaching, academic research, and personal experimentation.

It even shares a typo that exists in .NET v1.0: the XPathScanner.CurerntChar property... :S.

This turns an exciting piece of software into a pretty useless one, at least as I see it, both for the open-source Mvp.Xml project and for real-world production systems (i.e. *not* personal experimentation) :(.

On the technical side, the implementation uses the same mechanism as the v1.x XPath (obviously, as it's based on the same code): build an AST, then walk it dynamically at evaluation time. What's more, it makes the same mistake (IMO): the expression itself (in this case the XPathCollection class) is stateful with regard to both the query evaluation and the reader, so it can neither be cached nor shared safely across threads. It's even worse than XPathExpression, because it doesn't even implement ICloneable.

Building ASTs may be fancy, but interpreting them at runtime is far from performant. The XML team learned this the hard way with the v1.x XSLT implementation, whose performance falls well short of the award-winning MSXML. The brand-new v2 implementation (still XSLT 1.0, remember) now takes the right approach: generating compiled IL. This makes for awesome performance that I believe will surpass any other existing XSLT processor. The same approach is taken for XQuery. So, if you ask me, I'm not excited at all about the XPath 1.0 approach in current .NET technologies (not even about the XPathReader, given the added license issue).
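To make the stateful-vs-stateless point concrete, here is a minimal sketch in Python (not .NET, and not how XPathReader or System.Xml actually work). It supports only a tiny, assumed subset of paths (absolute, child axis only, like `/a/b/c`) and uses closure compilation as a lightweight stand-in for IL generation: the path is turned into a chain of closures once, instead of re-walking a syntax tree on every evaluation. The compiled object is immutable and each call keeps its working state in local variables, so one instance can be cached and shared across threads. `compile_path` and `CompiledPath` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical names: compile_path() and CompiledPath are illustrative only;
# they are not part of .NET, XPathReader, or any real library.
Matcher = Callable[[ET.Element], List[ET.Element]]


def _child_step(name: str, next_step: Optional[Matcher]) -> Matcher:
    """Build a closure for one child-axis step; it captures no per-query state."""
    def step(node: ET.Element) -> List[ET.Element]:
        hits = [child for child in node if child.tag == name]
        if next_step is None:
            return hits
        out: List[ET.Element] = []
        for hit in hits:
            out.extend(next_step(hit))
        return out
    return step


@dataclass(frozen=True)
class CompiledPath:
    source: str
    run: Matcher

    def select(self, root: ET.Element) -> List[ET.Element]:
        # All working state (hits, out) lives in the locals of this call,
        # so one CompiledPath can be cached and shared between threads.
        return self.run(root)


def compile_path(path: str) -> CompiledPath:
    """Compile an absolute, child-axis-only path such as '/a/b/c'.

    The steps are turned into a chain of closures once; evaluation never
    re-inspects the path string or any syntax tree.
    """
    names = [part for part in path.split("/") if part]
    if not path.startswith("/") or "//" in path or not names:
        raise ValueError("only absolute child-axis paths like /a/b are supported")
    first, rest = names[0], names[1:]
    tail: Optional[Matcher] = None
    for name in reversed(rest):  # build the chain from the last step backwards
        tail = _child_step(name, tail)

    def run(root: ET.Element) -> List[ET.Element]:
        if root.tag != first:  # the first step matches the document element
            return []
        return [root] if tail is None else tail(root)

    return CompiledPath(path, run)


# Compile once, then share the same instance across several threads.
doc = ET.fromstring("<a><b><c>1</c><c>2</c></b><b><c>3</c></b></a>")
query = compile_path("/a/b/c")
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda _: [e.text for e in query.select(doc)], range(8)))
print(results[0])  # ['1', '2', '3']
```

The design choice is simply to keep "what to match" (the compiled closures) separate from "where we are in this evaluation" (locals of `select`); a stateful expression object mixes the two, which is what rules out caching and sharing.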

Three links from Juan Wajnerman got me thinking again about fully compliant streaming XPath (and whether it's possible at all):

I'll definitely do more research in this area...

BTW, this is what I meant when I asked "so an XPathReader could be the solution?". I have doubts about calling such a limited subset "XPath"... I also don't see how this approach is much better than alternatives such as XSE (Xml Streaming Events), which is also open source, other than the fact that it uses the word XPath :o) (you still have to learn all the rules that no longer apply, so you're effectively dealing with yet another vocabulary). My current thinking is that pull-based APIs may not be that good for (complex) streaming scenarios... maybe a combination like XSE, but definitely NOT a straight port of SAX; I don't know yet...
We need to move forward... in a streaming, compiled and performant way ;)
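For contrast with a pull-based reader, here is a push-style sketch in the spirit of (but not copying) XSE: handlers are registered per path and fired from SAX callbacks. It uses Python's standard `xml.sax` rather than any .NET API, `PathDispatcher` is a hypothetical name, and it deliberately accepts only absolute child-axis paths, which can be decided from the stack of open elements in a single forward pass. Anything that needs look-ahead (for example `last()`) is exactly what a streaming subset has to give up.

```python
import xml.sax
from typing import Callable, Dict, List

# Hypothetical sketch: PathDispatcher is an illustrative name, not the XSE API.
Handler = Callable[[str, str], None]  # called as handler(path, text)


class PathDispatcher(xml.sax.ContentHandler):
    def __init__(self, handlers: Dict[str, Handler]):
        super().__init__()
        for p in handlers:
            # Only absolute child-axis paths: no '//', no predicates.
            if not p.startswith("/") or "//" in p or "[" in p:
                raise ValueError(f"unsupported path for this streaming subset: {p}")
        self._handlers = handlers
        self._stack: List[str] = []         # names of currently open elements
        self._text: List[List[str]] = []    # direct text chunks per open element

    def startElement(self, name, attrs):
        self._stack.append(name)
        self._text.append([])

    def characters(self, content):
        # SAX may deliver text in several chunks; buffer them per element.
        if self._text:
            self._text[-1].append(content)

    def endElement(self, name):
        path = "/" + "/".join(self._stack)
        text = "".join(self._text.pop())
        handler = self._handlers.get(path)
        if handler is not None:
            handler(path, text)
        self._stack.pop()


hits: List[str] = []
xml.sax.parseString(
    b"<a><b><c>1</c><c>2</c></b><d><c>x</c></d><b><c>3</c></b></a>",
    PathDispatcher({"/a/b/c": lambda path, text: hits.append(text)}),
)
print(hits)  # ['1', '2', '3']
```

Matching in `endElement` means the element's text is complete when the handler fires; the flip side is that nothing after the closing tag can influence the match, which is the kind of restriction that makes such a subset "not quite XPath".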

Well, making a compiled XPath expression statefull is kind of a common pattern. We did the same in our XSLT processor for the mainframe several years ago. It usually works fine within XSLT: there are no multithreading issues, and the usual compiler practices from the Dragon Book, like stack frames etc., apply fine.

And that's still XPath, but somewhat limited.

oleg@tkachenko.com (Oleg Tkachenko) - Wednesday, May 12, 2004 8:16:00 AM

So Daniel,

I am enjoying reading about what you are doing and hope to get into the MVPXML soon. While you are on the subject of pointing out people's typos (as you rightly should!), I believe the correct spelling is "Stateful", with one "L". Cheers.

Peter Bromberg - Thursday, May 13, 2004 4:40:00 PM

Hehe... thanks for the tip, Peter! It's interesting that my typo also led Oleg to commit the same one :o)

I'll give Schematron.NET another round soon...

Daniel Cazzulino - Thursday, May 13, 2004 5:35:00 PM
