March 2004 - Posts
Note: this entry has moved.
After reading Oleg's post
about an upcoming SAX.NET implementation, and while I still
look forward to the work of the other fellow XML developer working on that, I certainly got
excited and ran to download it and have a look. I was disappointed, I must
say.
When I see a project implementing a class called XmlNamespaces containing
methods such as AddMapping, GetPrefixMapping, PushScope
and PopScope (among others), which do exactly the same as the
System.Xml.XmlNamespaceManager with its AddNamespace, LookupNamespace,
LookupPrefix, PushScope and PopScope, I
start wondering whether straight ports of other platforms' libraries really
make sense in .NET. The mismatch doesn't end there:
-
There's IAttributes AND IAttributes2, and the corresponding implementations
called AttributesImpl and AttributesImpl2 (?!?!). Multiply that by ILocator,
IEntityResolver and so on. This is the first port and there are already interface
versioning problems?
-
There's an IXMLReader (note the casing) class with an EntityResolver property
which doesn't try to take advantage of .NET XmlResolver class, instead
reinventing it through the IEntityResolver interface
-
All the GetFeature/SetFeature/IProperty baggage that only makes sense when
multiple XML parsers are available and with varying features support (which
judging from the
silence of death
to my request for support to such scenarios isn't going to happen at all in
.NET)
-
Non-standard delegates such as
OnPropertyChange(IProperty property, object
newValue) - in the .NET world it would have been OnPropertyChange(object
sender, PropertyChangeEventArgs e).
-
Trivial things such as:
public static string GetString(RsId id)
{
    string name = Enum.GetName(typeof(RsId), id);
    return rm.GetString(name);
    // Should have been:
    // return rm.GetString(id.ToString());
}
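To make the overlap with the built-in class concrete, here is a minimal sketch of scoped prefix handling using nothing but System.Xml.XmlNamespaceManager. The urn:example:* URIs and the NamespaceScopeSketch class are made up for illustration; they come from neither SAX.NET nor Oleg's post.

```csharp
using System;
using System.Xml;

public class NamespaceScopeSketch
{
    // Binds the "po" prefix in an outer scope, shadows it in a nested scope,
    // and returns what the prefix resolves to inside and after that scope.
    public static string[] Resolve()
    {
        XmlNamespaceManager ns = new XmlNamespaceManager(new NameTable());
        ns.AddNamespace("po", "urn:example:outer");   // hypothetical URI

        ns.PushScope();
        ns.AddNamespace("po", "urn:example:inner");   // hypothetical URI
        string inside = ns.LookupNamespace("po");
        ns.PopScope();

        string after = ns.LookupNamespace("po");
        return new string[] { inside, after };
    }

    public static void Main()
    {
        string[] resolved = Resolve();
        Console.WriteLine("Inside the nested scope: " + resolved[0]);
        Console.WriteLine("After PopScope: " + resolved[1]);
    }
}
```

The nested binding is visible only between PushScope and PopScope, which is the same bookkeeping the ported XmlNamespaces class does by hand.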
I think copying Java projects over to .NET is not always a good idea, especially
if done by people who don't work with C# and .NET on a daily basis. NUnit and Log4Net are examples of well-done ports. Note,
however, that it wasn't until v2 that NUnit started using .NET-isms such as custom
attributes.
So, do I want SAX.NET? Definitely NOT. I like some of its ideas. We, as .NET
developers, should take the best ideas from it, mix them with .NET-friendly
APIs, take advantage of built-in infrastructure, and improve on it. So, I still
like the Xml
Streaming Events (XSE) idea much more than any of these ports. I have to work
further on it, develop more use cases, clarify the API and give a second thought
to some concepts, but it definitely integrates far better with current and
future .NET XML support. What I definitely don't want is to code against
a pseudo-.NET/pseudo-Java API.
Note: this entry has moved.
Just in case
you didn't hear it before.
Note: you will have to drill down to the Developer Tools tree node to find it. It isn't in the home as New Downloads yet...
Note: this entry has moved.
Today, you validate XML in .NET v1.x by creating an XmlValidatingReader,
setting the schema, and reading:
// Configure the validating reader
XmlValidatingReader vr = new XmlValidatingReader(theinput);
// Add the schema to the reader (usually the schema is preloaded only once).
vr.Schemas.Add(theschema);
while (vr.Read())
{
    // Do your stuff.
}
You have two options for handling invalid content in the input document (with
regards to the schema/s):
-
Catch the exception thrown at the first error, halting processing:
try
{
    while (vr.Read())
    {
        // Do your stuff.
    }
}
catch (XmlException ex)
{
    // Report the *parse* exception/rethrow.
}
catch (XmlSchemaException ex)
{
    // Report the *validation* exception.
}
-
Attach to the
ValidationEventHandler (according to .NET naming conventions this
would have been named ValidationError or something like that):
vr.ValidationEventHandler += new ValidationEventHandler(OnValidationError);
while (vr.Read())
{
    // Do your stuff.
}
if (_haserrors)
{
    // Report the errors/throw.
}
Here you get a chance to sort of recover from errors, as you can keep
reading and working with the data. Your
OnValidationError event handler sets the _haserrors flag and accumulates the
error messages.
So far so good. All this is clearly explained in the
MSDN documentation. The validation handler signature looks just like
what you would expect:
void ValidationCallback(object sender, ValidationEventArgs e)
{
}
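Putting the second option together, a minimal end-to-end sketch could look like the following. The one-element schema, the sample documents and the ValidationSketch class are assumptions made up for illustration; the public HasErrors field plays the role of the _haserrors flag. XmlValidatingReader and the Schemas collection are the v1.x APIs discussed here, and later framework versions mark them obsolete.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ValidationSketch
{
    // Hypothetical schema: an <order> element containing one integer <qty>.
    public const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='order'><xs:complexType><xs:sequence>" +
        "<xs:element name='qty' type='xs:int'/>" +
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    public bool HasErrors;                      // the _haserrors flag from the text
    public ArrayList Errors = new ArrayList();  // accumulated error messages

    void OnValidationError(object sender, ValidationEventArgs e)
    {
        if (e.Severity == XmlSeverityType.Error)
        {
            HasErrors = true;
            Errors.Add(e.Message);
        }
    }

    // Reads the whole document, collecting validation errors instead of throwing.
    public static ValidationSketch Validate(string xsd, string xml)
    {
        ValidationSketch result = new ValidationSketch();
        XmlSchema schema = XmlSchema.Read(new StringReader(xsd), null);
        XmlValidatingReader vr = new XmlValidatingReader(
            new XmlTextReader(new StringReader(xml)));
        vr.ValidationType = ValidationType.Schema;
        vr.Schemas.Add(schema);
        vr.ValidationEventHandler += new ValidationEventHandler(result.OnValidationError);
        while (vr.Read())
        {
            // Do your stuff; reading continues past validation errors.
        }
        vr.Close();
        return result;
    }

    public static void Main()
    {
        ValidationSketch result = Validate(Xsd, "<order><qty>abc</qty></order>");
        Console.WriteLine("Has errors: " + result.HasErrors);
        foreach (string message in result.Errors)
        {
            Console.WriteLine(message);
        }
    }
}
```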
In case 2, what happens to the invalid XML item in the input? Well, it's read
anyway, along with its content. Now, suppose that the element just found
doesn't even exist in your schema, and most probably neither does its inner content.
Your validation error messages will be filled with errors about each and every
single item inside the erroneous element. What's more, I may want my application
to work in a "forgiveness" mode and so do something useful with what IS valid so far.
Easy enough, I thought. I have a sender in my validation callback. I
bet it's the reader. I just have to cast it back, call the
Skip method, accumulate just one error for the current
validation failure, and move on:
private void OnValidationError(object sender, ValidationEventArgs e)
{
    if (e.Severity == XmlSeverityType.Error)
    {
        // Accumulate error, set flag.
        ((XmlReader)sender).Skip();
    }
}
Unfortunately, the sender is null in v1.x, so no luck.
The good news is that this has been fixed in the PDC bits. Maybe we can hope for a
service pack/hotfix for v1.1...
Note: this entry has moved.
For the past year I had the pleasure of working at Lagash Systems SA, a
high-end consulting firm in Argentina, run by really cool guys who created a
company that is by far the best place you can work in Argentina right
now. You won't find Morts there, only Einsteins. It was really an excellent
experience, working with clever people, doing interesting and advanced stuff,
and sharing knowledge as I had never seen in other companies. I can honestly
say that my
initial expectations were easily surpassed. The company paid for a
diving course (including the initiation trip to a "lake"!) for all of us, where
we spent a couple of great days, and they even gave me a beautiful cradle as a gift
when
my little baby Agustina was born, which will always make me remember
them. All I can say is a big "thank you", I've nothing but gratitude to them.
However, it's a fact of life that you always want more. And it was time for me
to start my own company. I had been working on my own before (a whole year
devoted to .NET research and writing for Wrox, which eventually led me to be a
speaker at .NET ONE 2002 in Frankfurt), but couldn't find
a partner to share the effort, and then Lagash came. This time, I found
such a partner, the brilliant and excellent guy Victor (a.k.a.
vga). We share a common view about technology, and the enthusiasm
to continuously learn new advanced stuff and play with the latest .NET bits we
can get.
So it's now time for Clarius Consulting SA (clariusconsulting.com
and clariusconsulting.net in the registration process now), where we expect to
further develop our public visibility and share with the community the stuff we
learn (mainly with Whidbey now) through our new site
aspnet2 (under construction still) and our books (two
of them
coming out soon from Apress). We have officially started the company (that is,
we signed the appropriate papers with our lawyer) on March 15, 2004. An
important day in our lives, and the beginning of interesting times, I'm sure...
Note: this entry has moved.
A couple weeks ago Rob Howard
(from the ASP.NET team)
announced the "disclosure" of the
Provider Design Pattern they are using in Whidbey ASP.NET (v2).
I've got a couple complaints with this implementation:
While the first two are a matter of taste in the end, the last
one should be fixed promptly. I didn't hear any voice complaining,
however. Am I the only one envisioning complex providers with the need to
configure themselves with hierarchical XML information? It's all too common
everywhere!
You want a concrete example? Here it goes:
What if I develop a provider that implements automatic DB schema installation
and migration? My super provider could allow the full DB schema to be specified
in the configuration itself:
...other tables...
The provider can detect the presence of the schema and create it automatically
if necessary. I could even go as far as letting it define, through
configuration, how to migrate a schema if it's incompatible, or
whatever.
Another one: maybe my provider uses a webservice. I may need to pass complex
information to the provider, such as credentials, proxy information, SOAP
message skeletons, or whatever. None of this is possible with a NameValueCollection.
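As a rough sketch of the difference for the web service case, compare what a flat NameValueCollection can carry with what a provider could read if it were handed an XmlNode. Everything here is hypothetical: the element and attribute names, the ProviderConfigSketch class and the ReadProxyUser helper are made up for illustration and are not part of the Whidbey provider model.

```csharp
using System;
using System.Collections.Specialized;
using System.Xml;

public class ProviderConfigSketch
{
    // Hypothetical configuration fragment with nested proxy and credentials.
    public const string Config =
        "<provider name='sample'>" +
        "<proxy host='proxy.example.com' port='8080'>" +
        "<credentials user='svc' domain='corp'/>" +
        "</proxy>" +
        "</provider>";

    // What a provider could do if its configuration arrived as an XmlNode.
    public static string ReadProxyUser(XmlNode config)
    {
        XmlNode credentials = config.SelectSingleNode("proxy/credentials");
        return credentials.Attributes["user"].Value;
    }

    public static void Main()
    {
        // Flat settings: one string per key, nesting has to be faked in key names.
        NameValueCollection flat = new NameValueCollection();
        flat.Add("proxyHost", "proxy.example.com");
        flat.Add("proxyUser", "svc");
        Console.WriteLine(flat["proxyHost"]);

        // Hierarchical settings: the structure survives as-is.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Config);
        Console.WriteLine(ReadProxyUser(doc.DocumentElement));
    }
}
```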
Note: this entry has moved.
When dealing with XML in .NET, you're mostly faced with two options:
-
Streaming API: the XmlReader.
-
Object model API: either XmlDocument, XPathDocument or an XmlSerializer-aware
custom object model.
Several reasons can lead you towards any of the latter ones, such as strong
typing (XmlSerializer), flexibility and XPath querying (XmlDocument and
XPathDocument), etc. Any of the three object model API approaches, however,
requires the entire XML input to be parsed and loaded into memory. Therefore, when
you're presented with large documents, or need the fastest processing, all
you're left with is the XmlReader. If you worked with it doing anything but the
most trivial XML processing, you know how ugly it can become. Lots of string
comparison, endless switch, if, loops, whatever.
From my point of view, working against a custom object model is best,
as it gives you a level of abstraction from the wire format, and you get
to work with OO classes and properties, which is far more comfortable than
dealing with InnerXml, Value, etc. If you haven't tried the XmlSerializer
approach before, you definitely should.
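For those who haven't tried it, a minimal sketch of the XmlSerializer approach follows. The PurchaseOrder and Address classes are a hand-written, heavily trimmed mapping assumed for illustration, and the sample XML is a cut-down fragment in the spirit of the Primer's purchase order, not the full instance document.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("purchaseOrder")]
public class PurchaseOrder
{
    // xs:date attribute mapped to a DateTime.
    [XmlAttribute("orderDate", DataType = "date")]
    public DateTime OrderDate;

    [XmlElement("shipTo")]
    public Address ShipTo;
}

public class Address
{
    [XmlAttribute("country")]
    public string Country;

    [XmlElement("name")]
    public string Name;
}

public class SerializerSketch
{
    // Trimmed sample document (assumption: only the members mapped above).
    public const string Sample =
        "<purchaseOrder orderDate='1999-10-20'>" +
        "<shipTo country='US'><name>Alice Smith</name></shipTo>" +
        "</purchaseOrder>";

    public static PurchaseOrder Load(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(PurchaseOrder));
        return (PurchaseOrder) serializer.Deserialize(new StringReader(xml));
    }

    public static void Main()
    {
        PurchaseOrder po = Load(Sample);
        // Plain fields instead of InnerXml/Value lookups.
        Console.WriteLine(po.OrderDate.ToShortDateString());
        Console.WriteLine(po.ShipTo.Name + " (" + po.ShipTo.Country + ")");
    }
}
```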
When you move to streaming processing, you lose all that. And you don't
lose it because the abstractions of your entities have disappeared, as you most
probably have an XML Schema defining what the XML must look like. You just lose
it because of the API. You can still use the XML Schema to validate as you
read, and get some (very little) extra functionality from the XmlValidatingReader.ReadTypedValue()
method. If you're like me, you may be asking: given that I know the schema at
design time, isn't there a way to use it to make things easier for me?
And that's not the only issue. Validating against an XML Schema, even if it's
definitely a good idea in order to keep your application data consistent and
considerably reduce your own validation code, doesn't come for free. According to
tests I've done with the (fairly simple) purchase order schema and
instance document in XML Schema Part 0:
Primer, XmlValidatingReader is between 10X and 12X slower
than the XmlTextReader. Not that this is a bad number, just that you
need to have that in mind. And why is it so costly? Well, mostly because it's a
generic XML Schema validator, which means as it parses, it checks valid
transition between states, data types, facets, etc. And again, given that
I know the schema at design time, isn't there a way to use it to make things
easier for the parser?
Typed readers
Just as typed datasets build upon the generic DataSet to bring strong-typing and
validation to the game, based on an XML Schema, wouldn't it be great if the
same existed for readers?
A typed reader should be built upon the XmlReader and provide the same
validation capabilities as XmlValidatingReader, but at a fraction of the cost,
because it would already know all the elements, attributes and types, and it
would also be able to read and validate a specific schema.
Given a purchase order document,
I could write code as follows:
poReader r = new poReader( inputStream );
if (r.Read())
{
    // Typed date for the orderDate attribute.
    Console.WriteLine( r.orderDate.ToShortDateString() );
    shipToReader shipto = r.ReadshipTo();
    // Country attribute turned into an Enum
    if (shipto.country == shipToCountry.US)
        Console.WriteLine( "US!!" );
    // An inner simple-typed element is made a property
    // In OO, there's no distinction between this and an attribute.
    Console.WriteLine( shipto.name );
}
Maybe it should be something more like this:
poReader r = new poReader( inputStream );
while (r.Read())
{
    if (r.TypedReader is shipToReader)
    {
        // TypedReader is a property, so no call parentheses.
        shipToReader shipto = (shipToReader) r.TypedReader;
        // Work against the typed one now.
    }
    else if (r.TypedReader is itemsReader)
    {
        // Do so for items.
    }
}
I sort of prefer the latter. The TypedReader property would contain
the instance used to read (and validate) the current element content model,
which would be the current
strategy being applied. With the advent of generics, maybe I should
even be allowed to pass the typed reader I want...
r.Read();
I guess in Whidbey that would be the way to implement it internally, anyway....
Another possible use is dynamic run-time generation of these typed readers
for a schema. If we can prove that performance will increase, we could use the
typed readers not to gain usability but to gain speed. This could be a
specialized factory that emits the code (the same you would get at design
time) to execute:
// XmlSchema.Read is a static method; theFile is a Stream, TextReader or XmlReader over the schema.
XmlSchema sch = XmlSchema.Read(theFile, null);
XmlReader r = XmlTypedFactory.CreateReader( sch );
The factory itself would keep cached versions of the Types it has already generated from a certain schema...
So, what do you think about such an idea? Is it useful? Would you use it? What
should the API look like?
This may be part of the new Mvp.Xml project
most XML MVPs (including me, of course) are heading.
Note: this entry has moved.
Hernan de Lahitte brings signing, encrypting and hashing to the masses with his Crypto for Everyone post. His work is not pet-project development. I worked with this guy on what is probably the biggest .NET project in Argentina, and he really knows what security means. The helpers he presents are fully tested, widely used in that project to perform many security-sensitive actions, and are really a time-saver for anyone (like myself) who just wants to call a method and let somebody else worry about the intricacies of cryptography.
Next time you need to encrypt a license key, a password, in-memory (i.e. CallContext) data internal to your framework, etc., be sure to download it.
Note: this entry has moved.
Dare is announcing the upcoming MSDN XML Dev Center (~ two weeks from launch), and asking for a tagline suggestion. Mine is:
The asphalt for the Information Highway.
About covering newer, work-in-progress and unreleased stuff, I definitely think it makes for more compelling and interesting reading, if mixed with articles on today's technologies. Having the mix means I can read those for today during work, and enjoy the edge-stuff at night at home :). It's a must that the article BEGINS by saying which versions of which draft/beta/platform are used for the discussion, so that whenever those either are deprecated/disappear/mutate or go Recommendation/Standard/RTM the reader knows that right from the start.
I also agree with Oleg that they should be more theory/exploratory than the regular material.
Note: this entry has moved.
Every now and then I receive complaints about XPathNodeIterator.
You know, it allows iteration where each Current element is an XPathNavigator.
Not too useful if you're looking for OuterXml, or are
too dependent on the XmlNode-based API (i.e. XmlDocument). The
most worrying issue is that people use this argument against using compiled
XPath expressions, which are known to significantly improve performance (see
Performant XML (I) and
Performant XML (II) articles). The reason is that in order to get an
XmlNodeList, you have to use the SelectNodes method of the XmlNode (and
therefore XmlDocument), whose signature is as follows:
public XmlNodeList SelectNodes(string xpath);
public XmlNodeList SelectNodes(string xpath, XmlNamespaceManager nsmgr);
This means that most developers won't compile their expressions simply because
in order to use the XPathExpression, they have to explicitly
create a navigator for the node/document and work against the cursor-style API
of the XPathNodeIterator and XPathNavigator:
// Statically compile and cache the expression.
XPathExpression expr;
// Init and load a document.
XmlDocument document;
// Create navigator, clone expression and execute query.
XPathNodeIterator it = document.CreateNavigator().Select(expr.Clone());
while (it.MoveNext())
{
    // Do something with it.Current which is an XPathNavigator.
}
This approach generally means that in order to optimize the code by compiling
expressions, you actually have to refactor significant pieces of your code. And
you don't have any other choice if you need to sort the query by using XPathExpression.AddSort().
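As a minimal sketch of the sorting case just mentioned: the items document and the SortedQuerySketch class are made up for illustration, while Compile, AddSort and Select are the standard XPathNavigator/XPathExpression calls.

```csharp
using System;
using System.Collections;
using System.Xml;
using System.Xml.XPath;

public class SortedQuerySketch
{
    // Returns the name attribute of every item, sorted by the query itself.
    public static string[] SortedNames(XmlDocument doc)
    {
        XPathNavigator nav = doc.CreateNavigator();
        XPathExpression expr = nav.Compile("/items/item");
        // Sort the result by the name attribute, as text, ascending.
        expr.AddSort("@name", XmlSortOrder.Ascending, XmlCaseOrder.None,
            "", XmlDataType.Text);

        ArrayList names = new ArrayList();
        XPathNodeIterator it = nav.Select(expr);
        while (it.MoveNext())
        {
            // it.Current is an XPathNavigator positioned on an item element.
            names.Add(it.Current.GetAttribute("name", ""));
        }
        return (string[]) names.ToArray(typeof(string));
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<items><item name='b'/><item name='a'/><item name='c'/></items>");
        Console.WriteLine(String.Join(",", SortedNames(doc)));
    }
}
```

Sorting is only reachable through the compiled expression, which is why the cursor-style iteration cannot be avoided here.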
There's a solution to this problem, as usual :).
You know that the XPathNavigator is an abstract class that allows
multiple underlying implementations to offer the same cursor-style API and gain
the instant benefit of XPath querying.
Aaron Skonnard has some interesting implementations showing this
concept. Therefore, when you're iterating the results of the query, and asking
for the current element, you're actually using something that is dependent on
the implementation. Therefore, this object, besides being an XPathNavigator
(that is, the XPathNodeIterator.Current property), can also implement other
interfaces as part of the underlying implementation. As such, queries executed
against an XmlNode-based element will have each Current element implementing IHasXmlNode
whereas XPathDocument-based ones will implement IXmlLineInfo.
And what is this useful for? Well, just to get access to additional information
beyond the standard XPathNavigator API that depends on the
concrete implementation. So, inside the while loop above, we can ask:
while (it.MoveNext())
{
    if (it.Current is IHasXmlNode)
    {
        XmlNode node = ((IHasXmlNode)it.Current).GetNode();
        // Work with your beloved DOM api ;)
    }
}
Still, this doesn't solve the problem that you have to iterate differently than
you're used to, and that significant rewrites are still needed when you use XPathExpression
for querying.
The solution is to use the knowledge about the underlying implementation (i.e.
you KNOW you're querying against an XmlDocument) and get an easier
API to it. This can be achieved by creating an IEnumerable class
that provides iteration over the XPathNodeIterator but exposing
the underlying XmlNode. Also, a helper method returning an
array of XmlNodes is useful. It would be used as follows:
XPathNodeIterator it = doc.CreateNavigator().Select(expr.Clone());
XmlNodesEnumerable nodes = new XmlNodesEnumerable(it);
foreach (XmlNode node in nodes)
{
    Response.Write(node.OuterXml);
}
// Or use the array directly:
XmlNode[] list = nodes.ToArray();
Complete code for the custom enumerable object and its internal enumerator
implementation follows.
using System;
using System.Collections;
using System.Xml;
using System.Xml.XPath;

/// <summary>
/// Provides enumeration over an <see cref="XPathNodeIterator"/> but
/// exposing the underlying <see cref="XmlNode"/> elements.
/// </summary>
public class XmlNodesEnumerable : IEnumerable
{
    XPathNodeIterator _iterator;

    /// <summary>
    /// Constructs the iterator.
    /// </summary>
    /// <param name="iterator">The instance containing the nodes to iterate.</param>
    public XmlNodesEnumerable(XPathNodeIterator iterator)
    {
        _iterator = iterator;
    }

    /// <summary>
    /// Returns all nodes in the underlying iterator as an array.
    /// </summary>
    /// <returns>An array with all nodes.</returns>
    public XmlNode[] ToArray()
    {
        ArrayList list = new ArrayList();
        // Work on a clone so the original cursor is not consumed
        // and the nodes can be enumerated more than once.
        IEnumerator en = new XmlNodesEnumerator(_iterator.Clone());
        while (en.MoveNext())
        {
            list.Add(en.Current);
        }
        return (XmlNode[]) list.ToArray(typeof(XmlNode));
    }

    #region IEnumerable Members

    IEnumerator IEnumerable.GetEnumerator()
    {
        // Clone for the same reason as in ToArray.
        return new XmlNodesEnumerator(_iterator.Clone());
    }

    #endregion

    #region Inner XmlNodesEnumerator class

    /// <summary>
    /// Provides iteration over an <see cref="XPathNodeIterator"/> but
    /// exposing the underlying <see cref="XmlNode"/> elements.
    /// </summary>
    private class XmlNodesEnumerator : IEnumerator
    {
        XPathNodeIterator _iterator;

        /// <summary>
        /// Constructs the iterator.
        /// </summary>
        /// <param name="iterator">The instance containing the nodes to iterate.</param>
        public XmlNodesEnumerator(XPathNodeIterator iterator)
        {
            _iterator = iterator;
        }

        #region IEnumerator Members

        /// <summary>
        /// Not supported.
        /// </summary>
        void IEnumerator.Reset()
        {
            throw new NotSupportedException("Can't reset this enumerator.");
        }

        /// <summary>
        /// Returns the current <see cref="XmlNode"/>.
        /// </summary>
        /// <exception cref="ArgumentException">The current item in the
        /// underlying <see cref="XPathNodeIterator"/> doesn't point to an <see cref="XmlNode"/>.</exception>
        object IEnumerator.Current
        {
            get
            {
                IHasXmlNode node = _iterator.Current as IHasXmlNode;
                if (node == null)
                    throw new ArgumentException("Can only traverse XmlNode iterators.");
                return node.GetNode();
            }
        }

        /// <summary>
        /// Advances the iteration cursor.
        /// </summary>
        /// <returns>True if more nodes remain in the iterator.</returns>
        bool IEnumerator.MoveNext()
        {
            return _iterator.MoveNext();
        }

        #endregion
    }

    #endregion
}
Update: check an even better approach here.
Enjoy!
Check out the Roadmap to high performance XML.