The lost art of using regular expressions for parsing?
Regular expressions are really powerful and very cool. Most people think of them as just a validation mechanism. They are missing a big scenario enabled by regexes: parsing.
Some other people think that if you're doing any parsing, you **have** to use parser generator tools (e.g. yacc/lex, antlr, coco/r), build a formal grammar of your language, and so on. But do you really **need** to get into that? Do you want proof that you can achieve the same goal with regular expressions? The ASP.NET page parser is built with regular expressions, and not only in v1.x, but in the Whidbey version too.
Wanna confirm? Fire up Reflector, search for the `TemplateParser` class in the `System.Web.UI` namespace, and look at the `ParseStringInternal` method. There you will see how the `BaseParser` class, which contains all the regular expressions for the several pieces of a page, is being used to parse the page source.
I've built a number of parsers with regexes, from simple expression parsers (e.g. a more flexible and powerful expression format than DataBinder.Eval) to full template file parsing (e.g. templates with ASP-like syntax for codegen, in the spirit of CodeSmith, NVelocity, etc.). And it works very well. And your code using very complex regular expressions doesn't have to be a cryptic, impossible-to-read, never-ending line of near-garbage that only you can understand.
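To make that last point concrete, here is a minimal sketch in Python of what a readable regex-based template tokenizer might look like. It is not the ASP.NET parser or any of the parsers mentioned above. The `<% code %>` / `<%= expr %>` syntax, the `TAG` pattern, and the `tokenize` function are all made up for illustration. The trick is the verbose flag (`re.VERBOSE` here; .NET has the equivalent `RegexOptions.IgnorePatternWhitespace`), which lets you spread the pattern over several lines and comment each piece.

```python
import re

# Hypothetical ASP-like syntax (an assumption for this sketch):
#   <%= expr %>  outputs the value of an expression
#   <% code %>   embeds a code statement
TAG = re.compile(
    r"""
    <%                  # opening delimiter
    (?P<is_expr>=)?     # optional '=' marks an output expression
    \s*                 # skip leading whitespace inside the tag
    (?P<body>.*?)       # tag body, non-greedy so it stops at the first %>
    \s*                 # skip trailing whitespace inside the tag
    %>                  # closing delimiter
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(template):
    """Split a template into (kind, text) pairs: 'text', 'expr' or 'code'."""
    tokens = []
    pos = 0
    for match in TAG.finditer(template):
        # Anything between the previous tag and this one is literal text.
        if match.start() > pos:
            tokens.append(("text", template[pos:match.start()]))
        kind = "expr" if match.group("is_expr") else "code"
        tokens.append((kind, match.group("body")))
        pos = match.end()
    # Trailing literal text after the last tag.
    if pos < len(template):
        tokens.append(("text", template[pos:]))
    return tokens
```

Calling `tokenize("Hello <%= name %>!")` splits the input into a literal `Hello `, an expression `name`, and a literal `!`. From there a code generator only has to walk the token list; the regex did all the parsing.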
Bottom line: learn regular expressions. There are a lot of very real problems that you can solve SO easily with them...