If it seems like I'm obsessed with IIS log files -- well, okay, I am. My latest adventure with them has been figuring out how to merge logs from IIS5 with logs from IIS6 so that our analysis tools can do their thing with them. Turns out that the IIS6 log file fields vary in sequence from IIS5 and may have an extra value in them at maximum logging levels. So I wanted to write a set of RegExs that determine whether a given line of useful data came from IIS5 or IIS6 without having any of the file headers.
Yeah, really. I'm that big of a geek. :)
It didn't take long to figure out that this isn't as trivial a task as it sounds, and I needed a way to test my RegExs -- sort of a RegEx editor and debugger, if you will. And I found a great one in Regex Buddy. Granted, you might never have need for this tool, but if you're regularly working with RegExs, it just rocks.
For me, one of the coolest things is its color coding, which makes it easy to tell the parts of a pattern apart. The paren-balance coloring is very helpful too. But the coolest feature is that you can load a pattern file into it and it will color-code the matching and non-matching lines. I highly recommend this tool for anybody who's doing non-trivial work with Regular Expressions.
And did I mention it has a visual tool for building RegExs from "human friendly" terms as a list/tree? With this tool in your bag of tricks, you really don't even have to have a full command of RegExs to make full use of them. Tell me that doesn't rock!
Anyway, here's my nicely annotated RegEx for IIS5 logs.
((?# date)\d{4}\-\d{2}\-\d{2}\s+)((?# time)\d{2}\:\d{2}\:\d{2}\s+)((?# c-ip)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+)((?# cs-username).+?\s+)((?# s-sitename).+?\s+)((?# s-computername).+?\s+)((?# s-ip)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+)((?# s-port)\d{1,5}\s+)((?# cs-method).+?\s+)((?# cs-uri-stem).+?\s+)((?# cs-uri-query).+?\s+)((?# sc-status)\d{1,3}\s+)((?# sc-win32-status)\d+\s+)((?# sc-bytes)\d+\s+)((?# cs-bytes)\d+\s+)((?# time-taken)\d+\s+)((?# cs-version)(HTTP\/\d\.\d|\-)\s+)((?# cs-host).+?\s+)((?# csUser-Agent).+?\s+)((?# csCookie).+?\s+)((?# csReferer).+)
And one for IIS6 logs.
((?# date)\d{4}\-\d{2}\-\d{2}\s+)((?# time)\d{2}\:\d{2}\:\d{2}\s+)((?# s-sitename).+?\s+)((?# s-computername).+?\s+)((?# s-ip)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+)((?# cs-method).+?\s+)((?# cs-uri-stem).+?\s+)((?# cs-uri-query).+?\s+)((?# s-port)\d{1,5}\s+)((?# cs-username).+?\s+)((?# c-ip)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+)((?# cs-version)(HTTP\/\d\.\d|\-)\s+)((?# csUser-Agent).+?\s+)((?# csCookie).+?\s+)((?# csReferer).+?\s+)((?# cs-host).+?\s+)((?# sc-status)\d{1,3}\s+)((?# sc-substatus)\d{1,3}\s+)((?# sc-win32-status)\d+\s+)((?# sc-bytes)\d+\s+)((?# cs-bytes)\d+\s+)((?# time-taken)\d+)
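If you want to see the idea in action outside of a RegEx tool, here's a rough Python sketch of the "which version wrote this line?" check. It isn't the code I actually run: the names (FIELD_PATTERNS, build_pattern, classify) and the two sample log lines are made up for illustration, and it builds each pattern from the field order used above instead of pasting in the long one-liners. It also assumes that every field is a single whitespace-free token (as far as I know IIS writes spaces inside a value as "+"), so free-form fields are matched with \S+ rather than .+?.

```python
import re

IP = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# Stricter patterns for the fields with a predictable shape.
# Any field not listed here falls back to \S+ (assumption: no spaces in a value).
FIELD_PATTERNS = {
    "date": r"\d{4}-\d{2}-\d{2}",
    "time": r"\d{2}:\d{2}:\d{2}",
    "c-ip": IP,
    "s-ip": IP,
    "s-port": r"\d{1,5}",
    "sc-status": r"\d{3}",
    "sc-substatus": r"\d+",
    "sc-win32-status": r"\d+",
    "sc-bytes": r"\d+",
    "cs-bytes": r"\d+",
    "time-taken": r"\d+",
    "cs-version": r"(?:HTTP/\d\.\d|-)",
}

# Field order taken from the two annotated RegExs in this post.
IIS5_FIELDS = (
    "date time c-ip cs-username s-sitename s-computername s-ip s-port "
    "cs-method cs-uri-stem cs-uri-query sc-status sc-win32-status sc-bytes "
    "cs-bytes time-taken cs-version cs-host cs(User-Agent) cs(Cookie) cs(Referer)"
).split()

IIS6_FIELDS = (
    "date time s-sitename s-computername s-ip cs-method cs-uri-stem "
    "cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) "
    "cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status "
    "sc-bytes cs-bytes time-taken"
).split()


def build_pattern(fields):
    """Join one capture group per field, separated by whitespace."""
    groups = ["(" + FIELD_PATTERNS.get(name, r"\S+") + ")" for name in fields]
    return re.compile(r"\s+".join(groups))


IIS5_RE = build_pattern(IIS5_FIELDS)
IIS6_RE = build_pattern(IIS6_FIELDS)


def classify(line):
    """Return 'IIS5', 'IIS6', or None for a single log line (no headers needed)."""
    line = line.strip()
    # fullmatch anchors both ends, so the field count has to be exact.
    if IIS5_RE.fullmatch(line):
        return "IIS5"
    if IIS6_RE.fullmatch(line):
        return "IIS6"
    return None


# Made-up sample lines, not taken from a real server.
SAMPLE_IIS5 = (
    "2004-05-01 12:00:00 192.168.0.10 - W3SVC1 WEB01 10.0.0.5 80 GET "
    "/index.html - 200 0 1024 300 15 HTTP/1.1 www.example.com "
    "Mozilla/4.0+(compatible) - -"
)
SAMPLE_IIS6 = (
    "2004-05-01 12:00:00 W3SVC1 WEB01 10.0.0.5 GET /index.html - 80 - "
    "192.168.0.10 HTTP/1.1 Mozilla/4.0+(compatible) - - www.example.com "
    "200 0 0 1024 300 15"
)

if __name__ == "__main__":
    for sample in (SAMPLE_IIS5, SAMPLE_IIS6, "#Fields: date time c-ip"):
        print(classify(sample))
```

Because both patterns are anchored at both ends and count fields exactly, a full IIS5 line (21 fields) can't sneak through as IIS6 (22 fields) or the other way around. If your servers log fewer fields than the maximum, you'd have to trim the two field lists to match.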