February 2005 - Posts

What do you use to analyze your IIS log files?

Currently I'm using awstats, which is a very nice free log analyzer with a lot of useful statistics. One stat is the referrers; however, I would like to see which page each referrer linked to. It is also nice that you can see the search phrases used to get to your site, but I would also like to know which page is associated with a particular search phrase. As far as I can tell, these are not available in awstats.

Does anyone know how to get these with awstats? Or can anyone recommend another free log file analyzer that will do this? I'm mostly interested in page counts and referrers.

Posted by puzzlehacker | with no comments

ChangeThis

I just ran across changethis.com thanks to rido's post The six laws of software. ChangeThis has some pretty cool articles, or manifestos as they tend to call them. Here is the description they provide about the site.

ChangeThis is creating a new kind of media. A form of media that uses existing tools (like PDFs, blogs and the web) to challenge the way ideas are created and spread.

We're on a mission to spread important ideas and change minds. Read more...

Check it out.

Color tools for the design impaired

Michael Moncur has listed a number of color tools in his post Color tools for the design impaired. I particularly like the Color Scheme Generator 2. These really help someone like me, who doesn't know crap about proper color combinations.

Michael also has a post about Web icons for the design impaired.

Matching Balanced Constructs with .NET Regular Expressions

Brief Computer Science Theory Background

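In computer science a formal language is a set of finite strings over some finite alphabet. The Chomsky hierarchy defines four major classes of formal languages: regular languages (recognized by a finite state automaton), context-free languages (recognized by a pushdown automaton), context-sensitive languages (recognized by a linear bounded automaton) and recursively enumerable languages (recognized by a Turing machine).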

Most programming language syntax can be described by a context-free language and can be recognized by a pushdown automaton (PDA). A PDA can be thought of as a finite state automaton (FSA), also called a finite state machine (FSM), that can use a stack to store data. Regular languages are used to describe simple string patterns such as program identifiers. Regular expressions are strings that describe a particular regular language. A regular expression cannot describe any language that requires unbounded counting. The classic language a^n b^n where n > 0 (n a's followed by n b's) is not a regular language because it cannot be recognized by an FSA. It is a context-free language and can be recognized by a PDA: whenever an a is read an a is pushed onto the stack, and whenever a b is read an a is popped off the stack. If the stack is empty after reading all the characters of the string, then the string is accepted. Similarly, properly balanced constructs such as balanced parentheses need a PDA to be recognized and thus cannot be represented by a regular expression.
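To make the push and pop description concrete, here is a minimal sketch of that PDA written by hand in C#. The class and method names (AnBnRecognizer, Accepts) are made up for illustration; it only assumes the a^n b^n language with n > 0 described above.

```csharp
using System.Collections.Generic;

public static class AnBnRecognizer
{
    // Returns true when input is n a's followed by exactly n b's, with n > 0.
    public static bool Accepts(string input)
    {
        Stack<char> stack = new Stack<char>();
        int i = 0;

        // Phase 1: push one symbol for every leading 'a'.
        while (i < input.Length && input[i] == 'a')
        {
            stack.Push('a');
            i++;
        }

        // n > 0 means at least one 'a' must have been read.
        if (stack.Count == 0)
        {
            return false;
        }

        // Phase 2: pop one symbol for every 'b'.
        while (i < input.Length && input[i] == 'b')
        {
            // Nothing left to pop means there are more b's than a's.
            if (stack.Count == 0)
            {
                return false;
            }
            stack.Pop();
            i++;
        }

        // Accept only if all input was consumed and the stack is empty.
        return i == input.Length && stack.Count == 0;
    }
}
```

Calling AnBnRecognizer.Accepts("aabb") accepts the string, while "aab" and "abb" are rejected because the stack is not empty, or is empty too early, respectively.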

.NET Regular Expression Engine

As described above properly balanced constructs cannot be described by a regular expression. However, the .NET regular expression engine provides a few constructs that allow balanced constructs to be recognized. 

  • (?<group>) - pushes the captured result on the capture stack with the name group.
  • (?<-group>) - pops the top most capture with the name group off the capture stack.
  • (?(group)yes|no) - matches the yes part if there exists a group with the name group otherwise matches no part.

These constructs allow for a .NET regular expression to emulate a restricted PDA by essentially allowing simple versions of the stack operations: push, pop and empty. The simple operations are pretty much equivalent to increment, decrement and compare to zero respectively. This allows for the .NET regular expression engine to recognize a subset of the context-free languages, in particular the ones that only require a simple counter. This in turn allows for the non-traditional .NET regular expressions to recognize individual properly balanced constructs.
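The counter analogy can be sketched without any regular expression at all. The following hand-written check (ParenCounter and IsBalanced are hypothetical names) uses a single integer where the regex engine would use the capture stack, so increment, decrement and compare to zero stand in for push, pop and empty.

```csharp
public static class ParenCounter
{
    // Returns true when every ( has a matching ) and no ) appears too early.
    public static bool IsBalanced(string input)
    {
        int depth = 0;
        foreach (char c in input)
        {
            if (c == '(')
            {
                depth++;            // push    -> increment
            }
            else if (c == ')')
            {
                if (depth == 0)     // pop on an empty stack fails
                {
                    return false;
                }
                depth--;            // pop     -> decrement
            }
            // Any other character is ignored.
        }
        return depth == 0;          // empty   -> compare to zero
    }
}
```

Because only a count is kept, this approach can tell how deeply nested the input is but not which kind of delimiter was opened, which is exactly the restriction described above.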

.NET Regular Expression Examples

The classic a^n b^n example.

Regex re = new Regex(@"^
  (?<N>a)+    # For every a push N on capture stack
  (?<-N>b)+   # For every b pop N from capture stack
  (?(N)(?!))  # If N exists on stack then fail (?!)
  $", RegexOptions.IgnorePatternWhitespace);

This regular expression recognizes any number of a's followed by the same number of b's. Essentially, for every a it adds a named group N to the capture stack, and for every b it removes a named group N from the capture stack. Once it gets past the last b it checks whether the named group N still exists on the capture stack. If it does, then there were more a's than b's, so it forces a failure by matching (?!) (a negative lookahead with no expression, which is a guaranteed failure). It is worth mentioning that if no named group N exists when trying to pop with (?<-N>) then the match fails at that point, which prevents accepting strings where there are more b's than a's.
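As a quick way to try the expression, it can be wrapped in a small helper. The wrapper below (AnBnRegexDemo and IsMatch are names invented for this sketch) reuses the pattern from above unchanged.

```csharp
using System.Text.RegularExpressions;

public static class AnBnRegexDemo
{
    // Same pattern as above: push N for every a, pop N for every b.
    private static readonly Regex Pattern = new Regex(@"^
      (?<N>a)+    # For every a push N on capture stack
      (?<-N>b)+   # For every b pop N from capture stack
      (?(N)(?!))  # If N exists on stack then fail (?!)
      $", RegexOptions.IgnorePatternWhitespace);

    public static bool IsMatch(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```

With this helper, "aabb" and "aaabbb" match, while "aab" (an N is left on the stack) and "abb" (the second pop has nothing to remove) do not.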

Balanced Parentheses.

Jeffrey Friedl provides the following example in his excellent book Mastering Regular Expressions, 2nd Edition.

Dim R As Regex = New Regex(" \(                   " & _
                           "   (?>                " & _
                           "       [^()]+         " & _
                           "     |                " & _
                           "       \( (?<DEPTH>)  " & _
                           "     |                " & _
                           "       \) (?<-DEPTH>) " & _
                           "   )*                 " & _
                           "   (?(DEPTH)(?!))     " & _
                           " \)                   ", _
       RegexOptions.IgnorePatternWhitespace)

Now this expression works just fine for matching properly-nested parentheses, but its layout doesn't work for nested constructs whose delimiters are more than a single character, such as XML tags. Here is another regular expression for matching parentheses that can be expanded easily for multi-character delimiters.

Regex re = new Regex(@"^
  (?>
      \( (?<LEVEL>)   # On opening paren push level
    |    
      \) (?<-LEVEL>)  # On closing paren pop level
    |
      (?! \( | \) ) . # Match any char except ( or ) 
  )+
  (?(LEVEL)(?!))      # If level exists then fail
  $", RegexOptions.IgnorePatternWhitespace);

This expression also matches properly-nested parentheses. The biggest difference is that instead of matching the character class [^()]+ it uses a negative lookahead to ensure that the next character is not a paren. It also matches only one character at a time instead of one or more. For a single-character delimiter like a parenthesis a lookahead is more than is needed, but it is required in the next example.

Balanced XML tags.

Regex re = new Regex(@"^
  (?>
      <tag>  (?<LEVEL>)      # On opening <tag> push level
    | 
      </tag> (?<-LEVEL>)     # On closing </tag> pop level
    |
      (?! <tag> | </tag> ) . # Match any char unless the strings   
  )+                         # <tag> or </tag> in the lookahead string
  (?(LEVEL)(?!))             # If level exists then fail
  $", RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

This expression matches the properly-nested XML tags <tag> and </tag>. The only change from the parentheses expression is to replace ( with <tag> and ) with </tag>. This can be generalized such that all that is needed is the regular expressions for the opening and closing delimiters. The next example shows how one could use this expression in a more general fashion.

General version of Balanced constructs (HTML Anchor tags used in example).

Regex re = new Regex(string.Format(@"^
  (?>
      {0} (?<LEVEL>)      # On opening delimiter push level
    | 
      {1} (?<-LEVEL>)     # On closing delimiter pop level
    |
      (?! {0} | {1} ) .   # Match any char unless the opening   
  )+                      # or closing delimiters are in the lookahead string
  (?(LEVEL)(?!))          # If level exists then fail
  $", "<a[^>]*>", "</a>"), 
  RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

This expression uses a simple string format to substitute the opening and closing delimiters into the expression string. In this case simplistic versions of the opening and closing HTML anchor tags are used. In general, any opening and closing delimiter patterns can be provided to this expression to create a .NET regular expression that matches properly balanced constructs.
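One way to package the general version is a small factory method. This is only a sketch: the names BalancedRegex and Create are hypothetical, and wrapping each delimiter in (?: ) is an addition not present in the expression above, made so that a delimiter pattern containing | cannot change the meaning of the alternation.

```csharp
using System.Text.RegularExpressions;

public static class BalancedRegex
{
    // open and close are regular expression fragments, not literal text.
    // They are used under IgnorePatternWhitespace, so they should not
    // contain unescaped spaces or # characters.
    public static Regex Create(string open, string close)
    {
        string pattern = string.Format(@"^
          (?>
              (?: {0} ) (?<LEVEL>)             # On opening delimiter push level
            |
              (?: {1} ) (?<-LEVEL>)            # On closing delimiter pop level
            |
              (?! (?: {0} ) | (?: {1} ) ) .    # Match any char that does not start a delimiter
          )+
          (?(LEVEL)(?!))                       # If level exists then fail
          $", open, close);

        return new Regex(pattern,
            RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
    }
}
```

For example, BalancedRegex.Create("<a[^>]*>", "</a>") gives the anchor tag matcher from above, and BalancedRegex.Create(@"\(", @"\)") gives the parentheses matcher.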

Retrieving data between delimiters where there are possible nested delimiters

One commonly needed application is retrieving the text between a set of tags when the tags may be nested. If there were no nested tags then this regular expression would be rather simple, but since there are, one essentially needs to wrap the expression from above with the set of outer tags and then capture the inner text.

Regex re = new Regex(string.Format(@"^
  {0}                       # Match first opening delimiter
  (?<inner>
    (?>
        {0} (?<LEVEL>)      # On opening delimiter push level
      | 
        {1} (?<-LEVEL>)     # On closing delimiter pop level
      |
        (?! {0} | {1} ) .   # Match any char unless the opening   
    )+                      # or closing delimiters are in the lookahead string
    (?(LEVEL)(?!))          # If level exists then fail
  )
  {1}                       # Match last closing delimiter
  $", "<quote>", "</quote>"), 
  RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
re.Match("<quote>inner text</quote>").Groups["inner"].Value == "inner text"
re.Match("<quote>a<quote>b</quote>c</quote>").Groups["inner"].Value == "a<quote>b</quote>c"

This example strips off the outermost <quote> tags and stores the inner text in the named capture group inner.

Matching multiple balanced constructs

The original intent of this example was to show how to match multiple properly balanced constructs with a single expression. However, after creating the expression and testing it, an interesting problem cropped up. Making sure that () and [] are each properly nested on their own is easy, as shown above, but as far as I can tell making sure they are properly nested together is not possible with .NET regular expressions. To better understand the problem, consider the improperly nested examples ([)] and [(()]). Each kind of delimiter is properly nested when considered individually, but the two are improperly nested when considered together. Here is an expression that, at first glance, looks like it could recognize this:

Regex re = new Regex(@"^
  (?>
      (?<LEVEL> \()                 # On opening paren capture ( on stack
    | 
      (?(\k<LEVEL>=\()              # Make sure the top of stack is (
      (?<-LEVEL> \) ))              # On closing paren pop ( off stack
    |
      (?<LEVEL> \[ )                # On opening bracket capture [ on stack
    |
      (?(\k<LEVEL>=\[)              # Make sure the top of stack is [
      (?<-LEVEL> \] ))              # On closing bracket pop [ off stack
    |
      (?! \( | \) | \[ | \] ) .     # Match any char except (, ), [ or ]
  )+
  (?(LEVEL)(?!))                    # If level exists then fail
  $", RegexOptions.IgnorePatternWhitespace);

THIS REGULAR EXPRESSION DOES NOT WORK. IT IS ONLY USED AS A DEMONSTRATION.

The captured value on the top of the stack can be retrieved with the backreference \k<LEVEL>, but there is no way to test that value. The above expression doesn't work because the two (?(\k<LEVEL>=...) conditions do not compare anything: they are treated as lookahead expressions that try to match literal input text, namely the captured value followed by = and a delimiter (for example "(=("). What really needs to happen is that the value on the top of the stack is compared to ( or [; however, this is not possible with .NET regular expressions. This is an example of a context-free language that cannot be recognized by a simple counter.
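For comparison, here is a sketch of the check written by hand with a real stack, where comparing the top of the stack to the closing delimiter is trivial. BracketChecker and IsBalanced are hypothetical names; this is not a regular expression and is shown only to make clear which operation is missing.

```csharp
using System.Collections.Generic;

public static class BracketChecker
{
    // Returns true when ( ) and [ ] are properly nested together.
    public static bool IsBalanced(string input)
    {
        Stack<char> open = new Stack<char>();
        foreach (char c in input)
        {
            switch (c)
            {
                case '(':
                case '[':
                    // Remember which delimiter was opened, not just how many.
                    open.Push(c);
                    break;
                case ')':
                    // The value on top of the stack must be the matching (.
                    if (open.Count == 0 || open.Pop() != '(') return false;
                    break;
                case ']':
                    // The value on top of the stack must be the matching [.
                    if (open.Count == 0 || open.Pop() != '[') return false;
                    break;
                // Any other character is ignored.
            }
        }
        return open.Count == 0;
    }
}
```

This accepts ([]) but rejects both ([)] and [(()]), the two examples that separate counters per delimiter would let through.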

Conclusion

Hopefully this article has provided a better understanding of how and why the .NET regular expression engine can recognize individually properly balanced constructs.

Mounting ISO files as non-admin

Well, a long time ago I posted about using Virtual CDRom Control Panel to create a Virtual CDRom Drive (i.e. mount ISO files). I use this quite often; however, now that I have switched to being a non-administrator on my system I can't use it anymore. I keep getting the error message "Cannot Open SCM : Access is denied". I tried to figure out how to set up the correct permissions and got as far as attempting to change the Security Descriptor via the sc command, but no luck. Eventually I gave up and looked for another way to mount ISO files.

Currently I have installed Daemon Tools and it allows me to mount ISO files properly. Like most other stuff you do need to install Daemon Tools as an Administrator but afterwards it works properly for normal users.

Updated version of CodeHTMLer

I just finished updating CodeHTMLer. I fixed some bugs and also added a couple more options that will hopefully help make it more useful.

I also added a forum where people can post bugs. They can also request or post language definitions.

If you post code on the web and you would like to make it more colorful then check it out and of course if you use PostXING it is integrated right into the client for easy code colorization for your blog.

Running as non-admin

After setting up my new domain recently I decided I would finally try to run as non-admin. Currently I'm a normal user.

There are, however, times when you need to run as admin. To run applications you could use the MakeMeAdmin utility. However, I prefer to use Run++. It allows you to run an executable as another user (i.e. Administrator). You can set up a custom command and provide it the username and/or password; then whenever you run that command it will run under the given user context. Run++ accomplishes this by simply using the new options on ProcessStartInfo in the .NET Framework 2.0, which allow you to pass user credentials (underneath the hood it makes a call to CreateProcessWithLogonW with the given credentials).
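For anyone curious what those ProcessStartInfo options look like, here is a minimal sketch. It is not the actual Run++ source; RunAsSketch and BuildStartInfo are hypothetical names, and the credential properties are only supported on Windows.

```csharp
using System.Diagnostics;
using System.Security;

public static class RunAsSketch
{
    // Builds start info that runs fileName under the given user (Windows only).
    public static ProcessStartInfo BuildStartInfo(
        string fileName, string domain, string userName, string password)
    {
        // ProcessStartInfo.Password takes a SecureString, not a plain string.
        SecureString secure = new SecureString();
        foreach (char c in password)
        {
            secure.AppendChar(c);
        }
        secure.MakeReadOnly();

        ProcessStartInfo info = new ProcessStartInfo(fileName);
        info.UseShellExecute = false;   // must be false when credentials are supplied
        info.Domain = domain;
        info.UserName = userName;
        info.Password = secure;
        return info;
    }
}
```

Passing the result to Process.Start, for example Process.Start(RunAsSketch.BuildStartInfo("cmd.exe", "MYDOMAIN", "Administrator", "secret")) with made-up values, would launch the command prompt as that user.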

By doing this I don't have to type the administrator username and password every time I want to start a program as administrator. FYI, the password is encrypted using the new ProtectedData class, which uses DPAPI. Because it uses the DataProtectionScope.CurrentUser scope, the password can only be decrypted by the user that created it.
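Here is a rough sketch of how ProtectedData can be used for this. Again this is not the Run++ code; PasswordVault and its method names are hypothetical, no extra entropy is supplied, and DPAPI is only available on Windows.

```csharp
using System.Security.Cryptography;
using System.Text;

public static class PasswordVault
{
    // Encrypts the password so that only the current Windows user can decrypt it.
    public static byte[] Protect(string password)
    {
        byte[] plain = Encoding.UTF8.GetBytes(password);
        // null means no additional entropy is mixed in.
        return ProtectedData.Protect(plain, null, DataProtectionScope.CurrentUser);
    }

    // Decrypts data produced by Protect; fails for any other user.
    public static string Unprotect(byte[] cipher)
    {
        byte[] plain = ProtectedData.Unprotect(cipher, null, DataProtectionScope.CurrentUser);
        return Encoding.UTF8.GetString(plain);
    }
}
```

The encrypted bytes can be stored in a settings file; if a different user tries to decrypt them, Unprotect throws a CryptographicException.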

Currently I only have a command prompt that is set up to run as administrator; that way I can just do any other admin work I need from there.


Run++ 0.7.7.0 Released

I have made some bug fixes and added a couple new features to Run++.

Version 0.7.7.0
* Fixed problem with dialogs not coming to the front because they were launched from a background thread
* Fixed problem with holding CTRL down to edit command
* Made <domain>\<username> format work for the runas create process
* Fixed problem with dropdown not closing when command is run
* Made the import/export of the command list run on background workers
* Other minor bug fixes

If you have not yet looked at Run++ please have a look at http://puzzleware.net/runpp/.

Download available here or here.

Please let me know of any problems or feature requests in the Run++ discussion forum.


Re-associating Compressed Folders with .zip files

Once upon a time I installed WinZip and so the .zip file association got switched to WinZip. I really like selecting files and then right clicking and selecting Send To > Compressed (zipped) Folder in XP/2003. However you cannot do that unless Compressed Folder is the program associated with .zip files. It will usually ask you "Do you want to designate Compressed (zipped) Folders as the application for handling ZIP files?" and I always click yes but it never re-associates. Anyways I found a registry fix on Doug Knox's page Windows XP File Association Fixes. If you import the ZIP Folder Association Fix into the registry it will restore the default .zip file association. Now I can use the Send To > Compressed Folder again. Thanks Doug.

On a side note at one point the Compressed (zipped) Folder was not even an option in the Send To menu. So I tried to figure out how to put it back. I looked up and down through the registry only to figure out later that the Send To items are stored as shortcuts in a Send To hidden system folder in your profile. If you need to add it back the easiest way is to copy the Compressed (zipped) Folder shortcut from another profile into your profile.

Creative Commons License

Today my advisor pointed me to Creative Commons to create a license for my thesis work. I never knew about this. You can very easily create licenses for anything by answering a few questions. It gives you an icon (like the one below) to put on your website or text to put into text files (e.g. source code files). I hate trying to figure out licensing stuff, so this will make it very easy to license my stuff.

Creative Commons License
This work is licensed under a Creative Commons License.
