RegularExpressionValidator woes and the semantics of dot
There was a somewhat lengthy discussion on the Win Tech Off Topic list regarding how to use a regular expression to limit the length of a string. I had recommended the following expression:
^.{0,n}$
Where 'n' is the upper bound; 500 in his case. Most regular expression implementations let you change the semantics of the '.' metacharacter, which normally matches anything except a newline, so that it matches newlines as well. In .NET, you simply pass RegexOptions.Singleline as the options. Most other languages that have regex literals built into the language itself (e.g. Perl, JavaScript, ...) let you specify such modifiers after the closing delimiter of the regex literal:
/^.{0,n}$/s
The 's' modifier in the above expression is equivalent to RegexOptions.Singleline in .NET. With it, the expression matches up to 'n' characters, newlines included, so the input can span multiple lines.
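To make the difference concrete, here is a minimal sketch in JavaScript. It assumes an engine that supports the 's' flag (ECMAScript 2018 or later), and the limit of 10 and the sample strings are made up for illustration.

```javascript
// Assumed limit for illustration; the post uses 500.
const n = 10;

// Same pattern, built without and with the 's' (dot matches newline) flag.
const withoutS = new RegExp(`^.{0,${n}}$`);
const withS = new RegExp(`^.{0,${n}}$`, "s");

const singleLine = "hello";     // 5 characters, no newline
const multiLine = "ab\ncd";     // 5 characters, one of them a newline
const tooLong = "abcdefghijk";  // 11 characters

console.log(withoutS.test(singleLine)); // true
console.log(withoutS.test(multiLine));  // false: '.' stops at the newline
console.log(withS.test(multiLine));     // true: '.' now matches the newline
console.log(withS.test(tooLong));       // false: over the limit
```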
I didn't realize at the beginning of the discussion that he was using a RegularExpressionValidator control in ASP.NET, and that causes an issue with my recommended expression. If the validator is used with EnableClientScript set to true, the validation code has to be rendered as client-side JavaScript. This means that one can't apply RegexOptions, and it also means you can't use the modifiers of a regex literal as noted above, because the expression is taken directly from the ValidationExpression attribute of the control, which is a plain string. The relevant code in the generated WebUIValidation.js file looks like the following:
var rx = new RegExp(val.validationexpression);
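JavaScript's RegExp constructor does accept a second argument for modifiers, but unfortunately it appeared to support only 'g', 'i' and 'm' (global, ignore case, and multiline, the last of which changes '^' and '$' rather than '.'). The 's' modifier only arrived with ECMAScript 2018, and in any case the validator script calls the one-argument form shown above, so there is nowhere to pass a modifier.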
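The effect can be sketched as follows. The isValid function is a hypothetical stand-in for the validator's client-side check; only the one-argument RegExp call is taken from the line quoted above, and the limit of 10 is an assumption for illustration.

```javascript
// Hypothetical stand-in for the validator's client-side check; only the
// one-argument RegExp call comes from the WebUIValidation.js line quoted above.
function isValid(validationExpression, value) {
  var rx = new RegExp(validationExpression); // no flags argument
  return rx.test(value);
}

// Assumed limit of 10 for illustration; the post uses 500.
var plain = "^.{0,10}$";
// Writing the literal syntax into the string does not help: the slashes
// and the trailing 's' become part of the pattern itself.
var literalStyle = "/^.{0,10}$/s";

console.log(isValid(plain, "hello"));        // true
console.log(isValid(plain, "ab\ncd"));       // false: newline is not matched
console.log(isValid(literalStyle, "hello")); // false: pattern expects a literal '/'
```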
Brad Wilson suggested creating a new validator, which is certainly an option; however, Chris Frazier pointed us to the LengthValidator controls. Those certainly solve the problem, but for those who are curious, you could use the following expression when you haven't the option to change the semantics of the '.' metacharacter:
^(.|\s){0,n}$
An inefficient expression indeed, but sometimes you haven't an option (pun intended). A character class might be more efficient here than alternation, but the obvious candidate, [.\s], doesn't work: the '.' metacharacter is no longer a metacharacter in the context of a character class, so it matches only a literal dot. Confusing enough?
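Here is a small sketch of the three variants in JavaScript, none of which needs a modifier. The [\s\S] class is not from the discussion; it is one common idiom for "any character including newline" and is included here as an assumption about what a working character class could look like. The limit of 10 is again made up.

```javascript
// Assumed limit for illustration; the post uses 500.
var n = 10;

// Alternation: any non-newline character, or any whitespace (covers newlines).
var alternation = new RegExp("^(.|\\s){0," + n + "}$");
// Character class idiom: whitespace or non-whitespace, i.e. any character.
var anyChar = new RegExp("^[\\s\\S]{0," + n + "}$");
// Inside a class the dot is literal: only '.' and whitespace are accepted.
var literalDot = new RegExp("^[.\\s]{0," + n + "}$");

var multiLine = "ab\ncd";

console.log(alternation.test(multiLine));    // true
console.log(anyChar.test(multiLine));        // true
console.log(literalDot.test(multiLine));     // false: letters are not allowed
console.log(literalDot.test("..\n.."));      // true: only dots and whitespace
console.log(anyChar.test("abcdefghijk"));    // false: 11 characters
```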
This post was rather anticlimactic and longer than intended, especially considering this was in response to a mailing list post and not something I needed to do personally. Hopefully someone can benefit from the above information.