Q&A - Greedy matching in regular expressions - ISerializable - Roy Osherove's Blog

Q&A - Greedy matching in regular expressions

This came in the mail, thought other folks might be interested.

Hi Roy. I need to check a line of HTML and make the value of the style attribute lowercase. I've tried to come up with a regex that will work, but I keep making the entire line of HTML lowercase instead of just the stuff in the style value. I can't get the match to end at the correct quote; instead it runs to the last quote on the line. So something like this:

[Tag style="WIDTH:20px; color:blue;" href="blah.com/PageTWO"]

I want to change to this:
[Tag style="width:20px; color:blue;" href="blah.com/PageTWO"]

But instead I get this:
[Tag style="width:20px; color:blue;" href="blah.com/pagetwo"]

Because the match ends with the end quote of the href.


If you can point me in the right direction (or have something like this lying around), I would GREATLY appreciate it.

Answer:

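It's called "greedy matching": by default a quantifier such as '*' grabs as much text as it can, so the match runs to the *last* closing quote on the line.
Try adding a "?" after the quantifier (probably '*'). That makes it lazy, so the match ends at the *first* closing quote instead.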
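
Here is a rough sketch of how that plays out on the line from the question. The email doesn't say which regex engine or language is in use, so Python is used purely for illustration, and the patterns and the helper name lowercase_style are my own assumptions, not the asker's actual code:

```python
import re

LINE = '[Tag style="WIDTH:20px; color:blue;" href="blah.com/PageTWO"]'

# Greedy: .* runs to the LAST quote on the line, so the href gets swallowed too.
GREEDY = re.compile(r'style="(.*)"')
# Lazy: .*? stops at the FIRST quote that lets the match succeed.
LAZY = re.compile(r'style="(.*?)"')

def lowercase_style(pattern, line):
    """Lowercase only the text captured by the pattern's first group."""
    return pattern.sub(lambda m: 'style="' + m.group(1).lower() + '"', line)

greedy_result = lowercase_style(GREEDY, LINE)
lazy_result = lowercase_style(LAZY, LINE)

print(greedy_result)  # href value is lowercased as well (the reported bug)
print(lazy_result)    # only the style value is lowercased
```

The only difference between the two patterns is the "?" after the '*'.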

For example, given the following string as input:
"abcdfgdrbdtargd"
The following regex (quantifiers are greedy by default) will match up to the last 'd', which here is the entire string:
(.*d)

However, this regex will find several matches, the first of which is "abcd":
(.*?d)
(you can do without the parentheses if you want).
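
If you want to see both behaviours side by side, a small sketch like this should do it (again Python just for illustration; the variable names are made up for the example, and the parentheses are left out since they don't change what is matched):

```python
import re

text = "abcdfgdrbdtargd"

# Greedy: one match that runs all the way to the last 'd'.
greedy_matches = re.findall(r".*d", text)

# Lazy: each match stops at the nearest 'd', so there are several matches.
lazy_matches = re.findall(r".*?d", text)

print(greedy_matches)  # ['abcdfgdrbdtargd']
print(lazy_matches)    # ['abcd', 'fgd', 'rbd', 'targd']
```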

I'd also suggest adding two good regex mailing lists to your arsenal instead of sending help messages to various people:
http://groups.yahoo.com/group/dotnetregex/
http://lists.aspadvice.com/SignUp/list.aspx?l=68&c=16

The people there know a whole lot more about regular expressions than I do.

 

Published Monday, January 10, 2005 7:37 PM by RoyOsherove

Comments

Monday, January 10, 2005 1:51 PM by Aaron Robinson

# re: Q&A - Greedy matching in regular expressions

For the case of matching quoted attribute values, you should also be able to specify that the value consists of any characters other than the quotation mark, so that the match cannot run past the closing quote (or into subsequent opening and closing quotes):

\"[^\"]*\"


- Aaron
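
A minimal sketch of this negated-character-class approach applied to the style attribute from the post, assuming Python and double-quoted attribute values (the style= prefix, the grouping, and the variable names are illustrative additions, not part of the comment above):

```python
import re

line = '[Tag style="WIDTH:20px; color:blue;" href="blah.com/PageTWO"]'

# [^"]* can never cross a quotation mark, so each match ends at its own closing quote.
quoted_values = re.findall(r'"[^"]*"', line)

# Same idea, anchored to the style attribute so only its value is lowercased.
style_pattern = re.compile(r'(style=")([^"]*)(")')
result = style_pattern.sub(
    lambda m: m.group(1) + m.group(2).lower() + m.group(3), line
)

print(quoted_values)  # ['"WIDTH:20px; color:blue;"', '"blah.com/PageTWO"']
print(result)         # only the style value is lowercased
```

Unlike the lazy quantifier, this version does not rely on backtracking to find the closing quote, though it would need adjusting for single-quoted values.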
