A simple PowerShell script to find and replace using regular expressions in multiple files

One thing I need from time to time is the ability to perform a find-and-replace operation across multiple files using regular expressions. When this happens, I usually rely on Visual Studio's built-in support for it; soon, however, I have to give up and blame my favorite IDE for not adhering to the regular expression syntax of the .NET Framework, which is the one I'm used to.
So today, after my umpteenth unsuccessful attempt with Visual Studio, I resolved to write a simple PowerShell script stub that would act as a starting point for doing this job from now on. No, this is by no means a complete grep-like tool; I'd like it to be just a demonstration of how easy, powerful and "clean" PowerShell scripts like this one can be. And yes, I know there are plenty of third-party tools that do this kind of thing...

To get into the specifics of my problem: I was trying to combine a set of HTML files, obtained from a CHM-to-HTML conversion, into a single one. Since the images in these documents are just thumbnails wrapped in a hyperlink that lets the user click through to the full-size image, I wanted a few regular expression substitutions to embed the full-size image directly into the document, remove the thumbnails, and strip the header and footer of each individual HTML file before combining them into the target one.
Since PowerShell is built on the .NET Framework, we can naturally use our beloved Regex class (System.Text.RegularExpressions.Regex) to perform the substitutions, and thus stick with the syntax we are accustomed to.
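To see how the .NET regex syntax and PowerShell's own string syntax meet, here is a minimal sketch run against a made-up anchor tag (the sample markup is an assumption, not taken from the converted files). It uses the [regex] type accelerator, which is just PowerShell's shorthand for System.Text.RegularExpressions.Regex, and a named group whose captured value is reused through the .NET ${name} substitution syntax:

```powershell
# Made-up sample: a thumbnail wrapped in a link to the full-size image
$html = '<a href="images/full.png"><img src="images/thumb.png" /></a>'

# [regex] is PowerShell's type accelerator for System.Text.RegularExpressions.Regex;
# the named group "Url" captures the href value
$rx = [regex] '<a href="(?<Url>[^"]*)"[^>]*>.*?</a>'

# Single quotes stop PowerShell from expanding ${Url} as one of its own variables,
# so the .NET substitution syntax reaches Regex.Replace untouched
$result = $rx.Replace($html, '<img src="${Url}" />')

$result   # outputs: <img src="images/full.png" />
```

Single quotes are the simplest way to protect ${Url}; inside a double-quoted string the $ has to be escaped with a backtick instead, which is what the script below does.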

Here's the code that solved my problem today; feel free to use it as a starting point for similar tasks:

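# Figure thumbnails: <span class="figure"><a href="full-size URL">...</a> becomes <img src="full-size URL" />
$rxFigure = New-Object System.Text.RegularExpressions.Regex "(?:\<span.class=""figure""\>)\<a.href=""(?<Url>.*?)"".*?\>\</a>"
# Header: from <html> through the first </table> (Singleline lets . match newlines too)
$rxHeader = New-Object System.Text.RegularExpressions.Regex "<html>.*?</table>", Singleline
# Footer: from the first remaining <table through </html>
$rxFooter = New-Object System.Text.RegularExpressions.Regex "<table.*?</html>", Singleline
# The combined file is written to the parent folder; Get-Location is used because .NET
# resolves relative paths against the process directory, not the current PowerShell location
$combinedOutput = [System.IO.File]::CreateText([System.IO.Path]::Combine((Get-Location).Path, "..\Combined.htm"))

try {
    Get-Item "*.html" | ForEach-Object {
        $text = [System.IO.File]::ReadAllText($_.FullName)
        # The backtick keeps PowerShell from expanding ${Url}, leaving it for Regex.Replace
        $text = $rxFigure.Replace($text, "<img src=""`${Url}"" />")
        $text = $rxHeader.Replace($text, "")
        # Each footer becomes a page break between the combined documents
        $text = $rxFooter.Replace($text, "<br style=""page-break-after: always;"" />")

        $combinedOutput.Write($text)
    }
}
finally {
    # Flush and close the output file even if reading or replacing fails
    $combinedOutput.Close()
}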
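Since the script above is tied to one specific conversion job, here is a sketch of a more general variant: it applies a single substitution to every file matching a wildcard and rewrites each changed file in place, rather than combining them. Invoke-RegexReplace and its parameters are hypothetical names chosen for this example; like the original, it relies only on the .NET Regex and File classes.

```powershell
# Hypothetical helper (not part of any module): applies one .NET regex
# substitution to every file matching $Path and rewrites changed files in place.
function Invoke-RegexReplace {
    param(
        [string] $Path,                 # wildcard path, e.g. "*.html"
        [string] $Pattern,              # .NET regular expression
        [string] $Replacement,          # may use ${name} or $1 substitutions
        [System.Text.RegularExpressions.RegexOptions] $Options = 'None'
    )

    $rx = New-Object System.Text.RegularExpressions.Regex $Pattern, $Options

    # Skip directories that happen to match the wildcard
    Get-ChildItem -Path $Path | Where-Object { -not $_.PSIsContainer } | ForEach-Object {
        $text = [System.IO.File]::ReadAllText($_.FullName)
        $newText = $rx.Replace($text, $Replacement)

        # -cne compares case-sensitively, so case-only edits are not skipped
        if ($newText -cne $text) {
            # Note: WriteAllText writes UTF-8 without a BOM, which may differ
            # from the file's original encoding
            [System.IO.File]::WriteAllText($_.FullName, $newText)
            $_.FullName   # emit the path of each rewritten file
        }
    }
}

# Example (commented out so the block has no side effects when pasted):
# Invoke-RegexReplace -Path "*.html" -Pattern '<br\s*>' -Replacement '<br />' -Options IgnoreCase
```

Writing a file only when its text actually changed leaves untouched files (and their timestamps) alone, and the returned list of paths shows at a glance which files were modified.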

Comments

  • Thanks! Very helpful!

  • @DoG: Yeah, I must admit that wasn't the right thing to start from... But hey, it was back in 2007 and I've developed my knowledge since then, really! ;)

    Check out my Italian PowerShell community website at http://www.powershell.it
