Automated Search and Replace in Word 2007 documents with C#
I worked on an interesting problem last night and thought I'd post the code. I'm working on a software conversion project which has a new requirements/use case structure, and I had a list of about 700 requirement numbers that each needed to be replaced with a new requirement number, throughout 20 Word documents that averaged 20 pages apiece.
Going through each document and doing 700 "Replace Alls" didn't sound like much fun, and there are lots more documents and requirements coming down the pike that will need this same operation done to them, so I embarked on a VSTO expedition.
I created a console app in Visual Studio to run the code, and the first thing I noticed is that the Office 12 (Office 2007) Primary Interop Assemblies were not registered on my PC. A quick search came up with this Microsoft download that lets you install these to your GAC with an MSI.
Next, I found a great VB.Net code snippet in a Microsoft forum (it's the second post in the thread, from "Spotty") that gives the basic code needed to do this for a single file.
If you are going to do a lot of interop work, it may be worthwhile to use VB.Net; its support for optional parameters saves a lot of time. Here is Spotty's code, followed by my initial conversion to C#.
Spotty's original VB.Net code:
    Dim word As New Microsoft.Office.Interop.Word.Application
    Dim doc As Microsoft.Office.Interop.Word.Document
    Try
        doc = word.Documents.Open("c:\test.doc")
        doc.Activate()
        Dim myStoryRange As Microsoft.Office.Interop.Word.Range
        For Each myStoryRange In doc.StoryRanges
            With myStoryRange.Find
                .Text = "findme"
                .Replacement.Text = "findyou"
                .Wrap = Microsoft.Office.Interop.Word.WdFindWrap.wdFindContinue
                .Execute(Replace:=Microsoft.Office.Interop.Word.WdReplace.wdReplaceAll)
            End With
        Next myStoryRange
        doc.SaveAs("c:\test1.doc")
    Catch ex As Exception
        MessageBox.Show("Error accessing Word document.")
    End Try
My conversion to C#:
(note: add a reference to Microsoft.Office.Interop.Word (version 12) and the using directive below)
using Word = Microsoft.Office.Interop.Word;
    public static void DoSearchAndReplaceInWord()
    {
        // Create the Word application and declare a document.
        // The document starts as null; Documents.Open returns the instance,
        // so there is no need to construct a Word.Document here.
        Word.Application word = new Word.Application();
        Word.Document doc = null;

        // Define an object to pass to the API for missing parameters
        object missing = System.Type.Missing;

        // Passed to Close/Quit so Word never prompts to save on the way out
        object doNotSave = Word.WdSaveOptions.wdDoNotSaveChanges;

        try
        {
            // Everything that goes to the interop must be an object
            object fileName = @"C:\myDocument.doc";

            // Open the Word document.
            // Pass the "missing" object defined above to all optional
            // parameters. All parameters must be of type object,
            // and passed by reference.
            doc = word.Documents.Open(ref fileName,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing);

            // Activate the document
            doc.Activate();

            // Loop through the StoryRanges (sections of the Word doc)
            foreach (Word.Range tmpRange in doc.StoryRanges)
            {
                // Set the text to find and replace
                tmpRange.Find.Text = "findme";
                tmpRange.Find.Replacement.Text = "findyou";

                // Set the Find.Wrap property to continue (so it doesn't
                // prompt the user or stop when it hits the end of
                // the section)
                tmpRange.Find.Wrap = Word.WdFindWrap.wdFindContinue;

                // Declare an object to pass as a parameter that sets
                // the Replace parameter to the "wdReplaceAll" enum
                object replaceAll = Word.WdReplace.wdReplaceAll;

                // Execute the Find and Replace -- notice that the
                // 11th parameter is the "replaceAll" enum object
                tmpRange.Find.Execute(ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref replaceAll,
                    ref missing, ref missing, ref missing, ref missing);
            }

            // Save the changes
            doc.Save();
        }
        finally
        {
            // Close the doc and exit the app, whether or not an error occurred.
            // On success the changes are already saved; on failure they are discarded.
            if (doc != null)
            {
                doc.Close(ref doNotSave, ref missing, ref missing);
            }
            word.Application.Quit(ref doNotSave, ref missing, ref missing);
        }
    }
After this was up and running, setting up the data reader and looping through the directory to operate on all files was pretty straightforward. The biggest tricks were declaring the "missing" object variable for Type.Missing, and adding the code to close the doc and exit the application.
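The directory loop could look roughly like the sketch below. It is not the code I actually ran: it assumes the old and new requirement numbers have already been loaded into a dictionary (the data reader step is left out), and the names BatchReplace, ReplaceAllInFolder, folder, and replacements are made up for illustration. It opens Word once, applies every pair to every story range of each .doc file, and always closes the document and quits Word.

```csharp
using System.Collections.Generic;
using System.IO;
using Word = Microsoft.Office.Interop.Word;

public static class BatchReplace
{
    // Hypothetical helper: applies every old/new pair to each .doc file in a folder.
    public static void ReplaceAllInFolder(string folder,
                                          IDictionary<string, string> replacements)
    {
        object missing = System.Type.Missing;
        object replaceAll = Word.WdReplace.wdReplaceAll;
        object doNotSave = Word.WdSaveOptions.wdDoNotSaveChanges;

        // Start Word once and reuse it for every file
        Word.Application word = new Word.Application();
        try
        {
            // Assumption: all documents sit directly in one folder.
            // On .NET Framework, "*.doc" can also match ".docx" files.
            foreach (string path in Directory.GetFiles(folder, "*.doc"))
            {
                object fileName = path;
                Word.Document doc = word.Documents.Open(ref fileName,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing, ref missing);
                try
                {
                    foreach (KeyValuePair<string, string> pair in replacements)
                    {
                        foreach (Word.Range story in doc.StoryRanges)
                        {
                            Word.Find find = story.Find;
                            find.Text = pair.Key;
                            find.Replacement.Text = pair.Value;
                            find.Wrap = Word.WdFindWrap.wdFindContinue;

                            // Replace is the 11th parameter of Find.Execute
                            find.Execute(ref missing, ref missing, ref missing,
                                ref missing, ref missing, ref missing, ref missing,
                                ref missing, ref missing, ref missing, ref replaceAll,
                                ref missing, ref missing, ref missing, ref missing);
                        }
                    }

                    // Save only after every pair has been applied
                    doc.Save();
                }
                finally
                {
                    // Already saved on success; discard changes on failure
                    doc.Close(ref doNotSave, ref missing, ref missing);
                }
            }
        }
        finally
        {
            word.Application.Quit(ref doNotSave, ref missing, ref missing);
        }
    }
}
```

One thing to watch with a list like this: a plain text match on a short number can also hit the start of a longer one, and a new number that equals some other old number can get replaced twice, so the order of the pairs may matter. Test on copies of the documents first.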
If you set up a VSTO project, you get the "missing" object declared as a global variable, so you don't need to declare it. But for stand-alone Word interop, I think this is pretty clean.
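For comparison, C# 4.0 and later support named and optional arguments and allow "ref" to be omitted on COM interop calls, which removes most of the "missing" placeholders. The sketch below assumes a C# 4.0 or later compiler and the same Word interop reference; the class name, method name, and parameters are hypothetical, and I have not run it against Word 2007.

```csharp
using Word = Microsoft.Office.Interop.Word;

public static class ShortForm
{
    // Hypothetical variant of the method above, using named arguments
    public static void SearchAndReplace(string path, string findText, string replaceText)
    {
        Word.Application word = new Word.Application();
        Word.Document doc = null;
        try
        {
            // Optional parameters are simply left out
            doc = word.Documents.Open(path);

            foreach (Word.Range story in doc.StoryRanges)
            {
                // Only the parameters that matter are named
                story.Find.Execute(
                    FindText: findText,
                    Wrap: Word.WdFindWrap.wdFindContinue,
                    ReplaceWith: replaceText,
                    Replace: Word.WdReplace.wdReplaceAll);
            }

            doc.Save();
        }
        finally
        {
            // Close the doc and exit the app without prompting
            if (doc != null)
            {
                doc.Close(SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
            }
            word.Application.Quit(SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
        }
    }
}
```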