Unicode and the Copyright Symbol

Here’s how I solved the copyright character being changed to an invalid Unicode character block on source control commits.

I ran into an interesting problem with the copyright symbol recently: ©

The symbol appears in the comments of a number of source code files that I work on. And I was noticing that using Subversion‘s difference tool was complaining that my editor was producing an invalid character. The © symbol was changing to that familiar rectangle that says the character is invalid.

I’m using two editors, Notepad++ and IntelliJ Ultimate. I trust both to do the right thing.

Using Cygwin, I was able to dump the file and see what was going on.
$ od -ctx1 filename

In the original file, the © symbol was rendered using 0xA9 (ascii 169). This byte was in that magical 128-255 block, and under certain conditions, with the right code page and font loaded, it would appear as ©, though in many others it would just appear as an invalid character block.

However, if I removed the offensive byte and replaced it with the copyright character, a dump showed something more UTF-8 looking, a byte pair for the character: 0xC2 0xA9

Subversion’s tool then rendered the character properly, as well the compilers and editors working well.

Find and Replace in Word using C# .NET

Solution to how to do a global search and replace in MS-Word, including across floating text objects, in C#/.NET.

Heads up, this article contains high quantity of geek content. Non-geeks should move along.

I’ve been trying to use Microsoft.Office.Interop.Word to perform a global bulk search and replace operations across an entire document. The problem was, however, if a document contained a floating text box, which manifested itself as a shape object of type textbox, the find and replace wouldn’t substitute the text for that region. Even using Word’s capability to record a macro and show the VBA code wasn’t helpful, as the source code in BASIC wasn’t performing the same operation as inside the Word environment.

What I wanted was a simple routine to replace text anywhere inside of a document. If you Google for this you’ll get the wrong kind of textbox, the wrong language, people telling you not to use floating textboxes, and all kinds of weird story iterators.

One site seemed to have the solution; many kind thanks to Doug Robbins, Greg Maxey, Peter Hewett, and Jonathan West for coming up with this solution and explaining it so well.

However, the solution was in Visual Basic for Applications, and I needed a C# solution for a .NET project. Here’s my port, which works with Office 2010 and Visual Studio 2010 C#/.NET 4.0. I’ve left a lot of redundant qualifiers and casting on to help people searching for this article.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using Microsoft.Office.Interop.Word;

// BEGIN: Somewhere in your code
Application app = null;
Document doc = null;
try
{
  app = new Microsoft.Office.Interop.Word.Application();

  doc = app.Documents.Open(filename, Missing, Missing, Missing, Missing, Missing, Missing, Missing, Missing, Missing);

  FindReplaceAnywhere(app, find_text, replace_text);

  doc.SaveAs(outfilename, Missing, Missing, Missing, Missing, Missing, Missing, Missing, Missing, Missing);
}
finally
{
  try
  {
      if (doc != null) ((Microsoft.Office.Interop.Word._Document) doc).Close(true, Missing, Missing);
  }
  finally { }
  if (app != null) ((Microsoft.Office.Interop.Word._Application) app).Quit(true, Missing, Missing);
}
// END: Somewhere in your code             



// Helper
private static void searchAndReplaceInStory(Microsoft.Office.Interop.Word.Range rngStory, string strSearch, string strReplace)
{
    rngStory.Find.ClearFormatting();
    rngStory.Find.Replacement.ClearFormatting();
    rngStory.Find.Text = strSearch;
    rngStory.Find.Replacement.Text = strReplace;
    rngStory.Find.Wrap = WdFindWrap.wdFindContinue;

    object arg1 = Missing; // Find Pattern
    object arg2 = Missing; //MatchCase
    object arg3 = Missing; //MatchWholeWord
    object arg4 = Missing; //MatchWildcards
    object arg5 = Missing; //MatchSoundsLike
    object arg6 = Missing; //MatchAllWordForms
    object arg7 = Missing; //Forward
    object arg8 = Missing; //Wrap
    object arg9 = Missing; //Format
    object arg10 = Missing; //ReplaceWith
    object arg11 = WdReplace.wdReplaceAll; //Replace
    object arg12 = Missing; //MatchKashida
    object arg13 = Missing; //MatchDiacritics
    object arg14 = Missing; //MatchAlefHamza
    object arg15 = Missing; //MatchControl

    rngStory.Find.Execute(ref arg1, ref arg2, ref arg3, ref arg4, ref arg5, ref arg6, ref arg7, ref arg8, ref arg9, ref arg10, ref arg11, ref arg12, ref arg13, ref arg14, ref arg15);
}

// Main routine to find text and replace it,
//   var app = new Microsoft.Office.Interop.Word.Application();
public static void FindReplaceAnywhere(Microsoft.Office.Interop.Word.Application app, string findText, string replaceText)
{
    // http://forums.asp.net/p/1501791/3739871.aspx
    var doc = app.ActiveDocument;

    // Fix the skipped blank Header/Footer problem
    //    http://msdn.microsoft.com/en-us/library/aa211923(office.11).aspx
    Microsoft.Office.Interop.Word.WdStoryType lngJunk = doc.Sections[1].Headers[WdHeaderFooterIndex.wdHeaderFooterPrimary].Range.StoryType;

    // Iterate through all story types in the current document
    foreach (Microsoft.Office.Interop.Word.Range rngStory in doc.StoryRanges)
    {

        // Iterate through all linked stories
        var internalRangeStory = rngStory;

        do
        {
            searchAndReplaceInStory(internalRangeStory, findText, replaceText);

            try
            {
                //   6 , 7 , 8 , 9 , 10 , 11 -- http://msdn.microsoft.com/en-us/library/aa211923(office.11).aspx
                switch (internalRangeStory.StoryType)
                {
                    case Microsoft.Office.Interop.Word.WdStoryType.wdEvenPagesHeaderStory: // 6
                    case Microsoft.Office.Interop.Word.WdStoryType.wdPrimaryHeaderStory:   // 7
                    case Microsoft.Office.Interop.Word.WdStoryType.wdEvenPagesFooterStory: // 8
                    case Microsoft.Office.Interop.Word.WdStoryType.wdPrimaryFooterStory:   // 9
                    case Microsoft.Office.Interop.Word.WdStoryType.wdFirstPageHeaderStory: // 10
                    case Microsoft.Office.Interop.Word.WdStoryType.wdFirstPageFooterStory: // 11

                        if (internalRangeStory.ShapeRange.Count > 0)
                        {
                            foreach (Microsoft.Office.Interop.Word.Shape oShp in internalRangeStory.ShapeRange)
                            {
                                if (oShp.TextFrame.HasText != 0)
                                {
                                    searchAndReplaceInStory(oShp.TextFrame.TextRange, findText, replaceText);
                                }
                            }
                        }
                        break;

                    default:
                        break;
                }
            }
            catch
            {
                // On Error Resume Next
            }

            // ON ERROR GOTO 0 -- http://www.harding.edu/fmccown/vbnet_csharp_comparison.html

            // Get next linked story (if any)
            internalRangeStory = internalRangeStory.NextStoryRange;
        } while (internalRangeStory != null); // http://www.harding.edu/fmccown/vbnet_csharp_comparison.html
    }

}

Let me know if it worked for you; bug fixes and enhancements welcome.