Unicode and the Copyright Symbol

Here’s how I solved the copyright character being changed to an invalid Unicode character block on source control commits.

I ran into an interesting problem with the copyright symbol recently: ©

The symbol appears in the comments of a number of source code files that I work on, and I noticed Subversion’s difference tool complaining that my editor was producing an invalid character. The © symbol was changing to that familiar rectangle that says the character is invalid.

I’m using two editors, Notepad++ and IntelliJ Ultimate. I trust both to do the right thing.

Using Cygwin, I was able to dump the file and see what was going on.
$ od -ctx1 filename

In the original file, the © symbol was stored as the single byte 0xA9 (decimal 169). That is not ASCII, which stops at 127. The byte sits in that magical 128-255 block, and under certain conditions, with the right code page and font loaded, it would appear as ©, though in many others it would just appear as an invalid character block.
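To see why the same byte shows up as © in one place and as a box in another, here is a small Python sketch. The three code pages are just ones I picked for illustration, not necessarily what any of my tools were using.

```python
# The lone byte that od showed in the original file.
raw = b"\xa9"

# Single-byte code pages each map 0xA9 to a character of their own choosing.
decoded = {codec: raw.decode(codec) for codec in ("latin-1", "cp1252", "cp437")}

# A strict UTF-8 decoder rejects the byte: 0xA9 is a continuation byte
# and cannot stand on its own as a character.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# A lenient decoder substitutes U+FFFD, the replacement character,
# which many tools draw as a box or a question mark.
replaced = raw.decode("utf-8", errors="replace")
```

Latin-1 and Windows-1252 both give ©, the old DOS code page 437 gives something else entirely, and UTF-8 refuses the byte outright.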

However, if I removed the offending byte and replaced it with the copyright character, a dump showed something more UTF-8 looking, a byte pair for the character: 0xC2 0xA9.
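I made the fix by hand in the editor, but the same repair can be sketched in Python. The helper name `to_utf8` and the Latin-1 fallback are my own assumptions here. The fallback only makes sense if the legacy bytes really were written in that code page.

```python
def to_utf8(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Return raw unchanged if it is already valid UTF-8; otherwise
    reinterpret it in the assumed legacy code page and re-encode it."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(fallback).encode("utf-8")


# A comment line as it might have looked in the original file.
legacy = b"// Copyright \xa9 2009"
fixed = to_utf8(legacy)
```

Here `fixed` carries the 0xC2 0xA9 pair in place of the lone 0xA9, and running the helper a second time leaves it alone. A file that mixes valid UTF-8 with stray legacy bytes would not survive this treatment, since the whole thing gets reinterpreted as Latin-1, so I would check the result with a dump before committing.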

Subversion’s tool then rendered the character properly, and the compilers and editors were happy with it as well.

Mysterious Copyright

This is clearly one of those things I did to myself as a good idea, then forgot about, only to be plagued by it later.

I noticed that all of the photographs from my camera were reporting a copyright with a 2009 year inside the EXIF data.

I’d been unable to figure out where it was coming from, resorting to exiftool to remove it.
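For the curious, the removal amounts to handing exiftool an empty value for the Copyright tag. Here is a rough Python wrapper around that. The function names and the file name in the example are hypothetical, and it assumes exiftool is on the PATH.

```python
import subprocess


def clear_copyright_command(photo_path: str) -> list[str]:
    # Assigning an empty value ("-Copyright=") asks exiftool to delete the tag.
    return ["exiftool", "-Copyright=", photo_path]


def clear_copyright(photo_path: str) -> None:
    # check=True raises CalledProcessError if exiftool reports a failure.
    subprocess.run(clear_copyright_command(photo_path), check=True)
```

Calling `clear_copyright("IMG_0001.JPG")` would strip the tag from that one file. By default exiftool keeps a backup copy next to the photo with `_original` appended to the name.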

My natural thought was that perhaps it was some preference in a photo editing tool or a geospatial locator tool. But, no. Turns out I did it to myself.

The Canon EOS Utility has a nifty ability to include a value for the Copyright tag. And about a year ago when I tethered it to the computer, I must have noticed this and set it to some precanned value that includes the year.

It looked something like this:
Copyright (c) 2009 by Walt Stoneburner, All Rights Reserved.

And ever since then, my photos were stamped with that value. Which was fine, back in 2009.

Fixing the problem was as simple as tethering the camera again and firing up Canon EOS Utility. It also gave me an opportunity to update the firmware.

Strange copyright EXIF data: mystery solved.