I ran into an interesting problem with the copyright symbol recently: ©
The symbol appears in the comments of a number of source code files that I work on, and I noticed that Subversion’s difference tool was complaining that my editor was producing an invalid character. The © symbol was changing to that familiar rectangle that says the character is invalid.
I’m using two editors, Notepad++ and IntelliJ Ultimate. I trust both to do the right thing.
Using Cygwin, I was able to dump the file and see what was going on.
$ od -ctx1 filename
In the original file, the © symbol was stored as the single byte 0xA9 (decimal 169). That value is not ASCII at all, since ASCII stops at 127; it sits in that magical 128-255 block whose meaning depends on the code page. With the right code page (ISO-8859-1 or Windows-1252, for example) and font loaded, it would appear as ©, though under many others it would just appear as an invalid character block.
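To make the code page dependence concrete, here is a small Python sketch (Python is my choice for illustration, not something used in the original investigation). It decodes the lone byte 0xA9 under a few encodings; the helper name `decode_or_none` is hypothetical.

```python
# The single byte found in the original file's dump.
raw = b"\xa9"

def decode_or_none(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

latin1 = decode_or_none(raw, "latin-1")  # ISO-8859-1
cp1252 = decode_or_none(raw, "cp1252")   # Windows-1252
cp437 = decode_or_none(raw, "cp437")     # old DOS code page
ascii_ = decode_or_none(raw, "ascii")    # 7-bit ASCII has no byte 169
utf8 = decode_or_none(raw, "utf-8")      # a lone 0xA9 is not valid UTF-8

for name, value in [("latin-1", latin1), ("cp1252", cp1252),
                    ("cp437", cp437), ("ascii", ascii_), ("utf-8", utf8)]:
    print(name, repr(value))
```

The two Western code pages give ©, the DOS code page gives a different character, and both ASCII and UTF-8 reject the byte outright. A tool that assumes UTF-8 has nothing sensible to draw for it, which would explain the rectangle.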
However, if I removed the offending byte and retyped the copyright character, a dump showed something more UTF-8 looking, a byte pair for the character: 0xC2 0xA9. That pair is exactly how UTF-8 encodes the code point U+00A9.
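The same repair can be sketched in Python instead of retyping the character by hand. This is a minimal sketch, assuming the original bytes really were ISO-8859-1 (Latin-1); the function name `latin1_to_utf8` and the sample comment line are made up for illustration.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Interpret the bytes as Latin-1 text, then re-encode that text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")

# Hypothetical source comment containing the lone 0xA9 byte.
original = b"// Copyright \xa9 Example"
converted = latin1_to_utf8(original)

# Show the bytes in hex, similar to what od -tx1 prints.
print(original.hex(" "))
print(converted.hex(" "))
```

The only difference between the two hex lines is that `a9` becomes `c2 a9`. Note that this conversion is not safe to run twice: applied to a file that is already UTF-8, it would treat 0xC2 and 0xA9 as two separate Latin-1 characters and garble the symbol.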
Subversion’s tool then rendered the character properly, and the compilers and editors handled the file without trouble.