Recently I had the need to do a lot of text processing, where I was taking text from a number of sources, munging it, and creating a new source. Obviously Perl was the correct way to do this.
Unfortunately, the data sources I was reading were a mish-mash of stuff, and my Perl code had to sort things back out into neat piles so that quick, efficient bulk loads into a database could be done. For the curious, I was taking a pile of INSERT statements that took forever to run and turning them into a bulk load into Postgres with the COPY command.
My first attempt was to use Perl’s string concatenation operator:
$string .= $more_stuff_to_append;
The problem was that an awful lot of string manipulation was going on, and the poor string pool was being beaten to death.
My next thought was to use Perl’s push function, but I discovered that this was faster:
$array[++$#array] = $stuff_to_append;
This takes the index of the last element of the existing array ($#array), increments it (which grows the array by one slot), and then assigns the new element to that slot at the end.
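As a quick check of what that line does, here is a minimal sketch showing that assigning to the incremented last index leaves the array in the same state as push would. The variable names and sample values are made up for illustration, and the sketch makes no claim about which form is faster on any given Perl build.

```perl
use strict;
use warnings;

# Two identical starting arrays (hypothetical sample values).
my @by_index = ('a', 'b');
my @by_push  = ('a', 'b');

# $#by_index is the index of the last element (1 here, not the length 2).
# Pre-incrementing it grows the array by one slot, and the assignment
# then fills that new last slot.
$by_index[++$#by_index] = 'c';

# The conventional way to append the same value.
push @by_push, 'c';

print "by index: @by_index\n";
print "by push:  @by_push\n";
print "last index: ", $#by_index, ", length: ", scalar(@by_index), "\n";
```

Both arrays end up holding the same three elements, so the choice between the two forms comes down to speed and readability.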
But I got an interesting surprise when I went to print it in a here-document.
print <<EOT;
blah blah blah
@array
blah blah blah
EOT
This did something I wasn’t expecting. The first element looked fine, but the following elements were all preceded with white space. And that white space was confusing the database bulk load operation.
Example:
Thing1
 Thing2
 Thing3
That’s when it struck me. A here-document with an unquoted terminator (like EOT above) interpolates variables just like a double-quoted string.
With Perl, doing this:
print "@array";
…will print the elements of the array separated by the value of the list separator variable $", which defaults to a single space.
But, doing this without the quotes:
print @array;
…will print each element one right after the other with no separator (as long as the output field separator $, is unset, which is the default).
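The difference is easy to see side by side. The following is a small sketch with made-up sample data, where each element ends in a newline, as lines destined for a bulk load would. It builds the two outputs as strings so the separators are visible, and it assumes $" and $, still have their default values.

```perl
use strict;
use warnings;

# Hypothetical sample data: each element already ends with a newline.
my @array = ("Thing1\n", "Thing2\n", "Thing3\n");

# Interpolation in double quotes (or a here-document) joins the elements
# with $", the list separator, which defaults to a single space.
my $quoted = "@array";

# print with a bare list writes the elements back to back when $, (the
# output field separator) is unset. join with an empty string produces
# the same text.
my $unquoted = join '', @array;

print $quoted;     # Thing2 and Thing3 each start with a space
print "--\n";
print $unquoted;   # every line starts in the first column
```

Because each element ends in a newline, the separator space lands at the start of the next line, which is exactly the stray white space that upset the bulk load.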
Simply moving the array outside the here document resolved the problem.
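Here is one way that fix might look, along with an alternative that keeps the array inside the here-document. This is a sketch with placeholder text and sample data rather than the original script. The alternative relies on localizing $" to an empty string, so the change to the separator does not leak into the rest of the program.

```perl
use strict;
use warnings;

# Hypothetical sample data: each element already ends with a newline.
my @array = ("Thing1\n", "Thing2\n", "Thing3\n");

# Fix described above: keep the array out of the here-document so it is
# never interpolated, and print it as a bare list in between.
my $header = <<EOT;
blah blah blah
EOT
my $footer = <<EOT;
blah blah blah
EOT
print $header, @array, $footer;

# Alternative: leave the array inside the here-document, but set the list
# separator $" to an empty string for this block only.
my $inline;
{
    local $" = '';
    # The braces in @{array} only mark where the variable name ends,
    # since the next text follows immediately (each element already
    # supplies its own newline).
    $inline = <<EOT;
blah blah blah
@{array}blah blah blah
EOT
}
print $inline;
```

Both versions produce the same text, with no leading space on any line.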