Tuesday night David Lowe gave a very interesting talk at SF.pm on pack/unpack and some of the awful things you can do with them. We ended the meeting talking about whether you could use the pack format “P” (which packs and unpacks “a pointer to a structure (fixed-length string)”) to force poor Perl to do C-like pointer arithmetic. David is using unpack to do a binary search of fixed-width blobs of data in order to avoid unserializing it. His current (minor) bottleneck is creating the pack format string dynamically for each step in the binary search (i.e., 'x' . ($record_size * $record + 1)). The math is fast, the string concatenation is relatively slow. I wondered if you could use the “P” format to avoid creating the format string on each pass and stick with simple integer arithmetic.
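To make the setup concrete, here is a minimal sketch of that kind of search. It is not David's actual code: the sample data, the 8-byte record width, and the find_record name are all assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical haystack: sorted, null-padded records of a fixed width.
my $record_size = 8;
my @sorted      = qw(apple banana cherry damson elder fig);
my $haystack    = join '', map { pack "Z$record_size", $_ } @sorted;

# Binary search over the packed blob without unserializing every record.
# Returns the index of the matching record, or -1 if the needle is absent.
sub find_record {
    my ( $blob_ref, $needle ) = @_;
    my $low  = 0;
    my $high = length( ${$blob_ref} ) / $record_size - 1;

    while ( $low <= $high ) {
        my $mid = int( ( $low + $high ) / 2 );

        # A fresh format string is built on every step: skip $mid records,
        # then read one null-padded record. This is the concatenation cost.
        my $format  = 'x' . ( $record_size * $mid ) . "Z$record_size";
        my $element = unpack( $format, ${$blob_ref} );

        my $cmp = $needle cmp $element;
        return $mid if $cmp == 0;
        if   ( $cmp < 0 ) { $high = $mid - 1 }
        else              { $low  = $mid + 1 }
    }
    return -1;
}
```

Calling find_record( \$haystack, 'cherry' ) walks at most a handful of records, and each step pays for one string concatenation plus one unpack.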
After a bit of hacking, it turns out this can be done. Instead of David’s very complicated:
# Create an unpack format to skip the first $record * $record_size
# bytes, then return the next 100 byte null padded string
my $format = 'x' . ( $record_size * $record ) . 'Z100';
# Unpack from our binary blob
my $element = unpack( $format, ${$frozen_haystack_ref} );
You get the nearly unfathomable:
# Use pointer arithmetic to calculate where the record is in memory
# and convert the Perl integer into an unsigned long integer
my $ptr = pack( 'L!', $ptr_to_base + $record_size * $record );
# Pull 100 bytes from that spot in memory
my $element = unpack( 'P100', $ptr );
And _voila_, Perl is doing pointer arithmetic and accessing structures just like C. Unfortunately, unpack("P") won’t take a native Perl integer as an argument. You need to use pack("L!") to turn a Perl integer into a long integer. So we trade the string concatenation in David’s code for a pack("L!") in this code. And even worse, string concatenation is about 20% faster than unpack.
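The snippet above leaves out where $ptr_to_base comes from. One way to get it, sketched here under assumptions, is to pack the blob with “P” and read the resulting pointer bytes back as an integer. The sample data and the record_at helper are hypothetical, and the sketch only works on builds where a native unsigned long is the same size as a pointer (typical on Linux and macOS, not guaranteed elsewhere), so it checks that up front.

```perl
use strict;
use warnings;
use Config;

# Assumption: 'L!' (native unsigned long) is as wide as a pointer here.
die "unsigned long is not pointer-sized on this perl\n"
    unless $Config{ptrsize} == $Config{longsize};

# Hypothetical blob: four null-padded records of 8 bytes each.
my $record_size = 8;
my @words       = qw(apple banana cherry damson);
my $haystack    = join '', map { pack "Z$record_size", $_ } @words;

# pack('P') emits the address of the string buffer as raw pointer bytes;
# unpacking those bytes with 'L!' turns the address into a Perl integer.
# The address is only valid while $haystack is alive and left unmodified.
my $ptr_to_base = unpack( 'L!', pack( 'P', $haystack ) );

# The unpack format is now a constant, built once outside the lookup.
my $fetch = "P$record_size";

# Return the raw bytes of record number $record, null padding included.
sub record_at {
    my ($record) = @_;
    my $ptr = pack( 'L!', $ptr_to_base + $record_size * $record );
    return unpack( $fetch, $ptr );
}
```

One difference from the “Z” version: “P” hands back the full fixed-length record, padding and all, so a call such as unpack( 'Z*', record_at(2) ) is needed to trim the trailing nulls. Nothing checks bounds either, so a bad record number reads whatever memory happens to be there.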
So, while this doesn’t appear to help David speed up his already cheetah-like code, it does prove that you can have pointers in Perl. Of course, you should never ever do anything like this. It is fraught with potential bugs and will drive anyone stuck maintaining your code insane.
Feel free to take apart my ugly benchmarking code. Maybe someone who knows this better can actually save David a few clock-cycles.
By the way, thanks to Matt Trout who got me motivated to (re)start blogging about Perl. In the past, I have gotten bogged down by setting up a site rather than focusing on adding content. This time I decided to let Google do the work for me and focus on the content. Hopefully, this will result in more regular (and interesting?) posts. Feedback is very welcome.
The contents of this blog are licensed under the Creative Commons “Attribution-Noncommercial-Share Alike 3.0” license.