I was recently talking on the phone with a person who lives at least 2,200
miles away and whom I'd never met or spoken to before. This is a
surprisingly common occurrence in this day and age. I was explaining some
of the things that I like about Perl. When I got to the part about how I
love writing 4 or 5 lines of code where programmers of other languages have
to write 20 or 30, my new friend hinted that he thought I was talking about
completely unreadable Perl code. I was quick to point out that I wasn't
talking about the Perl golf that looks like encrypted text, but rather
about using higher-order functions like
map and
grep. I've
written about this sort of thing before. See, for example, the article
titled
"Concise Programming and Super General Functions."
But here's another example of how Perl can be very concise. I'm going to
present a program with 4 lines of code that can read a quoted CSV file (with a
column-headings row) and parse it into an array of hash references that
allow you to access a piece of data by line number and field name, like
this:
print "keyword: $hdata[2]->{keyword}\n",
" visits: $hdata[2]->{visits}\n";
The above code would print
keyword: convert soap to rest
visits: 81
Here's the code:
#!/usr/bin/perl
use Text::ParseWords;
my @data= map {s/\s+$//; +[quotewords ',', 0, $_]} <>;
my @fields= map {lc $_} @{shift @data};
my @hdata= map {my $v= $_; +{map {($_ => (shift @{$v}))} @fields}} @data;
# Print the top 3 keywords and their visits
print "$_->{keyword} => $_->{visits}\n" for @hdata[0..2];
Here's the sample CSV file:
keyword,visits,pages_per_visit,avg_time_on_site,new_visits,bounce_rate
"perl thread pool",210,1.00,00:00:00,23.81%,100.00%
"soap vs rest web services",152,1.00,00:00:00,100.00%,100.00%
"convert soap to rest",81,1.00,00:00:00,12.50%,100.00%
"perl threads writing to the same socket",63,1.00,00:00:00,0.00%,100.00%
"object oriented perl resumes.",52,1.00,00:00:00,20.00%,100.00%
"perl thread::queue thread pool",54,1.00,00:00:00,0.00%,100.00%,
"donnie knows marklogic install",43,1.00,00:00:00,25.00%,100.00%
"perl threaded server",45,2.00,00:01:28,75.00%,25.00%
"slava akhmechet xpath",44,1.50,00:08:46,0.00%,75.00%
"donnie knows",36,6.67,00:02:56,66.67%,0.00%
And here's an example of how the code stores a line of that CSV file in
memory:
{
keyword => "perl thread pool",
visits => 210,
pages_per_visit => 1.00,
avg_time_on_site => "00:00:00",
new_visits => "23.81%",
bounce_rate => "100.00%"
}
The program uses only Text::ParseWords, a module that ships with Perl, so you
don't have to install anything beyond Perl itself to make this program work.
Also, once you learn a few standard Perl functions for processing lists and
a little about Perl data structures, the code is actually very readable.
Let's take a detailed look at the code to see how so little of it can
accomplish so much. We'll start with the definition of
@data.
my @data= map {s/\s+$//; +[quotewords ',', 0, $_]} <>;
List processing and filtering functions are often best read
backwards. Let's start with the
<> operator, which in list
context reads all the lines from the file you name on the command
line. If you save the program as
read.pl, for example, and you
run it like this:
./read.pl file.csv
Then the
<> operator will read the contents of
file.csv
and return it as an array of lines. The
map function accepts some
code and an array and returns a new array. Each element in the new array
consists of the result of applying the given code to the corresponding
element in the original array. Here's an example of how to use the
map function:
@n_plus_2 = map {$_ + 2} qw/1 2 3/;
@n_plus_2 ends up with the values 3, 4, and 5.
The function we pass to
map in the
@data assignment removes
trailing whitespace from each line of text, then splits the line of text
into values at the commas (excluding quoted commas, of course, and
allowing escaped quotes within a value). So
@data ends up looking
like this (only the header row and the first two data rows of the example
CSV file are shown):
(
["keyword", "visits", "pages_per_visit", "avg_time_on_site",
"new_visits", "bounce_rate"],
["perl thread pool", 210, 1.00, "00:00:00", "23.81%", "100.00%"],
["soap vs rest web services", 152, 1.00, "00:00:00", "100.00%",
"100.00%"],
.
.
.
)
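To see the splitting step in isolation, here is a minimal sketch that runs the same map over two hard-coded lines instead of a file. The second line is a made-up row (it is not in the sample CSV file) with commas inside the quoted value, and the names @lines and @rows are only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;

# Two sample lines. The second one is hypothetical: it has commas
# inside the quoted value to show that quotewords leaves them alone.
my @lines = (
    qq{"perl thread pool",210,1.00,00:00:00,23.81%,100.00%\n},
    qq{"soap, rest, and friends",152,1.00\n},
);

# Strip trailing whitespace, then split each line at unquoted commas.
# The 0 tells quotewords to drop the surrounding quotes.
my @rows = map { s/\s+$//; +[ quotewords(',', 0, $_) ] } @lines;

print scalar(@$_), " fields, first: $_->[0]\n" for @rows;
```

The first line splits into 6 fields and the second into 3, with the quoted commas kept inside the first field.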
The
@fields assignment simply pulls the first row out of the
@data array, lower-cases each column title, and assigns the
resulting array to
@fields.
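Here is a small sketch of that step on its own, assuming a tiny hand-built @data with a mixed-case header row (the header values are invented for the example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @data = (
    [ 'Keyword', 'Visits' ],            # hypothetical mixed-case header row
    [ 'perl thread pool', 210 ],
);

# shift removes the header row from @data and returns it;
# map lower-cases each column title.
my @fields = map { lc $_ } @{ shift @data };

print "fields: @fields\n";
print "rows left: ", scalar(@data), "\n";
```

Note that shift changes @data, so after this line @data holds only the data rows.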
Finally, the
@hdata assignment converts each array reference in
@data into a hash reference. It does so by associating a key
(a column title) with each value in the array reference. The resulting
@hdata array contains hash references.
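If the nested map is hard to follow, the same conversion can be sketched with a hash slice. This is an alternative way to write the step, not the code from the program above, and unlike the original it leaves the arrays in @data intact. The @busy array and the 100-visit cutoff are just for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fields = qw(keyword visits);
my @data = (
    [ 'perl thread pool',          210 ],
    [ 'soap vs rest web services', 152 ],
    [ 'convert soap to rest',       81 ],
);

# A hash slice assigns every column title its value in one statement.
my @hdata = map { my %h; @h{@fields} = @$_; \%h } @data;

# With named fields, grep can filter rows by column name.
my @busy = grep { $_->{visits} > 100 } @hdata;
print "$_->{keyword} => $_->{visits}\n" for @busy;
```

This prints the two rows with more than 100 visits.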
How many lines of code does it take to do this kind of stuff in your
favorite language?