Catching the Human Mind

I was benchmarking PCs to try to determine where to put some Chattermancy code and I started to wonder once more where we are in the unintentional technological race to build a cheap computer that can outperform the human brain. I’ve done this exercise a half-dozen times over the years. Here’s what I found this time.

Calculating how long it’s going to take for PCs to become as powerful as the human brain is easy if you can make a few assumptions. Here are my assumptions:

  • Processing power of the human brain: 100 teramips
  • Processing power of an inexpensive PC today: 0.025 teramips
  • Rate of increase of computer processing power: doubling every year

With those assumptions and a tool like a simple spreadsheet, a small computer program, or a pencil and some paper, anyone can calculate more or less when an inexpensive PC will have the processing power of the human brain. I chose the computer program path. Here’s an example in Common Lisp:

(defun forecast (&key (today-teramips 0.025) (target-teramips 100))
  (let ((power (- (/ (log target-teramips) (log 2))
                  (/ (log today-teramips) (log 2)))))
    (format nil "~a~a~a~%~a~3$~a~%~a~$"
            "Human brain throughput: " target-teramips " teramips"
            "PC throughput: " today-teramips " teramips"
            "Years until inexpensive PC outperforms human brain: " power)))

If you run that program like this

(forecast :today-teramips 0.025 :target-teramips 100)

the program will spit out the following text:

Human brain throughput: 100 teramips
PC throughput: 0.025 teramips
Years until inexpensive PC outperforms human brain: 11.97

Even if your assumption about the processing power of the human mind is off by an order of magnitude, say the mind actually runs at 1,000 teramips rather than 100, the result changes little (about 15 years instead of 12, in this example). Each tenfold increase in the target adds only about 3.3 doublings, because log2(10) is roughly 3.32.
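For anyone who would rather check the sensitivity claim without a Lisp at hand, here is a minimal Python sketch of the same calculation. The function name years_until_parity is made up for this illustration; the inputs are the assumptions listed above, plus the 1,000-teramips what-if.

```python
import math

def years_until_parity(today_teramips=0.025, target_teramips=100.0):
    """Number of yearly doublings needed to get from today's PC to the target."""
    return math.log2(target_teramips / today_teramips)

# Compare the baseline assumption with a brain that is 10x more powerful.
for target in (100, 1_000):
    years = years_until_parity(target_teramips=target)
    print(f"{target:>5} teramips: {years:.2f} years")
```

This prints 11.97 years for the 100-teramips brain and 15.29 years for the 1,000-teramips one, which is where the "15 instead of 12" figure comes from.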

If you make the same assumptions that I made, then the calculation itself is easy and probably not disputable. However, the assumptions are definitely quite disputable, so I will explain how I arrived at those.

Processing power of the human brain

This number is the hard one to pin down. Picking the right value for this variable is the key to this exercise. Here are a few starting points:

Processing power of an inexpensive PC today

To figure this out, I started by downloading the BYTEmark benchmarking software (http://www.tux.org/~mayer/linux/bmark.html). I then extracted, compiled, and ran the software on a quad-core Core 2 machine with 6GB of RAM, a machine that is worth about US$800.00 today. The number that I used from the results of that test was 81.153, the original BYTEmark integer index. This number means that the computer is roughly 81 times faster than the baseline for this test, a Pentium 90. I found information on the Internet that indicated that the Pentium 90 ran at approximately 150 mips.

 

That information allowed me to calculate a mips rating for my computer:

81.153 * 150 mips ≈ 12,173 mips

But my computer has 4 processor cores and this test measures the throughput of a single core. To stay conservative, I multiplied the result by 2 rather than by 4, which gives about 24,000 mips. I round that to 25,000 mips (or 0.025 teramips) as the rating for my computer.
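The arithmetic is small enough to check by hand, but here it is as a short Python sketch. The constant names are made up for this illustration; the values are the ones quoted above.

```python
BYTEMARK_INDEX = 81.153   # integer index reported by the benchmark run
PENTIUM_90_MIPS = 150     # approximate rating of the baseline machine
CORES_COUNTED = 2         # deliberately counting only 2 of the 4 cores

single_core_mips = BYTEMARK_INDEX * PENTIUM_90_MIPS
machine_mips = single_core_mips * CORES_COUNTED

print(f"single core: {single_core_mips:,.0f} mips")
print(f"machine:     {machine_mips:,.0f} mips "
      f"({machine_mips / 1e6:.4f} teramips)")   # 1 teramips = 1,000,000 mips
```

The unrounded result is a little under 25,000 mips, so the 0.025-teramips figure used in the forecast is a rounded value.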

Rate of increase of computer processing power

This was the easiest of all assumptions to make. I simply checked the processing power of the fastest computer you can buy for US$1000 today, and it turned out to be about double that of the fastest computer you could buy a year ago for the same amount of money. However, I believe that this assumption of computer power doubling every year is extremely conservative because there are now desktop supercomputers (still expensive, in the ten-thousand-dollar range) that, for some tasks, are hundreds of times faster than a PC. This suggests that in the near future we’re going to see inexpensive computers with hundreds of processors, which would make the doubling-every-year estimate used here look far too conservative.

Common Plisp: List Processing in Perl

I was recently talking on the phone with a person who lives at least 2,200 miles away and whom I’d never met or spoken to before. This is a surprisingly common occurrence in this day and age. I was explaining some of the things that I like about Perl. When I got to the part about how I love writing 4 or 5 lines of code where programmers of other languages have to write 20 or 30, my new friend hinted that he thought I was talking about completely unreadable Perl code. I was quick to point out that I wasn’t talking about the Perl golf that looks like encrypted text, but rather about using higher-order functions like map and grep. I’ve written about this sort of thing before. See, for example, the article titled Concise Programming and Super General Functions.

But, here’s another example of how Perl can be very concise. I’m going to present a program with 4 lines of code that can read a quoted CSV file (with a column-headings row) and parse it into an array of hash references that allow you to access a piece of data by line number and field name, like this:

print "keyword: $hdata[2]->{keyword}\n",
      " visits: $hdata[2]->{visits}\n";

The above code would print

        keyword: convert soap to rest
         visits: 81

Here’s the code:

#!/usr/bin/perl
use Text::ParseWords;

my @data= map {s/\s$//g; +[quotewords ',', 0, $_]} <>;
my @fields= map {lc $_} @{shift @data};
my @hdata= map {my $v= $_; +{map {($_ => (shift @{$v}))} @fields}} @data;

# Print the top 3 keywords and their visits
print "$_->{keyword} => $_->{visits}\n" for @hdata[0..2];

Here’s the sample CSV file:

keyword,visits,pages_per_visit,avg_time_on_site,new_visits,bounce_rate
"perl thread pool",210,1.00,00:00:00,23.81%,100.00%
"soap vs rest web services",152,1.00,00:00:00,100.00%,100.00%
"convert soap to rest",81,1.00,00:00:00,12.50%,100.00%
"perl threads writing to the same socket",63,1.00,00:00:00,0.00%,100.00%
"object oriented perl resumes.",52,1.00,00:00:00,20.00%,100.00%
"perl thread::queue thread pool",54,1.00,00:00:00,0.00%,100.00%,
"donnie knows marklogic install",43,1.00,00:00:00,25.00%,100.00%
"perl threaded server",45,2.00,00:01:28,75.00%,25.00%
"slava akhmechet xpath",44,1.50,00:08:46,0.00%,75.00%
"donnie knows",36,6.67,00:02:56,66.67%,0.00%

And here’s an example of how the code stores a line of that CSV file in memory:

{
   keyword => "perl thread pool",
   visits => 210,
   pages_per_visit => 1.00,
   avg_time_on_site => "00:00:00",
   new_visits => "23.81%",
   bounce_rate => "100.00%"
}

The program doesn’t use any modules that don’t ship with Perl, so you don’t have to install anything beyond Perl itself to make this program work. Also, once you learn a few standard Perl functions for processing lists and a little about Perl data structures, the code is actually very readable.

Let’s take a detailed look at the code to see how so little of it can accomplish so much. We’ll start with the definition of @data.

my @data= map {s/\s$//g; +[quotewords ',', 0, $_]} <>;

List processing and filtering functions are often best read backwards. Let’s start with the <> operator, which in list context reads all the lines from the file whose name you pass to the program. If you save the program as read.pl, for example, and you run it like this:

./read.pl file.csv

Then the <> operator will read the contents of file.csv and return it as an array of lines. The map function accepts some code and an array and returns a new array. Each element in the new array consists of the result of applying the given code to the corresponding element in the original array. Here’s an example of how to use the map function:

@n_plus_2 = map {$_ + 2} qw/1 2 3/;

@n_plus_2 ends up with the values 3, 4, and 5.

The function we pass to map in the @data assignment removes trailing whitespace from each line of text, then splits the line of text into values at the commas, excluding quoted commas, of course, and allowing escaped quotes within a value. So @data ends up looking like this (only the first three lines of the example CSV file are shown here):

(
   ["keyword", "visits", "pages_per_visit", "avg_time_on_site",
     "new_visits", "bounce_rate"],

   ["perl thread pool", 210, 1.00, "00:00:00", "23.81%", "100.00%"],

   ["soap vs rest web services", 152, 1.00, "00:00:00", "100.00%",
    "100.00%"]
   .
   .
   .
)
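To make the quote-aware split concrete, here is a rough Python counterpart of that one step. It is a sketch, not a translation of quotewords: it leans on Python’s standard csv module, the helper name split_csv_line is invented for this example, and the second sample line (with a comma inside the quotes) is made up to show the interesting case.

```python
import csv

def split_csv_line(line):
    """Strip trailing whitespace, then split on commas that are not inside quotes."""
    return next(csv.reader([line.rstrip()]))

# A line from the sample file, followed by a made-up line with a quoted comma.
print(split_csv_line('"perl thread pool",210,1.00,00:00:00,23.81%,100.00%\n'))
print(split_csv_line('"soap, then rest",152\n'))
```

Note that every value comes back as a string. The same is true of what quotewords returns in the Perl version, even though the listing above shows the numbers unquoted.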

The @fields assignment simply pulls the first row out of the @data array, lower-cases each column title, and assigns the resulting array to @fields.

Finally, the @hdata assignment converts each array reference in @data into a hash reference. It does so by associating a key (a column title) with each value in the array reference. The resulting @hdata array contains hash references.
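If the nested map in the @hdata line is hard to picture, the same idea can be sketched in Python as pairing each column title with the value in the same position. This is only an illustration on a trimmed-down, hard-coded copy of the data (two columns, with the header deliberately in mixed case), not the Perl program itself.

```python
rows = [
    ["Keyword", "Visits"],                    # header row
    ["perl thread pool", "210"],
    ["soap vs rest web services", "152"],
    ["convert soap to rest", "81"],
]

# Like the @fields assignment: take the header row and lower-case each title.
fields = [title.lower() for title in rows[0]]

# Like the @hdata assignment: pair each title with the value in the same column.
hdata = [dict(zip(fields, row)) for row in rows[1:]]

print(hdata[2]["keyword"], "=>", hdata[2]["visits"])
```

The last line prints "convert soap to rest => 81", the same record that $hdata[2] holds in the Perl version.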

How many lines of code does it take to do this kind of stuff in your favorite language?

Perl Sockets Swimming in a Thread Pool

I’ve written a simple multithreaded TCP server that uses sockets and a thread pool to handle client requests. This server is packaged into a class (DcServer) that is trivial to use. Here’s how you might use DcServer to build a server that accepts text from clients and returns the same text with the characters in reverse order:

use DcServer;
my $server= DcServer->new(processor_cb => \&reverse_text);
$server->start;

sub reverse_text {
    my $data= shift;
    join('', reverse(split //, $data));
}

Run server.pl like this: perl server.pl

The server defaults to 10 threads, so up to 10 clients can have their text reversed concurrently. And you could probably serve a great many of these clients per second.

I’ve also created code to help me build clients. You can, for example, build a client for the above server like this:

use DcClient;
my $message= shift;
die "Usage: perl client.pl string\n" unless defined($message);
my $c= DcClient->new;
print "$message => ", $c->query($message), "\n";

Run client.pl like this: perl client.pl "hello"

The code for the server is short and bare-bones and illustrates how to pass a socket from one thread to another.
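For readers who want to see the general shape of the pattern without reading the Perl source, here is a minimal Python sketch of a thread-pool TCP server. It is not the DcServer code and makes no claim about how DcServer is implemented: every name in it (read_message, worker, serve, query) is invented for this illustration. It assumes one accepting thread that hands each accepted socket to the pool through a queue, and a "newline, period, newline" end-of-message marker.

```python
import queue
import socket
import threading

EOM = b"\n.\n"  # end-of-message marker: newline, period, newline

def read_message(conn):
    """Read from the socket until the end-of-message marker arrives."""
    buf = b""
    while not buf.endswith(EOM):
        chunk = conn.recv(4096)
        if not chunk:                      # peer closed the connection early
            return buf
        buf += chunk
    return buf[:-len(EOM)]

def worker(jobs, processor):
    """Pool thread: take accepted sockets off the queue and answer them."""
    while True:
        conn = jobs.get()
        with conn:                         # close the socket when done
            request = read_message(conn).decode()
            conn.sendall(processor(request).encode() + EOM)

def serve(listener, processor, thread_count=10):
    """Accept connections on this thread and hand each socket to the pool."""
    jobs = queue.Queue()
    for _ in range(thread_count):
        threading.Thread(target=worker, args=(jobs, processor),
                         daemon=True).start()
    while True:
        conn, _addr = listener.accept()
        jobs.put(conn)                     # the socket crosses threads here

def query(port, message):
    """Tiny client: send one message and return the reply."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(message.encode() + EOM)
        return read_message(conn).decode()

if __name__ == "__main__":
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks one
    port = listener.getsockname()[1]
    threading.Thread(target=serve, args=(listener, lambda s: s[::-1]),
                     daemon=True).start()
    print("hello =>", query(port, "hello"))
```

The sketch leaves out shutdown handling, background work, and error handling, so treat it as a picture of the data flow and nothing more.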

Yes, I suffer from the NIH (not-invented-here) syndrome. I have no excuse for having written this. Let’s leave it at that. Nevertheless, I hope the code is useful to someone out there, directly or as a simple example of how to set up thread pools and how to share sockets among threads.

Here’s a little reference for how to use the DcServer module, which depends on these modules only:
– threads
– threads::shared
– Thread::Queue
– IO::Socket
– Time::HiRes

Overview

DcServer is a class that encapsulates a TCP server that implements a thread pool to handle client requests. The class allows the instantiator to provide callback functions for processing client requests, performing background work, and performing tasks after the server has shut down. The instantiator can shut down the server from within two of these callback functions. For example, the instantiator can shut down the server when certain arbitrary conditions are met (regardless of whether clients are connecting or not) or when a client explicitly requests it.

DcServer Methods

new
This method instantiates a server. It takes the following parameters.

  • host An IP address or host name that resolves to an IP address. This is to tell the server where it will be running. Clients should specify the same host name in order to connect. This value defaults to ‘localhost’ if you don’t specify it.
  • port The port on which the server will listen for connections. This value defaults to 8191 if you don’t provide it.
  • thread_count The number of threads that you want the server to start for its thread pool. The server uses the thread pool for handling client connections. This value defaults to 10 if you don’t specify it.
  • eom_marker This is the sequence of characters that the server and the client use to tell each other that they’re done transmitting data. This value defaults to “\\n\\.\\n”, which means “new line, period, new line”.
  • main_yield The server allows the instantiator to specify a callback function (see main_cb) that the server will call in a loop. After calling this function, the server sleeps for main_yield seconds before calling the function again. The value of main_yield defaults to 5 seconds.
  • main_cb A reference to a function that the server calls over and over again, independently of accepting and processing client connections. The server calls the function specified via main_cb with two positional parameters, $counter and $fnstop. The first parameter, $counter, contains an integer with the number of times that the server has called the main_cb function. The second parameter, $fnstop, is a code reference that, when called as a function, causes the server to stop. You can call the function in $fnstop like this: $fnstop->()

    When you call the $fnstop function, the function returns immediately, but by then the call has already initiated a server shutdown.

  • done_cb A reference to a function that the server calls when it has completed its shutdown procedure.
  • processor_cb A reference to a function that the server calls to process the data that the client has sent. The function should accept a string (the client’s request) and it should return another string (the result of processing the client’s request). More specifically, the server calls the processor_cb function with the following positional parameters: $data, $ip, $tid, and $fnstop. The following list describes these parameters:
    • $data: The data that the client sent to the server. This amounts to the client’s request. It is up to the instantiator to interpret and process that request.
    • $ip: The IP address of the client.
    • $tid: The ID of the thread that is handling the request.
    • $fnstop: A reference to a function that you can call to stop the server. You can call this function like this: $fnstop->()

    The processor_cb function should return a string consisting of an answer to the client’s request. The processor_cb function should be able to call the $fnstop function and still return data to the client. But if you do that, the call to $fnstop should occur immediately before the function returns, not before the function processes the client’s data.
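The main_cb, main_yield, and $fnstop descriptions above amount to a small contract, and it can be sketched in a few lines of Python. This is an illustration of that contract only, not DcServer’s implementation. The names run_main_loop and stop_after_three are invented, and starting the counter at 1 on the first call is an assumption, since the description above doesn’t say.

```python
import threading

def run_main_loop(main_cb, main_yield=5.0):
    """Call main_cb(counter, fnstop) in a loop, pausing main_yield seconds between calls."""
    stop_requested = threading.Event()

    def fnstop():
        # Returns immediately; it only flags that shutdown should begin.
        stop_requested.set()

    counter = 0
    while not stop_requested.is_set():
        counter += 1                       # assumption: first call sees 1
        main_cb(counter, fnstop)
        stop_requested.wait(main_yield)    # sleep, but wake early if stopped
    return counter

def stop_after_three(counter, fnstop):
    """Example background task: do some work, then ask for shutdown."""
    print(f"background pass {counter}")
    if counter == 3:
        fnstop()

calls = run_main_loop(stop_after_three, main_yield=0.01)
print("stopped after", calls, "calls")
```

The callback asks for shutdown on its third pass, fnstop returns right away, and the loop notices the request before it would have called the callback a fourth time.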

start
This method takes no parameters. It simply starts the server that you previously instantiated with the new method.

Acknowledgements

While writing this code, I ran into some serious trouble with passing sockets from one thread to another. I was able to resolve the issue thanks to something about sockets and threads that BrowserUk posted back in 2006.

Thanks BrowserUk! (And thanks PerlMonks!) I am not worthy of your guidance.

Source Code

You can view or download the code for the DcServer and DcClient modules here.

You’re welcome also to add the code to CPAN or to POD it or to attempt to convince me to do any or all of this. I know that I should, but I’m such a sloth that I’ve never contributed even a single module to CPAN. The only thing I ask is that you let me know if you manage to improve the code in any significant way.