Hacking XQuery Into Emacs With Berkeley DB XML

I often need to query an XML document without having to load it into a native XML database. In a perfect world, I would simply load the document into my editor, sprinkle some XQuery into the document, highlight the portions I want to evaluate, then hit a key combination to view a new window with the results. The whole operation should take all of a few seconds for a large class of XML documents and queries.

Thankfully, this is indeed a perfect world. And, I'm an Emacs user. One of my goals in life is to fool people into thinking that I'm some kind of mad genius programmer, and the view of me quickly performing some ad-hoc analysis of an XML file, in Emacs, using syntax-highlighted XQuery, can have that effect.

Here's the hack that allows me to use Emacs to process XQuery code. I've tested this on Ubuntu only, with Emacs 23; some modifications might be needed for other flavors of Linux, Emacs, or other operating systems. However, keep in mind that we're talking about Emacs, Perl, and DB XML, so even your toaster should be able to run this hack.

  • Overview
    • Install the latest version of Berkeley DB XML
    • Install pipeline.pl
    • Install xquery-mode.el
    • Configure Emacs with some Elisp code
  • Detailed Description
    1. Install the latest version of Berkeley DB XML
      1. Download the Berkeley DB XML source distribution from Oracle
      2. Extract DB XML, then move into the dbxml directory
      3. Build the software like this:
        sudo ./buildall.sh -b 64 --prefix=/usr/lib/share/dbxml --enable-perl
        
        Note:
        1. If your computer is a 32-bit machine, then use -b 32 instead of -b 64
        2. The build process is extensive and will take a few minutes even on a very fast machine.
    2. Install pipeline.pl
      1. Create a db-xml directory somewhere. Here's the location on my computer: /home/donnie/lib/db-xml
      2. Create a db directory inside the db-xml directory
      3. Download pipeline.txt (attached to this post) to the db-xml directory, rename it to pipeline.pl, edit line 6 of the program so that it points to your db-xml/db directory, and set the executable bit (chmod +x pipeline.pl).
    3. Install xquery-mode.el
    4. Configure Emacs with some Elisp code
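      1. Add these two functions to your .emacs file or wherever you see fit. They use three names that this post does not define (query-dbxml-pipeline, time, and format-xml); the definitions at the top are assumed stand-ins, so adjust them to your setup:
          ;; Assumed stand-in: full path to pipeline.pl from step 2.
          (defvar query-dbxml-pipeline "/home/donnie/lib/db-xml/pipeline.pl"
            "Shell command that reads XQuery on stdin and prints the result.")

          ;; Assumed stand-in: evaluate BODY, return (ELAPSED-SECONDS VALUE).
          (defmacro time (&rest body)
            (let ((start (make-symbol "start"))
                  (value (make-symbol "value")))
              `(let* ((,start (float-time))
                      (,value (progn ,@body)))
                 (list (- (float-time) ,start) ,value))))

          ;; Assumed stand-in: re-indent the current buffer (no line breaking).
          (defun format-xml ()
            (indent-region (point-min) (point-max)))

          (defun query-dbxml-with-region (beg end)
            "Query dbxml with the selected text."
            (interactive "r")
            (let* ((xquery (buffer-substring-no-properties beg end))
                   (existing (get-buffer "result"))
                   (result-buffer (or existing (generate-new-buffer "result"))))
              (with-current-buffer result-buffer
                (erase-buffer)
                (with-timeout
                    (10 (insert "Gave up because query was taking too long."))
                  (insert (query-dbxml xquery t)))
                (nxml-mode)
                (format-xml)
                (goto-char (point-min)))
              ;; Switch to the result buffer only when it was just created.
              (unless existing (switch-to-buffer result-buffer))))

          (defun query-dbxml (xquery &optional timed)
            "Query the Momentum Berkeley DBXML database with an XQuery string.
          When TIMED is non-nil, prefix the output with the elapsed time."
            (let ((file (make-temp-file "elisp-dbxml-")))
              ;; Delete the temporary file even if the query signals an error.
              (unwind-protect
                  (progn
                    (write-region xquery nil file nil 'silent)
                    (let ((result (time (shell-command-to-string
                                         (concat "cat " (shell-quote-argument file)
                                                 " | " query-dbxml-pipeline)))))
                      (concat
                       (if timed (format "%.3f seconds\n\n" (car result)) "")
                       (cadr result))))
                (delete-file file))))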
        
      2. Bind the query-dbxml-with-region function to a key chord
        (global-set-key (kbd "<C-S-return>") 'query-dbxml-with-region)
        
If you really want to run this on Emacs 22, you have to install nXML mode yourself; Emacs 23 includes it by default.
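
The shell side of steps 1 and 2 can be sketched roughly as follows. This is a minimal sketch, not a script from the original post: the function names are made up for illustration, `getconf LONG_BIT` is assumed to report the machine's word size (it does on Linux), and the first function only prints the build command from step 1.3 instead of running it.

```shell
# Hypothetical helpers; names and example paths are illustrative assumptions.

# Print the build command from step 1.3 with -b matching this machine.
dbxml_build_command() {
  bits=$(getconf LONG_BIT)   # prints 32 or 64 on Linux
  printf 'sudo ./buildall.sh -b %s --prefix=/usr/lib/share/dbxml --enable-perl\n' "$bits"
}

# install_pipeline SRC DIR: copy the downloaded pipeline.txt to DIR/pipeline.pl,
# create DIR/db next to it, and set the executable bit (step 2).
install_pipeline() {
  src=$1
  dir=$2
  mkdir -p "$dir/db" &&
    cp "$src" "$dir/pipeline.pl" &&
    chmod +x "$dir/pipeline.pl"
}

# Example calls (paths are assumptions):
#   dbxml_build_command
#   install_pipeline ~/Downloads/pipeline.txt ~/lib/db-xml
```

Line 6 of pipeline.pl still has to be edited by hand so that it points at the db directory.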

Now that you have this hack running, let's take it for a spin. Copy the following XML into an empty buffer in Emacs:

<users>
  <user>
    <first>Donnie</first>
    <last>Cameron</last>
    <handle>macnod</handle>
  </user>
  <user>
    <first>Tracy</first>
    <last>Cameron</last>
    <handle>abc</handle>
  </user>
  <user>
    <first>Olivia</first>
    <last>Shriver</last>
    <handle>def</handle>
  </user>
</users>

Let's say you want to know the number of users, the number of Camerons, and Olivia's last name. All you have to do is modify the above to look like this:

let $xml:=
<users>
  <user>
    <first>Donnie</first>
    <last>Cameron</last>
    <handle>macnod</handle>
  </user>
  <user>
    <first>Tracy</first>
    <last>Cameron</last>
    <handle>abc</handle>
  </user>
  <user>
    <first>Olivia</first>
    <last>Shriver</last>
    <handle>def</handle>
  </user>
</users>
return element answers {
  attribute users {count($xml/user)},
  attribute camerons {count($xml/user[last = "Cameron"])},
  attribute olivias-last-name {$xml/user[first = "Olivia"]/last}}

Select the whole lot, hit Control + Shift + Enter, and BAM! Emacs displays the following in the result buffer:

0.067 seconds

<answers users="3" camerons="2" olivias-last-name="Shriver"/>

Of course, all of this becomes immensely more useful when you're dealing with a 22MB file.
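
The same let/return wrapping can also be produced from the shell, which may be handy when the document is too large to paste comfortably. The sketch below is an illustration only: `wrap_xquery` is a hypothetical helper, and it assumes the XML file has no XML declaration or DOCTYPE and no literal curly braces in its content, since XQuery treats `{` and `}` specially inside element constructors.

```shell
# wrap_xquery FILE EXPR: print an XQuery that binds the contents of FILE
# to $xml and returns EXPR. Hypothetical helper, not part of DB XML.
wrap_xquery() {
  printf 'let $xml :=\n'
  cat "$1"
  printf 'return %s\n' "$2"
}

# Example (file name and pipeline path are assumptions); pipeline.pl reads
# the query on stdin, as in the Elisp code above:
#   wrap_xquery users.xml 'count($xml/user)' | /home/donnie/lib/db-xml/pipeline.pl
```

The single quotes around the expression keep the shell from expanding `$xml` before the query reaches DB XML.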

--Donnie

Attachment: pipeline.txt (965 bytes)