I often need to query an XML document without having to load it into a
native XML database. In a perfect world, I would simply load the document
into my editor, sprinkle some XQuery into the document, highlight the
portions I want to evaluate, then hit a key combination to view a new
window with the results. The whole operation should take all of a few
seconds for a large class of XML documents and queries.
Thankfully, this is indeed a perfect world. And I'm an Emacs user. One of
my goals in life is to fool people into thinking that I'm some kind of mad
genius programmer, and the sight of me quickly performing some ad-hoc
analysis of an XML file, in Emacs, using syntax-highlighted XQuery, can
have that effect.
Here's the hack that allows me to use Emacs to process XQuery code. I've
tested this on Ubuntu only, with Emacs 23; some modifications might be
needed for other flavors of Linux, Emacs, or other operating systems.
However, keep in mind that we're talking about Emacs, Perl, and DB XML, so
even your toaster should be able to run this hack.
- Overview
  - Install the latest version of Berkeley DB XML
  - Install pipeline.pl
  - Install xquery-mode.el
  - Configure Emacs with some Elisp code
- Detailed Description
  - Install the latest version of Berkeley DB XML
    - Download from here
    - Extract DB XML, then move into the dbxml directory
    - Build the software like this:
      sudo ./buildall.sh -b 64 --prefix=/usr/lib/share/dbxml --enable-perl
      Note:
      - If your computer is a 32-bit machine, then use -b 32 instead of
        -b 64
      - The build process is extensive and will take a few minutes even
        on a very fast machine.
  - Install pipeline.pl
    - Create a db-xml directory somewhere. Here's the location on my
      computer: /home/donnie/lib/db-xml
    - Create a db directory inside the db-xml directory
    - Download pipeline.txt to the db-xml directory, replace the .txt
      extension with .pl, change line 6 of the program to point to the
      correct db-xml/db directory, and make sure that the program has
      the executable bit set.
  - Install xquery-mode.el
  - Configure Emacs with some Elisp code
    - Add these two functions to your .emacs file or wherever you see
      fit:
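;; Assumed location of pipeline.pl (the example path from the install
;; step); this variable is not defined in the post, so adjust it.
(defvar query-dbxml-pipeline "/home/donnie/lib/db-xml/pipeline.pl"
  "Shell command that evaluates the XQuery it reads from stdin.")

(defun query-dbxml-with-region (beg end)
  "Query dbxml with the selected text and show the output in \"result\"."
  (interactive "r")
  (let* ((xquery (buffer-substring beg end))
         (existing (get-buffer "result"))
         (newbuffer (null existing))
         (dbxml-result (or existing (generate-new-buffer "result"))))
    (with-current-buffer dbxml-result
      (erase-buffer)
      (with-timeout
          (10 (insert "Gave up because query was taking too long."))
        (insert (query-dbxml xquery t)))
      (nxml-mode)
      ;; `format-xml' is not defined in this post; it is only called
      ;; when your setup provides it.
      (when (fboundp 'format-xml) (format-xml))
      (goto-char (point-min))
      ;; Only a freshly created result buffer is brought to the front.
      (when newbuffer (switch-to-buffer (current-buffer))))))

(defun query-dbxml (xquery &optional timed)
  "Query the Momentum Berkeley DBXML database with an XQuery string.
When TIMED is non-nil, prefix the output with the elapsed time."
  (let ((file (make-temp-file "elisp-dbxml-")))
    (unwind-protect
        (progn
          ;; Passing a string as START writes that string to FILE.
          (write-region xquery nil file)
          ;; Measure wall-clock time around the shell call.
          (let* ((start (float-time))
                 (output (shell-command-to-string
                          (concat "cat " (shell-quote-argument file)
                                  " | " query-dbxml-pipeline)))
                 (elapsed (- (float-time) start)))
            (concat
             (when timed (format "%.3f seconds\n\n" elapsed))
             output)))
      ;; Remove the temporary file even if the query signals an error.
      (delete-file file))))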
    - Bind the query-dbxml-with-region function to a key chord:
(global-set-key (kbd "<C-S-return>") 'query-dbxml-with-region)
If you really want to run this on Emacs 22, you have to get nXML mode,
which is included by default in Emacs 23.
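For readers who would rather script the setup, the install steps above can be sketched in shell. This is a rough sketch under assumptions, not the author's procedure: the helper names dbxml_bits and install_pipeline are made up for illustration, the 32/64 guess simply looks for "64" in the machine name that uname -m prints, and pipeline.txt is assumed to be sitting in the db-xml directory already. Line 6 of pipeline.pl is deliberately left alone and still has to be edited by hand.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of DB XML.

# Guess the value for buildall.sh's -b option from a machine name such
# as the one printed by `uname -m`. Simplistic assumption: any name
# containing "64" (x86_64, amd64, aarch64) is 64-bit, the rest 32-bit.
dbxml_bits() {
    case "$1" in
        *64*) echo 64 ;;
        *)    echo 32 ;;
    esac
}

# Prepare the db-xml directory given as $1: create its db subdirectory,
# rename the already-downloaded pipeline.txt to pipeline.pl, and set
# the executable bit.
install_pipeline() {
    dir=$1
    mkdir -p "$dir/db" || return 1
    mv "$dir/pipeline.txt" "$dir/pipeline.pl" || return 1
    chmod +x "$dir/pipeline.pl"
}
```

With the location used in this post, that would be install_pipeline /home/donnie/lib/db-xml, and dbxml_bits "$(uname -m)" prints the number to pass after -b.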
Now that you have this hack running, let's take it for a spin. Copy the
following XML into an empty buffer in Emacs:
<users>
  <user>
    <first>Donnie</first>
    <last>Cameron</last>
    <handle>macnod</handle>
  </user>
  <user>
    <first>Tracy</first>
    <last>Cameron</last>
    <handle>abc</handle>
  </user>
  <user>
    <first>Olivia</first>
    <last>Shriver</last>
    <handle>def</handle>
  </user>
</users>
Let's say you want to know the number of users, the number of Camerons, and
Olivia's last name. All you have to do is modify the above to look like
this:
let $xml :=
<users>
  <user>
    <first>Donnie</first>
    <last>Cameron</last>
    <handle>macnod</handle>
  </user>
  <user>
    <first>Tracy</first>
    <last>Cameron</last>
    <handle>abc</handle>
  </user>
  <user>
    <first>Olivia</first>
    <last>Shriver</last>
    <handle>def</handle>
  </user>
</users>
return element answers {
  attribute users {count($xml/user)},
  attribute camerons {count($xml/user[last = "Cameron"])},
  attribute olivias-last-name {$xml/user[first = "Olivia"]/last}}
Select the whole lot and hit Control + Shift + Enter, and BAM!, Emacs
displays the following in the result buffer:
0.067 seconds
<answers users="3" camerons="2" olivias-last-name="Shriver"/>
Of course, all of this becomes immensely more useful when you're dealing
with a 22MB file.
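If the result buffer ever comes back empty, it can help to reproduce outside Emacs what query-dbxml does: save the XQuery to a temporary file and feed it to the pipeline command. Here is a rough sketch, assuming (as the "cat file | ..." call in the Elisp implies) that pipeline.pl reads the query on standard input. The name run_xquery is made up, and the pipeline command is passed in as a parameter rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper mirroring query-dbxml: write the query text to a
# temporary file, feed that file to the pipeline command on stdin, then
# clean up and return the command's exit status.
# $1 = pipeline command (for example the path to pipeline.pl)
# $2 = XQuery text
run_xquery() {
    pipeline=$1
    query=$2
    tmpfile=$(mktemp) || return 1
    printf '%s' "$query" > "$tmpfile"
    # Input redirection stands in for the Elisp's `cat file | pipeline`.
    "$pipeline" < "$tmpfile"
    status=$?
    rm -f "$tmpfile"
    return "$status"
}
```

With the location used in this post, a quick check would be run_xquery /home/donnie/lib/db-xml/pipeline.pl 'count((1, 2, 3))'. If that prints nothing or fails, the problem is in the DB XML setup rather than in the Emacs configuration.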
--Donnie