Documenting Common Lisp Projects for GitHub

I want to be able to automatically create the README.md for GitHub whenever I create a new Common Lisp project.  I was impressed by the number of systems that exist for documenting a Common Lisp package.  However, I couldn’t find anything that was simple and that created Markdown output, like the README.md that GitHub uses.

I was in a hurry, so after a few minutes of Googling, I decided to create my own function for documenting packages for GitHub.  It worked.  Sort of.  It documents functions only, not macros or other definitions.  Here’s the function:

(require :sb-introspect) ; SB-INTROSPECT is an SBCL contrib and is not loaded by default

(defun document-package (package output-filename)
  "Documents the Common Lisp package PACKAGE and writes that documentation to the file given by OUTPUT-FILENAME."
  (loop for function being the external-symbols of (find-package package)
     ;; Keep only the external symbols that name documented functions.
     when (and (fboundp function) (documentation (symbol-function function) t))
     collect
       (list :function function
             :function-name (string-downcase function)
             :documentation (documentation (symbol-function function) t))
     into functions
     finally
       ;; Sort the entries by name, render a Markdown "##" section for each
       ;; function, then write the whole document to OUTPUT-FILENAME with SPEW.
       (return (loop for function in
                    (sort functions #'string<
                          :key (lambda (x) (getf x :function-name)))
                  collect (format nil "## ~a ~a~%~a"
                                  (getf function :function-name)
                                  ;; Collapse whitespace in the printed lambda
                                  ;; list and strip the package prefix.
                                  (replace-regexs
                                   (format nil "~s"
                                           (sb-introspect:function-lambda-list
                                            (symbol-function (getf function :function))))
                                   '(("\\s\\s+" " ")
                                     ("DC-UTILITIES::" "")))
                                  (getf function :documentation))
                  into function-docs
                  finally (spew (format nil "# ~a~%~{~a~^~%~%~}" package function-docs)
                                output-filename)))))

That function, when called like this: (document-package :dc-utilities "~/README.md"), produces Markdown that looks like this:

# DC-UTILITIES
## alist-values (ALIST &REST KEYS)
Returns the values associated with KEYS in ALIST.  ALIST is an associative list.

## bytes-to-uint (BYTE-LIST)
Converts the list of bytes BYTE-LIST into an unsigned integer.

## change-per-second (FUNCTION-OR-SYMBOL &OPTIONAL (SECONDS 1))
Given the function FUNCTION-OR-SYMBOL, whose return value changes over time, or a variable whose value changes over time, with the change being unidirectional, this function computes the rate of change by calling the function, sleeping, then calling the function again, and computing the rate of change per second.  You can optionally specify the number of seconds to wait between calls.  If FUNCTION-OR-SYMBOL is a variable, then this function retrieves the value of the variable, sleeps, then retrieves the value of the variable again.

## create-directory (DIR &KEY WITH-PARENTS)
Works just like the mkdir shell command.  DIR is the directory you want to create. Use WITH-PARENTS if you want the function to create parent directories as necessary.

## cull-named-params (NAMED-PARAMS CULL-KEYS)
Given a value for NAMED-PARAMS like this one

    '(:one 1 :two 2 :three 3)

and a list of CULL-KEYS like this one

    '(:one :two)

this function returns a list of named parameters that excludes the names (and their values) that match the names in CULL-KEYS.  In the above example, the result is

    '(:three 3)

## directory-exists (PATH)
Returns a boolean value indicating if the directory specified by PATH exists.

.
.
.

Which looks like this on GitHub:

[Screenshot of the rendered README on GitHub]

Is there a package out there already that can do this better?  If so, please leave me a comment.  If not, I need to find out how to get the documentation for a macro, for a method, for a special variable, for a class, and so on.  Any help would be greatly appreciated.
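
For reference, the standard documentation function already covers most of those cases once you hand it the right doc-type; the snippet below is a rough sketch rather than working package code, and my-macro, *my-special*, my-class, and my-generic are placeholder names.  Methods take a little more work because you have to enumerate them through the MOP (sb-mop on SBCL).

;; Macros share the FUNCTION doc-type with ordinary functions.
(documentation 'my-macro 'function)
;; Special variables use the VARIABLE doc-type.
(documentation '*my-special* 'variable)
;; Classes (and other types) can be asked by name or by class object.
(documentation 'my-class 'type)
(documentation (find-class 'my-class) t)
;; Methods: walk the generic function's methods and ask each one.
(loop for method in (sb-mop:generic-function-methods #'my-generic)
      collect (documentation method t))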

Other things I want this function to do:

  • Check the .asd file to see if there’s a license entry, like :license "MIT License".  If there is, the code should check to see if there’s a LICENSE file in the project.  If there isn’t, the code should determine if the name of the license is known.  If the code is able to get the text for the license, it should include it at the beginning of the documentation.  (See the sketch after this list.)
  • Optionally produce a table of contents.
  • Output other formats (aside from Markdown) as well, like Text or Emacs Org-Mode.
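
For the license item, here’s a rough sketch of how the first two steps might work, assuming the project loads through ASDF: asdf:system-license returns the :license entry from the .asd file, and asdf:system-source-directory locates the project directory to check for a LICENSE file.  Looking up the full text of a known license by name is left out, and license-text is just a placeholder name.

(defun license-text (system-name)
  "Returns the license text for SYSTEM-NAME: the contents of the project's
LICENSE file if one exists, otherwise the :license entry from the .asd file,
otherwise NIL."
  (let* ((system (asdf:find-system system-name nil))
         (license-file (when system
                         (probe-file (merge-pathnames
                                      "LICENSE"
                                      (asdf:system-source-directory system))))))
    (cond ;; Prefer the full text of a LICENSE file in the project directory.
          (license-file
           (with-open-file (in license-file)
             (let ((text (make-string (file-length in))))
               (subseq text 0 (read-sequence text in)))))
          ;; Otherwise fall back to whatever the .asd file declares.
          (system (asdf:system-license system))
          (t nil))))

document-package could then prepend the result, when it isn’t NIL, to the generated Markdown.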

For now, the document-package function is just a function in the dc-utilities package.  However, once I flesh the function out a little, I will probably move it to its own repo.  If you get ahead of me with any of this, please send me a note.  I could really use this functionality.

Predicting a User’s Movie Preferences

One thing that online video distributors (such as Netflix) don’t do yet (and that they will most certainly do in the future, since some of the tech is already in place) is to take into account the content of the movies themselves, so that they can learn a user’s likes in a way that is independent from (and complementary to) the likes of other users with similar preferences.

It’s been possible for a while, for example, to automatically identify “types” of documents and to tag or cluster those documents with a high degree of accuracy. Early last year, using a Euclidean-distance algorithm that worked on trigram-vector representations of text, I created an example of just such a thing, document affinity, using Wikipedia articles. In the example, you could choose a Wikipedia article and the system would list the other Wikipedia articles in order of similarity to the one you chose. Despite the fact that this method is rather well known, the results of that experiment were nothing short of astonishing, and provided a quality of links among Wikipedia articles that exceeds anything that exists today, much better even than the human-made links that are in the articles. Moreover, consider that the bag-of-trigrams method I just described is archaic compared to some of the algorithms that research groups are exploring today, like recurrent neural networks using word vectors (my favorite).
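
To make that concrete, here’s a toy sketch of the representation (not the original code): each document becomes a hash table of character-trigram counts, and two documents are compared by the Euclidean distance between those tables.

;; Count the character trigrams in TEXT.
(defun trigram-counts (text)
  (let ((counts (make-hash-table :test #'equal)))
    (loop for i from 0 to (- (length text) 3)
          do (incf (gethash (subseq text i (+ i 3)) counts 0)))
    counts))

;; Euclidean distance between two trigram-count tables; smaller means
;; more similar.
(defun trigram-distance (counts-a counts-b)
  (let ((keys (make-hash-table :test #'equal)))
    (loop for k being the hash-keys of counts-a do (setf (gethash k keys) t))
    (loop for k being the hash-keys of counts-b do (setf (gethash k keys) t))
    (sqrt (loop for k being the hash-keys of keys
                sum (expt (- (gethash k counts-a 0)
                             (gethash k counts-b 0))
                          2)))))

Ranking every other document by that distance to a chosen one gives the kind of ordered similarity list described above.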

Imagine if, in addition to resorting to the standard preference algorithms, an online video distributor (such as Eros Now or Netflix) were to analyze the script of the movie (subtitles, for example), identify similar movie scripts, and account for that when choosing other movies that you might like. The distributor’s predictions would improve in a dramatic way.

The technology to analyze the text of the movies exists already. But, if you look into the future, you’ll find that soon computers will be able to analyze the visual part of the movie, the frames themselves, the motion, and the sounds. Computers will be able to combine that type of analysis with script analysis for a full-content analysis that will improve preference predictions even more markedly. Google and others have already started to do some work in related areas, largely based on the contributions to deep learning that Geoffrey Hinton and his students have made in the last decade, including rectified linear units, dropout, ways of combining convolutional neural-network layers with fully-connected layers for higher abstraction, and word-vector and phrase-vector training.

We’re going to see far better preference predictions in the near future, probably from smaller companies in the beginning. In the long run, this kind of prediction, and the more advanced ones that I described above, which encroach on the realm of human cognitive abilities, will become widespread.

My Wishlist for the Future

I am waiting for the following developments and contributing in any way that I can to their fruition:

Technology

  • All languages to converge toward a Lisp-like language
  • Trillion-fold increase in computer processing power
  • Superhuman-level artificial general intelligence
  • Human mind uploads
  • Indefinite human lifespans

Government

  • Dissolution of all international borders
  • Fully distributed, internet-like, free-market, and non-invasive economy, with multiple distributed-ledger-based currencies
  • Consolidation of all taxes, including income tax, sales tax, gas tax, local taxes, and property taxes, into a single monthly bill at the local level
  • Completely open government process that collects and incorporates feedback from citizens, uses revision control, and uses open-source software to automate all aspects of government
  • Replacement of all government-issued business licenses (including medical licenses, legal licenses, engineering licenses, education licenses, and driving licenses) with commercial rating systems as well as legal and enforcement systems that can encourage honesty and deter wrongdoers
  • One-year sunset on all regulations

I fully expect all of these developments to happen in my lifetime. They will bring tremendous wealth, health, equality, meaning, and worthiness to all of humanity.