Sonntag, 9. Oktober 2011

Trivial GNU Readline support

For some reason only few Common Lisp (CL) implementations has integrated "GNU Readline"-like funktionality. For most implementations you can download and use modules, like sbcl-readline for SBCL and ecl-readline for ECL, which are available on CLiki.

Here we will write trivial but extendable and portable GNU Readline support with following features:
  • readline function as main API
  • CL completion for symbols
  • Autoformatting and pretty printing of lisp code
  • Extendable prompt and completion functionality
  • Stream API to readline or similar function
  • API for integration in REPL, at least for SBCL and ECL

We begin the implementation of the package with the description of the API in first four code blocks, so you get a feeling for the scope of functions. The last blocks defines the meta information about our package and describes the build system. In the links the "U" points to positions, where the code is used and "D" points to positions where the code is defined. At the end of the article is a index of all code blocks and defined symbols.

This package I call trivial-readline:
<Trivial readline package name>= (U-> U-> U->)
:trivial-readline
Defines trivial-readline (links are to index).


1. The main function of our package is the readline function:

<Readline function API>= (U->)
readline (prompt &key (subprompt "->")
                      (name "trivial-readline")
                      (quotes "\"")
                      (completion #'default-completion))

This function reads from standard input one line of lisp code. It is possible to define different completion function, but per default it will use our own lisp completion function:

<Default completion function API>= (U->)
default-completion (text start end)

Here the text argument is text given so far and start/end points to the word which should be completed.

Although our library is completely based on CL data, it should also define and export a function that reads simple strings:

<String readline function API>= (U->)
string-readline (prompt)

The API symbols must be exported, to be usable from outside:

<Exported API symbols>= (U->) [D->]
:readline
:string-readline
:default-completion


2. With GNU Readline integration this package should also support the history functionality, therefor it needs functions to manipulate history. The first function appends one line to the history:

<Add history function API>= (U->)
add-history (text)

The history will be stored to a file and we will define an parameter for the filename:

<Global definitions>= (U->) [D->]
(defparameter *history-file* <Default history file name>)
Defines *history-file* (links are to index).

Typicaly history file is an file in the user home diretory:

<Default history file name>= (<-U)
(merge-pathnames (make-pathname :name ".trivial_readline_history")
                 (user-homedir-pathname))

And we need function to read from the history file to our environment:

<Read history function API>= (U->)
read-history ()

It takes no arguments, because it reads the entries from the file name parameter and stores it in the GNU Readline environment, similar is also the write function:

<Write history function API>= (U->)
write-history (&optional (data nil))

The write-history has an optional data argument, to combine the princ-to-string of the lisp data, add-history to add the line string to current environment and write-history to write the history to our history file. When it's called without the optional argument, then it works like regular write-history function.

<Exported API symbols>+= (U->) [<-D->]
:*history-file*
:add-history
:read-history
:write-history


3. Often it is very handy to have an stream class, which reads data with the readline function, so you can easy integrate readline functionality to existing applications. We define for it an input stream class:

<Readline stream class API>= (U->)
readline-stream (fundamental-character-input-stream)

The fundamental-character-input-stream class comes from trivial-gray-streams package and represents standard CL stream. So we create a new type of standard input stream, which reads data with our readline function, enstead of a file descriptor or similar. Such a stream is everywhere usable, where a standard CL stream is required.

We define for our stream also an accessor to get and set the effective function, which should return string lines. It is defined as accessor to an class slot:

<Readline accessor method API>= (U->)
:accessor readline-stream-readline
Defines readline-stream-readline (links are to index).

In this way it is possible to write own readline functions and use them in our stream objects.

<Exported API symbols>+= (U->) [<-D->]
:readline-stream
:readline-stream-readline


4. Our readline function have also a prompt and is either a string or a function. The prompt is printed before reading the line. In case of a function as prompt, this function is executed and should return the prompt string. We define in our package a default prompt function:

<Default prompt function API>= (U->)
default-prompt ()

<Exported API symbols>+= (U->) [<-D->]
:default-prompt


5. We begin the implementation of our package with the readline function. The readline function initialize the GNU Readline subsystem and reads history on each call, to have correct history in different readline calls.

<Readline function>= (U->) [D->]
(defun <Readline function API>
  <Initialize readline functionality>
  <Read history from the history file>
  <Read lines until get full expression>)
Defines readline (links are to index).

Our readline function calls the readline function from GNU Readline library multiple times, as long as the result dont give a full lisp expression. The expression is then appended to the history with write-history function and returned:

<Read lines until get full expression>= (<-U)
(loop for curr = (string-readline (prompt-string prompt))
            then (string-readline (prompt-string subprompt))
      for lines = (join-lines lines curr)
      for (readen . data) = (deserialize-from-string lines)
      when (and (not curr) (not readen)) do (return nil)
      when readen do (return (write-history data)))

This function uses on the first line the main prompt and the subprompt on the consequent lines. The readen lines are joined with join-lines and parsed with deserialize-from-string, to which we will look later.

<Readline function>+= (U->) [<-D]
(defun join-lines (line1 line2)
  (if (null line2) line1
    (with-output-to-string (fd)
      (when line1 (write-line line1 fd))
      (write-string line2 fd))))
Defines join-lines (links are to index).

When string-readline returns a value of nil, then reading from input provided an end of file. When we have end of file on input and the already readen data could not be parsed, then we return nil. But if the data could be readen fully, then add this data to the history, write the history and return this data. Otherwise we call string-readline again to get more data.

To have an working history, we need to read the history file before readline is started. We could simply use the current history in the GNU Readline environment, enstead to read it on each call. But this way you can use different readline functions with different histories in the same program. This behaviour has the side effect, that each time you enter a command somewhere, all other processes, which uses the same history file will get this line in its history. You can see it as a feature or as a bug, but I like it.

<Read history from the history file>= (<-U)
(read-history)

The string-readline function cals callback function on pressing TAB for completion. This function should return list of strings, as proposes for completion. So we need to say the readline function our completion callback given in argument key completion:

<Initialize readline functionality>= (<-U) [D->]
(setf *current-completion* completion)


6. The default prompt shows only the current package name:

<Default prompt function>= (U->) [D->]
(defun <Default prompt function API>
  (format nil "~a> " (package-name *package*)))
Defines default-prompt (links are to index).

The prompt can be defined as string or function, so we need the prompt-string function, which returns string for the readline function.

<Default prompt function>+= (U->) [<-D]
(defun prompt-string (prompt)
  (coerce (cond ((null prompt) "")
                ((functionp prompt) (funcall prompt))
                (t prompt))
          'string))
Defines prompt-string (links are to index).


7. Stream is not directly depended to the GNU Readline functionality, but defines general input stream with line-based buffer. To define it, we need the Gray Streems, which implements extendable streams.

<External dependencies>= (U-> U->) [D->]
:trivial-gray-streams

The stream class contains buffer and current position in the buffer. This is an character input stream, so it will be readen character wise:

<Readline stream class>= (U->)
(defclass <Readline stream class API>
  ((in-buffer :initform (make-string 0))
   (in-index  :initform 0)
   (readline  <Readline accessor method API>
              :initform (lambda () (readline #'default-prompt)))))
<Method for character reading>
<Method for unreading character>
Defines in-buffer, in-index, readline-stream (links are to index).

The input stream logic is very simple:

<Method for character reading>= (<-U)
(defmethod stream-read-char ((stream readline-stream))
  (with-slots (in-buffer in-index readline) stream
    <Fill the readline stream buffer if required>
    <Return next character from stream buffer, or end of file>))
Defines stream-read-char (links are to index).

Everytime the read method of this stream comes to the length of current buffer, it calls the readline function from readline slot, converts it to string and resets the index:

<Fill the readline stream buffer if required>= (<-U)
(when (= in-index (length in-buffer))
  (let* ((data (funcall readline))
         (line (serialize-to-string data :newline t :escape t)))
    (setq in-index 0 in-buffer line)))

The conversion to string is required, because we define an character stream, which returns charaters or strings, but no other data. The string should be then escaped, it means so as entered, and with newline character at the end.

On read, if readline function returns nil, the stream should handle it as end of file, otherwise simply return next character and increment the index:

<Return next character from stream buffer, or end of file>= (<-U)
(if (not in-buffer) :eof
  (prog1 (char in-buffer in-index)
    (incf in-index)))

The specification of Gray Streams defines, that also the stream-unread-char must be defined, in our implementation it simply decrases index, if possible:

<Method for unreading character>= (<-U)
(defmethod stream-unread-char ((stream readline-stream) character)
  (with-slots (in-index) stream
    (if (> in-index 0)
      (decf in-index))))
Defines stream-unread-char (links are to index).


8. One of the main differences between trivial-readline and GNU Readline, is the transpartent handling and formatting of CL data. To do it, we need a conversion function from data to an string, which can be converted back to data simply with read or read-from-string function:

<String serialization functions>= (U->)
(defun serialize-to-string (data &key (escape nil) (newline nil))
  (when data
    (let ((data (write-to-string data :escape escape
                                      <String printing parameters>)))
      (if (not newline) data
        (format nil "~A~%" data)))))
<String deserialization function>
Defines serialize-to-string (links are to index).

We use the standard formatter of CL for this functionality, but to avoid unexpected results in the case a user changes some global parameters, we should set all parameter explicitely.

<String printing parameters>= (<-U)
:array t          ; Output arrays
:base 10          ; Decimal numbers
:case :downcase   ; Downcase letters
:circle t         ; Print recursion correctly
:gensym t         ; Print gensyms correctly
:length nil       ; No length limit
:level nil        ; No depth limit
:lines nil        ; No lines limit
:pretty t         ; It should look nice
:radix nil        ; No radix output
:readably t       ; It should be all in one line
:right-margin nil ; No right margin limit

The deserialization function is the regular read function, but with one exception. The deserialize-from-string returns a cons, whose first argument is a boolean value and the secound argument reprents the actual data. This boolean value indicates whether the deserialization was successful:

<String deserialization function>= (<-U)
(defun deserialize-from-string (line)
  (if (null line) (cons nil nil)
    (handler-case (cons t (read-from-string line))
      (end-of-file (e) (cons nil e)))))
Defines deserialize-from-string (links are to index).

This are usefull serialization and deserialization functions, so we export them to:

<Exported API symbols>+= (U->) [<-D->]
:serialize-to-string
:deserialize-from-string


9. The completion functionality is done per default by default-completion function. This function uses the cl-ppcre package to analyze the text, we are trying to complete. So we define cl-ppcre as external dependency:

<External dependencies>+= (U-> U->) [<-D->]
:cl-ppcre

The default-completion function takes the word, we are trying to complete, analyzes it with regular expressions and generates suggestions:

<Default completion>= (U->)
<Macro to find symbols>
(defun <Default completion function API>
  (let ((word (subseq text start end)))
    (or (find-symbols (pkg sym) ("^([^:]+):([^:]+)$" word) :ext pkg sym)
        (find-symbols (pkg sym) ("^([^:]+)::([^:]+)$" word) :int pkg sym)
        (find-symbols (sym) ("^(:[^:]+)$" word) :int :keyword sym)
        (find-symbols (pkg) ("^([^:]+):$" word) :ext pkg "")
        (find-symbols (pkg) ("^([^:]+)::$" word) :int pkg "")
        (find-symbols (sym) ("^([^:]+)$" word) :int *package* sym))))
Defines default-completion (links are to index).

The actual job is done by find-symbols funtion, which takes regular expression and generates fo the given word and package a list of suggestions. If regular expression does not match, or no suggestions are found, then it returns nil:

<Macro to find symbols>= (<-U)
(defmacro find-symbols (vars regex styp pkg sym)
  `(register-groups-bind (,@vars) (,@regex :sharedp t)
     (let* ((pkg (search-for-package ,pkg))
            (sym (regex-replace "^.*:+" ,sym ""))
            (len (length sym))
            (slist nil))
       (if (null pkg) nil
         (,(if (eql styp :ext) 'do-external-symbols
             (if (eql styp :int) 'do-symbols styp))
           (found pkg slist)
           (let* ((found (serialize-to-string found :escape t))
                  (name (regex-replace "^.*:+" found "")))
             (when (and (>= (length name) len)
                        (string-equal (subseq name 0 len) sym))
               (push found slist))))))))
Defines find-symbols (links are to index).

If no suggestions are found at all, then GNU Readline should call his own default completion functionality.
Other function we need, is search-for-package to take package name and return the package object. This function is used in a macro, so it needs to be defined at comple time.

<Search for package function>= (U->)
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun search-for-package (pkg)
    (cond ((stringp pkg) (find-package (read-from-string pkg)))
          ((symbolp pkg) (find-package pkg))
          ((packagep pkg) pkg)
          (t (error "'~S' can not be used to search for package" pkg)))))
Defines search-for-package (links are to index).


10. To integrate GNU Readline we use CFFI package. This packages loads the appropriate library and give us the possibility of calling functions of this library in CL. We will prefix all externaly implemented functions with "%" sign, because they require special handling for arguments and return values.

<Import external GNU Readline library>= (U->)
(define-foreign-library libreadline
  (:unix "libreadline.so")
  (t (:default "libreadline")))
(use-foreign-library libreadline)
<Declaration of external variables>
<Declaration of external functions>

We need also to define CFFI as external dependency:

<External dependencies>+= (U-> U->) [<-D]
:cffi

The first function, we should define here, is %readline:

<Declaration of external functions>= (<-U) [D->]
(defcfun (%readline "readline") :pointer (prompt :pointer))
Defines %readline (links are to index).

We need for readline the function to generating completion results:

<Declaration of external functions>+= (<-U) [<-D->]
(defcfun (%matches "rl_completion_matches")
  :pointer (text :pointer) (func :pointer))
Defines %matches (links are to index).

We cannot allocate memory for strings with CFFI functions, because the readline function frees this memory by self. We will use here simply strdup function, to copy our lisp strings:

<Declaration of external functions>+= (<-U) [<-D->]
(defcfun (%strdup "strdup") :pointer (text :pointer))
Defines %strdup (links are to index).

In the same way we declare also the function for history manimulations:

<Declaration of external functions>+= (<-U) [<-D]
(defcfun (%read-history "read_history") :void (fname :pointer))
(defcfun (%write-history "write_history") :void (fname :pointer))
(defcfun (%add-history "add_history") :void (text :pointer))
(defcfun (%clear-history "clear_history") :void)
Defines %add-history, %clear-history, %read-history, %write-history (links are to index).

To have a readline function which works with strings, we define the string-readline function which converts the arguments and return values automaticaly:

<String readline function>= (U->)
(defun <String readline function API>
  (convert-from-foreign
    (with-foreign-string (cp prompt)
      (%readline cp))
    :string))
Defines string-readline (links are to index).


11. GNU Readlien defines some variables, which need to be used or set respectively. Therefore we import it to our package with CFFI. The first one, is the buffer with current text:

<Declaration of external variables>= (<-U) [D->]
(defcvar (*rl-line-buffer* "rl_line_buffer") :pointer)
Defines *rl-line-buffer* (links are to index).

The remaining variables are settings to the readline system:

<Declaration of external variables>+= (<-U) [<-D]
(defcvar (*rl-readline-name* "rl_readline_name") :pointer)
(defcvar (*rl-quote-chars* "rl_basic_quote_characters") :pointer)
(defcvar (*rl-completion-func* "rl_attempted_completion_function") :pointer)
Defines *rl-completion-func*, *rl-quote-chars*, *rl-readline-name* (links are to index).

This variables should be set on each call to the readline function:

<Initialize readline functionality>+= (<-U) [<-D]
(setf *rl-readline-name* (convert-to-foreign name :string))
(setf *rl-completion-func* (callback complete-cb))
(setf *rl-quote-chars* (convert-to-foreign quotes :string))

The convert-to-foreign function comes from CFFI and converts the lisp data to C structures, in this case it converts a lisp string to C string. The callback-cb is a wrapper, which calls the defined callback and converts the result to C structures:

<Completion callback wrapper>= (U->)
<Completion callback for entries>
(defcallback complete-cb :pointer ((ctext :pointer) (start :int) (end :int))
  (let* ((line (foreign-string-to-lisp *rl-line-buffer*))
         (*current-result*
           (if (not *current-completion*) nil
             (coerce (funcall *current-completion* line start end) 'list))))
    (if (eql *current-result* nil) (null-pointer)
      (%matches ctext (callback complete-entry-cb)))))
Defines complete-cb (links are to index).

This functions communicates with our lisp environment through dynamic variables *current-result* and *current-completion*, so we need to define it as such:

<Global definitions>+= (U->) [<-D]
(defvar *current-completion* nil)
(defvar *current-result* nil)

We use here the callback, spezified in the readline function, but convert the arguments and the return value properly. Here is also an additional callback used, which is not required by self, but is required by the %matches function. In our case, it is basicaly the call to pop function, but with C data handling:

<Completion callback for entries>= (<-U)
(defcallback complete-entry-cb :pointer ((ctext :pointer) (state :int))
  (declare (ignore ctext state))
  (if (not *current-result*) (null-pointer)
    (with-foreign-string (rettext (pop *current-result*))
      (%strdup rettext))))
Defines complete-entry-cb (links are to index).


12. Through CFFI the history managment functions are imported, but to be usable, we need to define wrappers with argument and return values conversions:

<History functions>= (U->) [D->]
(defun <Add history function API>
  (with-foreign-string (x text)
    (%add-history x)))
Defines add-history (links are to index).

The read-history function automaticaly clears the history, before reading it:

<History functions>+= (U->) [<-D->]
(defun <Read history function API>
  (when *history-file*
    (with-foreign-string (fname (princ-to-string *history-file*))
      (%clear-history)
      (%read-history fname))))
Defines read-history (links are to index).

To have a convinient way to add entries to the history, we define a write-history function, which automaticaly serializes the data, adds it, writes it to the history file and finaly returns the given data:

<History functions>+= (U->) [<-D]
(defun <Write history function API>
  (let ((line (serialize-to-string data :escape t)))
    (prog1 data
      (when data (add-history line))
      (when *history-file*
        (with-foreign-string (fname (princ-to-string *history-file*))
          (%write-history fname))))))
Defines write-history (links are to index).


13. It would be nice if you could use the readline function in REPL. Unfortunately, every CL defines its own interface for adjusting the REPL. For this reason, we define a function, which makes its own adjustments on each implementation:

<REPL activation function>= (U->)
(defun activate-in-repl ()
  #+sbcl (progn <REPL activation in SBCL>)
  #+ecl  (progn <REPL activation in ECL>)
  #-(or sbcl ecl) (error "only sbc and ecl are supported for now")
  (values))
Defines activate-in-repl (links are to index).

On SBCL we need to so set the sb-impl::*repl-read-form-fun* variable to replace the read function on REPL:

<REPL activation in SBCL>= (<-U) [D->]
(setf sb-impl::*repl-read-form-fun*
      (lambda (in out)
        (declare (ignore in out))
        (readline #'default-prompt :name "sbcl")))

To disable SBCL's own prompt, we need to set the sb-impl::*repl-prompt-fun* variable:

<REPL activation in SBCL>+= (<-U) [<-D]
(setf sb-impl::*repl-prompt-fun*
      (lambda (stream) (declare (ignore stream)) (terpri)))

In ECL it is more complicated, we need to replace the *standard-input* with our own stream and we need to disable ECL's prompt:

<REPL activation in ECL>= (<-U)
(let* ((input (make-instance 'readline-stream))
       (output (two-way-stream-output-stream *terminal-io*))
       (inout (make-two-way-stream input output))
       (lastlevel system::*break-level*))
  (setf system::*tpl-prompt-hook* (lambda () (fresh-line))
        (readline-stream-readline input)
            (lambda ()
                (if (not (= system::*break-level* lastlevel))
                    (progn (setf lastlevel system::*break-level*) t)
                    (readline (lambda ()
                                (if (> system::*break-level* 0)
                                    (format nil "DBG ~A> " system::*break-level*)
                                    (funcall #'default-prompt)))
                              :name "ecl")))
        *standard-input* input
        *terminal-io* inout
        system::*standard-input* inout))

I've decided to use a history file for both implementations. This allows the things I've tried in a system very easy to try in another, just where I take it from history.

<Exported API symbols>+= (U->) [<-D]
:activate-in-repl



14. Our resulting code file looks like follows:

<trivial-readline.lisp>=
(in-package <Trivial readline package name>)
;;
;; Package parameters
<Global definitions>
;;
;; CFFI - integration of libreadline.so
<Import external GNU Readline library>
<Completion callback wrapper>
;;
;; Main functions
<Readline function>
<String readline function>
<String serialization functions>
;;
;; Completion and history
<Search for package function>
<Default completion>
<Default prompt function>
<History functions>
;;
;; Inputstream and integration in REPL
<Readline stream class>
<REPL activation function>



15. To have a fully working package, we need to define the trivial-readline package in the package file:

<package.lisp>=
(in-package :cl-user)
(defpackage <Trivial readline package name>
  (:nicknames :rl)
  (:use :common-lisp
        <External dependencies>)
  (:export <Exported API symbols>))

To be usable with ASDF, we should also define an .asd file:

<trivial-readline.asd>=
(in-package :cl-user)
(defpackage :rl.system
  (:use :cl :asdf))
(in-package :rl.system)
(defsystem <Trivial readline package name>
  :version "0.1"
  :author "Oleksandr Kozachuk <ddeus.lp@mailnull.com>"
  :license "WTFPL"
  :serial t
  :depends-on (<External dependencies>)
  :components ((:file "package")
               (:file "trivial-readline")))


16. This package is written with WEB, a computer programming concept created by Donald E. Knuth. The particular implementation used here is NoWEB developed by Norman Ramsey. This whole package is written in one NoWEB File and we write here a shell script, which uses installed NoWEB distribution and generates all required files:

<build.sh>=
#!/bin/sh
export FINDUSES_LISP=1
noweave -index -html -n trivial-readline.nw | sed '/^ *$/d' > trivial-readline.html
notangle -Rbuild.sh trivial-readline.nw >build.sh
notangle -Rtrivial-readline.lisp trivial-readline.nw >trivial-readline.lisp
notangle -Rpackage.lisp trivial-readline.nw >package.lisp
notangle -Rtrivial-readline.asd trivial-readline.nw >trivial-readline.asd

The FINDUSES_LISP variable is for my patched version of NoWEB which has better support for CL symbols. This patch is not realy required, but makes better formatting of this document. This patch can be downloaded here.


17. This packages is available as NoWEB source here:
trivial-readline.nw

You can build the build.sh script with the command:

notangle -Rbuild.sh trivial-readline.nw >build.sh
With the build.sh script you can produce the rest of files, inclusive this HTML page.

Code chunk index