2012/12/30

A framework for CPS transformation (and a Github account)

I have implemented a framework for efficient transformation of CPS code. The code is too big to be presented in a blog post; I have set up a github account to put it (https://github.com/mlemerre/l-lang/). It has been working for several months, but I have spent a long time to improve its structure, write commented interfaces, and document it to make it easily readable, as I have done with the previous modules. Do not hesitate to tell me about any comments you may have, on the code or documentation.

The code is based on the paper "Compiling with continuations, continued" by Andrew Kennedy (which is very well written and easy to read), itself inspired by "Shrinking lambda expressions in linear time", by Andrew W. Appel and Trevor Jim.

The CPS representation in Andrew Kennedy's paper provides many interesting features:

  • It is efficiently compilable and can use a stack; see the translation of this representation to the SSA form of LLVM.
  • The representation separates continuation from normal functions; this ensures that continuations do not require heap allocations and are compiled into jumps. This allows to express control flow, and control flow optimizations, in the representation.
  • Appel and Jim, and Kennedy have developed a representation that allows efficient (in-place) rewrite of terms while maintaining the links between variables, their occurrences, and their binding sites. This allows to implement shrinking reductions (and other transformations, such as closure conversion) in linear time.

    Shrinking reductions rules are very easy to understand, and can be used as a basis for expressing guaranteed optimizations. For instance, it should be easy to state that functions used only once are inlined, that tuples used only to pass information locally are not heap-allocated, etc.

The modules I have implemented provide means to access, print, or change terms in the CPS intermediary language. The main modules, are represented on the figure below.

The Base module is the entry point for accessing the CPS representation, and the first module if trying to understand my code. It provides read-only direct access to the CPS representation, and to the links between variables, occurrences, and their binding sites. It also gives access to the other modules that implement the CPS manipulation framework:

  • Ast: Really a part of Base representation, it provides access to the CPS representation using simple algebraic datatypes of syntax tree.
  • Check: Allows checking for some invariants of terms in the CPS representation. Some information in the representation is redundant, which allows fast access; this module checks that redundant information is in sync. For instance, if a term contains a variable, it checks that the variable's uplink also points to that term.
  • Build: Provides functions that allows to create new terms in the CPS representation, without worrying about the complexities of the representation. The API of this module is based on the idea of higher-order abstract syntax, i.e. where binding creations in the destination language (L) language correspond to creation of bindings in the source language (OCaml).
  • Traverse: Allows folding and iteration on the terms, variables, and occurrences of the CPS representation. Using this module allows, in particular, code to be independent of future changes to the CPS data structures, which will be gradually improved.
  • Change: Provides high-level functions to change CPS terms. This is how transformation passes modify terms in the CPS representation.
  • Print: Provides a human-readable representation of the CPS form. I have tried to come up with a representation that is "easy" to read; the representation looks more like SSA and/or assembly than classical lambda-calculus (which make it much easier to read on huge terms). This textual representation should also be easily parsable (although I did not write the parser).
  • Def: Implements and provides accessors and a first level of abstraction to the CPS data structures. It relies on Var, which implements the relationships between variables and their occurrences (itself based on the Union_find data structure described earlier on this blog).

The other modules on the figure are Union_find and Unique, which are "support" modules; and Closure_conversion and Shrinking_reductions, which are transformation passes on the CPS representation. These two passes are not yet in the github repository.

I have a working closure conversion that gives me a complete basic working compiler for the L programming language, based on this CPS framework. I plan to document it and upload it to github very soon. I also have implemented some basic shrinking reductions.

Next I will concentrate on the parser and the L abstract syntax tree, and I will be will be covering the syntax and semantics of L in a future blog post. But here is already an excerpt of test L code (that uses first-class functions) that can be compiled to LLVM:

assert( { let true = { (x,y) -> x }
          let false = { (x,y) -> y }
          let pair = { (first,second) -> boolean -> boolean( first, second) }
          let first = { p -> p( true)}
          let second = { p -> p( false)}
          let p = pair( 7, 5)
          second( p) * (first( p) + second( p)) } == 60)

2012/12/20

Using OCaml packages with ocamlbuild: a recipe

Using OCaml packages with ocamlbuild: a recipe

Using OCaml packages with ocamlbuild can raise different error messages that are difficult to trace; there is a documentation ( http://brion.inria.fr/gallium/index.php/Ocamlbuild_and_module_packs ) that helps, but does not provide a step-by-step guide with common pitfalls, so here is one.

Compilation of bytecode files

First create a project with the following files:

File: plu/bla.ml

let bla = 3;;

File: plu/bli.ml

let bli = Bla.bla;;

File: plu.mlpack

plu/Bla
plu/Bli

File: use.ml

Plu.Bli.bli

Note: all the files do not have to be in a directory named plu/; this is just my convention.

Try to compile:

ocamlbuild use.byte
/usr/bin/ocamlc -c -o use.cmo use.ml
+ /usr/bin/ocamlc -c -o use.cmo use.ml
File "use.ml", line 1, characters 0-11:
Error: Unbound module Plu
Command exited with code 2.

Compilation will not work, probably because by default Ocamlbuild does not traverse directories. It will work if you use the -r option to ocamlbuild, or create a (even empty) _tags file at the root of the project, that will allow ocamlbuild to traverse the directories:

ocamlbuild use.byte
/usr/bin/ocamldep -modules plu/bla.ml > plu/bla.ml.depends
/usr/bin/ocamldep -modules plu/bli.ml > plu/bli.ml.depends
/usr/bin/ocamlc -c -I plu -o plu/bla.cmo plu/bla.ml
/usr/bin/ocamlc -c -I plu -o plu/bli.cmo plu/bli.ml
/usr/bin/ocamlc -pack plu/bla.cmo plu/bli.cmo -o plu.cmo
/usr/bin/ocamlc -c -o use.cmo use.ml
/usr/bin/ocamlc plu.cmo use.cmo -o use.byte

Notice that ocamlbuild has added -I plu when compiling modules of the Plu package; so for instance, Bli can access Bla.

This odd behaviour can be the cause of many headaches, so I prefer to document it here (and I filed a bug report about this issue here).

Also note that if you make a mistake in the plu.mlpack file, for instance an error in the name of the directory or in that of the module, ocamlbuild will fail without much warning. It will fail with a compile error, as before; it may also fail at the linking step, if you make a mistake in the plu.mlpack file, but you had successfully compiled a previous version. This problem can also be painful to track, so I hope this help someone.

ocamlbuild use.byte
/usr/bin/ocamlc use.cmo -o use.byte
+ /usr/bin/ocamlc use.cmo -o use.byte
File "_none_", line 1, characters 0-1:
Error: Error while linking use.cmo:
Reference to undefined global `Plu'
Command exited with code 2.

Adding native compilation

You cannot pack a set of .cmx files (resulting from compilation of a file) if you did not specified that the .cmx can be packed when you compile the file. Failure to do that result in an error like this:

ocamlbuild use.native
/usr/bin/ocamldep -modules use.ml > use.ml.depends
/usr/bin/ocamldep -modules plu/bla.ml > plu/bla.ml.depends
/usr/bin/ocamldep -modules plu/bli.ml > plu/bli.ml.depends
/usr/bin/ocamlc -c -I plu -o plu/bla.cmo plu/bla.ml
/usr/bin/ocamlc -c -I plu -o plu/bli.cmo plu/bli.ml
/usr/bin/ocamlc -pack plu/bla.cmo plu/bli.cmo -o plu.cmo
/usr/bin/ocamlc -c -o use.cmo use.ml
/usr/bin/ocamlopt -c -I plu -o plu/bla.cmx plu/bla.ml
/usr/bin/ocamlopt -c -I plu -o plu/bli.cmx plu/bli.ml
touch plu.mli  ; if  /usr/bin/ocamlopt -pack -I plu plu/bla.cmx plu/bli.cmx -o plu.cmx  ; then  rm -f plu.mli  ; else  rm -f plu.mli  ; exit 1; fi
+ touch plu.mli  ; if  /usr/bin/ocamlopt -pack -I plu plu/bla.cmx plu/bli.cmx -o plu.cmx  ; then  rm -f plu.mli  ; else  rm -f plu.mli  ; exit 1; fi
File "plu.cmx", line 1, characters 0-1:
Error: File plu/bla.cmx
was not compiled with the `-for-pack Plu' option
Command exited with code 1.

The lines:

/usr/bin/ocamlopt -c -I plu -o plu/bla.cmx plu/bla.ml
/usr/bin/ocamlopt -c -I plu -o plu/bli.cmx plu/bli.ml

Need to be compiled with the -for-pack Plu option. This is achieved by a simple change in the _tags file: just fill it with

<plu/*.cmx>: for-pack(Plu)

And now everything works:

ocamlbuild use.native
/usr/bin/ocamldep -modules use.ml > use.ml.depends
/usr/bin/ocamldep -modules plu/bla.ml > plu/bla.ml.depends
/usr/bin/ocamldep -modules plu/bli.ml > plu/bli.ml.depends
/usr/bin/ocamlc -c -I plu -o plu/bla.cmo plu/bla.ml
/usr/bin/ocamlc -c -I plu -o plu/bli.cmo plu/bli.ml
/usr/bin/ocamlc -pack plu/bla.cmo plu/bli.cmo -o plu.cmo
/usr/bin/ocamlc -c -o use.cmo use.ml
/usr/bin/ocamlopt -c -for-pack Plu -I plu -o plu/bla.cmx plu/bla.ml
/usr/bin/ocamlopt -c -for-pack Plu -I plu -o plu/bli.cmx plu/bli.ml
touch plu.mli  ; if  /usr/bin/ocamlopt -pack -I plu plu/bla.cmx plu/bli.cmx -o plu.cmx  ; then  rm -f plu.mli  ; else  rm -f plu.mli  ; exit 1; fi
/usr/bin/ocamlopt -c -o use.cmx use.ml
/usr/bin/ocamlopt plu.cmx use.cmx -o use.native

I will update this post if/when I find more weird error messages using packages with ocamlbuild; or if you find such error, just tell me!

Update: linking problem when refering to modules outside of the package

I discovered a new problem with ocamlbuild. The problems occur when you put code in a library which is included, and referenced only inside of a package; in that case ocamlbuild forgets to link with the library. The problem is well described in this mail. The author of that mail also provides a ocamlbuild plugin that works around the problem.

This seems to be a classical bug, and there is a pending bug report for it here.