Thursday, April 30, 2009

Combinators for Pretty Printers, Part 1

Scala has a flexible and powerful syntax that is very convenient when building libraries. Among the best examples of Scala's syntactic power put to excellent use are the parser combinator libraries in the standard distribution. Code that uses Scala's parser combinators looks remarkably like BNF (the standard notation for describing formal language grammars) but is actually an executable program which parses the language it describes.

Consider the BNF for a simple expression grammar:

expr     ::= sum
sum ::= product { ("+" | "-") product }
product ::= power { ("*" | "/") power }
power ::= factor [ "^" factor ]
factor ::= "(" expr ")" | variable | constant
variable ::= identifier
constant ::= floatLiteral
And the equivalent Scala code using parser combinators:
def expr     = sum
def sum = product ~ rep(("+" | "-") ~ product)
def product = power ~ rep(("*" | "/") ~ power)
def power = factor ~ opt("^" ~ factor)
def factor = "(" ~ expr ~ ")" | variable | constant
def variable = ident
def constant = floatLit
If you want to learn more about Scala's parser combinators, there are several good articles on them by Debasish Ghosh, Daniel Spiewak, Jim McBeath, and Ted Neward.

What I want to write about today are pretty printer combinators. If a parser takes a stream of characters and turns them into an abstract syntax tree, a pretty printer does the opposite: it turns some tree into a string, subject to some constraints (to keep the output "pretty"). I've found myself needing to programmatically generate Scala code, but I want the output to be human-readable as well as valid Scala syntax. There is no equivalent to BNF for pretty printers, but my goal is to create a library of combinators to encode pretty printing rules in a custom syntax.

A Pretty Printer Algebra

Lets put aside concerns about syntax for now and think about the underlying mechanics of pretty printers. Phil Wadler, the same person who popularized parser combinators, describes an algebra for pretty printers in his paper "A prettier printer". Without getting into the details of why this is an "algebra", let's take a look at how Wadler's insights might be encoded in Scala.
sealed abstract class Doc
case object Empty extends Doc
case object Line extends Doc
case class Text (s: String) extends Doc
case class Cons (left: Doc, right: Doc) extends Doc
This code describes four kinds of documents. The Empty document is self-explanatory. The Line document represents a new line. The Text document represents raw text (which MAY NOT contain new lines). The Cons document joins together two other documents. Given these rules, it's trivial to define a layout method that turns a Doc into a String.
def layout(d: Doc): String = d match {
case Empty => ""
case Line => "\n"
case Text(s) => s
case Cons(l, r) => layout(l) + layout(r)
}
Let's see how we might use this to encode a document that contains a block of Scala syntax:
val doc =
Cons(Text("def theAnswer {"), Cons(Line,
Cons(Text(" var i = 42"), Cons(Line,
Cons(Text(" println(i)"), Cons(Line,
Text("}")))))))

layout(doc)
// returns:
// "def theAnswer {
// var i = 42
// println(i)
// }"
These building block are sufficient to encode any document with any layout. The only problem is, well, they're not very useful. For example, we had to manually specify the indentation level of the code inside the brackets for each line independently. Keeping track of details like this is tedious, error-prone, and mechanical; it's just the kind of thing we should let computers do for us. With this use-case in mind, let's add a new kind of document:
case class  Nest  (n: Int, d: Doc)        extends Doc
The Nest document is actually a transformation on an existing document. It modifies the preexisting document by adding n spaces after every line found in the enclosed document. We can encode the same document we did before, with the same final layout, without manually adding spaces but instead using our new Nest document:
val doc =
Cons(Text("def theAnswer {"), Cons(Nest(2, Cons(Line,
Cons(Text("var i = 42"), Cons(Line,
Text("println(i)"))))), Cons(Line,
Text("}"))))

layout(doc)
// returns: same as above
This also means our layout method needs to take into account the new Nest document.
def layout(d: Doc): String = d match {
... as above ...
case Nest(n, Empty) => layout(Empty)
case Nest(n, Line) => "\n" + (" " * n)
case Nest(n, Text(s)) => layout(Text(s))
case Nest(n, Cons(l, r)) => layout(Cons(Nest(n, l), Nest(n, r)))
case Nest(i, Nest(j, x)) => layout(Nest(i+j, x))
}
The second case gives the key function of nesting: it adds n spaces after every newline. The rest of the cases state that nesting does nothing to Text and Empty, that it applies to both documents inside a Cons, and that nested Nests just add together.

There's still the issue of syntax. We'd like something a little prettier than the mess of nested parentheses we have now. Let's add an implicit, a few shortcuts, and a convenient definition for block which puts stuff in braces and automatically idents it.
sealed abstract class Doc {
def ~(that: Doc) = Cons(this, that)
def ~~(that: Doc) = this ~ space ~ that
def <~(that: Doc) = this ~ line ~ that
def ~>(that: Doc) = this ~ nest(2, line ~ that)
}

implicit def string(s: String): Doc = Text(s)
val line = Line
val space = Text(" ")
def nest(n: Int, d: Doc) = Nest(n, d)
def block(d: Doc): Doc = space ~ "{" ~> d <~ "}"
This lets us define our simple document like so:
val doc =
"def theAnswer" ~ block(
"var i = 42" <~
"println(i)"
)

layout(doc)
// returns:
// "def theAnswer {
// var i = 42
// println(i)
// }"
Much better!

We've assembled enough functionality to have a rudimentary pretty printer combinator. With this we can build pretty printers from very simple combinators. There's still a lot of work to be done: higher-level combinators, columns and alignment, and the ability to describe groups of layouts and to choose among them the one that best fits the width of the page. We'll go through all this and more in Part 2 of Combinators for Pretty Printers.

3 comments:

Jonathan said...

In part 2, perhaps you could discuss scala.text.Document. Purely by cooincidence while evaluating Scala, I was browsing the API and randomly looked at that package. What most struck we was that the extent of the scaladoc was a single sentence: "A basic pretty-printing library, based on Lindig's strict version of Wadler's adaptation of Hughes' pretty-printer."

It fills me with dread that a package could get into Scala's standard library with such documentation.

vehusake said...

Very enlightening and beneficial to someone whose been out of the circuit for a long time.

Aromatherapy Treatments

vajofrey said...

Did you heard what Rob Matts said about that?

clomid