Monday, April 28, 2008

Michael Galpin on Scala and XML, and some notes on xml.pull

Michael Galpin has written a nice overview of Scala and XML on developerWorks.

I wish he had also covered the scala.xml.pull package, but then I really should write documentation for it myself instead of leaving this to others.

The point of pull parsing is, of course, to avoid building up an in-memory representation of the XML. We might want to avoid this because

  • the XML data is just too large, or

  • we know that we are going to throw most of it away (garbage) and need the performance gain that comes from not even allocating the garbage.

There are alternatives, for instance implementing the SAX-like MarkupHandler. However, MarkupHandler and SAX are examples of push parsing, and sometimes dealing with the sequence of events explicitly is more lucid than having some handler class and managing its state with lots of control variables.
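To make the difference concrete, here is a minimal sketch that collects the text of every title element, once in push style and once in pull style. The event types (Start, End, Text) and all function names are made up for this illustration and are not the scala.xml API; a plain Iterator stands in for the parser.

```scala
// Hypothetical event types, for illustration only (not the scala.xml.pull classes).
sealed trait Event
final case class Start(label: String) extends Event
final case class End(label: String) extends Event
final case class Text(value: String) extends Event

object PushVsPull {
  // Push style: the parser calls the handler, so "where am I in the document?"
  // has to be remembered in a control variable between callbacks.
  final class TitleHandler {
    private var inTitle = false
    private val buf = List.newBuilder[String]
    def handle(e: Event): Unit = e match {
      case Start("title")     => inTitle = true
      case End("title")       => inTitle = false
      case Text(t) if inTitle => buf += t
      case _                  => ()
    }
    def result: List[String] = buf.result()
  }

  def titlesPush(events: Iterator[Event]): List[String] = {
    val h = new TitleHandler
    events.foreach(h.handle) // stands in for the parser driving the handler
    h.result
  }

  // Pull style: the consumer asks for the next event, so the position in the
  // document is simply the position in the code.
  def titlesPull(events: Iterator[Event]): List[String] = {
    val out = List.newBuilder[String]
    while (events.hasNext) {
      events.next() match {
        case Start("title") =>
          // Inside a title element: read events until its end tag.
          var done = false
          while (!done && events.hasNext) {
            events.next() match {
              case Text(t)      => out += t
              case End("title") => done = true
              case _            => ()
            }
          }
        case _ => () // everything else is skipped without being stored
      }
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    def doc = Iterator[Event](
      Start("book"), Start("title"), Text("Scala"), End("title"),
      Start("author"), Text("someone"), End("author"), End("book"))
    println(titlesPush(doc)) // List(Scala)
    println(titlesPull(doc)) // List(Scala)
  }
}
```

Both versions do the same work; the pull version just needs no field to remember that it is inside a title, because the inner loop is only reached there.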

In the absence of more elaborate documentation, here is the example from the scaladoc comment of XMLEventReader.scala and from the test, in the hope that it provides some insight into what pull parsing is. I should really give a more thorough example, but there are excellent articles on the net describing pull parsing (e.g. check out Elliotte Rusty Harold on StAX, from just a couple of years ago). That article also shows, IMHO, that pulling XML events is even more useful and readable when used with pattern matching. I will let somebody else drive that point home; here now is the scaladoc.

A pull parser that offers to view an XML document as a series of events.

import scala.xml._
import scala.xml.pull._
import scala.io.Source

object reader {
  val src = Source.fromString("<hello><world/></hello>")
  val er = new XMLEventReader().initialize(src)

  def main(args: Array[String]) {
    Console.println(er.next) // print event for start tag hello
    Console.println(er.next) // print event for start tag world
    // ...
  }
}

Events are described in file XMLEvent.scala.
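For readers who want to try the pull style without scala.xml.pull at hand, here is a rough sketch on top of StAX (javax.xml.stream), which ships with the JDK and is the API the Harold article covers. This is not the XMLEventReader API: the event classes Open, Close and Chars and the events helper are hypothetical names, introduced only so that StAX's integer event codes can be consumed with pattern matching. The input is a small document with a hello element containing an empty world element.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants, XMLStreamReader}

object StaxPull {
  // Hypothetical event classes for this sketch; scala.xml.pull defines its own.
  sealed trait Ev
  final case class Open(label: String) extends Ev
  final case class Close(label: String) extends Ev
  final case class Chars(text: String) extends Ev

  // Expose a StAX reader as an Iterator: each next() pulls one event.
  def events(xml: String): Iterator[Ev] = {
    val r: XMLStreamReader =
      XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
    new Iterator[Ev] {
      private var pending: Option[Ev] = None

      // Read ahead until an event we care about is found or the input ends.
      private def advance(): Unit =
        while (pending.isEmpty && r.hasNext) {
          r.next() match {
            case XMLStreamConstants.START_ELEMENT => pending = Some(Open(r.getLocalName))
            case XMLStreamConstants.END_ELEMENT   => pending = Some(Close(r.getLocalName))
            case XMLStreamConstants.CHARACTERS    => pending = Some(Chars(r.getText))
            case _ => () // comments, processing instructions, end of document: ignored here
          }
        }

      def hasNext: Boolean = { advance(); pending.isDefined }

      def next(): Ev = {
        advance()
        val e = pending.getOrElse(throw new NoSuchElementException("no more events"))
        pending = None
        e
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val er = events("<hello><world/></hello>")
    println(er.next()) // Open(hello)
    println(er.next()) // Open(world)
    // Nothing is built in memory; the remaining events are matched as they arrive.
    er.foreach {
      case Close(label) => println("closed " + label)
      case _            => ()
    }
  }
}
```

Because the events come out of an ordinary Iterator, the consumer decides when to stop pulling, and whatever it skips is never allocated as a tree.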

Happy XML pulling ;)