Monday, April 28, 2008

Michael Galpin on Scala and XML, and some notes on xml.pull

Michael Galpin has written a nice overview of Scala and XML on developerWorks.

I wish he had also covered the scala.xml.pull package, but then I really should write documentation for it myself instead of leaving this to others.

The point of pull parsing is of course to avoid building up an in-memory representation of the XML. We might want to avoid it because

  • the XML data is just too large, or

  • we know that we are going to throw most of it away (garbage) and need the performance gain that comes from not even allocating the garbage.

There are alternatives, for instance implementing the SAX-like MarkupHandler. However, MarkupHandler and SAX are examples of push parsing, and sometimes dealing with the sequence of events explicitly is more lucid than having some handler class and managing its state with lots of control variables.
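To make the contrast concrete, here is a minimal sketch that extracts the text of every title element in both styles. It deliberately does not use scala.xml: the event types (Event, Start, End, Text) and the TitleHandler class are stand-ins invented for this illustration, so the snippet runs on its own.

```scala
// Stand-in event types for illustration only; these are NOT the scala.xml.pull classes.
sealed trait Event
case class Start(label: String) extends Event
case class End(label: String) extends Event
case class Text(text: String) extends Event

object PushVsPull {
  // Push style: the parser drives and calls the handler, so the handler has to
  // remember where it is with a control variable (inTitle).
  class TitleHandler {
    private var inTitle = false
    val titles = scala.collection.mutable.ListBuffer.empty[String]
    def startElement(label: String): Unit = if (label == "title") inTitle = true
    def endElement(label: String): Unit = if (label == "title") inTitle = false
    def text(s: String): Unit = if (inTitle) titles += s
  }

  // Pull style: the consumer drives and asks for the next event, so the
  // position in the code plays the role of the control variable.
  def titles(events: Iterator[Event]): List[String] = {
    val out = scala.collection.mutable.ListBuffer.empty[String]
    while (events.hasNext) {
      events.next() match {
        case Start("title") =>
          // Consume events until the matching end tag, keeping the text.
          var done = false
          while (!done && events.hasNext) {
            events.next() match {
              case Text(s)      => out += s
              case End("title") => done = true
              case _            => ()
            }
          }
        case _ => () // everything outside a title is skipped, never stored
      }
    }
    out.toList
  }
}
```

Both versions produce the same result; the pull version simply keeps the "am I inside a title?" question local to the place where it matters.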

In the absence of more elaborate documentation, here is the example from the scaladoc comment in XMLEventReader.scala and from the test, in the hope that it provides some insight into what pull parsing is. I should really give a more thorough example, but then there are excellent articles on the net describing pull parsing (e.g. check out Elliotte Rusty Harold on StAX, from just a couple of years ago). That article also shows, IMHO, that pulling XML events is even more useful and readable when combined with pattern matching. I will let somebody else drive that point home; here is the scaladoc.

A pull parser that offers to view an XML document as a series of events.

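import scala.io.Source
import scala.xml._
import scala.xml.pull._

object reader {
  // The document to read: a "hello" element containing an empty "world" element.
  val src = Source.fromString("<hello><world/></hello>")
  val er = new XMLEventReader().initialize(src)

  def main(args: Array[String]): Unit = {
    Console.println(er.next) // print event for start tag hello
    Console.println(er.next) // print event for start tag world
    // ...
  }
}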

Events are described in file XMLEvent.scala.
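Since the events are plain case classes, a whole document can be processed by pattern matching over the event stream. The following is a small sketch of that idea, assuming simplified stand-in events (Ev, ElemStart, ElemEnd, Chars are names made up here, not the definitions from XMLEvent.scala) so that it runs without the library. It renders an indented outline of the element structure without ever building a tree.

```scala
// Hypothetical stand-ins for the event case classes; the real ones are in XMLEvent.scala.
sealed trait Ev
case class ElemStart(label: String) extends Ev
case class ElemEnd(label: String) extends Ev
case class Chars(text: String) extends Ev

object Outline {
  // Fold over the events, tracking only the current nesting depth and the
  // lines produced so far (accumulated in reverse order).
  def render(events: Iterator[Ev]): List[String] = {
    val (_, lines) = events.foldLeft((0, List.empty[String])) {
      case ((depth, acc), ElemStart(l)) => (depth + 1, ("  " * depth + "<" + l + ">") :: acc)
      case ((depth, acc), ElemEnd(_))   => (depth - 1, acc)
      case ((depth, acc), Chars(t))     => (depth, ("  " * depth + t) :: acc)
    }
    lines.reverse
  }
}
```

With the library's own reader, the same kind of match would be written against its event classes instead of these stand-ins.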

Happy XML pulling ;)


Michael Galpin said...

Glad you liked the article! It was fun to write about Scala, though it did take a little bit of twisting of IBM's arm.

The pull parsing in Scala is very nice. It seemed to me to be very similar to a Java StAX implementation, like Woodstox. What kind of advantages does it have over StAX?

Burak Emir said...

The main advantage, IMHO, is that it uses Scala features: its events are case classes, so one can pattern-match on the events.
OTOH it does not have all the fancy low-level XML events ... this can be considered an advantage (simplicity) or a disadvantage (if you need to do fancy entity resolution or whatever).

dontcare said...

You might also want to look at vtd-xml, the latest and most advanced XML processing API available today
