Tuesday, December 16, 2008

Growing a language

Today I needed to do some load testing on a web site. All I wanted to do was hammer it with requests and observe how well it held up. I wasn't familiar with any tools that did this, and had no desire to learn them, so I sat down wrote my own mini-framework in less than a hundred lines of Scala. It's nothing special, but I want to share bits of it to show how you can grow Scala to fit your needs.

HTTP GET

At the heart of any web testing is HTTP. Scala runs on the JVM, so I browsed through some Java libraries for fetching HTTP content, but the existing Java approaches were too verbose for my taste. Java is a systems language and it shows, especially in the library design decisions. For this particular application, I don't want to deal with the nitty-gritty of InputStreams or RetryHandlers. I just want to GET. So I borrowed a page from a Ruby library and wrote the following:

object Http {
def get(url: String): Array[Byte] = ...
}

Http.get("http://scala-tools.org/")
Thanks Ruby, that's much cleaner. (Scala's big win over Ruby? Concurrency for Grownups, as opposed to Ruby's fake kiddie toy concurrency. Thankfully configuring commons-httpclient to manage concurrent requests is fairly straightforward.)

Timing is Everything

When load testing, it's nice to know how long things take. In Java, you need to spare a few lines of code whenever you want to time an operation. In Scala, once the appropriate abstraction is written, it's a one liner.
object Time {
def apply[T](action: => T): (T, Long) = ...
def elapsed(action: => Unit): Long = ...
}

val (response, ms) = Time(Http.get("http://scala-blogs.org/"))
val ms = Time.elapsed(Http.get("http://scala-tools.org/"))
A call to Time(...) returns a pair with the result of the action and the time it took to complete that action. I'm using special val syntax to deconstruct the pair and get at its contents. If I don't care about the result of my action, a call to Time.elapsed(...) just returns how long the action took.

How does it work? The type of the parameters has a => in front of it. This means the parameters are call-by-name parameters. That just means the parameter won't get evaluated until I use it's name somewhere in my method. That means Http.get doesn't get called until my method implementation says it should be called. This lets me record the start and end times for the Http.get operation, and report the elapsed time when my method returns. (Try that in Java!)

Getting Repetitive

I want to hammer my poor web server and put it under a lot of stress, so repetition is key. Scala has ways to repeat something several times (notably, (1 to n) syntax), but for several reasons (it's lazy by default, I don't care about the indices) I wrote my own.
def repeat[T](n: Int)(what: => T): List[T] = ...

repeat(5) {
println(Time.elapsed(Http.get("http://scala-blogs.org/")))
}

val randomInts: List[Int] = repeat(10)(random.nextInt)

val latencies: List[Long] =
repeat(1000)(Time.elapsed(Http.get("http://scala-blogs.org/")))

val (latencies, totalTime) =
Time(repeat(1000)(Time.elapsed(Http.get("http://scala-blogs.org/")))
The repeat method takes a number n and an action. It performs the action n times and returns a list (of size n) with the results of the actions.

Repetition, Repetition, Repetition (in parallel!)

Of course, in order to really hammer a server we have to request stuff in parallel, not sequentially. Actors make this really easy.
def repeatParallel[T](n: Int)(what: => T): List[T] =
repeat(n)(scala.actors.Futures.future(what)).map(_.apply)
The call to future fires off an actor that will perform the given action and immediately returns a "promise". Calling apply on the promise blocks the current thread until the actor has finished computing and returned the result of the action we requested. The repeatParallel method launches n actors in parallel, and then blocks until they've all finished computing.

Load Testing

So what does my load testing code look like?
def url = ... // randomly generated url
def timedRequest = (Time elapsed (Http get url))

repeat(5) {
val (latencies, ms) = Time(repeatParallel(1000)(timedRequest))
report(latencies, ms)
}
The report will just print out some basic statistics about the latencies (min, max, average, median) as well as the total throughput (number of requests per second over the whole period).

These six lines of code will hammer my poor little server with 5,000 requests, printing a report after every thousand requests. Every request will hit a randomly generated URL, so I'm making sure to stress test all the aspects of my web service that I care about. I'm not doing any validation on the responses (I've done separate tests for correctness), in fact, I'm throwing the responses away, but I could easily add some validators. (Scala's XML libraries and it's ability to pass functions as objects would make web service validators a zinch.)

The Lesson?

Scala lets you grow the language to mold it to your needs. The guts of my testing logic is about half a dozen lines. The other hundred-or-so lines of code I wrote are hidden away in little utilities here and there, utilities that have extended Scala in tiny ways and made it a more powerful language for the particular task I had. More importantly, all these little utilities are reusable, so now I have a more powerful language to confront whatever tasks I might face tomorrow.

Wednesday, December 3, 2008

Manifests: Reified Types

Reification (n.): When an abstract concept [...] is treated as a concrete 'thing'.[1]
When Java-the-language added generic types in version 1.5, Java-the-virtual-machine (JVM) did not. Generic types are a fiction of the compiler. They exist at compile time, but they are omitted from the generated bytecode and are therefore unavailable at run time. This phenomenon is known as erasure.

By contrast, when C# added generic types in version 2.0, the .NET Common Language Runtime (CLR) added them as well. This allows C# to access information about generic types at run time. We say that C#'s generic types are reified.

Java's type erasure leads to some peculiar limitations. To deal with these limitations, Java programmers often find themselves passing around java.lang.Class<T> objects: concrete manifestations of the otherwise abstract concept that is the type "T".

Starting with version 2.7.2, Scala has added manifests, an undocumented (and still experimental) feature for reifying types. They take advantage of a pre-existing Scala feature: implicit parameters.

To use Manifests, simply add an implicit scala.reflect.Manifest[T] parameter to your method, like so:
def name[T](implicit m: scala.reflect.Manifest[T]) = m.toString
This method prints the name of any Scala type. To use it, invoke the method with some type parameter:
name[Int => Int]
// returns "scala.Function1[int, int]"
When using implicit parameters, you usually have to declare a implicit identifier with the same type as your method expects, in the same scope where the method is called. With Manifests, the compiler automatically injects the implicit parameter for you, as long as it has enough type information to generate a Manifest. Essentially, this is a way of carrying over type information available at compile time into objects available at run time.

An Example: Generic Collections Optimized for Primitives

In Java, generic collections can't be used with primitive types. The boxed types must be used instead. This has implications for both run time performance and memory consumption. Despite HotSpot's best optimization efforts, boxed types can add significant overhead in performance-critical code.

In C#, by contrast, generic collections are optimized for performance with primitive types. This is possible because C#'s types are reified. If your code calls for a List<T>, C# knows at run time what T is and can use the specific implementation of List that is optimized for your T.

Let's see what Scala can do with Manifests and a lot of help from the fastutil library, which provides 1000+ (!) Java collection classes optimized for primitives.
import scala.reflect.Manifest
import it.unimi.dsi.fastutil._

def mkList[T](implicit m: Manifest[T]) = {
(m.toString match {
case "boolean" => new booleans.BooleanArrayList
case "byte" => new bytes.ByteArrayList
case "char" => new chars.CharArrayList
case "double" => new doubles.DoubleArrayList
case "float" => new floats.FloatArrayList
case "int" => new ints.IntArrayList
case "long" => new longs.LongArrayList
case "short" => new shorts.ShortArrayList
case _ => new objects.ObjectArrayList[T]
}).asInstanceOf[java.util.List[T]]
}
Now, invoking the mkList method with the proper type (i.e., mkList[Int], mkList[Double], etc.) will create an ArrayList optimized for that particular primitive type. We can verify that the correct classes are being instantiated by doing a little reflection.
mkList[Int].getClass.getName
// returns "it.unimi.dsi.fastutil.ints.IntArrayList"

mkList[Double].getClass.getName
// returns "it.unimi.dsi.fastutil.doubles.DoubleArrayList"
These optimized collections can have considerable benefits over their unoptimized counterparts in java.util. Just imagine, for example, the memory overhead of storing boxed Bytes instead of unboxed bytes. Storing just the pointer for one boxed Byte takes up at least four (and possibly eight) bytes!

Of course, the same effect could be achieved in Java by passing an explicit java.lang.Class<T> parameter to mkList. The disadvantage is that this imposes a significant overhead on the programmer to create and pass around these Class<T> objects. Scala's support for implicit parameters, and the automatic injection of Manifests by the compiler, makes reifying types on the JVM slightly less painful.

Appendix: Other Methods on Manifests

In this example, the only method we used on Manifests was toString. What other methods are available? Manifests are as-yet undocumented (not in the official Scaladocs), but looking through the source can give us an idea of what's available.

erasure

This method returns the java.lang.Class that corresponds to the run time erasure of the Scala type represented by the Manifest.

<:< and >:>

These methods test whether the type represented by the Manifest is a subtype or a supertype (respectively) of the type represented by the given parameter. As the source warns, the current implementation is merely an approximation, as it is based on the erasure of the two types.

equals

This method tests whether two types are equal. Again, the source warns that the current implementation is merely an approximation, as it is based on the erasure of the two types.

Thanks to David Hall for suggesting the example.