Switch statements on steroids: Scala pattern matching

Scala provides a powerful pattern matching feature. Pattern matching enables us to match on much more than just ordinal values. It can be thought of as a switch statement on steroids. This post only scratches the surface of what can be done with pattern matching in Scala, but should provide you with a taste of what’s possible.

Matching based on types

The first example demonstrates how Scala can match based on type. Objects of different types are fed into the match expression and identified by their type (in this case Int, String and Map). Anything else, such as the Float 22f, falls through to the wildcard case:

    //match by type
    for (item <- List(66, "StringsAreGreat", Map.empty[Int, String], 22f)) {
      item match {
        case n: Int => println("Integers are awesome!")
        case str: String => println("Strings are the best")
        case m: Map[_, _] => println("Ooh a map! buried treasure?") //type parameters are erased at runtime, so match on Map[_, _]
        case _ => println("Something unexpected")
      }
    }
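
Because match is an expression, the same idea can also return a value instead of printing. The sketch below is not from the original example: the object TypeMatchSketch and the helper describe are hypothetical names, and the guard on the first case is added only to show how a type pattern can be refined.

```scala
// Hypothetical sketch: TypeMatchSketch and describe are illustrative names.
object TypeMatchSketch {
  // match is an expression, so the selected branch becomes the return value
  def describe(item: Any): String = item match {
    case n: Int if n < 0 => s"negative integer $n" // guard refines the type pattern
    case n: Int          => s"integer $n"
    case s: String       => s"string of length ${s.length}"
    case m: Map[_, _]    => s"map with ${m.size} entries" // key/value types are erased at runtime
    case other           => s"something unexpected: $other"
  }

  def main(args: Array[String]): Unit =
    List(66, -3, "StringsAreGreat", Map.empty[Int, String], 22f)
      .map(describe)
      .foreach(println)
}
```

Returning a String rather than calling println inside each case keeps the matching logic easy to test on its own.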
	

Matching on regular expressions

Matching with regular expressions not only finds Strings which match the regex, but also provides access to the regex’s capture groups. Note how (_,_,octet3,_) maps onto the four IP address octets in the regex. We use ‘_’ for the other three groups as we only care about the value of the third octet:

    //match with regex
    val Ipv4Addr = """(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})""".r //yes there are better regexes for this
    val TwentiethCentury = """MCM.*""".r
    for (str <- List("172.99.6.4", "MCMXV", "woof")) {
      str match {
        case Ipv4Addr(_, _, octet3, _) => println ("Found an IPv4 address with a third octet of "+octet3)
        case TwentiethCentury() => println("Received a date in the twentieth century as Roman numerals")
        case _ => println("Your input doesn't match any of our patterns")
      }
    }
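
The captured groups arrive as Strings, so any numeric check has to happen after the match. As a rough sketch (RegexMatchSketch and parseIpv4 are hypothetical names, and the 0 to 255 range check is my own addition), all four octets can be bound and validated like this:

```scala
// Hypothetical sketch: RegexMatchSketch and parseIpv4 are illustrative names.
object RegexMatchSketch {
  val Ipv4Addr = """(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})""".r

  // Returns the four octets as Ints, or None when the input is not a plausible address
  def parseIpv4(str: String): Option[List[Int]] = str match {
    case Ipv4Addr(a, b, c, d) =>
      val octets = List(a, b, c, d).map(_.toInt) // each group is 1-3 digits, so toInt is safe
      if (octets.forall(_ <= 255)) Some(octets) else None
    case _ => None // the pattern must match the whole string
  }

  def main(args: Array[String]): Unit =
    List("172.99.6.4", "999.1.1.1", "woof").foreach(s => println(s + " -> " + parseIpv4(s)))
}
```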
    

Wildcard matching with case classes

One of the most powerful ways to use pattern matching in Scala is to use case classes. Case classes enable matching based on selected attributes of objects. Note how we can filter for objects with certain attributes, using ‘_’ for cases where we don’t care what a value is:

    case class Person(name: String, hairColour: String,  age: Int)

    //matching with case classes
    for (item <- List(Person("Fred", "Brown", 44), Person("Jessica", "Blonde", 23), Person("Emma", "Blonde", 25))) {
      item match {
        case Person(_,"Brown",_) => println("A person with brown hair found. Welcome")
        case Person(name, "Blonde", 25) => println("25 year old blonde. Welcome "+name)
        case Person(name, hair, _) => println("An unexpected visitor with "+hair+" hair. Welcome "+name)
      }
    }
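
Case class patterns can also be combined with guards and with ‘@’ bindings, which capture the whole matched object. The following is only a sketch: CaseClassMatchSketch and greet are hypothetical names and the age threshold is an arbitrary assumption for illustration.

```scala
// Hypothetical sketch: CaseClassMatchSketch and greet are illustrative names.
object CaseClassMatchSketch {
  case class Person(name: String, hairColour: String, age: Int)

  def greet(p: Person): String = p match {
    case Person(name, _, age) if age < 18 => s"$name is a minor"         // guard on an extracted field
    case adult @ Person(_, "Blonde", _)   => s"Blonde adult: $adult"     // '@' binds the whole object
    case Person(name, hair, _)            => s"$name has $hair hair"     // catch-all for every other Person
  }

  def main(args: Array[String]): Unit =
    List(Person("Tom", "Brown", 12), Person("Emma", "Blonde", 25), Person("Fred", "Brown", 44))
      .map(greet)
      .foreach(println)
}
```

Cases are tried from top to bottom, so the more specific patterns need to come before the catch-all.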
	

It should be clear, even from these short examples, that pattern matching allows for much more expressive code than using a traditional switch statement.

Free online books for learning Scala

So you’ve probably heard about Scala and would like to learn it. How do you go about learning this language? There are a number of free Scala tutorials and books on the net. I’ve included links to 3 excellent free online books on Scala. Enjoy!

O’Reilly book on Scala: http://ofps.oreilly.com/titles/9780596155957/

1st Edition of Martin Odersky’s book on Scala: http://www.artima.com/pins1ed/

Core chapters from Scala for the Impatient (email address required): http://typesafe.com/resources/scala-for-the-impatient

Note: I highly recommend IntelliJ community edition + Scala plugin as your IDE of choice for learning Scala.

 

Death by boilerplate: Iterating over lines in a file in Java (vs Scala, Ruby, Groovy)

How to iterate over lines in a file in Java, Scala, Ruby and Groovy

Iterating over the lines of a small text file is a pretty common (and simple) operation. Given how routine this task is you would think that it would require minimal code. Java is quite famous for requiring lots of code to get the job done. Using the standard libraries in Java 6 we typically end up with code looking something like this:

Java 6:

    try {
        BufferedReader reader = new BufferedReader(new FileReader("blah.txt"));
        try {
            String line = null;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            reader.close(); //always release the file handle
        }
    } catch (IOException ioe) {
        System.err.println("oops " + ioe.getMessage());
    }

Every Java programmer has at one stage or another had to write code that looks a lot like this. I’ve written code like this many, many times. Even for someone who can type very quickly it’s a lot of boilerplate. It’s not particularly complicated code, just very verbose. While there are 3rd party libraries (such as Apache Commons IO) that make this much simpler, many (most?) developers would rather just use the standard libs.

This may seem perfectly reasonable to a seasoned Java dev, but for anyone coming from Python, Ruby or even C# the amount of code required for this simple task seems excessive. Java 7 improves on this somewhat with ARM (Automatic Resource Management) and by providing utility methods in java.nio.file.Files. I was originally going to write a blog post about how useful this class is for doing file reads with minimal boilerplate code, but alas it still requires some extra plumbing. The readAllLines(Path path, Charset cs) method cuts down on much of the boilerplate. Unfortunately it still requires a path object to be created, along with a charset (no option to use the default on the local system). This results in code along these lines:

Java 7:

    try {
        for (String line : Files.readAllLines(new File("blah.txt").toPath(), Charset.forName("UTF-8"))) {
            System.out.println(line);
        }
    } catch (IOException ioe) {
        System.err.println("oops " + ioe.getMessage());
    }

If we compare this with virtually any well-known modern language the difference is obvious. Consider, for example, the code required to perform this task in Scala, Ruby or Groovy:

Scala:

    Source.fromFile("blah.txt").getLines().foreach(println) //requires import scala.io.Source

Groovy:

    new File("blah.txt").eachLine { line -> println(line) }

Ruby:

    IO.foreach("blah.txt") { |line| puts line }

Notice how much less boilerplate is involved. It only takes one line of code to perform the task. The question which needs to be asked is this: why, after more than 15 years, is reading from a file in Java still more painful than in virtually any other modern language? This would be easy to fix by providing sensible methods with defaults. Imagine something more like:

    try {
        for (String line : Files.readAllLines("blah.txt")) { //what the method would look like with a simpler interface...
            System.out.println(line);
        }
    } catch (IOException ioe) {
        System.err.println("oops " + ioe.getMessage());
    }

Please Oracle, add something sensible like this to JDK 8.
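
One caveat on the Scala one-liner: it never closes the underlying file. A slightly longer sketch that does release the handle could use scala.util.Using, which assumes Scala 2.13 or later. ReadLinesSketch and readLines are hypothetical names, and the explicit UTF-8 encoding is my own assumption rather than anything the original code specified.

```scala
import scala.io.Source
import scala.util.{Try, Using}

// Hypothetical sketch: ReadLinesSketch and readLines are illustrative names.
object ReadLinesSketch {
  // Reads every line into memory; Using closes the Source even if reading fails
  def readLines(path: String): Try[List[String]] =
    Using(Source.fromFile(path, "UTF-8")) { source =>
      source.getLines().toList // materialise the lines before the Source is closed
    }

  def main(args: Array[String]): Unit =
    readLines("blah.txt").fold(
      err => System.err.println("oops " + err.getMessage),
      lines => lines.foreach(println)
    )
}
```

Errors come back as a Failure value rather than a thrown exception, so the caller decides how to handle a missing or unreadable file.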

The Big Three – Scala, Clojure and Groovy

There have recently been two large JVM language polls (poll1, poll2). These polls have yielded some very interesting data. The results of the two polls differ because the voters were drawn from different audiences. Clojure fared particularly well (ahead of Scala and Groovy) in the poll I ran, as many of the voters came from the Lisp-friendly Hacker News community. The DZone poll, which drew a slightly larger number of voters (primarily Java devs from the DZone community), favoured Groovy, with Scala in second place, followed by Clojure. One thing which stands out in the results of both polls is the clear separation between “The Big Three” JVM languages (Scala, Clojure and Groovy) and the rest. This “Tier One” group represents the alternative JVM languages which have garnered the most support among developers.

In order to get a better picture of popularity spanning both polls I combined the results and plotted a chart. “The Big Three” and JRuby (an honorable mention) are included in the chart. In the combined vote counts Scala, Clojure and Groovy are closely matched:
[Chart: "The Big Three" JVM languages]

Popularity amongst developers does not necessarily translate into commercial adoption. Indeed.com draws its data from a very large number of job websites and as such is an ideal source of data regarding commercial adoption. Running The Big Three languages through its job trends system yields some interesting results:

[Chart: The Big Three commercial adoption]

In the data both Groovy and Scala are showing signs of significant commercial adoption, with Clojure trailing. This is consistent with the DZone poll results and adds support to my theory that Clojure support draws heavily from hobbyists and Lisp hackers rather than commercial organizations. Groovy comes out on top in this chart. The Big Three contains two dynamically typed languages (Groovy, Clojure) and one statically typed language (Scala).

The most encouraging outcome of the last few years has been the flourishing ecosystem around new JVM languages. A decade ago the CLR was being proclaimed as *the* runtime to support multiple languages. Thanks to the community the JVM is looking more and more like the preferred target for new languages, innovation and research.

Which alternate JVM language do you prefer?

Over the last few years the JVM language landscape has evolved rapidly, with a number of very interesting new languages appearing, some of which are starting to enter mainstream usage. In order to get some data on the preferences of Java programmers I’ve decided to run an opinion poll. Please vote below for the alternate JVM language you like the most. I’ve included all of the better known JVM languages (and even a few that aren’t so well known). The ordering of the options is randomized. [UPDATE: It seems that voters coming from Hacker News favour Clojure over Scala by a factor of 1.4, whereas those from more Java-centric sources (DZone) favour Scala and Groovy. This would explain the difference in results between this poll and the DZone-only poll posted in the comments section. Currently Scala and Clojure are quite closely matched in my poll.]

Excellent Scala videos on YouTube

YouTube is a surprisingly good source of introductory content on Scala. I’ve linked to some of the most interesting videos below:

Excellent video introducing Scala. I think the presenter is Venkat Subramaniam:
http://www.youtube.com/watch?v=LH75sJAR0hc

Martin Odersky discussing upcoming features in Scala 2.10 and Scala Adoption:
http://www.youtube.com/watch?v=qqQNqIy5LdM
Accompanying slides can be found here:
http://mrkn.co/s/video_martin_odersky_what_s_next_for_scala,575/index.html

Scala Adoption

Word on the street is that Scala is increasingly being used by Java developers as a “Better Java”. As a longtime Java developer I find this language very appealing, something which has many of the same strengths as Java, but which is more expressive. In order to get an idea of Scala adoption in the wild I did some basic research and found some interesting results.

The job search site Indeed.com has some very neat trend identification tools. The graph below shows the rapid up-tick in Scala adoption in recent times:

The absolute percentage of job listings mentioning Scala is still low, but the growth trend is clear.

Since I’m curious to find out who is using Scala I combined the list of organisations known to be using Scala from the official Scala website with some sleuthing around the job ads. I combed through the job ads on indeed.com listing Scala and put together a list of well known organizations listing Scala in their job ads*. Combined with the organizations listed on the Scala website I get the following (in no particular order):

  • HP
  • Twitter
  • Google
  • eBay
  • VMware
  • Salesforce.com
  • Oracle
  • Amazon
  • Accenture
  • eHarmony
  • Coca-Cola
  • Tumblr
  • University of California Berkeley
  • McGraw Hill
  • Heroku
  • Foursquare
  • LinkedIn
  • the Guardian
  • Morgan Stanley
  • Credit Suisse
  • UBS
  • HSBC
  • AT&T
  • University of Arizona
  • General Dynamics
  • Thomson Reuters
  • Deutsche Bank Global Technology
  • Huawei
  • Standard and Poor’s
  • AOL
  • Yammer

*Note that a job ad simply needed to mention Scala to be included here, which doesn’t mean that the organization is necessarily using it extensively, or even at all. In a few cases ads would say “knowledge of a functional language like Erlang, Scala or Haskell an advantage” or similar. Even with that caveat in mind, the list of organizations taking notice of Scala is very impressive indeed!