mashiblog

Friday, January 5, 2007

Java Closures on JavaPolis 2006

Closures the def

The cherry on the cake during JavaPolis 2006 was the talk Neal Gafter gave on closures; you can see the slides with the audio stream here. It seems that Java will finally get closures in Dolphin (Java 7). So what are closures? Closures are blocks of code that can reference variables defined in the enclosing scope and can be passed as arguments to a method or a function. If this is the first time you hear about closures, that sentence is anything but clear, so here's an example in Ruby given by Charles Miller, Confluence lead developer, on his blog:

IO.foreach("foo.txt") do |line|
  if (line =~ /total: (\d+)/)
    puts $1;
  end
end

So what does this code do? It opens a text file and passes each line of the file to a block of code, which is the closure; the code in the closure prints the number whenever a line matches 'total:'. Leaving aside the iteration aspect, convenient as it is, it would appear that something is missing in the code above: usually if you open a file you have to close it once you are done with it. Enter closures: the method foreach in class IO takes care of that task for us. It takes a filename as its first argument and a block of code as its second argument; it opens the file, passes each line of the file to the block of code delimited by 'do |line|' and 'end', then closes the file after processing. So what does this actually buy you? In Java, each time you open a file you need a try/catch/finally, which obscures the code, and in the long run you get a bit tired of typing the same boilerplate again and again.
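To make the comparison concrete, here is a minimal sketch of what a Java counterpart to IO.foreach could look like. It is written with the lambda syntax that Java eventually shipped in Java 8, not with the syntax of the proposal discussed in this post, and the names EachLineSketch, eachLine and totals are made up for the illustration. A StringReader stands in for foo.txt so that the example runs without any file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EachLineSketch {
    // Hypothetical helper: it owns the read/iterate/close cycle, the way
    // Ruby's IO.foreach does, and hands each line to the caller's block.
    static void eachLine(Reader source, Consumer<String> block) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                block.accept(line);
            }
        } finally {
            reader.close(); // the boilerplate is written once, here
        }
    }

    // Collects the number from every line that contains "total: <digits>".
    static List<String> totals(Reader source) throws IOException {
        Pattern total = Pattern.compile("total: (\\d+)");
        List<String> found = new ArrayList<>();
        eachLine(source, line -> {
            Matcher m = total.matcher(line);
            if (m.find()) {
                found.add(m.group(1));
            }
        });
        return found;
    }

    public static void main(String[] args) throws IOException {
        Reader fakeFile = new StringReader("header\ntotal: 42\nitems: 3\ntotal: 7\n");
        System.out.println(totals(fakeFile));
    }
}
```

The caller of eachLine never sees the try/finally; that is the whole point of pushing the resource handling into a method that takes a block.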

The case for closures
I believe the case for closures was beautifully made by Steve Yegge in his Language Trickery and EJB. The gist of it is that one of the main aspects of programming language design is the choice of language constructs: what should be part of the language syntax and what should be left to APIs. The example he works through very thoroughly is the foreach loop construct introduced in Tiger. Back in 2002 Charles Miller had written "Java really needs a convenient, concise way to do closures (or blocks, for Smalltalkers)" in response to the addition of the foreach loop under JSR 201. Neal said as much in his presentation: the foreach loop wouldn't have been added to the language had closures been implemented earlier. Closures basically give you language extensibility. In Ruby, for example, much of the looping is done through closures rather than with the for loop language construct; a crude regexp with grep yields 33 hits for "for stuff in someCollection" against some 561 occurrences of .each methods calling blocks in the Rails 1.1.6 code base. Even more, ahem, 'pure' is Smalltalk, where even conditional and looping constructs are actually method calls, and some Smalltalkers are quite eager to remind the Rubyists of their impurity and their somewhat shameful Perl heritage, as in here. Among more mainstream languages we can see examples of these trade-offs being made too: in C#, the powers that be in Redmond decided that closing resources is important enough that user code should not be littered with try/catch/finally whenever a file is opened. But C#, like Java, didn't have closures, so a language construct was needed, and the keyword "using" was deemed reasonable enough even though it is also used as the import facility for namespaces.

Depending on whom you are talking to, Java already has closures: they would be the anonymous inner classes. The thing is that there are two major closure use cases, termed synchronous and asynchronous on Neal Gafter's blog. The first mimics control-like statements, such as the example given above; the second is callbacks, for example in response to a Swing event such as a button being pressed. In the latter case anonymous classes are good enough, albeit with an ugly syntax. That's how quite a number of people feel; among them, Joshua Bloch, Doug Lea and "Crazy" Bob Lee came up with a proposal to do away with the syntax ugliness. IMHO the proposal is not that enticing: it introduces some weirdness of its own, and the public thing is quite disturbing.

In his talk Neal Gafter gave further examples of closures versus language-specific constructs. Multithreading was built into Java from day one, and if you have a chunk of code that needs to be accessed by threads in a serial manner you have the "synchronized" keyword. Now, synchronized gives no guarantees as to which waiting thread gets to run next; it could well be that under heavy contention some threads are starved and never served. A way to solve this is to use the locks provided in java.util.concurrent.locks, such as ReentrantLock, which can be passed a boolean flag in its constructor to indicate fairness, so that threads are served on a first-come basis.

No closure article is complete without a reference to this mail from Guy Steele (via Martin Fowler's article on closures): Java's forefathers are eminent Lispers, and closures were always on the radar, but they were deemed not crucial enough to the Java clientele. I hope that by now you are a bit enticed by closures for Java; please do read Steve's and Charles's entries and watch Neal's presentation, as they do a far better job of stating things than my incoherent babbling.
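Coming back to the fair-lock remark above: once you swap synchronized for a ReentrantLock you have to release the lock yourself in a finally block, which is the same boilerplate problem as with files. The sketch below shows how a method taking a block can hide it. It uses Java 8 lambdas rather than the proposal's syntax, and WithLockSketch and withLock are names invented for this illustration, not part of java.util.concurrent.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

class WithLockSketch {
    // Passing true asks for a fair lock: under contention the
    // longest-waiting thread is favoured.
    private final Lock lock = new ReentrantLock(true);
    private int counter = 0;

    // Hypothetical helper playing the role of the synchronized keyword:
    // acquire, run the block, always release.
    static <T> T withLock(Lock lock, Supplier<T> block) {
        lock.lock();
        try {
            return block.get();
        } finally {
            lock.unlock();
        }
    }

    int increment() {
        return withLock(lock, () -> ++counter);
    }

    int current() {
        return withLock(lock, () -> counter);
    }

    public static void main(String[] args) throws InterruptedException {
        WithLockSketch demo = new WithLockSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    demo.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println(demo.current()); // 4 threads x 1000 increments
    }
}
```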

Closures for Java proposal

Now for some of the details of how closures are to be implemented in Java. Currently the specification has reached version 0.4 and is available here. Things are still at an early stage and may change considerably. As I said earlier, there are two major use cases for closures: a control-statement-like case dubbed synchronous, and a callback-like case termed asynchronous, where you pass a block of code to be executed later in another lexical scope and very possibly in a different thread; the latter is the case for which anonymous inner classes are also appropriate. First a disclaimer: this is based on the 0.4 functional version of the specification and on my understanding of it, and the two may well be very distinct, so please read the following very critically and do point out any discrepancies you may uncover. Now for some examples:

Synchronous case: If we were to write a utility similar to Ruby's IO.foreach, using the control-like statement syntax it would look like

eachLine(String line : new File("foo.txt")){
    //do stuff
    return result;
}

or with the functional-like syntax

eachLine(new File("foo.txt"), { String line => /* do stuff*/ result });

Asynchronous case: This is how you would write things with an anonymous inner class:

button.addActionListener(new ActionListener(){
    public void actionPerformed(ActionEvent event){
        doSomeThing(event);
    }
});

With the current closure proposal you would write:

button.addActionListener({ActionEvent event => doSomeThing(event); });

or in the control-like statement syntax

button.addActionListener(ActionEvent event :){
    doSomeThing(event);
}

Let's first examine the synchronous case. There are actually two syntaxes for invoking closures, and the first one is the preferred one for synchronous-like cases. Why, you might ask? Because it looks like a language statement. This is of course deliberate, because closures in the synchronous case should feel like control-like language constructs; the philosophy behind this choice is laid out by Neal Gafter on his blog here. The gist of it is that the return, break, continue and this keywords should have the same meaning as they do inside control-like language constructs: if we are inside a for loop and we do a return, we return from the nearest enclosing method, which is not the case with anonymous inner classes. This is really the big difference between closures and anonymous inner classes. Similarly, you are allowed to access non-final variables from the enclosing scope.
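The difference in what return means can be seen in plain Java today. The following sketch contrasts a built-in for loop with an anonymous inner class doing the same search; Block, forEach and the two firstNegative methods are hypothetical names made up for the example.

```java
import java.util.Arrays;
import java.util.List;

class ReturnSemanticsSketch {
    // Hypothetical single-method interface standing in for a block of code.
    interface Block<T> {
        void run(T item);
    }

    static <T> void forEach(List<T> items, Block<T> block) {
        for (T item : items) {
            block.run(item);
        }
    }

    // Built-in loop: return leaves firstNegativeLoop itself.
    static Integer firstNegativeLoop(List<Integer> numbers) {
        for (Integer n : numbers) {
            if (n < 0) {
                return n;
            }
        }
        return null;
    }

    // Anonymous inner class: return only leaves run(), so forEach keeps
    // iterating. The result has to be smuggled out through a final
    // one-element array, because captured locals must not be reassigned.
    static Integer firstNegativeInner(List<Integer> numbers) {
        final Integer[] result = { null };
        forEach(numbers, new Block<Integer>() {
            @Override
            public void run(Integer n) {
                if (n < 0 && result[0] == null) {
                    result[0] = n;
                    return; // ends this call to run(), not firstNegativeInner
                }
            }
        });
        return result[0];
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(4, -2, 7, -9);
        System.out.println(firstNegativeLoop(numbers));
        System.out.println(firstNegativeInner(numbers));
    }
}
```

Both methods find the same value, but the second one needs the array trick and cannot stop the iteration early; as I understand the proposal, a synchronous closure would let the second version be written like the first.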

The second syntax previews the function type notation. The specification introduces function types; here's an example:

{ int, int => int }
is a function type that takes two ints and returns an int. You can write
{ int, int => int } plus = { int x, int y => x+y }
Function types are mapped at compile time to generic interfaces with a single method. They can similarly be converted to any interface with a single method that has an equivalent set of arguments and return type; for the exact semantics of the closure conversion please refer to the Closure conversion section in the spec. The return type of a closure is inferred from its last statement, x+y in the case above. Notice the absence of ';': if we add a semicolon then the return type is void. If instead of x+y we had throw new AssertionError(), the return type would be java.lang.Unreachable; yes, it's a new type, and there is a nice explanation on Remi Forax's blog.
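As a rough picture of that mapping, here is a sketch in which a hand-written single-method interface plays the role of the interface the compiler would generate for { int, int => int }. The interface name IntIntToInt is invented for the illustration, the closure is written as a Java 8 lambda instead of the proposal's { int x, int y => x+y }, and the conversion to another interface of the same shape is shown with java.util.function.IntBinaryOperator.

```java
import java.util.function.IntBinaryOperator;

class FunctionTypeSketch {
    // Hypothetical stand-in for the interface behind { int, int => int }.
    interface IntIntToInt {
        int invoke(int x, int y);
    }

    // The counterpart of: { int, int => int } plus = { int x, int y => x+y }
    static final IntIntToInt PLUS = (x, y) -> x + y;

    // Accepts any single-method interface instance of the same shape.
    static int fold(int[] values, int seed, IntBinaryOperator combiner) {
        int result = seed;
        for (int value : values) {
            result = combiner.applyAsInt(result, value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(PLUS.invoke(2, 3)); // 5
        // PLUS::invoke adapts our interface to IntBinaryOperator, since both
        // take two ints and return an int.
        System.out.println(fold(new int[] { 1, 2, 3 }, 0, PLUS::invoke)); // 6
    }
}
```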

In the asynchronous case the closure way of doing things amounts to syntactic sugar, sweet as it might be, since all the restrictions on anonymous classes will, and actually should, apply. Indeed break, continue and return are simply meaningless in the context of a callback, as I've seen Neal Gafter repeatedly argue during the JavaPolis BOF session. In the synchronous case you would be returning from an enclosing method that you are writing yourself, while in the asynchronous case the callback you provide is called by a host framework over which you have no control, and you have no way of knowing beforehand how your callback will be called. Returning from an enclosing framework method that you know very little about, if anything, does not make much sense. Asynchronous closures would implement a marker interface, RestrictedClosure, which enforces that the closure behaves similarly to an anonymous inner class.

I believe the duality in the syntax is very deliberate. The first form is for control-like statements and implies that the closure will execute "subsequently", or in a "synchronous" manner if you will, much like a language statement. For the callback case you would use the functional notation, as it carries the meaning better: it says that I'm passing this function to be called later, when a certain condition is fulfilled. If you actually use the control-like syntax in the callback case, and the method registering the callback takes no arguments other than the closure, you can end up with a smiley face in your code, which is literally what I got while typing this on Confluence!

Another very interesting aspect is exception transparency. Take the while loop statement: you are free to put whatever statements you like inside the braces, and these statements may throw any checked exceptions, which the enclosing code must either catch or declare in its throws clause. Closures in the synchronous case should do no less; remember, they should behave like control-like language statements. The problem is that when you are writing a method that takes a closure as an argument, you cannot know beforehand which exceptions the closure will throw. The solution would be a generic throws declaration; Neal explains it quite nicely in his Artima interview. The details of exception transparency are not yet entirely fleshed out, the spec being still in its 0.4 revision.
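Part of this idea can already be approximated with the generics that Java has had since Tiger, by making the thrown exception a type parameter. The sketch below is only that approximation, limited to a single exception type, and not the mechanism the spec will end up with; ThrowingBlock, forEach, sumLengths and rejectEmpty are hypothetical names, and the blocks are written as Java 8 lambdas.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class ExceptionTransparencySketch {
    // Hypothetical block type whose checked exception is a type parameter.
    interface ThrowingBlock<T, E extends Exception> {
        void run(T item) throws E;
    }

    // forEach declares exactly what the block declares, nothing more.
    static <T, E extends Exception> void forEach(List<T> items, ThrowingBlock<T, E> block) throws E {
        for (T item : items) {
            block.run(item);
        }
    }

    // The block throws nothing checked, so the caller needs no try/catch
    // and no throws clause.
    static int sumLengths(List<String> words) {
        final int[] total = { 0 };
        forEach(words, (String w) -> total[0] += w.length());
        return total[0];
    }

    // The block throws IOException, so the call does too, and the
    // enclosing method has to declare or catch it.
    static void rejectEmpty(List<String> words) throws IOException {
        forEach(words, (String w) -> {
            if (w.isEmpty()) {
                throw new IOException("empty word");
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(sumLengths(Arrays.asList("ab", "cde")));
        try {
            rejectEmpty(Arrays.asList("ab", ""));
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```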

A last word on performance: returns would be implemented via exceptions. Apparently there is no need to worry, and nothing specific needs to be done at the JVM level; from what I could make out of Neal's response in the BOF, HotSpot should be able to optimize the repeated invocations should that be needed.
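To give an idea of what "returns via exceptions" could mean, here is a hand-written sketch of the trick. This is my own guess at the general shape and certainly not the code the compiler would generate; ReturnSignal, Block, forEach and firstNegative are names invented for the example, and the block is a Java 8 lambda.

```java
import java.util.Arrays;
import java.util.List;

class NonLocalReturnSketch {
    // Hypothetical carrier for the returned value. Disabling the stack
    // trace keeps the throw cheap, since nobody will ever print it.
    static final class ReturnSignal extends RuntimeException {
        final Object value;

        ReturnSignal(Object value) {
            super(null, null, false, false); // no suppression, no stack trace
            this.value = value;
        }
    }

    interface Block<T> {
        void run(T item);
    }

    static <T> void forEach(List<T> items, Block<T> block) {
        for (T item : items) {
            block.run(item);
        }
    }

    static Integer firstNegative(List<Integer> numbers) {
        try {
            forEach(numbers, n -> {
                if (n < 0) {
                    throw new ReturnSignal(n); // plays the role of "return n"
                }
            });
        } catch (ReturnSignal signal) {
            // A real implementation would also have to check that the
            // signal belongs to this particular invocation.
            return (Integer) signal.value;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstNegative(Arrays.asList(4, -2, 7, -9)));
    }
}
```

Unlike the anonymous inner class version, the throw unwinds through forEach, so the iteration really stops at the first negative number.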