A look at the preparations behind the JVM port, and a progress update

After my last post giving a status update on the JVM porting of NQP and the compiler toolchain Rakudo builds upon, hercynium++ left a comment suggesting that I also blog about the design work behind this effort. I liked the idea, and in this post I’ll attempt to describe it a bit. I can’t possibly capture all of the interesting things in a single post, so if this doesn’t cover aspects that are particularly interesting to anybody reading, let me know and I’ll try and find time to write something on them. :-)

It started long ago…

The first commit to the repository where I’m doing the initial porting work to the JVM may have been back in November, but that isn’t where the journey really started. We’ve known for multiple years now that we would want Rakudo and NQP to target backends besides Parrot. In that time, we’ve had to build a lot of technology in order to be able to build Rakudo at all. Some things we’ve had to build more than once because the first time didn’t produce something satisfactory (where satisfactory means “actually meets our needs”, not “is the most awesome thing ever possible”). Software is, fairly often, as much about learning as it is about building. The more complex the domain you’re working in, there more this applies, and the more likely it is that you’ll have to build one to throw away. By now we’ve thrown away a parser engine, an AST, and about 3 implementations of roles. :-)

Of course, there’s the build/buy thing, where buy in open source really means “buy into”, as in use an existing library. We’ve done a bunch of that too, such as libtommath for our big integer support and dyncall for NativeCall support. But the closer something is to the “core domain” – the thing that makes your product distinctive and special – the less able you are to use something off the shelf. Parsing Perl 6 really needs to be done with a Perl 6 grammar, using Longest Token Matching. Its object system really needs something that supports meta-programming, representation polymorphism and gradual typing. Getting BEGIN/eval right and supporting compilation and having the possibility for lexical and anonymous types and packages, which can be dynamically constructed and exported, also left us with something to build (this is the work that led to bounded serialization).

Eventual portability has been a design factor in what we’ve built for quite a while. While the only 6model implementation to have become complete enough to support all of Rakudo’s object needs so far is the one running on Parrot, the initial prototypes of 6model were done on the .Net CLR. This was in no small part to make sure that there was a feasible way to implement it on such a VM. Granted, what I actually discovered was a less than awesome way to build it on the CLR (and what I’m doing on the JVM this time around fits in far better with the JVM’s world view). But it was a design consideration from the start.

When we updated PAST, the previous AST representation, to QAST (Q is just P++ :-)) then once again portability was a concern; the VM specific bits were all placed under a QAST::VM node type. This makes it easy to escape to the underlying VM where needed or where it is most expedient, but it’s both explicit and done in a way that allows specification of what to do on other backends. As part of this work we also build support for the nqp::op abstraction directly into the AST format. The nqp::ops form an opcode set independent of any particular VM. These get mapped as part of turning a QAST tree into code for the target backend (thus meaning there’s no overhead for them in the generated code). They may map directly to the VM’s opcodes, a function or method call in the VM, or do some more complex code generation.

The other important piece of the groundwork for portability is that we implemented Rakudo in a mixture of Perl 6 and NQP, and over time have got NQP to the point where it is also written in NQP (and thus can compile itself). This has been a gradual thing; the earliest NQP editions were written directly in PIR, and with time we’ve migrated those bits to NQP – usually at the same point we were doing other improvements already. For example, pmichaud++ wrote the latest edition of the regex engine, with LTM support, in NQP. PAST, written in PIR, was replaced by QAST, written in NQP. And 6model’s meta-objects were, from the start, expressed in NQP too. It’s pretty neat that NQP’s definition of things so fundamental as classes is actually written in NQP. It means that we don’t have to port classes and roles, just the primitives they are made out of.

So digging into the JVM port itself…

With all of the above mentioned things in place, it was possible to form a fairly concrete roadmap for porting NQP, then Rakudo, over to the JVM. Being comfortable that the result would enable us to get a fully functional Rakudo on the JVM and an idea of how to get there was important. It’s easy to implement a subset, but if it isn’t factored in a way that lets you do the rest afterwards then you’re in bother and it’ll be time for re-work. My hope was that, after some years of learning about things that don’t work and replacing them with things that do, this time much of the re-work could be avoided. A starting point for this was taking a good look at the JVM’s instruction set, as well as considering what JVMs are typically good at doing.

The JVM is a stack machine. This is in contrast to Parrot, which is a register machine. Thankfully, this is mostly a code generation detail rather than being especially deep. As well as the stack, a given method can have local variables (note that everything that contains code on the JVM is called a method, even subroutines, but they call them static methods because it sounds so much more OO :-)). These can hold longer-lived things, so in a sense could be used a bit like Parrot registers. In general, the code generation from QAST uses the stack where possible and falls back to locals where needed. This is because stack usage fits well with what a JVM expects to be doing, and also what its bytecode format expresses most compactly.

Locals have an important restriction: they can only be accessed directly in the scope where they are declared. There is no notion of nested methods at the JVM level. This means that locals are not suitable for implementing lexical variables. Thankfully, there is a well established solution: promote such things onto the heap, keeping them in some kind of invocation record. This is what happens with closures in C# on the CLR, for example. There are a bunch of ways to do this transform, with various trade-offs. I’ve done one that was fairly fast to implement, but also enables lookup by array indexing rather than needing a named (hash) lookup in the vast majority of cases. As well as an array index being algorithmically cheaper than a hash lookup, the JVM supports array indexing natively in its opcode set, but not hash lookups.

Talking of allocating things on the heap brings us nicely to think about objects. JVMs are very good at fast allocation and collection of objects, because they have to be; there is no stack allocation in Java of anything non-trivial. Of course, that doesn’t mean the VM can’t do escape analysis and stack allocate under the hood. That the VM is really good at object allocation and GC means we don’t need to worry too much about lexicals leading to invocation records on the heap; there’s plenty of performant uses of this approach in the wild. Furthermore, most records will be very short lived, nicely meeting the generational hypothesis (which is that most objects are either short lived or long lived, and so we can optimize separately for each through doing generational garbage collection).

While invocation records are relatively internal, of course NQP and Perl 6 involve lots of user-visible objects. From the things you think about as objects (and call “new” on) to things like scalar containers, strings, boxed integers and so forth, both NQP and Perl 6 lead to plenty of allocations. While some things are quite predictably shaped, most come from user class definitions. Ideally, we’d like it if a Perl 6 class definition like:

class Point {
    has $!surface;
    has num $!x;
    has num $!y;
}

Was to use memory similarly to if you wrote something in Java like:

class Point {
    private Object surface;
    private double x;
     private double y;
}

At the same time, we know that the JVM’s idea of type is some way off the Perl 6 notion of type, so we can’t simply turn Perl 6 classes into JVM classes. Thankfully, 6model has from the start been designed around the idea of representation polymorphism. Really, this is just a separation of concerns: we decouple the details of memory representation and access from the notion of being a type and dispatch. The former is handled by a representation, and the latter two by a meta-object. One early but important observation I made when designing 6model is that the representation will always couple closely to the underlying runtime (and thus would need to be implemented for each runtime we wanted to run on), whereas the other bits can be expressed in a higher level way, with the common cases made efficient by caching. Thus there’s no reason to re-implement classes and roles per VM, but there is a need to provide a different, VM-specific way to do P6opaque (the default representation for NQP and Perl 6 objects).

The C implementation of P6opaque on Parrot works by calculating a memory layout – essentially, figuring out a struct “on the fly”. What’s the JVM equivalent of that? Well, that’s just creating a JVM class on the fly and loading it. Is the JVM capable of doing that? Sure, it’s easily dynamic enough. Furthermore, once we’ve done that little bit of bytecode generation, it’s a perfectly ordinary JVM class. This means that the JIT compiler knows what to do with it. Does doing any of this require changes to the meta-objects for classes in NQP and Rakudo? No, because these details are all encapsulated in the representation. Things like these are good signs for a design; it tends to show that responsibilities are correctly identified and segregated.

So, how’s the cross-compiler going?

Things are going nicely. Having got much of the way there with the NQP MOP, I turned to ModuleLoader and started to get together a basic setting (the setting being the place where built-ins are defined). With those in place, work has moved on to trying to pass the NQP test suite.

The build process cross-compiles the MOP, module loader and setting. To run the test suite, each test is taken and cross-compiled against those, then the result of compiling it is run on the JVM. The fact we invoke NQP, then invoke the JVM twice in order to run each test gives quite a bit of fixed overhead per test; once we have NQP itself (that is, the compiler) cross-compiled and self-hosting on the JVM it’ll be down to a single invocation.

The NQP test suite for the NQP language itself consists of 65 test files. 3 of them are specific to Parrot, so there’s 62 that are interesting to make run. As of today, we pass 46 of those test files in full. While some of those passing tests exercise relatively simple things (literals, operators, variables, conditionals, loops, closures), others exercise more advanced features (classes, roles, basic MOP functionality, runtime mixins and so forth). Of the 16 test files that remain, 9 of them depend on regexes or grammars. Getting those to run will be the focus of the next major chunk of work: porting the regex compiler and getting the NFA, Cursor and Match classes to cross-compile (which will involve some portability improvements). The other 7 relate to non-trivial, but smaller-than-grammars features (for example, 2 are about multiple dispatch, which I’m working on porting at the moment).

It was only three weeks ago when I wrote that the JVM port did not even manage “hello world” yet, and that I had little more to show than something that could turn a range of QAST trees into JVM bytecode. Three weeks later and we’re running around 75% of the NQP test files, and timotimo++ was even able to feed an almost unmodified Levenstein distance implementation written in NQP to the cross-compiler and have it run on the JVM.

So, mad amounts of coding have taken place? Well, only sorta…I’ve taught two three-day classes for $dayjob in the last couple of weeks also. :-) Mostly, progress has been fast now because the foundations it is building upon have proved fairly solid. For the backend, this is in no small part down to having grown a test suite for the QAST to JVM phase of the work as it progressed. The fact we could happily couple this new backend to the existing NQP parser is thanks to the compiler being structured as a pipeline of stages, each one strongly isolated from the others, just passing a data structure between them. In my teaching work, I often encourage automated testing and talk a lot about the importance of enforcing carefully chosen, well-defined, information-centric boundaries between components. It’s pleasing to see these things paying off well in my Perl 6 work also. :-)

This entry was posted in Uncategorized. Bookmark the permalink.

15 Responses to A look at the preparations behind the JVM port, and a progress update

  1. Brandon says:

    This is all very exciting! It truly is amazing all that you’ve done over these last few years.
    Jonathan, do you think Perl 6 code in general will run faster on jvm than on parrot? (Any predictions on how much faster?)

    • It’s a bit early to make good predictions on performance. So far we can only really talk about NQP code rather than Perl 6 (NQP can make assumptions that Perl 6 can’t), and the only thing we’ve got to look at are micro-benchmarks, which are not always great at indicating how things will pan out for practical code. I wouldn’t claim that those have been done in a very scientific way either. :-)

      Anyway, of what we do have, the Levenstein code I mentioned in the blog post was reported to run a few times faster on the JVM, a “make a hundred million calls” benchmark came out almost 15 times faster on the JVM (but maybe it’s just very good at optimizing no-ops :-)), and today nwc10++ posted on the perl6-compiler mailing list that he’d observed a factor of 8 speedup over NQP on Parrot with a fib benchmark. The final one was also interesting since he compared it to Perl 5 running the same thing, and NQP on the JVM came out ahead (by a factor of about 1.5).

      It’s hard to know how this will generalize. Also, there’s a couple of notable things that the JVM port doesn’t support yet that will introduce potentially global overhead. On the flip side, my focus hasn’t been “make it crazy fast” so far, but “make it work”. While I’m quite happy with the overall design of the port, there are aspects of the code-gen that I know suck, invocation could be decidedly cheaper and there are no doubt places where we could do things in a way the JVM can better optimize.

      tl;dr, it looks promising for being faster, but the factor is hard to predict and will no doubt vary significantly by workload.

  2. Brandon says:

    That’s some very good signs. Thank you for sharing that.

  3. someperson says:

    How is switching to yet another VM(jvm) going to speed up the completion of the implementation of p6 ?

    • It’s hardly “yet another”; to date Rakudo has only run on Parrot. Also, “switching” is a misunderstanding too; we’re *adding* the capability to run on the JVM.

      Anyway, on how it helps us get there sooner… The way Perl 6 progresses in terms of its language design is thanks to people using the bits of it that are implemented so far and providing feedback. While a great deal of the language spec is now very stable, that has come after a great deal of feedback. We’ve some areas where we’ve not been able to build and try out certain things on Parrot, a big one being concurrency. Parrot lacked any kind of threads for years, and what it has now is not a great basis for building much on, sadly. That still leaves aside async operations, which should be considered in with all of this. Meanwhile, the JVM can offer well tested, widely used, battle-hardened building blocks for looking into these things, which will allow us to focus on the Perl 6 aspects of things without having to worry (much) about hitting VM bugs.

      The other thing is that Rakudo on Parrot is pretty slow. It’s not that we haven’t put a bunch of time into improving performance; things have got better, but not sufficiently better for people to not quickly feel the slowness when they try it. Meanwhile, modern JVMs are widely known for their highly optimized JIT compilation. Additionally, it’s a confidence thing: LOADS of important things are running on the JVM. More confidence = more users = faster convergence.

      Hope this somewhat answers both questions.

  4. sameperson says:

    Why isn’t Parrot good enough ? There has been so much effort put into it. This surely feels like a bit of a non-delightful end for Parrot.. as p6 was its primary client…

    • gerdr says:

      I’ve only been involved with Parrot peripherally (some build system stuff, fixed a gc bug, worked on an m0 prototype or three), but from where I’m standing the Parrot codebase is doomed: It’s just not competitive.

      Reini Urban is currently looking into _why’s potion as a backend for Perl5 (and possibly Perl6) and posted the following numbers:

      parrot examples/benchmarks/fib.pir 28  1.746s
      perl   examples/benchmarks/fib.pl  28  0.439s
      potion examples/benchmarks/fib.pn  28  0.013s
      
      parrot examples/benchmarks/fib.pir 40  3m36.447s
      perl   examples/benchmarks/fib.pl  40  2m19.752s
      potion examples/benchmarks/fib.pn  40  0m3.512s

      That’s just a single datapoint, and it might not be that bad if Perl6 semantics mapped directly to Parrot semantics. They don’t, and it would be a major undertaking to make it so.

      My thoughts on the issue as of five months ago basically still hold.

      Personally, I’m content with sitting back and watching for a few months. If Rakudo/JVM works out (order of magnitude performance improvement), I’d be fine with letting Parrot quietly die. If it doesn’t, but rurban’s work does, it might be a good idea to look into it as a platform for Parrot2.

    • Moritz says:

      Parrot lacks a JIT compiler, for example.

    • carl says:

      If Parrot is good enough, what’s the problem with an additional backend?

      Also, would it be too much to ask when you comment veiled criticism on the blog of a developer reporting great progress on one of his projects, that you reveal your real identity? You gutless coward.

  5. gerdr says:

    I’ve only been involved with Parrot peripherally (some build system stuff, fixed a gc bug, worked on an m0 prototype or three), but from where I’m standing the Parrot codebase is doomed: It’s just not competitive.

    Reini Urban is currently looking into _why’s potion as a possible backend for Perl5 (and possibly Perl6) and posted the following numbers:

    parrot examples/benchmarks/fib.pir 28  1.746s
    perl   examples/benchmarks/fib.pl  28  0.439s
    potion examples/benchmarks/fib.pn  28  0.013s
    
    parrot examples/benchmarks/fib.pir 40  3m36.447s
    perl   examples/benchmarks/fib.pl  40  2m19.752s
    potion examples/benchmarks/fib.pn  40  0m3.512s

    That’s just a single datapoint, and it might not be that bad if Perl6 semantics mapped directly to Parrot semantics. They don’t, and it would be a major undertaking to make it so.

    My thoughts on the issue as of five months ago basically still hold.

    Personally, I’m content with sitting back and watching for a few months. If Rakudo/JVM works out (order of magnitude performance improvement), I’d be fine with letting Parrot quietly die. If it doesn’t, but rurban’s work does, it might be a good idea to look into it as a platform for Parrot2.

  6. Brandon says:

    It seems to me that adding support for compiling Perl 6 to JVM fits perfectly into the Perl philosophy of “There Is More Than One Way To Do It”.

Leave a reply to jnthnwrthngtn Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.