NQP on JVM bootstrapped, soon will land in NQP master

The work to get NQP running and bootstrapped on the JVM has reached an interesting milestone, and I thought I’d take a few moments from working on it to talk about what has taken place since my last post here, as well as taking a look at what will be coming next.

Getting NQP on JVM Bootstrapped

When I last posted here, I had reached the point of having an NQP cross-compiler to the JVM that covered quite a lot of NQP’s language features. Cross-compilation meant using NQP on Parrot to parse NQP source, build an AST, and then turn the AST into a kind of JVM assembly language. This in turn was then transformed into a JVM bytecode file by a small tool written in Java, which could then be executed on the JVM.

Since the last post, the cross-compiler became capable of cross-compiling NQP itself. This meant taking the NQP source files and using the cross-compiler to produce an NQP that would run on the JVM – without depending on Parrot. This also enabled support for eval-like scenarios. I reached this stage last month; the work involved tracking down a range of bugs, implementing some missing features, and doing a little more work to improve NQP’s portability. So, we had an NQP that ran on the JVM. Port done? Well, not quite.

A little vacation later, I dug into the next stage: making NQP on the JVM able to compile (and thus reproduce) itself. While I’d already implemented the deserialization used to persist meta-objects (representing classes, grammars and so forth), for this next step I had to implement the serialization side of things. Thankfully, there are a bunch of tests for this, so this was largely “just” a matter of working through making them pass. Finally, it was time to work through the last few problems, and get NQP on JVM able to build the NQP sources – and therefore able to compile new versions of itself. I decided to do this work as part of moving the JVM support into the main NQP repository.

Since we’ve only had NQP running on one backend (Parrot) up until now, certain aspects of the repository structure were not ideal. Before starting to bring in the JVM support, I first did a little bit of reorganization to segregate the VM specific components from the VM independent ones. Happily, much of NQP’s implementation falls into the latter category. Next came gradually building up the bootstrapping build process, working a file at a time, tracking down any issues that came up. This was a little tedious, especially given a couple of the problems were separate compilation leakages (where things from the running compiler would get confused with the version of the compiler that it was compiling). It was pretty clear that this was the problem from the errors I was seeing, but such problems show up long after things actually go wrong, requiring some careful analysis to hunt down. With those leaks plugged, and a few other relatively small bugs fixed, I had a working NQP on JVM…compiled by NQP on JVM.

The work from there has been to fill out the rest of the build process, adding in the second bootstrap stage and the test targets. The good news: the NQP produced by NQP on JVM passes all the tests that the original cross-compiled version did, so we’ve got no regressions there as a result of the bootstrap.

This work is currently sat in the jvm-support branch of the NQP repository. After the upcoming NQP release, it will be merged.

Supporting invokedynamic

Amongst all of this progress, we’ve also gained infrastructure to support using the invokedynamic instruction. This is a mechanism that enables those implementing non-Java languages on the JVM to teach its JIT about how their dispatch works. Most of the hard work here was done by donaldh++. I’d initially built things using BCEL to do code generation. While it served well up to a point, it turns out that ASM has much better support for invokedynamic, as well as being a little faster. So, donaldh got things switched over, and I was soon able to emit invokedynamic.
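
The shape of an invokedynamic bootstrap can be sketched with the java.lang.invoke API that the instruction is built on: the bootstrap method runs once when the call site links, and returns a CallSite whose target the JIT can inline through. The sketch below is only an illustrative stand-in (the class and names are invented, not NQP’s actual code), and it links and calls the site by hand, since emitting a real invokedynamic instruction requires bytecode generation:

```java
import java.lang.invoke.*;

public class IndySketch {
    // Hypothetical stand-in for the value a QAST::WVal resolves to.
    static final String RESOLVED = "Point type-object";

    // The shape of an invokedynamic bootstrap method: called once at
    // link time, returning a CallSite the JIT can optimize through.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type) {
        MethodHandle target = MethodHandles.constant(String.class, RESOLVED);
        return new ConstantCallSite(target.asType(type));
    }

    // Without emitting real bytecode, we link and invoke the site by
    // hand; an actual invokedynamic instruction does this on first run.
    static Object link() {
        try {
            CallSite site = bootstrap(MethodHandles.lookup(), "wval",
                    MethodType.methodType(String.class));
            return site.getTarget().invoke();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(link());
    }
}
```

Because the bootstrap here binds a constant into a ConstantCallSite, every later call through the site is effectively free; that is the sense in which a WVal access can become “a lot cheaper”.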

So far, we are not using it a great deal (just for making QAST::WVal compile to something a bit – or potentially a lot – cheaper), but in the future it will be used for things you’d typically think of as invocations (sub and method calls). I’ll write in more detail about it as things evolve.

What next?

With NQP now ported, the focus will fall on Rakudo itself. Quite a lot of preparations have already been made; for example, many pir:: ops have been replaced with nqp:: ones, multiple dispatch has been ported to NQP from C (fixing some bugs along the way), the way HLL boundaries work has been updated to cope with a fully-6model world (this also let me fix a long-standing introspection bug).

The path through getting Rakudo ported will largely follow the build order. This means starting with the module loader, then the compiler, followed by the MOP and its bootstrapping. After that comes the setting – the place where the built-ins live. There’s around 13,650 lines worth of that, so of course I expect to take it a little at a time. :-) I’ll try to remember to post a progress update here in a couple of weeks’ time.


NQP on JVM gets Grammars, Multiple Dispatch

Having just reached an interesting milestone, I thought I’d blog a quick progress update on my work to port NQP to the JVM, in preparation for also doing a port of Rakudo Perl 6.

The big news is that the grammar and regex engine is pretty much ported. This was a sizable and interesting task, and while a few loose ends need to be wrapped up I think it’s fair to say that the hard bits are done. The port includes support for the basics (literals, quantifiers, character classes, positional and named captures, calls to other rules, and so forth) as well as the more advanced features (transitive Longest Token Matching for both alternations and protoregexes, and embedded code blocks and assertions). Missing are anchors, conjunctions and a small number of built-in rules; none of these are particularly demanding or require primitives that don’t already exist, however. It’s also worth pointing out that the NQP code to calculate the NFAs used in Longest Token Matching also runs just fine atop the JVM.

Another interesting feature that I ported a little while ago is multiple dispatch. This was some effort to port, since the original implementation had been done in C. While it’s sensible to have a close-to-the-VM dispatch cache, there’s little reason for the one-off candidate sorting work (not a hot path) to be done in C, so I ported the code for this to NQP. This meant that on the JVM side, I just needed to implement a few extra primitives, and could then run the exact same candidate sorting code.
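
As a rough illustration of what the candidate sorting step is about, here is a much simplified sketch in Java. It is not NQP’s actual algorithm (which builds a partial order over full signatures); it only orders candidates so that narrower (more derived) parameter types get tried before wider ones, which is the essence of the one-off sorting work:

```java
import java.util.*;

public class CandidateSort {
    // One multi-dispatch candidate; a single parameter type keeps the
    // sketch small (real signatures have many parameters).
    static class Candidate {
        final String name;
        final Class<?> param;
        Candidate(String name, Class<?> param) { this.name = name; this.param = param; }
    }

    // Order candidates so narrower parameter types come first; types
    // with no subtype relation are left tied.
    static List<Candidate> sort(List<Candidate> cands) {
        List<Candidate> sorted = new ArrayList<>(cands);
        sorted.sort((a, b) -> {
            if (a.param == b.param) return 0;
            if (b.param.isAssignableFrom(a.param)) return -1; // a is narrower
            if (a.param.isAssignableFrom(b.param)) return 1;  // b is narrower
            return 0;
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Candidate> sorted = sort(Arrays.asList(
                new Candidate("any", Object.class),
                new Candidate("num", Number.class),
                new Candidate("int", Integer.class)));
        for (Candidate c : sorted)
            System.out.println(c.name);
    }
}
```

Since this sorting only happens once per multi (and never on the hot path), expressing it in NQP rather than C costs little, while the per-call dispatch cache stays close to the VM.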

I think it’s worth noting again that I’m really doing two things in parallel here: hunting down places where NQP couples too tightly to Parrot and loosening the coupling, and also doing the porting to the JVM. The first half of this work is relevant to all future ports. In many cases, I’m also finding that the changes give us architectural improvements or just cleaner, more maintainable code.

I wanted to point this out especially because I’m seeing various comments popping up suggesting that Rakudo (or even Perl 6) is on a one-way road to the JVM, forsaking all other platforms. That’s not the case. The JVM has both strengths (mature, a solid threading story, widely deployed, the only allowed deployment platform in some development shops, increasing attention to supporting non-Java languages through things like invokedynamic) and weaknesses (slow startup time, lack of native co-routine support, and the fact that it was originally aimed at static languages). Rakudo most certainly should run on the JVM – and it most certainly should run on other platforms too. And, as I wrote in my previous post, we’ve designed things so that we are able to do so.

Perl has always been a language where There’s More Than One Way To Do It. Perl also has a history of running on a very wide range of platforms. Perl 6 should continue down this track – but the new reality is that a bunch of the interesting platforms are virtual, not hardware/OS ones.

By now, the JVM porting work is fast approaching a turning point. Up until now, it’s been about getting a cross-compiler and runtime support in place and working our way through the NQP test suite. This phase is largely over. The next phase is about getting NQP itself cross-compiled – that is, cross-compiling the compiler, so that we have an NQP compiler that runs on the JVM, supporting eval and able to run independently.


A look at the preparations behind the JVM port, and a progress update

After my last post giving a status update on the JVM porting of NQP and the compiler toolchain Rakudo builds upon, hercynium++ left a comment suggesting that I also blog about the design work behind this effort. I liked the idea, and in this post I’ll attempt to describe it a bit. I can’t possibly capture all of the interesting things in a single post, so if this doesn’t cover aspects that are particularly interesting to anybody reading, let me know and I’ll try and find time to write something on them. :-)

It started long ago…

The first commit to the repository where I’m doing the initial porting work to the JVM may have been back in November, but that isn’t where the journey really started. We’ve known for multiple years now that we would want Rakudo and NQP to target backends besides Parrot. In that time, we’ve had to build a lot of technology in order to be able to build Rakudo at all. Some things we’ve had to build more than once because the first time didn’t produce something satisfactory (where satisfactory means “actually meets our needs”, not “is the most awesome thing ever possible”). Software is, fairly often, as much about learning as it is about building. The more complex the domain you’re working in, the more this applies, and the more likely it is that you’ll have to build one to throw away. By now we’ve thrown away a parser engine, an AST, and about 3 implementations of roles. :-)

Of course, there’s the build/buy thing, where buy in open source really means “buy into”, as in use an existing library. We’ve done a bunch of that too, such as libtommath for our big integer support and dyncall for NativeCall support. But the closer something is to the “core domain” – the thing that makes your product distinctive and special – the less able you are to use something off the shelf. Parsing Perl 6 really needs to be done with a Perl 6 grammar, using Longest Token Matching. Its object system really needs something that supports meta-programming, representation polymorphism and gradual typing. Getting BEGIN/eval right and supporting compilation and having the possibility for lexical and anonymous types and packages, which can be dynamically constructed and exported, also left us with something to build (this is the work that led to bounded serialization).

Eventual portability has been a design factor in what we’ve built for quite a while. While the only 6model implementation to have become complete enough to support all of Rakudo’s object needs so far is the one running on Parrot, the initial prototypes of 6model were done on the .Net CLR. This was in no small part to make sure that there was a feasible way to implement it on such a VM. Granted, what I actually discovered was a less than awesome way to build it on the CLR (and what I’m doing on the JVM this time around fits in far better with the JVM’s world view). But it was a design consideration from the start.

When we updated PAST, the previous AST representation, to QAST (Q is just P++ :-)) then once again portability was a concern; the VM specific bits were all placed under a QAST::VM node type. This makes it easy to escape to the underlying VM where needed or where it is most expedient, but it’s both explicit and done in a way that allows specification of what to do on other backends. As part of this work we also built support for the nqp::op abstraction directly into the AST format. The nqp::ops form an opcode set independent of any particular VM. These get mapped as part of turning a QAST tree into code for the target backend (thus meaning there’s no overhead for them in the generated code). They may map directly to the VM’s opcodes, a function or method call in the VM, or do some more complex code generation.
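
To make the mapping idea concrete, here is a hedged sketch of a backend op table: a map from nqp::op names to small emitters that produce target code from already-compiled operands. The names are invented for illustration and plain strings stand in for real bytecode emission; the point is only that one op can map to a single instruction while another maps to a method invocation:

```java
import java.util.*;
import java.util.function.*;

public class OpMap {
    // Each nqp::op maps to an emitter producing target code from the
    // already-compiled operand code. Strings stand in for bytecode here.
    static final Map<String, Function<List<String>, String>> OPS = new HashMap<>();
    static {
        // Direct opcode mapping: integer addition is a single instruction.
        OPS.put("add_i", ops -> String.join("\n", ops) + "\nladd");
        // Call mapping: string concatenation becomes a method invocation.
        OPS.put("concat", ops -> String.join("\n", ops)
                + "\ninvokevirtual java/lang/String.concat");
    }

    static String emit(String op, List<String> compiledOperands) {
        return OPS.get(op).apply(compiledOperands);
    }

    public static void main(String[] args) {
        System.out.println(emit("add_i", Arrays.asList("lload 0", "lload 2")));
    }
}
```

Because the mapping happens at compile time, the generated code carries no trace of the abstraction – exactly the “no overhead in the generated code” property described above.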

The other important piece of the groundwork for portability is that we implemented Rakudo in a mixture of Perl 6 and NQP, and over time have got NQP to the point where it is also written in NQP (and thus can compile itself). This has been a gradual thing; the earliest NQP editions were written directly in PIR, and with time we’ve migrated those bits to NQP – usually at the same point we were doing other improvements already. For example, pmichaud++ wrote the latest edition of the regex engine, with LTM support, in NQP. PAST, written in PIR, was replaced by QAST, written in NQP. And 6model’s meta-objects were, from the start, expressed in NQP too. It’s pretty neat that NQP’s definition of things so fundamental as classes is actually written in NQP. It means that we don’t have to port classes and roles, just the primitives they are made out of.

So digging into the JVM port itself…

With all of the above mentioned things in place, it was possible to form a fairly concrete roadmap for porting NQP, then Rakudo, over to the JVM. Being comfortable that the result would enable us to get a fully functional Rakudo on the JVM and an idea of how to get there was important. It’s easy to implement a subset, but if it isn’t factored in a way that lets you do the rest afterwards then you’re in bother and it’ll be time for re-work. My hope was that, after some years of learning about things that don’t work and replacing them with things that do, this time much of the re-work could be avoided. A starting point for this was taking a good look at the JVM’s instruction set, as well as considering what JVMs are typically good at doing.

The JVM is a stack machine. This is in contrast to Parrot, which is a register machine. Thankfully, this is mostly a code generation detail rather than being especially deep. As well as the stack, a given method can have local variables (note that everything that contains code on the JVM is called a method, even subroutines, but they call them static methods because it sounds so much more OO :-)). These can hold longer-lived things, so in a sense could be used a bit like Parrot registers. In general, the code generation from QAST uses the stack where possible and falls back to locals where needed. This is because stack usage fits well with what a JVM expects to be doing, and also what its bytecode format expresses most compactly.
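
The stack-oriented code generation falls out naturally from a post-order walk of the AST: compile each operand so its value lands on the stack, then emit the one instruction that consumes them. A minimal sketch (the node constructors are invented; the instruction mnemonics are the JVM’s own):

```java
import java.util.*;

public class StackGen {
    interface Node { void emit(List<String> out); }

    // Load a local variable slot onto the stack.
    static Node local(final int idx) {
        return out -> out.add("iload " + idx);
    }

    // Post-order: compile both operands (leaving their values on the
    // stack), then the single instruction that consumes them both.
    static Node binop(final String op, final Node l, final Node r) {
        return out -> { l.emit(out); r.emit(out); out.add(op); };
    }

    public static void main(String[] args) {
        // (a + b) * c, with a, b, c in local slots 0..2
        Node tree = binop("imul", binop("iadd", local(0), local(1)), local(2));
        List<String> code = new ArrayList<>();
        tree.emit(code);
        code.forEach(System.out::println);
    }
}
```

Notice that the intermediate result of the addition never needs a named location: it simply stays on the stack until imul consumes it, which is why stack usage is both natural and compact in JVM bytecode.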

Locals have an important restriction: they can only be accessed directly in the scope where they are declared. There is no notion of nested methods at the JVM level. This means that locals are not suitable for implementing lexical variables. Thankfully, there is a well established solution: promote such things onto the heap, keeping them in some kind of invocation record. This is what happens with closures in C# on the CLR, for example. There are a bunch of ways to do this transform, with various trade-offs. I’ve done one that was fairly fast to implement, but also enables lookup by array indexing rather than needing a named (hash) lookup in the vast majority of cases. As well as an array index being algorithmically cheaper than a hash lookup, the JVM supports array indexing natively in its opcode set, but not hash lookups.
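
A minimal sketch of the invocation-record idea (the names are hypothetical and the real implementation differs in detail): each frame allocates a record on the heap, the compiler resolves every lexical name to a slot index at compile time, and a closure simply keeps a reference to the record it was created in. At run time, a lexical access is then a plain array index:

```java
public class LexPad {
    // Heap-allocated invocation record. The compiler has already turned
    // each lexical name into a fixed slot index, so run-time lookup is
    // an array access, not a hash probe.
    static class Frame {
        final Object[] lexicals;
        final Frame outer;          // lexical (not caller) chain for closures
        Frame(int slots, Frame outer) {
            this.lexicals = new Object[slots];
            this.outer = outer;
        }
    }

    // A closure captures the frame it was created in.
    interface Block { Object run(); }

    static Block makeCounter() {
        Frame f = new Frame(1, null);
        f.lexicals[0] = 0;                       // my $n = 0
        return () -> {
            int n = (Integer) f.lexicals[0] + 1; // $n := $n + 1
            f.lexicals[0] = n;
            return n;
        };
    }

    public static void main(String[] args) {
        Block counter = makeCounter();
        counter.run();
        System.out.println(counter.run()); // counts 1, then 2
    }
}
```

The outer chain is what makes nested closures work: a lookup in an enclosing scope walks a fixed number of outer links, then indexes – still no hashing in the common case.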

Talking of allocating things on the heap brings us nicely to thinking about objects. JVMs are very good at fast allocation and collection of objects, because they have to be; there is no stack allocation in Java of anything non-trivial. Of course, that doesn’t mean the VM can’t do escape analysis and stack allocate under the hood. That the VM is really good at object allocation and GC means we don’t need to worry too much about lexicals leading to invocation records on the heap; there are plenty of performant uses of this approach in the wild. Furthermore, most records will be very short lived, nicely meeting the generational hypothesis (which is that most objects are either short lived or long lived, and so we can optimize separately for each through doing generational garbage collection).

While invocation records are relatively internal, of course NQP and Perl 6 involve lots of user-visible objects. From the things you think about as objects (and call “new” on) to things like scalar containers, strings, boxed integers and so forth, both NQP and Perl 6 lead to plenty of allocations. While some things are quite predictably shaped, most come from user class definitions. Ideally, we’d like it if a Perl 6 class definition like:

class Point {
    has $!surface;
    has num $!x;
    has num $!y;
}

was to use memory similarly to what you’d get if you wrote the equivalent in Java:

class Point {
    private Object surface;
    private double x;
    private double y;
}

At the same time, we know that the JVM’s idea of type is some way off the Perl 6 notion of type, so we can’t simply turn Perl 6 classes into JVM classes. Thankfully, 6model has from the start been designed around the idea of representation polymorphism. Really, this is just a separation of concerns: we decouple the details of memory representation and access from the notion of being a type and dispatch. The former is handled by a representation, and the latter two by a meta-object. One early but important observation I made when designing 6model is that the representation will always couple closely to the underlying runtime (and thus would need to be implemented for each runtime we wanted to run on), whereas the other bits can be expressed in a higher level way, with the common cases made efficient by caching. Thus there’s no reason to re-implement classes and roles per VM, but there is a need to provide a different, VM-specific way to do P6opaque (the default representation for NQP and Perl 6 objects).

The C implementation of P6opaque on Parrot works by calculating a memory layout – essentially, figuring out a struct “on the fly”. What’s the JVM equivalent of that? Well, that’s just creating a JVM class on the fly and loading it. Is the JVM capable of doing that? Sure, it’s easily dynamic enough. Furthermore, once we’ve done that little bit of bytecode generation, it’s a perfectly ordinary JVM class. This means that the JIT compiler knows what to do with it. Does doing any of this require changes to the meta-objects for classes in NQP and Rakudo? No, because these details are all encapsulated in the representation. Things like these are good signs for a design; it tends to show that responsibilities are correctly identified and segregated.

So, how’s the cross-compiler going?

Things are going nicely. Having got much of the way there with the NQP MOP, I turned to ModuleLoader and started to get together a basic setting (the setting being the place where built-ins are defined). With those in place, work has moved on to trying to pass the NQP test suite.

The build process cross-compiles the MOP, module loader and setting. To run the test suite, each test is taken and cross-compiled against those, then the result of compiling it is run on the JVM. The fact we invoke NQP, then invoke the JVM twice in order to run each test gives quite a bit of fixed overhead per test; once we have NQP itself (that is, the compiler) cross-compiled and self-hosting on the JVM it’ll be down to a single invocation.

The NQP test suite for the NQP language itself consists of 65 test files. 3 of them are specific to Parrot, so there’s 62 that are interesting to make run. As of today, we pass 46 of those test files in full. While some of those passing tests exercise relatively simple things (literals, operators, variables, conditionals, loops, closures), others exercise more advanced features (classes, roles, basic MOP functionality, runtime mixins and so forth). Of the 16 test files that remain, 9 of them depend on regexes or grammars. Getting those to run will be the focus of the next major chunk of work: porting the regex compiler and getting the NFA, Cursor and Match classes to cross-compile (which will involve some portability improvements). The other 7 relate to non-trivial, but smaller-than-grammars features (for example, 2 are about multiple dispatch, which I’m working on porting at the moment).

It was only three weeks ago when I wrote that the JVM port did not even manage “hello world” yet, and that I had little more to show than something that could turn a range of QAST trees into JVM bytecode. Three weeks later and we’re running around 75% of the NQP test files, and timotimo++ was even able to feed an almost unmodified Levenshtein distance implementation written in NQP to the cross-compiler and have it run on the JVM.

So, mad amounts of coding have taken place? Well, only sorta…I’ve taught two three-day classes for $dayjob in the last couple of weeks also. :-) Mostly, progress has been fast now because the foundations it is building upon have proved fairly solid. For the backend, this is in no small part down to having grown a test suite for the QAST to JVM phase of the work as it progressed. The fact we could happily couple this new backend to the existing NQP parser is thanks to the compiler being structured as a pipeline of stages, each one strongly isolated from the others, just passing a data structure between them. In my teaching work, I often encourage automated testing and talk a lot about the importance of enforcing carefully chosen, well-defined, information-centric boundaries between components. It’s pleasing to see these things paying off well in my Perl 6 work also. :-)


A quick JVM backend update

Things have been moving along quite rapidly on the JVM backend since my last post. Sadly, I’m too sick to hack on anything much this evening (hopefully, this turns out to be a very temporary affliction…) but I can at least just about write English, so I figured I’d provide a little update. :-)

Last time I blogged here, I was able to compile various QAST trees down to JVM bytecode and had a growing test suite for this. My hope was that, by some inductive handwaving, being able to compile a bunch of QAST nodes and operations correctly would mean that programs made up of a whole range of them would also compile correctly. In the last week or so, that has come to pass.

Having reached the point of having coverage of quite a lot of QAST, I decided to look into getting an NQP frontend plugged into my QAST to JVM backend. In the process, I found that NQP lacked the odd VM abstraction here and there in the common prelude that it includes with every QAST tree it produces. Thankfully, this was easily rectified. Even better, I got rid of a couple of old hacks that were no longer required. With those things out of the way, I found that this common prelude depended on a couple of operations that I’d not got around to implementing in the JVM backend. These were also simple to add. And…here endeth the integration story. Yup, that was it: I now had a fledgling NQP cross-compiler. An NQP compiler running on Parrot, but producing output for the JVM.

This result is rather exciting, because…

  • It’s using exactly the same parse and action stages as when we’re targeting Parrot. No hacks, no fork. The QAST tree we get from the NQP source code that goes in is exactly the one we get when targeting Parrot. Everything that happens differently happens beyond that stage, in the backend. This is an extremely positive sign, architecturally.
  • With a couple of small additions to handle the prelude, I was immediately able to cross-compile simple NQP programs and run them on the JVM. There’s no setting or MOP yet, but the basics (variables, loops, subroutines with parameters, even closures) Just Worked.
  • The program I wrote to glue the JVM backend work and the existing NQP frontend together was about 30 lines of NQP code.
  • This whole integration process was about an afternoon’s worth of work.

Since I got that working, my focus has been on getting nqp-mo (the NQP meta-objects) to cross-compile. This is where classes and roles are implemented, and thus it is a prerequisite for cross-compiling the NQP setting, which is in turn a prerequisite for being able to start cross-compiling and running the NQP test suite. The NQP MOP is about 1500 lines of NQP code, and at this point I’ve got about 1400 of them cross-compiling. So am I almost there with it? Well, not quite. Turns out that the next thing I need to port is the bounded serialization stuff. That’s a rather hairy chunk of work.

Anyway, things are moving along nicely. The immediate roadmap is to get the bounded serialization to the point where it’s enough for the NQP MOP, then move on to getting a minimal setting cross compiling. Beyond that, it’ll be working through the test suite, porting the regex compilation and seeing what else is needed to cross-compile the rest of NQP.


A Bunch of Rakudo News

Seems it’s high time for some news here. It’s not that I didn’t do any blogging about Perl 6 in December; it’s just that all of those posts were over on the Perl 6 advent calendar. Anyway, now it’s a new year, and I’m digging back into things after an enjoyable Christmas and New Year’s break in the UK with family and friends. Here’s a bunch of things that already happened but I didn’t get around to talking about here yet, and some things that will be coming up.

Better Parse Errors In 2012.12

Ever had Rakudo tell you there’s a problem on line 1, when it’s really on line 50? Or wished that even in the common case where it gets the line right, it would tell you exactly where on the line things went wrong? Or how about the time it told you “Confused” because you got a closing paren too many?

Many of my contributions to the 2012.12 Rakudo release centered around improving its reporting of parse errors. STD, the standard Perl 6 grammar, has had much better error reporting than Rakudo for a while. Therefore, I spent a bunch of time aligning our error reporting more closely with what STD does. Some of this is cosmetic: you get the colored output and the indication of the parse location. But while these cosmetic changes will be the most immediately visible thing, the changes go far deeper. Of note, a high water mark is kept so we can be a lot more accurate in reporting where things came unstuck, and we track what was expected so we can produce better errors. Just doing the cosmetic stuff without being able to give it a better location to report wouldn’t have helped so much. :-)
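
The high water mark idea is simple to sketch: any match that fails at a position beyond the furthest failure so far resets the record of what was expected there, while failures short of that point are just ordinary backtracking noise. A toy illustration (invented helper, not Rakudo’s actual code):

```java
import java.util.*;

public class HighWater {
    static int highWater = -1;
    static Set<String> expected = new TreeSet<>();

    // On a failed match, remember the furthest failure point and what
    // was expected there; earlier failures are backtracking noise.
    static boolean expect(String input, int pos, char want, String desc) {
        if (pos < input.length() && input.charAt(pos) == want) return true;
        if (pos > highWater) { highWater = pos; expected.clear(); }
        if (pos == highWater) expected.add(desc);
        return false;
    }

    public static void main(String[] args) {
        String src = "(1+2";
        // Parse "(expr)": '(' matches, "1+2" matches, then at position 4
        // we need either a closing paren or another infix operator.
        int pos = 0;
        expect(src, pos++, '(', "opening '('");
        pos += 3; // "1+2" consumed by the expression parser
        expect(src, pos, ')', "closing ')'");
        expect(src, pos, '+', "infix operator");
        System.out.println("Parse failed at " + highWater + ", expecting: " + expected);
    }
}
```

Reporting the high water mark rather than the last backtrack point is what turns “Confused at line 1” into an error at the place things actually came unstuck, together with a useful list of what would have been accepted there.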

One other change is that we don’t bail out on the first thing that’s wrong when it’s possible to survive and continue parsing. When this is possible, up to 10 errors will be reported (since that’s typically a screen worth). Of course, some things just hose the parse and we can’t continue in any sensible way.

Hopefully, these improvements will make using Rakudo feel a lot nicer. Already on channel, I can see that the feedback we give about parse errors when people use the evalbot is often a lot more pleasant and informative. Of course, there’ll be more improvements in the future too, but this is a big step forward.

Faster Auto-Threading

The junction auto-threader could sometimes be insanely slow. As in, ridiculously so. After hearing a bunch of reports about this, I decided to dig in and work out why. A rewrite later, the little benchmark I was using with it ran almost 30 times faster. Not so bad… :-) This change also made it into the 2012.12 release.

JVM Backend Preparations Underway

I’ve talked plenty about plans for NQP and Rakudo to run on things besides Parrot for a while now. Over the last year or two, we’ve laid a lot of the groundwork for this. What’s been especially pleasing is that it’s also made Rakudo a better quality Perl 6 implementation on Parrot, thanks to the many architectural improvements. Of note, in many places we’ve closed semantic gaps between what Perl 6 wants and the primitives we were building it out of; the new QAST (Q Abstract Syntax Tree) is a great example.

Anyway, with NQP now being written pretty much entirely in NQP, and many of the right abstractions in place, it felt like time to start slowly picking away at getting 6model ported to the JVM and work on turning QAST trees into Java bytecode. I quietly started on this in November, and mentioned the existence of the work on #perl6 in December. I was delighted to see Coke++ jump in and start working through the Low Hanging Fruit – a file where I’m putting tasks that should be relatively easy to pick off. I actually had to re-fill it after the last round was depleted. ;-) By now, quite a few bits of QAST are supported and the 6model on JVM implementation is coming along nicely. Yes, this means it’s already capable of doing basic meta-programming stuff.

Note that this work isn’t at the stage where it’s of use for anything yet. You can’t even write a say("hello world") in NQP and have it run on the JVM yet, since all the work so far is just about turning QAST trees into JVM bytecode and building the runtime support it needs. That may seem like a curious way to work, but once you do enough compiler stuff you find yourself thinking quite naturally in trees. It meant I didn’t have to worry about creating some stripped-down NQP that could emit super-simple trees to be able to test really simple things. After all, the goal is to run NQP itself on the JVM, and then Rakudo, and only then will things be interesting to the everyday user.

To address a couple of immediate concerns that some may have…

  • No, this is not a case of “stop running on Parrot, start running on JVM”. It’s adding an additional backend, much like pmurias++ has been working on adding a JavaScript backend for NQP. Of course, I expect resource allocation in the future to be driven by which backends users desire most. For some, the JVM is “that evil Oracle thing” and they don’t want to touch it. For others, the JVM is “the only thing our company will deploy on”. Thus I expect this work to matter more to some people than others. That’s fine.
  • No, targeting multiple backends doesn’t mean performance-sucking abstractions everywhere. It’s a pretty typical thing for a compiler to do. As usual, it’s about picking the right abstractions. The debugger was implemented as an additional Rakudo frontend without a single addition to Rakudo or NQP or anything anywhere in the stack. That was possible because things were designed well. I’m sure the process of getting things running on the JVM will flag up a few places where things aren’t designed as well as they need to be, but already I’m seeing a lot of places where things are mapping over very nicely indeed.
  • No, this doesn’t mean that all other Rakudo development will grind to a halt. I’ve been working on the JVM backend stuff in November and December; both months saw a huge amount of Rakudo progress too. Things will go on that way.

Type System Improvements

Rakudo does a lot of things well when it comes to supporting the various kinds of types Perl 6 offers, but there are some weak areas. Here are some of the things I plan to focus on:

  • Getting the definedness constraint stuff working better (the :D and :U annotations). At the moment, they’re supported as a special case in multiple dispatch and the signature binder. In reality, they’re meant to be first class and work everywhere. You may not think you care about these much. Actually, you probably at least indirectly do, because once the optimizer is able to analyze them, it’ll be able to do a bunch more inlining of things than it can today. :-)
  • Getting coercion types in place. Again, this is turning the special-cased “as” syntax into the much more general coercion type syntax (for example, Int(Str) is the type that accepts a Str and coerces it into an Int).
  • Getting native types much better supported. At the moment, you can use them but…there are pitfalls. Having them available has been a huge win for us in the CORE setting, where we’ve used them in the built-ins. But they’re still a bit “handle with great care”. I want to change that.
  • Implementing compact arrays.
  • Improving our parametric type and type variable support. Many things work, but there are some fragile spots and some bugs that want some attention.
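
To make the first two bullets concrete, here's a sketch of the target syntax (a hedged illustration: coercion types and first-class :D handling weren't fully in place at the time of writing, and the class and names are invented for this post):

class Person { has $.name }

# Person:D constrains the parameter to a *defined* Person; passing the
# type object Person itself fails to bind.
sub greet(Person:D $p) { say "Hello, $p.name()!" }
greet(Person.new(name => 'Ana')); # Hello, Ana!

# Int(Str) is a coercion type: accept a Str, coerce it to Int before
# binding, the general form of the special-cased "as" syntax.
sub double(Int(Str) $n) { say $n * 2 }
double("21"); # 42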

I intend to have some of these things in the January release, and a bunch more in the February one. We’ll see how I get along. :-)

Posted in Uncategorized | 3 Comments

Lots of improvements coming in the Rakudo November release

The November release is still a couple of weeks off, but it’s already looking like one of the most exciting ones in a while. Here’s a rundown of the major improvements you can expect.

User Defined Operator Improvements

The way Rakudo handles parsing of user-defined operators has been almost completely reworked. The original implementation dated back quite a long way (we’re talking largely unchanged since probably 2010 or so), and as you might imagine, we’ve the tools to do a lot better now. The most significant change is that we now parse them using a lexically scoped derived language. This is done by mixing in to the current cursor’s grammar – something we couldn’t have done some years ago when NQP lacked roles! This means that the additions to the grammar don’t leak outside of the scope where the user-defined operators are defined – unless they are explicitly imported into another scope, of course.

    {
        sub postfix:<!>($n) { [*] 1..$n }
        say 10!; # 3628800
    }
    say 10!; # fails to parse

Before, things were rather more leaky and thus not to spec. Perl 6 may allow lots of language extensibility, but the language design takes great care to limit its scope. Now Rakudo does much better here. As well as making things more correct, the nasty bug with pre-compilation of modules containing user defined operators is gone. And, last but not least, the precedence and associativity traits are now implemented, so user defined operators can pick their precedence level.

sub infix:<multiplied-with>($a, $b) is equiv(&infix:<*>) {
    $a * $b
}
sub infix:<to-the-power>($a, $b) is equiv(&infix:<**>) {
    $a ** $b
}
say 2 multiplied-with 3 to-the-power 2; # 18
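
The associativity trait works along the same lines; here's a small sketch (assuming the spec's "is assoc" spelling, with an operator invented for this post):

# Right associativity: 1 cat 2 cat 3 parses as 1 cat (2 cat 3).
sub infix:<cat>($a, $b) is assoc<right> { "$a($b)" }
say 1 cat 2 cat 3; # 1(2(3))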

Quote Adverbs and Heredocs

The other thing that’s had a big do-over is quote parsing. This meant that a lot of quoting-related things that hadn’t been easy to attack before became very possible. So possible, in fact, that they’ve been done. Thus, we now support heredocs:

say q:to/DRINKS/
    Hobgoblin
    Punk IPA
    DRINKS

This outputs:

Hobgoblin
Punk IPA

Notice how it trims whitespace at the start of the string to the same degree of indentation as the end marker, so you can have heredocs indented to the same level your code is, if you wish. The :to syntax is an example of a quote adverb, and we now support those generally too. For example, a double-quoted string normally interpolates a bunch of different things: scalars, arrays, function calls, closures, etc. But what if you want a quoting construct that only interpolates scalars? You do something like:

my $value = 42;
say Q:s'{ "value": $value }'; # { "value": 42 }

Alternatively, you could take a quoting construct that normally interpolates everything and just switch off the closure interpolation:

my $value = 42;
say qq:!c'{ "value": $value }'; # { "value": 42 }

Last but not least, the shell words syntax now works. This lets you do quoting where things are broken up into a list by whitespace, but you can quote individual parts to prevent them getting split, and also do interpolation.

say << Hobgoblin 'Punk IPA' >>.perl; # ("Hobgoblin", "Punk IPA")

Sadly, I’ve got too bad a headache today to enjoy any of the listed beers. But hey, at least now you know some good ones to try… :-)

Operator Adverbs

Well, weren’t these some fun to implement. STD has parsed them for a long while. You’d think that’d make it easy to steal from, but no. Before you can do that, you have to figure out that not only does it parse adverbs as if they were infixes, but then pushes them onto the op stack in the operator precedence parser and does a reduce such that they come out looking like postfixes, which in turn need an AST transform to turn the postfix inside-out so what was originally parsed as a fake infix becomes a named argument to the thing preceding it. I’m still not sure if this is beautiful or horrifying, but now it’s in place. This means we can at last handle the adverb-y syntax for doing things like hash key existence checks or deletion:

my %h = a => 0;
say %h<a>:exists; # True
say %h<a>:delete; # 0
say %h<a>:exists; # False

Macros

Recent work by masak++ has got macros to the point where they’re powerful enough to be potentially useful now. See his blog post for the sometimes-gory details, illustrated with exciting adventures involving wine barrels and questionably pluralized bits of Russian landscape. Really.

NFA Precomputation

This is decidedly a guts change, but worth a mention because it’s probably the main performance improvement we’ll see in the November release: the transitive NFAs that are used by the parser are now computed at the time we compile the Perl 6 grammar, not on-demand as we start parsing. They are then serialized, so at startup we just deserialize them and dig into the parsing. They’re not entirely cheap to construct, and so this saves a bit of work per invocation of the compiler. In the spectests, it made an almost 10% difference to the time they take to run (many test files are quite small, and we run over 700 test files, so reducing invocation overhead adds up).

By the way, if you’re confused on what these transitive NFAs are about, let me try and explain a little. In order to decide which of a bunch of protoregexes or which branch of an alternation to take, the declarative parts of a grammar are analyzed to build NFAs: state machines that can efficiently decide which branches are possible then rank them by token length. It’s not only important for a correct parse (to get longest token matching semantics correct), but also important algorithmically. If you look at the way Perl 6 grammars work, the naive view is that it’s recursive descent. That makes it nice to write, but if it really was evaluated that way, you’d end up trying loads of paths that didn’t work out before hitting upon the correct one. The NFA is used to trim the set of possible paths through the grammar down to the ones that are actually worth following, and their construction is transitive, so that we can avoid following call chains several levels deep that would be fruitless. If you’ve ever used the debugger to explore a grammar, and wondered how we seem to magically jump to the correct alternative so often, well, now you know. :-)
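
As a small illustration of longest-token matching among protoregex candidates (a sketch; the grammar and names are made up for this post):

grammar Commands {
    proto token command {*}
    token command:sym<stop>    { <sym> }
    token command:sym<stopall> { <sym> }
    token TOP { <command> }
}

# The NFA built from the declarative prefixes ranks the candidates by
# token length, so on "stopall" the longer candidate wins rather than
# <stop> matching first and leaving "all" unparsed.
say Commands.parse('stopall'); # matches via the sym<stopall> candidate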

Other Things

There are a few other bits and pieces: INIT phasers now work as r-values, you can use the FIRST/NEXT/LAST phasers in all the looping constructs now (previously, they worked only in for loops), version control markers left behind in code are gracefully detected for what they are, and a bunch of proto subs in the setting have been given narrower signatures, which makes various things you’d expect to work when passed to, say, map, actually do so.
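
For instance, the loop phasers now work in a while loop as well as a for loop (a quick sketch):

my $i = 0;
while $i < 3 {
    FIRST say 'runs on the first iteration only';
    say $i++;
    NEXT  say 'runs at the end of each iteration';
    LAST  say 'runs after the final iteration';
}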

Oh, and whatever else we manage to get around to in the next couple of weeks. :-)

Posted in Uncategorized | Leave a comment

Rakudo Debugger Updates

A while ago I wrote about Rakudo getting an interactive debugger. The feedback I’ve got has been a happy mixture of “this is useful” and “I’d really like it to X” – so I’ve been working on some improvements. I showed them off at my Nordic Perl Workshop talk the weekend before last, in a fun session where I used the debugger to hunt down some problems in a small tool that had a couple of modules. It was nice not only to demonstrate the debugger itself, but also because I could show some neat Perl 6 code along the way.

So, what new things can you expect from the debugger that will be bundled in the Rakudo Star release that will be emerging in the next few days? Here’s a quick rundown.

Attributes that are in scope can now be introspected, just by typing their name. Additionally, the self keyword is recognized as a variable name by the debugger, meaning you can look at it directly (just type “self” and press enter) or even look at public attributes (“self.name”).

Before, if you were on a line of code that was going to make a call, you would always step in to the callee on pressing enter. Now, if you type “s” and press enter, you will step over the call. Got bored of debugging the current routine and want to let it run its course, and break on the next statement after it returns? Just type “so” and press enter, and you will step out of the current routine.

Trace points are perhaps the most powerful of the new features. They enable you to add print statements to your code, without actually adding print statements to your code. A trace point is like a break point, but instead of breaking, it just evaluates an expression and logs it. Later on, you can view the log.

To go with Rakudo’s improving support for the P5 adverb on regexes, which allows the use of Perl 5 regex syntax from within Perl 6, the debugger now also supports single-stepping through those Perl 5 regexes.
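
For reference, that adverb looks like this (a sketch; the :P5 spelling comes from the design docs, and the string here is invented):

# Perl 5 regex syntax inside Perl 6, via the :P5 adverb.
if "report-2012.txt" ~~ m:P5/(\d{4})/ {
    say $0; # 2012
}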

There are also a range of small fixes and tweaks that avoid some of the noise you could get before. For example, fails no longer count as exception throws that cause an rt (run until throw) to break, and sigspace in rules is no longer single-stepped; the debugger jumps straight to the next interesting atom.

What of plans for the future? There are some other ideas already in the issues queue. Beyond those, I’m planning to make a web-based API frontend to the debugger, to go alongside the command line one. This should allow a browser-based debugging interface to be built, but should also enable tools like Padre to integrate with the Rakudo Debugger.

Enjoy the updates!

Posted in Uncategorized | Leave a comment