The Rakudo move to QAST: progressing nicely

It’s been a little while since I wrote anything here. After all the work getting the new regex engine bootstrapped and alternations having longest token matching semantics, I’ve been taking things just a little bit easier. Only a little bit though…things have still been moving along nicely. :-)

My current focus is on getting Rakudo switched over to QAST, our refreshed abstract syntax tree design and implementation. What is an AST, you may wonder? Basically, it’s just a tree representation of the behavior of a program. As we parse programs in Perl 6, the grammar engine builds a parse tree. This is very tied to the syntax of the program, whereas an AST is all about semantics. A chunk of code known as the actions transforms pieces of the parse tree into an abstract syntax tree. Not all elements of your program are represented in the AST, however. Declarations are handled differently, through a mechanism called the world. If you pre-compile a module, the declarative bits are serialized; the AST, on the other hand, represents things that will actually run, so it needs to be turned into code for the target runtime.
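
To make that split a little more concrete, here's a rough user-space sketch; the Calc grammar, CalcActions class and the Node type are all invented for illustration (the real compiler's actions build QAST nodes, and are written in NQP):

    # An invented, minimal AST node type, standing in for the real QAST nodes.
    class Node {
        has $.op;
        has @.children;
    }

    grammar Calc {
        rule TOP   { <term> '+' <term> }
        token term { \d+ }
    }

    # The actions turn pieces of the parse tree into AST nodes via make/.ast.
    class CalcActions {
        method term($/) { make Node.new(op => 'int', children => [ +$/ ]) }
        method TOP($/)  { make Node.new(op => 'add', children => [ $<term>[0].ast, $<term>[1].ast ]) }
    }

    say Calc.parse('1 + 2', :actions(CalcActions.new)).ast;

The parse tree for '1 + 2' mirrors the source text; the Node tree only captures the meaning (an addition of two integers), which is the sort of thing the code generator cares about.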

Thus, the biggest part of QAST isn’t the AST nodes themselves. It’s the code that takes them and produces lower level code for a target VM; in Rakudo’s case, that is currently just Parrot, but we’re looking forward to targeting other VMs in the not too distant future also.

QAST is the successor to PAST (Q is just P++), and has a great deal in common with it. This is because actually, PAST is pretty nice. So what makes QAST nicer?

  • It’s implemented in NQP, not PIR. This makes it much more pleasant to work on.
  • It’s much more closely integrated with other parts of the compiler tool chain, including the bounded serialization support and 6model-based type system. All of the old stuff for talking about types as just string names is dead and buried.
  • Since the AST nodes are now 6model objects, we can use natively typed attributes in many places. This should lead to memory usage improvements during compilation.
  • The nodes being 6model objects means we can just serialize them using the existing serialization support. Why do we want to serialize AST nodes, you may ask? There are two immediate cases we have. One is masak++’s macros work, when a macro lives in a pre-compiled module. The other is because we want simple routines that can be inlined into other compilation units by the optimizer to carry their AST along with them. At the moment we support this but…it’s a very restricted hacky solution. Now with QAST I can toss a bunch of code I never really liked and simultaneously get a more capable optimizer. Win.
  • It’s a bit smarter about compiling lexical accesses, which will result in better code in some cases.
  • It handles native types better.
  • The nqp::op abstraction layer is now incorporated into QAST itself. This means that the optimizer and other bits of the compiler can work completely in terms of abstract operations, independent of any particular virtual machine. This will help with porting, but incidentally has made various bits of code cleaner too (the sketch after this list shows some abstract ops in use).
  • The nqp::op layer has also been unified with the notion of operations in general (what used to be pasttype). In addition to this, operations can be overridden at a HLL level (meaning Rakudo can supply its view of operations where they differ from the simpler view NQP takes). This has been put to use already in the updated box/unbox model.
  • Exception handlers were somewhat coupled to blocks in PAST. In QAST the handlers model is a bit different; in my view, it’s also simpler and more flexible than what came before.
  • VM specific things are now all arranged under a QAST::VM node, which provides an escape hatch where it’s needed.
  • Serialization of declarative elements now takes place during the final QAST compilation phase. This is significant because it opens the door to implementing optimizations that twiddle with declarative bits of the program. In the way things were set up in PAST, by the time the optimizer got hold of things, it was already too late.
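
As a rough illustration of several of the points above, here is what a small QAST tree might look like inside the compiler. This is NQP code; the node and option names follow the design described here, but treat it as a sketch rather than a reference:

    # A block that declares a lexical, binds a string into it and says it.
    # QAST::Op uses abstract operation names (the nqp::op layer), which the
    # QAST compiler maps to whatever the target VM provides.
    my $block := QAST::Block.new(
        QAST::Var.new( :name('$greeting'), :scope('lexical'), :decl('var') ),
        QAST::Op.new(
            :op('bind'),
            QAST::Var.new( :name('$greeting'), :scope('lexical') ),
            QAST::SVal.new( :value('Hello from QAST') )
        ),
        QAST::Op.new(
            :op('say'),
            QAST::Var.new( :name('$greeting'), :scope('lexical') )
        )
    );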

There’s various other cleanups and improvements too, but I think the list above captures a lot of the niceness.

So, what’s the current status? I decided to go with updating Rakudo first, and do NQP later on. The overall plan of attack was:

  1. Get QAST nodes defined and a QAST::Compiler in place that covers a lot of what PAST::Compiler could do before.
  2. Keep using the original PAST compiler to build a perl6 executable, the core setting and Test.pm. Then also start building a parallel version of the compiler (qperl6.exe) that would use QAST.
  3. Get the basic sanity tests passing using qperl6 (note, the core setting is compiled with the original PAST-based perl6 executable still).
  4. Next up, start attacking the spectests, and make a significant dent into them.
  5. Once it’s capable of doing so, start using qperl6 to compile Test.pm also.
  6. Continue tackling more of the spectests, to get most of them passing again.
  7. Get the optimizer up and running with QAST, and make sure it doesn’t cause any regressions to the spectests.
  8. Get qperl6 able to compile the core setting.
  9. Make the qperl6 code be the main compiler code (so the perl6 executable is now using the QAST toolchain).
  10. Sync up latest changes from the main development branch, triage the remaining handful of spectest issues and module space issues.
  11. Merge! Beer!

Note that these steps aren’t at all similar in size. At the moment, thanks to much hacking by masak++ and myself, design input from pmichaud++ and feedback from moritz++ and tadzik++, we’re up to steps 6 and 7. This means that Rakudo using QAST is able to compile, run and pass the majority (I’d say about 95%) of the spectests, and that we have Test.pm being compiled with QAST-based Rakudo also. My current work is updating the optimizer and taking care of the last few things that I know absolutely have to be sorted out before trying to compile the core setting. I’m currently optimistic that by the time the weekend is over, I’ll be onto step 9. I’m hopeful that by the following weekend, step 11 will have happened, and thus the August Rakudo release will be QAST based.

The standard for this branch landing is the same as for qbootstrap and altnfa, the last two sizable branches we landed: no regressions in the spectests and module space (unless, of course, they relied on bugs that this branch fixes). So from a user perspective, there’s nothing much to fear from this change, and plenty of nice things to look forward to that it will make possible.


LTM for alternations

This month’s Rakudo development work has already seen us switch to the new QRegex grammar engine for parsing Perl 6 source, unifying it with the mechanism for user-space grammars and regexes. A week and a bit on, another major improvement in this space has also landed: alternations now participate in Longest Token Matching, as per spec. What does this mean? To give a simple example:

> say "beer" ~~ /be|.*/
q[beer]

Here, the second branch of the alternation wins – because it matches more characters than the first branch. This is in contrast to sequential alternation (which you are likely more used to), which is done with the || operator in Perl 6:

> say "beer" ~~ /be||.*/
q[be]

The || may remind you of the short-circuiting “or” operator, which is exactly what a sequential alternation in a regex does: we try the possibilities in order and pick the first one that matches. On the other hand, the | is a reminder of the “any” junction constructor, which is analogous to what happens in a regex too: we process all of the branches with a parallel NFA, trimming impossible options as we go, and the one that matches the most characters wins. If more than one branch’s declarative prefix matches, we try them longest first until one of them matches.

Note that – just like with protoregexes – the thing we actually use the NFA on is the declarative prefix. Perl 6 regexes are a mixture of declarative and procedural; the switch between them is seamless. The declarative bits are amenable to processing with an NFA.
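
For example (a made-up case to illustrate the idea), the empty code block below is procedural, so the first branch’s declarative prefix ends after “bar”; the second branch’s five-character prefix is longer, so it should win, even though the first branch could match more of the string:

> say "barbell" ~~ / bar { } bell | barbe /
q[barbe]

Drop the { } and the first branch’s declarative prefix becomes the full “barbell”, at which point it wins instead.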

Longest token matching is not only a Perl 6 user-space feature, but also used when parsing Perl 6 – and this goes for alternations too. In fact, the ability to quickly decide which branch to take out of a bunch of possible options is also important for parsing performance. STD, the standard grammar, is written so that trying things sequentially will usually give a correct parse. However, there are exceptions, and up until now they have been problematic. With this work, we now come closer to parsing things the way the standard grammar does. In fact, a lot of the tweaks I had to make in order to get the Perl 6 grammar to parse things correctly again after implementing longest token matching for alternations were a case of aligning it more closely with STD, which is decidedly encouraging.

So, the branches in NQP and Rakudo containing this work have landed. Once again, it was a fairly deep and significant change, and pulling it off has involved various other improvements along the way (such as making tie-breaking by declaration order work reliably). Happily, the improvements we’ve made because we dogfood the grammar engine to parse Perl 6 source will also make things better for those writing grammars in Rakudo. I merged it this evening, with no regressions in the spectests or in module space tests.

While I’ve put in most of the commits on this work, it certainly wasn’t a one person effort. pmichaud++ is once again to thank for the excellent design work behind this, and moritz++, tadzik++ and kboga++ have both helped with testing, fixing tests that had bad assumptions about LTM semantics and fixing Pod parsing to work with the new alternation semantics.

The next release is still two weeks off. I expect to spend my tuits, which should be in reasonable supply, on various follow-up tweaks as a result of the regex engine work, pre-compilation improvements and diving into the QAST work, which I’m hopeful will land in time for the July release. Meanwhile, stay tuned: I expect pmichaud++ will have some nice news about what he’s been cooking up for the June release coming up soon. :-)


Rakudo switched over to QRegex

In my last post – just two days ago – I talked about how the work to switch Rakudo over to using QRegex for parsing Perl 6 source was going well. I guessed there was a 90% chance we’d land it well in time for the next release, hoping it would happen sometime in the next week.

Well, after a flurry of fixes and testing, with contributions from moritz++, diakopter++ and tadzik++, it landed today. We got to the point where the spectests showed up zero regressions, which was a very encouraging sign. Then, tadzik++ did a run of Ementaler (automatic building and testing of all the modules) to see how the module ecosystem had fared in the transition. That caught one issue, which was easily fixed. After that…no regressions there either. And not only did we not regress on any spectests, but the improved LTM meant some previously failing spectests are now passing also.

So, it’s merged. NQP is now bootstrapped using a regex implementation written in itself, Rakudo now uses QRegex for parsing, and the next release will ship with it. And there’s still a good three weeks to go until the next release to tune it – and, of course, for plenty more general Rakudo development.


Switching to QRegex for parsing Perl 6 source

I planned to blog when I finished hacking on Rakudo stuff today. The headline has shifted about three times during the day, however, as I was able to make progress much faster than expected. Well, also because I was having fun and have stayed up doing this until 3am… :-)

So, what am I up to?

Since last summer, Rakudo has used two distinct regex engines. One is QRegex. If you’ve been writing grammars and regexes in Rakudo, you’ve been using it. It’s the shiny new one, doing LTM (Longest Token Matching) with an NFA, tracking the (now very occasional) regex spec changes and getting new features. It is implemented in NQP. The other one is far older, doesn’t do LTM beyond literal prefixes, and is written in PIR. We’ve continued to use this for parsing Perl 6 source in Rakudo, as well as parsing NQP source in NQP.

I think it goes without saying that we’d rather have just one regex engine, and that working in a high level language is much more productive. So, we’ve wanted to switch everything over to QRegex for quite a while. At the same time, there’s been many other more pressing things to do, so the switch has been a “nudge it along now and then” project.

The work towards this has been ongoing for a few months in the qbootstrap branch in the NQP repository. The “q” comes from QRegex, but why “bootstrap”? Well, because there’s a fun little bit of circularity going on here. QRegex is written in NQP, but at the same time we want to use it as the engine for parsing NQP. It has to be able to parse and compile itself. Mix that with bounded serialization, and you need to make darn sure you’re doing your separate compilation right. Anyway, if you were wondering, “why didn’t you write it in NQP in the first place”, now you know why. You have to be able to parse NQP before you can write a program in NQP. :-)

Anyway, here’s what I thought I’d be excitedly reporting earlier on today.

Finally, the “qbootstrap” branch bootstraps!

This is a really nice milestone. In the last couple of days, I’ve been picking away at the qbootstrap branch, tracking down the last test failures, trying to make sure that an NQP built with QRegex was going to be able to compile itself. Not just that, but also that the compiled output would in turn be able to compile itself again. Finally, I managed to get there today. If you build the qbootstrap branch in NQP now, the only regex engine you’re using is QRegex. The older one isn’t built in the branch; in fact, the files are removed just to be really sure all was working properly.

And that would have been a nice thing to report, but…

The Rakudo compiler executable builds on NQP/qbootstrap!

It was always clear that Rakudo would need changes. Having got an NQP that does everything with QRegex to hand, I set it on building the Rakudo compiler. Note that when I say compiler, I do not mean CORE.setting – the 11,000+ line library with all the built-ins. Just the core of the compiler itself.

I was ready for trouble here. Rakudo’s grammar is notably more complicated than the one in NQP. However, in the end it was all relatively painless. A few updates to syntax, a few tweaks to use updated libraries, and…it built! That didn’t mean it had produced a parser that would actually work, of course. But that it got through all of the compiler source with so few complaints was very encouraging.

So, I thought, with that done, why not hack on? And so now…

The Rakudo CORE.setting builds on NQP/qbootstrap!

This took more effort. I was ready for this to take O(days), not O(hours). Especially when the first thing I got was a segfault a fraction of a second into the compilation process. Thankfully, that turned out to be an easy fix. And then we got up to…line 106 and failed. What was wrong?

After tracing what the protoregex LTM calculations were doing, it soon became apparent that it was trying the wrong branch first. Happily, though, the fix was to delete a lookahead. Even nicer, it was one we had added in order to work around the lack of proper transitive LTM in the previous regex engine. Well, that was a win. Onto the next failure…

Over the next couple of hours, I tracked down and fixed a variety of issues, some in the grammar and some in QRegex itself. And finally…it built! But what about the spectests?

The spectest fallout is not too bad

The first run through the spectests looked horrible, with over a third of the test files – if not more – showing up with issues. Thankfully, I quickly realized that many of them shared a common and easily fixed root cause: a bug in handling a very common syntactic construct (quote words, of all things!).

The second run – and the way things stand now – looks much better. Out of over 600 test files, fewer than 30 exhibit failures that will need to be investigated. Given that 24 hours ago I didn’t even have QRegex able to build NQP, this is great progress. Of course, while I’ve spearheaded this part of the work, much credit must go to pmichaud++, who built the vast majority of QRegex.

The plan from here

Earlier today, I expected to be saying in this post that there’s a 50/50 chance we’d ship the June Rakudo release with QRegex used for parsing. By now, I’d say there’s at least a 90% chance we will do so. The next monthly compiler release isn’t until the 21st of June; I’d want to land such a big switchover at least two weeks before that, and that currently feels very achievable. I’ll keep you posted.

What does this actually mean for Rakudo end users?

This work isn’t so much about immediate wins as it is about having a good base to build another round of user-facing improvements atop of. That said, there will be some nice improvements available right away.

  • The work done in the qbootstrap branch has vastly improved protoregex handling. The NFA evaluation currently stands at around 15-20 times faster, and the NFA construction is now transitive into subrule calls to other protoregexes (both a correctness and a parse performance win). Your own grammars will benefit from these improvements; there’s a small example after this list.
  • Parsing Perl 6 is a pretty decent stress test for the engine, and of course it showed up some bugs, which have been fixed. Again, this improves the experience for those using grammars and regexes in Rakudo.
  • Having one regex engine in memory instead of two should give a slight memory usage reduction.
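
As a small illustration of the protoregex point above (a made-up grammar, not from the test suite): LTM gathers all the candidates of a proto and tries the one whose declarative prefix matches the most characters first.

    grammar Expr {
        proto token op {*}
        token op:sym<+>  { <sym> }
        token op:sym<++> { <sym> }
        token TOP        { <op> }
    }
    say Expr.parse('++');    # the '++' candidate wins over '+', so the whole string parses

The transitive part means that when a candidate itself starts with a call to another protoregex, the NFA construction now follows that call rather than stopping at the subrule boundary.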

Here’s some of the follow-up things that will come along as a result of this.

  • Once some of the “get stuff working” bits of the implementation are optimized, Rakudo should be able to parse Perl 6 source faster than before.
  • Work on precedence traits for user defined operators should now be unblocked.
  • Further convergence with the way STD parses things should be possible.
  • We can start considering how to implement things like slangs.

Further, if one of the things you’re looking forward to is Rakudo running on a VM besides Parrot, then this work also eliminates one of our last two large components written in PIR.

Anyways, it’s decidedly now time for some sleep. ‘night!


Since the Hackathon…

Last time I got around to writing here was while I was at the Oslo Hackathon. It was a truly great event: hugely productive, a great deal of fun and a real motivation booster too. I’d like to again thank Oslo.pm, and especially Salve, Rune and Jan, for thinking to organize this, and then making a superb job of doing so. Everything ran smoothly, there was lots of undistracted hacking time, and a couple of evening dinner and beer outings provided chance to enjoy time together as a team in real life, without the restricted bandwidth IRC normally enforces on us.

After the hackathon, it was right back to $dayjob, though in a fun way: teaching Git for a couple of days. Less fun was the couple of days after, which involved a lot of pain followed by a lot of noisy drilling at my local dental clinic. It took some days beyond that before I stopped feeling unusually tired… Anyways, things are back to normal again, or as normal as things ever tend to look for me… :-)

So, what’s been going on in Rakudo land since the hackathon? Lots and lots, as it turns out. Most public-facing, the April 2012 Rakudo Star release landed; moritz++ is to thank for getting the release out of the door this month. It’s the first distribution release to incorporate the bounded serialization work, which delivers the much-improved startup time. While I’ve talked about many of the compiler side improvements in previous posts, Rakudo Star also includes modules, and we had some exciting additions in this release too: Bailador (a Perl 6 port of Dancer), LWP::Simple and JSON::RPC.

Of course, one nice release out the door means time to start working on things to make the next one interesting. Here’s some of what to expect.

  • Following on from removing “lib” from the default @*INC, the current working directory (“.”) is also gone now. While they may be nice for development, they are not good defaults from a security perspective when you actually run stuff in a production environment. To make this removal more comfortable, in addition to supplying search paths through the PERL6LIB environment variable (as has been supported for a long while), the -I command line option and “use lib” have now been implemented, and will be in the next release (see the examples after this list). moritz++ is primarily to thank for this work.
  • I realized that now we have LEAVE and UNDO phasers, I could implement “temp” and “let”. “temp” saves away the current value of a variable, and arranges for that value to be restored when the block is left, so any changes made to it will be undone. “let” is a way of making hypothetical changes to a variable; if the block is left due to an exception or if it evaluates to an undefined value, the change will be rolled back just like “temp”, but a successful completion of the block will leave the value in place. Both are now implemented; there’s a short sketch after this list.
  • Several days back, moritz++ asked me for some inspiration of what to hack on. Amongst my suggestions was working on tagged imports/exports. And sure enough, with me throwing the odd commit in now and then, we have ’em working. Actually the main thing I did was improve the trait handling to support passing the tags to export; some work had been left undone awaiting real serialization, which we now have, so happily it was fairly straightforward to sort things out there.
  • pmichaud++ has been working on making things better if you build Parrot and Rakudo without ICU. Previously, any case-insensitive string match – even if it didn’t use anything beyond ASCII – would immediately die over a missing ICU. Now you can get away with that.
  • A while back I did a first cut implementation of “ff”, the flip-flop operator. It sorta worked, but we missed the “fff” form and, true to form, masak++ managed to find a bug to submit too. :-) Today I tossed the previous approach and re-did it, using a state variable for the underlying storage (since I did the first cut at it, the spec was updated to explicitly say that it should be done with a state variable). I also got the “fff” form in place. We now pass the vast majority of spectests for this feature; there’s an example after this list.
  • A few days back, tadzik++ showed up with a snippet of code that ran insanely slowly. It was doing regex interpolation: / <$search_term> / or the like (an example follows this list). Here, if $search_term is a string, it will be compiled to a regex and evaluated. Turns out that we weren’t caching that compilation with the string, though. So, if it had to scan or if you backtracked over the <$search_term>, then tried it again, you’d end up compiling it every time. A little caching later and things went rather faster. I was happy to have a profiler to hand; the call counts into the compiler very quickly showed what was up.
  • Our END phasers didn’t always run when you’d hope they did. In the last few days, moritz++ ensured they run on exceptional exits, and I made sure they run when execution is terminated using the “exit” function.
  • This evening, kboga++ showed up with a patch to turn Real, which ended up as a class during the early days of the “nom” development branch, into a role as it should be. This enabled custom numeric types. It also enabled us to run real-bridge.t, which brought an extra 200 spectests, pushing us nicely over the 22,000 passing spectests mark.
  • In various other fixes: //=, ||= and &&= now short-circuit properly, reduction meta-operators on list-associative ops do the right thing (pmichaud++), %*ENV propagates changes properly (tadzik++), ms// now works (moritz++) and enumeration types can now turn themselves into roles and be composed or mixed in to things.
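
By way of illustration (the paths here are only examples), the three ways to add a search path look like this:

    use lib 'lib';                    # from inside the program
    perl6 -Ilib script.p6             # the command line option
    PERL6LIB=lib perl6 script.p6      # the environment variable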
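
And here’s a rough sketch of how “temp” and “let” behave (example code, not from the release itself):

    my $x = 'original';
    {
        temp $x = 'changed';
        say $x;            # changed
    }                      # leaving the block restores the saved value
    say $x;                # original

    my $y = 1;
    try {
        let $y = 2;
        die 'oops';        # the block is left via an exception...
    }
    say $y;                # ...so the hypothetical change is rolled back: 1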
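
The flip-flop is easiest to see in a loop (again, just a sketch):

    for 1..10 {
        .say if $_ == 3 ff $_ == 6;    # prints 3, 4, 5 and 6
    }

The difference with “fff” is that it does not test the right hand side on the same element that just flipped it on, which matters when both sides could be true at once.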
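
And to illustrate the regex interpolation construct mentioned above (the pattern string here is invented):

    my $search_term = 'be+r';
    say "beer" ~~ / <$search_term> /;    # the string is compiled to a regex, then matched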

There’s still over a week before the next compiler release, and I’m sure there will be some more things beyond this. It’s already looking like a nice bunch of improvements, though.

In other news, the work on QAST has also been chugging along. QAST is a replacement for PAST, which is a set of AST nodes together with something that maps them down to POST (Parrot Opcode Syntax Tree), which then becomes PIR code (which Parrot then turns into bytecode and runs). In essence, it’s part of the compiler guts. The not-quite-gone-yet old regex engine aside, PAST is the only significant part of our codebase that is still written in PIR (an assembly-ish language for Parrot). It also predates 6model, bounded serialization and many years of learned lessons. All that said, it gets a lot right, so much will stay the same.

QAST is essentially a port of PAST to NQP that simultaneously leaves out things we came to realize were bad or unneeded, adds much better 6model integration, unifies the notion of “ops” somewhat, improves native type support and takes advantage of native types in its implementation to make the AST nodes more compact, so we can save memory during compilation. Additionally, it will unblock masak++’s work on quasi splicing in macros, and is a step towards Rakudo targeting other backends being a reality rather than a nice idea. So, lots of win lies ahead…after the hard work of getting it landed. I’m hoping to make some significant steps forward on it during May.

Last but not least, masak++ and I will be heading over to the Bristol IT Megameet the weekend after next, giving a 30 minute joint talk on Perl 6, followed by a tutorial lasting a couple of hours. Looking forward to it – and hopefully I’ll be able to sneak a British ale in while I’m over there too. :-)


Hackathoning in Oslo

I’m in Oslo with a bunch of Perl 6 folks. It’s great to see old friends and meet some new ones – and we’re having a highly productive time. After a wonderful evening of tasty food and lovely beer (amongst others, a delicious imperial stout) yesterday, today has been solidly focused on Getting Stuff Done.

I’m a little tired now, so here’s just a quick rundown of some of what I’ve been up to.

  • Since bounded serialization landed, we’ve had a few issues with pre-compilation of MiniDBI, the database access module. I’ve now tracked down all of the remaining issues there and fixed them, and happily moritz++ has been hacking lots on improving the module in other ways too. The MySQL and Postgres drivers now pass all the tests we have for them, which is some nice progress.
  • I’ve been answering a few questions for arnsholt++, who has picked up the Zavolaj (native calling) module where I left off, adding support for passing/returning arrays of structs, structs pointing to arrays and various other permutations. This will greatly improve the range of C libraries that can be used with it.
  • I had a design session with pmichaud++ on QAST, the successor to our current AST. The new nodes will integrate far better with 6model and bounded serialization, give us better native type handling and be much more memory efficient due to being able to use natively typed attributes in them. This is also a key part of our work towards getting Rakudo up and running on an extra back end.
  • After that, I got the nodes fleshed out somewhat, and have started a little work on QAST::Compiler too. It’s underway!
  • I also spent some time in the ticket queue and fixed a bunch of Rakudo bugs: constant initializers containing blocks now work, state declarations together with list assignment work, $.foo/@.foo/%.foo now contextualize as they should, and :i now also applies to interpolated variables.

So, lots of stuff – and that’s just the things I’ve been directly involved with.  It’s nice to be a part of this hive of activity…and tomorrow there’s another day of this! Catch you then. :-)


Back from vacation, hackathon coming up!

So, I’m back from vacation. Turns out Argentina is a pretty awesome place to vacation in, too. As well as wonderful food and delicious imperial stout (amongst other good beers), there was walking like this…

…and other cool stuff, like glaciers…

…and so even though the laptop came with me, it was just too much fun to be outside, especially when the weather was good so much of the time. I did sneak in a few patches, though, most notably implementing PRE and POST phasers.

Anyway, I’m safely back, after an 8 hour flight delay from Buenos Aires and a small bus accident at Frankfurt airport. Yes, this airport fails SO hard they managed to screw up the 2 minute bus trip from the plane to the terminal…anyway, I got off with just a few small cuts. Suggest taking the train to YAPC::Europe this summer… :-)

So, what’s coming up? Well, this month brings a Perl 6 hackathon in Oslo, where I look forward to being together with a bunch of other Rakudo and Perl 6 folks. I’m sure we’ll get some nice stuff done, and some future directions worked out. I’m happy that one of the most industrious Perl Mongers groups I know when it comes to organizing such events is also set in a very pleasant city situated in a beautiful country. :-) By the way, it’s still very possible (and very encouraged) to sign up if you want to come along.

As moritz++ noted on rakudo.org, we’re skipping doing a distribution (Star) release based on the March compiler release since an unfortunate bug slipped in that busted precompilation of modules that used NativeCall. We hold ourselves to higher standards of stability in the distribution releases (which are user focused) than the compiler ones (which just ship at the same time each month), and this would have been too big a regression. The good news is that I’ve patched the bug today, so we’re now all clear for doing an April one – and what a nice release it should be.

Well, time for dinner here – which I’ll be having with masak++. No doubt macros will come up, and what’s needed to get us along to the next level with those. Stay tuned; the next month should be interesting in Rakudo land. :-)
