Rakudo switched over to QRegex

In my last post – just two days ago – I talked about how the work to switch Rakudo over to using QRegex for parsing Perl 6 source was going well. I guessed there was a 90% chance we’d land it well in time for the next release, hoping it would happen sometime in the next week.

Well, after a flurry of fixes and testing, with contributions from moritz++, diakopter++ and tadzik++, it landed today. We got to the point where the spectests showed up zero regressions, which was a very encouraging sign. Then, tadzik++ did a run of Ementaler (automatic building and testing of all the modules) to see how module ecosystem had fared the transition. That caught one issue, which was easily fixed. After that…no regressions there either. And not only did we not regress on any spectests, but the improved LTM meant some previously failing spectests are now passing also.

So, it’s merged. NQP is now bootstrapped using a regex implementation written in itself, Rakudo now uses QRegex for parsing, and the next release will ship with it. And there’s still a good three weeks to go until the next release to tune it – and, of course, for plenty more general Rakudo development.

Posted in Uncategorized | 2 Comments

Switching to QRegex for parsing Perl 6 source

I planned to blog when I finished hacking on Rakudo stuff today. The headline has shifted about three times during the day, however, as I was able to make progress much faster than expected. Well, also because I was having fun and have stayed up doing this until 3am… :-)

So, what am I up to?

Since last summer, Rakudo has used two distinct regex engines. One is QRegex. If you’ve been writing grammars and regexes in Rakudo, you’ve been using it. It’s the shiny new one, doing LTM (Longest Token Matching) with an NFA, tracking the (now very occasional) regex spec changes and getting new features. It is implemented in NQP. The other one is far older, doesn’t do LTM apart from for literal prefixes, and is written in PIR. We’ve continued to use this for parsing Perl 6 source in Rakudo, as well as parsing NQP source in NQP.

I think it goes without saying that we’d rather have just one regex engine, and that working in a high level language is much more productive. So, we’ve wanted to switch everything over to QRegex for quite a while. At the same time, there’s been many other more pressing things to do, so the switch has been a “nudge it along now and then” project.

The work towards this has been ongoing for a few months in the qbootstrap branch in the NQP repository. The “q” comes from QRegex, but why “bootstrap”? Well, because there’s a fun little bit of circularity going on here. QRegex is written in NQP, but at the same time we want to use it as the engine for parsing NQP. It has to be able to parse and compile itself. Mix that with bounded serialization, and you need to make darn sure you’re doing your separate compilation right. Anyway, if you were wondering, “why didn’t you write it in NQP in the first place”, now you know why. You have to be able to parse NQP before you can write a program in NQP. :-)

Anyway, here’s what I thought I’d be excitedly reporting earlier on today.

Finally, the “qbootstrap” branch bootstraps!

This is a really nice milestone. In the last couple of days, I’ve been picking away at the qbootstrap branch, tracking down the last test failures, trying to make sure that an NQP built with QRegex was going to be able to compile itself. Not just that, but also that the compiled output would in turn be able to compile itself again. Finally, I managed to get there today. If you build the qbootstrap branch in NQP now, the only regex engine you’re using is QRegex. The older one isn’t built in the branch; in fact, the files are removed just to be really sure all was working properly.

And that would have been a nice thing to report, but…

The Rakudo compiler executable builds on NQP/qbootstrap!

It was always clear that Rakudo would need changes. Having got an NQP that does everything with QRegex to hand, I set it on building the Rakudo compiler. Note that when I say compiler, I do not mean CORE.setting – the 11,000+ line library with all the built-ins. Just the core of the compiler itself.

I was ready for trouble here. Rakudo’s grammar is notably more complicated than the one in NQP. However, in the end it was all relatively painless. A few updates to syntax, a few tweaks to use updated libraries, and…it built! That didn’t mean it had produced a parser that would actually work, of course. But that it got through all of the compiler source with so few complaints was very encouraging.

So, I thought, with that done, why not hack on? And so now…

The Rakudo CORE.setting builds on NQP/qbootstrap!

This took more effort. I was ready for this to take O(days), not O(hours). Especially when the first thing I got was a segfault a fraction of a second in to the compilation process. Thankfully, that turned out to be an easy fix. And then we got up to…line 106 and failed. What was wrong?

After tracing what the protoregex LTM calculations were doing, it soon became apparent it was trying to wrong branch first. Happily, though, the fix was to delete a lookahead. Even nicer, it was one we had got in order to work around lack of proper transitive LTM in the previous regex engin. Well, that was a win. Onto the next failure…

Over the next couple of hours, I tracked down and fixed a variety of issues, some in the grammar and some in QRegex itself. And finally…it built! But what about the spectests?

The spectest fallout is not too bad

The first run through the spectests looked horrible, with over a third of the test files – if not more – showing up with issues. Thankfully, I quickly realized that many of them shared a common and easily fixed root cause, due to a bug in handling a very common syntactic construct (quote words, of all things!)

The second run – and the way things stand now – looks much better. Out of over 600 test files, less than 30 exhibit failures that will need to be investigated. To say that within the last 24 hours I didn’t even have QRegex able to build NQP, this is great progress. Of course, while I’ve spearheaded this part of the work, much credit must go to pmichaud++, who built the vast majority of QRegex.

The plan from here

Earlier, I expected in this post to be saying there’s a 50/50 chance we’d ship the June Rakudo release with QRegex used for parsing. By now, I’d say there’s at least a 90% chance we will do so. The next monthly compiler release isn’t until the 21st of June; I’d want to land such a big switchover at least two weeks before that, and that currently feels very achievable. I’ll keep you posted.

What does this actually mean for Rakudo end users?

This work isn’t so much about immediate wins as it is about having a good base to build another round of user-facing improvements atop of. That said, there will be some nice improvements available right away.

  • The work done in the qbootstrap branch has vastly improved protoregex handling. The NFA evaluation currently stands at around 15-20 times faster, and the NFA construction is now transitive into subrule calls to other protoregexes (both a correctness and parse performance win). Your own grammars will benefit from these improvements.
  • Parsing Perl 6 is a pretty decent stress test for the engine, and of course showed up some bugs, with have been fixed. Again, this improves the experience for those using grammars and regexes in Rakudo.
  • Having one regex engine in memory instead of two should give a slight memory usage reduction.

Here’s some of the follow-up things that will come along as a result of this.

  • Once some of the “get stuff working” bits of the implementation are optimized, Rakudo should be able to parse Perl 6 source faster than before.
  • Work on precedence traits for user defined operators should now be unblocked.
  • Further convergence with the way STD parses things should be possible.
  • We can start considering how to implement things like slangs.

Further, if one of the things you’re looking forward to is Rakudo running on a VM besides Parrot, then this work also eliminates one of our last two large components written in PIR.

Anyways, it’s decidedly now time for some sleep. ‘night!

Posted in Uncategorized | 2 Comments

Since the Hackathon…

Last time I got around to writing here was while I was at the Oslo Hackathon. It was a truly great event: hugely productive, a great deal of fun and a real motivation booster too. I’d like to again thank Oslo.pm, and especially Salve, Rune and Jan, for thinking to organize this, and then making a superb job of doing so. Everything ran smoothly, there was lots of undistracted hacking time, and a couple of evening dinner and beer outings provided chance to enjoy time together as a team in real life, without the restricted bandwidth IRC normally enforces on us.

After the hackathon, it was right back to $dayjob, though in a fun way: teaching Git for a couple of days. Less fun was the couple of days after, which involved a lot of pain followed by a lot of noisy drilling at my local dental clinic. It took some days beyond that before I stopped feeling unusually tired… Anyways, things are back to normal again, or as normal as things ever tend to look for me… :-)

So, what’s been going on in Rakudo land since the hackathon? Lots and lots, as it turns out. Most public-facing, the April 2012 Rakudo Star release landed; moritz++ is to thank for getting the release out of the door this month. It’s the first distribution release to incorporate the bounded serialization work, which delivers the much-improved startup time. While I’ve talked about many of the compiler side improvements in previous posts, Rakudo Star also includes modules, and we had some exciting additions in this release too: Bailador (a Perl 6 port of Dancer), LWP::Simple and JSON::RPC.

Of course, one nice release out the door means time to start working on things to make the next one interesting. Here’s some of what to expect.

  • Following on from removing “lib” from the default @*INC, the current working directory (“.”) is also gone now. While they may be nice for development, they are not good defaults from a security perspective when you actually run stuff in a production environment. Making this a more comfortable removal, as well as supplying search paths through the PERL6LIB environment (as has been supported for a long while), the -I command line option and “use lib” have now been implemented, and will be in the next release. moritz++ is primarily to thank for this work.
  • I realized that now we have LEAVE and UNDO phasers, I could implement “temp” and “let”. “temp” saves away the current value of a variable, and arranges for that value to be restored when the block is left, so any changes made to it will be undone. “let” is a way of making hypothetical changes to a variable; if the block is left due to an exception or if it evaluates to an undefined value, the change will be rolled back just like “temp”, but a successful completion of the block will leave the value in place. Both are now in place.
  • Several days back, moritz++ asked me for some inspiration of what to hack on. Amongst my suggestions was working on tagged imports/exports. And sure enough, with me throwing the odd commit in now and then, we have ‘em working. Actually the main thing I did was improve the trait handling to support passing the tags to export; some work had been left undone awaiting real serialization, which we now have, so happily it was fairly straightforward to sort things out there.
  • pmichaud++ has been working on making things better if you build Parrot and Rakudo without ICU. Previously, any case-insensitive string match – even if it didn’t use anything beyond ASCII – would immediately die over a missing ICU. Now you can get away with that.
  • A while back I did a first cut implementation of “ff”, the flip-flop operator. It sorta worked, but we missed the “fff” form and, true to form, masak++ managed to find a bug to submit too. :-) Today I tossed the previous approach and re-did it, using a state variable for the underlying storage (since I did the first cut at it, the spec was updated to explicitly say that it should be done with a state variable). I also got the “fff” form in place. We now pass the vast majority of spectests for this feature.
  • A few days back, tadzik++ showed up with a snippet of code that ran insanely slowly. It was doing regex interpolation: / <$search_term> / or the like. Here, if $search_term is a string, it will be compiled to a regex and evaluated. Turns out that we weren’t caching that compilation with the string, though. So, if it had to scan or if you backtracked over the <$search_term>, then tried it again, you’d end up compiling it every time. A little caching later and things went rather faster. I was happy to have a profiler to hand; the call counts into the compiler very quickly showed what was up.
  • Our END phasers didn’t always run when you’d hope they did. In the last few days, moritz++ ensured they run on exceptional exits, and I made sure they run when execution is terminated using the “exit” function.
  • This evening, kboga++ showed up with a patch to turn Real, which ended up as a class during the early days of the “nom” development branch, into a role as it should be. This enabled custom numeric types. It also enabled us to run real-bridge.t, which brought and extra 200 spectests, pushing us nicely over the 22,000 passing spectests mark.
  • In various other fixes: //=, ||= and &&= now short-circuit properly, reduction meta-operators on list-associative ops do the right thing (pmichaud++), %*ENV propagates changes properly (tadzik++), ms// now works (moritz++) and enumeration types can now turn themselves into roles and be composed or mixed in to things.

There’s still over a week before the next compiler release, and I’m sure there will be some more things beyond this. It’s already looking like a nice bunch of improvements, though.

In other news, the work on QAST has also been chugging along. QAST is a replacement for PAST, which is a set of AST nodes together with something that maps them down to POST (Parrot Opcode Syntax Tree), which then becomes PIR code (which Parrot then turns into bytecode and runs). In essence, it’s part of the compiler guts. The not-quite-gone-yet old regex engine aside, PAST is the only significant part of our codebase that is still written in PIR (an assembly-ish language for Parrot). It also predates 6model, bounded serialization and many years of learned lessons. All that said, it gets a lot right, so much will stay the same.

QAST really is a simultaneous port to NQP, leaving out things we came to realize were bad or unrequired, adding much better 6model integration, unifying the notion of “ops” somewhat, improving native type support and taking advantage of native types during its implementation to get the AST nodes more compact, so we can save on memory during the compilation. Additionally, it will unblock masak++’s work on quasi splicing in macros, and is a step towards Rakudo targeting other backends being a reality rather than a nice idea. So, lots of win lies ahead…after the hard work of getting it landed. I’m hoping to make some significant steps forward on it during May.

Last but not least, masak++ and I will be heading over to the Bristol IT Megameet the weekend after next, giving a 30 minute joint talk on Perl 6, followed by a couple of hour tutorial. Looking forward to it – and hopefully I’ll be able to sneak a British ale in while I’m over there too. :-)

Posted in Uncategorized | Leave a comment

Hackathoning in Oslo

I’m in Oslo with a bunch of Perl 6 folks. It’s great to see old friends and meet some new ones – and we’re having a highly productive time. After a wonderful evening of tasty food and lovely beer (amongst others, a delicious imperial stout) yesterday, today has been solidly focused on Getting Stuff Done.

I’m a little tired now, so here’s just a quick rundown of some of what I’ve been up to.

  • Since bounded serialization landed, we’ve had a few issues with pre-compilation of MiniDBI, the database access module. I’ve now tracked down all of the remaining issues there, fixed them. and happily moritz++ has been hacking lots on improving the module in other ways too. The MySQL and Postgres drivers now pass all the tests we have for them, which is some nice progress.
  • I’ve been answering a few questions for arnsholt++, who has picked up the Zavolaj (native calling) module where I left off, adding support for passing/returning arrays of structs, structs pointing to arrays and various other permutations. This will greatly improve the range of C libraries that can be used with it.
  • I had a design session with pmichaud++ on QAST, the successor to our current AST. The new nodes will integrate far better with 6model and bounded serialization, give us better native type handling and be much more memory efficient due to being able to use natively typed attributes in them. This is also a key part of our work towards getting Rakudo up and running on an extra back end.
  • After that, I got the nodes fleshed out somewhat, and have started a little work on QAST::Compiler too. It’s underway!
  • I also spent some time in the ticket queue and fixed a bunch of Rakudo bugs: constant initializers containing blocks now work out, state declarations together with list assignment work, $.foo/@.foo/%.foo now contextualize as they should, and :i now also applies to interpolated variables.

So, lots of stuff – and that’s just the things I’ve been directly involved with.  It’s nice to be a part of this hive of activity…and tomorrow there’s another day of this! Catch you then. :-)

Posted in Uncategorized | 1 Comment

Back from vacation, hackathon coming up!

So, I’m back from vacation. Turns out Argentina is a pretty awesome place to vacation in, too. As well as wonderful food and delicious imperial stout (amongst other good beers), there was walking like this…

…and other cool stuff, like glaciers…

…and so even though the laptop came with me, it was just too much fun to be outside, especially when the weather was good so much of the time. I did sneak in a few patches, though, most notably implementing PRE and POST phasers.

Anyway, I’m safely back, after an 8 hour flight delay from Buenos Aires and a small bus accident at Frankfurt airport. Yes, this airport fails SO hard they managed to screw up the 2 minute bus trip from the plane to the terminal…anyway, I got off with just a few small cuts. Suggest taking the train to YAPC::Europe this summer… :-)

So, what’s coming up? Well, this month brings a Perl 6 hackathon in Oslo, where I look forward to being together with a bunch of other Rakudo and Perl 6 folks. I’m sure we’ll get some nice stuff done, and some future directions worked out. I’m happy that one of the most industrious Perl Mongers groups I know when it comes to organizing such events is also set in a very pleasant city situated in a beautiful country. :-) By the way, it’s still very possible (and very encouraged) to sign up if you want to come along.

As moritz++ noted on rakudo.org, we’re skipping doing a distribution (Star) release based on the March compiler release since an unfortunate bug slipped in that busted precompilation of modules that used NativeCall. We hold ourselves to higher standards of stability in the distribution releases (which are user focused) than the compiler ones (which just ship at the same time each month), and this would have been too big a regression. The good news is that I’ve patched the bug today, so we’re now all clear for doing an April one – and what a nice release it should be.

Well, time for dinner here – which I’ll be having with masak++. No doubt macros will come up, and what’s needed to get us along to the next level with those. Stay tuned; the next month should be interesting in Rakudo land. :-)

Posted in Uncategorized | 2 Comments

Meta-programming slides, and some Rakudo news

I’m just back from a trip to the German Perl Workshop. Of course, the food was great and the beer was greater. :-) I was also very happy to see various Rakudo hackers in person; moritz++ was one of the organizers, and tadzik++ attended and gave a talk also. Much fun was had, and some hacking happened too. :-)

You may be interested to check out the slides from my talk on meta-programming in Perl 6. It covers some of the things we’ve been able to do for a while, as well as having a new example that shows off an interesting use of bounded serialization in combination with meta-programming. Also, I have now submitted this talk to the French Perl Workshop, which will take place in June. :-)

As for Rakudo progress, here’s some of the things that have been going on.

  • The bounded serialization branch has landed (and thus will be in the next release)! This has, as hoped, greatly decreased Rakudo’s startup time – an improvement that has been welcomed by a bunch of folks on #perl6. Module and setting pre-compilation is now also faster, along with loading of pre-compiled modules. Win all around.
  • Thanks to the serialization, various things have been unblocked. Amongst them, the constant declarator now takes non-literals on its RHS, and the BEGIN and CHECK phasers now work in r-value context. In both cases, the values are serialized if these things are done in a pre-compiled module.
  • masak++ merged his initial work on macros. While there’s much more to do in this area yet, it’s great that we now have the basics in place, enabling declarations of macros and quasis. I’m sure masak will blog about this soon – keep a lookout for it.
  • tadzik++ has got us some support for the Set, Bag, KeySet and KeyBag types.
  • moritz++ cleaned up match handling, and also implemented the :exhaustive and :nth match adverbs. So, that’s two out of three branches mentioned in my last post landed. :-)
  • Stunning progress on phasers! We now support ENTER, LEAVE, KEEP, UNDO and START. The LEAVE, KEEP and UNDO drew very heavily on work by mls++. Additionally, FIRST, NEXT and LAST are supported inside for loops (though not yet available in other loops). I’m fairly fired up to do PRE and POST too, though need some spec clarifications first.

moritz++ has also been continuing his excellent typed exceptions work; we discussed how to handle typed exceptions in the metamodel during the workshop, which should extend their reach to cover even more of the errors that can crop up.

So, all in all, lots of progress, and we’ve still just under two weeks to go until the March release. Having landed the bounded serialization work, I’ve been enjoying doing some feature development; right now, I have a branch that is muchly cleaning up name handling, improving name interpolation and getting various of the missing pseudo-packages in place. I expect to have this work completed in time to go in to the release also.

Beyond that, it’ll be time for me to dig back into some of the fundamentals again. Most immediately, that will be getting QRegex to be the engine we use to parse NQP and Perl 6 code (currently, we use QRegex for compiling the regexes and grammars you write in your Perl 6 programs, but still use the previous generation engine for parsing Perl 6 source). This will allow us to eliminate a vast swathe of Parrot PIR code that we depend on (the new regex compiler is written in NQP), aiding portability. I aim to land this in time for the April release; I’ve already done significant pieces of the work towards it.

Probably somewhat in parallel with that (though set to land after it), I’ll also be taking on our AST and AST compiler. This work will see it being ported from PIR to NQP, switched over to use 6model objects (which means we can use natively typed attributes to make the nodes more memory efficient, and also serialize the nodes as macros require) and given much better handling of native types. Generally I’m hoping for compilation performance improvements and better code generation. This will also eliminate our last large PIR dependency, thus clearing the final hurdle for making a real stab at targeting an extra backend.

One slight slowdown will be that I’m taking a vacation during the second half of March. I’m going to a land far away, and plan to spend most of my time seeing it, and checking out their food and drink. The laptop is coming, though, so it won’t be a complete stop; my stress levels are happily in fairly good shape right now, so I don’t need my vacation to be a complete break with normality, just a refreshing change of scene. :-)

Posted in Uncategorized | 2 Comments

Rakudo Star 2012.02 out – and the focus for 2012.03

I’ve just cut the February 2012 release of Rakudo Star. The good news is that we got a bunch of nice improvements into the release.

  • More typed exceptions, and better backtraces. This work has been done by moritz++; I’ll simply point you to his excellent blog post on that topic today. I think everyone working with Rakudo will appreciate the new backtraces, which cut out a lot of the uninformative details and are thus much more focused.
  • FatRat is now implemented. This type gives you rational number support with big integer semantics for both the numerator and denominator. The Rat type now correctly degrades to a Num if the denominator gets too large, as per spec (this avoids performance issues in certain cases). Again, we’ve moritz++ to thank for this.
  • We have object hashes! You can now make a declaration like:
    my %hash{Any};
    Which will have keys of type Any. You can constrain this as you wish. You can also still constrain the value in the usual way:
    my Int %hash{Any};
    This latter example gives you a hash with Int values and object keys, meaning that keys are not coerced to strings, as is the default. Thus, using %hash.keys will give you back the original objects. Identify is done based on .WHICH and thus ObjAt.
  • The coercion syntax, Int($foo), is now implemented. This gives you another – perhaps neater – way to write $foo.Int, but types may also implemented the method postcircumfix:<( )> if the target type is the one that knows how to do the coercion.
  • The reduction meta-operator used to have crazy bad performance, and parsed dodgy in some cases. Both of these issues are now resolved; it’s an order of magnitude faster and parses more correctly.
  • Various regex improvements and more problems caught at compile time, which I discussed in my previous post.

As always, there’s a bunch of bug fixes and other tweaks – see the Rakudo ChangeLog for more details.

Naturally, one release done simply means another one to work towards. :-) If you’ve been following along with my posts, you’ll have noticed that the bounded serialization work I’ve been doing didn’t make it in to the 2012.02 release; it just wasn’t done in time. The good news is that it’s well on course to land in very good time for the 2012.03 release.

Today I got to the milestone of having a Rakudo binary, with a serialized CORE.setting, that will compile Test.pm and attempt the spectests. There’s a bunch of failures (which I expected), but a lot more passes than fails. Anyway, that means things are essentially working and I’m into triage mode. I’ve also got some very preliminary data on what kind of improvements it brings.

Just from trying it out, I could feel that startup was faster, as hoped. tadzik++ built the bounded serialization branch and the current main development branch and compared the time they took for a simple “hello world” (such a simple program that startup cost will dominate). The result: the bounded serialization branch ran it in around 25% of the time. I think we can squeeze a bit more out yet, but achieving startup in a quarter of the time we do today is a big improvement. Loading time for pre-compiled modules will see a similar improvement.

I also checked memory consumption when compiling CORE.setting. The bounded serialization branch uses 60% of the memory that the main development branch uses. I’d been hoping that this work would cut memory usage by around a third, and it’s done at least that. I didn’t measure the speedup for this task yet; while the other bits are fair comparisons, this one would not be yet even if I had a figure, since the optimizer is currently disabled (one of the transformations causes issues; I’m not expecting it to be a huge pain to resolve, but was more focused on the critical path to running spectests again so I could understand the fallout better).

Once the busted spectests are running again, I’ll get this merged, then dig in to exploiting some of the opportunities it makes available. I’m hoping we can get some more nice features in place for the next release also; moritz++ has already got branches for sink context and the :exhaustive modifier in regexes in progress, and I’ve got some ideas of things to hack on… :-)

Posted in Uncategorized | 1 Comment