Bounded serialization, better regexes and better errors

Here’s a quick roundup of some of the things that have been going on in Rakudo since the January release.

Bounded Serialization

This is where I’ve been focusing the majority of my efforts. At the moment, when we pre-compile Perl 6 code – such as the CORE setting with all the built-ins – we also build up a bunch of code that reconstructs the various objects that are made as a result of compile time declarations. This is done by recording an “event” for each thing that happens at compile time, then replaying them when the module/setting is loaded.

We’ve done things this way throughout every generation of Rakudo to date, but as I worked on the “nom” development branch and 6model, I built things so that we could later switch over to a different model; one where all of the objects created as a result of compile-time declarations are serialized. Then, when we need to load the module, we deserialize this blob of data back into the required objects, rather than doing all of the method calls and data shuffling to build them up again.

I’ve been working on this since we got the last release out. So far I’ve got the serialization and deserialization engine to a reasonably capable state; it can happily handle references between compilation units, has no problems with circular references between objects, and today I got basic support for serializing closures in place also. At this point, I’ve just got it being exercised by a bunch of tests; the next step will be to integrate it into NQP’s compilation process. I’m hoping I can get through that at the weekend, and after that it’ll be time to try and get Rakudo using it. Whether I get this landed for the February release or if we have to wait until the March one, I’m not yet sure. I know for sure I don’t want a half-baked version of it going out, so I’ll hold it for the March release if I can’t get it working as reliably as I want for the February one.

Why am I working on this? Here’s some of the things that I’m aiming at as a result of this work.

  • Improved startup time (the deserialization should be less work than rebuilding everything up from scratch)
  • Reduced memory during pre-compilation of modules. In particular, I’m hoping for a sizable reduction in the memory (and time) required to build CORE.setting, which will most certainly be welcomed by anyone trying to build in a lower memory environment. The faster build will also be helpful for Rakudo developers, and enable us to be more productive.
  • The restrictions on the “constant” declarator can be lifted, enabling use of non-literal values.
  • Phasers as r-values can be implemented.
  • We can implement constant folding much, much more easily, as well as build other immutables at compile time.
  • Other nice things, no doubt. :-)
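To make the constant point concrete, here’s the sort of declaration the lifted restriction should allow (a sketch of the intent, not something guaranteed to work in today’s Rakudo):

constant $tau     = 2 * pi;               # non-literal initializer
constant @squares = (1..10).map(* ** 2);  # computed once, at compile time
say $tau;
say @squares[3];   # 16

With bounded serialization in place, the computed values can simply be serialized into the compilation unit, rather than being rebuilt on every load.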

Completing this will also be one prerequisite down for much better inlining optimization support.

More Regex Bits

We now support the use of <x> in a regex to also call any predeclared lexical regex “x”. The <Foo::Bar::baz> syntax for calling a rule from another grammar is also now supported. A nasty bug that caused <!> not to work has been fixed. Finally, <prior> – which matches the same string that the last successful match did – is now implemented.
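Here’s a quick illustration of these regex bits (a sketch of the syntax described above, rather than output from any particular build):

my regex nums { \d+ }

say "abc123" ~~ /<nums>/;     # <nums> finds the predeclared lexical regex

grammar G { token word { \w+ } }
say "hi!" ~~ /<G::word>/;     # calling a rule from another grammar

"foo" ~~ /foo/;
say "foofoo" ~~ /<prior>/;    # matches the same string the last
                              # successful match did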

Better Errors and Exceptions

moritz++ has continued his typed exceptions work. We’ve also been improving various bits of error reporting to be more informative, and catching a few more things at compile time. moritz++ has also ported STD’s $*HAS_SELF handling, which gives us a better idea of when operations needing an invocant can take place. We currently have this work in a branch (it depends on some other work I was doing to align us with changes to how STD parses initializers, and there’s one last bug that prevents us merging it just yet; it’ll be sorted out in the coming days, I’m sure).

Pod Bits

tadzik++ dropped by with a couple of patches, one a bug fix, the other pretty cute: it makes the auto-generated usage message for MAIN subs include the declarator documentation.
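If you’ve not seen declarator documentation before, the idea is roughly this (a hypothetical script; the exact usage output may differ):

#| Greet somebody, possibly repeatedly
sub MAIN(
    Str $name,          #= the person to greet
    Int :$times = 1,    #= how many greetings to emit
) {
    say "Hello, $name!" for ^$times;
}

Run the script with missing or wrong arguments, and the auto-generated usage message now includes the text from those #| and #= comments alongside the parameters.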

Other

Thanks to moritz++, we now support copy and rename functions. There’s also been the usual range of bug fixes that we get into every release, steadily chipping away at making Rakudo better.


This month’s Rakudo Star release – and what’s coming next

So, we made it – a Rakudo Star release based on the “nom” development branch has landed. It’s based on the compiler release moritz++ cut earlier this week, and pmichaud++, masak++ and myself have been involved in getting the Star-specific build and installation scripts in shape.

Getting to this release has been a lot of work; in many senses, this is a revolution in Rakudo’s development rather than a small evolutionary step over what we had before. It’s taken a while (indeed, longer than I’d first hoped) and has been a lot of work – but it was worth it. Not just because of the many improvements that are in this release, but because of the enormous future potential that we now have.

Here’s some of the things I’m happiest about in the release.

  • The performance improvements in many areas. Yes, we’ve plenty of work to do here – but this is a solid step forward for a wide range of scripts, and in some cases an order of magnitude improvement.
  • That 6model – something I started designing a year and a half ago – has not only cleanly supported all of the things we needed to do in Rakudo, but also opened up so many other doors. For example, the new NativeCall module uses its representation polymorphism support to great effect.
  • Protoregexes doing real NFA-driven Longest Token Matching rather than the cheating version we had before that only operated on literals.
  • The optimizer, along with the various extra compile time error reporting it gives. This will be an important future area for Rakudo.
  • Initial native type support, and bigint semantics for the Int type.
  • The POD6 support, thanks to tadzik++’s Google Summer of Code grant in summer.

So, what’s next? Currently I’m working hard on getting true bounded serialization support in place. This should further improve BEGIN time support (including constructs that depend on BEGIN time), greatly cut down resource consumption during CORE.setting compilation (both time and memory) and give us faster startup. It’s hard to guess at figures, but I’m expecting a noticeable improvement in all of these areas. I’m aiming to get this landed for the next Rakudo compiler release (which I expect us to do a Star release based on too), though largely it depends on whether I can get it working and stable enough in time; while some parts are a simple matter of programming, other parts are tricky.

That aside, we’ve already got various other new features in the pipeline; even since last weekend’s compiler release, there are multiple new regex-related things in place, moritz++ has continued with his typed exceptions work, we’re catching a couple more errors at compile time rather than letting them slip through until runtime, and there are various other miscellaneous bug fixes. Also, masak++ is working on macros in a branch, and I’m optimistic that we’ll have some initial macro support in place by the next release too. Busy times! :-)


Looking back, looking forward

So, 2012 is here, and here’s my first Perl 6 post of the year. Welcome! :-)

Looking Back

2011 brought us a faster Rakudo with vastly improved meta-programming capabilities, the first work on exploring native types in Perl 6, the start of a powerful type-driven optimizer and many other bits. It also took me to various conferences and workshops, which I greatly enjoyed. I’d like to take a moment to thank everyone involved in all of this!

This was all great, but slightly tainted by not managing to get a Rakudo Star distribution release out based on a compiler release with all of these improvements. I’d really hoped to get one out in December. So what happened? Simply, there were a certain set of things I wanted to get in place, and while many of them got done, they didn’t all happen. While the compiler releases are time based – we do one every month – the distribution releases are more about stability and continuity. By the time I headed back to the UK to spend Christmas with family, we were still missing some things I really wanted before a Star release was done. Given that the first thing that happened when I started relaxing a little was that I immediately got unwell, I figured I should actually use my break as, well, a break – and come back recharged. So, I did that.

So, let’s try again

So, the new goal is this month. I’m happy to report that in the week since I got back to things, one of the things I really wanted to sort out is now done: Zavolaj, the native calling library, now does everything the pre-6model version of it did. In fact, it does a heck of a lot more. It’s even documented now! It’s also far cleaner; the original implementation was built in the space of a couple of days with mberends++ while I was moving apartment, and was decidedly hacky in places. The missing bits of the NativeCall library were important because they are depended on by MiniDBI, and I really didn’t want to ship a Rakudo Star that can’t connect to a database. So, next up is to make sure that is in working order. I’m not expecting that to be difficult.

That aside, there were some things to worry about in Rakudo itself. I’ve dealt with some of those things in the last week, and perhaps the one important remaining thing I want to deal with before Star is a nasty regex engine backtracking related bug (I’ve been hoping pmichaud++, the regex engine guru, might appear and magic it away, but it seems it’s going to fall on my plate). But overall, we’re well on track to cut the Star release this month.

What’s the direction for the year ahead?

During 2011, Rakudo underwent a significant overhaul. It was somewhat painful, at times decidedly not much fun, but ultimately has been very much worth it: many long standing issues have been put to rest, performance has been improved and many things that were once hard to do are now relatively easy or at least accessible.

I think it goes without saying that we won’t be doing any such wide-ranging overhaul in 2012. :-) The work in 2011 has opened many doors that we have yet to walk through, and 2012 will see us doing that.

At a high level, here’s what I want:

  • Fewer bugs, more stability: mostly this is about continuing to work through the ticket queue and fix things, adding tests as bugs are fixed to ensure they stay fixed.
  • Better error reporting: there are things in STD, the Perl 6 standard grammar, that allow it to give much more informative error reports on syntax errors than we often can in Rakudo today. I want us to bring these things into Rakudo. Additionally, there’s plenty of improvements to be made in runtime errors. Furthermore, I want to expand the various bits of static analysis that I have started doing in the optimizer to catch a much wider range of errors at compile time.
  • Run programs faster: this process is helped by having decent profiling support these days. There’s a lot more to be done here; the optimizer will help, as will code generation improvements.
  • Compile programs faster: this will come from more efficient parsing and greatly improving the quality of the code NQP generates (NQP is the language we write much of the compiler in).
  • Shorter startup time: this mostly involves finishing the bounded serialization work up. I think the best way to describe this stuff is “fiddly”.
  • Use less memory: Rakudo has got faster in no small part thanks to being able to understand its performance through profiling. Our understanding of its memory consumption is much more limited, which makes it harder to target improvements. That said, I’ve some good guesses, and some ideas for analyzing the situation.
  • More features: while being able to do the things Rakudo can do today faster and with less bugs would give a very usable language for quite a lot of tasks, there’s still various features to come. In particular, Rakudo’s support for S05 and S09 needs work.
  • VM Portability: we’ve made good progress towards being able to make a serious stab at this over the last year, while at the same time also managing to perform vastly better on Parrot too. With help from kshannon++, I’m currently working on completing the switch to QRegex (a regex engine with NFA-powered LTM, and written in NQP rather than PIR), which should carry on a pattern of simultaneously increasing performance and portability. Beyond that will be an overhaul of our AST and code generation, with the same goal in mind.

So, lots of exciting things coming up, and I look forward to blogging about them here. :-)

A way to help

There are many ways to get involved, depending on what you’re interested in. One way is to take a look at our fixed bugs that need tests writing. At the time of writing, just short of 100 tickets are in this state. No guts-y knowledge needed for this – you just need to understand enough Perl 6 (or be willing to learn enough) to know what the ticket is about and how to write a test for it. Drop by #perl6 for hints. :-)
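For the curious, such a test is just an ordinary Perl 6 script using the Test module; something along these lines (a made-up example, not taken from a real ticket):

use Test;
plan 2;

# Imagine a ticket reported .trim misbehaving on whitespace-only strings.
is "  hi  ".trim, "hi", 'trim removes leading and trailing whitespace';
is "   ".trim,    "",   'trim of a whitespace-only string gives ""';

Find the behaviour the ticket describes, write is/ok assertions for the fixed behaviour, and it’s ready for the spectest suite.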


Rakudo: this week’s release, and the next Rakudo Star

On Thursday, tadzik++ cut release #46 of Rakudo. This time, we named it for the London Perl Mongers, organizers of this year’s outstanding and very well attended London Perl Workshop. I’d not been to one for a couple of years, and I’d forgotten what a fun event it is. This one felt better than ever. So, thanks folks! :-)

So, what was in the Thursday release?

  • Big Integer Support: the Int and Rat types now support very large values without loss of precision/coercion to floating point. This means you can do such things as compute the factorial of 1000 (which stringifies to 2568 characters, by the way :-)). You don’t need to get any libraries set up for this – we bundle a slightly extended libtommath with NQP, and it’s exposed by some ops and an extra 6model representation, which the Int type inlines into its body (so Int is still a single garbage collectable object at the VM level). Thanks go to moritz++ for doing much of the work on this.
  • Protoregexes with LTM: it’s taken us a lot longer to get protoregexes back into the nom development branch than we’d initially hoped. Here’s the story. A while ago, pmichaud++ worked on real Longest Token Matching support – driven by an NFA – in a re-development of the regex engine. We’d only ever had “cheating” protoregexes in Rakudo before – ones that could work on literal prefixes, but nothing more. The improved engine is a really nice piece of work – amazingly little code for what it does, and very elegant. Sadly, pmichaud has not been able to work so much on Rakudo of late, so the work lay not-quite-done and unmerged for a while. Just ahead of the release, I picked it up, and found it was far enough along and sufficiently easy to get in to that I could get a first cut integrated in time for this month’s release. Since the release, along with diakopter++, I’ve been hacking away at regexy bits, so there’ll be many improvements in this area in next month’s release. It was nice to get something in place for this month’s, though.
  • CATCH improvements: mls++ did a bunch of work that made our handling of CATCH blocks far, far more in line with the Perl 6 specification. Naturally, this makes them much more useful; you can now write when blocks in your CATCH, and any exceptions you don’t match will get re-thrown to the next handler. Write a default block to swallow everything. We also do the stack unwinding at the correct point now. Great work!
  • Improved MAIN argument parsing: japhb++ has been leading the way on improving our support for the MAIN subroutine, including nice usage message generation. This is a really nice Perl 6 feature for writing command line applications, so it’s good to see work here. :-)
  • 6model REPR API 2: this is very much a behind the scenes thing, and deserves a post of its own for those who follow the blog to know more about metamodel design. However, in short: I re-worked the REPR API in 6model somewhat, and updated the various representations to use it. The immediate result was that we could efficiently have Perl 6’s Int type do big integer stuff. However, it’s also the foundation for some forthcoming work on compact structs and packed arrays. Hopefully at a user level, though, you’ll notice nothing at all about this change; it’s all guts. :-)
  • Various other fixes and improvements: bugs fixed, performance improved, missing stuff implemented…we had quite a few smaller, but very worthwhile things done in this release too.
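As a taste of the big integer support, here’s the factorial example mentioned above (a sketch; the postfix operator is mine, not a built-in):

sub postfix:<!>(Int $n) { [*] 1..$n }

say 1000!;             # full precision, no floating point coercion
say (1000!).Str.chars; # 2568

The [*] reduction multiplies the whole range together, and the result stays an exact Int however large it grows.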

So, this is great, but the next question on everybody’s mind is: when is a nom-based Rakudo Star release coming out? The answer – epic disasters aside – is “December”. Yes, next month. Two big things have happened in the last week: grammar support in the nom development branch drastically improved, and this unblocked tadzik++ to get our ecosystem tools (for module build and installation) working with it. The focus from here is…

  • Native call support: get our support for loading and calling into C libraries re-established. The NativeCall module basically worked…except that it made a load of guts-y assumptions that don’t hold true any more, and was somewhat restricted in what it could handle (for multiple reasons). Now we have 6model and native type support, we can do far better. My task over the next week is to get our native call support back in shape, all being well with a bunch more features.
  • More regex work: while we’ve come a really long way in recent days – we have many things working that never worked in the previous development branch – there are still some issues to address, including one nasty backtracking issue.
  • Fixing Modules: quite a few modules already work just fine on nom. Some block on native call support, others on the remaining regex stuff. Others will point to bugs or will be using outdated semantics or relying on previous bugs. tadzik++ has figured out how to automatically run through the module ecosystem, build modules and run their tests, so we can get a good sense of what needs doing.

And, of course, the usual round of feature addition, bug fixing and performance work.

And, with that, it’s back to the hacking. :-)


Slides from my Optimizing Rakudo Perl 6 talk

Over the weekend, I visited Bratislava, the beautiful city I once called home, for a few days. It felt oddly familiar, and I found myself noticing all kinds of little changes here and there – where one shop had given way to another, or a statue had appeared or changed. Happily, my favorite eating and watering holes were still there, and my sadly somewhat rusted Slovak language skills were still up to decoding menus and ordering tasty beer, and I did plenty of both. :-)

I was there to attend the Twin City Perl Workshop. I repeated my Perl 6 grammars talk, and gave a new one about optimizing Rakudo. This covered both the optimization work that I and others have been doing, and some details about the optimizer itself. I also made a couple of nice diagrams of Rakudo’s overall architecture and what it does with a program.

You can get the slides here, or if you’re heading to the London Perl Workshop this coming Saturday, I’ll be delivering it there too. Enjoy! :-)


An optimizer lands, bringing native operators

For some weeks now, I’ve been working on adding an optimizer pass to Rakudo and implementing an initial set of optimizations. The work has been taking place in a branch, which I’m happy to have just merged into our main development branch. This means that the optimizer will be included in the October release! :-) In this post, I want to talk a little about what the optimizer can do so far.

When Optimization Happens

When you feed a Perl 6 program to Rakudo, it munches its way through your code, simultaneously parsing it and building an AST for the executable bits, and a bunch of objects that represent the declarative bits. These are in various kinds of relationship; a code object knows about the bit of as-yet uncompiled AST that corresponds to its body (which it needs to go and compile just in time should it get called at BEGIN time), and the AST has references to declarative objects (types, subs, constants). Normally, the next step is to turn this AST into intermediate code for the target VM (so for Parrot, that’s PIR). The optimizer nudges its way in between the two: it gets to see the fully constructed AST for the compilation unit, as well as all of the declarative objects. It can twiddle with either before we go ahead and finish the compilation process. This means that the optimizer gets to consider anything that took place at BEGIN and CHECK time also.

Using The Optimizer

The optimizer has three levels. The default level is 2. This is “optimizations we’re pretty comfortable with having on by default”. It’s possible to pass --optimize=3, in which case we’ll throw everything we’ve got at your program. If it breaks as a result, please tell us by filing an RT ticket; this is the pool of candidate optimizations to make it into level 2. After an optimization has had a while at level 2, combined with a happy and trouble-free history, we’ll promote it into level 1. Using --optimize=1 at the moment gets you pretty much nothing – the analysis but no transformations. In the long run, it should get you just the optimizations we feel are really safe, so you won’t lose everything if you need to switch down from --optimize=2 for some reason. Our goal is that you should never have to do that, of course. However, it’s good to provide options. My thanks go to pmichaud++ for suggesting this scheme.

Compile Time Type Checking of Sub Arguments

One thing the optimizer can do is consider the arguments that will be passed to a subroutine. If it has sufficient type information about those arguments, it may be able to determine that the call will always be successful. In this case, it can flag to the binder that it need never do the type checks at run time. This one can actually help untyped programs too. Since the default argument type is Any, if you pass a parameter of one subroutine as an argument to another, it can know that this would never be a junction, so it never has to do the junction fail-over checks.
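As a tiny example of this analysis (a sketch of the reasoning, not compiler output):

sub double(Int $n) { $n * 2 }

my Int $x = 21;
say double($x);   # $x is declared Int, so the binder's Int check on $n
                  # can be proven at compile time and skipped at run time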

Compile Time Multiple Dispatch Resolution

While the multiple dispatch cache that current Rakudo has is by some margin the best it has ever had in terms of lookup performance, it still implies work at run time. Given enough information about the types of the arguments, the optimizer is able to resolve some multiple dispatches at compile time, by working out cases where the dispatch must always lead to a certain candidate getting invoked. Of course, how well it can do this depends on the type information it has to hand and the nature of the candidates. This is a double saving: we don’t have to do the multiple dispatch, and we don’t have to do the type checks in the binding of the chosen candidate either.
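For example, in code like this (again a sketch), there is enough type information to pick the candidate at compile time:

multi sub describe(Int $n) { "an integer: $n" }
multi sub describe(Str $s) { "a string: $s" }

my Int $i = 42;
say describe($i);   # only the Int candidate can ever match here, so the
                    # dispatch can be resolved at compile time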

Basic Inlining

In some (currently very constrained) cases, if we know what code is going to be called at compile time, and we know that the types of arguments being passed are all OK, we can avoid making the call altogether and just inline the body of the subroutine right into the caller. Of course, this is only beneficial in the case where the work the subroutine does is dominated by the overhead of calling it, and there are some cases where inlining is impossible to do without causing semantic differences. For now, the focus has been on doing enough to be able to inline various of the setting built-ins, but it’s in no way restricted to just doing that. With time, the inline analysis will be made much smarter and more capable.

Native Operators

As part of getting the optimizer in place, moritz++ and I have also worked on native operators (that is, operators that operate on native types). This boils down to extra multiple dispatch candidates for various operators, in order to handle the natively typed case. However, something really nice happens here: because you always have to explicitly declare when you are using native types, we always have enough type information to inline them. Put another way, the native operator multis we’ve declared in the setting will always be inlined.

We’ve some way to go on this yet. However, this does already mean that there are some nice performance wins to be had by using native types in your program (int and num) where it makes sense to do so.

As an example, with --optimize=3 (the maximum optimization level, not the default one), we can compare:

my $i = 0; while $i < 10000000 { $i = $i + 1 }; say $i

Against:

my int $i = 0; while $i < 10000000 { $i = $i + 1 }; say $i

On my box, the latter typed version completes in 4.17 seconds, as opposed to the untyped version, which crawls in at 33.13 (so, a factor of 8 performance gain). If you’re curious how this leaves us stacking up against Perl 5, on my box it does:

my $i = 0; while ($i < 10000000) { $i = $i + 1 }; say $i

In 0.746 seconds. This means that, with type information provided and for this one benchmark, Rakudo can get within a factor of six of Perl 5 – and the optimizer still has some way to go yet on this benchmark. (Do not read any more into this. This performance factor is certainly not generally true of Rakudo at the moment.)

We’ll be continuing to work on native operators in the weeks and months ahead.

Immediate Block Inlining

We’ve had this in NQP for a while, but now Rakudo has it too. Where appropriate, we can now flatten simple immediate blocks (such as the bodies of while loops) into the containing block. This happens when they don’t require a new lexical scope (that is, when they don’t declare any lexicals).
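For example (a sketch):

my $i = 0;
while $i < 5 { $i = $i + 1 }               # body declares no lexicals, so it
                                           # can be flattened into the outer block

while $i < 10 { my $x = $i; $i = $x + 1 }  # declares $x, so it keeps its own
                                           # lexical scope and isn't flattened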

That Could Never Work!

There’s another nice fallout of the analysis that the optimizer does: as well as proving dispatches that will always work out at compile time, it can also identify some that could never possibly work. The simplest case is calling an undeclared routine, something that STD has detected for a while. However, Rakudo goes a bit further. For example, suppose you have this program:

sub foo($x) { say $x }
foo()

This will now fail at compile time:

CHECK FAILED:
Calling 'foo' will never work with no arguments (line 2)
    Expected: :(Any $x)

It can also catch some simple cases of type errors. For example:

sub foo(Str $s) { say $s }
foo(42)

Will also fail at compile time:

CHECK FAILED:
Calling 'foo' will never work with argument types (Int) (line 2)
    Expected: :(Str $s)

It can handle some basic cases of this with multiple dispatch too.

Propagating Type Information

If we know what routine we’re calling at compile time, we can take the declared return type of it and use it in further analysis. To give an example of how this aids failure analysis, consider the program:

sub foo() returns Int { 42 }
sub bar(Str $s) { say $s }
bar(foo())

This inevitable failure is detected at compile time now:

CHECK FAILED:
Calling 'bar' will never work with argument types (Int) (line 3)
    Expected: :(Str $s)

The real purpose of this is for inlining and compile time multi-dispatch resolution though; otherwise, we could never fully inline complex expressions like $x + $y * $z.

Optimizing The Setting

Since we have loads of tests for the core setting (many of the spectests cover it), we compile it with --optimize=3. This means that a bunch of the built-ins will now perform better. We’ll doubtless be taking advantage of native types and other optimizations to further improve the built-ins.

Gradual Typing

Many of these optimizations are a consequence of Perl 6 being a gradually typed language. You don’t have to use types, but when you do, we make use of them to generate better code and catch more errors for you at compile time. After quite a while just talking about these possible wins, it’s nice to actually have some of them implemented. :-)

The Future

Of course, this is just the start of the work – over the coming weeks and months, we should gain plenty of other optimizations. Some will focus on type-driven optimizations, others will not depend on this. And we’ll probably catch more of those inevitable run time failures at compile time too. In the meantime, enjoy what we have so far. :-)


This is not enough!

The time for some shiny new hardware came around. Sat next to me, purring decidedly more quietly than its predecessor, is my new main development machine: a quad core Intel Core i7, pimped out with 16 GB of RAM and a sufficiently generous SSD that it can hold the OS, compiler toolchain and projects I work most actively on. It’s nice having a $dayjob that likes keeping their hackers…er, consultants…well kitted out. :-)

So, the question I had to ask was: how fast can this thing run the Rakudo spectests? I tried, and with --jobs=8 (the sweet spot, it seems) it chugged its way through them in 220s. That’s vastly better than I’d ever been able to do before, and I could immediately see it was going to be a boon for my Rakudo productivity. 3 minutes 40 seconds. Not so long to wait to know a patch is fine to push. But…what if it was less? It’s fast but…this is not enough!

A while ago, moritz++ showed how the nom branch of Rakudo ran mandelbrot 5 times faster than master. This was a fairly nice indicator. Around the time my new hardware arrived, an update was posted on #perl6: mandelbrot was now down to 2 minutes on the same machine the original tests were done on. Again, I was happy to see progress in the right direction, but I couldn’t help but feel…this is not enough!

So, I took a few days break from bug fixing and features, and decided to see if things could get faster.

Faster Attribute Access

One of the things I’ve had planned since the early days of working on 6model is being able to look up attributes by index in the single inheritance case, rather than by name. I finally got around to finishing this up (I’d already put in most of the hooks, just not done the final bits). It’s not an entirely trivial thing to make work; at the point we parse an attribute access, we don’t know enough about what the eventual memory layout of the object will be, or whether an indexed lookup will even work. Further, we have to involve the representation in the decision, since we can’t assume all types will use the same one. Mostly, it just involves a later stage of the code generation (PAST => POST in this case) having the type object reachable from the AST and asking it for a slot index, if possible.

Since I implemented it at the code-gen level, it meant the improvement was available to both NQP and Rakudo, so we get compiler and runtime performance improvements from it. Furthermore, I was able to improve various places where the VM interface does attribute lookups (for example, invocation of a code object involves grabbing the underlying VM-level thingy that represents an executable thing, and that “grabbing” is done by an attribute access on the code object). Attribute lookups never really showed up that high in the (C-level) profile, but now they’re way, way down the list.

The P6opaque Diet

P6opaque is by far the most common object representation used in NQP and Rakudo. It’s generally pretty smart; it has a header, and then lays out attributes – including natively typed ones – just like a C structure would be laid out in memory. In fact, it mimics C structures well enough that for a couple of the low-level parts of Rakudo we have C struct definitions that let us pretend that full-blown objects are just plain old C structures. We don’t have to compromise on having first class objects in order to write fast low-level code that works against them any more. Of course, you do commit to a representation – but for a handful of built-in types that’s fine.

So if that’s all rainbows and butterflies, what was the problem? Back last autumn, I thought I knew how implementing mix-ins and multiple inheritance attribute storage was going to look: some attributes would go into a “spill hash” if they were added dynamically, or all of them would go there apart from any in a common single-inheritance prefix. Come this spring when I actually did it for real, a slightly smarter me realized I could do much better. It involved a level of indirection – except that level already existed, so there was actually no added cost at all. Thing is, I’d already put the spill slot in there, and naughtily used the difference between NULL and PMCNULL as the thing that marked out whether the object was a type object or not.

This week, I shuffled that indicator to be a bit in the PMC object header (Parrot makes several such bits available for us to use for things like that). This meant the spill slot in the P6opaque header could go away. Result: every object using the P6opaque representation got 4 (32-bit) or 8 (64-bit) bytes lighter. This has memory usage benefits, but also some speed ones: we fit more in the CPU cache, for one, and for another we can pack more objects into fixed-size pools, meaning they have fewer arenas to manage. Win.
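The shape of that change, sketched in C with invented flag and field names: instead of spending a pointer-sized slot per object just to distinguish type objects from instances, steal one bit in the header’s flags word.

```c
/* Sketch (invented names, not the real Parrot header layout): mark
 * type objects with a bit in the header flags instead of a dedicated
 * pointer-sized spill slot in every object body. */
enum { FLAG_IS_TYPE_OBJECT = 1 << 0 };

typedef struct {
    unsigned flags;   /* header bits; one marks type objects */
} Header;

typedef struct {
    Header hdr;
    /* No spill slot needed here any more: one pointer (4 or 8 bytes)
     * saved per object. */
    long attr0;
} SlimObject;

int is_type_object(const SlimObject *o) {
    return (o->hdr.flags & FLAG_IS_TYPE_OBJECT) != 0;
}

int demo_flags(void) {
    SlimObject type_obj = { { FLAG_IS_TYPE_OBJECT }, 0 };
    SlimObject instance = { { 0 }, 42 };
    return is_type_object(&type_obj) * 10 + is_type_object(&instance);
}
```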

Constant Pain

In Perl 6 we have Str objects. Thanks to 6model’s capability to embed a native Parrot string right into an object, these got about three times cheaper in terms of memory already in nom. Well, hopefully. The thing is, there’s a very painful way to shoot yourself in the foot at the implementation level. 6model differentiates coercion (a high level, language sensitive operation) from unboxing (given this object, give me the native thingy inside of it). Coercion costs somewhat more (a method call or two) than unboxing (mostly just some pointer follows). If you manage to generate code that wants a VM-level string but only has an object, it’ll end up doing a coercion (since at that level, it doesn’t know the much cheaper unbox is possible/safe). After reading some of the compiler output, I spotted a bunch of cases where this was happening – worst of all, with constant strings in places we could have just emitted VM-level constant strings! Fixing that, and some other unfortunate cases of coercion instead of unbox, meant I could make the join method a load faster. Mandelbrot uses this method heavily, and it was a surprisingly big win. String concatenation had a variant of this kind of issue, so I fixed that up too.
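The coercion-versus-unbox distinction can be sketched like this (again with invented names – the real dispatch machinery is far more involved): coercion goes through a method call because the caller can’t assume anything about the object, while unboxing just follows a pointer into the box, which is only safe when codegen knows the representation.

```c
#include <string.h>

/* Sketch (invented types): a boxed string embeds the VM-level string
 * right inside the object. "Unboxing" is just reading a field;
 * "coercion" is the general path that goes through a (simulated)
 * method call, because the caller can't assume the cheap unbox is
 * safe for an arbitrary object. */
typedef struct BoxedStr BoxedStr;
struct BoxedStr {
    const char *(*Str_method)(BoxedStr *self); /* stands in for a coercion method */
    const char  *native;                       /* embedded VM-level string */
};

static const char *str_method(BoxedStr *self) {
    /* In the real thing this is a full method dispatch; that's the
     * cost to avoid when the unbox is known to be possible. */
    return self->native;
}

/* Slow, general path: coercion via method call. */
const char *coerce_to_str(BoxedStr *o) { return o->Str_method(o); }

/* Fast path: direct unbox, valid only when the representation is known. */
const char *unbox_str(BoxedStr *o) { return o->native; }

int demo_unbox(void) {
    BoxedStr s = { str_method, "hello" };
    /* Same result either way; only the cost differs. */
    return strcmp(coerce_to_str(&s), unbox_str(&s)) == 0;
}
```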

Optimizing Lexical Lookup

We do a lot of lexical lookups. I’m hopeful that at some point we’ll have an optimizer that can deal with this (the analysis is probably quite tricky for full-blown Perl 6; in NQP it’s much more tractable). In the meantime, it’s nice if they can be faster. After a look over profiler output, I found a way to get a win by caching a low-level hash pointer directly in the lexpad rather than looking it up each time. Profilers. They help. :-)
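The caching idea here is simple enough to sketch; the structures below are invented stand-ins, not the actual Parrot lexpad, but the shape is the same: pay the indirection once, then reuse the cached pointer on every subsequent lookup.

```c
#include <stddef.h>

/* Sketch (invented structures): a lexpad that caches the pointer to
 * its backing name->slot hash on first use, instead of re-fetching it
 * through the pad's metadata on every lexical lookup. */
typedef struct { int dummy; } NameHash;

typedef struct {
    NameHash *cached_hash;  /* NULL until first lookup */
    int       fetches;      /* counts the expensive metadata fetches */
} LexPad;

static NameHash the_hash;

static NameHash *fetch_hash_slowly(LexPad *pad) {
    pad->fetches++;         /* stands in for the per-lookup indirection */
    return &the_hash;
}

NameHash *pad_hash(LexPad *pad) {
    if (!pad->cached_hash)
        pad->cached_hash = fetch_hash_slowly(pad);
    return pad->cached_hash;
}

int demo_lexpad(void) {
    LexPad pad = { NULL, 0 };
    for (int i = 0; i < 1000; i++)
        pad_hash(&pad);
    return pad.fetches;      /* only the first lookup paid the cost */
}
```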

Optimized MRO Computation

The easiest optimizations for me to do are…the ones somebody else does. Earlier this week, after looking over the output from a higher level profiler that he’s developing for Parrot, mls++ showed up with a patch that optimized a very common path of C3 MRO computation. Curiously, we were spending quite a bit of time at startup doing that. Of course, once we can serialize stuff fully, we won’t have to do it at all, but this patch will still be a win for compile time, or any time we dynamically construct classes by doing meta-programming. A startup time improvement gets magnified by a factor of 450 over a spectest run (that’s how many files we have), and it ended up being decidedly noticeable. Again, not where I’d have thought to look…profiling wins again.

Multi-dispatch Cache

We do a lot of multiple dispatch in Perl 6. While I expect an optimizer, with enough type information to hand, will be able to decide a bunch of them at compile time, we’ll always still need to do some at runtime, and they need to be fast. While we’ve cached the sorted candidate list for ages, it still takes time to walk through it to find the best one. When I was doing the 6model on CLR work, I came up with a design for a multi-dispatch cache that seemed quite reasonable (of note, it does zero heap allocations in order to do a lookup and has decent cache properties). I ported this to C and…it caused loads of test failures. After an hour of frustration, I slept on it, then fixed the issue within 10 minutes the next morning. Guess sleep helps as well as profilers. Naturally, it was a big speed win.
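A flat, allocation-free dispatch cache along those general lines might look like the sketch below. To be clear, this is my own illustrative layout with invented names, not the actual design ported from the CLR work: entries store the argument type IDs inline in a fixed-size array, so a lookup is a contiguous scan with no hashing and no heap allocation.

```c
#include <string.h>

/* Sketch (invented layout): a small, flat multi-dispatch cache. Each
 * entry stores the argument type IDs inline plus the chosen candidate,
 * so a hit is found by scanning a contiguous array: no hashing and no
 * heap allocation on the lookup path. */
#define MAX_ARITY   4
#define CACHE_SLOTS 8

typedef struct {
    int  num_args;
    long type_ids[MAX_ARITY];
    int  candidate;           /* index of the best candidate */
    int  in_use;
} CacheEntry;

typedef struct {
    CacheEntry entries[CACHE_SLOTS];
} MultiCache;

/* Returns the cached candidate, or -1 on a miss (in which case the
 * caller falls back to walking the sorted candidate list). */
int cache_lookup(const MultiCache *c, const long *ids, int n) {
    for (int i = 0; i < CACHE_SLOTS; i++) {
        const CacheEntry *e = &c->entries[i];
        if (!e->in_use || e->num_args != n)
            continue;
        int match = 1;
        for (int j = 0; j < n; j++)
            if (e->type_ids[j] != ids[j]) { match = 0; break; }
        if (match)
            return e->candidate;
    }
    return -1;
}

void cache_add(MultiCache *c, const long *ids, int n, int candidate) {
    for (int i = 0; i < CACHE_SLOTS; i++) {
        CacheEntry *e = &c->entries[i];
        if (e->in_use)
            continue;
        e->num_args = n;
        memcpy(e->type_ids, ids, n * sizeof(long));
        e->candidate = candidate;
        e->in_use = 1;
        return;
    }
}

int demo_cache(void) {
    MultiCache c;
    memset(&c, 0, sizeof c);
    long args[2]  = { 101, 202 };
    long other[2] = { 101, 203 };
    cache_add(&c, args, 2, 5);
    /* Hit returns the stored candidate; a different type tuple misses. */
    return cache_lookup(&c, args, 2) * 100 + (cache_lookup(&c, other, 2) == -1);
}
```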

Don’t Do Stuff Twice

Somehow, in the switch over to the nom branch, I’d managed to miss setting the flag that causes us not to do type checks in the binder if the multi-dispatcher already calculated they’d succeed. Since the multi-dispatch cache, when it gets a hit, can tell us that much faster than actually doing the checks, not re-doing them is a fairly notable win.

Results

After all of this, I now have a spectest run in just short of 170 seconds (for running 14267 tests). That’s solidly under the three minute mark, down 50 seconds from earlier this week. And if it’s that much of a win for me on this hardware, I expect it’s going to amount to an improvement measured in some minutes for some of our other contributors.

And what of mandelbrot? Earlier today, moritz reported a time of 51 seconds. The best we ever got it to do in the previous generation of Rakudo was 16 minutes 14 seconds, making for a 19-fold performance improvement on this benchmark.

This is not enough!

Of course, these are welcome improvements, and will make the upcoming first release of Rakudo from this new “nom” development branch nicer for our users. But it’s just one step on the way. These changes make Rakudo faster – but there’s still plenty to be done yet. And note that this work doesn’t deliver any of the “big ticket” items I mentioned in my previous post, which should also give us some good wins. Plus there are parsing performance improvements in the pipeline – but I’ll leave those for pmichaud++ to tell you about as they land. :-)
