Well that was hard work: getting nqp-rx grammars using 6model

Another Sunday, another “what happened this week” post.

The big meaty task this week was to get grammars switched over to using 6model. This – as I had feared – proved rather trickier than doing the same thing for classes. My initial approach was to port some of Cursor to NQP and have the rest in NQP but with PIR method bodies. This turned out to be biting off a bit more than I was really up for chewing, and with a growing queue of yaks awaiting shaving I put down the razor and decided to take a more modest approach: keep the PIR code and switch it to use 6model, even if this was going to look a bit, well, ugly, and would mean that some 6model optimizations were not going to be so readily applicable.

That itself still proved a fairly tall order, mostly because unlike with classes the code I needed to change wasn’t neatly reified into bootstrap stages. Thus changing it to what it needed to be in order to work with 6model would break earlier stages in the bootstrap that shared the code. In the end, I realized I’d have to make copies of various things and install them with a different name.

The steps between there and something that works were many, and in the end all a bit of a blur. Amongst them, I had to modify various bits in Cursor to handle working with the new meta-model (especially the way the proto-regex implementation did introspection), deal with proto-regex prefix introspection methods not getting registered with the new meta-model, get some bits from the HLL::Grammar cheats moved into HLL::Grammar itself, re-write a couple of them into NQP, and hunt down and fix a few mis-translations as they manifested themselves.

As I worked my way from “hopeless” to “wow, it parses stuff again”, something very worrying started to emerge. Things had got enormously slower as a result of the transition. We’re not talking a little slower: the nqp-rx test suite ran 12 times slower than it had before.

I at first wondered if the wrapping of various things on the “hot path” in nqp-rx method bodies could have done it. This results in an exception handler being set up to catch return exceptions. I did a micro-benchmark in PIR and found that this extra setup work made the overhead of a sub invocation 3 to 4 times greater. (This is a bad result from a Parrot perspective: since the routine needs the same handler on every invocation, the setup work should really only need doing once and apply to all invocations of the sub, or otherwise be cheap enough per invocation to be noise in a benchmark.)

I figured this couldn’t be the whole story, but put in the obvious, easy-to-implement optimization: only add a return exception handler if return is used anywhere in the routine. It made a noticeable difference…but was hardly the real issue: now things were just 11 times slower than they had been, rather than 12.
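That check can be sketched as a tiny compile-time pass; here is a purely illustrative Python analogue (the real compiler walks NQP’s own AST, not Python’s):

```python
import ast

def needs_return_handler(src: str) -> bool:
    """Return True only if the routine contains an explicit `return`,
    i.e. the one case where the costly return-exception handler has
    to be emitted; routines that simply fall off the end skip it."""
    return any(isinstance(node, ast.Return)
               for node in ast.walk(ast.parse(src)))

# A routine that uses `return` needs the handler; one that doesn't
# can be invoked without any handler setup at all.
print(needs_return_handler("def f():\n    return 1"))   # True
print(needs_return_handler("def g():\n    x = 1"))      # False
```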

I recalled that when classes were switched to 6model I hadn’t noticed a slowdown. My attention turned to attribute access: cursor objects work a lot with attributes, whereas the compiler object and action classes do little. Could something be adrift there? I reviewed the P6opaque code and hit upon a problem fairly quickly. Something that I’d translated fairly directly from the C# implementation turned out to have been a really bad idea. While on .NET you can create a type-parameterized hash table that performs quite well, Parrot’s hashes aren’t type-parametric: they want the keys to be strings. So when using an integer as a key – as I was – Parrot first puts it into a PMC (creating a GC-able) and then coerces that into a string (another GC-able) to do the hash lookup. In other words, every attribute get and bind – which I’d intended to be cheap – was actually going and allocating two GC-able objects. “Oops.” I fixed things.
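The shape of the fix can be illustrated in Python (hypothetical names, not the actual P6opaque API): resolve each attribute name to an integer slot index once, so repeated accesses are plain array indexing – no per-access hashing and no boxing of an integer key into a GC-able object.

```python
class P6opaqueSketch:
    """Illustrative sketch only: attribute names map to slot indexes
    computed once, so a get or bind is a list index, not a hash
    lookup that has to box its integer key."""
    def __init__(self, attr_names):
        self._slot_of = {name: i for i, name in enumerate(attr_names)}
        self._num_slots = len(attr_names)

    def allocate_body(self):
        return [None] * self._num_slots

    def hint_for(self, name):
        # Done once, e.g. at compile time or on first access.
        return self._slot_of[name]

    def bind_attr(self, body, hint, value):
        body[hint] = value            # cheap: just an array store

    def get_attr(self, body, hint):
        return body[hint]             # cheap: just an array load

repr_ = P6opaqueSketch(["$!target", "$!from", "$!pos"])
body = repr_.allocate_body()
pos_hint = repr_.hint_for("$!pos")
repr_.bind_attr(body, pos_hint, 5)
print(repr_.get_attr(body, pos_hint))   # 5
```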

So was that it? Well, no. Still 9 times slower. Both of the above changes were worth doing, but something else was still dominating. Out of curiosity, I wrote a micro-benchmark of Parrot method dispatch performance against 6model method dispatch performance. I was expecting them to be pretty much neck and neck; none of the gradual typing optimizations that 6model can support are in place yet, so much faster would have been surprising. As soon as I saw the numbers, I knew I’d found the real problem. 6model was dispatching methods 50 times slower! When a grammar isn’t allocating and populating a gazillion Cursor objects, it’ll be busying itself calling methods. Lots of them. This was almost certainly to blame.

So how does method dispatch work in 6model? Well, in theory finding a method is an operation on the meta-object. So you call a method on the meta-object to find the method you want. Of course, this bottoms out somewhere with a primitive (that’s the KnowHOW meta-object). In reality, doing that on every dispatch is far too slow, so meta-objects that know ahead of time what code object a given method name will map to can construct and publish a cache: essentially just a hash mapping names to invokable things. (A v-table is similar, but it’s an array, mapping indexes to invokable things, saving having to do a hash lookup.) Well, that should have been the reality, but it turned out the code that populated the cache had a really silly bug in it.
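The two dispatch paths can be sketched in Python (hypothetical names; the real dispatcher lives in C/PIR): the slow path calls into the meta-object every time, while the fast path probes a published flat cache.

```python
class HOW:
    """Sketch of a meta-object: find_method is the slow, fully general
    path; publish_method_cache flattens the method table into a plain
    dict the dispatcher can probe without calling the meta-object."""
    def __init__(self, methods):
        self._methods = dict(methods)
        self.method_cache = None

    def find_method(self, obj, name):
        # In theory, every dispatch is an operation on the meta-object.
        return self._methods.get(name)

    def publish_method_cache(self):
        self.method_cache = dict(self._methods)

def dispatch(how, obj, name, *args):
    cache = how.method_cache
    if cache is not None:
        meth = cache.get(name)              # fast path: one hash probe
    else:
        meth = how.find_method(obj, name)   # slow path: call the HOW
    return meth(obj, *args)

how = HOW({"double": lambda self, x: 2 * x})
print(dispatch(how, object(), "double", 21))   # 42, via the slow path
how.publish_method_cache()
print(dispatch(how, object(), "double", 21))   # 42, via the cache
```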

With that fixed, the test run was consistently beating the time it used to take before I switched grammars and cursors to 6model. That’s more like it! And that’s before any of the changes go in that really start taking advantage of 6model’s strengths, plus clawing back some of the losses I know have been incurred by wrapping various things in HLL::Grammar in NQP bodies, which has introduced some wasteful boxing.

The sideshow to all of that work was that I finally managed to spend some real time on natively typed attributes this week. I’m certainly not all the way there yet, but I completed the initial enabling refactor and the allocation and REPR side of the work. The enabling refactor has already delivered some immediate benefits: there is now one less memory allocation per object. Compared to Parrot objects, 6model objects now need half the number of allocations and are malloc-less (object bodies use Parrot’s fixed-size allocator, which tends to be more efficient).

I’m currently at the point of needing to work out some of the remaining design issues for natively typed attributes before I can actually have something working and ready to show. Trying to think that through will be one of my main tasks in the week to come. I expect that once I hit upon the design, the implementation will be relatively little effort.

Overall, things are going well. My aim was to have everything in place to be ready to branch Rakudo and re-build its OO bits atop of 6model by early February, and I seem to be on target with that. At a push it may happen next weekend (that is, just within January), but mostly it just boils down to “when I feel I’m ready”. The main tasks to do in order to get there are:

  • Resolve remaining design issues and finish getting a working implementation of natively typed attributes
  • Write a basic roles implementation for nqp-rx
  • Mix-in support (got most of the design in my head, just need to see if it can be implemented and work as easily as hoped)

Which is a pretty short list of “hard dependencies”.  There are, of course, other bits that are desirable, such as getting the new multi-dispatch work neatened up where it has rough edges, getting Parrot v-table integration implemented and dealing with the fact that match objects are still not using 6model (something that will become a problem later on). Those will all happen anyway over the next few weeks, but needn’t block the start on the Rakudo changes.


2 Responses to Well that was hard work: getting nqp-rx grammars using 6model

  1. luben says:

    Parrot hashes can have integer keys/values, but have to be told so:

    h = new ['Hash']
    # set key data type
    h.'set_key_type'(.Hash_key_type_int)
    # set value type

    • Ah, that’s good to know for the future, thanks. In the end, I realized that a hash was likely overkill compared to a simple list that I go through doing pointer comparisons; it’s just not going to be all that long (typically 3-4 items), so while it’s O(n) the constant factor in a practical situation means it’ll tend to be cheaper than the O(1) of a hash.
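The trade-off described in that reply can be sketched in Python, with identity comparisons standing in for pointer comparisons:

```python
def find_by_identity(spine, wanted):
    """For the 3-4 entries typical here, a linear scan of identity
    (pointer-style) comparisons has a much smaller constant factor
    than hashing the key, despite being O(n) rather than O(1)."""
    for i, entry in enumerate(spine):
        if entry is wanted:
            return i
    return -1

class A: pass
class B: pass
class C: pass

spine = [A, B, C]
print(find_by_identity(spine, B))      # 1
print(find_by_identity(spine, dict))   # -1: not in the spine
```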


