Small, but welcome, fixes

Last week wasn’t one of my most productive on Perl 6, thanks to a mix of Easter holiday, a little more work than expected on another project, and feeling a tad under the weather for a day or so. In the time I did find, though, I managed to pick off some worthwhile bits and pieces that needed doing.

Heap snapshot issues triage

It’s generally very clean and pleasant that our meta-objects are, so far as the VM is concerned, just objects. This means the answer to “can types be GC’d” is easy: sure they can, so long as nothing is using them any more. The downside is that the VM has very little insight into this huge pile of objects, which is why we have various “protocols” where the MOP can provide some hints. A lot of performance wins come from this. When I was working on heap snapshots, I found it would also be rather useful if there was a way to let the VM know about a “debug name” for a type. A name that isn’t used for any kind of resolution, but that can be included in things like heap snapshots. It’ll no doubt also be useful for debugging. So, I added an op for that, nqp::setdebugtypename, and used it in a branch of NQP, meaning I got more useful heap snapshots. This week, I also added this op to the JVM backend, meaning that I could merge the branch that uses it. This means that you can now do a normal build of Rakudo/NQP/Moar and get useful typenames in the MoarVM heap snapshots.

Last time, I mentioned that the heap snapshot viewer took an age to start up because we were super slow at reading files with really long lines. This week, I fixed this performance bug with two patches. Now, the heap snapshot analyzer picks apart a 25MB snapshot file into its various big pieces (just looking at how each line starts) and reaches its prompt in under a second on my box. That’s a rather large improvement from before, when it took well over a minute.

I also looked at a C-level profile of where MoarVM spends time doing the detailed parsing of a heap snapshot, which happens in the background while the user types their first command, and then blocks execution of that command if it’s not completed. The processing uses a few threads. The profile showed up a couple of areas in need of improvement: we have a lot of contention on the fixed size allocator, and the GC’s world-stopping logic seems to have room for improvement too. So, those will be on my todo list for the future.

Fixing a couple of crashes

SIGSEGV is one of the least inspiring ways to crash and burn, so I’m fairly keen to hunt down cases where that happens and fix them. This week I fixed two. The first was a bug in the UTF8 Clean 8-bit encoding, which turned out to be one of those “duh, what was I thinking” off-by-ones. The second was a little more fun to track down, but turned out to be memory corruption thanks to missing GC rooting.

And…that’s it!

But, this week I’m set to have a good bit more time, so hope to have some more interesting things to talk about in the next report. :-)

Posted in Uncategorized | 1 Comment

Happy heapster!

In last week’s report, I was working away at a heap snapshot mechanism for MoarVM. If you don’t know what that is, I suggest taking a look at last week’s post for an explanation, and then this one will probably make a whole lot more sense. :-)

This week, I did the rest of the work needed to get heap snapshots being dumped. I’m not going to talk any more about that, since I discussed the ideas behind it last time, and I encountered nothing unexpected along the way.

Instead, I want to discuss my initial work on building a tool to analyze the heap snapshots and get useful data out of them. The snapshots themeselves are mostly just megabytes and megabytes worth of integers separated by commas and semicolons. Big data it ain’t, but it’s big enough that processing it needs at least a bit of thought if we’re going to do it fast enough and without using huge amounts of memory.

I decided early on that I wanted to write this analysis tool in Perl 6. Overall, I’m working on performance, memory use, and reliability. The results from analyzing heap snapshots will help cut down on memory use directly, and identify leaks. Building the analyzer in Perl 6 means I’ll be able to identify and address the performance issues that I run into along the way, helping performance. Further, I could see various opportunities to do parallel processing, giving concurrency things a bit of a workout and maybe shaking some issues out there.

The design

I decided to build a command line application, which would execute queries on the heap snapshot and display the results. Thinking a bit about that, I realized it would break nicely into two pieces: a model that held the snapshot data and provided various methods to interrogate it, and a shell that would parse the queries and nicely display the results. All the stuff that needed performance engineering would lie within the model. In the shell, I didn’t have to worry about that.

I also realized that there was a nice perceived speed trick I could pull. While waiting for the user to type a query, there’s no reason not to get to work on parsing the snapshot and building up a data model of it. In the best case, we’re done by the time they’ve typed their query and hit enter. Even if not, we’ve shaved some time off their wait.

Looking at heap snapshots

A heap snapshot file is really a heap snapshot collection file, in that it can hold multiple snapshots. It starts with some common data that is shared among all the snapshots:

strings: ["Permanent Roots","VM Instance Roots", ...]
types: 18,19;21,22;25,26;27,28;30,31;33,33;30,34;...
static_frames: 10,11,1,12;63,64,13,65;10,67,1,65;...

The first, strings, is just a JSON array of all the strings we refer to anywhere else in the snapshot. The types data set can be seen as representing a table with two columns, both of them being indexes into the strings array. The first column is the name of the representation, and the second is type names (for example, a Perl 6 Rat would have the P6opaque representation and the Rat type). This is so we can understand what the objects in the snapshot are. The static_framesdata set does a similar thing for frames on the heap, so we can see the name, line number, and file of the sub, method, or block that was closed over.

These are followed by the snapshots, which look like this:

snapshot 0
collectables: 9,0,0,0,0,4;5,0,0,0,125030,165;...
references: 2,0,1;2,1,2;2,2,3;2,3,4;2,4,5;2,5,6;...

These describe a graph, rooted at the very first collectable. The collectables are the nodes, and the references the edges between them. The integers for a collectable describe what kind of node it is (object, frame, STable, etc.), its size, how many edges originate from it and where in the references table they are located (we take care to emit those in a contiguous way, meaning we can save on a from field in the references table). The references have a to field – which is actually the third integer – along with some data to describe the reference when we know something about it (for example, array indexes and attribute names in objects can be used to label the edges).

To give you an idea of the number of nodes and edges we might be considering, the snapshot I took to start understanding the EVAL memory leak has 501,684 nodes and 1,638,375 edges – and that is on a fairly well golfed down program. We can expect heap snapshots of real, interesting applications to be larger than this.

We could in theory represent every node and edge with an object. It’d “only” be 3 million objects and, if you strip away all the Perl 6 machinery we’re still pretty bad at optimizing, MoarVM itself can chug through making those allocations and sticking them into a pre-sized array in about 0.6s on my box. Unfortunately, though, the memory locality we’ll get when looking through those objects will be pretty awful. The data we want will be spread out over the heap, with lots of object headers we don’t care about polluting the CPU caches. Even with well-optimized object creation, we’d still have a bad design.

Native arrays to the rescue

Thankfully, Perl 6 has compact native arrays. These are perfect for storing large numbers of integers in a space-efficient way, and tightly packed together to be cache-friendly. So, I decided that my objects would instead be about the data sets that we might query: types, static frames, and snapshots. Then, the code would be mostly focused around ploughing through integer arrays.

The types data set makes a fairly easy example. It has two int arrays of indexes into the strings array, meaning the data set will be stored in two contiguous blobs of memory. Also note the careful use of binding in the BUILD submethod. Perl 6’s array assignment semantics are copying in nature, which is great for avoiding action at a distance (a good default) but not quite what we want when passing around large volumes of data. The type-name and repr-name method just resolve indexes to strings. More interesting are the all-with-type and all-with-repr methods, which simply resolve the search string – if it exists – to an index, and then go through the native int array of type name string indexes to find what we’re after.

Parallel, and background, processing

The BUILD submethod for the overall model starts out by picking the file apart, which is some simple line-based parsing. Given the heap snapshot format has very few lines (though they are very long), there’s nothing too performance sensitive in this code.

Then, at the end, it sets off 3 bits of parallel work to decode the JSON strings, and to parse the type and static frame tables. It’s worth noting that the Types dataset does also need to be constructed with the JSON strings; a simple await is used to declare that dependency. With regard to the snapshots, there’s no point processing all of them. Often, the user will only care about the final one, or a handful of them. At the same time, some background processing is still desriable. Thus, the model has arrays of unparsed snapshots and parsed snapshot promises. When the user indicates interest in a snapshot, we will start to prepare it. At the point it’s actually needed, we then await the Promise for that snapshot.

Over in the shell, we special-case having 1 snapshot in the collection by just selecting it right away, getting on with the processing of it immediately while the user types their query. We also take a little care to let the user know when they performed an operation, but we’re not done with parsing the snapshot yet.

Early results

The first things I implemented were getting a summary of the heap snapshot, along with queries to see the top frames and objects, both by space and count. Here’s how a session with it on a Rakudo heap snapshot starts out:

$ perl6-m -Ilib bin\moar-ha rak-heap
Considering the snapshot...looks reasonable!

This file contains 1 heap snapshot. I've selected it for you.
Type `help` for available commands, or `exit` to exit.

I ask for a summary:

> summary
Wait a moment, while I finish loading the snapshot...

    Total heap size:              28,727,776 bytes

    Total objects:                372,884
    Total type objects:           2,248
    Total STables (type tables):  2,249
    Total frames:                 2,523

And then can ask about the kinds of objects taking up the most space:

> top objects
Name                                   Total Bytes
=====================================  ===============
NQPArray                               7,184,816 bytes
BOOTInt                                4,639,680 bytes
BOOTStaticFrame                        4,195,168 bytes
VMString                               2,804,344 bytes
BOOTCode                               1,486,872 bytes
<anon>                                 1,331,744 bytes
Parameter                              847,552 bytes
BOOTNum                                827,904 bytes
BOOTStr                                546,368 bytes
Perl6::Metamodel::ContainerDescriptor  367,848 bytes
BOOTArray                              344,056 bytes
QAST::Op                               262,656 bytes
<anon>                                 255,032 bytes
Method                                 238,792 bytes
Signature                              228,096 bytes

And by number of objects:

> top objects by count
Name                                   Count
=====================================  =======
BOOTInt                                144,990
NQPArray                               74,270
VMString                               34,440
BOOTNum                                25,872
BOOTCode                               20,651
BOOTStr                                17,074
BOOTStaticFrame                        16,916
Parameter                              6,232
Perl6::Metamodel::ContainerDescriptor  5,109
BOOTHash                               4,775
Signature                              3,168
<anon>                                 2,848
QAST::Op                               2,736
BOOTIntArray                           1,725
Method                                 1,571

Figuring out where those huge numbers of boxed integers come from, along with the arrays we might guess they’re being stored in, will no doubt be the subject of a future week’s post here. In fact, I can already see there’s going to be a lot of really valuable data to mine.

But…that EVAL leak

This kind of analysis doesn’t help find leaks, however. That needs something else. I figured that if I did something like this:

C:\consulting\rakudo>perl6-m --profile=heap -e "for ^20 { EVAL 'my class ABC { }' }"
Recording heap snapshot
Recording completed
Writing heap snapshot to heap-snapshot-1458770262.66355

I could then search for the type tables for the class ABC (which we could expect to be GC’d). That would confirm that they are staying around. So, I implemented a find query, which gave me output like this:

> find stables type="ABC"
Object Id  Description
=========  ===========
42840      ABC
42845      ABC
45288      ABC
64994      ABC
71335      ABC
76824      ABC
78599      ABC
82535      ABC
86105      ABC
146765     ABC
404166     ABC

Those integers on the left are unique IDs for each object on the heap. So, given one of those, I then needed to calculate a path through the heap from the root to this object, to understand why it could not be collected and freed. This would be most useful if I could be told the shortest path. Thankfully, there exists a well-known algorithm that runs in O(Nodes + Edges) time that can annotate all graph nodes with data that lets you then determinte the shortest path to the root in O(maximum path length). (That gives an upper bound of O(Edges) in general, however unlikely that is in the real world. To see why, consider a heap snapshot that consists entirely of a linked list.)

The algorithm, if you didn’t guess yet, is the breadth-first search. The nice thing is that we only have to compute it once, and can then answer shortest path queries really quickly. Here’s the patch that added it. And here’s what it found:

> path 42840
    --[ Thread Roots ]-->
Thread Roots
    --[ Compiling serialization contexts ]-->
BOOTArray (Object)
    --[ Unknown ]-->
SCRef (Object)
    --[ Unknown ]-->
ABC (STable)

Bingo! We’re somehow, in an EVAL, leaving a reference to the serialization context in the “stuff we’re still compiling” array. (Serialization contexts being the things that hold objects we create at compile-time, but want to refer to at runtime, and so must serialize when pre-compiling.) This discovery quickly led to an NQP patch. Which…helped, but we still leak, for a different reason. The good news, however, is that the heap snapshot analyzer could very quickly show that the fix had been successful, and identify the next thing to investigate. You can only imagine how much more frustrating it would be to not have tooling that can do this! Sadly, I didn’t have time this week to dig into the next problem.

In one final bit of heap snapshot analyzer news, Timo forked it and added more features. I love it when I build something and others dig in to make it better. \o/

So, how was Perl 6’s performance?

I discovered 3 things that basically killed performance. The first was Str.Int being horrid slow, and since reading in a heap snapshot is basically all about convering strings to integers that was a real killer. For example:

$ timecmd perl6-m -e "my $a; for ^1000000 { $a = '100'.Int }"
command took 0:1:09.83 (69.83s total)

Awful! But, a 15-minute patch later, I had it down to:

$ timecmd perl6-m -e "my $a; for ^1000000 { $a = '100'.Int }"
command took 0:0:1.87 (1.87s total)

A factor of 37 faster. Not super-fast, so I’ll return to it at some point, but tolerable for now. I guess this’ll help out plenty of other folk’s scripts, so it was very nice to fix it up a bit.

The second discovery was that a thread blocked on synchronous I/O could end up blocking garbage collection from taking place by any thread, which completely thwarted my plans to do the snapshot parsing in the background. This wasn’t a design problem in MoarVM, thankfully. It’s long had a way for a thread to mark itself as blocked, so another can steal its GC work. I just had to call it on a couple of extra code paths.

There’s a third issue, which is that something in MoarVM’s line reading gets costly when you have really long (like, multi-megabyte) lines. Probably not a common thing to be doing, but still, I’ll make sure to fix it. Didn’t have time for it this week, though. Next time!

Posted in Uncategorized | 3 Comments

A whole heap of work

The majority of my Perl 6 grant time in the last week went on building a heap snapshot mechanism in MoarVM. But, before I dug into that, I worked on…

More memory savings!

While I expect we’ll find plenty of interesting memory savings – not to mention the source of the EVAL memory leak – through the heap snapshots I’m working on, there’s still a good bit of info to be had from C-level heap profiling tools and taking a careful look at various data structures. This week, I found and improved 3 such things.

Every block, routine, or even thunk that we compile gets an ID that is unique within its compilation unit (that is, a body of code that is compiled at once). This is used to be able to refer to it unambiguously. Historically – dating all the way back to the Parrot days – these needed to be a bit more unique than just within a compilation unit. This made them fairly long, and since something like the Perl 6 builtins have into the tens of thousands of them, they added up quickly. So, I shortened them a good bit, knocking 188KB of NQP’s base memory, 1424KB off Rakudo’s base memory, and making the compiled CORE.setting a nice 288KB smaller.

These routines, blocks, and thunks are all represented down in MoarVM as static frames – a data structure I’ve talked about a bit in previous posts here. Aware that we have a lot of them, I looked to see if there were any more relatively easy savings to be made. I discovered a couple.

First up, we were keeping around a small map of the bytecode that was used to indicate instruction boundaries. This was used for precisely one thing: outputting instruction offsets when showing low-level backtraces. We never actually show that data to any normal Perl 6 end user. It was useful to have around early on in MoarVM’s development, but I can’t remember the last time I needed that information to debug something. It’s also hugely misleading these days, as the VM is often running specialized bytecode with lots of virtualization and indirection stripped out. So, I just got rid of it. We only generated these for the bytecode of static frames that were actually run, but it also saved a pointer for all static frames. In total, it was another 316KB off NQP’s base memory, and 1088KB off Rakudo’s.

Finally, back when I added the profiler, static frames got a few new fields. They’re totally unused when we’re not applying any kind of instrumentation, however. So, I pulled them out into a single struct, reducing us to just one pointer per static frame for this state. That was another 144KB off NQP, and 1036KB off Rakudo.

These total to around 3.5MB off the base size of Rakudo (measured by running the empty loop program), and all give further savings once extra modules are loaded or more of CORE.setting is used.

Multi caches get a size boost

I analyzed some code from RabidGravy doing sine wave generation, and noticed it was doing a lot of slow-path multi-dispatch. It turns out we were overflowing the multi caches, which in turn would be harming our ability to inline. While this points at a design issue I want to look into a little further, giving the caches a size boost provided some instant relief, getting the code down to 60% of its previous runtime. This does mean we use a bit more memory, but only for multis that we actually call, which isn’t all that many just for a normal startup.

Heap snapshots

This week I got a lot of work in on heap snapshots. First of all, what is a heap snapshot? It’s a representation of the memory allocated by a program, and under the management of the VM. In MoarVM this consists of three things:

  • Objects (both type objects and instances)
  • Shared Tables (aka “STables”), which hold the common things for a given meta-object and representation pairing
  • Call frames

The first two are managed by the GC; in fact, most code in the GC doesn’t care which it’s dealing with. The third is managed for now by reference counts (the reasons why, and how this gives a mix of great properties and awful properties, is a discussion for another day!)

The point isn’t to capture all of the data on the heap, but rather to capture the size, identity, and reference structure of the data. Size is easy enough to understand – it’s just about how many bytes are used. Identity is about what type an object has (the class name, for example), what type an STable represents, and what block or routine in the code a particular call frame is an invocation of. Together with counting how many items we have, we can use size and idendity data to answer questions like, “what kinds of objects are using up loads of memory”. I’ll be intersted to use this to see what’s eating up memory in NQP and Rakudo, before you even start doing any interesting work.

Reference structure is more interesting. It’s about what things point to, or refer to, other things. MoarVM has a reachability based GC. When we garbage collect, we start from a set of roots, which consist of things like the call stacks of each running thread along with other per-thread and per-VM state. We then – conceptually, at least – “mark” all the objects that these roots reference and put them on a todo list. We then visit them, asking each of them what they reference, and putting those objects onto the todo list. By taking care never to put an already marked object onto the todo list, we can be confident we’ll eventually terminate with an empty list. At the end, all of the unmarked objects are declared unreachable, and we can free them up. (And just to drive it home: this is a good way to conceptually understand what a reachability GC does, but does not reflect the way modern VMs that need to efficiently manage a large number of objects actually do it. And, like nearly all abstractions, it hides details that can matter in reality. Again, a topic for another day. :-))

Anyway, the reference structure is all about answering why a given object is still in memory. What path from the roots is keeping it alive? This is the sort of data that will help me understand the leaky EVAL bug I mentioned a week or so back.

Taking a heap snapshot involves capturing all of the raw data. It’s rather important that the process of taking a heap snapshot does not impact the liveness of the things in the snapshot. The instrumenting profiler, which was primarily built for giving us insight into microbenchmarks and how the dynamic optimizer works with them, references real type objects and static frames in its data set, which is then post-processed. In doing so, it keeps them alive. But the heap snapshotter doing this would be a disaster! It would distort the set of live objects, which it exists to measure. So, rule 1 is that the heap snapshot data structure must never hold any references to GC-managed objects. In fact, that’s probably the only really important rule. Of course, it’s going to affect the amount of memory the VM uses, but that must not show up in any way in the results. Most of the remaining challenge in taking heap snapshots is about trying to capture the data in a sufficiently compact way that it’s just large rather than utterly huge.

Another interesting question is when to take heap snapshots. A fairly good time is just after a GC run. Why? Because we’re in a state of “stop the world” at that point anyway, so we really are taking a snapshot of an atomic point in time with regard to the execution of the program. So, that’s where it’s done. (Note that this doesn’t bias things, in so far as we’d never be wanting to include unreachable things into the snapshot anyway.)

Anyway, by this point I’ve got pretty far with the code that does the snapshotting. It happily finds all object and STable references, and I’m part of the way through frames, which will mean it can snapshot the memory associated with closures properly. So, not much more work to go on producing the snapshots themselves.

Once I’ve got complete snapshots being produced, the next task will be to build some tools to analyze them. There will also be some work to make the snapshots a bit more detailed (for example, including the names of object attributes, so it’s easier to see the path keeping an object alive). I’ll probably take on some basic analysis tooling first, however, because I think even without that added context the snapshots will still be hugely revealing. Stay tuned for next week’s report for more!

Posted in Uncategorized | 5 Comments

Can’t forget about memory issues

I pondered at the end of last week’s post that I might work on speeding up object construction this week. I didn’t, but I did get a bunch of other useful stuff done – and got started on one of the big tooling deliverables of my grant.

Lazy string decoding

Last week, I’d been working on lazily decoding strings, so we could save some memory. I nearly got it working, and noted that Timo dropped a patch that got rid of the SEGV. This week I reviewed that patch. It was correct in so far as it fixed the issue, but I noted that other code was vulnerable to running into the same problem. In summary, the JIT compiler has a dynasm macro for grabbing a string from the string heap, and a robust fix would ensure we always made sure the string was decoded before JITting an unconditional lookup to it. So, I fixed it that way. Code review for the win!

With the patches merged, Rakudo’s memory use for the infinite loop program went down by 1272 KB, which is a welcome win. You might imagine that we’d have saved something on startup time by not doing the decoding work also. The answer is yes, but it’s in noise if you simply use something like time. Thankfully, there is a good way to measure more accurately: use callgrind to count the number of CPU instructions retired running the empty program. That shows a reduction of about 2.7 million instructions – significant, though less than 1%, which is why it’s noise on the wallclock time. Still, such changes add up.

Other little memory savings

Glancing the massif heap profiler output after the previous fix, it struck me that another huge use of memory allocations at startup is static frames. Static frames represent the static things we know about call frames. Call frames are the things created whenever you enter a routine or block, and keep hold of your lexical variables. The actual lexical storage is created every call frame (to understand why this is needed, consider recursion). However, things like the name of a routine, its bytecode, the mapping of lexical names to lexical storage slots in the call frame, and the ranges of code covered by exception handlers is “static” data. Therefore, we store it once in the “static frame”, and each call frame just points to its static frame.

You never actually see a static frame from MoarVM’s user-space. They’re an implementation detail. What you do see are coderefs (or “code objects”), which point to a static frame and also have an outer pointer, which is what makes closures work.

We already do partially lazily deserialize static frames. In short, we always deserialize the part you need for operations that just want to talk about it, and only deserialize the rest of the data if you ever actually call that code. That was a really quite good saving, but it looks from the profile like there may be value in trying to push that laziness evey further. Again, for a typical Perl 6 program, only a tiny fraction of the code in CORE.setting will ever be mentioned or invoked.

So, that may be a future task, but it will be a little tricky compared to strings. What I also did notice, however, is that in a compilation unit, we kept two arrays: one of static frames, and one of code objects. We almost never indexed the static frames array, and none of the places we did would be on a hot path. Since it was always possible to reach the static frames through the code objects, I did the refactor to get rid of the static frames array. It’s only a 16KB saving for NQP, but with Rakudo’s sizable CORE.setting and the compiler, we save a more satisfying 134KB off the memory needed for the infinite loop program. We also save ourselves walking through those long arrays of code objects during full GC runs (you’ll note the removed GC calls in the patch), which is also good for the CPU cache – and of course saves some instructions.

Another slightly cheeky saving came from lazily allocating GC nursery semispaces. The MoarVM GC, like most modern GCs that need to manage sizable heaps, is generational. This means it manages objects that live a long time differently from those that live a short time. For short-lived objects, we use a technique known as semi-space copying. The idea is simple: you allocate two equal size regions of memory (4MB each, per thread, in MoarVM). You do super-cheap bump-the-pointer allocation in the currently active semi-space. Eventually, you fill it, which triggers garbage collection. All the living objects are then copied, either to the start of the other semi-space if we never saw them before, or into the old generation storage if they survived a previous GC and so are probably going to be fairly long-lived.

For simple programs, or short-lived threads, we never need to ask for the second semi-space. In the past we always did. Now we don’t. It actually took a lot less time to write/test that patch than it did to explain it now. :-) But also, it’s worth noting that it’s a bit of a con on…well…every modern OS, I’d guess. While a heap profiler like massif will happily show that we just saved 4MB off each small program or short-lived thread, in reality operating systems don’t really swallow up pages of physical RAM until programs touch them. So, this is mostly a perception win for those who see numbers that factor in “what we asked for”, but it’s an easy one, and may save a brk system call or so.

Assorted small fixes

My work on speeding up accessors turned out to cause a regression in a module that was enjoying intropsecting accessor methods. It turns out I didn’t mark the generated methods rw when I should have, even if the generated code was correct. An easy fix.

Next up, I looked into a concurrency bug, where channels coerced to supplies and supplies coerced to channels would behave in the wrong way with exceptions. I guess it’s a pretty symmetry in some sense that we ended up with both directions broken, but both directions working is even prettier, so I fixed both cases.

It turned out that writing a submethod Bool didn’t work out right in all cases, while a method Bool always did. Boolification is fairly hot-path, so is fairly optimized (on the other hand, with our increasingly smart dynamic optimizer, we might be able to rid ourselves of the special cases in the future). For now, it was another easy fix (at least, if you know the MOP code well enough!)

Finally I took on a nasty SEGV found by Larry, which turned out to golf to a crash when concatenating strings that had Unicode characters in a fairly particular range at the end of the first string (there we likely a few other paths to the same bug). It turned out to be a silly braino in the generation of the primary composite table used for NFC transformation.

The EVAL memory monster

An RT ticket reported that EVAL of a regex appears to leak (it in fact asked about the <$x> regex construct, but that golfs to an EVAL):

for ^500 { EVAL 'regex { abcdef }' }

My first tool against leaks is Valgrind, which shows what memory is allocated but not freed. So, I tried it and…no dice. We leak very little, and it all seems to be missing cleanup at exit. The size of the leak was constant, not proportional to the number of EVALs. Therefore, I wasn’t looking at a real leak in the sense that MoarVM loses track of memory. All the memory the EVALs tie up is released at shut-down – it’s just not released as early as it should be.

My next theory was that perhaps we seem to leak because we end up with lots of objects promoted to the old generation of the GC, which we collect irregularly. That quickly became unlikely, though. The symptom of this is typically that memory use has a sawtooth shape, with lots getting allocated then a sudden big release on full collections. However, that didn’t seem to happen; memory use just grew. To completely rule it out, I tweaked the settings to make us do full collections really often, and it didn’t help.

While doing this analysis, I was looking at profiles some. The primary reason was because the profiler records all the GC runs that happen, so I could easily see that full heap collects really were happening. However, looking at the profile pointed out a few easy optimizations to compilation.

The first was noting that the QAST::Block.symbol method – used by the compiler to keep various interesting bits of data about symbols – was doing a huge number of allocations of hashes and hash iterators. I noted in many cases we don’t need to.

This was a big win – in fact, far too big. The profiler seemed to report too few allocations now. I dug in and noticed that it indeed was missing various allocations that could happen during parameter binding, both with slurpies and auto-boxing. Fixing that in turn showed up that the very same symbol method accounted for a huge amount of allocation as a result of boxing the $name parameter. So, I did a few fixes so that we could turn that to str.

The profiling data also showed up a few other easy wins in the AST to MoarVM compiler. One was opting NQP in to a multiple dispatch optimization in MoarVM that Rakudo was already using. The other was a few more native type annotationsto cut out some more allocations.

These together meant that my leaking example from earlier now ran in 75% of the time it used to. These improvements will help compilation speed generally too. The bad, if unsurprising, news is they did nothing at all to help with the leak.

A little further analysis showed that we never release the compilation unit and serialization context objects. That only leads to another question: what is referencing them, and so keeping them alive and preventing their garbage collection? That’s not an easy question to answer without better tooling. Happily, in my grant I’d already proposed to write a heap profiler, which can record snapshots of the GC-managed heap.

So, I’ve started work on building that. It’s not trivial, but on the other hand at least a good bit less difficult than writing a garbage collector. :-) Since I seem to have found plenty to talk about in this post already, I’ll save discussing the heap profiler until next week’s post – when it might actually be at the point of spitting out some interesting data. See you then!

Posted in Uncategorized | 3 Comments

Digging into the performance/reliability work

This week, I finally started digging into Rakudo and MoarVM things again, as part of my performance and reliability grant. Here’s what I got up to.

Faster accessors and an inlining fix

Making object accessors and object construction faster was one of the bullet points on the grant, and I decided to start out with accessors – largely because I’d already figured out pretty much exactly what I wanted to do with them. Here’s a small microbenchmark:

class A {
    has $.x;
my $a = => 42);
my $y;
loop (my int $i = 0; $i < 10000000; $i++) {
    $y = $a.x;
say $y;

The hot loop is just calling an accessor method ten million times and assigning the result. Before I started with any improvements, this code took 3.34s to run, including startup/compilation time. That was around 0.15s (measured by the time to do one iteration of the loop), so let’s call it 3.2s for the iterations themselves. That makes for about 0.32 microseconds per iteration. Being under a millionth of a second might not seem something to complain about, but it’s actually a little shy of a thousand CPU clock cycles on a 3 GHz CPU. Not cheap, for something you would expect to be very cheap indeed!

So, where is the cost? Here’s a bit of the profile output:


That <anon> is the accessor, which is called ten million times. It’s been JIT-compiled (the green), which is good. The column on the right, however, tells a less happy story: the accessor method was not inlined. Inlining is an optimization that involves taking the body of a called method and splicing it into the callee, such that you completely eliminate the dispatch and calling overhead. One other point on my grant is to look into decreasing call overhead, which really wants to be lower, but being able to inline and get rid of it altogether is still preferable. And accessor methods are tiny, so should be great inlining candidates. So what gives?

Well, at the heart of it was the way that we’d implemented accessor methods. Down in the metamodel code, we did something like this (simplified):

$package.^add_method($accessor-name, method () {
    nqp::getattr(self, $package, $attr-name)

The method being added there is a closure. It closes over the package the attribute is from and the name of the attribute, which together are used in the lookup. MoarVM’s inliner can not inline things that access lexicals from the surrounding scope, because then you actually need a callframe with a correctly bound outer frame to resolve them. There’s a further cost, though. If we write an accessor method by hand:

method x() { $!x }

Then we look at the generated code, we see these two instructions:

 wval r4(3), liti16(0), liti16(12)
 getattr_o r5(1), r0(1), r4(3), lits($!x), liti16(0)

The wval is just grabbing a constant symbol, and the name of the attribute is a literal string. Since all those things are known, the MoarVM code specializer is able to re-write those two into:

 sp_p6ogetvc_o r5(1), r0(1), liti16(8), sslot(3)

This is known as a “spesh op”, which only the optimizer is allowed to emit. The 8 there is an offset from the start of the object’s memory in bytes, and this op is interpreted by taking the object in register 0, adding 8 bytes to its address, and deferencing – without having to do much in the way of “safety checks”. This JIT compiles into machine code rather cheaply also.

But when we have a late-bound package and attribute name, this optimization cannot happen. So not only are we not getting inlining to take place, we’re also losing out by having the code for accessors that we generate not being so good as the code you’d get if you wrote them all by hand.

So, what to do? Well, since most of the time the MOP is invoked by the compiler, and the compiler’s job is code generation, the neat solution is to have it provide a compiler services object to the MOP. It can use this to generate the accessor methods. When there’s no compiler services object supplied, it falls back to the existing way (for example, this would happen when you’re doing dynamic generation of classes). So, that’s what I implemented.

The result? 2.43s, or about 2.3s with startup time removed. So, it now was running in 72% of the time it did before. That’s an improvement, but not what I was looking for. Disappointed, I looked at the new profiling data and found that we still weren’t actually inlining the accessor methods. Looking at their code, they were free of outer lexical references and otherwise really simple. So what was going on?

It turned out to be an inlining bug that was also leading to many more missed opportunities. I’d heard from other folks looking at profiles that they were surprised certain things didn’t get inlined, but in the push to get things to work before Christmas, rather than to get them fast, I didn’t look into it. Now I have chance to take care of such things, and so investigated. It turned out to be an unfortunate accident that occurred when teaching inlining about multiple dispatch, so it could inline multi subs and methods. It led to us, oddly, being pretty good at inlining multi-dispatch calls, but missing out on lots of (supposedly easier) single-dispatch calls. Once I figured out what was going on, the patch was short and simple. This should have much wider benefits than just for accessors.

But, back to accessors. With inlining of various single-dispatch things fixed up, the benchmark now ran in 0.63s, so let’s call it around 0.5s once startup time is removed. That’s 0.05 microseconds per iteration of the loop, or around 150 CPU cycles per loop. That’s still too high, but running that loop in 15% of the time we used to is a nice step forward.

Just to put that figure on the map a little, I wrote the following Perl 5 program, which I hope is reasonably equivalent:

package A;
use Moo;
has x => ( is => 'ro' ); 

package Main;
my $a = A->new(x => 42);
my $y;
for (my $i = 0; $i < 10000000; $i++) {
    $y = $a->x; 
say $y;

This ran in 5.12 seconds, with 0.08s startup/compilation time, so let’s call it 5 seconds for the iterations, or around 0.5 microseconds, or 1500 CPU cycles on our over-idealized 3GHz CPU. (For those wondering why I didn’t stick use integer into the Perl 5 benchmark to reflect the native int I used in the loop of the Perl 6 one: I tried, and on my box that made things slower.)

There was another small benefit: CORE.setting got a bit smaller. While we’re producing a bit more bytecode now, we’re serializing less closures, and it turns out that the bytecode approach weighs a little less. My guess going into this was that I’d about break even, so coming out with a 130KB lighter CORE.setting was very nice. That means a smaller resident memory size also.

A big leak

Away from performance work, I spent some time hunting down a reported memory leak. Tracing things back from the valgrind output I was sent, I recreated a one-liner to reproduce it. The leak needed certain combinations of features to be used; there were various paths to trigger it, though my reproduction of it involved a multiple dispatch sub with a where constraint, called with flattening arguments. If such situations happened a lot, we could leak rather heavily. It’s gone now.

The analysis of this issue was helped by my having spent some time in the run-up to this grant clearing up MoarVM’s –full-cleanup mode. Usually we don’t waste time neatly clearing everything up at exit, because the OS can do it way faster (yes, I measured). However, MoarVM invoked with the –full-cleanup flag will try to do such cleanup, freeing everything it allocated. This is useful because it makes actual leaks, not just incomplete cleanup, clear to see. I’m not all the way there with this work yet, however NQP and most of its test suite are already valgrind-clean. I’ll be continuing to pick off the missing bits of cleanup over the course of the grant, and at some point want to set up a spectest run with valgrind + –full-cleanup to give ourselves a good chance of hunting all the leaks down.

Work in process: lazy string heap decoding

A decent chunk of the CORE.setting compiled output, along with various other bits of the compiler, is string literals. Of course, not all of them are equally used (think about the hundreds of error messages you never normally encounter). To date, we’ve always taken all of the strings and decoded them at bytecode load time, creating an NFG MVMString data structure (and so doing the bit of analysis needed to see if they need any synthetic codepoints generating). Looking at our memory use, it became clear there might be a win from deserializing them on-demand. The patch doing so isn’t quite ready for prime-time yet (though a follow-up fix by Timo may have addressed the crash my initial patch produced in an NQP test), but it revealed that we can knock around 1.3MB off Rakudo’s base memory usage with such an approach. (Fun fact: with the patch, on my box NQP running while 1 { } uses 1.8MB less memory than the JVM running the equivalent Java program – except NQP actually has the NQP compiler in memory and compiled the script too!) Anyways, expect this improvement to land in the next couple of days.

What next?

There’s so much to do! One task will be applying the same approach I took for accessors to object construction, so see how far that helps. Our object construction speed is a known bottleneck, and a simple benchmark against Perl 5 + Moo shows Rakudo being a disappointing 7 times slower. So, some work will be needed for us to reach towards parity, then get ahead, there. And on the reliability track, I’ll be picking out some bugs, leaks, etc. to squish too.

Posted in Uncategorized | 3 Comments

A few words on Perl 6 versioning and compatibility

Recently, I put together a set of guidelines for Perl 6 versioning and backward compatibility. They have been refined with input from the Perl 6 community. Together, they amount to a read of over 4,000 words, and are fairly detailed. You can consider this post a TL;DR version of the guidelines, giving the big picture idea of what they mean.

There’s more than one thing to version

In common with many languages, Perl 6 has a clean separation of language version and compiler version. Just as C has language versions like C89, C99, and C11, Perl 6 has versions 6.c, 6.d, 6.e, and so forth. To compile C you might be using GCC, Clang, MSVC, or something else, and these have their own version schemes. To compile and run Perl 6 you’ll for now probably be using Rakudo, which has monthly releases with a boringly predictable naming scheme: 2015.11, 2015.12, 2016.01, etc.

Perl 6 language version = a test suite

A given Perl 6 language version is defined by a test suite. An implementation that passes that test suite can claim to provide that version of the Perl 6 language. Rakudo 2015.12 was the first release to provide Perl 6.c, because it passed the test suite. Rakudo 2016.01 also passes the test suite, and thus provides Perl 6.c too.

The master branch of the test suite represents the latest development of the Perl 6 language. Language releases are represented by tags (implying that 6.c, for example, is immutable).

Programs may declare which version they want

The first statement of a program (either a script or a module) may contain a use statement requesting a particular language version:

use v6.c;

Implementations are required to refuse to run code for versions of Perl 6 they do not provide.

No language version declaration implies latest released version

Pretty much self explanatory. Asking for nothing in particular gets you the latest and greatest Perl 6 version the implementation has to offer. Meaning that 20 years down the line, people wanting to show off the latest things won’t have to prefix all their example snippets with use v6.n.

Module installers/directories may be more demanding, and require modules to specify a language version. That’s a decision for the those making such tooling.

An implementation may support many language versions at once

Assuming that we release a Perl 6.d sometime during 2016 (remember that this means “release a test suite defining what 6.d is”), then Rakudo 2016.12 will provide both Perl 6.c and Perl 6.d. To do that, it will be required to pass both the 6.c and 6.d test suites.

Implementations will need to go to moderate effort to do this

New syntax (new operators, new phasers, etc.) introduced by Perl 6.d should not be available to a program declaring use v6.c. This is to ensure that the syntactic additions do not cause breakage. For example, adding a DONE phaser could break any code that had a DONE sub and called it with listop syntax. Rakudo will, for now, support this by guarding such syntax in the parser with a version assertion. (Another strategy is to ship multiple parsers, presumably loading “on demand” those that are actually needed.)

Lexical aspects of the CORE.setting (the builtins) get the same treatment. Recall that in Perl 6, the builtins are not imported, but rather are in the outer lexical scope of the program. If Perl 6.d adds new subs, different implementations of subs, new classes or roles, or new constants, they should go in a CORE.d.setting, or some equivalent. Implementations are free to choose exactly how they will structure things. In Rakudo, we’ll retain CORE.setting with the “base” set of things, and have a CORE.d.setting that overrides and adds or overrides things as needed for Perl 6.d. This means that new Perl 6 language versions can change the behavior of existing operators or subs in ways that are not backward compatible, without breaking code that declared in a use statement it wants a previous version of Perl 6.

Late-bound things (methods, dynamic variables) have different rules. See the full guidelines for the details, but the upshot is that Perl 6 designers have a bit less flexibility in how they evolve methods on built-in types compared to subs, syntax, etc. Backward incompatible changes (as judged by the test suite) are not allowed; new methods or new overloads that don’t conflict with existing behaviors are. The other key difference is that no effort is made to “hide” methods added as part of newer versions of Perl 6 from callers using an earlier version of the Perl 6 language.

TL;DR, supposing you have a module that declares use v6.c:

  • The onus is on Perl 6 implementations to not let your program run into 6.d syntax changes or new subs/constants in the builtins
  • The onus is on you to not use methods that did not exist in 6.c. However, tooling will probably come to exist to help out (for example some kind of Module::FlightCheck that uses rakudobrew to grab the 2016.01 release, which is known to support nothing later than 6.c, and run your module’s tests on it).

If you think the second part of this sucks, then read the guidelines to get an idea why the alternatives all suck harder. :-)

Long Term Support and security releases

Rakudo releases every month, and since each release will provide a range of Perl 6 language versions by passing their official test suites, then in a sense every one is an “official” release. To be very clear, there is no release we should talk about as being “the 6.c release of Rakudo” or “the 6.d release of Rakudo” (though some folks probably will anyway, no matter what I say, and they’ll most likely mean “the first Rakudo release that supported 6.c” or so).

What we will do is declare some releases as “Long Term Support” releases. This label will be applied to releases some time after they have been made, so we can support releases that we know behaved reasonably well in the wild – at least in their first month or so “out there”. For example, suppose that 2016.02 is fairly well received. We might declare it a LTS release, and we’ll declare with that the period of time we intend to “support” it for.

What does support mean? It means that, in the event of security patches or serious bug fixes, we’ll produce bug compatible releases of all current Long Term Support versions of Rakudo. For example, suppose 2016.02 and 2016.10 were marked as LTS releases for a period of 12 months. In December 2016, we find a serious security bug in Rakudo. We’d release a 2016.02.1 and 2016.10.1, which would branch from the 2016.02 and 2016.10 tags and have the required patch(es) cherry-picked in. This would allow upgrades to get the security fix with a very high degree of confidence that existing code will not break.

Minor language versions

One thing we’d prefer to avoid is people declaring dependencies in their Perl 6 code on particular compiler versions. There’s no way to prevent it, but we can try to reduce the temptation to do so. The typical use case would be wanting to depend on a particular bug fix. Fixes get coverage in the language test suite, and so will be part of the next language release – but since the major language versions will tend to have at least a year between them, that could be a bit too long to wait.

Therefore, we’ll also have minor language versions, named 6.c.1, 6.c.2, etc. These give something implementation-independent to depend on. Chances are these will be needed more in the short term than in the long term.

What will we market?

Major language versions, primarily. We’ll use minor language versions to focus on incremental improvement and refinement. The interesting “next big thing” will be the major language versions. Each major language version will get a name. We’ve picked celebrations as the naming theme; 6.c was “Christmas”, and 6.d will be “Diwali”. (That doesn’t mean we’ll be pushing ourselves to actually ship it anywhere near where Diwali falls in the calendar. We already did that with Christmas, taking care to release on Christmas day.) So, look out for a talk on “What’s coming in Perl 6 Diwali” at some conference later on in the year. :-)

Trying out the future

One question the above raises is how to try out the latest implementation work towards the next major version. For that, we’ll use versions such as 6.d.a (6.d “alpha”). So:

use v6.d.a;

Will get you access to the stuff we expect to be in 6.d. Note these lettered versions will really be giving you the current work in progress and come with absolutely no backward-compatibility or stability promises, and support for them will be dropped swiftly after the actual language release of 6.d.

Where will new spectests go? Do I need to tag them somehow?

In the master branch of the repository of all spectests, and they need no tagging up besides the usual fudging. Released language versions are handled as tags, which are immutable.

Are the spectests really enough to specify a language?

I think they’re the best tool we have for the job at the moment. We might want to look towards property based testing ala QuickCheck some more, but that’s still a test-based approach. Natural language doesn’t have the precision, but more critically lacks the verifiability (that is, you can’t run a Perl 6 compiler against a natural language specification). Formal methods, such as operational or denotational semantics, offer greater precision than tests, but the intersection of people who know how to apply those and who want to contribute to Perl 6 is probably tiny. Certainly they lack the accessibility of a test suite expressed in Perl 6, and so would take us away from the goal of Perl 6 being a community’s language.

All that said, it’s fairly clear we need anywhere between 2 and 10 times the current number of tests to have comfortable coverage of both the language and its wide array of built-ins. We’ll be looking into coverage analyses to help us understand where those tests are most lacking.

You didn’t answer my really important question about Perl 6 versioning!

Then leave a comment, and maybe I’ll do a follow-up post to answer it. :-)

Posted in Uncategorized | 6 Comments

Not guts, but 6: part 5

It’s time for me to start building a simple Stomp::Server class, test-driven. I’ll need to extend my Test::IO::Socket::Async to make this possible, as it currently doesn’t handle listening sockets.

The simplest start

I’ll start out by stubbing an almost empty Stomp::Server class, which goes in lib/Stomp/Server.pm6:

class Stomp::Server {
    method socket-provider() {

Once again, I’ll give it a socket-provider method so I can inject a socket test double. Then, it’s time for a new test file, server.t:

use Test;
use Test::IO::Socket::Async;
use Stomp::Server;

constant $test-socket =;
my \TestableServer = Stomp::Server but role {
    method socket-provider() {

So, where to begin? To set me off in a consistent direction, I take a look at Stomp::Client and notice it expects to be constructed with a host and port. That seems like a good starting point. So, some tests:

constant $test-host = 'localhost';
constant $test-port = 1234;
dies-ok { }, "Must provide host and port to new (1)";
dies-ok { => $test-host) }, "Must provide host and port to new (2)";
dies-ok { => $test-port) }, "Must provide host and port to new (3)";

These are easily passed, by adding to Stomp::Server:

has Str $.host is required;
has Int $.port is required;

A typical pattern for asynchronous server-like things in Perl 6 is to expose a supply of incoming connections. I may as well call that listen. Here’s the simplest test I can write for that:

my $test-server = => $test-host, port => $test-port);
my $listen-supply = $test-server.listen();
isa-ok $listen-supply, Supply, "Stomp::Server listen method returns a Supply";

It explodes since there’s no listen method. Stubbing one in that contains a supply block gets me a pass:

method listen() {
    supply {

So far, so easy.

The audience is listening

Now for something a little more involved. I want to make sure that a listening socket is only opened once I tap the supply that comes back from Stomp::Server’s listen method:

my $socket-listening = $test-socket.start-listening;
nok $socket-listening, "Not listening before supply is tapped";
my $listen-tap = $listen-supply.tap(-> $incoming { });
my $socket-listener = await $socket-listening;
ok $socket-listener, "Listening once supply is tapped";

I’d also like to make sure it listens on the correct host/port:

is $, $test-host, "Listening on correct host";
is $socket-listener.port, $test-port, "Listening on correct port";

And that closing the tap on the supply Stomp::Server gives me back will also close the listen supply from the socket:

ok (await $, "Closing supply tap also closes socket";

So, that’s the tests, but Test::IO::Socket::Async isn’t up to the job yet – so that’s first in line. I’ll want an object that represents a listening socket, and I can see that at the very least it’ll need a host and a port. I’ll also need to deal with the same race between tests and code under test that I had with connect, meaning that Listener itself should hold the Supply that I will use to simulate incoming connections. Finally, I need to provide a way to test that at some point it stops listening, exposing a Promise that is kept when that happens. That’s quite a few things that need wiring together. Happily, it falls out really quite easily, by wiring things up at construction time:

class Listener {
    has $.host;
    has $.port;
    has $.is-closed =;
    has $!is-closed-vow = $!is-closed.vow;
    has $!connection-supplier =;
    has $.connection-supply = $!connection-supplier
        .on-close({ $!is-closed-vow.keep(True) });

That’s really quite pretty. Perl 6 promises that attribute initializers run in order, so I can safely rely on $!is-closed containing the Promise I next take a vow from – and also keep that vow private. I also keep the ability to inject new connections private, and then tweak the Supply with on-close, which lets me run some logic to keep the is-closed promise when a tap on the supply is closed. Since everything exposed is either immutable or concurrent, there’s no need for this to be a monitor rather than a class.

That just leaves me to write a couple of methods in Test::IO::Socket::Async. One is listen, which should match the IO::Socket::Async method. The other is start-listening, which returns a Promise that tests can await to get the Listener instance. As with testing incoming connections, I’ll need a couple of attributes too.

has @!waiting-listens;
has @!waiting-start-listening-vows;

method listen(Str() $host, Int() $port) {
    my $listener =$host, :$port);
    with @!waiting-start-listening-vows.shift {
    else {

method start-listening() {
    my $p =;
    with @!waiting-listens.shift {
    else {

Recall that Test::IO::Socket::Async is a monitor, meaning there are no data races here. Installing these updates, and running my tests, things get a bit further, then hang here:

my $socket-listener = await $socket-listening;

That’s not surprising, because the code under test never actually starts to listen. Let me make that happen with the simplest possible addition to Stomp::Server’s listen method:

method listen() {
    supply {
        whenever self.socket-provider.listen($!host, $!port) {


With that, all the tests pass. But wait…how did the socket closed test pass? That’s thanks to supply blocks being smart enough to keep track of all the things tapped by a whenever inside of them, and closing them automatically when the corresponding tap on the supply block itself is closed. Resource management is one of the things that supply blocks quietly take care of, avoiding all kinds of potential resource leaks.

Incoming connections

Next, I want to test and implement the server side of the incoming connection handshake. This will need me to finish up Test::IO::Socket::Async. As usual, I’ll start by writing the tests I want to have:

constant $test-login = 'user';
constant $test-password = 'correcthorsebatterystaple';

my $test-server = => $test-host, port => $test-port);
my $listen-tap = $test-server.listen().tap(-> $conn { });
my $socket-listener = await $test-socket.start-listening;
my $test-conn = $socket-listener.incoming-connection;

    command => 'CONNECT',
    headers => (
        login => $test-login,
        passcode => $test-password,
        accept-version => '1.2'

my $message-text = await $test-conn.sent-data;
my $parsed-message = Stomp::Parser.parse($message-text);
ok $parsed-message, "Server responded to CONNECT with valid message";
my $message = $parsed-message.made;
is $message.command, "CONNECTED", "Server sent CONNECTED command";
ok $message.headers<accept-version>:exists, "Server sent accept-version header";
is $message.body, "", "Server sent no message body";

The key new piece is the incoming-connection method. The API for simulating received data and obtaining sent data works just like in the client sockets testing – suggesting they’ll want to share a lot of the same code. But can they share all of the same code?

Glancing at the Connection monitor I already wrote, it seems the answer is no. The first four attributes:

has $.host;
has $.port;
has $.connection-promise =;
has $!connection-vow = $!connection-promise.vow;

Aren’t really interesting for an incoming connection. The rest of the code, which deals purely with sending and receiving data, seems relevant, however. So, I’ll rename Connection to ClientConnection. I’ll then add a role called Connection, and factor the common bits out there. This gives me:

role Connection {
    has @!sent;
    has @!waiting-sent-vows;
    has $!received =;

    # print, write, sent-data, Supply, receive-data...

monitor ClientConnection does Connection {
    has $.host;
    has $.port;
    has $.connection-promise =;
    has $!connection-vow = $!connection-promise.vow;

    method accept-connection() {

    method deny-connection($exception = "Connection refused") {

I’ll then define a ServerConnection monitor, which for now simply composes the Connection role:

monitor ServerConnection does Connection {

I note how nice it is that code factored out to a role can happily be composed into classes and monitors, and will automatically get the desired mutual exclusion behaviour when composed into a monitor. Now, I can implement the incoming-connection method in Listener:

method incoming-connection() {
    my $conn =;

And I’m done extending Test::IO::Socket::Async to support testing server sockets. Committed! And yes, I owe that module some tests and a README some time soon…maybe on the plane tomorrow. For now, the STOMP must go on!

So, how can I make my test pass? By doing the absolute easiest thing possible, of course. Here it is:

method listen() {
    supply {
        whenever self.socket-provider.listen($!host, $!port) -> $conn {
            whenever $conn {
                await $conn.print:
                    command => 'CONNECTED',
                    headers => ( accept-version => '1.2' );

This is a fairly epic cheat. It just responds with a CONNECTED message when it receives anything on the socket! It passes the test, though. I’ve learned – mostly when doing ping-pong pair programming – that being willing to “cheat” my way to passing simple tests is actually a good thing. It makes it clear what tests I need to write next.

Sharing the parsing

To check I do treat CONNECT messages correctly, I’ll now write a test case where I sent complete junk to the server:

my $test-server = => $test-host, port => $test-port);
my $listen-tap = $test-server.listen().tap(-> $conn { });
my $socket-listener = await $test-socket.start-listening;
my $test-conn = $socket-listener.incoming-connection;

$test-conn.receive-data: "EPIC FAIL!";

my $message-text = await $test-conn.sent-data;
my $parsed-message = Stomp::Parser.parse($message-text);
ok $parsed-message, "Server responded to invalid message with valid message";
is $parsed-message.made.command, "ERROR", "Server sent ERROR command";

Obviously, it fails, sending back a CONNECTED message instead of an ERROR. So what to do about it? Clearly, I’ll need to start paying a bit more attention to the messages coming it. That is, I need to turn a sequence of packets into a sequence of Stomp::Message objects. Hmm, that sounds familiar! Glancing in Stomp::Client, I see this:

method !process-messages($incoming) {
    supply {
        my $buffer = '';
        whenever $incoming -> $data {
            $buffer ~= $data;
            while Stomp::Parser::ServerCommands.subparse($buffer) -> $/ {
                given $/.made -> $message {
                    die $message.body if $message.command eq 'ERROR';
                    emit $message;
                $buffer .= substr($/.chars);

Well, that’s precisely what I want – except it’s parsing server commands, and I need to parse client commands. Factoring a method out feels like a job for a role – and needing to make the factored out code be parametric on something makes a parametric role just the ticket. So, I’ll add a Stomp::MessageStream role:

role Stomp::MessageStream[::MessageGrammar] {
    method !process-messages($incoming) {
        supply {
            my $buffer = '';
            whenever $incoming -> $data {
                $buffer ~= $data;
                while MessageGrammar.subparse($buffer) -> $/ {
                    given $/.made -> $message {
                        die $message.body if $message.command eq 'ERROR';
                        emit $message;
                    $buffer .= substr($/.chars);

And use it in Stomp::Client:

class Stomp::Client does Stomp::MessageStream[Stomp::Parser::ServerCommands] {

The client.t tests still happily pass, so it goes in as a commit. Now I can also compose the role into my Stomp::Server and use it:

class Stomp::Server does Stomp::MessageStream[Stomp::Parser::ClientCommands] {
    has Str $.host is required;
    has Int $.port is required;

    method listen() {
        supply {
            whenever self.socket-provider.listen($!host, $!port) -> $conn {
                whenever self!process-messages($conn) {
                    await $conn.print:
                        command => 'CONNECTED',
                        headers => ( accept-version => '1.2' );

    method socket-provider() {

That turns my test fail into…a hang. Why? Because the process-messages private method simply assumes that if it didn’t manage to parse a message, the reason must be that it’s incomplete – not that it’s broken. It will therefore just accumulate broken data in its buffer. Clearly, the parser needs to fail more violently if the incoming message is utterly bogus.

So, over in lib/Stomp/Parser.pm6, I’ll add an exception:

class X::Stomp::MalformedMessage is Exception {
    has $.reason;
    method message() {
        "Malformed STOMP message: $!reason"

Then, I’ll tweak the TOP token to:

token TOP {
    [ <command> \n || <.maybe-command> ]
    [<header> \n]*

The addition here is the sequential alternation, calling maybe-command (the dot indicates to not capture the result). The idea of maybe-command is to check if the data so far might viably parse as a command if some more data were to arrive. It can look like this:

token maybe-command {
    <[A..Z]>**0..15 $ || <.malformed('invalid command')>
method malformed($reason) {

With that, I’ve got from a hanging test to an exploding test. Happily, all of the client.t tests that also use the parser still work, though, so it can go in as a commit of its own.

It’s at this point that I finally ran into my first Rakudo bug of this series. The code I want to now write looks like this:

method listen() {
    supply {
        whenever self.socket-provider.listen($!host, $!port) -> $conn {
            whenever self!process-messages($conn) {
                await $conn.print:
                    command => 'CONNECTED',
                    headers => ( accept-version => '1.2' );

                QUIT {
                    when X::Stomp::MalformedMessage {
                        await $conn.print:
                            command => 'ERROR',
                            body => .message;

Unfortunately, QUIT phasers suffer a scoping bug if there is an exception before any other messages have been received. Thankfully, it’s easy enough to drop the whenever sugar in this case and fall back on the tap method:

method listen() {
    supply {
        whenever self.socket-provider.listen($!host, $!port) -> $conn {
                    await $conn.print:
                        command => 'CONNECTED',
                        headers => ( accept-version => '1.2' );
                quit => {
                    when X::Stomp::MalformedMessage {
                        await $conn.print:
                            command => 'ERROR',
                            body => .message;

And with that, the test passes.

And that’s Stomp::Server’s first steps

There’s still a bit more to do to ensure the first message really is a CONNECT, and to provide a hook for authentication. And then I’ll need to work out how the API for incoming messages is going to look, along with subscription/unsubscription requests. But that’ll be for next time. In the meantime, here’s a commit of the server work so far.

Posted in Uncategorized | 3 Comments