Not guts, but 6: part 2

It’s time for more hacking on my Perl 6 STOMP module. Today: parsing.

Pulling out the parser

Given my plans for adding a Stomp::Server to go with my Stomp::Client, I need to factor my STOMP message parser out so it can be used by both. That will be an easy refactor. First, the parser moves off into a file of its own and gets called Stomp::Parser:

grammar Stomp::Parser {
    token TOP {
        <command> \n
        [<header> \n]*
        \n
        <body>
        \n*
    }
    token command {
        < CONNECTED MESSAGE RECEIPT ERROR >
    }
    token header {
        <header-name> ":" <header-value>
    }
    token header-name {
        <-[:\r\n]>+
    }
    token header-value {
        <-[:\r\n]>*
    }
    token body {
        <-[\x0]>* )> \x0
    }
}

Then it’s just a use statement and a small tweak back in Stomp::Client. Done!
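
In sketch form, with the rest of the class elided, the change amounts to:

use Stomp::Parser;

# ...and the existing parse call now names the extracted grammar:
while Stomp::Parser.subparse($buffer) -> $/ {
    ...
}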

Testing parsing of commands – and a discovery

Perhaps the most basic test I should write is for being able to parse all of the recognized commands, but not unrecognized ones. So, here goes:

use Test;
use Stomp::Parser;

plan 16;

my @commands = <
    SEND SUBSCRIBE UNSUBSCRIBE BEGIN COMMIT ABORT ACK NACK
    DISCONNECT CONNECT STOMP CONNECTED MESSAGE RECEIPT ERROR
>;

for @commands {
    ok Stomp::Parser.parse(qq:to/TEST/), "Can parse $_ command (no headers/body)";
        $_

        \0
        TEST
}

nok Stomp::Parser.parse(qq:to/TEST/), "Cannot parse unknown command FOO";
    FOO

    \0
    TEST

This doesn’t pass yet, because it turns out the grammar only supports the commands that a server may send, not those a client may send. That’s an easy fix:

token command {
    <
        SEND SUBSCRIBE UNSUBSCRIBE BEGIN COMMIT ABORT ACK NACK
        DISCONNECT CONNECT STOMP CONNECTED MESSAGE RECEIPT ERROR
    >
}

That makes me stop and think a bit, though. I just took a parser suitable for Stomp::Client and generalized it. But now it will also accept messages that a client should never expect to receive. That means I’ll have to add an extra error path for them, which feels suboptimal. Thankfully, since grammars are just funky classes, I can easily introduce variants of the parser that accept just the client or just the server commands:

grammar Stomp::Parser::ClientCommands is Stomp::Parser {
    token command {
        <
            SEND SUBSCRIBE UNSUBSCRIBE BEGIN COMMIT ABORT ACK NACK
            DISCONNECT CONNECT STOMP
        >
    }
}

grammar Stomp::Parser::ServerCommands is Stomp::Parser {
    token command {
        < CONNECTED MESSAGE RECEIPT ERROR >
    }
}

And yes, I added tests to cover these too, in the resulting commit.
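
Those tests look much like the earlier ones; a representative pair, in the same heredoc style:

ok Stomp::Parser::ClientCommands.parse(qq:to/TEST/), "Client parser accepts SEND";
    SEND

    \0
    TEST

nok Stomp::Parser::ServerCommands.parse(qq:to/TEST/), "Server parser rejects SEND";
    SEND

    \0
    TEST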

From parse tree to message

It’s fairly common in Perl 6 for a grammar to come paired with actions, which process the raw parse tree into a higher level data structure. I certainly have a desired data structure: Stomp::Message. So how is it being made today? Here is the code in question:

while Stomp::Parser::ServerCommands.subparse($buffer) -> $/ {
    $buffer .= substr($/.chars);
    if $<command> eq 'ERROR' {
        die ~$<body>;
    }
    else {
        emit Stomp::Message.new(
            command => ~$<command>,
            headers => $<header>
                .map({ ~.<header-name> => ~.<header-value> })
                .hash,
            body => ~$<body>
        );
    }
}

Clearly, part of this would end up getting duplicated in a Stomp::Server, so it’d be better pulled out, and stuck in an actions class. So, I’ll define an actions class nested inside my grammar, and put the logic there:

grammar Stomp::Parser {
    ...

    class Actions {
        method TOP($/) {
            make Stomp::Message.new(
                command => ~$<command>,
                headers => $<header>
                    .map({ ~.<header-name> => ~.<header-value> })
                    .hash,
                body => ~$<body>
            );
        }
    }
}

It’s nice to notice how this is basically a cut-paste refactor. Now for a test:

{
    my $parsed = Stomp::Parser.parse(qq:to/TEST/);
        SEND
        destination:/queue/stuff

        Much wow\0
        TEST
    ok $parsed, "Parsed message with header/body";

    my $msg = $parsed.made;
    isa-ok $msg, Stomp::Message, "Parser made a Stomp::Message";
    is $msg.command, "SEND", "Command is correct";
    is $msg.headers, { destination => "/queue/stuff" }, "Header is correct";
    is $msg.body, "Much wow", "Body is correct";
}

The test fails, because I forgot to set the actions class when calling parse. Hmm…I’d need to do that in Stomp::Client too…and in Stomp::Server. In fact, I can’t think of a case offhand where I’d care to avoid producing a Stomp::Message. That probably means it wants to be the default. That’s easily taken care of by overriding parse and subparse to set the actions by default:

method parse(|c) { nextwith(actions => Actions, |c); }
method subparse(|c) { nextwith(actions => Actions, |c); }

I use |c to swallow up all the incoming arguments, and then pass them along. Notice how I take care to put my default first, and then splice in anything the caller specifies. This means there’s still a way to provide alternate actions, or to pass Nil to get none at all. Test passes. Commit. Yay.
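
Since later named arguments win over earlier ones, callers keep full control. Hypothetical caller code (My::Other::Actions is made up for illustration):

# Swap in different actions:
Stomp::Parser.parse($frame, actions => My::Other::Actions);

# Or pass Nil to get no actions at all:
Stomp::Parser.parse($frame, actions => Nil);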

Finally, I can go back and tidy up the code in the buffer processing some:

method !process-messages($incoming) {
    supply {
        my $buffer = '';
        whenever $incoming -> $data {
            $buffer ~= $data;
            while Stomp::Parser::ServerCommands.subparse($buffer) -> $/ {
                given $/.made -> $message {
                    die $message.body if $message.command eq 'ERROR';
                    emit $message;
                }
                $buffer .= substr($/.chars);
            }
        }
    }
}

It no longer needs to dig into the parse tree to find the command and body for the error handling. Generally, the code in this method is much more focused on doing a single thing: turning a stream of incoming characters into a stream of messages, coping with messages that fall over packet boundaries. Win!

Simplifying the actions

Refactoring feels nicer when there are tests. So, is there anything in the code I now have nicely covered that I fancy cleaning up? Well, perhaps there is a little bit of simplification on offer in my small Actions class:

class Actions {
    method TOP($/) {
        make Stomp::Message.new(
            command => ~$<command>,
            headers => $<header>
                .map({ ~.<header-name> => ~.<header-value> })
                .hash,
            body => ~$<body>
        );
    }
}

For one, I don’t actually need to explicitly do the hash coercion there. The default semantics of construction perform assignment, not binding, and a list of pairs can happily be assigned to a hash. That map is digging into the parse tree too, and it’d probably be neater to handle the pair construction in a second action method. So, here goes:

class Actions {
    method TOP($/) {
        make Stomp::Message.new(
            command => ~$<command>,
            headers => $<header>.map(*.made),
            body    => ~$<body>
        );
    }
    method header($/) {
        make ~$<header-name> => ~$<header-value>;
    }
}

I think I like that better. It’s not really any shorter, but it breaks the work up into smaller chunks, which makes the code easier to digest. So, it’s in.

Pretty nice progress

That’ll do me for this time. By now, I’ve got the things I’d need to build my Stomp::Server module nicely factored out. Better still, they’re covered by some tests. Stomp::Client itself is now much more focused, and down to under a hundred lines of code.

Next, I’ll want to look into getting some testing in place for Stomp::Client. And that will mean taking a little diversion: there’s no test double in the ecosystem for IO::Socket::Async yet, so I’ll need to build one.

Not guts, but 6: part 1

After the Christmas release of Perl 6, I spent the better part of a week in bed, exhausted and somewhat sick. I’m on the mend, but I’m going to be taking it easy for the coming weeks. I suspect it’ll be around February before I’m feeling ready for my next big adventures in Perl 6 compiler/VM hacking. It’s not so much a matter of not having motivation to work on stuff; I’ve quite a lot that I want to do. But, having spent six months where I was never quite feeling well, just somewhere between not well and tolerably OK, I’m aware I need to give myself some real rest, and slowly ease myself back into things. I’ll also be keeping my travel schedule very light over the coming months. The Perl 6 Christmas preparations were intense and tiring, but certainly not the only thing to thank for my exhaustion. 3-4 years of always having a flight or long-distance train trip in the near future – and especially the rather intense previous 18 months – has certainly taken its toll. So, for the next while I’ll be enjoying some quality time at home in Prague, enjoying nice walks around this beautiful city and cooking plenty of tasty Indian dishes.

While I’m not ready to put compiler hat back on yet, I still fancied a little gentle programming to do in the next week or two. And, having put so much effort into Perl 6, it’s high time I got to have the fun of writing some comparatively normal code in it. :-) So, I decided to take the STOMP client I hacked up in the space of an hour for my Perl 6 advent post, and flesh it out into a full module. As I do so, I’m going to blog about it here, because I think in doing so I’ll be able to share some ideas and ways of doing things that will have wider applicability. It will probably also be a window into some of the design thinking behind various Perl 6 things.

Step 0: git repo

I took the code from the blog post, and dropped it into lib/Stomp/Client.pm6. Then it was git init, git add, git commit, and voila, I’m ready to get going. I also decided to use Atom to work on this, so I can enjoy the nice Perl 6 syntax highlighting plug-in.

Testing thinking

Since my demos for the blog post actually worked, it’s fairly clear that I at this point have “working code”. Unfortunately, it also has no tests whatsoever. That makes me uneasy. I’m not especially religious about automated testing; I just know there have been very few times where I wrote tests and regretted spending time doing so, but a good number of times when I “didn’t need to spend time doing that” and later made silly mistakes that I knew full well would have been found by a decent suite of tests.

More than that, I find that testable designs tend to also be extensible and loosely coupled designs. That partly falls out of my belief that tests should simply be another client of the code. Sometimes on #perl6, somebody asks how to test their private methods. Of course, since I designed large parts of the MOP, I can rattle off “use .^find_private_method(‘foo’) to get hold of it, then call it” without thinking. But the more thoughtful answer is that I don’t think you should be testing private methods. They’re private. They’re an implementation detail, like attributes. My expectation in Perl 6 is that I can perform a correct refactor involving private methods or attributes without having to be aware of anything textually outside the body of the class in question. This means that flexibility for the sake of testability will need to make it into the public interface of code – and that’s good, because it will make the code more open to non-testing extension too.

My current Stomp::Client is not really open to easy automated testing. There is one non-easy way that’d work, though: write a fake STOMP server to test it against. That’s probably not actually all that hard. After all, I already have a STOMP message parser. But wait…if my module already contains a good chunk of the work needed to offer server support, maybe I should build that too. And even if I don’t, I should think about how I can share my message parser so somebody else can. And that means that rather than being locked up in my Stomp::Client class it will need to become public API. And that in turn would mean a large, complex, part of the logic…just became easily testable!

I love these kinds of design explorations, and it’s surprising how often the relatively boring question of “how will I test this” sets me off in worthwhile directions. But wait…I shouldn’t just blindly go with the first idea I have for achieving testability, even if it is rather promising. I’ve learned (the hard way, usually) that it’s nearly always worth considering more than one way to do things. That’s often harder than it should be, because I find myself way too easily attached to ideas I’ve already had, and wanting to defend them way too early against other lines of thought. Apparently this is human nature, or something. Whatever it is, it’s not especially helpful for producing good software!

Having considered how I might test it as is, let me ponder the simplest change I could make that would make the code a lot easier to test. The reason I’d need a fake server is because the code tightly couples to IO::Socket::Async. It’s infatuated with it. It hard-codes its name, declaring that we shall have no socket implementation, but IO::Socket::Async!

my $conn = await IO::Socket::Async.connect($!host, $!port);

So, I’ll change that to:

my $conn = await self.socket-provider.connect($!host, $!port);

And then add this method:

method socket-provider() {
    IO::Socket::Async
}

And…it’s done! My tests will simply need to do something like:

my \TestClient = Stomp::Client but role {
    method socket-provider() {
        Fake::Client::Socket
    }
}

And, provided I have some stub/mock/fake implementation of the client bits of IO::Socket::Async, all will be good.

But wait, there’s more. It’s also often possible to connect to STOMP servers using TLS, for better security. Suppose I don’t support that in my module. Under the previous design, that would have been a blocker. Now, provided there’s some TLS module that provides the same interface as IO::Socket::Async, it’ll be trivial to use it together with my Stomp::Client. Once again, thinking about testability in terms of the public interface gives me an improvement that is entirely unrelated to testability.
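
A sketch of what that might look like, assuming some hypothetical module exposing an IO::Socket::Async-compatible interface:

my \TlsStompClient = Stomp::Client but role {
    method socket-provider() {
        IO::Socket::Async::TLS  # hypothetical TLS module
    }
}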

I liked this change sufficiently that I decided it was time to commit. Here it is.

Exposing Message

I’m a big fan of the aggregate pattern. Interesting objects often end up with interesting internal structure, which is best expressed in terms of further objects. Since classes, grammars, roles and the like can all be lexically scoped in Perl 6, keeping such things hidden away as implementation details is easy. It’s how I tend to start out. For example, my Message class, representing a parsed STOMP message, is lexical and nested inside of the Stomp::Client class:

class Stomp::Client {
    my class Message {
        has $.command;
        has %.headers;
        has $.body;
    }

    ...
}

The grammar for parsing messages is even lexically scoped inside of the one method that uses it! Lexical scoping is another of those things Perl 6 offers for keeping code refactorable. In fact, it’s an even stronger one than private attributes and methods offer. Those you can go and get at using the MOP if you really want. There’s no such trickery on offer with lexical scoping.

So, that’s how I started out. But, by now, I know that for both testing and later implementing a Stomp::Server module, I’d like to pull Message out. So, off to Stomp/Message.pm6 it goes. Since it was lexical before, it’s easy to fix up the references. In fact, the Perl 6 compiler will happily tell me about them at compile time, so I can be happy I didn’t miss any. (It turns out there is only one). Another commit.

Oh, behave!

At the point I expose a class to the world, I find it useful to step back and ask myself what its responsibilities are. Right now, the answer seems to be, “not much!” It’s really just a data object. But generally, objects aren’t just data. They’re really about behaviour. So, are there any behaviours that maybe belong on a Message object?

Looking through the code, I see this:

await $conn.print: qq:to/FRAME/;
    CONNECT
    accept-version:1.2
    login:$!login
    passcode:$!password

    \0
    FRAME 

And, later, this:

$!connection.print: qq:to/FRAME/;
    SEND
    destination:/queue/$topic
    content-type:text/plain

    $body\0
    FRAME

There’s another such case too, for subscribe. It’s quite easy for a string with a bit of interpolation to masquerade as being too boring to care about. But what I really have here is knowledge about how a STOMP message is formed, scattered throughout my code. As this module matures from 1-hour hack to a real implementation of the STOMP spec, this is going to have to respect a number of encoding rules – or risk being vulnerable to injection attacks. (All injection attacks really come from failing to treat languages as languages, and instead just treating them as strings that can be stuck together.) And logic that is therefore security sensitive absolutely does not want scattering throughout my code.
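
To make the risk concrete, here’s a contrived sketch of what naive interpolation allows:

# A hypothetical malicious topic name:
my $topic = "stuff\nreceipt:42";

# Interpolated directly, the frame silently gains a header the caller
# never wrote:
#
#   SEND
#   destination:/queue/stuff
#   receipt:42
#   content-type:text/plain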

So, I’ll move the logic to Stomp::Message. First, a failing test goes into t/message.t:

use Test;
use Stomp::Message;

plan 1;

my $msg = Stomp::Message.new(
    command => 'SEND',
    headers => ( destination => '/queue/stuff' ),
    body    => 'Much wow');
is $msg, qq:to/EXPECTED/, 'SEND message correctly formatted';
    SEND
    destination:/queue/stuff

    Much wow\0
    EXPECTED

I find it reassuring to see a test actually fail before I do the work to make it pass. It tells me I actually did something. Now for the implementation:

method Str() {
    qq:to/END/
        $!command
        %!headers.fmt('%s:%s')

        $!body\0
        END
}

The fmt method is one of those small, but valuable Perl 6 features. It’s really just a structure-aware sprintf. On hashes, it can be given a format string for each key and value, along with a separator. The default separator is \n, which is exactly what I need, so I don’t need to pass it. This neatly takes a loop out of my code, and means I can lay out my heredoc to look quite like the message I’m producing. Here’s the change.
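
A quick illustration of fmt on a hash (hash iteration order is not guaranteed, so the lines may come out in either order):

my %headers = destination => '/queue/stuff', content-type => 'text/plain';
say %headers.fmt('%s:%s');
# content-type:text/plain
# destination:/queue/stuff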

Construction tweaks

With a passing test under my belt, I’d like to ponder whether there are any more interesting tests I might like to write Right Now for Stomp::Message. I know I will need to make a pass through the spec for encoding rules, but that’s for later. Putting that aside, however, are there any other ways that I might end up with my Stomp::Message class producing malformed messages?

The obvious risk is that an instance may be constructed with no command. This can never be valid, so I’ll simply forbid it. A failing test is easy:

dies-ok
    { Stomp::Message.new( headers => (foo => 'bar'), body => 'Much wow' ) },
    'Stomp::Message must be constructed with a command';

So is this fix: just mark the attribute as required!

has $.command is required;

It is allowable to have an empty body in some messages. At present, the class kind of supports that without the body having to be passed explicitly, but there will be a warning (from interpolating the uninitialized attribute). The fix is 4 characters. It’s really rather borderline whether this is worth a test, for me. But I’ll write one anyway:

{
    my $msg = Stomp::Message.new(
        command => 'CONNECT',
        headers => ( accept-version => '1.2' ));
    is $msg, qq:to/EXPECTED/, 'CONNECT message with empty body correctly formatted';
        CONNECT
        accept-version:1.2

        \0
        EXPECTED
    CONTROL {
        when CX::Warn { flunk 'Should not warn over uninitialized body' }
    }
}

It fails. And then I do:

has $.body = '';

And it passes. The boilerplate there makes me think there’s some market for an easier way to express “it doesn’t warn” in a test, but I’ll leave that yak for somebody else.

Those went in as two commits, because they’re two separate changes. I like to keep my commits nice and atomic that way.

Eliminating the duplication

Finally, I go and replace the various places that produced formatted STOMP messages with use of the Stomp::Message class:

$!connection.print: Stomp::Message.new:
    command => 'SUBSCRIBE',
    headers => (
        destination => "/queue/$topic",
        id => $id
    );

3 changes, 1 commit, done.

Enough for this time!

Next time, I’ll be taking a look at factoring out the parser, and writing some tests for it. Beyond that, there’ll be faking the async socket API, supporting unsubscription from topics, building STOMP server support, and more.

Reflecting, celebrating, and looking forward

As I write, the Perl 6 Christmas release is taking place. It goes without saying that it’s been a long journey to this point. I’ve walked less than half of it, joining the effort 7-8 years ago. Back then, I was fresh from university, had enjoyed the courses on compiler construction, type systems, and formal semantics, and was looking for an open source project in one or more of these fields to contribute to.

For many years before I got involved with Perl 6, I’d been using Perl extensively for web application development. It helped me live more comfortably through my college and university years, and I had sufficient work coming in that I kept a few other part-timers in a job too. That would be reason enough for being fond of a language, but there was more. Perl is really the language where I “grew up” as a programmer. It’s the first language where I used regexes and closures, it helped develop my early understanding of OOP, and it introduced me to testing culture. It’s the first language where I went to a conference, and realized the value a language’s community can have. All of this has been foundational to what I’ve done in my career since then, which besides Perl 5/6 has seen me deliver code and training in a wide range of languages, including C, C#, Java, and Python.

Through using Perl I got to know about the Perl 6 project – and it seemed a good fit with my interest in languages, compilers, and types. I knew it was a long-running project with plenty of history, but looking at what it was trying to achieve, I was convinced it was worth trying to help out a little to make it happen. It’s surprising just how successful Perl 6 has been over the years at attracting really great people to work on it – knowing full well its long, and at times difficult, history.

From patcher to architect

As usual with anyone new to an open source project, my first contributions were small. Portability things, small fixes here and there, minor features, and the like. Over time, I found myself taking on increasingly large language features. By 2009 I was a regular contributor, and had a decent grasp of much of the Rakudo compiler code base. I caused Patrick Michaud, the lead developer of Rakudo at the time, a good number of “oh no!” moments, as I didn’t yet have a great picture of the overall design – but was putting in notable features anyway. He, and others, did a great job of steering me gently in the right kind of direction.

2010 probably goes down as the most difficult year of my involvement in the Perl 6 project. The first Rakudo Star, a “useful, usable, early adopter’s release of Perl 6”, was generally graded as “not good enough”. More problematic, as I took some steps back and reflected on how to rectify this, was that the issues went right to the very heart of the architecture of Rakudo itself. Rakudo in those days was a traditional compiler: you fed it code in Perl 6, it spat out code in an intermediate language, which was executed on the Parrot VM. Of course, this is what you learn a compiler does at university, and it’s all very clean and simple…and not at all what Perl 6 demands.

Compile time and runtime are not so cleanly distinct concepts in Perl. They never have been, thanks to things like BEGIN blocks. But in Perl 6, we really wanted to both handle BEGIN time – and have lots of meta-programming stuff going on at BEGIN time – and still be able to have separate compilation. This, it turns out, is a tricky problem. And, as I looked into how to solve it, it became clear that it was going to require a deep, drawn-out overhaul. Further, it became clear that this was an effort I was going to need to play architect on and largely lead. History has shown it to be the right call; the architecture put in place then has largely survived intact to today’s release, and it enabled a lot of the great things we simply take for granted today. But at the time, it was a lonely path to walk.

In the years since then, I led the way on getting Rakudo ported to the JVM – and have been happy to see that work taken forward by others. I was also a founder of MoarVM, the VM that the Perl 6 Christmas release primarily targets. Most recently, I took up the long-neglected task of getting Perl 6’s concurrency design into decent shape, doing the initial implementations of the features. It goes without saying that none of this would have been possible without the incredible bunch of people in the Perl 6 community who not only contributed their great technical skills to these efforts, but also a good deal of encouragement and friendship along the way.

For 7 years I did a really good job of just being “this guy who hacks on stuff” while managing to disclaim wearing any particular hat – until Larry went and called me out as architect at the Swiss Perl Workshop this last summer. It’s a role I’m proud to hold for the moment, and I look forward to continuing to contribute to the Perl 6 project for some years to come.

So, about the release…

By this point in writing the post, the Christmas release of Perl 6 has already taken place! Hurrah! But…what does it mean?

First, it’s important to understand that we’ve actually released two things (and there are a couple more to come in the next days).

The first of these is the specification of the Perl 6 language itself, which is expressed as a suite of over 120,000 tests. Versions of the Perl 6 language will be named after festivals or celebrations; our alpha was Advent, our beta was Birthday, and we’re now at Christmas. This is referred to as “Perl 6 Christmas”, or in short as Perl 6.c. The next major language release will most likely be called Diwali, though I’m not sure we’ve worked out how to spell it yet. :-)

The second is a release of the Rakudo Perl 6 compiler that complies with this specification. We don’t imagine we’ll manage to stop people blurring language specification and language implementation, and know full well that when most people say they want “Perl 6 Christmas” they actually want a compiler that implements that version of the language. All the same, it’s a valuable distinction, as it means we remain open to alternative implementations – something that may not be that important now, but may be in a decade or two.

In the coming days, we’ll also produce a Rakudo Star release – which consists of the compiler along with documentation and a selection of modules – and that will also have an MSI, to make life easier for Windows folks.

What happens next?

For me? Rest. A lot of rest. Really a lot of rest. It’s been an exhausting last few months in the run up to the release, chasing down lots of little semantic details that we really wanted to get straightened out ahead of the freezing of the Perl 6 Christmas language specification. It was worth it, and I’m really happy with the language we’ve ended up with. But now it’s time to take care of myself for a while.

Come 2016, the work will go on. However, the focus will shift. For compiler and runtime folks like me, the focus will be largely on performance and reliability engineering. Now we have a stable language definition, it makes much more sense to invest more heavily in optimizing things. That isn’t to say a great deal of optimization work hasn’t already taken place; many of us have worked hard on that. But there’s a lot more that can, and will, be done.

Even more important than that work, however, will be the work that takes place on the Perl 6 ecosystem in the year to come. Since we announced the Christmas target for a stable language/compiler release, a number of new faces showed up to help with writing modules and building supporting infrastructure. Now, their work won’t have to contend with us compiler hackers breaking the world under them every week – and that hopefully will encourage more to dive into the ecosystem work also. Maturity here will take time, but there’s plenty of expertise and wisdom on these matters in the Perl community.

Rakudo will stick to a monthly release cycle. We’ll be making a number of process changes to help us deliver those monthlies at a higher quality, especially with regard to not regressing on the 6.c language test suite, key modules, and ecosystem tooling. These changes will also introduce a stability level that lies between bleeding edge commits and monthlies. We also expect the language specification itself to have a small number of minor versions between now and Diwali, and we will treat these a lot like we have the Christmas release, with more attention going into those releases than normal monthlies will get. Those releases will tend to have seen a greater focus on semantic detail, so they will for now serve as our “stable track” for those who want something more occasional than the monthlies. We’ll see how that serves us and our userbase, and adjust as needed. It’s all about keeping the ceremony of contributing and releasing low, while keeping the quality of releases up.

Last but not least…

…I’d like to say thank you. Thank you to all those who have been my fellow contributors on the Perl 6 project, for being among the best people I’ve ever worked with on anything. Thank you to all those who came to my Perl conference talks and read my ramblings here over the years, and provided feedback and encouragement. Thank you to those who donated financially to the Perl 6 project, and so enabled Perl 6 to be part of my day job. And last, but absolutely not least, thank you to those of you who have written and run Perl 6 programs over the years, and shared that you were doing so – because perhaps the greatest reward of all for a compiler/VM hacker is seeing others use their work to build their own great creations.

Together, we’ve breathed life into a new Perl. I’m damn proud of what we’ve built together – and I can’t wait to see how people put it to work.

Merry Christmas!

Getting closer to Christmas

The list of things we “really want to do” before the Christmas release is gradually shrinking. Last time I wrote here, the xmas RT list was around 40 tickets. Now it’s under 20. Here’s an overview of the bits of that I’ve been responsible for.

Supply API cleanup

I did the original implementation of supplies a couple of years back. I wasn’t sure how they would be received by the wider community, so focused on just getting something working. (I also didn’t pick the name Supply; Larry is to thank for that, and most of the other naming). Supplies were, it turns out, very well received and liked, and with time we fleshed out the available operations on supplies, and 6 months or so back I introduced the supply, whenever, and react syntactic sugar for supplies.

What never happened, however, was a cleanup of the code and model at the very heart of supplies. We’ve had to “build one to throw away” with nearly everything in Perl 6, because the first implementation tended to show up some issues that really wanted taking care of. So it was with supplies. Thankfully, since everything was built on a fairly small core, this was not to be an epic task. And, where the built-ins did need to be re-worked, it could be done much more simply than before by using the new supply/whenever syntax.

While much of the cleanup was to the internals, there are some user-facing things. The most major one is a breaking change to code that was doing Supply.new to create a live supply. As I started cleaning up the code, and with experience from using related APIs in other languages, it became clear that making Supply be both the thing you tapped and the thing that was used to publish data was a design mistake. Not only would it make it harder to trust Supply-based code and enforce the Supply protocol (that is, emit* [done | quit]), but it also would make it hard to achieve good performance, by forcing extra sanity checks all over the place.

So, we split it up. You now use a Supplier in order to publish data, and obtain a Supply from it to expose to the world:

# Create a Supplier
my $supplier = Supplier.new;

# Get a Supply from it
my $supply = $supplier.Supply;
$supply.tap({ .say });

# Emit on it
$supplier.emit('oh');
$supplier.emit('hai');

This also means it’s easy to keep the ability to emit to yourself, and expose the ability to subscribe:

class FTP::Client {
    has $!log-supplier = Supplier.new;
    has $.log = $!log-supplier.Supply;
    ...
}

Since you can no longer call emit/done/quit on a Supply, you can be sure there won’t be values getting sneaked in unexpectedly.

The other change is that we now much more strongly enforce the supply protocol (that is, you’ll never see another emit after a done/quit unless you really go out of your way to do so), and that only one value will be pushed through a chain of supplies at a time (which prevents people from ending up with data races). Since we can ask supplies if they are already sane (following the protocol, and serial, handling one value at a time), we can avoid the cost of enforcing this at every step along the way, which makes things cheaper. This is just one of the ways performance has been improved. We’ve some way to go, but you can now push into the hundreds of thousands of messages per second through a Supply.

Along the way, I fixed exceptions accidentally getting lost when unhandled in supplies in some cases, a data loss bug in Proc::Async and IO::Socket::Async, and could also resolve the RT complaining that the supply protocol was not enforced.

Preparing ourselves for stronger backward compatibility

Once the Perl 6 Christmas release of the language is made, we’ll need to be a lot more careful about not breaking code that’s “out there”. This will be quite a change from the last months, where we’ve been tweaking lots of things that bothered us. To help us with this change, I wrote up a proposal on how we’ll manage not accidentally changing tests that are part of the Perl 6 Christmas language definition, allow code to be marked with the language version it expects, and how we’ll tweak our process to give us a better chance of shipping solid releases that do not introduce regressions. Further feedback is still welcome; as with all development process things, I expect this to continue to evolve over the years.

I/O API cleanups

A few tickets complained about inconsistencies in a few bits of the I/O APIs, such as the differing ways of getting supplies of chars/bytes for async processes, sockets, and files. This has received a cleanup now. The synchronous and asynchronous socket APIs also got a little further alignment, such that the synchronous sockets now also have connect and listen factory methods.

Bool is now a real enum

This is a years old issue that we’ve finally taken care of in time for the release: Bool is now a real enum. It was mostly tricky because Bool needs setting up really quite early on in the language bootstrap. Thankfully, nine++ spent the time to figure out how to do this. His patch nearly worked – but ran into an issue involving closure semantics with BEGIN and COMPOSE blocks. I fixed that, and was able to merge in his work.

Interaction of start and dynamic variables

A start block can now see the dynamic variables that were available where it was started.

my $*TARGET_DIR = 'output/';
await start { say $*TARGET_DIR } # now works

Correcting an array indexing surprise

Indexing with a range would always auto-truncate to the number of elements in an array:

my @a = 1, 2, 3;
say @a[^4]; # (1 2 3)

While on the surface this might be useful, it was rather good at confusing people who expected this to work:

my @a;
@a[^2] = 1, 2;
say @a;

Since it auto-truncated to nothing, no assignment took place. We’ve now changed it so only ranges whose iterators are considered lazy will auto-truncate.

my @a = 1, 2, 3;
say @a[^4]; # (1 2 3 (Any)) since not lazy
say @a[0..Inf]; # (1 2 3) since infinite is lazy
say @a[1..Inf]; # (2 3) since infinite is lazy
say @a[lazy ^4]; # (1 2 3) since marked lazy

Phaser fixes

I fixed a few weird bugs involving phasers.

  • RT #123732 noted that return inside of a NEXT phaser but outside of a routine would just cause iteration to go to the next value, rather than give an error (it now does, and a couple of similarly broken things also do)
  • RT #123731 complained that the use of last in a NEXT phaser did not correctly exit the loop; it now does
  • RT #121147 noted that FIRST only worked in for loops, but not other loops; now it does

Other smaller fixes

Here are a number of other less notable things I did.

  • Fix RT #74900 (candidate with zero parameters should defeat candidate with optional parameter in no-arg multi dispatch)
  • Tests covering RT #113892 and RT #115608 on call semantics (after getting confirmed that Rakudo already did the right thing)
  • Review RT #125689, solve the issue in a non-hacky way, and add a test to cover it
  • Fix RT #123757 (semantics of attribute initializer values passed to constructor and assignment was a tad off)
  • Hunt down a GC hang blocking module precomp branch merge; hopefully fix it
  • Review socket listen backlog patch; give feedback
  • Write up rejection of RT #125400 (behavior of unknown named parameters on methods)

What one Christmas elf has been up to

Here’s a look over the many things I’ve been working on in recent weeks to bring us closer to the Christmas Perl 6 release. For the most part, I’ve been working through the tickets we’ve attached to our “things to resolve before the Christmas release” meta-ticket – and especially trying to pick off the hard/scary ones sooner rather than later. From a starting point of well over 100 tickets, we’re now down to less than 40.

NFG improvements

If you’ve been following along, you’ll recall that I did a bunch of work on Normal Form Grapheme earlier on in the year. Normal Form Grapheme is an efficient way of providing strings at grapheme level. If a single grapheme (that is, a thing a human would consider a character) is represented by multiple codepoints, then we create a synthetic codepoint for it, so we can still get O(1) string indexing and cheaply and correctly answer questions like, “how many characters is this”.
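
A quick illustration: “x” followed by COMBINING ACUTE ACCENT has no precomposed form, so under NFG the pair gets a synthetic codepoint:

say "x\x[301]".chars;  # 1: a single grapheme
say "x\x[301]".codes;  # 2: still two codepoints underneath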

So, what was there to improve? Being the first to do something gives a low chance of getting everything right first time, and so it was here. My initial attempt at defining NFG was the simplest possible addition to NFC, which works in terms of a property known as the Canonical Combining Class. NFC, to handwave some details, takes cases where we have a character followed by a combining character, and if there is a precomposed codepoint representing the two, exchanges them for the single precomposed codepoint. So, I defined NFG as: first compute NFC, then if you still see combining characters after a base character, make a synthetic codepoint. This actually worked pretty well in many cases. And, given the NFC quick-check property can save a lot of analysis work, NFG could be computed relatively quickly.

Unfortunately, though, this approach was a bit too simplistic. Unicode does actually define an algorithm for grapheme clusters, and it’s a bit more complex than doing an “NFC++.” Fortunately, I’d got the bit of code that would need to change rather nicely isolated, in expectation that something like this might happen anyway. So, at least 95% of the NFG implementation work I’d done back in April didn’t have to change at all to support a new definition of “grapheme”. Better yet, the Unicode consortium provided a bunch of test data for their grapheme clustering algorithm, which I could process into tests for Perl 6 strings.

So far so good, but there was a bit of a snag: using the NFC quick check property was no longer going to be possible, and without it we’d be in for quite a slowdown when decoding bytes to strings – which of course we do every time we get input from the outside world! So, what did I do? Hack our Unicode database importer to compute an NFG quick check property, of course. “There, I fixed it.”

So, all good? Err…almost. Back in April, I’d also done some optimizations around assuming that anything in the ASCII range was not subject to NFG. Alas, \r\n has been defined as a single grapheme. And yes, that really does mean:

> say "OMG\r\n".chars
4

I suspect this will be one of those “you can’t win” situations. Ignore that bit of the Unicode spec, and people who understand Unicode will complain that Perl 6 implements it wrong. Follow it, and folks who don’t know the reasoning will think the above answer is nuts. :-) By the way, asking “how many codepoints” is easy:

> say "Hi!\r\n".codes
5

Making “\r\n” a single grapheme was rather valuable for a reason I hadn’t expected: now that something really common (at least, on Windows) exercised NFG, a couple of small implementation bugs were exposed, and could be fixed. It was also rather a pain, because I had to go and fix the places that wrongly thought they needn’t care for NFG (for example, the ASCII and Latin-1 encodings). The wider community then had to fix various pieces of code that used ord – a codepoint level operation – to see if there was a \r, then expected a \n after it, and then got confused. So, this was certainly a good one to nail before the Christmas release, after which we need to get serious about not breaking existing code.

As a small reward for slogging through all of this, it turned out that \r\n being a single grapheme made a regex engine issue involving ^^ and $$ magically disappear. So, that was another one off the Christmas list.

There were a few other places where we weren’t quite getting things right with NFG:

  • Case folding, including when a synthetic was composed out of something that case folded to multiple codepoints (I’m doubtful this code path will ever be hit for text in any real language, but I’m willing to be surprised)
  • Longest Token Matching in grammars/regexes (the NFA compiler/evaluator wasn’t aware of synthetics; now it is)

And with that, I think we can call NFG done for Christmas. Phew!

Shaped arrays

I’ve finally got shaped arrays fleshed out and wired into the language proper. So, this now works:

> my @a[3;3] = 1..3, 4..6, 7..9;
[[1 2 3] [4 5 6] [7 8 9]]
> say @a[2;1]
8
> @a[3;1] = 42
Index 3 for dimension 1 out of range (must be 0..2)

This isn’t implemented as an array of arrays, but rather as a single blob of memory with 9 slots. Of course, those slots actually point to Scalars, so it’s only so much more efficient. Native arrays can be shaped also, though. So, this:

my int8 @a[1024;1024];

Will allocate a single 1MB blob of memory and all of the 8-bit integers will be packed into it.

Even if you aren’t going native, though, shaped arrays do have another sometimes-useful benefit over nested arrays: they know their shape. This means that if you ask for the values, you get them:

> my @a[3;3] = 1..3, 4..6, 7..9; say @a.values;
(1 2 3 4 5 6 7 8 9)

Whereas if you just have an array of arrays and asked for the values, you’d just have got the nested arrays:

> my @a = [1..3], [4..6], [7..9]; say @a.values;
([1 2 3] [4 5 6] [7 8 9])

The native array design has been done such that we’ll be able to do really good code-gen at various levels – including down in the JIT compiler. However, none of that is actually done yet, nor will it be this side of Christmas, so the performance of shaped arrays – including the native arrays – isn’t too hot. In general, we’re focusing really hard on places we need to nail down semantics at the moment, because we’ll have to live with those for a long time. We’re free to improve performance every single monthly release, though – and will be in 2016.

Module installation and precompilation

I spent some time pondering and writing up a gist about what I thought management of installed modules and their precompilations should look like, along with describing a precompilation solution for development time (so running module test suites can benefit “for free” from precompilation). I was vaguely hoping not to have to wade into this area – it’s just not the kind of problem I consider myself good at and there seem to be endless opinions on the subject – but got handed my architect hat and asked to weigh in. I’m fairly admiring of the approach taken under the .git directory in Git repositories, and that no doubt influenced the solution I came up with (yes, there are SHA-1s aplenty).

After writing it, I left for a week’s honeymoon/vacation, and while I was away, something wonderful happened: nine++ started implementing what I’d suggested! By this point he’s nearly done, and it’s largely fallen out as I had imagined, with the usual set of course corrections that implementing a design usually brings. I’m optimistic we’ll be able to merge the branch in during the next week or so, and another important piece will have fallen into place in time for Christmas. Thanks should also go to lizmat++, who has done much to drive module installation related work forward and also provided valuable feedback in earlier drafts of my design.

Line endings

Windows thinks newlines are \r\n. Most of the rest of the world think they are \n. And, of course, you end up with files shared between the two, and it’s all a wonderful tangle. In regexes in Perl 6, \n has always been logical: it will happily match \r\n or the actual UNIX-y \n. That has not been the case for \n in strings, however. Thus, on Windows:

say "foo!";

Would until recently just spit out \n, not \r\n. There actually are relatively few places that this actually causes problems today: the command shell is happy enough, pretty much every editor is happy (of course, Notepad isn’t), and so forth. Some folks wanted us to fix this, others said screw it, so I asked Larry what we should do. :-) The solution we settled on is making \n in strings also be logical, meaning whatever the $?NL compile-time constant contains. And we pick the default value of that by platform. So on Windows the above say statement will spit out \r\n. (We are smart enough to recognize that a \r\n sequence in a string is a “single thing” and not go messing with the “\n” inside of it!) There are also pragmas to control this more finely if you don’t want the platform specific semantics:

use newline :lf;   # Always UNIX-y \x0A
use newline :crlf; # Always Windows-y \x0D\x0A
use newline :cr;   # \x0D for...likely nothing Perl 6 runs on :-)

Along with this, newline related configuration on file handles and sockets has been improved and extended. Previously, there was just nl, which was the newline for input and output. You can now set nl-in to either a string separator or an array of string separators, and they can be multiple characters. For output, nl-out is used. The default nl-in is ["\r\n", "\x0A"], and the default nl-out is "\n" (which is logically interpreted by platform).
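
For example, a sketch (the filename is invented):

my $fh = open 'data.log', :nl-in["\r\n", "\n"], :nl-out("\r\n");
for $fh.lines -> $line {
    # Each line is chomped using whichever separator matched
}
$fh.close;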

Last but not least, the VM-level I/O layer is now aware of chomping, meaning that rather than handing us back a string that we then go and chomp at Perl 6 level, it can immediately hand back a string with the line ending already chopped off. This was an efficiency win, and since it is done sensitive to the current set of separators, it also fixed a longstanding bug where we couldn’t support auto-chomping of custom input line separators.

Encodings

A couple of notable things happened with regards to encodings (the things that map between bytes and grapheme strings). On MoarVM, we’ve until recently assumed that every string coming to us from the OS was going to be decodable as UTF-8 (except on Windows, which is more into UCS-2). That often works out, but POSIX doesn’t promise you’ll get UTF-8, or even ASCII. It promises…a bunch of bytes. We can now cope with this properly – surprisingly enough, thanks to the power of NFG. We now have a special encoding, UTF-8 Clean-8-bit, which turns bytes that are invalid as UTF-8 into synthetics, from which we can recover the original bytes again at output. This means that any filename, environment variable, and so forth can be roundtripped through Perl 6 problem-free. You can concat “.bak” onto the end of such a string, and it’ll still work out just fine.
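
A sketch, assuming the encoding is exposed to user code under the name utf8-c8:

my $buf = Buf.new(0x66, 0x6F, 0x6F, 0xFF);  # "foo" plus a byte invalid in UTF-8
my $str = $buf.decode('utf8-c8');           # the 0xFF becomes a synthetic
say $str.chars;                             # 4
say $str.encode('utf8-c8');                 # the original four bytes come back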

Another Christmas RT complained that if you encoded a string to an encoding that couldn’t represent some characters in it, it silently replaced them with a question mark, and that an exception would be a better default. This was implemented by ilmari++, who also added support for specifying a replacement character. I just had to review the patches, and apply them. Easy!
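
In sketch form, assuming the :replacement named argument from those patches:

"café".encode('ascii');                     # now throws by default
"café".encode('ascii', :replacement('?'));  # opt in: encodes as caf?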

Here, here

I fixed all of the heredoc bugs in the Christmas RT list:

  • RT #120788 (adverbs after :heredoc/:to got “lost”)
  • RT #125543 (dedent bug when \n or \r\n showed up in heredocs)
  • RT #120895 (\t in heredoc got turned into spaces)

The final regex fixes

Similarly, I dealt with the final regex engine bugs before Christmas, including a rather hard to work out backtracking one:

  • RT #126438 (lack of error message when quantifying an anchor, just a hang)
  • RT #125285 (backtracking/capturing bug)
  • RT #88340 (backreference semantics when there are multiple captures)

Well, or so I thought. Then Larry didn’t quite like what I’d done in RT #88340, so I’ll have to go and revisit that a little. D’oh.

Other smaller Christmas RTs

  • Fix RT #125210 (postfix ++ and prefix ++ should complain about being non-associative)
  • Fix RT #123581 (.Capture on a lazy list hung, rather than complaining it’s not possible)
  • Add tests to codify that the behavior observed in RT #118031 (typed hash binding vs assignment) is correct
  • Fix RT #115384 (when/default should not decont), tests for existing behavior ruled correct in RT #77334
  • Rule on RT #119929 and add test covering ruling (semantics of optional named parameters in multi-dispatch)
  • Fix RT #122715 and corrected tests (Promise could sink a Seq on keep, trashing the result)
  • Fix RT #117039 (run doesn’t fail); update design docs with current reality (Proc will now throw an exception in sink context if the process is unsuccessful), and add tests
  • Fix RT #82790 (indecisive about $*FOO::BAR; now we just outright reject such a declaration/usage)
  • Check into RT #123154; already fixed on Moar, just not JVM, so removing from xmas list
  • Review RT #114026, which confused invocation and coercion type literals. Codify the response by changing/adding tests.
  • Get ruling on RT #71112 and update tests accordingly, then resolve it.
  • Work on final bits needed to resolve RT #74414 (multi dispatch handling of `is rw`), building on work done so far by psch++
  • Fix RT #74646 (multi submethods were callable on the subclass)
  • Implement nextcallee and test it; fix nextsame/nextwith on nowhere to defer; together these resolved RT #125783
  • Fix RT #113546 (MoarVM mishandles flattening named args and named args with respect to ordering)
  • Fix RT #118361 (gist of .WHAT and .WHO isn’t shortname/longname respectively); RT #124750 got fixed along the way
  • Tests codifying decision on RT #119193 (.?/.+/.* behavior with multis)
  • Review/merge pull requests to implement IO::Handle.t from pmurias++, resolving RT #123347 (IO::Handle.t)

Busy times!

And last but not least…

I’ll be keynoting at the London Perl Workshop this year. See you there!

Last week: Unicode case fixes and much more

This report covers a two week period (September 28th through Sunday 11th October). However, the first week of it was almost entirely swallowed with teaching a class and the travel to and from that – so I’d have had precisely one small fix to talk about in a report. The second week saw me spend the majority of my working time on Perl 6, so now there’s plenty to say.

A case of Unicode

I decided to take on a couple of Unicode-related issues that were flagged for resolution ahead of the 6.christmas release. The first one was pretty easy: implementing stripping of the UTF-8 BOM. While it makes no sense to have a byte order mark in a byte-level encoding, various Windows programs annoyingly insert it to indicate to themselves that they’re looking at UTF-8. Which led to various fun situations where Perl 6 users on Windows would open a UTF-8 file and get some junk at the start of the string. Now they won’t.

The second task was much more involved. Unicode defines four case-changing operations: uppercasing, titlecasing, lowercasing, and case folding. We implemented the first three – well, sort of. We actually implemented the simple case mappings for the first three. However, there are some Unicode characters that become multiple codepoints, and even multiple graphemes, on case change. The German sharp S is one (apparently controversial) example, ligatures are another, and the rest are from the Greek and Armenian alphabets. First, I implemented case folding, now available as the fc method on strings in Perl 6. Along the way I made it handle full case folds that expand, and added tests. Having refactored various codepaths to cope with such expansion, it was then not too hard to get uc/tc/lc updated also. The final bit of fun was dealing with the interaction of all of this with NFG synthetics (tests here). Anyway, now we can be happy we reach Christmas with the case folding stuff correctly implemented.
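
For example, the (apparently controversial) sharp S:

say "Straße".fc;                  # strasse: the full fold expands ß to ss
say "Straße".fc eq "STRASSE".fc;  # True, which is the point of case folding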

Fixing some phasing issues with phasers

RT #121530 complained that when a LEAVE block threw an exception, it should not prevent other LEAVE and POST blocks running. I fixed that, and added a mechanism to handle the case where multiple LEAVE blocks manage to throw exceptions.

Amusingly enough, after fixing a case where we didn’t run a LEAVE block when we should, RT #121531 complained about us running them when we shouldn’t: in the case where a PRE phaser’s precondition failed. I fixed this also.

The usual bit of regex engine work

When you call a subrule in a regex, you can pass arguments. Normally positional ones are used, but RT #113544 noted that we didn’t yet handle named arguments, nor flattening of positional or named arguments. I implemented all of the cases, and added tests.
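
A small sketch of the kind of thing that now works, using a named argument to a subrule:

grammar Nums {
    token TOP { <digits(:count(3))> }
    token digits(:$count) { \d ** {$count} }
}
say Nums.parse('123') ?? 'parsed' !! 'no match';  # parsed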

I reviewed and commented on a patch to implement the <?same> assertion from the design docs, which checks that the characters either side of it are the same. I noted a performance improvement was possible in the implementation, which was happily acted upon.

Finally, I started looking into an issue involving LTM, character classes, and the ignorecase flag. No fix yet; it’s going to be a bit involved (and I wanted to get our case handling straightened out before really attacking this one).

Copy-casta

We suddenly started getting some bizarre mis-compiles in CORE.setting, where references to classes near the end of it would point to completely the wrong things. It turned out to be a (MVMuint16) cast that should have been an (MVMuint32) down in MoarVM’s bytecode assembler – no doubt wrongly copied from the line above. It’s always a relief when utterly weird things end up not being GC bugs!

A little profiler fix

If you did --profile on a program that called exit(), the results were utterly busted. Now they’re correct.

Other little bits

Here’s a collection of other things I did that are worth a quick mention.

  • Reviewing new RT tickets and commits; of note, reviewing patch for making :D/:U types work in more places
  • Eliminating remaining method postcircumfix:<( )> uses in Rakudo/tests. Looking into coercion vs. call distinction.
  • Reviewing “make Bool an enum” branch
  • Looking further into call vs. coercion and coercion API as part of RT #114026; post a proposal for discussion
  • Fix CORE::.values to actually produce values (same for other pseudo-packages); fix startup perf regression along the way
  • Studying latest startup profile, identifying a recent small regression
  • Investigate RT #117417 and RT #119763; request design input to work out how we’ll resolve it
  • Fix RT #121426 (routines with declared return type should do Callable[ThatType])
  • Review RT #77334, some discussion about what the right semantics really are, file notes on ticket
  • Fix RT #123769 (binding to typed arrays failed to type check)
  • Write up a rejection of RT #125762
  • Resolve RT #118069 (remove section on protos auto-multi-ing from design docs as agreed, remove todo’d tests)
  • Fix RT #119763 and RT #117417 (bad errors due to now-gone colonpair trait syntax not being implemented)
  • Reading up on module installation writings: S11, S22, gist with input from many in preparation for contributing to design work in the area

And that’s it. Have a good week!


Those weeks: much progress!

This post is an attempt to summarize the things I worked on between the last weekly report and the end of last week (so, through to Friday 25th). Then I’ll get back to weekly reports. :-)

GLR

The Great List Refactor accounted for a large amount of the time I spent. Last time I wrote here, I was still working on my prototype. That included exploring the hyper/race design space for providing data-parallel operations. I gave an example of this in my recent concurrency talk. I got the prototype far enough along to be comfortable with the API and to have a basic example working and showing a wallclock time win when scaling over multiple cores – and that was as far as I went. Getting the GLR from prototype mode to something others could help with was far more pressing.
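
For a taste of that design space, here’s a sketch of the hyper/race API (hedged: I’m using the method names that eventually shipped in Rakudo; at this point it was still prototype territory):

my @squares = (1..1000).hyper.map(* ** 2);     # parallel, results stay in order
my $sum     = (1..1000).race.map(* + 1).sum;   # parallel, ordering given up for speed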

Next came the long, hard slog of getting the new List and Array types integrated into Rakudo proper. It was pleasant to start out by tossing a bunch of VM-specific things that were there “for performance” but would no longer be needed under the new design – reducing the amount of VM-specific code in Rakudo (we already don’t have much of it, but further reductions are always nice). Next, I cleared the previous list implementation out of CORE.setting and put the new stuff in place, including all of the new iterator API. And then it was “just” a case of getting the setting to compile again. You may wonder why even getting it to compile is challenging. The answer is that the setting runs parts of itself while it compiles (such is the nature of bootstrapping).
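
To give a feel for the new iterator API, here’s a hedged sketch of a hand-written iterator (the class is invented; pull-one and IterationEnd are the heart of the post-GLR design):

class CountDown does Iterator {
    has Int $.n;
    method pull-one() {
        $!n >= 0 ?? $!n-- !! IterationEnd
    }
}
say Seq.new(CountDown.new(n => 3));   # (3 2 1 0)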

Once I got things to a point where Rakudo could actually build and run its tests again, others felt confident to jump in. And jump in they did – especially nine++. That so many people were able to dive in and make improvements was a great relief – not only because it meant we could get the job done sooner, but because it told me that the new list and iterator APIs were easy for people to understand and work with. Seeing other people pick up my design, get stuff done with it, and make the right decisions without needing much guidance is a key indicator for me as an architect that I got things sufficiently right.

Being the person who initially created the GLR branch, it was nice to be the person who merged it too. Other folks seemed to fear that task a bit; thankfully, it was in the end straightforward. On the other hand, I’ve probably taught 50 Git courses over the last several years, so one’d hope I’d be comfortable with such things.

After the merge, of course, came dealing with various bits of fallout that were discovered. Some showed up holes in the test suite, which were nice to fill. I also did some of the work on getting the JVM backend back up and running; again, once I got it over a certain hump, others eagerly jumped in to take on the rest.

GLR performance work

After landing the GLR came getting some of the potential optimizations in place. I wanted a real-world codebase to profile, to see how things fared under the GLR, rather than just looking at synthetic benchmarks. I turned to Text::CSV, and made a whole bunch of improvements based on what I found. They came from many areas: speeding up iterating over lines read from a file, fixing performance issues with flattening, improving our code-gen in a number of places… There’s plenty more to be had when I decide it’s time for some more performance work; in the meantime, things are already faster and more memory efficient.

S07

I also did some work on S07, the Perl 6 design document for lists and iteration. I ended up starting over, now that I knew how the GLR had worked out. So far I’ve got most of the user-facing bits documented; in the coming weeks I’ll flesh out the sections on the iterator API and the parallel iteration design.

Syntactic support for supplies

At YAPC::Asia I had the pleasure of introducing the new supply/react/whenever syntax in my presentation. It’s something of a game-changer, making working with streams of asynchronous data a lot more accessible and pleasant. Once I’d had the idea of how it should all work, getting to an initial working implementation wasn’t actually much effort. Anyway, that’s the biggest item ticked off my S17 changes list.
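
For those who haven’t seen it yet, here’s a small sketch of the shape of the syntax (Supply.interval emits successive integers at the given interval in seconds):

my $ticks = supply {
    whenever Supply.interval(1) -> $n {
        emit "tick $n";
    }
}

react {
    whenever $ticks -> $msg {
        say $msg;
        done if $msg eq "tick 3";
    }
}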

Other concurrency improvements

A few other concurrency bits got fixed. RT #125977 complained that if you sat in a tight loop starting and joining threads that themselves did nothing, you could eat up rather a lot of memory. It wasn’t a memory leak – the memory was recovered – just a result of allocating GC semispaces for each of the threads, and not deallocating them until a GC run occurred. The easy fix was to make joining a thread trigger a GC run – a “free” fix for typical Perl 6 programs, which never touch threads directly, but just have a pool of them that are cleaned up by the OS at process end.
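
The reproduction boiled down to a tight loop along these lines (a sketch):

for ^10_000 {
    Thread.start({ ; }).join;   # joining now triggers a GC run, keeping memory in check
}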

The second issue I hunted down was a subtle data race involving closure semantics and invocation. The symptoms were a “frame got wrong outer” on a good day, but usually a segfault. Anyway, it’s gone now.

Last but not least, I finally tracked down an issue that had been occasionally reported over the last couple of months, but had proved hard to recreate reliably. Once I found it, I understood why: it would only show up in programs that used threads and also dynamically created types, then left them to be GC’d. (For the curious: our GC is parallel during its stop-the-world phase, but lets threads do the finalization step concurrently, so they can go their own way as soon as they finish finalizing. Unfortunately, the finalization of type tables could happen too soon, leaving another thread that was finalizing objects of that type with a dangling pointer. These things always sound like dumb bugs in hindsight…)

Fixed size/shaped arrays

Work on fixed size arrays and shaped arrays continued, helped along by the GLR. By now, you can say things like:

my @a := Array.new(:shape(3,3));
@a[1;1] = 42;

Next up will be turning that into just:

my @a[3;3];
@a[1;1] = 42;

Preparing for Christmas

With the 6.christmas release of the Perl 6 language getting closer, I decided to put on a project manager hat for a couple of hours and get a few things into, and out of, focus. First of all, I wrote up a list of things that will explicitly not be included in 6.christmas, and so deferred to a future Perl 6 language release.

And on the implementation side, I collected together the tickets that I really want to have addressed in the Rakudo we ship as the Christmas release. Most of them relate to small bits of semantic indecision that we should really get cleaned up, so we don’t end up having to maintain (so many…) semantics we didn’t quite want for years and years to come. Shipping a compiler crash and fixing it in a later release is far more forgivable than breaking people’s working code when they upgrade, so I’m worrying about loose semantic ends much more than “I can trigger a weird internal compiler error”.

The `is rw` cleanup

One of the issues on my Christmas list was getting the “is rw” semantics tightened up. We had not been properly treating it as a constraint, as the design docs wish, meaning you could pass in a value rather than an assignable container and not get an error until you tried to assign to it. Now the error comes at signature binding time, so this program gives an error:

sub foo($x is rw) { }
foo(42); # the 42 fails to bind to $x
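
Passing an assignable container still binds fine, of course (a tiny sketch with a made-up sub):

sub bump($x is rw) { $x++ }
my $count = 0;
bump($count);
say $count; # 1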

Error reporting improvements

I improved a couple of poor failure modes:

  • Fix RT #125812 (error reporting of with/without syntax issues didn’t match if/unless)
  • Finish fixing RT #125745 (add a hint about implementing ACCEPTS to the error explaining that ~~ is a special form)
  • Remove leftover debugging info in an error produced by MoarVM

Other bits

Finally, the usual collection of bits and pieces I did that don’t fit anywhere else.

  • Test and look over a MoarVM patch to fix VS 2015 build
  • Reject RT #125963 with an explanation
  • Write response to RT #126000 and reject it (operator lexicality semantics)
  • Start looking into RT #125985, note findings on ticket
  • Fix RT #126018 (crash in optimizer when analysing attribute with subset type as an argument)
  • Fix RT #126033 (hang when result of a match assigned to variable holding target)
  • Reviewing the gmr (“Great Map Refactor”) branch
  • Fix crash that sometimes happened when assembling programs containing labeled exception handlers
  • Review RT #125705, check it’s fixed, add a test to cover it. Same for RT #125161.
  • Cut September MoarVM release
  • Hunt JIT devirtualization bug on platforms using x64 POSIX ABI and fix it
  • Tests for RT #126089
  • Fix RT #126104 (the `is default` type check was inverted)
  • Investigate RT #126029, which someone fixed concurrently; added a test to cover it
  • Fix RT #125876 (redeclaring $_ inside of various constructs, such as a loop, caused a compiler crash)
  • Fix RT #126110 and RT #126115
  • Fixed a POST regression
  • Fix passing allomorphs to native parameters and add tests; also clear up accidental int/num argument coercion and add tests
  • Fix RT #124887 and RT #124888 (implement missing <.print> and <.graph> subrules in regexes)
  • Fix RT #118467 (multi-dispatch sorting bug when a required named parameter came after a positional slurpy)
  • Looking into RT #75638 (auto-export of multis), decided we’d rather not have that feature; updated design docs and closed ticket
  • Investigate weird return compilation bug with JIT/inline enabled; get it narrowed down to resolution of dynamics in JIT inlines
  • Fix RT #113884 (constant scalars interpolated in regexes should participate in LTM)
  • Investigate RT #76278, determine already fixed, ensure there is test coverage