pacc logo

Tying the loop tighter

If it seems to have gone quiet around here recently, that's only partly because it's summer, and the lure of the outdoors is stronger than staying in with my head in a chunk of code. I have been gnawing away at a major subproject: converting a real language based on yacc over to pacc. This has been quite an eye-opener!

I've been painfully aware of the lack of real clients for pacc for a long time. Apart from pacc.pacc itself, I have just a bunch of test cases. I've thought about various things, but never got round to it till now. And converting an existing project has shown up a number of deficiencies, or “bugs” as you might call them. One surprising bug was that result expressions could not contain newline characters! In 172 test files, it had never once occurred to me to type something like this:

S ←
  "yes" {
  } /
  "no" {

That was an easy (and urgent) fix. Others will have to wait for the next release. For example, for certain classes of error, pacc goes to great lengths to ensure that they are reported with their location, and this works very well with edit-compile-fix tools, such as :make in vim. But other error reports, those that pacc makes while looking at the parse tree, are not so helpful. For example, “rule not found” error is reported with the name of the calling rule, but not its location. That will have to wait for the next release.

If you omit the closing brace from a result expression, you get an error at the end of the file. We will need to work out some way that pacc can report the location of the opening brace. Again, that's for the next release, at the earliest!

A more urgent, and highly surprising omission was that braceless expressions don't record where they are. For a simple example:

S ← "yes" → q

Obviously the C compiler is going to complain that q is not in scope, but since we haven't generated a coords node for that braceless expression, no #line directive is emitted, and the error ends up being at line 456 of parse.c, instead of line 1 of parse.pacc.

(Ac-tu-al-ly.... pacc could detect that error itself, since it knows what identifiers are in scope. But even if we some day implement that, it's still Just Right for the coords node to be created here.)

The fix was a relatively minor change to the grammar, although it took me a while to get it right, due to an unfortunate wrong turn. I tried to use lBrace ← "{" _ to match the start of a braced expression. But since _ matches pacc comments, which include # to end of line, using lBrace swallows any #include lines. It took me quite a while to spot that.

All substantial changes to pacc.pacc must be reflected back into pacc0.c: the code that hand crafts the first pacc AST. (That first AST we walk with the emitter to build a parser that can parse pacc.pacc to build the second AST.) Of necessity, pacc0.c was originally hand written, and I have manually maintained it ever since.

Every time I've begun the tedious process of tweaking pacc0.c, I've wondered if it could be automated. I've always held back though. Partly because each set of changes was quite small (and “the last”—it's surprised me how often pacc.pacc has changed) so it barely seemed worthwhile. Partly out of a nostalgia for that early piece of scaffolding (I am intensely proud of pacc, and don't want to throw away its baby clothes). And partly because the way I was envisaging automating it was to run an (awk? perl? Haskell?) program over the output of a tree dump made with pacc -D0 pacc.pacc, and I didn't relish writing that.

This time, because of that summer thing we seem to be having at the moment, there was quite a gap between changing pacc.pacc and updating pacc0.c so I had a good chance to think over whether and how to do it (see also Stop & Think). I was just thinking about how to use the indentation in the tree dump to build the nesting structure when it finally dawned on me that it would be absolutely insane to re-parse a dump of a parse tree! Absolutely the right way to program this is in C: it's just an alternate backend to the compiler!

Now that seemed like a project I could get enthusiastic about! The new backend is almost trivial, in fact, here's the tree walker in its entirety:

void walk(struct s_node *p, int d) {
    int kid = p->first != 0;
    int sib = p->next != 0;
    char *t = decode_type(p->type);

    if (sib) walk(p->next, d);
    if (kid) walk(p->first, d + 1);
    printf("n = ");
    if (sib) printf("cons(");
    if (kid) printf("s_kid(%s, p%d)", t, d + 1);
    else printf("s_new(%s)", t);
    if (sib) printf(", p%d)", d);
    printf("; ");
    if (s_is_number(p->type))
        printf("n->number = %d; ", p->number);
    if (s_is_pair(p->type))
        printf("n->pair[0] = %d; n->pair[1] = %d; ", p->pair[0], p->pair[1]);
    if (s_is_text(p->type) && p->text) {
        char *s;
        printf("n->text = \"");
        for (s = p->text; *s; ++s) {
            switch (*s) {
                case '\n': printf("\\n"); break;
                case '"': printf("\\\""); break;
                case '\\': printf("\\\\"); break;
                default: printf("%c", *s);
        printf("\"; ");
    printf("p%d = n;\n", d);

So what does all this mean? Pacc will still ship with a pacc0.c which begins the build process. But in the future, when pacc.pacc changes, those changes can quickly be reflected back with:

make boot
./boot > pacc0.c

Pacc is slowly growing up. And if it's possible, I'm now even prouder of it than I was before.

Last updated: 2015-05-24 19:45:22 UTC


Porting and packaging

One thing pacc needs is more users. And, perhaps, one way to get more users is to reduce the friction in getting started with pacc. An obvious lubricant is packaging. Read More...

Release relief

Looking at _pacc_coords(), I noticed that it seemed to have the same realloc() bug that I'd just fixed in _pacc_result(). However, the "list of arrays" trick really wasn't going to work here. Read More...

See more news articles