pacc logo

Documenting feeding

I've always been very proud of feeding. It solves a problem which Bryan Ford highlights in his original thesis, and as far as I know nobody else has solved this problem. (If anybody would like to award me a PhD for this, please get in touch at the usual address.) Life has taught me that ideas are cheap. To implement an idea, well, that's...

... scarcely anything! I'd definitely thought of feeding by November 2009, I think rather sooner than that. (I came to it very early, since I decided to write pacc because the Unix shell I was working on had got too complicated for yacc, and feeding is basic to shells: think PS1 and PS2 in the Bourne shell.) I seem to have implemented feeding only just before the first release, in about May 2012.

Slightly better than implementing an idea is to publish it. Feeding was in pacc-0.0 of course, and at that time there was a very brief description of it on the website. But it is only now that I have sat down and tried to explain it. As usual, writing documentation throws up ideas about how the implementation could perhaps be better.

I'm reminded of a paper A critique of Java by one of my favourite university lecturers, Harold Thimbleby. Sadly not the hilarious lame ducks, but the part where he discusses Java's import statement which is restricted to occur only near the top of a source file. This restriction keeps rearing its head in the documentation. “But as the restriction is arbitrary, why not design the language without the restriction, for the description of the restricted language including providing examples is unnecessarily tedious and error prone.”

Anyway, in my tutorial document I am expanding the venerable calculator example to allow feeding. And describing it is harder than expected. As the implementation stands, there is a distinguished feed rule whose name defaults to __ (two underscores). At first I thought this could simply replace the whitespace rule _ ← [ \t\n]*. I wrote an example on this basis, but when I tried to run it I hit an assert!

On investigation, it turned out that the assert was essentially right; what I was trying to do was daft. So the first step was to turn that into a sensible error message (along with a test case of course). How is it daft? Rewind to a trivial grammar, which sums two digits, possibly separated by whitespace, and initially without feeding:

Sum ← _ a:Digit b:Digit → { a + b }
Digit ← [0-9] _  → { *ref_ptr(ref()) - '0' }
_ :: void ← [ \t\n] *

If we want to use this for feeding, we will need to “cook” it, by invoking pacc with the -f option. Cooking a grammar produces a second grammar in which each occurence of the distinguished rule is followed by either whatever follows it normally, or the end of the input. Then if the input fails to be matched by the full grammar, but can be matched by the cooked grammar, it is clear that we can read and feed another line.

Now, if we simply replace _ with __ (or invoke pacc -r_ to choose a non-standard name for the distinguished rule), it can't possibly do what we need. In the above example, for one thing, the only occurence of _ is already not followed by anything, so cooking can't work out how to modify it (that was what threw the assert). But more significantly, since there is no _ between the two Digits, how can cooking allow them to appear in separate bites of the input?

So we need to lift __ above the lexical elements, and specify it in the highest level rules where it makes sense. Then the first two rules of trivial grammar will look like this:

Sum ← _ a:Digit __ b:Digit → { a + b }
Digit ← [0-9] _  → { *ref_ptr(ref()) - '0' }

(I may return later to the question of whether the initial _ should in fact be __ or even _ __!) That is fine as far as it goes, but what should _ and __ match? It seems obvious that __ should match "\n"*, so in my second attempt at describing feeding, I suggested this:

_ :: void ← [ \t]*
__ ← "\n"*

But this is wrong on at least 2 counts. It doesn't match this input:

2
 3

because there's nothing to match the space on the second line. Possibly we could fix that with __ ← ("\n" _)*. But the test harness that I have includes the terminating \n character at the end of the input, so it still doesn't work. What does work is this:

_ :: void ← [ \t\n]*
__ ← ε

Recall that the epsilon ε is an optional placeholder: the __ rule now matches nothing! All of which makes me think that I've taken a wrong turn. If the most useful thing for __ to match is nothing, then it makes no sense for it to be a rule at all. Should it not be an operator in the grammar? The $ character is still available, so it could look like this:

Sum ← _ a:Digit $ b:Digit → { a + b }
Digit ← [0-9] _  → { *ref_ptr(ref()) - '0' }
_ :: void ← [ \t]*

I need to play with some more examples to see if this is a genuine improvement, but it looks plausible at the moment.

Last updated: 2015-05-24 19:45:29 UTC

News

Porting and packaging

One thing pacc needs is more users. And, perhaps, one way to get more users is to reduce the friction in getting started with pacc. An obvious lubricant is packaging. Read More...

Release relief

Looking at _pacc_coords(), I noticed that it seemed to have the same realloc() bug that I'd just fixed in _pacc_result(). However, the "list of arrays" trick really wasn't going to work here. Read More...

See more news articles

feed