pacc logo

Finally... left recursion

With the help of Dale Schumacher, I have cracked left recursion. At least, I have a paper design that I believe I can implement. I think I will attempt to do this before the next release, as really without it pacc is very hard to use for a number of important cases. It should also make it possible to implement arbitrary counted repetitions, such as {2,4}. As a bonus, this should make certain error messages less opaque. Finally, I believe it to be an original piece of work.

At present, pacc implements ?, *, and + by rewriting them into rules. I had vague plans to someday implement arbitrary repetitions, presumably using the {n,m} syntax of Perl-compatible regular expressions. The good news is that, if I can do so, then ? is simply sugar for {0,1}, * for {,} and + for {1,}. The AST for pacc already reflects this.

Schumacher's important observation is that left recursion is not important for defining which inputs a grammar recognises. The following three rules all express exactly the same set of valid inputs:

left ← left "-" decimal / decimal
right ← decimal "-" right / decimal
tail ← decimal ("-" decimal)*

Well, strictly speaking the first doesn't express anything at all; it's not a valid PEG. The second does match the right set of inputs, but unfortunately if we add the obvious expressions it makes the - operator right associative.

That leaves the third possibility. At present, pacc can only return a ref_t for a repetition operator. The basic idea of this proposal is that pacc will evaluate the repetition operator from left to right. This is rather similar to the foldl function; we have a base value, and a way to fold each new value that we match onto it. So here's what it could look like for the subtraction example:

sub ← v:decimal { v } ("-" d:decimal { a - d })*

The first thing that is a bit strange here is the identifier a. This is the accumulator. We could use some special syntax for this ($$ anyone?) but I'm inclined to simply say that it is always called a. Now, the syntax here is a bit convoluted, but I'm not quite sure how to improve it. Because I think we need to be able to handle a mixed sum, like this:

additive ← v:decimal { v }
             ( "+" d:decimal { a + d }
             / "-" d:decimal { a - d } )*

In any case, since we can certainly include a (parenthesized) alt inside a rep, it would be nice to allow a case like this. One thing that isn't quite clear to me at the moment is whether we have to have another expr to give the overall value of the rule.

Would it work better to say that this special processing can only apply to an entire rule? And the final value of the accumulator yields the value of the rule. So the last example could look something like this:

additive ← v:decimal w:additivetail { v + w }
additivetail ← { 0 }
                 ( "+" d:decimal { a + d }
                 / "-" d:decimal { a - d } )*

That's really not an improvement over simply saying that a remains in scope till the end of the rule (and the previous example needs a final → a to give its overall value).

Let's look at a couple more examples. I struggled with trying to implement atoi() in pacc——again it demands left association. Here's how to do it:

decimal ← { 0 } ( d:digit { a * 10 + d } )+ → a

And for a minimal example, what about simply counting the number of repetitions?

unary ← { 0 } ( "1" { a + 1 } )+ → a

That's about it. These examples aren't fantastically pretty, but I think they can be made to work. If I can get the backend sorted, I can perhaps tweak the syntax; it strikes me that ; is currently unused.

Last updated: 2015-05-24 19:45:22 UTC


Support the development of pacc with a donation! We accept donations in BitCoin or via PayPal who handle almost any other form of payment.


Porting and packaging

One thing pacc needs is more users. And, perhaps, one way to get more users is to reduce the friction in getting started with pacc. An obvious lubricant is packaging. Read More...

Release relief

Looking at _pacc_coords(), I noticed that it seemed to have the same realloc() bug that I'd just fixed in _pacc_result(). However, the "list of arrays" trick really wasn't going to work here. Read More...

See more news articles