pacc logo

Darn

Just had one of those “oh darn” experiences. I expect it'll all work out ok in the end, but it does seem to be a major flaw that has gone previously unnoticed.

I've been playing around with left recursion, which is in itself a very hard problem. It's also an important one: at present, it is impossible to write a pacc parser that can subtract 3 numbers correctly. Is that too strong a word? I wondered if I could follow Bryan Ford, perhaps even manually at first. The transformation performed by pappy is a pragmatic approach, but it involves higher order functions and partial application, so is scarcely amenable to implementing in C. (As far as I can tell, Rats! performs an equivalent transformation using some nasty Java libraries.)

I'd rather follow Dale Schumacher, and I'm going to study his idea further. I think it might be doable by adding special notation that allows the pacc grammar to specify how the result of * (and +) operators will be assembled. The special trick will be to allow a binding from each iteration to be in scope for the next one, which is why it is more than can be done simply by manually expanding the repetition operator.

But before I got onto that, I thought that there might just be a way to do it, at least for the subtraction example, by converting to reverse Łukasiewicz notation. (As one of my college lecturers explained, we call it “reverse Polish” notation only because no-one is confident of how to pronounce his name. The first letter of course has a “w” sound.) However, whilst playing around with this idea, I started writing a very modest grammar that happened to have two function definitions in the preamble. Bang! Pacc cannot parse such a thing.

After some considerably head scratching, I realised that the problem is in this definition:

CodeNames
  ← n:Name ns:CodeNames { s_set_cons(s_text(ident, n), ns) }
  / StringLit ns:CodeNames → ns
  / CharLit ns:CodeNames → ns
  / C_Comment ns:CodeNames → ns
  / [^{}] ns:CodeNames → ns
  / lBrace ns:CodeNames rBrace → ns
  / ε { 0 }

This handles as many nested braces as required, { { { } } }, but cannot handle even a single pair in sequence, { { } { } }. The penultimate alternative needs to be something like this:

/ lBrace ns:CodeNames* rBrace → ns

But that's not going to work, because we have no control over how the result of a repetition operator is assembled. (Is this starting to sound familiar?) At present, you just get a ref_t(). Anyway, after peering at this for quite a while, the lack of symmetry finally jumped out. That line should read:

/ lBrace ns:CodeNames rBrace ps:CodeNames → { append(ns, ps) }

I still feel that there's a * operator trying to escape there, but it will do for now. At least the bug is fixed: preambles work as expected, and can contain (multiple!) function definitions.

So back to left recursion. With considerable effort, I have implemented a trivial grammar that can evaluate 3-2-1 and get 0. It's pretty grody though: the code has to maintain its own stack to ensure that the subtractions are evaluated in the right order. There are already so many stacks in pacc, I would have thought one of them could do the job.

Not quite sure where to go with this at the moment. I really don't want to frob with the code much more before the release if I can help it.

Last updated: 2015-05-24 19:45:24 UTC

Donate

Support the development of pacc with a donation! We accept donations in BitCoin or via PayPal who handle almost any other form of payment.

News

Porting and packaging

One thing pacc needs is more users. And, perhaps, one way to get more users is to reduce the friction in getting started with pacc. An obvious lubricant is packaging. Read More...

Release relief

Looking at _pacc_coords(), I noticed that it seemed to have the same realloc() bug that I'd just fixed in _pacc_result(). However, the "list of arrays" trick really wasn't going to work here. Read More...

See more news articles

feed