pacc logo

Rep-rules again

It's so long since I've done any serious work on the parser engine that I've more or less completely forgotten how it works. So I've begun a side-project of documenting the internals. I've covered a significant amount of ground, which has refreshed my memory. (And also given me a few things to tidy up at a later date.)

With my recovered knowledge in hand, I've written a “rep-rule” version of the unary parser that actually gives the right answers. It's not quite right though—the current version doesn't handle bindings, and can't be made to do so.

The first part seems reasonably clear: to match a rep-rule, we have a for loop that looks something like this:

for (n = -1; _status == parsed; ++n) { ...

That allows us to count how many matches we can make of the repeated thing (let's call it the repetand) at the current parsing point. And then we test n against the bounds of the particular repetition operator to see if we have an overall match at all. That's all fine and good.

The trickier part is how to evaluate it. At present, I simply evaluate the associated expression n times. In the case of the unary example, the expression is a + 1, so we have the moral equivalent of:

for (i = 0; i < n; ++i) a = a + 1;

But that's definitely not going to work in the presence of bindings. To make those work, consider an input of 3-2-1. The repetand will be "-" d:decimal, and we will need to record on evlis that we must call decimal when _x is 2 (returns 2) and 4 (returns 1). To evaluate that, first of all we'll have to notice that the repetand contains a single binding. So we'll need to step through evlis 1 item at a time, evaluating a single call to decimal and binding its value to d, then evaluating a = a - d. If the repetand were to make 2 bindings, then we would need to step through evlist 2 items at a time.

I'm inclined to say that the repetand may not contain alternatives (directly; it can do so by calling a rule of course). If we allow a repetand like "+" d { a + d } / "-" d { a - d } then we'll have to record somewhere which of the alternative expressions we need to evaluate. In this particular case, there is no problem to forbidding the alternative: we can simply define:

sumTail <- "+" d:decimal { d } / "-" d:decimal { -d }

and make the repetand s:sumTail { a + s }.

The other most likely case I can think of is that we would be building a list, and one of the alternatives would push something on to the list { push(a, d) } and the other wouldn't { a }. It seems to me that in most realistic cases we could handle this with a maybe_push function that pushes only if d != 0.

A second advantage to outlawing alternatives within the repetand is that it makes the user syntax much easier to conceive of. I think I'd very much like to base it on the for loop, so for example we might write:

sum ← v:decimal (v; w:sumTail; a + w)*
sumTail ← "+" d:decimal { d } / "-" d:decimal { -d }

It looks a little odd to have expressions without { braces }, but it's succinct, and eminently parseable. Just to complete the other two examples, we have a decimal parsed by pacc:

decimal ← (0; d:digit; 10 * a + d)+

And parsing a unary number:

unary ← (0; "1"; a + 1)+

I very much like the look of all this. Now, can I implement it?

Last updated: 2015-05-24 19:45:28 UTC


Porting and packaging

One thing pacc needs is more users. And, perhaps, one way to get more users is to reduce the friction in getting started with pacc. An obvious lubricant is packaging. Read More...

Release relief

Looking at _pacc_coords(), I noticed that it seemed to have the same realloc() bug that I'd just fixed in _pacc_result(). However, the "list of arrays" trick really wasn't going to work here. Read More...

See more news articles