
Fixing coords()

A major problem with the rōnin release was that the pacc_coords() function performed a linear search on the input each time it was called. Although I don't actually have a way to measure it, this must have destroyed the parser's claimed linearity.

That is now fixed. The pacc_coords() function memoizes each result it returns. When it needs to find a new coordinate pair, it starts counting from the closest previous one. Hmm... in the pathological case, each call could still be linear (thus making the entire parser worse than linear), if it is always called with decreasing col values. That's desperately unlikely though; pacc itself apparently never calls pacc_coords() with a value for col that is smaller than the largest we've already seen.
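The memoizing scheme described above can be sketched roughly as follows. This is only an illustration, not pacc's actual code: the names coord, coord_cache and coords_at are made up, col is assumed to be a byte offset into the input, and lines and columns are assumed to count from 1 in bytes (no UTF-8 handling).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types; pacc's real data structures may differ. */
struct coord { size_t off; int line; int col; };

struct coord_cache {
    const char *input;  /* the text being parsed */
    struct coord *v;    /* memoized results, sorted by off */
    size_t n, cap;
};

/* Return the (line, col) of byte offset off, memoizing the answer. */
struct coord coords_at(struct coord_cache *c, size_t off)
{
    /* Binary search: lo becomes the number of entries with off <= target. */
    size_t lo = 0, hi = c->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c->v[mid].off <= off) lo = mid + 1; else hi = mid;
    }

    /* Start from the closest previous result, or from the very beginning. */
    struct coord cur = { 0, 1, 1 };
    if (lo > 0) {
        cur = c->v[lo - 1];
        if (cur.off == off) return cur;  /* exact hit, nothing to count */
    }

    /* Count forward over only the bytes not yet accounted for. */
    while (cur.off < off) {
        if (c->input[cur.off] == '\n') { ++cur.line; cur.col = 1; }
        else ++cur.col;
        ++cur.off;
    }

    /* Grow the array if it is full. */
    if (c->n == c->cap) {
        size_t ncap = c->cap ? c->cap * 2 : 8;
        struct coord *nv = realloc(c->v, ncap * sizeof *nv);
        if (!nv) abort();
        c->v = nv;
        c->cap = ncap;
    }

    /* Keep the array sorted. The memmove() only moves anything when a
     * smaller offset arrives after a larger one. */
    memmove(c->v + lo + 1, c->v + lo, (c->n - lo) * sizeof *c->v);
    c->v[lo] = cur;
    ++c->n;
    return cur;
}
```

With offsets that only ever increase, every call appends at the end and the total counting work is bounded by the length of the input. A run of strictly decreasing offsets would restart the count from the beginning each time, which is the pathological case mentioned above.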

Potentially, this could just about be fixed; we could count backwards from the closest next position if it's closer, or the previous position is non-existent. But I seriously doubt that's worth it.

I managed to cook up a contrived test case that exercises the memmove() case, when a smaller col value shows up after a larger one.

On further reflection, the already dubious notion of counting lines backwards is almost entirely scuppered by the need to handle UTF-8 encoding. However, I believe that the value of col can only decrease if a seq has started matching, calling pacc_coords() for a semantic guard, and then failed, and a shorter seq has matched. (This is how the contrived test case works.) Bear in mind that value expressions aren't evaluated till a path through the grammar has already been plotted. So the number of times that col can decrease is bounded by the grammar itself. Maybe not, but anyway, I'm definitely going to stop worrying about it now!
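To make the UTF-8 point concrete, here is a minimal sketch of a forward scan that counts columns in code points rather than bytes. It assumes columns count from 1 and that both offsets fall on character boundaries; pos and advance_utf8 are hypothetical names, not anything from pacc.

```c
#include <assert.h>
#include <stddef.h>

struct pos { int line; int col; };

/* Given the coordinates p of byte offset from, return the coordinates
 * of byte offset to, scanning forward over s[from..to). */
struct pos advance_utf8(const char *s, size_t from, size_t to, struct pos p)
{
    assert(from <= to);
    for (size_t i = from; i < to; ++i) {
        unsigned char b = (unsigned char)s[i];
        if (b == '\n') {
            ++p.line;
            p.col = 1;
        } else if ((b & 0xC0) != 0x80) {
            /* ASCII or a UTF-8 lead byte starts a new character;
             * continuation bytes (10xxxxxx) do not add a column. */
            ++p.col;
        }
    }
    return p;
}
```

Going forward, a newline simply resets the column to 1. Going backward across a newline, the column of the end of the previous line is not known without scanning back to that line's start, so a backward count is considerably more awkward than this loop suggests.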

In other news, a gentle hacking session yesterday made a couple of modest improvements to the emitted code.

For a start, parsing a seq proceeds from a default assumption of parsed, so there is no need for the any matcher to set this explicitly; it just needs to set no_parse when it fails. (Other matchers may well have the same redundancy; I haven't checked.) This fact, incidentally, is why all action must take place inside a seq.
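The idea can be illustrated with a toy version in C. This is not the code pacc emits: the state struct and the functions match_any and seq_two_any are invented for the example, and any is simplified to consume a single byte.

```c
#include <assert.h>
#include <stddef.h>

enum status { no_parse, parsed };

/* Hypothetical parser state, much simpler than pacc's. */
struct state {
    const char *in;
    size_t len;
    size_t col;          /* current offset into the input */
    enum status status;
};

/* The "any" matcher: consume one byte if there is one. On success it
 * leaves status alone; it only records a failure. */
static void match_any(struct state *s)
{
    if (s->col < s->len) ++s->col;
    else s->status = no_parse;
}

/* A seq of two "any" matchers. The seq sets the default assumption of
 * parsed once, up front; the matchers can only knock it down. */
enum status seq_two_any(struct state *s)
{
    size_t start = s->col;
    s->status = parsed;
    match_any(s);
    if (s->status == parsed) match_any(s);
    if (s->status == no_parse) s->col = start;  /* rewind on failure */
    return s->status;
}
```

Because only the seq establishes the parsed default, a matcher run outside a seq would have no sensible starting status in this toy model.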

I also noticed that on returning from a call, we take a copy of cur, but this is never used. Eliminating it reduces the size of pacc2.c from 9125 to 8982 lines; a modest but useful improvement.

Last updated: 2015-05-24 19:45:25 UTC


Porting and packaging

One thing pacc needs is more users. And, perhaps, one way to get more users is to reduce the friction in getting started with pacc. An obvious lubricant is packaging.

Release relief

Looking at _pacc_coords(), I noticed that it seemed to have the same realloc() bug that I'd just fixed in _pacc_result(). However, the "list of arrays" trick really wasn't going to work here.
