
UTF-8 character classes

So I missed the “end of May” release date (I've been too busy enjoying myself!). I think that I should make a release as soon as possible. Remaining "blockers" are UTF-8 support in character classes, and documentation. I'm afraid that documentation will be minimal.

As for UTF-8 character classes, there should be an almost reasonable workaround for these, as follows. Where you'd like to write:

Currency ← [$£€]

instead write:

Currency ← "$":. / "£":. / "€":.

Let me just check if that actually works. Yes (although it gives some more examples of error messages that use desugared syntax). So that is test/pacc/utf8-2.pacc, and the character class version (which currently adds another two FAILs to the test suite) is test/pacc/utf8-3.pacc.
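A plausible reading of why the workaround holds up: a string literal is matched as a run of bytes, so it does not matter that "£" is two bytes in UTF-8 and "€" is three, whereas a byte-at-a-time character class would see each of those as several separate "characters". Here is a minimal C sketch of that idea. The helper names utf8_seq_len() and match_literal_choice() are made up for illustration and are not part of pacc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of pacc: the number of bytes in the
 * UTF-8 sequence introduced by lead byte `b`, or 0 if `b` cannot
 * start a sequence. */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;                 /* ASCII */
    if (b >= 0xC2 && b <= 0xDF) return 2;   /* e.g. "£" = C2 A3 */
    if (b >= 0xE0 && b <= 0xEF) return 3;   /* e.g. "€" = E2 82 AC */
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;                               /* continuation or invalid lead */
}

/* Ordered choice over literal alternatives, in the spirit of
 * "$" / "£" / "€". Returns the number of bytes consumed by the first
 * alternative that is a prefix of `input`, or 0 if none matches. */
size_t match_literal_choice(const char *input,
                            const char *const alts[], size_t n)
{
    assert(input != NULL && alts != NULL);
    for (size_t i = 0; i < n; ++i) {
        size_t len = strlen(alts[i]);
        if (len > 0 && strncmp(input, alts[i], len) == 0)
            return len;
    }
    return 0;
}
```

The alternatives are compared whole, so a multi-byte character is never split in the middle. A real UTF-8 character class would presumably decode one code point first and then test membership.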

So I think I'm going to make a release with no more code changes. This won't be the ninja release after all, it will be rōnin. Except that in writing up the command line interface, I realise that I've got in a muddle with -f and -r. Oh, and I really really am going to anchor grammars on the right.

Sigh. Writing documentation is such a sobering experience. I was just describing a pacc grammar when I realised that we currently admit a grammar with no rules at all. Needless to say, if you actually specify such a thing, pacc dumps core. Even more hilariously, if pacc's input grammar is empty, we get a failure from mmap()! Well, they shouldn't be too tricky to fix... and the mmap() failure is done. The empty grammar failure was completely trivial to fix: change this, which matches zero or more Defns:

Defns
    ← d:Defn ds:Defns { cons(d, ds) }
    / ε { 0 }

for this, which matches one or more:

Defns
    ← d:Defn ds:Defns { cons(d, ds) }
    / d:Defn → d

It would, of course, be even nicer to do this with + and *, and maybe this shows the way to do it: when pacc needs to build a list, it will simply call cons(), and it is up to the user to have a suitable definition in scope. (We could support an option to call a different name instead.) That might even work...
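To make the cons() idea concrete, here is a rough hand-written C analogue of the one-or-more rule. It is only a sketch: struct list, list_free() and parse_items() are invented for the example, an "Item" is simply a decimal digit, and nothing here is what pacc actually generates. The only thing the parser needs from the user is a cons() with a suitable signature.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node; pacc does not prescribe this type. */
struct list {
    int head;             /* stand-in for whatever an Item evaluates to */
    struct list *tail;
};

/* The user-supplied cons() that a generated parser would call. */
struct list *cons(int head, struct list *tail)
{
    struct list *l = malloc(sizeof *l);
    assert(l != NULL);
    l->head = head;
    l->tail = tail;
    return l;
}

/* Release every node of a list built with cons(). */
void list_free(struct list *l)
{
    while (l != NULL) {
        struct list *next = l->tail;
        free(l);
        l = next;
    }
}

/* Analogue of  Items <- i:Item is:Items / i:Item  where an Item is one
 * decimal digit. Returns NULL if there is no Item at *s, so empty input
 * fails; otherwise returns the list and advances *s past the digits. */
struct list *parse_items(const char **s)
{
    if (**s < '0' || **s > '9')
        return NULL;                      /* no Item here: fail */
    int item = *(*s)++ - '0';
    struct list *rest = parse_items(s);   /* NULL just ends the list */
    return cons(item, rest);
}
```

One difference from the grammar above: the sketch always wraps the last item with cons(item, NULL), whereas the second alternative in the rule returns d directly.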

Hmm... and another hitch. The support for UTF-8 in any matchers has changed the error output from the bound literal tests. In one sense this really doesn't matter, but I don't understand it. (And actually the previous errors were rather better.) Ah! I think I get it: the new any matcher doesn't reset col if it fails to match. Of course, it can only fail to match at the end of the input, but still it should rewind col to its starting point. OK, let's fix that. Done.
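The shape of that fix can be sketched in C as follows. This is an illustration under assumptions, not pacc's generated code: struct state and match_any() are hypothetical, and col is taken to be a byte offset into the input. The point is simply that a multi-byte matcher may have advanced col before it discovers that it cannot finish, so the failure path has to put col back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser state; pacc's real generated code differs. */
struct state {
    const unsigned char *input;
    size_t len;   /* total bytes of input */
    size_t col;   /* current byte offset (an assumption of this sketch) */
};

/* Match any one UTF-8 encoded character. Returns 1 and advances col
 * past it on success; returns 0 and leaves col at its starting point
 * on failure. */
int match_any(struct state *st)
{
    assert(st != NULL && st->col <= st->len);
    size_t start = st->col;

    if (st->col < st->len) {
        unsigned char b = st->input[st->col++];
        int need = -1;                       /* continuation bytes wanted */
        if (b < 0x80) need = 0;
        else if (b >= 0xC2 && b <= 0xDF) need = 1;
        else if (b >= 0xE0 && b <= 0xEF) need = 2;
        else if (b >= 0xF0 && b <= 0xF4) need = 3;

        while (need > 0 && st->col < st->len
               && (st->input[st->col] & 0xC0) == 0x80) {
            st->col++;
            need--;
        }
        if (need == 0)
            return 1;                        /* whole character consumed */
    }

    st->col = start;                         /* rewind on any failure */
    return 0;
}
```

Unlike the matcher described above, this sketch also rejects malformed or truncated sequences, which is where the rewind visibly matters: the lead byte has already been consumed by the time the failure is detected.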

Last updated: 2015-05-24 19:45:22 UTC
