pacc v0.3: The any matcher

15 The any matcher

There is one matcher that we have not yet met. It is the any matcher, written as ‘.’, a period. As you might expect, it matches a single (UTF-8) character (including the newline character), without regard to what that character actually is. If you know regular expressions, using ‘.’ to match any character will seem familiar, but beware! Pacc behaves quite differently from regular expression matchers.

15.1 Pacc expressions and regular expressions

For one thing, in regular expressions ‘.’ does not (usually) match the newline character. Pacc is more uniform: ‘.’ always matches any single character. But this is a minor detail. What I really want to demonstrate is how pacc’s matching rules are different from those of regular expressions.

Let’s look at an example. Consider the regular expression ‘/.*X/’. Under regular expression rules, ‘*’ is said to be greedy, which means that ‘.*’ matches as much as possible, while allowing the rest of the expression to match. Suppose the input string is ‘fooXbarXbaz’. In this case, the expression will match ‘fooXbarX’. In general, it will match to the last ‘X’ in the input, or fail if there is no ‘X’.
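The greedy behaviour described above is easy to check. The following is a small sketch that uses Python's `re` module as a stand-in for "a regular expression matcher"; the variable names are only for illustration.

```python
import re

text = "fooXbarXbaz"

# Greedy '.*' first consumes as much as it can, then gives characters
# back until the trailing 'X' is able to match.
greedy = re.match(r".*X", text)
matched = greedy.group(0) if greedy else None
print(matched)  # fooXbarX

# With no 'X' anywhere in the input, the whole expression fails.
no_x = re.match(r".*X", "foobar")
print(no_x)  # None
```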

The pacc expression ‘.* "X"’ looks superficially similar, but in this case ‘.*’ will match the entire string, and there is nothing left for the ‘"X"’ to match. So ‘.* "X"’ will always fail. (If ‘*’ is greedy in regular expressions, it’s positively gluttonous in pacc!)

The matching engine for regular expressions effectively shifts input characters back and forth between different parts of the regular expression (such as the ‘.*’ and the ‘X’ in our example), with the aim of always finding a match if it possibly can. This is sometimes called backtracking.

Pacc never backtracks in this way. Instead, each matcher in a pacc expression is considered in turn, and matches as much as possible; it never gives characters back so that a later matcher can succeed. If that causes an overall failure, pacc simply moves on to the next alternative. So pacc expressions must be carefully written so they cannot match more than they should.
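To make the "each matcher in turn, as much as possible" rule concrete, here is a minimal sketch of that matching discipline in Python. It is not how pacc is implemented; the helper names (`any_char`, `literal`, `star`, `seq`) are hypothetical, and each matcher is simply a function from a text and a position to a new position, or `None` on failure.

```python
from typing import Callable, Optional

# A matcher takes (text, position) and returns the new position,
# or None if it fails at that position.
Matcher = Callable[[str, int], Optional[int]]

def any_char() -> Matcher:
    """Stand-in for '.': match any single character."""
    def match(text, pos):
        return pos + 1 if pos < len(text) else None
    return match

def literal(s: str) -> Matcher:
    """Stand-in for a quoted string such as "X"."""
    def match(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return match

def star(m: Matcher) -> Matcher:
    """Repeat m for as long as it succeeds; nothing is ever given back."""
    def match(text, pos):
        while True:
            nxt = m(text, pos)
            if nxt is None or nxt == pos:  # stop on failure or empty match
                return pos
            pos = nxt
    return match

def seq(*ms: Matcher) -> Matcher:
    """Run each matcher in turn; fail as soon as one of them fails."""
    def match(text, pos):
        for m in ms:
            pos = m(text, pos)
            if pos is None:
                return None
        return pos
    return match

# Analogue of the expression  .* "X"
any_then_x = seq(star(any_char()), literal("X"))
result = any_then_x("fooXbarXbaz", 0)
print(result)  # None: '.*' ate all 11 characters, so "X" has nothing left
```

Because `star` only ever moves forward, the literal that follows it is tried exactly once, at the end of the input, and the sequence fails.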

In this case, we can use a negated character class matcher instead of the any matcher. The pacc expression ‘[^X]* "X"’ is closer in meaning to the regular expression we started with. Of course, this will only match ‘fooX’. (It’s actually a common mistake when using regular expressions to write ‘/.*X/’ intending that it will only match to the first ‘X’. When that is what is required, we must adopt the same technique as the pacc expression and write ‘/[^X]*X/’, or resort to Perl’s non-greedy ‘*?’ operator if it’s available: ‘/.*?X/’.)
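On the regular expression side, the two ways of stopping at the first ‘X’ can be compared directly. Again this is only a sketch using Python's `re` module, which happens to support the non-greedy ‘*?’ operator.

```python
import re

text = "fooXbarXbaz"

# Negated character class: '[^X]*' cannot run past an 'X'.
negated = re.match(r"[^X]*X", text).group(0)

# Non-greedy repetition: '.*?' takes as little as possible.
lazy = re.match(r".*?X", text).group(0)

print(negated)  # fooX
print(lazy)     # fooX
```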

To match up to the last ‘X’ in pacc, we simply need to apply a repetition operator to the previous expression: ‘([^X]* "X")+’. Apart from newline handling, this is exactly equivalent to the regular expression we started with: it matches to the last ‘X’ in the input, and fails if there is no ‘X’.
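The claimed equivalence can be spot-checked with a hand-written emulation of ‘([^X]* "X")+’, compared against the regular expression ‘.*X’ on a few newline-free inputs. This is a sketch under the assumption that each repetition either succeeds completely or consumes nothing; `match_to_last_x` is a hypothetical helper, not part of pacc.

```python
import re
from typing import Optional

def match_to_last_x(text: str) -> Optional[int]:
    """Emulate ([^X]* "X")+ and return the end offset, or None on failure."""
    pos = 0
    matched_once = False
    while True:
        # [^X]* : consume every character that is not 'X'
        end = pos
        while end < len(text) and text[end] != "X":
            end += 1
        # "X" : a literal X must follow, otherwise this repetition fails
        # and leaves pos where the last successful repetition ended
        if end < len(text) and text[end] == "X":
            pos = end + 1
            matched_once = True
        else:
            break
    return pos if matched_once else None

samples = ["fooXbarXbaz", "foobar", "X", "XX", "", "bazX"]
for s in samples:
    m = re.match(r".*X", s)
    regex_end = m.end() if m else None
    print(repr(s), match_to_last_x(s), regex_end)
```

For each sample the two columns agree: both stop just after the last ‘X’, or both fail.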


Last updated: 2016-08-03 21:39:50 UTC
