
Implementing #line directives


So what about implementing #line directives? First thing is that we need to get line numbers into the AST. And I'm afraid to say that I really suspect the easiest (and possibly best) way to do this is to make an expr node hold a value like 6:"yes" which will eventually turn into:

something_or_other = (
#line 6 "parse.pacc"
#line 943 "parse.c"

These are both fiddly - it's irritating that the C pre-processor doesn't have a #unline directive. The second #line will need to be on line 942 of file parse.c to tell it that the next line is 943 in parse.c!
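To see what such a pair of directives does to the compiler's idea of "where am I", here is a small standalone sketch. It is not pacc output; the function names reported_line() and reported_file() are made up for illustration. It relies only on the standard behaviour that #line N "file" makes the following line count as line N of that file.

```c
#include <string.h>

/* Hypothetical helper: returns the line number the compiler
 * believes it is on immediately after a #line directive. */
static int reported_line(void)
{
    int n;
#line 6 "parse.pacc"
    n = __LINE__;          /* counted as line 6 of "parse.pacc" */
    return n;
}

/* Hypothetical helper: returns the file name in force after a
 * second directive that switches back to the generated file. */
static const char *reported_file(void)
{
#line 943 "parse.c"
    return __FILE__;       /* counted as line 943 of "parse.c" */
}
```

Calling reported_line() yields 6 and reported_file() yields "parse.c", which is exactly the effect the two directives above are meant to have on diagnostics.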

To implement this, I'm going to expunge all the random printf()s in emit.c. This will make removing redundant types from the union much simpler, too.


So this is a major rewrite of emit.c, and I've got to error(), which is the major cause of bloat in the emitted code. For example, pacc2.c is currently 11937 lines. I suppose now would be as good a time as any to fix that up. Well, it drops to 9331 lines, which is a > 20% reduction, so that's actually not bad.

Reworking everything to go through c_str() and friends is complete. I can now add the #unline directives that go back to where we were, which was the whole point of that rewrite.

Argh, so that's full of hideousnesses. First, #line is supposed to be left-justified, but that doesn't match the way indents work at the moment. Second, we need to know how long the template is. Third, it means that the "diff" to check pacc2.c versus pacc3.c fails miserably. All solved.
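The bookkeeping behind the "go back to where we were" directive can be sketched roughly like this. It is a minimal illustration and not pacc's actual emit.c: struct emitter, emit_str(), emit_user_line() and emit_unline() are hypothetical names, and the sketch assumes directives are only ever written at the start of an output line.

```c
#include <stdio.h>

/* Hypothetical emitter state: everything goes through one place,
 * so the current output line is always known. */
struct emitter {
    FILE *out;          /* stream the generated C is written to */
    const char *name;   /* name of the generated file, e.g. "parse.c" */
    long line;          /* 1-based number of the line being written */
};

/* Write a string and count the newlines it contains. */
static void emit_str(struct emitter *e, const char *s)
{
    fputs(s, e->out);
    for (; *s; ++s)
        if (*s == '\n')
            ++e->line;
}

/* Point the compiler at the user's source file. */
static void emit_user_line(struct emitter *e, long src_line, const char *src_name)
{
    fprintf(e->out, "#line %ld \"%s\"\n", src_line, src_name);
    ++e->line;
}

/* Point the compiler back at the generated file. The directive itself
 * sits on e->line, so the line that follows it is e->line + 1. */
static void emit_unline(struct emitter *e)
{
    fprintf(e->out, "#line %ld \"%s\"\n", e->line + 1, e->name);
    ++e->line;
}
```

Because every write funnels through emit_str() or one of the directive helpers, the "+ 1" in emit_unline() is the only place the off-by-one has to be remembered.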


So, onto how to emit sensible #lines for the user-supplied text. At the moment, we have nodes in the parse tree like expr 102: 5. Either we change those to include the line number where that expr came from, so we might have expr 102: 7 5, or we add another node type.


When we report an error, we calculate the line number (and column number) in the input by counting \ns. Which is all right as far as it goes, but we don't want to keep doing that every time we encounter an expression.

OK, I don't know why I've been holding off on this (except perhaps that there's this vast body of code that I barely understand). We need _pacc_line_number(_pacc_parser *, off_t). Initially use the code we've got, counting \ns. Later we can memoize it.
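The newline-counting approach amounts to something like the sketch below. This is not the real _pacc_line_number(): it takes a plain buffer rather than a _pacc_parser, and the names line_number() and column_number() are assumptions made for illustration. It is the unmemoized version, so each call rescans from the start of the input.

```c
#include <assert.h>
#include <stddef.h>

/* 1-based line number of byte offset `off` within buf[0..len). */
static long line_number(const char *buf, size_t len, size_t off)
{
    long line = 1;
    size_t i;

    assert(off <= len);
    for (i = 0; i < off; ++i)
        if (buf[i] == '\n')
            ++line;
    return line;
}

/* 1-based column of byte offset `off`: distance back to the
 * previous newline (or to the start of the buffer). */
static long column_number(const char *buf, size_t len, size_t off)
{
    size_t i = off;

    assert(off <= len);
    while (i > 0 && buf[i - 1] != '\n')
        --i;
    return (long)(off - i) + 1;
}
```

A memoized version could remember the last offset and line it computed and resume counting from there, since expressions are mostly visited in input order.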

(I'm worrying about off_t again; wondering if we wouldn't be better off with intmax_t or intptr_t or int_fast64_t. The problem is, I need to write some code that will convert a line number into a string. Oh, but hang on, they're longs anyway. OK, no problem.)

Right, well, that all seems to work except there are discrepancies between pacc2.c and pacc3.c on node numbers. Right, because pacc1 doesn't build line numbers into the tree. So let's fix that. Done.


Guards! Guards! They don't have #line numbers yet.

Well that's something I wouldn't have anticipated. First #line implementation produces code that looks like this:

cur->value.u106= (
#line 7 "bad/line.pacc"
#line 332 "parse.c"

But in this case, gcc doesn't spot the syntax error till it sees the closing parenthesis, so we need to put that on the same line as the expression. Hmm... and now it says

bad/line.pacc:7:15: error: expected expression before ‘)’ token

which is a touch confusing, but I don't think there's much I can do about it. The column numbers are wrong, too, which will be fiddly to fix, sigh. Oh, here's a thought. Instead of tediously using snprintf (twice) to turn a line number into a n->text node, if the AST actually supported numeric nodes, we could carry the line and column numbers through as numbers. Then the line numbers are trivial to turn into a string (because we're writing to a stream, rather than memory), and the column number can be used as a number to indent appropriately.
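For comparison, the "snprintf twice" dance for turning a line number into freshly allocated text looks roughly like this. The function name line_to_text() is hypothetical, not pacc's; the technique is the usual one of calling snprintf once with a zero-sized buffer to measure, then again to format.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc()ed decimal rendering of
 * `line`, or NULL on failure. The caller frees the result. */
static char *line_to_text(long line)
{
    /* First call: nothing is written, only the length is returned. */
    int len = snprintf(NULL, 0, "%ld", line);
    char *s;

    if (len < 0)
        return NULL;
    s = malloc((size_t)len + 1);
    if (s != NULL)
        /* Second call: actually format into the buffer. */
        snprintf(s, (size_t)len + 1, "%ld", line);
    return s;
}
```

When the number is carried through the tree as a number and only formatted at output time, all of this collapses to a single fprintf(out, "%ld", line).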


Couple of things we can do here. We could have a node type coord that holds int[2]. We could have node types of line and column that both hold an int. We could have a single node type of type num that holds an int, and distinguish them by their position. On the whole, the first seems easiest. Especially since I've decided that it's OK to use an anonymous union (supported in C11 apparently as well as ganuck and others).
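The first option might look something like the following sketch. The type and function names (struct node, NODE_COORD, make_coord(), emit_coord()) are invented for illustration and are not pacc's real AST definitions. It assumes 1-based columns and a compiler that accepts anonymous unions (C11, or gcc's extension in earlier modes).

```c
#include <stdio.h>

enum node_type { NODE_TEXT, NODE_COORD };

/* Hypothetical AST node with an anonymous union: the members are
 * reached directly as n.text or n.pair, with no union name. */
struct node {
    enum node_type type;
    union {
        const char *text;   /* NODE_TEXT */
        int pair[2];        /* NODE_COORD: pair[0] = line, pair[1] = column */
    };
};

static struct node make_coord(int line, int col)
{
    struct node n;

    n.type = NODE_COORD;
    n.pair[0] = line;
    n.pair[1] = col;
    return n;
}

/* Write a #line directive for the coord, then pad with spaces so the
 * text that follows starts in its original column. Returns the number
 * of spaces written. */
static int emit_coord(FILE *out, const struct node *n, const char *file)
{
    int pad = n->pair[1] - 1;

    if (pad < 0)
        pad = 0;
    fprintf(out, "#line %d \"%s\"\n", n->pair[0], file);
    fprintf(out, "%*s", pad, "");
    return pad;
}
```

With the column available as an int, the indentation is just a field width, and the line number goes straight to the stream without any intermediate string.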

Almost there, but still some minor hitch with positioning, and one of those darned pacc2 vs pacc3 errors... which was easily fixed.

Last updated: 2015-05-24 19:45:26 UTC

