pacc logo

Lists of arrays

In the previous post I talked about my ideas for avoiding a realloc-induced heisenbug. I've now implemented this, let me describe it.

Recall that the data structure we are implementing is nominally a table with as many columns as there are characters in the input, and as many rows as there are rules in the grammer. Each cell of the table is currently called a struct pacc_mid, although I can't now remember where that name comes from and will probably change it to pacc_cell. Because the table is sparsely populated, we implement it as a hash table, using the (column, row) pair as the key.

The central struct pacc_parser include these members:

struct pacc_bkt *m_bkt;
unsigned int m_bkt_cnt;

During initialization in pacc_input(), we pick a value for m_bkt_cnt (more on this later), then allocate an array of that many buckets.

A bucket is defined like this:

struct pacc_bkt {
    struct pacc_blk *blk;
    unsigned int valid;

It is a pointer to the first struct pacc_blk block, and a counter which specifies how many cells there are in the bucket. And what is a block? That's defined like this:

struct pacc_blk {
    struct pacc_blk *next;
    struct pacc_mid mid[_pacc_MIDS_PER_BLK];

It is simply a bog standard linked list of blocks. Each block contains an array of cell structures, the number being a compile time constant.

There's nothing too tricky in the code to manipulate the structures. Fortunately knowledge of the hash table structure was (almost) confined to a single place:

static struct pacc_mid *_pacc_result(struct pacc_parser *p,
        size_t col, long rule);

(In passing, I wonder why I used long for rules, when int seems like a more sensible choice? Or even short! Not that long and int are actually different sizes on any common architecture.)

The other place that needs to understand the structure is pacc_destroy. Correctly freeing the table is annoying, especially since each pacc_mid contains a pointer to a list which must itself be freed, but it's only a few lines of code.

Now, there are two interacting knobs that we can twiddle to get various tradeoffs. First there is MIDS_PER_BLK which is a compile time constant. If this is set to 1, then the bucket structure degenerates to a standard linked list. Naturally, I have confirmed that all tests pass in this scenario.

With MIDS_PER_BLK == 1 we never allocate a cell structure that we don't use. But they are scattered arbitrarily through memory, and my assumption is that following the list will prove expensive in terms of page faults and CPU cache misses. (It would be nice to test this assumption at some point.) With more cells in a block, we hope to reduce this cost, at the expense of sometimes allocating cells that we don't use.

How far should we twist the knob? These days a Linux VM page is 4k, so a plausible definition might be this:

#define _pacc_MIDS_PER_BLK (4096 / sizeof(struct pacc_mid) - 1)

The idea of the - 1 is to ensure that sizeof (struct pacc_blk) < 4096. Although we have no reason to think that malloc() will align the structure so that it lies within a single page, we know that we can examine an entire block with at most two page faults.

It so happens that sizeof (struct pacc_mid) == 64 (currently), so we get 63 cells per block. Is that a reasonable number?

Unfortunately, my corpus of pacc grammars and inputs is small. Still, let's look at pacc.pacc for a start. The longest hash bucket created when parsing it is 17 cells. That means we end up using only 12% of the cells we allocate!

This brings us to the other knob. In _pacc_set_bkt_cnt, we choose the largest of a preset selection of primes which is smaller than:

unsigned long max = n_rules * (p->input_length + 1) / 100;

For the case of pacc.pacc, the numbers are 91 rules, 4981 characters, a maximum of 4533, and a selected prime of 2999. The hash function does a decent job of spreading the 23087 cells among the 2999 buckets: a perfect distribution would be 7.7 cells per bucket.

So it would seem that a more reasonable setting of MIDS_PER_BLK might be:

#define _pacc_MIDS_PER_BLK (1024 / sizeof(struct pacc_mid) - 1)

For the case of pacc.pacc, this results in us using 51% of all the cells we allocate, and according to valgrind actually using less memory than the previous table structure. (On reflection, the previous scheme is certain to allocate a lot of unused cells too, but it's not worth worrying about that now.)

In addition, crude timing tests suggest a tiny improvement in performance, of around 5%. So I declare this a win. Without more data points, I don't think further knob twiddling is worthwhile.

Last updated: 2015-07-19 07:30:55 UTC


Porting and packaging

One thing pacc needs is more users. And, perhaps, one way to get more users is to reduce the friction in getting started with pacc. An obvious lubricant is packaging. Read More...

Release relief

Looking at _pacc_coords(), I noticed that it seemed to have the same realloc() bug that I'd just fixed in _pacc_result(). However, the "list of arrays" trick really wasn't going to work here. Read More...

See more news articles