PLF C++ Library - plf::colony - skipfield comparison results

Last updated 23-11-2016 v3.86

Below we benchmark a colony container using a boolean field to indicate erasure, versus a colony container using a jump-counting skipfield, and as a reference point, a std::vector using a boolean field to indicate erasure.

We will ignore insertion benchmarks as the performance differences are not substantial between the boolean colony and the jump-counting colony, and focus on iteration and erasure results. Here are the iteration results prior to any erasures:

Click images or hover over to see results at linear scale instead

std::vector with a boolean erasure field comes in first, jump-counting colony second and the boolean colony close behind. At this stage the difference in speed between the last two comes down to the branching code in the boolean iteration compared to the simple addition necessary for the jump-counting iteration. Now we measure erasure performance, when iterating over the containers and erasing 25% of all elements at random:

Both boolean field implementations perform better, somewhat unsurprisingly, as there is only bit-flipping to be done for them. Now let's look at iteration after having erased that 25% of elements at random:

test result graph

Immediately we see a performance advantage for the jump-counting colony over both boolean approaches. The jump-counting colony's performance increases due to the fewer number of elements to read, but the boolean colony's performance almost halves. There are two reasons for this; firstly, the boolean colony must check the value of each skipfield node to determine the erased status of the corresponding element, and hence has no ability in the case of a run of erased elements to simply skip from one non-erased element to the next non-erased element. The jump-counting colony has this ability, and so iteration speed is greatly increased in this way.

Secondly, in the first iteration test there were no erased elements and hence while the boolean colony had branching code (to enable skipping of an individual element when it's corresponding skipfield node indicated it was erased), the CPU's branch prediction could lower the performance cost of the branching code minimizing cache misses, as there was only one path for the branch to take at this point, due to lack of erasures. But with 25% of all elements erased, the iterator will detect a skipped element 25% of the time, the default branching that the CPU will predict is therefore wrong 25% of the time, which dramatically increases the cost of the branching code. We will now show this effect further by erasing 50% of all elements in the original containers at random:

Here the boolean colony's iteration performance is worse than the 25% erasure performance, because for 50% randomized erasures there is no chance for the CPU's branch prediction to be correct any more than 50% of the time. The jump-counting colony's iteration performance continues to scale proportionally to the number of erasures, as does the std::vector's. Now we erase 75% of all elements in the original containers:

The jump-counting colony continues to scale proportionally to the number of erasures. The boolean approaches performs slightly better than in the 50% erasure iteration test, the likely reason being that at this point the CPU branch prediction can work at least as adequately as it did with the 25% erasure iteration, but unlike that benchmark, in the case iterating when 75% of all elements have been erased, only 25% of the original elements are having their values read, as opposed to the 75% of all original elements having their values read in the 25% erasure iteration test. It's also worth noting that at this point erasure performance for the boolean colony almost matches the colony with the jump-counting skipfield

Overall we can see that not only the lack of branch code but also the lack of reliance on CPU branch prediction plays a large role in the performance of a jump-counting pattern, over a simple boolean pattern. On a CPU without branch prediction, we can expect the boolean pattern to perform worse in the majority of cases, with the performance being closer to the 50% erasure iteration test. The rest of the performance loss is made up of the number of reads necessary for the boolean skipfield, versus a jump-counting skipfield.

Lastly we perform a slightly different test, simply to compare the performance of a std::deque - the one container under GCC which has performance characteristics that are contendors to colony in the same contexts. Again, we use a boolean skipfield for deque instead of doing proper erasure. This should show the performance benefit of a colony's structure over a deque's ignoring the jump-counting skipfield pattern.

From this we can see that a colony's internal structure can perform better than a std::deque's, regardless of whether a jump-counting or regular boolean skipfield is used - this is likely due to the fewer memory allocations necessary for colony due to it's 2x growth factor.