PLF C++ Library - plf::queue - 3rd gen i5 GCC Benchmark results

Last updated 13-06-2021 v1.15

Test setup
Raw performance tests
Real-world 'pump' tests
'Pump' tests memory usage
Conclusion

Test setup

The test setup is an an Intel 3rd gen i5, 8GB ram, running mingw GCC 9.2 x64 as compiler. OS is a stripped-back Windows 7 x64 SP1 installation with most services, background tasks (including explorer) and all networking disabled. Build settings are "-O2 -march=native -s -std=c++2a". These benchmarks are performed without calling reserve() on plf::queue, but if reserve() is used, performance of plf_queue will be much greater, since std::queue does not support reserve().

Tests are based on a sliding scale of number of runs vs number of elements, so a test with only 10 elements in a container may average 100000 runs, whereas a test with 100000 elements may only average 10 runs. This tends to give adequate results without overly lengthening test times.

Tests are carried out on the following types: (a) a 8-bit type ie. char, (b) a 32-bit type ie. int, (c) a 64-bit type ie. double, (d) a small struct containing two pointers and four scalar types, and (e) a large struct containing 2 pointers, 4 scalar types, a large array of ints and a small array of chars.

Initial testing only tests basic statistics in terms of raw performance. These are not overly useful in the context of a queue but are included anyway for those interested. The more-useful testing is the 'pump' test in the second section. This simulates real-world usage of a queue.

Raw performance tests

In order to better facilitate accurate time-keeping for short tests, both container construction and destruction times are included in the tests. The sequence is as follows: construction, push N elements, read (front) + pop all elements from the front, destruction. This is just to get general raw performance metrics and is not intended as a usage sequence. Because unlike a regular container, a queue must be pushed for every pop, and popped for every read, it most sense to combine these two timings and compare on that basis, as what is most important is the overall time taken. For that reason both separate and combined ("total time") benchmarks are presented below. However, since for every pop there must be a push, but not vice-versa, push is considered the more important of the two. For these tests priority is set to plf::speed.

Construction + Push

Click images or hover over to see results at linear scale instead

Back + Pop + Destruction

Click images or hover over to see results at linear scale instead

Total Time

Click images or hover over to see results at linear scale instead

Real-world 'pump' performance tests

This simulates a real-world queue scenario, with a relatively stable number of pushes and pops occuring most of the time, with some increases and decreases in the overall number of elements occuring over time. The 'pump' ratio determines how much the base number of elements, N, increases or decreases by over the course of the test. In the benchmarks below a pump ratio of 1.75x is used, meaning that the number of elements will increase to N * 1.75 at points, and at other points will drop to N / 1.75 (ie. slightly more than half-N).

Both total time and memory usage of std::queue vs plf::queue are presented for these benchmarks. The test involves:

Pushing elements until size N is reached
Pushing/popping equal amounts over a period of time
Gradually increasing size while pushing and popping until size N * 1.75 is reached
Pushing/popping equal amounts over a period of time
More-gradually decreasing size while pushing and popping until size N is reached
Pushing/popping equal amounts over a period of time
Quickly decreasing size while pushing and popping until size N / 1.75 is reached
Pushing/popping equal amounts over a period of time
Quickly increasing size while pushing and popping until size N * 1.75 is reached
Pushing/popping equal amounts over a period of time
Very-quickly decreasing size until size N is reached
Pushing/popping equal amounts over a period of time

Memory usage at each of these points is added to a total which is later averaged to give the memory stats following this section.

Unfortunately linear view is not available for these graphs as the result of some bad changes to libreoffice preventing me creating them.

plf::queue with priority == plf::speed is on average 20% faster than std::queue for type char, while with priority == plf::memory it is 7% faster.

plf::queue with priority == plf::speed is on average 10% faster than std::queue for type int, while with priority == plf::memory it is 9% faster.

plf::queue with priority == plf::speed is on average 15% faster than std::queue for type double, while with priority == plf::memory it is 12% faster.

plf::queue with priority == plf::speed is on average 40% faster than std::queue for small structs, while with priority == plf::memory it is 35% faster.

plf::queue with priority == plf::speed and plf::memory is on average 65% faster than std::queue for large structs.

plf::queue performs faster in every category, but moreso the larger the type is.

'Pump' tests memory usage

Memory usage comparisons from the above benchmarks are now presented.

For char plf::queue with priority == speed uses on average 30% more memory than std::queue, mostly in the lower ranges of numbers of elements, while with priority == memory it uses on average 10% less memory than std::queue.

For int plf::queue with priority == speed uses on average 34% more memory than std::queue, mostly in the lower ranges of numbers of elements, while with priority == memory it uses on average 2% less memory than std::queue.

For double plf::queue with priority == speed uses on average 36% more memory than std::queue, mostly in the lower ranges of numbers of elements, while with priority == memory it uses on average 2% more memory than std::queue.

For small structs plf::queue with priority == speed uses on average 32% more memory than std::queue, mostly in the lower ranges of numbers of elements, while with priority == memory it uses on average 2% more memory than std::queue.

For large structs plf::queue with priority == speed uses on average 60% more memory than std::queue, mostly in the lower ranges of numbers of elements, while with priority == memory it uses on average 11% more memory than std::queue. This is understandable as libstdc++'s deque implementation has a fixed block size of 512 bytes, while the large struct in this benchmark uses 496 bytes, meaning it only allocates one block per element, which is slow but saves significant amounts of ram.

Conclusion

plf::queue greatly out-performs std::queue (ie. std::deque). There are no other std:: containers which can replicate queue functionality with the exception of std::list, and as this is exceptionally slower than deque there is little point to benchmarking it. The larger the stored type is, the greater the performance advantage of plf::queue is. There is an increased memory cost when using plf::queue with template parameter Priority == plf::speed (the default), but with Priority == plf::memory the memory cost is much less and may even be smaller than std::deque's usage, depending on the size of the element type.