
PLF C++ Library - plf::bitset/bitsetb/bitsetc

Introduction

plf::bitset

This is a basic constexpr drop-in replacement for std::bitset, with generally better performance (this may change a little if vendors borrow the algorithms), a few small changes to existing functions, and some additional functions. The following stats summarise the significant performance results from the benchmarks versus std::bitset under GCC-libstdc++/MSVC-MSSTL respectively (I did not test libc++).

Under release (O2, AVX2) builds it has:

- Faster set(position, value) under MSVC, where the std approach is roughly 3x slower (roughly equivalent under GCC)
- +144% faster to_string() under GCC (MSVC's std implementation remains faster due to an intrinsic)
- Roughly +100% faster << and >> shifting under GCC, +35%/+20% faster under MSVC
- +34652% (GCC) and +67612% (MSVC) faster set_range/reset_range versus an equivalent std::bitset set()/reset() loop
- +24% (GCC) and +20% (MSVC) better overall performance in the test-suite benchmark

Under debug builds it has:

- Faster set(position, value): the std approach is 3x slower under GCC and 31% slower under MSVC
- +27% faster to_string() under GCC
- Roughly +110% faster << and >> shifting under GCC, +85%/+66% faster under MSVC
- +428127% (GCC) and +750726% (MSVC) faster set_range versus a std::bitset set() loop
- +175% (GCC) and +40% (MSVC) better overall performance in the test-suite benchmark
- +360% (GCC) and +132% (MSVC) faster operator []
See benchmarks for more details. Other performance characteristics are more or less the same between plf and std.

Like std::bitset it takes its size as a template parameter and allocates on the stack. If you want it to allocate on the heap, heap-allocate the whole bitset, i.e. plf::bitset<134> *temp = new plf::bitset<134>. In addition, it takes its storage type as an optional template parameter (this must be a trivial unsigned integer type; std::size_t is used by default). This means that if you want the bitset to use the least space possible for a size which isn't a multiple of sizeof(size_t) * 8, you can write plf::bitset<134, unsigned char> or similar to reduce waste. All other functions match the std::bitset documentation, with the exceptions noted below.
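A minimal sketch of the above (the plf_bitset.h header name is assumed here from the library's usual plf_ naming convention; the functions used are the std::bitset-compatible ones):

#include "plf_bitset.h" // header name assumed - check the repository for the actual filename

int main()
{
	plf::bitset<134> flags;                 // 134 bits, stack-allocated, std::size_t storage
	plf::bitset<134, unsigned char> packed; // same size, stored in unsigned chars to reduce waste

	plf::bitset<134> *heap_flags = new plf::bitset<134>; // heap-allocate the whole bitset if desired

	flags.set(7);
	packed.set(7);

	const bool both_set = flags.test(7) && packed.test(7);
	delete heap_flags;
	return both_set ? 0 : 1;
}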

New functionality

In addition, plf::bitset brings new functions to the table:

Other differences

plf::bitsetb

This is a "borrowing" bitset. It implements the same functions (with the exception of operators ~, << and >> (<<= and >>= still exist)) as plf::bitset, but instead of allocating, it takes both it's size and it's buffer as constructor parameters. It is not as fast as plf::bitset nor necessarily (depending on the functions used) std::bitset, but this means you can construct a bitset from any readily-available memory you happen to have. It does not contain have the allocator or 'hardened mode' template parameters, but still takes the storage type as an optional template parameter, which means you can match your (unsigned integer) storage type to the supplied buffer. Or, if you don't mind undefined behaviour you can reinterpret_cast the pointer to your supplied buffer as std::size_t * (or whatever storage type you specify in the bitset template parameter).

It has a move constructor and a move assignment operator, whereas plf::bitset doesn't. These copy both size and the buffer pointer from source to *this and NULL the source. While most functions in plf::bitset are noexcept, plf::bitsetb cannot provide that guarantee because it doesn't know what the user-supplied buffer is and whether it is in fact as large as the size supplied by the user. In other words, out-of-bounds accesses are possible if the user is negligent and doesn't supply a large enough buffer (or deallocates the buffer then performs accesses on the bitset). On the plus side, you can change the size of the bitset on the fly (provided your buffer is large enough), and you can change the buffer+size by moving from another bitsetb. It cannot supply an operator ~ because that would involve returning a bitsetb with a dynamically-allocated buffer (which the user is then responsible for deallocating). Instead, copy-construct your own secondary bitsetb and call flip() on it.

Lastly, because size is a constructor parameter, you can have a bitsetb as part of a class without every instance of that class having to use the same bitset size, and without having to define the class as a template (this is helpful when bulk-processing multiple instances of a class). This would not be possible if size were a template parameter. This style of implementation is primarily what I constructed these bitsets for - the more standards-like plf::bitset is more of a happy side-effect of discovering faster methods for doing what I want. Please note: using bitsetb for a small bitset makes no sense - the internal pointer to the buffer plus the internal 'size' member total 128 bits on a 64-bit platform. Therefore if you're making a 128-bit bitset you use twice as much memory with plf::bitsetb as you would with plf::bitset.

Other differences: swap is non-noexcept in bitsetb because the sizes of the two bitsetb's could differ (in which case an exception is thrown), whereas they cannot for plf::bitset. I didn't think it appropriate to swap buffers for this function, as this could lead to unexpected and odd behaviour depending on the user's setup. The constexpr change_size(std::size_t new_size) function is provided for changing the size of your bitset (user discretion is required to ensure that the supplied buffer is in fact large enough to accommodate the new size). Copy construction takes a bitset size argument. Copy construction/assignment copies across a number of bytes equivalent to the smaller of the two bitsets' sizes. Some small optimisations are not available because size is a constructor argument, not a template parameter.
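A minimal sketch of borrowing a buffer, based on the description above (the constructor parameter order and the plf_bitsetb.h header name are assumptions - check the header for the exact signatures):

#include <cstddef>
#include "plf_bitsetb.h" // header name assumed

int main()
{
	std::size_t backing[4] = {}; // 256 bits of user-supplied storage on a 64-bit platform

	// Assumed constructor order (buffer pointer, size in bits):
	plf::bitsetb<std::size_t> flags(backing, 200);

	flags.set(150);
	flags.change_size(256); // fine here - the supplied buffer is large enough for 256 bits
	flags.set(255);

	return (flags.test(150) && flags.test(255)) ? 0 : 1;
}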

plf::bitsetc (C++20 only)

This is useful in one of the same situations as bitsetb, i.e. when you want to use the bitset as part of a class but don't want to templatize the class in order to support different sizes of bitset within different class instances - as you would have to do with plf::bitset or std::bitset. But instead of borrowing, it allocates the buffer itself. It inherits from bitsetb but takes an allocator type as a template parameter (std::allocator<storage_type> by default) and dynamically allocates the buffer on the heap on construction, deallocating upon destruction. Unlike plf::bitset and plf::bitsetb it is supported under C++20 and upwards only, as the 'using' keyword is the only way to easily utilize the template member functions from bitsetb. Also, constexpr change_size(std::size_t new_size) now allocates a new buffer and copies the contents of the old buffer to it, before deallocating the old buffer. Swap, unlike in bitsetb, swaps both the buffer and the size of the bitset, since the buffers are owned by the bitset and not 'borrowed' from the user as with bitsetb. Also, like plf::bitset, the copy constructor takes its size from the source, not from a parameter. Otherwise its functionality is the same as bitsetb's.
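As an illustration of the class-member use case (the plf_bitsetc.h header name and the size-taking constructor are assumptions based on the description above):

#include <cstddef>
#include "plf_bitsetc.h" // header name assumed; requires C++20 or later

// A non-template class whose bitset size is chosen per-instance at construction -
// not possible with plf::bitset or std::bitset, where size is a template parameter.
class entity_flags
{
	plf::bitsetc<std::size_t> flags; // storage type as template parameter, allocator defaulted

public:
	explicit entity_flags(const std::size_t number_of_flags) : flags(number_of_flags) {} // assumed constructor

	void mark(const std::size_t index) { flags.set(index); }
	bool is_marked(const std::size_t index) const { return flags.test(index); }
};

int main()
{
	entity_flags small_entity(12), large_entity(5000); // different bitset sizes, same class type
	small_entity.mark(3);
	return (small_entity.is_marked(3) && !large_entity.is_marked(3)) ? 0 : 1;
}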

If you merely want to have a bitset allocated on the heap and don't require different bitset sizes within instances of the same non-template class, allocating a plf::bitset instance on the heap is a better way to go than plf::bitsetc.

Frequently-asked questions

Why no iterator?

Iterators for bitsets don't work well. Upon ++ iteration an iterator has to increment the bit location within the selected storage unit and, if that location passes the number of bits in the storage type, advance to the next storage unit. This necessitates a branch instruction, which (and I did implement an iterator to prove this) makes it slower than simply using an index and the [] operator (which doesn't require a branch instruction). Also, an iterator is larger than a stored index because it has to store both the index of the storage unit and the sub-storage-unit bit index. Essentially there are no advantages to using an iterator over using a std::size_t index number.
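For illustration, the index-based loop recommended above might look like this (a sketch, assuming the std::bitset-compatible size() and operator [] and the plf_bitset.h header name):

#include <cstddef>
#include "plf_bitset.h" // header name assumed

// Count the set bits by index - no iterator needed.
std::size_t count_set_bits(const plf::bitset<512> &flags)
{
	std::size_t total = 0;

	for (std::size_t index = 0, end = flags.size(); index != end; ++index)
	{
		if (flags[index]) ++total; // operator [] requires no branch on storage-unit boundaries
	}

	return total;
}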

Why not support std::string/long int constructors?

Because I couldn't see myself using it.

License

plf::bitset, plf::bitsetb and plf::bitsetc are under the Computing for Good ethical licence.

Download

Download here or view the repository

The bitset libraries are simple .h header files, to be included via a #include directive.

Benchmarks for plf::bitset

Results below were calculated on an Intel i7-9750H processor under Windows 10 in safe mode, using nuwen mingw GCC 13.2 and MSVC 2022 (results for earlier CPUs were either similar or better). Codegen was O2, AVX2 and C++23 for the 'Release' results, just C++23 for the 'Debug' results. Code runs are mostly 'hot' with a number of 'warm-up' function passes before timing is calculated. Results are averaged across a large number of runs. All benchmark code is available here. I did not bother with benchmarks for bitsetb and bitsetc since their buffers are allocated on the heap and as such are not performance-comparable with std::bitset. The hardened template parameter was left at its default (off).

Release build results

The only function results included below are where there were significant differences in performance between the different bitset implementations. Please assume that for all other functions the results are roughly equivalent.

set(position, value)

test result graph

One can see here that while the libstdc++ approach optimizes almost equivalently to plf::bitset under GCC, the exact same (seriously, check the libstdc++ vs MSSTL bitset code) approach optimizes poorly in MSVC, being roughly 3 times slower. The code in question is a branch between two expressions for both libstdc++/MSSTL, so my guess (I haven't checked the assembly) is that in this scenario GCC calculates both expressions and then performs a conditional move between the two results, whereas MSVC doesn't.
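To make that concrete, here is a generic illustration (not the actual libstdc++/MSSTL/plf code) of a branching set-to-value versus a branchless form that a compiler might reduce to a conditional move:

#include <cstddef>

// Branching form - the "branch between two expressions" described above.
inline void set_bit_branch(std::size_t &word, const std::size_t mask, const bool value)
{
	if (value)
		word |= mask;
	else
		word &= ~mask;
}

// Branchless form - builds an all-ones or all-zeros mask from 'value' instead of branching.
inline void set_bit_branchless(std::size_t &word, const std::size_t mask, const bool value)
{
	word = (word & ~mask) | ((static_cast<std::size_t>(0) - value) & mask);
}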

to_string()

test result graph

The reason why MSVC has such a superior result here is because it has an intrinsic (I'm guessing assembly-level or AVX-optimized) instruction for converting the bits to a string. I'm not even going to *try* to compete with that. However for GCC/libstdc++ we see a +144% performance improvement for plf::bitset over std::bitset.

>> and << shifting

test result graph

test result graph

Both shifts are roughly +100% faster for plf under GCC/libstdc++, and +35%/20% faster under MSVC/MSSTL.

plf::bitset::set_range/reset_range vs std::bitset::set/reset loop


test result graph

This result shows the difference between the optimized plf::bitset set_range/reset_range functions and doing the same thing in std::bitset using a loop and set() or reset(). The bitset sizes were 128000 bits. The operation was repeated 10000 times, with the start and end locations of the range of bits to set/reset being randomly calculated each time - so anything between 1 and 127999 bits could be set/reset per repetition. In GCC set_range was +34652% faster than the std::bitset::set loop, while in MSVC it was +67612% faster.
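For reference, the two patterns being compared look roughly like this (the (begin, end) parameter convention for set_range and the plf_bitset.h header name are assumptions - check the header):

#include <bitset>
#include <cstddef>
#include "plf_bitset.h" // header name assumed

// Per-bit loop - the std::bitset side of the benchmark.
void set_bits_std(std::bitset<128000> &bits, const std::size_t begin, const std::size_t end)
{
	for (std::size_t index = begin; index != end; ++index)
		bits.set(index);
}

// Bulk call - the plf::bitset side. Parameter convention assumed to be (begin, end).
void set_bits_plf(plf::bitset<128000> &bits, const std::size_t begin, const std::size_t end)
{
	bits.set_range(begin, end);
}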

Test suite benchmark

test result graph

This shows the result of testing all functionality in plf::bitset vs std::bitset (minus the functions which are not shared between std::bitset and plf::bitset). Under GCC/libstdc++ there is a +24% overall performance increase, under MSVC/MSSTL it is +20%.

Operator [ ], i.e. bit access

test result graph

Here we see the results for plf are 3% faster in GCC/libstdc++, but equivalent under MSVC. This is surprising since both libstdc++ and MSSTL use almost exactly the same operator [ ] code. The main difference is the lack of boundary checking in the plf code.

Debug build results

As a favour to game development, and all other fields which work extensively with debug builds as well as optimized and/or non-AVX builds, the following results are provided. Again, results are only shown where there were significant performance differences between bitset implementations.

set(position, value)

test result graph

Despite optimizing well at O2/AVX2, the libstdc++ approach to set(position, value) is 3x slower than the plf approach in debug mode, while the same approach is only 31% slower than plf under MSVC - despite being 2x slower in release mode.

to_string()

test result graph

The superior result for MSVC's to_string intrinsic function is even more prominent here. For GCC/libstdc++ we see +27% improvement for plf over std.

>> and << shifting

test result graph

test result graph

For both results plf's approach is roughly +110% faster under GCC, while being around +85%/+66% faster under MSVC.

plf::bitset::set_range/reset_range vs std::bitset::set/reset loop


test result graph

In debug mode we see plf's set_range is +428127% faster than a GCC/libstdc++ std::bitset::set loop, while in MSVC/MSSTL it is +750726% faster.

Test suite benchmark

test result graph

Under GCC/libstdc++ plf is +175% faster than std, under MSVC/MSSTL it is +40% faster.

Operator [ ]

test result graph

In debug mode we see the plf approach is +360% faster under GCC and +132% faster under MSVC.

Version history

plf:: library and this page Copyright (c) 2025, Matthew Bentley