
PLF C++ Library - plf::bitset/bitsetb/bitsetc

Introduction

plf::bitset

This is a basic constexpr drop-in replacement for std::bitset, with generally better performance (this may change a little if vendors borrow the algorithms), a few small changes to existing functions, and some additional functions. The following stats summarise the significant performance results from the benchmarks versus std::bitset under GCC-libstdc++/MSVC-MSSTL respectively (I did not test libc++).

Under release (O2, AVX2) builds it has:

- Faster set(position, value) under MSVC, where the std approach is roughly 3x slower (roughly equivalent under GCC)
- +144% faster to_string() under GCC (MSVC's std implementation remains faster due to an intrinsic)
- Roughly +100% faster << and >> shifting under GCC, +35%/+20% faster under MSVC
- +34652% (GCC) and +67612% (MSVC) faster set_range/reset_range versus an equivalent std::bitset set()/reset() loop
- +24% (GCC) and +20% (MSVC) better overall performance in the test-suite benchmark

Under debug builds it has:

- Faster set(position, value): the std approach is 3x slower under GCC and 31% slower under MSVC
- +27% faster to_string() under GCC
- Roughly +110% faster << and >> shifting under GCC, +85%/+66% faster under MSVC
- +428127% (GCC) and +750726% (MSVC) faster set_range versus a std::bitset set() loop
- +175% (GCC) and +40% (MSVC) better overall performance in the test-suite benchmark
- +360% (GCC) and +132% (MSVC) faster operator []
See benchmarks for more details. Other performance characteristics are more or less the same between plf and std.

Like std::bitset it takes its size as a template parameter and allocates on the stack. If you want it to allocate on the heap, heap-allocate the whole bitset, i.e. plf::bitset<134> *temp = new plf::bitset<134>. In addition, it takes its storage type as an optional template parameter (this must be a trivial unsigned integer type; std::size_t is used by default). This means that if you want the bitset to use the least space possible for a size which isn't a multiple of sizeof(size_t) * 8, you can write plf::bitset<134, unsigned char> or similar to reduce waste. All other functions match the std::bitset documentation, with the exceptions noted below.
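A minimal sketch of the above (the plf_bitset.h header name is assumed here from the library's usual plf_ naming convention; the functions used are the std::bitset-compatible ones):

#include "plf_bitset.h" // header name assumed - check the repository for the actual filename

int main()
{
	plf::bitset<134> flags;                 // 134 bits, stack-allocated, std::size_t storage
	plf::bitset<134, unsigned char> packed; // same size, stored in unsigned chars to reduce waste

	plf::bitset<134> *heap_flags = new plf::bitset<134>; // heap-allocate the whole bitset if desired

	flags.set(7);
	packed.set(7);

	const bool both_set = flags.test(7) && packed.test(7);
	delete heap_flags;
	return both_set ? 0 : 1;
}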

New functionality

In addition, plf::bitset brings new functions to the table:

Other differences

plf::bitsetb

This is a "borrowing" bitset. It implements the same functions (with the exception of operators ~, << and >> (<<= and >>= still exist)) as plf::bitset, but instead of allocating, it takes both it's size and it's buffer as constructor parameters. It is not as fast as plf::bitset nor necessarily (depending on the functions used) std::bitset, but this means you can construct a bitset from any readily-available memory you happen to have. It does not contain have the allocator or 'hardened mode' template parameters, but still takes the storage type as an optional template parameter, which means you can match your (unsigned integer) storage type to the supplied buffer. Or, if you don't mind undefined behaviour you can reinterpret_cast the pointer to your supplied buffer as std::size_t * (or whatever storage type you specify in the bitset template parameter).

It has a move constructor and a move assignment operator, whereas plf::bitset doesn't. These copy both size and the buffer pointer from source to *this and NULL the source. While most functions in plf::bitset are noexcept, plf::bitsetb cannot provide that guarantee because it doesn't know what the user-supplied buffer is and whether it is in fact as large as the size supplied by the user. In other words, out-of-bounds accesses are possible if the user is negligent and doesn't supply a large enough buffer (or deallocates the buffer then performs accesses on the bitset). On the plus side, you can change the size of the bitset on the fly (provided your buffer is large enough), and you can change the buffer+size by moving from another bitsetb. It cannot supply an operator ~ because that would involve returning a bitsetb with a dynamically-allocated buffer (which the user is then responsible for deallocating). Instead, copy-construct your own secondary bitsetb and call flip() on it.

Lastly, because size is a constructor parameter, you can have a bitsetb as part of a class without every instance of that class having to use the same bitset size, and without having to define the class as a template (this is helpful when bulk-processing multiple instances of a class). This would not be possible if size were a template parameter. This style of implementation is primarily what I constructed these bitsets for - the more standards-like plf::bitset is more of a happy side-effect of discovering faster methods for doing what I want. Please note: using bitsetb for a small bitset makes no sense - the internal pointer to the buffer plus the internal 'size' member total 128 bits on a 64-bit platform. Therefore if you're making a 128-bit bitset you use twice as much memory with plf::bitsetb as you would with plf::bitset.

Other differences: swap is non-noexcept in bitsetb because the sizes of the two bitsetb's could differ (in which case an exception is thrown), whereas they cannot for plf::bitset. I didn't think it appropriate to swap buffers for this function, as this could lead to unexpected and odd behaviour depending on the user's setup. The constexpr change_size(std::size_t new_size) function is provided for changing the size of your bitset (user discretion is required to ensure that the supplied buffer is in fact large enough to accommodate the new size). Copy construction takes a bitset size argument. Copy construction/assignment copies across a number of bytes equivalent to the smaller of the two bitsets' sizes. Some small optimisations are not available because size is a constructor argument, not a template parameter.
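A minimal sketch of borrowing a buffer, based on the description above (the constructor parameter order and the plf_bitsetb.h header name are assumptions - check the header for the exact signatures):

#include <cstddef>
#include "plf_bitsetb.h" // header name assumed

int main()
{
	std::size_t backing[4] = {}; // 256 bits of user-supplied storage on a 64-bit platform

	// Assumed constructor order (buffer pointer, size in bits):
	plf::bitsetb<std::size_t> flags(backing, 200);

	flags.set(150);
	flags.change_size(256); // fine here - the supplied buffer is large enough for 256 bits
	flags.set(255);

	return (flags.test(150) && flags.test(255)) ? 0 : 1;
}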

plf::bitsetc (C++20 only)

This is useful in one of the same situations as bitsetb, i.e. when you want to use the bitset as part of a class but don't want to templatize the class in order to support different sizes of bitset within different class instances - as you would have to do with plf::bitset or std::bitset. But instead of borrowing, it allocates the buffer itself. It inherits from bitsetb but takes an allocator type as a template parameter (std::allocator<storage_type> by default) and dynamically allocates the buffer on the heap on construction, deallocating upon destruction. Unlike plf::bitset and plf::bitsetb it is supported under C++20 and upwards only, as the 'using' keyword is the only way to easily utilize the template member functions from bitsetb. Also, constexpr change_size(std::size_t new_size) now allocates a new buffer and copies the contents of the old buffer to it, before deallocating the old buffer. Swap, unlike in bitsetb, swaps both the buffer and the size of the bitset, since the buffers are owned by the bitset and not 'borrowed' from the user as with bitsetb. Also, like plf::bitset, the copy constructor takes its size from the source, not from a parameter. Otherwise its functionality is the same as bitsetb's.
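As an illustration of the class-member use case (the plf_bitsetc.h header name and the size-taking constructor are assumptions based on the description above):

#include <cstddef>
#include "plf_bitsetc.h" // header name assumed; requires C++20 or later

// A non-template class whose bitset size is chosen per-instance at construction -
// not possible with plf::bitset or std::bitset, where size is a template parameter.
class entity_flags
{
	plf::bitsetc<std::size_t> flags; // storage type as template parameter, allocator defaulted

public:
	explicit entity_flags(const std::size_t number_of_flags) : flags(number_of_flags) {} // assumed constructor

	void mark(const std::size_t index) { flags.set(index); }
	bool is_marked(const std::size_t index) const { return flags.test(index); }
};

int main()
{
	entity_flags small_entity(12), large_entity(5000); // different bitset sizes, same class type
	small_entity.mark(3);
	return (small_entity.is_marked(3) && !large_entity.is_marked(3)) ? 0 : 1;
}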

If you merely want to have a bitset allocated on the heap and don't require different bitset sizes within instances of the same non-template class, allocating a plf::bitset instance on the heap is a better way to go than plf::bitsetc.

Frequently-asked questions

Why no iterator?

Iterators for bitsets don't work well. Upon ++ iteration an iterator has to increment the bit location within the selected storage unit and, if that location passes the number of bits in the storage type, advance to the next storage unit. This necessitates a branch instruction, which (and I did implement an iterator to prove this) makes it slower than simply using an index and the [] operator (which doesn't require a branch instruction). Also, an iterator is larger than a stored index because it has to store both the index of the storage unit and the sub-storage-unit bit index. Essentially there are no advantages to using an iterator over using a std::size_t index number.
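For illustration, the index-based loop recommended above might look like this (a sketch, assuming the std::bitset-compatible size() and operator [] and the plf_bitset.h header name):

#include <cstddef>
#include "plf_bitset.h" // header name assumed

// Count the set bits by index - no iterator needed.
std::size_t count_set_bits(const plf::bitset<512> &flags)
{
	std::size_t total = 0;

	for (std::size_t index = 0, end = flags.size(); index != end; ++index)
	{
		if (flags[index]) ++total; // operator [] requires no branch on storage-unit boundaries
	}

	return total;
}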

Why not support std::string/long int constructors?

Because I couldn't see myself using it.

License

plf::bitset, plf::bitsetb and plf::bitsetc are under the Computing for Good ethical licence.

Download

Download here or view the repository

The bitset libraries are simple .h header files, to be included via a #include directive.

Benchmarks for plf::bitset

Results below were calculated on an Intel i7-9750H processor under Windows 10 in safe mode, using nuwen mingw GCC 13.2 and MSVC 2022 (results for earlier CPUs were either similar or better). Codegen was O2, AVX2 and C++23 for the 'Release' results, just C++23 for the 'Debug' results. Code runs are mostly 'hot' with a number of 'warm-up' function passes before timing is calculated. Results are averaged across a large number of runs. All benchmark code is available here. I did not bother with benchmarks for bitsetb and bitsetc since their buffers are allocated on the heap and as such are not performance-comparable with std::bitset. The hardened template parameter was left at its default (off).

Release build results

The only function results included below are where there were significant differences in performance between the different bitset implementations. Please assume that for all other functions the results are roughly equivalent.

set(position, value)

test result graph

One can see here that while the libstdc++ approach optimizes almost equivalently to plf::bitset under GCC, the exact same (seriously, check the libstdc++ vs MSSTL bitset code) approach optimizes poorly in MSVC, being roughly 3 times slower. The code in question is a branch between two expressions for both libstdc++/MSSTL, so my guess (I haven't checked the assembly) is that in this scenario GCC calculates both expressions and then performs a conditional move between the two results, whereas MSVC doesn't.
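To make that concrete, here is a generic illustration (not the actual libstdc++/MSSTL/plf code) of a branching set-to-value versus a branchless form that a compiler might reduce to a conditional move:

#include <cstddef>

// Branching form - the "branch between two expressions" described above.
inline void set_bit_branch(std::size_t &word, const std::size_t mask, const bool value)
{
	if (value)
		word |= mask;
	else
		word &= ~mask;
}

// Branchless form - builds an all-ones or all-zeros mask from 'value' instead of branching.
inline void set_bit_branchless(std::size_t &word, const std::size_t mask, const bool value)
{
	word = (word & ~mask) | ((static_cast<std::size_t>(0) - value) & mask);
}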

to_string()

test result graph

The reason why MSVC has such a superior result here is because it has an intrinsic (I'm guessing assembly-level or AVX-optimized) instruction for converting the bits to a string. I'm not even going to *try* to compete with that. However for GCC/libstdc++ we see a +144% performance improvement for plf::bitset over std::bitset.

>> and << shifting

test result graph

test result graph

Both shifts are roughly +100% faster for plf under GCC/libstdc++, and +35%/20% faster under MSVC/MSSTL.

plf::bitset::set_range/reset_range vs std::bitset::set/reset loop


test result graph

This result shows the difference between the optimized plf::bitset set_range/reset_range functions and doing the same thing in std::bitset using a loop and set() or reset(). The bitset sizes were 128000 bits. The operation was repeated 10000 times, with the start and end locations of the range of bits to set/reset being randomly calculated each time - so anything between 1 and 127999 bits could be set/reset per repetition. In GCC set_range was +34652% faster than the std::bitset::set loop, while in MSVC it was +67612% faster.
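For reference, the two patterns being compared look roughly like this (the (begin, end) parameter convention for set_range and the plf_bitset.h header name are assumptions - check the header):

#include <bitset>
#include <cstddef>
#include "plf_bitset.h" // header name assumed

// Per-bit loop - the std::bitset side of the benchmark.
void set_bits_std(std::bitset<128000> &bits, const std::size_t begin, const std::size_t end)
{
	for (std::size_t index = begin; index != end; ++index)
		bits.set(index);
}

// Bulk call - the plf::bitset side. Parameter convention assumed to be (begin, end).
void set_bits_plf(plf::bitset<128000> &bits, const std::size_t begin, const std::size_t end)
{
	bits.set_range(begin, end);
}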

Test suite benchmark

test result graph

This shows the result of testing all functionality in plf::bitset vs std::bitset (minus the functions which are not shared between std::bitset and plf::bitset). Under GCC/libstdc++ there is a +24% overall performance increase, under MSVC/MSSTL it is +20%.

Operator [ ], i.e. bit access

test result graph

Here we see the results for plf are 3% faster in GCC/libstdc++, but equivalent under MSVC. This is surprising since both libstdc++ and MSSTL use almost exactly the same operator [ ] code. The main difference is the lack of boundary checking in the plf code.

Debug build results

As a favour to game development, and all other fields which work extensively with debug builds as well as optimized and/or non-AVX builds, the following results are provided. Again, results are only shown where there were significant performance differences between bitset implementations.

set(position, value)

test result graph

Despite optimizing well at O2/AVX2, the libstdc++ approach to set(position, value) is 3x slower than the plf approach in debug mode, while the same approach is only 31% slower than plf under MSVC - despite being 2x slower in release mode.

to_string()

test result graph

The superior result for MSVC's to_string intrinsic function is even more prominent here. For GCC/libstdc++ we see +27% improvement for plf over std.

>> and << shifting

test result graph

test result graph

For both results plf's approach is roughly +110% faster under GCC, while being around +85%/+66% faster under MSVC.

plf::bitset::set_range/reset_range vs std::bitset::set/reset loop


test result graph

In debug mode we see plf's set_range is +428127% faster than a GCC/libstdc++ std::bitset::set loop, while in MSVC/MSSTL it is +750726% faster.

Test suite benchmark

test result graph

Under GCC/libstdc++ plf is +175% faster than std, under MSVC/MSSTL it is +40% faster.

Operator [ ]

test result graph

In debug mode we see the plf approach is +360% faster under GCC and +132% faster under MSVC.

Version history

plf:: library and this page Copyright (c) 2025, Matthew Bentley