Home

PLF C++ Library - plf::colony

Introduction

plf::colony is the highest-performance C++ template-based data container for high-modification scenarios with unordered data. All elements within a colony have a stable memory location, meaning that pointers/iterators to non-erased elements are valid regardless of insertions/erasures to the container. Specifically the container provides better performance than other std:: library containers when:

  1. Insertion order is unimportant,
  2. Insertions and erasures to the container are occurring frequently in realtime i.e. in performance-critical code, and/or
  3. Insertions and erasures to the container may not invalidate pointers/iterators to non-erased elements.

While the benchmarks later on the page are a better way to get a feel for the performance benefits, the general performance characteristics are:

As explored further in the benchmarks there are some vector/deque modifications which can outperform colony during iteration while maintaining pointer validity, but at a cost to usability and memory usage and without colony's insert/erasure advantages. Colony's other advantages include freeing and/or recycling of unused memory on-the-fly, guaranteed stability of pointers/references/iterators to non-erased elements (which makes programming with containers of inter-related data structures faster and easier), and broad cross-compiler support.

A colony's structure can be reasonably compared to a "bucket array", where each element location can be emptied and skipped over. Unlike some bucket arrays it does not use keys or id's, and it iterates faster due to the use of an advanced skipfield type (previously a high complexity jump-counting pattern, now a low complexity jump-counting pattern) instead of a boolean field, and frees or recycles unused memory from erased elements on-the-fly. It was initially developed predominantly for game development, so is designed to favour larger-than-scalar-type (structs/classes of greater-than 128 bits total) performance over scalar-type (float, int, etc) performance. It supports over-aligned types.

A visual infographic summarising colony's internal structure

Colony is the container upon which the current standards proposal for "std::hive" is based. The name and some functionality has diverged as a result of colony having to cater for C++03/11/etc containers, some functions being seen as unnecessary and the committee viewing colony as an inappropriate name for various reasons. A reference implementation for hive is here (C++20 and upwards only).

Motivation

Note: initial motivation for the project came from video game engine development. Since this point it has become a more generalized container. Nevertheless, the initial motivation is presented below.
When working on video game engines we are predominantly dealing with collections of data where:

  1. Elements within data collections refer to elements within other data collections (through a variety of methods - indices, pointers, etc). An example is a game entity referring to both a texture object and collision blocks, as well as sound data. These references must stay valid throughout the course of the game/level. For this reason, any container (or use of a container) which causes pointer or index invalidation can create difficulties or necessitate workarounds.
  2. Order is unimportant for the most part. The majority of data collections are simply iterated over, transformed, referred to and utilized with no regard to order.
  3. Erasing or otherwise removing or deactivating objects occurs frequently in-game and in realtime (though often erasures will be implemented to occur at the end of a frame due to multithreading concerns). An example could be destroying a wall, or a game enemy. For this reason methods of erasure which create strong performance penalties are avoided.
  4. Creating new objects and adding them into the gameworld on-the-fly is also common - for example, a tree which drops leaves every so often, or a quadtree.
  5. We don't always know in advance how many elements there will be in a container at the beginning of development, or even at the beginning of a level during playback - an example of this being a MMORPG (massively multiplayer online role-playing game). In a MMORPG the number of game entities fluctuates based on how many players are playing, though there may be maximum capacity limits. Genericized game engines in particular have to adapt to considerably different user requirements and scopes. For this reason extensible containers which can expand and contract in realtime are usually necessary.
  6. Depending on the complexity and scope of any given game, we can be dealing with anywhere between 10 and 100000 objects in a given area. We are not typically dealing with the types of capacities necessitated by large-scale mathematical or statistical applications.
  7. For performance reasons, memory storage which is more-or-less contiguous is preferred. Lists, vectors of pointers to dynamically-allocated objects, and maps as implemented in the standard library are unusable.
  8. Memory wastage is avoided, and in particular, any container which allocates upon initialisation tends to be avoided as this can incur purposeless memory and performance costs.

To meet these requirements, game developers tend to either (a) develop their own custom containers for given scenario or (b) develop workarounds for the failings of std::vector. These workarounds are many and varied, but the most common are probably:

  1. Using a boolean flag (or similar) to indicate the inactivity of an object (as opposed to actually erasing from the vector). When erasing, one simply adjusts the boolean flag, and when iterating, items with the adjusted boolean flag are skipped. External elements refer to elements within the container via indexes rather than pointers (which can be invalidated upon insertion).

    Advantages: Fast erasure.

    Disadvantages: Slow to iterate due to branching and unpredictable number of skipfield checks between any two non-skipped elements.

  2. Utilising a vector (or array) of data with a secondary vector (or array) of indexes. When erasing, the erasure occurs in the vector of indexes, not the vector of data, and when iterating, one iterates over the vector of indexes, then accessing the data from the vector of data via the index.

    Advantages: Faster iteration.

    Disadvantages: Erasure still incurs some reallocation cost, can increase jitter. Insertion comparatively slow.

  3. Combining a swap-and-pop mechanism with some form of dereferenced lookup system to enable contiguous element iteration (sometimes known as a 'packed array', various other names including "slot map"). In this case when erasing we swap the back element from the vector of elements with the element being erased, then pop the (swapped) back element. When iterating over the data we simply iterate through the vector of elements. However to maintain valid external links to elements (which may be moved at any time) we must also maintain a stable vector of indexes (or similar) and update the index numbers corresponding with the element being erased and the element being moved. In this case external objects referring to elements within the container must store a pointer or index to the index for the element in question. There are many variants of this method. It is more useful when the container's data is mostly being iterated over with fewer references to the individual elements from outside objects, and less insertion/erasure.

    Advantages: Iteration is at standard vector speed.

    Disadvantages: Erase could be slow if objects are large and swap cost is therefore large. Insertion cost is larger than other techniques. All references to elements incur additional costs due to a two-fold reference mechanism.


All three techniques have the disadvantage of slow singular insertions, and the first two will also continually expand memory usage when erasing and inserting over periods of time. The third deals better with this scenario as it swaps from the back rather than leaving gaps in the elements vector, however will suffer in performance if elements within the container are heavily referred to by external objects/elements.

Colony is an attempt to bring a more generic solution to this domain. It has the advantage of good iteration speed while maintaining a similar erasure speed to the boolean technique described above (faster for multiple consecutive erasures), and without causing pointer invalidation upon either erasure or insertion. It's insertion speed is also much faster than a vector's. Memory from erased elements is either reused by subsequent insertions or released to the OS on-the-fly. It achieves these ends via the mechanisms described in the Intro above, examined further in the Implementation section below.

More data on the performance characteristics of the colony versus other containers and workarounds can be found in the benchmarks section, read below to get an understanding of the mechanics.

Implementation

Any colony implementation is defined by 3 different aspects:

For the first of these, plf::colony uses a chained-group memory allocation pattern with a growth factor of 2, which can otherwise be described as a doubly-linked list of element "groups" - memory blocks with additional structural metadata, including in this case per-memory-block Low complexity jump-counting skipfields. A growth factor of 2 tends to gives the best performance in the majority of scenarios (hence why it is also used for the majority of vector implementations). Colony's minimum and maximum memory block sizes can also be adjusted to suit individual use-cases. Using a multiple-memory-block approach removes the necessity for data reallocation upon insertion (compared to a vector). Because data is not reallocated, all references, pointers and iterators to container elements stay valid post-insertion.

For the second of these aspects, plf::colony uses an advanced skipfield type called a low complexity jump-counting pattern (versions prior to v5.0 utilized the high complexity jump-counting skipfield pattern). The third aspect is currently an intrusive list of groups containing erased elements, combined with per-group free-lists of skipblocks. A skipblock, in colony terminology, refers to a run of contiguous erased elements - including runs of only a single erased element. Versions prior to v5.0 utilized free lists of individual skipped elements, and versions prior to v4.5 utilized a stack of pointers to individual skipped elements.

Due to a std::vector being the most widely-used and commonly-understood of the std:: library containers, we will contrast the storage mechanisms of a colony with that of a vector, starting with insertion:

Visual demonstration of inserting to a full vector Visual demonstration of inserting to a full colony

The lack of reallocation is also why insertion into a colony is faster than insertion into a std::vector. Now we look at erasure. When an element is erased in a std::vector, the following happens:

Visual demonstration of randomly erasing from a vector

When an element is erased in a colony, this happens:

Visual demonstration of randomly erasing from a colony

This is why a colony has a performance advantage over std::vector when erasing.

Upon subsequent insertions to a colony post-erasure, colony will check to see if its intrusive list of groups-with-erasures is empty (described earlier). If empty, it inserts to the end of the colony, creating a new group if the last group is already full. If the intrusive list of groups-with-erasures isn't empty, the colony takes the first group within that list and re-uses the first memory location from the free list of skipblocks within that group. If you erase all elements in any given group in a colony, the group is removed from the colony's chain of groups and released to the OS, and removed from the intrusive list of groups-with-erasures.

License

plf::colony is under a permissive zlib license. This means: Software is used on 'as-is' basis. It can be modified and used in commercial software. Authors are not liable for any damages arising from its use. The distribution of a modified version of the software is subject to the following restrictions:

Download

Download here or view the repository

The colony library is a simple .h header file, to be included with a #include command.
In addition if you are interested in benchmarking you can also download the plf benchmark suite.

A conan package for colony is available here.

A Build2 package for colony is available here.

Function Descriptions and syntax

Colony meets the requirements of the C++ Container, AllocatorAwareContainer, and ReversibleContainer concepts.

For the most part the syntax and semantics of colony functions are very similar to all std:: c++ libraries. Formal description is as follows:

template <class T, class Allocator = std::allocator<T>, typename plf::colony_priority Priority = plf::performance> class colony

T - the element type. In general T must meet the requirements of Erasable, CopyAssignable and CopyConstructible.
However, if emplace is utilized to insert elements into the colony, and no functions which involve copying or moving are utilized, T is only required to meet the requirements of Erasable.
If move-insert is utilized instead of emplace, T must also meet the requirements of MoveConstructible.

Allocator - an allocator that is used to acquire memory to store the elements. The type must meet the requirements of Allocator. The behaviour is undefined if Allocator::value_type is not the same as T.

Priority - either plf::performance or plf::memory_use. In the context of this implementation this changes the skipfield type from unsigned short (performance) to unsigned char (memory_use), unless the type sizeof is <= 10 bytes, in which case the skipfield type will always be unsigned char (this always performs better for these smaller types due to the effect on cache). The skipfield type is used to form the skipfield which skips over erased elements. The maximum size of element memory blocks is also constrained by this type's bit-depth (due to the nature of a jump-counting skipfield). As an example, unsigned short on most platforms is 16-bit which thereby constrains the size of individual memory blocks to a maximum of 65535 elements (the default size is less than this, 8192 elements, for performance reasons). unsigned short has been found to be the optimal type for performance with elements larger than 10 bytes, based on benchmarking. However this template parameter is also used in a variety of scenarios within functions relating to performance or memory usage. In the case of small collections (e.g. < 512 elements) in a memory-constrained environment, it can be useful to reduce the memory usage of the skipfield by reducing the skipfield bit depth to unsigned char. In addition the reduced skipfield size will reduce cache saturation in this case without impacting iteration speed due to the low total number of elements. Since these scenarios are on a per-case basis, it is best to leave the control in the user's hands for the most part.

Basic example of usage

#include <iostream>
#include "plf_colony.h"

int main(int argc, char **argv)
{
plf::colony<int> i_colony;

// Insert 100 ints: for (int i = 0; i != 100; ++i)
{
i_colony.insert(i);
}

// Erase half of them: for (plf::colony<int>::iterator it = i_colony.begin(); it != i_colony.end(); ++it)
{
it = i_colony.erase(it);
}

// Total the remaining ints: int total = 0;

for (plf::colony<int>::iterator it = i_colony.begin(); it != i_colony.end(); ++it)
{
total += *it;
}

std::cout << "Total: " << total << std::endl;
std::cin.get();
return 0;
}

Example demonstrating pointer stability

#include <iostream>
#include "plf_colony.h"

int main(int argc, char **argv)
{
plf::colony<int> i_colony;
plf::colony<int>::iterator it;
plf::colony<int *> p_colony;
plf::colony<int *>::iterator p_it;

// Insert 100 ints to i_colony and pointers to those ints to p_colony: for (int i = 0; i != 100; ++i)
{
it = i_colony.insert(i);
p_colony.insert(&(*it));
}

// Erase half of the ints: for (it = i_colony.begin(); it != i_colony.end(); ++it)
{
it = i_colony.erase(it);
}

// Erase half of the int pointers: for (p_it = p_colony.begin(); p_it != p_colony.end(); ++p_it)
{
p_it = p_colony.erase(p_it);
}

// Total the remaining ints via the pointer colony (pointers will still be valid even after insertions and erasures): int total = 0;

for (p_it = p_colony.begin(); p_it != p_colony.end(); ++p_it)
{
total += *(*p_it);
}

std::cout << "Total: " << total << std::endl;

if (total == 2500)
{
std::cout << "Pointers still valid!" << std::endl;
}

std::cin.get();
return 0;
}

Iterator Invalidation

All read-only operations, swap, std::swap, splice, operator= && (source), reserve, trim Never.
clear, operator= & (destination), operator= && (destination) Always.
reshape Only if memory blocks exist whose capacities do not fit within the supplied limits.
shrink_to_fit Only if capacity() != size().
erase Only for the erased element. If an iterator is == end() it will be invalidated if the back element of the colony is erased (similar to deque (22.3.8)). Likewise if a reverse_iterator is == rend() it will be invalidated if the front element of the colony is erased. The same applies with cend() and crend() for const_iterator and const_reverse_iterator respectively.
insert, emplace If an iterator is == end() or == begin() it may be invalidated by a subsequent insert/emplace. Likewise if a reverse_iterator is == rend() or == rbegin() it may be invalidated by a subsequent insert/emplace. The same applies with cend(), cbegin() and crend(), crbegin() for const_iterator and const_reverse_iterator respectively.

Member types

Member type Definition
value_type T
allocator_type Allocator
size_type Allocator::size_type (pre-c++11)
std::allocator_traits<Allocator>::size_type (post-c++11)
difference_type Allocator::difference_type (pre-c++11)
std::allocator_traits<Allocator>::difference_type (post-c++11)
reference Allocator::reference (pre-c++11)
value_type & (post-c++11)
const_reference Allocator::const_reference (pre-c++11)
const value_type & (post-c++11)
pointer Allocator::pointer (pre-c++11)
std::allocator_traits<Allocator>::pointer (post-c++11)
const_pointer Allocator::const_pointer (pre-c++11)
std::allocator_traits<Allocator>::const_pointer (post-c++11)
iterator BidirectionalIterator
const_iterator Constant BidirectionalIterator
reverse_iterator BidirectionalIterator
const_reverse_iterator Constant BidirectionalIterator

Constructors

Note: from this point onward, if a feature (eg. noexcept) is specified which is only available in a particular language version, that function is available for all language versions, but that aspect of it is only available when that language version or higher is used. Except where otherwise specified.

standard constexpr colony() noexcept(noexcept(allocator_type))
explicit colony(const allocator_type &alloc) noexcept

explicit colony(plf::colony_limits minmax) noexcept(noexcept(allocator_type))
colony(plf::colony_limits minmax, const allocator_type &alloc) noexcept
copy colony(const colony &source)

colony(const colony &source, const std::type_identity_t<allocator_type> &alloc)
move (C++11 and upwards) colony(colony &&source) noexcept

colony(colony &&source, std::type_identity_t<allocator_type> &alloc)

Note: postcondition state of source colony is the same as that of an empty colony.

fill colony(size_type n, const allocator_type &alloc = allocator_type())
colony(size_type n, plf::colony_limits minmax, const allocator_type &alloc = allocator_type())

colony(size_type n, const value_type &element, const allocator_type &alloc = allocator_type())
colony(size_type n, const value_type &element, plf::colony_limits minmax, const allocator_type &alloc = allocator_type())
range template<typename InputIterator>
colony(const InputIterator &first, const InputIterator &last, const allocator_type &alloc = allocator_type())
template<typename InputIterator>
colony(const InputIterator &first, const InputIterator &last, plf::colony_limits minmax, const allocator_type &alloc = allocator_type())
initializer list colony(const std::initializer_list<value_type> &element_list, const allocator_type &alloc = allocator_type())
colony(const std::initializer_list<value_type> &element_list, plf::colony_limits minmax, const allocator_type &alloc = allocator_type())
ranges v3 (C++20 and upwards) template<class range_type>
requires std::ranges::range<range_type>
colony(plf::ranges::from_range_t, range_type &&rg, const allocator_type &alloc = allocator_type()):

template<class range_type>
requires std::ranges::range<range_type>
colony(plf::ranges::from_range_t, range_type &&rg, const colony_limits block_limits, const allocator_type &alloc = allocator_type()):
Some constructor usage examples

Iterators

All iterators are bidirectional but also provide >, <, >= and <= for convenience (for example, in for loops). Functions for iterator, reverse_iterator, const_iterator and const_reverse_iterator follow:

operator *
operator ->
operator ++
operator --
operator =
operator ==
operator !=
operator <
operator >
operator <=
operator >=
operator <=> (C++20 and upwards)
base() (reverse_iterator and const_reverse_iterator only)

All operators have O(1) amortised time-complexity. Originally there were +=, -=, + and - operators, however the time complexity of these varied from O(n) to O(1) depending on the underlying state of the colony, averaging in at O(log n). As such they were not includable in the iterator functions (as per C++ standards). These have been transplanted to colony's advance(), next(), prev() and distance() member functions. Greater-than/lesser-than operator usage indicates whether an iterator is higher/lower in position compared to another iterator in the same colony (i.e. closer to the end/beginning of the colony). const_iterator is provided for all C++ language versions.

Member functions

Insert

single element iterator insert (value_type &val)
fill iterator insert (size_type n, value_type &val)
range template <class InputIterator> iterator insert (InputIterator first, InputIterator last)
move (C++11 and upwards) iterator insert (value_type&& val)
initializer list iterator insert (std::initializer_list<value_type> il)
ranges v3 (C++20 and upwards) template<class range_type>
requires std::ranges::range<range_type>
void insert_range(range_type &&the_range)

Assign

fill void assign (size_type n, value_type &val)
range template <class InputIterator> void assign (InputIterator first, InputIterator last)
initializer list void assign (std::initializer_list<value_type> il)
ranges v3 (C++20 and upwards) template<class range_type>
requires std::ranges::range<range_type>
void assign_range(range_type &&the_range)

Erase

single element iterator erase(const_iterator it)
range iterator erase(const_iterator first, const_iterator last)

Other functions

Non-member function overloads

Benchmarks

Benchmark results for colony v7.39 under GCC 11.2 x64 on an Intel 2018 i5 are here.
Benchmark results for colony v6.11 under GCC 9.2 x64 on an Intel Xeon E3-1241 (Haswell) are here.

Old benchmark results for colony v3.87 under GCC 5.1 x64 on an Intel E8500 (Core2) are here,
and under MSVC 2015 update 3 on an Intel Xeon E3-1241 (Haswell) are here. There is no commentary for the MSVC results.

Frequently Asked Questions

  1. TLDR, what is a colony?

    A combination of a linked-list of increasingly-large memory blocks with metadata. Part of this metadata is the head of a free-list of erased elements within that group, and also a 'next' pointer to the next group which has erased element locations which can be re-used. Another part of the metadata is a low complexity jump-counting skipfield, which is used to skip over erased elements during iteration in O(1) amortised time. The end result's a bidirectional, unordered C++ template data container which maintains positive performance characteristics while ensuring pointer stability to non-erased container elements.

  2. Where is it worth using a colony in place of other std:: containers?

    As mentioned, it is worthwhile for performance reasons in situations where the order of container elements is not important and:

    1. Insertion order is unimportant
    2. Insertions and erasures to the container occur frequently in performance-critical code, and
    3. Links to non-erased container elements may not be invalidated by insertion or erasure.

    Under these circumstances a colony will generally out-perform other std:: containers. In addition, because it never invalidates pointer references to container elements (except when the element being pointed to has been previously erased) it may make many programming tasks involving inter-relating structures in an object-oriented or modular environment much faster, and could be considered in those circumstances.

  3. What are some examples of situations where a colony might improve performance?

    Some ideal situations to use a colony: cellular/atomic simulation, persistent octtrees/quadtrees, game entities or destructible-objects in a video game, particle physics, anywhere where objects are being created and destroyed continuously. Also, anywhere where a vector of pointers to dynamically-allocated objects or a std::list would typically end up being used in order to preserve object references but where order is unimportant.

  4. What are the time complexities for general operations?

    In most cases the time complexity of operations results from the fundamental requirements of the container itself, not the implementation. In some cases, such as erase complexity, this can change a little based on the implementation, but the change is minor.

    Insert (single): O(1) amortised

    One of the requirements of colony is that pointers to non-erased elements stay valid regardless of insertion/erasure within the container. For this reason the container must use multiple memory blocks. If a single memory block were used, like in a std::vector, reallocation of elements would occur when the container expanded (and the elements were copied to a larger memory block). Hence, colony will insert into existing memory blocks when able, and create a new memory block when all existing memory blocks are full.

    Because colony (by default) has a growth pattern for new memory blocks, the number of insertions before a new memory block is required increases as more insertions take place. Hence the insertion time complexity is O(1) amortised.

    Insert (multiple): O(N) amortised

    This follows the same principle - there will occasionally be a need to create a new memory block, but since this is infrequent it is O(N) amortised.

    Erase (single): O(1) amortised (as of v5.00)

    Generally speaking erasure is a simple matter of destructing the element in question and updating the skipfield. Since we started using the low complexity jump-counting pattern (v5.00) instead of the high complexity jump-counting skipfield pattern, that skipfield update is always O(1). However, when a memory block becomes empty of non-erased elements it must be freed to the OS (or stored for future insertions, depending on implementation) and removed from the colony's sequence of memory blocks. It it was not, we would end up with non-O(1) amortised iteration, since there would be no way to predict how many empty memory blocks there would be between the current memory block being iterated over, and the next memory block with non-erased (active) elements in it. Since removal of memory blocks is infrequent the time complexity becomes O(1) amortised.

    Erase (multiple): O(N) amortised for non-trivially-destructible types, for trivially-destructible types between O(1) and O(N) amortised depending on range start/end (approximating O(log n) average)

    In this case, where the element is non-trivially destructible, the time complexity is O(N) amortised, with infrequent deallocation necessary from the removal of an empty memory block as noted above. However where the elements are trivially-destructible, if the range spans an entire memory block at any point, that block and it's skipfield can simply be removed without doing any individual writes to it's skipfield or individual destruction of elements, potentially making this a O(1) operation.

    In addition (when dealing with trivially-destructible types) for those memory blocks where only a portion of elements are erased by the range, if no prior erasures have occurred in that memory block you can erase that range in O(1) time, as there will be no need to check within the range for previously erased elements, and the low complexity jump-counting pattern only requires two skipfield writes to indicate a range of skipped nodes. The reason you would need to check for previously erased elements within that portion's range is so you can update the metadata for that memory block to accurately reflect how many non-erased elements remain within the block. If that metadata is not present, there is no way to ascertain when a memory block is empty of non-erased elements and hence needs to be removed from the colony. The reasoning for why empty memory blocks must be removed is included in the Erase(single) section, above.

    However in most cases the erase range will not perfectly match the size of all memory blocks, and with typical usage of a colony there is usually some prior erasures in most memory blocks. So for example when dealing with a colony of a trivially-destructible type, you might end up with a tail portion of the first memory block in the erasure range being erased in O(N) time, the second and intermediary memory block being completely erased and freed in O(1) time, and only a small front portion of the third and final memory block in the range being erased in O(N) time. Hence the time complexity for trivially-destructible elements approximates O(log n) on average, being between O(1) and O(N) depending on the start and end of the erasure range.

    std::find: O(n)

    This relies on basic iteration so is O(N).

    splice: O(1)

    Colony only does full-container splicing, not partial-container splicing (use range-insert with std::move to achieve the latter, albiet with the loss of pointer validity to the moved range). When splicing the memory blocks from the source colony are transferred to the destination colony without processing the individual elements or skipfields, inserted after the destination's final block. However if there are unused element memory spaces at the back of the destination container (ie. the final memory block is not full), the skipfield nodes corresponding to those empty spaces must be altered to indicate that these are skipped elements. Luckily as of v5.00 this is also a O(1) operation, hence the overall operation is O(1).

    Iterator operators ++ and --: O(1) amortised

    Generally the time complexity is O(1), as the skipfield pattern used (current) must allow for skipping of multiple erased elements. However every so often iteration will involve a transistion to the next/previous memory block in the colony's sequence of blocks, depending on whether we are doing ++ or --. At this point a read of the next/previous memory block's corresponding skipfield is necessary, in case the first/last element(s) in that memory block are erased and hence skipped. So for every block transition, 2 reads of the skipfield are necessary instead of 1. Hence the time complexity is O(1) amortised.

    Skipfields must be per-block and independent between memory blocks, as otherwise you would end up with a vector for a skipfield, which would need a range erased every time a memory block was removed from the colony (see notes under Erase above), and reallocation to larger skipfield memory block when the colony expanded. Both of these procedures carry reallocation costs, meaning you could have thousands of skipfield nodes needing to be reallocated based on a single erasure (from within a memory block which only had one non-erased element left and hence would need to be removed from the colony). This is unacceptable latency for any fields involving high timing sensitivity (all of SG14).

    begin()/end(): O(1)

    For any implementation these should generally be stored as member variables and so returning them is O(1).

    advance/next/prev: between O(1) and O(n), depending on current iterator location and distance. Average approximates O(log N).

    The reasoning for this is similar to that of Erase(multiple), above. Memory blocks which are completely traversed by the increment/decrement distance can be skipped in O(1) time by reading the metadata for the block which details the number of non-erased elements in the block. If the current iterator location is not at the very beginning of a memory block, then the distance within that block will need to be processed in O(N) time, unless no erasures have occurred within that block, in which case the skipfield does not need to be checked and traversal is simply a matter of pointer arithmetic and checking the distance between the current iterator position and the end of the memory block, which is a O(1) operation.

    Similarly for the final block involved in the traversal (if the traversal spans multiple memory blocks!) if no erasures have occurred in that memory block pointer arithmetic can be used and the procedure is O(1), otherwise it is O(N) (++ iterating over the skipfield). We can therefore say that the overall time complexity for these operations might approximate O(log N), while being between O(1) and O(N) amortised respectively.

  5. Is it similar to a deque?

    A deque is reasonably dissimilar to a colony - being a double-ended queue, it requires a different internal framework. It typically uses a vector of memory blocks, whereas the colony implementation uses a linked list of memory blocks, essentially. A deque can't technically use a linked list of memory blocks because it will make some random_access iterator operations (e.g. + operator) non-O(1). In addition, being random-access container, having a growth factor for memory blocks in a deque is problematic (not impossible though).

    A deque and colony have no comparable performance characteristics except for insertion (assuming a good deque implementation). Deque erasure performance varies wildly depending on the implementation compared to std::vector, but is generally similar to vector erasure performance. A deque invalidates pointers to subsequent container elements when erasing elements, which a colony does not.

  6. What are the thread-safe guarantees?

    Unlike a std::vector, a colony can be read from and inserted into at the same time (assuming different locations for read and write), however it cannot be iterated over and written to at the same time. If we look at a (non-concurrent implementation of) std::vector's thread-safe matrix to see which basic operations can occur at the same time, it reads as follows (please note push_back() is the same as insertion in this regard):

    std::vector Insertion Erasure Iteration Read
    Insertion No No No No
    Erasure No No No No
    Iteration No No Yes Yes
    Read No No Yes Yes

    In other words, multiple reads and iterations over iterators can happen simultaneously, but the potential reallocation and pointer/iterator invalidation caused by insertion/push_back and erasure means those operations cannot occur at the same time as anything else.

    Colony on the other hand does not invalidate pointers/iterators to non-erased elements during insertion and erasure, resulting in the following matrix:

    plf::colony Insertion Erasure Iteration Read
    Insertion No No No Yes
    Erasure No No No Mostly*
    Iteration No No Yes Yes
    Read Yes Mostly* Yes Yes

    * Erasures will not invalidate iterators unless the iterator points to the erased element.

    In other words, reads may occur at the same time as insertions and erasures (provided that the element being erased is not the element being read), multiple reads and iterations may occur at the same time, but iterations may not occur at the same time as an erasure or insertion, as either of these may change the state of the skipfield which's being iterated over. Note that iterators pointing to end() may be invalidated by insertion.

    So, colony could be considered more inherently thread-safe than a (non-concurrent implementation of) std::vector, but still has some areas which would require mutexes or atomics to navigate in a multithreaded environment.

  7. Any pitfalls to watch out for?

    1. Because erased-element memory locations will be reused by insert() and emplace(), insertion position is essentially random unless no erasures have been made, or an equal number of erasures and insertions have been made.
    2. Due to implementation reasons it is possible to assign (or construct) a iterator from a const_iterator, which is not really good const behaviour (only assigning/constructing a const_iterator from an iterator is good const practice). However preventing this results in errors in all versions of MSVC, so it has been ignored.
  8. Am I better off storing iterators or pointers to colony elements?

    Testing so far indicates that storing pointers and then using get_iterator() when or if you need to do an erase operation on the element being pointed to, yields better performance than storing iterators and performing erase directly on the iterator. This is simply due to the size of iterators (3 pointers in width).

  9. Any special-case uses?

    In the special case where many, many elements are being continually erased/inserted in realtime, you might want to experiment with limiting the size of your internal memory groups in the constructor. The form of this is as follows:
    plf::colony<object> a_colony(plf::colony_limits(500, 5000));
    where the first number is the minimum size of the internal memory groups and the second is the maximum size. Note these can be the same size, resulting in an unchanging group size for the lifetime of the colony (unless reshape is called or operator = is called).

    One reason to do this is that it is slightly slower to obtain an element location from the list of groups-with-erasures (subsequently utilising that group's free list of erased locations), than it is to insert a new element to the end of the colony (the default behaviour when there are no previously-erased elements). If there are any erased elements in the colony, the colony will recycle those memory locations, unless the entire group is empty, at which point it is freed to memory. So if a group size is large and many, many erasures occur but the group is not completely emptied, iterative performance may suffer due to large memory gaps between any two non-erased elements. In that scenario you may want to experiment with benchmarking and limiting the minimum/maximum sizes of the groups, and find the optimal size for a particular use case.

    Please note that the the fill, range and initializer-list constructors can also take group size parameters, making it possible to construct filled colonies using custom group sizes.

  10. What is colony's Abstract Data Type (ADT)?

    Though I am happy to be proven wrong I suspect colony is it's own abstract data type. While it is similar to a multiset or bag, those utilize key values and are not sortable (by means other than automatically by key value). Colony does not utilize key values, is sortable, and does not provide the sort of functionality one would find in a bag (e.g. counting the number of times a specific value occurs). Some have suggested similarities to deque - but as described earlier the three core aspects of colony are:

    1. A multiple-memory-block based allocation pattern which allows for the removal of memory blocks when they become empty of elements.
    2. A skipfield to indicate erasures instead of reallocating elements, the iteration of which should typically not necessitate the use of branching code.
    3. A mechanism for recording erased element locations to allow for reuse of erased element memory space upon subsequent insertion.

    The only aspect out of these which deque also shares is a multiple-memory-block allocation pattern - not a strong association. As a result, deques do not guarantee pointer validity to non-erased elements post insertion or erasure, as colony does. Similarly if we look at a multiset, an unordered one could be implemented on top of a colony by utilising a hash table (and would in fact be more efficient than most non-flat implementations) but the fact that there is a necessity to add something to make it a multiset (to take and store key values) means colony is not an multiset.

  11. Exception Guarantees?

    All operations which allocate memory have strong exception guarantees and will roll back if an exception occurs, except for operator = which has a basic exception guarantee (see below). For colony, iterators are bounded by asserts in debug mode, but unbounded in release mode, so it is possible for an incorrect use of an iterator (iterating past end(), for example) to trigger an out-of-bounds memory exception.

    The reason why operator = only has a basic guarantee is they do not utilize the copy-swap idiom, as the copy-swap idiom significantly increases the chance of any exception occurring - this is because the most common way an exception tends to occur during a copy operation is due to a lack of memory space, and the copy-swap idiom doubles the memory requirement for a copy operation by constructing the copy before destructing the original data.

    This is doubly inappropriate in game development, which colony has been initially for, where memory constraints are often critical and the runtime lag from memory spikes can cause detrimental game performance. So in the case of colony if a non-memory-allocation-based exception occurs during copy, the = operators will have already destructed their data, so the containers will be empty and cannot roll back - hence they have a basic guarantee, not a strong guarantee.

  12. Iterators not following rule of zero?

    It was found under GCC and clang that the high-modification scenario benchmarks suffered a 11% overall performance loss when iterator copy and move constructors/operators where not explicitly defined, possibly because GCC did not know to vectorize the operations without explicit code or because the compiler implemented copy/swap idiom instead of simply copying directly. No performance gain was found under any tests when removing the copy and move constructors/operators.

  13. Why must groups be removed when empty?

    Two reasons:

    1. Standards compliance: if groups aren't removed then ++ and -- iterator operations become O(random) in terms of time complexity, making them non-compliant with the C++ standard. At the moment they are O(1) amortised, typically one update for both skipfield and element pointers, but two if a skipfield jump takes the iterator beyond the bounds of the current group and into the next group. But if empty groups are allowed, there could be anywhere between 1 and size_type empty groups between the current element and the next. Essentially you get the same scenario as you do when iterating over a boolean skipfield. It would be possible to move these to the back of the colony as trailing groups, but this may create performance issues if any of the groups are not at their maximum size (see below).
    2. Performance: iterating over empty groups is slower than them not being present, cache wise - but also if you have to allow for empty groups while iterating, then you have to include a while loop in every iteration operation, which increases cache misses and code size. The strategy of removing groups when they become empty also removes (assuming randomised erasure patterns) smaller groups from the colony before larger groups, which has a net result of improving iteration, because with a larger group, more iterations within the group can occur before the end-of-group condition is reached and a jump to the next group (and subsequent cache miss) occurs. Lastly, pushing to the back of a colony is faster than recycling memory locations as each insertion occurs in subsequent similar memory location (which is cache-friendlier) and also less computational work is necessary. If a group is removed it's recyclable memory locations are also removed from re-use, hence subsequent insertions are more likely to be pushed to the back of the colony.
  14. Why use three pointers for iterators rather than a group pointer, single index and then dereferencing?

    It's faster. In attempting on two separate occasions to switch to an index-based iterator approach, utilising three separate pointers was consistently faster.

  15. Group sizes - what are they based on, how do they expand, etc

    Group sizes start from the either the default minimum size (8 elements, larger if the type stored is small) or an amount defined by the programmer (with a minimum of 3 elements). Subsequent group sizes then increase the total capacity of the colony by a factor of 2 (so, 1st group 8 elements, 2nd 8 elements, 3rd 16 elements, 4th 32 elements etcetera) until the maximum group size is reached. The default maximum group size is the maximum possible number that the skipfield bitdepth is capable of representing (std::numeric_limits<skipfield_type>::max()). By default the skipfield bitdepth is 16 so the maximum size of a group is 65535 elements. However the skipfield bitdepth is also a template parameter which can be set to any unsigned integer - unsigned char, unsigned int, Uint_64, etc. Unsigned short (guaranteed to be at least 16 bit, equivalent to C++11's uint_least16_t type) was found to have the best performance in real-world testing due to the balance between memory contiguousness, memory waste and the restriction on skipfield update time complexity. Initially the design also fitted the use-case of gaming better (as games tend to utilize lower numbers of elements than some other fields), and that was the primary development field at the time.

  16. Why store a size member for each group when this can be obtained via reinterpret_cast<element_pointer_type>(skipfield) - elements?

    Because it's faster. While it can be obtained that way, having it precalculated gives a small but significant benefit. And it's only an additional 16 bits per group under 32-bit code (under 64-bit code there is typically no additional size increase, due to struct padding and the size of pointers).

  17. Can a colony be used with SIMD instructions?

    No and yes. Yes if you're careful, no if you're not.
    On platforms which support scatter and gather operations you can use colony with SIMD as much as you want, using gather to load elements from disparate or sequential locations, into a SIMD register. Then use scatter to push the post-SIMD-process values elsewhere after. The data() function is useful for this.
    On platforms which require elements to be contiguous in memory for SIMD processing, this is more complicated. When you have a bunch of erasures in a colony, there's no guarantee that your objects will be contiguous in memory, even though they are sequential during iteration. Some of them may also be in different memory blocks to each other. In these situations if you want to use SIMD with colony, you must do the following:

    Generally if you want to use SIMD without gather/scatter, it's probably preferable to use a vector or an array.



Version history

Contact: footer
plf:: library and this page Copyright (c) 2023, Matthew Bentley