Home

PLF C++ Library - plf::colony

Introduction

A colony is the highest-performance C++ template-based data container for high-modification scenarios with unordered data. Specifically it provides better performance than other std:: library containers when:

  1. Insertion order is unimportant,
  2. Insertions and erasures to the container are occurring frequently in realtime ie. in performance-critical code, and/or
  3. Pointers which point to non-erased container elements must not be invalidated by insertion or erasure.

While the benchmarks in the section below are a better area to get a feel for the performance benefits, the general speed characteristics are:

As explored further in the benchmarks there are some vector/deque modifications which can outperform colony during iteration while maintaining pointer validity, but at a cost to usability and memory usage. Colony's other advantages include the freeing and recycling of unused memory on-the-fly, the guaranteed stability of pointers/references/iterators to non-erased elements (which makes programming with containers of inter-related data structures faster and much easier), and broad cross-compiler support.

It can be reasonably compared to a "bag" or "bucket-array" styled structure, where each element location can be emptied and skipped over. Unlike a bucket array it does not use keys, iterates faster due to the use of a jump-counting skipfield instead of a boolean field, and frees or recycles unused memory from erased element locations on the fly. As it was initially developed predominantly for game development, colony favours larger-than-scalar-type (structs/classes of greater-than 128 bits total) performance over scalar-type (float, int, etc) performance.

Motivation

Note: initial motivation for the project came from video game engine development. Since this point it has become a more generalized container. Nevertheless, the initial motivation is presented below.
When working on video game engines we are predominantly dealing with collections of data where:

  1. Elements within data collections refer to elements within other data collections (through a variety of methods - indices, pointers, etc). An example is a game entity referring to both a texture object and collision blocks, as well as sound data. These references must stay valid throughout the course of the game/level. For this reason, any container (or use of a container) which causes pointer or index invalidation can create difficulties or necessitate workarounds.
  2. Order is unimportant for the most part. The majority of data collections are simply iterated over, transformed, referred to and utilized with no regard to order.
  3. Erasing or otherwise removing or deactivating objects occurs frequently in-game and in realtime (though often erasures will be implemented to occur at the end of a frame due to multithreading concerns). An example could be destroying a wall, or a game enemy. For this reason methods of erasure which create strong performance penalties are avoided.
  4. Creating new objects and adding them into the gameworld on-the-fly is also common - for example, a tree which drops leaves every so often, or a quadtree.
  5. We don't always know in advance how many elements there will be in a container at the beginning of development, or even at the beginning of a level during playback - an example of this being a MMORPG (massively multiplayer online role-playing game). In a MMORPG the number of game entities fluctuates based on how many players are playing, though there may be maximum capacity limits. Genericized game engines in particular have to adapt to considerably different user requirements and scopes. For this reason extensible containers which can expand and contract in realtime are usually necessary.
  6. Depending on the complexity and scope of any given game, we can be dealing with anywhere between 10 and 100000 objects in a given area. We are not typically dealing with the types of capacities necessitated by large-scale mathematical or statistical applications.
  7. For performance reasons, memory storage which is more-or-less contiguous is preferred. Lists, vectors of pointers to dynamically-allocated objects, and maps as implemented in the standard library are unusable.
  8. Memory wastage is avoided, and in particular, any container which allocates upon initialisation tends to be avoided as this can incur purposeless memory and performance costs.

To meet these requirements, game developers tend to either (a) develop their own custom containers for given scenario or (b) develop workarounds for the failings of std::vector. These workarounds are many and varied, but the most common are probably:

  1. Using a boolean flag (or similar) to indicate the inactivity of an object (as opposed to actually erasing from the vector). When erasing, one simply adjusts the boolean flag, and when iterating, items with the adjusted boolean flag are skipped. External elements refer to elements within the container via indexes rather than pointers (which can be invalidated upon insertion).

    Advantages: Fast erasure.

    Disadvantages: Slow to iterate due to branching.

  2. Utilizing a vector of data with a secondary vector of indexes. When erasing, the erasure occurs in the vector of indexes, not the vector of data, and when iterating, one iterates over the vector of indexes, then accessing the data from the vector of data via the index.

    Advantages: Faster iteration.

    Disadvantages: Erasure still incurs some reallocation cost, can increase jitter.

  3. Combining a swap-and-pop mechanism with some form of dereferenced lookup system to enable contiguous element iteration (sometimes known as a 'packed array', various other names). In this case when erasing we swap the back element from the vector of elements with the element being erased, then pop the (swapped) back element. When iterating over the data we simply iterate through the vector of elements. However to maintain valid external links to elements (which may be moved at any time) we must also maintain a stable vector of indexes (or similar) and update the index numbers corresponding with the element being erased and the element being moved. In this case external objects referring to elements within the container must store a pointer to the index for the element in question. There are many varied alternatives to this method. The method is more useful when the container's data is mostly being iterated over with fewer references to the individual elements from outside objects.

    Advantages: Iteration is at standard vector speed.

    Disadvantages: Erase could be slow if objects are large and swap cost is therefore large. Insertion cost is larger than other techniques. All references to elements incur additional costs due to a two-fold reference mechanism.


All three techniques have the disadvantage of slow singular insertions, and the first two will also continually expand memory usage when erasing and inserting over periods of time. The third deals better with this scenario as it swaps from the back rather than leaving gaps in the elements vector, however will suffer in performance if elements within the container are heavily referred to by external objects/elements.

Colony is an attempt to bring a more generic solution to this domain. It has the advantage of good iteration speed while maintaining a similar erasure speed to the boolean technique described above, and without causing pointer invalidation upon either erasure or insertion. It's insertion speed is also much faster than a vector's. Memory from erased elements is either reused by subsequent insertions or released to the OS on-the-fly. It achieves these ends by via a number of new approaches: a jump-counting skipfield (instead of a boolean field), a linked chain of increasingly-large memory blocks (instead of a singular memory block or vector of blocks), and a custom internal reduced stack (based on plf::stack) to enable erased memory location re-use.

More data on the performance characteristics of the colony versus other containers and workarounds can be found in the benchmarks section, read below to gain an understanding of the mechanics.

Implementation

plf::colony uses a chained-group memory allocation pattern with a growth factor of 2, (doubly-linked chains of element "groups" containing memory blocks with additional structure metadata, including in this case per-memory-block jump-counting skipfields). A growth factor of 2 tends gives the best performance in a majority of scenarios (hence why it is used for the majority of vector implementations). Colony's minimum and maximum memory block sizes can also be adjusted to suit individual use-cases. Using a multiple-memory-block approach removes the necessity for data reallocation upon insertion, and because data is not reallocated, all references/pointers/iterators to container elements stay valid post-insertion.

Due to a std::vector being the most widely-used and commonly-understood of the std:: library containers, we here contrast the storage mechanisms of a colony with that of a vector:

Visual demonstration of inserting to a full vector Visual demonstration of inserting to a full colony

The lack of reallocation is also why insertion into a colony is faster than insertion into a std::vector. Now we look at erasure. When an element is erased in a std::vector, the following happens:

Visual demonstration of randomly erasing from a vector

When an element is erased in a colony, this happens:

Visual demonstration of randomly erasing from a colony

This is why a colony has a performance advantage over std::vector when erasing.

Upon subsequent insertions to a colony post-erasure, colony will check to see if it's erased-location stack is empty. If it is empty, it inserts to the end of the colony, creating a new group if the last group is full. If it is not empty, it pops an erased element location off the stack and reuses that memory address for the newly-inserted element. If you erase all elements in any given group in a colony, the group is removed from the colony's chain and released to the OS - at that point any erased element locations present in the erased-location stack are also removed.

License

plf::colony is under a permissive zlib license. This means: Software is used on 'as-is' basis. It can be modified and used in commercial software. Authors are not liable for any damages arising from its use. The distribution of a modified version of the software is subject to the following restrictions:

Download

Download here (31kb zip file) or view the repository

The colony library is a simple .h header file, to be included with a #include command.

In addition if you are interested in benchmarking you can also download the plf benchmark suite (44kb zip file), which includes plf::nanotimer.

Function Descriptions and syntax

Colony meets the requirements of the C++ Container, AllocatorAwareContainer, and ReversibleContainer concepts.

For the most part the syntax and semantics of colony functions are very similar to all std:: c++ libraries. Formal description is as follows:

template <class T, class Allocator = std::allocator<T>, typename Skipfield_Type = unsigned short> class colony

T - the element type. In general T must meet the requirements of Erasable, CopyAssignable and CopyConstructible.
However, if emplace is utilized to insert elements into the colony, and no functions which involve copying or moving are utilized, T is only required to meet the requirements of Erasable.
If move-insert is utilized instead of emplace, T must also meet the requirements of MoveConstructible .

Allocator - an allocator that is used to acquire memory to store the elements. The type must meet the requirements of Allocator. The behavior is undefined if Allocator::value_type is not the same as T.

T_skipfield_type - an unsigned integer type. This type is used to form the skipfield which skips over erased T elements. The maximum size of element memory blocks is constrained by this type's bit-depth (due to the nature of a jump-counting skipfield). As an example, unsigned short on most platforms is 16-bit which thereby constrains the size of individual memory blocks to a maximum of 65535 elements. unsigned short has been found to be the optimal type for performance based on benchmarking. However this template parameter is important for a variety of scenarios relating to performance or memory usage. In the case of small collections (eg. < 512 elements) in a memory-constrained environment, it is useful to reduce the memory usage of the skipfield by reducing the skipfield bit depth to a Uint8 type. In addition the reduced skipfield size will reduce cache saturation in this case without impacting iteration speed due to the low element total. In the case of very large collections (millions) and where memory usage is not a concern, changing the skipfield bitdepth to a larger type leads to slightly increased iteration performance (offset by decreased erasure and insertion performance) due to the larger memory block sizes made possible by the larger bit depth. Since these scenarios are on a per-case basis, it is best to leave the control in the user's hands.

Basic example of usage

#include <iostream>
#include "plf_colony.h"

int main(int argc, char **argv)
{
  plf::colony<int> i_colony;

  // Insert 100 ints:
  for (int i = 0; i != 100; ++i)
  {
    i_colony.insert(i);
  }

  // Erase half of them:
  for (plf::colony<int>::iterator it = i_colony.begin(); it != i_colony.end(); ++it)
  {
    it = i_colony.erase(it);
  }

  // Total the remaining ints:
  int total = 0;

  for (plf::colony<int>::iterator it = i_colony.begin(); it != i_colony.end(); ++it)
  {
    total += *it;
  }

  std::cout << "Total: " << total << std::endl;
  std::cin.get();
  return 0;
} 

Example demonstrating pointer stability

#include <iostream>
#include "plf_colony.h"

int main(int argc, char **argv)
{
  plf::colony<int> i_colony;
  plf::colony<int>::iterator it;
  plf::colony<int *> p_colony;
  plf::colony<int *>::iterator p_it;

  // Insert 100 ints to i_colony and pointers to those ints to p_colony:
  for (int i = 0; i != 100; ++i)
  {
    it = i_colony.insert(i);
    p_colony.insert(&(*it));
  }

  // Erase half of the ints:
  for (it = i_colony.begin(); it != i_colony.end(); ++it)
  {
    it = i_colony.erase(it);
  }

  // Erase half of the int pointers:
  for (p_it = p_colony.begin(); p_it != p_colony.end(); ++p_it)
  {
    p_it = p_colony.erase(p_it);
  }

  // Total the remaining ints via the pointer colony (pointers will still be valid even after insertions and erasures):
  int total = 0;

  for (p_it = p_colony.begin(); p_it != p_colony.end(); ++p_it)
  {
    total += *(*p_it);
  }

  std::cout << "Total: " << total << std::endl;
  
  if (total == 2500)
  {
    std::cout << "Pointers still valid!" << std::endl;
  }
  
  std::cin.get();
  return 0;
} 

Iterator Invalidation

All read-only operations, swap, std::swap Never
clear, sort, reinitialize, operator = Always
reserve, shrink_to_fit Only if capacity is changed
change_group_sizes, change_minimum_group_size, change_maximum_group_size Only if supplied minimum group size is larger than smallest group in colony, or supplied maximum group size is smaller than largest group in colony.
erase Only for the erased element. If an iterator is == end() it may be invalidated if the last element in the colony is erased, in some cases. If a reverse_iterator is == rend() it may be invalidated if the first element in the colony is erased, in some cases.
insert, emplace If an iterator is == end() it may be invalidated by a subsequent insert/emplace, in some cases.

Member types

Member type Definition
value_type T
allocator_type Allocator
skipfield_type T_skipfield_type
size_type Allocator::size_type (pre-c++11)
std::allocator_traits<Allocator>::size_type (post-c++11)
difference_type Allocator::difference_type (pre-c++11)
std::allocator_traits<Allocator>::difference_type (post-c++11)
reference Allocator::reference (pre-c++11)
value_type & (post-c++11)
const_reference Allocator::const_reference (pre-c++11)
const value_type & (post-c++11)
pointer Allocator::pointer (pre-c++11)
std::allocator_traits<Allocator>::pointer (post-c++11)
const_pointer Allocator::const_pointer (pre-c++11)
std::allocator_traits<Allocator>::const_pointer (post-c++11)
iterator BidirectionalIterator
const_iterator Constant BidirectionalIterator
reverse_iterator BidirectionalIterator
const_reverse_iterator Constant BidirectionalIterator

Constructors

colony() explicit colony(const allocator_type &alloc)
fill explicit colony(size_type n, Skipfield_type min_group_size = 8, Skipfield_type max_group_size = std::numeric_limits<Skipfield_type>::max(), const allocator_type &alloc = allocator_type())
explicit colony(size_type n, const value_type &element, Skipfield_type min_group_size = 8, Skipfield_type max_group_size = std::numeric_limits<Skipfield_type>::max(), const allocator_type &alloc = allocator_type())
range template<typename InputIterator> colony(const InputIterator &first, const InputIterator &last, Skipfield_type min_group_size = 8, Skipfield_type max_group_size = std::numeric_limits<Skipfield_type>::max(), const allocator_type &alloc = allocator_type())
copy colony(const colony &source)
move colony(colony &&source) noexcept (C++11 and upwards)
initializer list colony(const std::initializer_list<value_type> &element_list, Skipfield_type min_group_size = 8, Skipfield_type max_group_size = std::numeric_limits<Skipfield_type>::max(), const allocator_type &alloc = allocator_type())
Some constructor usage examples

Iterators

All iterators are bidirectional but also provide >, <, >= and <= for convenience (for example, in for loops). Functions for iterator, reverse_iterator, const_iterator and const_reverse_iterator follow:

operator *
operator ->
operator ++
operator --
operator =
operator ==
operator !=
operator <
operator >
operator <=
operator >=
base() (reverse_iterator and const_reverse_iterator only)

All operators have O(1) amortised time-complexity. Originally there were +=, -=, + and - operators, however the time complexity of these varied from O(n) to O(1) depending on the underlying state of the colony, averaging in at O(log n). As such they were not includable in the iterator functions (as per C++ standards). These have been transplanted to colony's advance(), next(), prev() and distance() member functions. Greater-than/lesser-than operator usage indicates whether an iterator is higher/lower in position compared to another iterator in the same colony (ie. closer to the end/beginning of the colony). const_iterator is provided for both C++03 and C++11/14/etc compilers.

Member functions

Insert

single element iterator insert (const value_type &val)
fill iterator insert (const size_type n, const value_type &val)
range template <class InputIterator> iterator insert (const InputIterator first, const InputIterator last)
move iterator insert (value_type&& val) (C++11 and upwards)
initializer list iterator insert (const std::initializer_list<value_type> il)

Erase

single element iterator erase(const_iterator it)
range void erase(const_iterator first, const_iterator last)

Other functions

Non-member functions

Benchmarks

Benchmark results for colony under GCC 7.1 x64 on an Intel Xeon E3-1241 (Haswell) are here.

Older benchmark results for colony under GCC 5.1 x64 on an Intel E8500 (Core2) are here.

Older benchmark results for colony under MSVC 2015 update 3, on an Intel Xeon E3-1241 (Haswell) are here. There is no commentary for the MSVC results.

Frequently Asked Questions

  1. TLDR, what is a colony?

    A combination of a linked-list of increasingly-large memory blocks with metadata, combined with a jump-counting skipfield, combined with an erased-element memory-location stack for later reinsertions, resulting in a std::-styled C++ template data container with positive performance characteristics while maintaining pointer stability to container elements.

  2. Where is it worth using a colony in place of other std:: containers?

    As mentioned, it is worthwhile for performance reasons in situations where the order of container elements is not important and:

    1. Insertion order is unimportant
    2. Insertions and erasures to the container occur frequently in performance-critical code, and
    3. Links to non-erased container elements may not be invalidated by insertion or erasure.

    Under these circumstances a colony will generally out-perform other std:: containers. In addition, because it never invalidates pointer references to container elements (except when the element being pointed to has been previously erased) it may make many programming tasks involving inter-relating structures in an object-oriented or modular environment much faster, and could be considered in those circumstances.

  3. What are some examples of situations where a colony might improve performance?

    Some ideal situations to use a colony: cellular/atomic simulation, persistent octtrees/quadtrees, game entities or destructible-objects in a video game, particle physics, anywhere where objects are being created and destroyed continuously. Also, anywhere where a vector of pointers to dynamically-allocated objects or a std::list would typically end up being used in order to preserve object references but where order is unimportant.

  4. What situations should you explicitly not use a colony for?

    A colony should not be used as a stack, ie. erasing backwards from the back, and then filling, then erasing from the back, etcetera. In this case you should use a stack-capable container ie. plf::stack, std::vector or std::deque. The reason is that erasing backwards sequentially creates the greatest time complexity for skipfield updates, as does reinserting to the start of a sequentially-erased skipblock (which is what stack usage will entail). This effect is mitigated somewhat if an entire group is erased, in which case it is released to the OS and subsequent insertions will simply be pushing to back without the need to update a skipfield, but you'd still incur the skipfield update cost during erasure. Although in practice the performance difference is small due to the low cost of the operations and the small size of the skipfield, in general you should avoid erasing sequentially during reverse iteration except where neccessary, as the jump-counting skipfield format is optimized for either random erasure or sequential erasure while iterating forwards.

  5. What are the time complexities for general operations?

    Insert (single): O(1) amortised unless prior erasures have occurred in theusage lifetime of the colony instance. If prior erasures have occurred,updating the skipfield may require a memmove operation, which creates avariable time complexity depending on the range of skipfield needing to becopied (though in practice this will resolve to a singular raw memory blockcopy in most scenarios). This is O(random) with the range of the random numberbeing between O(1) and O(std::numeric_limits<skipfield_type>::max() - 2).Average time complexity varies based on erasure pattern, but with a randomerasure pattern it's closer to O(1) amortized.

    Insert (multiple): O(N) unless prior erasures have occurred. See Insertion(single) for rules in this case.

    Erase (single): If erasures to elements consecutive with the element being erased have not occurred, or only consecutive erasures before the element being erased have occurred, O(1) amortised. If consecutive erasures after the element being erased have occurred, updating of the skipfield requires a memmove operation or vectorized update of O(n) complexity, where n is the number of consecutive erased elements after the element being erased. This is O(random) with the range of the random number being between from 1 and std::numeric_limits<skipfield_type>::max() - 2. Average time complexity varies based on erasure pattern, but with a random erasure pattern it's closer to O(1) amortized.

    Erase (multiple): ~O(log N).

    std::find: O(n)

    Iterator operations:
    ++ and -- : O(1) amortized
    begin()/end(): O(1)
    advance/next/prev: between O(1) and O(n), depending on current location, end location and state of colony. Average ~O(log N).

  6. If the time complexities of the insert/erase functions are (mostly) O(random, ranged), why are they still fast?

    The time complexities are largely based on the skipfield updates necessary for the jump-counting skipfield pattern. The skipfield for each group is contiguous and separate from the skipfields for other groups, and so fits into the cache easily (unless the skipfield type is large), thus any changes to it can occur quickly - time complexity is no indicator of performance on a modern CPU for anything less than very large amounts of N (or when the type of N is large). The colony implementation uses memmove to modify the skipfield instead of iterative updates for all but one of the insert/erase operations, which decreases performance cost. memmove will typically be implemented as a single raw memory chunk copy. There is one rarer case in erase which does not use memmove, when an element is erased and is surrounded on both sides by consecutive erased elements. In this case it isn't possible to update the skipfield using memmove because the requisite numbers do not exist in the skipfield and therefore cannot be copied, so it is implemented as a vectorized iterative update instead. Again, due to a low amount of branching and the small size of each skipfield the time taken for this update is still small.

  7. Is it similar to a deque?

    A deque is reasonably dissimilar to a colony - being a double-ended queue, it requires a different internal framework. It typically uses a vector of memory blocks, whereas the colony implementation uses a linked list of memory blocks, essentially. A deque can't technically use a linked list of memory blocks because it will make some random_access iterator operations (eg. + operator) non-O(1). In addition, being a double-ended queue makes having a growth factor for memory blocks problematic because the rules for growth at each end of the queue become difficult to implement in a way which increase performance for all scenarios without memory bloat.

    A deque and colony have no comparable performance characteristics except for insertion (assuming a good deque implementation). Deque erasure performance varies wildly depending on the implementation compared to std::vector, but is generally similar to vector erasure performance. A deque invalidates pointers to subsequent container elements when erasing elements, which a colony does not.

  8. What are the thread-safe guarantees?

    Unlike a std::vector, a colony can be read from and written to at the same time (assuming different locations for read and write), however it cannot be iterated over and written to at the same time. If we look at a (non-concurrent implementation of) std::vector's threadsafe matrix to see which basic operations can occur at the same time, it reads as follows (please note push_back() is the same as insertion in this regard):

    std::vector Insertion Erasure Iteration Read
    Insertion No No No No
    Erasure No No No No
    Iteration No No Yes Yes
    Read No No Yes Yes

    In other words, multiple reads and iterations over iterators can happen simultaneously, but the potential reallocation and pointer/iterator invalidation caused by insertion/push_back and erasure means those operations cannot occur at the same time as anything else.

    Colony on the other hand does not invalidate pointers/iterators to non-erased elements during insertion and erasure, resulting in the following matrix:

    plf::colony Insertion Erasure Iteration Read
    Insertion No No No Yes
    Erasure No No No Mostly*
    Iteration No No Yes Yes
    Read Yes Mostly* Yes Yes

    * Erasures will not invalidate iterators unless the iterator points to the erased element.

    In other words, reads may occur at the same time as insertions and erasures (provided that the element being erased is not the element being read), multiple reads and iterations may occur at the same time, but iterations may not occur at the same time as an erasure or insertion, as either of these may change the state of the skipfield which's being iterated over. Note that iterators pointing to end() may be invalidated by insertion.

    So, colony could be considered more inherently threadsafe than a (non-concurrent implementation of) std::vector, but still has some areas which would require mutexes or atomics to navigate in a multithreaded environment.

  9. Any pitfalls to watch out for?

    1. Because erased-element memory locations will be reused by insert() and emplace(), insertion position is essentially random unless no erasures have been made, or an equal number of erasures and insertions have been made.
    2. For architectural reasons (empty groups cannot be present), reserve can only reserve a number of elements up to the maximum bit-depth of the skipfield type.
    3. Due to implementation reasons it is possible to assign (or construct) a iterator from a const_iterator, which is not really good const behaviour (only assigning/constructing a const_iterator from an iterator is good const practice). However preventing this results in errors in all versions of MSVC, so it has been ignored.
  10. Am I better off storing iterators or pointers to colony elements?

    Testing so far indicates that storing pointers and then using get_iterator_from_pointer() when or if you need to do an erase operation on the element being pointed to, yields better performance than storing iterators and performing erase directly on the iterator. This is simply due to the size of iterators (3 pointers)

  11. Any special-case uses?

    In the special case where many, many elements are being continually erased/inserted in realtime, you might want to experiment with limiting the size of your internal memory groups in the constructor. The form of this is as follows:
    plf::vector<object> a_vector;
    a_vector.change_group_sizes(500, 5000);

    where the first number is the minimum size of the internal memory groups and the second is the maximum size. Note these can be the same size, resulting in an unchanging group size for the lifetime of the colony (unless change_group_sizes is called again or operator = is called).

    One reason to do this is that it is slightly slower to pop an element location off the internal erased-location-recycling stack, than it is to insert a new element to the end of the colony (the default behaviour when there are no previously-erased elements). If there are any erased elements in the colony, the colony will recycle those memory locations, unless the entire group is empty, at which point it is freed to memory. So if a group size is large and many, many erasures occur but the group is not completely emptied, (a) the number of erased element locations in the recycling stack could get large and increase memory usage and (b) iteration performance may suffer due to large memory gaps between any two non-erased elements. In that scenario you may want to exeriment with benchmarking and limiting the minimum/maximum sizes of the groups, and find the optimal size for a specific use case.

    Please note that the the fill, range and initializer-list constructors can also take group size parameters, making it possible to construct filled colonies using custom group sizes.

  12. What is colony's Abstract Data Type (ADT)?

    Though I am happy to be proven wrong I suspect colony is it's own abstract data type. While it is similar to a multiset or bag, those utilize key values and are not sortable (by means other than automatically by key value). Colony does not utilize key values, is sortable, and does not provide the sort of functionality one would find in a bag (eg. counting the number of times a specific value occurs). Some have suggested simiarities to deque - but as described earlier the three core aspects of colony are:

    1. A multiple-memory-block based allocation pattern which allows for the removal of memory blocks when they become empty of elements.
    2. A skipfield to indicate erasures instead of reallocating elements, the iteration of which should typically not necessitate the use of branching code.
    3. A mechanism for recording erased element locations to allow for reuse of erased element memory space upon subsequent insertion.

    The only aspect out of these which deque also shares is a multiple-memory-block allocation pattern - not a strong association. As a result, deques do not guarantee pointer validity to non-erased elements post insertion or erasure, as colony does. Similarly if we look at a multiset, an unordered one could be implemented on top of a colony by utilizing a hash table (and would in fact be more efficient than most non-flat implementations) but the fact that there is a necessity to add something to make it a multiset (to take and store key values) means colony is not an multiset.

  13. Exception Guarantees?

    All operations which allocate memory have strong exception guarantees and will roll back if an exception occurs, except for operator = which has a basic exception guarantee (see below). For colony, iterators are bounded by asserts in debug mode, but unbounded in release mode, so it is possible for an incorrect use of an iterator (iterating past end(), for example) to trigger an out-of-bounds memory exception. These are the only two areas where exceptions can occur.

    The reason why operator = only has a basic guarantee is they do not utilize the copy-swap idiom, as the copy-swap idiom significantly increases the chance of any exception occurring - this is because the most common way an exception tends to occur during a copy operation is due to a lack of memory space, and the copy-swap idiom doubles the memory requirement for a copy operation by constructing the copy before destructing the original data. This is doubly innappropriate in game development, which colony has been initially for, where memory constraints are often critical and the runtime lag from memory spikes can cause detrimental game performance. So in the case of colony if a non-memory-allocation-based exception occurs during copy, the = operators will have already destructed their data, so the containers will be empty and cannot roll back - hence they have a basic guarantee, not a strong guarantee.

  14. Iterators not following rule of zero?

    It was found under GCC and clang that the high-modification scenario benchmarks suffered a 11% overall performance loss when iterator copy and move constructors/operators where not explicitly defined, possibly because GCC did not know to vectorise the operations without explicit code or because the compiler implemented copy/swap idiom instead of simply copying directly. No performance gain was found under any tests when removing the copy and move constructors/operators.

  15. Why must groups be removed when empty?

    Two reasons:

    1. Standards compliance: if groups aren't removed then ++ and -- iterator operations become O(random) in terms of time complexity, making them non-compliant with the C++ standard. At the moment they are O(1) amortised, typically one update for both skipfield and element pointers, but two if a skipfield jump takes the iterator beyond the bounds of the current group and into the next group. But if empty groups are allowed, there could be anywhere between 1 and size_type empty groups between the current element and the next. Essentially you get the same scenario as you do when iterating over a boolean skipfield. While it is possible to move these to the back of the colony as trailing groups, and remove their entries from the erased locations stack, this may create performance issues if any of the groups are not at their maximum size (see below).
    2. Performance: iterating over empty groups is slower than them not being present, cache wise - but also if you have to allow for empty groups while iterating, then you have to include a while loop in every iteration operation, which increases cache misses and code size. The strategy of removing groups when they become empty also removes (assuming randomized erasure) smaller groups from the colony before larger groups, which has a net result of improving iteration (as with a larger group, more iterations within the group can occur before the end-of-group condition is reached and a transfer to the next group (and subsequent cache miss) occurs). Lastly, pushing to the back of a colony is faster than recycling memory locations as each insertion occurs to a similar memory location and less work is necessary. When a group is removed or moved to the back of the group chain (past end()), it's recyclable memory locations are also removed from memory, hence subsequent insertions are more likely to be pushed to the back of the colony.
  16. Why use a stack instead of a free list for erased location storage?

    Two reasons:

    1. The most obvious is that unions in C++ (11 or otherwise) do not support object unions in a way that preserves fundamental functionality of the object being unioned (n4567, 9.5/2). Copy/move assignment/constructors, constructors and destructors would be lost. This makes this approach untenable for non-trivial types.
    2. Colony elements are not guaranteed to be of sufficient size (32-bit on x86, 64-bit or greater on other architectures) to store a pointer. For smaller types and 64-bit pointers, a free list union would end up wasting more space than a 16-bit skipfield. While the use-cases in games indicate larger-than-scalar types, there has been some interest and indication of usefulness from the high-performance-computing and storage domains, and I cannot predict their data patterns or usage.
    3. A free list will likely incur a significant number of cache misses and subsequent jitter when a colony memory block is freed to the OS. With a stack the erased locations are contiguous memory blocks, so it is extremely fast and cache-friendly to process the stack and remove erased locations belonging to the memory block being freed. On the other hand with a free list, removing the erased locations involves iterating through the list - which will most likely skip randomly throughout the colony blocks, involving many cache misses - in order to locate the nodes belonging to the erased memory block and joining previous nodes to the next non-erased-block locations.

    In short, it's probably not an effective solution in this case.

  17. Why use three pointers for iterators rather than a group pointer, single index and then dereferencing?

    It's faster. In attempting on two separate occasions to switch to an index-based iterator approach, utilizing three separate pointers was notably faster.

  18. Group sizes - what are they based on, how do they expand, etc

    Group sizes start from the either the default minimum size (8 elements, larger if the type stored is small) or an amount defined by the programmer (with a minimum of 3 elements). Subsequent group sizes then increase the total capacity of the colony by a factor of 2 (so, 1st group 8 elements, 2nd 8 elements, 3rd 16 elements, 4th 32 elements etcetera) until the maximum group size is reached. The default maximum group size is the maximum possible number that the skipfield bitdepth is capable of representing (std::numeric_limits<skipfield_type>::max()). By default the skipfield bitdepth is 16 so the maximum size of a group is 65535 elements. However the skipfield bitdepth is also a template parameter which can be set to any unsigned integer - unsigned char, unsigned int, Uint_64, etc. Unsigned short (guaranteed to be at least 16 bit, equivalent to C++11's uint_least16_t type) was found to have the best performance in real-world testing due to the balance between memory contiguousness, memory waste and the restriction on skipfield update time complexity. Initially the design also fitted the use-case of gaming better (as games tend to utilize lower numbers of elements than some other fields), and that was the primary development field at the time.

  19. Why store a size member for each group when this can be obtained via reinterpret_cast<element_pointer_type>(skipfield) - elements?

    Because it's faster. While it can be obtained that way, having it precalculated gives a small but significant benefit. And it's only an additional 16 bits per group in most cases.



Version history

Contact: footer
plf:: library and this page Copyright (c) 2017, Matthew Bentley