PLF C++ Library - plf::indiesort

Intro
Motivation
Implementation
License
Download
Function syntax
Benchmarks
Version History
Contact

Introduction

Indiesort (portmanteau of 'indirect sort') is a sort-function wrapper which enables use of std::sort (or other random-access sort techniques) with non-random-access containers, and also improves sort performance for large/non-trivially-copyable types in random-access containers and arrays. It achieves this by sorting pointers/indexes to elements by the value of the elements they point to, then relocating the elements, rather than sorting the elements themselves. Elements will be moved a maximum of two times - most will be moved once or not at all.

It has a temporary memory cost of N * (sizeof(pointer) + sizeof(size_t)) for sorting non-random-access iterators/containers (or N * sizeof(element) when sizeof(element) is smaller than (sizeof(pointer) + sizeof(size_t))),
and a N * sizeof(U) cost for random-access iterators/containers, where U = the smallest unsigned integer able to store N. For example if the size of the range being sorted is <= 255, U will be unsigned char.

Similar techniques are implemented by Morwenn's Mountain sort, and elsewhere. plf::list employs a similar technique in it's sort() techniques, but is able to do it much faster for larger-than-scalar types due to the structure of a linked list and only having to change pointers, not move elements. However when using plf::list with scalar types smaller than 8 bytes, you will get a 15% on average (up to 70%) performance increase using indiesort - at the cost of losing the validity of iterators/pointers pointing to the plf::list elements (as, unlike plf::list::sort(), indiesort changes element locations). So if you're using plf::list on larger-than-scalar types, don't use indiesort with it, just use it's own sort() technique.

Indiesort should be used when:

The temporary memory cost mentioned is non-problematic,
The container or iterators are not random_access and therefore std::sort cannot be used, and/or
The element type is large or non-trivially-movable/copyable.

It is, on average across 1, 2, 4, 48 and 490 bytes, and numbers of elements ranging from 10 to 1000000:

+250% faster using than a bidirectional-compatible introsort algorithm with plf::colony (using algorithm by Stephane Brumme).
+146% faster than std::sort when used on vectors or arrays of large structs (496 bytes). Crossover point for increased performance over std::sort is any type larger than 152 bytes.
+28% faster than std::list's internal sort, on types smaller than a large struct. Crossover point for increased performance over std::list's internal sort() is any type smaller than 272 bytes.

See the benchmarks for more information.

Motivation

This technique developed out of the plf::list project, where it was used with the linked list to create a much faster sort than was available in std::list. From there it was ported into plf::colony, which, while having a large memory cost compared to plf::list's sort, was still useful. On Jens Maurer's suggestion, I separated the sort technique from colony, and in the process, streamlined and reduced the memory cost. It can now be used with any iterator/pointer range or container. Note: plf::list's internal sort is still superior to indiesort for a linked list, as it changes the previous and next pointers of each node rather than relocating elements, whereas indiesort will always reallocate elements.

Implementation

As of v1.3, the algorithm changes for non-random-access containers/iterators if sizeof(element) <= 2 * sizeof(element *) and the element is trivially-copyable or move-assignable - in this case, it will simply copy the elements to a C-array and sort the array, then copy the elements back. This becomes faster when the type size is small as there is less data to sort. In all other cases, the original method below is used.

As of v1.4, for random-access containers/iterators the algorithm will switch to directly sorting elements via std::sort when the sizeof(element) <= 152 (the benchmarked threshold for improved performance via indirect sorting) and the type is trivially-copyable or move-assignable. Otherwise the original method below is used.

The algorithm consists of gather, sort and scatter phases:

Gather pointers:
1. Obtain the size N of the range to be sorted, either via a container's size() function, last - first for random-access iterator ranges, or ++ iteration for non-random-access iterator ranges. If size is < 2 we return.
2. Construct an array of pointer+size_t pair structs of size N. Fill the array with pointers to elements in the range + the corresponding index of each element within that range. This is the sort_array. It's pointer member is called original_location, while it's size_t member is called original_index.
  Note: For random_access iterators, original_location can be obtained from first + original_index, and can be omitted, with the size_t reduced to the smallest unsigned integer type whose std::numeric_limits::max() is larger than N. This significantly reduces memory usage in those scenarios.
Sort the sort_array via the values of the elements which the original_location members point to, by default using std::sort. This is done via use of a sort_dereferencer functor of a type similar to that used in plf::list's pointer-sort hack section.
Scatter the elements to their new locations: to do this we traverse the sort_array using index i in the loop below:
1. If the end of the array has been reached at this point we exit the loop. If sort_array[i].original_index == i this means element position is unchanged post-sort (or that it has already been sorted as part of the move-chain below), so we increment i and repeat this step.
2. Otherwise we move the element which this pair's original_location points to, into a temporary variable of the same element type called end_value. We then store i as destination_index and sort_array[i].original_index as source_index.
3. We move the element pointed to by sort_array[source_index].original_location, to the location pointed to by sort_array[destination_index].original_location. We then set destination_index to the value of source_index. and set source_index to sort_array[destination_index].original_index. Then set sort_array[destination_index].original_index to destination_index (ie. indicating that the element pointed to has been sorted so it can be skipped in step (a)). At this point, if source_index == i, that means the end of this move-chain has been reached, so we move the element from end_value to the location pointed to by sort_array[destination_index].original_location, increment i, and return to step (a). Otherwise we repeat this step.
Once the loop is over, we destroy and free the sort_array.

License

plf::indiesort is under a permissive zlib license. This means: Software is used on 'as-is' basis. It can be modified and used in commercial software. Authors are not liable for any damages arising from its use. The distribution of a modified version of the software is subject to the following restrictions:

The authorship of the original software must not be misrepresented,
Altered source versions must not be misrepresented as being the original software, and
The license notice must not be removed from source distributions.

Download

Download here (6kb zip file) or view the repository

The indiesort function library is a simple .h header file, to be included with a #include command.
In addition if you are interested in benchmarking you can also download the plf benchmark suite (64kb zip file).

Function Descriptions and syntax

template <class iterator_type, class sort_comparison> void indiesort (iterator_type first, iterator_type last, sort_comparison compare)
Sorts range using std::sort internally.
template <class iterator_type> void indiesort (iterator_type first, iterator_type last)
Sorts range using std::sort with a less-than comparison internally.
template <class container_type, class sort_comparison> void indiesort (container_type &container, sort_comparison compare)
Sorts container using std::sort internally.
template <class container_type> void indiesort (container_type &container)
Sorts container using std::sort and a less-than comparison.
template <class iterator_type, class sort_comparison> void indiesort (iterator_type first, iterator_type last, sort_comparison compare, std::size_t size)
Sorts range using std::sort internally. If the size of the range is known in advance and the iterators are not random_access, use this variant for better performance. If the size of the range isn't known, use the range-based functions above. If the range is begin() - end() of a container, use one of the container-based functions above. If the iterators are random_access, the range-based functions above will figure out the range from last - first.

Note: in the case of non-contiguous random-access containers, such as std::deque, use the function above as it will be much faster than the others. Unfortunately it will have the same memory cost as calling indiesort on a non-random-access containers.

To use a sort function other than std::sort, #define PLF_SORT_FUNCTION to the name of your sort function before calling plf::indiesort. Whatever sort function you use must take arguments in the same format and order as std::sort. If PLF_SORT_FUNCTION is not defined STL algorithm.h will be included and std::sort will be used instead. To avoid the plf_indiesort.h pulling in algorithm.h PLF_SORT_FUNCTION must be defined before including plf_indiesort.h.

Basic code example

#include <functional> // std::greater
#include <vector>
#include <iostream>

#include "plf_indiesort.h"


int main()
{
	std::vector<int> vec;

	for (int i = 0; i != 60; ++i)
	{
		vec.push_back(60 - i);
	}

	plf::indiesort(vec);

	int last = 0;

	for (std::vector<int>::iterator it = vec.begin(); it != vec.end(); ++it)
	{
		if (last > *it)
		{
			std::cout << "Less-than full-container sort failed\n";
			return -1;
		}

		last = *it;
	}

	std::cout << "Less-than full-container sort passed\n";


	std::vector<int>::iterator it1 = vec.begin() + 5, it2 = vec.begin() + 50;

	plf::indiesort(it1, it2, std::greater<int>());


	last = 60;

	for (std::vector<int>::iterator it = it1; it != it2; ++it)
	{
		if (last < *it)
		{
			std::cout << "Greater-than range sort failed\n";
			return -1;
		}

		last = *it;
	}

	std::cout << "Greater-than range sort passed\n";

        return 0;
}

Benchmarks

Benchmark results for indiesort v1.02 under GCC 9.2 x64 on an Intel Xeon E3-1241 (Haswell core) are here.

Version history

2025-07-11: v1.4.4 - Correction to type traits as C++26 deprecates std::is_trivial.
2024-12-07: v1.4.3 - Minor optimization to sort when type is small and trivial.
2024-04-10: v1.4.2 - Fix to sort optimisation when random access container and small number of elements. Change to vcpkg-compatible versioning. plf::make_move_iterator changed to be static. plf::equal_to uses references now and is entirely noexcept.
2023-08-24: v1.41 - Revert to random_access container detection instead of contiguous, as the latter does not work under GCC11 in my testing. Change detection for cheaper non-random-access sort routine to match trivial/movable types only.
2023-08-20: v1.40 - Correction to contiguous container detection. If contiguous container detected, will switch to direct std::sort if element type sizeof is < 152 (benchmarked threshold for increased performance of indiesort) and type is trivially_copyable or movable. Correction to method switching for non-random-access containers.
2023-08-10: v1.30 - Non-contiguous random-access containers such as deque now automatically use the pointer-based algorithm (faster for non-contiguous containers). If element type is <= element pointer type for non-random-access sorting, we use the method of simply copying the container contents to a flat array, sorting it there, then copying back to the container (faster than original method if size is small).
2023-07-05: v1.23 - Bypass buggy MSVC /W4 warning, comment corrections.
2023-07-01: v1.22 - Correction to header inclusion.
2023-06-23: v1.21 - Improvement for support on older compilers.
2023-06-06: v1.20 - update to shared plf tools, no impact on algorithm.
2023-02-03: v1.19 - More accurate compiler/library detection for C++20 support, C++20 support disabled for apple clang as it is not as up to date as non-apple clang.
2023-01-21: v1.18 - Added missing macro undef.
2022-10-02: v1.17 - Removed plf::rand dependency from test suite.
2022-09-03: v1.16 - More tooling functions made public for access across multiple plf:: libraries.
2022-08-27: v1.15 - Correction/reduction to compiler detection routines. Some tooling functions made public for access across multiple plf:: libraries.
2021-05-26: v1.14 - Correction to compiler detection routines for clang/gcc running in MSVC environment.
2021-02-07: v1.13 - Correction to compiler defines.
2021-01-13: v1.12 - Reduction of compiler feature macros for reasons of readability, code size and convenience of updating between the different headers. Hopefully, PLF_ by itself will be enough to disambiguate these from similarly-named macros. Improvement of a compiler rule.
2021-01-11: v1.11 - Corrections to compiler feature detection.
2021-01-09: v1.10 - Correction to compiler defines regression, updated compiler defines for clang.
2021-01-03: v1.09 - plf::pcg_rand renamed to plf::rand and modified to enable better support for C++03 and lower compilers.
2020-12-26: v1.08 - Corrections to compiler feature detection code. Test suite now uses plf::pcg_rand instead of xor_rand.
2020-9-17: v1.07 - Removal of destructor calls since all element locations inevitably get written to in the container/array so any necessary destructor calls for non-trivial objects will be handled by the move/copy operators of that element type.
2020-9-14: v1.06 - Removal of unnecessary branch checks, thanks to sentinel reduction, courtesy of github user morwenn.
2020-7-05: v1.05 - Alteration to make workaround for non-contiguous random-access containers.
2020-7-05: v1.04 - Addition of random-access range specialisation - 7% faster on average with reduction of temporary memory usage down to N * unsigned, where unsigned is the smallest unsigned integer able to hold N eg. if N is < 255, unsigned can be unsigned char. Improvement of 16% with vectors & large structs. Correction to test suite under C++03. Fixes & improvements for C++11 mode.
2020-6-28: v1.02 - Further compiler switch reduction. Correction to element destruction with non-move-semantics-supporting compilers.
2020-6-27: v1.01 - Comments cleanup, change to code to remove algorithm.h include if PLF_SORT_FUNCTION is defined. Reduction of compiler support biolerplate code to bare minimum. Reduction of test_suite to bare minimum. Stronger enforcement of internal namespace for C++11-alike functions/structs.
2020-6-26: v1.00 - First release