Indiesort (portmanteau of 'indirect sort') is a sort-function wrapper which enables use of std::sort (or other random-access sort techniques) with non-random-access containers, and also improves sort performance for large/non-trivially-copyable types in random-access containers and arrays. It achieves this by sorting pointers/indexes to elements by the value of the elements they point to, then relocating the elements, rather than sorting the elements themselves. Elements will be moved a maximum of two times - most will be moved once or not at all.
It has a temporary memory cost of N * (sizeof(pointer) + sizeof(size_t)) for sorting non-random-access iterators/containers (or N * sizeof(element) when sizeof(element) is smaller than (sizeof(pointer) + sizeof(size_t))),
and a N * sizeof(U) cost for random-access iterators/containers, where U = the smallest unsigned integer able to store N. For example if the size of the range being sorted is <= 255, U will be unsigned char.
Similar techniques are implemented by Morwenn's Mountain sort, and elsewhere. plf::list employs a similar technique in it's sort() techniques, but is able to do it much faster for larger-than-scalar types due to the structure of a linked list and only having to change pointers, not move elements. However when using plf::list with scalar types smaller than 8 bytes, you will get a 15% on average (up to 70%) performance increase using indiesort - at the cost of losing the validity of iterators/pointers pointing to the plf::list elements (as, unlike plf::list::sort(), indiesort changes element locations). So if you're using plf::list on larger-than-scalar types, don't use indiesort with it, just use it's own sort() technique.
Indiesort should be used when:
It is, on average across 1, 2, 4, 48 and 490 bytes, and numbers of elements ranging from 10 to 1000000:
See the benchmarks for more information.
This technique developed out of the plf::list project, where it was used with the linked list to create a much faster sort than was available in std::list. From there it was ported into plf::colony, which, while having a large memory cost compared to plf::list's sort, was still useful. On Jens Maurer's suggestion, I separated the sort technique from colony, and in the process, streamlined and reduced the memory cost. It can now be used with any iterator/pointer range or container. Note: plf::list's internal sort is still superior to indiesort for a linked list, as it changes the previous and next pointers of each node rather than relocating elements, whereas indiesort will always reallocate elements.
As of v1.3, the algorithm changes for non-random-access containers/iterators if sizeof(element) <= 2 * sizeof(element *)
and the element is trivially-copyable or move-assignable - in this case, it will simply copy the elements to a C-array and sort the array, then copy the elements back. This becomes faster when the type size is small as there is less data to sort. In all other cases, the original method below is used.
As of v1.4, for random-access containers/iterators the algorithm will switch to directly sorting elements via std::sort when the sizeof(element) <= 152
(the benchmarked threshold for improved performance via indirect sorting) and the type is trivially-copyable or move-assignable. Otherwise the original method below is used.
The algorithm consists of gather, sort and scatter phases:
last - first
for random-access iterator ranges, or ++ iteration for non-random-access iterator ranges. If size is < 2 we return.sort_array
. It's pointer member is called original_location
, while it's size_t member is called original_index
.end_value
.
We then store i as destination_index
and sort_array[i].original_index as source_index
.plf::indiesort is under a permissive zlib license. This means: Software is used on 'as-is' basis. It can be modified and used in commercial software. Authors are not liable for any damages arising from its use. The distribution of a modified version of the software is subject to the following restrictions:
Download here (6kb zip file) or
view the repository
The indiesort function library is a simple .h header file, to be included with a #include
command.
In addition if you are interested in benchmarking you can also download the plf benchmark suite (64kb zip
file).
template <class iterator_type, class sort_comparison>
void indiesort (iterator_type first, iterator_type last, sort_comparison compare)
Sorts range using std::sort internally.
template <class iterator_type>
void indiesort (iterator_type first, iterator_type last)
Sorts range using std::sort with a less-than comparison internally.
template <class container_type, class sort_comparison>
void indiesort (container_type &container, sort_comparison compare)
Sorts container using std::sort internally.
template <class container_type>
void indiesort (container_type &container)
Sorts container using std::sort and a less-than comparison.
template <class iterator_type, class sort_comparison>
void indiesort (iterator_type first, iterator_type last, sort_comparison compare, std::size_t size)
Sorts range using std::sort internally. If the size of the range is known in advance and the iterators are not random_access, use this variant for better performance. If the size of the range isn't known, use the range-based functions above. If the range is begin() - end() of a container, use one of the container-based functions above. If the iterators are random_access, the range-based functions above will figure out the range from last - first.
Note: in the case of non-contiguous random-access containers, such as std::deque, use the function above as it will be much faster than the others. Unfortunately it will have the same memory cost as calling indiesort on a non-random-access containers.
To use a sort function other than std::sort, #define PLF_SORT_FUNCTION to the name of your sort function before calling plf::indiesort. Whatever sort function you use must take arguments in the same format and order as std::sort. If PLF_SORT_FUNCTION is not defined STL algorithm.h
will be included and std::sort will be used instead. To avoid the plf_indiesort.h pulling in algorithm.h PLF_SORT_FUNCTION must be defined before including plf_indiesort.h.
#include <functional> // std::greater #include <vector> #include <iostream> #include "plf_indiesort.h" int main() { std::vector<int> vec; for (int i = 0; i != 60; ++i) { vec.push_back(60 - i); } plf::indiesort(vec); int last = 0; for (std::vector<int>::iterator it = vec.begin(); it != vec.end(); ++it) { if (last > *it) { std::cout << "Less-than full-container sort failed\n"; return -1; } last = *it; } std::cout << "Less-than full-container sort passed\n"; std::vector<int>::iterator it1 = vec.begin() + 5, it2 = vec.begin() + 50; plf::indiesort(it1, it2, std::greater<int>()); last = 60; for (std::vector<int>::iterator it = it1; it != it2; ++it) { if (last < *it) { std::cout << "Greater-than range sort failed\n"; return -1; } last = *it; } std::cout << "Greater-than range sort passed\n"; return 0; }
Benchmark results for indiesort v1.02 under GCC 9.2 x64 on an Intel Xeon E3-1241 (Haswell core) are here.
Contact:
plf:: library and this page Copyright (c) 2023, Matthew Bentley