Taskflow 3.2.0-Master-Branch
cudaFlow provides template methods to create parallel sort tasks on a CUDA GPU. To create a parallel-sort task, include the header file taskflow/cuda/algorithm/sort.hpp.
tf::cudaFlow::sort performs an in-place parallel sort over the range of elements [first, last) using the given comparator. For example, it can sort one million random integers in increasing order on a GPU.
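To show what such a task computes, here is a minimal host-side analog written with the C++ standard library. It is not the tf::cudaFlow::sort API: the helper names host_sort and random_ints are hypothetical, and std::sort on the CPU stands in for the GPU sort. The point is the contract, an in-place sort of [first, last) ordered by a comparator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: n pseudo-random integers from a fixed seed,
// so repeated runs produce the same input.
inline std::vector<int> random_ints(std::size_t n, unsigned seed = 42) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int> dist(0, 1000000);
  std::vector<int> v(n);
  for (auto& x : v) x = dist(gen);
  return v;
}

// Hypothetical host-side analog of an in-place, comparator-based sort over
// [first, last). It runs std::sort on the CPU; it is not Taskflow code.
template <typename Iter, typename Comp>
void host_sort(Iter first, Iter last, Comp comp) {
  std::sort(first, last, comp);
}
```

For example, calling host_sort(vec.begin(), vec.end(), [](int a, int b) { return a < b; }) on random_ints(1000000) leaves the vector in increasing order, which std::is_sorted can confirm.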
You can pass a different comparator to tf::cudaFlow::sort to change the sorting order. For example, a comparator that returns a > b sorts the same one million random integers in decreasing order on a GPU.
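The effect of swapping the comparator can be sketched on the host with std::sort. This is an illustration under the assumption that the GPU comparator follows the same rules as a standard-library comparator; sort_decreasing is a hypothetical name, not part of Taskflow. Note that the comparator must be a strict weak ordering, so a > b is correct while a >= b is not.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical host-side helper: reverse the order simply by changing the
// comparator. Use a strict comparison (a > b); a >= b would violate the
// strict-weak-ordering requirement of std::sort.
inline void sort_decreasing(std::vector<int>& v) {
  std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
}
```

Applied to {3, 1, 4, 1, 5, 9, 2, 6}, this yields {9, 6, 5, 4, 3, 2, 1, 1}.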
tf::cudaFlow::sort_by_key sorts a range of key-value items into ascending key order. If i and j are any two valid iterators in [k_first, k_last) such that i precedes j, and p and q are the iterators in [v_first, v_first + (k_last - k_first)) corresponding to i and j respectively, then comp(*j, *i) evaluates to false, and *p and *q are the values originally paired with the keys *i and *j. In other words, the keys are sorted into ascending order and each value moves together with its key.
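The key-value contract above can be written down as a small host-side reference. This is a sketch using the C++ standard library, not Taskflow code; the names host_sort_by_key and satisfies_postcondition are hypothetical. It zips keys and values into pairs, sorts the pairs by key, and unzips them, so every value stays with its key. std::stable_sort is chosen only to make the result deterministic for equal keys; the text above does not state whether tf::cudaFlow::sort_by_key is stable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side reference for a key-value sort: sorts keys with comp
// and moves each value along with its key.
template <typename K, typename V, typename Comp>
void host_sort_by_key(std::vector<K>& keys, std::vector<V>& values, Comp comp) {
  assert(keys.size() == values.size());
  std::vector<std::pair<K, V>> items;  // (key, value) pairs
  items.reserve(keys.size());
  for (std::size_t i = 0; i < keys.size(); ++i) items.emplace_back(keys[i], values[i]);
  // Order pairs by key only; stable_sort keeps equal keys in input order.
  std::stable_sort(items.begin(), items.end(),
                   [&](const std::pair<K, V>& a, const std::pair<K, V>& b) {
                     return comp(a.first, b.first);
                   });
  for (std::size_t i = 0; i < items.size(); ++i) {
    keys[i] = items[i].first;
    values[i] = items[i].second;
  }
}

// Checks the postcondition stated above: for any i preceding j,
// comp(keys[j], keys[i]) must be false.
template <typename K, typename Comp>
bool satisfies_postcondition(const std::vector<K>& keys, Comp comp) {
  for (std::size_t i = 0; i < keys.size(); ++i)
    for (std::size_t j = i + 1; j < keys.size(); ++j)
      if (comp(keys[j], keys[i])) return false;
  return true;
}
```

For example, keys {1, 4, 2, 8, 5, 7} with values {'a', 'b', 'c', 'd', 'e', 'f'} become keys {1, 2, 4, 5, 7, 8} and values {'a', 'c', 'b', 'e', 'f', 'd'}.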
Although you can capture the keys in the comparator lambda and sort an index array indirectly with plain tf::cudaFlow::sort, this organization results in frequent and costly accesses to global memory. For example, you can sort an index array idx indirectly by having the comparator look up the captured keys in vec, comparing vec[a] with vec[b] for two indices a and b.
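The indirect pattern can be sketched on the host as follows. This is an illustration with std::stable_sort, not the Taskflow API, and indirect_sort_indices is a hypothetical name. Every comparison dereferences vec through an index; on a GPU those two lookups per comparison are the global-memory reads that make this organization expensive.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical host-side illustration of indirect sorting: idx is ordered
// by looking up the captured keys in vec on every comparison.
inline std::vector<int> indirect_sort_indices(const std::vector<int>& vec) {
  std::vector<int> idx(vec.size());
  std::iota(idx.begin(), idx.end(), 0);  // idx = 0, 1, ..., n-1
  std::stable_sort(idx.begin(), idx.end(),
                   [&vec](int a, int b) { return vec[a] < vec[b]; });
  return idx;
}
```

For vec = {30, 10, 20}, the result is idx = {1, 2, 0}, since vec[1] < vec[2] < vec[0].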
Such a comparator reads vec from global memory on every comparison, which incurs high memory latency. Instead, use tf::cudaFlow::sort_by_key, which is optimized for this purpose.
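The same index ordering can be expressed as a key-value sort, which is the organization tf::cudaFlow::sort_by_key provides. The host-side sketch below assumes you would pass a copy of the keys as the key range and idx as the value range; indices_by_key_value_sort is a hypothetical name and the code uses only the standard library. On a CPU it does not demonstrate the latency benefit; it only shows the restructuring, where each index travels next to its key so no lookup through vec is needed during comparisons.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side sketch: pair each index with a copy of its key,
// sort the pairs by key, then read the indices back out.
inline std::vector<int> indices_by_key_value_sort(const std::vector<int>& vec) {
  std::vector<std::pair<int, int>> items;  // (key, index)
  items.reserve(vec.size());
  for (std::size_t i = 0; i < vec.size(); ++i)
    items.emplace_back(vec[i], static_cast<int>(i));
  // The comparator reads only the key stored in each pair.
  std::stable_sort(items.begin(), items.end(),
                   [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
                     return a.first < b.first;
                   });
  std::vector<int> idx;
  idx.reserve(items.size());
  for (const auto& item : items) idx.push_back(item.second);
  return idx;
}
```

For vec = {30, 10, 20, 10}, this returns {1, 3, 2, 0}; with a stable sort, the two equal keys keep their original index order.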
Parallel sort algorithms are also available in tf::cudaFlowCapturer with the same API.