Taskflow
3.2.0-Master-Branch
cudaFlow provides template methods to create parallel sort tasks on a CUDA GPU.
To create a parallel-sort task, include the header file taskflow/cuda/algorithm/sort.hpp.
tf::cudaFlow::sort performs an in-place parallel sort over a range of elements specified by [first, last) using the given comparator. The following code sorts one million random integers in increasing order on a GPU.
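A minimal sketch of such a program is given here. It assumes a standalone tf::cudaFlow that is run with offload(), unified memory obtained from tf::cuda_malloc_shared, and a build with nvcc using --extended-lambda. The header taskflow/cuda/cudaflow.hpp and the build line are assumptions about the local setup and are not stated on this page.

```cuda
// Assumed build line (adjust the include path to your Taskflow checkout):
//   nvcc -std=c++17 --extended-lambda -I <path-to-taskflow> sort.cu
#include <taskflow/cuda/cudaflow.hpp>        // assumed header for tf::cudaFlow
#include <taskflow/cuda/algorithm/sort.hpp>  // header named in this page

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>

int main() {
  const size_t N = 1000000;

  // Unified memory, so the host can fill and check the data directly.
  int* vec = tf::cuda_malloc_shared<int>(N);

  // Fill the range with random integers on the host.
  for(size_t i = 0; i < N; ++i) {
    vec[i] = std::rand();
  }

  // Create a cudaFlow with a single parallel-sort task.
  tf::cudaFlow cf;
  cf.sort(vec, vec + N, [] __device__ (int a, int b) { return a < b; });

  // Run the cudaFlow and wait for it to finish.
  cf.offload();

  // The range is now in increasing order.
  assert(std::is_sorted(vec, vec + N));

  tf::cuda_free(vec);
  return 0;
}
```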
You can pass a different comparator to tf::cudaFlow::sort to alter the sorting order. For example, the following code sorts one million random integers in decreasing order on a GPU.
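As a sketch under the same assumptions (standalone tf::cudaFlow, tf::cuda_malloc_shared, nvcc with --extended-lambda, and the assumed header taskflow/cuda/cudaflow.hpp), only the comparator needs to change to reverse the order:

```cuda
#include <taskflow/cuda/cudaflow.hpp>        // assumed header for tf::cudaFlow
#include <taskflow/cuda/algorithm/sort.hpp>  // header named in this page

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>

int main() {
  const size_t N = 1000000;
  int* vec = tf::cuda_malloc_shared<int>(N);

  for(size_t i = 0; i < N; ++i) {
    vec[i] = std::rand();
  }

  // "a > b" places larger elements first, giving decreasing order.
  tf::cudaFlow cf;
  cf.sort(vec, vec + N, [] __device__ (int a, int b) { return a > b; });
  cf.offload();

  // Check on the host with the matching host-side comparator.
  assert(std::is_sorted(vec, vec + N, std::greater<int>()));

  tf::cuda_free(vec);
  return 0;
}
```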
tf::cudaFlow::sort_by_key sorts a range of key-value items into ascending key order. After the sort, if i and j are any two valid iterators in [k_first, k_last) such that i precedes j, then comp(*j, *i) evaluates to false, and the iterators p and q in [v_first, v_first + (k_last - k_first)) corresponding to i and j point to the values that were originally paired with the keys *i and *j. The following example sorts a range of items into ascending key order and permutes their corresponding values accordingly:
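A small sketch with four key-value pairs makes the effect visible. The argument order (k_first, k_last, v_first, comp) follows the iterator names used above. The standalone tf::cudaFlow, tf::cuda_malloc_shared, and the header taskflow/cuda/cudaflow.hpp are assumptions.

```cuda
#include <taskflow/cuda/cudaflow.hpp>        // assumed header for tf::cudaFlow
#include <taskflow/cuda/algorithm/sort.hpp>  // header named in this page

#include <cassert>
#include <cstddef>

int main() {
  const size_t N = 4;
  int* vec = tf::cuda_malloc_shared<int>(N);  // keys
  int* idx = tf::cuda_malloc_shared<int>(N);  // values

  // Keys and the values paired with them (here, the original positions).
  vec[0] = 1; vec[1] = 4; vec[2] = -5; vec[3] = 2;
  idx[0] = 0; idx[1] = 1; idx[2] = 2;  idx[3] = 3;

  // Sort the keys and move each value together with its key.
  tf::cudaFlow cf;
  cf.sort_by_key(
    vec, vec + N, idx, [] __device__ (int a, int b) { return a < b; }
  );
  cf.offload();

  // Keys are now {-5, 1, 2, 4}; values are now {2, 0, 3, 1}.
  assert(vec[0] == -5 && vec[1] == 1 && vec[2] == 2 && vec[3] == 4);
  assert(idx[0] == 2  && idx[1] == 0 && idx[2] == 3 && idx[3] == 1);

  tf::cuda_free(vec);
  tf::cuda_free(idx);
  return 0;
}
```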
While you can capture the keys in the lambda and sort the values indirectly using plain tf::cudaFlow::sort, this organization results in frequent and costly accesses to global memory. For example, we can sort idx indirectly using the captured keys in vec:
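The indirect variant can be sketched as follows, reusing the four keys from the key-value example and making the same assumptions (standalone tf::cudaFlow, tf::cuda_malloc_shared, assumed header taskflow/cuda/cudaflow.hpp). Only idx is reordered; vec is left untouched.

```cuda
#include <taskflow/cuda/cudaflow.hpp>        // assumed header for tf::cudaFlow
#include <taskflow/cuda/algorithm/sort.hpp>  // header named in this page

#include <cassert>
#include <cstddef>

int main() {
  const size_t N = 4;
  int* vec = tf::cuda_malloc_shared<int>(N);  // keys, read by the comparator
  int* idx = tf::cuda_malloc_shared<int>(N);  // indices to be sorted

  vec[0] = 1; vec[1] = 4; vec[2] = -5; vec[3] = 2;
  idx[0] = 0; idx[1] = 1; idx[2] = 2;  idx[3] = 3;

  // The lambda captures the pointer vec by value. Every comparison
  // dereferences it twice, and each dereference is a global memory read.
  tf::cudaFlow cf;
  cf.sort(idx, idx + N, [vec] __device__ (int a, int b) {
    return vec[a] < vec[b];
  });
  cf.offload();

  // idx is ordered by key: {2, 0, 3, 1}. vec keeps its original order.
  assert(idx[0] == 2 && idx[1] == 0 && idx[2] == 3 && idx[3] == 1);
  assert(vec[0] == 1 && vec[1] == 4 && vec[2] == -5 && vec[3] == 2);

  tf::cuda_free(vec);
  tf::cuda_free(idx);
  return 0;
}
```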
The comparator here reads vec from global memory on every comparison, resulting in high memory latency. Instead, use tf::cudaFlow::sort_by_key, which is optimized for this purpose.
Parallel sort algorithms are also available in tf::cudaFlowCapturer with the same API.
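A capturer-based version might look like the sketch below. It assumes that tf::cudaFlowCapturer can be constructed standalone and run with offload() in the same way as tf::cudaFlow, and that taskflow/cuda/cudaflow.hpp also provides the capturer. Neither assumption is confirmed by this page.

```cuda
#include <taskflow/cuda/cudaflow.hpp>        // assumed to provide tf::cudaFlowCapturer
#include <taskflow/cuda/algorithm/sort.hpp>  // header named in this page

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>

int main() {
  const size_t N = 1000000;
  int* vec = tf::cuda_malloc_shared<int>(N);

  for(size_t i = 0; i < N; ++i) {
    vec[i] = std::rand();
  }

  // Same call shape as tf::cudaFlow::sort: (first, last, comparator).
  tf::cudaFlowCapturer capturer;
  capturer.sort(vec, vec + N, [] __device__ (int a, int b) { return a < b; });
  capturer.offload();

  assert(std::is_sorted(vec, vec + N));

  tf::cuda_free(vec);
  return 0;
}
```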