Taskflow provides standalone template methods for finding elements in a given range using CUDA.
Include the Header
You need to include the header file, taskflow/cuda/algorithm/find.hpp, for creating a parallel-find task.
Find an Element in a Range
tf::cudaFlow::find_if finds the index of the first element in the range [first, last)
that satisfies the given criteria. This is equivalent to the parallel execution of the following loop:
unsigned idx = 0;
for(; first != last; ++first, ++idx) {
  if (p(*first)) {
    return idx;
  }
}
return idx;
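The loop can run in parallel because "index of the first match" is a minimum: map each position i to i if the predicate holds and to n (the range size) otherwise, then take the minimum over all positions. The CPU sketch below splits the range into chunks that stand in for GPU thread blocks, finds the first match inside each chunk, and min-reduces the partial results. The chunks run sequentially here for simplicity. The function name blockwise_find_if and its num_chunks parameter are hypothetical, and this illustrates the reduction idea rather than Taskflow's actual kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical CPU sketch of a block-wise find_if. The index of the first element
// satisfying p equals min over i of (p(v[i]) ? i : n), where n means "not found".
// Each chunk plays the role of one GPU thread block; here chunks run sequentially.
template <typename T, typename P>
unsigned blockwise_find_if(const std::vector<T>& v, P p, unsigned num_chunks = 4) {
  assert(num_chunks > 0);
  const unsigned n = static_cast<unsigned>(v.size());
  const unsigned chunk = (n + num_chunks - 1) / num_chunks;  // ceil(n / num_chunks)
  std::vector<unsigned> partial(num_chunks, n);              // n = no match in chunk
  for (unsigned c = 0; c < num_chunks; ++c) {
    const unsigned beg = std::min(n, c * chunk);
    const unsigned end = std::min(n, beg + chunk);
    for (unsigned i = beg; i < end; ++i) {
      if (p(v[i])) { partial[c] = i; break; }  // first match inside this chunk
    }
  }
  // Chunks are independent; only this final min-reduction combines them.
  return *std::min_element(partial.begin(), partial.end());
}
```

Because a chunk with no match contributes n, the reduction naturally returns the range size when nothing satisfies the predicate, matching the sequential loop.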
If no such element is found, the size of the range is returned. The following code finds the index of the first element that is divisible by 17 over a range of one million elements.
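const size_t N = 1000000;
auto vec = tf::cuda_malloc_shared<int>(N);       // vector in shared memory
auto idx = tf::cuda_malloc_shared<unsigned>(1);  // index of the result
for(size_t i=0; i<N; vec[i++] = rand());        // initializes the data
// creates a cudaFlow to find the first element divisible by 17
tf::cudaFlow cudaflow;
tf::cudaTask task = cudaflow.find_if(
  vec, vec+N, idx, [] __device__ (auto v) { return v%17 == 0; }
);
cudaflow.offload();                              // runs the cudaFlow once
// verifies the result
if(*idx != N) {
  assert(vec[*idx] %17 == 0);
}
// frees the memory
tf::cuda_free(vec);
tf::cuda_free(idx);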
Find the Minimum Element in a Range
tf::cudaFlow::min_element finds the index of the minimum element in the given range [first, last)
using the given comparison function object. This is equivalent to a parallel execution of the following loop:
if(first == last) {
  return 0;
}
auto beg = first;
auto smallest = first;
for (++first; first != last; ++first) {
  if (op(*first, *smallest)) {
    smallest = first;
  }
}
return std::distance(beg, smallest);
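Since the loop advances `smallest` only on a strict `op(*first, *smallest)`, it returns the first of several equal minima. A parallel reduction that merges partial results in arbitrary order needs an explicit tie-break to give the same answer. A common technique, sketched below on the CPU, reduces (value, index) pairs and prefers the smaller index when neither value is less than the other. The names Candidate, combine_min, and tree_min_index are hypothetical, and this section does not state whether tf::cudaFlow::min_element guarantees the first occurrence; the sketch only shows what that guarantee would require.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical (value, index) candidate carried through a min-reduction.
template <typename T>
struct Candidate {
  T value;
  unsigned index;
};

// Merges two candidates. The result does not depend on argument order, so partial
// results could be combined in any order. Ties (neither op(a,b) nor op(b,a)) go to
// the smaller index, reproducing the sequential loop's "first minimum" answer.
template <typename T, typename O>
Candidate<T> combine_min(const Candidate<T>& a, const Candidate<T>& b, O op) {
  if (op(a.value, b.value)) return a;
  if (op(b.value, a.value)) return b;
  return a.index < b.index ? a : b;
}

// Reduces the range pairwise, level by level, like a tree reduction across threads.
template <typename T, typename O>
unsigned tree_min_index(const std::vector<T>& v, O op) {
  if (v.empty()) return 0;  // same result as the loop for an empty range
  std::vector<Candidate<T>> level;
  for (unsigned i = 0; i < v.size(); ++i) level.push_back({v[i], i});
  while (level.size() > 1) {
    std::vector<Candidate<T>> next;
    for (std::size_t i = 0; i + 1 < level.size(); i += 2)
      next.push_back(combine_min(level[i], level[i + 1], op));
    if (level.size() % 2 == 1) next.push_back(level.back());  // odd one carries over
    level = std::move(next);
  }
  return level.front().index;
}
```

For example, on `{4, 1, 7, 1, 0, 0, 9}` with `a < b`, the two zeros tie and the reduction keeps index 4, the same index the sequential loop returns.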
The following code finds the index of the minimum element in a range of one million elements.
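const size_t N = 1000000;
auto vec = tf::cuda_malloc_shared<int>(N);
auto idx = tf::cuda_malloc_shared<unsigned>(1);
for(size_t i=0; i<N; vec[i++] = rand());
tf::cudaFlow cudaflow;
tf::cudaTask task = cudaflow.min_element(
  vec, vec+N, idx, [] __device__ (auto a, auto b) { return a<b; }
);
cudaflow.offload();                                  // runs the cudaFlow once
assert(vec[*idx] == *std::min_element(vec, vec+N));  // checks against the CPU result
tf::cuda_free(vec);
tf::cuda_free(idx);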
Find the Maximum Element in a Range
Similar to tf::cudaFlow::min_element, tf::cudaFlow::max_element finds the index of the maximum element in the given range [first, last)
using the given comparison function object. This is equivalent to a parallel execution of the following loop:
if(first == last) {
  return 0;
}
auto beg = first;
auto largest = first;
for (++first; first != last; ++first) {
  if (op(*largest, *first)) {
    largest = first;
  }
}
return std::distance(beg, largest);
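Comparing the two loops, the maximum loop tests `op(*largest, *first)` where the minimum loop tests `op(*first, *smallest)`. Finding the maximum under `op` is therefore the same as finding the minimum under the reversed comparator `(a, b) -> op(b, a)`, including which index wins on ties (the first one). The CPU sketch below makes that correspondence concrete with two hypothetical helpers, first_min_index and first_max_index; it does not claim that Taskflow implements max_element this way.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sequential, index-returning version of the min_element loop.
template <typename T, typename O>
unsigned first_min_index(const std::vector<T>& v, O op) {
  if (v.empty()) return 0;
  unsigned smallest = 0;
  for (unsigned i = 1; i < v.size(); ++i) {
    if (op(v[i], v[smallest])) smallest = i;  // strict test keeps the first minimum
  }
  return smallest;
}

// max_element under op equals min_element under the reversed comparator:
// the test op(b, a) with a = v[i], b = v[largest] is exactly op(*largest, *first).
template <typename T, typename O>
unsigned first_max_index(const std::vector<T>& v, O op) {
  return first_min_index(v, [op](const T& a, const T& b) { return op(b, a); });
}
```

With this reduction, one index-finding routine can serve both the minimum and maximum queries, which is why both functions take the same comparison object.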
The following code finds the index of the maximum element in a range of one million elements.
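const size_t N = 1000000;
auto vec = tf::cuda_malloc_shared<int>(N);
auto idx = tf::cuda_malloc_shared<unsigned>(1);
for(size_t i=0; i<N; vec[i++] = rand());
tf::cudaFlow cudaflow;
tf::cudaTask task = cudaflow.max_element(
  vec, vec+N, idx, [] __device__ (auto a, auto b) { return a<b; }
);
cudaflow.offload();                                  // runs the cudaFlow once
assert(vec[*idx] == *std::max_element(vec, vec+N));  // checks against the CPU result
tf::cuda_free(vec);
tf::cuda_free(idx);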
Miscellaneous Items
Parallel find algorithms are also available in tf::cudaFlowCapturer with the same API.