Taskflow  3.2.0-Master-Branch
Parallel Find

Taskflow provides standalone template methods for finding elements in a given range using CUDA.

Include the Header

To create a parallel-find task, include the header file taskflow/cuda/algorithm/find.hpp.

Find an Element in a Range

tf::cudaFlow::find_if finds the index of the first element in the range [first, last) that satisfies the given criteria. This is equivalent to the parallel execution of the following loop:

unsigned idx = 0;
for(; first != last; ++first, ++idx) {
  if (p(*first)) {
    return idx;
  }
}
return idx;

If no such element is found, the size of the range is returned. The following code finds the index of the first element that is divisible by 17 in a range of one million elements.

const size_t N = 1000000;
auto vec = tf::cuda_malloc_shared<int>(N);       // vector
auto idx = tf::cuda_malloc_shared<unsigned>(1);  // index
// initializes the data
for(size_t i=0; i<N; vec[i++] = rand());
// finds the index of the first element that is a multiple of 17
tf::cudaFlow cudaflow;
tf::cudaTask task = cudaflow.find_if(
  vec, vec+N, idx, [] __device__ (auto v) { return v%17 == 0; }
);
cudaflow.offload();
// verifies the result
if(*idx != N) {
  assert(vec[*idx] %17 == 0);
}
// deletes the memory
tf::cuda_free(vec);
tf::cuda_free(idx);
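
To check a GPU result on the host, the sequential loop shown earlier in this section can be wrapped in a small helper. The following is a minimal sketch in plain C++, not part of Taskflow; the names find_if_index and self_check_find_if are hypothetical and chosen only for illustration. It returns the index of the first match, or the size of the range when nothing matches.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side reference (not a Taskflow API): returns the index of
// the first element in [first, last) that satisfies p, or the size of the
// range if no element does.
template <typename I, typename P>
unsigned find_if_index(I first, I last, P p) {
  unsigned idx = 0;
  for(; first != last; ++first, ++idx) {
    if(p(*first)) {
      return idx;
    }
  }
  return idx;  // equals the range size when no element matches
}

// Small self-check covering both the found and the not-found case.
inline void self_check_find_if() {
  auto multiple_of_17 = [](int v) { return v % 17 == 0; };

  std::vector<int> data{3, 8, 34, 51, 17};
  // 34 is the first multiple of 17 and sits at index 2
  assert(find_if_index(data.begin(), data.end(), multiple_of_17) == 2);

  std::vector<int> none{1, 2, 3};
  // no match: the size of the range is returned
  assert(find_if_index(none.begin(), none.end(), multiple_of_17) == none.size());
}
```

Because the example in this section allocates vec as shared memory, a helper like this could be called on vec and vec+N after the offload, with a host version of the same predicate, and its result compared against *idx.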

Find the Minimum Element in a Range

tf::cudaFlow::min_element finds the index of the minimum element in the given range [first, last) using the given comparison function object. This is equivalent to a parallel execution of the following loop:

if(first == last) {
  return 0;
}
auto begin = first;
auto smallest = first;
for (++first; first != last; ++first) {
  if (op(*first, *smallest)) {
    smallest = first;
  }
}
return std::distance(begin, smallest);

The following code finds the index of the minimum element in a range of one million elements.

const size_t N = 1000000;
auto vec = tf::cuda_malloc_shared<int>(N);       // vector
auto idx = tf::cuda_malloc_shared<unsigned>(1);  // index
// initializes the data
for(size_t i=0; i<N; vec[i++] = rand());
// finds the minimum element using the less comparator
tf::cudaFlow cudaflow;
tf::cudaTask task = cudaflow.min_element(
  vec, vec+N, idx, [] __device__ (auto a, auto b) { return a<b; }
);
cudaflow.offload();
// verifies the result
assert(vec[*idx] == *std::min_element(vec, vec+N, std::less<int>{}));
// deletes the memory
tf::cuda_free(vec);
tf::cuda_free(idx);
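
The index semantics of the sequential loop in this section can be tried out on the host with a short helper. This is a sketch in plain C++ under stated assumptions, not Taskflow code; min_element_index and self_check_min_element are hypothetical names. Note that the index must be measured from the start of the range, so the helper remembers the initial iterator before advancing.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical host-side reference (not a Taskflow API): returns the index of
// the minimum element in [first, last) under comparator op, or 0 if the
// range is empty.
template <typename I, typename O>
unsigned min_element_index(I first, I last, O op) {
  if(first == last) {
    return 0;
  }
  const I begin = first;  // remembered so the index is relative to the start
  I smallest = first;
  for(++first; first != last; ++first) {
    if(op(*first, *smallest)) {  // strict comparison keeps the earliest minimum
      smallest = first;
    }
  }
  return static_cast<unsigned>(std::distance(begin, smallest));
}

// Small self-check, including a range in which the minimum occurs twice.
inline void self_check_min_element() {
  auto less = [](int a, int b) { return a < b; };

  std::vector<int> data{5, 2, 9, 2};
  // the minimum value 2 appears at indices 1 and 3; the sequential loop keeps 1
  assert(min_element_index(data.begin(), data.end(), less) == 1);

  std::vector<int> empty;
  assert(min_element_index(empty.begin(), empty.end(), less) == 0);
}
```

The verification in this section compares values (vec[*idx] against the value found by std::min_element) rather than indices, which remains valid when the minimum value occurs more than once.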

Find the Maximum Element in a Range

Similar to tf::cudaFlow::min_element, tf::cudaFlow::max_element finds the index of the maximum element in the given range [first, last) using the given comparison function object. This is equivalent to a parallel execution of the following loop:

if(first == last) {
  return 0;
}
auto begin = first;
auto largest = first;
for (++first; first != last; ++first) {
  if (op(*largest, *first)) {
    largest = first;
  }
}
return std::distance(begin, largest);

The following code finds the index of the maximum element in a range of one million elements.

const size_t N = 1000000;
auto vec = tf::cuda_malloc_shared<int>(N);       // vector
auto idx = tf::cuda_malloc_shared<unsigned>(1);  // index
// initializes the data
for(size_t i=0; i<N; vec[i++] = rand());
// finds the maximum element using the less comparator
tf::cudaFlow cudaflow;
tf::cudaTask task = cudaflow.max_element(
  vec, vec+N, idx, [] __device__ (auto a, auto b) { return a<b; }
);
cudaflow.offload();
// verifies the result
assert(vec[*idx] == *std::max_element(vec, vec+N, std::less<int>{}));
// deletes the memory
tf::cuda_free(vec);
tf::cuda_free(idx);
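
The maximum is found with a less-than comparator, the same one used for the minimum, because the loop calls op(*largest, *first) with the arguments swapped. The host-side sketch below illustrates this in plain C++; it is not Taskflow code, and max_element_index and self_check_max_element are hypothetical names. It also shows what the sequential loop does when a greater-than comparator is passed instead.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical host-side reference (not a Taskflow API): returns the index of
// the maximum element in [first, last) under comparator op, or 0 if the
// range is empty. op(a, b) is expected to mean "a is ordered before b".
template <typename I, typename O>
unsigned max_element_index(I first, I last, O op) {
  if(first == last) {
    return 0;
  }
  const I begin = first;  // remembered so the index is relative to the start
  I largest = first;
  for(++first; first != last; ++first) {
    if(op(*largest, *first)) {  // current largest is ordered before *first
      largest = first;
    }
  }
  return static_cast<unsigned>(std::distance(begin, largest));
}

// Small self-check showing the effect of the comparator direction.
inline void self_check_max_element() {
  auto less    = [](int a, int b) { return a < b; };
  auto greater = [](int a, int b) { return a > b; };

  std::vector<int> data{1, 7, 3, 7};
  // the maximum value 7 appears at indices 1 and 3; the sequential loop keeps 1
  assert(max_element_index(data.begin(), data.end(), less) == 1);

  std::vector<int> other{4, 1, 6};
  // with a greater-than comparator the same loop selects the smallest value
  assert(max_element_index(other.begin(), other.end(), greater) == 1);
}
```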

Miscellaneous Items

Parallel find algorithms are also available in tf::cudaFlowCapturer with the same API.