Taskflow  3.2.0-Master-Branch
Loading...
Searching...
No Matches
Parallel Find

Taskflow provides standalone template methods for finding elements in the given ranges using CUDA.

Include the Header

You need to include the header file, taskflow/cuda/algorithm/find.hpp, for using the parallel-find algorithm.

Find an Element in a Range

tf::cuda_find_if finds the index of the first element in the range [first, last) that satisfies the given criteria. This is equivalent to the parallel execution of the following loop:

unsigned idx = 0;
for(; first != last; ++first, ++idx) {
if (p(*first)) {
return idx;
}
}
return idx;

If no such an element is found, the size of the range is returned. The following code finds the index of the first element that is dividable by 17 over a range of one million elements.

const size_t N = 1000000;
auto vec = tf::cuda_malloc_shared<int>(N); // vector
auto idx = tf::cuda_malloc_shared<unsigned>(1); // index
// initializes the data
for(size_t i=0; i<N; vec[i++] = rand());
// finds the index of the first element that is a multiple of 17
policy, vec, vec+N, idx, [] __device__ (auto v) { return v%17 == 0; }
);
policy.synchronize();
// verifies the result
if(*idx != N) {
assert(vec[*idx] %17 == 0);
}
// deletes the memory
class to define execution policy for CUDA standard algorithms
Definition cuda_execution_policy.hpp:29
void cuda_find_if(P &&p, I first, I last, unsigned *idx, U op)
finds the index of the first element that satisfies the given criteria
Definition find.hpp:181
void cuda_free(T *ptr, int d)
frees memory on the GPU device
Definition cuda_memory.hpp:101

The find-if algorithm runs asynchronously through the stream specified in the execution policy. You need to synchronize the stream to obtain the correct result.

Find the Minimum Element in a Range

tf::cuda_min_element finds the index of the minimum element in the given range [first, last) using the given comparison function object. This is equivalent to a parallel execution of the following loop:

if(first == last) {
return 0;
}
auto smallest = first;
for (++first; first != last; ++first) {
if (op(*first, *smallest)) {
smallest = first;
}
}
return std::distance(first, smallest);
T distance(T... args)

Since tf::cuda_min_element may require extra buffer to store temporary results, you need to provide a buffer that holds at least tf::cuda_min_element_buffer_size bytes. The following code finds the index of the minimum element in a range of one millions elements.

const size_t N = 1000000;
auto vec = tf::cuda_malloc_shared<int>(N); // vector
auto idx = tf::cuda_malloc_shared<unsigned>(1); // index
// initializes the data
for(size_t i=0; i<N; vec[i++] = rand());
// queries the required buffer size using the given policy
auto bytes = tf::cuda_min_element_buffer_size<tf::cudaDefaultExecutionPolicy, int>(N);
auto buffer = tf::cuda_malloc_device<std::byte>(bytes);
// finds the minimum element using the less comparator
policy, vec, vec+N, idx, [] __device__ (auto a, auto b) { return a<b; }, buffer
);
policy.synchronize();
// verifies the result
assert(vec[*idx] == *std::min_element(vec, vec+N, std::less<int>{}));
// deletes the memory
tf::cuda_free(buffer);
T min_element(T... args)
void cuda_min_element(P &&p, I first, I last, unsigned *idx, O op, void *buf)
finds the index of the minimum element in a range
Definition find.hpp:288
Attention
You must keep the buffer alive before the tf::cuda_min_element completes.

Find the Maximum Element in a Range

Similar to tf::cuda_min_element, tf::cuda_max_element finds the index of the maximum element in the given range [first, last) using the given comparison function object. This is equivalent to a parallel execution of the following loop:

if(first == last) {
return 0;
}
auto largest = first;
for (++first; first != last; ++first) {
if (op(*largest, *first)) {
largest = first;
}
}
return std::distance(first, largest);

Since tf::cuda_max_element may require extra buffer to store temporary results, you need to provide a buffer that holds at least tf::cuda_max_element_buffer_size bytes. The following code finds the index of the maximum element in a range of one millions elements.

const size_t N = 1000000;
auto vec = tf::cuda_malloc_shared<int>(N); // vector
auto idx = tf::cuda_malloc_shared<unsigned>(1); // index
// initializes the data
for(size_t i=0; i<N; vec[i++] = rand());
// queries the required buffer size using the given policy
auto bytes = tf::cuda_max_element_buffer_size<tf::cudaDefaultExecutionPolicy, int>(N);
auto buffer = tf::cuda_malloc_device<std::byte>(bytes);
// finds the maximum element using the less comparator
policy, vec, vec+N, idx, [] __device__ (auto a, auto b) { return a<b; }, buffer
);
policy.synchronize();
// verifies the result
assert(vec[*idx] == *std::max_element(vec, vec+N, std::less<int>{}));
// deletes the memory
tf::cuda_free(buffer);
T max_element(T... args)
void cuda_max_element(P &&p, I first, I last, unsigned *idx, O op, void *buf)
finds the index of the maximum element in a range
Definition find.hpp:413
Attention
You must keep the buffer alive before tf::cuda_max_element completes.