Taskflow  3.2.0-Master-Branch
Loading...
Searching...
No Matches
Parallel Scan

cudaFlow provides template methods to create parallel scan tasks on a CUDA GPU.

Include the Header

You need to include the header file, taskflow/cuda/algorithm/scan.hpp, for creating a parallel-scan task.

Scan a Range of Items

tf::cudaFlow::inclusive_scan computes an inclusive prefix sum operation using the given binary operator over a range of elements specified by [first, last). The term "inclusive" means that the i-th input element is included in the i-th sum. The following code computes the inclusive prefix sum over an input array and stores the result in an output array.

const size_t N = 1000000;
int* input = tf::cuda_malloc_shared<int>(N); // input vector
int* output = tf::cuda_malloc_shared<int>(N); // output vector
// initializes the data
for(size_t i=0; i<N; input[i++]=rand());
// creates a cudaFlow of one task to perform inclusive scan
input, input + N, output, [] __device__ (int a, int b) { return a + b; }
);
cf.offload();
// verifies the result
for(size_t i=1; i<N; i++) {
assert(output[i] == output[i-1] + input[i]);
}
class to create a cudaFlow task dependency graph
Definition cudaflow.hpp:56
cudaTask inclusive_scan(I first, I last, O output, C op)
creates a task to perform parallel inclusive scan over a range of items
Definition scan.hpp:619
void offload()
offloads the cudaFlow and executes it once
Definition cudaflow.hpp:1654
class to create a task handle over an internal node of a cudaFlow graph
Definition cuda_task.hpp:65

On the other hand, tf::cudaFlow::exclusive_scan computes an exclusive prefix sum operation. The term "exclusive" means that the i-th input element is NOT included in the i-th sum.

// creates a cudaFlow of one task to perform exclusive scan
input, input + N, output, [] __device__ (int a, int b) { return a + b; }
);
cf.offload();
// verifies the result
for(size_t i=1; i<N; i++) {
assert(output[i] == output[i-1] + input[i-1]);
}
cudaTask exclusive_scan(I first, I last, O output, C op)
similar to cudaFlow::inclusive_scan but excludes the first value
Definition scan.hpp:637

Scan a Range of Transformed Items

tf::cudaFlow::transform_inclusive_scan transforms each item in the range [first, last) and computes an inclusive prefix sum over these transformed items. The following code multiplies each item by 10 and then compute the inclusive prefix sum over 1000000 transformed items.

const size_t N = 1000000;
int* input = tf::cuda_malloc_shared<int>(N); // input vector
int* output = tf::cuda_malloc_shared<int>(N); // output vector
// initializes the data
for(size_t i=0; i<N; i++)
input[i] = rand();
}
// creates a cudaFlow of one task to inclusively scan over transformed input
input, input + N, output,
[] __device__ (int a, int b) { return a + b; }, // binary scan operator
[] __device__ (int a) { return a*10; } // unary transform operator
);
cf.offload();
// verifies the result
for(size_t i=1; i<N; i++) {
assert(output[i] == output[i-1] + input[i] * 10);
}
cudaTask transform_inclusive_scan(I first, I last, O output, B bop, U uop)
creates a task to perform parallel inclusive scan over a range of transformed items
Definition scan.hpp:655

Similarly, tf::cudaFlow::transform_exclusive_scan performs an exclusive prefix sum over a range of transformed items. The following code computes the exclusive prefix sum over 1000000 transformed items each multipled by 10.

const size_t N = 1000000;
int* input = tf::cuda_malloc_shared<int>(N); // input vector
int* output = tf::cuda_malloc_shared<int>(N); // output vector
// initializes the data
for(size_t i=0; i<N; input[i++]=rand());
// creates a cudaFlow of one task to exclusively scan over transformed input
input, input + N, output,
[] __device__ (int a, int b) { return a + b; }, // binary scan operator
[] __device__ (int a) { return a*10; } // unary transform operator
);
cf.offload();
// verifies the result
for(size_t i=1; i<N; i++) {
assert(output[i] == output[i-1] + input[i-1] * 10);
}
cudaTask transform_exclusive_scan(I first, I last, O output, B bop, U uop)
similar to cudaFlow::transform_inclusive_scan but excludes the first value
Definition scan.hpp:677

Miscellaneous Items

Parallel scan algorithms are also available in tf::cudaFlowCapturer with the same API.