cudaFlow provides template methods to create parallel merge tasks on a CUDA GPU.
Include the Header
You need to include the header file, taskflow/cuda/algorithm/merge.hpp
, for creating a parallel-merge task.
Merge two Sorted Ranges of Items
tf::cudaFlow::merge performs a parallel merge over two ranges of elements into a sorted range of items. The following code merges two sorted arrays input_1
and input_2
, each of 1000 items, into a sorted array output
of 2000 items.
const size_t N = 1000;
int* input_1 = tf::cuda_malloc_shared<int>(N);
int* input_2 = tf::cuda_malloc_shared<int>(N);
int* output = tf::cuda_malloc_shared<int>(2*N);
for(size_t i=0; i<N; i++) {
input_1[i] = rand()%100;
input_2[i] = rand()%100;
}
input_1, input_1 + N, input_2, input_2 + N, output,
[]__device__ (int a, int b) { return a < b; }
);
class to create a cudaFlow task dependency graph
Definition cudaflow.hpp:56
void offload()
offloads the cudaFlow and executes it once
Definition cudaflow.hpp:1654
cudaTask merge(A a_first, A a_last, B b_first, B b_last, C c_first, Comp comp)
creates a task to perform parallel merge on two sorted arrays
Definition merge.hpp:652
class to create a task handle over an internal node of a cudaFlow graph
Definition cuda_task.hpp:65
Merge two Sorted Ranges of Key-Value Items
tf::cudaFlow::merge_by_key performs key-value merge over two sorted ranges in a similar way to tf::cuda_merge; additionally, it copies elements from the two ranges of values associated with two input keys, respectively. The following code performs key-value merge over a
and b:
const size_t N = 2;
int* a_keys = tf::cuda_malloc_shared<int>(N);
int* a_vals = tf::cuda_malloc_shared<int>(N);
int* b_keys = tf::cuda_malloc_shared<int>(N);
int* b_vals = tf::cuda_malloc_shared<int>(N);
int* c_keys = tf::cuda_malloc_shared<int>(2*N);
int* c_vals = tf::cuda_malloc_shared<int>(2*N);
a_keys[0] = 8, a_keys[1] = 1;
a_vals[0] = 1, a_vals[1] = 2;
b_keys[0] = 3, b_keys[1] = 7;
b_vals[0] = 3, b_vals[1] = 4;
a_keys, a_keys+N, a_vals,
b_keys, b_keys+N, b_vals,
c_keys, c_vals,
[] __device__ (int a, int b) { return a < b; },
);
cudaTask merge_by_key(a_keys_it a_keys_first, a_keys_it a_keys_last, a_vals_it a_vals_first, b_keys_it b_keys_first, b_keys_it b_keys_last, b_vals_it b_vals_first, c_keys_it c_keys_first, c_vals_it c_vals_first, C comp)
creates a task to perform parallel key-value merge
Definition merge.hpp:679
void cuda_free(T *ptr, int d)
frees memory on the GPU device
Definition cuda_memory.hpp:101
Miscellaneous Items
Parallel merge algorithms are also available in tf::cudaFlowCapturer with the same API.