Taskflow  3.2.0-Master-Branch
Loading...
Searching...
No Matches
Parallel Merge

Taskflow provides standalone template methods for merging two sorted ranges of items into a sorted range of items.

Include the Header

You need to include the header file, taskflow/cuda/algorithm/merge.hpp, for using the parallel-merge algorithm.

Merge Two Sorted Ranges of Items

tf::cuda_merge merges two sorted ranges of items into a sorted range. The following code merges two sorted arrays input_1 and input_2, each of 1000 items, into a sorted array output of 2000 items.

const size_t N = 1000;
int* input_1 = tf::cuda_malloc_shared<int>(N); // input vector 1
int* input_2 = tf::cuda_malloc_shared<int>(N); // input vector 2
int* output = tf::cuda_malloc_shared<int>(2*N); // output vector
// initializes the data
for(size_t i=0; i<N; i++) {
input_1[i] = rand()%100;
input_2[i] = rand()%100;
}
std::sort(input_1, input1 + N);
std::sort(input_2, input2 + N);
// queries the required buffer size to merge two N-element vectors
auto bytes = tf::cuda_merge_buffer_size<tf::cudaDefaultExecutionPolicy>(N, N);
auto buffer = tf::cuda_malloc_device<std::byte>(bytes);
// merge input_1 and input_2 to output
input_1, input_1 + N, input_2, input_2 + N, output,
[]__device__ (int a, int b) { return a < b; }, // comparator
buffer
);
// synchronizes the execution and verifies the result
policy.synchronize();
assert(std::is_sorted(output, output + 2*N));
// delete the buffer
tf::cuda_free(buffer);
class to define execution policy for CUDA standard algorithms
Definition cuda_execution_policy.hpp:29
T is_sorted(T... args)
void cuda_merge(P &&p, a_keys_it a_keys_first, a_keys_it a_keys_last, b_keys_it b_keys_first, b_keys_it b_keys_last, c_keys_it c_keys_first, C comp, void *buf)
performs asynchronous key-only merge over a range of keys
Definition merge.hpp:629
void cuda_free(T *ptr, int d)
frees memory on the GPU device
Definition cuda_memory.hpp:101
T sort(T... args)

The merge algorithm runs asynchronously through the stream specified in the execution policy. You need to synchronize the stream to obtain correct results. Since the GPU merge algorithm may require extra buffer to store the temporary results, you must provide a buffer of size at least bytes returned from tf::cuda_merge_buffer_size.

Attention
You must keep the buffer alive before the merge call completes.

Merge Two Sorted Ranges of Key-Value Items

tf::cuda_merge_by_key performs key-value merge over two sorted ranges in a similar way to tf::cuda_merge; additionally, it copies elements from the two ranges of values associated with the two input keys, respectively. The following code performs key-value merge over a and b:

const size_t N = 2;
int* a_keys = tf::cuda_malloc_shared<int>(N);
int* a_vals = tf::cuda_malloc_shared<int>(N);
int* b_keys = tf::cuda_malloc_shared<int>(N);
int* b_vals = tf::cuda_malloc_shared<int>(N);
int* c_keys = tf::cuda_malloc_shared<int>(2*N);
int* c_vals = tf::cuda_malloc_shared<int>(2*N);
// initializes the data
a_keys[0] = 8, a_keys[1] = 1;
a_vals[0] = 1, a_vals[1] = 2;
b_keys[0] = 3, b_keys[1] = 7;
b_vals[0] = 3, b_vals[1] = 4;
// queries the required buffer size to merge the two key-value items
auto bytes = tf::cuda_merge_buffer_size<tf::cudaDefaultExecutionPolicy>(N, N);
auto buffer = tf::cuda_malloc_device<std::byte>(bytes);
// merge keys and values of a and b to c
policy,
a_keys, a_keys+N, a_vals,
b_keys, b_keys+N, b_vals,
c_keys, c_vals,
[]__device__ (int a, int b) { return a < b; }, // comparator
buffer
);
policy.synchronize();
// now, c_keys = {1, 3, 7, 8}
// now, c_vals = {2, 3, 4, 1}
// delete the device memory
tf::cuda_free(buffer);
tf::cuda_free(a_keys);
tf::cuda_free(b_keys);
tf::cuda_free(c_keys);
tf::cuda_free(a_vals);
tf::cuda_free(b_vals);
tf::cuda_free(c_vals);
void cuda_merge_by_key(P &&p, a_keys_it a_keys_first, a_keys_it a_keys_last, a_vals_it a_vals_first, b_keys_it b_keys_first, b_keys_it b_keys_last, b_vals_it b_vals_first, c_keys_it c_keys_first, c_vals_it c_vals_first, C comp, void *buf)
performs asynchronous key-value merge over a range of keys and values
Definition merge.hpp:560

The buffer size required by tf::cuda_merge_by_key is the same as tf::cuda_merge and must be at least equal to or larger than the value returned by tf::cuda_merge_buffer_size.