Taskflow
3.2.0-Master-Branch
To compile Taskflow with CUDA code, you need the nvcc compiler, which ships with the CUDA Toolkit. Please visit the official CUDA Toolkit download page to install it.
Taskflow's GPU programming interface for CUDA is tf::cudaFlow. Consider the following simple.cu program that launches a single kernel function to output a message:
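A minimal sketch of such a program, written out through a shell heredoc so it can be pasted into a terminal. It assumes the Taskflow 3.2 header layout (taskflow/cudaflow.hpp) and uses cudaFlow's single_task to run the kernel with one GPU thread; the task names and the message are illustrative choices.

```shell
# Write a hypothetical simple.cu into the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the file.
cat > simple.cu <<'EOF'
#include <cstdio>
#include <taskflow/taskflow.hpp>
#include <taskflow/cudaflow.hpp>  // needed for tf::cudaFlow

int main() {
  tf::Executor executor;
  tf::Taskflow taskflow;

  // An empty CPU task that runs before the GPU task.
  tf::Task task1 = taskflow.emplace([](){}).name("cpu task");

  // A GPU task: the cudaFlow launches one kernel that prints a message.
  tf::Task task2 = taskflow.emplace([](tf::cudaFlow& cf){
    cf.single_task([] __device__ () {
      printf("hello cudaFlow!\n");
    }).name("kernel");
  }).name("gpu task");

  task1.precede(task2);
  executor.run(taskflow).wait();
  return 0;
}
EOF
```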
The easiest way to compile Taskflow with CUDA code (e.g., cudaFlow, kernels) is to use nvcc:
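As a sketch, the command might look like the one here. TASKFLOW_DIR is a placeholder for wherever the Taskflow headers live, and -std=c++17 is an assumption based on Taskflow 3.x requiring C++17. The guard only skips the build on machines that lack nvcc or simple.cu.

```shell
# Placeholder path: point TASKFLOW_DIR at your Taskflow checkout.
TASKFLOW_DIR="${TASKFLOW_DIR:-path/to/taskflow}"

if command -v nvcc >/dev/null 2>&1 && [ -f simple.cu ]; then
  # --extended-lambda lets nvcc compile the __device__ lambda in simple.cu.
  nvcc -std=c++17 -I "$TASKFLOW_DIR" --extended-lambda simple.cu -o simple \
    && ./simple
else
  echo "skipped: nvcc or simple.cu not found"
fi
```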
You need to include taskflow/cudaflow.hpp in order to use tf::cudaFlow.
Large GPU applications often compile a program into separate objects and link them together to form an executable or a library. You can compile your CPU code and GPU code separately with Taskflow using nvcc and other compilers (such as g++ and clang++). Consider the following example that defines two tasks in two different source files (main.cpp and cudaflow.cpp):
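One way the two files could look, again written out through heredocs. The function name make_cudaflow and the printed messages are hypothetical; the only requirement is that main.cpp contains no device code, so a CPU-only compiler can build it.

```shell
# main.cpp: CPU-only code. It declares, but does not define, the GPU task factory.
cat > main.cpp <<'EOF'
#include <iostream>
#include <taskflow/taskflow.hpp>

// Defined in cudaflow.cpp; adds a cudaFlow task to the given taskflow.
tf::Task make_cudaflow(tf::Taskflow& taskflow);

int main() {
  tf::Executor executor;
  tf::Taskflow taskflow;

  tf::Task task1 = taskflow.emplace([](){ std::cout << "main.cpp!\n"; })
                           .name("cpu task");
  tf::Task task2 = make_cudaflow(taskflow);

  task1.precede(task2);
  executor.run(taskflow).wait();
  return 0;
}
EOF

# cudaflow.cpp: contains the __device__ lambda, so it must go through nvcc.
cat > cudaflow.cpp <<'EOF'
#include <cstdio>
#include <taskflow/taskflow.hpp>
#include <taskflow/cudaflow.hpp>

tf::Task make_cudaflow(tf::Taskflow& taskflow) {
  return taskflow.emplace([](tf::cudaFlow& cf){
    cf.single_task([] __device__ () {
      printf("cudaflow.cpp!\n");
    }).name("kernel");
  }).name("gpu task");
}
EOF
```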
Compile each source to an object (using g++ as the CPU compiler, for example):
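A sketch of the two compile steps, assuming the sources are named main.cpp and cudaflow.cpp as in the text. TASKFLOW_DIR is a placeholder path and -std=c++17 is an assumption; each step is skipped if its compiler or source file is missing.

```shell
# Placeholder path: point TASKFLOW_DIR at your Taskflow checkout.
TASKFLOW_DIR="${TASKFLOW_DIR:-path/to/taskflow}"

# CPU-only source: any C++17 host compiler will do.
if command -v g++ >/dev/null 2>&1 && [ -f main.cpp ]; then
  g++ -std=c++17 -I "$TASKFLOW_DIR" -c main.cpp -o main.o
fi

# Source with device code: compile with nvcc.
if command -v nvcc >/dev/null 2>&1 && [ -f cudaflow.cpp ]; then
  nvcc -std=c++17 --extended-lambda -x cu -I "$TASKFLOW_DIR" \
       -dc cudaflow.cpp -o cudaflow.o
fi
```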
The --extended-lambda option tells nvcc to generate GPU code for lambdas defined with __device__. The -x cu option tells nvcc to treat the input files as .cu files containing both CPU and GPU code. By default, nvcc treats .cpp files as CPU-only code. This option is required to have nvcc generate device code here, and it is also a handy way to avoid renaming source files in larger projects. The -dc option tells nvcc to generate relocatable device code for later linking.
You may also need to specify the target architecture with the -arch option so that nvcc generates code for a compatible SM architecture. For instance, the following command generates device code for compute capability 7.5 or later:
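A sketch of such a command, using -arch=sm_75 for compute capability 7.5. The include path is a placeholder and -std=c++17 is an assumption; the guard skips the step when nvcc or cudaflow.cpp is not available.

```shell
# Placeholder path: point TASKFLOW_DIR at your Taskflow checkout.
TASKFLOW_DIR="${TASKFLOW_DIR:-path/to/taskflow}"

if command -v nvcc >/dev/null 2>&1 && [ -f cudaflow.cpp ]; then
  # -arch=sm_75 targets devices of compute capability 7.5.
  nvcc -std=c++17 --extended-lambda -x cu -arch=sm_75 -I "$TASKFLOW_DIR" \
       -dc cudaflow.cpp -o cudaflow.o
fi
```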
Linking compiled object code with nvcc requires nothing special: replace the normal compiler with nvcc, and it takes care of all the necessary steps:
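A sketch of the link step, assuming main.o and cudaflow.o were produced as described earlier; the executable name main is an arbitrary choice.

```shell
if command -v nvcc >/dev/null 2>&1 && [ -f main.o ] && [ -f cudaflow.o ]; then
  # nvcc links both the device code and the CPU code, then we run the result.
  nvcc main.o cudaflow.o -o main && ./main
else
  echo "skipped: nvcc or object files not found"
fi
```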
You can choose to use a compiler other than nvcc for the final link step. Since your CPU compiler does not know how to link CUDA device code, you have to add a step in your build to have nvcc link the CUDA device code, using the option -dlink:
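A sketch of the device-link step, assuming the object names used so far. If the objects were compiled with an explicit -arch, the same -arch would typically need to be passed here as well.

```shell
if command -v nvcc >/dev/null 2>&1 && [ -f main.o ] && [ -f cudaflow.o ]; then
  # -dlink links only the device code of the given objects into gpuCode.o.
  nvcc -dlink main.o cudaflow.o -o gpuCode.o
else
  echo "skipped: nvcc or object files not found"
fi
```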
This step links all the device object code and places it into gpuCode.o. Note that it does not link the CPU object code in main.o and cudaflow.o.
To complete the link to an executable, you can use, for example, ld or g++.
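A sketch of the final link with g++. The library directory /usr/local/cuda/lib64 is an assumption (the usual default on Linux) and can be overridden through the hypothetical CUDA_LIB_DIR variable. Libraries are listed after the objects so the linker resolves their symbols in order.

```shell
# Assumed default location of the CUDA runtime library on Linux.
CUDA_LIB_DIR="${CUDA_LIB_DIR:-/usr/local/cuda/lib64}"

if command -v g++ >/dev/null 2>&1 \
   && [ -f gpuCode.o ] && [ -f main.o ] && [ -f cudaflow.o ]; then
  # -lcudart links the CUDA runtime; -pthread is needed for Taskflow's worker threads.
  g++ gpuCode.o main.o cudaflow.o -L "$CUDA_LIB_DIR" -lcudart -pthread -o main \
    && ./main
else
  echo "skipped: g++ or object files not found"
fi
```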
We give g++ all of the objects again because it needs the CPU object code, which is not in gpuCode.o. The device code stored in the original objects, main.o and cudaflow.o, does not conflict with the code in gpuCode.o. g++ ignores device code because it does not know how to link it, and the device code in gpuCode.o is already linked and ready to go.
The CUDA runtime library is linked automatically when we use nvcc for linking, but we must explicitly link it (-lcudart) when using another linker.