Install SYCL Compiler
To compile Taskflow with SYCL code, you need the DPC++ clang compiler, which can be acquired from Getting Started with oneAPI DPC++.
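Before building any Taskflow code, it can help to verify that the DPC++ toolchain and its device back-ends work at all. The sketch below (check_sycl.cpp is a hypothetical file name, not part of Taskflow) simply prints every SYCL device the runtime can see; it assumes a DPC++ installation where <CL/sycl.hpp> exposes the ::sycl namespace, as the examples on this page do.
// check_sycl.cpp: hypothetical sanity check (not part of Taskflow) that
// prints the name of every SYCL device visible to the DPC++ runtime
#include <CL/sycl.hpp>
#include <iostream>
int main() {
  for (const auto& platform : sycl::platform::get_platforms()) {
    for (const auto& device : platform.get_devices()) {
      std::cout << device.get_info<sycl::info::device::name>() << '\n';
    }
  }
}
Compiling it with clang++ -fsycl check_sycl.cpp -o check_sycl and running the binary should list at least one device if the installation is functional.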
Compile Source Code Directly
Taskflow's GPU programming interface for SYCL is tf::syclFlow. Consider the following simple.cpp
program that performs the canonical saxpy (single-precision AX + Y) operation on a GPU:
#include <taskflow/syclflow.hpp>

constexpr size_t N = 1000000;

int main() {
  tf::Executor executor;
  tf::Taskflow taskflow;

  sycl::queue queue;
  auto X = sycl::malloc_shared<float>(N, queue);  // unified shared memory
  auto Y = sycl::malloc_shared<float>(N, queue);

  taskflow.emplace_on([&](tf::syclFlow& sf){
    tf::syclTask fillX = sf.fill(X, 1.0f, N).name("fillX");
    tf::syclTask fillY = sf.fill(Y, 2.0f, N).name("fillY");
    tf::syclTask saxpy = sf.parallel_for(sycl::range<1>(N),
      [=] (sycl::id<1> id) {
        X[id] = 3.0f * X[id] + Y[id];
      }
    ).name("saxpy");
    saxpy.succeed(fillX, fillY);  // run saxpy after the two fills
  }, queue).name("syclFlow");

  executor.run(taskflow).wait();
}
Use DPC++ clang to compile the program with the following options:
-fsycl: enable SYCL compilation mode
-fsycl-targets=nvptx64-nvidia-cuda-sycldevice: enable the CUDA target
-fsycl-unnamed-lambda: enable unnamed SYCL lambda kernels
~$ clang++ -fsycl -fsycl-unnamed-lambda \
           -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
           -I path/to/taskflow -pthread -std=c++17 simple.cpp -o simple
~$ ./simple
Attention: you need to include taskflow/syclflow.hpp in order to use tf::syclFlow.
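As a minimal illustration of that note, the sketch below (not taken from the Taskflow distribution) uses only this single include; taskflow/syclflow.hpp also pulls in the core Taskflow headers, which is why the syclflow.cpp example in the next section can define a tf::Task with the same single include.
// a minimal sketch: one syclFlow task containing a single no-op kernel
#include <taskflow/syclflow.hpp>
int main() {
  sycl::queue queue;
  tf::Executor executor;
  tf::Taskflow taskflow;
  taskflow.emplace_on([](tf::syclFlow& sf){
    sf.single_task([](){}).name("noop");  // one-thread kernel doing nothing
  }, queue).name("syclFlow");
  executor.run(taskflow).wait();
}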
Compile Source Code Separately
Large GPU applications are often built by compiling a program into separate objects and linking them together to form an executable or a library. You can compile your SYCL code into separate object files and link them into the final executable. Consider the following example, which defines two tasks in two separate source files, main.cpp and syclflow.cpp:
// main.cpp
#include <taskflow/taskflow.hpp>
#include <cstdio>
tf::Task make_syclflow(tf::Taskflow& taskflow);  // defined in syclflow.cpp

int main() {
  tf::Executor executor;
  tf::Taskflow taskflow;
  tf::Task task1 = taskflow.emplace([](){ std::printf("main.cpp!\n"); }).name("cpu task");
  tf::Task task2 = make_syclflow(taskflow);
  task1.precede(task2);  // run the cpu task before the gpu task
  executor.run(taskflow).wait();
  return 0;
}
// syclflow.cpp
#include <taskflow/syclflow.hpp>
inline sycl::queue queue;  // SYCL queue used by the syclFlow task

tf::Task make_syclflow(tf::Taskflow& taskflow) {
  return taskflow.emplace_on([](tf::syclFlow& sf){
    sf.single_task([](){}).name("kernel");  // a one-thread no-op kernel
    printf("syclflow.cpp!\n");
  }, queue).name("gpu task");
}
Compile each source to an object using DPC++ clang:
~$ clang++ -I path/to/taskflow/ -pthread -std=c++17 -c main.cpp -o main.o
~$ clang++ -fsycl -fsycl-unnamed-lambda \
-fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
-I path/to/taskflow/ -pthread -std=c++17 -c syclflow.cpp -o syclflow.o
# now we have the two compiled .o objects, main.o and syclflow.o
~$ ls
main.o syclflow.o
Next, link the two object files into the final executable:
~$ clang++ -fsycl -fsycl-unnamed-lambda \
           -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
           main.o syclflow.o -pthread -std=c++17 -o main
# run the main program
~$ ./main
main.cpp!
syclflow.cpp!