Taskflow  3.2.0-Master-Branch
Compile Taskflow with SYCL

Install SYCL Compiler

To compile Taskflow programs that contain SYCL code, you need the DPC++ clang compiler. Installation instructions are available in the Getting Started with oneAPI DPC++ guide.

Compile Source Code Directly

Taskflow's GPU programming interface for SYCL is tf::syclFlow. Consider the following simple.cpp program that performs the canonical saxpy (single-precision AX + Y) operation on a GPU:

#include <taskflow/taskflow.hpp>  // core taskflow routines
#include <taskflow/syclflow.hpp>  // core syclflow routines

int main() {

  constexpr size_t N = 1000000;  // problem size (example value)

  tf::Executor executor;
  tf::Taskflow taskflow("saxpy example");

  sycl::queue queue;

  // shared allocations that both the host and the device can access
  auto X = sycl::malloc_shared<float>(N, queue);
  auto Y = sycl::malloc_shared<float>(N, queue);

  taskflow.emplace_on([&](tf::syclFlow& sf){
    tf::syclTask fillX = sf.fill(X, 1.0f, N).name("fillX");
    tf::syclTask fillY = sf.fill(Y, 2.0f, N).name("fillY");
    tf::syclTask saxpy = sf.parallel_for(sycl::range<1>(N),
      [=] (sycl::id<1> id) {
        X[id] = 3.0f * X[id] + Y[id];
      }
    ).name("saxpy");
    saxpy.succeed(fillX, fillY);  // run saxpy only after both fills finish
  }, queue).name("syclFlow");

  executor.run(taskflow).wait();

  // release the shared allocations
  sycl::free(X, queue);
  sycl::free(Y, queue);
}
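
To confirm that the syclFlow produced the expected values, one option is to compare X against a host-side computation once executor.run(taskflow).wait() has returned. The sketch below is not part of Taskflow: saxpy_reference and all_close are hypothetical helper names, and the code runs entirely on the CPU, with std::vector standing in for the shared allocations. With the fill values used in simple.cpp (X = 1.0f, Y = 2.0f) and a multiplier of 3.0f, every element of X should equal 5.0f.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Taskflow API): host-side saxpy,
// computing x[i] = a * x[i] + y[i] over the common length of x and y.
inline void saxpy_reference(float a, std::vector<float>& x,
                            const std::vector<float>& y) {
  for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
    x[i] = a * x[i] + y[i];
  }
}

// Hypothetical helper: returns true if the first n values of data
// are all within tol of expected.
inline bool all_close(const float* data, std::size_t n,
                      float expected, float tol = 1e-6f) {
  for (std::size_t i = 0; i < n; ++i) {
    if (std::fabs(data[i] - expected) > tol) return false;
  }
  return true;
}
```

Because X is allocated with sycl::malloc_shared, the host can read it directly, so a call such as all_close(X, N, 5.0f) placed after the run completes would perform the check on the device result.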

Use DPC++ clang to compile the program with the following options:

  • -fsycl: enable SYCL compilation mode
  • -fsycl-targets=nvptx64-nvidia-cuda-sycldevice: enable CUDA target
  • -fsycl-unnamed-lambda: enable unnamed SYCL lambda kernel
# -fsycl-targets selects the CUDA target
~$ clang++ -fsycl -fsycl-unnamed-lambda \
           -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
           -I path/to/taskflow -pthread -std=c++17 simple.cpp -o simple
~$ ./simple
Attention
You need to include taskflow/syclflow.hpp in order to use tf::syclFlow.

Compile Source Code Separately

Large GPU applications are often built by compiling source files into separate objects and linking them into an executable or a library. SYCL code can be handled the same way. Consider the following example, which defines two tasks in two different source files (main.cpp and syclflow.cpp):

// main.cpp
#include <iostream>
#include <taskflow/taskflow.hpp>

tf::Task make_syclflow(tf::Taskflow& taskflow);  // create a syclFlow task (defined in syclflow.cpp)

int main() {

  tf::Executor executor;
  tf::Taskflow taskflow;

  tf::Task task1 = taskflow.emplace([](){ std::cout << "main.cpp!\n"; })
                           .name("cpu task");
  tf::Task task2 = make_syclflow(taskflow);

  task1.precede(task2);  // run the CPU task before the syclFlow task

  executor.run(taskflow).wait();

  return 0;
}
// syclflow.cpp
#include <cstdio>
#include <taskflow/taskflow.hpp>
#include <taskflow/syclflow.hpp>

inline sycl::queue queue;  // create a global sycl queue

tf::Task make_syclflow(tf::Taskflow& taskflow) {
  return taskflow.emplace_on([](tf::syclFlow& sf){
    printf("syclflow.cpp!\n");
    sf.single_task([](){}).name("kernel");  // empty kernel run by a single thread
  }, queue).name("gpu task");
}

Compile each source to an object using DPC++ clang:

~$ clang++ -I path/to/taskflow/ -pthread -std=c++17 -c main.cpp -o main.o
~$ clang++ -fsycl -fsycl-unnamed-lambda \
-fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
-I path/to/taskflow/ -pthread -std=c++17 -c syclflow.cpp -o syclflow.o
# now we have the two compiled .o objects, main.o and syclflow.o
~$ ls
main.o syclflow.o

Next, link the two object files into the final executable:

# -fsycl-targets selects the CUDA target
~$ clang++ -fsycl -fsycl-unnamed-lambda \
           -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
           main.o syclflow.o -pthread -std=c++17 -o main
# run the main program
~$ ./main
main.cpp!
syclflow.cpp!