// Copyright (c) 2018-2020 NVIDIA Corporation
//
// SPDX-License-Identifier: CC-BY-4.0

include::{generated}/meta/{refprefix}VK_NV_cuda_kernel_launch.adoc[]

=== Other Extension Metadata

*Last Modified Date*::
    2020-09-30
*Contributors*::
  - Eric Werness, NVIDIA

=== Description

Interoperability between APIs can sometimes create additional overhead
depending on the platform used.
This extension targets deployment of existing CUDA kernels via Vulkan, with
a way to directly upload PTX kernels and dispatch the kernels from Vulkan's
command buffer without the need to use interoperability between the Vulkan
and CUDA contexts.
However, we do encourage actual development using the native CUDA runtime
for the purpose of debugging and profiling.

The application first creates a CUDA module with flink:vkCreateCudaModuleNV,
and then creates the CUDA function entry point with
flink:vkCreateCudaFunctionNV.

To dispatch this function, the application records a launch of the kernel
into a command buffer with flink:vkCmdCudaLaunchKernelNV.

When done, the application destroys the function handle with
flink:vkDestroyCudaFunctionNV and the CUDA module handle with
flink:vkDestroyCudaModuleNV.

To reduce the impact of compilation time, this extension offers the
capability to return a binary cache from the PTX that was provided.
To retrieve it, the application first calls flink:vkGetCudaModuleCacheNV
with a `NULL` buffer pointer and a valid pointer that receives the required
cache size, and then calls the same function again with a valid buffer
pointer to retrieve the data.
The resulting cache can then be used in later runs of the application by
passing it to flink:vkCreateCudaModuleNV in place of the PTX code, which
significantly speeds up the initialization of the CUDA module.
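
The two-call size query described above can be sketched as follows.
This is a minimal illustration and not the extension's API: the function
`mock_get_module_cache` is a hypothetical stand-in that only mimics the
size and data parameters of flink:vkGetCudaModuleCacheNV, and its return
codes and the helper `fetch_module_cache` are assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache contents; a real cache is an opaque driver blob. */
static const unsigned char k_fake_cache[] = { 0xCA, 0xFE, 0xBA, 0xBE, 0x01, 0x02 };

/*
 * Hypothetical stand-in for the cache query (assumption, not the real
 * entry point). Returns 0 on success and -1 if the buffer is too small.
 */
static int mock_get_module_cache(size_t *pCacheSize, void *pCacheData)
{
    assert(pCacheSize != NULL);
    if (pCacheData == NULL) {
        /* First call: no buffer supplied, so only report the size. */
        *pCacheSize = sizeof k_fake_cache;
        return 0;
    }
    if (*pCacheSize < sizeof k_fake_cache) {
        return -1;
    }
    /* Second call: copy the data and report how many bytes were written. */
    memcpy(pCacheData, k_fake_cache, sizeof k_fake_cache);
    *pCacheSize = sizeof k_fake_cache;
    return 0;
}

/*
 * Two-call idiom: query the size, allocate, then fetch the data.
 * Returns a malloc'ed buffer the caller must free, or NULL on failure.
 */
static void *fetch_module_cache(size_t *outSize)
{
    size_t size = 0;
    void *data;

    assert(outSize != NULL);
    *outSize = 0;

    if (mock_get_module_cache(&size, NULL) != 0 || size == 0) {
        return NULL;
    }
    data = malloc(size);
    if (data == NULL) {
        return NULL;
    }
    if (mock_get_module_cache(&size, data) != 0) {
        free(data);
        return NULL;
    }
    *outSize = size;
    return data;
}
```

An application following this pattern would typically write the returned
bytes to persistent storage and release the buffer with `free`.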

As with slink:VkPipelineCache, the binary cache depends on the hardware
architecture.
The application must assume that creating a module from the cache might
fail, and must be prepared to fall back to the original PTX code as
necessary.
In most cases, the cache remains usable if the same GPU driver and
architecture are used both to generate the cache from PTX and to consume it.
With a new driver version or a different GPU architecture, the cache is
likely to become invalid.
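
The fallback from a stale cache to the original PTX could be structured as
in the sketch below.
Everything in it is hypothetical: `mock_create_module` stands in for module
creation, the single `driverTag` byte is an invented way to simulate a
driver or architecture mismatch, and the `.version` prefix check is only a
crude way for the mock to recognize PTX text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Which input ended up producing the module. */
typedef enum { SOURCE_NONE, SOURCE_CACHE, SOURCE_PTX } ModuleSource;

#define MOCK_PTX_PREFIX ".version"

/*
 * Hypothetical module creation (assumption, not the real entry point).
 * Accepts PTX-like text, or a cache blob whose first byte matches the
 * simulated driver tag. Returns 0 on success and -1 on failure.
 */
static int mock_create_module(const void *data, size_t size,
                              unsigned char driverTag)
{
    const unsigned char *bytes = data;
    size_t prefixLen = strlen(MOCK_PTX_PREFIX);

    if (data == NULL || size == 0) {
        return -1;
    }
    if (size >= prefixLen && memcmp(bytes, MOCK_PTX_PREFIX, prefixLen) == 0) {
        return 0; /* PTX text always compiles in this mock */
    }
    return bytes[0] == driverTag ? 0 : -1;
}

/* Try the cache first; if it is absent or rejected, fall back to PTX. */
static ModuleSource create_module_with_fallback(const void *cache,
                                                size_t cacheSize,
                                                const char *ptx,
                                                unsigned char driverTag)
{
    assert(ptx != NULL);

    if (cache != NULL && cacheSize > 0 &&
        mock_create_module(cache, cacheSize, driverTag) == 0) {
        return SOURCE_CACHE;
    }
    if (mock_create_module(ptx, strlen(ptx), driverTag) == 0) {
        return SOURCE_PTX;
    }
    return SOURCE_NONE;
}
```

Keeping the PTX available alongside the cache is what makes this fallback
possible; an application that ships only the cache has nothing to fall back
to when the cache is rejected.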

include::{generated}/interfaces/VK_NV_cuda_kernel_launch.adoc[]

=== Issues

None.

=== Version History

  * Revision 1, 2020-03-01 (Tristan Lorach)
  * Revision 2, 2020-09-30 (Tristan Lorach)
