// Copyright (c) 2018-2020 NVIDIA Corporation
//
// SPDX-License-Identifier: CC-BY-4.0

include::{generated}/meta/{refprefix}VK_NV_cuda_kernel_launch.adoc[]

=== Other Extension Metadata

*Last Modified Date*::
    2020-09-30
*Contributors*::
  - Eric Werness, NVIDIA

=== Description

Interoperability between APIs can sometimes create additional overhead,
depending on the platform used.
This extension targets deployment of existing CUDA kernels via Vulkan: it
provides a way to directly upload PTX kernels and dispatch them from Vulkan
command buffers, without the need for interoperability between the Vulkan
and CUDA contexts.
However, we still encourage actual development using the native CUDA
runtime, for the purposes of debugging and profiling.

The application first creates a CUDA module using
flink:vkCreateCudaModuleNV, then creates the CUDA function entry point with
flink:vkCreateCudaFunctionNV.

Then, to dispatch this function, the application records a command buffer
in which it launches the kernel with flink:vkCmdCudaLaunchKernelNV.

When done, the application destroys the function handle and the CUDA module
handle with flink:vkDestroyCudaFunctionNV and flink:vkDestroyCudaModuleNV,
respectively.

To reduce the impact of compilation time, this extension offers the
capability to return a binary cache built from the provided PTX.
To retrieve it, the application first calls flink:vkGetCudaModuleCacheNV
with a `NULL` buffer pointer and a valid pointer that receives the required
cache size, then calls the same function again with a valid pointer to a
buffer of that size to retrieve the data.
The resulting cache can then be used in later runs of the application by
passing it to flink:vkCreateCudaModuleNV in place of the PTX code, which
significantly speeds up initialization of the CUDA module.
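
The two-call cache query can be sketched in plain C as follows. This is not Vulkan code: to keep it self-contained, the hypothetical `get_cache_fn` callback type stands in for flink:vkGetCudaModuleCacheNV and mirrors the convention described above (a `NULL` data pointer returns only the required size), with `0` standing in for `VK_SUCCESS`.
The names `fetch_module_cache`, `fake_get_cache`, and `fake_cache` are illustrative only and are not part of this extension.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical callback type standing in for vkGetCudaModuleCacheNV.
 * Convention mirrored from the text: if data is NULL, only the required
 * size is written to *size; otherwise the cache is copied into data.
 * Returns 0 on success (VK_SUCCESS in the real API), nonzero on failure.
 */
typedef int (*get_cache_fn)(void *module, size_t *size, void *data);

/*
 * Retrieves a module's binary cache with the two-call pattern.
 * Returns a malloc'd buffer (caller frees) and stores its size in
 * *out_size, or returns NULL on any failure.
 */
static void *fetch_module_cache(get_cache_fn get_cache, void *module,
                                size_t *out_size)
{
    assert(get_cache != NULL && out_size != NULL);

    /* First call: NULL buffer, query the required size only. */
    size_t size = 0;
    if (get_cache(module, &size, NULL) != 0 || size == 0)
        return NULL;

    void *data = malloc(size);
    if (data == NULL)
        return NULL;

    /* Second call: a buffer of the queried size receives the bytes. */
    if (get_cache(module, &size, data) != 0) {
        free(data);
        return NULL;
    }
    *out_size = size;
    return data;
}

/* Fake cache source, used only to exercise fetch_module_cache. */
static const unsigned char fake_cache[] = {0xCA, 0xFE, 0x01, 0x02, 0x03};

static int fake_get_cache(void *module, size_t *size, void *data)
{
    (void)module; /* unused by the fake */
    if (data == NULL) {
        *size = sizeof fake_cache;
        return 0;
    }
    if (*size < sizeof fake_cache)
        return 1; /* this fake rejects a too-small buffer */
    memcpy(data, fake_cache, sizeof fake_cache);
    *size = sizeof fake_cache;
    return 0;
}
```

A caller would write `size_t size = 0; void *cache = fetch_module_cache(fake_get_cache, NULL, &size);` and free the buffer when done.
With the real extension, the callback would instead wrap `vkGetCudaModuleCacheNV(device, module, &size, data)` and compare its result against `VK_SUCCESS`; the returned bytes would then be supplied as the `pData` and `dataSize` of slink:VkCudaModuleCreateInfoNV instead of the PTX source.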

As with slink:VkPipelineCache, the binary cache depends on the hardware
architecture.
The application must assume that creating a module from the cache might
fail, and must handle falling back to the original PTX code as necessary.
In most cases, the cache will succeed if the same GPU driver and
architecture are used both to generate the cache from PTX and to consume it.
If the driver version changes, or if a different GPU architecture is used,
the cache is likely to become invalid.

include::{generated}/interfaces/VK_NV_cuda_kernel_launch.adoc[]

=== Issues

None.

=== Version History

  * Revision 1, 2020-03-01 (Tristan Lorach)
  * Revision 2, 2020-09-30 (Tristan Lorach)