# Fast OpenGL State Transitions

Typical OpenGL programs issue a few small state change commands between draw call commands. We want
the typical app's use case to be as fast as possible, so this leads to unique performance challenges.

Vulkan is quite different from OpenGL because it requires a separate compiled
[VkPipeline][VkPipeline] for each state vector. Compiling VkPipelines is multiple orders of
magnitude slower than enabling or disabling an OpenGL render state. To speed this up we use three
levels of caching when transitioning states in the Vulkan back-end.

## L3 Cache

The outermost level is the driver's [VkPipelineCache][VkPipelineCache]. The driver
cache reduces pipeline recompilation time significantly. But even cached
pipeline recompilations are orders of magnitude slower than OpenGL state changes.

## L2 Cache

The second level cache is an ANGLE-owned hash map from OpenGL state vectors to compiled pipelines.
See [GraphicsPipelineCache][GraphicsPipelineCache] in [vk_cache_utils.h](../vk_cache_utils.h). ANGLE's
[GraphicsPipelineDesc][GraphicsPipelineDesc] class is a tightly packed 256-byte description of the
current OpenGL rendering state. We also use [xxHash](https://github.com/Cyan4973/xxHash) for the
fastest possible hash computation. The hash map speeds up state changes considerably. But it is
still significantly slower than OpenGL implementations.

## L1 Cache

To get the best performance we use a transition table from each OpenGL state vector to neighbouring
state vectors. The transition table points from GraphicsPipelineCache entries directly to
neighbouring VkPipeline objects. When the application changes state, the state change bits are
recorded into a compact bit mask that covers the GraphicsPipelineDesc state vector. Then on the next
draw call we scan the transition bit mask and compare the GraphicsPipelineDesc of the current state
vector and the state vector of the cached transition.
With the hash map we compute a hash over the
entire state vector and then do a 256-byte `memcmp` to guard against hash collisions. With the
transition table we only compare as many bytes as were changed in the transition bit mask. By
skipping the expensive hashing and `memcmp` we can get performance as good as or better than native
OpenGL drivers.

Note that the current design of the transition table stores transitions in an unsorted list. If
applications map from one state to many, this will slow down the transition time. This could be
improved in the future using a faster lookup. For instance we could keep a sorted transition table
or use a small hash map for transitions.

## L0 Cache

The currently active PSO is stored as a handle in the `ContextVk` for use between draws with no state
change.

[VkPipeline]: https://www.khronos.org/registry/vulkan/specs/1.1-extensions/man/html/VkPipeline.html
[VkPipelineCache]: https://www.khronos.org/registry/vulkan/specs/1.1-extensions/man/html/VkPipelineCache.html
[GraphicsPipelineCache]: https://chromium.googlesource.com/angle/angle/+/225f08bf85a368f905362cdd1366e4795680452c/src/libANGLE/renderer/vulkan/vk_cache_utils.h#498
[GraphicsPipelineDesc]: https://chromium.googlesource.com/angle/angle/+/225f08bf85a368f905362cdd1366e4795680452c/src/libANGLE/renderer/vulkan/vk_cache_utils.h#244
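
## Appendix: Transition Lookup Sketch

The L1 transition lookup described above can be illustrated with a small, self-contained sketch. This is not ANGLE's actual code: the names `PipelineEntry`, `Transition`, `markDirty`, `chunksMatch` and `findTransition`, the 4-byte chunk size, and the integer `pipelineHandle` standing in for a `VkPipeline` are all assumptions made for illustration. It shows only the core idea: a dirty bit mask selects which parts of the 256-byte description get compared, so a hit needs neither a hash nor a full `memcmp`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout: a 256-byte state vector tracked in 4-byte chunks,
// so a single 64-bit mask covers the whole description.
constexpr std::size_t kDescSize   = 256;
constexpr std::size_t kChunkSize  = 4;
constexpr std::size_t kChunkCount = kDescSize / kChunkSize;
static_assert(kChunkCount == 64, "the mask must cover every chunk");

using PipelineDesc   = std::array<std::uint8_t, kDescSize>;
using TransitionBits = std::uint64_t;

struct PipelineEntry;

// One edge of the transition table: which chunks changed and where that leads.
struct Transition
{
    TransitionBits bits;
    const PipelineEntry *target;
};

// Stand-in for a cache entry that owns a compiled pipeline.
struct PipelineEntry
{
    PipelineDesc desc{};
    int pipelineHandle = 0;               // placeholder for a VkPipeline
    std::vector<Transition> transitions;  // unsorted, scanned linearly
};

// Record that bytes [offset, offset + size) of the description were modified.
inline void markDirty(TransitionBits &bits, std::size_t offset, std::size_t size)
{
    assert(size > 0 && offset + size <= kDescSize);
    const std::size_t first = offset / kChunkSize;
    const std::size_t last  = (offset + size - 1) / kChunkSize;
    for (std::size_t chunk = first; chunk <= last; ++chunk)
    {
        bits |= TransitionBits{1} << chunk;
    }
}

// Compare only the chunks named in `bits`; untouched chunks are skipped.
inline bool chunksMatch(const PipelineDesc &a, const PipelineDesc &b, TransitionBits bits)
{
    for (std::size_t chunk = 0; chunk < kChunkCount; ++chunk)
    {
        if (((bits >> chunk) & 1u) != 0 &&
            std::memcmp(a.data() + chunk * kChunkSize, b.data() + chunk * kChunkSize,
                        kChunkSize) != 0)
        {
            return false;
        }
    }
    return true;
}

// `wanted` must have been derived from `current.desc` by changing only the
// chunks flagged in `bits`. Returns nullptr on a miss, in which case the
// caller would fall back to the hash map (L2).
inline const PipelineEntry *findTransition(const PipelineEntry &current,
                                           const PipelineDesc &wanted,
                                           TransitionBits bits)
{
    for (const Transition &t : current.transitions)
    {
        if (t.bits == bits && chunksMatch(t.target->desc, wanted, bits))
        {
            return t.target;
        }
    }
    return nullptr;
}
```

Because a transition is only recorded between two descriptions that differ in the flagged chunks, matching the mask and those chunks is enough to identify the target. The linear scan in `findTransition` mirrors the unsorted list noted in the L1 section, which is why a state with many outgoing transitions would be slower to resolve.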