===============
ShadowCallStack
===============

.. contents::
   :local:

Introduction
============

ShadowCallStack is an instrumentation pass, currently only implemented for
aarch64, that protects programs against return address overwrites
(e.g. stack buffer overflows). It works by saving a function's return address
to a separately allocated 'shadow call stack' in the function prolog in
non-leaf functions and loading the return address from the shadow call stack
in the function epilog. The return address is also stored on the regular stack
for compatibility with unwinders, but is otherwise unused.

The aarch64 implementation is considered production ready, and
an `implementation of the runtime`_ has been added to Android's libc
(bionic). An x86_64 implementation was evaluated using Chromium and was found
to have critical performance and security deficiencies; it was removed in
LLVM 9.0. Details on the x86_64 implementation can be found in the
`Clang 7.0.1 documentation`_.

.. _`implementation of the runtime`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/bionic/pthread_create.cpp#128
.. _`Clang 7.0.1 documentation`: https://releases.llvm.org/7.0.1/tools/clang/docs/ShadowCallStack.html

Comparison
----------

To optimize for memory consumption and cache locality, the shadow call
stack stores only an array of return addresses. This is in contrast to other
schemes, like :doc:`SafeStack`, that mirror the entire stack and trade off
higher memory consumption for shorter function prologs and epilogs with fewer
memory accesses.

`Return Flow Guard`_ is a pure software implementation of shadow call stacks
on x86_64. Like the previous implementation of ShadowCallStack on x86_64, it is
inherently racy due to the architecture's use of the stack for calls and
returns.

Intel `Control-flow Enforcement Technology`_ (CET) is a proposed hardware
extension that would add native support to use a shadow stack to store/check
return addresses at call/return time. Being a hardware implementation, it
would not suffer from race conditions and would not incur the overhead of
function instrumentation, but it does require operating system support.

.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Compatibility
-------------

A runtime is not provided in compiler-rt so one must be provided by the
compiled application or the operating system. Integrating the runtime into
the operating system should be preferred since otherwise all thread creation
and destruction would need to be intercepted by the application.

The instrumentation makes use of the platform register ``x18``. On some
platforms, ``x18`` is reserved, and on others, it is designated as a scratch
register. This generally means that any code that may run on the same thread
as code compiled with ShadowCallStack must either target one of the platforms
whose ABI reserves ``x18`` (currently Android, Darwin, Fuchsia and Windows)
or be compiled with the flag ``-ffixed-x18``. If absolutely necessary, code
compiled without ``-ffixed-x18`` may be run on the same thread as code that
uses ShadowCallStack by saving the register value temporarily on the stack
(`example in Android`_) but this should be done with care since it risks
leaking the shadow call stack address.

.. _`example in Android`: https://android-review.googlesource.com/c/platform/frameworks/base/+/803717

Because of the use of register ``x18``, the ShadowCallStack feature is
incompatible with any other feature that may use ``x18``. However, there
is no inherent reason why ShadowCallStack needs to use register ``x18``
specifically; in principle, a platform could choose to reserve and use another
register for ShadowCallStack, but this would be incompatible with the AAPCS64.

Special unwind information is required on functions that are compiled
with ShadowCallStack and that may be unwound, i.e. functions compiled with
``-fexceptions`` (which is the default in C++). Some unwinders (such as the
libgcc 4.9 unwinder) do not understand this unwind info and will segfault
when encountering it. LLVM libunwind processes this unwind info correctly,
however. This means that if exceptions are used together with ShadowCallStack,
the program must use a compatible unwinder.

Security
========

ShadowCallStack is intended to be a stronger alternative to
``-fstack-protector``. It protects from non-linear overflows and arbitrary
memory writes to the return address slot.

The instrumentation makes use of the ``x18`` register to reference the shadow
call stack, meaning that references to the shadow call stack do not have
to be stored in memory. This makes it possible to implement a runtime that
avoids exposing the address of the shadow call stack to attackers that can
read arbitrary memory. However, attackers could still try to exploit side
channels exposed by the operating system `[1]`_ `[2]`_ or processor `[3]`_
to discover the address of the shadow call stack.

.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
.. _`[3]`: https://www.vusec.net/projects/anc/

Unless care is taken when allocating the shadow call stack, it may be
possible for an attacker to guess its address using the addresses of
other allocations. Therefore, the address should be chosen to make this
difficult. One way to do this is to allocate a large guard region without
read/write permissions, randomly select a small region within it to be
used as the address of the shadow call stack and mark only that region as
read/write. This also mitigates somewhat against processor side channels.
The intent is that the Android runtime `will do this`_, but the platform will
first need to be `changed`_ to avoid using ``setrlimit(RLIMIT_AS)`` to limit
memory allocations in certain processes, as this also limits the number of
guard regions that can be allocated.

.. _`will do this`: https://android-review.googlesource.com/c/platform/bionic/+/891622
.. _`changed`: https://android-review.googlesource.com/c/platform/frameworks/av/+/837745

The runtime will need the address of the shadow call stack in order to
deallocate it when destroying the thread. If the entire program is compiled
with ``-ffixed-x18``, this is trivial: the address can be derived from the
value stored in ``x18`` (e.g. by masking out the lower bits). If a guard
region is used, the address of the start of the guard region could then be
stored at the start of the shadow call stack itself. But if it is possible
for code compiled without ``-ffixed-x18`` to run on a thread managed by the
runtime, which is the case on Android for example, the address must be stored
somewhere else instead. On Android we store the address of the start of the
guard region in TLS and deallocate the entire guard region including the
shadow call stack at thread exit. This is considered acceptable given that
the address of the start of the guard region is already somewhat guessable.

One way in which the address of the shadow call stack could leak is in the
``jmp_buf`` data structure used by ``setjmp`` and ``longjmp``. The Android
runtime `avoids this`_ by only storing the low bits of ``x18`` in the
``jmp_buf``, which requires the address of the shadow call stack to be
aligned to its size.

.. _`avoids this`: https://android.googlesource.com/platform/bionic/+/808d176e7e0dd727c7f929622ec017f6e065c582/libc/arch-arm64/bionic/setjmp.S#49

The architecture's call and return instructions (``bl`` and ``ret``) operate on
a register rather than the stack, which means that leaf functions are generally
protected from return address overwrites even without ShadowCallStack.

Usage
=====

To enable ShadowCallStack, pass the ``-fsanitize=shadow-call-stack``
flag to both compile and link command lines. On aarch64, you also need to pass
``-ffixed-x18`` unless your target already reserves ``x18``.

Low-level API
-------------

``__has_feature(shadow_call_stack)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases one may need to execute different code depending on whether
ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
be used for this purpose.

.. code-block:: c

    #if defined(__has_feature)
    #  if __has_feature(shadow_call_stack)
    // code that builds only under ShadowCallStack
    #  endif
    #endif

``__attribute__((no_sanitize("shadow-call-stack")))``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
declaration to specify that the shadow call stack instrumentation should not be
applied to that function, even if enabled globally.

Example
=======

The following example code:

.. code-block:: c++

    int bar();  // defined elsewhere

    int foo() {
      return bar() + 1;
    }

Generates the following aarch64 assembly when compiled with ``-O2``:

.. code-block:: none

    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ret

Adding ``-fsanitize=shadow-call-stack`` would output the following assembly:

.. code-block:: none

    str     x30, [x18], #8
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    bl      bar
    add     w0, w0, #1
    ldp     x29, x30, [sp], #16
    ldr     x30, [x18, #-8]!
    ret

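
The two instructions added by the instrumentation can be read as a push onto
and a pop from the shadow call stack. The following C model is only an
illustrative sketch of that behavior, not how the pass or any runtime is
implemented: ``scs_slots``, ``scs_push``, ``scs_pop`` and ``simulate_call`` are
hypothetical names, the global ``x18`` pointer merely stands in for the
register, and the overwrite of the regular stack slot is simulated with an
ordinary assignment.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical model: scs_slots stands in for the shadow call stack and
       x18 for the register that points at its next free slot. */
    enum { SCS_SLOTS = 16 };
    static uintptr_t scs_slots[SCS_SLOTS];
    static uintptr_t *x18 = scs_slots;

    /* Models "str x30, [x18], #8": store, then advance x18 by one slot. */
    static void scs_push(uintptr_t x30) {
        assert(x18 < scs_slots + SCS_SLOTS);
        *x18++ = x30;
    }

    /* Models "ldr x30, [x18, #-8]!": step x18 back one slot, then load. */
    static uintptr_t scs_pop(void) {
        assert(x18 > scs_slots);
        return *--x18;
    }

    /* Models one instrumented non-leaf call in which the copy of the return
       address on the regular stack is overwritten before the epilog runs.
       Returns the address that "ret" would branch to. */
    static uintptr_t simulate_call(uintptr_t return_address,
                                   uintptr_t attacker_value) {
        uintptr_t frame_slot;         /* the x30 slot written by stp */
        uintptr_t x30;

        scs_push(return_address);     /* prolog: str x30, [x18], #8 */
        frame_slot = return_address;  /* prolog: stp x29, x30, [sp, #-16]! */
        frame_slot = attacker_value;  /* simulated return address overwrite */
        x30 = frame_slot;             /* epilog: ldp x29, x30, [sp], #16 */
        x30 = scs_pop();              /* epilog: ldr x30, [x18, #-8]! */
        return x30;
    }

In this model, ``simulate_call(0x1000, 0xdead)`` returns ``0x1000``: the value
reloaded from the regular stack is discarded because the final load from the
shadow call stack overwrites ``x30`` just before the return, and ``x18`` ends
up back at its starting position.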