===============================================
Memory Tagging Extension (MTE) in AArch64 Linux
===============================================

Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
         Catalin Marinas <catalin.marinas@arm.com>

Date: 2020-02-25

This document describes the provision of the Memory Tagging Extension
functionality in AArch64 Linux.

Introduction
============

ARMv8.5-based processors introduce the Memory Tagging Extension (MTE)
feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
(Top Byte Ignore) feature and allows software to access a 4-bit
allocation tag for each 16-byte granule in the physical address space.
Such memory ranges must be mapped with the Normal-Tagged memory
attribute. A logical tag is derived from bits 59-56 of the virtual
address used for the memory access. A CPU with MTE enabled will compare
the logical tag against the allocation tag and potentially raise an
exception on mismatch, subject to the system register configuration.

Userspace Support
=================

When ``CONFIG_ARM64_MTE`` is selected and the Memory Tagging Extension is
supported by the hardware, the kernel advertises the feature to
userspace via ``HWCAP2_MTE``.

PROT_MTE
--------

To access the allocation tags, a user process must enable the Tagged
memory attribute on an address range using a new ``prot`` flag for
``mmap()`` and ``mprotect()``:

``PROT_MTE`` - Pages allow access to the MTE allocation tags.

The allocation tag is set to 0 when such pages are first mapped in the
user address space and preserved on copy-on-write. ``MAP_SHARED`` is
supported and the allocation tags can be shared between processes.

**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
types of mapping will result in ``-EINVAL`` being returned by these
system calls.
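
The tag layout described in the Introduction can be illustrated with
plain integer arithmetic. The following is a portable sketch that does
not touch MTE hardware; it assumes 64-bit addresses with the logical tag
in bits 59-56 and one allocation tag per 16-byte granule, and all
``mte_*`` helper names are hypothetical.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Assumed constants, taken from the description above. */
    #define MTE_TAG_SHIFT     56       /* logical tag lives in bits 59-56 */
    #define MTE_TAG_MASK      0xfULL   /* 4-bit tag */
    #define MTE_GRANULE_SIZE  16ULL    /* one allocation tag per 16 bytes */

    /* Hypothetical helper: extract the logical tag from an address. */
    static inline unsigned int mte_logical_tag(uint64_t addr)
    {
            return (unsigned int)((addr >> MTE_TAG_SHIFT) & MTE_TAG_MASK);
    }

    /* Hypothetical helper: replace the logical tag of addr with tag. */
    static inline uint64_t mte_set_logical_tag(uint64_t addr, unsigned int tag)
    {
            addr &= ~(MTE_TAG_MASK << MTE_TAG_SHIFT);
            return addr | ((uint64_t)(tag & MTE_TAG_MASK) << MTE_TAG_SHIFT);
    }

    /* Hypothetical helper: round addr down to the start of its granule. */
    static inline uint64_t mte_granule_align(uint64_t addr)
    {
            return addr & ~(MTE_GRANULE_SIZE - 1);
    }

    /* Hypothetical helper: number of allocation tags covering [addr, addr + len). */
    static inline size_t mte_granules_in_range(uint64_t addr, size_t len)
    {
            uint64_t start, end;

            if (len == 0)
                    return 0;
            start = mte_granule_align(addr);
            end = mte_granule_align(addr + len - 1) + MTE_GRANULE_SIZE;
            return (size_t)((end - start) / MTE_GRANULE_SIZE);
    }

With a page-aligned buffer, ``a[0]`` and ``a[16]`` fall in different
granules and are therefore covered by different allocation tags; a
4096-byte page holds 256 such granules.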

**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
be cleared by ``mprotect()``.

**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
point after the system call.

Tag Check Faults
----------------

When ``PROT_MTE`` is enabled on an address range and a mismatch between
the logical and allocation tags occurs on access, there are four
configurable behaviours:

- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
  tag check fault.

- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
  by the offending thread, the containing process is terminated with a
  ``coredump``.

- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
  thread, asynchronously following one or multiple tag check faults,
  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
  address is unknown).

- *Asymmetric* - Reads are handled as for synchronous mode while writes
  are handled as for asynchronous mode.

The user can select the tag check fault mode, per thread, using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags``
contains any number of the following values in the ``PR_MTE_TCF_MASK``
bit-field:

- ``PR_MTE_TCF_NONE`` - *Ignore* tag check faults
  (ignored if combined with other options)
- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode
- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode

If no modes are specified, tag check faults are ignored. If a single
mode is specified, the program will run in that mode. If multiple
modes are specified, the mode is selected as described in the "Per-CPU
preferred tag checking mode" section below; this is also the only way
for the *Asymmetric* mode to be selected, as it has no flag of its own.
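
As an illustration of how the ``flags`` word is laid out, the sketch
below packs and unpacks the fields used by ``PR_SET_TAGGED_ADDR_CTRL``
and ``PR_GET_TAGGED_ADDR_CTRL``, including the tag include mask from
the "Excluding Tags" subsection and its conversion to the hardware's
exclude mask. It only performs bit manipulation, so it runs on any
system. The bit positions are copied from the example program at the
end of this document; the ``MY_`` prefix, ``struct mte_ctrl`` and the
helper names are hypothetical.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Bit layout copied from the example program (include/uapi/linux/prctl.h). */
    #define MY_PR_TAGGED_ADDR_ENABLE (1UL << 0)
    #define MY_PR_MTE_TCF_SYNC       (1UL << 1)
    #define MY_PR_MTE_TCF_ASYNC      (2UL << 1)
    #define MY_PR_MTE_TAG_SHIFT      3
    #define MY_PR_MTE_TAG_MASK       (0xffffUL << MY_PR_MTE_TAG_SHIFT)

    /* Hypothetical decoded view of the control word. */
    struct mte_ctrl {
            bool tagged_addr_enable;
            bool sync;
            bool async;
            uint16_t tag_include;   /* bit n set: IRG may generate tag n */
    };

    /* Build the flags argument for prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0). */
    static unsigned long mte_ctrl_encode(const struct mte_ctrl *c)
    {
            unsigned long flags = 0;

            if (c->tagged_addr_enable)
                    flags |= MY_PR_TAGGED_ADDR_ENABLE;
            if (c->sync)
                    flags |= MY_PR_MTE_TCF_SYNC;
            if (c->async)
                    flags |= MY_PR_MTE_TCF_ASYNC;
            flags |= ((unsigned long)c->tag_include << MY_PR_MTE_TAG_SHIFT) &
                     MY_PR_MTE_TAG_MASK;
            return flags;
    }

    /* Split a value returned by prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0). */
    static struct mte_ctrl mte_ctrl_decode(unsigned long flags)
    {
            struct mte_ctrl c = {
                    .tagged_addr_enable = (flags & MY_PR_TAGGED_ADDR_ENABLE) != 0,
                    .sync = (flags & MY_PR_MTE_TCF_SYNC) != 0,
                    .async = (flags & MY_PR_MTE_TCF_ASYNC) != 0,
                    .tag_include = (uint16_t)((flags & MY_PR_MTE_TAG_MASK) >>
                                              MY_PR_MTE_TAG_SHIFT),
            };
            return c;
    }

    /* The hardware (GCR_EL1.Exclude) uses the complement of the include mask. */
    static uint16_t mte_include_to_exclude(uint16_t include)
    {
            return (uint16_t)(~include & 0xffff);
    }

For the settings used by the example program (tagged address ABI
enabled, both modes requested, all non-zero tags included), the encoded
value is ``0x7fff7``, and the corresponding exclude mask is ``0x0001``,
so ``IRG`` never generates tag 0.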

The current tag check fault mode can be read using the
``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.

Tag checking can also be disabled for a user thread by setting the
``PSTATE.TCO`` bit with ``MSR TCO, #1``.

**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
``sigreturn()``.

**Note**: There are no *match-all* logical tags available for user
applications.

**Note**: Kernel accesses to the user address space (e.g. ``read()``
system call) are not checked if the user thread tag checking mode is
``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
address accesses but cannot always guarantee it. Kernel accesses
to user addresses are always performed with an effective ``PSTATE.TCO``
value of zero, regardless of the user configuration.

Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
-----------------------------------------------------------------

The architecture allows excluding certain tags from being randomly
generated via the ``GCR_EL1.Exclude`` register bit-field. By default,
Linux excludes all tags other than 0. A user thread can enable specific
tags in the randomly generated set using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
``flags`` contains the tags bitmap in the ``PR_MTE_TAG_MASK`` bit-field.

**Note**: The hardware uses an exclude mask but the ``prctl()``
interface provides an include mask. An include mask of ``0`` (exclusion
mask ``0xffff``) results in the CPU always generating tag ``0``.

Per-CPU preferred tag checking mode
-----------------------------------

On some CPUs the performance of MTE in stricter tag checking modes
is similar to that of less strict tag checking modes.
This makes it worthwhile to enable stricter checks on those CPUs when a
less strict checking mode is requested, in order to gain the error
detection benefits of the stricter checks without the performance
downsides. To support this scenario, a privileged user may configure a
stricter tag checking mode as the CPU's preferred tag checking mode.

The preferred tag checking mode for each CPU is controlled by
``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a
privileged user may write the value ``async``, ``sync`` or ``asymm``. The
default preferred mode for each CPU is ``async``.

To allow a program to potentially run in the CPU's preferred tag
checking mode, the user program may set multiple tag check fault mode
bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
flags, 0, 0, 0)`` system call. If both synchronous and asynchronous
modes are requested, then asymmetric mode may also be selected by the
kernel. If the CPU's preferred tag checking mode is in the task's set
of provided tag checking modes, that mode will be selected. Otherwise,
the kernel selects one of the modes in the task's mode set using the
following preference order:

  1. Asynchronous
  2. Asymmetric
  3. Synchronous

Note that there is no way for userspace to request multiple modes and
also disable asymmetric mode.
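
The selection rules above can be modelled as a small pure function. This
is a sketch of the documented policy, not the kernel's implementation;
the ``enum mte_mode`` bit set and ``mte_select_mode()`` are hypothetical
names. It assumes, as stated above, that requesting both synchronous and
asynchronous modes implicitly adds asymmetric mode to the task's set.

.. code-block:: c

    /* Hypothetical encoding of the tag check fault modes as a bit set. */
    enum mte_mode {
            MTE_MODE_NONE  = 0,
            MTE_MODE_ASYNC = 1 << 0,
            MTE_MODE_ASYMM = 1 << 1,
            MTE_MODE_SYNC  = 1 << 2,
    };

    /*
     * Model of the documented selection policy:
     *  - an empty set means tag check faults are ignored;
     *  - requesting both sync and async implicitly allows asymm;
     *  - the CPU's preferred mode wins if it is in the task's set;
     *  - otherwise pick from the set in the order async, asymm, sync.
     */
    static enum mte_mode mte_select_mode(unsigned int requested,
                                         enum mte_mode cpu_preferred)
    {
            unsigned int set = requested;

            if ((set & MTE_MODE_SYNC) && (set & MTE_MODE_ASYNC))
                    set |= MTE_MODE_ASYMM;

            if (set == MTE_MODE_NONE)
                    return MTE_MODE_NONE;
            if (set & cpu_preferred)
                    return cpu_preferred;
            if (set & MTE_MODE_ASYNC)
                    return MTE_MODE_ASYNC;
            if (set & MTE_MODE_ASYMM)
                    return MTE_MODE_ASYMM;
            return MTE_MODE_SYNC;
    }

Under these rules, a program that requests both modes (as the example
at the end of this document does) runs in asynchronous mode on a CPU
with the default ``async`` preference, and only switches to synchronous
or asymmetric mode if a privileged user has changed that CPU's
preference.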

Initial process state
---------------------

On ``execve()``, the new process has the following configuration:

- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
- No tag checking modes are selected (tag check faults ignored)
- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
- ``PSTATE.TCO`` set to 0
- ``PROT_MTE`` not set on any of the initial memory maps

On ``fork()``, the new process inherits the parent's configuration and
memory map attributes, with the exception of the ``madvise()`` ranges
with ``MADV_WIPEONFORK``, which will have the data and tags cleared (set
to 0).

The ``ptrace()`` interface
--------------------------

``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
the tags from or set the tags to a tracee's address space. The
``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
data)`` where:

- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``.
- ``pid`` - the tracee's PID.
- ``addr`` - address in the tracee's address space.
- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
  a buffer of ``iov_len`` length in the tracer's address space.

The tags in the tracer's ``iov_base`` buffer are stored as one 4-bit tag
per byte, and each byte corresponds to one 16-byte MTE tag granule in
the tracee's address space.

**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
will use the corresponding aligned address.

``ptrace()`` return value:

- 0 - tags were copied and the tracer's ``iov_len`` was updated to the
  number of tags transferred. This may be smaller than the requested
  ``iov_len`` if the requested address range in the tracee's or the
  tracer's space cannot be accessed or does not have valid tags.
- ``-EPERM`` - the specified process cannot be traced.
- ``-EIO`` - the tracee's address range cannot be accessed (e.g.
  invalid address) and no tags copied. ``iov_len`` not updated.
- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
  mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.

**Note**: There are no transient errors for the requests above, so user
programs should not retry in case of a non-zero system call return.

``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with
``addr == NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the
tagged address ABI control and MTE configuration of a process as per the
``prctl()`` options described in
Documentation/arm64/tagged-address-abi.rst and above. The corresponding
``regset`` is 1 element of 8 bytes (``sizeof(long)``).

Example of correct usage
========================

*MTE Example code*

.. code-block:: c

    /*
     * To be compiled with -march=armv8.5-a+memtag
     */
    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/auxv.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    /*
     * Fallback definitions, used only if the system headers lack them.
     */

    /*
     * From arch/arm64/include/uapi/asm/hwcap.h
     */
    #ifndef HWCAP2_MTE
    #define HWCAP2_MTE              (1 << 18)
    #endif

    /*
     * From arch/arm64/include/uapi/asm/mman.h
     */
    #ifndef PROT_MTE
    #define PROT_MTE                 0x20
    #endif

    /*
     * From include/uapi/linux/prctl.h
     */
    #ifndef PR_SET_TAGGED_ADDR_CTRL
    #define PR_SET_TAGGED_ADDR_CTRL 55
    #define PR_GET_TAGGED_ADDR_CTRL 56
    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
    #endif
    #ifndef PR_MTE_TCF_SHIFT
    # define PR_MTE_TCF_SHIFT       1
    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TAG_SHIFT       3
    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)
    #endif

    /*
     * Insert a
     * random logical tag into the given pointer.
     */
    #define insert_random_tag(ptr) ({                       \
            uint64_t __val;                                 \
            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
            __val;                                          \
    })

    /*
     * Set the allocation tag on the destination address.
     */
    #define set_tag(tagged_addr) do {                                      \
            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
    } while (0)

    int main(void)
    {
            unsigned char *a;
            unsigned long page_sz = sysconf(_SC_PAGESIZE);
            unsigned long hwcap2 = getauxval(AT_HWCAP2);

            /* check if MTE is present */
            if (!(hwcap2 & HWCAP2_MTE))
                    return EXIT_FAILURE;

            /*
             * Enable the tagged address ABI, synchronous or asynchronous MTE
             * tag check faults (based on per-CPU preference) and allow all
             * non-zero tags in the randomly generated set.
             */
            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC |
                      (0xfffeUL << PR_MTE_TAG_SHIFT),
                      0, 0, 0)) {
                    perror("prctl() failed");
                    return EXIT_FAILURE;
            }

            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (a == MAP_FAILED) {
                    perror("mmap() failed");
                    return EXIT_FAILURE;
            }

            /*
             * Enable MTE on the above anonymous mmap. Alternatively, PROT_MTE
             * could be passed directly to mmap(), skipping this step.
             */
            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
                    perror("mprotect() failed");
                    return EXIT_FAILURE;
            }

            /* access with the default tag (0) */
            a[0] = 1;
            a[1] = 2;

            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /* set the logical and allocation tags */
            a = (unsigned char *)insert_random_tag(a);
            set_tag(a);

            printf("%p\n", a);

            /* non-zero tag access */
            a[0] = 3;
            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /*
             * a[16] lies in the second 16-byte granule, whose allocation tag
             * is still 0, while the logical tag of a is now non-zero. If MTE
             * is enabled correctly, the next access will therefore cause a
             * tag check fault.
             */
            printf("Expecting SIGSEGV...\n");
            a[16] = 0xdd;

            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
            printf("...haven't got one\n");

            return EXIT_FAILURE;
    }
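
When experimenting with the example above, it can be useful to report
how the tag check fault was delivered instead of letting the process
terminate. The sketch below assumes the ``SEGV_MTEAERR`` (8) and
``SEGV_MTESERR`` (9) values from ``include/uapi/asm-generic/siginfo.h``
and defines them only when the C library does not already do so;
``mte_fault_kind()``, ``mte_segv_handler()`` and
``install_mte_segv_handler()`` are hypothetical helpers.

.. code-block:: c

    #define _POSIX_C_SOURCE 200809L
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    #ifndef SEGV_MTEAERR
    #define SEGV_MTEAERR 8  /* asynchronous MTE tag check fault */
    #endif
    #ifndef SEGV_MTESERR
    #define SEGV_MTESERR 9  /* synchronous MTE tag check fault */
    #endif

    /* Map a SIGSEGV si_code to a short description. */
    static const char *mte_fault_kind(int si_code)
    {
            switch (si_code) {
            case SEGV_MTESERR:
                    return "synchronous MTE tag check fault\n";
            case SEGV_MTEAERR:
                    return "asynchronous MTE tag check fault\n";
            default:
                    return "other SIGSEGV\n";
            }
    }

    static void mte_segv_handler(int sig, siginfo_t *info, void *uctx)
    {
            const char *msg = mte_fault_kind(info->si_code);

            (void)sig;
            (void)uctx;
            /* write() and _exit() are async-signal-safe; printf() is not. */
            if (write(STDERR_FILENO, msg, strlen(msg)) < 0)
                    _exit(2);
            /* Treat an MTE fault as the expected outcome (exit status 0). */
            _exit(info->si_code == SEGV_MTESERR || info->si_code == SEGV_MTEAERR ?
                  0 : 1);
    }

    /* Install the handler; returns 0 on success, -1 on error (errno set). */
    static int install_mte_segv_handler(void)
    {
            struct sigaction sa;

            memset(&sa, 0, sizeof(sa));
            sa.sa_sigaction = mte_segv_handler;
            sa.sa_flags = SA_SIGINFO;
            sigemptyset(&sa.sa_mask);
            return sigaction(SIGSEGV, &sa, NULL);
    }

Calling ``install_mte_segv_handler()`` at the start of ``main()`` in the
example would, assuming MTE is active, make the program print which kind
of tag check fault it received and exit with status 0 instead of being
terminated by the signal.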