===============================================
Memory Tagging Extension (MTE) in AArch64 Linux
===============================================

Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
         Catalin Marinas <catalin.marinas@arm.com>

Date: 2020-02-25

This document describes the provision of the Memory Tagging Extension
functionality in AArch64 Linux.

Introduction
============

ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
(Top Byte Ignore) feature and allows software to access a 4-bit
allocation tag for each 16-byte granule in the physical address space.
Such a memory range must be mapped with the Normal-Tagged memory
attribute. A logical tag is derived from bits 59-56 of the virtual
address used for the memory access. A CPU with MTE enabled will compare
the logical tag against the allocation tag and potentially raise an
exception on mismatch, subject to system register configuration.

Userspace Support
=================

When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
supported by the hardware, the kernel advertises the feature to
userspace via ``HWCAP2_MTE``.

PROT_MTE
--------

To access the allocation tags, a user process must enable the Tagged
memory attribute on an address range using a new ``prot`` flag for
``mmap()`` and ``mprotect()``:

``PROT_MTE`` - Pages allow access to the MTE allocation tags.

The allocation tag is set to 0 when such pages are first mapped in the
user address space and preserved on copy-on-write. ``MAP_SHARED`` is
supported and the allocation tags can be shared between processes.

**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
types of mapping will result in ``-EINVAL`` returned by these system
calls.
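
The tag layout described in the Introduction (a 4-bit logical tag in bits
59-56 of the virtual address, one allocation tag per 16-byte granule) can be
sketched with plain integer arithmetic. This is only an illustration: the
helper names ``mte_logical_tag()``, ``mte_with_tag()`` and ``mte_granules()``
and the ``MTE_*`` macros are hypothetical and not part of any kernel or libc
header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values follow the layout described above. */
#define MTE_GRANULE_SIZE 16u    /* bytes covered by one allocation tag */
#define MTE_TAG_SHIFT    56     /* logical tag occupies bits 59-56 */
#define MTE_TAG_MASK     0xfULL

/* Extract the 4-bit logical tag from a virtual address. */
static inline unsigned int mte_logical_tag(uint64_t addr)
{
    return (unsigned int)((addr >> MTE_TAG_SHIFT) & MTE_TAG_MASK);
}

/* Return addr with its logical tag replaced by tag (must fit in 4 bits). */
static inline uint64_t mte_with_tag(uint64_t addr, unsigned int tag)
{
    assert(tag <= MTE_TAG_MASK);
    addr &= ~(MTE_TAG_MASK << MTE_TAG_SHIFT);
    return addr | ((uint64_t)tag << MTE_TAG_SHIFT);
}

/* Number of 16-byte granules (and so allocation tags) covering len bytes. */
static inline size_t mte_granules(size_t len)
{
    return (len + MTE_GRANULE_SIZE - 1) / MTE_GRANULE_SIZE;
}
```

For instance, a 4096-byte ``PROT_MTE`` mapping would be covered by
``mte_granules(4096)``, that is 256, allocation tags. Changing the logical tag
this way does not touch the allocation tag in memory; that still requires an
instruction such as ``STG``.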

**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
be cleared by ``mprotect()``.

**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
point after the system call.

Tag Check Faults
----------------

When ``PROT_MTE`` is enabled on an address range and a mismatch between
the logical and allocation tags occurs on access, there are three
configurable behaviours:

- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
  tag check fault.

- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
  by the offending thread, the containing process is terminated with a
  ``coredump``.

- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
  thread, asynchronously following one or multiple tag check faults,
  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
  address is unknown).

The user can select the above modes, per thread, using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
``flags`` contains one of the following values in the ``PR_MTE_TCF_MASK``
bit-field:

- ``PR_MTE_TCF_NONE`` - *Ignore* tag check faults
- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode
- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode

The current tag check fault mode can be read using the
``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.

Tag checking can also be disabled for a user thread by setting the
``PSTATE.TCO`` bit with ``MSR TCO, #1``.

**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
``sigreturn()``.
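
As a rough sketch of how the ``PR_MTE_TCF_MASK`` bit-field might be handled,
the helpers below change or decode the tag check fault mode in a control word
such as the one returned by ``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)``.
The macro values are the ones listed in the example at the end of this
document; ``mte_set_tcf()`` and ``mte_tcf_name()`` are hypothetical helper
names, not an existing API.

```c
#include <assert.h>

/* Values as listed for include/uapi/linux/prctl.h in this document. */
#define PR_MTE_TCF_SHIFT 1
#define PR_MTE_TCF_NONE  (0UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_SYNC  (1UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_ASYNC (2UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_MASK  (3UL << PR_MTE_TCF_SHIFT)

/* Replace the tag check fault mode in ctrl, leaving all other bits as-is. */
static unsigned long mte_set_tcf(unsigned long ctrl, unsigned long tcf)
{
    assert((tcf & ~PR_MTE_TCF_MASK) == 0);
    return (ctrl & ~PR_MTE_TCF_MASK) | tcf;
}

/* Name the mode encoded in a control word (hypothetical helper). */
static const char *mte_tcf_name(unsigned long ctrl)
{
    switch (ctrl & PR_MTE_TCF_MASK) {
    case PR_MTE_TCF_NONE:
        return "ignore";
    case PR_MTE_TCF_SYNC:
        return "sync";
    case PR_MTE_TCF_ASYNC:
        return "async";
    default:
        return "unknown";
    }
}
```

A thread could read its current control word, pass it through
``mte_set_tcf()`` and hand the result back to
``prctl(PR_SET_TAGGED_ADDR_CTRL, ...)``, so that only the fault mode changes.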

**Note**: There are no *match-all* logical tags available for user
applications.

**Note**: Kernel accesses to the user address space (e.g. ``read()``
system call) are not checked if the user thread tag checking mode is
``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
address accesses; however, it cannot always guarantee it. Kernel accesses
to user addresses are always performed with an effective ``PSTATE.TCO``
value of zero, regardless of the user configuration.

Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
-----------------------------------------------------------------

The architecture allows certain tags to be excluded from random
generation via the ``GCR_EL1.Exclude`` register bit-field. By default,
Linux excludes all tags other than 0. A user thread can enable specific
tags in the randomly generated set using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
``flags`` contains the tags bitmap in the ``PR_MTE_TAG_MASK`` bit-field.

**Note**: The hardware uses an exclude mask but the ``prctl()``
interface provides an include mask. An include mask of ``0`` (exclusion
mask ``0xffff``) results in the CPU always generating tag ``0``.

Initial process state
---------------------

On ``execve()``, the new process has the following configuration:

- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
- Tag checking mode set to ``PR_MTE_TCF_NONE``
- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
- ``PSTATE.TCO`` set to 0
- ``PROT_MTE`` not set on any of the initial memory maps

On ``fork()``, the new process inherits the parent's configuration and
memory map attributes with the exception of the ``madvise()`` ranges
with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set
to 0).
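
The initial ``PR_MTE_TAG_MASK`` value of 0 corresponds to an exclude mask of
``0xffff``, as stated in the note on include and exclude masks above. A
minimal sketch of that relationship follows. It assumes that bit ``n`` of the
include mask stands for tag ``n`` and that the exclude mask is the 16-bit
complement of the include mask; ``mte_tag_field()`` and
``mte_exclude_mask()`` are hypothetical helper names.

```c
#include <assert.h>

/* Values as listed for include/uapi/linux/prctl.h in this document. */
#define PR_MTE_TAG_SHIFT 3
#define PR_MTE_TAG_MASK  (0xffffUL << PR_MTE_TAG_SHIFT)

/* Place a 16-bit include mask into the PR_MTE_TAG_MASK field of flags. */
static unsigned long mte_tag_field(unsigned long include)
{
    assert(include <= 0xffffUL);
    return (include << PR_MTE_TAG_SHIFT) & PR_MTE_TAG_MASK;
}

/* Exclude mask matching an include mask: its complement within 16 bits. */
static unsigned long mte_exclude_mask(unsigned long include)
{
    return ~include & 0xffffUL;
}
```

With an include mask of ``0xfffe`` (all non-zero tags), the only excluded tag
is 0, and ``mte_tag_field(0xfffe)`` gives the value to OR into ``flags``.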

The ``ptrace()`` interface
--------------------------

``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
the tags from or set the tags to a tracee's address space. The
``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
data)`` where:

- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``.
- ``pid`` - the tracee's PID.
- ``addr`` - address in the tracee's address space.
- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
  a buffer of ``iov_len`` length in the tracer's address space.

The tags in the tracer's ``iov_base`` buffer are represented as one
4-bit tag per byte and correspond to a 16-byte MTE tag granule in the
tracee's address space.

**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
will use the corresponding aligned address.

``ptrace()`` return value:

- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
  number of tags transferred. This may be smaller than the requested
  ``iov_len`` if the requested address range in the tracee's or the
  tracer's space cannot be accessed or does not have valid tags.
- ``-EPERM`` - the specified process cannot be traced.
- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
  address) and no tags copied. ``iov_len`` not updated.
- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
  mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.

**Note**: There are no transient errors for the requests above, so user
programs should not retry in case of a non-zero system call return.
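
Because the tag buffer holds one byte per 16-byte granule, a tracer has to
size ``iov_len`` from the tracee range it is interested in. The sketch below
shows one way this could be computed. It is an assumption about how a tracer
might prepare its buffer, not a description of kernel behaviour, and
``mte_granule_base()`` and ``mte_tag_buf_len()`` are hypothetical helper
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTE_GRANULE_SIZE 16UL   /* bytes per MTE tag granule */

/* Round addr down to the start of its 16-byte granule. */
static uint64_t mte_granule_base(uint64_t addr)
{
    return addr & ~(uint64_t)(MTE_GRANULE_SIZE - 1);
}

/*
 * Bytes needed in iov_base to hold one tag per granule touched by the
 * tracee range [addr, addr + len).
 */
static size_t mte_tag_buf_len(uint64_t addr, size_t len)
{
    uint64_t start, end;

    assert(len > 0);
    start = mte_granule_base(addr);
    end = mte_granule_base(addr + len - 1) + MTE_GRANULE_SIZE;
    return (size_t)((end - start) / MTE_GRANULE_SIZE);
}
```

A tracer could allocate ``mte_tag_buf_len(addr, len)`` bytes, set ``iov_len``
to the same value, and then check the updated ``iov_len`` after the call,
since fewer tags than requested may have been transferred.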

``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr ==
NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged
address ABI control and MTE configuration of a process as per the
``prctl()`` options described in
Documentation/arm64/tagged-address-abi.rst and above. The corresponding
``regset`` is 1 element of 8 bytes (``sizeof(long)``).

Example of correct usage
========================

*MTE Example code*

.. code-block:: c

    /*
     * To be compiled with -march=armv8.5-a+memtag
     */
    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/auxv.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    /*
     * From arch/arm64/include/uapi/asm/hwcap.h
     */
    #define HWCAP2_MTE              (1 << 18)

    /*
     * From arch/arm64/include/uapi/asm/mman.h
     */
    #define PROT_MTE                 0x20

    /*
     * From include/uapi/linux/prctl.h
     */
    #define PR_SET_TAGGED_ADDR_CTRL 55
    #define PR_GET_TAGGED_ADDR_CTRL 56
    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
    # define PR_MTE_TCF_SHIFT       1
    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TAG_SHIFT       3
    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)

    /*
     * Insert a random logical tag into the given pointer.
     */
    #define insert_random_tag(ptr) ({                       \
            uint64_t __val;                                 \
            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
            __val;                                          \
    })

    /*
     * Set the allocation tag on the destination address.
     */
    #define set_tag(tagged_addr) do {                                      \
            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
    } while (0)

    int main(void)
    {
            unsigned char *a;
            unsigned long page_sz = sysconf(_SC_PAGESIZE);
            unsigned long hwcap2 = getauxval(AT_HWCAP2);

            /* check if MTE is present */
            if (!(hwcap2 & HWCAP2_MTE))
                    return EXIT_FAILURE;

            /*
             * Enable the tagged address ABI, synchronous MTE tag check faults and
             * allow all non-zero tags in the randomly generated set.
             */
            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT),
                      0, 0, 0)) {
                    perror("prctl() failed");
                    return EXIT_FAILURE;
            }

            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (a == MAP_FAILED) {
                    perror("mmap() failed");
                    return EXIT_FAILURE;
            }

            /*
             * Enable MTE on the above anonymous mmap. The flag could be passed
             * directly to mmap() and skip this step.
             */
            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
                    perror("mprotect() failed");
                    return EXIT_FAILURE;
            }

            /* access with the default tag (0) */
            a[0] = 1;
            a[1] = 2;

            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /* set the logical and allocation tags */
            a = (unsigned char *)insert_random_tag(a);
            set_tag(a);

            /* %p expects a void pointer */
            printf("%p\n", (void *)a);

            /* non-zero tag access */
            a[0] = 3;
            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /*
             * If MTE is enabled correctly the next instruction will generate an
             * exception.
             */
            printf("Expecting SIGSEGV...\n");
            a[16] = 0xdd;

            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
            printf("...haven't got one\n");

            return EXIT_FAILURE;
    }