===============================================
Memory Tagging Extension (MTE) in AArch64 Linux
===============================================

Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
         Catalin Marinas <catalin.marinas@arm.com>

Date: 2020-02-25

This document describes the provision of the Memory Tagging Extension
functionality in AArch64 Linux.

Introduction
============

ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
(Top Byte Ignore) feature and allows software to access a 4-bit
allocation tag for each 16-byte granule in the physical address space.
Such a memory range must be mapped with the Normal-Tagged memory
attribute. A logical tag is derived from bits 59-56 of the virtual
address used for the memory access. A CPU with MTE enabled will compare
the logical tag against the allocation tag and potentially raise an
exception on mismatch, subject to system register configuration.
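As a minimal sketch of the bit layout described above, the helpers below
extract and replace the logical tag in bits 59-56 of an address and round
an address down to its 16-byte granule. The macro and function names are
hypothetical, chosen for illustration only; they are not part of any
kernel or libc interface.

```c
#include <stdint.h>

#define MTE_TAG_SHIFT     56      /* logical tag lives in address bits 59-56 */
#define MTE_TAG_MASK      0xfULL  /* 4-bit tag */
#define MTE_GRANULE_SIZE  16      /* one allocation tag per 16-byte granule */

/* Return the 4-bit logical tag carried in bits 59-56 of an address. */
static inline unsigned int mte_logical_tag(uint64_t addr)
{
        return (unsigned int)((addr >> MTE_TAG_SHIFT) & MTE_TAG_MASK);
}

/* Return the address with its logical tag replaced by 'tag'. */
static inline uint64_t mte_with_tag(uint64_t addr, unsigned int tag)
{
        addr &= ~(MTE_TAG_MASK << MTE_TAG_SHIFT);
        return addr | ((uint64_t)(tag & MTE_TAG_MASK) << MTE_TAG_SHIFT);
}

/* Round an address down to the start of its 16-byte tag granule. */
static inline uint64_t mte_granule_base(uint64_t addr)
{
        return addr & ~(uint64_t)(MTE_GRANULE_SIZE - 1);
}
```

These helpers only manipulate address bits. They do not read or write
allocation tags, which requires the MTE instructions and a mapping with
the Tagged memory attribute.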

Userspace Support
=================

When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
supported by the hardware, the kernel advertises the feature to
userspace via ``HWCAP2_MTE``.

PROT_MTE
--------

To access the allocation tags, a user process must enable the Tagged
memory attribute on an address range using a new ``prot`` flag for
``mmap()`` and ``mprotect()``:

``PROT_MTE`` - Pages allow access to the MTE allocation tags.

The allocation tag is set to 0 when such pages are first mapped in the
user address space and preserved on copy-on-write. ``MAP_SHARED`` is
supported and the allocation tags can be shared between processes.

**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
types of mapping will result in ``-EINVAL`` returned by these system
calls.

**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
be cleared by ``mprotect()``.

**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
point after the system call.

Tag Check Faults
----------------

When ``PROT_MTE`` is enabled on an address range and a mismatch between
the logical and allocation tags occurs on access, there are four
configurable behaviours:

- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
  tag check fault.

- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
  by the offending thread, the containing process is terminated with a
  ``coredump``.

- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
  thread, asynchronously following one or multiple tag check faults,
  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
  address is unknown).

- *Asymmetric* - Reads are handled as for synchronous mode while writes
  are handled as for asynchronous mode.

The user can select the above modes, per thread, using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags``
contains any number of the following values in the ``PR_MTE_TCF_MASK``
bit-field:

- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
                         (ignored if combined with other options)
- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode

If no modes are specified, tag check faults are ignored. If a single
mode is specified, the program will run in that mode. If multiple
modes are specified, the mode is selected as described in the "Per-CPU
preferred tag checking mode" section below.

The current tag check fault mode can be read using the
``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
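The value returned by ``PR_GET_TAGGED_ADDR_CTRL`` can be decoded with the
same ``PR_MTE_TCF_MASK`` bit-field. The sketch below assumes the caller
has already checked the ``prctl()`` return value for an error; the helper
names are hypothetical, and the fallback definitions mirror the values
used in the example at the end of this document.

```c
/* Fallback values, matching include/uapi/linux/prctl.h */
#ifndef PR_MTE_TCF_MASK
#define PR_MTE_TCF_SHIFT        1
#define PR_MTE_TCF_NONE         (0UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_SYNC         (1UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_ASYNC        (2UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_MASK         (3UL << PR_MTE_TCF_SHIFT)
#endif

/* Non-zero if no tag check fault mode is set (faults are ignored). */
static inline int mte_faults_ignored(unsigned long ctrl)
{
        return (ctrl & PR_MTE_TCF_MASK) == PR_MTE_TCF_NONE;
}

/* Non-zero if synchronous mode is among the requested modes. */
static inline int mte_sync_requested(unsigned long ctrl)
{
        return (ctrl & PR_MTE_TCF_SYNC) != 0;
}

/* Non-zero if asynchronous mode is among the requested modes. */
static inline int mte_async_requested(unsigned long ctrl)
{
        return (ctrl & PR_MTE_TCF_ASYNC) != 0;
}
```

Bits outside ``PR_MTE_TCF_MASK`` (for example the tagged address ABI
enable bit) are deliberately ignored by these helpers.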

Tag checking can also be disabled for a user thread by setting the
``PSTATE.TCO`` bit with ``MSR TCO, #1``.

**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
``sigreturn()``.

**Note**: There are no *match-all* logical tags available for user
applications.

**Note**: Kernel accesses to the user address space (e.g. ``read()``
system call) are not checked if the user thread tag checking mode is
``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
address accesses, but it cannot always guarantee it. Kernel accesses
to user addresses are always performed with an effective ``PSTATE.TCO``
value of zero, regardless of the user configuration.

Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
-----------------------------------------------------------------

The architecture allows certain tags to be excluded from random
generation via the ``GCR_EL1.Exclude`` register bit-field. By default,
Linux excludes all tags other than 0. A user thread can enable specific
tags in the randomly generated set using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
``flags`` contains the tags bitmap in the ``PR_MTE_TAG_MASK`` bit-field.

**Note**: The hardware uses an exclude mask but the ``prctl()``
interface provides an include mask. An include mask of ``0`` (exclusion
mask ``0xffff``) results in the CPU always generating tag ``0``.
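The relationship between the include mask passed to ``prctl()`` and the
hardware exclude mask can be sketched as follows. The helper names are
hypothetical; the ``PR_MTE_TAG_SHIFT`` and ``PR_MTE_TAG_MASK`` fallbacks
mirror the values used in the example at the end of this document.

```c
#include <stdint.h>

#ifndef PR_MTE_TAG_SHIFT
#define PR_MTE_TAG_SHIFT        3
#define PR_MTE_TAG_MASK         (0xffffUL << PR_MTE_TAG_SHIFT)
#endif

/*
 * Place a 16-bit include mask (bit N set means tag N may be generated)
 * into the PR_MTE_TAG_MASK bit-field of the prctl() flags.
 */
static inline unsigned long mte_tag_include_flags(uint16_t include)
{
        return ((unsigned long)include << PR_MTE_TAG_SHIFT) & PR_MTE_TAG_MASK;
}

/* The exclude mask is the bitwise complement of the include mask. */
static inline uint16_t mte_exclude_from_include(uint16_t include)
{
        return (uint16_t)~include;
}
```

For instance, an include mask of ``0xfffe`` (all non-zero tags)
corresponds to an exclude mask of ``0x0001``, so only tag 0 is excluded.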

Per-CPU preferred tag checking mode
-----------------------------------

On some CPUs the performance of MTE in stricter tag checking modes
is similar to that of less strict tag checking modes. This makes it
worthwhile to enable stricter checks on those CPUs when a less strict
checking mode is requested, in order to gain the error detection
benefits of the stricter checks without the performance downsides. To
support this scenario, a privileged user may configure a stricter
tag checking mode as the CPU's preferred tag checking mode.

The preferred tag checking mode for each CPU is controlled by
``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a
privileged user may write the value ``async``, ``sync`` or ``asymm``. The
default preferred mode for each CPU is ``async``.

To allow a program to potentially run in the CPU's preferred tag
checking mode, the user program may set multiple tag check fault mode
bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
flags, 0, 0, 0)`` system call. If both synchronous and asynchronous
modes are requested then asymmetric mode may also be selected by the
kernel. If the CPU's preferred tag checking mode is in the task's set
of provided tag checking modes, that mode will be selected. Otherwise,
the kernel selects a mode from the task's mode set using the following
preference order:

	1. Asynchronous
	2. Asymmetric
	3. Synchronous

Note that there is no way for userspace to request multiple modes and
also disable asymmetric mode.
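The selection rules above can be modelled in userspace as a small pure
function. This is an illustrative sketch of the documented behaviour
only, not the kernel's implementation; the ``MTE_MODE_*`` constants and
``mte_resolve_mode()`` are hypothetical names, and ``preferred`` is
assumed to hold exactly one mode bit.

```c
enum {
        MTE_MODE_SYNC  = 1 << 0,
        MTE_MODE_ASYNC = 1 << 1,
        MTE_MODE_ASYMM = 1 << 2,
};

/*
 * 'requested' is the set of modes the task asked for (sync and/or async).
 * 'preferred' is the CPU's preferred mode (a single MTE_MODE_* bit).
 * Returns the selected mode, or 0 if tag check faults are ignored.
 */
static unsigned int mte_resolve_mode(unsigned int requested,
                                     unsigned int preferred)
{
        unsigned int eligible = requested & (MTE_MODE_SYNC | MTE_MODE_ASYNC);

        /* Requesting both sync and async also makes asymmetric eligible. */
        if (eligible == (MTE_MODE_SYNC | MTE_MODE_ASYNC))
                eligible |= MTE_MODE_ASYMM;

        /* No modes requested: tag check faults are ignored. */
        if (!eligible)
                return 0;

        /* The CPU's preferred mode wins if the task allows it. */
        if (eligible & preferred)
                return preferred;

        /* Otherwise fall back in the documented preference order. */
        if (eligible & MTE_MODE_ASYNC)
                return MTE_MODE_ASYNC;
        if (eligible & MTE_MODE_ASYMM)
                return MTE_MODE_ASYMM;
        return MTE_MODE_SYNC;
}
```

With this model, a task that requests only synchronous mode always runs
in synchronous mode, whatever the CPU prefers.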

Initial process state
---------------------

On ``execve()``, the new process has the following configuration:

- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
- No tag checking modes are selected (tag check faults ignored)
- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
- ``PSTATE.TCO`` set to 0
- ``PROT_MTE`` not set on any of the initial memory maps

On ``fork()``, the new process inherits the parent's configuration and
memory map attributes with the exception of the ``madvise()`` ranges
with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set
to 0).

The ``ptrace()`` interface
--------------------------

``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
the tags from or set the tags to a tracee's address space. The
``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
data)`` where:

- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``.
- ``pid`` - the tracee's PID.
- ``addr`` - address in the tracee's address space.
- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
  a buffer of ``iov_len`` length in the tracer's address space.

The tags in the tracer's ``iov_base`` buffer are represented as one
4-bit tag per byte and correspond to a 16-byte MTE tag granule in the
tracee's address space.

**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
will use the corresponding aligned address.
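Since the buffer holds one byte per 16-byte granule, a tracer needs to
size it from the tracee address range it wants to cover. The sketch below
computes that size and fills in a ``struct iovec`` accordingly. The
helper names are hypothetical and the code does not itself call
``ptrace()``.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define MTE_GRANULE_SIZE  16

/*
 * Number of tag bytes needed to cover [addr, addr + len) in the tracee,
 * counting every 16-byte granule the range touches.
 */
static size_t mte_tags_for_range(uint64_t addr, size_t len)
{
        uint64_t first, last;

        if (len == 0)
                return 0;

        first = addr & ~(uint64_t)(MTE_GRANULE_SIZE - 1);
        last = (addr + len - 1) & ~(uint64_t)(MTE_GRANULE_SIZE - 1);
        return (size_t)((last - first) / MTE_GRANULE_SIZE) + 1;
}

/*
 * Describe a caller-provided tag buffer for the given tracee range.
 * 'tags' must have room for mte_tags_for_range(addr, len) bytes.
 */
static struct iovec mte_tag_iovec(unsigned char *tags, uint64_t addr,
                                  size_t len)
{
        struct iovec iov;

        iov.iov_base = tags;
        iov.iov_len = mte_tags_for_range(addr, len);
        return iov;
}
```

An unaligned 16-byte range such as ``0x1008`` to ``0x1017`` spans two
granules, so it needs two tag bytes rather than one.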

``ptrace()`` return value:

- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
  number of tags transferred. This may be smaller than the requested
  ``iov_len`` if the requested address range in the tracee's or the
  tracer's space cannot be accessed or does not have valid tags.
- ``-EPERM`` - the specified process cannot be traced.
- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
  address) and no tags copied. ``iov_len`` not updated.
- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
  mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.

**Note**: There are no transient errors for the requests above, so user
programs should not retry in case of a non-zero system call return.

``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with
``addr == NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged
address ABI control and MTE configuration of a process as per the
``prctl()`` options described in
Documentation/arm64/tagged-address-abi.rst and above. The corresponding
``regset`` is 1 element of 8 bytes (``sizeof(long)``).

Example of correct usage
========================

*MTE Example code*

.. code-block:: c

    /*
     * To be compiled with -march=armv8.5-a+memtag
     */
    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/auxv.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    /*
     * From arch/arm64/include/uapi/asm/hwcap.h
     */
    #define HWCAP2_MTE              (1 << 18)

    /*
     * From arch/arm64/include/uapi/asm/mman.h
     */
    #define PROT_MTE                 0x20

    /*
     * From include/uapi/linux/prctl.h
     */
    #define PR_SET_TAGGED_ADDR_CTRL 55
    #define PR_GET_TAGGED_ADDR_CTRL 56
    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
    # define PR_MTE_TCF_SHIFT       1
    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TAG_SHIFT       3
    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)

    /*
     * Insert a random logical tag into the given pointer.
     */
    #define insert_random_tag(ptr) ({                       \
            uint64_t __val;                                 \
            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
            __val;                                          \
    })

    /*
     * Set the allocation tag on the destination address.
     */
    #define set_tag(tagged_addr) do {                                      \
            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
    } while (0)

    int main(void)
    {
            unsigned char *a;
            unsigned long page_sz = sysconf(_SC_PAGESIZE);
            unsigned long hwcap2 = getauxval(AT_HWCAP2);

            /* check if MTE is present */
            if (!(hwcap2 & HWCAP2_MTE))
                    return EXIT_FAILURE;

            /*
             * Enable the tagged address ABI, synchronous or asynchronous MTE
             * tag check faults (based on per-CPU preference) and allow all
             * non-zero tags in the randomly generated set.
             */
            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC |
                      (0xfffe << PR_MTE_TAG_SHIFT),
                      0, 0, 0)) {
                    perror("prctl() failed");
                    return EXIT_FAILURE;
            }

            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (a == MAP_FAILED) {
                    perror("mmap() failed");
                    return EXIT_FAILURE;
            }

            /*
             * Enable MTE on the above anonymous mmap. The flag could instead
             * be passed directly to mmap(), making this step unnecessary.
             */
            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
                    perror("mprotect() failed");
                    return EXIT_FAILURE;
            }

            /* access with the default tag (0) */
            a[0] = 1;
            a[1] = 2;

            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /* set the logical and allocation tags */
            a = (unsigned char *)insert_random_tag(a);
            set_tag(a);

            printf("%p\n", a);

            /* non-zero tag access */
            a[0] = 3;
            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /*
             * If MTE is enabled correctly the next instruction will generate an
             * exception.
             */
            printf("Expecting SIGSEGV...\n");
            a[16] = 0xdd;

            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
            printf("...haven't got one\n");

            return EXIT_FAILURE;
    }