===============================================
Memory Tagging Extension (MTE) in AArch64 Linux
===============================================

Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
         Catalin Marinas <catalin.marinas@arm.com>

Date: 2020-02-25

This document describes the provision of the Memory Tagging Extension
functionality in AArch64 Linux.

Introduction
============

ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
(Top Byte Ignore) feature and allows software to access a 4-bit
allocation tag for each 16-byte granule in the physical address space.
Such a memory range must be mapped with the Normal-Tagged memory
attribute. A logical tag is derived from bits 59-56 of the virtual
address used for the memory access. A CPU with MTE enabled will compare
the logical tag against the allocation tag and potentially raise an
exception on mismatch, subject to system register configuration.
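
As an illustration of the bit layout described above, the following
sketch extracts and replaces the logical tag held in bits 59-56 of an
address and rounds an address down to its 16-byte granule. The ``mte_*``
helper names and the ``MTE_*`` macros are hypothetical and do not come
from any kernel or C library header. The code only manipulates integers
and does not read or write allocation tags.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical names, chosen for this sketch only. */
    #define MTE_TAG_SHIFT           56
    #define MTE_TAG_MASK            ((uint64_t)0xf << MTE_TAG_SHIFT)
    #define MTE_GRANULE_SIZE        16

    /* Extract the 4-bit logical tag from bits 59-56 of an address. */
    static inline unsigned int mte_logical_tag(uint64_t addr)
    {
            return (unsigned int)((addr & MTE_TAG_MASK) >> MTE_TAG_SHIFT);
    }

    /* Return addr with its logical tag replaced by tag (0..15). */
    static inline uint64_t mte_with_tag(uint64_t addr, unsigned int tag)
    {
            return (addr & ~MTE_TAG_MASK) |
                   ((uint64_t)(tag & 0xf) << MTE_TAG_SHIFT);
    }

    /* Round addr down to the start of its 16-byte tag granule. */
    static inline uint64_t mte_granule_base(uint64_t addr)
    {
            return addr & ~(uint64_t)(MTE_GRANULE_SIZE - 1);
    }

For example, ``mte_logical_tag(0x0500ffff12345678)`` yields 5, since the
top byte of that address is ``0x05``.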

Userspace Support
=================

When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
supported by the hardware, the kernel advertises the feature to
userspace via ``HWCAP2_MTE``.
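
A minimal sketch of the corresponding runtime check is given below. The
``HWCAP2_MTE`` value is the one used in the example at the end of this
document. The helper names ``hwcap2_has_mte()`` and ``cpu_has_mte()``
are hypothetical. The bit is only meaningful on AArch64; on other
architectures the same bit position in ``AT_HWCAP2`` may mean something
else.

.. code-block:: c

    #include <stdbool.h>
    #include <sys/auxv.h>

    /* From arch/arm64/include/uapi/asm/hwcap.h, if libc lacks it. */
    #ifndef HWCAP2_MTE
    #define HWCAP2_MTE      (1UL << 18)
    #endif

    /* True if the given AT_HWCAP2 value advertises MTE. */
    static bool hwcap2_has_mte(unsigned long hwcap2)
    {
            return (hwcap2 & HWCAP2_MTE) != 0;
    }

    /* Query the hwcaps the kernel passed to this process. */
    static bool cpu_has_mte(void)
    {
            return hwcap2_has_mte(getauxval(AT_HWCAP2));
    }

A program would typically call ``cpu_has_mte()`` once at start-up and
fall back to untagged operation when it returns false.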

PROT_MTE
--------

To access the allocation tags, a user process must enable the Tagged
memory attribute on an address range using a new ``prot`` flag for
``mmap()`` and ``mprotect()``:

``PROT_MTE`` - Pages allow access to the MTE allocation tags.

The allocation tag is set to 0 when such pages are first mapped in the
user address space and preserved on copy-on-write. ``MAP_SHARED`` is
supported and the allocation tags can be shared between processes.

**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
types of mapping will result in ``-EINVAL`` returned by these system
calls.

**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
be cleared by ``mprotect()``.

**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
point after the system call.

Tag Check Faults
----------------

When ``PROT_MTE`` is enabled on an address range and a mismatch between
the logical and allocation tags occurs on access, there are three
configurable behaviours:

- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
  tag check fault.

- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
  by the offending thread, the containing process is terminated with a
  ``coredump``.

- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
  thread, asynchronously following one or multiple tag check faults,
  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
  address is unknown).
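
The two ``si_code`` values can be told apart in a ``SA_SIGINFO``
handler. The sketch below is one possible way to do so. The numeric
fallbacks for ``SEGV_MTEAERR`` and ``SEGV_MTESERR`` are assumed to match
``include/uapi/asm-generic/siginfo.h`` and are only used when the C
library headers do not provide them. The names ``classify_segv()``,
``segv_handler()`` and ``install_segv_handler()`` are hypothetical.

.. code-block:: c

    #include <signal.h>
    #include <stddef.h>
    #include <string.h>
    #include <unistd.h>

    /* Fallbacks, assumed to match asm-generic/siginfo.h. */
    #ifndef SEGV_MTEAERR
    #define SEGV_MTEAERR    8       /* asynchronous tag check fault */
    #endif
    #ifndef SEGV_MTESERR
    #define SEGV_MTESERR    9       /* synchronous tag check fault */
    #endif

    enum mte_fault { MTE_FAULT_NONE, MTE_FAULT_SYNC, MTE_FAULT_ASYNC };

    /* Map a SIGSEGV si_code to the kind of tag check fault, if any. */
    static enum mte_fault classify_segv(int si_code)
    {
            if (si_code == SEGV_MTESERR)
                    return MTE_FAULT_SYNC;
            if (si_code == SEGV_MTEAERR)
                    return MTE_FAULT_ASYNC;
            return MTE_FAULT_NONE;
    }

    /* Report the kind of fault and exit. */
    static void segv_handler(int sig, siginfo_t *info, void *ucontext)
    {
            static const char sync_msg[] = "synchronous tag check fault\n";
            static const char async_msg[] = "asynchronous tag check fault\n";
            static const char other_msg[] = "SIGSEGV not caused by MTE\n";
            const char *msg = other_msg;
            size_t len = sizeof(other_msg) - 1;

            (void)sig;
            (void)ucontext;

            switch (classify_segv(info->si_code)) {
            case MTE_FAULT_SYNC:    /* info->si_addr is the fault address */
                    msg = sync_msg;
                    len = sizeof(sync_msg) - 1;
                    break;
            case MTE_FAULT_ASYNC:   /* info->si_addr is 0 in this mode */
                    msg = async_msg;
                    len = sizeof(async_msg) - 1;
                    break;
            default:
                    break;
            }

            /* write() and _exit() are async-signal-safe, printf() is not. */
            if (write(STDERR_FILENO, msg, len) < 0)
                    _exit(2);
            _exit(1);
    }

    /* Install the handler for SIGSEGV; returns 0 on success. */
    static int install_segv_handler(void)
    {
            struct sigaction sa;

            memset(&sa, 0, sizeof(sa));
            sa.sa_sigaction = segv_handler;
            sa.sa_flags = SA_SIGINFO;
            sigemptyset(&sa.sa_mask);
            return sigaction(SIGSEGV, &sa, NULL);
    }

The handler exits rather than returning because, in the synchronous
mode, the faulting access was not performed and would simply fault
again if retried unchanged.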

The user can select the above modes, per thread, using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
``flags`` contains one of the following values in the ``PR_MTE_TCF_MASK``
bit-field:

- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode

The current tag check fault mode can be read using the
``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
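
The sketch below shows one way to decode the value returned by
``PR_GET_TAGGED_ADDR_CTRL`` and to change only the tag check fault mode
of the calling thread while keeping the other bits. The constants are
the ones listed in the example at the end of this document. The helper
names ``mte_tcf_mode_name()`` and ``set_mte_tcf_mode()`` are
hypothetical.

.. code-block:: c

    #include <sys/prctl.h>

    /* From include/uapi/linux/prctl.h, for older userspace headers. */
    #ifndef PR_SET_TAGGED_ADDR_CTRL
    #define PR_SET_TAGGED_ADDR_CTRL 55
    #endif
    #ifndef PR_GET_TAGGED_ADDR_CTRL
    #define PR_GET_TAGGED_ADDR_CTRL 56
    #endif
    #ifndef PR_TAGGED_ADDR_ENABLE
    #define PR_TAGGED_ADDR_ENABLE   (1UL << 0)
    #endif
    #ifndef PR_MTE_TCF_NONE
    #define PR_MTE_TCF_NONE         (0UL << 1)
    #endif
    #ifndef PR_MTE_TCF_SYNC
    #define PR_MTE_TCF_SYNC         (1UL << 1)
    #endif
    #ifndef PR_MTE_TCF_ASYNC
    #define PR_MTE_TCF_ASYNC        (2UL << 1)
    #endif
    #ifndef PR_MTE_TCF_MASK
    #define PR_MTE_TCF_MASK         (3UL << 1)
    #endif

    /* Name the tag check fault mode encoded in a control value. */
    static const char *mte_tcf_mode_name(unsigned long ctrl)
    {
            switch (ctrl & PR_MTE_TCF_MASK) {
            case PR_MTE_TCF_NONE:
                    return "ignore";
            case PR_MTE_TCF_SYNC:
                    return "synchronous";
            case PR_MTE_TCF_ASYNC:
                    return "asynchronous";
            default:
                    return "unknown";
            }
    }

    /*
     * Replace the tag check fault mode of the calling thread, leaving
     * the remaining control bits as they are. Returns 0 on success and
     * -1 with errno set on failure.
     */
    static int set_mte_tcf_mode(unsigned long mode)
    {
            unsigned long flags;
            int ctrl = prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0);

            if (ctrl < 0)
                    return -1;

            flags = ((unsigned long)ctrl & ~PR_MTE_TCF_MASK) |
                    (mode & PR_MTE_TCF_MASK);
            return prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0);
    }

Reading the current value first avoids accidentally clearing
``PR_TAGGED_ADDR_ENABLE`` or the tag include mask when only the fault
mode is meant to change.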

Tag checking can also be disabled for a user thread by setting the
``PSTATE.TCO`` bit with ``MSR TCO, #1``.

**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
``sigreturn()``.

**Note**: There are no *match-all* logical tags available for user
applications.

**Note**: Kernel accesses to the user address space (e.g. ``read()``
system call) are not checked if the user thread tag checking mode is
``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
address accesses, but it cannot always guarantee this. Kernel accesses
to user addresses are always performed with an effective ``PSTATE.TCO``
value of zero, regardless of the user configuration.

Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
-----------------------------------------------------------------

The architecture allows certain tags to be excluded from random
generation via the ``GCR_EL1.Exclude`` register bit-field. By default,
Linux excludes all tags other than 0. A user thread can enable specific
tags in the randomly generated set using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
``flags`` contains the tags bitmap in the ``PR_MTE_TAG_MASK`` bit-field.

**Note**: The hardware uses an exclude mask but the ``prctl()``
interface provides an include mask. An include mask of ``0`` (exclusion
mask ``0xffff``) results in the CPU always generating tag ``0``.
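
The relationship between the two masks can be sketched as follows. The
shift and mask values are the ones listed in the example at the end of
this document. The helper names ``mte_exclude_from_include()`` and
``mte_tag_flags()`` are hypothetical. The first helper only illustrates
the inversion described above and is not a statement about how the
kernel programs ``GCR_EL1``.

.. code-block:: c

    #include <stdint.h>
    #include <sys/prctl.h>

    /* From include/uapi/linux/prctl.h, for older userspace headers. */
    #ifndef PR_MTE_TAG_SHIFT
    #define PR_MTE_TAG_SHIFT        3
    #endif
    #ifndef PR_MTE_TAG_MASK
    #define PR_MTE_TAG_MASK         (0xffffUL << PR_MTE_TAG_SHIFT)
    #endif

    /* Bit n set in include means tag n may be randomly generated. */
    static inline uint16_t mte_exclude_from_include(uint16_t include)
    {
            return (uint16_t)~include;
    }

    /* Place an include mask into the PR_MTE_TAG_MASK bit-field. */
    static inline unsigned long mte_tag_flags(uint16_t include)
    {
            return ((unsigned long)include << PR_MTE_TAG_SHIFT) &
                   PR_MTE_TAG_MASK;
    }

An include mask of ``0xfffe`` (all tags except 0) therefore corresponds
to an exclude mask of ``0x0001`` and contributes ``0x7fff0`` to the
``flags`` argument.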

Initial process state
---------------------

On ``execve()``, the new process has the following configuration:

- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
- Tag checking mode set to ``PR_MTE_TCF_NONE``
- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
- ``PSTATE.TCO`` set to 0
- ``PROT_MTE`` not set on any of the initial memory maps

On ``fork()``, the new process inherits the parent's configuration and
memory map attributes with the exception of the ``madvise()`` ranges
with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set
to 0).

The ``ptrace()`` interface
--------------------------

``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
the tags from or set the tags to a tracee's address space. The
``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
data)`` where:

- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``.
- ``pid`` - the tracee's PID.
- ``addr`` - address in the tracee's address space.
- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
  a buffer of ``iov_len`` length in the tracer's address space.

The tags in the tracer's ``iov_base`` buffer are represented as one
4-bit tag per byte, each corresponding to a 16-byte MTE tag granule in
the tracee's address space.

**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
will use the corresponding aligned address.
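
As a sketch of how a tracer might size its buffer and issue the read
request, consider the helpers below. The fallback value for
``PTRACE_PEEKMTETAGS`` is assumed to match
``arch/arm64/include/uapi/asm/ptrace.h`` and is only used when the C
library headers do not provide it. The names ``mte_tags_for_range()``
and ``peek_mte_tags()`` are hypothetical. Through the C library wrapper,
a failing ``ptrace()`` call is reported as ``-1`` with ``errno`` set.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    /* Fallback, assumed to match the arm64 uapi ptrace.h. */
    #ifndef PTRACE_PEEKMTETAGS
    #define PTRACE_PEEKMTETAGS      33
    #endif

    #define MTE_GRANULE_SIZE        16UL    /* hypothetical name */

    /*
     * Number of 16-byte granules, and so of tag bytes, needed to cover
     * the range [addr, addr + len). An unaligned addr is rounded down,
     * as the kernel does.
     */
    static size_t mte_tags_for_range(uintptr_t addr, size_t len)
    {
            uintptr_t start = addr & ~(MTE_GRANULE_SIZE - 1);
            uintptr_t end = addr + len;

            if (len == 0)
                    return 0;
            return (end - start + MTE_GRANULE_SIZE - 1) / MTE_GRANULE_SIZE;
    }

    /*
     * Read up to ntags tags from the tracee, one tag per byte of tags.
     * Returns the number of tags transferred, or -1 with errno set.
     */
    static long peek_mte_tags(pid_t pid, uintptr_t addr,
                              unsigned char *tags, size_t ntags)
    {
            struct iovec iov = { .iov_base = tags, .iov_len = ntags };

            if (ptrace(PTRACE_PEEKMTETAGS, pid, (void *)addr, &iov) == -1)
                    return -1;

            /* The kernel updates iov_len to the number of tags copied. */
            return (long)iov.iov_len;
    }

For a 4096-byte page starting at a granule-aligned address,
``mte_tags_for_range()`` gives 256, so a 256-byte buffer holds all of
the page's tags.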

``ptrace()`` return value:

- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
  number of tags transferred. This may be smaller than the requested
  ``iov_len`` if the requested address range in the tracee's or the
  tracer's space cannot be accessed or does not have valid tags.
- ``-EPERM`` - the specified process cannot be traced.
- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
  address) and no tags copied. ``iov_len`` not updated.
- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
  mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.

**Note**: There are no transient errors for the requests above, so user
programs should not retry in case of a non-zero system call return.

``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr ==
NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged
address ABI control and MTE configuration of a process as per the
``prctl()`` options described in
Documentation/arm64/tagged-address-abi.rst and above. The corresponding
``regset`` is 1 element of 8 bytes (``sizeof(long)``).

Example of correct usage
========================

*MTE Example code*

.. code-block:: c

    /*
     * To be compiled with -march=armv8.5-a+memtag
     */
    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/auxv.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    /*
     * From arch/arm64/include/uapi/asm/hwcap.h
     */
    #define HWCAP2_MTE              (1 << 18)

    /*
     * From arch/arm64/include/uapi/asm/mman.h
     */
    #define PROT_MTE                 0x20

    /*
     * From include/uapi/linux/prctl.h
     */
    #define PR_SET_TAGGED_ADDR_CTRL 55
    #define PR_GET_TAGGED_ADDR_CTRL 56
    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
    # define PR_MTE_TCF_SHIFT       1
    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TAG_SHIFT       3
    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)

    /*
     * Insert a random logical tag into the given pointer.
     */
    #define insert_random_tag(ptr) ({                       \
            uint64_t __val;                                 \
            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
            __val;                                          \
    })

    /*
     * Set the allocation tag on the destination address.
     */
    #define set_tag(tagged_addr) do {                                      \
            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
    } while (0)

    int main(void)
    {
            unsigned char *a;
            unsigned long page_sz = sysconf(_SC_PAGESIZE);
            unsigned long hwcap2 = getauxval(AT_HWCAP2);

            /* check if MTE is present */
            if (!(hwcap2 & HWCAP2_MTE))
                    return EXIT_FAILURE;

            /*
             * Enable the tagged address ABI, synchronous MTE tag check faults and
             * allow all non-zero tags in the randomly generated set.
             */
            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT),
                      0, 0, 0)) {
                    perror("prctl() failed");
                    return EXIT_FAILURE;
            }

            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (a == MAP_FAILED) {
                    perror("mmap() failed");
                    return EXIT_FAILURE;
            }

            /*
             * Enable MTE on the above anonymous mmap. The flag could be passed
             * directly to mmap() and skip this step.
             */
            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
                    perror("mprotect() failed");
                    return EXIT_FAILURE;
            }

            /* access with the default tag (0) */
            a[0] = 1;
            a[1] = 2;

            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /* set the logical and allocation tags */
            a = (unsigned char *)insert_random_tag(a);
            set_tag(a);

            printf("%p\n", a);

            /* non-zero tag access */
            a[0] = 3;
            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /*
             * If MTE is enabled correctly the next instruction will generate an
             * exception.
             */
            printf("Expecting SIGSEGV...\n");
            a[16] = 0xdd;

            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
            printf("...haven't got one\n");

            return EXIT_FAILURE;
    }