==============================
Memory Layout on AArch64 Linux
==============================

Author: Catalin Marinas <catalin.marinas@arm.com>
 
This document describes the virtual memory layout used by the AArch64
Linux kernel. The architecture allows up to 4 levels of translation
tables with a 4KB page size and up to 3 levels with a 64KB page size.

AArch64 Linux uses either 3 levels or 4 levels of translation tables
with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
(256TB) virtual addresses, respectively, for both user and kernel. With
64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB)
virtual addresses, are used but the memory layout is the same.
 
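The figures above follow from simple arithmetic. As a minimal sketch,
assuming every level is a full page of 8-byte descriptors (the helper
names ``va_bits`` and ``va_size`` are illustrative, not kernel
symbols), each level resolves ``page_shift - 3`` bits and the in-page
offset supplies the remaining ``page_shift`` bits:

.. code-block:: c

   #include <stdint.h>

   /*
    * Illustrative helper, not a kernel symbol: VA width for a given
    * page size (expressed as a shift) and number of translation
    * levels, assuming each level is a full page of 8-byte descriptors.
    */
   static unsigned int va_bits(unsigned int page_shift, unsigned int levels)
   {
       return page_shift + levels * (page_shift - 3);
   }

   /* Size in bytes of an address space of the given width (< 64 bits). */
   static uint64_t va_size(unsigned int bits)
   {
       return UINT64_C(1) << bits;
   }

Under that assumption, 4KB pages give 39 bits with 3 levels and 48 bits
with 4 levels, and 64KB pages give 42 bits with 2 levels, matching the
sizes quoted above.
 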
ARMv8.2 adds optional support for Large Virtual Address space. This is
only available when running with a 64KB page size and expands the
number of descriptors in the first level of translation.

User addresses have bits 63:48 set to 0 while the kernel addresses have
the same bits set to 1. TTBRx selection is given by bit 63 of the
virtual address. The swapper_pg_dir contains only kernel (global)
mappings while the user pgd contains only user (non-global) mappings.
The swapper_pg_dir address is written to TTBR1 and never written to
TTBR0.

 
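The rules above can be expressed as a few bit tests. The following is
only a sketch for the 48-bit configuration; the function names are
hypothetical and do not correspond to kernel helpers:

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Bit 63 selects the table base register: 0 -> TTBR0, 1 -> TTBR1. */
   static unsigned int ttbr_index(uint64_t va)
   {
       return (unsigned int)(va >> 63);
   }

   /* User addresses have bits 63:48 all clear. */
   static bool is_user_va48(uint64_t va)
   {
       return (va >> 48) == 0;
   }

   /* Kernel addresses have bits 63:48 all set. */
   static bool is_kernel_va48(uint64_t va)
   {
       return (va >> 48) == 0xffff;
   }

An address such as 0x0001000000000000 satisfies neither predicate and
so belongs to neither range in this layout.
 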
AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::

  Start			End			Size		Use
  -----------------------------------------------------------------------
  0000000000000000	0000ffffffffffff	 256TB		user
  ffff000000000000	ffff7fffffffffff	 128TB		kernel logical memory map
  ffff800000000000	ffff9fffffffffff	  32TB		kasan shadow region
  ffffa00000000000	ffffa00007ffffff	 128MB		bpf jit region
  ffffa00008000000	ffffa0000fffffff	 128MB		modules
  ffffa00010000000	fffffdffbffeffff	 ~93TB		vmalloc
  fffffdffbfff0000	fffffdfffe5f8fff	~998MB		[guard region]
  fffffdfffe5f9000	fffffdfffe9fffff	4124KB		fixed mappings
  fffffdfffea00000	fffffdfffebfffff	   2MB		[guard region]
  fffffdfffec00000	fffffdffffbfffff	  16MB		PCI I/O space
  fffffdffffc00000	fffffdffffdfffff	   2MB		[guard region]
  fffffdffffe00000	ffffffffffdfffff	   2TB		vmemmap
  ffffffffffe00000	ffffffffffffffff	   2MB		[guard region]

 
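As a cross-check of the Size column, the sketch below copies the first
few rows of the 48-bit table into an array and computes each size as
end - start + 1. The structure and function names are made up for
illustration and are not kernel interfaces:

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   struct va_region {
       uint64_t start;
       uint64_t end;          /* inclusive, as in the table */
       const char *use;
   };

   /* First rows of the 48-bit layout table; illustrative only. */
   static const struct va_region regions48[] = {
       { 0x0000000000000000ULL, 0x0000ffffffffffffULL, "user" },
       { 0xffff000000000000ULL, 0xffff7fffffffffffULL, "kernel logical memory map" },
       { 0xffff800000000000ULL, 0xffff9fffffffffffULL, "kasan shadow region" },
       { 0xffffa00000000000ULL, 0xffffa00007ffffffULL, "bpf jit region" },
       { 0xffffa00008000000ULL, 0xffffa0000fffffffULL, "modules" },
       { 0xffffa00010000000ULL, 0xfffffdffbffeffffULL, "vmalloc" },
   };

   /* Size in bytes of one region. */
   static uint64_t region_size(const struct va_region *r)
   {
       return r->end - r->start + 1;
   }

   /* Name of the listed region containing va, or NULL if none does. */
   static const char *region_of(uint64_t va)
   {
       size_t i;

       for (i = 0; i < sizeof(regions48) / sizeof(regions48[0]); i++) {
           if (va >= regions48[i].start && va <= regions48[i].end)
               return regions48[i].use;
       }
       return NULL;
   }

For instance, the kernel logical memory map row works out to exactly
128TB, and region_of(0xffffa00008001000) reports "modules".
 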
AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support)::

  Start			End			Size		Use
  -----------------------------------------------------------------------
  0000000000000000	000fffffffffffff	   4PB		user
  fff0000000000000	fff7ffffffffffff	   2PB		kernel logical memory map
  fff8000000000000	fffd9fffffffffff	1440TB		[gap]
  fffda00000000000	ffff9fffffffffff	 512TB		kasan shadow region
  ffffa00000000000	ffffa00007ffffff	 128MB		bpf jit region
  ffffa00008000000	ffffa0000fffffff	 128MB		modules
  ffffa00010000000	fffff81ffffeffff	 ~88TB		vmalloc
  fffff81fffff0000	fffffc1ffe58ffff	  ~3TB		[guard region]
  fffffc1ffe590000	fffffc1ffe9fffff	4544KB		fixed mappings
  fffffc1ffea00000	fffffc1ffebfffff	   2MB		[guard region]
  fffffc1ffec00000	fffffc1fffbfffff	  16MB		PCI I/O space
  fffffc1fffc00000	fffffc1fffdfffff	   2MB		[guard region]
  fffffc1fffe00000	ffffffffffdfffff	3968GB		vmemmap
  ffffffffffe00000	ffffffffffffffff	   2MB		[guard region]

 
Translation table lookup with 4KB pages::

  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |         |         |         |         |
   |                 |         |         |         |         v
   |                 |         |         |         |   [11:0]  in-page offset
   |                 |         |         |         +-> [20:12] L3 index
   |                 |         |         +-----------> [29:21] L2 index
   |                 |         +---------------------> [38:30] L1 index
   |                 +-------------------------------> [47:39] L0 index
   +-------------------------------------------------> [63] TTBR0/1

 
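The 4KB diagram translates directly into shifts and masks. A minimal
sketch, with an illustrative structure and function name that are not
part of the kernel:

.. code-block:: c

   #include <stdint.h>

   struct va_fields_4k {
       unsigned int ttbr;     /* 0 or 1, from bit 63 */
       unsigned int l0;       /* bits [47:39] */
       unsigned int l1;       /* bits [38:30] */
       unsigned int l2;       /* bits [29:21] */
       unsigned int l3;       /* bits [20:12] */
       unsigned int offset;   /* bits [11:0] */
   };

   /* Split a virtual address into the fields shown in the diagram. */
   static struct va_fields_4k split_va_4k(uint64_t va)
   {
       struct va_fields_4k f;

       f.ttbr   = (unsigned int)(va >> 63);
       f.l0     = (unsigned int)((va >> 39) & 0x1ff);
       f.l1     = (unsigned int)((va >> 30) & 0x1ff);
       f.l2     = (unsigned int)((va >> 21) & 0x1ff);
       f.l3     = (unsigned int)((va >> 12) & 0x1ff);
       f.offset = (unsigned int)(va & 0xfff);
       return f;
   }

For 0xffffa00008001234 this gives TTBR1, an L0 index of 0x140, an L1
index of 0, an L2 index of 0x40, an L3 index of 1 and an in-page offset
of 0x234.
 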
Translation table lookup with 64KB pages::

  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |    |               |              |
   |                 |    |               |              v
   |                 |    |               |            [15:0]  in-page offset
   |                 |    |               +----------> [28:16] L3 index
   |                 |    +--------------------------> [41:29] L2 index
   |                 +-------------------------------> [47:42] L1 index (48-bit)
   |                                                   [51:42] L1 index (52-bit)
   +-------------------------------------------------> [63] TTBR0/1

 
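With 64KB pages the only difference between the 48-bit and 52-bit
cases is the width of the L1 index. The sketch below takes the VA width
as a parameter; its names are hypothetical and it assumes the width
passed in is either 48 or 52:

.. code-block:: c

   #include <stdint.h>

   struct va_fields_64k {
       unsigned int ttbr;     /* 0 or 1, from bit 63 */
       unsigned int l1;       /* bits [47:42] or [51:42] */
       unsigned int l2;       /* bits [41:29] */
       unsigned int l3;       /* bits [28:16] */
       unsigned int offset;   /* bits [15:0] */
   };

   /* va_width must be 48 or 52 in this sketch. */
   static struct va_fields_64k split_va_64k(uint64_t va, unsigned int va_width)
   {
       struct va_fields_64k f;
       unsigned int l1_bits = va_width - 42;   /* 6 or 10 */

       f.ttbr   = (unsigned int)(va >> 63);
       f.l1     = (unsigned int)((va >> 42) & ((1u << l1_bits) - 1));
       f.l2     = (unsigned int)((va >> 29) & 0x1fff);
       f.l3     = (unsigned int)((va >> 16) & 0x1fff);
       f.offset = (unsigned int)(va & 0xffff);
       return f;
   }

The L1 index is therefore 6 bits wide (64 entries) for 48-bit VAs and
10 bits wide (1024 entries) for 52-bit VAs, which is the expanded first
level of translation mentioned earlier.
 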
When using KVM without the Virtualization Host Extensions, the
hypervisor maps kernel pages in EL2 at a fixed (and potentially
random) offset from the linear mapping. See the kern_hyp_va macro and
kvm_update_va_mask function for more details. MMIO devices such as
GICv2 get mapped next to the HYP idmap page, as do vectors when
ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.

When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.
 
52-bit VA support in the kernel
-------------------------------
If the optional ARMv8.2-LVA feature is present and we are running
with a 64KB page size, it is possible to use 52 bits of address
space for both userspace and kernel addresses. However, any kernel
binary that supports 52-bit must also be able to fall back to 48-bit
at early boot time if the hardware feature is not present.

This fallback mechanism requires the kernel .text to be in the
higher addresses so that its location is invariant to 48/52-bit VAs.
Because the kasan shadow is a fraction of the entire kernel VA space,
the end of the kasan shadow must also be in the higher half of the
kernel VA space for both 48/52-bit. (When switching from 48-bit to
52-bit, the end of the kasan shadow is invariant and dependent on
~0UL, whilst the start address will "grow" towards the lower
addresses.)
 
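A minimal sketch of that relationship, assuming the shadow covers one
eighth of the kernel VA space (which is consistent with the 32TB and
512TB sizes in the layout tables) and ends just below
0xffffa00000000000. The macro and function names are illustrative only:

.. code-block:: c

   #include <stdint.h>

   /* One past the last shadow byte (ffff9fffffffffff) in both tables. */
   #define SKETCH_SHADOW_END UINT64_C(0xffffa00000000000)

   /*
    * Start of the shadow for a given VA width. The end stays fixed,
    * so a wider VA space moves the start towards lower addresses.
    */
   static uint64_t sketch_shadow_start(unsigned int va_width)
   {
       uint64_t shadow_size = UINT64_C(1) << (va_width - 3);

       return SKETCH_SHADOW_END - shadow_size;
   }

This yields 0xffff800000000000 for 48-bit and 0xfffda00000000000 for
52-bit, the start addresses listed in the two layout tables.
 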
In order to optimise phys_to_virt and virt_to_phys, PAGE_OFFSET is
kept constant at 0xFFF0000000000000 (corresponding to 52-bit). This
obviates the need for an extra variable read. The physvirt offset and
vmemmap offsets are computed at early boot to enable this logic.
 
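To illustrate why a constant PAGE_OFFSET is cheap, the sketch below
models the linear-map conversion as a single addition or subtraction of
an offset fixed at boot. It is not the kernel's implementation: every
name is invented for the example, and how the real offset is derived
for the 48-bit fallback is not shown here:

.. code-block:: c

   #include <stdint.h>

   /* Lowest kernel address for a VA width: all bits above it set. */
   static uint64_t page_offset_for(unsigned int va_width)
   {
       return ~UINT64_C(0) << va_width;
   }

   /* Stand-in for an offset that is computed once at early boot. */
   static uint64_t sketch_physvirt_offset;

   /* Assumed model: offset = first physical address - linear map base. */
   static void sketch_set_offset(uint64_t phys_base, uint64_t linear_base)
   {
       sketch_physvirt_offset = phys_base - linear_base;
   }

   /* Both directions are one arithmetic operation on the offset. */
   static uint64_t sketch_phys_to_virt(uint64_t pa)
   {
       return pa - sketch_physvirt_offset;
   }

   static uint64_t sketch_virt_to_phys(uint64_t va)
   {
       return va + sketch_physvirt_offset;
   }

page_offset_for(52) evaluates to 0xFFF0000000000000, the constant
quoted above, and page_offset_for(48) to 0xffff000000000000, the start
of the kernel range in the 48-bit table.
 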
As a single binary will need to support both 48-bit and 52-bit VA
spaces, the VMEMMAP must be sized large enough for 52-bit VAs and
also must be sized large enough to accommodate a fixed PAGE_OFFSET.

Most code in the kernel should not need to consider VA_BITS. For
code that does need to know the VA size, the variables are defined
as follows:

VA_BITS		constant	the *maximum* VA space size

VA_BITS_MIN	constant	the *minimum* VA space size

vabits_actual	variable	the *actual* VA space size

 
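As an illustration of how the three might be used together, the sketch
below uses stand-in definitions with values assumed for a 52-bit
capable build (52 and 48). In the kernel the real definitions come from
its own headers, and the helper names here are hypothetical:

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Stand-ins with assumed values; not the kernel's definitions. */
   #define VA_BITS      52
   #define VA_BITS_MIN  48
   static uint64_t vabits_actual = VA_BITS_MIN;   /* fixed at early boot */

   /* Sized for the worst case so it fits whichever VA size is used. */
   static unsigned char per_va_bit_flags[VA_BITS];

   /* An offset valid on every configuration must fit the minimum. */
   static bool fits_min_va(uint64_t offset)
   {
       return offset < (UINT64_C(1) << VA_BITS_MIN);
   }

   /* Run-time decisions consult the size actually in use. */
   static bool fits_actual_va(uint64_t offset)
   {
       return offset < (UINT64_C(1) << vabits_actual);
   }

Here the array is sized from the maximum, the portability check uses
the minimum, and only the run-time check reads the variable.
 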
Maximum and minimum sizes can be useful to ensure that buffers are
sized large enough or that addresses are positioned close enough for
the "worst" case.

52-bit userspace VAs
--------------------
To maintain compatibility with software that relies on the ARMv8.0
VA space maximum size of 48 bits, the kernel will, by default,
return virtual addresses to userspace from a 48-bit range.

Software can "opt-in" to receiving VAs from a 52-bit space by
specifying an mmap hint parameter that is larger than 48-bit.

For example:

.. code-block:: c

   maybe_high_address = mmap((void *)~0UL, size, prot, flags, ...);
 
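A slightly fuller sketch of the opt-in, assuming an anonymous private
mapping on a 64-bit system. The function names are illustrative and the
flags are one reasonable choice rather than a requirement:

.. code-block:: c

   #define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS with glibc */
   #include <stddef.h>
   #include <stdint.h>
   #include <sys/mman.h>

   /*
    * Request an anonymous mapping with a hint above 48 bits.
    * Returns NULL on failure.
    */
   static void *map_with_high_hint(size_t size)
   {
       void *p = mmap((void *)~0UL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

       return p == MAP_FAILED ? NULL : p;
   }

   /* Non-zero if the address does not fit in 48 bits. */
   static int uses_more_than_48_bits(const void *p)
   {
       return ((uint64_t)(uintptr_t)p >> 48) != 0;
   }

Because the address is only a hint, the call can still succeed on
systems without 52-bit support, in which case the returned address
simply fits in 48 bits.
 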
It is also possible to build a debug kernel that returns addresses
from a 52-bit space by enabling the following kernel config options:

.. code-block:: sh

   CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y

Note that this option is only intended for debugging applications
and should not be used in production.