.. _split_page_table_lock:

=====================
Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of the
mm_struct. But this approach leads to poor page fault scalability of
multi-threaded applications due to high contention on the lock. To improve
scalability, split page table lock was introduced.

With split page table lock we have a separate per-table lock to serialize
access to the table. At the moment we use split lock for PTE and PMD
tables. Access to higher level tables is protected by mm->page_table_lock.

There are helpers to lock/unlock a table and other accessor functions:

 - pte_offset_map_lock()
        maps PTE and takes PTE table lock, returns pointer to the taken
        lock;
 - pte_unmap_unlock()
        unlocks and unmaps PTE table;
 - pte_alloc_map_lock()
        allocates PTE table if needed and takes the lock, returns pointer
        to the taken lock or NULL if allocation failed;
 - pte_lockptr()
        returns pointer to PTE table lock;
 - pmd_lock()
        takes PMD table lock, returns pointer to the taken lock;
 - pmd_lockptr()
        returns pointer to PMD table lock;

Split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If split lock is disabled, all tables are guarded by mm->page_table_lock.

Split page table lock for PMD tables is enabled if it is enabled for PTE
tables and the architecture supports it (see below).

Hugetlb and split page table lock
=================================

Hugetlb can support several page sizes. We use split lock only for PMD
level, but not for PUD.

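The PMD-versus-PUD rule above can be illustrated with a small userspace
model. This is only a sketch and not kernel code: every name starting with
model_ or MODEL_ is hypothetical, the page sizes are assumed values, and the
toy lock merely records whether it is held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
#define MODEL_PMD_SIZE (2UL << 20)	/* assumed 2 MiB PMD-sized huge page */
#define MODEL_PUD_SIZE (1UL << 30)	/* assumed 1 GiB PUD-sized huge page */

struct model_lock {
	int held;			/* 1 while the toy lock is taken */
};

struct model_mm {
	struct model_lock page_table_lock;	/* models mm->page_table_lock */
};

struct model_table {
	struct model_lock ptl;		/* models the per-table split lock */
};

/* Pick the lock that covers a huge page of the given size. */
struct model_lock *model_huge_lockptr(struct model_mm *mm,
				      struct model_table *pmd_table,
				      unsigned long huge_page_size)
{
	if (huge_page_size == MODEL_PMD_SIZE)
		return &pmd_table->ptl;		/* PMD level: split lock */
	return &mm->page_table_lock;		/* otherwise: per-mm lock */
}

/* Take the selected lock and return it so the caller can release it. */
struct model_lock *model_huge_lock(struct model_mm *mm,
				   struct model_table *pmd_table,
				   unsigned long huge_page_size)
{
	struct model_lock *ptl;

	ptl = model_huge_lockptr(mm, pmd_table, huge_page_size);
	assert(!ptl->held);	/* toy lock: this sketch has no contention */
	ptl->held = 1;
	return ptl;
}

/* Release a lock previously returned by model_huge_lock(). */
void model_unlock(struct model_lock *ptl)
{
	assert(ptl->held);
	ptl->held = 0;
}
```

In this model, two PMD-sized mappings that live in different tables take
different locks, while every PUD-sized mapping in the same mm contends on the
single per-mm lock.
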

Hugetlb-specific helpers:

 - huge_pte_lock()
        takes PMD split lock for PMD_SIZE page, mm->page_table_lock
        otherwise;
 - huge_pte_lockptr()
        returns pointer to table lock;

Support of split page table lock by an architecture
===================================================

No special enabling of the PTE split page table lock is needed: everything
required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), which
must be called on PTE table allocation / freeing.

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache for its pages.
This field shares storage with page->ptl.

PMD split lock only makes sense if you have more than two page table
levels.

Enabling the PMD split lock requires a pgtable_pmd_page_ctor() call on PMD
table allocation and pgtable_pmd_page_dtor() on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g. X86_PAE preallocates a few PMDs in pgd_alloc().

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail; the
failure must be handled properly.

page->ptl
=========

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in a union).

To avoid increasing the size of struct page and to get the best performance,
we use a trick:

 - if spinlock_t fits into long, we use page->ptl as the spinlock itself, so
   we can avoid indirect access and save a cache line.
 - if the size of spinlock_t is bigger than the size of long, we use
   page->ptl as a pointer to spinlock_t and allocate it dynamically.
   This allows the split lock to be used with DEBUG_SPINLOCK or
   DEBUG_LOCK_ALLOC enabled, but costs one more cache line for indirect
   access.

The spinlock_t is allocated in pgtable_pte_page_ctor() for a PTE table and in
pgtable_pmd_page_ctor() for a PMD table.

Please never access page->ptl directly; use the appropriate helper.
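
The size-based trick and the reason for going through a helper can be sketched
in userspace C. This is a simplified model under stated assumptions, not the
kernel implementation: all model_ and MODEL_ names are hypothetical, and the
MODEL_DEBUG_LOCK switch stands in for debug options that enlarge the lock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical switch modelling debug options that make the lock bigger. */
#ifndef MODEL_DEBUG_LOCK
#define MODEL_DEBUG_LOCK 0
#endif

struct model_spinlock {
	int held;
#if MODEL_DEBUG_LOCK
	const char *owner;		/* assumed debug bookkeeping */
	void *debug_info[2];
#endif
};

/* 1 when the lock no longer fits into a long and must live elsewhere. */
#if MODEL_DEBUG_LOCK
#define MODEL_ALLOC_SPLIT_PTLOCKS 1
#else
#define MODEL_ALLOC_SPLIT_PTLOCKS 0
#endif

static_assert(MODEL_ALLOC_SPLIT_PTLOCKS ==
	      (sizeof(struct model_spinlock) > sizeof(long)),
	      "storage choice must match the size of the lock");

struct model_page {
	union {
		unsigned long private_data;	/* shares storage with ptl */
#if MODEL_ALLOC_SPLIT_PTLOCKS
		struct model_spinlock *ptl;	/* pointer to a separate lock */
#else
		struct model_spinlock ptl;	/* lock embedded in the page */
#endif
	};
};

static_assert(sizeof(struct model_page) <= sizeof(void *),
	      "the lock must not grow the page descriptor beyond one word");

/* Models the constructor step. Returns 1 on success, 0 on failure. */
int model_ptlock_init(struct model_page *page)
{
#if MODEL_ALLOC_SPLIT_PTLOCKS
	page->ptl = calloc(1, sizeof(*page->ptl));
	if (!page->ptl)
		return 0;		/* allocation can fail */
#else
	page->ptl.held = 0;
#endif
	return 1;
}

/* Accessor that hides which of the two layouts is in use. */
struct model_spinlock *model_ptlock_ptr(struct model_page *page)
{
#if MODEL_ALLOC_SPLIT_PTLOCKS
	return page->ptl;
#else
	return &page->ptl;
#endif
}

/* Models the destructor step; frees the lock only if it was allocated. */
void model_ptlock_free(struct model_page *page)
{
#if MODEL_ALLOC_SPLIT_PTLOCKS
	free(page->ptl);
	page->ptl = NULL;
#else
	(void)page;
#endif
}
```

Callers that only use model_ptlock_ptr() compile unchanged with
MODEL_DEBUG_LOCK set to 0 or 1, whereas code that touched the ptl field
directly would break when the layout switches. The init function returning 0
mirrors the note that the constructors can fail.
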