File management in the Linux kernel
-----------------------------------

This document describes how locking works for files (struct file)
and for the file descriptor table (struct files_struct).

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using
a reference count (->f_count).

In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are kept in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of the fdtable, a new fdtable structure is allocated
and files->fdtab points to the new structure. The old fdtable
structure is freed with RCU, and lock-free readers see either
the old fdtable or the new fdtable, which makes the update
appear atomic. Here are the locking rules for
the fdtable structure:

1. All references to the fdtable must be done through
   the files_fdtable() macro :

	struct fdtable *fdt;

	rcu_read_lock();

	fdt = files_fdtable(files);
	....
	if (n < fdt->max_fds)
		....
	...
	rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the fcheck() or the fcheck_files() API. These
   take care of the barrier requirements due to lock-free lookup.
   An example :

	struct file *file;

	rcu_read_lock();
	file = fcheck(fd);
	if (file) {
		...
	}
	....
	rcu_read_unlock();

5. Handling of the file structures is special. Since the look-up
   of the fd (fget()/fget_light()) is lock-free, it is possible
   that a look-up may race with the last put() operation on the
   file structure. This is avoided using atomic_long_inc_not_zero()
   on ->f_count :

	rcu_read_lock();
	file = fcheck_files(files, fd);
	if (file) {
		if (atomic_long_inc_not_zero(&file->f_count))
			*fput_needed = 1;
		else
			/* Didn't get the reference, someone's freed */
			file = NULL;
	}
	rcu_read_unlock();
	....
	return file;

   atomic_long_inc_not_zero() fails if the refcount is already zero or
   drops to zero while the increment is attempted. In that case
   fget()/fget_light() fail.

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and fcheck()/fcheck_files(), which take care of these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread can expand the files, thereby creating a new
   fdtable and making the earlier fdtable pointer stale.
   For example :

	spin_lock(&files->file_lock);
	fd = locate_fd(files, file, start);
	if (fd >= 0) {
		/* locate_fd() may have expanded fdtable, load the ptr */
		fdt = files_fdtable(files);
		__set_open_fd(fd, fdt);
		__clear_close_on_exec(fd, fdt);
		spin_unlock(&files->file_lock);
	.....

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().
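
The "increment unless zero" behaviour that rule 5 relies on can be
illustrated outside the kernel. The following is a minimal user-space
sketch using C11 atomics, not the kernel implementation: struct demo_file,
demo_inc_not_zero(), demo_put() and demo_get() are hypothetical names
introduced only for this illustration, and RCU itself is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; only the refcount is modelled. */
struct demo_file {
	atomic_long f_count;
};

/*
 * Increment *v unless it is zero. Returns true if a reference was taken.
 * This mimics the behaviour the text describes for
 * atomic_long_inc_not_zero(); it is an assumption-laden sketch.
 */
static bool demo_inc_not_zero(atomic_long *v)
{
	long old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference; returns true if this was the last one. */
static bool demo_put(struct demo_file *f)
{
	return atomic_fetch_sub(&f->f_count, 1) == 1;
}

/* Lookup side: hand out the file only if a reference could be taken. */
static struct demo_file *demo_get(struct demo_file *f)
{
	if (f && demo_inc_not_zero(&f->f_count))
		return f;
	return NULL;	/* count already reached zero: treat file as gone */
}
```

Once the count has reached zero, demo_get() keeps returning NULL, which
corresponds to the lookup path setting file = NULL in rule 5.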
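
The reason a cached fdtable pointer becomes stale after an expansion
(rules 6 and 7) can also be sketched in user space. The example below
is a simplified model under stated assumptions: all demo_* names are
hypothetical, a C11 release store stands in for rcu_assign_pointer(),
an acquire load (stronger than strictly needed) stands in for
rcu_dereference(), and the grace period is replaced by handing the old
table back to the caller. ->file_lock is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced model of struct fdtable. */
struct demo_fdtable {
	unsigned int max_fds;
	void **fd;
};

/* Hypothetical, reduced model of struct files_struct. */
struct demo_files {
	_Atomic(struct demo_fdtable *) fdt;
};

static struct demo_fdtable *demo_alloc_fdtable(unsigned int nr)
{
	struct demo_fdtable *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->fd = calloc(nr, sizeof(*t->fd));
	if (!t->fd) {
		free(t);
		return NULL;
	}
	t->max_fds = nr;
	return t;
}

static void demo_free_fdtable(struct demo_fdtable *t)
{
	if (t) {
		free(t->fd);
		free(t);
	}
}

/* Reader side: stands in for files_fdtable(). */
static struct demo_fdtable *demo_files_fdtable(struct demo_files *files)
{
	return atomic_load_explicit(&files->fdt, memory_order_acquire);
}

/*
 * Writer side: build a larger table, copy the old entries, publish it.
 * Returns the old table, which the caller may free only once no reader
 * can still be using it. Returns NULL if nothing was changed.
 */
static struct demo_fdtable *demo_expand(struct demo_files *files,
					unsigned int nr)
{
	struct demo_fdtable *old =
		atomic_load_explicit(&files->fdt, memory_order_relaxed);
	struct demo_fdtable *nfdt;

	if (nr <= old->max_fds)
		return NULL;
	nfdt = demo_alloc_fdtable(nr);
	if (!nfdt)
		return NULL;
	memcpy(nfdt->fd, old->fd, old->max_fds * sizeof(*old->fd));
	/* Fully initialise the new table before making it visible. */
	atomic_store_explicit(&files->fdt, nfdt, memory_order_release);
	return old;
}
```

A pointer obtained from demo_files_fdtable() before demo_expand() still
refers to the old, smaller table afterwards, which is why rule 7 asks
for the pointer to be reloaded after any step that may expand the table.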