Lines Matching +full:compare +full:- +full:and +full:- +full:swap
1 .. SPDX-License-Identifier: GPL-2.0
9 - Copyright (C) 1999 Richard Gooch
10 - Copyright (C) 2005 Pekka Enberg
21 VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so on
27 ------------------------------
29 The VFS implements the open(2), stat(2), chmod(2), and similar system
32 cache or dcache). This provides a very fast look-up mechanism to
34 in RAM and are never saved to disc: they exist only for performance.
40 and then loading the inode. This is done by looking up the inode.
44 ----------------
47 filesystem objects such as regular files, directories, FIFOs and other
50 are copied into the memory when required and changes to the inode are
57 required dentry (and hence the inode), we can do all those boring things
60 peeks at the inode data and passes some of it back to userspace.
64 ---------------
67 structure (this is the kernel-side implementation of file descriptors).
69 the dentry and a set of file operation member functions. These are
75 Reading, writing and closing files (and other assorted VFS operations)
77 file structure, and then calling the required file structure method to
82 Registering and Mounting a Filesystem
85 To register and unregister a filesystem, use the following API
88 .. code-block:: c
99 ->mount() will be attached to the mountpoint, so that when pathname
108 -----------------------
113 .. code-block:: c
140 "msdos" and so on
146 Initializes 'struct fs_context' ->ops and ->fs_private fields with
147 filesystem-specific data.
174 i_lock_key, i_mutex_key, invalidate_lock_key, i_mutex_dir_key: lockdep-specific
193 caller. An active reference to its superblock must be grabbed and the
196 The arguments match those of mount(2) and their interpretation depends
198 as block device name, that device is opened and if it contains a
199 suitable filesystem image the method creates and initializes struct
202 ->mount() may choose to return a subtree of existing filesystem - it
213 and provides a fill_super() callback instead. The generic variants are:
245 -----------------------
250 .. code-block:: c
295 struct inode and initialize it. If this function is not
303 ->alloc_inode was defined and simply undoes anything done by
304 ->alloc_inode.
308 in ->destroy_inode to free 'struct inode' memory, then it's
317 and struct inode has times updated since the last ->dirty_inode
327 inode->i_lock spinlock held.
331 not want to cache inodes - causing "delete_inode" to always be
340 *not* evict the pagecache or inode-associated metadata buffers;
343 the inode while (or after) ->evict_inode() is called. Optional.
355 Called instead of ->freeze_fs callback if provided.
356 Main difference is that ->freeze_super is called without taking
357 down_write(&sb->s_umount). If filesystem implements it and wants
358 ->freeze_fs to be called too, then it has to call ->freeze_fs
362 called when VFS is locking a filesystem and forcing it into a
364 Volume Manager (LVM) and ioctl(FIFREEZE). Optional.
367 called when VFS is unlocking a filesystem and making it writable
368 again after ->freeze_super. Optional.
371 called when VFS is unlocking a filesystem and making it writable
372 again after ->freeze_fs. Optional.
386 and /proc/<pid>/mountinfo.
400 filesystem-specific mount statistics.
421 also implement ->nr_cached_objects for it to be called
441 ---------------------
444 superblock field points to a NULL-terminated array of xattr handlers.
471 setxattr(2) and removexattr(2) system calls.
475 the various ``*xattr(2)`` system calls return -EOPNOTSUPP.
485 -----------------------
490 .. code-block:: c
527 called by the open(2) and creat(2) system calls. Only required
530 you will probably call d_instantiate() with the dentry and the
542 calls like creat(2), mknod(2), mkdir(2) and so on will fail.
580 the parent and name given by the second inode and dentry.
582 The filesystem must return -EINVAL for any unsupported or
585 the rename exists the rename should fail with -EEXIST instead of
589 (2) RENAME_EXCHANGE: exchange source and target. Both must
591 and target may be of different type.
596 This method returns the symlink body to traverse (and possibly
605 have it return ERR_PTR(-ECHILD).
607 If the filesystem stores the symlink target in ->i_link, the
608 VFS may use it directly without calling ->get_link(); however,
609 ->get_link() must still be provided. ->i_link must not be
610 freed until after an RCU grace period. Writing to ->i_link
611 post-iget() time requires a 'release' memory barrier.
615 cases when ->get_link uses nd_jump_link() or object is not in
617 ->get_link for symlinks and readlink(2) will automatically use
621 called by the VFS to check for access rights on a POSIX-like
624 May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). If in
625 rcu-walk mode, the filesystem must check the permission without
628 If a situation is encountered that rcu-walk cannot handle,
630 -ECHILD and it will be called again in ref-walk mode.
634 called by chmod(2) and related system calls.
638 called by stat(2) and related system calls.
647 itself and call mark_inode_dirty_sync.
651 method the filesystem can look up, possibly create and open the
658 handled by f_op->open(). If the file was created, FMODE_CREATED
659 flag should be set in file->f_mode. In case of O_EXCL the
660 method must only succeed if the file didn't exist and hence
665 atomically creating, opening and unlinking a file in given
671 called on ioctl(FS_IOC_GETFLAGS) and ioctl(FS_IOC_FSGETXATTR) to
672 retrieve miscellaneous file flags and attributes. Also called
675 fall back to f_op->ioctl().
678 called on ioctl(FS_IOC_SETFLAGS) and ioctl(FS_IOC_FSSETXATTR) to
679 change miscellaneous file flags and attributes. Callers hold
680 i_rwsem exclusive. If unset, then fall back to f_op->ioctl().
689 The address space object is used to group and manage pages in the page
691 else) and also track the mapping of sections of the file into process
695 address-space can provide. These include communicating memory pressure,
696 page lookup by address, and keeping track of pages tagged as Dirty or
701 in order to reuse them. To do this it can call the ->writepage method
702 on dirty pages, and ->release_folio on clean folios with the private
703 flag set. Clean pages without PagePrivate and with no external references
707 lru_cache_add and mark_page_active needs to be called whenever the page
710 Pages are normally kept in a radix tree index by ->index. This tree
711 maintains information about the PG_Dirty and PG_Writeback status of each
714 The Dirty tag is primarily used by mpage_writepages - the default
715 ->writepages method. It uses the tag to find dirty pages to call
716 ->writepage on. If mpage_writepages is not used (i.e. the address space
717 provides its own ->writepages), the PAGECACHE_TAG_DIRTY tag is almost
718 unused. write_inode_now and sync_inode do use it (through
719 __sync_single_inode) to check if ->writepages has been successful in
722 The Writeback tag is used by filemap*wait* and sync_page* functions, via
731 An address space acts as an intermediate between storage and
733 time, and provided to the application either by copying of the page, or
734 by memory-mapping the page. Data is written into the address space by
735 the application, and then written-back to storage typically in whole
739 process is more complicated and uses write_begin/write_end or
740 dirty_folio to write data into the address_space, and writepage and
743 Adding and removing pages to/from an address_space is protected by the
748 should clear PG_Dirty and set PG_Writeback. It can be actually written
753 operations. This gives the writepage and writepages operations some
754 information about the nature of and reason for the writeback request,
755 and the constraints under which it is being done. It is also used to
761 --------------------------------
789 file->fsync operation, they should call file_check_and_advance_wb_err to
795 -------------------------------
800 .. code-block:: c
836 wbc->sync_mode. The PG_Dirty flag has been cleared and
838 PG_Writeback, and should make sure the page is unlocked, either
842 If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to
843 try too hard if there are problems, and may choose to write out
847 keep calling ->writepage on that page.
854 filesystems, and is generally not used by block based filesystems.
865 the page cache holds a reference count and that will not be
868 Filesystems may implement ->read_folio() synchronously.
869 In normal operation, folios are read through the ->readahead()
871 the read to complete will the page cache call ->read_folio().
873 in the ->read_folio() operation.
877 read will succeed in the future and return AOP_TRUNCATED_PAGE.
879 and call ->read_folio again.
881 Callers may invoke the ->read_folio() method directly, but using
883 read to complete and handle cases such as AOP_TRUNCATED_PAGE.
887 address_space object. If wbc->sync_mode is WB_SYNC_ALL, then
890 given and that many pages should be written if possible. If no
891 ->writepages is given, then mpage_writepages is used instead.
893 DIRTY and will pass them to ->writepage.
897 needed if an address space attaches private data to a folio, and
900 If defined, it should set the folio dirty flag, and the
905 object. The pages are consecutive in the page cache and are
910 rac->ra->async_size gives the number of async pages. The
915 and decrement the page refcount. Set PageUptodate if the I/O
922 complete, by allocating space if necessary and doing any other
924 basic-blocks on storage, then those blocks should be pre-read
942 After a successful write_begin, and data copy, write_end must be
943 called. len is the original len passed to write_begin, and
947 decrementing its refcount, and updating i_size.
955 and for working with swap-files. To be able to swap to a file,
956 the file must have a stable mapping to a block device. The swap
958 to find out where the blocks in the file are and uses those
966 space (in the latter case 'offset' will always be 0 and 'length'
969 and length is folio_size(), then the private data should be
971 discarded. This may be done by calling the ->release_folio
976 filesystem that the folio is about to be freed. ->release_folio
977 should remove any private data from the folio and clear the
981 active users. If ->release_folio succeeds, the folio will be
982 removed from the address_space and be freed.
987 filesystem explicitly requesting it as nfs and 9p do (when they
990 and needs to be certain that all folios are invalidated, then
998 assume that the original address_space mapping still exists, and
1002 called by the generic read/write routines to perform direct_IO -
1003 that is IO requests which bypass the page cache and transfer
1004 data directly between the storage and the application's address
1010 signalling imminent failure) it will pass a new folio and an old
1012 data across and update any references that it has to the folio.
1015 Called before freeing a folio - it writes back the dirty folio.
1027 dirty and writeback information to determine if it needs to
1029 Ordinarily it can use folio_test_dirty and folio_test_writeback but
1044 Called to prepare the given file for swap. It should perform
1045 any validation and preparation necessary to ensure that writes
1047 add_swap_extent(), or the helper iomap_swapfile_activate(), and
1049 through ->swap_rw(), it should set SWP_FS_OPS, otherwise IO will
1050 be submitted directly to the block device ``sis->bdev``.
1057 Called to read or write swap pages when SWP_FS_OPS is set.
1067 ----------------------
1072 .. code-block:: c
1119 called by read(2) and related system calls
1125 called by write(2) and related system calls
1138 activity on this file and (optionally) go to sleep until there
1139 is activity. Called by the select(2) and poll(2) system calls
1156 inode_operations", and you may be right. I think it's done the
1174 (non-blocking) mode is enabled for a file
1177 called by the fcntl(2) system call for F_GETLK, F_SETLK, and
1209 called by the ioctl(2) system call for FICLONERANGE and FICLONE
1210 and FIDEDUPERANGE commands to remap file ranges. An
1231 operations with those for the device driver, and then proceed to call
1242 ------------------------
1245 operations. Dentries and the dcache are the domain of the VFS and the
1251 .. code-block:: c
1271 called whenever a name look-up finds a dentry in the dcache.
1278 still valid, and zero or a negative error code if it isn't.
1280 d_revalidate may be called in rcu-walk mode (flags &
1281 LOOKUP_RCU). If in rcu-walk mode, the filesystem must
1283 d_parent and d_inode should not be used without care (because
1284 they can change and, in d_inode case, even become NULL under
1287 If a situation is encountered that rcu-walk cannot handle,
1289 -ECHILD and it will be called again in ref-walk mode.
1293 is called when a path-walk ends at a dentry that was not acquired
1295 "." and "..", as well as procfs-style symlinks and mountpoint
1306 d_weak_revalidate is only called after leaving rcu-walk mode.
1313 Same locking and synchronisation rules as d_compare regarding
1317 called to compare a dentry name with a given name. The first
1319 the child dentry. len and name string are properties of the
1320 dentry to be compared. qstr is the name to compare it with.
1322 Must be constant and idempotent, and should not take locks if
1323 possible, and should not store into the dentry. Should not
1327 However, our vfsmount is pinned, and RCU held, so the dentries
1328 and inodes won't disappear, neither will our sb or filesystem
1329 module. ->d_sb may be used.
1332 under "rcu-walk", i.e. without any locks or references on things.
1335 called when the last reference to a dentry is dropped and the
1339 be constant and idempotent.
1364 buffer, and returns a pointer to the first char.
1370 .. code-block:: c
1375 dentry->d_inode->i_ino);
1380 This should create a new VFS mount record and return the record
1383 and the parent VFS mount record to provide inheritable mount
1386 an error code should be returned. If -EISDIR is returned, then
1387 the directory will be treated as an ordinary directory and
1391 on the mountpoint and will remove the vfsmount from its
1393 returned with 2 refs on it to prevent automatic expiration - the
1404 the daemon go past and construct the subtree there. 0 should be
1405 returned to let the calling process continue. -EISDIR can be
1407 directory and to ignore anything mounted on it and not to check
1412 pathwalk in RCU-walk mode. Sleeping is not permitted in this
1413 mode, and the caller can be asked to leave it and call again by
1414 returning -ECHILD. -EISDIR may also be returned to tell
1428 For non-regular files, the 'dentry' argument is returned.
1436 --------------------------
1447 the usage count drops to 0, and the dentry is still in its
1465 add a dentry to its parent's hash list and then calls
1469 add a dentry to the alias hash list for the inode and updates
1477 look up a dentry given its parent and path name component. It
1479 table. If it is found, the reference count is incremented and
1489 ---------------
1491 On mount and remount the filesystem is passed a string containing a
1504 ---------------
1509 - options MUST be shown which are not default or their values differ
1512 - options MAY be shown which are enabled by default or have their
1515 Options used only internally between a mount helper and the kernel (such
1521 can be accurately replicated (e.g. umounting and mounting again) based
1528 (Note some of these resources are not up-to-date with the latest kernel
1534 The Linux Virtual File-system Layer by Neil Brown. 1999
1535 <http://www.cse.unsw.edu.au/~neilb/oss/linux-commentary/vfs.html>