path-lookup.md - OpenGrok cross reference for /kernel/linux/linux-4.19/Documentation/filesystems/path-lookup.md

Lines Matching full:it
20 exploration is needed to discover, is that it is complex.  There are
23 acquainted with such complexity and has tools to help manage it.  One
49 It is tempting to describe the second kind as starting with a
51 slashes and components, it can be empty, in other words.  This is
53 in Linux permit it when the `AT_EMPTY_PATH` flag is given.  For
55 can execute it by calling [`execveat()`] passing the file descriptor,
60 it must identify a directory that already exists, otherwise an error
64 calls interpret it quite differently (e.g. some create it, some do
65 not), but it might not even exist: neither the empty pathname nor the
66 pathname that is just slashes have a final component.  If it does
67 exist, it could be "`.`" or "`..`" which are handled quite differently
72 If a pathname ends with a slash, such as "`/tmp/foo/`" it might be
89 checking that the trailing slash is not used where it isn't
90 permitted.  It also addresses the important issue of concurrent
117 that will be particularly relevant is that it is closely integrated
155 afraid of taking a lock when one is needed.  It uses a variety of
171 will behave as expected.  It also protects the `->d_inode` reference
181 setting `d_inode` to `NULL`, or by removing it from the hash table
183 If the dentry is still in use the second option is used as it is
184 perfectly legal to keep using an open file after it has been deleted
187 `d_inode` be set to `NULL`.  Doing it this way is more efficient for a
198 name (`d_name`) cannot be changed, and it cannot be removed from the
202 each candidate dentry that it finds in the hash table and then checks
203 that the parent and name are correct.  So it doesn't lock the parent
204 while searching in the cache; it only locks children.
208 but it first tries a more lightweight approach.  As seen in
224 it might end up continuing the search down the wrong chain,
228 from happening, but only to detect when it happens.
230 renamed.  If `d_lookup` finds that a rename happened while it
231 unsuccessfully scanned a chain in the hash table, it simply tries
238 cannot both happen at the same time.  It also keeps the directory
250 The mutex affects pathname lookup in two distinct ways.  Firstly it
259 Secondly, when pathname lookup reaches the final component, it will
268 Per-CPU here means that incrementing the count is cheap as it only
270 it needs to check with every CPU.  Taking a `mnt_count` reference
274 in particular, doesn't stabilize the link to the mounted-on dentry.  It
276 and it provides a reference to the root dentry of the mounted
282 `mount_lock` is a global seqlock, a bit like `rename_lock`.  It can be used to
307 In particular it is held while scanning chains in the dcache hash
310 Bringing it together with `struct nameidata`
348 only assigned the first time it is used, or when a non-standard root
350 only one root is in effect for the entire path walk, even if it races
361 escape that subtree.  It works a bit like a local `chroot()`.
374 it calls `handle_dots()` which does the necessary locking as already
375 described.  If it finds a `LAST_NORM` component it first calls
377 filesystem to revalidate the result if it is that sort of filesystem.
378 If that doesn't get a good result, it calls "`lookup_slow()`" which
385 reference to the new `vfsmount` which is only counted if it is
386 different from the previous `vfsmount`.  It then calls
399 `nd->last_type` to refer to the final component of the path.  It does
406 `path_parentat()` is clearly the simplest - it just wraps a little bit
414 `path_lookupat()` is nearly as simple - it is used when an existing
415 object is wanted such as by `stat()` or `chmod()`.  It essentially just
420 not try to revalidate the mounted filesystem.  It effectively
426 Finally `path_openat()` is used for the `open()` system call; it
432 not always, take `i_mutex`, depending on what it finds.
441 the final component, it must be a trailing slash.
450 On filesystems that require it, the lookup routines will call the
453 from a server.  In some cases it may find that there has been change
478 It can block to avoid races.  If an automount point is being
483 It can selectively allow only some processes to transit through a
484 mount point.  When a server process is managing automounts, it may
487 filesystem, which will then give it a special pass through
493 supports multiple filesystem namespaces, it is possible that the
509 communicate with server processes etc. but it should ultimately either
516 There is no new locking of import here and it is important that no
526 It is in many ways similar to REF-walk and the two share quite a bit
527 of code.  The significant difference in RCU-walk is how it allows for
532 refusing to handle a number of cases -- it instead falls back to
553 principle, but then it is really designed to work when there may well
557 parts of the filesystem tree, but in many parts it will be.  For the
558 other parts it is important that RCU-walk can quickly fall back to
562 as long as what it is looking for is in the cache and is stable.  It
564 and carefully watching where it is, to be sure it doesn't trip.  If it
566 isn't in the cache, then it tries to stop gracefully and switch to
572 This is an invariant that RCU-walk must guarantee.  It can only make
574 REF-walk could also have made if it were walking down the tree at the
577 RCU-walk finds it cannot stop gracefully, it simply gives up and
598 so it is very unlikely that there will be much, if any, benefit from
606 down a path.  The particular guarantee it provides is that the key
616 before taking references to the "next" dentry or vfsmount.  It also
620 Instead, it checks to see if a change has been made, and aborts or
621 retries if it has.
624 decisions that REF-walk could have made), it must make the checks at
636 is needed - which it usually is - RCU-walk must take a copy and then
649 instead.  Notably it does _not_ use `read_seqcount_retry()`, but
659 We already met the `mount_lock` seqlock when REF-walk used it to
661 it for that too, but for quite a bit more.
663 Instead of taking a counted reference to each `vfsmount` as it
669 relatively rare, it is reasonable to fall back on REF-walk any time
674 when the end of the path is reached.  It is also checked when stepping
676 `follow_dotdot_rcu()`).  If it is ever found to have changed, the
680 If RCU-walk finds that `mount_lock` hasn't changed then it can be sure
696 the required pattern, though it does so for three different cases.
710 twice, once to determine if it is NULL and once to verify access
713 access and it is stored in the `inode` field of `nameidata` from where
714 it can be safely accessed without further validation.
717 `lookup_slow()` being too slow and requiring locks.  It is in
724 revalidates the new `seq` number.  It then validates the old `dentry`
733 A mutex is a fairly heavyweight lock that can only be taken when it is
738 dentry that it is looking for, or it will find a dentry which
739 `read_seqretry()` won't validate.  In either case it will drop down to
742 Though `rename_lock` could be used by RCU-walk as it doesn't require
747 something in the dentry cache, whether it is really there or not, it
764 It is also called from `complete_walk()` when the lookup has reached
772 will return `-ECHILD` which will percolate up until it triggers a new
775 For those cases where `unlazy_walk()` is an option, it essentially
776 takes a reference on each of the pointers that it holds (vfsmount,
779 it, too, aborts with `-ECHILD`, otherwise the transition to REF-walk
784 already have one (often indirectly through another object), but it
786 all.  For `dentry->d_lockref`, it is safe to increment the reference
787 counter to get a reference unless it has been explicitly marked as
791 For `mnt->mnt_count` it is safe to take a reference as long as
793 validation fails, it may *not* be safe to just drop that reference in
795 progressed too far.  So the code in `legitimize_mnt()`, when it
796 finds that the reference it got might not be safe, checks the
798 correct, or if it should just decrement the count and pretend none of
807 file system might be included in RCU-walk, and it must know to be
813 In this case an extra "`MAY_NOT_BLOCK`" flag is passed so that it
814 knows not to sleep, but to return `-ECHILD` if it cannot complete
816 dentry, so it doesn't need to worry about further consistency checks.
817 However if it accesses any other filesystem data structures, it must
827 `seq` number from the `nameidata`, so it needs to be extra careful
830 result is not NULL before using it.  This pattern can be seen in
843 switch to REF-walk for the rest of the path.  We also saw it earlier
844 in `dget_parent()` when following a "`..`" link.  It tries a quick way
850 if anything goes wrong it is much safer to just abort and try a more
853 The emphasis here is "try quickly and check".  It should probably be
857 this whole process is assuming something is safe when in reality it
885 "`readlink -f`" command does, though it also edits out "`.`" and
910 >  Because it's a latency and DoS issue too. We need to react well to
911 >  true loops, but also to "very deep" non-loops. It's not about memory
912 >  use, it's about users triggering unreasonable CPU resources.
919 at most 40 symlinks in any one path lookup.  It previously imposed a
926 symlinks.  In many cases this will be sufficient.  If it isn't, a
931 It might seem that the name remnants are all that needs to be stored on
940 to external storage.  It is particularly important for RCU-walk to be
942 it doesn't need to drop down into REF-walk.
949 inode` it typically allocates extra space to store private data (a
958 construct the symlink content into that memory whenever it is needed.
960 When the symlink is stored in the inode, it has the same lifetime as
966 symlink is stored and it can be accessed directly whenever needed.
974 significantly, needs to release that reference when it is finished
975 with it.
978 mode.  It does require making changes to memory, which is best avoided,
979 but that isn't necessarily a big cost and it is better than dropping
983 filesystem cannot successfully get a reference in RCU-walk mode, it
989 RCU-walk mode as the rewrite is not quite complete.  It is likely that
991 called in RCU-walk mode so it both (1) knows to be careful, and (2) has the
994 all the data structures it references are safe to be accessed while
1000 complexity.  It requires a reference to the inode so that the
1010 provides an opaque "cookie" that must be passed to `->put_link()` so that it
1013 completely.  Only the filesystem knows what it is.
1027 with 40 entries it adds up to 1600 bytes total, which is less than
1028 half a page.  So it might seem like a lot, but is by no means
1032 part of the symlink that the other fields refer to.  It is the remnant
1049 called; it then gets the link from the filesystem.  Providing that
1062 It is most convenient to push the new symlink references onto the
1065 old symlink as it walks that last component.  So it is quite
1068 new symlink.  It is guided in this by two flags; `WALK_GET`, which
1069 gives it permission to follow a symlink if it finds one, and
1070 `WALK_PUT`, which tells it to release the current symlink after it has been
1096 something that looks like a symlink.  It is really a reference to the
1097 target file, not just the name of it.  When you `readlink` these
1098 objects you get a name that might refer to the same file - unless it
1111 following all symbolic links it finds, until it reaches the final
1114 `last` name if it doesn't exist or give an error if it does.  Other
1131 report that it is a symlink are `lookup_last()`, `mountpoint_last()`
1136 Of these, `do_last()` is the most interesting as it is used for
1145  it.  If the file was found in the dcache, then `vfs_open()` is used for
1147  the filesystem provides it) to combine the final lookup with the open, or
1174 We previously said of RCU-walk that it would "take no locks, increment
1186 Symlinks are different it seems.  Both reading a symlink (with `readlink()`)
1192 It is not clear why this is the case; POSIX has little to say on the
1206 quite complex.  Trying to stay in RCU-walk while doing it is best
1207 avoided.  Fortunately it is often permitted to skip the `atime`
1215 It is easy to test if an `atime` update is needed while in RCU-walk
1216 mode and, if it isn't, the update can be skipped and RCU-walk mode
1230 very early on.  If it is set, empty pathnames are not considered to be
1244 provided by the caller, so it shouldn't be released when it is no
1248 it had the right name but for some other reason.  This happens when
1264 point, then the mount is triggered.  Some operations would trigger it
1267 it sets `LOOKUP_AUTOMOUNT`, as does "`quotactl()`" and the handling of
1271 symlinks.  Some system calls set or clear it implicitly, while
1273 `UMOUNT_NOFOLLOW` to control it.  Its effect is similar to
1274 `WALK_GET` that we already met, but it is used in a different way.
1277 Various callers set this and it is also set when the final component
1284 if it knows that it will be asked to open or create the file soon.
1293 than even a couple of releases ago.  But that doesn't mean it is
1295 symlinks that are stored in the inode so, while it handles many ext4
1296 symlinks, it doesn't help with NFS, XFS, or Btrfs.  That support