Shared Subtrees
---------------

Contents:
	1) Overview
	2) Features
	3) smount command
	4) Use cases
	5) Detailed semantics
	6) Quiz
	7) FAQ
	8) Implementation


1) Overview
-----------

Consider the following situation:

A process wants to clone its own namespace, but still wants to access the CD
that got mounted recently. Shared subtree semantics provide the necessary
mechanism to accomplish the above.

This feature also provides the necessary building blocks for features like
per-user namespaces and versioned filesystems.

2) Features
-----------

Shared subtrees provide four different flavors of mounts (struct vfsmount, to
be precise):

	a. shared mount
	b. slave mount
	c. private mount
	d. unbindable mount


2a) A shared mount can be replicated to as many mountpoints as desired, and
all the replicas continue to be exactly the same.

	Here is an example:

	Let's say /mnt has a mount that is shared.
	mount --make-shared /mnt

	Note: the mount command does not yet support the --make-shared flag.
	I have included a small C program which does the same when run as
	'smount /mnt shared'

	#mount --bind /mnt /tmp
	The above command replicates the mount at /mnt to the mountpoint /tmp
	and the contents of both the mounts remain identical.

	#ls /mnt
	a b c

	#ls /tmp
	a b c

	Now let's say we mount a device at /tmp/a
	#mount /dev/sd0 /tmp/a

	#ls /tmp/a
	t1 t2 t3

	#ls /mnt/a
	t1 t2 t3

	Note that the mount has propagated to the mount at /mnt as well.

	And the same is true even when /dev/sd0 is mounted on /mnt/a. The
	contents will be visible under /tmp/a too.


2b) A slave mount is like a shared mount except that mount and umount events
	only propagate towards it.

	All slave mounts have a master mount which is shared.

	Here is an example:

	Let's say /mnt has a mount which is shared.
	#mount --make-shared /mnt

	Let's bind mount /mnt to /tmp
	#mount --bind /mnt /tmp

	The new mount at /tmp becomes a shared mount and it is a replica of
	the mount at /mnt.

	Now let's make the mount at /tmp a slave of /mnt
	#mount --make-slave /tmp
	[or smount /tmp slave]

	Let's mount /dev/sd0 on /mnt/a
	#mount /dev/sd0 /mnt/a

	#ls /mnt/a
	t1 t2 t3

	#ls /tmp/a
	t1 t2 t3

	Note the mount event has propagated to the mount at /tmp.

	However, let's see what happens if we mount something on the mount
	at /tmp

	#mount /dev/sd1 /tmp/b

	#ls /tmp/b
	s1 s2 s3

	#ls /mnt/b

	Note how the mount event has not propagated to the mount at
	/mnt.


2c) A private mount does not forward or receive propagation.

	This is the mount we are familiar with. It's the default type.


2d) An unbindable mount is an unbindable private mount.

	Let's say we have a mount at /mnt and we make it unbindable

	#mount --make-unbindable /mnt
	[or smount /mnt unbindable]

	Let's try to bind mount this mount somewhere else.
	# mount --bind /mnt /tmp
	mount: wrong fs type, bad option, bad superblock on /mnt,
	        or too many mounted file systems

	Binding an unbindable mount is an invalid operation.


3) smount command
-----------------

	Currently the mount command is not aware of shared subtree features.
	Work is in progress to add the support in mount (util-linux package).
	Until then, use the following program.

	------------------------------------------------------------------------
	//
	//this code was developed by Miklos Szeredi <miklos@szeredi.hu>
	//and modified by Ram Pai <linuxram@us.ibm.com>
	// sample usage:
	//              smount /tmp shared
	//
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <string.h>
	#include <sys/mount.h>
	#include <sys/fsuid.h>

	#ifndef MS_REC
	#define MS_REC		0x4000	/* 16384: Recursive loopback */
	#endif

	#ifndef MS_SHARED
	#define MS_SHARED	(1<<20)	/* Shared */
	#endif

	#ifndef MS_PRIVATE
	#define MS_PRIVATE	(1<<18)	/* Private */
	#endif

	#ifndef MS_SLAVE
	#define MS_SLAVE	(1<<19)	/* Slave */
	#endif

	#ifndef MS_UNBINDABLE
	#define MS_UNBINDABLE	(1<<17)	/* Unbindable */
	#endif

	int main(int argc, char *argv[])
	{
		unsigned long type;

		if (argc != 3) {
			fprintf(stderr, "usage: %s dir "
			"<rshared|rslave|rprivate|runbindable|shared|slave"
			"|private|unbindable>\n", argv[0]);
			return 1;
		}

		fprintf(stdout, "%s %s %s\n", argv[0], argv[1], argv[2]);

		/* map the requested operation to mount(2) propagation flags */
		if (strcmp(argv[2], "rshared") == 0)
			type = (MS_SHARED | MS_REC);
		else if (strcmp(argv[2], "rslave") == 0)
			type = (MS_SLAVE | MS_REC);
		else if (strcmp(argv[2], "rprivate") == 0)
			type = (MS_PRIVATE | MS_REC);
		else if (strcmp(argv[2], "runbindable") == 0)
			type = (MS_UNBINDABLE | MS_REC);
		else if (strcmp(argv[2], "shared") == 0)
			type = MS_SHARED;
		else if (strcmp(argv[2], "slave") == 0)
			type = MS_SLAVE;
		else if (strcmp(argv[2], "private") == 0)
			type = MS_PRIVATE;
		else if (strcmp(argv[2], "unbindable") == 0)
			type = MS_UNBINDABLE;
		else {
			fprintf(stderr, "invalid operation: %s\n", argv[2]);
			return 1;
		}
		/* use the caller's real uid for filesystem permission checks */
		setfsuid(getuid());

		/* source, fstype and data are ignored when only the
		 * propagation type of an existing mount is changed */
		if (mount("", argv[1], "dontcare", type, "") == -1) {
			perror("mount");
			return 1;
		}
		return 0;
	}
	-----------------------------------------------------------------------
	Copy the above code snippet into smount.c
	gcc -o smount smount.c


	(i) To mark all the mounts under /mnt as shared, execute the following
	command:

		smount /mnt rshared
		the corresponding syntax planned for mount command is
		mount --make-rshared /mnt

	    To mark just the mount /mnt as shared, execute the following
	    command:
		smount /mnt shared
		the corresponding syntax planned for mount command is
		mount --make-shared /mnt

	(ii) To mark all the shared mounts under /mnt as slave, execute the
	following command:

		smount /mnt rslave
		the corresponding syntax planned for mount command is
		mount --make-rslave /mnt

	    To mark just the mount /mnt as slave, execute the following
	    command:
		smount /mnt slave
		the corresponding syntax planned for mount command is
		mount --make-slave /mnt

	(iii) To mark all the mounts under /mnt as private, execute the
	following command:

		smount /mnt rprivate
		the corresponding syntax planned for mount command is
		mount --make-rprivate /mnt

	    To mark just the mount /mnt as private, execute the following
	    command:
		smount /mnt private
		the corresponding syntax planned for mount command is
		mount --make-private /mnt

	    NOTE: by default all the mounts are created as private. But if
	    you want to change some shared/slave/unbindable mount to
	    private at a later point in time, this command can help.

	(iv) To mark all the mounts under /mnt as unbindable, execute the
	following command:

		smount /mnt runbindable
		the corresponding syntax planned for mount command is
		mount --make-runbindable /mnt

	    To mark just the mount /mnt as unbindable, execute the following
	    command:
		smount /mnt unbindable
		the corresponding syntax planned for mount command is
		mount --make-unbindable /mnt


4) Use cases
------------

	A) A process wants to clone its own namespace, but still wants to
	   access the CD that got mounted recently.

	   Solution:

		The system administrator can make the mount at /cdrom shared
		mount --bind /cdrom /cdrom
		mount --make-shared /cdrom

		Now any process that clones off a new namespace will have a
		mount at /cdrom which is a replica of the same mount in the
		parent namespace.

		So when a CD is inserted and mounted at /cdrom, that mount gets
		propagated to the other mount at /cdrom in all the other cloned
		namespaces.

	B) A process wants its mounts to be invisible to any other process,
	   but still wants to see the other system mounts.

	   Solution:

		To begin with, the administrator can mark the entire mount tree
		as shareable.

		mount --make-rshared /

		A new process can clone off a new namespace and mark some part
		of its namespace as slave

		mount --make-rslave /myprivatetree

		Henceforth any mounts within /myprivatetree done by the
		process will not show up in any other namespace. However, mounts
		done in the parent namespace under /myprivatetree still show
		up in the process's namespace.


	Apart from the above semantics, this feature provides the
	building blocks to solve the following problems:

	C) Per-user namespace

		The above semantics allow a way to share mounts across
		namespaces. But namespaces are associated with processes.
		If namespaces are made first-class objects with a user API to
		associate/disassociate a namespace with a userid, then each user
		could have his/her own namespace and tailor it to his/her
		requirements. Of course, this needs support from PAM.

	D) Versioned files

		If the entire mount tree is visible at multiple locations, then
		an underlying versioning file system can return a different
		version of the file depending on the path used to access that
		file.

		An example is:

		    mount --make-shared /
		    mount --rbind / /view/v1
		    mount --rbind / /view/v2
		    mount --rbind / /view/v3
		    mount --rbind / /view/v4

		and if /usr has a versioning filesystem mounted, then that
		mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
		/view/v4/usr too.

		A user can request the v3 version of the file /usr/fs/namespace.c
		by accessing /view/v3/usr/fs/namespace.c . The underlying
		versioning filesystem can then determine that the v3 version of
		the filesystem is being requested and return the corresponding
		inode.

5) Detailed semantics
---------------------
	The section below explains the detailed semantics of
	bind, rbind, move, mount, umount and clone-namespace operations.

	Note: the word 'vfsmount' and the noun 'mount' have been used
	to mean the same thing throughout this document.

5a) Mount states

	A given mount can be in one of the following states
	1) shared
	2) slave
	3) shared and slave
	4) private
	5) unbindable

	A 'propagation event' is defined as an event generated on a vfsmount
	that leads to mount or unmount actions in other vfsmounts.

	A 'peer group' is defined as a group of vfsmounts that propagate
	events to each other.

	(1) Shared mounts

		A 'shared mount' is defined as a vfsmount that belongs to a
		'peer group'.

		For example:
			mount --make-shared /mnt
			mount --bind /mnt /tmp

		The mount at /mnt and that at /tmp are both shared and belong
		to the same peer group. Anything mounted or unmounted under
		/mnt or /tmp is reflected in all the other mounts of its peer
		group.


	(2) Slave mounts

		A 'slave mount' is defined as a vfsmount that receives
		propagation events and does not forward propagation events.

		A slave mount, as the name implies, has a master mount from
		which mount/unmount events are received. Events do not
		propagate from the slave mount to the master. Only a shared
		mount can be made a slave, by executing the following command

			mount --make-slave mount

		A shared mount that is made a slave is no longer shared unless
		modified to become shared.

	(3) Shared and Slave

		A vfsmount can be both shared as well as slave. This state
		indicates that the mount is a slave of some vfsmount, and
		has its own peer group too. This vfsmount receives propagation
		events from its master vfsmount, and also forwards propagation
		events to its 'peer group' and to its slave vfsmounts.

		Strictly speaking, the vfsmount is shared, having its own
		peer group, and this peer group is a slave of some other
		peer group.

		Only a slave vfsmount can be made 'shared and slave', either
		by executing the following command
			mount --make-shared mount
		or by moving the slave vfsmount under a shared vfsmount.

	(4) Private mount

		A 'private mount' is defined as a vfsmount that does not
		receive or forward any propagation events.

	(5) Unbindable mount

		An 'unbindable mount' is defined as a vfsmount that does not
		receive or forward any propagation events and cannot
		be bind mounted.


	State diagram:
	The state diagram below explains the state transition of a mount,
	in response to various commands.
	--------------------------------------------------------------------------
	|             |make-shared | make-slave   | make-private |make-unbindable|
	|-------------|------------|--------------|--------------|---------------|
	|shared       |shared      |*slave/private| private      | unbindable    |
	|-------------|------------|--------------|--------------|---------------|
	|slave        |shared      |**slave       | private      | unbindable    |
	|             |and slave   |              |              |               |
	|-------------|------------|--------------|--------------|---------------|
	|shared       |shared      | slave        | private      | unbindable    |
	|and slave    |and slave   |              |              |               |
	|-------------|------------|--------------|--------------|---------------|
	|private      |shared      |**private     | private      | unbindable    |
	|-------------|------------|--------------|--------------|---------------|
	|unbindable   |shared      |**unbindable  | private      | unbindable    |
	--------------------------------------------------------------------------

	* if the shared mount is the only mount in its peer group, making it
	a slave makes it private automatically, since there is no master to
	which it can be slaved.

	** slaving a non-shared mount has no effect on the mount.

	Apart from the commands listed above, the 'move' operation also changes
	the state of a mount depending on the type of the destination mount.
	It is explained in section 5d.

5b) Bind semantics

	Consider the following command

	mount --bind A/a  B/b

	where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
	is the destination mount and 'b' is the dentry in the destination mount.

	The outcome depends on the type of mount of 'A' and 'B'. The table
	below contains a quick reference.
	--------------------------------------------------------------------------
	|                       BIND MOUNT OPERATION                             |
	|************************************************************************|
	| source(A)->| shared      | private     | slave          | unbindable   |
	| dest(B)    |             |             |                |              |
	|   |        |             |             |                |              |
	|   v        |             |             |                |              |
	|************************************************************************|
	|  shared    | shared      | shared      | shared & slave | invalid      |
	|            |             |             |                |              |
	| non-shared | shared      | private     | slave          | invalid      |
	**************************************************************************

	Details:

	1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
	which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
	mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
	are created and mounted at the dentry 'b' on all mounts to which 'B'
	propagates. A new propagation tree containing 'C1',..,'Cn' is
	created. This propagation tree is identical to the propagation tree of
	'B'. And finally the peer group of 'C' is merged with the peer group
	of 'A'.

	2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
	which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
	mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
	are created and mounted at the dentry 'b' on all mounts to which 'B'
	propagates. A new propagation tree is set up containing all new mounts
	'C', 'C1', .., 'Cn' with exactly the same configuration as the
	propagation tree for 'B'.

	3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
	mount 'C' which is a clone of 'A' is created. Its root dentry is 'a'.
	'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
	'C3' ... are created and mounted at the dentry 'b' on all mounts to
	which 'B' propagates. A new propagation tree containing the new mounts
	'C', 'C1', ..
	'Cn' is created. This propagation tree is identical to the
	propagation tree for 'B'. And finally the mount 'C' and its peer group
	are made slaves of mount 'Z'. In other words, mount 'C' is in the
	state 'slave and shared'.

	4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
	invalid operation.

	5. 'A' is a private mount and 'B' is a non-shared (private, slave or
	unbindable) mount. A new mount 'C' which is a clone of 'A' is created.
	Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.

	6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
	which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
	mounted on mount 'B' at dentry 'b'. 'C' is made a member of the
	peer group of 'A'.

	7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
	new mount 'C' which is a clone of 'A' is created. Its root dentry is
	'a'. 'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
	slave mount of 'Z'. In other words, 'A' and 'C' are both slave mounts
	of 'Z'. All mount/unmount events on 'Z' propagate to 'A' and 'C'. But
	mount/unmount on 'A' does not propagate anywhere else. Similarly,
	mount/unmount on 'C' does not propagate anywhere else.

	8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
	invalid operation. An unbindable mount cannot be bind mounted.

5c) Rbind semantics

	rbind is like bind, except that bind replicates only the specified
	mount, whereas rbind replicates all the mounts in the tree belonging
	to the specified mount. An rbind mount is a bind mount applied to all
	the mounts in the tree.

	If the source tree that is rbind-mounted has some unbindable mounts,
	then the subtree under the unbindable mount is pruned in the new
	location.

	E.g.: let's say we have the following mount tree.

	      A
	    /   \
	   B     C
	  / \   / \
	 D   E F   G

	Let's say all the mounts in the tree except the mount C are
	of a type other than unbindable.

	If this tree is rbound to, say, Z

	we will have the following tree at the new location.

	      Z
	      |
	      A'
	     /
	    B'          Note how the tree under C is pruned
	   / \          in the new location.
	  D'  E'



5d) Move semantics

	Consider the following command

	mount --move A  B/b

	where 'A' is the source mount, 'B' is the destination mount and 'b' is
	the dentry in the destination mount.

	The outcome depends on the type of the mount of 'A' and 'B'. The table
	below is a quick reference.
	--------------------------------------------------------------------------
	|                       MOVE MOUNT OPERATION                             |
	|************************************************************************|
	| source(A)->| shared      | private     | slave          | unbindable   |
	| dest(B)    |             |             |                |              |
	|   |        |             |             |                |              |
	|   v        |             |             |                |              |
	|************************************************************************|
	|  shared    | shared      | shared      | shared & slave | invalid      |
	|            |             |             |                |              |
	| non-shared | shared      | private     | slave          | unbindable   |
	**************************************************************************
	NOTE: moving a mount residing under a shared mount is invalid.

	Details follow:

	1. 'A' is a shared mount and 'B' is a shared mount. The mount 'A' is
	mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'...'An'
	are created and mounted at dentry 'b' on all mounts that receive
	propagation from mount 'B'. A new propagation tree is created in the
	exact same configuration as that of 'B'. This new propagation tree
	contains all the new mounts 'A1', 'A2'... 'An'. And this new
	propagation tree is appended to the already existing propagation tree
	of 'A'.

	2. 'A' is a private mount and 'B' is a shared mount.
	The mount 'A' is mounted on mount 'B' at dentry 'b'. Also new mounts
	'A1', 'A2'... 'An' are created and mounted at dentry 'b' on all mounts
	that receive propagation from mount 'B'. The mount 'A' becomes a shared
	mount and a propagation tree is created which is identical to that of
	'B'. This new propagation tree contains all the new mounts 'A1',
	'A2'... 'An'.

	3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. The
	mount 'A' is mounted on mount 'B' at dentry 'b'. Also new mounts 'A1',
	'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
	receive propagation from mount 'B'. A new propagation tree is created
	in the exact same configuration as that of 'B'. This new propagation
	tree contains all the new mounts 'A1', 'A2'... 'An'. And this new
	propagation tree is appended to the already existing propagation tree
	of 'A'. Mount 'A' continues to be the slave mount of 'Z' but it also
	becomes 'shared'.

	4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
	is invalid, because mounting anything on the shared mount 'B' can
	create new mounts that get mounted on the mounts that receive
	propagation from 'B'. And since the mount 'A' is unbindable, cloning
	it to mount at other mountpoints is not possible.

	5. 'A' is a private mount and 'B' is a non-shared (private, slave or
	unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.

	6. 'A' is a shared mount and 'B' is a non-shared mount. The mount 'A'
	is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be a
	shared mount.

	7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
	The mount 'A' is mounted on mount 'B' at dentry 'b'. Mount 'A'
	continues to be a slave mount of mount 'Z'.

	8. 'A' is an unbindable mount and 'B' is a non-shared mount. The mount
	'A' is mounted on mount 'B' at dentry 'b'.
	Mount 'A' continues to be an unbindable mount.

5e) Mount semantics

	Consider the following command

	mount device  B/b

	'B' is the destination mount and 'b' is the dentry in the destination
	mount.

	The above operation is the same as a bind operation with the exception
	that the source mount is always a private mount.


5f) Unmount semantics

	Consider the following command

	umount A

	where 'A' is a mount mounted on mount 'B' at dentry 'b'.

	If mount 'B' is shared, then all most-recently-mounted mounts at dentry
	'b' on mounts that receive propagation from mount 'B' and do not have
	sub-mounts within them are unmounted.

	Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
	each other.

	Let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
	'B1', 'B2' and 'B3' respectively.

	Let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
	mount 'B1', 'B2' and 'B3' respectively.

	If 'C1' is unmounted, all the mounts that are most-recently-mounted on
	'B1' and on the mounts that 'B1' propagates to are unmounted.

	'B1' propagates to 'B2' and 'B3'. And the most recently mounted mount
	on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.

	So all of 'C1', 'C2' and 'C3' should be unmounted.

	If any of 'C2' or 'C3' has some child mounts, then that mount is not
	unmounted, but all other mounts are unmounted. However, if 'C1' is told
	to be unmounted and 'C1' has some sub-mounts, the umount operation
	fails entirely.

5g) Clone Namespace

	A cloned namespace contains all the same mounts as the parent
	namespace.

	Let's say 'A' and 'B' are the corresponding mounts in the parent and the
	child namespace.

	If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
	each other.

	If 'A' is a slave mount of 'Z', then 'B' is also the slave mount of
	'Z'.

	If 'A' is a private mount, then 'B' is a private mount too.

	If 'A' is an unbindable mount, then 'B' is an unbindable mount too.


6) Quiz
-------

	A. What is the result of the following command sequence?

		mount --bind /mnt /mnt
		mount --make-shared /mnt
		mount --bind /mnt /tmp
		mount --move /tmp /mnt/1

		What should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
		Should they all be identical? Or should only /mnt and /mnt/1
		be identical?


	B. What is the result of the following command sequence?

		mount --make-rshared /
		mkdir -p /v/1
		mount --rbind / /v/1

		What should the content of /v/1/v/1 be?


	C. What is the result of the following command sequence?

		mount --bind /mnt /mnt
		mount --make-shared /mnt
		mkdir -p /mnt/1/2/3 /mnt/1/test
		mount --bind /mnt/1 /tmp
		mount --make-slave /mnt
		mount --make-shared /mnt
		mount --bind /mnt/1/2 /tmp1
		mount --make-slave /mnt

		At this point we have the first mount at /tmp and
		its root dentry is 1. Let's call this mount 'A'.
		And then we have a second mount at /tmp1 with root
		dentry 2. Let's call this mount 'B'.
		Next we have a third mount at /mnt with root dentry
		mnt. Let's call this mount 'C'.

		'B' is the slave of 'A' and 'C' is a slave of 'B'
		A -> B -> C

		At this point, if we execute the following command

		mount --bind /bin /tmp/test

		the mount is attempted on 'A'.

		Will the mount propagate to 'B' and 'C'?

		What would the contents of
		/mnt/1/test be?

7) FAQ
------

	Q1. Why is bind mount needed? How is it different from symbolic links?
		Symbolic links can get stale if the destination mount gets
		unmounted or moved. Bind mounts continue to exist even if the
		other mount is unmounted or moved.

	Q2.
	    Why can't the shared subtree be implemented using exportfs?

		exportfs is a heavyweight way of accomplishing part of what
		shared subtree can do. I cannot imagine a way to implement the
		semantics of slave mount using exportfs.

	Q3. Why is unbindable mount needed?

		Let's say we want to replicate the mount tree at multiple
		locations within the same subtree.

		If one rbind mounts a tree within the same subtree 'n' times,
		the number of mounts created grows faster than exponentially
		in 'n'. Unbindable mounts can help prune the unneeded bind
		mounts. Here is an example.

		step 1:
		   let's say the root tree has just two directories with
		   one vfsmount.
		              root
		             /    \
		           tmp    usr

		   And we want to replicate the tree at multiple
		   mountpoints under /root/tmp

		step 2:
		   mount --make-shared /root

		   mkdir -p /tmp/m1

		   mount --rbind /root /tmp/m1

		   the new tree now looks like this:

		              root
		             /    \
		           tmp    usr
		           /
		         m1
		        /  \
		      tmp  usr
		      /
		     m1

		   it has two vfsmounts

		step 3:
		   mkdir -p /tmp/m2
		   mount --rbind /root /tmp/m2

		   the new tree now looks like this:

		                      root
		                     /    \
		                   tmp     usr
		                  /    \
		                m1       m2
		               / \       /  \
		             tmp  usr   tmp  usr
		            / \          /
		           m1  m2      m1
		               / \     /  \
		             tmp usr  tmp   usr
		             /        / \
		            m1       m1  m2
		           /  \
		         tmp   usr
		         /  \
		        m1   m2

		   it has 6 vfsmounts

		step 4:
		   mkdir -p /tmp/m3
		   mount --rbind /root /tmp/m3

		   I won't draw the tree, but it has 24 vfsmounts.


		At step i the number of vfsmounts is V[i] = i*V[i-1]; with
		V[1] = 1 this gives V[i] = i!, which grows even faster than an
		exponential function. And this tree has way more mounts than
		what we really needed in the first place.

		One could use a series of umounts at each step to prune
		out the unneeded mounts. But there is a better solution.
		Unbindable mounts come in handy here.

		step 1:
		   let's say the root tree has just two directories with
		   one vfsmount.
		              root
		             /    \
		           tmp    usr

		   How do we set up the same tree at multiple locations under
		   /root/tmp?

		step 2:
		   mount --bind /root/tmp /root/tmp

		   mount --make-rshared /root
		   mount --make-unbindable /root/tmp

		   mkdir -p /tmp/m1

		   mount --rbind /root /tmp/m1

		   the new tree now looks like this:

		              root
		             /    \
		           tmp    usr
		           /
		         m1
		        /  \
		      tmp  usr

		step 3:
		   mkdir -p /tmp/m2
		   mount --rbind /root /tmp/m2

		   the new tree now looks like this:

		              root
		             /    \
		           tmp    usr
		          /   \
		        m1     m2
		       /  \   /  \
		     tmp usr tmp usr

		step 4:

		   mkdir -p /tmp/m3
		   mount --rbind /root /tmp/m3

		   the new tree now looks like this:

		                     root
		                    /    \
		                  tmp     usr
		               /   |   \
		             m1    m2    m3
		            /  \  /  \  /  \
		          tmp usr tmp usr tmp usr

8) Implementation
-----------------

8A) Datastructure

	4 new fields are introduced to struct vfsmount
	->mnt_share
	->mnt_slave_list
	->mnt_slave
	->mnt_master

	->mnt_share links together all the mounts to/from which this vfsmount
		sends/receives propagation events.

	->mnt_slave_list links all the mounts to which this vfsmount propagates.

	->mnt_slave links together all the slaves that its master vfsmount
		propagates to.

	->mnt_master points to the master vfsmount from which this vfsmount
		receives propagation.

	->mnt_flags takes two more flags to indicate the propagation status of
		the vfsmount. MNT_SHARE indicates that the vfsmount is a shared
		vfsmount. MNT_UNCLONABLE indicates that the vfsmount cannot be
		replicated.

	All the shared vfsmounts in a peer group form a cyclic list through
	->mnt_share.

	All vfsmounts with the same ->mnt_master form a cyclic list anchored
	in ->mnt_master->mnt_slave_list and going through ->mnt_slave.

	->mnt_master can point to arbitrary (and possibly different) members
	of the master peer group.
	To find all immediate slaves of a peer group,
	you need to go through _all_ ->mnt_slave_list of its members.
	Conceptually it's just a single set; distribution among the
	individual lists does not affect propagation or the way the
	propagation tree is modified by operations.

	An example propagation tree looks as shown in the figure below.
	[ NOTE: Though it looks like a forest, if we consider all the shared
	mounts as a conceptual entity called 'pnode', it becomes a tree ]


	        A <--> B <--> C <---> D
	       /|\           /|       |\
	      / F G         J K       H I
	     /
	    E<-->K
	        /|\
	       M L N

	In the above figure, A, B, C and D are all shared and propagate to each
	other. 'A' has 3 slave mounts 'E', 'F' and 'G'; 'C' has 2 slave
	mounts 'J' and 'K'; and 'D' has two slave mounts 'H' and 'I'.
	'E' is also shared with 'K' and they propagate to each other. And
	'K' has 3 slaves 'M', 'L' and 'N'.

	A's ->mnt_share links with the ->mnt_share of 'B', 'C' and 'D'

	A's ->mnt_slave_list links with ->mnt_slave of 'E', 'F' and 'G'

	E's ->mnt_share links with ->mnt_share of K
	'E', 'F', 'G' have their ->mnt_master point to struct
	vfsmount of 'A'
	'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'
	K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'

	C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'
	J and K's ->mnt_master points to struct vfsmount of C
	and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'
	'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.


	NOTE: The propagation tree is orthogonal to the mount tree.


8B) Algorithm

	The crux of the implementation resides in the rbind/move operation.

	The overall algorithm breaks the operation into 3 phases (look at
	attach_recursive_mnt() and propagate_mnt()):

	1. prepare phase.
	2. commit phase.
	3.
	   abort phase.

	Prepare phase:

	for each mount in the source tree:
		a) Create the necessary number of mount trees to
		   be attached to each of the mounts that receive
		   propagation from the destination mount.
		b) Do not attach any of the trees to its destination.
		   However, note down its ->mnt_parent and ->mnt_mountpoint.
		c) Link all the new mounts to form a propagation tree that
		   is identical to the propagation tree of the destination
		   mount.

		If this phase is successful, there should be 'n' new
		propagation trees, where 'n' is the number of mounts in the
		source tree. Go to the commit phase.

		Also there should be 'm' new mount trees, where 'm' is
		the number of mounts to which the destination mount
		propagates.

		If any memory allocations fail, go to the abort phase.

	Commit phase
		attach each of the mount trees to their corresponding
		destination mounts.

	Abort phase
		delete all the newly created trees.

	NOTE: all the propagation related functionality resides in the file
	pnode.c


------------------------------------------------------------------------

version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)
version 0.2  (Incorporated comments from Al Viro)
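As an illustration of the data structures in 8A, here is a small user-space
sketch that rebuilds the example propagation tree from the 8A figure and
computes which mounts receive an event generated on a given mount. It is a
hypothetical model, not the kernel's code: struct model_mnt, share_next,
slaves[], visited, walk(), propagation_targets() and build_example() are
names invented for this sketch, lists are simplified to a next pointer and a
small array, and the kernel's own traversal in pnode.c differs. The rule it
encodes is the one described above: an event reaches every peer, then every
slave of every peer (and that slave's peer group), recursively, and never
flows from a slave back to its master.

```c
#include <string.h>

/*
 * Hypothetical user-space model of the propagation fields from 8A.
 * NOT the kernel's struct vfsmount: the kernel links ->mnt_share,
 * ->mnt_slave_list and ->mnt_slave with struct list_head; this sketch
 * uses a next pointer and a small array to stay self-contained.
 */
#define MAX_SLAVES 8
#define NMNT 14                             /* mounts 'A'..'N' in the figure */

struct model_mnt {
	char name;                              /* label used in the figure */
	struct model_mnt *share_next;           /* cyclic peer ring (like ->mnt_share) */
	struct model_mnt *master;               /* like ->mnt_master, NULL if none */
	struct model_mnt *slaves[MAX_SLAVES];   /* like ->mnt_slave_list */
	int nslaves;
	int visited;                            /* traversal bookkeeping only */
};

/* Link n mounts into one cyclic peer group. */
static void make_peers(struct model_mnt **m, int n)
{
	for (int i = 0; i < n; i++)
		m[i]->share_next = m[(i + 1) % n];
}

/* Make 'slave' receive propagation from 'master'. */
static void add_slave(struct model_mnt *master, struct model_mnt *slave)
{
	if (master->nslaves < MAX_SLAVES) {
		slave->master = master;
		master->slaves[master->nslaves++] = slave;
	}
}

/* Record every peer of m, then recurse into the slaves of every peer. */
static void walk(struct model_mnt *m, const struct model_mnt *origin,
		 char *out, size_t *n)
{
	struct model_mnt *p = m;

	if (m->visited)                         /* peer group already handled */
		return;
	do {
		p->visited = 1;
		if (p != origin)
			out[(*n)++] = p->name;
		p = p->share_next;
	} while (p != m);

	p = m;
	do {
		for (int i = 0; i < p->nslaves; i++)
			walk(p->slaves[i], origin, out, n);
		p = p->share_next;
	} while (p != m);
}

/* Write the labels of all mounts an event on 'origin' reaches into out. */
static void propagation_targets(struct model_mnt *origin,
				struct model_mnt *all, char *out)
{
	size_t n = 0;

	for (int i = 0; i < NMNT; i++)
		all[i].visited = 0;
	walk(origin, origin, out, &n);
	out[n] = '\0';
}

static struct model_mnt *by_label(struct model_mnt *mnt, char c)
{
	return &mnt[c - 'A'];
}

/* Rebuild the example propagation tree from the 8A figure. */
static void build_example(struct model_mnt *mnt)
{
	memset(mnt, 0, NMNT * sizeof(*mnt));
	for (int i = 0; i < NMNT; i++) {
		mnt[i].name = (char)('A' + i);
		mnt[i].share_next = &mnt[i];        /* a lone mount is its own ring */
	}

	struct model_mnt *abcd[] = { by_label(mnt, 'A'), by_label(mnt, 'B'),
				     by_label(mnt, 'C'), by_label(mnt, 'D') };
	struct model_mnt *ek[] = { by_label(mnt, 'E'), by_label(mnt, 'K') };
	make_peers(abcd, 4);
	make_peers(ek, 2);

	/* master/slave pairs, read off the figure */
	const char *edges[] = { "AE", "AF", "AG", "CJ", "CK",
				"DH", "DI", "KM", "KL", "KN" };
	for (size_t i = 0; i < sizeof(edges) / sizeof(edges[0]); i++)
		add_slave(by_label(mnt, edges[i][0]), by_label(mnt, edges[i][1]));
}
```

Called from a main() after build_example(), propagation_targets() on 'A'
yields all 13 other mounts ("BCDEKMLNFGJHI"), on 'E' only "KMLN", and on 'M'
an empty string. The visited flags matter because 'K' is reachable twice: as
the peer of 'E' and as a slave of 'C'. This mirrors the note in 8A that the
slaves of a peer group are conceptually a single set, however they happen to
be distributed among the members' ->mnt_slave_list.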