
Lines Matching +full:we +full:- +full:on +full:- +full:ns

17 Linux Kernel, everyone hacking on the kernel needs to know the fundamentals of concurrency and locking for SMP.

74 Race Conditions and Critical Regions
75 ------------------------------------
77 This overlap, where the result depends on the relative timing of multiple tasks, is called a race condition.
80 Linux started running on SMP machines, they became one of the major issues in kernel design and implementation.
84 preempting one task during the critical region, we have exactly the same race condition.
97 If I could give you one piece of advice on locking: **keep it simple**.

101 Two Main Types of Kernel Locks: Spinlocks and Mutexes
102 -----------------------------------------------------
106 single-holder lock: if you can't get the spinlock, you keep trying (spinning) until you can.
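
As a minimal sketch of that API (the counter and its lock here are
hypothetical, and this form is only safe if the data is never touched from
interrupt or softirq context)::

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(counter_lock);   /* hypothetical lock */
    static unsigned long counter;           /* the data it protects */

    void counter_inc(void)
    {
            spin_lock(&counter_lock);       /* spins until the lock is free */
            counter++;                      /* the critical region */
            spin_unlock(&counter_lock);
    }
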

121 Locks and Uniprocessor Kernels
122 ------------------------------
126 design decision: when no-one else can run at the same time, there is no reason to have a lock.
131 prevent any races. For most purposes, we can think of preemption as equivalent to SMP, and not worry about it separately.
139 between user contexts, as we will see below.

141 Locking Only In User Context
142 ----------------------------
154 nf_register_sockopt(). Registration and de-registration
155 are only done on module load and unload (and boot time, where there is no concurrency).
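
A minimal sketch of this pattern, with a hypothetical registration list that
is only ever touched from process context (so sleeping while holding the
mutex is fine)::

    #include <linux/mutex.h>
    #include <linux/list.h>
    #include <linux/errno.h>

    static LIST_HEAD(reg_list);              /* hypothetical registration list */
    static DEFINE_MUTEX(reg_mutex);          /* only user context touches reg_list */

    int reg_add(struct list_head *new)       /* called from a syscall; may sleep */
    {
            if (mutex_lock_interruptible(&reg_mutex))
                    return -ERESTARTSYS;
            list_add(new, &reg_list);
            mutex_unlock(&reg_mutex);
            return 0;
    }
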

161 Locking Between User Context and Softirqs
162 -----------------------------------------
168 used. It disables softirqs on that CPU, then grabs the lock.
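
A minimal sketch, with a hypothetical queue shared between a syscall path and
a softirq: the user-context side blocks softirqs on this CPU before taking
the lock, while the softirq side can use a plain spin_lock()::

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(queue_lock);      /* hypothetical shared lock */

    void user_side(void)                     /* process context (eg. a syscall) */
    {
            spin_lock_bh(&queue_lock);
            /* ... touch the shared queue ... */
            spin_unlock_bh(&queue_lock);
    }

    void softirq_side(void)                  /* runs in softirq context */
    {
            spin_lock(&queue_lock);          /* softirqs don't nest on one CPU */
            /* ... touch the shared queue ... */
            spin_unlock(&queue_lock);
    }
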

183 Locking Between User Context and Tasklets
184 -----------------------------------------

189 Locking Between User Context and Timers
190 ---------------------------------------

196 Locking Between Tasklets/Timers
197 -------------------------------
205 Since a tasklet is never run on two CPUs at once, you don't need to
206 worry about your tasklet being reentrant (running twice at once), even
207 on SMP.
216 on the same CPU.

218 Locking Between Softirqs
219 ------------------------
226 The same softirq can run on the other CPUs: you can use a per-CPU array
227 (see `Per-CPU Data`_) for better performance. If you're going so far as to use a softirq, you probably care about scalable performance enough to justify the extra complexity.
240 could be running on a different CPU.

249 Locking Between Hard IRQ and Softirqs/Tasklets
250 ----------------------------------------------
255 by a hardware interrupt on another CPU. This is where spin_lock_irq() is used.
257 interrupts on that cpu, then grab the lock.
272 variant which saves whether interrupts were on or off in a flags word, which is passed to spin_unlock_irqrestore().
278 Note that softirqs (and hence tasklets and timers) are run on return from hardware interrupts, so spin_lock_irq() also stops these.
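
A minimal sketch, with a hypothetical device lock shared between a hardware
interrupt handler and a softirq (or user context) consumer; the consumer must
be entered with interrupts enabled for spin_lock_irq() to be correct::

    #include <linux/spinlock.h>
    #include <linux/interrupt.h>

    static DEFINE_SPINLOCK(dev_lock);        /* hypothetical shared lock */

    void consumer(void)                      /* softirq or user context */
    {
            spin_lock_irq(&dev_lock);        /* keep the hard irq handler out */
            /* ... drain data queued by the irq handler ... */
            spin_unlock_irq(&dev_lock);
    }

    irqreturn_t dev_irq(int irq, void *data) /* the hard irq handler itself */
    {
            spin_lock(&dev_lock);            /* the consumer can't run here */
            /* ... queue data for the consumer ... */
            spin_unlock(&dev_lock);
            return IRQ_HANDLED;
    }
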

283 Locking Between Two Hard IRQ Handlers
284 -------------------------------------
288 architecture-specific whether all interrupts are disabled inside irq handlers themselves.
296 - If you are in a process context (any syscall) and want to lock other
  process out, use a mutex. You can take a mutex and sleep
  (copy_from_user() or kmalloc(x, GFP_KERNEL)).
300 - Otherwise (== data can be touched in an interrupt), use
  spin_lock_irqsave() and spin_unlock_irqrestore().
304 - Avoid holding spinlock for more than 5 lines of code and across any
  function call (except accessors like readb()).

307 Table of Minimum Requirements
308 -----------------------------
311 various contexts. In some cases, the same context can only be running on
312 one CPU at a time, so no locking is required for that context (eg. a
313 particular thread can only run on one CPU at a time, but if it needs
314 to share data with another thread, locking is required).
360 spin_trylock() does not spin but returns non-zero if it
361 acquires the spinlock on the first try or 0 if not. This function can be used in all contexts like spin_lock(): you must have disabled the contexts that might interrupt you and acquire the spin lock.
366 mutex_trylock() does not suspend your task but returns
367 non-zero if it could lock the mutex on the first try or 0 if not. This function cannot be safely used in hardware or software interrupt contexts despite not sleeping.
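
A minimal sketch of both flavours (the locks and error handling here are
hypothetical)::

    #include <linux/spinlock.h>
    #include <linux/mutex.h>
    #include <linux/errno.h>

    static DEFINE_SPINLOCK(stats_lock);      /* hypothetical */
    static DEFINE_MUTEX(config_mutex);       /* hypothetical */

    void maybe_bump_stats(void)
    {
            if (spin_trylock(&stats_lock)) { /* non-zero: we got it */
                    /* ... update the statistics ... */
                    spin_unlock(&stats_lock);
            }
            /* otherwise just skip this round instead of spinning */
    }

    int maybe_update_config(void)
    {
            if (!mutex_trylock(&config_mutex))
                    return -EBUSY;           /* 0: someone else holds it */
            /* ... update the configuration ... */
            mutex_unlock(&config_mutex);
            return 0;
    }
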

378 All In User Context
379 -------------------
381 For our first example, we assume that all operations are in user context
382 (ie. from system calls), so we can sleep. This means we can use a mutex to protect the cache and all the objects within it. Here's the code:
411 if (i->id == id) {
412 i->popularity++;
422 list_del(&obj->list);
424 cache_num--;
430 list_add(&obj->list, &cache);
434 if (!outcast || i->popularity < outcast->popularity)
446 return -ENOMEM;
448 strscpy(obj->name, name, sizeof(obj->name));
449 obj->id = id;
450 obj->popularity = 0;
468 int ret = -ENOENT;
474 strcpy(name, obj->name);
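
Only scattered lines of that example survive in the listing above; here is a
condensed sketch of the user-context version they belong to (eviction of the
least popular object is elided, and the field sizes are assumptions)::

    #include <linux/list.h>
    #include <linux/slab.h>
    #include <linux/string.h>
    #include <linux/mutex.h>
    #include <linux/errno.h>

    struct object {
            struct list_head list;
            int id;
            char name[32];
            int popularity;
    };

    /* Protects the cache, cache_num, and the contents of every object. */
    static DEFINE_MUTEX(cache_lock);
    static LIST_HEAD(cache);
    static unsigned int cache_num;

    /* Must be holding cache_lock */
    static struct object *__cache_find(int id)
    {
            struct object *i;

            list_for_each_entry(i, &cache, list)
                    if (i->id == id) {
                            i->popularity++;
                            return i;
                    }
            return NULL;
    }

    /* Must be holding cache_lock */
    static void __cache_delete(struct object *obj)
    {
            list_del(&obj->list);
            kfree(obj);
            cache_num--;
    }

    int cache_add(int id, const char *name)
    {
            struct object *obj = kmalloc(sizeof(*obj), GFP_KERNEL);

            if (!obj)
                    return -ENOMEM;

            strscpy(obj->name, name, sizeof(obj->name));
            obj->id = id;
            obj->popularity = 0;

            mutex_lock(&cache_lock);
            /* ... evict the least popular object if the cache is full ... */
            list_add(&obj->list, &cache);
            cache_num++;
            mutex_unlock(&cache_lock);
            return 0;
    }

    int cache_find(int id, char *name)
    {
            struct object *obj;
            int ret = -ENOENT;

            mutex_lock(&cache_lock);
            obj = __cache_find(id);
            if (obj) {
                    ret = 0;
                    strcpy(name, obj->name);
            }
            mutex_unlock(&cache_lock);
            return ret;
    }
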
480 Note that we always make sure we have the cache_lock when we add,
481 delete, or look up the cache: the cache infrastructure itself (and the
482 contents of the objects) are protected by the lock. In this case it's
483 easy, since we copy the data for the user, and never let them access the
484 objects directly.
486 There is a slight (and common) optimization here: in
487 cache_add() we set up the fields of the object before
488 grabbing the lock. This is safe, as no-one else can access it until we
489 put it in cache.

491 Accessing From Interrupt Context
492 --------------------------------
498 The change is shown below, in standard patch format: the ``-`` are lines taken away, and the ``+`` are lines added.
503 --- cache.c.usercontext 2003-12-09 13:58:54.000000000 +1100
504 +++ cache.c.interrupt 2003-12-09 14:07:49.000000000 +1100
505 @@ -12,7 +12,7 @@
509 -static DEFINE_MUTEX(cache_lock);
514 @@ -55,6 +55,7 @@
521 return -ENOMEM;
522 @@ -63,30 +64,33 @@
523 obj->id = id;
524 obj->popularity = 0;
526 - mutex_lock(&cache_lock);
529 - mutex_unlock(&cache_lock);
536 - mutex_lock(&cache_lock);
541 - mutex_unlock(&cache_lock);
548 int ret = -ENOENT;
551 - mutex_lock(&cache_lock);
556 strcpy(name, obj->name);
558 - mutex_unlock(&cache_lock);
563 Note that the spin_lock_irqsave() will turn off
564 interrupts if they are on, otherwise does nothing (if we are already in
565 an interrupt handler), hence these functions are safe to call from any
566 context.
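
Pulling the patch fragments together, the lookup function ends up looking
roughly like this (a condensed sketch, reusing ``struct object`` and
__cache_find() from the earlier sketch)::

    #include <linux/spinlock.h>
    #include <linux/string.h>
    #include <linux/errno.h>

    static DEFINE_SPINLOCK(cache_lock);      /* replaces the mutex */

    int cache_find(int id, char *name)
    {
            struct object *obj;
            unsigned long flags;
            int ret = -ENOENT;

            spin_lock_irqsave(&cache_lock, flags);   /* safe from any context */
            obj = __cache_find(id);
            if (obj) {
                    ret = 0;
                    strcpy(name, obj->name);
            }
            spin_unlock_irqrestore(&cache_lock, flags);
            return ret;
    }
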

574 Exposing Objects Outside This File
575 ----------------------------------
582 The first problem is that we use the ``cache_lock`` to protect objects:
583 we'd need to make this non-static so the rest of the code can use it.
590 worse, add another object, re-using the same address.
592 As there is only one lock, you can't hold it forever: no-one else would get any work done.
602 --- cache.c.interrupt 2003-12-09 14:25:43.000000000 +1100
603 +++ cache.c.refcnt 2003-12-09 14:33:05.000000000 +1100
604 @@ -7,6 +7,7 @@
612 @@ -17,6 +18,35 @@
618 + if (--obj->refcnt == 0)
624 + obj->refcnt++;
648 @@ -35,6 +65,7 @@
651 list_del(&obj->list);
653 cache_num--;
656 @@ -63,6 +94,7 @@
657 strscpy(obj->name, name, sizeof(obj->name));
658 obj->id = id;
659 obj->popularity = 0;
660 + obj->refcnt = 1; /* The cache holds a reference */
664 @@ -79,18 +111,15 @@
668 -int cache_find(int id, char *name)
672 - int ret = -ENOENT;
677 - if (obj) {
678 - ret = 0;
679 - strcpy(name, obj->name);
680 - }
684 - return ret;
688 We encapsulate the reference counting in the standard 'get' and 'put'
689 functions. Now we can return the object itself from cache_find(), which has the advantage that the caller can now sleep holding the object (eg. to copy_to_user() the name to userspace).
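
A condensed sketch of the get/put helpers the patch fragments above add
(``struct object`` gains an ``unsigned int refcnt`` field, set to 1 in
cache_add() for the cache's own reference; the count is still protected by
``cache_lock`` at this stage)::

    static void __object_put(struct object *obj)
    {
            if (--obj->refcnt == 0)
                    kfree(obj);
    }

    static void __object_get(struct object *obj)
    {
            obj->refcnt++;
    }

    void object_put(struct object *obj)
    {
            unsigned long flags;

            spin_lock_irqsave(&cache_lock, flags);
            __object_put(obj);
            spin_unlock_irqrestore(&cache_lock, flags);
    }

    void object_get(struct object *obj)
    {
            unsigned long flags;

            spin_lock_irqsave(&cache_lock, flags);
            __object_get(obj);
            spin_unlock_irqrestore(&cache_lock, flags);
    }
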
706 although for anything non-trivial using spinlocks is clearer. The atomic_inc() and atomic_dec_and_test() are used instead of the standard increment and decrement operators, and the lock is no longer used to protect the reference count itself.
713 --- cache.c.refcnt 2003-12-09 15:00:35.000000000 +1100
714 +++ cache.c.refcnt-atomic 2003-12-11 15:49:42.000000000 +1100
715 @@ -7,7 +7,7 @@
719 - unsigned int refcnt;
724 @@ -18,33 +18,15 @@
728 -static void __object_put(struct object *obj)
729 -{
730 - if (--obj->refcnt == 0)
731 - kfree(obj);
732 -}
733 -
734 -static void __object_get(struct object *obj)
735 -{
736 - obj->refcnt++;
737 -}
738 -
741 - unsigned long flags;
742 -
743 - spin_lock_irqsave(&cache_lock, flags);
744 - __object_put(obj);
745 - spin_unlock_irqrestore(&cache_lock, flags);
746 + if (atomic_dec_and_test(&obj->refcnt))
752 - unsigned long flags;
753 -
754 - spin_lock_irqsave(&cache_lock, flags);
755 - __object_get(obj);
756 - spin_unlock_irqrestore(&cache_lock, flags);
757 + atomic_inc(&obj->refcnt);
761 @@ -65,7 +47,7 @@
764 list_del(&obj->list);
765 - __object_put(obj);
767 cache_num--;
770 @@ -94,7 +76,7 @@
771 strscpy(obj->name, name, sizeof(obj->name));
772 obj->id = id;
773 obj->popularity = 0;
774 - obj->refcnt = 1; /* The cache holds a reference */
775 + atomic_set(&obj->refcnt, 1); /* The cache holds a reference */
779 @@ -119,7 +101,7 @@
783 - __object_get(obj);
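
After this patch the helpers shrink to plain atomic operations; roughly::

    #include <linux/atomic.h>

    /* struct object now has an atomic_t refcnt instead of unsigned int. */

    void object_put(struct object *obj)
    {
            if (atomic_dec_and_test(&obj->refcnt))  /* true when it hits zero */
                    kfree(obj);
    }

    void object_get(struct object *obj)
    {
            atomic_inc(&obj->refcnt);
    }

    /* ... and cache_add() initializes it with:
     *      atomic_set(&obj->refcnt, 1);    The cache holds a reference
     */
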

789 Protecting The Objects Themselves
790 ---------------------------------
792 In these examples, we assumed that the objects (except the reference
793 counts) never changed once they are created. If we wanted to allow the name to change, there are three possibilities:
796 - You can make ``cache_lock`` non-static, and tell people to grab that
  lock before changing the name in any object.
799 - You can provide a cache_obj_rename() which grabs this
  lock and changes the name for the caller, and tell everyone to use that
  function.
803 - You can make the ``cache_lock`` protect only the cache itself, and
  use another lock to protect the name.
806 Theoretically, you can make the locks as fine-grained as one lock for every field, for every object. In practice, the most common variants are:
810 - One lock which protects the infrastructure (the ``cache`` list in
811 this example) and all the objects. This is what we have done so far.
813 - One lock which protects the infrastructure (including the list pointers inside the objects), and one lock inside the object which protects the rest of that object.
817 - Multiple locks to protect the infrastructure (eg. one lock per hash
818 chain), possibly with a separate per-object lock.
820 Here is the "lock-per-object" implementation:
824 --- cache.c.refcnt-atomic 2003-12-11 15:50:54.000000000 +1100
825 +++ cache.c.perobjectlock 2003-12-11 17:15:03.000000000 +1100
826 @@ -6,11 +6,17 @@
841 - int popularity;
845 @@ -77,6 +84,7 @@
846 obj->id = id;
847 obj->popularity = 0;
848 atomic_set(&obj->refcnt, 1); /* The cache holds a reference */
849 + spin_lock_init(&obj->lock);
854 Note that I decide that the popularity count should be protected by the
855 ``cache_lock`` rather than the per-object lock: this is because it (like
856 the ``struct list_head`` inside the object) is logically part of the
857 infrastructure.
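
The resulting object layout is roughly the following (the field order and
name size are assumptions; the comments record which lock protects what, and
cache_add() initializes the new lock with spin_lock_init(&obj->lock), as the
fragment above shows)::

    struct object {
            /* These two protected by cache_lock. */
            struct list_head list;
            int popularity;

            atomic_t refcnt;

            /* Doesn't change once created. */
            int id;

            spinlock_t lock;        /* Protects the name */
            char name[32];
    };
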

874 Deadlock: Simple and Advanced
875 -----------------------------
881 stay-up-five-nights-talk-to-fluffy-code-bunnies kind of problem.
890 with a single CPU (although not on UP compiles, since spinlocks vanish
891 on kernel compiles with ``CONFIG_SMP``\ =n. You'll still get data corruption in the second example).
894 This complete lockup is easy to diagnose: on SMP boxes the watchdog timer or compiling with ``CONFIG_DEBUG_SPINLOCK`` set will show it up immediately when it happens.
899 A more complex problem is the so-called 'deadly embrace', involving two or more locks.
909 lock it twice. Secondly, if the same softirq on another CPU is trying to
913 +-----------------------+-----------------------+
914 | CPU 1                 | CPU 2                 |
915 +-----------------------+-----------------------+
916 | Grab lock A -> OK     | Grab lock B -> OK     |
917 +-----------------------+-----------------------+
918 | Grab lock B -> spin   | Grab lock A -> spin   |
919 +-----------------------+-----------------------+

926 Preventing Deadlock
927 -------------------
936 are never held around calls to non-trivial functions outside the same file.
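
As a minimal sketch of the textbook ordering rule, with two hypothetical
locks that every code path takes in the same order::

    static DEFINE_SPINLOCK(lock_a);          /* hypothetical */
    static DEFINE_SPINLOCK(lock_b);          /* hypothetical */

    void move_item(void)
    {
            /* Always A before B: taking them in the reverse order anywhere
             * else would allow the deadly embrace shown in the table above. */
            spin_lock(&lock_a);
            spin_lock(&lock_b);
            /* ... move data between the two structures ... */
            spin_unlock(&lock_b);
            spin_unlock(&lock_a);
    }
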

954 Racing Timers: A Kernel Pastime
955 -------------------------------
961 If you want to destroy the entire collection (say on module removal), you might do the following:
969 struct foo *next = list->next;
970 timer_delete(&list->timer);
978 Sooner or later, this will crash on SMP, because a timer can have just
979 gone off before the spin_lock_bh(), and it will only get
980 the lock after we spin_unlock_bh(), and then try to free
981 the element (which has already been freed!).
985 If 0, it means (in this case) that it is currently running, so we can do the following:
992 struct foo *next = list->next;
993 if (!timer_delete(&list->timer)) {
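
A condensed sketch of the loop those fragments belong to, assuming the same
``list_lock`` and ``struct foo`` as the original example::

    retry:
            spin_lock_bh(&list_lock);

            while (list) {
                    struct foo *next = list->next;

                    if (!timer_delete(&list->timer)) {
                            /* The timer is running right now: drop the lock
                             * so it can finish, then start over. */
                            spin_unlock_bh(&list_lock);
                            goto retry;
                    }
                    kfree(list);
                    list = next;
            }
            spin_unlock_bh(&list_lock);

timer_delete_sync() can be used instead of the retry loop: it waits for a
running handler to finish before returning, but must not be called while
holding a lock that the handler itself takes.
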
1025 Concurrency depends on how long the lock is usually held: you should
1026 hold the lock for as long as needed, but no longer. In the cache
1027 example, we always create the object without the lock held, and then
1028 grab the lock only when we are ready to insert it in the list.
1030 Acquisition times depend on how much damage the lock operations do to
1031 the pipeline (pipeline stalls) and how likely it is that this CPU was
1032 the last one to grab the lock (ie. is the lock cache-hot for this CPU):
1033 on a machine with more CPUs, this likelihood drops fast. Consider a
1034 700MHz Intel Pentium III: an instruction takes about 0.7ns, an atomic
1035 increment takes about 58ns, a lock which is cache-hot on this CPU takes
1036 160ns, and a cacheline transfer from another CPU takes an additional 170
1037 to 360ns. (These figures from Paul McKenney's `Linux Journal RCU article`_.)
1041 by splitting locks into parts (such as in our final per-object-lock example).

1049 Read/Write Lock Variants
1050 ------------------------

1064 Avoiding Locks: Read Copy Update
1065 --------------------------------
1068 Using RCU, the readers can avoid taking a lock altogether: as we expect our cache to be read more often than updated (otherwise the cache is a waste of time), it is a candidate for this optimization.
1072 How do we get rid of read locks? Getting rid of read locks means that
1073 writers may be changing the list underneath the readers. That is
1074 actually quite simple: we can read a linked list while an element is being added, as long as the writer adds it carefully.
1078 new->next = list->next;
1079 wmb();
1080 list->next = new;
1088 otherwise: we want a reader to either not see the new element at all, or see the new element with the ``next`` pointer correctly pointing at the rest of the list.
1096 Removing an element from the list is even simpler: we replace the pointer to the old element with a pointer to its successor, and readers will either see it, or skip over it.
1102 list->next = old->next;
1106 does this (the normal version poisons the old object, which we don't want).
1111 don't realize that the pre-fetched contents is wrong when the ``next`` pointer changes underneath them.
1118 Our final dilemma is this: when can we actually destroy the removed
1119 element? Remember, a reader might be stepping through this element in
1120 the list right now: if we free this element and the ``next`` pointer
1121 changes, the reader will jump off into garbage and crash. We need to
1122 wait until we know that all the readers who were traversing the list
1123 when we deleted the element are finished. We use
1124 call_rcu() to register a callback which will actually
1125 destroy the object once all pre-existing readers are finished.
1126 Alternatively, synchronize_rcu() may be used to block
1127 until all pre-existing readers are finished.
1136 readers cannot sleep, we know that any readers which were traversing the list when we deleted the element are finished, and the callback is triggered.
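
A minimal sketch of what the read side and the deferred free look like,
condensed from the patch that follows (it reuses ``cache`` and ``cache_lock``
from the earlier sketches; writers still serialize on ``cache_lock``)::

    #include <linux/rcupdate.h>
    #include <linux/rculist.h>

    struct object {
            struct list_head list;
            struct rcu_head rcu;    /* for the deferred kfree() */
            int id;
            char name[32];
    };

    /* Reader: no lock at all, just an RCU read-side critical section. */
    int cache_find(int id, char *name)
    {
            struct object *obj;
            int ret = -ENOENT;

            rcu_read_lock();
            list_for_each_entry_rcu(obj, &cache, list) {
                    if (obj->id == id) {
                            strcpy(name, obj->name);
                            ret = 0;
                            break;
                    }
            }
            rcu_read_unlock();
            return ret;
    }

    /* Final discard, run once no pre-existing reader can still see it. */
    static void cache_delete_rcu(struct rcu_head *rcu)
    {
            kfree(container_of(rcu, struct object, rcu));
    }

    /* Writer (called with cache_lock held); adding uses list_add_rcu(). */
    static void __cache_delete(struct object *obj)
    {
            list_del_rcu(&obj->list);
            call_rcu(&obj->rcu, cache_delete_rcu);
    }
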
1143 --- cache.c.perobjectlock 2003-12-11 17:15:03.000000000 +1100
1144 +++ cache.c.rcupdate 2003-12-11 17:55:14.000000000 +1100
1145 @@ -1,15 +1,18 @@
1155 - /* These two protected by cache_lock. */
1165 @@ -40,7 +43,7 @@
1169 - list_for_each_entry(i, &cache, list) {
1171 if (i->id == id) {
1172 i->popularity++;
1174 @@ -49,19 +52,25 @@
1178 +/* Final discard done once we know no readers are looking. */
1188 - list_del(&obj->list);
1189 - object_put(obj);
1190 + list_del_rcu(&obj->list);
1191 cache_num--;
1192 + call_rcu(&obj->rcu, cache_delete_rcu);
1198 - list_add(&obj->list, &cache);
1199 + list_add_rcu(&obj->list, &cache);
1203 @@ -104,12 +114,11 @@
1207 - unsigned long flags;
1209 - spin_lock_irqsave(&cache_lock, flags);
1214 - spin_unlock_irqrestore(&cache_lock, flags);
1221 solution would be to make it an ``atomic_t``, but for this usage, we don't really care about races: an approximate result is good enough, so I didn't change it.
1226 synchronization with any other functions, so is almost as fast on SMP as
1227 it would be on UP.
1238 need to actually get and put the reference count: we could expose
1239 __cache_find() by making it non-static, and such callers could simply call that.
1243 object is not altered in any way, which is much faster on SMP machines due to caching.
1246 Per-CPU Data
1247 ------------
1255 machine to test on and can show that it is), you could instead use a counter for each CPU, then none of them need an exclusive lock.
1260 Of particular use for simple per-cpu counters is the ``local_t`` type,
1262 more efficient than simple code on some architectures
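
As a sketch of the idea, using the generic per-CPU API rather than
``local_t`` (the counter is hypothetical)::

    #include <linux/percpu.h>
    #include <linux/cpumask.h>

    static DEFINE_PER_CPU(unsigned long, hit_count);

    void note_hit(void)
    {
            this_cpu_inc(hit_count);        /* each CPU touches only its copy */
    }

    unsigned long total_hits(void)          /* approximate sum, no locking */
    {
            unsigned long sum = 0;
            int cpu;

            for_each_possible_cpu(cpu)
                    sum += per_cpu(hit_count, cpu);
            return sum;
    }
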

1269 Data Which Is Mostly Used By An IRQ Handler
1270 -------------------------------------------
1274 will not run simultaneously on multiple CPUs.
1287 (and waits for it to finish if it's currently running on other CPUs).
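
A minimal sketch, assuming a hypothetical ``DEV_IRQ`` number owned by this
driver::

    #include <linux/interrupt.h>

    void rare_slow_path(void)               /* process context */
    {
            disable_irq(DEV_IRQ);           /* waits for a running handler */
            /* ... touch data normally owned by the irq handler ... */
            enable_irq(DEV_IRQ);
    }
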

1301 Some Functions Which Sleep
1302 --------------------------
1310 - Accesses to userspace:
1312 - copy_from_user()
1314 - copy_to_user()
1316 - get_user()
1318 - put_user()
1320 - kmalloc(GFP_KERNEL)
1322 - mutex_lock_interruptible() and mutex_lock()

1331 Some Functions Which Don't Sleep
1332 --------------------------------
1337 - printk()
1339 - kfree()
1341 - add_timer() and timer_delete()
1346 .. kernel-doc:: include/linux/mutex.h
1349 .. kernel-doc:: kernel/locking/mutex.c
1355 .. kernel-doc:: kernel/futex/core.c
1358 .. kernel-doc:: kernel/futex/futex.h
1361 .. kernel-doc:: kernel/futex/pi.c
1364 .. kernel-doc:: kernel/futex/requeue.c
1367 .. kernel-doc:: kernel/futex/waitwake.c
1373 - ``Documentation/locking/spinlocks.rst``: Linus Torvalds' spinlocking tutorial in the kernel sources.
1376 - Unix Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers, by Curt Schimmel: a very good introduction to kernel-level locking (not written for Linux, but nearly everything applies).
1394 Thanks to the cabal for having no influence on this document.
1405 preemption, even on UP.
1410 blocks any software interrupt on the current CPU. Bottom halves are deprecated, and will eventually be replaced by tasklets. Only one bottom half will be running at any time.
1423 Symmetric Multi-Processor: kernels compiled for multiple-CPU machines.
1432 interrupts which can run on multiple CPUs at once. Sometimes used to refer to tasklets as well (ie. all software interrupts).
1436 A dynamically-registrable software interrupt, which is guaranteed to
1437 only run on one CPU at a time.
1440 A dynamically-registrable software interrupt, which is run at (or close to) a given time. When running, it is just like a tasklet (in fact, they are called from the ``TIMER_SOFTIRQ``).
1445 Uni-Processor: Non-SMP. (``CONFIG_SMP=n``).
1448 The kernel executing on behalf of a particular process (ie. a system call or trap) or kernel thread.