:orphan:

.. _multiprocessing-doc:

Multiprocessing package - torch.multiprocessing
===============================================

.. automodule:: torch.multiprocessing
.. currentmodule:: torch.multiprocessing

.. warning::

    If the main process exits abruptly (e.g. because of an incoming signal),
    Python's ``multiprocessing`` sometimes fails to clean up its children.
    It's a known caveat, so if you're seeing any resource leaks after
    interrupting the interpreter, it probably means that this has just happened
    to you.

Strategy management
-------------------

.. autofunction:: get_all_sharing_strategies
.. autofunction:: get_sharing_strategy
.. autofunction:: set_sharing_strategy

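For example, you can inspect the available strategies and select a different
one as follows (a brief sketch).

::

    import torch.multiprocessing as mp

    # Strategies supported on this platform,
    # e.g. {'file_descriptor', 'file_system'}.
    print(mp.get_all_sharing_strategies())
    # The strategy currently in use.
    print(mp.get_sharing_strategy())

    # Switch the strategy used for sharing CPU tensors.
    mp.set_sharing_strategy('file_system')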

.. _multiprocessing-cuda-sharing-details:

Sharing CUDA tensors
--------------------

Sharing CUDA tensors between processes is supported only in Python 3, using
the ``spawn`` or ``forkserver`` start methods.


Unlike CPU tensors, the sending process is required to keep the original tensor
as long as the receiving process retains a copy of it. The refcounting is
implemented under the hood, but it requires users to follow the best practices
below.

.. warning::
    If the consumer process dies abnormally due to a fatal signal, the shared
    tensor could be kept in memory forever as long as the sending process is
    running.


1. Release memory ASAP in the consumer.

::

    ## Good
    x = queue.get()
    # do something with x
    del x

::

    ## Bad
    x = queue.get()
    # do something with x
    # do everything else (the producer has to keep x in memory)

2. Keep the producer process running until all consumers exit. This will
prevent the situation where the producer process releases memory which is
still in use by a consumer.

::

    ## producer
    # send tensors, do something
    event.wait()


::

    ## consumer
    # receive tensors and use them
    event.set()

3. Don't pass received tensors.

::

    # not going to work
    x = queue.get()
    queue_2.put(x)


::

    # you need to create a process-local copy
    x = queue.get()
    x_clone = x.clone()
    queue_2.put(x_clone)


::

    # putting and getting from the same queue in the same process
    # will likely end up with a segfault
    queue.put(tensor)
    x = queue.get()


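Putting these practices together, a minimal producer/consumer sketch might look
like the following (it assumes a CUDA-capable machine; the tensor shape and the
use of an event for shutdown are illustrative).

::

    import torch
    import torch.multiprocessing as mp

    def consumer(queue, event):
        # Receive the shared CUDA tensor and release it as soon as possible.
        x = queue.get()
        print(x.sum().item())
        del x
        # Tell the producer it is now safe to exit.
        event.set()

    if __name__ == '__main__':
        # CUDA tensors require the spawn (or forkserver) start method.
        ctx = mp.get_context('spawn')
        queue = ctx.Queue()
        event = ctx.Event()
        p = ctx.Process(target=consumer, args=(queue, event))
        p.start()

        x = torch.ones(4, device='cuda')
        queue.put(x)
        # Keep x alive and this process running until the consumer is done.
        event.wait()
        p.join()
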
Sharing strategies
------------------

This section provides a brief overview of how the different sharing strategies
work. Note that it applies only to CPU tensors - CUDA tensors will always use
the CUDA API, as that's the only way they can be shared.

File descriptor - ``file_descriptor``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


.. note::

    This is the default strategy (except for macOS, where it's not supported).

This strategy will use file descriptors as shared memory handles. Whenever a
storage is moved to shared memory, a file descriptor obtained from ``shm_open``
is cached with the object, and when it's going to be sent to other processes,
the file descriptor will be transferred (e.g. via UNIX sockets) to them. The
receiver will also cache the file descriptor and ``mmap`` it, to obtain a
shared view onto the storage data.

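For illustration, a CPU tensor's storage can be moved to shared memory
explicitly (a brief sketch; tensors sent through multiprocessing queues are
typically moved to shared memory automatically).

::

    import torch

    t = torch.zeros(10)
    # Move the underlying storage to shared memory; under the file_descriptor
    # strategy this caches a descriptor obtained from shm_open.
    t.share_memory_()
    print(t.is_shared())  # True
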
Note that if a lot of tensors are shared, this strategy will keep a large
number of file descriptors open most of the time. If your system has low
limits for the number of open file descriptors, and you can't raise them, you
should use the ``file_system`` strategy.

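If you need to decide at runtime, a sketch along these lines could check the
soft limit and fall back (the threshold of 4096 is arbitrary and chosen only
for illustration).

::

    import resource
    import torch.multiprocessing as mp

    # Soft limit on open file descriptors for this process (Unix only).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < 4096:
        mp.set_sharing_strategy('file_system')
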
File system - ``file_system``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This strategy will use file names given to ``shm_open`` to identify the shared
memory regions. This has the benefit of not requiring the implementation to
cache the file descriptors obtained from it, but at the same time it is prone
to shared memory leaks. The file can't be deleted right after its creation,
because other processes need to access it to open their views. If the
processes fatally crash, or are killed, and don't call the storage destructors,
the files will remain in the system. This is very serious, because they keep
using up the memory until the system is restarted, or they're freed manually.

To counter the problem of shared memory file leaks, :mod:`torch.multiprocessing`
will spawn a daemon named ``torch_shm_manager`` that will isolate itself from
the current process group, and will keep track of all shared memory allocations.
Once all processes connected to it exit, it will wait a moment to ensure there
will be no new connections, and will iterate over all shared memory files
allocated by the group. If it finds that any of them still exist, they will be
deallocated. We've tested this method and it proved to be robust to various
failures. Still, if your system has high enough limits, and ``file_descriptor``
is a supported strategy, we do not recommend switching to this one.

Spawning subprocesses
---------------------

.. note::

   Available for Python >= 3.4.

   This depends on the ``spawn`` start method in Python's
   ``multiprocessing`` package.

Spawning a number of subprocesses to perform some function can be done
by creating ``Process`` instances and calling ``join`` to wait for
their completion. This approach works fine when dealing with a single
subprocess but presents potential issues when dealing with multiple
processes.

Namely, joining processes sequentially implies that they are waited on in
order, so if the first process does not terminate, failures in the remaining
processes go unnoticed. Also, there are no native facilities for error
propagation.

The ``spawn`` function below addresses these concerns: it takes care of error
propagation and out-of-order termination, and will actively terminate the
remaining processes upon detecting an error in one of them.

.. automodule:: torch.multiprocessing.spawn
.. currentmodule:: torch.multiprocessing.spawn

.. autofunction:: spawn

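A brief usage sketch (the worker function and its extra argument are
illustrative; ``spawn`` passes the process index as the first argument).

::

    import torch.multiprocessing as mp

    def worker(rank, total):
        # `rank` is supplied by spawn, `total` comes from `args`.
        print('process {} of {}'.format(rank, total))

    if __name__ == '__main__':
        # Start 4 processes, propagate errors, and wait for all to finish.
        mp.spawn(worker, args=(4,), nprocs=4, join=True)
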
.. currentmodule:: torch.multiprocessing


.. class:: SpawnContext

   Returned by :func:`~spawn` when called with ``join=False``.

   .. automethod:: join

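For example, a sketch of deferred joining (the worker is illustrative).

::

    import torch.multiprocessing as mp

    def worker(rank):
        print('worker {} running'.format(rank))

    if __name__ == '__main__':
        # With join=False, spawn returns a SpawnContext instead of blocking.
        context = mp.spawn(worker, nprocs=2, join=False)
        # ... do other work in the parent here ...
        # join() returns True once all spawned processes have exited.
        context.join()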

.. This module needs to be documented. Adding here in the meantime
.. for tracking purposes
.. py:module:: torch.multiprocessing.pool
.. py:module:: torch.multiprocessing.queue
.. py:module:: torch.multiprocessing.reductions