[section:collectives Collective operations]

[link mpi.tutorial.point_to_point Point-to-point operations] are the
core message passing primitives in Boost.MPI. However, many
message-passing applications also require higher-level communication
algorithms that combine or summarize the data stored on many different
processes. These algorithms support many common tasks such as
"broadcast this value to all processes", "compute the sum of the
values on all processes", or "find the global minimum."

[section:broadcast Broadcast]
The [funcref boost::mpi::broadcast `broadcast`] algorithm is
by far the simplest collective operation. It broadcasts a value from a
single process to all other processes within a [classref
boost::mpi::communicator communicator]. For instance, the
following program broadcasts "Hello, World!" from process 0 to every
other process. (`hello_world_broadcast.cpp`)

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <string>
  #include <boost/serialization/string.hpp>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::string value;
    if (world.rank() == 0) {
      value = "Hello, World!";
    }

    broadcast(world, value, 0);

    std::cout << "Process #" << world.rank() << " says " << value
              << std::endl;
    return 0;
  }

Running this program with seven processes will produce a result such
as:

[pre
Process #0 says Hello, World!
Process #2 says Hello, World!
Process #1 says Hello, World!
Process #4 says Hello, World!
Process #3 says Hello, World!
Process #5 says Hello, World!
Process #6 says Hello, World!
]
[endsect:broadcast]

[section:gather Gather]
The [funcref boost::mpi::gather `gather`] collective gathers
the values produced by every process in a communicator into a vector
of values on the "root" process (specified by an argument to
`gather`). The /i/th element in the vector will correspond to the
value gathered from the /i/th process. For instance, in the following
program, each process computes its own random number. All of these
random numbers are gathered at process 0 (the "root" in this case),
which prints out the values that correspond to each process.
(`random_gather.cpp`)

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <vector>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::srand(std::time(0) + world.rank());
    int my_number = std::rand();
    if (world.rank() == 0) {
      std::vector<int> all_numbers;
      gather(world, my_number, all_numbers, 0);
      for (int proc = 0; proc < world.size(); ++proc)
        std::cout << "Process #" << proc << " thought of "
                  << all_numbers[proc] << std::endl;
    } else {
      gather(world, my_number, 0);
    }

    return 0;
  }

Executing this program with seven processes will result in output such
as the following. Although the random values will change from one run
to the next, the order of the processes in the output will remain the
same because only process 0 writes to `std::cout`.

[pre
Process #0 thought of 332199874
Process #1 thought of 20145617
Process #2 thought of 1862420122
Process #3 thought of 480422940
Process #4 thought of 1253380219
Process #5 thought of 949458815
Process #6 thought of 650073868
]

The `gather` operation collects values from every process into a
vector at one process. If instead the values from every process need
to be collected into identical vectors on every process, use the
[funcref boost::mpi::all_gather `all_gather`] algorithm,
which is semantically equivalent to calling `gather` followed by a
`broadcast` of the resulting vector.

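Below is a minimal sketch of `all_gather`, reusing the random-number
setup from `random_gather.cpp`; it is an illustration rather than one
of the distributed examples, and the choice of which rank prints is
arbitrary, since every process ends up with the same vector:

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <vector>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::srand(std::time(0) + world.rank());
    int my_number = std::rand();

    // No root argument: every process receives the complete vector,
    // with element i holding the value contributed by rank i.
    std::vector<int> all_numbers;
    mpi::all_gather(world, my_number, all_numbers);

    if (world.rank() == world.size() - 1)
      for (int proc = 0; proc < world.size(); ++proc)
        std::cout << "Process #" << proc << " thought of "
                  << all_numbers[proc] << std::endl;
    return 0;
  }
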
[endsect:gather]

[section:scatter Scatter]
The [funcref boost::mpi::scatter `scatter`] collective scatters the
values from a vector on the "root" process of a communicator into
values in all the processes of the communicator. The /i/th element of
the vector will correspond to the value received by the /i/th
process. For instance, in the following program, the root process
produces a vector of random numbers and sends one value to each
process, which prints it. (`random_scatter.cpp`)

  #include <boost/mpi.hpp>
  #include <boost/mpi/collectives.hpp>
  #include <iostream>
  #include <cstdlib>
  #include <ctime>
  #include <algorithm>
  #include <vector>

  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(std::time(0) + world.rank());
    std::vector<int> all;
    int mine = -1;
    if (world.rank() == 0) {
      all.resize(world.size());
      std::generate(all.begin(), all.end(), std::rand);
    }
    mpi::scatter(world, all, mine, 0);
    for (int r = 0; r < world.size(); ++r) {
      // Barrier between iterations so ranks print one at a time.
      world.barrier();
      if (r == world.rank()) {
        std::cout << "Rank " << r << " got " << mine << '\n';
      }
    }
    return 0;
  }

Executing this program with seven processes will result in output such
as the following. Although the random values will change from one run
to the next, the order of the processes in the output should remain
the same because of the barrier between iterations.

[pre
Rank 0 got 1409381269
Rank 1 got 17045268
Rank 2 got 440120016
Rank 3 got 936998224
Rank 4 got 1827129182
Rank 5 got 1951746047
Rank 6 got 2117359639
]

[endsect:scatter]

[section:reduce Reduce]

The [funcref boost::mpi::reduce `reduce`] collective
summarizes the values from each process into a single value at the
user-specified "root" process. The Boost.MPI `reduce` operation is
similar in spirit to the STL _accumulate_ operation, because it takes
a sequence of values (one per process) and combines them via a
function object. For instance, we can randomly generate values in each
process and then compute the minimum value over all processes via a
call to [funcref boost::mpi::reduce `reduce`]
(`random_min.cpp`):

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::srand(std::time(0) + world.rank());
    int my_number = std::rand();

    if (world.rank() == 0) {
      int minimum;
      reduce(world, my_number, minimum, mpi::minimum<int>(), 0);
      std::cout << "The minimum value is " << minimum << std::endl;
    } else {
      reduce(world, my_number, mpi::minimum<int>(), 0);
    }

    return 0;
  }

The use of `mpi::minimum<int>` indicates that the minimum value
should be computed. `mpi::minimum<int>` is a binary function object
that compares its two parameters via `<` and returns the smaller
value. Any associative binary function or function object will
work, provided it is stateless.
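
In this context, a binary function object is just a stateless type
whose call operator combines two values; the following sketch shows
roughly how `mpi::minimum<T>` could be written (a simplified
illustration, not the library's exact definition):

  template<typename T>
  struct minimum
  {
    // Return the smaller of the two values, comparing with <.
    const T& operator()(const T& x, const T& y) const
    {
      return x < y ? x : y;
    }
  };

For instance, to concatenate strings with `reduce`, one could use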
the function object `std::plus<std::string>` (`string_cat.cpp`):

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <string>
  #include <functional>
  #include <boost/serialization/string.hpp>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::string names[10] = { "zero ", "one ", "two ", "three ",
                              "four ", "five ", "six ", "seven ",
                              "eight ", "nine " };

    std::string result;
    reduce(world,
           world.rank() < 10 ? names[world.rank()]
                             : std::string("many "),
           result, std::plus<std::string>(), 0);

    if (world.rank() == 0)
      std::cout << "The result is " << result << std::endl;

    return 0;
  }

In this example, we compute a string for each process and then perform
a reduction that concatenates all of the strings together into one
long string. Executing this program with seven processes yields the
following output:

[pre
The result is zero one two three four five six
]

[h4 Binary operations for reduce]
Any kind of binary function object can be used with `reduce`: there
are many such function objects in the C++ standard `<functional>`
header and in the Boost.MPI header `<boost/mpi/operations.hpp>`, or
you can create your own. Function objects used with `reduce` must be
associative, i.e. `f(x, f(y, z))` must be equivalent to `f(f(x, y),
z)`. If they are also commutative (i.e., `f(x, y) == f(y, x)`),
Boost.MPI can use a more efficient implementation of `reduce`. To
state that a function object is commutative, specialize the class
[classref boost::mpi::is_commutative `is_commutative`]. For instance,
we could modify the previous example by telling Boost.MPI that string
concatenation is commutative:

  namespace boost { namespace mpi {

    template<>
    struct is_commutative<std::plus<std::string>, std::string>
      : mpl::true_ { };

  } } // end namespace boost::mpi

By adding this code prior to `main()`, Boost.MPI will assume that
string concatenation is commutative and employ a different parallel
algorithm for the `reduce` operation. Using this algorithm, the
program outputs the following when run with seven processes:

[pre
The result is zero one four five six two three
]

Note how the numbers in the resulting string are in a different order:
this is a direct result of Boost.MPI reordering operations. The result
in this case differed from the non-commutative result because string
concatenation is not commutative: `f("x", "y")` is not the same as
`f("y", "x")`, because argument order matters. For truly commutative
operations (e.g., integer addition), the more efficient commutative
algorithm will produce the same result as the non-commutative
algorithm. Boost.MPI also performs direct mappings from function
objects in `<functional>` to `MPI_Op` values predefined by MPI (e.g.,
`MPI_SUM`, `MPI_MAX`); if you have your own function objects that can
take advantage of this mapping, see the class template [classref
boost::mpi::is_mpi_op `is_mpi_op`].

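For illustration, the following sketch shows how such a mapping might
be declared for a hypothetical user-defined function object `my_max`,
mirroring the pattern of the library's built-in specializations (see
`<boost/mpi/operations.hpp>` for the exact form):

  // Hypothetical binary function object that returns the larger value.
  struct my_max
  {
    int operator()(int x, int y) const { return x < y ? y : x; }
  };

  namespace boost { namespace mpi {

    // State that my_max applied to int maps to the predefined MPI_MAX,
    // so reduce() can hand the work directly to the native MPI operation.
    template<>
    struct is_mpi_op<my_max, int>
      : mpl::true_
    {
      static MPI_Op op() { return MPI_MAX; }
    };

  } } // end namespace boost::mpi
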
[warning Due to limitations of the underlying MPI interface, the operation used with `reduce` must be stateless.]

[h4 All process variant]

Like [link mpi.tutorial.collectives.gather `gather`], `reduce` has an "all"
variant called [funcref boost::mpi::all_reduce `all_reduce`]
that performs the reduction operation and broadcasts the result to all
processes. This variant is useful, for instance, in establishing
global minimum or maximum values.

The following code (`global_min.cpp`) shows a broadcasting version of
the `random_min.cpp` example:

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(world.rank());
    int my_number = std::rand();
    int minimum;

    mpi::all_reduce(world, my_number, minimum, mpi::minimum<int>());

    if (world.rank() == 0) {
      std::cout << "The minimum value is " << minimum << std::endl;
    }

    return 0;
  }

In that example we provide both input and output values, requiring
twice as much space, which can be a problem depending on the size
of the transmitted data.
If there is no need to preserve the input value, the output value
can be omitted. In that case the input value will be overwritten with
the output value, and Boost.MPI is able, in some situations, to implement
the operation with a more space-efficient solution (using the `MPI_IN_PLACE`
flag of the MPI C mapping), as in the following example (`in_place_global_min.cpp`):

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(world.rank());
    int my_number = std::rand();

    // With no separate output value, my_number is overwritten in place.
    mpi::all_reduce(world, my_number, mpi::minimum<int>());

    if (world.rank() == 0) {
      std::cout << "The minimum value is " << my_number << std::endl;
    }

    return 0;
  }

[endsect:reduce]

[endsect:collectives]