[section:collectives Collective operations]

[link mpi.tutorial.point_to_point Point-to-point operations] are the
core message passing primitives in Boost.MPI. However, many
message-passing applications also require higher-level communication
algorithms that combine or summarize the data stored on many different
processes. These algorithms support many common tasks such as
"broadcast this value to all processes", "compute the sum of the
values on all processes", or "find the global minimum."

[section:broadcast Broadcast]
The [funcref boost::mpi::broadcast `broadcast`] algorithm is
by far the simplest collective operation. It broadcasts a value from a
single process to all other processes within a [classref
boost::mpi::communicator communicator]. For instance, the
following program broadcasts "Hello, World!" from process 0 to every
other process. (`hello_world_broadcast.cpp`)

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <string>
  #include <boost/serialization/string.hpp>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::string value;
    if (world.rank() == 0) {
      value = "Hello, World!";
    }

    broadcast(world, value, 0);

    std::cout << "Process #" << world.rank() << " says " << value
              << std::endl;
    return 0;
  }

Running this program with seven processes will produce a result such
as:

[pre
Process #0 says Hello, World!
Process #2 says Hello, World!
Process #1 says Hello, World!
Process #4 says Hello, World!
Process #3 says Hello, World!
Process #5 says Hello, World!
Process #6 says Hello, World!
]
[endsect:broadcast]

[section:gather Gather]
The [funcref boost::mpi::gather `gather`] collective gathers
the values produced by every process in a communicator into a vector
of values on the "root" process (specified by an argument to
`gather`). The /i/th element of the vector corresponds to the
value gathered from the /i/th process. For instance, in the following
program each process computes its own random number. All of these
random numbers are gathered at process 0 (the "root" in this case),
which prints out the value contributed by each process.
(`random_gather.cpp`)

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <vector>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::srand(std::time(0) + world.rank());
    int my_number = std::rand();
    if (world.rank() == 0) {
      std::vector<int> all_numbers;
      gather(world, my_number, all_numbers, 0);
      for (int proc = 0; proc < world.size(); ++proc)
        std::cout << "Process #" << proc << " thought of "
                  << all_numbers[proc] << std::endl;
    } else {
      gather(world, my_number, 0);
    }

    return 0;
  }

Executing this program with seven processes will result in output such
as the following. Although the random values will change from one run
to the next, the order of the processes in the output will remain the
same because only process 0 writes to `std::cout`.

[pre
Process #0 thought of 332199874
Process #1 thought of 20145617
Process #2 thought of 1862420122
Process #3 thought of 480422940
Process #4 thought of 1253380219
Process #5 thought of 949458815
Process #6 thought of 650073868
]

The `gather` operation collects values from every process into a
vector at one process. If instead the values from every process need
to be collected into identical vectors on every process, use the
[funcref boost::mpi::all_gather `all_gather`] algorithm,
which is semantically equivalent to calling `gather` followed by a
`broadcast` of the resulting vector.

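As a sketch of how `all_gather` might be used (a hypothetical
adaptation of `random_gather.cpp`, not one of the example programs
shipped with this tutorial), the program below assumes the
`all_gather` overload that fills a `std::vector`; since every process
receives the same vector, any rank can print it.

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <vector>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::srand(std::time(0) + world.rank());
    int my_number = std::rand();

    // Unlike gather, every process both contributes a value and
    // receives the complete vector of gathered values.
    std::vector<int> all_numbers;
    all_gather(world, my_number, all_numbers);

    // Every rank now holds identical data; let the last rank print it.
    if (world.rank() == world.size() - 1)
      for (int proc = 0; proc < world.size(); ++proc)
        std::cout << "Process #" << proc << " thought of "
                  << all_numbers[proc] << std::endl;

    return 0;
  }

The call is semantically equivalent to the `gather`-then-`broadcast`
sequence described above.
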
[endsect:gather]

[section:scatter Scatter]
The [funcref boost::mpi::scatter `scatter`] collective scatters
the values of a vector held by the "root" process across all of the
processes in the communicator. The /i/th element of the vector
corresponds to the value received by the /i/th process. For instance,
in the following program the root process produces a vector of random
numbers and sends one value to each process, which then prints it.
(`random_scatter.cpp`)

  #include <boost/mpi.hpp>
  #include <boost/mpi/collectives.hpp>
  #include <iostream>
  #include <algorithm>
  #include <cstdlib>
  #include <ctime>
  #include <vector>

  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(std::time(0) + world.rank());
    std::vector<int> all;
    int mine = -1;
    if (world.rank() == 0) {
      all.resize(world.size());
      std::generate(all.begin(), all.end(), std::rand);
    }
    mpi::scatter(world, all, mine, 0);
    for (int r = 0; r < world.size(); ++r) {
      world.barrier();
      if (r == world.rank()) {
        std::cout << "Rank " << r << " got " << mine << '\n';
      }
    }
    return 0;
  }

Executing this program with seven processes will result in output such
as the following. Although the random values will change from one run
to the next, the order of the processes in the output will remain the
same because of the barrier.

[pre
Rank 0 got 1409381269
Rank 1 got 17045268
Rank 2 got 440120016
Rank 3 got 936998224
Rank 4 got 1827129182
Rank 5 got 1951746047
Rank 6 got 2117359639
]

[endsect:scatter]

[section:reduce Reduce]

The [funcref boost::mpi::reduce `reduce`] collective
summarizes the values from each process into a single value at the
user-specified "root" process. The Boost.MPI `reduce` operation is
similar in spirit to the STL _accumulate_ operation, because it takes
a sequence of values (one per process) and combines them via a
function object. For instance, we can randomly generate a value in each
process and then compute the minimum value over all processes via a
call to [funcref boost::mpi::reduce `reduce`]
(`random_min.cpp`):

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::srand(std::time(0) + world.rank());
    int my_number = std::rand();

    if (world.rank() == 0) {
      int minimum;
      reduce(world, my_number, minimum, mpi::minimum<int>(), 0);
      std::cout << "The minimum value is " << minimum << std::endl;
    } else {
      reduce(world, my_number, mpi::minimum<int>(), 0);
    }

    return 0;
  }

The use of `mpi::minimum<int>` indicates that the minimum value
should be computed. `mpi::minimum<int>` is a binary function object
that compares its two parameters via `<` and returns the smaller
value. Any associative binary function or function object will work,
provided it is stateless.

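As another illustration (a sketch only, not one of the tutorial's
example programs), the common "global sum" can be expressed by passing
`std::plus<int>` from the standard `<functional>` header:

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <functional>
  #include <cstdlib>
  #include <ctime>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    // Keep the values small so that the sum cannot overflow an int.
    std::srand(std::time(0) + world.rank());
    int my_number = std::rand() % 100;

    if (world.rank() == 0) {
      int sum;
      reduce(world, my_number, sum, std::plus<int>(), 0);
      std::cout << "The sum is " << sum << std::endl;
    } else {
      reduce(world, my_number, std::plus<int>(), 0);
    }

    return 0;
  }
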
For instance, to concatenate strings with `reduce`, one can use
the function object `std::plus<std::string>` (`string_cat.cpp`):

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <string>
  #include <functional>
  #include <boost/serialization/string.hpp>
  namespace mpi = boost::mpi;

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    std::string names[10] = { "zero ", "one ", "two ", "three ",
                              "four ", "five ", "six ", "seven ",
                              "eight ", "nine " };

    std::string result;
    reduce(world,
           world.rank() < 10? names[world.rank()]
                            : std::string("many "),
           result, std::plus<std::string>(), 0);

    if (world.rank() == 0)
      std::cout << "The result is " << result << std::endl;

    return 0;
  }

In this example, we compute a string for each process and then perform
a reduction that concatenates all of the strings together into one
long string. Executing this program with seven processes yields the
following output:

[pre
The result is zero one two three four five six 
]

[h4 Binary operations for reduce]
Any kind of binary function object can be used with `reduce`. There
are many such function objects in the C++ standard `<functional>`
header and in the Boost.MPI header `<boost/mpi/operations.hpp>`, or
you can create your own. Function objects used with `reduce` must be
associative, i.e. `f(x, f(y, z))` must be equivalent to `f(f(x, y),
z)`. If they are also commutative (i.e., `f(x, y) == f(y, x)`),
Boost.MPI can use a more efficient implementation of `reduce`. To
state that a function object is commutative, you will need to
specialize the class [classref boost::mpi::is_commutative
`is_commutative`]. For instance, we could modify the previous example
by telling Boost.MPI that string concatenation is commutative:

  namespace boost { namespace mpi {

    template<>
    struct is_commutative<std::plus<std::string>, std::string>
      : mpl::true_ { };

  } } // end namespace boost::mpi

By adding this code prior to `main()`, Boost.MPI will assume that
string concatenation is commutative and employ a different parallel
algorithm for the `reduce` operation. Using this algorithm, the
program outputs the following when run with seven processes:

[pre
The result is zero one four five six two three 
]

Note how the numbers in the resulting string are in a different order:
this is a direct result of Boost.MPI reordering the operations. Because
string concatenation is not actually commutative (`f("x", "y")` is not
the same as `f("y", "x")`), reordering changes the result. For truly
commutative operations (e.g., integer addition), the more efficient
commutative algorithm will produce the same result as the
non-commutative algorithm. Boost.MPI also performs direct mappings from
function objects in `<functional>` to `MPI_Op` values predefined by MPI
(e.g., `MPI_SUM`, `MPI_MAX`); if you have your own function objects
that can take advantage of this mapping, see the class template
[classref boost::mpi::is_mpi_op `is_mpi_op`].

[warning Due to underlying MPI limitations, the operation supplied to
`reduce` must be stateless.]

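Putting these pieces together, the following sketch (a hypothetical
program, not part of the library's examples) defines its own stateless
binary function object, uses it with `reduce` to compute a global
greatest common divisor, and marks the operation as commutative via an
`is_commutative` specialization:

  #include <boost/mpi.hpp>
  #include <iostream>
  namespace mpi = boost::mpi;

  // A user-defined, stateless binary function object: the greatest
  // common divisor of its two arguments. GCD is associative and
  // commutative.
  struct gcd_op {
    int operator()(int x, int y) const {
      while (y != 0) { int t = x % y; x = y; y = t; }
      return x;
    }
  };

  namespace boost { namespace mpi {

    // Declaring gcd_op commutative allows Boost.MPI to use the more
    // efficient reduction algorithm.
    template<>
    struct is_commutative<gcd_op, int> : mpl::true_ { };

  } } // end namespace boost::mpi

  int main()
  {
    mpi::environment env;
    mpi::communicator world;

    // Every process contributes a multiple of 6, so the global GCD is 6.
    int my_value = 6 * (world.rank() + 1);

    if (world.rank() == 0) {
      int result;
      reduce(world, my_value, result, gcd_op(), 0);
      std::cout << "The GCD is " << result << std::endl;
    } else {
      reduce(world, my_value, gcd_op(), 0);
    }

    return 0;
  }

Because GCD really is commutative, the reordering permitted by the
`is_commutative` specialization cannot change the result.
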
[h4 All process variant]

Like [link mpi.tutorial.collectives.gather `gather`], `reduce` has an "all"
variant called [funcref boost::mpi::all_reduce `all_reduce`]
that performs the reduction operation and broadcasts the result to all
processes. This variant is useful, for instance, in establishing
global minimum or maximum values.

The following code (`global_min.cpp`) shows a broadcasting version of
the `random_min.cpp` example:

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(world.rank());
    int my_number = std::rand();
    int minimum;

    mpi::all_reduce(world, my_number, minimum, mpi::minimum<int>());

    if (world.rank() == 0) {
      std::cout << "The minimum value is " << minimum << std::endl;
    }

    return 0;
  }

In that example we provide both input and output values, which requires
twice as much storage; this can be a problem depending on the size of
the transmitted data. If there is no need to preserve the input value,
the output value can be omitted. In that case the input value is
overwritten with the output value, and Boost.MPI is able, in some
situations, to implement the operation in a more space-efficient way
(using the `MPI_IN_PLACE` flag of the MPI C mapping), as in the
following example (`in_place_global_min.cpp`):

  #include <boost/mpi.hpp>
  #include <iostream>
  #include <cstdlib>
  namespace mpi = boost::mpi;

  int main(int argc, char* argv[])
  {
    mpi::environment env(argc, argv);
    mpi::communicator world;

    std::srand(world.rank());
    int my_number = std::rand();

    mpi::all_reduce(world, my_number, mpi::minimum<int>());

    if (world.rank() == 0) {
      std::cout << "The minimum value is " << my_number << std::endl;
    }

    return 0;
  }

[endsect:reduce]

[endsect:collectives]