[/============================================================================
  Boost.odeint

  Copyright 2013 Karsten Ahnert
  Copyright 2013 Pascal Germroth
  Copyright 2013 Mario Mulansky

  Use, modification and distribution is subject to the Boost Software License,
  Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt)
=============================================================================/]


[section Parallel computation with OpenMP and MPI]

Parallelization is a key feature of modern numerical libraries because many
cores are now available nearly everywhere, even on laptops.
odeint currently supports parallelization with OpenMP and MPI, as described in
the following sections.
However, it should be made clear from the beginning that the difficulty of
efficiently distributing ODE integration over many cores/machines lies in the
parallelization of the system function, which remains the user's
responsibility.
Simply using a parallel odeint backend without parallelizing the system function
will bring you almost no performance gains.

[section OpenMP]

[import ../examples/openmp/phase_chain.cpp]

odeint's OpenMP support is implemented as an external backend, which needs to be
included manually. Depending on the compiler, some additional flags may be
needed, e.g. [^-fopenmp] for GCC.
[phase_chain_openmp_header]

In the simplest parallelization approach with OpenMP we use a standard `vector`
as the state type:
[phase_chain_vector_state]

We initialize the state with some random data:
[phase_chain_init]

Now we have to configure the stepper to use the OpenMP backend.
This is done by explicitly providing the `openmp_range_algebra` as a template
parameter to the stepper.
This algebra requires the state type to be a model of Random Access Range and
will be used from multiple threads.
[phase_chain_stepper]

In addition to providing the stepper with OpenMP parallelization, we also need
a parallelized system function to exploit the available cores.
Here this is shown for a simple one-dimensional chain of phase oscillators with
nearest-neighbour coupling:
[phase_chain_rhs]

[note In the OpenMP backends the system function will always be called
sequentially from the thread used to start the integration.]

Finally, we perform the integration by using one of the integrate functions from
odeint.
As you can see, the parallelization is completely hidden in the stepper and the
system function.
OpenMP will take care of distributing the work among the threads and join them
automatically.
[phase_chain_integrate]

After integrating, the data can be accessed immediately and processed
further.
Note that you can specify the OpenMP scheduling by calling `omp_set_schedule`
at the beginning of your program:
[phase_chain_scheduling]
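For orientation, the listing below condenses the steps above into one
self-contained program. It is a sketch rather than the example file itself; the
chain length, coupling function, initial condition, step size, and number of
steps are arbitrary illustrative choices.

```
// Sketch of the approach above: std::vector state, openmp_range_algebra
// stepper, and a system function that parallelizes its own loop with OpenMP.
// All sizes and step parameters are arbitrary illustrative values.
#include <cmath>
#include <vector>
#include <boost/numeric/odeint.hpp>
#include <boost/numeric/odeint/external/openmp/openmp.hpp>

namespace odeint = boost::numeric::odeint;
typedef std::vector<double> state_type;

// One-dimensional chain of phase oscillators with nearest-neighbour coupling;
// the bulk of the chain is distributed over the OpenMP threads.
void phase_chain(const state_type &x, state_type &dxdt, double /*t*/)
{
    const int N = static_cast<int>(x.size());
    #pragma omp parallel for schedule(runtime)
    for (int i = 1; i < N - 1; ++i)
        dxdt[i] = std::sin(x[i + 1] - x[i]) + std::sin(x[i - 1] - x[i]);
    dxdt[0]     = std::sin(x[1] - x[0]);
    dxdt[N - 1] = std::sin(x[N - 2] - x[N - 1]);
}

int main()
{
    state_type x(16384, 0.5);  // initial condition (random data in the example)

    // Selecting openmp_range_algebra makes the stepper's internal loops run
    // in parallel over the Random Access Range state.
    odeint::runge_kutta4<
        state_type, double, state_type, double,
        odeint::openmp_range_algebra
    > stepper;

    odeint::integrate_n_steps(stepper, phase_chain, x, 0.0, 0.01, 1000);
}
```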
See [github_link examples/openmp/phase_chain.cpp
openmp/phase_chain.cpp] for the complete example.

[heading Split state]

[import ../examples/openmp/phase_chain_omp_state.cpp]

For advanced cases odeint offers another approach to using OpenMP that allows
more precise control of the parallelization.
For example, for odd-sized data where OpenMP's thread boundaries don't match
cache lines and hurt performance, it might be advisable to copy the data from
the contiguous `vector<T>` into separate, individually aligned, vectors.
For this, odeint provides the `openmp_state<T>` type, essentially an alias for
`vector<vector<T>>`.

Here, the initialization is done with a `vector<double>`, but then we use
odeint's `split` function to fill an `openmp_state`.
The splitting is done such that the sizes of the individual regions differ by
at most 1, to make the computation as uniform as possible.
[phase_chain_state_init]

Of course, the system function has to be changed to deal with the
`openmp_state`.
Note that each sub-region of the state is computed in a single task, but at the
borders read access to the neighbouring regions is required.
[phase_chain_state_rhs]

Using the `openmp_state<T>` state type automatically selects `openmp_algebra`,
which executes odeint's internal computations on parallel regions.
Hence, no manual configuration of the stepper is necessary.
At the end of the integration, we use `unsplit` to concatenate the sub-regions
back together into a single vector.
[phase_chain_state_integrate]

[note You don't actually need to use `openmp_state<T>` for advanced use cases:
`openmp_algebra` is simply an alias for `openmp_nested_algebra<range_algebra>`,
which supports any model of Random Access Range as the outer, parallel state
type and uses the given algebra on its elements.]
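The following sketch illustrates how `split`, the automatically selected
`openmp_algebra`, and `unsplit` fit together. The data size, the number of
chunks (one per thread), and the trivial uncoupled system are illustrative
assumptions; a real system such as the phase chain additionally needs the
border accesses shown above.

```
// Sketch of the split-state workflow; sizes, chunk count, and the trivial
// uncoupled system are arbitrary illustrative choices.
#include <vector>
#include <omp.h>
#include <boost/numeric/odeint.hpp>
#include <boost/numeric/odeint/external/openmp/openmp.hpp>

using namespace boost::numeric::odeint;

// Trivial system on openmp_state: every element decays independently, so no
// access to neighbouring chunks is needed (unlike the phase chain above).
void relax(const openmp_state<double> &x, openmp_state<double> &dxdt, double /*t*/)
{
    const int chunks = static_cast<int>(x.size());
    #pragma omp parallel for schedule(runtime)
    for (int i = 0; i < chunks; ++i)
        for (std::size_t j = 0; j < x[i].size(); ++j)
            dxdt[i][j] = -x[i][j];
}

int main()
{
    std::vector<double> x(10007, 1.0);            // odd-sized contiguous data

    // One chunk per thread; split distributes the data as evenly as possible.
    openmp_state<double> x_split(omp_get_max_threads());
    split(x, x_split);

    // openmp_algebra is selected automatically for openmp_state.
    integrate_n_steps(runge_kutta4< openmp_state<double> >(), relax,
                      x_split, 0.0, 0.01, 100);

    unsplit(x_split, x);                          // gather the chunks back
}
```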
See [github_link examples/openmp/phase_chain_omp_state.cpp
openmp/phase_chain_omp_state.cpp] for the complete example.

[endsect]

[section MPI]

[import ../examples/mpi/phase_chain.cpp]

To expand the parallel computation across multiple machines we can use MPI.

The system function implementation is similar to the OpenMP variant with split
data. The main difference is the execution model: OpenMP uses a spawn/join
model in which everything not explicitly parallelized runs only in the main
thread, whereas in MPI each node enters the `main()` method independently,
diverging based on its rank and synchronizing through message passing and
explicit barriers.

odeint's MPI support is implemented as an external backend, too.
Depending on the MPI implementation, the code might need to be compiled with,
e.g., [^mpic++].
[phase_chain_mpi_header]

Instead of reading another thread's data, we asynchronously send and receive the
relevant data from neighbouring nodes, performing some computation in the
interim to hide the latency.
[phase_chain_mpi_rhs]

Analogous to `openmp_state<T>` we use `mpi_state< InnerState<T> >`, which
automatically selects `mpi_nested_algebra` and the appropriate MPI-oblivious
inner algebra (since our inner state is a `vector`, the inner algebra will be
`range_algebra` as in the OpenMP example).
[phase_chain_state]

In the main program we construct a `communicator`, which tells us the `size` of
the cluster and the current node's `rank` within it.
We generate the input data on the master node only, avoiding unnecessary work on
the other nodes.
Instead of simply copying chunks, `split` acts as an MPI collective function
here and sends/receives regions from the master to each slave.
The input argument is ignored on the slaves, but the master node receives
a region in its output and will participate in the computation.
[phase_chain_mpi_init]

Now that `x_split` contains (only) the local chunk for each node, we start the
integration.

To print the result on the master node, we send the processed data back using
`unsplit`.
[phase_chain_mpi_integrate]

[note `mpi_nested_algebra::for_each`[~N] doesn't use any MPI constructs; it
simply calls the inner algebra on the local chunk. The system function is not
guarded by any barriers either, so if you don't place any manually (for example
in parameter studies where the elements are completely independent), you might
see the nodes diverge and return from this call at different times.]
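The listing below sketches the overall structure of such a program. The data
size, the step parameters, and the trivial uncoupled system are illustrative
assumptions; a real nearest-neighbour coupling would also use the asynchronous
send/receive pattern described above.

```
// Sketch of the MPI workflow; sizes and step parameters are arbitrary, and
// the trivial system needs no inter-node communication.
#include <iostream>
#include <vector>
#include <boost/mpi.hpp>
#include <boost/numeric/odeint.hpp>
#include <boost/numeric/odeint/external/mpi/mpi.hpp>

using namespace boost::numeric::odeint;

typedef mpi_state< std::vector<double> > state_type;

// Each node works only on its local chunk, accessible through x().
void relax(const state_type &x, state_type &dxdt, double /*t*/)
{
    for (std::size_t i = 0; i < x().size(); ++i)
        dxdt()[i] = -x()[i];
}

int main(int argc, char **argv)
{
    boost::mpi::environment env(argc, argv);
    boost::mpi::communicator world;

    std::vector<double> x;
    if (world.rank() == 0)
        x.resize(4096, 1.0);       // input data exists on the master only

    state_type x_split(world);
    split(x, x_split);             // collective: scatter chunks from the master

    // mpi_nested_algebra is selected automatically for mpi_state.
    integrate_n_steps(runge_kutta4<state_type>(), relax,
                      x_split, 0.0, 0.01, 100);

    unsplit(x_split, x);           // collective: gather chunks on the master
    if (world.rank() == 0)
        std::cout << "x[0] = " << x[0] << std::endl;
}
```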
See [github_link examples/mpi/phase_chain.cpp
mpi/phase_chain.cpp] for the complete example.

[endsect]

[section Concepts]

[section MPI State]
As used by `mpi_nested_algebra`.
[heading Notation]
[variablelist
    [[`InnerState`] [The inner state type]]
    [[`State`] [The MPI-state type]]
    [[`state`] [Object of type `State`]]
    [[`world`] [Object of type `boost::mpi::communicator`]]
]
[heading Valid Expressions]
[table
    [[Name] [Expression] [Type] [Semantics]]
    [[Construct a state with a communicator]
        [`State(world)`] [`State`] [Constructs the State.]]
    [[Construct a state with the default communicator]
        [`State()`] [`State`] [Constructs the State.]]
    [[Get the current node's inner state]
        [`state()`] [`InnerState`] [Returns a (const) reference.]]
    [[Get the communicator]
        [`state.world`] [`boost::mpi::communicator`] [See __boost_mpi.]]
]
[heading Models]
* `mpi_state<InnerState>`

[endsect]

[section OpenMP Split State]
As used by `openmp_nested_algebra`, essentially a Random Access Container with
`ValueType = InnerState`.
[heading Notation]
[variablelist
    [[`InnerState`] [The inner state type]]
    [[`State`] [The split state type]]
    [[`state`] [Object of type `State`]]
]
[heading Valid Expressions]
[table
    [[Name] [Expression] [Type] [Semantics]]
    [[Construct a state for `n` chunks]
        [`State(n)`] [`State`] [Constructs underlying `vector`.]]
    [[Get a chunk]
        [`state[i]`] [`InnerState`] [Accesses underlying `vector`.]]
    [[Get the number of chunks]
        [`state.size()`] [`size_type`] [Returns size of underlying `vector`.]]
]
[heading Models]
* `openmp_state<ValueType>` with `InnerState = vector<ValueType>`

[endsect]

[section Splitter]
[heading Notation]
[variablelist
    [[`Container1`] [The contiguous-data container type]]
    [[`x`] [Object of type `Container1`]]
    [[`Container2`] [The chunked-data container type]]
    [[`y`] [Object of type `Container2`]]
]
[heading Valid Expressions]
[table
    [[Name] [Expression] [Type] [Semantics]]
    [[Copy chunks of input to output elements]
        [`split(x, y)`] [`void`]
        [Calls `split_impl<Container1, Container2>::split(x, y)`, splits `x` into
         `y.size()` chunks.]]
    [[Join chunks of input elements to output]
        [`unsplit(y, x)`] [`void`]
        [Calls `unsplit_impl<Container2, Container1>::unsplit(y, x)`, assumes `x`
         is of the correct size ['__sigma `y[i].size()`], does not resize `x`.]]
]
[heading Models]
* defined for `Container1` = __boost_range and `Container2` = `openmp_state`
* and `Container2` = `mpi_state`.

To implement splitters for containers incompatible with __boost_range,
specialize the `split_impl` and `unsplit_impl` types:
```
template< class Container1, class Container2 , class Enabler = void >
struct split_impl {
    static void split( const Container1 &from , Container2 &to );
};

template< class Container2, class Container1 , class Enabler = void >
struct unsplit_impl {
    static void unsplit( const Container2 &from , Container1 &to );
};
```
[endsect]

[endsect]

[endsect]