[/
          Copyright Oliver Kowalke 2017.
 Distributed under the Boost Software License, Version 1.0.
    (See accompanying file LICENSE_1_0.txt or copy at
          http://www.boost.org/LICENSE_1_0.txt
]

[#numa]
[section:numa NUMA]

Modern micro-processors contain integrated memory controllers that are connected
via channels to the memory. Memory access can be organized in two ways:[br]
Uniform Memory Access (UMA) and Non-Uniform Memory Access (NUMA).

In contrast to UMA, which provides a centralized pool of memory (and thus does
not scale after a certain number of processors), a NUMA architecture divides the
memory into local and remote memory relative to the micro-processor.[br]
Local memory is directly attached to the processor's integrated memory controller.
Memory connected to the memory controller of another micro-processor (multi-socket
systems) is considered remote memory. If a memory controller accesses remote memory,
it has to traverse the interconnect[footnote On x86 the interconnection is implemented
by Intel's Quick Path Interconnect (QPI) and AMD's HyperTransport.] and
connect to the remote memory controller.[br]
Thus accessing remote memory adds latency overhead compared to local memory access.
Because of the different memory locations, a NUMA-system experiences ['non-uniform]
memory access time.[br]
As a consequence, the best performance is achieved by keeping memory access
local.
[$../../../../libs/fiber/doc/NUMA.png [align center]]


[heading NUMA support in Boost.Fiber]

Because operating systems expose only a subset of the NUMA functionality,
Boost.Fiber provides only a minimalistic NUMA API.

[important In order to enable NUMA support, the b2 property `numa=on` must be
specified and the additional library `libboost_fiber_numa.so` must be linked.]
[important MinGW using the pthread implementation is not supported on Windows.]

[table Supported functionality/operating systems
    [
        []
        [AIX]
        [FreeBSD]
        [HP/UX]
        [Linux]
        [Solaris]
        [Windows]
    ]
    [
        [pin thread]
        [+]
        [+]
        [+]
        [+]
        [+]
        [+]
    ]
    [
        [logical CPUs/NUMA nodes]
        [+]
        [+]
        [+]
        [+]
        [+]
        [+[footnote Windows organizes logical cpus in groups of 64; Boost.Fiber maps
           {group-id,cpu-id} to a scalar equivalent to the cpu ID of Linux (64 * group ID + cpu ID).]]
    ]
    [
        [NUMA node distance]
        [-]
        [-]
        [-]
        [+]
        [-]
        [-]
    ]
    [
        [tested on]
        [AIX 7.2]
        [FreeBSD 11]
        [-]
        [Arch Linux (4.10.13)]
        [OpenIndiana HIPSTER]
        [Windows 10]
    ]
]

In order to keep memory access as local as possible, the NUMA topology must be
evaluated.

    std::vector< boost::fibers::numa::node > topo = boost::fibers::numa::topology();
    for ( auto n : topo) {
        std::cout << "node: " << n.id << " | ";
        std::cout << "cpus: ";
        for ( auto cpu_id : n.logical_cpus) {
            std::cout << cpu_id << " ";
        }
        std::cout << "| distance: ";
        for ( auto d : n.distance) {
            std::cout << d << " ";
        }
        std::cout << std::endl;
    }
    std::cout << "done" << std::endl;

    output:
        node: 0 | cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23 | distance: 10 21
        node: 1 | cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31 | distance: 21 10
        done

The example shows that the system consists of 2 NUMA-nodes; 16 logical cpus
belong to each NUMA-node. The distance measures the cost of accessing the memory
of another NUMA-node. A NUMA-node always has a distance of `10` to itself (the
lowest possible value).[br]
The position in the array corresponds to the NUMA-node ID.

Some work-loads benefit from pinning threads to logical cpus. For instance the
scheduling algorithm __numa_work_stealing__ pins the thread that runs the fiber
scheduler to a logical cpu. This prevents the operating system scheduler from
moving the thread to another logical cpu that might run other fiber scheduler(s),
and from migrating the thread to a logical cpu belonging to another NUMA-node.

        void thread( std::uint32_t cpu_id, std::uint32_t node_id, std::vector< boost::fibers::numa::node > const& topo) {
            // thread registers itself at work-stealing scheduler
            boost::fibers::use_scheduling_algorithm< boost::fibers::numa::algo::work_stealing >( cpu_id, node_id, topo);
            ...
        }

        // evaluate the NUMA topology
        std::vector< boost::fibers::numa::node > topo = boost::fibers::numa::topology();
        // start-thread runs on NUMA-node `0`
        auto node = topo[0];
        // start-thread is pinned to the first cpu ID in the list of logical cpus of NUMA-node `0`
        auto start_cpu_id = * node.logical_cpus.begin();
        // start worker-threads first
        std::vector< std::thread > threads;
        for ( auto & node : topo) {
            for ( std::uint32_t cpu_id : node.logical_cpus) {
                // exclude start-thread
                if ( start_cpu_id != cpu_id) {
                    // spawn thread
                    threads.emplace_back( thread, cpu_id, node.id, std::cref( topo) );
                }
            }
        }
        // start-thread registers itself on work-stealing scheduler
        boost::fibers::use_scheduling_algorithm< boost::fibers::numa::algo::work_stealing >( start_cpu_id, node.id, topo);
        ...

The example evaluates the NUMA topology with `boost::fibers::numa::topology()`
and spawns a thread for each logical cpu. Each spawned thread installs the
NUMA-aware work-stealing scheduler. The scheduler pins the thread to the
logical cpu that was specified at construction.[br]
If the local queue of one thread runs out of ready fibers, the thread tries to
steal a ready fiber from another thread running on a logical cpu that belongs to
the same NUMA-node (local memory access). If no fiber could be stolen, the
thread tries to steal fibers from logical cpus belonging to other NUMA-nodes
(remote memory access).


[heading Synopsis]

    #include <boost/fiber/numa/pin_thread.hpp>
    #include <boost/fiber/numa/topology.hpp>

    namespace boost {
    namespace fibers {
    namespace numa {

    struct node {
        std::uint32_t                   id;
        std::set< std::uint32_t >       logical_cpus;
        std::vector< std::uint32_t >    distance;
    };
    bool operator<( node const&, node const&) noexcept;

    std::vector< node > topology();

    void pin_thread( std::uint32_t);
    void pin_thread( std::uint32_t, std::thread::native_handle_type);

    }}}

    #include <boost/fiber/numa/algo/work_stealing.hpp>

    namespace boost {
    namespace fibers {
    namespace numa {
    namespace algo {

    class work_stealing;

    }}}}


[ns_class_heading numa..node]

    #include <boost/fiber/numa/topology.hpp>

    namespace boost {
    namespace fibers {
    namespace numa {

    struct node {
        std::uint32_t                   id;
        std::set< std::uint32_t >       logical_cpus;
        std::vector< std::uint32_t >    distance;
    };
    bool operator<( node const&, node const&) noexcept;

    }}}

[ns_data_member_heading numa..node..id]

        std::uint32_t id;

[variablelist
[[Effects:] [ID of the NUMA-node]]
]

[ns_data_member_heading numa..node..logical_cpus]

        std::set< std::uint32_t > logical_cpus;

[variablelist
[[Effects:] [set of logical cpu IDs belonging to the NUMA-node]]
]

[ns_data_member_heading numa..node..distance]

        std::vector< std::uint32_t > distance;

[variablelist
[[Effects:] [The distance between NUMA-nodes describes the cost of accessing the
remote memory.]]
[[Note:] [A NUMA-node has a distance of `10` to itself; remote NUMA-nodes
have a distance > `10`. The index in the array corresponds to the ID `id`
of the NUMA-node. At the moment only Linux returns the correct distances;
on all other operating systems remote NUMA-nodes get a default value of
`20`.]]
]

[ns_operator_heading numa..node..operator_less..operator<]

        bool operator<( node const& lhs, node const& rhs) noexcept;

[variablelist
[[Returns:] [`true` if `lhs != rhs` is true and the
implementation-defined total order of `node::id` values places `lhs` before
`rhs`, `false` otherwise.]]
[[Throws:] [Nothing.]]
]


[ns_function_heading numa..topology]

    #include <boost/fiber/numa/topology.hpp>

    namespace boost {
    namespace fibers {
    namespace numa {

    std::vector< node > topology();

    }}}

[variablelist
[[Effects:] [Evaluates the NUMA topology.]]
[[Returns:] [a vector of NUMA-nodes describing the NUMA architecture of the
system (each element represents a NUMA-node).]]
[[Throws:] [`system_error`]]
]


[ns_function_heading numa..pin_thread]

    #include <boost/fiber/numa/pin_thread.hpp>

    namespace boost {
    namespace fibers {
    namespace numa {

    void pin_thread( std::uint32_t cpu_id);
    void pin_thread( std::uint32_t cpu_id, std::thread::native_handle_type h);

    }}}

[variablelist
[[Effects:] [The first version pins the current thread to the logical cpu with ID
`cpu_id`, i.e. the operating system scheduler will not migrate the thread to
another logical cpu. The second version pins the thread with the native handle
`h` to the logical cpu with ID `cpu_id`.]]
[[Throws:] [`system_error`]]
]


[ns_class_heading numa..work_stealing]

This class implements __algo__; the thread running this scheduler is pinned to the given
logical cpu. If the local ready-queue runs out of ready fibers, ready fibers are stolen
from other schedulers that run on logical cpus that belong to the same NUMA-node (local
memory access).[br]
If no ready fibers can be stolen from the local NUMA-node, the algorithm selects
schedulers running on other NUMA-nodes (remote memory access).[br]
The victim scheduler (from which a ready fiber is stolen) is selected at random.

        #include <boost/fiber/numa/algo/work_stealing.hpp>

        namespace boost {
        namespace fibers {
        namespace numa {
        namespace algo {

        class work_stealing : public algorithm {
        public:
            work_stealing( std::uint32_t cpu_id,
                           std::uint32_t node_id,
                           std::vector< boost::fibers::numa::node > const& topo,
                           bool suspend = false);

            work_stealing( work_stealing const&) = delete;
            work_stealing( work_stealing &&) = delete;

            work_stealing & operator=( work_stealing const&) = delete;
            work_stealing & operator=( work_stealing &&) = delete;

            virtual void awakened( context *) noexcept;

            virtual context * pick_next() noexcept;

            virtual bool has_ready_fibers() const noexcept;

            virtual void suspend_until( std::chrono::steady_clock::time_point const&) noexcept;

            virtual void notify() noexcept;
        };

        }}}}

[heading Constructor]

        work_stealing( std::uint32_t cpu_id, std::uint32_t node_id,
                       std::vector< boost::fibers::numa::node > const& topo,
                       bool suspend = false);

[variablelist
[[Effects:] [Constructs the work-stealing scheduling algorithm. The thread is pinned to the logical cpu
with ID `cpu_id`. If the local ready-queue runs out of ready fibers, ready fibers are stolen from other
schedulers using `topo` (represents the NUMA-topology of the system).]]
[[Throws:] [`system_error`]]
[[Note:][If `suspend` is set to `true`, the scheduler suspends if no ready fiber could be stolen.
The scheduler will be woken up if a sleeping fiber times out or if it is notified remotely (by another
thread or fiber scheduler).]]
]

[ns_member_heading numa..work_stealing..awakened]

        virtual void awakened( context * f) noexcept;

[variablelist
[[Effects:] [Enqueues fiber `f` onto the shared ready queue.]]
[[Throws:] [Nothing.]]
]

[ns_member_heading numa..work_stealing..pick_next]

        virtual context * pick_next() noexcept;

[variablelist
[[Returns:] [the fiber at the head of the ready queue, or `nullptr` if the
queue is empty.]]
[[Throws:] [Nothing.]]
[[Note:] [Placing ready fibers onto the tail of the shared queue, and returning them
from the head of that queue, shares the thread between ready fibers in
round-robin fashion.]]
]

[ns_member_heading numa..work_stealing..has_ready_fibers]

        virtual bool has_ready_fibers() const noexcept;

[variablelist
[[Returns:] [`true` if scheduler has fibers ready to run.]]
[[Throws:] [Nothing.]]
]

[ns_member_heading numa..work_stealing..suspend_until]

        virtual void suspend_until( std::chrono::steady_clock::time_point const& abs_time) noexcept;

[variablelist
[[Effects:] [Informs `work_stealing` that no ready fiber will be available until
time-point `abs_time`. This implementation blocks in
[@http://en.cppreference.com/w/cpp/thread/condition_variable/wait_until
`std::condition_variable::wait_until()`].]]
[[Throws:] [Nothing.]]
]

[ns_member_heading numa..work_stealing..notify]

        virtual void notify() noexcept;

[variablelist
[[Effects:] [Wakes up a pending call to [member_link
work_stealing..suspend_until], as some fibers might be ready. This implementation
wakes `suspend_until()` via
[@http://en.cppreference.com/w/cpp/thread/condition_variable/notify_all
`std::condition_variable::notify_all()`].]]
[[Throws:] [Nothing.]]
]

[endsect]