[section Object Code]

Let's look at some assembly.  All assembly here was produced with Clang 4.0
with `-O3`.  Given these definitions:

[arithmetic_perf_decls]

Here is a _yap_-based arithmetic function:

[arithmetic_perf_eval_as_yap_expr]

and the assembly it produces:

    arithmetic_perf[0x100001c00] <+0>:  pushq  %rbp
    arithmetic_perf[0x100001c01] <+1>:  movq   %rsp, %rbp
    arithmetic_perf[0x100001c04] <+4>:  mulsd  %xmm1, %xmm0
    arithmetic_perf[0x100001c08] <+8>:  addsd  %xmm2, %xmm0
    arithmetic_perf[0x100001c0c] <+12>: movapd %xmm0, %xmm1
    arithmetic_perf[0x100001c10] <+16>: mulsd  %xmm1, %xmm1
    arithmetic_perf[0x100001c14] <+20>: addsd  %xmm0, %xmm1
    arithmetic_perf[0x100001c18] <+24>: movapd %xmm1, %xmm0
    arithmetic_perf[0x100001c1c] <+28>: popq   %rbp
    arithmetic_perf[0x100001c1d] <+29>: retq

And for the equivalent function using builtin expressions:

[arithmetic_perf_eval_as_cpp_expr]

the assembly is:

    arithmetic_perf[0x100001e10] <+0>:  pushq  %rbp
    arithmetic_perf[0x100001e11] <+1>:  movq   %rsp, %rbp
    arithmetic_perf[0x100001e14] <+4>:  mulsd  %xmm1, %xmm0
    arithmetic_perf[0x100001e18] <+8>:  addsd  %xmm2, %xmm0
    arithmetic_perf[0x100001e1c] <+12>: movapd %xmm0, %xmm1
    arithmetic_perf[0x100001e20] <+16>: mulsd  %xmm1, %xmm1
    arithmetic_perf[0x100001e24] <+20>: addsd  %xmm0, %xmm1
    arithmetic_perf[0x100001e28] <+24>: movapd %xmm1, %xmm0
    arithmetic_perf[0x100001e2c] <+28>: popq   %rbp
    arithmetic_perf[0x100001e2d] <+29>: retq

If we increase the number of terminals by a factor of four:

[arithmetic_perf_eval_as_yap_expr_4x]

the results are the same: in this simple case, the _yap_ and builtin
expressions result in the same object code.
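
To make the two short listings above easier to read, here is a plain C++ sketch of one function that is consistent with their instruction sequence.  It is a reconstruction, not the imported snippet: the name `eval_sketch`, the parameter names, and the helper `check_eval_sketch` are assumptions, as is the reading that the three `double` arguments arrive in `%xmm0`, `%xmm1`, and `%xmm2`.

```cpp
#include <cassert>

// Hypothetical reconstruction from the assembly listings; the function and
// parameter names are illustrative and are not taken from the imported snippets.
double eval_sketch(double a, double x, double y)
{
    double t = a * x + y;  // mulsd %xmm1, %xmm0 ; addsd %xmm2, %xmm0
    return t * t + t;      // movapd, mulsd %xmm1, %xmm1 ; addsd %xmm0, %xmm1
}

// Small self-check: with a = 2, x = 3, y = 1 we get t = 7 and t * t + t = 56.
inline void check_eval_sketch()
{
    assert(eval_sketch(2.0, 3.0, 1.0) == 56.0);
}
```

Calling `check_eval_sketch()` from a `main()` is enough to exercise the sketch.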

However, when the number of terminals is increased by an additional factor of
2.5 (for a total of 90 terminals), the inliner can no longer do as well for
_yap_ expressions as for builtin ones.

More complex nonarithmetic code produces more mixed results.  For example,
here is a function using code from the Map Assign example:

    std::map<std::string, int> make_map_with_boost_yap ()
    {
        return map_list_of
            ("<", 1)
            ("<=",2)
            (">", 3)
            (">=",4)
            ("=", 5)
            ("<>",6)
            ;
    }

By contrast, here is the Boost.Assign version of the same function:

    std::map<std::string, int> make_map_with_boost_assign ()
    {
        return boost::assign::map_list_of
            ("<", 1)
            ("<=",2)
            (">", 3)
            (">=",4)
            ("=", 5)
            ("<>",6)
            ;
    }

Here is how you might do it "manually":

    std::map<std::string, int> make_map_manually ()
    {
        std::map<std::string, int> retval;
        retval.emplace("<", 1);
        retval.emplace("<=",2);
        retval.emplace(">", 3);
        retval.emplace(">=",4);
        retval.emplace("=", 5);
        retval.emplace("<>",6);
        return retval;
    }

Finally, here is the same map created from an initializer list:

    std::map<std::string, int> make_map_inializer_list ()
    {
        std::map<std::string, int> retval = {
            {"<", 1},
            {"<=",2},
            {">", 3},
            {">=",4},
            {"=", 5},
            {"<>",6}
        };
        return retval;
    }

All of these produce roughly the same number of assembly instructions.
Benchmarking these four functions with Google Benchmark yields these results:

[table Runtimes of Different Map Constructions
    [[Function] [Time (ns)]]

    [[make_map_with_boost_yap()]          [1285]]
    [[make_map_with_boost_assign()]       [1459]]
    [[make_map_manually()]                 [985]]
    [[make_map_inializer_list()]           [954]]
]

The _yap_-based implementation finishes in the middle of the pack: it is about
12% faster than the Boost.Assign version, but roughly 30-35% slower than the
manual and initializer-list versions.
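
For readers who want a rough feel for such a comparison without setting up Google Benchmark, here is a minimal timing sketch built on `std::chrono`.  It is not the harness that produced the table above, and its numbers will not match the table.  The helpers `average_ns` and `check_maps_agree` are hypothetical names, and only the two standard-library-only functions are repeated so that the block stands alone.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

std::map<std::string, int> make_map_manually()
{
    std::map<std::string, int> retval;
    retval.emplace("<", 1);
    retval.emplace("<=", 2);
    retval.emplace(">", 3);
    retval.emplace(">=", 4);
    retval.emplace("=", 5);
    retval.emplace("<>", 6);
    return retval;
}

std::map<std::string, int> make_map_inializer_list()
{
    std::map<std::string, int> retval = {
        {"<", 1}, {"<=", 2}, {">", 3}, {">=", 4}, {"=", 5}, {"<>", 6}
    };
    return retval;
}

// Hypothetical helper: average wall-clock nanoseconds per call of f.
// This is a crude stand-in for a real benchmarking library.
template <typename F>
double average_ns(F f, int iterations)
{
    // Writing to a volatile keeps the calls from being optimized away.
    volatile std::size_t sink = 0;
    auto const start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = sink + f().size();
    auto const stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> const elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Both construction styles should yield the same six-element map.
inline void check_maps_agree()
{
    assert(make_map_manually() == make_map_inializer_list());
}
```

A call such as `average_ns(make_map_manually, 100000)` returns an average time per call; compile with optimizations on and compare several runs before drawing conclusions.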

In general, the expression trees produced by _yap_ get evaluated down to
something close to the hand-written equivalent.  There is an abstraction
penalty, but it is small for reasonably-sized expressions.
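
To illustrate why an expression tree can reduce to the hand-written form, here is a hand-rolled expression-template sketch.  It does not use _yap_ and is not how _yap_ is implemented; every name in namespace `sketch` is hypothetical.  The operators only record their operands in nested types, and `eval` walks that structure through overloads that an optimizer is typically able to inline.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical node types for a tiny expression tree; not the library's API.
struct terminal { double value; };

template <typename L, typename R>
struct plus_expr { L lhs; R rhs; };

template <typename L, typename R>
struct mult_expr { L lhs; R rhs; };

// Building the tree: each operator only records its operands.
template <typename L, typename R>
plus_expr<L, R> operator+(L lhs, R rhs) { return {lhs, rhs}; }

template <typename L, typename R>
mult_expr<L, R> operator*(L lhs, R rhs) { return {lhs, rhs}; }

// Evaluating the tree: one overload per node kind, declared up front so the
// recursive calls can see every overload.
inline double eval(terminal t) { return t.value; }

template <typename L, typename R>
double eval(plus_expr<L, R> e);

template <typename L, typename R>
double eval(mult_expr<L, R> e);

template <typename L, typename R>
double eval(plus_expr<L, R> e) { return eval(e.lhs) + eval(e.rhs); }

template <typename L, typename R>
double eval(mult_expr<L, R> e) { return eval(e.lhs) * eval(e.rhs); }

// The same arithmetic written as a tree and written directly.
inline double eval_as_tree(double a, double x, double y)
{
    terminal const ta{a}, tx{x}, ty{y};
    return eval((ta * tx + ty) * (ta * tx + ty) + (ta * tx + ty));
}

inline double eval_directly(double a, double x, double y)
{
    return (a * x + y) * (a * x + y) + (a * x + y);
}

// Self-check: a = 2, x = 3, y = 1 gives 7 * 7 + 7 = 56 either way.
inline void check_tree_matches_direct()
{
    assert(eval_as_tree(2.0, 3.0, 1.0) == 56.0);
    assert(eval_directly(2.0, 3.0, 1.0) == 56.0);
}

} // namespace sketch
```

Because the whole tree shape is encoded in the type of the expression, the compiler sees every `eval` call statically; whether it actually collapses them to the direct form depends on the inliner, which is the limit discussed above for very large expressions.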

[endsect]