________________________________________________________________________

PYBENCH - A Python Benchmark Suite
________________________________________________________________________

     Extendable suite of low-level benchmarks for measuring
          the performance of the Python implementation
                 (interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, like other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.


Micro-Manual
------------

Run 'pybench.py -h' to see the help screen.  Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.
If the differences are well below 10% for each test, then you have a
system that is good for doing benchmark testing.  If you get random
differences of more than 10% or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web-browsers, email clients, RSS
readers, music players, backup programs, etc.
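
The relationship between the minimum and average figures can be
sketched as follows. This is an illustrative helper, not pybench's
actual code; the function name summarize and the sample timings are
assumptions made for the example:

```python
# Illustrative sketch (not pybench's actual code): pybench 2.x uses the
# minimum round time as its main estimator and reports the average
# alongside it, so a large gap between the two suggests a noisy system.

def summarize(round_times, rounds, operations):
    """Return (minimum, average, time per operation) for one test.

    round_times -- raw per-round timings in seconds (assumed input)
    rounds      -- loop iterations per test round (the Test.rounds value)
    operations  -- operations per iteration (the Test.operations value)
    """
    minimum = min(round_times)
    average = sum(round_times) / len(round_times)
    per_op = minimum / (rounds * operations)
    return minimum, average, per_op

minimum, average, per_op = summarize([0.126, 0.145, 0.131], 100000, 20)
print("%.0fms %.0fms %.2fus" % (minimum * 1e3, average * 1e3, per_op * 1e6))
# 126ms 134ms 0.06us
```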

If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:

"""
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

Synopsis:
 pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage

Version:
 2.1

The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

Available timers:

   time.time
   time.clock
   systimes.processtime

Examples:

python3.0 pybench.py -f p30.pybench
python3.1 pybench.py -f p31.pybench
python pybench.py -s p31.pybench -c p30.pybench
"""

License
-------

See LICENSE file.


Sample output
-------------

"""
-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.0
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
...
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:  Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
       Processor:    x86_64

    Python:
       Implementation: CPython
       Executable:   /usr/local/bin/python
       Version:      3.0
       Compiler:     GCC 3.3.4 (pre 3.3.5 20040809)
       Bits:         64bit
       Build:        Oct  1 2005 15:24:35 (#1)
       Unicode:      UCS2


Test                             minimum  average  operation  overhead
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:    126ms    145ms    0.28us    0.274ms
           BuiltinMethodLookup:    124ms    130ms    0.12us    0.316ms
                 CompareFloats:    109ms    110ms    0.09us    0.361ms
         CompareFloatsIntegers:    100ms    104ms    0.12us    0.271ms
               CompareIntegers:    137ms    138ms    0.08us    0.542ms
        CompareInternedStrings:    124ms    127ms    0.08us    1.367ms
                  CompareLongs:    100ms    104ms    0.10us    0.316ms
                CompareStrings:    111ms    115ms    0.12us    0.929ms
                CompareUnicode:    108ms    128ms    0.17us    0.693ms
                 ConcatStrings:    142ms    155ms    0.31us    0.562ms
                 ConcatUnicode:    119ms    127ms    0.42us    0.384ms
               CreateInstances:    123ms    128ms    1.14us    0.367ms
            CreateNewInstances:    121ms    126ms    1.49us    0.335ms
       CreateStringsWithConcat:    130ms    135ms    0.14us    0.916ms
       CreateUnicodeWithConcat:    130ms    135ms    0.34us    0.361ms
                  DictCreation:    108ms    109ms    0.27us    0.361ms
             DictWithFloatKeys:    149ms    153ms    0.17us    0.678ms
           DictWithIntegerKeys:    124ms    126ms    0.11us    0.915ms
            DictWithStringKeys:    114ms    117ms    0.10us    0.905ms
                      ForLoops:    110ms    111ms    4.46us    0.063ms
                    IfThenElse:    118ms    119ms    0.09us    0.685ms
                   ListSlicing:    116ms    120ms    8.59us    0.103ms
                NestedForLoops:    125ms    137ms    0.09us    0.019ms
          NormalClassAttribute:    124ms    136ms    0.11us    0.457ms
       NormalInstanceAttribute:    110ms    117ms    0.10us    0.454ms
           PythonFunctionCalls:    107ms    113ms    0.34us    0.271ms
             PythonMethodCalls:    140ms    149ms    0.66us    0.141ms
                     Recursion:    156ms    166ms    3.32us    0.452ms
                  SecondImport:    112ms    118ms    1.18us    0.180ms
           SecondPackageImport:    118ms    127ms    1.27us    0.180ms
         SecondSubmoduleImport:    140ms    151ms    1.51us    0.180ms
       SimpleComplexArithmetic:    128ms    139ms    0.16us    0.361ms
        SimpleDictManipulation:    134ms    136ms    0.11us    0.452ms
         SimpleFloatArithmetic:    110ms    113ms    0.09us    0.571ms
      SimpleIntFloatArithmetic:    106ms    111ms    0.08us    0.548ms
       SimpleIntegerArithmetic:    106ms    109ms    0.08us    0.544ms
        SimpleListManipulation:    103ms    113ms    0.10us    0.587ms
          SimpleLongArithmetic:    112ms    118ms    0.18us    0.271ms
                    SmallLists:    105ms    116ms    0.17us    0.366ms
                   SmallTuples:    108ms    128ms    0.24us    0.406ms
         SpecialClassAttribute:    119ms    136ms    0.11us    0.453ms
      SpecialInstanceAttribute:    143ms    155ms    0.13us    0.454ms
                StringMappings:    115ms    121ms    0.48us    0.405ms
              StringPredicates:    120ms    129ms    0.18us    2.064ms
                 StringSlicing:    111ms    127ms    0.23us    0.781ms
                     TryExcept:    125ms    126ms    0.06us    0.681ms
                TryRaiseExcept:    133ms    137ms    2.14us    0.361ms
                  TupleSlicing:    117ms    120ms    0.46us    0.066ms
               UnicodeMappings:    156ms    160ms    4.44us    0.429ms
             UnicodePredicates:    117ms    121ms    0.22us    2.487ms
             UnicodeProperties:    115ms    153ms    0.38us    2.070ms
                UnicodeSlicing:    126ms    129ms    0.26us    0.689ms
-------------------------------------------------------------------------------
Totals:                           6283ms   6673ms
"""
________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations operations each,
and .calibrate(), which does the same except that it doesn't
actually execute the operations.


Here's an example:
------------------

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # number will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds executing
            self.operations number of operations each.

        """
        # Init the test
        a = 1

        # Run test rounds
        #
        for i in range(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            setup and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.

        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in range(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass

Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.

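The scan can be sketched as follows. This is a self-contained
illustration, not pybench's actual code: the Test class below is a
minimal stand-in for pybench.Test, and the helper find_tests is a
hypothetical name invented for this example:

```python
# Hypothetical sketch of the symbol scan described above: collect every
# strict subclass of pybench.Test found among a module's symbols. Test
# is a minimal stand-in for pybench.Test; find_tests is an illustrative
# helper, not part of pybench's API.

class Test:
    """Minimal stand-in for pybench.Test."""
    version = 1.0
    operations = 1
    rounds = 1

class IntegerCounting(Test):
    """A test class the scan should pick up."""
    operations = 20
    rounds = 100000

def find_tests(namespace):
    """Return all strict subclasses of Test defined in a namespace."""
    return [obj for obj in namespace.values()
            if isinstance(obj, type)
            and issubclass(obj, Test)
            and obj is not Test]

# Simulate the symbols of a Setup-style module that imported the class:
setup_symbols = {"Test": Test, "IntegerCounting": IntegerCounting,
                 "some_helper": len}
print([cls.__name__ for cls in find_tests(setup_symbols)])
# ['IntegerCounting']
```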

Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will list as "n/a" to reflect the change.

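The effect of a version bump can be sketched like this; compare_entry
is an illustrative helper written for this example, not pybench's
actual comparison code:

```python
# Illustrative helper (not pybench's actual code): two results for the
# same test are only compared when their test versions match; otherwise
# the comparison column shows "n/a".

def compare_entry(old_version, new_version, old_time, new_time):
    """Return a percentage difference, or 'n/a' on version mismatch."""
    if old_version != new_version:
        return "n/a"
    return "%+.1f%%" % ((new_time - old_time) / old_time * 100.0)

print(compare_entry(1.0, 1.0, 100.0, 110.0))  # +10.0%
print(compare_entry(1.0, 1.1, 100.0, 110.0))  # n/a
```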

Version History
---------------

  2.1: made some minor changes for compatibility with Python 3.0:
        - replaced cmp with divmod and range with max in Calls.py
          (cmp no longer exists in 3.0, and range is a list in
          Python 2.x and an iterator in Python 3.x)

  2.0: rewrote parts of pybench which resulted in more repeatable
       timings:
        - made timer a parameter
        - changed the platform default timer to use high-resolution
          timers rather than process timers (which have a much lower
          resolution)
        - added option to select timer
        - added process time timer (using systimes.py)
        - changed to use min() as timing estimator (average
          is still taken as well to provide an idea of the difference)
        - garbage collection is turned off by default
        - sys check interval is set to the highest possible value
        - calibration is now a separate step and done using
          a different strategy that allows measuring the test
          overhead more accurately
        - modified the tests to each give a run-time of between
          100-200ms using warp 10
        - changed default warp factor to 10 (from 20)
        - compared results with timeit.py and confirmed measurements
        - bumped all test versions to 2.0
        - updated platform.py to the latest version
        - changed the output format a bit to make it look
          nicer
        - refactored the APIs somewhat

  1.3+: Steve Holden added the NewInstances test and the filtering
       option during the NeedForSpeed sprint; this also triggered a long
       discussion on how to improve benchmark timing and finally
       resulted in the release of 2.0

  1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com