________________________________________________________________________

PYBENCH - A Python Benchmark Suite
________________________________________________________________________

     Extendable suite of low-level benchmarks for measuring
          the performance of the Python implementation
                 (interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, as other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.


Micro-Manual
------------

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.

If the differences are well below 10% for each test, then you have a
system that is good for doing benchmark testing.
If you get random differences of more than 10% or significant
differences between the values for minimum and average time, then you
likely have some background processes running which cause the readings
to become inconsistent. Examples include: web-browsers, email clients,
RSS readers, music players, backup programs, etc.

If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:

"""
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

Synopsis:
 pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage

Version:
 2.1

The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

Available timers:

   time.time
   time.clock
   systimes.processtime

Examples:

python3.0 pybench.py -f p30.pybench
python3.1 pybench.py -f p31.pybench
python pybench.py -s p31.pybench -c p30.pybench
"""

License
-------

See LICENSE file.


Sample output
-------------

"""
-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------
* using CPython 3.0
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
...
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:  Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
       Processor:    x86_64

    Python:
       Implementation: CPython
       Executable:   /usr/local/bin/python
       Version:      3.0
       Compiler:     GCC 3.3.4 (pre 3.3.5 20040809)
       Bits:         64bit
       Build:        Oct  1 2005 15:24:35 (#1)
       Unicode:      UCS2


Test                             minimum  average  operation  overhead
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:    126ms    145ms    0.28us    0.274ms
           BuiltinMethodLookup:    124ms    130ms    0.12us    0.316ms
                 CompareFloats:    109ms    110ms    0.09us    0.361ms
         CompareFloatsIntegers:    100ms    104ms    0.12us    0.271ms
               CompareIntegers:    137ms    138ms    0.08us    0.542ms
        CompareInternedStrings:    124ms    127ms    0.08us    1.367ms
                  CompareLongs:    100ms    104ms    0.10us    0.316ms
                CompareStrings:    111ms    115ms    0.12us    0.929ms
                CompareUnicode:    108ms    128ms    0.17us    0.693ms
                 ConcatStrings:    142ms    155ms    0.31us    0.562ms
                 ConcatUnicode:    119ms    127ms    0.42us    0.384ms
               CreateInstances:    123ms    128ms    1.14us    0.367ms
            CreateNewInstances:    121ms    126ms    1.49us    0.335ms
       CreateStringsWithConcat:    130ms    135ms    0.14us    0.916ms
       CreateUnicodeWithConcat:    130ms    135ms    0.34us    0.361ms
                  DictCreation:    108ms    109ms    0.27us    0.361ms
             DictWithFloatKeys:    149ms    153ms    0.17us    0.678ms
           DictWithIntegerKeys:    124ms    126ms    0.11us    0.915ms
            DictWithStringKeys:    114ms    117ms    0.10us    0.905ms
                      ForLoops:    110ms    111ms    4.46us    0.063ms
                    IfThenElse:    118ms    119ms    0.09us    0.685ms
                   ListSlicing:    116ms    120ms    8.59us    0.103ms
                NestedForLoops:    125ms    137ms    0.09us    0.019ms
          NormalClassAttribute:    124ms    136ms    0.11us    0.457ms
       NormalInstanceAttribute:    110ms    117ms    0.10us    0.454ms
           PythonFunctionCalls:    107ms    113ms    0.34us    0.271ms
             PythonMethodCalls:    140ms    149ms    0.66us    0.141ms
                     Recursion:    156ms    166ms    3.32us    0.452ms
                  SecondImport:    112ms    118ms    1.18us    0.180ms
           SecondPackageImport:    118ms    127ms    1.27us    0.180ms
         SecondSubmoduleImport:    140ms    151ms    1.51us    0.180ms
       SimpleComplexArithmetic:    128ms    139ms    0.16us    0.361ms
        SimpleDictManipulation:    134ms    136ms    0.11us    0.452ms
         SimpleFloatArithmetic:    110ms    113ms    0.09us    0.571ms
      SimpleIntFloatArithmetic:    106ms    111ms    0.08us    0.548ms
       SimpleIntegerArithmetic:    106ms    109ms    0.08us    0.544ms
        SimpleListManipulation:    103ms    113ms    0.10us    0.587ms
          SimpleLongArithmetic:    112ms    118ms    0.18us    0.271ms
                    SmallLists:    105ms    116ms    0.17us    0.366ms
                   SmallTuples:    108ms    128ms    0.24us    0.406ms
         SpecialClassAttribute:    119ms    136ms    0.11us    0.453ms
      SpecialInstanceAttribute:    143ms    155ms    0.13us    0.454ms
                StringMappings:    115ms    121ms    0.48us    0.405ms
              StringPredicates:    120ms    129ms    0.18us    2.064ms
                 StringSlicing:    111ms    127ms    0.23us    0.781ms
                     TryExcept:    125ms    126ms    0.06us    0.681ms
                TryRaiseExcept:    133ms    137ms    2.14us    0.361ms
                  TupleSlicing:    117ms    120ms    0.46us    0.066ms
               UnicodeMappings:    156ms    160ms    4.44us    0.429ms
             UnicodePredicates:    117ms    121ms    0.22us    2.487ms
             UnicodeProperties:    115ms    153ms    0.38us    2.070ms
                UnicodeSlicing:    126ms    129ms    0.26us    0.689ms
-------------------------------------------------------------------------------
Totals:                           6283ms   6673ms
"""
________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations test operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.


Here's an example:
------------------

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # numbers will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1 and 2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds rounds, executing
            self.operations operations in each.
256 257 """ 258 # Init the test 259 a = 1 260 261 # Run test rounds 262 # 263 for i in range(self.rounds): 264 265 # Repeat the operations per round to raise the run-time 266 # per operation significantly above the noise level of the 267 # for-loop overhead. 268 269 # Execute 20 operations (a += 1): 270 a += 1 271 a += 1 272 a += 1 273 a += 1 274 a += 1 275 a += 1 276 a += 1 277 a += 1 278 a += 1 279 a += 1 280 a += 1 281 a += 1 282 a += 1 283 a += 1 284 a += 1 285 a += 1 286 a += 1 287 a += 1 288 a += 1 289 a += 1 290 291 def calibrate(self): 292 293 """ Calibrate the test. 294 295 This method should execute everything that is needed to 296 setup and run the test - except for the actual operations 297 that you intend to measure. pybench uses this method to 298 measure the test implementation overhead. 299 300 """ 301 # Init the test 302 a = 1 303 304 # Run test rounds (without actually doing any operation) 305 for i in range(self.rounds): 306 307 # Skip the actual execution of the operations, since we 308 # only want to measure the test's administration overhead. 309 pass 310 311Registering a new test module 312----------------------------- 313 314To register a test module with pybench, the classes need to be 315imported into the pybench.Setup module. pybench will then scan all the 316symbols defined in that module for subclasses of pybench.Test and 317automatically add them to the benchmark suite. 318 319 320Breaking Comparability 321---------------------- 322 323If a change is made to any individual test that means it is no 324longer strictly comparable with previous runs, the '.version' class 325variable should be updated. Therefafter, comparisons with previous 326versions of the test will list as "n/a" to reflect the change. 
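
As a rough illustration of the rule above, the following
self-contained sketch mimics how a comparison might treat a version
bump. It does not import pybench: the Test stand-in class, the two
IntegerCounting variants and the compare() helper are all hypothetical
names invented for this example, and the real report code in pybench
may well work differently.

```python
# Hypothetical sketch; none of these names are taken from pybench itself.

class Test:
    """Stand-in for pybench.Test, reduced to the one attribute used here."""
    version = 1.0


class IntegerCountingV1(Test):
    # The test as it was when the reference run was recorded.
    version = 1.0


class IntegerCountingV2(Test):
    # The test body was changed, so the version number was bumped.
    version = 1.1


def compare(ref_test, new_test, ref_min_ms, new_min_ms):
    """Return the relative change of the minimum time as a string.

    Returns "n/a" when the two tests carry different version numbers,
    since their timings are then not meaningfully comparable.
    """
    if ref_test.version != new_test.version:
        return "n/a"
    change = (new_min_ms - ref_min_ms) / ref_min_ms * 100.0
    return "%+.1f%%" % change


# Same version: the timings are compared.
print(compare(IntegerCountingV1, IntegerCountingV1, 100.0, 110.0))

# Different versions: the comparison is reported as "n/a".
print(compare(IntegerCountingV1, IntegerCountingV2, 100.0, 110.0))
```

The timings passed in (100.0 and 110.0 ms) are made-up inputs, not
measurements. The point is only that a version mismatch short-circuits
the comparison instead of producing a misleading percentage.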


Version History
---------------

  2.1: made some minor changes for compatibility with Python 3.0:
        - replaced cmp with divmod and range with max in Calls.py
          (cmp no longer exists in 3.0, and range is a list in
          Python 2.x and an iterator in Python 3.x)

  2.0: rewrote parts of pybench which resulted in more repeatable
       timings:
        - made timer a parameter
        - changed the platform default timer to use high-resolution
          timers rather than process timers (which have a much lower
          resolution)
        - added option to select timer
        - added process time timer (using systimes.py)
        - changed to use min() as timing estimator (average
          is still taken as well to provide an idea of the difference)
        - garbage collection is turned off per default
        - sys check interval is set to the highest possible value
        - calibration is now a separate step and done using
          a different strategy that allows measuring the test
          overhead more accurately
        - modified the tests to each give a run-time of between
          100-200ms using warp 10
        - changed default warp factor to 10 (from 20)
        - compared results with timeit.py and confirmed measurements
        - bumped all test versions to 2.0
        - updated platform.py to the latest version
        - changed the output format a bit to make it look
          nicer
        - refactored the APIs somewhat
  1.3+: Steve Holden added the NewInstances test and the filtering
       option during the NeedForSpeed sprint; this also triggered a long
       discussion on how to improve benchmark timing and finally
       resulted in the release of 2.0
  1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com