________________________________________________________________________

PYBENCH - A Python Benchmark Suite
________________________________________________________________________

     Extendable suite of low-level benchmarks for measuring
          the performance of the Python implementation
                 (interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, like other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.


Micro-Manual
------------

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.

If the differences are well below 10% for each test, then you have a
system that is good for doing benchmark testing. If you get random
differences of more than 10% or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web browsers, email clients, RSS
readers, music players, backup programs, etc.

If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.
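
A typical session combining these options might look as follows; the
results file name 'ref.pybench' is only a placeholder chosen for this
example:

   python pybench.py -f ref.pybench     (run the suite, save the results)
   python pybench.py -c ref.pybench     (run again, compare to saved run)
   python pybench.py -t string          (run only the string tests)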

This is the current output of pybench.py --help:

"""
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

Synopsis:
 pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage

Version:
 2.0

The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

Available timers:

  time.time
  time.clock
  systimes.processtime

Examples:

python2.1 pybench.py -f p21.pybench
python2.5 pybench.py -f p25.pybench
python pybench.py -s p25.pybench -c p21.pybench
"""


License
-------

See LICENSE file.


Sample output
-------------

"""
-------------------------------------------------------------------------------
PYBENCH 2.0
-------------------------------------------------------------------------------
* using Python 2.4.2
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
...
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:  Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
       Processor:    x86_64

    Python:
       Executable:   /usr/local/bin/python
       Version:      2.4.2
       Compiler:     GCC 3.3.4 (pre 3.3.5 20040809)
       Bits:         64bit
       Build:        Oct  1 2005 15:24:35 (#1)
       Unicode:      UCS2


Test                             minimum  average  operation  overhead
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:    126ms    145ms     0.28us   0.274ms
           BuiltinMethodLookup:    124ms    130ms     0.12us   0.316ms
                 CompareFloats:    109ms    110ms     0.09us   0.361ms
         CompareFloatsIntegers:    100ms    104ms     0.12us   0.271ms
               CompareIntegers:    137ms    138ms     0.08us   0.542ms
        CompareInternedStrings:    124ms    127ms     0.08us   1.367ms
                  CompareLongs:    100ms    104ms     0.10us   0.316ms
                CompareStrings:    111ms    115ms     0.12us   0.929ms
                CompareUnicode:    108ms    128ms     0.17us   0.693ms
                 ConcatStrings:    142ms    155ms     0.31us   0.562ms
                 ConcatUnicode:    119ms    127ms     0.42us   0.384ms
               CreateInstances:    123ms    128ms     1.14us   0.367ms
            CreateNewInstances:    121ms    126ms     1.49us   0.335ms
       CreateStringsWithConcat:    130ms    135ms     0.14us   0.916ms
       CreateUnicodeWithConcat:    130ms    135ms     0.34us   0.361ms
                  DictCreation:    108ms    109ms     0.27us   0.361ms
             DictWithFloatKeys:    149ms    153ms     0.17us   0.678ms
           DictWithIntegerKeys:    124ms    126ms     0.11us   0.915ms
            DictWithStringKeys:    114ms    117ms     0.10us   0.905ms
                      ForLoops:    110ms    111ms     4.46us   0.063ms
                    IfThenElse:    118ms    119ms     0.09us   0.685ms
                   ListSlicing:    116ms    120ms     8.59us   0.103ms
                NestedForLoops:    125ms    137ms     0.09us   0.019ms
          NormalClassAttribute:    124ms    136ms     0.11us   0.457ms
       NormalInstanceAttribute:    110ms    117ms     0.10us   0.454ms
           PythonFunctionCalls:    107ms    113ms     0.34us   0.271ms
             PythonMethodCalls:    140ms    149ms     0.66us   0.141ms
                     Recursion:    156ms    166ms     3.32us   0.452ms
                  SecondImport:    112ms    118ms     1.18us   0.180ms
           SecondPackageImport:    118ms    127ms     1.27us   0.180ms
         SecondSubmoduleImport:    140ms    151ms     1.51us   0.180ms
       SimpleComplexArithmetic:    128ms    139ms     0.16us   0.361ms
        SimpleDictManipulation:    134ms    136ms     0.11us   0.452ms
         SimpleFloatArithmetic:    110ms    113ms     0.09us   0.571ms
      SimpleIntFloatArithmetic:    106ms    111ms     0.08us   0.548ms
       SimpleIntegerArithmetic:    106ms    109ms     0.08us   0.544ms
        SimpleListManipulation:    103ms    113ms     0.10us   0.587ms
          SimpleLongArithmetic:    112ms    118ms     0.18us   0.271ms
                    SmallLists:    105ms    116ms     0.17us   0.366ms
                   SmallTuples:    108ms    128ms     0.24us   0.406ms
         SpecialClassAttribute:    119ms    136ms     0.11us   0.453ms
      SpecialInstanceAttribute:    143ms    155ms     0.13us   0.454ms
                StringMappings:    115ms    121ms     0.48us   0.405ms
              StringPredicates:    120ms    129ms     0.18us   2.064ms
                 StringSlicing:    111ms    127ms     0.23us   0.781ms
                     TryExcept:    125ms    126ms     0.06us   0.681ms
                TryRaiseExcept:    133ms    137ms     2.14us   0.361ms
                  TupleSlicing:    117ms    120ms     0.46us   0.066ms
               UnicodeMappings:    156ms    160ms     4.44us   0.429ms
             UnicodePredicates:    117ms    121ms     0.22us   2.487ms
             UnicodeProperties:    115ms    153ms     0.38us   2.070ms
                UnicodeSlicing:    126ms    129ms     0.26us   0.689ms
-------------------------------------------------------------------------------
Totals:                          6283ms   6673ms
"""
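
Note that a report like the one above can be redisplayed later from a
saved results file without rerunning the suite, e.g.:

   python pybench.py -s p25.pybench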
________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations test operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.


Here's an example:
------------------

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # number will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds rounds, executing
            self.operations number of operations each.

        """
        # Init the test
        a = 1

        # Run test rounds
        #
        # NOTE: Use xrange() for all test loops unless you want to
        # face a 20MB process!
        #
        for i in xrange(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            setup and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.

        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in xrange(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass


Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.
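
As a sketch of what that registration looks like - assuming the
IntegerCounting test above were saved as MyTests.py in the pybench
directory (both names are made up for this example) - the Setup
module would gain a single import:

    # In the Setup module: the star-import makes the Test subclasses
    # defined in MyTests visible to pybench's symbol scan.
    from MyTests import *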

Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will be listed as "n/a" to reflect the change.


Version History
---------------

 2.0: rewrote parts of pybench which resulted in more repeatable
      timings:
      - made timer a parameter
      - changed the platform default timer to use high-resolution
        timers rather than process timers (which have a much lower
        resolution)
      - added option to select timer
      - added process time timer (using systimes.py)
      - changed to use min() as timing estimator (average
        is still taken as well to provide an idea of the difference)
      - garbage collection is turned off by default
      - sys check interval is set to the highest possible value
      - calibration is now a separate step and done using
        a different strategy that allows measuring the test
        overhead more accurately
      - modified the tests to each give a run-time of between
        100-200ms using warp 10
      - changed default warp factor to 10 (from 20)
      - compared results with timeit.py and confirmed measurements
      - bumped all test versions to 2.0
      - updated platform.py to the latest version
      - changed the output format a bit to make it look
        nicer
      - refactored the APIs somewhat
 1.3+: Steve Holden added the NewInstances test and the filtering
      option during the NeedForSpeed sprint; this also triggered a long
      discussion on how to improve benchmark timing and finally
      resulted in the release of 2.0
 1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com