README

stringbench is a set of performance tests comparing byte string
operations with unicode operations.  The two string implementations
are loosely based on each other and sometimes the algorithm for one is
faster than the other.

This test set was started at the Need For Speed sprint in Reykjavik
to identify which string methods could be sped up quickly and to
identify obvious places for improvement.

Here is an example of a benchmark:


@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)

The bench decorator takes three parameters.  The first is a short
description of how the code works.  In most cases this is a Python code
snippet.  It is not the code which is actually run because the real
code is hand-optimized to focus on the method being tested.

The second parameter is a group title.  All benchmarks with the same
group title are listed together.  This lets you compare different
implementations of the same algorithm, such as "t in s"
vs. "s.find(t)".

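The grouping can be pictured as collecting snippets under their group
title.  The sketch below is only an illustration: it uses plain
(group, snippet) tuples taken from the sample output in this README
rather than real benchmark functions, and the helper name
group_benchmarks is hypothetical, not part of stringbench.

```python
from collections import OrderedDict


def group_benchmarks(entries):
    """Collect code snippets under their group title, keeping first-seen order."""
    groups = OrderedDict()
    for group, snippet in entries:
        groups.setdefault(group, []).append(snippet)
    return groups


# (group title, code snippet) pairs, deliberately not sorted by group.
entries = [
    ("early match, single character", '("A"*1000).find("A")'),
    ("count newlines", '...text.with.2000.newlines.count("\\n")'),
    ("early match, single character", '"A" in "A"*1000'),
]

for group, snippets in group_benchmarks(entries).items():
    print("=" * 10, group)
    for snippet in snippets:
        print(snippet)
```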
The last is a count.  Each benchmark loops over the algorithm either
100 or 1000 times, depending on the algorithm performance.  The output
time is the time per benchmark call, not per operation, so the count
tells the reader how to scale the reported time.

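As a rough illustration of that scaling, the sketch below divides a
reported time by the count to estimate the cost of one operation.  The
function name per_operation_ms is hypothetical and not part of
stringbench, and the estimate still includes the overhead of the
benchmark's own loop.

```python
def per_operation_ms(reported_ms, repeat_count):
    """Estimate the time of one operation from a per-call benchmark time.

    reported_ms is the time for one call of the benchmark function, which
    itself runs the operation repeat_count times (100 or 1000).
    """
    if repeat_count <= 0:
        raise ValueError("repeat_count must be positive")
    return reported_ms / repeat_count


# Example reading: a benchmark reported 38.54 ms with a count of 100.
print("%.4f ms per operation" % per_operation_ms(38.54, 100))
```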
These parameters become function attributes.


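A minimal sketch of how a decorator could turn its three parameters
into function attributes.  This is not the code from stringbench.py:
the attribute names comment, group and repeat_count, and the definition
of _RANGE_1000, are assumptions chosen for illustration.

```python
def bench(comment, group, repeat_count):
    """Attach benchmark metadata to the decorated function (illustrative)."""
    def decorator(func):
        # Attribute names are assumptions, not necessarily those used
        # in stringbench.py.
        func.comment = comment
        func.group = group
        func.repeat_count = repeat_count
        return func
    return decorator


# Assumed definition; the README only shows the name being used.
_RANGE_1000 = range(1000)


@bench('"Andrew".startswith("A")', 'startswith single character', 1000)
def startswith_single(STR):
    s1 = STR("Andrew")
    s2 = STR("A")
    s1_startswith = s1.startswith
    for x in _RANGE_1000:
        s1_startswith(s2)


startswith_single(str)  # run the benchmark body once with the str type
print(startswith_single.group, startswith_single.repeat_count)
```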
Here is an example of the output:


========== count newlines
38.54   41.60   92.7    ...text.with.2000.newlines.count("\n") (*100)
========== early match, single character
1.14    1.18    96.8    ("A"*1000).find("A") (*1000)
0.44    0.41    105.6   "A" in "A"*1000 (*1000)
1.15    1.17    98.1    ("A"*1000).index("A") (*1000)

The first column is the run time in milliseconds for byte strings.
The second is the run time for unicode strings.  The third is a
percentage: 100 * byte time / unicode time.  Values above 100 mean
unicode was faster than byte strings; values below 100 mean it was
slower.

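The third column can be reproduced with a one-line calculation.  The
sketch below assumes only the formula stated above; the function name
unicode_vs_bytes_percent is hypothetical, and the input values are the
TOTAL figures quoted at the end of this README.

```python
def unicode_vs_bytes_percent(bytes_ms, unicode_ms):
    """Return 100 * byte time / unicode time, as in the third output column."""
    return 100.0 * bytes_ms / unicode_ms


# Cumulative byte and unicode times from the sample TOTAL line.
print("%.1f" % unicode_vs_bytes_percent(4079.83, 5432.25))  # prints 75.1
```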
The last column contains the code snippet and the repeat count for the
internal benchmark loop.

The times are computed with 'timeit.py', which repeats the test more
and more times until the total time exceeds 0.2 seconds, then returns
the best time for a single iteration.

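The timing strategy can be sketched with the standard timeit module.
This is not the timing code used by stringbench: the function name
best_time, the tenfold growth of the loop count and the default of
three repeats are assumptions, and only the 0.2 second threshold comes
from the description above.  The demonstration call lowers the
threshold so that it finishes quickly.

```python
import timeit


def best_time(func, min_total=0.2, repeat=3):
    """Return the best per-call time of func, in seconds."""
    timer = timeit.Timer(func)
    number = 1
    # Grow the loop count until one timing run takes longer than min_total.
    while True:
        total = timer.timeit(number)
        if total > min_total:
            break
        number *= 10
    # Repeat at that loop count and keep the fastest run.
    times = [total] + timer.repeat(repeat - 1, number)
    return min(times) / number


elapsed = best_time(lambda: "A" in "A" * 1000, min_total=0.01)
print("%.3f microseconds per call" % (elapsed * 1e6))
```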
The final line of the output is the cumulative time for byte and
unicode strings, and the overall performance of unicode relative to
bytes.  For example:

4079.83 5432.25 75.1    TOTAL

However, this total has little meaning because it weights every test
equally.
69