1:mod:`difflib` --- Helpers for computing deltas 2=============================================== 3 4.. module:: difflib 5 :synopsis: Helpers for computing differences between objects. 6 7.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net> 8.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net> 9.. Markup by Fred L. Drake, Jr. <fdrake@acm.org> 10 11**Source code:** :source:`Lib/difflib.py` 12 13.. testsetup:: 14 15 import sys 16 from difflib import * 17 18-------------- 19 20This module provides classes and functions for comparing sequences. It 21can be used for example, for comparing files, and can produce information 22about file differences in various formats, including HTML and context and unified 23diffs. For comparing directories and files, see also, the :mod:`filecmp` module. 24 25 26.. class:: SequenceMatcher 27 :noindex: 28 29 This is a flexible class for comparing pairs of sequences of any type, so long 30 as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a 31 little fancier than, an algorithm published in the late 1980's by Ratcliff and 32 Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to 33 find the longest contiguous matching subsequence that contains no "junk" 34 elements; these "junk" elements are ones that are uninteresting in some 35 sense, such as blank lines or whitespace. (Handling junk is an 36 extension to the Ratcliff and Obershelp algorithm.) The same 37 idea is then applied recursively to the pieces of the sequences to the left and 38 to the right of the matching subsequence. This does not yield minimal edit 39 sequences, but does tend to yield matches that "look right" to people. 40 41 **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst 42 case and quadratic time in the expected case. :class:`SequenceMatcher` is 43 quadratic time for the worst case and has expected-case behavior dependent in a 44 complicated way on how many elements the sequences have in common; best case 45 time is linear. 46 47 **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that 48 automatically treats certain sequence items as junk. The heuristic counts how many 49 times each individual item appears in the sequence. If an item's duplicates (after 50 the first one) account for more than 1% of the sequence and the sequence is at least 51 200 items long, this item is marked as "popular" and is treated as junk for 52 the purpose of sequence matching. This heuristic can be turned off by setting 53 the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`. 54 55 .. versionadded:: 3.2 56 The *autojunk* parameter. 57 58 59.. class:: Differ 60 61 This is a class for comparing sequences of lines of text, and producing 62 human-readable differences or deltas. Differ uses :class:`SequenceMatcher` 63 both to compare sequences of lines, and to compare sequences of characters 64 within similar (near-matching) lines. 65 66 Each line of a :class:`Differ` delta begins with a two-letter code: 67 68 +----------+-------------------------------------------+ 69 | Code | Meaning | 70 +==========+===========================================+ 71 | ``'- '`` | line unique to sequence 1 | 72 +----------+-------------------------------------------+ 73 | ``'+ '`` | line unique to sequence 2 | 74 +----------+-------------------------------------------+ 75 | ``' '`` | line common to both sequences | 76 +----------+-------------------------------------------+ 77 | ``'? '`` | line not present in either input sequence | 78 +----------+-------------------------------------------+ 79 80 Lines beginning with '``?``' attempt to guide the eye to intraline differences, 81 and were not present in either input sequence. These lines can be confusing if 82 the sequences contain tab characters. 83 84 85.. class:: HtmlDiff 86 87 This class can be used to create an HTML table (or a complete HTML file 88 containing the table) showing a side by side, line by line comparison of text 89 with inter-line and intra-line change highlights. The table can be generated in 90 either full or contextual difference mode. 91 92 The constructor for this class is: 93 94 95 .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK) 96 97 Initializes instance of :class:`HtmlDiff`. 98 99 *tabsize* is an optional keyword argument to specify tab stop spacing and 100 defaults to ``8``. 101 102 *wrapcolumn* is an optional keyword to specify column number where lines are 103 broken and wrapped, defaults to ``None`` where lines are not wrapped. 104 105 *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff` 106 (used by :class:`HtmlDiff` to generate the side by side HTML differences). See 107 :func:`ndiff` documentation for argument default values and descriptions. 108 109 The following methods are public: 110 111 .. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \ 112 numlines=5, *, charset='utf-8') 113 114 Compares *fromlines* and *tolines* (lists of strings) and returns a string which 115 is a complete HTML file containing a table showing line by line differences with 116 inter-line and intra-line changes highlighted. 117 118 *fromdesc* and *todesc* are optional keyword arguments to specify from/to file 119 column header strings (both default to an empty string). 120 121 *context* and *numlines* are both optional keyword arguments. Set *context* to 122 ``True`` when contextual differences are to be shown, else the default is 123 ``False`` to show the full files. *numlines* defaults to ``5``. When *context* 124 is ``True`` *numlines* controls the number of context lines which surround the 125 difference highlights. When *context* is ``False`` *numlines* controls the 126 number of lines which are shown before a difference highlight when using the 127 "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place 128 the next difference highlight at the top of the browser without any leading 129 context). 130 131 .. note:: 132 *fromdesc* and *todesc* are interpreted as unescaped HTML and should be 133 properly escaped while receiving input from untrusted sources. 134 135 .. versionchanged:: 3.5 136 *charset* keyword-only argument was added. The default charset of 137 HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``. 138 139 .. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5) 140 141 Compares *fromlines* and *tolines* (lists of strings) and returns a string which 142 is a complete HTML table showing line by line differences with inter-line and 143 intra-line changes highlighted. 144 145 The arguments for this method are the same as those for the :meth:`make_file` 146 method. 147 148 :file:`Tools/scripts/diff.py` is a command-line front-end to this class and 149 contains a good example of its use. 150 151 152.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n') 153 154 Compare *a* and *b* (lists of strings); return a delta (a :term:`generator` 155 generating the delta lines) in context diff format. 156 157 Context diffs are a compact way of showing just the lines that have changed plus 158 a few lines of context. The changes are shown in a before/after style. The 159 number of context lines is set by *n* which defaults to three. 160 161 By default, the diff control lines (those with ``***`` or ``---``) are created 162 with a trailing newline. This is helpful so that inputs created from 163 :func:`io.IOBase.readlines` result in diffs that are suitable for use with 164 :func:`io.IOBase.writelines` since both the inputs and outputs have trailing 165 newlines. 166 167 For inputs that do not have trailing newlines, set the *lineterm* argument to 168 ``""`` so that the output will be uniformly newline free. 169 170 The context diff format normally has a header for filenames and modification 171 times. Any or all of these may be specified using strings for *fromfile*, 172 *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally 173 expressed in the ISO 8601 format. If not specified, the 174 strings default to blanks. 175 176 >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n'] 177 >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n'] 178 >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py', tofile='after.py')) 179 *** before.py 180 --- after.py 181 *************** 182 *** 1,4 **** 183 ! bacon 184 ! eggs 185 ! ham 186 guido 187 --- 1,4 ---- 188 ! python 189 ! eggy 190 ! hamster 191 guido 192 193 See :ref:`difflib-interface` for a more detailed example. 194 195 196.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6) 197 198 Return a list of the best "good enough" matches. *word* is a sequence for which 199 close matches are desired (typically a string), and *possibilities* is a list of 200 sequences against which to match *word* (typically a list of strings). 201 202 Optional argument *n* (default ``3``) is the maximum number of close matches to 203 return; *n* must be greater than ``0``. 204 205 Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1]. 206 Possibilities that don't score at least that similar to *word* are ignored. 207 208 The best (no more than *n*) matches among the possibilities are returned in a 209 list, sorted by similarity score, most similar first. 210 211 >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy']) 212 ['apple', 'ape'] 213 >>> import keyword 214 >>> get_close_matches('wheel', keyword.kwlist) 215 ['while'] 216 >>> get_close_matches('pineapple', keyword.kwlist) 217 [] 218 >>> get_close_matches('accept', keyword.kwlist) 219 ['except'] 220 221 222.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK) 223 224 Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style 225 delta (a :term:`generator` generating the delta lines). 226 227 Optional keyword parameters *linejunk* and *charjunk* are filtering functions 228 (or ``None``): 229 230 *linejunk*: A function that accepts a single string argument, and returns 231 true if the string is junk, or false if not. The default is ``None``. There 232 is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines 233 without visible characters, except for at most one pound character (``'#'``) 234 -- however the underlying :class:`SequenceMatcher` class does a dynamic 235 analysis of which lines are so frequent as to constitute noise, and this 236 usually works better than using this function. 237 238 *charjunk*: A function that accepts a character (a string of length 1), and 239 returns if the character is junk, or false if not. The default is module-level 240 function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a 241 blank or tab; it's a bad idea to include newline in this!). 242 243 :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function. 244 245 >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True), 246 ... 'ore\ntree\nemu\n'.splitlines(keepends=True)) 247 >>> print(''.join(diff), end="") 248 - one 249 ? ^ 250 + ore 251 ? ^ 252 - two 253 - three 254 ? - 255 + tree 256 + emu 257 258 259.. function:: restore(sequence, which) 260 261 Return one of the two sequences that generated a delta. 262 263 Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract 264 lines originating from file 1 or 2 (parameter *which*), stripping off line 265 prefixes. 266 267 Example: 268 269 >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True), 270 ... 'ore\ntree\nemu\n'.splitlines(keepends=True)) 271 >>> diff = list(diff) # materialize the generated delta into a list 272 >>> print(''.join(restore(diff, 1)), end="") 273 one 274 two 275 three 276 >>> print(''.join(restore(diff, 2)), end="") 277 ore 278 tree 279 emu 280 281 282.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n') 283 284 Compare *a* and *b* (lists of strings); return a delta (a :term:`generator` 285 generating the delta lines) in unified diff format. 286 287 Unified diffs are a compact way of showing just the lines that have changed plus 288 a few lines of context. The changes are shown in an inline style (instead of 289 separate before/after blocks). The number of context lines is set by *n* which 290 defaults to three. 291 292 By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are 293 created with a trailing newline. This is helpful so that inputs created from 294 :func:`io.IOBase.readlines` result in diffs that are suitable for use with 295 :func:`io.IOBase.writelines` since both the inputs and outputs have trailing 296 newlines. 297 298 For inputs that do not have trailing newlines, set the *lineterm* argument to 299 ``""`` so that the output will be uniformly newline free. 300 301 The context diff format normally has a header for filenames and modification 302 times. Any or all of these may be specified using strings for *fromfile*, 303 *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally 304 expressed in the ISO 8601 format. If not specified, the 305 strings default to blanks. 306 307 308 >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n'] 309 >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n'] 310 >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py')) 311 --- before.py 312 +++ after.py 313 @@ -1,4 +1,4 @@ 314 -bacon 315 -eggs 316 -ham 317 +python 318 +eggy 319 +hamster 320 guido 321 322 See :ref:`difflib-interface` for a more detailed example. 323 324.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\\n') 325 326 Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a 327 sequence of delta lines (also bytes) in the format returned by *dfunc*. 328 *dfunc* must be a callable, typically either :func:`unified_diff` or 329 :func:`context_diff`. 330 331 Allows you to compare data with unknown or inconsistent encoding. All 332 inputs except *n* must be bytes objects, not str. Works by losslessly 333 converting all inputs (except *n*) to str, and calling ``dfunc(a, b, 334 fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The output of 335 *dfunc* is then converted back to bytes, so the delta lines that you 336 receive have the same unknown/inconsistent encodings as *a* and *b*. 337 338 .. versionadded:: 3.5 339 340.. function:: IS_LINE_JUNK(line) 341 342 Return ``True`` for ignorable lines. The line *line* is ignorable if *line* is 343 blank or contains a single ``'#'``, otherwise it is not ignorable. Used as a 344 default for parameter *linejunk* in :func:`ndiff` in older versions. 345 346 347.. function:: IS_CHARACTER_JUNK(ch) 348 349 Return ``True`` for ignorable characters. The character *ch* is ignorable if *ch* 350 is a space or tab, otherwise it is not ignorable. Used as a default for 351 parameter *charjunk* in :func:`ndiff`. 352 353 354.. seealso:: 355 356 `Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_ 357 Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This 358 was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988. 359 360 361.. _sequence-matcher: 362 363SequenceMatcher Objects 364----------------------- 365 366The :class:`SequenceMatcher` class has this constructor: 367 368 369.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True) 370 371 Optional argument *isjunk* must be ``None`` (the default) or a one-argument 372 function that takes a sequence element and returns true if and only if the 373 element is "junk" and should be ignored. Passing ``None`` for *isjunk* is 374 equivalent to passing ``lambda x: False``; in other words, no elements are ignored. 375 For example, pass:: 376 377 lambda x: x in " \t" 378 379 if you're comparing lines as sequences of characters, and don't want to synch up 380 on blanks or hard tabs. 381 382 The optional arguments *a* and *b* are sequences to be compared; both default to 383 empty strings. The elements of both sequences must be :term:`hashable`. 384 385 The optional argument *autojunk* can be used to disable the automatic junk 386 heuristic. 387 388 .. versionadded:: 3.2 389 The *autojunk* parameter. 390 391 SequenceMatcher objects get three data attributes: *bjunk* is the 392 set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of 393 non-junk elements considered popular by the heuristic (if it is not 394 disabled); *b2j* is a dict mapping the remaining elements of *b* to a list 395 of positions where they occur. All three are reset whenever *b* is reset 396 with :meth:`set_seqs` or :meth:`set_seq2`. 397 398 .. versionadded:: 3.2 399 The *bjunk* and *bpopular* attributes. 400 401 :class:`SequenceMatcher` objects have the following methods: 402 403 .. method:: set_seqs(a, b) 404 405 Set the two sequences to be compared. 406 407 :class:`SequenceMatcher` computes and caches detailed information about the 408 second sequence, so if you want to compare one sequence against many 409 sequences, use :meth:`set_seq2` to set the commonly used sequence once and 410 call :meth:`set_seq1` repeatedly, once for each of the other sequences. 411 412 413 .. method:: set_seq1(a) 414 415 Set the first sequence to be compared. The second sequence to be compared 416 is not changed. 417 418 419 .. method:: set_seq2(b) 420 421 Set the second sequence to be compared. The first sequence to be compared 422 is not changed. 423 424 425 .. method:: find_longest_match(alo=0, ahi=None, blo=0, bhi=None) 426 427 Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``. 428 429 If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns 430 ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo 431 <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j', 432 k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i 433 <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of 434 all maximal matching blocks, return one that starts earliest in *a*, and 435 of all those maximal matching blocks that start earliest in *a*, return 436 the one that starts earliest in *b*. 437 438 >>> s = SequenceMatcher(None, " abcd", "abcd abcd") 439 >>> s.find_longest_match(0, 5, 0, 9) 440 Match(a=0, b=4, size=5) 441 442 If *isjunk* was provided, first the longest matching block is determined 443 as above, but with the additional restriction that no junk element appears 444 in the block. Then that block is extended as far as possible by matching 445 (only) junk elements on both sides. So the resulting block never matches 446 on junk except as identical junk happens to be adjacent to an interesting 447 match. 448 449 Here's the same example as before, but considering blanks to be junk. That 450 prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the 451 second sequence directly. Instead only the ``'abcd'`` can match, and 452 matches the leftmost ``'abcd'`` in the second sequence: 453 454 >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd") 455 >>> s.find_longest_match(0, 5, 0, 9) 456 Match(a=1, b=0, size=4) 457 458 If no blocks match, this returns ``(alo, blo, 0)``. 459 460 This method returns a :term:`named tuple` ``Match(a, b, size)``. 461 462 .. versionchanged:: 3.9 463 Added default arguments. 464 465 466 .. method:: get_matching_blocks() 467 468 Return list of triples describing non-overlapping matching subsequences. 469 Each triple is of the form ``(i, j, n)``, 470 and means that ``a[i:i+n] == b[j:j+n]``. The 471 triples are monotonically increasing in *i* and *j*. 472 473 The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It 474 is the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')`` 475 are adjacent triples in the list, and the second is not the last triple in 476 the list, then ``i+n < i'`` or ``j+n < j'``; in other words, adjacent 477 triples always describe non-adjacent equal blocks. 478 479 .. XXX Explain why a dummy is used! 480 481 .. doctest:: 482 483 >>> s = SequenceMatcher(None, "abxcd", "abcd") 484 >>> s.get_matching_blocks() 485 [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)] 486 487 488 .. method:: get_opcodes() 489 490 Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is 491 of the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 == 492 0``, and remaining tuples have *i1* equal to the *i2* from the preceding 493 tuple, and, likewise, *j1* equal to the previous *j2*. 494 495 The *tag* values are strings, with these meanings: 496 497 +---------------+---------------------------------------------+ 498 | Value | Meaning | 499 +===============+=============================================+ 500 | ``'replace'`` | ``a[i1:i2]`` should be replaced by | 501 | | ``b[j1:j2]``. | 502 +---------------+---------------------------------------------+ 503 | ``'delete'`` | ``a[i1:i2]`` should be deleted. Note that | 504 | | ``j1 == j2`` in this case. | 505 +---------------+---------------------------------------------+ 506 | ``'insert'`` | ``b[j1:j2]`` should be inserted at | 507 | | ``a[i1:i1]``. Note that ``i1 == i2`` in | 508 | | this case. | 509 +---------------+---------------------------------------------+ 510 | ``'equal'`` | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences | 511 | | are equal). | 512 +---------------+---------------------------------------------+ 513 514 For example:: 515 516 >>> a = "qabxcd" 517 >>> b = "abycdf" 518 >>> s = SequenceMatcher(None, a, b) 519 >>> for tag, i1, i2, j1, j2 in s.get_opcodes(): 520 ... print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format( 521 ... tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2])) 522 delete a[0:1] --> b[0:0] 'q' --> '' 523 equal a[1:3] --> b[0:2] 'ab' --> 'ab' 524 replace a[3:4] --> b[2:3] 'x' --> 'y' 525 equal a[4:6] --> b[3:5] 'cd' --> 'cd' 526 insert a[6:6] --> b[5:6] '' --> 'f' 527 528 529 .. method:: get_grouped_opcodes(n=3) 530 531 Return a :term:`generator` of groups with up to *n* lines of context. 532 533 Starting with the groups returned by :meth:`get_opcodes`, this method 534 splits out smaller change clusters and eliminates intervening ranges which 535 have no changes. 536 537 The groups are returned in the same format as :meth:`get_opcodes`. 538 539 540 .. method:: ratio() 541 542 Return a measure of the sequences' similarity as a float in the range [0, 543 1]. 544 545 Where T is the total number of elements in both sequences, and M is the 546 number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the 547 sequences are identical, and ``0.0`` if they have nothing in common. 548 549 This is expensive to compute if :meth:`get_matching_blocks` or 550 :meth:`get_opcodes` hasn't already been called, in which case you may want 551 to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an 552 upper bound. 553 554 .. note:: 555 556 Caution: The result of a :meth:`ratio` call may depend on the order of 557 the arguments. For instance:: 558 559 >>> SequenceMatcher(None, 'tide', 'diet').ratio() 560 0.25 561 >>> SequenceMatcher(None, 'diet', 'tide').ratio() 562 0.5 563 564 565 .. method:: quick_ratio() 566 567 Return an upper bound on :meth:`ratio` relatively quickly. 568 569 570 .. method:: real_quick_ratio() 571 572 Return an upper bound on :meth:`ratio` very quickly. 573 574 575The three methods that return the ratio of matching to total characters can give 576different results due to differing levels of approximation, although 577:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as 578:meth:`ratio`: 579 580 >>> s = SequenceMatcher(None, "abcd", "bcde") 581 >>> s.ratio() 582 0.75 583 >>> s.quick_ratio() 584 0.75 585 >>> s.real_quick_ratio() 586 1.0 587 588 589.. _sequencematcher-examples: 590 591SequenceMatcher Examples 592------------------------ 593 594This example compares two strings, considering blanks to be "junk": 595 596 >>> s = SequenceMatcher(lambda x: x == " ", 597 ... "private Thread currentThread;", 598 ... "private volatile Thread currentThread;") 599 600:meth:`ratio` returns a float in [0, 1], measuring the similarity of the 601sequences. As a rule of thumb, a :meth:`ratio` value over 0.6 means the 602sequences are close matches: 603 604 >>> print(round(s.ratio(), 3)) 605 0.866 606 607If you're only interested in where the sequences match, 608:meth:`get_matching_blocks` is handy: 609 610 >>> for block in s.get_matching_blocks(): 611 ... print("a[%d] and b[%d] match for %d elements" % block) 612 a[0] and b[0] match for 8 elements 613 a[8] and b[17] match for 21 elements 614 a[29] and b[38] match for 0 elements 615 616Note that the last tuple returned by :meth:`get_matching_blocks` is always a 617dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last 618tuple element (number of elements matched) is ``0``. 619 620If you want to know how to change the first sequence into the second, use 621:meth:`get_opcodes`: 622 623 >>> for opcode in s.get_opcodes(): 624 ... print("%6s a[%d:%d] b[%d:%d]" % opcode) 625 equal a[0:8] b[0:8] 626 insert a[8:8] b[8:17] 627 equal a[8:29] b[17:38] 628 629.. seealso:: 630 631 * The :func:`get_close_matches` function in this module which shows how 632 simple code building on :class:`SequenceMatcher` can be used to do useful 633 work. 634 635 * `Simple version control recipe 636 <https://code.activestate.com/recipes/576729/>`_ for a small application 637 built with :class:`SequenceMatcher`. 638 639 640.. _differ-objects: 641 642Differ Objects 643-------------- 644 645Note that :class:`Differ`\ -generated deltas make no claim to be **minimal** 646diffs. To the contrary, minimal diffs are often counter-intuitive, because they 647synch up anywhere possible, sometimes accidental matches 100 pages apart. 648Restricting synch points to contiguous matches preserves some notion of 649locality, at the occasional cost of producing a longer diff. 650 651The :class:`Differ` class has this constructor: 652 653 654.. class:: Differ(linejunk=None, charjunk=None) 655 :noindex: 656 657 Optional keyword parameters *linejunk* and *charjunk* are for filter functions 658 (or ``None``): 659 660 *linejunk*: A function that accepts a single string argument, and returns true 661 if the string is junk. The default is ``None``, meaning that no line is 662 considered junk. 663 664 *charjunk*: A function that accepts a single character argument (a string of 665 length 1), and returns true if the character is junk. The default is ``None``, 666 meaning that no character is considered junk. 667 668 These junk-filtering functions speed up matching to find 669 differences and do not cause any differing lines or characters to 670 be ignored. Read the description of the 671 :meth:`~SequenceMatcher.find_longest_match` method's *isjunk* 672 parameter for an explanation. 673 674 :class:`Differ` objects are used (deltas generated) via a single method: 675 676 677 .. method:: Differ.compare(a, b) 678 679 Compare two sequences of lines, and generate the delta (a sequence of lines). 680 681 Each sequence must contain individual single-line strings ending with 682 newlines. Such sequences can be obtained from the 683 :meth:`~io.IOBase.readlines` method of file-like objects. The delta 684 generated also consists of newline-terminated strings, ready to be 685 printed as-is via the :meth:`~io.IOBase.writelines` method of a 686 file-like object. 687 688 689.. _differ-examples: 690 691Differ Example 692-------------- 693 694This example compares two texts. First we set up the texts, sequences of 695individual single-line strings ending with newlines (such sequences can also be 696obtained from the :meth:`~io.BaseIO.readlines` method of file-like objects): 697 698 >>> text1 = ''' 1. Beautiful is better than ugly. 699 ... 2. Explicit is better than implicit. 700 ... 3. Simple is better than complex. 701 ... 4. Complex is better than complicated. 702 ... '''.splitlines(keepends=True) 703 >>> len(text1) 704 4 705 >>> text1[0][-1] 706 '\n' 707 >>> text2 = ''' 1. Beautiful is better than ugly. 708 ... 3. Simple is better than complex. 709 ... 4. Complicated is better than complex. 710 ... 5. Flat is better than nested. 711 ... '''.splitlines(keepends=True) 712 713Next we instantiate a Differ object: 714 715 >>> d = Differ() 716 717Note that when instantiating a :class:`Differ` object we may pass functions to 718filter out line and character "junk." See the :meth:`Differ` constructor for 719details. 720 721Finally, we compare the two: 722 723 >>> result = list(d.compare(text1, text2)) 724 725``result`` is a list of strings, so let's pretty-print it: 726 727 >>> from pprint import pprint 728 >>> pprint(result) 729 [' 1. Beautiful is better than ugly.\n', 730 '- 2. Explicit is better than implicit.\n', 731 '- 3. Simple is better than complex.\n', 732 '+ 3. Simple is better than complex.\n', 733 '? ++\n', 734 '- 4. Complex is better than complicated.\n', 735 '? ^ ---- ^\n', 736 '+ 4. Complicated is better than complex.\n', 737 '? ++++ ^ ^\n', 738 '+ 5. Flat is better than nested.\n'] 739 740As a single multi-line string it looks like this: 741 742 >>> import sys 743 >>> sys.stdout.writelines(result) 744 1. Beautiful is better than ugly. 745 - 2. Explicit is better than implicit. 746 - 3. Simple is better than complex. 747 + 3. Simple is better than complex. 748 ? ++ 749 - 4. Complex is better than complicated. 750 ? ^ ---- ^ 751 + 4. Complicated is better than complex. 752 ? ++++ ^ ^ 753 + 5. Flat is better than nested. 754 755 756.. _difflib-interface: 757 758A command-line interface to difflib 759----------------------------------- 760 761This example shows how to use difflib to create a ``diff``-like utility. 762It is also contained in the Python source distribution, as 763:file:`Tools/scripts/diff.py`. 764 765.. literalinclude:: ../../Tools/scripts/diff.py 766