:mod:`difflib` --- Helpers for computing deltas
===============================================

.. module:: difflib
   :synopsis: Helpers for computing differences between objects.

.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>

**Source code:** :source:`Lib/difflib.py`

.. testsetup::

   import sys
   from difflib import *

--------------

This module provides classes and functions for comparing sequences. It
can be used, for example, for comparing files, and can produce difference
information in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also the :mod:`filecmp` module.


.. class:: SequenceMatcher

   This is a flexible class for comparing pairs of sequences of any type, so long
   as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a
   little fancier than, an algorithm published in the late 1980's by Ratcliff and
   Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
   find the longest contiguous matching subsequence that contains no "junk"
   elements; these "junk" elements are ones that are uninteresting in some
   sense, such as blank lines or whitespace. (Handling junk is an
   extension to the Ratcliff and Obershelp algorithm.) The same
   idea is then applied recursively to the pieces of the sequences to the left and
   to the right of the matching subsequence. This does not yield minimal edit
   sequences, but does tend to yield matches that "look right" to people.

   **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst
   case and quadratic time in the expected case.
   :class:`SequenceMatcher` is
   quadratic time for the worst case and has expected-case behavior dependent in a
   complicated way on how many elements the sequences have in common; best case
   time is linear.

   **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
   automatically treats certain sequence items as junk. The heuristic counts how many
   times each individual item appears in the sequence. If an item's duplicates (after
   the first one) account for more than 1% of the sequence and the sequence is at least
   200 items long, this item is marked as "popular" and is treated as junk for
   the purpose of sequence matching. This heuristic can be turned off by setting
   the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.

   .. versionadded:: 3.2
      The *autojunk* parameter.


.. class:: Differ

   This is a class for comparing sequences of lines of text, and producing
   human-readable differences or deltas. Differ uses :class:`SequenceMatcher`
   both to compare sequences of lines, and to compare sequences of characters
   within similar (near-matching) lines.

   Each line of a :class:`Differ` delta begins with a two-letter code:

   +----------+-------------------------------------------+
   | Code     | Meaning                                   |
   +==========+===========================================+
   | ``'- '`` | line unique to sequence 1                 |
   +----------+-------------------------------------------+
   | ``'+ '`` | line unique to sequence 2                 |
   +----------+-------------------------------------------+
   | ``'  '`` | line common to both sequences             |
   +----------+-------------------------------------------+
   | ``'? '`` | line not present in either input sequence |
   +----------+-------------------------------------------+

   Lines beginning with '``?``' attempt to guide the eye to intraline differences,
   and were not present in either input sequence.
   These lines can be confusing if
   the sequences contain tab characters.


.. class:: HtmlDiff

   This class can be used to create an HTML table (or a complete HTML file
   containing the table) showing a side by side, line by line comparison of text
   with inter-line and intra-line change highlights. The table can be generated in
   either full or contextual difference mode.

   The constructor for this class is:


   .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)

      Initializes an instance of :class:`HtmlDiff`.

      *tabsize* is an optional keyword argument to specify tab stop spacing and
      defaults to ``8``.

      *wrapcolumn* is an optional keyword argument specifying the column number
      at which lines are broken and wrapped. It defaults to ``None``, in which
      case lines are not wrapped.

      *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
      (used by :class:`HtmlDiff` to generate the side by side HTML differences). See
      :func:`ndiff` documentation for argument default values and descriptions.

   The following methods are public:

   .. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \
                         numlines=5, *, charset='utf-8')

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML file containing a table showing line by line differences with
      inter-line and intra-line changes highlighted.

      *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
      column header strings (both default to an empty string).

      *context* and *numlines* are both optional keyword arguments. Set *context* to
      ``True`` when contextual differences are to be shown, else the default is
      ``False`` to show the full files. *numlines* defaults to ``5``. When *context*
      is ``True`` *numlines* controls the number of context lines which surround the
      difference highlights.
      When *context* is ``False`` *numlines* controls the
      number of lines which are shown before a difference highlight when using the
      "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
      the next difference highlight at the top of the browser without any leading
      context).

      .. note::
         *fromdesc* and *todesc* are interpreted as unescaped HTML and should be
         properly escaped while receiving input from untrusted sources.

      .. versionchanged:: 3.5
         *charset* keyword-only argument was added. The default charset of
         HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``.

   .. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML table showing line by line differences with inter-line and
      intra-line changes highlighted.

      The arguments for this method are the same as those for the :meth:`make_file`
      method.

   :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
   contains a good example of its use.


.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in context diff format.

   Context diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in a before/after style. The
   number of context lines is set by *n* which defaults to three.

   By default, the diff control lines (those with ``***`` or ``---``) are created
   with a trailing newline. This is helpful so that inputs created from
   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.
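
   As a minimal sketch of that round trip, the following uses
   :class:`io.StringIO` objects as stand-ins for real files (an assumption made
   only to keep the example self-contained; the sample text is illustrative)::

      import io
      from difflib import context_diff

      # readlines() keeps the trailing newline on every line.
      before = io.StringIO('bacon\neggs\nham\nguido\n').readlines()
      after = io.StringIO('python\neggy\nhamster\nguido\n').readlines()

      # With the default lineterm the control lines end in a newline as well,
      # so the whole delta can be handed straight to writelines().
      out = io.StringIO()
      out.writelines(context_diff(before, after,
                                  fromfile='before.py', tofile='after.py'))
      report = out.getvalue()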

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The context diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.

      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py', tofile='after.py'))
      *** before.py
      --- after.py
      ***************
      *** 1,4 ****
      ! bacon
      ! eggs
      ! ham
        guido
      --- 1,4 ----
      ! python
      ! eggy
      ! hamster
        guido

   See :ref:`difflib-interface` for a more detailed example.


.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)

   Return a list of the best "good enough" matches. *word* is a sequence for which
   close matches are desired (typically a string), and *possibilities* is a list of
   sequences against which to match *word* (typically a list of strings).

   Optional argument *n* (default ``3``) is the maximum number of close matches to
   return; *n* must be greater than ``0``.

   Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
   Possibilities that don't score at least that similar to *word* are ignored.

   The best (no more than *n*) matches among the possibilities are returned in a
   list, sorted by similarity score, most similar first.
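
   How *n* and *cutoff* interact can be sketched as follows. The word list and
   the variable names are illustrative only, and the comments describe what the
   two parameters are meant to do rather than exact scores::

      from difflib import get_close_matches

      candidates = ['ape', 'apple', 'peach', 'puppy']

      # n caps the number of results; the most similar candidate comes first.
      best_only = get_close_matches('appel', candidates, n=1)

      # A stricter cutoff discards candidates that only pass the
      # default threshold of 0.6.
      strict = get_close_matches('appel', candidates, cutoff=0.9)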

      >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
      ['apple', 'ape']
      >>> import keyword
      >>> get_close_matches('wheel', keyword.kwlist)
      ['while']
      >>> get_close_matches('pineapple', keyword.kwlist)
      []
      >>> get_close_matches('accept', keyword.kwlist)
      ['except']


.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)

   Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
   delta (a :term:`generator` generating the delta lines).

   Optional keyword parameters *linejunk* and *charjunk* are filtering functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns
   true if the string is junk, or false if not. The default is ``None``. There
   is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
   without visible characters, except for at most one pound character (``'#'``).
   However, the underlying :class:`SequenceMatcher` class does a dynamic
   analysis of which lines are so frequent as to constitute noise, and this
   usually works better than using this function.

   *charjunk*: A function that accepts a character (a string of length 1), and
   returns true if the character is junk, or false if not. The default is the
   module-level function :func:`IS_CHARACTER_JUNK`, which filters out whitespace
   characters (a blank or tab; it's a bad idea to include newline in this!).

   :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
      ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
      >>> print(''.join(diff), end="")
      - one
      ?  ^
      + ore
      ?  ^
      - two
      - three
      ?  -
      + tree
      + emu


.. function:: restore(sequence, which)

   Return one of the two sequences that generated a delta.

   Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
   lines originating from file 1 or 2 (parameter *which*), stripping off line
   prefixes.

   Example:

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
      ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
      >>> diff = list(diff) # materialize the generated delta into a list
      >>> print(''.join(restore(diff, 1)), end="")
      one
      two
      three
      >>> print(''.join(restore(diff, 2)), end="")
      ore
      tree
      emu


.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in unified diff format.

   Unified diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in an inline style (instead of
   separate before/after blocks). The number of context lines is set by *n* which
   defaults to three.

   By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
   created with a trailing newline. This is helpful so that inputs created from
   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The unified diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.
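
   A minimal sketch of the header strings and *lineterm* used together,
   assuming input lines without trailing newlines. The file names and dates
   are placeholders::

      from difflib import unified_diff

      # splitlines() without keepends drops the newlines.
      old = 'bacon\neggs\nham\nguido'.splitlines()
      new = 'python\neggy\nhamster\nguido'.splitlines()

      delta = list(unified_diff(old, new,
                                fromfile='before.py', tofile='after.py',
                                fromfiledate='2005-01-26',
                                tofiledate='2010-04-02',
                                lineterm=''))

      # No delta line carries a newline, so join them explicitly for display.
      print('\n'.join(delta))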

      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
      --- before.py
      +++ after.py
      @@ -1,4 +1,4 @@
      -bacon
      -eggs
      -ham
      +python
      +eggy
      +hamster
       guido

   See :ref:`difflib-interface` for a more detailed example.

.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\\n')

   Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a
   sequence of delta lines (also bytes) in the format returned by *dfunc*.
   *dfunc* must be a callable, typically either :func:`unified_diff` or
   :func:`context_diff`.

   Allows you to compare data with unknown or inconsistent encoding. All
   inputs except *n* must be bytes objects, not str. Works by losslessly
   converting all inputs (except *n*) to str, and calling ``dfunc(a, b,
   fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The output of
   *dfunc* is then converted back to bytes, so the delta lines that you
   receive have the same unknown/inconsistent encodings as *a* and *b*.

   .. versionadded:: 3.5

.. function:: IS_LINE_JUNK(line)

   Return ``True`` for ignorable lines. The line *line* is ignorable if *line* is
   blank or contains a single ``'#'``, otherwise it is not ignorable. Used as a
   default for parameter *linejunk* in :func:`ndiff` in older versions.


.. function:: IS_CHARACTER_JUNK(ch)

   Return ``True`` for ignorable characters. The character *ch* is ignorable if *ch*
   is a space or tab, otherwise it is not ignorable. Used as a default for
   parameter *charjunk* in :func:`ndiff`.


.. seealso::

   `Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
      Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
      was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988.


.. _sequence-matcher:

SequenceMatcher Objects
-----------------------

The :class:`SequenceMatcher` class has this constructor:


.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

   Optional argument *isjunk* must be ``None`` (the default) or a one-argument
   function that takes a sequence element and returns true if and only if the
   element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
   equivalent to passing ``lambda x: False``; in other words, no elements are ignored.
   For example, pass::

      lambda x: x in " \t"

   if you're comparing lines as sequences of characters, and don't want to synch up
   on blanks or hard tabs.

   The optional arguments *a* and *b* are sequences to be compared; both default to
   empty strings. The elements of both sequences must be :term:`hashable`.

   The optional argument *autojunk* can be used to disable the automatic junk
   heuristic.

   .. versionadded:: 3.2
      The *autojunk* parameter.

   SequenceMatcher objects get three data attributes: *bjunk* is the
   set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
   non-junk elements considered popular by the heuristic (if it is not
   disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
   of positions where they occur. All three are reset whenever *b* is reset
   with :meth:`set_seqs` or :meth:`set_seq2`.

   .. versionadded:: 3.2
      The *bjunk* and *bpopular* attributes.

   :class:`SequenceMatcher` objects have the following methods:

   .. method:: set_seqs(a, b)

      Set the two sequences to be compared.

   :class:`SequenceMatcher` computes and caches detailed information about the
   second sequence, so if you want to compare one sequence against many
   sequences, use :meth:`set_seq2` to set the commonly used sequence once and
   call :meth:`set_seq1` repeatedly, once for each of the other sequences.


   .. method:: set_seq1(a)

      Set the first sequence to be compared. The second sequence to be compared
      is not changed.


   .. method:: set_seq2(b)

      Set the second sequence to be compared. The first sequence to be compared
      is not changed.


   .. method:: find_longest_match(alo, ahi, blo, bhi)

      Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.

      If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
      <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
      k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
      <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
      all maximal matching blocks, return one that starts earliest in *a*, and
      of all those maximal matching blocks that start earliest in *a*, return
      the one that starts earliest in *b*.

         >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=0, b=4, size=5)

      If *isjunk* was provided, first the longest matching block is determined
      as above, but with the additional restriction that no junk element appears
      in the block. Then that block is extended as far as possible by matching
      (only) junk elements on both sides. So the resulting block never matches
      on junk except as identical junk happens to be adjacent to an interesting
      match.

      Here's the same example as before, but considering blanks to be junk.
      That
      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
      second sequence directly. Instead only the ``'abcd'`` can match, and
      matches the leftmost ``'abcd'`` in the second sequence:

         >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=1, b=0, size=4)

      If no blocks match, this returns ``(alo, blo, 0)``.

      This method returns a :term:`named tuple` ``Match(a, b, size)``.


   .. method:: get_matching_blocks()

      Return list of triples describing non-overlapping matching subsequences.
      Each triple is of the form ``(i, j, n)``,
      and means that ``a[i:i+n] == b[j:j+n]``. The
      triples are monotonically increasing in *i* and *j*.

      The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It
      is the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')``
      are adjacent triples in the list, and the second is not the last triple in
      the list, then ``i+n < i'`` or ``j+n < j'``; in other words, adjacent
      triples always describe non-adjacent equal blocks.

      .. XXX Explain why a dummy is used!

      .. doctest::

         >>> s = SequenceMatcher(None, "abxcd", "abcd")
         >>> s.get_matching_blocks()
         [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]


   .. method:: get_opcodes()

      Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is
      of the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 ==
      0``, and remaining tuples have *i1* equal to the *i2* from the preceding
      tuple, and, likewise, *j1* equal to the previous *j2*.
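
      The chaining property described above can be checked directly. This is
      only a sketch over two sample strings, and the variable names are
      illustrative::

         from difflib import SequenceMatcher

         a, b = "qabxcd", "abycdf"
         opcodes = SequenceMatcher(None, a, b).get_opcodes()

         # Every tuple must start exactly where the previous one ended,
         # in both sequences; the first one starts at (0, 0).
         expected_start = (0, 0)
         chained = True
         for tag, i1, i2, j1, j2 in opcodes:
             chained = chained and (i1, j1) == expected_start
             expected_start = (i2, j2)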

      The *tag* values are strings, with these meanings:

      +---------------+---------------------------------------------+
      | Value         | Meaning                                     |
      +===============+=============================================+
      | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
      |               | ``b[j1:j2]``.                               |
      +---------------+---------------------------------------------+
      | ``'delete'``  | ``a[i1:i2]`` should be deleted.  Note that  |
      |               | ``j1 == j2`` in this case.                  |
      +---------------+---------------------------------------------+
      | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
      |               | ``a[i1:i1]``.  Note that ``i1 == i2`` in    |
      |               | this case.                                  |
      +---------------+---------------------------------------------+
      | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
      |               | are equal).                                 |
      +---------------+---------------------------------------------+

      For example::

        >>> a = "qabxcd"
        >>> b = "abycdf"
        >>> s = SequenceMatcher(None, a, b)
        >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
        ...     print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
        ...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
        delete    a[0:1] --> b[0:0]      'q' --> ''
        equal     a[1:3] --> b[0:2]     'ab' --> 'ab'
        replace   a[3:4] --> b[2:3]      'x' --> 'y'
        equal     a[4:6] --> b[3:5]     'cd' --> 'cd'
        insert    a[6:6] --> b[5:6]       '' --> 'f'


   .. method:: get_grouped_opcodes(n=3)

      Return a :term:`generator` of groups with up to *n* lines of context.

      Starting with the groups returned by :meth:`get_opcodes`, this method
      splits out smaller change clusters and eliminates intervening ranges which
      have no changes.

      The groups are returned in the same format as :meth:`get_opcodes`.


   .. method:: ratio()

      Return a measure of the sequences' similarity as a float in the range [0,
      1].

      Where T is the total number of elements in both sequences, and M is the
      number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
      sequences are identical, and ``0.0`` if they have nothing in common.

      This is expensive to compute if :meth:`get_matching_blocks` or
      :meth:`get_opcodes` hasn't already been called, in which case you may want
      to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
      upper bound.

      .. note::

         Caution: The result of a :meth:`ratio` call may depend on the order of
         the arguments. For instance::

            >>> SequenceMatcher(None, 'tide', 'diet').ratio()
            0.25
            >>> SequenceMatcher(None, 'diet', 'tide').ratio()
            0.5


   .. method:: quick_ratio()

      Return an upper bound on :meth:`ratio` relatively quickly.


   .. method:: real_quick_ratio()

      Return an upper bound on :meth:`ratio` very quickly.


The three methods that return the ratio of matching to total characters can give
different results due to differing levels of approximation, although
:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
:meth:`ratio`:

   >>> s = SequenceMatcher(None, "abcd", "bcde")
   >>> s.ratio()
   0.75
   >>> s.quick_ratio()
   0.75
   >>> s.real_quick_ratio()
   1.0


.. _sequencematcher-examples:

SequenceMatcher Examples
------------------------

This example compares two strings, considering blanks to be "junk":

   >>> s = SequenceMatcher(lambda x: x == " ",
   ...                     "private Thread currentThread;",
   ...                     "private volatile Thread currentThread;")

:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
sequences.
As a rule of thumb, a :meth:`ratio` value over 0.6 means the
sequences are close matches:

   >>> print(round(s.ratio(), 3))
   0.866

If you're only interested in where the sequences match,
:meth:`get_matching_blocks` is handy:

   >>> for block in s.get_matching_blocks():
   ...     print("a[%d] and b[%d] match for %d elements" % block)
   a[0] and b[0] match for 8 elements
   a[8] and b[17] match for 21 elements
   a[29] and b[38] match for 0 elements

Note that the last tuple returned by :meth:`get_matching_blocks` is always a
dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
tuple element (number of elements matched) is ``0``.

If you want to know how to change the first sequence into the second, use
:meth:`get_opcodes`:

   >>> for opcode in s.get_opcodes():
   ...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
    equal a[0:8] b[0:8]
   insert a[8:8] b[8:17]
    equal a[8:29] b[17:38]

.. seealso::

   * The :func:`get_close_matches` function in this module which shows how
     simple code building on :class:`SequenceMatcher` can be used to do useful
     work.

   * `Simple version control recipe
     <https://code.activestate.com/recipes/576729/>`_ for a small application
     built with :class:`SequenceMatcher`.


.. _differ-objects:

Differ Objects
--------------

Note that :class:`Differ`\ -generated deltas make no claim to be **minimal**
diffs. To the contrary, minimal diffs are often counter-intuitive, because they
synch up anywhere possible, sometimes accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.

The :class:`Differ` class has this constructor:


.. class:: Differ(linejunk=None, charjunk=None)

   Optional keyword parameters *linejunk* and *charjunk* are for filter functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns true
   if the string is junk. The default is ``None``, meaning that no line is
   considered junk.

   *charjunk*: A function that accepts a single character argument (a string of
   length 1), and returns true if the character is junk. The default is ``None``,
   meaning that no character is considered junk.

   These junk-filtering functions speed up matching to find
   differences and do not cause any differing lines or characters to
   be ignored. Read the description of the
   :meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
   parameter for an explanation.

   :class:`Differ` objects are used (deltas generated) via a single method:


   .. method:: Differ.compare(a, b)

      Compare two sequences of lines, and generate the delta (a sequence of lines).

      Each sequence must contain individual single-line strings ending with
      newlines. Such sequences can be obtained from the
      :meth:`~io.IOBase.readlines` method of file-like objects. The delta
      generated also consists of newline-terminated strings, ready to be
      printed as-is via the :meth:`~io.IOBase.writelines` method of a
      file-like object.


.. _differ-examples:

Differ Example
--------------

This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the :meth:`~io.IOBase.readlines` method of file-like objects):

   >>> text1 = '''  1. Beautiful is better than ugly.
   ...   2. Explicit is better than implicit.
   ...   3. Simple is better than complex.
   ...   4. Complex is better than complicated.
   ... '''.splitlines(keepends=True)
   >>> len(text1)
   4
   >>> text1[0][-1]
   '\n'
   >>> text2 = '''  1. Beautiful is better than ugly.
   ...   3.   Simple is better than complex.
   ...   4. Complicated is better than complex.
   ...   5. Flat is better than nested.
   ... '''.splitlines(keepends=True)

Next we instantiate a Differ object:

   >>> d = Differ()

Note that when instantiating a :class:`Differ` object we may pass functions to
filter out line and character "junk." See the :class:`Differ` constructor for
details.

Finally, we compare the two:

   >>> result = list(d.compare(text1, text2))

``result`` is a list of strings, so let's pretty-print it:

   >>> from pprint import pprint
   >>> pprint(result)
   ['    1. Beautiful is better than ugly.\n',
    '-   2. Explicit is better than implicit.\n',
    '-   3. Simple is better than complex.\n',
    '+   3.   Simple is better than complex.\n',
    '?     ++\n',
    '-   4. Complex is better than complicated.\n',
    '?            ^                     ---- ^\n',
    '+   4. Complicated is better than complex.\n',
    '?           ++++ ^                      ^\n',
    '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this:

   >>> import sys
   >>> sys.stdout.writelines(result)
       1. Beautiful is better than ugly.
   -   2. Explicit is better than implicit.
   -   3. Simple is better than complex.
   +   3.   Simple is better than complex.
   ?     ++
   -   4. Complex is better than complicated.
   ?            ^                     ---- ^
   +   4. Complicated is better than complex.
   ?           ++++ ^                      ^
   +   5. Flat is better than nested.


.. _difflib-interface:

A command-line interface to difflib
-----------------------------------

This example shows how to use difflib to create a ``diff``-like utility.
It is also contained in the Python source distribution, as
:file:`Tools/scripts/diff.py`.

.. literalinclude:: ../../Tools/scripts/diff.py