:mod:`!difflib` --- Helpers for computing deltas
================================================

.. module:: difflib
   :synopsis: Helpers for computing differences between objects.

.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>

**Source code:** :source:`Lib/difflib.py`

.. testsetup::

   import sys
   from difflib import *

--------------

This module provides classes and functions for comparing sequences. It
can be used, for example, for comparing files, and can produce information
about file differences in various formats, including HTML, context, and unified
diffs. For comparing directories and files, see also the :mod:`filecmp` module.


.. class:: SequenceMatcher
   :noindex:

   This is a flexible class for comparing pairs of sequences of any type, so long
   as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a
   little fancier than, an algorithm published in the late 1980s by Ratcliff and
   Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
   find the longest contiguous matching subsequence that contains no "junk"
   elements; these "junk" elements are ones that are uninteresting in some
   sense, such as blank lines or whitespace. (Handling junk is an
   extension to the Ratcliff and Obershelp algorithm.) The same
   idea is then applied recursively to the pieces of the sequences to the left and
   to the right of the matching subsequence. This does not yield minimal edit
   sequences, but does tend to yield matches that "look right" to people.

   **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst
   case and quadratic time in the expected case.
   :class:`SequenceMatcher` is quadratic time in the worst case and has
   expected-case behavior that depends in a complicated way on how many elements
   the sequences have in common; best-case time is linear.

   **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
   automatically treats certain sequence items as junk. The heuristic counts how many
   times each individual item appears in the second sequence. If an item's duplicates
   (after the first one) account for more than 1% of that sequence and the sequence is
   at least 200 items long, this item is marked as "popular" and is treated as junk for
   the purpose of sequence matching. This heuristic can be turned off by setting
   the *autojunk* argument to ``False`` when creating the :class:`SequenceMatcher`.

   .. versionchanged:: 3.2
      Added the *autojunk* parameter.


.. class:: Differ

   This is a class for comparing sequences of lines of text, and producing
   human-readable differences or deltas. :class:`Differ` uses :class:`SequenceMatcher`
   both to compare sequences of lines, and to compare sequences of characters
   within similar (near-matching) lines.

   Each line of a :class:`Differ` delta begins with a two-letter code:

   +----------+-------------------------------------------+
   | Code     | Meaning                                   |
   +==========+===========================================+
   | ``'- '`` | line unique to sequence 1                 |
   +----------+-------------------------------------------+
   | ``'+ '`` | line unique to sequence 2                 |
   +----------+-------------------------------------------+
   | ``'  '`` | line common to both sequences             |
   +----------+-------------------------------------------+
   | ``'? '`` | line not present in either input sequence |
   +----------+-------------------------------------------+

   Lines beginning with ``'?'`` attempt to guide the eye to intraline differences,
   and were not present in either input sequence.
   These lines can be confusing if the sequences contain whitespace characters,
   such as spaces, tabs or line breaks.


.. class:: HtmlDiff

   This class can be used to create an HTML table (or a complete HTML file
   containing the table) showing a side by side, line by line comparison of text
   with inter-line and intra-line change highlights. The table can be generated in
   either full or contextual difference mode.

   The constructor for this class is:


   .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)

      Initializes an instance of :class:`HtmlDiff`.

      *tabsize* is an optional keyword argument to specify tab stop spacing and
      defaults to ``8``.

      *wrapcolumn* is an optional keyword argument to specify the column number where
      lines are broken and wrapped; it defaults to ``None``, in which case lines are
      not wrapped.

      *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
      (used by :class:`HtmlDiff` to generate the side by side HTML differences). See
      the :func:`ndiff` documentation for argument default values and descriptions.

   The following methods are public:

   .. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \
                         numlines=5, *, charset='utf-8')

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML file containing a table showing line by line differences with
      inter-line and intra-line changes highlighted.

      *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
      column header strings (both default to an empty string).

      *context* and *numlines* are both optional keyword arguments. Set *context* to
      ``True`` when contextual differences are to be shown, else the default is
      ``False`` to show the full files. *numlines* defaults to ``5``.
      When *context* is ``True``, *numlines* controls the number of context lines
      that surround the difference highlights. When *context* is ``False``,
      *numlines* controls the number of lines that are shown before a difference
      highlight when using the "next" hyperlinks (setting it to zero causes the
      "next" hyperlinks to place the next difference highlight at the top of the
      browser without any leading context).

      .. note::
         *fromdesc* and *todesc* are interpreted as unescaped HTML and should be
         properly escaped when they come from untrusted sources.

      .. versionchanged:: 3.5
         *charset* keyword-only argument was added. The default charset of
         the HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``.

   .. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML table showing line by line differences with inter-line and
      intra-line changes highlighted.

      The arguments for this method are the same as those for the :meth:`make_file`
      method.



.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in context diff format.

   Context diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in a before/after style. The
   number of context lines is set by *n* which defaults to three.

   By default, the diff control lines (those with ``***`` or ``---``) are created
   with a trailing newline.
   This is helpful so that inputs created from :meth:`io.IOBase.readlines` result
   in diffs that are suitable for use with :meth:`io.IOBase.writelines` since both
   the inputs and outputs have trailing newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The context diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
   expressed in the ISO 8601 format. If not specified, they default to empty
   strings.

   >>> import sys
   >>> from difflib import *
   >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
   >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
   >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py',
   ...                        tofile='after.py'))
   *** before.py
   --- after.py
   ***************
   *** 1,4 ****
   ! bacon
   ! eggs
   ! ham
     guido
   --- 1,4 ----
   ! python
   ! eggy
   ! hamster
     guido

   See :ref:`difflib-interface` for a more detailed example.


.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)

   Return a list of the best "good enough" matches. *word* is a sequence for which
   close matches are desired (typically a string), and *possibilities* is a list of
   sequences against which to match *word* (typically a list of strings).

   Optional argument *n* (default ``3``) is the maximum number of close matches to
   return; *n* must be greater than ``0``.

   Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
   Possibilities that don't score at least that similar to *word* are ignored.

   The best (no more than *n*) matches among the possibilities are returned in a
   list, sorted by similarity score, most similar first.
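
   As a rough illustration only, the ranking described above can be sketched on
   top of :class:`SequenceMatcher`: each possibility is scored with
   :meth:`~SequenceMatcher.ratio`, and the cheaper
   :meth:`~SequenceMatcher.real_quick_ratio` and
   :meth:`~SequenceMatcher.quick_ratio` upper bounds are checked first so that
   hopeless candidates are skipped early. The name ``close_matches_sketch`` is
   hypothetical, and this is not meant as a definitive copy of the module's code::

      import heapq
      from difflib import SequenceMatcher


      def close_matches_sketch(word, possibilities, n=3, cutoff=0.6):
          """Hypothetical stand-in for get_close_matches(), for illustration."""
          if n <= 0:
              raise ValueError("n must be > 0")
          if not 0.0 <= cutoff <= 1.0:
              raise ValueError("cutoff must be in [0.0, 1.0]")
          matcher = SequenceMatcher()
          # SequenceMatcher caches details about its second sequence, so the
          # fixed word is set once as seq2 and each candidate becomes seq1.
          matcher.set_seq2(word)
          scored = []
          for candidate in possibilities:
              matcher.set_seq1(candidate)
              # The upper bounds are cheap; ratio() runs only if both pass.
              if (matcher.real_quick_ratio() >= cutoff
                      and matcher.quick_ratio() >= cutoff
                      and matcher.ratio() >= cutoff):
                  scored.append((matcher.ratio(), candidate))
          # Keep the n highest-scoring (score, candidate) pairs, best first.
          best = heapq.nlargest(n, scored)
          return [candidate for score, candidate in best]

   Under these assumptions, calling the sketch with ``'appel'`` and
   ``['ape', 'apple', 'peach', 'puppy']`` is expected to return
   ``['apple', 'ape']``, the same result :func:`get_close_matches` gives for
   those inputs.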

   >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
   ['apple', 'ape']
   >>> import keyword
   >>> get_close_matches('wheel', keyword.kwlist)
   ['while']
   >>> get_close_matches('pineapple', keyword.kwlist)
   []
   >>> get_close_matches('accept', keyword.kwlist)
   ['except']


.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)

   Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
   delta (a :term:`generator` generating the delta lines).

   Optional keyword parameters *linejunk* and *charjunk* are filtering functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns
   true if the string is junk, or false if not. The default is ``None``. There
   is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
   without visible characters, except for at most one pound character (``'#'``);
   however, the underlying :class:`SequenceMatcher` class does a dynamic
   analysis of which lines are so frequent as to constitute noise, and this
   usually works better than using this function.

   *charjunk*: A function that accepts a character (a string of length 1), and
   returns true if the character is junk, or false if not. The default is the
   module-level function :func:`IS_CHARACTER_JUNK`, which filters out whitespace
   characters (a blank or tab; it's a bad idea to include newline in this!).

   >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
   ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
   >>> print(''.join(diff), end="")
   - one
   ?  ^
   + ore
   ?  ^
   - two
   - three
   ?  -
   + tree
   + emu


.. function:: restore(sequence, which)

   Return one of the two sequences that generated a delta.

   Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
   lines originating from file 1 or 2 (selected by *which*), stripping off line
   prefixes.

   Example:

   >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
   ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
   >>> diff = list(diff) # materialize the generated delta into a list
   >>> print(''.join(restore(diff, 1)), end="")
   one
   two
   three
   >>> print(''.join(restore(diff, 2)), end="")
   ore
   tree
   emu


.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in unified diff format.

   Unified diffs are a compact way of showing just the lines that have changed plus
   a few lines of context. The changes are shown in an inline style (instead of
   separate before/after blocks). The number of context lines is set by *n* which
   defaults to three.

   By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
   created with a trailing newline. This is helpful so that inputs created from
   :meth:`io.IOBase.readlines` result in diffs that are suitable for use with
   :meth:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The unified diff format normally has a header for filenames and modification
   times. Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally
   expressed in the ISO 8601 format. If not specified, they default to empty
   strings.
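
   As a minimal sketch of filling in those header fields, the example below builds
   ISO 8601 strings with :meth:`datetime.datetime.isoformat`. The file names, line
   contents, and timestamps are made-up placeholders; for real files the times
   might instead come from :func:`os.stat`. Each date ends up after its file name,
   separated from it by a tab::

      from datetime import datetime, timezone
      from difflib import unified_diff

      # Placeholder modification times (assumed values, no real files involved).
      old_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat()
      new_time = datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc).isoformat()

      old_lines = ['spam\n', 'eggs\n']
      new_lines = ['spam\n', 'ham\n']

      delta = list(unified_diff(old_lines, new_lines,
                                fromfile='old.txt', tofile='new.txt',
                                fromfiledate=old_time, tofiledate=new_time))
      # delta[0] == '--- old.txt\t2024-01-01T12:00:00+00:00\n'
      # delta[1] == '+++ new.txt\t2024-01-02T09:30:00+00:00\n'
      # delta[2] == '@@ -1,2 +1,2 @@\n'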

   >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
   >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
   >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
   --- before.py
   +++ after.py
   @@ -1,4 +1,4 @@
   -bacon
   -eggs
   -ham
   +python
   +eggy
   +hamster
    guido

   See :ref:`difflib-interface` for a more detailed example.

.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\n')

   Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a
   sequence of delta lines (also bytes) in the format returned by *dfunc*.
   *dfunc* must be a callable, typically either :func:`unified_diff` or
   :func:`context_diff`.

   Allows you to compare data with unknown or inconsistent encoding. All
   inputs except *n* must be bytes objects, not str. Works by losslessly
   converting all inputs (except *n*) to str, and calling
   ``dfunc(a, b, fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The
   output of *dfunc* is then converted back to bytes, so the delta lines that you
   receive have the same unknown/inconsistent encodings as *a* and *b*.

   .. versionadded:: 3.5

.. function:: IS_LINE_JUNK(line)

   Return ``True`` for ignorable lines. The line *line* is ignorable if it is
   blank or contains nothing but whitespace and at most one ``'#'``; otherwise it
   is not ignorable. Used as a default for parameter *linejunk* in :func:`ndiff`
   in older versions.


.. function:: IS_CHARACTER_JUNK(ch)

   Return ``True`` for ignorable characters. The character *ch* is ignorable if *ch*
   is a space or tab, otherwise it is not ignorable. Used as a default for
   parameter *charjunk* in :func:`ndiff`.
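
The two junk predicates above are ordinary functions and can be called directly.
The following sketch, which relies only on the behavior described for them,
records their results for a few sample inputs (the sample strings are arbitrary
choices for illustration)::

   from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK

   # Lines with no visible characters, or with whitespace around a single '#',
   # are ignorable; anything else (including '##') is not.
   sample_lines = ['\n', '   \n', '  #  \n', '##\n', 'x = 1\n']
   line_results = {line: IS_LINE_JUNK(line) for line in sample_lines}
   # {'\n': True, '   \n': True, '  #  \n': True, '##\n': False, 'x = 1\n': False}

   # Only a space or a tab counts as character junk; a newline does not.
   sample_chars = [' ', '\t', '\n', 'a']
   char_results = {ch: IS_CHARACTER_JUNK(ch) for ch in sample_chars}
   # {' ': True, '\t': True, '\n': False, 'a': False}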

.. seealso::

   `Pattern Matching: The Gestalt Approach <https://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
      Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
      was published in `Dr. Dobb's Journal <https://www.drdobbs.com/>`_ in July, 1988.


.. _sequence-matcher:

SequenceMatcher Objects
-----------------------

The :class:`SequenceMatcher` class has this constructor:


.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

   Optional argument *isjunk* must be ``None`` (the default) or a one-argument
   function that takes a sequence element and returns true if and only if the
   element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
   equivalent to passing ``lambda x: False``; in other words, no elements are ignored.
   For example, pass::

      lambda x: x in " \t"

   if you're comparing lines as sequences of characters, and don't want to synch up
   on blanks or hard tabs.

   The optional arguments *a* and *b* are sequences to be compared; both default to
   empty strings. The elements of both sequences must be :term:`hashable`.

   The optional argument *autojunk* can be used to disable the automatic junk
   heuristic.

   .. versionchanged:: 3.2
      Added the *autojunk* parameter.

   SequenceMatcher objects get three data attributes: *bjunk* is the
   set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
   non-junk elements considered popular by the heuristic (if it is not
   disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
   of positions where they occur. All three are reset whenever *b* is reset
   with :meth:`set_seqs` or :meth:`set_seq2`.

   .. versionadded:: 3.2
      The *bjunk* and *bpopular* attributes.

   :class:`SequenceMatcher` objects have the following methods:

   .. method:: set_seqs(a, b)

      Set the two sequences to be compared.

   :class:`SequenceMatcher` computes and caches detailed information about the
   second sequence, so if you want to compare one sequence against many
   sequences, use :meth:`set_seq2` to set the commonly used sequence once and
   call :meth:`set_seq1` repeatedly, once for each of the other sequences.


   .. method:: set_seq1(a)

      Set the first sequence to be compared. The second sequence to be compared
      is not changed.


   .. method:: set_seq2(b)

      Set the second sequence to be compared. The first sequence to be compared
      is not changed.


   .. method:: find_longest_match(alo=0, ahi=None, blo=0, bhi=None)

      Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.

      If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where
      ``alo <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all
      ``(i', j', k')`` meeting those conditions, the additional conditions
      ``k >= k'``, ``i <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In
      other words, of all maximal matching blocks, return one that starts earliest
      in *a*, and of all those maximal matching blocks that start earliest in *a*,
      return the one that starts earliest in *b*.

         >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=0, b=4, size=5)

      If *isjunk* was provided, first the longest matching block is determined
      as above, but with the additional restriction that no junk element appears
      in the block. Then that block is extended as far as possible by matching
      (only) junk elements on both sides. So the resulting block never matches
      on junk except as identical junk happens to be adjacent to an interesting
      match.

      Here's the same example as before, but considering blanks to be junk. That
      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
      second sequence directly. Instead only the ``'abcd'`` can match, and
      matches the leftmost ``'abcd'`` in the second sequence:

         >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=1, b=0, size=4)

      If no blocks match, this returns ``(alo, blo, 0)``.

      This method returns a :term:`named tuple` ``Match(a, b, size)``.

      .. versionchanged:: 3.9
         Added default arguments.


   .. method:: get_matching_blocks()

      Return list of triples describing non-overlapping matching subsequences.
      Each triple is of the form ``(i, j, n)``, and means that
      ``a[i:i+n] == b[j:j+n]``. The triples are monotonically increasing in *i*
      and *j*.

      The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It
      is the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')``
      are adjacent triples in the list, and the second is not the last triple in
      the list, then ``i+n < i'`` or ``j+n < j'``; in other words, adjacent
      triples always describe non-adjacent equal blocks.

      .. XXX Explain why a dummy is used!

      .. doctest::

         >>> s = SequenceMatcher(None, "abxcd", "abcd")
         >>> s.get_matching_blocks()
         [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]


   .. method:: get_opcodes()

      Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is
      of the form ``(tag, i1, i2, j1, j2)``. The first tuple has
      ``i1 == j1 == 0``, and remaining tuples have *i1* equal to the *i2* from the
      preceding tuple, and, likewise, *j1* equal to the previous *j2*.

      The *tag* values are strings, with these meanings:

      +---------------+---------------------------------------------+
      | Value         | Meaning                                     |
      +===============+=============================================+
      | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
      |               | ``b[j1:j2]``.                               |
      +---------------+---------------------------------------------+
      | ``'delete'``  | ``a[i1:i2]`` should be deleted. Note that   |
      |               | ``j1 == j2`` in this case.                  |
      +---------------+---------------------------------------------+
      | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
      |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
      |               | this case.                                  |
      +---------------+---------------------------------------------+
      | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
      |               | are equal).                                 |
      +---------------+---------------------------------------------+

      For example::

        >>> a = "qabxcd"
        >>> b = "abycdf"
        >>> s = SequenceMatcher(None, a, b)
        >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
        ...     print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
        ...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
        delete    a[0:1] --> b[0:0]      'q' --> ''
        equal     a[1:3] --> b[0:2]     'ab' --> 'ab'
        replace   a[3:4] --> b[2:3]      'x' --> 'y'
        equal     a[4:6] --> b[3:5]     'cd' --> 'cd'
        insert    a[6:6] --> b[5:6]       '' --> 'f'


   .. method:: get_grouped_opcodes(n=3)

      Return a :term:`generator` of groups with up to *n* lines of context.

      Starting with the groups returned by :meth:`get_opcodes`, this method
      splits out smaller change clusters and eliminates intervening ranges which
      have no changes.

      The groups are returned in the same format as :meth:`get_opcodes`.


   .. method:: ratio()

      Return a measure of the sequences' similarity as a float in the range [0, 1].

      Where T is the total number of elements in both sequences, and M is the
      number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
      sequences are identical, and ``0.0`` if they have nothing in common.

      This is expensive to compute if :meth:`get_matching_blocks` or
      :meth:`get_opcodes` hasn't already been called, in which case you may want
      to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
      upper bound.

      .. note::

         Caution: The result of a :meth:`ratio` call may depend on the order of
         the arguments. For instance::

            >>> SequenceMatcher(None, 'tide', 'diet').ratio()
            0.25
            >>> SequenceMatcher(None, 'diet', 'tide').ratio()
            0.5


   .. method:: quick_ratio()

      Return an upper bound on :meth:`ratio` relatively quickly.


   .. method:: real_quick_ratio()

      Return an upper bound on :meth:`ratio` very quickly.


The three methods that return the ratio of matching to total characters can give
different results due to differing levels of approximation, although
:meth:`~SequenceMatcher.quick_ratio` and :meth:`~SequenceMatcher.real_quick_ratio`
are always at least as large as :meth:`~SequenceMatcher.ratio`:

   >>> s = SequenceMatcher(None, "abcd", "bcde")
   >>> s.ratio()
   0.75
   >>> s.quick_ratio()
   0.75
   >>> s.real_quick_ratio()
   1.0


.. _sequencematcher-examples:

SequenceMatcher Examples
------------------------

This example compares two strings, considering blanks to be "junk":

   >>> s = SequenceMatcher(lambda x: x == " ",
   ...                     "private Thread currentThread;",
   ...                     "private volatile Thread currentThread;")

:meth:`~SequenceMatcher.ratio` returns a float in [0, 1], measuring the similarity of the
sequences.
As a rule of thumb, a :meth:`~SequenceMatcher.ratio` value over 0.6 means the
sequences are close matches:

   >>> print(round(s.ratio(), 3))
   0.866

If you're only interested in where the sequences match,
:meth:`~SequenceMatcher.get_matching_blocks` is handy:

   >>> for block in s.get_matching_blocks():
   ...     print("a[%d] and b[%d] match for %d elements" % block)
   a[0] and b[0] match for 8 elements
   a[8] and b[17] match for 21 elements
   a[29] and b[38] match for 0 elements

Note that the last tuple returned by :meth:`~SequenceMatcher.get_matching_blocks`
is always a dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
tuple element (number of elements matched) is ``0``.

If you want to know how to change the first sequence into the second, use
:meth:`~SequenceMatcher.get_opcodes`:

   >>> for opcode in s.get_opcodes():
   ...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
    equal a[0:8] b[0:8]
   insert a[8:8] b[8:17]
    equal a[8:29] b[17:38]

.. seealso::

   * The :func:`get_close_matches` function in this module, which shows how
     simple code building on :class:`SequenceMatcher` can be used to do useful
     work.

   * `Simple version control recipe
     <https://code.activestate.com/recipes/576729-simple-version-control/>`_ for a small application
     built with :class:`SequenceMatcher`.


.. _differ-objects:

Differ Objects
--------------

Note that :class:`Differ`\ -generated deltas make no claim to be **minimal**
diffs. To the contrary, minimal diffs are often counter-intuitive, because they
synch up anywhere possible, sometimes accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.

The :class:`Differ` class has this constructor:

.. class:: Differ(linejunk=None, charjunk=None)
   :noindex:

   Optional keyword parameters *linejunk* and *charjunk* are filter functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns true
   if the string is junk. The default is ``None``, meaning that no line is
   considered junk.

   *charjunk*: A function that accepts a single character argument (a string of
   length 1), and returns true if the character is junk. The default is ``None``,
   meaning that no character is considered junk.

   These junk-filtering functions speed up matching to find
   differences and do not cause any differing lines or characters to
   be ignored. Read the description of how *isjunk* affects the
   :meth:`~SequenceMatcher.find_longest_match` method for an explanation.

   :class:`Differ` objects are used (deltas generated) via a single method:


   .. method:: Differ.compare(a, b)

      Compare two sequences of lines, and generate the delta (a sequence of lines).

      Each sequence must contain individual single-line strings ending with
      newlines. Such sequences can be obtained from the
      :meth:`~io.IOBase.readlines` method of file-like objects. The delta
      generated also consists of newline-terminated strings, ready to be
      printed as-is via the :meth:`~io.IOBase.writelines` method of a
      file-like object.


.. _differ-examples:

Differ Example
--------------

This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the :meth:`~io.IOBase.readlines` method of file-like objects):

   >>> text1 = '''  1. Beautiful is better than ugly.
   ...   2. Explicit is better than implicit.
   ...   3. Simple is better than complex.
   ...   4. Complex is better than complicated.
   ... '''.splitlines(keepends=True)
   >>> len(text1)
   4
   >>> text1[0][-1]
   '\n'
   >>> text2 = '''  1. Beautiful is better than ugly.
   ...   3.   Simple is better than complex.
   ...   4. Complicated is better than complex.
   ...   5. Flat is better than nested.
   ... '''.splitlines(keepends=True)

Next we instantiate a Differ object:

   >>> d = Differ()

Note that when instantiating a :class:`Differ` object we may pass functions to
filter out line and character "junk." See the :class:`Differ` constructor for
details.

Finally, we compare the two:

   >>> result = list(d.compare(text1, text2))

``result`` is a list of strings, so let's pretty-print it:

   >>> from pprint import pprint
   >>> pprint(result)
   ['    1. Beautiful is better than ugly.\n',
    '-   2. Explicit is better than implicit.\n',
    '-   3. Simple is better than complex.\n',
    '+   3.   Simple is better than complex.\n',
    '?     ++\n',
    '-   4. Complex is better than complicated.\n',
    '?            ^                     ---- ^\n',
    '+   4. Complicated is better than complex.\n',
    '?           ++++ ^                      ^\n',
    '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this:

   >>> import sys
   >>> sys.stdout.writelines(result)
       1. Beautiful is better than ugly.
   -   2. Explicit is better than implicit.
   -   3. Simple is better than complex.
   +   3.   Simple is better than complex.
   ?     ++
   -   4. Complex is better than complicated.
   ?            ^                     ---- ^
   +   4. Complicated is better than complex.
   ?           ++++ ^                      ^
   +   5. Flat is better than nested.


.. _difflib-interface:

A command-line interface to difflib
-----------------------------------

This example shows how to use difflib to create a ``diff``-like utility.

.. literalinclude:: ../includes/diff.py

ndiff example
-------------

This example shows how to use :func:`difflib.ndiff`.

.. literalinclude:: ../includes/ndiff.py
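
As a further sketch, the two-letter codes listed for :class:`Differ` can be
tallied to summarize an :func:`ndiff` delta. The helper ``summarize_delta`` and
its category labels are hypothetical names chosen for this illustration::

   from collections import Counter
   from difflib import ndiff


   def summarize_delta(a, b):
       """Count the lines of an ndiff() delta by their two-letter prefix."""
       counts = Counter(line[:2] for line in ndiff(a, b))
       return {
           'added': counts['+ '],      # lines unique to sequence 2
           'removed': counts['- '],    # lines unique to sequence 1
           'unchanged': counts['  '],  # lines common to both sequences
           'hints': counts['? '],      # guide lines present in neither input
       }


   before = 'one\ntwo\nthree\n'.splitlines(keepends=True)
   after = 'ore\ntree\nemu\n'.splitlines(keepends=True)
   summary = summarize_delta(before, after)
   # {'added': 3, 'removed': 3, 'unchanged': 0, 'hints': 3}

Because ``'?'`` lines appear in neither input, the total of such a tally should
not be read as a line count of either file.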