:mod:`difflib` --- Helpers for computing deltas
===============================================

.. module:: difflib
   :synopsis: Helpers for computing differences between objects.

.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>

**Source code:** :source:`Lib/difflib.py`

.. testsetup::

   import sys
   from difflib import *

--------------

This module provides classes and functions for comparing sequences. It
can be used, for example, to compare files, and can produce difference
information in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also the :mod:`filecmp` module.


.. class:: SequenceMatcher

   This is a flexible class for comparing pairs of sequences of any type, so long
   as the sequence elements are :term:`hashable`.  The basic algorithm predates, and is a
   little fancier than, an algorithm published in the late 1980's by Ratcliff and
   Obershelp under the hyperbolic name "gestalt pattern matching."  The idea is to
   find the longest contiguous matching subsequence that contains no "junk"
   elements; these "junk" elements are ones that are uninteresting in some
   sense, such as blank lines or whitespace.  (Handling junk is an
   extension to the Ratcliff and Obershelp algorithm.) The same
   idea is then applied recursively to the pieces of the sequences to the left and
   to the right of the matching subsequence.  This does not yield minimal edit
   sequences, but does tend to yield matches that "look right" to people.

   **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst
   case and quadratic time in the expected case. :class:`SequenceMatcher` is
   quadratic time for the worst case and has expected-case behavior dependent in a
   complicated way on how many elements the sequences have in common; best case
   time is linear.

   **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
   automatically treats certain sequence items as junk. The heuristic counts how many
   times each individual item appears in the sequence. If an item's duplicates (after
   the first one) account for more than 1% of the sequence and the sequence is at least
   200 items long, this item is marked as "popular" and is treated as junk for
   the purpose of sequence matching. This heuristic can be turned off by setting
   the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.

   .. versionadded:: 3.2
      The *autojunk* parameter.


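
   As a rough illustration of the thresholds described above, the sketch below
   builds matchers whose second sequence repeats a single item and inspects the
   ``bpopular`` attribute. The sequences are made up for the example::

      from difflib import SequenceMatcher

      # 199 items: below the 200-item threshold, so the heuristic is not applied.
      short = SequenceMatcher(None, "", "x" * 199)

      # 200 items: "x" is far more frequent than 1%, so it is marked popular.
      long_default = SequenceMatcher(None, "", "x" * 200)

      # Same input, but with the heuristic switched off.
      long_disabled = SequenceMatcher(None, "", "x" * 200, autojunk=False)

      print(short.bpopular)          # set()
      print(long_default.bpopular)   # {'x'}
      print(long_disabled.bpopular)  # set()
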
.. class:: Differ

   This is a class for comparing sequences of lines of text, and producing
   human-readable differences or deltas.  Differ uses :class:`SequenceMatcher`
   both to compare sequences of lines, and to compare sequences of characters
   within similar (near-matching) lines.

   Each line of a :class:`Differ` delta begins with a two-character code:

   +----------+-------------------------------------------+
   | Code     | Meaning                                   |
   +==========+===========================================+
   | ``'- '`` | line unique to sequence 1                 |
   +----------+-------------------------------------------+
   | ``'+ '`` | line unique to sequence 2                 |
   +----------+-------------------------------------------+
   | ``'  '`` | line common to both sequences             |
   +----------+-------------------------------------------+
   | ``'? '`` | line not present in either input sequence |
   +----------+-------------------------------------------+

   Lines beginning with '``?``' attempt to guide the eye to intraline differences,
   and were not present in either input sequence. These lines can be confusing if
   the sequences contain tab characters.


.. class:: HtmlDiff

   This class can be used to create an HTML table (or a complete HTML file
   containing the table) showing a side by side, line by line comparison of text
   with inter-line and intra-line change highlights.  The table can be generated in
   either full or contextual difference mode.

   The constructor for this class is:


   .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)

      Initializes an instance of :class:`HtmlDiff`.

      *tabsize* is an optional keyword argument to specify tab stop spacing and
      defaults to ``8``.

      *wrapcolumn* is an optional keyword argument to specify the column number
      at which lines are broken and wrapped. It defaults to ``None``, in which
      case lines are not wrapped.

      *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
      (used by :class:`HtmlDiff` to generate the side by side HTML differences).  See
      :func:`ndiff` documentation for argument default values and descriptions.

   The following methods are public:

   .. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \
                         numlines=5, *, charset='utf-8')

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML file containing a table showing line by line differences with
      inter-line and intra-line changes highlighted.

      *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
      column header strings (both default to an empty string).

      *context* and *numlines* are both optional keyword arguments. Set *context* to
      ``True`` when contextual differences are to be shown, else the default is
      ``False`` to show the full files. *numlines* defaults to ``5``.  When *context*
      is ``True`` *numlines* controls the number of context lines which surround the
      difference highlights.  When *context* is ``False`` *numlines* controls the
      number of lines which are shown before a difference highlight when using the
      "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
      the next difference highlight at the top of the browser without any leading
      context).

      .. note::
         *fromdesc* and *todesc* are interpreted as unescaped HTML and should be
         properly escaped when they come from untrusted sources.

      .. versionchanged:: 3.5
         *charset* keyword-only argument was added.  The default charset of
         HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``.

   .. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML table showing line by line differences with inter-line and
      intra-line changes highlighted.

      The arguments for this method are the same as those for the :meth:`make_file`
      method.

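
   A minimal sketch of both methods, assuming two small in-memory line lists and
   made-up column headers. Because the headers are inserted as raw HTML, the
   sketch escapes them with :func:`html.escape` first::

      import html
      from difflib import HtmlDiff

      old_lines = ["bacon\n", "eggs\n", "ham\n"]
      new_lines = ["bacon\n", "eggy\n", "ham\n"]

      differ = HtmlDiff(wrapcolumn=60)

      # Column headers are treated as raw HTML, so escape anything untrusted.
      from_header = html.escape("<old>")
      to_header = html.escape("<new>")

      # A complete HTML document, suitable for writing to a file.
      page = differ.make_file(old_lines, new_lines, from_header, to_header)

      # Only the <table> element, for embedding in an existing page.
      table = differ.make_table(old_lines, new_lines, from_header, to_header,
                                context=True, numlines=1)
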
   :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
   contains a good example of its use.


.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in context diff format.

   Context diffs are a compact way of showing just the lines that have changed plus
   a few lines of context.  The changes are shown in a before/after style.  The
   number of context lines is set by *n* which defaults to three.

   By default, the diff control lines (those with ``***`` or ``---``) are created
   with a trailing newline.  This is helpful so that inputs created from
   :meth:`io.IOBase.readlines` result in diffs that are suitable for use with
   :meth:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The context diff format normally has a header for filenames and modification
   times.  Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.

      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py', tofile='after.py'))
      *** before.py
      --- after.py
      ***************
      *** 1,4 ****
      ! bacon
      ! eggs
      ! ham
        guido
      --- 1,4 ----
      ! python
      ! eggy
      ! hamster
        guido

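
   The header fields can be sketched as follows. The file names and ISO 8601
   timestamps are placeholders chosen for the example; each date is appended to
   its file name after a tab character::

      from difflib import context_diff

      old = ['bacon\n', 'eggs\n']
      new = ['bacon\n', 'spam\n']

      delta = list(context_diff(
          old, new,
          fromfile='before.py', tofile='after.py',
          fromfiledate='2024-01-01T09:00:00',
          tofiledate='2024-01-02T09:00:00',
          n=1))

      # The first two lines form the header.
      print(repr(delta[0]))  # '*** before.py\t2024-01-01T09:00:00\n'
      print(repr(delta[1]))  # '--- after.py\t2024-01-02T09:00:00\n'
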
   See :ref:`difflib-interface` for a more detailed example.


.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)

   Return a list of the best "good enough" matches.  *word* is a sequence for which
   close matches are desired (typically a string), and *possibilities* is a list of
   sequences against which to match *word* (typically a list of strings).

   Optional argument *n* (default ``3``) is the maximum number of close matches to
   return; *n* must be greater than ``0``.

   Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
   Possibilities that don't score at least that similar to *word* are ignored.

   The best (no more than *n*) matches among the possibilities are returned in a
   list, sorted by similarity score, most similar first.

      >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
      ['apple', 'ape']
      >>> import keyword
      >>> get_close_matches('wheel', keyword.kwlist)
      ['while']
      >>> get_close_matches('pineapple', keyword.kwlist)
      []
      >>> get_close_matches('accept', keyword.kwlist)
      ['except']


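
   The effect of *n* and *cutoff* can be sketched with the word list from the
   first example above; the specific cutoff values are only illustrative::

      from difflib import get_close_matches

      words = ['ape', 'apple', 'peach', 'puppy']

      # Keep only the single best match.
      best = get_close_matches('appel', words, n=1)
      print(best)      # ['apple']

      # A stricter cutoff drops the weaker match 'ape'.
      stricter = get_close_matches('appel', words, cutoff=0.78)
      print(stricter)  # ['apple']

      # A cutoff of 0.9 is too strict for any of these words.
      strictest = get_close_matches('appel', words, cutoff=0.9)
      print(strictest) # []
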
.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)

   Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
   delta (a :term:`generator` generating the delta lines).

   Optional keyword parameters *linejunk* and *charjunk* are filtering functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns
   true if the string is junk, or false if not. The default is ``None``. There
   is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
   without visible characters, except for at most one pound character (``'#'``)
   -- however the underlying :class:`SequenceMatcher` class does a dynamic
   analysis of which lines are so frequent as to constitute noise, and this
   usually works better than using this function.

   *charjunk*: A function that accepts a character (a string of length 1), and
   returns true if the character is junk, or false if not. The default is the
   module-level function :func:`IS_CHARACTER_JUNK`, which filters out whitespace
   characters (a blank or tab; it's a bad idea to include newline in this!).

   :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
      ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
      >>> print(''.join(diff), end="")
      - one
      ?  ^
      + ore
      ?  ^
      - two
      - three
      ?  -
      + tree
      + emu


.. function:: restore(sequence, which)

   Return one of the two sequences that generated a delta.

   Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
   lines originating from file 1 or 2 (parameter *which*), stripping off line
   prefixes.

   Example:

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
      ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
      >>> diff = list(diff) # materialize the generated delta into a list
      >>> print(''.join(restore(diff, 1)), end="")
      one
      two
      three
      >>> print(''.join(restore(diff, 2)), end="")
      ore
      tree
      emu


.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in unified diff format.

   Unified diffs are a compact way of showing just the lines that have changed plus
   a few lines of context.  The changes are shown in an inline style (instead of
   separate before/after blocks).  The number of context lines is set by *n* which
   defaults to three.

   By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
   created with a trailing newline.  This is helpful so that inputs created from
   :meth:`io.IOBase.readlines` result in diffs that are suitable for use with
   :meth:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The unified diff format normally has a header for filenames and modification
   times.  Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.


      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
      --- before.py
      +++ after.py
      @@ -1,4 +1,4 @@
      -bacon
      -eggs
      -ham
      +python
      +eggy
      +hamster
       guido

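
   For inputs without trailing newlines, a minimal sketch (reusing the data from
   the example above with the newlines removed) might look like this::

      from difflib import unified_diff

      s1 = ['bacon', 'eggs', 'ham', 'guido']
      s2 = ['python', 'eggy', 'hamster', 'guido']

      # lineterm='' keeps the control lines free of newlines as well.
      delta = list(unified_diff(s1, s2, fromfile='before.py',
                                tofile='after.py', lineterm=''))

      # Every line is newline-free, so join them explicitly for display.
      print('\n'.join(delta))
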
   See :ref:`difflib-interface` for a more detailed example.

.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\\n')

   Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a
   sequence of delta lines (also bytes) in the format returned by *dfunc*.
   *dfunc* must be a callable, typically either :func:`unified_diff` or
   :func:`context_diff`.

   Allows you to compare data with unknown or inconsistent encoding. All
   inputs except *n* must be bytes objects, not str. Works by losslessly
   converting all inputs (except *n*) to str, and calling ``dfunc(a, b,
   fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The output of
   *dfunc* is then converted back to bytes, so the delta lines that you
   receive have the same unknown/inconsistent encodings as *a* and *b*.

   .. versionadded:: 3.5

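
   A small sketch of the intended use, assuming two made-up inputs that encode
   the same accented word differently (Latin-1 and UTF-8), so that decoding both
   with a single codec would not work::

      from difflib import diff_bytes, unified_diff

      old = [b'caf\xe9\n']        # Latin-1 bytes
      new = [b'caf\xc3\xa9\n']    # UTF-8 bytes

      # File names and line terminators must be bytes as well.
      delta = list(diff_bytes(unified_diff, old, new,
                              fromfile=b'old.txt', tofile=b'new.txt'))

      # Each delta line is a bytes object carrying the original encoding.
      for line in delta:
          print(line)
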
.. function:: IS_LINE_JUNK(line)

   Return ``True`` for ignorable lines.  The line *line* is ignorable if *line* is
   blank or contains a single ``'#'``, otherwise it is not ignorable.  Used as a
   default for parameter *linejunk* in :func:`ndiff` in older versions.


.. function:: IS_CHARACTER_JUNK(ch)

   Return ``True`` for ignorable characters.  The character *ch* is ignorable if *ch*
   is a space or tab, otherwise it is not ignorable.  Used as a default for
   parameter *charjunk* in :func:`ndiff`.


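
The behavior of both predicates can be checked directly; the inputs below are
arbitrary examples::

   from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK

   # Blank lines and lines holding only a '#' count as junk.
   line_results = [IS_LINE_JUNK(s) for s in ('\n', '  #   \n', 'hello\n')]
   print(line_results)   # [True, True, False]

   # Only spaces and tabs are junk characters; newlines are not.
   char_results = [IS_CHARACTER_JUNK(c) for c in (' ', '\t', '\n', 'x')]
   print(char_results)   # [True, True, False, False]
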
.. seealso::

   `Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
      Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
      was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988.


.. _sequence-matcher:

SequenceMatcher Objects
-----------------------

The :class:`SequenceMatcher` class has this constructor:


.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

   Optional argument *isjunk* must be ``None`` (the default) or a one-argument
   function that takes a sequence element and returns true if and only if the
   element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
   equivalent to passing ``lambda x: False``; in other words, no elements are ignored.
   For example, pass::

      lambda x: x in " \t"

   if you're comparing lines as sequences of characters, and don't want to synch up
   on blanks or hard tabs.

   The optional arguments *a* and *b* are sequences to be compared; both default to
   empty strings.  The elements of both sequences must be :term:`hashable`.

   The optional argument *autojunk* can be used to disable the automatic junk
   heuristic.

   .. versionadded:: 3.2
      The *autojunk* parameter.

   SequenceMatcher objects get three data attributes: *bjunk* is the
   set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
   non-junk elements considered popular by the heuristic (if it is not
   disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
   of positions where they occur. All three are reset whenever *b* is reset
   with :meth:`set_seqs` or :meth:`set_seq2`.

   .. versionadded:: 3.2
      The *bjunk* and *bpopular* attributes.

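
   For instance, with a made-up second sequence and blanks treated as junk, the
   three attributes could be inspected like this::

      from difflib import SequenceMatcher

      s = SequenceMatcher(lambda x: x == " ", "", "a b a")

      first_bjunk = set(s.bjunk)
      first_b2j = dict(s.b2j)
      print(first_bjunk)   # {' '}
      print(s.bpopular)    # set()
      print(first_b2j)     # {'a': [0, 4], 'b': [2]}

      # Replacing the second sequence rebuilds all three attributes.
      s.set_seq2("xyz")
      print(s.bjunk)       # set()
      print(s.b2j)         # {'x': [0], 'y': [1], 'z': [2]}
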
   :class:`SequenceMatcher` objects have the following methods:

   .. method:: set_seqs(a, b)

      Set the two sequences to be compared.

   :class:`SequenceMatcher` computes and caches detailed information about the
   second sequence, so if you want to compare one sequence against many
   sequences, use :meth:`set_seq2` to set the commonly used sequence once and
   call :meth:`set_seq1` repeatedly, once for each of the other sequences.


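
   A minimal sketch of that pattern, assuming a fixed target word and a short,
   made-up list of candidates::

      from difflib import SequenceMatcher

      target = "apple"
      candidates = ["ape", "apple", "peach"]

      matcher = SequenceMatcher()
      matcher.set_seq2(target)          # analysed once and cached

      scores = {}
      for candidate in candidates:
          matcher.set_seq1(candidate)   # only the first sequence changes
          scores[candidate] = matcher.ratio()

      print(scores)  # {'ape': 0.75, 'apple': 1.0, 'peach': 0.4}
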
   .. method:: set_seq1(a)

      Set the first sequence to be compared.  The second sequence to be compared
      is not changed.


   .. method:: set_seq2(b)

      Set the second sequence to be compared.  The first sequence to be compared
      is not changed.


   .. method:: find_longest_match(alo, ahi, blo, bhi)

      Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.

      If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
      <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
      k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
      <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
      all maximal matching blocks, return one that starts earliest in *a*, and
      of all those maximal matching blocks that start earliest in *a*, return
      the one that starts earliest in *b*.

         >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=0, b=4, size=5)

      If *isjunk* was provided, first the longest matching block is determined
      as above, but with the additional restriction that no junk element appears
      in the block.  Then that block is extended as far as possible by matching
      (only) junk elements on both sides. So the resulting block never matches
      on junk except as identical junk happens to be adjacent to an interesting
      match.

      Here's the same example as before, but considering blanks to be junk. That
      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
      second sequence directly.  Instead only the ``'abcd'`` can match, and
      matches the leftmost ``'abcd'`` in the second sequence:

         >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=1, b=0, size=4)

      If no blocks match, this returns ``(alo, blo, 0)``.

      This method returns a :term:`named tuple` ``Match(a, b, size)``.


   .. method:: get_matching_blocks()

      Return list of triples describing non-overlapping matching subsequences.
      Each triple is of the form ``(i, j, n)``,
      and means that ``a[i:i+n] == b[j:j+n]``.  The
      triples are monotonically increasing in *i* and *j*.

      The last triple is a dummy, and has the value ``(len(a), len(b), 0)``.  It
      is the only triple with ``n == 0``.  If ``(i, j, n)`` and ``(i', j', n')``
      are adjacent triples in the list, and the second is not the last triple in
      the list, then ``i+n < i'`` or ``j+n < j'``; in other words, adjacent
      triples always describe non-adjacent equal blocks.

      .. XXX Explain why a dummy is used!

      .. doctest::

         >>> s = SequenceMatcher(None, "abxcd", "abcd")
         >>> s.get_matching_blocks()
         [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]


   .. method:: get_opcodes()

      Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is
      of the form ``(tag, i1, i2, j1, j2)``.  The first tuple has ``i1 == j1 ==
      0``, and remaining tuples have *i1* equal to the *i2* from the preceding
      tuple, and, likewise, *j1* equal to the previous *j2*.

      The *tag* values are strings, with these meanings:

      +---------------+---------------------------------------------+
      | Value         | Meaning                                     |
      +===============+=============================================+
      | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
      |               | ``b[j1:j2]``.                               |
      +---------------+---------------------------------------------+
      | ``'delete'``  | ``a[i1:i2]`` should be deleted.  Note that  |
      |               | ``j1 == j2`` in this case.                  |
      +---------------+---------------------------------------------+
      | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
      |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
      |               | this case.                                  |
      +---------------+---------------------------------------------+
      | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
      |               | are equal).                                 |
      +---------------+---------------------------------------------+

      For example::

        >>> a = "qabxcd"
        >>> b = "abycdf"
        >>> s = SequenceMatcher(None, a, b)
        >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
        ...     print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
        ...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
        delete    a[0:1] --> b[0:0]      'q' --> ''
        equal     a[1:3] --> b[0:2]     'ab' --> 'ab'
        replace   a[3:4] --> b[2:3]      'x' --> 'y'
        equal     a[4:6] --> b[3:5]     'cd' --> 'cd'
        insert    a[6:6] --> b[5:6]       '' --> 'f'


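
      Because consecutive tuples cover *a* and *b* without gaps, the opcodes are
      enough to rebuild *b* from *a*. The helper below is a hypothetical
      illustration, not part of :mod:`difflib`::

         from difflib import SequenceMatcher

         def apply_opcodes(a, b, opcodes):
             """Rebuild b by replaying opcodes over a (illustrative helper)."""
             result = []
             for tag, i1, i2, j1, j2 in opcodes:
                 if tag == 'equal':
                     result.extend(a[i1:i2])     # copy the unchanged slice of a
                 elif tag in ('replace', 'insert'):
                     result.extend(b[j1:j2])     # take the new items from b
                 # 'delete' contributes nothing to the result
             return result

         a = "qabxcd"
         b = "abycdf"
         opcodes = SequenceMatcher(None, a, b).get_opcodes()
         print(''.join(apply_opcodes(a, b, opcodes)))  # abycdf
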
   .. method:: get_grouped_opcodes(n=3)

      Return a :term:`generator` of groups with up to *n* lines of context.

      Starting with the groups returned by :meth:`get_opcodes`, this method
      splits out smaller change clusters and eliminates intervening ranges which
      have no changes.

      The groups are returned in the same format as :meth:`get_opcodes`.


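
      As a sketch, two made-up ten-character strings that differ near each end
      produce two separate groups when a single element of context is
      requested::

         from difflib import SequenceMatcher

         a = "abcdefghij"
         b = "aXcdefghYj"
         groups = list(SequenceMatcher(None, a, b).get_grouped_opcodes(n=1))

         # The long unchanged run "cdefgh" is trimmed to one element of
         # context on each side, which splits the opcodes into two groups.
         for group in groups:
             print(group)
         # [('equal', 0, 1, 0, 1), ('replace', 1, 2, 1, 2), ('equal', 2, 3, 2, 3)]
         # [('equal', 7, 8, 7, 8), ('replace', 8, 9, 8, 9), ('equal', 9, 10, 9, 10)]
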
   .. method:: ratio()

      Return a measure of the sequences' similarity as a float in the range [0,
      1].

      Where T is the total number of elements in both sequences, and M is the
      number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
      sequences are identical, and ``0.0`` if they have nothing in common.

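
      The formula can be reproduced by hand from the matching blocks. This
      sketch uses two made-up strings and the names ``M`` and ``T`` from the
      text above::

         from difflib import SequenceMatcher

         s = SequenceMatcher(None, "abcd", "bcde")

         # M: total size of all matching blocks (the final dummy block adds 0).
         M = sum(block.size for block in s.get_matching_blocks())
         # T: total number of elements in both sequences.
         T = len(s.a) + len(s.b)

         print(M, T)          # 3 8
         print(2.0 * M / T)   # 0.75
         print(s.ratio())     # 0.75
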
      This is expensive to compute if :meth:`get_matching_blocks` or
      :meth:`get_opcodes` hasn't already been called, in which case you may want
      to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
      upper bound.

      .. note::

         Caution: The result of a :meth:`ratio` call may depend on the order of
         the arguments. For instance::

            >>> SequenceMatcher(None, 'tide', 'diet').ratio()
            0.25
            >>> SequenceMatcher(None, 'diet', 'tide').ratio()
            0.5


   .. method:: quick_ratio()

      Return an upper bound on :meth:`ratio` relatively quickly.


   .. method:: real_quick_ratio()

      Return an upper bound on :meth:`ratio` very quickly.


The three methods that return the ratio of matching to total characters can give
different results due to differing levels of approximation, although
:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
:meth:`ratio`:

   >>> s = SequenceMatcher(None, "abcd", "bcde")
   >>> s.ratio()
   0.75
   >>> s.quick_ratio()
   0.75
   >>> s.real_quick_ratio()
   1.0


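
Because both quick methods return upper bounds, they can serve as cheap filters
before the full computation. The helper below is a hypothetical sketch of that
pattern, with an arbitrary cutoff of 0.6::

   from difflib import SequenceMatcher

   def is_close(a, b, cutoff=0.6):
       """Return True if ratio() reaches cutoff, trying cheap bounds first."""
       s = SequenceMatcher(None, a, b)
       # If an upper bound is already below the cutoff, ratio() must be too,
       # so the short-circuiting ``and`` skips the expensive call.
       return (s.real_quick_ratio() >= cutoff and
               s.quick_ratio() >= cutoff and
               s.ratio() >= cutoff)

   print(is_close("abcd", "bcde"))  # True
   print(is_close("abcd", "wxyz"))  # False
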
.. _sequencematcher-examples:

SequenceMatcher Examples
------------------------

This example compares two strings, considering blanks to be "junk":

   >>> s = SequenceMatcher(lambda x: x == " ",
   ...                     "private Thread currentThread;",
   ...                     "private volatile Thread currentThread;")

:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
sequences.  As a rule of thumb, a :meth:`ratio` value over 0.6 means the
sequences are close matches:

   >>> print(round(s.ratio(), 3))
   0.866

If you're only interested in where the sequences match,
:meth:`get_matching_blocks` is handy:

   >>> for block in s.get_matching_blocks():
   ...     print("a[%d] and b[%d] match for %d elements" % block)
   a[0] and b[0] match for 8 elements
   a[8] and b[17] match for 21 elements
   a[29] and b[38] match for 0 elements

Note that the last tuple returned by :meth:`get_matching_blocks` is always a
dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
tuple element (number of elements matched) is ``0``.

If you want to know how to change the first sequence into the second, use
:meth:`get_opcodes`:

   >>> for opcode in s.get_opcodes():
   ...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
    equal a[0:8] b[0:8]
   insert a[8:8] b[8:17]
    equal a[8:29] b[17:38]

.. seealso::

   * The :func:`get_close_matches` function in this module which shows how
     simple code building on :class:`SequenceMatcher` can be used to do useful
     work.

   * `Simple version control recipe
     <https://code.activestate.com/recipes/576729/>`_ for a small application
     built with :class:`SequenceMatcher`.


.. _differ-objects:

Differ Objects
--------------

Note that :class:`Differ`\ -generated deltas make no claim to be **minimal**
diffs. To the contrary, minimal diffs are often counter-intuitive, because they
synch up anywhere possible, sometimes on accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.

The :class:`Differ` class has this constructor:


.. class:: Differ(linejunk=None, charjunk=None)

   Optional keyword parameters *linejunk* and *charjunk* are for filter functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns true
   if the string is junk.  The default is ``None``, meaning that no line is
   considered junk.

   *charjunk*: A function that accepts a single character argument (a string of
   length 1), and returns true if the character is junk. The default is ``None``,
   meaning that no character is considered junk.

   These junk-filtering functions speed up matching to find
   differences and do not cause any differing lines or characters to
   be ignored.  Read the description of the
   :meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
   parameter for an explanation.

   :class:`Differ` objects are used (deltas generated) via a single method:


   .. method:: Differ.compare(a, b)

      Compare two sequences of lines, and generate the delta (a sequence of lines).

      Each sequence must contain individual single-line strings ending with
      newlines.  Such sequences can be obtained from the
      :meth:`~io.IOBase.readlines` method of file-like objects.  The delta
      generated also consists of newline-terminated strings, ready to be
      printed as-is via the :meth:`~io.IOBase.writelines` method of a
      file-like object.


.. _differ-examples:

Differ Example
--------------

This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the :meth:`~io.IOBase.readlines` method of file-like objects):

   >>> text1 = '''  1. Beautiful is better than ugly.
   ...   2. Explicit is better than implicit.
   ...   3. Simple is better than complex.
   ...   4. Complex is better than complicated.
   ... '''.splitlines(keepends=True)
   >>> len(text1)
   4
   >>> text1[0][-1]
   '\n'
   >>> text2 = '''  1. Beautiful is better than ugly.
   ...   3.   Simple is better than complex.
   ...   4. Complicated is better than complex.
   ...   5. Flat is better than nested.
   ... '''.splitlines(keepends=True)

Next we instantiate a Differ object:

   >>> d = Differ()

Note that when instantiating a :class:`Differ` object we may pass functions to
filter out line and character "junk."  See the :class:`Differ` constructor for
details.

Finally, we compare the two:

   >>> result = list(d.compare(text1, text2))

``result`` is a list of strings, so let's pretty-print it:

   >>> from pprint import pprint
   >>> pprint(result)
   ['    1. Beautiful is better than ugly.\n',
    '-   2. Explicit is better than implicit.\n',
    '-   3. Simple is better than complex.\n',
    '+   3.   Simple is better than complex.\n',
    '?     ++\n',
    '-   4. Complex is better than complicated.\n',
    '?            ^                     ---- ^\n',
    '+   4. Complicated is better than complex.\n',
    '?           ++++ ^                      ^\n',
    '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this:

   >>> import sys
   >>> sys.stdout.writelines(result)
       1. Beautiful is better than ugly.
   -   2. Explicit is better than implicit.
   -   3. Simple is better than complex.
   +   3.   Simple is better than complex.
   ?     ++
   -   4. Complex is better than complicated.
   ?            ^                     ---- ^
   +   4. Complicated is better than complex.
   ?           ++++ ^                      ^
   +   5. Flat is better than nested.


.. _difflib-interface:

A command-line interface to difflib
-----------------------------------

This example shows how to use difflib to create a ``diff``-like utility.
It is also contained in the Python source distribution, as
:file:`Tools/scripts/diff.py`.

.. literalinclude:: ../../Tools/scripts/diff.py
