:mod:`difflib` --- Helpers for computing deltas
===============================================

.. module:: difflib
   :synopsis: Helpers for computing differences between objects.

.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>

**Source code:** :source:`Lib/difflib.py`

.. testsetup::

   import sys
   from difflib import *

--------------

This module provides classes and functions for comparing sequences. It
can be used, for example, to compare files, and can produce information
about file differences in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also the :mod:`filecmp` module.


.. class:: SequenceMatcher
   :noindex:

   This is a flexible class for comparing pairs of sequences of any type, so long
   as the sequence elements are :term:`hashable`.  The basic algorithm predates, and is a
   little fancier than, an algorithm published in the late 1980's by Ratcliff and
   Obershelp under the hyperbolic name "gestalt pattern matching."  The idea is to
   find the longest contiguous matching subsequence that contains no "junk"
   elements; these "junk" elements are ones that are uninteresting in some
   sense, such as blank lines or whitespace.  (Handling junk is an
   extension to the Ratcliff and Obershelp algorithm.) The same
   idea is then applied recursively to the pieces of the sequences to the left and
   to the right of the matching subsequence.  This does not yield minimal edit
   sequences, but does tend to yield matches that "look right" to people.

   **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst
   case and quadratic time in the expected case. :class:`SequenceMatcher` is
   quadratic time for the worst case and has expected-case behavior dependent in a
   complicated way on how many elements the sequences have in common; best case
   time is linear.

   **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
   automatically treats certain sequence items as junk. The heuristic counts how many
   times each individual item appears in the sequence. If an item's duplicates (after
   the first one) account for more than 1% of the sequence and the sequence is at least
   200 items long, this item is marked as "popular" and is treated as junk for
   the purpose of sequence matching. This heuristic can be turned off by setting
   the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.

   .. versionadded:: 3.2
      The *autojunk* parameter.

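The effect of the automatic junk heuristic can be observed through the
``bpopular`` attribute of a :class:`SequenceMatcher` object. The following is a
minimal sketch rather than part of the module; the 300-item list and the
variable names are invented for illustration.

```python
from difflib import SequenceMatcher

# 300 items: every tenth one is the same blank line, all others are unique.
b = ["\n" if i % 10 == 0 else f"line {i}\n" for i in range(300)]

with_heuristic = SequenceMatcher(None, [], b)
without_heuristic = SequenceMatcher(None, [], b, autojunk=False)
short_input = SequenceMatcher(None, [], b[:100])

print(with_heuristic.bpopular)     # {'\n'}
print(without_heuristic.bpopular)  # set()
print(short_input.bpopular)        # set(), fewer than 200 items
```

The blank line occurs 30 times in 300 items, well above the 1% threshold, so it
is the only item marked as popular, and only when the sequence is long enough.
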

.. class:: Differ

   This is a class for comparing sequences of lines of text, and producing
   human-readable differences or deltas.  Differ uses :class:`SequenceMatcher`
   both to compare sequences of lines, and to compare sequences of characters
   within similar (near-matching) lines.

   Each line of a :class:`Differ` delta begins with a two-letter code:

   +----------+-------------------------------------------+
   | Code     | Meaning                                   |
   +==========+===========================================+
   | ``'- '`` | line unique to sequence 1                 |
   +----------+-------------------------------------------+
   | ``'+ '`` | line unique to sequence 2                 |
   +----------+-------------------------------------------+
   | ``'  '`` | line common to both sequences             |
   +----------+-------------------------------------------+
   | ``'? '`` | line not present in either input sequence |
   +----------+-------------------------------------------+

   Lines beginning with '``?``' attempt to guide the eye to intraline differences,
   and were not present in either input sequence. These lines can be confusing if
   the sequences contain tab characters.

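Because every delta line carries one of the two-letter codes listed above, a
delta can be sorted into its parts with plain string tests. This is a small
sketch with made-up input lines, not a helper provided by the module.

```python
from difflib import Differ

before = ["one\n", "two\n", "three\n"]
after = ["one\n", "2\n", "three\n"]

delta = list(Differ().compare(before, after))

# Drop the two-character code to recover the text of each line.
removed = [line[2:] for line in delta if line.startswith("- ")]
added = [line[2:] for line in delta if line.startswith("+ ")]
common = [line[2:] for line in delta if line.startswith("  ")]

print(removed)  # ['two\n']
print(added)    # ['2\n']
print(common)   # ['one\n', 'three\n']
```

Lines starting with ``'? '`` belong to neither input and are simply skipped by
the three filters.
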

.. class:: HtmlDiff

   This class can be used to create an HTML table (or a complete HTML file
   containing the table) showing a side by side, line by line comparison of text
   with inter-line and intra-line change highlights.  The table can be generated in
   either full or contextual difference mode.

   The constructor for this class is:


   .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)

      Initializes an instance of :class:`HtmlDiff`.

      *tabsize* is an optional keyword argument to specify tab stop spacing and
      defaults to ``8``.

      *wrapcolumn* is an optional keyword argument to specify the column number
      at which lines are broken and wrapped. It defaults to ``None``, in which
      case lines are not wrapped.

      *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
      (used by :class:`HtmlDiff` to generate the side by side HTML differences).  See
      :func:`ndiff` documentation for argument default values and descriptions.

   The following methods are public:

   .. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \
                         numlines=5, *, charset='utf-8')

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML file containing a table showing line by line differences with
      inter-line and intra-line changes highlighted.

      *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
      column header strings (both default to an empty string).

      *context* and *numlines* are both optional keyword arguments. Set *context* to
      ``True`` when contextual differences are to be shown, else the default is
      ``False`` to show the full files. *numlines* defaults to ``5``.  When *context*
      is ``True`` *numlines* controls the number of context lines which surround the
      difference highlights.  When *context* is ``False`` *numlines* controls the
      number of lines which are shown before a difference highlight when using the
      "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
      the next difference highlight at the top of the browser without any leading
      context).

      .. note::
         *fromdesc* and *todesc* are interpreted as unescaped HTML and should be
         properly escaped when they come from untrusted sources.

      .. versionchanged:: 3.5
         *charset* keyword-only argument was added.  The default charset of
         HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``.

   .. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML table showing line by line differences with inter-line and
      intra-line changes highlighted.

      The arguments for this method are the same as those for the :meth:`make_file`
      method.

   :file:`Tools/scripts/diff.py` is a command-line front-end to this class and
   contains a good example of its use.

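As a rough illustration of the two :class:`HtmlDiff` methods, the sketch below
builds both a bare table and a complete page. The input lines and the column
labels are invented; :func:`html.escape` is used on the labels because
*fromdesc* and *todesc* are inserted into the output as raw HTML.

```python
import html
from difflib import HtmlDiff

old_lines = ["bacon\n", "eggs\n", "ham\n"]
new_lines = ["bacon\n", "eggy\n", "ham\n"]

# Hypothetical labels that might come from an untrusted source.
old_label = html.escape("<b>old</b> menu")
new_label = html.escape("new menu")

differ = HtmlDiff(tabsize=4, wrapcolumn=60)

# Only the <table> element, for embedding in an existing page.
table = differ.make_table(old_lines, new_lines, old_label, new_label)

# A complete HTML document with one line of context around each change.
page = differ.make_file(old_lines, new_lines, old_label, new_label,
                        context=True, numlines=1)

print(len(table), len(page))
```

To save the page, write ``page`` to a file opened with ``encoding='utf-8'`` so
that it matches the default *charset*.
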

.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in context diff format.

   Context diffs are a compact way of showing just the lines that have changed plus
   a few lines of context.  The changes are shown in a before/after style.  The
   number of context lines is set by *n* which defaults to three.

   By default, the diff control lines (those with ``***`` or ``---``) are created
   with a trailing newline.  This is helpful so that inputs created from
   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The context diff format normally has a header for filenames and modification
   times.  Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.

      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py', tofile='after.py'))
      *** before.py
      --- after.py
      ***************
      *** 1,4 ****
      ! bacon
      ! eggs
      ! ham
        guido
      --- 1,4 ----
      ! python
      ! eggy
      ! hamster
        guido

   See :ref:`difflib-interface` for a more detailed example.

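When the input lines carry no trailing newlines, for example after
:meth:`str.splitlines`, passing ``lineterm=""`` keeps the control lines
consistent with the data lines. A minimal sketch, reusing the sample data from
the example above without the newlines:

```python
from difflib import context_diff

s1 = ["bacon", "eggs", "ham", "guido"]
s2 = ["python", "eggy", "hamster", "guido"]

lines = list(context_diff(s1, s2, fromfile="before.py", tofile="after.py",
                          lineterm=""))

# No line ends with a newline, so join them explicitly for display.
print("\n".join(lines))
```

The printed text matches the doctest output above; only the way the line
endings are supplied differs.
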

.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)

   Return a list of the best "good enough" matches.  *word* is a sequence for which
   close matches are desired (typically a string), and *possibilities* is a list of
   sequences against which to match *word* (typically a list of strings).

   Optional argument *n* (default ``3``) is the maximum number of close matches to
   return; *n* must be greater than ``0``.

   Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
   Possibilities that don't score at least that similar to *word* are ignored.

   The best (no more than *n*) matches among the possibilities are returned in a
   list, sorted by similarity score, most similar first.

      >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
      ['apple', 'ape']
      >>> import keyword
      >>> get_close_matches('wheel', keyword.kwlist)
      ['while']
      >>> get_close_matches('pineapple', keyword.kwlist)
      []
      >>> get_close_matches('accept', keyword.kwlist)
      ['except']

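The optional arguments can be explored with the same word list as in the
example above. This is a short sketch; the ``0.78`` cutoff is an arbitrary
value chosen to fall between the scores of ``'apple'`` (0.8) and ``'ape'``
(0.75).

```python
from difflib import get_close_matches

words = ["ape", "apple", "peach", "puppy"]

print(get_close_matches("appel", words))               # ['apple', 'ape']
print(get_close_matches("appel", words, n=1))          # ['apple']
print(get_close_matches("appel", words, cutoff=0.78))  # ['apple']

# n must be greater than 0; other values are rejected.
try:
    get_close_matches("appel", words, n=0)
except ValueError as exc:
    print("rejected:", exc)
```
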

.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)

   Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
   delta (a :term:`generator` generating the delta lines).

   Optional keyword parameters *linejunk* and *charjunk* are filtering functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns
   true if the string is junk, or false if not. The default is ``None``. There
   is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
   without visible characters, except for at most one pound character (``'#'``)
   -- however the underlying :class:`SequenceMatcher` class does a dynamic
   analysis of which lines are so frequent as to constitute noise, and this
   usually works better than using this function.

   *charjunk*: A function that accepts a character (a string of length 1), and
   returns true if the character is junk, or false if not. The default is the
   module-level function :func:`IS_CHARACTER_JUNK`, which filters out whitespace
   characters (a blank or tab; it's a bad idea to include newline in this!).

   :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
      ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
      >>> print(''.join(diff), end="")
      - one
      ?  ^
      + ore
      ?  ^
      - two
      - three
      ?  -
      + tree
      + emu


.. function:: restore(sequence, which)

   Return one of the two sequences that generated a delta.

   Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
   lines originating from file 1 or 2 (parameter *which*), stripping off line
   prefixes.

   Example:

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
      ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
      >>> diff = list(diff) # materialize the generated delta into a list
      >>> print(''.join(restore(diff, 1)), end="")
      one
      two
      three
      >>> print(''.join(restore(diff, 2)), end="")
      ore
      tree
      emu

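The reason for materializing the delta in the example above is that
:func:`ndiff` returns a generator, which can be consumed only once. The sketch
below checks the round trip and shows what happens when the generator is
reused; the variable names are illustrative.

```python
from difflib import ndiff, restore

before = ["one\n", "two\n", "three\n"]
after = ["ore\n", "tree\n", "emu\n"]

# Stored as a list, the delta can be read as often as needed.
delta = list(ndiff(before, after))
print(list(restore(delta, 1)) == before)  # True
print(list(restore(delta, 2)) == after)   # True

# A bare generator is exhausted by the first restore() call.
gen = ndiff(before, after)
first = list(restore(gen, 1))
second = list(restore(gen, 2))
print(first == before)  # True
print(second)           # []
```
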

.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in unified diff format.

   Unified diffs are a compact way of showing just the lines that have changed plus
   a few lines of context.  The changes are shown in an inline style (instead of
   separate before/after blocks).  The number of context lines is set by *n* which
   defaults to three.

   By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
   created with a trailing newline.  This is helpful so that inputs created from
   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The unified diff format normally has a header for filenames and modification
   times.  Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.


      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
      --- before.py
      +++ after.py
      @@ -1,4 +1,4 @@
      -bacon
      -eggs
      -ham
      +python
      +eggy
      +hamster
       guido

   See :ref:`difflib-interface` for a more detailed example.

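The header fields described above can be filled in as follows. This is only a
sketch: the two timestamps are invented ISO 8601 strings, and ``lineterm=""``
is used because the sample lines have no trailing newlines.

```python
from difflib import unified_diff

old = ["bacon", "eggs", "ham", "guido"]
new = ["python", "eggy", "hamster", "guido"]

lines = list(unified_diff(
    old, new,
    fromfile="before.py", tofile="after.py",
    fromfiledate="2005-01-26T23:30:50",  # illustrative timestamps
    tofiledate="2010-04-02T10:20:52",
    lineterm="",                         # inputs have no trailing newlines
))

print("\n".join(lines))
```

Each date is appended to its file name after a tab character, so the first
header line reads ``--- before.py``, a tab, then the timestamp.
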

.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\\n')

   Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a
   sequence of delta lines (also bytes) in the format returned by *dfunc*.
   *dfunc* must be a callable, typically either :func:`unified_diff` or
   :func:`context_diff`.

   Allows you to compare data with unknown or inconsistent encoding. All
   inputs except *n* must be bytes objects, not str. Works by losslessly
   converting all inputs (except *n*) to str, and calling ``dfunc(a, b,
   fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The output of
   *dfunc* is then converted back to bytes, so the delta lines that you
   receive have the same unknown/inconsistent encodings as *a* and *b*.

   .. versionadded:: 3.5

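A short sketch of :func:`diff_bytes` on inputs whose encodings disagree. The
sample data is made up: the same word is stored once as Latin-1 bytes and once
as UTF-8 bytes, so the two lines cannot both be decoded with one codec.

```python
from difflib import diff_bytes, unified_diff

old = [b"caf\xe9\n", b"tea\n"]       # 'café' encoded as Latin-1
new = [b"caf\xc3\xa9\n", b"tea\n"]   # 'café' encoded as UTF-8

delta = list(diff_bytes(unified_diff, old, new,
                        fromfile=b"old.txt", tofile=b"new.txt"))

for line in delta:
    print(line)
```

Every delta line is a bytes object, and the changed lines come back with
exactly the bytes that were passed in.
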

.. function:: IS_LINE_JUNK(line)

   Return ``True`` for ignorable lines.  The line *line* is ignorable if *line* is
   blank or contains a single ``'#'``, otherwise it is not ignorable.  Used as a
   default for parameter *linejunk* in :func:`ndiff` in older versions.


.. function:: IS_CHARACTER_JUNK(ch)

   Return ``True`` for ignorable characters.  The character *ch* is ignorable if *ch*
   is a space or tab, otherwise it is not ignorable.  Used as a default for
   parameter *charjunk* in :func:`ndiff`.

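The two junk predicates can be tried directly and then passed to
:func:`ndiff`. The following is a small sketch; the configuration-style sample
lines are invented.

```python
from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK, ndiff, restore

# IS_LINE_JUNK: blank lines and lines holding only a single '#'.
print(IS_LINE_JUNK("\n"))         # True
print(IS_LINE_JUNK("  #   \n"))   # True
print(IS_LINE_JUNK("hello\n"))    # False

# IS_CHARACTER_JUNK: a space or a tab, but not a newline.
print(IS_CHARACTER_JUNK(" "))     # True
print(IS_CHARACTER_JUNK("\t"))    # True
print(IS_CHARACTER_JUNK("\n"))    # False
print(IS_CHARACTER_JUNK("x"))     # False

# Both predicates can be passed to ndiff() explicitly.
old = ["# config\n", "\n", "size = 1\n"]
new = ["# config\n", "\n", "size  = 2\n"]
delta = list(ndiff(old, new, linejunk=IS_LINE_JUNK,
                   charjunk=IS_CHARACTER_JUNK))

# Junk filters only steer the matching; both inputs stay recoverable.
print(list(restore(delta, 2)) == new)  # True
```
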

.. seealso::

   `Pattern Matching: The Gestalt Approach <http://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
      Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
      was published in `Dr. Dobb's Journal <http://www.drdobbs.com/>`_ in July, 1988.


.. _sequence-matcher:

SequenceMatcher Objects
-----------------------

The :class:`SequenceMatcher` class has this constructor:


.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

   Optional argument *isjunk* must be ``None`` (the default) or a one-argument
   function that takes a sequence element and returns true if and only if the
   element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
   equivalent to passing ``lambda x: False``; in other words, no elements are ignored.
   For example, pass::

      lambda x: x in " \t"

   if you're comparing lines as sequences of characters, and don't want to synch up
   on blanks or hard tabs.

   The optional arguments *a* and *b* are sequences to be compared; both default to
   empty strings.  The elements of both sequences must be :term:`hashable`.

   The optional argument *autojunk* can be used to disable the automatic junk
   heuristic.

   .. versionadded:: 3.2
      The *autojunk* parameter.

   SequenceMatcher objects get three data attributes: *bjunk* is the
   set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
   non-junk elements considered popular by the heuristic (if it is not
   disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
   of positions where they occur. All three are reset whenever *b* is reset
   with :meth:`set_seqs` or :meth:`set_seq2`.

   .. versionadded:: 3.2
      The *bjunk* and *bpopular* attributes.

   :class:`SequenceMatcher` objects have the following methods:

   .. method:: set_seqs(a, b)

      Set the two sequences to be compared.

   :class:`SequenceMatcher` computes and caches detailed information about the
   second sequence, so if you want to compare one sequence against many
   sequences, use :meth:`set_seq2` to set the commonly used sequence once and
   call :meth:`set_seq1` repeatedly, once for each of the other sequences.


   .. method:: set_seq1(a)

      Set the first sequence to be compared.  The second sequence to be compared
      is not changed.


   .. method:: set_seq2(b)

      Set the second sequence to be compared.  The first sequence to be compared
      is not changed.

   .. method:: find_longest_match(alo=0, ahi=None, blo=0, bhi=None)

      Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.

      If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
      <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
      k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
      <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
      all maximal matching blocks, return one that starts earliest in *a*, and
      of all those maximal matching blocks that start earliest in *a*, return
      the one that starts earliest in *b*.

         >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=0, b=4, size=5)

      If *isjunk* was provided, first the longest matching block is determined
      as above, but with the additional restriction that no junk element appears
      in the block.  Then that block is extended as far as possible by matching
      (only) junk elements on both sides. So the resulting block never matches
      on junk except as identical junk happens to be adjacent to an interesting
      match.

      Here's the same example as before, but considering blanks to be junk. That
      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
      second sequence directly.  Instead only the ``'abcd'`` can match, and
      matches the leftmost ``'abcd'`` in the second sequence:

         >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=1, b=0, size=4)

      If no blocks match, this returns ``(alo, blo, 0)``.

      This method returns a :term:`named tuple` ``Match(a, b, size)``.

      .. versionchanged:: 3.9
         Added default arguments.


   .. method:: get_matching_blocks()

      Return list of triples describing non-overlapping matching subsequences.
      Each triple is of the form ``(i, j, n)``,
      and means that ``a[i:i+n] == b[j:j+n]``.  The
      triples are monotonically increasing in *i* and *j*.

      The last triple is a dummy, and has the value ``(len(a), len(b), 0)``.  It
      is the only triple with ``n == 0``.  If ``(i, j, n)`` and ``(i', j', n')``
      are adjacent triples in the list, and the second is not the last triple in
      the list, then ``i+n < i'`` or ``j+n < j'``; in other words, adjacent
      triples always describe non-adjacent equal blocks.

      .. XXX Explain why a dummy is used!

      .. doctest::

         >>> s = SequenceMatcher(None, "abxcd", "abcd")
         >>> s.get_matching_blocks()
         [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]


   .. method:: get_opcodes()

      Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is
      of the form ``(tag, i1, i2, j1, j2)``.  The first tuple has ``i1 == j1 ==
      0``, and remaining tuples have *i1* equal to the *i2* from the preceding
      tuple, and, likewise, *j1* equal to the previous *j2*.

      The *tag* values are strings, with these meanings:

      +---------------+---------------------------------------------+
      | Value         | Meaning                                     |
      +===============+=============================================+
      | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
      |               | ``b[j1:j2]``.                               |
      +---------------+---------------------------------------------+
      | ``'delete'``  | ``a[i1:i2]`` should be deleted.  Note that  |
      |               | ``j1 == j2`` in this case.                  |
      +---------------+---------------------------------------------+
      | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
      |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
      |               | this case.                                  |
      +---------------+---------------------------------------------+
      | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
      |               | are equal).                                 |
      +---------------+---------------------------------------------+

      For example::

        >>> a = "qabxcd"
        >>> b = "abycdf"
        >>> s = SequenceMatcher(None, a, b)
        >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
        ...     print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
        ...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
        delete    a[0:1] --> b[0:0]      'q' --> ''
        equal     a[1:3] --> b[0:2]     'ab' --> 'ab'
        replace   a[3:4] --> b[2:3]      'x' --> 'y'
        equal     a[4:6] --> b[3:5]     'cd' --> 'cd'
        insert    a[6:6] --> b[5:6]       '' --> 'f'


   .. method:: get_grouped_opcodes(n=3)

      Return a :term:`generator` of groups with up to *n* lines of context.

      Starting with the groups returned by :meth:`get_opcodes`, this method
      splits out smaller change clusters and eliminates intervening ranges which
      have no changes.

      The groups are returned in the same format as :meth:`get_opcodes`.


   .. method:: ratio()

      Return a measure of the sequences' similarity as a float in the range [0,
      1].

      Where T is the total number of elements in both sequences, and M is the
      number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
      sequences are identical, and ``0.0`` if they have nothing in common.

      This is expensive to compute if :meth:`get_matching_blocks` or
      :meth:`get_opcodes` hasn't already been called, in which case you may want
      to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
      upper bound.

      .. note::

         Caution: The result of a :meth:`ratio` call may depend on the order of
         the arguments. For instance::

            >>> SequenceMatcher(None, 'tide', 'diet').ratio()
            0.25
            >>> SequenceMatcher(None, 'diet', 'tide').ratio()
            0.5


   .. method:: quick_ratio()

      Return an upper bound on :meth:`ratio` relatively quickly.


   .. method:: real_quick_ratio()

      Return an upper bound on :meth:`ratio` very quickly.


The three methods that return the ratio of matching to total characters can give
different results due to differing levels of approximation, although
:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
:meth:`ratio`:

   >>> s = SequenceMatcher(None, "abcd", "bcde")
   >>> s.ratio()
   0.75
   >>> s.quick_ratio()
   0.75
   >>> s.real_quick_ratio()
   1.0

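Because both quick methods return upper bounds, they can serve as cheap
filters before the expensive :meth:`ratio` call, and :meth:`set_seq2` lets the
cached analysis of one sequence be reused across many comparisons. The helper
below is a hypothetical sketch of that pattern; ``best_match`` is not part of
:mod:`difflib`, although :func:`get_close_matches` works along similar lines.

```python
from difflib import SequenceMatcher


def best_match(word, candidates, cutoff=0.6):
    """Return (score, candidate) for the closest candidate, or None."""
    matcher = SequenceMatcher()
    matcher.set_seq2(word)  # analysed once, reused for every candidate
    best = None
    for candidate in candidates:
        matcher.set_seq1(candidate)
        # Cheap upper bounds first: if either is below the cutoff,
        # ratio() cannot reach it, so the costly call is skipped.
        if (matcher.real_quick_ratio() >= cutoff
                and matcher.quick_ratio() >= cutoff):
            score = matcher.ratio()
            if score >= cutoff and (best is None or score > best[0]):
                best = (score, candidate)
    return best


result = best_match("appel", ["ape", "apple", "peach", "puppy"])
print(result)  # (0.8, 'apple')
```
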

.. _sequencematcher-examples:

SequenceMatcher Examples
------------------------

This example compares two strings, considering blanks to be "junk":

   >>> s = SequenceMatcher(lambda x: x == " ",
   ...                     "private Thread currentThread;",
   ...                     "private volatile Thread currentThread;")

:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
sequences.  As a rule of thumb, a :meth:`ratio` value over 0.6 means the
sequences are close matches:

   >>> print(round(s.ratio(), 3))
   0.866

If you're only interested in where the sequences match,
:meth:`get_matching_blocks` is handy:

   >>> for block in s.get_matching_blocks():
   ...     print("a[%d] and b[%d] match for %d elements" % block)
   a[0] and b[0] match for 8 elements
   a[8] and b[17] match for 21 elements
   a[29] and b[38] match for 0 elements

Note that the last tuple returned by :meth:`get_matching_blocks` is always a
dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
tuple element (number of elements matched) is ``0``.

If you want to know how to change the first sequence into the second, use
:meth:`get_opcodes`:

   >>> for opcode in s.get_opcodes():
   ...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
    equal a[0:8] b[0:8]
   insert a[8:8] b[8:17]
    equal a[8:29] b[17:38]

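The opcodes contain everything needed to rebuild the second sequence from the
first. The function below is an illustrative sketch, not a :mod:`difflib` API:
it copies ``'equal'`` ranges from *a*, takes ``'replace'`` and ``'insert'``
ranges from *b*, and skips ``'delete'`` ranges.

```python
from difflib import SequenceMatcher


def apply_opcodes(a, b):
    """Rebuild b by walking the opcodes that turn a into b."""
    result = []
    matcher = SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(a[i1:i2])   # unchanged run, copied from a
        elif tag in ("replace", "insert"):
            result.extend(b[j1:j2])   # new material, taken from b
        # 'delete': a[i1:i2] is dropped, nothing is emitted
    return result


a = "private Thread currentThread;"
b = "private volatile Thread currentThread;"
print("".join(apply_opcodes(a, b)) == b)  # True
```
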

.. seealso::

   * The :func:`get_close_matches` function in this module which shows how
     simple code building on :class:`SequenceMatcher` can be used to do useful
     work.

   * `Simple version control recipe
     <https://code.activestate.com/recipes/576729/>`_ for a small application
     built with :class:`SequenceMatcher`.


.. _differ-objects:

Differ Objects
--------------

Note that :class:`Differ`\ -generated deltas make no claim to be **minimal**
diffs. To the contrary, minimal diffs are often counter-intuitive, because they
synch up anywhere possible, sometimes accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.

The :class:`Differ` class has this constructor:


.. class:: Differ(linejunk=None, charjunk=None)
   :noindex:

   Optional keyword parameters *linejunk* and *charjunk* are for filter functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns true
   if the string is junk.  The default is ``None``, meaning that no line is
   considered junk.

   *charjunk*: A function that accepts a single character argument (a string of
   length 1), and returns true if the character is junk. The default is ``None``,
   meaning that no character is considered junk.

   These junk-filtering functions speed up matching to find
   differences and do not cause any differing lines or characters to
   be ignored.  Read the description of the
   :meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
   parameter for an explanation.

   :class:`Differ` objects are used (deltas generated) via a single method:


   .. method:: Differ.compare(a, b)

      Compare two sequences of lines, and generate the delta (a sequence of lines).

      Each sequence must contain individual single-line strings ending with
      newlines.  Such sequences can be obtained from the
      :meth:`~io.IOBase.readlines` method of file-like objects.  The delta
      generated also consists of newline-terminated strings, ready to be
      printed as-is via the :meth:`~io.IOBase.writelines` method of a
      file-like object.



.. _differ-examples:

Differ Example
--------------

This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the :meth:`~io.IOBase.readlines` method of file-like objects):

   >>> text1 = '''  1. Beautiful is better than ugly.
   ...   2. Explicit is better than implicit.
   ...   3. Simple is better than complex.
   ...   4. Complex is better than complicated.
   ... '''.splitlines(keepends=True)
   >>> len(text1)
   4
   >>> text1[0][-1]
   '\n'
   >>> text2 = '''  1. Beautiful is better than ugly.
   ...   3.   Simple is better than complex.
   ...   4. Complicated is better than complex.
   ...   5. Flat is better than nested.
   ... '''.splitlines(keepends=True)

Next we instantiate a :class:`Differ` object:

   >>> d = Differ()

Note that when instantiating a :class:`Differ` object we may pass functions to
filter out line and character "junk."  See the :class:`Differ` constructor for
details.

Finally, we compare the two:

   >>> result = list(d.compare(text1, text2))

``result`` is a list of strings, so let's pretty-print it:

   >>> from pprint import pprint
   >>> pprint(result)
   ['    1. Beautiful is better than ugly.\n',
    '-   2. Explicit is better than implicit.\n',
    '-   3. Simple is better than complex.\n',
    '+   3.   Simple is better than complex.\n',
    '?     ++\n',
    '-   4. Complex is better than complicated.\n',
    '?            ^                     ---- ^\n',
    '+   4. Complicated is better than complex.\n',
    '?           ++++ ^                      ^\n',
    '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this:

   >>> import sys
   >>> sys.stdout.writelines(result)
       1. Beautiful is better than ugly.
   -   2. Explicit is better than implicit.
   -   3. Simple is better than complex.
   +   3.   Simple is better than complex.
   ?     ++
   -   4. Complex is better than complicated.
   ?            ^                     ---- ^
   +   4. Complicated is better than complex.
   ?           ++++ ^                      ^
   +   5. Flat is better than nested.


.. _difflib-interface:

A command-line interface to difflib
-----------------------------------

This example shows how to use difflib to create a ``diff``-like utility.
It is also contained in the Python source distribution, as
:file:`Tools/scripts/diff.py`.

.. literalinclude:: ../../Tools/scripts/diff.py

761This example shows how to use difflib to create a ``diff``-like utility.
762It is also contained in the Python source distribution, as
763:file:`Tools/scripts/diff.py`.
764
765.. literalinclude:: ../../Tools/scripts/diff.py
766