:mod:`!difflib` --- Helpers for computing deltas
================================================

.. module:: difflib
   :synopsis: Helpers for computing differences between objects.

.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>

**Source code:** :source:`Lib/difflib.py`

.. testsetup::

   import sys
   from difflib import *

--------------

This module provides classes and functions for comparing sequences. It
can be used, for example, to compare files, and can produce information
about file differences in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also the :mod:`filecmp` module.


.. class:: SequenceMatcher
   :noindex:

   This is a flexible class for comparing pairs of sequences of any type, so long
   as the sequence elements are :term:`hashable`.  The basic algorithm predates, and is a
   little fancier than, an algorithm published in the late 1980's by Ratcliff and
   Obershelp under the hyperbolic name "gestalt pattern matching."  The idea is to
   find the longest contiguous matching subsequence that contains no "junk"
   elements; these "junk" elements are ones that are uninteresting in some
   sense, such as blank lines or whitespace.  (Handling junk is an
   extension to the Ratcliff and Obershelp algorithm.) The same
   idea is then applied recursively to the pieces of the sequences to the left and
   to the right of the matching subsequence.  This does not yield minimal edit
   sequences, but does tend to yield matches that "look right" to people.

   **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst
   case and quadratic time in the expected case. :class:`SequenceMatcher` is
   quadratic time for the worst case and has expected-case behavior dependent in a
   complicated way on how many elements the sequences have in common; best case
   time is linear.

   **Automatic junk heuristic:** :class:`SequenceMatcher` supports a heuristic that
   automatically treats certain sequence items as junk. The heuristic counts how many
   times each individual item appears in the sequence. If an item's duplicates (after
   the first one) account for more than 1% of the sequence and the sequence is at least
   200 items long, this item is marked as "popular" and is treated as junk for
   the purpose of sequence matching. This heuristic can be turned off by setting
   the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.

   .. versionchanged:: 3.2
      Added the *autojunk* parameter.


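The automatic junk heuristic described above can be observed through the
``bpopular`` attribute. The following is a minimal sketch rather than a
definitive recipe; the token list is made-up data chosen so that one item
clearly exceeds the 1% threshold in a sequence of at least 200 items.

```python
from difflib import SequenceMatcher

# Hypothetical data: one token repeated 100 times plus 150 distinct tokens,
# giving a second sequence of 250 items (above the 200-item minimum).
b = ["x"] * 100 + [str(i) for i in range(150)]
a = list(b)

with_heuristic = SequenceMatcher(None, a, b)  # autojunk defaults to True
without_heuristic = SequenceMatcher(None, a, b, autojunk=False)

print(with_heuristic.bpopular)     # {'x'}
print(without_heuristic.bpopular)  # set()
```

Passing ``autojunk=False`` is worth considering when a frequently repeated
item is actually meaningful for the comparison.
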
.. class:: Differ

   This is a class for comparing sequences of lines of text, and producing
   human-readable differences or deltas.  :class:`Differ` uses :class:`SequenceMatcher`
   both to compare sequences of lines, and to compare sequences of characters
   within similar (near-matching) lines.

   Each line of a :class:`Differ` delta begins with a two-character code:

   +----------+-------------------------------------------+
   | Code     | Meaning                                   |
   +==========+===========================================+
   | ``'- '`` | line unique to sequence 1                 |
   +----------+-------------------------------------------+
   | ``'+ '`` | line unique to sequence 2                 |
   +----------+-------------------------------------------+
   | ``'  '`` | line common to both sequences             |
   +----------+-------------------------------------------+
   | ``'? '`` | line not present in either input sequence |
   +----------+-------------------------------------------+

   Lines beginning with '``?``' attempt to guide the eye to intraline differences,
   and were not present in either input sequence. These lines can be confusing if
   the sequences contain whitespace characters, such as spaces, tabs or line breaks.


.. class:: HtmlDiff

   This class can be used to create an HTML table (or a complete HTML file
   containing the table) showing a side by side, line by line comparison of text
   with inter-line and intra-line change highlights.  The table can be generated in
   either full or contextual difference mode.

   The constructor for this class is:


   .. method:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)

      Initializes an instance of :class:`HtmlDiff`.

      *tabsize* is an optional keyword argument to specify tab stop spacing and
      defaults to ``8``.

      *wrapcolumn* is an optional keyword argument to specify the column number
      at which lines are broken and wrapped; it defaults to ``None``, in which
      case lines are not wrapped.

      *linejunk* and *charjunk* are optional keyword arguments passed into :func:`ndiff`
      (used by :class:`HtmlDiff` to generate the side by side HTML differences).  See
      :func:`ndiff` documentation for argument default values and descriptions.

   The following methods are public:

   .. method:: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, \
                         numlines=5, *, charset='utf-8')

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML file containing a table showing line by line differences with
      inter-line and intra-line changes highlighted.

      *fromdesc* and *todesc* are optional keyword arguments to specify from/to file
      column header strings (both default to an empty string).

      *context* and *numlines* are both optional keyword arguments. Set *context* to
      ``True`` when contextual differences are to be shown, else the default is
      ``False`` to show the full files. *numlines* defaults to ``5``.  When *context*
      is ``True`` *numlines* controls the number of context lines which surround the
      difference highlights.  When *context* is ``False`` *numlines* controls the
      number of lines which are shown before a difference highlight when using the
      "next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
      the next difference highlight at the top of the browser without any leading
      context).

      .. note::
         *fromdesc* and *todesc* are interpreted as unescaped HTML and should be
         properly escaped when they come from untrusted sources.

      .. versionchanged:: 3.5
         *charset* keyword-only argument was added.  The default charset of
         HTML document changed from ``'ISO-8859-1'`` to ``'utf-8'``.

   .. method:: make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)

      Compares *fromlines* and *tolines* (lists of strings) and returns a string which
      is a complete HTML table showing line by line differences with inter-line and
      intra-line changes highlighted.

      The arguments for this method are the same as those for the :meth:`make_file`
      method.



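As a rough illustration of the :class:`HtmlDiff` methods above, the sketch
below builds both a bare table and a complete page. The input lines and the
column labels are invented for the example, and :func:`html.escape` is one
possible way to follow the note about untrusted *fromdesc* and *todesc*
values.

```python
import html
from difflib import HtmlDiff

before = ["bacon\n", "eggs\n", "ham\n"]
after = ["bacon\n", "eggy\n", "ham\n"]

# Hypothetical untrusted column headers: escape them before passing them in,
# because fromdesc/todesc are inserted into the output as raw HTML.
from_label = html.escape("<old> menu")
to_label = html.escape("<new> menu")

differ = HtmlDiff(tabsize=4, wrapcolumn=60)

# Only the <table> element, for embedding in an existing page.
table = differ.make_table(before, after, fromdesc=from_label, todesc=to_label)

# A complete HTML document, showing one line of context around each change.
page = differ.make_file(before, after, fromdesc=from_label, todesc=to_label,
                        context=True, numlines=1)
```

The returned strings can be written to a file opened with the same encoding
as the *charset* argument (``'utf-8'`` by default).
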
.. function:: context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in context diff format.

   Context diffs are a compact way of showing just the lines that have changed plus
   a few lines of context.  The changes are shown in a before/after style.  The
   number of context lines is set by *n* which defaults to three.

   By default, the diff control lines (those with ``***`` or ``---``) are created
   with a trailing newline.  This is helpful so that inputs created from
   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The context diff format normally has a header for filenames and modification
   times.  Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.

      >>> import sys
      >>> from difflib import *
      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py',
      ...                        tofile='after.py'))
      *** before.py
      --- after.py
      ***************
      *** 1,4 ****
      ! bacon
      ! eggs
      ! ham
        guido
      --- 1,4 ----
      ! python
      ! eggy
      ! hamster
        guido

   See :ref:`difflib-interface` for a more detailed example.


.. function:: get_close_matches(word, possibilities, n=3, cutoff=0.6)

   Return a list of the best "good enough" matches.  *word* is a sequence for which
   close matches are desired (typically a string), and *possibilities* is a list of
   sequences against which to match *word* (typically a list of strings).

   Optional argument *n* (default ``3``) is the maximum number of close matches to
   return; *n* must be greater than ``0``.

   Optional argument *cutoff* (default ``0.6``) is a float in the range [0, 1].
   Possibilities that don't score at least that similar to *word* are ignored.

   The best (no more than *n*) matches among the possibilities are returned in a
   list, sorted by similarity score, most similar first.

      >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
      ['apple', 'ape']
      >>> import keyword
      >>> get_close_matches('wheel', keyword.kwlist)
      ['while']
      >>> get_close_matches('pineapple', keyword.kwlist)
      []
      >>> get_close_matches('accept', keyword.kwlist)
      ['except']


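The *n* and *cutoff* arguments of :func:`get_close_matches` can be combined to
tighten or loosen the result. The sketch below reuses the word list from the
example above; the specific cutoff value is only an illustration.

```python
from difflib import get_close_matches

candidates = ["ape", "apple", "peach", "puppy"]

# Keep only the single best match.
best_only = get_close_matches("appel", candidates, n=1)
# Demand a much higher similarity than the default 0.6.
strict = get_close_matches("appel", candidates, cutoff=0.9)

print(best_only)  # ['apple']
print(strict)     # []

# n must be greater than 0; other values are rejected with ValueError.
try:
    get_close_matches("appel", candidates, n=0)
except ValueError as exc:
    print("rejected:", exc)
```
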
.. function:: ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)

   Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
   delta (a :term:`generator` generating the delta lines).

   Optional keyword parameters *linejunk* and *charjunk* are filtering functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns
   true if the string is junk, or false if not. The default is ``None``. There
   is also a module-level function :func:`IS_LINE_JUNK`, which filters out lines
   without visible characters, except for at most one pound character (``'#'``)
   -- however the underlying :class:`SequenceMatcher` class does a dynamic
   analysis of which lines are so frequent as to constitute noise, and this
   usually works better than using this function.

   *charjunk*: A function that accepts a character (a string of length 1), and
   returns true if the character is junk, or false if not. The default is the
   module-level function :func:`IS_CHARACTER_JUNK`, which filters out whitespace
   characters (a blank or tab; it's a bad idea to include newline in this!).

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
      ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
      >>> print(''.join(diff), end="")
      - one
      ?  ^
      + ore
      ?  ^
      - two
      - three
      ?  -
      + tree
      + emu


.. function:: restore(sequence, which)

   Return one of the two sequences that generated a delta.

   Given a *sequence* produced by :meth:`Differ.compare` or :func:`ndiff`, extract
   lines originating from file 1 or 2 (parameter *which*), stripping off line
   prefixes.

   Example:

      >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
      ...              'ore\ntree\nemu\n'.splitlines(keepends=True))
      >>> diff = list(diff) # materialize the generated delta into a list
      >>> print(''.join(restore(diff, 1)), end="")
      one
      two
      three
      >>> print(''.join(restore(diff, 2)), end="")
      ore
      tree
      emu


.. function:: unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')

   Compare *a* and *b* (lists of strings); return a delta (a :term:`generator`
   generating the delta lines) in unified diff format.

   Unified diffs are a compact way of showing just the lines that have changed plus
   a few lines of context.  The changes are shown in an inline style (instead of
   separate before/after blocks).  The number of context lines is set by *n* which
   defaults to three.

   By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
   created with a trailing newline.  This is helpful so that inputs created from
   :func:`io.IOBase.readlines` result in diffs that are suitable for use with
   :func:`io.IOBase.writelines` since both the inputs and outputs have trailing
   newlines.

   For inputs that do not have trailing newlines, set the *lineterm* argument to
   ``""`` so that the output will be uniformly newline free.

   The unified diff format normally has a header for filenames and modification
   times.  Any or all of these may be specified using strings for *fromfile*,
   *tofile*, *fromfiledate*, and *tofiledate*.  The modification times are normally
   expressed in the ISO 8601 format. If not specified, the
   strings default to blanks.

      >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
      >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
      >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py'))
      --- before.py
      +++ after.py
      @@ -1,4 +1,4 @@
      -bacon
      -eggs
      -ham
      +python
      +eggy
      +hamster
       guido

   See :ref:`difflib-interface` for a more detailed example.

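The *lineterm* advice for :func:`unified_diff` (and likewise
:func:`context_diff`) matters when the inputs come from
:meth:`str.splitlines` without ``keepends=True``. A small sketch, reusing the
sample data from the example above with the newlines removed:

```python
from difflib import unified_diff

# Lines without trailing newlines, as produced by str.splitlines().
old = "bacon\neggs\nham\nguido".splitlines()
new = "python\neggy\nhamster\nguido".splitlines()

# lineterm="" keeps the control lines newline-free, matching the inputs.
delta = list(unified_diff(old, new, fromfile="before.py", tofile="after.py",
                          lineterm=""))

# Every element is now a bare line, so joining with "\n" is safe.
print("\n".join(delta))
```

With the default ``lineterm='\n'`` the three control lines would end in a
newline while the content lines would not, which produces unevenly spaced
output when joined.
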
.. function:: diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\n')

   Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a
   sequence of delta lines (also bytes) in the format returned by *dfunc*.
   *dfunc* must be a callable, typically either :func:`unified_diff` or
   :func:`context_diff`.

   Allows you to compare data with unknown or inconsistent encoding. All
   inputs except *n* must be bytes objects, not str. Works by losslessly
   converting all inputs (except *n*) to str, and calling ``dfunc(a, b,
   fromfile, tofile, fromfiledate, tofiledate, n, lineterm)``. The output of
   *dfunc* is then converted back to bytes, so the delta lines that you
   receive have the same unknown/inconsistent encodings as *a* and *b*.

   .. versionadded:: 3.5

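To illustrate :func:`diff_bytes`, the sketch below compares the same word
stored in two different encodings. The file names and byte strings are
invented for the example.

```python
from difflib import diff_bytes, unified_diff

# The same word encoded as Latin-1 in one input and as UTF-8 in the other.
old = [b"caf\xe9\n"]
new = [b"caf\xc3\xa9\n"]

# All arguments except n are bytes, including the file names.
delta = list(diff_bytes(unified_diff, old, new,
                        fromfile=b"old.txt", tofile=b"new.txt"))

for line in delta:
    print(line)
```

The delta lines are bytes objects that carry the original byte sequences
unchanged, so they can be written to a file opened in binary mode.
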
.. function:: IS_LINE_JUNK(line)

   Return ``True`` for ignorable lines.  The line *line* is ignorable if *line* is
   blank or contains a single ``'#'``, otherwise it is not ignorable.  Used as a
   default for parameter *linejunk* in :func:`ndiff` in older versions.


.. function:: IS_CHARACTER_JUNK(ch)

   Return ``True`` for ignorable characters.  The character *ch* is ignorable if *ch*
   is a space or tab, otherwise it is not ignorable.  Used as a default for
   parameter *charjunk* in :func:`ndiff`.


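The two junk predicates can be called directly to check how a given line or
character would be classified. The sample inputs below are arbitrary.

```python
from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK

# Blank lines and lines holding only a single '#' are ignorable.
for line in ["\n", "  #   \n", "hello\n", "# comment\n"]:
    print(repr(line), IS_LINE_JUNK(line))

# Only space and tab are ignorable characters; newline is not.
for ch in [" ", "\t", "\n", "x"]:
    print(repr(ch), IS_CHARACTER_JUNK(ch))
```
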
.. seealso::

   `Pattern Matching: The Gestalt Approach <https://www.drdobbs.com/database/pattern-matching-the-gestalt-approach/184407970>`_
      Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
      was published in `Dr. Dobb's Journal <https://www.drdobbs.com/>`_ in July, 1988.


.. _sequence-matcher:

SequenceMatcher Objects
-----------------------

The :class:`SequenceMatcher` class has this constructor:


.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

   Optional argument *isjunk* must be ``None`` (the default) or a one-argument
   function that takes a sequence element and returns true if and only if the
   element is "junk" and should be ignored. Passing ``None`` for *isjunk* is
   equivalent to passing ``lambda x: False``; in other words, no elements are ignored.
   For example, pass::

      lambda x: x in " \t"

   if you're comparing lines as sequences of characters, and don't want to synch up
   on blanks or hard tabs.

   The optional arguments *a* and *b* are sequences to be compared; both default to
   empty strings.  The elements of both sequences must be :term:`hashable`.

   The optional argument *autojunk* can be used to disable the automatic junk
   heuristic.

   .. versionchanged:: 3.2
      Added the *autojunk* parameter.

   SequenceMatcher objects get three data attributes: *bjunk* is the
   set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set of
   non-junk elements considered popular by the heuristic (if it is not
   disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
   of positions where they occur. All three are reset whenever *b* is reset
   with :meth:`set_seqs` or :meth:`set_seq2`.

   .. versionadded:: 3.2
      The *bjunk* and *bpopular* attributes.

   :class:`SequenceMatcher` objects have the following methods:

   .. method:: set_seqs(a, b)

      Set the two sequences to be compared.

   :class:`SequenceMatcher` computes and caches detailed information about the
   second sequence, so if you want to compare one sequence against many
   sequences, use :meth:`set_seq2` to set the commonly used sequence once and
   call :meth:`set_seq1` repeatedly, once for each of the other sequences.


   .. method:: set_seq1(a)

      Set the first sequence to be compared.  The second sequence to be compared
      is not changed.


   .. method:: set_seq2(b)

      Set the second sequence to be compared.  The first sequence to be compared
      is not changed.


   .. method:: find_longest_match(alo=0, ahi=None, blo=0, bhi=None)

      Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.

      If *isjunk* was omitted or ``None``, :meth:`find_longest_match` returns
      ``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
      <= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
      k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
      <= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
      all maximal matching blocks, return one that starts earliest in *a*, and
      of all those maximal matching blocks that start earliest in *a*, return
      the one that starts earliest in *b*.

         >>> s = SequenceMatcher(None, " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=0, b=4, size=5)

      If *isjunk* was provided, first the longest matching block is determined
      as above, but with the additional restriction that no junk element appears
      in the block.  Then that block is extended as far as possible by matching
      (only) junk elements on both sides. So the resulting block never matches
      on junk except as identical junk happens to be adjacent to an interesting
      match.

      Here's the same example as before, but considering blanks to be junk. That
      prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
      second sequence directly.  Instead only the ``'abcd'`` can match, and
      matches the leftmost ``'abcd'`` in the second sequence:

         >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
         >>> s.find_longest_match(0, 5, 0, 9)
         Match(a=1, b=0, size=4)

      If no blocks match, this returns ``(alo, blo, 0)``.

      This method returns a :term:`named tuple` ``Match(a, b, size)``.

      .. versionchanged:: 3.9
         Added default arguments.


   .. method:: get_matching_blocks()

      Return list of triples describing non-overlapping matching subsequences.
      Each triple is of the form ``(i, j, n)``,
      and means that ``a[i:i+n] == b[j:j+n]``.  The
      triples are monotonically increasing in *i* and *j*.

      The last triple is a dummy, and has the value ``(len(a), len(b), 0)``.  It
      is the only triple with ``n == 0``.  If ``(i, j, n)`` and ``(i', j', n')``
      are adjacent triples in the list, and the second is not the last triple in
      the list, then ``i+n < i'`` or ``j+n < j'``; in other words, adjacent
      triples always describe non-adjacent equal blocks.

      .. XXX Explain why a dummy is used!

      .. doctest::

         >>> s = SequenceMatcher(None, "abxcd", "abcd")
         >>> s.get_matching_blocks()
         [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]


   .. method:: get_opcodes()

      Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is
      of the form ``(tag, i1, i2, j1, j2)``.  The first tuple has ``i1 == j1 ==
      0``, and remaining tuples have *i1* equal to the *i2* from the preceding
      tuple, and, likewise, *j1* equal to the previous *j2*.

      The *tag* values are strings, with these meanings:

      +---------------+---------------------------------------------+
      | Value         | Meaning                                     |
      +===============+=============================================+
      | ``'replace'`` | ``a[i1:i2]`` should be replaced by          |
      |               | ``b[j1:j2]``.                               |
      +---------------+---------------------------------------------+
      | ``'delete'``  | ``a[i1:i2]`` should be deleted.  Note that  |
      |               | ``j1 == j2`` in this case.                  |
      +---------------+---------------------------------------------+
      | ``'insert'``  | ``b[j1:j2]`` should be inserted at          |
      |               | ``a[i1:i1]``. Note that ``i1 == i2`` in     |
      |               | this case.                                  |
      +---------------+---------------------------------------------+
      | ``'equal'``   | ``a[i1:i2] == b[j1:j2]`` (the sub-sequences |
      |               | are equal).                                 |
      +---------------+---------------------------------------------+

      For example::

        >>> a = "qabxcd"
        >>> b = "abycdf"
        >>> s = SequenceMatcher(None, a, b)
        >>> for tag, i1, i2, j1, j2 in s.get_opcodes():
        ...     print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
        ...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
        delete    a[0:1] --> b[0:0]      'q' --> ''
        equal     a[1:3] --> b[0:2]     'ab' --> 'ab'
        replace   a[3:4] --> b[2:3]      'x' --> 'y'
        equal     a[4:6] --> b[3:5]     'cd' --> 'cd'
        insert    a[6:6] --> b[5:6]       '' --> 'f'


   .. method:: get_grouped_opcodes(n=3)

      Return a :term:`generator` of groups with up to *n* lines of context.

      Starting with the groups returned by :meth:`get_opcodes`, this method
      splits out smaller change clusters and eliminates intervening ranges which
      have no changes.

      The groups are returned in the same format as :meth:`get_opcodes`.


   .. method:: ratio()

      Return a measure of the sequences' similarity as a float in the range [0,
      1].

      Where T is the total number of elements in both sequences, and M is the
      number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
      sequences are identical, and ``0.0`` if they have nothing in common.

      This is expensive to compute if :meth:`get_matching_blocks` or
      :meth:`get_opcodes` hasn't already been called, in which case you may want
      to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
      upper bound.

      .. note::

         Caution: The result of a :meth:`ratio` call may depend on the order of
         the arguments. For instance::

            >>> SequenceMatcher(None, 'tide', 'diet').ratio()
            0.25
            >>> SequenceMatcher(None, 'diet', 'tide').ratio()
            0.5


   .. method:: quick_ratio()

      Return an upper bound on :meth:`ratio` relatively quickly.


   .. method:: real_quick_ratio()

      Return an upper bound on :meth:`ratio` very quickly.


The three methods that return the ratio of matching to total characters can give
different results due to differing levels of approximation, although
:meth:`~SequenceMatcher.quick_ratio` and :meth:`~SequenceMatcher.real_quick_ratio`
are always at least as large as :meth:`~SequenceMatcher.ratio`:

   >>> s = SequenceMatcher(None, "abcd", "bcde")
   >>> s.ratio()
   0.75
   >>> s.quick_ratio()
   0.75
   >>> s.real_quick_ratio()
   1.0


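Because the two quick methods are upper bounds on
:meth:`~SequenceMatcher.ratio`, they can serve as cheap filters before the
expensive computation. The helper below is a hypothetical sketch of that
pattern, not part of the module; its name and default cutoff are assumptions
made for the example.

```python
from difflib import SequenceMatcher

def is_close(candidate, target, cutoff=0.6):
    """Hypothetical helper: report whether ratio() reaches *cutoff*."""
    matcher = SequenceMatcher(None, candidate, target)
    # Cheapest check first. Each value is an upper bound on ratio(), so a
    # failure here means ratio() cannot reach the cutoff either.
    return (matcher.real_quick_ratio() >= cutoff
            and matcher.quick_ratio() >= cutoff
            and matcher.ratio() >= cutoff)

print(is_close("abcd", "bcde"))  # True
print(is_close("abcd", "wxyz"))  # False
```

In the second call :meth:`~SequenceMatcher.quick_ratio` already returns
``0.0``, so :meth:`~SequenceMatcher.ratio` is never evaluated.
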
.. _sequencematcher-examples:

SequenceMatcher Examples
------------------------

This example compares two strings, considering blanks to be "junk":

   >>> s = SequenceMatcher(lambda x: x == " ",
   ...                     "private Thread currentThread;",
   ...                     "private volatile Thread currentThread;")

:meth:`~SequenceMatcher.ratio` returns a float in [0, 1], measuring the similarity of the
sequences.  As a rule of thumb, a :meth:`~SequenceMatcher.ratio` value over 0.6 means the
sequences are close matches:

   >>> print(round(s.ratio(), 3))
   0.866

If you're only interested in where the sequences match,
:meth:`~SequenceMatcher.get_matching_blocks` is handy:

   >>> for block in s.get_matching_blocks():
   ...     print("a[%d] and b[%d] match for %d elements" % block)
   a[0] and b[0] match for 8 elements
   a[8] and b[17] match for 21 elements
   a[29] and b[38] match for 0 elements

Note that the last tuple returned by :meth:`~SequenceMatcher.get_matching_blocks`
is always a dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
tuple element (number of elements matched) is ``0``.

If you want to know how to change the first sequence into the second, use
:meth:`~SequenceMatcher.get_opcodes`:

   >>> for opcode in s.get_opcodes():
   ...     print("%6s a[%d:%d] b[%d:%d]" % opcode)
    equal a[0:8] b[0:8]
   insert a[8:8] b[8:17]
    equal a[8:29] b[17:38]

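The opcodes are enough to rebuild the second sequence from the first. The
function below is a hypothetical helper written for this illustration; it
assumes both sequences are strings.

```python
from difflib import SequenceMatcher

def apply_opcodes(a, b):
    """Hypothetical helper: rebuild *b* from *a* by following the opcodes."""
    pieces = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            pieces.append(a[i1:i2])   # keep the unchanged run from a
        elif tag in ("replace", "insert"):
            pieces.append(b[j1:j2])   # take the new material from b
        # 'delete' contributes nothing to the result
    return "".join(pieces)

a = "private Thread currentThread;"
b = "private volatile Thread currentThread;"
print(apply_opcodes(a, b) == b)  # True
```
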
.. seealso::

   * The :func:`get_close_matches` function in this module which shows how
     simple code building on :class:`SequenceMatcher` can be used to do useful
     work.

   * `Simple version control recipe
     <https://code.activestate.com/recipes/576729-simple-version-control/>`_ for a small application
     built with :class:`SequenceMatcher`.


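Since :class:`SequenceMatcher` caches its analysis of the second sequence,
comparing many candidates against one fixed target is best done by setting the
target once with :meth:`~SequenceMatcher.set_seq2`. The following sketch
assumes a small, made-up list of candidate strings.

```python
from difflib import SequenceMatcher

target = "private Thread currentThread;"
# Hypothetical candidates to score against the fixed target.
candidates = [
    "private volatile Thread currentThread;",
    "public Thread currentThread;",
    "int counter;",
]

matcher = SequenceMatcher()
matcher.set_seq2(target)          # analysed once, then cached

scores = {}
for candidate in candidates:
    matcher.set_seq1(candidate)   # only the first sequence changes
    scores[candidate] = matcher.ratio()

for candidate, score in scores.items():
    print(f"{score:.3f}  {candidate}")
```

Each score equals what a fresh ``SequenceMatcher(None, candidate, target)``
would report; the reuse only avoids repeating the analysis of *target*.
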
.. _differ-objects:

Differ Objects
--------------

Note that :class:`Differ`\ -generated deltas make no claim to be **minimal**
diffs. To the contrary, minimal diffs are often counter-intuitive, because they
synch up anywhere possible, sometimes on accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.

The :class:`Differ` class has this constructor:


.. class:: Differ(linejunk=None, charjunk=None)
   :noindex:

   Optional keyword parameters *linejunk* and *charjunk* are for filter functions
   (or ``None``):

   *linejunk*: A function that accepts a single string argument, and returns true
   if the string is junk.  The default is ``None``, meaning that no line is
   considered junk.

   *charjunk*: A function that accepts a single character argument (a string of
   length 1), and returns true if the character is junk. The default is ``None``,
   meaning that no character is considered junk.

   These junk-filtering functions speed up matching to find
   differences and do not cause any differing lines or characters to
   be ignored.  Read the description of the
   :meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
   parameter for an explanation.

   :class:`Differ` objects are used (deltas generated) via a single method:


   .. method:: Differ.compare(a, b)

      Compare two sequences of lines, and generate the delta (a sequence of lines).

      Each sequence must contain individual single-line strings ending with
      newlines.  Such sequences can be obtained from the
      :meth:`~io.IOBase.readlines` method of file-like objects.  The delta
      generated also consists of newline-terminated strings, ready to be
      printed as-is via the :meth:`~io.IOBase.writelines` method of a
      file-like object.


.. _differ-examples:

Differ Example
--------------

This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the :meth:`~io.IOBase.readlines` method of file-like objects):

   >>> text1 = '''  1. Beautiful is better than ugly.
   ...   2. Explicit is better than implicit.
   ...   3. Simple is better than complex.
   ...   4. Complex is better than complicated.
   ... '''.splitlines(keepends=True)
   >>> len(text1)
   4
   >>> text1[0][-1]
   '\n'
   >>> text2 = '''  1. Beautiful is better than ugly.
   ...   3.   Simple is better than complex.
   ...   4. Complicated is better than complex.
   ...   5. Flat is better than nested.
   ... '''.splitlines(keepends=True)

Next we instantiate a Differ object:

   >>> d = Differ()

Note that when instantiating a :class:`Differ` object we may pass functions to
filter out line and character "junk."  See the :class:`Differ` constructor for
details.

Finally, we compare the two:

   >>> result = list(d.compare(text1, text2))

``result`` is a list of strings, so let's pretty-print it:

   >>> from pprint import pprint
   >>> pprint(result)
   ['    1. Beautiful is better than ugly.\n',
    '-   2. Explicit is better than implicit.\n',
    '-   3. Simple is better than complex.\n',
    '+   3.   Simple is better than complex.\n',
    '?     ++\n',
    '-   4. Complex is better than complicated.\n',
    '?            ^                     ---- ^\n',
    '+   4. Complicated is better than complex.\n',
    '?           ++++ ^                      ^\n',
    '+   5. Flat is better than nested.\n']

As a single multi-line string it looks like this:

   >>> import sys
   >>> sys.stdout.writelines(result)
       1. Beautiful is better than ugly.
   -   2. Explicit is better than implicit.
   -   3. Simple is better than complex.
   +   3.   Simple is better than complex.
   ?     ++
   -   4. Complex is better than complicated.
   ?            ^                     ---- ^
   +   4. Complicated is better than complex.
   ?           ++++ ^                      ^
   +   5. Flat is better than nested.


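A :class:`Differ` delta can also be post-processed using the two-character
codes at the start of each line. The sketch below uses short made-up inputs
and shows a tally by code, removal of the ``'? '`` hint lines, and recovery of
both inputs with :func:`restore`.

```python
from collections import Counter
from difflib import Differ, restore

text1 = ["one\n", "two\n", "three\n"]
text2 = ["ore\n", "tree\n", "emu\n"]
result = list(Differ().compare(text1, text2))

# Tally the delta lines by their code: '- ', '+ ', '  ' or '? '.
codes = Counter(line[:2] for line in result)
print(codes)

# Drop the '? ' hint lines, which are not present in either input.
without_hints = [line for line in result if not line.startswith("? ")]

# Either input can be recovered from the full delta.
print(list(restore(result, 1)) == text1)  # True
print(list(restore(result, 2)) == text2)  # True
```
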
.. _difflib-interface:

A command-line interface to difflib
-----------------------------------

This example shows how to use difflib to create a ``diff``-like utility.

.. literalinclude:: ../includes/diff.py

ndiff example
-------------

This example shows how to use :func:`difflib.ndiff`.

.. literalinclude:: ../includes/ndiff.py
