difflib.py - OpenGrok cross reference for /external/python/cpython3/Lib/difflib.py

Lines Matching +full:diff +full:- +full:sequences
2 Module difflib -- helpers for computing deltas between objects.
8     For two lists of strings, return a delta in context diff format.
14     Return one of the two sequences that generated an ndiff delta.
17     For two lists of strings, return a delta in unified diff format.
20     A flexible class for comparing pairs of sequences of any type.
23     For producing human-readable deltas from sequences of lines of text.
47     SequenceMatcher is a flexible class for comparing pairs of sequences of
53     elements (R-O doesn't address junk).  The same idea is then applied
54     recursively to the pieces of the sequences to the left and to the right
56     sequences, but does tend to yield matches that "look right" to people.
58     SequenceMatcher tries to compute a "human-friendly diff" between two
59     sequences.  Unlike e.g. UNIX(tm) diff, the fundamental notion is the
60     longest *contiguous* & junk-free matching subsequence.  That's what
64     reports than does diff.  This method appears to be the least vulnerable
78     sequences.  As a rule of thumb, a .ratio() value over 0.6 means the
79     sequences are close matches:
85     If you're only interested in where the sequences match,
107     See the Differ class for a fancy human-friendly file differencer, which
108     uses SequenceMatcher both to compare sequences of lines, and to compare
109     sequences of characters within similar (near-matching) lines.
114     Timing:  Basic R-O is cubic time worst case and quadratic time expected
116     expected-case behavior dependent in a complicated way on how many
117     elements the sequences have in common; best case time is linear.
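The class docstring describes `ratio()` and the matching blocks only in prose. The sketch below uses the public `SequenceMatcher` API and reuses the docstring's own example, where blanks are treated as junk. The two Java-like strings are only illustrative input.

```python
from difflib import SequenceMatcher

# Treat blanks as junk, as in the SequenceMatcher class docstring example.
s = SequenceMatcher(lambda x: x == " ",
                    "private Thread currentThread;",
                    "private volatile Thread currentThread;")

# ratio() is a float in [0, 1]; as a rule of thumb, over 0.6 means a close match.
score = round(s.ratio(), 3)
print(score)

# Each block means a[block.a:block.a+size] == b[block.b:block.b+size];
# the final block is always a (len(a), len(b), 0) sentinel.
for block in s.get_matching_blocks():
    print(f"a[{block.a}] and b[{block.b}] match for {block.size} elements")
```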
123         Optional arg isjunk is None (the default), or a one-argument
128         if you're comparing lines as sequences of characters, and don't
131         Optional arg a is the first of two sequences to be compared.  By
135         Optional arg b is the second of two sequences to be compared.  By
159         #      ascending & non-overlapping in i and in j; terminated by
169         #      a user-supplied function taking a sequence element and
170         #      returning true iff the element is "junk" -- this has
185         """Set the two sequences to be compared.
211         many sequences, use .set_seq2(S) once and call .set_seq1(x)
212         repeatedly for each of the other sequences.
237         many sequences, use .set_seq2(S) once and call .set_seq1(x)
238         repeatedly for each of the other sequences.
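The `set_seq1`/`set_seq2` docstrings recommend a specific pattern for comparing one sequence against many others. Because `SequenceMatcher` caches detailed information about the second sequence, you set `b` once and vary only `a`. The sketch below follows that pattern. The target and candidate words are made up for illustration.

```python
from difflib import SequenceMatcher

target = "apple"
candidates = ["ape", "apple", "peach"]

sm = SequenceMatcher()
sm.set_seq2(target)            # b is analyzed (and cached) once, here
scores = {}
for word in candidates:
    sm.set_seq1(word)          # only a changes; the cached analysis of b is reused
    scores[word] = sm.ratio()

best = max(scores, key=scores.get)
print(best, scores)
```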
259     # be viewed as an adaptive notion of semi-junk, and yields an enormous
262     # note that this is only called when b changes; so for cross-product
267         # Because isjunk is a user-defined (not C) function, and we test
270         # time-consuming routine in the whole module!  If anyone sees
271         # Jim Roskind, thank him again for profile.py -- I never would
356         # stripped, it's "a" (tied with "b").  UNIX(tm) diff does so
360         # Windiff ends up at the same place as diff, but by pairing up
369         # find longest junk-free match
371         # junk-free match ending with a[i-1] and b[j]
385                 k = newj2len[j] = j2lenget(j-1, 0) + 1
387                     besti, bestj, bestsize = i-k+1, j-k+1, k
390         # Extend the best by non-junk elements on each end.  In particular,
391         # "popular" non-junk elements aren't in b2j, which greatly speeds
393         # doesn't contain any junk *or* popular non-junk elements.
395               not isbjunk(b[bestj-1]) and \
396               a[besti-1] == b[bestj-1]:
397             besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
406         # saves post-processing the (possibly considerable) expense of
411               isbjunk(b[bestj-1]) and \
412               a[besti-1] == b[bestj-1]:
413             besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
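These hits come from `find_longest_match`. It first finds the longest matching block that contains no junk, then extends that block over junk elements on both ends. The sketch below follows the method's docstring example to show how declaring the blank as junk changes the result.

```python
from difflib import SequenceMatcher

# No junk: the longest match can include the leading blank of a.
plain = SequenceMatcher(None, " abcd", "abcd abcd")
m1 = plain.find_longest_match(0, 5, 0, 9)

# Blank declared junk: a[0] can no longer anchor a match, so the
# first "abcd" of b wins instead.
junky = SequenceMatcher(lambda x: x == " ", " abcd", "abcd abcd")
m2 = junky.find_longest_match(0, 5, 0, 9)

print(m1)  # Match(a=..., b=..., size=...)
print(m2)
```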
474                 # Yes, so collapse them -- this just increases the length of
493         """Return list of 5-tuples describing how to turn a into b.
529             # out a diff to change a[i:ai] into b[j:bj], pump out
578             codes[0] = tag, max(i1, i2-n), i2, max(j1, j2-n), j2
579         if codes[-1][0] == 'equal':
580             tag, i1, i2, j1, j2 = codes[-1]
581             codes[-1] = tag, i1, min(i2, i1+n), j1, min(j2, j1+n)
588             if tag == 'equal' and i2-i1 > nn:
592                 i1, j1 = max(i1, i2-n), max(j1, j2-n)
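`get_opcodes()` returns 5-tuples describing how to turn `a` into `b`. `get_grouped_opcodes(n)` trims those opcodes into hunks with at most `n` elements of context. The sketch below shows both methods. `apply_opcodes` is a hypothetical helper written only to demonstrate that the opcodes really do rebuild `b`; it is not part of difflib.

```python
from difflib import SequenceMatcher

def apply_opcodes(a, b, opcodes):
    """Hypothetical helper (not in difflib): rebuild b from a using opcodes."""
    pieces = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            pieces.append(a[i1:i2])   # unchanged slice, copied from a
        elif tag in ("replace", "insert"):
            pieces.append(b[j1:j2])   # new material, taken from b
        # "delete" contributes nothing
    return "".join(pieces)

# get_opcodes(): (tag, i1, i2, j1, j2) tuples covering both sequences in order.
a, b = "qabxcd", "abycdf"
ops = SequenceMatcher(None, a, b).get_opcodes()
for tag, i1, i2, j1, j2 in ops:
    print(f"{tag:>7} a[{i1}:{i2}] ({a[i1:i2]}) b[{j1}:{j2}] ({b[j1:j2]})")
rebuilt = apply_opcodes(a, b, ops)

# get_grouped_opcodes(n): an "equal" run longer than 2*n splits the changes
# into separate groups, each keeping at most n elements of context.
x, y = "0123456789", "0X234567Y9"
groups = list(SequenceMatcher(None, x, y).get_grouped_opcodes(n=1))
for group in groups:
    print(group)
```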
598         """Return a measure of the sequences' similarity (float in [0,1]).
600         Where T is the total number of elements in both sequences, and
602         Note that this is 1 if the sequences are identical, and 0 if
619         matches = sum(triple[-1] for triple in self.get_matching_blocks())
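The `ratio()` docstring defines the measure as 2.0*M / T. Here T is the total number of elements in both sequences and M is the number of matched elements, which the implementation sums from `get_matching_blocks()`. The short check below recomputes that formula by hand.

```python
from difflib import SequenceMatcher

s = SequenceMatcher(None, "abcd", "bcde")
M = sum(block.size for block in s.get_matching_blocks())  # matched elements ("bcd")
T = len(s.a) + len(s.b)                                    # elements in both sequences
manual = 2.0 * M / T
print(M, T, manual, s.ratio())
```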
646             avail[elt] = numb - 1
672     possibilities is a list of sequences against which to match word
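`get_close_matches` scores each entry in `possibilities` against `word` with `SequenceMatcher.ratio()`. It keeps scores at or above `cutoff` (default 0.6) and returns at most `n` (default 3), best first. The sketch below uses the word list from the function's docstring.

```python
from difflib import get_close_matches

words = ["ape", "apple", "peach", "puppy"]
best = get_close_matches("appel", words)           # defaults: n=3, cutoff=0.6
top_one = get_close_matches("appel", words, n=1)   # only the single best match
print(best, top_one)
```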
726     Differ is a class for comparing sequences of lines of text, and
727     producing human-readable differences or deltas.  Differ uses
728     SequenceMatcher both to compare sequences of lines, and to compare
729     sequences of characters within similar (near-matching) lines.
731     Each line of a Differ delta begins with a two-letter code:
733         '- '    line unique to sequence 1
735         '  '    line common to both sequences
740     can be confusing if the sequences contain tab characters.
742     Note that Differ makes no claim to produce a *minimal* diff.  To the
743     contrary, minimal diffs are often counter-intuitive, because they synch
746     locality, at the occasional cost of producing a longer diff.
750     First we set up the texts, sequences of individual single-line strings
751     ending with newlines (such sequences can also be obtained from the
752     `readlines()` method of file-like objects):
761     >>> text1[0][-1]
780     'result' is a list of strings, so let's pretty-print it:
785      '-   2. Explicit is better than implicit.\n',
786      '-   3. Simple is better than complex.\n',
789      '-   4. Complex is better than complicated.\n',
790      '?            ^                     ---- ^\n',
795     As a single multi-line string it looks like this:
799     -   2. Explicit is better than implicit.
800     -   3. Simple is better than complex.
803     -   4. Complex is better than complicated.
804     ?            ^                     ---- ^
816         - `linejunk`: A function that should accept a single string argument,
817           and return true iff the string is junk. The module-level function
824         - `charjunk`: A function that should accept a string of length 1. The
825           module-level function `IS_CHARACTER_JUNK` may be used to filter out
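The `linejunk` and `charjunk` arguments are plain predicates. The sketch below shows what the module-level `IS_LINE_JUNK` and `IS_CHARACTER_JUNK` return for a few inputs and passes them to `ndiff`, which accepts the same keyword arguments. The sample lines are illustrative only.

```python
from difflib import IS_CHARACTER_JUNK, IS_LINE_JUNK, ndiff

# IS_CHARACTER_JUNK: true for a blank or a tab.
char_flags = [IS_CHARACTER_JUNK(c) for c in (" ", "\t", "x")]

# IS_LINE_JUNK: true for lines that are blank or contain only a single '#',
# apart from whitespace.
line_flags = [IS_LINE_JUNK(s) for s in ("\n", "  #  \n", "code\n")]
print(char_flags, line_flags)

# Passing the predicates explicitly; every output line starts with a
# two-letter Differ code.
delta = list(ndiff(["a = 1\n", "\n"], ["a = 2\n", "\n"],
                   linejunk=IS_LINE_JUNK, charjunk=IS_CHARACTER_JUNK))
print("".join(delta), end="")
```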
835         Compare two sequences of lines; generate the resulting delta.
837         Each sequence must contain individual single-line strings ending with
838         newlines. Such sequences can be obtained from the `readlines()` method
839         of file-like objects.  The delta generated also consists of newline-
840         terminated strings, ready to be printed as-is via the writelines()
841         method of a file-like object.
848         - one
852         - two
853         - three
854         ?  -
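The fragments above come from the `compare()` docstring example. Below, the same input is run through `Differ` and the output kept as a list, which makes the `- `, `+ ` and `? ` codes easy to inspect. The `? ` lines carry the intra-line markers (`^` changed, `-` deleted).

```python
from difflib import Differ

before = "one\ntwo\nthree\n".splitlines(keepends=True)
after = "ore\ntree\nemu\n".splitlines(keepends=True)

result = list(Differ().compare(before, after))
print("".join(result), end="")
```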
864                 g = self._dump('-', a, alo, ahi)
875         """Generate comparison results for a same-tagged range."""
881         # dump the shorter block first -- reduces the burden on short-term
883         if bhi - blo < ahi - alo:
885             second = self._dump('-', a, alo, ahi)
887             first  = self._dump('-', a, alo, ahi)
896         for *similar* lines; the best-matching pair (if any) is used as a
906         - abcDefghiJkl
920         # on junk -- unless we have to)
932                 # upper bounds first -- have seen this speed up messy
942             # no non-identical "pretty close" pair
944                 # no identical pair either -- treat it as a straight replace
947             # no close pair, but an identical pair -- synch up on that
962             # pump out a '-', '?', '+', '?' quad for the synched lines
966                 la, lb = ai2 - ai1, bj2 - bj1
971                     atags += '-' * la
993                 g = self._dump('-', a, alo, ahi)
1010         '- \tabcDefghiJkl\n'
1018         yield "- " + aline
1034 # looks for matching blocks that are entirely junk-free, then extends the
1081 ###  Unified Diff
1086     # Per the diff spec at http://www.unix.org/single_unix_specification/
1088     length = stop - start
1092         beginning -= 1        # empty ranges begin at line just before the range
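`_format_range_unified` is private, so this sketch does not call it. Instead it observes the helper's effect through the public `unified_diff`. A range of exactly one line prints as a single number, and an empty range begins at the line just before it, which yields `0,0` for an empty file.

```python
from difflib import unified_diff

# Adding one line to an empty file: old range is empty, new range is one line.
added = list(unified_diff([], ["x"], lineterm=""))
# Deleting the only line of a file: the mirror image.
removed = list(unified_diff(["x"], [], lineterm=""))
print(added)
print(removed)
```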
1098     Compare two sequences of lines; generate the delta as a unified diff.
1104     By default, the diff control lines (those with ---, +++, or @@) are
1122     ...             '2005-01-26 23:30:50', '2010-04-02 10:20:52',
1125     --- Original        2005-01-26 23:30:50
1126     +++ Current         2010-04-02 10:20:52
1127     @@ -1,4 +1,4 @@
1130     -two
1131     -three
1143             yield '--- {}{}{}'.format(fromfile, fromdate, lineterm)
1146         first, last = group[0], group[-1]
1149         yield '@@ -{} +{} @@{}'.format(file1_range, file2_range, lineterm)
1158                     yield '-' + line
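The `unified_diff` docstring example can be reproduced with `lineterm=""`, which suits inputs that carry no trailing newlines. The file dates are copied from that docstring. Note that a date is separated from its file name by a tab.

```python
from difflib import unified_diff

lines = list(unified_diff("one two three four".split(),
                          "zero one tree four".split(),
                          "Original", "Current",
                          "2005-01-26 23:30:50", "2010-04-02 10:20:52",
                          lineterm=""))
for line in lines:
    print(line)
```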
1165 ###  Context Diff
1170     # Per the diff spec at http://www.unix.org/single_unix_specification/
1172     length = stop - start
1174         beginning -= 1        # empty ranges begin at line just before the range
1177     return '{},{}'.format(beginning, beginning + length - 1)
1183     Compare two sequences of lines; generate the delta as a context diff.
1189     By default, the diff control lines (those with *** or ---) are
1198     The context diff format normally has a header for filenames and
1210     --- Current
1217     --- 1,4 ----
1225     prefix = dict(insert='+ ', delete='- ', replace='! ', equal='  ')
1233             yield '--- {}{}{}'.format(tofile, todate, lineterm)
1235         first, last = group[0], group[-1]
1248         yield '--- {} ----{}'.format(file2_range, lineterm)
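The context format prints each hunk twice: first the old lines under `*** start,end ****`, then the new lines under `--- start,end ----`. Each line is prefixed with `! ` (changed), `+ ` (inserted), `- ` (deleted) or two blanks (context). The sketch below uses the same input as the `context_diff` docstring, with `lineterm=""` so no newline handling is needed.

```python
from difflib import context_diff

lines = list(context_diff("one two three four".split(),
                          "zero one tree four".split(),
                          "Original", "Current", lineterm=""))
for line in lines:
    print(line)
```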
1260     #   --- b'oldfile.txt'
1276     Compare `a` and `b`, two sequences of lines represented as bytes rather
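`diff_bytes` wraps a str-based diff function such as `unified_diff` so it can be used on bytes. Every argument, including the file names, must be bytes, and each output line comes back as bytes. The sketch below assumes two small in-memory "files"; the names `old.txt` and `new.txt` are illustrative.

```python
from difflib import diff_bytes, unified_diff

old = [b"one\n", b"two\n"]
new = [b"one\n", b"three\n"]

delta = list(diff_bytes(unified_diff, old, new,
                        fromfile=b"old.txt", tofile=b"new.txt"))
for line in delta:
    print(line)
```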
1305     Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
1310     - linejunk: A function that should accept a single string argument and
1315     - charjunk: A function that accepts a character (string of length
1317       the module-level function IS_CHARACTER_JUNK, which filters out
1321     Tools/scripts/ndiff.py is a command-line front-end to this function.
1325     >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
1327     >>> print(''.join(diff), end="")
1328     - one
1332     - two
1333     - three
1334     ?  -
1345     fromlines -- list of text lines to be compared to tolines
1346     tolines -- list of text lines to be compared to fromlines
1347     context -- number of context lines to display on each side of difference,
1349     linejunk -- passed on to ndiff (see ndiff documentation)
1350     charjunk -- passed on to ndiff (see ndiff documentation)
1355     from/to line tuple -- (line num, line text)
1356         line num -- integer or None (to indicate a context separation)
1357         line text -- original line text with following markers inserted:
1358             '\0+' -- marks start of added text
1359             '\0-' -- marks start of deleted text
1360             '\0^' -- marks start of changed text
1361             '\1' -- marks end of added/deleted/changed text
1363     boolean flag -- None indicates context separation, True indicates
1377     change_re = re.compile(r'(\++|\-+|\^+)')
1385         lines -- list of lines from the ndiff generator to produce a line of
1388         format_key -- '+' return first line in list with "add" markup around
1390                       '-' return first line in list with "delete" markup around
1395         side -- index into the num_lines list (0=from,1=to)
1396         num_lines -- from/to current line number.  This is NOT intended to be a
1466             elif s.startswith('-?+?'):
1470             elif s.startswith('--++'):
1473                 num_blanks_pending -= 1
1474                 yield _make_line(lines,'-',0), None, True
1476             elif s.startswith(('--?+', '--+', '- ')):
1479                 from_line,to_line = _make_line(lines,'-',0), None
1480                 num_blanks_to_yield,num_blanks_pending = num_blanks_pending-1,0
1481             elif s.startswith('-+?'):
1485             elif s.startswith('-?+'):
1489             elif s.startswith('-'):
1491                 num_blanks_pending -= 1
1492                 yield _make_line(lines,'-',0), None, True
1494             elif s.startswith('+--'):
1500             elif s.startswith(('+ ', '+-')):
1519                 num_blanks_to_yield -= 1
1593                 lines_to_write -= 1
1595             lines_to_write = context-1
1601                         lines_to_write = context-1
1603                         lines_to_write -= 1
1611 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
1612           "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
1617     <meta http-equiv="Content-Type"
1631         table.diff {font-family:Courier; border:medium;}
1632         .diff_header {background-color:#e0e0e0}
1633         td.diff_header {text-align:right}
1634         .diff_next {background-color:#c0c0c0}
1635         .diff_add {background-color:#aaffaa}
1636         .diff_chg {background-color:#ffff77}
1637         .diff_sub {background-color:#ffaaaa}"""
1640     <table class="diff" id="difflib_chg_%(prefix)s_top"
1650     <table class="diff" summary="Legends">
1671     of text with inter-line and intra-line change highlights.  The table can
1676     make_table -- generates HTML for a single side by side table
1677     make_file -- generates complete HTML file with a single side by side table
1679     See Tools/scripts/diff.py for an example usage of this class.
1693         tabsize -- tab stop spacing, defaults to 8.
1694         wrapcolumn -- column number where lines are broken and wrapped,
1696         linejunk,charjunk -- keyword arguments passed into ndiff() (used by
1706                   context=False, numlines=5, *, charset='utf-8'):
1710         fromlines -- list of "from" lines
1711         tolines -- list of "to" lines
1712         fromdesc -- "from" file column header string
1713         todesc -- "to" file column header string
1714         context -- set to True for contextual differences (defaults to False
1716         numlines -- number of context lines.  When context is set True,
1721         charset -- charset of the HTML document
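The `HtmlDiff` docstrings name two entry points. `make_table` returns only the side-by-side table, for embedding in a larger page. `make_file` returns a complete HTML document and, with `context=True`, shows only the changed regions plus `numlines` lines around them. The sketch below calls both with illustrative input and parameter values.

```python
from difflib import HtmlDiff

before = ["one\n", "two\n", "three\n"]
after = ["ore\n", "tree\n", "emu\n"]

differ = HtmlDiff(tabsize=4, wrapcolumn=40)

# Just the <table>, with "before"/"after" as column headers.
table = differ.make_table(before, after, fromdesc="before", todesc="after")

# A full page showing only changes plus one line of context around each.
page = differ.make_file(before, after, fromdesc="before", todesc="after",
                        context=True, numlines=1)
print(len(table), len(page))
```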
1772         if (size <= max) or ((size -(text.count('\0')*3)) <= max):
1862         side -- 0 or 1 indicating "from" or "to" text
1863         flag -- indicates if difference on line
1864         linenum -- line number (used for line number column)
1865         text -- line text to be marked up
1876         # make space non-breakable so they don't get compressed or line wrapped
1912                     i = max([0,i-numlines])
1945         fromlines -- list of "from" lines
1946         tolines -- list of "to" lines
1947         fromdesc -- "from" file column header string
1948         todesc -- "to" file column header string
1949         context -- set to True for contextual differences (defaults to False
1951         numlines -- number of context lines.  When context is set True,
2012                      replace('\0-','<span class="diff_sub">'). \
2021     Generate one of the two sequences that generated a delta.
2029     >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
2031     >>> diff = list(diff)
2032     >>> print(''.join(restore(diff, 1)), end="")
2036     >>> print(''.join(restore(diff, 2)), end="")
2042         tag = {1: "- ", 2: "+ "}[int(which)]
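`restore` inverts an `ndiff`/`Differ` delta. Passing `which=1` rebuilds the first input and `which=2` the second; `? ` hint lines are ignored. The sketch below mirrors the docstring example. The delta is materialized with `list()` so it can be read twice.

```python
from difflib import ndiff, restore

diff = list(ndiff("one\ntwo\nthree\n".splitlines(keepends=True),
                  "ore\ntree\nemu\n".splitlines(keepends=True)))

original = "".join(restore(diff, 1))  # lines tagged "- " and "  "
updated = "".join(restore(diff, 2))   # lines tagged "+ " and "  "
print(original, updated, sep="", end="")
```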