difflib.py - OpenGrok cross reference for /external/python/cpython2/Lib/difflib.py

Lines Matching +full:diff +full:- +full:sequences
2 Module difflib -- helpers for computing deltas between objects.
8     For two lists of strings, return a delta in context diff format.
14     Return one of the two sequences that generated an ndiff delta.
17     For two lists of strings, return a delta in unified diff format.
20     A flexible class for comparing pairs of sequences of any type.
23     For producing human-readable deltas from sequences of lines of text.
47     SequenceMatcher is a flexible class for comparing pairs of sequences of
53     elements (R-O doesn't address junk).  The same idea is then applied
54     recursively to the pieces of the sequences to the left and to the right
56     sequences, but does tend to yield matches that "look right" to people.
58     SequenceMatcher tries to compute a "human-friendly diff" between two
59     sequences.  Unlike e.g. UNIX(tm) diff, the fundamental notion is the
60     longest *contiguous* & junk-free matching subsequence.  That's what
64     reports than does diff.  This method appears to be the least vulnerable
78     sequences.  As a rule of thumb, a .ratio() value over 0.6 means the
79     sequences are close matches:
85     If you're only interested in where the sequences match,
107     See the Differ class for a fancy human-friendly file differencer, which
108     uses SequenceMatcher both to compare sequences of lines, and to compare
109     sequences of characters within similar (near-matching) lines.
114     Timing:  Basic R-O is cubic time worst case and quadratic time expected
116     expected-case behavior dependent in a complicated way on how many
117     elements the sequences have in common; best case time is linear.
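
To make the 0.6 rule of thumb concrete, here is a minimal sketch that computes `.ratio()` and checks it against the `2.0*M/T` definition, where M is the number of matched elements and T is the combined length of both sequences. The input strings are illustrative and are not taken from the module. The snippet is written for Python 3, on the assumption that these calls behave the same way there.

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"  # illustrative inputs, not from the module
s = SequenceMatcher(None, a, b)

# Sum the sizes of all matching blocks; the final block is a size-0 sentinel.
matched = sum(block.size for block in s.get_matching_blocks())
total = len(a) + len(b)

similarity = s.ratio()
print(similarity)        # 0.75, because "bcd" matches: 2 * 3 / 8
print(similarity > 0.6)  # True, so a close match under the rule of thumb
```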
125         Set the two sequences to be compared.
140         Return list of 5-tuples describing how to turn a into b.
143         Return a measure of the sequences' similarity (float in [0,1]).
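
The 5-tuples returned by `get_opcodes()` can be replayed to turn `a` into `b`. The sketch below does that for two illustrative strings; the variable names are placeholders, and the snippet is written for Python 3.

```python
from difflib import SequenceMatcher

a, b = "qabxcd", "abycdf"  # illustrative inputs
s = SequenceMatcher(None, a, b)

# Rebuild b from a by replaying each opcode in order.
rebuilt = []
for tag, i1, i2, j1, j2 in s.get_opcodes():
    if tag == "equal":
        rebuilt.append(a[i1:i2])   # unchanged run, copied from a
    elif tag in ("replace", "insert"):
        rebuilt.append(b[j1:j2])   # new material comes from b
    # "delete" contributes nothing: a[i1:i2] is dropped
    print("%-7s a[%d:%d] -> b[%d:%d]" % (tag, i1, i2, j1, j2))

result = "".join(rebuilt)
tags = [op[0] for op in s.get_opcodes()]
```

Each tuple covers a slice of `a` and a slice of `b`, so consecutive tuples tile both sequences without gaps.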
155         Optional arg isjunk is None (the default), or a one-argument
160         if you're comparing lines as sequences of characters, and don't
163         Optional arg a is the first of two sequences to be compared.  By
167         Optional arg b is the second of two sequences to be compared.  By
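
The effect of `isjunk` is easiest to see with `find_longest_match()`. The sketch below compares the same pair of strings with and without treating blanks as junk. The inputs are illustrative, and the snippet is written for Python 3.

```python
from difflib import SequenceMatcher

a, b = " abcd", "abcd abcd"

# No junk: the longest contiguous match is " abcd" (5 elements).
plain = SequenceMatcher(None, a, b)
m1 = plain.find_longest_match(0, len(a), 0, len(b))

# Blanks as junk: a match may not start on a blank, so "abcd" in a
# is paired with the leftmost "abcd" in b instead.
junky = SequenceMatcher(lambda ch: ch == " ", a, b)
m2 = junky.find_longest_match(0, len(a), 0, len(b))

print(tuple(m1))  # (0, 4, 5)
print(tuple(m2))  # (1, 0, 4)
```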
191         #      ascending & non-overlapping in i and in j; terminated by
201         #      a user-supplied function taking a sequence element and
202         #      returning true iff the element is "junk" -- this has
222         """Set the two sequences to be compared.
248         many sequences, use .set_seq2(S) once and call .set_seq1(x)
249         repeatedly for each of the other sequences.
274         many sequences, use .set_seq2(S) once and call .set_seq1(x)
275         repeatedly for each of the other sequences.
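
The advice above (set the fixed sequence once with `.set_seq2()`, then vary `.set_seq1()`) might be applied as follows. `rank_candidates` is a hypothetical helper name used only for this sketch, which is written for Python 3.

```python
from difflib import SequenceMatcher

def rank_candidates(target, candidates):
    """Hypothetical helper: score each candidate against one fixed target."""
    matcher = SequenceMatcher()
    matcher.set_seq2(target)          # analysed once and cached
    scored = []
    for cand in candidates:
        matcher.set_seq1(cand)        # cheap to swap between comparisons
        scored.append((matcher.ratio(), cand))
    scored.sort(reverse=True)         # best match first
    return scored

ranked = rank_candidates("apple", ["ape", "apple", "peach"])
print(ranked[0])  # (1.0, 'apple')
```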
297     # be viewed as an adaptive notion of semi-junk, and yields an enormous
300     # note that this is only called when b changes; so for cross-product
305         # Because isjunk is a user-defined (not C) function, and we test
308         # time-consuming routine in the whole module!  If anyone sees
309         # Jim Roskind, thank him again for profile.py -- I never would
397         # stripped, it's "a" (tied with "b").  UNIX(tm) diff does so
401         # Windiff ends up at the same place as diff, but by pairing up
406         # find longest junk-free match
408         # junk-free match ending with a[i-1] and b[j]
422                 k = newj2len[j] = j2lenget(j-1, 0) + 1
424                     besti, bestj, bestsize = i-k+1, j-k+1, k
427         # Extend the best by non-junk elements on each end.  In particular,
428         # "popular" non-junk elements aren't in b2j, which greatly speeds
430         # doesn't contain any junk *or* popular non-junk elements.
432               not isbjunk(b[bestj-1]) and \
433               a[besti-1] == b[bestj-1]:
434             besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
443         # saves post-processing the (possibly considerable) expense of
448               isbjunk(b[bestj-1]) and \
449               a[besti-1] == b[bestj-1]:
450             besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
511                 # Yes, so collapse them -- this just increases the length of
530         """Return list of 5-tuples describing how to turn a into b.
566             # out a diff to change a[i:ai] into b[j:bj], pump out
615             codes[0] = tag, max(i1, i2-n), i2, max(j1, j2-n), j2
616         if codes[-1][0] == 'equal':
617             tag, i1, i2, j1, j2 = codes[-1]
618             codes[-1] = tag, i1, min(i2, i1+n), j1, min(j2, j1+n)
625             if tag == 'equal' and i2-i1 > nn:
629                 i1, j1 = max(i1, i2-n), max(j1, j2-n)
635         """Return a measure of the sequences' similarity (float in [0,1]).
637         Where T is the total number of elements in both sequences, and
639         Note that this is 1 if the sequences are identical, and 0 if
656         matches = reduce(lambda sum, triple: sum + triple[-1],
684             avail[elt] = numb - 1
707     possibilities is a list of sequences against which to match word
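
A short usage sketch for `get_close_matches()`, with an illustrative word list. The `n` and `cutoff` values in the second call are arbitrary choices for the example, and the snippet is written for Python 3.

```python
from difflib import get_close_matches

words = ["ape", "apple", "peach", "puppy"]   # the possibilities

close = get_close_matches("appel", words)    # default cutoff is 0.6
best = get_close_matches("appel", words, n=1, cutoff=0.7)

print(close)  # ['apple', 'ape']
print(best)   # ['apple']
```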
766     Differ is a class for comparing sequences of lines of text, and
767     producing human-readable differences or deltas.  Differ uses
768     SequenceMatcher both to compare sequences of lines, and to compare
769     sequences of characters within similar (near-matching) lines.
771     Each line of a Differ delta begins with a two-letter code:
773         '- '    line unique to sequence 1
775         '  '    line common to both sequences
780     can be confusing if the sequences contain tab characters.
782     Note that Differ makes no claim to produce a *minimal* diff.  To the
783     contrary, minimal diffs are often counter-intuitive, because they synch
786     locality, at the occasional cost of producing a longer diff.
790     First we set up the texts, sequences of individual single-line strings
791     ending with newlines (such sequences can also be obtained from the
792     `readlines()` method of file-like objects):
801     >>> text1[0][-1]
820     'result' is a list of strings, so let's pretty-print it:
825      '-   2. Explicit is better than implicit.\n',
826      '-   3. Simple is better than complex.\n',
829      '-   4. Complex is better than complicated.\n',
830      '?            ^                     ---- ^\n',
835     As a single multi-line string it looks like this:
839     -   2. Explicit is better than implicit.
840     -   3. Simple is better than complex.
843     -   4. Complex is better than complicated.
844     ?            ^                     ---- ^
855         Compare two sequences of lines; generate the resulting delta.
864         - `linejunk`: A function that should accept a single string argument,
865           and return true iff the string is junk. The module-level function
873         - `charjunk`: A function that should accept a string of length 1. The
874           module-level function `IS_CHARACTER_JUNK` may be used to filter out
884         Compare two sequences of lines; generate the resulting delta.
886         Each sequence must contain individual single-line strings ending with
887         newlines. Such sequences can be obtained from the `readlines()` method
888         of file-like objects.  The delta generated also consists of newline-
889         terminated strings, ready to be printed as-is via the writelines()
890         method of a file-like object.
896         - one
900         - two
901         - three
902         ?  -
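
As a sketch of how `Differ.compare()` might be driven from a script, the example below builds two newline-terminated line lists and writes the delta with `writelines()`. The variable names are placeholders, and the snippet is written for Python 3.

```python
import sys
from difflib import Differ

before = "one\ntwo\nthree\n".splitlines(True)   # keep the newlines
after = "ore\ntree\nemu\n".splitlines(True)

# compare() is a generator; list() materializes the delta.
delta = list(Differ().compare(before, after))
sys.stdout.writelines(delta)

# Every delta line starts with one of the two-letter codes.
codes = set(line[:2] for line in delta)
```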
912                 g = self._dump('-', a, alo, ahi)
924         """Generate comparison results for a same-tagged range."""
930         # dump the shorter block first -- reduces the burden on short-term
932         if bhi - blo < ahi - alo:
934             second = self._dump('-', a, alo, ahi)
936             first  = self._dump('-', a, alo, ahi)
946         for *similar* lines; the best-matching pair (if any) is used as a
956         - abcDefghiJkl
970         # on junk -- unless we have to)
982                 # upper bounds first -- have seen this speed up messy
992             # no non-identical "pretty close" pair
994                 # no identical pair either -- treat it as a straight replace
998             # no close pair, but an identical pair -- synch up on that
1014             # pump out a '-', '?', '+', '?' quad for the synched lines
1018                 la, lb = ai2 - ai1, bj2 - bj1
1023                     atags += '-' * la
1047                 g = self._dump('-', a, alo, ahi)
1065         '- \tabcDefghiJkl\n'
1079         yield "- " + aline
1095 # looks for matching blocks that are entirely junk-free, then extends the
1142 ###  Unified Diff
1147     # Per the diff spec at http://www.unix.org/single_unix_specification/
1149     length = stop - start
1153         beginning -= 1        # empty ranges begin at line just before the range
1159     Compare two sequences of lines; generate the delta as a unified diff.
1165     By default, the diff control lines (those with ---, +++, or @@) are
1183     ...             '2005-01-26 23:30:50', '2010-04-02 10:20:52',
1186     --- Original        2005-01-26 23:30:50
1187     +++ Current         2010-04-02 10:20:52
1188     @@ -1,4 +1,4 @@
1191     -two
1192     -three
1203             yield '--- {}{}{}'.format(fromfile, fromdate, lineterm)
1206         first, last = group[0], group[-1]
1209         yield '@@ -{} +{} @@{}'.format(file1_range, file2_range, lineterm)
1218                     yield '-' + line
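
A minimal sketch of calling `unified_diff()` on lines that carry no trailing newlines, which is why `lineterm=""` is passed. The file names are placeholder labels, and the snippet is written for Python 3.

```python
from difflib import unified_diff

old = ["one", "two", "three", "four"]
new = ["zero", "one", "tree", "four"]

# lineterm="" keeps the control lines free of newlines, matching the inputs.
lines = list(unified_diff(old, new, fromfile="Original",
                          tofile="Current", lineterm=""))
print("\n".join(lines))
```

The first three lines are the `---`, `+++` and `@@` control lines; the rest are context, deletions and additions.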
1225 ###  Context Diff
1230     # Per the diff spec at http://www.unix.org/single_unix_specification/
1232     length = stop - start
1234         beginning -= 1        # empty ranges begin at line just before the range
1237     return '{},{}'.format(beginning, beginning + length - 1)
1243     Compare two sequences of lines; generate the delta as a context diff.
1249     By default, the diff control lines (those with *** or ---) are
1258     The context diff format normally has a header for filenames and
1269     --- Current
1276     --- 1,4 ----
1283     prefix = dict(insert='+ ', delete='- ', replace='! ', equal='  ')
1291             yield '--- {}{}{}'.format(tofile, todate, lineterm)
1293         first, last = group[0], group[-1]
1306         yield '--- {} ----{}'.format(file2_range, lineterm)
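
For comparison, the same kind of input can be passed through `context_diff()`. This is a sketch with placeholder file labels, written for Python 3; replaced lines are marked with `! ` according to the prefix table above.

```python
from difflib import context_diff

old = ["one", "two", "three", "four"]
new = ["zero", "one", "tree", "four"]

# lineterm="" because the input lines have no trailing newlines.
lines = list(context_diff(old, new, fromfile="Original",
                          tofile="Current", lineterm=""))
print("\n".join(lines))
```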
1316     Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
1321     - linejunk: A function that should accept a single string argument, and
1326     - charjunk: A function that should accept a string of length 1. The
1327       default is module-level function IS_CHARACTER_JUNK, which filters out
1331     Tools/scripts/ndiff.py is a command-line front-end to this function.
1335     >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
1337     >>> print ''.join(diff),
1338     - one
1342     - two
1343     - three
1344     ?  -
1355     fromlines -- list of text lines to be compared to tolines
1356     tolines -- list of text lines to be compared to fromlines
1357     context -- number of context lines to display on each side of difference,
1359     linejunk -- passed on to ndiff (see ndiff documentation)
1360     charjunk -- passed on to ndiff (see ndiff documentation)
1365     from/to line tuple -- (line num, line text)
1366         line num -- integer or None (to indicate a context separation)
1367         line text -- original line text with following markers inserted:
1368             '\0+' -- marks start of added text
1369             '\0-' -- marks start of deleted text
1370             '\0^' -- marks start of changed text
1371             '\1' -- marks end of added/deleted/changed text
1373     boolean flag -- None indicates context separation, True indicates
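
The marker scheme above can be illustrated with a small, hypothetical helper that turns the `\0+`, `\0-`, `\0^` and `\1` markers into HTML spans, using the `diff_add`, `diff_sub` and `diff_chg` class names from this module's stylesheet. `markers_to_html` is not part of difflib, the sample string is written by hand, and the sketch does no HTML escaping.

```python
def markers_to_html(text):
    """Hypothetical helper: replace change markers with <span> tags."""
    return (text.replace("\0+", '<span class="diff_add">')
                .replace("\0-", '<span class="diff_sub">')
                .replace("\0^", '<span class="diff_chg">')
                .replace("\1", "</span>"))

sample = "th\0-r\1ee"          # hand-written input marking "r" as deleted
html = markers_to_html(sample)
print(html)  # th<span class="diff_sub">r</span>ee
```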
1387     change_re = re.compile('(\++|\-+|\^+)')
1395         lines -- list of lines from the ndiff generator to produce a line of
1398         format_key -- '+' return first line in list with "add" markup around
1400                       '-' return first line in list with "delete" markup around
1405         side -- index into the num_lines list (0=from,1=to)
1406         num_lines -- from/to current line number.  This is NOT intended to be a
1431             for key,(begin,end) in sub_info[::-1]:
1479             elif s.startswith('-?+?'):
1483             elif s.startswith('--++'):
1486                 num_blanks_pending -= 1
1487                 yield _make_line(lines,'-',0), None, True
1489             elif s.startswith(('--?+', '--+', '- ')):
1492                 from_line,to_line = _make_line(lines,'-',0), None
1493                 num_blanks_to_yield,num_blanks_pending = num_blanks_pending-1,0
1494             elif s.startswith('-+?'):
1498             elif s.startswith('-?+'):
1502             elif s.startswith('-'):
1504                 num_blanks_pending -= 1
1505                 yield _make_line(lines,'-',0), None, True
1507             elif s.startswith('+--'):
1513             elif s.startswith(('+ ', '+-')):
1532                 num_blanks_to_yield -= 1
1601                 lines_to_write -= 1
1603             lines_to_write = context-1
1608                     lines_to_write = context-1
1610                     lines_to_write -= 1
1615 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
1616           "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
1621     <meta http-equiv="Content-Type"
1622           content="text/html; charset=ISO-8859-1" />
1635         table.diff {font-family:Courier; border:medium;}
1636         .diff_header {background-color:#e0e0e0}
1637         td.diff_header {text-align:right}
1638         .diff_next {background-color:#c0c0c0}
1639         .diff_add {background-color:#aaffaa}
1640         .diff_chg {background-color:#ffff77}
1641         .diff_sub {background-color:#ffaaaa}"""
1644     <table class="diff" id="difflib_chg_%(prefix)s_top"
1654     <table class="diff" summary="Legends">
1675     of text with inter-line and intra-line change highlights.  The table can
1680     make_table -- generates HTML for a single side by side table
1681     make_file -- generates complete HTML file with a single side by side table
1683     See tools/scripts/diff.py for an example usage of this class.
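
A minimal sketch of the two public methods, assuming small in-memory line lists. The `tabsize`, `wrapcolumn` and `numlines` values are arbitrary choices for the example, and the snippet is written for Python 3.

```python
from difflib import HtmlDiff

old = ["one", "two", "three", "four"]
new = ["zero", "one", "tree", "four"]

differ = HtmlDiff(tabsize=4, wrapcolumn=40)

# make_table returns only the <table> markup, for embedding in a page.
table = differ.make_table(old, new, fromdesc="Original", todesc="Current")

# make_file wraps a table in a complete HTML document with styles and legend.
page = differ.make_file(old, new, fromdesc="Original", todesc="Current",
                        context=True, numlines=1)
```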
1697         tabsize -- tab stop spacing, defaults to 8.
1698         wrapcolumn -- column number where lines are broken and wrapped,
1700         linejunk,charjunk -- keyword arguments passed into ndiff() (used by
1714         fromlines -- list of "from" lines
1715         tolines -- list of "to" lines
1716         fromdesc -- "from" file column header string
1717         todesc -- "to" file column header string
1718         context -- set to True for contextual differences (defaults to False
1720         numlines -- number of context lines.  When context is set True,
1773         if (size <= max) or ((size -(text.count('\0')*3)) <= max):
1863         side -- 0 or 1 indicating "from" or "to" text
1864         flag -- indicates if difference on line
1865         linenum -- line number (used for line number column)
1866         text -- line text to be marked up
1877         # make spaces non-breakable so they don't get compressed or line wrapped
1913                     i = max([0,i-numlines])
1946         fromlines -- list of "from" lines
1947         tolines -- list of "to" lines
1948         fromdesc -- "from" file column header string
1949         todesc -- "to" file column header string
1950         context -- set to True for contextual differences (defaults to False
1952         numlines -- number of context lines.  When context is set True,
2013                      replace('\0-','<span class="diff_sub">'). \
2022     Generate one of the two sequences that generated a delta.
2030     >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
2032     >>> diff = list(diff)
2033     >>> print ''.join(restore(diff, 1)),
2037     >>> print ''.join(restore(diff, 2)),
2043         tag = {1: "- ", 2: "+ "}[int(which)]
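
A round-trip sketch for `restore()`: an `ndiff()` delta is materialized with `list()` so that it can be consumed twice, then each original sequence is recovered from it. The variable names are placeholders, and the snippet is written for Python 3.

```python
from difflib import ndiff, restore

a = "one\ntwo\nthree\n".splitlines(True)
b = "ore\ntree\nemu\n".splitlines(True)

delta = list(ndiff(a, b))          # a generator can only be read once

first = list(restore(delta, 1))    # lines that came from sequence 1
second = list(restore(delta, 2))   # lines that came from sequence 2

print("".join(first), end="")
print("".join(second), end="")
```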