
Lines Matching +full:diff +full:- +full:sequences

2 Module difflib -- helpers for computing deltas between objects.
8 For two lists of strings, return a delta in context diff format.
14 Return one of the two sequences that generated an ndiff delta.
17 For two lists of strings, return a delta in unified diff format.
20 A flexible class for comparing pairs of sequences of any type.
23 For producing human-readable deltas from sequences of lines of text.
47 SequenceMatcher is a flexible class for comparing pairs of sequences of
53 elements (R-O doesn't address junk). The same idea is then applied
54 recursively to the pieces of the sequences to the left and to the right
56 sequences, but does tend to yield matches that "look right" to people.
58 SequenceMatcher tries to compute a "human-friendly diff" between two
59 sequences. Unlike e.g. UNIX(tm) diff, the fundamental notion is the
60 longest *contiguous* & junk-free matching subsequence. That's what
64 reports than does diff. This method appears to be the least vulnerable
78 sequences. As a rule of thumb, a .ratio() value over 0.6 means the
79 sequences are close matches:
85 If you're only interested in where the sequences match,
107 See the Differ class for a fancy human-friendly file differencer, which
108 uses SequenceMatcher both to compare sequences of lines, and to compare
109 sequences of characters within similar (near-matching) lines.
114 Timing: Basic R-O is cubic time worst case and quadratic time expected
116 expected-case behavior dependent in a complicated way on how many
117 elements the sequences have in common; best case time is linear.
125 Set the two sequences to be compared.
140 Return list of 5-tuples describing how to turn a into b.
143 Return a measure of the sequences' similarity (float in [0,1]).
155 Optional arg isjunk is None (the default), or a one-argument
160 if you're comparing lines as sequences of characters, and don't
163 Optional arg a is the first of two sequences to be compared. By
167 Optional arg b is the second of two sequences to be compared. By
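The effect of the optional isjunk argument can be sketched with a small comparison. The sample strings and variable names below are illustrative, and the snippet assumes Python 3 print syntax; find_longest_match() is called with explicit bounds so it also works on older versions.

```python
from difflib import SequenceMatcher

a, b = " abcd", "abcd abcd"

# No junk predicate: the leading blank is allowed to take part in the match.
plain = SequenceMatcher(None, a, b)
plain_match = plain.find_longest_match(0, len(a), 0, len(b))

# Blanks declared junk: the longest match is found among junk-free blocks.
junk_aware = SequenceMatcher(lambda ch: ch == " ", a, b)
junk_match = junk_aware.find_longest_match(0, len(a), 0, len(b))

print(tuple(plain_match))  # (0, 4, 5): a[0:5] == b[4:9] == " abcd"
print(tuple(junk_match))   # (1, 0, 4): a[1:5] == b[0:4] == "abcd"
```

With blanks treated as junk, the match settles on the first "abcd" in b instead of the later " abcd".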
191 # ascending & non-overlapping in i and in j; terminated by
201 # a user-supplied function taking a sequence element and
202 # returning true iff the element is "junk" -- this has
222 """Set the two sequences to be compared.
248 many sequences, use .set_seq2(S) once and call .set_seq1(x)
249 repeatedly for each of the other sequences.
274 many sequences, use .set_seq2(S) once and call .set_seq1(x)
275 repeatedly for each of the other sequences.
297 # be viewed as an adaptive notion of semi-junk, and yields an enormous
300 # note that this is only called when b changes; so for cross-product
305 # Because isjunk is a user-defined (not C) function, and we test
308 # time-consuming routine in the whole module! If anyone sees
309 # Jim Roskind, thank him again for profile.py -- I never would
397 # stripped, it's "a" (tied with "b"). UNIX(tm) diff does so
401 # Windiff ends up at the same place as diff, but by pairing up
406 # find longest junk-free match
408 # junk-free match ending with a[i-1] and b[j]
422 k = newj2len[j] = j2lenget(j-1, 0) + 1
424 besti, bestj, bestsize = i-k+1, j-k+1, k
427 # Extend the best by non-junk elements on each end. In particular,
428 # "popular" non-junk elements aren't in b2j, which greatly speeds
430 # doesn't contain any junk *or* popular non-junk elements.
432 not isbjunk(b[bestj-1]) and \
433 a[besti-1] == b[bestj-1]:
434 besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
443 # saves post-processing the (possibly considerable) expense of
448 isbjunk(b[bestj-1]) and \
449 a[besti-1] == b[bestj-1]:
450 besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
511 # Yes, so collapse them -- this just increases the length of
530 """Return list of 5-tuples describing how to turn a into b.
566 # out a diff to change a[i:ai] into b[j:bj], pump out
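As a rough illustration of the 5-tuples returned by get_opcodes(), the sketch below prints each (tag, i1, i2, j1, j2) together with the slices it refers to. The input strings are sample data, and Python 3 print syntax is assumed.

```python
from difflib import SequenceMatcher

a, b = "qabxcd", "abycdf"
opcodes = SequenceMatcher(None, a, b).get_opcodes()

# Each opcode says how to turn a[i1:i2] into b[j1:j2].
for tag, i1, i2, j1, j2 in opcodes:
    print("%-7s a[%d:%d] (%s)  b[%d:%d] (%s)"
          % (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
```

The tags seen here are 'delete', 'equal', 'replace' and 'insert'; consecutive tuples always pick up where the previous one ended in both sequences.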
615 codes[0] = tag, max(i1, i2-n), i2, max(j1, j2-n), j2
616 if codes[-1][0] == 'equal':
617 tag, i1, i2, j1, j2 = codes[-1]
618 codes[-1] = tag, i1, min(i2, i1+n), j1, min(j2, j1+n)
625 if tag == 'equal' and i2-i1 > nn:
629 i1, j1 = max(i1, i2-n), max(j1, j2-n)
635 """Return a measure of the sequences' similarity (float in [0,1]).
637 Where T is the total number of elements in both sequences, and
639 Note that this is 1 if the sequences are identical, and 0 if
656 matches = reduce(lambda sum, triple: sum + triple[-1],
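The 2.0*M / T definition can be checked by hand against ratio(). This is only a sketch: the two strings are sample data, and the sum over matching blocks mirrors the reduce() call above rather than reproducing the module's code.

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"
matcher = SequenceMatcher(None, a, b)

# M: total size of all matching blocks (the last block is a zero-size sentinel).
matches = sum(triple[-1] for triple in matcher.get_matching_blocks())
# T: total number of elements in both sequences.
total = len(a) + len(b)

manual_ratio = 2.0 * matches / total
print(manual_ratio, matcher.ratio())  # both 0.75 for this input
```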
684 avail[elt] = numb - 1
707 possibilities is a list of sequences against which to match word
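A short usage sketch for get_close_matches(), with a made-up word list. The defaults (n=3, cutoff=0.6) are left untouched, so only candidates scoring at least 0.6 are returned, best first.

```python
from difflib import get_close_matches

word = "appel"
possibilities = ["ape", "apple", "peach", "puppy"]

# Candidates below the default cutoff of 0.6 are dropped.
close = get_close_matches(word, possibilities)
print(close)  # ['apple', 'ape']
```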
766 Differ is a class for comparing sequences of lines of text, and
767 producing human-readable differences or deltas. Differ uses
768 SequenceMatcher both to compare sequences of lines, and to compare
769 sequences of characters within similar (near-matching) lines.
771 Each line of a Differ delta begins with a two-letter code:
773 '- ' line unique to sequence 1
775 ' ' line common to both sequences
780 can be confusing if the sequences contain tab characters.
782 Note that Differ makes no claim to produce a *minimal* diff. To the
783 contrary, minimal diffs are often counter-intuitive, because they synch
786 locality, at the occasional cost of producing a longer diff.
790 First we set up the texts, sequences of individual single-line strings
791 ending with newlines (such sequences can also be obtained from the
792 `readlines()` method of file-like objects):
801 >>> text1[0][-1]
820 'result' is a list of strings, so let's pretty-print it:
825 '- 2. Explicit is better than implicit.\n',
826 '- 3. Simple is better than complex.\n',
829 '- 4. Complex is better than complicated.\n',
830 '? ^ ---- ^\n',
835 As a single multi-line string it looks like this:
839 - 2. Explicit is better than implicit.
840 - 3. Simple is better than complex.
843 - 4. Complex is better than complicated.
844 ? ^ ---- ^
855 Compare two sequences of lines; generate the resulting delta.
864 - `linejunk`: A function that should accept a single string argument,
865 and return true iff the string is junk. The module-level function
873 - `charjunk`: A function that should accept a string of length 1. The
874 module-level function `IS_CHARACTER_JUNK` may be used to filter out
884 Compare two sequences of lines; generate the resulting delta.
886 Each sequence must contain individual single-line strings ending with
887 newlines. Such sequences can be obtained from the `readlines()` method
888 of file-like objects. The delta generated also consists of newline-
889 terminated strings, ready to be printed as-is via the writelines()
890 method of a file-like object.
896 - one
900 - two
901 - three
902 ? -
912 g = self._dump('-', a, alo, ahi)
924 """Generate comparison results for a same-tagged range."""
930 # dump the shorter block first -- reduces the burden on short-term
932 if bhi - blo < ahi - alo:
934 second = self._dump('-', a, alo, ahi)
936 first = self._dump('-', a, alo, ahi)
946 for *similar* lines; the best-matching pair (if any) is used as a
956 - abcDefghiJkl
970 # on junk -- unless we have to)
982 # upper bounds first -- have seen this speed up messy
992 # no non-identical "pretty close" pair
994 # no identical pair either -- treat it as a straight replace
998 # no close pair, but an identical pair -- synch up on that
1014 # pump out a '-', '?', '+', '?' quad for the synched lines
1018 la, lb = ai2 - ai1, bj2 - bj1
1023 atags += '-' * la
1047 g = self._dump('-', a, alo, ahi)
1065 '- \tabcDefghiJkl\n'
1079 yield "- " + aline
1095 # looks for matching blocks that are entirely junk-free, then extends the
1142 ### Unified Diff
1147 # Per the diff spec at http://www.unix.org/single_unix_specification/
1149 length = stop - start
1153 beginning -= 1 # empty ranges begin at line just before the range
1159 Compare two sequences of lines; generate the delta as a unified diff.
1165 By default, the diff control lines (those with ---, +++, or @@) are
1183 ... '2005-01-26 23:30:50', '2010-04-02 10:20:52',
1186 --- Original 2005-01-26 23:30:50
1187 +++ Current 2010-04-02 10:20:52
1188 @@ -1,4 +1,4 @@
1191 -two
1192 -three
1203 yield '--- {}{}{}'.format(fromfile, fromdate, lineterm)
1206 first, last = group[0], group[-1]
1209 yield '@@ -{} +{} @@{}'.format(file1_range, file2_range, lineterm)
1218 yield '-' + line
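The control lines and hunk header described above can be seen in a small run of unified_diff(). The file names and dates are placeholder values, and lineterm='' is passed because the input lines carry no trailing newlines; Python 3 print syntax is assumed.

```python
from difflib import unified_diff

old = "one two three four".split()
new = "zero one tree four".split()

# lineterm='' keeps the control lines free of newlines, matching the inputs.
lines = list(unified_diff(old, new,
                          "Original", "Current",
                          "2005-01-26 23:30:50", "2010-04-02 10:20:52",
                          lineterm=""))
for line in lines:
    print(line)
```

The first two lines are the '---' and '+++' headers, followed by a single '@@ -1,4 +1,4 @@' hunk.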
1225 ### Context Diff
1230 # Per the diff spec at http://www.unix.org/single_unix_specification/
1232 length = stop - start
1234 beginning -= 1 # empty ranges begin at line just before the range
1237 return '{},{}'.format(beginning, beginning + length - 1)
1243 Compare two sequences of lines; generate the delta as a context diff.
1249 By default, the diff control lines (those with *** or ---) are
1258 The context diff format normally has a header for filenames and
1269 --- Current
1276 --- 1,4 ----
1283 prefix = dict(insert='+ ', delete='- ', replace='! ', equal=' ')
1291 yield '--- {}{}{}'.format(tofile, todate, lineterm)
1293 first, last = group[0], group[-1]
1306 yield '--- {} ----{}'.format(file2_range, lineterm)
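For comparison, the same sample input run through context_diff() shows the '***' and '---' range lines and the two-character prefixes from the prefix table above. The file names are placeholders and Python 3 print syntax is assumed.

```python
from difflib import context_diff

old = "one two three four".split()
new = "zero one tree four".split()

# Changed lines are prefixed with '! ', insertions with '+ ', deletions with '- '.
lines = list(context_diff(old, new, "Original", "Current", lineterm=""))
for line in lines:
    print(line)
```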
1316 Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
1321 - linejunk: A function that should accept a single string argument, and
1326 - charjunk: A function that should accept a string of length 1. The
1327 default is module-level function IS_CHARACTER_JUNK, which filters out
1331 Tools/scripts/ndiff.py is a command-line front-end to this function.
1335 >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
1337 >>> print ''.join(diff),
1338 - one
1342 - two
1343 - three
1344 ? -
1355 fromlines -- list of text lines to be compared to tolines
1356 tolines -- list of text lines to be compared to fromlines
1357 context -- number of context lines to display on each side of difference,
1359 linejunk -- passed on to ndiff (see ndiff documentation)
1360 charjunk -- passed on to ndiff (see ndiff documentation)
1365 from/to line tuple -- (line num, line text)
1366 line num -- integer or None (to indicate a context separation)
1367 line text -- original line text with following markers inserted:
1368 '\0+' -- marks start of added text
1369 '\0-' -- marks start of deleted text
1370 '\0^' -- marks start of changed text
1371 '\1' -- marks end of added/deleted/changed text
1373 boolean flag -- None indicates context separation, True indicates
1387 change_re = re.compile(r'(\++|\-+|\^+)')
1395 lines -- list of lines from the ndiff generator to produce a line of
1398 format_key -- '+' return first line in list with "add" markup around
1400 '-' return first line in list with "delete" markup around
1405 side -- index into the num_lines list (0=from,1=to)
1406 num_lines -- from/to current line number. This is NOT intended to be a
1431 for key,(begin,end) in sub_info[::-1]:
1479 elif s.startswith('-?+?'):
1483 elif s.startswith('--++'):
1486 num_blanks_pending -= 1
1487 yield _make_line(lines,'-',0), None, True
1489 elif s.startswith(('--?+', '--+', '- ')):
1492 from_line,to_line = _make_line(lines,'-',0), None
1493 num_blanks_to_yield,num_blanks_pending = num_blanks_pending-1,0
1494 elif s.startswith('-+?'):
1498 elif s.startswith('-?+'):
1502 elif s.startswith('-'):
1504 num_blanks_pending -= 1
1505 yield _make_line(lines,'-',0), None, True
1507 elif s.startswith('+--'):
1513 elif s.startswith(('+ ', '+-')):
1532 num_blanks_to_yield -= 1
1601 lines_to_write -= 1
1603 lines_to_write = context-1
1608 lines_to_write = context-1
1610 lines_to_write -= 1
1615 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
1616 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
1621 <meta http-equiv="Content-Type"
1622 content="text/html; charset=ISO-8859-1" />
1635 table.diff {font-family:Courier; border:medium;}
1636 .diff_header {background-color:#e0e0e0}
1637 td.diff_header {text-align:right}
1638 .diff_next {background-color:#c0c0c0}
1639 .diff_add {background-color:#aaffaa}
1640 .diff_chg {background-color:#ffff77}
1641 .diff_sub {background-color:#ffaaaa}"""
1644 <table class="diff" id="difflib_chg_%(prefix)s_top"
1654 <table class="diff" summary="Legends">
1675 of text with inter-line and intra-line change highlights. The table can
1680 make_table -- generates HTML for a single side by side table
1681 make_file -- generates complete HTML file with a single side by side table
1683 See Tools/scripts/diff.py for an example usage of this class.
1697 tabsize -- tab stop spacing, defaults to 8.
1698 wrapcolumn -- column number where lines are broken and wrapped,
1700 linejunk,charjunk -- keyword arguments passed into ndiff() (used to by
1714 fromlines -- list of "from" lines
1715 tolines -- list of "to" lines
1716 fromdesc -- "from" file column header string
1717 todesc -- "to" file column header string
1718 context -- set to True for contextual differences (defaults to False
1720 numlines -- number of context lines. When context is set True,
1773 if (size <= max) or ((size -(text.count('\0')*3)) <= max):
1863 side -- 0 or 1 indicating "from" or "to" text
1864 flag -- indicates if difference on line
1865 linenum -- line number (used for line number column)
1866 text -- line text to be marked up
1877 # make space non-breakable so they don't get compressed or line wrapped
1913 i = max([0,i-numlines])
1946 fromlines -- list of "from" lines
1947 tolines -- list of "to" lines
1948 fromdesc -- "from" file column header string
1949 todesc -- "to" file column header string
1950 context -- set to True for contextual differences (defaults to False
1952 numlines -- number of context lines. When context is set True,
2013 replace('\0-','<span class="diff_sub">'). \
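A minimal sketch of HtmlDiff.make_table(), using made-up input lines and column headers. It only generates the table fragment; make_file() would wrap the same table in a complete HTML page.

```python
from difflib import HtmlDiff

fromlines = ["one", "two", "three"]
tolines = ["one", "tree", "emu"]

# make_table() returns one HTML string holding the side-by-side table.
table = HtmlDiff(tabsize=8).make_table(fromlines, tolines,
                                       fromdesc="Original", todesc="Current")
print(table[:60])
```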
2022 Generate one of the two sequences that generated a delta.
2030 >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
2032 >>> diff = list(diff)
2033 >>> print ''.join(restore(diff, 1)),
2037 >>> print ''.join(restore(diff, 2)),
2043 tag = {1: "- ", 2: "+ "}[int(which)]
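The round trip between ndiff() and restore() can be sketched as follows. The input text is the same sample used in the doctests above; splitlines(True) keeps the line endings, and Python 3 print syntax is assumed.

```python
from difflib import ndiff, restore

a = "one\ntwo\nthree\n".splitlines(True)
b = "ore\ntree\nemu\n".splitlines(True)

# ndiff() is a generator; materialize it so it can be consumed twice.
delta = list(ndiff(a, b))

first = list(restore(delta, 1))   # lines tagged '- ' or '  '
second = list(restore(delta, 2))  # lines tagged '+ ' or '  '
print("".join(first), end="")
print("".join(second), end="")
```

Lines beginning with '? ' are hints for human readers only and are skipped by restore().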