Lines Matching +full:diff +full:- +full:sequences
2 Module difflib -- helpers for computing deltas between objects.
8 For two lists of strings, return a delta in context diff format.
14 Return one of the two sequences that generated an ndiff delta.
17 For two lists of strings, return a delta in unified diff format.
20 A flexible class for comparing pairs of sequences of any type.
23 For producing human-readable deltas from sequences of lines of text.
47 SequenceMatcher is a flexible class for comparing pairs of sequences of
53 elements (R-O doesn't address junk). The same idea is then applied
54 recursively to the pieces of the sequences to the left and to the right
56 sequences, but does tend to yield matches that "look right" to people.
58 SequenceMatcher tries to compute a "human-friendly diff" between two
59 sequences. Unlike e.g. UNIX(tm) diff, the fundamental notion is the
60 longest *contiguous* & junk-free matching subsequence. That's what
64 reports than does diff. This method appears to be the least vulnerable
78 sequences. As a rule of thumb, a .ratio() value over 0.6 means the
79 sequences are close matches:
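A minimal sketch of that rule of thumb, using only the standard-library SequenceMatcher. The sample strings and the helper name `is_close` are illustrative assumptions, and 0.6 is the rule of thumb quoted above rather than a hard limit.

```python
from difflib import SequenceMatcher

def is_close(a, b, threshold=0.6):
    """Hypothetical helper: True when .ratio() exceeds the threshold."""
    return SequenceMatcher(None, a, b).ratio() > threshold

# Treat blanks as junk, as one might when comparing lines of text.
s = SequenceMatcher(lambda ch: ch == " ",
                    "private Thread currentThread;",
                    "private volatile Thread currentThread;")
print(round(s.ratio(), 3))

print(is_close("abcd", "bcde"))  # three of four elements match in order
print(is_close("abcd", "wxyz"))  # nothing in common
```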
85 If you're only interested in where the sequences match,
107 See the Differ class for a fancy human-friendly file differencer, which
108 uses SequenceMatcher both to compare sequences of lines, and to compare
109 sequences of characters within similar (near-matching) lines.
114 Timing: Basic R-O is cubic time worst case and quadratic time expected
116 expected-case behavior dependent in a complicated way on how many
117 elements the sequences have in common; best case time is linear.
123 Optional arg isjunk is None (the default), or a one-argument
128 if you're comparing lines as sequences of characters, and don't
131 Optional arg a is the first of two sequences to be compared. By
135 Optional arg b is the second of two sequences to be compared. By
159 # ascending & non-overlapping in i and in j; terminated by
169 # a user-supplied function taking a sequence element and
170 # returning true iff the element is "junk" -- this has
185 """Set the two sequences to be compared.
211 many sequences, use .set_seq2(S) once and call .set_seq1(x)
212 repeatedly for each of the other sequences.
237 many sequences, use .set_seq2(S) once and call .set_seq1(x)
238 repeatedly for each of the other sequences.
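The advice above can be sketched as follows. The function name `rank_against` and the sample words are assumptions made for illustration; the point is only that the second sequence is set once (it is the one SequenceMatcher caches detailed information about), while the first sequence changes on every iteration.

```python
from difflib import SequenceMatcher

def rank_against(target, candidates):
    """Hypothetical helper: score every candidate against one target."""
    matcher = SequenceMatcher()
    matcher.set_seq2(target)          # analysed once, then reused
    scores = []
    for candidate in candidates:
        matcher.set_seq1(candidate)   # cheap to swap on each pass
        scores.append((matcher.ratio(), candidate))
    return sorted(scores, reverse=True)

ranked = rank_against("apple", ["ape", "apple", "peach"])
for score, word in ranked:
    print(f"{score:.2f} {word}")
```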
259 # be viewed as an adaptive notion of semi-junk, and yields an enormous
262 # note that this is only called when b changes; so for cross-product
267 # Because isjunk is a user-defined (not C) function, and we test
270 # time-consuming routine in the whole module! If anyone sees
271 # Jim Roskind, thank him again for profile.py -- I never would
356 # stripped, it's "a" (tied with "b"). UNIX(tm) diff does so
360 # Windiff ends up at the same place as diff, but by pairing up
369 # find longest junk-free match
371 # junk-free match ending with a[i-1] and b[j]
385 k = newj2len[j] = j2lenget(j-1, 0) + 1
387 besti, bestj, bestsize = i-k+1, j-k+1, k
390 # Extend the best by non-junk elements on each end. In particular,
391 # "popular" non-junk elements aren't in b2j, which greatly speeds
393 # doesn't contain any junk *or* popular non-junk elements.
395 not isbjunk(b[bestj-1]) and \
396 a[besti-1] == b[bestj-1]:
397 besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
406 # saves post-processing the (possibly considerable) expense of
411 isbjunk(b[bestj-1]) and \
412 a[besti-1] == b[bestj-1]:
413 besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
474 # Yes, so collapse them -- this just increases the length of
493 """Return list of 5-tuples describing how to turn a into b.
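To illustrate what the 5-tuples mean, the sketch below replays them to rebuild b from a. `apply_opcodes` is a hypothetical helper written for this example, not part of difflib.

```python
from difflib import SequenceMatcher

def apply_opcodes(a, b):
    """Rebuild b from a by replaying the opcodes (illustrative helper)."""
    out = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])   # keep the shared run
        elif tag in ("replace", "insert"):
            out.extend(b[j1:j2])   # take the new elements from b
        # "delete": a[i1:i2] is dropped, so nothing is emitted
    return out

a, b = "qabxcd", "abycdf"
for opcode in SequenceMatcher(None, a, b).get_opcodes():
    print(opcode)
rebuilt = "".join(apply_opcodes(a, b))
print(rebuilt)
```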
529 # out a diff to change a[i:ai] into b[j:bj], pump out
578 codes[0] = tag, max(i1, i2-n), i2, max(j1, j2-n), j2
579 if codes[-1][0] == 'equal':
580 tag, i1, i2, j1, j2 = codes[-1]
581 codes[-1] = tag, i1, min(i2, i1+n), j1, min(j2, j1+n)
588 if tag == 'equal' and i2-i1 > nn:
592 i1, j1 = max(i1, i2-n), max(j1, j2-n)
598 """Return a measure of the sequences' similarity (float in [0,1]).
600 Where T is the total number of elements in both sequences, and
602 Note that this is 1 if the sequences are identical, and 0 if
619 matches = sum(triple[-1] for triple in self.get_matching_blocks())
646 avail[elt] = numb - 1
672 possibilities is a list of sequences against which to match word
726 Differ is a class for comparing sequences of lines of text, and
727 producing human-readable differences or deltas. Differ uses
728 SequenceMatcher both to compare sequences of lines, and to compare
729 sequences of characters within similar (near-matching) lines.
731 Each line of a Differ delta begins with a two-letter code:
733 '- ' line unique to sequence 1
735 '  ' line common to both sequences
740 can be confusing if the sequences contain tab characters.
742 Note that Differ makes no claim to produce a *minimal* diff. To the
743 contrary, minimal diffs are often counter-intuitive, because they synch
746 locality, at the occasional cost of producing a longer diff.
750 First we set up the texts, sequences of individual single-line strings
751 ending with newlines (such sequences can also be obtained from the
752 `readlines()` method of file-like objects):
761 >>> text1[0][-1]
780 'result' is a list of strings, so let's pretty-print it:
785 '- 2. Explicit is better than implicit.\n',
786 '- 3. Simple is better than complex.\n',
789 '- 4. Complex is better than complicated.\n',
790 '? ^ ---- ^\n',
795 As a single multi-line string it looks like this:
799 - 2. Explicit is better than implicit.
800 - 3. Simple is better than complex.
803 - 4. Complex is better than complicated.
804 ? ^ ---- ^
816 - `linejunk`: A function that should accept a single string argument,
817 and return true iff the string is junk. The module-level function
824 - `charjunk`: A function that should accept a string of length 1. The
825 module-level function `IS_CHARACTER_JUNK` may be used to filter out
835 Compare two sequences of lines; generate the resulting delta.
837 Each sequence must contain individual single-line strings ending with
838 newlines. Such sequences can be obtained from the `readlines()` method
839 of file-like objects. The delta generated also consists of newline-
840 terminated strings, ready to be printed as-is via the writelines()
841 method of a file-like object.
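A short sketch of that contract, with sample input chosen for illustration: both inputs are lists of newline-terminated strings, and the resulting delta can be handed directly to writelines().

```python
import sys
from difflib import Differ

# splitlines(keepends=True) preserves the trailing newlines that
# Differ.compare() expects on every line.
a = "one\ntwo\nthree\n".splitlines(keepends=True)
b = "ore\ntree\nemu\n".splitlines(keepends=True)

delta = list(Differ().compare(a, b))
sys.stdout.writelines(delta)  # each delta line already ends with a newline
```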
848 - one
852 - two
853 - three
854 ? -
864 g = self._dump('-', a, alo, ahi)
875 """Generate comparison results for a same-tagged range."""
881 # dump the shorter block first -- reduces the burden on short-term
883 if bhi - blo < ahi - alo:
885 second = self._dump('-', a, alo, ahi)
887 first = self._dump('-', a, alo, ahi)
896 for *similar* lines; the best-matching pair (if any) is used as a
906 - abcDefghiJkl
920 # on junk -- unless we have to)
932 # upper bounds first -- have seen this speed up messy
942 # no non-identical "pretty close" pair
944 # no identical pair either -- treat it as a straight replace
947 # no close pair, but an identical pair -- synch up on that
962 # pump out a '-', '?', '+', '?' quad for the synched lines
966 la, lb = ai2 - ai1, bj2 - bj1
971 atags += '-' * la
993 g = self._dump('-', a, alo, ahi)
1010 '- \tabcDefghiJkl\n'
1018 yield "- " + aline
1034 # looks for matching blocks that are entirely junk-free, then extends the
1081 ### Unified Diff
1086 # Per the diff spec at http://www.unix.org/single_unix_specification/
1088 length = stop - start
1092 beginning -= 1 # empty ranges begin at line just before the range
1098 Compare two sequences of lines; generate the delta as a unified diff.
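As a hedged illustration, the call below produces a unified diff for two short word lists. The file labels and dates are sample values, and `lineterm=''` is passed because the inputs carry no trailing newlines.

```python
from difflib import unified_diff

before = "one two three four".split()
after = "zero one tree four".split()

lines = list(unified_diff(before, after,
                          fromfile="Original", tofile="Current",
                          fromfiledate="2005-01-26 23:30:50",
                          tofiledate="2010-04-02 10:20:52",
                          lineterm=""))
for line in lines:
    print(line)
```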
1104 By default, the diff control lines (those with ---, +++, or @@) are
1122 ... '2005-01-26 23:30:50', '2010-04-02 10:20:52',
1125 --- Original 2005-01-26 23:30:50
1126 +++ Current 2010-04-02 10:20:52
1127 @@ -1,4 +1,4 @@
1130 -two
1131 -three
1143 yield '--- {}{}{}'.format(fromfile, fromdate, lineterm)
1146 first, last = group[0], group[-1]
1149 yield '@@ -{} +{} @@{}'.format(file1_range, file2_range, lineterm)
1158 yield '-' + line
1165 ### Context Diff
1170 # Per the diff spec at http://www.unix.org/single_unix_specification/
1172 length = stop - start
1174 beginning -= 1 # empty ranges begin at line just before the range
1177 return '{},{}'.format(beginning, beginning + length - 1)
1183 Compare two sequences of lines; generate the delta as a context diff.
1189 By default, the diff control lines (those with *** or ---) are
1198 The context diff format normally has a header for filenames and
1210 --- Current
1217 --- 1,4 ----
1225 prefix = dict(insert='+ ', delete='- ', replace='! ', equal='  ')
1233 yield '--- {}{}{}'.format(tofile, todate, lineterm)
1235 first, last = group[0], group[-1]
1248 yield '--- {} ----{}'.format(file2_range, lineterm)
1260 # --- b'oldfile.txt'
1276 Compare `a` and `b`, two sequences of lines represented as bytes rather
1305 Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
1310 - linejunk: A function that should accept a single string argument and
1315 - charjunk: A function that accepts a character (string of length
1317 the module-level function IS_CHARACTER_JUNK, which filters out
1321 Tools/scripts/ndiff.py is a command-line front-end to this function.
1325 >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
1327 >>> print(''.join(diff), end="")
1328 - one
1332 - two
1333 - three
1334 ? -
1345 fromlines -- list of text lines to be compared to tolines
1346 tolines -- list of text lines to be compared to fromlines
1347 context -- number of context lines to display on each side of difference,
1349 linejunk -- passed on to ndiff (see ndiff documentation)
1350 charjunk -- passed on to ndiff (see ndiff documentation)
1355 from/to line tuple -- (line num, line text)
1356 line num -- integer or None (to indicate a context separation)
1357 line text -- original line text with following markers inserted:
1358 '\0+' -- marks start of added text
1359 '\0-' -- marks start of deleted text
1360 '\0^' -- marks start of changed text
1361 '\1' -- marks end of added/deleted/changed text
1363 boolean flag -- None indicates context separation, True indicates
1377 change_re = re.compile(r'(\++|\-+|\^+)')
1385 lines -- list of lines from the ndiff generator to produce a line of
1388 format_key -- '+' return first line in list with "add" markup around
1390 '-' return first line in list with "delete" markup around
1395 side -- index into the num_lines list (0=from,1=to)
1396 num_lines -- from/to current line number. This is NOT intended to be a
1466 elif s.startswith('-?+?'):
1470 elif s.startswith('--++'):
1473 num_blanks_pending -= 1
1474 yield _make_line(lines,'-',0), None, True
1476 elif s.startswith(('--?+', '--+', '- ')):
1479 from_line,to_line = _make_line(lines,'-',0), None
1480 num_blanks_to_yield,num_blanks_pending = num_blanks_pending-1,0
1481 elif s.startswith('-+?'):
1485 elif s.startswith('-?+'):
1489 elif s.startswith('-'):
1491 num_blanks_pending -= 1
1492 yield _make_line(lines,'-',0), None, True
1494 elif s.startswith('+--'):
1500 elif s.startswith(('+ ', '+-')):
1519 num_blanks_to_yield -= 1
1593 lines_to_write -= 1
1595 lines_to_write = context-1
1601 lines_to_write = context-1
1603 lines_to_write -= 1
1611 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
1612 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
1617 <meta http-equiv="Content-Type"
1631 table.diff {font-family:Courier; border:medium;}
1632 .diff_header {background-color:#e0e0e0}
1633 td.diff_header {text-align:right}
1634 .diff_next {background-color:#c0c0c0}
1635 .diff_add {background-color:#aaffaa}
1636 .diff_chg {background-color:#ffff77}
1637 .diff_sub {background-color:#ffaaaa}"""
1640 <table class="diff" id="difflib_chg_%(prefix)s_top"
1650 <table class="diff" summary="Legends">
1671 of text with inter-line and intra-line change highlights. The table can
1676 make_table -- generates HTML for a single side by side table
1677 make_file -- generates complete HTML file with a single side by side table
1679 See Tools/scripts/diff.py for an example usage of this class.
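A minimal sketch of the two methods named above, assuming sample input lines and column headers invented for this example. The constructor and keyword arguments used are the ones documented for HtmlDiff.

```python
from difflib import HtmlDiff

from_lines = ["one", "two", "three"]
to_lines = ["ore", "tree", "emu"]

differ = HtmlDiff(tabsize=4, wrapcolumn=60)

# Just the side-by-side table, for embedding in an existing page.
table = differ.make_table(from_lines, to_lines,
                          fromdesc="before", todesc="after")

# A complete HTML document, limited to changed lines plus context.
page = differ.make_file(from_lines, to_lines,
                        fromdesc="before", todesc="after",
                        context=True, numlines=1)
print(len(table), len(page))
```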
1693 tabsize -- tab stop spacing, defaults to 8.
1694 wrapcolumn -- column number where lines are broken and wrapped,
1696 linejunk,charjunk -- keyword arguments passed into ndiff() (used by
1706 context=False, numlines=5, *, charset='utf-8'):
1710 fromlines -- list of "from" lines
1711 tolines -- list of "to" lines
1712 fromdesc -- "from" file column header string
1713 todesc -- "to" file column header string
1714 context -- set to True for contextual differences (defaults to False
1716 numlines -- number of context lines. When context is set True,
1721 charset -- charset of the HTML document
1772 if (size <= max) or ((size -(text.count('\0')*3)) <= max):
1862 side -- 0 or 1 indicating "from" or "to" text
1863 flag -- indicates if difference on line
1864 linenum -- line number (used for line number column)
1865 text -- line text to be marked up
1876 # make spaces non-breakable so they don't get compressed or line-wrapped
1912 i = max([0,i-numlines])
1945 fromlines -- list of "from" lines
1946 tolines -- list of "to" lines
1947 fromdesc -- "from" file column header string
1948 todesc -- "to" file column header string
1949 context -- set to True for contextual differences (defaults to False
1951 numlines -- number of context lines. When context is set True,
2012 replace('\0-','<span class="diff_sub">'). \
2021 Generate one of the two sequences that generated a delta.
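A round-trip sketch under the assumption that the delta comes from ndiff(): both original sequences can be recovered from the same delta. The sample text is illustrative.

```python
from difflib import ndiff, restore

a = "one\ntwo\nthree\n".splitlines(keepends=True)
b = "ore\ntree\nemu\n".splitlines(keepends=True)

# ndiff() returns a generator; materialise it because restore()
# is called on it twice below.
delta = list(ndiff(a, b))

first = list(restore(delta, 1))   # lines that came from sequence 1
second = list(restore(delta, 2))  # lines that came from sequence 2
print("".join(first), end="")
print("".join(second), end="")
```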
2029 >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
2031 >>> diff = list(diff)
2032 >>> print(''.join(restore(diff, 1)), end="")
2036 >>> print(''.join(restore(diff, 2)), end="")
2042 tag = {1: "- ", 2: "+ "}[int(which)]