Class MyersDiff<S extends Sequence>
- Type Parameters:
S
- type of sequence.
The basic idea is to put the line numbers of text A as columns ("x") and the lines of text B as rows ("y"). Now you try to find the shortest "edit path" from the upper left corner to the lower right corner, where you can always go horizontally or vertically, but diagonally from (x,y) to (x+1,y+1) only if line x in text A is identical to line y in text B.
Myers' fundamental concept is the "furthest reaching D-path on diagonal k": a D-path is an edit path starting at the upper left corner and containing exactly D non-diagonal elements ("differences"). The furthest reaching D-path on diagonal k is the one that contains the most (diagonal) elements which ends on diagonal k (where k = y - x).
Example:
H E L L O W O R L D ____ L \___ O \___ W \________
Since every D-path has exactly D horizontal or vertical elements, it can only end on the diagonals -D, -D+2, ..., D-2, D.
Since every furthest reaching D-path contains at least one furthest reaching (D-1)-path (except for D=0), we can construct them recursively.
Since we are really interested in the shortest edit path, we can start looking for a 0-path, then a 1-path, and so on, until we find a path that ends in the lower right corner.
To save space, we do not need to store all paths (which has quadratic space requirements), but generate the D-paths simultaneously from both sides. When the ends meet, we will have found "the middle" of the path. From the end points of that diagonal part, we can generate the rest recursively.
This only requires linear space.
The overall (runtime) complexity is:
O(N * D^2 + 2 * N/2 * (D/2)^2 + 4 * N/4 * (D/4)^2 + ...) = O(N * D^2 * 5 / 4) = O(N * D^2),
(With each step, we have to find the middle parts of twice as many regions as before, but the regions (as well as the D) are halved.)
So the overall runtime complexity stays the same with linear space, albeit with a larger constant factor.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescription(package private) class
A class to help bisecting the sequences a and b to find minimal edit paths. -
Field Summary
FieldsModifier and TypeFieldDescriptionprotected HashedSequence<S>
The first text to be compared.protected HashedSequence<S>
The second text to be compared.protected HashedSequenceComparator<S>
Comparison function for sequences.protected EditList
The list of edits found during the last call tocalculateEdits(Edit)
static final DiffAlgorithm
Singleton instance of MyersDiff.(package private) MyersDiff<S>.MiddleEdit
-
Constructor Summary
ConstructorsModifierConstructorDescriptionprivate
MyersDiff
(EditList edits, HashedSequenceComparator<S> cmp, HashedSequence<S> a, HashedSequence<S> b, Edit region) -
Method Summary
Modifier and TypeMethodDescriptionprotected void
calculateEdits
(int beginA, int endA, int beginB, int endB) Calculates the differences between a given part of A against another given part of Bprivate void
Entrypoint into the algorithm this class is all about.static void
Main method
-
Field Details
-
INSTANCE
Singleton instance of MyersDiff. -
edits
The list of edits found during the last call tocalculateEdits(Edit)
-
cmp
Comparison function for sequences. -
a
The first text to be compared. Referred to as "Text A" in the comments -
b
The second text to be compared. Referred to as "Text B" in the comments -
middle
MyersDiff<S extends Sequence>.MiddleEdit middle
-
-
Constructor Details
-
MyersDiff
private MyersDiff(EditList edits, HashedSequenceComparator<S> cmp, HashedSequence<S> a, HashedSequence<S> b, Edit region)
-
-
Method Details
-
calculateEdits
Entrypoint into the algorithm this class is all about. This method triggers that the differences between A and B are calculated in form of a list of edits.- Parameters:
r
- portion of the sequences to examine.
-
calculateEdits
protected void calculateEdits(int beginA, int endA, int beginB, int endB) Calculates the differences between a given part of A against another given part of B- Parameters:
beginA
- start of the part of A which should be compared (0<=beginA<sizeof(A))endA
- end of the part of A which should be compared (beginA<=endA<sizeof(A))beginB
- start of the part of B which should be compared (0<=beginB<sizeof(B))endB
- end of the part of B which should be compared (beginB<=endB<sizeof(B))
-
main
Main method- Parameters:
args
- two filenames specifying the contents to be diffed
-