Class LevenshteinDetailedDistance
- java.lang.Object
-
- org.apache.commons.text.similarity.LevenshteinDetailedDistance
-
- All Implemented Interfaces:
EditDistance<LevenshteinResults>
,SimilarityScore<LevenshteinResults>
public class LevenshteinDetailedDistance extends java.lang.Object implements EditDistance<LevenshteinResults>
An algorithm for measuring the difference between two character sequences.This is the number of changes needed to change one sequence into another, where each change is a single character modification (deletion, insertion or substitution).
- Since:
- 1.0
-
-
Field Summary
Fields Modifier and Type Field Description private static LevenshteinDetailedDistance
DEFAULT_INSTANCE
Default instance.private java.lang.Integer
threshold
Threshold.
-
Constructor Summary
Constructors Constructor Description LevenshteinDetailedDistance()
This returns the default instance that uses a version of the algorithm that does not use a threshold parameter.LevenshteinDetailedDistance(java.lang.Integer threshold)
If the threshold is not null, distance calculations will be limited to a maximum length.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description LevenshteinResults
apply(java.lang.CharSequence left, java.lang.CharSequence right)
Find the Levenshtein distance between two Strings.private static LevenshteinResults
findDetailedResults(java.lang.CharSequence left, java.lang.CharSequence right, int[][] matrix, boolean swapped)
Finds count for each of the three [insert, delete, substitute] operations needed.static LevenshteinDetailedDistance
getDefaultInstance()
Gets the default instance.java.lang.Integer
getThreshold()
Gets the distance threshold.private static LevenshteinResults
limitedCompare(java.lang.CharSequence left, java.lang.CharSequence right, int threshold)
Find the Levenshtein distance between two CharSequences if it's less than or equal to a given threshold.private static LevenshteinResults
unlimitedCompare(java.lang.CharSequence left, java.lang.CharSequence right)
Find the Levenshtein distance between two Strings.
-
-
-
Field Detail
-
DEFAULT_INSTANCE
private static final LevenshteinDetailedDistance DEFAULT_INSTANCE
Default instance.
-
threshold
private final java.lang.Integer threshold
Threshold.
-
-
Constructor Detail
-
LevenshteinDetailedDistance
public LevenshteinDetailedDistance()
This returns the default instance that uses a version of the algorithm that does not use a threshold parameter.
- See Also:
getDefaultInstance()
-
LevenshteinDetailedDistance
public LevenshteinDetailedDistance(java.lang.Integer threshold)
If the threshold is not null, distance calculations will be limited to a maximum length.If the threshold is null, the unlimited version of the algorithm will be used.
- Parameters:
threshold
- If this is null then distances calculations will not be limited. This may not be negative.
-
-
Method Detail
-
apply
public LevenshteinResults apply(java.lang.CharSequence left, java.lang.CharSequence right)
Find the Levenshtein distance between two Strings.
A higher score indicates a greater distance.
The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm
Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings.
This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htmdistance.apply(null, *) = IllegalArgumentException distance.apply(*, null) = IllegalArgumentException distance.apply("","") = 0 distance.apply("","a") = 1 distance.apply("aaapppp", "") = 7 distance.apply("frog", "fog") = 1 distance.apply("fly", "ant") = 3 distance.apply("elephant", "hippo") = 7 distance.apply("hippo", "elephant") = 7 distance.apply("hippo", "zzzzzzzz") = 8 distance.apply("hello", "hallo") = 1
- Specified by:
apply
in interfaceEditDistance<LevenshteinResults>
- Specified by:
apply
in interfaceSimilarityScore<LevenshteinResults>
- Parameters:
left
- the first string, must not be nullright
- the second string, must not be null- Returns:
- result distance, or -1
- Throws:
java.lang.IllegalArgumentException
- if either String inputnull
-
getDefaultInstance
public static LevenshteinDetailedDistance getDefaultInstance()
Gets the default instance.- Returns:
- The default instace
-
getThreshold
public java.lang.Integer getThreshold()
Gets the distance threshold.- Returns:
- The distance threshold
-
limitedCompare
private static LevenshteinResults limitedCompare(java.lang.CharSequence left, java.lang.CharSequence right, int threshold)
Find the Levenshtein distance between two CharSequences if it's less than or equal to a given threshold.This implementation follows from Algorithms on Strings, Trees and Sequences by Dan Gusfield and Chas Emerick's implementation of the Levenshtein distance algorithm from http://www.merriampark.com/ld.htm
limitedCompare(null, *, *) = IllegalArgumentException limitedCompare(*, null, *) = IllegalArgumentException limitedCompare(*, *, -1) = IllegalArgumentException limitedCompare("","", 0) = 0 limitedCompare("aaapppp", "", 8) = 7 limitedCompare("aaapppp", "", 7) = 7 limitedCompare("aaapppp", "", 6)) = -1 limitedCompare("elephant", "hippo", 7) = 7 limitedCompare("elephant", "hippo", 6) = -1 limitedCompare("hippo", "elephant", 7) = 7 limitedCompare("hippo", "elephant", 6) = -1
- Parameters:
left
- the first CharSequence, must not be nullright
- the second CharSequence, must not be nullthreshold
- the target threshold, must not be negative- Returns:
- result distance, or -1
-
unlimitedCompare
private static LevenshteinResults unlimitedCompare(java.lang.CharSequence left, java.lang.CharSequence right)
Find the Levenshtein distance between two Strings.
A higher score indicates a greater distance.
The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm
Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings.
This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htmunlimitedCompare(null, *) = IllegalArgumentException unlimitedCompare(*, null) = IllegalArgumentException unlimitedCompare("","") = 0 unlimitedCompare("","a") = 1 unlimitedCompare("aaapppp", "") = 7 unlimitedCompare("frog", "fog") = 1 unlimitedCompare("fly", "ant") = 3 unlimitedCompare("elephant", "hippo") = 7 unlimitedCompare("hippo", "elephant") = 7 unlimitedCompare("hippo", "zzzzzzzz") = 8 unlimitedCompare("hello", "hallo") = 1
- Parameters:
left
- the first CharSequence, must not be nullright
- the second CharSequence, must not be null- Returns:
- result distance, or -1
- Throws:
java.lang.IllegalArgumentException
- if either CharSequence input isnull
-
findDetailedResults
private static LevenshteinResults findDetailedResults(java.lang.CharSequence left, java.lang.CharSequence right, int[][] matrix, boolean swapped)
Finds count for each of the three [insert, delete, substitute] operations needed. This is based on the matrix formed based on the two character sequence.- Parameters:
left
- character sequence which need to be converted fromright
- character sequence which need to be converted tomatrix
- two dimensional array containingswapped
- tells whether the value for left character sequence and right character sequence were swapped to save memory- Returns:
- result object containing the count of insert, delete and substitute and total count needed
-
-