java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.features.StringSimilarityFeatures
public class StringSimilarityFeatures
A collection of features relating to the similarity of strings.
Constructor Summary

Modifier | Constructor | Description |
---|---|---|
protected | StringSimilarityFeatures() | Should not need to construct this static feature library. |
Method Summary

Modifier and Type | Method | Description |
---|---|---|
static int | calcLevenshteinEditDist(java.lang.String a, java.lang.String b) | Calculates the (unnormalized) Levenshtein edit distance for a pair of strings. |
static boolean | doLastWordsMatch(CExample ex, boolean useHead) | Determines whether the last word of one mention matches the last word of the other mention. |
static double | getEdit(CExample ex, boolean useHead) | Gets the character-based edit distance, normalized by the length of the longer string. |
static int | getNumDiffCapitalizedWords(CExample ex) | Determines the number of capitalized words that occur in exactly one mention. |
static boolean | leftSubstring(CExample ex, boolean useHead) | Determines whether one mention's text begins with the text of the other mention. |
static boolean | prenominalModifierWordMatchAnotherOrHeadWord(CExample ex) | Determines whether any words of one mention preceding its head match any words of the other mention that precede or are contained in the head. |
static boolean | prenominalModifierWordMatchAnotherOrHeadWord(Mention m1, Mention m2) | Determines whether a noun preceding the head of m1 matches a noun preceding or in the head of m2. No assumptions are made about the textual order of m1 and m2. |
static boolean | rightSubstring(CExample ex, boolean useHead) | Determines whether one mention's text ends with the text of the other mention. |
static boolean | stringsMatchByWords(java.lang.String s1, java.lang.String s2, java.lang.String[] ignoreWords) | Determines whether two strings contain the same sequence of words, after dropping ignoreWords. |
static boolean | subsequence(CExample ex, boolean useHead) | Determines whether the sequence of words in one mention is a subsequence of the words in the other mention. |
static boolean | subsequence(java.lang.String big, java.lang.String small) | Determines whether the sequence of words in big is a subsequence of the words in small. |
static boolean | substring(CExample ex, boolean useHead) | Determines whether one mention's text is a substring of the other's. |
static boolean | textMatchSoon(CExample ex, boolean useHead) | Determines whether the text (either the heads or the extents) of the mentions matches, after dropping stop words. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail

protected StringSimilarityFeatures()

Method Detail
public static boolean substring(CExample ex, boolean useHead)
Parameters:
ex - The example whose mentions' text will be compared.
useHead - If true, compare head text; otherwise, compare extent text.

public static boolean leftSubstring(CExample ex, boolean useHead)
Parameters:
ex - The example whose mentions' text will be compared.
useHead - If true, compare head text; otherwise, compare extent text.

public static boolean rightSubstring(CExample ex, boolean useHead)
Parameters:
ex - The example whose mentions' text will be compared.
useHead - If true, compare head text; otherwise, compare extent text.

public static boolean textMatchSoon(CExample ex, boolean useHead)
Parameters:
ex - The example whose mentions will be compared.
useHead - If true, compare head text; otherwise, compare extent text.

public static boolean doLastWordsMatch(CExample ex, boolean useHead)
Parameters:
ex - The example whose mentions' text will be compared.
useHead - If true, compare head text; otherwise, compare extent text.

public static boolean prenominalModifierWordMatchAnotherOrHeadWord(CExample ex)
Parameters:
ex - The example whose mentions will be compared.
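The substring-style features above compare the two mentions' texts as strings. The following is a minimal sketch on plain strings, not the library's implementation: the class SubstringSketch and its methods are hypothetical, the two mention texts are passed in directly instead of being read from a CExample, and the case-insensitive, character-level comparison is an assumption (the real features may compare whole words or preserve case).

```java
import java.util.Locale;

/**
 * Hypothetical illustration of substring, leftSubstring, rightSubstring and
 * doLastWordsMatch on plain strings. The real methods take a CExample and use
 * useHead to choose head or extent text; here the texts are given directly.
 */
public class SubstringSketch {

    // Assumption: comparisons ignore case and surrounding whitespace.
    private static String norm(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    /** True if either text contains the other (character level). */
    public static boolean substring(String a, String b) {
        String x = norm(a), y = norm(b);
        return x.contains(y) || y.contains(x);
    }

    /** True if either text begins with the other. */
    public static boolean leftSubstring(String a, String b) {
        String x = norm(a), y = norm(b);
        return x.startsWith(y) || y.startsWith(x);
    }

    /** True if either text ends with the other. */
    public static boolean rightSubstring(String a, String b) {
        String x = norm(a), y = norm(b);
        return x.endsWith(y) || y.endsWith(x);
    }

    /** True if the last whitespace-separated words of the two texts are equal. */
    public static boolean lastWordsMatch(String a, String b) {
        String[] x = norm(a).split("\\s+");
        String[] y = norm(b).split("\\s+");
        return x[x.length - 1].equals(y[y.length - 1]);
    }

    public static void main(String[] args) {
        System.out.println(substring("the Bank of America", "America"));         // true
        System.out.println(leftSubstring("Bill Clinton", "Bill"));                // true
        System.out.println(rightSubstring("President Clinton", "Clinton"));       // true
        System.out.println(lastWordsMatch("President Clinton", "Bill Clinton"));  // true
    }
}
```

Because each test is checked in both directions, the result does not depend on which mention comes first, matching the "one mention ... the other mention" wording of the summaries.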
public static boolean prenominalModifierWordMatchAnotherOrHeadWord(Mention m1, Mention m2)
Determines whether a noun preceding the head of m1 matches a noun preceding or in the head of m2. No assumptions are made about the textual order of m1 and m2. Uses POS tags to determine whether words are nouns.
Parameters:
m1 - The first mention.
m2 - The second mention.
Returns:
Whether any nouns from m1 that precede its head match any nouns from m2 preceding or in its head.

public static boolean subsequence(CExample ex, boolean useHead)
Parameters:
ex - The example whose mentions will be compared.
useHead - Whether heads or extents should be compared.
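A minimal sketch of the noun-matching logic described for prenominalModifierWordMatchAnotherOrHeadWord(Mention m1, Mention m2), assuming each mention is available as a list of POS-tagged tokens with Penn Treebank tags (nouns tagged NN, NNS, NNP or NNPS). PrenominalSketch, Token and the head-index parameters are hypothetical stand-ins for the library's Mention type, and the case-insensitive word comparison is an assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: does a noun before m1's head equal a noun before or
 * inside m2's head? Tokens and head positions stand in for Mention.
 */
public class PrenominalSketch {

    /** A word with its POS tag (Penn Treebank tag set assumed). */
    public static final class Token {
        final String word;
        final String pos;

        public Token(String word, String pos) {
            this.word = word;
            this.pos = pos;
        }

        boolean isNoun() {
            return pos.startsWith("NN"); // NN, NNS, NNP, NNPS
        }
    }

    /** Lowercased nouns among tokens[from, to). */
    private static Set<String> nouns(List<Token> tokens, int from, int to) {
        Set<String> out = new HashSet<>();
        for (Token t : tokens.subList(from, to)) {
            if (t.isNoun()) {
                out.add(t.word.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    /**
     * headStart1: index of the first head token of m1 (modifiers lie before it).
     * headEnd2: exclusive end index of m2's head (tokens before or in the head).
     */
    public static boolean prenominalMatch(List<Token> m1, int headStart1,
                                          List<Token> m2, int headEnd2) {
        Set<String> modifiers1 = nouns(m1, 0, headStart1);
        Set<String> candidates2 = nouns(m2, 0, headEnd2);
        modifiers1.retainAll(candidates2); // keep only shared nouns
        return !modifiers1.isEmpty();
    }

    public static void main(String[] args) {
        // "the Clinton administration" (head: administration) vs. "Bill Clinton" (head: Clinton)
        List<Token> m1 = List.of(new Token("the", "DT"), new Token("Clinton", "NNP"),
                                 new Token("administration", "NN"));
        List<Token> m2 = List.of(new Token("Bill", "NNP"), new Token("Clinton", "NNP"));
        System.out.println(prenominalMatch(m1, 2, m2, 2)); // true: "Clinton" is shared
    }
}
```

The check is directional, as the m1/m2 description suggests; the CExample overload, which speaks of "one mention" and "the other mention", may apply it in both directions, but that is not confirmed here.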
public static boolean subsequence(java.lang.String big, java.lang.String small)
Determines whether the sequence of words in big is a subsequence of the words in small. Words are split using whitespace ("\s+") and compared using a case-insensitive comparison.
Parameters:
big - The bigger string.
small - The smaller string.
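A word-level subsequence test can be sketched with a single pass over the longer word list. Given the parameter names, this sketch assumes the words of the smaller string must appear in order, though not necessarily contiguously, among the words of the bigger one; the direction and contiguity used by the library are assumptions, as are the class and method names. The whitespace split ("\s+") and case-insensitive comparison follow the documentation.

```java
/**
 * Hypothetical sketch of a word-level subsequence check: are the words of
 * `shorter` found, in order, among the words of `longer`?
 */
public class SubsequenceSketch {

    public static boolean isWordSubsequence(String longer, String shorter) {
        String[] big = longer.trim().split("\\s+");
        String[] small = shorter.trim().split("\\s+");
        int j = 0; // index of the next word of `small` still to be matched
        for (int i = 0; i < big.length && j < small.length; i++) {
            if (big[i].equalsIgnoreCase(small[j])) {
                j++; // matched; move on to the next word of `small`
            }
        }
        return j == small.length; // every word of `small` was matched in order
    }

    public static void main(String[] args) {
        System.out.println(isWordSubsequence("George Walker Bush", "george bush")); // true
        System.out.println(isWordSubsequence("George Walker Bush", "Bush George")); // false
    }
}
```

The greedy scan is sufficient for subsequence testing: matching each word at its earliest possible position never rules out a later match, so the check runs in time linear in the number of words.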
public static int getNumDiffCapitalizedWords(CExample ex)
Parameters:
ex - The example whose mentions will be compared.
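The documentation does not say how capitalization or word identity is decided. This sketch assumes whitespace tokenization, treats a word as capitalized when its first character is uppercase, compares words case-sensitively, and counts distinct words in the symmetric difference of the two mentions' capitalized-word sets. CapitalizedDiffSketch and its methods are hypothetical and operate on plain strings rather than a CExample.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: count capitalized words that occur in exactly one text. */
public class CapitalizedDiffSketch {

    /** Distinct whitespace-separated words whose first character is uppercase. */
    static Set<String> capitalizedWords(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty() && Character.isUpperCase(w.charAt(0))) {
                out.add(w);
            }
        }
        return out;
    }

    /** Size of the symmetric difference of the two capitalized-word sets. */
    public static int numDiffCapitalizedWords(String a, String b) {
        Set<String> onlyA = capitalizedWords(a);
        Set<String> onlyB = capitalizedWords(b);
        Set<String> shared = new HashSet<>(onlyA);
        shared.retainAll(onlyB);  // words capitalized in both texts
        onlyA.removeAll(shared);
        onlyB.removeAll(shared);
        return onlyA.size() + onlyB.size();
    }

    public static void main(String[] args) {
        // "Walker" occurs only in the first text, "President" only in the second.
        System.out.println(numDiffCapitalizedWords("George Walker Bush", "President George Bush")); // 2
    }
}
```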
public static boolean stringsMatchByWords(java.lang.String s1, java.lang.String s2, java.lang.String[] ignoreWords)
Determines whether two strings contain the same sequence of words, after dropping ignoreWords.
Note: Words are split by single whitespace characters (\s) rather than by one or more whitespace characters (\s+) for backwards compatibility.
Parameters:
s1 - The first string.
s2 - The second string.
ignoreWords - The words to be ignored. The words should be supplied in lowercase.

public static double getEdit(CExample ex, boolean useHead)
Parameters:
ex - The example whose mentions should be compared.
useHead - Whether the heads or extents should be compared.

public static int calcLevenshteinEditDist(java.lang.String a, java.lang.String b)
Parameters:
a - One string.
b - Another string.
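The last three methods can be sketched on plain strings. levenshtein below is the standard two-row dynamic program for unit-cost Levenshtein distance; normalizedEdit divides it by the length of the longer string, as getEdit's summary describes (returning 0.0 for two empty strings is an assumption). wordsMatchIgnoring mirrors the documented stringsMatchByWords behavior, including the single-whitespace split; lowercasing words before checking them against ignoreWords is an assumption based on ignoreWords being supplied in lowercase. All names are hypothetical, not the library's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketches of calcLevenshteinEditDist, getEdit and stringsMatchByWords. */
public class EditDistanceSketch {

    /** Unnormalized Levenshtein distance with unit-cost insert, delete and substitute. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j characters of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i characters of a to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse arrays: the finished row becomes prev
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Distance divided by the length of the longer string. */
    public static double normalizedEdit(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 0.0 : (double) levenshtein(a, b) / longer;
    }

    /** Same word sequence after dropping ignoreWords; splits on single whitespace. */
    public static boolean wordsMatchIgnoring(String s1, String s2, String[] ignoreWords) {
        Set<String> ignore = new HashSet<>(Arrays.asList(ignoreWords));
        return keptWords(s1, ignore).equals(keptWords(s2, ignore));
    }

    private static List<String> keptWords(String s, Set<String> ignore) {
        List<String> kept = new ArrayList<>();
        for (String w : s.split("\\s")) { // "a  b" yields "a", "", "b"
            String lower = w.toLowerCase(Locale.ROOT);
            if (!ignore.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));    // 3
        System.out.println(normalizedEdit("kitten", "sitting")); // 3 / 7, about 0.4286
        System.out.println(wordsMatchIgnoring("The White House", "white house", new String[] {"the"})); // true
        System.out.println(wordsMatchIgnoring("white  house", "white house", new String[0]));          // false
    }
}
```

The last line shows the effect of splitting on single whitespace characters: the double space produces an empty token, so the two word sequences differ.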