edu.illinois.cs.cogcomp.lbj.coref.features
Class AliasFeatures

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.features.AliasFeatures

public class AliasFeatures
extends java.lang.Object

Collection of feature-generating functions that relate to aliases. For example, features are provided to determine whether one mention is an initialism or acronym of the other, and to find the initials of a string.


Constructor Summary
protected AliasFeatures()
          No need to construct collection of features.
 
Method Summary
static boolean areSoonAlias(CExample ex)
          Determines whether two mentions are aliases, as computed by the two-parameter form of this method, using gold entity types.
static boolean areSoonAlias(CExample ex, boolean useGoldEType)
          Determines whether two mentions are aliases, as defined in Soon et al., Computational Linguistics, 2001.
static boolean areSoonAlias(CExample ex, boolean useGoldEType, boolean useCache)
          Determines whether two mentions are aliases, as defined in Soon et al., Computational Linguistics, 2001.
static boolean areSoonAliasBetter(CExample ex, boolean useGoldETypes)
          Determines whether the mentions are aliases.
static boolean doInitialsMatch(CExample ex)
          Determines whether the heads of two mentions have the same initials.
static boolean doInitialsMatchBetter(CExample ex)
          Checks whether two mentions' initials match.
static boolean doInitialsMatchBetter(java.lang.String initials, java.lang.String[] words)
          Determines whether initials is the initials corresponding to words.
static boolean doLastNamesMatch(CExample ex, boolean useGoldETypes)
          Determines whether mentions have the same last name, if people.
static java.lang.String getInitials(java.lang.String s)
          Computes the initials of s.
static java.lang.String getInitials(java.lang.String[] parts)
          Computes the initials of parts, including a character for each non-stop word.
static java.lang.String getSoonInitials(java.lang.String[] parts, java.util.Set<java.lang.String> suffixes, boolean useCase)
          Computes the initials of parts, by returning a string containing the first letter of each non-stop word, except that suffixes are excluded and if useCase is true only words beginning with an uppercase character are included.
static java.lang.String getSoonInitials(java.lang.String s, java.util.Set<java.lang.String> suffixes)
          Computes the initials of the specified string.
static boolean ignorable(java.lang.String w)
          Determines whether a word can be ignored when computing initials or determining whether initials match.
static boolean noETypeAlias(CExample ex)
          Determines whether the mentions are aliases, without using entity types.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AliasFeatures

protected AliasFeatures()
No need to construct collection of features.

Method Detail

areSoonAliasBetter

public static boolean areSoonAliasBetter(CExample ex,
                                         boolean useGoldETypes)
Determines whether the mentions are aliases. Returns true if both are personal names and the last words of their heads match, or if both are non-pronouns with the same entity type (or either or both "unknown") and their heads have matching initials. (Note that exactly one mention should be longer than one word for the mentions to be initials.) Assumes mention types have been set (either predicted or true). Words are split on non-alphanumeric characters.

Parameters:
ex - The example
useGoldETypes - if true, uses mention's given entity types; if false, gets entity types using EntityTypeFeatures.
Returns:
Whether the mentions are aliases.

noETypeAlias

public static boolean noETypeAlias(CExample ex)
Determines whether the mentions are aliases, without using entity types. Intended for use in a baseline system.

Parameters:
ex - The example.
Returns:
Whether the mentions are aliases.

doInitialsMatchBetter

public static boolean doInitialsMatchBetter(CExample ex)
Checks whether two mentions' initials match. If exactly one mention has one word, it is treated as initials and the other mention's head is compared with it using doInitialsMatchBetter(String, String[]). Words are split on characters that are neither alphanumeric nor periods.

Parameters:
ex - The example
Returns:
true when one mention is the initials of another mention.

doInitialsMatchBetter

public static boolean doInitialsMatchBetter(java.lang.String initials,
                                            java.lang.String[] words)
Determines whether initials is the initials corresponding to words. Non-word characters are ignored, and any word that is ignorable is optional.

Parameters:
initials - A string representing the initials of some phrase.
words - The words whose initials will be computed and compared.
Returns:
Whether the initials for words match initials.
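
The matching rule described above (non-word characters ignored, ignorable words optional) can be illustrated with a minimal sketch. This is not the library's implementation: the class name InitialsMatchSketch, the helper names, and the contents of the IGNORABLE set are all hypothetical, since the actual stop-word and suffix lists are not given in this documentation.

```java
final class InitialsMatchSketch {
    // Hypothetical ignorable words (assumption): the real lists behind
    // ignorable(String) are not shown in this documentation.
    static final java.util.Set<String> IGNORABLE = new java.util.HashSet<>(
            java.util.Arrays.asList("of", "the", "and", "for", "inc", "corp", "co", "ltd"));

    // Returns true if `initials` can be formed from the first letters of
    // `words`, where ignorable words may either contribute a letter or be skipped.
    static boolean initialsMatch(String initials, String[] words) {
        String letters = initials.replaceAll("[^A-Za-z0-9]", "").toLowerCase();
        return match(letters, 0, words, 0);
    }

    private static boolean match(String letters, int i, String[] words, int j) {
        if (j == words.length) {
            return i == letters.length(); // every letter must be consumed
        }
        String w = words[j].replaceAll("[^A-Za-z0-9]", "").toLowerCase();
        if (w.isEmpty()) {
            return match(letters, i, words, j + 1); // nothing to contribute
        }
        // Option 1: this word supplies the next initial.
        if (i < letters.length() && letters.charAt(i) == w.charAt(0)
                && match(letters, i + 1, words, j + 1)) {
            return true;
        }
        // Option 2: an ignorable word may be skipped entirely.
        return IGNORABLE.contains(w) && match(letters, i, words, j + 1);
    }
}
```

Under these assumptions, "F.B.I." matches {"Federal", "Bureau", "of", "Investigation"} by skipping "of", while "DoD" matches {"Department", "of", "Defense"} by letting "of" contribute a letter.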

ignorable

public static boolean ignorable(java.lang.String w)
Determines whether a word can be ignored when computing initials or determining whether initials match. Currently, stop words and organizational suffixes are ignorable.

Parameters:
w - A lowercase string.
Returns:
true when w is ignorable.

areSoonAlias

public static boolean areSoonAlias(CExample ex)
Determines whether two mentions are aliases, as computed by the two-parameter form of this method, using gold entity types.

Parameters:
ex - The example whose mentions will be checked for relatedness.
Returns:
Whether the mentions are aliases.

areSoonAlias

public static boolean areSoonAlias(CExample ex,
                                   boolean useGoldEType)
Determines whether two mentions are aliases, as defined in Soon et al., Computational Linguistics, 2001. For proper names of people or Geo-Political Entities (GPEs), returns true when the last words of the heads are equal (ignoring case). For organizations, determines whether the mention having fewer words is an initialism of the mention having more words (ignoring punctuation in the shorter). Caching will not be used.

Parameters:
ex - The example whose mentions will be checked for relatedness.
useGoldEType - If true, use the gold Entity Type, otherwise, predict it.
Returns:
Whether the mentions are aliases as defined above.

areSoonAlias

public static boolean areSoonAlias(CExample ex,
                                   boolean useGoldEType,
                                   boolean useCache)
Determines whether two mentions are aliases, as defined in Soon et al., Computational Linguistics, 2001. For proper names of people or Geo-Political Entities (GPEs), returns true when the last words of the heads are equal (ignoring case). For organizations, determines whether the mention having fewer words is an initialism of the mention having more words (ignoring punctuation in the shorter).

Parameters:
ex - The example whose mentions will be checked for relatedness.
useGoldEType - If true, use the gold Entity Type, otherwise, predict it.
useCache - Whether to use the cached values for entity type as determined by EntityTypeFeatures.getEType(Mention, boolean).
Returns:
Whether the mentions are aliases as defined above.
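
The alias rule described above can be sketched on plain strings as follows. This is a simplified illustration rather than the library's code: the class SoonAliasSketch, the EType enum, and the assumption that both mentions share a single known entity type are hypothetical, and the real method operates on a CExample with gold or predicted entity types.

```java
final class SoonAliasSketch {
    // Hypothetical entity-type labels (assumption); the library's own
    // representation of entity types is not shown in this documentation.
    enum EType { PER, GPE, ORG, OTHER }

    static boolean areAlias(String head1, String head2, EType type) {
        String[] w1 = head1.trim().split("\\s+");
        String[] w2 = head2.trim().split("\\s+");
        switch (type) {
            case PER:
            case GPE:
                // People and GPEs: compare the last words of the heads, ignoring case.
                return w1[w1.length - 1].equalsIgnoreCase(w2[w2.length - 1]);
            case ORG:
                return isInitialism(w1, w2);
            default:
                return false;
        }
    }

    // Organizations: the mention with fewer words must spell the initials of
    // the mention with more words, ignoring punctuation in the shorter one.
    private static boolean isInitialism(String[] w1, String[] w2) {
        String[] shorter = w1.length <= w2.length ? w1 : w2;
        String[] longer = w1.length <= w2.length ? w2 : w1;
        String abbreviation = String.join("", shorter).replaceAll("\\p{Punct}", "");
        StringBuilder initials = new StringBuilder();
        for (String w : longer) {
            if (!w.isEmpty()) {
                initials.append(w.charAt(0));
            }
        }
        return abbreviation.equalsIgnoreCase(initials.toString());
    }
}
```

In this sketch, "I.B.M." and "International Business Machines" are aliases as organizations, and "Bill Clinton" and "Mr. Clinton" are aliases as people. Stop-word and suffix handling is omitted for brevity.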

doInitialsMatch

public static boolean doInitialsMatch(CExample ex)
Determines whether the heads of two mentions have the same initials. Converts both mentions to initials (if multiword) and determines whether the initials are equal (ignoring case). Note that if both mentions are single words, this method returns true iff the mentions have the same spelling. Uses getInitials(String s).

Parameters:
ex - The example whose mentions will be compared.
Returns:
Whether the initials match.
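
As a rough illustration of the comparison described above, the following sketch reduces each multiword head to its initials and compares the results ignoring case. The class and method names are hypothetical, and unlike getInitials it does not drop stop words; it is a simplification under those assumptions.

```java
final class HeadInitialsSketch {
    // Reduces a multiword head to its initials; a single-word head is
    // returned unchanged so that it can itself act as the initials.
    static String toInitialsIfMultiword(String head) {
        String trimmed = head.trim();
        String[] words = trimmed.split("\\s+");
        if (words.length < 2) {
            return trimmed;
        }
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            sb.append(w.charAt(0));
        }
        return sb.toString();
    }

    // Simplification (assumption): stop words are not removed here.
    static boolean initialsMatch(String head1, String head2) {
        return toInitialsIfMultiword(head1)
                .equalsIgnoreCase(toInitialsIfMultiword(head2));
    }
}
```

Note the consequence mentioned above: two single-word heads match only when their spellings agree (ignoring case), whereas two multiword heads such as "New York" and "North Yemen" match because both reduce to "NY".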

getSoonInitials

public static java.lang.String getSoonInitials(java.lang.String s,
                                               java.util.Set<java.lang.String> suffixes)
Computes the initials of the specified string by returning a string containing the first letter of each non-stop word, except that suffixes are excluded. Uses smartcase (i.e., only words beginning with an uppercase character are included when s is not all lowercase). Splits s on whitespace.

Parameters:
s - The String whose initials will be computed.
Returns:
The initials of the given string.
See Also:
getSoonInitials(String[], Set, boolean)

getSoonInitials

public static java.lang.String getSoonInitials(java.lang.String[] parts,
                                               java.util.Set<java.lang.String> suffixes,
                                               boolean useCase)
Computes the initials of parts, by returning a string containing the first letter of each non-stop word, except that suffixes are excluded and if useCase is true only words beginning with an uppercase character are included.

Parameters:
parts - An array of strings corresponding to words in a phrase.
suffixes - A Set of Strings not to be used to form initials in the result.
useCase - When true, only include initials from words beginning with an uppercase letter. (Result will still be lowercase).
Returns:
A lowercase String containing the sequence of initials with no additional spaces or punctuation.
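
Both getSoonInitials overloads can be approximated by the sketch below. It is illustrative only: the class name, the STOP_WORDS contents, and the choice to compare suffixes in lowercase are assumptions, since the documentation does not list the stop words or state how suffixes are matched.

```java
final class SoonInitialsSketch {
    // Hypothetical stop-word list (assumption); the library's list is not shown here.
    static final java.util.Set<String> STOP_WORDS = new java.util.HashSet<>(
            java.util.Arrays.asList("a", "an", "and", "for", "in", "of", "the"));

    // Array form: first letter of each word that is neither a stop word nor a suffix.
    static String soonInitials(String[] parts, java.util.Set<String> suffixes, boolean useCase) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            if (part.isEmpty()) {
                continue;
            }
            String lower = part.toLowerCase();
            // Assumption: suffixes are stored in lowercase.
            if (STOP_WORDS.contains(lower) || suffixes.contains(lower)) {
                continue;
            }
            char first = part.charAt(0);
            if (useCase && !Character.isUpperCase(first)) {
                continue; // with useCase, only capitalized words contribute
            }
            sb.append(Character.toLowerCase(first)); // result is always lowercase
        }
        return sb.toString();
    }

    // String form: splits on whitespace and applies smartcase, i.e. case is
    // only taken into account when the input is not entirely lowercase.
    static String soonInitials(String s, java.util.Set<String> suffixes) {
        boolean useCase = !s.equals(s.toLowerCase());
        return soonInitials(s.trim().split("\\s+"), suffixes, useCase);
    }
}
```

With a suffix set containing "corp", the words {"International", "Business", "Machines", "Corp"} yield "ibm" in this sketch.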

getInitials

public static java.lang.String getInitials(java.lang.String s)
Computes the initials of s.


getInitials

public static java.lang.String getInitials(java.lang.String[] parts)
Computes the initials of parts, including a character for each non-stop word.

Parameters:
parts - The words to compute initials on.
Returns:
A lowercase String containing the sequence of initials, with no additional spaces or punctuation.

doLastNamesMatch

public static boolean doLastNamesMatch(CExample ex,
                                       boolean useGoldETypes)
Determines whether the mentions have the same last name, if both are people. Assumes mention types have been set (either predicted or true). Words are split on non-alphanumeric characters.

Parameters:
ex - The example
useGoldETypes - if true, uses mention's given entity types; if false, gets entity types using EntityTypeFeatures.
Returns:
Whether the mentions have the same last name.
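
The last-name comparison can be sketched on plain strings as shown below. The class and method names are hypothetical, the bothPeople flag stands in for the entity-type check performed on the CExample, and comparing names case-insensitively is an assumption not stated in this documentation.

```java
final class LastNameSketch {
    // Splits on runs of non-alphanumeric characters and returns the final word,
    // or the empty string when the mention contains no alphanumeric characters.
    static String lastWord(String mention) {
        String[] words = mention.split("[^A-Za-z0-9]+");
        return words.length == 0 ? "" : words[words.length - 1];
    }

    // `bothPeople` is a stand-in (assumption) for checking that both mentions
    // have the person entity type.
    static boolean lastNamesMatch(String mention1, String mention2, boolean bothPeople) {
        if (!bothPeople) {
            return false;
        }
        String last1 = lastWord(mention1);
        String last2 = lastWord(mention2);
        // Case-insensitive comparison is an assumption of this sketch.
        return !last1.isEmpty() && last1.equalsIgnoreCase(last2);
    }
}
```

Because words are split on non-alphanumeric characters, trailing punctuation does not affect the result: "Mr. Clinton" and "Clinton," both yield the last word "Clinton".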