edu.illinois.cs.cogcomp.lbj.coref.features
Class WordNetTools

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.features.WordNetTools
All Implemented Interfaces:
java.io.Externalizable, java.io.Serializable

public class WordNetTools
extends java.lang.Object
implements java.io.Externalizable

A collection of methods for working with WordNet: looks up words and determines their synonyms, hypernyms, and antonyms. Because WordNet lookups can be slow, results are cached. The caches can be saved to and loaded from a file, so they persist across runs.

See Also:
Serialized Form
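
The caching described above can be pictured with a small sketch. Everything in it is hypothetical: the PairCache class, the "|" key separator, and the BiPredicate standing in for a slow WordNet lookup are illustrations only, not this class's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical sketch: memoizes a slow boolean test on pairs of phrases. */
class PairCache {
    private final Map<String, Boolean> cache = new HashMap<>();
    private final BiPredicate<String, String> slowLookup;
    private int misses = 0;

    PairCache(BiPredicate<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /** Builds one cache key from two phrases (the separator is an assumption). */
    static String key(String a, String b) {
        return a + "|" + b;
    }

    /** Returns the cached answer, running the slow lookup only on a miss. */
    boolean test(String w1, String w2) {
        String k = key(w1, w2);
        Boolean cached = cache.get(k);
        if (cached == null) {
            misses++;
            cached = slowLookup.test(w1, w2);
            cache.put(k, cached);
        }
        return cached;
    }

    /** Number of times the slow lookup actually ran. */
    int misses() {
        return misses;
    }
}
```

Repeating the same query, for example calling test("car", "bus") twice, runs the slow lookup once; the second call is answered from the map.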

Field Summary
private  java.util.Map<java.lang.String,java.lang.Boolean> areAntCache
           
private  java.util.Map<java.lang.String,java.lang.Boolean> areHypCache
           
private  java.util.Map<java.lang.String,java.lang.Boolean> areSynCache
           
private static java.lang.String m_className
           
private  DictionaryDatabase m_dict
           
private static java.lang.String m_dsrBase
           
private static java.lang.String m_dsrName
           
private static java.lang.String m_packageName
           
private static WordNetTools m_wn
           
private static long serialVersionUID
           
private  java.util.Map<java.lang.String,java.lang.Boolean> shareHypCache
           
private  java.util.Map<java.lang.String,java.lang.Boolean> shareHypPOSCache
           
private  java.util.Map<java.lang.String,java.lang.Boolean> shareSynCache
           
private  java.util.Map<java.lang.String,IndexWord> wordCache
           
 
Constructor Summary
WordNetTools()
          Constructs a new WordNetTools object with empty caches.
 
Method Summary
 boolean areAntonyms(java.lang.String w1, java.lang.String w2)
          Determines whether two phrases are antonyms using WordNet.
 boolean areHypernyms(java.lang.String w1, java.lang.String w2)
          Determines whether one phrase is the hypernym of another using WordNet.
 boolean areSynonyms(java.lang.String w1, java.lang.String w2)
          Determines whether two phrases are synonyms using WordNet.
 java.util.Set<Word> getAllHypernyms(IndexWord word)
          Gets all hypernyms of all senses of a word.
 java.util.Set<Word> getAllHypernyms(PointerTarget word)
          Gets all hypernyms of a word.
 java.util.Set<Word> getAntonyms(IndexWord word)
          Gets all antonyms for all senses of a word.
protected static java.lang.String getFQDSRName()
          Determines the location where the serialization file should be loaded or saved.
 java.util.Set<Word> getHypernyms(IndexWord word)
          Gets hypernyms of all senses of a word.
 java.lang.String[] getHypernymStrings(java.lang.String word)
          Looks up word and gets all hypernyms for all senses of a word.
private  IndexWord getIndexNoun(java.lang.String word)
          Looks up the IndexWord of a string in WordNet.
private  IndexWord getIndexWord(java.lang.String word)
          Looks up the IndexWord of a string in WordNet.
private  IndexWord getIndexWord(java.lang.String word, java.lang.String pos)
          Looks up the IndexWord of a string in WordNet.
 java.util.List<Word> getSenseWords(IndexWord word)
          Gets the Word for each sense of word.
 java.util.Set<Word> getSynonyms(IndexWord word)
          Gets all synonyms for all senses of a word.
 java.lang.String[] getSynonymStrings(java.lang.String word)
          Looks up word and gets all synonyms for all senses of a word.
static WordNetTools getWN()
          Gets the singleton WordNetTools, with its associated caches.
private  Word getWord(PointerTarget p)
          Translates a pointer target to a Word.
private  boolean hasLetters(java.lang.String phrase)
          Determines whether a phrase has any letters.
private  java.lang.String join(java.lang.String a, java.lang.String b)
           
private static WordNetTools loadWN()
          Loads the WordNetTools singleton from its serialization file.
protected  IndexWord lookupIndexNounSafe(java.lang.String word)
          Looks up the IndexWord of a string in WordNet.
protected  IndexWord lookupIndexWordSafe(java.lang.String word, java.lang.String pos)
          Looks up the IndexWord of a string in WordNet.
private
<T> boolean
overlap(java.util.Set<T> a, java.util.Collection<T> b)
           
 void printSynsets(IndexWord word)
          Writes the synsets of word to System.out.
 void printSynsets(java.lang.String word)
          Writes the synsets of word to System.out.
 void readExternal(java.io.ObjectInput in)
           
private  void readObject(java.io.ObjectInputStream in)
           
static void saveWN()
          Saves the WordNetTools with its associated caches.
 boolean shareHypernyms(java.lang.String w1, java.lang.String w2)
          Determines whether two phrases share hypernyms using WordNet.
 boolean shareHypernymsPOS(java.lang.String w1, java.lang.String w2)
          Determines whether two phrases share hypernyms using WordNet.
 boolean shareSynonymns(java.lang.String w1, java.lang.String w2)
          Determines whether any sense of one phrase has the same synonym as any sense of another phrase.
private  void startup()
          Loads the underlying WordNet FileBackedDictionary.
 void writeExternal(java.io.ObjectOutput out)
          Writes the object using the externalization protocol.
private  void writeObject(java.io.ObjectOutputStream out)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_packageName

private static java.lang.String m_packageName

m_className

private static java.lang.String m_className

m_dsrBase

private static final java.lang.String m_dsrBase
See Also:
Constant Field Values

m_dsrName

private static java.lang.String m_dsrName

m_wn

private static WordNetTools m_wn

m_dict

private transient DictionaryDatabase m_dict

wordCache

private transient java.util.Map<java.lang.String,IndexWord> wordCache

shareSynCache

private java.util.Map<java.lang.String,java.lang.Boolean> shareSynCache

shareHypPOSCache

private java.util.Map<java.lang.String,java.lang.Boolean> shareHypPOSCache

shareHypCache

private java.util.Map<java.lang.String,java.lang.Boolean> shareHypCache

areSynCache

private java.util.Map<java.lang.String,java.lang.Boolean> areSynCache

areHypCache

private java.util.Map<java.lang.String,java.lang.Boolean> areHypCache

areAntCache

private java.util.Map<java.lang.String,java.lang.Boolean> areAntCache

serialVersionUID

private static final long serialVersionUID
See Also:
Constant Field Values
Constructor Detail

WordNetTools

public WordNetTools()
Constructs a new WordNetTools object with empty caches.

See Also:
getWN()
Method Detail

startup

private void startup()
Loads the underlying WordNet FileBackedDictionary.

Throws:
java.lang.RuntimeException - if the database cannot be loaded.

areSynonyms

public boolean areSynonyms(java.lang.String w1,
                           java.lang.String w2)
Determines whether two phrases are synonyms using WordNet. This method is case-sensitive to the extent that the underlying database is. Input strings are assumed to be nouns or noun phrases. If any sense of w1 is synonymous with any sense of w2, or vice versa, returns true.

Parameters:
w1 - A string.
w2 - Another string.
Returns:
Whether two phrases are synonyms.

areAntonyms

public boolean areAntonyms(java.lang.String w1,
                           java.lang.String w2)
Determines whether two phrases are antonyms using WordNet. This method is case-sensitive to the extent that the underlying database is. Input strings are assumed to be nouns or noun phrases. If any sense of w1 is an antonym of any sense of w2, or vice versa, returns true.

Parameters:
w1 - A string.
w2 - Another string.
Returns:
Whether two phrases are antonyms.

areHypernyms

public boolean areHypernyms(java.lang.String w1,
                            java.lang.String w2)
Determines whether one phrase is the hypernym of another using WordNet. This method is case-sensitive to the extent that the underlying database is. Input strings are assumed to be nouns or noun phrases. If any sense of w1 is a hypernym of any sense of w2, or vice versa, returns true. Not restricted to direct hypernyms (other ancestors or descendants may appear along the hypernym path between the two phrases).

Parameters:
w1 - A string.
w2 - Another string.
Returns:
Whether either phrase is the hypernym of the other.
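
As an illustration of the "not restricted to direct hypernyms" behavior, here is a minimal sketch over a toy graph. It assumes a hypothetical map from each word to its direct hypernyms and ignores word senses entirely; it does not use WordNet or this class's code.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: hypernym test over a toy word-to-parents map. */
class HypernymPathSketch {
    /** True if target is reachable from start by following hypernym links upward. */
    static boolean isAncestor(Map<String, Set<String>> parents, String start, String target) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            Set<String> direct = parents.getOrDefault(current, Collections.<String>emptySet());
            for (String parent : direct) {
                if (parent.equals(target)) {
                    return true;
                }
                if (seen.add(parent)) {
                    queue.add(parent); // visit each node once, even if the graph has cycles
                }
            }
        }
        return false;
    }

    /** Symmetric check: either word may be the ancestor of the other. */
    static boolean areHypernyms(Map<String, Set<String>> parents, String w1, String w2) {
        return isAncestor(parents, w1, w2) || isAncestor(parents, w2, w1);
    }
}
```

With the toy chain poodle, dog, canine, animal, the check succeeds for ("poodle", "animal") in either argument order, even though the two are several links apart.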

shareHypernyms

public boolean shareHypernyms(java.lang.String w1,
                              java.lang.String w2)
Determines whether two phrases share hypernyms using WordNet. Because most WordNet entries share a very high-level hypernym such as "entity", entries near the root of the hypernym tree are not considered matches. This method is case-sensitive to the extent that the underlying database is. Input strings are assumed to be nouns or noun phrases. Returns true if any sense of w1 shares a hypernym with any sense of w2.

Parameters:
w1 - A string.
w2 - Another string.
Returns:
Whether the phrases share a hypernym.
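
The effect of ignoring near-root entries can be sketched as follows. The sketch takes two precomputed sets of hypernym strings as input (a hypothetical simplification) and borrows the list of top-level entries from the getHypernyms description; the actual list used by this class may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: shared-hypernym test that skips near-root entries. */
class SharedHypernymSketch {
    // Entries named in the getHypernyms description; the real list may be longer.
    static final Set<String> TOP_LEVEL = new HashSet<>(Arrays.asList(
            "entity", "abstraction", "physical entity", "object",
            "whole", "artifact", "group"));

    /** True if the two hypernym sets have a common member that is not top-level. */
    static boolean shareHypernyms(Set<String> hyp1, Set<String> hyp2) {
        for (String h : hyp1) {
            if (!TOP_LEVEL.contains(h) && hyp2.contains(h)) {
                return true;
            }
        }
        return false;
    }
}
```

Two sets that both contain "animal" match, while two sets whose only common member is "entity" do not.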

shareHypernymsPOS

public boolean shareHypernymsPOS(java.lang.String w1,
                                 java.lang.String w2)
Determines whether two phrases share hypernyms using WordNet. Because most WordNet entries share a very high-level hypernym such as "entity", entries near the root of the hypernym tree are not considered matches. This method is case-sensitive to the extent that the underlying database is. Input strings are assumed to be nouns, except that if no such entry is found, they are assumed to be adjectives. Returns true if any sense of w1 shares a hypernym with any sense of w2.

Parameters:
w1 - A string.
w2 - Another string.
Returns:
Whether the phrases share a hypernym.

shareSynonymns

public boolean shareSynonymns(java.lang.String w1,
                              java.lang.String w2)
Determines whether any sense of one phrase has the same synonym as any sense of another phrase. This method is case-sensitive to the extent that the underlying database is. Input strings are assumed to be nouns or noun phrases.

Parameters:
w1 - A string.
w2 - Another string.
Returns:
Whether two phrases share a synonym.

getSynonyms

public java.util.Set<Word> getSynonyms(IndexWord word)
Gets all synonyms for all senses of a word.

Parameters:
word - An entry in WordNet.
Returns:
The set of Words synonymous with word.

getSynonymStrings

public java.lang.String[] getSynonymStrings(java.lang.String word)
Looks up word and gets all synonyms for all senses of a word.

Parameters:
word - Any string, assumed to be a noun.
Returns:
The phrases synonymous with word.

getAntonyms

public java.util.Set<Word> getAntonyms(IndexWord word)
Gets all antonyms for all senses of a word.

Parameters:
word - An entry in WordNet.
Returns:
The set of Words that are antonyms of word.

getAllHypernyms

public java.util.Set<Word> getAllHypernyms(IndexWord word)
Gets all hypernyms of all senses of a word.

Parameters:
word - An entry in WordNet.
Returns:
The set of Words that are hypernyms of word.

getAllHypernyms

public java.util.Set<Word> getAllHypernyms(PointerTarget word)
Gets all hypernyms of a word.

Parameters:
word - A pointer target entry in WordNet, which may be a Word or a Synset (for a Synset, its first element is used).
Returns:
The set of Words that are hypernyms of word.

getHypernymStrings

public java.lang.String[] getHypernymStrings(java.lang.String word)
Looks up word and gets all hypernyms for all senses of a word.

Parameters:
word - Any string, assumed to be a noun.
Returns:
The hypernyms of word.

getHypernyms

public java.util.Set<Word> getHypernyms(IndexWord word)
Gets hypernyms of all senses of a word. Does not include top level entries such as "entity", "abstraction", "physical entity", "object", "whole", "artifact", or "group", since practically every entry would have one of these as a hypernym.

Parameters:
word - An entry in WordNet.
Returns:
The set of Words that are hypernyms of word.
See Also:
getAllHypernyms(IndexWord)

getWord

private Word getWord(PointerTarget p)
Translates a pointer target to a Word. If p is a Synset, the first entry is returned.

Parameters:
p - A pointer target representing a word or synset.
Returns:
The word, the first word of a synset, or null.

getIndexWord

private IndexWord getIndexWord(java.lang.String word)
Looks up the IndexWord of a string in WordNet. The string is assumed to be a noun (or noun phrase), but if no such entry is found, it is assumed to be an adjective.

Parameters:
word - The word or phrase to be looked up.
Returns:
The IndexWord entry, or null.

getIndexNoun

private IndexWord getIndexNoun(java.lang.String word)
Looks up the IndexWord of a string in WordNet. The string is assumed to be a noun (or noun phrase).

Parameters:
word - The word or phrase to be looked up.
Returns:
The IndexWord entry, or null.

getIndexWord

private IndexWord getIndexWord(java.lang.String word,
                               java.lang.String pos)
Looks up the IndexWord of a string in WordNet. If the word is not found, the lookup backs off by appending "s" or "es", and finally by calling the lookupBaseForm method (whose result is accepted only if its length is at least the length of word minus 2). Results are cached by word alone, ignoring the POS, so a given word string always returns the same IndexWord, even if pos differs.

Parameters:
word - The word or phrase to be looked up.
pos - The part of speech of the desired entry, "NOUN" or "ADJ".
Returns:
The IndexWord entry, or null.
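
The back-off order described above can be sketched with a toy dictionary. This is a simplified illustration, not the class's code: entries are plain strings rather than IndexWords, the baseForm function is a hypothetical stand-in for lookupBaseForm, and caching of failed lookups is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch: cached lookup with "s"/"es" and base-form back-off. */
class BackoffLookupSketch {
    private final Set<String> dictionary;            // stand-in for WordNet entries
    private final Function<String, String> baseForm; // stand-in for lookupBaseForm
    private final Map<String, String> cache = new HashMap<>();

    BackoffLookupSketch(Set<String> dictionary, Function<String, String> baseForm) {
        this.dictionary = dictionary;
        this.baseForm = baseForm;
    }

    /** Returns the matching entry or null; keyed by word only, as in the description. */
    String lookup(String word) {
        if (cache.containsKey(word)) {
            return cache.get(word);
        }
        String found = lookupUncached(word);
        cache.put(word, found); // assumption: failed lookups (null) are cached too
        return found;
    }

    private String lookupUncached(String word) {
        if (dictionary.contains(word)) {
            return word;
        }
        if (dictionary.contains(word + "s")) {
            return word + "s";
        }
        if (dictionary.contains(word + "es")) {
            return word + "es";
        }
        // Accept the base form only if it is at most two characters shorter.
        String base = baseForm.apply(word);
        if (base != null && base.length() >= word.length() - 2) {
            return base;
        }
        return null;
    }
}
```

With a dictionary containing "glasses" and "dog", a lookup of "glass" backs off to "glasses". A base form of "dog" for "dogs" is accepted, while a base form of "run" for "running" is rejected because it is more than two characters shorter.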

lookupIndexNounSafe

protected IndexWord lookupIndexNounSafe(java.lang.String word)
Looks up the IndexWord of a string in WordNet. The string is assumed to be a noun (or noun phrase). If the entry is not found, a shorter entry will NOT be returned.

Parameters:
word - The word or phrase to be looked up.
Returns:
The IndexWord entry, or null.

lookupIndexWordSafe

protected IndexWord lookupIndexWordSafe(java.lang.String word,
                                        java.lang.String pos)
Looks up the IndexWord of a string in WordNet. This method is not cached, so each call performs a fresh lookup for the given word and pos.

Parameters:
word - The word or phrase to be looked up. If it is punctuation or a number, it is ignored.
pos - The part of speech of the desired entry, either "NOUN" or "ADJ".
Returns:
The IndexWord entry, or null.

hasLetters

private boolean hasLetters(java.lang.String phrase)
Determines whether a phrase has any letters.

Parameters:
phrase - any text.
Returns:
Whether the phrase contains any letters.
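
One plausible way to write such a check is shown below. It is a sketch that assumes "letter" means whatever Character.isLetter accepts, which may not match the private method's actual rule.

```java
/** Hypothetical sketch: tests whether a phrase contains at least one letter. */
class HasLettersSketch {
    static boolean hasLetters(String phrase) {
        for (int i = 0; i < phrase.length(); i++) {
            if (Character.isLetter(phrase.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

Under this rule, "4x4" counts as having letters, while "123 !?" and the empty string do not.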

getSenseWords

public java.util.List<Word> getSenseWords(IndexWord word)
Gets the Word for each sense of word.

Parameters:
word - The desired entry.
Returns:
The senses of word as Words.

overlap

private <T> boolean overlap(java.util.Set<T> a,
                            java.util.Collection<T> b)

join

private java.lang.String join(java.lang.String a,
                              java.lang.String b)

printSynsets

public void printSynsets(java.lang.String word)
Writes the synsets of word to System.out.

Parameters:
word - The noun or noun phrase.

printSynsets

public void printSynsets(IndexWord word)
Writes the synsets of word to System.out.

Parameters:
word - The word or phrase.

writeObject

private void writeObject(java.io.ObjectOutputStream out)
                  throws java.io.IOException
Throws:
java.io.IOException

readObject

private void readObject(java.io.ObjectInputStream in)
                 throws java.io.IOException,
                        java.lang.ClassNotFoundException
Throws:
java.io.IOException
java.lang.ClassNotFoundException

readExternal

public void readExternal(java.io.ObjectInput in)
Specified by:
readExternal in interface java.io.Externalizable

writeExternal

public void writeExternal(java.io.ObjectOutput out)
Writes the object using the externalization protocol.

Specified by:
writeExternal in interface java.io.Externalizable
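
For readers unfamiliar with the externalization protocol, here is a minimal sketch of a class that persists one cache and rebuilds another after loading, in the spirit of the transient and non-transient fields listed under Field Detail. The class name, field names, and the choice of what to write are hypothetical; the actual writeExternal and readExternal bodies are not documented here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: persists one cache, rebuilds the other after loading. */
class CacheHolderSketch implements Externalizable {
    private static final long serialVersionUID = 1L;

    // Persisted: answers that were expensive to compute.
    Map<String, Boolean> pairCache = new HashMap<>();
    // Not persisted. The transient keyword is documentary here, because
    // writeExternal alone decides what is written.
    transient Map<String, Object> wordCache = new HashMap<>();

    /** Externalizable requires a public no-argument constructor. */
    public CacheHolderSketch() {
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeObject(pairCache);
    }

    @Override
    @SuppressWarnings("unchecked")
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        pairCache = (Map<String, Boolean>) in.readObject();
        wordCache = new HashMap<>(); // start empty after loading
    }

    /** Serializes to memory and reads back, to show what survives. */
    static CacheHolderSketch roundTrip(CacheHolderSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CacheHolderSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After a round trip, entries placed in pairCache are still present, while wordCache comes back empty.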

loadWN

private static WordNetTools loadWN()
Loads the WordNetTools singleton from its serialization file.

saveWN

public static void saveWN()
Saves the WordNetTools with its associated caches.


getFQDSRName

protected static java.lang.String getFQDSRName()
Determines the location where the serialization file should be loaded or saved.

Returns:
The fully qualified filename of the serialization file.
See Also:
loadWN(), saveWN()

getWN

public static WordNetTools getWN()
Gets the singleton WordNetTools, with its associated caches.

Returns:
The WordNetTools object.
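
The interplay of getWN(), loadWN(), and saveWN() suggests a lazily created singleton backed by a serialization file. The sketch below shows that general pattern under stated assumptions: the class name, the temporary-directory file location, the reset hook, and the fallback to an empty instance when loading fails are all hypothetical, and the real class may behave differently.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: lazily loaded singleton backed by a serialization file. */
class SingletonStoreSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private static SingletonStoreSketch instance;
    private static File location =
            new File(System.getProperty("java.io.tmpdir"), "singleton-store-sketch.ser");

    final Map<String, Boolean> cache = new HashMap<>();

    /** Returns the singleton, loading it from disk on first use if possible. */
    static synchronized SingletonStoreSketch get() {
        if (instance == null) {
            instance = load();
            if (instance == null) {
                instance = new SingletonStoreSketch(); // assumption: start with empty caches
            }
        }
        return instance;
    }

    private static SingletonStoreSketch load() {
        if (!location.isFile()) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(location))) {
            return (SingletonStoreSketch) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            return null; // assumption: an unreadable file is treated as absent
        }
    }

    /** Writes the singleton, with its caches, to the serialization file. */
    static synchronized void save() {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(location))) {
            out.writeObject(get());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo hook: forgets the in-memory instance and points at another file. */
    static synchronized void reset(File newLocation) {
        instance = null;
        location = newLocation;
    }
}
```

In this pattern, a caller obtains the instance with get(), lets the caches fill during use, and calls save() before exiting so that the next run starts with the stored answers.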