edu.illinois.cs.cogcomp.lbj.coref.ir.examples
Class WordExample

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.ir.examples.Example
      extended by edu.illinois.cs.cogcomp.lbj.coref.ir.examples.WordExample
Direct Known Subclasses:
BIOExample, ExtendHeadExample

public class WordExample
extends Example

Represents a word in a document, for use in classification or classifier training, and provides utility methods for getting information and features pertaining to the word.


Field Summary
protected  Doc m_doc
          The document containing the example word.
protected  int m_numWords
          The number of words in the document.
protected  int m_wordN
          The (zero-based) position of the example word in the document.
 
Constructor Summary
WordExample()
          Constructs an example that does not refer to any word.
WordExample(Doc doc, int wordNum)
          Constructs a word example from a given document and word number.
 
Method Summary
 java.lang.String getCasedWord()
          Gets the word represented by this example without altering its original case. The name does not imply that this method applies any extra casing, although casing may be applied at the Doc parsing stage.
 java.lang.String getCasedWord(int offset)
          Gets the word at the word number plus the offset, without altering its original case. The name does not imply that this method applies any extra casing, although casing may be applied at the Doc parsing stage.
 Doc getDoc()
           
 java.lang.String getPOS()
          Gets the part-of-speech (POS) tag of the word represented by this example.
 java.lang.String getPOS(int offset)
          Gets the part-of-speech (POS) tag of the word at the word number plus the offset.
 java.lang.String getWord()
          Gets a lowercase version of the word.
 java.lang.String getWord(int offset)
          Gets a lowercase version of the word at the word number plus the offset, or the empty string if no such word exists.
 java.lang.String getWordCase()
          Gets a description of the case of the word.
 java.lang.String getWordCase(int offset)
          Gets a description of the case of the word at the word number plus the offset.
 java.lang.String getWordCase(java.lang.String word)
          Gets a description of the case of the specified word.
 int getWordNum()
           
 
Methods inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.examples.Example
getLabel
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_doc

protected Doc m_doc
The document containing the example word.


m_wordN

protected int m_wordN
The (zero-based) position of the example word in the document.


m_numWords

protected int m_numWords
The number of words in the document.

Constructor Detail

WordExample

public WordExample()
Constructs an example that does not refer to any word. Not recommended unless the word will be set afterwards.


WordExample

public WordExample(Doc doc,
                   int wordNum)
Constructs a word example from a given document and word number.

Parameters:
doc - The document containing the example word.
wordNum - The (zero-based) position of the word in the document.
Method Detail

getDoc

public Doc getDoc()

getWordNum

public int getWordNum()

getWord

public java.lang.String getWord()
Gets a lowercase version of the word.

Returns:
the word, in lowercase.

getWord

public java.lang.String getWord(int offset)
Gets a lowercase version of the word at the word number plus the offset, or the empty string if no such word exists.

Parameters:
offset - The position relative to the word number of the desired word.
Returns:
the word, in lowercase, or the empty string if no such word exists.
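
The offset behavior described above can be illustrated with a minimal, self-contained sketch. It does not use the library: `OffsetLookupSketch`, `wordAt`, and `casedWordAt` are hypothetical names, a plain `List<String>` stands in for the words of a `Doc`, and lowercasing with `Locale.ROOT` is an assumption rather than the documented behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for illustration only; not the library's WordExample. */
final class OffsetLookupSketch {
    private final List<String> words; // stands in for the words of a Doc
    private final int wordN;          // zero-based position of the example word

    OffsetLookupSketch(List<String> words, int wordN) {
        this.words = words;
        this.wordN = wordN;
    }

    /** Word at wordN + offset in its original case, or "" if no such word exists. */
    String casedWordAt(int offset) {
        int i = wordN + offset;
        if (i < 0 || i >= words.size()) {
            return "";
        }
        return words.get(i);
    }

    /** Lowercase version of the word at wordN + offset, or "" if no such word exists. */
    String wordAt(int offset) {
        // Assumption: locale-independent lowercasing.
        return casedWordAt(offset).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        OffsetLookupSketch ex =
            new OffsetLookupSketch(Arrays.asList("The", "Quick", "fox"), 1);
        System.out.println(ex.wordAt(0));           // quick
        System.out.println(ex.wordAt(-1));          // the
        System.out.println(ex.wordAt(2).isEmpty()); // true (past the end)
    }
}
```

Returning the empty string for out-of-range positions lets feature code read a context window around the word (for example offsets -2 through 2) without bounds checks at document edges.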

getCasedWord

public java.lang.String getCasedWord()
Gets the word represented by this example without altering its original case. The name does not imply that this method applies any extra casing, although casing may be applied at the Doc parsing stage.

Returns:
The word (without altering its case).

getCasedWord

public java.lang.String getCasedWord(int offset)
Gets the word at the word number plus the offset, without altering its original case. The name does not imply that this method applies any extra casing, although casing may be applied at the Doc parsing stage.

Parameters:
offset - The position relative to the word number of the desired word.
Returns:
The word (without altering its case), or the empty string if no such word exists.

getPOS

public java.lang.String getPOS()
Gets the part-of-speech (POS) tag of the word represented by this example.

Returns:
The part-of-speech tag of the word.
See Also:
Doc.getPOS(int)

getPOS

public java.lang.String getPOS(int offset)
Gets the part-of-speech (POS) tag of the word at the word number plus the offset.

Parameters:
offset - The position relative to the word number of the desired word's POS tag.
Returns:
The part-of-speech tag of the specified word.
See Also:
Doc.getPOS(int)

getWordCase

public java.lang.String getWordCase()
Gets a description of the case of the word.

Returns:
One of "allLower", "firstCap", "allCaps", "multiCase", "digit", "punc", or "other". A single-character uppercase word is "firstCap". Words beginning with a digit are "digit"; words whose digits are all internal can therefore still be "allLower" or "allCaps". Words beginning with punctuation are "punc"; word-internal punctuation does not make a word "punc". Zero-length words and words beginning with whitespace are "other". "multiCase" means mixed case ("mixed*" is disallowed as a feature).

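The return values listed above can be read as an ordered set of checks. The following is a minimal sketch of one way to apply them, not the library's implementation: `WordCaseSketch` and `describeCase` are hypothetical names, and several details are assumptions, namely that "punctuation" means any leading character that is not a letter, digit, or whitespace, that only letters count toward the case decision, and that a word with no cased letters is "other".

```java
/** Hypothetical re-implementation of the documented rules, for illustration only. */
final class WordCaseSketch {
    static String describeCase(String word) {
        if (word.isEmpty()) {
            return "other"; // zero-length words
        }
        char first = word.charAt(0);
        if (Character.isWhitespace(first)) {
            return "other";
        }
        if (Character.isDigit(first)) {
            return "digit";
        }
        if (!Character.isLetter(first)) {
            return "punc"; // assumption: any other non-letter start counts as punctuation
        }
        // Count cased letters; internal digits and punctuation are ignored.
        int upper = 0;
        int lower = 0;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isUpperCase(c)) {
                upper++;
            } else if (Character.isLowerCase(c)) {
                lower++;
            }
        }
        if (upper == 0 && lower == 0) {
            return "other"; // assumption: letters without case
        }
        if (upper == 0) {
            return "allLower";
        }
        if (lower == 0) {
            // A single-character uppercase word is "firstCap", not "allCaps".
            return word.length() == 1 ? "firstCap" : "allCaps";
        }
        if (upper == 1 && Character.isUpperCase(first)) {
            return "firstCap";
        }
        return "multiCase";
    }

    public static void main(String[] args) {
        for (String w : new String[] {"hello", "Hello", "NASA", "iPhone", "3rd", "-x", "A", ""}) {
            System.out.println("[" + w + "] -> " + describeCase(w));
        }
    }
}
```

The order of the checks matters: the leading-character tests run first, so a word such as "3rd" is "digit" even though its letters are all lowercase, while "b2b" falls through to the letter counts and is "allLower".
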
getWordCase

public java.lang.String getWordCase(int offset)
Gets a description of the case of the word at the word number plus the offset.

Parameters:
offset - The position relative to the word number of the specified word.
Returns:
One of "allLower", "firstCap", "allCaps", "multiCase", "digit", "punc", or "other". A single-character uppercase word is "firstCap". Words beginning with a digit are "digit"; words whose digits are all internal can therefore still be "allLower" or "allCaps". Words beginning with punctuation are "punc"; word-internal punctuation does not make a word "punc". Zero-length words and words beginning with whitespace are "other". "multiCase" means mixed case ("mixed*" is disallowed as a feature).

getWordCase

public java.lang.String getWordCase(java.lang.String word)
Gets a description of the case of the specified word.

Parameters:
word - The specified word.
Returns:
One of "allLower", "firstCap", "allCaps", "multiCase", "digit", "punc", or "other". A single-character uppercase word is "firstCap". Words beginning with a digit are "digit"; words whose digits are all internal can therefore still be "allLower" or "allCaps". Words beginning with punctuation are "punc"; word-internal punctuation does not make a word "punc". Zero-length words and words beginning with whitespace are "other". "multiCase" means mixed case ("mixed*" is disallowed as a feature).