WordExample

edu.illinois.cs.cogcomp.lbj.coref.ir.examples
Class WordExample

java.lang.Object
  edu.illinois.cs.cogcomp.lbj.coref.ir.examples.Example
      edu.illinois.cs.cogcomp.lbj.coref.ir.examples.WordExample

Direct Known Subclasses: BIOExample, ExtendHeadExample

public class WordExample
extends Example

Represents a word in a document, for use in classification or classifier training, along with utility methods for getting information and features pertaining to the word.

Field Summary
`protected Doc`	`m_doc` The document containing the example word.
`protected int`	`m_numWords` The number of words in the document.
`protected int`	`m_wordN` The (zero-based) position of the example word in the document.

Constructor Summary
`WordExample()` Constructs an example that does not refer to any word.
`WordExample(Doc doc, int wordNum)` Constructs a word example from a given document and word number.

Method Summary
`java.lang.String`	`getCasedWord()` Gets the word without altering its original case. Note that no extra casing is applied here (although it may be applied at the Doc parsing stage).
`java.lang.String`	`getCasedWord(int offset)` Gets the word at the word number plus the offset, without altering its original case, or the empty string if no such word exists. Note that no extra casing is applied here (although it may be applied at the Doc parsing stage).
`Doc`	`getDoc()`
`java.lang.String`	`getPOS()` Gets the part-of-speech (POS) tag of the word represented by this example.
`java.lang.String`	`getPOS(int offset)` Gets the part-of-speech (POS) tag of the word at the word number plus the offset.
`java.lang.String`	`getWord()` Gets a lowercase version of the word.
`java.lang.String`	`getWord(int offset)` Gets a lowercase version of the word at the word number plus the offset, or the empty string if no such word exists.
`java.lang.String`	`getWordCase()` Gets a description of the case of the word.
`java.lang.String`	`getWordCase(int offset)` Gets a description of the case of the word at the word number plus the offset.
`java.lang.String`	`getWordCase(java.lang.String word)` Gets a description of the case of the specified word.
`int`	`getWordNum()`

Methods inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.examples.Example
`getLabel`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

m_doc

protected Doc m_doc

The document containing the example word.

m_wordN

protected int m_wordN

The (zero-based) position of the example word in the document.

m_numWords

protected int m_numWords

The number of words in the document.

Constructor Detail

WordExample

public WordExample()

Constructs an example that does not refer to any word. Not recommended unless the word will be set afterward.

WordExample

public WordExample(Doc doc,
                   int wordNum)

Constructs a word example from a given document and word number.

Parameters:
    doc - The document containing the example word.
    wordNum - The (zero-based) position of the word in the document.

Method Detail

getDoc

public Doc getDoc()

getWordNum

public int getWordNum()

getWord

public java.lang.String getWord()

Gets a lowercase version of the word.

Returns: The word, in lowercase.

getWord

public java.lang.String getWord(int offset)

Gets a lowercase version of the word at the word number plus the offset, or the empty string if no such word exists.

Parameters: offset - The position of the desired word relative to the word number.
Returns: The word, in lowercase, or the empty string if no such word exists.
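The offset lookup described above can be illustrated with a minimal sketch. This is not the library's code: `OffsetLookupSketch` is a hypothetical class, and a plain `List<String>` stands in for `Doc`, whose API is not shown on this page. It assumes that "no such word" means the index `wordN + offset` falls outside the document.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for the offset lookup; not the library's implementation. */
class OffsetLookupSketch {
    private final List<String> words; // stands in for the words of a Doc (assumption)
    private final int wordN;          // zero-based position, analogous to m_wordN

    OffsetLookupSketch(List<String> words, int wordN) {
        this.words = words;
        this.wordN = wordN;
    }

    /** Returns the word at wordN + offset with its case unchanged, or "" if out of range. */
    String getCasedWord(int offset) {
        int i = wordN + offset;
        if (i < 0 || i >= words.size()) {
            return "";
        }
        return words.get(i);
    }

    /** Lowercase variant of getCasedWord(offset). */
    String getWord(int offset) {
        return getCasedWord(offset).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        OffsetLookupSketch ex =
            new OffsetLookupSketch(Arrays.asList("Mr.", "Smith", "left"), 1);
        System.out.println(ex.getWord(0));            // smith
        System.out.println(ex.getWord(-1));           // mr.
        System.out.println(ex.getWord(5).isEmpty());  // true
    }
}
```

Returning the empty string instead of throwing lets feature extractors ask for neighbors such as `getWord(-1)` or `getWord(1)` at document boundaries without special-casing the first and last words.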

getCasedWord

public java.lang.String getCasedWord()

Gets the word without altering its original case. Note that no extra casing is applied here (although it may be applied at the Doc parsing stage).

Returns: The word (without altering its case).

getCasedWord

public java.lang.String getCasedWord(int offset)

Gets the word at the word number plus the offset, without altering its original case. Note that no extra casing is applied here (although it may be applied at the Doc parsing stage).

Parameters: offset - The position of the desired word relative to the word number.
Returns: The word (without altering its case), or the empty string if no such word exists.

getPOS

public java.lang.String getPOS()

Gets the part-of-speech (POS) tag of the word represented by this example.

Returns: The part-of-speech tag of the word.
See Also: Doc.getPOS(int)

getPOS

public java.lang.String getPOS(int offset)

Gets the part-of-speech (POS) tag of the word at the word number plus the offset.

Parameters: offset - The position, relative to the word number, of the word whose POS tag is desired.
Returns: The part-of-speech tag of the specified word.
See Also: Doc.getPOS(int)

getWordCase

public java.lang.String getWordCase()

Gets a description of the case of the word.

Returns: One of "allLower", "firstCap", "allCaps", "multiCase", "digit", "punc", or "other". A single-character uppercase word is "firstCap". Words beginning with a digit are "digit"; this implies that words containing internal digits can still be "allLower" or "allCaps". Words beginning with punctuation are "punc"; word-internal punctuation does not make a word "punc". Zero-length words and words beginning with whitespace are "other". "multiCase" means mixed case; the name "mixed*" is disallowed as a feature.

getWordCase

public java.lang.String getWordCase(int offset)

Gets a description of the case of the word at the word number plus the offset.

Parameters: offset - The position of the specified word relative to the word number.
Returns: One of "allLower", "firstCap", "allCaps", "multiCase", "digit", "punc", or "other". A single-character uppercase word is "firstCap". Words beginning with a digit are "digit"; this implies that words containing internal digits can still be "allLower" or "allCaps". Words beginning with punctuation are "punc"; word-internal punctuation does not make a word "punc". Zero-length words and words beginning with whitespace are "other". "multiCase" means mixed case; the name "mixed*" is disallowed as a feature.

getWordCase

public java.lang.String getWordCase(java.lang.String word)

Gets a description of the case of the specified word.

Parameters: word - The specified word.
Returns: One of "allLower", "firstCap", "allCaps", "multiCase", "digit", "punc", or "other". A single-character uppercase word is "firstCap". Words beginning with a digit are "digit"; this implies that words containing internal digits can still be "allLower" or "allCaps". Words beginning with punctuation are "punc"; word-internal punctuation does not make a word "punc". Zero-length words and words beginning with whitespace are "other". "multiCase" means mixed case; the name "mixed*" is disallowed as a feature.
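The classification rules listed for getWordCase can be sketched as a standalone function. This is an illustration under stated assumptions, not the library's implementation: `WordCaseSketch` and `describeCase` are hypothetical names, any leading character that is not a letter, digit, or whitespace is treated as punctuation, a null word is treated like a zero-length word, and only letters are considered when deciding between the case labels.

```java
/** Hypothetical sketch of the documented case rules; not the library's code. */
class WordCaseSketch {
    static String describeCase(String word) {
        if (word == null || word.isEmpty()) return "other";  // zero-length words (null: assumption)
        char first = word.charAt(0);
        if (Character.isWhitespace(first)) return "other";   // leading whitespace
        if (Character.isDigit(first)) return "digit";        // leading digit
        if (!Character.isLetter(first)) return "punc";       // assumption: anything else is punctuation

        boolean anyUpper = false, anyLower = false, laterUpper = false;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isUpperCase(c)) {
                anyUpper = true;
                if (i > 0) laterUpper = true;
            } else if (Character.isLowerCase(c)) {
                anyLower = true;
            }
            // Internal digits and punctuation are ignored, so "b2b" stays "allLower".
        }
        if (anyUpper && !anyLower) {
            // A single-character uppercase word is "firstCap", not "allCaps".
            return word.length() == 1 ? "firstCap" : "allCaps";
        }
        if (anyLower && !anyUpper) return "allLower";
        if (anyUpper && anyLower) {
            return (Character.isUpperCase(first) && !laterUpper) ? "firstCap" : "multiCase";
        }
        return "other"; // letters with no case distinction (assumption)
    }

    public static void main(String[] args) {
        System.out.println(describeCase("Hello"));   // firstCap
        System.out.println(describeCase("USA"));     // allCaps
        System.out.println(describeCase("iPhone"));  // multiCase
        System.out.println(describeCase("3rd"));     // digit
    }
}
```

The order of the checks matters: the first-character tests for whitespace, digits, and punctuation run before any letter-case analysis, which is what allows a word such as "B2B" to be labeled by its letters alone.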
