edu.illinois.cs.cogcomp.lbj.coref.ir.docs
Class DocPlainText

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
      extended by edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocPlainText
All Implemented Interfaces:
Doc, java.io.Serializable

public class DocPlainText
extends DocBase
implements Doc

Represents a Doc constructed from plain text.

To load a document from a string, construct it with the no-argument constructor and then call loadFromPlainText(java.lang.String). To load the document including mention detection, see DocFromTextLoader.

To load a document given the name of a plain text file, see DocPlainText(String). To load the document including mention detection, see DocPlainTextLoader.

Author:
Eric Bengtson
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
DocBase.PosSource
 
Field Summary
private static long serialVersionUID
           
 
Fields inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
goodEnds, goodStarts, m_annotationAuthor, m_baseFN, m_bNeedsCasing, m_caser, m_dateTime, m_docID, m_docType, m_encoding, m_headline, m_slug, m_source, m_text, m_version, medEnds, totalMentions
 
Constructor Summary
DocPlainText()
          Constructs an empty document.
DocPlainText(java.lang.String filename)
          Constructs a document using the specified plain text file.
 
Method Summary
 void loadFromFilename(java.lang.String filename)
          Builds this document from the specified plain text file.
 void loadFromPlainText(java.lang.String text)
          Builds the document from the given plain text, automatically splitting sentences, determining quote levels, determining part-of-speech tags, and splitting words by an automatic word-splitting algorithm.
 void loadFromPlainText(java.lang.String text, boolean doWordSplit)
          Builds the document from the given plain text, automatically splitting sentences, determining quote levels, determining part-of-speech tags, and either splitting words by whitespace or using a word-splitter.
 void write(java.lang.String filename, boolean usePredictions)
          Writes this Doc in the appropriate format.
 
Methods inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
addHeadPrediction, addPredEntities, addRelation, addTrueEntity, addTrueMention, alignPredMentsToTrue, buildMentionsContaining, buildMentionsInSents, calcAndSetQuotes, getBestMentionFor, getCExampleFor, getCoherenceInfo, getCoherenceInfo, getCorefChains, getDocID, getEntities, getEntityFor, getEntityFor, getEntityFor, getGExampleFor, getHeadPrediction, getInCorpusInverseFreq, getInDocInverseFreq, getInverseTrueHeadFreq, getInverseTrueHeadFreq, getMention, getMentions, getMentionsContainedIn, getMentionsContaining, getMentionsInSent, getMentionsInSentences, getMentionsWithExtentStartingAt, getMentionsWithHeadStartingAt, getNumMentions, getNumRelations, getNumSentences, getPlainText, getPOS, getPOS, getPredEntities, getPredMention, getPredMentions, getQuoteNestLevel, getRelation, getSentNum, getShortEID, getStartCharNum, getTextFirstWordNum, getTrueEntities, getTrueMention, getTrueMentionFor, getTrueMentions, getWholeDocCounts, getWord, getWordNum, getWords, hasHeadPrediction, hasPredEntities, hasPredMentions, hasTrueEntities, hasTrueMentions, initMembersDefault, isCaseSensitive, loadChunkedText, loadFromText, loadFromText, loadPOSTaggerOutput, loadPOSTags, loadSGMText, makeBestMentionMap, makeChunk, printChunkValidity, recordWordLocation, removeTagsAndExtraNL, repeat, save, setCorpusCounts, setPlainText, setPOSTags, setPredEntities, setPredictedMentions, setQuoteLevels, setSentenceNumbers, setUsePredictedEntities, setUsePredictedMentions, setWords, setWords, sortEntitiesByListOrder, sortPredictedMentions, sortTrueMentions, toAnnotatedString, toAnnotatedString, toCoherenceTableString, toCoherenceTableString, toString, toSubstituteString, translateEscaped, usePredictedEntities, usePredictedMentions, write
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface edu.illinois.cs.cogcomp.lbj.coref.ir.docs.Doc
getBestMentionFor, getCExampleFor, getCoherenceInfo, getCoherenceInfo, getCorefChains, getDocID, getEntities, getEntityFor, getEntityFor, getGExampleFor, getInCorpusInverseFreq, getInDocInverseFreq, getInverseTrueHeadFreq, getInverseTrueHeadFreq, getMentions, getMentionsContainedIn, getMentionsContaining, getMentionsInSent, getMentionsInSentences, getMentionsWithExtentStartingAt, getMentionsWithHeadStartingAt, getNumRelations, getNumSentences, getPlainText, getPOS, getPOS, getPredEntities, getPredMentions, getQuoteNestLevel, getRelation, getSentNum, getStartCharNum, getTextFirstWordNum, getTrueEntities, getTrueMentionFor, getTrueMentions, getWholeDocCounts, getWord, getWordNum, getWords, hasPredEntities, hasPredMentions, hasTrueEntities, hasTrueMentions, isCaseSensitive, makeChunk, save, setCorpusCounts, setPredEntities, setPredictedMentions, setUsePredictedEntities, setUsePredictedMentions, toAnnotatedString, toAnnotatedString, toCoherenceTableString, toCoherenceTableString, toSubstituteString, usePredictedEntities, usePredictedMentions, write
 

Field Detail

serialVersionUID

private static final long serialVersionUID
See Also:
Constant Field Values
Constructor Detail

DocPlainText

public DocPlainText()
Constructs an empty document. To build a document from a text string, use this constructor and then call loadFromPlainText(java.lang.String).


DocPlainText

public DocPlainText(java.lang.String filename)
Constructs a document using the specified plain text file. Automatically splits sentences, determines quote levels, determines part-of-speech tags, and splits words using an automatic word-splitting algorithm. Mentions and entities will not be set here.

Parameters:
filename - The name of a file containing plain text.
Method Detail

loadFromFilename

public void loadFromFilename(java.lang.String filename)
Builds this document from the specified plain text file. Automatically splits sentences, determines quote levels, determines part-of-speech tags, and splits words using an automatic word-splitting algorithm. Mentions and entities will not be set here.

Parameters:
filename - The name of a file containing plain text.
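
When the text needs inspection or preprocessing before loading, one option is to read the file yourself and pass the result to loadFromPlainText(java.lang.String). The sketch below uses only the Java standard library. The class name PlainTextFileSketch, the helper readPlainText, and the choice of UTF-8 are assumptions made for illustration; this page does not state which encoding loadFromFilename uses. The DocPlainText calls appear only as comments because they need the library on the classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of the library. */
class PlainTextFileSketch {

    /** Reads the whole file into a String, assuming UTF-8 (an assumption). */
    static String readPlainText(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary file so the example runs on its own.
        Path tmp = Files.createTempFile("doc", ".txt");
        Files.write(tmp, "John met Mary. He greeted her.".getBytes(StandardCharsets.UTF_8));

        String text = readPlainText(tmp);

        // With the library available, the text could then be loaded using
        // the two-step pattern described in the class overview:
        //   DocPlainText doc = new DocPlainText();
        //   doc.loadFromPlainText(text);
        System.out.println(text);

        Files.delete(tmp);
    }
}
```

If no preprocessing is needed, calling loadFromFilename or the DocPlainText(String) constructor directly is simpler.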

loadFromPlainText

public void loadFromPlainText(java.lang.String text)
Builds the document from the given plain text, automatically splitting sentences, determining quote levels, determining part-of-speech tags, and splitting words by an automatic word-splitting algorithm. Mentions and entities will not be set here.

Parameters:
text - The text of the document.

loadFromPlainText

public void loadFromPlainText(java.lang.String text,
                              boolean doWordSplit)
Builds the document from the given plain text, automatically splitting sentences, determining quote levels, determining part-of-speech tags, and either splitting words by whitespace or using a word-splitter. Mentions and entities will not be set here.

Parameters:
text - The text of the document.
doWordSplit - If true, words will be split by an automatic word-splitting algorithm; otherwise words will be assumed to be separated by whitespace.
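
To make the effect of doWordSplit concrete, the following sketch contrasts plain whitespace splitting with a simple punctuation-aware splitter. It is not the word-splitting algorithm used by this class, which this page does not describe. The class WordSplitSketch, both method names, and the token pattern are hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: NOT the word splitter used by DocPlainText. */
class WordSplitSketch {

    // Hypothetical token pattern: a run of letters or digits,
    // or any single character that is neither whitespace nor a letter/digit.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    /** Roughly what doWordSplit == false implies: words are whitespace-separated. */
    static List<String> splitOnWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    /** A stand-in for an automatic splitter: punctuation becomes its own token. */
    static List<String> splitWithPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello, world.";
        System.out.println(splitOnWhitespace(text));
        System.out.println(splitWithPunctuation(text));
    }
}
```

For the input "Hello, world.", whitespace splitting yields two tokens with punctuation attached ("Hello," and "world."), while the punctuation-aware stand-in yields four ("Hello", ",", "world", "."). Passing false for doWordSplit is therefore most appropriate when the input is already tokenized, with tokens separated by whitespace.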

write

public void write(java.lang.String filename,
                  boolean usePredictions)
Description copied from interface: Doc
Writes this Doc in the appropriate format.

Specified by:
write in interface Doc
Specified by:
write in class DocBase
Parameters:
filename - The name of the target file.
usePredictions - Whether predicted mentions and entities should be written.