edu.illinois.cs.cogcomp.lbj.coref.ir.docs
Class DocPlainText
java.lang.Object
edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocPlainText
- All Implemented Interfaces:
- Doc, java.io.Serializable
public class DocPlainText
- extends DocBase
- implements Doc
Represents a Doc constructed from plain text.
To load a document from a string, construct using the no-arg constructor
and then call loadFromPlainText(java.lang.String).
To load the document including mention detection, see
DocFromTextLoader.
To load a document given the name of a plain text file,
see DocPlainText(String).
To load the document including mention detection, see
DocPlainTextLoader.
- Author:
- Eric Bengtson
- See Also:
- Serialized Form
Fields inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase |
goodEnds, goodStarts, m_annotationAuthor, m_baseFN, m_bNeedsCasing, m_caser, m_dateTime, m_docID, m_docType, m_encoding, m_headline, m_slug, m_source, m_text, m_version, medEnds, totalMentions |
Constructor Summary |
DocPlainText()
Constructs an empty document. |
DocPlainText(java.lang.String filename)
Constructs a document using the specified plain text file. |
Method Summary |
void |
loadFromFilename(java.lang.String filename)
Builds this document from the specified plain text file. |
void |
loadFromPlainText(java.lang.String text)
Builds the document from the given plain text,
automatically splitting sentences, determining quote levels,
determining part-of-speech tags, and splitting words by
an automatic word-splitting algorithm. |
void |
loadFromPlainText(java.lang.String text,
boolean doWordSplit)
Builds the document from the given plain text,
automatically splitting sentences, determining quote levels,
determining part-of-speech tags, and either splitting words
by whitespace or using a word-splitter. |
void |
write(java.lang.String filename,
boolean usePredictions)
Writes this Doc in the appropriate format. |
Methods inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase |
addHeadPrediction, addPredEntities, addRelation, addTrueEntity, addTrueMention, alignPredMentsToTrue, buildMentionsContaining, buildMentionsInSents, calcAndSetQuotes, getBestMentionFor, getCExampleFor, getCoherenceInfo, getCoherenceInfo, getCorefChains, getDocID, getEntities, getEntityFor, getEntityFor, getEntityFor, getGExampleFor, getHeadPrediction, getInCorpusInverseFreq, getInDocInverseFreq, getInverseTrueHeadFreq, getInverseTrueHeadFreq, getMention, getMentions, getMentionsContainedIn, getMentionsContaining, getMentionsInSent, getMentionsInSentences, getMentionsWithExtentStartingAt, getMentionsWithHeadStartingAt, getNumMentions, getNumRelations, getNumSentences, getPlainText, getPOS, getPOS, getPredEntities, getPredMention, getPredMentions, getQuoteNestLevel, getRelation, getSentNum, getShortEID, getStartCharNum, getTextFirstWordNum, getTrueEntities, getTrueMention, getTrueMentionFor, getTrueMentions, getWholeDocCounts, getWord, getWordNum, getWords, hasHeadPrediction, hasPredEntities, hasPredMentions, hasTrueEntities, hasTrueMentions, initMembersDefault, isCaseSensitive, loadChunkedText, loadFromText, loadFromText, loadPOSTaggerOutput, loadPOSTags, loadSGMText, makeBestMentionMap, makeChunk, printChunkValidity, recordWordLocation, removeTagsAndExtraNL, repeat, save, setCorpusCounts, setPlainText, setPOSTags, setPredEntities, setPredictedMentions, setQuoteLevels, setSentenceNumbers, setUsePredictedEntities, setUsePredictedMentions, setWords, setWords, sortEntitiesByListOrder, sortPredictedMentions, sortTrueMentions, toAnnotatedString, toAnnotatedString, toCoherenceTableString, toCoherenceTableString, toString, toSubstituteString, translateEscaped, usePredictedEntities, usePredictedMentions, write |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Methods inherited from interface edu.illinois.cs.cogcomp.lbj.coref.ir.docs.Doc |
getBestMentionFor, getCExampleFor, getCoherenceInfo, getCoherenceInfo, getCorefChains, getDocID, getEntities, getEntityFor, getEntityFor, getGExampleFor, getInCorpusInverseFreq, getInDocInverseFreq, getInverseTrueHeadFreq, getInverseTrueHeadFreq, getMentions, getMentionsContainedIn, getMentionsContaining, getMentionsInSent, getMentionsInSentences, getMentionsWithExtentStartingAt, getMentionsWithHeadStartingAt, getNumRelations, getNumSentences, getPlainText, getPOS, getPOS, getPredEntities, getPredMentions, getQuoteNestLevel, getRelation, getSentNum, getStartCharNum, getTextFirstWordNum, getTrueEntities, getTrueMentionFor, getTrueMentions, getWholeDocCounts, getWord, getWordNum, getWords, hasPredEntities, hasPredMentions, hasTrueEntities, hasTrueMentions, isCaseSensitive, makeChunk, save, setCorpusCounts, setPredEntities, setPredictedMentions, setUsePredictedEntities, setUsePredictedMentions, toAnnotatedString, toAnnotatedString, toCoherenceTableString, toCoherenceTableString, toSubstituteString, usePredictedEntities, usePredictedMentions, write |
serialVersionUID
private static final long serialVersionUID
- See Also:
- Constant Field Values
DocPlainText
public DocPlainText()
- Constructs an empty document.
This constructor can be used, followed by
loadFromPlainText(java.lang.String)
to construct a document from a text string.
DocPlainText
public DocPlainText(java.lang.String filename)
- Constructs a document using the specified plain text file.
Automatically splits sentences, determines quote levels,
determines part-of-speech tags, and splits words using
an automatic word-splitting algorithm.
Mentions and entities will not be set here.
- Parameters:
filename
- The name of a file containing plain text.
loadFromFilename
public void loadFromFilename(java.lang.String filename)
- Builds this document from the specified plain text file.
Automatically splits sentences, determines quote levels,
determines part-of-speech tags, and splits words using
an automatic word-splitting algorithm.
Mentions and entities will not be set here.
- Parameters:
filename
- The name of a file containing plain text.
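If the file's encoding needs to be controlled explicitly, one option is to read the file into a string yourself and pass the result to loadFromPlainText(java.lang.String). The following is a minimal sketch of that idea, not the library's own file-reading code: the class PlainTextFileSketch and its methods readPlainText and roundTrip are hypothetical names introduced for illustration, and only standard java.nio calls are used. The commented lines show where DocPlainText would come in when the coref library is on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative helper; not part of the coref library. */
class PlainTextFileSketch {

    /** Reads the whole file into a String using the given charset. */
    static String readPlainText(String filename, Charset charset) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(filename));
            return new String(bytes, charset);
        } catch (IOException e) {
            // Rethrow unchecked so callers need no throws clause.
            throw new UncheckedIOException(e);
        }
    }

    /** Writes content to a temporary UTF-8 file, reads it back, and deletes the file. */
    static String roundTrip(String content) {
        try {
            Path tmp = Files.createTempFile("doc", ".txt");
            try {
                Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
                return readPlainText(tmp.toString(), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = roundTrip("One sentence. Another one.");
        System.out.println(text);
        // With the coref library available, the text could then be loaded:
        // DocPlainText doc = new DocPlainText();
        // doc.loadFromPlainText(text);
    }
}
```

Reading the file separately keeps the charset decision visible at the call site; when the default behavior is acceptable, loadFromFilename(java.lang.String) is the shorter route.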
loadFromPlainText
public void loadFromPlainText(java.lang.String text)
- Builds the document from the given plain text,
automatically splitting sentences, determining quote levels,
determining part-of-speech tags, and splitting words by
an automatic word-splitting algorithm.
Mentions and entities will not be set here.
- Parameters:
text
- The text of the document.
loadFromPlainText
public void loadFromPlainText(java.lang.String text,
boolean doWordSplit)
- Builds the document from the given plain text,
automatically splitting sentences, determining quote levels,
determining part-of-speech tags, and either splitting words
by whitespace or using a word-splitter.
Mentions and entities will not be set here.
- Parameters:
text
- The text of the document.
doWordSplit
- If true, words will be split by
an automatic word-splitting algorithm; otherwise
words will be assumed to be separated by whitespace.
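To make the effect of doWordSplit concrete, the sketch below contrasts the two tokenization modes on a short string. It is an illustration only: the class WordSplitSketch and its methods are hypothetical, and the regular expression is a simple stand-in, since this page does not specify the word-splitting algorithm the library actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative stand-in; not part of the coref library. */
class WordSplitSketch {

    // Assumed splitter rule: a run of letters or digits is one token,
    // and every other non-space character is a token of its own.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    /** Mirrors doWordSplit == false: words are separated by whitespace only. */
    static List<String> splitOnWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    /** Mirrors doWordSplit == true, using the stand-in regex above. */
    static List<String> splitWithRegex(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    static List<String> split(String text, boolean doWordSplit) {
        return doWordSplit ? splitWithRegex(text) : splitOnWhitespace(text);
    }

    public static void main(String[] args) {
        String text = "Hello, world.";
        System.out.println(split(text, false)); // [Hello,, world.]
        System.out.println(split(text, true));  // [Hello, ,, world, .]
    }
}
```

Passing false is mainly useful when the input has already been tokenized and the tokens are separated by spaces; otherwise punctuation stays attached to neighboring words, as the first output line shows.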
write
public void write(java.lang.String filename,
boolean usePredictions)
- Description copied from interface:
Doc
- Writes this Doc in the appropriate format.
- Specified by:
write
in interface Doc
- Specified by:
write
in class DocBase
- Parameters:
filename
- The name of the target file.
usePredictions
- Whether predicted mentions and entities
should be written.