public interface Doc
Represents one document from a corpus, including the text, sentences, words, part-of-speech tags, annotations of coreference, relations, entities, mentions, and other relevant information.
The most common way to create a document is to use a DocLoader, such as DocFromTextLoader or DocLoader.getDefaultLoader(java.lang.String).
The advantage of this approach is that mentions are loaded automatically (if annotations are provided in the files), and mentions are predicted and typed automatically (if mention detectors and typers are provided).
Alternatively, implementations of this interface may be constructed directly.
Method Summary | |
---|---|
Mention | getBestMentionFor(Mention m): Gets the canonical mention of the entity containing m. |
CExample | getCExampleFor(Mention m1, Mention m2): Returns the unique CExample for the given pair of mentions in the given order. |
java.util.Map<Entity,java.util.Map<java.lang.Integer,java.lang.String>> | getCoherenceInfo(): Gets the coherence info, using the value of usePredictedEntities() to determine whether to use predicted entities. |
java.util.Map<Entity,java.util.Map<java.lang.Integer,java.lang.String>> | getCoherenceInfo(boolean usePred): Gets a grid indicating the mention type for each combination of entities and sentences. |
ChainSolution<Mention> | getCorefChains(): Gets the partition of mentions into coreferential sets. |
java.lang.String | getDocID(): Gets the ID for this document, as a string. |
java.util.List<Entity> | getEntities(): Gets the entities, in no particular order. |
Entity | getEntityFor(Mention m): Gets the entity containing m. |
Entity | getEntityFor(Mention m, boolean usePred): Gets the predicted or true entity containing m, as selected by usePred. |
GExample | getGExampleFor(Mention m): Returns the unique GExample for the given mention. |
double | getInCorpusInverseFreq(java.lang.String word): Gets the inverse of the number of occurrences of the specified word in the corpus. |
double | getInDocInverseFreq(java.lang.String word): Gets the inverse of the number of occurrences of the specified word in the document. |
double | getInverseTrueHeadFreq(int wordNum): Gets the inverse true head frequency of the word at the specified position. |
double | getInverseTrueHeadFreq(java.lang.String word): Gets the inverse of the number of occurrences of the specified word in the heads of the true mentions in the document. |
java.util.List<Mention> | getMentions(): Gets the mentions of the document, sorted (typically in document order). |
java.util.Set<Mention> | getMentionsContainedIn(Mention m): Gets the set of mentions whose head is entirely contained within a specified mention's extent, including the specified mention itself. |
java.util.Set<Mention> | getMentionsContaining(Mention m): Gets the set of mentions whose extents entirely contain a specified mention's extent, including the specified mention itself. |
java.util.List<Mention> | getMentionsInSent(int sentNum): Gets a list of the mentions in a specified sentence, in order. |
Pair<java.util.List<Mention>,java.util.List<Mention>> | getMentionsInSentences(int s1, int s2): Gets a pair of lists of mentions, one for each of the two specified sentences. |
java.util.Set<Mention> | getMentionsWithExtentStartingAt(int startWord): Returns the set of mentions whose extents start at the specified word number, or an empty set if none are found. |
java.util.Set<Mention> | getMentionsWithHeadStartingAt(int startWord): Returns the set of mentions whose heads start at the specified word number, or an empty set if none are found. |
int | getNumRelations(): Gets the number of relations. |
int | getNumSentences(): Returns the number of sentences in the document. |
java.lang.String | getPlainText(): Gets the text that is the basis for character counting, including the start/end character positions in Chunk objects. |
java.util.List<java.lang.String> | getPOS(): Gets a list of the Part-Of-Speech tags for the words of the document. |
java.lang.String | getPOS(int posNum): Gets the Part-Of-Speech tag for the word at the posNum position in the document. |
java.util.List<Entity> | getPredEntities(): Gets a list of predicted entities, in no particular order. |
java.util.List<Mention> | getPredMentions(): Gets a sorted list of predicted mentions. |
int | getQuoteNestLevel(int wordNum): Indicates the number of nested quotes the specified word is in. |
Relation | getRelation(int number): Gets the specified relation. |
int | getSentNum(int wordNum): Gets the sentence number for the specified word. |
int | getStartCharNum(int wordNum): Gets the zero-based position of the first character of a word. |
int | getTextFirstWordNum(): Gets the word number of the first word in the main text of the document (as distinguished from headlines and metadata that may be included in the plain text). |
java.util.List<Entity> | getTrueEntities(): Gets a list of true entities, in no particular order. |
Mention | getTrueMentionFor(Mention pred): Gets the true mention aligned with the specified mention. |
java.util.List<Mention> | getTrueMentions(): Gets a sorted list of true mentions. |
java.util.Map<java.lang.String,java.lang.Integer> | getWholeDocCounts(): Gets the counts for the words in the document. |
java.lang.String | getWord(int wordNum): Gets the specified word. |
int | getWordNum(int charNum): Determines the zero-based word number of the word at charNum; if no word is at charNum, returns the number of the closest word appearing after charNum, or -1 if no such word exists. |
java.util.List<java.lang.String> | getWords(): Gets a list of the surface forms of the words of the document. |
boolean | hasPredEntities(): Indicates whether predicted entities are available. |
boolean | hasPredMentions(): Indicates whether predicted mentions have been set. |
boolean | hasTrueEntities(): Indicates whether true entities are available. |
boolean | hasTrueMentions(): Indicates whether true mentions have been set. |
boolean | isCaseSensitive(): Indicates whether the document is case sensitive. |
Chunk | makeChunk(int startWord, int endWord): Creates a chunk spanning the specified words in this document. |
void | save(): Writes the document to a file using serialization. |
void | setCorpusCounts(java.util.Map<java.lang.String,java.lang.Integer> counts): Sets the corpus counts for the words in the corpus. |
void | setPredEntities(ChainSolution<Mention> sol): Sets the predicted entities to be those specified by sol. |
void | setPredictedMentions(java.util.Collection<Mention> ments): Sets the predicted mentions and records a preference for using them. |
void | setUsePredictedEntities(boolean usePred): Sets the preference for using predicted entities or true entities. |
void | setUsePredictedMentions(boolean usePred): Sets the preference for using predicted mentions or true mentions. |
java.lang.String | toAnnotatedString(boolean showPOS): Gets the document as a string annotated with mention boundaries, with square brackets for true mentions, asterisks for false alarms, and angle brackets for missed mentions, and optionally annotated with Part-Of-Speech tags. |
java.lang.String | toAnnotatedString(boolean showPOS, boolean showMTypes, boolean showETypes, boolean showEIDs): Gets the document as a string annotated with mention boundaries, with square brackets for true mentions, asterisks for false alarms, and angle brackets for missed mentions, and optionally annotated with Part-Of-Speech tags, mention types, entity types, and entity IDs. |
java.lang.String | toCoherenceTableString(): Gets the coherence grid represented as a string, laid out in a grid. |
java.lang.String | toCoherenceTableString(boolean usePred): Gets the coherence grid represented as a string, laid out in a grid. |
java.lang.String | toSubstituteString(): Gets the document as a string where each mention has been replaced by the most specific mention coreferential with it. |
boolean | usePredictedEntities(): Indicates whether requests for entities will return predicted entities or true entities. |
boolean | usePredictedMentions(): Indicates whether requests for mentions will return predicted mentions or true mentions. |
void | write(boolean usePredictions): Writes this Doc in the appropriate format. |
void | write(java.lang.String filename, boolean usePredictions): Writes this Doc in the appropriate format. |
Method Detail |
---|
java.lang.String getPlainText()
java.lang.String getDocID()
boolean isCaseSensitive()
int getSentNum(int wordNum)
Parameters:
wordNum - the zero-based position of the word whose sentence number is desired.
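The interface does not say how getSentNum finds a word's sentence. A minimal sketch, assuming sentence boundaries are kept as a strictly increasing array of the word numbers at which sentences start (a hypothetical representation), shows that the lookup reduces to a binary search; SentenceIndex is an illustrative name, not part of this API.

```java
import java.util.Arrays;

// Hypothetical sketch of getSentNum/getNumSentences. Assumes sentence
// boundaries are stored as a strictly increasing array of the zero-based
// word numbers at which each sentence starts.
final class SentenceIndex {
    private final int[] sentStarts; // sentStarts[s] = first word of sentence s

    SentenceIndex(int[] sentStarts) {
        this.sentStarts = sentStarts.clone();
    }

    // Returns the sentence containing wordNum, or -1 if wordNum precedes the
    // first sentence. Words past the end of the document are not detected here.
    int getSentNum(int wordNum) {
        int i = Arrays.binarySearch(sentStarts, wordNum);
        if (i >= 0) {
            return i; // wordNum is the first word of sentence i
        }
        // On a miss, binarySearch returns -(insertionPoint) - 1; the containing
        // sentence is the one that starts just before the insertion point.
        int insertionPoint = -(i + 1);
        return insertionPoint - 1;
    }

    int getNumSentences() {
        return sentStarts.length;
    }
}
```

For sentences starting at words 0, 5, and 12, word 3 falls in sentence 0 and word 20 in sentence 2.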
int getNumSentences()
void setUsePredictedEntities(boolean usePred)
Parameters:
usePred - if true, prefer to use predicted entities; otherwise, prefer true entities.
boolean usePredictedEntities()
java.util.List<Entity> getEntities()
Gets the entities, in no particular order. If usePredictedEntities() returns true and predicted entities are available, returns them; otherwise returns true entities.
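The preference rule for getEntities() combines a flag with an availability check. The sketch below is an assumption-laden illustration: the entity type is replaced by a type parameter so it compiles on its own, "available" is modeled as "predicted entities have been set (non-null)", and EntitySource is a hypothetical name.

```java
import java.util.List;

// Hypothetical sketch of the getEntities() preference rule; E stands in for Entity.
final class EntitySource<E> {
    private final List<E> trueEntities;
    private List<E> predEntities;          // null until predictions are set (assumption)
    private boolean usePredictedEntities;

    EntitySource(List<E> trueEntities) {
        this.trueEntities = trueEntities;
    }

    void setPredEntities(List<E> pred) {
        this.predEntities = pred;
        this.usePredictedEntities = true;  // setPredEntities is documented to set this flag
    }

    void setUsePredictedEntities(boolean usePred) {
        this.usePredictedEntities = usePred;
    }

    boolean hasPredEntities() {
        return predEntities != null;
    }

    List<E> getEntities() {
        // Return predicted entities only when they are preferred AND available.
        if (usePredictedEntities && hasPredEntities()) {
            return predEntities;
        }
        return trueEntities;
    }
}
```

Requesting predicted entities before any have been set therefore falls back to the true entities instead of returning null.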
java.util.List<Entity> getPredEntities()
java.util.List<Entity> getTrueEntities()
ChainSolution<Mention> getCorefChains()
Entity getEntityFor(Mention m)
Gets the entity containing m. Uses entities from getEntities().
Parameters:
m - The mention whose entity is desired.
Returns:
the entity containing m, or null if not found.
Entity getEntityFor(Mention m, boolean usePred)
Gets the entity containing m. If usePred is true, returns the predicted entity; otherwise returns the true entity. If the requested type of entity is not available, null is returned.
Parameters:
m - The mention whose entity is desired.
usePred - Whether to return a predicted entity or a true entity.
Returns:
the entity containing m, or null if the entity of the specified type is not available.
void setPredEntities(ChainSolution<Mention> sol)
Sets the predicted entities to be those specified by sol. Entity IDs are created automatically, and each mention's setPredictedEntityID() method is called. Also sets usePredictedEntities to true. The entities are backed internally, but the mentions are not duplicated.
Parameters:
sol - The partition of mentions from which to derive entities.
boolean hasPredEntities()
boolean hasTrueEntities()
CExample getCExampleFor(Mention m1, Mention m2)
Returns the unique CExample for the given pair of mentions in the given order. Doc is the head of a collection of related examples; as such, it must return the same CExample each time this pair is requested, which matters whenever an inference-based classifier is used.
Parameters:
m1 - The first mention.
m2 - The second mention.
Returns:
the CExample referring to the ordered pair m1, m2.
GExample getGExampleFor(Mention m)
Returns the unique GExample for the given mention. Doc is the head of a collection of related examples; as such, it must return the same GExample each time this mention is requested, which matters whenever an inference-based classifier is used.
Parameters:
m - The mention.
Returns:
the GExample referring to m.
void setUsePredictedMentions(boolean usePred)
Parameters:
usePred - if true, prefer to use predicted mentions; otherwise, prefer true mentions.
boolean usePredictedMentions()
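The requirement that getCExampleFor return the same example for a given ordered pair is essentially memoization. A minimal sketch, assuming a factory function builds each example (the names PairExampleCache and OrderedPair are hypothetical), caches one object per ordered pair so that (m1, m2) and (m2, m1) stay distinct.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiFunction;

// Hypothetical sketch: caches one example object per ORDERED pair so that
// repeated requests (as during inference) receive the identical instance.
final class PairExampleCache<M, X> {
    // Ordered-pair key: (a, b) and (b, a) are different keys.
    private static final class OrderedPair {
        final Object first;
        final Object second;

        OrderedPair(Object first, Object second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof OrderedPair)) {
                return false;
            }
            OrderedPair p = (OrderedPair) o;
            return Objects.equals(first, p.first) && Objects.equals(second, p.second);
        }

        @Override
        public int hashCode() {
            return Objects.hash(first, second);
        }
    }

    private final Map<OrderedPair, X> cache = new HashMap<>();
    private final BiFunction<M, M, X> factory; // builds an example for a new pair

    PairExampleCache(BiFunction<M, M, X> factory) {
        this.factory = factory;
    }

    X getExampleFor(M m1, M m2) {
        // Build the example once; later calls return the cached instance.
        return cache.computeIfAbsent(new OrderedPair(m1, m2), k -> factory.apply(m1, m2));
    }
}
```

The same idea, keyed on a single mention, would cover getGExampleFor.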
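java.util.List<Mention> getMentions()
Gets the mentions of the document, sorted (typically in document order). Predicted or true mentions are returned according to usePredictedMentions().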
java.util.List<Mention> getPredMentions()
boolean hasPredMentions()
boolean hasTrueMentions()
java.util.List<Mention> getTrueMentions()
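void setPredictedMentions(java.util.Collection<Mention> ments)
Sets the predicted mentions and records a preference for using them.
Parameters:
ments - The predicted mentions (copied defensively).
Mention getTrueMentionFor(Mention pred)
Gets the true mention aligned with the specified mention.
Parameters:
pred - A predicted mention.
Returns:
the true mention aligned with pred.
Mention getBestMentionFor(Mention m)
Gets the canonical mention of the entity containing m.
Parameters:
m - A mention.
Returns:
the canonical mention of the entity containing m.
java.util.Set<Mention> getMentionsWithHeadStartingAt(int startWord)
Parameters:
startWord - A word number.
Returns:
the set of mentions whose heads start at startWord, or an empty set if none are found.
java.util.Set<Mention> getMentionsWithExtentStartingAt(int startWord)
Parameters:
startWord - A word number.
Returns:
the set of mentions whose extents start at startWord, or an empty set if none are found.
java.util.Set<Mention> getMentionsContainedIn(Mention m)
Gets the set of mentions whose head is entirely contained within the extent of m, including m itself. Mentions are drawn from getMentions().
Parameters:
m - The specified mention.
Returns:
the mentions whose heads are contained in the extent of m.
java.util.Set<Mention> getMentionsContaining(Mention m)
Gets the set of mentions whose extents entirely contain the extent of m, including m itself. Mentions are drawn from getMentions().
Parameters:
m - The specified mention.
Returns:
the mentions whose extents contain the extent of m.
java.util.List<Mention> getMentionsInSent(int sentNum)
Gets a list of the mentions in a specified sentence, in order. Predicted or true mentions are used according to usePredictedMentions().
Parameters:
sentNum - The number of the specified sentence.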
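The containment and start-word queries above are defined in terms of head and extent spans. A minimal sketch, assuming both are inclusive word ranges and using a hypothetical Span class in place of Mention, spells out the conditions with linear scans; a real implementation might index mentions by start word instead.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Mention: extent and head as inclusive word spans.
final class Span {
    final String text;
    final int extStart, extEnd;   // extent, inclusive word numbers
    final int headStart, headEnd; // head, inclusive word numbers

    Span(String text, int extStart, int extEnd, int headStart, int headEnd) {
        this.text = text;
        this.extStart = extStart;
        this.extEnd = extEnd;
        this.headStart = headStart;
        this.headEnd = headEnd;
    }
}

final class MentionQueries {
    // Like getMentionsContainedIn: mentions whose HEAD lies entirely inside m's
    // extent; m itself qualifies because its head is inside its own extent.
    static Set<Span> containedIn(List<Span> mentions, Span m) {
        Set<Span> out = new LinkedHashSet<>();
        for (Span x : mentions) {
            if (x.headStart >= m.extStart && x.headEnd <= m.extEnd) {
                out.add(x);
            }
        }
        return out;
    }

    // Like getMentionsContaining: mentions whose EXTENT entirely contains m's extent.
    static Set<Span> containing(List<Span> mentions, Span m) {
        Set<Span> out = new LinkedHashSet<>();
        for (Span x : mentions) {
            if (x.extStart <= m.extStart && x.extEnd >= m.extEnd) {
                out.add(x);
            }
        }
        return out;
    }

    // Like getMentionsWithHeadStartingAt: an empty set (never null) if nothing matches.
    static Set<Span> withHeadStartingAt(List<Span> mentions, int startWord) {
        Set<Span> out = new LinkedHashSet<>();
        for (Span x : mentions) {
            if (x.headStart == startWord) {
                out.add(x);
            }
        }
        return out;
    }
}
```

For "the president of France" (words 0 to 3, head "president" at word 1) and the nested mention "France" (word 3), each mention's containment set includes the other in the appropriate direction.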
Pair<java.util.List<Mention>,java.util.List<Mention>> getMentionsInSentences(int s1, int s2)
Parameters:
s1 - The number of the first sentence.
s2 - The number of the second sentence.
Chunk makeChunk(int startWord, int endWord)
Parameters:
startWord - The position of the first word in the desired chunk.
endWord - The position of the last word in the desired chunk.
java.util.List<java.lang.String> getWords()
java.lang.String getWord(int wordNum)
Parameters:
wordNum - The position of the specified word (as an index into a List).
Returns:
the wordNum-th word as a string.
java.util.List<java.lang.String> getPOS()
See Also:
POSTagger
java.lang.String getPOS(int posNum)
Gets the Part-Of-Speech tag for the word at the posNum position in the document.
Parameters:
posNum - The position of the word whose POS tag should be returned.
See Also:
POSTagger
int getWordNum(int charNum)
Determines the zero-based word number of the word at charNum; if no word is at charNum, returns the number of the closest word appearing after charNum, or -1 if no such word exists.
Parameters:
charNum - The character number.
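The fallback rule in getWordNum (the word at charNum, else the next word, else -1) falls out of a single binary search for the first word ending after charNum. The sketch assumes each word occupies a half-open character range [start, end) of the plain text, with words sorted and non-overlapping; CharWordIndex is a hypothetical name, and getStartCharNum is included because it uses the same offsets.

```java
// Hypothetical sketch of getWordNum/getStartCharNum. Assumes word w covers
// characters [starts[w], ends[w]) and that words are sorted and non-overlapping.
final class CharWordIndex {
    private final int[] starts;
    private final int[] ends;

    CharWordIndex(int[] starts, int[] ends) {
        this.starts = starts.clone();
        this.ends = ends.clone();
    }

    int getWordNum(int charNum) {
        // Binary search for the first word whose end lies after charNum.
        int lo = 0;
        int hi = ends.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ends[mid] <= charNum) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // That word either contains charNum or is the closest word after it;
        // if every word ends at or before charNum, there is no such word.
        return lo < ends.length ? lo : -1;
    }

    int getStartCharNum(int wordNum) {
        // -1 signals an invalid word number.
        return (wordNum >= 0 && wordNum < starts.length) ? starts[wordNum] : -1;
    }
}
```

For "Hello world." tokenized as Hello [0,5), world [6,11), and "." [11,12), character 5 (the space) maps to word 1, and character 12 maps to -1.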
int getTextFirstWordNum()
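int getStartCharNum(int wordNum)
Gets the zero-based position of the first character of a word.
Parameters:
wordNum - The zero-based position of the word in the document.
Returns:
the position of the first character of the word, or -1 if wordNum is invalid.
int getQuoteNestLevel(int wordNum)
Parameters:
wordNum - The position of the specified word.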
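The interface does not say how quote nesting is detected. One plausible sketch, assuming Penn Treebank-style quote tokens (`` opens and '' closes) and treating the quote marks themselves as belonging to the outer level, computes every word's level in one pass; QuoteLevels is hypothetical, and getQuoteNestLevel(wordNum) would then be nestLevels(words)[wordNum].

```java
import java.util.List;

// Hypothetical sketch: assumes PTB-style quote tokens, where `` opens a quote
// and '' closes one. Quote marks are assigned the level outside the quote.
final class QuoteLevels {
    static int[] nestLevels(List<String> words) {
        int[] levels = new int[words.size()];
        int depth = 0;
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            if (w.equals("''") && depth > 0) {
                depth--;          // a closing quote sits at the outer level
            }
            levels[i] = depth;
            if (w.equals("``")) {
                depth++;          // words after an opening quote are one level deeper
            }
        }
        return levels;
    }
}
```

Unbalanced closing quotes are ignored rather than driving the level negative.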
double getInverseTrueHeadFreq(int wordNum)
Parameters:
wordNum - The position in the document of the specified word.
See Also:
getInverseTrueHeadFreq(String)
double getInverseTrueHeadFreq(java.lang.String word)
Parameters:
word - The specified word.
double getInDocInverseFreq(java.lang.String word)
Parameters:
word - The specified word.
double getInCorpusInverseFreq(java.lang.String word)
Parameters:
word - The specified word.
void setCorpusCounts(java.util.Map<java.lang.String,java.lang.Integer> counts)
Parameters:
counts - A map from words to counts of words in the corpus.
java.util.Map<java.lang.String,java.lang.Integer> getWholeDocCounts()
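The inverse-frequency methods are each 1 divided by a count taken from either the document or the corpus map. The sketch below assumes counts are exact string matches and, because the interface does not say what happens for a word that never occurs, returns 0.0 in that case by assumption; WordFreqs is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of getWholeDocCounts, setCorpusCounts,
// getInDocInverseFreq and getInCorpusInverseFreq.
final class WordFreqs {
    private final Map<String, Integer> docCounts = new HashMap<>();
    private Map<String, Integer> corpusCounts = new HashMap<>();

    WordFreqs(List<String> words) {
        for (String w : words) {
            docCounts.merge(w, 1, Integer::sum); // count each occurrence
        }
    }

    Map<String, Integer> getWholeDocCounts() {
        return docCounts;
    }

    void setCorpusCounts(Map<String, Integer> counts) {
        corpusCounts = new HashMap<>(counts); // defensive copy
    }

    double getInDocInverseFreq(String word) {
        return inverse(docCounts, word);
    }

    double getInCorpusInverseFreq(String word) {
        return inverse(corpusCounts, word);
    }

    // 1 / count; unseen words give 0.0 (an assumption, not documented behavior).
    private static double inverse(Map<String, Integer> counts, String word) {
        int c = counts.getOrDefault(word, 0);
        return c == 0 ? 0.0 : 1.0 / c;
    }
}
```

In "the cat saw the dog", "the" occurs twice, so its in-document inverse frequency is 0.5.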
Relation getRelation(int number)
Parameters:
number - the number of the desired relation.
int getNumRelations()
java.lang.String toAnnotatedString(boolean showPOS, boolean showMTypes, boolean showETypes, boolean showEIDs)
Parameters:
showPOS - Whether the Part-Of-Speech tags should be shown.
showMTypes - Whether mention types should be shown.
showETypes - Whether entity types should be shown.
showEIDs - Whether entity IDs should be shown.
java.lang.String toAnnotatedString(boolean showPOS)
Parameters:
showPOS - Whether the Part-Of-Speech tags should be shown.
java.lang.String toSubstituteString()
java.util.Map<Entity,java.util.Map<java.lang.Integer,java.lang.String>> getCoherenceInfo(boolean usePred)
Parameters:
usePred - Whether predicted entities should be used.
java.util.Map<Entity,java.util.Map<java.lang.Integer,java.lang.String>> getCoherenceInfo()
Gets the coherence info, using the value of usePredictedEntities() to determine whether to use predicted entities.
java.lang.String toCoherenceTableString(boolean usePred)
Parameters:
usePred -
See Also:
getCoherenceInfo()
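java.lang.String toCoherenceTableString()
Gets the coherence grid represented as a string, laid out in a grid, using usePredictedEntities() to determine whether to use predicted entities.
See Also:
getCoherenceInfo()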
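The coherence grid maps each entity to a per-sentence mention type. A minimal sketch, assuming each mention is reduced to (entity ID, sentence number, mention type), with string IDs standing in for Entity and example type labels such as NAM, NOM, and PRO, builds the grid and lays it out as a tab-separated table; where an entity has several mentions in one sentence, this sketch keeps the first, since the interface does not document how ties are resolved. CoherenceGrid and MentionInfo are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of getCoherenceInfo/toCoherenceTableString.
final class CoherenceGrid {
    // Hypothetical reduced view of a mention.
    static final class MentionInfo {
        final String entityId;
        final int sentNum;
        final String type;

        MentionInfo(String entityId, int sentNum, String type) {
            this.entityId = entityId;
            this.sentNum = sentNum;
            this.type = type;
        }
    }

    // entity -> (sentence -> mention type); keeps the first mention per cell.
    static Map<String, Map<Integer, String>> build(List<MentionInfo> mentions) {
        Map<String, Map<Integer, String>> grid = new TreeMap<>();
        for (MentionInfo m : mentions) {
            grid.computeIfAbsent(m.entityId, k -> new TreeMap<>())
                .putIfAbsent(m.sentNum, m.type);
        }
        return grid;
    }

    // One row per entity, one tab-separated column per sentence; "-" marks absence.
    static String toTableString(Map<String, Map<Integer, String>> grid, int numSentences) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<Integer, String>> row : grid.entrySet()) {
            sb.append(row.getKey());
            for (int s = 0; s < numSentences; s++) {
                sb.append('\t').append(row.getValue().getOrDefault(s, "-"));
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Empty cells are the interesting part of such a grid: they show the sentences in which an entity is not mentioned.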
void save() throws java.io.IOException
Writes the document to a file using serialization.
Throws:
java.io.IOException
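save() is documented only as writing the document "using serialization", with no file name given. Assuming this means standard Java object serialization, a minimal sketch writes any Serializable object to a caller-chosen file and reads it back; SerialDocIO and its explicit File parameter are hypothetical, not part of this interface.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch of serialization-based saving and loading.
final class SerialDocIO {
    // Writes the object graph; try-with-resources closes (and flushes) the stream.
    static void save(Serializable doc, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(doc);
        }
    }

    // Reads back whatever object was written; the caller casts to the expected type.
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return in.readObject();
        }
    }
}
```

Every object reachable from the saved document must itself be Serializable, or writeObject throws a NotSerializableException (a subclass of IOException, which is consistent with save() declaring IOException).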
void write(boolean usePredictions)
Parameters:
usePredictions - Whether predicted mentions and entities should be written.
void write(java.lang.String filename, boolean usePredictions)
Parameters:
filename - The name of the target file.
usePredictions - Whether predicted mentions and entities should be written.