public class TextAnnotation extends AbstractTextAnnotation implements Serializable
Modifier and Type | Field and Description |
---|---|
protected gnu.trove.map.TIntObjectMap<ArrayList<edu.illinois.cs.cogcomp.core.datastructures.IntPair>> | allSpans |
protected int[] | characterOffsetsToTokens: A map from character offset to the token id. |
protected String | corpusId: An identifier for the corpus |
protected String | id: The identifier for this text annotation |
protected List<Sentence> | sentences: The list of sentences contained in this text |
Fields inherited from class AbstractTextAnnotation: text, tokenCharacterOffsets, tokenizedText, tokens, views
Modifier | Constructor and Description |
---|---|
public | TextAnnotation(String corpusId, String id, List<String> tokenizedSentences): Create a new TextAnnotation with the specified corpusId and textId. |
public | TextAnnotation(String corpusId, String id, String text): Deprecated. |
public | TextAnnotation(String corpusId, String id, String text, edu.illinois.cs.cogcomp.core.datastructures.IntPair[] characterOffsets, String[] tokens, int[] sentenceEndPositions) |
public | TextAnnotation(String corpusId, String id, String text, String[] tokens, int[] sentenceEndPositions): Create a new text annotation using the given text, the tokens and the sentence boundary positions (only the ending positions), specified in terms of the tokens. |
protected | TextAnnotation(String corpusId, String id, String text, String[] tokens, int[] sentenceEndPositions, String sentenceViewGenerator, double sentenceViewScore) |
public | TextAnnotation(String corpusId, String id, String text, TokenizerUtilities.SentenceViewGenerators sentenceViewGenerator): Create a new text annotation. |
Modifier and Type | Method and Description |
---|---|
void | addView(ViewGenerator viewGenerator): Adds a view that is generated by a ViewGenerator |
boolean | equals(Object obj) |
String | getCorpusId() |
String | getId() |
int | getNumberOfSentences() |
Sentence | getSentence(int sentenceId) |
Sentence | getSentenceFromToken(int tokenId): Gets the sentence containing the specified token |
List<Sentence> | getSentenceFromTokens(Set<Integer> tokens) |
int | getSentenceId(Constituent constituent) |
int | getSentenceId(int tokenId): Gets the index of the sentence that contains the token, indexed by tokenPosition. |
List<edu.illinois.cs.cogcomp.core.datastructures.IntPair> | getSpansMatching(String text) |
int | getTokenIdFromCharacterOffset(int characterOffset): Get the position of the token that corresponds to the character offset that is passed as a parameter. |
int | hashCode() |
List<Sentence> | sentences() |
String | toString() |
Methods inherited from class AbstractTextAnnotation: addView, addView, getAvailableViews, getDetokenizedText, getText, getToken, getTokenCharacterOffset, getTokenizedText, getTokens, getTokensInSpan, getTopKViews, getView, hasView, select, size
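protected String corpusId
An identifier for the corpus
protected String id
The identifier for this text annotation
protected int[] characterOffsetsToTokens
A map from character offset to the token id. It is populated when getTokenIdFromCharacterOffset(int) is called the first time.
protected gnu.trove.map.TIntObjectMap<ArrayList<edu.illinois.cs.cogcomp.core.datastructures.IntPair>> allSpans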
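The characterOffsetsToTokens field is documented as a map from character offset to token id that is tied to the first call of getTokenIdFromCharacterOffset(int). The following is a minimal sketch of that kind of lazily built index, not the TextAnnotation implementation. The class name CharOffsetIndexSketch, the methods tokenIdAt and isIndexBuilt, the {start, end} span layout, and the use of -1 for characters that belong to no token are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch, not the TextAnnotation implementation.
final class CharOffsetIndexSketch {
    private final int textLength;
    private final int[][] tokenSpans;        // {start, end} per token, end exclusive
    private int[] characterOffsetsToTokens;  // stays null until the first lookup

    CharOffsetIndexSketch(int textLength, int[][] tokenSpans) {
        this.textLength = textLength;
        this.tokenSpans = tokenSpans;
    }

    boolean isIndexBuilt() {
        return characterOffsetsToTokens != null;
    }

    // Returns the token id covering the offset, or -1 (an assumption of this
    // sketch) when the offset falls between tokens, e.g. on whitespace.
    int tokenIdAt(int characterOffset) {
        if (characterOffset < 0 || characterOffset >= textLength) {
            throw new IllegalArgumentException("Offset out of range: " + characterOffset);
        }
        if (characterOffsetsToTokens == null) {
            // Build the index once, on first use.
            int[] index = new int[textLength];
            Arrays.fill(index, -1);
            for (int tokenId = 0; tokenId < tokenSpans.length; tokenId++) {
                for (int c = tokenSpans[tokenId][0]; c < tokenSpans[tokenId][1]; c++) {
                    index[c] = tokenId;
                }
            }
            characterOffsetsToTokens = index;
        }
        return characterOffsetsToTokens[characterOffset];
    }

    public static void main(String[] args) {
        // "Jack went": "Jack" covers offsets 0..3, "went" covers offsets 5..8.
        CharOffsetIndexSketch sketch =
                new CharOffsetIndexSketch(9, new int[][] {{0, 4}, {5, 9}});
        System.out.println(sketch.tokenIdAt(6));
    }
}
```

Building the array on first use keeps construction cheap for callers that never ask for character offsets, at the cost of one pass over the text on the first lookup.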
@Deprecated public TextAnnotation(String corpusId, String id, String text)
Deprecated. See SentenceViewGenerator.
Parameters:
corpusId - A string that identifies the corpus
id - A string that identifies this text
text - The text itself
public TextAnnotation(String corpusId, String id, String text, TokenizerUtilities.SentenceViewGenerators sentenceViewGenerator)
Create a new text annotation using the specified SentenceViewGenerator.
Parameters:
corpusId - A string that identifies the corpus
id - A string that identifies this text
text - The text itself
sentenceViewGenerator - An instance of SentenceViewGenerator. NOTE: This sentenceViewGenerator must set CHARACTER OFFSETS in the view created
public TextAnnotation(String corpusId, String id, String text, String[] tokens, int[] sentenceEndPositions)
Create a new text annotation using the given text, the tokens and the sentence boundary positions (only the ending positions), specified in terms of the tokens.
For example, for the text "Jack went up the hill. So did Jill.", the tokens would be the array {"Jack", "went", "up", "the", "hill", ".", "So", "did", "Jill", "."} and the sentence boundary array would be {6, 10}. If the last element of the sentence boundary array is not equal to the size of the tokens array, an IllegalArgumentException is raised.
Parameters:
corpusId - A string that identifies the corpus
id - A string that identifies this text
text - The text itself
tokens - The array of tokens of this text
sentenceEndPositions - The ending positions of sentences, specified as indices to the tokens array. Note that the end positions are exclusive: for example, if the last token of a sentence has index 7 in the tokens array, then the end position for that sentence would be 8.
protected TextAnnotation(String corpusId, String id, String text, String[] tokens, int[] sentenceEndPositions, String sentenceViewGenerator, double sentenceViewScore)
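The exclusive end-position convention used by the sentenceEndPositions parameter of the constructors above can be sketched as follows. This is an illustration under stated assumptions, not code from TextAnnotation: the class SentenceEndPositionsSketch and the method splitIntoSentences are hypothetical names. The sample sentence has ten tokens, so the last end position the check accepts is 10.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of TextAnnotation.
final class SentenceEndPositionsSketch {

    // Splits tokens into sentences using exclusive end positions.
    // Mirrors the documented check: the last end position must equal tokens.length.
    static List<String[]> splitIntoSentences(String[] tokens, int[] sentenceEndPositions) {
        if (sentenceEndPositions.length == 0
                || sentenceEndPositions[sentenceEndPositions.length - 1] != tokens.length) {
            throw new IllegalArgumentException(
                    "Last sentence end position must equal the number of tokens");
        }
        List<String[]> sentences = new ArrayList<>();
        int start = 0;
        for (int end : sentenceEndPositions) {
            // Tokens start .. end-1 belong to this sentence; end itself is excluded.
            sentences.add(Arrays.copyOfRange(tokens, start, end));
            start = end;
        }
        return sentences;
    }

    public static void main(String[] args) {
        String[] tokens = {"Jack", "went", "up", "the", "hill", ".", "So", "did", "Jill", "."};
        int[] ends = {6, 10};
        for (String[] sentence : splitIntoSentences(tokens, ends)) {
            System.out.println(String.join(" ", sentence));
        }
    }
}
```

Because each end position doubles as the start of the next sentence, a single array is enough to describe all sentence boundaries.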
public TextAnnotation(String corpusId, String id, List<String> tokenizedSentences)
Create a new TextAnnotation with the specified corpusId and textId. The sentences in the text are provided by the parameter tokenizedSentences. Further, this constructor assumes that the sentences are white space tokenized.
Parameters:
corpusId - A string that identifies the corpus
id - A string that identifies this text
tokenizedSentences - A list of white-space tokenized sentences
public void addView(ViewGenerator viewGenerator)
Adds a view that is generated by a ViewGenerator
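For the constructor that takes tokenizedSentences, the white-space tokenization assumption means that tokens and sentence boundaries can both be derived by splitting each sentence string. The sketch below shows one way to do that. It is an assumption about how such input could be interpreted, not the library's code, and the names TokenizedSentencesSketch, flattenTokens and sentenceEndPositions are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of TextAnnotation.
final class TokenizedSentencesSketch {

    // Collects the tokens of all sentences, splitting on runs of white space.
    static List<String> flattenTokens(List<String> tokenizedSentences) {
        List<String> tokens = new ArrayList<>();
        for (String sentence : tokenizedSentences) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) {
                continue; // a blank sentence contributes no tokens
            }
            for (String token : trimmed.split("\\s+")) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Computes exclusive end positions: the running token count after each sentence.
    static int[] sentenceEndPositions(List<String> tokenizedSentences) {
        int[] ends = new int[tokenizedSentences.size()];
        int count = 0;
        for (int i = 0; i < ends.length; i++) {
            String trimmed = tokenizedSentences.get(i).trim();
            if (!trimmed.isEmpty()) {
                count += trimmed.split("\\s+").length;
            }
            ends[i] = count;
        }
        return ends;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("Jack went up the hill .", "So did Jill .");
        System.out.println(flattenTokens(sentences));
        System.out.println(java.util.Arrays.toString(sentenceEndPositions(sentences)));
    }
}
```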
public String getCorpusId()
public String getId()
public int getNumberOfSentences()
public Sentence getSentence(int sentenceId)
public int getSentenceId(Constituent constituent)
public int getSentenceId(int tokenId)
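Gets the index of the sentence that contains the token, indexed by tokenPosition. See getSentence(int).
Parameters:
tokenId - The index of the token whose sentenceId is needed
Throws:
IllegalArgumentException - if no sentence contains the tokenId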
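The lookup described for getSentenceId(int) can be illustrated with exclusive sentence end positions: the sentence index is the first one whose end position is greater than the token id. This is a sketch under that assumption and does not reproduce the library's implementation. SentenceIdLookupSketch and sentenceIdOf are hypothetical names.

```java
// Hypothetical helper, not part of TextAnnotation.
final class SentenceIdLookupSketch {

    // Returns the index of the sentence containing tokenId, given exclusive
    // sentence end positions in increasing order.
    static int sentenceIdOf(int[] sentenceEndPositions, int tokenId) {
        if (tokenId >= 0) {
            for (int i = 0; i < sentenceEndPositions.length; i++) {
                if (tokenId < sentenceEndPositions[i]) {
                    return i;
                }
            }
        }
        // Matches the documented contract: no sentence contains the token.
        throw new IllegalArgumentException("No sentence contains token " + tokenId);
    }

    public static void main(String[] args) {
        int[] ends = {6, 10};
        System.out.println(sentenceIdOf(ends, 5));
        System.out.println(sentenceIdOf(ends, 6));
    }
}
```

A linear scan is enough for a sketch. With many sentences, a binary search over the end positions would give the same answer in logarithmic time.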
public Sentence getSentenceFromToken(int tokenId)
Gets the sentence containing the specified token
public int getTokenIdFromCharacterOffset(int characterOffset)
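Get the position of the token that corresponds to the character offset that is passed as a parameter. CuratorClient uses this function to convert views from the Curator representation.
Copyright © 2015. All rights reserved.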