TokenizerTextAnnotationBuilder (illinois-cogcomp-nlp 3.1.29 API)

java.lang.Object
- edu.illinois.cs.cogcomp.nlp.utility.TokenizerTextAnnotationBuilder

All Implemented Interfaces:

TextAnnotationBuilder
```
public class TokenizerTextAnnotationBuilder
extends Object
implements TextAnnotationBuilder
```
A set of convenience methods for constructing TextAnnotations. Replaces a morass of specialized constructors in Edison to support use of illinois-core-utilities.

Author:

Mark Sammons, Narender Gupta

Field Summary
- Fields inherited from interface edu.illinois.cs.cogcomp.annotation.TextAnnotationBuilder
  SPLIT_ON_DASH

Constructor Summary

Constructors
Constructor and Description

TokenizerTextAnnotationBuilder(Tokenizer tokenizer)
Instantiate a TextAnnotationBuilder.

Method Summary

Modifier and Type	Method and Description
`TextAnnotation`	`buildTextAnnotation(String corpusId, String id, String text, String[] tokens, int[] sentenceEndPositions)` Create a new text annotation using the given text, the tokens and the sentence boundary positions (only the ending positions), specified in terms of the tokens.
`static TextAnnotation`	`buildTextAnnotation(String corpusId, String textId, String text, String[] tokens, int[] sentenceEndPositions, String sentenceViewGenerator, double sentenceViewScore)` Instantiate a TextAnnotation using a SentenceViewGenerator to create an explicit Sentence view.
`TextAnnotation`	`createTextAnnotation(String text)` Create a TextAnnotation for the text argument, using the Tokenizer provided at construction.
`TextAnnotation`	`createTextAnnotation(String corpusId, String textId, String text)` Tokenize the input text (split into sentences and "words" within sentences) and populate a TextAnnotation object.
`TextAnnotation`	`createTextAnnotation(String corpusId, String textId, String text, Tokenizer.Tokenization tokenization)` A stub method that should not be called with this Builder.
`String`	`getName()`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - TokenizerTextAnnotationBuilder
```
public TokenizerTextAnnotationBuilder(Tokenizer tokenizer)
```
    Instantiate a TextAnnotationBuilder.
    
    Parameters:
    
    tokenizer - The Tokenizer that will split text into sentences and words.
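    The constructor receives the Tokenizer that does all of the splitting work, and the builder only assembles the result. As a rough illustration of that division of labour (tokens plus one-past-the-end sentence end positions), here is a self-contained sketch. `SimpleTokenizer`, `NaiveTokenizer` and `SketchBuilder` are hypothetical names, not the library's `Tokenizer` or any of its implementations, and the splitting rule (a sentence ends at '.', '!' or '?') is deliberately naive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a tokenizer (NOT the library's Tokenizer interface). */
interface SimpleTokenizer {
    /** Splits text into word and punctuation tokens. */
    List<String> tokens(String text);

    /** Returns sentence end positions as one-past-the-end token indices. */
    int[] sentenceEndPositions(List<String> tokens);
}

/** Naive sketch: runs of word characters and single punctuation marks are tokens. */
class NaiveTokenizer implements SimpleTokenizer {
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    @Override
    public List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    @Override
    public int[] sentenceEndPositions(List<String> tokens) {
        List<Integer> ends = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals(".") || t.equals("!") || t.equals("?")) {
                ends.add(i + 1); // exclusive end: one past the sentence-final token
            }
        }
        // Trailing tokens without final punctuation still form a last sentence.
        if (!tokens.isEmpty()
                && (ends.isEmpty() || ends.get(ends.size() - 1).intValue() != tokens.size())) {
            ends.add(tokens.size());
        }
        return ends.stream().mapToInt(Integer::intValue).toArray();
    }
}

/** Hypothetical builder that, like the documented constructor, receives its tokenizer up front. */
class SketchBuilder {
    private final SimpleTokenizer tokenizer;

    SketchBuilder(SimpleTokenizer tokenizer) {
        this.tokenizer = tokenizer;
    }

    /** Tokenizes the text with the injected tokenizer and returns its sentence end positions. */
    int[] sentenceEnds(String text) {
        return tokenizer.sentenceEndPositions(tokenizer.tokens(text));
    }
}
```

    For instance, `new SketchBuilder(new NaiveTokenizer()).sentenceEnds("Jack went up the hill. So did Jill.")` returns `{6, 10}`. Injecting the tokenizer keeps the builder independent of any particular splitting strategy, which is the point of passing a Tokenizer to the real constructor.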
- Method Detail
  - buildTextAnnotation
```
public static TextAnnotation buildTextAnnotation(String corpusId,
                                                 String textId,
                                                 String text,
                                                 String[] tokens,
                                                 int[] sentenceEndPositions,
                                                 String sentenceViewGenerator,
                                                 double sentenceViewScore)
```
    Instantiate a TextAnnotation using a SentenceViewGenerator to create an explicit Sentence view.
    
    Parameters:
    
    corpusId - a field in TextAnnotation that can be used by the client for book-keeping (e.g. track texts from the same corpus)
    
    textId - a field in TextAnnotation that can be used by the client for book-keeping (e.g. identify a specific document by some reference string)
    
    text - the plain English text to process
    
    tokens - the token Strings, in the order in which they appear in the original text
    
    sentenceEndPositions - token offsets of sentence ends (one-past-the-end indexing)
    
    sentenceViewGenerator - the name of the source of the sentence split
    
    sentenceViewScore - a score that may indicate how reliable the sentence split information is
    
    Returns:
    
    a TextAnnotation object with ViewNames.TOKENS and ViewNames.SENTENCE views.
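    Because `sentenceEndPositions` uses one-past-the-end indexing, each sentence covers the half-open token range that starts at the previous end position (or 0) and stops just before its own end position. The `SentenceSpans` helper below is a hypothetical sketch that makes this conversion explicit; it is not part of the cogcomp API and assumes the positions are already valid (strictly increasing, last one equal to the number of tokens).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of the cogcomp API. */
class SentenceSpans {
    /** Converts one-past-the-end sentence boundaries into half-open [start, end) token ranges. */
    static List<int[]> toSpans(int[] sentenceEndPositions) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        for (int end : sentenceEndPositions) {
            spans.add(new int[] {start, end});
            start = end; // the next sentence begins where this one stopped
        }
        return spans;
    }

    /** Groups the tokens into sentences using the spans computed by toSpans. */
    static List<List<String>> sentences(String[] tokens, int[] sentenceEndPositions) {
        List<List<String>> result = new ArrayList<>();
        for (int[] span : toSpans(sentenceEndPositions)) {
            result.add(Arrays.asList(Arrays.copyOfRange(tokens, span[0], span[1])));
        }
        return result;
    }
}
```

    With the ten tokens of "Jack went up the hill. So did Jill." and end positions `{6, 10}`, the spans are `[0, 6)` and `[6, 10)`, giving the sentences `Jack went up the hill .` and `So did Jill .`.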
  - getName
```
public String getName()
```
    Specified by:
    
    getName in interface TextAnnotationBuilder
  - createTextAnnotation
```
public TextAnnotation createTextAnnotation(String text)
                                    throws IllegalArgumentException
```
    Create a TextAnnotation for the text argument, using the Tokenizer provided at construction. If you plan to process the text with other NLP components, it should be free of HTML/XML tags and non-English characters.
    
    Specified by:
    
    createTextAnnotation in interface TextAnnotationBuilder
    
    Parameters:
    
    text - the text from which to build the TextAnnotation
    
    Returns:
    
    a TextAnnotation object with ViewNames.SENTENCE and ViewNames.TOKENS views and default corpus id and text id fields.
    
    Throws:
    
    IllegalArgumentException - if the tokenizer has problems processing the text.
  - createTextAnnotation
```
public TextAnnotation createTextAnnotation(String corpusId,
                                           String textId,
                                           String text)
                                    throws IllegalArgumentException
```
    Tokenize the input text (split it into sentences, and sentences into "words") and populate a TextAnnotation object. Token character offsets are specified with respect to the original text. The input text should be English and free of HTML and XML tags; non-English characters may cause problems if you use the TextAnnotation as input to other NLP components.
    
    Specified by:
    
    createTextAnnotation in interface TextAnnotationBuilder
    
    Parameters:
    
    corpusId - a field in TextAnnotation that can be used by the client for book-keeping (e.g. track texts from the same corpus)
    
    textId - a field in TextAnnotation that can be used by the client for book-keeping (e.g. identify a specific document by some reference string)
    
    text - the plain English text to process
    
    Returns:
    
    a TextAnnotation object with ViewNames.TOKENS and ViewNames.SENTENCE views.
    
    Throws:
    
    IllegalArgumentException - if the tokenizer has problems with the input text.
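    Token character offsets "with respect to the original text" means each token can be mapped back to a substring of the input. One way to picture this is a left-to-right scan that looks for each token starting where the previous one ended. `CharOffsets` is a hypothetical sketch, not the library's implementation; it assumes every token occurs verbatim in the text, which a tokenizer that normalizes tokens need not guarantee.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the cogcomp API. */
class CharOffsets {
    /**
     * Returns {start, end} character offsets (end exclusive) for each token, searching left to right.
     * Throws IllegalArgumentException if a token cannot be found after the previous one.
     */
    static List<int[]> offsets(String text, String[] tokens) {
        List<int[]> result = new ArrayList<>();
        int cursor = 0;
        for (String token : tokens) {
            int start = text.indexOf(token, cursor);
            if (start < 0) {
                throw new IllegalArgumentException("token not found in text: " + token);
            }
            int end = start + token.length();
            result.add(new int[] {start, end});
            cursor = end; // the next token must start at or after this point
        }
        return result;
    }
}
```

    For "Jack went up the hill.", the token "went" maps to `[5, 9)` and the final "." to `[21, 22)`, so `text.substring(start, end)` recovers each token exactly.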
  - createTextAnnotation
```
public TextAnnotation createTextAnnotation(String corpusId,
                                           String textId,
                                           String text,
                                           Tokenizer.Tokenization tokenization)
                                    throws IllegalArgumentException
```
    A stub method that should not be called with this Builder. Please use BasicTextAnnotationBuilder if you need to create a TextAnnotation from pre-tokenized text.
    
    Specified by:
    
    createTextAnnotation in interface TextAnnotationBuilder
    
    Parameters:
    
    text - Raw text string
    
    tokenization - An instance containing tokens, character offsets, and sentence boundaries.
    
    Throws:
    
    IllegalArgumentException
  - buildTextAnnotation
```
public TextAnnotation buildTextAnnotation(String corpusId,
                                          String id,
                                          String text,
                                          String[] tokens,
                                          int[] sentenceEndPositions)
```
    Create a new text annotation using the given text, the tokens and the sentence boundary positions (only the ending positions), specified in terms of the tokens.
    For example, for the text "Jack went up the hill. So did Jill.", the tokens would be the array {"Jack", "went", "up", "the", "hill", ".", "So", "did", "Jill", "."} and the sentence boundary array would be {6, 10}. If the last element of the sentence boundary array is not equal to the length of the tokens array, an IllegalArgumentException is raised.
    
    Parameters:
    
    corpusId - A string that identifies the corpus
    
    id - A string that identifies this text
    
    text - The text itself
    
    tokens - The array of tokens of this text
    
    sentenceEndPositions - The ending positions of sentences, specified as indices into the tokens array. The end positions are exclusive: for example, if a sentence ends at the token with index 7 (the 8th token), its end position is 8.
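    The documented rule (the last end position must equal the number of tokens, otherwise an IllegalArgumentException is raised) can be checked before anything is built. `SentenceEndValidator` is a hypothetical sketch, not the library's code. Besides the documented rule, it also assumes that positions must be positive and strictly increasing, which the exclusive-end convention implies but the documentation does not state explicitly.

```java
import java.util.Arrays;

/** Hypothetical validator, not part of the cogcomp API. */
class SentenceEndValidator {
    /**
     * Throws IllegalArgumentException unless the positions are strictly increasing,
     * lie in 1..tokens.length, and the last one equals tokens.length.
     */
    static void validate(String[] tokens, int[] sentenceEndPositions) {
        if (sentenceEndPositions.length == 0) {
            throw new IllegalArgumentException("no sentence end positions given");
        }
        int previous = 0;
        for (int end : sentenceEndPositions) {
            // Assumption: empty sentences and positions past the last token are invalid.
            if (end <= previous || end > tokens.length) {
                throw new IllegalArgumentException("bad sentence end position " + end
                        + " in " + Arrays.toString(sentenceEndPositions));
            }
            previous = end;
        }
        // Documented rule: the last end position must equal the number of tokens.
        if (previous != tokens.length) {
            throw new IllegalArgumentException("last end position " + previous
                    + " != number of tokens " + tokens.length);
        }
    }
}
```

    For the ten tokens of the Jack and Jill example, `{6, 10}` passes the check, while an array whose last element is not 10 is rejected.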
