TextAnnotationUtilities (illinois-cogcomp-nlp 3.1.29 API)

java.lang.Object
- edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotationUtilities

public class TextAnnotationUtilities
extends Object

Author: Vivek Srikumar

Field Summary

Fields
Modifier and Type	Field and Description
`static Comparator<Constituent>`	`constituentEndComparator`
`static Comparator<Constituent>`	`constituentLengthComparator`
`static Comparator<Constituent>`	`constituentStartComparator`
`static Comparator<Constituent>`	`constituentStartEndComparator` Sorts constituents by start location and, where starts are equal, by end location, so shorter constituents come first.
`static Comparator<IntPair>`	`IntPairComparator`
`static Comparator<Sentence>`	`sentenceStartComparator`

Constructor Summary

Constructors
Constructor and Description
`TextAnnotationUtilities()`

Method Summary

Modifier and Type	Method and Description
`static void`	`copyAttributesFromTo(HasAttributes origObj, HasAttributes newObj)`
`static Constituent`	`copyConstituentWithNewTokenOffsets(TextAnnotation newTA, Constituent c, int offset)` Creates a new constituent with token offsets shifted by the specified amount.
`static Relation`	`copyRelation(Relation r, Map<Constituent,Constituent> consMap)` Required: consMap must contain the source and target constituents of r as keys, and their values must be non-null.
`static void`	`copyViewFromTo(String vuName, TextAnnotation ta, TextAnnotation newTA, int sourceStartTokenIndex, int sourceEndTokenIndex, int offset)`
`static void`	`copyViewsFromTo(TextAnnotation ta, TextAnnotation newTA, int sourceStartTokenIndex, int sourceEndTokenIndex, int offset)` Copies views from the relevant span of ta to newTA.
`static TextAnnotation`	`createFromTokenizedString(String text)`
`static List<String>`	`getSentenceList(TextAnnotation ta)`
`static TextAnnotation`	`getSubTextAnnotation(TextAnnotation ta, int sentenceId)` Given a `TextAnnotation` object, and a sentence id, it gives a smaller `TextAnnotation` which contains the annotations specific to the given sentence id.
`static String`	`getTokenSequence(TextAnnotation ta, int start, int end)`
`static void`	`mapSentenceAnnotationsToText(TextAnnotation sentenceTa, TextAnnotation textTa, int sentenceId)` Given a `TextAnnotation` for an annotated sentence, maps its annotations into the `TextAnnotation` of a longer text containing that sentence.
`static TextAnnotation`	`mapTransformedTextAnnotationToSource(TextAnnotation ta, StringTransformation st)` Given a TextAnnotation generated from the transformed text of a StringTransformation object, and that StringTransformation object, generates a new TextAnnotation whose annotations correspond to those of the transformed-text TextAnnotation but whose offsets correspond to the original text.
`static void`	`printTextAnnotation(PrintStream out, TextAnnotation ta)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

constituentStartEndComparator
```
public static final Comparator<Constituent> constituentStartEndComparator
```
Sorts constituents by start location and, where starts are equal, by end location, so shorter constituents come first.
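
The ordering described above can be sketched without the library. In the following, `Span` is a hypothetical stand-in for `Constituent` that holds only a start and an end index, and `START_THEN_END` is an illustrative comparator, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class StartEndOrderSketch {
    /** Hypothetical stand-in for a Constituent: a token span [start, end). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** Order by start; break ties by end, so the shorter span comes first. */
    static final Comparator<Span> START_THEN_END =
            Comparator.<Span>comparingInt(s -> s.start).thenComparingInt(s -> s.end);

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<Span> sorted(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(START_THEN_END);
        return copy;
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
                new Span(3, 7), new Span(0, 5), new Span(0, 2), new Span(3, 4));
        System.out.println(sorted(spans));
    }
}
```

With real `Constituent` objects, the field itself would presumably be passed to a sort call in place of `START_THEN_END`.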

constituentStartComparator

public static final Comparator<Constituent> constituentStartComparator

sentenceStartComparator

public static final Comparator<Sentence> sentenceStartComparator

constituentEndComparator

public static final Comparator<Constituent> constituentEndComparator

constituentLengthComparator

public static final Comparator<Constituent> constituentLengthComparator

IntPairComparator

public static final Comparator<IntPair> IntPairComparator

Constructor Detail
- TextAnnotationUtilities
```
public TextAnnotationUtilities()
```

Method Detail
- createFromTokenizedString
```
public static TextAnnotation createFromTokenizedString(String text)
```
- getTokenSequence
```
public static String getTokenSequence(TextAnnotation ta,
                                      int start,
                                      int end)
```
- getSentenceList
```
public static List<String> getSentenceList(TextAnnotation ta)
```
- printTextAnnotation
```
public static void printTextAnnotation(PrintStream out,
                                       TextAnnotation ta)
```
- mapSentenceAnnotationsToText
```
public static void mapSentenceAnnotationsToText(TextAnnotation sentenceTa,
                                                TextAnnotation textTa,
                                                int sentenceId)
```
  Given a TextAnnotation for an annotated sentence, maps its annotations into the TextAnnotation of a longer text containing that sentence.
  
  Parameters:
  
  sentenceTa - annotated TextAnnotation for sentence
  
  textTa - TextAnnotation for longer text containing sentence, without annotations for that sentence
  
  sentenceId - index of the sentence in the longer text
- getSubTextAnnotation
```
public static TextAnnotation getSubTextAnnotation(TextAnnotation ta,
                                                  int sentenceId)
```
  Given a TextAnnotation object, and a sentence id, it gives a smaller TextAnnotation which contains the annotations specific to the given sentence id. The underlying text is just the sentence text, and character offsets are modified to correspond to this new shorter text.
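  
  The character-offset adjustment mentioned here amounts to subtracting the sentence's starting offset from each span. A minimal sketch of that arithmetic on plain strings follows; `rebase` is a hypothetical helper and does not call the library.

```java
class SentenceRebaseSketch {
    /**
     * Re-expresses a character span of the full text relative to a
     * sentence that starts at sentenceStart in that text.
     */
    static int[] rebase(int spanStart, int spanEnd, int sentenceStart) {
        return new int[] {spanStart - sentenceStart, spanEnd - sentenceStart};
    }

    public static void main(String[] args) {
        String text = "Rain fell. Ann left early.";
        // The second sentence occupies characters [11, 26) of the full text.
        String sentence = text.substring(11, 26);
        // "left" occupies [15, 19) in the full text.
        int[] local = rebase(15, 19, 11);
        // The re-based span selects the same characters in the sentence text.
        System.out.println(sentence.substring(local[0], local[1]));
    }
}
```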
- copyViewsFromTo
```
public static void copyViewsFromTo(TextAnnotation ta,
                                   TextAnnotation newTA,
                                   int sourceStartTokenIndex,
                                   int sourceEndTokenIndex,
                                   int offset)
```
  Copies views from the relevant span of ta to newTA. If ta is smaller than newTA, all constituents are mapped, with their offsets changed according to the value of offset. Otherwise, only those constituents within the span from sourceStartTokenIndex to sourceEndTokenIndex are mapped to newTA.
  
  Parameters:
  
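  ta - source TextAnnotation whose views are copied
  
  newTA - target TextAnnotation that receives the copied views
  
  sourceStartTokenIndex - start of the token span in ta from which constituents are mapped
  
  sourceEndTokenIndex - end of the token span in ta from which constituents are mapped
  
  offset - amount by which constituent offsets are changed when mapped to newTA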
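  
  The second case (keep only constituents inside a window, then shift them) can be sketched on bare index pairs. This is an illustration under stated assumptions, not the library's code: spans and the window are treated as half-open `[start, end)` token ranges, which the documentation above does not specify, and `copyWithin` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WindowCopySketch {
    /**
     * Keeps the spans that lie entirely inside [windowStart, windowEnd)
     * and returns copies shifted by offset. Half-open ranges are an
     * assumption made for this sketch.
     */
    static List<int[]> copyWithin(List<int[]> spans, int windowStart, int windowEnd, int offset) {
        List<int[]> out = new ArrayList<>();
        for (int[] s : spans) {
            if (s[0] >= windowStart && s[1] <= windowEnd) {
                out.add(new int[] {s[0] + offset, s[1] + offset});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> spans = Arrays.asList(
                new int[] {0, 2}, new int[] {5, 7}, new int[] {6, 9}, new int[] {8, 12});
        // Copy tokens 5..10 into a shorter text whose first token is old token 5.
        for (int[] s : copyWithin(spans, 5, 10, -5)) {
            System.out.println(Arrays.toString(s));
        }
    }
}
```

  A negative offset, as in this example, corresponds to copying from a longer text into a shorter one.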
- copyViewFromTo
```
public static void copyViewFromTo(String vuName,
                                  TextAnnotation ta,
                                  TextAnnotation newTA,
                                  int sourceStartTokenIndex,
                                  int sourceEndTokenIndex,
                                  int offset)
```
- copyRelation
```
public static Relation copyRelation(Relation r,
                                    Map<Constituent,Constituent> consMap)
```
  Required: consMap *must* contain the source and target constituents of r as keys, and their values must be non-null.
  
  Parameters:
  
  r - relation to copy
  
  consMap - map from original constituents to new counterparts
  
  Returns:
  
  new relation with all info copied from original, but with new source and target constituents
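  
  The precondition on consMap can be illustrated with stand-in types. In this sketch `Node` and `Edge` are hypothetical replacements for `Constituent` and `Relation`, and throwing `IllegalArgumentException` on a missing mapping is a choice made for the sketch; the documentation above does not say how the library reacts when the precondition is violated.

```java
import java.util.HashMap;
import java.util.Map;

class RelationRemapSketch {
    /** Hypothetical stand-in for a Constituent. */
    static final class Node {
        final String label;

        Node(String label) {
            this.label = label;
        }
    }

    /** Hypothetical stand-in for a Relation between two nodes. */
    static final class Edge {
        final String name;
        final Node source;
        final Node target;

        Edge(String name, Node source, Node target) {
            this.name = name;
            this.source = source;
            this.target = target;
        }
    }

    /** Copies an edge, replacing both endpoints with their mapped counterparts. */
    static Edge copyEdge(Edge e, Map<Node, Node> nodeMap) {
        Node newSource = nodeMap.get(e.source);
        Node newTarget = nodeMap.get(e.target);
        if (newSource == null || newTarget == null) {
            // Sketch-only behavior: fail fast when an endpoint has no counterpart.
            throw new IllegalArgumentException("map must cover both endpoints of " + e.name);
        }
        return new Edge(e.name, newSource, newTarget);
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Map<Node, Node> map = new HashMap<>();
        map.put(a, new Node("a-copy"));
        map.put(b, new Node("b-copy"));
        Edge copy = copyEdge(new Edge("rel", a, b), map);
        System.out.println(copy.source.label + " -> " + copy.target.label);
    }
}
```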
- copyAttributesFromTo
```
public static void copyAttributesFromTo(HasAttributes origObj,
                                        HasAttributes newObj)
```
- copyConstituentWithNewTokenOffsets
```
public static Constituent copyConstituentWithNewTokenOffsets(TextAnnotation newTA,
                                                             Constituent c,
                                                             int offset)
```
  Creates a new Constituent with token offsets shifted by the specified amount.
  
  Parameters:
  
  newTA - TextAnnotation which will contain the new Constituent
  
  c - original Constituent to copy
  
  offset - the amount by which to shift the token indexes of the new Constituent. Can be negative.
  
  Returns:
  
  the new Constituent
- mapTransformedTextAnnotationToSource
```
public static TextAnnotation mapTransformedTextAnnotationToSource(TextAnnotation ta,
                                                                  StringTransformation st)
```
  Given a TextAnnotation generated from the transformed text of a StringTransformation object, and that StringTransformation object, generates a new TextAnnotation whose annotations correspond to those of the transformed-text TextAnnotation but whose offsets correspond to the original text.
  
  Example: you parse an XML-formatted news document and use a StringTransformation to record all the places where you removed XML markup or made other changes. You process the cleaned text with a set of NLP tools. This method takes the output and maps the offsets back to the XML-formatted source document. This is useful, for example, in TAC evaluations, where provenance offsets are important.
  
  Parameters:
  
  ta - TextAnnotation generated from the transformed text of st
  
  st - the StringTransformation that produced the transformed text
  
  Returns:
  
  a new TextAnnotation with the annotations of ta and offsets that refer to the original text
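  
  The idea of mapping offsets in cleaned text back to the marked-up source can be sketched independently of the library. The class below is a hypothetical stand-in, not `StringTransformation`: it strips anything between `<` and `>` and records, for each character it keeps, where that character sat in the source.

```java
import java.util.Arrays;

class OffsetMapSketch {
    final String cleaned;
    /** toSource[i] is the source index of cleaned character i. */
    final int[] toSource;

    OffsetMapSketch(String source) {
        StringBuilder sb = new StringBuilder();
        int[] map = new int[source.length()];
        boolean inTag = false;
        for (int i = 0; i < source.length(); i++) {
            char ch = source.charAt(i);
            if (ch == '<') {
                inTag = true;                 // start of markup: drop it
            } else if (ch == '>' && inTag) {
                inTag = false;                // end of markup: drop it
            } else if (!inTag) {
                map[sb.length()] = i;         // remember where this character came from
                sb.append(ch);
            }
        }
        cleaned = sb.toString();
        toSource = Arrays.copyOf(map, sb.length());
    }

    /** Maps a non-empty cleaned span [start, end) to a span of the source. */
    int[] toSourceSpan(int start, int end) {
        // end is exclusive, so map the last covered character and add one.
        return new int[] {toSource[start], toSource[end - 1] + 1};
    }

    public static void main(String[] args) {
        String source = "<p>Ann <b>left</b> early.</p>";
        OffsetMapSketch m = new OffsetMapSketch(source);
        // "left" is at [4, 8) in the cleaned text.
        int[] span = m.toSourceSpan(4, 8);
        System.out.println(source.substring(span[0], span[1]));
    }
}
```

  A span that crosses removed markup maps to a source range that includes that markup, which is usually what provenance offsets require.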
