public class StatefulTokenizer extends Object implements Tokenizer
TokenizerStateMachine
is not.
Tokenizer.Tokenization
Constructor and Description |
---|
StatefulTokenizer() Creates a tokenizer with the default split-on-dash setting. |
StatefulTokenizer(boolean splitOnDash) Takes a boolean indicating whether to split on dashes. |
Modifier and Type | Method and Description |
---|---|
boolean | isSplitOnDash() |
void | setSplitOnDash(boolean splitOnDash) |
Pair<String[],IntPair[]> | tokenizeSentence(String sentence) Given a sentence, returns its tokens and their character offsets. |
Tokenizer.Tokenization | tokenizeTextSpan(String textSpan) Given a span of text, returns a list of Pair< String[], IntPair[] > corresponding to tokenized sentences, where the String[] is the ordered list of sentence tokens and the IntPair[] is the corresponding list of character offsets with respect to the original text. |
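The contract described for tokenizeSentence, parallel arrays of tokens and character offsets, can be illustrated with a small self-contained sketch. This is not the library's implementation: the class `OffsetSketch`, its whitespace rule, the use of [start, end) offsets, and the choice to emit "-" as its own token are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetSketch {
    /** Hypothetical holder: tokens[i] equals sentence.substring(starts[i], ends[i]). */
    public static final class Result {
        public final String[] tokens;
        public final int[] starts;
        public final int[] ends;

        Result(List<String> t, List<int[]> o) {
            tokens = t.toArray(new String[0]);
            starts = new int[o.size()];
            ends = new int[o.size()];
            for (int i = 0; i < o.size(); i++) {
                starts[i] = o.get(i)[0];
                ends[i] = o.get(i)[1];
            }
        }
    }

    /** Splits on whitespace and, if splitOnDash is true, also on '-'. */
    public static Result tokenize(String sentence, boolean splitOnDash) {
        List<String> tokens = new ArrayList<>();
        List<int[]> offsets = new ArrayList<>();
        int start = -1; // start index of the token being built, -1 if none
        for (int i = 0; i <= sentence.length(); i++) {
            boolean atEnd = i == sentence.length();
            char c = atEnd ? ' ' : sentence.charAt(i);
            boolean isDash = !atEnd && splitOnDash && c == '-';
            if (atEnd || Character.isWhitespace(c) || isDash) {
                if (start >= 0) { // close the current token
                    tokens.add(sentence.substring(start, i));
                    offsets.add(new int[] {start, i});
                    start = -1;
                }
                if (isDash) { // assumption: the dash becomes its own token
                    tokens.add("-");
                    offsets.add(new int[] {i, i + 1});
                }
            } else if (start < 0) {
                start = i; // first character of a new token
            }
        }
        return new Result(tokens, offsets);
    }

    public static void main(String[] args) {
        Result r = tokenize("well-known fact", true);
        for (int i = 0; i < r.tokens.length; i++) {
            System.out.println(r.tokens[i] + " [" + r.starts[i] + "," + r.ends[i] + ")");
        }
    }
}
```

Keeping offsets alongside the tokens lets a caller map any token back to its exact position in the input, which is the property the method description above promises.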
public StatefulTokenizer()
public StatefulTokenizer(boolean splitOnDash)
Parameters:
splitOnDash - if true, we will split words on a "-".
public Pair<String[],IntPair[]> tokenizeSentence(String sentence)
Specified by: tokenizeSentence in interface Tokenizer
Parameters:
sentence - The sentence string
Returns:
Pair containing the array of tokens and their character offsets
public Tokenizer.Tokenization tokenizeTextSpan(String textSpan)
Specified by: tokenizeTextSpan in interface Tokenizer
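The summary for tokenizeTextSpan states that offsets are reported with respect to the original text, not to each sentence. A minimal sketch of that idea follows. It is not the library's code: `SpanSketch`, its `Sentence` holder, the whitespace tokenization, and the naive rule that a token ending in ".", "!" or "?" closes a sentence are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanSketch {
    /** Hypothetical holder for one sentence: tokens plus [start, end) offsets into the full text. */
    public static final class Sentence {
        public final List<String> tokens = new ArrayList<>();
        public final List<int[]> offsets = new ArrayList<>();
    }

    /** Splits text into sentences of whitespace-delimited tokens, keeping text-level offsets. */
    public static List<Sentence> splitAndTokenize(String text) {
        List<Sentence> sentences = new ArrayList<>();
        Sentence current = new Sentence();
        int start = -1; // start index of the token being built, -1 if none
        for (int i = 0; i <= text.length(); i++) {
            boolean atEnd = i == text.length();
            char c = atEnd ? ' ' : text.charAt(i);
            if (Character.isWhitespace(c)) {
                if (start >= 0) {
                    String token = text.substring(start, i);
                    current.tokens.add(token);
                    current.offsets.add(new int[] {start, i}); // offsets index the full text
                    start = -1;
                    // naive assumption: terminal punctuation ends the sentence
                    char last = token.charAt(token.length() - 1);
                    if (last == '.' || last == '!' || last == '?') {
                        sentences.add(current);
                        current = new Sentence();
                    }
                }
            } else if (start < 0) {
                start = i;
            }
        }
        if (!current.tokens.isEmpty()) {
            sentences.add(current); // trailing sentence without terminal punctuation
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (Sentence s : splitAndTokenize("Hi there. Bye now.")) {
            for (int i = 0; i < s.tokens.size(); i++) {
                int[] o = s.offsets.get(i);
                System.out.println(s.tokens.get(i) + " [" + o[0] + "," + o[1] + ")");
            }
            System.out.println("--");
        }
    }
}
```

Because the second sentence's offsets continue from where the first ended, `text.substring(start, end)` recovers any token directly from the original span.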
public boolean isSplitOnDash()
public void setSplitOnDash(boolean splitOnDash)
Parameters:
splitOnDash - whether to split words on a "-"
Copyright © 2017. All rights reserved.