Tokenizer.Tokenization
Constructor and Description |
---|
ChineseTokenizer(String basedir) |
Modifier and Type | Method and Description |
---|---|
static boolean | containsHanScript(String s) |
TextAnnotation | getTextAnnotation1(String text) |
static void | main(String[] args) |
TextAnnotation | oldGetTextAnnotation(String text) |
Pair<String[],IntPair[]> | tokenizeSentence(String text) Given a sentence, returns an array of tokens and their character offsets. |
Tokenizer.Tokenization | tokenizeTextSpan(String text) Given a span of text, returns a list of Pair<String[], IntPair[]> corresponding to tokenized sentences, where the String[] is the ordered list of sentence tokens and the IntPair[] is the corresponding list of character offsets with respect to the original text. |
String | trad2simp(String text) |
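
The summary above describes tokens paired with character offsets into the original text. The following is a minimal sketch of that contract only, not the segmentation this class performs: it splits on whitespace, and `TokenizedSentence`, `OffsetSketch`, and `tokenizeByWhitespace` are hypothetical names standing in for `Pair<String[],IntPair[]>` and the real tokenizer. It also assumes end offsets are exclusive, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Pair<String[], IntPair[]>: parallel arrays of
// tokens and their {start, end} character offsets (end assumed exclusive).
final class TokenizedSentence {
    final String[] tokens;
    final int[][] offsets;

    TokenizedSentence(String[] tokens, int[][] offsets) {
        this.tokens = tokens;
        this.offsets = offsets;
    }
}

final class OffsetSketch {
    // Splits on whitespace and records where each token sits in the input.
    // A real Chinese tokenizer would segment unspaced text instead.
    static TokenizedSentence tokenizeByWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        List<int[]> offsets = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (Character.isWhitespace(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            tokens.add(text.substring(start, i));
            offsets.add(new int[] {start, i});
        }
        return new TokenizedSentence(
                tokens.toArray(new String[0]), offsets.toArray(new int[0][]));
    }

    public static void main(String[] args) {
        String text = "我 喜欢  NLP";
        TokenizedSentence s = tokenizeByWhitespace(text);
        for (int k = 0; k < s.tokens.length; k++) {
            int[] o = s.offsets[k];
            // Each offset pair recovers the token from the original text.
            System.out.println(s.tokens[k] + " -> " + text.substring(o[0], o[1]));
        }
    }
}
```

The property worth checking in any implementation is that every offset pair, applied to the original text, yields the corresponding token, even when extra whitespace separates tokens.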
public ChineseTokenizer(String basedir)
public static boolean containsHanScript(String s)
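
This page gives no description for `containsHanScript`. One plausible way to implement such a check with the standard library is sketched below; it is an assumption about the behavior the name suggests, not the confirmed implementation. `HanScriptSketch` and `containsHan` are hypothetical names.

```java
final class HanScriptSketch {
    // Returns true if any code point in s belongs to the Unicode Han script.
    // Iterating code points (not chars) handles supplementary-plane ideographs.
    static boolean containsHan(String s) {
        return s.codePoints().anyMatch(
                cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        System.out.println(containsHan("tokenizer 分词")); // prints "true"
        System.out.println(containsHan("tokenizer"));      // prints "false"
    }
}
```

Under this sketch, text in other scripts such as Katakana does not count as Han, and an empty string yields false.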
public static void main(String[] args)
public TextAnnotation oldGetTextAnnotation(String text)
public TextAnnotation getTextAnnotation1(String text)
public Pair<String[],IntPair[]> tokenizeSentence(String text)
Specified by: tokenizeSentence in interface Tokenizer
Parameters: text - The sentence string
Returns: Pair containing the array of tokens and their character offsets
public Tokenizer.Tokenization tokenizeTextSpan(String text)
Specified by: tokenizeTextSpan in interface Tokenizer
Parameters: text
Copyright © 2017. All rights reserved.