ChineseTokenizer (illinois-cogcomp-nlp 3.1.29 API)

java.lang.Object
- edu.illinois.cs.cogcomp.tokenizer.ChineseTokenizer

All Implemented Interfaces: Tokenizer

public class ChineseTokenizer
extends Object
implements Tokenizer

Created by ctsai12 on 12/7/15.

Nested Class Summary
- Nested classes/interfaces inherited from interface edu.illinois.cs.cogcomp.nlp.tokenizer.Tokenizer
  Tokenizer.Tokenization

Constructor Summary

Constructors
Constructor and Description

ChineseTokenizer(String basedir)

Method Summary

Modifier and Type	Method and Description
`static boolean`	`containsHanScript(String s)`
`TextAnnotation`	`getTextAnnotation1(String text)`
`static void`	`main(String[] args)`
`TextAnnotation`	`oldGetTextAnnotation(String text)`
`Pair<String[],IntPair[]>`	`tokenizeSentence(String text)` Given a sentence, returns its tokens and their character offsets.
`Tokenizer.Tokenization`	`tokenizeTextSpan(String text)` Given a span of text, returns a list of Pair<String[], IntPair[]> corresponding to the tokenized sentences, where the String[] is the ordered list of sentence tokens and the IntPair[] is the corresponding list of character offsets with respect to the original text.
`String`	`trad2simp(String text)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - ChineseTokenizer
```
public ChineseTokenizer(String basedir)
```
- Method Detail
  - containsHanScript
```
public static boolean containsHanScript(String s)
```
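    This page gives no description for this method. As a rough illustration of what a Han-script check can look like, the sketch below uses only the JDK's `Character.UnicodeScript`. The class and method names (`HanScriptSketch`, `containsHan`) are hypothetical, and whether `ChineseTokenizer` is implemented this way is an assumption.

```java
// Illustrative stand-in only; not the library's implementation.
final class HanScriptSketch {

    /** Returns true if any code point in s belongs to the Unicode Han script. */
    static boolean containsHan(String s) {
        return s.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        System.out.println(containsHan("Hello 世界"));   // prints "true"
        System.out.println(containsHan("Hello world")); // prints "false"
    }
}
```

    Iterating over code points rather than `char` values keeps the check correct for Han characters outside the Basic Multilingual Plane.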
  - main
```
public static void main(String[] args)
```
  - trad2simp
```
public String trad2simp(String text)
```
  - oldGetTextAnnotation
```
public TextAnnotation oldGetTextAnnotation(String text)
```
  - getTextAnnotation1
```
public TextAnnotation getTextAnnotation1(String text)
```
  - tokenizeSentence
```
public Pair<String[],IntPair[]> tokenizeSentence(String text)
```
    Given a sentence, returns its tokens and their character offsets.
    
    Specified by:
    
    tokenizeSentence in interface Tokenizer
    
    Parameters:
    
    text - The sentence string
    
    Returns:
    
    A Pair containing the array of tokens and their character offsets
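
    To make the relationship between tokens and character offsets concrete, here is a minimal sketch that does not depend on the library. It uses plain `int[]` pairs in place of `IntPair`, a hypothetical helper named `alignOffsets`, and assumes that end offsets are exclusive; none of this is taken from the `ChineseTokenizer` source.

```java
// Illustrative stand-in only; does not use or reproduce the cogcomp classes.
final class TokenOffsetSketch {

    /**
     * Locates each token in the sentence, scanning left to right, and returns
     * one {start, end} pair per token (end exclusive, by assumption).
     */
    static int[][] alignOffsets(String sentence, String[] tokens) {
        int[][] offsets = new int[tokens.length][];
        int cursor = 0;
        for (int i = 0; i < tokens.length; i++) {
            int start = sentence.indexOf(tokens[i], cursor);
            if (start < 0) {
                throw new IllegalArgumentException("Token not found in order: " + tokens[i]);
            }
            int end = start + tokens[i].length();
            offsets[i] = new int[] {start, end};
            cursor = end; // the next token must start at or after this position
        }
        return offsets;
    }

    public static void main(String[] args) {
        String sentence = "我爱北京";
        String[] tokens = {"我", "爱", "北京"};
        int[][] offsets = alignOffsets(sentence, tokens);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " [" + offsets[i][0] + ", " + offsets[i][1] + ")");
        }
    }
}
```

    Under these assumptions, `sentence.substring(start, end)` recovers each token, which is the property a caller would typically rely on when mapping tokens back to the input.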
  - tokenizeTextSpan
```
public Tokenizer.Tokenization tokenizeTextSpan(String text)
```
    Given a span of text, returns a list of Pair<String[], IntPair[]> corresponding to the tokenized sentences, where the String[] is the ordered list of sentence tokens and the IntPair[] is the corresponding list of character offsets with respect to the original text.
    
    Specified by:
    
    tokenizeTextSpan in interface Tokenizer
    
    Parameters:
    
    text - The span of text to tokenize
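
    The phrase "with respect to the original text" means that offsets for later sentences are not restarted at zero. The sketch below illustrates that idea in isolation: it shifts sentence-local offsets by the sentence's start position in the full text. The helper `toTextOffsets`, the `int[]` pairs standing in for `IntPair`, and the exclusive end offsets are all assumptions made for illustration, not the library's code.

```java
// Illustrative stand-in only; does not use or reproduce the cogcomp classes.
final class SpanOffsetSketch {

    /** Converts sentence-local {start, end} pairs into offsets in the full text. */
    static int[][] toTextOffsets(int[][] sentenceOffsets, int sentenceStart) {
        int[][] shifted = new int[sentenceOffsets.length][];
        for (int i = 0; i < sentenceOffsets.length; i++) {
            shifted[i] = new int[] {
                sentenceOffsets[i][0] + sentenceStart,
                sentenceOffsets[i][1] + sentenceStart
            };
        }
        return shifted;
    }

    public static void main(String[] args) {
        String text = "我爱北京。天气很好。";
        int secondSentenceStart = text.indexOf("天气");
        // Offsets of 天气 / 很 / 好 / 。 measured from the start of the second sentence.
        int[][] local = {{0, 2}, {2, 3}, {3, 4}, {4, 5}};
        for (int[] span : toTextOffsets(local, secondSentenceStart)) {
            System.out.println(text.substring(span[0], span[1]) + " [" + span[0] + ", " + span[1] + ")");
        }
    }
}
```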

Copyright © 2017. All rights reserved.