public class TreebankChunkReader extends PennTreebankReader
This class takes the standard Penn Treebank annotation and adds chunk (shallow parse) annotation to it. The chunk data is produced by running `chunlink.pl` over each file of the combined WSJ data:

`perl ./chunlink.pl -s -ns ./combined/wsj/$dir/$file > ./chunkData/wsj/$dir/$file`

We assume that this creates the following directory structure inside `treebank-2`:

- `\treebank-2`
  - .. other standard stuff
  - `\chunkData`
    - `\wsj`
      - `\00`
        - `wsj_0001.mrg`
        - `wsj_0002.mrg`
        - ...
      - `\01`
        - `wsj_0101.mrg`
        - ...
      - and the other sections

Each file under `chunkData` contains the following information:

    #arguments: IOB tag: Begin, word numbering: sent
    #columns: word_id iob_inner pos word function heads head_ids iob_chain trace-function trace-type trace-head_ids
    # Sentence 0001/01
     0 B-NP NNP   Pierre NOFUNC Vinken 1 B-S/B-NP/B-NP
     1 I-NP NNP   Vinken NP-SBJ join   8 I-S/I-NP/I-NP
     2 O    COMMA COMMA  NOFUNC Vinken 1 I-S/I-NP
     3 B-NP CD    61     NOFUNC years  4 I-S/I-NP/B-ADJP/B-NP
    . . .
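The `iob_inner` column marks whether a token begins a chunk (`B-`), continues it (`I-`), or lies outside any chunk (`O`). As a minimal sketch, assuming whitespace-separated columns in the `#columns` order and treating lines that start with `#` as comments, rows like the ones above could be parsed and grouped into chunk spans as follows. `ChunkRowParser`, `Token`, and `Chunk` are hypothetical names for illustration only; this is not the documented internal logic of `TreebankChunkReader`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of TreebankChunkReader): parses chunklink-style
// rows and groups IOB tags into chunks. Assumes whitespace-separated columns in
// the "#columns" order: word_id iob_inner pos word ... (later columns ignored).
class ChunkRowParser {

    /** One token row: word_id, iob_inner, pos, word. */
    static final class Token {
        final int id;
        final String iob;
        final String pos;
        final String word;

        Token(int id, String iob, String pos, String word) {
            this.id = id;
            this.iob = iob;
            this.pos = pos;
            this.word = word;
        }
    }

    /** A chunk covering token positions [start, end) with a label such as "NP". */
    static final class Chunk {
        final String label;
        final int start;
        final int end;

        Chunk(String label, int start, int end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return label + "[" + start + "," + end + ")";
        }
    }

    /** Parses token rows, skipping blank lines, "#" comment lines and the ". . ." elision. */
    static List<Token> parse(List<String> lines) {
        List<Token> tokens = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length < 4 || !cols[0].matches("\\d+")) {
                continue; // not a token row
            }
            tokens.add(new Token(Integer.parseInt(cols[0]), cols[1], cols[2], cols[3]));
        }
        return tokens;
    }

    /** Groups B-X / I-X / O tags into chunk spans over token positions. */
    static List<Chunk> chunks(List<Token> tokens) {
        List<Chunk> result = new ArrayList<>();
        String label = null; // label of the currently open chunk, or null
        int start = -1;
        for (int i = 0; i < tokens.size(); i++) {
            String tag = tokens.get(i).iob;
            boolean begins = tag.startsWith("B-");
            boolean inside = tag.startsWith("I-");
            String type = (begins || inside) ? tag.substring(2) : null;
            // Close the open chunk unless this token continues it (I- with the same label).
            if (label != null && !(inside && type.equals(label))) {
                result.add(new Chunk(label, start, i));
                label = null;
            }
            // Open a new chunk on B-X, or on an I-X that does not continue an open chunk.
            if (label == null && type != null) {
                label = type;
                start = i;
            }
        }
        if (label != null) {
            result.add(new Chunk(label, start, tokens.size()));
        }
        return result;
    }
}
```

On the four sample rows, `chunks(parse(lines))` returns `[NP[0,2), NP[3,4)]`: one NP over *Pierre Vinken* and one starting at *61*. The sketch is deliberately lenient: an `I-X` tag with no open `X` chunk starts a new chunk instead of being rejected.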
Modifier and Type | Field and Description |
---|---|
`protected String` | `chunkHome` |
`protected List<String>` | `chunkLines` |
Inherited fields (declared in `PennTreebankReader` or its superclasses): `combinedWSJHome`, `currentSectionId`, `PENN_TREEBANK_WSJ`, `sections`, `corpusName`, `currentAnnotationId`
Constructor and Description |
---|
`TreebankChunkReader(String treebankHome)` |
`TreebankChunkReader(String treebankHome, String[] sections)` |
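The constructors take the Treebank root and, optionally, a list of sections. This page does not document how those arguments are resolved to files, so the following is only a sketch under stated assumptions: `treebankHome` is taken to be the `treebank-2` directory, chunk files are assumed to live at `<treebankHome>/chunkData/wsj/<section>/*.mrg` as in the layout described earlier, and sections are assumed to be directory names such as `"00"`. `ChunkFileLocator`, `sectionDir`, and `chunkFiles` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of TreebankChunkReader. Assumes the layout
// <treebankHome>/chunkData/wsj/<section>/wsj_NNNN.mrg.
class ChunkFileLocator {

    /** Directory assumed to hold the chunk files of one WSJ section, e.g. "00". */
    static Path sectionDir(String treebankHome, String section) {
        return Paths.get(treebankHome, "chunkData", "wsj", section);
    }

    /** Lists the .mrg files of the given sections, in section order, sorted by path within each section. */
    static List<Path> chunkFiles(String treebankHome, String[] sections) throws IOException {
        List<Path> files = new ArrayList<>();
        for (String section : sections) {
            Path dir = sectionDir(treebankHome, section);
            if (!Files.isDirectory(dir)) {
                continue; // this sketch silently skips missing sections
            }
            try (Stream<Path> entries = Files.list(dir)) {
                entries.filter(p -> p.getFileName().toString().endsWith(".mrg"))
                       .sorted()
                       .forEach(files::add);
            }
        }
        return files;
    }
}
```

Sorting within each section keeps `wsj_0001.mrg` ahead of `wsj_0002.mrg` regardless of the order in which the file system lists entries; skipping missing sections rather than throwing is a design choice of this sketch, not a documented behavior of the reader.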
Modifier and Type | Method and Description |
---|---|
`TextAnnotation` | `next()` |
Inherited methods (declared in `PennTreebankReader` or its superclasses): `hasNext`, `initializeReader`, `makeTextAnnotation`, `remove`, `iterator`, `reset`
public TreebankChunkReader(String treebankHome) throws Exception
Parameters: treebankHome
Throws: Exception
public TextAnnotation next()
Specified by: next in interface Iterator<TextAnnotation>
Specified by: next in class TextAnnotationReader
Copyright © 2015. All rights reserved.