public class TreebankChunkReader extends PennTreebankReader
perl ./chunklink.pl -s -ns ./combined/wsj/$dir/$file > ./chunkData/wsj/$dir/$file
We assume that this command creates the following directory structure under treebank-2:
    \treebank-2
      - .. other standard stuff
      - \chunkData
        - \wsj
          - \00
            - wsj_0001.mrg
            - wsj_0002.mrg
            - ...
          - \01
            - wsj_0101.mrg
            - ...
          - and the other sections
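As an illustration of the layout above, the following minimal sketch builds the path to one chunk file from the treebank home directory, a section id, and a file name. The class `ChunkPathSketch` and its method `chunkFile` are hypothetical names introduced here; they are not part of `TreebankChunkReader`, and the reader may resolve paths differently.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ChunkPathSketch {

    // Hypothetical helper: resolves <treebankHome>/chunkData/wsj/<section>/<fileName>,
    // following the directory structure assumed in this documentation.
    static Path chunkFile(String treebankHome, String section, String fileName) {
        return Paths.get(treebankHome, "chunkData", "wsj", section, fileName);
    }

    public static void main(String[] args) {
        // Prints the platform-specific path to wsj_0001.mrg in section 00.
        System.out.println(chunkFile("treebank-2", "00", "wsj_0001.mrg"));
    }
}
```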
    #arguments: IOB tag: Begin, word numbering: sent
    #columns: word_id iob_inner pos word function heads head_ids iob_chain trace-function trace-type trace-head_ids
    # Sentence 0001/01
    0 B-NP NNP Pierre NOFUNC Vinken 1 B-S/B-NP/B-NP
    1 I-NP NNP Vinken NP-SBJ join 8 I-S/I-NP/I-NP
    2 O COMMA COMMA NOFUNC Vinken 1 I-S/I-NP
    3 B-NP CD 61 NOFUNC years 4 I-S/I-NP/B-ADJP/B-NP
    . . .
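To make the column layout concrete, here is a minimal sketch that splits one data line of such a file into its first eight columns (`word_id` through `iob_chain`). The class `ChunkLineSketch` is a hypothetical illustration, not the parsing code used by `TreebankChunkReader`. It assumes that columns are separated by whitespace, that lines starting with `#` are comments, and that the trailing trace columns may be absent, as in the sample rows.

```java
import java.util.Optional;

final class ChunkLineSketch {
    final int wordId;
    final String iobInner;
    final String pos;
    final String word;
    final String function;
    final String heads;
    final String headIds;
    final String iobChain;

    private ChunkLineSketch(String[] f) {
        this.wordId = Integer.parseInt(f[0]);
        this.iobInner = f[1];
        this.pos = f[2];
        this.word = f[3];
        this.function = f[4];
        this.heads = f[5];
        this.headIds = f[6];
        this.iobChain = f[7];
    }

    // Returns an empty Optional for blank lines and '#' comment lines.
    // Assumption: columns are whitespace-separated and at least eight are present.
    static Optional<ChunkLineSketch> parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return Optional.empty();
        }
        String[] fields = trimmed.split("\\s+");
        if (fields.length < 8) {
            throw new IllegalArgumentException("Expected at least 8 columns: " + line);
        }
        return Optional.of(new ChunkLineSketch(fields));
    }

    public static void main(String[] args) {
        ChunkLineSketch token =
                parse("1 I-NP NNP Vinken NP-SBJ join 8 I-S/I-NP/I-NP").get();
        System.out.println(token.word + " " + token.pos + " " + token.iobInner);
    }
}
```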
This class takes the standard Penn Treebank annotation and adds the chunk annotation to it.
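One common way to turn a column of IOB tags into chunk annotation is to group them into labeled token spans. The sketch below shows that grouping for the `iob_inner` tags of the sample sentence. It is an assumption about how such tags are typically consumed, not a description of what `TreebankChunkReader` does internally; the names `IobSpanSketch`, `Span`, and `toSpans` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IobSpanSketch {

    // A labeled chunk covering tokens [start, end), with end exclusive.
    static final class Span {
        final String label;
        final int start;
        final int end;

        Span(String label, int start, int end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return label + "[" + start + "," + end + ")";
        }
    }

    // Groups B-X / I-X / O tags into spans. An I-X tag that does not continue
    // an open X chunk is treated leniently as the start of a new chunk.
    static List<Span> toSpans(List<String> tags) {
        List<Span> spans = new ArrayList<>();
        String label = null;
        int start = -1;
        for (int i = 0; i < tags.size(); i++) {
            String tag = tags.get(i);
            boolean begins = tag.startsWith("B-");
            boolean inside = tag.startsWith("I-");
            boolean continues = inside && label != null && tag.substring(2).equals(label);
            if (label != null && !continues) {
                spans.add(new Span(label, start, i)); // close the open chunk
                label = null;
            }
            if (begins || (inside && !continues)) {
                label = tag.substring(2); // open a new chunk at this token
                start = i;
            }
        }
        if (label != null) {
            spans.add(new Span(label, start, tags.size()));
        }
        return spans;
    }

    public static void main(String[] args) {
        // iob_inner tags of the first four tokens in the sample file.
        System.out.println(toSpans(Arrays.asList("B-NP", "I-NP", "O", "B-NP")));
    }
}
```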
Each file in this directory structure contains the information shown in the sample above.
Modifier and Type | Field and Description |
---|---|
protected String | chunkHome |
protected List<String> | chunkLines |
combinedWSJHome, currentSectionId, PENN_TREEBANK_WSJ, sections
corpusName, currentAnnotationId, resourceManager
Constructor and Description |
---|
TreebankChunkReader(String treebankHome) |
TreebankChunkReader(String treebankHome, String[] sections) |
Modifier and Type | Method and Description |
---|---|
TextAnnotation | next() Returns the next annotation object. |
generateReport, hasNext, initializeReader, remove
iterator, reset
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
forEach, spliterator
forEachRemaining
public TreebankChunkReader(String treebankHome) throws Exception
Throws: Exception
public TextAnnotation next()
Description copied from class: PennTreebankReader
Specified by: next in interface Iterator<TextAnnotation>
Overrides: next in class PennTreebankReader
Copyright © 2017. All rights reserved.