public class TreebankChunkReader extends PennTreebankReader
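The chunk data is generated with chunklink.pl, run once for each section directory (`$dir`) and parse file (`$file`) of the combined WSJ treebank:

`perl ./chunklink.pl -s -ns ./combined/wsj/$dir/$file > ./chunkData/wsj/$dir/$file`

We assume that this creates the following directory structure in treebank-2: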
- `\treebank-2`
  - ... other standard content
  - `\chunkData`
    - `\wsj`
      - `\00`
        - `wsj_0001.mrg`
        - `wsj_0002.mrg`
        - ...
      - `\01`
        - `wsj_0101.mrg`
        - ...
      - ... and the other sections
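The mapping from a combined WSJ parse file to its chunk file can be sketched in Java as follows. This is a minimal sketch, not part of this library: `ChunkPaths`, `chunkFile`, `chunklinkCommand` and `runChunklink` are hypothetical names, and it assumes (as the `./` paths in the command suggest) that chunklink.pl sits in the treebank-2 directory, that the command is run from there, and that `perl` is on the PATH.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helpers mirroring the chunklink.pl command and the chunkData layout. */
class ChunkPaths {

    /** Location of a chunk file: <treebankHome>/chunkData/wsj/<section>/<file>. */
    static Path chunkFile(Path treebankHome, String section, String file) {
        return treebankHome.resolve("chunkData").resolve("wsj").resolve(section).resolve(file);
    }

    /**
     * Argument list equivalent to: perl ./chunklink.pl -s -ns ./combined/wsj/$dir/$file
     * (paths are relative to the treebank-2 directory).
     */
    static List<String> chunklinkCommand(String section, String file) {
        return Arrays.asList("perl", "./chunklink.pl", "-s", "-ns",
                "./combined/wsj/" + section + "/" + file);
    }

    /**
     * Runs chunklink.pl from treebankHome and writes its stdout to the matching
     * chunkData file, like the shell redirect does. Unlike "> ...", this also
     * creates any missing chunkData/wsj/<section> directories first.
     */
    static int runChunklink(Path treebankHome, String section, String file)
            throws IOException, InterruptedException {
        Path output = chunkFile(treebankHome, section, file);
        Files.createDirectories(output.getParent());
        Process process = new ProcessBuilder(chunklinkCommand(section, file))
                .directory(treebankHome.toFile())
                .redirectOutput(output.toFile())
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        return process.waitFor(); // exit code of chunklink.pl
    }

    public static void main(String[] args) {
        Path home = Paths.get("treebank-2");
        System.out.println(chunkFile(home, "00", "wsj_0001.mrg"));
        System.out.println(String.join(" ", chunklinkCommand("00", "wsj_0001.mrg")));
    }
}
```

Keeping the command as an argument list (rather than one shell string) avoids quoting issues; the output redirect is done by `ProcessBuilder` instead of the shell.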
Each file in this directory structure contains information of the following form:

```
#arguments: IOB tag: Begin, word numbering: sent
#columns: word_id iob_inner pos word function heads head_ids iob_chain trace-function trace-type trace-head_ids
# Sentence 0001/01
0  B-NP  NNP    Pierre  NOFUNC  Vinken  1  B-S/B-NP/B-NP
1  I-NP  NNP    Vinken  NP-SBJ  join    8  I-S/I-NP/I-NP
2  O     COMMA  COMMA   NOFUNC  Vinken  1  I-S/I-NP
3  B-NP  CD     61      NOFUNC  years   4  I-S/I-NP/B-ADJP/B-NP
...
```

This class takes the standard Penn Treebank annotation and adds this chunk annotation to it.
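Because the chunk files are whitespace-separated columns, recovering chunks is mostly a matter of reading the `iob_inner` column. The following is a minimal sketch, assuming one row per token with at least the eight columns shown in the sample (the three trace columns may be absent). `ChunklinkSketch`, `Row`, `Chunk`, `parseSentences` and `chunks` are hypothetical names; this is not necessarily how `TreebankChunkReader` builds its annotation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: reads chunklink rows and turns iob_inner tags into chunk spans. */
class ChunklinkSketch {

    /** One token row; the optional trace columns are ignored. */
    static final class Row {
        final int wordId;
        final String iobInner, pos, word, function, heads, headIds, iobChain;

        Row(String line) {
            String[] c = line.trim().split("\\s+");
            if (c.length < 8) {
                throw new IllegalArgumentException("expected at least 8 columns: " + line);
            }
            wordId = Integer.parseInt(c[0]);
            iobInner = c[1]; pos = c[2]; word = c[3]; function = c[4];
            heads = c[5]; headIds = c[6]; iobChain = c[7];
        }
    }

    /** A chunk of the given type covering token positions [start, end). */
    static final class Chunk {
        final String label;
        final int start, end;

        Chunk(String label, int start, int end) {
            this.label = label; this.start = start; this.end = end;
        }

        @Override
        public String toString() {
            return label + "[" + start + "," + end + ")";
        }
    }

    /** Starts a new sentence at each "# Sentence" line; other '#' lines are headers. */
    static List<List<Row>> parseSentences(List<String> lines) {
        List<List<Row>> sentences = new ArrayList<>();
        List<Row> current = null;
        for (String line : lines) {
            String t = line.trim();
            if (t.startsWith("# Sentence")) {
                current = new ArrayList<>();
                sentences.add(current);
            } else if (!t.isEmpty() && !t.startsWith("#") && current != null) {
                current.add(new Row(t));
            }
        }
        return sentences;
    }

    /** B-X opens a chunk of type X, I-X continues it, O is outside any chunk. */
    static List<Chunk> chunks(List<Row> sentence) {
        List<Chunk> result = new ArrayList<>();
        String label = null;
        int start = -1;
        for (int i = 0; i < sentence.size(); i++) {
            String tag = sentence.get(i).iobInner;
            boolean continues = label != null && tag.equals("I-" + label);
            if (!continues) {
                if (label != null) {
                    result.add(new Chunk(label, start, i)); // close the open chunk
                }
                // An I-X with no matching open chunk is treated like B-X.
                label = tag.equals("O") ? null : tag.substring(2);
                start = i;
            }
        }
        if (label != null) {
            result.add(new Chunk(label, start, sentence.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "#arguments: IOB tag: Begin, word numbering: sent",
                "# Sentence 0001/01",
                "0 B-NP NNP Pierre NOFUNC Vinken 1 B-S/B-NP/B-NP",
                "1 I-NP NNP Vinken NP-SBJ join 8 I-S/I-NP/I-NP",
                "2 O COMMA COMMA NOFUNC Vinken 1 I-S/I-NP",
                "3 B-NP CD 61 NOFUNC years 4 I-S/I-NP/B-ADJP/B-NP");
        for (List<Row> sentence : parseSentences(sample)) {
            System.out.println(chunks(sentence)); // [NP[0,2), NP[3,4)]
        }
    }
}
```

Spans are token positions within a sentence, matching the header's `word numbering: sent`, under which `word_id` restarts at 0 for every sentence.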
| Modifier and Type | Field and Description |
|---|---|
| `protected String` | `chunkHome` |
| `protected List<String>` | `chunkLines` |
Inherited fields: `combinedWSJHome`, `currentSectionId`, `PENN_TREEBANK_WSJ`, `sections`, `corpusName`, `currentAnnotationId`, `resourceManager`

| Constructor and Description |
|---|
| `TreebankChunkReader(String treebankHome)` |
| `TreebankChunkReader(String treebankHome, String[] sections)` |
| Modifier and Type | Method and Description |
|---|---|
| `TextAnnotation` | `next()`: Returns the next annotation object. |
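Methods inherited from superclasses: `generateReport`, `hasNext`, `initializeReader`, `remove`, `iterator`, `reset`

Methods inherited from class `java.lang.Object`: `clone`, `equals`, `finalize`, `getClass`, `hashCode`, `notify`, `notifyAll`, `toString`, `wait` (three overloads)

Methods inherited from interface `java.lang.Iterable`: `forEach`, `spliterator`

Methods inherited from interface `java.util.Iterator`: `forEachRemaining`

`public TreebankChunkReader(String treebankHome) throws Exception`

Throws: `Exception`

`public TextAnnotation next()`

Description copied from class: `PennTreebankReader`

Specified by: `next` in interface `Iterator<TextAnnotation>`

Overrides: `next` in class `PennTreebankReader`

Copyright © 2017. All rights reserved.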