public abstract class AbstractIncrementalXmlCorpusReader extends Object
Constructor and Description |
---|
AbstractIncrementalXmlCorpusReader(ResourceManager rm)
The ResourceManager must specify the fields CorpusReaderConfigurator.CORPUS_NAME and
CorpusReaderConfigurator.CORPUS_DIRECTORY, plus whatever the derived class requires for
initializeReader(). |
Modifier and Type | Method and Description |
---|---|
String |
generateReport()
Generates a human-readable report of the annotations read from the source file (plus any
other relevant statistics the user should know about).
|
abstract List<List<Path>> |
getFileListing()
Generates the list of files comprising the corpus.
|
String |
getSourceDirectory() |
abstract List<XmlTextAnnotation> |
getXmlTextAnnotationsFromFile(List<Path> corpusFileListEntry)
Given an entry from the corpus file list generated by getFileListing(), parses its
contents and returns zero or more TextAnnotation objects. |
boolean |
hasNext() |
protected void |
initializeReader()
This method is called by the base class constructor, so all subclass-specific object
initialization must be done here.
|
XmlTextAnnotation |
next()
Returns the next element in the iteration.
|
void |
reset() |
public AbstractIncrementalXmlCorpusReader(ResourceManager rm) throws Exception
The ResourceManager must specify the fields CorpusReaderConfigurator.CORPUS_NAME and
CorpusReaderConfigurator.CORPUS_DIRECTORY, plus whatever the derived class requires for
initializeReader().
Parameters:
rm - ResourceManager
Throws:
Exception
protected void initializeReader()
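Because the base class constructor calls initializeReader(), Java's initialization order matters for subclasses: field initializers in a subclass run only after the superclass constructor returns, so they can overwrite values assigned inside the hook. The following is a minimal sketch of that behaviour using stand-in classes. SketchBaseReader and SketchReader are hypothetical names invented for illustration and are not part of the library.

```java
// Stand-in types for illustration only; these are NOT the library classes.
abstract class SketchBaseReader {
    protected final String corpusName;

    SketchBaseReader(String corpusName) {
        this.corpusName = corpusName;
        // Mirrors the documented behaviour: the base constructor invokes the hook.
        initializeReader();
    }

    protected abstract void initializeReader();
}

class SketchReader extends SketchBaseReader {
    // Safe: no initializer, so the value assigned in initializeReader() survives.
    private String suffix;

    // Unsafe: this initializer runs AFTER super(...) returns and
    // replaces whatever initializeReader() stored in the field.
    private String clobbered = "initializer";

    SketchReader(String corpusName) {
        super(corpusName);
    }

    @Override
    protected void initializeReader() {
        suffix = ".xml";
        clobbered = "set in hook";
    }

    String getSuffix() { return suffix; }

    String getClobbered() { return clobbered; }
}
```

With these stand-ins, `new SketchReader("demo").getSuffix()` yields the value set in the hook, while `getClobbered()` yields the field initializer's value. A subclass of the real reader would presumably want to declare the fields it sets in initializeReader() without initializers for the same reason.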
public void reset()
public String getSourceDirectory()
public boolean hasNext()
public XmlTextAnnotation next()
Throws:
NoSuchElementException - if the iteration has no more elements
public abstract List<List<Path>> getFileListing() throws IOException
Throws:
IOException
public abstract List<XmlTextAnnotation> getXmlTextAnnotationsFromFile(List<Path> corpusFileListEntry) throws Exception
Given an entry from the corpus file list generated by getFileListing(), parses its
contents and returns zero or more TextAnnotation objects.
Parameters:
corpusFileListEntry - corpus file containing content to be processed
Throws:
Exception
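The interplay between getFileListing(), the per-entry parse method, and hasNext()/next() can be sketched as follows. This is an illustration under assumptions, not the library's implementation: the page does not describe the internals, so the lazy "parse one entry at a time" behaviour is inferred from the class name only. IncrementalSketch, TwoEntrySketch, and parseEntry are hypothetical names, and String stands in for XmlTextAnnotation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the reader; String replaces XmlTextAnnotation.
abstract class IncrementalSketch implements Iterator<String> {
    private List<List<Path>> fileList;   // filled on first use
    private int fileIndex = 0;           // next file-list entry to parse
    private Iterator<String> current = new ArrayList<String>().iterator();

    abstract List<List<Path>> getFileListing();

    // Plays the role of getXmlTextAnnotationsFromFile(): zero or more results per entry.
    abstract List<String> parseEntry(List<Path> entry);

    @Override
    public boolean hasNext() {
        if (fileList == null) {
            fileList = getFileListing();
        }
        // Parse entries one at a time, skipping any that yield nothing.
        while (!current.hasNext() && fileIndex < fileList.size()) {
            current = parseEntry(fileList.get(fileIndex++)).iterator();
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    // Restart iteration from the first file-list entry.
    public void reset() {
        fileIndex = 0;
        current = new ArrayList<String>().iterator();
    }
}

class TwoEntrySketch extends IncrementalSketch {
    @Override
    List<List<Path>> getFileListing() {
        List<List<Path>> entries = new ArrayList<>();
        entries.add(Arrays.asList(Paths.get("a.xml")));
        entries.add(Arrays.asList(Paths.get("empty.xml")));
        // One entry may group several files that belong together.
        entries.add(Arrays.asList(Paths.get("b.xml"), Paths.get("b.ann")));
        return entries;
    }

    @Override
    List<String> parseEntry(List<Path> entry) {
        List<String> out = new ArrayList<>();
        String name = entry.get(0).getFileName().toString();
        if (!name.startsWith("empty")) {
            out.add("doc from " + name);
        }
        return out;
    }
}
```

Looping with `while (reader.hasNext()) reader.next();` over a TwoEntrySketch visits the results of the first and third entries and silently skips the second, which produces none.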
public String generateReport()
Copyright © 2017. All rights reserved.