public abstract class AbstractIncrementalCorpusReader<T> extends AnnotationReader<T>
Modifier and Type | Field and Description |
---|---|
protected List<List<Path>> | fileList: contains pointers to the files comprising the corpus. |
protected String | sourceDirectory: root directory of the corpus. |
Fields inherited from class AnnotationReader: corpusName, currentAnnotationId, resourceManager
Constructor and Description |
---|
AbstractIncrementalCorpusReader(ResourceManager rm): the ResourceManager must specify the fields CorpusReaderConfigurator.CORPUS_NAME and CorpusReaderConfigurator.CORPUS_DIRECTORY, plus whatever the derived class requires for initializeReader(). |
Modifier and Type | Method and Description |
---|---|
String | generateReport(): generate a human-readable report of annotations read from the source file (plus whatever other relevant statistics the user should know about). |
abstract List<T> | getAnnotationsFromFile(List<Path> corpusFileListEntry): given an entry from the corpus file list generated by getFileListing(), parse its contents and get zero or more TextAnnotation objects. |
abstract List<List<Path>> | getFileListing(): generate a list of files comprising the corpus. |
String | getSourceDirectory() |
boolean | hasNext(): is there another annotation object to return? |
protected void | initializeReader(): this method is called by the base class constructor, so all subclass-specific object initialization must be done here. |
T | next(): Returns the next element in the iteration. |
void | reset(): override this to conform to whatever the derived class's state mechanism requires. |
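Methods inherited from class AnnotationReader: iterator, remove
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator
Methods inherited from interface java.util.Iterator: forEachRemaining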
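The method summary suggests a lazy, entry-by-entry reading loop: getFileListing() enumerates the corpus entries, and hasNext()/next() draw items from one entry at a time through getAnnotationsFromFile(). The sketch below illustrates that pattern only; it is not the library's implementation. IncrementalReaderSketch, FileNameReaderSketch, and IncrementalReaderDemo are hypothetical stand-ins, entries hold plain file-name Strings instead of Path objects to avoid I/O, and the buffering strategy is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for an incremental reader; NOT the library class.
abstract class IncrementalReaderSketch<T> implements Iterator<T> {
    private final List<List<String>> fileList;      // stand-in for List<List<Path>>
    private final Deque<T> pending = new ArrayDeque<>();
    private int nextEntryIndex = 0;

    IncrementalReaderSketch(List<List<String>> fileList) {
        this.fileList = fileList;
    }

    // Mirrors getAnnotationsFromFile(): zero or more items per entry.
    abstract List<T> getAnnotationsFromFile(List<String> corpusFileListEntry);

    @Override
    public boolean hasNext() {
        // Load entries until one yields an item or the listing is exhausted.
        while (pending.isEmpty() && nextEntryIndex < fileList.size()) {
            pending.addAll(getAnnotationsFromFile(fileList.get(nextEntryIndex++)));
        }
        return !pending.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException("corpus exhausted");
        }
        return pending.removeFirst();
    }

    // Restart iteration from the first entry.
    public void reset() {
        pending.clear();
        nextEntryIndex = 0;
    }
}

// Hypothetical subclass: yields one String per file name in the entry.
class FileNameReaderSketch extends IncrementalReaderSketch<String> {
    FileNameReaderSketch(List<List<String>> fileList) {
        super(fileList);
    }

    @Override
    List<String> getAnnotationsFromFile(List<String> corpusFileListEntry) {
        return new ArrayList<>(corpusFileListEntry);
    }
}

class IncrementalReaderDemo {
    public static void main(String[] args) {
        List<List<String>> listing = Arrays.asList(
                Arrays.asList("a.txt", "b.txt"),
                new ArrayList<String>(),             // an entry may yield nothing
                Arrays.asList("c.txt"));
        IncrementalReaderSketch<String> reader = new FileNameReaderSketch(listing);
        while (reader.hasNext()) {
            System.out.println(reader.next());
        }
    }
}
```

In this sketch only one entry's items are held in memory at a time, and an entry that yields nothing is skipped transparently by hasNext().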
protected List<List<Path>> fileList
protected String sourceDirectory
public AbstractIncrementalCorpusReader(ResourceManager rm) throws Exception
The ResourceManager must specify the fields CorpusReaderConfigurator.CORPUS_NAME and CorpusReaderConfigurator.CORPUS_DIRECTORY, plus whatever is required by the derived class for initializeReader().
Parameters: rm - ResourceManager
Throws: Exception
protected void initializeReader()
initializeReader in class AnnotationReader<T>
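Because initializeReader() is invoked from the base class constructor, Java runs it before the subclass's own field initializers and constructor body. The sketch below demonstrates that ordering with hypothetical stand-in classes (BaseReaderSketch and DerivedReaderSketch, not the library's types): a value assigned inside initializeReader() survives only if the field has no initializer of its own.

```java
// Hypothetical stand-ins illustrating Java's construction order.
abstract class BaseReaderSketch {
    BaseReaderSketch() {
        // Runs before any field initializer of the subclass.
        initializeReader();
    }

    protected abstract void initializeReader();
}

class DerivedReaderSketch extends BaseReaderSketch {
    // This initializer runs AFTER the base constructor returns, so it
    // overwrites whatever initializeReader() stored in the field.
    String withInitializer = "set by field initializer";

    // No initializer: the value assigned in initializeReader() survives.
    String withoutInitializer;

    // Records what initializeReader() observed when it ran.
    String seenDuringInit;

    @Override
    protected void initializeReader() {
        // withInitializer is still null at this point.
        seenDuringInit = String.valueOf(withInitializer);
        withInitializer = "set by initializeReader";
        withoutInitializer = "set by initializeReader";
    }
}

class InitOrderDemo {
    public static void main(String[] args) {
        DerivedReaderSketch reader = new DerivedReaderSketch();
        System.out.println(reader.seenDuringInit);
        System.out.println(reader.withInitializer);
        System.out.println(reader.withoutInitializer);
    }
}
```

Under this ordering, a derived reader would declare its state fields without initializers and assign them inside initializeReader().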
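public void reset()
Description copied from class: AnnotationReader
override this to conform to whatever the derived class's state mechanism requires.
reset in interface IResetableIterator<T>
reset in class AnnotationReader<T>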
public String getSourceDirectory()
public boolean hasNext()
Description copied from class: AnnotationReader
is there another annotation object to return?
public T next()
next in interface Iterator<T>
next in class AnnotationReader<T>
Throws: NoSuchElementException - if the iteration has no more elements
public abstract List<List<Path>> getFileListing() throws IOException
Throws: IOException
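One way a derived class might build the listing is to create one single-file entry per corpus file found directly under the source directory. The following is a minimal sketch under that assumption, using only java.nio.file; the class FileListingSketch, the helper listCorpusFiles, and the ".txt" filter are hypothetical and not part of the library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FileListingSketch {
    // Hypothetical helper: one single-file entry per .txt file
    // located directly under sourceDirectory, in sorted order.
    static List<List<Path>> listCorpusFiles(String sourceDirectory) throws IOException {
        List<Path> files;
        // Files.list returns a stream that must be closed after use.
        try (Stream<Path> stream = Files.list(Paths.get(sourceDirectory))) {
            files = stream
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".txt"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<List<Path>> listing = new ArrayList<>();
        for (Path file : files) {
            listing.add(Collections.singletonList(file));
        }
        return listing;
    }

    public static void main(String[] args) throws IOException {
        String dir = args.length > 0 ? args[0] : ".";
        for (List<Path> entry : listCorpusFiles(dir)) {
            System.out.println(entry);
        }
    }
}
```

The nested list type leaves room for entries that group several files, for example a text file paired with a separate annotation file; this sketch uses only single-file entries.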
public abstract List<T> getAnnotationsFromFile(List<Path> corpusFileListEntry) throws Exception
given an entry from the corpus file list generated by getFileListing(), parse its contents and get zero or more TextAnnotation objects.
Parameters: corpusFileListEntry - corpus file containing content to be processed
Throws: Exception
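As an illustration of the "zero or more" contract, the sketch below parses one file-list entry by treating every non-blank line as a separate item. Plain Strings stand in for TextAnnotation objects, and EntryParsingSketch and annotationsFromEntry are hypothetical names; a real derived class would build annotations in whatever way its corpus format requires.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class EntryParsingSketch {
    // Hypothetical parser: each non-blank line of each file in the
    // entry becomes one item; an empty file contributes nothing.
    static List<String> annotationsFromEntry(List<Path> corpusFileListEntry) throws IOException {
        List<String> annotations = new ArrayList<>();
        for (Path file : corpusFileListEntry) {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    annotations.add(trimmed);
                }
            }
        }
        return annotations;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("entry", ".txt");
        Files.write(file, "first sentence\n\nsecond sentence\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(annotationsFromEntry(Collections.singletonList(file)));
    }
}
```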
public String generateReport()
generateReport in class AnnotationReader<T>
Copyright © 2017. All rights reserved.