edu.illinois.cs.cogcomp.lbj.coref.io.loaders
Class DocPlainTextLoader

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.io.loaders.DocLoader
      extended by edu.illinois.cs.cogcomp.lbj.coref.io.loaders.DocPlainTextLoader

public class DocPlainTextLoader
extends DocLoader

Loads documents from the filenames listed in the specified file.

To load documents, construct an instance with the filename of a file that lists plain-text document filenames (one per line), then call DocLoader.loadDocs(). Note: each listed filename should be specified relative to a location in the classpath.
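As an illustration of the list-file format described above, the following is a minimal sketch of reading one filename per line from a classpath resource. It is not the library's implementation: FileListSketch, readFilenames, and readFilenamesFromClasspath are hypothetical names, and skipping blank lines and trimming whitespace are assumptions the documentation does not state.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the coref package. */
public class FileListSketch {

    /** Reads one filename per line; blank lines are skipped (an assumption). */
    public static List<String> readFilenames(BufferedReader reader) throws IOException {
        List<String> names = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    /** Resolves the list file against the classpath, then parses it. */
    public static List<String> readFilenamesFromClasspath(String fileListFN) throws IOException {
        InputStream in = FileListSketch.class.getClassLoader().getResourceAsStream(fileListFN);
        if (in == null) {
            throw new IOException("Not found on classpath: " + fileListFN);
        }
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return readFilenames(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        // An in-memory listing stands in for the list file's contents.
        String listing = "docs/a.txt\ndocs/b.txt\n";
        System.out.println(readFilenames(new BufferedReader(new StringReader(listing))));
        // prints [docs/a.txt, docs/b.txt]
    }
}
```

Resolving through the class loader rather than the working directory is what makes a filename "relative to a location in the classpath": the same listing works whether the documents sit in a directory or inside a jar.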

Author:
Eric Bengtson

Field Summary
 
Fields inherited from class edu.illinois.cs.cogcomp.lbj.coref.io.loaders.DocLoader
m_caser, m_fileListFN, m_mdDecoder, m_mTypeClassifier
 
Constructor Summary
DocPlainTextLoader(java.lang.String fileListFN, MentionDecoder mentionDetector, LBJ2.classify.Classifier mTyper)
          Constructs a loader that loads plain text files, and will detect and type mentions automatically.
 
Method Summary
protected  Doc createDoc(java.lang.String filename)
          Constructs and returns a document from the specified plain text file.
 
Methods inherited from class edu.illinois.cs.cogcomp.lbj.coref.io.loaders.DocLoader
getDefaultLoader, getDefaultLoader, getFilenames, getPredMents, loadDoc, loadDocs
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DocPlainTextLoader

public DocPlainTextLoader(java.lang.String fileListFN,
                          MentionDecoder mentionDetector,
                          LBJ2.classify.Classifier mTyper)
Constructs a loader that loads plain text files and detects and types mentions automatically. Words are split by an automatic word-splitting algorithm. Sentence boundaries, quotations, and part-of-speech tags are also detected automatically. The file named by fileListFN contains a list of document filenames, one per line.

Parameters:
fileListFN - The name of the corpus file, relative to a location in the classpath, containing a list of plain text document filenames, one per line. Each document filename should be specified relative to the classpath.
mentionDetector - Extracts mentions from a document by predicting the head and extent boundaries of all mentions.
mTyper - Determines the mention type of each mention. Takes Mention objects as input and returns the type as a string: one of "NAM", "NOM", "PRE", or "PRO".
Method Detail

createDoc

protected Doc createDoc(java.lang.String filename)
Constructs and returns a document from the specified plain text file.

Specified by:
createDoc in class DocLoader
Parameters:
filename - The name of a plain text file, relative to a location in the classpath.
Returns:
A document representing the specified plain text file.
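
Because createDoc is protected and specified by DocLoader, it presumably serves as the per-file hook that the inherited loadDocs() calls once for each listed filename. The sketch below illustrates that template-method arrangement under that assumption. All class names (LoaderSketch, SketchLoader, SketchPlainTextLoader, SketchDoc) are hypothetical stand-ins, not the library's classes, and the body of createDoc is a placeholder for the actual text processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of a loader with a per-file creation hook. */
public class LoaderSketch {

    /** Hypothetical stand-in for a loaded document. */
    public static class SketchDoc {
        public final String source;

        public SketchDoc(String source) {
            this.source = source;
        }
    }

    /** Hypothetical base loader: walks the filenames and delegates to createDoc. */
    public abstract static class SketchLoader {
        private final List<String> filenames;

        protected SketchLoader(List<String> filenames) {
            this.filenames = filenames;
        }

        /** Builds one document per filename, in listing order. */
        public List<SketchDoc> loadDocs() {
            List<SketchDoc> docs = new ArrayList<>();
            for (String filename : filenames) {
                docs.add(createDoc(filename));
            }
            return docs;
        }

        /** Subclasses decide how a single file becomes a document. */
        protected abstract SketchDoc createDoc(String filename);
    }

    /** Hypothetical plain-text variant of the loader. */
    public static class SketchPlainTextLoader extends SketchLoader {
        public SketchPlainTextLoader(List<String> filenames) {
            super(filenames);
        }

        @Override
        protected SketchDoc createDoc(String filename) {
            // Placeholder: a real loader would read the text here and run
            // word splitting, tagging, and mention detection on it.
            return new SketchDoc(filename);
        }
    }

    public static void main(String[] args) {
        SketchLoader loader =
            new SketchPlainTextLoader(Arrays.asList("docs/a.txt", "docs/b.txt"));
        System.out.println(loader.loadDocs().size()); // prints 2
    }
}
```

Keeping the iteration in the base class and the file handling in the subclass means callers only ever invoke loadDocs(), which matches the usage described in the class overview.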