edu.illinois.cs.cogcomp.lbj.coref.io.loaders
Class DocAPFLoader

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.io.loaders.DocLoader
      extended by edu.illinois.cs.cogcomp.lbj.coref.io.loaders.DocAPFLoader

public class DocAPFLoader
extends DocLoader

Loads a corpus of APF documents. Includes functionality for predicting mentions and their types.


Field Summary
private  DocBase.PosSource m_posSource
           
 
Fields inherited from class edu.illinois.cs.cogcomp.lbj.coref.io.loaders.DocLoader
m_caser, m_fileListFN, m_mdDecoder, m_mTypeClassifier
 
Constructor Summary
DocAPFLoader()
          Default constructor.
DocAPFLoader(java.lang.String fileListFN)
          Construct a loader that loads a list of APF documents given the name of the file specifying a list of document filenames.
DocAPFLoader(java.lang.String fileListFN, boolean offline)
          Construct a loader that loads a list of APF documents given the name of the file specifying a list of document filenames.
DocAPFLoader(java.lang.String fileListFN, DocBase.PosSource posSource)
          Construct a loader that loads a list of APF documents given the name of the file specifying a list of document filenames.
DocAPFLoader(java.lang.String fileListFN, MentionDecoder mentionDecoder, LBJ2.classify.Classifier mTyper)
          Construct a loader that loads a list of APF documents given the name of the file specifying a list of document filenames.
DocAPFLoader(java.lang.String fileListFN, MentionDecoder mentionDecoder, LBJ2.classify.Classifier mTyper, DocBase.PosSource posSource)
          Construct a loader that loads a list of APF documents given the name of the file specifying a list of document filenames.
 
Method Summary
protected  Doc createDoc(java.lang.String filename)
          Construct a DocAPF document from the given filename, which may end with the extension ".apf.xml".
 
Methods inherited from class edu.illinois.cs.cogcomp.lbj.coref.io.loaders.DocLoader
getDefaultLoader, getDefaultLoader, getFilenames, getPredMents, loadDoc, loadDocs
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_posSource

private DocBase.PosSource m_posSource

Constructor Detail

DocAPFLoader

public DocAPFLoader(java.lang.String fileListFN,
                    MentionDecoder mentionDecoder,
                    LBJ2.classify.Classifier mTyper,
                    DocBase.PosSource posSource)
Construct a loader that loads a list of APF documents given the name of the file specifying a list of document filenames. The file contains a list of filenames, one per line. Mentions will be predicted using the provided mention decoder and mention type classifier.

Parameters:
fileListFN - The name of the corpus file, containing a list of APF document filenames, one per line. Each document filename should be specified relative to the classpath and should be an apf.xml file, which may end in ".apf.xml". Additional files will be loaded from the same location.
mentionDecoder - The mention decoder extracts mentions from a document.
mTyper - Determines the type of each mention. Takes Mention objects as input and returns the type as one of the strings "NAM", "NOM", "PRE", or "PRO".
posSource - The source of POS (part-of-speech) tags. May be one of:
    PosSource.FILE - Loaded from files generated by an offline preprocessor.
    PosSource.LBJ - Predicted using the LBJPOS tool (in the classpath). Preferred.
    PosSource.SNOW - Predicted using an external command-line preprocessor called at runtime.
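
As a rough illustration of the three options described for posSource, the following self-contained sketch mirrors DocBase.PosSource with a stand-in enum. PosSourceSketch and describe are hypothetical names used only for illustration; the constant names FILE, LBJ, and SNOW are the only parts taken from this documentation, and the real enum may carry additional behavior.

```java
final class PosSourceSketch {
    /** Hypothetical stand-in for DocBase.PosSource; only the constant names come from the docs. */
    enum PosSource { FILE, LBJ, SNOW }

    /** Returns a short description of where the POS tags come from for the given option. */
    static String describe(PosSource source) {
        switch (source) {
            case FILE:
                return "loaded from files generated by an offline preprocessor";
            case LBJ:
                return "predicted using the LBJPOS tool (preferred)";
            case SNOW:
                return "predicted by an external command-line preprocessor at runtime";
            default:
                throw new IllegalArgumentException("Unknown POS source: " + source);
        }
    }

    public static void main(String[] args) {
        // Print every option with its description.
        for (PosSource source : PosSource.values()) {
            System.out.println(source + ": " + describe(source));
        }
    }
}
```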

DocAPFLoader

public DocAPFLoader(java.lang.String fileListFN,
                    MentionDecoder mentionDecoder,
                    LBJ2.classify.Classifier mTyper)
Construct a loader that loads a list of APF documents given the name of the file specifying a list of document filenames. The file contains a list of filenames, one per line. Mentions will be predicted using the provided mention decoder and mention type classifier.

Parameters:
fileListFN - The name of the corpus file, containing a list of APF document filenames, one per line. Each document filename should be specified relative to the classpath and should be an apf.xml file, which may end in ".apf.xml". Additional files will be loaded from the same location.
mentionDecoder - The mention decoder extracts mentions from a document.
mTyper - Determines the type of each mention. Takes Mention objects as input and returns the type as one of the strings "NAM", "NOM", "PRE", or "PRO".
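
Because mTyper is expected to return one of four type strings, a caller may want to guard against unexpected labels. The sketch below is one possible way to do that. MentionTypeSketch and requireMentionType are hypothetical helpers, not part of this library, and the sketch works on plain strings rather than on Mention objects or an LBJ2 Classifier.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

final class MentionTypeSketch {
    /** The four mention type strings named in the mTyper description. */
    static final Set<String> MENTION_TYPES = Collections.unmodifiableSet(
            new LinkedHashSet<>(Arrays.asList("NAM", "NOM", "PRE", "PRO")));

    /** Hypothetical guard: returns the label unchanged, or throws if it is not a known type. */
    static String requireMentionType(String label) {
        if (label == null || !MENTION_TYPES.contains(label)) {
            throw new IllegalArgumentException("Unexpected mention type: " + label);
        }
        return label;
    }

    public static void main(String[] args) {
        System.out.println(requireMentionType("NAM"));
        try {
            requireMentionType("XYZ");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```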

DocAPFLoader

public DocAPFLoader(java.lang.String fileListFN)
Construct a loader that loads a list of APF documents given the name of the file specifying a list of document filenames. The file contains a list of filenames, one per line. The resulting documents will have true mentions but no predicted mentions.

Parameters:
fileListFN - The name of the corpus file, containing a list of APF document filenames, one per line. Each document filename should be specified relative to the classpath and should be an apf.xml file, which may end in ".apf.xml". Additional files will be loaded from the same location.
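
The corpus file format described above (one document filename per line) can be sketched as follows. This is a minimal illustration, not the loader's actual parsing code: FileListSketch and parseFileList are hypothetical names, and trimming whitespace and skipping blank lines are assumptions that this documentation does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

final class FileListSketch {
    /**
     * Splits the contents of a corpus file into document filenames, one per line.
     * Assumption: surrounding whitespace is trimmed and blank lines are ignored.
     */
    static List<String> parseFileList(String contents) {
        List<String> names = new ArrayList<>();
        for (String line : contents.split("\\r?\\n")) {
            String name = line.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical corpus file contents with two documents and a blank line.
        String corpus = "corpus/doc1.apf.xml\n\ncorpus/doc2.apf.xml\n";
        for (String name : parseFileList(corpus)) {
            System.out.println(name);
        }
    }
}
```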

DocAPFLoader

public DocAPFLoader(java.lang.String fileListFN,
                    boolean offline)
Construct a loader that loads a list of APF documents given the name of the file specifying a list of document filenames. The file contains a list of filenames, one per line. The resulting documents will have true mentions but no predicted mentions.

Parameters:
fileListFN - The name of the corpus file, containing a list of APF document filenames, one per line. Each document filename should be specified relative to the classpath and should be an apf.xml file, which may end in ".apf.xml". Additional files will be loaded from the same location.
offline - Whether the documents should be loaded in a backwards-compatible fashion, in particular using offline preprocessing.

DocAPFLoader

public DocAPFLoader(java.lang.String fileListFN,
                    DocBase.PosSource posSource)
Construct a loader that loads a list of APF documents given the name of the file specifying a list of document filenames. The file contains a list of filenames, one per line. The resulting documents will have true mentions but no predicted mentions.

Parameters:
fileListFN - The name of the corpus file, containing a list of APF document filenames, one per line. Each document filename should be specified relative to the classpath and should be an apf.xml file, which may end in ".apf.xml". Additional files will be loaded from the same location.
posSource - The source of POS (part-of-speech) tags. May be one of:
    PosSource.FILE - Loaded from files generated by an offline preprocessor.
    PosSource.LBJ - Predicted using the LBJPOS tool (in the classpath). Preferred.
    PosSource.SNOW - Predicted using an external command-line preprocessor called at runtime.

DocAPFLoader

public DocAPFLoader()
Default constructor. Use when document filenames are not known in advance.

Method Detail

createDoc

protected Doc createDoc(java.lang.String filename)
Construct a DocAPF document from the given filename, which may end with the extension ".apf.xml".

Specified by:
createDoc in class DocLoader
Parameters:
filename - The name of the APF file, which may end with the extension ".apf.xml". The filename should be specified relative to the classpath and should be an apf.xml file. Additional files will be loaded from the same location.
Returns:
a DocAPF document loaded from the file named by filename
Throws:
java.lang.RuntimeException - if the document cannot be loaded for I/O reasons.
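
The relationship between createDoc and its superclass can be pictured with the following self-contained sketch, which uses stand-in types instead of the real DocLoader, Doc, and DocAPF classes. Everything named here (CreateDocSketch, Loader, ApfLoader, stripSuffix, read) is hypothetical. The sketch assumes that loadDoc delegates to createDoc, that the optional ".apf.xml" extension is stripped to obtain a base name, and that I/O failures are rethrown as an unchecked exception. Only the optional extension and the RuntimeException on I/O failure are stated in this documentation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class CreateDocSketch {
    static final String APF_SUFFIX = ".apf.xml";

    /** Hypothetical stand-in for Doc: records only the base name it was built from. */
    static final class Doc {
        final String baseName;

        Doc(String baseName) {
            this.baseName = baseName;
        }
    }

    /** Stand-in for DocLoader: subclasses decide how a single document is built. */
    abstract static class Loader {
        protected abstract Doc createDoc(String filename);

        /** Assumption: loading a document delegates to createDoc. */
        Doc loadDoc(String filename) {
            return createDoc(filename);
        }
    }

    /** Stand-in for DocAPFLoader. */
    static final class ApfLoader extends Loader {
        @Override
        protected Doc createDoc(String filename) {
            try {
                return read(stripSuffix(filename));
            } catch (IOException e) {
                // UncheckedIOException is a RuntimeException, matching the documented contract.
                throw new UncheckedIOException("Cannot load " + filename, e);
            }
        }

        /** Placeholder for real file access; fails only on an empty name. */
        private Doc read(String baseName) throws IOException {
            if (baseName.isEmpty()) {
                throw new IOException("Empty document name");
            }
            return new Doc(baseName);
        }
    }

    /** Removes a trailing ".apf.xml" if present; otherwise returns the name unchanged. */
    static String stripSuffix(String filename) {
        return filename.endsWith(APF_SUFFIX)
                ? filename.substring(0, filename.length() - APF_SUFFIX.length())
                : filename;
    }

    public static void main(String[] args) {
        Loader loader = new ApfLoader();
        System.out.println(loader.loadDoc("corpus/doc1.apf.xml").baseName);
        System.out.println(loader.loadDoc("corpus/doc1").baseName);
    }
}
```

With this design, a filename given with or without the extension resolves to the same base name, which is one way the "additional files" in the same location could be found.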