edu.illinois.cs.cogcomp.lbj.coref.ir.docs
Class DocAPF

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
      extended by edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocXMLBase
          extended by edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocAPF
All Implemented Interfaces:
Doc, java.io.Serializable

public class DocAPF
extends DocXMLBase

Author:
Eric Bengtson
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
DocBase.PosSource
 
Field Summary
private static long serialVersionUID
           
 
Fields inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
goodEnds, goodStarts, m_annotationAuthor, m_baseFN, m_bNeedsCasing, m_caser, m_dateTime, m_docID, m_docType, m_encoding, m_headline, m_slug, m_source, m_text, m_version, medEnds, totalMentions
 
Constructor Summary
DocAPF()
          Basic constructor: Not recommended.
DocAPF(java.lang.String filename)
          Loads the file named filename and reads in its XML representation.
DocAPF(java.lang.String filename, LBJ2.classify.Classifier caser)
           
DocAPF(java.lang.String filename, DocBase.PosSource posSource)
          Loads the file named filename and reads in its XML representation.
 
Method Summary
protected  java.lang.String getBaseFilename(java.lang.String filename)
          Removes the extension (including the periods) from the filename, if it has an extension.
protected  Entity loadEntity(org.w3c.dom.Node nEntity)
          Loads an entity from an XML representation and returns it.
protected  Chunk processChunk(org.w3c.dom.Element element)
          Loads a chunk.
protected  java.lang.String toXMLString(Chunk c)
           
protected  java.lang.String toXMLString(Entity e)
           
 void write(boolean usePredictions)
          Writes this Doc in the appropriate format.
 void write(java.lang.String filenameBase, boolean usePredictions)
          Writes this Doc in the appropriate format.
 
Methods inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocXMLBase
getOptAttrib, getShortEID, loadRelation, loadXML, processAttributes, processEntityMention, toXMLString, toXMLString, toXMLString
 
Methods inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
addHeadPrediction, addPredEntities, addRelation, addTrueEntity, addTrueMention, alignPredMentsToTrue, buildMentionsContaining, buildMentionsInSents, calcAndSetQuotes, getBestMentionFor, getCExampleFor, getCoherenceInfo, getCoherenceInfo, getCorefChains, getDocID, getEntities, getEntityFor, getEntityFor, getEntityFor, getGExampleFor, getHeadPrediction, getInCorpusInverseFreq, getInDocInverseFreq, getInverseTrueHeadFreq, getInverseTrueHeadFreq, getMention, getMentions, getMentionsContainedIn, getMentionsContaining, getMentionsInSent, getMentionsInSentences, getMentionsWithExtentStartingAt, getMentionsWithHeadStartingAt, getNumMentions, getNumRelations, getNumSentences, getPlainText, getPOS, getPOS, getPredEntities, getPredMention, getPredMentions, getQuoteNestLevel, getRelation, getSentNum, getStartCharNum, getTextFirstWordNum, getTrueEntities, getTrueMention, getTrueMentionFor, getTrueMentions, getWholeDocCounts, getWord, getWordNum, getWords, hasHeadPrediction, hasPredEntities, hasPredMentions, hasTrueEntities, hasTrueMentions, initMembersDefault, isCaseSensitive, loadChunkedText, loadFromText, loadFromText, loadPOSTaggerOutput, loadPOSTags, loadSGMText, makeBestMentionMap, makeChunk, printChunkValidity, recordWordLocation, removeTagsAndExtraNL, repeat, save, setCorpusCounts, setPlainText, setPOSTags, setPredEntities, setPredictedMentions, setQuoteLevels, setSentenceNumbers, setUsePredictedEntities, setUsePredictedMentions, setWords, setWords, sortEntitiesByListOrder, sortPredictedMentions, sortTrueMentions, toAnnotatedString, toAnnotatedString, toCoherenceTableString, toCoherenceTableString, toString, toSubstituteString, translateEscaped, usePredictedEntities, usePredictedMentions
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

serialVersionUID

private static final long serialVersionUID
See Also:
Constant Field Values
Constructor Detail

DocAPF

public DocAPF()
Basic constructor: Not recommended.


DocAPF

public DocAPF(java.lang.String filename)
       throws XMLException
Loads the file named filename and reads in its XML representation.

Parameters:
filename - The name of the file.
Throws:
XMLException

DocAPF

public DocAPF(java.lang.String filename,
              DocBase.PosSource posSource)
       throws XMLException
Loads the file named filename and reads in its XML representation.

Parameters:
filename - The name of the file.
posSource - Where the document should get POS tags from. If PosSource.FILE, attempts to reproduce the previously published results more exactly; this requires a corpus preprocessed offline with the CogComp preprocessing tools available at http://L2R.cs.uiuc.edu/~cogcomp. If PosSource.SNOW, uses a local SNoW-based preprocessor called tagger, located via the PATH_POS environment variable (which must be exported); this is generally slow. Otherwise, uses the LBJ preprocessor (fastest, but performance may differ from published results). PosSource.FILE loads offline preprocessing from files ending in .sgm.strip_chunker. PosSource.SNOW uses an offline
Throws:
XMLException
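The three-way choice described for posSource can be pictured as a simple dispatch. The sketch below is hypothetical: PosSourceSketch only mirrors DocBase.PosSource, and its LBJ constant is an invented name for the "otherwise" case (the docs name only FILE and SNOW). Appending ".sgm.strip_chunker" to the base filename is likewise an assumption about how the offline files are located.

```java
/** Hypothetical mirror of DocBase.PosSource; only FILE and SNOW are named in the docs. */
enum PosSourceSketch { FILE, SNOW, LBJ }

final class PosSourceDemo {
    private PosSourceDemo() {}

    /**
     * Describes where POS tags would come from for a given source.
     * Assumption: offline FILE output sits next to the document as
     * baseFilename + ".sgm.strip_chunker".
     */
    static String describe(PosSourceSketch source, String baseFilename) {
        switch (source) {
            case FILE:
                // Offline CogComp preprocessing; closest to published results.
                return "file:" + baseFilename + ".sgm.strip_chunker";
            case SNOW:
                // External SNoW-based "tagger", found via the exported PATH_POS variable.
                String dir = System.getenv("PATH_POS");
                return "snow:" + (dir == null ? "<PATH_POS not set>" : dir);
            default:
                // Built-in LBJ tagger: fastest, but results may differ from published ones.
                return "lbj";
        }
    }
}
```

In the real class the choice is made once, at construction time, by passing the enum value to DocAPF(filename, posSource).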

DocAPF

public DocAPF(java.lang.String filename,
              LBJ2.classify.Classifier caser)
       throws XMLException
Throws:
XMLException
Method Detail

loadEntity

protected Entity loadEntity(org.w3c.dom.Node nEntity)
                     throws XMLException
Loads an entity from an XML representation and returns it. As a side effect, adds true mentions to the document.

Specified by:
loadEntity in class DocXMLBase
Throws:
XMLException
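As a rough illustration of what loading an entity from APF involves, the sketch below walks an entity element with the JDK's org.w3c.dom API, records each entity_mention ID, and, as a side effect, collects head text into a list standing in for the document's true mentions. EntitySketch, EntityLoader, and trueMentionHeads are hypothetical names, and the element and attribute names (entity, entity_mention, head, ID, TYPE) are assumed from the ACE APF format rather than taken from DocAPF's source.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical stand-in for Entity: an ID, a type, and its mention IDs. */
final class EntitySketch {
    final String id;
    final String type;
    final List<String> mentionIds = new ArrayList<>();

    EntitySketch(String id, String type) {
        this.id = id;
        this.type = type;
    }
}

final class EntityLoader {
    /** Filled as a side effect, mirroring "adds true mentions to the document". */
    final List<String> trueMentionHeads = new ArrayList<>();

    EntitySketch loadEntity(Node nEntity) {
        Element entity = (Element) nEntity;
        EntitySketch result = new EntitySketch(entity.getAttribute("ID"), entity.getAttribute("TYPE"));
        NodeList mentions = entity.getElementsByTagName("entity_mention");
        for (int i = 0; i < mentions.getLength(); i++) {
            Element mention = (Element) mentions.item(i);
            result.mentionIds.add(mention.getAttribute("ID"));
            // The head's text content is the text of its nested charseq.
            NodeList heads = mention.getElementsByTagName("head");
            if (heads.getLength() > 0) {
                trueMentionHeads.add(heads.item(0).getTextContent().trim());
            }
        }
        return result;
    }

    /** Parses an XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }
}
```

Where the real method throws XMLException on malformed input, this sketch simply lets DOM casts or parse errors surface as runtime exceptions.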

processChunk

protected Chunk processChunk(org.w3c.dom.Element element)
                      throws XMLException
Loads a chunk.

Specified by:
processChunk in class DocXMLBase
Parameters:
element - An element containing a charseq Element.
Returns:
The desired chunk.
Throws:
XMLException
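Since processChunk takes "an element containing a charseq Element", a minimal sketch of that step is to find the nested charseq and read its offsets and text. ChunkSketch and ChunkLoader are hypothetical; the uppercase START/END attributes and the inclusive END offset are assumptions based on the ACE APF convention, and an IllegalArgumentException stands in for the library's XMLException.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical stand-in for Chunk: character offsets plus the covered text. */
final class ChunkSketch {
    final int start;
    final int end; // inclusive, assuming the ACE APF convention
    final String text;

    ChunkSketch(int start, int end, String text) {
        this.start = start;
        this.end = end;
        this.text = text;
    }
}

final class ChunkLoader {
    private ChunkLoader() {}

    /** Reads the first charseq below element (e.g. an extent or head). */
    static ChunkSketch processChunk(Element element) {
        NodeList seqs = element.getElementsByTagName("charseq");
        if (seqs.getLength() == 0) {
            throw new IllegalArgumentException("no charseq under <" + element.getTagName() + ">");
        }
        Element charseq = (Element) seqs.item(0);
        int start = Integer.parseInt(charseq.getAttribute("START").trim());
        int end = Integer.parseInt(charseq.getAttribute("END").trim());
        return new ChunkSketch(start, end, charseq.getTextContent());
    }

    /** Parses an XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }
}
```

With inclusive offsets, end - start + 1 should equal the length of the covered text, which is a cheap consistency check when loading a corpus.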

write

public void write(boolean usePredictions)
Description copied from interface: Doc
Writes this Doc in the appropriate format.

Specified by:
write in interface Doc
Specified by:
write in class DocXMLBase
Parameters:
usePredictions - Whether predicted mentions and entities should be written.

write

public void write(java.lang.String filenameBase,
                  boolean usePredictions)
Description copied from interface: Doc
Writes this Doc in the appropriate format.

Specified by:
write in interface Doc
Specified by:
write in class DocXMLBase
Parameters:
filenameBase - The name of the target file.
usePredictions - Whether predicted mentions and entities should be written.
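One way to picture write(filenameBase, usePredictions) is as three small decisions: pick the annotation set, build the output name, and serialize chunks as XML with proper escaping. Everything in the sketch below is hypothetical. ApfWriterSketch is an invented name, reusing ".apf.xml" for output is an assumption (the docs state only that it is the input extension), and the charseq layout assumes the ACE APF format.

```java
import java.util.List;

final class ApfWriterSketch {
    // Assumption: output reuses the extension that getBaseFilename strips.
    static final String APF_EXTENSION = ".apf.xml";

    private ApfWriterSketch() {}

    static String outputFilename(String filenameBase) {
        return filenameBase + APF_EXTENSION;
    }

    /** usePredictions selects predicted rather than true annotations. */
    static <T> List<T> selectEntities(boolean usePredictions, List<T> trueEntities, List<T> predEntities) {
        return usePredictions ? predEntities : trueEntities;
    }

    /** Serializes one chunk as a charseq element (inclusive END assumed). */
    static String charseqXml(int start, int end, String text) {
        return "<charseq START=\"" + start + "\" END=\"" + end + "\">" + escape(text) + "</charseq>";
    }

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }
}
```

Escaping matters because raw document text such as "AT&T" would otherwise produce an ill-formed file that the XML loader could not read back.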

getBaseFilename

protected java.lang.String getBaseFilename(java.lang.String filename)
Removes the extension (including the periods) from the filename, if it has an extension. For DocAPF files, the extension is ".apf.xml".

Specified by:
getBaseFilename in class DocXMLBase
Parameters:
filename - The name of the file.
Returns:
The name of the file with the extension removed.
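The extension-stripping rule can be sketched as follows. ApfFilenames is a hypothetical class; stripping ".apf.xml" follows the description above, while the fallback for other names (drop everything from the last period in the final path component, leaving dotfiles alone) is an assumption about "if it has an extension".

```java
/** Hypothetical sketch of extension stripping for APF filenames. */
final class ApfFilenames {
    static final String APF_EXTENSION = ".apf.xml";

    private ApfFilenames() {}

    static String getBaseFilename(String filename) {
        // The documented APF case: remove the full two-part extension.
        if (filename.endsWith(APF_EXTENSION)) {
            return filename.substring(0, filename.length() - APF_EXTENSION.length());
        }
        // Assumed fallback: only a '.' after the last path separator counts,
        // so "data.v2/doc1" keeps its directory name intact, and a leading
        // '.' (a dotfile) is not treated as an extension.
        int slash = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int dot = filename.lastIndexOf('.');
        return dot > slash + 1 ? filename.substring(0, dot) : filename;
    }
}
```

Checking endsWith(".apf.xml") first is what makes the two-period extension come off in one step; a plain last-period rule would leave "doc1.apf" behind.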

toXMLString

protected java.lang.String toXMLString(Chunk c)
Specified by:
toXMLString in class DocXMLBase

toXMLString

protected java.lang.String toXMLString(Entity e)