edu.illinois.cs.cogcomp.lbj.coref.ir.docs
Class DocXMLBase

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
      extended by edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocXMLBase
All Implemented Interfaces:
Doc, java.io.Serializable
Direct Known Subclasses:
DocACEPhase2, DocAPF

public abstract class DocXMLBase
extends DocBase

The superclass of documents loaded from XML.

Author:
Eric Bengtson
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
DocBase.PosSource
 
Field Summary
private static long serialVersionUID
           
 
Fields inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
goodEnds, goodStarts, m_annotationAuthor, m_baseFN, m_bNeedsCasing, m_caser, m_dateTime, m_docID, m_docType, m_encoding, m_headline, m_slug, m_source, m_text, m_version, medEnds, totalMentions
 
Constructor Summary
DocXMLBase()
          Basic constructor: Not recommended.
DocXMLBase(java.lang.String filename, java.lang.String ext)
          Given the name of a file and its extension, loads the file and reads in the XML representation.
DocXMLBase(java.lang.String baseFilename, java.lang.String ext, LBJ2.classify.Classifier caser)
           
DocXMLBase(java.lang.String filename, java.lang.String ext, DocBase.PosSource posSource)
          Given the name of a file and its extension, loads the file and reads in the XML representation.
 
Method Summary
private  Chunk findAndProcessChunk(org.w3c.dom.Element parent, java.lang.String tagName)
          Finds and loads a chunk.
private  boolean foundPredEnt(java.lang.String eID)
           
protected abstract  java.lang.String getBaseFilename(java.lang.String filename)
          Trims a possible extension from the filename.
protected  java.lang.String getOptAttrib(org.w3c.dom.NamedNodeMap attribs, java.lang.String attribName, java.lang.String defaultResult)
           
 java.lang.String getShortEID(java.lang.String longID)
           
protected abstract  Entity loadEntity(org.w3c.dom.Node nEntity)
           
protected  Relation loadRelation(org.w3c.dom.Element node)
          Loads a Relation from an XML representation and returns it.
 void loadXML(java.lang.String filename)
           
protected  java.util.List<Chunk> processAttributes(org.w3c.dom.Element parent, java.lang.String attrName)
          Gets all Chunks found inside parent with nodeName attrName.
protected abstract  Chunk processChunk(org.w3c.dom.Element element)
          Loads a chunk.
protected  Mention processEntityMention(org.w3c.dom.Element node, java.lang.String entityID, java.lang.String entityType, java.lang.String subtype, java.lang.String specificity)
          Processes a mentionType_mention tag.
private  RelationEntityArgument processRelationEntityArgument(org.w3c.dom.Element node)
           
private  RelationMention processRelationMention(org.w3c.dom.Element element)
           
private  RelationMentionArgument processRelationMentionArgument(org.w3c.dom.Element node)
           
protected abstract  java.lang.String toXMLString(Chunk c)
           
protected  java.lang.String toXMLString(Mention m, java.lang.String linePrefix)
           
protected  java.lang.String toXMLString(Relation r)
           
private  java.lang.String toXMLString(RelationEntityArgument a, int argNum)
           
private  java.lang.String toXMLString(RelationMentionArgument a, java.lang.String linePrefix)
           
private  java.lang.String toXMLString(RelationMention m, java.lang.String linePrefix)
           
protected  java.lang.String toXMLString(java.lang.String plainText)
          Converts plain text to an XML-safe format by escaping ampersands.
abstract  void write(boolean usePredictions)
          Writes this Doc in the appropriate format.
abstract  void write(java.lang.String filenameBase, boolean usePredictions)
          Writes this Doc in the appropriate format.
 
Methods inherited from class edu.illinois.cs.cogcomp.lbj.coref.ir.docs.DocBase
addHeadPrediction, addPredEntities, addRelation, addTrueEntity, addTrueMention, alignPredMentsToTrue, buildMentionsContaining, buildMentionsInSents, calcAndSetQuotes, getBestMentionFor, getCExampleFor, getCoherenceInfo, getCoherenceInfo, getCorefChains, getDocID, getEntities, getEntityFor, getEntityFor, getEntityFor, getGExampleFor, getHeadPrediction, getInCorpusInverseFreq, getInDocInverseFreq, getInverseTrueHeadFreq, getInverseTrueHeadFreq, getMention, getMentions, getMentionsContainedIn, getMentionsContaining, getMentionsInSent, getMentionsInSentences, getMentionsWithExtentStartingAt, getMentionsWithHeadStartingAt, getNumMentions, getNumRelations, getNumSentences, getPlainText, getPOS, getPOS, getPredEntities, getPredMention, getPredMentions, getQuoteNestLevel, getRelation, getSentNum, getStartCharNum, getTextFirstWordNum, getTrueEntities, getTrueMention, getTrueMentionFor, getTrueMentions, getWholeDocCounts, getWord, getWordNum, getWords, hasHeadPrediction, hasPredEntities, hasPredMentions, hasTrueEntities, hasTrueMentions, initMembersDefault, isCaseSensitive, loadChunkedText, loadFromText, loadFromText, loadPOSTaggerOutput, loadPOSTags, loadSGMText, makeBestMentionMap, makeChunk, printChunkValidity, recordWordLocation, removeTagsAndExtraNL, repeat, save, setCorpusCounts, setPlainText, setPOSTags, setPredEntities, setPredictedMentions, setQuoteLevels, setSentenceNumbers, setUsePredictedEntities, setUsePredictedMentions, setWords, setWords, sortEntitiesByListOrder, sortPredictedMentions, sortTrueMentions, toAnnotatedString, toAnnotatedString, toCoherenceTableString, toCoherenceTableString, toString, toSubstituteString, translateEscaped, usePredictedEntities, usePredictedMentions
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

serialVersionUID

private static final long serialVersionUID
See Also:
Constant Field Values
Constructor Detail

DocXMLBase

public DocXMLBase()
Basic constructor: Not recommended.


DocXMLBase

public DocXMLBase(java.lang.String filename,
                  java.lang.String ext)
           throws XMLException
Given the name of a file and its extension, loads the file and reads in the XML representation.

Parameters:
filename - The filename, which may or may not end with ext.
ext - The extension of the filename, without a leading period.
Throws:
XMLException

DocXMLBase

public DocXMLBase(java.lang.String filename,
                  java.lang.String ext,
                  DocBase.PosSource posSource)
           throws XMLException
Given the name of a file and its extension, loads the file and reads in the XML representation.

Parameters:
filename - The filename, which may or may not end with ext.
ext - The extension of the filename, without a leading period.
posSource - If PosSource.FILE, attempts to reproduce the previously published results more exactly. This requires a corpus that has been preprocessed offline using the CogComp preprocessing tools available at http://L2R.cs.uiuc.edu/~cogcomp. If PosSource.SNOW, uses a local SNoW-based preprocessor called tagger, whose location is given by the PATH_POS environment variable (which must be exported). This is generally slow. Otherwise, uses the LBJ preprocessor (fastest, but performance may differ from published results).
Throws:
XMLException
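
The three-way choice described for posSource can be pictured with a small sketch. Everything below is hypothetical: the enum Source only mirrors the two constants named above (FILE and SNOW) plus a catch-all OTHER, and describe is an illustrative helper, not a method of DocXMLBase. The only detail taken from the text is that the SNoW option depends on an exported PATH_POS environment variable.

```java
import java.util.Map;

// Hypothetical sketch; not part of the documented API.
final class PosSourceSketch {
    private PosSourceSketch() {}

    /** Hypothetical mirror of DocBase.PosSource; OTHER stands for "any other value". */
    enum Source { FILE, SNOW, OTHER }

    /** Returns a short description of the preprocessing path that would be taken. */
    static String describe(Source source, Map<String, String> env) {
        switch (source) {
            case FILE: {
                // Relies on a corpus that was preprocessed offline.
                return "read offline-preprocessed corpus files";
            }
            case SNOW: {
                // The SNoW tagger is located through PATH_POS, which must be exported.
                String path = env.get("PATH_POS");
                if (path == null || path.isEmpty()) {
                    throw new IllegalStateException("PATH_POS must be set and exported");
                }
                return "run SNoW tagger from " + path;
            }
            default: {
                // Fastest option, but results may differ from published numbers.
                return "use the LBJ preprocessor";
            }
        }
    }

    public static void main(String[] args) {
        // Pass the real environment when running outside of tests.
        System.out.println(describe(Source.OTHER, System.getenv()));
    }
}
```

Passing the environment as a Map keeps the sketch testable; a real caller would pass System.getenv().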

DocXMLBase

public DocXMLBase(java.lang.String baseFilename,
                  java.lang.String ext,
                  LBJ2.classify.Classifier caser)
           throws XMLException
Throws:
XMLException
Method Detail

loadXML

public void loadXML(java.lang.String filename)
             throws XMLException
Parameters:
filename - The file to load, containing the XML representation.
Throws:
XMLException
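
How loadXML reads its input is not described here. As a rough illustration of the general pattern only (parse XML into a DOM and surface failures as one checked exception), the sketch below uses the standard javax.xml.parsers API. XmlLoadException is a hypothetical stand-in for the library's XMLException, whose definition is not shown on this page, and parse is an illustrative helper rather than the actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical sketch; not part of the documented API.
final class LoadXmlSketch {
    private LoadXmlSketch() {}

    /** Hypothetical stand-in for the library's XMLException. */
    static final class XmlLoadException extends Exception {
        XmlLoadException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Parses an XML string into a DOM Document, wrapping any parser failure. */
    static Document parse(String xml) throws XmlLoadException {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new XmlLoadException("Could not parse XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) throws XmlLoadException {
        Element root = parse("<doc><entity/></doc>").getDocumentElement();
        System.out.println(root.getNodeName());
    }
}
```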

loadEntity

protected abstract Entity loadEntity(org.w3c.dom.Node nEntity)
                              throws XMLException
Throws:
XMLException

loadRelation

protected Relation loadRelation(org.w3c.dom.Element node)
                         throws XMLException
Loads a Relation from an XML representation and returns it.

Throws:
XMLException

processAttributes

protected java.util.List<Chunk> processAttributes(org.w3c.dom.Element parent,
                                                  java.lang.String attrName)
                                           throws XMLException
Gets all Chunks found inside parent with nodeName attrName.

Parameters:
parent - The parent of the children that have the name attrName.
attrName - Name of children to extract.
Throws:
XMLException
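
The selection step described for processAttributes (children of parent whose node name equals attrName) can be sketched with the standard org.w3c.dom API. This is a minimal sketch under assumptions: it returns the matching Element nodes rather than Chunks, it considers direct children only (the description does not say whether deeper descendants count), and the class, helper names, and tag names in the example are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch; not part of the documented API.
final class ChildElementsSketch {
    private ChildElementsSketch() {}

    /** Collects the direct child elements of parent whose node name equals name. */
    static List<Element> childElementsNamed(Element parent, String name) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip text and comment nodes; keep only elements with the wanted name.
            if (child.getNodeType() == Node.ELEMENT_NODE && child.getNodeName().equals(name)) {
                result.add((Element) child);
            }
        }
        return result;
    }

    /** Test helper: parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse example XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot("<mention><attr>a</attr><other/><attr>b</attr></mention>");
        System.out.println(childElementsNamed(root, "attr").size());
    }
}
```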

processEntityMention

protected Mention processEntityMention(org.w3c.dom.Element node,
                                       java.lang.String entityID,
                                       java.lang.String entityType,
                                       java.lang.String subtype,
                                       java.lang.String specificity)
                                throws XMLException
Processes a mentionType_mention tag. Must not be called until counting texts and word split texts have been processed.

Parameters:
node - A mentionType_mention node.
entityID - The ID of the entity that this mention refers to.
entityType - The entity type.
subtype - The subtype of the entity type.
specificity - The specificity ("SPC" or "GEN") of the mention.
Returns:
The processed mention.
Throws:
XMLException - If the XML cannot be processed.

processRelationMention

private RelationMention processRelationMention(org.w3c.dom.Element element)
                                        throws XMLException
Throws:
XMLException

processRelationMentionArgument

private RelationMentionArgument processRelationMentionArgument(org.w3c.dom.Element node)
                                                        throws XMLException
Throws:
XMLException

processRelationEntityArgument

private RelationEntityArgument processRelationEntityArgument(org.w3c.dom.Element node)

getOptAttrib

protected java.lang.String getOptAttrib(org.w3c.dom.NamedNodeMap attribs,
                                        java.lang.String attribName,
                                        java.lang.String defaultResult)
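
No description is given for getOptAttrib. Judging only from its name and signature, it plausibly returns the value of an optional attribute, or defaultResult when the attribute is absent. The sketch below shows that reading with the standard org.w3c.dom API; it is an assumption about the behavior, not the documented contract, and the class, helper names, and attribute names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical sketch; not part of the documented API.
final class OptAttribSketch {
    private OptAttribSketch() {}

    /** Returns the attribute's value, or defaultResult if the attribute is missing. */
    static String optAttrib(NamedNodeMap attribs, String attribName, String defaultResult) {
        // getNamedItem returns null when no attribute with that name exists.
        Node attr = attribs.getNamedItem(attribName);
        return attr == null ? defaultResult : attr.getNodeValue();
    }

    /** Test helper: parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse example XML", e);
        }
    }

    public static void main(String[] args) {
        Element entity = parseRoot("<entity TYPE=\"PER\"/>");
        System.out.println(optAttrib(entity.getAttributes(), "SUBTYPE", "NONE"));
    }
}
```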

findAndProcessChunk

private Chunk findAndProcessChunk(org.w3c.dom.Element parent,
                                  java.lang.String tagName)
                           throws XMLException
Finds and loads a chunk.

Parameters:
parent - Parent of Node with name tagName.
tagName - tagName of desired chunk.
Returns:
The desired Chunk.
Throws:
XMLException

processChunk

protected abstract Chunk processChunk(org.w3c.dom.Element element)
                               throws XMLException
Loads a chunk.

Parameters:
element - An element containing a charseq Element.
Returns:
The desired chunk.
Throws:
XMLException

getShortEID

public java.lang.String getShortEID(java.lang.String longID)
Overrides:
getShortEID in class DocBase

write

public abstract void write(boolean usePredictions)
Description copied from interface: Doc
Writes this Doc in the appropriate format.

Specified by:
write in interface Doc
Overrides:
write in class DocBase
Parameters:
usePredictions - Whether predicted mentions and entities should be written.

write

public abstract void write(java.lang.String filenameBase,
                           boolean usePredictions)
Description copied from interface: Doc
Writes this Doc in the appropriate format.

Specified by:
write in interface Doc
Specified by:
write in class DocBase
Parameters:
filenameBase - The name of the target file.
usePredictions - Whether predicted mentions and entities should be written.

getBaseFilename

protected abstract java.lang.String getBaseFilename(java.lang.String filename)
Trims a possible extension from the filename.
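
Because getBaseFilename is abstract, each subclass supplies its own trimming rule. A minimal sketch of one plausible rule follows, assuming the extension is passed without a leading period (as the constructors document for ext) and is removed only when present. The class and the method trimExtension are hypothetical, and the two-argument form is used only to keep the example self-contained.

```java
// Hypothetical sketch; not part of the documented API.
final class BaseFilenameSketch {
    private BaseFilenameSketch() {}

    /**
     * Removes "." + ext from the end of filename if it is there;
     * otherwise returns filename unchanged.
     */
    static String trimExtension(String filename, String ext) {
        String suffix = "." + ext;
        if (filename.endsWith(suffix)) {
            return filename.substring(0, filename.length() - suffix.length());
        }
        return filename;
    }

    public static void main(String[] args) {
        System.out.println(trimExtension("doc01.xml", "xml"));
        System.out.println(trimExtension("doc01", "xml"));
    }
}
```

This matches the constructor contract that the filename "may or may not end with ext": both spellings map to the same base name.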


toXMLString

protected java.lang.String toXMLString(Mention m,
                                       java.lang.String linePrefix)

toXMLString

protected abstract java.lang.String toXMLString(Chunk c)

toXMLString

protected java.lang.String toXMLString(java.lang.String plainText)
Converts plain text to an XML-safe format by escaping ampersands.
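
The escaping step can be sketched in one line. The helper below is hypothetical and covers only what the description states, replacing each ampersand with its entity; whether the real method also handles other characters such as angle brackets is not stated here.

```java
// Hypothetical sketch; not part of the documented API.
final class XmlEscapeSketch {
    private XmlEscapeSketch() {}

    /** Replaces every '&' with "&amp;" and leaves all other characters untouched. */
    static String escapeAmpersands(String plainText) {
        return plainText.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeAmpersands("AT&T and R&D"));
    }
}
```

Note that a plain replacement like this also escapes ampersands that already begin an entity, so it should be applied to raw text only once.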


toXMLString

protected java.lang.String toXMLString(Relation r)

foundPredEnt

private boolean foundPredEnt(java.lang.String eID)

toXMLString

private java.lang.String toXMLString(RelationEntityArgument a,
                                     int argNum)

toXMLString

private java.lang.String toXMLString(RelationMention m,
                                     java.lang.String linePrefix)

toXMLString

private java.lang.String toXMLString(RelationMentionArgument a,
                                     java.lang.String linePrefix)