edu.illinois.cs.cogcomp.lbj.coref.parsers
Class CoParser

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.parsers.CoParser
All Implemented Interfaces:
LBJ2.parse.Parser

public class CoParser
extends java.lang.Object
implements LBJ2.parse.Parser

Extracts coreference examples for use in training an LBJ classifier. The examples are extracted from a corpus of documents, which is specified either by the name of a file that lists the document filenames or by a document loader. From each document, examples are extracted according to the specified example extractor. See the constructors for details. To extract examples, call the next method repeatedly until it returns null.
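
The call pattern described above (call next until it returns null) can be sketched as follows. This is a minimal illustration, not the library's code: ExampleSource, ListSource, and DrainSketch are hypothetical stand-ins that mirror only the next/reset behavior described on this page, so that the sketch compiles without LBJ or the coref package on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DrainSketch {
    // Calls next() until it returns null and collects every example seen.
    static List<Object> drain(ExampleSource source) {
        List<Object> out = new ArrayList<>();
        for (Object ex = source.next(); ex != null; ex = source.next()) {
            out.add(ex);
        }
        return out;
    }

    public static void main(String[] args) {
        ExampleSource source = new ListSource(Arrays.asList("ex1", "ex2", "ex3"));
        System.out.println("first pass:  " + drain(source).size());
        source.reset(); // rewind before a second pass over the same examples
        System.out.println("second pass: " + drain(source).size());
    }
}

// Hypothetical stand-in for the next()/reset() part of the parser contract.
interface ExampleSource {
    Object next();  // returns null when no examples remain
    void reset();   // rewinds to the first example
}

// Hypothetical list-backed source, present only to make the sketch runnable.
class ListSource implements ExampleSource {
    private final List<?> items;
    private int index = 0;

    ListSource(List<?> items) { this.items = items; }

    public Object next() {
        return index < items.size() ? items.get(index++) : null;
    }

    public void reset() { index = 0; }
}
```

With a real CoParser the same loop shape would apply, with next returning CExample objects instead of strings.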

Author:
Eric Bengtson

Field Summary
private  CExampleExtractor m_cExExtractor
           
private  java.util.List<Doc> m_docs
           
private  java.util.List<CExample> m_examples
           
private  int m_iD
           
private  int m_iX
           
 
Constructor Summary
CoParser(DocLoader loader, CExampleExtractor extractor)
          Constructs a Parser that extracts coreference examples from a corpus, with documents loaded by a specified document loader and coreference examples extracted from each document using the specified example extractor.
CoParser(java.lang.String fileListFN)
          Constructs a Parser that extracts coreference examples from a corpus loaded using the default document loader, with examples extracted using the default example extractor.
CoParser(java.lang.String fileListFN, CExampleExtractor extractor)
          Constructs a Parser that extracts coreference examples from a corpus loaded using the default document loader, as specified by DocLoader.getDefaultLoader(java.lang.String), with examples extracted using the specified example extractor.
 
Method Summary
private  void advanceDoc()
          Prepares to extract examples from the next document (including resetting the document).
protected  void cleanup()
          Called immediately before next returns null.
 void close()
           
 void enqueue(java.lang.Object q)
          Does nothing.
private  java.util.List<CExample> getExamples()
          Loads all examples from the example extractor.
private  CExample getNextExample()
          Gets an example from the cache and prepares for the next example.
 CExample next()
          Gets the next coreference example, or null if no more examples remain.
 void reset()
          Resets the parser to the first document in the corpus and resets the example extractor.
private  void resetDoc()
          Resets the document, including caching the examples from the example extractor.
protected  void startup(DocLoader loader)
          Prepares the parser by loading documents and resetting the doc.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_docs

private java.util.List<Doc> m_docs

m_cExExtractor

private CExampleExtractor m_cExExtractor

m_examples

private java.util.List<CExample> m_examples

m_iD

private int m_iD

m_iX

private int m_iX
Constructor Detail

CoParser

public CoParser(java.lang.String fileListFN)
Constructs a Parser that extracts coreference examples from a corpus loaded using the default document loader, with examples extracted using the default example extractor. The default example extractor is currently CExExClosestPosAllNeg, which builds examples as follows: for each mention, it creates a positive example with the nearest preceding coreferential mention and a negative example with each preceding non-coreferential mention. It does not include any cataphoric examples (examples where a pronoun precedes a non-pronoun).

Parameters:
fileListFN - The classpath-relative filename of the corpus file, containing a list of document filenames, one per line. Each filename should be specified relative to a location in the classpath.
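
The pairing rule described for the default extractor can be sketched as below. This is an illustration under stated assumptions, not the source of CExExClosestPosAllNeg: the Mention and Pair types are hypothetical, mentions are assumed to arrive in document order with a gold entity id, and a cataphoric pair is interpreted as one whose earlier mention is a pronoun while the later one is not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PairSketch {
    // Hypothetical mention: a gold entity id and a pronoun flag.
    static final class Mention {
        final int entity;
        final boolean pronoun;
        Mention(int entity, boolean pronoun) {
            this.entity = entity;
            this.pronoun = pronoun;
        }
    }

    // Hypothetical example: indices of the two mentions plus the label.
    static final class Pair {
        final int antecedent;
        final int mention;
        final boolean coreferent;
        Pair(int antecedent, int mention, boolean coreferent) {
            this.antecedent = antecedent;
            this.mention = mention;
            this.coreferent = coreferent;
        }
    }

    static List<Pair> extract(List<Mention> mentions) {
        List<Pair> pairs = new ArrayList<>();
        for (int j = 0; j < mentions.size(); j++) {
            Mention m = mentions.get(j);
            boolean positiveFound = false;
            // Walk backwards so the first coreferential hit is the nearest one.
            for (int i = j - 1; i >= 0; i--) {
                Mention a = mentions.get(i);
                // Assumed cataphora filter: pronoun first, non-pronoun second.
                if (a.pronoun && !m.pronoun) continue;
                if (a.entity == m.entity) {
                    if (!positiveFound) {
                        pairs.add(new Pair(i, j, true));
                        positiveFound = true;
                    }
                } else {
                    pairs.add(new Pair(i, j, false));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Mention> mentions = Arrays.asList(
            new Mention(1, false), new Mention(2, false),
            new Mention(1, true), new Mention(1, false));
        for (Pair p : extract(mentions)) {
            System.out.println(p.antecedent + " -> " + p.mention + " : " + p.coreferent);
        }
    }
}
```

In this sketch a mention yields at most one positive pair, while the number of negative pairs grows with the number of preceding mentions from other entities.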

CoParser

public CoParser(DocLoader loader,
                CExampleExtractor extractor)
Constructs a Parser that extracts coreference examples from a corpus, with documents loaded by a specified document loader and coreference examples extracted from each document using the specified example extractor.

Parameters:
loader - A document loader that loads a corpus of documents.
extractor - A coreference example extractor.

CoParser

public CoParser(java.lang.String fileListFN,
                CExampleExtractor extractor)
Constructs a Parser that extracts coreference examples from a corpus loaded using the default document loader, as specified by DocLoader.getDefaultLoader(java.lang.String), with examples extracted using the specified example extractor.

Parameters:
fileListFN - The classpath-relative filename of the corpus file, containing a list of document filenames, one per line. Each filename should be specified relative to a location in the classpath.
extractor - A coreference example extractor.
Method Detail

next

public CExample next()
Gets the next coreference example, or null if no more examples remain.

Specified by:
next in interface LBJ2.parse.Parser
Returns:
The next coreference example or null if none remain.

reset

public void reset()
Resets the parser to the first document in the corpus and resets the example extractor. It is not necessary to call this method before the first call to next.

Specified by:
reset in interface LBJ2.parse.Parser

close

public void close()
Specified by:
close in interface LBJ2.parse.Parser

enqueue

public void enqueue(java.lang.Object q)
Does nothing.

Parameters:
q - An arbitrary object.

getNextExample

private CExample getNextExample()
Gets an example from the cache and prepares for the next example. Call only after m_examples is initialized and when m_iX is less than the size of m_examples.

Returns:
The next example (but never null).

advanceDoc

private void advanceDoc()
Prepares to extract examples from the next document (including resetting the document). Safe to call even when no additional documents remain.


resetDoc

private void resetDoc()
Resets the document, including caching the examples from the example extractor. Safe to call even if the document is empty or does not exist.


getExamples

private java.util.List<CExample> getExamples()
Loads all examples from the example extractor. This includes calling setDoc on the example extractor to set the doc to the current document (as indicated by m_iD). Should not be called if the document does not exist.

Returns:
A list of the examples, in the order returned by the example extractor.
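
One way the private helpers above could fit together is a two-level cursor: a document index (the role m_iD appears to play) and an index into the cached examples of the current document (the role of m_iX). The sketch below is an assumption about that design, not the actual implementation. CursorSketch is hypothetical, and each inner list stands in for the examples an extractor would return for one document.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CursorSketch {
    private final List<List<String>> docs;  // one inner list per document
    private List<String> examples = Collections.<String>emptyList();
    private int docIndex = 0;      // plays the role described for m_iD
    private int exampleIndex = 0;  // plays the role described for m_iX

    CursorSketch(List<List<String>> docs) {
        this.docs = docs;
        resetDoc();
    }

    // Returns the next example, skipping empty documents, or null at the end.
    String next() {
        while (exampleIndex >= examples.size()) {
            if (docIndex >= docs.size()) return null;
            advanceDoc();
        }
        return examples.get(exampleIndex++);
    }

    // Rewinds to the first document.
    void reset() {
        docIndex = 0;
        resetDoc();
    }

    // Moves to the following document; harmless when none remain.
    private void advanceDoc() {
        docIndex++;
        resetDoc();
    }

    // Caches the current document's examples, or an empty list if there is none.
    private void resetDoc() {
        exampleIndex = 0;
        examples = docIndex < docs.size()
            ? docs.get(docIndex)
            : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        CursorSketch cursor = new CursorSketch(Arrays.asList(
            Arrays.asList("a", "b"),
            Collections.<String>emptyList(),
            Arrays.asList("c")));
        for (String ex = cursor.next(); ex != null; ex = cursor.next()) {
            System.out.println(ex);
        }
    }
}
```

Guarding resetDoc against a missing document is what lets advanceDoc be called safely after the last document in this sketch.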

startup

protected void startup(DocLoader loader)
Prepares the parser by loading documents and resetting the doc.

Parameters:
loader - The loader from which to get the documents.

cleanup

protected void cleanup()
Called immediately before next returns null. Currently does nothing, but can be used to save caches or record statistics.
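
Because cleanup is protected, a subclass can use it as a hook, for example to record statistics. The sketch below illustrates the idea with hypothetical stand-in classes (BaseParser and CountingParser are not part of the library) that mirror only the documented contract that cleanup runs immediately before next returns null.

```java
class HookSketch {
    // Hypothetical base class: only mirrors "cleanup() runs right before next() returns null".
    static class BaseParser {
        private final String[] items;
        private int index = 0;

        BaseParser(String... items) { this.items = items; }

        public String next() {
            if (index < items.length) return items[index++];
            cleanup();
            return null;
        }

        // No-op by default, like the documented cleanup method.
        protected void cleanup() { }
    }

    // Hypothetical subclass that records simple statistics.
    static class CountingParser extends BaseParser {
        int served = 0;    // examples handed out so far
        int cleanups = 0;  // times the hook has fired

        CountingParser(String... items) { super(items); }

        @Override
        public String next() {
            String ex = super.next();
            if (ex != null) served++;
            return ex;
        }

        @Override
        protected void cleanup() { cleanups++; }
    }

    public static void main(String[] args) {
        CountingParser parser = new CountingParser("ex1", "ex2");
        while (parser.next() != null) { /* consume examples */ }
        System.out.println("served=" + parser.served + " cleanups=" + parser.cleanups);
    }
}
```

In this sketch the hook fires each time next returns null, so calling next again after exhaustion would trigger it again; a subclass that must run its cleanup only once would need its own guard.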