edu.illinois.cs.cogcomp.lbj.coref.parsers
Class EMParser

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.parsers.EMParser
All Implemented Interfaces:
LBJ2.parse.Parser

public class EMParser
extends java.lang.Object
implements LBJ2.parse.Parser

Extracts mentions for use in training an LBJ classifier. The mentions are extracted from a corpus of documents, specified by the name of a file that contains a list of document filenames. All mentions in the specified documents are extracted.

Author:
Eric Bengtson

Field Summary
private  java.util.List<Doc> m_docs
           
private  java.util.List<Mention> m_examples
           
private  int m_iD
           
private  int m_iX
           
 
Constructor Summary
EMParser(java.lang.String fileListFN)
          Constructs a Parser that extracts mentions for use in training an LBJ classifier.
 
Method Summary
private  void advanceDoc()
          Prepares to extract mentions from the next document (including resetting the document).
protected  void cleanup()
          Called immediately before next returns null.
 void close()
           
 void enqueue(java.lang.Object q)
          Does nothing.
private  java.util.List<Mention> getExamples(int iD)
          Loads all mentions from the current document using Doc.getMentions().
private  Mention getNextExample()
          Gets the current mention from the cache and prepares for the next example.
 Mention next()
          Gets the next mention, or null if no more mentions remain.
 void reset()
          Resets the parser to the first document in the corpus and resets the position within the (first) document.
private  void resetDoc()
          Resets the document, including caching the mentions from the current document.
protected  void startup(DocLoader loader)
          Prepares the parser by loading documents and resetting.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_docs

private java.util.List<Doc> m_docs

m_examples

private java.util.List<Mention> m_examples

m_iD

private int m_iD

m_iX

private int m_iX
Constructor Detail

EMParser

public EMParser(java.lang.String fileListFN)
Constructs a Parser that extracts mentions for use in training an LBJ classifier. The mentions are extracted from a corpus of documents, specified by the name of a file that contains a list of document filenames. All mentions in the specified documents are extracted.

Method Detail

next

public Mention next()
Gets the next mention, or null if no more mentions remain.

Specified by:
next in interface LBJ2.parse.Parser
Returns:
The next mention or null if none remain.

reset

public void reset()
Resets the parser to the first document in the corpus and resets the position within the (first) document. It is not necessary to call this method before the first call to next.

Specified by:
reset in interface LBJ2.parse.Parser
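
The contract described for next and reset (read until null, rewind for another pass) can be sketched as follows. This is an illustration only: MentionSource and ListSource are hypothetical stand-ins written so the example runs without a corpus or the LBJ2 library, and they model only the next, reset, and close methods documented on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the three parser methods used here;
// this is NOT the LBJ2.parse.Parser interface itself.
interface MentionSource {
    Object next();   // next example, or null when none remain
    void reset();    // rewind to the first example
    void close();    // release any resources
}

// Hypothetical in-memory source, used only to make the sketch runnable.
final class ListSource implements MentionSource {
    private final List<String> items;
    private int pos = 0;

    ListSource(List<String> items) { this.items = new ArrayList<>(items); }

    public Object next() { return pos < items.size() ? items.get(pos++) : null; }
    public void reset() { pos = 0; }
    public void close() { }
}

final class ParserLoopSketch {
    // Reads examples until next() returns null and collects them in order.
    static List<Object> drain(MentionSource source) {
        List<Object> seen = new ArrayList<>();
        for (Object ex = source.next(); ex != null; ex = source.next()) {
            seen.add(ex);
        }
        return seen;
    }

    public static void main(String[] args) {
        MentionSource source = new ListSource(Arrays.asList("m1", "m2", "m3"));
        List<Object> firstPass = drain(source);   // no reset() needed before the first pass
        source.reset();                           // rewind for a second pass
        List<Object> secondPass = drain(source);
        source.close();
        System.out.println(firstPass.equals(secondPass));
    }
}
```

With a real EMParser the same loop shape would apply, with each non-null result of next being a Mention.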

close

public void close()
Specified by:
close in interface LBJ2.parse.Parser

enqueue

public void enqueue(java.lang.Object q)
Does nothing.

Parameters:
q - An arbitrary object.

getNextExample

private Mention getNextExample()
Gets the current mention from the cache and prepares for the next example. Call only after m_examples is initialized and when m_iX is less than the size of m_examples.

Returns:
The current mention (but never null).

advanceDoc

private void advanceDoc()
Prepares to extract mentions from the next document (including resetting the document). Safe to call even when no additional documents remain.


resetDoc

private void resetDoc()
Resets the document, including caching the mentions from the current document. Safe to call even if document is empty or does not exist.


getExamples

private java.util.List<Mention> getExamples(int iD)
Loads all mentions from the current document using Doc.getMentions(). Should not be called if the document does not exist.

Returns:
A list of mentions as retrieved using Doc.getMentions().
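
The private helpers above suggest a two-cursor traversal: one index over documents (m_iD) and one over the cached mentions of the current document (m_iX). The sketch below is one plausible way such a traversal could fit together, not the actual EMParser source. TwoCursorSketch is a hypothetical class, documents are modeled as plain lists of strings instead of Doc and Mention, and the cleanup hook is not modeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: each inner list stands in for one document's mentions.
final class TwoCursorSketch {
    private final List<List<String>> docs;   // plays the role of m_docs
    private List<String> examples;           // plays the role of m_examples
    private int iD;                          // document cursor, like m_iD
    private int iX;                          // example cursor, like m_iX

    TwoCursorSketch(List<List<String>> docs) {
        this.docs = docs;
        reset();
    }

    // Returns to the first document and the first example within it.
    void reset() {
        iD = 0;
        resetDoc();
    }

    // Returns the next example, or null once every document is exhausted.
    String next() {
        // Skip over documents that are empty or already fully consumed.
        while (iD < docs.size() && iX >= examples.size()) {
            advanceDoc();
        }
        if (iD >= docs.size()) {
            return null;
        }
        return examples.get(iX++);
    }

    // Moves to the following document and caches its examples.
    private void advanceDoc() {
        iD++;
        resetDoc();
    }

    // Caches the current document's examples; uses an empty list if no document exists.
    private void resetDoc() {
        iX = 0;
        examples = iD < docs.size() ? docs.get(iD) : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        TwoCursorSketch p = new TwoCursorSketch(Arrays.asList(
                Arrays.asList("a"),
                Collections.<String>emptyList(),
                Arrays.asList("b", "c")));
        for (String ex = p.next(); ex != null; ex = p.next()) {
            System.out.println(ex);
        }
    }
}
```

Falling back to an empty list in resetDoc mirrors the note that resetDoc is safe to call when the document is empty or does not exist, so next needs no separate end-of-corpus branch inside the loop.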

startup

protected void startup(DocLoader loader)
Prepares the parser by loading documents and resetting.

Parameters:
loader - The loader from which to get the documents.

cleanup

protected void cleanup()
Called immediately before next returns null. Currently does nothing, but can be used to save caches or record statistics.