edu.illinois.cs.cogcomp.lbj.coref.parsers
Class BIOParser

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.parsers.BIOParser
All Implemented Interfaces:
LBJ2.parse.Parser

public class BIOParser
extends java.lang.Object
implements LBJ2.parse.Parser

Extracts examples of mention chunks, one per word, for training a mention detection classifier. Each example represents one word and indicates whether the word begins, is inside, or ends a head and/or extent of a mention. The examples are extracted from a corpus of documents specified by providing a document loader. To extract examples, repeatedly call the next method until it returns null.
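
The next-until-null protocol described above can be sketched as follows. This is a minimal illustration, not code from this package: ExampleSource and drain are hypothetical stand-ins, and a fixed list replaces the corpus so the sketch runs without LBJ2 or a DocLoader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class DrainSketch {
    /** Hypothetical stand-in for the next-until-null contract. */
    public interface ExampleSource {
        Object next(); // returns null once no examples remain
    }

    /** Collects every example by calling next() until it returns null. */
    public static List<Object> drain(ExampleSource source) {
        List<Object> examples = new ArrayList<>();
        for (Object ex = source.next(); ex != null; ex = source.next()) {
            examples.add(ex);
        }
        return examples;
    }

    public static void main(String[] args) {
        // Stand-in source backed by a fixed list (assumption); a real
        // BIOParser would be constructed from a DocLoader instead.
        Iterator<String> it = Arrays.asList("ex1", "ex2", "ex3").iterator();
        ExampleSource source = () -> it.hasNext() ? it.next() : null;
        System.out.println(drain(source));
    }
}
```

With a real BIOParser, the same loop shape would apply to the BIOExample values returned by next.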

Author:
Eric Bengtson

Field Summary
protected  java.util.List<Doc> m_docs
           
protected  java.util.List<BIOExample> m_examples
           
protected  int m_iD
           
protected  int m_iX
           
private  int m_numExamplesProcessed
           
 
Constructor Summary
BIOParser(DocLoader loader)
          Constructs a Parser that extracts examples from a corpus, with documents loaded by a specified document loader.
 
Method Summary
private  void advanceDoc()
          Prepares to extract examples from the next document (including resetting the document).
protected  void cleanup()
          Called immediately before next returns null.
 void close()
           
 void enqueue(java.lang.Object q)
          Does nothing.
 java.util.List<BIOExample> getBIOExamples(Doc d)
           
private  BIOExample getNextExample()
          Gets an example from the cache and prepares for the next example.
 BIOExample next()
          Gets the next example, or null if no more examples remain.
 void reset()
          Resets the parser to the first document in the corpus.
private  void resetDoc()
          Resets the document, including caching the examples from the example extractor.
protected  void startup(DocLoader loader)
          Prepares the parser by loading documents and resetting the current document.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_docs

protected java.util.List<Doc> m_docs

m_examples

protected java.util.List<BIOExample> m_examples

m_iD

protected int m_iD

m_iX

protected int m_iX

m_numExamplesProcessed

private int m_numExamplesProcessed

Constructor Detail

BIOParser

public BIOParser(DocLoader loader)
Constructs a Parser that extracts examples from a corpus, with documents loaded by a specified document loader.

Parameters:
loader - A document loader that loads a corpus of documents.

Method Detail

next

public BIOExample next()
Gets the next example, or null if no more examples remain.

Specified by:
next in interface LBJ2.parse.Parser
Returns:
The next example or null if none remain.

reset

public void reset()
Resets the parser to the first document in the corpus. It is not necessary to call this method before the first call to next.

Specified by:
reset in interface LBJ2.parse.Parser
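
A typical reason to call reset is to make more than one pass over the same corpus, for example across training rounds. The sketch below assumes a hypothetical list-backed ListParser in place of BIOParser; only the calling pattern (no reset before the first pass, reset before each later pass) reflects the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResetSketch {
    /** Hypothetical list-backed stand-in for a resettable parser. */
    public static class ListParser {
        private final List<String> items;
        private int cursor = 0;

        public ListParser(List<String> items) { this.items = items; }

        /** Returns the next item, or null once the list is exhausted. */
        public String next() {
            return cursor < items.size() ? items.get(cursor++) : null;
        }

        /** Rewinds to the first item so another pass can begin. */
        public void reset() { cursor = 0; }
    }

    /** Runs the given number of passes, calling reset() only between passes. */
    public static List<List<String>> passes(ListParser parser, int count) {
        List<List<String>> all = new ArrayList<>();
        for (int pass = 0; pass < count; pass++) {
            if (pass > 0) parser.reset(); // not needed before the first pass
            List<String> seen = new ArrayList<>();
            for (String ex = parser.next(); ex != null; ex = parser.next()) {
                seen.add(ex);
            }
            all.add(seen);
        }
        return all;
    }

    public static void main(String[] args) {
        ListParser parser = new ListParser(Arrays.asList("w1", "w2", "w3"));
        System.out.println(passes(parser, 2));
    }
}
```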

close

public void close()
Specified by:
close in interface LBJ2.parse.Parser

enqueue

public void enqueue(java.lang.Object q)
Does nothing.

Parameters:
q - An arbitrary object.

getBIOExamples

public java.util.List<BIOExample> getBIOExamples(Doc d)
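
This page gives no description for getBIOExamples. Going by the class description (one example per word, marking where the word falls relative to a mention), per-word labeling might look like the sketch below. It assumes the common B/I/O scheme and non-overlapping mention spans given as word indices; the labels that BIOExample actually carries, and its separate treatment of heads and extents, may differ. BioLabelSketch and label are hypothetical names.

```java
import java.util.Arrays;

public class BioLabelSketch {
    /**
     * Labels each of n words as B (begins a mention), I (inside a mention),
     * or O (outside any mention). Each span is {start, endInclusive} in word
     * indices; spans are assumed not to overlap.
     */
    public static String[] label(int n, int[][] spans) {
        String[] labels = new String[n];
        Arrays.fill(labels, "O");
        for (int[] span : spans) {
            labels[span[0]] = "B";
            for (int i = span[0] + 1; i <= span[1]; i++) {
                labels[i] = "I";
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Five words, with one mention covering words 0 through 3.
        String[] labels = label(5, new int[][] {{0, 3}});
        System.out.println(Arrays.toString(labels)); // prints [B, I, I, I, O]
    }
}
```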

getNextExample

private BIOExample getNextExample()
Gets an example from the cache and prepares for the next example. Call only after m_examples is initialized and when m_iX is less than the size of m_examples.

Returns:
The next example (but never null).

advanceDoc

private void advanceDoc()
Prepares to extract examples from the next document (including resetting the document). Safe to call even when no additional documents remain.


resetDoc

private void resetDoc()
Resets the document, including caching the examples from the example extractor. Safe to call even if document is empty or does not exist.
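
Taken together, m_iD, m_iX, advanceDoc, resetDoc, and getNextExample suggest a two-level cursor: one index over documents and one over the cached examples of the current document. The following sketch shows one way such a cursor could work. It is not the actual BIOParser source; lists of strings stand in for Doc and BIOExample, and the field names merely echo the ones documented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoLevelCursorSketch {
    private final List<List<String>> docs; // stand-in for m_docs
    private List<String> examples;         // stand-in for m_examples
    private int iD = 0;                    // document index, like m_iD
    private int iX = 0;                    // example index, like m_iX

    public TwoLevelCursorSketch(List<List<String>> docs) {
        this.docs = docs;
        resetDoc();
    }

    /** Caches the current document's examples; tolerates a missing document. */
    private void resetDoc() {
        iX = 0;
        examples = iD < docs.size() ? docs.get(iD) : new ArrayList<String>();
    }

    /** Moves to the next document; harmless when none remain. */
    private void advanceDoc() {
        iD++;
        resetDoc();
    }

    /** Returns the next example across all documents, or null at the end. */
    public String next() {
        // Skip documents that are exhausted or empty.
        while (iX >= examples.size() && iD < docs.size()) {
            advanceDoc();
        }
        return iX < examples.size() ? examples.get(iX++) : null;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("a", "b"), new ArrayList<String>(), Arrays.asList("c"));
        TwoLevelCursorSketch cursor = new TwoLevelCursorSketch(corpus);
        for (String ex = cursor.next(); ex != null; ex = cursor.next()) {
            System.out.println(ex);
        }
    }
}
```

The loop in next skips empty documents, which is why resetDoc and advanceDoc must be safe to call in the edge cases noted above.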


startup

protected void startup(DocLoader loader)
Prepares the parser by loading documents and resetting the current document.

Parameters:
loader - The loader from which to get the documents.

cleanup

protected void cleanup()
Called immediately before next returns null. Currently does nothing, but can be used to save caches or record statistics.
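
A subclass could override cleanup to record statistics, as the description suggests. The sketch below illustrates the hook pattern with a hypothetical BaseParser in place of BIOParser; in this stand-in the hook runs every time next is about to return null, which may or may not match the real class.

```java
import java.util.Arrays;
import java.util.List;

public class CleanupHookSketch {
    /** Hypothetical stand-in for a parser that exposes a cleanup hook. */
    public static class BaseParser {
        private final List<String> items;
        private int cursor = 0;

        public BaseParser(List<String> items) { this.items = items; }

        public String next() {
            if (cursor < items.size()) {
                return items.get(cursor++);
            }
            cleanup(); // hook runs immediately before null is returned
            return null;
        }

        /** Does nothing by default; subclasses may override. */
        protected void cleanup() { }
    }

    /** Counts hook invocations as a stand-in for recording statistics. */
    public static class CountingParser extends BaseParser {
        public int cleanupCalls = 0;

        public CountingParser(List<String> items) { super(items); }

        @Override
        protected void cleanup() { cleanupCalls++; }
    }

    public static void main(String[] args) {
        CountingParser parser = new CountingParser(Arrays.asList("w1", "w2"));
        while (parser.next() != null) {
            // consume examples
        }
        System.out.println("cleanup calls: " + parser.cleanupCalls);
    }
}
```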