public class EREEventReader extends EREMentionRelationReader
Inherited nested classes: EREDocumentReader.EreCorpus
Inherited fields:
ends, IS_FOUND, starts
ARG_ONE, ARG_TWO, AUTHOR, CORPUS_TYPE, DATELINE, DATETIME, deletableSpanTags, DOC, ENTITIES, ENTITY, ENTITY_ID, ENTITY_MENTION, ENTITY_MENTION_ID, EntityHeadEndCharOffset, EntityHeadStartCharOffset, EntityIdAttribute, EntityKbIdAttribute, EntityMentionIdAttribute, EntityMentionTypeAttribute, EntitySpecificityAttribute, EVENT_ARGUMENT, EVENT_MENTION, EventIdAttribute, EventMentionIdAttribute, FILL, FILLER, FILLER_ID, FILLERS, HEADLINE, HOPPER, HOPPERS, ID, IMG, KBID, LENGTH, MENTION_HEAD, MENTION_TEXT, NAM, NAME_END, NAME_START, NOM, NOUN_TYPE, OFFSET, ORIG_AUTHOR, ORIGIN, POST, PRO, QUOTE, REALIS, RELATION, RELATION_MENTION, RelationIdAttribute, RelationMentionIdAttribute, RelationRealisAttribute, RELATIONS, RelationSourceRoleAttribute, RelationSubtypeAttribute, RelationTargetRoleAttribute, RelationTypeAttribute, ROLE, SARCASM, SNIP, SOURCE, SPECIFICITY, SQUISH, STUFF, SUBTYPE, tagsToIgnore, tagsWithAtts, TRIGGER, TYPE, UNKNOWN_KBID, UNSPECIFIED, WAYS
fileId
fileList, sourceDirectory
corpusName, currentAnnotationId, resourceManager

| Constructor and Description |
|---|
| EREEventReader(EREDocumentReader.EreCorpus ereCorpus, String corpusRoot, boolean throwExceptionOnXmlParseFailure): Reads mention-relation annotations, including coreference, from the ERE English corpus. |
| EREEventReader(EREDocumentReader.EreCorpus ereCorpus, TextAnnotationBuilder taBldr, String corpusRoot, boolean throwExceptionOnXmlParseFailure): Constructor that allows arbitrary language/tokenization behavior via an explicit TextAnnotationBuilder. |

| Modifier and Type | Method and Description |
|---|---|
| String | generateReport(): Reports the number of relations and relation mentions read from the source and generated. |
| List<XmlTextAnnotation> | getAnnotationsFromFile(List<Path> corpusFileListEntry): Given an entry from the corpus file list generated by EREDocumentReader.getFileListing(), parses its contents and returns zero or more TextAnnotation objects. |
| static String | getEventViewName() |
| void | reset(): Sets the reader to start from the beginning of the corpus. |

Inherited methods:
getMentionViewName, readRelation
compileOffsets, findEndIndex, findEndIndexIgnoreError, findStartIndex, findStartIndexIgnoreError, getCorefViewName, getEntitiesFromFile, getFillersFromFile, getMentionConstituent, getTokenOffsets, readEntity
buildEreConfig, buildEreXmlTextAnnotationMaker, buildXmlTextAnnotationMaker, buildXmlTextAnnotationMaker, getFileListing, getPostViewName
getRequiredAnnotationFileExtension, getRequiredSourceFileExtension, initializeReader
getSourceDirectory, hasNext, next
iterator, remove
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
forEach, spliterator
forEachRemaining

public EREEventReader(EREDocumentReader.EreCorpus ereCorpus, String corpusRoot, boolean throwExceptionOnXmlParseFailure) throws Exception
Parameters:
ereCorpus - the ERE corpus release (values from EreCorpus)
corpusRoot - the data root directory for the ERE corpus to be processed
throwExceptionOnXmlParseFailure - if 'true', throws an exception if the XML parser encounters e.g. mismatched open/close tags
Throws:
Exception

public EREEventReader(EREDocumentReader.EreCorpus ereCorpus, TextAnnotationBuilder taBldr, String corpusRoot, boolean throwExceptionOnXmlParseFailure) throws Exception
Parameters:
ereCorpus - the ERE corpus release (values from EreCorpus, which specifies source/markup directories and source XML tag behavior)
taBldr - a TextAnnotationBuilder for the desired language/tokenization behavior
corpusRoot - the data root directory for the ERE corpus to be processed
throwExceptionOnXmlParseFailure - if 'true', throws an exception if the XML parser encounters e.g. mismatched open/close tags
Throws:
Exception - if a source/annotation file is missing, or if the XML is not valid

public static String getEventViewName()
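The throwExceptionOnXmlParseFailure parameter of both constructors can be pictured with a small stand-alone sketch. It does not use the reader itself: ParseFailurePolicyDemo and its parse method are hypothetical names, the XML is parsed with the JDK's built-in parser, and the lenient branch (returning an empty result when the flag is 'false') is an assumption, since the documentation above only states what happens when the flag is 'true'.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in, NOT part of the documented API: illustrates a
// "throw or skip" policy for malformed XML such as mismatched open/close tags.
public class ParseFailurePolicyDemo {

    public static Optional<Document> parse(String xml, boolean throwExceptionOnXmlParseFailure)
            throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return Optional.of(builder.parse(new ByteArrayInputStream(bytes)));
        } catch (SAXException e) {
            if (throwExceptionOnXmlParseFailure) {
                throw e; // strict mode: surface the parse failure to the caller
            }
            return Optional.empty(); // ASSUMED lenient mode: skip the document
        }
    }

    public static void main(String[] args) throws Exception {
        String wellFormed = "<DOC><POST>hello</POST></DOC>";
        String mismatched = "<DOC><POST>hello</DOC>"; // POST is never closed

        System.out.println("well-formed parsed: " + parse(wellFormed, true).isPresent());
        System.out.println("mismatched, lenient: " + parse(mismatched, false).isPresent());
        try {
            parse(mismatched, true);
        } catch (SAXException e) {
            System.out.println("mismatched, strict: exception thrown");
        }
    }
}
```

Strict mode suits corpus validation, where a malformed file should stop the run. A lenient mode like the one assumed here suits bulk processing, where a few bad files should not abort the whole pass.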
public void reset()
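Description copied from class: XmlDocumentReader
Sets the reader to start from the beginning of the corpus.
Specified by: reset in interface IResetableIterator<XmlTextAnnotation>
Overrides: reset in class EREMentionRelationReader

public List<XmlTextAnnotation> getAnnotationsFromFile(List<Path> corpusFileListEntry) throws Exception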
Description copied from class: EREDocumentReader
Given an entry from the corpus file list generated by EREDocumentReader.getFileListing(), parses its contents and returns zero or more TextAnnotation objects. This allows for the case where corpus annotations are provided in standoff format in one or more files separate from the source document. In such cases, the first file in the list should contain the source document and the rest should be the corresponding markup files.
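The standoff convention for a file list entry (first file is the source document, any remaining files are markup) can be sketched as follows. This is a stand-alone illustration only: CorpusFileListEntryDemo, sourceFile, markupFiles and the file names are hypothetical and are not taken from the reader's API or from any corpus release.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, NOT part of the documented API: splits one corpus
// file list entry into its source file and its standoff markup files.
public class CorpusFileListEntryDemo {

    // By the convention described above, the first path is the source document.
    public static Path sourceFile(List<Path> entry) {
        if (entry.isEmpty()) {
            throw new IllegalArgumentException("entry must contain at least a source file");
        }
        return entry.get(0);
    }

    // Every path after the first is treated as a markup (annotation) file.
    public static List<Path> markupFiles(List<Path> entry) {
        if (entry.isEmpty()) {
            throw new IllegalArgumentException("entry must contain at least a source file");
        }
        return Collections.unmodifiableList(entry.subList(1, entry.size()));
    }

    public static void main(String[] args) {
        // Illustrative file names only.
        List<Path> entry = Arrays.asList(
                Paths.get("source", "doc1.xml"),
                Paths.get("ere", "doc1.rich_ere.xml"));

        System.out.println("source: " + sourceFile(entry));
        System.out.println("markup: " + markupFiles(entry));
    }
}
```

An entry holding a single path yields an empty markup list, which matches the case where one file carries both source and markup.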
In this default implementation, it is assumed that a single file contains both source and markup.
Overrides: getAnnotationsFromFile in class EREMentionRelationReader
Parameters:
corpusFileListEntry - a list of files, the first of which is a source file.
Throws:
Exception

public String generateReport()
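Description copied from class: EREMentionRelationReader
Reports the number of relations and relation mentions read from the source and generated.
Overrides: generateReport in class EREMentionRelationReader

Copyright © 2017. All rights reserved.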