public class EREEventReader extends EREMentionRelationReader
EREDocumentReader.EreCorpus
ends, IS_FOUND, starts
ARG_ONE, ARG_TWO, AUTHOR, CORPUS_TYPE, DATELINE, DATETIME, deletableSpanTags, DOC, ENTITIES, ENTITY, ENTITY_ID, ENTITY_MENTION, ENTITY_MENTION_ID, EntityHeadEndCharOffset, EntityHeadStartCharOffset, EntityIdAttribute, EntityKbIdAttribute, EntityMentionIdAttribute, EntityMentionTypeAttribute, EntitySpecificityAttribute, EVENT_ARGUMENT, EVENT_MENTION, EventIdAttribute, EventMentionIdAttribute, FILL, FILLER, FILLER_ID, FILLERS, HEADLINE, HOPPER, HOPPERS, ID, IMG, KBID, LENGTH, MENTION_HEAD, MENTION_TEXT, NAM, NAME_END, NAME_START, NOM, NOUN_TYPE, OFFSET, ORIG_AUTHOR, ORIGIN, POST, PRO, QUOTE, REALIS, RELATION, RELATION_MENTION, RelationIdAttribute, RelationMentionIdAttribute, RelationRealisAttribute, RELATIONS, RelationSourceRoleAttribute, RelationSubtypeAttribute, RelationTargetRoleAttribute, RelationTypeAttribute, ROLE, SARCASM, SNIP, SOURCE, SPECIFICITY, SQUISH, STUFF, SUBTYPE, tagsToIgnore, tagsWithAtts, TRIGGER, TYPE, UNKNOWN_KBID, UNSPECIFIED, WAYS
fileId
fileList, sourceDirectory
corpusName, currentAnnotationId, resourceManager
Constructor and Description |
---|
EREEventReader(EREDocumentReader.EreCorpus ereCorpus,
String corpusRoot,
boolean throwExceptionOnXmlParseFailure)
Read mention-relation annotations -- including coreference -- from ERE English corpus.
|
EREEventReader(EREDocumentReader.EreCorpus ereCorpus,
TextAnnotationBuilder taBldr,
String corpusRoot,
boolean throwExceptionOnXmlParseFailure)
Constructor to allow arbitrary language/tokenization behavior via an explicit TextAnnotationBuilder.
|
Modifier and Type | Method and Description |
---|---|
String |
generateReport()
Reports number of relations and relation mentions read from source and generated.
|
List<XmlTextAnnotation> |
getAnnotationsFromFile(List<Path> corpusFileListEntry)
Given an entry from the corpus file list generated by
EREDocumentReader.getFileListing(), parse its
contents and get zero or more TextAnnotation objects. |
static String |
getEventViewName() |
void |
reset()
Set the reader to start from the beginning of the corpus.
|
getMentionViewName, readRelation
compileOffsets, findEndIndex, findEndIndexIgnoreError, findStartIndex, findStartIndexIgnoreError, getCorefViewName, getEntitiesFromFile, getFillersFromFile, getMentionConstituent, getTokenOffsets, readEntity
buildEreConfig, buildEreXmlTextAnnotationMaker, buildXmlTextAnnotationMaker, buildXmlTextAnnotationMaker, getFileListing, getPostViewName
getRequiredAnnotationFileExtension, getRequiredSourceFileExtension, initializeReader
getSourceDirectory, hasNext, next
iterator, remove
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
forEach, spliterator
forEachRemaining
public EREEventReader(EREDocumentReader.EreCorpus ereCorpus, String corpusRoot, boolean throwExceptionOnXmlParseFailure) throws Exception
Parameters:
ereCorpus - the ERE corpus release (values from EreCorpus)
corpusRoot - the data root directory for the ERE corpus to be processed
throwExceptionOnXmlParseFailure - if 'true', throws exception if xml parser encounters e.g. mismatched open/close tags
Throws:
Exception
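The effect of a flag like throwExceptionOnXmlParseFailure can be sketched in isolation. The example below is only an illustration and not the reader's implementation: the class ParseFailurePolicy is hypothetical, the JDK DOM parser stands in for whatever XML parser the reader uses, and returning an empty result when the flag is 'false' is an assumption, since the documentation above only states what happens when the flag is 'true'.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper, not part of the library: illustrates a strict vs. lenient parse flag. */
final class ParseFailurePolicy {
    private ParseFailurePolicy() {}

    /**
     * Parses xml with the JDK DOM parser (a stand-in for the reader's own parser).
     * If the XML is malformed (e.g. mismatched open/close tags) and the flag is true,
     * an unchecked exception is thrown; if the flag is false, the document is
     * skipped (ASSUMPTION: the real reader's lenient behavior is not documented above).
     */
    static Optional<Document> parse(String xml, boolean throwExceptionOnXmlParseFailure) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return Optional.of(builder.parse(new InputSource(new StringReader(xml))));
        } catch (SAXException e) {
            if (throwExceptionOnXmlParseFailure) {
                throw new IllegalArgumentException("malformed XML", e);
            }
            return Optional.empty();
        } catch (ParserConfigurationException | IOException e) {
            // Parser setup or I/O problems are unrelated to the flag; always surface them.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String broken = "<doc><post>text</doc>"; // <post> is never closed
        System.out.println("lenient result present: " + parse(broken, false).isPresent());
        try {
            parse(broken, true);
        } catch (IllegalArgumentException e) {
            System.out.println("strict mode threw: " + e.getMessage());
        }
    }
}
```

Strict mode is useful when validating a new corpus release; a lenient mode, where available, lets a long batch run continue past a single bad file.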
public EREEventReader(EREDocumentReader.EreCorpus ereCorpus, TextAnnotationBuilder taBldr, String corpusRoot, boolean throwExceptionOnXmlParseFailure) throws Exception
Parameters:
ereCorpus - the ERE corpus release (values from EreCorpus -- specifies source/markup directories, source xml tag behavior)
taBldr - a TextAnnotationBuilder for the desired language/tokenization behavior.
corpusRoot - the data root directory for the ERE corpus to be processed
throwExceptionOnXmlParseFailure - if 'true', throws exception if xml parser encounters e.g. mismatched open/close tags
Throws:
Exception - if source/annotation file missing, or if xml not valid
public static String getEventViewName()
public void reset()
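Description copied from class: XmlDocumentReader
Set the reader to start from the beginning of the corpus.
Specified by:
reset in interface IResetableIterator<XmlTextAnnotation>
Overrides:
reset in class EREMentionRelationReader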
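The reset contract (return the reader to the beginning of the corpus so it can be iterated again) can be illustrated with a minimal stand-in. The interface ResettableIterator and the class ListBackedReader below are hypothetical and written only for this sketch; they are not the library's IResetableIterator or any of its readers, and the in-memory list stands in for a corpus on disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a resettable iterator contract. */
interface ResettableIterator<T> extends Iterator<T> {
    /** Set the iterator to start again from the first element. */
    void reset();
}

/** Hypothetical reader over an in-memory list, used only to show reset semantics. */
final class ListBackedReader<T> implements ResettableIterator<T> {
    private final List<T> items;
    private int position = 0;

    ListBackedReader(List<T> items) {
        this.items = new ArrayList<>(items); // defensive copy
    }

    @Override
    public boolean hasNext() {
        return position < items.size();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return items.get(position++);
    }

    @Override
    public void reset() {
        position = 0; // the next call to next() returns the first item again
    }

    public static void main(String[] args) {
        ListBackedReader<String> reader = new ListBackedReader<>(Arrays.asList("doc1", "doc2"));
        while (reader.hasNext()) {
            System.out.println("first pass: " + reader.next());
        }
        reader.reset();
        while (reader.hasNext()) {
            System.out.println("second pass: " + reader.next());
        }
    }
}
```

A typical reason to call reset is to make a second pass over the same corpus, for example one pass to collect statistics and another to process documents.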
public List<XmlTextAnnotation> getAnnotationsFromFile(List<Path> corpusFileListEntry) throws Exception
Description copied from class: EREDocumentReader
Given an entry from the corpus file list generated by
EREDocumentReader.getFileListing(), parse its
contents and get zero or more TextAnnotation objects. This allows for the case where corpus
annotations are provided in standoff format in one or more files separate from the source
document. In such cases, the first file in the list should contain the source document
and the rest should be the corresponding markup files.
In this default implementation, it is assumed that a single file contains both source and markup.
Overrides:
getAnnotationsFromFile in class EREMentionRelationReader
Parameters:
corpusFileListEntry - a list of files, the first of which is a source file.
Throws:
Exception
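The file-list convention described above (first file is the source document, any remaining files are standoff markup) can be sketched as follows. The class CorpusEntrySplitter and the file names are hypothetical and exist only for illustration; the sketch does not call the reader itself.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper, not part of the library: splits a corpus file list entry. */
final class CorpusEntrySplitter {
    private CorpusEntrySplitter() {}

    /** Returns the source document, which by convention is the first file in the entry. */
    static Path sourceFile(List<Path> corpusFileListEntry) {
        if (corpusFileListEntry == null || corpusFileListEntry.isEmpty()) {
            throw new IllegalArgumentException("entry must contain at least a source file");
        }
        return corpusFileListEntry.get(0);
    }

    /** Returns the markup files: everything after the first file (possibly empty). */
    static List<Path> markupFiles(List<Path> corpusFileListEntry) {
        sourceFile(corpusFileListEntry); // reuse the validation above
        return Collections.unmodifiableList(
                corpusFileListEntry.subList(1, corpusFileListEntry.size()));
    }

    public static void main(String[] args) {
        // Hypothetical paths; real corpus layouts depend on the ERE release.
        List<Path> entry = Arrays.asList(
                Paths.get("source", "doc1.xml"),
                Paths.get("ere", "doc1.rich_ere.xml"));
        System.out.println("source: " + sourceFile(entry));
        System.out.println("markup: " + markupFiles(entry));
    }
}
```

An entry with a single file yields an empty markup list, which matches the default case where one file contains both source and markup.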
public String generateReport()
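Description copied from class: EREMentionRelationReader
Reports number of relations and relation mentions read from source and generated.
Overrides:
generateReport in class EREMentionRelationReader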
Copyright © 2017. All rights reserved.