public class ParseUtils extends Object
Modifier and Type | Field and Description |
---|---|
static String | PATH_DOWN_STRING: This string indicates a path going down a parse tree in the path string. |
static String | PATH_UP_STRING: This string indicates a path going up a parse tree in the path string. |
Constructor and Description |
---|
ParseUtils() |
Modifier and Type | Method and Description |
---|---|
static String | convertBracketsFromPTBFormat(String sentence): Convert brackets from the Penn Treebank format (which uses strings like -LRB-, -RRB-, etc. to denote '(', ')', etc.) to readable tokens. |
static String | convertBracketsToPTBFormat(String sentence): Convert brackets from readable forms to the Penn Treebank format (which uses strings like -LRB-, -RRB-, etc. to denote '(', ')', etc.). |
static String[] | getAllPhraseSiblingLabels(String parseViewName, Constituent constituent): Get the labels of all the siblings of the parse tree node that covers the input constituent. |
static String[] | getAllSiblingLabels(Tree<String> node): Get the labels of all siblings of a given tree node. |
static <T> Tree<T> | getCommonAncestor(Tree<T> t1, Tree<T> t2, Tree<T> tree): Deprecated. |
static int | getHeadWordPosition(Constituent c, HeadFinderBase headFinder, String parseViewName): Get the head word of a constituent using the HeadFinderBase that is passed as an argument. |
static Tree<String> | getParseTree(String parseViewName, Sentence s): Get the parse tree of a sentence. |
static Tree<String> | getParseTree(String parseViewName, TextAnnotation ta, int sentenceId): Get the parse tree of the sentenceId-th sentence from the text annotation. |
static Tree<String> | getParseTree(TextAnnotation ta, int sentenceId): Deprecated. |
static Tree<String> | getParseTreeCovering(String parseViewName, Constituent c): Get a parse tree from a text annotation that covers the specified constituent. |
static <T> Pair<List<Tree<T>>,List<Tree<T>>> | getPath(Tree<T> start, Tree<T> end, Tree<T> tree, int maxDepth): Deprecated. |
static <T> String | getPathString(Tree<T> start, Tree<T> end, Tree<T> tree, int maxDepth): Deprecated. |
static <T> String | getPathStringIgnoreLexicalItems(Tree<T> start, Tree<T> end, Tree<T> tree, int maxDepth): Deprecated. |
static <T> String | getPathStringToCommonAncestor(Tree<T> start, Tree<T> end, Tree<T> tree, int maxDepth): Deprecated. |
static <T> List<T> | getPathToRoot(Tree<T> tree, Tree<T> leaf, int maxDepth): Deprecated. |
static <T> List<Tree<T>> | getPathTreesToRoot(Tree<T> tree, Tree<T> node, int maxDepth): Deprecated. |
static Constituent | getPhraseFromHead(Constituent predicate, Constituent argHead, String parseViewName): Primarily a fix for prepSRL objects; converts them from single head words to constituents. |
static String | getSentenceFromTree(Tree<String> tree): Gets the terminal string from the parse tree. |
static Tree<Pair<String,IntPair>> | getSpanLabeledTree(Tree<String> parseTree): Transforms a parse tree into a new tree where each node is labeled by the span it covers in addition to the label of that node from the original parse tree. |
static String | getSubcatFrame(Tree<String> yieldNode): Assuming that the tree comes with lexical items and POS tags, the subcat frame for the verb can be found by going to the parent of the POS tag (which is probably a VP) and listing its children. |
static String | getTerminalString(Tree<String> tree) |
static String[] | getTerminalStringSentence(Tree<String> tree) |
static List<String> | getTerminalTokens(Tree<String> tree) |
static Tree<Pair<String,IntPair>> | getTokenIndexedCleanedParseTreeNodeCovering(Constituent c, String parseViewName) |
static Tree<Pair<String,IntPair>> | getTokenIndexedParseTreeNodeCovering(String parseViewName, Constituent c) |
static Tree<Pair<String,IntPair>> | getTokenIndexedTreeCovering(Tree<String> parse, int start, int end): From a parse tree and a span that is specified with the start and end (exclusive), this function returns a tree that corresponds to the subtree that covers the span. |
static Tree<String> | getTreeCovering(Tree<String> parse, int start, int end) |
static Tree<String> | snipNullNodes(Tree<String> tree): Removes subtrees labeled with the null label (-NONE-) and returns a new tree. |
static String | stripFunctionTags(String label): Strips function tags from a given node label. |
static Tree<String> | stripFunctionTags(Tree<String> tree): Strips function tags from a tree and returns a new tree. |
static String | stripIndexReferences(String label) |
static Tree<String> | stripIndexReferences(Tree<String> tree): Removes index references to other nodes from the parse tree. |
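The bracket conversion described in the method summary can be illustrated with a small stand-alone sketch. It is not the library's implementation: the class name PtbBracketSketch and its methods are hypothetical, and it assumes the conventional Penn Treebank escapes for round, square, and curly brackets. Which of these tokens ParseUtils itself handles beyond -LRB- and -RRB- is not stated on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; not the ParseUtils implementation. */
final class PtbBracketSketch {
    // Conventional Penn Treebank escapes for the six bracket characters.
    private static final Map<String, String> TO_PTB = new LinkedHashMap<>();
    static {
        TO_PTB.put("(", "-LRB-");
        TO_PTB.put(")", "-RRB-");
        TO_PTB.put("[", "-LSB-");
        TO_PTB.put("]", "-RSB-");
        TO_PTB.put("{", "-LCB-");
        TO_PTB.put("}", "-RCB-");
    }

    /** Replaces every literal bracket with its PTB token. */
    static String toPtb(String sentence) {
        String result = sentence;
        for (Map.Entry<String, String> e : TO_PTB.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    /** Replaces every PTB token with the bracket it stands for. */
    static String fromPtb(String sentence) {
        String result = sentence;
        for (Map.Entry<String, String> e : TO_PTB.entrySet()) {
            result = result.replace(e.getValue(), e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "He left ( quietly ) .";
        String ptb = toPtb(raw);
        System.out.println(ptb);          // He left -LRB- quietly -RRB- .
        System.out.println(fromPtb(ptb)); // He left ( quietly ) .
    }
}
```

Because none of the replacement tokens contains a bracket character, the order of the replacements does not matter and the two methods are inverses of each other on text that does not already contain the escape tokens.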
public static final String PATH_UP_STRING
public static final String PATH_DOWN_STRING
public static String convertBracketsFromPTBFormat(String sentence)
Parameters:
sentence - A sentence which is to be converted
public static String convertBracketsToPTBFormat(String sentence)
Parameters:
sentence - A sentence which is to be converted
public static Tree<Pair<String,IntPair>> getSpanLabeledTree(Tree<String> parseTree)
For example, consider the following input tree:
(S1 (S (NP (DT The) (NN bird)) (VP (VBD flew)) (. .)))
This is transformed as follows:
([S1,[0,4]] ([S,[0,4]] ([NP,[0,2]] ([DT,[0,1]] [The,[0,1]]) ([NN,[1,2]] [bird,[1,2]])) ([VP,[2,3]] ([VBD,[2,3]] [flew,[2,3]])) ([.,[3,4]] [.,[3,4]])))
Here, the notation [.,.] is used to denote a Pair object. That is, the node labeled [NP,[0,2]] indicates that the corresponding node in the parse tree is labeled NP and that NP spans the tokens from 0 to 2 (exclusive).
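The span labeling can be reproduced with a short recursive pass that hands each leaf one token position and gives every internal node the span from its first leaf to one past its last leaf. The sketch below is only an illustration under stated assumptions: Node and SpanNode are hypothetical stand-ins for the library's Tree<String> and Tree<Pair<String,IntPair>>, and the example tree attaches the final period under S, which is the structure the transformed output implies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; Node and SpanNode are hypothetical stand-ins. */
final class SpanLabelSketch {
    /** Stand-in for Tree<String>: a label plus ordered children. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Stand-in for a node labeled with (label, [start, end)). */
    static final class SpanNode {
        final String label;
        final int start;
        final int end; // exclusive
        final List<SpanNode> children;

        SpanNode(String label, int start, int end, List<SpanNode> children) {
            this.label = label;
            this.start = start;
            this.end = end;
            this.children = children;
        }

        @Override
        public String toString() {
            String self = "[" + label + ",[" + start + "," + end + "]]";
            if (children.isEmpty()) {
                return self;
            }
            StringBuilder sb = new StringBuilder("(").append(self);
            for (SpanNode child : children) {
                sb.append(' ').append(child);
            }
            return sb.append(')').toString();
        }
    }

    /** Labels node and its subtree; firstToken is the index of its first leaf. */
    static SpanNode labelSpans(Node node, int firstToken) {
        if (node.children.isEmpty()) {
            return new SpanNode(node.label, firstToken, firstToken + 1, new ArrayList<SpanNode>());
        }
        List<SpanNode> labeled = new ArrayList<>();
        int next = firstToken;
        for (Node child : node.children) {
            SpanNode labeledChild = labelSpans(child, next);
            labeled.add(labeledChild);
            next = labeledChild.end; // the next sibling starts where this one ends
        }
        return new SpanNode(node.label, firstToken, next, labeled);
    }

    /** (S1 (S (NP (DT The) (NN bird)) (VP (VBD flew)) (. .))) */
    static Node exampleTree() {
        return new Node("S1",
            new Node("S",
                new Node("NP", new Node("DT", new Node("The")), new Node("NN", new Node("bird"))),
                new Node("VP", new Node("VBD", new Node("flew"))),
                new Node(".", new Node("."))));
    }

    public static void main(String[] args) {
        System.out.println(labelSpans(exampleTree(), 0));
    }
}
```

Running main prints the tree in the same [label,[start,end]] notation used above, with S1 and S both covering [0,4].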
Parameters:
parseTree - The parse tree to be annotated with the spans
Returns:
A new tree in which each node is labeled with a Pair of the original node's label and the span that the node covers.
public static Tree<String> snipNullNodes(Tree<String> tree)
Parameters:
tree - A parse tree, possibly containing null labels
public static String stripFunctionTags(String label)
Parameters:
label - A node label
public static Tree<String> stripFunctionTags(Tree<String> tree)
Parameters:
tree - A parse tree
public static Tree<String> stripIndexReferences(Tree<String> tree)
Parameters:
tree - A parse tree
public static String getSentenceFromTree(Tree<String> tree)
Parameters:
tree - The parse tree, where the leaf nodes are the terminals we care about
public static String[] getAllPhraseSiblingLabels(String parseViewName, Constituent constituent)
Parameters:
parseViewName - The name of the parse view. This might typically be one of ViewNames.PARSE_CHARNIAK or ViewNames.PARSE_STANFORD
constituent - The constituent whose sibling phrases are required.
public static String[] getAllSiblingLabels(Tree<String> node)
Parameters:
node - The node whose siblings are required.
@Deprecated public static <T> Tree<T> getCommonAncestor(Tree<T> t1, Tree<T> t2, Tree<T> tree) throws Exception
Parameters:
t1 - The first tree
t2 - The second tree
tree - The tree that contains t1 and t2.
Returns:
The common ancestor of t1 and t2. If none is found, then the function returns null.
Throws:
Exception
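As a rough illustration of what a common-ancestor lookup over a containing tree involves, the following sketch searches from the root and returns the lowest node whose subtree holds both arguments. It is an assumption-laden stand-in, not the deprecated method's code: the Node class is hypothetical, nodes are compared by identity, and a node is treated as an ancestor of itself, which may differ from the library's behavior.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the ParseUtils implementation. */
final class CommonAncestorSketch {
    /** Hypothetical stand-in for Tree<T> with String labels. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** True if target is root itself or lies somewhere below it (identity comparison). */
    static boolean contains(Node root, Node target) {
        if (root == target) {
            return true;
        }
        for (Node child : root.children) {
            if (contains(child, target)) {
                return true;
            }
        }
        return false;
    }

    /** Lowest node of tree whose subtree contains both a and b, or null if there is none. */
    static Node commonAncestor(Node a, Node b, Node tree) {
        if (!contains(tree, a) || !contains(tree, b)) {
            return null;
        }
        // If some child already contains both nodes, the answer lies deeper.
        for (Node child : tree.children) {
            Node lower = commonAncestor(a, b, child);
            if (lower != null) {
                return lower;
            }
        }
        return tree;
    }

    public static void main(String[] args) {
        Node dt = new Node("DT", new Node("The"));
        Node nn = new Node("NN", new Node("bird"));
        Node vbd = new Node("VBD", new Node("flew"));
        Node s = new Node("S", new Node("NP", dt, nn), new Node("VP", vbd));
        Node root = new Node("S1", s);
        System.out.println(commonAncestor(dt, nn, root).label);  // NP
        System.out.println(commonAncestor(dt, vbd, root).label); // S
    }
}
```

The repeated containment checks make this quadratic in the worst case; it is written for clarity, and a version with parent pointers would walk upward from both nodes instead.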
public static Tree<String> getParseTree(String parseViewName, Sentence s)
Get the parse tree of a sentence. This code assumes that the view called parseViewName exists in the text annotation.
Parameters:
parseViewName - The name of the parse view
s - The sentence
public static Tree<String> getParseTree(String parseViewName, TextAnnotation ta, int sentenceId)
Get the parse tree of the sentenceId-th sentence from the text annotation. This code assumes that the view called parseViewName exists in the text annotation.
Parameters:
parseViewName - The name of the parse view
ta - The text annotation object
sentenceId - The sentence whose parse tree is required
Returns:
The parse tree of the sentenceId-th sentence
@Deprecated public static Tree<String> getParseTree(TextAnnotation ta, int sentenceId)
Get the parse tree of the sentenceId-th sentence from the text annotation. This code assumes that the view called ViewNames.PARSE_CHARNIAK exists in the text annotation.
Parameters:
ta - The text annotation object
sentenceId - The sentence whose parse tree is required
Returns:
The parse tree of the sentenceId-th sentence
public static Tree<String> getParseTreeCovering(String parseViewName, Constituent c)
Parameters:
parseViewName - The name of the parse view
c - The constituent that we care about
Returns:
The parse tree, from the TextAnnotation to which the constituent belongs, that covers the constituent.
@Deprecated public static <T> Pair<List<Tree<T>>,List<Tree<T>>> getPath(Tree<T> start, Tree<T> end, Tree<T> tree, int maxDepth) throws Exception
Throws:
Exception
@Deprecated public static <T> String getPathString(Tree<T> start, Tree<T> end, Tree<T> tree, int maxDepth) throws Exception
Get the path string between the nodes start and end, which belong to the tree tree.
Throws:
Exception
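To make the idea of a path string between two nodes concrete, here is a small sketch that climbs from start to the lowest common ancestor and then descends to end, joining the node labels with an up marker and a down marker. Everything specific in it is an assumption: the Node class is hypothetical, the markers "^" and ">" are placeholders because this page does not give the values of PATH_UP_STRING and PATH_DOWN_STRING, and the maxDepth limit of the real methods is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the ParseUtils implementation. */
final class PathStringSketch {
    // Placeholder markers; the real constant values are not documented here.
    static final String UP = "^";
    static final String DOWN = ">";

    /** Hypothetical stand-in for Tree<T> with String labels. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Fills path with the nodes from root down to target; returns false if target is absent. */
    static boolean pathFromRoot(Node root, Node target, List<Node> path) {
        path.add(root);
        if (root == target) {
            return true;
        }
        for (Node child : root.children) {
            if (pathFromRoot(child, target, path)) {
                return true;
            }
        }
        path.remove(path.size() - 1); // backtrack: target is not below this node
        return false;
    }

    /** Labels from start up to the common ancestor, then down to end; null if either node is missing. */
    static String pathString(Node start, Node end, Node tree) {
        List<Node> toStart = new ArrayList<>();
        List<Node> toEnd = new ArrayList<>();
        if (!pathFromRoot(tree, start, toStart) || !pathFromRoot(tree, end, toEnd)) {
            return null;
        }
        // Both paths begin at the root, so at least one node is shared.
        int shared = 0;
        while (shared < toStart.size() && shared < toEnd.size()
                && toStart.get(shared) == toEnd.get(shared)) {
            shared++;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = toStart.size() - 1; i >= shared; i--) {
            sb.append(toStart.get(i).label).append(UP);
        }
        sb.append(toStart.get(shared - 1).label); // lowest common ancestor
        for (int i = shared; i < toEnd.size(); i++) {
            sb.append(DOWN).append(toEnd.get(i).label);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node nn = new Node("NN", new Node("bird"));
        Node vbd = new Node("VBD", new Node("flew"));
        Node root = new Node("S1", new Node("S", new Node("NP", nn), new Node("VP", vbd)));
        System.out.println(pathString(nn, vbd, root)); // NN^NP^S>VP>VBD
    }
}
```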
@Deprecated public static <T> String getPathStringIgnoreLexicalItems(Tree<T> start, Tree<T> end, Tree<T> tree, int maxDepth) throws Exception
Throws:
Exception
@Deprecated public static <T> String getPathStringToCommonAncestor(Tree<T> start, Tree<T> end, Tree<T> tree, int maxDepth) throws Exception
Throws:
Exception
@Deprecated public static <T> List<T> getPathToRoot(Tree<T> tree, Tree<T> leaf, int maxDepth) throws Exception
Throws:
Exception
@Deprecated public static <T> List<Tree<T>> getPathTreesToRoot(Tree<T> tree, Tree<T> node, int maxDepth) throws Exception
Throws:
Exception
public static String getSubcatFrame(Tree<String> yieldNode)
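The method summary describes the subcat frame as what one gets by going to the parent of the verb's POS tag and listing that parent's children. A minimal sketch of that idea follows. The Node class, the extra root argument (used to locate the parent, since this stand-in has no parent pointers), and the space-separated output format are all assumptions rather than the library's actual behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the ParseUtils implementation. */
final class SubcatFrameSketch {
    /** Hypothetical stand-in for Tree<String>. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Finds the parent of target below root by identity, or null if there is none. */
    static Node parentOf(Node root, Node target) {
        for (Node child : root.children) {
            if (child == target) {
                return root;
            }
            Node deeper = parentOf(child, target);
            if (deeper != null) {
                return deeper;
            }
        }
        return null;
    }

    /** Labels of all children of the POS tag's parent, joined by spaces. */
    static String subcatFrame(Node root, Node posTag) {
        Node parent = parentOf(root, posTag);
        if (parent == null) {
            return "";
        }
        List<String> labels = new ArrayList<>();
        for (Node child : parent.children) {
            labels.add(child.label);
        }
        return String.join(" ", labels);
    }

    public static void main(String[] args) {
        Node vbd = new Node("VBD", new Node("saw"));
        Node vp = new Node("VP",
            vbd,
            new Node("NP", new Node("NN", new Node("stars"))),
            new Node("PP", new Node("IN", new Node("with"))));
        Node root = new Node("S", new Node("NP", new Node("PRP", new Node("She"))), vp);
        System.out.println(subcatFrame(root, vbd)); // VBD NP PP
    }
}
```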
public static Tree<Pair<String,IntPair>> getTokenIndexedCleanedParseTreeNodeCovering(Constituent c, String parseViewName)
public static Tree<Pair<String,IntPair>> getTokenIndexedParseTreeNodeCovering(String parseViewName, Constituent c)
public static Tree<Pair<String,IntPair>> getTokenIndexedTreeCovering(Tree<String> parse, int start, int end)
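getTokenIndexedTreeCovering is summarized as returning the subtree that covers a token span given by a start and an exclusive end. One plausible reading, sketched below, is to descend from the root for as long as a single child still covers the whole span and return the node where that stops. The Node class and method names are hypothetical, and the library may stop at a different level (for example at a preterminal rather than a leaf) or attach token indices to the result, which this sketch does not do.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the ParseUtils implementation. */
final class CoveringSketch {
    /** Hypothetical stand-in for Tree<String>. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Number of leaves (tokens) under node. */
    static int leafCount(Node node) {
        if (node.children.isEmpty()) {
            return 1;
        }
        int total = 0;
        for (Node child : node.children) {
            total += leafCount(child);
        }
        return total;
    }

    /** Smallest subtree covering tokens [start, end), or null if the span is invalid. */
    static Node covering(Node root, int start, int end) {
        if (start < 0 || start >= end || end > leafCount(root)) {
            return null;
        }
        return descend(root, 0, start, end);
    }

    // offset is the token index of the first leaf under node.
    private static Node descend(Node node, int offset, int start, int end) {
        int childStart = offset;
        for (Node child : node.children) {
            int childEnd = childStart + leafCount(child);
            if (childStart <= start && end <= childEnd) {
                return descend(child, childStart, start, end); // one child covers the span
            }
            childStart = childEnd;
        }
        return node; // no single child covers the span, so this node is the smallest cover
    }

    /** (S1 (S (NP (DT The) (NN bird)) (VP (VBD flew)) (. .))) */
    static Node exampleTree() {
        return new Node("S1",
            new Node("S",
                new Node("NP", new Node("DT", new Node("The")), new Node("NN", new Node("bird"))),
                new Node("VP", new Node("VBD", new Node("flew"))),
                new Node(".", new Node("."))));
    }

    public static void main(String[] args) {
        Node tree = exampleTree();
        System.out.println(covering(tree, 0, 2).label); // NP
        System.out.println(covering(tree, 1, 3).label); // S
    }
}
```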
public static int getHeadWordPosition(Constituent c, HeadFinderBase headFinder, String parseViewName) throws Exception
Get the head word of a constituent using the HeadFinderBase that is passed as an argument. To use this function, first create a head finder. For example:
TextAnnotation ta = ... // some text annotation
Constituent c = ... // some constituent
String parseViewName = ViewNames.PARSE_STANFORD; // or whichever parse view ta contains
CollinsHeadFinder headFinder = new CollinsHeadFinder();
int headId = ParseUtils.getHeadWordPosition(c, headFinder, parseViewName);
// now we can do other things with the headId.
String headWord = WordHelpers.getWord(ta, headId);
Parameters:
parseViewName - The name of the view which contains the parse trees
c - The constituent whose head we wish to find
headFinder - The head finder
Throws:
Exception
public static Constituent getPhraseFromHead(Constituent predicate, Constituent argHead, String parseViewName)
Parameters:
predicate - The predicate of the construction (e.g. "with")
argHead - The head-word of the argument of the construction (e.g. "telescope")
parseViewName - The name of the parse view used to extract the phrase-structure tree
Copyright © 2017. All rights reserved.