public class WordSplitter extends Object implements edu.illinois.cs.cogcomp.lbjava.parse.Parser
This parser takes the Sentences returned by another parser (e.g., SentenceSplitter) and splits them into Word objects.
Each sentence, now represented as a LinkedVector, is then returned one at a time by calls to the next() method.
A main(String[]) method is also implemented, which applies this class to plain text in a straightforward way.
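The wrapping behavior described above can be illustrated with a minimal, self-contained sketch. It does not use the LBJava classes: `SimpleParser`, `ListSentenceParser`, and `SimpleWordSplitter` are hypothetical stand-ins, sentences are plain strings rather than Sentence objects, the result is a `List<String>` rather than a LinkedVector of Word objects, and the punctuation-based tokenization is a naive placeholder for whatever rules WordSplitter actually applies. It also assumes that `next()` signals the end of input by returning `null`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordSplitterSketch {
    /** Hypothetical stand-in for the Parser interface (not the LBJava type). */
    interface SimpleParser {
        Object next();   // assumption: returns null once the input is exhausted
        void reset();
        void close();
    }

    /** Hypothetical sentence-returning parser backed by an in-memory list. */
    static class ListSentenceParser implements SimpleParser {
        private final List<String> sentences;
        private int index = 0;
        ListSentenceParser(List<String> sentences) { this.sentences = sentences; }
        public Object next() { return index < sentences.size() ? sentences.get(index++) : null; }
        public void reset() { index = 0; }
        public void close() { }
    }

    /** Wraps a sentence-returning parser and returns one tokenized sentence per call. */
    static class SimpleWordSplitter implements SimpleParser {
        protected final SimpleParser parser;   // the sentence-returning parser
        SimpleWordSplitter(SimpleParser p) { parser = p; }
        public Object next() {
            String sentence = (String) parser.next();
            if (sentence == null) return null;
            // Naive placeholder tokenization: pad punctuation, then split on whitespace.
            String spaced = sentence.replaceAll("([.,!?;:])", " $1 ").trim();
            List<String> words = Arrays.asList(spaced.split("\\s+"));
            return words;
        }
        public void reset() { parser.reset(); }
        public void close() { parser.close(); }
    }

    /** Renders each sentence on its own line with words separated by single spaces. */
    static List<String> render(SimpleParser p) {
        List<String> lines = new ArrayList<>();
        for (Object s = p.next(); s != null; s = p.next()) {
            @SuppressWarnings("unchecked")
            List<String> words = (List<String>) s;
            lines.add(String.join(" ", words));
        }
        return lines;
    }

    public static void main(String[] args) {
        SimpleParser splitter = new SimpleWordSplitter(
                new ListSentenceParser(Arrays.asList("Hello, world.", "It works!")));
        for (String line : render(splitter)) System.out.println(line);
        splitter.close();
    }
}
```

Running the sketch prints `Hello , world .` and `It works !` on separate lines, which mirrors the one-sentence-per-line, whitespace-delimited output that main(String[]) is documented to produce.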
| Modifier and Type | Field and Description |
|---|---|
| protected edu.illinois.cs.cogcomp.lbjava.parse.Parser | parser: The Sentence-returning parser. |
| Constructor and Description |
|---|
| WordSplitter(edu.illinois.cs.cogcomp.lbjava.parse.Parser p): Initializing constructor. |
| Modifier and Type | Method and Description |
|---|---|
| void | close(): Frees any resources this parser may be holding. |
| static void | main(String[] args): Run this program on a file containing plain text, and it will produce the same text on STDOUT rearranged so that each line contains exactly one sentence, and so that character sequences deemed to be "words" are delimited by whitespace. |
| Object | next(): Returns LinkedVectors of Word objects one at a time. |
| void | reset(): Sets this parser back to the beginning of the raw data. |
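The next(), reset(), and close() methods listed above suggest a simple lifecycle: drain the parser, optionally rewind it for another pass, and release resources at the end. The following is a minimal sketch of that idiom, not LBJava code. `SimpleParser` and `ArrayParser` are hypothetical stand-ins, and the loop assumes that `next()` returns `null` when no data remains.

```java
import java.util.ArrayList;
import java.util.List;

class ParserLifecycleSketch {
    /** Hypothetical stand-in mirroring the three instance methods in the table. */
    interface SimpleParser {
        Object next();   // assumption: null marks the end of the data
        void reset();
        void close();
    }

    /** Hypothetical in-memory parser that records whether it has been closed. */
    static class ArrayParser implements SimpleParser {
        private final Object[] items;
        private int index = 0;
        boolean closed = false;
        ArrayParser(Object... items) { this.items = items; }
        public Object next() { return index < items.length ? items[index++] : null; }
        public void reset() { index = 0; }
        public void close() { closed = true; }
    }

    /** Collects everything from the parser's current position until next() returns null. */
    static List<Object> drain(SimpleParser p) {
        List<Object> out = new ArrayList<>();
        for (Object o = p.next(); o != null; o = p.next()) out.add(o);
        return out;
    }

    public static void main(String[] args) {
        ArrayParser parser = new ArrayParser("first sentence", "second sentence");
        try {
            List<Object> firstPass = drain(parser);
            parser.reset();                          // rewind to the beginning of the data
            List<Object> secondPass = drain(parser);
            System.out.println(firstPass.equals(secondPass));   // prints true
        } finally {
            parser.close();                          // release resources even if a pass fails
        }
    }
}
```

Placing close() in a finally block is a design choice of this sketch, so that resources are released even when an exception interrupts one of the passes.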
protected edu.illinois.cs.cogcomp.lbjava.parse.Parser parser
The Sentence-returning parser.

public WordSplitter(edu.illinois.cs.cogcomp.lbjava.parse.Parser p)
Initializing constructor.
Parameters: p - The Sentence-returning parser.

public static void main(String[] args)
Run this program on a file containing plain text, and it will produce the same text on STDOUT rearranged so that each line contains exactly one sentence, and so that character sequences deemed to be "words" are delimited by whitespace.
Usage:
java edu.illinois.cs.cogcomp.lbjava.nlp.WordSplitter <file name>
Parameters: args - The command line arguments.

public Object next()
Returns LinkedVectors of Word objects one at a time.
Specified by: next in interface edu.illinois.cs.cogcomp.lbjava.parse.Parser

public void reset()
Sets this parser back to the beginning of the raw data.
Specified by: reset in interface edu.illinois.cs.cogcomp.lbjava.parse.Parser

public void close()
Frees any resources this parser may be holding.
Specified by: close in interface edu.illinois.cs.cogcomp.lbjava.parse.Parser

Copyright © 2017. All rights reserved.