public class NerBenchmark extends Object
Directory format:
- "benchmark"
  - <dataset name> : there can be as many of these directories as you like. Reuters, Ontonotes, MUC7 and Web are examples of datasets one might run.
    - "config" : must contain one or more configuration files. There is one run per config file, and only files ending with ".config" are processed.
    - "test" : the test directory. When training, if there is no test directory, the "train" directory is used for both.
    - "train" : the directory with the training data, only needed if "-training" is passed.
    - "dev" : the hold-out set for training.
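The layout rules above can be sketched in a few lines of Java. This is only an illustration, not the code NerBenchmark itself uses: the class `BenchmarkLayoutSketch` and its methods are hypothetical names, and the behaviour when no "test" directory exists outside a training run is an assumption, since the description does not cover that case.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the NerBenchmark API.
class BenchmarkLayoutSketch {

    /** Returns the files ending in ".config" under <dataset>/config, sorted by path. */
    static List<Path> configFiles(Path datasetDir) throws IOException {
        List<Path> configs = new ArrayList<>();
        Path configDir = datasetDir.resolve("config");
        if (!Files.isDirectory(configDir)) {
            return configs; // no "config" directory, so nothing to run
        }
        // The glob keeps only names ending in ".config"; other files are skipped.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(configDir, "*.config")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    configs.add(p);
                }
            }
        }
        Collections.sort(configs);
        return configs;
    }

    /**
     * Picks the directory to use as test data: "test" if it exists, otherwise
     * "train" when training. Assumption: when not training, the "test" path is
     * returned even if it is missing, and the caller deals with that.
     */
    static Path testDir(Path datasetDir, boolean training) {
        Path test = datasetDir.resolve("test");
        if (Files.isDirectory(test) || !training) {
            return test;
        }
        return datasetDir.resolve("train");
    }
}
```

With one run per config file, a caller would loop over `configFiles(dataset)` for each dataset directory found under "benchmark".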
Command Line Options:
-verbose : provide a detailed report on all scoring methods, with separate evaluation for phrase level tokenization,
word level tokenization and so on. Without this option, only the overall F1 scores are reported.
-training : this option will cause a training run. When training, evaluation is not performed; that requires another run.
-features : for debugging, reports the feature vector for each token in the dataset. Output is written to a "features.out" file.
-iterations : specify a fixed number of iterations; -1 (the default) means auto converge, which requires a "dev" directory.
-release : build a final model for release. The model is built on both test and train and, unless "-iterations" is specified, it will auto converge
using "dev" as the holdout set.
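As a rough illustration of how these options combine, the sketch below parses them into a small holder object. It is not NerBenchmark's actual parsing code: the class `BenchmarkOptionsSketch` is hypothetical, it assumes that "-iterations" takes its number as the following argument, and it rejects unknown options, which the real class may not do (it may also accept options not listed here).

```java
// Hypothetical option holder, not part of the NerBenchmark API.
class BenchmarkOptionsSketch {
    boolean verbose = false;
    boolean training = false;
    boolean features = false;
    boolean release = false;
    int iterations = -1; // -1 (the documented default) means auto converge

    static BenchmarkOptionsSketch parse(String[] args) {
        BenchmarkOptionsSketch o = new BenchmarkOptionsSketch();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-verbose":
                    o.verbose = true;
                    break;
                case "-training":
                    o.training = true;
                    break;
                case "-features":
                    o.features = true;
                    break;
                case "-release":
                    o.release = true;
                    break;
                case "-iterations":
                    // Assumption: the count is passed as the next argument.
                    if (i + 1 >= args.length) {
                        throw new IllegalArgumentException("-iterations requires a number");
                    }
                    o.iterations = Integer.parseInt(args[++i]);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        return o;
    }

    /** Auto converge needs the "dev" hold-out set when a model is being built. */
    boolean needsDevDirectory() {
        return (training || release) && iterations == -1;
    }
}
```

For example, `parse(new String[] {"-training", "-iterations", "50"})` sets `training` to true and `iterations` to 50, so `needsDevDirectory()` returns false; with "-release" alone it returns true.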
Constructor and Description |
---|
NerBenchmark() |
Modifier and Type | Method and Description |
---|---|
static void | main(String[] args): Run a benchmark test against each subdirectory within the benchmark directory. |
Copyright © 2017. All rights reserved.