We start with a file containing labeled examples. Our target concepts have
IDs 0 and 1, and all other numbers appearing in the examples
represent other active features (word collocations, parts of speech, etc.).
The first example in the training file is:
0,96,116,119,120,128,138,157,212,230,328,451,454,601,636,641,
646,773,774,815,872,897,937,1134,1160,1197,1231,1267,1461,1503,
1576,1640,1654,1838,1845,1878,1937,1941,1946,1953,1986,2012,
2387,2612,2958,3211,3221,3222,3233,3242,3308,3315,3318,3487,
3524,3526,3897,4037,4136,4404,6933,6991,7269,7298,7398,7488,
7539,7562,7755,7794,8032,8377,9336:
Here, the example has a label of 0, meaning that it is a positive example for target 0 (the word "their") and a negative example for all other targets (in our case, just target 1, the word "there"). In training, labels are always present, but they may appear anywhere in the ID list. An example containing no active target IDs is a negative example for all targets. Every example is terminated with a colon.
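To make the format concrete, here is a minimal parsing sketch in Python (not part of SNoW; the TARGET_IDS set and function name are ours) that splits one colon-terminated example into labels and features:

# Minimal sketch of parsing one SNoW example, assuming the format
# described above: comma-separated feature IDs terminated by a colon,
# with target IDs (here 0 and 1) mixed into the list.

TARGET_IDS = {0, 1}  # target 0 = "their", target 1 = "there"

def parse_example(line: str) -> tuple[set[int], set[int]]:
    """Return (labels, features) for one colon-terminated example."""
    ids = {int(tok) for tok in line.rstrip().rstrip(':').split(',') if tok}
    labels = ids & TARGET_IDS    # active targets give positive labels
    features = ids - TARGET_IDS  # everything else is a feature
    return labels, features

labels, features = parse_example("0,96,116,119,9336:")
print(labels)    # {0}: positive for target 0, negative for target 1
print(features)  # {96, 116, 119, 9336}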
The original sentence from which the above example was generated is:
In the interim between now and next year, we trust the House and Senate will put « their » minds to studying Georgia's very real economic, fiscal and social problems and come up with answers without all the political heroics.
The angled brackets were added for emphasis. The above example was generated from this sentence using Fex (Feature Extractor), a program that generates features based on specified relational definitions. In this example, the features include words around the target word, parts of speech in close proximity to the target word, and simple conjunctions of those.
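Fex's actual relational definitions are beyond the scope of this tutorial, but the following toy sketch (with hypothetical feature names, not Fex's output format) illustrates the idea of window and conjunction features around a target word:

# Toy illustration (not Fex itself) of window-based feature extraction
# around a target word. Fex would map such features to numeric IDs
# according to its relational definitions.

def window_features(tokens: list[str], target: int, width: int = 2) -> list[str]:
    feats = []
    lo = max(0, target - width)
    hi = min(len(tokens), target + width + 1)
    for i in range(lo, hi):
        if i != target:
            feats.append(f"word[{i - target}]={tokens[i]}")
    # a simple conjunction of the immediate neighbors
    if target > 0 and target + 1 < len(tokens):
        feats.append(f"conj={tokens[target - 1]}&{tokens[target + 1]}")
    return feats

tokens = "we trust the House and Senate will put their minds to studying".split()
print(window_features(tokens, tokens.index("their")))
# ['word[-2]=will', 'word[-1]=put', 'word[1]=minds', 'word[2]=to', 'conj=put&minds']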
Given our training data (provided in the file traindata.snow), we can now
train a classifier that can classify new examples from outside
the training set based on what the system learned about the features present
in the training data. In order to train our network, we must invoke SNoW in
training mode with our training examples as the input file. We do this as
follows:
> snow -train -I tutorial/traindata.snow -F tutorial/test.net -W :0-1
This gives the output:
SNoW+ - Sparse Network of Winnows Plus
Cognitive Computations Group - University of Illinois at Urbana-Champaign
Version 3.2.0
Input file: 'tutorial/traindata.snow'
Network file: 'tutorial/test.net'
Training with 2 cycles over training data.
Directing output to console.
Network Spec -> w(:0-1),
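If you prefer to drive this step from a script, something like the following sketch works; it assumes only that the snow binary is on your PATH and that the tutorial/ paths exist as in the text, and it uses exactly the flags shown above:

# A sketch of scripting the training step shown above.

import subprocess

cmd = [
    "snow", "-train",
    "-I", "tutorial/traindata.snow",  # labeled training examples
    "-F", "tutorial/test.net",        # network file to write
    "-W", ":0-1",                     # Winnow with defaults, targets 0 and 1
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)  # the banner and network spec shown above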
The output from SNoW lets us know whether there were any errors in the parameters we entered, and also reports the learning algorithm used. Here, we used the Winnow learning algorithm with default parameters by specifying -W :0-1 on the command line. This tells SNoW to use a default set of parameters (which work quite well for many experiments) and that our target concepts have IDs 0 and 1. Different algorithms and parameters can be specified on the command line or in an architecture file, as will be shown later in the tutorial.
The training made two cycles through our training data, which is the default. The number of cycles can be specified on the command line, and generally, the more cycles used, the closer the classifier comes to completely learning the training data.
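For intuition about what each target node is doing during those cycles, here is an illustrative sketch of the mistake-driven Winnow update; the promotion/demotion constants and the threshold below are placeholder values for illustration, not necessarily SNoW's defaults:

# Illustrative sketch of the Winnow update performed by a single target
# node over several cycles. alpha, beta, and theta are assumed values,
# not necessarily SNoW's defaults.

def train_winnow(examples, cycles=2, alpha=2.0, beta=0.5, theta=4.0):
    """examples: list of (features, is_positive) pairs; features is a set of IDs."""
    weights = {}  # feature id -> weight, initialized to 1.0 when first seen
    for _ in range(cycles):  # multiple passes, as SNoW makes by default
        for features, is_positive in examples:
            for f in features:
                weights.setdefault(f, 1.0)
            score = sum(weights[f] for f in features)
            predicted = score >= theta
            if is_positive and not predicted:
                for f in features:   # promotion: prediction too low, scale up
                    weights[f] *= alpha
            elif not is_positive and predicted:
                for f in features:   # demotion: prediction too high, scale down
                    weights[f] *= beta
    return weights

Because the update is mistake-driven, examples the node already classifies correctly leave the weights unchanged, which is why additional cycles tend to move the classifier closer to fitting the training data.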