Example files are ASCII text files. In their basic format, SNoW represents
Boolean examples. Each example consists of a list of non-negative integer
IDs, separated by a single comma and any amount of whitespace, and terminated
by a colon. The largest legal ID (either feature or target) is .
It is common practice to place a single example on each line, but it is not
required. SNoW ignores all whitespace. Each number uniquely represents a
feature or a class label (target ID), and they can appear in any order within
the example. The appearance of an ID within an example is an assertion
stating that the feature or target is active within the example.
During training, SNoW will treat each given example as positive for each
target node whose ID is active in the example and negative for each target
node whose ID does not appear in the example. All examples are presented to
all target nodes, except that target nodes learning with naive Bayes are
trained only on their positive examples. Here are two examples:
examples:
7, 5, 1, 13:
0, 3, 1234, 123456, 12, 987, 234, 556:
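To make the format and the labeling rule concrete, here is a minimal Python
sketch (not SNoW's actual parser) that reads the two examples above; the set
of target IDs is a hypothetical choice for illustration:

# Minimal sketch of SNoW's basic Boolean format; illustrative only.
def parse_boolean_example(text):
    """Return the set of active IDs from one colon-terminated example."""
    body = text.strip().rstrip(':')              # examples end with a colon
    return {int(tok) for tok in body.replace(',', ' ').split()}

TARGET_IDS = {0, 1}                              # hypothetical target IDs

for line in ["7, 5, 1, 13:",
             "0, 3, 1234, 123456, 12, 987, 234, 556:"]:
    active = parse_boolean_example(line)
    print("positive for", TARGET_IDS & active,   # targets asserted here
          "negative for", TARGET_IDS - active)   # all other target nodes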
Labels are not required in testing unless the user wants SNoW to keep
performance statistics.
SNoW can also deal with continuous features. This is done by associating a
strength with each feature index (ID). Each occurrence of an ID may be
followed by a floating point strength surrounded by parentheses. For
instance:
7(1.5), 5, 10(0.6), 13(-3.2):
If no parentheses appear after the feature ID, it is equivalent to giving the
feature a strength of 1.
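A hedged Python sketch of parsing this extended format (again, not SNoW's own
code) shows the default at work; the regular expression is an assumption
based on the syntax described above:

import re

# Matches an ID optionally followed by a parenthesized floating point
# strength, e.g. "7(1.5)" or "5". Illustrative only.
TOKEN = re.compile(r'(\d+)(?:\(([-+]?[0-9]*\.?[0-9]+)\))?')

def parse_example_with_strengths(text):
    """Return {ID: strength} for one colon-terminated example."""
    body = text.strip().rstrip(':')
    return {int(i): float(s) if s else 1.0    # missing strength defaults to 1
            for i, s in TOKEN.findall(body)}

print(parse_example_with_strengths("7(1.5), 5, 10(0.6), 13(-3.2):"))
# -> {7: 1.5, 5: 1.0, 10: 0.6, 13: -3.2}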
In testing, the Winnow, Perceptron, and naive Bayes learning algorithms
multiply each feature's weight within a target node by its strength in the
example before adding it to that target node's activation. Winnow and
Perceptron do the same during training. (Naive Bayes does not calculate
activations during training.) When a mistake is made and weights are
updated, Perceptron multiplies its learning rate parameter by the strength
of the feature being updated and adds or subtracts the result from that
feature's weight. Winnow raises the appropriate multiplier (either alpha or
beta) to a power equal to the strength of the feature being updated and
multiplies the result into that feature's weight.
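The following Python sketch illustrates that arithmetic; every weight and
parameter value in it is invented for the example:

# Illustrative arithmetic only; all weights and parameters are invented.
example = {7: 1.5, 5: 1.0, 10: 0.6}          # feature ID -> strength

# Activation (testing; also Winnow/Perceptron training):
# the sum of weight * strength over the example's active features.
w = {7: 0.4, 5: 1.1, 10: 0.9}                # one target node's weights
activation = sum(w[f] * s for f, s in example.items() if f in w)

# Perceptron update after a mistake: add learning_rate * strength to the
# weight (subtract it instead when the mistake was a false positive).
learning_rate = 0.1                          # hypothetical value
for f, s in example.items():
    w[f] += learning_rate * s

# Winnow update after a mistake: multiply the weight by alpha (promotion)
# or beta (demotion) raised to the feature's strength.
alpha = 1.5                                  # hypothetical promotion parameter
for f, s in example.items():
    w[f] *= alpha ** s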
Feature IDs that represent targets may also be given a strength that is
different from 1. Doing so will not affect SNoW's behavior at all unless
the multiple labels option is disabled (see -m) or a function
approximation algorithm is enabled (see -G or
Section 4.3.3). See Sections 4.1
and 4.2 for more details on the role feature strengths play in
learning.