

Example Files

Example files are ASCII text files. In their basic format, they represent Boolean examples. Each example consists of a list of non-negative integer IDs, separated by a single comma and any amount of whitespace, and terminated by a colon. The largest legal ID (either feature or target) is $2^{32} - 4$. It is common practice to place a single example on each line, but this is not required; SNoW ignores all whitespace. Each ID uniquely represents a feature or a class label (target ID), and IDs can appear in any order within the example. The appearance of an ID within an example asserts that the corresponding feature or target is active in that example.

During training, SNoW treats each example as positive for every target node whose ID is active in the example and as negative for every target node whose ID does not appear in it. All examples are presented to all target nodes, with one exception: target nodes that learn with naive Bayes are trained only on their positive examples. Here are two examples:

7, 5, 1, 13:
0, 3, 1234, 123456, 12, 987, 234, 556:
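
For illustration, here is a minimal Python sketch (not part of SNoW; the parser and names are hypothetical) that reads examples in this basic format and, given a set of target IDs, reports which targets each example is positive and negative for:

def parse_examples(text):
    # Each example is terminated by a colon; IDs are comma-separated,
    # and all whitespace (including newlines) is ignored.
    examples = []
    for chunk in text.split(':'):
        if chunk.strip():
            examples.append({int(tok) for tok in chunk.split(',')})
    return examples

# Hypothetical target IDs; in SNoW, targets are declared elsewhere
# (with the architecture options on the command line).
targets = {1, 3}
for ids in parse_examples("7, 5, 1, 13:\n0, 3, 1234, 12:"):
    print("positive for", targets & ids, "negative for", targets - ids)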

Labels are not required in testing unless the user wants SNoW to keep performance statistics.

SNoW can also handle continuous features. This is done by associating a strength with each feature ID: each occurrence of an ID may be followed by a floating point strength enclosed in parentheses. For instance:

7(1.5), 5, 10(0.6), 13(-3.2):

If no parentheses appear after a feature ID, it is equivalent to giving that feature a strength of $1$.
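
As a rough sketch of how such strengths could be recognized (again hypothetical, not SNoW's own parser), a regular expression can capture an optional parenthesized strength and default it to $1$:

import re

# An ID, optionally followed by a parenthesized floating point strength.
FEATURE = re.compile(r'(\d+)\s*(?:\(\s*(-?\d+(?:\.\d+)?)\s*\))?')

def parse_strengths(example):
    # 'example' is one example without its terminating colon,
    # e.g. "7(1.5), 5, 10(0.6), 13(-3.2)"
    features = {}
    for token in example.split(','):
        m = FEATURE.search(token)
        if m:
            features[int(m.group(1))] = float(m.group(2) or 1.0)
    return features

print(parse_strengths("7(1.5), 5, 10(0.6), 13(-3.2)"))
# prints {7: 1.5, 5: 1.0, 10: 0.6, 13: -3.2}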

In testing, the Winnow, Perceptron, and naive Bayes learning algorithms multiply each feature's weight within a target node by its strength in the example before adding it to that target node's activation; Winnow and Perceptron do the same during training. (Naive Bayes does not calculate activations during training.) When a mistake is made and weights are updated, Perceptron multiplies its learning rate parameter by the strength of the feature being updated and then adds the product to, or subtracts it from, that feature's weight. Winnow raises the appropriate multiplier (either alpha or beta) to a power equal to the strength of the feature being updated and then multiplies the result into that feature's weight.
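
Schematically, with $w_i$ denoting a feature's weight within a target node, $s_i$ its strength in the example, $\eta$ Perceptron's learning rate, and $\alpha$, $\beta$ Winnow's promotion and demotion multipliers (the symbols are notation introduced here for summary, not SNoW terminology), these rules are:

activation $= \sum_i w_i s_i$

Perceptron update: $w_i \leftarrow w_i \pm \eta s_i$

Winnow update: $w_i \leftarrow \alpha^{s_i} w_i$ (promotion) or $w_i \leftarrow \beta^{s_i} w_i$ (demotion)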

Feature IDs that represent targets may also be given a strength different from $1$. Doing so will not affect SNoW's behavior at all unless the multiple labels option is disabled (see -m) or a function approximation algorithm is enabled (see -G or Section 4.3.3). See Sections 4.1 and 4.2 for more details on the role feature strengths play in learning.


