
Training Parameters

The following parameters are all optional. As in all other cases, they can be defined as part of the architecture file or on the command line. Some change the bookkeeping that SNoW does as it keeps track of the features linked to its target nodes. Some are used to tailor SNoW's efficiency to a particular learning scenario and dataset. None of these parameters' settings are written to the network file unless otherwise stated.

-a <+ | ->
: Setting this parameter to + forces all non-discarded features to be written to the network file. If set to -, features that have not yet reached the eligibility threshold (i.e., pending features) are not written to the network. When set to + and used in conjunction with a non-default setting of -e in training, it is best to specify the same setting of -e during testing. This parameter is not available in -interactive mode. Default -.

-d <none | abs:<k> | rel>
: Specifies the discarding method, if any. Once a feature is discarded, it can never again contribute to a target node's activation. Features are discarded on a per target node basis. Discarding is performed only during training in -train mode, once every 1000 examples; one final application of discarding is performed when SNoW finishes training on all examples. Note that features cannot be discarded until they have become eligible, and once discarded they will never become eligible again in the same invocation of SNoW.

Absolute discarding discards features in a target node's weight vector with weight less than some threshold (specified as a fraction of the default weight). For example, specifying -d abs:0.1 on the command line means that when the feature's weight drops below $ 10\%$ of the initial weight, the feature is discarded.

Relative discarding compares feature weights across the whole network. The feature with the smallest weight is discarded, and its weight is subtracted from the weight of every other feature in the network. This method was developed specifically for Winnow networks and is probably best used only with Winnow networks.

This parameter is not available in -interactive mode. The default is none.
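
To make the two discarding rules concrete, the following sketch applies them to a toy weight vector. It is an illustration only, not SNoW's implementation; the data layout (a dictionary mapping feature ID to weight for each target node) and the function names are assumptions.

    # Illustrative sketch of the two discarding rules; not SNoW's code.
    def discard_absolute(weights, initial_weight, fraction):
        # -d abs:<fraction>: drop features whose weight has fallen below
        # <fraction> of the initial (default) weight.
        threshold = fraction * initial_weight
        return {f: w for f, w in weights.items() if w >= threshold}

    def discard_relative(network):
        # -d rel: drop the smallest-weight feature in the whole network and
        # subtract its weight from every remaining feature (features tied at
        # the minimum are all dropped in this sketch).
        smallest = min(w for node in network.values() for w in node.values())
        return {t: {f: w - smallest for f, w in node.items() if w > smallest}
                for t, node in network.items()}

    # -d abs:0.1 with an initial weight of 1.0 discards feature 7 (weight 0.08).
    weights = {3: 0.45, 7: 0.08, 12: 1.3}
    print(discard_absolute(weights, initial_weight=1.0, fraction=0.1))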

-e <count | percent>:<i>
: This parameter controls when the weights of linked features become eligible to participate in target node activation calculation and updates during training. Features that are more frequently encountered in the presented examples are more likely to be considered eligible. There are two types of eligibility, count and percent.

Note that although the default method is count, if you are interested in generating small hypotheses we recommend the relative method, which we call percent here. See [Carlson et al., 2001] for experimental evidence.

Using the count method, an integer threshold is specified, and features become eligible as soon as their active counts reach that threshold. Before then, they are pending (not eligible, but not discarded either). See the -s parameter for a description of what makes a feature's active count within a given target node go up. Experiments with a small number of examples may benefit from setting this parameter to count:1.

For example, if -e count:3 is specified and feature $ 12836$ appears twice in the training file, then this feature would never be included in activation calculations during training, and it would not be written to the network file after training is complete (see the -a parameter for the only exception to this rule).

Using the percent method, the specified floating point threshold represents a fraction of total feature occurrences. During the first cycle of training, all features are considered eligible. After the first cycle, a histogram of feature frequencies (active counts) is created for each target node. Starting with the highest frequencies, features are declared eligible until the total number of eligible feature occurrences within a given target node meets the specified percentage. All other features become pending. Again, see the -s parameter for a description of what makes a feature's active count within a given target node go up.

For example, let $ n$ be the number of features observed with a specific target node $ t$ (one of which is feature $ 12836$), and let $ i_1 \ldots i_k$, $ k < n$, be the features that are active more times than feature $ 12836$ in examples in which $ t$ is also active. Also, let $ a_{t,i}$ be the number of times feature $ i$ is active with $ t$, and let $ \gamma$ represent the user specified percentage eligibility (as in -e percent:$ \gamma$). Then, if the condition

$\displaystyle a_{t,12836} + \sum_{f=1}^{k} a_{t,i_f} \le \gamma \sum_{i=1}^{n} a_{t,i}$

holds true, then feature $ 12836$ will remain eligible for participation in updates and activation calculations (as will all features $ i_1 \ldots i_k$). Otherwise, feature $ 12836$ will become pending.

The -e parameter also affects -test mode. When the network is read in, each feature's eligibility (either ``eligible'' or ``pending'') is assigned according to this parameter's setting and the active count found for the feature in the network file. This parameter is not available in -interactive mode. The default setting of this parameter in both training and testing is count:2.
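
The following sketch shows how the two eligibility rules could be computed for a single target node from its active counts. It illustrates the rules described above, not SNoW's implementation; the function names and the dictionary of active counts are assumptions.

    # Illustrative sketch of the count and percent eligibility rules for one
    # target node; active_counts maps feature ID -> active count with that node.
    def eligible_by_count(active_counts, k):
        # -e count:<k>: a feature is eligible once its active count reaches k.
        return {f for f, c in active_counts.items() if c >= k}

    def eligible_by_percent(active_counts, gamma):
        # -e percent:<gamma>: working down from the most frequent features, a
        # feature stays eligible while its count plus the counts of all more
        # frequent features is at most gamma times the total occurrences
        # (the condition displayed above).
        total = sum(active_counts.values())
        eligible, covered = set(), 0
        for f, c in sorted(active_counts.items(), key=lambda fc: -fc[1]):
            if covered + c > gamma * total:
                break
            eligible.add(f)
            covered += c
        return eligible

    counts = {12836: 2, 4: 9, 17: 5, 98: 1}
    print(eligible_by_count(counts, 3))       # features 4 and 17 are eligible
    print(eligible_by_percent(counts, 0.9))   # features 4 and 17 are eligible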

-f <+ | ->
: This parameter controls the automatic insertion of the ``fixed'' feature into every example. When set to +, a feature with ID equal to the highest integer that can be represented by an unsigned int in C++ is inserted into every example as it is read from disk. When set to -, no extra feature is added.

The fixed feature acts as a dynamic threshold in Winnow and Perceptron learners. The total activation produced by each example (including the fixed feature) is still compared to the constant threshold specified on the command line to determine if a mistake has been made (unless -O + has also been specified), but with the fixed feature enabled, this threshold can be thought of as merely the constant component of a dynamic threshold. This parameter is available in every execution mode. Default +.
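
As an illustration of the dynamic-threshold view, the sketch below compares an example's total activation (fixed feature included) against a constant threshold for a single linear target node. The weights, feature IDs, and threshold value are made up; only the role of the fixed feature follows the description above.

    # Illustrative sketch of the fixed feature acting as a dynamic threshold.
    FIXED_FEATURE = 2**32 - 1   # highest value of a C++ unsigned int

    def activation(weights, active_features, use_fixed=True):
        # With -f +, the fixed feature is treated as active in every example.
        features = set(active_features)
        if use_fixed:
            features.add(FIXED_FEATURE)
        return sum(weights.get(f, 0.0) for f in features)

    weights = {3: 0.6, 9: 0.9, FIXED_FEATURE: -0.4}
    theta = 1.0                                    # constant command-line threshold
    print(activation(weights, [3, 9]) >= theta)    # True: 1.1 >= 1.0
    # Equivalently, the non-fixed activation 1.5 is compared against the
    # effective threshold theta - weights[FIXED_FEATURE] = 1.4, so the learned
    # fixed-feature weight shifts the threshold as training proceeds.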

-g <+ | ->$ [$,<+ | ->$ ]$
: If -g + is specified, SNoW automatically generates conjunctions of features active in each example. For each pair of active features, a new feature is generated using a fixed mapping: the pair of feature IDs $ (i,j)$, where $ i$ is the smaller of the pair, is mapped to the new ID $ 10000j + i$. Note that in order for this mapping to work, all feature IDs used must be less than 10000. Also, using this parameter with many active features can significantly increase computation time and memory usage.

If -g - is specified, no conjunctions will be generated. If neither is specified, SNoW will examine the training data to decide if conjunctions would be useful. SNoW decides to generate conjunctions automatically if fewer than 100 unique features are present in the training data.

Users also have the option of writing the new examples to disk by specifying an additional argument. If -g +,+ is specified, input examples will be written to disk with conjunctions added. They will be output into a file whose name is the original filename concatenated with ``.conjunctions''. If no conjunctions are generated, the file will be left empty.

The setting of the first argument to this parameter is written to the network, but the second argument is not. This parameter is not available in -interactive mode. The default is -g <unset>,-, in which case SNoW examines the training data to decide whether conjunctions should be generated, and no examples are written to disk.
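
A minimal sketch of the conjunction mapping described above follows; it assumes all original feature IDs are below 10000 and is for illustration only.

    # Illustrative sketch of the -g + conjunction mapping: each pair of active
    # feature IDs (i, j), with i the smaller, maps to the new ID 10000*j + i.
    from itertools import combinations

    def add_conjunctions(active_features):
        original = sorted(active_features)
        conjunctions = [10000 * j + i for i, j in combinations(original, 2)]
        return original + conjunctions

    print(add_conjunctions([5, 42, 317]))
    # [5, 42, 317, 420005, 3170005, 3170042]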

-M <+ | ->
: This parameter controls the storage of examples in memory and only has an effect in -train mode. Setting it to - makes SNoW parse and train on one example at a time; at the beginning of the next cycle, the input stream is rewound, and the parsing and training process begins again. Setting it to + makes SNoW parse and store every training example in memory before training begins. This uses much more memory, but its speed advantage over the alternative setting grows as the number of training cycles increases. This parameter is not available in -interactive mode. Default -.

-m <+ | ->
: This parameter specifies whether examples should be treated as having multiple labels. If -m + is specified, the targets that appear in an example will not be treated as features by other targets, and thus a target will not be learned as a function of other targets in training. In testing, since all targets that appear in the example are treated as labels when this option is set to +, SNoW will count the example as correct if any of the targets that appear in the example are predicted.

If -m - is specified, a given target will treat other targets as features, and will therefore use those other targets' strengths in the example for activation calculations in training. In testing, the example is only counted as correct if the predicted target is the first target found in the example.

This parameter can be set in every execution mode except -interactive, but it will not make any difference unless there exist examples with more than one target active. Default +.
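
The difference in how a test prediction is scored under the two settings can be sketched as follows; the example representation (an ordered list of the target IDs appearing in the example) is an assumption for illustration.

    # Illustrative sketch of test-time scoring under -m + versus -m -.
    def is_correct(predicted_target, example_targets, multi_label):
        if multi_label:   # -m +: correct if any target in the example is predicted
            return predicted_target in example_targets
        return predicted_target == example_targets[0]   # -m -: must be the first target

    print(is_correct(4, [2, 4, 9], multi_label=True))    # True
    print(is_correct(4, [2, 4, 9], multi_label=False))   # False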

-r <i>
: Specifies the number of cycles (rounds) through the training data. Multiple passes through the training data can sometimes improve the resulting network. Only Perceptron and Winnow are affected by this parameter. This parameter is not available in -interactive mode. Default 2.

-s <s | f>
: Specifies whether to use the sparse or full network option. This setting only affects Perceptron and Winnow learners. In a sparse network (-s s), features are only linked to target nodes if both the feature and the target have appeared active together in an example. In contrast, a full network (-s f) has each target node linked to the same set of features. That is, if a feature is linked to any target node, then it is linked to all target nodes in the network. Also, in a sparse network, the active count of a given feature in a given target node is incremented only when that target ID and feature ID are active in the same example. In a full network, the active count of a given feature is incremented in every target node whenever that feature ID is active in any example.

In either kind of network, active counts are only incremented during the first cycle of -train mode and when -i + is specified in -test mode. This is important for eligibility (see -e), the rules of which are applicable whether the network is full or sparse. This parameter is not available in -interactive mode. Default s.
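
The following sketch contrasts how active counts are incremented in sparse and full networks; the nested dictionary counts[target][feature] and the function name are assumptions, not SNoW's data structures.

    # Illustrative sketch of active-count bookkeeping under -s s versus -s f.
    from collections import defaultdict

    def update_counts(counts, all_targets, example_targets, example_features, sparse):
        # -s s: only target nodes active in this example count its features;
        # -s f: every target node in the network counts every active feature.
        for t in (example_targets if sparse else all_targets):
            for f in example_features:
                counts[t][f] += 1

    counts = defaultdict(lambda: defaultdict(int))
    update_counts(counts, all_targets=[0, 1], example_targets=[0],
                  example_features=[5, 7], sparse=True)
    print({t: dict(fs) for t, fs in counts.items()})   # {0: {5: 1, 7: 1}}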

-u <+ | ->
: This parameter is for disabling first-cycle updates. Setting it to + means that mistakes will trigger updates during the first round of training with Perceptron and Winnow learners. Setting it to - means that mistakes will not trigger updates (in fact, nothing will) during the first round of training. This can be useful if it is important that all eligibility decisions be made before any updates take place (see the -e parameter). This parameter can only be specified in -train mode and is not available in -interactive mode. Default +.

-z <+ | ->
: This parameter is for enabling ``raw'' mode, a mode of operation in which all of SNoW's bells and whistles are turned off so that SNoW's output is more easily compared with hand calculated results. Setting this parameter to + is exactly equivalent to setting Perceptron's initial feature weight to 0, Winnow's initial feature weight to $ 1$, and setting the following combination of command line parameters with every other optional parameter taking its default value:

-e count:1 -f - -g - -r 1 -s f

If any of these parameters are specified in addition to -z + on the command line, the extra command line settings will override those that -z + imposes, no matter where in the command line they occur. This parameter is not available in -evaluate, -server, or -interactive mode. Default -.


