SNoW implements function approximation versions of Perceptron's additive update rule, a stochastic approximation to the Gradient Descent (GD) algorithm, and of Winnow's multiplicative update rule, an Exponentiated Gradient Descent (EGD) algorithm [Kivinen and Warmuth, 1997]. See the -G option for usage details.
Both use the same training policy. They are non-mistake-driven, meaning that every target node to which an example is presented performs an update on that example, whether or not the network can be said to have made a mistake on it. They are also on-line, as opposed to the true Gradient Descent algorithm, in which the total error incurred by a weight vector over an entire training set, and the total update to that weight vector with respect to that error, are calculated before any weights are changed. Mitchell calls the on-line version a stochastic approximation. A disadvantage of this approximation is that it requires a smaller learning rate to guarantee convergence. An advantage is that it is less susceptible to ``getting stuck'' in a local minimum that is not the global minimum [Mitchell, 1997].
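To make the distinction concrete, the following sketch contrasts the two regimes for a single linear unit under squared error. This is plain NumPy, not SNoW code; the function names, toy data, and learning rate are illustrative assumptions.

```python
import numpy as np

def batch_gd_epoch(w, X, y, lr):
    """True Gradient Descent: accumulate the gradient of the total squared
    error over the entire training set, then update the weights once."""
    grad = np.zeros_like(w)
    for x_i, y_i in zip(X, y):
        grad += (x_i @ w - y_i) * x_i  # gradient of 0.5 * (x_i.w - y_i)^2
    return w - lr * grad

def online_gd_epoch(w, X, y, lr):
    """Stochastic (on-line) approximation: update after every example,
    mistake or not, as in SNoW's regression training policy."""
    for x_i, y_i in zip(X, y):
        w = w - lr * (x_i @ w - y_i) * x_i
    return w

# Toy usage: the on-line version typically calls for a smaller learning rate.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0])
w = np.zeros(5)
for _ in range(20):
    w = online_gd_epoch(w, X, y, lr=0.01)
```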
Specifically, let $\mathcal{A}_t$ be the set of active features in a given example that are linked to target node $t$, let $w_{t,i}$ be the weight of feature $i$ in $\mathcal{A}_t$, let $\alpha$ be the learning rate, let $s_i$ be the strength of feature $i$ in the given example, and let $s_t$ be the strength of target $t$ in that same example. Then $\Omega_t = \sum_{i \in \mathcal{A}_t} w_{t,i} s_i$ is the predicted activation of target node $t$ before updating, and $s_t$ will represent the correct activation for this example while updating.
A regression update in SNoW proceeds as follows:
GD: When enabled in conjunction with Perceptron, the GD update rule becomes:
$$w_{t,i} \leftarrow w_{t,i} + \alpha \, (s_t - \Omega_t) \, s_i \quad \forall i \in \mathcal{A}_t$$
EGD: When enabled in conjunction with Winnow, the corresponding algorithm is referred to as Exponentiated Gradient Descent in the literature:
$$w_{t,i} \leftarrow w_{t,i} \cdot \alpha^{(s_t - \Omega_t) \, s_i} \quad \forall i \in \mathcal{A}_t$$
(Note that $\beta$ is never used in this case.)
Naive Bayes does not have a Gradient Descent update rule. In both Winnow and Perceptron, each example is presented only to targets active in that example.
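For illustration, here is a minimal sketch of the two per-example updates on a single target node, with a sparse example represented as a map from active feature ids to strengths. This is plain Python, not SNoW's implementation; the names `gd_update` and `egd_update`, the default initial weights, and the exact EGD form follow the reconstruction above and are assumptions.

```python
def predicted_activation(weights, example):
    """Omega_t: sum of w_{t,i} * s_i over the active features i of the example."""
    return sum(weights.get(i, 0.0) * s_i for i, s_i in example.items())

def gd_update(weights, example, s_t, alpha):
    """Additive (Perceptron-style) regression update:
    w_{t,i} <- w_{t,i} + alpha * (s_t - Omega_t) * s_i for each active feature i.
    Unseen features start at 0.0 here (an assumption, not SNoW's default)."""
    error = s_t - predicted_activation(weights, example)
    for i, s_i in example.items():
        weights[i] = weights.get(i, 0.0) + alpha * error * s_i

def egd_update(weights, example, s_t, alpha, default_weight=1.0):
    """Multiplicative (Winnow-style) exponentiated update:
    w_{t,i} <- w_{t,i} * alpha ** ((s_t - Omega_t) * s_i) for each active feature i.
    The default initial weight of 1.0 is an assumption."""
    error = s_t - predicted_activation(weights, example)
    for i, s_i in example.items():
        weights[i] = weights.get(i, default_weight) * alpha ** (error * s_i)

# Toy usage: one target node; an example with active features 3 and 7 of
# strength 1.0 whose correct activation (target strength) is 0.5.
w_gd, w_egd = {}, {}
gd_update(w_gd, {3: 1.0, 7: 1.0}, s_t=0.5, alpha=0.1)
egd_update(w_egd, {3: 1.0, 7: 1.0}, s_t=0.5, alpha=1.35)
```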