

Function Approximation (Regression)

SNoW implements a function approximation version of Perceptron's additive update rule (a stochastic approximation to the Gradient Descent algorithm, GD) and of Winnow's multiplicative update rule (an exponentiated Gradient Descent algorithm, EGD) [Kivinen and Warmuth, 1997]. See option -G for usage details.

Both use the same training policy. They are non-mistake-driven: every target node to which an example is presented performs an update on that example, whether or not the network can be said to have made a mistake on it. They are also on-line, as opposed to the true Gradient Descent algorithm, in which the total error incurred by a weight vector over the entire training set, and the corresponding total update to that vector, are computed before any weights are changed. Mitchell calls the on-line version a stochastic approximation. A disadvantage of this approximation is that it requires a smaller learning rate to guarantee convergence; an advantage is that it is less susceptible to ``getting stuck'' in a local minimum that is not the global minimum [Mitchell, 1997].
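To make the batch/on-line distinction concrete, here is a minimal Python sketch (illustrative only; SNoW is not implemented this way, and the dense weight vectors and function names below are our assumptions) of a true batch Gradient Descent step next to its on-line, per-example approximation:

    import numpy as np

    def batch_gd_step(w, X, y, alpha):
        # True Gradient Descent: accumulate the total error over the
        # entire training set before any weight is changed.
        grad = np.zeros_like(w)
        for x_i, y_i in zip(X, y):
            grad += (y_i - x_i @ w) * x_i
        return w + alpha * grad

    def online_gd_step(w, X, y, alpha):
        # Stochastic (on-line) approximation: update after each example,
        # so later examples already see the modified weights.
        w = w.copy()
        for x_i, y_i in zip(X, y):
            w = w + alpha * (y_i - x_i @ w) * x_i
        return w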

Specifically, let $ {\cal A}_t = \{i_1, \ldots, i_m \}$ be the set of active features in a given example that are linked to target node $ t$, let $ w_{t,i}$ be the weight of feature $ i$ in $ t$, let $ \alpha $ be the learning rate, let $ s_i$ be the strength of feature $ i$ in the given example, and let $ s_t$ be the strength of target $ t$ in that same example (see Section 6.1 for more information on feature and target strengths). Then $ \Omega_t = \sum_{i \in {\cal A}_{t}} w_{t,i} s_i$ is the predicted activation of target node $ t$ before updating, and $ s_t$ represents the correct activation for this example during the update.
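For illustration, the predicted activation is simply a sparse dot product over the active features. A minimal sketch, with Python dictionaries standing in for SNoW's sparse network (names ours, not SNoW's):

    def activation(weights, example):
        # Omega_t = sum over active features i of w_{t,i} * s_i
        # weights: feature id -> w_{t,i} for target node t
        # example: active feature id -> strength s_i
        return sum(weights.get(i, 0.0) * s_i for i, s_i in example.items())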

A regression update in SNoW proceeds as follows. Under the additive (GD) rule, the weight of each active feature is updated by

$ w_{t,i} \leftarrow w_{t,i} + \alpha (s_t - \Omega_t) s_i, \qquad i \in {\cal A}_t, $

and under the multiplicative (EGD) rule by

$ w_{t,i} \leftarrow w_{t,i} \, e^{\alpha (s_t - \Omega_t) s_i}, \qquad i \in {\cal A}_t. $
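Using the activation sketch above, both updates can be expressed as follows (again an illustrative sketch rather than SNoW's source; the default initial weights and the unnormalized EGD form are assumptions based on the definitions above):

    import math

    def regression_update(weights, example, s_t, alpha, multiplicative=False):
        # One regression update for target t on a single example;
        # s_t is the correct activation (the target's strength).
        error = s_t - activation(weights, example)
        for i, s_i in example.items():
            if multiplicative:
                # Winnow / EGD: exponentiated update
                weights[i] = weights.get(i, 1.0) * math.exp(alpha * error * s_i)
            else:
                # Perceptron / GD: additive update
                weights[i] = weights.get(i, 0.0) + alpha * error * s_i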

Naive Bayes does not have a Gradient Descent update rule. For both Winnow and Perceptron, each example is presented only to the targets that are active in that example.




