Next: Prediction Up: Extensions to the Basic Previous: Voting: The Clouds Architecture Contents


Threshold-Relative Updating

In addition to the function approximation mode of Section 4.3.3, SNoW implements another update rule in which the weight update rate is not constant (as $ \alpha $ and $ \beta $ are for Winnow, or $ learning\_rate$ is for Perceptron) but instead depends on how far the activation value is from the threshold. With Threshold-Relative Updating (option -t), an update on an example on which the network has made a mistake causes that example's activation to jump directly to the immediate vicinity of the threshold rather than taking a small step toward it.

Let $ {\cal A}_t = \{i_1, \ldots, i_m \}$ be the set of active features in a given example that are linked to target node $ t$, let $ w_{t,i}$ be the weight of feature $ i$ in target $ t$, let $ s_i$ be the strength of feature $ i$ in the example, and let $ \theta_t$ be the threshold at target node $ t$. Then $ \Omega_t = \sum_{i \in {\cal A}_t} w_{t,i} s_i$ is the activation of target $ t$ before updating.
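As a concrete illustration, the activation $ \Omega_t$ is simply a dot product over the example's active features. The following minimal Python sketch (the function and variable names are illustrative, not SNoW's internals) computes it:

```python
def activation(weights, example):
    """Omega_t = sum over active features i of w_{t,i} * s_i.

    weights: feature id -> weight w_{t,i} linked to one target node t
    example: feature id -> strength s_i of that feature in the example
    Active features not linked to the target contribute nothing.
    """
    return sum(weights[i] * s for i, s in example.items() if i in weights)

weights = {1: 0.5, 2: 1.0, 3: 2.0}
example = {1: 1.0, 3: 1.0}           # features 1 and 3 active with strength 1
print(activation(weights, example))  # 0.5*1.0 + 2.0*1.0 = 2.5
```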

The threshold-relative updating rule for Winnow is:

$\displaystyle w_{t,i} = w_{t,i} * \left(rate * \frac{\theta_t}{\Omega_t}\right) $

where $ rate$ is $ \alpha $ if the example is positive and $ \beta $ if it's negative. Notice that in this case, following the update, we get:

$\displaystyle new\_\Omega_t = \sum_{i \in {\cal A}_t} new\_w_{t,i} s_i = rate * \theta_t $
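The multiplicative rule can be sketched as follows (an illustrative Python sketch, not SNoW's actual code). Because every active weight is scaled by the same factor $ rate * \theta_t / \Omega_t$, the activation itself is scaled by that factor and lands exactly at $ rate * \theta_t$:

```python
def winnow_tru(weights, example, theta, rate):
    """Threshold-relative Winnow update: multiply every active weight by
    rate * theta / omega, so the new activation is exactly rate * theta."""
    omega = sum(weights[i] * s for i, s in example.items())
    factor = rate * theta / omega
    for i in example:
        weights[i] *= factor

weights = {1: 0.5, 2: 2.0}
# Positive example, so rate = alpha (here 1.35); old activation is 2.5.
winnow_tru(weights, {1: 1.0, 2: 1.0}, theta=5.0, rate=1.35)
print(weights[1] + weights[2])  # new activation = 1.35 * 5.0 = 6.75
```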

Similarly, the update rule for Perceptron becomes:

$\displaystyle w_{t,i} = w_{t,i} + \frac{rate + (\theta_t - \Omega_t)}{active\_features} $

where $ rate$ is the Perceptron algorithm parameter $ learning\_rate$ if the example is positive and $ -learning\_rate$ if it is negative, and $ active\_features$ is the total number of active features in the example. Again, it is easy to see that, when every active feature has strength 1, following the update we get:

$\displaystyle new\_\Omega_t = \sum_{i \in {\cal A}_t} new\_w_{t,i} s_i = rate + \theta_t $
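The additive rule admits the same kind of sketch (again illustrative Python, not SNoW's code; the identity $ new\_\Omega_t = rate + \theta_t$ is exact when every active feature has strength 1):

```python
def perceptron_tru(weights, example, theta, rate):
    """Threshold-relative Perceptron update: add the same correction
    (rate + theta - omega) / active_features to every active weight."""
    omega = sum(weights[i] * s for i, s in example.items())
    delta = (rate + (theta - omega)) / len(example)
    for i in example:
        weights[i] += delta

weights = {1: 0.2, 2: 0.3}
# Positive example, so rate = learning_rate (here 0.1); old activation is 0.5.
perceptron_tru(weights, {1: 1.0, 2: 1.0}, theta=1.0, rate=0.1)
print(weights[1] + weights[2])  # new activation = 0.1 + 1.0 = 1.1
```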

That is, the example's new activation lies roughly at the threshold, separated from it by a small buffer (determined by $ rate$) on the correct side. See option -t for details.



Cognitive Computations 2004-08-20