Next: Prediction Up: Extensions to the Basic Previous: Voting: The Clouds Architecture Contents


Threshold-Relative Updating

In addition to the function approximation mode of Section 4.3.3, SNoW implements another update rule in which the weight update rate is not constant (as $ \alpha $ and $ \beta $ are for Winnow, or $ learning\_rate$ is for Perceptron) but instead depends on how far the activation value is from the threshold. With Threshold-Relative Updating (option -t), an update on an example on which the network has made a mistake causes that example's activation to jump directly to the immediate vicinity of the threshold rather than taking a small step toward it.

Let $ {\cal A}_t = \{i_1, \ldots, i_m \}$ be the set of active features in a given example that are linked to target node $ t$, let $ w_{t,i}$ be the weight of feature $ i$ in target $ t$, let $ s_i$ be the strength of feature $ i$ in the example, and let $ \theta_t$ be the threshold at target node $ t$. Then $ \Omega_t = \sum_{i \in {\cal A}_t} w_{t,i} s_i$ is the activation of target $ t$ before updating.
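As a concrete illustration, the activation $ \Omega_t$ is simply a dot product over the example's active features. The following minimal Python sketch (the function and variable names are illustrative, not SNoW's internals) computes it:

```python
def activation(weights, example):
    """Omega_t = sum over active features i of w_{t,i} * s_i.

    weights: feature id -> weight w_{t,i} linked to one target node t
    example: feature id -> strength s_i of that feature in the example
    Active features not linked to the target contribute nothing.
    """
    return sum(weights[i] * s for i, s in example.items() if i in weights)

weights = {1: 0.5, 2: 1.0, 3: 2.0}
example = {1: 1.0, 3: 1.0}           # features 1 and 3 active with strength 1
print(activation(weights, example))  # 0.5*1.0 + 2.0*1.0 = 2.5
```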

The threshold-relative updating rule for Winnow is:

$\displaystyle w_{t,i} = w_{t,i} * \left(rate * \frac{\theta_t}{\Omega_t}\right) $

where $ rate$ is $ \alpha $ if the example is positive and $ \beta $ if it's negative. Notice that in this case, following the update, we get:

$\displaystyle new\_\Omega_t = \sum_{i \in {\cal A}_t} new\_w_{t,i} s_i = rate * \theta_t $
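The multiplicative rule can be sketched as follows (an illustrative Python sketch, not SNoW's actual code). Because every active weight is scaled by the same factor $ rate * \theta_t / \Omega_t$, the activation itself is scaled by that factor and lands exactly at $ rate * \theta_t$:

```python
def winnow_tru(weights, example, theta, rate):
    """Threshold-relative Winnow update: multiply every active weight by
    rate * theta / omega, so the new activation is exactly rate * theta."""
    omega = sum(weights[i] * s for i, s in example.items())
    factor = rate * theta / omega
    for i in example:
        weights[i] *= factor

weights = {1: 0.5, 2: 2.0}
# Positive example, so rate = alpha (here 1.35); old activation is 2.5.
winnow_tru(weights, {1: 1.0, 2: 1.0}, theta=5.0, rate=1.35)
print(weights[1] + weights[2])  # new activation = 1.35 * 5.0 = 6.75
```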

Similarly, the update rule for Perceptron becomes:

$\displaystyle w_{t,i} = w_{t,i} + \frac{rate + (\theta_t - \Omega_t)}{active\_features} $

where $ rate$ is the Perceptron algorithm parameter $ learning\_rate$ if the example is positive and $ -learning\_rate$ if it is negative, and $ active\_features$ is the total number of active features in the example. Again, it is easy to see that, when every active feature has strength 1, following the update we get:

$\displaystyle new\_\Omega_t = \sum_{i \in {\cal A}_t} new\_w_{t,i} s_i = rate + \theta_t $
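The additive rule admits the same kind of sketch (again illustrative Python, not SNoW's code; the identity $ new\_\Omega_t = rate + \theta_t$ is exact when every active feature has strength 1):

```python
def perceptron_tru(weights, example, theta, rate):
    """Threshold-relative Perceptron update: add the same correction
    (rate + theta - omega) / active_features to every active weight."""
    omega = sum(weights[i] * s for i, s in example.items())
    delta = (rate + (theta - omega)) / len(example)
    for i in example:
        weights[i] += delta

weights = {1: 0.2, 2: 0.3}
# Positive example, so rate = learning_rate (here 0.1); old activation is 0.5.
perceptron_tru(weights, {1: 1.0, 2: 1.0}, theta=1.0, rate=0.1)
print(weights[1] + weights[2])  # new activation = 0.1 + 1.0 = 1.1
```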

That is, the example's new activation lies roughly at the threshold, separated from it by a small buffer (determined by $ rate$) on the correct side. See option -t for details.



Cognitive Computations 2004-08-20