SNoW is a multi-class classifier. The default architecture instantiation
implements a one-vs-all multi-class training policy: each target node is
learned individually, by considering all examples labeled with that target's
label as positive and all other examples as negative. The testing policy then
simply evaluates all target representations on a given example and applies a
winner-take-all gate to the targets' prediction confidences.
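To make the testing policy concrete, here is a minimal sketch in Python (the weight vectors and labels are hypothetical; SNoW's actual target nodes are learned linear representations) of evaluating every target and taking the winner:

```python
# Minimal sketch of winner-take-all testing over hypothetical target
# nodes, each represented by a weight vector; not SNoW's actual code.
import numpy as np

def winner_take_all(weights, example):
    """Evaluate every target node on the example and return the label
    whose node produces the highest activation."""
    activations = {label: w @ example for label, w in weights.items()}
    return max(activations, key=activations.get)

# Three target nodes over four binary features (made-up numbers).
weights = {
    "A": np.array([1.0, 0.2, 0.0, 0.1]),
    "B": np.array([0.1, 1.5, 0.3, 0.0]),
    "C": np.array([0.0, 0.1, 0.9, 1.2]),
}
print(winner_take_all(weights, np.array([1, 0, 1, 1])))  # prints "C"
```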
The winner-take-all testing policy is expressive: it can represent any decomposition of the domain into labeled regions separated by hyperplanes (a Voronoi diagram). However, learning each target node independently does not support this expressivity, since it requires each labeled region to be linearly separable from the union of all the others.
The true multi-class classification training policy buys back this expressivity and allows SNoW to learn Voronoi diagrams [Har-Peled et al., 2002]. In addition to supporting true multi-class classification, it supports ranking of labels and allows an example to carry several labels with a specified order between them. In this document, we refer to this training policy as Constraint Classification. See the -O command line parameter for usage information.
Specifically, assume there are c possible class labels in a learning
experiment, and assume that a given training example contains m,
1 ≤ m ≤ c, class labels (target IDs) with IDs t_1, t_2, ..., t_m, which
appear in that order in the example. With -O + set, SNoW will interpret
the labels in this example as asserting an activation lattice: the
activation of target node t_1 should be higher than the activation of
target node t_2, the activation of target node t_2 should be higher than
the activation of target node t_3, and so on, and the activation of
target node t_m should be higher than the individual activations of all
the remaining c - m target nodes.
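As an illustration, the following sketch (a hypothetical helper, not part of SNoW) enumerates the pairwise activation constraints that such a lattice asserts:

```python
# Hedged sketch: the pairwise constraints implied by an ordered label
# list t_1 > t_2 > ... > t_m > (every target not in the example).
def lattice_constraints(ordered_labels, all_labels):
    """Yield (higher, lower) pairs whose activations must satisfy
    activation(higher) > activation(lower)."""
    # Chain constraints between consecutive labels in the example.
    for hi, lo in zip(ordered_labels, ordered_labels[1:]):
        yield hi, lo
    # The last label must beat every target not in the example.
    for t in all_labels:
        if t not in ordered_labels:
            yield ordered_labels[-1], t

# An example labeled "2, 5" out of targets 1..5 asserts:
# (2, 5), (5, 1), (5, 3), (5, 4)
print(list(lattice_constraints([2, 5], [1, 2, 3, 4, 5])))
```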
Let t_i and t_{i+1} be the first two labels in the example's lattice for
which SNoW evaluates a higher activation for target node t_{i+1} than for
target node t_i. Target node t_i will be promoted and target node t_{i+1}
will be demoted. The activation of target node t_{i+1} is then
recalculated before it is compared with target node t_{i+2}. When target
node t_m is compared to the remaining c - m target nodes, target nodes
with smaller IDs are inspected first. Again, if target node t_m needs to
be promoted after comparison with one of the remaining c - m target
nodes, its activation will be recalculated after promotion and before it
is compared with the target node with the next higher ID.
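A hedged sketch of this walk follows; the tiny Winnow-style network (class, method names, and parameters are ours) merely stands in for SNoW's real target nodes, which differ in detail:

```python
# Hedged sketch of the lattice update walk described above; TinyNet is
# a stand-in, not SNoW's implementation.
import numpy as np

class TinyNet:
    def __init__(self, labels, n_features, alpha=1.5, beta=0.8):
        self.w = {t: np.ones(n_features) for t in labels}
        self.alpha, self.beta = alpha, beta

    def activation(self, t, x):
        return self.w[t] @ x

    def promote(self, t, x):   # multiplicative promotion on active features
        self.w[t][x > 0] *= self.alpha

    def demote(self, t, x):    # multiplicative demotion on active features
        self.w[t][x > 0] *= self.beta

def constraint_update(net, x, ordered_labels, all_labels):
    # Walk the chain t_1 > t_2 > ... > t_m: on a violated constraint,
    # promote the higher-ranked node and demote the lower-ranked one.
    for hi, lo in zip(ordered_labels, ordered_labels[1:]):
        if net.activation(lo, x) >= net.activation(hi, x):
            net.promote(hi, x)
            net.demote(lo, x)  # lo's activation is re-read next iteration
    # t_m must beat every target not in the example, smaller IDs first;
    # t_m's activation is re-read after each promotion.
    last = ordered_labels[-1]
    for t in sorted(set(all_labels) - set(ordered_labels)):
        if net.activation(t, x) >= net.activation(last, x):
            net.promote(last, x)
            net.demote(t, x)

# Usage: an example with ordered labels 2, 5 out of targets 1..5.
net = TinyNet(labels=[1, 2, 3, 4, 5], n_features=4)
constraint_update(net, np.array([1, 0, 1, 0]),
                  ordered_labels=[2, 5], all_labels=[1, 2, 3, 4, 5])
```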
SNoW's implementation is a special case of the Constraint Classification
algorithm described in [Har-Peled et al., 2002]. Notice that when c = 2 and the
update rule used is Winnow, this algorithm reduces to the balanced version of
Winnow described in [Littlestone, 1988].
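To see the c = 2 reduction, note that with two target nodes the winner-take-all prediction depends only on the difference of the two activations. In our notation (not the manual's), with Winnow promotion parameter alpha > 1 and demotion parameter beta < 1:

```latex
% Two target nodes with weight vectors w_1, w_2 play the role of
% Balanced Winnow's positive and negative weight vectors:
\[
  \hat{y} = \operatorname{sign}\bigl((w_1 - w_2) \cdot x\bigr)
\]
% On a mistake against true label 1, node 1 is promoted and node 2 is
% demoted on the active features, i.e. for all j with x_j = 1:
\[
  w_{1,j} \leftarrow \alpha\, w_{1,j}, \qquad
  w_{2,j} \leftarrow \beta\, w_{2,j}
\]
```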
SNoW also implements a more conservative version of this algorithm. In the conservative Constraint Classification algorithm, only two target nodes are compared: the node with the highest activation under the current network on the given example, and the node corresponding to that example's label. If they are the same target node (in other words, if the network predicted correctly), no change is made. Otherwise, the target node with the highest activation is demoted, and the target node corresponding to the example's label is promoted.
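A corresponding sketch of the conservative update, reusing the hypothetical TinyNet interface from the sketch above:

```python
# Hedged sketch of the conservative variant: compare only the current
# winner with the node for the example's label; update on mistakes only.
def conservative_update(net, x, label, all_labels):
    predicted = max(all_labels, key=lambda t: net.activation(t, x))
    if predicted != label:   # mistake-driven: no change if correct
        net.demote(predicted, x)
        net.promote(label, x)
```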