

True Multi-Class Classification (Constraint Classification)

SNoW is a multi-class classifier. The default architecture instantiation implements a one-vs-all multi-class training policy: each target node $t$ is learned individually by treating all examples labeled $t$ as positive and all other examples as negative. The testing policy then simply evaluates all target representations on a given example and applies a winner-take-all gate to the targets' prediction confidences.
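As a concrete illustration, the following is a minimal sketch of this default policy, assuming perceptron-style binary learners over dense feature vectors; SNoW's actual learners, sparse feature representation, and update rules differ, and all names here are hypothetical:

import numpy as np

def train_one_vs_all(examples, labels, k, epochs=2, rate=0.1):
    """Learn one linear target representation per label; each target
    treats its own label as positive and every other label as negative."""
    W = np.zeros((k, examples.shape[1]))
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            for t in range(k):
                wanted = 1.0 if y == t else 0.0     # t-vs-rest relabeling
                got = 1.0 if W[t] @ x > 0 else 0.0
                W[t] += rate * (wanted - got) * x   # mistake-driven update
    return W

def predict(W, x):
    """Winner-take-all testing policy: the target node with the
    highest activation on x wins."""
    return int(np.argmax(W @ x))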

The winner-take-all testing policy is expressive: it can represent any decomposition of the domain into labeled regions separated by hyperplanes (a Voronoi diagram). However, training each target independently does not exploit this full expressivity, since it requires every labeled region to be linearly separable from the union of all the others.
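To see why winner-take-all yields hyperplane-separated regions, write target $t$'s activation on example $x$ as a linear function $w_t \cdot x$ (a simplification of SNoW's actual activation). The prediction is $\arg\max_t w_t \cdot x$, so the region assigned label $t$ is $\{x : (w_t - w_{t'}) \cdot x \geq 0$ for all $t' \neq t\}$, and any two regions are separated by the hyperplane $(w_t - w_{t'}) \cdot x = 0$. Independent one-vs-all training imposes the stronger requirement that $w_t \cdot x > 0$ on all of label $t$'s examples and $w_t \cdot x < 0$ on everyone else's, which may be unsatisfiable even when the regions themselves are separated by hyperplanes.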

The true Multi-Class classification training policy recovers this expressivity and allows SNoW to learn Voronoi diagrams [Har-Peled et al., 2002]. In addition to supporting true multi-class learning, it supports ranking of labels and allows an example to be labeled with several labels with a specified order among them. In this document, we refer to this training policy as Constraint Classification. See the -O command line parameter for usage information.

Specifically, assume there are $k > 1$ possible class labels in a learning experiment, and assume that a given training example contains $n$, $1 \leq n \leq k$, class labels (target IDs) with IDs $c_1, c_2, \ldots, c_n$, which appear in that order in the example. With -O + set, SNoW will interpret the labels in this example as asserting an activation lattice: the activation of target node $c_1$ should be higher than the activation of target node $c_2$, the activation of target node $c_2$ should be higher than the activation of target node $c_3$, and so on; finally, the activation of target node $c_n$ should be higher than the individual activations of all the remaining $k - n$ target nodes.
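The following sketch makes the lattice explicit as a set of pairwise ordering constraints; the function name and representation are hypothetical, for illustration only:

def lattice_constraints(labels, k):
    """Yield (higher, lower) pairs of target IDs asserting that the
    first target's activation should exceed the second's.  `labels`
    is the ordered list [c_1, ..., c_n] from the example."""
    for i in range(len(labels) - 1):
        yield (labels[i], labels[i + 1])    # c_i above c_{i+1}
    labeled = set(labels)
    for t in range(k):                      # c_n above every remaining target
        if t not in labeled:
            yield (labels[-1], t)

For instance, with $k = 4$ and an example labeled $c_1 = 2$, $c_2 = 0$, the asserted constraints are the pairs $(2, 0)$, $(0, 1)$, and $(0, 3)$.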

Let $c_i$ and $c_{i + 1}$ be the first two labels in the example's lattice for which SNoW evaluates a higher activation for target node $c_{i + 1}$ than for target node $c_i$. Target node $c_i$ will be promoted and target node $c_{i + 1}$ will be demoted. The activation of target node $c_{i + 1}$ is then recalculated before it is compared with target node $c_{i + 2}$. When target node $c_n$ is compared to the remaining $k - n$ target nodes, target nodes with smaller IDs are inspected first. Again, if target node $c_n$ needs to be promoted after comparison with one of the remaining $k - n$ target nodes, its activation is recalculated after the promotion and before the comparison with the target node with the next higher ID.
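One plausible reading of this procedure, sketched with multiplicative (Winnow-style) promotion and demotion over a binary feature vector x; the learning rate alpha, the dense weight matrix W, and the exact update form are illustrative assumptions, not SNoW's actual implementation:

import numpy as np

def constraint_update(W, x, labels, k, alpha=1.5):
    """Walk the example's activation lattice and repair each violated
    pair in turn; activations are recomputed after every update because
    W changes in place.  x is assumed to be a binary feature vector."""
    # c_n is compared against the remaining targets in increasing ID order
    rest = sorted(t for t in range(k) if t not in set(labels))
    pairs = list(zip(labels, labels[1:])) + [(labels[-1], t) for t in rest]
    for hi, lo in pairs:
        if W[lo] @ x > W[hi] @ x:        # lattice constraint violated
            W[hi] *= alpha ** x          # promote: scale active weights up
            W[lo] *= alpha ** (-x)       # demote: scale active weights down
    return W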

This implementation is a special case of the Constraint Classification algorithm described in [Har-Peled et al., 2002]. Notice that when $k = 2$ and the update rule used is Winnow, this algorithm reduces to the balanced version of Winnow described in [Littlestone, 1988].

SNoW also implements a more conservative version of this algorithm. In the conservative Constraint Classification algorithm, only two target nodes are compared on a given example: the node with the highest activation according to the current network, and the node corresponding to the example's label. If they are the same node (in other words, if the network predicted correctly), no change is made. Otherwise, the node with the highest activation is demoted and the node corresponding to the example's label is promoted.
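Continuing the earlier sketch under the same illustrative assumptions (binary features, dense weights, Winnow-style updates), the conservative variant is purely mistake-driven:

import numpy as np

def conservative_update(W, x, label, alpha=1.5):
    """Compare only the current winner with the example's label;
    update the network only when the prediction is wrong."""
    winner = int(np.argmax(W @ x))
    if winner != label:                  # a correct prediction changes nothing
        W[label] *= alpha ** x           # promote the label's target node
        W[winner] *= alpha ** (-x)       # demote the mistaken winner
    return W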


