

The Basic System

The SNoW learning architecture is a sparse network of linear units over a Boolean or real-valued feature space.

In SNoW's most basic architectural instantiation, a two-layer network is maintained. The input layer is the feature layer; nodes in this layer are allocated to features observed in training examples. The second layer consists of target nodes. Each target node corresponds to a concept (a class label) one wants to represent as a function of the input features.4.1 For example, when learning a Boolean function, it is most common to use two target nodes: one representing the positive examples and one the negative. Architectural instantiations can also redefine the number of targets and target nodes, their types, the choice of learning algorithms and algorithmic parameters, and more.

SNoW expects a set of examples as input. During training, each target node uses all the examples provided to learn a representation of its concept as a function of (some of) the features. During evaluation/testing, these learned representations are evaluated on a given example to yield a prediction. An incremental mode, in which training continues alongside testing, is also available. SNoW makes use of the infinite attribute domain [Blum, 1992] rather than the standard representation of an example as an array of Boolean or real values, one per feature in the feature space, in which element $ i$ specifies the strength of feature $ i$. Instead, each example is represented as a list of active features; features that would have a strength of 0 in the classical representation are simply not listed. Each example is thus a list of feature indices, each possibly associated with a real-valued strength. This design anticipates a very large feature space and sparse examples; that is, only a small percentage of all possible features are active in any given example.
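To make the sparse representation concrete, here is a minimal illustrative sketch (in Python; this is not SNoW's input file format or code) of a dense example and its equivalent sparse form:

# Illustrative sketch only; not SNoW's input file format.
# An example lists only its active features; an absent feature has strength 0.

# Dense view over a hypothetical feature space of size 10:
dense = [0, 0, 1, 0, 0, 0.5, 0, 0, 0, 1]

# Sparse view: {feature index: strength}, listing only the active features.
sparse = {2: 1.0, 5: 0.5, 9: 1.0}

# Features not listed implicitly have strength 0, so the two views agree:
assert all(sparse.get(i, 0.0) == v for i, v in enumerate(dense))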

Target nodes are linked via weighted edges to (some of) the input features. Edges are allocated dynamically: a feature $ i$ is allocated and linked to target node $ t$ if and only if $ i$ is present in an example labeled $ t$. It is also possible to make more complex decisions about which features to include in the network4.2. Since such decisions are made over the course of the first pass through the training set, a single pass may not reap the full benefit of the information in each example. Therefore, SNoW processes the training set twice by default; the number of passes, or cycles, can be modified on the command line (see the -r command line parameter).
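As a hedged illustration of this allocation rule (a sketch, not SNoW's implementation; the labels and initial weight are assumptions), the following links a feature to a target exactly when the two co-occur in a training example:

from collections import defaultdict

# Sketch of dynamic edge allocation; the initial weight is an assumption.
initial_weight = 1.0
weights = defaultdict(dict)   # target label -> {feature index: edge weight}

# Each training example: (target label, list of active feature indices).
training_set = [
    ("pos", [3, 17, 102]),
    ("neg", [3, 8]),
    ("pos", [8, 102, 400]),
]

for label, active_features in training_set:
    for i in active_features:
        # A feature i is linked to target t only once i appears in an
        # example labeled t; existing edges keep their learned weight.
        weights[label].setdefault(i, initial_weight)

print(dict(weights["pos"]))   # {3: 1.0, 17: 1.0, 102: 1.0, 8: 1.0, 400: 1.0}
print(dict(weights["neg"]))   # {3: 1.0, 8: 1.0}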

Let $ {\cal A}_{t} = \{i_1, \ldots, i_m \}$ be the set of features that are active in an example and are linked to target node $ t$, and let $ s_i$ be the real-valued strength associated with feature $ i$ in the example (default: $ 1$). Then we say that $ t$ predicts positive if and only if

$\displaystyle \sum_{i \in {\cal A}_t} w_{t,i} s_i \ge \theta_t$ (4.1)

where $ w_{t,i}$ is the weight on the edge connecting the $ i^{th}$ feature to target node $ t$, and $ \theta_t$ is $ t$'s threshold.
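For concreteness, here is a small sketch of this prediction rule with made-up weights, strengths, and threshold (all values are illustrative assumptions, not SNoW defaults):

# Sketch of the prediction rule in equation 4.1 for a single target node t.
w_t = {3: 0.8, 17: 0.2, 102: 1.5}   # weights w_{t,i} on edges into t
theta_t = 1.0                        # threshold theta_t of target t

# Active features of the example with their strengths s_i (default 1).
example = {3: 1.0, 102: 1.0, 999: 1.0}

# Only features that are both active and linked to t contribute to the sum.
activation = sum(w_t[i] * s_i for i, s_i in example.items() if i in w_t)
predicts_positive = activation >= theta_t   # 0.8 + 1.5 = 2.3 >= 1.0 -> True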

The real-valued result of the summation in equation 4.1 is also referred to as the target node's activation. Target node activations drive weight vector updates during training as well as predictions during testing. The default testing policy for multiple-target networks is a winner-take-all rule. Let $ T$ be the set of all targets defined in the current architecture instantiation. The predicted target $ t^*$ for example $ e$ with set of active features $ \cup_{t \in T}{\cal A}_{t}$ is:4.3

$\displaystyle t^*(e) = argmax_{t \in T}~\sigma(\theta_t, \Omega_t(e)) $

where $ \Omega_t(e)$ is the activation calculated by the summation in equation 4.1 for target node $ t$ given $ e$, and $ \sigma(\theta, \Omega(e))$ is a learning-algorithm-specific sigmoid function4.4 whose transition from an output close to 0 to an output close to $ 1$ is centered around $ \theta$. The specific sigmoid functions employed by SNoW's various learning algorithms are discussed in the next section.
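The winner-take-all rule can be sketched as follows; the logistic function below is a generic stand-in centered at $ \theta$, not necessarily the sigmoid used by any particular SNoW algorithm, and the activations and thresholds are illustrative:

import math

def sigmoid(theta, activation):
    # Generic logistic sigmoid centered at theta; SNoW's algorithms use
    # their own specific sigmoid functions (see the next section).
    return 1.0 / (1.0 + math.exp(theta - activation))

# Illustrative activations Omega_t(e) and thresholds theta_t for three targets.
activations = {"t0": 2.3, "t1": 0.4, "t2": 1.1}
thresholds  = {"t0": 1.0, "t1": 1.0, "t2": 1.0}

# Winner-take-all: predict the target with the largest sigmoid activation.
prediction = max(activations, key=lambda t: sigmoid(thresholds[t], activations[t]))
print(prediction)   # "t0"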

The only other testing policy currently implemented in SNoW is enabled automatically for networks that contain exactly one target node. This single-target testing policy makes predictions in the same way as the classical versions of the linear threshold algorithms SNoW supports: it simply returns the prediction of the single target node (positive or negative, as determined by equation 4.1) as SNoW's prediction.

SNoW can then display for the user its prediction for each example, its accuracy across an entire labeled testing set, the activations of each target node for each example, those same activations after softmax normalization, or the results of applying the sigmoid function to the activations, hereafter referred to as the sigmoid activations. In simple architecture instantiations, sigmoid activations are equivalent to prediction confidences, but see Section 4.3.5 for a more in-depth discussion.
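As an illustration of the softmax output option (a sketch with made-up activations, not the exact computation SNoW performs), softmax normalization rescales the per-target values so that they are positive and sum to 1:

import math

# Sketch of softmax normalization over illustrative target activations.
activations = {"t0": 2.3, "t1": 0.4, "t2": 1.1}

exps = {t: math.exp(a) for t, a in activations.items()}
total = sum(exps.values())
softmax = {t: e / total for t, e in exps.items()}

# The normalized values are positive and sum to 1, so they can be read as a
# distribution over the targets for this example.
print(softmax)   # roughly {'t0': 0.69, 't1': 0.10, 't2': 0.21}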



Footnotes

... features.4.1
SNoW also supports representing a single target by a vote over several target nodes via the cloud architecture, an additional level defined over the target nodes' layer.
... network4.2
See the -d and -e command line parameters.
... is:4.3
This assumes that each target is represented by a single target node. See Section 4.3.5 for situations where this assumption does not hold.
... function4.4
See [Mitchell, 1997] for a more detailed discussion of sigmoid functions in the context of artificial neural networks.

