by Y. LeCun, B. Boser, et al.
Summary
This paper introduces a neural-network-based approach to recognizing handwritten zip codes.
Training and testing data were collected from a post office in Buffalo: 9298 segmented numerals in total, of which 7291 are used for training and the remaining 2007 are saved for testing. All the zip codes are preprocessed by separating each digit from its neighbors and applying a linear transformation to convert it into a 16x16 pixel image, with grey levels scaled to the range -1 to 1.
The input to the network is a 16x16 image, and the output layer has 10 units, one per class. The precise location of a feature is not relevant to classification, but its approximate location is. To detect a particular feature at any location, the network uses the same set of weights everywhere in the image - "weight sharing". The network is composed of three hidden layers and an output layer:
H1: 12 feature maps of 8x8 units; each unit takes a 5x5 field of the 16x16 image as input. Units in H1 that are one unit apart have receptive fields 2 pixels apart. Thus the image is undersampled and some position information is discarded. Connections extending past the boundaries take their input from a virtual background plane, whose value is chosen to be -1. Within each feature map, the same set of 25 weights is applied to every 5x5 input field, but each unit in H1 has its own bias. (19,968 connections & 1,068 free parameters)
H2: 12 feature maps of 4x4 units, built on H1 the same way H1 is built on the input, except that each unit in H2 combines 8 of the 12 feature maps in H1. The input fields are again 5x5, again with identical weight vectors within a map and a separate bias for each unit. (38,592 connections & 2,592 free parameters)
H3: 30 units fully connected to the units in H2. (192×30 + 30 biases = 5,790 connections)
Output: 10 units fully connected to units in H3, adding another 310 weights.
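The weight-sharing scheme in H1 can be made concrete with a short sketch. This is my own illustration, not the paper's code, and the exact anchoring of the 5x5 fields is an assumption; the point is that one shared 5x5 kernel sweeps the image with a step of 2 pixels, out-of-bounds pixels come from the -1 virtual background plane, and each of the 8x8 output units adds its own bias.

```python
def h1_feature_map(image, weights, biases):
    """One H1 feature map (pre-activations).
    image: 16x16 grey levels in [-1, 1]; weights: shared 5x5 kernel;
    biases: 8x8 per-unit biases. Returns an 8x8 map."""
    def pixel(r, c):
        # virtual background plane: out-of-range pixels read as -1
        if 0 <= r < 16 and 0 <= c < 16:
            return image[r][c]
        return -1.0

    out = [[0.0] * 8 for _ in range(8)]
    for i in range(8):          # units one apart -> fields 2 pixels apart
        for j in range(8):
            s = biases[i][j]    # per-unit bias, shared kernel weights
            for u in range(5):
                for v in range(5):
                    # field anchor offset (-2) is an assumption
                    s += weights[u][v] * pixel(2 * i + u - 2, 2 * j + v - 2)
            out[i][j] = s
    return out
```

Units near the border mix real pixels with the -1 background, which is why the paper bothers to specify the background value at all.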
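The quoted connection and free-parameter counts follow directly from the layer descriptions above; a quick arithmetic check:

```python
# H1: 12 maps of 8x8 units; each unit sees a 5x5 field plus a bias,
# with the 25 weights shared within each map.
h1_units = 12 * 8 * 8                        # 768
h1_connections = h1_units * (5 * 5 + 1)      # 19,968
h1_free_params = 12 * 5 * 5 + h1_units       # 1,068 (shared kernels + biases)

# H2: 12 maps of 4x4 units; each unit combines 8 of H1's 12 maps
# through 5x5 fields, again shared weights, per-unit biases.
h2_units = 12 * 4 * 4                        # 192
h2_connections = h2_units * (8 * 5 * 5 + 1)  # 38,592
h2_free_params = 12 * 8 * 5 * 5 + h2_units   # 2,592

# H3: 30 units fully connected to H2's 192 units.
h3_connections = 192 * 30 + 30               # 5,790

# Output: 10 units fully connected to H3's 30 units.
out_connections = 30 * 10 + 10               # 310

total_connections = (h1_connections + h2_connections
                     + h3_connections + out_connections)  # 64,660
```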
The nonlinear function used at each node was a scaled hyperbolic tangent. The target values for the output units were chosen within the quasilinear range of the sigmoid. The weights were initialized with random values ranging between -2.4/Fi and 2.4/Fi, where Fi is the number of inputs (the fan-in) of the unit to which the connection belongs.
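A minimal sketch of the squashing function and the initialization rule. The constants A = 1.7159 and S = 2/3 are the values LeCun used for the scaled tanh in closely related work; the summary above only says "scaled hyperbolic tangent", so treat them as an assumption.

```python
import math
import random

A = 1.7159       # output scale (assumed value)
S = 2.0 / 3.0    # input scale (assumed value)

def squash(a):
    # scaled hyperbolic tangent: f(a) = A * tanh(S * a)
    return A * math.tanh(S * a)

def init_weights(fan_in, n, seed=0):
    # uniform in [-2.4/Fi, 2.4/Fi], where Fi is the unit's fan-in
    rng = random.Random(seed)
    bound = 2.4 / fan_in
    return [rng.uniform(-bound, bound) for _ in range(n)]
```

Scaling the init bound by fan-in keeps each unit's total input inside the quasilinear range of the squashing function at the start of training.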
The network was trained with backpropagation for 23 passes through the training set (167,693 pattern presentations). The error rate was 0.14% on the training data and 5% on the testing data. To reach a 1% error rate on the patterns it accepts, the network had to reject 12.1% of the test patterns. Overall, it outperforms previous work in both accuracy and speed.
Discussion
To be honest, I know little about neural networks. But it seems to use a set of learned feature detectors to undersample sketches by their features, and training makes certain weights larger, so that an input follows the path through the network that activates the corresponding output most strongly. However, this may not work equally well on non-digit sketches, and it is also worth noting that the system uses images with grey levels from -1 to 1, while the sketches we are considering are binary images most of the time. Anyway, this is a different solution from the ones we've learned so far.

2 comments:
This approach definitely seems better suited to digits and alphabet characters. I would think the large number of examples needed for training and the large number of input possibilities in free-form sketches would make this approach not nearly as applicable as the other approaches we've looked at.
Good summary. I used yours to write mine.