Tuesday, October 21, 2008

"Template-based Online Character Recognition"

by Scott D. Connell, Anil K. Jain

Summary

To perform classificatio of handwritten characters, a recognition system must attempt to leverage between class variations, while accommodating potentially large within-class variations -- composed of two parts: between and within writing styles. Their goal is to automatically learn these writing styles so that adequate character models can be built.

Background discusses five categories of online handwriting recognition techniques: (1) primitive decomposition, (2) motor models, (3) elastic matching, (4) stochastic models, (5) neural networks.

Preprocessing uses equi-distant resampling followed by a Gaussian filter. Locations of endpoints and high curvature points are preserved. And characters are normalized to the same height.

A stroke is represented as a string of on average 62 events. Distance between two strings involves computing the distance between the corresponding pair of events. This requires events first be aligned. In this paper, each event is represented as: x, y with respect to the first point of the first stroke of the digit, and the angle of curvature (theta) at the sample point. Distance between two events is a weighed sum of their corresponding x, y, theta distances. Distance between characters is the sum of distances between aligned events. Neighboring strokes in multi-stroke characters are connected, so that they can be represented as a single string. Restrictions are made to skip points to prevent 1-N or N-1 alignment of events, when a point is skiped, penalty (spurious & missing penalties) is added to the distance. The distance is calculated by finding the minimal distance of all possible alignments of events in two strings. A stroke count difference penalty is added to the final distance.

Two methods - cluster centers and training set editing - are used to selecting representative prototypes. Cluster centers: cluster data and retain the prototype closest to each center (can be seen as finding representative examples of writing styles.) Training set editing: selecting only those templates that fall along class boundaries.

Next, the paper introduces nearest neighbor and decision tree as methods of classification. In constructing a decision tree, "difference feature" is used (asking question "Is character c more similar to reference i, or reference j?").

Accuracy was 86.9% for a 36-class set of alphanumeric characters.


Discussion

To me, it seems "different writing styles" within one class is just the same as different classes. The ability to distinguish between writing styles (and serves as different classes?) comes from the fact the training examples are automatically clustered. I know there is unsupervised clustering algorithms where the number of clusters K is automatically decided based on the data set.

1 comment:

Daniel said...

I also look at the "writing styles" and "different classes" in an odd way. If you're having to account for different writing styles, why do through all this hassle? Just have the user train a normal classifier.