Thursday, October 16, 2008

"Ink Features for Diagram Recognition"

by Rachel Patel, Beryl Plimmer, John Grundy, Ross Ihaka

Summary

Previous work on sketch recognition uses top-down, bottom-up, and combined approaches, but few systems support sketches that mix text and shapes. This paper uses formal statistical analysis to identify key ink features and then uses them to tell text and shapes apart.

46 features are selected and grouped into 7 categories: size, time, intersections, curvature, pressure, OS recognition values, and inter-stroke gaps. 1519 strokes are collected for statistical analysis to determine whether each feature is significant in distinguishing between shapes and text. These features can then be arranged into a decision tree with the most significant features at the root; each leaf of the tree is either TEXT or SHAPE. The rpart function is applied to the training data to find, at each node, the split position that yields the minimal number of misclassifications. The features that most accurately split the data into text and shapes are the significant ones.
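A minimal sketch of the split search described above, written in Python rather than R. It greedily picks the feature and threshold whose two sides, each labelled with its own majority class, misclassify the fewest strokes, and then recurses on each side. This is only an illustration of the idea: rpart's default criterion for classification trees is the Gini index rather than a raw misclassification count, and rpart also prunes the tree, which this sketch does not. The feature names, the `max_depth` parameter, and the toy data are all hypothetical.

```python
from collections import Counter

def leaf_label(labels):
    """Majority class of a node."""
    return Counter(labels).most_common(1)[0][0]

def leaf_errors(labels):
    """Strokes misclassified if the whole node gets its majority label."""
    return len(labels) - Counter(labels).most_common(1)[0][1]

def best_split(values, labels):
    """Best threshold on one feature as (errors, threshold), or None if all values are equal."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold falls strictly between equal values
        err = (leaf_errors([lab for _, lab in pairs[:i]])
               + leaf_errors([lab for _, lab in pairs[i:]]))
        if best is None or err < best[0]:
            best = (err, (lo + hi) / 2)  # threshold halfway between neighbours
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Internal node: (feature, threshold, left, right); leaf: 'TEXT' or 'SHAPE'."""
    if depth == max_depth or leaf_errors(labels) == 0:
        return leaf_label(labels)
    candidates = []
    for name in rows[0]:
        split = best_split([r[name] for r in rows], labels)
        if split is not None:
            candidates.append((split[0], name, split[1]))
    if not candidates or min(candidates)[0] >= leaf_errors(labels):
        return leaf_label(labels)  # no split reduces misclassifications
    _, name, thr = min(candidates)  # feature with the fewest misclassifications
    left = [i for i, r in enumerate(rows) if r[name] <= thr]
    right = [i for i, r in enumerate(rows) if r[name] > thr]
    return (name, thr,
            build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth))

def classify(tree, row):
    """Walk from the root to a leaf."""
    while isinstance(tree, tuple):
        name, thr, left, right = tree
        tree = left if row[name] <= thr else right
    return tree

# Hypothetical training strokes (not the paper's data).
rows = [
    {"time_to_next": 0.20, "bbox_width": 15},
    {"time_to_next": 0.25, "bbox_width": 40},
    {"time_to_next": 0.30, "bbox_width": 12},
    {"time_to_next": 1.20, "bbox_width": 30},
    {"time_to_next": 1.50, "bbox_width": 120},
    {"time_to_next": 2.00, "bbox_width": 90},
]
labels = ["TEXT", "TEXT", "TEXT", "SHAPE", "SHAPE", "SHAPE"]
tree = build_tree(rows, labels)
print(tree)
print(classify(tree, {"time_to_next": 0.1, "bbox_width": 200}))
```

On this toy data `time_to_next` separates the two classes with no errors while `bbox_width` cannot, so the tree is a single split on `time_to_next`, which is the "most accurate split wins" idea in miniature.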

This divider outperforms InkKit and the Microsoft divider in overall accuracy, though on the new diagram set its text misclassification rate is slightly higher than InkKit's (music notes were a confounding factor).

Significant features are:
1. time till next stroke (inter-stroke gaps)
2. speed till next stroke (inter-stroke gaps)
3. distance from last stroke (inter-stroke gaps)
4. distance to next stroke (inter-stroke gaps)
5. bounding box width (size)
6. perimeter to area (size)
7. amount of ink inside (size)
8. total angle (curvature)
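A rough sketch of how a few of the listed features might be computed from raw ink. It assumes each stroke is a list of (x, y, t) points with t in milliseconds, and the paper's exact definitions may differ. In particular, treating total angle as the sum of absolute turning angles, distance to next stroke as the closest-point distance, and speed till next stroke as that distance divided by the time gap are my assumptions.

```python
import math

# Assumed stroke format: list of (x, y, t) points, t in milliseconds.

def bbox_width(stroke):
    """Horizontal extent of the stroke's axis-aligned bounding box."""
    xs = [x for x, _, _ in stroke]
    return max(xs) - min(xs)

def total_angle(stroke):
    """Sum of absolute turning angles (radians) between consecutive segments (assumed definition)."""
    total = 0.0
    for (x0, y0, _), (x1, y1, _), (x2, y2, _) in zip(stroke, stroke[1:], stroke[2:]):
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        # Signed angle between the two segments via cross and dot products.
        total += abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return total

def time_to_next(stroke, nxt):
    """Gap between the end of this stroke and the start of the next one."""
    return nxt[0][2] - stroke[-1][2]

def distance_to_next(stroke, nxt):
    """Closest distance between any point of this stroke and any point of the next (assumed)."""
    return min(math.dist(p[:2], q[:2]) for p in stroke for q in nxt)

def speed_to_next(stroke, nxt):
    """Distance to the next stroke divided by the time gap (assumed definition)."""
    gap = time_to_next(stroke, nxt)
    return distance_to_next(stroke, nxt) / gap if gap > 0 else math.inf

# Hypothetical example: an L-shaped stroke followed 100 ms later by a short stroke.
l_stroke = [(0, 0, 0), (10, 0, 10), (10, 10, 20)]
next_stroke = [(13, 10, 120), (20, 10, 130)]
print(bbox_width(l_stroke), total_angle(l_stroke),
      time_to_next(l_stroke, next_stroke),
      distance_to_next(l_stroke, next_stroke),
      speed_to_next(l_stroke, next_stroke))
```

The closest-point distance is quadratic in the number of points, which is fine for single strokes but would need a spatial index for very dense ink.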


Discussion

The idea of using rpart to select significant features and organize them into a decision tree is great. The misclassification rate on shapes is twice as high as that on text, meaning that many shapes are classified as text; this is due mainly to strokes such as music notes.
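For reference, a per-class misclassification rate is just the fraction of strokes of each true class that received the wrong label. A small sketch with made-up labels (not the paper's data), chosen so the shape error rate is twice the text error rate:

```python
from collections import Counter

def per_class_error(true_labels, predicted_labels):
    """Fraction of strokes of each true class that received a different predicted label."""
    totals = Counter(true_labels)
    wrong = Counter(t for t, p in zip(true_labels, predicted_labels) if t != p)
    return {cls: wrong[cls] / n for cls, n in totals.items()}

# Made-up labels: 2 of 10 shapes and 1 of 10 text strokes are misclassified.
truth = ["SHAPE"] * 10 + ["TEXT"] * 10
predicted = ["TEXT"] * 2 + ["SHAPE"] * 8 + ["SHAPE"] * 1 + ["TEXT"] * 9
rates = per_class_error(truth, predicted)
print(rates)
```

Reporting the two rates separately, rather than one overall accuracy, is what exposes the asymmetry: here every shape error is a shape called text.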

One observation is that in English and some other Latin-alphabet scripts, uppercase and lowercase characters may differ on some features. For lowercase characters, curvature / bounding box width might be a useful feature (I don't know whether this correlates with the curvature and bounding box width features taken separately), as text tends to have a high curvature-to-width ratio. However, this feature may not be able to distinguish between shapes and most uppercase characters, such as 'H', 'F', 'X', etc., especially on a per-stroke basis.
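A quick sketch of the proposed curvature / bounding box width ratio, using the sum of absolute turning angles as a stand-in for curvature (an assumption; the paper's "total angle" may be defined differently). The `min_width` floor is a hypothetical guard so that perfectly vertical strokes don't divide by zero. The toy strokes show the concern above: a small zigzag scores high and a large rectangle scores low, but a single straight stroke of an 'H' scores exactly the same as a straight connector line.

```python
import math

def total_angle(points):
    """Sum of absolute turning angles (radians) along a polyline of (x, y) points."""
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        total += abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return total

def angle_to_width(points, min_width=1.0):
    """Curvature / bounding box width; min_width is a hypothetical floor for vertical strokes."""
    xs = [x for x, _ in points]
    width = max(max(xs) - min(xs), min_width)
    return total_angle(points) / width

# Hypothetical strokes.
zigzag = [(0, 0), (1, 3), (2, 0), (3, 3), (4, 0)]           # small, letter-like
rectangle = [(0, 0), (100, 0), (100, 50), (0, 50), (0, 0)]  # large closed shape
h_bar = [(0, 0), (0, 10), (0, 20)]                          # one vertical stroke of an 'H'
connector = [(0, 0), (25, 0), (50, 0)]                      # straight line in a diagram

for name, stroke in [("zigzag", zigzag), ("rectangle", rectangle),
                     ("h_bar", h_bar), ("connector", connector)]:
    print(f"{name}: {angle_to_width(stroke):.3f}")
```

Both straight strokes score 0, so on a per-stroke basis this ratio alone cannot separate the 'H' bar from the connector; that would have to come from other features such as the inter-stroke gaps.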

1 comment:

Daniel said...

Interesting feature. I guess since most letters are thinner, their curvature to bounding box width ratio would be larger. However, this assumes that the character is oriented with your coordinate system. What if the user writes on a diagonal? The bounding box would be larger.