SKETCH RECOGNITION 2008: "Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches"

by Michael Oltmans

Summary

Chapter1 Introduction

The approach is based on a representation of a sketched shape in terms of the visual parts it is made of. It is developed to recognize shapes in free sketching, which is different from sketches with feedback. When the goal is to jot something down quickly or explore early design possibilities, a less intrusive style of interaction is preferable. Feedback allows the user to repair recognition errors early on, touch up strokes are cleaned up before next shape is drawn, and feedback can teach the user what the recognition system is capable of recognizing, but freely drawn sketches don't have these features.

The principal challenges are:
(1) Shapes vary on both a signal and a conceptual level. (noises and conceptual variation, same thing drawn differently)
(2) Users draw strokes that overtrace previous strokes, and that may not be necessary to depict the shape being drawn. (doesn't change appearance, so visual approach is well suited)
(3) Segmenting strokes into groups corresponding to individual shapes is a complex task. (grouping according to the order they were drawn is not sufficient)

Chapter2 Representation of Visual Parts

The part-based representation is used to train a classifier to distinguish between shapes based on their parts. A visual part is an abstraction of the appearance of a region of the shape. It represents the appearance of a region of the shape (e.g. the top right hand corner of a rectangle). Each visual part is a circular region (bullseye) that is subdivided into rings and wedges, like a dartboard. The rings are spaced so that they are separated equally in log-space. They are spaced like this so that the inner bins are smaller and represent more detailed information about the appearance, while the outer bins contain more general information. A shape context is calculated
every 5 pixels along the the strokes in the shape.

The histogram boundaries are rotated relative to the interest point's orientation to make it rotationally invariant;the bins are rotated by an additional half bin width to prevent the stroke from lying along the boundary. So the bullseyes are rotationally invariant. The stroke direction relative to the orientation of the bullseye feature (calculating the tangent over a window of 19 points, four orientations: 0-45, 45-90, 90-135, 135-180) is added as a third dimension of the histogram, providing a count of how many points fall into each spatial bin at each direction (a single bin can distinguish between vertical and horizontal line).

Stroke preprocessing: first scale the shape so that the maximum distance between any two points in the shape is 75 pixels. Then resample the points along the strokes to have an approximately constant separation along the stroke. And interpolate additional points so that no two consecutive points have a distance greater than 1 pixel.

The distance between two bullseyes is calculated by comparing their vectors of bin counts: delta(p,q) = sigma(square(pi-qi)/(pi+qi)). Since the outer bins are less important and they usually contain more points, normalize the total weight of the histogram to be 1 to avoid differences arising from the total number of points in a region.

Chapter3 Representation of Shapes

Codebook: calculate the collection of bullseye parts for each shape in our training set, then cluster them into groups of similar parts, and use a representative from each cluster center as one of the codebook entries. Use a quality threshold (QT) clusterer, which forms clusters such that the distance between any two members of a clusterer is under a threshold. To improve
eciency, instead of trying each point as a cluster seed, the authors randomly selected 20 points as seeds and choose the largest cluster formed from those 20 seeds. Randomly select 1000 bullseye parts from the training set instead of using all of the calculated parts. The clustering is terminated after finding the 200 largest clusters.

To recognize a sketched shape, it is represented based on the parts it contains and then train a classifier to distinguish shapes based on those parts. First calculate the visual parts at a sampling of locations that cover the sketched symbol. Typically, 50 parts are calculated for a shape. In the next step, represent the sketched symbol in terms of the standard vocabulary of parts (codebook), compare each part in the codebook to each of the parts calculated from the sketched symbol, a distance matrix is got. Then a match vector is calculated by finding the minimum distance in each row of the matrix (match value: small match value means a codebook part presents in the input shape). Using the match vector representation, a classifier is trained to learn the differences between shape classes and to learn the allowable variations within a shape class.

The visual parts insulate the shape descriptions from the signal noise: bin (fuzzy), the way that are used to describe a shape (distance measure, low if similar, not exactly the same).

Chapter4 Recognition

With a representation of fixed cardinality and ordering corresponding to the codebook entries we can train a support vector machine (SVM) to learn a decision boundary that separates typical match vectors of one shape class from those from another shape class. To assign each input shape a single label from the set of possible shape classes, a one-vs-one strategy is used. This is implemented by training one classifier to distinguish between each pair of shape classes. This results in n(n-1)/2 binary classiers for n shape classes. For one input shape, the result from
each classifier is counted as a vote for the class it predicted. Assgin the class which received the most votes to the shape.

Shape localization involved sevel steps: selecting candidate regions, classifying the candidate regions, and combining the classified regions to form the final predictions. Candidate regions are selected by scanning a rectangular window over the input sketch every 10 pixels along the stroke with several different sized windows. The classifier is trained on an additional WIRE class to identify wire segments. The candidate finder generates many overlapping candidates so each shape in the image is partially contained in many candidate regions. The algorithm for combining the initial set of candidates into a nal set of predictions has two steps that are repeated for each shape class: (1) forming initial clusters of candidates and (2) splitting clusters that are too large into multiple predictions. To make a nal set of predictions we follow a greedy strategy based
on the total weight of each cluster. The weight of each cluster is determined by EM (expectation maximization), by combining the weights of the individual candidates in the cluster. Highly weighted clusters are generally tightly-grouped and have many high scoring candidates. As a result clusters with high weights are generally correct whereas clusters with low weights are less likely to be correct.

Chapter5 Evaluation

The isolated shape recognizer and the full sketch processor are evaluated separately. Analog circuit drawings report a recognition rate of 89.5%. HHreco data set that contains PowerPoint style shapes produced a recognition rate of 94.4%. One of the limitations is that it has difficulty identifying shapes that contain the same low level features in different combinations. The full sketch processor correctly identified 92.3% of the regions.

Discussion

This paper uses a vision-based approach, which successfully tackles the noises and conceptual variance of shapes appear in free sketching. And the full sketch processor which identifies locations and scales of components in sketch also has quite high accuracy. But the fuzzying also makes it unable to distinguish between shapes with the same set of parts that are differently connected. Also, this is an instance-based regcognizer, the input must be compared with all templates. So it slows quickly when new classes are added. And each time a new class is added, all the classifiers must be retrained. For sketches not like analog circuits where small components connected by wires, this recognizer may not work well (for example, a sketch with a circle in a rectangle).

SKETCH RECOGNITION 2008

Wednesday, October 1, 2008

"Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches"

No comments:

About Me

Blog Archive