by Dae Hyun Kim, Myoung-Jun Kim
Summary
This paper discusses the use of curvature to identify segmenting points in strokes. Strokes are first resampled to equally spaced points, so that the curvature at Pi with support k (the support is the set of adjacent points used for the curvature estimation) can be computed by summing all direction changes within the range Di-k to Di+k (Di is the angle between Pi-1Pi and PiPi+1).
Then they define local convexity: extend the support on each side of Pi with a point Pj only if Pj is locally convex with respect to Pi (has the same sign of direction change). But local convexity alone sometimes fails to distinguish actual segmenting points, so local monotonicity is introduced: extend the support only while the absolute direction change monotonically decreases. The advantages of this method are that (1) it requires no complex computation, and (2) it is invariant under rotation and reordering of pen markings (it produces the same result). A sketch of this estimate follows below.
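A minimal sketch of this curvature estimate in Python, assuming the stroke is already resampled to evenly spaced (x, y) points; the support cap max_k and all names are my own, not the paper's:

```python
import math

def direction_changes(pts):
    """d[i] = signed turn angle at pts[i] between segments (i-1, i) and (i, i+1)."""
    d = [0.0] * len(pts)
    for i in range(1, len(pts) - 1):
        a1 = math.atan2(pts[i][1] - pts[i - 1][1], pts[i][0] - pts[i - 1][0])
        a2 = math.atan2(pts[i + 1][1] - pts[i][1], pts[i + 1][0] - pts[i][0])
        turn = a2 - a1
        while turn <= -math.pi:      # wrap to (-pi, pi]
            turn += 2 * math.pi
        while turn > math.pi:
            turn -= 2 * math.pi
        d[i] = turn
    return d

def curvature(d, i, max_k=10):
    """Sum direction changes over a support that grows on each side of i only
    while the neighbor is locally convex (same sign as d[i]) and |d| decreases
    monotonically away from i."""
    total = d[i]
    for step in (-1, 1):                       # extend left, then right
        prev = abs(d[i])
        for k in range(1, max_k + 1):
            j = i + step * k
            if not 0 < j < len(d) - 1:
                break
            if d[j] * d[i] < 0:                # local convexity violated
                break
            if abs(d[j]) > prev:               # local monotonicity violated
                break
            total += d[j]
            prev = abs(d[j])
    return total
```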
The segmenting points are the points with locally maximal absolute curvature. Experiments show 95% accuracy (correctly found points / actual segmenting points). Speed information can also be added to aid segmenting-point detection, and adjacent points with high curvature are merged so that only one is returned as the segmenting point.
Discussion
This paper actually uses the geometric features of the curve to dynamically adjust the window size (the support) of points used to measure the curvature. This happens after wiggly edges are smoothed by resampling, so the resampling distance should be chosen carefully to both preserve curvature information and smooth out noise.
Tuesday, September 23, 2008
"Eliminating False Positives During Corner Finding by Merging Similar Segments"
by Aaron Wolin, Brandon Paulson, Tracy Hammond
Summary
This paper introduces MergeCF, which utilizes a stroke's curvature and drawing speed to find corners in sketches. False-positive corners are then removed by merging similar segments.
Distances: chord distance (Euclidean distance) and path length.
Speed: the speed at Pi is path length / duration over the neighborhood Pi-1 to Pi+1.
Initial fit: corners are points with high curvature and low speed. Corners that are close together are also checked, and those with the smallest curvature are removed.
Merging segments: corners surrounding the smallest stroke segments are likely to be false positives. Find the smallest segment, then combine it with whichever of its neighbors yields the smaller primitive fit error. No merging is performed if the error after merging is much higher than the sum of the original errors of the two segments (see the sketch below).
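A sketch of the merging loop, under stated assumptions: fit_error(i, j) stands in for the paper's primitive-fit error over a substroke, and the alpha threshold and stop-on-first-rejection policy are my simplifications:

```python
import math

def path_length(stroke, i, j):
    """Arc length of the stroke between point indices i and j."""
    return sum(math.dist(stroke[k], stroke[k + 1]) for k in range(i, j))

def merge_similar_segments(corners, stroke, fit_error, alpha=2.0):
    """corners: sorted point indices into stroke; fit_error(i, j) returns the
    best primitive-fit error for the substroke stroke[i..j]."""
    while len(corners) > 2:
        # segment (corners[s], corners[s+1]) with the shortest path length
        s = min(range(len(corners) - 1),
                key=lambda k: path_length(stroke, corners[k], corners[k + 1]))
        best = None
        for t in (s - 1, s + 1):                  # try both neighbors
            if 0 <= t < len(corners) - 1:
                lo, hi = min(s, t), max(s, t)
                merged = fit_error(corners[lo], corners[hi + 1])
                original = (fit_error(corners[s], corners[s + 1]) +
                            fit_error(corners[t], corners[t + 1]))
                if best is None or merged < best[0]:
                    best = (merged, original, lo + 1)
        merged_err, orig_err, shared = best
        if merged_err > alpha * orig_err:         # merge would hurt too much
            break
        del corners[shared]                       # drop the shared corner
    return corners
```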
Experiments using polylines and complex shapes show that MergeCF outperforms Sezgin's and Kim's methods on both "correct corners found" accuracy (0.971 vs. 0.822 and 0.783) and all-or-nothing accuracy (0.667 vs. 0.298 and 0.194).
Discussion
This algorithm first finds as many potential corners as possible and then eliminates false positives. The important part is how to define the thresholds for "small segment" and "not much higher error".
Monday, September 22, 2008
"PaleoSketch: Accurate Primitive Sketch Recognition and Beautification"
by Brandon Paulson, Tracy Hammond
Summary
In addition to placing few drawing constraints on users, the recognizer introduced in this paper is able to return multiple interpretations as well as support more primitive shapes (line, circle, ellipse, arc, curve, spiral, helix). The recognizer is geometry-based. Strokes are pre-recognized before going on to the low-level shape recognizers, beautification, and the subsequent hierarchy.
Pre-recognition begins with removing consecutive points with duplicate position or time. Then the direction graph, speed graph, and curvature graph are computed. Corners are found by 1) walking through the points of the stroke and keeping each point just before a point that fails the line test, 2) merging corners that fall within a neighborhood into their average index, and 3) replacing non-end corners with the points that have the highest curvature in their neighborhood. In addition, two new features are introduced: the normalized distance between direction extremes (NDDE = stroke length between the point with the highest direction value and the point with the lowest direction value / total stroke length), and the direction change ratio (DCR = maximum change in direction / average change in direction). Curves typically have high NDDE and low DCR, while polylines have low NDDE and high DCR. For long strokes with many points, if the highest curvature lies within the first or last 20% of the stroke, tail removal is performed. Then overtracing and closure are tested.
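A minimal sketch of the two features, assuming a resampled stroke of (x, y) points and ignoring angle wrap-around in the direction values (the paper works with a continuous direction graph); names are mine:

```python
import math

def directions(pts):
    """Direction of each segment of the resampled stroke."""
    return [math.atan2(pts[i + 1][1] - pts[i][1], pts[i + 1][0] - pts[i][0])
            for i in range(len(pts) - 1)]

def ndde(pts):
    """Stroke length between the highest- and lowest-direction points,
    divided by total stroke length."""
    d = directions(pts)
    cum = [0.0]
    for i in range(len(pts) - 1):
        cum.append(cum[-1] + math.dist(pts[i], pts[i + 1]))
    hi, lo = d.index(max(d)), d.index(min(d))
    return abs(cum[hi] - cum[lo]) / cum[-1]

def dcr(pts):
    """Maximum change in direction divided by the average change."""
    d = directions(pts)
    changes = [abs(d[i + 1] - d[i]) for i in range(len(d) - 1)]
    return max(changes) / (sum(changes) / len(changes))
```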
Then the low-level shape tests (line, polyline, ellipse, circle, arc, curve, spiral, helix, and complex test) are performed by computing various features of the stroke, comparing the feature values with thresholds, and calculating errors. Of them, the complex test is the default test: it recursively divides the stroke into two substrokes at the highest-curvature point and sends them back into the recognizer until all substrokes are recognized as non-polyline primitives. Beautified shapes are returned after the tests.
The hierarchy uses a ranking algorithm to find the best fit among multiple interpretations. Scores are assigned to shapes (line = 1, arc = 3, curve = 5, circle = 5, ellipse = 5, helix = 5, spiral = 5) and summed over all substrokes. Interpretations are ranked by total score, with lower scores ranking higher (see the sketch below). Lastly, tails are removed, and the substrokes at both ends are removed or adjusted accordingly.
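The ranking step is easy to make concrete; a sketch using the paper's scores, where the list-of-primitive-names representation is my own assumption:

```python
SHAPE_SCORE = {"line": 1, "arc": 3, "curve": 5, "circle": 5,
               "ellipse": 5, "helix": 5, "spiral": 5}

def rank_interpretations(interpretations):
    """Each interpretation is a list of primitive names, one per substroke;
    lower total score (a simpler explanation) ranks first."""
    return sorted(interpretations,
                  key=lambda shapes: sum(SHAPE_SCORE[s] for s in shapes))

# e.g. rank_interpretations([["curve"], ["line", "line", "arc"]])
# prefers the single curve (score 5 vs. 5).
```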
Results show accuracy as high as 99.89%, and in 98.56% of cases the correct interpretation is the top interpretation. Most of the misclassified strokes would likely be misclassified by a human recognizer as well. This low-level recognizer has been integrated into a high-level system, LADDER.
Discussion
In terms of recognition accuracy, this recognizer is amazing, and it supports a good range of primitive shapes, which enables it to recognize quite complex sketches. Many thresholds are defined to distinguish minute differences between shapes, and NDDE and DCR are really useful features. The ideas of "continuation strokes" and "invisible corners" seem promising: for a rounded rectangle or polygon, the edges can be moved so that each passes through the centroid of the relevant substroke while staying parallel to the originally recognized edge.
Sunday, September 21, 2008
"Sketch Based Interfaces: Early Processing for Sketch Understanding"
by T. M. Sezgin, T. Stahovich, R. Davis
Summary
The authors try to provide a system where users can sketch naturally and have their sketches understood, with no constraints on how things are drawn.
Early processing in the system consists of three phases: approximation, beautification, and basic recognition. Approximation finds the representative points of strokes in the sketch. Beautification modifies the points found in the approximation phase to make the output nicer and more meaningful (showing parallelism, perpendicularity, etc.). Basic recognition tries to interpret strokes as objects (rectangles, circles, etc.). This paper focuses mainly on the approximation approaches. Stroke approximation is divided into vertex detection and curve handling.
The authors do vertex detection by combining curvature and speed information, since vertices usually have high curvature and low drawing speed. Fd is the set of all points with locally maximal curvature above the average curvature, while Fs is the set of all points with locally minimal speed below 90% of the average speed. Using speed data alone misses short lines that are not drawn fast enough, and using curvature data alone produces many false-positive points due to hand dynamics during freehand sketching, so the idea is to combine them. Hybrid fits are generated in three stages:
1) Computing vertex certainties: (a) the certainty for a curvature candidate Vi is |Di-k - Di+k| / L, where Di is the curvature at Vi, k is the neighborhood size (not specified), and L is the path distance of the substroke from Vi-k to Vi+k; (b) the certainty for a speed candidate Vi is 1 - vi/vmax.
2) Generating a set of hybrid fits: H0 is the intersection of Fd and Fs. In successive iterations, Hi' = Hi + Vs (Vs is the best remaining speed-fit candidate) and Hi'' = Hi + Vd (Vd is the best remaining curvature-fit candidate). The goodness metric is the least squares error, the average of the squared distances from each point in the stroke S to the fit: Ei = 1/|S| * sum over s in S of ODSQ(s, Hi), where |S| is the stroke length and ODSQ stands for orthogonal distance squared, i.e., the square of the distance from the stroke point to the relevant line segment of the polyline defined by Hi. The errors for Hi' and Hi'' are compared, and the one with the smaller error becomes Hi+1. This repeats until all points in Fd and Fs have been used (see the sketch after this list).
3) Selecting the best fit: set an error upper bound (not specified) and designate as the final fit Hf the Hi with the fewest vertices that also has an error below the threshold.
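A sketch of stage 2, assuming Fd and Fs are candidate vertex indices ordered best-certainty-first and that error() implements the ODSQ-based least squares metric above; the bookkeeping is mine:

```python
def generate_hybrid_fits(stroke, Fd, Fs, error):
    """Greedily grow fits by adding the best remaining speed or curvature
    candidate, keeping whichever addition yields the lower error."""
    H = sorted(set(Fd) & set(Fs))               # H0 = Fd intersect Fs
    fits = [list(H)]
    speed = [v for v in Fs if v not in H]       # remaining candidates
    curv = [v for v in Fd if v not in H]
    while speed or curv:
        options = []
        if speed:
            options.append(speed[0])
        if curv:
            options.append(curv[0])
        # Hi' = Hi + best speed candidate, Hi'' = Hi + best curvature
        # candidate; keep whichever yields the smaller error.
        v = min(options, key=lambda c: error(stroke, sorted(H + [c])))
        if speed and v == speed[0]:
            speed.pop(0)
        else:
            curv.pop(0)
        H = sorted(set(H) | {v})
        fits.append(list(H))
    return fits
```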
The second phase in approximation is curve handling. Define L1 as the Euclidean distance between a pair of consecutive vertices found in the vertex detection phase, and L2 as the path length of the sketch between these two points. The ratio L2/L1 is significantly higher than 1 in curved regions of the sketch. Curved regions are approximated with Bezier curves: the end points are consecutive vertices u = Si and v = Sj, and the control points are c1 = K*T1 + v and c2 = K*T2 + u, where K = 1/3 * sum of |Sk - Sk+1| for k from i to j (i.e., one third of the path length of the curve segment) and T1, T2 are unit-length tangent vectors pointing inward at the curve segment. The curve is subdivided at the middle if the error of the Bezier approximation is higher than a threshold (not specified), until the desired precision is achieved.
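A sketch of the control-point computation; since the pairing of tangents and endpoints in the summary is ambiguous, I pair each tangent with its own endpoint, and the tangent estimation from neighboring resampled points is my own choice:

```python
import math

def bezier_control_points(pts, i, j):
    """Bezier control points for the curved region between consecutive
    vertices u = pts[i] and v = pts[j]."""
    u, v = pts[i], pts[j]
    # K = one third of the path length of the curve segment
    k = sum(math.dist(pts[m], pts[m + 1]) for m in range(i, j)) / 3.0

    def inward_unit(frm, to):
        dx, dy = to[0] - frm[0], to[1] - frm[1]
        n = math.hypot(dx, dy) or 1.0
        return (dx / n, dy / n)

    t1 = inward_unit(u, pts[i + 1])     # tangent at u, pointing inward
    t2 = inward_unit(v, pts[j - 1])     # tangent at v, pointing inward
    c1 = (u[0] + k * t1[0], u[1] + k * t1[1])
    c2 = (v[0] + k * t2[0], v[1] + k * t2[1])
    return u, c1, c2, v
```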
Discussion
The idea of using speed to find candidate vertices had already been stated in Herot's paper over two decades before this paper was written, and on this point the recognition may rely partly on how users draw; for example, some may draw very carefully at an even speed. The method of choosing vertices by their certainty scores and least squares errors is the great part of this paper, though many threshold values are left unspecified (I guess some of these thresholds should vary according to the sketch being handled).
The accuracy of object recognition is as high as 96%, but this doesn't mean the approximation is equally accurate: the 96% might as well be attributed to the effect of beautification and the primitives in the system. For example, even if a sketch is not properly approximated, the object recognizer may still correctly classify it as a rectangle, circle, or spring, since it is easy to tell these objects apart.
Saturday, September 20, 2008
"Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line Or Its Caricature"
by David H. Douglas and Thomas K. Peucker
Summary
Finding representative points of lines helps reduce the storage space needed and the computational time of other processing, which is proportional to the number of points in a caricature. One way to do this is by training computers with human decisions, but there are too many possible ways to represent a single class of feature. Another way is to approximate the points along a line with mathematical functions (the ends of segments are averages of a fixed number of points along the line); this misses important information in high-curvature parts while keeping too much information in the smoother parts. Some algorithms focus on which points to delete (points closer than a threshold to their neighbors are dropped, points are dropped if the line direction doesn't change much, all but every Nth point is deleted), while others focus on which points to retain.
The method provided by A.E.G., described by T. Lang in 1969, is similar to the algorithm created by the authors, but its results are unsatisfying where sharp angles are numerous. The authors' algorithm concentrates on selecting points.
An intuitive way to decide whether a series of points forms a line is to find the point furthest from the line segment between the end points and see whether its distance to that line exceeds some threshold. The authors propose two methods:
Method 1: 1) assign the start point of the curve as the "anchor" and the last point as the "floater"; 2) find the point with the greatest perpendicular distance to the line segment between the anchor and the floater and make it the new floater, repeating this step until the furthest distance satisfies the threshold; 3) move the anchor to the floater, assign the last point as the floater, and iterate from step 2.
Method 2: the same as Method 1, except that the floaters from each iteration are kept on a stack, and the next iteration only chooses floaters from the stack built in the previous iteration. This saves time by avoiding re-examining points. (A sketch of Method 1 follows below.)
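Method 1 is the now-classic Douglas-Peucker scheme; here is a sketch in its usual recursive form rather than the paper's explicit anchor/floater bookkeeping:

```python
import math

def perp_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg = math.hypot(dx, dy)
    if seg == 0:
        return math.dist(p, a)
    return abs(dx * (a[1] - p[1]) - dy * (a[0] - p[0])) / seg

def simplify(points, tol):
    """Keep the point farthest from the end-to-end segment if it exceeds
    tol, and recurse on both halves; otherwise keep only the end points."""
    if len(points) < 3:
        return list(points)
    i, d = max(((i, perp_dist(points[i], points[0], points[-1]))
                for i in range(1, len(points) - 1)), key=lambda t: t[1])
    if d <= tol:
        return [points[0], points[-1]]
    left = simplify(points[:i + 1], tol)
    right = simplify(points[i:], tol)
    return left[:-1] + right        # avoid duplicating the split point
```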
Lang's algorithm does line tests on points in the order (1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 2, 5)...; the anchor moves to the point before the first point that does not allow all intervening points to satisfy the tolerance. The authors give two modifications to this algorithm. Modification 1 has the anchor move to the furthest point from the segment instead of to the point before the floater. Modification 2 avoids unnecessary repeated distance calculations by testing whether the accumulated total (the sum of Distance(Pn, P0Pn+1)) is greater than the tolerance.
A comparison of AEG (Lang's algorithm), AEG+Mod1, AEG+Mod2, AEG+Mod1+Mod2, Method 1, and Method 2 shows that Method 2 outperforms the other algorithms.
Discussion
This algorithm is like recursively doing line tests on the curve. But in terms of corner finding, it returns many points that merely fail to lie on a line yet are not corners, for example, points on a smooth arc, or points in the middle of the bottom line of "|__|".
Monday, September 15, 2008
"Short Straw: A Simple and Effective Corner Finder for Polylines"
by Aaron Wolin, Tracy Hammond
Summary
Corner finding is the technique of splitting a stroke into primitives, such as lines and arcs. ShortStraw is a simple and effective algorithm for corner finding.
Sezgin et al. use a stroke's curvature and pen speed to determine stroke corners. Kim and Kim use local convexity and local monotonicity, and also use resampling in their algorithm. These algorithms are complex, while ShortStraw is simple and highly accurate.
Implementation of ShortStraw:
Step 1: resample the points of a stroke to be evenly spaced, so that the path distances between adjacent points are equal. The interspacing distance is the bounding box diagonal / 40 (where 40 is a carefully chosen constant).
Step 2: bottom-up corner finding. Corners are points whose straw is a local minimum below a threshold of median straw * 0.95 (a window size of plus/minus 3 is used here).
Step 3: top-down corner checking to remove false positives and add false negatives. (1) Do line tests between adjacent corners found in Step 2; if Dist(a, b) / Path-Dist(a, b) < 0.95, then there are missing corners along the way, so relax the threshold and add the point with the local minimum straw to the corner set, repeating until all stroke segments between corners pass the line test. (2) If a triplet of consecutive corners passes the line test, the middle corner is removed. The returned corners come from the resampled points, or alternatively the nearest original point to each corner. (A sketch of the straw computation follows below.)
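A sketch of the straw computation and the bottom-up pass (Step 3's checks omitted); the run-based local-minimum search is my reading of "local minimum":

```python
import math

def straws(pts, w=3):
    """straw[i] = chord length between the points w before and w after i;
    short straws indicate corners. Endpoints get no straw."""
    return {i: math.dist(pts[i - w], pts[i + w])
            for i in range(w, len(pts) - w)}

def bottom_up_corners(pts, w=3, t=0.95):
    s = straws(pts, w)
    vals = sorted(s.values())
    thresh = vals[len(vals) // 2] * t          # median straw * 0.95
    corners, i = [0], w
    while i < len(pts) - w:
        if s[i] < thresh:
            run_start = i
            while i < len(pts) - w and s[i] < thresh:
                i += 1
            # corner = minimum-straw point within the below-threshold run
            corners.append(min(range(run_start, i), key=lambda j: s[j]))
        else:
            i += 1
    corners.append(len(pts) - 1)
    return corners
```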
Results show that ShortStraw has a much higher all-or-nothing accuracy (the number of correctly segmented strokes divided by the total number of strokes) of 0.74, compared with less than 0.3 for Sezgin's and Kim's algorithms, and the number of false negatives with ShortStraw is 1/10 of theirs. The missed corners at obtuse angles could possibly be fixed by using a varied window or straw length for each point, but this may sacrifice the algorithm's simplicity.
Discussion
This is a quite straightforward algorithm, and the accuracy is high. The top-down checking phase further increases accuracy. I think context information could be included in the future; for example, human viewers will treat a point as a corner even when it looks more like an arc than like line segments meeting at a corner.
Algorithm Video
"Prototype Pruning by Feature Extraction for Handwritten Mathematical Symbol Recognition"
by Stephen M. Watt, Xiaofang Xie
Summary
Mathematical handwriting differs from other forms of handwriting: (1) the set of possible input symbols is very extensive, (2) the spatial relations among symbols can follow complex context-sensitive two-dimensional rules, and (3) input can involve large operators such as matrix brackets, fraction bars, or square roots. This paper presents approaches to recognizing a large set of mathematical symbols.
Pruning prototypes is an intuitive way to break a large vocabulary into several smaller groups. For an unknown symbol, find the small group to which it belongs, then try to match the symbol in the group. During prototype pruning, the mathematical symbols are grouped according to some features. Then comparison during recognition is performed on groups instead of the whole set of symbols.
Data was gathered using a questionnaire, and about 50 people's handwriting was put into the database. Several pre-processing operations were performed, including selectively chopping the heads and tails of strokes, re-sampling, smoothing, and size normalization.
Then some features are introduced:
Geometric Features:
(1) Number of Loops: found by counting minimum-distance pairs. Sweep a line parallel to the line between "begin" and "end" (a pair of points on a stroke within a certain distance threshold) in the direction that shortens the distance between the two neighboring intersections with the stroke, until the points are either locally connected or form a minimum-distance pair.
(2) Number of Intersections: found using a modified sweepline algorithm. On finding an intersection, delete the two line segments associated with it and insert two line segments that begin at the intersection and end at the old endpoints.
(3) Number of Cusps: a cusp is formed by a series of five points p0, p1, p2, p3, p4 such that the angle between p0p1 and p1p2 is > 170 degrees, the angle between p2p3 and p3p4 is also > 170 degrees, but the angle between p1p2 and p2p3 is small (see the sketch after this feature list).
Ink Related Features
(1) Number of Strokes
(2) Point Density: distinguishes symbols of different ink density, such as "o", "b", or "p".
Directional Features
(1) Initial, End and Initial-End Directions: discretize all directions to the nearest of N, NE, E, SE, S, SW, W, NW.
Global Features
(1) Initial and End Position: NE, NW, SE, SW of the bounding box.
(2) Width to Height Ratio: 0 for a slim symbol, 1 for a wide symbol, and 2 for a normal symbol.
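Here is the sketch of the cusp count promised above, reading the angle conditions as interior angles at the joints (180 degrees = straight); the "sharp" threshold for the middle angle is my guess, since the paper only says it is small:

```python
import math

def interior_angle(a, b, c):
    """Angle at b between rays b->a and b->c, in degrees."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n = (math.hypot(*v1) * math.hypot(*v2)) or 1.0
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / n
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def count_cusps(pts, straight=170.0, sharp=120.0):
    """Count five-point windows that are nearly straight on both sides
    but turn sharply in the middle."""
    n = 0
    for i in range(len(pts) - 4):
        p0, p1, p2, p3, p4 = pts[i:i + 5]
        if (interior_angle(p0, p1, p2) > straight and
                interior_angle(p2, p3, p4) > straight and
                interior_angle(p1, p2, p3) < sharp):
            n += 1
    return n
```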
Of these, the number of strokes (exact match), the discretized width-to-height ratio (exact match), the initial position (may differ by one), the initial-to-end direction (may differ by one), and the end direction (may differ by two) were used to prune models. (The models' feature values are pre-computed and stored.)
After pruning, elastic matching was used for the recognition step, calculating the minimum distance between the unknown symbol and the set of surviving models.
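The summary doesn't spell out the elastic-matching formulation; a common realization is a dynamic-programming alignment between point sequences, sketched below under that assumption:

```python
import math

def elastic_distance(a, b):
    """Dynamic-programming alignment cost between two point sequences,
    normalized by the longer length."""
    INF = float("inf")
    D = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(a)][len(b)] / max(len(a), len(b))

def recognize(symbol_pts, pruned_models):
    """pruned_models: list of (label, points) pairs surviving pruning."""
    label, _ = min(pruned_models,
                   key=lambda m: elastic_distance(symbol_pts, m[1]))
    return label
```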
Experimental results show that although using features to pre-classify lowers accuracy a little (it remains quite high), the number of comparisons is reduced to a large extent (approximately 89%), so recognition speed is greatly improved. The authors suggest introducing context and an HMM recognizer to reduce ambiguity.
Discussion
I think the really important part of this paper is the introduction of the features. Although only several were used, the authors discussed many. The geometric features seem quite useful for feature-based recognition. For the other features, discrete values are used to ignore small errors, and this fuzzy logic actually improves recognition accuracy. The suggestion about using context is a good idea in mathematical handwriting recognition and maybe some other fields.
Sunday, September 14, 2008
"User Sketches: A Quick, Inexpensive, and Effective way to Elicit More Reflective User Feedback"
by M. Tohidi, W. Buxton, R. Baecker, A. Sellen
Summary
Usability testing (UT) is well known and commonly used to involve users in various stages of the design process. But in the current UT process, participants generate significantly more reactive comments and criticisms than reflective design suggestions or redesign proposals. So user sketching is proposed as the right means for generating reflective feedback, without adding significantly to the cost and without eliminating the active involvement of users in the process.
The iterative process between sketch and knowledge/mind allows for an increasing improvement of the sketch/design, and provides users with a richer medium for communicating their feedback to designers. The work took place in two stages.
In their first study, they examined the differences between a usability test that exposed users to a single design and one where they tried three different alternatives. The product was a touch-sensitive House Climate Control System. Three distinctive prototypes were developed, each reflecting a different design language: Circular, Tabular, and Linear. All three were consistent in terms of fidelity, functionality, and quality. The think-aloud protocol and interviews gave much less reflective feedback than reactive feedback. Participants lacked confidence in their own redesign suggestions, and the process of thinking and verbalizing one's thoughts at the same time is very distracting.
Then, in the second stage of the study, participants were asked to redesign the system. The results suggest that the less the original design is reflected in the user sketches, the less successful it is, and the sketches from the multiple-design condition were richer in variation from any of the original designs than those from the three single conditions. Being able to recognize patterns at a glance, or to answer unanticipated questions, are unique benefits of sketches over numeric, textual, audio, and video data. Sketching is a cheap and efficient way to let participants refine already generated ideas and discover new ones.
Then three examples are given to illustrate a number of ways in which sketching helped the participants to better organize their thoughts, come up with new and unexpected design ideas, reflect on and refine previously stated ideas, and communicate their ideas to the experimenters.
Discussion
This paper illustrates the idea of applying sketching in UT. From the sketch recognition point of view, I find it quite inspiring. Paper-based sketching could well be replaced by sketch recognition in the future.
The sentence "It is when sketches are seen in a larger group that patterns emerge and relationships are discovered." in the Discussion suggests that we can use sketch recognition to abstract patterns from multiple sketches drawn by users. This would be a major advantage of sketch recognition over paper-based sketching in the UT process.
Another sentence, in the Future Work part, "one of our longer term objectives is to explore such techniques further in ideation, especially when comparing multiple design alternatives.", gives us a picture of how sketch recognition can facilitate design. For example, the elements in multiple design alternatives could be abstracted and recombined in other forms to generate new designs, or users could give reflective feedback by just modifying the original design with sketch recognition instead of drawing their own from scratch...
some videos:
UML design http://www.youtube.com/watch?v=v3-ozq-ZbHE
LADDER GUI http://www.youtube.com/watch?v=RjwdpB0Y924
GUI design http://www.youtube.com/watch?v=6gVrw1Wgdfg
Monday, September 8, 2008
"Graphical Input Through Machine Recognition Of Sketches"
by Christopher F. Herot
Summary
Traditional systems for computer-aided design are more akin to computer-aided evaluation or manipulation. This paper discusses approaches to let machines make inferences about the meaning of sketches as well as the user's attitude toward his design.
The HUNCH system is introduced. It is composed of a set of programs that interpret sketches at different levels. HUNCH was conceived around STRAIT, which finds corners in the sketch based on the assumption that drawing speed decreases at corners. Curves are treated as special corners and are recognized by CURVIT. The first version of STRAIT used a latching algorithm that combined endpoints falling close together, which produced some bizarre results; this led to a version of STRAIT without latching, called STRAIN. An experiment computing the latching radius as a function of speed also showed unsatisfactory results, because it assumed that a user's certainty about an endpoint's position correlates with the speed of drawing the line, which doesn't reflect the conditions at the endpoints. Comparing candidates for latching with GUESS in 3D space might solve the problem, but paradoxically GUESS relies on the latched data as input. This may require that latching make initial decisions and modify them in later stages if they prove untenable. Overtracing handling reduces the quantity of data by turning several overtraced lines into one. 3D projection from a 2D network of lines, and room finding from a floorplan, show some success in machine recognition.
A truly successful system would have to make use of context at the lowest level. A general case description is matched against the context-free structure in a top-down fashion to generate a composite structure in which all the syntactic entities of the sketch are assigned a meaning, but the matching process is complicated.
A more interactive system that combines a bottom-up data flow with a top-down flow of context information from the user as well as from high-level programs is promising. The new system consists of a database and a set of programs to manipulate it. A set of functions such as speed and bentness is used to find lines and curves. By varying the number of points in an interval, the machine's interpretation can be adjusted to fit the user's intention most closely. Inference-making procedures run in the background, allowing interpretation on the fly. Latching is decided by certainty factors and should be controlled by a user profile. Also, the latching radius should adjust according to several factors.
Discussion
This paper shows an "intelligent" system: it really tries to recognize what the user has drawn, although the speed factor depends purely on the drawing habits of individual users. The idea of involving context information (as used by a human observer) in the system is great; it facilitates decision making by assigning meaning to the things being sketched. However, human intervention - context, intentions, etc. - still plays a part in interpretation ("For example, if the program were told that the user has drawn a square, it could vary the size of the interval until the two functions produced exactly five peaks.").
Wednesday, September 3, 2008
"Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes"
by Wobbrock, Wilson, Li
Comment
1. Manoj's blog
Summary
Gesture recognition has been the privilege of experts in pattern recognition and AI, not HCI experts. The $1 recognizer makes it easier to incorporate gestures into UIs. It supports configurable rotation, scale, and position invariance, and requires no feature selection or training examples.
Although toolkits like Artkit, Amulet, SATIN and libraries like Siger are powerful, they cannot help in most new prototyping environments where they are not available.
The algorithm has four steps:
Step 1: Resample the point path into n (n = 64 is adequate) evenly spaced points, to compensate for variation in hardware and software sampling rates.
Step 2: Rotate the points so that their indicative angle (the angle from the centroid of the gesture (x̄, ȳ) to the gesture's first point) is at 0°.
Step 3: Scale the points so that the resulting bounding box is of size x size dimensions, and translate so that the centroid of the gesture is at (0, 0). For gestures serving as templates, Steps 1-3 are carried out once on the raw input points. For candidates, Steps 1-4 are applied just after the candidate is articulated.
Step 4: Recognition and scoring. Compare the candidate with the templates and calculate the average distance between corresponding points (the path-distance). The template with the least path-distance to the candidate is the recognition result. The score can be calculated as 1 - (path-distance between template and candidate / largest possible path-distance). Note that Step 2 only finds an approximate best angular alignment between template and candidate, so the candidate may need further rotation (only rotating in the direction that decreases the path-distance, using Golden Section Search (GSS) bounded by ±45° with a 2° threshold) to find the optimal angle and thus a global minimum path-distance. A sketch of Steps 2 and 4 follows below.
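Here is a sketch of Steps 2 and 4 (resampling and scaling omitted), assuming the candidate and templates have already been resampled to the same n points:

```python
import math

def centroid(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def rotate(pts, theta):
    """Rotate all points by theta around the centroid."""
    cx, cy = centroid(pts)
    c, s = math.cos(theta), math.sin(theta)
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for (x, y) in pts]

def rotate_to_zero(pts):
    """Step 2: rotate so the indicative angle (centroid -> first point) is 0."""
    cx, cy = centroid(pts)
    return rotate(pts, -math.atan2(pts[0][1] - cy, pts[0][0] - cx))

def path_distance(a, b):
    """Average pointwise distance between two equal-length point paths."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

PHI = (math.sqrt(5) - 1) / 2    # golden ratio for GSS

def distance_at_best_angle(cand, tmpl, lo=-math.radians(45),
                           hi=math.radians(45), tol=math.radians(2)):
    """Step 4's refinement: Golden Section Search over rotations of the
    candidate for the angle minimizing path-distance to the template."""
    x1 = PHI * lo + (1 - PHI) * hi
    x2 = (1 - PHI) * lo + PHI * hi
    f1 = path_distance(rotate(cand, x1), tmpl)
    f2 = path_distance(rotate(cand, x2), tmpl)
    while abs(hi - lo) > tol:
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = PHI * lo + (1 - PHI) * hi
            f1 = path_distance(rotate(cand, x1), tmpl)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * lo + PHI * hi
            f2 = path_distance(rotate(cand, x2), tmpl)
    return min(f1, f2)
```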
Experiments show that $1 performs very well compared with DTW and Rubine's algorithm.
Discussion
This algorithm is quite useful to novices and non-specialists; at least the steps really make sense to me. It is a geometry-based algorithm, and it is good in terms of computational speed and needing no training examples. One thing I don't quite understand is GSS.
"MARQS: Retrieving Sketches Using Domain- and Style-independent Features Learned From a Single Example Using a Dual-classifier"
by Brandon Paulson, Tracy Hammond
Comment
1. Andrew's blog
Summary
In this paper, the authors implement a search-by-sketch system called MARQS. It 1) recognizes a wide range of shapes that are not necessarily geometric nor drawn the same way each time, 2) recognizes a query from only one initial training example, and 3) learns from successful queries to improve accuracy over time. MARQS uses two classifiers: 1) a Single Classifier, usable immediately after a single example, and 2) a Linear Classifier, which improves accuracy over time as more examples become available from queries.
Previous work in feature-based recognizers includes Rubine, Long, etc. (single-stroke) and HMMs (Hidden Markov Models), which are however style-dependent. Kara & Stahovich, similar to this paper, use a combination of four different recognizers that recognize shapes after only a single example; however, unlike the approach in this paper, theirs is not an interactive learning system and does not learn from future sketches. Other single-stroke recognizers use geometric properties instead. Sezgin's system uses the pipeline approximation -> beautification -> basic recognition. Kato's system creates a black & white abstraction of a full-color image, divides it into 8x8 blocks, and determines the local correlation between blocks, which is then summed into a global correlation to determine similarity.
The recognition algorithm in this paper uses a feature set based on global features of the sketch, allowing sketches to be drawn at any scale and orientation, in any order, and with multiple strokes. The first step is to find the major axis (the line connecting the two points in a sketch that are furthest apart) and rotate the sketch so that this axis lies horizontally. Then two classifiers perform the search (see the sketch below). The first is the Single Classifier: the errors between the features of the query and the features of each sketch in the database are calculated, normalized, and summed into a total error, and results are ranked in increasing order. The second is the Linear Classifier, implemented much like the one in Rubine's paper; it is used once there are at least 2 examples for every album in the database. Four global features are used: 1) bounding box aspect ratio, 2) pixel density, 3) average curvature, and 4) number of perceived corners across all strokes of the sketch.
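A sketch of the major-axis alignment and the Single Classifier ranking; the per-feature normalization by the query's own values is an assumption, not necessarily the paper's scheme:

```python
import math
from itertools import combinations

def align_major_axis(pts):
    """Rotate the sketch so its major axis (the segment between the two
    farthest-apart points) lies horizontally. O(n^2), fine for
    sketch-sized point sets."""
    a, b = max(combinations(pts, 2), key=lambda pq: math.dist(*pq))
    theta = math.atan2(b[1] - a[1], b[0] - a[0])
    c, s = math.cos(-theta), math.sin(-theta)
    return [(x * c - y * s, x * s + y * c) for (x, y) in pts]

def single_classifier_rank(query_feats, database):
    """Rank database sketches by normalized, summed per-feature error.
    database maps sketch id -> feature vector."""
    def total_error(feats):
        return sum(abs(f - q) / (abs(q) or 1.0)
                   for f, q in zip(feats, query_feats))
    return sorted(database, key=lambda sid: total_error(database[sid]))
```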
In the experiment on MARQS, 1350 queries were performed; 27% used the Single Classifier with an average search rank of 1.7, and 73% used the Linear Classifier with an average search rank of 1.44. The overall average search rank was 1.51; 70% of queries ranked the correct result first and 98% ranked it in the top 4.
The downsides (slowdown during query time, and reduction in accuracy over time with the Single Classifier) are then discussed, and improvements (generalizing a symbol's feature values across all examples of that symbol for comparison, adding more features, etc.) are suggested.
Discussion
I think the highlight of the system is the use of two classifiers, which enables a "cold startup". But since the Linear Classifier uses Rubine's approach, I wonder if there is a number of query examples beyond which accuracy improves little or is even hurt (in Rubine's work, that number was 15).
Another thing: I don't think "pixel density" is a good feature to choose, because for the same picture, a larger scale yields a smaller pixel density. Thus using this feature imposes a constraint on the scales of the sketches.
Tuesday, September 2, 2008
"Visual Similatiry Of Pen Gestures"
by Long, Landay, Rowe and Michiels
Summary
Gestures are versatile, powerful, efficient, and convenient, but users often find gestures difficult to remember, and systems often misrecognize them. The authors are developing a tool to assist pen-based UI designers in creating and evaluating gestures for pen-based UIs by advising them on how to improve their gesture set. The current work is an investigation into gesture similarity.
Related work is then surveyed. The Apple Newton MessagePad popularized pen input; it used single-stroke, iconic gestures to recognize text. The 3Com PalmPilot recognizes special command strokes. Other example domains include music, drawing, air-traffic control, searching, etc. On perceptual similarity: the logarithm of quantitative metrics was found to correlate with similarity; however, similarity metrics vary between people and between different stimuli. Multi-Dimensional Scaling (MDS) reduces the dimensionality of a data set so that its patterns can be expressed in a 2- or 3-dimensional plot. Several issues arise in using MDS: (1) how to use data from multiple participants (INDSCAL takes a proximity matrix per participant as input and accounts for individual differences); (2) how many dimensions to use in the analysis (no more than 1/4 of the number of stimuli); (3) how to measure distance (Euclidean distance); (4) how to assign meaning to the axes (a. inspecting plots of the stimuli, b. correlating axes with measurable quantities).
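A minimal sketch of the MDS step, using scikit-learn's generic MDS rather than INDSCAL (which additionally handles per-participant matrices); the dissimilarity numbers are made up for illustration.

```python
import numpy as np
from sklearn.manifold import MDS

# toy symmetric dissimilarity matrix for 4 gestures (invented values)
D = np.array([[0, 1, 4, 3],
              [1, 0, 3, 4],
              [4, 3, 0, 1],
              [3, 4, 1, 0]], dtype=float)

mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
coords = mds.fit_transform(D)   # one 2-D point per gesture, ready to plot
```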
A gesture similarity experiment is then described; two data sets and different subjects were used across two trials:
Trial 1: the gesture set was designed by Long to span a wide range of possible gesture types and to vary in orientation. 21 participants were asked to select the gesture that seemed most different among 3 gestures picked from the set; every combination was shown to each participant exactly once. The goals of the analysis were: (1) to determine measurable geometric properties that influence the perceived similarity of gestures (using plots that showed inter-gesture dissimilarities as Euclidean distances), and (2) to produce a model that predicts the perceived similarity of given gestures (by running a regression analysis that produced weights indicating how much each feature contributed to similarity). A model of gesture similarity that correlated 0.74 with the reported gesture similarities was derived.
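A hedged sketch of the modeling step as summarized above: regress reported dissimilarities on absolute feature differences to obtain per-feature weights. The feature layout is a placeholder, not the paper's exact feature set or regression setup.

```python
import numpy as np

def fit_similarity_model(feat_a, feat_b, reported_dissim):
    """feat_a, feat_b: (n_pairs, n_features) arrays; returns weights."""
    X = np.abs(feat_a - feat_b)                    # per-pair feature deltas
    X = np.hstack([X, np.ones((len(X), 1))])       # intercept term
    w, *_ = np.linalg.lstsq(X, reported_dissim, rcond=None)
    return w  # larger weight => feature contributes more to dissimilarity
```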
Trial 2: three new gesture sets were designed, (1) to explore the effect of total absolute angle and aspect, (2) to explore length and area, and (3) to explore rotation-related features, in order to test the predictive power of the model on new people and new gestures. Participants did the same task as in trial 1. Again, the aim was to determine which features drive similarity judgments and to derive a model for predicting similarity. The similarities predicted by the trial-1 model were also compared with those reported by participants in trial 2; the result was not as satisfactory as in trial 1. The newly derived model predicts the reported gesture similarities with correlation 0.71.
The correlation between the trial-1 model's predictions and the trial-2 data was 0.56, and 0.51 the other way around.
The results show that human perception of gesture similarity is complicated; creating multiple models with more data might help.
Discussion
The weak predictive power of the models derived from the trials may be due to the design of the gesture sets and the features chosen; both were determined intuitively by the authors and may not fit a perfect model of gesture similarity. Another possibility is that no perfect model exists that can accurately predict the perceived similarity of arbitrary gestures for arbitrary people. After all, human perception of similarity is complicated.
Monday, September 1, 2008
"Specifying Gestures by Example"
by Dean Rubine
Summary
The author first lists some examples of existing gesture-based applications; in these applications, however, the gesture recognizer is hand-coded, which is a complicated job. GRANDMA (Gesture Recognizers Automated in a Novel Direct Manipulation Architecture) enables an implementor to create gestural interfaces with simple click-and-drag interaction. GDP is an example application built using GRANDMA.
GDP is then shown as an example gesture-based application. It recognizes gestures that create rectangles, ellipses, lines, and packs, and that perform copy, rotate-scale, and delete. The end of a gesture is detected either by the release of the mouse button or by the mouse ceasing to move for a given amount of time (the latter is used in GDP). Gesturing and direct manipulation are combined in a two-phase interaction technique: (1) gesture collection, (2) gesture classification & manipulation. Due to a limitation of GRANDMA, gestures in GDP are all single strokes, which avoids the segmentation problem and allows shorter timeouts.
GRANDMA is an MVC system that associates input event handlers with view classes. In GDP, GdpTopView is the window in which GDP runs, and GraphicObjectView is associated with the line, rectangle, ellipse, and text gesture handlers. Gestures are added through the following steps: (1) create a new gesture handler ("new class" button) and associate it with GraphicObjectView; (2) enter gesture training examples (15 examples per gesture is adequate); (3) edit the semantics of each gesture in the handler's set -- "recog" is evaluated when the gesture is recognized, "manip" is evaluated on subsequent mouse points, and "done" is evaluated when the mouse button is released. Note that attributes of the gesture may be used in the gesture semantics (e.g., the start and end points of a line).
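A hypothetical sketch of the recog/manip/done callback structure described above; the names mirror the summary, but GRANDMA itself is not a Python system, so this only illustrates the control flow.

```python
class LineGestureHandler:
    def recog(self, gesture):
        # evaluated once, when the gesture is classified: create the line
        # using gesture attributes such as the start and end points
        self.line = (gesture.start, gesture.end)

    def manip(self, point):
        # evaluated on each subsequent mouse point: drag the free endpoint
        self.line = (self.line[0], point)

    def done(self):
        # evaluated when the mouse button is released: commit the line
        print("line committed:", self.line)
```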
Next, the author discusses how gestures are classified. Preprocessing eliminates jiggle: an input point within 3 pixels of the previous input point is discarded. Geometric and algebraic features are carefully chosen to be: (1) incrementally computable in constant time per input point, (2) meaningful, so they can be used in gesture semantics as well as for recognition, and (3) numerous enough to differentiate all gestures that might reasonably be expected (in GDP, 13 features are calculated). Classification evaluates a linear function V_c over the features of the input gesture for each class c; the gesture is assigned to the class c that maximizes V_c. The weights in each class's linear evaluation function are determined from that class's training examples. Afterwards, the probability that the gesture is classified correctly, and its deviation (Mahalanobis distance) from the mean of its chosen class, are computed to reject ambiguous gestures and outliers; this rejection should be disabled if the application provides "undo". Overall, the classification method achieves a high correct-classification rate and fast computation given around 15 training examples per gesture.
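A condensed sketch of this classification step; the training of the weights from the common covariance matrix is omitted here, so assume w[c] and b[c] were already estimated from the ~15 examples per class.

```python
import numpy as np

def classify(features, w, b):
    """Pick the class c maximizing V_c = b_c + w_c . f."""
    scores = b + w @ features           # w: (n_classes, n_features), b: (n_classes,)
    return int(np.argmax(scores)), scores

def reject_outlier(features, mean_c, inv_cov, threshold):
    """Mahalanobis distance from the chosen class mean; reject if too far."""
    d = features - mean_c
    return d @ inv_cov @ d > threshold  # True => treat gesture as an outlier
```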
Finally, some extensions are introduced: versions of GDP use eager recognition (classifying a gesture as soon as enough of it has been seen to do so unambiguously) and multi-finger gesture recognition (combining the classification of individual paths via a decision tree and a set of global features).
Discussion
GRANDMA definitely simplifies the work of building gesture-based applications. The MVC architecture enables a straightforward mapping between gesture handlers and views, though I don't see the use of "done" in the gesture semantics, since gestures end when the user stops moving the mouse for a while.
Training by example is a great idea; however, the author doesn't give much information on how the set of training examples should be composed. For example, if all the input examples are gathered from 5-year-old children, the features of a class may be biased and unable to classify input from typical users effectively.
I still need time to work out the meaning of the covariance matrix. Meanwhile, I have another idea for gesture classification, which I haven't tested: compute the mean feature values of the training examples for a given class (possibly normalized by, say, dividing by their standard deviations). A vector of N mean feature values (assuming N features) can be treated as a point in N-dimensional feature space, so M classes become M points in that space, one per class. A given input gesture is likewise a point in the same space based on its feature values; the distances Di (1 <= i <= M) from the gesture's point to the M class points can then be computed, and the gesture is assigned to the class i that minimizes Di.
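A quick sketch of this nearest-mean idea (untested, as noted above; it is essentially a nearest-centroid classifier with z-score normalization):

```python
import numpy as np

def train_centroids(examples_by_class):
    """examples_by_class: {label: (n_examples, n_features) array}."""
    all_ex = np.vstack(list(examples_by_class.values()))
    mu, sigma = all_ex.mean(axis=0), all_ex.std(axis=0) + 1e-9
    centroids = {c: ((ex - mu) / sigma).mean(axis=0)
                 for c, ex in examples_by_class.items()}
    return centroids, mu, sigma

def classify(features, centroids, mu, sigma):
    f = (features - mu) / sigma   # same normalization as training
    return min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))
```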