"Multimodal Collaborative Handwriting Training for Visually-Impaired People"
by Beryl Plimmer, Andrew Crossan, Stephen A. Brewster, Rachel Blagojevic
Summary
This paper presents McSig, a multimodal collaborative handwriting training system for visually-impaired people.
The teacher can teach visually-impaired students how to write in three ways: 1) verbal communication, to give guidance and explain concepts; 2) Playback mode - the teacher draws on a Tablet PC and her movement is echoed to the PHANTOM haptic device on the student's side; and 3) Stencil mode - students write shapes themselves while constraining forces guide their movements. Multi-stroke characters and multiple repetitions of single-stroke characters are distinguished using a timeout: strokes drawn within a 1-second interval belong to the same character.
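A minimal sketch of this timeout-based grouping (the Stroke representation and timestamps below are assumptions for illustration, not taken from the paper):

from dataclasses import dataclass, field

CHAR_TIMEOUT = 1.0  # seconds between strokes of the same character, per the paper

@dataclass
class Stroke:
    start_time: float                      # pen-down timestamp in seconds
    end_time: float                        # pen-up timestamp in seconds
    points: list = field(default_factory=list)

def group_strokes_into_characters(strokes):
    """Group a time-ordered list of strokes into characters using the 1 s timeout."""
    characters, current = [], []
    for stroke in strokes:
        if current and stroke.start_time - current[-1].end_time > CHAR_TIMEOUT:
            characters.append(current)     # gap too long: start a new character
            current = []
        current.append(stroke)
    if current:
        characters.append(current)
    return characters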
The system is also audio-aided: the pitch of a sinusoidal tone is mapped to vertical movement and audio pan to horizontal movement. A high pitch indicates that the cursor is near the top of the drawing area, while a low pitch indicates that it is near the bottom. Distinct sounds are played at the start and end of the teacher's trajectory.
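A rough sketch of such a position-to-sound mapping; the frequency range and drawing-area coordinates here are assumed values for illustration, not parameters from the paper:

def position_to_audio(x, y, width, height, min_freq=200.0, max_freq=1000.0):
    """Map a cursor position to (frequency in Hz, stereo pan in [-1, 1]).
    y = 0 is the top of the drawing area, so a small y gives a high pitch."""
    freq = max_freq - (y / height) * (max_freq - min_freq)  # vertical -> pitch
    pan = 2.0 * (x / width) - 1.0                           # horizontal -> pan
    return freq, pan

# Example: the centre of an 800x600 drawing area maps to a mid pitch and centre pan.
print(position_to_audio(400, 300, 800, 600))  # -> (600.0, 0.0)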
A PID controller is used to minimise the error between the system's current value and a target value, so the user is dragged through a close approximation of the trajectory in a smooth and stable manner.
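As a reminder of how a basic PID controller works, here is a minimal discrete-time sketch; the gains and time step are arbitrary assumptions rather than the parameters used in McSig:

class PID:
    """Discrete PID controller: output = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target, current, dt):
        error = target - current
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: compute the force pulling the haptic cursor toward a target position
# along one axis; the same controller would run per axis at the haptic update rate.
pid = PID(kp=2.0, ki=0.1, kd=0.05)
force = pid.update(target=1.0, current=0.2, dt=0.001)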
The user study consists of a pre-test phase, a training phase, and a post-test phase. Both the partially-sighted group and the totally blind group showed progress after using the system. The virtual stencil may not have worked because visually-impaired people use two hands to write: one holds the pen while the other provides spatial orientation on the paper. The system could also be extended to surgery training.
Discussion
The idea of using a multimodal approach to support learning is good, but the sound may be annoying and may distract the user.
"Fluid Sketches: Continuous Recognition and Morphing of Simple HandDrawn Shapes"
by James Arvo, Kevin Novins
Summary
This paper presents an approach to tightly couple gesture recognition with rapid morphing to provide a new and subjectively different form of feedback. The feedback is provided as each pen stroke is drawn, so that immediate adjustments can be made if the result is not what was intended.
A family of ordinary differential equations (ODEs) is used to determine how a user-drawn shape changes with time as a result of continuous recognition and morphing; this parametric ODE is referred to as the fluid sketching equation. Each sampled point p_s is continuously moved toward a corresponding ideal position on the curve the user is attempting to draw, and q_s(t) is the new position of p_s at time t. It is this feedback of q influencing its own subsequent time evolution that leads to modeling q with a differential equation.
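A hedged numerical sketch of this idea, using forward-Euler integration of a simple "move toward the nearest point on the ideal shape, scaled by viscosity" rule; this rule and the circle used as the ideal shape are illustrative assumptions, not the paper's exact fluid sketching equation:

import numpy as np

def nearest_point_on_circle(q, center, radius):
    """Closest point on an ideal circle to q (the circle stands in for any ideal shape S)."""
    d = q - center
    return center + radius * d / np.linalg.norm(d)

def morph_step(points, center, radius, viscosity, dt):
    """One forward-Euler step of dq/dt = (target(q) - q) / viscosity.
    Small viscosity snaps points onto the ideal shape almost instantly;
    large viscosity leaves the user-drawn stroke essentially unmorphed."""
    return np.array([q + (nearest_point_on_circle(q, center, radius) - q) * dt / viscosity
                     for q in points])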
The recognition stage attempts to classify the user-drawn stroke as a fragment of a known shape (circle, box, line segment). The best fit of the stroke to each shape is calculated; the metric is the sum of the squared distances from each sampled point on the curve to the closest-matching shape. The least-squares fit for a circle is found by solving a linear equation, while for a box, relaxation is used: each point exerts a "spring" force on the nearest face or vertex, and gradient descent finds a configuration in which the total net force is nearly zero (the most balanced state).
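A standard algebraic least-squares circle fit shows how the circle case reduces to a single linear system; this is a common technique and may not be the exact formulation used in the paper:

import numpy as np

def fit_circle(points):
    """Least-squares circle fit: solve 2*a*x + 2*b*y + c = x^2 + y^2
    for the centre (a, b) and c = r^2 - a^2 - b^2."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones(len(pts))])
    rhs = x ** 2 + y ** 2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return (a, b), np.sqrt(c + a ** 2 + b ** 2)

# Example: samples on a unit circle centred at (1, 2) recover that circle.
theta = np.linspace(0, 2 * np.pi, 50)
samples = np.column_stack([1 + np.cos(theta), 2 + np.sin(theta)])
print(fit_circle(samples))  # approximately ((1.0, 2.0), 1.0)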
The matching is based on the entire stroke, but the recent portion is weighted more heavily. The simplest strategy for the morphing function f is to move each point in the direction of the closest point on the ideal shape. The viscosity v describes how fast points on the stroke move toward the ideal shape S: v = 0 means instantaneous snapping to the ideal shape, while v = infinity means the user-drawn stroke is retained unmorphed.
In situations where accuracy of placement is less important, the majority of users in this study preferred fluid sketching.
Discussion
The approach introduced in the paper lets the user see the cleaned-up shape before the stroke is even finished. However, this may distract the user's attention, and distorting the stroke at an early stage may lead to misrecognition.
"Sketch Recognition User Interfaces: Guidelines for Design and Development"
by Christine Alvarado
Summary
This paper addresses both HCI and sketch recognition by introducing a recognition-based diagram creation tool that robustly recognizes naturally drawn diagrams and automatically imports them into PowerPoint. Recognition is done by a multi-domain sketch engine called SketchREAD. Users create diagrams consisting of shapes (ellipses and quadrilaterals) and connectors (lines and arrows) on Tablet PCs using this tool.
To enable seamless interaction, the diagram creation tool and PowerPoint are automatically synchronized to contain the same information. The tool also provides editing capabilities, including move and delete, through pen-based editing commands. Users switch between edit mode and sketch mode by pressing buttons on the window. An online edit mode allows users to sketch while editing: the user holds down the pen in sketch mode to enter online edit mode and selects items; when she lifts the pen she can draw new items, but the selected items remain highlighted, indicating that she can still move or delete them using the same gestures as in edit mode. Both recognized (cleaned) strokes and unrecognized (original) strokes appear on the slides.
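A rough sketch of the mode switching described above, written as a small state machine driven by pen events; the event names and structure are assumptions for illustration, not the tool's actual implementation:

# Modes of the diagram tool: sketching, editing (entered via window buttons),
# and the online edit mode entered by holding the pen down while sketching.
SKETCH, EDIT, ONLINE_EDIT = "sketch", "edit", "online_edit"

class ModeController:
    def __init__(self):
        self.mode = SKETCH
        self.selection = []            # items currently highlighted for move/delete

    def press_mode_button(self, mode):
        # Explicit buttons on the window switch between sketch and edit mode.
        self.mode = mode
        self.selection = []

    def pen_held(self):
        # Holding the pen down in sketch mode enters online edit mode.
        if self.mode == SKETCH:
            self.mode = ONLINE_EDIT

    def select(self, items):
        if self.mode in (EDIT, ONLINE_EDIT):
            self.selection = list(items)

    def pen_lifted(self):
        # After lifting the pen the user can draw new items again, but the
        # selection stays highlighted so edit gestures still apply to it.
        if self.mode == ONLINE_EDIT:
            self.mode = SKETCH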
Design guidelines: (1) Display recognition results only when the user is done sketching; completion is indicated by a change of window focus or explicitly by the user pressing a "show" button. (2) Provide obvious indications to distinguish free sketching from recognition. (3) Restrict recognition to a single domain until automatic domain detection becomes feasible. (4) Incorporate pen-based editing as much as possible (copy, paste, alignment, resizing). (5) Sketching and editing should use distinct pen motions. (6) SkRUIs require large buttons.
Discussion
Rather than detailing the recognition techniques, the paper mainly discusses the UI design and evaluation of the sketch recognition tool. The design guidelines are useful.
The online edit mode may still be confusing to the user; it would be better to provide a more straightforward way to switch between editing and sketching, like the trigger-action mechanism used in LADDER.
"Magic Paper: Sketch-Understanding Research"
by Randall Davis
Summary
Two things motivate the desire to enable sketching. First, using a pen is more natural than a keyboard and mouse, and free-form sketching is more intuitive than the shapes CAD programs generate. Second, CAD programs force sketchers to make commitments that they may not want to make at the early conceptual design stage.
The difficulty of sketch understanding is proportional to the degree of freedom allowed to the user. The difficulties include the following. First, the task is incremental, which lets the system provide continuous feedback about its understanding of the sketch so the user can make corrections when a misunderstanding arises. In addition, as in any signal interpretation problem, there is noise. Next, the drawing conventions in many domains permit variations, and individual styles also vary, across users and even within a sketch. Another issue is the difficulty of segmentation. There are also issues involving overtraced or filled-in shapes. Finally, and perhaps most interesting, the signal is both 2D and nonchronological - nonchronological in the sense that we don't require each object to be finished before the next is started.
Two basic assumptions ground most work in sketch understanding. First, the work is done in domains where there's a reasonably well-established graphical lexicon and grammar. But this is clearly not universal. Second, much like work in speech understanding, sketch-understanding systems are built for a specific domain.
Finding primitives exploits the data's temporal character; by combining curvature and speed, corner finding can be made more precise. For recognizing shapes, three correspondingly different approaches are discussed: (1) how the shape is defined - using LADDER; (2) how the shape is drawn - Dynamic Bayesian Networks (DBNs); and (3) what the shape looks like - vision approaches (bull's eye). Sketch understanding can be connected to various back-end programs (a physics simulator, Rational Rose).
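A minimal sketch of corner finding that combines low pen speed with a sharp change in direction; the thresholds and the simple finite-difference estimates are assumptions for illustration, not the segmenter described in the article:

import numpy as np

def find_corners(points, times, speed_thresh=0.2, turn_thresh=0.8):
    """Return indices of likely corners: points where the pen slows down
    and the direction of travel changes sharply at the same time."""
    pts = np.asarray(points, dtype=float)
    t = np.asarray(times, dtype=float)
    seg = np.diff(pts, axis=0)                              # successive segments
    speed = np.linalg.norm(seg, axis=1) / np.maximum(np.diff(t), 1e-9)
    angles = np.unwrap(np.arctan2(seg[:, 1], seg[:, 0]))
    turn = np.abs(np.diff(angles))                          # turning angle at each interior point
    corners = []
    for i in range(1, len(pts) - 1):
        slow = min(speed[i - 1], speed[i]) < speed_thresh
        sharp = turn[i - 1] > turn_thresh
        if slow and sharp:
            corners.append(i)
    return corners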
The author also discusses creating a sketch understander for a new domain, the issue of underconstrained and overconstrained definitions, and the description-refinement process using near-miss examples.
Discussion
This paper is an overview of sketch recognition, including the difficulties, common approaches, issues, applications, etc.
"Interactive Learning of Structural Shape Descriptions from Automatically Generated Nearmiss Examples"
by Tracy Hammond, Randall Davis
Summary
This paper describes a visual debugger of shape descriptions using active learning that automatically generates its own suspected near-miss examples.
A near-miss shape is a shape that differs from the initial hand-drawn positive example in only one aspect. Debugging is done by classifying near-miss shapes as positive or negative to help correct overconstrained (constraints too tight) or underconstrained (constraints too loose) descriptions.
The approach needs a positive hand-drawn example and a user-typed description that should correctly recognize that shape; a recognizer is then generated from the description. If the example, which is known to be correct, is not recognized, the description must be overconstrained. It is corrected by first finding the match between the typed description and the drawn shape that has the fewest failed constraints. The system shows those failed constraints in red and asks the developer whether they can be removed to correct the description. If the description was automatically generated, the list of constraints can be quite long, so the authors apply a set of heuristics to prune it.
The initial hand-drawn example and the initial description may both be overconstrained even though the example is recognized with that description. The authors therefore maintain a constraint candidate list: each time a positive example shape is encountered, any constraint not true of the new example is removed from the list. They also build a list of constraints known to be part of the correct description by examining negative examples: each time a negative example shape is encountered, they construct the list of constraints that are in the candidate list but are not true of that negative example; at least one constraint in this list must be part of the correct description, because it is what correctly classifies the example as negative. For scaling, the developer is asked to indicate the status of each example individually, while for rotation he or she may only say whether or not all of the examples are positive. A constraint is tested by creating a description in which the constraint is replaced by its negation.
An underconstrained description is checked by making a list of possible missing constraints. As the list could be very long, the authors use a filtering process. Whether a constraint is missing from the description is tested by adding its negation to the description: if the shape in which the constraint is met is a positive example, and the shape in which it is not met is a negative example, then the constraint is necessary in the shape's concept.
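A minimal sketch of the candidate-list filtering and negation-based test described above; constraints are modeled here simply as named predicates over a shape, which is an assumption for illustration rather than LADDER's actual constraint language:

# A constraint is a (name, predicate) pair; the predicate takes a "shape"
# (any object the caller defines) and returns True if the constraint holds.

def prune_candidates(candidates, positive_example):
    """Remove every candidate constraint that is not true of a positive example."""
    return [(name, pred) for name, pred in candidates if pred(positive_example)]

def constraints_explaining_negative(candidates, negative_example):
    """Constraints in the candidate list that are violated by a negative example.
    At least one of these belongs to the correct description, since it is what
    classifies this near-miss as negative."""
    return [(name, pred) for name, pred in candidates if not pred(negative_example)]

def constraint_is_necessary(shape_with, shape_without, is_positive):
    """Test a suspected missing constraint: generate one shape in which it holds
    and one in which its negation holds. If the first is judged positive and the
    second negative, the constraint is necessary in the shape's concept."""
    return is_positive(shape_with) and not is_positive(shape_without)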
Discussion
This paper presents an active learning approach to debugging shape descriptions in LADDER interactively. It is both efficient and necessary, because human testers may never draw the shapes that expose a bug in the description.