Monday, September 8, 2008

"Graphical Input Through Machine Recognition Of Sketches"

by Christopher F. Herot

Summary

Traditional computer-aided design systems are more akin to computer-aided evaluation or manipulation. This paper discusses approaches that let machines infer the meaning of sketches, as well as the user's attitude toward the design.

The HUNCH system is introduced. It is composed of a set of programs that interpret sketches at different levels. HUNCH was conceived around STRAIT, which finds corners in a sketch on the assumption that drawing speed decreases at corners. Curves are treated as a special kind of corner and are recognized by CURVIT. The first version of STRAIT used a latching algorithm that combined endpoints falling close together. This produced some bizarre results, which led to STRAIN, a version of STRAIT without latching. An experiment that computed the latching radius as a function of speed also gave unsatisfactory results, because it assumed that the user's certainty about an endpoint's position correlates with the speed at which the line was drawn, and that speed does not reflect the conditions at the endpoints. Comparing latching candidates using GUESS in 3D space might solve the problem but, paradoxically, GUESS relies on latched data as its input. This may require latching to make initial decisions that are revised in later stages if they prove untenable. Overtracing is handled by turning several overtraced lines into one, which reduces the quantity of data. 3D projection from a 2D network of lines and room finding from a floor plan show that machine recognition can succeed to some extent.
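To make the speed assumption concrete, here is a minimal sketch of corner finding from pen speed. It is not STRAIT's actual algorithm, which this summary does not spell out. The function names, the (x, y, t) sample format, and the `speed_ratio` threshold are all assumptions made for illustration.

```python
import math

def point_speeds(samples):
    """Estimate pen speed at each interior sample of a stroke.

    samples: list of (x, y, t) tuples ordered by time (assumed format).
    Returns a dict mapping sample index to estimated speed.
    """
    speeds = {}
    for i in range(1, len(samples) - 1):
        x0, y0, t0 = samples[i - 1]
        x1, y1, _ = samples[i]
        x2, y2, t2 = samples[i + 1]
        # Path length through the sample divided by the elapsed time.
        path = math.hypot(x1 - x0, y1 - y0) + math.hypot(x2 - x1, y2 - y1)
        dt = t2 - t0
        speeds[i] = path / dt if dt > 0 else 0.0
    return speeds

def find_corners(samples, speed_ratio=0.5):
    """Return indices of samples that are local speed minima and are
    slower than speed_ratio times the mean speed (hypothetical rule)."""
    speeds = point_speeds(samples)
    if not speeds:
        return []
    threshold = speed_ratio * sum(speeds.values()) / len(speeds)
    corners = []
    for i, s in speeds.items():
        prev_s = speeds.get(i - 1, math.inf)
        next_s = speeds.get(i + 1, math.inf)
        if s < threshold and s < prev_s and s <= next_s:
            corners.append(i)
    return corners

# An L-shaped stroke: the pen slows down near (30, 0), then speeds up again.
stroke = [(0, 0, 0), (10, 0, 1), (20, 0, 2), (28, 0, 3), (30, 0, 4),
          (30, 2, 5), (30, 10, 6), (30, 20, 7), (30, 30, 8)]
print(find_corners(stroke))
```

A fixed threshold like this also hints at why speed alone is fragile: someone who draws at a steady pace through a corner would produce no minimum at all.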

A truly successful system would have to make use of context at the lowest level. A general case description is matched against the context-free structure in a top-down fashion, generating a composite structure in which every syntactic entity of the sketch is assigned a meaning, but the matching process is complicated.

A more interactive system is promising: one that combines a bottom-up flow of data with a top-down flow of context information coming from the user as well as from high-level programs. The new system consists of a database and a set of programs that manipulate it. A set of functions, such as speed and bentness, is used to find lines and curves. By varying the number of points in an interval, the machine's interpretation can be adjusted to match the user's intention as closely as possible. Inference-making procedures run in the background, allowing interpretation on the fly. Latching is decided by certainty factors and should be controlled by a user profile. The latching radius should also adjust according to some factors.
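Since latching comes up repeatedly, a small sketch may help show what merging nearby endpoints with a fixed radius looks like. It is only an illustration under stated assumptions, not the paper's procedure: the greedy centroid clustering, the function name `latch_endpoints`, and the segment format are all hypothetical, and certainty factors and user profiles are left out.

```python
import math

def _centroid(points):
    """Mean position of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def latch_endpoints(lines, radius):
    """Merge line endpoints that fall within `radius` of each other.

    lines: list of ((x0, y0), (x1, y1)) segments (assumed format).
    Each endpoint joins the first cluster whose centroid lies within
    `radius`, otherwise it starts a new cluster. Every endpoint is then
    replaced by the centroid of its cluster.
    """
    clusters = []    # each cluster is a list of member points
    assignment = []  # cluster index for each endpoint, in input order
    for line in lines:
        for x, y in line:
            for idx, members in enumerate(clusters):
                cx, cy = _centroid(members)
                if math.hypot(x - cx, y - cy) <= radius:
                    members.append((x, y))
                    assignment.append(idx)
                    break
            else:
                clusters.append([(x, y)])
                assignment.append(len(clusters) - 1)
    centroids = [_centroid(members) for members in clusters]
    return [(centroids[assignment[2 * k]], centroids[assignment[2 * k + 1]])
            for k in range(len(lines))]

# Two strokes that almost meet at a corner are joined at a shared point.
lines = [((0.0, 0.0), (10.0, 0.0)), ((10.0, 1.0), (10.0, 10.0))]
print(latch_endpoints(lines, radius=2.0))

# With a radius larger than the line itself, a short line collapses to a point.
print(latch_endpoints([((0.0, 0.0), (1.0, 0.0))], radius=2.0))
```

The second call shows one plausible way a fixed radius can go wrong: a line shorter than the radius has both of its endpoints merged into one. That may or may not be the kind of bizarre result the paper reports, but it suggests why the radius would need to adapt.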


Discussion

This paper presents an "intelligent" system. It really tries to recognize what the user has drawn, although the speed factor depends purely on the drawing habits of individual users. The idea of bringing context information (the kind a human observer uses) into the system is great; it facilitates decision making by assigning meaning to the things being sketched. However, human intervention (context, intentions, etc.) still plays a part in interpretation ("For example, if the program were told that the user has drawn a square, it could vary the size of the interval until the two functions produced exactly five peaks.").

1 comment:

Daniel said...

I'm also a fan of using context for recognition, though it's still something that has to be specified by the user. Much of this could happen during training, though, where you assign a context to an object. This would allow the same gesture to be used in different contexts (e.g., a plus sign for math versus a positive sign for electronics). Ambiguity would be resolved by the context of the other gestures drawn.