by Ming Ye, Herry Sutanto, Sashi Raghupathy, Chengyang Li and Michael Shilman
Summary
This paper presents an optimization method for text line grouping.
Line grouping problem: find the partitioning of the stroke set that optimizes the likelihood of the resulting lines and the consistency and simplicity of their configuration.
Three features measure the likelihood of a line: the linear regression error (eLR), and the maximum inter-stroke distances (Dxmax(l) and Dymax(l)) projected onto the fitting line and its orthogonal direction respectively.
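A rough sketch of how these three features could be computed, assuming a PCA line fit and treating each stroke as an interval along the fitting direction (the function name and exact formulas are illustrative, not the paper's):

```python
import numpy as np

def line_features(strokes):
    """Illustrative sketch of the three line-likelihood features:
    regression error, and the largest inter-stroke gaps along the
    fitting line and orthogonal to it. Not the paper's exact formulas."""
    pts = np.vstack(strokes)
    mean = pts.mean(axis=0)
    # Principal direction of the ink serves as the fitting line.
    _, _, vt = np.linalg.svd(pts - mean, full_matrices=False)
    direction, normal = vt[0], vt[1]
    # eLR: RMS distance of all points to the fitting line.
    e_lr = float(np.sqrt(np.mean(((pts - mean) @ normal) ** 2)))
    # Each stroke becomes an interval on the line plus an offset off it.
    spans = sorted(
        (float(((s - mean) @ direction).min()),
         float(((s - mean) @ direction).max()),
         float(((s - mean) @ normal).mean()))
        for s in strokes)
    # Dxmax: largest gap between consecutive strokes along the line.
    dx_max = max((b[0] - a[1] for a, b in zip(spans, spans[1:])), default=0.0)
    # Dymax: largest orthogonal offset jump between consecutive strokes.
    dy_max = max((abs(b[2] - a[2]) for a, b in zip(spans, spans[1:])), default=0.0)
    return e_lr, max(dx_max, 0.0), dy_max
```

For two collinear strokes with a gap between them, eLR and Dymax come out near zero while Dxmax captures the gap.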
Configuration consistency: use a neighborhood graph - a pair of lines are neighbors if the minimum distance between their fitting line segments falls below a threshold and no other objects (lines or drawings) lie between them. Compute each line’s configuration consistency θ(l) as the neighbor-length-weighted sum of orientation angle differences. The weights encourage lines to agree with longer (usually more reliable) neighbors.
Model complexity: measured by number of lines (π).
Cost function: weighted sum of eLR, Dxmax(l), Dymax(l), θ(l), π. The optimal line grouping is the stroke partitioning that minimizes this cost.
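Putting the terms together, the global cost could look like the following sketch (the weight values and per-line dictionary layout are illustrative; the paper's actual weights are not given in this summary):

```python
def global_cost(lines, weights):
    # lines: list of dicts with per-line features e_lr, dx_max, dy_max, theta.
    # weights: (w_e, w_x, w_y, w_theta, w_pi), all assumed nonnegative.
    w_e, w_x, w_y, w_t, w_pi = weights
    cost = w_pi * len(lines)  # model complexity term pi: number of lines
    for ln in lines:
        cost += (w_e * ln["e_lr"] + w_x * ln["dx_max"]
                 + w_y * ln["dy_max"] + w_t * ln["theta"])
    return cost
```

The complexity term penalizes over-segmentation: splitting a good line into two lowers the per-line feature terms but pays w_pi for the extra line.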
Gradient-descent optimization: starting from an initial solution given by temporal grouping (a partitioning based on the fact that most text lines are composed of temporally adjacent strokes), iteratively generate local alternative hypotheses and accept the one leading to the largest global cost decrease. Two types of alternative hypotheses are generated, corresponding to the two major error categories of the temporal grouping results: merging a pair of line segments, and correcting high-configuration-energy errors. A merging hypothesis is generated for each neighboring pair. High-configuration-energy errors are caused by temporally adjacent strokes belonging to different lines; any line segment l whose maximum angle difference from its neighbors exceeds a threshold is an initial candidate for correction. The algorithm iteratively updates the line configuration and the hypothesis queue, taking the hypothesis that yields the largest global cost reduction at each step, until the queue is empty. The incremental mode used for online sketching yields better results than batch mode (throwing in a full page of sketch at once).
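The descent loop above can be sketched generically; the hypothesis generator and cost function are passed in as callables, and all names here are illustrative rather than the paper's implementation:

```python
def greedy_optimize(partition, generate_hypotheses, cost):
    # partition: initial grouping (e.g. from temporal grouping).
    # generate_hypotheses: yields alternative partitions (merges,
    # high-configuration-energy corrections) local to the current one.
    # At each step, accept the hypothesis with the largest global cost
    # reduction; stop when no hypothesis improves the cost.
    current_cost = cost(partition)
    while True:
        best, best_cost = None, current_cost
        for hyp in generate_hypotheses(partition):
            c = cost(hyp)
            if c < best_cost:
                best, best_cost = hyp, c
        if best is None:
            return partition
        partition, current_cost = best, best_cost
```

With a toy cost equal to the number of groups and merge-adjacent-pairs hypotheses, the loop greedily merges everything into one group.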
Accuracy is measured by the recall metric, which is defined as the number of correct lines divided by the number of labeled lines in each page. The average recall rates for PerfectWD and CrudeWD are 0.93 and 0.87 respectively.
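For concreteness, the metric works out as below; the page counts here are made up for illustration and are not the paper's data:

```python
def page_recall(correct, labeled):
    # Per-page recall: correctly detected lines over labeled lines.
    return correct / labeled if labeled else 0.0

# Hypothetical pages: (correct lines, labeled lines) for each page.
pages = [(9, 10), (8, 10)]
average_recall = sum(page_recall(c, n) for c, n in pages) / len(pages)
```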
Discussion
The features used in grouping (calculating the global cost) are useful, and the iterative gradient-descent algorithm seems to work quite well. But I'm wondering if it works equally well on sketches with text and shapes mixed. For example, people may write text on a drawn line, or a table may not be drawn big enough, so they write through its edges (text cut by shape).

1 comment:
I think you are right. The scenarios you mentioned can break the algorithm.