Sunday, November 2, 2008

Grouping Text Lines in Freeform Handwritten Notes

Author: Ming Ye, Herry Sutanto, Sashi Raghupathy, Chengyang Li and Michael Shilman

Comments: Yuxiang's blog

Summary:
This paper describes an effective way of grouping ink strokes in to text/shape .

Likelihood of Line:
Three features are used measure the likelihood of the line.
Linear Regression error(eLR) - measure of the deviation of the stroke points from the fitting line. Reflects the linearity.
Maximum inter stroke distance of the strokes projection along X(dxmax) and Y (dymax) - reflects the compactness of the stroke set.

Configuration Consistency - the paper identifies 3 configuration and consistency corresponding to the configuration. The paper also provides a method to measure the consistency using the neighborhood graph where the vertices correspond to a line and edges correspond to the distance between the line. If the distance is below a certain threshold and there are no drawings between them, the 2 vertices connected by the edge can be considered neighbors. Configuration consistency is computed using the neighbor-length-weighted sum of te orientation angle difference.

A cost function is defined as weighted sum of the 3 features discussed for the likelihood of the line and the configuration consistency. Groups are chosen which would reduce the cost function.
Optimization is done to cost function by redefining it as sum of 'eLR' and 'dxmax'.

Undergrouping errors - typically 'i / t' and high configuration energy errors are caused by temporally adjacent strokes being recognised in different lines. This solved using the finding the neighbors which are approximately parralel.

Iterations: The algorithm starts with the results of the temporal grouping and then uses gradient descent to reduce the cost function and fixes the under/over grouping errors during the iteration. Incremental parsing turns out to be much more efficient than batch mode.

Produces a result of 0.93 and 0.87 respectively for perfect word/draw and crude W/D

Discussion:
The way they solve the starting point problem by assigning groups based on temporal information is interesting. This paper will be effective in areas were text strokes are continuous i.e input is a set of paragraphs and drawings / input containing continuous text strokes. I do not think this would work well in cases of finite state machines where text strokes are very sparse and broken.


1 comment:

Daniel said...

I agree that this approach would be much more limited in areas were text is not already grouped (e.g. a word or two meant to serve as a label). But when you have some other heuristic to detect a candidate region of text, this would be useful for further refinement.