Sketch Recognition: Distinguishing Text from Graphics in On-line Handwritten Ink

Authors: Christopher M. Bishop, Markus Svensén, Geoffrey E. Hinton

Comments:

Summary:
This paper applies machine learning techniques to classify ink strokes as either text or shape.

Feature selection: A total of 11 features were selected, and total least squares (TLS), an approach similar to PCA, was used to reduce the feature set. In all, 9 features were identified:

1. The stroke arc length, i.e., the sum of the lengths of the stroke segments.
2. The total absolute curvature, defined as the sum of the absolute angles between the consecutive segments.
3. The main direction (x- and y-components) of the stroke, as given by the TLS fit.
4. The eigenvalue (length-width) ratio of the TLS fit of the stroke.
5. The total number of fragments found in the stroke.
6. The arc length of the largest fragment of the stroke.
7. The total absolute curvature of the largest fragment.
8. The main direction of the largest fragment.
9. The length of the long side of the bounding rectangle (not axis-aligned) of the largest fragment.
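A few of the features above (1 to 4) can be sketched in plain Python. This is only an illustration under my own assumptions, not the paper's code: a stroke is taken to be a list of distinct consecutive (x, y) points, the function names are made up, and the eigenvalue ratio is taken as small/large so that it stays finite for a perfectly straight stroke.

```python
import math

def arc_length(points):
    """Feature 1: sum of the lengths of the segments joining consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def total_abs_curvature(points):
    """Feature 2: sum of absolute turning angles (radians) between consecutive segments."""
    total = 0.0
    for a, b, c in zip(points, points[1:], points[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        # Wrap the heading change into [-pi, pi) before taking its magnitude.
        turn = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi
        total += abs(turn)
    return total

def tls_fit(points):
    """Features 3 and 4: main direction (unit vector) and eigenvalue ratio of a TLS line fit.

    The TLS line through 2D points is the principal axis of their covariance
    matrix, so both values come from the closed-form 2x2 eigen-decomposition.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    half_trace = (sxx + syy) / 2
    root = math.hypot((sxx - syy) / 2, sxy)
    lam_big = half_trace + root
    lam_small = max(0.0, half_trace - root)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle of the major axis
    direction = (math.cos(theta), math.sin(theta))
    ratio = lam_small / lam_big if lam_big > 0 else 0.0  # assumed convention
    return direction, ratio
```

For an L-shaped stroke such as [(0, 0), (1, 0), (1, 1)], the arc length is 2 and the total absolute curvature is a single right-angle turn.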

Classification: The paper uses a multilayer perceptron (MLP) to classify individual strokes. It then extends the model with temporal information to capture the correlation between the class labels (text or shape) of successive strokes. A further extension uses the gaps between successive strokes.

The MLP is kept sufficiently constrained so that it does not overfit; this is done with 10-fold cross-validation. A weighting factor is introduced into the objective (error) function to offset the bias towards text. Bayes' theorem is then used to predict the classes.
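One way to read the weighting factor and the Bayes step is sketched below. This is my own guess at the mechanics, not the paper's formulation: the error is a cross-entropy in which text examples are scaled by a hypothetical `text_weight`, and the output of a classifier trained as if the classes were balanced is corrected back to an assumed true prior `prior_text` with Bayes' theorem.

```python
import math

def weighted_cross_entropy(probs, labels, text_weight):
    """Mean cross-entropy where text examples (label 1) are scaled by text_weight.

    probs[i] is the predicted probability that stroke i is text;
    labels[i] is 1 for text and 0 for graphics.
    """
    total = 0.0
    for p, t in zip(probs, labels):
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
        if t == 1:
            total -= text_weight * math.log(p)
        else:
            total -= math.log(1 - p)
    return total / len(labels)

def correct_for_priors(p_text_balanced, prior_text):
    """Bayes correction from balanced (50/50) training priors to the assumed true prior."""
    num = p_text_balanced * prior_text
    den = num + (1 - p_text_balanced) * (1 - prior_text)
    return num / den
```

With `prior_text = 0.5` the correction leaves the probability unchanged; with a larger text prior, an undecided output of 0.5 moves to that prior.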
This algorithm is then extended with temporal information: the class labels of strokes that follow one another in sequence are correlated. An HMM is built on the transition probability P(t_n | t_{n-1}). Once the HMM is constructed, the Viterbi algorithm is used to classify the strokes.
Three features were identified to represent the gaps between strokes. An MLP was then trained on these features and integrated with a bi-partite HMM.
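The HMM and Viterbi step can be sketched as a two-state decoder. This is a minimal sketch under my own assumptions, not the paper's implementation: the per-stroke scores stand in for the stroke model's outputs, the transition table plays the role of P(t_n | t_{n-1}), the numbers in the usage note are invented, and the gap model is left out.

```python
import math

STATES = ("text", "graphics")

def _log(v):
    """log that maps zero probability to -inf instead of raising."""
    return math.log(v) if v > 0 else float("-inf")

def viterbi(emissions, transition, initial):
    """Most likely label sequence for a stroke sequence.

    emissions:  list of dicts, one per stroke, state -> per-stroke score
    transition: transition[prev][cur] = P(t_n = cur | t_{n-1} = prev)
    initial:    initial[state] = P(t_1 = state)
    """
    if not emissions:
        return []
    score = {s: _log(initial[s]) + _log(emissions[0][s]) for s in STATES}
    back = []  # back[i][cur] = best previous state for stroke i + 1
    for e in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda prev: score[prev] + _log(transition[prev][cur]))
            new_score[cur] = score[best_prev] + _log(transition[best_prev][cur]) + _log(e[cur])
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

For example, with "sticky" transitions (0.9 to stay, 0.1 to switch) and per-stroke scores that favour text, then weakly graphics, then text, the decoder labels all three strokes as text, whereas classifying each stroke on its own would flip the middle one.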

Conclusion: The temporal model encodes the fact that a stroke of one kind (text or graphics) is more likely to be followed by a stroke of the same kind than by a stroke of the opposite kind. The approach can also be modified to analyze subsequences of strokes, for cases where two strokes that are far apart cannot help in predicting the next stroke.

Temporal gaps can change when the document is edited. Spatial context can be integrated into this model fairly easily.

Discussion:
It's exciting to see an application of pattern recognition to sketches.

As far as I understand, the system uses an MLP over the 9 features to calculate the probability that a stroke is text or shape. This classification result is then revised by incorporating temporal information and the gaps between strokes. This is done by building a finite state machine (a bi-partite HMM) that considers P(t_n | t_{n-1}) and the gaps.

The results are interesting. This classifier should be effective for separating handwritten paragraphs from figures (e.g. class notes). I do not think it will be effective for diagrams with labels written over them (e.g. constraint satisfaction diagrams), because the temporal information and the gaps between strokes do not help much in those cases.


Wednesday, October 15, 2008
