Sketch Recognition: October 2008

Monday, October 20, 2008

Structure in On-line Documents

Authors: Anil K. Jain, Anoop M. Namboodiri, Jayashree Subrahmonia

Comments:

Summary:
This paper discusses an approach to separating text from shapes in online documents.
A linear decision boundary over stroke length and stroke curvature can be used to classify strokes as text or non-text.

Grouping non-text strokes: A minimum spanning tree is used to group the non-text strokes. Strokes are the nodes, and the shortest distance between two strokes is the edge weight. Inconsistent edges are removed from the spanning tree. A maximum inter-region distance of 20 and an intra-region distance of 200 are imposed on the regions.
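To make the grouping step concrete, here is a minimal Python sketch. It is not the paper's implementation: the paper's test for an "inconsistent" edge is replaced by a plain length threshold (`max_edge`, a hypothetical parameter), and the function names are made up for illustration.

```python
import math
from itertools import combinations


def stroke_distance(a, b):
    """Shortest point-to-point distance between two strokes (lists of (x, y))."""
    return min(math.dist(p, q) for p in a for q in b)


def group_strokes(strokes, max_edge):
    """Group strokes by dropping long edges from a minimum spanning tree."""
    n = len(strokes)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (stroke_distance(strokes[i], strokes[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    # Kruskal's algorithm: accepted edges form the spanning tree. Stopping at
    # max_edge gives the same groups as building the full tree and then
    # deleting every edge longer than max_edge (a simplifying assumption).
    for weight, i, j in edges:
        if weight > max_edge:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up strokes: two close together, one far away.
strokes = [[(0, 0), (1, 0)], [(2, 0), (3, 0)], [(50, 50), (51, 50)]]
groups = group_strokes(strokes, max_edge=5)
```

Here `groups` holds the indices of the strokes in each region, so the two nearby strokes end up together and the distant one is on its own.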

Tables - Ruled and unruled

Ruled tables - Ruled tables can be found using the Hough transform (r, theta): there are peaks at theta = 0 and theta = 90. A ruled table is detected using the condition that it contains at least 5 lines.

Unruled tables - Projection onto one axis gives peaks for all the lines, and projection onto the other axis gives peaks separated by valleys (the gaps between the columns).
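The projection idea for unruled tables can be sketched as a histogram of ink points along one axis, followed by a count of the empty runs between occupied bins. This is only an illustration under my own assumptions: the bin size, the coordinates and the function names are hypothetical, and the paper's actual peak and valley detection may differ.

```python
def projection_profile(points, axis, bin_size, n_bins):
    """Histogram of ink points along one axis (0 = x, 1 = y)."""
    counts = [0] * n_bins
    for point in points:
        index = int(point[axis] // bin_size)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


def count_valleys(profile):
    """Count runs of empty bins that lie between occupied bins."""
    occupied = [i for i, count in enumerate(profile) if count > 0]
    if not occupied:
        return 0
    valleys = 0
    in_gap = False
    for count in profile[occupied[0]:occupied[-1] + 1]:
        if count == 0 and not in_gap:
            valleys += 1
        in_gap = count == 0
    return valleys


# Two columns of ink around x = 0..9 and x = 30..39 (made-up coordinates).
ink = [(x, y) for x in (1, 5, 31, 35) for y in (0, 10)]
columns = projection_profile(ink, axis=0, bin_size=10, n_bins=4)
gaps = count_valleys(columns)
```

One valley in the x projection corresponds to one gap between two columns.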

Discussion:
The features used to classify tables are interesting. I did not understand how the spanning tree and clustering would help in classification. Maybe some of the referenced papers would be helpful in understanding this.

Sketch Recognition for Computer-Aided Design

Author: Christopher Herot

Comments:

Summary:

This paper discusses enhancing the user experience in a sketch recognition system. This is done by inferring user intent from the speed, pressure and sequence of the strokes. The paper highlights the cumbersome nature of explicit action-response systems and proposes modifications which would alleviate this overhead.

First phase of interpretation - The sketch is converted to lines and curves. Lines, curves and corners are identified using patterns in speed. Curves are smoothed using a B-spline curve fit. The system assumes that strokes drawn slowly are intended more literally than those drawn hastily. The output is then processed for overtracing and thickness. Thickness may be used as a measure of the degree of emphasis intended for the line.

Second phase of interpretation - The next phase is latching the corners of strokes which do not actually intersect. The latching radius is determined using the speed, the line length and the density of lines around each point.

Further interpretations are done based on the context.

In order to tune the system to a particular user's sketching style, the system is trained with practice patterns drawn by the user.

With the help of the above interpretations, this system can decrease the number of interactions with the user compared to the traditional action-response system. Allowing multi-level viewing and manipulation of the interpreted data is also important, since it allows the user to edit the raw data as well as the interpreted data. The system adapts to interpretation errors by changing its parameters as the user corrects them.

The paper also emphasizes including the user's model of the system in the system model. This would imply tracking the user's confidence in the system, which would help the system provide the right level of feedback and degree of inference making. This part is very critical since it is related to user behavior.

Discussion:

Understanding user intent using speed, pressure and sequence seems to be a very interesting idea, and it seems practical. The final part of the paper, which proposes a system that includes the user's model of the system, seems complex and theoretical. The parameters for capturing user confidence are not clear.

Wednesday, October 15, 2008

Seminar by Edward Lank

Sloppy Selection:

Steering Law: The time to draw a stroke is directly proportional to the length of the path and inversely proportional to the width of the constrained path.
The speed of the stroke is directly proportional to the width of the constrained path.
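The first relation is commonly written as T = a + b * (A / W) for a straight path of length A and width W, where a and b are constants fitted from measurements. A small sketch follows; the default values of `a` and `b` are placeholders I chose for illustration, not values from the seminar.

```python
def steering_time(path_length, path_width, a=0.0, b=1.0):
    """Predicted time to steer through a straight tunnel: T = a + b * (A / W).

    a and b are empirical constants; the defaults are placeholders only.
    """
    if path_width <= 0:
        raise ValueError("path_width must be positive")
    return a + b * (path_length / path_width)


narrow = steering_time(path_length=100, path_width=10)
wide = steering_time(path_length=100, path_width=20)
```

With the same path length, the wider tunnel gives the smaller predicted time, which matches the idea that a tightly constrained selection path is drawn more slowly.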

I do not remember the other law.
Both of them were used to infer the intent of the user during ink-based selection.

It is assumed that the user's selection path is more constrained when the user has to accurately select a particular region or stroke.

PUI:
Reducing the number of modes. Sketch interfaces are usually plagued by modes (writing, drawing, erase, edit...). The method reduces them by recognizing the user's intent: a tool-tip menu pops up to identify the intent of the user and disambiguate the recognition process. E.g., a circle drawn by the user can be a circle or a gesture for selection. This is disambiguated by popping up "select" as a tooltip. If the user clicks on 'select', the system enters selection mode; otherwise it recognizes the stroke as a circle.

Distinguishing Text from Graphics in On-line Handwritten Ink

Authors: Christopher M. Bishop, Markus Svensén, Geoffrey E. Hinton

Comments:

Summary:
This paper applies machine learning techniques to classify ink strokes as text or shape.

Feature selection: A total of 11 features were selected, and total least squares (TLS), an approach similar to PCA, was used to reduce the feature set. On the whole, 9 features were identified:

1. The stroke arc length, i.e., the sum of the lengths of the stroke segments.
2. The total absolute curvature, defined as the sum of the absolute angles between the consecutive segments.
3. The main direction (x- and y-components) of the stroke, as given by the TLS fit.
4. The eigenvalue (length-width) ratio of the TLS fit of the stroke.
5. The total number of fragments found in the stroke.
6. The arc length of the largest fragment of the stroke.
7. The total absolute curvature of the largest fragment.
8. The main direction of the largest fragment.
9. The length of the long side of the bounding rectangle (not axis-aligned) of the largest fragment.
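The first two features in the list are simple to compute from a stroke's sample points. The sketch below assumes a stroke is a list of (x, y) tuples and that each pair of consecutive points forms a segment; the function names are mine, not the paper's.

```python
import math


def arc_length(points):
    """Sum of the lengths of the stroke segments (feature 1)."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def total_absolute_curvature(points):
    """Sum of the absolute angles between consecutive segments (feature 2)."""
    total = 0.0
    for p, q, r in zip(points, points[1:], points[2:]):
        first = math.atan2(q[1] - p[1], q[0] - p[0])
        second = math.atan2(r[1] - q[1], r[0] - q[0])
        # Wrap the turning angle into [-pi, pi) before taking its magnitude.
        turn = (second - first + math.pi) % (2 * math.pi) - math.pi
        total += abs(turn)
    return total


# An L-shaped stroke: 3 units right, then 4 units up.
stroke = [(0, 0), (3, 0), (3, 4)]
length = arc_length(stroke)
curvature = total_absolute_curvature(stroke)
```

For the L-shaped stroke the arc length is 7 and the total absolute curvature is a single right-angle turn.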

Classification: The paper describes a multilayer perceptron (MLP) to classify the strokes. It then extends the model with temporal information to capture the correlation between the classes (text/shape) of successive strokes. This is further extended using the gaps between successive strokes.

The MLP is kept sufficiently constrained and not overfitted by using 10-fold cross-validation. A factor is introduced in the objective function (error) to balance the bias towards text. Bayes' theorem is used to predict the classes.
The algorithm is then extended using temporal information: the class labels of strokes that follow one another in sequence are correlated. An HMM is built over the transition probability P(tn | tn-1). Once the HMM is constructed, the Viterbi algorithm is used to classify the strokes.
Three features are identified for representing the gaps between strokes. An MLP is then built on these features and integrated with a bi-partite HMM.
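A minimal sketch of the temporal step: Viterbi decoding over two states. The numbers are made up, the per-stroke scores stand in for whatever the stroke MLP provides, and the gap model is left out, so this illustrates the idea rather than the paper's model.

```python
import math


def viterbi(emissions, transition, prior):
    """Most likely state sequence for a simple HMM.

    emissions: one dict per stroke, mapping state to a score for that stroke.
    transition[r][s]: probability of moving from state r to state s.
    """
    states = list(prior)
    score = {s: math.log(prior[s]) + math.log(emissions[0][s]) for s in states}
    back = []
    for emission in emissions[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best = max(states, key=lambda r: score[r] + math.log(transition[r][s]))
            new_score[s] = (
                score[best] + math.log(transition[best][s]) + math.log(emission[s])
            )
            pointers[s] = best
        score = new_score
        back.append(pointers)
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical values: labels tend to persist from one stroke to the next.
prior = {"text": 0.5, "graphics": 0.5}
transition = {
    "text": {"text": 0.9, "graphics": 0.1},
    "graphics": {"text": 0.1, "graphics": 0.9},
}
emissions = [
    {"text": 0.9, "graphics": 0.1},
    {"text": 0.4, "graphics": 0.6},  # taken alone, this stroke looks like graphics
    {"text": 0.9, "graphics": 0.1},
]
labels = viterbi(emissions, transition, prior)
```

Classified on its own, the middle stroke would be labelled graphics, but the sequence model labels all three strokes as text because its neighbours are confidently text.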

Conclusion: The temporal model encodes the fact that strokes of one kind (text or graphics) are more likely to be followed by another stroke of the same kind rather than a stroke of the opposite kind. The approach can also be modified to analyze subsequences of strokes, for cases where 2 strokes separated far apart cannot help in predicting the next stroke.

The temporal gaps can change while the document is being edited. Spatial context can be integrated with this model fairly easily.

Discussion:
It's exciting to see pattern recognition applied to sketches.

As far as I understand, the system calculates the probability of a stroke being text or shape from the 9 features, using an MLP to decide between the two. This classification result is then adjusted by incorporating temporal information and the gaps between strokes. This is done by building a finite state machine (a bi-partite HMM) which considers P(tn | tn-1) and the gaps.

The results are interesting. This classifier should be effective for separating handwritten paragraphs from figures (e.g., class notes). I do not think it will be effective for separating diagrams from the labels written over them (e.g., constraint satisfaction diagrams), since the temporal information and the gaps between strokes do not help much in these cases.

Tuesday, October 14, 2008

Ink Features for Diagram Recognition

Authors: Rachel Patel, Beryl Plimmer, John Grundy, Ross Ihaka

Comments:
1. Daniel's blog

Summary:
This paper proposes an algorithm for text vs. shape recognition. It builds a classification tree based on certain features to separate text from shapes. The results are then compared to Microsoft's divider and InkKit.

Feature Selection:
A set of 46 features was considered, and the important ones were then selected by analyzing these features on the following data: a set of 9 shapes was recognized, and samples were collected from 26 people. The paper uses the rpart function to build a classification tree. The aim is to find the optimal position for each split so that the number of misclassified strokes is minimal. If this is done for all features in the feature set, using the observations in the dataset, then the features that most accurately split the data into text and shape stroke groups, with the fewest misclassified strokes, are identified as the significant features for dividing text and shape strokes.
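The split search described above can be sketched for a single feature as follows. This is a simplification under my own assumptions: it minimizes the raw count of misclassified strokes, as the summary describes, whereas rpart has its own splitting criteria, and the widths and labels are invented.

```python
from collections import Counter


def misclassified(labels):
    """Strokes that disagree with the majority label of their group."""
    return len(labels) - max(Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, errors) for the best split on one feature."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_errors = None, len(pairs) + 1
    for (low, _), (high, _) in zip(pairs, pairs[1:]):
        if low == high:
            continue  # no threshold fits between equal values
        threshold = (low + high) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        errors = misclassified(left) + misclassified(right)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Invented bounding box widths: text strokes are small, shape strokes large.
widths = [10, 12, 15, 80, 95, 110]
kinds = ["text", "text", "text", "shape", "shape", "shape"]
threshold, errors = best_split(widths, kinds)
```

Running the same search for every feature and ranking the features by their error counts would give a rough version of the feature ranking the summary describes.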

The final set of features is: time till next stroke, speed till next stroke, distance from last stroke, distance to next stroke, bounding box width, perimeter to area, amount of ink inside, total angle.

* Shape strokes are much larger than text strokes, which is reflected in the use of the bounding box width, perimeter to area and amount of ink inside features.
* Curvature is relevant for differentiating joined-up letters from shapes.
* Inter-stroke distance is used to find words. It is low for strokes within a word, whereas a faster speed combined with a high inter-stroke distance suggests the next word.

Future work - The first step would be to replace the classification tree with more robust classifiers that use the same features and allow for variability.

Discussion:
Samples of cases where the algorithm failed would have been useful for understanding more about the algorithm.
I think the sample sketches used give some idea of which symbols are difficult cases for classification. The check box and the musical notes are some of the symbols that are difficult to distinguish from text.

Sunday, October 12, 2008

Renegade Gaming: Practices Surrounding Social Use of the Nintendo DS Handheld Gaming System

Authors: Christine Szentgyorgyi, Michael Terry & Edward Lank

Summary:
This paper is a qualitative study investigating the collocated multiplayer gaming practices of Nintendo DS owners. It tries to answer questions such as who people play with, under what circumstances, and for what reasons, in the context of multiplayer gaming with the DS.

Renegade gaming - 3 rules govern play: appropriateness with respect to the location, the degree to which non-players would be affected, and personal image.

Gaming Goals:
In this study, subjects reported gaming for the following purposes: to pass time, to learn or keep one’s mind “sharp”, to be social, and to engage in competitive play.

Social gaming: Engaging in multiplayer games is seen as better than trash talking.
Coordinated gameplay: This happens less often and is more likely among groups of friends with good logistics.
Ad hoc gaming: Involves including strangers in the game. There is awkwardness in approaching them. Location plays an influential part in this: places meant for gaming make it easier to interact with strangers.

DS multiplayer is considered to be less social, with three main factors contributing to the difference: the lack of a shared display, the reduced potential for spectators, and the closed nature of the gameplay experience.
The ability to attach a large screen to the DS could enhance the social experience surrounding group gameplay.

Discussion:
I don't have much to comment about this paper. It's altogether a good survey. It throws light on how gamers socialize, the locations where games are played, the intentions behind playing games, the effect of screen size on socializing, ...

Saturday, October 11, 2008

MathBrush: A Case Study for Pen-based Interactive Mathematics

Authors: George Labahn, Edward Lank, Mirette Marzouk, Andrea Bunt, Scott MacLean, and David Tausky

Comments:
1. Daniel's blog

Summary:
This paper discusses a complete, full-featured pen-math problem-solving system called MathBrush. The system uses a Computer Algebra System (CAS) to solve mathematical problems and allows them to be entered as sketches.

The system provides a scrollable panel for pen input. It provides a context-sensitive pop-up menu which displays a set of CAS features that can be used. The system allows the user to edit part of an expression and to manipulate sub-expressions based on CAS output. The system also allows the user to plot the math expression as 2D/3D plots, which can be rotated using the pen.

Equation entry: The input validation panel is the input panel. The strokes made on it are processed by the character recognition system. Some recognition problems were due to extraneous ink left by the user (dots, ...). Editing gestures like scratch-out and translation were not provided for the user. Other problems identified were the user interface containing 2 input panels for correcting character recognition results and the structure of equations, and the user's inability to perceive the reason behind recognition errors (correcting a stroke vs. correcting recognition results).

Discussion:
The MathBrush seems to be an application which tries to incorporate MATLAB + pen input. The paper is well written in terms of explaining the limitations and failures of the system.
The idea of math education and research using computers is interesting.
I think the Maple system which the author uses to compare is from Stephen M Watt (Author of prototype pruning paper)

Wednesday, October 8, 2008

Sketch-Based Educational Games: "Drawing" Kids Away from Traditional Interfaces

Authors: Brandon Paulson, Brian Eoff, Aaron Wolin, Joshua Johnston, Tracy Hammond

Summary:
This paper discusses various games in which sketch recognition is useful:
APPLES (Animated Planetary Physics Learning and Entertainment Simulation) using LADDER, memory games involving sketches, tools for sketching on maps, and sentence diagramming (a tool for annotating sentences).

Discussion:
This paper presents different applications of sketch recognition.
Thinking along similar lines, I think a tool could be developed for finding out whether a person has dyslexia (getting confused between similar shapes). One more idea would be to develop a tool to analyze how children learn to write. One way I could think of is to ask a child to replicate a letter and measure the closeness of the input to the letter. This would help us understand how a person's handwriting evolves. I don't know if it would be useful though.

Recognizing Free-form Hand-sketched Constraint Network Diagrams by Combining Geometry and Context

Authors: Tracy Hammond, Barry O’Sullivan

Comments:

Summary:
This paper discusses a sketch recognition system developed for constraint network diagrams using LADDER.
The authors define geometric recognition rules for the nodes, lines, letters and constraints used in the diagram. Example:
The letter V consists of two connecting lines abiding by the
following geometric rules:
1. The "neg" line is negatively or vertically sloped.
2. The "pos" line is positively or vertically sloped.
3. The endpoint "p1" of "pos" line and the endpoint "p1" of
the "neg" line are connected.
4. The endpoint "p2" of "pos" line is above the endpoint
"p1" of the "pos" line.
5. The endpoint "p2" of "neg" line is above the endpoint
"p1" of the "neg" line.
6. The endpoint "p2" of "neg" line is left of the endpoint
"p2" of the "pos" line.
7. The "pos" line and the "neg" line are of equal length.
Contextual rules are then used to further identify V:
1. The "node" contains the center of the "pos" line.
2. The "node" contains the center of the "neg" line.
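The seven geometric rules for the letter V can be written as plain predicates. The sketch below is not LADDER syntax and not the authors' code: it assumes y increases upwards, treats "connected" and "equal length" as tolerance tests with made-up defaults (`connect_tol`, `min_length_ratio`), and leaves out the two contextual rules about the node.

```python
import math
from dataclasses import dataclass


@dataclass
class Line:
    p1: tuple  # (x, y), with y increasing upwards (an assumption)
    p2: tuple

    def length(self):
        return math.dist(self.p1, self.p2)

    def slope_sign(self):
        """-1 or +1 for a sloped line, 0 for a vertical or horizontal one."""
        product = (self.p2[0] - self.p1[0]) * (self.p2[1] - self.p1[1])
        return (product > 0) - (product < 0)


def is_letter_v(pos, neg, connect_tol=5.0, min_length_ratio=0.8):
    """Check the seven geometric rules; horizontal lines fail rules 4 and 5."""
    longest = max(pos.length(), neg.length())
    if longest == 0:
        return False
    rules = [
        neg.slope_sign() <= 0,                     # 1. negative or vertical
        pos.slope_sign() >= 0,                     # 2. positive or vertical
        math.dist(pos.p1, neg.p1) <= connect_tol,  # 3. p1 endpoints connected
        pos.p2[1] > pos.p1[1],                     # 4. pos.p2 above pos.p1
        neg.p2[1] > neg.p1[1],                     # 5. neg.p2 above neg.p1
        neg.p2[0] < pos.p2[0],                     # 6. neg.p2 left of pos.p2
        min(pos.length(), neg.length()) / longest >= min_length_ratio,  # 7.
    ]
    return all(rules)


# A symmetric V with its vertex at the origin (made-up coordinates).
right_arm = Line((0, 0), (5, 10))
left_arm = Line((0, 0), (-5, 10))
is_v = is_letter_v(pos=right_arm, neg=left_arm)
```

Swapping the two arms makes the check fail, since the "neg" line would then be positively sloped.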

The system combines the relative and absolute thresholds to get upper bounds and lower bounds. This solves the problem of loose thresholding in relative thresholds and strict thresholds of the absolute thresholds.

The system also allows editing of the sketched diagrams - delete, scale, drag ...

Discussion:
I see no significance in recognizing the letters inside the nodes. As they are just symbols to identify each node, should they be recognized? Is it not enough to recognize just the nodes, lines and constraints? Removing letters from the recognition could actually reduce the complexity of the rules.

LADDER, a sketching language for user interface developers

Authors: Tracy Hammond, Randall Davis

Comments:

Summary:
This paper describes LADDER, a language for developing sketch recognition systems.

The language allows users to specify the shapes to be recognized, including the ways they can be edited and displayed.

Description limitations: LADDER can describe only shapes that have a fixed graphical grammar. It cannot describe free-hand diagrams. The shapes should be composed of primitive components and should not have a lot of irregularities.

Shape definition: This includes definition of
* list of components - primitives that make the shape
* geometric constraints - like angles between lines...
* aliases
* editing behavior - Editing operations(trigger) that can be performed on the shape and the corresponding effect (action)
* display methods - 4 ways of display can be set for a shape - original strokes, cleaned up (beautified) stroke, ideal shape and the alternate custom shape.

LADDER also allows the definition of hierarchical shapes, abstract shapes and shape groups.
It contains predefined components, constraints, editing behaviors and display methods for describing a shape. The vector keyword is used to define the minimum and maximum number of components for primitive shapes which can have a variable number of components.

Multi-domain recognition system: The system uses a bottom-up approach to recognition.
Low-level recognizer - Recognizes primitive shapes, or combinations of them, from the drawn stroke and passes the result to the high-level domain shape recognizer. The limitation is that when the low-level recognizer fails, the domain shape recognizer also fails.

Domain shape recognizer - This recognizer is based on Jess rules. It searches for all combinations of shapes that can satisfy a rule. A greedy algorithm is used to make the system faster. The downside is that in cases of ambiguity the system may select the wrong shape. The system was still slow, and this was fixed by pruning the unrecognized strokes from the recognition tree.

The system also contains methods to find editing triggers and methods to solve constraints in the shape (ideal strokes).

Discussion:
As the authors highlight, there is an inherent problem in the bottom-up approach: a failure in the low-level recognizer can cause a failure in the domain shape recognizer.
The ability of LADDER to define editing and display methods makes it powerful.

Ambiguous Intentions: a paper-like interface for creative design

Authors : Mark D Gross, Ellen Yi-Luen Do

Comments:
1. Daniel's blog
Summary:
This paper discusses the application of sketch recognition to design.

Interfaces for conceptual and creative design should recognize and interpret drawings. They should also capture users’ intended ambiguity, vagueness, and imprecision and convey these qualities visually and through interactive behavior.
Electronic Cocktail Napkin - A freehand drawing environment implemented in Lisp, with input from Wacom tablets, Apple Newton PDAs, or a mouse/trackball.

Configuration recognizer - Recognizes sets of shapes based on components and context. A user-defined recognizer is used, trained on examples drawn by the user. The recognition features used depend on the context. For example, spatial relations may not be used for circuit diagrams; instead, connections are used to identify patterns. In case of ambiguity, the system retains the shape and its alternatives until it resolves the ambiguity.

Imprecision - After recognizing the shapes, the system maintains connectivity, adjacency and alignment constraints. This allows the user to stretch or move the shapes. Domain-specific context is used to identify the constraints to be applied during such edits.
The user can gradually tighten the constraints to arrive at a final design.

Implementation - Recognizer for strokes , Recognizer for Configuration and maintenance of constraints. As all the recognizers depend on the context information, the system maintains a current context which varies with the recognition results of the current stroke drawn.

Discussion:
This paper makes good use of incremental design.
The idea of applying context to identify configurations is very interesting. The system identifies the current context based on the strokes drawn previously. There can be recognition problems when the system ends up with conflicting context information.
The notion of measuring the commitment and certainty of the designer from precision, roughness, speed and overtracing is very interesting.

Wednesday, October 1, 2008

What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition

Authors : Brandon Paulson, Pankaj Rajan, Pedro Davalos, Ricardo Gutierrez-Osuna, Tracy Hammond

Comments:
1. Andrew's blog
Summary:
This paper constructs a sketch recognition system with 44 features, 31 from PaleoSketch and 13 from Rubine, and uses a quadratic classifier to recognize sketches. The results are then compared to PaleoSketch. The samples used to train and test the system were the same as those used for PaleoSketch.
Of the 44 features, the system identifies the following as important through feature subset selection:

* End point to stroke length ratio -
* NDDE & DCR
* Total rotation - above 4 features were used more than 90% to classify

* Curve least square error - to identify curves

* Circle fit : Major axis: minor axis - used to classify circle/ ellipse

* Spiral fit - Average radius to bounding box radius ratio, Center closeness error

* Polyline fit - # of substrokes, number of strokes passing line test, feature area error

* Complex fit- # of sub-fits, # of non polyline primitives, percent of subfits that are lines, complex score/ rank
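Two of the features listed above, the end point to stroke length ratio and total rotation, can be sketched directly from the stroke points. This is my reading of the feature names, not code from the paper; unlike an absolute curvature sum, the rotation here is signed, so turns in opposite directions cancel.

```python
import math


def stroke_length(points):
    """Total path length of a stroke given as a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def endpoint_to_length_ratio(points):
    """Distance between the end points divided by the stroke length."""
    length = stroke_length(points)
    if length == 0:
        return 0.0
    return math.dist(points[0], points[-1]) / length


def total_rotation(points):
    """Signed sum of the turning angles along the stroke, in radians."""
    headings = [
        math.atan2(q[1] - p[1], q[0] - p[0]) for p, q in zip(points, points[1:])
    ]
    total = 0.0
    for first, second in zip(headings, headings[1:]):
        total += (second - first + math.pi) % (2 * math.pi) - math.pi
    return total


line = [(0, 0), (2, 0), (4, 0)]
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]  # closed, counter-clockwise
line_ratio = endpoint_to_length_ratio(line)
square_ratio = endpoint_to_length_ratio(square)
```

A straight line has a ratio of 1 and no rotation, while a closed shape has a ratio of 0 and a large total rotation, which suggests why these features separate lines from closed shapes.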

Results show that this system achieves 97% accuracy with this optimal subset, which is comparable to the PaleoSketch system (98.56%).

Discussion:
This is a very interesting set. Each feature is significant for identifying a particular type of primitive.
I do not understand how the system classifies arcs. I think it considers arcs to be a subset of curves.
It's interesting to see the use of the polyline fit's feature area error. I think this helps the system distinguish between ellipses/circles and polylines.

Calculating the percentage of substrokes that are lines for both the polyline and the complex fit seems to be redundant, but both are marked important by this system.

For curves, the least squares error is significant, and for polylines it is the feature area that is important. This is quite different from what we have discussed in class. I thought least squares would be more significant for lines and feature area for curves.

Backpropagation Applied to Handwritten Zip Code Recognition

Authors : Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel

Comments:
1. Yuxiang's blog

Summary:
This paper discusses an algorithm for recognizing zip code digits using neural networks. The network has three hidden layers. H1 has 12 feature maps with 8x8 units, where each unit takes a 5x5 field from the 16x16 image as input. H2 has 12 feature maps with 4x4 units built on H1. H3 has 30 units fully connected to H2, and the output layer has 10 units (corresponding to 0-9) fully connected to H3.
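The layer sizes above can be checked with the usual output-size formula for a convolution. The stride of 2 and padding of 2 below are my assumptions, chosen because they reproduce the 8x8 and 4x4 maps from a 16x16 input with 5x5 fields; the summary itself does not state them.

```python
def conv_output_size(input_size, kernel, stride, padding):
    """Width of a feature map produced by sliding a kernel over a padded input."""
    return (input_size + 2 * padding - kernel) // stride + 1


# Assumed stride and padding; only the 16x16 input and 5x5 field are given.
h1 = conv_output_size(16, kernel=5, stride=2, padding=2)
h2 = conv_output_size(h1, kernel=5, stride=2, padding=2)

layer_units = {
    "H1": 12 * h1 * h1,  # 12 feature maps of h1 x h1 units
    "H2": 12 * h2 * h2,  # 12 feature maps of h2 x h2 units
    "H3": 30,
    "output": 10,
}
```

Under these assumptions H1 has 768 units and H2 has 192, so most of the network's units sit in the first two feature-map layers.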

Discussion:
I did not understand anything of this paper.
