Friday, November 28, 2008

SHADY: A Shape Description Debugger for Use in Sketch Recognition

Authors: Tracy Hammond and Randall Davis

Comments:

Summary:
This paper gives a description on a debugger (SHADY) that identified over-constrained shapes by having the user draw some samples of the shapes. SHADY uses the drawn sample to identify the constraints that are wrong and displays them. The tool tries to give out the smallest collection of constraints that would match the user's necessity and displays the set of constraints and the set of wrong contraints.
A constraint solver is built for linear constraints of the shapes. Missing constraints are identified from the user drawn sample and left out in the description. Near miss shapes are shown to the user based on the missing constraints. The user then chooses the shapes which are close to his necessity.

Discussion:
Capturing user intentions is important here. It would be interesting to see if speed and pressure data corresponding to strokes would help identify the constraints. Users tend to slow down at places of constraints (Sloppy selection).

Discussion:

Multimodal Collaborative Handwriting Training for Visually-Impaired People

Authors: Beryl Plimmer, Andrew Crossan, Stephen A. Brewster, Rachel Blagojevic

Comments:

Summary:
MCSig, a system designed with multimodal feedback for aiding visually-impaired people in learning to write. Unlike ordinary people, visually-impaired people lack the main form of feedback while writing - the visual feedback. This makes their learning harder. This paper proposes a tool which combines haptics and feedback through sound to help write the characters. A specially designed tool is being used in this tool to guide the user to trace through the character and a sound based feedback to get a feedback on the quality of the trace.

Discussion:
This paper throws interesting aspect of feedback to people. Using ridges on the paper to provide feedback on shapes is an interesting method.

This paper also shows the way we learn to write.
- First we start learning by tracing over the character. This means moving our whole hand and the pen together to trace the whole character. As the pen traces the character, hand also traces the character. This is the also the way we start writing in the cursive.
- As our confidence grows, we fix our palm and trace the characters by moving the pen and the 2 fingers holding our pen.

Using a Geometric-Based Sketch Recognition Approach to Sketch Chinese Radicals

Author: Paul Taele, Tracy Hammond

Comments:

Summary:
This paper gives a description of chinese radical recognition built using LADDER and sezgin primitive recognizer. The image recognition and neural network methods have several drawbacks like need for large training data, inability to store the order of the strokes. LADDER achieves a lesser accuracy in recognition but consumes less time in building and using the system.

Discussion:
Interesting things found in this paper is the significance of the geometric features in identifying the chinese radicals.

Friday, November 21, 2008

Fluid Sketches: Continuous Recognition and Morphing of Simple HandDrawn Shapes

Author: James Arvo,Kevin Novins

Comment:

Summary:

This paper discusses about a method feedback while sketching. The paper uses instantaneous morphing and recognition feedback in order to give the user more flexibility while drawing.
This paper provides mathematical models for morphing and recognizing shapes.
As the user sketches, the ink is transformed to the clean shape instantaneously. The clean shape then replaces the original ink as the user continues sketching. The author claims this reduces the effort in sketching.

Discussion:

The idea of instantaneous feedback on user- drawn sketch is interesting. But i think this can also confuse users while drawing a complex shape.

Tuesday, November 18, 2008

Sketch Recognition User Interfaces: Guidelines for Design and Development

Author: Christine Alvarado

Comment:
1. Daniel's blog

Summary:

This paper discusses Sketch Recognition user interfaces (SkRUI) and an application developed using this concept, MS Power point .
Online Edit mode - user needs to hold the pen down fro sometime and the system changes the mode to Edit mode automatically. User can then select the sketches. A pen up then results in switching back to sketch mode while the selected items are highlighted.
A switching between unrecognized / recognized sketch was provided by a checkbox at the top of the window.

Evaluation of the system was done to understand the perception of the user about the tool and what they wanted from the tool.
3 scenarios - Creating new diagrams/ slides with the tool
- Labeling the diagrams with keyboard since the system did not support handwriting recognition
- editing and sketching using pen

Design guidelines - this paper provides some design guidelines to design a SkRUI systems.
- Display recognition results only when the user is done sketching
- Provide obvious indications to distinguish free sketching from recognition
- Restrict recognition to a single domain until automatic domain detection becomes feasible
- Incorporate pen-based editing.
- Sketching and editing should use distinct pen motions
- SkRUIs require large buttons
- The pen must always respond in real time

Discussion:
These design guidelines may not be ideal for all the cases. There can be lot variability in designing SkRUIs. For instance, sketch recognition can be done while drawing the stroke. Its cannot be a rule.

This throws up lot of open questions:
- When to recognise ?
- When to show the results?
- How to switch modes in an unambiguous way?
- How to show recognition results?

Saturday, November 8, 2008

Magic Paper: Sketch-Understanding Research

Author: Randall Davis

Summary:
Why sketch? more intuitive. Difficulty in sketch recognition - order of strokes, noise, segmentation, overtracing, more degree of freedom more difficulty.

Recognizing sketch - identifying primitives - identify the shape based on order of strokes, set of geometric constraints or based on template . LADDER provides us a method to define geometric constraints for shapes. Using a variation of HMM (dynamic bayes net) to capture the order of strokes. These help us to build a sketch interface (e.g UML) . PRoviding refinement to sketch recognition by generating near miss samples.

Discussion:
Its a good summary on LADDER

Interactive Learning of Structural Shape Descriptions from

Author: Tracy Hammond, Randall Davis

Comments:

Summary:
This paper discusses about defining structural shape descriptions in LADDER by generating near miss examples. This paper discusses about handling the conceptual errors (under- constrained and over- constrained errors) that occurs while describing a shape. The other type of errors is syntactic errors (out of scope of this paper) .

Under-Contrained - missing equalLength condition for the 4 sides in describing a square.
Over- Contrained - adding equalLength condition for (top left) sides for a rectangle. Redundant constraints .

After the description of a shape with GUI, the first step is to find a good match between the typed description and the drawn shape that is a set of bindings that associates variables in the
description with the geometric shapes. The system chooses the variable assignment with the fewest failed constraints. The system then displays the subcomponents of the failed constraints for the developer to remove the constraints, if necessary. The developer should remove enough contraints for the system to recognise the initial hand-drawn shape.

The system then reduces the constraint list by testing with negative samples and positive samples.

Over constrained testing - shapes for differrent scales and rotation are displayed to the developer and asked if all the samples are positive. Testing for other constraints are done by negating the constraints and applying them to generate shapes. Similar testing is done for checking under-constraints.
The paper also describes the process behind generating shapes for different constraints.

Discussion:
Its a nice method to generate near miss samples as the users cannot always generate all the samples.

Sunday, November 2, 2008

Grouping Text Lines in Freeform Handwritten Notes

Author: Ming Ye, Herry Sutanto, Sashi Raghupathy, Chengyang Li and Michael Shilman

Comments: Yuxiang's blog

Summary:
This paper describes an effective way of grouping ink strokes in to text/shape .

Likelihood of Line:
Three features are used measure the likelihood of the line.
Linear Regression error(eLR) - measure of the deviation of the stroke points from the fitting line. Reflects the linearity.
Maximum inter stroke distance of the strokes projection along X(dxmax) and Y (dymax) - reflects the compactness of the stroke set.

Configuration Consistency - the paper identifies 3 configuration and consistency corresponding to the configuration. The paper also provides a method to measure the consistency using the neighborhood graph where the vertices correspond to a line and edges correspond to the distance between the line. If the distance is below a certain threshold and there are no drawings between them, the 2 vertices connected by the edge can be considered neighbors. Configuration consistency is computed using the neighbor-length-weighted sum of te orientation angle difference.

A cost function is defined as weighted sum of the 3 features discussed for the likelihood of the line and the configuration consistency. Groups are chosen which would reduce the cost function.
Optimization is done to cost function by redefining it as sum of 'eLR' and 'dxmax'.

Undergrouping errors - typically 'i / t' and high configuration energy errors are caused by temporally adjacent strokes being recognised in different lines. This solved using the finding the neighbors which are approximately parralel.

Iterations: The algorithm starts with the results of the temporal grouping and then uses gradient descent to reduce the cost function and fixes the under/over grouping errors during the iteration. Incremental parsing turns out to be much more efficient than batch mode.

Produces a result of 0.93 and 0.87 respectively for perfect word/draw and crude W/D

Discussion:
The way they solve the starting point problem by assigning groups based on temporal information is interesting. This paper will be effective in areas were text strokes are continuous i.e input is a set of paragraphs and drawings / input containing continuous text strokes. I do not think this would work well in cases of finite state machines where text strokes are very sparse and broken.


Monday, October 20, 2008

Structure in On-line Documents

Authors: Anil K. Jain and Anoop M. Namboodiri, Jayashree Subrahmonia

Comments:

Summary:
This paper discusses an approach to classify text and shape in online documents.
A linear decision boundary can be used to classify text and non text strokes using stroke length and stroke curvature details.

Grouping non text strokes : A minimum spanning tree is used to group the non text strokes. Strokes are nodes and the shortest distance between them are the edge weights. Inconsistent edges are removed from the spanning tree. Maximum inter region distance of 20 and intra region distance of 200 is imposed on the regions.

Tables - Ruled and unruled

Ruled tables - Ruled tables can be found using te hough transform(r,theta). theta = 0/90 there would be peaks. ruled tables can be found using the condition that it would contain atleast 5 lines.

Unruled tables - Projection on one axis give peaks for all the lines and projection on other axis gives peak with valleys( gaps between the columns).

Discussion:
Interesting features to classify tables. I did not understand how the spanning tree and clustering would help in classification. May be some of the reference papers would be helpful in understanding this.

Sketch Recognition for Computer-Aided Design

Author: Christopher Herot

Comments:

Summary:
This paper discusses about enhancing user experience in Sketch recognition system. This is done by understanding user intent from speed, pressure and sequence of the stroke. This paper highlights the cumbersome nature of the explicit action - response systems and proposes modifications which would alleviate this overhead.

First phase of interpretation - The sketch is converted to line and curves. Lines, curves and corners are identified using patterns in speed. Curves are smoothened using Bspline curve fit. The system assumes that strokes drawn slowly are intended more literally than those drawn hastily. The output is then processed for overtracing and thickness. Thickness may be used as a measure of the degree of emphasis intended for the line.

Second phase of interpretation - The next phase is latching the corners of the stroke which do not actually interesect. The latching radius is determined using the speed, line length and density of line around each point.

Further interpretations are done based on the context.
In order to tune the system to particular user sketching style, the system is trained with practise patterns drawn by the user.

This system with the help of above interpretations can decrease the number of interactions with the user compared to the traditional action - response system. Allowing multi level viewing and manipulation of the interpreted data is also important. This allows user to edit raw data as well as the interpreted data. The system adapts to the interpretation errors by changing the parameters as they are corrected by the user.

The paper also emphasises on the inclusion of the user's model of the system in the system model. This would imply tracking the user confidence in the system. This would heklp the system provide the right level of feedback and degree of inference making. This part is very critical since this is related to user behavior.

Discussion:
Understanding user intent using the speed, pressure and sequence seems to be very interesting idea. This seems to be practical but the final part of the paper which proposes a system which includes user's model of the system seems to be complex and theoretical. Parameters to capture user confidence is not clear.

Wednesday, October 15, 2008

Seminar by Edward Lank

Sloppy Selection:

Steering Law : Time to draw a stroke is directly proportional to length of the path and inversely proportional to width of the constrained path.
Speed of the stroke is directly proportional to width of the constrained path.

I do not remember the other law.
Both of them were used to find the intent of the user while selection using Ink.

It assumed that user selection path is more constrained if the user has to accurately select particular region/ stroke.

PUI:
Reducing the number of modes. Usually sketch interfaces plagued my the modes (writing, drawing, erase, edit...) . Method to reduce it by recognizing the user intent. Popping up tool-tip menu to identify the intent of the user and disambiguate the recognition process. e.g: A circle drawn by user can be a circle / a gesture for selection. Disambiguated by popping up "select" at tooltip. If user clicks on 'select', take selection mode or recognize stroke as circle.


Distinguishing Text from Graphics in On-line Handwritten Ink

Author: Christopher M. Bishop, Markus Svens´en, Geoffrey E. Hinton


Comments:


Summary:
This paper applies machine learning techniques to classify ink strokes into text/ shape.

Feature selection : A total of 11 features were selected and Total least squares, an approach similar to PCA was used to reduce the feature set. On the whole 9 features were identified

1. The stroke arc length, i.e., the sum of the lengths of the stroke segments.
2. The total absolute curvature, defined as the sum of the absolute angles between the consecutive segments.
3. The main direction (x- and y-components) of the stroke, as given by the TLS fit.
4. The eigenvalue (length-width) ratio of the TLS fit of the stroke.
5. The total number of fragments found in the stroke.
6. The arc length of the largest fragment of the stroke.
7. The total absolute curvature of the largest fragment.
8. The main direction of the largest fragment.
9. The length of the long side of the bounding rectangle (not axis-aligned) of the largest fragment.

Classification: The paper describes an Multi layer perceptron to classify the strokes. Then extends it using the temporal information to find the correlation between the classes( text/ shape). This is further extended using the gaps between the successive stroke.

MLP is sufficiently constrained and not over fitted. this is done by a 10 fold cross validation. A factor is introduced in the objective function (error) to balance the bias towards text. Bayes theorem is used to predict the classes.
This algorithm is then extended using the temporal information. The class labels of the strokes that are in sequence are co-related. A HMM is built over the transition probability P(tn/tn-1). Once the HMM is constructed, the viterbi algorithm is used to classify the strokes.
3 features identified for the representing gaps between strokes. A MLP was then formed using the features and then integrated with a Bi-partite HMM.

Conclusion: temporal model encodes the fact that strokes of one kind (text or graphics) are more likely to be followed by another stroke of the same kind rather than a stroke of the opposite kind. The approach can also modified to analyze the subsequence of strokes in cases where 2 strokes separated far apart can not help in predicting the next stroke.

Temporal gap can change while editing the document. Spatial context can be integrated with this model fairly easily.

Discussion:
Its exciting to see application of Pattern Recognition in sketch.

As far as i understand, the system calculates probability of a stroke to be text/ shape using the 9 features - uses MLP to decide if a stroke is shape/text. This classification result is further changed by incorporating temporal information and gaps between stroke. This is done by building a finite state machine( Bi-partite HMM) which considers P(tn/tn-1). and the gaps .

The results are interesting. This classifier will be effective for classifying handwritten paragraphs vs figures (e.g class notes). I do not think this will be effective in classifying diagrams and the labels written over them (e.g Constraint Satisfaction Diagrams...) .The temporal information and the gaps between strokes does not help much in these cases.

Tuesday, October 14, 2008

Ink Features for Diagram Recognition

Authors: Rachel Patel, Beryl Plimmer, John Grundy, Ross Ihaka

Comments:
1.
Daniel's blog

Summary:
This paper proposes an algorithm for text vs shape recognition. This paper builts a classification tree based on certain features to classify text from shape. The results are then compared to Microsoft's divider and Ink Kit.

Feature Selection:
Set of 46 shapes were selected and then important features were selected based on analysis of these features on the following data. Set of 9 shapes were recognized and samples were collected from 26 people. The paper uses rpart function to find a classification tree. The aim is to find the most optimal position for a split to be made so that there are a minimal amount of misclassified strokes. If this is done for all features in the feature set, using the observations in the dataset, then the features that most accurately split the data into text and shape stroke groups, with the least amount of misclassified strokes, will be identified as the significant features for division of text and shape strokes.

The final set of features are Time till next stroke, Speed till next stroke, Distance from last stroke.Distance to next stroke, Bounding box width, Perimeter to area, Amount of ink inside, Total angle.

* The size of shape strokes is much larger than text strokes reflected by the use of bounding box width, perimeter to area and amount of ink inside features.
* Curvature is relevant for differentiating joined up letters from shapes
* Inter stroke distance is used to find words. This feature is slow for strokes in a word. Faster speed but a high inter stroke distance suggests next word.

Future work - first step would be to replace classification tree with more robust classifiers with the same features which allows variability.

Discussion:
Samples of cases where the algorithm failed would have been useful to understand more about the algorithm.
I think the sample sketches used give more idea about some symbols which fall under difficult cases for classification. The check box and the musical notes are some difficult symbols to distinguish from the text.

Sunday, October 12, 2008

Renegade Gaming: Practices Surrounding Social Use of the Nintendo DS Handheld Gaming System

Author: Christine Szentgyorgyi, Michael Terry & Edward Lank

Summary:
This paper is a qualitative study investigating the collocated multiplayer gaming practices of Nintendo DS owners. It tries to answer question like who people play with, under what circumstances, and for what reasons in the context multiplayer gaming with DS.

Renegade gaming- 3 rules to play - appropriateness with respect to the location, degree to which the non player would get affected and personal image.

Gaming Goals:
In this study, subjects reported gaming for the following purposes:To pass time,To learn or keep one’s mind “sharp”,To be social and To engage in competitive play.

Social gaming: Engaging in multi-player games seen better than trash talking
Coordinated game plays : this happens less , more likely among group of friends with gud logistics .
Adhoc gaming: Involves including strangers in the game . Awkwardness in approaching. Location plays an influential part in this. Places meant for gaming makes it easy to interact with strangers.

DS multiplayer is considered to be less social, with three main factors contributing to the difference: the lack of a shared display, the reduced potential for spectators, and the closed nature of the gameplay experience
The ability to attach a large screen to DS can help could enhance the social experience surrounding group gameplay.

Discussion:
I dont have much to comment about this paper. Its altogether a good survey. It throws light on how gamers socialize , locations where games are played, intentions behind playing games, effect of screen size on socializing, ....

Saturday, October 11, 2008

MathBrush: A Case Study for Pen-based Interactive Mathematics

Authors: George Labahn, Edward Lank, Mirette Marzouk, Andrea Bunt, Scott MacLean, and David Tausky

Comments:
1.
Daniel's blog

Summary:
This paper discusses about a complete full-featured pen-math problem-solving system called Math Brush. This system uses Computer Algebra system (CAS) to solve the mathematical problems. This system allows Sketch inputs to input math problems.

The system provides a scrollable panel for pen input. It provides a context sensitive pop up menu which display a set of CAS features that can be used. The system allows the user to edit part of expression, manipulate sub expressions based on CAS output. The system also allows the user to plot the math expression in the form 2D/ 3D plots and it can be rotated using the pen.

Equation entry: The input validation panel is the input panel. The strokes made on them by the character recognition system. Some problems related to recognition were due to the extraneous ink let by the user (dots,...) . Editing gestures like scratch and translation were not provided for the user. Some other problems recognized were the user interface containing 2 input panels for correcting character recognition results and the structure of equations and user's inability to percieve the reason behind the recognition errors(correcting stroke vs correcting recognition results) .

Discussion:
The MathBrush seems to be an application which tries to incorporate MATLAB + pen input. The paper is well written in terms of explaining the limitations and failures of the system.
The idea of math education and research using computers is interesting.
I think the Maple system which the author uses to compare is from Stephen M Watt (Author of prototype pruning paper)

Wednesday, October 8, 2008

Sketch-Based Educational Games: "Drawing" Kids Away from Traditional Interfaces

Authors: Brandon Paulson, Brian Eoff, Aaron Wolin, Joshua Johnston, Tracy Hammond

Summary:
This paper discusses about various games where sketch recognition is useful.
APPLES - Animated Planetary Physics Learning and Entertainment Simulation using LADDER, Memory games involving sketches , Tools for sketching on maps, Sentence Diagramming - tool to annotate on sentence.

Discussion:
This paper gives a different applications of Sketch recognition.
Thinking in similar lines, i think a tool can be developed for finding if a person has dyslexia (getting confused between similar shapes). One more idea would be to develop tool to analyze how children learn writing. One way i could think of , is to ask a child to replicate a letter and find the closeness of the input to the letter. This would help us understand how handwriting of a person evolves. I dont know if it would be useful though.

Recognizing Free-form Hand-sketched Constraint Network Diagrams by Combining Geometry and Context

Authors: Tracy Hammond, Barry O’Sullivan

Comments:

Summary:
This paper discusses a sketch recognition system developed for Constraint Network diagrams using LADDER.
Author define geometrical recognition rules for nodes, lines , letters and constraints that would be used in the diagram. Example:

The letter V consists of two connecting lines abiding by the
following geometric rules:
1. The "neg" line is negatively or vertically sloped.
2. The "pos" line is positively or vertically sloped.
3. The endpoint "p1" of "pos" line and the endpoint "p1" of
the "neg" line are connected.
4. The endpoint "p2" of "pos" line is above the endpoint
"p1" of the "pos" line.
5. The endpoint "p2" of "neg" line is above the endpoint
"p1" of the "neg" line.
6. The endpoint "p2" of "neg" line is left of the endpoint
"p2" of the "pos" line.
7. The "pos" line and the "neg" line are of equal length.
Contextual rules are then used to further identify V:
1. The "node" contains the center of the "pos" line.
2. The "node" contains the center of the "neg" line.

The system combines the relative and absolute thresholds to get upper bounds and lower bounds. This solves the problem of loose thresholding in relative thresholds and strict thresholds of the absolute thresholds.

The system also allows editing of the sketched diagrams - delete, scale, drag ...

Discussion:
I see no significance in recognizing the letters inside the node. As they are just symbols to recognize each node , should it be recognized? Is it not enough to recognize just the nodes , lines and the constraints?. Removing letters from the recognition ,can actually reduce the complexity of rules.

LADDER, a sketching language for user interface developers

Authors: Tracy Hammond, Randall Davis

Comments:

Discussion:
This paper describes LADDER - a language to develop sketch recognition system.

This language allows users to specify the shapes to be recognized including the ways they can be edited and displayed.

Description limitations: LADDER can describe only shapes that have fixed graphical grammar. It cannot describe free hand diagrams. The shapes should have primitive components and should not have lot of irregularities.

Shape definition: This includes definition of
* list of components - primitives that make the shape
* geometric constraints - like angles between lines...
* aliases
* editing behavior - Editing operations(trigger) that can be performed on the shape and the corresponding effect (action)
* display methods - 4 ways of display can be set for a shape - original strokes, cleaned up (beautified) stroke, ideal shape and the alternate custom shape.

LADDER also allows to define hierarchical shapes, abstract shapes and shape groups.
It contains predefined components, constraints , editing behavior and predefined display to describe a shape. Vector (keyword) is used to define the minimum and maximum number of components for primitive shapes which can have variable number of components.

Multi domain recognition system: the system uses a bottom - up approach in recognition.
Low Level recognizer - recognizes primitive shape / combination of them from the stroke drawn and passes the result to high level domain shape recognizer. The limitation is when the low level recognizer fails, the domain shape recognizer would also fail.

Domain shape recognizer - this recognizer is based on Jesse rule. It searches for all combination of shapes that can satisfy the rule.A greedy algorithm is used to make the system faster. The down fall is in cases of ambiguity, the system may select wrong shape. The system was slower still and it was fixed by pruning the unrecognized strokes from the recognition tree.

The system also contains methods to find editing triggers and methods to solve constraints in the shape (ideal strokes).

Discussion:
As the author highlights, there is inherent problem in the bottom - up approach, the failure in low - level recognizer can cause failure in domain shape recognizer.
The ability of LADDER to define editing and display methods makes it powerful.


Ambiguous Intentions: a paper-like interface for creative design

Authors : Mark D Gross, Ellen Yi-Luen Do

Comments:
1.
Daniel's blog
Summary:
This paper discusses about sketch recognition application in Designing.

Interfaces for conceptual and creative design should recognize and interpret drawings. They should also capture users’ intended ambiguity, vagueness, and imprecision and convey these qualities visually and through interactive behavior.
Electronic Cocktail Napkin - freehand drawing environment implemented using Lisp using Wacom tablets, Apple Newton PDAs , mouse / trackball .

Configuration recognizer - recognizes set of shapes based on components and context. User defined recognizer is used. Recognizer is trained using the user drawn examples. The recognition features used depends on the context. For example, spatial relation may not be used for Circuit diagrams, instead connection are used to identify patterns. In case of ambiguity, the system retains the shape and alternatives till it resolves the ambiguity.

Imprecision - After recognizing the shapes, the system maintains connectivity, adjacency and alignment constraints. This allows user to stretch/ move the shapes. Domain specific context is used to identify the constraints to be applied while such edits.
User can gradually tighten the constraints to make a final design.

Implementation - Recognizer for strokes , Recognizer for Configuration and maintenance of constraints. As all the recognizers depend on the context information, the system maintains a current context which varies with the recognition results of the current stroke drawn.

Discussion:
This paper has taken good leverage of incremental design.
The idea of applying context to identify configuration is very interesting. This system identifies the current context based on the strokes drawn previously. there can be recognition problems when system ends up in conflicting context information.
The notion of measuring commitment and certainty of the designer from the precision, roughness, speed and overtracing is very interesting.

Wednesday, October 1, 2008

What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition

Authors : Brandon Paulson, Pankaj Rajan, Pedro Davalos, Ricardo Gutierrez-Osuna, Tracy Hammond

Comments:
1. Andrew's blog
Summary:
This paper constructs a sketch recognition system with 44 features - 31 from paleo-sketch and 13 from rubine and uses a quadratic classifier to recognize sketches. The results are then compared to Paleo sketch. The samples used to train and test the system where the same as that used for paleosketch.
Of the 44 features , the system identifies the following features as important through feature subset selection:

* End point to stroke length ratio -
* NDDE & DCR
* Total rotation - above 4 features were used more than 90% to classify

* Curve least square error - to identify curves

* Circle fit : Major axis: minor axis - used to classify circle/ ellipse

* Spiral fit - Average radius to bounding box radius ratio, Center closeness error

* Polyline fit - # of substrokes, number of strokes passing line test, feature area error

* Complex fit- # of sub-fits, # of non polyline primitives, percent of subfits that are lines, complex score/ rank

Results show this system produces 97% accuracy on this optimal subset which is as good as the paleo sketch system(98.56%)

Discussion:
This is a very interesting set. Each feature is significant to identify a particular type of primitive.
I do not understand how the system classifies the arc. I think it considers arcs as subset of curves.
Its interesting to see the use of Polyline fit: feature area error, i think this helps the system to distinguish between ellipse/circle and polylines.

Calculating percent of substroke that are lines for Polyline and complex fit seems to be rendundant. But both of them are marked important by this system.

For curves , least squares error is significant and for poly lines, its feature area which is important. its quite different from what we have discussed in the class. I thought least squares would be more significant for lines and feature area for curves.

Backpropagation Applied to Handwritten Zip Code Recognition

Authors : Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel

Comments:
1.
Yuxiang's blog

Summary:
This paper discusses about an algorithm to recognize Zip code numbers using neural networks. It uses a 3 layer neural network. H1 - 12 feature maps with 8x8 units, each unit takes a 5x5 field from the 16x16 image as input. H uses 12 feature maps with 4x4 units built on H1. H3 has 30 units fully connected to H2 and output layer has 10 units ( corresponding to 0-9) fully connected to H3.

Discussion:
I did not understand anything of this paper.

Friday, September 26, 2008

Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches

Author: Michael Oltmans(thesis)

Comments:
1.Andrew's Blog

Summary:
This is a thesis work on freehand shape recognition. In this thesis, the author has discussed a method to recognize shapes based on the recognition results of the parts constituting the shapes.
Bulls-eye features are used in this method. This is calculated using a circle around interest point and cut into wedges. A vector containing the points which fall in particular wedges is taken as the feature. The circles around the interest point are separated equally in log space.The radius of the circles are chosen in such a way that it covers most of the shape. The inner bins capture more data while the outer bins capture coarse data.
The trajectory of the stroke is used to achieve rotational invariance. Shapes that drawn in the reverse order cannot be recognized. To solve this, bullseye feature for original and the reverse order of the shape is found. There may be errors in finding the bullseye histogram when the strokes are on the boundary. Some points may fall on either side of the boundary. To solve this, the bullseye axis oriented with interest point and rotating extra half a bin turn.
Direction at a point is found by tangent to window of 19 points. The preprocessing steps include resampling stroke for constant spatial frequency(> 1pixel). The author proposes way to calculate distance between the bullseye bins. The bins are also normalized based on the size of the bin. Outer bins hold lesser weights than inner bins. Finally, the result is normalized to 1 to remove the avoid the effect of the differences in total number of points in a region.
This algorithm first performs a clustering based on the bullseye for the training data. Then cluster centers added to a codebook. A match vector is then formed from the codebook which would contain data just enough for classifying shapes.
The algorithm uses support vector machines to differentiate the classes. One Vs One strategy is used to combine binary classfiers. So for n shapes n*(n-1)/2 binary classifiers are used.
For finding shapes in a complete sketch, the isolated shape recognition is run over candidate region. This assigns class labels to the candidate region. Isolated shape reconizers are trained for Wire recognition and also for variation in shapes in the form of change in alignment, inteference from the wire region,... . To give out a final set of predictions, the algorithm needs to first list the set of initial candidates and split the cluster of candidates if they are large.
Results show a 92.3% accuracy in recognizing circuit shapes and 94% recognition on HHreco dataset. The algorithm fails in recognizing symbols that are similar ( capacitor, battery, ground), (Bjt, diode), ... .

Discussion:
I have read 2 completely different approaches in sketch recognition today. One is the constellation method and the other is the bullseye window proposed by this author. These approaches are very different from the traditional approaches.
The results are quite good. The algorithm fails quite rarely in classifying similar shapes. It had worked well in classifying bjt and jfet. I am interested to know if this algorithm could differentiate resistor(zigzag line) and inductor(helix).
I think this algorithm will be very slow, given the number of steps in each section of the algorithm.

Constellation Models for Sketch Recognition

Authors: D. Sharon and M. van de Panne

Comments:
Summary:
This paper uses constellation model to recognize shapes. An object is defined with 2 features, individual and pair wise. Individual features give the shape and global location of the object. The pairwise features give the relative position of parts. Individual feature is a vector of X,Y( center of the line aligned to the bounding box), d(normalized length of the diagonal of the bounding box) and Cosine of the angle between the diagonal the X axis. Pairwise feature vector include difference in X, difference in Y, the minimum distance between the endpoints of stroke a and any point on stroke b, and the minimum distance between the endpoints of stroke b and any point on stroke a.
The sketch recognition process has two phases, the first which searches the space of possible mandatory label assignments, and the second which searches for optional labels for the remaining unlabelled strokes. In this way the mandatory labels provide contextual location information necessary for assigning appropriate labels to the potentially large number of optional parts.Mean and variance for the Individual and pair wise features are calculated separately and likelihood for labeling(L) is calculated by mutiplying the likelihood of individual features being recognized as L and likelihood of pair-wise features being labelled L.
A maximum likelihood search procedure is used to label the strokes. There are exponential number of possibilities of such selection. There are some strokes which might not be labelled. In order to reduce the computation cost because of the above mentioned factors, the search is done over 2 phases - labeling strokes which are mandatory and then linear search through the optional labels for the recognition of the remaining unlabeled strokes.The search is carried out using branch and bound tree. Multi pass thresholding is used to increase the speed of searching for large number of strokes.
Recognition can go wrong in several ways - inability to find suitable mandatory strokes because of the hard constraints, mislabeling of a mandatory stroke and mislabeling of optional strokes.
Errors are rare and imply a lack of training data, can occur if unusual strokes occur that affect
the overall bounding box and fewer mandatory strokes

Discussion:
The highlight of the paper is the constellation model. Using pairwise features to understand the spacial orientation is interesting. The number of training samples that this algorithm needs is quite large(20-60). The algorithm depends more on the bounding box and on the madatory strokes for the recognition. This is contrary to the flexibility it claims to give the user.

Thursday, September 25, 2008

A Domain-Independent System for Sketch Recognition

Authors: Bo Yu, Shijie Cai

Comments:

Summary:
This paper has an interesting way to find the corners. Its actually based on primitive shape recognition. This algorithm has two steps- Stroke Approximation and Post processing.In Stroke Approximation, The algorithm tries to fit a stroke into some primitive. If it fails, it splits the stroke into two substrokes at point of highest curvature then feeds back to the algorithm. This paper proposes different recognition criteria for primitives (lines,arcs, curves, circles, ellipse, helix and spiral) which is similar to that discussed in PaleoSketch. The second step, post processing , involves merging corners and cleaning up false positives. This is very similar to mergeCF . Removing short lines, merging overlapping curves/arcs .

Discussion:
This paper forsees creation of a system which allows to configure it for any shape. LADDER framework achieves this. Its an interesting approach to find corners based on the primitives. Accuracy of 98% for polylines and 94% for arcs are impressive.

GLADDER: Combining Gesture and Geometric Sketch Recognition

Authors: Paul Corey and Tracy Hammond

Summary:
This paper proposes a method to integrate a feature based classifier Rubine and a geometric recognition system based on LADDER. Rubine classifier is used to calculate minimum mahalanobis distance of the sample from the classes. Then the sample is rejected for the rubine classification based on a threshold. This threshold value is set to 35 which is claimed to be optimum threshold.
The sample which falls above the threshold are classified using geometric recognition system.

Discussion:
Its kind of clever way to improve accuracy by using geometric recognition system to classify samples that may be rejected by Rubine. I expected a significant increase in the accuracy of classification. As the paper states, the shapes which fall in the common area of both these systems may be the reason for errors(reduced accuracy).

Monday, September 22, 2008

A curvature estimation for pen input segmentation in sketch-based modeling

Authors: Dae Hyun Kim , Myoung-Jun Kim

Comments:
1. Daniel's blog

Summary:

This paper proposes an algorithm to find the corners based on the curvature and direction measures. Preprocessing steps include resampling strokes to make the inter point distance constant. This implies the curvature can be estimated only based on the direction values of consecutive points by removing the constant distance term.
Local Convexity: Curvature at a point is defined as the sum of curvature values over a window of k points before and after the current point (support window). This sum is greater if the point is a corner. This fails when there are corners which are close together.

Local Monotocity: When the Local convexity calculation fails, monotonocity is used to find the corners. This is used where corners are close together. The algorithm finds if the points around a corner is monotonously increasing or decreasing in curvature and finds the minima in the support window .This minima is then stated as the corner.

Discussion:
This method works well for polylines and smoother curves. Like any other corner finding algorithm, It fails for small curves and curves with irregularly changind curvature.

Eliminating False Positives During Corner Finding by Merging Similar Segments

Author: Aaron Wolin, Brandon Paulson and Tracy Hammond

Summary:
Chord length between 2 points - Sum of the euclidean distance between all the consecutive points between them.
Speed - chord length/ time difference.
In the first run , corners found based on the curvature and speed thresholds. If the corners are close together, one with lower curvature is removed.
The algorithm tries to remove false positives by merging 2 segments. It tries to check if the merged segment best fit into some primitive shape ( line/arc).
Discussion:
Simple and a powerful algorithm. I would like to see if it works well with helix and resistor type strokes.

PaleoSketch: Accurate Primitive Sketch Recognition and Beautification

Author: Brandon Paulson and Tracy Hammond
Comments:
1. Yuxiang's blog
Summary:
This paper is about primitive shape recognition. The shapes recognised by the system include line,polyline,arcs,curve, circle,ellipse,helix and spiral. First step in the recognition is to calculate some pre-recognition values for a stroke and pass it on to low level shape recognizers.Each of the recognizers return a boolean flag for recognition and a beautified shape object. Then the recognition results are fed into an hierarchy function which sorts them in order of best fit.
Pre recognition -
- Removing duplicate points (Points with same X,Y coordinates / time).
- Direction graph,speed graph, curvature graph and corners are computed.
- Normalized distance between direction extremes (NDDE)- this is calculated by dividing the stroke length between points with highest and lowest direction values by total stroke length.Curves will have high NDDE and polylines have Lowers NDDE.
- Direction change Ratio(DCR) - to find spikes in direction graphs. its is the maximum change in direction by average change in direction. First and last 20% of the stroke is analysed for higher curvature and the stroke till that point is considered tail. this is not performed on the strokes with short length.
- Test for overtracing - count the number of revolutions (2*pi) that the stroke has made. If its more than one revolution and above certain threshold, mark it as overtraced.
- Closed figure test - Ratio of distance between first and last point by the stroke length give measure of this feature.

Line test - Find the orthogonal distance between the best fit line and the real stroke. if the error(orthogonal distance/stroke length) is within a threshold, then mark the stroke as line.Verify overtracing and number of corners in the stroke.

PolyLine Test - Segment stroke into substrokes based on the corners. Berify line test on each substroke and each of them must have high DCR;

Ellipse test -Farthest points make the major axis and the the perpendicular bisector of the major axis is the minor axis. It should pass the closed shape test, NDDE must be high and feature area / area of ideal ellipse must be within a threshold.

Circle test:
Ellipse test+ ratio of major /minor axis should be close to 1.

Arc Test - Perpendicular bisectors of the chords on the arc can be used to determine the center of the arc. average distance of all the points from the center is take as ideal radius. Feature area error is taken into consideration. NDDE high and DCR low.

Curve test - fitting a bezier curve to the points. finding least square error with the bezier parametric values. Curve should have low DCR value.

Spiral test - Must pass Overtrace test, NDDE should be high,substroke should be a circle with ascending / descnding radius, Centers of the substrokes must be close to each other, distance between the farthest of the centers should not be greater than the diameter of spiral. distance between first and last point / the stroke length should be less than a threshold.

helix test -Spirals with moving centers. test on number of revolutions.

Complex test - more than one possible output.The results of the complex shapes are ranked.

Discussion -
Seems to be very powerful solution for identifying primitives. I would like to know the how computationally intensive this algorithm is. I liked the way this algorithm finds center for arcs. I did not understand the recognition part for curves since i do not know much about the bezier curves. Calculating rank for complex shape based on the primitive shapes they are composed of is interesting.


Thursday, September 18, 2008

Sketch Based Interfaces: Early Processing for Sketch

Authors: Tevfik Metin Sezgin, Thomas Stahovich, Randall Davis

Summary:
- Corner detection with direction change, curvature and speed of the stroke.
- Average Based filtering - threshold for curvature/speed is based on the averages.
- Speed data used to remove the errors generated by the curvature data. Speed data is low at corners.Problems with speed data -maximum speed for short lines may be below threshold and can be recognised as corners,.
- Generate hybrid fits based on the speed and curvature results based on vertex certainties, then selecting the best fit.Certainty measure for curvature is magnitude of curvature around the corner and certainty for speed is the measure of pen slowdown at the point. 1-vi/vmax.
- List of hybrid fits are sorted based on the certainty measure. Intersection of the curvature and speed candidate list is first chosen. Interatively, corners ,with best certainty measure on either of the lists, are added to the list to create 2 more lists H' and H''.Best of the 3 lists are chosen from the Mean square distance of the list from the stroke. The one with the lowest distance is chosen for next interation and this continues till the candidate corner lists are fully checked.

Discussion:
Does not work well for the strokes with many sharp curves like resistors in circuit diagrams.Detects corners on curves.

Monday, September 15, 2008

Algorithms for the reduction of the number of points required to represent a digitized line or its caricature

Authors: David H Douglas, Thomas K Peucker

Comments:
1.

Summary:
This paper suggests resampling of sketch data.
* Resampling based on removing duplicate points - remove points which lie with in threshold distance. might remove points which are important.
* Resampling based on selecting point - E.g. Selecting every 6th point in a line. Might remove important points
*Resampling by calculating perpendicular distances -
- take 2 points - anchor and floating point. find the perpendicular distance between all the points lying between them and the line formed by them.
- find the maximum of the perpendicular distances. If it lies within a threshold, the anchor and floating point can be deemed to be a line.
- this anchor and floating points window is moved across the stroke.

Discussion: 
Effective way of resampling strokes without losing sensitive points. Success of the algorithm greatly depends on the threshold chosen to limit the perpendicular distances. 
This algorithm gives a method to perform a straight line test over a set of points. A similar one is used in the Wolin's algorithm too.

ShortStraw: A Simple and Effective Corner Finder for Polylines

Authors: Aaron Wolin, Tracy Hammond

Comment:
1.
Daniel's blog
Summary:
Resampling - Interspacing distance = bounding box diagonal by a constant(40).
Corner Finding -
* for each point 'p' in the stroke, find the distance between (p-w,p+w). This distance is short when the point 'p' is the corner.
* find the median of all the distances. threshold for corner is fixed at t= median * 0.95.
* if straw(k) if below this threshold, then its a corner.
* Run a line test on 2 consecutive corners (0.95 threshold). if the corners does not pass the line test, then there are additional corners between them. So the threshold t is relaxed, to include the corners. Run this step until the segment between the corners passes the line test.
* Run a collinearity test on consecutive 3 corners.

Discussion:
a clever way to find corners in a stroke. I dont know how effective will relaxing the threshold for outlying corners will be. It might raise false positives.

Tuesday, September 9, 2008

User Sketches: A Quick, Inexpensive, and Effective way to Elicit More Reflective User Feedback

Author: Maryam Tohidi, William Buxton, Ronald Baecker, Abigail Sellen
Comments:
1. Nabeel's blog
Summary:
This paper compares various traditional methods of usability testing with one that involves sketching. Usability test was done with 48 participants on house climate control system.
3 prototypes were used. Participants were divided into groups of 12. First 3 groups were given one prototype each and the last one was given all the 3 prototypes.
Verbal feedback suggested first 2 prototypes were better than third one. Result was consistent with the 4th group.The hypothesis that the 4th group, that had all the prototypes, would give more feedback was proven wrong. Many of the participants had comments and not suggestions. They could not express their suggestion.
Feedback through Sketch - The participants came up with more ideas. The first 2 prototypes were mostly replicated exactly with minor modification. The 4th group came up with ideas that had features pulled out of all the 3 prototypes.

Discussion:
Excited to see a new application for Sketch.
These sketch + a domain specific sketch recognition system would be more valuable.

Prototype Pruning by Feature Extraction for Handwritten Mathematical Symbol Recognition

Author: Stephen M Watt, Xiaofang Xie

Summary:
Recognizer for large symbol sets - pruning prototypes to restrict search into small group of the sample. Preclassifying of the stroke based on certain features in done in this approach.
Preprocessing - chopping head and tail of the strokes(removing hooks), resampling , (average/gaussian) smoothing and size normalization.

Geometric features- number of loops, intersection and cusps.
* Loops - found using Minimum distance pair. Thresholds are fixed for time separation, distance between the pair, area of the loop.
* Intersections-modified Bently ottman sweepline algorithm - no detail
* Cusps - sharp turning point. Cusps are detected in this paper by finding 3 points which makes small angle. This algorithm also checks if the previous 2 points of the cusp lie on a straight line and also for next 2 points.

Ink Related Features: Number of Strokes, Point Density( dividing bounding box into 3 regions upper, middle and lower).

Directional features - Initial direction,End Direction and Initial End Direction.

Global Features - Initial and end position, Weight to Height Ratio

Discussion:
The author's thesis suggested some interesting stroke preprocessing - removing hooks(head/tail) from the start and end of the strokes based on sharp edges and Gaussian smoothing.
The calculation of point density is interesting. The bounding box is divided into 3 regions upper(40%),middle(20%) and bottom(40%).
I like the concept of finding out cusp and loops.In her thesis, she has discussed scenarios to where cusps can become loops while drawing and vice versa.
I am not able to understand how the author calculates the number of loops.

Sunday, September 7, 2008

GRAPHICAL INPUT THROUGH MACHINE RECOGNITION OF SKETCHES

Author: Christopher F. Herot

Summary:
HUNCH - multi level sketch interpreting system. Sketch interpreted in one level acts as the input for the next level and so on. Constant sample rate input device is used for sketching. Based on STRAIT which assumed that speed of stroke can be linked to intent with which it was drawn. Fast strokes were less intended than the slow ones. Speed of the stroke was less around the corners.Model worked better for some people. STRAIN , modification of the old system uses speed and bentness to find the corners.

Latching - STRAIT used latching to join endpoints which was lying within a radius. This simple technique produced lot of errors.Features that can be considered to determine latching - change in size/ angle, closure, number of lines, speed and line length, user profiles ( training to specific user).

Overtracing - Reducing the amount of data given by the user. line.The amount and style of overtracing may serve as a measure of a particular user's attitude toward the design, with heavy overtracing indicating emphasis or reinforcement of selected areas.

Other levels of inferences - Program uses simple rules of projective geometry to map two-dimensional networks of lines into three-dimensional data structures. this has also helped in finding the interdependencies of latching and finding the third dimension.
Program maps the sketches to 2 dimensional grids (similar to bitmap). lines passing through point is '1' and others have value '0'. Used in room finding algorithm - works for simple sketches.

Recognising sketch is done based on context. HUNCH system generates a context free data structure which follows a top down approach with general nodes on the top and specific details in the bottom. Matching is done in a top down fashion. First, match is found for the top most node then moved down.

Discussion:
This system doesnot use the domain based information instead allows users to manipulate the results and trains based on them. Some interesting mappings are given in this paper - speed of stroke to intent of user , speed/bentness - corner, overtracing - attitude of the user . It has also thrown light on the problems of latching and overtracing which i never knew before reading this paper.

Wednesday, September 3, 2008

Gestures without libraries, toolkit or training

Authors: Jacob.o.Wobbrock, Andrew D Wilson, Yang Li

Summary:
This paper proposes a simple four step algorithm to recognize gestures which is computationally less expensive compared to Rubine, DTW - Flexible and requires less training.
1. Resample the point path - Slow gestures give more sample points and fast gestures give less sample points. So resampling of gestures to N=64 (adequate , 32-N-256) is done.
2. Rotate Gesture based on indicative angle (angle between centroid of gesture and first point) - Find the indicative angle and rotate the gesture such that this angle is Zero.This paper proposes using Golden Section search using Golden ratio angle and bound within range of +45 - -45 degrees. It is better than brute force and the hill climbing algorithm.
3.Scale and translate : Gesture is scaled to a reference square and the translated to a reference point such that centroid lies at (0,0).
4. Find the euclidean distance between the templates and and candidate. The one that gives minimum distance is the match.

Limitations:
Cannot recognize gestures which differs on orientation, aspect ratios/ location.For the recognizer to be flexible to gesture variations, authors suggest to add the unrecognized gesture to the same class as the user intends it to be recognized.

Results show this algorithm performs identical to DTW in terms of recognizing accuracy and better than rubine. Recognition improved with number of templates (significantly in rubine).

Discussion:
this paper doesnot discuss about the rational applied behind the size of the reference square. Is it the bounding box of template/ is it some arbitrary square (larger than every sample)?. Reference square depending on the size of the template can also increase the error rate. (in case template is small).
Algorithm doesnot take care of direction of stroke.
Lots of limitations - limited support variations, inability to recognize gestures based on orientation differencs ,etc.
Algorithm will be useful for applications using fairly simple and less gestures.



Sunday, August 31, 2008

MARQS: RETRIEVING SKETCHES USING DOMAIN- AND STYLE-INDEPENDENT FEATURES LEARNED FROM A SINGLE EXAMPLE USING A DUAL-CLASSIFIER

Authors: Brandon Paulson, Tracy Hammond

Comments:
1. Daniel's blog
2. Ben's Blog

Summary:
MARQS - Media Albums Retrieved by a Query Sketch - an application to which can store music/ pictures and retrieve them with the help of sketch based queries.Applications uses 2 recognizers - simple classifier and a linear classifier (similar to Rubine). The classifiers use set of global features that describe the sketch rather than single stroke. This helps the classifier to be extended to multi stroke sketches.
Sketches are first rotated about an major axis ( line between farthest points on sketch) such that the axis lies horizontal.Features - Bounding box aspect ratio,Pixel Density, Average Curvature and The number of perceived corners across all strokes of the sketch. Simple classifier is used for sketch with one example. It calculates the absolute differences between the feature of sketch and template (Errors), then normalized by dividing them by the value of feature in query sketch.The errors are then summed up and the result is published based on the lowest errors.
Linear classifier (similar to
rubine) is applied to sketches with more than 2 samples.

Discussion:
This algorithm is an interesting approach. it is not affected by the direction of stroke/ scale/ change in local orientation. (None of the features depend on these factors). I do not think pixel density is an important feature- I would have to look into how this feature affects sketch recognition. Anyways it would be taken care by the weights calculated by the linear classifier.
I would like to know if there are features which can fit in this algorithm.

Specifying Gestures by example

Author: Dean Rubine
Comment:
1. Daniel's blog
Summary:
This paper describes GRANDMA, a toolkit for adding gestures to click and drag interfaces and about the single stroke gesture recognizer used by the toolkit. The single stroke gesture recognition is chosen in order to avoid segmentation problems of multi stroke gesture based systems and to use eager recognition. Single stroke gestures also contribute to better usability of the UI.

Designing Gestures with GRANDMA
GRANDMA is a Model/View/Controller(MVC) framework. Gesture Designer must first determine the view classes and the associated gestures. A controller is associated with a view class and all its instances and instances of all the subclass. To add a gesture to the system, a gesture handler is created. Then a new gesture class is created with 15 training examples per class.
Semantics of each gesture has 3 actions: recog, manip and done. Recog is evaluated when gesture is recognized(when the mouse stops moving -0.2 seconds time out), Manip is evaluated on subsequent mouse points and Done is evaluated when the mouse button is released. Gesture drawn over multiple gesture handling views are recognized with the priority given to the topmost view.
Sketch recognition is done based on the statistical analysis on the vector of features(extracted from input gesture). Features are chosen based on constant computation time, ability work efficiently on large/small gestures and ability to differentiate between gestures. Author suggests not to use rejection when 'undo' can be used. Eager Recognition allows the system to recognise the gesture even before its completed without any explicit indication from the user.Multi Finger gesture recognition can be done by applying single recognition on each stroke and applying global features to discriminate between different multi path gestures.

Discussion:
First, thing i donot understand is the purpose of views. The next thing, I don't understand the significance of associating views with gestures.

Note: Rubine's features does not take care of scaling/ direction of stroke. So a large gesture may not be recognized if the training data were small in size.

Saturday, August 30, 2008

Visual Similarity of Pen Gestures

Comments:
1. Nabeel's Blog
Summary:

Gestures, a powerful feature in Pen based UI, are commands issued with a pen. Factors that affect selection of gestures are similarity among gestures, difficulty in learning gestures and wrong recognition by the computer.A pair of experiments is conducted to measure the similarity among gestures and derive an algorithm for computing the similarity.To analyze the date collected, Multi-Dimensional scaling technique is used. This technique reduces the number of dimensions in the data set so the patterns can be seen by viewing a plot of the data (2/3 dimensions). The gesture set chosen in the first experiment were based on 2 criteria: span a wide range of gestures and to include variation in orientation. Data was collected from 21 people. Analysis of the plots of gestures generated by MDS helped author to determine the geometric properties which influenced the users perceived similarity. Regression analysis helped the author develop a model of gesture similarity which can predict how people perceive 2 gestures. Similarity of the gestures were given by the euclidean distance between their features. Smaller distance means greater similarity. Regression analysis produced weights for different features in determining the similarity.The authors were able to derive a model which predicts gesture similarities with 0.74 correlation with reported data.Second experiment had 3 gesture sets with each set used to check the effect of variation of particular features
on similarity perception.Though this goal could not be completely achieved since some features co varied with others.The derived model predicted reported gesture similarity with 0.71 correlation. The analysis of first gesture set (effects of absolute angle and aspect) gave 3 dimension with MDS plot. It showed that absolute angles in the set co varied greatly that it was difficult to determine if it was significant.Analysis of second gesture set shows that neither length nor area and significant in similarity judgement.Analysis of third gesture set showed that gestures whose lines were horizontal and vertical were perceived similar.

Discussion:
In both the experiments, the users were unaware of the operations a gesture is mapped to. It would interesting to find if there is a link between the context in which the gesture is used( application with which an user is interacting), operations performed by the gestures and the way user perceives a gesture.

Sketchpad by Ivan Sutherland

Comments:
1. Nabeel's blog
Summary:
This paper is a description of the Sketchpad system. This system allows the user to draw a shapes with the help of a pen and combination of keys. Author presents an example of drawing an hexagon to explain the concept of subpicture, constraints and copying.Changing the base hexagon changes the appearance of all the shapes that used the copies of the hexagon. A circular list of pointers is used to store the information on subpicture/shape. Garbage collection is performed over the list to remove free blocks. An hierarchy is maintained while storing new shapes. All shapes are grouped in a ring structure under generic. The blocks are further divided into super generic and generic generic blocks. For example: shape related to lines goes under Line with specific features. All the shapes in sketchpad are generated from lines, arcs and circles. These lines and circles are generated using difference equations.Features like Recursive merging, recursive deleting and expanding instances increases scope of sketchpad applications. Sketchpad allows users to add constraints to the system. Each constraint is added as a ring under generic block. Constraint satisfaction is done by a one pass method ,which tries to reduce the errors to zero by adjusting the constraint variables. When it fails, a slow and reliable relaxation method is used.

Discussion:
It would interesting to know the criteria used by the garbage collection algorithm in the system.
This system is similar to the FLUID framework in using the Geometric constraints.While this system uses the geometric constraints to create new shapes and use them to create complex pictures, the FLUID system uses these constraints to identify the ink drawn by user.

About me

Email Address: manoj.prasad@neo.tamu.edu

Graduate Standing: 1st year Masters

Why am i taking Sketch Recognition class? I am interested in this field.This is the field which has kept me interested for more than 1 year.

What experience do you bring to this class? During my internship in HP labs, i had built an IM which can connect with gmail account and send Ink across clients. Other applications i had built were xml parsing applications (INKML parsing) and converters (ISF <--> INKML converters).

What do you expect to be doing in 10 years? I dont know.

What do you think will be the next biggest technological advancement in computer science?
quantum computers generating real random numbers.

What was your favorite course in undergrad (CS or otherwise)?
Computer Architecture

If you could be another animal, what would it be and why?
Ants because they are workaholics. I like to become one but am not able to.

What is your favorite motto or slogan?

What is your favorite movie? Rope

Name some interesting fact about yourself.
I am lazy.
I like to know about things which i dont know.

Thursday, August 28, 2008

Introduction to Sketch Recognition by Tracy Hammond and Kenneth Mock

Comment:
1. Ben's Blog
Summary:


This paper is an overview of almost all the pen based technology available today. Sketchpad the first pen based system was created by Ivan sunderland in 1963. This system which utilized a pen and set of keys as mode of input to create simple graphics. From this primitive form , the pen based evolved over 40 years to tablet PCs, convertibles, whiteboards, USB connected Pen tablets, ... which are cheaper and affordable.These pen based systems use Digitizers to capture Pen based activities. These digitizers are of 2 type - Active and Passive. Passive digitizers detect touch, so even finger would be recognised as pen. The drawback of these digitizers are that they recognise the touch from palm and other things which are in contact with it. This causes vectoring. These digitizers usually have low accuracy and resolution. Active digitizers on the otherhand, recognise electromagnetic waves to detect pen activities(only special pens) and it is more precise.

This paper also discusses about the various applications of Pen based systems. Pen based systems have found themselves useful in teaching. Application like One Note, Annotations on Powerpoint, ... can be used to make presentation more dynamic and user defined. With the pen based systems, the field of sketch recognition has also grown into an important field. This field has application in diverse areas. ChemPad , an application which can recognise sketched diagran into a 3D view of the molecular structure. Hand Sketch, an application which recognises hand drawn symbols and plays music accordingly. This paper also discusses about a Framework(FLUID), which can used to create sketch interfaces. This framework uses ,LADDER and GUILD. LADDER, a language for defining objects that has to be identified by the application.
GUILD, a recognition system which recognises shapes based on the LADDER definition. The output of this system can be fed to Back end system , which would do domain specific jobs.

Discussion:
- Pen based systems have grown upon application where mouse cannot be used. Writing, sketching and gestures where using mouse would be difficult.So there raises the question - Can pen replace mouse? I think this would be difficult because we are used to mouse for so long that any change would cause discomfort to the users. This would require change in UI to either accomodate mouse activities( the way it is done now) or create a new UI which would not need mouse.