Wednesday, September 3, 2008

Gestures without libraries, toolkit or training

Authors: Jacob.o.Wobbrock, Andrew D Wilson, Yang Li

Summary:
This paper proposes a simple four step algorithm to recognize gestures which is computationally less expensive compared to Rubine, DTW - Flexible and requires less training.
1. Resample the point path - Slow gestures give more sample points and fast gestures give less sample points. So resampling of gestures to N=64 (adequate , 32-N-256) is done.
2. Rotate Gesture based on indicative angle (angle between centroid of gesture and first point) - Find the indicative angle and rotate the gesture such that this angle is Zero.This paper proposes using Golden Section search using Golden ratio angle and bound within range of +45 - -45 degrees. It is better than brute force and the hill climbing algorithm.
3.Scale and translate : Gesture is scaled to a reference square and the translated to a reference point such that centroid lies at (0,0).
4. Find the euclidean distance between the templates and and candidate. The one that gives minimum distance is the match.

Limitations:
Cannot recognize gestures which differs on orientation, aspect ratios/ location.For the recognizer to be flexible to gesture variations, authors suggest to add the unrecognized gesture to the same class as the user intends it to be recognized.

Results show this algorithm performs identical to DTW in terms of recognizing accuracy and better than rubine. Recognition improved with number of templates (significantly in rubine).

Discussion:
this paper doesnot discuss about the rational applied behind the size of the reference square. Is it the bounding box of template/ is it some arbitrary square (larger than every sample)?. Reference square depending on the size of the template can also increase the error rate. (in case template is small).
Algorithm doesnot take care of direction of stroke.
Lots of limitations - limited support variations, inability to recognize gestures based on orientation differencs ,etc.
Algorithm will be useful for applications using fairly simple and less gestures.



3 comments:

Nabeel said...

Yes I also think that this algorithm is not useful for any real world applications.

You are right that there is no rationale behind the size of the reference square. I think the author here assumes that the templates and candidates are of reasonable sizes and their features a clear.

Yuxiang said...

What they do is to scale the sketches so that all of them fit in a bounding box of exactly the same size...yes, this can cause error, for instance, a roughly straight line has a very narrow bounding box, when it's forced to be widened to become a square, the line may turn into a zigzag.

andrew said...

I wouldn't go as far to say that this algorithm has no real world applications. They implemented it in javascript, which means a web browser can use it. That seems valuable to me. It's simplicity is its advantage. You wouldn't use it in a advanced systems with lots of gestures, but for a light-weight or mobile systems it sounds useful.