In our previous post, we mentioned that we published the Kinder-Gator dataset, which captures 10 children and 10 adults performing motions in front of the Kinect. Currently, we are exploring recognition of the whole-body motions in the dataset. Because our focus is on whole-body motions, we concentrate on motions that involve the movement of one or more limbs. We therefore use only a subset of Kinder-Gator, since the dataset also includes motions that involve just hand movement or static body poses. To test the performance of their $1 unistroke gesture recognition algorithm, which was designed to help incorporate stroke gestures into games and UI prototypes, Wobbrock et al. [1] defined a representative set of 16 unistroke gestures (e.g., a triangle, an X) that are useful for these applications. Similarly, we want to define a representative set of motions that covers the unique combinations of upper and lower limb movements in our dataset. We will use this representative set to evaluate the performance of the recognition algorithms we are currently exploring for whole-body motions. In this blog post, we discuss the steps we are taking to define the representative subset of motions from Kinder-Gator:
- EXCLUSION: Kinder-Gator includes motions that involve drawing shapes and symbols in the air, and motions that involve forming shapes and symbols with the body. We are excluding both kinds. Drawing motions usually involve movement of only the hand or wrist while the rest of the body remains static, so they are not good representatives of motions involving the whole body. Furthermore, both kinds of motions are intended for recognizing the shape or symbol being performed, rather than the motion in its entirety. Examples of motions from the Kinder-Gator dataset that fall within this category include “Draw the letter A in the air” and “Make the letter X with your body”.
- CHARACTERIZATION: As mentioned earlier, we want each motion in our representative subset to be unique in terms of upper and lower limb movement. Hence, in this step, we are characterizing motions by the dimensions along which the upper and lower limbs move, and excluding motions that are too similar in their dimensions of movement, so that no two motions in the subset collide. First, we are excluding motions that are mirrors of other motions, since the motion being performed is the same, just with the opposite limb. For example, the motion “wave your other hand” is a mirror of the motion “wave your hand”, so we exclude the mirror. Next, we are characterizing the movement of joints in the upper limb (hand and shoulder) and lower limb (knee and foot) along the horizontal (x), vertical (y), and depth (z) dimensions. By doing this, we expect to identify motions that are similar in their upper and lower limb movement for the next step.
- SELECTION: Finally, to identify the representative subset of motions, we are grouping motions that are similar in terms of their upper and lower limb movement, based on the characterization in the previous step. This grouping resulted in 16 groups, each containing a unique combination of upper and lower limb movement. We will then choose one motion from each group to form the representative subset of motions. Our next step is to use these motions to test the performance of existing recognition algorithms, and then of adaptations or new algorithms as well.
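To make the characterization and selection steps concrete, here is a minimal Python sketch of one way they could be automated. It is an illustration under stated assumptions, not the procedure we actually followed: the frame layout (a dictionary mapping a joint name to an (x, y, z) tuple), the 0.10 movement threshold, the helper names, and the rule of keeping the first motion in each group are all hypothetical, and the toy data does not come from Kinder-Gator.

```python
from collections import defaultdict

# Joints named in the post; the data layout below is an assumption.
UPPER_JOINTS = ("hand", "shoulder")
LOWER_JOINTS = ("knee", "foot")
AXES = ("x", "y", "z")


def moving_axes(frames, joints, threshold):
    """Return the axes along which any of the joints travels more than
    `threshold` (same units as the coordinates) over the whole motion."""
    moved = set()
    for joint in joints:
        for i, axis in enumerate(AXES):
            values = [frame[joint][i] for frame in frames]
            if max(values) - min(values) > threshold:
                moved.add(axis)
    return frozenset(moved)


def characterize(frames, threshold=0.10):
    """Signature = (axes moved by the upper limb, axes moved by the lower limb)."""
    return (moving_axes(frames, UPPER_JOINTS, threshold),
            moving_axes(frames, LOWER_JOINTS, threshold))


def select_representatives(motions, threshold=0.10):
    """Group motions by signature and keep the first motion of each group.
    Picking the first one is an arbitrary, illustrative choice."""
    groups = defaultdict(list)
    for name, frames in motions.items():
        groups[characterize(frames, threshold)].append(name)
    representatives = {sig: names[0] for sig, names in groups.items()}
    return representatives, dict(groups)


# Toy data: a resting pose plus a few displaced poses (made-up coordinates).
REST = {"hand": (0.0, 1.0, 2.0), "shoulder": (0.0, 1.4, 2.0),
        "knee": (0.0, 0.5, 2.0), "foot": (0.0, 0.0, 2.0)}


def pose(**overrides):
    """Return the resting pose with selected joints moved."""
    return {**REST, **overrides}


motions = {
    "wave": [pose(), pose(hand=(0.3, 1.0, 2.0)), pose()],
    "swipe": [pose(), pose(hand=(0.5, 1.0, 2.0))],
    "kick": [pose(), pose(knee=(0.0, 0.5, 1.7), foot=(0.0, 0.3, 1.5))],
}

representatives, groups = select_representatives(motions)
for signature, name in representatives.items():
    upper, lower = signature
    print(name, "upper:", sorted(upper), "lower:", sorted(lower))
```

In this toy example, "wave" and "swipe" both move only the hand along x, so they fall into the same group and only one of them is kept, while "kick" forms its own group because only the lower limb moves. In practice the threshold would need tuning against real joint data, and choosing which motion represents a group may call for judgment rather than a fixed rule.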
Working on the POSE project has been very interesting and has allowed me to gain a better understanding of recognition algorithms. I look forward to gaining more knowledge as I progress further in the project.
REFERENCES
[1] Wobbrock, Jacob O., Andrew D. Wilson, and Yang Li. “Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes.” Proceedings of the 20th annual ACM symposium on User interface software and technology. ACM, 2007.
[2] Anthony, Lisa, and Jacob O. Wobbrock. “A lightweight multistroke recognizer for user interface prototypes.” Proceedings of Graphics Interface 2010, pp. 245-252. Canadian Information Processing Society, 2010.