Sign language processing

A sign language is a language which uses manual communication and body language to convey meaning, as opposed to acoustically conveyed sound patterns. This can involve simultaneously combining hand shapes; the orientation and movement of the hands, arms, or body; and facial expressions to fluidly express a speaker's thoughts. While sign languages share many similarities with spoken languages (which depend primarily on sound), exhibit the same linguistic properties, and use the same language faculty, they use space for grammar in a way that spoken languages do not.

Performing even some of the basic language processing tasks on sign language data still presents several challenges, mainly due to its strongly multimodal nature, which employs a much larger set of synchronized and complementary modalities to form meaning. By analogy to speech and text processing, such tasks include sign language recognition (from video), synthesis (via a virtual character), the creation of lexical resources and corpora (in the form of annotated video assets), and sign language teaching and learning.
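To make the recognition task concrete, the sketch below frames isolated sign recognition as sequence classification: a video clip becomes a list of per-frame feature vectors (e.g. hand-keypoint coordinates), and a clip is labelled by comparing its time-averaged features to stored templates. All names, the feature representation, and the nearest-template classifier are illustrative assumptions, not an established system; real recognizers use learned sequence models rather than averaged features.

```python
from typing import Dict, List

# Hypothetical sketch: each clip is a list of per-frame feature
# vectors; the classifier averages them over time and returns the
# label of the nearest stored template (Euclidean distance).

def average_features(frames: List[List[float]]) -> List[float]:
    """Collapse a variable-length clip into one mean feature vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / n for i in range(dim)]

def classify(clip: List[List[float]],
             templates: Dict[str, List[float]]) -> str:
    """Return the label of the template nearest to the clip's
    averaged feature vector."""
    query = average_features(clip)
    def dist(label: str) -> float:
        v = templates[label]
        return sum((a - b) ** 2 for a, b in zip(query, v)) ** 0.5
    return min(templates, key=dist)
```

For example, with templates `{"HELLO": [0.9, 0.1], "THANKS": [0.1, 0.9]}`, a clip of frames `[[0.8, 0.2], [1.0, 0.0]]` averages to `[0.9, 0.1]` and is labelled `"HELLO"`. Averaging discards all temporal structure, which is exactly why practical systems replace it with sequence models.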

Capturing sign language at a level of detail sufficient for studying and modelling it requires sophisticated full-body motion capture equipment, also covering the fingers, head pose, facial expressions, and gaze. If the capture is to be used for creating sign language resources, it needs to take place under controlled conditions in a properly configured space that ensures easy post-processing by providing proper lighting and clean views of all bodily expressions. A similar configuration can be used for obtaining video clips of the signing, with a properly configured set of cameras positioned at different angles and recording at high enough frame rates. These recordings serve not only as reference material for the captured motion data, but also as training material for tailored video processing algorithms aimed at recognizing sign language solely from the video stream. Similarly, depth sensors, such as the Kinect sensor, can also be introduced into the setting, providing an additional, complementary data stream for training sign language recognition algorithms.
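Combining such complementary streams presupposes temporal alignment: cameras and depth sensors typically record at different frame rates, so each video frame must be paired with the nearest depth frame before the modalities can be fused. The sketch below shows one simple way to do this, assuming both devices stamp frames on a shared clock in seconds; the function name and interface are illustrative, not part of any particular SDK.

```python
import bisect
from typing import List

# Hypothetical sketch: pair each video frame with the depth frame
# whose timestamp is closest. Both timestamp lists are assumed to
# be in seconds on a shared clock; depth_ts must be sorted.

def align(video_ts: List[float], depth_ts: List[float]) -> List[int]:
    """For each video timestamp, return the index of the nearest
    depth timestamp."""
    pairs = []
    for t in video_ts:
        i = bisect.bisect_left(depth_ts, t)
        # The nearest depth frame is either just before or just
        # after the insertion point; keep whichever is valid.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(depth_ts)]
        best = min(candidates, key=lambda j: abs(depth_ts[j] - t))
        pairs.append(best)
    return pairs
```

For instance, aligning 50 fps video timestamps `[0.0, 0.02, 0.04]` against 30 fps depth timestamps `[0.0, 0.0333, 0.0667]` pairs the second and third video frames with the same depth frame, which is expected when the video rate exceeds the depth rate.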

On the synthesis side, capable software platforms are required to render realistic 3D virtual characters with sufficient degrees of freedom in all the critical body parts, in order to generate convincing signing avatars with a high degree of naturalness for many application areas, including assistive technologies and education.
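Those degrees of freedom are typically realized as a skeleton of rotational joints, with the avatar's posture computed by forward kinematics. As a minimal illustration of the principle, the sketch below computes the end position of a planar two-joint chain from its joint angles; a real signing avatar has many more joints, 3-D rotations, and separate controls for fingers and face, but the underlying computation is the same. The function and its parameters are assumptions for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical sketch: a planar chain of rotational joints. Each
# angle is relative to the previous segment; forward kinematics
# accumulates the rotations and sums the segment offsets to get
# the end-effector position.

def forward_kinematics(lengths: List[float],
                       angles: List[float]) -> Tuple[float, float]:
    """Return the end-effector (x, y) of a planar joint chain,
    given segment lengths and relative joint angles in radians."""
    x = y = 0.0
    total = 0.0
    for length, angle in zip(lengths, angles):
        total += angle  # accumulate rotation along the chain
        x += length * math.cos(total)
        y += length * math.sin(total)
    return x, y
```

With two unit-length segments and angles `[0, pi/2]`, the first segment points along the x-axis and the second turns straight up, placing the end effector at (1, 1). Synthesis then reduces to driving such joint angles over time so that the avatar traces the hand shapes, movements, and facial expressions of each sign.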