MLAC 2016 - Workshop on Multi-view Lip-reading/Audiovisual Challenges
Topics/Call for Papers
It is well known that human speech perception is a bimodal process that makes use of both acoustic and visual information. There is clear evidence that visual cues play an important role in automatic speech recognition, either when the audio is seriously corrupted by noise, through audiovisual speech recognition (AVSR), or even when it is inaccessible, through automatic lip-reading (ALR).
This workshop aims to challenge researchers to deal with the large variations in speakers' appearance caused by camera-view changes. To this end, we have collected a multi-view audiovisual database, named 'OuluVS2' [1], which includes 52 speakers uttering digit strings, short phrases and sentences. To help participants, we have preprocessed the first two types of data to extract the regions of interest. The cropped mouth videos are available to researchers along with the original ones.
Please visit the Home page for instructions on how to download the database.
Researchers are invited to work on either type of data and tackle the following problems:
Single-view ALR/AVSR - to train and test on data recorded from a single camera view.
Multiple-view ALR/AVSR - to train and test on synchronized data recorded from multiple camera views.
Cross-view ALR/AVSR - to learn knowledge from videos recorded from a reference view (e.g., the frontal view) to enhance recognition performance for a target view (e.g., the profile view) for which there is not a sufficient amount of training data.
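The three evaluation settings above differ only in how the data is partitioned by camera view. The following minimal sketch illustrates that partitioning; the record structure and view names are hypothetical, not the official OuluVS2 file format.

```python
# Hypothetical sketch of the three challenge settings as data partitions.
# The Sample fields and view labels below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Sample:
    speaker: int
    view: str       # e.g. "frontal", "30", "45", "60", "profile"
    utterance: str  # digit string or short phrase

def single_view(samples, view):
    """Single-view ALR/AVSR: train and test on one camera view."""
    subset = [s for s in samples if s.view == view]
    return subset, subset  # in practice, split by speaker into train/test

def multi_view(samples, views):
    """Multiple-view ALR/AVSR: use synchronized recordings from several views."""
    subset = [s for s in samples if s.view in views]
    return subset, subset

def cross_view(samples, reference, target):
    """Cross-view ALR/AVSR: learn from a reference view, test on a
    data-poor target view."""
    train = [s for s in samples if s.view == reference]
    test = [s for s in samples if s.view == target]
    return train, test
```

For example, `cross_view(samples, "frontal", "profile")` trains only on frontal-view recordings and evaluates on profile-view recordings, mirroring the scenario where the target view has little training data.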
Last modified: 2016-06-05 14:42:20