MLAC 2016 - Workshop on Multi-view Lip-reading/Audiovisual Challenges
Topics/Call for Papers
It is well known that human speech perception is a bimodal process that makes use of both acoustic and visual information. There is clear evidence that visual cues play an important role in automatic speech recognition, either when the audio is seriously corrupted by noise, through audiovisual speech recognition (AVSR), or even when it is inaccessible, through automatic lip-reading (ALR).
This workshop aims to challenge researchers to deal with the large variations in speakers' appearance caused by camera-view changes. To this end, we have collected a multi-view audiovisual database, named 'OuluVS2' [1], which includes 52 speakers uttering digit strings, short phrases and sentences. To help participants, we have preprocessed the first two types of data to extract the regions of interest. The cropped mouth videos are available to researchers along with the original ones.
Please visit the Home page for instructions on how to download the database.
Researchers are invited to work on either type of data and tackle the following problems:
Single-view ALR/AVSR - to train and test on data recorded from a single camera view.
Multiple-view ALR/AVSR - to train and test on synchronized data recorded from multiple camera views.
Cross-view ALR/AVSR - to learn knowledge from videos recorded from a reference view (e.g., the frontal view) to enhance recognition performance for a target view (e.g., the profile view) for which there is not a sufficient amount of training data.
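The three evaluation settings above differ only in how the data is partitioned by camera view. The following minimal sketch illustrates that partitioning; the record structure and view names are hypothetical, not the official OuluVS2 file format.

```python
# Hypothetical sketch of the three challenge settings as data partitions.
# The Sample fields and view labels below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Sample:
    speaker: int
    view: str       # e.g. "frontal", "30", "45", "60", "profile"
    utterance: str  # digit string or short phrase

def single_view(samples, view):
    """Single-view ALR/AVSR: train and test on one camera view."""
    subset = [s for s in samples if s.view == view]
    return subset, subset  # in practice, split by speaker into train/test

def multi_view(samples, views):
    """Multiple-view ALR/AVSR: use synchronized recordings from several views."""
    subset = [s for s in samples if s.view in views]
    return subset, subset

def cross_view(samples, reference, target):
    """Cross-view ALR/AVSR: learn from a reference view, test on a
    data-poor target view."""
    train = [s for s in samples if s.view == reference]
    test = [s for s in samples if s.view == target]
    return train, test
```

For example, `cross_view(samples, "frontal", "profile")` trains only on frontal-view recordings and evaluates on profile-view recordings, mirroring the scenario where the target view has little training data.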
Last modified: 2016-06-05 14:42:20