
ALRA 2012 - Workshop on Active Learning in Real-world Applications

Date: 2012-09-28

Deadline: 2012-06-29

Venue: Bristol, United Kingdom

Keywords: Active Learning

Website: http://www.nomao.com/labs/alra

Topics/Call for Papers

This workshop aims to offer a meeting opportunity for academic and industry researchers from the various communities of Computational Intelligence, Machine Learning, Experimental Design and Data Mining to discuss new areas of active learning and to bridge the gap between data acquisition or experimentation and model building. How can active sampling, incremental learning and data acquisition contribute to the design and modeling of highly intelligent machine learning systems?
Machine learning refers to methods and algorithms that allow a model to learn a behavior from examples. Active learning covers methods that select the examples used to build the training dataset for a predictive model. All such strategies aim to use as few examples as possible by selecting the most informative ones.
Designing active learning algorithms for real-world data raises specific issues, chiefly scalability and practicability. Methods must be able to handle high volumes of data, and the process by which an expert labels new examples must be optimized.
We encourage papers that describe real-world applications of active learning. Papers should describe the industrial context, the main difficulties encountered and the original solution developed. Contributions on the following challenge, which proposes such a practical application of active learning, are also welcome.
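As an illustration of what "most informative" can mean, the sketch below uses uncertainty sampling, one common selection strategy among several; the workshop text does not prescribe it. The function names and the example probabilities are hypothetical.

```python
def uncertainty(prob):
    """Score a predicted probability: 1.0 at prob = 0.5, 0.0 at prob = 0 or 1."""
    return 1.0 - 2.0 * abs(prob - 0.5)


def most_informative(probabilities, k):
    """Return the indices of the k examples the model is least sure about."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: uncertainty(probabilities[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predicted probabilities for five unlabeled examples.
probs = [0.95, 0.52, 0.10, 0.45, 0.80]
selected = most_informative(probs, 2)
print(selected)  # indices of the two examples closest to 0.5
```

Here the examples with probabilities 0.52 and 0.45 are selected, since the model is closest to undecided on them.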
Associated challenge
As a search engine for places, Nomao collects data from multiple sources on the web and needs to aggregate them properly. The deduplication process consists of detecting which data refer to the same place. Machine Learning is well suited to automating this process, and Active Learning is appropriate for optimizing the creation of the training dataset.
However, in this case millions of data items must be labeled, so labeling the training examples one by one and rerunning the model at each step is impractical. Instead, sets of examples must be proposed for labeling, which raises specific issues.
To date, 33,059 examples have been labeled, each characterized by 118 features. This training dataset is available on the Nomao Challenge page.
A large test dataset of unlabeled examples will also be provided. Two active learning campaigns will then be organized, in which each participant may ask an expert to label a given number (e.g. 100) of the test examples.
A final test campaign will then be carried out to evaluate the proposed approaches: each participant will be asked to label a given set of examples, and their predictions will be compared to the known true labels.
Papers that address this issue are welcome. Authors will thus contribute to the comparison of proposed solutions and to the discussions during the workshop. The author of the best results will receive a free registration for the conference and workshop.
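The batch setting can be sketched as follows: the model is refit once per batch rather than once per label. This is a toy illustration only, assuming a one-dimensional threshold model and a simulated expert; the names `fit_threshold`, `run_batch_campaign` and `oracle` are hypothetical and do not come from the challenge.

```python
def fit_threshold(labeled):
    """Fit a 1-D threshold: midpoint between the largest negative and smallest positive."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    return (max(negatives) + min(positives)) / 2.0


def run_batch_campaign(seed, pool, oracle, batch_size, rounds):
    """Request labels in batches, refitting the model once per round."""
    labeled = list(seed)
    pool = list(pool)
    for _ in range(rounds):
        threshold = fit_threshold(labeled)
        # Pick the batch of pool points closest to the current decision boundary.
        pool.sort(key=lambda x: abs(x - threshold))
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled.extend((x, oracle(x)) for x in batch)
    return fit_threshold(labeled), labeled


# Simulated expert (an assumption for this sketch): the true boundary lies at 0.62.
def oracle(x):
    return 1 if x > 0.62 else 0


seed = [(0.0, 0), (1.0, 1)]
pool = [i / 100 for i in range(1, 100)]
threshold, labeled = run_batch_campaign(seed, pool, oracle, batch_size=10, rounds=3)
print(threshold, len(labeled))
```

With 30 requested labels spread over three batches, the estimated threshold lands between 0.62 and 0.63. Selecting a whole batch by uncertainty alone tends to pick near-duplicate points, which is one of the specific issues batch proposals raise.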
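Comparing predictions to known true labels can be sketched as below. The call does not state which evaluation metric is used; plain accuracy is assumed here for illustration only, and the label values are made up.

```python
def accuracy(predictions, true_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have the same length")
    if not true_labels:
        raise ValueError("at least one example is required")
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)


# Hypothetical predictions from one participant and the expert's true labels.
predictions = [1, 0, 1, 1, 0]
true_labels = [1, 0, 0, 1, 0]
score = accuracy(predictions, true_labels)
print(score)
```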

Last modified: 2012-04-11 23:17:41