Dialects-2011: First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties
Jeremy Jancsary, Friedrich Neubarth, Harald Trost
Topics/Call for Papers
The currently prevailing statistical paradigm has made possible major achievements in many areas of natural language processing. But since the methods employed critically depend on the availability of large training corpora, the applicability of these methods is generally limited to major languages / standard varieties, to the exclusion of dialects or varieties that substantially differ from the standard.
However, language varieties (and specifically dialects) are a primary means of expressing a person's social affiliation and identity. Hence, computer systems that can adapt to the user by displaying a familiar socio-cultural identity are expected to raise acceptance dramatically within certain contexts and target groups. But current systems are far from achieving the fidelity required to realize these benefits.
The crucial obstacle is scarcity of data. Most important of all, substantial corpora of language varieties or dialects are rare. Moreover, authoritative orthographic conventions usually do not exist. As a result, the notation of written texts can vary widely and there are no obvious conventions for the annotation of speech corpora.
This situation calls for novel approaches, methods and techniques to overcome or circumvent the problem of data scarcity, but also to enhance and strengthen the standing that language varieties and dialects have in natural language processing technologies as well as in interaction technologies that build upon the former.
While there will be a clear focus on machine learning applied to the aforementioned problems, this workshop aims to gather researchers with expertise in various disciplines.
Topics
machine learning algorithms operating in the regime of data scarcity
bootstrapping and active learning schemes for principled acquisition, annotation or generation of training data
methods to acquire resources by exploiting the proximity between varieties and standard language
issues of orthography and annotation
machine translation between language varieties or dialects
speech synthesis of dialects with limited corpora
interaction technologies dealing with social identity in speech and text
novel approaches transcending the paradigm of statistical modelling
Progress on the topics listed above requires an interdisciplinary approach: machine learning, machine translation, speech synthesis and automatic speech recognition, but also linguistics and interaction technologies, will have to contribute. We invite researchers with a genuine interest in the modelling of language varieties and the advancement of natural language processing in this area.
Submissions
We invite high-quality submissions on original, unpublished work in areas relating to the aforementioned topics. Both significant theoretical advances and descriptions of successful practical systems involving processing or generation of language varieties are welcome. Submission of work that is only incremental in nature or describes minor progress is explicitly discouraged.
Two paper categories will be distinguished:
Long papers are expected to report on contributions of lasting value and will be presented orally in the plenary session of the workshop. Submissions should not exceed a length of 9 pages, excluding references.
Short papers are ideally suited for exciting new work that is not yet mature enough for a long paper, but has substantial merit. The work will be presented during the poster session and - depending on the type of work - a system demonstration can be given. The length of short papers is restricted to 4 pages, excluding references.
Reviewing will be double-blind, so please ensure your submission is properly anonymized. In particular, the paper should not reveal the authors' identities or include acknowledgments or references to project names, websites, software or such that might give away the identity.
Last modified: 2011-04-11 14:22:54