AEPC 2010: Workshop on Annotation and Exploitation of Parallel Corpora

Date: 2010-12-02

Deadline: 2010-09-26

Venue: Tartu, Estonia

Website: http://math.ut.ee/tlt9/aepc/

Topics/Call for Papers

Workshop on Annotation and Exploitation of Parallel Corpora (AEPC)
A short summary of the workshop is available on the AEPC flyer.

In recent years, parallel corpora have become ever more useful for data-driven Machine Translation, Word Sense Disambiguation, or Cross-language Information Retrieval. Most of the time, parallel corpora have been used as raw texts (i.e., without any linguistic annotation) or with independent linguistic annotation (i.e., linguistic annotation applied to either language side without reference to the other). We believe that the full potential of parallel corpora will be reached when they are aligned and annotated concurrently. Many research strands, such as the automatic creation of parallel treebanks and parallel parsing, point in this direction. In particular, the popularity of syntax-enhanced approaches to statistical machine translation and the rise of multilingual corpus linguistics indicate the relevance of this workshop at this point in time.

Various projects have been initiated to build aligned parallel treebanks [Cmejrek et al., 2005, Gustafson-Capková et al., 2007, Ahrenberg, 2007], and most of them are based on tedious manual labor [Lundborg et al., 2007, Samuelsson and Volk, 2007]. Recently, several attempts have been made to automate this process, mainly with a focus on creating syntax-oriented translation models [Wang et al., 2002, Gildea, 2003, Zhechev and Way, 2008, Lavie et al., 2008]. The main strategies are based on alignment through parsing and chunking [Spreyer et al., 2008], language pair-dependent alignment rules [Groves et al., 2004], and the use of previous word alignment to induce phrase correspondences [Zhechev and Way, 2008]. Discriminative approaches using supervised learning have been successfully applied as well [Tiedemann and Kotzé, 2009]. Using these techniques to scale up the size of available aligned treebanks opens up a wide range of new possibilities for the exploration of cross-lingual data with syntactic and semantic information.

The work on automatic tree alignment is closely related to synchronous parsing based on transduction grammars (as in [Melamed, 2003]) or based on bootstrapping from a small set of manually labeled seeds (as in [Kuhn and Jellinghaus, 2006]). The advertised advantage is that the parallel text helps in syntactic disambiguation as well as in fast and robust annotation. Multiparallel corpora are considered to be of higher value than bilingual corpora.

Automatic syntactic annotation depends on the availability of language technology modules (e.g., PoS taggers and parsers) in the respective language. Resource-poor languages might not have this technology infrastructure. Moreover, manual annotation is time-consuming. Therefore, [Hwa et al., 2005] and [Smith and Eisner, 2006] have proposed ways to transfer syntactic information from one language to another in parallel corpora, an approach termed annotation projection.

As a follow-up to the work on projecting syntactic information across parallel corpora, the projection of semantic annotation was pioneered in recent work by [Padó and Lapata, 2009], who worked on the transfer of frame-semantic annotation across parallel corpora. We believe that improved functional and semantic projection is a necessary step to speed up the tedious process of semantic annotation. This is confirmed in recent work by [Dorr et al., 2010].

There are few tools for corpus linguistics over parallel corpora, and even fewer for visualizing and searching annotated parallel corpora (an example is [Germann, 2007]). With the increasing interest in and availability of annotated parallel corpora, we see a growing demand for such tools.

With this workshop we aim to bring together researchers who work on annotating parallel corpora for various languages and purposes and researchers who explore such resources for various applications. The following research areas will be addressed:

Parallel Treebanks (manual or automatic creation)
Cross-language Word Alignment and Phrase-Structure Alignment
Parallel Grammars, Parallel Parsing
Grammar Induction
Parallel Semantic Annotation
Parallel Referent Resolution and Anaphora
Annotation Projection
Multi-parallel Corpora
Tools for Multilingual Corpus Linguistics
Exploitation of Parallel Corpora for Evaluation
Annotated Parallel Corpora for Machine Translation
Novel Applications of Annotated Parallel Corpora

AEPC Workshop Schedule
Deadline for paper submission: 26 September 2010
Notification of acceptance: 24 October 2010
Final version of paper for workshop proceedings: 15 November 2010
Workshop: 2 December 2010

Last modified: 2010-08-31 13:10:18