
SPMRL-SANCL 2014 - First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages (SPMRL-SANCL 2014)

Date: 2014-08-24

Deadline: 2014-05-02

Venue: Dublin, Ireland

Website: https://www.spmrl.org/spmrl-sancl2014.html

Topics/Call for Papers

First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages (SPMRL-SANCL 2014)
ENDORSED BY SIGPARSE
Co-located with Coling 2014, August 23/24 in Dublin, Ireland
SPMRL-SANCL 2014 will feature a shared task on semi-supervised parsing of morphologically rich languages.
Outline
Statistical parsing of morphologically-rich languages has repeatedly been shown to exhibit non-trivial challenges including, among others, sparse lexica in the face of rich inflectional systems, parsing deficiency in the face of free word order and treebank annotation idiosyncrasies in the face of morphosyntactic interactions.
Similar problems arise for parsing non-canonical languages. Besides technical issues such as lexical sparseness and ad-hoc structures, we also face theoretical problems, including constructions that do not occur, or occur only very rarely, in standard language, such as verbless sentences or complex hashtags.
The first joint SPMRL-SANCL workshop addresses the challenges of parsing both morphologically rich languages (MRLs) and non-canonical languages (NCLs). It provides a forum for research addressing the often overlapping issues of both fields, with the goal of identifying cross-cutting issues in the annotation and parsing methodology for such languages.
Areas of interest
The areas of interest of the SPMRL-SANCL workshop include, but are not limited to, the following list of topics:
applying cutting-edge parsing techniques to new languages and domains
strengths and weaknesses of current parsing techniques when applied to morphologically-rich and/or non-canonical language
insights and techniques that are targeted at improving parsing quality for morphologically-rich and/or non-canonical language
using insights from parsing and associated processing problems to motivate decisions in the creation of new syntactically annotated corpora
annotation and parsing of data from domains and genres that are not yet covered for many languages
In addition to regular paper submissions, we ask for poster submissions addressing the syntactic analysis of frequent phenomena of non-canonical languages that are difficult to annotate and parse using conventional annotation schemes. Cases in point are the representation of verbless utterances in a dependency scheme, the pros and cons of different representations of disfluencies for statistical parsing, and the analysis of complex hashtags which incorporate and merge different syntactic arguments into one token. The posters should focus on one or more of a number of given issues, which are described in more detail at http://spmrl.org/sancl-posters2014.html, and will be presented at the workshop. More details on the submission categories for the poster session can be found below and at the website.
Shared Task
In addition, the workshop will host the second shared task on parsing morphologically rich languages (see http://spmrl.org/spmrl2014-sharedtask.html). The first shared task was held in conjunction with SPMRL 2013 and helped show that carefully engineered approaches can push the envelope on languages such as Hungarian, Basque, Hebrew and Polish, where the shared task results for constituency parsing are the best currently known for those languages. Just as importantly, the task embodied a focus on realistic scenarios (no gold tokenization, no gold part-of-speech or morphology), as well as meaningful evaluation measures, including a cross-framework evaluation that permits comparisons between constituent and dependency parsing models.
Apart from the English-only SANCL 2012 “parsing the web shared task”, this shared task was the first to feature a pure raw parsing scenario (no gold tokenization, no gold morphology) and a cross-framework evaluation procedure, which showed that the difference between constituent and dependency parsing models on this data set was not as large as previously thought, especially when it comes to non-gold input.
The second installment will feature a similar range of languages, but it will also consider a semi-supervised scenario where larger quantities of in-domain text are available. These unlabeled data are intended for self-training, co-training, lexical acquisition, generating word clusters, word embeddings and so on. A separate call for the Shared Task is to be sent soon.

Last modified: 2014-03-21 08:04:17