Metadata 2012 - Workshop on Describing Language Resources with Metadata
Topics/Call for Papers
Workshop on Describing Language Resources with Metadata:
Towards Flexibility and Interoperability in the Documentation
of Language Resources
To be held in conjunction with the 8th International
Language Resources and Evaluation Conference (LREC 2012)
22 May 2012, Lütfi Kirdar Istanbul
Exhibition and Congress Centre, Istanbul, Turkey
Deadline for submission: 19 February 2012
The description of Language Resources (LRs) continues to be a crucial point
in the life cycle of LRs, and more particularly, in their sustainable
exchange. This has been so for a number of repositories or LR distribution
centres in place (ELRA, GSK, LDC, OLAC, TST-Centrale, BAS, among others),
which house LR catalogues following some proprietary metadata schema. A number
of projects and initiatives have also focused these past few years on the
sharing of LRs (ENABLER, CLARIN, FLaReNet, PANACEA, META-SHARE), for example
for Language Technology (LT).
Based on these initiatives, a consensus is emerging on a number of
requirements for standardized metadata:
1. There should be a common publication channel for the LR
descriptions in the world.
2. This channel allows users to carry out easy and efficient LR data
discovery and possible subsequent retrieval of LRs.
3. Expert knowledge is required to create the data model for the
metadata description.
4. Subject matter experts (both researchers and LR/LT providers and
developers) are required to provide the content for the data model.
5. The data model needs to be clear, expressive, flexible,
customizable and interoperable.
6. Metadata have to provide for different user groups, ranging from
providers to consumers (both individuals and organisations). This applies
both to the information contained in the metadata and the supporting tool
infrastructure for creating, maintaining, distributing, harvesting and
searching the metadata.
Currently several initiatives focus on metadata. Work done within
initiatives like ENABLER and CLARIN has led to the Component MetaData
Infrastructure (CMDI, ISO TC 37 SC 4 work item for ISO 24622), which allows
standard data categories (for example from ISO 12620) to be combined into
components, which are in turn combined into metadata profiles. Early
versions of this model have been operational in repositories such as ELRA's,
which complied with the work done within INTERA. FLaReNet, as the result of
a permanent and cyclical consultation, has issued a set of main
recommendations where a global infrastructure of uniform and interoperable
metadata sets appears among the Top Priorities for the field of LRs. For use
within HLT, META-SHARE provides a fully-fledged schema for the description
of LRs, in the framework of the component model, covering all the current
resource types and media types of use, in all the stages of a resource's
life-cycle. Our aim is to learn from one another's experiences and plans in
this area.
Making resources available to others and putting them to a second use in
other projects has never been more widely accepted as a sensible, efficient
way to avoid a waste of effort and resources. However, when it comes to the
details, there is still a vast number of problems. This workshop will be a
forum to address issues and challenges in the concrete work with metadata
for LRs, not restricted to a single initiative for archiving LRs.
The current state of the art for metadata provision allows for a very
flexible approach, catering for the needs of different archives and
communities, referring to common data category registries that describe the
meaning of a data category at least to authors of metadata. Component models
for metadata provisions are for example used by CLARIN and META-SHARE, but
there is also an increased flexibility in other metadata schemas such as
Dublin Core, which is usually not seen as appropriate for meaningful
description of language resources.
Topics of interest are:
1. Infrastructures for creating components and profiles for metadata
2. Editing and creating metadata
3. Porting legacy metadata
4. Metadata as a resource
5. Maintenance of metadata
6. Classification of language resources
7. Providing metadata concepts
8. Creating components and profiles
9. Services harvesting and interpreting metadata
10. Experience from the large LR data center catalogues: LDC, ELRA,
BAS, and how to interoperate with them
11. Controlled vocabularies, terminology and metadata description
12. Formal models for metadata representation and standardized models
of serialisation
13. Customization and reuse of metadata schemas
14. Plans or experiences with emerging metadata infrastructures, for
example from CLARIN and META-SHARE
15. Experiences with component-based metadata infrastructures
16. Integration and conversion of multiple repositories: experiences
17. Standardization issues for metadata
We invite submissions for full papers and system demonstrations that address
these questions and other related issues relevant to the workshop.
Workshop Programme and Audience Addressed
This full-day workshop aims to bring together technology-oriented working
groups on metadata modeling or schema creation and both researchers and
producers creating metadata in the course of their work. Those interested
in using metadata in their projects should gain insights and come away with
a clear idea of how to either describe their LRs or convert their
schema. Those who have recently developed a model can share their
experience, and those who have specific concerns with interoperability of
metadata schemas as developed by the various initiatives can open the
discussion in search of joint solutions.
Tools and tool infrastructures should also be part of the discussion,
given that the initiatives also provide editors, mappings, search
interfaces, and component and profile registries.
Organising Committee
Victoria Arranz (ELDA/ELRA, Paris, France)
Daan Broeder (MPI, Nijmegen, The Netherlands)
Bertrand Gaiffe (ATILF, Nancy, France)
Maria Gavrilidou (Athena Research and Innovation Center, Athens, Greece)
Monica Monachini (CNR-ILC, Pisa, Italy)
Thorsten Trippel (University of Tübingen, Tübingen, Germany)
Programme Committee
Helen Aristar-Dry (Michigan State University, USA)
Núria Bel (UPF, Barcelona, Spain)
Antonio Branco (University of Lisbon, Portugal)
Lars Borin (Språkbanken, Sweden)
Khalid Choukri (ELDA/ELRA, Paris, France)
Thierry Declerck (DFKI, Germany)
Matej Durco (Austrian Academy of Sciences, Austria)
Gil Francopoulo (CNRS-LIMSI-IMMI + TAGMATICA, Paris, France)
Francesca Frontini (CNR-ILC, Pisa, Italy)
Erhard Hinrichs (Universität Tübingen, Germany)
Penny Labropoulou (ILSP-Athena, Athens, Greece)
Valérie Mapelli (ELDA/ELRA, Paris, France)
Jan Odijk (Universiteit Utrecht, The Netherlands)
Elena Pierazzo (King's College, London, UK)
Laurent Romary (INRIA, France)
Mike Rosner (University of Malta, Malta)
Andreas Witt (IDS, Germany)
Peter Wittenburg (MPI, The Netherlands)
Tamás Váradi (Hungarian Academy of Sciences, Hungary)
Marta Villegas (UPF, Barcelona, Spain)
Sue Ellen Wright (Kent State University, USA)
Important dates
Submission of full papers: Sunday 19 February 2012
Notification of acceptance of papers and demonstrations: Thursday 22 March 2012
Submission of final version: Saturday 31 March 2012
Final programme available: Friday 13 April 2012
Workshop: Tuesday 22 May 2012
Authors should use the START system accessible from and the LREC
author's kit for submitting a two-column article of 4 to 8 pages.
For further queries, please contact Victoria Arranz at or
Thorsten Trippel at
When submitting a paper through START, authors will be kindly asked to
provide relevant information about the resources that have been used for the
work described in their paper or that are the outcome of their research. For
further information on this initiative, please refer to Authors will also be asked
to contribute to the Language Library, the new initiative of LREC2012.
Last modified: 2012-02-06 13:33:36