NLP & DBpedia 2014 - 2nd International Workshop on NLP & DBpedia
Topics / Call for Papers
The DBpedia community has recently experienced an immense increase in activity. We believe that the time has come to explore the connection between DBpedia and Natural Language Processing (NLP) at an unprecedented depth.
DBpedia has a long-standing tradition of providing useful data, as well as a commitment to reliable Semantic Web technologies and living best practices. With the rise of Wikidata, DBpedia is step by step being relieved of the tedious extraction of data from Wikipedia's infoboxes and can shift its focus to new challenges, such as extracting information from the unstructured article text and serving as a testing ground for multilingual NLP methods.
Motivation
The central role of Wikipedia (and therefore DBpedia) in the creation of a Translingual Web has recently been recognized by the Strategic Research Agenda (cf. section 3.4, page 23), and most of the contributions to the recent Dagstuhl seminar on the Multilingual Semantic Web also stress the role of Wikipedia for multilingualism. As more and more language-specific chapters of DBpedia are created (currently 14 language editions), DBpedia is becoming a driving factor for a Linguistic Linked Open Data cloud as well as for localized LOD clouds with specialized domains (e.g. the Dutch windmill domain ontology created from http://nl.dbpedia.org).
The data contained in Wikipedia and DBpedia has ideal properties for making the two a controlled testbed for NLP. Wikipedia and DBpedia are multilingual and multi-domain, the communities maintaining these resources are very open, and it is easy to join and contribute. The open licence allows data consumers to benefit from the content, and many parts are collaboratively editable. In particular, the data in DBpedia is widely used and disseminated throughout the Semantic Web.
We envision the workshop producing the following items:
An open call to the DBpedia data consumer community will generate a wish list of data to be extracted from Wikipedia by NLP methods. This wish list will be broken down into tasks and benchmarks, and a gold standard will be created.
The benchmarks and test data created will be collected and published under an open licence for future evaluation (inspired by http://oaei.ontologymatching.org/ and http://archive.ics.uci.edu/ml/datasets.html).
NLP4DBpedia
DBpedia has been around for quite a while, infusing the Web of Data with multi-domain data of decent quality. This data is, however, mostly extracted from Wikipedia infoboxes, while the remaining parts of Wikipedia are to a large extent left unexploited. Here, NLP techniques may help improve DBpedia.
Extracting additional triples from the plain text of Wikipedia, either unsupervised or using the existing triples as training information, could multiply the information in DBpedia, or help tell correct from incorrect information by finding supporting text passages. Furthermore, analyzing the semantics of other structures in Wikipedia, such as tables, list pages, or categories, would help make DBpedia richer. Finally, since Wikipedia exists in more than 200 languages, we are particularly interested in NLP approaches that work not only for English but also for other languages, in order to leverage the huge amount of knowledge captured in the different language editions.
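To make the second of these ideas concrete, the following Python sketch illustrates a distant-supervision style approach: existing DBpedia triples serve as seed facts, and sentences mentioning both entities of a seed are harvested as (noisy) training examples for a relation extractor. This is only a minimal sketch; the seed triples and toy corpus are hypothetical placeholders, and a real system would run over the full Wikipedia article text with proper entity matching.

```python
# Minimal distant-supervision sketch: use existing DBpedia triples as seeds
# and harvest sentences that mention both entities of a seed triple as
# (noisy) positive training examples for that relation.
# The seed triples and the toy corpus below are hypothetical placeholders.

from collections import defaultdict

# Seed facts as (subject label, relation, object label), e.g. from DBpedia.
SEED_TRIPLES = [
    ("Berlin", "dbo:country", "Germany"),
    ("Amsterdam", "dbo:country", "Netherlands"),
]

# Stand-in for sentence-split Wikipedia article text.
CORPUS = [
    "Berlin is the capital and largest city of Germany.",
    "Amsterdam has been the capital of the Netherlands since 1814.",
    "Berlin hosts many museums.",
]

def harvest_training_sentences(seeds, sentences):
    """Collect sentences that contain both entities of a seed triple."""
    examples = defaultdict(list)
    for subj, relation, obj in seeds:
        for sentence in sentences:
            # Naive string matching; a real system would use entity linking.
            if subj in sentence and obj in sentence:
                examples[relation].append((subj, obj, sentence))
    return examples

if __name__ == "__main__":
    for relation, hits in harvest_training_sentences(SEED_TRIPLES, CORPUS).items():
        for subj, obj, sentence in hits:
            print(f"{relation}: ({subj}, {obj}) <- {sentence}")
```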
DBpedia4NLP
On the other hand, NLP and information extraction techniques often involve various resources while processing texts from different domains. As high-quality annotated data is often too expensive and time-consuming to obtain, NLP researchers are looking to external structured sources to complement their datasets. Such resources can be gazetteers to aid a named entity recognition system or examples of relations between entities to bootstrap a relation finder. DBpedia can easily be utilised to assist NLP modules in a variety of tasks.
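As a minimal illustration of such a resource, the Python sketch below queries the public DBpedia SPARQL endpoint via the SPARQLWrapper library to collect the labels of all instances of a class, yielding a simple gazetteer for a named entity recognizer. The class dbo:River is an arbitrary example, and the result limit is only there to keep the query small.

```python
# Minimal sketch: build a gazetteer for a named entity recognition system
# from DBpedia, using the SPARQLWrapper library and the public endpoint
# at http://dbpedia.org/sparql. The class dbo:River is an arbitrary example.

from SPARQLWrapper import SPARQLWrapper, JSON

def build_gazetteer(dbpedia_class="dbo:River", limit=100):
    """Return a set of English labels for instances of the given class."""
    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery(f"""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?label WHERE {{
            ?entity a {dbpedia_class} ;
                    rdfs:label ?label .
            FILTER (lang(?label) = "en")
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return {row["label"]["value"] for row in results["results"]["bindings"]}

if __name__ == "__main__":
    for name in sorted(build_gazetteer()):
        print(name)
```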
We invite papers from both of these areas, including:
Knowledge extraction from text and HTML documents (especially unstructured and semi-structured documents) on the Web, using information in the Linked Open Data (LOD) cloud, and especially in DBpedia.
Representation of NLP tool output and NLP resources as RDF/OWL, and linking the extracted output to the LOD cloud (a sketch of such a representation follows this list).
Novel applications using the extracted knowledge, the Web of Data, or DBpedia-based NLP methods.
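To give an impression of the second area, the sketch below uses the rdflib library to represent a single entity annotation as RDF in the style of the NLP Interchange Format (NIF), linking a text span to a DBpedia resource. The document URI and character offsets are invented for illustration; real output would cover a full annotated document.

```python
# Sketch: represent one entity annotation as RDF in the style of the
# NLP Interchange Format (NIF), using the rdflib library.
# The document URI and character offsets are invented for illustration.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")

text = "Berlin is the capital of Germany."
doc = URIRef("http://example.org/doc#char=0,33")   # whole document context
span = URIRef("http://example.org/doc#char=0,6")   # the mention "Berlin"

g = Graph()
g.bind("nif", NIF)
g.bind("itsrdf", ITSRDF)

# The document context carries the full text.
g.add((doc, RDF.type, NIF.Context))
g.add((doc, NIF.isString, Literal(text)))

# The annotated span points back to its context and to a DBpedia resource.
g.add((span, RDF.type, NIF.Phrase))
g.add((span, NIF.referenceContext, doc))
g.add((span, NIF.anchorOf, Literal("Berlin")))
g.add((span, NIF.beginIndex, Literal(0, datatype=XSD.nonNegativeInteger)))
g.add((span, NIF.endIndex, Literal(6, datatype=XSD.nonNegativeInteger)))
g.add((span, ITSRDF.taIdentRef, URIRef("http://dbpedia.org/resource/Berlin")))

print(g.serialize(format="turtle"))
```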
Topics
Improving DBpedia with NLP methods
Finding errors in DBpedia with NLP methods
Annotation methods for Wikipedia articles
Cross-lingual data and text mining on Wikipedia
Pattern and semantic analysis of natural language, reading the Web, learning by reading
Large-scale information extraction
Entity resolution and automatic discovery of Named Entities
Multilingual recognition of real-world entities
Frequent pattern analysis of entities
Relationship extraction, slot filling
Entity linking, Named Entity disambiguation, cross-document co-reference resolution
Disambiguation through knowledge bases
Ontology representation of natural language text
Analysis of ontology models for natural language text
Learning and refinement of ontologies
Natural language taxonomies mapped to Semantic Web ontologies
Use cases of entity recognition for Linked Data applications
Impact of entity linking on information retrieval, semantic search
Furthermore, an informal list of NLP tasks can be found on this Wikipedia page: http://en.wikipedia.org/wiki/Natural_language_proc...
These are relevant for the workshop as long as they fit into the DBpedia4NLP and NLP4DBpedia frame (i.e. the data used revolves around Wikipedia and DBpedia).
Workshop Format
The workshop will be proactive in encouraging collaborative participation: for example, live minutes of the workshop will be taken in an open EtherPad. We plan to collect the material underlying each submission, such as the datasets used and source code, and to share it with the whole community through a portal such as CKAN. Moreover, we intend to give attendees a big picture of the workshop day and to focus discussion on the topics highlighted in the Knowledge Extraction Wikipedia page. Participants are also encouraged to extend that Wikipedia page.
A persistent website will be created to publicize the call for papers and the motivation behind this proposal, supplementing the traditional collection of papers that constitutes the proceedings (to be published at CEUR-WS.org, a recognized ISSN publication series, to ensure wide accessibility).
Submissions
All papers must represent original and unpublished work that is not currently under review. Papers will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop. At least one author of each accepted paper is expected to attend the workshop.
We welcome the following types of contributions:
Full research papers (up to 12 pages)
Position papers (up to 6 pages)
Use case descriptions (up to 6 pages)
Data/benchmark papers (2-6 pages, depending on size and complexity)
Formatting Guidelines
All submissions must be written in English and must be formatted according to the Lecture Notes in Computer Science (LNCS) style for authors. Please submit your contributions electronically in PDF format to https://www.easychair.org/conferences/?conf=nlpdbp...
For details on the LNCS style, see Springer’s Author Instructions. NLP & DBpedia 2014 submissions are not anonymous.