Gen-Deep 2018 - Generalization in the Age of Deep Learning
Topics/Call for Papers
Deep learning has brought a wealth of state-of-the-art results and new capabilities. Although these methods have achieved near human-level performance on many benchmarks, numerous recent studies imply that the benchmarks only weakly test their intended purpose, and that simple examples, produced by either humans or machines, can cause systems to fail spectacularly. For example, a recently released textual entailment demo was criticized on social media for predicting:
“John killed Mary” → “Mary killed John”
(entailment, with 92% confidence)
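A probe of this kind is easy to reproduce against any publicly available natural language inference model. The sketch below is purely illustrative: it uses the Hugging Face transformers library and the public roberta-large-mnli checkpoint as a stand-in, not the demo criticized above, and the exact confidence it reports will differ.

    # Sketch: probe an off-the-shelf NLI model with a premise/hypothesis pair
    # whose roles have been swapped. Assumes the `transformers` library and the
    # public `roberta-large-mnli` checkpoint; NOT the demo referenced in this CFP.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
    model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

    premise, hypothesis = "John killed Mary", "Mary killed John"
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")

    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

    # Print the probability the model assigns to each label
    # (contradiction / neutral / entailment).
    for idx, label in model.config.id2label.items():
        print(f"{label}: {probs[idx]:.1%}")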
Such surprising failures, combined with the inability to interpret state-of-the-art models, have eroded confidence in our systems. While these systems are not perfect, the real flaw lies with our benchmarks, which do not adequately measure a model’s ability to generalize and are thus easily gameable.
This workshop provides a venue for exploring new approaches for measuring and enforcing generalization in models. We are soliciting work in the following areas:
- Analysis of existing models and their failings
- Creation of new evaluation paradigms, e.g. zero-shot learning, Winograd schemas, and datasets that avoid explicit types of gamification
- Modeling advances: regularization, compositionality, interpretability, inductive bias, multi-task learning, and other methods that promote generalization
Some of our goals are similar in spirit to those of the recent “Build It, Break It” shared task. However, we propose going beyond identifying areas of weakness (i.e., “breaking” existing systems) to discuss scalable evaluations that more rigorously test generalization, as well as modeling techniques for enforcing it.