PlanBig 2014 - 2014 Programming Languages for Big Data (PlanBig)
Topics/Call for Papers
Programming Languages for Big Data (PlanBig)
Organizers
Véronique Benzaken (University Paris South, FR)
James Cheney (University of Edinburgh, GB)
Torsten Grust (Universität Tübingen, DE)
Dimitrios Vytiniotis (Microsoft Research UK - Cambridge, GB)
For support, please contact
Annette Beyer for administrative matters
Marc Herbstritt for scientific matters
Motivation
We have all witnessed a dramatic increase in the number of domain-specific languages or libraries for interfacing with other computing paradigms (data-parallelism, sensor networks, MapReduce-style fault-tolerant parallelism, distributed programming, Bayesian inference engines, SAT or SMT solvers, or multi-tier Web programming), as well as techniques for language-integrated querying or processing of data in other data models (XML, RDF, JSON). Much of this activity is spurred by the opportunities offered by so-called "Big Data", that is, large-scale, data-intensive computing on massive amounts of data. Such techniques already draw on concepts from programming languages. For example, MapReduce’s "map" and "reduce" operators are based on classical list-manipulation primitives introduced in LISP.
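To make the LISP connection concrete, here is a minimal single-machine sketch of the classic word-count job written with Python's built-in `map` and `functools.reduce`. It is an analogy only, assuming a tiny in-memory corpus (the `documents`, `mapper`, and `reducer` names are hypothetical); it does not use or imitate the Hadoop API, and it omits the shuffle, partitioning, and fault tolerance that a real MapReduce framework adds.

```python
from functools import reduce
from collections import Counter

# Hypothetical input: a tiny in-memory "corpus" of documents
documents = ["big data big ideas", "data languages"]

def mapper(doc):
    # Map phase: turn one document into a partial word count
    return Counter(doc.split())

def reducer(acc, partial):
    # Reduce phase: merge partial counts; Counter addition is associative,
    # which is what lets a distributed framework merge partials in any grouping
    return acc + partial

word_counts = reduce(reducer, map(mapper, documents), Counter())
print(word_counts["big"], word_counts["data"])  # 2 2
```

The associativity of the merge step is the property that lets a framework split the reduction across machines, which is one reason list-manipulation primitives scale to this setting.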
Programming systems that manipulate big data pose many challenges. As the amount of data being processed grows beyond the capabilities of any one computer system, the problem of effectively programming multiple computers, each possibly with multiple CPUs, GPUs, or software subsystems, becomes unavoidable; issues such as security, trust, and provenance become increasingly entangled with classical efficiency and correctness concerns. Some of these problems have been studied for decades: for example, integration of relational database capabilities and general-purpose programming languages has been a long-standing challenge, with some approaches now in mainstream use (such as Microsoft’s LINQ). Other problems may require advances in the foundations of programming languages.
Programming that crosses multiple execution models is increasingly required for modern applications, using both established paradigms (e.g., database, dataflow, or data-parallel computing models) and emerging ones (e.g., multicore, GPU, or software-defined networking). Cross-model programs that execute in multiple (possibly heterogeneous) environments pose much harder security, debugging, validation, and optimization problems than conventional programs. Both big data and massively parallel systems currently rely on systems-oriented methods and testing regimes that cannot guarantee safety, security, correctness, or evolvability. In a purely systems-based approach, these problems are hard even to state, let alone solve. Language-based techniques, particularly formalization, verification, abstraction, and representation independence, are badly needed to reconcile the performance benefits of advanced computational paradigms with the advantages of modern programming languages.
These problems are currently being addressed in a variety of communities, often using methods that share many common features. Examples include the use of comprehensions to structure database queries, data-parallel programs, or MapReduce/Hadoop jobs; the use of semantics to clarify the meaning of new languages and the correctness of optimizations; the use of static analyses to optimize large-scale jobs effectively; and the need for stronger security and assurance, including new techniques for provenance and trust. This Dagstuhl seminar on "Programming Languages for Big Data" seeks to identify and develop these common foundations in order to reap the full benefits of Big Data and associated data-intensive computing resources.
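As a small illustration of comprehensions structuring a database query, the sketch below writes a join with a filter as a Python list comprehension, with the roughly equivalent SQL in a comment. The `employees` and `departments` relations are hypothetical lists of tuples assumed for the example; language-integrated query systems such as LINQ build on the same idea but typically translate the comprehension into a query for an external engine rather than evaluating it in memory as this code does.

```python
# Hypothetical relations, represented as lists of (key, value) tuples
employees = [("alice", "db"), ("bob", "pl"), ("carol", "pl")]
departments = [("db", "Paris"), ("pl", "Edinburgh")]

# Roughly equivalent SQL:
#   SELECT e.name, d.city FROM employees e, departments d
#   WHERE e.dept = d.dept AND d.city = 'Edinburgh'
result = [
    (name, city)
    for (name, dept) in employees        # FROM employees e
    for (dept2, city) in departments     # , departments d
    if dept == dept2 and city == "Edinburgh"  # WHERE ...
]
print(result)  # [('bob', 'Edinburgh'), ('carol', 'Edinburgh')]
```

The generators correspond to the FROM clause, the guard to WHERE, and the head expression to SELECT, which is why the same notation can describe queries, data-parallel loops, or MapReduce jobs.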
Four more specific topics are proposed to focus the seminar, although we anticipate that other topics may emerge due to future research developments or interactions at the seminar itself:
Static analysis and types for performance/power optimization and reliability of big data programs
Language abstractions for cross-model programming
Language design principles for distribution, heterogeneity, and preservation
Trust, security, and provenance for high-confidence big data programming
Classification
Data Bases / Information Retrieval
Programming Languages / Compiler
Security / Cryptology
Keywords
High-performance computing
Data-intensive research
Language-integrated query
Language-based security
Last modified: 2014-08-20 22:59:54