Bigdata 2016 - Accelerating Big Data Processing with Hadoop, Spark and Memcached on Datacenters with Modern Architectures

Date: 2016-06-18 - 2016-06-22

Deadline: 2015-11-18

Venue: Seoul, South Korea

Website: https://web.cse.ohio-state.edu/~panda/is...

Topics/Call for Papers

Apache Hadoop and Spark are gaining prominence in handling Big Data and analytics. Similarly, Memcached is becoming important for large-scale query processing in Web 2.0 environments. These middleware systems are traditionally written with sockets and do not deliver the best performance on datacenters with modern high-performance networks. In this tutorial, we will provide an in-depth overview of the architecture of Hadoop components (HDFS, MapReduce, RPC, HBase, etc.), Spark, and Memcached. We will examine the challenges in re-designing the networking and I/O components of these middleware systems to take advantage of modern interconnects and protocols (such as InfiniBand, iWARP, RoCE, and RSocket) with RDMA, as well as modern storage architectures. Using the publicly available software packages from the High-Performance Big Data (HiBD, http://hibd.cse.ohio-state.edu) project, we will present case studies of the new designs for several Hadoop/Spark/Memcached components and their associated benefits. Through these case studies, we will also examine the interplay between high-performance interconnects, storage systems (HDD and SSD), and multi-core platforms in achieving the best solutions for these components.
Targeted Audience and Scope
The tutorial content is planned for half a day. This tutorial is targeted at people working in various areas of Big Data, including high-performance Hadoop/Spark/Memcached, high-performance communication and I/O architecture, storage, networking, middleware, cloud computing, and applications. The specific audiences this tutorial is aimed at include:
Scientists, engineers, researchers, and students engaged in designing next-generation Big Data systems and applications
Designers and developers of Big Data, Hadoop, Spark and Memcached middleware
Newcomers to the field of Big Data who are interested in familiarizing themselves with Hadoop, Spark, Memcached, RDMA, and high-performance networking
Managers and administrators responsible for setting up next-generation Big Data environments and high-end systems/facilities in their organizations/laboratories
The content level will be as follows: 30% beginner, 40% intermediate, and 30% advanced. There is no fixed prerequisite. As long as attendees have a general knowledge of Big Data, Hadoop, Spark, Memcached, high-performance computing, networking and storage architecture, and related issues, they will be able to understand and appreciate the material. The tutorial is designed so that attendees are exposed to the topics in a smooth and progressive manner, and it is organized as a single coherent talk covering multiple topics.
Outline of the Tutorial
Introduction to Big Data Applications and Analytics
Overview of MapReduce and Resilient Distributed Datasets (RDD) Programming Models
Architecture Overview of Apache Hadoop, Spark and Memcached
    MapReduce and YARN
    HDFS
    Spark
    RPC
    HBase
    Memcached
Overview of High-Performance Interconnects, Protocols, and Storage Architectures for Modern Datacenters
    InfiniBand and RDMA
    10/40 GigE, iWARP and RoCE technologies
    RSocket and SDP protocols
    SSD-based storage
Challenges in Accelerating Hadoop, Spark and Memcached on Modern Datacenters
Overview of Benchmarks and Applications using Hadoop, Spark and Memcached
Acceleration Case Studies and In-Depth Performance Evaluation
    MapReduce over InfiniBand with RDMA, SSD, and Lustre
    HDFS over InfiniBand with RDMA and Heterogeneous Storage (RAMDisk, SSD, HDD, and Lustre)
    Spark over InfiniBand with RDMA and SSD
    RPC over InfiniBand with RDMA
    HBase over InfiniBand with RDMA and SSD
    Memcached over InfiniBand with RDMA and SSD
The High-Performance Big Data (HiBD) Project and Associated Releases
Ongoing and Future Activities for High-Performance Big Data
Conclusion and Q&A

Last modified: 2016-02-28 00:07:59