Scientists from many different fields have been developing Bulk‐Synchronous MPI applications to simulate and study a wide variety of scientific phenomena. Since failure rates are expected to increase in larger‐scale future HPC systems, providing efficient fault‐tolerance mechanisms for this class of applications is paramount. The global‐restart model has been proposed to decrease the time of failure...
Multi-/many-core CPU-based architectures are seeing widespread adoption due to their unprecedented compute performance in a small power envelope. With the increasingly large number of cores on each node, applications spend a significant portion of their execution time in intra-node communication. While shared memory is commonly used for intra-node communication, it needs to copy each message once...
Studying the interaction among applications, MPI runtimes, and the fabric they run on is critical to understanding application performance. There exists no high-performance and scalable tool that enables understanding this interplay on modern multi-petaflop systems. Designing such a tool is non-trivial and involves multiple components including 1) data profiling/collection from network/MPI library,...
Deep Learning over Big Data (DLoBD) is becoming one of the most important research paradigms to mine value from the massive amount of gathered data. Many emerging deep learning frameworks start running over Big Data stacks, such as Hadoop and Spark. With the convergence of HPC, Big Data, and Deep Learning, these DLoBD stacks are taking advantage of RDMA and multi-/many-core based CPUs/GPUs. Even though...
Broadcast operations (e.g. MPI_Bcast) have been widely used in deep learning applications to exchange a large amount of data among multiple graphics processing units (GPUs). Recent studies have shown that leveraging the InfiniBand hardware-based multicast (IB-MCAST) protocol can enhance scalability of GPU-based broadcast operations. However, these initial designs with IB-MCAST are not optimized for...
While GPUs are becoming common in HPC systems, the CPU is still responsible for managing both GPU-side and CPU-side compute, communication, and synchronization operations. For instance, if a result from a GPU-side computation is to be transferred to a remote destination, the CPU must synchronize on GPU compute completion before issuing a communication operation. Both CPU cycles and energy are consumed...
Distributed key-value store-based caching solutions are being increasingly used to accelerate Big Data applications on modern HPC clusters. This has necessitated incorporating fault-tolerance capabilities into high-performance key-value stores such as Memcached that are otherwise volatile in nature. In-memory replication is being used as the primary mechanism to ensure resilient data operations. However,...
High-speed interconnects (e.g. InfiniBand) have been widely deployed on modern HPC clusters. With the emergence of HPC in the cloud, high-speed interconnects have paved their way into the cloud with recently introduced Single Root I/O Virtualization (SR-IOV) technology, which is able to provide efficient sharing of high-speed interconnect resources and achieve near-native I/O performance. However,...
Running Big Data applications in the cloud has become extremely popular in recent times. To enable the storage of data for these applications, cloud-based distributed storage solutions are a must. OpenStack Swift is an object storage service which is widely used for such purposes. Swift is one of the main components of the OpenStack software package. Although Swift has become extremely popular in...
Big Data Systems are becoming increasingly complex and generally have very high operational costs. Cloud computing offers attractive solutions for managing large scale systems. However, one of the major bottlenecks in VM performance is virtualized I/O. Since Big Data applications and middleware rely heavily on high performance interconnects such as InfiniBand, the performance of virtualized InfiniBand...
CUDA 6.0 introduced the Managed Memory feature to boost productivity on GPU systems. It removes from the programmer the burden of explicit memory management and data movement between the host and the accelerator. However, these benefits restrict the pinning of memory and hence limit performance by precluding the use of performance-centric features like CUDA-IPC and GPUDirect RDMA. On another...
The MPI programming model has been used by several graph processing systems. Out of these systems, only the MPI two-sided programming model is used for data movements. The MPI one-sided programming model, which achieves better overlap of communication and computation, has been seen as advantageous for applications with irregular communication patterns. However, the benefits of the MPI one-sided programming model...
UPC++ is an emerging Partitioned Global Address Space (PGAS) library. The performance of UPC++ applications is highly dependent on the efficiency of the communication runtime. The current UPC++ runtime relies on MPI- and native IB verbs-based conduit implementations of the GASNet communication middleware. The hybrid MPI+UPC++ programming model is seen as an attractive way to program next-generation systems. However,...
The performance of Hadoop components can be significantly improved by leveraging advanced features such as Remote Direct Memory Access (RDMA) on modern HPC clusters, where high-performance networks like InfiniBand (IB) and RoCE have been deployed widely. With the emergence of high-performance computing in the cloud (HPC Cloud), high-performance networks have paved their way into the cloud with recently...
Fault tolerance for the upcoming exascale generation has long been an area of active research. One component of a fault-tolerance strategy is checkpointing. Petascale-level checkpointing is demonstrated through a new mechanism for virtualization of the InfiniBand UD (unreliable datagram) mode, and for updating the remote address on each UD-based send, due to the lack of a fixed peer. Note that...
Hadoop is gaining more and more popularity in virtualized environments because of the flexibility and elasticity offered by cloud-based systems. Hadoop supports topology-awareness through topology-aware designs in all of its major components. However, there exists no service that can automatically detect the underlying network topology in a scalable and efficient manner, and provide this information...
Deep learning frameworks have recently gained widespread popularity due to their highly accurate prediction capabilities and availability of low cost processors that can perform training over a large dataset quickly. Given the high core count in modern generation high performance computing systems, training deep networks over large data has now become practical. In this work, while targeting the Computational...
Existing InfiniBand drivers require communication buffers to be pinned in physical memory during communication. Most runtimes leave these buffers pinned until the end of the run. This limits the swappable memory space available to applications. To address these concerns, Mellanox has recently introduced the On-Demand Paging (ODP) feature for InfiniBand. With ODP, communication buffers are paged...