We discuss the feasibility of an in-house Schrödinger equation solver on the Intel Broadwell Xeon processor with a built-in FPGA, with a particular focus on the performance of large-scale sparse matrix-vector multiplication (SpMV) that is the core numerical operation of electronic structure simulations for multi-million atomic systems. The double-precision SpMV section in our solver is offloaded to...
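For readers unfamiliar with the kernel this abstract centers on: a minimal sketch of double-precision SpMV over a matrix in Compressed Sparse Row (CSR) form. This is only an illustration of the operation the paper offloads; the FPGA-offloaded implementation itself is not reproduced here, and the function name is our own.

```c
#include <stddef.h>

/* Double-precision sparse matrix-vector multiply, y = A*x, with A stored
 * in Compressed Sparse Row (CSR) form: row_ptr delimits each row's slice
 * of (col_idx, vals). A plain CPU reference kernel, not the paper's
 * FPGA-offloaded version. */
void spmv_csr(size_t nrows, const size_t *row_ptr, const size_t *col_idx,
              const double *vals, const double *x, double *y) {
    for (size_t i = 0; i < nrows; ++i) {
        double sum = 0.0;
        for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            sum += vals[j] * x[col_idx[j]];
        y[i] = sum;
    }
}
```

For the multi-million-atom systems mentioned above, the matrix is large and very sparse, so this kernel's irregular, memory-bound access pattern (the indirect load `x[col_idx[j]]`) is what makes it a natural offload candidate.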
The interconnect is a crucial component of any HPC machine, and its performance is one of the contributing factors to the overall performance of an HPC system. The most popular interface for connecting a Network Interface Card (NIC) to the CPU is PCI Express (PCIe). With denser core counts in compute servers and steadily maturing fabric interconnect speeds, there is a need to maximize the packet data movement...
Among the high-radix and low-diameter networks, the fat-tree topology is commonly used in HPC and datacenter systems. Resource and job management is critically important for mitigating application interference in order to achieve high system performance and utilization. Preliminary studies have shown the effect of job placement on the performance of parallel scientific applications. In this work we study interference...
A new page swap protocol is proposed for a user-level remote memory paging system to accelerate out-of-core processing with multi-threaded user programs and libraries written in OpenMP and pthread. The original swap protocol has a bottleneck that limits efficient page swapping when swaps are requested by multiple threads in a user program, because all MPI communications to memory servers and page...
Convolutional neural networks (CNNs) have been widely applied in various applications. However, the computation-intensive convolutional layers and memory-intensive fully connected layers have brought many challenges to the implementation of CNNs on embedded platforms. To overcome this problem, this work proposes a power-efficient accelerator for CNNs, and different methods are applied to optimize the...
Due to the rapidly increasing use of big data, machines are under pressure to provide more computing power at higher energy efficiency while maintaining simpler and more scalable computing paradigms. Transactional Memory (TM) is one such technique: it can be used for synchronization instead of the conventional locks used in critical sections, since it has simpler paradigms, is scalable, and has better energy...
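The execute-validate-retry cycle that transactional memory builds on can be sketched in plain software. The following is only an illustration of the optimistic-concurrency principle using C11 atomics, not a model of hardware TM (e.g. Intel TSX) or of any particular paper's scheme; the names are our own.

```c
#include <stdatomic.h>

/* Optimistic "transaction" on a shared counter: read a snapshot, compute
 * speculatively, and commit only if no other thread changed the value in
 * the meantime; otherwise abort and re-execute. This is the basic cycle
 * TM generalizes to arbitrary read/write sets. */
static _Atomic long counter = 0;

void txn_increment(void) {
    for (;;) {                                   /* abort -> retry loop */
        long snapshot = atomic_load(&counter);   /* speculative read    */
        long result = snapshot + 1;              /* transaction body    */
        /* commit: succeeds only if counter still equals the snapshot,
         * i.e. no conflicting writer intervened */
        if (atomic_compare_exchange_weak(&counter, &snapshot, result))
            return;
    }
}
```

Unlike a lock, no thread ever blocks here; contention shows up as aborted-and-retried transactions instead, which is where the energy and scalability arguments above come from.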
Accelerated clusters, which are distributed memory systems equipped with accelerators, have been used in various fields. For accelerated clusters, programmers often implement their applications using a combination of MPI and CUDA (MPI+CUDA). However, this approach faces programming complexity issues. This paper introduces the XcalableACC (XACC) language, which is a hybrid model of XcalableMP (XMP) and...
Almost all performance analysis tools in the HPC space perform some form of aggregation to compute summary information of a series of performance measurements, from summations to more complex operations like histograms. Aggregation not only reduces data volumes and consequently storage space requirements and overheads, but is also crucial to extract insights from recorded measurement data. In current...
Fault tolerance is one of the major design goals for HPC. The emergence of non-volatile memories (NVM) provides a solution for building fault-tolerant HPC. Data in NVM-based main memory are not lost when the system crashes, because of the non-volatile nature of NVM. However, because of volatile caches, data must be logged and explicitly flushed from caches into NVM to ensure consistency and correctness...
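The log-then-flush discipline this abstract describes can be sketched as undo logging: the old value must reach NVM before the in-place update does, so the log entry is flushed out of the volatile cache first. This is a generic illustration of the technique, not the paper's scheme; `flush_to_nvm` is a hypothetical stand-in for a cache-line write-back plus fence (e.g. `clwb`; `sfence` on x86), made a no-op here so the sketch runs anywhere.

```c
#include <stddef.h>

typedef struct { long *addr; long old_val; int valid; } undo_entry;

/* Stand-in for cache-line write-back + fence; a no-op in this sketch. */
static void flush_to_nvm(const void *p, size_t n) { (void)p; (void)n; }

void logged_store(undo_entry *log, long *addr, long new_val) {
    log->addr = addr;                 /* 1. record the old value ...       */
    log->old_val = *addr;
    log->valid = 1;
    flush_to_nvm(log, sizeof *log);   /* 2. ... and force the log to NVM   */
    *addr = new_val;                  /* 3. only then update in place      */
    flush_to_nvm(addr, sizeof *addr);
    log->valid = 0;                   /* 4. commit: invalidate the log     */
    flush_to_nvm(&log->valid, sizeof log->valid);
}

/* After a crash, a still-valid log entry means the update may be torn:
 * roll the location back to the logged old value. */
void recover(undo_entry *log) {
    if (log->valid) {
        *log->addr = log->old_val;
        log->valid = 0;
    }
}
```

The ordering between steps 2 and 3 is exactly what the volatile caches threaten: without the explicit flush, the new data could reach NVM before the log entry, leaving no way to roll back after a crash.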
System monitoring is an established tool for measuring the utilization and health of HPC systems. Usually, system monitoring infrastructures make no connection to job information and do not utilize hardware performance monitoring (HPM) data. To increase the efficient use of HPC systems, automatic and continuous performance monitoring of jobs is an essential component. It can help to identify pathological...
This paper proposes a novel hybrid transactional memory scheme based on both abort prediction and an adaptive retry policy. First, the proposed scheme can predict not only conflicts between transactions running concurrently, but also the capacity and other aborts of transactions by collecting the information of previously executed transactions. Second, the proposed scheme can provide an adaptive retry...
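To make the adaptive-retry idea concrete: in a hybrid TM, the hardware reports why a transaction aborted, and the retry budget can depend on that cause before falling back to a software path such as a global lock. The policy below is a made-up illustrative sketch of this pattern, not the paper's predictor, and all names and budget values are our own.

```c
/* Abort causes as a hardware TM might classify them. */
typedef enum { ABORT_CONFLICT, ABORT_CAPACITY, ABORT_OTHER } abort_cause;

/* Adaptive retry policy sketch: returns how many more hardware retries
 * remain, or 0 to take the software fallback (e.g. a global lock). */
int retries_for(abort_cause cause, int attempts_so_far) {
    switch (cause) {
    case ABORT_CONFLICT:   /* transient contention: likely to succeed soon */
        return attempts_so_far < 5 ? 5 - attempts_so_far : 0;
    case ABORT_CAPACITY:   /* working set exceeds HW limits: deterministic,
                            * retrying in hardware cannot help */
        return 0;
    default:               /* interrupts, faults, etc.: allow one more try */
        return attempts_so_far < 1 ? 1 : 0;
    }
}
```

The key distinction the abstract draws is visible here: conflict aborts are worth retrying, whereas capacity aborts repeat deterministically, so recognizing them early avoids wasted re-executions.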
Scientific computing requires trust in results. In high-performance computing, trust is impeded by silent data corruption (SDC), i.e., corruption that remains unnoticed. Numerical integration solvers are especially sensitive to SDCs because an SDC introduced in a certain step affects all the following steps. SDCs can even cause the solver to become unstable. Adaptive solvers can change the...
Parallelization on a GPU (graphics processing unit) cluster is an effective approach to reducing the huge computation time of backprojection, which is the most accurate SAR (synthetic aperture radar) imaging algorithm for reconstructing images with no errors caused by the platform motion. To obtain accurate imagery in real-time, we developed a distributed parallel backprojection algorithm for stripmap...
Efficiently programming shared-memory machines is challenging because mapping application threads onto the memory hierarchy has a strong impact on performance. However, optimizing such thread placement is difficult: architectures become increasingly complex, and application behavior changes with implementations and input parameters, e.g., problem size and number of threads. In this work,...
We present EclipseMR, a novel MapReduce framework prototype that efficiently utilizes a large distributed memory in cluster environments. EclipseMR consists of double-layered consistent hash rings: a decentralized DHT-based file system and an in-memory key-value store that employs consistent hashing. The in-memory key-value store in EclipseMR is designed not only to cache local data but also remote...
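The consistent-hashing lookup underlying such hash rings can be sketched briefly. This is only the textbook single-ring primitive, not EclipseMR's double-layered design (virtual nodes and replication are also omitted), and the toy hash and names are our own.

```c
#include <stddef.h>

/* Ring positions live in [0, RING). A key is owned by the first node at
 * or after the key's hash position, wrapping around the ring. */
#define RING 1024u

/* node_pos must be sorted ascending; h is a position on the ring. */
size_t owner_of(const unsigned *node_pos, size_t n_nodes, unsigned h) {
    for (size_t i = 0; i < n_nodes; ++i)
        if (node_pos[i] >= h)
            return i;
    return 0;                         /* wrap around to the first node */
}

/* Toy mixing hash for the sketch; a real system would use a stronger one. */
unsigned ring_hash(unsigned key) {
    key ^= key >> 16; key *= 0x45d9f3bu; key ^= key >> 16;
    return key % RING;
}

size_t lookup(const unsigned *node_pos, size_t n_nodes, unsigned key) {
    return owner_of(node_pos, n_nodes, ring_hash(key));
}
```

The property that makes this attractive for a distributed cache like the one described above: removing a node only remaps the keys that node owned (to its ring successor), leaving every other key's placement untouched.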
High-radix, low-diameter, hierarchical networks based on the Dragonfly topology are common choices for building next-generation HPC systems. However, effective tools are lacking for analyzing the network performance and exploring the design choices of such emerging networks at scale. In this paper, we present visual analytics methods that couple data aggregation techniques with interactive visualizations...
A heterogeneous memory system (HMS) consists of multiple memory components with different properties. GPUs are a representative architecture with an HMS. It is challenging to decide the optimal placement of data objects on an HMS because of the large exploration space and the complicated memory hierarchy. In this paper, we introduce performance modeling techniques to predict performance of various data placements...
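A toy version of the kind of model such placement decisions rest on: estimate the access time of a data object in each candidate memory from latency and bandwidth, and place it where the estimate is lowest. This is a generic sketch, not the paper's model; the numbers used in any example are illustrative, not measurements.

```c
#include <stddef.h>

/* One candidate memory in the heterogeneous system. */
typedef struct { const char *name; double latency_us; double gb_per_s; } mem_t;

/* Predicted time (microseconds) to stream `bytes` from memory m:
 * fixed latency plus transfer time. 1 GB/s = 1e3 bytes/us. */
double predict_us(mem_t m, double bytes) {
    return m.latency_us + bytes / (m.gb_per_s * 1e3);
}

/* Index of the memory with the lowest predicted access time. */
size_t best_placement(const mem_t *mems, size_t n, double bytes) {
    size_t best = 0;
    for (size_t i = 1; i < n; ++i)
        if (predict_us(mems[i], bytes) < predict_us(mems[best], bytes))
            best = i;
    return best;
}
```

Even this trivial model captures why placement is non-obvious: a high-bandwidth but high-latency memory wins for large objects, while a low-latency one wins for small objects, and the crossover depends on the object size.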
Multi-/many-core CPU-based architectures are seeing widespread adoption due to their unprecedented compute performance in a small power envelope. With the increasingly large number of cores on each node, applications spend a significant portion of their execution time in intra-node communication. While shared memory is commonly used for intra-node communication, it needs to copy each message once...