2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)

book

IEEE

chapter

A GPU-Friendly Skiplist Algorithm

Nurit Moscovici, Nachshon Cohen, Erez Petrank

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 246 - 259

We propose a design for a fine-grained lock-based skiplist optimized for Graphics Processing Units (GPUs). While GPUs are often used to accelerate streaming parallel computations, it remains a significant challenge to efficiently offload concurrent computations with more complicated data-irregular access and fine-grained synchronization. Natural building blocks for such computations would be concurrent...

chapter

Application Clustering Policies to Address System Fairness with Intel’s Cache Allocation Technology

Vicent Selfa, Julio Sahuquillo, Lieven Eeckhout, Salvador Petit, et al.

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 194 - 205

Achieving system fairness is a major design concern in current multicore processors. Unfairness arises due to contention in the shared resources of the system, such as the LLC and main memory. To address this problem, many research works have proposed novel cache partitioning policies aimed at addressing system fairness without harming performance. Unfortunately, existing proposals targeting fairness...

chapter

POSTER: Cutting the Fat: Speeding Up RBM for Fast Deep Learning Through Generalized Redundancy Elimination

Lin Ning, Randall Pittman, Xipeng Shen

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 154 - 155

Restricted Boltzmann Machine (RBM) is the building block of Deep Belief Nets and other deep learning tools. Fast learning and prediction are both essential for practical usage of RBM-based machine learning techniques. This paper presents a concept named generalized redundancy elimination to avoid most of the computations required in RBM learning and prediction without changing the results. It...

chapter

Artifact Evaluation Committee

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > xv

Provides a listing of current committee members and society officers.

chapter

Lightweight Provenance Service for High-Performance Computing

Dong Dai, Yong Chen, Philip Carns, John Jenkins, et al.

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 117 - 129

Provenance describes detailed information about the history of a piece of data, containing the relationships among elements such as users, processes, jobs, and workflows that contribute to the existence of data. Provenance is key to supporting many data management functionalities that are increasingly important in operations such as identifying data sources, parameters, or assumptions behind a given...

chapter

Cache Automaton: Repurposing Caches for Automata Processing

Arun Subramaniyan, Jingcheng Wang, Ezhil R. M. Balasubramanian, David Blaauw, et al.

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 373

Finite State Automata (FSA) are powerful computational models for extracting patterns from large streams (TBs/PBs) of unstructured data such as system logs, social media posts, emails, and news articles. FSA are also widely used in network security [6] and bioinformatics [4] to enable efficient pattern matching. Compute-centric architectures like CPUs and GPGPUs perform poorly on automata processing...

chapter

Leeway: Addressing Variability in Dead-Block Prediction for Last-Level Caches

Priyank Faldu, Boris Grot

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 180 - 193

The looming breakdown of Moore's Law and the end of voltage scaling are ushering in a new era where neither transistors nor the energy to operate them is free. This calls for a new regime in computer systems, one in which every transistor counts. Caches are essential for processor performance and represent the bulk of modern processors' transistor budgets. To get more performance out of the cache hierarchy,...

chapter

Transparent Dual Memory Compression Architecture

Seikwon Kim, Seonyoung Lee, Taehoon Kim, Jaehyuk Huh

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 206 - 218

The increasing memory requirements of big data applications have been driving the precipitous growth of memory capacity in server systems. To maximize the efficiency of external memory, HW-based memory compression techniques have been proposed to increase effective memory capacity. Although such memory compression techniques can improve the memory efficiency significantly, a critical trade-off exists...

chapter

Nexus: A New Approach to Replication in Distributed Shared Caches

Po-An Tsai, Nathan Beckmann, Daniel Sanchez

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 166 - 179

Last-level caches are increasingly distributed, consisting of many small banks. To perform well, most accesses must be served by banks near requesting cores. An attractive approach is to replicate read-only data so that a copy is available nearby. But replication introduces a delicate tradeoff between capacity and latency: too little replication forces cores to access faraway banks, while too much...

chapter

POSTER: Improving Datacenter Efficiency Through Partitioning-Aware Scheduling

Harshad Kasture, Xu Ji, Nosayba El-Sayed, Nathan Beckmann, et al.

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 134 - 135

Datacenter servers often colocate multiple applications to improve utilization and efficiency. However, colocated applications interfere in shared resources, e.g., the last-level cache (LLC) and DRAM bandwidth, causing performance inefficiencies. Prior work has proposed two disjoint approaches to address interference. First, techniques that partition shared resources like the LLC can provide isolation...

chapter

POSTER: Improving NUMA System Efficiency with a Utilization-Based Co-scheduling

Younghyun Cho, Camilo A. Celis Guzman, Bernhard Egger

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 150 - 151

This work proposes a co-scheduling technique for co-located parallel applications on Non-Uniform Memory Access (NUMA) multi-socket multi-core platforms. The technique allocates core resources for running parallel applications such that both the utilization of the memory controllers and the CPU cores are maximized. Utilization is predicted using an online performance prediction model based on queuing...

chapter

POSTER: Bridge the Gap Between Neural Networks and Neuromorphic Hardware

Yu Ji, YouHui Zhang, WenGuang Chen, Yuan Xie

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 148 - 149

Unlike training common neural networks (NNs) for inference on general-purpose processors, developing NNs for neuromorphic chips usually faces a number of hardware-specific restrictions. This paper proposes a systematic methodology to address the challenge. It can transform an existing trained, unrestricted NN (usually targeting a software execution substrate) into an equivalent network...

chapter

Graphie: Large-Scale Asynchronous Graph Traversals on Just a GPU

Wei Han, Daniel Mawhirter, Bo Wu, Matthew Buland

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 233 - 245

Most GPU-based graph systems cannot handle large-scale graphs that do not fit in the GPU memory. The ever-increasing graph size demands a scale-up graph system, which can run on a single GPU with optimized memory access efficiency and well-controlled data transfer overhead. However, existing systems either incur redundant data transfers or fail to use shared memory. In this paper we present Graphie,...

chapter

Efficient Checkpointing of Loop-Based Codes for Non-volatile Main Memory

Hussein Elnawawy, Mohammad Alshboul, James Tuck, Yan Solihin

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 318 - 329

Future main memory will likely include Non-Volatile Memory. Non-Volatile Main Memory (NVMM) provides an opportunity to rethink checkpointing strategies for providing failure safety to applications. While there are many checkpointing and logging schemes in the literature, their use must be revisited as they incur high execution time overheads as well as a large number of additional writes to NVMM, which...

chapter

Sthira: A Formal Approach to Minimize Voltage Guardbands under Variation in Networks-on-Chip for Energy Efficiency

Raghavendra Pradyumna Pothukuchi, Amin Ansari, Bhargava Gopireddy, Josep Torrellas

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 260 - 272

Networks-on-Chip (NoCs) in chip multiprocessors are prone to within-die process variation as they span the whole chip. To tolerate variation, their voltages (Vdd) carry over-provisioned guardbands. As a result, prior work has proposed to save energy by operating at reduced Vdd while occasionally suffering and fixing errors. Unfortunately, these proposals use heuristic controller designs that provide...

chapter

End-to-End Deep Learning of Optimization Heuristics

Chris Cummins, Pavlos Petoumenos, Zheng Wang, Hugh Leather

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 219 - 232

Accurate automatic optimization heuristics are necessary for dealing with the complexity and diversity of modern hardware and software. Machine learning is a proven technique for learning such heuristics, but its success is bound by the quality of the features used. These features must be hand crafted by developers through a combination of expert domain knowledge and trial and error. This makes the quality...

chapter

Near-Memory Address Translation

Javier Picorel, Djordje Jevdjic, Babak Falsafi

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 303 - 317

Memory and logic integration on the same chip is becoming increasingly cost effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as the first big showstopper: without efficient hardware support for address translation, MPUs have...

chapter

POSTER: Exploiting Approximations for Energy/Quality Tradeoffs in Service-Based Applications

Liu Liu, Sibren Isaacman, Abhishek Bhattacharjee, Ulrich Kremer

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 156 - 157

Approximations and redundancies allow mobile and distributed applications to produce answers or outcomes of lesser quality at lower costs. This paper introduces RAPID, a new programming framework and methodology for service-based applications with approximations and redundancies. Finding the best service configuration under a given resource budget becomes a constrained, dual-weight graph optimization...

chapter

POSTER: BigBus: A Scalable Optical Interconnect

Eldhose Peter, Janibul Bashir, Smruti R. Sarangi

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 162 - 163

This paper presents BigBus, a novel on-chip photonic network for a 1024-node system. The crux of the idea is to segment the entire system into smaller clusters of nodes, and adopt a hybrid strategy for each segment that includes conventional laser modulation, as well as a novel technique for sharing power across nodes dynamically. We represent energy internally as tokens, where one token will allow...
