2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

IEEE

chapter

Title page

p. 1

chapter

Copyright page

p. 1

chapter

A parallel abstract interpreter for JavaScript

Kyle Dewey, Vineeth Kashyap, Ben Hardekopf

pp. 34-45

We investigate parallelizing flow- and context-sensitive static analysis for JavaScript. Previous attempts to parallelize such analyses for other languages typically start with the traditional framework of sequential dataflow analysis, and then propose methods to parallelize the existing sequential algorithms within this framework. However, we show that this approach is non-optimal and propose a new...

chapter

Automatic data placement into GPU on-chip memory resources

Chao Li, Yi Yang, Zhen Lin, Huiyang Zhou

pp. 23-33

Although graphics processing units (GPUs) rely on thread-level parallelism to hide long off-chip memory access latency, judicious utilization of on-chip memory resources, including register files, shared memory, and data caches, is critical to application performance. However, explicitly managing GPU on-chip memory resources is a non-trivial task for application developers. More importantly, as on-chip...

chapter

On performance debugging of unnecessary lock contentions on multicore processors: A replay-based approach

Long Zheng, Xiaofei Liao, Bingsheng He, Song Wu, et al.

pp. 56-67

Locks have been widely used as an effective synchronization mechanism among processes and threads. However, we observe that a large number of false inter-thread dependencies (i.e., unnecessary lock contentions) exist during the program execution on multicore processors, thereby incurring significant performance overhead. This paper presents a performance debugging framework, PerfPlay, to facilitate...

chapter

Locality aware concurrent start for stencil applications

Sunil Shrestha, Guang R. Gao, Joseph Manzano, Andres Marquez, et al.

pp. 157-166

Stencil computations are at the heart of many physical simulations used in scientific codes. Thus, there exists a plethora of optimization efforts for this family of computations. Among these techniques, tiling techniques that allow concurrent start have proven to be very efficient in providing better performance for these critical kernels. Nevertheless, with many-core designs being the norm, these...
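A stencil updates each grid point from a fixed pattern of its neighbors, and repeats this over time steps. The sketch below is a minimal illustration, assuming a 1-D three-point averaging stencil with fixed boundary values; the name `jacobi_1d` is hypothetical. It shows the dependence of each point on its neighbors at the previous time step, which any tiling scheme (concurrent start included) must respect. It does not reproduce the paper's tiling.

```python
def jacobi_1d(u, steps):
    """Apply a 3-point averaging stencil `steps` times; boundary values stay fixed.

    Hypothetical helper for illustration only.
    """
    u = list(u)
    for _ in range(steps):
        nxt = u[:]  # double buffering: read old values, write new ones
        for i in range(1, len(u) - 1):
            # Each new value depends on three neighbors from the previous step.
            nxt[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = nxt
    return u


print(jacobi_1d([0.0, 0.0, 9.0, 0.0, 0.0], 1))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

Because step t+1 reads the neighbors computed at step t, tiles that span several time steps must be skewed or ordered so that those values are available before they are read.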

chapter

Branch prediction and the performance of interpreters — Don't trust folklore

Erven Rohou, Bharath Narasimha Swamy, Andre Seznec

pp. 103-114

Interpreters have been used in many contexts. They provide portability and ease of development at the expense of performance. The literature of the past decade covers analysis of why interpreters are slow, and many software techniques to improve them. A large proportion of these works focuses on the dispatch loop, and in particular on the implementation of the switch statement: typically an indirect...
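The dispatch loop mentioned in this abstract is a fetch-decode-execute loop over bytecode. The following is an illustrative sketch, not code from the paper: the opcodes, the stack-machine design and the function name `run` are hypothetical. The `if`/`elif` chain stands in for the `switch` statement that a C interpreter would use at the same point.

```python
# Hypothetical opcodes for a tiny stack-based bytecode (illustrative only).
PUSH, ADD, MUL, PRINT, HALT = range(5)


def run(code):
    """Execute `code` (a flat list of opcodes and operands); return printed values."""
    stack = []
    output = []
    pc = 0  # program counter
    while True:
        op = code[pc]  # fetch
        pc += 1
        # Dispatch: in C this would be `switch (op) { case PUSH: ... }`.
        if op == PUSH:
            stack.append(code[pc])  # the operand follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            output.append(stack.pop())
        elif op == HALT:
            return output
        else:
            raise ValueError(f"unknown opcode {op} at {pc - 1}")


# (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT, HALT]
print(run(program))  # [20]
```

Every iteration returns to the same dispatch point, so the branch that selects the next handler is executed once per bytecode instruction.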

chapter

Optimizing the flash-RAM energy trade-off in deeply embedded systems

James Pallister, Kerstin Eder, Simon J. Hollis

pp. 115-124

Deeply embedded systems often have the tightest constraints on energy consumption, requiring that they consume tiny amounts of current and run on batteries for years. However, they typically execute code directly from flash, instead of the more energy-efficient RAM. We implement a novel compiler optimization that exploits the relative efficiency of RAM by statically moving carefully selected basic...

chapter

EMEURO: A framework for generating multi-purpose accelerators via deep learning

Lawrence McAfee, Kunle Olukotun

pp. 125-135

Approximate computing is a very promising design paradigm for crossing the CPU power wall, primarily driven by the potential to sacrifice output quality for significant gains in performance, energy, and fault tolerance. Unfortunately, existing solutions have primarily either focused on new programming models, or new hardware designs, leaving significant room between these two ends for software-based...

chapter

Data provenance tracking for concurrent programs

Brandon Lucia, Luis Ceze

pp. 146-156

We propose Last Writer Slicing (LWS), a mechanism for tracking data provenance information in multithreaded code in a production setting. Last writer slices dynamically track provenance of values by recording the thread and operation that last wrote each variable. We show that this information complements core dumps and greatly improves debuggability. We also propose communication traps (CTraps), an...
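The core bookkeeping described here, recording for each variable the thread and operation that last wrote it, can be sketched with a shadow map. This sketch assumes variables are identified by strings and operations by free-form labels; `LastWriterTracker` and its methods are hypothetical, and the sketch says nothing about how LWS keeps overhead low enough for production use.

```python
import threading


class LastWriterTracker:
    """Hypothetical shadow store: for each variable, remember which thread and
    which operation performed the most recent write (illustrative only)."""

    def __init__(self):
        self._values = {}
        self._last_writer = {}  # variable name -> (thread name, operation label)
        self._lock = threading.Lock()  # keeps value and metadata updates atomic

    def write(self, var, value, op):
        with self._lock:
            self._values[var] = value
            self._last_writer[var] = (threading.current_thread().name, op)

    def read(self, var):
        with self._lock:
            return self._values[var]

    def last_writer(self, var):
        # Returns None for a variable that has never been written.
        with self._lock:
            return self._last_writer.get(var)


tracker = LastWriterTracker()


def worker(op_label, value):
    tracker.write("balance", value, op_label)


# Threads are joined one after the other so the final writer is deterministic.
t1 = threading.Thread(target=worker, args=("deposit@line12", 100), name="T1")
t1.start()
t1.join()
t2 = threading.Thread(target=worker, args=("withdraw@line30", 40), name="T2")
t2.start()
t2.join()

print(tracker.read("balance"), tracker.last_writer("balance"))
# 40 ('T2', 'withdraw@line30')
```

When a failure is observed, the recorded pair points at the thread and operation responsible for the current value, which is the kind of information a core dump alone does not provide.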

chapter

Optimizing and auto-tuning scale-free sparse matrix-vector multiplication on Intel Xeon Phi

Wai Teng Tang, Ruizhe Zhao, Mian Lu, Yun Liang, et al.

pp. 136-145

Recently, the Intel Xeon Phi coprocessor has received increasing attention in high performance computing due to its simple programming model and highly parallel architecture. In this paper, we implement sparse matrix vector multiplication (SpMV) for scale-free matrices on the Xeon Phi architecture and optimize its performance. Scale-free sparse matrices are widely used in various application domains,...
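For reference, SpMV computes y = A x for a sparse matrix A. The sketch below assumes the compressed sparse row (CSR) format, chosen here only for illustration because the abstract does not state which format the paper uses. The scalar loop also ignores the vectorization and auto-tuning that the paper targets on Xeon Phi. `spmv_csr` is a hypothetical name.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a sparse matrix A in CSR form.

    row_ptr[i]:row_ptr[i+1] delimits the nonzeros of row i;
    col_idx and vals hold their column indices and values.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[10, 0, 2],
#      [ 0, 0, 0],
#      [ 3, 4, 0]]
# Row 1 is empty; in scale-free matrices row lengths vary widely.
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 0, 1]
vals = [10.0, 2.0, 3.0, 4.0]
print(spmv_csr(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [12.0, 0.0, 7.0]
```

The inner loop's trip count equals the number of nonzeros in a row, so uneven row lengths translate directly into uneven work per row.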

chapter

Message from the general chairs

pp. 1-2

chapter

Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures

Hee-Seok Kim, Izzat El Hajj, John Stratton, Steven Lumetta, et al.

pp. 257-268

With heterogeneous computing on the rise, executing programs efficiently on different devices from a single source code has become increasingly important. OpenCL, having a bulk-synchronous programming model, has been proposed as a framework for writing such performance-portable programs. Execution order of work-items in a program is unconstrained except at barrier synchronization events, giving some...

chapter

PSLP: Padded SLP automatic vectorization

Vasileios Porpodas, Alberto Magni, Timothy M. Jones

pp. 190-201

The need to increase performance and power efficiency in modern processors has led to a wide adoption of SIMD vector units. All major vendors support vector instructions and the trend is pushing them to become wider and more powerful. However, writing code that makes efficient use of these units is hard and leads to platform-specific implementations. Compiler-based automatic vectorization is one solution...

chapter

Scalable conditional induction variables (CIV) analysis

Cosmin E. Oancea, Lawrence Rauchwerger

pp. 213-224

Subscripts using induction variables that cannot be expressed as a formula in terms of the enclosing-loop indices appear in the low-level implementation of common programming abstractions, such as filter or stack operations, and pose significant challenges to automatic parallelization. Because the complexity of such induction variables is often due to their conditional evaluation across the iteration...
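A filter loop shows the kind of subscript described above: the output index advances only when a condition holds, so it cannot be written as a closed-form function of the loop index. This sketch (the name `filter_positive` is hypothetical) illustrates the pattern only and does not implement the paper's analysis.

```python
def filter_positive(a):
    """Copy the positive elements of `a` into a compacted output list.

    `civ` is a conditional induction variable: it is incremented only when
    the branch is taken, so at the start of iteration i it equals the number
    of earlier elements that passed the test, not a formula in i alone.
    """
    out = [None] * len(a)
    civ = 0
    for i in range(len(a)):
        if a[i] > 0:
            out[civ] = a[i]  # subscript uses civ, not the loop index i
            civ += 1
    return out[:civ]


print(filter_positive([3, -1, 0, 5, -2, 8]))  # [3, 5, 8]
```

Each iteration reads the value of `civ` left by all earlier iterations, which is the loop-carried dependence that keeps a naive parallelizer from splitting this loop.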

chapter

HELIX-UP: Relaxing program semantics to unleash parallelization

Simone Campanoni, Glenn Holloway, Gu-Yeon Wei, David Brooks

pp. 235-245

Automatic generation of parallel code for general-purpose commodity processors is a challenging computational problem. Nevertheless, there is a lot of latent thread-level parallelism in the way sequential programs are actually used. To convert latent parallelism into performance gains, users may be willing to compromise on the quality of a program's results. We have developed a parallelizing compiler...

chapter

Approximating flow-sensitive pointer analysis using frequent itemset mining

Vaivaswatha Nagaraj, R. Govindarajan

pp. 225-234

Pointer alias analysis is a well-researched problem in the area of compilers and program verification. Many recent works in this area have focused on flow-sensitivity due to the additional precision it offers. However, a flow-sensitive analysis is computationally expensive, which prevents its use in larger programs. In this work, we observe that a number of object sets, consisting of tens to hundreds...

chapter

Hermes: A fast cross-ISA binary translator with post-optimization

Xiaochun Zhang, Qi Guo, Yunji Chen, Tianshi Chen, et al.

pp. 246-256

In the era of mobile and cloud computing, cross-ISA (Instruction Set Architecture) binary translation attracts increasing attention due to the ISA diversity of computing platforms. To easily adapt to vast guest- and host-ISAs with minimal porting efforts, existing cross-ISA binary translators (e.g., QEMU) are typically built upon ISA-independent Intermediate Representation (IR). Although IR conceals...

chapter

A graph-based higher-order intermediate representation

Roland Leisa, Marcel Koster, Sebastian Hack

pp. 202-212

Many modern programming languages support both imperative and functional idioms. However, state-of-the-art imperative intermediate representations (IRs) cannot natively represent crucial functional concepts (like higher-order functions). On the other hand, functional IRs employ an explicit scope nesting, which is cumbersome to maintain across certain transformations. In this paper we present Thorin:...
