2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Deep Learning is redefining computing. Deep Neural Networks, or DNNs, have led to breakthrough accuracy improvements for tasks formerly considered AI, like speech recognition, image classification, and translation. Recurrent DNNs are differentiable universal computers. DNNs are layered structures of relatively simple functions with millions to billions of learnable model parameters. The challenge...

chapter

Poster session

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > xii - xviii

chapter

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > xix - xx

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

chapter

Cover page

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 1 - 2

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

chapter

Legato: End-to-end bounded region serializability using commodity hardware transactional memory

Aritra Sengupta, Man Cao, Michael D. Bond, Milind Kulkarni

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 1 - 13

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Shared-memory languages and systems provide strong guarantees only for well-synchronized (data-race-free) programs. Prior work introduces support for memory consistency based on region serializability of executing code regions, but all approaches incur serious limitations such as adding high run-time overhead or relying on complex custom hardware. This paper explores the potential for leveraging widely...

chapter

Automatic detection of extended data-race-free regions

Alexandra Jimborean, Jonatan Waern, Per Ekemark, Stefanos Kaxiras, more

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 14 - 26

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Data-race-free (DRF) parallel programming becomes a standard as newly adopted memory models of mainstream programming languages such as C++ or Java impose data-race-freedom as a requirement. We propose compiler techniques that automatically delineate extended data-race-free regions (xDRF), namely regions of code which provide the same guarantees as the synchronization-free regions (in the context...

chapter

FinePar: Irregularity-aware fine-grained workload partitioning on integrated architectures

Feng Zhang, Bo Wu, Jidong Zhai, Bingsheng He, more

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 27 - 38

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

The integrated architecture that features both CPU and GPU on the same die is an emerging and promising architecture for fine-grained CPU-GPU collaboration. However, the integration also brings forward several programming and system optimization challenges, especially for irregular applications. The complex interplay between heterogeneity and irregularity leads to very low processor utilization of...

chapter

TwinKernels: An execution model to improve GPU hardware scheduling at compile time

Xiang Gong, Zhongliang Chen, Amir Kavyan Ziabari, Rafael Ubal, more

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 39 - 49

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

As throughput-oriented accelerators, GPUs provide tremendous processing power by running a massive number of threads in parallel. However, exploiting high degrees of thread-level parallelism (TLP) does not always translate to the peak performance that GPUs can offer, leaving the GPU's resources often under-utilized. Compared to compute resources, memory resources can tolerate considerably lower levels...

chapter

Taming warp divergence

Jayvant Anantpur, R. Govindarajan

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 50 - 60

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Graphics Processing Units (GPUs) are designed to exploit large amount of parallelism. However, warp-level divergence occurring due to different amounts of work, memory access latency experienced, etc., results in warps of a thread block (TB) finishing kernel execution at different points in time. This, in effect, reduces utilization of resources of SMs and hence performance of the GPU. We propose...

chapter

Dynamic buffer overflow detection for GPGPUs

Christopher Erb, Mike Collins, Joseph L. Greathouse

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 61 - 73

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Buffer overflows are a common source of program crashes, data corruption, and security problems. In this work, we demonstrate that GPU-based workloads can also cause buffer overflows, a problem that was traditionally ignored because CPUs and GPUs had separate memory spaces. Modern GPUs share virtual, and sometimes physical, memory with CPUs, meaning that GPU-based buffer overflows are capable of producing...

chapter

LIFT: A functional data-parallel IR for high-performance GPU code generation

Michel Steuwer, Toomas Remmelg, Christophe Dubach

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 74 - 85

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Parallel patterns (e.g., map, reduce) have gained traction as an abstraction for targeting parallel accelerators and are a promising answer to the performance portability problem. However, compiling high-level programs into efficient low-level parallel code is challenging. Current approaches start from a high-level parallel IR and proceed to emit GPU code directly in one big step. Fixed strategies...

chapter

Synthesizing benchmarks for predictive modeling

Chris Cummins, Pavlos Petoumenos, Zheng Wang, Hugh Leather

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 86 - 99

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Predictive modeling using machine learning is an effective method for building compiler heuristics, but there is a shortage of benchmarks. Typical machine learning experiments outside of the compilation field train over thousands or millions of examples. In machine learning for compilers, however, there are typically only a few dozen common benchmarks available. This limits the quality of learned...

chapter

Formalizing the concurrency semantics of an LLVM fragment

Soham Chakraborty, Viktor Vafeiadis

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 100 - 110

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

The LLVM compiler follows closely the concurrency model of C/C++ 2011, but with a crucial difference. While in C/C++ a data race between a non-atomic read and a write is declared to be undefined behavior, in LLVM such a race has defined behavior: the read returns the special ‘undef’ value. This subtle difference in the semantics of racy programs has profound consequences on the set of allowed program...

chapter

ThinLTO: Scalable and incremental LTO

Teresa Johnson, Mehdi Amini, Xinliang David Li

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 111 - 121

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Cross-Module Optimization (CMO) is an effective means for improving runtime performance, by extending the scope of optimizations across source module boundaries. Two CMO approaches are Link-Time Optimization (LTO) and Lightweight Inter-Procedural Optimization (LIPO). However, each of these solutions has limitations that prevent it from being enabled by default. ThinLTO is a new approach that attempts...

chapter

Automatic generation of fast BLAS3-GEMM: A portable compiler approach

Xing Su, Xiangke Liao, Jingling Xue

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 122 - 133

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

GEMM is the main computational kernel in BLAS3. Its micro-kernel is either hand-crafted in assembly code or generated from C code by general-purpose compilers (guided by architecture-specific directives or auto-tuning). Therefore, either performance or portability suffers. We present a POrtable Compiler Approach, Poca, implemented in LLVM, to automatically generate and optimize this micro-kernel in...

chapter

Pointer disambiguation via strict inequalities

Maroua Maalej, Vitor Paisante, Pedro Ramos, Laure Gonnord, more

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 134 - 147

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

The design and implementation of static analyses that disambiguate pointers has been a focus of research since the early days of compiler construction. One of the challenges that arise in this context is the analysis of languages that support pointer arithmetics, such as C, C++ and assembly dialects. This paper contributes to solve this challenge. We start from an obvious, yet unexplored, observation:...

Publication date

Set your own date range

Keywords

OPTIMIZATION (15)
HARDWARE (10)
KERNEL (7)
GRAPHICS PROCESSING UNITS (6)
PROGRAM PROCESSORS (6)
MEMORY MANAGEMENT (5)
RUNTIME (5)
SEMANTICS (5)
BENCHMARK TESTING (4)
INSTRUCTION SETS (4)
REGISTERS (4)
ALGORITHM DESIGN AND ANALYSIS (3)
ARRAYS (3)
COMPUTATIONAL MODELING (3)
COMPUTER ARCHITECTURE (3)
LAYOUT (3)
PROCESSOR SCHEDULING (3)
STANDARDS (3)
SYNCHRONIZATION (3)
TRAINING (3)
COLLABORATION (2)
COMPILER ANALYSIS (2)
ENGINES (2)
GOOGLE (2)
HISTOGRAMS (2)
INFORMATICS (2)
INSTRUMENTS (2)
LOAD MODELING (2)
MACHINE LEARNING (2)
MEASUREMENT (2)
PARALLEL PROCESSING (2)
PREFETCHING (2)
PROGRAMMING (2)
ACTIVE LEARNING (1)
ALGORITHMS (1)
ALIAS ANALYSIS (1)
APPROXIMATE COMPUTING (1)
APPROXIMATION ALGORITHMS (1)
BARS (1)
BENCHMARKING (1)
C++ LANGUAGES (1)
COARSE-GRAINED RECONFIGURABLE ARCHITECTURE (1)
CODE COMPRESSION (1)
CODE OPTIMIZATION (1)
COGNITION (1)
COHERENCE (1)
COLLABORATIVE ANALYSIS (1)
COMPILERS (1)
COMPUTATIONAL IDIOMS (1)
COMPUTER CRASHES (1)
CONCURRENT COMPUTING (1)
CONSTRAINT SOLVER (1)
CONTEXT (1)
CROSS-MODULE (1)
CRYPTOGRAPHY (1)
DATA MODELS (1)
DATABASES (1)
DEEP LEARNING (1)
DEGRADATION (1)
DEMAND-DRIVEN ANALYSIS (1)
DENSE LINEAR ALGEBRA (1)
DEPENDENCE ANALYSIS (1)
DETECTION ALGORITHMS (1)
DETECTORS (1)
DISTRIBUTED BRAGG REFLECTORS (1)
DIVERGENCE (1)
DYNAMIC BINARY TRANSLATION (1)
ELECTRIC FENCES (1)
EMULATION (1)
ENERGY CONSUMPTION (1)
ENERGY REDUCTION (1)
EXECUTION PHASES (1)
FACEBOOK (1)
GEMM (1)
GPU (1)
GPUS (1)
GRAMMAR (1)
INTER-PROCEDURAL (1)
IRON (1)
ITERATIVE COMPILATION (1)
LIBRARIES (1)
LINEAR ALGEBRA (1)
LINK-TIME OPTIMIZATION (1)
LIPS (1)
LOADING (1)
MESSAGE SYSTEMS (1)
MICROMECHANICAL DEVICES (1)
MULTICORE PROCESSING (1)
NOISE MEASUREMENT (1)
OPENCL (1)
ORGANIZATIONS (1)
PARALLELIZATION (1)
PERFORMANCE EVALUATION (1)
PERIODIC STRUCTURES (1)
PINS (1)
PIPELINES (1)
POLYHEDRAL MODEL (1)
PRECISION (1)
PREDICTIVE MODELS (1)
PRESBURGER PRECONDITION (1)
more

INFONA - science communication portal

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Message from the general chair

Report from the artifact evaluation committee

Organization committee

Sponsors

Keynote: The computer science behind the Microsoft Cognitive Toolkit: An open source large-scale deep learning toolkit for Windows and Linux

Poster session

Contents

Cover page

Legato: End-to-end bounded region serializability using commodity hardware transactional memory

Automatic detection of extended data-race-free regions

FinePar: Irregularity-aware fine-grained workload partitioning on integrated architectures

TwinKernels: An execution model to improve GPU hardware scheduling at compile time

Taming warp divergence

Dynamic buffer overflow detection for GPGPUs

LIFT: A functional data-parallel IR for high-performance GPU code generation

Synthesizing benchmarks for predictive modeling

Formalizing the concurrency semantics of an LLVM fragment

ThinLTO: Scalable and incremental LTO

Automatic generation of fast BLAS3-GEMM: A portable compiler approach

Pointer disambiguation via strict inequalities

Filter options

Publication date

Keywords

INFONA - science communication portal

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) $("#expandableTitles").expandable();

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)