2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

book

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

IEEE

chapter

Optimizing function placement for large-scale data-center applications

Guilherme Ottoni, Bertrand Maher

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 233 - 244

Modern data-center applications often comprise a large amount of code, with substantial working sets, making them good candidates for code-layout optimizations. Although recent work has evaluated the impact of profile-guided intramodule optimizations and some cross-module optimizations, no recent study has evaluated the benefit of function placement for such large-scale applications. In this paper,...

chapter

Clairvoyance: Look-ahead compile-time scheduling

Kim-Anh Tran, Trevor E. Carlson, Konstantinos Koukos, Magnus Sjalander, et al.

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 171 - 184

To enhance the performance of memory-bound applications, hardware designs have been developed to hide memory latency, such as the out-of-order (OoO) execution engine, at the price of increased energy consumption. Contemporary processor cores span a wide range of performance and energy efficiency options: from fast and power-hungry OoO processors to efficient, but slower in-order processors. The more...

chapter

Parallel associative reductions in Halide

Patricia Suriana, Andrew Adams, Shoaib Kamil

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 281 - 291

Halide is a domain-specific language for fast image processing that separates pipelines into the algorithm, which defines what values are computed, and the schedule, which defines how they are computed. Changes to the schedule are guaranteed to not change the results. While Halide supports parallelizing and vectorizing naturally data-parallel operations, it does not support the same scheduling for...

chapter

Discovery and exploitation of general reductions: A constraint based approach

Philip Ginsbach, Michael F. P. O'Boyle

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 269 - 280

Discovering and exploiting scalar reductions in programs has been studied for many years. The discovery of more complex reduction operations has, however, received less attention. Such reductions contain compile-time unknown parameters, indirect memory accesses and dynamic control flow, which are challenging for existing approaches. In this paper we develop a new compiler based approach that automatically...

chapter

Optimistic loop optimization

Johannes Doerfert, Tobias Grosser, Sebastian Hack

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 292 - 304

Compilers use static analyses to justify program optimizations. As every optimization must preserve the semantics of the original program, static analyses typically fall back to conservative approximations. Consequently, the set of states for which the optimization is invalid is overapproximated and potential optimization opportunities are missed. Instead of justifying the optimization statically,...

chapter

Software prefetching for indirect memory accesses

Sam Ainsworth, Timothy M. Jones

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 305 - 317

Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting proposition to solve this is software prefetching, where special non-blocking loads are used to bring data into the cache hierarchy just before being required. However, these are difficult to insert to effectively improve performance, and techniques for automatic insertion are currently limited. This paper develops...

chapter

Formalizing the concurrency semantics of an LLVM fragment

Soham Chakraborty, Viktor Vafeiadis

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 100 - 110

The LLVM compiler follows closely the concurrency model of C/C++ 2011, but with a crucial difference. While in C/C++ a data race between a non-atomic read and a write is declared to be undefined behavior, in LLVM such a race has defined behavior: the read returns the special ‘undef’ value. This subtle difference in the semantics of racy programs has profound consequences on the set of allowed program...

chapter

Report from the artifact evaluation committee

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > ix

chapter

Keynote: The computer science behind the Microsoft Cognitive Toolkit: An open source large-scale deep learning toolkit for Windows and Linux

Frank Seide

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > xi

Deep Learning is redefining computing. Deep Neural Networks, or DNNs, have led to breakthrough accuracy improvements for tasks formerly considered AI, like speech recognition, image classification, and translation. Recurrent DNNs are differentiable universal computers. DNNs are layered structures of relatively simple functions with millions to billions of learnable model parameters. The challenge...

chapter

Author index

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 318

chapter

Legato: End-to-end bounded region serializability using commodity hardware transactional memory

Aritra Sengupta, Man Cao, Michael D. Bond, Milind Kulkarni

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 1 - 13

Shared-memory languages and systems provide strong guarantees only for well-synchronized (data-race-free) programs. Prior work introduces support for memory consistency based on region serializability of executing code regions, but all approaches incur serious limitations such as adding high run-time overhead or relying on complex custom hardware. This paper explores the potential for leveraging widely...

chapter

Pointer disambiguation via strict inequalities

Maroua Maalej, Vitor Paisante, Pedro Ramos, Laure Gonnord, et al.

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 134 - 147

The design and implementation of static analyses that disambiguate pointers has been a focus of research since the early days of compiler construction. One of the challenges that arise in this context is the analysis of languages that support pointer arithmetic, such as C, C++ and assembly dialects. This paper contributes to solving this challenge. We start from an obvious, yet unexplored, observation:...

chapter

Minimizing the cost of iterative compilation with active learning

William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 245 - 256

Since performance is not portable between platforms, engineers must fine-tune heuristics for each processor in turn. This is such a laborious task that high-profile compilers, supporting many architectures, cannot keep up with hardware innovation and are actually out-of-date. Iterative compilation driven by machine learning has been shown to be efficient at generating portable optimization models...

chapter

Cross-ISA machine emulation for multicores

Emilio G. Cota, Paolo Bonzini, Alex Bennee, Luca P. Carloni

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 210 - 220

Speed, portability and correctness have traditionally been the main requirements for dynamic binary translation (DBT) systems. Given the increasing availability of multi-core machines as both emulation guests and hosts, scalability has emerged as an additional design objective. It has however been an elusive goal for two reasons: contention on common data structures such as the translation cache is...

chapter

Removing checks in dynamically typed languages through efficient profiling

Gem Dot, Alejandro Martinez, Antonio Gonzalez

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 257 - 268

Dynamically typed languages increase programmer productivity at the expense of some runtime overheads to manage the types of variables, since they are not declared at compile time and can change at runtime. One of the most important overheads is due to very frequent checks that are introduced in the specialized code to identify the type of the variables. In this paper, we present a HW/SW hybrid...

chapter

Dynamic buffer overflow detection for GPGPUs

Christopher Erb, Mike Collins, Joseph L. Greathouse

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 61 - 73

Buffer overflows are a common source of program crashes, data corruption, and security problems. In this work, we demonstrate that GPU-based workloads can also cause buffer overflows, a problem that was traditionally ignored because CPUs and GPUs had separate memory spaces. Modern GPUs share virtual, and sometimes physical, memory with CPUs, meaning that GPU-based buffer overflows are capable of producing...

chapter

TwinKernels: An execution model to improve GPU hardware scheduling at compile time

Xiang Gong, Zhongliang Chen, Amir Kavyan Ziabari, Rafael Ubal, et al.

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 39 - 49

As throughput-oriented accelerators, GPUs provide tremendous processing power by running a massive number of threads in parallel. However, exploiting high degrees of thread-level parallelism (TLP) does not always translate to the peak performance that GPUs can offer, leaving the GPU's resources often under-utilized. Compared to compute resources, memory resources can tolerate considerably lower levels...

chapter

Taming warp divergence

Jayvant Anantpur, R. Govindarajan

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 50 - 60

Graphics Processing Units (GPUs) are designed to exploit large amounts of parallelism. However, warp-level divergence occurring due to different amounts of work, memory access latency experienced, etc., results in warps of a thread block (TB) finishing kernel execution at different points in time. This, in effect, reduces utilization of resources of SMs and hence performance of the GPU. We propose...
