2008 International Conference on Parallel Architectures and Compilation Techniques (PACT)

Title pages

p. c1

GPU evolution: Will graphics morph into compute?

Norm Rubin

p. 1

Outer-loop vectorization - revisited for short SIMD architectures

Dorit Nuzman, Ayal Zaks

pp. 2-11

Vectorization has been an important method of using data-level parallelism to accelerate scientific workloads on vector machines such as Cray for the past three decades. In the last decade it has also proven useful for accelerating multimedia and embedded applications on short SIMD architectures such as MMX, SSE and AltiVec. Most of the focus has been directed at innermost loops, effectively executing...

Redundancy elimination revisited

Keith Cooper, Jason Eckhardt, Ken Kennedy

pp. 12-21

This work proposes and evaluates improvements to previously known algorithms for redundancy elimination.

Exploiting loop-dependent stream reuse for stream processors

Xuejun Yang, Ying Zhang, Jingling Xue, Ian Rogers, et al.

pp. 22-31

Memory access limits the performance of stream processors. By exploiting the reuse of data held in the Stream Register File (SRF), an on-chip storage, the number of memory accesses can be reduced. In current stream compilers, reuse is only attempted for simple stream references, those whose start and end are known. Compiler analysis from outside of stream processors does not directly enable the...

Feature selection and policy optimization for distributed instruction placement using reinforcement learning

Katherine E. Coons, Behnam Robatmili, Matthew E. Taylor, Bertrand A. Maher, et al.

pp. 32-42

Communication overheads are one of the fundamental challenges in a multiprocessor system. As the number of processors on a chip increases, communication overheads and the distribution of computation and data become increasingly important performance factors. Explicit Dataflow Graph Execution (EDGE) processors, in which instructions communicate with one another directly on a distributed substrate,...

Core Cannibalization Architecture: Improving lifetime chip performance for multicore processors in the presence of hard faults

Bogdan F. Romanescu, Daniel J. Sorin

pp. 43-51

To improve the lifetime performance of a multicore chip with simple cores, we propose the Core Cannibalization Architecture (CCA). A chip with CCA provisions a fraction of the cores as cannibalizable cores (CCs). In the absence of hard faults, the CCs function just like normal cores. In the presence of hard faults, the CCs can be cannibalized for spare parts at the granularity of pipeline stages....

Pangaea: A tightly-coupled IA32 heterogeneous chip multiprocessor

Henry Wong, Anne Bracy, Ethan Schuchman, Tor M. Aamodt, et al.

pp. 52-61

Moore's Law and the drive towards performance efficiency have led to the on-chip integration of general-purpose cores with special-purpose accelerators. Pangaea is a heterogeneous CMP design for non-rendering workloads that integrates IA32 CPU cores with non-IA32 GPU-class multi-cores, extending the current state-of-the-art CPU-GPU integration that physically “fuses” existing CPU and GPU designs....

Skewed redundancy

Gordon B. Bell, Mikko H. Lipasti

pp. 62-71

Technology scaling in integrated circuits has consistently provided dramatic performance improvements in modern microprocessors. However, increasing device counts and decreasing on-chip voltage levels have made transient errors a first-order design constraint that can no longer be ignored. Several proposals have provided fault detection and tolerance through redundantly executing a program on an additional...

The PARSEC benchmark suite: Characterization and architectural implications

Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, Kai Li

pp. 72-81

This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previously available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and...

Visualizing potential parallelism in sequential programs

Graham D. Price, John Giacomoni, Manish Vachharajani

pp. 82-90

This paper presents ParaMeter, an interactive program analysis and visualization system for large traces. Using ParaMeter, a software developer can locate and analyze regions of code that may yield to parallelization efforts and possibly extract performance from multicore hardware. The key contributions in the paper are (1) a method to use interactive visualization of traces to find and exploit...

Characterizing and modeling the behavior of context switch misses

Fang Liu, Fei Guo, Yan Solihin, Seongbeom Kim, et al.

pp. 91-101

One of the essential features in modern computer systems is context switching, which allows multiple threads of execution to time-share a limited number of processors. While very useful, context switching can introduce high performance overheads, with one of the primary reasons being the cache perturbation effect. Between the time a thread is switched out and when it resumes execution, parts of its...

MCAMP: Communication optimization on Massively Parallel Machines with hierarchical scratch-pad memory

Hiroshige Hayashizaki, Yutaka Sugawara, Mary Inaba, Kei Hiraki

pp. 102-111

Massively parallel machines that integrate a large number of simple processors and small scratch-pad memories (SPMs) into a single chip can achieve a high peak performance per watt of power. In these machines, communication optimizations are important because the communication bandwidth tends to be a bottleneck. Previously proposed communication optimizations using copy candidates, which have been...

Profiler and compiler assisted adaptive I/O prefetching for shared storage caches

Seung Woo Son, Sai Prashanth Muralidhara, Ozcan Ozturk, Mahmut Kandemir, et al.

pp. 112-121

I/O prefetching has been employed in the past as one of the mechanisms to hide large disk latencies. However, I/O prefetching in parallel applications is problematic when multiple CPUs share the same set of disks due to the possibility that prefetches from different CPUs can interact on shared memory caches in the I/O nodes in complex and unpredictable ways. In this paper, we (i) quantify the impact...

Runtime optimization of vector operations on large scale SMP clusters

Costin Iancu, Steven Hofmeyr

pp. 122-132

“Vector” style communication operations transfer multiple disjoint memory regions within one logical step. These operations are widely used in applications, they do improve application performance, and their behavior has been studied and optimized using different implementation techniques across a large variety of systems. In this paper we present a methodology for the selection of the best performing...

(How) can programmers conquer the multicore menace?

Saman Amarasinghe

p. 133

The document was not made available for publication as part of the conference proceedings.

Distributed Cooperative Caching

Enric Herrero, Jose Gonzalez, Ramon Canal

pp. 134-143

This paper presents Distributed Cooperative Caching, a scalable and energy-efficient scheme to manage chip multiprocessor (CMP) cache resources. The proposed configuration is based on the Cooperative Caching framework [3] but is intended for large-scale CMPs. Both centralized and distributed configurations have the advantage of combining the benefits of private and shared caches. In our proposal,...

Scalable and reliable communication for hardware transactional memory

Seth H. Pugsley, Manu Awasthi, Niti Madan, Naveen Muralimanohar, et al.

pp. 144-154

In a hardware transactional memory system with lazy versioning and lazy conflict detection, the process of transaction commit can emerge as a bottleneck. This is especially true for a large-scale distributed memory system where multiple transactions may attempt to commit simultaneously and co-ordination is required before allowing commits to proceed in parallel. In this paper, we propose novel algorithms...

Improving support for locality and fine-grain sharing in chip multiprocessors

Hemayet Hossain, Sandhya Dwarkadas, Michael C. Huang

pp. 155-165

Both commercial and scientific workloads benefit from concurrency and exhibit data sharing across threads/processes. The resulting sharing patterns are often fine-grain, with the modified cache lines still residing in the writer's primary cache when accessed. Chip multiprocessors present an opportunity to optimize for fine-grain sharing using direct access to remote processor components through low-latency...

Edge-centric modulo scheduling for coarse-grained reconfigurable architectures

Hyunchul Park, Kevin Fan, Scott Mahlke, Taewook Oh, et al.

pp. 166-176

Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs consist of an array of function units and register files, often organized as a two-dimensional grid. The most difficult challenge in deploying CGRAs is compiler scheduling technology that can efficiently...
