Variable Pipeline Cool Mega Array (VPCMA) is a low-power Coarse Grained Reconfigurable Architecture (CGRA) based on the concept of CMA (Cool Mega Array). It implements a pipeline structure that can be configured depending on performance requirements, and uses silicon on thin buried oxide (SOTB) technology, which allows its body bias voltage to be controlled to balance performance and leakage power. In this...
Parsing of code is mandatory for writing and constructing compilers, interpreters and optimizers. The syntactic analysis of the input intermediate code into its component parts is known as parsing, and implementing such parsers requires a Context Free Grammar (CFG), which helps to analyze the input code. This paper mainly focuses on extension of the CFG in an EBNF form...
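To make the idea concrete, here is a minimal recursive-descent parser for a toy arithmetic grammar, written as EBNF in the comments. This is an illustrative sketch only; the grammar and function names are hypothetical and are not taken from the paper above.

```python
import re

# Toy grammar (EBNF):
#   expr   = term { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = NUMBER | "(" expr ")" ;

def tokenize(src):
    # Split the source into numbers and single-character operators.
    return re.findall(r"\d+|[+\-*/()]", src)

def parse_expr(tokens, i=0):
    val, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in "+-":
        op = tokens[i]
        rhs, i = parse_term(tokens, i + 1)
        val = val + rhs if op == "+" else val - rhs
    return val, i

def parse_term(tokens, i):
    val, i = parse_factor(tokens, i)
    while i < len(tokens) and tokens[i] in "*/":
        op = tokens[i]
        rhs, i = parse_factor(tokens, i + 1)
        val = val * rhs if op == "*" else val / rhs
    return val, i

def parse_factor(tokens, i):
    if tokens[i] == "(":
        val, i = parse_expr(tokens, i + 1)
        return val, i + 1          # skip the closing ")"
    return int(tokens[i]), i + 1   # NUMBER

def evaluate(src):
    val, _ = parse_expr(tokenize(src))
    return val
```

Each nonterminal of the CFG becomes one function, which is exactly the correspondence between grammar and parser that makes CFG/EBNF descriptions the natural input for parser construction.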
Graphics Processing Units (GPUs) have been used extensively for accelerating parallelizable applications in general, and scientific computations in particular. Stencil-based algorithms are used intensively in various research areas and represent good candidates for GPU-based acceleration. Since scientific computations have high accuracy requirements, herein we focus on stencil-based double precision...
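As a reference point for what a stencil computation looks like, here is one double-precision sweep of a 1D 3-point Jacobi-style stencil in plain Python. The neighbour-access pattern shown here (each output depends on a small fixed window of inputs) is what makes such kernels map well to GPU thread blocks; the function and parameter names are illustrative, not from the paper.

```python
def jacobi_step(u, alpha=0.25):
    # One 3-point stencil sweep: each interior point is updated from
    # itself and its two neighbours, in double precision. Boundary
    # values are held fixed.
    return [u[0]] + [
        u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        for i in range(1, len(u) - 1)
    ] + [u[-1]]
```

On a GPU, one thread would compute one (or a tile of) output points, with the shared neighbour loads staged through fast on-chip memory.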
The Partitioned Global Address Space (PGAS) programming model strikes a balance between the locality-aware, but explicit, message-passing model (e.g. MPI) and the easy-to-use, but locality-agnostic, shared memory model (e.g. OpenMP). However, the PGAS rich memory model comes at a performance cost which can hinder its potential for scalability and performance. To contain this overhead and achieve full...
Compiler writers have developed various techniques, such as constant folding, subexpression elimination, loop transformation and vectorization, to help compilers in code optimization for performance improvement. Yet, they have been far less successful in developing techniques or cost models that compilers can rely on to simplify parallel programming and tune the performance of parallel applications...
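As an example of the first technique mentioned, here is a minimal constant-folding pass over Python expression ASTs. It is a sketch of the general transformation (fold any operator whose operands are both constants, bottom-up), not any particular compiler's implementation; it requires Python 3.9+ for `ast.unparse`.

```python
import ast
import operator

# Supported binary operators and their Python semantics.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

class Folder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first (bottom-up)
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and type(node.op) in OPS):
            value = OPS[type(node.op)](node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value), node)
        return node

def fold_constants(expr):
    # Fold constant subexpressions in an expression string and
    # return the simplified source.
    tree = Folder().visit(ast.parse(expr, mode="eval"))
    return ast.unparse(tree)
```

Because folding runs bottom-up, `2 * 3` inside `x + 2 * 3` collapses first, leaving `x + 6`; fully constant expressions collapse to a single literal.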
For set-associative caches, accessing cache ways in parallel results in significant energy waste, as only one way contains the desired data. In this paper, we propose Tag Check Elision (TCE): a non-speculative approach for accessing set-associative caches without a tag check to save energy. TCE can eliminate up to 86% of the tag checks (67% on average), without sacrificing any performance. These direct...
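A toy behavioural model makes the energy argument concrete: if consecutive accesses to a set fall in the same cache block, the way found by the previous lookup can be reused and the parallel tag comparison skipped. This is a deliberately simplified sketch of that general idea (single remembered block per set, trivial replacement), not the paper's actual TCE mechanism.

```python
class TCECache:
    # Toy model of tag-check elision in a set-associative cache.
    def __init__(self, sets=4, ways=2, block=64):
        self.sets, self.ways, self.block = sets, ways, block
        self.tags = [[None] * ways for _ in range(sets)]
        self.last = [None] * sets   # (tag, way) of last access per set
        self.tag_checks = 0         # energy proxy: parallel tag checks done

    def access(self, addr):
        blk = addr // self.block
        s, tag = blk % self.sets, blk // self.sets
        if self.last[s] is not None and self.last[s][0] == tag:
            return "hit-elided"     # same block as before: no tag check
        self.tag_checks += 1        # full parallel tag check of all ways
        if tag in self.tags[s]:
            self.last[s] = (tag, self.tags[s].index(tag))
            return "hit"
        self.tags[s][0] = tag       # trivial replacement policy
        self.last[s] = (tag, 0)
        return "miss"
```

Running a sequence of same-block accesses through this model shows most tag checks disappearing, which is the effect the abstract quantifies (up to 86%, 67% on average) with its real, non-speculative detection scheme.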
The problem of obtaining high computational throughput from sparse matrix multiple-vector multiplication routines is considered. Current sparse matrix formats and algorithms have high bandwidth requirements and poor reuse of cache and register loaded entries, which restrict their performance. We propose the mapped blocked row format: a bitmapped sparse matrix format that stores entries as blocks without...
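The general idea of a bitmapped block format can be sketched as follows: each block stores a start column, a bitmap marking which positions hold nonzeros, and only the nonzero values packed densely, so explicit zeros are neither stored nor multiplied. This is an illustration of that general class of formats under assumed conventions (fixed block width, row-major blocks), not the paper's exact mapped blocked row layout.

```python
def to_bitmapped_blocks(row, block=4):
    # Pack one dense row into (start, bitmap, values) blocks; bit i of
    # the bitmap is set iff column start+i holds a nonzero.
    blocks = []
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        bitmap = sum(1 << i for i, v in enumerate(chunk) if v != 0.0)
        if bitmap:
            blocks.append((start, bitmap, [v for v in chunk if v != 0.0]))
    return blocks

def spmv_row(blocks, x, block=4):
    # Dot product of one bitmapped-block row with dense vector x:
    # decode the bitmap to recover column indices of packed values.
    acc = 0.0
    for start, bitmap, vals in blocks:
        k = 0
        for i in range(block):
            if bitmap >> i & 1:
                acc += vals[k] * x[start + i]
                k += 1
    return acc
```

The bitmap replaces per-entry column indices within a block, which is where such formats save bandwidth relative to plain compressed-row storage.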
Nowadays, every newly added hardware feature must not change the underlying instruction set architecture (ISA), in order to avoid adaptation or recompilation of existing code. Nevertheless, this need for compatibility imposes a great number of restrictions on designers, because it keeps them tied to a specific ISA and all its legacy hardware issues. Considering that the market is mainly dominated...
On-chip memory in today's networking SoCs takes up >50% of total area and consumes >40% of total power. As demand for high-performance networks grows, so will the memory content on future SoCs. This paper presents IBM's 32nm HKMG SOI embedded memory offering, discussing the considerations associated with the design of key networking memory functions. With memory limiting system performance and...
High Level Synthesis for Systems on Chip is an attractive way to cut development time while maintaining a good level of performance. But HLS tools are limited by the abstraction level of the description when performing some high-level transforms. This paper evaluates the impact of such high-level transforms for ASICs. We have evaluated recursive and non-recursive filters for signal processing an...
Most modern processors have cache memories that are much faster than main memory, so it is important to utilize them effectively for efficient execution. Cache memories work well when temporal or spatial locality in the program is enhanced. Therefore, cache efficiency can be improved by making accesses to the same array or structure contiguous. We propose a new cache optimization...
When applying a Superword Level Parallelism (SLP) auto-vectorization compiling system to Digital Signal Processing (DSP), specialized features of the DSP framework, such as the specific addressing model, the wide variety of registers, irregular data branches, dependence relations that obstruct vectorization, non-aligned data, and other factors, mean that the compiler can not...
Main memory latencies have always been a concern for system performance. Given that reads are on the critical path for CPU progress, reads must be prioritized over writes. However, writes must be eventually processed and they often delay pending reads. In fact, a single channel in the main memory system offers almost no parallelism between reads and writes. This is because a single off-chip memory...
Many processors, such as the Intel Xeon processor 5100 series and the AMD Athlon 64, support the SIMD computation model with the Streaming SIMD Extensions (SSE), SSE2 and SSE3. Double-precision SSE/SSE2/SSE3 instructions can process two packed double-precision floating-point data elements simultaneously in 128-bit XMM vector registers, which greatly improves floating-point performance. Sometimes non-consecutive...
Fast Fourier Transform (FFT) is the basis of Digital Signal Processing (DSP). In this paper, a high performance FFT library using the radix-2 decimation in frequency (DIF) algorithm is presented, which is well suited to SIMD architectures. SIMD architecture microprocessors, such as those from Intel and AMD, allow parallel floating point operations on contiguous data in memory. A 128-point FFT based radix-2 DIF algorithm...
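For reference, the radix-2 DIF recursion the abstract names can be written compactly: the even-indexed outputs are the FFT of pairwise sums of the two halves, and the odd-indexed outputs are the FFT of twiddle-scaled pairwise differences. This is a scalar reference sketch of the textbook algorithm, not the paper's SIMD library; in the SIMD version the butterfly additions and twiddle multiplications below operate on packed lanes.

```python
import cmath

def fft_dif(a):
    # Radix-2 decimation-in-frequency FFT; len(a) must be a power of two.
    n = len(a)
    if n == 1:
        return a[:]
    half = n // 2
    w = [cmath.exp(-2j * cmath.pi * i / n) for i in range(half)]
    top = [a[i] + a[i + half] for i in range(half)]           # -> even outputs
    bot = [(a[i] - a[i + half]) * w[i] for i in range(half)]  # -> odd outputs
    even, odd = fft_dif(top), fft_dif(bot)
    out = [0j] * n
    out[0::2], out[1::2] = even, odd
    return out
```

Each level performs n/2 butterflies on contiguous halves of the array, which is the access pattern that lets SIMD units process several butterflies per instruction.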
Recent computer architectures can be configured in many different ways. To explore this huge design space, system simulators are typically used. As performance is no longer the only decisive factor, but also e.g. power usage or the resource usage of the system, it has become very hard for designers to select optimal configurations. In this article we use a multi-objective design space exploration tool...
Just-in-Time (JIT) compiler technology offers portability while facilitating target- and context-specific specialization. Single-Instruction-Multiple-Data (SIMD) hardware is ubiquitous and markedly diverse, but can be difficult for JIT compilers to efficiently target due to resource and budget constraints. We present our design for a synergistic auto-vectorizing compilation scheme. The scheme is composed...
Parallel Sparse Matrix Vector Multiplication (PSpMV) is a compute intensive kernel used in iterative solvers like Conjugate Gradient, GMRES and Lanczos. Numerous attempts at optimizing this function have been made that require fine tuning of many hardware and software parameters to achieve optimal performance. We attempt to offer a simple framework that involves (i) Employing a greedy algorithm to...
The reuse rate of vector registers is one of the most important factors that influence SIMD performance. However, reusing vector registers can lead to non-contiguous memory accesses and cache misses. Based on register reuse analysis, we establish a cost model that guides the generation of multiple code versions. Then we perform NPB tests with different problem scales, and the results show...