Items from 1 to 20 out of 39 results

chapter

Runtime automatic speculative parallelization

B Hertzberg, K Olukotun

International Symposium on Code Generation and Optimization (CGO 2011) > 64 - 73

2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2011)

We present Runtime Automatic Speculative Parallelization (RASP), a technique for the dynamic extraction of speculative threads from a running application in a user-transparent fashion. By leveraging the idle cores in a CMP to analyze, optimize, and participate in the execution of a running sequential program, RASP enables a collection of simpler cores to achieve sequential performance on par with...

chapter

MAO — An extensible micro-architectural optimizer

R Hundt, E Raman, M Thuresson, N Vachharajani

International Symposium on Code Generation and Optimization (CGO 2011) > 1 - 10

2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2011)

Performance matters, and so does repeatability and predictability. Today's processors' micro-architectures have become so complex as to now contain many undocumented, not understood, and even puzzling performance cliffs. Small changes in the instruction stream, such as the insertion of a single NOP instruction, can lead to significant performance deltas, with the effect of exposing compiler and performance...

article

Static Analysis of Register File Vulnerability

Jongeun Lee, Aviral Shrivastava

IEEE Transactions on Computer-Aided Design of Integrated Circuits and... > 2011 > 30 > 4 > 607 - 616

With continuous technology scaling, soft errors are becoming an increasingly important design concern even for earth-bound applications. While compiler approaches have the potential to mitigate the effect of soft errors with minimal runtime overheads, static vulnerability estimation, an essential part of compiler approaches, is lacking due to its inherent complexity. This paper presents a static analysis...

chapter

Abstract state machines as an intermediate representation for high-level synthesis

R Sinha, H D Patel

2011 Design, Automation & Test in Europe > 1 - 6

2011 Design, Automation & Test in Europe

This work presents a high-level synthesis methodology that uses the abstract state machines (ASMs) formalism as an intermediate representation (IR). We perform scheduling and allocation on this IR, and generate synthesizable VHDL. We have the following advantages when using ASMs as an IR: 1) it allows the specification of both sequential and parallel computation, 2) it supports an extension of a clean...

chapter

GVE: Godson-T Verification Engine for many-core architecture rapid prototyping and debugging

Zhengmeng Lei, Lunkai Zhang, Fenglong Song, Shibin Tang, et al.

2010 International Conference on Field-Programmable Technology > 253 - 256

2010 International Conference on Field-Programmable Technology (FPT 2010)

In this paper, we present a Godson-T Verification Engine (GVE) to rapidly prototype and debug our Godson-T many-core processor design. GVE adopts the state-of-the-art hardware platform which contains 6 Xilinx Virtex-5 LX330 FPGAs, thus permitting us to map our many-core processor and peripheral devices into it. Besides the hardware, our toolkit Godson-T Studio provides the compiler, program loader,...

chapter

Two Efficient Software Techniques to Detect and Correct Control-Flow Errors

H R Zarandi, M Maghsoudloo, N Khoshavi

2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing > 141 - 148

2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC 2010)

This paper proposes two efficient software techniques, Control-flow and Data Errors Correction using Data-flow Graph Consideration (CDCC) and Miniaturized Check-Pointing (MCP), to detect and correct control-flow errors. These techniques have been implemented based on addition of redundant codes in a given program. The creativity applied in the methods for online detection and correction of the control-flow...

chapter

Software integration of identical DLP threads via compilation for VLIW processors

Maolin Guan, Nan Wu, Mei Wen, Chunyuan Zhang

5th International Conference on Computer Sciences and Convergence Information Technology > 427 - 433

2010 5th International Conference on Computer Sciences and Convergence Information Technology (ICCIT 2010)

Based on the characteristics of data level parallelism (DLP) multi-threading programs appearing in the practical application, this paper proposes a new method that implements software integration of identical DLP threads via compilation for VLIW processors. This method translates DLP into ILP by merging the operations in corresponding basic blocks divided from different threads into one basic block...

chapter

Control Independence Using Dual Renaming

Lin Meng, Shigeru Oyanagi

2010 First International Conference on Networking and Computing > 264 - 267

2010 First International Conference on Networking and Computing (ICNC 2010)

Modern Superscalar Processor squashes up all of wrong-path instructions when the branch prediction misses. In deeper pipelines, branch misprediction penalty increases seriously owing to large number of squashed instructions. Exploiting control independence has been proposed for reducing this penalty. Control Independence method reuses control independent instructions (CI instructions) without squashing...

chapter

A language-based approach to implementing multi-vendor support in an AdvancedTCA Shelf Manager

S Zhukov

2010 6th Central and Eastern European Software Engineering Conference (CEE-SECR) > 5 - 11

2010 6th Central and Eastern European Software Engineering Conference in Russia (CEE-SECR 2010)

The Pigeon Point® AdvancedTCA® Shelf Manager software supports multiple types of shelves produced by multiple vendors. While the number of supported platform configurations grew over time, hard coding platform-specific features in the software became impractical. A language-based approach has been designed and a new interpreted language for describing hardware platforms (Shelf Manager carrier boards...

chapter

CCDA: Correcting control-flow and data errors automatically

Mohammad Maghsoudloo, Navid Khoshavi, Hamid R Zarandi

2010 15th CSI International Symposium on Computer Architecture and Digital Systems > 99 - 104

15th CSI International Symposium on Computer Architecture and Digital Systems (CADS 2010)

This paper presents an efficient software technique to detect and correct control-flow errors through addition of redundant codes in a given program. The key innovation performed in the proposed technique is detection and correction of the control-flow errors using both control-flow graph and data-flow graph. Using this technique, most of control-flow errors in the program are detected first, and...

chapter

Generating Performance Bounds from Source Code

Sri Hari Krishna Narayanan, B Norris, P D Hovland

2010 39th International Conference on Parallel Processing Workshops > 197 - 206

2010 39th International Conference on Parallel Processing Workshops (ICPPW)

Understanding and tuning the performance of complex applications on modern hardware are challenging tasks, requiring understanding of the algorithms, implementation, compiler optimizations, and underlying architecture. Many tools exist for measuring and analyzing the runtime performance of applications. Obtaining sufficiently detailed performance data and comparing it with the peak performance of...

chapter

Global Register Alias Table: Executing Sequential Program on Multi-Core

Chunhao Wang, Lihan Ju, Di Wu, Lingxiang Xiang, et al.

2010 10th IEEE International Conference on Computer and Information Technology > 1818 - 1824

2010 IEEE 10th International Conference on Computer and Information Technology (CIT)

Executing sequential program on multi-core is crucial for accommodating instruction level parallelism (ILP) in chip multiprocessor (CMP) architecture. One widely used method of steering instructions across cores is based on dependency. However, this method requires a sophisticated steering mechanism and brings much hardware complexity and area overhead. This paper presents the Global Register Alias...

chapter

Rapid prototyping and compact testing of CPU emulators

Weiqin Ma, A Forin, Jyh-Charn Liu

Proceedings of 2010 21st IEEE International Symposium on Rapid System Prototyping > 1 - 7

2010 21st IEEE International Symposium on Rapid System Prototyping (RSP 2010)

In this paper, we propose a novel rapid prototyping technique to produce a high quality CPU emulator at reduced development cost. Specification mining from published CPU manuals, automated code generation of both the emulator and its test vectors from the mined CPU specifications, and a hardware-oracle based test strategy all work together to close the gaps between specification analysis, development...

chapter

Improving the performance of program monitors with compiler support in multi-core environment

Guojin He, Antonia Zhai

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) > 1 - 12

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Dynamic program execution monitors allow programmers to observe and verify an application while it is running. Instrumentation-based dynamic program monitors often incur significant performance overhead due to instrumentation. Special hardware supports have been proposed to reduce this overhead. However, these supports mostly target specific monitoring requirements and thus have limited applicability...

chapter

Concept and Development of Modular VLIW Processor Based on FPGA

Debyo Saptono, Vincent Brost, Fan Yang, Eri Prasetyo Wibowo

2010 Second International Conference on Computer and Network Technology > 561 - 565

2010 Second International Conference on Computer and Network Technology (ICCNT 2010)

Modern FPGA chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high density FPGAs it is now possible to implement a high performance VLIW processor core in an FPGA. Architecture based on Very Long Instruction Word (VLIW) processors are an optimal choice in the attempt to obtain high performance...

chapter

Profile driven data-dependency analysis for improved high level language hardware synthesis

P. Crosthwaite, J. Williams, P. Sutton

2009 International Conference on Field-Programmable Technology > 207 - 214

2009 International Conference on Field-Programmable Technology (FPT 2009)

Existing high-level hardware synthesis tools typically focus on the automated discovery of opportunities for Instruction Level Parallelism (ILP) or alternatively allow designers to explicitly specify instances or opportunities for ILP. We present a novel profiling driven approach to the automated discovery of higher level speculative parallelism opportunities for custom-hardware implementations. The...

chapter

Implicit Data Permutation for SIMD Devices

Li Shen, Libo Huang, Nong Xiao, Zhiying Wang

2009 Fourth International Conference on Embedded and Multimedia Computing > 1 - 6

2009 Fourth International Conference on Embedded and Multimedia Computing (EM-Com 2009)

SIMD extension is one of the most effective ways to exploit data level parallelism in current microprocessor design. But limited by some constraints, such as memory address alignment and inconsecutive memory access, data permutation operations are usually needed before SIMD calculations, which impede us to exploit more parallelism. In this paper, an implicit data permutation mechanism is proposed...

chapter

Reducing processor energy consumption by compiler optimization

V. Guzma, T. Pitkanen, P. Kellomaki, J. Takala

2009 IEEE Workshop on Signal Processing Systems > 63 - 68

2009 IEEE Workshop on Signal Processing Systems. SiPS 2009

The purpose of embedded computing is to transform input data into an output format. The functionality required to achieve this goal is therefore a combination of operation executions on computing units and data transfers between those units. To avoid memory bottlenecks, processors use register files to store data during computation.

chapter

Code density concerns for new architectures

V.M. Weaver, S.A. McKee

2009 IEEE International Conference on Computer Design > 459 - 464

2009 IEEE International Conference on Computer Design (ICCD 2009)

Reducing a program's instruction count can improve cache behavior and bandwidth utilization, lower power consumption, and increase overall performance. Nonetheless, code density is an often overlooked feature in studying processor architectures. We hand-optimize an assembly language embedded benchmark for size on 21 different instruction set architectures, finding up to a factor of three difference...

chapter

Design and deployment of a generic ECC-based fault tolerance mechanism for embedded HW cores

J.-C. Ruiz, D. de Andres, P. Gil

2009 IEEE Conference on Emerging Technologies & Factory Automation > 1 - 8

2009 IEEE 14th International Conference on Emerging Technologies & Factory Automation. ETFA 2009

Current practices for the design and deployment of hardware redundancy techniques in embedded systems remain in practice specific (defined on a case-per-case basis) and mostly manual. This paper addresses the challenging problems of engineering fault tolerance mechanisms in a generic way and providing suitable tools for coping with their deployment. This approach relies on metaprogramming to specify...

Keywords:
HARDWARE
REGISTERS
PROGRAM COMPILERS
