We present Runtime Automatic Speculative Parallelization (RASP), a technique for the dynamic extraction of speculative threads from a running application in a user-transparent fashion. By leveraging the idle cores in a CMP to analyze, optimize, and participate in the execution of a running sequential program, RASP enables a collection of simpler cores to achieve sequential performance on par with...
Performance matters, and so do repeatability and predictability. The micro-architectures of today's processors have become so complex that they now contain many undocumented, poorly understood, and even puzzling performance cliffs. Small changes in the instruction stream, such as the insertion of a single NOP instruction, can lead to significant performance deltas, with the effect of exposing compiler and performance...
With continuous technology scaling, soft errors are becoming an increasingly important design concern even for earth-bound applications. While compiler approaches have the potential to mitigate the effect of soft errors with minimal runtime overhead, static vulnerability estimation, an essential part of compiler approaches, is lacking due to its inherent complexity. This paper presents a static analysis...
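To give a concrete sense of what a static vulnerability estimate can compute, here is a minimal sketch of a generic lifetime-based metric, not the analysis this paper presents. It assumes straight-line code, one instruction per cycle, and that a register value is vulnerable only between its write and its last read; the `(dest, srcs)` instruction format is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical instruction format: (destination register or None, source registers).
Instr = Tuple[Optional[str], List[str]]


def register_vulnerability(block: List[Instr]) -> int:
    """Sum, over every register value, of (index of last read - index of write).

    Assumptions: straight-line code, one instruction per cycle, a value is only
    vulnerable between its write and its last read (a dead write adds 0), and
    values that are live on entry to the block are ignored.
    """
    total = 0
    write_at: Dict[str, int] = {}   # register -> index of the write producing its current value
    last_read: Dict[str, int] = {}  # register -> index of the latest read of that value
    for i, (dest, srcs) in enumerate(block):
        # Reads happen before the write of the same instruction.
        for r in srcs:
            if r in write_at:
                last_read[r] = i
        if dest is not None:
            # Close out the value currently held in dest before it is overwritten.
            if dest in last_read:
                total += last_read[dest] - write_at[dest]
            write_at[dest] = i
            last_read.pop(dest, None)
    # Close out values that are still held at the end of the block.
    for r, w in write_at.items():
        if r in last_read:
            total += last_read[r] - w
    return total


block = [("r1", []), ("r2", []), ("r3", ["r1", "r2"]), (None, ["r3"])]
print(register_vulnerability(block))  # r1: 2, r2: 1, r3: 1
```

A lower total means values spend fewer cycles exposed in registers, which is the kind of quantity a compiler could try to reduce when scheduling.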
This work presents a high-level synthesis methodology that uses the abstract state machines (ASMs) formalism as an intermediate representation (IR). We perform scheduling and allocation on this IR, and generate synthesizable VHDL. We have the following advantages when using ASMs as an IR: 1) it allows the specification of both sequential and parallel computation, 2) it supports an extension of a clean...
In this paper, we present the Godson-T Verification Engine (GVE) for rapidly prototyping and debugging our Godson-T many-core processor design. GVE adopts a state-of-the-art hardware platform containing six Xilinx Virtex-5 LX330 FPGAs, which allows us to map our many-core processor and its peripheral devices onto it. Besides the hardware, our toolkit Godson-T Studio provides the compiler, program loader,...
This paper proposes two efficient software techniques, Control-flow and Data Errors Correction using Data-flow Graph Consideration (CDCC) and Miniaturized Check-Pointing (MCP), to detect and correct control-flow errors. Both techniques are implemented by adding redundant code to a given program. The novelty of the methods for online detection and correction of the control-flow...
Based on the characteristics of the data-level parallelism (DLP) multithreaded programs found in practical applications, this paper proposes a new method that integrates identical DLP threads in software, via compilation, for VLIW processors. This method translates DLP into ILP by merging the operations in corresponding basic blocks of different threads into one basic block...
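The core idea of turning DLP into ILP can be illustrated with a minimal sketch: copies of an identical basic block are fused into one block, each copy gets private register names so the copies stay independent, and corresponding operations are emitted side by side where a VLIW scheduler could pack them into one instruction word. The three-address `(dest, opcode, srcs)` format and the `_t<n>` renaming scheme are hypothetical, and memory operations and control flow are not modeled.

```python
from typing import List, Tuple

# Hypothetical three-address op: (destination, opcode, source registers).
Op = Tuple[str, str, Tuple[str, ...]]


def rename(reg: str, thread: int) -> str:
    """Give each thread copy a private register name (hypothetical scheme)."""
    return f"{reg}_t{thread}"


def merge_dlp_threads(block: List[Op], num_threads: int) -> List[Op]:
    """Fuse num_threads copies of an identical basic block into one block.

    Corresponding ops from the different copies are emitted back to back.
    Because the copies use disjoint registers, these ops are mutually
    independent, i.e. the merged block exposes num_threads-way ILP.
    """
    merged: List[Op] = []
    for dest, opcode, srcs in block:
        for t in range(num_threads):
            merged.append((rename(dest, t), opcode, tuple(rename(s, t) for s in srcs)))
    return merged


block = [("a", "add", ("x", "y")), ("b", "mul", ("a", "z"))]
for op in merge_dlp_threads(block, 2):
    print(op)
```

Emitting the copies of each op adjacently, rather than concatenating whole thread bodies, keeps independent work close together, which is what a simple in-order VLIW bundler needs to fill its slots.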
Modern superscalar processors squash all wrong-path instructions on a branch misprediction. In deeper pipelines, the branch misprediction penalty increases significantly owing to the large number of squashed instructions. Exploiting control independence has been proposed to reduce this penalty. The control independence method reuses control-independent instructions (CI instructions) without squashing...
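Control-independent instructions are commonly identified as those at or after a branch's reconvergence point, i.e. its immediate post-dominator in the control-flow graph. The following minimal sketch computes that point with the standard iterative post-dominator dataflow; it is not the hardware mechanism of this paper. It assumes a CFG given as a dict of successor lists, with a single exit node that every block can reach.

```python
from typing import Dict, List, Optional, Set


def post_dominators(cfg: Dict[str, List[str]], exit_node: str) -> Dict[str, Set[str]]:
    """Iterative dataflow: pdom(n) = {n} | intersection of pdom(s) over successors s."""
    nodes = set(cfg)
    pdom = {n: set(nodes) for n in nodes}  # start from "everything" and shrink
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == exit_node:
                continue
            succs = cfg[n]
            if succs:
                new = set.intersection(*(pdom[s] for s in succs)) | {n}
            else:
                new = {n}
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def reconvergence_point(cfg: Dict[str, List[str]], exit_node: str, branch: str) -> Optional[str]:
    """Return the immediate post-dominator of `branch`.

    Post-dominators form a chain, so the immediate one is the strict
    post-dominator d whose own post-dominator set equals pdom(branch) - {branch}.
    """
    pdom = post_dominators(cfg, exit_node)
    strict = pdom[branch] - {branch}
    for d in sorted(strict):
        if pdom[d] == strict:
            return d
    return None


# If-then-else diamond: A branches to B or C, both paths rejoin at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(reconvergence_point(cfg, "E", "A"))
```

On a misprediction at `A`, instructions from `D` onward execute regardless of the branch outcome and are the candidates for reuse rather than squashing.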
The Pigeon Point® AdvancedTCA® Shelf Manager software supports multiple types of shelves produced by multiple vendors. As the number of supported platform configurations grew over time, hard-coding platform-specific features in the software became impractical. A language-based approach has been designed and a new interpreted language for describing hardware platforms (Shelf Manager carrier boards...
This paper presents an efficient software technique to detect and correct control-flow errors by adding redundant code to a given program. The key innovation of the proposed technique is that it detects and corrects control-flow errors using both the control-flow graph and the data-flow graph. Using this technique, most control-flow errors in the program are detected first, and...
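To make "detection through redundant code" concrete, here is a minimal sketch of a generic signature-style check, not the CDCC, MCP, or data-flow-graph technique described above: the redundant code at the start of each block compares a run-time signature (the last block executed) against the block's legal predecessors in the static CFG. Only detection is modeled; correction is not. The `SignatureMonitor` class and `ControlFlowError` exception are hypothetical names.

```python
from typing import Dict, List, Optional, Set


class ControlFlowError(Exception):
    """Raised when a block is entered from a block that is not a legal predecessor."""


class SignatureMonitor:
    """Simulates the redundant check code inserted at each basic block's entry."""

    def __init__(self, cfg: Dict[str, List[str]], entry: str) -> None:
        self.entry = entry
        # Legal predecessors of every block, derived from the static CFG.
        self.preds: Dict[str, Set[str]] = {b: set() for b in cfg}
        for block, succs in cfg.items():
            for s in succs:
                self.preds[s].add(block)
        self.sig: Optional[str] = None  # run-time signature: last block executed

    def enter(self, block: str) -> None:
        prev = self.sig
        legal = (prev is None and block == self.entry) or prev in self.preds[block]
        if not legal:
            raise ControlFlowError(f"illegal transfer {prev} -> {block}")
        self.sig = block  # update the signature for the next check


cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
monitor = SignatureMonitor(cfg, "A")
for b in ["A", "B", "D"]:  # a legal path
    monitor.enter(b)
```

A fault that makes execution jump from `A` straight to `D` would be caught at the entry of `D`, because `A` is not among its legal predecessors.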
Understanding and tuning the performance of complex applications on modern hardware are challenging tasks, requiring understanding of the algorithms, implementation, compiler optimizations, and underlying architecture. Many tools exist for measuring and analyzing the runtime performance of applications. Obtaining sufficiently detailed performance data and comparing it with the peak performance of...
Executing sequential programs on multiple cores is crucial for exploiting instruction-level parallelism (ILP) in chip multiprocessor (CMP) architectures. One widely used method of steering instructions across cores is based on dependences. However, this method requires a sophisticated steering mechanism and incurs considerable hardware complexity and area overhead. This paper presents the Global Register Alias...
In this paper, we propose a novel rapid prototyping technique to produce a high quality CPU emulator at reduced development cost. Specification mining from published CPU manuals, automated code generation of both the emulator and its test vectors from the mined CPU specifications, and a hardware-oracle based test strategy all work together to close the gaps between specification analysis, development...
Dynamic program execution monitors allow programmers to observe and verify an application while it is running. Instrumentation-based dynamic program monitors often incur significant performance overhead due to instrumentation. Special hardware support has been proposed to reduce this overhead. However, such support mostly targets specific monitoring requirements and thus has limited applicability...
Modern FPGA chips, with their larger memory capacity and reconfigurability, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance VLIW processor core in an FPGA. Architectures based on Very Long Instruction Word (VLIW) processors are an optimal choice in the attempt to obtain high performance...
Existing high-level hardware synthesis tools typically focus on the automated discovery of opportunities for Instruction Level Parallelism (ILP) or alternatively allow designers to explicitly specify instances or opportunities for ILP. We present a novel profiling driven approach to the automated discovery of higher level speculative parallelism opportunities for custom-hardware implementations. The...
SIMD extensions are one of the most effective ways to exploit data-level parallelism in current microprocessor designs. However, owing to constraints such as memory address alignment and non-consecutive memory access, data permutation operations are usually needed before SIMD computations, which prevents us from exploiting more parallelism. In this paper, an implicit data permutation mechanism is proposed...
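The alignment constraint and the explicit permutation it forces can be shown with a minimal simulation: an unaligned vector load is emulated by two aligned loads followed by a permute that selects the wanted window from their concatenation. This is the explicit overhead that an implicit mechanism aims to hide, not the mechanism proposed here. The vector width `VEC = 4` and the helper names are assumptions for illustration.

```python
from typing import List, Sequence

VEC = 4  # assumed SIMD width, in elements


def aligned_load(mem: Sequence[int], addr: int) -> List[int]:
    """Model of a SIMD load that only accepts VEC-aligned addresses."""
    if addr % VEC != 0:
        raise ValueError("aligned loads only")
    return list(mem[addr:addr + VEC])


def permute(lo: List[int], hi: List[int], shift: int) -> List[int]:
    """Select VEC consecutive elements from lo ++ hi starting at `shift`."""
    return (lo + hi)[shift:shift + VEC]


def unaligned_load(mem: Sequence[int], addr: int) -> List[int]:
    """Emulate an unaligned load: two aligned loads plus one permutation."""
    base = addr - addr % VEC
    lo = aligned_load(mem, base)
    hi = aligned_load(mem, base + VEC)
    return permute(lo, hi, addr % VEC)


mem = list(range(12))
print(unaligned_load(mem, 3))
```

Every misaligned access therefore costs an extra load and a permute instruction before any useful SIMD arithmetic starts, which is the overhead the abstract refers to.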
The purpose of embedded computing is to transform input data into output data. The functionality required to achieve this goal is therefore a combination of operations executed on computing units and data transfers between those units. To avoid memory bottlenecks, processors use register files to store data during computation.
Reducing a program's instruction count can improve cache behavior and bandwidth utilization, lower power consumption, and increase overall performance. Nonetheless, code density is an often overlooked feature in studying processor architectures. We hand-optimize an assembly language embedded benchmark for size on 21 different instruction set architectures, finding up to a factor of three difference...
Current practices for the design and deployment of hardware redundancy techniques in embedded systems remain ad hoc in practice (defined on a case-by-case basis) and mostly manual. This paper addresses the challenging problems of engineering fault-tolerance mechanisms in a generic way and of providing suitable tools for their deployment. This approach relies on metaprogramming to specify...