A method of retiming the spatial synchronous dataflow graph (SDF) is proposed, which is based on the SDF representation in the multidimensional space. The dimensions of this space are the spatial coordinate of the processing unit, coordinate of the operator firing and operator type. At the first stage of the datapath synthesis, the operator nodes are placed in the space according to a set of rules...
Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs consist of an array of function units and register files often organized as a two dimensional grid. The most difficult challenge in deploying CGRAs is compiler scheduling technology that can efficiently...
Graphics Processing Units (GPUs) are designed to exploit large amounts of parallelism. However, warp-level divergence, caused by differing amounts of work, differing memory access latencies, etc., results in warps of a thread block (TB) finishing kernel execution at different points in time. This, in effect, reduces the utilization of SM resources and hence the performance of the GPU. We propose...
Sparse code multiple access (SCMA) is a new kind of multiple access (MA) technology that ranks as one of the most promising candidates for 5G wireless because of its outstanding performance. SCMA tolerates overloading better than traditional MA technologies. It also takes advantage of its sparsity to achieve lower complexity when using the message passing algorithm (MPA). In this...
Despite many efforts to better utilize the potential of GPUs and CPUs, it is far from being fully exploited. Although many tasks can be easily sped up by using accelerators, most of the existing schedulers are not flexible enough to really optimize the resource usage of the complete system. The main reasons are (i) that each processing unit requires a specific program code and that this code is often...
Statically pipelined processors have a fully exposed datapath where all portions of the pipeline are directly controlled by effects within an instruction, which simplifies hardware and enables a new level of compiler optimizations. This paper describes an effect scheduling strategy to aggressively compact instructions, which has a critical impact on code size and performance. Unique scheduling challenges...
Computer designers rely upon near-cycle-accurate microarchitectural simulators to explore the design space of new systems. Hybrid simulators that offload simulation work onto FPGAs (also known as FAME simulators) can overcome the speed limitations of software-only simulators. However, such simulators must be automatically synthesized, or the time to design them becomes prohibitive. Previous work has...
Instruction level parallelism is one of the basic ways of increasing the performance of current processors. One method to improve instruction parallelism is the chain technique, which bypasses execution results from one Arithmetic Logic Unit (ALU) to others. However, this technique cannot be used with the current superscalar processor scheduling method. We develop a scheduling method for the chain...
Mixed-criticality systems, in which multiple tasks of varying criticality execute on a single hardware platform, are an emerging research area in real-time embedded systems. High-criticality tasks require spatial and temporal isolation guarantees for independent verification, and the task set should efficiently utilize hardware resources. Hardware-based isolation is desirable but often underutilizes...
The number of configurable systems deployed in hostile environments continues to rise. This, along with decreasing geometries and lower operating voltages, leads to an expected increase in transient errors. This paper presents Resiliency-aware Scheduling, a novel approach to resource allocation for hardening computations on configurable systems. Using modular and replicated functional units called...
In this paper we describe and analyze the main features of the Hardware Real-Time Scheduler Coprocessor unit (HRTSC) for the NIOS II processor. We describe how the HRTSC supports time, events, tasks and priorities. The HRTSC was designed as a SOPC component to incorporate real-time features for embedded real-time applications. The hardware architecture has an easy integration with the IDE programming environment...
Coarse Grained Reconfigurable Arrays (CGRAs) are typically very efficient for a single task. However, all functional units are required to operate in lock step, which wastes resources and makes complex programming flows difficult. Massively Parallel Processor Arrays (MPPAs) excel at executing unrelated tasks simultaneously, but limit the amount of resources dedicated to a single task. We propose an architecture...
The recently-discovered polar codes are widely seen as a major breakthrough in coding theory. These codes achieve the capacity of many important channels under successive cancellation decoding. Motivated by the rapid progress in the theory of polar codes, we propose a family of architectures for efficient hardware implementation of successive cancellation decoders. We show that such decoders can be...
In this paper, a multithreaded processor with a hardware context switch mechanism driven by external events is presented for multi-processor systems on chip (MPSoCs). By combining this mechanism with asynchronous memory access, the proposed processor implements non-preemptive thread scheduling, which can assure fairness among threads and optimization for a single thread. The overhead of hardware thread switch is...
With the increasing complexity of MPSoCs, efficient runtime management of system resources becomes of vital importance for improving system performance and energy efficiency. OSIP, an operating system application-specific instruction-set processor, provides a promising solution to this. It delivers high computational performance to deal with dynamic task scheduling and mapping, while still being programmable...
Theoretical real-time research generally neglects context switch times. But in recent embedded applications which consist of dozens of threads with very short execution times, their impact is too serious to be ignored. We present a hard real-time scheduling algorithm that perfectly hides the context switch times of an arbitrary number of threads. It requires a Simultaneous Multithreaded (SMT) processor...
Next-generation digital entertainment and mobile communication devices are driving the demand for high-performance processing solutions. To meet this demand, multiple-issue processors such as very long instruction word (VLIW) architectures augmented with a reconfigurable hardware accelerator have been proposed in many papers. The reconfigurable hardware accelerator is usually...
The paper discusses the optimization of hardware architecture from the perspective of compilation, which here refers to the located compiling technology. That is, the paper analyzes how to optimize the instruction set, register allocation and pipelining of a hardware architecture using GCC compilation techniques such as peephole optimization, graph coloring and instruction scheduling.
With increasing chip density and continuously rising reliability requirements, concurrent error detection techniques at the register transfer level have become an increasing concern. Because the actual probability of a failure occurring is low, a number of semi-concurrent error detection techniques are feasible. The circuits can be checked every N iterations. So the recomputations will...