Caches are traditionally organized as a rigid hierarchy, with multiple levels of progressively larger and slower memories. Hierarchy allows a simple, fixed design to benefit a wide range of applications, since working sets settle at the smallest (i.e., fastest and most energy-efficient) level they fit in. However, rigid hierarchies also add overheads, because each level adds latency and energy even...
This paper presents a complete on-chip ADC BIST solution based on a segmented stimulus error identification algorithm known as USER-SMILE. By adapting the algorithm for efficient hardware realization, the solution is implemented towards a 1Msps 12-bit SAR ADC on a 28nm CMOS automotive microcontroller. While sufficient test accuracy is demonstrated, the solution is further extended to correct linearity...
A system-on-chip field programmable gate array (FPGA)-based video processing platform for human detection in complex scenes is presented. This study details the hardware-based implementation of a human detection algorithm in 2D/3D scenes, including the capture, video processing, and display stages. The proposed method is implemented by extending a previously proposed method that uses features extracted...
Iterative stencils are kernels in various application domains, such as numerical simulations and medical imaging, that merit FPGA acceleration. The best architecture depends on many factors such as the target platform, off-chip memory bandwidth, problem size, and performance requirements. We generate a family of FPGA stencil accelerators targeting emerging System on Chip platforms (e.g., Xilinx Zynq...
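As a generic illustration of what an "iterative stencil" computes, the sketch below applies a 5-point Jacobi update to a 2D grid. It is not the accelerator family described above. The function name `jacobi_2d`, the fixed-boundary convention, and the equal 0.2 weights are all assumptions chosen for illustration. An FPGA design would typically tile and pipeline this loop nest rather than run it sequentially.

```python
def jacobi_2d(grid, iterations):
    """Hypothetical example: 5-point Jacobi stencil on a 2D list of floats.

    Each interior cell becomes the average of itself and its four neighbours;
    boundary cells are held fixed (an assumed boundary condition).
    """
    rows, cols = len(grid), len(grid[0])
    cur = [row[:] for row in grid]
    for _ in range(iterations):
        # Double buffering: read from `cur`, write to `nxt`, then swap.
        nxt = [row[:] for row in cur]
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                nxt[i][j] = 0.2 * (cur[i][j] + cur[i - 1][j] + cur[i + 1][j]
                                   + cur[i][j - 1] + cur[i][j + 1])
        cur = nxt
    return cur
```

Each output cell depends only on a small, fixed neighbourhood of the previous iteration. That regular access pattern is what makes stencils good candidates for streaming hardware.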
High-Level Synthesis has emerged as a promising technology for improving FPGA designer productivity, but will only be successful if it is accompanied by a debug ecosystem. Recent efforts have presented in-system debug techniques which allow a designer to debug an implementation, running on an FPGA, in the context of the original source code. These techniques typically store a history of all user variables...
Large-scale graph processing is attracting increasing attention and has been widely applied in many application domains. FPGA is a promising platform to implement graph processing algorithms with high power-efficiency and parallelism. In this paper, we propose OmniGraph, a scalable hardware accelerator for graph processing. OmniGraph can process graphs with different sizes adaptively and is adaptable...
Increasing data set sizes motivate a shift of focus from computation-centric systems to data-centric systems, where data movement is treated as a first-class optimization metric. An example of this emerging paradigm is in-situ computing in large-scale computing systems. Observing that data movement costs are increasing at an exponential rate even at a node level (as a node itself is fast-becoming...
Die-stacked DRAM (a.k.a., on-chip DRAM) provides much higher bandwidth and lower latency than off-chip DRAM. It is a promising technology to break the "memory wall". Die-stacked DRAM can be used either as a cache (i.e., DRAM cache) or as a part of memory (PoM). A DRAM cache design would suffer from more page faults than a PoM design as the DRAM cache cannot contribute towards capacity of...
The emerging Internet-of-Things (IoT) paradigm creates a new market for very small and cost-sensitive chips. Design costs must be as low as possible in order to be competitive. In this context, the 1-pin test has proven to be a beneficial way to significantly reduce test costs. However, the incorporated signature generation requires an X-free design, which is not always possible (e.g. due to timing...
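To see why signature generation needs an X-free design, consider a generic multiple-input signature register (MISR). It compacts every response word into one final value. The sketch below is an assumed, illustrative MISR, not the signature generator of the 1-pin test. The width, the tap positions, and the name `misr_signature` are hypothetical. Because the tap set includes the top bit, each update step is invertible, so a single differing input bit always changes the final signature. An unknown (X) bit that may resolve to either 0 or 1 therefore makes the expected signature undefined.

```python
def misr_signature(responses, width=8, taps=(7, 5, 4, 3)):
    """Hypothetical MISR: compact a sequence of `width`-bit response words.

    `taps` are assumed feedback positions; including bit width-1 keeps the
    shift-with-feedback step invertible, so no input difference is ever lost.
    """
    mask = (1 << width) - 1
    state = 0
    for word in responses:
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        # Shift in the feedback bit, then XOR in the current response word.
        state = (((state << 1) | feedback) & mask) ^ (word & mask)
    return state


# A response bit that is unknown (X) could be captured as 0 or as 1:
sig_if_x_is_0 = misr_signature([0x12, 0x34, 0x00])
sig_if_x_is_1 = misr_signature([0x12, 0x34, 0x01])
# The two candidate signatures differ, so no single golden value exists.
```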
A deep learning processor with 8 gated recurrent neural network (RNN) accelerators is proposed in this paper. It features on-chip incremental learning by numerical and local gradient computation enhancement. Extra precision of training is obtained without extending the bit-width. Tri-mode weight access (DMA/FIFO/RAM) improves the throughput during incremental learning. The number of multipliers and activation...
Open source hardware projects are becoming more and more common. OpenRISC SoC, one of the most prominent of these projects, has become quite popular with the support of volunteer developers. In this work, we have demonstrated the design of a DES (Data Encryption Standard)-based system, which can be used in security applications, on ORPSoC-v2 (OpenRISC Reference Platform System-on-Chip). Additionally, we...
Sparse Matrix-Vector multiplication (SpMV) is a fundamental kernel for many scientific and engineering applications. However, SpMV performance and efficiency are poor on commercial off-the-shelf (COTS) architectures, especially when the data size exceeds on-chip memory or last level cache (LLC). In this work we present an algorithm co-optimized hardware accelerator for large SpMV problems. We start...
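For reference, the kernel itself can be written over the widely used compressed sparse row (CSR) format. The excerpt above does not say which storage format the accelerator uses, so CSR is an assumption here, and `spmv_csr` is a hypothetical name. The indirect read `x[col_idx[k]]` is data-dependent. This irregular access is what tends to defeat caches once the matrix and vector exceed the LLC.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form (assumed format).

    values[k]  : k-th nonzero of A, stored row by row
    col_idx[k] : column index of values[k]
    row_ptr[r] : offset in `values` where row r starts (len = n_rows + 1)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Streaming read of values/col_idx, random (gather) read of x.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
y = spmv_csr([1.0, 2.0, 3.0, 4.0, 5.0], [0, 2, 2, 0, 1], [0, 2, 3, 5], [1.0, 2.0, 3.0])
# y == [7.0, 9.0, 14.0]
```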
Novel applications demand computational resources that are provided by multiprocessor systems-on-chip (MPSoCs). At the same time, they increasingly process sensitive data and incorporate security-relevant functions like encryption or authentication. This paper discusses the implications of the MPSoC technology on security. It provides an overview of hardware-oriented techniques to enhance security...
As the complexity of System-on-Chip (SoC) designs and the reuse of third-party IP continue to grow, the security of a heterogeneous SoC has become a critical issue. To increase the software security of such SoCs, ARM has proposed the TrustZone technology. Nevertheless, many SoCs embed non-trusted third-party Intellectual Property (IP) trying to take the benefits...
Network security and monitoring devices use packet classification to match packet header fields against a set of rules. Many hardware architectures have been designed to accelerate packet classification and achieve wire-speed throughput for 100 Gbps networks. The architectures are designed for high throughput even for the shortest packets. However, FPGA SoC and Intel Xeon with FPGA have limited resources...
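As a baseline for what the hardware accelerates, a software classifier can scan the rule set in priority order and return the action of the first rule whose fields all match. The sketch below makes several assumptions. The rule layout (source/destination prefixes, a destination-port range, and an optional protocol) and the names `Rule` and `classify` are hypothetical. Real rule sets and accelerator architectures differ. Only the standard-library `ipaddress` module is used.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule layout: prefixes, a port range, optional protocol."""
    src: str               # source prefix, e.g. "10.0.0.0/8"
    dst: str               # destination prefix
    dport_lo: int          # inclusive destination-port range
    dport_hi: int
    proto: Optional[int]   # None acts as a wildcard
    action: str


def classify(rules, src, dst, dport, proto, default="default"):
    """Linear first-match search; `rules` is assumed sorted by priority."""
    s, d = ip_address(src), ip_address(dst)
    for rule in rules:
        if (s in ip_network(rule.src)
                and d in ip_network(rule.dst)
                and rule.dport_lo <= dport <= rule.dport_hi
                and (rule.proto is None or rule.proto == proto)):
            return rule.action
    return default
```

The cost of this scan grows with the number of rules for every packet. Wire-speed designs instead evaluate many rules or header fields in parallel.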
We present ongoing work on a platform for mobile health and implantable telemetry devices with powerful point-of-contact processing capabilities based on our VivoSoC multi-sensor medical instrumentation SoC, a custom power management IC, and only a few additional components - allowing the realisation of sub-ccm devices. We detail the powerful yet efficient acquisition and parallel processing capabilities...
This paper explores the use of on-chip cryptographic units for implementing security in low-cost wireless sensor networks. The objective of this research is to reduce the deployment time and computational complexity of security protocols in WSNs, whilst keeping security-related performance parameters on par with the current state of the art. A method is proposed to continue using simple radio transceiver...
A number of critical design decisions, such as network topology, buffer sizes, flow control mechanism and so forth, have to be evaluated in any NoC design. Designs and verifications of NoCs are based on either software simulations, which are extremely slow and inaccurate for complex models, or hardware emulations using low/mid-class FPGAs, where the scalability of the NoC system is intensively...
The home-grown SW26010 many-core processor enabled the production of China’s first independently developed number-one ranked supercomputer, the Sunway TaihuLight. Its limited off-chip memory bandwidth, however, renders the SW26010 a highly memory-bound processor. To compensate for this limitation, the processor was designed with a unique hardware feature, "Register Level Communication"...
With the increase of CMP (Chip-Multiprocessor) scale, moving data to computation on chip becomes more expensive. Accordingly, moving computation to data has potential to improve efficiency. We propose an in-place computation co-design of many-simple-core CMP for irregular applications. The computing paradigm is that an application's critical irregular data (or part of them) is partitioned into on-chip...