This work presents an efficient hardware accelerator design for deep residual learning algorithms, which have shown superior image recognition accuracy (>90% top-5 accuracy on the ImageNet database). Two key objectives of the acceleration strategy are to (1) maximize resource utilization and minimize data movements, and (2) employ scalable and reusable computing primitives to optimize physical design...
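As general background on residual learning (not a description of the accelerator itself), a residual network is built from blocks that add an identity shortcut to the output of a few stacked layers. Here $\mathcal{F}$ denotes the residual mapping learned by the stacked layers with weights $\{W_i\}$:

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}
```

When the dimensions of $\mathbf{x}$ and $\mathcal{F}$ differ, the shortcut uses a linear projection $W_s\mathbf{x}$ in place of $\mathbf{x}$. In both cases the shortcut input must remain available until $\mathcal{F}(\mathbf{x})$ has been computed, so the element-wise addition is one point where the data movement mentioned above arises.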
Demand for low-power data processing hardware continues to rise inexorably. Existing programmable and “general purpose” solutions (e.g. SIMD, GPGPUs) are insufficient, as evidenced by the order-of-magnitude improvements and industry adoption of application- and domain-specific accelerators in important areas like machine learning, computer vision and big data. The stark tradeoffs between efficiency...
The integration of mixed-signal circuits in Systems on Chip is a trend in modern systems and applications that poses important challenges. In particular, simulating this kind of system is a very time-consuming process that is becoming increasingly complex due to the size of current designs. This paper describes a HW/SW co-simulation environment for mixed-signal circuits. The analog components are...
When designing hardware accelerators for Systems on Chip, hardware and software integration can quickly become difficult. Heterogeneity in the interfaces prevents developers from efficiently using available hardware. In this paper, we propose an improved microcontroller approach to Intellectual Property (IP) core integration in Systems on Chip. This approach is based on an instruction set designed...
Traffic accidents negatively affect the lives of human beings. Accidents may result in deaths, severe injuries, and loss of income to the impacted families. Accident detection and prevention is a keystone in improving road safety. In this paper, a system for detecting vehicle collision and rollover is presented. The proposed system includes three key phases. Data acquisition where accelerometer and...
This paper presents a complete video fusion system with hardware acceleration and investigates the energy trade-offs between computing in the CPU or the FPGA device. The video fusion application is based on the Dual-Tree Complex Wavelet Transforms (DT-CWT). Video fusion combines information from different spectral bands into a single representation and advanced algorithms based on wavelet transforms...
Technology scaling and the growing use of accelerators make the optimization of data movement increasingly important in all computing systems. Further, the growing diversity of memory structures makes embedding such optimization in software non-portable. We propose a novel architectural solution called Data Layout Transformation (DLT) associated with a simple set of instructions that enable software to describe...
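As a generic software illustration of what a data layout transformation does (not the DLT hardware or its instruction set, whose details are elided above), the sketch below converts an array of records (array-of-structures) into one array per field (structure-of-arrays). In a native-array implementation, a kernel that reads only one field then streams through contiguous memory. The helper names `aos_to_soa` and `soa_to_aos` and the sample fields are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def aos_to_soa(records: Sequence[Tuple[float, ...]],
               fields: Sequence[str]) -> Dict[str, List[float]]:
    """Array-of-structures -> structure-of-arrays (hypothetical helper).

    Each record is a tuple whose i-th element belongs to fields[i].
    """
    soa: Dict[str, List[float]] = {name: [] for name in fields}
    for rec in records:
        if len(rec) != len(fields):
            raise ValueError("record length does not match number of fields")
        for name, value in zip(fields, rec):
            soa[name].append(value)
    return soa


def soa_to_aos(soa: Dict[str, List[float]],
               fields: Sequence[str]) -> List[Tuple[float, ...]]:
    """Inverse transformation: rebuild the records from the per-field arrays."""
    return list(zip(*(soa[name] for name in fields)))


if __name__ == "__main__":
    points = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
    layout = aos_to_soa(points, ["x", "y", "z"])
    print(layout["x"])  # [1.0, 4.0]
    assert soa_to_aos(layout, ["x", "y", "z"]) == points
```

The transformation is lossless, so the original layout can always be restored when a consumer expects records.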
In this work, we investigate a transformation of VHDL descriptions into equivalent formal models. The targeted equivalence is at the level of the functional behavior. That is, we aim at producing formal models that have the same functional simulation behavior as the original VHDL implementation. We rely on the BIP component-based modeling language as the underlying formalism for this transformation...
We explore vectorised implementations of three efficient algorithms for morphological dilation and erosion, exploiting single instruction multiple data (SIMD) CPU instructions on commonly used architectures. We discuss issues specific to SIMD implementation and describe how they guide algorithm choice. We compare our implementations to a commonly used open-source SIMD accelerated machine vision library...
Today virtualization technology is the focus of many new potential threats and introduces new security challenges that we must meet. The key problem is that malware can use the virtualization techniques of modern CPUs for “hidden virtualization” (invisible to the user): to execute as a hypervisor and transform the running operating system (OS) into a “guest” state. In this work we analyzed and compared...
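The abstract does not name its three algorithms, so the sketch below uses one well-known efficient algorithm for flat (constant) structuring elements as an example: the van Herk/Gil-Werman method. It computes a 1D dilation (sliding-window maximum) with a constant number of comparisons per sample regardless of the window width `k`, and erosion follows by replacing max with min. It is a scalar Python reference rather than a SIMD implementation, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence


def _vhgw(x: Sequence[int], k: int, op: Callable[[int, int], int]) -> List[int]:
    """van Herk/Gil-Werman sliding-window filter ("valid" mode, n-k+1 outputs)."""
    n = len(x)
    if k < 1 or k > n:
        raise ValueError("window width k must satisfy 1 <= k <= len(x)")
    g = list(x)  # running op from the start of each block of k samples
    h = list(x)  # running op from the end of each block of k samples
    for i in range(1, n):
        if i % k != 0:  # not the first sample of a block
            g[i] = op(g[i - 1], x[i])
    for i in range(n - 2, -1, -1):
        if (i + 1) % k != 0:  # not the last sample of a block
            h[i] = op(h[i + 1], x[i])
    # Window [i, i+k-1] spans at most two blocks: combine the suffix of the
    # first block (h[i]) with the prefix of the second (g[i+k-1]).
    return [op(h[i], g[i + k - 1]) for i in range(n - k + 1)]


def dilate_1d(x: Sequence[int], k: int) -> List[int]:
    """Flat 1D dilation: maximum over each window of width k."""
    return _vhgw(x, k, max)


def erode_1d(x: Sequence[int], k: int) -> List[int]:
    """Flat 1D erosion: minimum over each window of width k."""
    return _vhgw(x, k, min)


if __name__ == "__main__":
    signal = [1, 3, 2, 5, 4, 0, 1]
    print(dilate_1d(signal, 3))  # [3, 5, 5, 5, 4]
    print(erode_1d(signal, 3))   # [1, 2, 2, 0, 0]
```

A 2D rectangular structuring element is separable, so it can be handled by applying the 1D filter along rows and then along columns.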
This paper presents four different architectures for the hardware acceleration of axis-parallel, oblique and non-linear decision tree ensemble classifier systems. Hardware architectures for the implementation of a number of ensemble combination rules are also presented. The proposed architectures are optimized for size, making them particularly interesting for embedded applications where the size...
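To make the terminology concrete, the sketch below models in software how an axis-parallel node compares a single attribute with a threshold, how an oblique node compares a weighted sum of attributes with a threshold, and how the per-tree decisions of an ensemble can be combined by majority voting, one common combination rule. It models only classifier behaviour, not any of the paper's hardware architectures; the node encoding and names are hypothetical, and non-linear node tests are omitted.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    # Internal node tests sum(weights[i] * x[i]) + bias > 0 (hypothetical encoding).
    # Axis-parallel test: one-hot weights. Oblique test: several non-zero weights.
    # A leaf carries a class label instead of a test.
    weights: Optional[List[float]] = None
    bias: float = 0.0
    left: Optional["Node"] = None   # followed when the test is false
    right: Optional["Node"] = None  # followed when the test is true
    label: Optional[int] = None


def classify(tree: Node, x: Sequence[float]) -> int:
    """Walk from the root to a leaf and return its class label."""
    node = tree
    while node.label is None:
        s = sum(w * xi for w, xi in zip(node.weights, x)) + node.bias
        node = node.right if s > 0 else node.left
    return node.label


def majority_vote(trees: Sequence[Node], x: Sequence[float]) -> int:
    """Combine the ensemble's decisions; ties go to the label seen first."""
    votes = Counter(classify(t, x) for t in trees)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # x[0] > 0.5 (axis-parallel) and x[0] + x[1] > 1.0 (oblique)
    axis = Node(weights=[1.0, 0.0], bias=-0.5, left=Node(label=0), right=Node(label=1))
    oblique = Node(weights=[1.0, 1.0], bias=-1.0, left=Node(label=0), right=Node(label=1))
    print(majority_vote([axis, oblique, axis], [0.7, 0.1]))  # 1
```

Encoding both test types as a weighted sum shows why oblique nodes cost more in hardware: an axis-parallel node needs one comparison, while an oblique node also needs a multiply-accumulate over the attributes.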
Coupling processors with acceleration hardware is an effective way to improve the energy efficiency of embedded systems. Many-core is nowadays a dominant design paradigm for SoCs, which opens new challenges and opportunities for designing HW blocks. Exploring acceleration solutions that naturally fit into well-established parallel programming models and that can be incrementally added on top of existing...
High-performance computing as we know it today is experiencing unprecedented changes, encompassing all levels from technology to use cases. This paper explores the adoption of customizable, deeply heterogeneous manycore systems for future QoS-sensitive and power-efficient high-performance computing. At the heart of the proposed architecture is a NoC-based manycore system embracing medium-end CPUs,...
In this paper, we perform a systematic comparison to study the energy cost of varying data formats and data types with respect to arithmetic logic and data movement for accelerator-based heterogeneous systems in which both compute-intensive (FFT accelerator) and data-intensive accelerators (DLT accelerator) are added. We evaluate a wide range of design processes (e.g. 32nm bulk-CMOS and projected...
The EDA industry has recently witnessed the growing popularity of densely populated, IP-rich SoC designs targeting high performance computing platforms. Such SoCs require effective logic simulation, with high levels of accuracy and throughput, for a fault-free design and faster time to market. Hardware-Assisted Simulation (HAS) is the appropriate choice for simulating such designs. Existing HAS...
Intel Memory Protection Extensions (MPX) implement hardware-accelerated support for detection and prevention of memory corruption. This paper examines the effectiveness of MPX. Herein we attempt to find false positives and false negatives, and to determine what attacks may still be feasible. In particular, we wish to see if a system protected by MPX is still exploitable. Intel MPX appears to provide a solid mitigation technique,...
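Intel MPX associates a lower and an upper bound with a pointer and provides instructions such as BNDCL and BNDCU that check an address against those bounds before it is dereferenced, raising a #BR (bound range) exception on a violation. The sketch below models only that check in software, using a hypothetical `BoundedPointer` over a Python `bytearray`. It is not how MPX is actually used (that relies on compiler instrumentation and bound registers), and it ignores the bounds tables used when bounds are spilled to memory.

```python
class BoundsViolation(Exception):
    """Raised where MPX hardware would raise a #BR exception (modelled)."""


class BoundedPointer:
    """Hypothetical pointer that carries its bounds, in the spirit of MPX."""

    def __init__(self, memory: bytearray, base: int, size: int) -> None:
        # Bounds are recorded when the pointer is created (cf. BNDMK).
        self.memory = memory
        self.addr = base
        self.lower = base
        self.upper = base + size - 1  # inclusive upper bound, as in MPX

    def offset(self, delta: int) -> "BoundedPointer":
        # Pointer arithmetic moves the address but keeps the original bounds.
        p = BoundedPointer(self.memory, self.lower, self.upper - self.lower + 1)
        p.addr = self.addr + delta
        return p

    def _check(self, nbytes: int) -> None:
        if self.addr < self.lower:  # lower-bound check (cf. BNDCL)
            raise BoundsViolation(f"address {self.addr} below {self.lower}")
        if self.addr + nbytes - 1 > self.upper:  # upper-bound check (cf. BNDCU)
            raise BoundsViolation(f"access ends above {self.upper}")

    def read(self, nbytes: int = 1) -> bytes:
        self._check(nbytes)
        return bytes(self.memory[self.addr:self.addr + nbytes])

    def write(self, data: bytes) -> None:
        self._check(len(data))
        self.memory[self.addr:self.addr + len(data)] = data


if __name__ == "__main__":
    mem = bytearray(16)
    buf = BoundedPointer(mem, base=4, size=8)  # valid addresses 4..11
    buf.offset(7).write(b"\x01")               # last valid byte: allowed
    try:
        buf.offset(8).write(b"\x02")           # one past the end: rejected
    except BoundsViolation as exc:
        print("blocked:", exc)
```

Because the check happens before the write, the out-of-bounds byte is never modified, which is the property an overflow exploit would need to defeat.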
Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly...
In the open hardware graphics accelerator (ORGFX), there are rectangle, line, triangle and curve rasterization modules. This paper focuses only on improving line rasterization speed. Besides modifying the algorithm itself, hardware implementation and resource consumption are taken into consideration here. Originally, ORGFX uses the classic Bresenham line algorithm with high precision and...
The paper presents IPPro, a high-performance, scalable soft-core processor targeted at image processing applications. It is based on the Xilinx DSP48E1 architecture using the ZYNQ Field Programmable Gate Array and is a scalar 16-bit RISC processor that operates at 526 MHz, giving 526 MIPS of performance. Each IPPro core uses 1 DSP48, 1 Block RAM and 330 Kintex-7 slice-registers, thus...
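For reference, the classic Bresenham line algorithm mentioned above can be sketched in its textbook integer, all-octant form, which needs only additions, comparisons and a shift-by-one per pixel. This is neither ORGFX's hardware implementation (whose precision details are elided above) nor the paper's improved variant, and the function name is hypothetical.

```python
from typing import List, Tuple


def bresenham_line(x0: int, y0: int, x1: int, y1: int) -> List[Tuple[int, int]]:
    """Return the pixels on the line from (x0, y0) to (x1, y1), endpoints included."""
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)           # negative by convention in this formulation
    sx = 1 if x0 < x1 else -1    # step direction in x
    sy = 1 if y0 < y1 else -1    # step direction in y
    err = dx + dy                # combined error term for both axes
    points = []
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:             # error favours a step in x
            err += dy
            x0 += sx
        if e2 <= dx:             # error favours a step in y
            err += dx
            y0 += sy
    return points


if __name__ == "__main__":
    print(bresenham_line(0, 0, 4, 2))  # [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2)]
```

Each iteration emits exactly one pixel and moves by at most one unit in each axis, which is what makes the loop attractive for a pipelined hardware rasterizer.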