Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection...
In this paper, we propose two different hardware structures for the SHA-3 hash algorithm, targeting different circuit interface widths. Both support the four SHA-3 functions SHA3-224/256/384/512. The padding unit of our design is also implemented in hardware instead of software. In addition, a 3-round-in-1 structure is proposed to increase the throughput of our circuit. We conduct an implementation...
High Efficiency Video Coding (HEVC), a modern video compression standard, exceeds its predecessor H.264 in efficiency by 50%, but at the cost of increased complexity. It is one of the main research topics for FPGA engineers working on image compression algorithms. On the other hand, high-level synthesis tools, after a few years of lower interest from industry and academic research, have started to gain more...
Some modern high-level synthesis (HLS) tools [1] permit the synthesis of multi-threaded software into parallel hardware, where concurrent software threads are realized as concurrently operating hardware units. A common performance bottleneck in any parallel implementation (whether it be hardware or software) is memory bandwidth — parallel threads demand concurrent access to memory resulting in contention...
The backbone of modern infrastructure is concrete, yet modern ultrasonic characterization techniques often fail to be accurate and repeatable. Recent advances in low frequency ultrasonic array hardware have generated a new potential for improved material and structural characterization of concrete. However, the analysis and interpretation associated with the new array based hardware is more difficult...
An ultrasonic array system is designed for wirelessly powering and communicating with implantable medical devices. A flexible hardware interface is used to drive a 32-element linear phased array with emphasis on programmability and efficiency rather than bandwidth or peak power. Various beam patterns and data modulation schemes are verified at a depth of 6 cm in a tissue phantom with more than sufficient...
With the growing importance of deep learning and energy-saving approximate computing, half precision floating point arithmetic (FP16) is fast gaining popularity. Nvidia's recent Pascal architecture was the first GPU that offered FP16 support. However, when actual products were shipped, programmers soon realized that a naïve replacement of single precision (FP32) code with half precision led to disappointing...
Clustering is a crucial tool for analyzing data in virtually every scientific and engineering discipline. The U.S. National Academy of Sciences (NAS) has recently announced "the seven giants of statistical data analysis" in which data clustering plays a central role [1]. This research also emphasizes that more scalable solutions are required to enable time and space clustering for the future...
The growing prominence of Deep Neural Networks (DNNs) and the computational challenges they impose have fueled the design of specialized accelerator architectures and associated dataflows to improve their implementation efficiency. Each of these solutions serves as a datapoint on the throughput vs. energy trade-off for a given DNN and a set of architectural constraints. In this paper, we set out to explore...
In this paper, the hardware designed for the SKA-Low (Square Kilometre Array) correlator and beamformer (CBF) is discussed. SKA-Low is a low frequency aperture array (LFAA) to be located in remote Western Australia. The array collects radio signals in the frequency range from 50 to 350 MHz. The large number of dual-polarization antennas (131072) is distributed over a total of 512 stations with...
ASKAP has recently started its Early Science program with 12 MkII PAF-equipped antennas and 36 beams simultaneously covering a 30 square degree field of view. The first observations have focused on mapping extragalactic neutral hydrogen in galaxy groups and clusters selected by the ‘WALLABY’ Survey Science Team. Significant efforts from engineers, software designers, and scientists are overcoming...
Customizing the precision of data can provide attractive trade-offs between accuracy and hardware resources. Custom hardware and FPGA designs allow bit-level control over precision, but software is typically limited by the range of types supported by the underlying processor. We propose a new form of vector computing aimed at arrays of custom-precision data on general-purpose processors with SIMD...
With the increase in data volumes, it is prudent to classify data according to its criticality. One might prefer cheap storage for year-old system logs but highly fault-tolerant storage for personal photos. The existing solutions include storing these two sets of data in two different systems or choosing a system with the fault tolerance level required by the most critical data. This means a higher...
The start-up value of an SRAM cell is unique, random, and unclonable as it is determined by the inherent process mismatch between transistors. These properties make SRAM an attractive circuit for generating encryption keys. The primary challenge for SRAM based key generation, however, is the poor stability when the circuit is subject to random noise, temperature and voltage changes, and device aging...
Knowledge graphs have been widely adopted, in large part owing to their schema-less nature, which enables them to grow seamlessly and allows for new relationships and entities as needed. The knowledge graph has become a powerful tool to represent knowledge in the form of a labelled directed graph and to give semantics to textual information. A knowledge graph is a graph constructed by representing...
High-Level Synthesis (HLS) has been widely recognized as an efficient compilation process targeting FPGAs for algorithm evaluation and product prototyping. However, massively parallel memory access demands and the extremely high cost of multi-port single-bank memory have impeded loop pipelining performance. Thus, based on an alternative multi-bank memory architecture, a joint approach...
Processing In Memory (PIM), the concept of integrating processing directly with memory, has been attracting a lot of attention since PIM can assist in overcoming the throughput limitation caused by data movement between CPU and memory. The challenge, however, is that it requires the programmers to have a deep understanding of the PIM architecture to maximize the benefits such as data locality and...
The SKA (Square Kilometer Array) radio telescope under construction will become the largest telescope in the world by integrating the sampled data from a huge number of small antenna nodes in the array to emulate a giant antenna. Due to limited storage space, the SKA needs to process massive amounts of data in real time, which makes SKA scientific data processing a bottleneck of the computational...
Timer interference arises when a high-priority realtime task is delayed by a timer interrupt that is intended for a lower-priority task. We demonstrate that high-resolution timers, as exposed for instance by Linux's hrtimer API, can cause substantial timer interference, which manifests as significantly increased response times and lowered throughput. To eliminate this source of unpredictability, we...
This paper presents a static-placement, dynamic-issue (SPDI) framework for the coarse-grained reconfigurable architecture (CGRA) in order to tackle the inefficiencies of the static-issue, static-placement (SISP) CGRA. This framework includes a compiler that statically places the operations and a hardware design, an SPDI CGRA, that automatically schedules the operations. We focus on introducing the...