The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
This paper proposes a hybrid sorted table design for minimizing electronic trading latency, with three main contributions. First, a hierarchical sorted table with two levels, a fast cache table in reconfigurable hardware storing megabytes of data items and a master table in software storing gigabytes of data items. Second, a full set of operations, including insertion, deletion, selection and sorting,...
Accelerating relational databases in general and SQL in particular has become an important topic given thechallenges arising from large data collections and increasinglycomplex workloads. Most existing work, however, has beenfocused on either accelerating a single operator (e.g., a join) orin data reduction along the data path (e.g., from disk to CPU). In this paper we focus instead on the system...
This paper presents a stand-alone Dual-mode Tongue DriveSystem (sdTDS) which is designed for people with severedisabilities to control their environment using their tonguemotion and speech. The sdTDS detects user's tongue motion using a magnetic tracer placed on tongue and an array of magnetic sensors embedded in a wireless headset and at the same time it can capture the user's voice using a small...
The SKA (Square Kilometer Array) radio telescope under construction will become the largest telescope in the world by integrating the sampled data from a huge number of small antenna nodes in the array to emulate a giant antenna. Due to the limited storage space, the SKA needs to process massive data in real-time, which makes the SKA scientific data processing become a bottleneck of the computational...
We propose a design flow for automatic generation of hardware sandboxes. Our tool, the Component Authentication Process for Sandboxed Layouts (CAPSL), generates sandboxes capable of detecting trojan activation and nullifying potential damage to a system at run-time. Our approach captures the behavioral properties of non-trusted IPs with formal models that are translated to checker automata and implemented...
Deep neural networks have gained tremendous attention in both the academic and industrial communities due to their performance in many artificial intelligence applications, particularly in computer vision. However, these algorithms are known to be computationally very demanding for both scoring and model learning applications. State-of-the-art recognition models use tens of millions of parameters...
Library based design and IP reuse have been previouslyproposed to speed up the synthesis for large-scale FPGAdesigns. However, previous library based design flow faces severalunresolved challenges. Firstly, there may result in large wastearea between the modules due to the difference in module sizes. While utilizing multiple ratio modules can help to reduce thewaste area, pre-synthesis each module...
The increase in speed and capacity of FPGAs is faster than the development of effective design tools to fully utilize it, and routing of nets remains as one of the most time-consuming stages of the FPGA design flow. While existing works have proposed methods of accelerating routing through parallelization, they are limited by the memory architecture of the system that they target. In this paper, we...
Binary weighted networks (BWN) for imageclassification reduce computation for convolutional neuralnetworks (CNN) from multiply-adds to accumulates with littleto no accuracy loss. Hardware architectures such as FPGA cantake full advantage of BWN computations because of theirability to express weights represented as 0 and 1 efficientlythrough customizable logic. In this paper, we present animplementation...
DNNs (Deep Neural Networks) have demonstrated great success in numerous applications such as image classification, speech recognition, video analysis, etc. However, DNNs are much more computation-intensive and memory-intensive than previous shallow models. Thus, it is challenging to deploy DNNs in both large-scale data centers and real-time embedded systems. Considering performance, flexibility, and...
GPUs are capable of running a variety of applications, however their generic parallel-architecture can lead to inefficient use of resources and reduced power efficiency, due to algorithmic or architectural constraints. In this work, taking inspiration from CGRAs (coarse-grained reconfigurable architectures), we demonstrate resource sharing and re-distribution as a solution that can be leveraged by...
Neuromorphic computing is expanding by leaps and bounds through custom integrated circuits (digital and analog), and large scale platforms developed by industry or government-funded projects (e.g. TrueNorth and BrainScaleS, respectively). Whereas the trend is for massive parallelism and neuromorphic computation in order to solve problems, such as those that may appear in machine learning and deep...
While acceleration of Molecular Dynamics has received much attention, a significant part of that application, the bonded force calculation, has not. We present what we believe to be the first description and analysis of bonded force calculations outside of ASICs. We characterize the computational requirements. We find that a naive direct implementation requires FPGA resources out of proportion with...
Field programmable gate arrays (FPGAs) have been adopted in various fields, due to the design flexibility and customizability. Different applications have different requirements in performance, hardware resources and cost, leading to demands of diverse FPGA architectures. Delay is an important metric to evaluate different alternatives during FPGA architecture development. The existing analytical delay...
Convolutional neural networks (CNNs) have recently broken many performance records in image recognition and object detection problems. The success of CNNs, to a great extent, is enabled by the fast scaling-up of the networks that learn from a huge volume of data. The deployment of big CNN models can be both computation-intensive and memory-intensive, leaving severe challenges to hardware implementations...
We can build lightweight bit-serial FPGA NoC routers thatcost 20 LUT, 17 FF per router and operate at 800–900 MHzspeeds. Each bit-serial router implements deflection-routing on aunidirectional torus topology requiring 1b-wide connection perport. The key ideas that enable this implementation are (1)reformulation of the dimension-ordered routing (DOR) functionusing compact 1 LUT, 1 FF streaming pattern...
Sorting is one of the most fundamental and usefulapplications in computer science, and continues to be animportant tool in analyzing large datasets. An important andchallenging subclass of sorting problems involves sorting terabytescale datasets with hundreds of billions of records. Theconventional method of sorting such large amounts of datais to distribute the data and computation over a cluster...
Over the past decade, a wide-ranging collection of network functions in middleboxes has been used to accommodate the needs of network users. Although the use of general-purpose processors has been shown to be feasible for this purpose, the serial nature of microprocessors limits network functional virtualization (NFV) performance. In this paper, we describe a new heterogeneous hardware-software approach...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.