The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
The RSA algorithm is the most used public key cipher algorithm. Since it was presented in 1977, designers have proposed several architectures and implementations to improve its performance using different techniques and devices. In all cases, Montgomery's modular multiplication algorithm is the best option when trying to design an efficient RSA. In this paper, we present a high-radix Montgomery's...
This paper makes the following contributions: 1) A compact trie representation and a hybrid data structure for IP lookup that reduces the memory consumption while eliminating backtracking. 2) A merging algorithm that eliminates leaf pushing and simplifies table updates in virtual routers.
This work extends force-directed scheduling (FDS) to support specification constructs that express parallelism and timing behaviours. We select the FDS algorithm because it maximizes the amount of resource sharing, and it naturally supports constructs for parallelism. However, timed constructs are not supported. As a result, we propose timed FDS (TFDS) that optimizes over parallel, timed and untimed...
Recently we proposed occam-pi as a high-level language for programming massively parallel reconfigurable architectures. The design of occam-pi incorporates ideas from CSP and pi-calculus to facilitate expressing parallelism and reconfigurability. The feasability of this approach was illustrated by building three occam-pi implementations of DCT executing on an Ambric. However, because DCT is a simple...
In this paper we describe a double precision floating point sparse matrix-vector multiplier (SpMV) and its performance as implemented on a Convey HC-1 reconfigurable computer. The primary contributions of this work are a novel streaming reduction architecture for floating point accumulation, a novel on-chip cache optimized for streaming compressed sparse row (CSR) matrices, and end-to-end integration...
This paper proposes techniques for face detection and gives the implementation details for an FPGA development board. We analyze and discuss the relation between the system computation cost and selection of the image scaling factor. We give a new method to select the stop threshold for the image reduction process, which reduces the total computation by half. We also provide a color image output mode...
Next generation video standards have strict and increasing performance demands due to real-time requirements and the trend towards higher frame resolutions and bit rates. Leveraging the advantages of reconfigurable logic and emerging multi-core processor architectures to exploit all levels of parallelism of such workloads is necessary to achieve real time functionality at a reasonable cost.
We present a parallelized and pipelined architecture for a generalized Laguerre-Volterra MIMO system to identify the time-varying neural dynamics underlying spike activities. The proposed architecture consists of a first stage containing a vector convolution and MAC (Multiply and Accumulation) component, a second stage containing a pre-threshold potential updating unit with an error approximation...
With the on-chip resources largely increased, the computing paradigm of modern architectures are much different from traditional ones. The relation between temporal computing and spatial computing is getting more and more intricate. In this paper, a coarse-grained reconfigurable architecture is proposed named as programmable dataflow computing architecture: ProDFA. With both fine-grained control ability...
Computations involving matrices form the kernel of a large spectrum of computationally demanding applications for which FPGAs have actively been utilized as accelerators. The performances of such matrix operations on FPGAs are related to underlying architectural parameters such as computational resources, memory and I/O bandwidth. A model that gives bounds on the peak performance of matrix-vector...
FPGAs as commodities offer a resource for high-performance computation that is unmatched in flexibility and price/performance. As a lab, we are interested in high-level descriptions of computation and data, and how they may be customized to map effectively on FPGA fabrics. This paper describes our tool-chain, approach and methodology to FPGA utilization. We give a case study of the generation of a...
NAND flash memory has been widely used for data storage due to its high density, high throughput, low cost, and low power. However, as flash memory manufacturers scale to smaller process technologies and store more bits per cell, the reliability and endurance of flash memory are decreasing. Wear-leveling and error correction coding can significantly improve both reliability and endurance, but finding...
In this paper, we use an FPGA platform and a high level synthesis tool, called Impulse C, to speedup a statistical Line Of Reaction (LOR) estimation for a high-resolution Positron Emission Tomography (PET) scanner. The estimation algorithm provides a significant improvement over conventional methods, but the execution time is too long to be practical for clinic applications. Impulse C allows us to...
The FPGA compilation process (synthesis, map, place, and route) is a time consuming task that severely limits designer productivity. Compilation time can be reduced by saving implementation data in the form of hard macros. Hard macros consist of previously synthesized, placed and routed circuits that enable rapid design assembly because of the native FPGA circuitry (primitives and nets)which they...
The regularity of resources found in FPGAs is a unique feature, which can be utilized in a number of applications, e.g., in timing critical applications or applications with a demand for homogeneous routing. Current synthesis tools do not support an automatic generation of homogeneous FPGA designs, such that a time-consuming hand-crafted design is required. We present a tool flow, which automatically...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.