Because modular multiplication, addition, and subtraction are the core operations of elliptic curve public-key cryptography (ECC) systems, reducing area and merging structures have been hot topics in recent years. This paper first analyzes the difference between the multiplication type and the addition type of modular multiplier. Then, combined with the structural characteristics of the modular adder, and mixing modular...
This paper presents the design and prototyping of hardware and software to address the problem of rapid and reliable 3D digitization of very large collections of pinned insects. Using the collection at the Field Museum of Natural History (FMNH) as a use case, a pipeline to ingest the entire collection of 4.5 million specimens in circa 1-2 years imposes a limit of a few seconds on average processing time...
Scientific simulations typically store only a small fraction of computed timesteps due to storage and I/O bandwidth limitations. Previous work has demonstrated the compressibility of floating-point volume data, but such compression often comes with a tradeoff between computational complexity and the achievable compression ratio. This work demonstrates the use of special-purpose video encoding hardware...
The increasing use of digital signal processors (DSPs) in wireless communications and signal processing necessitates the optimization of compilers to support special hardware features. In this paper, we propose a compiler transformation method for zero-overhead loops (ZOLs). It supports very long instruction word (VLIW) architectures, internal branches, and loops whose iteration counts are known at runtime and...
Over the past few years we have articulated a theory that describes ‘encrypted computing’, in which data remains in encrypted form while being worked on inside a processor, by virtue of a modified arithmetic. The last two years have seen research and development on a standards-compliant processor that shows that near-conventional speeds are attainable via this approach. Benchmark performance with the...
Cameras are the de facto sensor. The growing demand for real-time and low-power computer vision, coupled with trends towards high-efficiency heterogeneous systems, has given rise to a wide range of image processing acceleration techniques at the camera node and in the cloud. In this paper, we characterize two novel camera systems that use acceleration techniques to push the extremes of energy and performance...
The current Internet routing ecosystem is neither sustainable nor economical. More than 711K IPv4 routes and more than 41K IPv6 routes exist in current global Forwarding Information Bases (FIBs), and growth rates are increasing. This rapid growth has serious consequences, such as creating the need for costly FIB memory upgrades and increased potential for Internet service outages. And while FIB memories...
Packet forwarding in Software-Defined Networks (SDN) relies on a centralised network controller which enforces network policies expressed as forwarding rules. Rules are deployed as sets of entries into network device tables. With heterogeneous devices, deployment is strongly bounded by the respective table constraints (size, lookup time, etc.) and forwarding pipelines. Hence, minimising the overall...
Advancements in deep learning have ignited an explosion of research on efficient hardware for embedded computer vision. Hardware vision acceleration, however, does not address the cost of capturing and processing the image data that feeds these algorithms. We examine the role of the image signal processing (ISP) pipeline in computer vision to identify opportunities to reduce computation and save energy...
Large-number multiplication has always been an essential operation in cryptographic algorithms. In this paper, we propose Broken-Karatsuba multiplication, which applies the non-least-positive form to represent large numbers and exploits the parallelism hidden in conventional Karatsuba multiplication. Further, we modify the Montgomery modular multiplication algorithm with Broken-Karatsuba multiplication to make...
We introduce PyRTL, a Python embedded hardware design language that helps concisely and precisely describe digital hardware structures. Rather than attempt to infer a good design via HLS, PyRTL provides a wrapper over a well-defined "core" set of primitives in a way that empowers digital hardware design teaching and research. The proposed system takes advantage of the programming language...
Undergraduate students rapidly implement a partially-reconfigured, real-time video processor on the Xilinx PYNQ board. The video processor performs various real-time operations including Sobel edge detection, embossing, averaging, an interactive Pong game, etc., using a separate partially-reconfigurable bit-stream for each distinct function. Selection of image-processing functions is accomplished...
FPGAs have emerged as a cost-effective accelerator alternative in clouds and clusters. Programmability remains a challenge, however, with OpenCL being generally recognized as a likely part of the solution. In this work we seek to advance the use of OpenCL for HPC on FPGAs in two ways. The first is by examining a core HPC application, Molecular Dynamics. The second is by examining a fundamental design...
General-purpose workloads running on modern graphics processing units (GPGPUs) rely on hardware-based barriers to synchronize warps within a thread block (TB). However, imbalance may exist before reaching a barrier if a GPGPU workload contains irregular memory accesses, i.e., some warps may be critical while others may not. Ideally, cache space should be reserved for the critical warps. Unfortunately,...
Modern computer architectures have an ever-increasing demand for performance, but are constrained in power dissipation and chip area. To tackle these demands, architectures with application-specific accelerators have gained traction in research and industry. While this is a very promising direction, hard-wired accelerators fall short when too many applications need to be supported or flexibility is...
In fields like embedded vision, where algorithms are computationally expensive, hardware accelerators play a major role in high-throughput applications. These accelerators could be implemented as hardwired IP cores or Application Specific Instruction-set Processors (ASIPs). While hardwired solutions often provide the best possible performance, they are less flexible than ASIP implementations. In this...
Code injection attacks are an undeniable threat in today's cyberworld. Instruction Set Randomization (ISR) was initially proposed in 2003. This technique was designed to protect systems against code injection attacks by creating a unique instruction set for each machine, thanks to randomization. It is a promising technique in the growing embedded system and Internet of Things (IoT) devices ecosystem,...
Today's network traffic is dynamic and fast. Conventional network traffic classification based on flow features and data mining is not able to process traffic efficiently. A hardware-based network traffic classifier is needed that can adapt to the dynamic network state and provide accurate, up-to-date classification at high speed. In this paper, a hardware architecture of online incremental semi-supervised...
In this paper, we propose a cross-layer integrated microprocessor design methodology where instructions in software programs drive the design down to the gate-level netlists. Based on in-depth exploration of the dynamic timing behavior of each instruction in the program, a fully integrated design approach is proposed with ultra-dynamic clock and power management circuits and software-driven design...
Network security and monitoring devices use packet classification to match packet header fields against a set of rules. Many hardware architectures have been designed to accelerate packet classification and achieve wire-speed throughput for 100 Gbps networks. The architectures are designed for high throughput even for the shortest packets. However, FPGA SoC and Intel Xeon with FPGA have limited resources...