The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
This paper presents a low power AES-GCM authenticated encryption IP core which combines an improved four-parallel architecture, an advanced 65nm SOTB CMOS technology and a low complexity clock gating technique. As a result, the power consumption of the proposed AES-GCM core is only 8.9mW which is lower than other AES-GCM IP cores presented in literature. The detail implementation results are also...
The expanding use of deep learning algorithms causes the demands for accelerating neural network (NN) signal processing. For the NN processing, in-memory computation is desired, in which expensive data transfer can be eliminated. In reflection of recently proposed binary neural networks (BNNs), which can reduce the computation resource and area requirements, we designed an in-memory BNN signal processor...
In this paper, an FPGA-based implementation of Frequent Items Counting is proposed. The architecture deploys the equality comparator matrix for comparing the input items with themselves to count them instantly within a single operating clock. The proposed architecture is applied to the case of the 8-bit item. That means 256 different types of items in total. The system is built and verified on the...
High-level synthesis is increasingly being used to automatically translate existing software algorithms into hardware quickly and efficiently. Typically, the circuits created by HLS are implemented on Field-Programmable Gate Arrays (FPGAs). While the fine-grained architecture of an FPGA is well suited for general circuit implementation, it can result in excessive routing resource utilization for larger...
A new technique for computing the truncated cube of an operand at length of power two is proposed, implemented, analyzed, and compared to existing techniques. The new proposed method is comparable to previously proposed methods that compute the cube of an operand in parallel. Post layout results are presented in a 65nm Application Specific Integrated Circuit implementation and are compared against...
This paper reports on a content addressable memory (CAM) employing a multi-Vdd scheme for low power pattern recognition applications. The complete design, simulation and testing of the chip is presented along with an exploration of the multi-Vdd design space. The proposed design, operating at an optimal operating point in a triple-Vdd configuration, increases the delay range by 2.4 times and consumes...
High density embedded memories have been demanded increasingly to enhance the performance and reduce the power dissipation of advanced systems, such as multicore processors, which have been used in a wide variety of applications from servers to Internet-of-things (IoT) devices. In this paper, a memory cell, referred to as the gain-cell magnetoresistive random access memory (gMRAM), is introduced....
Spiking Neural Networks (SNNs) are the third generation of artificial neural networks that closely mimic the time encoding and information processing aspects of the human brain. It has been postulated that these networks are more efficient for realizing cognitive computing systems compared to second generation networks that are widely used in machine learning algorithms today. In this paper, we review...
This paper presents an original and unique embedded FFT hardware algorithm development process based on a systematic and scalable procedure for generating permutation-based address patterns for any power-of-2 transform size algorithm and any folding factor in a Kronecker Pease FFT hardware implementation. This is coupled by a procedure to perform automatic code generation of Kronecker FFT cores. The...
In this paper we present a memristive neuromorphic system for higher power and area efficiency. The system is based on a mixed signal approach considering the digital nature of the peripheral and control logics and the integration being analog. So, the system is connected digitally outside but the core is purely analog. This mixed signal approach provides the advantage of implementing neural networks...
This paper presents an all-digital background blind calibration technique for the capacitor mismatch problem in SAR ADCs. It utilizes the redundancy offered using a sub-radix-2 DAC architecture to blindly estimate the mismatch and the assigned weight for each comparator decision. The weights are estimated by building partial histogram windows for the comparator decision vectors. To remove the dependency...
Scalar addition and multiplication generally obey commutative and distributive laws. However, in their hardware implementation, error propagation and accumulation do not necessarily follow the same rules. In this paper we present a statistical analysis of the accuracy in complex multiplication approaches for IEEE 754 single precision operands. Several approaches were evaluated using a dual approach...
Stochastic turbo decoder is a new scheme for turbo codes. But the long decoding latency and high complexity are two main challenges for fully parallel stochastic turbo decoders. In this paper, we proposed a novel stochastic turbo decoder scheme with two high accuracy stochastic operator modules, including no-scaling stochastic addition and stochastic normalization operator, which can improve the decoding...
Massive multiple input multiple output (MIMO) technology plays an important role in next generation wireless communication systems. Modified Brent-Luk-Van Loan array and other parallel hardware implementations were developed for channel matrix factorization. For a large matrix size as of massive MIMO, however, previous implementations would require a large amount of hardware resource. This paper presents...
Memristor technology is receiving an increased attention as a potential solution to meet the scaling demands in integrated circuit design. Memristor provides advantages like high-density, low-power, non-volatility and good scalability. In this paper, an 8-bit iterative full adder design is proposed that uses space-time based circuit notation. It uses stateful logic with memristive nanowire crossbar...
This paper proposes novel soft error detection and mitigation technique in reduced instruction set computer (RISC) based pipeline processors. We leveraged the data encoding techniques (re-computing with rotated operands (RERO)) in conjunction with back pressure controlling mechanism in pipeline architecture. In order to alleviate the performance degradation due to potential stalling, we exploited...
The spin transfer torque magnetic random access (STT-MRAM) is suitable for embedded memories and also for the second level cache memory in the mobile CPU's. The most capable Nonvolatile memory (NVM) component is STT-MRAM. There is a demand to improve efficient circuit and architecture to compete with the existing NVM technologies. Low energy consumption is achieved to write and read into MTJ. This...
Viterbi detectors are widely used in data recording channels in the timing loop as well as in the digital back end before error-correction decoding to detect data in the presence of inter-symbol interference (ISI) and noise. Further, soft reliability values assist in the decoding of outer codes. The state-of-the-art implementations of the Viterbi algorithm are synchronous which consider the ‘worst-case’...
Sparse Code Multiple Access (SCMA) is a promising multiple access technology candidate for 5G wireless communication systems. The high detection complexity is its bottleneck. Stochastic computation is an ultra-low complexity digital signal processing technique in which probabilities are represented and processed with streams of random bits. In this paper, we propose a novel low complexity stochastic...
Side channel attacks are a major class of attacks to crypto-systems. Attackers collect and analyze timing behavior, I/O data, or power consumption in these systems to undermine their effectiveness in protecting sensitive information. In this work, we propose a new cache architecture, called Janus, to enable crypto-systems to introduce randomization and uncertainty in their runtime timing behavior...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.