Owing to the advantages of low standby power and high scalability, ReRAM technology is considered a promising replacement for conventional DRAM in future manycore systems. To make ReRAM highly scalable, the memory array has to have a crossbar array structure, which needs a specific access mechanism for activating a row of memory when reading/writing a data block from/to it. This type of...
High-level synthesis (HLS) is well capable of generating control and computation circuits for FPGA accelerators, but still requires substantial human effort to tackle the challenge of memory and communication bottlenecks. One important approach for improving data locality is to apply loop tiling to memory-intensive loops. Loop tiling is a well-known compiler technique that partitions the iteration...
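As a generic illustration of loop tiling (not the HLS flow of the paper above, whose details are truncated here), the sketch below tiles a plain matrix multiplication. The function name `matmul_tiled` and the `tile` parameter are assumptions made for this example only.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply a (n x m) by b (m x p) with the three loops split into tiles.

    Hypothetical helper for illustration; `tile` is the tile edge length.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # The outer three loops step from tile to tile.
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # The inner three loops stay inside one tile, so only a
                # small block of a, b and c is touched at a time.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += a_ik * b[k][j]
    return c
```

The result is identical to the untiled triple loop; only the order in which iterations are visited changes, which is what lets a small on-chip buffer hold the working set of one tile.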
In this paper, we present a new, dynamic graph data structure, built to deliver high update rates while keeping a low memory footprint using autonomous memory management directly on the GPU. By transferring the memory management to the GPU, efficient updating of the graph structure and fast initialization times are enabled as no additional memory allocation calls or reallocation procedures are necessary...
Memristors have extended their influence beyond memory to logic and in-memory computing. Memristive logic design, the methodology of designing logic circuits using memristors, is an emerging concept whose growth is fueled by the quest for energy efficient computing systems. As a result, many memristive logic families have evolved with different attributes, and a mature comparison among them is needed...
The POWER9™ Processor in 14 nm SOI FinFET technology makes use of 7 different families of arrays. This paper gives an overview of the advantages of different implementations, focusing on two key innovations introduced with this processor generation: Fast and low-latency write assist schemes for single-voltage performance arrays, as well as a new methodology, the synthesized soft arrays, to enable significant...
String objects, the most commonly used objects in Java programs, are immutable (read-only) and easily identified. Previous analysis of sharing string objects in the Java Virtual Machine showed promising results; however, it is clear that sharing a wider set of objects would result in better performance. Automatic object selection for sharing is non-trivial, because in the current state, only read-only...
We present OctNet, a representation for deep learning with sparse 3D data. In contrast to existing models, our representation enables 3D convolutional networks which are both deep and high resolution. Towards this goal, we exploit the sparsity in the input data to hierarchically partition the space using a set of unbalanced octrees where each leaf node stores a pooled feature representation. This...
In this paper, we present a new methodology that provides i) a theoretical analysis of the two most commonly used approaches for effective shared cache management (i.e., cache partitioning and loop tiling) and ii) a unified framework for fine-tuning those two mechanisms in tandem (not separately). Our approach manages to lower the number of main memory accesses by one order of magnitude keeping at...
The extremely strict code length constraint is the main drawback of lowest density, maximum-distance separable (MDS) array codes of distance greater than 3. To break away from the status quo, we proposed in [5] a family of lowest density MDS array codes of (column) distance 4, called XI-Code. Compared with the previous alternatives, XI-Code has lower encoding and decoding complexities, and much looser...
Per-flow counting for big network data streams is a fundamental problem in various network applications such as traffic monitoring, load balancing, capacity planning, etc. Traditional research focused on designing compact data structures to estimate flow sizes from the beginning of the data stream (i.e., landmark window model). However, for many applications, the most recent elements of a stream are...
This paper proposes a novel intrusion detection algorithm that aims to identify malicious CAN messages injected by attackers in the CAN bus of modern vehicles. The proposed algorithm identifies anomalies in the sequence of messages that flow in the CAN bus and is characterized by small memory and computational footprints, which make it applicable to current ECUs. Its detection performance is demonstrated...
High-Level Synthesis (HLS) has been widely recognized as an efficient compilation process targeting FPGAs for algorithm evaluation and product prototyping. However, massively parallel memory access demands and the extremely high cost of multi-port single-bank memory have impeded loop pipelining performance. Thus, based on an alternative multi-bank memory architecture, a joint approach...
Data sets are often too large to fit completely inside the computer's main memory and must instead reside on disk. Keeping an entire data set in main memory would be very costly. A computer must retrieve the required data and place it in internal memory to process it. Efficient data structures, such as the B-tree and B+ tree, are used to process large data sets. Nodes of these data structures are buffered in memory...
GPUs provide high-bandwidth/low-latency on-chip shared memory and L1 cache to efficiently service a large number of concurrent memory requests (to contiguous memory space). To support warp-wide accesses to L1 cache, GPU L1 cache lines are very wide. However, such L1 cache architecture cannot always be efficiently utilized when applications generate many memory requests with irregular access patterns...
In a processing engine there is a continuous exchange of data between the computing nodes. The data that needs to be exchanged cannot be transferred in its normal form; hence, it needs to be converted into plain bytes for faster execution. This is achieved through serialization. If any computation needs to be performed on the data being transmitted, we will have to de-serialize...
The Suffix Tree, a crucial and versatile data structure for string analysis of large texts, is often used in pattern matching and in bioinformatics applications. The Affix Tree generalizes the Suffix Tree in that it supports full tree functionalities in both search directions. The bottleneck of Affix Trees is their space requirement for storing the data structure. Here, we discuss existing representations...
The cache replacement policy is a major factor that determines the effectiveness of the memory hierarchy. The replacement policy affects both the hit rate and the access latency of the cache. It decides which cache block is replaced to make room for the incoming block. The replacement policy has to be chosen in such a way that cache misses are reduced. Last-level cache misses cause hundreds of...
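To make the role of a replacement policy concrete, here is a minimal sketch of least-recently-used (LRU) replacement, chosen only as a familiar example; the truncated abstract above does not say which policy the paper studies. The class name `LRUCache` and its `access` method are assumptions made for this illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Touch a block address; return True on a hit, False on a miss."""
        if addr in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(addr)  # mark as most recently used
            return True
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            # The policy decides the victim: here, the least recently used.
            self.blocks.popitem(last=False)
        self.blocks[addr] = True
        return False


cache = LRUCache(capacity=2)
for addr in [1, 2, 1, 3, 2]:
    cache.access(addr)
print(cache.hits, cache.misses)
```

Swapping the victim-selection line for another rule (for example, evicting the newest block) changes the hit rate on the same access trace, which is the effect the abstract refers to.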
Layout and placement of shared data structures are critical to achieving scalable performance on heterogeneous memory architectures. While recent research has established the importance of data organization and developed mechanisms for data layout conversion, a general strategy for when to make layout changes and where to map data segments in a heterogeneous environment has not yet emerged. In this...
Spin-transfer torque random access memory (STT-RAM) has been proposed as an excellent candidate for replacing traditional memory due to its attractive features such as high density and low power. Memory partitioning is an efficient strategy to overcome the obstacle of memory bandwidth limiting the speed of parallel data access. However, the performance is unsatisfactory, while previous memory partitioning...
Modern reconfigurable computing chips, such as FPGAs, offer an unprecedented opportunity to achieve both multifunctionality and real-time responsiveness for memory-intensive embedded applications. However, how to cost-effectively synthesize application-specific hardware constructs that fully exploit memory-level parallelism remains a key challenge. To address this problem, we propose a new...