Search results

chapter

Exploring the impact of memory block permutation on performance of a crossbar ReRAM main memory

Morteza Ramezani, Nima Elyasi, Mohammad Arjomand, Mahmut T. Kandemir, et al.

2017 IEEE International Symposium on Workload Characterization (IISWC) > 167 - 176

2017 IEEE International Symposium on Workload Characterization (IISWC)

Owing to the advantages of low standby power and high scalability, ReRAM technology is considered as a promising replacement for conventional DRAM in future manycore systems. In order to make ReRAM highly scalable, the memory array has to have a crossbar array structure, which needs a specific access mechanism for activating a row of memory when reading/writing a data block from/to it. This type of...

chapter

Tile size selection for optimized memory reuse in high-level synthesis

Junyi Liu, John Wickerson, George A. Constantinides

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 8

2017 27th International Conference on Field Programmable Logic and Applications (FPL)

High-level synthesis (HLS) is well capable of generating control and computation circuits for FPGA accelerators, but still requires sufficient human effort to tackle the challenge of memory and communication bottlenecks. One important approach for improving data locality is to apply loop tiling on memory-intensive loops. Loop tiling is a well-known compiler technique that partitions the iteration...

chapter

Autonomous, independent management of dynamic graphs on GPUs

Martin Winter, Rhaleb Zayer, Markus Steinberger

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

2017 IEEE High Performance Extreme Computing Conference (HPEC)

In this paper, we present a new, dynamic graph data structure, built to deliver high update rates while keeping a low memory footprint using autonomous memory management directly on the GPU. By transferring the memory management to the GPU, efficient updating of the graph structure and fast initialization times are enabled as no additional memory allocation calls or reallocation procedures are necessary...

chapter

Memristive logic: A framework for evaluation and comparison

John Reuben, Rotem Ben-Hur, Nimrod Wald, Nishil Talati, et al.

2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS) > 1 - 8

2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS)

Memristors have extended their influence beyond memory to logic and in-memory computing. Memristive logic design, the methodology of designing logic circuits using memristors, is an emerging concept whose growth is fueled by the quest for energy efficient computing systems. As a result, many memristive logic families have evolved with different attributes, and a mature comparison among them is needed...

chapter

A system of array families and synthesized soft arrays for the POWER9™ processor in 14nm SOI FinFET technology

P. Salz, A. Frisch, W. Penth, J. Noack, et al.

ESSCIRC 2017 - 43rd IEEE European Solid State Circuits Conference > 303 - 307

ESSCIRC 2017 - 43rd IEEE European Solid State Circuits Conference (ESSCIRC)

The POWER9™ Processor in 14 nm SOI FinFET technology makes use of 7 different families of arrays. This paper gives an overview on advantages of different implementations, focusing on two key innovations introduced with this processor generation: Fast and low-latency write assist schemes for single-voltage performance arrays, as well as a new methodology, the synthesized soft arrays, to enable significant...

chapter

User controlled object sharing between Java VM instances

Azden Bierbrauer, Konstantin Nasartschuk, Adam Richard, Kenneth B. Kent, et al.

2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM) > 1 - 6

2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM)

String objects, the most commonly used objects in Java programs, are immutable (read-only) and easily identified. Previous analysis of sharing string objects in the Java Virtual Machine showed promising results, however it is clear that sharing a wider set of objects would result in better performance. Automatic object selection for sharing is non-trivial, because in the current state, only read-only...

chapter

OctNet: Learning Deep 3D Representations at High Resolutions

Gernot Riegler, Ali Osman Ulusoy, Andreas Geiger

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 6620 - 6629

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We present OctNet, a representation for deep learning with sparse 3D data. In contrast to existing models, our representation enables 3D convolutional networks which are both deep and high resolution. Towards this goal, we exploit the sparsity in the input data to hierarchically partition the space using a set of unbalanced octrees where each leaf node stores a pooled feature representation. This...

chapter

Cache Partitioning + Loop Tiling: A Methodology for Effective Shared Cache Management

Vasilios Kelefouras, Georgios Keramidas, Nikolaos Voros

2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) > 477 - 482

2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

In this paper, we present a new methodology that provides i) a theoretical analysis of the two most commonly used approaches for effective shared cache management (i.e., cache partitioning and loop tiling) and ii) a unified framework to fine tuning those two mechanisms in tandem (not separately). Our approach manages to lower the number of main memory accesses by one order of magnitude keeping at...

chapter

Efficient lowest density MDS array codes of column distance 4

Zhijie Huang, Hong Jiang, Nong Xiao

2017 IEEE International Symposium on Information Theory (ISIT) > 834 - 838

2017 IEEE International Symposium on Information Theory (ISIT)

The extremely strict code length constraint is the main drawback of lowest density, maximum-distance separable (MDS) array codes of distance greater than 3. To break away from the status quo, we proposed in [5] a family of lowest density MDS array codes of (column) distance 4, called XI-Code. Compared with the previous alternatives, XI-Code has lower encoding and decoding complexities, and much looser...

chapter

Per-flow counting for big network data stream over sliding windows

You Zhou, Yian Zhou, Shigang Chen, Youlin Zhang

2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS) > 1 - 10

2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS)

Per-flow counting for big network data streams is a fundamental problem in various network applications such as traffic monitoring, load balancing, capacity planning, etc. Traditional research focused on designing compact data structures to estimate flow sizes from the beginning of the data stream (i.e., landmark window model). However, for many applications, the most recent elements of a stream are...

chapter

Anomaly detection of CAN bus messages through analysis of ID sequences

Mirco Marchetti, Dario Stabili

2017 IEEE Intelligent Vehicles Symposium (IV) > 1577 - 1583

2017 IEEE Intelligent Vehicles Symposium (IV)

This paper proposes a novel intrusion detection algorithm that aims to identify malicious CAN messages injected by attackers in the CAN bus of modern vehicles. The proposed algorithm identifies anomalies in the sequence of messages that flow in the CAN bus and is characterized by small memory and computational footprints, that make it applicable to current ECUs. Its detection performance are demonstrated...

chapter

Memory partitioning-based modulo scheduling for high-level synthesis

Tianyi Lu, Shouyi Yin, Xianqing Yao, Zhicong Xie, et al.

2017 IEEE International Symposium on Circuits and Systems (ISCAS) > 1 - 4

2017 IEEE International Symposium on Circuits and Systems (ISCAS)

High-Level Synthesis (HLS) has been widely recognized as an efficient compilation process targeting FPGAs for algorithm evaluation and product prototyping. However, the massively parallel memory access demands and the extremely expensive cost of single-bank memory with multi-port have impeded loop pipelining performance. Thus, based on an alternative multi-bank memory architecture, a joint approach...

chapter

Robust and efficient algorithms for storage and retrieval of disk based data structures

Kathiravan Srinivasan, Ravinder Kumar, Sahil Singla

2017 International Conference on Applied System Innovation (ICASI) > 934 - 937

2017 International Conference on Applied System Innovation (ICASI)

Data sets are often too immense to fit completely inside the computer's main memory and must instead reside on disk. If data set will be kept in main memory it will be very costly. A computer must retrieve required data and place it in internal memory to process it. Efficient data structures, like B-tree, B+ tree, are used to process large datasets. Nodes of these data structures are buffered in memory...

chapter

Elastic-Cache: GPU Cache Architecture for Efficient Fine- and Coarse-Grained Cache-Line Management

Bingchao Li, Jizhou Sun, Murali Annavaram, Nam Sung Kim

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 82 - 91

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

GPUs provide high-bandwidth/low-latency on-chip shared memory and L1 cache to efficiently service a large number of concurrent memory requests (to contiguous memory space). To support warp-wide accesses to L1 cache, GPU L1 cache lines are very wide. However, such L1 cache architecture cannot always be efficiently utilized when applications generate many memory requests with irregular access patterns...

chapter

Performance enhancement of apache APEX

Sushama A Shirke, Tushar Gosavi, Anurag Kumar Mishra, Vishal Vir Singh, et al.

2017 International conference of Electronics, Communication and Aerospace Technology (ICECA) > 1 > 660 - 663

2017 International conference of Electronics, Communication and Aerospace Technology (ICECA)

In a processing engine there is a continuous exchange of data between the computing nodes. The data that needs to be exchanged cannot be transferred in its normal form hence needs to be converted into normal bytes for faster execution. This is achieved by the process of Serialization. If there is any computation required to be performed on the data that is being transmitted we will have to de-serialize...

chapter

Full Compressed Affix Tree Representations

Rodrigo Canovas, Eric Rivals

2017 Data Compression Conference (DCC) > 102 - 111

2017 Data Compression Conference (DCC)

The Suffix Tree, a crucial and versatile data structure for string analysis of large texts, is often used in pattern matching and in bioinformatics applications. The Affix Tree generalizes the Suffix Tree in that it supports full tree functionalities in both search directions. The bottleneck of Affix Trees is their space requirement for storing the data structure. Here, we discuss existing representations...

chapter

A cache replacement policy based on re-reference count

Sreya Sreedharan, Shimmi Asokan

2017 International Conference on Inventive Communication and Computational Technologies (ICICCT) > 129 - 134

2017 International Conference on Inventive Communication and Computational Technologies (ICICCT)

The cache replacement policy is a major factor which determines the effectiveness of memory hierarchy. The replacement policy affects both the hit rate and the access latency of the cache. It decides the cache block to be replaced to give room for the incoming block. The replacement policy has to be chosen in such a way that the cache misses are reduced. Last level cache misses causes hundreds of...

chapter

Characterizing data organization effects on heterogeneous memory architectures

Apan Qasem, Ashwin M. Aji, Gregory Rodgers

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 160 - 170

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Layout and placement of shared data structures is critical to achieving scalable performance on heterogeneous memory architectures. While recent research has established the importance of data organization and developed mechanisms for data layout conversion, a general strategy for when to make layout changes and where to map data segments in a heterogeneous environment, has not yet emerged. In this...

chapter

Disturbance aware memory partitioning for parallel data access in STT-RAM

Shouyi Yin, Zhicong Xie, Shaojun Wei

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC)

Spin-transfer torque random access memory (STT-RAM) has been proposed to be an excellent candidate for substituting traditional memory due to its fascinating features such as high density and low power. Memory partitioning is an efficient strategy to overcome the obstacle of memory bandwidth limiting speed of parallel data access. However, the performance is unsatisfactory, while previous memory partitioning...

chapter

Tessellating memory space for parallel access

Juan Escobedo, Mingjie Lin

2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC) > 75 - 80

2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC)

Modern reconfigurable computing chips, such as FPGAs, offer an unprecedented opportunity to achieving both multifunctionality and real-time responsiveness for memory-intensive embedded applications. However, how to cost-effectively synthesize application-specific hardware constructs that fully exploit memory-level parallelism remains to be a key challenge. To address this problem, we propose a new...

INFONA - science communication portal
