Search results

chapter

Exploring the impact of memory block permutation on performance of a crossbar ReRAM main memory

Morteza Ramezani, Nima Elyasi, Mohammad Arjomand, Mahmut T. Kandemir, et al.

2017 IEEE International Symposium on Workload Characterization (IISWC), pp. 167-176

Owing to the advantages of low standby power and high scalability, ReRAM technology is considered as a promising replacement for conventional DRAM in future manycore systems. In order to make ReRAM highly scalable, the memory array has to have a crossbar array structure, which needs a specific access mechanism for activating a row of memory when reading/writing a data block from/to it. This type of...

chapter

Tile size selection for optimized memory reuse in high-level synthesis

Junyi Liu, John Wickerson, George A. Constantinides

2017 27th International Conference on Field Programmable Logic and Applications (FPL), pp. 1-8

High-level synthesis (HLS) is well capable of generating control and computation circuits for FPGA accelerators, but still requires sufficient human effort to tackle the challenge of memory and communication bottlenecks. One important approach for improving data locality is to apply loop tiling on memory-intensive loops. Loop tiling is a well-known compiler technique that partitions the iteration...

chapter

Automated generation of banked memory architectures in the high-level synthesis of multi-threaded software

Yu Ting Chen, Jason H. Anderson

2017 27th International Conference on Field Programmable Logic and Applications (FPL), pp. 1-8

Some modern high-level synthesis (HLS) tools [1] permit the synthesis of multi-threaded software into parallel hardware, where concurrent software threads are realized as concurrently operating hardware units. A common performance bottleneck in any parallel implementation (whether it be hardware or software) is memory bandwidth — parallel threads demand concurrent access to memory resulting in contention...

chapter

Adapting an industrial memory BIST solution for testing CAMs

Jais Abraham, Uttam Garg, Glenn Colon-Bonet, Ramesh Sharma, et al.

2017 International Test Conference in Asia (ITC-Asia), pp. 112-117

Content Addressable Memories (CAMs) have found widespread use in applications that require high speed search capabilities. Each cell in the CAM array is associated with a storage unit and a comparator logic. Due to the various customized features in the CAM implementations, creation of an automated BIST solution for testing them has presented unique challenges. This paper shows that, with suitable...

chapter

Large Scale Data Clustering Using Memristive k-Median Computation

Yomi Karthik Rupesh, Mahdi Nazm Bojnordi

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), p. 374

Clustering is a crucial tool for analyzing data in virtually every scientific and engineering discipline. The U.S. National Academy of Sciences (NAS) has recently announced "the seven giants of statistical data analysis" in which data clustering plays a central role [1]. This research also emphasizes that more scalable solutions are required to enable time and space clustering for the future...

chapter

Interleaved logic-in-memory architecture for energy-efficient fine-grained data processing

Kai Yang, Robert Karam, Swarup Bhunia

2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 409-412

For a growing pool of data-intensive applications, data transfer, rather than processing speed, has emerged as the major bottleneck to performance and energy scalability. In this paper, we propose a novel interleaved logic-in-memory architecture, referred to as MISK, which leverages fine-grained integration of logic functions within dense, 2-D static random-access memory (SRAM) arrays for in-situ...

chapter

Energy-efficient SQL query exploiting RRAM-based process-in-memory structure

Yuliang Sun, Yu Wang, Huazhong Yang

2017 IEEE 6th Non-Volatile Memory Systems and Applications Symposium (NVMSA), pp. 1-6

With the coming of ‘Big Data’ era, high-energy-efficiency database is demanded for the Internet of things (IoT) application scenarios. The emerging Resistive Random Access Memory (RRAM) has been considered as an energy-efficient replacement of DRAM for next-generation main memory. In this paper, we propose an RRAM-based SQL query unit with process-in-memory characteristic. A storage structure for...

chapter

A high performance multi-port SRAM for low voltage shared memory systems in 32 nm CMOS

Samira Ataei, Matthew Gaalswyk, James E. Stine

2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1236-1239

A 4 kb fully differential 8-port SRAM bitcell array (6 read ports and 2 write ports) is presented in this paper. This 8-port SRAM provides simultaneous access, high system throughput and a great read static noise margin by isolating the read ports from storage nodes. At 0.4 V supply voltage, designed 8-port SRAM bitcell shows 123, 137 and 123 mV static noise margin during read, write and standby modes,...

chapter

A Cloud System for Machine Learning Exploiting a Parallel Array DBMS

Yiqun Zhang, Carlos Ordonez, Lennart Johnsson

2017 28th International Workshop on Database and Expert Systems Applications (DEXA), pp. 22-26

Computing machine learning models in the cloud remains a central problem in big data analytics. In this work, we introduce a cloud analytic system exploiting a parallel array DBMS based on a classical shared-nothing architecture. Our approach combines in-DBMS data summarization with mathematical processing in an external program. We study how to summarize a data set in parallel assuming a large number...

chapter

A novel ReRAM-based processing-in-memory architecture for graph computing

Lei Han, Zhaoyan Shen, Zili Shao, H. Howie Huang, et al.

2017 IEEE 6th Non-Volatile Memory Systems and Applications Symposium (NVMSA), pp. 1-6

Graph algorithms such as breadth-first search (BFS) have been gaining ever-increasing importance in the era of Big Data. However, the memory bandwidth remains the key performance bottleneck for graph processing. To address this problem, we utilize processing-in-memory (PIM), combined with non-volatile metal-oxide resistive random access memory (ReRAM), to improve the performance of both computation...

chapter

A novel SRAM — STT-MRAM hybrid cache implementation improving cache performance

Odilia Coi, Guillaume Patrigeon, Sophiane Senni, Lionel Torres, et al.

2017 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH), pp. 39-44

Memories are currently a real bottleneck to design high speed and energy-efficient systems-on-chip. A significant increase of the performance gap between processors and memories is observed. On the other hand, an important proportion of total power is spent on memory systems due to the increasing trend of embedding volatile memory into systems-on-chip. For these reasons, STT-MRAM (Spin-Transfer Torque...

chapter

Architecting large-scale SRAM arrays with monolithic 3D integration

Joonho Kong, Young-Ho Gong, Sung Woo Chung

2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), pp. 1-6

In this paper, we architect large-scale SRAM arrays with monolithic 3D (M3D) integration technology. We introduce M3D-based SRAM arrays with three different ways of integration: M3D-R (vertical routing-only), M3D-VBL (vertical bitline), and M3D-VWL (vertical wordline). We also apply M3D-based SRAM arrays to last-level caches: tag arrays for eDRAM LLCs and data arrays for SRAM LLCs. The proposed LLCs...

chapter

A 31.2pJ/disparity·pixel stereo matching processor with stereo SRAM for mobile UI application

Jinsu Lee, Dongjoo Shin, Kyuho Lee, Hoi-Jun Yoo

2017 Symposium on VLSI Circuits, pp. C158-C159

An energy-efficient and high-speed stereo matching processor is proposed for smart mobile devices with proposed stereo SRAM (S-SRAM) and independent regional integral cost (IRIC). Cost generation unit (CGU) with the proposed S-SRAM reduces 63.2% of CGU power consumption. The proposed IRIC enables cost aggregation unit (CAU) to obtain 6.4× of speed and 12.3% of the power reduction of CAU with pipelined...

chapter

Architecture of an FPGA accelerator for LDA-based inference

Taisuke Ono, Hasitha Muthumala Waidyasooriya, Masanori Hariyama, Tsukasa Ishigaki

2017 18th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 357-362

Latent Dirichlet allocation (LDA) based topic inference is a data classification method, that is used efficiently for extremely large data sets. However, the processing time is very large due to the serial computational behavior of the Markov Chain Monte Carlo method used for the topic inference. We propose a pipelined hardware architecture and memory allocation scheme to accelerate LDA using parallel...

chapter

Power Efficient Sharing-Aware GPU Data Management

Abdulaziz Tabbakh, Murali Annavaram, Xuehai Qian

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 698-707

The power consumed by memory system in GPUs is a significant fraction of the total chip power. As thread level parallelism increases, GPUs are likely to stress cache and memory bandwidth even more, thereby exacerbating power consumption. We observe that neighboring concurrent thread arrays (CTAs) within GPU applications share considerable amount of data. However, the default GPU scheduling policy...

chapter

AnalyzeThat: A Programmable Shared-Memory System for an Array of Processing-In-Memory Devices

Sangkuen Lee, Hyogi Sim, Youngjae Kim, Sudharshan S. Vazhkudai

2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pp. 619-624

Processing In Memory (PIM), the concept of integrating processing directly with memory, has been attracting a lot of attention since PIM can assist in overcoming the throughput limitation caused by data movement between CPU and memory. The challenge, however, is that it requires the programmers to have a deep understanding of the PIM architecture to maximize the benefits such as data locality and...

chapter

Performance Evaluation of Scale-Free Graph Algorithms in Low Latency Non-volatile Memory

Manu Shantharam, Keita Iwabuchi, Pietro Cicotti, Laura Carrington, et al.

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1021-1028

The purpose of this study is to quantitatively assess the performance of graph processing algorithms for large scale-free graphs residing in byte-addressable Non-Volatile Memory (NVM). Our study focuses on static and dynamic graph algorithms previously optimized for external memory in the form of locally attached NAND Flash arrays, with data structures tuned to maximize locality. The evaluation is...

chapter

Breadth-First Search with A Multi-Core Computer

Maryia Belova, Ming Ouyang

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 579-587

Breadth-first search is a building block of many graph algorithms. Because BFS is memory-bound, parallelizing BFS on a multi-core computer must consider issues of data hazards, effects of atomic operations on memory throughput, and the size of the last level cache. Additionally, graph algorithms must cope with non-sequential memory access, which defeats cache prefetching and leads to a high cache...

chapter

An Adjacent-Line-Merging Writeback Scheme for STT-RAM last-level caches

Masayuki Sato, Zentaro Sakai, Ryusuke Egawa, Hiroaki Kobayashi

2017 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), pp. 1-2

Spin-Transfer Torque RAM (STT-RAM) has a higher density than SRAM and non-volatility, and is expected to be used as the last-level cache (LLC) of a microprocessor. One technical issue is that, since the energy cost of write access requests for an STT-RAM LLC is expensive, the total energy consumption of the STT-RAM LLC may increase for some write-intensive applications. Therefore, this paper proposes...

chapter

Protect non-volatile memory from wear-out attack based on timing difference of row buffer hit/miss

Haiyu Mao, Xian Zhang, Guangyu Sun, Jiwu Shu

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1623-1626

Non-volatile Memories (NVMs), such as PCM and ReRAM, have been widely proposed for future main memory design because of their low standby power, high storage density, fast access speed. However, these NVMs suffer from the write endurance problem. In order to prevent a malicious program from wearing out NVMs deliberately, researchers have proposed various wear-leveling methods, which remap logical...

INFONA - science communication portal
