The increasing demand for extracting value out of ever-growing data poses an ongoing challenge to system designers, a task only made trickier by the end of Dennard scaling. As the performance density of traditional CPU-centric architectures stagnates, advancing compute capabilities necessitates novel architectural approaches. Near-memory processing (NMP) architectures are reemerging as promising candidates...
Heterogeneous memory management combined with server virtualization in datacenters is expected to increase the software and OS management complexity. State-of-the-art solutions rely exclusively on the hypervisor (VMM) for expensive page hotness tracking and migrations, limiting the benefits from heterogeneity. To address this, we design HeteroOS, a novel application-transparent OS-level solution for...
We have been experiencing two very important movements in computing. On the one hand, a tremendous amount of resources has been invested in innovative applications such as first-principles methods, deep learning, and cognitive computing. On the other hand, the industry has been taking a technological path where application performance and energy efficiency vary by more than two orders of magnitude...
The intense demand for video applications on mobile devices challenges hardware design, especially with respect to energy consumption. This work presents a design space exploration to define energy-efficient cache memory configurations for Motion Estimation (ME), considering different video sequences and HEVC encoder configurations. We focus on the ME process, known as the most...
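The kind of exploration described above can be illustrated with a toy energy model over candidate cache configurations; all constants and the `best_config` helper below are our own assumptions, not the paper's model.

```python
def cache_energy(size_kb, accesses, miss_rate,
                 e_hit_pj_per_kb=0.5, e_miss_pj=200.0):
    """Toy per-configuration energy estimate: hit energy grows with
    cache size, each miss pays a fixed off-chip access cost."""
    e_hit = e_hit_pj_per_kb * size_kb
    return accesses * (e_hit + miss_rate * e_miss_pj)

def best_config(configs, accesses):
    """configs: (size_kb, measured_miss_rate) pairs for one video
    sequence; pick the lowest-energy cache configuration."""
    return min(configs, key=lambda c: cache_energy(c[0], accesses, c[1]))
```

For instance, with `[(8, 0.2), (32, 0.05), (128, 0.01)]` the mid-sized cache wins: the small one pays for misses, the large one for per-access energy.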
This paper presents an efficient new snapshot mechanism, referred to as Live Save, that makes real-time backups of the VM state to the local host. Live Save iteratively sends the state data, stores the snapshot file on the local host, and sends the entire file directly to a remote host when necessary, saving significant bandwidth. We also develop an advanced Improved Live Save, which...
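The iterative copy-then-resend idea can be sketched as follows; this is a simplification under assumed names, while the real mechanism tracks dirty pages at the hypervisor level.

```python
def live_save(pages, dirty_rounds, max_rounds=3):
    """Iterative local snapshot sketch: copy every page once, then in
    each round re-copy only the pages dirtied since the last pass."""
    snapshot = dict(pages)           # full first pass
    transferred = len(pages)
    for r in range(max_rounds):
        dirty = dirty_rounds[r] if r < len(dirty_rounds) else set()
        if not dirty:                # converged: no pages dirtied
            break
        for p in dirty:
            snapshot[p] = pages[p]   # incremental re-copy
        transferred += len(dirty)
    return snapshot, transferred
```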
Three-dimensional (3D)-stacking technology, which enables the integration of DRAM and logic dies, offers high bandwidth and low energy consumption. This technology also empowers new memory designs for executing tasks not traditionally associated with memories. A practical 3D-stacked memory is Hybrid Memory Cube (HMC), which provides significant access bandwidth and low power consumption in a small...
This paper presents the design and test of a 2×2 channel emulator, optimized especially for LTE-Hi testing. It adopts the conventional method of convolving with the channel impulse response in the time domain. The design requires less memory by sharing the delay data buffer, which reduces the cost of an efficient VLSI implementation. In addition, the bidirectional design offers flexible and reliable...
The presented study analyses 563 representative benchmark sparse matrices with respect to their partitioning into uniformly-sized blocks. The aim is to minimize memory footprints of matrices. Different block sizes and different ways of storing blocks in memory are considered and statistically evaluated. Memory footprints of partitioned matrices are additionally compared with lower bounds and the CSR...
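As a rough illustration of the footprint comparison involved (with assumed 4-byte indices and 8-byte values; the study considers a wider range of block storage formats):

```python
def csr_footprint(n_rows, nnz, idx_bytes=4, val_bytes=8):
    """CSR stores values, column indices, and n_rows + 1 row pointers."""
    return nnz * (val_bytes + idx_bytes) + (n_rows + 1) * idx_bytes

def dense_block_footprint(coords, br, bc, idx_bytes=4, val_bytes=8):
    """Partition nonzeros (row, col) into br-by-bc blocks and store each
    nonempty block as a dense tile plus its block coordinates."""
    blocks = {(r // br, c // bc) for r, c in coords}
    return len(blocks) * (br * bc * val_bytes + 2 * idx_bytes)
```

For an 8×8 diagonal matrix, CSR needs 132 bytes while 2×2 dense blocks need 160: blocking only pays off when nonzeros cluster densely inside blocks.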
Iterative stencils are kernels in application domains such as numerical simulation and medical imaging that merit FPGA acceleration. The best architecture depends on many factors, such as the target platform, off-chip memory bandwidth, problem size, and performance requirements. We generate a family of FPGA stencil accelerators targeting emerging System-on-Chip platforms (e.g., Xilinx Zynq...
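A minimal member of this kernel class, for reference, is the 1-D 3-point Jacobi stencil (our own toy example, not one of the generated accelerators):

```python
def jacobi_1d(u, steps):
    """Iteratively average each interior point with its neighbours;
    boundary values are held fixed."""
    for _ in range(steps):
        u = [u[0]] + [(u[i - 1] + u[i] + u[i + 1]) / 3
                      for i in range(1, len(u) - 1)] + [u[-1]]
    return u
```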
Latency-critical workloads such as web search engines, social networks, and financial market applications are sensitive to tail latencies for meeting Service Level Objectives (SLOs). Since unexpected tail latencies are caused by sharing hardware resources with other co-executing workloads, a service provider executes the latency-critical workload alone. Thus, the data center for the latency-critical...
Most applications running on supercomputers achieve only a fraction of a system's peak performance. It has been demonstrated that the co-scheduling of applications can improve the overall system utilization. However, following this approach, applications need to fulfill certain criteria such that the mutual slowdown is kept at a minimum. In this paper, we present an HPC scheduler that applies co-scheduling...
The increase in memory capacity is substantially behind the increase in computing power in today's supercomputers. In order to alleviate the effect of this gap, diverse options such as NVM - non-volatile memory (less expensive but slow) and HBM - high bandwidth memory (fast but expensive) are being explored. In this paper, we present a common approach using parallel runtime techniques for utilizing...
A comparison of the most mature and promising emerging memory technologies with respect to mainstream NAND and DRAM, and the challenges for their introduction in the market for high-density applications.
As the SIMD width of modern microprocessors has been widening to keep up with the computational demands of HPC systems, vector architectures have recently returned to the spotlight. Meanwhile, a modern vector architecture that maintains a large SIMD width and a high B/F ratio has survived and evolved in the HPC community. In this paper, to clarify the potential of the modern vector architecture,...
Die-stacked DRAM (a.k.a., on-chip DRAM) provides much higher bandwidth and lower latency than off-chip DRAM. It is a promising technology to break the "memory wall". Die-stacked DRAM can be used either as a cache (i.e., DRAM cache) or as a part of memory (PoM). A DRAM cache design would suffer from more page faults than a PoM design as the DRAM cache cannot contribute towards capacity of...
This paper describes a pipelined stochastic gradient descent (SGD) algorithm and its hardware architecture with a distributed-memory structure. In the proposed architecture, a pipeline stage takes charge of multiple layers: a “layer block.” The layer-block-wise pipeline requires far fewer weight parameters per worker for network training than conventional multithreading, because the weight memory is distributed to workers...
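The layer-block idea, contiguous layers grouped per pipeline stage so each worker holds only its own slice of the weights, might be sketched like this (greedy balancing is our illustrative choice, not necessarily the paper's partitioning rule):

```python
def partition_layers(layer_sizes, n_stages):
    """Split layer indices into n_stages contiguous 'layer blocks',
    closing a block once it holds roughly an equal share of weights."""
    target = sum(layer_sizes) / n_stages
    blocks, current, acc = [], [], 0
    for i, size in enumerate(layer_sizes):
        current.append(i)
        acc += size
        if acc >= target and len(blocks) < n_stages - 1:
            blocks.append(current)   # this stage is full
            current, acc = [], 0
    if current:
        blocks.append(current)       # remaining layers form the last stage
    return blocks
```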
The trend toward heterogeneous computing and HW/SW co-design approaches allows performance to be increased significantly while reducing power consumption. One of the main challenges when combining multiple processing devices is communication, as an inefficient communication configuration can become a bottleneck for overall system performance. To address this problem, we present a methodology that...
We address the problem of optimizing global shared memory usage in deeply heterogeneous accelerators in the context of HPC systems running multiple applications with different quality-of-service levels. We explore predictive memory allocation algorithms, allowing up to 28% more high-priority requests to be served when using moving-average-based prediction in a low-workload scenario.
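A moving-average predictor of high-priority demand, in the spirit of the description above (the function names and the admission rule are our assumptions):

```python
def moving_average(history, window=4):
    """Predict next-interval demand as the mean of recent observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def admit_low_priority(request_mb, free_mb, history):
    """Grant a low-priority allocation only if it leaves headroom for
    the predicted high-priority demand."""
    return request_mb <= free_mb - moving_average(history)
```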
A novel, efficient, in-place, multithreaded, and cache-friendly parallel 2-D wavelet transform algorithm based on the lifting scheme is introduced. To maximize cache utilization and consequently minimize memory bus bandwidth usage, the threads compete to work on a small memory area, maximizing the chance of finding it in the cache, and their synchronization is done with very low overhead...
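The lifting scheme underlying the transform can be illustrated with one in-place level of the (unnormalized) Haar wavelet, the simplest predict/update lifting pair:

```python
def haar_lift_inplace(x):
    """One lifting level: odd samples become details (predict step),
    even samples become pair averages (update step), all in place."""
    assert len(x) % 2 == 0
    for i in range(0, len(x), 2):
        x[i + 1] -= x[i]          # predict: detail = odd - even
        x[i] += x[i + 1] / 2      # update: even -> average of the pair
    return x
```

Both steps overwrite the input buffer, which is what makes lifting attractive for the in-place, cache-friendly design described above.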
Memory interference is a critical impediment to system performance in CMP systems. To address this problem, we first propose a Dynamically Proportional Bandwidth Throttling policy (DPBT), which dynamically throttles back memory-intensive applications based on their memory access behavior. DPBT achieves a more balanced memory bandwidth partitioning. Moreover, we improve the previous memory channel partitioning...
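The proportional-throttling idea can be sketched as follows (an illustrative policy, not the paper's exact DPBT algorithm): applications at or below their fair share keep their bandwidth, and memory-intensive ones split what remains in proportion to their demand.

```python
def proportional_throttle(rates, budget):
    """rates: per-application demanded bandwidth; returns throttled
    shares that sum to at most `budget`."""
    if sum(rates) <= budget:
        return list(rates)                    # no contention, no throttling
    fair = budget / len(rates)
    light_sum = sum(r for r in rates if r <= fair)
    heavy_sum = sum(r for r in rates if r > fair)
    scale = (budget - light_sum) / heavy_sum  # throttle only heavy apps
    return [r if r <= fair else r * scale for r in rates]
```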