The increasing demand for extracting value out of ever-growing data poses an ongoing challenge to system designers, a task only made trickier by the end of Dennard scaling. As the performance density of traditional CPU-centric architectures stagnates, advancing compute capabilities necessitates novel architectural approaches. Near-memory processing (NMP) architectures are reemerging as promising candidates...
This paper investigates compression for DRAM caches. As the capacity of DRAM cache is typically large, prior techniques on cache compression, which solely focus on improving cache capacity, provide only a marginal benefit. We show that more performance benefit can be obtained if the compression of the DRAM cache is tailored to provide higher bandwidth. If a DRAM cache can provide two compressed lines...
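The bandwidth argument above can be illustrated with a minimal sketch (purely hypothetical, not the paper's actual design): if two compressed lines fit within one 64-byte burst, a single DRAM access returns both lines, doubling effective bandwidth. The `pack_lines` helper and the use of `zlib` as the compressor are illustrative assumptions.

```python
import zlib

LINE_SIZE = 64  # bytes per cache line and per DRAM burst


def pack_lines(line_a: bytes, line_b: bytes):
    """Try to pack two compressed 64B lines into one 64B burst.

    Returns (burst_payload, split_offset) if both compressed lines
    fit in a single burst, else None (fall back to one line/access).
    """
    ca, cb = zlib.compress(line_a), zlib.compress(line_b)
    if len(ca) + len(cb) <= LINE_SIZE:
        return ca + cb, len(ca)
    return None


# Highly compressible neighbouring lines pack into one access.
a = bytes(64)      # all-zero line
b = b"\x01" * 64   # uniform line
packed = pack_lines(a, b)
```

When `pack_lines` returns `None`, a real design would simply fetch each line with its own access, so compressibility only ever helps.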
Caches are traditionally organized as a rigid hierarchy, with multiple levels of progressively larger and slower memories. Hierarchy allows a simple, fixed design to benefit a wide range of applications, since working sets settle at the smallest (i.e., fastest and most energy-efficient) level they fit in. However, rigid hierarchies also add overheads, because each level adds latency and energy even...
Heterogeneous memory management combined with server virtualization in datacenters is expected to increase the software and OS management complexity. State-of-the-art solutions rely exclusively on the hypervisor (VMM) for expensive page hotness tracking and migrations, limiting the benefits from heterogeneity. To address this, we design HeteroOS, a novel application-transparent OS-level solution for...
In this paper, an advanced joint jammer combining deceptive jamming and blanket jamming is proposed for countering linear frequency modulated (LFM) radar. Multiple leading and lagging false targets are produced by a convolution operation, and the blanket jamming is obtained by modulating the intercepted hostile radar signal according to pseudo-random sequences. The jamming algorithm is implemented on...
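The convolution-based false-target mechanism the abstract describes can be sketched as follows (a toy model, not the paper's implementation): convolving the intercepted pulse with a sparse impulse train yields one delayed, scaled copy — i.e., one false target — per impulse. The pulse parameters and delay/gain values are illustrative.

```python
import math


def lfm_pulse(n=64, bandwidth=0.25):
    """Toy linear-frequency-modulated pulse (real part only)."""
    return [math.cos(math.pi * bandwidth * t * t / n) for t in range(n)]


def false_targets(signal, delays, gains):
    """Convolve the intercepted pulse with a sparse impulse train.

    Each (delay, gain) impulse contributes one shifted, scaled copy
    of the pulse, which a matched filter sees as a false target.
    """
    out = [0.0] * (len(signal) + max(delays))
    for d, g in zip(delays, gains):
        for i, s in enumerate(signal):
            out[i + d] += g * s
    return out


# One true-range copy plus two lagging false targets.
jam = false_targets(lfm_pulse(), delays=[0, 20, 45], gains=[1.0, 0.8, 0.6])
```

Negative delays (leading false targets) would be handled the same way after shifting the time origin.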
Modern multi-core systems employ shared memory architecture, entailing problems related to the main memory such as row-buffer conflicts, time-varying hot-spots across memory channels, and superfluous switches between reads and writes originating from different cores. There have been proposals to solve these problems by partitioning main memory across banks and/or channels such that a DRAM bank is...
The VMware ESXi hypervisor attracts a wide range of customers and is deployed in domains ranging from desktop computing to server computing. While software systems are increasingly moving towards consolidation, hardware has already transitioned into multi-socket Non-Uniform Memory Access (NUMA)-based systems. The marriage of increasing consolidation and multi-socket based systems warrants low-overhead,...
As servers are equipped with more memory modules, each with larger capacity, main-memory systems are now the second-highest energy-consuming component in big-memory servers, and their energy consumption even becomes comparable to that of processors in some servers. It is therefore critical for big-memory servers and their main-memory systems to offer high energy efficiency. Prior work exploited mobile LPDDR...
Three-dimensional (3D)-stacking technology, which enables the integration of DRAM and logic dies, offers high bandwidth and low energy consumption. This technology also empowers new memory designs for executing tasks not traditionally associated with memories. A practical 3D-stacked memory is Hybrid Memory Cube (HMC), which provides significant access bandwidth and low power consumption in a small...
The reactive model of Software Defined Networking (SDN) invokes the controller to dynamically determine the behavior of a new flow without any pre-installed knowledge in the data plane. However, the reactive events raised by this flexible model consume substantial amounts of two bottleneck resources: the fast memory in switches and the bandwidth between the controller and switches. To address this problem, we propose SoftRing...
FPGAs are being incorporated into contemporary datacenters in order to improve computational capacity and reduce power consumption and processing latency. Efficiently integrating FPGAs in datacenters is, however, quite challenging. Ideally, smaller tasks could share a device and the cloud management layer would be able to partially reconfigure the device to allocate its free resources to incoming tasks. Moreover,...
Fault tolerance is one of the major design goals for HPC. The emergence of non-volatile memories (NVM) provides a way to build fault-tolerant HPC systems. Data in NVM-based main memory are not lost when the system crashes because of the non-volatile nature of NVM. However, because of volatile caches, data must be logged and explicitly flushed from caches into NVM to ensure consistency and correctness...
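The log-then-flush ordering the abstract alludes to can be sketched as a simple undo log (an illustrative model, not the paper's scheme). Here `flush` is a placeholder for a cache-line writeback and fence (e.g. `clwb` + `sfence` on real persistent-memory hardware); the class and method names are hypothetical.

```python
class UndoLogNVM:
    """Toy undo-log transaction on a dict standing in for NVM."""

    def __init__(self):
        self.data = {}   # "NVM" home locations
        self.log = []    # persistent undo log

    def flush(self, region):
        # Placeholder: on real hardware this would be clwb + sfence
        # to force dirty cache lines back to the NVM media.
        pass

    def tx_update(self, key, new_value):
        old = self.data.get(key)
        self.log.append((key, old))  # 1. write the undo record
        self.flush(self.log)         # 2. persist log BEFORE data
        self.data[key] = new_value   # 3. update in place
        self.flush(self.data)        # 4. persist the data
        self.log.pop()               # 5. commit: retire log entry


nvm = UndoLogNVM()
nvm.tx_update("x", 42)
```

The ordering in steps 2 and 4 is the whole point: if a crash hits between them, recovery replays the persisted undo record to roll the update back.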
The Network Functions Virtualization (NFV) paradigm offers network operators benefits in terms of cost efficiency, vendor independence, as well as flexibility and scalability. However, in order to profit most from these features, new challenges in the area of management and orchestration of the virtual network functions (VNFs) need to be addressed. In particular, this work deals with the VNF chain...
The increase in memory capacity lags substantially behind the increase in computing power in today's supercomputers. To alleviate the effect of this gap, diverse options such as NVM (non-volatile memory; less expensive but slow) and HBM (high-bandwidth memory; fast but expensive) are being explored. In this paper, we present a common approach using parallel runtime techniques for utilizing...
A comparison of the most mature and promising emerging memory technologies with respect to mainstream NAND and DRAM, and the challenges for their introduction to the market for high-density applications.
Some modern high-level synthesis (HLS) tools [1] permit the synthesis of multi-threaded software into parallel hardware, where concurrent software threads are realized as concurrently operating hardware units. A common performance bottleneck in any parallel implementation (whether it be hardware or software) is memory bandwidth — parallel threads demand concurrent access to memory resulting in contention...
The architecture of the Microsoft Catapult II cloud places the accelerator (FPGA) as a bump-in-the-wire on the way to the network and thus promises a dramatic reduction in latency as layers of hardware and software are avoided. We demonstrate this capability with an implementation of the 3D FFT. Next we examine phased application elasticity, i.e., the use of a reduced set of nodes for some phases...
Die-stacked DRAM (a.k.a., on-chip DRAM) provides much higher bandwidth and lower latency than off-chip DRAM. It is a promising technology to break the "memory wall". Die-stacked DRAM can be used either as a cache (i.e., DRAM cache) or as a part of memory (PoM). A DRAM cache design would suffer from more page faults than a PoM design as the DRAM cache cannot contribute towards capacity of...
Recently, architectures with scratchpad memory are gaining popularity. These architectures consist of low bandwidth, large capacity DRAM and high bandwidth, user addressable small capacity scratchpad. Existing algorithms must be redesigned to take advantage of the high bandwidth while overcoming the constraint on capacity of scratchpad. In this paper, we propose an optimized edge-centric graph processing...
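The edge-centric pattern the abstract describes can be illustrated with a toy sweep (an assumed formulation, not the paper's algorithm): the large vertex array stays in slow DRAM while edges stream through a scratchpad-sized block, turning random edge accesses into sequential, bandwidth-friendly scans. The block size and the single-source shortest-path update rule are illustrative choices.

```python
SCRATCHPAD_EDGES = 4  # pretend only 4 edges fit in the fast scratchpad


def edge_centric_sssp(num_vertices, edges, source):
    """Bellman-Ford-style relaxation over a streamed edge list."""
    dist = [float("inf")] * num_vertices  # vertex array lives in DRAM
    dist[source] = 0
    changed = True
    while changed:
        changed = False
        # Stream the edge list one scratchpad-sized block at a time.
        for i in range(0, len(edges), SCRATCHPAD_EDGES):
            for u, v, w in edges[i:i + SCRATCHPAD_EDGES]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    changed = True
    return dist


d = edge_centric_sssp(4, [(0, 1, 1), (1, 2, 2), (0, 3, 5), (2, 3, 1)], 0)
```

The sequential block scans are what let such designs saturate the scratchpad's high bandwidth despite its small capacity.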
The paper proposes a solution to the problem of load balancing and efficient use of distributed-system resources. The proposed method is based on calculating the load on the central processor, memory, and bandwidth of fractal information streams of different service classes, for each server and for the distributed system as a whole. The method allows calculating the imbalance of all...
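The per-server load and imbalance calculation can be sketched roughly as follows (an assumed formulation for illustration only; the weights and the max-over-mean imbalance metric are not taken from the paper):

```python
# Assumed weighting of the three resources named in the abstract.
WEIGHTS = {"cpu": 0.5, "mem": 0.3, "bw": 0.2}


def server_load(metrics):
    """Scalar load: weighted mix of CPU, memory and bandwidth use."""
    return sum(WEIGHTS[k] * metrics[k] for k in WEIGHTS)


def imbalance(servers):
    """Illustrative imbalance metric: max server load over mean load.

    1.0 means perfectly balanced; larger values mean one server
    carries proportionally more than its fair share.
    """
    loads = [server_load(m) for m in servers]
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


ratio = imbalance([
    {"cpu": 0.9, "mem": 0.6, "bw": 0.4},
    {"cpu": 0.2, "mem": 0.3, "bw": 0.1},
])
```

A balancer would migrate work away from the heaviest server whenever this ratio exceeds a chosen threshold.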