The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
In this paper, we evaluate the error criticality of radiation-induced errors on modern High-Performance Computing~(HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a dedicated set of metrics. We show that, as long as imprecise computing is concerned, the simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis...
GPU adoption for general purpose computing hasbeen accelerating. To support a large number of concurrentlyactive threads, GPUs are provisioned with a very large registerfile (RF). The RF power consumption is a critical concern. Oneoption to reduce the power consumption dramatically is touse near-threshold voltage(NTV) to operate the RF. However, operating MOSFET devices at NTV is fraught with stabilityand...
Modern solid state drives (SSDs) unnecessarily confine host programs to the conventional block I/O interface, leading to suboptimal performance and resource under-utilization. Recent attempts to replace or extend this interface with a key-value-oriented interface and/or built-in support for transactions offer some improvements, but the details of their implementations make them a poor match for many...
Previous work has demonstrated that systems with unencrypted DRAM interfaces are susceptible to cold boot attacks – where the DRAM in a system is frozen to give it sufficient retention time and is then re-read after reboot, or is transferred to an attacker's machine for extracting sensitive data. This method has been shown to be an effective attack vector for extracting disk encryption keys out of...
Path ORAM (Oblivious RAM) is a recently proposed ORAM protocol for preventing information leakage from memory access sequences. It receives wide adoption due to its simplicity, practical efficiency and asymptotic efficiency. However, Path ORAM has extremely large memory bandwidth demand, leading to severe memory competition in server settings, e.g., a server may service one application that uses Path...
In future performance improvement of the basic building block of supercomputers has to come through increased integration enabled by 3D (vertical) and 2.5D (horizontal) die-stacking. But to take advantage of this integration we need an interconnection network between the memory and compute die that not only can provide an order of magnitude higher bandwidth but also consume an order of magnitude less...
This paper presents SecMC, a secure memory controller that provides efficient memory scheduling with a strong quantitative security guarantee against timing channel attacks. The first variant, named SecMC-NI, eliminates timing channels while allowing a tight memory schedule by interleaving memory requests that access different banks or ranks. Experimental results show that SecMC-NI significantly (45%...
Today's caches tightly couple data with metadata (Address Tags) at the cache line granularity. The co-location of data and its identifying metadata means that they require multiple approaches to locate data (associative way searches and level-by-level searches), evict data (coherent writebacks buffers and associative level-by-level searches) and keep data coherent (directory indirections and associative...
Performance isolation is an important goal in server-class environments. Partitioning the last-level cache of a chip multiprocessor (CMP) across co-running applications has proven useful in this regard. Two popular approaches are (a) hardware support for way partitioning, or (b) operating system support for set partitioning through page coloring. Unfortunately, neither approach by itself is scalable...
Technology scaling has continuously improved the density, performance, energy efficiency, and cost of DRAM-based main memory systems. Starting from sub-20nm processes, however, the industry began to pay considerably higher costs to screen and manage notably increasing defective cells. The traditional technique, which replaces the rows/columns containing faulty cells with spare rows/columns, has been...
Modern NAND flash memory chips provide high density by storing two bits of data in each flash cell, called a multi-level cell (MLC). An MLC partitions the threshold voltage range of a flash cell into four voltage states. When a flash cell is programmed, a high voltage is applied to the cell. Due to parasitic capacitance coupling between flash cells that are physically close to each other, flash cell...
This paper proposes an energy-efficient, high-throughput DRAM architecture for GPUs and throughput processors. In these systems, requests from thousands of concurrent threads compete for a limited number of DRAM row buffers. As a result, only a fraction of the data fetched into a row buffer is used, leading to significant energy overheads. Our proposed DRAM architecture exploits the hierarchical organization...
The increasingly-stringent power and energy requirements of emerging embedded applications have led to a strong recent interest in aggressive power gating techniques. Conventional techniques for aggressive power gating perform module-based power gating in processors, where power domains correspond to RTL modules. We observe that there can be significant power benefits from module-oblivious power gating,...
Defining a processor micro-architecture for a targeted productspace involves multi-dimensional optimization across performance, power and reliability axes. A key decision in sucha definition process is the circuit-and technology-driven parameterof the nominal (voltage, frequency) operating point. This is a challenging task, since optimizing individually orpair-wise amongst these metrics usually results...
Dynamic parallelism (DP) is a promising feature for GPUs, which allows on-demand spawning of kernels on the GPU without any CPU intervention. However, this feature has two major drawbacks. First, the launching of GPU kernels can incur significant performance penalties. Second, dynamically-generated kernels are not always able to efficiently utilize the GPU cores due to hardware-limits. To address...
Brain-inspired hyperdimensional (HD) computing emulates cognition tasks by computing with hypervectors as an alternative to computing with numbers. At its very core, HD computing is about manipulating and comparing large patterns, stored in memory as hypervectors: the input symbols are mapped to a hypervector and an associative search is performed for reasoning and classification. For every classification...
DRAM is the primary technology used for main memory in modern systems. Unfortunately, as DRAM scales down to smaller technology nodes, it faces key challenges in both data integrity and latency, which strongly affects overall system reliability and performance. To develop reliable and high-performance DRAM-based main memory in future systems, it is critical to characterize, understand, and analyze...
Server workloads benefit from execution on many-core processors due to their massive request-level parallelism. A key characteristic of server workloads is the large instruction footprints. While a shared last-level cache (LLC) captures the footprints, it necessitates a low-latency network-on-chip (NOC) to minimize the core stall time on accesses serviced by the LLC. As strict quality-of-service requirements...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.