Given the complexity of modern integrated circuits, design reuse is essential, but current hardware description languages do not adequately address reuse challenges for many classes of design. Processor cores, as an example, are shaped by cycle-level interactions, and leveraging such designs into environments with different timing constraints requires retiming, repipelining, and microarchitectural...
Approximate computing is an emerging design paradigm targeting error-tolerant applications. The area, delay, and power consumption of a circuit can be improved by sacrificing a reasonable amount of accuracy. Approximate logic synthesis (ALS) aims to automatically synthesize an approximate circuit for a given target circuit. In this paper, we propose to approximate a target function by a maximally...
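The accuracy-for-cost trade-off can be made concrete with a small sketch. It uses a hypothetical approximate adder (`approx_add`, an illustrative assumption and not the ALS method proposed in the paper) that ORs the low `k` bits instead of propagating a carry through them. It then measures two common error metrics, error rate and mean error distance, by exhaustively simulating every input pair.

```python
from itertools import product


def exact_add(a: int, b: int) -> int:
    """Reference (exact) adder."""
    return a + b


def approx_add(a: int, b: int, k: int) -> int:
    """Hypothetical approximate adder, for illustration only.

    The low k bits are computed with a bitwise OR (no carry chain), the
    high bits are added exactly, and no carry passes from the low part
    into the high part.
    """
    mask = (1 << k) - 1
    low = (a | b) & mask
    high = ((a >> k) + (b >> k)) << k
    return high | low  # high has its low k bits cleared, so OR == add here


def error_metrics(width: int, k: int) -> tuple[float, float]:
    """Exhaustively compare approx_add against exact_add for all width-bit inputs.

    Returns (error_rate, mean_error_distance).
    """
    n = 1 << width
    wrong = 0
    total_distance = 0
    for a, b in product(range(n), repeat=2):
        distance = abs(exact_add(a, b) - approx_add(a, b, k))
        if distance:
            wrong += 1
        total_distance += distance
    pairs = n * n
    return wrong / pairs, total_distance / pairs


if __name__ == "__main__":
    for k in range(4):
        rate, med = error_metrics(width=4, k=k)
        print(f"k={k}: error rate={rate:.4f}, mean error distance={med:.4f}")
```

With 4-bit operands and `k=2`, 7 of every 16 input pairs produce a wrong sum, yet the mean error distance is only 0.75; in exchange, the carry chain through the two low-order bits disappears. Raising `k` removes more logic and increases both metrics, which is the kind of trade-off an ALS tool has to navigate automatically.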
We introduce M2S-CGM, a detailed architectural simulator that models the interactions between CPUs and GPUs operating in coherent heterogeneous compute environments. M2S-CGM extends an existing, established x86 CPU model and a Southern Islands GPU model, adds a new custom-built memory system model and switching fabric called CGM, and incorporates a well-known SDRAM model. The CGM memory system simulator...
This work presents a method to characterize heterogeneous hardware/software systems mapped onto Configurable SoCs (CSoC) in situ, where in situ implies that the CSoC being characterized is also the final target platform. The result of our proposed method is a trade-off curve of different configurations with unique area vs. performance characteristics, each of which uses a different micro-architecture...
Power and thermal limitations make it impossible to run all cores on a multicore system at their maximum frequency. Therefore, modern systems require careful power management. These systems must manage complex tradeoffs between energy, power, and frequency, choosing which cores to accelerate to achieve good performance while maintaining energy efficiency or operating under a power budget. Navigating...
General-purpose cache-coherent many-core server processors are usually designed with a per-core private cache hierarchy and a large, shared, multi-banked last-level cache (LLC). The round-trip latency and the volume of traffic on the on-die interconnect between the per-core private cache hierarchies and the shared LLC banks can be significant. As a result, optimized private caching is...
This paper presents a practical method for reducing timing uncertainty due to thermal noise in a ring oscillator. The methodology uses delay elements whose non-linear behavior depends on event separation, i.e., the period between successive events. Pulse logic gates are shown to have delay-separation dynamics that can affect the statistics of subsequent events in the oscillators. The slope of the...
Although prefetching concepts have been proposed for decades, sophisticated system architectures and emerging applications introduce new challenges. Large instruction windows coupled with out-of-order execution distort the program's data access sequence as seen by the cache. Big data applications stress memory subsystems heavily with their large working set sizes and complex data access...
This paper presents a spin-orbit torque magnetic random access memory (SOT-MRAM) using a perpendicular-anisotropy magnetic tunnel junction (p-MTJ). Unlike conventional p-MTJ-based SOT-MRAMs, which need an external magnetic field to achieve deterministic switching, the proposed cell uses a spin-transfer torque (STT) current, and we show that the cell needs only two access transistors. This can...
In this paper, we propose a novel Spin-Transfer Torque Magnetic Random-Access Memory (STT-MRAM) array design that can simultaneously work as non-volatile memory and implement reconfigurable in-memory logic operations without adding logic circuits to the memory chip. The computed output can simply be read out like a typical MRAM bit-cell through the modified peripheral circuit. Such intrinsic in-memory...
Convolutional Neural Networks (CNNs) have improved dramatically in recent years, surpassing human accuracy on certain problems and outperforming traditional computer vision algorithms. While the compute pattern itself is relatively simple, significant compute and memory challenges remain, as CNNs may contain millions of floating-point parameters and require billions of floating-point operations...
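The scale of those numbers follows from simple arithmetic. For a convolutional layer with a k×k kernel, C_in input channels, C_out output channels, and an H_out×W_out output map, the weight count is k·k·C_in·C_out (plus C_out biases) and the number of multiply-accumulate operations (MACs) is H_out·W_out·k·k·C_in·C_out. The sketch evaluates these standard formulas; the function name and the layer shapes are hypothetical examples, and counting one MAC as two floating-point operations is a common convention assumed here.

```python
def conv2d_cost(h_out: int, w_out: int, c_in: int, c_out: int, k: int,
                bias: bool = True) -> tuple[int, int]:
    """Return (parameter count, MAC count) for one dense k x k convolution layer.

    Assumes a standard (non-grouped, non-depthwise) convolution: every output
    element performs k * k * c_in multiply-accumulates.
    """
    weights = k * k * c_in * c_out
    params = weights + (c_out if bias else 0)
    macs = h_out * w_out * weights
    return params, macs


if __name__ == "__main__":
    # Hypothetical layer shapes chosen for illustration.
    for shape in [(224, 224, 3, 64, 3), (56, 56, 256, 256, 3)]:
        params, macs = conv2d_cost(*shape)
        # Convention (assumed): 1 MAC = 1 multiply + 1 add = 2 FLOPs.
        print(f"{shape}: {params:,} params, {macs:,} MACs, ~{2 * macs:,} FLOPs")
```

A single 3×3 layer with 256 input and 256 output channels on a 56×56 map already has about 0.59 million parameters and about 1.85 billion MACs (roughly 3.7 billion FLOPs), so a network with a dozen such layers quickly reaches millions of parameters and billions of operations.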
The emergence of monolithic 3D (M3D) integration has opened up the possibility of designing ultra-low-power, high-performance circuits and systems. The smaller dimensions of monolithic inter-tier vias (MIVs) offer high-density integration, the flexibility of partitioning logic blocks across multiple tiers, and significantly reduced total wirelength. In this work, we explore the design space of...
Mixed-Criticality Systems (MCS) are real-time systems characterized by two or more distinct levels of criticality. In MCS, it is imperative that high-criticality flows meet their deadlines, while low-criticality flows can tolerate some delay. Sharing resources between flows in a Network-on-Chip (NoC) can lead to unpredictable latencies and subsequently complicate the implementation of MCS in many-core...
Linear algebra operations are at the heart of scientific computing solvers, machine learning and artificial intelligence. In this paper, LACore, a novel, programmable accelerator architecture for general-purpose linear algebra applications, is presented. LACore enables many of the architectural features typically available in custom supercomputing machines in an accelerator form factor that can be...
Modern file systems employ journaling techniques to guarantee data consistency in case of unexpected system crashes or power failures. However, journaling file systems usually suffer performance degradation due to the extra journal writes. Moreover, emerging non-volatile memory technologies (NVMs) have the potential to improve the performance of journaling file systems by being deployed...
Byte-addressable non-volatile memory (NVM) offers fast, fine-grained access to persistent storage. While DRAM and NVM have similar read performance, write operations on existing NVM materials incur longer latency and lower bandwidth than on DRAM. This read-write asymmetry of NVM causes two bottlenecks for accessing read- and write-intensive file data: expensive data block lookups via file...
Microfluidic routing fabrics, or crossbars, based on transposer primitives provide benefits in manufacturability, performance, and on-the-fly reconfigurability. Many applications in microfluidics, such as DNA barcoding for single-cell analysis, are expected to benefit from these new devices. However, the control of these critical devices poses new security questions that may impact the functional...
The hardware prefetcher is an essential component of modern processors; it boosts system performance by fetching data before the processor demands it. Hardware prefetching techniques have been proposed to exploit various kinds of access patterns. However, some applications that have emerged in the past decade are highly irregular in nature and have a massive memory footprint...
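To illustrate what exploiting an access pattern means, here is a minimal sketch of a PC-indexed stride prefetcher, one of the simplest classic pattern-based designs. It is not the technique proposed in this paper; the class name, the `degree` parameter, and the table layout are illustrative assumptions. It also shows why irregular applications are hard to serve: a pointer-chasing trace never repeats a stride, so nothing is ever prefetched.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0
    confident: bool = False


class StridePrefetcher:
    """Illustrative PC-indexed stride prefetcher (unbounded table, no eviction)."""

    def __init__(self, degree: int = 2):
        self.degree = degree  # how many strides ahead to prefetch
        self.table: dict[int, StrideEntry] = {}

    def access(self, pc: int, addr: int) -> list[int]:
        """Record a demand access and return the addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            # First access from this load instruction: just remember it.
            self.table[pc] = StrideEntry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        # Trust a stride only after seeing the same non-zero value twice in a row.
        entry.confident = stride != 0 and stride == entry.stride
        entry.stride = stride
        entry.last_addr = addr
        if entry.confident:
            return [addr + stride * i for i in range(1, self.degree + 1)]
        return []


if __name__ == "__main__":
    pf = StridePrefetcher(degree=2)
    for addr in (100, 164, 228, 292):  # regular array walk, 64-byte stride
        print(hex(0x400), addr, pf.access(0x400, addr))
    for addr in (100, 5000, 40, 9000):  # irregular, pointer-chasing-like trace
        print(hex(0x500), addr, pf.access(0x500, addr))
```

The regular trace starts producing prefetches on its third access, while the irregular trace produces none, which is the gap that prefetchers for irregular, large-footprint workloads try to close.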
Network-connected embedded systems require multiple lines of defense against malware. In addition to preventing malware by designing secure interfaces and software, anomaly-based detection is needed to detect malware that successfully infiltrates these defenses. Timing based anomaly detection strengthens embedded system security by detecting anomalies in the execution time of critical software tasks...
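The basic idea of timing-based detection can be sketched in a few lines. This is a minimal illustration, assuming a simple statistical band (mean ± k standard deviations) learned from known-good runs of each task; the excerpt does not describe the paper's actual timing model, and the class name, task names, and the threshold `k` are hypothetical.

```python
import statistics


class TimingAnomalyDetector:
    """Flags task executions whose duration falls outside a band learned
    from known-good runs (mean +/- k standard deviations). Illustrative only."""

    def __init__(self, k: float = 3.0):
        self.k = k
        self.bounds: dict[str, tuple[float, float]] = {}

    def train(self, task: str, durations_us: list[float]) -> None:
        """Learn the normal execution-time band for one task (needs >= 2 samples)."""
        mu = statistics.fmean(durations_us)
        sigma = statistics.stdev(durations_us)
        self.bounds[task] = (mu - self.k * sigma, mu + self.k * sigma)

    def is_anomalous(self, task: str, duration_us: float) -> bool:
        """Return True if the observed duration lies outside the learned band."""
        lo, hi = self.bounds[task]
        return not (lo <= duration_us <= hi)


if __name__ == "__main__":
    det = TimingAnomalyDetector(k=3.0)
    det.train("sensor_read", [100.0, 102.0, 98.0, 101.0, 99.0])
    for t in (101.0, 130.0):
        print(t, det.is_anomalous("sensor_read", t))
```

Malicious code injected into a critical task typically adds work and stretches its execution time, so a duration such as 130 µs falls outside the band learned from normal runs and is flagged. Real detectors must also account for legitimate timing variation (caches, interrupts, input-dependent paths), which a fixed band like this handles only crudely.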