PHP is the dominant server-side scripting language used to implement dynamic web content. Just-in-time compilation, as implemented in Facebook's state-of-the-art HipHopVM, helps mitigate the poor performance of PHP, but substantial overheads remain, especially for realistic, large-scale PHP applications. This paper analyzes such applications and shows that there is little opportunity for conventional...
In today's high-performance computing (HPC) environments, analyzing and predicting the performance of multiple-processor systems (cluster cores) on critical workloads remains a challenge. This is a result of the many key metrics that influence a system's behavior. Bursty arrivals in HPC systems demand either a shared-memory parallel architecture or a pipelined dataflow architecture. At present, a processor model...
Due to technology scaling, which means smaller transistors, lower voltages and more aggressive clock frequencies, VLSI devices are becoming more susceptible to soft errors. Especially for devices deployed in safety- and mission-critical applications, dependability and reliability are becoming increasingly important constraints during the development of systems built on or around them. Other phenomena...
Semiconductor design houses are increasingly dependent on third-party vendors to procure intellectual property (IP) and meet time-to-market constraints. However, these third-party IPs cannot be trusted, as hardware Trojans can be maliciously inserted into them by untrusted vendors. While different approaches have been proposed to detect Trojans in third-party IPs, their limitations have not...
The HPC interconnect is a crucial component of any HPC machine, and its performance is one of the contributing factors to the overall performance of an HPC system. The most popular interface for connecting a Network Interface Card (NIC) to the CPU is PCI Express (PCIe). With denser core counts in compute servers and increasingly mature fabric interconnect speeds, there is a need to maximize the packet data movement...
As the memory and storage hierarchy gets deeper and more complex, it is important to have new benchmarks and evaluation tools that allow us to explore the emerging middleware solutions that use this hierarchy. Skel is a tool aimed at automating and refining the process of studying HPC I/O performance. It works by generating application I/O kernels/benchmarks as determined by a domain-specific model....
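As a rough illustration of the generate-from-model idea, the sketch below emits a runnable I/O benchmark from a small dictionary describing an application's write pattern; the model schema and all names are hypothetical, not Skel's actual format.

    import textwrap

    model = {
        "app": "demo_app",        # output file prefix (hypothetical)
        "block_bytes": 1 << 20,   # bytes written per step
        "steps": 10,              # number of output timesteps
    }

    def generate_kernel(m):
        """Emit Python source for an I/O benchmark matching the model."""
        return textwrap.dedent(f"""\
            import os, time
            data = os.urandom({m['block_bytes']})
            start = time.perf_counter()
            for step in range({m['steps']}):
                with open(f"{m['app']}_step{{step}}.out", "wb") as out:
                    out.write(data)
                    out.flush()
                    os.fsync(out.fileno())
            elapsed = time.perf_counter() - start
            mb = {m['steps'] * m['block_bytes']} / 1e6
            print(f"wrote {{mb:.1f}} MB in {{elapsed:.3f}} s")
        """)

    # The generated source can be written to a file or exec()'d directly.
    print(generate_kernel(model))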
Power-aware scheduling has become a critical research thrust for deploying exascale High Performance Computing (HPC) systems with a limited power budget. Time-varying pricing of electricity with respect to market demand, together with dynamic HPC workloads, can lead to unpredictable operational costs, which complicates scheduling decisions further. For an oversubscribed HPC system, value-based scheduling...
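To make the value-based idea concrete, here is a minimal greedy sketch, assuming jobs carry a value and a power draw and that electricity has a single current price; it is not the paper's algorithm, only an illustration of value-driven job selection under a power budget.

    # Greedily favor jobs with the highest value per unit of energy cost.
    def schedule(jobs, power_budget_kw, price_per_kwh):
        """jobs: list of dicts with 'name', 'value', 'power_kw', 'hours'."""
        def value_density(job):
            energy_cost = job["power_kw"] * job["hours"] * price_per_kwh
            return job["value"] / energy_cost if energy_cost > 0 else float("inf")

        selected, used_kw = [], 0.0
        for job in sorted(jobs, key=value_density, reverse=True):
            if used_kw + job["power_kw"] <= power_budget_kw:
                selected.append(job["name"])
                used_kw += job["power_kw"]
        return selected

    jobs = [
        {"name": "sim_A", "value": 100, "power_kw": 40, "hours": 2},
        {"name": "sim_B", "value": 60,  "power_kw": 10, "hours": 1},
        {"name": "sim_C", "value": 90,  "power_kw": 30, "hours": 4},
    ]
    print(schedule(jobs, power_budget_kw=60, price_per_kwh=0.12))
    # -> ['sim_B', 'sim_A']: sim_C has the worst value per energy dollar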
Lightweight block ciphers are an important topic of research in the context of the Internet of Things (IoT). Current cryptographic contests and standardization efforts seek to benchmark lightweight ciphers in both hardware and software. Although there have been several benchmarking studies of both hardware and software implementations of lightweight ciphers, direct comparison of hardware and software...
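For the software side of such a comparison, a minimal benchmarking harness might time a reference implementation of a lightweight cipher. The sketch below uses Speck64/128 (round function and key schedule per Beaulieu et al.); the timing loop and block count are illustrative.

    import time

    MASK = 0xFFFFFFFF   # Speck64/128: 32-bit words, 27 rounds, 128-bit key
    ROUNDS = 27

    def ror(x, r): return ((x >> r) | (x << (32 - r))) & MASK
    def rol(x, r): return ((x << r) | (x >> (32 - r))) & MASK

    def expand_key(k0, l0, l1, l2):
        """Derive the 27 round keys from the four 32-bit key words."""
        ks, l = [k0], [l0, l1, l2]
        for i in range(ROUNDS - 1):
            l.append(((ks[i] + ror(l[i], 8)) & MASK) ^ i)
            ks.append(rol(ks[i], 3) ^ l[-1])
        return ks

    def encrypt(x, y, ks):
        """Encrypt one 64-bit block given as two 32-bit words."""
        for k in ks:
            x = ((ror(x, 8) + y) & MASK) ^ k
            y = rol(y, 3) ^ x
        return x, y

    ks = expand_key(0x03020100, 0x0B0A0908, 0x13121110, 0x1B1A1918)
    blocks = 10_000
    start = time.perf_counter()
    for i in range(blocks):
        encrypt(i & MASK, 0x7475432D, ks)
    elapsed = time.perf_counter() - start
    print(f"{blocks / elapsed:,.0f} blocks/s ({8 * blocks / elapsed / 1e6:.2f} MB/s)")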
Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with an environment by making sequential decisions. The agent receives rewards from the environment and seeks an optimal policy that maximises the cumulative reward. Trust Region Policy Optimisation (TRPO) is a recent policy optimisation algorithm that achieves superior results on various RL benchmarks, but is computationally...
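For context, TRPO's update is commonly written as a constrained surrogate optimisation (Schulman et al., 2015); enforcing the KL trust region below, typically via conjugate gradient on the Fisher information matrix, is the computationally demanding part:

    \max_{\theta} \;\; \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}
        \left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}
        \, A^{\pi_{\theta_{\text{old}}}}(s, a) \right]
    \quad \text{s.t.} \quad
    \mathbb{E}_{s \sim \pi_{\theta_{\text{old}}}}
        \left[ D_{\mathrm{KL}}\!\big( \pi_{\theta_{\text{old}}}(\cdot \mid s)
        \,\big\|\, \pi_{\theta}(\cdot \mid s) \big) \right] \le \delta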
Hardware design is an essential part of research in high performance computing. Initial efforts in hardware research consist of analyzing design ideas in a software simulator. This allows chip designers to minimize the amount of manufacturing, which would be too costly, and to avoid doing FPGA designs, which are even more time-consuming. Simulating a hardware design involves running many tests that try...
This paper explores the use of hardware sandboxes, conceptually similar to software sandboxes, for the secure integration of non-trusted IPs in system-on-chip (SoC) designs. The goal of the hardware sandbox is to allow only permissible interactions between the IP and the rest of the system. The hardware sandbox design achieves this by exposing the IP interface to isolated virtual resources and checking...
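As a software analogy of the checking step, the sketch below wraps a non-trusted component behind an allow-list of permissible operations; the transaction format and all names are illustrative, not the paper's design.

    # Allow-list of (operation, permitted address range) pairs -- illustrative.
    ALLOWED = [
        ("read",  range(0x4000, 0x5000)),
        ("write", range(0x4000, 0x4800)),
    ]

    def sandboxed_access(op, addr, forward):
        """Forward a transaction from the non-trusted IP only if permitted."""
        for allowed_op, addr_range in ALLOWED:
            if op == allowed_op and addr in addr_range:
                return forward(op, addr)
        raise PermissionError(f"blocked illegal {op} at {addr:#06x}")

    # Stand-in for the system side; a hardware sandbox would instead route
    # the transaction to isolated virtual resources.
    print(sandboxed_access("read", 0x4100, lambda op, a: f"{op}@{a:#06x} ok"))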
Knowledge of power consumption at a subsystem level can facilitate adaptive energy-saving techniques such as power gating, runtime task mapping and dynamic voltage and/or frequency scaling. While we have the ability to attribute power to an arbitrary hardware system's modules in real time, the selection of the particular signals to monitor for the purpose of power estimation within any given module...
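One common formulation of the signal-selection problem fits a linear model from monitored signal activity to measured power; the sketch below shows such a fit with made-up toggle counts, as a conceptual illustration only.

    import numpy as np

    # Rows: observation windows; columns: toggle counts of monitored signals.
    toggles = np.array([
        [120,  40, 300],
        [ 80,  35, 250],
        [200,  90, 410],
        [150,  60, 330],
    ], dtype=float)
    measured_mw = np.array([52.0, 41.5, 88.0, 64.0])   # invented measurements

    # Least-squares fit: power ~ toggles @ weights + baseline.
    X = np.hstack([toggles, np.ones((len(toggles), 1))])
    weights, *_ = np.linalg.lstsq(X, measured_mw, rcond=None)

    # Estimate power for a new window from its signal activity alone.
    new_window = np.array([170, 70, 360, 1.0])
    print(f"estimated power: {new_window @ weights:.1f} mW")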
Field-Programmable Gate Arrays (FPGAs) have been gaining considerable momentum in mainstream high-performance systems in recent years due to their flexibility and low power consumption. Still, FPGAs remain largely unavailable to software programmers due to programming and debugging difficulties inherent to standard Hardware Description Languages. The performance that hardware-oblivious software...
The performance of commodity video-gaming embedded devices (consoles, graphics cards, tablets, etc.) has been advancing at a rapid pace owing to strong consumer demand and stiff market competition. Gaming devices are currently amongst the most powerful and cost-effective computational technologies available in quantity. In this article, we evaluate a sample of current generation video-gaming devices...
General-purpose workloads running on modern graphics processing units (GPGPUs) rely on hardware-based barriers to synchronize warps within a thread block (TB). However, imbalance may exist before reaching a barrier if a GPGPU workload contains irregular memory accesses, i.e., some warps may be critical while others may not. Ideally, cache space should be reserved for the critical warps. Unfortunately,...
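As a toy illustration of the reservation idea, the sketch below marks the warps furthest from the barrier as critical and gives them a larger share of cache ways; the policy and all numbers are invented for illustration, not the paper's mechanism.

    def partition_ways(progress, total_ways=16, critical_share=0.75):
        """progress: dict warp_id -> instructions completed before the barrier."""
        slowest = min(progress.values())
        critical = [w for w, p in progress.items() if p == slowest]
        others = [w for w in progress if w not in critical]
        # Reserve most ways for critical warps, split the rest evenly.
        crit_ways = max(1, int(total_ways * critical_share)) // max(1, len(critical))
        rest_ways = (total_ways - crit_ways * len(critical)) // max(1, len(others))
        return {w: (crit_ways if w in critical else rest_ways) for w in progress}

    # Warps 1 and 2 lag behind the barrier, so they get 6 ways each.
    print(partition_ways({0: 90, 1: 40, 2: 40, 3: 85}))
    # -> {0: 2, 1: 6, 2: 6, 3: 2}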
Early design-space evaluation of computer systems is usually performed using performance models such as detailed simulators, RTL-based models, etc. Unfortunately, it is very challenging (often impossible) to run many emerging applications on detailed performance models owing to their complex application software stacks, significantly long run times, system dependencies and the limited speed/potential...
With the NVIDIA Tegra Jetson X1 and Pascal P100 GPUs, NVIDIA introduced hardware-based computation on FP16 numbers, also called half-precision arithmetic. In this talk, we will introduce the steps required to build a viable benchmark for this new arithmetic format. This will include the connections to established IEEE floating-point standards and existing HPC benchmarks. The discussion will focus on performance...
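A first step in any such benchmark is characterising the format itself; the snippet below inspects IEEE 754 binary16 via NumPy and shows the rounding behaviour an FP16 benchmark has to account for.

    import numpy as np

    info = np.finfo(np.float16)
    print(f"bits: {info.bits}, max: {info.max}, eps: {info.eps}")
    # bits: 16, max: 65504.0, eps: 0.000977

    # Rounding effects appear quickly: adding half an eps to 1.0 is lost.
    one = np.float16(1.0)
    print(one + np.float16(info.eps) / 2 == one)   # True: the update vanishes

    # Accumulating in FP16 loses accuracy; a common remedy (also used on
    # FP16-capable GPUs) is accumulating in FP32.
    xs = np.full(10_000, 0.1, dtype=np.float16)
    print(xs.sum(dtype=np.float16), xs.sum(dtype=np.float32))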
The work-queue is an effective approach for mapping irregular-parallel workloads to GPGPUs. It can improve the utilization of SIMD units by processing only useful work that is dynamically generated during execution. As current GPGPUs lack the necessary support for work-queues, software-based work-queue implementations often suffer from memory contention and load-balancing issues. We present a novel...
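The pattern is easiest to see on the CPU: the sketch below drains a queue whose items generate new work as they are processed (a hypothetical frontier-expansion workload); on a GPGPU the loop would run in parallel across warps, with enqueue/dequeue built on atomic counters.

    from collections import deque

    graph = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}

    def process_with_work_queue(start):
        queue, seen = deque([start]), {start}
        processed = []
        while queue:
            node = queue.popleft()       # dequeue one unit of useful work
            processed.append(node)
            for nbr in graph[node]:      # processing an item generates new work
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        return processed

    print(process_with_work_queue(0))  # [0, 1, 2, 3, 4, 5]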
Big data has exacerbated the so-called "memory wall" problem. Studying the memory characteristics of big data applications has become an important issue in the high-end computing community. In this paper, we propose a trace-based method built on the trace files generated by simulators, which captures memory access information at different levels of the memory hierarchy and aggregates that information to get memory...
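In the same spirit, a trace aggregator can be sketched in a few lines, assuming a hypothetical "<level> <R|W> <hex-address>" trace format; the records below are invented.

    from collections import Counter, defaultdict
    import io

    trace = io.StringIO("""\
    L1 R 0x7f001000
    L1 W 0x7f001040
    L2 R 0x7f001000
    DRAM R 0x7f002000
    L1 R 0x7f001080
    """)

    # Aggregate read/write counts per memory-hierarchy level.
    accesses = defaultdict(Counter)
    for line in trace:
        level, op, addr = line.split()
        accesses[level][op] += 1

    for level, ops in accesses.items():
        total = sum(ops.values())
        print(f"{level:5s} total={total:3d} reads={ops['R']} writes={ops['W']}")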
Non-volatile memory (NVM) is expected to enrich next-generation computer systems. However, designers have difficulty exploring new software and hardware design ideas based on NVM due to the limitations of current simulation-based evaluation, e.g., slow runtime. To resolve this problem, we present an open, reliable, and versatile hardware platform for NVM emulation. We built a Zynq FPGA-based...
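The core emulation idea, independent of the FPGA implementation, is to add configurable latency to an ordinary memory; the sketch below models that in software, with made-up delay values standing in for NVM timing parameters.

    import time

    class EmulatedNVM:
        def __init__(self, size, read_ns=300, write_ns=1000):
            self.mem = bytearray(size)       # DRAM standing in for NVM
            self.read_s = read_ns * 1e-9
            self.write_s = write_ns * 1e-9

        def _stall(self, seconds):
            end = time.perf_counter() + seconds
            while time.perf_counter() < end:   # busy-wait for sub-microsecond delays
                pass

        def read(self, addr, n):
            self._stall(self.read_s)
            return bytes(self.mem[addr:addr + n])

        def write(self, addr, data):
            self._stall(self.write_s)          # writes slower than reads, as in NVM
            self.mem[addr:addr + len(data)] = data

    nvm = EmulatedNVM(4096)
    nvm.write(0, b"hello")
    print(nvm.read(0, 5))  # b'hello'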