The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
We describe CPU and GPU implementations of parallel triangle-counting and k-truss identification in the Galois and IrGL systems. Both systems are based on a graph-centric abstraction called the operator formulation of algorithms. Depending on the input graph, our implementations are two to three orders of magnitude faster than the reference implementations provided by the IEEE HPEC static graph challenge.
OpenACC's programming model presents a simple interface to programmers, offering a trade-off between performance and development effort. OpenACC relies on compiler technologies to generate efficient code and optimize for performance. Among the difficult to implement directives, is the cache directive. The cache directive allows the programmer to utilize accelerator's hardware- or software-managed...
A broad class of applications involve indirect or data-dependent memory accesses and are referred to as irregular applications. Recent developments in SIMD architectures — specifically, the emergence of wider SIMD lanes, combination of SIMD parallelism with many-core MIMD parallelism, and more flexible programming APIs — are providing new opportunities as well as challenges for this class of applications...
Competitive advantage of a firm is usually reflected through its superiority in production resources and performance outcomes. In order to achieve high performance (e.g., productivity) and significantly improve product quality, major US industries have promoted and implemented Robust Design (RD) techniques during the last decade. RD is a cost-effective procedure for determining the optimal settings...
We discuss the SINR-maximization problem for analog receive beamforming with finite resolution phase shifters and amplifiers. This discrete optimization problem can be globally solved by means of branch-and-bound. The performance of the algorithm depends on the quality of bounds for the subproblems which are determined within the branch-and-bound procedure. These subproblems can be generated by a...
This paper explores optimal array beamforming for secure communication in static multipath propagation environments. The problem is cast as a convex optimization of the average secure key rate achieved in the presence of a passive eavesdropper, and the optimization is performed using semidefinite programming. While representative results are presented for a uniform linear array, the optimization procedure...
2-transistor (2T) cell technology used for embedded non-volatile memory (eNVM) has been scaled down to 40nm node. To enable aggressive cell scaling, the array architecture is modified compared to previous generations and the channel length of cell is drastically reduced requiring steep cell junctions, which give rise to new disturb phenomena. This paper describes how to safeguard the drain disturb...
General purpose compilers aim to extract the best average performance for all possible user applications. Due to the lack of specializations for different types of computations, compiler attained performance often lags behind those of the manually optimized libraries. In this paper, we demonstrate a new approach, programmable composition, to enable the specialization of compiler optimizations without...
We present a tool that reduces the development time of GPU-executable code. We implement a catalogue of common optimizations specific to the GPU architecture. Through the tool, the programmer can semi-automatically transform a computationally-intensive code section into GPU-executable form and apply optimizations thereto. Based on experiments, the code generated by the tool can be 3-256X faster than...
Embedded DSP computing is currently shifting towards manycore architectures in order to cope with the ever growing computational demands. Actor based dataflow languages are being considered as a programming model. In this paper we present a code generator for CAL, one such dataflow language. We propose to use a compilation tool with two intermediate representations. We start from a machine model of...
A general procedure, based on the SemiDefinite Relaxation (SDR) technique, is presented to solve efficiently a wide range of difficult (because non-convex) array synthesis problems. This powerful approximation technique is easy to implement and the so-approximated problem can then be efficiently solved using off-the-shelf numerical routines. Examples of shaped beam and reconfigurable array synthesis...
Reducing energy consumption without affecting computational performance is a significant research driver in computer engineering. The Partitioned Global Address Space (PGAS) programming model provides a global address space for ease-of-use while providing locality-awareness for efficient execution. For symmetric multiprocessor (SMP) clusters, PGAS locality-awareness offers opportunities for intelligent...
Computational scientists and engineers commonly rely on established software libraries to achieve high performance and reliability in their numerical applications. Unfortunately, this approach does not work well if the desired functionality is absent in existing libraries or if the integration is difficult. In such scenarios, one is often forced to explore alternative algorithms and in-house implementations...
I/O performance is vital for most HPC applications especially those that generate a vast amount of data with the growth of scale. Many studies have shown that scientific applications tend to issue small and noncontiguous accesses in an interleaving fashion, causing different processes to access overlapping regions. In such scenario, collective I/O is a widely used optimization technique. However,...
We present studies of an extrinsic program disturb mechanism in a Field Programmable Gate Array (FPGA) fabricated with a 65 nm embedded-Flash process. It is concluded that multiple positive charges are involved during disturb to explain the observed extrinsic behavior. Its failure rate was improved with tunnel oxidation process tuning and stronger pre-oxidation cleans.
Array region analysis plays a significant role in various optimizations at compile time. Displaying array access information efficiently in HPC applications has been a vital challenge for scientists and developers for the past few years. Dragon array region analysis tool is a powerful and interactive tool that was built on top of the Open UH compiler, an open source C/C++/Fortran compiler, that supports...
Current processor trends of integrating more cores with wider SIMD units, along with a deeper and complex memory hierarchy, have made it increasingly more challenging to extract performance from applications. It is believed by some that traditional approaches to programming do not apply to these modern processors and hence radical new languages must be discovered. In this paper, we question this thinking...
In this paper, we propose a source-to-source code optimization framework for general purpose computing on graphics processing units (GPGPU). Our framework is based on a re-formulation of the polyhedral loop transformation theory under the context of GPGPU. We prove that the number of actual memory transactions can be used as a performance metric to guide the code optimization process. In addition,...
A phase-only method of generating notches in the beam pattern of a phased array antenna is described. It is shown how the problem can be formulated as a semidefinite programming problem, and solved with readily available solvers. Numerical results simulating the performance of a 32 element uniform linear array are presented.
Feature models are commonly used to specify variability in software product lines. Several tools support feature models for variability management at different steps in the development process. However, tool support for test configuration generation is currently limited. This test generation task consists in systematically selecting a set of configurations that represent a relevant sample of the variability...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.