This paper develops and evaluates search and optimization techniques for autotuning 3D stencil (nearest neighbor) computations on GPUs. Observations indicate that parameter tuning is necessary for heterogeneous GPUs to achieve optimal performance with respect to a search space. Our proposed framework takes a most concise specification of stencil behavior from the user as a single formula, autogenerates...
The current generation of multicore computing platforms differs vastly. Sustaining many-core applications across heterogeneous platforms is a daunting task, more so when the dynamic nature of the application is factored in. Open Computing Language (OpenCL) was created to address this issue. Designed to run on CPUs, GPUs, FPGAs and other platforms, OpenCL is becoming a standard for cross-platform parallel...
The enormous computational power available in modern graphics processing units (GPUs) has enabled their wide use for general-purpose applications. However, manual development of high-performance parallel codes for GPUs is still very challenging. In order to fully exploit the capability of GPUs for general-purpose computing on heterogeneous processing platforms, we propose performance estimation...
Array region analysis plays a significant role in various optimizations at compile time. Displaying array access information efficiently in HPC applications has been a vital challenge for scientists and developers for the past few years. The Dragon array region analysis tool is a powerful and interactive tool that was built on top of the OpenUH compiler, an open source C/C++/Fortran compiler, that supports...
Many general-purpose applications exploit Graphics Processing Units (GPUs) by executing a set of well-known data-parallel primitives. Those primitives are usually invoked from the host many times, so their throughput has a great impact on the performance of the overall system. Thus, the study of novel algorithmic strategies to optimize their implementation on current devices is an interesting topic...
GPU computing offers a high potential of raw processing power at comparatively low costs. This paper investigates optimization techniques for solving initial value problems (IVPs) of ordinary differential equations (ODEs) on GPUs. Different techniques, especially for exploiting the GPU memory hierarchy, are discussed, and corresponding OpenCL implementations of the explicit Euler method are compared...
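For readers unfamiliar with the explicit Euler method named in the abstract above, the update rule is y_{n+1} = y_n + h f(t_n, y_n). The following is a minimal CPU-side sketch in Python, not the paper's OpenCL implementation; the function name `explicit_euler` and its parameters are assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def explicit_euler(
    f: Callable[[float, Sequence[float]], Sequence[float]],
    t0: float,
    y0: Sequence[float],
    h: float,
    steps: int,
) -> Tuple[float, List[float]]:
    """Integrate y' = f(t, y) from t0 with fixed step h (hypothetical helper)."""
    t = t0
    y = list(y0)
    for n in range(steps):
        dy = f(t, y)
        # Each component is updated independently of the others; this
        # per-component independence is what a GPU version would
        # typically map onto parallel threads.
        y = [yi + h * di for yi, di in zip(y, dy)]
        # Recompute t from the step index to avoid accumulating rounding error.
        t = t0 + (n + 1) * h
    return t, y
```

As a usage example, `explicit_euler(lambda t, y: [-y[0]], 0.0, [1.0], 0.001, 1000)` approximates the solution of y' = -y at t = 1, which should lie close to exp(-1).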
We present an automated symbolic verifier for checking the functional correctness of GPGPU kernels parametrically, for an arbitrary number of threads. Our tool checks the functional equivalence of a kernel and its optimized versions, helping debug errors introduced during memory coalescing and bank conflict elimination related optimizations. Key features of our work include: (1) a symbolic method...
In this paper, we propose a source-to-source code optimization framework for general purpose computing on graphics processing units (GPGPU). Our framework is based on a re-formulation of the polyhedral loop transformation theory under the context of GPGPU. We prove that the number of actual memory transactions can be used as a performance metric to guide the code optimization process. In addition,...
The enormous computational power available in modern graphics processing units (GPUs) has enabled their wide use for general-purpose applications. However, manual development of high-performance parallel codes for GPUs is still very challenging. In order to improve GPGPU application performance by efficiently using GPU global memory, we extend the polyhedral model to capture memory access...
Graphs are a fundamental data representation that has been used extensively in various domains. In graph-based applications, a systematic exploration of the graph such as a breadth-first search (BFS) often serves as a key component in the processing of their massive data sets. In this paper, we present a new method for implementing the parallel BFS algorithm on multi-core CPUs which exploits a fundamental...
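As background for the abstract above, a breadth-first search can be organized level by level around a frontier of vertices, which is the structure parallel BFS variants commonly build on. The sketch below is a plain sequential Python version given only for illustration; it is not the method proposed in the paper (the abstract is truncated before describing it), and the name `bfs_levels` and the adjacency-dictionary input format are assumptions.

```python
from typing import Dict, Hashable, Iterable, List


def bfs_levels(
    adj: Dict[Hashable, Iterable[Hashable]], source: Hashable
) -> Dict[Hashable, int]:
    """Return the BFS distance (level) of every vertex reachable from source."""
    dist = {source: 0}
    frontier: List[Hashable] = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier: List[Hashable] = []
        # In a parallel version, the vertices of the current frontier
        # would be distributed across threads; here they are visited in turn.
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return dist
```

Vertices that cannot be reached from `source` are simply absent from the returned dictionary.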
In this study, we test and analyze the performance of the Gyrokinetic Toroidal Code (GTC) program. According to the analysis results, we port GTC's compute-intensive subroutines to the GPU and speed them up on the “CPU+GPU” heterogeneous architecture of the TH-1A supercomputer. Some optimization strategies are developed in this process, for example, subroutines are integrated to reduce the data transfer between...
We introduce a new class of reflectarrays, namely, the aperiodic conformal reflectarrays, aimed at exploiting, as much as possible, the available degrees of freedom of the radiating structure, such as positions, orientations and characteristics of the radiating elements and the shape of the reflecting surface. A synthesis technique is outlined, properly dealing with key aspects such as the complexity...
With the development of the Graphics Processing Unit (GPU) and the Compute Unified Device Architecture (CUDA) platform, researchers have shifted their attention to general-purpose computing applications on the GPU. In this paper, we present a novel parallel approach to run the artificial fish swarm algorithm (AFSA) on the GPU. Experiments are conducted by running AFSA on both GPU and CPU to optimize four...
CUDA facilitates the development of General Purpose computing on Graphics Processing Units (GPGPU); however, its complex memory system, thread-level structure, and data transmission control between memories pose great challenges for GPU programming. In order to facilitate the development of parallel programs on GPUs and reuse existing sequential codes, in this paper we propose a novel directive...
Option pricing is an important problem in computational finance due to the fast-growing market and increasing complexity of options. For option pricing, a model is required to describe the price process of the underlying asset. The GARCH model is one of the prominent option pricing models since it can model stochastic volatility of the underlying asset. To derive expected profit based on the GARCH...
In this paper, a particle gradient multi-objective evolutionary algorithm (PGMOEA) on GPU is presented. PGMOEA extends the classical particle dynamic multi-objective evolutionary algorithm by incorporating the gradient information of each particle from evolutionary programming. We perform experiments to compare PGMOEA on GPU with PGMOEA on CPU and demonstrate that PGMOEA on GPU is much more effective...
The increasing programmability and the high computational power of Graphical Processing Units (GPUs) make them attractive for general-purpose programming. However, taking full benefit of this execution environment is a challenging task. One of these challenges stems from divergences, a phenomenon that occurs when threads that execute in lock-step are forced to take different program paths due to branches...
The Graphics Processing Unit (GPU) is an asymmetric, heterogeneous multi-core architecture that can be used for high performance parallel computing applications. However, a significant level of interest has been focused on algorithms for solving regular problems, as these applications typically map well to the GPU. Irregular applications, which rely on pointer or graph-based data structures, have...
Genetic Algorithms (GAs) are suitable for parallel computing since population members' fitness may be evaluated in parallel. Most past parallel GA studies have exploited this aspect, besides resorting to different algorithms, such as island, single-population master-slave, fine-grained and hybrid models. A GA involves a number of other operations which, if parallelized, may lead to better parallel GA...
This paper describes the use of CUDA to accelerate the Himeno benchmark on clusters with GPUs. The implementation is designed to optimize memory bandwidth utilization. Our approach achieves over 83% of the theoretical peak bandwidth on an NVIDIA Tesla C1060 GPU and performs at over 50 GFlops. A multi-GPU implementation that utilizes MPI alongside CUDA streams to overlap GPU execution with data transfers...