Items from 1 to 20 out of 20 results

article

Autogeneration and Autotuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters

Yongpeng Zhang, Frank Mueller

IEEE Transactions on Parallel and Distributed Systems > 2013 > 24 > 3 > 417 - 427

This paper develops and evaluates search and optimization techniques for autotuning 3D stencil (nearest neighbor) computations on GPUs. Observations indicate that parameter tuning is necessary for heterogeneous GPUs to achieve optimal performance with respect to a search space. Our proposed framework takes a most concise specification of stencil behavior from the user as a single formula, autogenerates...

chapter

Cross-Platform OpenCL Code and Performance Portability Investigated with a Climate and Weather Physics Model

Han Dong, Dibyajyoti Ghosh, Fahad Zafar, Shujia Zhou

2012 41st International Conference on Parallel Processing Workshops > 126 - 134

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

The current generation of multicore computing platforms varies widely. Sustaining many-core applications across heterogeneous platforms is a daunting task, more so when the dynamic nature of the application is factored in. Open Computing Language (OpenCL) was created to address this issue. Designed to run on CPUs, GPUs, FPGAs and other platforms, OpenCL is becoming a standard for cross-platform parallel...

chapter

GPGPU Memory Estimation and Optimization Targeting OpenCL Architecture

Junfeng Zhu, Gang Chen, Baifeng Wu

2012 IEEE International Conference on Cluster Computing > 449 - 458

2012 IEEE International Conference on Cluster Computing (CLUSTER)

The enormous computational power available in modern graphics processing units (GPUs) has enabled their wide use for general-purpose applications. However, manual development of high-performance parallel codes for GPUs is still very challenging. In order to fully exploit the capability of GPUs for general-purpose computing on heterogeneous processing platforms, we propose performance estimation...

chapter

A Compiler-Based Tool for Array Analysis in HPC Applications

Ahmad Qawasmeh, Barbara Chapman, Amrita Banerjee

2012 41st International Conference on Parallel Processing Workshops > 454 - 463

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

Array region analysis plays a significant role in various optimizations at compile time. Displaying array access information efficiently in HPC applications has been a vital challenge for scientists and developers for the past few years. The Dragon array region analysis tool is a powerful, interactive tool built on top of the OpenUH compiler, an open source C/C++/Fortran compiler that supports...

chapter

Algorithmic strategies for optimizing the parallel reduction primitive in CUDA

Pedro J. Martin, Luis F. Ayuso, Roberto Torres, Antonio Gavilanes

2012 International Conference on High Performance Computing & Simulation (HPCS) > 511 - 519

2012 International Conference on High Performance Computing & Simulation (HPCS)

Many general-purpose applications exploit Graphics Processing Units (GPUs) by executing a set of well-known data-parallel primitives. Those primitives are usually invoked from the host many times, so their throughput has a great impact on the performance of the overall system. Thus, the study of novel algorithmic strategies to optimize their implementation on current devices is an interesting topic...

chapter

Diamond-Like Tiling Schemes for Efficient Explicit Euler on GPUs

Matthias Korch, Julien Kulbe, Carsten Scholtes

2012 11th International Symposium on Parallel and Distributed Computing > 259 - 266

2012 11th International Symposium on Parallel and Distributed Computing (ISPDC)

GPU computing offers a high potential of raw processing power at comparatively low costs. This paper investigates optimization techniques for solving initial value problems (IVPs) of ordinary differential equations (ODEs) on GPUs. Different techniques, especially for exploiting the GPU memory hierarchy, are discussed, and corresponding OpenCL implementations of the explicit Euler method are compared...

chapter

Parameterized Verification of GPU Kernel Programs

Guodong Li, Ganesh Gopalakrishnan

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2450 - 2459

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

We present an automated symbolic verifier for checking the functional correctness of GPGPU kernels parametrically, for an arbitrary number of threads. Our tool checks the functional equivalence of a kernel and its optimized versions, helping debug errors introduced during memory coalescing and bank conflict elimination related optimizations. Key features of our work include: (1) a symbolic method...

chapter

A Polyhedral Modeling Based Source-to-Source Code Optimization Framework for GPGPU

Chenxi Wang, Kang Kang, Maohua Zhu, Yangdong Deng

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1964 - 1970

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In this paper, we propose a source-to-source code optimization framework for general purpose computing on graphics processing units (GPGPU). Our framework is based on a re-formulation of the polyhedral loop transformation theory under the context of GPGPU. We prove that the number of actual memory transactions can be used as a performance metric to guide the code optimization process. In addition,...

chapter

An experimental GPU global memory performance estimation and optimization

Zhu Junfeng, Chen Gang, Zhang Keliang, Wu Baifeng

2012 International Conference on Systems and Informatics (ICSAI2012) > 910 - 914

2012 International Conference on Systems and Informatics (ICSAI)

The enormous computational power available in modern graphics processing units (GPUs) has enabled their wide use for general-purpose applications. However, manual development of high-performance parallel codes for GPUs is still very challenging. To improve GPGPU application performance by using GPU global memory efficiently, we extend the polyhedral model to capture memory access...

chapter

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU

Sungpack Hong, Tayo Oguntebi, Kunle Olukotun

2011 International Conference on Parallel Architectures and Compilation Techniques > 78 - 88

2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)

Graphs are a fundamental data representation that has been used extensively in various domains. In graph-based applications, a systematic exploration of the graph such as a breadth-first search (BFS) often serves as a key component in the processing of their massive data sets. In this paper, we present a new method for implementing the parallel BFS algorithm on multi-core CPUs which exploits a fundamental...

chapter

Performance analysis and optimization of Gyrokinetic Torodial Code on TH-1A supercomputer

Xiaoqian Zhu, Xin Liu, Xiangfei Meng, Jinghua Feng

2011 International Conference on Electrical and Control Engineering > 6027 - 6031

2011 International Conference on Electrical and Control Engineering (ICECE)

In this study, we test and analyze the performance of the Gyrokinetic Toroidal Code (GTC) program. Based on the analysis results, we port GTC's compute-intensive subroutines to the GPU and speed them up on the “CPU+GPU” heterogeneous architecture of the TH-1A supercomputer. Some optimization strategies are developed in this process; for example, subroutines are integrated to reduce the data transfer between...

chapter

Aperiodic conformal reflectarrays

A. Capozzoli, C. Curcio, A. Liseno, M. Migliorelli, et al.

2011 IEEE International Symposium on Antennas and Propagation (APSURSI) > 361 - 364

2011 IEEE Antennas and Propagation Society International Symposium and USNC/URSI National Radio Science Meeting

We introduce a new class of reflectarrays, namely, the aperiodic conformal reflectarrays, aimed at exploiting, as much as possible, the available degrees of freedom of the radiating structure, such as positions, orientations and characteristics of the radiating elements and the shape of the reflecting surface. A synthesis technique is outlined, properly dealing with key aspects such as the complexity...

chapter

Parallel Fish Swarm Algorithm Based on GPU-Acceleration

Yifan Hu, Baozhong Yu, Jianliang Ma, Tianzhou Chen

2011 3rd International Workshop on Intelligent Systems and Applications > 1 - 4

2011 3rd International Workshop on Intelligent Systems and Applications (ISA)

With the development of the Graphics Processing Unit (GPU) and the Compute Unified Device Architecture (CUDA) platform, researchers have shifted their attention to general-purpose computing applications on GPUs. In this paper, we present a novel parallel approach to running the artificial fish swarm algorithm (AFSA) on a GPU. Experiments are conducted by running AFSA on both GPU and CPU to optimize four...

chapter

GPU-S2S: A Compiler for Source-to-Source Translation on GPU

Dan Li, Haijun Cao, Xiaoshe Dong, Bao Zhang

2010 3rd International Symposium on Parallel Architectures, Algorithms and Programming > 144 - 148

Third International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2010)

CUDA facilitates the development of General Purpose computing on Graphics Processing Units (GPGPU); however, its complex memory system, thread-level structure, and data transmission control between memories have brought great challenges for programming on GPUs. In order to facilitate the development of parallel programs on GPUs and reuse existing sequential codes, in this paper we propose a novel directive...

chapter

Parallelization and characterization of GARCH option pricing on GPUs

Ren-Shuo Liu, Yun-Cheng Tsai, Chia-Lin Yang

IEEE International Symposium on Workload Characterization (IISWC'10) > 1 - 10

2010 IEEE International Symposium on Workload Characterization (IISWC 2010)

Option pricing is an important problem in computational finance due to the fast-growing market and increasing complexity of options. For option pricing, a model is required to describe the price process of the underlying asset. The GARCH model is one of the prominent option pricing models since it can model stochastic volatility of the underlying asset. To derive expected profit based on the GARCH...

chapter

Particle Gradient Multi-objective Evolutionary Algorithm Based on GPU with CUDA

Xuezhi Yue, Zhijian Wu, Kangshun Li

2010 Third International Symposium on Information Science and Engineering > 540 - 544

2010 International Symposium on Information Science and Engineering (ISISE)

This paper presents a particle gradient multi-objective evolutionary algorithm (PGMOEA) on GPU. PGMOEA extends the classical particle dynamic multi-objective evolutionary algorithm by incorporating the gradient information of each particle from evolutionary programming. We perform experiments to compare PGMOEA on GPU with PGMOEA on CPU and demonstrate that PGMOEA on GPU is much more effective...

chapter

Performance Debugging of GPGPU Applications with the Divergence Map

B Coutinho, D Sampaio, F M Q Pereira, W Meira

2010 22nd International Symposium on Computer Architecture and High Performance Computing > 33 - 40

2010 22nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2010)

The increasing programmability and the high computational power of Graphical Processing Units (GPUs) make them attractive for general purpose programming. However, taking full benefit of this execution environment is a challenging task. One of these challenges stems from divergences, a phenomenon that occurs when threads that execute in lock-step are forced to take different program paths due to branches...

chapter

Exploiting Parallelism in Iterative Irregular Maxflow Computations on GPU Accelerators

S Solomon, P Thulasiraman, R K Thulasiram

2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC) > 297 - 304

2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC 2010)

The Graphics Processing Unit (GPU) is an asymmetric, heterogeneous multi-core architecture that can be used for high performance parallel computing applications. However, a significant level of interest has been focused on algorithms for solving regular problems, as these applications typically map well to the GPU. Irregular applications, which rely on pointer or graph-based data structures, have...

chapter

Parallelization of binary and real-coded genetic algorithms on GPU using CUDA

Ramnik Arora, Rupesh Tulshyan, Kalyanmoy Deb

IEEE Congress on Evolutionary Computation > 1 - 8

2010 IEEE Congress on Evolutionary Computation

Genetic Algorithms (GAs) are suitable for parallel computing since population members' fitness may be evaluated in parallel. Most past parallel GA studies have exploited this aspect, besides resorting to different algorithms, such as island, single-population master-slave, fine-grained and hybrid models. A GA involves a number of other operations which, if parallelized, may lead to better parallel GA...

chapter

Implementing the Himeno benchmark with CUDA on GPU clusters

Everett H Phillips, Massimiliano Fatica

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) > 1 - 10

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

This paper describes the use of CUDA to accelerate the Himeno benchmark on clusters with GPUs. The implementation is designed to optimize memory bandwidth utilization. Our approach achieves over 83% of the theoretical peak bandwidth on an NVIDIA Tesla C1060 GPU and performs at over 50 GFlops. A multi-GPU implementation that utilizes MPI alongside CUDA streams to overlap GPU execution with data transfers...

Keywords:
ARRAYS
OPTIMIZATION
GRAPHICS PROCESSING UNIT


INFONA - science communication portal
