Items from 1 to 19 out of 19 results

chapter

Histogram optimization with CUDA

Keh Kok Yong, Sheera Shaheera Othman Talib

2016 IEEE Industrial Electronics and Applications Conference (IEACon) > 312 - 318

A histogram is a popular graphical representation of a data distribution, produced by processing numerical input data. Although sequential histogram computation is simple, it is no longer suitable for processing high volumes of data. With recent advances in high performance computing (HPC), aided by the accelerating growth of General Purpose Graphic Processing Unit (GPGPU),...

article

Redundant Network Traffic Elimination with GPU Accelerated Rabin Fingerprinting

Jianhua Sun, Hao Chen, Ligang He, Huailiang Tan

IEEE Transactions on Parallel and Distributed Systems > 2016 > 27 > 7 > 2130 - 2142

Recently, redundant network traffic elimination has attracted a lot of attention from both academia and industry. A core challenge and enabling technique in implementing redundancy elimination is to perform content-based chunking, which typically involves the computationally heavy Rabin fingerprinting algorithm. In this paper, we propose a GPU-based implementation of Rabin fingerprinting to...

chapter

Compiling and Optimizing Java 8 Programs for GPU Execution

Kazuaki Ishizaki, Akihiro Hayashi, Gita Koblents, Vivek Sarkar

2015 International Conference on Parallel Architecture and Compilation (PACT) > 419 - 431

GPUs can enable significant performance improvements for certain classes of data parallel applications and are widely used in recent computer systems. However, GPU execution currently requires explicit low-level operations such as 1) managing memory allocations and transfers between the host system and the GPU, 2) writing GPU kernels in a low-level programming model such as CUDA or OpenCL, and 3)...

chapter

Fast Sparse Matrix and Sparse Vector Multiplication Algorithm on the GPU

Carl Yang, Yangzihao Wang, John D. Owens

2015 IEEE International Parallel and Distributed Processing Symposium Workshop > 841 - 847

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)

We implement a promising algorithm for sparse-matrix sparse-vector multiplication (SpMSpV) on the GPU. An efficient k-way merge lies at the heart of finding a fast parallel SpMSpV algorithm. We examine the scalability of three approaches -- no sorting, merge sorting, and radix sorting -- in solving this problem. For breadth-first search (BFS), we achieve a 1.26x speedup over state-of-the-art sparse-matrix...

chapter

Double precision stencil computations on Kepler GPUs

Anamaria Vizitiu, Lucian Itu, Laszlo Lazar, Constantin Suciu

2014 18th International Conference on System Theory, Control and Computing (ICSTCC) > 123 - 127

Graphics Processing Units (GPU) have been used extensively for accelerating parallelizable applications in general, and scientific computations in particular. Stencil based algorithms are used intensively in various research areas and represent good candidates for GPU based acceleration. Since scientific computations have high accuracy requirements, herein we focus on stencil based double precision...

chapter

A Compiler Translate Directive-Based Language to Optimized CUDA

Feng Li, Hong An, Weihao Liang, Xiaoqiang Li, et al.

2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS) > 982 - 989

2014 IEEE International Conference on High Performance Computing and Communications (HPCC), 2014 IEEE 6th International Symposium on Cyberspace Safety and Security (CSS) and 2014 IEEE 11th International Conference on Embedded Software and Systems (ICESS)

Graphics processing units (GPUs) provide a low-cost platform for accelerating high performance computations. New programming languages, such as CUDA and OpenCL, make GPU programming attractive to programmers. However, programming GPUs is still a cumbersome task for two reasons: tedious performance optimizations and lack of portability. First, optimizing an algorithm for a specific GPU is a time-consuming...

chapter

Optimized GPU Sorting Algorithms on Special Input Distributions

Quan Yang, Zhihui Du, Sen Zhang

2012 11th International Symposium on Distributed Computing and Applications to Business, Engineering & Science > 57 - 61

We present a high performance graphics processing unit (GPU) sorting algorithm ISSD (Improved Sorting considering Special Distributions) implemented with the Compute Unified Device Architecture (CUDA). The ISSD focuses on two aspects to improve parallel sorting performance. One is how to decompose the sorting tasks into independent and balanced subtasks which can then be easily distributed to thousands...

chapter

Attempt of unbiased comparison of GPU and CPU performance in common scientific computing

Adnan Hidic, Damir Zubanovic, Adnan Hajdarevic, Alvin Huseinovic, et al.

2012 IX International Symposium on Telecommunications (BIHTEL) > 1 - 6

Graphics processing units (GPUs) are considered to have superior performance over central processing units (CPUs) in performing common scientific computations. A number of factors that can seriously change this perception are usually overlooked. In this paper, some of these factors are taken into account and their impact is measured, analysed and discussed. Matrix multiplication and Shell sorting...

chapter

GPGPU Memory Estimation and Optimization Targeting OpenCL Architecture

Junfeng Zhu, Gang Chen, Baifeng Wu

2012 IEEE International Conference on Cluster Computing > 449 - 458

2012 IEEE International Conference on Cluster Computing (CLUSTER)

The enormous computational power available in modern graphics processing units (GPUs) has enabled their wide use for general-purpose applications. However, manual development of high-performance parallel codes for GPUs is still very challenging. In order to fully exploit the capability of GPUs for general-purpose computing on heterogeneous processing platforms, we propose performance estimation...

chapter

A Polyhedral Modeling Based Source-to-Source Code Optimization Framework for GPGPU

Chenxi Wang, Kang Kang, Maohua Zhu, Yangdong Deng

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1964 - 1970

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In this paper, we propose a source-to-source code optimization framework for general purpose computing on graphics processing units (GPGPU). Our framework is based on a re-formulation of the polyhedral loop transformation theory under the context of GPGPU. We prove that the number of actual memory transactions can be used as a performance metric to guide the code optimization process. In addition,...

chapter

An experimental GPU global memory performance estimation and optimization

Zhu Junfeng, Chen Gang, Zhang Keliang, Wu Baifeng

2012 International Conference on Systems and Informatics (ICSAI2012) > 910 - 914

2012 International Conference on Systems and Informatics (ICSAI)

The enormous computational power available in modern graphics processing units (GPUs) has enabled their wide use for general-purpose applications. However, manual development of high-performance parallel codes for GPUs is still very challenging. To improve GPGPU application performance through efficient use of GPU global memory, we extend the polyhedral model to capture memory access...

chapter

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU

Sungpack Hong, Tayo Oguntebi, Kunle Olukotun

2011 International Conference on Parallel Architectures and Compilation Techniques > 78 - 88

2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)

Graphs are a fundamental data representation that has been used extensively in various domains. In graph-based applications, a systematic exploration of the graph such as a breadth-first search (BFS) often serves as a key component in the processing of their massive data sets. In this paper, we present a new method for implementing the parallel BFS algorithm on multi-core CPUs which exploits a fundamental...

chapter

Performance analysis and optimization of Gyrokinetic Torodial Code on TH-1A supercomputer

Xiaoqian Zhu, Xin Liu, Xiangfei Meng, Jinghua Feng

2011 International Conference on Electrical and Control Engineering > 6027 - 6031

2011 International Conference on Electrical and Control Engineering (ICECE)

In this study, we test and analyze the performance of the Gyrokinetic Torodial Code (GTC) program. Based on the analysis results, we port GTC's compute-intensive subroutines to the GPU and speed them up on the "CPU+GPU" heterogeneous architecture of the TH-1A supercomputer. Some optimization strategies are developed in this process; for example, subroutines are integrated to reduce the data transfer between...

chapter

GPU-S2S: A Compiler for Source-to-Source Translation on GPU

Dan Li, Haijun Cao, Xiaoshe Dong, Bao Zhang

2010 3rd International Symposium on Parallel Architectures, Algorithms and Programming > 144 - 148

Third International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2010)

CUDA facilitates the development of General Purpose computing on Graphics Processing Units (GPGPU); however, its complex memory system, thread-level structure, and data transmission control between memories pose great challenges for GPU programming. In order to facilitate the development of parallel programs on GPU and reuse existing sequential codes, in this paper we propose a novel directive...

chapter

Parallelization and characterization of GARCH option pricing on GPUs

Ren-Shuo Liu, Yun-Cheng Tsai, Chia-Lin Yang

IEEE International Symposium on Workload Characterization (IISWC'10) > 1 - 10

2010 IEEE International Symposium on Workload Characterization (IISWC 2010)

Option pricing is an important problem in computational finance due to the fast-growing market and increasing complexity of options. For option pricing, a model is required to describe the price process of the underlying asset. The GARCH model is one of the prominent option pricing models since it can model stochastic volatility of the underlying asset. To derive expected profit based on the GARCH...

chapter

Exploiting Parallelism in Iterative Irregular Maxflow Computations on GPU Accelerators

S Solomon, P Thulasiraman, R K Thulasiram

2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC) > 297 - 304

2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC 2010)

The Graphics Processing Unit (GPU) is an asymmetric, heterogeneous multi-core architecture that can be used for high performance parallel computing applications. However, a significant level of interest has been focused on algorithms for solving regular problems, as these applications typically map well to the GPU. Irregular applications, which rely on pointer or graph-based data structures, have...

chapter

PIR: PMaC's Idiom Recognizer

C Olschanowsky, A Snavely, M R Meswani, L Carrington

2010 39th International Conference on Parallel Processing Workshops > 189 - 196

2010 39th International Conference on Parallel Processing Workshops (ICPPW)

The speed of the memory subsystem often constrains the performance of large-scale parallel applications. Experts tune such applications to use hierarchical memory subsystems efficiently. Hardware accelerators, such as GPUs, can potentially improve memory performance beyond the capabilities of traditional hierarchical systems. However, the addition of such specialized hardware complicates code porting...

chapter

Parallelization of binary and real-coded genetic algorithms on GPU using CUDA

Ramnik Arora, Rupesh Tulshyan, Kalyanmoy Deb

IEEE Congress on Evolutionary Computation > 1 - 8

2010 IEEE Congress on Evolutionary Computation

Genetic Algorithms (GAs) are suitable for parallel computing since population members' fitness may be evaluated in parallel. Most past parallel GA studies have exploited this aspect, besides resorting to different algorithms, such as island, single-population master-slave, fine-grained and hybrid models. A GA involves a number of other operations which, if parallelized, may lead to better parallel GA...

chapter

Solving 2D Nonlinear Unsteady Convection-Diffusion Equations on Heterogenous Platforms with Multiple GPUs

Canqun Yang, Zhen Ge, Juan Chen, Feng Wang, et al.

2009 15th International Conference on Parallel and Distributed Systems > 961 - 966

2009 IEEE 15th International Conference on Parallel and Distributed Systems (ICPADS 2009)

Solving complex convection-diffusion equations is very important to many practical mathematical and physical problems. After finite difference discretization, most of the solution time is spent in sparse linear equation solvers. In this paper, our goal is to solve 2D Nonlinear Unsteady Convection-Diffusion Equations by accelerating an iterative algorithm named Jacobi-preconditioned...

Keywords:
ARRAYS
OPTIMIZATION
GPU
