Computed Tomography (CT) is an imaging method based on X-rays that obtains cross-sectional images of an object. It is widely used in several areas, such as medicine, archaeology, and materials science. Tomographic reconstruction techniques use the projections of images from multiple directions. There are several algorithms for this purpose, but they can be classified according to their reconstruction...
GPU-based clusters are widely chosen for accelerating a variety of scientific applications in high-end cloud environments. With their growing popularity, there is a necessity for improving the system throughput and decreasing the turnaround time for co-executing applications on the same GPU device. However, resource contention among multiple applications on a multi-tasked GPU leads to the performance...
The presented paper explains a general-purpose approach to parallel pixel processing on the GPU. It presents the essential dataset structuring, correct type assignment, and kernel configuration for the CUDA application interface. The paper also explains data movement and optimal computation saturation. Transfers are also analyzed in correlation with the computation, especially for the embarrassingly parallel problem...
Performance modeling plays an important role in optimal hardware design and optimized application implementation. This paper presents a very low-overhead performance model, called VLAG, to approximate the data localities exploited by GPU kernels. VLAG receives source-code-level information to estimate per-memory-access-instruction, per-data-array, and per-kernel localities within GPU kernels. VLAG...
GPUs continue to increase the number of compute resources with each new generation. Many data-parallel applications have been re-engineered to leverage the thousands of cores on the GPU. But not every kernel can fully utilize all the resources available. Many applications contain multiple kernels that could potentially be run concurrently. To better utilize the massive resources on the GPU, device...
Fault-tolerance is becoming increasingly important as we enter the era of exascale computing. Increasing the number of cores results in a smaller mean time between failures, and consequently, higher probability of errors. Among the different software fault tolerance techniques, checkpoint/restart is the most commonly used method in supercomputers, the de-facto standard for large-scale systems. Although...
Due to its energy efficiency, heterogeneous computing is gaining more and more attention. Since FPGA implementations are time-consuming, high-level synthesis (HLS) is used to close the productivity gap. OpenCL has become accepted as a good programming model for HLS due to its portability, good capability for design verification, and rich instruction set. This work implements different optimization strategies...
We propose a design for a fine-grained lock-based skiplist optimized for Graphics Processing Units (GPUs). While GPUs are often used to accelerate streaming parallel computations, it remains a significant challenge to efficiently offload concurrent computations with more complicated data-irregular access and fine-grained synchronization. Natural building blocks for such computations would be concurrent...
The performance of commodity video-gaming embedded devices (consoles, graphics cards, tablets, etc.) has been advancing at a rapid pace owing to strong consumer demand and stiff market competition. Gaming devices are currently amongst the most powerful and cost-effective computational technologies available in quantity. In this article, we evaluate a sample of current generation video-gaming devices...
This paper deals with the evaluation of the FPGA resurgence for hardware acceleration applied to computed tomography, specifically the back-projection operator used in iterative reconstruction algorithms. We focus our attention on the tools developed by FPGA manufacturers, in particular the Intel FPGA SDK for OpenCL, which promises a new level of hardware abstraction from the developer's perspective, allowing a...
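The back-projection operator mentioned in this abstract can be sketched in a few lines of plain Python. This is an unfiltered, nearest-neighbor, parallel-beam version under simplifying assumptions; the function and variable names are illustrative, not taken from the paper or the FPGA pipeline it describes:

```python
import math

def back_project(sinogram, angles, size):
    """Unfiltered parallel-beam back-projection: every image pixel
    accumulates, for each projection angle, the detector sample it
    maps to (nearest-neighbor sampling; a minimal model only)."""
    c = (size - 1) / 2.0          # image centre
    n_det = len(sinogram[0])
    d = (n_det - 1) / 2.0         # detector centre
    img = [[0.0] * size for _ in range(size)]
    for proj, theta in zip(sinogram, angles):
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for y in range(size):
            for x in range(size):
                # Rotated coordinate of pixel (x, y) along the detector.
                t = (x - c) * cos_t + (y - c) * sin_t
                k = int(round(t + d))
                if 0 <= k < n_det:
                    img[y][x] += proj[k]
    return img
```

The three nested loops over angles and pixels are independent per pixel, which is why the operator parallelizes well on both GPUs and FPGA pipelines.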
Nowadays, there are many embedded systems with different architectures that have incorporated GPUs. However, it is difficult to develop CPU-GPU embedded systems using component-based development (CBD), since existing CBD approaches have no support for GPU development. In this context, when targeting a particular CPU-GPU platform, the component developer is forced to construct hardware-specific components,...
We present a set of new batched CUDA kernels for the LU factorization of a large collection of independent problems of different sizes, and the subsequent triangular solves. All kernels heavily exploit the registers of the graphics processing unit (GPU) in order to deliver high performance for small problems. The development of these kernels is motivated by the need to tackle this embarrassingly parallel...
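As a rough illustration of what one problem in such a batch computes, here is a minimal unblocked, unpivoted LU factorization applied across a batch in plain Python. The real CUDA kernels hold each small matrix in registers and typically apply partial pivoting; all names here are illustrative, not from the paper:

```python
def lu_inplace(A):
    """Unblocked, unpivoted LU factorization of a small dense matrix
    (list of row lists); L (unit diagonal) and U overwrite A."""
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]              # multiplier l_ik
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]  # update trailing submatrix
    return A

def batched_lu(batch):
    """Factor a batch of independent small matrices. On a GPU each
    problem would get its own thread block; the Python loop stands
    in for that parallelism."""
    return [lu_inplace(A) for A in batch]
```

Because the problems are independent, the batch dimension maps directly onto GPU thread blocks, which is the "embarrassingly parallel" structure the abstract refers to.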
Sparse general matrix-matrix multiplication (SpGEMM) is one of the key kernels of preconditioners, such as the algebraic multigrid method, and of graph algorithms. However, the performance of SpGEMM is quite low on modern processors due to random memory accesses to both the input and output matrices. The number and the pattern of non-zero elements in the output matrix, important for achieving locality,...
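The row-wise formulation behind most SpGEMM implementations (Gustavson's algorithm) can be sketched in plain Python; the dict-of-dicts matrix layout and the names are illustrative only, not the paper's data structures:

```python
def spgemm(A, B):
    """Row-wise SpGEMM (Gustavson's algorithm): C = A @ B, where each
    sparse matrix maps row index -> {col index: value}.

    For every non-zero A[i][k], row k of B is scattered into an
    accumulator for row i of C. The irregular accesses to B's rows
    and to the accumulator are exactly the random memory accesses
    that hurt SpGEMM performance on cache-based processors."""
    C = {}
    for i, row in A.items():
        acc = {}
        for k, a_ik in row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C
```

Note that the size of `acc` is not known until the multiplication finishes, which is why the number and pattern of output non-zeros matter so much for memory allocation and locality.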
Adaptive Dynamic Programming (ADP) with a critic-actor architecture is a useful way to achieve online learning control. The previously developed Gaussian-Kernel Adaptive Dynamic Programming (GK-ADP) algorithm has a kind of two-phase iteration, which not only approximates the value function but also optimizes the hyper-parameters simultaneously. However, just as most iterative algorithms are applied...
The coevolutionary particle swarm optimization (CPSO) algorithm has been widely investigated and applied in the real world. When tackling large-scale and complex real-time optimization problems, the running time of the CPSO algorithm is a barrier. In this paper, the Graphics Processing Unit (GPU) is introduced to provide speedup in order to meet real-time requirements. The CPSO algorithm has been implemented...
In this paper, we use a restoration method that rapidly restores blurred images using local patches, proposed by Senshiki et al. [1]. The computation time is significantly reduced by that method, but it is not yet practical. Therefore, we propose to accelerate it by implementing the image restoration processing on the GPU. By measuring the processing time of the image restoration, we show the superiority...
CNNs (Convolutional Neural Networks) have demonstrated superior results in a wide range of applications. However, the time-consuming convolution operations required by CNNs pose great challenges to designers. GPGPUs (General-Purpose Graphics Processing Units) have been widely used to exploit the massive parallelism of convolution operations. This paper proposes a software-based loop-unrolling technique...
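Loop unrolling itself is easy to illustrate. Below is a plain-Python sketch of a 2D convolution before and after fully unrolling the inner kernel loops for a fixed 3x3 filter; this is a generic illustration of the transformation, not the paper's GPGPU implementation, and all names are made up:

```python
def conv2d_naive(img, kernel):
    """Plain 2D 'valid' convolution (correlation) with nested loops."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            s = 0.0
            for u in range(kh):
                for v in range(kw):
                    s += img[i + u][j + v] * kernel[u][v]
            out[i][j] = s
    return out

def conv2d_unrolled3x3(img, k):
    """Same computation with the two inner loops fully unrolled for a
    fixed 3x3 kernel: the transformation a loop-unrolling pass applies,
    so each output element is computed by straight-line code with no
    inner-loop overhead."""
    oh, ow = len(img) - 2, len(img[0]) - 2
    out = [[0.0] * ow for _ in range(oh)]
    (k00, k01, k02), (k10, k11, k12), (k20, k21, k22) = k
    for i in range(oh):
        r0, r1, r2 = img[i], img[i + 1], img[i + 2]
        for j in range(ow):
            out[i][j] = (r0[j] * k00 + r0[j+1] * k01 + r0[j+2] * k02 +
                         r1[j] * k10 + r1[j+1] * k11 + r1[j+2] * k12 +
                         r2[j] * k20 + r2[j+1] * k21 + r2[j+2] * k22)
    return out
```

On a GPU, the unrolled form keeps the filter weights in registers and exposes independent multiply-adds that the hardware can schedule freely.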
This study presents a new algorithm and corresponding statistical package for estimating optimal bandwidth for a nonparametric kernel regression. Kernel regression is widely used in Economics, Statistics, and other fields. The formula for the optimal "bandwidth," or smoothing parameter, is well-known. In practice, however, the computational demands of estimating the optimal bandwidth have...
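As one concrete instance of such a well-known bandwidth formula, Silverman's rule of thumb for a Gaussian kernel, h = 1.06 * sigma * n^(-1/5), fits in a few lines alongside a Nadaraya-Watson kernel regression estimate. This rule comes from density estimation and is used here only for illustration; the paper's estimator and package may use a different criterion, and all names are illustrative:

```python
import math

def silverman_bandwidth(xs):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel:
    h = 1.06 * sigma * n**(-1/5), with sigma the sample std dev."""
    n = len(xs)
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * sigma * n ** (-0.2)

def nadaraya_watson(xs, ys, x0, h):
    """Nadaraya-Watson kernel regression estimate at x0: a weighted
    average of the ys, with Gaussian-kernel weights of bandwidth h."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
```

The computational burden the abstract alludes to comes from criteria such as cross-validation, which re-evaluate the regression at every sample point for many candidate bandwidths, an O(n^2) cost per candidate that parallel implementations target.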
Because sparse matrix-vector multiplication (SpMV) is an important and widely used computational kernel in many real-world applications, it behooves us to accelerate SpMV on modern multi- and many-core architectures. While many storage formats have been developed to facilitate SpMV operations, the compressed sparse row (CSR) format is still the most popular and general storage format. However, parallelizing...
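The CSR layout and the SpMV kernel over it can be shown with a minimal sketch in plain Python; the array names and the example matrix are illustrative, not from the paper:

```python
# CSR stores a sparse matrix as three arrays:
#   values  - the non-zero entries, row by row
#   col_idx - the column index of each non-zero entry
#   row_ptr - offset of each row's first entry in `values`
#             (length = number of rows + 1)

def spmv_csr(row_ptr, col_idx, values, x):
    """Return y = A @ x for a CSR-format matrix A."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Row i's non-zeros live in values[row_ptr[i]:row_ptr[i+1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A = [[10,  0,  0],
#      [ 0, 20, 30],
#      [40,  0, 50]]
row_ptr = [0, 1, 3, 5]
col_idx = [0, 1, 2, 0, 2]
values  = [10.0, 20.0, 30.0, 40.0, 50.0]
print(spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [10.0, 130.0, 190.0]
```

On a many-core architecture, the outer loop over rows is what gets parallelized; the imbalance between short and long rows is precisely the parallelization difficulty the abstract alludes to.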
Background subtraction is a major step in many image processing applications and is widely applied in video surveillance. The key measures of this method are accuracy and processing time, so we mainly focused on these two challenges. We parallelized the Two-Layered CodeBook Model on the Graphics Processing Unit (GPU) to increase the processing speed and the accuracy of the...