Search results

chapter

Scalpel: Customizing DNN pruning to the underlying hardware parallelism

Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 548 - 560

As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model size and computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network...

chapter

Automated Generation of HDL Implementations of Dadda and Wallace Tree Multipliers

Lucas Gaia de Castro, Henrique Seiti Ogawa, Bruno de Carvalho Albertini

2017 VII Brazilian Symposium on Computing Systems Engineering (SBESC) > 17 - 22

Convolutional Neural Networks are being studied to provide features such as real-time image recognition. One of the key operations to support hardware implementations of this type of network is multiplication. Despite the high number of operations required by Convolutional Neural Networks, they became feasible in recent years due to the high availability of computing power, present on devices such as...

chapter

Automatic Partitioning of Stencil Computations on Heterogeneous Systems

Alyson D. Pereira, Rodrigo C.O. Rocha, Luiz Ramos, Marcio Castro, et al.

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 43 - 48

The stencil pattern is important in many scientific and engineering domains, spurring great interest from researchers and industry. In recent years, various optimizations have been proposed for parallel stencil applications running on GPUs. However, most of the runtime systems that execute those applications often fail to fully utilize the parallelism of modern heterogeneous systems. In this paper,...

chapter

Accelerated solution of stiffness matrix for isoparametric elements based on CUDA

Hu Binxing, Li Xinguo, Qiao Hao, Li Zenghao

2017 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC) > 1 - 4

High-precision structural results with the shortest time consumption are expected when methods are introduced to solve FEM (finite element method) problems. Computing the stiffness matrix assembled from isoparametric elements and solving the assembled stiffness matrix are the most time-consuming steps. In previous serial algorithms, there is always a time limitation for some applications and it is hard to achieve...

chapter

In situ video encoding of floating-point volume data using special-purpose hardware for a posteriori rendering and analysis

Nick Leaf, Bob Miller, Kwan-Liu Ma

2017 IEEE 7th Symposium on Large Data Analysis and Visualization (LDAV) > 64 - 73

Scientific simulations typically store only a small fraction of computed timesteps due to storage and I/O bandwidth limitations. Previous work has demonstrated the compressibility of floating-point volume data, but such compression often comes with a tradeoff between computational complexity and the achievable compression ratio. This work demonstrates the use of special-purpose video encoding hardware...

chapter

Performance characterization, prediction, and optimization for heterogeneous systems with multi-level memory interference

Shin-Ying Lee, Carole-Jean Wu

2017 IEEE International Symposium on Workload Characterization (IISWC) > 43 - 53

Modern computer systems are accelerator-rich, equipped with many types of hardware accelerators to speed up computation. For example, graphics processing units (GPUs) are a type of accelerator widely employed to accelerate parallel workloads. To make good use of different accelerators for better execution time speedup or reduced total energy consumption, many scheduling algorithms...

chapter

MLMS: Mini learning management system for schools without internet connection

Manuel J. Ibarra, Carlos Huaraca, Wilfredo Soto, Carmen Palomino

2017 Twelfth Latin American Conference on Learning Technologies (LACLO) > 1 - 7

More than 64% of schools in Apurimac, Peru are located in rural areas; unlike schools in urban areas, they lack an Internet connection and therefore have no access to virtual learning and cannot use educational resources. This paper describes the design, development and testing of the Mini Learning Management System for schools without internet connection to improve the availability...

chapter

Parallel-implemented message passing algorithm for SCMA decoder based on GPGPU

Yunfeng Qi, Gang Wu, Su Hu, Yuan Gao

2017 9th International Conference on Wireless Communications and Signal Processing (WCSP) > 1 - 6

The current multi-user detection scheme for sparse code multiple access (SCMA) is the iterative message passing algorithm (MPA), in which messages are updated in parallel. To take full advantage of MPA's parallelism, this letter proposes a hardware implementation strategy of max-log MPA decoder used in SCMA systems with soft baseband, which is based on general-purpose computing...

chapter

HeteroSync: A benchmark suite for fine-grained synchronization on tightly coupled GPUs

Matthew D. Sinclair, Johnathan Alsop, Sarita V. Adve

2017 IEEE International Symposium on Workload Characterization (IISWC) > 239 - 249

Traditionally, GPUs focused on streaming, data-parallel applications, with little data reuse or sharing and coarse-grained synchronization. However, the rise of general-purpose GPU (GPGPU) computing has made GPUs desirable for applications with more general sharing patterns and fine-grained synchronization, especially for recent GPUs that have a unified address space and coherent caches. Prior work...

chapter

FPGA acceleration of multilevel ORB feature extraction for computer vision

Josh Weberruss, Lindsay Kleeman, David Boland, Tom Drummond

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 8

In this paper, we present the first multilevel implementation of the Harris-Stephens corner detector and the ORB feature extractor running on FPGA hardware, for computer vision and robotics applications. ORB is a fundamental component of many robotics applications, and requires significant computation. The design has been validated both in behavioural simulation and in implementation on an Arria V...

chapter

ConVGPU: GPU Management Middleware in Container Based Virtualized Environment

Daeyoun Kang, Tae Joon Jun, Dohyeun Kim, Jaewook Kim, et al.

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 301 - 309

Nowadays, the Graphics Processing Unit (GPU) is essential for general-purpose high-performance computing because of its dominant performance in parallel computing compared to that of the CPU. There have been many successful trials of GPU use in virtualized environments. In particular, NVIDIA Docker provides a highly practical way to bring the GPU into the container-based virtualized environment. However, most...

chapter

Comparison of GPU and FPGA based hardware platforms for ultrasonic flaw detection using support vector machines

Kushal Virupakshappa, Erdal Oruklu, Yiyue Jiang, Yu Yuan

2017 IEEE International Ultrasonics Symposium (IUS) > 1

Our earlier work on support vector machines (SVM) and ultrasonic flaw detection algorithms demonstrated i) highly accurate classifier performance and ii) the feasibility of the algorithm for real-time implementation on low-cost embedded systems with graphics processing units (GPU) and CUDA library (a parallel computing platform and programming model) support. This work extends the implementation...

chapter

Scientific computing using consumer video-gaming embedded devices

Glenn Volkema, Gaurav Khanna

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 8

The performance of commodity video-gaming embedded devices (consoles, graphics cards, tablets, etc.) has been advancing at a rapid pace owing to strong consumer demand and stiff market competition. Gaming devices are currently amongst the most powerful and cost-effective computational technologies available in quantity. In this article, we evaluate a sample of current generation video-gaming devices...

chapter

Sparse matrix assembly on the GPU through multiplication patterns

Rhaleb Zayer, Markus Steinberger, Hans-Peter Seidel

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 8

The numerical treatment of variational problems gives rise to large sparse matrices, which are typically assembled by coalescing elementary contributions. As the explicit matrix form is required by numerical solvers, the assembly step can be a potential bottleneck, especially in implicit and time dependent settings where considerable updates are needed. On standard HPC platforms, this process can...

chapter

Ultrasonic imaging research platform with GPU-based software focusing

Mok Kun Jeong, Sung Jae Kwon, Chun Duck Park, Baek Sop Kim, et al.

2017 IEEE International Ultrasonics Symposium (IUS) > 1 - 4

We have designed and implemented an ultrasonic imaging research platform that performs all signal processing, including beamforming and image processing, using software on a GPU. Operating software was developed on a PC that can control RF data acquisition hardware to accommodate ultrasound images of various formats. Beamforming methods that include conventional scan line-based imaging, scan line-based...

chapter

Ultrasonic imaging research platform with GPU based software focusing

Mok Kun Jeong, Sung Jae Kwon, Chun Duck Park, Baek Sop Kim, et al.

2017 IEEE International Ultrasonics Symposium (IUS) > 1

We have designed and implemented an ultrasonic imaging research platform that performs all signal processing, including beamforming, using software on a GPU. A software-based approach on the GPU is expected to reduce hardware complexity and offer the advantages of flexibility and rapid implementation even if the requirements for ultrasound imaging applications change in the future. An operating...

chapter

Exploiting half precision arithmetic in Nvidia GPUs

Nhut-Minh Ho, Weng-Fai Wong

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

With the growing importance of deep learning and energy-saving approximate computing, half precision floating point arithmetic (FP16) is fast gaining popularity. Nvidia's recent Pascal architecture was the first GPU that offered FP16 support. However, when actual products were shipped, programmers soon realized that a naïve replacement of single precision (FP32) code with half precision led to disappointing...

chapter

POSTER: BACM: Barrier-Aware Cache Management for Irregular Memory-Intensive GPGPU Workloads

Yuxi Liu, Xia Zhao, Zhibin Yu, Zhenlin Wang, et al.

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 140 - 141

General-purpose workloads running on modern graphics processing units (GPGPUs) rely on hardware-based barriers to synchronize warps within a thread block (TB). However, imbalance may exist before reaching a barrier if a GPGPU workload contains irregular memory accesses, i.e., some warps may be critical while others may not. Ideally, cache space should be reserved for the critical warps. Unfortunately,...

chapter

An Ultra Low-Power Hardware Accelerator for Acoustic Scoring in Speech Recognition

Hamid Tabani, Jose-Maria Arnau, Jordi Tubella, Antonio Gonzalez

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 41 - 52

Accurate, real-time Automatic Speech Recognition (ASR) comes at a high energy cost, so accuracy often has to be sacrificed in order to fit the strict power constraints of mobile systems. However, accuracy is extremely important for the end-user, and today's systems are still unsatisfactory for many applications. The most critical component of an ASR system is the acoustic scoring, as it has a large...

chapter

POSTER: DaQueue: A Data-Aware Work-Queue Design for GPGPUs

Yashuai Lu, Libo Huang, Li Shen

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 142 - 143

A work-queue is an effective approach for mapping irregular-parallel workloads to GPGPUs. It can improve the utilization of SIMD units by processing only useful work that is dynamically generated during execution. As current GPGPUs lack the necessary support for work-queues, a software-based work-queue implementation often suffers from memory contention and load balancing issues. We present a novel...

INFONA - science communication portal
