Search results for: Mayank Daga


chapter

Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices

Mayank Daga, Joseph L. Greathouse

2015 IEEE 22nd International Conference on High Performance Computing (HiPC), pp. 64-74

Sparse matrix vector multiplication (SpMV) is an important linear algebra primitive. Recent research has focused on improving the performance of SpMV on GPUs when using compressed sparse row (CSR), the most frequently used matrix storage format on CPUs. Efficient CSR-based SpMV obviates the need for other GPU-specific storage formats, thereby saving runtime and storage overheads. However, existing...
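The load-balancing problem behind adaptive CSR kernels can be illustrated with a simplified row-grouping pass. In this sketch, the hypothetical helper `make_row_blocks` packs consecutive CSR rows into a block until a fixed nonzero budget (standing in for a GPU workgroup's on-chip memory capacity) would be exceeded, and a row too long for the budget gets a block of its own. The budget and the packing rule are assumptions for illustration. This excerpt does not describe the paper's exact algorithm.

```python
from typing import List


def make_row_blocks(row_ptr: List[int], nnz_budget: int) -> List[int]:
    """Return block boundaries (row indices) for a CSR matrix.

    Hypothetical illustration: consecutive rows are packed into one block
    while their combined nonzero count stays within nnz_budget; a row longer
    than the budget ends up alone in its own block.
    Block k covers rows boundaries[k] .. boundaries[k + 1] - 1.
    """
    num_rows = len(row_ptr) - 1
    boundaries = [0]
    block_start = 0
    for r in range(num_rows):
        row_nnz = row_ptr[r + 1] - row_ptr[r]
        block_nnz = row_ptr[r] - row_ptr[block_start]  # nonzeros already in the block
        if block_nnz + row_nnz > nnz_budget and r > block_start:
            # Row r would overflow the budget: close the block just before it.
            boundaries.append(r)
            block_start = r
    if boundaries[-1] != num_rows:
        boundaries.append(num_rows)
    return boundaries
```

For `row_ptr = [0, 2, 3, 7, 8, 20, 21]` (row lengths 2, 1, 4, 1, 12, 1) and a budget of 8, the boundaries are `[0, 4, 5, 6]`. Rows 0 to 3 share a block, the 12-nonzero row 4 sits alone, and row 5 forms the final block. Blocks of many short rows and blocks holding a single long row could then be handed to different kernel strategies.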

chapter

Exploring Parallel Programming Models for Heterogeneous Computing Systems

Mayank Daga, Zachary S. Tschirhart, Chip Freitag

2015 IEEE International Symposium on Workload Characterization (IISWC), pp. 98-107

Parallel systems that employ CPUs and GPUs as two heterogeneous computational units have become immensely popular due to their ability to maximize performance under restrictive thermal budgets. However, programming heterogeneous systems via traditional programming models like OpenCL or CUDA involves rewriting large portions of application code. They also lead to code that is not performance portable...

chapter

On the Performance, Energy, and Power of Data-Access Methods in Heterogeneous Computing Systems

Rubasri Kalidas, Mayank Daga, Konstantinos Krommydas, Wu-chun Feng

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), pp. 871-879

Graphics processing units (GPUs) have delivered promising speedups in data-parallel applications. A discrete GPU resides on the PCIe interface and has traditionally required data to be moved from the host memory to the GPU memory via PCIe. In certain applications, the overhead of these data transfers between memory spaces can nullify any performance gains achieved from faster computation on the GPU...
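The way transfer overhead can cancel a GPU speedup is simple arithmetic. The sketch below assumes that host-to-device copies are not overlapped with computation and that PCIe delivers a fixed effective bandwidth. The function names and the numbers in the usage note are hypothetical, not measurements from the paper.

```python
def offload_speedup(t_cpu: float, t_gpu_compute: float,
                    bytes_moved: float, pcie_bytes_per_s: float) -> float:
    """Speedup of running a kernel on a discrete GPU instead of the CPU,
    charging the PCIe copy time (assumed not overlapped with compute)."""
    t_transfer = bytes_moved / pcie_bytes_per_s
    return t_cpu / (t_gpu_compute + t_transfer)


def break_even_bytes(t_cpu: float, t_gpu_compute: float,
                     pcie_bytes_per_s: float) -> float:
    """Largest transfer volume for which the offload does not lose to the CPU."""
    return max(0.0, (t_cpu - t_gpu_compute) * pcie_bytes_per_s)
```

With `t_cpu` = 10 ms, `t_gpu_compute` = 1 ms, and an assumed 10 GB/s link, the kernel is 10x faster when no data moves. Copying 200 MB adds 20 ms, however, which drops the speedup to about 0.48, roughly 2x slower than staying on the CPU. Under these assumptions the break-even volume is about 90 MB.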

chapter

Efficient Sparse Matrix-Vector Multiplication on GPUs Using the CSR Storage Format

Joseph L. Greathouse, Mayank Daga

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 769-780

The performance of sparse matrix vector multiplication (SpMV) is important to computational scientists. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMV on graphics processing units (GPUs) has poor performance due to irregular memory access patterns, load imbalance, and reduced parallelism. This has led researchers to propose new storage...
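A serial CSR SpMV makes the problems named in the abstract visible. The sketch below computes y = Ax in plain Python, using the hypothetical names `row_ptr`, `col_idx`, and `vals` for the three CSR arrays. Its comments mark where a naive one-row-per-GPU-thread mapping would suffer from unequal row lengths and indirect accesses to x.

```python
from typing import List


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             vals: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for A stored in CSR form (row_ptr, col_idx, vals)."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Row lengths (row_ptr[row + 1] - row_ptr[row]) vary, so a
        # one-thread-per-row GPU mapping gives threads unequal work.
        for j in range(row_ptr[row], row_ptr[row + 1]):
            # x is gathered through col_idx: an indirect, data-dependent access.
            acc += vals[j] * x[col_idx[j]]
        y[row] = acc
    return y
```

For the 4x4 matrix with rows [10, 0, 0, 2], [0, 3, 0, 0], [0, 0, 0, 0], and [1, 0, 4, 5], the CSR arrays are `row_ptr = [0, 2, 3, 3, 6]`, `col_idx = [0, 3, 1, 0, 2, 3]`, and `vals = [10, 2, 3, 1, 4, 5]`. With `x = [1, 2, 3, 4]` the result is `[18, 6, 0, 33]`. The empty third row already shows how uneven per-row work can be.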

chapter

Efficient breadth-first search on a heterogeneous processor

Mayank Daga, Mark Nutter, Mitesh Meswani

2014 IEEE International Conference on Big Data (Big Data), pp. 373-382

Accelerating breadth-first search (BFS) can be a compelling value-add given its pervasive deployment. The current state-of-the-art hybrid BFS algorithm selects different traversal directions based on graph properties, thereby possessing heterogeneous characteristics. Related work has studied this heterogeneous BFS algorithm on homogeneous processors. In recent years, heterogeneous processors have...
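The direction switching described above can be sketched as a level-synchronous BFS. At each level it chooses between a top-down step, where the frontier scans its neighbors, and a bottom-up step, where each unvisited vertex searches for a parent in the frontier. The switching rule and the thresholds `ALPHA` and `BETA` follow a common heuristic from the direction-optimizing BFS literature and are assumptions here, not the parameters used in the paper. The graph is assumed undirected and stored as adjacency lists.

```python
from typing import List

ALPHA = 14  # assumed threshold for switching top-down -> bottom-up
BETA = 24   # assumed threshold for switching bottom-up -> top-down


def hybrid_bfs(adj: List[List[int]], source: int) -> List[int]:
    """Return the BFS depth of every vertex (-1 if unreachable), choosing a
    top-down or bottom-up step at each level with a simple heuristic."""
    n = len(adj)
    depth = [-1] * n
    depth[source] = 0
    frontier = [source]
    level = 0
    # Degree sum over unvisited vertices (each undirected edge counted from both ends).
    unexplored_edges = sum(len(nbrs) for nbrs in adj) - len(adj[source])
    bottom_up = False
    while frontier:
        frontier_edges = sum(len(adj[v]) for v in frontier)
        if not bottom_up and frontier_edges > unexplored_edges / ALPHA:
            bottom_up = True   # frontier is large relative to the remaining graph
        elif bottom_up and len(frontier) < n / BETA:
            bottom_up = False  # frontier has shrunk again
        next_frontier = []
        if bottom_up:
            in_frontier = set(frontier)
            for v in range(n):
                if depth[v] == -1:
                    # An unvisited vertex stops at its first parent in the frontier.
                    for u in adj[v]:
                        if u in in_frontier:
                            depth[v] = level + 1
                            next_frontier.append(v)
                            break
        else:
            for u in frontier:
                for v in adj[u]:
                    if depth[v] == -1:
                        depth[v] = level + 1
                        next_frontier.append(v)
        unexplored_edges -= sum(len(adj[v]) for v in next_frontier)
        frontier = next_frontier
        level += 1
    return depth
```

The bottom-up step tends to pay off when the frontier is large, because an unvisited vertex can stop at its first frontier neighbor instead of the frontier inspecting every one of its edges.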

chapter

Exploiting Coarse-Grained Parallelism in B+ Tree Searches on an APU

Mayank Daga, Mark Nutter

2012 SC Companion: High Performance Computing, Networking, Storage and Analysis (SCC), pp. 240-247

B+ tree-structured index searches are one of the fundamental database operations, and hence accelerating them is essential. GPUs provide a compelling mix of performance per watt and performance per dollar, and thus are an attractive platform for accelerating B+ tree searches. However, tree search on discrete GPUs presents significant challenges for acceleration due to (i) the irregular representation...
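The independence of B+ tree queries, which coarse-grained parallelism can exploit, shows up clearly in a small serial sketch. The node classes `Inner` and `Leaf` and the function `batch_search` are hypothetical illustrations, not the paper's data layout. A query equal to a separator key is routed to the right-hand child, a common B+ tree convention assumed here.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]      # sorted keys stored in this leaf
    values: List[str]    # values[i] belongs to keys[i]


@dataclass
class Inner:
    keys: List[int]          # sorted separator keys
    children: List["Node"]   # len(children) == len(keys) + 1


Node = Union[Leaf, Inner]


def search(root: Node, key: int) -> Optional[str]:
    """Walk from the root to a leaf and return the value for key, or None."""
    node = root
    while isinstance(node, Inner):
        # Keys equal to a separator live in the right-hand subtree.
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None


def batch_search(root: Node, queries: List[int]) -> List[Optional[str]]:
    # Each query traverses the tree on its own; no query depends on another,
    # so a batch could in principle be split across GPU work-items (an assumption).
    return [search(root, q) for q in queries]
```

For a root `Inner([5, 9], ...)` over leaves holding keys {1, 3}, {5, 7}, and {9, 11}, the queries `[3, 5, 9, 4, 11]` resolve independently to the values for 3, 5, 9, and 11, with `None` for the missing key 4.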

chapter

Architecture-Aware Mapping and Optimization on a 1600-Core GPU

Mayank Daga, Thomas Scogland, Wu-chun Feng

2011 IEEE 17th International Conference on Parallel and Distributed Systems (ICPADS), pp. 316-323

The graphics processing unit (GPU) continues to make inroads as a computational accelerator for high-performance computing (HPC). However, despite its increasing popularity, mapping and optimizing GPU code remains a difficult task: it is a multi-dimensional problem that requires deep technical knowledge of GPU architecture. Although substantial literature exists on how to map and optimize GPU performance...

chapter

On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

Mayank Daga, Ashwin M. Aji, Wu-chun Feng

2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC), pp. 141-149

The graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided out on PCIe as a discrete device, the performance of GPU applications can be bottlenecked by data transfers between the CPU and GPU over PCIe. Emerging heterogeneous computing architectures that "fuse" the functionality of the CPU and GPU, e.g., AMD...