Sreepathi Pai


Parallel triangle counting and k-truss identification using graph-centric methods

Chad Voegele, Yi-Shan Lu, Sreepathi Pai, Keshav Pingali

2017 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7

We describe CPU and GPU implementations of parallel triangle-counting and k-truss identification in the Galois and IrGL systems. Both systems are based on a graph-centric abstraction called the operator formulation of algorithms. Depending on the input graph, our implementations are two to three orders of magnitude faster than the reference implementations provided by the IEEE HPEC static graph challenge.


Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme

Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil

2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 33-42

Exploiting the performance potential of GPUs requires managing the data transfers to and from them efficiently, which is an error-prone and tedious task. In this paper, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to identify potential stale accesses and uses a...


Controlled Kernel Launch for Dynamic Parallelism in GPUs

Xulong Tang, Ashutosh Pattnaik, Huaipan Jiang, Onur Kayiran, et al.

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 649-660

Dynamic parallelism (DP) is a promising feature for GPUs, which allows on-demand spawning of kernels on the GPU without any CPU intervention. However, this feature has two major drawbacks. First, the launching of GPU kernels can incur significant performance penalties. Second, dynamically-generated kernels are not always able to efficiently utilize the GPU cores due to hardware-limits. To address...


Synchronization Trade-Offs in GPU Implementations of Graph Algorithms

Rashid Kaleem, Anand Venkat, Sreepathi Pai, Mary Hall, et al.

2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 514-523

Although there is an extensive literature on GPU implementations of graph algorithms, we do not yet have a clear understanding of how implementation choices impact performance. As a step towards this goal, we studied how the choice of synchronization mechanism affects the end-to-end performance of complex graph algorithms, using stochastic gradient descent (SGD) as an exemplar. We implemented seven...


Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels

Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil

2014 23rd International Conference on Parallel Architecture and Compilation (PACT), pp. 483-484

Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels concurrently. On these GPUs, the thread block scheduler (TBS) currently uses the FIFO policy to schedule thread blocks of concurrent kernels. We show that the FIFO policy leaves performance to chance, resulting in significant loss of performance and fairness. To improve performance and fairness, we propose use of the preemptive...

INFONA - science communication portal

Search results for: Sreepathi Pai

