Onur Kayiran

chapter

There and back again: Optimizing the interconnect in networks of memory cubes

Matthew Poremba, Itir Akgun, Jieming Yin, Onur Kayiran, more

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 678 - 690

High-performance computing, enterprise, and datacenter servers are driving demands for higher total memory capacity as well as memory performance. Memory “cubes” with high per-package capacity (from 3D integration) along with high-speed point-to-point interconnects provide a scalable memory system architecture with the potential to deliver both capacity and performance. Multiple such cubes connected...

chapter

Controlled Kernel Launch for Dynamic Parallelism in GPUs

Xulong Tang, Ashutosh Pattnaik, Huaipan Jiang, Onur Kayiran, more

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) > 649 - 660

Dynamic parallelism (DP) is a promising feature for GPUs, which allows on-demand spawning of kernels on the GPU without any CPU intervention. However, this feature has two major drawbacks. First, the launching of GPU kernels can incur significant performance penalties. Second, dynamically-generated kernels are not always able to efficiently utilize the GPU cores due to hardware-limits. To address...

chapter

Design and Analysis of an APU for Exascale Computing

Thiruvengadam Vijayaraghavan, Yasuko Eckert, Gabriel H. Loh, Michael J. Schulte, more

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) > 85 - 96

The challenges to push computing to exaflop levels are difficult given desired targets for memory capacity, memory bandwidth, power efficiency, reliability, and cost. This paper presents a vision for an architecture that can be used to construct exascale systems. We describe a conceptual Exascale Node Architecture (ENA), which is the computational building block for an exascale supercomputer. The...

chapter

OSCAR: Orchestrating STT-RAM cache traffic for heterogeneous CPU-GPU architectures

Jia Zhan, Onur Kayiran, Gabriel H. Loh, Chita R. Das, more

2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 1 - 13

As we integrate data-parallel GPUs with general-purpose CPUs on a single chip, the enormous cache traffic generated by GPUs will not only exhaust the limited cache capacity, but also severely interfere with CPU requests. Such heterogeneous multicores pose significant challenges to the design of shared last-level cache (LLC). This problem can be mitigated by replacing SRAM LLC with emerging non-volatile...

chapter

Efficient synthetic traffic models for large, complex SoCs

Jieming Yin, Onur Kayiran, Matthew Poremba, Natalie Enright Jerger, more

2016 IEEE International Symposium on High Performance Computer Architecture (HPCA) > 297 - 308

The interconnect or network on chip (NoC) is an increasingly important component in processors. As systems scale up in size and functionality, the ability to efficiently model larger and more complex NoCs becomes increasingly important to the design and evaluation of such systems. Recent work proposed the "SynFull" methodology that performs statistical analysis of a workload's NoC traffic...

chapter

Scheduling techniques for GPU architectures with processing-in-memory capabilities

Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, more

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) > 31 - 44

Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy penalties of data transfers from/to main memory. Graphics Processing Unit (GPU) architectures and applications, where main memory bandwidth is a critical bottleneck, can benefit from the use of PIM. To this end, an application should be properly...

chapter

μC-States: Fine-grained GPU datapath power management

Onur Kayiran, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, more

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) > 17 - 30

To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing core count, architects are recently adopting a scale-up approach: the peak throughput and individual capabilities of the GPU cores are increasing rapidly. This big-core trend in GPUs leads to various challenges, including higher static power consumption and lower and imbalanced utilization of the datapath components...

chapter

Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance

Rachata Ausavarungnirun, Saugata Ghose, Onur Kayiran, Gabriel H. Loh, more

2015 International Conference on Parallel Architecture and Compilation (PACT) > 25 - 38

In a GPU, all threads within a warp execute the same instruction in lockstep. For a memory instruction, this can lead to memory divergence: the memory requests for some threads are serviced early, while the remaining requests incur long latencies. This divergence stalls the warp, as it cannot execute the next instruction until all requests from the current instruction complete. In this work, we make...

chapter

Managing GPU Concurrency in Heterogeneous Architectures

Onur Kayiran, Nachiappan Chidambaram Nachiappan, Adwait Jog, Rachata Ausavarungnirun, more

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 114 - 126

Heterogeneous architectures consisting of general-purpose CPUs and throughput-optimized GPUs are projected to be the dominant computing platforms for many classes of applications. The design of such systems is more complex than that of homogeneous architectures because maximizing resource utilization while minimizing shared resource interference between CPU and GPU applications is difficult. We show...

chapter

Neither more nor less: Optimizing thread-level parallelism for GPGPUs

Onur Kayiran, Adwait Jog, Mahmut T. Kandemir, Chita R. Das

2013 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT) > 157 - 166

General-purpose graphics processing units (GPGPUs) are at their best in accelerating computation by exploiting abundant thread-level parallelism (TLP) offered by many classes of HPC applications. To facilitate such high TLP, emerging programming models like CUDA and OpenCL allow programmers to create work abstractions in terms of smaller work units, called cooperative thread arrays (CTAs). CTAs are...

INFONA - science communication portal

Search results for: Onur Kayiran
