The ForkJoin framework is a widely used parallel programming framework upon which both core concurrency libraries and real-world applications are built. Beneath its simple and user-friendly APIs, ForkJoin is a sophisticated managed parallel runtime unfamiliar to many application programmers: at its core is a work-stealing scheduler that handles fine-grained tasks and sustains the pressure from automatic...
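The fork/join pattern behind this abstract can be illustrated with a minimal sketch: a recursive range sum split into subtasks that the work-stealing pool distributes across worker threads. The class and threshold below are illustrative choices, not taken from the paper.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Minimal fork/join sketch: recursively split a range sum into subtasks
// that the work-stealing scheduler distributes across worker threads.
public class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this, sum sequentially
    private final long[] data;
    private final int lo, hi;

    public RangeSum(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {          // small enough: compute directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        RangeSum left = new RangeSum(data, lo, mid);
        RangeSum right = new RangeSum(data, mid, hi);
        left.fork();                         // schedule left half asynchronously
        long rightSum = right.compute();     // compute right half in this thread
        return left.join() + rightSum;       // wait for the forked (possibly stolen) half
    }

    public static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new RangeSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(RangeSum.parallelSum(data)); // sum of 1..100000
    }
}
```

Idle workers steal the forked subtasks from busy workers' deques, which is the fine-grained scheduling behavior the abstract refers to.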
The rapidly growing design complexity has become a big obstacle and dramatically increased the time required for SystemC simulation. In this case study, we exploit different levels of parallelism, including thread- and data-level parallelism, to accelerate the simulation of a Bitcoin miner model in SystemC. Our experiments are performed on two multi-core processors and one many-core Intel Xeon...
Markov Chain Monte Carlo methods provide a tool for tackling high-dimensional problems. With many-core systems readily available today, it is no surprise that leveraging parallelism in these samplers has been a subject of recent research. The focus has been on solutions for shared-memory architectures; however, these perform poorly in a distributed-memory environment. This paper introduces a fully...
State Machine Replication (SMR) is a well-known technique to implement fault-tolerant systems. In SMR, servers are replicated and client requests are deterministically executed in the same order by all replicas. To improve performance in multi-processor systems, some approaches have proposed to parallelize the execution of non-conflicting requests. Such approaches perform remarkably well in workloads...
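The core SMR invariant the abstract describes can be sketched in a few lines: replicas that deterministically apply the same ordered request log converge to the same state. The `Replica` class and the "key:delta" request encoding below are hypothetical illustrations, not the paper's system.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical SMR sketch: replicas deterministically execute the same
// ordered request log, so they all end up in identical states.
public class Replica {
    private final Map<String, Integer> state = new HashMap<>();

    // A request is modeled as "key:delta"; execution is deterministic.
    public void apply(String request) {
        String[] parts = request.split(":");
        state.merge(parts[0], Integer.parseInt(parts[1]), Integer::sum);
    }

    public Map<String, Integer> state() {
        return state;
    }

    public static void main(String[] args) {
        List<String> orderedLog = List.of("x:1", "y:2", "x:3"); // agreed total order
        Replica r1 = new Replica();
        Replica r2 = new Replica();
        for (String req : orderedLog) { r1.apply(req); r2.apply(req); }
        System.out.println(r1.state().equals(r2.state())); // identical state
    }
}
```

The parallel variants the abstract mentions relax this strict ordering only for requests that touch disjoint state (here, different keys), which preserves the convergence guarantee.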
This paper presents the results of an investigation of a developed synchronization system based on an optimal receiver under the minimum-average-risk criterion. This receiver is intended to detect the reflected radar pulse under conditions of Gaussian and Johnson noise. In a digital experiment we used an M-sequence with single-pulse phase-shift keying as the reference signal. The use of the developed self-tuning system...
Transactional Memory (TM) promises both to provide a scalable mechanism for synchronization in concurrent programs, and to offer ease-of-use benefits to programmers. The most straightforward use of TM in real-world programs is in the form of Transactional Lock Elision (TLE). In TLE, critical sections are attempted as transactions, with a fall-back to a lock if conflicts manifest. Thus TLE expects...
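The TLE pattern the abstract describes (optimistic attempt first, lock fallback on conflict) has a software analogue in Java's `StampedLock`. The sketch below is that analogy, not hardware lock elision: readers try a lock-free optimistic pass and acquire the real lock only when a concurrent writer invalidates the attempt.

```java
import java.util.concurrent.locks.StampedLock;

// Software analogue of the TLE pattern (hypothetical sketch, not hardware
// elision): try an optimistic, lock-free attempt first, and fall back to
// the pessimistic lock only when a conflict manifests.
public class ElidedCounter {
    private final StampedLock lock = new StampedLock();
    private long value = 0;

    public void increment() {
        long stamp = lock.writeLock();         // writers always take the lock
        try { value++; } finally { lock.unlockWrite(stamp); }
    }

    public long read() {
        long stamp = lock.tryOptimisticRead(); // optimistic attempt, no lock held
        long v = value;
        if (!lock.validate(stamp)) {           // a writer intervened: conflict
            stamp = lock.readLock();           // fall back to the real lock
            try { v = value; } finally { lock.unlockRead(stamp); }
        }
        return v;
    }

    public static void main(String[] args) {
        ElidedCounter c = new ElidedCounter();
        for (int i = 0; i < 5; i++) c.increment();
        System.out.println(c.read());
    }
}
```

As with TLE, the optimistic path pays nothing for synchronization in the common, conflict-free case; the fallback preserves correctness when conflicts do occur.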
In this paper, we present an optimized framework that can efficiently perform massive spatial queries on current GPUs. To let the widely adopted filter-and-verify paradigm benefit from GPUs, the skewed workloads are first associated with certain cells in a scaled spatial grid, so that the subsequent range-verification cost against the massive spatial objects can be significantly reduced. Particularly...
The concept of complex event processing (CEP) and complex-event-aware services have been extensively studied to retrieve relevant information from massive amounts of real-time streaming events. In mobile environments, the Mobility-aware CEP (MCEP) system was proposed to address the synchronization problem between different query ranges and MCEP operators. We noticed that MCEP systems lack the...
In this paper, we provide a comparison of language features and runtime systems of commonly used threading parallel programming models for high performance computing, including OpenMP, Intel Cilk Plus, Intel TBB, OpenACC, Nvidia CUDA, OpenCL, C++11 and PThreads. We then report our performance comparison of OpenMP, Cilk Plus and C++11 for data and task parallelism on CPU using benchmarks. The results show...
The tasking model of OpenMP 4.0 supports both nesting and the definition of dependences between sibling tasks. A natural way to parallelize many codes with tasks is to first taskify the high-level functions and then to further refine these tasks with additional subtasks. However, this top-down approach has some drawbacks since combining nesting with dependencies usually requires additional measures...
Many applications have irregular behavior — e.g., input-dependent solvers, irregular memory accesses, or unbiased branches — that cannot be captured using today's automated performance modeling techniques. We describe new hierarchical critical path analyses for the Palm model generation tool. To obtain a good tradeoff between model accuracy, generality, and generation cost, we combine static and dynamic...
HPC systems are increasingly used for data intensive computations which exhibit irregular memory accesses, non-uniform work distributions, large memory footprints, and high memory bandwidth demands. To address these challenging demands, HPC systems are turning to many-core architectures that feature a large number of energy-efficient cores backed by high-bandwidth memory. These features are exemplified...
This paper presents a GPU-accelerated parallel MAX-MIN Ant System (MMAS) algorithm based on an approach named grouped roulette wheel selection (G-Roulette). A data parallel model is adapted in the proposed approach with special consideration for GPU architecture. We propose a G-Roulette strategy to enhance the parallel computation of fitness-proportionate selection. The G-Roulette strategy includes...
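The fitness-proportionate selection that G-Roulette parallelizes can be shown as a sequential baseline: spin a point on a wheel whose slices are proportional to fitness, then walk the cumulative distribution. This is a generic sketch of roulette wheel selection, not the paper's GPU formulation; names are illustrative.

```java
import java.util.Random;

// Baseline roulette wheel (fitness-proportionate) selection: the sequential
// primitive that grouped, GPU-parallel schemes like G-Roulette build on.
public class RouletteWheel {
    // Returns the index of the chosen individual; 'spin' is a uniform draw in [0,1).
    public static int select(double[] fitness, double spin) {
        double total = 0;
        for (double f : fitness) total += f;
        double target = spin * total;          // point on the wheel
        double cumulative = 0;
        for (int i = 0; i < fitness.length; i++) {
            cumulative += fitness[i];          // walk the cumulative distribution
            if (target < cumulative) return i;
        }
        return fitness.length - 1;             // guard against rounding at 1.0
    }

    public static void main(String[] args) {
        double[] fitness = {1.0, 3.0, 6.0};    // individual 2 owns 60% of the wheel
        Random rng = new Random(42);
        int[] counts = new int[fitness.length];
        for (int i = 0; i < 10_000; i++) counts[select(fitness, rng.nextDouble())]++;
        System.out.printf("%d %d %d%n", counts[0], counts[1], counts[2]);
    }
}
```

The cumulative scan makes each selection inherently sequential, which is why a grouped, data-parallel reformulation is needed to map it well onto GPU threads.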
The increase in speed and capacity of FPGAs is faster than the development of effective design tools to fully utilize it, and routing of nets remains as one of the most time-consuming stages of the FPGA design flow. While existing works have proposed methods of accelerating routing through parallelization, they are limited by the memory architecture of the system that they target. In this paper, we...
Parallel patterns (e.g., map, reduce) have gained traction as an abstraction for targeting parallel accelerators and are a promising answer to the performance portability problem. However, compiling high-level programs into efficient low-level parallel code is challenging. Current approaches start from a high-level parallel IR and proceed to emit GPU code directly in one big step. Fixed strategies...
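The map and reduce patterns named in the abstract can be expressed with Java parallel streams as a hedged illustration of the idea: the program states what to compute and the runtime decides how to split the work. The method name below is an illustrative choice, not from the paper.

```java
import java.util.stream.LongStream;

// Sketch of the map/reduce parallel patterns as a high-level abstraction:
// the program declares the computation; the runtime partitions the range.
public class MapReducePattern {
    // Sum of squares of 1..n via map (x -> x*x) then reduce (+).
    public static long sumOfSquares(long n) {
        return LongStream.rangeClosed(1, n)
                         .parallel()             // runtime splits the range
                         .map(x -> x * x)        // map pattern
                         .reduce(0L, Long::sum); // reduce pattern
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10)); // 385
    }
}
```

Because the patterns carry the parallel structure, a compiler (or here, the stream runtime) is free to choose tiling, fusion, and scheduling strategies per target, which is the portability argument the abstract makes.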
As the number of processing elements in modern chips keeps increasing, the evaluation of new designs will need to account for various challenges at the NoC level. To cope with the impractically long run times when simulating large NoCs, we introduce a novel GPU-based parallel simulation method that can speed up simulations by over 250×, while offering RTL-like accuracy. These promising results make...
In machine learning (ML), the model we use is increasingly important, and the model's parameters, the core of ML, are adjusted by iteratively processing a training dataset until convergence. Although data-parallel ML systems often exploit the error tolerance of this process when synchronizing the model parameters to maximize parallelism, the synchronization of model parameters may be delayed in completion,...
Extreme-scale HPC systems are expected to reach exascale performance around the year 2020. While it is widely known that these systems pose new challenges regarding energy efficiency of architectures, concurrency and resiliency, they also challenge developers of applications trying to efficiently utilize resources: managing parallel control flows, hardware resources and dependencies is a complex...
When parallel and distributed processing is used for wireless network emulation, the results differ if each receiver node runs events independently without synchronization among them. In our previous study of parallel wireless network emulation, we proposed a synchronization method to obtain the same result for collision events among the terminals that receive the...
Parallel computing has been gaining interest due to physical constraints that prevent further frequency scaling. Therefore, in order to achieve high performance on multicore systems, programmers need to focus on parallelizing their programs. Although many parallel APIs written by experts are available to ease coding, they do not automatically guarantee good performance. This paper...