This paper proposes algorithms for optimizing the Remote Core Locking (RCL) synchronization method in multithreaded programs. An algorithm for initializing RCL locks and algorithms for optimizing thread affinity are developed. The algorithms take the structure of hierarchical computer systems and non-uniform memory access (NUMA) into account to minimize the execution time of RCL programs. The...
This paper presents heuristic algorithms for optimizing communications in parallel PGAS programs and minimizing their execution time. This is achieved by taking the hierarchical structure of computer systems into account during reduction operations. The developed algorithms are implemented for the PGAS language Cray Chapel.
The rapidly growing design complexity has become a major obstacle and has dramatically increased the time required for SystemC simulation. In this case study, we exploit different levels of parallelism, including thread- and data-level parallelism, to accelerate the simulation of a Bitcoin miner model in SystemC. Our experiments are performed on two multi-core processors and one many-core Intel® Xeon...
Modern SoCs contain CPU and GPU cores to execute both general purpose and highly-parallel graphics workloads. While the primary use of the GPU is for rendering graphics, the effects of graphics workloads on the overall system have received little attention. The primary reason for this is the lack of efficient tools and simulators for modern graphics applications. In this work, we present GLTraceSim,...
Fault tolerance is becoming increasingly important as we enter the era of exascale computing. Increasing the number of cores results in a smaller mean time between failures and consequently a higher probability of errors. Among the different software fault-tolerance techniques, checkpoint/restart is the most commonly used method in supercomputers and the de facto standard for large-scale systems. Although...
In this paper the authors present the results of an effectiveness comparison between variants of Radix-2 Decimation in Time (DIT) Fast Fourier Transform (FFT) algorithm implementations on graphics processing units (GPUs), which differ in the way the calculations are distributed among the GPU's computational resources. The conducted experiments show that the partitioning of the FFT computational...
In this paper we introduce RCU-HTM, a technique that combines Read-Copy-Update (RCU) with Hardware Transactional Memory (HTM) to implement highly efficient concurrent Binary Search Trees (BSTs). Similarly to RCU-based algorithms, we perform the modifications of the tree structure in private copies of the affected parts of the tree rather than in-place. This allows threads that traverse the tree to...
The Go language lacks built-in data structures that allow fine-grained concurrent access. In particular, its map data type, one of only two generic collections in Go, limits concurrency to the case where all operations are read-only; any mutation (insert, update, or remove) requires exclusive access to the entire map. The tight integration of this map into the Go language and runtime precludes its...
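The whole-map exclusive locking described here is commonly relaxed by sharding: keys hash onto independent locks, so mutations of different shards never contend. The sketch below is that standard workaround, not the paper's data structure; all names are illustrative.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const nShards = 16

// shard pairs one reader/writer lock with one plain Go map.
type shard struct {
	sync.RWMutex
	m map[string]int
}

// shardedMap spreads keys over nShards independent shards so that
// concurrent mutations usually take different locks.
type shardedMap struct {
	shards [nShards]*shard
}

func newShardedMap() *shardedMap {
	s := &shardedMap{}
	for i := range s.shards {
		s.shards[i] = &shard{m: make(map[string]int)}
	}
	return s
}

func (s *shardedMap) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return s.shards[h.Sum32()%nShards]
}

func (s *shardedMap) Store(key string, v int) {
	sh := s.shardFor(key)
	sh.Lock()
	sh.m[key] = v
	sh.Unlock()
}

func (s *shardedMap) Load(key string) (int, bool) {
	sh := s.shardFor(key)
	sh.RLock()
	v, ok := sh.m[key]
	sh.RUnlock()
	return v, ok
}

func main() {
	m := newShardedMap()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.Store(fmt.Sprintf("k%d", i), i)
		}(i)
	}
	wg.Wait()
	v, ok := m.Load("k42")
	fmt.Println(v, ok)
}
```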
Graphics processing units (GPUs) are increasingly applied to accelerate tasks such as graph problems and discrete-event simulation that are characterized by irregularity, i.e., a strong dependence of the control flow and memory accesses on the input. The core data structures in many of these irregular tasks are priority queues, which guide the progress of the computations and can easily become the...
The conventional OpenCL 1.x style CPU-GPU heterogeneous computing paradigm treats the CPU and GPU processors as loosely connected, separate entities. At best, each executes independent tasks; more commonly, the CPU idles while waiting for results from the GPU. No data sharing or communication is allowed during kernel execution. This model limits the number of applications that can harness the...
Many dynamic hybrid race detectors aim at detecting violations of the lockset discipline in execution traces of multithreaded programs. They are designed to abstract the memory accesses appearing in traces as contexts. However, they keep these contexts to different extents and partition the sets of contexts into equivalence classes of different granularity. In our case study, we compare three detectors...
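The lockset discipline these detectors check can be sketched as Eraser-style lockset refinement: a variable's candidate set starts as "all locks" and is intersected with the locks held at each access; an empty result means no single lock consistently guards the variable. This is the textbook discipline the detectors build on, not any one tool's exact algorithm.

```go
package main

import "fmt"

// locksetCheck takes, for one shared variable, the list of lock sets
// held at each of its accesses, and reports whether the lockset
// discipline holds (some lock is held at every access).
func locksetCheck(accesses [][]string) bool {
	var candidate map[string]bool // nil means "all locks" (before first access)
	for _, held := range accesses {
		heldSet := make(map[string]bool)
		for _, l := range held {
			heldSet[l] = true
		}
		if candidate == nil {
			candidate = heldSet
			continue
		}
		// Refine: keep only locks held at this access too.
		for l := range candidate {
			if !heldSet[l] {
				delete(candidate, l)
			}
		}
	}
	return len(candidate) > 0
}

func main() {
	// Lock "b" is held at every access: discipline respected.
	fmt.Println(locksetCheck([][]string{{"a", "b"}, {"b"}, {"b", "c"}}))
	// No common lock across accesses: potential race.
	fmt.Println(locksetCheck([][]string{{"a"}, {"b"}}))
}
```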
A worksharing model is presented to enhance parallel compression of data-intensive bitmap indices. To increase spatial locality, our approach interleaves multiple independent bitmaps in a combined file. Each file block, which fits entirely in cache, is processed by independent threads. Results show that our model significantly outperforms embarrassingly-parallel designs.
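The worksharing scheme, independent threads each processing a cache-sized block of the combined file, can be sketched as below. The `compress` stand-in just counts set bits; a real implementation would run the bitmap compression codec on each block.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// processBlocks hands out independent file blocks to a pool of workers
// over a channel; each result slot is written by exactly one worker,
// so no further synchronization is needed on out.
func processBlocks(blocks [][]byte, compress func([]byte) int) []int {
	out := make([]int, len(blocks))
	work := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range work {
				out[i] = compress(blocks[i])
			}
		}()
	}
	for i := range blocks {
		work <- i
	}
	close(work)
	wg.Wait()
	return out
}

// popcount counts the set bits in a block (stand-in for compression).
func popcount(b []byte) int {
	n := 0
	for _, x := range b {
		for ; x != 0; x &= x - 1 {
			n++
		}
	}
	return n
}

func main() {
	blocks := [][]byte{{0xFF, 0x00}, {0x0F}, {0x01, 0x01}}
	fmt.Println(processBlocks(blocks, popcount)) // [8 4 2]
}
```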
In this paper, we present an optimized framework that can efficiently perform massive spatial queries on current GPUs. To let the widely adopted filter-and-verify paradigm benefit from GPUs, skewed workloads are first associated with cells in a scaled spatial grid, so that the subsequent range-verification cost against the massive set of spatial objects can be significantly reduced. Particularly...
Parallel programming is becoming more and more prevalent in this era of concurrent computing. Because of the nondeterministic nature of parallel programs, concurrency bugs are notoriously difficult to debug; moreover, an attempt to fix one bug may introduce a deadlock or other concurrency bugs. Though many static and dynamic data race detection tools have been proposed in recent years, none of them is interactive...
Current monitor-based systems have several disadvantages for multi-object operations. They require programmers to (1) manually determine the order of locking operations, (2) manually determine the points of execution where threads should signal other threads, and (3) use global locks or perform busy waiting for operations that depend on a condition spanning multiple objects. Transactional memory...
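Disadvantage (1) above, manually fixing a lock order for multi-object operations, can be sketched as follows. This is a minimal illustration of the discipline transactional memory would make unnecessary; the account type and ids are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// account carries a fixed id used to impose a global lock order.
type account struct {
	id      int
	mu      sync.Mutex
	balance int
}

// transfer locks both accounts in ascending id order. Every
// multi-object operation must agree on this one order, or two
// concurrent transfers a->b and b->a can deadlock.
func transfer(from, to *account, amount int) {
	first, second := from, to
	if second.id < first.id {
		first, second = second, first
	}
	first.mu.Lock()
	second.mu.Lock()
	from.balance -= amount
	to.balance += amount
	second.mu.Unlock()
	first.mu.Unlock()
}

func main() {
	a := &account{id: 1, balance: 100}
	b := &account{id: 2, balance: 100}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); transfer(a, b, 1) }()
		go func() { defer wg.Done(); transfer(b, a, 1) }()
	}
	wg.Wait()
	fmt.Println(a.balance + b.balance) // total is conserved: 200
}
```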
Dynamic vectors are among the most commonly used data structures in programming. They provide constant time random access and resizable data storage. Additionally, they provide constant time insertion (pushback) and deletion (popback) at the end of the sequence. However, in a multithreaded system, concurrent pushback and popback operations attempt to update the same shared object, creating a synchronization...
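The shared-object contention described here can be made concrete with a minimal fixed-capacity vector whose pushback reserves a slot with a single atomic increment; every concurrent pushback contends on that one size counter. This is an illustration of the bottleneck, not the paper's design, and a growable lock-free vector would additionally need a safe resize protocol, omitted here.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// vec is a fixed-capacity vector; size is the shared object that
// every concurrent pushback must update.
type vec struct {
	size int64
	data []int64
}

// pushback claims index i with one atomic add, then fills the slot.
func (v *vec) pushback(x int64) bool {
	i := atomic.AddInt64(&v.size, 1) - 1
	if i >= int64(len(v.data)) {
		atomic.AddInt64(&v.size, -1) // roll back: vector is full
		return false
	}
	atomic.StoreInt64(&v.data[i], x)
	return true
}

func main() {
	v := &vec{data: make([]int64, 1000)}
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(i int64) { defer wg.Done(); v.pushback(i) }(int64(i))
	}
	wg.Wait()
	var sum int64
	for _, x := range v.data {
		sum += x
	}
	fmt.Println(v.size, sum) // 1000 and 0+1+...+999 = 499500
}
```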
Modern high-performing systems make extensive use of multiple CPU cores. These multi-threaded systems are complex to design, build, and understand. Debugging performance of these multi-threaded systems is especially challenging. This requires the developer to understand the relative execution of dozens of threads and their inter-dependencies, including data-sharing and synchronization behaviors. We...
Data movement is increasingly becoming the bottleneck for both performance and energy efficiency in modern computation. Until recently, there was limited freedom for communication optimization on GPUs, as conventional GPUs only provide two methods for inter-thread communication: shared memory or global memory. However, a new warp shuffle instruction has been introduced...
In this paper, we provide a comparison of language features and runtime systems of commonly used threading parallel programming models for high performance computing, including OpenMP, Intel Cilk Plus, Intel TBB, OpenACC, Nvidia CUDA, OpenCL, C++11 and PThreads. We then report our performance comparison of OpenMP, Cilk Plus and C++11 for data and task parallelism on CPU using benchmarks. The results show...
With the spreading of multi-core architectures, operating systems and applications are becoming increasingly more concurrent and their scalability is often limited by the primitives used to synchronize the different hardware threads. In this paper, we address the problem of how to optimize the throughput of a system with multiple producer and consumer threads. Such applications typically synchronize...
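The multiple-producer/multiple-consumer pattern the abstract targets can be sketched with Go's built-in bounded buffer, a buffered channel, whose internal lock is exactly the kind of synchronization primitive whose throughput such work studies. A minimal sketch, not the paper's technique:

```go
package main

import (
	"fmt"
	"sync"
)

// run pushes producers*perProducer items through one bounded buffer
// and returns the total drained by the consumers.
func run(producers, consumers, perProducer int) int {
	ch := make(chan int, 64)

	var prodWg sync.WaitGroup
	for p := 0; p < producers; p++ {
		prodWg.Add(1)
		go func() {
			defer prodWg.Done()
			for i := 1; i <= perProducer; i++ {
				ch <- i
			}
		}()
	}
	// Close the buffer once every producer is done.
	go func() { prodWg.Wait(); close(ch) }()

	var consWg sync.WaitGroup
	var mu sync.Mutex
	total := 0
	for c := 0; c < consumers; c++ {
		consWg.Add(1)
		go func() {
			defer consWg.Done()
			for v := range ch {
				mu.Lock()
				total += v
				mu.Unlock()
			}
		}()
	}
	consWg.Wait()
	return total
}

func main() {
	fmt.Println(run(4, 4, 1000)) // 4 * (1+...+1000) = 2002000
}
```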