Distributed cyber-physical systems cover a wide range of applications such as automotive, avionics, and industrial automation. These applications require a global notion of time to fulfill their timing requirements. Multi-processor systems-on-chip (MPSoCs) are an attractive implementation option since they offer several benefits such as parallelism and power efficiency. However, MPSoCs have a Globally...
This paper describes and analyses a novel method to improve the parallel performance of solving sparse triangular systems (spTRSV). The main objective of this study is to reduce both the total idle time of processors and the overall execution time. The developed solution is also suitable for sparse and banded structures. To evaluate and validate our contribution, a series of experiments has been...
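The snippet does not detail the paper's method, but the standard way to expose parallelism in spTRSV is level-set scheduling: rows whose dependencies are all in earlier levels can be solved concurrently. A minimal sketch under that assumption (all names hypothetical, not the paper's algorithm):

```python
def level_sets(deps, n):
    """Group the rows of a lower-triangular system into levels: every row
    in a level depends only on rows from earlier levels, so all rows within
    one level can be solved in parallel."""
    level = [0] * n
    for i in range(n):                # dependencies deps[i] all have j < i
        for j in deps[i]:
            level[i] = max(level[i], level[j] + 1)
    buckets = {}
    for i, lv in enumerate(level):
        buckets.setdefault(lv, []).append(i)
    return [buckets[lv] for lv in sorted(buckets)]

def sptrsv(off_diag, diag, b):
    """Solve L x = b for lower-triangular L, given as strictly-lower entries
    off_diag[i] = [(j, a_ij), ...] with j < i, plus the diagonal diag."""
    n = len(b)
    deps = [[j for j, _ in off_diag[i]] for i in range(n)]
    x = [0.0] * n
    for lvl in level_sets(deps, n):
        for i in lvl:                 # independent rows: could be a parallel loop
            x[i] = (b[i] - sum(a * x[j] for j, a in off_diag[i])) / diag[i]
    return x
```

The idle time the abstract targets shows up here as imbalance between levels; reducing it means keeping every processor busy across level boundaries.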
This paper presents an open-source task scheduling simulator, called MCRTsim, for real-time systems with uniprocessors, multiprocessors, and multi-core processors. It contains a task set generator, a set of real-time schedulers and synchronization protocols, and a comprehensive set of tools, including a visualized execution tracer, a schedulability analyzer, and measurement and statistics modules. Therefore,...
Work partitioning is a key challenge with applications in many scientific and technological fields. The problem is well studied, with a rich literature on both distributed and parallel computing architectures. In this paper we address the work partitioning problem for parallel and distributed agent-based simulations, which aims at (i) balancing the overall load distribution, (ii) minimizing,...
The increase in speed and capacity of FPGAs has outpaced the development of effective design tools to fully utilize them, and routing of nets remains one of the most time-consuming stages of the FPGA design flow. While existing works have proposed methods of accelerating routing through parallelization, they are limited by the memory architecture of the system that they target. In this paper, we...
Multicore architectures can provide high, predictable performance through parallel processing. Unfortunately, makespan estimates for parallel applications are overly pessimistic, either due to load-imbalance issues plaguing static scheduling methods or due to timing anomalies plaguing dynamic scheduling methods. This paper contributes an anomaly-free dynamic scheduling method, called Lazy, which...
This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previously available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and...
Shared-memory languages and systems provide strong guarantees only for well-synchronized (data-race-free) programs. Prior work introduces support for memory consistency based on region serializability of executing code regions, but all approaches incur serious limitations such as adding high run-time overhead or relying on complex custom hardware. This paper explores the potential for leveraging widely...
In the real world, many problems on massive graphs can be mapped to an underlying critical problem of discovering top-k subgraphs. For massive graphs, subgraph queries may have an enormous number of matches, so it is inefficient to compute all matches when only the top-k matches are desired. Meanwhile, parallel algorithms are essential for the scalability of massive-graph computing. In this paper, we address...
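The reason computing all matches is wasteful can be seen in the generic top-k pattern: a bounded min-heap retains only the k best results seen so far, so any match that cannot beat the current k-th best is discarded immediately. A sketch of that pattern (a generic illustration, not the paper's algorithm):

```python
import heapq

def top_k(matches, k, score):
    """Keep only the k highest-scoring items from a stream of matches.
    The min-heap root is the current k-th best score: anything weaker
    is dropped without further work."""
    heap = []  # (score, item) pairs, smallest score at the root
    for item in matches:
        s = score(item)
        if len(heap) < k:
            heapq.heappush(heap, (s, item))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, item))  # evict the weakest
    return sorted(heap, reverse=True)
```

In a parallel setting, each worker can maintain its own size-k heap over its partition and the k-way results are merged at the end, which keeps inter-worker synchronization minimal.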
Speedup in high-performance computing is usually bounded by two main laws, namely Amdahl's and Gustafson's. However, the speedup can sometimes reach far beyond this linear limit, a phenomenon known as superlinear speedup, in which the speedup is greater than the number of processors used. Although superlinear speedup is not a new concept and many authors have already...
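For reference, the two bounds the snippet names can be written down explicitly, with $f$ the parallelizable fraction of the work and $p$ the number of processors:

```latex
% Amdahl's law: fixed problem size, speedup capped by the serial fraction
S_{\mathrm{Amdahl}}(p) = \frac{1}{(1 - f) + f/p} \;\le\; \frac{1}{1 - f}

% Gustafson's law: problem size scales with p, speedup grows linearly
S_{\mathrm{Gustafson}}(p) = (1 - f) + f\,p
```

Superlinear speedup, $S(p) > p$, does not contradict either formula's model of computation; it typically arises from effects outside the model, most commonly that the per-processor working set shrinks with $p$ and begins to fit in faster levels of the memory hierarchy.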
PWCS (Probabilistic Write/Copy-Select) is a new lock-free synchronization mechanism with wait-free characteristics, proposed by Nicholas Mc Guire at the 13th Real-Time Linux Workshop, which exploits the inherent randomness of modern computer systems. It aims at addressing the multi-reader/single-writer problem in Linux. Based on the original label-based PWCS, we propose a hash-based...
Routing of nets is one of the most time-consuming steps in the FPGA design flow. While existing works have described ways of accelerating the process through parallelization, they are not scalable. In this paper, we propose ParaFRo, a two-phase hybrid parallel FPGA router using fine-grained synchronization and partitioning. The first phase of the router aims to exploit the maximum parallelism available...
Multi-core processors are now commonplace in the form of dual-core and quad-core processors. To take advantage of multiple cores, parallel programs must be written. Existing legacy applications are sequential and, when run on multi-core machines, utilize only one core. Such applications must be either rewritten or parallelized to make efficient use of multiple cores. Manual parallelization requires huge efforts...
This paper presents a novel relaxed synchronization strategy for generic numerical algorithms executed in distributed and parallel computing systems. Large problems are efficiently solved if they can be parallelized. However, as the number of processing elements increases, the communication, necessary to synchronize intermediate computation across processing elements, increases and soon becomes a...
We present a three-step binding algorithm for applications in the form of directed acyclic graphs (DAGs) of tasks with deadlines, that need to be bound to a shared memory multiprocessor platform. The aim of the algorithm is to obtain a good binding that results in low makespans of the schedules of the DAGs. It first clusters tasks assuming unlimited resources using a deadline-aware shared memory extension...
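The binding quality criterion above is makespan; a standard lower bound on a task DAG's makespan (with unlimited resources) is its critical-path length, which is what deadline-aware clustering implicitly works against. A minimal sketch of that bound (names hypothetical, not the paper's three-step algorithm):

```python
def critical_path(duration, preds):
    """Critical-path length of a task DAG: the makespan lower bound with
    unlimited processors. duration maps task -> execution time; preds maps
    task -> list of predecessor tasks."""
    finish = {}  # memoized earliest finish time per task

    def ef(t):
        if t not in finish:
            # a task may start only once all of its predecessors have finished
            finish[t] = duration[t] + max((ef(p) for p in preds[t]), default=0.0)
        return finish[t]

    return max(ef(t) for t in duration)
```

Any binding to a finite shared-memory platform can only lengthen the schedule beyond this bound, so a binding algorithm is judged by how close its resulting makespans stay to it.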
FPGAs have grown considerably in recent years. It is now possible to implement several soft-core processors in a single FPGA, which enables considerable parallelism for the developer. Unfortunately, most application code is still available only in sequential form. Thus, in this contribution we present a tool that enables the automated transformation of an application into a streaming pipeline using...
The quest for more performance frequently finds interesting answers in unconventional computing. IPNoSys is a parallel processing platform for packet-based applications. Its hardware architecture is based on a network-on-chip (NoC) structure, and its applications are executed while packets are routed through the NoC. This paper presents a new architecture for the IPNoSys programming model. IPNoSys...
Utilizing Hardware Accelerators (ACCs) is a promising solution to improve the performance and power efficiency of Chip Multi-Processors (CMPs). However, new challenges arise with the trend of shifting from few ACCs (sparse ACC coverage) to many ACCs (dense ACC coverage) on a chip. The primary challenges are a lack of clear semantics in ACC communication as well as a processor-centric view for...
A CPU module is composed of networks connecting many multiprocessors, with parallel processing carried out among these processors. The most important element of a VLSI multiprocessor system is its interconnection network, and the key challenge is to carry out communication among many nodes reliably while ensuring scalability. In this paper, we introduce a parallel architecture...