Performance modeling plays an important role in optimal hardware design and optimized application implementation. This paper presents a very low overhead performance model, called VLAG, to approximate the data localities exploited by GPU kernels. VLAG receives source code-level information to estimate per-memory-access-instruction, per-data-array, and per-kernel localities within GPU kernels. VLAG...
Combining nonlocal means with variational models can extend the reach of image processing techniques such as texture image processing. In this paper, an implementation scheme for nonlocal variational models is proposed, comprising discretization, coordinate transformation, and dimension expansion. First, continuous models are converted to discrete models. Then we transform absolute coordinates to relative...
Memristors have extended their influence beyond memory to logic and in-memory computing. Memristive logic design, the methodology of designing logic circuits using memristors, is an emerging concept whose growth is fueled by the quest for energy efficient computing systems. As a result, many memristive logic families have evolved with different attributes, and a mature comparison among them is needed...
Computing machine learning models in the cloud remains a central problem in big data analytics. In this work, we introduce a cloud analytic system exploiting a parallel array DBMS based on a classical shared-nothing architecture. Our approach combines in-DBMS data summarization with mathematical processing in an external program. We study how to summarize a data set in parallel assuming a large number...
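The exact statistics the system above maintains are not stated in the abstract, so the following is only a minimal sketch of a common one-pass data summarization (per-partition count, linear sum, and quadratic sum) that merges by addition across shared-nothing partitions; the function names `summarize` and `merge` are illustrative.

```python
def summarize(rows):
    """One-pass summary of a data partition: (count, linear sum, quadratic sum).
    rows: list of equal-length numeric lists (one row per observation)."""
    n = len(rows)
    d = len(rows[0])
    L = [sum(r[j] for r in rows) for j in range(d)]               # column sums
    Q = [[sum(r[i] * r[j] for r in rows) for j in range(d)]       # X^T X
         for i in range(d)]
    return n, L, Q

def merge(a, b):
    """Summaries from disjoint partitions combine by simple addition,
    which is what makes this summarization parallel-friendly."""
    n = a[0] + b[0]
    L = [x + y for x, y in zip(a[1], b[1])]
    Q = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return n, L, Q
```

From the merged summary, downstream mathematical processing (e.g. the sample mean L/n or a covariance matrix) can proceed without rescanning the raw data.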
Projections and measurements of error rates in near-exascale and exascale systems suggest dramatic growth, due to extreme scale (10^9 cores), concurrency, software complexity, and deep-submicron transistor scaling. Such growth makes resilience a critical concern and may increase the incidence of errors that "escape", silently corrupting application state. Such errors can often be revealed...
In this work, the problem of developing algorithms that automatically infer information about small-scale solar photovoltaic (PV) arrays in high resolution aerial imagery is considered. Such algorithms potentially offer a faster and cheaper solution to collecting small-scale PV information, such as their location and capacity. Existing work on this topic has focused on the automatic identification...
Link clustering groups the edges of a graph according to their similarities, and can reveal the overlapping and hierarchical organization in a wide spectrum of networks. This work studies how to improve the efficiency of link clustering along three dimensions: algorithm, modeling, and parallelization, on multi-core machines. We evaluate the efficiency improvement due to each of the three...
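The abstract does not say which edge-similarity measure this work uses; a widely used one (from Ahn et al.'s link-clustering formulation) compares two edges sharing a vertex by the Jaccard index of their unshared endpoints' inclusive neighborhoods, sketched here for illustration:

```python
def link_similarity(adj, edge_a, edge_b):
    """Similarity of two edges that share exactly one vertex.
    adj: dict mapping vertex -> set of neighbors.
    Returns the Jaccard index of the unshared endpoints' inclusive
    neighborhoods (a vertex's neighbors plus the vertex itself)."""
    shared = set(edge_a) & set(edge_b)
    if len(shared) != 1:
        raise ValueError("edges must share exactly one vertex")
    (i,) = set(edge_a) - shared
    (j,) = set(edge_b) - shared
    ni = adj[i] | {i}          # inclusive neighborhood of i
    nj = adj[j] | {j}          # inclusive neighborhood of j
    return len(ni & nj) / len(ni | nj)
```

Link clustering then groups edges whose similarity exceeds a threshold, which is where the algorithmic, modeling, and parallelization choices studied above determine efficiency.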
Finding the best model to reveal the potential relationships in a given set of data is not an easy job and often requires many iterations of trial and error across model selection, feature selection, and parameter tuning. This problem is greatly complicated in the big data era, where I/O bottlenecks significantly slow down the search for the best model. In this article, we examine the case...
Due to the increasing demand for low-power computing and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy-efficient programmable accelerators. This paper proposes an Integrated Programmable-Array accelerator (IPA) architecture based on an innovative execution model, targeted at accelerating both the data-flow and control-flow parts of deeply embedded...
The contribution of the present work relies on an innovative and judicious combination of several optimization techniques for achieving high performance when using automatic vectorization and hybrid MPI/OpenMP parallelism in a Particle-in-Cell (PIC) code. The domain of application is plasma physics: the code simulates 2d2v Vlasov-Poisson systems on Cartesian grids with periodic boundary conditions...
Heterogeneous computing systems, e.g., those with accelerators in addition to the host CPUs, offer accelerated performance for a variety of workloads. However, most parallel programming models require platform-dependent, time-consuming hand-tuning efforts to use all the resources in a system collectively and efficiently. In this work, we explore the use of OpenMP parallel language extensions...
High-performance distributed memory applications often load or receive data in a format that differs from what the application uses. One such difference arises from how the application distributes data for parallel processing. Data must be redistributed from how it was laid out by the producer to how the application needs the data to be laid out amongst its processes. In this paper, we present a large-scale...
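The redistribution problem above can be sketched with a toy example; the paper's actual layouts and communication protocol are not given here, so this assumes a 1-D block producer layout and a cyclic consumer layout purely for illustration:

```python
def block_owner(i, n, p):
    """Owning rank of element i under a block layout of n elements over p ranks."""
    b = -(-n // p)  # ceiling division: block size per rank
    return i // b

def cyclic_owner(i, n, p):
    """Owning rank of element i under a cyclic (round-robin) layout."""
    return i % p

def redistribution_plan(n, p, src_owner, dst_owner):
    """For each (source rank, destination rank) pair, list the element
    indices that must move between the two layouts."""
    plan = {}
    for i in range(n):
        s, d = src_owner(i, n, p), dst_owner(i, n, p)
        if s != d:
            plan.setdefault((s, d), []).append(i)
    return plan
```

Building such a plan up front lets each process post its sends and receives in bulk instead of moving elements one at a time.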
The full-wave analysis of large-scale phased array systems poses a very challenging computational electromagnetic problem. Conventional full-wave techniques such as the Method of Moments (MoM) can handle small- to medium-scale problems relatively easily. When the size of the array exceeds a hundred elements, full-wave techniques reach their limit of applicability. For larger arrays, periodic simulators...
Graph processing is used in many fields of science, such as sociology, risk prediction, and biology. Although graph analysis is important, it also poses numerous challenges, especially for large graphs, which have to be processed on multicore systems. In this paper, we present a PGAS (Partitioned Global Address Space) version of the level-synchronous BFS (Breadth-First Search) algorithm and its implementation...
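The level-synchronous pattern referred to above processes the frontier one level at a time, with a barrier between levels. A minimal serial sketch follows; in the PGAS version the frontier and adjacency lists are partitioned across processes, with the level boundary acting as the synchronization point.

```python
def level_sync_bfs(adj, source):
    """Level-synchronous BFS: expand the whole current frontier before
    advancing to the next level.
    adj: dict mapping vertex -> iterable of neighbors.
    Returns a dict mapping each reachable vertex to its BFS level."""
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        next_frontier = []
        for u in frontier:                # a PGAS process scans its local
            for v in adj.get(u, ()):      # partition of the frontier
                if v not in dist:
                    dist[v] = level + 1
                    next_frontier.append(v)
        frontier = next_frontier          # barrier between levels
        level += 1
    return dist
```

The per-level barrier is what makes the traversal deterministic in depth, and it is also where a distributed implementation pays its communication cost.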
This paper presents a novel map-reduce runtime system that is designed for scalability and for composition with other parallel software. We use a modified programming interface that expresses reduction operations over data containers as opposed to key-value pairs. This design choice admits higher efficiency as the programmer can select appropriate data structures. Our runtime targets shared memory...
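The container-based interface can be sketched as follows; `container_mapreduce` and the choice of `Counter` are illustrative, not the runtime's actual API. Each chunk maps to a whole container, and containers merge pairwise, rather than the worker emitting individual key-value pairs:

```python
from collections import Counter
from functools import reduce

def container_mapreduce(chunks, local_map, merge):
    """Reduction over data containers: each chunk produces a complete
    container, and containers are merged pairwise. The programmer picks
    the container type, which is the source of the efficiency gain."""
    partials = [local_map(chunk) for chunk in chunks]  # map phase
    return reduce(merge, partials)                     # reduction phase

# Word count with Counter as the programmer-chosen container:
chunks = [["a", "b", "a"], ["b", "c"]]
result = container_mapreduce(chunks, Counter, lambda x, y: x + y)
# result == Counter({'a': 2, 'b': 2, 'c': 1})
```

Because the reduction works on whole containers, a dense array, a hash table, or a sorted structure can each be chosen to match the operation, instead of forcing everything through a key-value shuffle.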
For partitioned global address space (PGAS) runtimes, supporting out-of-core data computation is an important issue, and prior work has shown that flash SSDs are useful for this purpose. In this paper, we introduce ComEx-PM, a PGAS communication runtime that supports out-of-core data computation using a flash SSD. ComEx-PM launches multiple processes in each node. Memory region...
The SENSEI generic in situ interface is an API that promotes code portability and reusability. From the simulation view, a developer can instrument their code with the SENSEI API and then make use of any number of in situ infrastructures. From the method view, a developer can write an in situ method using the SENSEI API, then expect it to run in any number of in situ infrastructures, or be invoked...
Chapel supports distributed computing with an underlying PGAS memory address space. While it provides abstractions for writing simple and elegant distributed code, the type system currently lacks a notion of locality, i.e., a description of an object's access behavior in relation to its actual location. This often necessitates programmer intervention to avoid redundant non-local data accesses. Moreover,...
In this paper, we propose an innovative approach to feature selection and model updating in big data machine learning. Since hard-drive access is the biggest barrier in big data problems, it is natural to reduce disk I/O operations when evaluating different combinations of features, or updating a learning machine. In particular, we are interested in discovering whether small enough matrices exist...