Performance modeling plays an important role in optimal hardware design and optimized application implementation. This paper presents a very low overhead performance model, called VLAG, to approximate the data localities exploited by GPU kernels. VLAG receives source code-level information to estimate per-memory-access-instruction, per-data-array, and per-kernel localities within GPU kernels. VLAG...
Combining nonlocal means with variational models can extend the reach of image processing techniques such as texture image processing. In this paper, an implementation scheme for nonlocal variational models is proposed, comprising discretization, coordinate transformation, and dimension expansion. First, continuous models are converted to discrete models. Then we transform absolute coordinates to relative...
Memristors have extended their influence beyond memory to logic and in-memory computing. Memristive logic design, the methodology of designing logic circuits using memristors, is an emerging concept whose growth is fueled by the quest for energy efficient computing systems. As a result, many memristive logic families have evolved with different attributes, and a mature comparison among them is needed...
Computing machine learning models in the cloud remains a central problem in big data analytics. In this work, we introduce a cloud analytic system exploiting a parallel array DBMS based on a classical shared-nothing architecture. Our approach combines in-DBMS data summarization with mathematical processing in an external program. We study how to summarize a data set in parallel assuming a large number...
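The exact statistics the system above maintains are not stated in the abstract, so the following is only a minimal sketch of a common one-pass data summarization (per-partition count, linear sum, and quadratic sum) that merges by addition across shared-nothing partitions; the function names `summarize` and `merge` are illustrative.

```python
def summarize(rows):
    """One-pass summary of a data partition: (count, linear sum, quadratic sum).
    rows: list of equal-length numeric lists (one row per observation)."""
    n = len(rows)
    d = len(rows[0])
    L = [sum(r[j] for r in rows) for j in range(d)]               # column sums
    Q = [[sum(r[i] * r[j] for r in rows) for j in range(d)]       # X^T X
         for i in range(d)]
    return n, L, Q

def merge(a, b):
    """Summaries from disjoint partitions combine by simple addition,
    which is what makes this summarization parallel-friendly."""
    n = a[0] + b[0]
    L = [x + y for x, y in zip(a[1], b[1])]
    Q = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return n, L, Q
```

From the merged summary, downstream mathematical processing (e.g. the sample mean L/n or a covariance matrix) can proceed without rescanning the raw data.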
Projections and measurements of error rates in near-exascale and exascale systems suggest dramatic growth, due to extreme scale (10^9 cores), concurrency, software complexity, and deep-submicron transistor scaling. Such growth makes resilience a critical concern and may increase the incidence of errors that "escape", silently corrupting application state. Such errors can often be revealed...
In this work, the problem of developing algorithms that automatically infer information about small-scale solar photovoltaic (PV) arrays in high resolution aerial imagery is considered. Such algorithms potentially offer a faster and cheaper solution to collecting small-scale PV information, such as their location and capacity. Existing work on this topic has focused on the automatic identification...
Link clustering groups the edges of a graph according to their similarities, and can reveal the overlapping and hierarchical organization in a wide spectrum of networks. This work studies how to improve the efficiency of link clustering along three dimensions: algorithm, modeling, and parallelization, on multi-core machines. We evaluate the efficiency improvement due to each of the three...
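The abstract does not say which edge-similarity measure this work uses; a widely used one (from Ahn et al.'s link-clustering formulation) compares two edges sharing a vertex by the Jaccard index of their unshared endpoints' inclusive neighborhoods, sketched here for illustration:

```python
def link_similarity(adj, edge_a, edge_b):
    """Similarity of two edges that share exactly one vertex.
    adj: dict mapping vertex -> set of neighbors.
    Returns the Jaccard index of the unshared endpoints' inclusive
    neighborhoods (a vertex's neighbors plus the vertex itself)."""
    shared = set(edge_a) & set(edge_b)
    if len(shared) != 1:
        raise ValueError("edges must share exactly one vertex")
    (i,) = set(edge_a) - shared
    (j,) = set(edge_b) - shared
    ni = adj[i] | {i}          # inclusive neighborhood of i
    nj = adj[j] | {j}          # inclusive neighborhood of j
    return len(ni & nj) / len(ni | nj)
```

Link clustering then groups edges whose similarity exceeds a threshold, which is where the algorithmic, modeling, and parallelization choices studied above determine efficiency.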
Finding the best model to reveal the potential relationships in a given set of data is not an easy job and often requires many iterations of trial and error across model selection, feature selection, and parameter tuning. This problem is greatly complicated in the big data era, where I/O bottlenecks significantly slow down the search for the best model. In this article, we examine the case...
Due to the increasing demand for low-power computing and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy-efficient programmable accelerators. This paper proposes an Integrated Programmable-Array accelerator (IPA) architecture based on an innovative execution model, targeted at accelerating both the data-flow and control-flow parts of deeply embedded...
The contribution of the present work relies on an innovative and judicious combination of several optimization techniques for achieving high performance when using automatic vectorization and hybrid MPI/OpenMP parallelism in a Particle-in-Cell (PIC) code. The domain of application is plasma physics: the code simulates 2d2v Vlasov-Poisson systems on Cartesian grids with periodic boundary conditions...
Heterogeneous computing systems, e.g., those with accelerators in addition to the host CPUs, offer accelerated performance for a variety of workloads. However, most parallel programming models require platform-dependent, time-consuming hand-tuning efforts to use all the resources in a system collectively and efficiently. In this work, we explore the use of OpenMP parallel language extensions...
High-performance distributed memory applications often load or receive data in a format that differs from what the application uses. One such difference arises from how the application distributes data for parallel processing. Data must be redistributed from how it was laid out by the producer to how the application needs the data to be laid out amongst its processes. In this paper, we present a large-scale...
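The redistribution problem above can be sketched with a toy example; the paper's actual layouts and communication protocol are not given here, so this assumes a 1-D block producer layout and a cyclic consumer layout purely for illustration:

```python
def block_owner(i, n, p):
    """Owning rank of element i under a block layout of n elements over p ranks."""
    b = -(-n // p)  # ceiling division: block size per rank
    return i // b

def cyclic_owner(i, n, p):
    """Owning rank of element i under a cyclic (round-robin) layout."""
    return i % p

def redistribution_plan(n, p, src_owner, dst_owner):
    """For each (source rank, destination rank) pair, list the element
    indices that must move between the two layouts."""
    plan = {}
    for i in range(n):
        s, d = src_owner(i, n, p), dst_owner(i, n, p)
        if s != d:
            plan.setdefault((s, d), []).append(i)
    return plan
```

Building such a plan up front lets each process post its sends and receives in bulk instead of moving elements one at a time.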
The full-wave analysis of large-scale phased array systems poses a very challenging computational electromagnetic problem. Conventional full-wave techniques such as the Method of Moments (MoM) can handle small- to medium-scale problems relatively easily. When the size of the array exceeds a hundred elements, full-wave techniques reach their limit of applicability. For larger arrays, periodic simulators...
Graph processing is used in many fields of science, such as sociology, risk prediction, and biology. Although graph analysis is important, it also poses numerous challenges, especially for large graphs, which have to be processed on multicore systems. In this paper, we present a PGAS (Partitioned Global Address Space) version of the level-synchronous BFS (Breadth-First Search) algorithm and its implementation...
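The level-synchronous pattern referred to above processes the frontier one level at a time, with a barrier between levels. A minimal serial sketch follows; in the PGAS version the frontier and adjacency lists are partitioned across processes, with the level boundary acting as the synchronization point.

```python
def level_sync_bfs(adj, source):
    """Level-synchronous BFS: expand the whole current frontier before
    advancing to the next level.
    adj: dict mapping vertex -> iterable of neighbors.
    Returns a dict mapping each reachable vertex to its BFS level."""
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        next_frontier = []
        for u in frontier:                # a PGAS process scans its local
            for v in adj.get(u, ()):      # partition of the frontier
                if v not in dist:
                    dist[v] = level + 1
                    next_frontier.append(v)
        frontier = next_frontier          # barrier between levels
        level += 1
    return dist
```

The per-level barrier is what makes the traversal deterministic in depth, and it is also where a distributed implementation pays its communication cost.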
This paper presents a novel map-reduce runtime system that is designed for scalability and for composition with other parallel software. We use a modified programming interface that expresses reduction operations over data containers as opposed to key-value pairs. This design choice admits higher efficiency as the programmer can select appropriate data structures. Our runtime targets shared memory...
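The container-based interface can be sketched as follows; `container_mapreduce` and the choice of `Counter` are illustrative, not the runtime's actual API. Each chunk maps to a whole container, and containers merge pairwise, rather than the worker emitting individual key-value pairs:

```python
from collections import Counter
from functools import reduce

def container_mapreduce(chunks, local_map, merge):
    """Reduction over data containers: each chunk produces a complete
    container, and containers are merged pairwise. The programmer picks
    the container type, which is the source of the efficiency gain."""
    partials = [local_map(chunk) for chunk in chunks]  # map phase
    return reduce(merge, partials)                     # reduction phase

# Word count with Counter as the programmer-chosen container:
chunks = [["a", "b", "a"], ["b", "c"]]
result = container_mapreduce(chunks, Counter, lambda x, y: x + y)
# result == Counter({'a': 2, 'b': 2, 'c': 1})
```

Because the reduction works on whole containers, a dense array, a hash table, or a sorted structure can each be chosen to match the operation, instead of forcing everything through a key-value shuffle.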
For partitioned global address space (PGAS) runtimes, supporting out-of-core data computation is an important issue, and prior work has shown that flash SSDs are useful for this purpose. In this paper, we introduce ComEx-PM, a PGAS communication runtime that supports out-of-core data computation using a flash SSD. ComEx-PM launches multiple processes in each node. Memory region...
The SENSEI generic in situ interface is an API that promotes code portability and reusability. From the simulation view, a developer can instrument their code with the SENSEI API and then make use of any number of in situ infrastructures. From the method view, a developer can write an in situ method using the SENSEI API, then expect it to run in any number of in situ infrastructures, or be invoked...
Chapel supports distributed computing with an underlying PGAS memory address space. While it provides abstractions for writing simple and elegant distributed code, the type system currently lacks a notion of locality, i.e., a description of an object's access behavior in relation to its actual location. This often necessitates programmer intervention to avoid redundant non-local data accesses. Moreover,...
In this paper, we propose an innovative approach to feature selection and model updating in big data machine learning. Since hard-drive access is the biggest barrier in big data problems, it is natural to reduce disk I/O operations when evaluating different combinations of features, or updating a learning machine. In particular, we are interested in discovering whether small enough matrices exist...