Search results

chapter

Toward a pixel-parallel architecture for graph cuts inference on FPGA

Tianqi Gao, Jungwook Choi, Shang-nien Tsai, Rob A. Rutenbar

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

The method of Graph Cuts converts a Maximum a Posteriori (MAP) inference problem on a Markov Random Field (MRF) into a network flow problem, which can be solved efficiently. Many computer vision problems can be conveniently cast as an inference task to find the most likely labels for pixels. The method is widely used, but computationally burdensome. Prior accelerator attempts have failed to exploit the problem's...

chapter

Interleaved logic-in-memory architecture for energy-efficient fine-grained data processing

Kai Yang, Robert Karam, Swarup Bhunia

2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS) > 409 - 412

For a growing pool of data-intensive applications, data transfer, rather than processing speed, has emerged as the major bottleneck to performance and energy scalability. In this paper, we propose a novel interleaved logic-in-memory architecture, referred to as MISK, which leverages fine-grained integration of logic functions within dense, 2-D static random-access memory (SRAM) arrays for in-situ...

chapter

Bitslice Vectors: A Software Approach to Customizable Data Precision on Processors with SIMD Extensions

Shixiong Xu, David Gregg

2017 46th International Conference on Parallel Processing (ICPP) > 442 - 451

Customizing the precision of data can provide attractive trade-offs between accuracy and hardware resources. Custom hardware and FPGA designs allow bit-level control over precision, but software is typically limited by the range of types supported by the underlying processor. We propose a new form of vector computing aimed at arrays of custom-precision data on general-purpose processors with SIMD...

chapter

Finding partial hash collisions by brute force parallel programming

Vincent Chiriaco, Aubrey Franzen, Rebecca Thayil, Xiaowen Zhang

2017 IEEE Long Island Systems, Applications and Technology Conference (LISAT) > 1 - 6

A hash function maps a message of arbitrary length to a much shorter bit string of fixed length, called a hash. Inevitably, many different messages are hashed to the same or a similar hash. We call this a hash collision or a partial hash collision, respectively. By utilizing multiple processors from the CUNY High Performance Computing Center's clusters, we can locate partial collisions...
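To make the notion of a partial hash collision concrete, here is a minimal single-process sketch in Python. It is not the authors' parallel method: the choice of SHA-256, the definition of "partial" as agreement on the first k bits of the digest, and the names `leading_bits` and `find_partial_collision` are all assumptions made for illustration.

```python
import hashlib
from itertools import count


def leading_bits(data: bytes, k: int) -> int:
    """Return the first k bits (1 <= k <= 256) of the SHA-256 digest as an int."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - k)


def find_partial_collision(k: int, prefix: bytes = b"msg-"):
    """Search candidate messages until two distinct ones agree on the first k digest bits.

    Assumption: candidates are the prefix followed by a decimal counter.
    By the pigeonhole principle the loop ends after at most 2**k + 1 candidates.
    """
    seen = {}  # maps a k-bit digest prefix to the first message that produced it
    for i in count():
        msg = prefix + str(i).encode()
        key = leading_bits(msg, k)
        if key in seen:
            return seen[key], msg
        seen[key] = msg


if __name__ == "__main__":
    a, b = find_partial_collision(16)
    print(a, b, hex(leading_bits(a, 16)))
```

A parallel version would presumably split the candidate space across processors, but how the paper does this is not stated in the excerpt above.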

chapter

A Pipelined and Scalable Dataflow Implementation of Convolutional Neural Networks on FPGA

Marco Bacis, Giuseppe Natale, Emanuele Del Sozzo, Marco Domenico Santambrogio

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 90 - 97

The Convolutional Neural Network (CNN) is a deep learning algorithm extended from the Artificial Neural Network (ANN) and is widely used for image classification and recognition, thanks to its invariance to distortions. The recent rapid growth of applications based on deep learning algorithms, especially in the context of Big Data analytics, has dramatically improved both industrial and academic research and...

chapter

Stabbing Colors in One Dimension

Arnab Ganguly, Wing-Kai Hon, Rahul Shah

2017 Data Compression Conference (DCC) > 280 - 289

Given n horizontal segments, each associated with a color from [σ], the Categorical Segment Stabbing problem is to find the K distinct colors stabbed by a vertical line. When the end-points of the segments are distinct and lie in [1, 2n], we present a (2 + ε)n log σ + O(n)-bit index with O(K/ε) query time, where ε ∈ (0, 1]. When the end-points are arbitrary real numbers, a standard reduction to the...
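The problem statement can be illustrated with a naive baseline in Python that scans every segment per query. This only pins down what a query should return; it is not the compact index the abstract describes, and the `(left, right, color)` tuple layout, the closed-interval convention, and the name `stabbed_colors` are assumptions for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (left end-point, right end-point, color id).
Segment = Tuple[float, float, int]


def stabbed_colors(segments: Iterable[Segment], x: float) -> Set[int]:
    """Return the distinct colors of segments crossed by the vertical line at x.

    Assumes closed intervals, so a segment is stabbed when left <= x <= right.
    Runs in O(n) time per query, unlike an output-sensitive index.
    """
    return {color for left, right, color in segments if left <= x <= right}


if __name__ == "__main__":
    segs = [(1, 4, 0), (2, 6, 1), (5, 8, 0), (7, 9, 2)]
    print(stabbed_colors(segs, 3))
```

The difficulty the abstract addresses is that many stabbed segments may share a color, so a query time proportional to K rather than to the number of stabbed segments requires a dedicated data structure.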

chapter

An improved automatic MPI code generation algorithm for parallelizing compilation

Yangxia Xiang, Caisen Chen, Hongyan Wang, Zeyun Zhou

2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) > 1623 - 1626

Open64 is an open source compiler with powerful analysis capabilities that is widely used as a research and commercial development platform. However, it was not designed and developed to support MPI parallelization. This paper makes several contributions. Firstly, the Open64 compiler infrastructure is presented. Secondly, the location of MPI code generation in the Open64 compiler architecture is analyzed. Thirdly,...

chapter

A Workload Sensitive Dynamic Scaling Matrix Multiplier Structure

Yuran Qiao, Junzhong Shen, Tao Xiao, Qianming Yang

2016 8th International Conference on Computational Intelligence and Communication Networks (CICN) > 548 - 552

Matrix multiplication is one of the most widely used computational kernels in scientific computing and machine learning. Using a dedicated circuit for matrix multiplication can reduce the computation time and energy consumption. Traditional matrix multipliers always adopt a linear array architecture, which works inefficiently when the size of the matrix sub-block is much smaller than the array length. Using...

chapter

Extreme scale breadth-first search on supercomputers

Koji Ueno, Toyotaro Suzumura, Naoya Maruyama, Katsuki Fujisawa, et al.

2016 IEEE International Conference on Big Data (Big Data) > 1040 - 1047

Breadth-First Search (BFS) is one of the most fundamental graph algorithms and is used as a component of many other graph algorithms. Our new method for distributed parallel BFS can compute BFS for a graph with one trillion vertices within half a second, using large supercomputers such as the K-Computer. Using our proposed algorithm, the K-Computer was ranked 1st in Graph500 using all the 82,944 nodes available...

chapter

A Parallel Algorithm for Finding All Pairs κ-Mismatch Maximal Common Substrings

Sriram P. Chockalingam, Sharma V. Thankachan, Srinivas Aluru

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis > 784 - 794

We present an efficient parallel algorithm for the following problem: Given an input collection D of n sequences of total length N, a length threshold f and a mismatch threshold κ, report all κ-mismatch maximal common substrings of length at least f over all pairs of strings in D. This problem is motivated by clustering and assembly applications in computational biology, where D is a collection of...

chapter

Way prediction set-associative data cache for low power digital signal processors

Leiou Wang, Donghui Wang

2016 IEEE 13th International Conference on Signal Processing (ICSP) > 508 - 512

In digital signal processors, set-associative caches achieve low miss rates for typical applications but result in significant power consumption. Set-associative caches decrease access time by probing all the data ways in parallel with the tag lookup, although the output of only the matching way is used. The power spent accessing the other ways is wasted. Eliminating the power consumption by performing...

chapter

Overcoming the power wall by exploiting inexactness and emerging COTS architectural features: Trading precision for improving application quality

Mike Fagan, Jeremy Schlachter, Kazutomo Yoshii, Sven Leyffer, et al.

2016 29th IEEE International System-on-Chip Conference (SOCC) > 241 - 246

Energy and power consumption are major limitations to continued scaling of computing systems. Inexactness, where the quality of the solution can be traded for energy savings, has been proposed as a counterintuitive approach to overcoming those limitations. However, in the past, inexactness has necessitated highly customized or specialized hardware. In order to move away from customization,...

chapter

StructSlim: A lightweight profiler to guide structure splitting

Probir Roy, Xu Liu

2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 36 - 46

Memory access latency continues to be a dominant bottleneck in a large class of applications on modern architectures. To optimize memory performance, it is important to utilize the locality in the memory hierarchy. Structure splitting can significantly improve memory locality. However, pinpointing inefficient code and providing insightful guidance for structure splitting is challenging. Existing tools...

chapter

A high speed shuffle bus for VLSI arrays

Wen-Tai Lin, Jyh-Ping Hwang

1987 Symposium on VLSI Circuits > 41 - 42

Due to the concerns of two-dimensional layout and structural modularity, interprocessor data transfers for VLSI arrays, such as systolic/wavefront processors, are normally achieved by way of neighborhood communication. Although interconnection networks are designed to enhance global communication for non-systolic types of processing, it is not feasible to incorporate the processors and global interconnections...

chapter

Automatic Code Generation of Distributed Parallel Tasks

Nelson Lossing, Corinne Ancourt, Francois Irigoin

2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES) > 234 - 241

With the advent of clustered systems, more and more parallel computing is required. However, considerable programming skill is needed to write parallel code, especially when you want to benefit from the various parallel architectural resources, with heterogeneous units and complex memory organizations. We present in this paper a method that automatically generates, step by step, a task-parallel distributed...

chapter

Contrastive analysis of bubble & merge sort proposing hybrid approach

Sehrish Munawar Cheema, Nadeem Sarwar, Fatima Yousaf

2016 Sixth International Conference on Innovative Computing Technology (INTECH) > 371 - 375

A sorting algorithm puts the elements of a list in a certain order, which makes searching for and locating information easier. The most-used orders are numerical order and lexicographical order. An efficient sorting algorithm is one with low time and space complexity. In this paper I make a contrastive analysis of bubble sort and merge sort and try to show why a new approach is required to get...

chapter

Design of control systems for parallel computing structures based on net models

Vladimir Kulagin

2016 International Siberian Conference on Control and Communications (SIBCON) > 1 - 4

This paper addresses the issue of designing control systems for parallel computing structures. The design methodology described is based on Petri nets, which are used to model computing systems of different dimensionality. Then a description of the vertex projection procedure for Petri net models (PN-models), which allows constructing new models with differing structural and dynamical properties, is presented. Afterwards...

chapter

Enhancing the EDUCache simulator with visualization of cache performance

Marjan Gusev, Sasko Ristov, Dimitrij Mijoski

2016 IEEE Global Engineering Education Conference (EDUCON) > 375 - 382

Computer science students use data array processing in many courses. To exploit the full power of caches and obtain higher performance, they mostly use the textbook example of sequential access of data arrays. However, a lot of discrepancies occur and the expected performance is not obtained in real life program executions, mostly due to the existence of several cache levels, with various architectures...

chapter

Designing customized ISA processors using high level synthesis

Sam Skalicky, Tejaswini Ananthanarayana, Sonia Lopez, Marcin Lukowiak

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) > 1 - 6

In this paper we propose a new degree of flexibility for soft processor design in which only the instructions relevant to the task at hand are implemented as a subset of the Instruction Set Architecture (ISA). These customized processors execute software kernels in the usual way, yet can be implemented with a fraction of the hardware resources used by other full-ISA soft processor cores. We present...

chapter

Exploiting Pure Superword Level Parallelism for Array Indirections

Huihui Sun, Rongcai Zhao, Wei Gao, Yi Gong, et al.

2015 Seventh International Symposium on Parallel Architectures, Algorithms and Programming (PAAP) > 13 - 19

SIMD (Single Instruction Multiple Data) extension units are ubiquitous in modern processors. Array indirections raise several challenges for SIMD vectorization, including disjoint memory accesses, unknown alignment, and dependence cycles. Existing SIMD automatic vectorization methods fail to handle these challenges well. This paper presents a new method exploiting Pure SLP (Superword Level Parallelism)...

INFONA - science communication portal
