chapter

PDS: An I/O-Efficient Scaling Scheme for Parity Declustered Data Layout

Zhipeng Li, Yinlong Xu, Yongkun Li, Chengjin Tian, more

2017 46th International Conference on Parallel Processing (ICPP) > 402 - 411

2017 46th International Conference on Parallel Processing (ICPP)

Parity declustering is widely deployed in erasure coded storage systems so as to provide fast recovery and high data availability. However, to perform scaling on such RAIDs, it is necessary to preserve the parity declustered data layout so as to guarantee the RAID performance after scaling. Unfortunately, existing scaling algorithms fail to achieve this goal, so they cannot be applied for scaling...

chapter

Ctrl-C: Instruction-Aware Control Loop Based Adaptive Cache Bypassing for GPUs

Shin-Ying Lee, Carole-Jean Wu

2016 IEEE 34th International Conference on Computer Design (ICCD) > 133 - 140

2016 IEEE 34th International Conference on Computer Design (ICCD)

The performance of general-purpose graphics processing units (GPGPUs) is often limited by the efficiency of the memory subsystems, particularly the L1 data caches. Because of the massive multithreading computation paradigm, significant memory resource contention and cache thrashing are often observed in GPGPU workloads. This leads to high cache miss rates and substantial pipeline stall time. In order...

chapter

Quantitative Analysis of Graph Algorithms: Models and Optimization Methods

Xu Wang, Yongxin Zhu, Yufeng Chen

2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS) > 191 - 196

2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS)

With the prevalence of graph data in real-world applications and their ever-increasing size, many graph computing systems have been developed in recent years to scale the processing and analysis of massive graphs. However, few studies focus on modeling the performance of graph algorithms and systems to identify the bottleneck of existing machines under the workload of large-scale graphs. In this...

chapter

An active micro-electrode array with spike detection and asynchronous readout

Timir Datta-Chaudhuri, Bathiya Senevirathna, Alexander Castro, Elisabeth Smela, more

2014 IEEE Biomedical Circuits and Systems Conference (BioCAS) Proceedings > 588 - 591

2014 IEEE Biomedical Circuits and Systems Conference (BioCAS)

We present an active micro-electrode array for neural recording with integrated spike detection and an asynchronous readout architecture. Neural amplifier arrays generate voluminous data because of the necessary per-channel sampling rates and number of channels in a dense array. Most of the time, neural cells produce well below 100 spikes per second, with action potential durations generally on the...

chapter

CyGraph: A Reconfigurable Architecture for Parallel Breadth-First Search

Osama G. Attia, Tyler Johnson, Kevin Townsend, Philip Jones, more

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 228 - 235

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Large-scale graph structures are considered a keystone for many emerging high-performance computing applications in which Breadth-First Search (BFS) is an important building block. For such graph structures, BFS operations tend to be memory-bound rather than compute-bound. In this paper, we present an efficient reconfigurable architecture for parallel BFS that adopts new optimizations for utilizing...

chapter

Fast co-processor for real time video transmission

Yasser Ismail, Wael El-Medany, Hessa Al-Junaid, Ahmed Abdelgawad

2013 IEEE 20th International Conference on Electronics, Circuits, and Systems (ICECS) > 945 - 949

2013 IEEE 20th International Conference on Electronics, Circuits, and Systems (ICECS)

Motion Estimation (ME) is the most computationally intensive part of video compression. Speeding up this process will open new applications for high-speed video transmission. A Fast Motion Estimation (FME) co-processor architecture that efficiently reuses search area data is proposed. A small, efficient Processing Element (PE) and local memory are central to the architecture. The search area...

chapter

Adaptive Differential Evolution Algorithm to Design Broad-Band Array with Fixed Side Lobe Level and Optimum Voltage Standing Wave Ratio

B Basu, G K Mahanti

2011 International Conference on Devices and Communications (ICDeCom) > 1 - 5

2011 International Conference on Devices and Communications (ICDeCom)

This paper investigates the application of the Adaptive Differential Evolution (DE) algorithm to enhance the performance of a broadband antenna. A linear array consisting of half-wavelength-long, unequally spaced parallel dipoles is designed to meet the increasing demand for extremely large bandwidth in wireless communication systems. The array is devised to maintain almost invariable radiation patterns over...

chapter

An efficient routing algorithm for improving the QoS in Internet

Gavaskar Vincent, T Sasipraba

INTERACT-2010 > 381 - 387

2010 International Conference on Emerging Trends in Robotics and Communication Technologies (INTERACT 2010)

QoS Routing Algorithm is a routing algorithm for finding the shortest path that satisfies the QoS requirements of the end users. While finding the shortest path, it uses some improved ideas for effectively finding the shortest path with the required QoS measures. More precisely, the QoS routing algorithm is a kind of multi-constrained routing algorithm where more than one link component is taken into consideration...

chapter

A parallel FPGA design of the Smith-Waterman traceback

Zubair Nawaz, Muhammad Nadeem, Hans van Someren, Koen Bertels

2010 International Conference on Field-Programmable Technology > 454 - 459

2010 International Conference on Field-Programmable Technology (FPT 2010)

The Smith-Waterman (SW) algorithm is the only optimal local sequence alignment algorithm. There are many SW implementations on FPGA, which show speedups of up to 100x as compared to a general-purpose-processor (GPP). In this paper, we propose a design of the SW traceback, which is done in parallel with the matrix fill stage and which gives the optimal alignment after once scanning through the whole...

chapter

SOC chip scheduler embodying I-slip algorithm

T B Salankar, V A Nitnaware

NORCHIP 2010 > 1 - 8

2010 28th Norchip Conference (NORCHIP 2010)

We describe the methodology, design, and implementation of the scheduler block of an interconnect. The scheduler block is implemented in Verilog using the SYNOPSYS tools DVE and Design_vision. The interconnect is capable of handling 72-bit packets and a total of 32 packets at a time. There are 8 devices in total, and we have to establish communication between them. Each device consists of an input block...

chapter

Research about High Performance Disk Array in Stream Media Server

Wu Ding-Xue, Jiang Guo-song

2010 International Symposium on Intelligence Information Processing and Trusted Computing > 671 - 674

2010 International Symposium on Intelligence Information Processing and Trusted Computing (IPTC 2010)

This paper discusses the stream media application server in terms of stripe size, RAID algorithm, and block size. Based on these characteristics, we design and set up a stream media application server that has excellent performance.

chapter

Optimizing Sparse Matrix Vector Multiplication Using Diagonal Storage Matrix Format

Liang Yuan, Yunquan Zhang, Xiangzheng Sun, Ting Wang

2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC) > 585 - 590

2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC 2010)

Sparse matrix vector multiplication (SpMV) is used in many scientific computations. The main bottleneck of this algorithm is memory bandwidth and many methods reduce memory bandwidth usage by compressing the index array. The matrices from finite difference modeling applications often have several dense diagonals and sparse diagonals. For these matrices, the index array can be deleted by using diagonal...

chapter

Towards a multi-stream low-priority high throughput (multi)point-to-(multi)point data transport protocol

Mugurel Ionut Andreica, Alexandru Costan, Nicolae Tapus

Proceedings of the 2010 IEEE 6th International Conference on Intelligent Computer Communication and Processing > 427 - 434

2010 IEEE 6th International Conference on Intelligent Computer Communication and Processing (ICCP 2010)

In the first part of this paper we discuss the possibility of developing multi-stream (multi)point-to-(multi)point data transport protocols with low-priority behavior towards other communication flows and which fill the upload bandwidth efficiently. We propose and evaluate a multi-stream data transfer protocol which aims to achieve these goals. In the second part of the paper we discuss several problems...

chapter

A design case study: CPU vs. GPGPU vs. FPGA

Daniel L. Rosenband, Till Rosenband

2009 7th IEEE/ACM International Conference on Formal Methods and Models for Co-Design > 69 - 72

2009 7th IEEE/ACM International Conference on Formal Methods and Models for Co-Design (MEMOCODE)

This paper describes our winning submission for the Absolute Performance category of the MEMOCODE 2009 Design Contest. We show that our GPGPU-based design achieves performance within a factor of four of theoretical maximum performance for the implemented algorithm. This result was reached after a short design-cycle of 2 man-days, which indicates that the NVIDIA CUDA platform allows for rapid development...

chapter

Sequence-Based Data Dissemination Algorithms for Peer-to-Peer Multicast Protocols

M.J. Rostami, M. Saeed, S.M. Hoseininasab

2009 International Conference on Future Computer and Communication > 376 - 380

2009 International Conference on Future Computer and Communication (ICFCC)

Data dissemination in the existing peer-to-peer multicast protocols is performed from source node by delivering data to destination nodes over either a tree or a partial-mesh path structure covering all the multicast nodes. In this paper, we show that the existing tree-based or partial-mesh-based data dissemination algorithms do not perform efficiently in traditional customer-provider networks. We...

chapter

2QF: A New Replacement Algorithm for Ro-RAT System

Kang Jianbin, Wang Haishan, Ma Cheng, Jia Huibo

2009 WRI World Congress on Computer Science and Information Engineering > 4 > 91 - 95

2009 WRI World Congress on Computer Science and Information Engineering, CSIE

Tape is widely used in high-reliability storage systems. This paper introduces a storage system named Ro-RAT (Read only RAID-Tape-Library) which can be used in geological exploration storage. Tape is used as the storage medium in Ro-RAT. Due to the sequential access of tape devices, a disk cache is used to reduce access time. Based on a study of the access characteristics, a new replacement algorithm...

chapter

Scheduling algorithms for dedicated nodes in Alchemi grid

Z. Stanfel, G. Martinovic, Z. Hocenski

2008 IEEE International Conference on Systems, Man and Cybernetics > 2531 - 2536

2008 IEEE International Conference on Systems, Man and Cybernetics (SMC 2008)

Computational grids are useful tools for bringing supercomputing power to users by using idle resources in the network. In this paper we give a short overview of the architecture of the Alchemi grid, developed on the .NET platform. We created a grid application, which utilizes the Rabin-Karp string searching algorithm to test Alchemi grid performance in situations when requests put diverse demands for...

chapter

A BISR Architecture for Embedded Memories

K. Pekmestzi, N. Axelos, I. Sideris, N. Moshopoulos

2008 14th IEEE International On-Line Testing Symposium > 149 - 154

14th IEEE International On-Line Testing Symposium

In this paper a BISR architecture for embedded memories is presented. The proposed scheme utilises a multiple bank cache-like memory for repairs. Statistical analysis is used for minimisation of the total resources required to achieve a very high fault coverage. Simulation results show that the proposed BISR scheme is characterised by high efficiency and low area overhead, even for high defect densities...

chapter

Overview of DFT features of the Sun Microsystems Niagara2 CMP/CMT SPARC chip

T. Ziaja, M. Gala

2008 IEEE International Conference on Integrated Circuit Design and Technology and Tutorial > 151 - 154

2008 IEEE International Conference on IC Design and Technology & Tutorial (ICICDT)

The Niagara2 CMT system-on-chip incorporates many design-for-test features to achieve high test coverage for both arrays and logic. All the arrays are tested using memory built-in-self-test. This is supplemented with scan-based testing. Logic is tested with standard ATPG for slow-speed defects and extensive use of transition test, along with logic built-in-self-test for the SPARC cores, for at-speed...

chapter

The Design and Implement of 3D Terrain Matching Processor for Image Laser Radar

Junbin Gong, Yufei Sun, Hongbo Xu, Jinwen Tian

2008 Congress on Image and Signal Processing > 2 > 582 - 586

International Congress on Image and Signal Processing (CISP 2008)

The imaging laser radar is a fine measurement instrument for TAN, with the ability to acquire high-precision 3D terrain. A 3D terrain matching processor needed to be designed for this specific application. In this paper, based on the characteristics of the imaging laser radar, the 3D terrain matching processor was designed, with a scheme of DSP+FPGA calculating engine, multi-level memory system, flexible parallel...

INFONA - science communication portal
