Parity declustering is widely deployed in erasure-coded storage systems to provide fast recovery and high data availability. However, scaling such RAIDs requires preserving the parity-declustered data layout in order to guarantee RAID performance after scaling. Unfortunately, existing scaling algorithms fail to achieve this goal, so they cannot be applied for scaling...
The performance of general-purpose graphics processing units (GPGPUs) is often limited by the efficiency of the memory subsystems, particularly the L1 data caches. Because of the massive multithreading computation paradigm, significant memory resource contention and cache thrashing are often observed in GPGPU workloads. This leads to high cache miss rates and substantial pipeline stall time. In order...
With the prevalence of graph data in real-world applications and their ever-increasing size, many graph computing systems have been developed in recent years to scale the processing and analysis of massive graphs. However, few studies focus on modeling the performance of graph algorithms and systems to identify the bottleneck of existing machines under the workload of large-scale graphs. In this...
We present an active micro-electrode array for neural recording with integrated spike detection and an asynchronous readout architecture. Neural amplifier arrays generate voluminous data because of the necessary per-channel sampling rates and number of channels in a dense array. Most of the time, neural cells produce well below 100 spikes per second, with action potential durations generally on the...
Large-scale graph structures are considered a keystone for many emerging high-performance computing applications in which Breadth-First Search (BFS) is an important building block. For such graph structures, BFS operations tend to be memory-bound rather than compute-bound. In this paper, we present an efficient reconfigurable architecture for parallel BFS that adopts new optimizations for utilizing...
Motion Estimation (ME) is the most computationally intensive part of video compression. Speeding up this process will open new applications for high-speed video transmission. A Fast Motion Estimation (FME) co-processor architecture that efficiently reuses search area data is proposed. A small, efficient Processing Element (PE) and local memory are central to the architecture. The search area...
This paper investigates the application of the Adaptive Differential Evolution (DE) algorithm to enhance the performance of a broadband antenna. A linear array consisting of half-wavelength-long, unequally spaced parallel dipoles is designed to meet the increasing demand for extremely large bandwidth in wireless communication systems. The array is devised to maintain an almost invariant radiation pattern over...
The QoS routing algorithm finds the shortest path that satisfies the QoS requirements of the end users. While searching, it uses several improved ideas to find the shortest path with the required QoS measures effectively. More precisely, the QoS routing algorithm is a kind of multi-constrained routing algorithm in which more than one link component is taken into consideration...
The Smith-Waterman (SW) algorithm is the only optimal local sequence alignment algorithm. There are many SW implementations on FPGA, which show speedups of up to 100x compared to a general-purpose processor (GPP). In this paper, we propose a design of the SW traceback, which is done in parallel with the matrix fill stage and which gives the optimal alignment after scanning once through the whole...
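For readers unfamiliar with the two stages named in the abstract above (matrix fill and traceback), the following is a minimal software sketch of the textbook Smith-Waterman algorithm with a linear gap penalty. It is not the FPGA design from the paper, which overlaps the two stages in hardware; here they run one after the other. The function name and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for a local alignment.

    Scoring parameters are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Stage 1: matrix fill. Each cell depends on its left, upper and
    # upper-left neighbours; scores are clamped at zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Stage 2: traceback from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("XXACGTYY", "ACGT")
```

In this sequential form the whole score matrix must be kept until the fill finishes, which is the cost a traceback performed during the fill would aim to avoid.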
We describe the methodology, design, and implementation of the scheduler block of an interconnect. The scheduler block is implemented in Verilog using the SYNOPSYS tools DVE and Design_vision. The interconnect is capable of handling 72-bit packets and a total of 32 packets at a time. There are 8 devices in total, and communication must be established between them. Each device consists of an input block...
This paper discusses a streaming media application server in terms of stripe size, RAID algorithm, and block size. Based on these characteristics, we design and set up a streaming media application server with excellent performance.
Sparse matrix-vector multiplication (SpMV) is used in many scientific computations. The main bottleneck of this algorithm is memory bandwidth, and many methods reduce memory bandwidth usage by compressing the index array. The matrices from finite difference modeling applications often have several dense diagonals and sparse diagonals. For these matrices, the index array can be deleted by using diagonal...
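The abstract above is truncated, so the authors' exact storage format is not shown. As a generic illustration of why storing a matrix by diagonals removes the per-element index array, here is a minimal sketch of SpMV in the textbook diagonal (DIA) layout, where only one offset per diagonal is kept. The function name `spmv_dia` and the padded-list layout are assumptions made for this example.

```python
def spmv_dia(offsets, diags, x):
    """Compute y = A @ x for a square matrix A stored by diagonals.

    offsets[k] is the offset of diagonal k (0 = main, +1 = first
    super-diagonal, -1 = first sub-diagonal). diags[k][i] holds
    A[i][i + offsets[k]]; padding entries outside the matrix are skipped.
    """
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, diags):
        # Rows i whose column i + off lies inside the matrix.
        start = max(0, -off)
        stop = min(n, n - off)
        for i in range(start, stop):
            # The column index is computed from the offset, not loaded.
            y[i] += diag[i] * x[i + off]
    return y


# Tridiagonal 4x4 example: 2 on the main diagonal, -1 on both neighbours.
offsets = [-1, 0, 1]
diags = [
    [0.0, -1.0, -1.0, -1.0],  # sub-diagonal (row 0 entry is padding)
    [2.0, 2.0, 2.0, 2.0],     # main diagonal
    [-1.0, -1.0, -1.0, 0.0],  # super-diagonal (row 3 entry is padding)
]
y = spmv_dia(offsets, diags, [1.0, 2.0, 3.0, 4.0])  # [0.0, 0.0, 0.0, 5.0]
```

The trade-off is that a diagonal with only a few nonzeros still occupies a full row of storage, which is presumably why the abstract distinguishes dense diagonals from sparse ones.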
In the first part of this paper we discuss the possibility of developing multi-stream (multi)point-to-(multi)point data transport protocols that behave with low priority towards other communication flows and fill the upload bandwidth efficiently. We propose and evaluate a multi-stream data transfer protocol which aims to achieve these goals. In the second part of the paper we discuss several problems...
This paper describes our winning submission for the Absolute Performance category of the MEMOCODE 2009 Design Contest. We show that our GPGPU-based design achieves performance within a factor of four of the theoretical maximum performance for the implemented algorithm. This result was reached after a short design cycle of 2 man-days, which indicates that the NVIDIA CUDA platform allows for rapid development...
Data dissemination in existing peer-to-peer multicast protocols is performed from the source node by delivering data to destination nodes over either a tree or a partial-mesh path structure covering all the multicast nodes. In this paper, we show that the existing tree-based or partial-mesh-based data dissemination algorithms do not perform efficiently in traditional customer-provider networks. We...
Tape is widely used in high-reliability storage systems. This paper introduces a storage system named Ro-RAT (Read only RAID-Tape-Library) which can be used in geological exploration storage. Tape is used as the storage medium in Ro-RAT. Because tape devices are accessed sequentially, a disk cache is used to reduce access time. Based on a study of the access characteristics, a new replacement algorithm...
Computational grids are useful tools for bringing supercomputing power to users by using idle resources in the network. In this paper we give a short overview of the architecture of the Alchemi grid, developed on the .NET platform. We created a grid application which uses the Rabin-Karp string searching algorithm to test Alchemi grid performance in situations where requests put diverse demands for...
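As background for the workload named in the abstract above, the following is a minimal single-machine sketch of the Rabin-Karp string search using a polynomial rolling hash. It does not reproduce the Alchemi grid application or its API. The function name and the `base` and `mod` values are assumptions chosen for illustration.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_003):
    """Return the start indices of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in an m-character window.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first window of the text.
    p_hash = t_hash = 0
    for k in range(m):
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        t_hash = (t_hash * base + ord(text[k])) % mod

    hits = []
    for i in range(n - m + 1):
        # Equal hashes are only a hint; compare the strings to rule out
        # collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits


positions = rabin_karp("abracadabra", "abra")  # [0, 7]
```

Because each text chunk can be searched independently (with an overlap of `len(pattern) - 1` characters at chunk boundaries), this kind of search splits naturally into independent work units.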
In this paper a BISR architecture for embedded memories is presented. The proposed scheme utilises a multiple bank cache-like memory for repairs. Statistical analysis is used for minimisation of the total resources required to achieve a very high fault coverage. Simulation results show that the proposed BISR scheme is characterised by high efficiency and low area overhead, even for high defect densities...
The Niagara2 CMT system-on-chip incorporates many design-for-test features to achieve high test coverage for both arrays and logic. All the arrays are tested using memory built-in-self-test. This is supplemented with scan-based testing. Logic is tested with standard ATPG for slow-speed defects and extensive use of transition test, along with logic built-in-self-test for the SPARC cores, for at-speed...
Imaging laser radar is a fine measurement instrument for TAN, able to capture high-precision 3D terrain. A 3D terrain matching processor needed to be designed for this specific application. In this paper, based on the characteristics of the imaging laser radar, the 3D terrain matching processor was designed with a scheme comprising a DSP+FPGA calculating engine, a multi-level memory system, flexible parallel...