Noise in an image is random variation in intensity due to intrinsic or extrinsic sources. This paper proposes a high-throughput fixed-point Discrete Kalman Filter (DKF) architecture for denoising images with additive white Gaussian noise (AWGN) in real time. A linearized state model based on neighbor pixel similarity is used for improving the PSNR of the noisy image. A 5-stage two parallel bi-functional...
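To make the idea of Kalman-based denoising concrete, here is a minimal floating-point sketch of a scalar Kalman filter run along one image row. It is not the architecture from the abstract: the random-walk state model (each pixel equals its left neighbor plus process noise), the function name `kalman_denoise_row`, and the default variances `q` and `r` are all assumptions chosen for illustration, and the fixed-point and pipelining aspects are not modeled.

```python
def kalman_denoise_row(row, q=1.0, r=25.0):
    """Denoise one row of pixel intensities with a scalar Kalman filter.

    Assumed model (illustrative, not taken from the paper):
      state:       x[k] = x[k-1] + w,  w ~ N(0, q)   (neighbor similarity)
      measurement: z[k] = x[k]   + v,  v ~ N(0, r)   (AWGN)
    """
    if not row:
        return []
    x = float(row[0])  # initial estimate: the first noisy pixel
    p = r              # initial error variance: the measurement variance
    out = [x]
    for z in row[1:]:
        p = p + q                # predict: uncertainty grows by q
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)      # update the estimate with the new pixel
        p = (1.0 - k) * p        # update the error variance
        out.append(x)
    return out


if __name__ == "__main__":
    noisy = [100, 104, 97, 101, 99, 140, 138, 142]
    print([round(v, 1) for v in kalman_denoise_row(noisy)])
```

A larger `q` relative to `r` makes the filter follow edges more quickly at the cost of less smoothing.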
We have implemented a flexible User Defined Operator (UDO) for labeling connected components of a binary mask expressed as an array in SciDB, a parallel distributed database management system based on the array data model. This UDO is able to process very large multidimensional arrays by exploiting SciDB's memory management mechanism that efficiently manipulates arrays whose memory requirements far...
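For readers unfamiliar with the operation itself, the sketch below labels the connected components of a small 2-D binary mask in memory. It only illustrates what labeling means; it does not use SciDB or its UDO interface, and the function name `label_components`, the 4-neighbor connectivity, and the breadth-first flood fill are assumptions for this example.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected components of a 2-D binary mask.

    Returns (labels, count): labels has the same shape as mask, with 0 for
    background and 1..count for the components in row-major discovery order.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue  # background, or already labeled
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed cell
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


if __name__ == "__main__":
    mask = [[1, 1, 0],
            [0, 0, 0],
            [0, 1, 1]]
    print(label_components(mask))
```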
Small ARM Cortex-M0 PC boards, called MARY, which can be linked to form an array network for parallel processing, were applied to real-time multi-tasking DSP. Multiple MARY boards coordinate the multi-tasking DSP process via four directional communication ports (N, S, W, E) without using shared memory, as the Transputer (developed in the 1980s) does. The validity of the Transputer concept applied to a low-end embedded...
This paper presents an analysis of different alternatives for the realization of a VLSI cell in a nonlinear neuronal array, based on a simplicial piecewise linear (PWL) operation. Depending on the type of existing design constraints, namely, speed or density, different bus sizes can be used to broadcast the parameters stored in the memory, and in addition, row and column operations can be serialized...
Real-time systems typically suffer from delay in data processing. This delay has many causes, such as computational power, processor unit architecture, and synchronization signals in these systems. To increase processing power, and hence performance, this paper introduces a new architecture and clocking technique. This new architecture design, called Embedded Parallel Systolic...
Emerging 3D-integration enables integrating high quality image sensors with various massively parallel processing elements. Analog motion estimation is one potential application, which is likely to result in significant benefits in the form of low power or high frame-rate 3D-integrated image sensor-processors. The system-level operation of a proposed analog motion estimation array, enabling all various...
Current research is mainly focussing on exploiting TLP to increase performance. Another avenue, however, for achieving performance scalability is specialization. In this paper we propose application specific intra-vector instructions for two dimensional signal processing kernels. In such kernels usually significant data rearrangement overhead is required in order to use the SIMD capabilities. When...
Heterogeneous multi-core architectures are considered prominent candidates for next-generation processors. The Cell B.E. (Broadband Engine) was originally designed as a processor suitable for streaming applications such as media processing, but it can also be applied to other high performance computing applications. In order to confirm the effectiveness of the Cell B.E. for other HPC applications, the parallel...
The trend toward high processing power at a reasonable cost continues with the emergence of multi-core architectures with a large number of cores. In such computing systems, a major technological challenge is to design the internal, on-chip communication network. This not only depends on high performance in latency, bandwidth, and fairness in contention under heavy loads, but also depends on an efficient...
A target-independent specification model, called CIC (Common Intermediate Code), has been proposed to specify an application in such a way that all potential functional and data parallelism is explicitly defined by the programmer. After the application is mapped to the target processors so as to exploit the parallelism optimally, the CIC translator synthesizes the target-specific code automatically...
Numerical simulations in computational physics, biology, and finance often require the use of high quality and efficient parallel random number generators. We design and optimize several parallel pseudo random number generators on the Cell Broadband Engine, with minimal correlation between the parallel streams: the linear congruential generator (LCG) with 64-bit prime addend and the Mersenne Twister...
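As a rough illustration of the LCG approach mentioned above, the following sketch builds several 64-bit LCG streams that share a multiplier and differ only in a prime addend. It is not the Cell implementation from the abstract: the class name `Lcg64`, the helper `make_streams`, the multiplier (Knuth's MMIX constant), and the tiny primes in `STREAM_ADDENDS` are assumptions picked for readability.

```python
MASK64 = (1 << 64) - 1

# Assumption: Knuth's MMIX multiplier; the abstract does not name one.
MULTIPLIER = 6364136223846793005

# Assumption: small odd primes for readability, one per parallel stream.
STREAM_ADDENDS = (3, 5, 7, 11)


class Lcg64:
    """64-bit LCG: state <- (MULTIPLIER * state + addend) mod 2**64."""

    def __init__(self, seed, addend):
        self.state = seed & MASK64
        self.addend = addend

    def next_u64(self):
        """Advance the generator and return the new 64-bit state."""
        self.state = (MULTIPLIER * self.state + self.addend) & MASK64
        return self.state

    def next_float(self):
        """Return a float in [0, 1) built from the top 53 bits."""
        return (self.next_u64() >> 11) / float(1 << 53)


def make_streams(seed, count):
    """Create `count` generators with one seed but distinct prime addends."""
    if count > len(STREAM_ADDENDS):
        raise ValueError("not enough addends for the requested stream count")
    return [Lcg64(seed, STREAM_ADDENDS[i]) for i in range(count)]


if __name__ == "__main__":
    for i, gen in enumerate(make_streams(seed=12345, count=4)):
        print(i, [gen.next_u64() for _ in range(3)])
```

Giving each processing element its own addend lets every stream be advanced independently, with no communication between elements.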
This paper presents a PIM-based (Processing-in-Memory) architecture based on a new reconfigurable cell data path. The architecture delivers increased power/throughput/area efficiency compared to previous well-known architectures. The investigation of the new reconfigurable cell design was performed in 0.18 and 0.13 micron CMOS technology nodes. Specifications of individual blocks are presented as well...
This paper describes several methodologies based on a pulsed laser beam to reveal the architecture of a highly integrated SDRAM, and the different classes of Single Event Effects that can occur due to cosmic radiation. At cell level, the laser is used to reveal an important technological parameter: the lithography process. At memory array level, the laser is a powerful tool to retrieve cell physical arrangements,...
Array computers can be useful in the solution of numerical spatiotemporal problems such as partial differential equations (PDEs). IBM has recently introduced the Cell Broadband Engine (Cell BE) Architecture, which contains 8 identical vector processors in an array structure. In this paper, the implementation of the 3-D Princeton Ocean Model on the Cell BE is discussed. The area/speed/power tradeoffs...
Imaging laser radar is a precise measurement instrument for TAN, able to capture high-precision 3D terrain. A 3D terrain matching processor needed to be designed for this specific application. In this paper, based on the characteristics of imaging laser radar, the 3D terrain matching processor was designed with a DSP+FPGA calculating engine scheme, a multi-level memory system, flexible parallel...
We introduce a network of processing elements, the cube-connected-cycles (CCC), complying with the present technological constraints of VLSI design. By combining the principles of parallelism and pipelining, the CCC can emulate the cube-connected machine with no significant degradation of performance but with a much more compact structure. We describe in detail how to program the CCC for efficiently...
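The abstract does not spell out the topology, so the sketch below uses the commonly cited definition of a d-dimensional CCC as an assumption: each corner of a d-dimensional hypercube is replaced by a cycle of d nodes, and the node at cycle position i also links across hypercube dimension i. The function names `ccc_neighbors` and `ccc_nodes` are hypothetical.

```python
def ccc_neighbors(corner, pos, d):
    """Neighbors of node (corner, pos) in a d-dimensional CCC.

    corner: hypercube corner as an integer in [0, 2**d)
    pos:    position on that corner's cycle, in [0, d)
    """
    return [
        (corner, (pos - 1) % d),      # previous node on the same cycle
        (corner, (pos + 1) % d),      # next node on the same cycle
        (corner ^ (1 << pos), pos),   # hypercube link across dimension pos
    ]


def ccc_nodes(d):
    """All d * 2**d nodes of the network."""
    return [(corner, pos) for corner in range(1 << d) for pos in range(d)]


if __name__ == "__main__":
    d = 3
    print(len(ccc_nodes(d)))        # node count for d = 3
    print(ccc_neighbors(0, 0, d))   # neighbors of corner 0, position 0
```

Every node has at most three links regardless of d, which is what keeps the layout compact compared with a hypercube whose node degree grows with d.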