Search results

Items from 1 to 20 out of 681 results

chapter

Maximizing CNN accelerator efficiency through resource partitioning

Yongming Shen, Michael Ferdman, Peter Milder

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 535 - 547

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)

Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection...

chapter

High throughput design and implementation of SHA-3 hash algorithm

Xufan Wu, Shuguo Li

2017 International Conference on Electron Devices and Solid-State Circuits (EDSSC) > 1 - 2

2017 International Conference on Electron Devices and Solid-State Circuits (EDSSC)

In this paper, we propose two different hardware structures for the SHA-3 hash algorithm, targeting different circuit interface widths. Both support the four SHA-3 functions SHA3-224/256/384/512. The padding unit of our design is also implemented in hardware instead of software. In addition, a 3-round-in-1 structure is proposed to increase the throughput of our circuit. We conduct an implementation...

chapter

H.265 inverse transform FPGA implementation in Impulse C

Slawomir Cichon, Marek Gorgon

2017 Federated Conference on Computer Science and Information Systems (FedCSIS) > 607 - 611

2017 Federated Conference on Computer Science and Information Systems (FedCSIS)

High Efficiency Video Coding (HEVC), a modern video compression standard, exceeds its predecessor H.264 in efficiency by 50%, but at the cost of increased complexity. It is one of the main research topics for FPGA engineers working on image compression algorithms. On the other hand, high-level synthesis tools, after a few years of lower interest from industry and academic research, started to gain more...

chapter

Automated generation of banked memory architectures in the high-level synthesis of multi-threaded software

Yu Ting Chen, Jason H. Anderson

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 8

2017 27th International Conference on Field Programmable Logic and Applications (FPL)

Some modern high-level synthesis (HLS) tools [1] permit the synthesis of multi-threaded software into parallel hardware, where concurrent software threads are realized as concurrently operating hardware units. A common performance bottleneck in any parallel implementation (whether it be hardware or software) is memory bandwidth — parallel threads demand concurrent access to memory resulting in contention...

chapter

Ultrasonic analysis modifications for imaging of concrete infrastructure

James Bittner, John Popovics

2017 IEEE International Ultrasonics Symposium (IUS) > 1

2017 IEEE International Ultrasonics Symposium (IUS)

The backbone of modern infrastructure is concrete, yet modern ultrasonic characterization techniques often fail to be accurate and repeatable. Recent advances in low frequency ultrasonic array hardware have generated a new potential for improved material and structural characterization of concrete. However, the analysis and interpretation associated with the new array based hardware is more difficult...

chapter

Closed-loop ultrasonic power and communication with multiple miniaturized active implantable medical devices

Max L. Wang, Ting Chia Chang, Thomas Teisberg, Marcus J. Weber, more

2017 IEEE International Ultrasonics Symposium (IUS) > 1 - 4

2017 IEEE International Ultrasonics Symposium (IUS)

An ultrasonic array system is designed for wirelessly powering and communicating with implantable medical devices. A flexible hardware interface is used to drive a 32-element linear phased array with emphasis on programmability and efficiency rather than bandwidth or peak power. Various beam patterns and data modulation schemes are verified at a depth of 6 cm in a tissue phantom with more than sufficient...

chapter

Exploiting half precision arithmetic in Nvidia GPUs

Nhut-Minh Ho, Weng-Fai Wong

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

2017 IEEE High Performance Extreme Computing Conference (HPEC)

With the growing importance of deep learning and energy-saving approximate computing, half precision floating point arithmetic (FP16) is fast gaining popularity. Nvidia's recent Pascal architecture was the first GPU that offered FP16 support. However, when actual products were shipped, programmers soon realized that a naïve replacement of single precision (FP32) code with half precision led to disappointing...

chapter

Large Scale Data Clustering Using Memristive k-Median Computation

Yomi Karthik Rupesh, Mahdi Nazm Bojnordi

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 374

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)

Clustering is a crucial tool for analyzing data in virtually every scientific and engineering discipline. The U.S. National Academy of Sciences (NAS) has recently announced "the seven giants of statistical data analysis" in which data clustering plays a central role [1]. This research also emphasizes that more scalable solutions are required to enable time and space clustering for the future...

chapter

POSTER: Design Space Exploration for Performance Optimization of Deep Neural Networks on Shared Memory Accelerators

Swagath Venkataramani, Jungwook Choi, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan, more

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 146 - 147

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)

The growing prominence and computational challenges imposed by Deep Neural Networks (DNNs) has fueled the design of specialized accelerator architectures and associated dataflows to improve their implementation efficiency. Each of these solutions serve as a datapoint on the throughput vs. energy trade-offs for a given DNN and a set of architectural constraints. In this paper, we set out to explore...

chapter

Gemini FPGA hardware platform for the SKA low correlator and beamformer

E. Kooistra, G. A. Hampson, A. W. Gunst, J. D. Bunton, more

2017 XXXIInd General Assembly and Scientific Symposium of the International Union of Radio Science (URSI GASS) > 1 - 4

2017 XXXIInd General Assembly and Scientific Symposium of the International Union of Radio Science (URSI GASS)

In this paper the hardware designed for the SKA Low (Square Kilometre Array) correlator and beamformer (CBF) is discussed. SKA-Low is a low frequency aperture array (LFAA) to be located in remote Western Australia. The array is collecting radio signals in the frequency range from 50 to 350 MHz. The large number of dual polarization antennas (131072) are distributed over a total of 512 stations with...

chapter

Early science results from ASKAP

Karen Lee-Waddell

2017 XXXIInd General Assembly and Scientific Symposium of the International Union of Radio Science (URSI GASS) > 1 - 4

2017 XXXIInd General Assembly and Scientific Symposium of the International Union of Radio Science (URSI GASS)

ASKAP has recently started its Early Science program with 12 MkII PAF-equipped antennas and 36 beams simultaneously covering a 30 square degree field of view. The first observations have focused on mapping extragalactic neutral hydrogen in galaxy groups and clusters selected by the ‘WALLABY’ Survey Science Team. Significant efforts from engineers, software designers, and scientists are overcoming...

chapter

Bitslice Vectors: A Software Approach to Customizable Data Precision on Processors with SIMD Extensions

Shixiong Xu, David Gregg

2017 46th International Conference on Parallel Processing (ICPP) > 442 - 451

2017 46th International Conference on Parallel Processing (ICPP)

Customizing the precision of data can provide attractive trade-offs between accuracy and hardware resources. Custom hardware and FPGA designs allow bit-level control over precision, but software is typically limited by the range of types supported by the underlying processor. We propose a new form of vector computing aimed at arrays of custom-precision data on general-purpose processors with SIMD...

chapter

Cider: A Case for Block Level Variable Redundancy on a Distributed Flash Array

Sharath Chandrashekhara, Madhusudhan R. Kumar, Mahesh Venkataramaiah, Vipin Chaudhary

2017 26th International Conference on Computer Communication and Networks (ICCCN) > 1 - 9

2017 26th International Conference on Computer Communication and Networks (ICCCN)

With the increase in data volumes, it is prudent to classify data depending on its criticality. One might prefer a cheap storage for a year old system logs but a highly fault tolerant storage for personal photos. The existing solutions include storing these two sets of data in two different systems or choosing a system with a fault tolerance level required by the most critical data. This means a higher...

chapter

A data remanence based approach to generate 100% stable keys from an SRAM physical unclonable function

Muqing Liu, Chen Zhou, Qianying Tang, Keshab K. Parhi, more

2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED) > 1 - 6

2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)

The start-up value of an SRAM cell is unique, random, and unclonable as it is determined by the inherent process mismatch between transistors. These properties make SRAM an attractive circuit for generating encryption keys. The primary challenge for SRAM based key generation, however, is the poor stability when the circuit is subject to random noise, temperature and voltage changes, and device aging...

chapter

Specifying architecture of knowledge graph with data graph, information graph, knowledge graph and wisdom graph

Yucong Duan, Lixu Shao, Gongzhu Hu, Zhangbing Zhou, more

2017 IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA) > 327 - 332

2017 IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA)

Knowledge graphs have been widely adopted, in large part owing to their schema-less nature. It enables knowledge graphs to grow seamlessly and allows for new relationships and entities as needed. Knowledge graph has become a powerful tool to represent knowledge in the form of a labelled directed graph and to give semantics to textual information. A knowledge graph is a graph constructed by representing...

chapter

Memory partitioning-based modulo scheduling for high-level synthesis

Tianyi Lu, Shouyi Yin, Xianqing Yao, Zhicong Xie, more

2017 IEEE International Symposium on Circuits and Systems (ISCAS) > 1 - 4

2017 IEEE International Symposium on Circuits and Systems (ISCAS)

High-Level Synthesis (HLS) has been widely recognized as an efficient compilation process targeting FPGAs for algorithm evaluation and product prototyping. However, the massively parallel memory access demands and the extremely expensive cost of single-bank memory with multi-port have impeded loop pipelining performance. Thus, based on an alternative multi-bank memory architecture, a joint approach...

chapter

AnalyzeThat: A Programmable Shared-Memory System for an Array of Processing-In-Memory Devices

Sangkuen Lee, Hyogi Sim, Youngjae Kim, Sudharshan S. Vazhkudai

2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) > 619 - 624

2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

Processing In Memory (PIM), the concept of integrating processing directly with memory, has been attracting a lot of attention since PIM can assist in overcoming the throughput limitation caused by data movement between CPU and memory. The challenge, however, is that it requires the programmers to have a deep understanding of the PIM architecture to maximize the benefits such as data locality and...

chapter

Exploring High Efficiency Hardware Accelerator for the Key Algorithm of Square Kilometer Array Telescope Data Processing

Qian Wu, Yongxin Zhu, Xu Wang, Mengjun Li, more

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 195

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

The SKA (Square Kilometer Array) radio telescope under construction will become the largest telescope in the world by integrating the sampled data from a huge number of small antenna nodes in the array to emulate a giant antenna. Due to the limited storage space, the SKA needs to process massive data in real-time, which makes the SKA scientific data processing become a bottleneck of the computational...

chapter

TimerShield: Protecting High-Priority Tasks from Low-Priority Timer Interference (Outstanding Paper)

Pratyush Patel, Manohar Vanga, Bjorn B. Brandenburg

2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS) > 3 - 12

2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)

Timer interference arises when a high-priority realtime task is delayed by a timer interrupt that is intended for a lower-priority task. We demonstrate that high-resolution timers, as exposed for instance by Linux's hrtimer API, can cause substantial timer interference, which manifests as significantly increased response times and lowered throughput. To eliminate this source of unpredictability, we...

chapter

A static-placement, dynamic-issue framework for CGRA loop accelerator

Zhongyuan Zhao, Weiguang Sheng, Weifeng He, ZhiGang Mao, more

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 1348 - 1353

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

This paper presents a static-placement, dynamic-issue (SPDI) framework for the coarse-grained reconfigurable architecture (CGRA) in order to tackle the inefficiencies of the static-issue, static-placement (SISP) CGRA. This framework includes the compiler that statically places the operations and hardware design, a SPDI CGRA, that automatically schedule the operations. We stress on introducing the...

Keywords:
ARRAYS
HARDWARE

