Search results

Items from 1 to 20 out of 978 results

chapter

Maximizing CNN accelerator efficiency through resource partitioning

Yongming Shen, Michael Ferdman, Peter Milder

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 535 - 547

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)

Convolutional neural networks (CNNs) are revolutionizing machine learning, but they present significant computational challenges. Recently, many FPGA-based accelerators have been proposed to improve the performance and efficiency of CNNs. Current approaches construct a single processor that computes the CNN layers one at a time; the processor is optimized to maximize the throughput at which the collection...

chapter

APPROX-NoC: A data approximation framework for Network-on-Chip architectures

Rahul Boyapati, Jiayi Huang, Pritam Majumder, Ki Hwan Yum, more

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 666 - 677

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)

The trend of unsustainable power consumption and large memory bandwidth demands in massively parallel multicore systems, with the advent of the big data era, has brought upon the onset of alternate computation paradigms utilizing heterogeneity, specialization, processor-in-memory and approximation. Approximate Computing is being touted as a viable solution for high performance computation by relaxing...

chapter

Understanding and optimizing asynchronous low-precision stochastic gradient descent

Christopher De Sa, Matthew Feldman, Christopher Re, Kunle Olukotun

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) > 561 - 574

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)

Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in machine learning and other domains. Since this is likely to continue for the foreseeable future, it is important to study techniques that can make it run fast on parallel hardware. In this paper, we provide the first analysis of a technique called BUCKWILD! that uses both asynchronous execution and low-precision...

chapter

Hardware accelerator for coordinated radioresource scheduling in 5G ultra-high-density distributed antenna systems

Yuki Arikawa, Takeshi Sakamoto, Shunji Kimura

2017 27th International Telecommunication Networks and Applications Conference (ITNAC) > 1 - 6

2017 27th International Telecommunication Networks and Applications Conference (ITNAC)

This paper presents a novel radio-resource scheduler with a hardware accelerator for coordinated scheduling in 5G ultra-high-density distributed antenna systems. In 5G mobile communications systems, the transmission weight and the overall system throughputs for a huge number of possible combinations of antennas and user equipment have to be computed. To accelerate the scheduling, the new scheduler...

chapter

Designing a High-Throughput Pipeline for Digitizing Pinned Insects

Mark Hereld, Nicola J. Ferrier, Nitin Agarwal, Petra Sierwald

2017 IEEE 13th International Conference on e-Science (e-Science) > 542 - 550

2017 IEEE 13th International Conference on e-Science (e-Science)

This paper presents the design and prototyping of hardware and software to address the problem of rapid and reliable 3D digitization of very large collections of pinned insects. Using the collection at the Field Museum of Natural History (FMNH) as a use case, a pipeline to ingest the entire collection of 4.5 million specimens in circa 1-2 years imposes a limit of a few seconds on average processing time...

chapter

High throughput design and implementation of SHA-3 hash algorithm

Xufan Wu, Shuguo Li

2017 International Conference on Electron Devices and Solid-State Circuits (EDSSC) > 1 - 2

2017 International Conference on Electron Devices and Solid-State Circuits (EDSSC)

In this paper, we propose two different hardware structures for the SHA-3 hash algorithm, targeting different circuit interface widths. Both support the four SHA-3 functions SHA3-224/256/384/512. The padding unit of our design is also implemented in hardware instead of software. In addition, a 3-round-in-1 structure is proposed to increase the throughput of our circuit. We conduct an implementation...

chapter

Analysis of K-bit pipelined processor cores using perl benchmarking

Eze Victor Chisom, K. C. Okafor, A. A. Obayi, Okoro Nkem Jennifer, more

2017 International Conference on Computing Networking and Informatics (ICCNI) > 1 - 7

2017 International Conference on Computing Networking and Informatics (ICCNI)

In today's high performance computing (HPC) environments, analyzing and predicting the performance of multiple-processor systems (clusters cores) on critical workloads remains a challenge. This is a result of the key metrics that influence a system's behavior. Bursty arrivals in HPCs demand either a shared-memory parallel architecture or a pipelined dataflow architecture. At present, a processor model...

chapter

Spartan and NEMO: Two HPC-Cloud Hybrid Implementations

Lev Lafayette, Bernd Wiebelt

2017 IEEE 13th International Conference on e-Science (e-Science) > 458 - 459

2017 IEEE 13th International Conference on e-Science (e-Science)

High Performance Computing systems offer excellent metrics for speed and efficiency when using bare metal hardware, a high speed interconnect, and parallel applications. In contrast, cloud computing has provided management and implementation flexibility at the cost of performance. We therefore suggest two approaches to make HPC resources available in a dynamically reconfigurable hybrid HPC/Cloud architecture...

chapter

High-performance implementation of an HMAC processor based on SHA-3 Hash function

Junhui Li, Liji Wu, Xiangmin Zhang

2017 International Conference on Electron Devices and Solid-State Circuits (EDSSC) > 1 - 2

2017 International Conference on Electron Devices and Solid-State Circuits (EDSSC)

The Keyed-Hash Message Authentication Code (HMAC) is a useful mechanism for message authentication. In this paper, a high-performance HMAC/SHA-3 processor that can generate both HMAC message digests and hash message digests is presented. It can generate not only message digests of the standard lengths (224, 256, 384, 512) but also a 64-bit message digest. Due to the application of new generation...

chapter

Workload characterization of interactive cloud services on big and small server platforms

Shuang Chen, Shay GalOn, Christina Delimitrou, Srilatha Manne, more

2017 IEEE International Symposium on Workload Characterization (IISWC) > 125 - 134

2017 IEEE International Symposium on Workload Characterization (IISWC)

Key-value stores (e.g., Memcached) and web servers (e.g., NGINX) are widely used by cloud providers. As interactive services, they have strict service-level objectives, with typical 99th-percentile tail latencies on the order of a few milliseconds. Unlike average latency, tail latency is more sensitive to changes in usage load and traffic patterns, system configurations, and resource availability...

chapter

Mapping of P4 match action tables to FPGA

Michal Kekely, Jan Korenek

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 2

2017 27th International Conference on Field Programmable Logic and Applications (FPL)

Current networks are changing very fast. Network administrators need more flexible and powerful tools to be able to support new protocols or services very fast. The P4 language provides a new level of abstraction for flexible packet processing. Therefore, we have designed a new architecture for memory-efficient mapping of P4 match/action tables to FPGA. The architecture is based on the DCFL algorithm and...

chapter

A generic high throughput architecture for stream processing

Christes Rousopoulos, Ektoras Karandeinos, Grigorios Chrysos, Apostolos Dollas, more

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 5

2017 27th International Conference on Field Programmable Logic and Applications (FPL)

Stream join is a fundamental and computationally expensive data mining operation for relating information from different data streams. This paper presents two FPGA-based architectures that accelerate stream join processing. The proposed hardware-based systems were implemented on a multi-FPGA hybrid system with high memory bandwidth. The experimental evaluation shows that our proposed systems can outperform...

chapter

Evaluating Effect of Write Combining on PCIe Throughput to Improve HPC Interconnect Performance

Mahesh Chaudhari, Kedar Kulkarni, Shreeya Badhe, Vandana Inamdar

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 639 - 640

2017 IEEE International Conference on Cluster Computing (CLUSTER)

The HPC interconnect is a crucial component of any HPC machine. Interconnect performance is one of the contributing factors to the overall performance of an HPC system. The most popular interface for connecting a Network Interface Card (NIC) to the CPU is PCI Express (PCIe). With denser core counts in compute servers and increasingly maturing fabric interconnect speeds, there is a need to maximize the packet data movement...

chapter

Comparison of hardware and software implementations of selected lightweight block ciphers

William Diehl, Farnoud Farahmand, Panasayya Yalla, Jens-Peter Kaps, more

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

2017 27th International Conference on Field Programmable Logic and Applications (FPL)

Lightweight block ciphers are an important topic of research in the context of the Internet of Things (IoT). Current cryptographic contests and standardization efforts seek to benchmark lightweight ciphers in both hardware and software. Although there have been several benchmarking studies of both hardware and software implementations of lightweight ciphers, direct comparison of hardware and software...

chapter

Scalable high-performance architecture for convolutional ternary neural networks on FPGA

Adrien Prost-Boucle, Alban Bourge, Frederic Petrot, Hande Alemdar, more

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 7

2017 27th International Conference on Field Programmable Logic and Applications (FPL)

Thanks to their excellent performances on typical artificial intelligence problems, deep neural networks have drawn a lot of interest lately. However, this comes at the cost of large computational needs and high power consumption. Benefiting from high precision at acceptable hardware cost on these difficult problems is a challenge. To address it, we advocate the use of ternary neural networks (TNN)...

chapter

Hardware-oriented turbo-product codes decoder architecture

Yaroslav Krainyk, Vladyslav Perov, Maksym Musiyenko, Yevhen Davydenko

2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS) > 1 > 151 - 154

2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS)

A model of a Turbo-Product Codes decoder architecture and a method for constructing a Turbo-Product Codes decoder are proposed in the paper. The model describes decoder functioning, taking into account the limitations of the hardware platform, and proposes re-use of components in the decoding process. The method provides a set of steps for decoder implementation. Field-Programmable Gate Array circuits are selected...

chapter

OCEAN: An on-chip incremental-learning enhanced processor with gated recurrent neural network accelerators

Chixiao Chen, Hongwei Ding, Huwan Peng, Haozhe Zhu, more

ESSCIRC 2017 - 43rd IEEE European Solid State Circuits Conference > 259 - 262

ESSCIRC 2017 - 43rd IEEE European Solid State Circuits Conference (ESSCIRC)

A deep learning processor with 8 gated recurrent neural network (RNN) accelerators is proposed in this paper. It features on-chip incremental learning by numerical and local gradient computation enhancement. Extra precision of training is obtained without extending the bit-width. Tri-mode weight access (DMA/FIFO/RAM) improves the throughput during incremental learning. The number multipliers and activation...

chapter

Bit-level pipelining for highly parallel turbo-code decoders: A critical assessment

Stefan Weithoffer, Kira Kraft, Norbert Wehn

2017 IEEE AFRICON > 121 - 126

2017 IEEE AFRICON

The degree to which Turbo-Code decoder architectures can be parallelized is constrained by requirements for flexibility with respect to code block sizes and code rates. At the same time throughput requirements are expected to increase by a factor of up to 20x for 5G networks, which are currently undergoing standardization. The limiting factors for the throughput of a Turbo-Code decoder are maximum...

chapter

Robust throughput boosting for low latency dynamic partial reconfiguration

A. Nannarelli, M. Re, G. C. Cardarilli, L. Di Nunzio, more

2017 30th IEEE International System-on-Chip Conference (SOCC) > 86 - 90

2017 30th IEEE International System-on-Chip Conference (SOCC)

Reducing the configuration time of portions of an FPGA at run time is crucial in contemporary FPGA-based accelerators. In this work, we propose a method to increase the throughput for FPGA dynamic partial reconfiguration by using standard IP blocks. The throughput is increased by over-clocking the configuration bitstream circuitry beyond the limits stated in the specifications of these standard blocks...

chapter

Proposition and evaluation of a real-time generic architecture for a laser stripe detection system on FPGA

Seher Colak, Emmanuel Dumas, Virginie Fresse, Olivier Alata

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP) > 1 - 6

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Laser triangulation applications are commonly used for industrial quality control. Such algorithms require real-time systems, often made of a computing unit close to the image sensor and connected through a short and fast link. Choosing a camera with an integrated Field Programmable Gate Array (FPGA) as the computing unit can provide highly pipelined and parallel computing suited to processing images in real time. Moreover,...

Keywords:
THROUGHPUT
HARDWARE

INFONA - science communication portal
