PHP is the dominant server-side scripting language used to implement dynamic web content. Just-in-time compilation, as implemented in Facebook's state-of-the-art HipHopVM, helps mitigate the poor performance of PHP, but substantial overheads remain, especially for realistic, large-scale PHP applications. This paper analyzes such applications and shows that there is little opportunity for conventional...
Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in machine learning and other domains. Since this is likely to continue for the foreseeable future, it is important to study techniques that can make it run fast on parallel hardware. In this paper, we provide the first analysis of a technique called BUCKWILD! that uses both asynchronous execution and low-precision...
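As a rough illustration of the low-precision half of this idea, the Python sketch below runs SGD on quantized gradients. The 8-bit grid, the scale, and the toy objective are illustrative assumptions rather than the paper's actual scheme, and the asynchronous-execution half of BUCKWILD! is not shown.

    import numpy as np

    def quantize(g, scale=0.01):
        # Round gradients to an 8-bit fixed-point grid -- a hypothetical
        # stand-in for the low-precision arithmetic the paper studies.
        q = np.clip(np.round(g / scale), -128, 127)
        return q * scale

    def sgd_step(w, grad_fn, lr=0.1):
        # One SGD step taken on the quantized gradient.
        return w - lr * quantize(grad_fn(w))

    # Toy usage: minimise f(w) = ||w||^2, whose gradient is 2w.
    w = np.ones(4)
    for _ in range(100):
        w = sgd_step(w, lambda v: 2 * v)
    print(w)  # close to zero despite the quantized gradients

The point of the sketch is that SGD still converges to a small neighbourhood of the optimum even when each gradient is rounded, which is what makes low-precision execution attractive.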
Evolution-in-materio is a form of unconventional computing that combines the training of materials with evolutionary search algorithms. In previous work, a mixture of single-walled carbon nanotubes (SWCNTs) dispersed in a liquid crystal (LC) was trained so that its morphology and electrical properties were gradually changed to perform a computational task. Material-based computation is treated as an optimisation...
Background. Often motivated by optimization objectives, software products go through a series of subsequent releases and are deployed through different strategies. The impact of these two aspects on energy consumption has yet to be fully understood, and that understanding can be improved by carrying out ad-hoc analyses of specific software products. Aims. In this research we report on an industrial...
Hardware-software (HW-SW) partitioning plays a vital role in the design phase of embedded systems. Partitioning is the process of mapping each computational task in an application to either software or hardware. In general, hardware runs faster than software, but at a significant cost in resources. Thus, current embedded systems often incorporate a mix of hardware and software components...
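To make the trade-off concrete, here is a minimal Python sketch of HW-SW partitioning as exhaustive search under an area budget. The task timings, area costs, and the budget are invented toy numbers, and real partitioners use far more sophisticated formulations than brute force.

    from itertools import product

    # Toy task set: (software time, hardware time, hardware area cost).
    # All values are illustrative, not taken from any paper.
    tasks = [(10, 2, 3), (8, 1, 4), (6, 3, 2), (4, 1, 5)]
    AREA_BUDGET = 7

    best = None
    for assign in product([0, 1], repeat=len(tasks)):  # 0 = SW, 1 = HW
        area = sum(t[2] for t, a in zip(tasks, assign) if a)
        if area > AREA_BUDGET:
            continue  # violates the hardware resource constraint
        time = sum(t[1] if a else t[0] for t, a in zip(tasks, assign))
        if best is None or time < best[0]:
            best = (time, assign)

    print(best)  # fastest assignment that fits the area budget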
Support Vector Machines (SVMs) are supervised learning models from the machine learning field whose performance depends strongly on their hyperparameters. The Bio-inspired Optimization Tool for SVM (BIOTS) is based on a Multi-Objective Particle Swarm Optimization (MOPSO) algorithm to tune the hyperparameters of SVMs. In this work, BIOTS is proposed along with a custom hardware design generator (VHDL) that implements...
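The hyperparameter sensitivity that motivates such a tuner is easy to reproduce; the sketch below uses scikit-learn (not BIOTS, which is the paper's own tool) and a standard dataset, with the (C, gamma) grid chosen arbitrarily for illustration.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    # Accuracy varies strongly with (C, gamma) -- the kind of
    # sensitivity a hyperparameter tuner such as BIOTS targets.
    for C in (0.1, 1.0, 10.0):
        for gamma in (1e-4, 1e-3, 1e-2):
            acc = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
            print(f"C={C:<5} gamma={gamma:<7} acc={acc:.3f}")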
To improve the effective utilisation of its supercomputing platforms, the New Zealand eScience Infrastructure (NeSI) offers, in addition to user support and the installation of a comprehensive software stack, a consultancy service to some of its users. Here we present lessons learned from this work and how additional improvements can be made to further enhance the productivity of researchers on computing...
Datacenters provide flexibility and high performance for users and cost efficiency for operators. However, the high computational demands of big data and analytics technologies such as MapReduce, a dominant programming model and framework for big data analytics, mean that even small changes in the efficiency of execution in the data center can have a large effect on user cost and operational cost...
In this paper, a low-cost accelerator for the ηT pairing in characteristic three over supersingular elliptic curves is designed. As the critical operations of the ηT pairing, the cubing and sparse multiplications over GF(3^(6m)) in Miller's algorithm are merged, and their arithmetic is modified and scheduled to reduce the overhead related to intermediate data. With these optimizations, Miller's...
The increasing use of digital signal processors (DSPs) in wireless communications and signal processing necessitates the optimization of compilers to support special hardware features. In this paper, we propose a compiler transformation method for zero-overhead loops (ZOL). It supports very long instruction word (VLIW) architectures, internal branches, and loops whose iteration counts are known at runtime and...
A large-scale distributed simulation should be planned well before execution, since deploying unnecessary hardware only wastes time and money. On the other hand, enough hardware is needed to achieve acceptable performance. Thus, it is worthwhile to estimate the performance of a large-scale distributed simulation before executing it. Such an estimate also improves the efficiency of the applied...
Many software mechanisms for geophysics exploration in the Oil & Gas industry are based on wave propagation simulation. To perform such simulations, state-of-the-art HPC architectures are employed, generating results faster and more accurately with each hardware generation. The software must evolve to support the new features of each design to keep performance scaling. Furthermore, it is important to understand...
FPGAs are becoming an attractive choice as heterogeneous computing units for scientific computing because FPGA vendors are adding floating-point-optimized architectures to their product lines. Additionally, high-level synthesis (HLS) tools such as the Altera OpenCL SDK are emerging, which could potentially break down the FPGA programming wall and provide a streamlined flow for domain experts in scientific...
The performance model of an application can provide insight into its runtime behavior on particular hardware. Such information can be analyzed by developers for performance tuning. However, model building and analysis are frequently neglected during software development until performance problems arise, because they require significant expertise and can involve many time-consuming application...
FPGAs are well known for their ability to perform non-standard computations not supported by classical microprocessors. Many libraries of highly customizable application-specific IPs have exploited this capability. However, using such IPs usually requires handcrafted HDL and hence significant design effort. High-Level Synthesis (HLS) lowers the design effort thanks to the use of C/C++ dialects for programming...
Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with its environment by making sequential decisions. The agent receives rewards from the environment and seeks an optimal policy that maximises the cumulative reward. Trust Region Policy Optimisation (TRPO) is a recent policy optimisation algorithm that achieves superior results on various RL benchmarks, but is computationally...
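The agent-environment loop the abstract describes can be sketched in a few lines of Python. The corridor environment and the random policy below are hypothetical placeholders; TRPO would replace the random action choice with updates to a parameterised policy, which is far too involved to show here.

    import random

    class CorridorEnv:
        # Toy corridor: start at 0, goal at 4; -1 per step, +10 at the
        # goal. A hypothetical stand-in for real RL benchmarks.
        def reset(self):
            self.pos = 0
            return self.pos

        def step(self, action):        # action is -1 (left) or +1 (right)
            self.pos = max(0, self.pos + action)
            done = self.pos == 4
            return self.pos, 10 if done else -1, done

    env = CorridorEnv()
    state, done, total, steps = env.reset(), False, 0, 0
    while not done and steps < 1000:   # step cap keeps the sketch finite
        action = random.choice([-1, 1])
        state, reward, done = env.step(action)
        total += reward
        steps += 1
    print("episode return:", total)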
Field-Programmable Gate Arrays (FPGAs) are gaining considerable momentum in mainstream high-performance systems in recent years due to their flexibility and low power consumption. Still, FPGAs remain largely unavailable to software programmers due to programming and debugging difficulties that are inherent to standard Hardware Description Languages. The performance that hardware-oblivious software...
Increasing data set sizes motivate a shift of focus from computation-centric systems to data-centric systems, where data movement is treated as a first-class optimization metric. An example of this emerging paradigm is in-situ computing in large-scale computing systems. Observing that data movement costs are increasing at an exponential rate even at the node level (as a node itself is fast becoming...
New trends in neural computation, which now involve distributed learning on pervasive sensor networks and multiple sources of big data, necessitate computationally efficient techniques that can be implemented on simple, cheap hardware architectures. In this paper, a nonuniform quantization at the input layer of neural networks is introduced in order to optimize their implementation on hardware...
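One plausible instance of nonuniform input quantization is mu-law companding, sketched below in Python; the mu value, level count, and input range are assumptions for illustration, and the paper's actual scheme may differ.

    import numpy as np

    def mu_law_quantize(x, mu=255, levels=16):
        # Nonuniform (mu-law) quantization: fine resolution near zero,
        # coarse resolution at the extremes.
        x = np.clip(x, -1.0, 1.0)
        compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
        q = np.round((compressed + 1) / 2 * (levels - 1))  # integer code
        return q.astype(int)

    x = np.linspace(-1, 1, 9)
    print(mu_law_quantize(x))  # codes change fastest near zero

Because the code density is concentrated around small magnitudes, fewer levels are needed overall, which is what makes such schemes attractive for cheap fixed-point hardware.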
The increased complexity of computer hardware makes it close to impossible to rely on hand-coding at the level of HDLs for digital hardware design. High-level synthesis can be employed instead, in order to obtain HDL code automatically from high-level functional descriptions. With high-level synthesis it becomes easier to design coprocessors, accelerators, and other special-purpose hardware. Nonetheless,...