PHP is the dominant server-side scripting language used to implement dynamic web content. Just-in-time compilation, as implemented in Facebook's state-of-the-art HipHopVM, helps mitigate the poor performance of PHP, but substantial overheads remain, especially for realistic, large-scale PHP applications. This paper analyzes such applications and shows that there is little opportunity for conventional...
Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in machine learning and other domains. Since this is likely to continue for the foreseeable future, it is important to study techniques that can make it run fast on parallel hardware. In this paper, we provide the first analysis of a technique called BUCKWILD! that uses both asynchronous execution and low-precision...
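As a rough illustration of the low-precision half of this idea, the Python sketch below runs SGD on quantized gradients. The 8-bit grid, the scale, and the toy objective are illustrative assumptions rather than the paper's actual scheme, and the asynchronous-execution half of BUCKWILD! is not shown.

    import numpy as np

    def quantize(g, scale=0.01):
        # Round gradients to an 8-bit fixed-point grid -- a hypothetical
        # stand-in for the low-precision arithmetic the paper studies.
        q = np.clip(np.round(g / scale), -128, 127)
        return q * scale

    def sgd_step(w, grad_fn, lr=0.1):
        # One SGD step taken on the quantized gradient.
        return w - lr * quantize(grad_fn(w))

    # Toy usage: minimise f(w) = ||w||^2, whose gradient is 2w.
    w = np.ones(4)
    for _ in range(100):
        w = sgd_step(w, lambda v: 2 * v)
    print(w)  # close to zero despite the quantized gradients

The point of the sketch is that SGD still converges to a small neighbourhood of the optimum even when each gradient is rounded, which is what makes low-precision execution attractive.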
Evolution-in-materio is a form of unconventional computing that combines the training of materials with evolutionary search algorithms. In previous work, a mixture of single-walled carbon nanotubes (SWCNTs) dispersed in a liquid crystal (LC) was trained so that its morphology and electrical properties were gradually changed to perform a computational task. Material-based computation is treated as an optimisation...
Background. Often motivated by optimization objectives, software products go through a series of subsequent releases and are deployed through different strategies. The impact of these two aspects on energy consumption has yet to be fully understood, and that understanding can be improved by carrying out ad-hoc analyses of specific software products. Aims. In this research we report on an industrial...
Hardware-software (HW-SW) partitioning plays a vital role in the design phase of embedded systems. Partitioning is the process of mapping each computational task in an application to either software or hardware. In general, hardware runs faster than software, but at a significant cost in resources. Thus, current embedded systems often incorporate a mix of hardware and software components...
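To make the trade-off concrete, here is a minimal Python sketch of HW-SW partitioning as exhaustive search under an area budget. The task timings, area costs, and the budget are invented toy numbers, and real partitioners use far more sophisticated formulations than brute force.

    from itertools import product

    # Toy task set: (software time, hardware time, hardware area cost).
    # All values are illustrative, not taken from any paper.
    tasks = [(10, 2, 3), (8, 1, 4), (6, 3, 2), (4, 1, 5)]
    AREA_BUDGET = 7

    best = None
    for assign in product([0, 1], repeat=len(tasks)):  # 0 = SW, 1 = HW
        area = sum(t[2] for t, a in zip(tasks, assign) if a)
        if area > AREA_BUDGET:
            continue  # violates the hardware resource constraint
        time = sum(t[1] if a else t[0] for t, a in zip(tasks, assign))
        if best is None or time < best[0]:
            best = (time, assign)

    print(best)  # fastest assignment that fits the area budget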
Support Vector Machines (SVMs) are supervised learning models from the machine learning field whose performance depends strongly on their hyperparameters. The Bio-inspired Optimization Tool for SVM (BIOTS) is based on a Multi-Objective Particle Swarm Optimization (MOPSO) algorithm to tune the hyperparameters of SVMs. In this work, BIOTS is proposed along with a custom hardware design generator (VHDL) that implements...
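The hyperparameter sensitivity that motivates such a tuner is easy to reproduce; the sketch below uses scikit-learn (not BIOTS, which is the paper's own tool) and a standard dataset, with the (C, gamma) grid chosen arbitrarily for illustration.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    # Accuracy varies strongly with (C, gamma) -- the kind of
    # sensitivity a hyperparameter tuner such as BIOTS targets.
    for C in (0.1, 1.0, 10.0):
        for gamma in (1e-4, 1e-3, 1e-2):
            acc = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
            print(f"C={C:<5} gamma={gamma:<7} acc={acc:.3f}")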
To improve the effective utilisation of its supercomputing platforms, the New Zealand eScience Infrastructure (NeSI) offers, in addition to user support and the installation of a comprehensive software stack, a consultancy service to some of its users. Here we present lessons learned from this work and how additional improvements can be made to further enhance the productivity of researchers on computing...
Datacenters provide flexibility and high performance for users and cost efficiency for operators. However, the high computational demands of big data and analytics technologies such as MapReduce, a dominant programming model and framework for big data analytics, mean that even small changes in the efficiency of execution in the data center can have a large effect on user cost and operational cost...
In this paper, a low-cost accelerator for the ηT pairing in characteristic three over supersingular elliptic curves is designed. As the critical operations of the ηT pairing, the cubing and sparse multiplications over GF(3^(6m)) in Miller's algorithm are merged, and their arithmetic is modified and scheduled to reduce the overhead related to intermediate data. With these optimizations, Miller's...
The increasing use of digital signal processors (DSPs) in wireless communications and signal processing necessitates the optimization of compilers to support special hardware features. In this paper, we propose a compiler transformation method for zero-overhead loops (ZOL). It supports very long instruction word (VLIW) architectures, internal branches, and loops whose iteration counts are known at runtime and...
A large-scale distributed simulation should be planned well before execution, since deploying unnecessary hardware only wastes time and money. On the other hand, enough hardware is needed to achieve acceptable performance. Thus, it is worthwhile to estimate the performance of a large-scale distributed simulation before executing it. Such an estimate also improves the efficiency of the applied...
Many software mechanisms for geophysics exploration in the Oil & Gas industry are based on wave propagation simulation. To perform such simulations, state-of-the-art HPC architectures are employed, generating results faster and more accurately with each hardware generation. The software must evolve to support the new features of each design to keep performance scaling. Furthermore, it is important to understand...
FPGAs are becoming an attractive choice as heterogeneous computing units for scientific computing because FPGA vendors are adding floating-point-optimized architectures to their product lines. Additionally, high-level synthesis (HLS) tools such as the Altera OpenCL SDK are emerging, which could potentially break down the FPGA programming wall and provide a streamlined flow for domain experts in scientific...
The performance model of an application can provide insight into its runtime behavior on particular hardware. Such information can be analyzed by developers for performance tuning. However, model building and analysis are frequently neglected during software development until performance problems arise, because they require significant expertise and can involve many time-consuming application...
FPGAs are well known for their ability to perform non-standard computations not supported by classical microprocessors. Many libraries of highly customizable application-specific IPs have exploited this capability. However, using such IPs usually requires handcrafted HDL and hence significant design effort. High-Level Synthesis (HLS) lowers the design effort thanks to the use of C/C++ dialects for programming...
Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with its environment by making sequential decisions. The agent receives rewards from the environment and seeks an optimal policy that maximises the cumulative reward. Trust Region Policy Optimisation (TRPO) is a recent policy optimisation algorithm that achieves superior results on various RL benchmarks, but is computationally...
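The agent-environment loop the abstract describes can be sketched in a few lines of Python. The corridor environment and the random policy below are hypothetical placeholders; TRPO would replace the random action choice with updates to a parameterised policy, which is far too involved to show here.

    import random

    class CorridorEnv:
        # Toy corridor: start at 0, goal at 4; -1 per step, +10 at the
        # goal. A hypothetical stand-in for real RL benchmarks.
        def reset(self):
            self.pos = 0
            return self.pos

        def step(self, action):        # action is -1 (left) or +1 (right)
            self.pos = max(0, self.pos + action)
            done = self.pos == 4
            return self.pos, 10 if done else -1, done

    env = CorridorEnv()
    state, done, total, steps = env.reset(), False, 0, 0
    while not done and steps < 1000:   # step cap keeps the sketch finite
        action = random.choice([-1, 1])
        state, reward, done = env.step(action)
        total += reward
        steps += 1
    print("episode return:", total)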
Field-Programmable Gate Arrays (FPGAs) are gaining considerable momentum in mainstream high-performance systems in recent years due to their flexibility and low power consumption. Still, FPGAs remain largely unavailable to software programmers due to programming and debugging difficulties that are inherent to standard Hardware Description Languages. The performance that hardware-oblivious software...
Increasing data set sizes motivate a shift of focus from computation-centric systems to data-centric systems, where data movement is treated as a first-class optimization metric. An example of this emerging paradigm is in-situ computing in large-scale computing systems. Observing that data movement costs are increasing at an exponential rate even at the node level (as a node itself is fast becoming...
New trends in neural computation, which now involve distributed learning on pervasive sensor networks and multiple sources of big data, necessitate computationally efficient techniques that can be implemented on simple, cheap hardware architectures. In this paper, a nonuniform quantization at the input layer of neural networks is introduced in order to optimize their implementation on hardware...
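One plausible instance of nonuniform input quantization is mu-law companding, sketched below in Python; the mu value, level count, and input range are assumptions for illustration, and the paper's actual scheme may differ.

    import numpy as np

    def mu_law_quantize(x, mu=255, levels=16):
        # Nonuniform (mu-law) quantization: fine resolution near zero,
        # coarse resolution at the extremes.
        x = np.clip(x, -1.0, 1.0)
        compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
        q = np.round((compressed + 1) / 2 * (levels - 1))  # integer code
        return q.astype(int)

    x = np.linspace(-1, 1, 9)
    print(mu_law_quantize(x))  # codes change fastest near zero

Because the code density is concentrated around small magnitudes, fewer levels are needed overall, which is what makes such schemes attractive for cheap fixed-point hardware.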
The increased complexity of computer hardware makes it close to impossible to rely on hand-coding at the level of HDLs for digital hardware design. High-level synthesis can be employed instead, in order to obtain HDL code automatically from high-level functional descriptions. With high-level synthesis it becomes easier to design coprocessors, accelerators, and other special-purpose hardware. Nonetheless,...