Software Transactional Memory (STM) allows encapsulating shared-data accesses within transactions, executed with atomicity and isolation guarantees. The assessment of the consistency of a running transaction is performed by the STM layer at specific points of its execution, such as when a read or write access to a shared object occurs, or upon a commit attempt. However, performance and energy efficiency...
A kernel or mini-app is a self-contained small application that retains certain characteristics of the original application [7]. Working on a kernel or mini-app in place of the original application can dramatically reduce the resources and effort required for performing software tasks such as performance optimization and porting to new platforms. However, using a kernel as a proxy is based on the...
Word2Vec is a popular set of machine learning algorithms that use a neural network to generate dense vector representations of words. These vectors have proven to be useful in a variety of machine learning tasks. In this work, we propose new methods to increase the speed of the Word2Vec skip gram with hierarchical softmax architecture on multi-core shared memory CPU systems, and on modern NVIDIA GPUs...
Nowadays, there are many embedded systems with different architectures that have incorporated GPUs. However, it is difficult to develop CPU-GPU embedded systems using component-based development (CBD), since existing CBD approaches have no support for GPU development. In this context, when targeting a particular CPU-GPU platform, the component developer is forced to construct hardware-specific components,...
The hybrid runtime (HRT) model offers a path towards high performance and efficiency. By integrating the OS kernel, runtime, and application, an HRT allows the runtime developer to leverage the full feature set of the hardware and specialize OS services to the runtime's needs. However, conforming to the HRT model currently requires a port of the runtime to the kernel level, for example to the Nautilus...
Multi-view learning is a novel paradigm that aims at obtaining better results by examining the information from several perspectives instead of analysing the same information from a single viewpoint. The multi-view methodology has been widely used for semi-supervised learning, where only some patterns have been classified by an expert and a large number remain unlabelled. However...
Regularization kernel network (RKN) is an effective and widely used kernel method for nonlinear regression analysis. In this paper, I characterize its bias and propose an approach to correct the bias. This leads to a new method called bias corrected regularization kernel network (BCRKN). Theoretical characterizations and simulation studies are used to verify the effectiveness of this bias corrected...
Heterogeneous processing has gained popularity in the high-performance computing (HPC) area lately and it appears to have great potential for future data centers. In this regard, accelerators, such as GPUs and Intel Xeon Phi, have already started to play a significant role in HPC systems, offering a high degree of parallelism to application developers. Furthermore, hardware virtualization is gaining...
Nowadays, developing effective techniques able to deal with data coming from structured domains is becoming crucial. In this context, kernel methods are the state-of-the-art tool widely adopted in real-world applications that involve learning on structured data. Conversely, when one has to deal with unstructured domains, deep learning methods represent a competitive, or even better, choice. In this...
This paper presents two approaches using a Block Low-Rank (BLR) compression technique to reduce the memory footprint and/or the time-to-solution of the sparse supernodal solver PASTIX. This flat, non-hierarchical compression method makes it possible to take advantage of the low-rank property of the blocks appearing during the factorization of sparse linear systems, which come from the discretization of partial...
In this paper, we investigate the interactions between topic persons to help readers construct the background knowledge of a topic. We propose a rich interactive tree structure to represent syntactic, context, and semantic information of text, and this structure is incorporated into a tree-based convolution kernel to identify segments that convey person interactions and further construct person interaction...
Behavior-based analysis of dynamically executed binaries has become a widely used technique for the identification of suspected malware. Most solutions rely on function call patterns to determine whether a sample is exhibiting malicious behavior. These system and API calls are usually regarded individually and do not consider contextual information or process inter-dependencies. In addition, the patterns...
For a given TCP or UDP flow, protocol processing of incoming packets is performed on the core that receives the interrupt, while the user-space application which consumes the data may run on the same or a different core. If the cores are not the same, additional costs due to context switches, cache misses, and the movement of data between the caches of the cores may occur. The magnitude of this cost...
Modern commodity operating systems do not provide developers with user-space abstractions for building high-speed packet processing applications. The conventional raw socket is inefficient and unable to take advantage of the emerging hardware, like multi-core processors and multi-queue network adapters. In this paper we present the NetSlice operating system abstraction. Unlike the conventional raw...
The study of the evolution of highly configurable systems requires a thorough understanding of three core ingredients of such systems: (1) the underlying variability model; (2) the assets that together implement the configurable features; and (3) the mapping from variable features to actual assets. Unfortunately, to date no systematic way to obtain such information at a sufficiently fine-grained level...
GPUs have been widely adopted in data centers to provide acceleration services to many applications. Sharing a GPU is increasingly important for better processing throughput and energy efficiency. However, quality of service (QoS) among concurrent applications is minimally supported. Previous efforts are too coarse-grained and not scalable with increasing QoS requirements. We propose QoS mechanisms...
To consider QoS for resource-limited mobile systems, we introduce a fast preemption mechanism on GPUs. First, we employ a dual-kernel execution model to support fine-grained preemption, and a resource allocation policy to avoid the resource fragmentation problem. Second, we propose a preemption victim selection scheme to reduce the throughput overhead while satisfying a required preemption latency. Evaluations...
We present a method for vision-based speaker identification in a group conversation. The group context in the conversation is modeled by the integrated face direction of group members. Experimental results show that the integrated face direction of group members is effective for speaker identification in a group.
I-vector space features have recently been proved to be very efficient in the speaker recognition field. In this paper, we assess the use of the i-vector approach for emotional speaker recognition in order to boost performance, which is degraded by emotions. The key idea of the i-vector algorithm is to represent each speaker by a fixed-length, low-dimensional feature vector. The concatenation of these...
Users interact with mobile apps with certain intents such as finding a restaurant. Some intents and their corresponding activities are complex and may involve multiple apps; for example, a restaurant app, a messenger app and a calendar app may be needed to plan a dinner with friends. However, activities may be quite personal and third-party developers would not be building apps to specifically handle...