Modern Non-Volatile Memory (NVM) promises persistence, byte-addressability and DRAM-like read and write latency, offering great opportunities for big data storage architectures. These properties indicate that NVM has the potential to be combined with key-value stores to achieve high performance and durability simultaneously. In this paper, we propose an efficient key-value storage...
Clustering is a classical unsupervised learning task that aims to divide a data set into several groups of similar objects. The clustering problem has been studied for many years, and many excellent clustering algorithms have been proposed. In this paper, we propose a novel density-based clustering method that is simple but effective. The primary idea of the proposed method is given as follows...
Online social networks (OSNs) have attracted millions of users worldwide over the last decade. In response to a series of urgent issues faced by existing OSNs, such as information overload, single-point failure, and the privacy issue, this paper introduces a self-organized decentralized OSN (SDOSN) over a social overlay resembling real-life social graph. The social overlay considers social relationship...
Academic publication archives often draw from numerous, heterogeneous sources, whose records can follow differing naming conventions. As such, ambiguity issues concerning authorship of scientific papers often arise, such as authors sharing similar names, the use of first names versus initials, or alternate name spellings for the same author. These ambiguities have plagued research on scientific collaboration...
Big data analysis requires adequate infrastructure and programming paradigms capable of processing large amounts of data. Hadoop, the best-known open-source implementation of the MapReduce paradigm, is widely employed in big data analysis frameworks. However, in many recent application scenarios, data are natively distributed over different geographic regions in data centers which are inter-connected...
In this feasibility study, we demonstrate the use of a factor-graph-based probabilistic graphical model approach to process longitudinal data derived from a population's electronic health records (EHR). Processing of EHR allows for forecasting patient-specific health complications and inference of population-level statistics on several epidemiological factors. As a case study, we provide preliminary...
It is important to analyze and predict meteorological phenomena in real time. Parallel programming that exploits thousands of GPU threads can efficiently speed up the execution of many applications. However, GPUs have limitations when used for processing big data, which can be better analyzed using distributed computing platforms such as Hadoop and Spark. In this paper, we propose DAMB...
The Support Vector Machine (SVM) is well known in machine learning and artificial intelligence for its high performance in data classification, regression and forecasting. For large-scale datasets, an incremental training algorithm is usually applied to balance training cost against accuracy in SVM applications. This paper presents an improved incremental training approach for large...
An advertising campaign is usually composed of a series of coordinated advertisements, with various formats and delivered through different media channels. Several existing studies have attempted to measure the individual contribution of related advertisements in a campaign, resulting in rule-based and data-driven multi-touch attribution models. However, most of these models ignored the interaction...
A key problem in online social networks is the identification of users' link information and the analysis of how it is reflected in recommender systems. User similarity measures are the basis for tackling this issue. In this paper, we propose non-negative multiple matrix factorization with social similarity for recommender systems, considering the similarities between users, the relationships...
Behavioral and targeted profiling of users is an important task in marketing and in the advertising industry. Matching a given user profile to an advertisement that leads to effective purchases is challenging because only a very small proportion of users are willing to purchase goods and thus monetize the advertising. With such proportions being less than one percent of the overall user population,...
Nowadays, more and more organisations use collaborative environments, such as social networks, to identify profiles of competencies, which are usually declared by the users themselves. We postulate that the analysis of computer-supported collaborative activities may provide information about the users' competencies in specific domains. In this research work, we present a trace-based approach...
Hybrid cloud bursting (i.e., leasing temporary off-premise cloud resources to boost capacity during peak utilization) has made a significant impact, especially for big data analytics, where the explosion of data sizes and increasingly complex computations frequently lead to insufficient local data center capacity. Cloud bursting, however, introduces a major challenge to runtime systems due to the...
Nowadays, social network sites, such as Facebook and Twitter, have a tremendous number of users in their repositories. This huge amount of data must be analyzed to obtain statistics about the users and their interests. In this paper, we propose a new algorithm that clusters the nodes in social networks into communities based on their geodesic location and the similarity between their interests...
User-generated reviews on e-commerce sites reflect consumers' sentiment about products, which can further direct consumers' purchasing behaviors and sellers' marketing strategies. In this paper, we propose a semi-supervised approach to mine the product aspects discussed in Chinese online reviews and the sentiments expressed about those aspects. We first apply the Latent Dirichlet Allocation...
We present a novel Cyber Security analytics framework. We demonstrate a comprehensive cyber security monitoring system to construct cyber security correlated events with feature selection to anticipate behaviour based on various sensors.