Many websites allow users to tag data items to make them easier to find. In this paper we consider the problem of classifying tagged data according to user-specified interests. We present an approach for aggregating background knowledge from the Web to improve the performance of a classifier. In previous work, researchers have developed technology for extracting knowledge, in the form of relational...
In data mining applications it is common to have more than one data source available to describe the same record. For example, in the biological sciences, the same genes may be characterized through many types of experiments. Which of the data sources proves most reliable for prediction may depend on the record in question. For some records, pieces of information may be unavailable because an experiment...
In predictive modeling tasks, a clear distinction is often made between supervised and unsupervised learning problems, the former involving only labeled data (training patterns with known category labels) and the latter involving only unlabeled data. There is growing interest in a hybrid setting, called semi-supervised learning. In semi-supervised classification, the labels of only...
Genome mapping, or the experimental determination of the ordering of DNA markers on a chromosome, is an important step in genome sequencing and the ultimate assembly of sequenced genomes. The presented research addresses the problem of identifying markers that cannot be placed reliably. If such markers are included in standard mapping procedures, they can result in a poor overall mapping. Traditional...
Commercial dishwashing systems currently involve human loading, sorting, inspecting, and unloading dishes and silverware pieces before and after washing in hot and humid environments. In such difficult working conditions, leading to high turn-over of low-paid employees, automation is desirable, especially in large-scale kitchens of hospitals, navy ships, schools, hotels and other dining facilities...
Predicting human behavior has been the subject of many research areas, especially machine learning. Because of its potential benefits, financial or otherwise, researchers have modeled human behavior in tasks ranging from recommending items in an online store to predicting the behavior of an entire ecosystem. In this paper, we attempt to predict human preference towards natural speech. The proposed...
Considering the special needs of credit risk analysis, the Infinite DEcision Agent ensemble Learning (IDEAL) system is proposed. In the first level of our model, we adopt soft margin boosting to overcome overfitting. In the second level, the RVM algorithm is revised for boosting so that different RVM agents can be generated from the updated instance space of the data. In the third level, the perceptron...
Many online social networks such as Facebook, LinkedIn and MySpace have become increasingly important. These social networks are rich in information about entities, such as hobbies, demographic information, friendships, and other attributes. This information can be used extensively for network analysis. One of the most important problems in social network analysis is community detection. The community...
An automated strategy for decomposing time series into small, elementary subsequences is proposed. This is accomplished in two steps: first, the time series is decomposed into simpler sub-series (segmentation); next, each sub-series is suitably modeled or uniquely characterized (classification). In this paper, an approximation employing the first right singular vector of the data matrix...
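As a rough illustration of the quantity named in the abstract above, the following sketch approximates the first right singular vector of a small data matrix by power iteration. The abstract does not say how the data matrix is built or how the vector is used, so the sliding-window construction, the function names `sliding_windows` and `first_right_singular_vector`, and the iteration settings are all assumptions made for this example, not the authors' method.

```python
import math


def sliding_windows(series, width):
    """Stack overlapping windows of a series as matrix rows (an assumed construction)."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]


def first_right_singular_vector(rows, iterations=200, tol=1e-12):
    """Approximate the first right singular vector of a matrix.

    Uses power iteration on A^T A, starting from a uniform unit vector.
    The start vector is an arbitrary choice; the iteration can stall if the
    dominant singular vector happens to be orthogonal to it.
    """
    n_cols = len(rows[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = A v
        u = [sum(a * b for a, b in zip(row, v)) for row in rows]
        # w = A^T u
        w = [sum(row[j] * u[i] for i, row in enumerate(rows))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v  # A v vanished; keep the current estimate
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    return v


if __name__ == "__main__":
    # Hypothetical toy series; each row of the matrix is one window.
    matrix = sliding_windows([1.0, 2.0, 3.0, 4.0, 5.0], width=3)
    print(first_right_singular_vector(matrix))
```

Power iteration is used here only to keep the sketch dependency-free; a library SVD routine would normally be preferred for real data.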
While the popularity of recommender systems is growing rapidly in e-commerce services, profile injection attacks are a great threat to their robustness and trustworthiness. Such attacks can be easily produced and inserted in recommender systems to alter the recommendation results. In such systems, attackers intentionally insert attack profiles to change the system's output to their advantage. This...