In data mining, classification plays a prominent role in predicting outcomes. One of the best supervised classification techniques in data mining is Naive Bayes classification, which is good at predicting outcomes and often outperforms other classification techniques. One of the reasons behind its strong performance is the assumption of conditional...
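A minimal sketch of a categorical Naive Bayes classifier may help make the conditional-independence assumption concrete: given the class, each feature contributes its own factor P(x_i | c), so the score is log P(c) plus a sum of per-feature log-likelihoods. The function names, the Laplace smoothing parameter `alpha` and the toy weather data are illustrative assumptions, not the method of the paper above.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class value frequencies of each categorical feature."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    feature_values = defaultdict(set)    # feature index -> set of observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row, alpha=1.0):
    """Return the class maximising log P(c) + sum_i log P(x_i | c), with Laplace smoothing."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        # Conditional independence: each feature adds its own factor given the class.
        for i, value in enumerate(row):
            k = len(feature_values[i])
            score += math.log((value_counts[(i, label)][value] + alpha) / (count + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "hot")))   # no
print(predict(model, ("rainy", "cool")))  # yes
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing keeps an unseen feature value from forcing a class probability to zero.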
Even though wine-drinkers generally agree that wines may be ranked by quality, wine-tasting is famously subjective. There have been many attempts to construct a more methodical approach to the assessment of wines. We propose a method of assessing wine quality using a decision tree, and test it against the wine-quality dataset from the UC Irvine Machine Learning Repository. Results are 60% in agreement...
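The abstract does not say how its decision tree chooses splits. A common choice is Gini impurity, and the sketch below, assuming that criterion, shows only the core step of tree construction: finding the best threshold on one numeric feature. The feature values and labels are invented for illustration and are not taken from the UCI wine-quality dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0.0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate two equal values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [label for _, label in pairs[:k]]
        right = [label for _, label in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical feature values (e.g. alcohol content) with made-up quality labels.
values = [9.0, 9.5, 10.0, 11.5, 12.0, 12.5]
labels = ["low", "low", "low", "high", "high", "high"]
print(best_threshold(values, labels))  # (10.75, 0.0)
```

A full tree would apply this search to every feature, split on the best one, and recurse on each side until a stopping rule (depth, node size, or zero impurity) is met.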
Sentiment analysis deals with identifying the polarity orientation embedded in users' comments and reviews. It aims to discriminate positive reviews from negative ones. Sentiment is related to culture and language morphology. In this paper, we investigate the effects of language morphology on sentiment analysis of reviews written in Arabic. In particular, we investigate in detail how...
In this paper, we propose an expert search scheme for social networks. The proposed scheme updates a user's profile by analyzing recent activities, and considers users' reliability scores and ratings, which are computed from the updated profile. A user's profile is created by extracting a keyword from recent activity information and calculating similarity with that keyword. To verify the performance...
Recent years have witnessed the explosive growth of recommender systems in various exciting application domains such as electronic commerce, social networking, and location-based services. A great many algorithms have been proposed to improve recommendation accuracy, but only recently has the long tail problem, arising from inadequate recommendation of niche items, been recognized as a real challenge...
The amount of data stored in educational databases is increasing day by day. Data mining algorithms can be used to find hidden patterns in student databases, and these patterns can be used to assess students' academic performance. The main aim of this study was to determine the factors that influence students' performance. This paper proposes using the Generalized Sequential Pattern (GSP) mining algorithm for finding frequent...
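To show what "finding frequent sequential patterns" involves, here is a simplified level-wise (Apriori-style) miner in the spirit of GSP. It is not the full GSP algorithm: it assumes each sequence element is a single item and omits GSP's itemset elements, time constraints, and join/prune candidate generation. The function names and the course-code sequences are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, so each later item must appear later.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for sequence in database if is_subsequence(pattern, sequence))


def mine_sequential_patterns(database, min_support):
    """Extend each frequent pattern by one frequent item per level until none survive."""
    items = sorted({item for sequence in database for item in sequence})
    frequent_items = [item for item in items if support((item,), database) >= min_support]
    level = [(item,) for item in frequent_items]
    patterns = {}
    while level:
        next_level = []
        for pattern in level:
            patterns[pattern] = support(pattern, database)
            for item in frequent_items:
                candidate = pattern + (item,)
                # A longer pattern can never be more frequent than its prefix,
                # so only frequent patterns are worth extending.
                if support(candidate, database) >= min_support:
                    next_level.append(candidate)
        level = next_level
    return patterns


# Hypothetical per-student sequences of course codes.
database = [["A", "B", "C"], ["A", "C"], ["A", "B"], ["B", "C"]]
patterns = mine_sequential_patterns(database, min_support=2)
print(patterns)
```

With a minimum support of 2, the pairs ("A", "B"), ("A", "C") and ("B", "C") are frequent, but ("A", "B", "C") is not, because only the first student's sequence contains it.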
Data mining research has produced a significant repertoire of algorithms that predict the classification of data instances with reasonable accuracy. However, data quantity and availability continue to expand rapidly, such that we no longer have fixed and manageable data sets, but rather continual streams of data. Mining streaming data becomes challenging when using a piece-wise or online approach,...
Texture analysis plays a central role in automatic inspection, medical image analysis, document processing, and remote sensing. Deviation in the illuminant direction affects texture appearance. Real-world images are not uniform because of changes in scale, orientation, and lighting conditions. Features of the non-uniform images were extracted using gray-level co-occurrence...
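A gray-level co-occurrence matrix (GLCM) records how often pairs of gray levels occur at a fixed pixel offset, and texture features are then computed from it. The pure-Python sketch below builds a symmetric, normalised GLCM for one offset and derives two standard features, contrast and homogeneity. The function names and the tiny pre-quantised image are assumptions for illustration; the paper's offsets and feature set are not stated in the excerpt.

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalised gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # also count the reversed pair, making the matrix symmetric
    total = sum(sum(row) for row in counts)
    return [[value / total for value in row] for row in counts]


def contrast(p):
    """Sum of p(i, j) * (i - j)^2: large when neighbouring gray levels differ strongly."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p):
    """Sum of p(i, j) / (1 + (i - j)^2): large when neighbouring gray levels are similar."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


# A tiny image already quantised to 3 gray levels; offset is one pixel to the right.
image = [[0, 0, 1],
         [0, 0, 1],
         [2, 2, 2]]
p = glcm(image, levels=3)
print(contrast(p), homogeneity(p))
```

In practice several offsets (for example 0, 45, 90 and 135 degrees) are often computed and their features averaged, which makes the descriptor less sensitive to texture orientation.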
Sentiment analysis refers to the automatic extraction of sentiments from natural language text. We study the effect of subjectivity-based features on sentiment classification using two lexicons, and also propose new subjectivity-based features for sentiment classification. The subjectivity-based features we experiment with are based on the average word polarity, and the new features that we propose are...
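As a rough illustration of a feature "based on the average word polarity", the sketch below averages lexicon polarities over the words a lexicon covers, and adds the fraction of lexicon-covered words as a crude subjectivity signal. Both functions and the four-word lexicon are hypothetical; they are not the lexicons or the new features proposed in the paper.

```python
def average_word_polarity(tokens, lexicon):
    """Mean lexicon polarity over the tokens the lexicon covers (0.0 if none are covered)."""
    scores = [lexicon[token] for token in tokens if token in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def subjective_word_ratio(tokens, lexicon):
    """Fraction of tokens that carry a polarity in the lexicon (0.0 for an empty text)."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in lexicon) / len(tokens)


# Hypothetical toy lexicon with polarities in [-1, 1].
lexicon = {"good": 1.0, "great": 1.0, "bad": -1.0, "slow": -0.5}
review = "the food was great but the service was slow".split()
features = [average_word_polarity(review, lexicon), subjective_word_ratio(review, lexicon)]
print(features)
```

Such scalar features are typically appended to a bag-of-words vector before training a standard classifier.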
Low information quality is one of the reasons why information extraction initiatives fail. Incomplete information has a pervasive negative impact on downstream processing steps. This work addresses this problem with a novel information extraction approach, which integrates data mining and information extraction methods into a single complementary approach in order to benefit from their respective...
In this talk, the author will discuss how parallel and/or distributed compute resources can be used differently: instead of focusing on speeding up algorithms, he proposes to focus on improving accuracy. In a nutshell, the goal is to tune data mining algorithms to produce better results in the same time, rather than producing similar results much faster. He will discuss a number of generic ways of tuning...
The improvement of energy efficiency in wireless cellular networks has become a popular topic in recent years due to its positive effects on the environment and its economic benefits for network operators. An effective way to improve energy efficiency is to deploy energy-efficient base stations and to switch off as many base stations as possible. In this paper, we consider an LTE network composed of the...
Classification is the process of dividing data into a number of groups, which may be either dependent on or independent of each other, with each group acting as a class. Classification can be performed with several methods using different types of classifiers. However, classification is not easy when it is applied to text documents, i.e. document classification. The main purpose of...
The extraction of bilingual audio and text data is crucial for designing Speech-to-Speech (S2S) systems. In this work, we propose an automatic method to segment multilingual audio streams from movies and to align them with the corresponding subtitles. We found that the proposed method yields 89% perfectly segmented bilingual audio and 6% partially segmented bilingual audio. In...
In knowledge discovery in single sequences, different results could be discovered from the same sequence when different frequency measures are adopted. It is natural to raise such questions as (1) do these frequency measures reflect actual frequencies accurately? (2) what impacts do frequency measures have on discovered knowledge? (3) are discovered results accurate and reliable? and (4) which measures...
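A small example shows why the choice of frequency measure matters in a single sequence. The sketch below counts the same contiguous pattern under two simple measures, all (possibly overlapping) occurrences versus non-overlapping occurrences, and gets different frequencies from the same data. These two measures are illustrative choices, not necessarily those studied in the paper, and the function names are hypothetical.

```python
def overlapping_count(sequence, pattern):
    """Count every position where the pattern occurs contiguously; occurrences may overlap."""
    m = len(pattern)
    return sum(1 for i in range(len(sequence) - m + 1) if sequence[i:i + m] == pattern)


def non_overlapping_count(sequence, pattern):
    """Count occurrences greedily from the left, never reusing a position."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    count, i = 0, 0
    while i <= len(sequence) - m:
        if sequence[i:i + m] == pattern:
            count += 1
            i += m  # jump past this occurrence so the next one cannot share positions
        else:
            i += 1
    return count


sequence = "abababa"
print(overlapping_count(sequence, "aba"), non_overlapping_count(sequence, "aba"))  # 3 2
```

Under a minimum-frequency threshold of 3, the pattern would be reported as frequent by the first measure but not by the second, which is exactly the kind of discrepancy the questions above address.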
The class imbalance problem commonly occurs in real applications: in the training set, the number of examples of one class may be much smaller than that of another. Under-sampling is a very popular approach to this problem. It is very efficient because it uses only a subset of the majority class. The drawback of under-sampling is that it throws away many potentially...
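Random under-sampling, the basic form of the approach described above, can be sketched in a few lines: keep every minority example and draw an equally sized random subset of the majority class. The function name, the fixed `seed`, and the made-up data are assumptions for illustration; the sketch assumes a binary problem with one known majority label.

```python
import random


def random_undersample(X, y, majority_label, seed=0):
    """Keep every minority example and an equally sized random subset of the majority class."""
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    kept = rng.sample(majority, k=min(len(minority), len(majority)))
    # Majority examples not in `kept` are discarded, which is the drawback noted above.
    indices = sorted(minority + kept)
    return [X[i] for i in indices], [y[i] for i in indices]


X = [[float(i)] for i in range(10)]  # made-up one-feature examples
y = [0] * 8 + [1] * 2                # 8 majority, 2 minority
X_bal, y_bal = random_undersample(X, y, majority_label=0)
print(y_bal.count(0), y_bal.count(1))  # 2 2
```

Here six of the eight majority examples are never seen by the learner; ensemble variants that train on several different majority subsets are one common way to recover some of that lost information.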
The problem of privacy-preserving data mining has become increasingly important in recent years, and many successful and efficient techniques have been developed. However, in collaborative data analysis, parts of the data may come from different data owners and may be processed using different data distortion methods. Thus, combinations of datasets processed using different methods are of practical...
Data mining technology is a useful tool for knowledge discovery from large-scale databases. At present, most data mining researchers pay much attention to the technical problems of developing data mining models and methods, but little to the basic issues of data mining. In this paper, we address these basic issues and propose a domain-oriented, data-driven knowledge acquisition model. A data-driven data mining...
Accuracy is a very important criterion for evaluating a classifier. In this paper, a unified paradigm for calculating the accuracy of different classifiers using topological covering-based granular computing is presented, under a given sample space and different ideal classification assumptions. Corresponding examples for the calculation of accuracy in different...
Social tagging systems such as Facebook, YouTube, del.icio.us, and Flickr have become popular in recent years and have achieved widespread success. State-of-the-art user modeling approaches in tagging systems usually represent a user as a vector of weighted tags. Unfortunately, typical user modeling methods that use a vector of weighted tags are based only on the personal view and ignore the social view, so they have some inherent drawbacks...
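The "vector of weighted tags" user model can be sketched as a sparse mapping from tag to weight, with cosine similarity used to compare users. In this sketch the weight is simply how often the user applied the tag, an assumption chosen for brevity; it captures only the personal view the abstract criticises. The function names and users are hypothetical.

```python
import math
from collections import Counter


def tag_profile(tags_applied):
    """User profile as a sparse vector of weighted tags; the weight is the usage count."""
    return Counter(tags_applied)


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse tag vectors (0.0 if either is empty)."""
    dot = sum(weight * q.get(tag, 0) for tag, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


alice = tag_profile(["python", "python", "ml"])
bob = tag_profile(["python", "ml"])
carol = tag_profile(["cooking"])
print(round(cosine_similarity(alice, bob), 4))  # 0.9487
print(cosine_similarity(alice, carol))          # 0.0
```

Because the profile depends only on a user's own tagging, two users with no tags in common always score 0.0 here, even if their social connections suggest shared interests.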