Classification is an important technique in data mining. The K-Nearest Neighbor (K-NN) algorithm is a memory-based algorithm capable of producing satisfactory results on certain data, but the distance measures it uses cannot handle data sets containing uncertain attribute values. Data uncertainty is common in real-world applications. In this paper...
The classification process of the Counter Propagation neural network (CPN) is investigated. The homogeneity distribution of the codebook vectors is a key element in the accuracy of the classification process. The paper defines an appropriate homogeneity measure that is strongly correlated with the optimal misclassification error. Based on this homogeneity value, the paper proposes three modification...
To address properties of remote sensing image data such as high dimensionality, nonlinearity and massive numbers of unlabeled samples, a probability least squares support vector machine (PLSSVM) classification method based on hybrid entropy and the L1 norm was proposed. First, hybrid entropy was designed by combining quasi-entropy with entropy difference, and was used to select the most "valuable"...
Gene selection plays a crucial role in the analysis of microarray data with high dimensionality and small sample size. Among the various feature selection approaches, incremental wrapper-based feature subset selection (FSS) methods tend to obtain higher-quality feature subsets and better classification accuracy than filter methods, but they are much more time-consuming since the interdependence and redundancy...
Many advances have been made recently, and a number of techniques have been proposed by different researchers for processing and extracting knowledge from big data. However, the consistency of the extracted models is always questionable. In this paper we present two techniques for measuring the consistency between extracted models and predicting their applicability. In this paper, Meta learning based...
Analyses show that the absorption band position fundamentally determines the type of mineral. The paper proposes a method of applying a GA (Genetic Algorithm) to the selection of the uranium mineral band feature subset. First, on the basis of the correlation between feature-based metrics: information entropy, information gain, symmetrical uncertainty and type space, the GA, which is a random search...
Web caching and pre-fetching are vital technologies that can increase the speed of Web loading processes. Since speed and memory are crucial to the performance of mobile applications and websites, a better technique for the Web loading process should be investigated. The weaknesses of the conventional Web caching policy include meaningless information and uncertainty of knowledge representation...
To compare the classification accuracy and performance of traditional and probability-based decision tree classifiers, and to understand the algorithms described in "Decision Trees for Uncertain Data" that aim to improve the construction efficiency of probability-based decision trees, this paper tested several algorithms, namely AVG, UDT, UDT-BP, UDT-LP,...
To measure the fuzzy uncertainty and rough uncertainty of real datasets, the fuzzy rough membership function (FRMF) defined in fuzzy rough set theory is introduced. A new fuzzy rough neural network (FRNN) is constructed based on a neural network implementation of the FRMF. The FRNN has the merits of quick learning and good classification performance. Then, a new neural network feature selection algorithm...
Land cover classification accuracy assessments are frequently limited to an error matrix, which is derived from location-independent measures and consequently does not provide any information about the spatial distribution of the error. The objective of this work is to present a methodology for mapping the spatial distribution of classification errors that is based on stochastic simulation and takes into...
Within the complex and competitive semiconductor manufacturing industry, lot cycle time (CT) remains one of the key performance indicators. Its reduction is of strategic importance as it contributes to decreasing cost, shortening time-to-market, detecting faults faster, achieving throughput targets, and improving production-resource scheduling. To reduce CT, we suggest and investigate a data-driven...
Accurate land use/cover (LUC) classification data derived from remotely sensed data are very important for land use planning and environmentally sustainable development. Traditionally, statistical classifiers are often used to generate these data, but these classifiers rely on assumptions that may limit their utility for many datasets. Conversely, artificial neural network (ANN) and decision tree (DT)...
Active learning methods seek to reduce the number of labeled instances needed to train an effective classifier. Most current methods assume the availability of a reasonable amount of initially labeled training data so that the learners can be trained with sufficient quality. However, for many applications, the amount of initial training data is often limited, which affects the quality of the...
Active learning methods seek to reduce the number of labeled instances needed to train an effective classifier. Most current methods are myopic, i.e., they select a single unlabeled sample to label at a time. Batch-mode active learning methods, on the other hand, typically select the top N unlabeled samples with the maximum score. Such selected samples often cannot guarantee the learner's performance. In this...
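The top-N baseline mentioned in this abstract can be sketched as follows. This is only an illustration of the generic baseline, not the method the paper proposes: the abstract does not say which score is used, so prediction entropy is assumed here, and the names `entropy` and `select_top_n` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution (assumed score)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_top_n(pool_probs, n):
    """Return indices of the n pool samples with the highest score.

    pool_probs: one list of class probabilities per unlabeled sample,
    as produced by whatever classifier is currently trained.
    """
    scores = [entropy(p) for p in pool_probs]
    # Rank sample indices by descending score and keep the first n.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical predictions for four unlabeled samples, two classes.
    pool = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [0.99, 0.01]]
    print(select_top_n(pool, 2))
```

Because each sample is scored independently, the selected batch may contain near-duplicate samples, which is one plausible reading of why such batches cannot guarantee the learner's performance.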
In the field of data mining, one of the main objectives is to achieve the highest possible classification accuracy. This paper presents a classifier fusion system based on the principles of the Dempster-Shafer theory of evidence combination. It allows one to combine evidence from different sources and arrive at a degree of belief (represented by a belief function) that takes into account all the available...
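As a rough illustration of the evidence combination this abstract refers to, the sketch below implements Dempster's rule of combination for two mass functions. It is a minimal sketch under stated assumptions, not the paper's fusion system: the function name `combine`, the dictionary representation keyed by `frozenset`, and the example masses are all hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of class labels (a focal element)
    to its mass; the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to incompatible hypotheses counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


if __name__ == "__main__":
    A, B = frozenset({"A"}), frozenset({"B"})
    AB = A | B  # ignorance: "either A or B"
    # Hypothetical outputs of two classifiers expressed as masses.
    m1 = {A: 0.6, B: 0.1, AB: 0.3}
    m2 = {A: 0.5, B: 0.2, AB: 0.3}
    for focal, mass in combine(m1, m2).items():
        print(sorted(focal), round(mass, 3))
```

Mass left on the full set `{A, B}` expresses a source's ignorance, which is what distinguishes this representation from an ordinary probability vector.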
The digital elevation model (DEM) of the seabed is the core and foundation of chart analysis. In this paper, a method of constructing the DEM of the seabed based on uncertainty is proposed. First, according to the uncertainty of the sounding data, the distribution law is explained and the interpolation model is constructed. Second, according to the propagation law of the uncertainty, the DEM of the seabed based on uncertainty of...
Global land cover characterizes the Earth's surface and has been recognized as a key driver of biodiversity change. Most biological conservation studies focus on using land cover data to study the pattern of the biological environment. However, few studies have considered the accuracy of land cover products, which may induce further errors. Ecologists should be aware of those problems before...
First, by preprocessing the classification rules, we compute the distinct outlier attribute subspace of the rules' attributes and then use an attribute weight vector to calculate the weighted distance; second, we analyze the subspace outlier influence factor of the weighted neighborhood area; finally, we create a frequent matching subset by comparing with the subspace outlier influence factor...
When labeled data are scarce, exploiting a large amount of unlabeled data can help improve the learning performance of a classifier. The key issue for active learning is how to select the most "valuable" training samples to reduce the labeling cost of a large number of unlabeled samples. In this paper, we propose an efficient active Bayes classifier that uses affinity propagation (AP) to select the most...
Support vector machines (SVMs) have met with significant success in numerous real-world learning tasks. However, like most machine learning algorithms, the SVM is a supervised learning method based on the assumption that labeled data are straightforward to obtain, whereas in reality labeled data can be scarce or expensive to obtain. Active learning (AL) is a way to deal with this problem by asking...