Segmentation is a core step in any recognition or classification method: for the text in a document to be recognized effectively, it must be segmented accurately. This paper presents a text- and writer-independent algorithm for segmenting sub-words in Arabic words. The concept is based on the global binarization of an image at various thresholding levels...
Clustering ensembles combine multiple partitions of a given data set into a single clustering solution. They have emerged as a powerful approach for improving both the robustness and the stability of unsupervised classification results. One of the major problems in clustering ensembles is finding the best consensus function. Finding the final partition from different clustering results...
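To illustrate what a consensus function does, here is a minimal sketch of one common choice, the co-association (evidence accumulation) approach. It is not the method of the paper summarized above, whose details are truncated. The names `co_association` and `consensus_partition` and the `threshold` parameter are hypothetical, chosen only for this example.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_partition(partitions, threshold=0.5):
    """Link points co-clustered in more than `threshold` of the partitions,
    then label the connected components of the resulting graph."""
    matrix = co_association(partitions)
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three base partitions of five points (assumed toy data).
partitions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],  # same grouping as the first, different label names
    [0, 0, 0, 1, 1],  # disagrees about point 2
]
consensus = consensus_partition(partitions)
print(consensus)
```

Because the co-association matrix compares labels only within each partition, the result does not depend on how individual base clusterings name their clusters.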
The computation time of most clustering algorithms is affected by the number of attributes and instances. The data mining community has therefore made efforts to make clustering induction efficient, and scalability is naturally a critical issue that the community faces. One way to handle this issue is to use a subset of all instances. This paper suggests an...
Clustering is one of the most useful methods in the intelligent engineering domain, in which a set of similar objects is categorized into clusters. Almost all of the well-known clustering algorithms require input parameters that are hard to determine but have a significant influence on the clustering result. Furthermore, most are not robust enough to noisy data. This paper presents an efficient...
Logo spotting is of great interest because it makes it possible to categorize the document images of a digital library of scanned documents according to their sources, without any costly semantic analysis of their textual transcript. In this paper, we present an approach for logo spotting, based on the matching of keypoints extracted both from the query document images and a given set of logos (gallery) using...
Semi-supervised learning (SSL) relies on a few labeled samples to explore data's intrinsic structure through pairwise smooth transduction. The performance of SSL mainly depends on two factors: (1) the accuracy of labeled queries, and (2) the integrity of manifolds in the data distribution. Both of these qualities would be poor in real applications as data often consist of several irrelevant clusters and discrete...
Clustering homologous proteins is one of the important tasks in functional genomics. Homologous proteins may share common functions. Annotating proteins of unknown function by transferring annotations from their homologues of known annotations is one of the efficient ways to predict protein function. We use a modularity-based method called CD for grouping together homologous proteins. The method employs...
The objective of the paper is to show the effect of noise on the performance of various clustering techniques. Clustering is widely used in many applications, including medicine and finance. Clustering may be applied to a database using various approaches, based on distance, density, hierarchy, and partition. A data item that is not relevant to data mining is called noise (e.g. out of bound...
Image segmentation is a very important process for multimedia applications. Multimedia databases use segmentation for the storage and indexing of images. This paper presents a way to segment images by applying both a clustering method and watershed transformation. It is well known that the major drawback of the watershed transformation method is the oversegmentation phenomenon it produces. For this...
To improve the poor accuracy and stability of traditional geomagnetic matching navigation by TERCOM, this paper discusses in detail a new method based on the integration of TERCOM, the K-means clustering algorithm, and INS (inertial navigation system). Through an experiment, we find that the new method has higher accuracy and stability than the traditional method, especially...
Density-based clustering algorithms are among the primary methods for data mining. The clusters formed by density-based clustering are easy to understand and are not limited to particular cluster shapes. Existing density-based algorithms have trouble because they cannot find all meaningful clusters when the density varies widely. VDBSCAN is introduced to compensate...
This paper presents an incremental clustering algorithm based on DGC, a density-based algorithm we developed earlier. We experimented with real-life datasets and both methods perform satisfactorily. The methods have been compared with some well-known clustering algorithms and they perform well in terms of z-score cluster validity measure.
As a density-based clustering algorithm, DBSCAN plays an important role in data mining. The DBSCAN algorithm is normally computationally expensive, which limits its performance on large-scale data sets, especially high-dimensional ones. The high complexity stems from the region queries, a very common operation in density-based algorithms, which brings the complexity of the algorithms to O(n^2),...
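The quadratic cost mentioned above can be seen in a minimal brute-force sketch of a region query. This is a generic illustration, not the implementation from the paper; the function names and the `eps` value are assumptions made for this example.

```python
import math


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i).

    Without a spatial index, every call scans all n points.
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def all_region_queries(points, eps):
    """Run one region query per point and count distance evaluations."""
    neighborhoods = []
    evaluations = 0
    for i in range(len(points)):
        neighborhoods.append(region_query(points, i, eps))
        evaluations += len(points)  # one distance per candidate point
    return neighborhoods, evaluations


# Four 2-D points (assumed toy data): three close together, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
neighborhoods, evaluations = all_region_queries(points, eps=1.0)
print(neighborhoods, evaluations)
```

One query per point, each scanning all n points, gives n * n distance evaluations. Spatial indexes are the usual way to reduce the cost of each query.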
In recent years, the advent of high-throughput data generation techniques has increased not only the number of objects collected in databases, but also the number of attributes describing these objects. Clustering is the process of grouping the data into classes or clusters, so that objects within a cluster have high similarity in comparison to one another but are very dissimilar to objects in other...
Density-based clustering algorithms are very effective at discovering arbitrary-shaped clusters in large spatial databases. However, in many cases, clusters of varied local density exist in different regions of the data space. In this paper, a new algorithm, LD-BSCA, is proposed that introduces the concept of local MinPts (a minimum number of points) and a new cluster expanding condition: ExpandConClId (Expanding...
Cluster analysis is a primary method for database mining. Most clustering algorithms require input parameters that are hard to determine but have a significant influence on the clustering result. Furthermore, for many real data sets there does not exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce...
We propose a probabilistic model for the relevance feedback of users looking for target images. This model takes into account user errors and user uncertainty about distinguishing similarly relevant images. Based on this model, we have developed an algorithm, which selects images to be presented to the user for further relevance feedback until a satisfactory image is found. In each query session,...
This paper proposes a new anomaly detection algorithm that can dynamically update the normal profile of a system's usage pattern. The feature used to model the system's usage pattern is program behavior. When the system usage pattern changes, new program behaviors are inserted into old profiles by density-based incremental clustering. Compared to traditional re-clustering updating, it is much more efficiently...
The partition-based K-means algorithm and the density-based DBSCAN algorithm are analyzed. Weighing the advantages and disadvantages of the two algorithms, an improved algorithm, DBSK, is proposed. Because it partitions the data set, DBSK reduces memory requirements; a method for computing variable values is put forward; for uneven data sets, because different variable values are adopted...
DBSCAN is one of the most popular algorithms for cluster analysis. It can discover clusters of arbitrary shape and separate out noise. But this algorithm cannot choose its parameters according to the distribution of the data set. It simply uses the global MinPts parameter, so the clustering result for a multi-density database is inaccurate. In addition, when it is used to cluster large databases, it will...