The increased use of the Internet and the ease of access to online communities such as social media have provided an avenue for cybercrime. Cyberbullying, a kind of cybercrime, is defined as an aggressive, intentional action against a defenseless person carried out using the Internet, social media, or other electronic content. Researchers have found that many of the bullying cases have tragically ended...
In this paper, we present a novel method to classify directions of capital flows in Internet finance. Our method differs from previous text classification methods in that it extracts key sentences, which may directly reflect the semantics of the input text, before classification. We use the Bi-LSTM model as a classifier to process input sentences. In this paper, we represent the matrix of key sentences...
This paper reports comparative authorship attribution results obtained on the Internet comments of the morphologically complex Lithuanian language. We have explored the impact of machine learning and similarity-based approaches on the different author set sizes (containing 10, 100, and 1,000 candidate authors), feature types (lexical, morphological, and character), and feature selection techniques...
As Internet applications become more complex and diverse, simple network traffic matrix estimation or approximation methods such as the gravity model are no longer adequate. In this paper, we advocate a novel approach of approximating traffic matrices with multiple low-rank matrices. We build the theory behind the MULTI-LOW-RANK approximation and discuss the conditions under which it is better than...
Many animal species exist in the world, and new species are discovered each year. It is therefore very important that these valuable species be documented properly for future reference. Many of today's information retrieval systems for managing and documenting animal species only allow users to search animal images and descriptions online via text-based input. Therefore,...
With the rapid development of the Internet, how to obtain valuable information from massive volumes of messages has become a major problem to be solved in the era of information explosion. This paper introduces the development route of information extraction technology and discusses four categories of Chinese entity relation extraction technologies in depth. Finally, the advantages and disadvantages of different...
Timely and accurate traffic classification and application characterization are becoming increasingly important with many applications in wired and wireless networks, e.g., traffic engineering, security monitoring, and quality of service (QoS). In particular, Software Defined Networking (SDN) is a new networking paradigm that has great impact on future IP networks and 5G wireless networks. In SDN...
Online reviews play a crucial role in helping consumers make purchase decisions. However, a severe problem, the Internet Water Army (a large number of paid posters who write inauthentic reviews), has recently emerged on many e-commerce websites and dramatically undermines the value of user reviews. Although the term Internet Water Army originated in China, some other countries also suffered from this...
An improved KNN text classification algorithm based on Simhash is proposed. It introduces Simhash and uses the average Hamming distance of adjacent texts as a unit, which addresses the problems caused by data imbalance and the large computational overhead of traditional KNN text classification algorithms. Experimental results demonstrate that the proposed algorithm achieves a higher precision, a...
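The two building blocks named in this abstract, Simhash fingerprints and Hamming distance, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the whitespace tokenizer, the MD5-based token hash, the unweighted tokens, the plain majority vote, and the function names `simhash`, `hamming`, and `knn_label` are all assumptions made for illustration, and the "average Hamming distance of adjacent texts" step is not reproduced because the abstract does not specify it.

```python
import hashlib
from collections import Counter


def simhash(text: str, bits: int = 64) -> int:
    """Return a Simhash fingerprint of `text` (assumed: whitespace tokens, weight 1 each)."""
    weights = [0] * bits
    for token in text.lower().split():
        # Hash each token to a `bits`-wide integer (MD5 is an arbitrary choice here).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each bit of the token hash votes +1 or -1 on that bit position.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set where the positive votes outnumber the negative ones.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def knn_label(query: str, labelled: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k labelled texts closest to `query` in Hamming distance."""
    q = simhash(query)
    nearest = sorted(labelled, key=lambda item: hamming(q, simhash(item[0])))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Comparing fixed-width fingerprints with an XOR and a bit count is much cheaper than comparing full term vectors, which is the usual motivation for pairing Simhash with nearest-neighbour search.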
In order to improve the ticket-booking experience of users of the Railway Online Ticketing System and keep the system running normally, a model for detecting abnormal ticket-booking behavior by the system's users, based on the traditional K-Means and FP-Growth algorithms, is proposed. Firstly, user features are preliminarily filtered by the Random Forest algorithm based on Spark MLlib to identify the...
With the evolution of the Internet, there has been unprecedented growth in the volume, velocity, veracity, and variety of data, and the complexity of data attributes is on the rise. Further, in the Internet domain, data is no longer geo-centric, and multiple locations contribute to data acquisition technologies including but not limited to packet captures, data logs, routing...
With the emergence of Internet social shopping platforms, a large sentiment corpus is accumulating rapidly. Sentiment classification, a specific application of sentiment analysis, has received a lot of attention from researchers in the field of natural language processing. The traditional method of classifying sentiment text is usually limited to the content of the text. However,...
Synonym extraction is a fundamental research task that benefits text mining and information retrieval. In this paper, we propose a method to extract synonyms from text that employs spectral clustering and word2vec. First, the word2vec model is trained on a large-scale English Wikipedia corpus. Then, we extract keywords from a text and use the trained model to generate similarities among these...
In recent years, the rapid development of geographic information system technology and the popularity of geo-location-based mobile information services have made people pay more attention to geography-related information. Thus, information retrieval and related services based on geographic information have broad application prospects. However, the traditional search engine for the processing of geographic...
Predicting meme bursts is highly relevant to developing security-related detection and early-warning capabilities. In this paper, we propose a feature-based method for real-time meme burst prediction, namely “Semantic, Network, and Time” (SNAT). By considering the potential characteristics of bursty memes, such as the semantics and spatio-temporal characteristics during their propagation, SNAT is...
Knowledge graph technology belongs to the field of artificial intelligence. It is widely used in semantic search and intelligent question answering. Constructing a Uyghur knowledge graph is of great value to Uyghur information processing and Uyghur application software development. Firstly, this paper describes the definition and structure of the knowledge graph, then it reviews the related research...
Nowadays, cross-media retrieval is a useful technology that helps people find expected information in huge amounts of multimodal data more efficiently. A common cross-media retrieval framework first maps features of different modalities into an isomorphic semantic space so that the similarity between heterogeneous data can be measured. For most semantic-space-based methods, the mapping...
With the explosive growth of information on the Internet, it is becoming more and more important to improve the efficiency of information acquisition. Automatic text summarization provides a good means of acquiring information quickly through compression and refinement. While existing methods for automatic text summarization achieve elegant performance on short sequences, they are facing...
With the rapid development of the Internet, massive Internet text data has brought new opportunities and challenges to research on entity relation extraction. Open entity relation extraction overcomes the shortcomings of traditional methods, namely that relation types need to be predefined and plenty of training data needs to be labeled in advance. A lot of work has been done for English Open ERE, and now...
Linking the growing IPv6 deployment to existing IPv4 addresses is an interesting field of research, be it for network forensics, structural analysis, or reconnaissance. In this work, we focus on classifying pairs of server IPv6 and IPv4 addresses as siblings, i.e., running on the same machine. Our methodology leverages active measurements of TCP timestamps and other network characteristics, which...