In 2015, the District of Columbia framed a Vision Zero mission and action plan aimed at reducing roadway deaths to zero by 2024. Automated traffic enforcement (ATE) features prominently among Vision Zero strategies. The paper analyzes data from DC's speed and red-light cameras and discusses a framework for how ATE can help DC reach its Vision Zero goals by fine-tuning...
Sampling through crawling is an important research topic in social network analysis. However, there is very little existing work on sampling through crawling in directed networks. In this paper we present a new method of sampling a directed network, with the objective of maximizing the node coverage. Our proposed method, Predicted Max Degree (PMD) Sampling, works by predicting which k open nodes are...
Big data processing has introduced new ideas in the applications of bacterial analysis in recent years. This paper aims to develop an effective framework to automatically extract quantitative knowledge relating to bacterial motility through processing a sequence of large-scale microscopic images of bacterial movements. It was hypothesized that motile bacteria move according to a conceptual model referred...
Two of the major problems in social media message classification are the data sparseness issue and the high degree of lexical variation. Paraphrases, or synonyms, are alternative ways of expressing the same meaning using different lexical variations. In this study, we try to use paraphrases to improve tweet topic classification performance. We explored two approaches to generating paraphrases, WordNet,...
With the advent of the Internet and wide-spread popularity of online technology-enhanced learning platforms, many pedagogical activities today involve learners in online discussions such as synchronous chat. In this study, we describe a text mining method used for analyzing teamwork from such chat dialogue of students. The steps in the text mining method such as pre-processing and classification are...
The District Department of Transportation's (DDOT) pay by cell (PBC) program for on-street parking has been very successful. Last year, DDOT conducted spatio-statistical analysis to determine demographic and spatial trends that highlight PBC adoption and usage frequency. This paper serves as a continuation of that investigation, looking into spatial statistical patterns of non-PBC usage, trip patterns,...
Multi-relational data, like knowledge graphs, are generated from multiple data sources by extracting entities and their relationships. We often want to include inferred, implicit or likely relationships that are not explicitly stated, which can be viewed as link-prediction in a graph. Tensor decomposition models have been shown to produce state-of-the-art results in link-prediction tasks. We describe...
Performance and functional testing of a target management system requires simulating tactical moving objects such as combat planes, naval vessels, and submarines. Because of military security, it is not possible to collect data on tactical moving objects under various circumstances. To solve this problem, this paper proposes a test data set generator for tactical moving...
Convolutional Neural Networks (CNNs) are useful methods for identifying previously unknown embedded patterns in images. Several object and facial recognition tasks, along with image segmentation tasks, have benefited from the non-linear abstraction of hybrid features using CNNs. This work presents a novel CNN model parametrization workflow developed on the cloud-computing platform of Microsoft Azure...
The amount of data that businesses collect and analyze has been rapidly increasing, which has triggered an increase in big data teams. With the growth of both the number and size of big data teams, specialized roles are starting to be defined. One such role is the data engineer, who focuses on ensuring that the data is easily available for advanced analytics. Via a case study, this paper explores...
Existing data analysis tools such as R, SQL, and Excel are still insufficient to cope with today's big data analysis needs. The author proposes a CUI (Character User Interface) toolset with dozens of functions to neatly handle tabular data in TSV (Tab Separated Values) files. It implements many basic and useful functions that have not been implemented in existing software...
The number of e-commerce customers and database services in cloud computing platforms has grown steadily, leading providers to adopt resource-sharing solutions to meet growing demand for infrastructure resources, such as processing and storage. Consolidating database applications has arguably become a de facto solution to support a large number of customers/tenants at low infrastructure costs...
Applications deployed in the Cloud usually come with dedicated performance and availability requirements. These can be met by replicating data across several sites and/or by partitioning data. Data replication allows read requests to be parallelized, which decreases data access latency, but induces significant overhead for the synchronization of updates. Partitioning, in contrast, is highly beneficial...
As hardware and software technologies have improved, our definition of a “manageable amount of data” has increased in its scope dramatically. The term “big data” can be applied to any of several different projects and technologies sharing the ultimate goal of supporting analysis on these large, heterogeneous, and evolving data sets. The term “data science” refers to the statistical, technical, and...
In the database community, polystores are an emerging and promising approach to data federation that aims at designing a unified querying layer over multiple data models. In the Semantic Web community, a similar approach in spirit, Ontology-Based Data Access (OBDA), has recently been proposed, has attracted a lot of attention, and has proved successful in several industrial scenarios. In this paper we discuss...
We propose a new analytical method to classify web user behavior based on such latent states of users as intention, interest, or motivation. First, we put the clickstream data of many users into a Hidden Markov Model in which the number of hidden states is large enough to build a state transition network. Since the variable hidden states represent different latent states of users, the movement on...
E-commerce plays a key role in business success nowadays. Therefore, the performance of E-commerce websites is critical. E-commerce websites generate a large amount of data that is often used for performance evaluation. Many website evaluation methods have been proposed, but the social media factor is usually not taken into consideration. In this paper, Twitter data is utilized for big data analytics...
In this paper we present Digree, an experimental middleware system that can execute graph pattern matching queries over databases hosting voluminous graph datasets. First, we formally present the employed data model and the processes of re-writing a query into an equivalent set of subqueries and subsequently combining the partial results into the final result set. Our framework guarantees the correctness...
In this paper, we report on work that uses deep convolutional neural networks (CNNs) to curate and filter photos posted by social media users (Instagram and Twitter). The final goal is to facilitate searching and discovering user-generated content (UGC) with potential value for digital marketing tasks. The images are captured in real time and automatically annotated with multiple CNNs...