We introduce a novel algorithm for mining temporal intervals from real-time system traces with linear complexity using passive, black-box learning. Our interest is in mining nfer specifications from spacecraft telemetry to improve human and machine comprehension. Nfer is a recently proposed formalism for inferring event stream abstractions with a rule notation based on Allen Logic. The problem of...
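As a rough illustration of the kind of Allen-style interval inference that nfer rules express (e.g., a hypothetical rule like BOOT :- START before END), here is a minimal Python sketch; Interval, before, and apply_rule are illustrative names, not nfer's actual API.

```python
# Illustrative sketch only: one Allen-style inference step in the spirit of a
# hypothetical nfer rule "BOOT :- START before END". Not nfer's rule engine.
from dataclasses import dataclass

@dataclass
class Interval:
    name: str
    begin: float
    end: float

def before(i: Interval, j: Interval) -> bool:
    # Allen's "before" relation: i ends strictly before j begins.
    return i.end < j.begin

def apply_rule(name, lefts, rights):
    # Emit a new abstraction interval spanning each matching pair.
    return [Interval(name, i.begin, j.end)
            for i in lefts for j in rights if before(i, j)]

starts = [Interval("START", 0.0, 1.0)]
ends = [Interval("END", 5.0, 6.0)]
print(apply_rule("BOOT", starts, ends))
# [Interval(name='BOOT', begin=0.0, end=6.0)]
```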
Currently, open source projects receive various kinds of issues daily because of the extreme openness of the Issue Tracking System (ITS) on GitHub. Categorizing these issues in the ITS is a labor-intensive and time-consuming task for project managers. However, a contributor is only required to provide a short textual abstract to report an issue on GitHub. Thus, most traditional classification approaches based on detailed...
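As a generic baseline for classifying issues from such short textual abstracts (not the approach proposed in the paper above), a TF-IDF pipeline in scikit-learn might look like this; the example titles and labels are invented:

```python
# Hedged sketch: short-text issue classification with a generic
# TF-IDF + logistic regression pipeline. A baseline illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

titles = ["app crashes on startup", "add dark mode option",
          "typo in README", "segfault when parsing large file"]
labels = ["bug", "enhancement", "docs", "bug"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(titles, labels)
print(clf.predict(["crashes when parsing file"]))  # expect ['bug']
```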
Adapted from biological sequence alignment, trace alignment is a process mining technique used to visualize and analyze workflow data. Any analysis done with this method, however, is affected by the alignment quality. The best existing trace alignment techniques use progressive guide-trees to heuristically approximate the optimal alignment in O(N²L²) time. These algorithms are heavily dependent on...
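For context, progressive guide-tree methods repeatedly merge traces using a pairwise global alignment, which is where the per-pair O(L²) cost comes from. A minimal Needleman-Wunsch-style sketch of that building block, with illustrative scoring values, is:

```python
# Hedged sketch: the pairwise dynamic-programming core behind progressive
# trace alignment. Scoring values are assumptions, not the cited algorithm's.
def align(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]  (O(L^2) per pair)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # substitute/match
                           dp[i - 1][j] + gap,      # gap in b
                           dp[i][j - 1] + gap)      # gap in a
    return dp[n][m]

print(align(["login", "search", "pay"], ["login", "pay"]))  # 1
```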
Given the soaring amount of data being generated daily, graph mining tasks are becoming increasingly challenging, leading to tremendous demand for summarization techniques. Feature selection is a representative approach that simplifies a dataset by choosing features that are relevant to a specific task, such as classification, prediction, and anomaly detection. Although it can be viewed as a way to...
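As a small, generic illustration of feature selection for a classification task (not the summarization technique of the paper above), scikit-learn can rank features by mutual information with the target:

```python
# Hedged sketch: selecting the k features most relevant to a classification
# target via mutual information. Data below is synthetic for demonstration.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                              # 10 candidate features
y = (X[:, 3] + 0.1 * rng.normal(size=200) > 0).astype(int)  # depends on feature 3

selector = SelectKBest(mutual_info_classif, k=2).fit(X, y)
print(selector.get_support(indices=True))  # expect index 3 among the selected
```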
The amount of data circulating on the Internet increases day by day. With the growing use of social media in particular, analyzing these data has become increasingly important. Machine learning approaches remain a popular way to analyze large amounts of data. Today, Facebook is the most popular social networking site. In this study, some data taken on Facebook...
Heap overflow is one of the most widely exploited vulnerabilities, with a large number of heap overflow instances reported every year. It is important to decide whether a crash caused by heap overflow can be turned into an exploit. Efficient and effective assessment of the exploitability of crashes helps identify severe vulnerabilities and thus prioritize resources. In this paper, we propose...
Due to the data-intensive and sophisticated tasks in scientific experiments, workflows have been widely used to enable repetitive task automation and data reproducibility. This leads to the need for effective and efficient search mechanisms for scientific workflow discovery, as workflow retrieval systems require a model which fulfills several requirements: unification, accuracy, and rich representations...
Together with advances in technology, Computer Vision plays an important role in enhancing smart computing systems that help people overcome obstacles in their daily lives. One common and troublesome problem is the limits of human memory, especially when memorizing things such as personal items. It is annoying for people to waste their time finding lost items manually by recall or notes. This motivates...
This work studies a data-driven methodology for detecting systematic defects using layout-aware scan diagnosis data. As part of volume diagnosis, this methodology focuses on ranking the most systematic defective signatures, while possible random defects are also present in the wafer. The main analysis components utilize χ² independence tests to establish systematic relationships between reported defective...
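A χ² independence test on a contingency table is a standard statistical procedure; a minimal SciPy sketch, with invented counts relating die failures to a suspect layout feature, looks like this:

```python
# Hedged sketch: a chi-squared independence test on a 2x2 contingency table.
# The counts are invented for illustration; scipy.stats.chi2_contingency is a
# standard implementation of the test the abstract mentions.
from scipy.stats import chi2_contingency

#               feature present   feature absent
table = [[40,  10],    # failing dies
         [15, 135]]    # passing dies
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p:.2e}")  # a small p suggests a systematic relationship
```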
Each programmer has their own way of programming, but some criteria can be applied when analysing code: there is a set of best practices that can be checked, and “not so common” instructions, mainly used by experts, that can be found. Assuming that all programs being compared are correct, it is possible to infer the experience level of the programmer or the proficiency level of...
This study focuses on multimodal artifact metrics and proposes a technique based on multimodal biometric systems, a type of biometric identification system. It is expected that this technique can aid in verifying the authenticity of each artifact more accurately and in increasing the difficulty of counterfeiting compared to existing artifact metric...
A kernel or mini-app is a self-contained small application that retains certain characteristics of the original application [7]. Working on a kernel or mini-app in place of the original application can dramatically reduce the resources and effort required for software tasks such as performance optimization and porting to new platforms. However, using a kernel as a proxy is based on the...
Most modern search engines feature keyword-based search interfaces. These interfaces are usually found on websites belonging to enterprises or governments, or on sites related to news articles, blogs, and social media that contain a large corpus of documents. These collections of documents are not easily indexed by web search engines and are considered hidden web databases. These databases provide...
Developing new ideas and algorithms or comparing new findings in the field of requirements engineering and management requires a dataset to work with. Collecting the required data is time-consuming, tedious, and may involve unforeseen difficulties. The need for datasets often forces researchers to collect data themselves in order to evaluate their findings. However, comparing results with other publications...
Software systems with quality of service (QoS) requirements, such as database management systems and web servers, are ubiquitous. Such systems must meet strict performance requirements. Instrumentation is a useful technique for the analysis and debugging of QoS systems. Dynamic binary instrumentation (DBI) extracts runtime information to comprehend a system's behavior and detect performance bottlenecks. However,...
Data anonymization is a technique used to increase the assurance that private data is not accessible to third parties. In data mining processes, anonymization can impact the results, since anonymized data may hinder the data analysis performed by algorithms commonly used in this context. The goal of this Practical Experience Report is to evaluate the accuracy and performance impact of data anonymization...
Refactoring is a widely used technique to enhance the overall quality of an existing software system by changing its internal structure without modifying its external behavior. Although it is difficult to perform refactoring manually, it helps reduce the defects in the existing software. Three main types of design defects are investigated in the current study, namely blob, Spaghetti Code (SC), and...
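As a loose illustration of metric-based design-defect detection (the paper's actual detection approach is not shown in the abstract), a naive blob check might threshold simple size metrics; the thresholds below are assumptions:

```python
# Hedged sketch: flag a candidate "blob" (god class) when its method and
# attribute counts exceed illustrative thresholds. Not the study's detector.
def looks_like_blob(num_methods, num_attributes,
                    method_threshold=20, attr_threshold=15):
    return num_methods > method_threshold and num_attributes > attr_threshold

print(looks_like_blob(34, 22))  # True: candidate blob
print(looks_like_blob(6, 4))    # False
```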
Software engineers use a variety of social platforms, where they participate in open source software projects and respond to other developers who ask for help on specific issues. This presence of developers across different platforms mirrors their hands-on experience and expertise in different technologies and programming languages, and is a useful source of information for their own use but...
Recent developments in the storage of system data in the Navy's data repository, LEAPS, using the FOCUS product meta-model have opened the door to graph-theory applications in the early stages of Navy ship system design. In this paper, we demonstrate the ability to extract graphs from ship data and present pertinent applications of such graphs, including a vulnerability metric for...
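As a hedged sketch of one such graph application, the snippet below builds a toy system graph with networkx and reports articulation points, a simple single-point-of-failure style vulnerability indicator; the topology is invented and the LEAPS/FOCUS extraction itself is not shown:

```python
# Hedged sketch: once a system graph is extracted, articulation points mark
# components whose removal disconnects the network. Toy topology only.
import networkx as nx

g = nx.Graph()
g.add_edges_from([("gen1", "bus"), ("gen2", "bus"),
                  ("bus", "pump"), ("bus", "radar")])
print(list(nx.articulation_points(g)))  # ['bus']: a single point of failure
```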
In an industrial facility, a large amount of data on its operation is collected. The data includes control variables used in monitoring and regulation, alarm variables displayed to notify operators of abnormalities, and operators' actions taken to correct abnormalities as well as to operate the facility. As the data is historized and readily accessible, it can provide a wealth of...