2017 IEEE 33rd International Conference on Data Engineering (ICDE)

book

IEEE

chapter

SPATE: Compacting and Exploring Telco Big Data

Constantinos Costa, Georgios Chatzimilioudis, Demetrios Zeinalipour-Yazti, Mohamed F. Mokbel

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1419-1420

In this demonstration paper, we present SPATE, an innovative telco big data exploration framework whose objectives are two-fold: (i) minimizing the storage space needed to incrementally retain data over time, and (ii) minimizing the response time for spatiotemporal data exploration queries over stored data. Our framework deploys lossless data compression to ingest streams of telco big data in the...

chapter

A Scalable Data Integration and Analysis Architecture for Sensor Data of Pediatric Asthma

Dimitris Stripelis, Jose Luis Ambite, Yao-Yi Chiang, Sandrah P. Eckel, et al.

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1407-1408

According to the Centers for Disease Control, in the United States there are 6.8 million children living with asthma. Despite the importance of the disease, the available prognostic tools are not sufficient for biomedical researchers to thoroughly investigate the potential risks of the disease at scale. To overcome these challenges, we present a big data integration and analysis infrastructure developed...

chapter

Hippo in Action: Scalable Indexing of a Billion New York City Taxi Trips and Beyond

Jia Yu, Raha Moraffah, Mohamed Sarwat

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1413-1414

The paper demonstrates Hippo, a lightweight database indexing scheme that significantly reduces storage and maintenance overhead without sacrificing much query execution performance. Hippo stores disk page ranges instead of tuple pointers in the indexed table to reduce the storage space occupied by the index. It maintains simplified histograms that represent the data distribution and adopts...
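
The page-range idea described in the Hippo abstract can be illustrated with a small Python sketch. It is not Hippo's implementation: the fixed number of pages per index entry, the bucket boundaries, and the names `Entry`, `build_index`, and `range_query` are all hypothetical. Each entry records a page range plus the set of histogram buckets its values fall into; a range query scans only the page ranges whose bucket sets overlap the predicate, then rechecks each value, because a bucket hit does not guarantee a match.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical index entry: a contiguous page range and the histogram
    # buckets that at least one value in those pages falls into.
    first_page: int
    last_page: int
    buckets: set


def bucket_of(value, bounds):
    # bounds are sorted bucket boundaries; bucket i covers [bounds[i-1], bounds[i]).
    return bisect_right(bounds, value)


def build_index(pages, bounds, pages_per_entry=2):
    # Assumption: a fixed number of pages per entry, chosen only for simplicity.
    index = []
    for start in range(0, len(pages), pages_per_entry):
        chunk = pages[start:start + pages_per_entry]
        hit = {bucket_of(v, bounds) for page in chunk for v in page}
        index.append(Entry(start, start + len(chunk) - 1, hit))
    return index


def range_query(index, pages, bounds, lo, hi):
    # Buckets that could hold values in [lo, hi].
    wanted = set(range(bucket_of(lo, bounds), bucket_of(hi, bounds) + 1))
    result = []
    for entry in index:
        if entry.buckets & wanted:
            # Candidate page range: scan its pages and recheck every value.
            for p in range(entry.first_page, entry.last_page + 1):
                result.extend(v for v in pages[p] if lo <= v <= hi)
    return result


pages = [[1, 5, 9], [12, 15, 18], [40, 42, 45], [50, 55, 59], [3, 4, 7]]
bounds = [10, 20, 30, 40, 50]
index = build_index(pages, bounds)
print(range_query(index, pages, bounds, 11, 16))  # [12, 15]
```

The index holds one small entry per group of pages rather than one pointer per tuple, which is where the storage saving comes from; the price is the recheck scan over candidate pages.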

chapter

Processing Declarative Queries through Generating Imperative Code in Managed Runtimes

Stratis D. Viglas

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1610-1611

We present the results of our work on integrating database and programming language runtimes through code generation and extensive just-in-time adaptation. Our techniques deliver significant performance improvements over non-integrated solutions. Our work makes important first steps towards a future where data processing applications will commonly run on machines that can store their datasets entirely...

chapter

Author index

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1617-1625

Presents an index of the authors whose articles are published in the conference proceedings record.

chapter

Prediction-Based Task Assignment in Spatial Crowdsourcing

Peng Cheng, Xiang Lian, Lei Chen, Cyrus Shahabi

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 997-1008

With the rapid advancement of mobile devices and crowdsourcing platforms, spatial crowdsourcing has attracted much attention from various research communities. A spatial crowdsourcing system periodically matches a number of location-based workers with nearby spatial tasks (e.g., taking photos or videos at some specific locations). Previous studies on spatial crowdsourcing focus on task assignment strategies...

chapter

Enabling Real-Time Drug Abuse Detection in Tweets

Nhathai Phan, Soon Ae Chun, Manasi Bhole, James Geller

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1510-1514

Prescription drug abuse is one of the fastest-growing public health problems in the USA. To address this epidemic, a near real-time monitoring strategy, instead of one that relies on retrospective health records, may improve the detection of the prevalence and patterns of abuse of both illegal drugs and prescription medications. In this paper, our primary goals are to demonstrate the possibility of utilizing...

chapter

Mother Smoking During Pregnancy and ADHD in Children

Jing (Melody) Yao

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1515-1522

Objective: This study aimed to examine the association between maternal smoking during pregnancy and ADHD in children.

chapter

Parallel Progressive Approach to Entity Resolution Using MapReduce

Yasser Altowim, Sharad Mehrotra

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 909-920

Entity resolution (ER) is the process of identifying which entities in a dataset represent the same real-world object. This paper proposes a progressive approach to ER using MapReduce. In contrast to traditional ER, progressive ER aims to resolve the dataset such that the rate at which the data quality improves is maximized. Such a progressive approach is useful for many emerging analytical applications...

chapter

BWT Arrays and Mismatching Trees: A New Way for String Matching with k Mismatches

Yangjun Chen, Yujia Wu

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 399-410

In this paper, we discuss an efficient and effective index mechanism to do the string matching with k mismatches, by which we will find all the substrings in a target string s having at most k positions different from a pattern string r. The main idea is to transform s to a BWT-array as index, denoted as BWT(s), and search r against it. During the process, the precomputed mismatch information of r...
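
Searching a pattern against a BWT array can be illustrated with the classic FM-index backward search, extended here with a simple backtracking step that tolerates up to k substitutions. This is a generic sketch, not the mismatching-tree method the paper proposes; the names `build_fm_index` and `search_k_mismatches` are hypothetical, the suffix array is built naively, and the code assumes that `s` contains no `$` and only characters that sort after `$`.

```python
def build_fm_index(s):
    # Assumption: s contains no "$" and all its characters sort after "$".
    s += "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix array
    bwt = "".join(s[i - 1] for i in sa)  # s[-1] is "$" when i == 0
    # C[c]: number of characters in s that are strictly smaller than c.
    C, total = {}, 0
    for c in sorted(set(s)):
        C[c] = total
        total += s.count(c)
    # occ[c][i]: occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in C}
    for i, ch in enumerate(bwt):
        for c in C:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def search_k_mismatches(s, r, k):
    """Start positions of substrings of s that differ from r in at most k positions."""
    sa, C, occ = build_fm_index(s)
    hits = set()

    def extend(j, lo, hi, mismatches):
        # [lo, hi) is the suffix-array interval of suffixes that start with a
        # string aligned to r[j+1:] using `mismatches` substitutions.
        if j < 0:
            hits.update(sa[lo:hi])
            return
        for c in C:
            if c == "$":
                continue
            cost = mismatches + (c != r[j])
            if cost > k:
                continue
            new_lo, new_hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
            if new_lo < new_hi:
                extend(j - 1, new_lo, new_hi, cost)

    extend(len(r) - 1, 0, len(sa), 0)
    return sorted(hits)


print(search_k_mismatches("acagaca", "aca", 0))  # [0, 4]
print(search_k_mismatches("acagaca", "aca", 1))  # [0, 2, 4]
```

Each recursion step narrows the suffix-array interval by one character of `r`, read from right to left. Because every alphabet symbol is tried at each position, the number of explored branches grows quickly with k and the alphabet size.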

chapter

Multiple-Query Optimization of Regular Path Queries

Zahid Abul-Basher

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1426-1430

Graph databases have become increasingly important with the rise of social networks, the growth of the Semantic Web, and the characterization of biological networks. Regular path queries (RPQs), which express path patterns over graphs, have become a standard method for exploring graph databases. SPARQL 1.1 includes property paths, and so now encompasses RPQs as a fragment. In many environments,...
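
To make the notion of an RPQ concrete, the sketch below evaluates a single query by breadth-first search over the product of the graph and an automaton for the path expression. This is the textbook evaluation strategy, not the multiple-query optimization the paper studies; the edge-triple format, the hand-built NFA encoding, and the name `eval_rpq` are assumptions for illustration.

```python
from collections import defaultdict, deque


def eval_rpq(edges, nfa, start_state, accepting, source):
    """Nodes reachable from `source` along a path whose label sequence the NFA accepts.

    edges: iterable of (u, label, v) triples (hypothetical format).
    nfa:   dict mapping (state, label) -> set of next states.
    """
    adj = defaultdict(list)
    for u, label, v in edges:
        adj[u].append((label, v))
    # Breadth-first search over (graph node, automaton state) pairs.
    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, nxt in adj[node]:
            for nstate in nfa.get((state, label), ()):
                if (nxt, nstate) not in seen:
                    seen.add((nxt, nstate))
                    queue.append((nxt, nstate))
    return answers


edges = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
    ("dave", "knows", "alice"),
]
# NFA for knows+ (one or more "knows" edges).
knows_plus = {(0, "knows"): {1}, (1, "knows"): {1}}
# NFA for knows+/worksAt.
knows_plus_works_at = {**knows_plus, (1, "worksAt"): {2}}

print(sorted(eval_rpq(edges, knows_plus, 0, {1}, "alice")))  # ['bob', 'carol']
print(eval_rpq(edges, knows_plus_works_at, 0, {2}, "alice"))  # {'acme'}
```

Tracking visited (node, state) pairs rather than nodes alone keeps the search finite on cyclic graphs while still allowing a node to be revisited in a different automaton state.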

chapter

Data Science Education: We're Missing the Boat, Again

Bill Howe, Michael Franklin, Laura Haas, Tim Kraska, et al.

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1473-1474

In the first wave of data science education programs, data engineering topics (systems, scalable algorithms, data management, integration) tended to be de-emphasized in favor of machine learning and statistical modeling. Anecdotal evidence suggests that this was a mistake: data scientists report spending most of their time grappling with data far upstream of modeling activities. A second wave of data...

chapter

Handling Uncertainty in Geo-Spatial Data

Andreas Zufle, Goce Trajcevski, Dieter Pfoser, Matthias Renz, et al.

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 1467-1470

An inherent challenge arising in any dataset containing spatial and/or temporal information is uncertainty due to various sources of imprecision. Accounting for the impact of this uncertainty is paramount when estimating the reliability (confidence) of any query result derived from the underlying input data. To deal with uncertainty, solutions have been proposed independently in the geo-science and the data-science...

chapter

Scalable Informative Rule Mining

Guoyao Feng, Lukasz Golab, Divesh Srivastava

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 437-448

We present SIRUM: a system for Scalable Informative RUle Mining from multi-dimensional data. Informative rules have recently been studied in several contexts, including data summarization, data cube exploration and data quality. The objective is to produce a small set of rules (patterns) over the values of the dimension attributes that provide the most information about the distribution of a numeric...

chapter

Provenance-Aware Query Optimization

Xing Niu, Raghav Kapoor, Boris Glavic, Dieter Gawlick, et al.

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 473-484

Data provenance is essential for debugging query results, auditing data in cloud environments, and explaining outputs of Big Data analytics. A well-established technique is to represent provenance as annotations on data and to instrument queries to propagate these annotations to produce results annotated with provenance. However, even sophisticated optimizers are often incapable of producing efficient...

chapter

Joint User-Entity Representation Learning for Event Recommendation in Social Network

Lijun Tang, Eric Yi Liu

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 271-280

User-managed events are a popular feature on social networks. Take Facebook Events as an example: over 135 million events were created in 2015, and over 550 million people use events each month. In this work, we consider the heavy sparseness in both user and event feedback history caused by the short lifespans (transiency) of events and by user participation patterns in a production event system. We propose...

chapter

Reverse Keyword-Based Location Search

Xike Xie, Xin Lin, Jianliang Xu, Christian S. Jensen

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 375-386

The proliferation of geo-textual data gives prominence to spatial keyword search. The basic top-k spatial keyword query returns the k geo-textual objects that rank highest according to their textual relevance to the query keywords and their spatial proximity to the query location. We define, study, and provide means of computing the reverse top-k keyword-based location query. This new type of query takes a...
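
A basic top-k spatial keyword query, as defined in the abstract, can be sketched as a linear scan that scores each object by a weighted sum of spatial proximity and textual relevance. The scoring function is an assumption (Jaccard similarity for text, distance normalized by a hypothetical `max_dist` for space, weight `alpha`), since the abstract does not specify one, and the sketch does not cover the reverse query the paper introduces.

```python
import heapq
import math


def topk_spatial_keyword(objects, q_loc, q_words, k, alpha=0.5, max_dist=1.0):
    """Return ids of the k highest-scoring objects.

    objects: list of (obj_id, (x, y), set_of_keywords).
    Assumed score: alpha * spatial proximity + (1 - alpha) * textual relevance,
    with proximity = 1 - distance / max_dist (floored at 0) and relevance = Jaccard.
    """
    def score(obj):
        _, loc, words = obj
        proximity = max(0.0, 1.0 - math.dist(loc, q_loc) / max_dist)
        union = words | q_words
        relevance = len(words & q_words) / len(union) if union else 0.0
        return alpha * proximity + (1 - alpha) * relevance

    return [obj_id for obj_id, _, _ in heapq.nlargest(k, objects, key=score)]


objects = [
    ("cafe", (0.1, 0.1), {"coffee", "wifi"}),
    ("bar", (0.05, 0.0), {"beer"}),
    ("library", (0.9, 0.9), {"wifi", "books"}),
]
print(topk_spatial_keyword(objects, (0.0, 0.0), {"coffee", "wifi"}, k=2, max_dist=2.0))
# ['cafe', 'bar']
```

Setting `alpha=1.0` reduces the query to a pure k-nearest-neighbor search, and `alpha=0.0` to pure keyword ranking; a real system would replace the linear scan with a spatial-textual index.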

chapter

Mutually Beneficial Confluent Routing

Xinpeng Zhang, Yasuhito Asano, Masatoshi Yoshikawa

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 43-44

chapter

Conflict-Aware Weighted Bipartite b-Matching and Its Application to E-Commerce

Cheng Chen, Lan Zheng, Venkatesh Srinivasan, Alex Thomo, et al.

2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 41-42
