Data sketching

[thumbnail of WRAP-Data-sketching-Cormode-2017.pdf]
Preview
PDF
WRAP-Data-sketching-Cormode-2017.pdf - Accepted Version - Requires a PDF viewer.

Download (508kB) | Preview

Request Changes to record.

Abstract

Do you ever feel overwhelmed by a constant stream of information? It can seem like there is a barrage of new email and text messages arriving, phone calls, articles to read, and knocks on the door. Putting these pieces together to keep track of what’s important can be a real challenge.

The same information overload is of concern in many computational settings. For example, telecommunications companies want to keep track of the activity on their network, to identify the overall network health, and spot anomalies or changes in behavior. Yet, the scale of events occurring is huge: many millions of network events per hour, per network element. And while new technologies allow the scale and granularity of events being monitored to increase by orders
of magnitude, the capacity of computing elements to make sense of these (processors, memory and disks) is barely increasing. Even on a small scale, the amount of information may be too large
to store in an impoverished setting (say, an embedded device), or be too big to conveniently keep in fast storage.

In response to this challenge, the model of streaming data processing has grown in popularity. In this setting, the aim is no longer to capture, store and index every minute event, but rather to quickly process each observation to create some summary of the current state. Following its processing, an event is dropped, and is no longer accessible. The summary that is retained is often referred to as sketch of the data. Coping with the vast scale of information means making a number of compromises: the description of the world is approximate, rather than exact; the nature of queries to be answered must be decided in advance, rather than after the fact; and some questions are now insoluble. However, the ability to process vast quantities of data at blinding speeds with modest resources can more than make up for these limitations. As a consequence, streaming methods have been adopted in a number of domains, starting with telecommunications, but spreading to search engines, social networks, finance, and time-series analysis. These ideas are also finding application in areas where traditional approaches are applicable, but the rough and ready sketching approach is more cost-effective. Successful applications of sketching involve a mixture of algorithmic tricks, systems know-how, and mathematical insight, and have led to new research contributions in each of these areas.

In this article, we introduce the ideas behind, and applications, of sketching, with a focus on the algorithmic innovations. That is, we describe some algorithmic developments in the abstract, and then indicate the subsequent steps needed to put them into practice, with examples. We will see four novel algorithmic ideas, and discuss some emerging areas.

Item Type: Journal Article
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): Big data, Data mining, Data flow computing, Computer algorithms
Journal or Publication Title: Communications of the ACM
Publisher: Association for Computing Machinery, Inc.
ISSN: 0001-0782
Official Date: September 2017
Dates:
Date
Event
September 2017
Published
14 July 2017
Accepted
Volume: 60
Number: 9
Page Range: pp. 48-55
DOI: 10.1145/3080008
Status: Peer Reviewed
Publication Status: Published
Access rights to Published version: Restricted or Subscription Access
Date of first compliant deposit: 18 September 2017
Date of first compliant Open Access: 19 September 2017
Funder: European Research Council (ERC), Engineering and Physical Sciences Research Council (EPSRC), Royal Society (Great Britain). Wolfson Research Merit Award (RSWRMA)
Grant number: ERC-2014-CoG 647557 (ERC), EP/N510129/1 (EPSRC), EP/N012380/1 (EPSRC)
Persistent URL: https://wrap.warwick.ac.uk/92275/

Export / Share Citation


Request changes or add full text files to a record

Repository staff actions (login required)

View Item View Item