AlgoUK Workshop at Warwick

On 17-18 September, we will be hosting the AlgoUK workshop at Warwick. AlgoUK is a two-day national workshop, combining a UK theory day with an additional one-day workshop focusing on applications and applied areas relevant to algorithms and complexity. The talks will take place in MS.02

More details are available on the event web page: https://algouk.wixsite.com/warwick2019

Conference Papers for summer 2019

The following papers have been accepted for publication in 2019:

Independent sets in vertex-arrival streams (G. Cormode, J. Dark, and C. Konrad). In International Colloquium on Automata, Languages and Programming (ICALP), 2019.

We consider the maximal and maximum independent set problems in three models of graph streams:
In the edge model we see a stream of edges which collectively define a graph; this model is well-studied for a variety of problems. We show that the space complexity for a one-pass streaming algorithm to find a maximal independent set is quadratic (i.e. we must store all edges). We further show that it is not much easier if we only require approximate maximality. This contrasts strongly with the other two vertex-based models, where one can greedily find an exact solution in only the space needed to store the independent set.
In the “explicit” vertex model, the input stream is a sequence of vertices making up the graph. Every vertex arrives along with its incident edges that connect to previously arrived vertices. Various graph problems require substantially less space to solve in this setting than in edge-arrival streams. We show that every one-pass c-approximation streaming algorithm for maximum independent set (MIS) on explicit vertex streams requires Ω((n2)/(c6)) bits of space, where n is the number of vertices of the input graph. It is already known that Θ~((n2)/(c2)) bits of space are necessary and sufficient in the edge arrival model (Halldórsson et al. 2012), thus the MIS problem is not significantly easier to solve under the explicit vertex arrival order assumption. Our result is proved via a reduction from a new multi-party communication problem closely related to pointer jumping.
In the “implicit” vertex model, the input stream consists of a sequence of objects, one per vertex. The algorithm is equipped with a function that maps pairs of objects to the presence or absence of edges, thus defining the graph. This model captures, for example, geometric intersection graphs such as unit disc graphs. Our final set of results consists of several improved upper and lower bounds for interval and square intersection graphs, in both explicit and implicit streams. In particular, we show a gap between the hardness of the explicit and implicit vertex models for interval graphs.

Answering range queries under local differential privacy (G. Cormode, T. Kulkarni, and D. Srivastava). In International Conference on Very Large Data Bases (VLDB), 2019

Counting the fraction of a population having an input within a specified interval i.e. a range query, is a fundamental data analysis primitive. Range queries can also be used to compute other core statistics such as quantiles, and to build prediction models. However, frequently the data is subject to privacy concerns when it is drawn from individuals, and relates for example to their financial, health, religious or political status. In this paper, we introduce and analyze methods to support range queries under the local variant of differential privacy, an emerging standard for privacy-preserving data analysis.

The local model requires that each user releases a noisy view of her private data under a privacy guarantee. While many works address the problem of range queries in the trusted aggregator setting, this problem has not been addressed specifically under untrusted aggregation (local DP) model even though many primitives have been developed recently for estimating a discrete distribution. We describe and analyze two classes of approaches for range queries, based on hierarchical histograms and the Haar wavelet transform. We show that both have strong theoretical accuracy guarantees on variance. In practice, both methods are fast and require minimal computation and communication resources. Our experiments show that the wavelet approach is most accurate in high privacy settings, while the hierarchical approach dominates for weaker privacy requirements.

Streaming algorithms for bin packing and vector scheduling (G. Cormode and P. Veselý). In Workshop on Approximation and Online Algorithms, 2019.

Problems involving the efficient arrangement of simple objects, as captured by bin packing and makespan scheduling, are fundamental tasks in combinatorial optimization. These are well understood in the traditional online and offline cases, but have been less well-studied when the volume of the input is truly massive, and cannot even be read into memory. This is captured by the streaming model of computation, where the aim is to approximate the cost of the solution in one pass over the data, using small space. As a result, streaming algorithms produce concise input summaries that approximately preserve the optimum value.

We design the first efficient streaming algorithms for these fundamental problems in combinatorial optimization. For Bin Packing, we provide a streaming asymptotic 1+-approximation with O~(1/ε) memory, where O~ hides logarithmic factors. Moreover, such a space bound is essentially optimal. Our algorithm implies a streaming d+ε-approximation for Vector Bin Packing in d dimensions, running in space O~((d)/(ε)). For the related Vector Scheduling problem, we show how to construct an input summary in space O~(d2·m / ε2) that preserves the optimum value up to a factor of 2 – (1)/(m) +ε, where m is the number of identical machines.

 Efficient interactive proofs for linear algebra (G. Cormode and C. Hickey). In Proceedings of International Symposium on Algorithms and Computation (ISAAC), 2019.

Motivated by the growth in outsourced data analysis, we describe methods for verifying basic linear algebra operations performed by a cloud service without having to recalculate the entire result. We provide novel protocols in the streaming setting for inner product, matrix multiplication and vector-matrix-vector multiplication where the number of rounds of interaction can be adjusted to tradeoff space, communication, and duration of the protocol. Previous work suggests that the costs of these interactive protocols are optimized by choosing O(logn) rounds. However, we argue that we can reduce the number of rounds without incurring a significant time penalty by considering the total end-to-end time, so fewer rounds and larger messages are preferable. We confirm this claim with an experimental study that shows that a constant number of rounds gives the fastest protocol.

Towards a theory of parameterized streaming algorithms (R. Chitnis and G. Cormode). In International Symposium on Parameterized and Exact Computation, 2019.

Parameterized complexity attempts to give a more fine-grained analysis of the complexity of problems: instead of measuring the running time as a function of only the input size, we analyze the running time with respect to additional parameters. This approach has proven to be highly successful in delineating our understanding of NP-hard problems. Given this success with the TIME resource, it seems but natural to use this approach for dealing with the SPACE resource. First attempts in this direction have considered a few individual problems, with some success: Fafianie and Kratsch [MFCS’14] and Chitnis et al. [SODA’15] introduced the notions of streaming kernels and parameterized streaming algorithms respectively. For example, the latter shows how to refine the Ω(n2) bit lower bound for finding a minimum Vertex Cover (VC) in the streaming setting by designing an algorithm for the parameterized k-VC problem which uses O(k2logn) bits. In this paper, we initiate a systematic study of graph problems from the paradigm of parameterized streaming algorithms. We first define a natural hierarchy of space complexity classes of FPS, SubPS, SemiPS, SupPS and BPS, and then obtain tight classifications for several well-studied graph problems such as Longest Path, Feedback Vertex Set, Dominating Set, Girth, Treewidth, etc. into this hierarchy. On the algorithmic side, our parameterized streaming algorithms use techniques from the FPT world such as bidimensionality, iterative compression and bounded-depth search trees. On the hardness side, we obtain lower bounds for the parameterized streaming complexity of various problems via novel reductions from problems in communication complexity. We also show a general (unconditional) lower bound for space complexity of parameterized streaming algorithms for a large class of problems inspired by the recently developed frameworks for showing (conditional) kernelization lower bounds. Parameterized algorithms and streaming algorithms are approaches to cope with TIME and SPACE intractability respectively. It is our hope that this work on parameterized streaming algorithms leads to two-way flow of ideas between these two previously separated areas of theoretical computer science.

 

Funded PhD project in Data Privacy at the University of Warwick (UK applicants)

A fully-funded PhD project in Data Privacy is available to be held at the University of Warwick, under the supervision of Prof. Graham Cormode and Prof. Carsten Maple.

The central focus of the project is on systems which adopt a statistical model of privacy, namely Differential Privacy and its Local variant.  Local differential privacy (LDP) has been adopted by several major technology companies, leading to the technology being used by hundreds of millions users daily.   The project will address questions including:

  • As more data is released via implementations of local differential privacy, what are the security guarantees in practice?
  • What statistics and models can be computed accurately under LDP, and what the tradeoffs are between the various parameters: the number of participants, the type of interaction, the accuracy of the results, and the level of privacy guaranteed?
  • What is the utility of the model in new settings – for example, data arising in the mobile vehicular context?

The project is funded by the National Cybersecurity Centre (NCSC), and involves collaboration with Thales in the UK (led by Peter Davies).  Suitable applicants will have a strong background in mathematics or statistics, and good knowledge of algorithms and computer science.  The project will cover all tuition fees, and provide a stipend at the standard UKRI level.  Due to the requirements of the sponsor, this studentship is only available to applicants with UK citizenship.  Additional opportunities with the NCSC will be available during and after the completion of the project.

Applications, in the form of a CV and brief covering letter detailing background, confirmation of citizenship, and preparation of the applicant, can be sent to g.cormode@warwick.ac.uk and cm@warwick.ac.uk by May 15th 2019 (extended deadline).  Shortlisted applicants will be contacted for a follow-up discussion by phone or skype.

 

New Postdoctoral Researchers

Two new researchers have joined the ERC project “Small Summaries for Big Data” as a postdoctoral research fellow, under the supervision of Professor Graham Cormode.

pvPavel Vesely works on problems related to scheduling and bin packing, and will work on these questions in the context of streaming data.  Prior to coming to Warwick, he was a PhD student at Computer Science Institute of Charles University in Prague, where his adviser was professor Jiří Sgall.

 

Michael_Shekelyan

 

Michael Shekelyan works on data summarization, including histograms, sampling, and other randomized summaries.  Before coming to Warwick, Michael completed his PhD at the Faculty of Computer Science, Free University of Bozen-Bolzano, Italy.

Keynote talk at PODC

Invited keynote talk at Principles of Distributed Computation (PODC) 2018 in July 2018 on “Data summarization and distributed computation

The notion of summarization is to provide a compact representation of data which approximately captures its essential characteristics. If such summaries can be created, they can lead to efficient distributed algorithms which exchange summaries in order to compute a desired function. In this talk, I’ll describe recent efforts in this direction for problems inspired by machine learning: building graphical models over evolving, distributed training examples, and solving robust regression problems over large, distributed data sets.

Tutorials on Local Differential Privacy

A tutorial on local differential privacy, “Privacy at scale: Local differential privacy in practice” prepared by G. Cormode, S. Jha, T. Kulkarni, N. Li, D. Srivastava, and T. Wang, will be presented at the ACM SIGMOD and SIGKDD conferences this summer.  SIGMOD takes place in Houston, TX in June, and KDD will be in London, UK in August.

Local differential privacy (LDP), where users randomly perturb their inputs to provide plausible deniability of their data without the need for a trusted party, has been adopted recently by several major technology organizations, including Google, Apple and Microsoft. This tutorial aims to introduce the key technical underpinnings of these deployed systems, to survey current research that addresses related problems within the LDP model, and to identify relevant open problems and research directions for the community.

Draft slides are available, and video recordings should be available after they are presented.

DAPPER Project report

The ‘DAPPER’ Research project (Marie Curie Career Integration Grant 618202) has recently concluded its four year duration.  Here is the final public summary of the activity:

DAPPER: Delivering Anonymization Practically, Privately, Effectively and Reusably

Introduction. There is currently a tug-of-war going on surrounding data releases.  On one side, there are many strong reasons pulling to release data to other parties: business factors, freedom of information rules, and scientific sharing agreements.  On the other side, concerns about individual privacy pull back, and seek to limit releases.  Privacy technologies such as differential privacy have been proposed to resolve this deadlock, and there has been much study of how to perform private data release of data in various forms.  The focus of such works has been largely on the data owner: what process should they apply to ensure that the released data preserves privacy whilst still capturing the input data distribution accurately. Less attention has been paid to the needs of the data user, who wants to make use of the released data within their existing suite of tools and data. The difficulty of making use of data releases is a major stumbling block for the widespread adoption of data privacy technologies.

This Marie Curie career integration project considered the whole data release process, from the data owner to the data user.  It laid out a set of principles for privacy tool design that highlight the requirements for interoperability, extensibility and scalability. The aim of the project was in Delivering Anonymization Practically, Privately, Effectively and Reusably (DAPPER).  It produced published results under the following four themes:

  • Synthetic Private Data. New methods were developed for providing synthetic data in the form of (social) networks, based on anonymized versions of real data under the strong privacy guarantee of differential privacy [SIGMOD16].  The fellow also proposed a new privacy definition called personalized differential privacy (PDP), a generalization of differential privacy in which users specify a personal privacy requirement for their data, and introduced several novel mechanisms for achieving PDP [ICDE15].
  • Correlated Data Modelling. Many analysis and machine learning tasks require the availability of marginal statistics on multidimensional datasets while providing strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. The fellow provided a set of algorithms for materializing marginal statistics under the strong model of local differential privacy [SIGMOD18], as well as developing PrivBayes, a differentially private method for releasing high-dimensional data [SIGMOD14,TODS17].
  • Data Utility Enhancement. The fellow worked on the core problem of count queries, and designed randomized mechanisms to release count data associated with a group of n individuals [ICDE18]. The fellow also gave new algorithms to provide statistical information about graphs based on a ‘ladder’ distribution [SIGMOD15].
  • Trajectory Data. The fellow presented DPT, a system to synthesize mobility data based on raw GPS trajectories of individuals while ensuring strong privacy protection in the form of ε-differential privacy [VLDB15].

The work of the fellow has had real impact on the state of the art: the fellow’s work is used within the private data collection software developed by Apple and deployed to millions of iOS and MacOS users around the world.  The fellow is now a Professor at the University of Warwick in the UK, and leads a group of researchers working on topics related to privacy and data analysis.

Website and Contact Details.  Activities and news about the project are posted by the fellow at the website https://gcormode.wordpress.com/.

Privacy Work at SIGMOD

A paper on marginal release under the model of local differential privacy has been accepted for presentation at the SIGMOD conference.  A related tutorial will also be presented on the model of Local Differential Privacy (LDP).  Details and links are as follows:

Marginal release under local differential privacy (Cormode, Kulkarni, Srivastava)

Many analysis and machine learning tasks require the availability of marginal statistics on multidimensional datasets while providing strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. We prove the first tight theoretical bounds on the accuracy of marginals compiled under each approach, perform empirical evaluation to confirm these bounds, and evaluate them for tasks such as modeling and correlation testing. Our results show that releasing information based on (local) Fourier transformations of the input is preferable to alternatives based directly on (local) marginals.

 Privacy at scale: Local differential privacy in practice (G. Cormode, S. Jha, T. Kulkarni, N. Li, D. Srivastava, and T. Wang). Tutorial at SIGMOD 2018 and KDD 2018

Local differential privacy (LDP), where users randomly perturb their inputs to provide plausible deniability of their data without the need for a trusted party, has been adopted recently by several major technology organizations, including Google, Apple and Microsoft. This tutorial aims to introduce the key technical underpinnings of these deployed systems, to survey current research that addresses related problems within the LDP model, and to identify relevant open problems and research directions for the community.

Workshop on Algorithms for data summarization, March 2018

There is a workshop on algorithms for data summarization (streaming, sampling, sketching, property testing, sublinear algorithms and loosely related topics beyond) to be held at the University of Warwick, UK during March 2018 (19th – 22nd), organized by Graham Cormode and Artur Czumaj. The format of the workshop will be talks from experts, with plenty of opportunities for discussion and collaboration. Funding supporting the workshop is from European Research Council and UK EPSRC.

For more information, please contact the organizers.

Postdoc position in Algorithms Research (closes February 14th 2018)

We are seeking to recruit a postdoctoral research fellow to work in the area of designing algorithms for analysing large data sets.

You will be expected to perform high quality research under the supervision of Professor Graham Cormode, as part of the ERC funded project ‘Small Summaries for Big Data’. This can encompass streaming algorithms, sketching and dimensionality reduction, distributed monitoring and mergeable summaries, verification of outsourced computation, or other related topics. The expectation is that you will produce breakthrough research results in the summarisation of large volumes of data, and publish these results in top rated venues.

You will possess a PhD or an equivalent qualification in Computer Science or a very closely-related discipline (or you will shortly be obtaining it). You should have a strong background in one or more of the following areas: randomized and approximation algorithms; communication complexity and lower bounds; streaming or sublinear algorithms.

The post is based in the Department of Computer Science at University of Warwick, but collaborations with closely related research organisations such as the Centre for Discrete Mathematics and its Applications (DIMAP), the Warwick Institute for the Science of Cities (WISC); and the newly formed Alan Turing Institute (ATI) will be strongly encouraged. You will join a team of researchers led by Professor Cormode including other postdoctoral researchers and PhD students.

Candidates should provide with their application form a CV, a list of publications and a research statement.

Closing date: 14th February 2018

More information: https://atsv7.wcn.co.uk/search_engine/jobs.cgi?owner=5062452&ownertype=fair&jcode=1710356&vt_template=1457&adminview=1