Keynote talk at PODC

Invited keynote talk at Principles of Distributed Computing (PODC), July 2018, on “Data summarization and distributed computation”.

The goal of summarization is to provide a compact representation of data that approximately captures its essential characteristics. If such summaries can be created, they can lead to efficient distributed algorithms that exchange summaries in order to compute a desired function. In this talk, I’ll describe recent efforts in this direction for problems inspired by machine learning: building graphical models over evolving, distributed training examples, and solving robust regression problems over large, distributed data sets.

Keynote at Symposium on Experimental Algorithms

Engineering streaming algorithms, June 2017.
Invited talk at Symposium on Experimental Algorithms.

Streaming algorithms must process a large quantity of small updates quickly to allow queries about the input to be answered from a small summary. Initial work on streaming algorithms laid out theoretical results, and subsequent efforts have involved engineering these for practical use. Informed by experiments, streaming algorithms have been widely implemented and used in practice. This talk will survey this line of work, and identify some lessons learned.
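
As a small illustration of the update-and-query pattern described above, here is a minimal Python sketch of a Count-Min sketch, one well-known streaming summary. The choice of this particular summary, the class and method names, the default sizes, and the hashing scheme are all assumptions made for illustration; the abstract does not say which algorithms the talk covers.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency summary (illustrative; names and defaults are assumptions)."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        # depth rows of width counters; size is independent of the stream length
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash function per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        # Each small update touches exactly one counter per row.
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += count

    def estimate(self, item):
        # With non-negative updates, every row overestimates, so take the minimum.
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))


sketch = CountMinSketch()
for word in ["a", "b", "a", "c", "a"]:
    sketch.update(word)
print(sketch.estimate("a"))  # never below the true count of 3
```

The summary occupies width × depth counters regardless of how many updates arrive, and with non-negative updates each estimate can only err by overcounting when items collide.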