skip to main content
research-article

Frequency estimation under multiparty differential privacy: one-shot and streaming

Published: 01 June 2022 Publication History
  • Get Citation Alerts
  • Abstract

    We study the fundamental problem of frequency estimation under both privacy and communication constraints, where the data is distributed among k parties. We consider two application scenarios: (1) one-shot, where the data is static and the aggregator conducts a one-time computation; and (2) streaming, where each party receives a stream of items over time and the aggregator continuously monitors the frequencies. We adopt the model of multiparty differential privacy (MDP), which is more general than local differential privacy (LDP) and (centralized) differential privacy. Our protocols achieve optimality (up to logarithmic factors) permissible by the more stringent of the two constraints. In particular, when specialized to the ε-LDP model, our protocol achieves an error of √k/(εΘ(ε) − 1) using O(k max{ε, log 1/ε}) bits of communication and O(k log u) bits of public randomness, where u is the size of the domain.

    References

    [1]
    J. Acharya and Z. Sun. Communication complexity in locally private distribution estimation and heavy hitters. In International Conference on Machine Learning, pages 51--60. PMLR, 2019.
    [2]
    J. Acharya, Z. Sun, and H. Zhang. Hadamard response: Estimating distributions privately, efficiently, and with little communication. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1120--1129, 2019.
    [3]
    N. Agarwal, A. T. Suresh, F. X. X. Yu, S. Kumar, and B. McMahan. cpsgd: Communication-efficient and differentially-private distributed sgd. In Advances in Neural Information Processing Systems, pages 7564--7575, 2018.
    [4]
    Apple. Apple differential privacy technical overview. https://www.apple.com/privacy/docs/Differential_Privacy_Overview.pdf, 2017. [Last accessed on 5-June-2022].
    [5]
    B. Balle, J. Bell, A. Gascón, and K. Nissim. The privacy blanket of the shuffle model. In Annual International Cryptology Conference, pages 638--667. Springer, 2019.
    [6]
    B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 273--282. ACM, 2007.
    [7]
    R. Bassily, K. Nissim, U. Stemmer, and A. Thakurta. Practical locally private heavy hitters. In Neural Information Processing Systems, pages 2285--2293, 2017.
    [8]
    R. Bassily, K. Nissim, U. Stemmer, and A. Thakurta. Practical locally private heavy hitters. J. Mach. Learn. Res., 21:16:1--16:42, 2020.
    [9]
    R. Bassily and A. Smith. Local, private, efficient protocols for succinct histograms. In ACM STOC, pages 127--135, 2015.
    [10]
    A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the sulq framework. In ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 128--138, 2005.
    [11]
    A. Chakrabarti. Data stream algorithms. Computer Science, 49:149, 2015.
    [12]
    T. H. Chan, E. Shi, and D. Song. Optimal lower bound for differentially private multi-party aggregation. In European Symposium on Algorithms, pages 277--288. Springer, 2012.
    [13]
    T.-H. H. Chan, M. Li, E. Shi, and W. Xu. Differentially private continual monitoring of heavy hitters from distributed streams. In International Symposium on Privacy Enhancing Technologies Symposium, pages 140--159. Springer, 2012.
    [14]
    T.-H. H. Chan, E. Shi, and D. Song. Private and continual release of statistics. ACM Transactions on Information and System Security, 2011.
    [15]
    T.-H. H. Chan, E. Shi, and D. Song. Private and continual release of statistics. ACM Transactions on Information and System Security (TISSEC), 14(3):1--24, 2011.
    [16]
    M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In International Colloquium on Automata, Languages, and Programming, pages 693--703. Springer, 2002.
    [17]
    W.-N. Chen, P. Kairouz, and A. Özgür. Breaking the communication-privacy-accuracy trilemma. In Advances in Neural Information Processing Systems, 2020.
    [18]
    G. Cormode and M. Hadjieleftheriou. Methods for finding frequent items in data streams. The VLDB Journal, 19(1):3--20, 2010.
    [19]
    G. Cormode, S. Jha, T. Kulkarni, N. Li, D. Srivastava, and T. Wang. Privacy at scale: Local differential privacy in practice. In ACM International Conference on Management of Data, pages 1655--1658, 2018.
    [20]
    G. Cormode, T. Kulkarni, and D. Srivastava. Answering range queries under local differential privacy. Proceedings of the VLDB Endowment, 12(10):1126--1138, 2019.
    [21]
    G. Cormode, S. Maddock, and C. Maple. Frequency estimation under local differential privacy [experiments, analysis and benchmarks]. arXiv preprint arXiv:2103.16640, 2021.
    [22]
    G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58--75, 2005.
    [23]
    G. Cormode, S. Muthukrishnan, and K. Yi. Algorithms for distributed functional monitoring. ACM Transactions on Algorithms, 7:21, 03 2011.
    [24]
    G. Cormode, S. Muthukrishnan, K. Yi, and Q. Zhang. Continuous sampling from distributed streams. Journal of The ACM, 59:1--25, 04 2012.
    [25]
    G. Cormode, C. M. Procopiuc, D. Srivastava, and T. T. L. Tran. Differentially private summaries for sparse data. In International Conference on Database Theory, pages 299--311, 2012.
    [26]
    G. Cormode and K. Yi. Small Summaries for Big Data. Cambridge University Press, 2020.
    [27]
    B. Ding, J. Kulkarni, and S. Yekhanin. Collecting telemetry data privately. In Advances in Neural Information Processing Systems, pages 3571--3580, 2017.
    [28]
    C. Dwork, M. Naor, T. Pitassi, and G. N. Rothblum. Differential privacy under continual observation. In ACM STOC, pages 715--724, 2010.
    [29]
    C. Dwork, A. Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3-4):211--407, 2014.
    [30]
    Ú. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, K. Talwar, and A. Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In ACM-SIAM Symposium on Discrete Algorithms, pages 2468--2479, 2019.
    [31]
    Ú. Erlingsson, V. Pihur, and A. Korolova. RAPPOR: randomized aggregatable privacy-preserving ordinal response. In ACM SIGSAC Conference on Computer and Communications Security, pages 1054--1067. ACM, 2014.
    [32]
    V. Feldman and K. Talwar. Lossless compression of efficient private local randomizers. In International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 3208--3219, 18-24 Jul 2021.
    [33]
    B. Ghazi, N. Golowich, R. Kumar, R. Pagh, and A. Velingker. On the power of multiple anonymous messages. arXiv preprint arXiv:1908.11358, 2019.
    [34]
    B. Ghazi, R. Kumar, P. Manurangsi, and R. Pagh. Private counting from anonymous messages: Near-optimal accuracy with vanishing communication overhead. In International Conference on Machine Learning, pages 3505--3514. PMLR, 2020.
    [35]
    A. Ghosh, T. Roughgarden, and M. Sundararajan. Universally utility-maximizing privacy mechanisms. SIAM Journal on Computing, 41(6):1673--1693, 2012.
    [36]
    Z. Huang, Y. Qiu, K. Yi, and G. Cormode. Frequency estimation under multiparty differential privacy: One-shot and streaming. arXiv preprint arXiv:2104.01808, 2021.
    [37]
    Z. Huang and K. Yi. The communication complexity of distributed epsilon-approximations. SIAM Journal on Computing, 46(4):1370--1394, 2017.
    [38]
    P. Kairouz and et al. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977, 2019.
    [39]
    P. Kairouz, S. Oh, and P. Viswanath. Extremal mechanisms for local differential privacy. Advances in neural information processing systems, 27:2879--2887, 2014.
    [40]
    A. Machanavajjhala, D. Kifer, J. M. Abowd, J. Gehrke, and L. Vilhuber. Privacy: Theory meets practice on the map. In IEEE International Conference on Data Engineering, pages 277--286. IEEE Computer Society, 2008.
    [41]
    A. McGregor, I. Mironov, T. Pitassi, O. Reingold, K. Talwar, and S. Vadhan. The limits of two-party differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 81--90. IEEE, 2010.
    [42]
    J. Misra and D. Gries. Finding repeated elements. Science of computer programming, 2(2):143--152, 1982.
    [43]
    T. T. Nguyěn, X. Xiao, Y. Yang, S. C. Hui, H. Shin, and J. Shin. Collecting and analyzing data from smart device users with local differential privacy. CoRR, abs/1606.05053, 2016.
    [44]
    M. Pathak, S. Rane, and B. Raj. Multiparty differential privacy via aggregation of locally trained classifiers. In Advances in Neural Information Processing Systems, pages 1876--1884, 2010.
    [45]
    E. Shi, T.-H. Chan, E. Rieffel, and D. Song. Distributed private data analysis: Lower bounds and practical constructions. ACM Transactions on Algorithms, 13:1--38, 12 2017.
    [46]
    J. Upadhyay. Sublinear space private algorithms under the sliding window model. In International Conference on Machine Learning, pages 6363--6372. PMLR, 2019.
    [47]
    S. Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, pages 347--450. Springer, 2017.
    [48]
    L. Wang, G. Luo, K. Yi, and G. Cormode. Quantiles over data streams: an experimental study. In ACM SIGMOD International Conference on Management of Data, pages 737--748, 2013.
    [49]
    N. Wang, X. Xiao, Y. Yang, T. D. Hoang, H. Shin, J. Shin, and G. Yu. Privtrie: Effective frequent term discovery under local differential privacy. In IEEE International Conference on Data Engineering, pages 821--832, 2018.
    [50]
    T. Wang, J. Blocki, N. Li, and S. Jha. Locally differentially private protocols for frequency estimation. In 26th {USENIX} Security Symposium ({USENIX} Security 17), pages 729--745, 2017.
    [51]
    T. Wang, X. Zhang, J. Feng, and X. Yang. A comprehensive survey on local differential privacy toward data statistics and analysis in crowdsensing. CoRR, abs/2010.05253, 2020.
    [52]
    X. Xiao, G. Wang, and J. Gehrke. Differential privacy via wavelet transforms. IEEE Trans. Knowl. Data Eng., 23(8):1200--1214, 2011.
    [53]
    M. Yang, L. Lyu, J. Zhao, T. Zhu, and K. Lam. Local differential privacy and its applications: A comprehensive survey. CoRR, abs/2008.03686, 2020.
    [54]
    D. Zhang, R. McKenna, I. Kotsogiannis, G. Bissias, M. Hay, A. Machanavajjhala, and G. Miklau. ϵKTELO: A framework for defining differentially private computations. ACM Trans. Database Syst., 45(1):2:1--2:44, 2020.
    [55]
    J. Zhang, G. Cormode, C. M. Procopiuc, D. Srivastava, and X. Xiao. Privbayes: Private data release via bayesian networks. ACM Trans. Database Syst., 42(4):25:1--25:41, 2017.

    Cited By

    View all
    • (2023)Privacy amplification via compressionProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3669152(69202-69227)Online publication date: 10-Dec-2023
    • (2023)A smooth binary mechanism for efficient private continual observationProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3668258(49133-49145)Online publication date: 10-Dec-2023
    • (2023)Falcon: A Privacy-Preserving and Interpretable Vertical Federated Learning SystemProceedings of the VLDB Endowment10.14778/3603581.360358816:10(2471-2484)Online publication date: 1-Jun-2023
    • Show More Cited By

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 15, Issue 10
    June 2022
    319 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 June 2022
    Published in PVLDB Volume 15, Issue 10

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)8
    • Downloads (Last 6 weeks)0

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Privacy amplification via compressionProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3669152(69202-69227)Online publication date: 10-Dec-2023
    • (2023)A smooth binary mechanism for efficient private continual observationProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3668258(49133-49145)Online publication date: 10-Dec-2023
    • (2023)Falcon: A Privacy-Preserving and Interpretable Vertical Federated Learning SystemProceedings of the VLDB Endowment10.14778/3603581.360358816:10(2471-2484)Online publication date: 1-Jun-2023
    • (2023)Efficient and Secure Quantile Aggregation of Private Data StreamsIEEE Transactions on Information Forensics and Security10.1109/TIFS.2023.327277518(3058-3073)Online publication date: 1-Jan-2023
    • (2022)Improved utility analysis of private CountSketchProceedings of the 36th International Conference on Neural Information Processing Systems10.5555/3600270.3602128(25631-25643)Online publication date: 28-Nov-2022

    View Options

    Get Access

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media