Showing 1–12 of 12 results for author: Gan, E

  1. arXiv:2404.17768  [pdf, other]

    cs.LG cs.AI cs.CV

    Make the Most of Your Data: Changing the Training Data Distribution to Improve In-distribution Generalization Performance

    Authors: Dang Nguyen, Paymon Haddad, Eric Gan, Baharan Mirzasoleiman

    Abstract: Can we modify the training data distribution to encourage the underlying optimization method toward finding solutions with superior generalization performance on in-distribution data? In this work, we approach this question for the first time by comparing the inductive bias of gradient descent (GD) with that of sharpness-aware minimization (SAM). By studying a two-layer CNN, we prove that SAM lear…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 32 pages, 11 figures, 6 tables

  2. arXiv:2403.11391  [pdf, other]

    cs.LG cs.CV

    Investigating the Benefits of Projection Head for Representation Learning

    Authors: Yihao Xue, Eric Gan, Jiayi Ni, Siddharth Joshi, Baharan Mirzasoleiman

    Abstract: An effective technique for obtaining high-quality representations is adding a projection head on top of the encoder during training, then discarding it and using the pre-projection representations. Despite its proven practical effectiveness, the reason behind the success of this technique is poorly understood. The pre-projection representations are not directly optimized by the loss function, rais…

    Submitted 17 March, 2024; originally announced March 2024.

    Journal ref: ICLR 2024

  3. arXiv:2311.06839  [pdf, other]

    cs.LG cs.CR

    Inference and Interference: The Role of Clipping, Pruning and Loss Landscapes in Differentially Private Stochastic Gradient Descent

    Authors: Lauren Watson, Eric Gan, Mohan Dantam, Baharan Mirzasoleiman, Rik Sarkar

    Abstract: Differentially private stochastic gradient descent (DP-SGD) is known to have poorer training and test performance on large neural networks, compared to ordinary stochastic gradient descent (SGD). In this paper, we perform a detailed study and comparison of the two processes and unveil several new insights. By comparing the behavior of the two processes separately in early and late epochs, we find…

    Submitted 12 November, 2023; originally announced November 2023.

  4. arXiv:2305.18761  [pdf, other]

    cs.LG cs.CV

    Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias

    Authors: Yu Yang, Eric Gan, Gintare Karolina Dziugaite, Baharan Mirzasoleiman

    Abstract: Neural networks trained with (stochastic) gradient descent have an inductive bias towards learning simpler solutions. This makes them highly prone to learning spurious correlations in the training data, that may not hold at test time. In this work, we provide the first theoretical analysis of the effect of simplicity bias on learning spurious correlations. Notably, we show that examples with spuri…

    Submitted 6 March, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: 26 pages, 10 figures

    Journal ref: Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain. PMLR: Volume 238

  5. arXiv:2305.16536  [pdf, ps, other]

    cs.LG stat.ML

    Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression

    Authors: Yihao Xue, Siddharth Joshi, Eric Gan, Pin-Yu Chen, Baharan Mirzasoleiman

    Abstract: Contrastive learning (CL) has emerged as a powerful technique for representation learning, with or without label supervision. However, supervised CL is prone to collapsing representations of subclasses within a class by not capturing all their features, and unsupervised CL may suppress harder class-relevant features by focusing on learning easy class-irrelevant features; both significantly comprom…

    Submitted 28 May, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: to appear at ICML 2023

  6. arXiv:2305.08746  [pdf, other]

    cs.NE cond-mat.dis-nn cs.AI cs.LG math.RT q-bio.NC

    Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability

    Authors: Ziming Liu, Eric Gan, Max Tegmark

    Abstract: We introduce Brain-Inspired Modular Training (BIMT), a method for making neural networks more modular and interpretable. Inspired by brains, BIMT embeds neurons in a geometric space and augments the loss function with a cost proportional to the length of each neuron connection. We demonstrate that BIMT discovers useful modular neural networks for many simple tasks, revealing compositional structur…

    Submitted 6 June, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Codes are available here: https://github.com/KindXiaoming/BIMT

  7. arXiv:2004.00827  [pdf, other]

    cs.DB

    Approximate Selection with Guarantees using Proxies

    Authors: Daniel Kang, Edward Gan, Peter Bailis, Tatsunori Hashimoto, Matei Zaharia

    Abstract: Due to the falling costs of data acquisition and storage, researchers and industry analysts often want to find all instances of rare events in large datasets. For instance, scientists can cheaply capture thousands of hours of video, but are limited by the need to manually inspect long videos to identify relevant objects and events. To reduce this cost, recent work proposes to use cheap proxy model…

    Submitted 3 January, 2022; v1 submitted 2 April, 2020; originally announced April 2020.

    Journal ref: PVLDB 2020

  8. arXiv:2002.03063  [pdf, other]

    cs.DB

    Storyboard: Optimizing Precomputed Summaries for Aggregation

    Authors: Edward Gan, Peter Bailis, Moses Charikar

    Abstract: An emerging class of data systems partition their data and precompute approximate summaries (i.e., sketches and samples) for each segment to reduce query costs. They can then aggregate and combine the segment summaries to estimate results without scanning the raw data. However, given limited storage space each summary introduces approximation errors that affect query accuracy. For instance, system…

    Submitted 7 February, 2020; originally announced February 2020.

  9. arXiv:1905.02304  [pdf, other]

    cs.LG cs.DB stat.ML

    CrossTrainer: Practical Domain Adaptation with Loss Reweighting

    Authors: Justin Chen, Edward Gan, Kexin Rong, Sahaana Suri, Peter Bailis

    Abstract: Domain adaptation provides a powerful set of model training techniques given domain-specific training data and supplemental data with unknown relevance. The techniques are useful when users need to develop models with data from varying sources, of varying quality, or from different time ranges. We build CrossTrainer, a system for practical domain adaptation. CrossTrainer utilizes loss reweighting,…

    Submitted 6 May, 2019; originally announced May 2019.

  10. arXiv:1803.01969  [pdf, other]

    cs.DB

    Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries

    Authors: Edward Gan, Jialin Ding, Kai Sheng Tai, Vatsal Sharan, Peter Bailis

    Abstract: Interactive analytics increasingly involves querying for quantiles over sub-populations of high cardinality datasets. Data processing engines such as Druid and Spark use mergeable summaries to estimate quantiles, but summary merge times can be a bottleneck during aggregation. We show how a compact and efficiently mergeable quantile sketch can support aggregation workloads. This data structure, whi…

    Submitted 13 July, 2018; v1 submitted 5 March, 2018; originally announced March 2018.

    Comments: Technical Report for paper to be published in VLDB 2018

  11. arXiv:1603.00567  [pdf, other]

    cs.DB

    MacroBase: Prioritizing Attention in Fast Data

    Authors: Peter Bailis, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong, Sahaana Suri

    Abstract: As data volumes continue to rise, manual inspection is becoming increasingly untenable. In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables efficient, accurate, and modular analyses that highlight and aggregate important and unusual behavior, acting as a search engine for fast data. MacroBase is able to…

    Submitted 24 March, 2017; v1 submitted 1 March, 2016; originally announced March 2016.

    Comments: SIGMOD 2017

  12. Type Classes for Lightweight Substructural Types

    Authors: Edward Gan, Jesse A. Tov, Greg Morrisett

    Abstract: Linear and substructural types are powerful tools, but adding them to standard functional programming languages often means introducing extra annotations and typing machinery. We propose a lightweight substructural type system design that recasts the structural rules of weakening and contraction as type classes; we demonstrate this design in a prototype language, Clamp. Clamp supports polymorphi…

    Submitted 16 February, 2015; originally announced February 2015.

    Comments: In Proceedings LINEARITY 2014, arXiv:1502.04419

    ACM Class: D.3.3

    Journal ref: EPTCS 176, 2015, pp. 34-48