subscribe to arXiv mailings

arXiv:2406.19350 [pdf, other]

Complex Dynamics in Autobidding Systems

Authors: Renato Paes Leme, Georgios Piliouras, Jon Schneider, Kelly Spendlove, Song Zuo

Abstract: It has become the default in markets such as ad auctions for participants to bid in an auction through automated bidding agents (autobidders) which adjust bids over time to satisfy return-over-spend constraints. Despite the prominence of such systems for the internet economy, their resulting dynamical behavior is still not well understood. Although one might hope that such relatively simple system… ▽ More It has become the default in markets such as ad auctions for participants to bid in an auction through automated bidding agents (autobidders) which adjust bids over time to satisfy return-over-spend constraints. Despite the prominence of such systems for the internet economy, their resulting dynamical behavior is still not well understood. Although one might hope that such relatively simple systems would typically converge to the equilibria of their underlying auctions, we provide a plethora of results that show the emergence of complex behavior, such as bi-stability, periodic orbits and quasi periodicity. We empirically observe how the market structure (expressed as motifs) qualitatively affects the behavior of the dynamics. We complement it with theoretical results showing that autobidding systems can simulate both linear dynamical systems as well logical boolean gates. △ Less

Submitted 1 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.16694 [pdf, other]

Task Oriented In-Domain Data Augmentation

Authors: Xiao Liang, Xinyu Hu, Simiao Zuo, Yeyun Gong, Qiang Lou, Yi Liu, Shao-Lun Huang, Jian Jiao

Abstract: Large Language Models (LLMs) have shown superior performance in various applications and fields. To achieve better performance on specialized domains such as law and advertisement, LLMs are often continue pre-trained on in-domain data. However, existing approaches suffer from two major issues. First, in-domain data are scarce compared with general domain-agnostic data. Second, data used for contin… ▽ More Large Language Models (LLMs) have shown superior performance in various applications and fields. To achieve better performance on specialized domains such as law and advertisement, LLMs are often continue pre-trained on in-domain data. However, existing approaches suffer from two major issues. First, in-domain data are scarce compared with general domain-agnostic data. Second, data used for continual pre-training are not task-aware, such that they may not be helpful to downstream applications. We propose TRAIT, a task-oriented in-domain data augmentation framework. Our framework is divided into two parts: in-domain data selection and task-oriented synthetic passage generation. The data selection strategy identifies and selects a large amount of in-domain data from general corpora, and thus significantly enriches domain knowledge in the continual pre-training data. The synthetic passages contain guidance on how to use domain knowledge to answer questions about downstream tasks. By training on such passages, the model aligns with the need of downstream applications. We adapt LLMs to two domains: advertisement and math. On average, TRAIT improves LLM performance by 8% in the advertisement domain and 7.5% in the math domain. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.15740 [pdf, other]

The FRB-searching pipeline of the Tianlai Cylinder Pathfinder Array

Authors: Zijie Yu, Furen Deng, Shijie Sun, Chenhui Niu, Jixia Li, Fengquan Wu, Wei-Yang Wang, Yougang Wang, Shifan Zuo, Lin Shu, Jie Hao, Xiaohui Liu, Reza Ansari, Ue-Li Pen, Albert Stebbins, Peter Timbie, Xuelei Chen

Abstract: This paper presents the design, calibration, and survey strategy of the Fast Radio Burst (FRB) digital backend and its real-time data processing pipeline employed in the Tianlai Cylinder Pathfinder array. The array, consisting of three parallel cylindrical reflectors and equipped with 96 dual-polarization feeds, is a radio interferometer array designed for conducting drift scans of the northern ce… ▽ More This paper presents the design, calibration, and survey strategy of the Fast Radio Burst (FRB) digital backend and its real-time data processing pipeline employed in the Tianlai Cylinder Pathfinder array. The array, consisting of three parallel cylindrical reflectors and equipped with 96 dual-polarization feeds, is a radio interferometer array designed for conducting drift scans of the northern celestial semi-sphere. The FRB digital backend enables the formation of 96 digital beams, effectively covering an area of approximately 40 square degrees with 3 dB beam. Our pipeline demonstrates the capability to make automatic search of FRBs, detecting at quasi-real-time and classify FRB candidates automatically. The current FRB searching pipeline has an overall recall rate of 88\%. During the commissioning phase, we successfully detected signals emitted by four well-known pulsars: PSR B0329+54, B2021+51, B0823+26, and B2020+28. We report the first discovery of an FRB by our array, designated as FRB 20220414A. We also investigate the optimal arrangement for the digitally formed beams to achieve maximum detection rate by numerical simulation. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: 27 pages, 21 figures, 7 tables, RAA accepted

arXiv:2406.11409 [pdf, other]

CodeGemma: Open Code Models Based on Gemma

Authors: CodeGemma Team, Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, Siqi Zuo, Andrea Hu, Christopher A. Choquette-Choo, Jingyue Shen, Joe Kelley, Kshitij Bansal, Luke Vilnis, Mateo Wirth, Paul Michel, Peter Choy, Pratik Joshi, Ravin Kumar, Sarmad Hashmi, Shubham Agrawal, Zhitao Gong, Jane Fine, Tris Warkentin, Ale Jakse Hartman, Bin Ni, Kathy Korevec , et al. (2 additional authors not shown)

Abstract: This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open… ▽ More This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open models. CodeGemma 2B is a state-of-the-art code completion model designed for fast code infilling and open-ended generation in latency-sensitive settings. △ Less

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: v1: 11 pages, 4 figures, 5 tables. v2: Update metadata

arXiv:2406.07023 [pdf, other]

LiSD: An Efficient Multi-Task Learning Framework for LiDAR Segmentation and Detection

Authors: Jiahua Xu, Si Zuo, Chenfeng Wei, Wei Zhou

Abstract: With the rapid proliferation of autonomous driving, there has been a heightened focus on the research of lidar-based 3D semantic segmentation and object detection methodologies, aiming to ensure the safety of traffic participants. In recent decades, learning-based approaches have emerged, demonstrating remarkable performance gains in comparison to conventional algorithms. However, the segmentation… ▽ More With the rapid proliferation of autonomous driving, there has been a heightened focus on the research of lidar-based 3D semantic segmentation and object detection methodologies, aiming to ensure the safety of traffic participants. In recent decades, learning-based approaches have emerged, demonstrating remarkable performance gains in comparison to conventional algorithms. However, the segmentation and detection tasks have traditionally been examined in isolation to achieve the best precision. To this end, we propose an efficient multi-task learning framework named LiSD which can address both segmentation and detection tasks, aiming to optimize the overall performance. Our proposed LiSD is a voxel-based encoder-decoder framework that contains a hierarchical feature collaboration module and a holistic information aggregation module. Different integration methods are adopted to keep sparsity in segmentation while densifying features for query initialization in detection. Besides, cross-task information is utilized in an instance-aware refinement module to obtain more accurate predictions. Experimental results on the nuScenes dataset and Waymo Open Dataset demonstrate the effectiveness of our proposed model. It is worth noting that LiSD achieves the state-of-the-art performance of 83.3% mIoU on the nuScenes segmentation benchmark for lidar-only methods. △ Less

Submitted 11 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.03211 [pdf, ps, other]

Study of hybrid stars with nonstrange quark matter cores

Authors: Cheng-Ming Li, He-Rui Zheng, Shu-Yu Zuo, Ya-Peng Zhao, Fei Wang, Yong-Feng Huang

Abstract: In this work, under the hypothesis that quark matter may not be strange [Phys. Rev. Lett. 120, 222001 (2018)], we adopt a modification of the coupling constant of the four-quark scalar interaction $G\rightarrow G_1+G_2\langle\barψψ\rangle$ in the 2-flavor Nambu-Jona-Lasinio model to study nonstrange hybrid stars. According to lattice QCD simulation results of the critical temperature at zero chemi… ▽ More In this work, under the hypothesis that quark matter may not be strange [Phys. Rev. Lett. 120, 222001 (2018)], we adopt a modification of the coupling constant of the four-quark scalar interaction $G\rightarrow G_1+G_2\langle\barψψ\rangle$ in the 2-flavor Nambu-Jona-Lasinio model to study nonstrange hybrid stars. According to lattice QCD simulation results of the critical temperature at zero chemical potential, $G_1$ and $G_2$ are constrained as $G_1\in(1.935, 1.972)$ GeV$^{-2}$, and $G_2\in(-1.582, -0.743)$ GeV$^{-5}$, respectively. To obtain hybrid equation of states, the Maxwell construction is used to describe the first-order confinement-deconfinement phase transition in hybrid stars. With recent measurements on neutron star mass, radius, and tidal deformability, the hybrid equation of states are constrained. The result suggests that pure nonstrange quark matter cores can exist in hybrid stars, possessing 0.014-0.026 solar mass with the bag constant $B^{1/4}$ in a range of 148-161 MeV. It is argued that the binary neutron stars in GW170817 should be hadron stars. △ Less

Submitted 23 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: 11 pages, 10 figures

arXiv:2405.20642 [pdf, other]

Principal-Agent Multitasking: the Uniformity of Optimal Contracts and its Efficient Learning via Instrumental Regression

Authors: Shiliang Zuo

Abstract: This work studies the multitasking principal-agent problem. I first show a ``uniformity'' result. Specifically, when the tasks are perfect substitutes, and the agent's cost function is homogeneous to a certain degree, then the optimal contract only depends on the marginal utility of each task and the degree of homogeneity. I then study a setting where the marginal utility of each task is unknown s… ▽ More This work studies the multitasking principal-agent problem. I first show a ``uniformity'' result. Specifically, when the tasks are perfect substitutes, and the agent's cost function is homogeneous to a certain degree, then the optimal contract only depends on the marginal utility of each task and the degree of homogeneity. I then study a setting where the marginal utility of each task is unknown so that the optimal contract must be learned or estimated with observational data. I identify this problem as a regression problem with measurement errors and observe that this problem can be cast as an instrumental regression problem. The current works observe that both the contract and the repeated observations (when available) can act as valid instrumental variables, and propose using the generalized method of moments estimator to compute an approximately optimal contract from offline data. I also study an online setting and show how the optimal contract can be efficiently learned in an online fashion using the two estimators. Here the principal faces an exploration-exploitation tradeoff: she must experiment with new contracts and observe their outcome whilst at the same time ensuring her experimentations are not deviating too much from the optimal contract. This work shows when repeated observations are available and agents are sufficiently ``diverse", the principal can achieve a very low $\widetilde{O}(d)$ cumulative utility loss, even with a ``pure exploitation" algorithm. △ Less

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20631 [pdf, ps, other]

Optimizing Contracts in Principal-Agent Team Production

Authors: Shiliang Zuo

Abstract: I study a principal-agent team production model. The principal hires a team of agents to participate in a common production task. The exact effort of each agent is unobservable and unverifiable, but the total production outcome (e.g. the total revenue) can be observed. The principal incentivizes the agents to exert effort through contracts. Specifically, the principal promises that each agent rece… ▽ More I study a principal-agent team production model. The principal hires a team of agents to participate in a common production task. The exact effort of each agent is unobservable and unverifiable, but the total production outcome (e.g. the total revenue) can be observed. The principal incentivizes the agents to exert effort through contracts. Specifically, the principal promises that each agent receives a pre-specified amount of share of the total production output. The principal is interested in finding the optimal profit-sharing rule that maximizes her own utility. I identify a condition under which the principal's optimization problem can be reformulated as solving a family of convex programs, thereby showing the optimal contract can be found efficiently. △ Less

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2404.03476 [pdf, other]

A Reduction from Multi-Parameter to Single-Parameter Bayesian Contract Design

Authors: Matteo Castiglioni, Junjie Chen, Minming Li, Haifeng Xu, Song Zuo

Abstract: The main result of this paper is an almost approximation-preserving polynomial-time reduction from the most general multi-parameter Bayesian contract design (BCD) to single-parameter BCD. That is, for any multi-parameter BCD instance $I^M$, we construct a single-parameter instance $I^S$ such that any $β$-approximate contract (resp. menu of contracts) of $I^S$ can in turn be converted to a $(β-ε)$-… ▽ More The main result of this paper is an almost approximation-preserving polynomial-time reduction from the most general multi-parameter Bayesian contract design (BCD) to single-parameter BCD. That is, for any multi-parameter BCD instance $I^M$, we construct a single-parameter instance $I^S$ such that any $β$-approximate contract (resp. menu of contracts) of $I^S$ can in turn be converted to a $(β-ε)$-approximate contract (resp. menu of contracts) of $I^M$. The reduction is in time polynomial in the input size and $\log(\frac{1}ε)$; moreover, when $β= 1$ (i.e., the given single-parameter solution is exactly optimal), the dependence on $\frac{1}ε$ can be removed, leading to a polynomial-time exact reduction. This efficient reduction is somewhat surprising because in the closely related problem of Bayesian mechanism design, a polynomial-time reduction from multi-parameter to single-parameter setting is believed to not exist. Our result demonstrates the intrinsic difficulty of addressing moral hazard in Bayesian contract design, regardless of being single-parameter or multi-parameter. As byproducts, our reduction answers two open questions in recent literature of algorithmic contract design: (a) it implies that optimal contract design in single-parameter BCD is not in APX unless P=NP even when the agent's type distribution is regular, answering the open question of [Alon et al. 2021] in the negative; (b) it implies that the principal's (order-wise) tight utility gap between using a menu of contracts and a single contract is $Θ(n)$ where $n$ is the number of actions, answering the major open question of [Guruganesh et al. 2021] for the single-parameter case. △ Less

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.13374 [pdf, other]

Byzantine-resilient Federated Learning With Adaptivity to Data Heterogeneity

Authors: Shiyuan Zuo, Xingrun Yan, Rongfei Fan, Han Hu, Hangguan Shan, Tony Q. S. Quek

Abstract: This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity. A novel Robust Average Gradient Algorithm (RAGA) is proposed, which leverages the geometric median for aggregation and can freely select the round number for local updating. Different from most existing resilient approaches, which perform convergence analysis based on strongly-conve… ▽ More This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity. A novel Robust Average Gradient Algorithm (RAGA) is proposed, which leverages the geometric median for aggregation and can freely select the round number for local updating. Different from most existing resilient approaches, which perform convergence analysis based on strongly-convex loss function or homogeneously distributed dataset, we conduct convergence analysis for not only strongly-convex but also non-convex loss function over heterogeneous dataset. According to our theoretical analysis, as long as the fraction of dataset from malicious users is less than half, RAGA can achieve convergence at rate $\mathcal{O}({1}/{T^{2/3- δ}})$ where $T$ is the iteration number and $δ\in (0, 2/3)$ for non-convex loss function, and at linear rate for strongly-convex loss function. Moreover, stationary point or global optimal solution is proved to obtainable as data heterogeneity vanishes. Experimental results corroborate the robustness of RAGA to Byzantine attacks and verifies the advantage of RAGA over baselines on convergence performance under various intensity of Byzantine attacks, for heterogeneous dataset. △ Less

Submitted 27 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.07143 [pdf, ps, other]

New Perspectives in Online Contract Design

Authors: Shiliang Zuo

Abstract: This work studies the repeated principal-agent problem from an online learning perspective. The principal's goal is to learn the optimal contract that maximizes her utility through repeated interactions, without prior knowledge of the agent's type (i.e., the agent's cost and production functions). This work contains three technical results. First, learning linear contracts with binary outcomes is… ▽ More This work studies the repeated principal-agent problem from an online learning perspective. The principal's goal is to learn the optimal contract that maximizes her utility through repeated interactions, without prior knowledge of the agent's type (i.e., the agent's cost and production functions). This work contains three technical results. First, learning linear contracts with binary outcomes is equivalent to dynamic pricing with an unknown demand curve. Second, learning an approximately optimal contract with identical agents can be accomplished with a polynomial sample complexity scheme. Third, learning the optimal contract with heterogeneous agents can be reduced to Lipschitz bandits under mild regularity conditions. The technical results demonstrate that the one-dimensional effort model, the default model for principal-agent problems in economics which seems largely ignored in recent works from the computer science community, may possibly be the more suitable choice when studying contract design from a learning perspective. △ Less

Submitted 22 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

arXiv:2402.13417 [pdf, other]

Unlocking the `Why' of Buying: Introducing a New Dataset and Benchmark for Purchase Reason and Post-Purchase Experience

Authors: Tao Chen, Siqi Zuo, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

Abstract: Explanations are crucial for enhancing user trust and understanding within modern recommendation systems. To build truly explainable systems, we need high-quality datasets that elucidate why users make choices. While previous efforts have focused on extracting users' post-purchase sentiment in reviews, they ignore the reasons behind the decision to buy. In our work, we propose a novel purchase r… ▽ More Explanations are crucial for enhancing user trust and understanding within modern recommendation systems. To build truly explainable systems, we need high-quality datasets that elucidate why users make choices. While previous efforts have focused on extracting users' post-purchase sentiment in reviews, they ignore the reasons behind the decision to buy. In our work, we propose a novel purchase reason explanation task. To this end, we introduce an LLM-based approach to generate a dataset that consists of textual explanations of why real users make certain purchase decisions. We induce LLMs to explicitly distinguish between the reasons behind purchasing a product and the experience after the purchase in a user review. An automated, LLM-driven evaluation, as well as a small scale human evaluation, confirms the effectiveness of our approach to obtaining high-quality, personalized explanations. We benchmark this dataset on two personalized explanation generation tasks. We release the code and prompts to spur further research. △ Less

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2401.13986 [pdf, other]

Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning

Authors: Yanda Chen, Chandan Singh, Xiaodong Liu, Simiao Zuo, Bin Yu, He He, Jianfeng Gao

Abstract: Large language models (LLMs) often generate convincing, fluent explanations. However, different from humans, they often generate inconsistent explanations on different inputs. For example, an LLM may generate the explanation "all birds can fly" when answering the question "Can sparrows fly?" but meanwhile answer "no" to the related question "Can penguins fly?". Explanations should be consistent ac… ▽ More Large language models (LLMs) often generate convincing, fluent explanations. However, different from humans, they often generate inconsistent explanations on different inputs. For example, an LLM may generate the explanation "all birds can fly" when answering the question "Can sparrows fly?" but meanwhile answer "no" to the related question "Can penguins fly?". Explanations should be consistent across related examples so that they allow a human to simulate the LLM's decision process on multiple examples. We propose explanation-consistency finetuning (EC-finetuning), a method that adapts LLMs to generate more consistent natural-language explanations on related examples. EC-finetuning involves finetuning LLMs on synthetic data that is carefully constructed to contain consistent explanations. Across a variety of question-answering datasets in various domains, EC-finetuning yields a 10.0% relative explanation consistency improvement on four finetuning datasets, and generalizes to seven out-of-distribution datasets not seen during finetuning (+4.5% relative). Code is available at https://github.com/yandachen/explanation-consistency-finetuning . △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2307.08678

arXiv:2312.07145 [pdf, other]

Contextual Bandits with Online Neural Regression

Authors: Rohan Deb, Yikun Ban, Shiliang Zuo, Jingrui He, Arindam Banerjee

Abstract: Recent works have shown a reduction from contextual bandits to online regression under a realizability assumption [Foster and Rakhlin, 2020, Foster and Krishnamurthy, 2021]. In this work, we investigate the use of neural networks for such online regression and associated Neural Contextual Bandits (NeuCBs). Using existing results for wide networks, one can readily show a ${\mathcal{O}}(\sqrt{T})$ r… ▽ More Recent works have shown a reduction from contextual bandits to online regression under a realizability assumption [Foster and Rakhlin, 2020, Foster and Krishnamurthy, 2021]. In this work, we investigate the use of neural networks for such online regression and associated Neural Contextual Bandits (NeuCBs). Using existing results for wide networks, one can readily show a ${\mathcal{O}}(\sqrt{T})$ regret for online regression with square loss, which via the reduction implies a ${\mathcal{O}}(\sqrt{K} T^{3/4})$ regret for NeuCBs. Departing from this standard approach, we first show a $\mathcal{O}(\log T)$ regret for online regression with almost convex losses that satisfy QG (Quadratic Growth) condition, a generalization of the PL (Polyak-Łojasiewicz) condition, and that have a unique minima. Although not directly applicable to wide networks since they do not have unique minima, we show that adding a suitable small random perturbation to the network predictions surprisingly makes the loss satisfy QG with unique minima. Based on such a perturbed prediction, we show a ${\mathcal{O}}(\log T)$ regret for online regression with both squared loss and KL loss, and subsequently convert these respectively to $\tilde{\mathcal{O}}(\sqrt{KT})$ and $\tilde{\mathcal{O}}(\sqrt{KL^*} + K)$ regret for NeuCB, where $L^*$ is the loss of the best policy. Separately, we also show that existing regret bounds for NeuCBs are $Ω(T)$ or assume i.i.d. contexts, unlike this work. Finally, our experimental results on various datasets demonstrate that our algorithms, especially the one based on KL loss, persistently outperform existing algorithms. △ Less

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.01064 [pdf, other]

Application of Regularization Methods in the Sky Map Reconstruction of the Tianlai Cylinder Pathfinder Array

Authors: Kaifeng Yu, Shifan Zuo, Fengquan Wu, Yougang Wang, Xuelei Chen

Abstract: The Tianlai cylinder pathfinder is a radio interferometer array to test 21 cm intensity mapping techniques in the post-reionization era. It works in passive drift scan mode to survey the sky visible in the northern hemisphere. To deal with the large instantaneous field of view and the spherical sky, we decompose the drift scan data into m-modes, which are linearly related to the sky intensity. The… ▽ More The Tianlai cylinder pathfinder is a radio interferometer array to test 21 cm intensity mapping techniques in the post-reionization era. It works in passive drift scan mode to survey the sky visible in the northern hemisphere. To deal with the large instantaneous field of view and the spherical sky, we decompose the drift scan data into m-modes, which are linearly related to the sky intensity. The sky map is reconstructed by solving the linear interferometer equations. Due to incomplete uv coverage of the interferometer baselines, this inverse problem is usually ill-posed, and regularization method is needed for its solution. In this paper, we use simulation to investigate two frequently used regularization methods, the Truncated Singular Value Decomposition (TSVD), and the Tikhonov regularization techniques. Choosing the regularization parameter is very important for its application. We employ the generalized cross validation (GCV) method and the L-curve method to determine the optimal value. We compare the resulting maps obtained with the different regularization methods, and for the different parameters derived using the different criteria. While both methods can yield good maps for a range of regularization parameters, in the Tikhonov method the suppression of noisy modes are more gradually applied, produce more smooth maps which avoids some visual artefacts in the maps generated with the TSVD method. △ Less

Submitted 2 December, 2023; originally announced December 2023.

Comments: 17 pages, 14 figures

arXiv:2311.10679 [pdf, other]

Non-uniform Bid-scaling and Equilibria for Different Auctions: An Empirical Study

Authors: Yuan Deng, Jieming Mao, Vahab Mirrokni, Yifeng Teng, Song Zuo

Abstract: In recent years, the growing adoption of autobidding has motivated the study of auction design with value-maximizing auto-bidders. It is known that under mild assumptions, uniform bid-scaling is an optimal bidding strategy in truthful auctions, e.g., Vickrey-Clarke-Groves auction (VCG), and the price of anarchy for VCG is $2$. However, for other auction formats like First-Price Auction (FPA) and G… ▽ More In recent years, the growing adoption of autobidding has motivated the study of auction design with value-maximizing auto-bidders. It is known that under mild assumptions, uniform bid-scaling is an optimal bidding strategy in truthful auctions, e.g., Vickrey-Clarke-Groves auction (VCG), and the price of anarchy for VCG is $2$. However, for other auction formats like First-Price Auction (FPA) and Generalized Second-Price auction (GSP), uniform bid-scaling may not be an optimal bidding strategy, and bidders have incentives to deviate to adopt strategies with non-uniform bid-scaling. Moreover, FPA can achieve optimal welfare if restricted to uniform bid-scaling, while its price of anarchy becomes $2$ when non-uniform bid-scaling strategies are allowed. All these price of anarchy results have been focused on welfare approximation in the worst-case scenarios. To complement theoretical understandings, we empirically study how different auction formats (FPA, GSP, VCG) with different levels of non-uniform bid-scaling perform in an autobidding world with a synthetic dataset for auctions. Our empirical findings include: * For both uniform bid-scaling and non-uniform bid-scaling, FPA is better than GSP and GSP is better than VCG in terms of both welfare and profit; * A higher level of non-uniform bid-scaling leads to lower welfare performance in both FPA and GSP, while different levels of non-uniform bid-scaling have no effect in VCG. Our methodology of synthetic data generation may be of independent interest. △ Less

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2310.17602 [pdf, other]

Simulation-based Inference of Reionization Parameters from 3D Tomographic 21 cm Light-cone Images -- II: Application of Solid Harmonic Wavelet Scattering Transform

Authors: Xiaosheng Zhao, Yi Mao, Shifan Zuo, Benjamin D. Wandelt

Abstract: The information regarding how the intergalactic medium is reionized by astrophysical sources is contained in the tomographic three-dimensional 21 cm images from the epoch of reionization. In Zhao et al. (2022a) ("Paper I"), we demonstrated for the first time that density estimation likelihood-free inference (DELFI) can be applied efficiently to perform a Bayesian inference of the reionization para… ▽ More The information regarding how the intergalactic medium is reionized by astrophysical sources is contained in the tomographic three-dimensional 21 cm images from the epoch of reionization. In Zhao et al. (2022a) ("Paper I"), we demonstrated for the first time that density estimation likelihood-free inference (DELFI) can be applied efficiently to perform a Bayesian inference of the reionization parameters from the 21 cm images. Nevertheless, the 3D image data needs to be compressed into informative summaries as the input of DELFI by, e.g., a trained 3D convolutional neural network (CNN) as in Paper I (DELFI-3D CNN). Here in this paper, we introduce an alternative data compressor, the solid harmonic wavelet scattering transform (WST), which has a similar, yet fixed (i.e. no training), architecture to CNN, but we show that this approach (i.e. solid harmonic WST with DELFI) outperforms earlier analyses based on 3D 21 cm images using DELFI-3D CNN in terms of credible regions of parameters. Realistic effects, including thermal noise and residual foreground after removal, are also applied to the mock observations from the Square Kilometre Array (SKA). We show that under the same inference strategy using DELFI, the 21 cm image analysis with solid harmonic WST outperforms the 21 cm power spectrum analysis. This research serves as a proof of concept, demonstrating the potential to harness the strengths of WST and simulation-based inference to derive insights from future 21 cm light-cone image data. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Comments: 19 pages, 10 figures, 7 tables. Submitted to ApJ. Comments welcome

arXiv:2310.16336 [pdf, other]

SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process

Authors: Zichong Li, Yanbo Xu, Simiao Zuo, Haoming Jiang, Chao Zhang, Tuo Zhao, Hongyuan Zha

Abstract: Transformer Hawkes process models have shown to be successful in modeling event sequence data. However, most of the existing training methods rely on maximizing the likelihood of event sequences, which involves calculating some intractable integral. Moreover, the existing methods fail to provide uncertainty quantification for model predictions, e.g., confidence intervals for the predicted event's… ▽ More Transformer Hawkes process models have shown to be successful in modeling event sequence data. However, most of the existing training methods rely on maximizing the likelihood of event sequences, which involves calculating some intractable integral. Moreover, the existing methods fail to provide uncertainty quantification for model predictions, e.g., confidence intervals for the predicted event's arrival time. To address these issues, we propose SMURF-THP, a score-based method for learning Transformer Hawkes process and quantifying prediction uncertainty. Specifically, SMURF-THP learns the score function of events' arrival time based on a score-matching objective that avoids the intractable computation. With such a learned score function, we can sample arrival time of events from the predictive distribution. This naturally allows for the quantification of uncertainty by computing confidence intervals over the generated samples. We conduct extensive experiments in both event type prediction and uncertainty quantification of arrival time. In all the experiments, SMURF-THP outperforms existing likelihood-based methods in confidence calibration while exhibiting comparable prediction accuracy. △ Less

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.13855 [pdf, other]

Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing

Authors: Xinyu Hu, Pengfei Tang, Simiao Zuo, Zihan Wang, Bowen Song, Qiang Lou, Jian Jiao, Denis Charles

Abstract: Large language models (LLMs) have made impressive progress in natural language processing. These models rely on proper human instructions (or prompts) to generate suitable responses. However, the potential of LLMs are not fully harnessed by commonly-used prompting methods: many human-in-the-loop algorithms employ ad-hoc procedures for prompt selection; while auto prompt generation approaches are e… ▽ More Large language models (LLMs) have made impressive progress in natural language processing. These models rely on proper human instructions (or prompts) to generate suitable responses. However, the potential of LLMs are not fully harnessed by commonly-used prompting methods: many human-in-the-loop algorithms employ ad-hoc procedures for prompt selection; while auto prompt generation approaches are essentially searching all possible prompts randomly and inefficiently. We propose Evoke, an automatic prompt refinement framework. In Evoke, there are two instances of a same LLM: one as a reviewer (LLM-Reviewer), it scores the current prompt; the other as an author (LLM-Author), it edits the prompt by considering the edit history and the reviewer's feedback. Such an author-reviewer feedback loop ensures that the prompt is refined in each iteration. We further aggregate a data selection approach to Evoke, where only the hard samples are exposed to the LLM. The hard samples are more important because the LLM can develop deeper understanding of the tasks out of them, while the model may already know how to solve the easier cases. Experimental results show that Evoke significantly outperforms existing methods. For instance, in the challenging task of logical fallacy detection, Evoke scores above 80, while all other baseline methods struggle to reach 20. △ Less

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.10826 [pdf, ps, other]

Mechanism Design for Large Language Models

Authors: Paul Duetting, Vahab Mirrokni, Renato Paes Leme, Haifeng Xu, Song Zuo

Abstract: We investigate auction mechanisms for AI-generated content, focusing on applications like ad creative generation. In our model, agents' preferences over stochastically generated content are encoded as large language models (LLMs). We propose an auction format that operates on a token-by-token basis, and allows LLM agents to influence content creation through single dimensional bids. We formulate t… ▽ More We investigate auction mechanisms for AI-generated content, focusing on applications like ad creative generation. In our model, agents' preferences over stochastically generated content are encoded as large language models (LLMs). We propose an auction format that operates on a token-by-token basis, and allows LLM agents to influence content creation through single dimensional bids. We formulate two desirable incentive properties and prove their equivalence to a monotonicity condition on output aggregation. This equivalence enables a second-price rule design, even absent explicit agent valuation functions. Our design is supported by demonstrations on a publicly available LLM. △ Less

Submitted 2 July, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: WWW'24 Best Paper

arXiv:2310.10810 [pdf, other]

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms

Authors: Alexander Bukharin, Yan Li, Yue Yu, Qingru Zhang, Zhehui Chen, Simiao Zuo, Chao Zhang, Songan Zhang, Tuo Zhao

Abstract: Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains. Despite this promise, MARL policies often lack robustness and are therefore sensitive to small changes in their environment. This presents a serious concern for the real world deployment of MARL algorithms, where the testing environment may slightly differ from the training environment. In this work we sh… ▽ More Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains. Despite this promise, MARL policies often lack robustness and are therefore sensitive to small changes in their environment. This presents a serious concern for the real world deployment of MARL algorithms, where the testing environment may slightly differ from the training environment. In this work we show that we can gain robustness by controlling a policy's Lipschitz constant, and under mild conditions, establish the existence of a Lipschitz and close-to-optimal policy. Based on these insights, we propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies with respect to the state observations and actions by adversarial regularization. The ERNIE framework provides robustness against noisy observations, changing transition dynamics, and malicious actions of agents. However, ERNIE's adversarial regularization may introduce some training instability. To reduce this instability, we reformulate adversarial regularization as a Stackelberg game. We demonstrate the effectiveness of the proposed framework with extensive experiments in traffic light control and particle environments. In addition, we extend ERNIE to mean-field MARL with a formulation based on distributionally robust optimization that outperforms its non-robust counterpart and is of independent interest. Our code is available at https://github.com/abukharin3/ERNIE. △ Less

Submitted 16 October, 2023; originally announced October 2023.

Comments: 33 pages, 10 figures

arXiv:2310.03105 [pdf, other]

Efficiency of the Generalized Second-Price Auction for Value Maximizers

Authors: Yuan Deng, Mohammad Mahdian, Jieming Mao, Vahab Mirrokni, Hanrui Zhang, Song Zuo

Abstract: We study the price of anarchy of the generalized second-price auction where bidders are value maximizers (i.e., autobidders). We show that in general the price of anarchy can be as bad as $0$. For comparison, the price of anarchy of running VCG is $1/2$ in the autobidding world. We further show a fined-grained price of anarchy with respect to the discount factors (i.e., the ratios of click probabi… ▽ More We study the price of anarchy of the generalized second-price auction where bidders are value maximizers (i.e., autobidders). We show that in general the price of anarchy can be as bad as $0$. For comparison, the price of anarchy of running VCG is $1/2$ in the autobidding world. We further show a fined-grained price of anarchy with respect to the discount factors (i.e., the ratios of click probabilities between lower slots and the highest slot in each auction) in the generalized second-price auction, which highlights the qualitative relation between the smoothness of the discount factors and the efficiency of the generalized second-price auction. △ Less

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.17411 [pdf, other]

Resilient Model-Free Asymmetric Bipartite Consensus for Nonlinear Multi-Agent Systems against DoS Attacks

Authors: Yi Zhang, Yichao Wang, Junbo Zhao, Shan Zuo

Abstract: In this letter, we study an unified resilient asymmetric bipartite consensus (URABC) problem for nonlinear multi-agent systems with both cooperative and antagonistic interactions under denial-of-service (DoS) attacks. We first prove that the URABC problem is solved by stabilizing the neighborhood asymmetric bipartite consensus error. Then, we develop a distributed compact form dynamic linearizatio… ▽ More In this letter, we study an unified resilient asymmetric bipartite consensus (URABC) problem for nonlinear multi-agent systems with both cooperative and antagonistic interactions under denial-of-service (DoS) attacks. We first prove that the URABC problem is solved by stabilizing the neighborhood asymmetric bipartite consensus error. Then, we develop a distributed compact form dynamic linearization method to linearize the neighborhood asymmetric bipartite consensus error. By using an extended discrete state observer to enhance the robustness against unknown dynamics and an attack compensation mechanism to eliminate the adverse effects of DoS attacks, we finally propose a distributed resilient model-free adaptive control algorithm to solve the URABC problem. A numerical example validates the proposed results. △ Less

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.17301 [pdf, other]

Distributed Resilient Control of DC Microgrids Under Generally Unbounded FDI Attacks

Authors: Yichao Wang, Mohamadamin Rajabinezhad, Omar A. Beg, Shan Zuo

Abstract: Due to the nature of distributed secondary control paradigm, DC microgrids are prone to malicious cyber-physical attacks, which could be unbounded to maximize their damage. Existing resilient secondary control methods addressing unbounded attacks require that the first time derivatives of cyber-physical attack signals be bounded. The secondary defense strategy presented in this letter relax such a… ▽ More Due to the nature of distributed secondary control paradigm, DC microgrids are prone to malicious cyber-physical attacks, which could be unbounded to maximize their damage. Existing resilient secondary control methods addressing unbounded attacks require that the first time derivatives of cyber-physical attack signals be bounded. The secondary defense strategy presented in this letter relax such a strict constraint by addressing more generally unbounded attack signals and hence, enhance the resilience of DC microgrids in adversarial environments. Rigorous proofs, based on Lyapunov techniques, show that the proposed method guarantees the uniformly ultimately bounded convergence for both voltage regulation and proportional load sharing under generally unbounded attacks. Comparative case studies further validate the enhanced resilience of the proposed attack-resilient control strategy. △ Less

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.17253 [pdf, other]

Secondary Defense Strategies of AC Microgrids Against Generally Unbounded Attacks

Authors: Yichao Wang, Mohamadamin Rajabinezhad, Shan Zuo

Abstract: This paper develops a fully distributed attack-resilient secondary defense strategies for AC microgrids, addressing more generally unbounded attacks on control input channels than those addressed in existing literature. The secondary control of local inverter includes consensus-based voltage and current regulators utilizing relative information from neighboring inverters. This distributed control… ▽ More This paper develops a fully distributed attack-resilient secondary defense strategies for AC microgrids, addressing more generally unbounded attacks on control input channels than those addressed in existing literature. The secondary control of local inverter includes consensus-based voltage and current regulators utilizing relative information from neighboring inverters. This distributed control approach relies on localized control and a sparse communication network, making it susceptible to malicious cyber-physical attacks that can impair consensus performance and potentially destabilize the overall microgrid. In contrast to existing solutions that are limited to addressing either bounded faults, noises or unbounded attacks with bounded first-order time derivatives, we aim to surpass these constraints and enhance the defense capabilities of counteracting cyber-physical attacks by enabling the AC microgrids adopting the proposed strategies to withstand a much wider range of unbounded cyber-attack signals. Fully distributed attack-resilient secondary defense strategies are developed for AC microgrids to counteract the detrimental effects of generally unbounded attacks on control input channels. Rigorous proofs using Lyapunov techniques demonstrate that the proposed defense strategies accomplish the uniformly ultimately bounded convergence on frequency regulation and achieve voltage containment and active power sharing simultaneously for multi-inverter-based AC microgrids in the face of generally unbounded attacks. The proposed defense strategies are validated on a modified IEEE 34-bus test feeder benchmark system incorporating four inverter-based DERs. △ Less

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2308.16896 [pdf, other]

PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction

Authors: Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu

Abstract: Semantic segmentation in autonomous driving has been undergoing an evolution from sparse point segmentation to dense voxel segmentation, where the objective is to predict the semantic occupancy of each voxel in the concerned 3D space. The dense nature of the prediction space has rendered existing efficient 2D-projection-based methods (e.g., bird's eye view, range view, etc.) ineffective, as they c… ▽ More Semantic segmentation in autonomous driving has been undergoing an evolution from sparse point segmentation to dense voxel segmentation, where the objective is to predict the semantic occupancy of each voxel in the concerned 3D space. The dense nature of the prediction space has rendered existing efficient 2D-projection-based methods (e.g., bird's eye view, range view, etc.) ineffective, as they can only describe a subspace of the 3D scene. To address this, we propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively and a PointOcc model to process them efficiently. Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system for more fine-grained modeling of nearer areas. We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane. Finally, we obtain the features of each point by aggregating its projected features on each of the processed TPV planes without the need for any post-processing. Extensive experiments on both 3D occupancy prediction and LiDAR segmentation benchmarks demonstrate that the proposed PointOcc achieves state-of-the-art performance with much faster speed. Specifically, despite only using LiDAR, PointOcc significantly outperforms all other methods, including multi-modal methods, with a large margin on the OpenOccupancy benchmark. Code: https://github.com/wzzheng/PointOcc. △ Less

Submitted 31 August, 2023; originally announced August 2023.

Comments: Code is available at https://github.com/wzzheng/PointOcc

arXiv:2308.10427 [pdf, other]

Federated Learning Robust to Byzantine Attacks: Achieving Zero Optimality Gap

Authors: Shiyuan Zuo, Rongfei Fan, Han Hu, Ning Zhang, Shimin Gong

Abstract: In this paper, we propose a robust aggregation method for federated learning (FL) that can effectively tackle malicious Byzantine attacks. At each user, model parameter is firstly updated by multiple steps, which is adjustable over iterations, and then pushed to the aggregation center directly. This decreases the number of interactions between the aggregation center and users, allows each user to… ▽ More In this paper, we propose a robust aggregation method for federated learning (FL) that can effectively tackle malicious Byzantine attacks. At each user, model parameter is firstly updated by multiple steps, which is adjustable over iterations, and then pushed to the aggregation center directly. This decreases the number of interactions between the aggregation center and users, allows each user to set training parameter in a flexible way, and reduces computation burden compared with existing works that need to combine multiple historical model parameters. At the aggregation center, geometric median is leveraged to combine the received model parameters from each user. Rigorous proof shows that zero optimality gap is achieved by our proposed method with linear convergence, as long as the fraction of Byzantine attackers is below half. Numerical results verify the effectiveness of our proposed method. △ Less

Submitted 20 August, 2023; originally announced August 2023.

arXiv:2308.09082 [pdf, other]

Over-the-Air Computation Aided Federated Learning with the Aggregation of Normalized Gradient

Authors: Rongfei Fan, Xuming An, Shiyuan Zuo, Han Hu

Abstract: Over-the-air computation is a communication-efficient solution for federated learning (FL). In such a system, iterative procedure is performed: Local gradient of private loss function is updated, amplified and then transmitted by every mobile device; the server receives the aggregated gradient all-at-once, generates and then broadcasts updated model parameters to every mobile device. In terms of a… ▽ More Over-the-air computation is a communication-efficient solution for federated learning (FL). In such a system, iterative procedure is performed: Local gradient of private loss function is updated, amplified and then transmitted by every mobile device; the server receives the aggregated gradient all-at-once, generates and then broadcasts updated model parameters to every mobile device. In terms of amplification factor selection, most related works suppose the local gradient's maximal norm always happens although it actually fluctuates over iterations, which may degrade convergence performance. To circumvent this problem, we propose to turn local gradient to be normalized one before amplifying it. Under our proposed method, when the loss function is smooth, we prove our proposed method can converge to stationary point at sub-linear rate. In case of smooth and strongly convex loss function, we prove our proposed method can achieve minimal training loss at linear rate with any small positive tolerance. Moreover, a tradeoff between convergence rate and the tolerance is discovered. To speedup convergence, problems optimizing system parameters are also formulated for above two cases. Although being non-convex, optimal solution with polynomial complexity of the formulated problems are derived. Experimental results show our proposed method can outperform benchmark methods on convergence performance. △ Less

Submitted 2 September, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.09072 [pdf, other]

Joint Power Control and Data Size Selection for Over-the-Air Computation Aided Federated Learning

Authors: Xuming An, Rongfei Fan, Shiyuan Zuo, Han Hu, Hai Jiang, Ning Zhang

Abstract: Federated learning (FL) has emerged as an appealing machine learning approach to deal with massive raw data generated at multiple mobile devices, {which needs to aggregate the training model parameter of every mobile device at one base station (BS) iteratively}. For parameter aggregating in FL, over-the-air computation is a spectrum-efficient solution, which allows all mobile devices to transmit t… ▽ More Federated learning (FL) has emerged as an appealing machine learning approach to deal with massive raw data generated at multiple mobile devices, {which needs to aggregate the training model parameter of every mobile device at one base station (BS) iteratively}. For parameter aggregating in FL, over-the-air computation is a spectrum-efficient solution, which allows all mobile devices to transmit their parameter-mapped signals concurrently to a BS. Due to heterogeneous channel fading and noise, there exists difference between the BS's received signal and its desired signal, measured as the mean-squared error (MSE). To minimize the MSE, we propose to jointly optimize the signal amplification factors at the BS and the mobile devices as well as the data size (the number of data samples involved in local training) at every mobile device. The formulated problem is challenging to solve due to its non-convexity. To find the optimal solution, with some simplification on cost function and variable replacement, which still preserves equivalence, we transform the changed problem to be a bi-level problem equivalently. For the lower-level problem, optimal solution is found by enumerating every candidate solution from the Karush-Kuhn-Tucker (KKT) condition. For the upper-level problem, the optimal solution is found by exploring its piecewise convexity. Numerical results show that our proposed method can greatly reduce the MSE and can help to improve the training performance of FL compared with benchmark methods. △ Less

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.04931 [pdf, other]

A simulation of calibration and map-making errors of the Tianlai cylinder pathfinder array

Authors: Kaifeng Yu, Fengquan Wu, Shifan Zuo, Jixia Li, Shijie Sun, Yougang Wang, Xuelei Chen

Abstract: The Tianlai cylinder array is a pathfinder for developing and testing 21cm intensity mapping techniques. In this paper, we use numerical simulation to assess how its measurement is affected by thermal noise and the errors in calibration and map-making process, and the error in the sky map reconstructed from a drift scan survey. Here we consider only the single frequency, unpolarized case. The beam… ▽ More The Tianlai cylinder array is a pathfinder for developing and testing 21cm intensity mapping techniques. In this paper, we use numerical simulation to assess how its measurement is affected by thermal noise and the errors in calibration and map-making process, and the error in the sky map reconstructed from a drift scan survey. Here we consider only the single frequency, unpolarized case. The beam is modelled by fitting to the electromagnetic simulation of the antenna, and the variations of the complex gains of the array elements are modelled by Gaussian processes. Mock visibility data is generated and run through our data processing pipeline. We find that the accuracy of the current calibration is limited primarily by the absolute calibration, where the error comes mainly from the approximation of a single dominating point source. We then studied the $m$-mode map-making with the help of Moore-Penrose inverse. We find that discarding modes with singular values smaller than a threshold could generate visible artifacts in the map. The impacts of the residue variation of the complex gain and thermal noise are also investigated. The thermal noise in the map varies with latitude, being minimum at the latitude passing through the zenith of the telescope. The angular power spectrum of the reconstructed map show that the current Tianlai cylinder pathfinder, which has a shorter maximum baseline length in the North-South direction, can measure modes up to $l \lesssim 2πb_{\rm NS}/λ\sim 200$ very well, but would lose a significant fraction of higher angular modes when noise is present. These results help us to identify the main limiting factors in our current array configuration and data analysis procedure, and suggest that the performance can be improved by reconfiguration of the array feed positions. △ Less

Submitted 9 August, 2023; originally announced August 2023.

Comments: 25 pages, 18 figures, RAA accepted

Journal ref: Research in Astronomy and Astrophysics, 23, 105008 (2023)

arXiv:2307.13903 [pdf, ps, other]

Corruption-Robust Lipschitz Contextual Search

Authors: Shiliang Zuo

Abstract: I study the problem of learning a Lipschitz function with corrupted binary signals. The learner tries to learn a $L$-Lipschitz function $f: [0,1]^d \rightarrow [0, L]$ that the adversary chooses. There is a total of $T$ rounds. In each round $t$, the adversary selects a context vector $x_t$ in the input space, and the learner makes a guess to the true function value $f(x_t)$ and receives a binary… ▽ More I study the problem of learning a Lipschitz function with corrupted binary signals. The learner tries to learn a $L$-Lipschitz function $f: [0,1]^d \rightarrow [0, L]$ that the adversary chooses. There is a total of $T$ rounds. In each round $t$, the adversary selects a context vector $x_t$ in the input space, and the learner makes a guess to the true function value $f(x_t)$ and receives a binary signal indicating whether the guess is high or low. In a total of $C$ rounds, the signal may be corrupted, though the value of $C$ is \emph{unknown} to the learner. The learner's goal is to incur a small cumulative loss. This work introduces the new algorithmic technique \emph{agnostic checking} as well as new analysis techniques. I design algorithms which: for the symmetric loss, the learner achieves regret $L\cdot O(C\log T)$ with $d = 1$ and $L\cdot O_d(C\log T + T^{(d-1)/d})$ with $d > 1$; for the pricing loss, the learner achieves regret $L\cdot \widetilde{O} (T^{d/(d+1)} + C\cdot T^{1/(d+1)})$. △ Less

Submitted 1 February, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

Comments: Accepted at ALT 2024

arXiv:2307.09530 [pdf, other]

3D ScatterNet: Inference from 21 cm Light-cones

Authors: Xiaosheng Zhao, Shifan Zuo, Yi Mao

Abstract: The Square Kilometre Array (SKA) will have the sensitivity to take the 3D light-cones of the 21 cm signal from the epoch of reionization. This signal, however, is highly non-Gaussian and can not be fully interpreted by the traditional statistic using power spectrum. In this work, we introduce the 3D ScatterNet that combines the normalizing flows with solid harmonic wavelet scattering transform, a… ▽ More The Square Kilometre Array (SKA) will have the sensitivity to take the 3D light-cones of the 21 cm signal from the epoch of reionization. This signal, however, is highly non-Gaussian and can not be fully interpreted by the traditional statistic using power spectrum. In this work, we introduce the 3D ScatterNet that combines the normalizing flows with solid harmonic wavelet scattering transform, a 3D CNN featurizer with inductive bias, to perform implicit likelihood inference (ILI) from 21 cm light-cones. We show that 3D ScatterNet outperforms the ILI with a fine-tuned 3D CNN in the literature. It also reaches better performance than ILI with the power spectrum for varied light-cone effects and varied signal contaminations. △ Less

Submitted 13 July, 2023; originally announced July 2023.

Comments: 9 pages, 4 figures, 2 tables. Accepted to ICML 2023 Machine Learning for Astrophysics workshop. Comments and suggestions are welcome

arXiv:2306.17413 [pdf, other]

DeepTagger: Knowledge Enhanced Named Entity Recognition for Web-Based Ads Queries

Authors: Simiao Zuo, Pengfei Tang, Xinyu Hu, Qiang Lou, Jian Jiao, Denis Charles

Abstract: Named entity recognition (NER) is a crucial task for online advertisement. State-of-the-art solutions leverage pre-trained language models for this task. However, three major challenges remain unresolved: web queries differ from natural language, on which pre-trained models are trained; web queries are short and lack contextual information; and labeled data for NER is scarce. We propose DeepTagger… ▽ More Named entity recognition (NER) is a crucial task for online advertisement. State-of-the-art solutions leverage pre-trained language models for this task. However, three major challenges remain unresolved: web queries differ from natural language, on which pre-trained models are trained; web queries are short and lack contextual information; and labeled data for NER is scarce. We propose DeepTagger, a knowledge-enhanced NER model for web-based ads queries. The proposed knowledge enhancement framework leverages both model-free and model-based approaches. For model-free enhancement, we collect unlabeled web queries to augment domain knowledge; and we collect web search results to enrich the information of ads queries. We further leverage effective prompting methods to automatically generate labels using large language models such as ChatGPT. Additionally, we adopt a model-based knowledge enhancement method based on adversarial data augmentation. We employ a three-stage training framework to train DeepTagger models. Empirical results in various NER tasks demonstrate the effectiveness of the proposed framework. △ Less

Submitted 30 June, 2023; originally announced June 2023.

arXiv:2306.09217 [pdf, other]

doi 10.1088/1674-4527/acd0ee

Map Reconstruction of radio observations with Conditional Invertible Neural Networks

Authors: Haolin Zhang, Shifan Zuo, Le Zhang

Abstract: In radio astronomy, the challenge of reconstructing a sky map from time ordered data (TOD) is known as an inverse problem. Standard map-making techniques and gridding algorithms are commonly employed to address this problem, each offering its own benefits such as producing minimum-variance maps. However, these approaches also carry limitations such as computational inefficiency and numerical insta… ▽ More In radio astronomy, the challenge of reconstructing a sky map from time ordered data (TOD) is known as an inverse problem. Standard map-making techniques and gridding algorithms are commonly employed to address this problem, each offering its own benefits such as producing minimum-variance maps. However, these approaches also carry limitations such as computational inefficiency and numerical instability in map-making and the inability to remove beam effects in grid-based methods. To overcome these challenges, this study proposes a novel solution through the use of the conditional invertible neural network (cINN) for efficient sky map reconstruction. With the aid of forward modeling, where the simulated TODs are generated from a given sky model with a specific observation, the trained neural network can produce accurate reconstructed sky maps. Using the five-hundred-meter aperture spherical radio telescope (FAST) as an example, cINN demonstrates remarkable performance in map reconstruction from simulated TODs, achieving a mean squared error of $2.29\pm 2.14 \times 10^{-4}~\rm K^2$, a structural similarity index of $0.968\pm0.002$, and a peak signal-to-noise ratio of $26.13\pm5.22$ at the $1σ$ level. Furthermore, by sampling in the latent space of cINN, the reconstruction errors for each pixel can be accurately quantified. △ Less

Submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted for publication in Research in Astronomy and Astrophysics (RAA); 20 pages, 10 figures

arXiv:2306.06554 [pdf, other]

Bayesian Calibrated Click-Through Auction

Authors: Junjie Chen, Minming Li, Haifeng Xu, Song Zuo

Abstract: We study information design in click-through auctions, in which the bidders/advertisers bid for winning an opportunity to show their ads but only pay for realized clicks. The payment may or may not happen, and its probability is called the click-through rate (CTR). This auction format is widely used in the industry of online advertising. Bidders have private values, whereas the seller has private… ▽ More We study information design in click-through auctions, in which the bidders/advertisers bid for winning an opportunity to show their ads but only pay for realized clicks. The payment may or may not happen, and its probability is called the click-through rate (CTR). This auction format is widely used in the industry of online advertising. Bidders have private values, whereas the seller has private information about each bidder's CTRs. We are interested in the seller's problem of partially revealing CTR information to maximize revenue. Information design in click-through auctions turns out to be intriguingly different from almost all previous studies in this space since any revealed information about CTRs will never affect bidders' bidding behaviors -- they will always bid their true value per click -- but only affect the auction's allocation and payment rule. In some sense, this makes information design effectively a constrained mechanism design problem. Our first result is an FPTAS to compute an approximately optimal mechanism under a constant number of bidders. The design of this algorithm leverages Bayesian bidder values which help to ``smooth'' the seller's revenue function and lead to better tractability. The design of this FPTAS is complex and primarily algorithmic. Our second main result pursues the design of ``simple'' mechanisms that are approximately optimal yet more practical. We primarily focus on the two-bidder situation, which is already notoriously challenging as demonstrated in recent works. When bidders' CTR distribution is symmetric, we develop a simple prior-free signaling scheme, whose construction relies on a parameter termed optimal signal ratio. The constructed scheme provably obtains a good approximation as long as the maximum and minimum of bidders' value density functions do not differ much. △ Less

Submitted 20 April, 2024; v1 submitted 10 June, 2023; originally announced June 2023.

Comments: add more explanations, details and discussions, use a new template

arXiv:2306.05285 [pdf, other]

Unsupervised Statistical Feature-Guided Diffusion Model for Sensor-based Human Activity Recognition

Authors: Si Zuo, Vitor Fortes Rey, Sungho Suh, Stephan Sigg, Paul Lukowicz

Abstract: Human activity recognition (HAR) from on-body sensors is a core functionality in many AI applications: from personal health, through sports and wellness to Industry 4.0. A key problem holding up progress in wearable sensor-based HAR, compared to other ML areas, such as computer vision, is the unavailability of diverse and labeled training data. Particularly, while there are innumerable annotated i… ▽ More Human activity recognition (HAR) from on-body sensors is a core functionality in many AI applications: from personal health, through sports and wellness to Industry 4.0. A key problem holding up progress in wearable sensor-based HAR, compared to other ML areas, such as computer vision, is the unavailability of diverse and labeled training data. Particularly, while there are innumerable annotated images available in online repositories, freely available sensor data is sparse and mostly unlabeled. We propose an unsupervised statistical feature-guided diffusion model specifically optimized for wearable sensor-based human activity recognition with devices such as inertial measurement unit (IMU) sensors. The method generates synthetic labeled time-series sensor data without relying on annotated training data. Thereby, it addresses the scarcity and annotation difficulties associated with real-world sensor data. By conditioning the diffusion model on statistical information such as mean, standard deviation, Z-score, and skewness, we generate diverse and representative synthetic sensor data. We conducted experiments on public human activity recognition datasets and compared the method to conventional oversampling and state-of-the-art generative adversarial network methods. Experimental results demonstrate that this can improve the performance of human activity recognition and outperform existing techniques. △ Less

Submitted 19 May, 2024; v1 submitted 30 May, 2023; originally announced June 2023.

arXiv:2306.03109 [pdf, other]

Machine Learning Force Fields with Data Cost Aware Training

Authors: Alexander Bukharin, Tianyi Liu, Shengjie Wang, Simiao Zuo, Weihao Gao, Wen Yan, Tuo Zhao

Abstract: Machine learning force fields (MLFF) have been proposed to accelerate molecular dynamics (MD) simulation, which finds widespread applications in chemistry and biomedical research. Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels generated by expensive quantum mechanical algorithms, which may scale as $O(n^3)$ to $O(n^7)$,… ▽ More Machine learning force fields (MLFF) have been proposed to accelerate molecular dynamics (MD) simulation, which finds widespread applications in chemistry and biomedical research. Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels generated by expensive quantum mechanical algorithms, which may scale as $O(n^3)$ to $O(n^7)$, with $n$ proportional to the number of basis functions. To address this issue, we propose a multi-stage computational framework -- ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data. The motivation behind ASTEROID is that inaccurate data, though incurring large bias, can help capture the sophisticated structures of the underlying force field. Therefore, we first train a MLFF model on a large amount of inaccurate training data, employing a bias-aware loss function to prevent the model from overfitting tahe potential bias of this data. We then fine-tune the obtained model using a small amount of accurate training data, which preserves the knowledge learned from the inaccurate training data while significantly improving the model's accuracy. Moreover, we propose a variant of ASTEROID based on score matching for the setting where the inaccurate training data are unlabeled. Extensive experiments on MD datasets and downstream tasks validate the efficacy of ASTEROID. Our code and data are available at https://github.com/abukharin3/asteroid. △ Less

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2305.06405 [pdf, other]

FAST drift scan survey for HI intensity mapping: I. preliminary data analysis

Authors: Yichao Li, Yougang Wang, Furen Deng, Wenxiu Yang, Wenkai Hu, Diyang Liu, Xinyang Zhao, Shifan Zuo, Shuanghao Shu, Jixia Li, Peter Timbie, Reza Ansari, Olivier Perdereau, Albert Stebbins, Laura Wolz, Fengquan Wu, Xin Zhang, Xuelei Chen

Abstract: This work presents the initial results of the drift-scan observation for the neutral hydrogen (HI) intensity mapping survey with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). The data analyzed in this work were collected in night observations from 2019 through 2021. The primary findings are based on 28 hours of drift-scan observation carried out over seven nights in 2021, which… ▽ More This work presents the initial results of the drift-scan observation for the neutral hydrogen (HI) intensity mapping survey with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). The data analyzed in this work were collected in night observations from 2019 through 2021. The primary findings are based on 28 hours of drift-scan observation carried out over seven nights in 2021, which covers $60\,{\rm deg}^2$ sky area. Our main findings are: (i) Our calibration strategy can successfully correct both the temporal and bandpass gain variation over the $4$-hour drift-scan observation. (ii) The continuum maps of the surveyed region are made with frequency resolution of $28$ kHz and pixel area of $2.95\,{\rm arcmin}^2$. The pixel noise levels of the continuum maps are slightly higher than the forecast assuming $T_{\rm sys}=20\,{\rm K}$, which are $36.0$ mK (for 10.0 s integration time) at the $1050$--$1150$ MHz band, and $25.9$ mK (for 16.7 s integration time) at the $1323$--$1450$ MHz band, respectively. (iii) The flux-weighted differential number count is consistent with the NRAO-VLA Sky Survey (NVSS) catalog down to the confusion limit $\sim7\,{\rm mJy}/{\rm beam}^{-1}$. (iv) The continuum flux measurements of the sources are consistent with that found in the literature. The difference in the flux measurement of $81$ isolated NVSS sources is about $6.3\%$. Our research offers a systematic analysis for the FAST HI intensity mapping drift-scan survey and serves as a helpful resource for further cosmology and associated galaxies sciences with the FAST drift-scan survey. △ Less

Submitted 10 May, 2023; originally announced May 2023.

Comments: 26 pages, 26 figures, and 4 tables

arXiv:2304.13108 [pdf, other]

Detecting HI Galaxies with Deep Neural Networks in the Presence of Radio Frequency Interference

Authors: Ruxi Liang, Furen Deng, Zepei Yang, Chunming Li, Feiyu Zhao, Botao Yang, Shuanghao Shu, Wenxiu Yang, Shifan Zuo, Yichao Li, Yougang Wang, Xuelei Chen

Abstract: In neutral hydrogen (HI) galaxy survey, a significant challenge is to identify and extract the HI galaxy signal from observational data contaminated by radio frequency interference (RFI). For a drift-scan survey, or more generally a survey of a spatially continuous region, in the time-ordered spectral data, the HI galaxies and RFI all appear as regions which extend an area in the time-frequency wa… ▽ More In neutral hydrogen (HI) galaxy survey, a significant challenge is to identify and extract the HI galaxy signal from observational data contaminated by radio frequency interference (RFI). For a drift-scan survey, or more generally a survey of a spatially continuous region, in the time-ordered spectral data, the HI galaxies and RFI all appear as regions which extend an area in the time-frequency waterfall plot, so the extraction of the HI galaxies and RFI from such data can be regarded as an image segmentation problem, and machine learning methods can be applied to solve such problems. In this study, we develop a method to effectively detect and extract signals of HI galaxies based on a Mask R-CNN network combined with the PointRend method. By simulating FAST-observed galaxy signals and potential RFI impacts, we created a realistic data set for the training and testing of our neural network. We compared five different architectures and selected the best-performing one. This architecture successfully performs instance segmentation of HI galaxy signals in the RFI-contaminated time-ordered data (TOD), achieving a precision of 98.64% and a recall of 93.59%. △ Less

Submitted 25 April, 2023; originally announced April 2023.

Comments: 17 pages, 9 figures, 1 tables. Accepted for publication in RAA

arXiv:2303.07943 [pdf, other]

doi 10.1093/mnras/stad1375

SKA Science Data Challenge 2: analysis and results

Authors: P. Hartley, A. Bonaldi, R. Braun, J. N. H. S. Aditya, S. Aicardi, L. Alegre, A. Chakraborty, X. Chen, S. Choudhuri, A. O. Clarke, J. Coles, J. S. Collinson, D. Cornu, L. Darriba, M. Delli Veneri, J. Forbrich, B. Fraga, A. Galan, J. Garrido, F. Gubanov, H. Håkansson, M. J. Hardcastle, C. Heneka, D. Herranz, K. M. Hess , et al. (83 additional authors not shown)

Abstract: The Square Kilometre Array Observatory (SKAO) will explore the radio sky to new depths in order to conduct transformational science. SKAO data products made available to astronomers will be correspondingly large and complex, requiring the application of advanced analysis techniques to extract key science findings. To this end, SKAO is conducting a series of Science Data Challenges, each designed t… ▽ More The Square Kilometre Array Observatory (SKAO) will explore the radio sky to new depths in order to conduct transformational science. SKAO data products made available to astronomers will be correspondingly large and complex, requiring the application of advanced analysis techniques to extract key science findings. To this end, SKAO is conducting a series of Science Data Challenges, each designed to familiarise the scientific community with SKAO data and to drive the development of new analysis techniques. We present the results from Science Data Challenge 2 (SDC2), which invited participants to find and characterise 233245 neutral hydrogen (Hi) sources in a simulated data product representing a 2000~h SKA MID spectral line observation from redshifts 0.25 to 0.5. Through the generous support of eight international supercomputing facilities, participants were able to undertake the Challenge using dedicated computational resources. Alongside the main challenge, `reproducibility awards' were made in recognition of those pipelines which demonstrated Open Science best practice. The Challenge saw over 100 participants develop a range of new and existing techniques, with results that highlight the strengths of multidisciplinary and collaborative effort. The winning strategy -- which combined predictions from two independent machine learning techniques to yield a 20 percent improvement in overall performance -- underscores one of the main Challenge outcomes: that of method complementarity. It is likely that the combination of methods in a so-called ensemble approach will be key to exploiting very large astronomical datasets. △ Less

Submitted 14 March, 2023; originally announced March 2023.

Comments: Under review by MNRAS; 28 pages, 16 figures

arXiv:2302.12010 [pdf, ps, other]

doi 10.1088/1674-1137/acbf2a

Nonextensive effects on QCD chiral phase diagram and baryon-number fluctuations within Polyakov-Nambu-Jona-Lasinio model

Authors: Ya-Peng Zhao, Chao-Yong Wang, Shu-Yu Zuo, Cheng-Ming Li

Abstract: In this paper, a version of the Polyakov-Nambu-Jona-Lasinio (PNJL) model based on nonextensive statistical mechanics is presented. This new statistics summarizes all possible factors that violate the assumptions of the Boltzmann-Gibbs (BG) statistics to a dimensionless nonextensivity parameter $q$, and when $q$ tends to 1, it returns to the BG case. Within the nonextensive PNJL model, we found tha… ▽ More In this paper, a version of the Polyakov-Nambu-Jona-Lasinio (PNJL) model based on nonextensive statistical mechanics is presented. This new statistics summarizes all possible factors that violate the assumptions of the Boltzmann-Gibbs (BG) statistics to a dimensionless nonextensivity parameter $q$, and when $q$ tends to 1, it returns to the BG case. Within the nonextensive PNJL model, we found that as $q$ increases, the location of the critical end point (CEP) exhibits non-monotonic behavior. That is, for $q<1.15$, CEP moves in the direction of lower temperature and larger quark chemical potential. But for $q>1.15$, CEP turns to move in the direction of lower temperature and lower quark chemical potential. In addition, we studied the moments of the net-baryon number distribution, that is, the variance ($σ^{2}$), skewness (S), and kurtosis ($κ$). Our results are generally consistent with the latest experimental data, especially for $\sqrt{S_{NN}}>19.6\ \mathrm{GeV}$, when $q$ is set to $1.07$. △ Less

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2302.00377 [pdf, ps, other]

Autobidding Auctions in the Presence of User Costs

Authors: Yuan Deng, Jieming Mao, Vahab Mirrokni, Hanrui Zhang, Song Zuo

Abstract: We study autobidding ad auctions with user costs, where each bidder is value-maximizing subject to a return-over-investment (ROI) constraint, and the seller aims to maximize the social welfare taking into consideration the user's cost of viewing an ad. We show that in the worst case, the approximation ratio of social welfare by running the vanilla VCG auctions with user costs could as bad as 0. To… ▽ More We study autobidding ad auctions with user costs, where each bidder is value-maximizing subject to a return-over-investment (ROI) constraint, and the seller aims to maximize the social welfare taking into consideration the user's cost of viewing an ad. We show that in the worst case, the approximation ratio of social welfare by running the vanilla VCG auctions with user costs could as bad as 0. To improve the performance of VCG, We propose a new variant of VCG based on properly chosen cost multipliers, and prove that there exist auction-dependent and bidder-dependent cost multipliers that guarantee approximation ratios of 1/2 and 1/4 respectively in terms of the social welfare. △ Less

Submitted 1 February, 2023; originally announced February 2023.

arXiv:2301.07823 [pdf, other]

doi 10.1109/TCNS.2023.3338772

Resilient Containment Control of Heterogeneous Multi-Agent Systems Against Unbounded Sensor and Actuator Attacks

Authors: Shan Zuo, Yi Zhang, Yichao Wang

Abstract: Accurate local state measurement is important to ensure the reliable operation of distributed multi-agent systems (MAS). Existing fault-tolerant control strategies generally assume the sensor faults to be bounded and uncorrelated. In this paper, we study the ramifications of allowing the sensor attack injections to be unbounded and correlated. These malicious sensor attacks may bypass the conventi… ▽ More Accurate local state measurement is important to ensure the reliable operation of distributed multi-agent systems (MAS). Existing fault-tolerant control strategies generally assume the sensor faults to be bounded and uncorrelated. In this paper, we study the ramifications of allowing the sensor attack injections to be unbounded and correlated. These malicious sensor attacks may bypass the conventional attack-detection methods and compromise the cooperative performance and even stability of the distributed networked MAS. Moreover, the attackers may gain access to the actuation computing channels and manipulate the control input commands. To this end, we consider the resilient containment control problem of general linear heterogeneous MAS in the face of correlated and unbounded sensor attacks, as well as general unbounded actuator attacks. We propose an attack-resilient control framework to guarantee the uniform ultimate boundedness of the closed-loop dynamical systems and preserve the bounded containment performance. Compared with existing literature addressing bounded faults and/or disturbances that are unintentionally caused in the sensor and actuator channels, the proposed control protocols are resilient against unknown unbounded attack signals simultaneously injected into sensor and actuator channels, and hence are more practical in the real-world security applications. A numerical example illustrates the efficacy of the proposed result, by highlighting the resilience improvement over the conventional cooperative control method. △ Less

Submitted 18 January, 2023; originally announced January 2023.

arXiv:2212.08136 [pdf, other]

Efficient Long Sequence Modeling via State Space Augmented Transformer

Authors: Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao

Abstract: Transformer models have achieved superior performance in various natural language processing tasks. However, the quadratic computational cost of the attention mechanism limits its practicality for long sequences. There are existing attention variants that improve the computational efficiency, but they have limited ability to effectively compute global information. In parallel to Transformer models… ▽ More Transformer models have achieved superior performance in various natural language processing tasks. However, the quadratic computational cost of the attention mechanism limits its practicality for long sequences. There are existing attention variants that improve the computational efficiency, but they have limited ability to effectively compute global information. In parallel to Transformer models, state space models (SSMs) are tailored for long sequences, but they are not flexible enough to capture complicated local information. We propose SPADE, short for $\underline{\textbf{S}}$tate s$\underline{\textbf{P}}$ace $\underline{\textbf{A}}$ugmente$\underline{\textbf{D}}$ Transform$\underline{\textbf{E}}$r. Specifically, we augment a SSM into the bottom layer of SPADE, and we employ efficient local attention methods for the other layers. The SSM augments global information, which complements the lack of long-range dependency issue in local attention methods. Experimental results on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method. To further demonstrate the scalability of SPADE, we pre-train large encoder-decoder models and present fine-tuning results on natural language understanding and natural language generation tasks. △ Less

Submitted 15 December, 2022; originally announced December 2022.

arXiv:2212.05577 [pdf, other]

Mediation analysis with the mediator and outcome missing not at random

Authors: Shuozhi Zuo, Debashis Ghosh, Peng Ding, Fan Yang

Abstract: Mediation analysis is widely used for investigating direct and indirect causal pathways through which an effect arises. However, many mediation analysis studies are challenged by missingness in the mediator and outcome. In general, when the mediator and outcome are missing not at random, the direct and indirect effects are not identifiable without further assumptions. In this work, we study the id… ▽ More Mediation analysis is widely used for investigating direct and indirect causal pathways through which an effect arises. However, many mediation analysis studies are challenged by missingness in the mediator and outcome. In general, when the mediator and outcome are missing not at random, the direct and indirect effects are not identifiable without further assumptions. In this work, we study the identifiability of the direct and indirect effects under some interpretable mechanisms that allow for missing not at random in the mediator and outcome. We evaluate the performance of statistical inference under those mechanisms through simulation studies and illustrate the proposed methods via the National Job Corps Study. △ Less

Submitted 22 September, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

arXiv:2210.01351 [pdf, other]

Less is More: Task-aware Layer-wise Distillation for Language Model Compression

Authors: Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao

Abstract: Layer-wise distillation is a powerful tool to compress large models (i.e. teacher models) into small ones (i.e., student models). The student distills knowledge from the teacher by mimicking the hidden representations of the teacher at every intermediate layer. However, layer-wise distillation is difficult. Since the student has a smaller model capacity than the teacher, it is often under-fitted.… ▽ More Layer-wise distillation is a powerful tool to compress large models (i.e. teacher models) into small ones (i.e., student models). The student distills knowledge from the teacher by mimicking the hidden representations of the teacher at every intermediate layer. However, layer-wise distillation is difficult. Since the student has a smaller model capacity than the teacher, it is often under-fitted. Furthermore, the hidden representations of the teacher contain redundant information that the student does not necessarily need for the target task's learning. To address these challenges, we propose a novel Task-aware layEr-wise Distillation (TED). TED designs task-aware filters to align the hidden representations of the student and the teacher at each layer. The filters select the knowledge that is useful for the target task from the hidden representations. As such, TED reduces the knowledge gap between the two models and helps the student to fit better on the target task. We evaluate TED in two scenarios: continual pre-training and fine-tuning. TED demonstrates significant and consistent improvements over existing distillation methods in both scenarios. Code is available at https://github.com/cliang1453/task-aware-distillation. △ Less

Submitted 5 June, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: Proceedings of ICML 2023

arXiv:2209.07584 [pdf, other]

Context-Aware Query Rewriting for Improving Users' Search Experience on E-commerce Websites

Authors: Simiao Zuo, Qingyu Yin, Haoming Jiang, Shaohui Xi, Bing Yin, Chao Zhang, Tuo Zhao

Abstract: E-commerce queries are often short and ambiguous. Consequently, query understanding often uses query rewriting to disambiguate user-input queries. While using e-commerce search tools, users tend to enter multiple searches, which we call context, before purchasing. These history searches contain contextual insights about users' true shopping intents. Therefore, modeling such contextual information… ▽ More E-commerce queries are often short and ambiguous. Consequently, query understanding often uses query rewriting to disambiguate user-input queries. While using e-commerce search tools, users tend to enter multiple searches, which we call context, before purchasing. These history searches contain contextual insights about users' true shopping intents. Therefore, modeling such contextual information is critical to a better query rewriting model. However, existing query rewriting models ignore users' history behaviors and consider only the instant search query, which is often a short string offering limited information about the true shopping intent. We propose an end-to-end context-aware query rewriting model to bridge this gap, which takes the search context into account. Specifically, our model builds a session graph using the history search queries and their contained words. We then employ a graph attention mechanism that models cross-query relations and computes contextual information of the session. The model subsequently calculates session representations by combining the contextual information with the instant search query using an aggregation network. The session representations are then decoded to generate rewritten queries. Empirically, we demonstrate the superiority of our method to state-of-the-art approaches under various metrics. On in-house data from an online shopping platform, by introducing contextual information, our model achieves 11.6% improvement under the MRR (Mean Reciprocal Rank) metric and 20.1% improvement under the HIT@16 metric (a hit rate metric), in comparison with the best baseline method (Transformer-based model). △ Less

Submitted 24 September, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

arXiv:2209.07499 [pdf, other]

DiP-GNN: Discriminative Pre-Training of Graph Neural Networks

Authors: Simiao Zuo, Haoming Jiang, Qingyu Yin, Xianfeng Tang, Bing Yin, Tuo Zhao

Abstract: Graph neural network (GNN) pre-training methods have been proposed to enhance the power of GNNs. Specifically, a GNN is first pre-trained on a large-scale unlabeled graph and then fine-tuned on a separate small labeled graph for downstream applications, such as node classification. One popular pre-training method is to mask out a proportion of the edges, and a GNN is trained to recover them. Howev… ▽ More Graph neural network (GNN) pre-training methods have been proposed to enhance the power of GNNs. Specifically, a GNN is first pre-trained on a large-scale unlabeled graph and then fine-tuned on a separate small labeled graph for downstream applications, such as node classification. One popular pre-training method is to mask out a proportion of the edges, and a GNN is trained to recover them. However, such a generative method suffers from graph mismatch. That is, the masked graph inputted to the GNN deviates from the original graph. To alleviate this issue, we propose DiP-GNN (Discriminative Pre-training of Graph Neural Networks). Specifically, we train a generator to recover identities of the masked edges, and simultaneously, we train a discriminator to distinguish the generated edges from the original graph's edges. In our framework, the graph seen by the discriminator better matches the original graph because the generator can recover a proportion of the masked edges. Extensive experiments on large-scale homogeneous and heterogeneous graphs demonstrate the effectiveness of the proposed framework. △ Less

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2209.07303 [pdf, other]

Differentially Private Estimation of Hawkes Process

Authors: Simiao Zuo, Tianyi Liu, Tuo Zhao, Hongyuan Zha

Abstract: Point process models are of great importance in real world applications. In certain critical applications, estimation of point process models involves large amounts of sensitive personal data from users. Privacy concerns naturally arise which have not been addressed in the existing literature. To bridge this glaring gap, we propose the first general differentially private estimation procedure for… ▽ More Point process models are of great importance in real world applications. In certain critical applications, estimation of point process models involves large amounts of sensitive personal data from users. Privacy concerns naturally arise which have not been addressed in the existing literature. To bridge this glaring gap, we propose the first general differentially private estimation procedure for point process models. Specifically, we take the Hawkes process as an example, and introduce a rigorous definition of differential privacy for event stream data based on a discretized representation of the Hawkes process. We then propose two differentially private optimization algorithms, which can efficiently estimate Hawkes process models with the desired privacy and utility guarantees under two different settings. Experiments are provided to back up our theoretical analysis. △ Less

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2208.14675 [pdf, other]

doi 10.3847/1538-4357/acb822

A Semi-blind PCA-based Foreground Subtraction Method for 21 cm Intensity Mapping

Authors: Shifan Zuo, Xuelei Chen, Yi Mao

Abstract: The Principal Component Analysis (PCA) method and the Singular Value Decomposition (SVD) method are widely used for foreground subtraction in 21 cm intensity mapping experiments. We show their equivalence, and point out that the condition for completely clean separation of foregrounds and cosmic 21 cm signal using the PCA/SVD is unrealistic. We propose a PCA-based foreground subtraction method, du… ▽ More The Principal Component Analysis (PCA) method and the Singular Value Decomposition (SVD) method are widely used for foreground subtraction in 21 cm intensity mapping experiments. We show their equivalence, and point out that the condition for completely clean separation of foregrounds and cosmic 21 cm signal using the PCA/SVD is unrealistic. We propose a PCA-based foreground subtraction method, dubbed "Singular Vector Projection (SVP)" method, which exploits a priori information of the left and/or right singular vectors of the foregrounds. We demonstrate with simulation tests that this new, semi-blind method can reduce the error of the recovered 21 cm signal by orders of magnitude, even if only the left and/or right singular vectors in the largest few modes are exploited. The SVP estimators provide a new, effective approach for 21 cm observations to remove foregrounds and uncover the physics in the cosmic 21 cm signal. △ Less

Submitted 1 February, 2023; v1 submitted 31 August, 2022; originally announced August 2022.

Comments: Comments welcome

Journal ref: ApJ, 2023, vol. 945, id. 38

Showing 1–50 of 103 results for author: Zuo, S