
Showing 1–50 of 453 results for author: Cheng, G

  1. arXiv:2407.05671 [pdf, other]

    cs.CV cs.AI

    MSTF: Multiscale Transformer for Incomplete Trajectory Prediction

    Authors: Zhanwen Liu, Chao Li, Nan Yang, Yang Wang, Jiaqi Ma, Guangliang Cheng, Xiangmo Zhao

    Abstract: Motion forecasting plays a pivotal role in autonomous driving systems, enabling vehicles to execute collision warnings and rational local-path planning based on predictions of the surrounding vehicles. However, prevalent methods often assume complete observed trajectories, neglecting the potential impact of missing values induced by object occlusion, scope limitation, and sensor failures. Such ove…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2407.00924 [pdf, other]

    cs.CL

    EXCGEC: A Benchmark of Edit-wise Explainable Chinese Grammatical Error Correction

    Authors: Jingheng Ye, Shang Qin, Yinghui Li, Xuxin Cheng, Libo Qin, Hai-Tao Zheng, Peng Xing, Zishan Xu, Guo Cheng, Zhao Wei

    Abstract: Existing studies explore the explainability of Grammatical Error Correction (GEC) in a limited scenario, where they ignore the interaction between corrections and explanations. To bridge the gap, this paper introduces the task of EXplainable GEC (EXGEC), which focuses on the integral role of both correction and explanation tasks. To facilitate the task, we propose EXCGEC, a tailored benchmark for…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 22 pages, 10 tables, 9 figures. Under review

  3. arXiv:2406.16946 [pdf, ps, other]

    eess.SP

    Networked ISAC for Low-Altitude Economy: Coordinated Transmit Beamforming and UAV Trajectory Design

    Authors: Gaoyuan Cheng, Xianxin Song, Zhonghao Lyu, Jie Xu

    Abstract: This paper exploits the networked integrated sensing and communications (ISAC) to support low-altitude economy (LAE), in which a set of networked ground base stations (GBSs) cooperatively transmit joint information and sensing signals to communicate with multiple authorized unmanned aerial vehicles (UAVs) and concurrently detect unauthorized objects over the interested region in the three-dimensio…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.07568

  4. arXiv:2406.16028 [pdf, other]

    cs.LG cs.AI

    TimeAutoDiff: Combining Autoencoder and Diffusion model for time series tabular data synthesizing

    Authors: Namjoon Suh, Yuning Yang, Din-Yin Hsieh, Qitong Luan, Shirong Xu, Shixiang Zhu, Guang Cheng

    Abstract: In this paper, we leverage the power of latent diffusion models to generate synthetic time series tabular data. Along with the temporal and feature correlations, the heterogeneous nature of the features in the table has been one of the main obstacles in time series tabular data modeling. We tackle this problem by combining the ideas of the variational auto-encoder (VAE) and the denoising diffusion…

    Submitted 23 June, 2024; originally announced June 2024.

  5. arXiv:2406.13130 [pdf, other]

    cs.LG stat.ML

    Advancing Retail Data Science: Comprehensive Evaluation of Synthetic Data

    Authors: Yu Xia, Chi-Hua Wang, Joshua Mabry, Guang Cheng

    Abstract: The evaluation of synthetic data generation is crucial, especially in the retail sector where data accuracy is paramount. This paper introduces a comprehensive framework for assessing synthetic retail data, focusing on fidelity, utility, and privacy. Our approach differentiates between continuous and discrete data attributes, providing precise evaluation criteria. Fidelity is measured through stab…

    Submitted 18 June, 2024; originally announced June 2024.

  6. arXiv:2406.13012 [pdf, other]

    cs.LG cs.CR stat.ML

    Data Plagiarism Index: Characterizing the Privacy Risk of Data-Copying in Tabular Generative Models

    Authors: Joshua Ward, Chi-Hua Wang, Guang Cheng

    Abstract: The promise of tabular generative models is to produce realistic synthetic data that can be shared and safely used without dangerous leakage of information from the training set. In evaluating these models, a variety of methods have been proposed to measure the tendency to copy data from the training dataset when generating a sample. However, these methods suffer from either not considering data-c…

    Submitted 18 June, 2024; originally announced June 2024.

  7. arXiv:2406.09833 [pdf, other]

    cs.AI cs.MM cs.SD eess.AS

    SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering

    Authors: Zhe Yang, Wenrui Li, Guanghui Cheng

    Abstract: The Audio-Visual Question Answering (AVQA) task holds significant potential for applications. Compared to traditional unimodal approaches, the multi-modal input of AVQA makes feature extraction and fusion processes more challenging. It is difficult for Euclidean space to effectively represent multi-dimensional relationships in data, especially when extracting and processing data with a tree structure or…

    Submitted 20 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  8. arXiv:2406.04619 [pdf, other]

    cs.LG stat.ML

    CTSyn: A Foundational Model for Cross Tabular Data Generation

    Authors: Xiaofeng Lin, Chenheng Xu, Matthew Yang, Guang Cheng

    Abstract: Generative Foundation Models (GFMs) have produced synthetic data with remarkable quality in modalities such as images and text. However, applying GFMs to tabular data poses significant challenges due to the inherent heterogeneity of table features. Existing cross-table learning frameworks are hindered by the absence of both a generative model backbone and a decoding mechanism for heterogeneous fea…

    Submitted 7 June, 2024; originally announced June 2024.

  9. arXiv:2406.04374 [pdf, other]

    cs.IR cs.GT cs.LG stat.ML

    Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility

    Authors: Yuantong Li, Guang Cheng, Xiaowu Dai

    Abstract: Recommender systems play a crucial role in internet economies by connecting users with relevant products or services. However, designing effective recommender systems faces two key challenges: (1) the exploration-exploitation tradeoff in balancing new product exploration against exploiting known preferences, and (2) dynamic incentive compatibility in accounting for users' self-interested behaviors…

    Submitted 4 June, 2024; originally announced June 2024.

  10. arXiv:2406.01112 [pdf, other]

    cs.CV

    BACON: Bayesian Optimal Condensation Framework for Dataset Distillation

    Authors: Zheng Zhou, Hongbo Zhao, Guangliang Cheng, Xiangtai Li, Shuchang Lyu, Wenquan Feng, Qi Zhao

    Abstract: Dataset Distillation (DD) aims to distill knowledge from extensive datasets into more compact ones while preserving performance on the test set, thereby reducing storage costs and training expenses. However, existing methods often suffer from computational intensity, particularly exhibiting suboptimal performance with large dataset sizes due to the lack of a robust theoretical framework for analyz…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 22 pages, 10 figures

  11. arXiv:2405.19005 [pdf, other]

    cs.CV

    Auto-selected Knowledge Adapters for Lifelong Person Re-identification

    Authors: Xuelin Qian, Ruiqi Wu, Gong Cheng, Junwei Han

    Abstract: Lifelong Person Re-Identification (LReID) extends traditional ReID by requiring systems to continually learn from non-overlapping datasets across different times and locations, adapting to new identities while preserving knowledge of previous ones. Existing approaches, either rehearsal-free or rehearsal-based, still suffer from the problem of catastrophic forgetting since they try to cram diverse…

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  12. arXiv:2405.17386 [pdf, other]

    cs.CL cs.AI

    MindMerger: Efficient Boosting LLM Reasoning in non-English Languages

    Authors: Zixian Huang, Wenhao Zhu, Gong Cheng, Lei Li, Fei Yuan

    Abstract: Reasoning capabilities are crucial for Large Language Models (LLMs), yet a notable gap exists between English and non-English languages. To bridge this disparity, some works fine-tune LLMs to relearn reasoning capabilities in non-English languages, while others replace non-English inputs with an external model's outputs such as English translation text to circumvent the challenge of LLM understand…

    Submitted 27 May, 2024; originally announced May 2024.

  13. arXiv:2405.16876 [pdf, other]

    cs.LG cs.AI

    Transfer Learning for Diffusion Models

    Authors: Yidong Ouyang, Liyan Xie, Hongyuan Zha, Guang Cheng

    Abstract: Diffusion models, a specific type of generative model, have achieved unprecedented performance in recent years and consistently produce high-quality synthetic samples. A critical prerequisite for their notable success lies in the presence of a substantial number of training samples, which can be impractical in real-world applications due to high collection costs or associated risks. Consequently,…

    Submitted 27 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 24 pages

  14. arXiv:2405.16730 [pdf, other]

    cs.LG cs.AI stat.AP

    Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

    Authors: Peiyu Yu, Dinghuai Zhang, Hengzhi He, Xiaojian Ma, Ruiyao Miao, Yifan Lu, Yasi Zhang, Deqian Kong, Ruiqi Gao, Jianwen Xie, Guang Cheng, Ying Nian Wu

    Abstract: Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of the black-box function poses inherent challenges for most existing methods that model and operate directly upon input designs. These issues inclu…

    Submitted 26 May, 2024; originally announced May 2024.

  15. arXiv:2405.15979 [pdf, other]

    cs.LG cs.CR stat.ML

    BadGD: A unified data-centric framework to identify gradient descent vulnerabilities

    Authors: Chi-Hua Wang, Guang Cheng

    Abstract: We present BadGD, a unified theoretical framework that exposes the vulnerabilities of gradient descent algorithms through strategic backdoor attacks. Backdoor attacks involve embedding malicious triggers into a training dataset to disrupt the model's learning process. Our framework introduces three novel constructs: Max RiskWarp Trigger, Max GradWarp Trigger, and Max GradDistWarp Trigger, each des…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 25 pages, 1 figure

  16. arXiv:2405.15337 [pdf, other]

    stat.ML cs.LG

    Discriminative Estimation of Total Variation Distance: A Fidelity Auditor for Generative Data

    Authors: Lan Tao, Shirong Xu, Chi-Hua Wang, Namjoon Suh, Guang Cheng

    Abstract: With the proliferation of generative AI and the increasing volume of generative data (also called synthetic data), assessing the fidelity of generative data has become a critical concern. In this paper, we propose a discriminative approach to estimate the total variation (TV) distance between two distributions as an effective measure of generative data fidelity. Our method quantitatively charac…

    Submitted 24 May, 2024; originally announced May 2024.

  17. arXiv:2405.14628 [pdf, other]

    stat.ME stat.CO

    Online robust estimation and bootstrap inference for function-on-scalar regression

    Authors: Guanghui Cheng, Wenjuan Hu, Ruitao Lin, Chen Wang

    Abstract: We propose a novel and robust online function-on-scalar regression technique via geometric median to learn associations between functional responses and scalar covariates based on massive or streaming datasets. The online estimation procedure, developed using the average stochastic gradient descent algorithm, offers an efficient and cost-effective method for analyzing sequentially augmented datase…

    Submitted 23 May, 2024; originally announced May 2024.

  18. arXiv:2405.14018 [pdf, other]

    cs.CR cs.LG stat.AP

    Watermarking Generative Tabular Data

    Authors: Hengzhi He, Peiyu Yu, Junpeng Ren, Ying Nian Wu, Guang Cheng

    Abstract: In this paper, we introduce a simple yet effective tabular data watermarking mechanism with statistical guarantees. We show theoretically that the proposed watermark can be effectively detected, while faithfully preserving the data fidelity, and also demonstrates appealing robustness against additive noise attack. The general idea is to achieve the watermarking through a strategic embedding based…

    Submitted 22 May, 2024; originally announced May 2024.

  19. arXiv:2405.07568 [pdf, ps, other]

    eess.SP

    Networked ISAC for Low-Altitude Economy: Transmit Beamforming and UAV Trajectory Design

    Authors: Gaoyuan Cheng, Xianxin Song, Zhonghao Lyu, Jie Xu

    Abstract: This paper studies the exploitation of networked integrated sensing and communications (ISAC) to support low-altitude economy (LAE), in which a set of networked ground base stations (GBSs) transmit wireless signals to cooperatively communicate with multiple authorized unmanned aerial vehicles (UAVs) and concurrently use the echo signals to detect the invasion of unauthorized objects in interested…

    Submitted 13 May, 2024; originally announced May 2024.

  20. arXiv:2405.03060 [pdf, other]

    cs.LG

    Tree-based Ensemble Learning for Out-of-distribution Detection

    Authors: Zhaiming Shen, Menglun Wang, Guang Cheng, Ming-Jun Lai, Lin Mu, Ruihao Huang, Qi Liu, Hao Zhu

    Abstract: Being able to successfully determine whether the test samples have a similar distribution to the training samples is a fundamental question to address before we can safely deploy most machine learning models in practice. In this paper, we propose TOOD detection, a simple yet effective tree-based out-of-distribution (TOOD) detection mechanism to determine if a set of unseen samples will ha…

    Submitted 5 May, 2024; originally announced May 2024.

  21. arXiv:2405.00393 [pdf, other]

    cs.CR

    Inferring State Machine from the Protocol Implementation via Large Language Model

    Authors: Haiyang Wei, Zhengjie Du, Haohui Huang, Yue Liu, Guang Cheng, Linzhang Wang, Bing Mao

    Abstract: State machines play a pivotal role in augmenting the efficacy of protocol analysis to unveil more vulnerabilities. However, the task of inferring state machines from network protocol implementations presents significant challenges. Traditional methods based on dynamic analysis often overlook crucial state transitions due to limited coverage, while static analysis faces difficulties with complex c…

    Submitted 14 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  22. arXiv:2404.18983 [pdf, other]

    astro-ph.SR

    Evidence for Plasmoid-mediated Magnetic Reconnection during a Small-scale Flare in the Partially Ionized Low Solar Atmosphere

    Authors: Guanchong Cheng, Lei Ni, Zehao Tang, Yajie Chen, Yuhao Chen, Jialiang Hu, Jun Lin

    Abstract: Magnetic reconnection plays a crucial role in the energy release process for different kinds of solar eruptions and activities. The rapid solar eruption requires a fast reconnection model. Plasmoid instability in the reconnecting current sheets is one of the most widely accepted fast reconnection mechanisms for explaining the explosive events in the magnetohydrodynamics (MHD) scale, which is also a pot…

    Submitted 29 April, 2024; originally announced April 2024.

  23. arXiv:2404.18429 [pdf]

    physics.ao-ph

    The Jive Verification System and its Transformative Impact on Weather Forecasting Operations

    Authors: Nicholas Loveday, Deryn Griffiths, Tennessee Leeuwenburg, Robert Taggart, Thomas C. Pagano, George Cheng, Kevin Plastow, Elizabeth Ebert, Cassandra Templeton, Maree Carroll, Mohammadreza Khanarmuei, Isha Nagpal

    Abstract: Forecast verification is critical for continuous improvement in meteorological organizations. The Jive verification system was originally developed to assess the accuracy of public weather forecasts issued by the Australian Bureau of Meteorology. It started as a research project in 2015 and gradually evolved to be a Bureau operational verification system in 2022. The system includes daily verifica…

    Submitted 9 July, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  24. arXiv:2404.13396 [pdf]

    cond-mat.mtrl-sci

    Angle-Resolved Magneto-Chiral Anisotropy in a Non-Centrosymmetric Atomic Layer Superlattice

    Authors: Long Cheng, Mingrui Bao, Jingxian Zhang, Xue Zhang, Qun Yang, Qiang Li, Hui Cao, Dawei Qiu, Jia Liu, Fei Ye, Qing Wang, Genhao Liang, Hui Li, Guanglei Cheng, Hua Zhou, Jian-Min Zuo, Xiaodong Zhou, Jian Shen, Zhifeng Zhu, Sai Mu, Wenbo Wang, Xiaofang Zhai

    Abstract: Chirality in solid-state materials has sparked significant interest due to potential applications of topologically-protected chiral states in next-generation information technology. The electrical magneto-chiral effect (eMChE), arising from relativistic spin-orbit interactions, shows great promise for developing chiral materials and devices for electronic integration. Here we demonstrate an angle-…

    Submitted 20 April, 2024; originally announced April 2024.

  25. arXiv:2404.11384 [pdf, other]

    cs.CL cs.LG

    Exploring Key Point Analysis with Pairwise Generation and Graph Partitioning

    Authors: Xiao Li, Yong Jiang, Shen Huang, Pengjun Xie, Gong Cheng, Fei Huang

    Abstract: Key Point Analysis (KPA), the summarization of multiple arguments into a concise collection of key points, continues to be a significant and unresolved issue within the field of argument mining. Existing models adopt a two-stage pipeline of clustering arguments or generating key points for argument clusters. This approach relies on semantic similarity instead of measuring the existence of shared key…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 11 pages, 4 figures, 4 tables. Accepted to NAACL 2024

  26. arXiv:2404.11027 [pdf, other]

    cs.AI

    Empowering Large Language Models on Robotic Manipulation with Affordance Prompting

    Authors: Guangran Cheng, Chuheng Zhang, Wenzhe Cai, Li Zhao, Changyin Sun, Jiang Bian

    Abstract: While large language models (LLMs) are successful in completing various language processing tasks, they easily fail to interact with the physical world by generating control sequences properly. We find that the main reason is that LLMs are not grounded in the physical world. Existing LLM-based approaches circumvent this problem by relying on additional pre-defined skills or pre-trained sub-policie…

    Submitted 16 April, 2024; originally announced April 2024.

  27. arXiv:2404.09158 [pdf, other]

    cs.CV cs.AI

    StreakNet-Arch: An Anti-scattering Network-based Architecture for Underwater Carrier LiDAR-Radar Imaging

    Authors: Xuelong Li, Hongjun An, Guangying Li, Xing Wang, Guanghua Cheng, Zhe Sun

    Abstract: In this paper, we introduce StreakNet-Arch, a novel signal processing architecture designed for Underwater Carrier LiDAR-Radar (UCLR) imaging systems, to address the limitations in scatter suppression and real-time imaging. StreakNet-Arch formulates the signal processing as a real-time, end-to-end binary classification task, enabling real-time image acquisition. To achieve this, we leverage Self-A…

    Submitted 23 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: Reduce the number of pages to 13

  28. arXiv:2403.18826 [pdf]

    q-bio.QM eess.IV eess.SY

    SAM-dPCR: Real-Time and High-throughput Absolute Quantification of Biological Samples Using Zero-Shot Segment Anything Model

    Authors: Yuanyuan Wei, Shanhang Luo, Changran Xu, Yingqi Fu, Qingyue Dong, Yi Zhang, Fuyang Qu, Guangyao Cheng, Yi-Ping Ho, Ho-Pui Ho, Wu Yuan

    Abstract: Digital PCR (dPCR) has revolutionized nucleic acid diagnostics by enabling absolute quantification of rare mutations and target sequences. However, current detection methodologies face challenges, as flow cytometers are costly and complex, while fluorescence imaging methods, relying on software or manual counting, are time-consuming and prone to errors. To address these limitations, we present SAM…

    Submitted 22 January, 2024; originally announced March 2024.

    Comments: 23 pages, 6 figures

  29. arXiv:2403.18216 [pdf, other]

    stat.ML cs.CY cs.LG math.ST

    Minimax Optimal Fair Classification with Bounded Demographic Disparity

    Authors: Xianli Zeng, Guang Cheng, Edgar Dobriban

    Abstract: Mitigating the disparate impact of statistical machine learning methods is crucial for ensuring fairness. While extensive research aims to reduce disparity, the effect of using a finite dataset -- as opposed to the entire population -- remains unclear. This paper explores the statistical foundations of fair binary classification with two protected groups, focusing on controlling demographic…

    Submitted 26 March, 2024; originally announced March 2024.

  30. arXiv:2403.16331 [pdf, other]

    cs.SD cs.LG eess.AS

    Modeling Analog Dynamic Range Compressors using Deep Learning and State-space Models

    Authors: Hanzhi Yin, Gang Cheng, Christian J. Steinmetz, Ruibin Yuan, Richard M. Stern, Roger B. Dannenberg

    Abstract: We describe a novel approach for developing realistic digital models of dynamic range compressors for digital audio production by analyzing their analog prototypes. While realistic digital dynamic compressors are potentially useful for many applications, the design process is challenging because the compressors operate nonlinearly over long time scales. Our approach is based on the structured stat…

    Submitted 24 March, 2024; originally announced March 2024.

  31. arXiv:2403.12187 [pdf, ps, other]

    stat.ML cs.LG math.ST

    Approximation of RKHS Functionals by Neural Networks

    Authors: Tian-Yi Zhou, Namjoon Suh, Guang Cheng, Xiaoming Huo

    Abstract: Motivated by the abundance of functional data such as time series and images, there has been a growing interest in integrating such data into neural networks and learning maps from function spaces to R (i.e., functionals). In this paper, we study the approximation of functionals on reproducing kernel Hilbert spaces (RKHS's) using neural networks. We establish the universality of the approximation…

    Submitted 18 March, 2024; originally announced March 2024.

  32. arXiv:2403.07780 [pdf, other]

    stat.ML cs.LG

    FairRR: Pre-Processing for Group Fairness through Randomized Response

    Authors: Xianli Zeng, Joshua Ward, Guang Cheng

    Abstract: The increasing usage of machine learning models in consequential decision-making processes has spurred research into the fairness of these systems. While significant work has been done to study group fairness in the in-processing and post-processing setting, there has been little that theoretically connects these results to the pre-processing domain. This paper proposes that achieving group fairne…

    Submitted 12 March, 2024; originally announced March 2024.

  33. arXiv:2403.07056 [pdf, other]

    hep-th gr-qc quant-ph

    Gravitational back-reaction is magical

    Authors: ChunJun Cao, Gong Cheng, Alioscia Hamma, Lorenzo Leone, William Munizzi, Savatore F. E. Oliviero

    Abstract: We study the interplay between magic and entanglement in quantum many-body systems. We show that non-local magic, which is supported by the quantum correlations, is lower bounded by the non-flatness of the entanglement spectrum and upper bounded by the amount of entanglement in the system. We then argue that a smoothed version of non-local magic bounds the hardness of classical simulations for incompre…

    Submitted 16 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 62 pages, 20 figures; title changed, Theorem 1 and 2 refined, references added

  34. arXiv:2403.06642 [pdf, other]

    cs.IR cs.AI cs.CL

    TRAWL: External Knowledge-Enhanced Recommendation with LLM Assistance

    Authors: Weiqing Luo, Chonggang Song, Lingling Yi, Gong Cheng

    Abstract: Combining semantic information with behavioral data is a crucial research area in recommender systems. A promising approach involves leveraging external knowledge to enrich behavioral-based recommender systems with abundant semantic information. However, this approach faces two primary challenges: denoising raw external knowledge and adapting semantic representations. To address these challenges,…

    Submitted 24 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures

  35. arXiv:2403.03099 [pdf, other]

    stat.ME stat.CO

    Data Nuggets: A Method for Reducing Big Data While Preserving Data Structure

    Authors: Traymon E. Beavers, Ge Cheng, Yajie Duan, Javier Cabrera, Mariusz Lubomirski, Dhammika Amaratunga, Jeffrey E. Teigler

    Abstract: Big data, with N×P dimensions where N is extremely large, has created new challenges for data analysis, particularly in the realm of creating meaningful clusters of data. Clustering techniques, such as K-means or hierarchical clustering, are popular methods for performing exploratory analysis on large datasets. Unfortunately, these methods are not always possible to apply to big data due to memory o…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 31 pages and 8 figures

    MSC Class: 62:0

  36. arXiv:2402.16792 [pdf, other]

    stat.ML cs.CR cs.LG

    Rate-Optimal Rank Aggregation with Private Pairwise Rankings

    Authors: Shirong Xu, Will Wei Sun, Guang Cheng

    Abstract: In various real-world scenarios like recommender systems and political surveys, pairwise rankings are commonly collected and utilized for rank aggregation to obtain an overall ranking of items. However, preference rankings can reveal individuals' personal preferences, underscoring the need to protect them before releasing for downstream analysis. In this paper, we address the challenge of preservi…

    Submitted 26 February, 2024; originally announced February 2024.

  37. arXiv:2402.12692 [pdf, other]

    cs.CL

    FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning

    Authors: Xiao Li, Bolin Zhu, Sichen Liu, Yin Zhu, Yiwei Liu, Gong Cheng

    Abstract: The application of formulas is a fundamental ability of humans when addressing numerical reasoning problems. However, existing numerical reasoning datasets seldom explicitly indicate the formulas employed during the reasoning steps. To bridge this gap, we construct a dataset for formula-based numerical reasoning called FormulaReasoning, which consists of 5,420 reasoning-based questions. We employ…

    Submitted 12 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  38. arXiv:2402.12001 [pdf, other]

    cs.AI cs.DB cs.IR cs.SI

    A Survey on Extractive Knowledge Graph Summarization: Applications, Approaches, Evaluation, and Future Directions

    Authors: Xiaxia Wang, Gong Cheng

    Abstract: With the continuous growth of large Knowledge Graphs (KGs), extractive KG summarization becomes a trending task. Aiming at distilling a compact subgraph with condensed information, it facilitates various downstream KG-based tasks. In this survey paper, we are among the first to provide a systematic overview of its applications and define a taxonomy for existing methods from its interdisciplinary s…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 9 pages, 13 figures, submitted to the IJCAI 2024 Survey Track

  39. arXiv:2402.11016 [pdf, other]

    hep-th astro-ph.CO gr-qc

    Holographic phenomenology via overlapping degrees of freedom

    Authors: Oliver Friedrich, ChunJun Cao, Sean M. Carroll, Gong Cheng, Ashmeet Singh

    Abstract: The holographic principle suggests that regions of space contain fewer physical degrees of freedom than would be implied by conventional quantum field theory. Meanwhile, in Hilbert spaces of large dimension $2^n$, it is possible to define $N \gg n$ Pauli algebras that are nearly anti-commuting (but not quite) and which can be thought of as "overlapping degrees of freedom". We propose to model the…

    Submitted 5 March, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 46 pages + appendix; code and data available at https://github.com/OliverFHD/GPUniverse

  40. arXiv:2402.07175 [pdf, other]

    astro-ph.SR

    A magnetic reconnection model for the hot explosion with both ultraviolet and Hα wing emissions

    Authors: Guanchong Cheng, Lei Ni, Yajie Chen, Jun Lin

    Abstract: Ellerman bombs (EBs) with significant Hα wing emissions and ultraviolet bursts (UV bursts) with strong Si IV emissions are two kinds of small transient brightening events that occur in the low solar atmosphere. We numerically investigated the magnetic reconnection process between the emerging arch magnetic field and the lower atmospheric background magnetic field. We aim to find out if the hot UV…

    Submitted 20 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  41. arXiv:2402.03760 [pdf, other]

    cs.NI

    DeMarking: A Defense for Network Flow Watermarking in Real-Time

    Authors: Yali Yuan, Jian Ge, Guang Cheng

    Abstract: The network flow watermarking technique associates the two communicating parties by actively modifying certain characteristics of the stream generated by the sender so that it covertly carries some special marking information. Some curious users communicating with the hidden server as a Tor client may attempt de-anonymization attacks to uncover the real identity of the hidden server by using this…

    Submitted 6 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  42. arXiv:2402.02817 [pdf, other]

    stat.ML cs.CY cs.LG

    Bayes-Optimal Fair Classification with Linear Disparity Constraints via Pre-, In-, and Post-processing

    Authors: Xianli Zeng, Guang Cheng, Edgar Dobriban

    Abstract: Machine learning algorithms may have disparate impacts on protected groups. To address this, we develop methods for Bayes-optimal fair classification, aiming to minimize classification error subject to given group fairness constraints. We introduce the notion of linear disparity measures, which are linear functions of a probabilistic classifier; and bilinear disparity measures, which…

    Submitted 6 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: This paper replaces the preprint "Bayes-optimal classifiers under group fairness" by Xianli Zeng, Edgar Dobriban, and Guang Cheng (arXiv:2202.09724)

  43. arXiv:2402.00743 [pdf, other]

    cs.LG cs.CL stat.ML

    Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data

    Authors: Yue Xing, Xiaofeng Lin, Chenheng Xu, Namjoon Suh, Qifan Song, Guang Cheng

    Abstract: Large language models (LLMs) are powerful models that can learn concepts at the inference stage via in-context learning (ICL). While theoretical studies, e.g., \cite{zhang2023trained}, attempt to explain the mechanism of ICL, they assume the input $x_i$ and the output $y_i$ of each demonstration example are in the same token (i.e., structured data). However, in real practice, the examples are usua…

    Submitted 18 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  44. arXiv:2401.15248  [pdf, other]

    cs.LG stat.ML

    Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective

    Authors: Yue Xing, Xiaofeng Lin, Qifan Song, Yi Xu, Belinda Zeng, Guang Cheng

    Abstract: Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning such as large language models. Existing literature, e.g., \cite{kim2020adversarial}, empirically observe that the downstream tasks can inherit the adversarial robustness of the pre-trained model. We provide theoretical justifications for this robustness inheritance phenomenon. Our theoreti…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: To appear in AISTATS2024

  45. arXiv:2401.14547  [pdf]

    cond-mat.str-el cond-mat.mes-hall cond-mat.mtrl-sci cond-mat.other physics.app-ph

    Discovery of a Topological Charge Density Wave

    Authors: Maksim Litskevich, Md Shafayat Hossain, Songbo Zhang, Zi-Jia Cheng, Satya N. Guin, Nitesh Kumar, Chandra Shekhar, Zhiwei Wang, Yongkai Li, Guoqing Chang, Jia-Xin Yin, Qi Zhang, Guangming Cheng, Yu-Xiao Jiang, Tyler A. Cochran, Nana Shumiya, Xian P. Yang, Daniel Multer, Xiaoxiong Liu, Nan Yao, Yugui Yao, Claudia Felser, Titus Neupert, M. Zahid Hasan

    Abstract: Charge density waves (CDWs) appear in numerous condensed matter platforms, ranging from high-Tc superconductors to quantum Hall systems. Despite such ubiquity, there has been a lack of direct experimental study on boundary states that can uniquely stem from the charge order. Here, using scanning tunneling microscopy, we directly visualize the bulk and boundary phenomenology of CDW in a topological…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Nature Physics (2024); in press

  46. arXiv:2401.11776  [pdf, other]

    cs.RO

    On the impact of robot personalization on human-robot interaction: A review

    Authors: Jinyu Yang, Camille Vindolet, Julio Rogelio Guadarrama Olvera, Gordon Cheng

    Abstract: This study reviews the impact of personalization on human-robot interaction. Firstly, the various strategies used to achieve personalization are briefly described. Secondly, the effects of personalization known to date are discussed. They are presented along with the personalized parameters, personalized features, used technology, and use case they relate to. It is observed that various positive e…

    Submitted 22 January, 2024; originally announced January 2024.

    Report number: CONCATENATE/2023/14

    ACM Class: I.2.9

  47. arXiv:2401.07187  [pdf, other]

    stat.ML cs.LG math.ST

    A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

    Authors: Namjoon Suh, Guang Cheng

    Abstract: In this article, we review the literature on statistical theories of neural networks from three perspectives. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression or classification. These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks, in that tools from the approximati…

    Submitted 4 July, 2024; v1 submitted 13 January, 2024; originally announced January 2024.

    Comments: 33 pages, no figures. Invited for review in Annual Review of Statistics and Its Application (in review)

  48. arXiv:2401.01770  [pdf, other]

    math.OC

    Legendre-Moment Transform for Linear Ensemble Control and Computation

    Authors: Xin Ning, Gong Cheng, Wei Zhang, Jr-Shin Li

    Abstract: Ensemble systems, pervasive in diverse scientific and engineering domains, pose challenges to existing control methods due to their massive scale and underactuated nature. This paper presents a dynamic moment approach to addressing theoretical and computational challenges in systems-theoretic analysis and control design for linear ensemble systems. We introduce the Legendre-moments and Legendre-mo…

    Submitted 3 January, 2024; originally announced January 2024.

    MSC Class: 93B05; 93B28; 93B51

  49. arXiv:2401.01233  [pdf, other]

    cs.LG

    Graph Elimination Networks

    Authors: Shuo Wang, Ge Cheng, Yun Zhang

    Abstract: Graph Neural Networks (GNNs) are widely applied across various domains, yet they perform poorly in deep layers. Existing research typically attributes this problem to node over-smoothing, where node representations become indistinguishable after multiple rounds of propagation. In this paper, we delve into the neighborhood propagation mechanism of GNNs and discover that the real root cause of GNNs'…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Includes 8 pages of main text and 4 pages of appendices

  50. arXiv:2401.00974  [pdf, other]

    cs.LG cs.AI

    Downstream Task-Oriented Generative Model Selections on Synthetic Data Training for Fraud Detection Models

    Authors: Yinan Cheng, Chi-Hua Wang, Vamsi K. Potluru, Tucker Balch, Guang Cheng

    Abstract: Devising procedures for downstream task-oriented generative model selections is an unresolved problem of practical importance. Existing studies focused on the utility of a single family of generative models. They provided limited insights on how synthetic data practitioners select the best family generative models for synthetic training tasks given a specific combination of machine learning model…

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: The following article has been accepted by ICAIF22, Synthetic Data for AI in Finance; see https://sites.google.com/view/icaif-synthetic-2022/program