Online Matching: A Brief Survey

Authors: Zhiyi Huang, Zhihao Gavin Tang, David Wajc

Abstract: Matching, capturing the allocation of items to unit-demand buyers, or tasks to workers, or pairs of collaborators, is a central problem in economics. Indeed, the growing prevalence of matching-based markets, many of which are online in nature, has motivated much research in economics, operations research, computer science, and their intersection. This brief survey is meant as an introduction to the area of online matching, with an emphasis on recent trends, both technical and conceptual.

Submitted 7 July, 2024; originally announced July 2024.

Comments: Also in SIGECOM Exchanges

arXiv:2407.05023 [pdf, other]

SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction

Authors: Weixing Xie, Junfeng Yao, Xianpeng Cao, Qiqin Lin, Zerui Tang, Xiao Dong, Xiaohu Guo

Abstract: Dynamic reconstruction of deformable tissues in endoscopic video is a key technology for robot-assisted surgery. Recent reconstruction methods based on neural radiance fields (NeRFs) have achieved remarkable results in the reconstruction of surgical scenes. However, based on implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-time rendering. In addition, restricted single-view perception and occluded instruments also pose special challenges in surgical scene reconstruction. To address these issues, we develop SurgicalGaussian, a deformable 3D Gaussian Splatting method to model dynamic surgical scenes. Our approach models the spatio-temporal features of soft tissues at each time stamp via a forward-mapping deformation MLP and regularization to constrain local 3D Gaussians to comply with consistent movement. With the depth initialization strategy and tool mask-guided training, our method can remove surgical instruments and reconstruct high-fidelity surgical scenes. Through experiments on various surgical videos, our network outperforms existing methods in many aspects, including rendering quality, rendering speed and GPU usage. The project page can be found at https://surgicalgaussian.github.io.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.04702 [pdf, other]

Centered Co-Circular Central Configurations with Three Unequal masses in Power-Law n-Body Problems

Authors: Zhengyang Tang, Shuqiang Zhu

Abstract: We study the co-circular central configurations of the general power-law potential n-body problem for which the center of mass and the center of the common circle coincide. We prove that there are no central configurations of this type with all the masses equal except three.

Submitted 17 April, 2024; originally announced July 2024.

Comments: 9 pages, 1 figure

MSC Class: 70F10; 70F15

arXiv:2407.02935 [pdf, other]

Properties of the QCD Matter -- An Experimental Review of Selected Results from RHIC BES Program

Authors: Jinhui Chen, Xin Dong, Xionghong He, Huanzhong Huang, Feng Liu, Xiaofeng Luo, Yu-Gang Ma, Lijuan Ruan, Ming Shao, Shusu Shi, Xu Sun, Aihong Tang, Zebo Tang, Fuqiang Wang, Hai Wang, Yi Wang, Zhigang Xiao, Guannan Xie, Nu Xu, Qinghua Xu, Zhangbu Xu, Chi Yang, Shuai Yang, Wangmei Zha, Yapeng Zhang, et al. (3 additional authors not shown)

Abstract: In this paper, we discuss the development of the multi-gap resistive plate chamber Time-of-Flight (TOF) technology and the production of the STAR TOF detector in China at the beginning of the 21st century. Then we review recent experimental results from the first beam energy scan program (BES-I) at the Relativistic Heavy Ion Collider (RHIC). Topics cover measurements of collectivity, chirality, criticality, global polarization, strangeness, heavy-flavor, di-lepton and light nuclei production.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 31 pages, 33 figures. This review is dedicated to Professor Wenqing Shen on the occasion to celebrate his leadership of the Chinese STAR Collaboration, the development and production of the STAR MRPC TOF detector in China and many physics analyses

arXiv:2407.01909 [pdf, other]

Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models

Authors: Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang

Abstract: Recent studies have demonstrated the efficacy of large language models (LLMs) in error correction for automatic speech recognition (ASR). However, much of the research focuses on the English language. This paper redirects the attention to Chinese. Firstly, we construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypotheses-transcription pairs, named the Chinese Hypotheses Paradise dataset (ChineseHP), which contains a wide range of scenarios and presents significant challenges. Subsequently, we conduct a preliminary evaluation using the dataset for both direct-prompting and fine-tuning pre-trained LLMs. Furthermore, we propose a straightforward method of Pinyin regularization for prompts, which involves the transcription of Pinyin directly from text hypotheses. The experimental results reveal that Pinyin regularization consistently enhances the error-correcting ability of LLMs when compared with those without regularization. The dataset is available on the website.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Interspeech 2024

arXiv:2407.01892 [pdf, other]

GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning

Authors: Zhisheng Tang, Mayank Kejriwal

Abstract: Spatial reasoning, an important faculty of human cognition with many practical applications, is one of the core commonsense skills that is not purely language-based and, for satisfying (as opposed to optimal) solutions, requires some minimum degree of planning. Existing benchmarks of Commonsense Spatial Reasoning (CSR) tend to evaluate how Large Language Models (LLMs) interpret text-based spatial descriptions rather than directly evaluate a plan produced by the LLM in response to a spatial reasoning scenario. In this paper, we construct a large-scale benchmark called $\textbf{GRASP}$, which consists of 16,000 grid-based environments where the agent is tasked with an energy collection problem. These environments include 100 grid instances instantiated using each of the 160 different grid settings, involving five different energy distributions, two modes of agent starting position, and two distinct obstacle configurations, as well as three kinds of agent constraints. Using GRASP, we compare classic baseline approaches, such as random walk and greedy search methods, with advanced LLMs like GPT-3.5-Turbo and GPT-4o. The experimental results indicate that even these advanced LLMs struggle to consistently achieve satisfactory solutions.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01178 [pdf, other]

$\text{Memory}^3$: Language Modeling with Explicit Memory

Authors: Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E

Abstract: The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size, training cost, and inference cost, all proportional to the amount of remaining "abstract knowledge". As a preliminary proof of concept, we train from scratch a 2.4B LLM, which achieves better performance than much larger LLMs as well as RAG models, and maintains higher decoding speed than RAG. The model is named $\text{Memory}^3$, since explicit memory is the third form of memory in LLMs after implicit memory (model parameters) and working memory (context key-values). We introduce a memory circuitry theory to support the externalization of knowledge, and present novel techniques including a memory sparsification mechanism that makes storage tractable and a two-stage pretraining scheme that facilitates memory formation.

Submitted 1 July, 2024; originally announced July 2024.

MSC Class: 68T50

ACM Class: I.2.7

arXiv:2407.00668 [pdf, other]

HRDE: Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability

Authors: Yanfang Chen, Ding Chen, Shichao Song, Simin Niu, Hanyu Wang, Zeyun Tang, Feiyu Xiong, Zhiyu Li

Abstract: As people increasingly prioritize their health, the speed and breadth of health information dissemination on the internet have also grown. At the same time, the presence of false health information (health rumors) intermingled with genuine content poses a significant potential threat to public health. However, current research on Chinese health rumors still lacks a large-scale, public, and open-source dataset of health rumor information, as well as effective and reliable rumor detection methods. This paper addresses this gap by constructing a dataset containing 1.12 million health-related rumors (HealthRCN) through web scraping of common health-related questions and a series of data processing steps. HealthRCN is the largest known dataset of Chinese health information rumors to date. Based on this dataset, we propose retrieval-augmented large language models for Chinese health rumor detection and explainability (HRDE). This model leverages retrieved relevant information to accurately determine whether the input health information is a rumor and provides explanatory responses, effectively aiding users in verifying the authenticity of health information. In evaluation experiments, we compared multiple models and found that HRDE outperformed them all, including GPT-4-1106-Preview, in rumor detection accuracy and answer quality. HRDE achieved an average accuracy of 91.04% and an F1 score of 91.58%.

Submitted 3 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00529 [pdf, other]

Detecting and Identifying Selection Structure in Sequential Data

Authors: Yujia Zheng, Zeyu Tang, Yiwen Qiu, Bernhard Schölkopf, Kun Zhang

Abstract: We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences. Since this selection process often distorts statistical analysis, previous work primarily views it as a bias to be corrected and proposes various methods to mitigate its effect. However, while controlling this bias is crucial, selection also offers an opportunity to provide a deeper insight into the hidden generation process, as it is a fundamental mechanism underlying what we observe. In particular, overlooking selection in sequential data can lead to an incomplete or overcomplicated inductive bias in modeling, such as assuming a universal autoregressive structure for all dependencies. Therefore, rather than merely viewing it as a bias, we explore the causal structure of selection in sequential data to delve deeper into the complete causal process. Specifically, we show that selection structure is identifiable without any parametric assumptions or interventional experiments. Moreover, even in cases where selection variables coexist with latent confounders, we still establish the nonparametric identifiability under appropriate structural conditions. Meanwhile, we also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies. The framework has been validated empirically on both synthetic data and real-world music.

Submitted 29 June, 2024; originally announced July 2024.

Comments: ICML 2024

arXiv:2406.19654 [pdf, other]

A lensed FRB candidate in the first CHIME/FRB Catalogue and its potential implications

Authors: Chenming Chang, Songbo Zhang, Di Xiao, Zhenfan Tang, Ye Li, Junjie Wei, Xuefeng Wu

Abstract: Fast radio bursts (FRBs) are immensely energetic radio pulses with durations of milliseconds. Given their high all-sky rate, the probability of an FRB being lensed by an intervening massive object is non-negligible. In this study, we search for possible lensing candidates within the first Canadian Hydrogen Intensity Mapping Experiment FRB catalogue using an autocorrelation algorithm and verification through signal simulations. We identify FRB 20190308C as a lensed candidate with a significance of 3.4 sigma. Furthermore, we constrain the mass of the lensing object using the Chang-Refsdal lens model, based on the flux ratio and time delay between the substructures of FRB 20190308C. Future long-term and high-precision observations are expected to reveal more lensed FRBs.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 5 pages, 4 figures, submitted

arXiv:2406.17561 [pdf, other]

Improving density matrix electronic structure method by deep learning

Authors: Zechen Tang, Nianlong Zou, He Li, Yuxiang Wang, Zilong Yuan, Honggeng Tao, Yang Li, Zezhou Chen, Boheng Zhao, Minghui Sun, Hong Jiang, Wenhui Duan, Yong Xu

Abstract: The combination of deep learning and ab initio materials calculations is emerging as a trending frontier of materials science research, with deep-learning density functional theory (DFT) electronic structure being particularly promising. In this work, we introduce a neural-network method for modeling the DFT density matrix, a fundamental yet previously unexplored quantity in deep-learning electronic structure. Utilizing an advanced neural network framework that leverages the nearsightedness and equivariance properties of the density matrix, the method demonstrates high accuracy and excellent generalizability in multiple example studies, as well as the capability to precisely predict charge density and reproduce other electronic structure properties. Given the pivotal role of the density matrix in DFT as well as other computational methods, the current research introduces a novel approach to the deep-learning study of electronic structure properties, opening up new opportunities for deep-learning enhanced computational materials study.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17163 [pdf, other]

Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors

Authors: Vikas Yadav, Zheng Tang, Vijay Srinivasan

Abstract: Large language models (LLMs) have achieved remarkable success in natural language generation, but less focus has been given to their applicability in decision-making tasks such as classification. We show that LLMs like LLaMa can achieve high performance on large multi-class classification tasks but still make classification errors and, worse, generate out-of-vocabulary class labels. To address these critical issues, we introduce the Paraphrase and AGgregate (PAG)-LLM approach, wherein an LLM generates multiple paraphrases of the input query (parallel queries), performs multi-class classification for the original query and each paraphrase, and finally aggregates all the classification labels based on their confidence scores. We evaluate PAG-LLM on two large multi-class classification datasets, CLINC and Banking, and show 22.7% and 15.1% error reduction. We show that PAG-LLM is especially effective for hard examples where the LLM is uncertain, and reduces the critical misclassification and hallucinated label generation errors.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted at SIGIR 2024

arXiv:2406.15192 [pdf, ps, other]

Setting Targets is All You Need: Improved Order Competitive Ratio for Online Selection

Authors: Liyan Chen, Nuozhou Sun, Zhihao Gavin Tang

Abstract: There is rising interest in studying the online benchmark as an alternative to the classical offline benchmark in online stochastic settings. Ezra, Feldman, Gravin, and Tang (SODA 2023) introduced the notion of order-competitive ratio, defined as the worst-case ratio between the performance of the best order-unaware algorithm and the best order-aware algorithm, to quantify the loss incurred by the lack of knowledge of the arrival order. They showed that in the online single selection setting (a.k.a. the prophet problem), the optimal order-competitive ratio achieved by deterministic algorithms is $1/\varphi \approx 0.618$, and left open the question of whether randomized algorithms can do better. We answer the open question firmly by introducing a novel family of algorithms called \emph{targeted value algorithms}. We show that the task of online selection is as easy as guessing the optimal online benchmark. Specifically, we provide 1) an alternative optimal $1/\varphi$ order-competitive algorithm by setting the targeted value deterministically, and 2) a $0.732$ order-competitive algorithm by setting the targeted value randomly. We further provide a $0.758$ upper bound on the order-competitive ratio of our algorithm, showing that our analysis is close to the best possible, and establish an upper bound of $0.829$ on the order-competitive ratio for general randomized order-unaware algorithms.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.13404 [pdf, other]

Low-Latency Layer-Aware Proactive and Passive Container Migration in Meta Computing

Authors: Mengjie Liu, Yihua Li, Fangyi Mou, Zhiqing Tang, Jiong Lou, Jianxiong Guo, Weijia Jia

Abstract: Meta computing is a new computing paradigm that aims to efficiently utilize all network computing resources to provide fault-tolerant, personalized services with strong security and privacy guarantees. It also seeks to virtualize the Internet as many meta computers. In meta computing, tasks can be assigned to containers at edge nodes for processing, based on container images with multiple layers. The dynamic and resource-constrained nature of meta computing environments requires an optimal container migration strategy for mobile users to minimize latency. However, the problem of container migration in meta computing has not been thoroughly explored. To address this gap, we present low-latency, layer-aware container migration strategies that consider both proactive and passive migration. Specifically: 1) We formulate the container migration problem in meta computing, taking into account layer dependencies to reduce migration costs and overall task duration by considering four delays. 2) We introduce a reinforcement learning algorithm based on policy gradients to minimize total latency by identifying layer dependencies for action selection, making decisions for both proactive and passive migration. Expert demonstrations are introduced to enhance exploitation. 3) Experiments using real data trajectories show that the algorithm outperforms baseline algorithms, achieving lower total latency.

Submitted 19 June, 2024; originally announced June 2024.

Comments: to be published in IEEE ICMC 2024

arXiv:2406.13399 [pdf, other]

VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework

Authors: Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, Weijia Jia

Abstract: The Large Language Model (LLM) has gained significant popularity and is extensively utilized across various domains. Most LLM deployments occur within cloud data centers, where they encounter substantial response delays and incur high costs, thereby impacting the Quality of Services (QoS) at the network edge. Leveraging vector database caching to store LLM request results at the edge can substantially mitigate the response delays and costs associated with similar requests, which has been overlooked by previous research. Addressing these gaps, this paper introduces a novel Vector database-assisted cloud-Edge collaborative LLM QoS Optimization (VELO) framework. Firstly, we propose the VELO framework, which ingeniously employs a vector database to cache the results of some LLM requests at the edge to reduce the response time of subsequent similar requests. Diverging from direct optimization of the LLM, our VELO framework does not necessitate altering the internal structure of the LLM and is broadly applicable to diverse LLMs. Subsequently, building upon the VELO framework, we formulate the QoS optimization problem as a Markov Decision Process (MDP) and devise an algorithm grounded in Multi-Agent Reinforcement Learning (MARL) to decide whether to request the LLM in the cloud or directly return the results from the vector database at the edge. Moreover, to enhance request feature extraction and expedite training, we refine the policy network of MARL and integrate expert demonstrations. Finally, we implement the proposed algorithm within a real edge system. Experimental findings confirm that our VELO framework substantially enhances user satisfaction by concurrently diminishing delay and resource consumption for edge users utilizing LLMs.

Submitted 19 June, 2024; originally announced June 2024.

Comments: to be published in IEEE ICWS 2024

arXiv:2406.13124 [pdf, other]

Learning to Generate Answers with Citations via Factual Consistency Models

Authors: Rami Aly, Zhiqiang Tang, Samson Tan, George Karypis

Abstract: Large Language Models (LLMs) frequently hallucinate, impeding their reliability in mission-critical situations. One approach to address this issue is to provide citations to relevant sources alongside generated content, enhancing the verifiability of generations. However, citing passages accurately in answers remains a substantial challenge. This paper proposes a weakly-supervised fine-tuning method leveraging factual consistency models (FCMs). Our approach alternates between generating texts with citations and supervised fine-tuning with FCM-filtered citation data. Focused learning is integrated into the objective, directing the fine-tuning process to emphasise the factual unit tokens, as measured by an FCM. Results on the ALCE few-shot citation benchmark with various instruction-tuned LLMs demonstrate superior performance compared to in-context learning, vanilla supervised fine-tuning, and state-of-the-art methods, with an average improvement of $34.1$, $15.5$, and $10.5$ citation F$_1$ points, respectively. Moreover, in a domain transfer setting we show that the obtained citation generation ability robustly transfers to unseen datasets. Notably, our citation improvements contribute to the lowest factual error rate across baselines.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024. Code release will follow

arXiv:2406.12799 [pdf, ps, other]

Sample-Based Matroid Prophet Inequalities

Authors: Hu Fu, Pinyan Lu, Zhihao Gavin Tang, Hongxun Wu, Jinzhao Wu, Qianfan Zhang

Abstract: We study matroid prophet inequalities when distributions are unknown and accessible only through samples. While single-sample prophet inequalities for special matroids are known, no constant-factor competitive algorithm with even a sublinear number of samples was known for general matroids. Adding more to the stake, the single-sample version of the question for general matroids has close (two-way) connections with the long-standing matroid secretary conjecture. In this work, we give a $(\frac14 - \varepsilon)$-competitive matroid prophet inequality with only $O_\varepsilon(\mathrm{poly} \log n)$ samples. Our algorithm consists of two parts: (i) a novel quantile-based reduction from matroid prophet inequalities to online contention resolution schemes (OCRSs) with $O_\varepsilon(\log n)$ samples, and (ii) a $(\frac14 - \varepsilon)$-selectable matroid OCRS with $O_\varepsilon(\mathrm{poly} \log n)$ samples which carefully addresses an adaptivity challenge.

Submitted 18 June, 2024; originally announced June 2024.

Comments: To appear at EC'24

arXiv:2406.12754 [pdf, other]

Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

Authors: Ruiqi He, Yushu He, Longju Bai, Jiarui Liu, Zhenjie Sun, Zenghao Tang, He Wang, Hanchen Xia, Naihao Deng

Abstract: Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evaluate human explanations against two state-of-the-art LLMs, GPT-4o and ERNIE Bot, through A/B testing by native Chinese speakers. Our evaluation shows that Chumor is challenging even for SOTA LLMs, and the human explanations for Chumor jokes are significantly better than explanations generated by the LLMs.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12216 [pdf, other]

Is persona enough for personality? Using ChatGPT to reconstruct an agent's latent personality from simple descriptions

Authors: Yongyi Ji, Zhisheng Tang, Mayank Kejriwal

Abstract: Personality, a fundamental aspect of human cognition, contains a range of traits that influence behaviors, thoughts, and emotions. This paper explores the capabilities of large language models (LLMs) in reconstructing these complex cognitive attributes based only on simple descriptions containing socio-demographic and personality type information. Utilizing the HEXACO personality framework, our study examines the consistency of LLMs in recovering and predicting underlying (latent) personality dimensions from simple descriptions. Our experiments reveal a significant degree of consistency in personality reconstruction, although some inconsistencies and biases, such as a tendency to default to positive traits in the absence of explicit information, are also observed. Additionally, socio-demographic factors like age and number of children were found to influence the reconstructed personality dimensions. These findings have implications for building sophisticated agent-based simulacra using LLMs and highlight the need for further research on robust personality generation in LLMs.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted to the ICML 2024 Workshop on Large Language Models and Cognition

arXiv:2406.11528 [pdf, other]

Optimal Robust Contract Design

Authors: Bo Peng, Zhihao Gavin Tang

Abstract: We consider the robust contract design problem when the principal only has limited information about the actions the agent can take. The principal evaluates a contract according to its worst-case performance caused by the uncertain action space. Carroll (AER 2015) showed that a linear contract is optimal among deterministic contracts. Recently, Kambhampati (JET 2023) showed that the principal's payoff can be strictly increased via randomization over linear contracts. In this paper, we characterize the optimal randomized contract, which remains linear and admits a closed form of its cumulative density function. The advantage of randomized contracts over deterministic contracts can be arbitrarily large even when the principal knows only one non-trivial action of the agent. Furthermore, our result generalizes to the model of contracting with teams, by Dai and Toikka (Econometrica 2022).

Submitted 17 June, 2024; originally announced June 2024.

Comments: Full version of EC 2024 paper

arXiv:2406.10543 [pdf, other]

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows

Authors: Zhenggang Tang, Zhongzheng Ren, Xiaoming Zhao, Bowen Wen, Jonathan Tremblay, Stan Birchfield, Alexander Schwing

Abstract: We present a method for automatically modifying a NeRF representation based on a single observation of a non-rigid transformed version of the original scene. Our method defines the transformation as a 3D flow, specifically as a weighted linear blending of rigid transformations of 3D anchor points that are defined on the surface of the scene. In order to identify anchor points, we introduce a novel correspondence algorithm that first matches RGB-based pairs, then leverages multi-view information and 3D reprojection to robustly filter false positives in two steps. We also introduce a new dataset for exploring the problem of modifying a NeRF scene through a single observation. Our dataset ( https://github.com/nerfdeformer/nerfdeformer ) contains 113 synthetic scenes leveraging 47 3D assets. We show that our proposed method outperforms NeRF editing methods as well as diffusion-based methods, and we also explore different methods for filtering correspondences.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 8 pages of main paper, CVPR 2024. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024

arXiv:2406.10536 [pdf, other]

doi 10.1016/j.scib.2024.06.011

Universal materials model of deep-learning density functional theory Hamiltonian

Authors: Yuxiang Wang, Yang Li, Zechen Tang, He Li, Zilong Yuan, Honggeng Tao, Nianlong Zou, Ting Bao, Xinghao Liang, Zezhou Chen, Shanghua Xu, Ce Bian, Zhiming Xu, Chong Wang, Chen Si, Wenhui Duan, Yong Xu

Abstract: Realizing large materials models has emerged as a critical endeavor for materials research in the new era of artificial intelligence, but how to achieve this fantastic and challenging objective remains elusive. Here, we propose a feasible pathway to address this paramount pursuit by developing universal materials models of deep-learning density functional theory Hamiltonian (DeepH), enabling computational modeling of the complicated structure-property relationship of materials in general. By constructing a large materials database and substantially improving the DeepH method, we obtain a universal materials model of DeepH capable of handling diverse elemental compositions and material structures, achieving remarkable accuracy in predicting material properties. We further showcase a promising application of fine-tuning universal materials models for enhancing specific materials models. This work not only demonstrates the concept of DeepH's universal materials model but also lays the groundwork for developing large materials models, opening up significant opportunities for advancing artificial intelligence-driven materials discovery.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10378 [pdf]

An experimental search for an explanation of the difference between beam and bottle neutron lifetime measurements

Authors: M. F. Blatnik, L. S. Blokland, N. Callahan, J. H. Choi, S. Clayton, C. B Cude-Woods, B. W. Filippone, W. R. Fox, E. Fries, P. Geltenbort, F. M. Gonzalez, L. Hayen, K. P. Hickerson, A. T. Holley, T. M. Ito, A. Komives, S Lin, Chen-Yu Liu, M. F. Makela, C. L. Morris, R. Musedinovic, C. M. O'Shaughnessy, R. W. Pattie Jr., J. C. Ramsey, D. J. Salvat, et al. (10 additional authors not shown)

Abstract: The past two decades have yielded several new measurements and reanalysis of older measurements of the neutron lifetime. These have led to a 4.4 standard deviation discrepancy between the most precise measurements of the neutron decay rate producing protons in cold neutron beams and the most precise lifetime measured in neutron storage experiments. Here we publish an analysis of the recently published UCN aimed at searching for an explanation of this difference using the model proposed by Koch and Hummel.

Submitted 14 June, 2024; originally announced June 2024.

Report number: LA-UR-24-25619

arXiv:2406.08698 [pdf, other]

Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes of astrophysical $γ$-ray background while large amount of dark matter. By analyzing more than 700 days observational data at LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 12 figures, accepted by PRL

arXiv:2406.06736 [pdf, other]

Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges

Authors: Usman Gohar, Zeyu Tang, Jialu Wang, Kun Zhang, Peter L. Spirtes, Yang Liu, Lu Cheng

Abstract: The widespread integration of Machine Learning systems in daily life, particularly in high-stakes domains, has raised concerns about the fairness implications. While prior works have investigated static fairness measures, recent studies reveal that automated decision-making has long-term implications and that off-the-shelf fairness approaches may not serve the purpose of achieving long-term fairness. Additionally, the existence of feedback loops and the interaction between models and the environment introduces additional complexities that may deviate from the initial fairness goals. In this survey, we review existing literature on long-term fairness from different perspectives and present a taxonomy for long-term fairness studies. We highlight key challenges and consider future research directions, analyzing both current issues and potential further explorations.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05138 [pdf]

doi 10.1016/j.compgeo.2024.106454

A Novel Coupled bES-FEM Formulation with SUPG stabilization for Thermo-Hydro-Mechanical Analysis in Saturated Porous Media

Authors: Zi-Qi Tang, Xi-Wen Zhou, Yin-Fu Jin, Zhen-Yu Yin, Qi Zhang

Abstract: Two primary types of numerical instabilities often occur in low-order finite element method (FEM) analyses of thermo-hydro-mechanical (THM) phenomena: (1) pressure oscillations arising from improper interpolation of pressure and displacement fields; and (2) spatial oscillations induced by nonlinear convection terms in convection-dominated scenarios. In response to these issues, this paper proposes a novel stabilized edge-based smoothed FEM with a bubble function (bES-FEM) for THM analysis within saturated porous media. In the proposed framework, a cubic bubble function is first incorporated into ES-FEM to efficiently mitigate pressure oscillations that breach the Inf-Sup condition, and then the Streamline Upwind Petrov-Galerkin (SUPG) scheme is adopted in bES-FEM to effectively reduce the spurious oscillations in convection-dominated heat transfer scenarios. The accuracy of the bES-FEM with SUPG formulation for THM coupled problems is validated through a series of five benchmark tests. Moreover, the simulations of open-loop ground source energy systems demonstrate the proposed method's exceptional capability in tackling complex THM challenges in real-world applications. All the obtained results showcase the superiority of the proposed bES-FEM with SUPG in eliminating the spatial and pressure oscillations, marking it as a promising tool for the exploration of coupled THM issues.

Submitted 20 May, 2024; originally announced June 2024.

Comments: 39 pages (include references), 18 figures, 7 tables

Journal ref: Comput. Geotech. 173 (2024) 106454

arXiv:2406.04325 [pdf, other]

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Authors: Lin Chen, Xilin Wei, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang

Abstract: We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) ShareGPT4Video, 40K GPT4V annotated dense captions of videos with various lengths and sources, developed through carefully designed data filtering and annotating strategy. 2) ShareCaptioner-Video, an efficient and capable captioning model for arbitrary videos, with 4.8M high-quality aesthetic videos annotated by it. 3) ShareGPT4Video-8B, a simple yet superb LVLM that reached SOTA performance on three advancing video benchmarks. To achieve this, taking aside the non-scalable costly human annotators, we find using GPT4V to caption video with a naive multi-frame or frame-concatenation input strategy leads to less detailed and sometimes temporal-confused results. We argue the challenge of designing a high-quality video captioning strategy lies in three aspects: 1) Inter-frame precise temporal change understanding. 2) Intra-frame detailed content description. 3) Frame-number scalability for arbitrary-length videos. To this end, we meticulously designed a differential video captioning strategy, which is stable, scalable, and efficient for generating captions for videos with arbitrary resolution, aspect ratios, and length.
Based on it, we construct ShareGPT4Video, which contains 40K high-quality videos spanning a wide range of categories, and the resulting captions encompass rich world knowledge, object attributes, camera movements, and crucially, detailed and precise temporal descriptions of events. Based on ShareGPT4Video, we further develop ShareCaptioner-Video, a superior captioner capable of efficiently generating high-quality captions for arbitrary videos...

Submitted 6 June, 2024; originally announced June 2024.

Comments: Project Page: https://sharegpt4video.github.io/

arXiv:2406.02924 [pdf, other]

Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models

Authors: Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu

Abstract: Despite the remarkable capabilities, Large Language Models (LLMs) face deployment challenges due to their extensive size. Pruning methods drop a subset of weights to accelerate, but many of them require retraining, which is prohibitively expensive and computationally demanding. Recently, post-training pruning approaches introduced novel metrics, enabling the pruning of LLMs without retraining. However, these metrics require the involvement of human experts and tedious trial and error. To efficiently identify superior pruning metrics, we develop an automatic framework for searching symbolic pruning metrics using genetic programming. In particular, we devise an elaborate search space encompassing the existing pruning metrics to discover the potential symbolic pruning metric. We propose an opposing operation simplification strategy to increase the diversity of the population. In this way, Pruner-Zero allows auto-generation of symbolic pruning metrics. Based on the searched results, we explore the correlation between pruning metrics and performance after pruning and summarize some principles. Extensive experiments on LLaMA and LLaMA-2 on language modeling and zero-shot tasks demonstrate that our Pruner-Zero obtains superior performance than SOTA post-training pruning methods. Code at: https://github.com/pprp/Pruner-Zero.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted by ICML2024, 29 pages, 4 figures

arXiv:2406.01719 [pdf, other]

Imputation of Missing Photometric Data and Photometric Redshift Estimation for CSST

Authors: Zhijian Luo, Zhirui Tang, Zhu Chen, Liping Fu, Wei Du, Shaohua Zhang, Yan Gong, Chenggang Shu, Junhao Lu, Yicheng Li, Xian-Min Meng, Xingchen Zhou, Zuhui Fan

Abstract: Accurate photometric redshift (photo-$z$) estimation requires support from multi-band observational data. However, in the actual process of astronomical observations and data processing, some sources may have missing observational data in certain bands for various reasons. This could greatly affect the accuracy and reliability of photo-$z$ estimation for these sources, and even render some estimation methods unusable. The same situation may exist for the upcoming Chinese Space Station Telescope (CSST). In this study, we employ a deep learning method called Generative Adversarial Imputation Networks (GAIN) to impute the missing photometric data in CSST, aiming to reduce the impact of data missing on photo-$z$ estimation and improve estimation accuracy. Our results demonstrate that using the GAIN technique can effectively fill in the missing photometric data in CSST. Particularly, when the data missing rate is below 30\%, the imputation of photometric data exhibits high accuracy, with higher accuracy in the $g$, $r$, $i$, $z$, and $y$ bands compared to the $NUV$ and $u$ bands. After filling in the missing values, the quality of photo-$z$ estimation obtained by the widely used Easy and Accurate Zphot from Yale (EAZY) software is notably enhanced. Evaluation metrics for assessing the quality of photo-$z$ estimation, including the catastrophic outlier fraction ($f_{out}$), the normalized median absolute deviation ($\rm {σ_{NMAD}}$), and the bias of photometric redshift ($bias$), all show some degree of improvement.
Our research will help maximize the utilization of observational data and provide a new method for handling sample missing values for applications that require complete photometry data to produce results.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01193 [pdf, other]

Programmable Multi-input Buck-Boost Converter for Photovoltaics Arrays

Authors: Zhongting Tang, Yi Zhang, Pooya Davari

Abstract: This paper proposes a programmable multi-input buck-boost structure method, which can enhance the operation tolerance for the PV array under extremely harsh climatic conditions. The proposed structure based on a traditional two switches buck-boost converter can connect PV panels in parallel and cascade flexibly, and also enable the individual operation of each PV panel. The active switches can be programmed to change the connection structures as well as achieve the maximum power point track of PV panels simultaneously. The paper presents the programming method for an exemplified scalable structure converter for two PV panels. The simulation has been established in MATLAB/Simulink to validate the performance of the proposed converter in terms of multiplexing function, wide operating range of PV panels, and low switching stress.

Submitted 3 June, 2024; originally announced June 2024.

Comments: This work has been accepted by the conference proceedings of IPEMC 2024-ECCE Asia

arXiv:2405.19448 [pdf]

doi 10.18429/JACoW-IPAC2024-THPS42

Pressure Spike in The LBNF Absorber Core's Gun Drilled Cooling Channel from an Accident Beam Pulse

Authors: A. Deshpande, P. Hurh, J. Hylen, A. Lee, J. Lewis, I. Rakhno, V. I. Sidorov, Z. Tang, S. Tariq, I. Tropin

Abstract: The LBNF Absorber consists of thirteen 6061-T6 aluminum core blocks. The core blocks are water cooled with de-ionized (DI) water which becomes radioactive during beam operations. The cooling water flows through gun-drilled channels in the core blocks. The cooling water is supplied by the LBNF Absorber Radioactive Water (RAW) cooling system which is designed as per ASME B31.3 Normal Fluid Service [1]. An uninhibited beam accident pulse striking the water channels was identified as a credible accident scenario. In this study, it is assumed that the beam pulse hits the Absorber directly without interacting with any of the other upstream beamline components. The beam parameters used for the LBNF beam are 120 GeV, 2.4 MW with a 1.2 s cycle time. The accident pulse lasts for 10 μs. The maximum energy is deposited in the 3rd aluminum core block. For the sake of simplicity, it is assumed that the accident pulse strikes the 1 in. ID water channel directly. The analysis here simulates the pressure rise in the water during and after the beam pulse and its effects on the aluminum piping components that deliver water to the core blocks. The weld strengths as determined by the Load and Resistance Factor Design (LRDF) and the Allowable Strength Design (ASD) are compared to the forces generated in the weld owing to the pressure spike. A transient structural analysis was used to determine the equivalent membrane, peak, and bending stresses and they were compared to allowable limits.

Submitted 31 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: IPAC'24 - 15th International Particle Accelerator Conference

Report number: FERMILAB-CONF-23-0797-AD-LBNF-PPD

Journal ref: JACoW IPAC2024 (2024) THPS42

arXiv:2405.18881 [pdf, other]

Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization

Authors: Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang, Mingyi Hong, Fan Wang, Tsung-Hui Chang

Abstract: In this work, we focus on the alignment problem of diffusion models with a continuous reward function, which represents specific objectives for downstream tasks, such as improving human preference. The central goal of the alignment problem is to adjust the distribution learned by diffusion models such that the generated samples maximize the target reward function. We propose a novel alignment approach, named Direct Noise Optimization (DNO), that optimizes the injected noise during the sampling process of diffusion models. By design, DNO is tuning-free and prompt-agnostic, as the alignment occurs in an online fashion during generation. We rigorously study the theoretical properties of DNO and also propose variants to deal with non-differentiable reward functions. Furthermore, we identify that naive implementation of DNO occasionally suffers from the out-of-distribution reward hacking problem, where optimized samples have high rewards but are no longer in the support of the pretrained distribution. To remedy this issue, we leverage classical high-dimensional statistics theory and propose to augment the DNO loss with certain probability regularization. We conduct extensive experiments on several popular reward functions trained on human feedback data and demonstrate that the proposed DNO approach achieves state-of-the-art reward scores as well as high image quality, all within a reasonable time budget for generation.

Submitted 3 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.17743 [pdf, other]

ORLM: Training Large Language Models for Optimization Modeling

Authors: Zhengyang Tang, Chenyu Huang, Xin Zheng, Shixi Hu, Zizhuo Wang, Dongdong Ge, Benyou Wang

Abstract: Large Language Models (LLMs) have emerged as powerful tools for tackling complex Operations Research (OR) problem by providing the capacity in automating optimization modeling. However, current methodologies heavily rely on prompt engineering (e.g., multi-agent cooperation) with proprietary LLMs, raising data privacy concerns that could be prohibitive in industry applications. To tackle this issue, we propose training open-source LLMs for optimization modeling. We identify four critical requirements for the training dataset of OR LLMs, design and implement OR-Instruct, a semi-automated process for creating synthetic data tailored to specific requirements. We also introduce the IndustryOR benchmark, the first industrial benchmark for testing LLMs on solving real-world OR problems. We apply the data from OR-Instruct to various open-source LLMs of 7b size (termed as ORLMs), resulting in a significantly improved capability for optimization modeling. Our best-performing ORLM achieves state-of-the-art performance on the NL4OPT, MAMO, and IndustryOR benchmarks. Our code and data are available at https://github.com/Cardinal-Operations/ORLM.

Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Comments: Work in progress

arXiv:2405.15185 [pdf, other]

An Evaluation of Estimative Uncertainty in Large Language Models

Authors: Zhisheng Tang, Ke Shen, Mayank Kejriwal

Abstract: Words of estimative probability (WEPs), such as "maybe" or "probably not" are ubiquitous in natural language for communicating estimative uncertainty, compared with direct statements involving numerical probability. Human estimative uncertainty, and its calibration with numerical estimates, has long been an area of study -- including by intelligence agencies like the CIA. This study compares estimative uncertainty in commonly used large language models (LLMs) like GPT-4 and ERNIE-4 to that of humans, and to each other. Here we show that LLMs like GPT-3.5 and GPT-4 align with human estimates for some, but not all, WEPs presented in English. Divergence is also observed when the LLM is presented with gendered roles and Chinese contexts. Further study shows that an advanced LLM like GPT-4 can consistently map between statistical and estimative uncertainty, but a significant performance gap remains. The results contribute to a growing body of research on human-LLM alignment.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.15156 [pdf, other]

Spectral fittings of warm coronal radiation with high seed photon temperature: apparent low-temperature and flat soft excess in AGNs

Authors: Ze-Yuan Tang, Jun-Jie Feng, Jun-Hui Fan

Abstract: A warm corona has been widely proposed to explain the soft X-ray excess (SE) above the 2--10 keV power law extrapolation in AGNs. In actual spectral fittings, the warm coronal seed photon temperature ($T_{\rm s}$) is usually assumed to be far away from the soft X-ray, but $kT_{\rm s}$ can reach close to 0.1 keV in standard accretion disc model. In this study, we used Monte Carlo simulations to obtain radiation spectra from a slab-like warm corona and fitted the spectra using the spherical-geometry-based routine \textsc{thcomp} or a thermal component. Our findings reveal that high $T_{\rm s}$ can influence the fitting results. A moderately high $kT_{\rm s}$ (around 0.03 keV) can result in an apparent low-temperature and flat SE, while an extremely high $kT_{\rm s}$ (around 0.07 keV) can even produce an unobserved blackbody-like SE. Our conclusions indicate that, for spectral fittings of the warm coronal radiation (SE in AGNs), $kT_{\rm s}$ should be treated as a free parameter with an upper limit, and an accurate coronal geometry is necessary when $kT_{\rm s}>0.01$ keV.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Accepted for publication in RAA

arXiv:2405.13028 [pdf, other]

DuetSim: Building User Simulator with Dual Large Language Models for Task-Oriented Dialogues

Authors: Xiang Luo, Zhiwen Tang, Jin Wang, Xuejie Zhang

Abstract: User Simulators play a pivotal role in training and evaluating task-oriented dialogue systems. Traditional user simulators typically rely on human-engineered agendas, resulting in generated responses that often lack diversity and spontaneity. Although large language models (LLMs) exhibit a remarkable capacity for generating coherent and contextually appropriate utterances, they may fall short when tasked with generating responses that effectively guide users towards their goals, particularly in dialogues with intricate constraints and requirements. This paper introduces DuetSim, a novel framework designed to address the intricate demands of task-oriented dialogues by leveraging LLMs. DuetSim stands apart from conventional approaches by employing two LLMs in tandem: one dedicated to response generation and the other focused on verification. This dual LLM approach empowers DuetSim to produce responses that not only exhibit diversity but also demonstrate accuracy and are preferred by human users. We validate the efficacy of our method through extensive experiments conducted on the MultiWOZ dataset, highlighting improvements in response quality and correctness, largely attributed to the incorporation of the second LLM. Our code is accessible at: https://github.com/suntea233/DuetSim.

Submitted 16 May, 2024; originally announced May 2024.

Comments: Accepted by COLING 2024

arXiv:2405.12806 [pdf, other]

MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

Authors: Hongsheng Wang, Xiang Cai, Xi Sun, Jinhong Yue, Zhanyun Tang, Shengyu Zhang, Feng Lin, Fei Wu

Abstract: Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcome these limitations, we introduce an innovative framework, Motion-Based 3D Clothed Human Synthesis (MOSS), which employs kinematic information to achieve motion-aware Gaussian split on the human surface. Our framework consists of two modules: Kinematic Gaussian Locating Splatting (KGAS) and Surface Deformation Detector (UID). KGAS incorporates matrix-Fisher distribution to propagate global motion across the body surface. The density and rotation factors of this distribution explicitly control the Gaussians, thereby enhancing the realism of the reconstructed surface. Additionally, to address local occlusions in single-view, based on KGAS, UID identifies significant surfaces, and geometric reconstruction is performed to compensate for these deformations. Experimental results demonstrate that MOSS achieves state-of-the-art visual quality in 3D clothed human synthesis from monocular videos. Notably, we improve the Human NeRF and the Gaussian Splatting by 33.94% and 16.75% in LPIPS* respectively. Codes are available at https://wanghongsheng01.github.io/MOSS/.

Submitted 21 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:1710.03746 by other authors

arXiv:2405.12477 [pdf, other]

Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery

Authors: Hongsheng Wang, Weiyue Zhang, Sihao Liu, Xinrui Zhou, Jing Li, Zhanyun Tang, Shengyu Zhang, Fei Wu, Feng Lin

Abstract: Although 3D Gaussian Splatting (3DGS) has recently made progress in 3D human reconstruction, it primarily relies on 2D pixel-level supervision, overlooking the geometric complexity and topological relationships of different body parts. To address this gap, we introduce the Hierarchical Graph Human Gaussian Control (HUGS) framework for achieving high-fidelity 3D human reconstruction. Our approach involves leveraging explicit semantic priors of body parts to ensure the consistency of geometric topology, thereby enabling the capture of the complex geometrical and topological associations among body parts. Additionally, we disentangle high-frequency features from global human features to refine surface details in body parts. Extensive experiments demonstrate that our method exhibits superior performance in human body reconstruction, particularly in enhancing surface details and accurately reconstructing body part junctions. Codes are available at https://wanghongsheng01.github.io/HUGS/.

Submitted 21 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.11102 [pdf, other]

CMOS-Compatible, AlScN-Based Integrated Electro-Optic Modulator

Authors: Valerie Yoshioka, Jicheng Jin, Haiqi Zhou, Zichen Tang, Roy H. Olsson III, Bo Zhen

Abstract: Commercial production of integrated photonic devices is limited by scalability of desirable material platforms. We explore a relatively new photonic material, AlScN, for its use in electro-optic modulation. Its CMOS-compatibility could facilitate large-scale production of integrated photonic modulators, and it exhibits an enhanced second-order optical nonlinearity compared to intrinsic AlN, indicating the possibility for efficient modulation. Here, we measure the electro-optic effect in AlScN-based modulators, demonstrating $V_πL$ around 750 V$\cdot$cm. Since the electro-optic response is smaller than expected, we discuss potential causes for the reduced response and future outlook for AlScN-based photonics.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 9 pages, 3 figures

arXiv:2405.10235 [pdf]

Novel Data Models for Inter-operable LCA Frameworks

Authors: Kourosh Malek, Max Dreger, Zirui Tang, Qingshi Tu

Abstract: Life cycle assessment (LCA) plays a critical role in assessing the environmental impacts of a product, technology, or service throughout its entire life cycle. Nonetheless, many existing LCA tools and methods lack adequate metadata management, which can hinder their further development and wide adoption. In the example of LCA for clean energy technologies, metadata helps monitor data and the environment that holds the integrity of the energy assets and sustainability of the materials sources across their entire value chains. Ontologizing metadata, i.e. a common vocabulary and language to connect multiple data sources, as well as implementing AI-aware data management, can have long-lasting, positive, and accelerating effects along with collecting and utilizing quality data from different sources and across the entire data lifecycle. The integration of ontologies in life cycle assessments has garnered significant attention in recent years. We synthesized the existing literature on ontologies for LCAs, providing insights into this interdisciplinary field's evolution, current state, and future directions. We also proposed the framework for a suitable data model and the workflow thereof to warrant the alignment with existing ontologies, practical frameworks, and industry standards.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.07908 [pdf, other]

Collaborative Planar Pushing of Polytopic Objects with Multiple Robots in Complex Scenes

Authors: Zili Tang, Yuming Feng, Meng Guo

Abstract: Pushing is a simple yet effective skill for robots to interact with and further change the environment. Related work has been mostly focused on utilizing it as a non-prehensile manipulation primitive for a robotic manipulator. However, it can also be beneficial for low-cost mobile robots that are not equipped with a manipulator. This work tackles the general problem of controlling a team of mobile robots to collaboratively push polytopic objects within complex obstacle-cluttered environments. It incorporates several characteristic challenges for contact-rich tasks such as the hybrid switching among different contact modes and under-actuation due to constrained contact forces. The proposed method is based on hybrid optimization over a sequence of possible modes and the associated pushing forces, where (i) a set of sufficient modes is generated with a multi-directional feasibility estimation, based on quasi-static analyses for general objects and any number of robots; (ii) a hierarchical hybrid search algorithm is designed to iteratively decompose the navigation path via arc segments and select the optimal parameterized mode; and (iii) a nonlinear model predictive controller is proposed to track the desired pushing velocities adaptively online for each robot. The proposed framework is complete under mild assumptions. Its efficiency and effectiveness are validated in high-fidelity simulations and hardware experiments. Robustness to motion and actuation uncertainties is also demonstrated.

Submitted 1 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: Robotics: Science and Systems (RSS) 2024. Videos are available at https://zilitang.github.io/Collaborative-Pushing

arXiv:2405.07691 [pdf, other]

Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma ray source, 1LHAASO J1219+2915. In this paper a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at a few months level in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $γ$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$σ$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures

arXiv:2405.07120 [pdf]

doi 10.1103/PhysRevApplied.21.054019

Quasiparticle and Excitonic Structures of Few-layer and Bulk GaSe: Interlayer Coupling, Self-energy, and Electron-hole Interaction

Authors: Fanhao Jia, Zhao Tang, Greis J. Cruz, Weiwei Gao, Shaowen Xu, Wei Ren, Peihong Zhang

Abstract: Metal monochalcogenide GaSe is a classic layered semiconductor that has received increasing research interest due to its highly tunable electronic and optical properties for ultrathin electronics applications. Despite intense research efforts, a systematic understanding of the layer-dependent electronic and optical properties of GaSe remains to be established, and there appear to be significant discrepancies between different experiments. We have performed GW plus Bethe-Salpeter equation (BSE) calculations for few-layer and bulk GaSe, aiming at understanding the effects of interlayer coupling and dielectric screening on excited state properties of GaSe, and how the electronic and optical properties evolve from strongly two-dimensional (2D) like to intermediate thick layers, and to three-dimensional (3D) bulk character. Using a new definition of the exciton binding energy, we are able to calculate the binding energies of all excitonic states. Our results reveal an interesting correlation between the binding energy of an exciton and the spread of its wave function in the real and momentum spaces. We find that the existence of (nearly) parallel valence and conduction bands facilitates the formation of excitonic states that spread out in the momentum space. Thus, these excitons tend to be more localized in real space and have large exciton binding energies. The interlayer coupling substantially suppresses the Mexican-hat-like dispersion of the top valence band seen in the monolayer system, explaining the greatly enhanced photoluminescence (PL) as layer thickness increases. Our results also help resolve apparent discrepancies between different experiments. After including the quasiparticle and excitonic effects as well as the optical activities of excitons, our results compare well with available experimental results.

Submitted 11 May, 2024; originally announced May 2024.

Journal ref: Phys. Rev. Applied 21, 054019 (2024)

arXiv:2405.05957 [pdf, other]

OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

Authors: Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

Abstract: Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived from multi-stage compression and continual pre-training from the original 15B OpenBA model. OpenBA-V2 utilizes more data, more flexible training objectives, and techniques such as layer pruning, neural pruning, and vocabulary pruning to achieve a compression rate of 77.3% with minimal performance loss. OpenBA-V2 demonstrates competitive performance compared to other open-source models of similar size, achieving results close to or on par with the 15B OpenBA model in downstream tasks such as common sense reasoning and Named Entity Recognition (NER). OpenBA-V2 illustrates that LLMs can be compressed into smaller ones with minimal performance loss by employing advanced training objectives and data strategies, which may help deploy LLMs in resource-limited scenarios.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05584 [pdf, other]

A Survey on Backbones for Deep Video Action Recognition

Authors: Zixuan Tang, Youjun Zhao, Yuhang Wen, Mengyuan Liu

Abstract: Action recognition is a key technology in building interactive metaverses. With the rapid development of deep learning, methods in action recognition have also achieved great advancement. Researchers design and implement backbones from multiple standpoints, which leads to a diversity of methods and to new challenges. This paper reviews several action recognition methods based on deep neural networks. We introduce these methods in three parts: 1) Two-Streams networks and their variants, which, specifically in this paper, use RGB video frame and optical flow modality as input; 2) 3D convolutional networks, which make efforts in taking advantage of RGB modality directly while extracting different motion information is no longer necessary; 3) Transformer-based methods, which introduce the model from natural language processing into computer vision and video understanding. We offer objective insights in this review and hopefully provide a reference for future research.

Submitted 9 May, 2024; originally announced May 2024.

Comments: This paper has been accepted by ICME workshop

arXiv:2405.05004 [pdf, other]

TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking

Authors: Pengcheng Shao, Tianyang Xu, Zhangyong Tang, Linze Li, Xiao-Jun Wu, Josef Kittler

Abstract: There is currently strong interest in improving visual object tracking by augmenting the RGB modality with the output of a visual event camera that is particularly informative about the scene motion. However, existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models, which have been optimised for RGB only tracking, without adapting them for the intrinsic characteristics of the event data. To address this problem, we propose an Event backbone (Pooler), designed to obtain a high-quality feature representation that is cognisant of the innate characteristics of the event data, namely its sparsity. In particular, Multi-Scale Pooling is introduced to capture all the motion feature trends within event data through the utilisation of diverse pooling kernel sizes. The association between the derived RGB and event representations is established by an innovative module performing adaptive Mutually Guided Fusion (MGF). Extensive experimental results show that our method significantly outperforms state-of-the-art trackers on two widely used RGB-E tracking datasets, including VisEvent and COESOT, where the precision and success rates on COESOT are improved by 4.9% and 5.2%, respectively. Our code will be available at https://github.com/SSSpc333/TENet.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.02309 [pdf, other]

YAP:Ce scintillator as an absolute ultracold neutron detector

Authors: M. Krivoš, Z. Tang, N. Floyd, C. L. Morris, M. Blatnik, C. Cude-Woods, S. M. Clayton, A. T. Holley, T. M. Ito, C. -Y. Liu, M. Makela, I. F. Martinez, A. S. C. Navazo, C. M. O'Shaughnessy, E. L. Renner, R. W. Pattie, A. R. Young

Abstract: The upcoming UCNProBe experiment at Los Alamos National Laboratory will measure the $β$-decay rate of free neutrons with different systematic uncertainties than previous beam-based neutron lifetime experiments. We have developed a new $^{10}$B-coated YAP:Ce scintillator whose properties are presented. The advantage of the YAP:Ce scintillator is its high Fermi potential, which reduces the probability for upscattering of ultracold neutrons, and its short decay time, which is important at high counting rates. Birks' coefficient of YAP:Ce was measured to be $(5.56^{+0.05}_{-0.30})\times 10^{-4}$ cm/MeV and light losses due to 120 nm of $^{10}$B-coating to be about 60%. The loss of light from YAP:Ce due to transmission through deuterated polystyrene scintillator was about 50%. The efficiency for counting neutrons that are captured on the $^{10}$B coating is (86.82 $\pm$ 2.61)%. Measurement with ultracold neutrons showed that YAP:Ce crystal counted 8% to 28% more UCNs compared to ZnS screen. This may be due to an uneven coating of $^{10}$B on the rough surface.

Submitted 27 March, 2024; originally announced May 2024.

arXiv:2405.00168 [pdf, other]

Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Method

Authors: Zhangyong Tang, Tianyang Xu, Zhenhua Feng, Xuefeng Zhu, He Wang, Pengcheng Shao, Chunyang Cheng, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler

Abstract: RGBT tracking draws increasing attention due to its robustness in multi-modality warranting (MMW) scenarios, such as nighttime and bad weather, where relying on a single sensing modality fails to ensure stable tracking results. However, the existing benchmarks predominantly consist of videos collected in common scenarios where both RGB and thermal infrared (TIR) information are of sufficient quality. This makes the data unrepresentative of severe imaging conditions, leading to tracking failures in MMW scenarios. To bridge this gap, we present a new benchmark, MV-RGBT, captured specifically in MMW scenarios. In contrast with the existing datasets, MV-RGBT comprises more object categories and scenes, providing a diverse and challenging benchmark. Furthermore, for severe imaging conditions of MMW scenarios, a new problem is posed, namely \textit{when to fuse}, to stimulate the development of fusion strategies for such data. We propose a new method based on a mixture of experts, namely MoETrack, as a baseline fusion strategy. In MoETrack, each expert generates independent tracking results along with the corresponding confidence score, which is used to control the fusion process. Extensive experimental results demonstrate the significant potential of MV-RGBT in advancing RGBT tracking and elicit the conclusion that fusion is not always beneficial, especially in MMW scenarios. Significantly, the proposed MoETrack method achieves new state-of-the-art results not only on MV-RGBT, but also on standard benchmarks, such as RGBT234, LasHeR, and the short-term split of VTUAV (VTUAV-ST). More information of MV-RGBT and the source code of MoETrack will be released at https://github.com/Zhangyong-Tang/MoETrack.

Submitted 30 April, 2024; originally announced May 2024.

arXiv:2404.19231 [pdf]

Evolution of static to dynamic mechanical behavior in topological nonreciprocal robotic metamaterials

Authors: Zehuan Tang, Tingfeng Ma, Hui Chen, Yuanwen Gao

Abstract: Based on the Maxwell-Betti reciprocity theorem, static non-reciprocity has been realized by using nonlinearity, but this non-reciprocity has strict restrictions on input amplitude and structure size (number of units). Here, we propose a robotic metamaterial with two components of displacement and rotation, which uses active control to add external forces on the units to break reciprocity at the level of the interactions between the units. We show analytically and through simulation that breaking reciprocity at the level of the interactions directly leads to a strong asymmetric response of displacement in a static system. This displacement-specific characteristic not only has no restrictions on size, input amplitude, and suitable geometric asymmetry, but also can be transmitted to rotation by coupling under large deformation. After the evolution from statics to dynamics, asymmetric transmission and unidirectional amplification of vector solitons are both implemented in this system. Our research uncovers the evolution of static non-reciprocity to dynamic non-reciprocity while building a bridge between non-reciprocity physics and soliton science.

Submitted 26 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.
