Induced Domain Walls of QCD Axion, and Gravitational Waves

Authors: Junseok Lee, Kai Murai, Fuminobu Takahashi, Wen Yin

Abstract: We show that heavy axion domain walls induce domain walls of the QCD axion through a mixing between the heavy axion and the QCD axion, even when the pre-inflationary initial condition is assumed for the QCD axion. The induced domain walls arise because the effective $θ$ parameter changes across the heavy axion domain walls, shifting the potential minimum of the QCD axion. When the heavy axion domain walls collapse, the induced QCD axion domain walls collapse as well. This novel mechanism for producing the QCD axions can explain dark matter even with the axion decay constant as small as ${\cal O}(10^{9})$ GeV. In particular, this scenario requires domain wall collapse near the QCD crossover, potentially accounting for the stochastic gravitational wave background suggested by recent pulsar timing array observations, including NANOGrav. Using this mechanism, it is also possible to easily create induced domain walls for string axions or axions with a large decay constant, which would otherwise be challenging. We also comment on the implications for cosmic birefringence using induced axion domain walls.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 33 pages, 8 figures

Report number: TU-1237

arXiv:2407.07924 [pdf, other]

Solving General Natural-Language-Description Optimization Problems with Large Language Models

Authors: Jihai Zhang, Wei Wang, Siyan Guo, Li Wang, Fangquan Lin, Cheng Yang, Wotao Yin

Abstract: Optimization problems seek to find the best solution to an objective under a set of constraints, and have been widely investigated in real-world applications. Modeling and solving optimization problems in a specific domain typically require a combination of domain knowledge, mathematical skills, and programming ability, making it difficult for general users and even domain professionals. In this paper, we propose a novel framework called OptLLM that augments LLMs with external solvers. Specifically, OptLLM accepts user queries in natural language, converts them into mathematical formulations and programming code, and calls the solvers to calculate the results for decision-making. In addition, OptLLM supports multi-round dialogues to gradually refine the modeling and solving of optimization problems. To illustrate the effectiveness of OptLLM, we provide tutorials on three typical optimization applications and conduct experiments on both prompt-based GPT models and a fine-tuned Qwen model using a large-scale self-developed optimization dataset. Experimental results show that OptLLM works with various LLMs, and the fine-tuned model achieves an accuracy boost compared to the prompt-based models. Some features of the OptLLM framework have been available for trial since June 2023 (https://opt.alibabacloud.com/chat or https://opt.aliyun.com/chat).

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2406.17028 [pdf, other]

Cosmic Stability of Dark Matter from Pauli Blocking

Authors: Brian Batell, Wen Yin

Abstract: Why does dark matter (DM) live longer than the age of the Universe? Here we study a novel sub-eV scalar DM candidate whose stability is due to the Pauli exclusion of its fermionic decay products. We analyze the stability of the DM condensate against decays, scatterings (i.e., evaporation), and parametric resonance, delineating the viable parameter regions in which DM is cosmologically stable. In a minimal scenario in which the scalar DM decays to a pair of new exotic fermions, we find that scattering can populate an interacting thermal dark sector component to energies far above the DM mass. This self-interacting dark radiation may potentially alleviate the Hubble tensions. Furthermore, our scenario can be probed through precise measurements of the halo mass function or the masses of dwarf spheroidal galaxies since scattering prevents the DM from becoming too dense. On the other hand, if the lightest neutrino stabilizes the DM, the cosmic neutrino background (C$ν$B) can be significantly altered from the $Λ$CDM prediction and thus be probed in the future by C$ν$B detection experiments.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 47 pages, 5 figures

Report number: TU-1168, PITT-PACC-2403

arXiv:2406.16253 [pdf, other]

LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, et al. (15 additional authors not shown)

Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload? This study focuses on how LLMs can assist NLP researchers, particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) "deficiency" labels and corresponding explanations for individual segments of each review, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) "LLMs as Reviewers": how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) "LLMs as Metareviewers": how effectively can LLMs identify potential issues, such as deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.

Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.16203 [pdf, other]

LLMs' Classification Performance is Overclaimed

Authors: Hanzi Xu, Renze Lou, Jiangshu Du, Vahid Mahzoon, Elmira Talebianaraki, Zhuoan Zhou, Elizabeth Garrison, Slobodan Vucetic, Wenpeng Yin

Abstract: In many classification tasks designed for AI or humans to solve, gold labels are typically included within the label space by default, often posed as "which of the following is correct?" This standard setup has traditionally highlighted the strong performance of advanced AI, particularly top-performing Large Language Models (LLMs), in routine classification tasks. However, when the gold label is intentionally excluded from the label space, it becomes evident that LLMs still attempt to select from the available label candidates, even when none are correct. This raises a pivotal question: Do LLMs truly demonstrate their intelligence in understanding the essence of classification tasks? In this study, we evaluate both closed-source and open-source LLMs across representative classification tasks, arguing that the perceived performance of LLMs is overstated due to their inability to exhibit the expected comprehension of the task. This paper makes a threefold contribution: i) To our knowledge, this is the first work to identify the limitations of LLMs in classification tasks when gold labels are absent. We define this task as Classify-w/o-Gold and propose it as a new testbed for LLMs. ii) We introduce a benchmark, Know-No, comprising two existing classification tasks and one new task, to evaluate Classify-w/o-Gold. iii) This work defines and advocates for a new evaluation metric, OmniAccuracy, which assesses LLMs' performance in classification tasks both when gold labels are present and absent.

Submitted 3 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.13103 [pdf, other]

A Generic Method for Fine-grained Category Discovery in Natural Language Texts

Authors: Chang Tian, Matthew B. Blaschko, Wenpeng Yin, Mingzhe Xing, Yinliang Yue, Marie-Francine Moens

Abstract: Fine-grained category discovery using only coarse-grained supervision is a cost-effective yet challenging task. Previous training methods focus on aligning query samples with positive samples and distancing them from negatives. They often neglect intra-category and inter-category semantic similarities of fine-grained categories when navigating sample distributions in the embedding space. Furthermore, some evaluation techniques that rely on pre-collected test samples are inadequate for real-time applications. To address these shortcomings, we introduce a method that successfully detects fine-grained clusters of semantically similar texts guided by a novel objective function. The method uses semantic similarities in a logarithmic space to guide sample distributions in the Euclidean space and to form distinct clusters that represent fine-grained categories. We also propose a centroid inference mechanism to support real-time applications. The efficacy of the method is both theoretically justified and empirically confirmed on three benchmark tasks. The proposed objective function is integrated in multiple contrastive learning based neural models. Its results surpass existing state-of-the-art approaches in terms of Accuracy, Adjusted Rand Index and Normalized Mutual Information of the detected fine-grained categories. Code and data will be available at https://github.com/XX upon publication.

Submitted 18 June, 2024; originally announced June 2024.

Comments: preprint

arXiv:2406.12554 [pdf, other]

Populating secluded dark sector with ultra-relativistic bubbles

Authors: Aleksandr Azatov, Xander Nagels, Miguel Vanvlasselaer, Wen Yin

Abstract: We study Dark Matter production during first order phase transitions from bubble-plasma collisions. We focus on scenarios where the Dark Matter sector is secluded and its interaction with the visible sector (including the Standard Model) originates from dimension-five and dimension-six operators. We find that such DM is generally heavy and has a large initial velocity, leading to the possibility of DM being warm today. We differentiate between the cases of weakly and strongly coupled dark sectors, where, in the latter case, we focus on glueball DM, which turns out to have very distinct phenomenological properties. We also systematically compute the Freeze-In production of the dark sector and compare it with the bubble-plasma DM abundances.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 7 figures, 26 pages + appendices

Report number: SISSA 11/2024/FISI

arXiv:2406.05938 [pdf, other]

Expressive Power of Graph Neural Networks for (Mixed-Integer) Quadratic Programs

Authors: Ziang Chen, Xiaohan Chen, Jialin Liu, Xinshang Wang, Wotao Yin

Abstract: Quadratic programming (QP) is the most widely applied category of problems in nonlinear programming. Many applications require real-time/fast solutions, though not necessarily with high precision. Existing methods either involve matrix decomposition or use the preconditioned conjugate gradient method. For relatively large instances, these methods cannot achieve the real-time requirement unless there is an effective precondition. Recently, graph neural networks (GNNs) opened new possibilities for QP. Some promising empirical studies of applying GNNs for QP tasks show that GNNs can capture key characteristics of an optimization instance and provide adaptive guidance accordingly to crucial configurations during the solving process, or directly provide an approximate solution. Despite notable empirical observations, theoretical foundations are still lacking. In this work, we investigate the expressive or representative power of GNNs, a crucial aspect of neural network theory, specifically in the context of QP tasks, with both continuous and mixed-integer settings. We prove the existence of message-passing GNNs that can reliably represent key properties of quadratic programs, including feasibility, optimal objective value, and optimal solution. Our theory is validated by numerical results.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05602 [pdf, other]

Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

Authors: Philip Wootaek Shin, Jihyun Janice Ahn, Wenpeng Yin, Jack Sampson, Vijaykrishnan Narayanan

Abstract: It has been shown that many generative models inherit and amplify societal biases. To date, there is no uniform/systematic agreed standard to control/adjust for these biases. This study examines the presence and manipulation of societal biases in leading text-to-image models: Stable Diffusion, DALL-E 3, and Adobe Firefly. Through a comprehensive analysis combining base prompts with modifiers and their sequencing, we uncover the nuanced ways these AI technologies encode biases across gender, race, geography, and region/culture. Our findings reveal the challenges and potential of prompt engineering in controlling biases, highlighting the critical need for ethical AI development promoting diversity and inclusivity. This work advances AI ethics by not only revealing the nuanced dynamics of bias in text-to-image generation models but also by offering a novel framework for future research in controlling bias. Our contributions (spanning comparative analyses, the strategic use of prompt modifiers, the exploration of prompt sequencing effects, and the introduction of a bias sensitivity taxonomy) lay the groundwork for the development of common metrics and standard analyses for evaluating whether and how future AI models exhibit and respond to requests to adjust for inherent biases.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05460 [pdf, other]

Fighting Against the Repetitive Training and Sample Dependency Problem in Few-shot Named Entity Recognition

Authors: Chang Tian, Wenpeng Yin, Dan Li, Marie-Francine Moens

Abstract: Few-shot named entity recognition (NER) systems recognize entities using a few labeled training examples. The general pipeline consists of a span detector to identify entity spans in text and an entity-type classifier to assign types to entities. Current span detectors rely on extensive manual labeling to guide training. Almost every span detector requires initial training on basic span features followed by adaptation to task-specific features. This process leads to repetitive training of the basic span features among span detectors. Additionally, metric-based entity-type classifiers, such as prototypical networks, typically employ a specific metric that gauges the distance between the query sample and entity-type referents, ultimately assigning the most probable entity type to the query sample. However, these classifiers encounter the sample dependency problem, primarily stemming from the limited samples available for each entity-type referent. To address these challenges, we propose an improved few-shot NER pipeline. First, we introduce a steppingstone span detector that is pre-trained on open-domain Wikipedia data. It can be used to initialize the pipeline span detector to reduce the repetitive training of basic features. Second, we leverage a large language model (LLM) to set reliable entity-type referents, eliminating reliance on few-shot samples of each type. Our model exhibits superior performance with fewer training steps and human-labeled data compared with baselines, as demonstrated through extensive experiments on various datasets. Particularly in fine-grained few-shot NER settings, our model outperforms strong baselines, including ChatGPT. We will publicly release the code, datasets, LLM outputs, and model checkpoints.

Submitted 18 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

Comments: IEEE Access: https://doi.org/10.1109/ACCESS.2024.3374727

arXiv:2406.02006 [pdf, other]

ODE-based Learning to Optimize

Authors: Zhonglin Xie, Wotao Yin, Zaiwen Wen

Abstract: Recent years have seen a growing interest in understanding acceleration methods through the lens of ordinary differential equations (ODEs). Despite the theoretical advancements, translating the rapid convergence observed in continuous-time models to discrete-time iterative methods poses significant challenges. In this paper, we present a comprehensive framework integrating the inertial systems with Hessian-driven damping equation (ISHD) and learning-based approaches for developing optimization methods through a deep synergy of theoretical insights. We first establish the convergence condition for ensuring the convergence of the solution trajectory of ISHD. Then, we show that provided the stability condition, another relaxed requirement on the coefficients of ISHD, the sequence generated through the explicit Euler discretization of ISHD converges, which gives a large family of practical optimization methods. In order to select the best optimization method in this family for certain problems, we introduce the stopping time, the time required for an optimization method derived from ISHD to achieve a predefined level of suboptimality. Then, we formulate a novel learning to optimize (L2O) problem aimed at minimizing the stopping time subject to the convergence and stability condition. To navigate this learning problem, we present an algorithm combining stochastic optimization and the penalty method (StoPM). The convergence of StoPM using the conservative gradient is proved. Empirical validation of our framework is conducted through extensive numerical experiments across a diverse set of optimization problems. These experiments showcase the superior performance of the learned optimization methods.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 55 pages, 28 figures

arXiv:2405.19978 [pdf, other]

Domain Adaptation with Cauchy-Schwarz Divergence

Authors: Wenzhe Yin, Shujian Yu, Yicong Lin, Jie Liu, Jan-Jakob Sonke, Efstratios Gavves

Abstract: Domain adaptation aims to use training data from one or multiple source domains to learn a hypothesis that can be generalized to a different, but related, target domain. As such, having a reliable measure for evaluating the discrepancy of both marginal and conditional distributions is crucial. We introduce Cauchy-Schwarz (CS) divergence to the problem of unsupervised domain adaptation (UDA). The CS divergence offers a theoretically tighter generalization error bound than the popular Kullback-Leibler divergence. This holds for the general case of supervised learning, including multi-class classification and regression. Furthermore, we illustrate that the CS divergence enables a simple estimator on the discrepancy of both marginal and conditional distributions between source and target domains in the representation space, without requiring any distributional assumptions. We provide multiple examples to illustrate how the CS divergence can be conveniently used in both distance metric- or adversarial training-based UDA frameworks, resulting in compelling performance.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted by UAI-24

arXiv:2405.17705 [pdf, other]

DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos

Authors: Linhan Wang, Kai Cheng, Shuo Lei, Shengkun Wang, Wei Yin, Chenyang Lei, Xiaoxiao Long, Chang-Tien Lu

Abstract: We present DC-Gaussian, a new method for generating novel views from in-vehicle dash cam videos. While neural rendering techniques have made significant strides in driving scenarios, existing methods are primarily designed for videos collected by autonomous vehicles. However, these videos are limited in both quantity and diversity compared to dash cam videos, which are more widely used across various types of vehicles and capture a broader range of scenarios. Dash cam videos often suffer from severe obstructions such as reflections and occlusions on the windshields, which significantly impede the application of neural rendering techniques. To address this challenge, we develop DC-Gaussian based on the recent real-time neural rendering technique 3D Gaussian Splatting (3DGS). Our approach includes an adaptive image decomposition module to model reflections and occlusions in a unified manner. Additionally, we introduce illumination-aware obstruction modeling to manage reflections and occlusions under varying lighting conditions. Lastly, we employ a geometry-guided Gaussian enhancement strategy to improve rendering details by incorporating additional geometry priors. Experiments on self-captured and public dash cam videos show that our method not only achieves state-of-the-art performance in novel view synthesis, but also accurately reconstructs captured scenes free of obstructions.

Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Comments: 9 pages, 7 figures; project page: https://linhanwang.github.io/dcgaussian/

arXiv:2405.16020 [pdf, ps, other]

Block Acceleration Without Momentum: On Optimal Stepsizes of Block Gradient Descent for Least-Squares

Authors: Liangzu Peng, Wotao Yin

Abstract: Block coordinate descent is a powerful algorithmic template suitable for big data optimization. This template admits a lot of variants including block gradient descent (BGD), which performs gradient descent on a selected block of variables, while keeping other variables fixed. For a very long time, the stepsize for each block has tacitly been set to one divided by the block-wise Lipschitz smoothness constant, imitating the vanilla stepsize rule for gradient descent (GD). However, such a choice for BGD has not yet been able to theoretically justify its empirical superiority over GD, as existing convergence rates for BGD have worse constants than GD in the deterministic cases. To discover such theoretical justification, we set up a simple environment where we consider BGD applied to least-squares with two blocks of variables. Assuming the data matrix corresponding to each block is orthogonal, we find optimal stepsizes of BGD in closed form, which provably lead to asymptotic convergence rates twice as fast as GD with Polyak's momentum; this means, under that orthogonality assumption, one can accelerate BGD by just tuning stepsizes and without adding any momentum. An application that satisfies this assumption is \textit{generalized alternating projection} between two subspaces, and applying our stepsizes to it improves the prior convergence rate that was once claimed, slightly inaccurately, to be optimal. The main proof idea is to minimize, in stepsize variables, the spectral radius of a matrix that controls convergence rates.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 36 pages, accepted to ICML 2024

arXiv:2405.15251 [pdf, other]

Learning to optimize: A tutorial for continuous and mixed-integer optimization

Authors: Xiaohan Chen, Jialin Liu, Wotao Yin

Abstract: Learning to Optimize (L2O) stands at the intersection of traditional optimization and machine learning, utilizing the capabilities of machine learning to enhance conventional optimization techniques. As real-world optimization problems frequently share common structures, L2O provides a tool to exploit these structures for better or faster solutions. This tutorial dives deep into L2O techniques, introducing how to accelerate optimization algorithms, promptly estimate the solutions, or even reshape the optimization problem itself, making it more adaptive to real-world applications. By considering the prerequisites for successful applications of L2O and the structure of the optimization problems at hand, this tutorial provides a comprehensive guide for practitioners and researchers alike.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14741 [pdf, other]

Bagging Improves Generalization Exponentially

Authors: Huajie Qian, Donghao Ying, Henry Lam, Wotao Yin

Abstract: Bagging is a popular ensemble technique to improve the accuracy of machine learning models. It hinges on the well-established rationale that, by repeatedly retraining on resampled data, the aggregated model exhibits lower variance and hence higher stability, especially for discontinuous base learners. In this paper, we provide a new perspective on bagging: By suitably aggregating the base learners at the parametrization instead of the output level, bagging improves generalization performances exponentially, a strength that is significantly more powerful than variance reduction. More precisely, we show that for general stochastic optimization problems that suffer from slowly (i.e., polynomially) decaying generalization errors, bagging can effectively reduce these errors to an exponential decay. Moreover, this power of bagging is agnostic to the solution schemes, including common empirical risk minimization, distributionally robust optimization, and various regularizations. We demonstrate how bagging can substantially improve generalization performances in a range of examples involving heavy-tailed data that suffer from intrinsically slow rates.

Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: Correct author list typo

arXiv:2405.10303 [pdf, other]

Asymmetric Warm Dark Matter: from Cosmological Asymmetry to Chirality of Life

Authors: Wen Yin, Shota Nakagawa, Tamaki Murokoshi, Makoto Hattori

Abstract: We investigate a novel scenario involving asymmetric keV-range dark matter (DM) in the form of right-handed (sterile) neutrinos. Based on the Fermi-Dirac distribution, we demonstrate that asymmetric fermionic DM forms a Fermi degenerate gas, making it potentially colder than symmetric fermionic DM. This setup simultaneously accounts for the Universe's baryon asymmetry through tiny Yukawa interactions with Standard Model leptons and the Higgs field, and the homochirality of amino acids via decay into circularly polarized photons. This scenario can be investigated through soft X-ray searches conducted by current and upcoming space missions. Helical X-rays are a smoking-gun signal of our scenario. Additionally, we propose a new mechanism to suppress DM thermal production by introducing a light modulus, which may also benefit cosmology involving generic right-handed neutrinos with large mixing.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 22 pages, 3 figures, comments are welcome

arXiv:2405.10205 [pdf, other]

Exploring the Impact of ChatGPT on Wikipedia Engagement

Authors: Neal Reeves, Wenjie Yin, Elena Simperl

Abstract: Wikipedia is one of the most popular websites in the world, serving as a major source of information and learning resource for millions of users worldwide. While motivations for its usage vary, prior research suggests shallow information gathering -- looking up facts and information or answering questions -- dominates over more in-depth usage. On the 22nd of November 2022, ChatGPT was released to the public and has quickly become a popular source of information, serving as an effective question-answering and knowledge gathering resource. Early indications have suggested that it may be drawing users away from traditional question answering services such as Stack Overflow, raising the question of how it may have impacted Wikipedia. In this paper, we explore Wikipedia user metrics across four areas: page views, unique visitor numbers, edit counts and editor numbers within twelve language instances of Wikipedia. We perform pairwise comparisons of these metrics before and after the release of ChatGPT and implement a panel regression model to observe and quantify longer-term trends. We find no evidence of a fall in engagement across any of the four metrics, instead observing that page views and visitor numbers increased in the period following ChatGPT's launch. However, we observe a lower increase in languages where ChatGPT was available than in languages where it was not, which may suggest ChatGPT's availability limited growth in those languages. Our results contribute to the understanding of how emerging generative AI tools are disrupting the Web ecosystem.

Submitted 29 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: 12 pages, 4 figures, submitted to ACM Collective Intelligence

arXiv:2404.19417 [pdf, other]

Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World

Authors: Wen Yin, Jian Lou, Pan Zhou, Yulai Xie, Dan Feng, Yuhua Sun, Tailai Zhang, Lichao Sun

Abstract: Backdoor attacks have been well-studied in visible light object detection (VLOD) in recent years. However, VLOD cannot work effectively in dark and temperature-sensitive scenarios. Instead, thermal infrared object detection (TIOD) is the most accessible and practical in such environments. In this paper, our team is the first to investigate the security vulnerabilities associated with TIOD in the context of backdoor attacks, spanning both the digital and physical realms. We introduce two novel types of backdoor attacks on TIOD, each offering unique capabilities: Object-affecting Attack and Range-affecting Attack. We conduct a comprehensive analysis of key factors influencing trigger design, which include temperature, size, material, and concealment. These factors, especially temperature, significantly impact the efficacy of backdoor attacks on TIOD. A thorough understanding of these factors will serve as a foundation for designing physical triggers and temperature controlling experiments. Our study includes extensive experiments conducted in both digital and physical environments. In the digital realm, we evaluate our approach using benchmark datasets for TIOD, achieving an Attack Success Rate (ASR) of up to 98.21%. In the physical realm, we test our approach in two real-world settings: a traffic intersection and a parking lot, using a thermal infrared camera. Here, we attain an ASR of up to 98.38%.

Submitted 30 April, 2024; originally announced April 2024.

Comments: To appear in CVPR 2024. 11 pages, 8 figures and 4 tables

arXiv:2404.18372 [pdf, other]

Integrable semi-discretization for a modified Camassa-Holm equation with cubic nonlinearity

Authors: Bao-Feng Feng, Heng-Chun Hu, Han-Han Sheng, Wei Yin, Guo-Fu Yu

Abstract: In the present paper, an integrable semi-discretization of the modified Camassa-Holm (mCH) equation with cubic nonlinearity is presented. The key points of the construction are based on the discrete Kadomtsev-Petviashvili (KP) equation and appropriate definition of discrete reciprocal transformations. First, we demonstrate that these bilinear equations and their determinant solutions can be derived from the discrete KP equation through Miwa transformation and some reductions. Then, by scrutinizing the reduction process, we obtain a set of semi-discrete bilinear equations and their general soliton solutions in the Gram-type determinant form. Finally, we obtain an integrable semi-discrete analog of the mCH equation by introducing dependent variables and discrete reciprocal transformation. It is also shown that the semi-discrete mCH equation converges to the continuous one in the continuum limit.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.15506 [pdf, other]

Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

Authors: Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen

Abstract: We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods have limited zero-shot performance due to the lack of large-scale labeled data. To tackle these issues, we propose solutions for both metric depth estimation and surface normal estimation. For metric depth estimation, we show that the key to a zero-shot single-view model lies in resolving the metric ambiguity from various camera models and large-scale data training. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problem and can be effortlessly plugged into existing monocular models. For surface normal estimation, we propose a joint depth-normal optimization module to distill diverse data knowledge from metric depth, enabling normal estimators to learn beyond normal labels. Equipped with these modules, our depth-normal models can be stably trained with over 16 million images from thousands of camera models with different-type annotations, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology. Our project page is at https://JUGGHM.github.io/Metric3Dv2.

Submitted 21 March, 2024; originally announced April 2024.

Comments: Our project page is at https://JUGGHM.github.io/Metric3Dv2. arXiv admin note: substantial text overlap with arXiv:2307.10984

arXiv:2404.06444 [pdf, other]

Cosmic Clues: DESI, Dark Energy, and the Cosmological Constant Problem

Authors: Wen Yin

Abstract: Several attempts to solve the cosmological constant problem, which concerns the value of the cosmological constant being far smaller than the Standard Model mass scales, have introduced a scalar field with a very flat potential that can be approximated as linear around any given position. The scalar field scans the cosmological constant in such a way that the current small value is explained. Recently, the Dark Energy Spectroscopic Instrument (DESI) reported its first-year results. Combining the data with CMB, Pantheon, Union3, and/or DES-SN5YR, there is a preference or anomaly, indicating that the dark energy in the current Universe slightly deviates from that in the $Λ$CDM model and varies over time. In this paper, I show that the simple linear potential of a scalar field that may explain the small cosmological constant can explain the DESI anomaly. In particular, the model proposed by the present author in 2108.04246, which relaxes the cosmological constant by the condition that inflation ends, predicts a time-dependence of the dark energy close to the one favored by the data.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 9 pages, 1 figure

arXiv:2404.03602 [pdf, other]

Evaluating LLMs at Detecting Errors in LLM Responses

Authors: Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Ranran Haoran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang

Abstract: With Large Language Models (LLMs) being widely used across various tasks, detecting errors in their responses is increasingly crucial. However, little research has been conducted on error detection of LLM responses. Collecting error annotations on LLM responses is challenging due to the subjective nature of many NLP tasks, and thus previous research focuses on tasks of little practical value (e.g., word sorting) or limited error types (e.g., faithfulness in summarization). This work introduces ReaLMistake, the first error detection benchmark consisting of objective, realistic, and diverse errors made by LLMs. ReaLMistake contains three challenging and meaningful tasks that introduce objectively assessable errors in four categories (reasoning correctness, instruction-following, context-faithfulness, and parameterized knowledge), eliciting naturally observed and diverse errors in responses of GPT-4 and Llama 2 70B annotated by experts. We use ReaLMistake to evaluate error detectors based on 12 LLMs. Our findings show: 1) Top LLMs like GPT-4 and Claude 3 detect errors made by LLMs at very low recall, and all LLM-based error detectors perform much worse than humans. 2) Explanations by LLM-based error detectors lack reliability. 3) LLM-based error detection is sensitive to small changes in prompts but remains challenging to improve. 4) Popular approaches to improving LLMs, including self-consistency and majority vote, do not improve the error detection performance. Our benchmark and code are provided at https://github.com/psunlpgroup/ReaLMistake.

Submitted 4 April, 2024; originally announced April 2024.

Comments: Benchmark and code: https://github.com/psunlpgroup/ReaLMistake

arXiv:2404.01600 [pdf, other]

doi 10.1088/0256-307X/41/3/037104

C-type antiferromagnetic structure of topological semimetal CaMnSb$_2$

Authors: Bo Li, Xu-Tao Zeng, Qianhui Xu, Fan Yang, Junsen Xiang, Hengyang Zhong, Sihao Deng, Lunhua He, Juping Xu, Wen Yin, Xingye Lu, Huiying Liu, Xian-Lei Sheng, Wentao Jin

Abstract: Determination of the magnetic structure and confirmation of the presence or absence of inversion ($\mathcal{P}$) and time reversal ($\mathcal{T}$) symmetry are imperative for correctly understanding topological magnetic materials. Here high-quality single crystals of the layered manganese pnictide CaMnSb$_2$ are synthesized using the self-flux method. De Haas-van Alphen oscillations indicate a nontrivial Berry phase of $\sim\pi$ and a notably small cyclotron effective mass, supporting the Dirac semimetal nature of CaMnSb$_2$. Neutron diffraction measurements identify a C-type antiferromagnetic (AFM) structure below $T_{\rm N}$ = 303(1) K with the Mn moments aligned along the $a$ axis, which is well supported by the density functional theory (DFT) calculations. The corresponding magnetic space group is $Pn'm'a'$, preserving a $\mathcal{P}\times\mathcal{T}$ symmetry. Adopting the experimentally determined magnetic structure, band crossings near the Y point in momentum space and linear dispersions of the Sb $5p_{y,z}$ bands are revealed by the DFT calculations. Furthermore, our study predicts the possible existence of an intrinsic second-order nonlinear Hall effect in CaMnSb$_2$, offering a promising platform to study the impact of topological properties on nonlinear electrical transports in antiferromagnets.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 7 Pages, 6 figures

Journal ref: Chinese Physics Letters 41, 037104 (2024)

arXiv:2404.01592 [pdf, other]

doi 10.1103/PhysRevMaterials.8.044409

Structural, magnetic and magnetocaloric properties of triangular-lattice transition-metal phosphates

Authors: Chuandi Zhang, Junsen Xiang, Quanliang Zhu, Longfei Wu, Shanfeng Zhang, Juping Xu, Wen Yin, Peijie Sun, Wei Li, Gang Su, Wentao Jin

Abstract: The recent discovery of the spin supersolid candidate Na$_2$BaCo(PO$_4$)$_2$ has stimulated considerable research interest in the triangular-lattice transition-metal phosphates. Here we report a comprehensive study on the structural, magnetic and magnetocaloric properties of polycrystalline Na$_2$$A$$T$(PO$_4$)$_2$ ($A$ = Ba, Sr; $T$ = Co, Ni, Mn). X-ray and neutron diffraction measurements confirm that Na$_2$Ba$T$(PO$_4$)$_2$ (NB$T$P) crystallizes in a trigonal structure, while Na$_2$Sr$T$(PO$_4$)$_2$ (NS$T$P) forms a monoclinic structure with a slight distortion of the triangular network of $T^{2+}$ ions. The dc magnetization data show that all six compounds order antiferromagnetically below 2 K, and the Néel temperatures of NS$T$P are consistently higher than those of NB$T$P for $T$ = Co, Ni, and Mn, due to the release of geometrical frustration by monoclinic distortions. Further magnetocaloric measurements show that trigonal NB$T$P can reach a lower temperature in the quasi-adiabatic demagnetization process and thus shows a better performance in the magnetic refrigeration, compared with monoclinic NS$T$P. Our findings highlight the outstanding magnetocaloric performances of the trigonal transition-metal phosphates, and disclose two necessary ingredients for a superior magnetic coolant that can reach an ultra-low temperature, including a perfect geometrically frustrated lattice and a small effective spin number associated with the magnetic ions.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 10 Pages, 6 figures, accepted for publication in Physical Review Materials

Journal ref: Physical Review Materials 8, 044409 (2024)

arXiv:2403.17934 [pdf, other]

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

Authors: Qingping Sun, Yanjun Wang, Ailing Zeng, Wanqi Yin, Chen Wei, Wenjia Wang, Haiyi Mei, Chi Sing Leung, Ziwei Liu, Lei Yang, Zhongang Cai

Abstract: Expressive human pose and shape estimation (a.k.a. 3D whole-body mesh recovery) involves the human body, hand, and expression estimation. Most existing methods have tackled this task in a two-stage manner, first detecting the human body part with an off-the-shelf detection model and inferring the different human body parts individually. Despite the impressive results achieved, these methods suffer from 1) loss of valuable contextual information via cropping, 2) introducing distractions, and 3) lacking inter-association among different persons and body parts, inevitably causing performance degradation, especially for crowded scenes. To address these issues, we introduce a novel all-in-one-stage framework, AiOS, for multiple expressive human pose and shape recovery without an additional human detection step. Specifically, our method is built upon DETR, which treats multi-person whole-body mesh recovery task as a progressive set prediction problem with various sequential detection. We devise the decoder tokens and extend them to our task. Specifically, we first employ a human token to probe a human location in the image and encode global features for each instance, which provides a coarse location for the later transformer block. Then, we introduce a joint-related token to probe the human joint in the image and encode a fine-grained local feature, which collaborates with the global feature to regress the whole-body mesh. This straightforward but effective model outperforms previous state-of-the-art methods by a 9% reduction in NMVE on AGORA, a 30% reduction in PVE on EHF, a 10% reduction in PVE on ARCTIC, and a 3% reduction in PVE on EgoBody.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Homepage: https://ttxskk.github.io/AiOS/

arXiv:2403.13307 [pdf, other]

LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment

Authors: Peishan Cong, Ziyi Wang, Zhiyang Dou, Yiming Ren, Wei Yin, Kai Cheng, Yujing Sun, Xiaoxiao Long, Xinge Zhu, Yuexin Ma

Abstract: Language-guided scene-aware human motion generation has great significance for entertainment and robotics. In response to the limitations of existing datasets, we introduce LaserHuman, a pioneering dataset engineered to revolutionize Scene-Text-to-Motion research. LaserHuman stands out with its inclusion of genuine human motions within 3D environments, unbounded free-form natural language descriptions, a blend of indoor and outdoor scenarios, and dynamic, ever-changing scenes. Diverse modalities of capture data and rich annotations present great opportunities for the research of conditional motion generation, and can also facilitate the development of real-life applications. Moreover, to generate semantically consistent and physically plausible human motions, we propose a multi-conditional diffusion model, which is simple but effective, achieving state-of-the-art performance on existing datasets.

Submitted 21 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.12959 [pdf, other]

WHAC: World-grounded Humans and Cameras

Authors: Wanqi Yin, Zhongang Cai, Ruisi Wang, Fanzhou Wang, Chen Wei, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang

Abstract: Estimating human and camera trajectories with accurate scale in the world coordinate system from a monocular video is a highly desirable yet challenging and ill-posed problem. In this study, we aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly, by leveraging the synergy between three critical players: the world, the human, and the camera. Our approach is founded on two key observations. Firstly, camera-frame SMPL-X estimation methods readily recover absolute human depth. Secondly, human motions inherently provide absolute spatial cues. By integrating these insights, we introduce a novel framework, referred to as WHAC, to facilitate world-grounded expressive human pose and shape estimation (EHPS) alongside camera pose estimation, without relying on traditional optimization techniques. Additionally, we present a new synthetic dataset, WHAC-A-Mole, which includes accurately annotated humans and cameras, and features diverse interactive human motions as well as realistic camera trajectories. Extensive experiments on both standard and newly established benchmarks highlight the superiority and efficacy of our framework. We will make the code and dataset publicly available.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Homepage: https://wqyin.github.io/projects/WHAC/

arXiv:2403.12013 [pdf, other]

GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

Authors: Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long

Abstract: We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes, e.g., depth and normals, from single images. While significant research has already been conducted in this area, the progress has been substantially limited by the low diversity and poor quality of publicly available datasets. As a result, the prior works either are constrained to limited scenarios or suffer from the inability to capture geometric details. In this paper, we demonstrate that generative models, as opposed to traditional discriminative models (e.g., CNNs and Transformers), can effectively address the inherently ill-posed problem. We further show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage. Specifically, we extend the original stable diffusion model to jointly predict depth and normal, allowing mutual information exchange and high consistency between the two representations. More importantly, we propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions. This strategy enables our model to recognize different scene layouts, capturing 3D geometry with remarkable fidelity. GeoWizard sets new benchmarks for zero-shot depth and normal prediction, significantly enhancing many downstream applications such as 3D reconstruction, 2D content creation, and novel viewpoint synthesis.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Project page: https://fuxiao0719.github.io/projects/geowizard/

arXiv:2403.11805 [pdf, other]

LLM as a System Service on Mobile Devices

Authors: Wangsong Yin, Mengwei Xu, Yuanchun Li, Xuanzhe Liu

Abstract: Being more powerful and intrusive into user-device interactions, LLMs are eager for on-device execution to better preserve user privacy. In this work, we propose a new paradigm of mobile AI: LLM as a system service on mobile devices (LLMaaS). Unlike traditional DNNs that execute in a stateless manner, such a system service is stateful: LLM execution often needs to maintain persistent states (mainly KV cache) across multiple invocations. To minimize the LLM context switching overhead under tight device memory budget, this work presents LLMS, which decouples the memory management of app and LLM contexts with a key idea of fine-grained, chunk-wise, globally-optimized KV cache compression and swapping. By fully leveraging KV cache's unique characteristics, it proposes three novel techniques: (1) Tolerance-Aware Compression: it compresses chunks based on their measured accuracy tolerance to compression. (2) IO-Recompute Pipelined Loading: it introduces recompute to swapping-in for acceleration. (3) Chunk Lifecycle Management: it optimizes the memory activities of chunks with an ahead-of-time swapping-out and an LCTRU (Least Compression-Tolerable and Recently-Used) queue based eviction. In evaluations conducted on well-established traces and various edge devices, LLMS reduces context switching latency by up to 2 orders of magnitude when compared to competitive baseline solutions.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Technical Report

arXiv:2403.10287 [pdf, other]

Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models

Authors: Tian Meng, Yang Tao, Ruilin Lyu, Wuliang Yin

Abstract: The task of few-shot image classification and segmentation (FS-CS) involves classifying and segmenting target objects in a query image, given only a few examples of the target classes. We introduce the Vision-Instructed Segmentation and Evaluation (VISE) method that transforms the FS-CS problem into the Visual Question Answering (VQA) problem, utilising Vision-Language Models (VLMs), and addresses it in a training-free manner. By enabling a VLM to interact with off-the-shelf vision models as tools, the proposed method is capable of classifying and segmenting target objects using only image-level labels. Specifically, chain-of-thought prompting and in-context learning guide the VLM to answer multiple-choice questions like a human; vision models such as YOLO and Segment Anything Model (SAM) assist the VLM in completing the task. The modular framework of the proposed method makes it easily extendable. Our approach achieves state-of-the-art performance on the Pascal-5i and COCO-20i datasets.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.09407 [pdf, other]

LM2D: Lyrics- and Music-Driven Dance Synthesis

Authors: Wenjie Yin, Xuejiao Zhao, Yi Yu, Hang Yin, Danica Kragic, Mårten Björkman

Abstract: Dance typically involves professional choreography with complex movements that follow a musical rhythm and can also be influenced by lyrical content. The integration of lyrics in addition to the auditory dimension enriches the foundational tone and makes motion generation more amenable to its semantic meanings. However, existing dance synthesis methods tend to model motions only conditioned on audio signals. In this work, we make two contributions to bridge this gap. First, we propose LM2D, a novel probabilistic architecture that incorporates a multimodal diffusion model with consistency distillation, designed to create dance conditioned on both music and lyrics in one diffusion generation step. Second, we introduce the first 3D dance-motion dataset that encompasses both music and lyrics, obtained with pose estimation technologies. We evaluate our model against music-only baseline models with objective metrics and human evaluations, including dancers and choreographers. The results demonstrate LM2D is able to produce realistic and diverse dance matching both lyrics and music. A video summary can be accessed at: https://youtu.be/4XCgvYookvA.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.08881 [pdf, other]

Origin of light-induced metastability in ZrTe$_5$

Authors: D. Nevola, N. Aryal, G. D. Gu, P. D. Johnson, W. -G. Yin, Q. Li

Abstract: We study the non-equilibrium electronic structure of a model Dirac semimetal ZrTe$_5$ by using time-and-angle resolved photoemission spectroscopy and density functional theory-based electron and phonon calculations. By measuring the electronic dispersion near the $Γ$ point at time delays up to 10 picoseconds, we discovered that the band spectral weight does not recover during the measured temporal window, revealing the existence of a light-induced metastable state in the electronic structure of this material. Our calculations find that the photoexcited $A_{1g}$ phonon mode leads to a band renormalization that both supports our experimental observations at the zone center and predicts changes to the band structure outside of our experimental window, ultimately showing the evolution from a direct to an indirect gap semimetal; such band renormalization dramatically reduces the electron-hole recombination rate, giving rise to the metastability in this system.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 6 pages, 4 figures

arXiv:2403.07535 [pdf, other]

Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

Authors: JunDa Cheng, Wei Yin, Kaixuan Wang, Xiaozhi Chen, Shijie Wang, Xin Yang

Abstract: Multi-view depth estimation has achieved impressive performance over various benchmarks. However, almost all current multi-view systems rely on given ideal camera poses, which are unavailable in many real-world scenarios, such as autonomous driving. In this work, we propose a new robustness benchmark to evaluate the depth estimation system under various noisy pose settings. Surprisingly, we find current multi-view depth estimation methods or single-view and multi-view fusion methods will fail when given noisy pose settings. To address this challenge, we propose a single-view and multi-view fused depth estimation system, which adaptively integrates high-confident multi-view and single-view results for both robust and accurate depth estimations. The adaptive fusion module performs fusion by dynamically selecting high-confidence regions between two branches based on a wrapping confidence map. Thus, the system tends to choose the more reliable branch when facing textureless scenes, inaccurate calibration, dynamic objects, and other degradation or challenging conditions. Our method outperforms state-of-the-art multi-view and fusion methods under robustness testing. Furthermore, we achieve state-of-the-art performance on challenging benchmarks (KITTI and DDAD) when given accurate pose estimations. Project website: https://github.com/Junda24/AFNet/.

Submitted 12 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.03863 [pdf, other]

X-Shot: A Unified System to Handle Frequent, Few-shot and Zero-shot Learning Simultaneously in Classification

Authors: Hanzi Xu, Muhao Chen, Lifu Huang, Slobodan Vucetic, Wenpeng Yin

Abstract: In recent years, few-shot and zero-shot learning, which learn to predict labels with limited annotated instances, have garnered significant attention. Traditional approaches often treat frequent-shot (freq-shot; labels with abundant instances), few-shot, and zero-shot learning as distinct challenges, optimizing systems for just one of these scenarios. Yet, in real-world settings, label occurrences vary greatly. Some of them might appear thousands of times, while others might only appear sporadically or not at all. For practical deployment, it is crucial that a system can adapt to any label occurrence. We introduce a novel classification challenge: X-shot, reflecting a real-world context where freq-shot, few-shot, and zero-shot labels co-occur without predefined limits. Here, X can span from 0 to positive infinity. The crux of X-shot centers on open-domain generalization and devising a system versatile enough to manage various label scenarios. To solve X-shot, we propose BinBin (Binary INference Based on INstruction following) that leverages the Indirect Supervision from a large collection of NLP tasks via instruction following, bolstered by Weak Supervision provided by large language models. BinBin surpasses previous state-of-the-art techniques on three benchmark datasets across multiple domains. To our knowledge, this is the first work addressing X-shot learning, where X remains variable.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.18667 [pdf, other]

FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability

Authors: Congying Xia, Chen Xing, Jiangshu Du, Xinyi Yang, Yihao Feng, Ran Xu, Wenpeng Yin, Caiming Xiong

Abstract: This paper presents FoFo, a pioneering benchmark for evaluating large language models' (LLMs) ability to follow complex, domain-specific formats, a crucial yet underexamined capability for their application as AI agents. Despite LLMs' advancements, existing benchmarks fail to assess their format-following proficiency adequately. FoFo fills this gap with a diverse range of real-world formats and instructions, developed through an AI-Human collaborative method. Our evaluation across both open-source (e.g., Llama 2, WizardLM) and closed-source (e.g., GPT-4, PALM2, Gemini) LLMs highlights three key findings: open-source models significantly lag behind closed-source ones in format adherence; LLMs' format-following performance is independent of their content generation quality; and LLMs' format proficiency varies across different domains. These insights suggest the need for specialized tuning for format-following skills and highlight FoFo's role in guiding the selection of domain-specific AI agents. FoFo is released here at https://github.com/SalesforceAIResearch/FoFo.

Submitted 28 February, 2024; originally announced February 2024.

Comments: The first two authors contributed equally

arXiv:2402.18568 [pdf, other]

A New Probe of Cosmic Birefringence Using Galaxy Polarization and Shapes

Authors: Weichen Winston Yin, Liang Dai, Junwu Huang, Lingyuan Ji, Simone Ferraro

Abstract: We propose a new method to search for parity-violating new physics via measurements of cosmic birefringence and demonstrate its power in detecting the topological effect originating from an axion string network with an axion-photon coupling as a motivated source of cosmic birefringence. The method, using large galaxy samples, exploits an empirical correlation between the polarization direction of the integrated radio emission from a spiral galaxy and its apparent shape. We devise unbiased minimum-variance quadratic estimators for discrete samples of galaxies with both integrated radio polarization and shape measurements. Assuming a synergy with overlapping optical imaging surveys, we forecast the sensitivity to polarization rotation of the forthcoming SKA radio continuum surveys of spiral galaxies out to $z \sim 1.5$. The angular noise power spectrum of polarization rotation using our method can be lower than that expected from CMB Stage-IV experiments, when assuming a wide survey covering $\sim 1000\,{\rm deg}^2$ and reaching an RMS flux of $\sim 1\,μ{\rm Jy}$. Our method will be complementary to CMB-based methods as it will be subject to different systematics. It can be generalized to probe time-varying or redshift-varying birefringence signals.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.15896 [pdf, other]

Multimodal Instruction Tuning with Conditional Mixture of LoRA

Authors: Ying Shen, Zhiyang Xu, Qifan Wang, Yu Cheng, Wenpeng Yin, Lifu Huang

Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks. Multimodal instruction tuning has emerged as a successful strategy for achieving zero-shot generalization by fine-tuning pre-trained models on diverse multimodal tasks through instructions. As MLLMs grow in complexity and size, the need for parameter-efficient fine-tuning methods like Low-Rank Adaption (LoRA), which fine-tunes with a minimal set of parameters, becomes essential. However, applying LoRA in multimodal instruction tuning presents the challenge of task interference, which leads to performance degradation, especially when dealing with a broad array of multimodal tasks. To address this, this paper introduces a novel approach that integrates multimodal instruction tuning with Conditional Mixture-of-LoRA (MixLoRA). It innovates upon LoRA by dynamically constructing low-rank adaptation matrices tailored to the unique demands of each input instance, aiming to mitigate task interference. Experimental results on various multimodal evaluation datasets indicate that MixLoRA outperforms the conventional LoRA with the same or even higher ranks, demonstrating its efficacy and adaptability in diverse multimodal tasks.

Submitted 24 February, 2024; originally announced February 2024.

Comments: 8 pages, multimodal instruction tuning

arXiv:2402.14650 [pdf, other]

GaussianPro: 3D Gaussian Splatting with Progressive Propagation

Authors: Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, Xuejin Chen

Abstract: The advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality renderings at real-time speed. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling large-scale scenes that unavoidably contain texture-less surfaces, the SfM techniques always fail to produce enough points on these surfaces and cannot provide good initialization for 3DGS. As a result, 3DGS suffers from difficult optimization and low-quality renderings. In this paper, inspired by classical multi-view stereo (MVS) techniques, we propose GaussianPro, a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians. Compared to the simple split and clone strategies used in 3DGS, our method leverages the priors of the existing reconstructed geometries of the scene and patch matching techniques to produce new Gaussians with accurate positions and orientations. Experiments on both large-scale and small-scale scenes validate the effectiveness of our method, where our method significantly surpasses 3DGS on the Waymo dataset, exhibiting an improvement of 1.15dB in terms of PSNR.

Submitted 22 February, 2024; originally announced February 2024.

Comments: See the project page for code, data: https://kcheng1021.github.io/gaussianpro.github.io

arXiv:2402.11791 [pdf, other]

SDGE: Stereo Guided Depth Estimation for 360$^\circ$ Camera Sets

Authors: Jialei Xu, Wei Yin, Dong Gong, Junjun Jiang, Xianming Liu

Abstract: Depth estimation is a critical technology in autonomous driving, and multi-camera systems are often used to achieve a 360$^\circ$ perception. These 360$^\circ$ camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image. Alternatively, monocular methods may not produce consistent cross-view predictions. To address these issues, we propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation of the full image by explicitly utilizing multi-view stereo results on the overlap. We suggest building virtual pinhole cameras to resolve the distortion problem of fisheye cameras and unify the processing for the two types of 360$^\circ$ cameras. For handling the varying noise on camera poses caused by unstable movement, the approach employs a self-calibration method to obtain highly accurate relative poses of the adjacent cameras with minor overlap. These enable the use of robust stereo methods to obtain high-quality depth prior in the overlap region. This prior serves not only as an additional input but also as pseudo-labels that enhance the accuracy of depth estimation methods and improve cross-view prediction consistency. The effectiveness of SGDE is evaluated on one fisheye camera dataset, Synthetic Urban, and two pinhole camera datasets, DDAD and nuScenes. Our experiments demonstrate that SGDE is effective for both supervised and self-supervised depth estimation, and highlight the potential of our method for advancing downstream autonomous driving technologies, such as 3D object detection and occupancy prediction.

Submitted 2 April, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.11592 [pdf, other]

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

Authors: Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen

Abstract: In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications like on-device training where memory efficiency is paramount. This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during LLM fine-tuning, building on the initial concept introduced by MeZO. Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques, through a comprehensive, first-of-its-kind benchmarking study across five LLM families (Roberta, OPT, LLaMA, Vicuna, Mistral), three task complexities, and five fine-tuning schemes. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance. We further introduce novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity. Our study offers a promising direction for achieving further memory-efficient LLM fine-tuning. Codes to reproduce all our experiments are at https://github.com/ZO-Bench/ZO-LLM.

Submitted 27 May, 2024; v1 submitted 18 February, 2024; originally announced February 2024.
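
Editor's illustration for the entry above: a minimal sketch of the general idea behind BP-free, zeroth-order optimization, in which a gradient is estimated from function evaluations only. It is not taken from the paper or from its repository. The function name `zo_gradient`, the toy quadratic `loss`, the smoothing radius `mu`, and the step size 0.02 are all assumptions chosen for illustration.

```python
import random

def zo_gradient(f, x, mu=1e-3, rng=random):
    """Two-point randomized gradient estimate using only evaluations of f.

    Hypothetical helper: draws a Gaussian direction u and returns
    ((f(x + mu*u) - f(x - mu*u)) / (2*mu)) * u, with no back-propagation.
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    scale = (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return [scale * ui for ui in u]

def loss(x):
    # Toy objective (an assumption): a simple quadratic bowl.
    return sum(xi * xi for xi in x)

rng = random.Random(0)          # fixed seed so the run is repeatable
x = [1.0, -2.0, 0.5]
initial_loss = loss(x)
for _ in range(300):
    g = zo_gradient(loss, x, rng=rng)
    # Plain ZO-SGD update with an assumed step size of 0.02.
    x = [xi - 0.02 * gi for xi, gi in zip(x, g)]
final_loss = loss(x)
```

Each step costs two evaluations of the objective and stores no intermediate activations, which is the memory argument the abstract makes; the estimate is noisy, so the step size has to be small.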

arXiv:2402.11138 [pdf, other]

Contrastive Instruction Tuning

Authors: Tianyi Lorena Yan, Fei Wang, James Y. Huang, Wenxuan Zhou, Fan Yin, Aram Galstyan, Wenpeng Yin, Muhao Chen

Abstract: Instruction tuning has been used as a promising approach to improve the performance of large language models (LLMs) on unseen tasks. However, current LLMs exhibit limited robustness to unseen instructions, generating inconsistent outputs when the same instruction is phrased with slightly varied forms or language styles. This behavior indicates LLMs' lack of robustness to textual variations and generalizability to unseen instructions, potentially leading to trustworthiness issues. Accordingly, we propose Contrastive Instruction Tuning, which maximizes the similarity between the hidden representations of semantically equivalent instruction-instance pairs while minimizing the similarity between semantically different ones. To facilitate this approach, we augment the existing FLAN collection by paraphrasing task instructions. Experiments on the PromptBench benchmark show that CoIN consistently improves LLMs' robustness to unseen instructions with variations across character, word, sentence, and semantic levels by an average of +2.5% in accuracy. Code is available at https://github.com/luka-group/CoIN.

Submitted 6 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: ACL 2024 Findings

arXiv:2402.11122 [pdf, other]

Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models

Authors: Zihao Lin, Mohammad Beigi, Hongxuan Li, Yufan Zhou, Yuxiang Zhang, Qifan Wang, Wenpeng Yin, Lifu Huang

Abstract: Memory Editing (ME) has emerged as an efficient method to modify erroneous facts or inject new facts into Large Language Models (LLMs). Two mainstream ME methods exist: parameter-modifying ME and parameter-preserving ME (integrating extra modules while preserving original parameters). Regrettably, previous studies on ME evaluation have two critical limitations: (i) evaluating LLMs with single edit only, neglecting the need for continuous editing, and (ii) evaluations focusing solely on basic factual triples, overlooking broader LLM capabilities like logical reasoning and reading understanding. This study addresses these limitations with contributions threefold: (i) We explore how ME affects a wide range of fundamental capabilities of LLMs under sequential editing. Experimental results reveal an intriguing phenomenon: Most parameter-modifying ME consistently degrade performance across all tasks after a few sequential edits. In contrast, parameter-preserving ME effectively maintains LLMs' fundamental capabilities but struggles to accurately recall edited knowledge presented in a different format. (ii) We extend our evaluation to different editing settings, such as layers to edit, model size, instruction tuning, etc. Experimental findings indicate several strategies that can potentially mitigate the adverse effects of ME. (iii) We further explain why parameter-modifying ME damages LLMs from three dimensions: parameter changes after editing, language modeling capability, and the in-context learning capability. Our in-depth study advocates more careful use of ME in real-world scenarios.

Submitted 16 February, 2024; originally announced February 2024.

Comments: preprint, 15 pages

arXiv:2402.11095 [pdf, other]

GIM: Learning Generalizable Image Matcher From Internet Videos

Authors: Xuelun Shen, Zhipeng Cai, Wei Yin, Matthias Müller, Zijun Li, Kaixuan Wang, Xiaozhi Chen, Cheng Wang

Abstract: Image matching is a fundamental computer vision problem. While learning-based methods achieve state-of-the-art performance on existing benchmarks, they generalize poorly to in-the-wild images. Such methods typically need to train separate models for different scene types and are impractical when the scene type is unknown in advance. One of the underlying problems is the limited scalability of existing data construction pipelines, which limits the diversity of standard image matching datasets. To address this problem, we propose GIM, a self-training framework for learning a single generalizable model based on any image matching architecture using internet videos, an abundant and diverse data source. Given an architecture, GIM first trains it on standard domain-specific datasets and then combines it with complementary matching methods to create dense labels on nearby frames of novel videos. These labels are filtered by robust fitting, and then enhanced by propagating them to distant frames. The final model is trained on propagated data with strong augmentations. We also propose ZEB, the first zero-shot evaluation benchmark for image matching. By mixing data from diverse domains, ZEB can thoroughly assess the cross-domain generalization performance of different methods. Applying GIM consistently improves the zero-shot performance of 3 state-of-the-art image matching architectures; with 50 hours of YouTube videos, the relative zero-shot performance improves by 8.4%-18.1%. GIM also enables generalization to extreme cross-domain data such as Bird Eye View (BEV) images of projected 3D point clouds (Fig. 1(c)). More importantly, our single zero-shot model consistently outperforms domain-specific baselines when evaluated on downstream tasks inherent to their respective domains. The video presentation is available at https://www.youtube.com/watch?v=FU_MJLD8LeY.

Submitted 16 February, 2024; originally announced February 2024.

Comments: Accepted to ICLR 2024 for spotlight presentation

arXiv:2402.10874 [pdf, other]

Design of 2D Skyrmionic Metamaterial Through Controlled Assembly

Authors: Qichen Xu, Zhuanglin Shen, Alexander Edström, I. P. Miranda, Zhiwei Lu, Anders Bergman, Danny Thonig, Wanjian Yin, Olle Eriksson, Anna Delin

Abstract: Despite extensive research on magnetic skyrmions and antiskyrmions, a significant challenge remains in crafting nontrivial high-order skyrmionic textures with varying, or even tailor-made, topologies. We address this challenge, by focusing on a construction pathway of skyrmionics metamaterial within a monolayer thin film and suggest several promising lattice-like, flakes-like, and cell-like skyrmionic metamaterials that are surprisingly stable. Central to our approach is the concept of 'simulated controlled assembly', in short, a protocol inspired by 'click chemistry' that allows for positioning topological magnetic structures where one likes, and then allowing for energy minimization to elucidate the stability. Utilizing high-throughput atomistic-spin-dynamic (ASD) simulations alongside state-of-the-art AI-driven tools, we have isolated skyrmions (topological charge Q=1), antiskyrmions (Q=-1), and skyrmionium (Q=0). These entities serve as foundational 'skyrmionic building blocks' to forming reported intricate textures. In this work, two key contributions are introduced to the field of skyrmionic systems. First, we present a novel method for integrating control assembly protocols for the stabilization and investigation of topological magnets, which marks a significant advancement in the ability to explore new skyrmionic textures. Second, we report on the discovery of skyrmionic metamaterials, which shows a plethora of complex topologies that are possible to investigate theoretically and experimentally.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.09501 [pdf, other]

Bubble Misalignment Mechanism for Axions

Authors: Junseok Lee, Kai Murai, Fuminobu Takahashi, Wen Yin

Abstract: We study the dynamics of axions at first-order phase transitions in non-Abelian gauge theories. When the duration of the phase transition is short compared to the timescale of the axion oscillations, the axion dynamics is similar to the trapped misalignment mechanism. On the other hand, if this is not the case, the axions are initially expelled from the inside of the bubbles, generating axion waves on the outside. Analogous to the Fermi acceleration, these axions gain energy by repeatedly scattering off the bubble walls. Once they acquire enough energy, they can enter the bubbles. The novel ``bubble misalignment mechanism'' can significantly enhance the axion abundance, compared to models where the axion mass is either constant or varies continuously as a function of temperature. The increase in axion abundance depends on the axion mass, the duration of the phase transition, and the bubble wall velocity. This mechanism results in a spatially inhomogeneous distribution of axions, which could lead to the formation of axion miniclusters. It has potential implications for the formation of oscillons/I-balls, axion warm dark matter, cosmic birefringence, and the production of dark photons.

Submitted 18 March, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: 15 pages, 14 figures, 1 table, v2: added references, corrected an error in axion number at transmission, conclusions unchanged

Report number: TU-1221

arXiv:2402.07976 [pdf, other]

First Result for Dark Matter Search by WINERED

Authors: Wen Yin, Taiki Bessho, Yuji Ikeda, Hitomi Kobayashi, Daisuke Taniguchi, Hiroaki Sameshima, Noriyuki Matsunaga, Shogo Otsubo, Yuki Sarugaku, Tomomi Takeuchi, Haruki Kato, Satoshi Hamano, Hideyo Kawakita

Abstract: The identity of dark matter has been a mystery in astronomy, cosmology, and particle theory for about a century. Bessho, Ikeda, and Yin (2022), three of the current authors, proposed using the state-of-the-art infrared spectrographs, including WINERED at $6.5$m Magellan Clay telescope and NIRSpec at James Webb Space Telescope, as efficient detectors for the indirect detection of dark matter with the mass around eV by measuring the line photons from the dark matter two body decays. Applying this concept, we have performed spectrographic observations of dwarf spheroidal galaxies (dSphs) Leo V and Tucana II using WINERED by utilizing an object-sky-object nodding observation technique for background subtraction. We present the first result from this dark matter search. Employing zero consistent flux data after the sky subtraction, we have established one of the most stringent limits to date on dark matter decaying into line photons in the mass range of $1.8-2.7\,$eV. Our data can also be applied to constrain other spectra of photons from the dSphs.

Submitted 12 February, 2024; originally announced February 2024.

Comments: 15 pages, 4 figures, 1 table, 6 data files attached

Report number: TU-1220

arXiv:2402.07099 [pdf, other]

Rethinking the Capacity of Graph Neural Networks for Branching Strategy

Authors: Ziang Chen, Jialin Liu, Xiaohan Chen, Xinshang Wang, Wotao Yin

Abstract: Graph neural networks (GNNs) have been widely used to predict properties and heuristics of mixed-integer linear programs (MILPs) and hence accelerate MILP solvers. This paper investigates the capacity of GNNs to represent strong branching (SB), the most effective yet computationally expensive heuristic employed in the branch-and-bound algorithm. In the literature, message-passing GNN (MP-GNN), as the simplest GNN structure, is frequently used as a fast approximation of SB and we find that not all MILPs' SB can be represented with MP-GNN. We precisely define a class of "MP-tractable" MILPs for which MP-GNNs can accurately approximate SB scores. Particularly, we establish a universal approximation theorem: for any data distribution over the MP-tractable class, there always exists an MP-GNN that can approximate the SB score with arbitrarily high accuracy and arbitrarily high probability, which lays a theoretical foundation of the existing works on imitating SB with MP-GNN. For MILPs without the MP-tractability, unfortunately, a similar result is impossible, which can be illustrated by two MILP instances with different SB scores that cannot be distinguished by any MP-GNN, regardless of the number of parameters. Recognizing this, we explore another GNN structure called the second-order folklore GNN (2-FGNN) that overcomes this limitation, and the aforementioned universal approximation theorem can be extended to the entire MILP space using 2-FGNN, regardless of the MP-tractability. A small-scale numerical experiment is conducted to directly validate our theoretical findings.

Submitted 8 June, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

arXiv:2402.07070 [pdf, ps, other]

Efficient Algorithms for Sum-of-Minimum Optimization

Authors: Lisang Ding, Ziang Chen, Xinshang Wang, Wotao Yin

Abstract: In this work, we propose a novel optimization model termed "sum-of-minimum" optimization. This model seeks to minimize the sum or average of $N$ objective functions over $k$ parameters, where each objective takes the minimum value of a predefined sub-function with respect to the $k$ parameters. This universal framework encompasses numerous clustering applications in machine learning and related fields. We develop efficient algorithms for solving sum-of-minimum optimization problems, inspired by a randomized initialization algorithm for the classic $k$-means (Arthur & Vassilvitskii, 2007) and Lloyd's algorithm (Lloyd, 1982). We establish a new tight bound for the generalized initialization algorithm and prove a gradient-descent-like convergence rate for generalized Lloyd's algorithm. The efficiency of our algorithms is numerically examined on multiple tasks, including generalized principal component analysis, mixed linear regression, and small-scale neural network training. Our approach compares favorably to previous ones based on simpler-but-less-precise optimization reformulations.

Submitted 9 June, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

arXiv:2402.00157 [pdf, other]

Large Language Models for Mathematical Reasoning: Progresses and Challenges

Authors: Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, Wenpeng Yin

Abstract: Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive capabilities of human intelligence. In recent times, there has been a notable surge in the development of Large Language Models (LLMs) geared towards the automated resolution of mathematical problems. However, the landscape of mathematical problem types is vast and varied, with LLM-oriented techniques undergoing evaluation across diverse datasets and settings. This diversity makes it challenging to discern the true advancements and obstacles within this burgeoning field. This survey endeavors to address four pivotal dimensions: i) a comprehensive exploration of the various mathematical problems and their corresponding datasets that have been investigated; ii) an examination of the spectrum of LLM-oriented techniques that have been proposed for mathematical problem-solving; iii) an overview of factors and concerns affecting LLMs in solving math; and iv) an elucidation of the persisting challenges within this domain. To the best of our knowledge, this survey stands as one of the first extensive examinations of the landscape of LLMs in the realm of mathematics, providing a holistic perspective on the current state, accomplishments, and future challenges in this rapidly evolving field.

Submitted 5 April, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: EACL 2024 Student Research Workshop, 8 pages

Showing 1–50 of 528 results for author: Yin, W