
Benchmarking Large Neighborhood Search for Multi-Agent Path Finding

Authors: Jiaqi Tan, Yudong Luo, Jiaoyang Li, Hang Ma

Abstract: Multi-Agent Path Finding (MAPF) aims to arrange collision-free goal-reaching paths for a group of agents. Anytime MAPF solvers based on large neighborhood search (LNS) have gained prominence recently due to their flexibility and scalability. Neighborhood selection strategy is crucial to the success of MAPF-LNS and a flurry of methods have been proposed. However, several pitfalls exist and hinder a comprehensive evaluation of these new methods, which mainly include: 1) Lower than actual or incorrect baseline performance; 2) Lack of a unified evaluation setting and criterion; 3) Lack of a codebase or executable model for supervised learning methods. To overcome these challenges, we conduct a fair comparison across prominent methods on the same benchmark and hyperparameter search settings. Additionally, we propose a simple neighborhood selection strategy which marks a clear advancement in terms of runtime efficiency in large maps with large number of agents. Our benchmarking evaluation promotes new challenges for existing learning based methods and presents opportunities for future research when machine learning is integrated with MAPF-LNS. Code and data are available at https://github.com/ChristinaTan0704/mapf-lns-benchmark.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09395 [pdf, other]

doi 10.1145/3637528.3671559

Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce

Authors: Zhe Lin, Jiwei Tan, Dan Ou, Xi Chen, Shaowei Yao, Bo Zheng

Abstract: Text relevance or text matching of query and product is an essential technique for the e-commerce search system to ensure that the displayed products can match the intent of the query. Many studies focus on improving the performance of the relevance model in search systems. Recently, pre-trained language models like BERT have achieved promising performance on the text relevance task. While these models perform well on the offline test dataset, there are still obstacles to deploying the pre-trained language model to the online system due to their high latency. The two-tower model is extensively employed in industrial scenarios, owing to its ability to harmonize performance with computational efficiency. Regrettably, such models present an opaque "black box" nature, which prevents developers from making special optimizations. In this paper, we propose the deep Bag-of-Words (DeepBoW) model, an efficient and interpretable relevance architecture for Chinese e-commerce. Our approach proposes to encode the query and the product into the sparse BoW representation, which is a set of word-weight pairs. The weight means the importance or the relevance score between the corresponding word and the raw text. The relevance score is measured by the accumulation of the matched words between the sparse BoW representations of the query and the product. Compared to the popular dense distributed representation, which usually suffers from the black-box drawback, the main advantage of the proposed representation model is that it is highly explainable and open to intervention, which is a major advantage for the deployment and operation of online search engines. Moreover, the online efficiency of the proposed model is even better than the most efficient inner product form of dense representation ...

Submitted 12 July, 2024; originally announced July 2024.

Comments: KDD'24 accepted paper
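
Illustrative note on the DeepBoW abstract above: the following is a minimal sketch of scoring by accumulating matched words between two sparse word-weight representations. The abstract does not say how the matched weights are combined, so multiplying the query weight by the product weight is an assumption made here, and the name `relevance_score` and the example weights are hypothetical rather than taken from the paper.

```python
from typing import Dict

# A sparse bag-of-words representation: word -> weight.
SparseBoW = Dict[str, float]


def relevance_score(query_bow: SparseBoW, product_bow: SparseBoW) -> float:
    """Accumulate a score over the words present in both representations.

    Assumption: each matched word contributes the product of its two weights.
    """
    matched = query_bow.keys() & product_bow.keys()
    return sum(query_bow[w] * product_bow[w] for w in matched)


# Hypothetical weights, for illustration only.
query = {"red": 0.5, "dress": 1.0}
product = {"dress": 0.5, "summer": 0.25, "red": 1.0}
score = relevance_score(query, product)
```

Because every contribution is tied to a specific word, the score can be inspected or adjusted word by word, which is the kind of explainability the abstract emphasises.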

arXiv:2407.08974 [pdf, other]

Topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction

Authors: Joshua Zhi En Tan, JunJie Wee, Xue Gong, Kelin Xia

Abstract: Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction. Our Top-ML employs peptide topological features derived from its sequence "connection" information characterized by vector and spectral descriptors. Our Top-ML model has been validated on two widely used AntiCP 2.0 benchmark datasets and has achieved state-of-the-art performance. Our results highlight the potential of leveraging novel topology-based featurization to accelerate the identification of anticancer peptides.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.07775 [pdf, other]

Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

Authors: Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan

Abstract: An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recorded demonstration video. Recent advances in Vision Language Models (VLMs) have shown a promising path toward this goal, as they demonstrate capabilities in perceiving and reasoning about multimodal inputs. However, VLMs are typically trained to predict textual output, and how best to utilize them in navigation is an open research question. To solve MINT, we present Mobility VLA, a hierarchical Vision-Language-Action (VLA) navigation policy that combines the environment understanding and common sense reasoning power of long-context VLMs and a robust low-level navigation policy based on topological graphs. The high-level policy consists of a long-context VLM that takes the demonstration tour video and the multimodal user instruction as input to find the goal frame in the tour video. Next, a low-level policy uses the goal frame and an offline constructed topological graph to generate robot actions at every timestep. We evaluated Mobility VLA in an 836m^2 real world environment and show that Mobility VLA has high end-to-end success rates on previously unsolved multimodal instructions such as "Where should I return this?" while holding a plastic bin. A video demonstrating Mobility VLA can be found here: https://youtu.be/-Tof__Q8_5s

Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.06899 [pdf, ps, other]

Decay estimates and Strichartz inequalities for a class of dispersive equations on H-type groups

Authors: Manli Song, Jinggang Tan

Abstract: Let $\mathcal{L}$ be the sub-Laplacian on H-type groups and $φ: \mathbb{R}^+ \to \mathbb{R}$ be a smooth function. The primary objective of the paper is to study the decay estimate for a class of dispersive semigroup given by $e^{itφ(\mathcal{L})}$. Inspired by earlier work of Guo-Peng-Wang \cite{GPW2008} in the Euclidean space and Song-Yang \cite{SY2023} on the Heisenberg group, we overcome the difficulty arising from the non-homogeneousness of $φ$ by frequency localization, which is based on the non-commutative Fourier transform on H-type groups, the properties of the Laguerre functions and Bessel functions, and the stationary phase theorem. Finally, as applications, we derive the new Strichartz inequalities for the solutions of some specific equations, such as the fractional Schrödinger equation, the fourth-order Schrödinger equation, the beam equation and the Klein-Gordon equation, which corresponds to $φ(r)=r^α$, $r^2+r,\sqrt{1+r^2},\sqrt{1+r}$, respectively. Moreover, we also prove that the time decay is sharp in these cases.

Submitted 9 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2205.04106

MSC Class: 22E25; 33C45; 35H20; 35B40

arXiv:2407.04942 [pdf, other]

FOSP: Fine-tuning Offline Safe Policy through World Models

Authors: Chenyang Cao, Yucheng Xin, Silang Wu, Longxiang He, Zichen Yan, Junbo Tan, Xueqian Wang

Abstract: Model-based Reinforcement Learning (RL) has shown its high training efficiency and capability of handling high-dimensional tasks. Regarding safety issues, safe model-based RL can achieve nearly zero-cost performance and effectively manage the trade-off between performance and safety. Nevertheless, prior works still pose safety challenges due to the online exploration in real-world deployment. To address this, some offline RL methods have emerged as solutions, which learn from a static dataset in a safe way by avoiding interactions with the environment. In this paper, we aim to further enhance safety during the deployment stage for vision-based robotic tasks by fine-tuning an offline-trained policy. We incorporate in-sample optimization, model-based policy expansion, and reachability guidance to construct a safe offline-to-online framework. Moreover, our method proves to improve the generalization of offline policy in unseen safety-constrained scenarios. Finally, the efficiency of our method is validated on simulation benchmarks with five vision-only tasks and a real robot by solving some deployment problems using limited data.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 21 pages

arXiv:2407.02047 [pdf, other]

CountFormer: Multi-View Crowd Counting Transformer

Authors: Hong Mo, Xiong Zhang, Jianchao Tan, Cheng Yang, Qiong Gu, Bo Hang, Wenqi Ren

Abstract: Multi-view counting (MVC) methods have shown their superiority over single-view counterparts, particularly in situations characterized by heavy occlusion and severe perspective distortions. However, hand-crafted heuristic features and identical camera layout requirements in conventional MVC methods limit their applicability and scalability in real-world scenarios. In this work, we propose a concise 3D MVC framework called CountFormer to elevate multi-view image-level features to a scene-level volume representation and estimate the 3D density map based on the volume features. By incorporating a camera encoding strategy, CountFormer successfully embeds camera parameters into the volume query and image-level features, enabling it to handle various camera layouts with significant differences. Furthermore, we introduce a feature lifting module capitalized on the attention mechanism to transform image-level features into a 3D volume representation for each camera view. Subsequently, the multi-view volume aggregation module attentively aggregates various multi-view volumes to create a comprehensive scene-level volume representation, allowing CountFormer to handle images captured by arbitrary dynamic camera layouts. The proposed method performs favorably against the state-of-the-art approaches across various widely used datasets, demonstrating its greater suitability for real-world deployment compared to conventional MVC frameworks.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted By ECCV2024

arXiv:2406.19030 [pdf, other]

Using diffusion model as constraint: Empower Image Restoration Network Training with Diffusion Model

Authors: Jiangtong Tan, Feng Zhao

Abstract: Image restoration has made marvelous progress with the advent of deep learning. Previous methods usually rely on designing powerful network architectures to elevate performance; however, the natural visual effect of the restored results is limited by color and texture distortions. Besides the visual perceptual quality, semantic perception recovery is an important but often overlooked aspect of the restored image, which is crucial for deployment in high-level tasks. In this paper, we propose a new perspective to resolve these issues by introducing a naturalness-oriented and semantic-aware optimization mechanism, dubbed DiffLoss. Specifically, inspired by the powerful distribution coverage capability of the diffusion model for natural image generation, we exploit the Markov chain sampling property of the diffusion model and project the restored results of existing networks into the sampling space. Besides, we reveal that the bottleneck feature of diffusion models, also dubbed the h-space feature, is a natural high-level semantic space. We delve into this property and propose a semantic-aware loss to further unlock its potential for semantic perception recovery, which paves the way to connect the image restoration task and the downstream high-level recognition task. With these two strategies, DiffLoss can endow existing restoration methods with both more natural and semantic-aware results. We verify the effectiveness of our method on substantial common image restoration tasks and benchmarks. Code will be available at https://github.com/JosephTiTan/DiffLoss.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18897 [pdf, other]

Resilience of the surface code to error bursts

Authors: Shi Jie Samuel Tan, Christopher A. Pattison, Matt McEwen, John Preskill

Abstract: Quantum error correction works effectively only if the error rate of gate operations is sufficiently low. However, some rare physical mechanisms can cause a temporary increase in the error rate that affects many qubits; examples include ionizing radiation in superconducting hardware and large deviations in the global control of atomic systems. We refer to such rare transient spikes in the gate error rate as error bursts. In this work, we investigate the resilience of the surface code to generic error bursts. We assume that, after appropriate mitigation strategies, the spike in the error rate lasts for only a single syndrome extraction cycle; we also assume that the enhanced error rate is uniform across the code block. Under these assumptions, and for a circuit-level depolarizing noise model, we perform Monte Carlo simulations to determine the regime in burst error rate and background error rate for which the memory time becomes arbitrarily long as the code block size grows. Our results indicate that suitable hardware mitigation methods combined with standard decoding methods may suffice to protect against transient error bursts in the surface code.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18518 [pdf, other]

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

Authors: Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scalable and structured manner. Each data in our dataset is verified through three hierarchical stages: format checking, actual function executions, and semantic verification, ensuring its reliability and correctness. We demonstrate that models trained with our curated datasets, even with only 7B parameters, can achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models. Moreover, our 1B model achieves exceptional performance, surpassing GPT-3.5-Turbo and Claude-3 Haiku. We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agent domains. The dataset is available on Huggingface: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k and the project homepage: https://apigen-pipeline.github.io/

Submitted 26 June, 2024; originally announced June 2024.
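
Illustrative note on the APIGen abstract above: the three hierarchical stages it names (format checking, actual function execution, semantic verification) can be pictured as a filter that rejects a sample at the first stage it fails. This is only a sketch under stated assumptions, not the authors' pipeline: the JSON shape with `name` and `arguments` keys, the function registry, and the caller-supplied `semantic_check` predicate are all hypothetical.

```python
import json
from typing import Any, Callable, Dict, Optional, Tuple

Registry = Dict[str, Callable[..., Any]]


def format_check(raw: str) -> Optional[dict]:
    """Stage 1: the sample must parse as {"name": str, "arguments": dict}."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    if not isinstance(call.get("name"), str):
        return None
    if not isinstance(call.get("arguments"), dict):
        return None
    return call


def execute(call: dict, registry: Registry) -> Tuple[bool, Any]:
    """Stage 2: actually run the named function with the given arguments."""
    fn = registry.get(call["name"])
    if fn is None:
        return False, None
    try:
        return True, fn(**call["arguments"])
    except Exception:
        return False, None


def verify(raw: str, registry: Registry,
           semantic_check: Callable[[dict, Any], bool]) -> str:
    """Return "accepted", or the name of the first stage that rejected the sample."""
    call = format_check(raw)
    if call is None:
        return "format"
    ok, result = execute(call, registry)
    if not ok:
        return "execution"
    # Stage 3: hypothetical predicate standing in for semantic verification.
    if not semantic_check(call, result):
        return "semantic"
    return "accepted"
```

Ordering the stages from cheapest to most expensive means malformed samples are discarded before any function is executed.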

arXiv:2406.16671 [pdf, other]

STAR: Swarm Technology for Aerial Robotics Research

Authors: Jimmy Chiun, Yan Rui Tan, Yuhong Cao, John Tan, Guillaume Sartoretti

Abstract: In recent years, the field of aerial robotics has witnessed significant progress, finding applications in diverse domains, including post-disaster search and rescue operations. Despite these strides, the prohibitive acquisition costs associated with deploying physical multi-UAV systems have posed challenges, impeding their widespread utilization in research endeavors. To overcome these challenges, we present STAR (Swarm Technology for Aerial Robotics Research), a framework developed explicitly to improve the accessibility of aerial swarm research experiments. Our framework introduces a swarm architecture based on the Crazyflie, a low-cost, open-source, palm-sized aerial platform, well suited for experimental swarm algorithms. To augment cost-effectiveness and mitigate the limitations of employing low-cost robots in experiments, we propose a landmark-based localization module leveraging fiducial markers. This module, also serving as a target detection module, enhances the adaptability and versatility of the framework. Additionally, collision and obstacle avoidance are implemented through velocity obstacles. The presented work strives to bridge the gap between theoretical advances and tangible implementations, thus fostering progress in the field.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.15539 [pdf, other]

First Measurement of Deeply Virtual Compton Scattering on the Neutron with Detection of the Active Neutron

Authors: CLAS Collaboration, A. Hobart, S. Niccolai, M. Čuić, K. Kumerički, P. Achenbach, J. S. Alvarado, W. R. Armstrong, H. Atac, H. Avakian, L. Baashen, N. A. Baltzell, L. Barion, M. Bashkanov, M. Battaglieri, B. Benkel, F. Benmokhtar, A. Bianconi, A. S. Biselli, S. Boiarinov, M. Bondi, W. A. Booth, F. Bossù, K. -Th. Brinkmann, W. J. Briscoe, et al. (124 additional authors not shown)

Abstract: Measuring Deeply Virtual Compton Scattering on the neutron is one of the necessary steps to understand the structure of the nucleon in terms of Generalized Parton Distributions (GPDs). Neutron targets play a complementary role to transversely polarized proton targets in the determination of the GPD $E$. This poorly known and poorly constrained GPD is essential to obtain the contribution of the quarks' angular momentum to the spin of the nucleon. DVCS on the neutron was measured for the first time selecting the exclusive final state by detecting the neutron, using the Jefferson Lab longitudinally polarized electron beam, with energies up to 10.6 GeV, and the CLAS12 detector. The extracted beam-spin asymmetries, combined with DVCS observables measured on the proton, allow a clean quark-flavor separation of the imaginary parts of the GPDs $H$ and $E$.

Submitted 25 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

Comments: 7 pages, 6 figures

Report number: JLAB-PHY-24-4089

arXiv:2406.13402 [pdf, other]

When $t$-intersecting hypergraphs admit bounded $c$-strong colourings

Authors: Kevin Hendrey, Freddie Illingworth, Nina Kamčev, Jane Tan

Abstract: The $c$-strong chromatic number of a hypergraph is the smallest number of colours needed to colour its vertices so that every edge sees at least $c$ colours or is rainbow. We show that every $t$-intersecting hypergraph has bounded $(t + 1)$-strong chromatic number, resolving a problem of Blais, Weinstein and Yoshida. In fact, we characterise when a $t$-intersecting hypergraph has large $c$-strong chromatic number for $c\geq t+2$. Our characterisation also applies to hypergraphs which exclude sunflowers with specified parameters.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 13 pages

MSC Class: 05C15
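
Illustrative note on the definition in the abstract above: a colouring is $c$-strong when every edge either sees at least $c$ colours or is rainbow (all of its vertices receive distinct colours). The following minimal sketch checks that condition for a given colouring; the function name and the toy hypergraph are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, Iterable


def is_c_strong(edges: Iterable[Iterable[Hashable]],
                colouring: Dict[Hashable, Hashable],
                c: int) -> bool:
    """Check that every edge sees at least c colours or is rainbow."""
    for edge in edges:
        vertices = set(edge)
        colours = {colouring[v] for v in vertices}
        sees_enough = len(colours) >= c
        is_rainbow = len(colours) == len(vertices)
        if not (sees_enough or is_rainbow):
            return False
    return True


# Toy hypergraph with one edge of size 3 and one edge of size 2.
edges = [(1, 2, 3), (3, 4)]
proper = {1: "r", 2: "g", 3: "b", 4: "r"}
clash = {1: "r", 2: "r", 3: "b", 4: "r"}
```

The rainbow clause matters for edges with fewer than $c$ vertices, such as the edge `(3, 4)` when $c = 3$, which can never see three colours.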

arXiv:2406.12268 [pdf, ps, other]

Channel Twinning: An Enabler for Next-Generation Ubiquitous Wireless Connectivity

Authors: Yashuai Cao, Jingbo Tan, Jintao Wang, Wei Ni, Ekram Hossain, Dusit Niyato

Abstract: The emerging concept of channel twinning (CT) has great potential to become a key enabler of ubiquitous connectivity in next-generation (xG) wireless systems. By fusing multimodal sensor data, CT advocates a high-fidelity and low-overhead channel acquisition paradigm, which is promising to provide accurate channel prediction in cross-domain and high-mobility scenarios of ubiquitous xG networks. However, the current literature lacks a universal CT architecture to address the challenges of heterogeneous scenarios, data, and resources in xG networks, which hinders the widespread deployment and applications of CT. This article discusses a new modularized CT architecture to bridge the barriers to scene recognition, cooperative sensing, and decentralized training. Based on the modularized design of CT, universal channel modeling, multimodal cooperative sensing, and lightweight twin modeling are described. Moreover, this article provides a concise definition, technical features, and case studies of CT, followed by potential applications of CT-empowered ubiquitous connectivity and some issues requiring future investigations.

Submitted 18 June, 2024; originally announced June 2024.

Comments: submitted to IEEE

arXiv:2406.11263 [pdf, other]

The Fall of ROME: Understanding the Collapse of LLMs in Model Editing

Authors: Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Du Su, Dawei Yin, Huawei Shen

Abstract: Despite significant progress in model editing methods, their application in real-world scenarios remains challenging as they often cause large language models (LLMs) to collapse. Among them, ROME is particularly concerning, as it could disrupt LLMs with only a single edit. In this paper, we study the root causes of such collapse. Through extensive analysis, we identify two primary factors that contribute to the collapse: i) inconsistent handling of prefixed and unprefixed keys in the parameter update equation may result in very small denominators, causing excessively large parameter updates; ii) the subject of collapse cases is usually the first token, whose unprefixed key distribution significantly differs from the prefixed key distribution in autoregressive transformers, causing the aforementioned issue to materialize. To validate our analysis, we propose a simple yet effective approach: uniformly using prefixed keys during editing phase and adding prefixes during the testing phase. The experimental results show that the proposed solution can prevent model collapse while maintaining the effectiveness of the edits.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10841 [pdf, ps, other]

Local Hardy spaces associated with ball quasi-Banach function spaces and their dual spaces

Authors: Xinyu Chen, Jian Tan

Abstract: Let $X$ be a ball quasi-Banach function space on $\mathbb R^{n}$ and $h_{X}(\mathbb R^{n})$ the local Hardy space associated with $X$. In this paper, under some reasonable assumptions on $X$, the infinite and finite atomic decompositions for the local Hardy space $h_{X}(\mathbb R^{n})$ are established directly, without relying on the relation between $H_{X}(\mathbb R^{n})$ and $h_{X}(\mathbb R^{n})$. Moreover, we apply the finite atomic decomposition to obtain the dual space of the local Hardy space $h_{X}(\mathbb R^{n})$. Especially, the above results can be applied to several specific ball quasi-Banach function spaces, demonstrating their wide range of applications.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 28 pages. arXiv admin note: text overlap with arXiv:2208.06266, arXiv:2206.06551 by other authors

MSC Class: 42B30; 42B20

arXiv:2406.10499 [pdf, other]

Functional Clustering for Longitudinal Associations between Social Determinants of Health and Stroke Mortality in the US

Authors: Fangzhi Luo, Jianbin Tan, Donglan Zhang, Hui Huang, Ye Shen

Abstract: Understanding longitudinally changing associations between Social determinants of health (SDOH) and stroke mortality is crucial for timely stroke management. Previous studies have revealed a significant regional disparity in the SDOH -- stroke mortality associations. However, they do not develop data-driven methods based on these longitudinal associations for regional division in stroke control. To fill this gap, we propose a novel clustering method for SDOH -- stroke mortality associations in the US counties. To enhance interpretability and statistical efficiency of the clustering outcomes, we introduce a new class of smoothness-sparsity pursued penalties for simultaneous clustering and variable selection in the longitudinal associations. As a result, we can identify important SDOH that contribute to longitudinal changes in the stroke mortality, facilitating clustering of US counties into several regions based on how these SDOH relate to stroke mortality. The effectiveness of our proposed method is demonstrated through extensive numerical studies. By applying our method to a county-level SDOH and stroke mortality longitudinal data, we identify 18 important SDOH for stroke mortality and divide the US counties into two clusters based on these selected SDOH. Our findings unveil complex regional heterogeneity in the longitudinal associations between SDOH and stroke mortality, providing valuable insights in region-specific SDOH adjustments for mitigating stroke mortality.

Submitted 20 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10290 [pdf, other]

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

Authors: Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

Abstract: The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization. However, the hardware constraints of mobile devices necessitate the use of models with fewer parameters and model compression techniques like quantization. Currently, there is limited understanding of quantization's impact on various task performances, including LLM tasks, LMM tasks, and, critically, trust and safety. There is a lack of adequate tools for systematically testing these models on mobile devices. To address these gaps, we introduce MobileAIBench, a comprehensive benchmarking framework for evaluating mobile-optimized LLMs and LMMs. MobileAIBench assesses models across different sizes, quantization levels, and tasks, measuring latency and resource consumption on real devices. Our two-part open-source framework includes a library for running evaluations on desktops and an iOS app for on-device latency and hardware utilization measurements. Our thorough analysis aims to accelerate mobile AI research and deployment by providing insights into the performance and feasibility of deploying LLMs and LMMs on mobile platforms.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.09801 [pdf, other]

RaNeuS: Ray-adaptive Neural Surface Reconstruction

Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

Abstract: Our objective is to leverage a differentiable radiance field, e.g., NeRF, to reconstruct detailed 3D surfaces in addition to producing the standard novel view renderings. There have been related methods that perform such tasks, usually by utilizing a signed distance field (SDF). However, the state-of-the-art approaches still fail to correctly reconstruct the small-scale details, such as the leaves, ropes, and textile surfaces. Considering that different methods formulate and optimize the projection from SDF to radiance field with a globally constant Eikonal regularization, we improve with a ray-wise weighting factor to prioritize the rendering and zero-crossing surface fitting on top of establishing a perfect SDF. We propose to adaptively adjust the regularization on the signed distance field so that unsatisfying rendering rays won't enforce strong Eikonal regularization which is ineffective, and allow the gradients from regions with well-learned radiance to be effectively back-propagated to the SDF. Consequently, the two objectives are balanced in order to generate accurate and detailed surfaces. Additionally, concerning whether there is a geometric bias between the zero-crossing surface in SDF and rendering points in the radiance field, the projection becomes adjustable as well depending on different 3D locations during optimization. Our proposed RaNeuS is extensively evaluated on both synthetic and real datasets, achieving state-of-the-art results on both novel view synthesis and geometric reconstruction.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 3DV 2024, oral. In: Proceedings of the IEEE/CVF International Conference on 3D Vision (2023)

arXiv:2406.09215 [pdf, other]

On Softmax Direct Preference Optimization for Recommendation

Authors: Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, Tat-Seng Chua

Abstract: Recommender systems aim to predict personalized rankings based on user preference data. With the rise of Language Models (LMs), LM-based recommenders have been widely explored due to their extensive world knowledge and powerful reasoning abilities. Most of the LM-based recommenders convert historical interactions into language prompts, pairing with a positive item as the target response and fine-tuning LM with a language modeling loss. However, the current objective fails to fully leverage preference data and is not optimized for personalized ranking tasks, which hinders the performance of LM-based recommenders. Inspired by the current advancement of Direct Preference Optimization (DPO) in human preference alignment and the success of softmax loss in recommendations, we propose Softmax-DPO (S-DPO) to instill ranking information into the LM to help LM-based recommenders distinguish preferred items from negatives, rather than solely focusing on positives. Specifically, we incorporate multiple negatives in user preference data and devise an alternative version of DPO loss tailored for LM-based recommenders, connected to softmax sampling strategies. Theoretically, we bridge S-DPO with the softmax loss over negative sampling and find that it has a side effect of mining hard negatives, which assures its exceptional capabilities in recommendation tasks. Empirically, extensive experiments conducted on three real-world datasets demonstrate the superiority of S-DPO to effectively model user preference and further boost recommendation performance while mitigating the data likelihood decline issue of DPO. Our codes are available at https://github.com/chenyuxin1999/S-DPO.

Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08549 [pdf, other]

Investigating Mutual Coupling in the Hydrogen Epoch of Reionization Array and Mitigating its Effects on the 21-cm Power Spectrum

Authors: E. Rath, R. Pascua, A. T. Josaitis, A. Ewall-Wice, N. Fagnoni, E. de Lera Acedo, Z. E. Martinot, Z. Abdurashidova, T. Adams, J. E. Aguirre, R. Baartman, A. P. Beardsley, L. M. Berkhout, G. Bernardi, T. S. Billings, J. D. Bowman, P. Bull, J. Burba, R. Byrne, S. Carey, K. -F. Chen, S. Choudhuri, T. Cox, D. R. DeBoer, M. Dexter, et al. (56 additional authors not shown)

Abstract: Interferometric experiments designed to detect the highly redshifted 21-cm signal from neutral hydrogen are producing increasingly stringent constraints on the 21-cm power spectrum, but some k-modes remain systematics-dominated. Mutual coupling is a major systematic that must be overcome in order to detect the 21-cm signal, and simulations that reproduce effects seen in the data can guide strategies for mitigating mutual coupling. In this paper, we analyse 12 nights of data from the Hydrogen Epoch of Reionization Array and compare the data against simulations that include a computationally efficient and physically motivated semi-analytic treatment of mutual coupling. We find that simulated coupling features qualitatively agree with coupling features in the data; however, coupling features in the data are brighter than the simulated features, indicating the presence of additional coupling mechanisms not captured by our model. We explore the use of fringe-rate filters as mutual coupling mitigation tools and use our simulations to investigate the effects of mutual coupling on a simulated cosmological 21-cm power spectrum in a "worst case" scenario where the foregrounds are particularly bright. We find that mutual coupling contaminates a large portion of the "EoR Window", and the contamination is several orders-of-magnitude larger than our simulated cosmic signal across a wide range of cosmological Fourier modes. While our fiducial fringe-rate filtering strategy reduces mutual coupling by roughly a factor of 100 in power, a non-negligible amount of coupling cannot be excised with fringe-rate filters, so more sophisticated mitigation strategies are required.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 19 pages, 12 figures, submitted to MNRAS

arXiv:2406.08122 [pdf]

Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor

Authors: Yongjie Si, Yanxiong Li, Jialong Li, Jiaxin Tan, Qianhua He

Abstract: It's assumed that training data is sufficient in base session of few-shot class-incremental audio classification. However, it's difficult to collect abundant samples for model training in base session in some practical scenarios due to the data scarcity of some classes. This paper explores a new problem of fully few-shot class-incremental audio classification with few training samples in all sessions. Moreover, we propose a method using expandable dual-embedding extractor to solve it. The proposed model consists of an embedding extractor and an expandable classifier. The embedding extractor consists of a pretrained Audio Spectrogram Transformer (AST) and a finetuned AST. The expandable classifier consists of prototypes and each prototype represents a class. Experiments are conducted on three datasets (LS-100, NSynth-100 and FSC-89). Results show that our method exceeds seven baseline ones in average accuracy with statistical significance. Code is at: https://github.com/YongjieSi/EDE.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted for publication on Interspeech 2024. 5 pages, 3 figures, 5 tables

arXiv:2406.08119 [pdf]

Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network

Authors: Yanxiong Li, Jiaxin Tan, Guoqing Chen, Jialong Li, Yongjie Si, Qianhua He

Abstract: This work is an improved system that we submitted to task 1 of DCASE2023 challenge. We propose a method of low-complexity acoustic scene classification by a parallel attention-convolution network which consists of four modules, including pre-processing, fusion, global and local contextual information extraction. The proposed network is computationally efficient to capture global and local contextual information from each audio clip. In addition, we integrate other techniques into our method, such as knowledge distillation, data augmentation, and adaptive residual normalization. When evaluated on the official dataset of DCASE2023 challenge, our method obtains the highest accuracy of 56.10% with parameter number of 5.21 kilo and multiply-accumulate operations of 1.44 million. It exceeds the top two systems of DCASE2023 challenge in accuracy and complexity, and obtains state-of-the-art result. Code is at: https://github.com/Jessytan/Low-complexity-ASC.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted for publication on Interspeech 2024. 5 pages, 4 figures, 3 tables

arXiv:2406.07006 [pdf, other]

MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, et al. (17 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Few-shot RAW Image Denoising track on MIPI 2024. In total, 165 participants were successfully registered, and 7 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Few-shot RAW Image Denoising. More details of this challenge and the link to the dataset can be found at https://mipichallenge.org/MIPI2024.

Submitted 11 June, 2024; originally announced June 2024.

Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2406.03899 [pdf, other]

PLDNet: PLD-Guided Lightweight Deep Network Boosted by Efficient Attention for Handheld Dual-Microphone Speech Enhancement

Authors: Nan Zhou, Youhai Jiang, Jialin Tan, Chongmin Qi

Abstract: Low-complexity speech enhancement on mobile phones is crucial in the era of 5G. Thus, focusing on handheld mobile phone communication scenario, based on power level difference (PLD) algorithm and lightweight U-Net, we propose PLD-guided lightweight deep network (PLDNet), an extremely lightweight dual-microphone speech enhancement method that integrates the guidance of signal processing algorithm and lightweight attention-augmented U-Net. For the guidance information, we employ PLD algorithm to pre-process dual-microphone spectrum, and feed the output into subsequent deep neural network, which utilizes a lightweight U-Net with our proposed gated convolution augmented frequency attention (GCAFA) module to extract desired clean speech. Experimental results demonstrate that our proposed method achieves competitive performance with recent top-performing models while reducing computational cost by over 90%, highlighting the potential for low-complexity speech enhancement on mobile phones.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted at Interspeech 2024

arXiv:2406.02609 [pdf, other]

Less is More: Pseudo-Label Filtering for Continual Test-Time Adaptation

Authors: Jiayao Tan, Fan Lyu, Chenggong Ni, Tingliang Feng, Fuyuan Hu, Zhang Zhang, Shaochuang Zhao, Liang Wang

Abstract: Continual Test-Time Adaptation (CTTA) aims to adapt a pre-trained model to a sequence of target domains during the test phase without accessing the source data. To adapt to unlabeled data from unknown domains, existing methods rely on constructing pseudo-labels for all samples and updating the model through self-training. However, these pseudo-labels often involve noise, leading to insufficient adaptation. To improve the quality of pseudo-labels, we propose a pseudo-label selection method for CTTA, called Pseudo Labeling Filter (PLF). The key idea of PLF is to keep selecting appropriate thresholds for pseudo-labels and identify reliable ones for self-training. Specifically, we present three principles for setting thresholds during continuous domain learning, including initialization, growth and diversity. Based on these principles, we design Self-Adaptive Thresholding to filter pseudo-labels. Additionally, we introduce a Class Prior Alignment (CPA) method to encourage the model to make diverse predictions for unknown domain samples. Through extensive experiments, PLF outperforms current state-of-the-art methods, proving its effectiveness in CTTA.

Submitted 12 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.03335 by other authors

arXiv:2405.18187 [pdf, other]

AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization

Authors: Longxiang He, Li Shen, Junbo Tan, Xueqian Wang

Abstract: Implicit Q-learning (IQL) serves as a strong baseline for offline RL, which learns the value function using only dataset actions through quantile regression. However, it is unclear how to recover the implicit policy from the learned implicit Q-function and why IQL can utilize weighted regression for policy extraction. IDQL reinterprets IQL as an actor-critic method and gets weights of implicit policy; however, this weight only holds for the optimal value function. In this work, we introduce a different way to solve the implicit policy-finding problem (IPF) by formulating this problem as an optimization problem. Based on this optimization problem, we further propose two practical algorithms AlignIQL and AlignIQL-hard, which inherit the advantages of decoupling actor from critic in IQL and provide insights into why IQL can use weighted regression for policy extraction. Compared with IQL and IDQL, we find our method keeps the simplicity of IQL and solves the implicit policy-finding problem. Experimental results on D4RL datasets show that our method achieves competitive or superior results compared with other SOTA offline RL methods. Especially in complex sparse reward tasks like Antmaze and Adroit, our method outperforms IQL and IDQL by a significant margin.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 19 pages, 3 figures, 4 tables

arXiv:2405.15673 [pdf, other]

Consistency of Neural Causal Partial Identification

Authors: Jiyuan Tan, Jose Blanchet, Vasilis Syrgkanis

Abstract: Recent progress in Neural Causal Models (NCMs) showcased how identification and partial identification of causal effects can be automatically carried out via training of neural generative models that respect the constraints encoded in a given causal graph [Xia et al. 2022, Balazadeh et al. 2022]. However, formal consistency of these methods has only been proven for the case of discrete variables or only for linear causal models. In this work, we prove consistency of partial identification via NCMs in a general setting with both continuous and categorical variables. Further, our results highlight the impact of the design of the underlying neural network architecture in terms of depth and connectivity as well as the importance of applying Lipschitz regularization in the training phase. In particular, we provide a counterexample showing that without Lipschitz regularization the NCM may not be asymptotically consistent. Our results are enabled by new results on the approximability of structural causal models via neural generative models, together with an analysis of the sample complexity of the resulting architectures and how that translates into an error in the constrained optimization problem that defines the partial identification bounds.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 37 pages, 8 figures

arXiv:2405.12476 [pdf, other]

Benchmarking Fish Dataset and Evaluation Metric in Keypoint Detection -- Towards Precise Fish Morphological Assessment in Aquaculture Breeding

Authors: Weizhen Liu, Jiayu Tan, Guangyu Lan, Ao Li, Dongye Li, Le Zhao, Xiaohui Yuan, Nanqing Dong

Abstract: Accurate phenotypic analysis in aquaculture breeding necessitates the quantification of subtle morphological phenotypes. Existing datasets suffer from limitations such as small scale, limited species coverage, and inadequate annotation of keypoints for measuring refined and complex morphological phenotypes of fish body parts. To address this gap, we introduce FishPhenoKey, a comprehensive dataset comprising 23,331 high-resolution images spanning six fish species. Notably, FishPhenoKey includes 22 phenotype-oriented annotations, enabling the capture of intricate morphological phenotypes. Motivated by the nuanced evaluation of these subtle morphologies, we also propose a new evaluation metric, Percentage of Measured Phenotype (PMP). It is designed to assess the accuracy of individual keypoint positions and is highly sensitive to the phenotypes measured using the corresponding keypoints. To enhance keypoint detection accuracy, we further propose a novel loss, Anatomically-Calibrated Regularization (ACR), that can be integrated into keypoint detection models, leveraging biological insights to refine keypoint localization. Our contributions set a new benchmark in fish phenotype analysis, addressing the challenges of precise morphological quantification and opening new avenues for research in sustainable aquaculture and genetic studies. Our dataset and code are available at https://github.com/WeizhenLiuBioinform/Fish-Phenotype-Detect.

Submitted 31 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: Accepted by IJCAI2024, Code: https://github.com/WeizhenLiuBioinform/Fish-Phenotype-Detect

arXiv:2405.09052 [pdf, other]

Dielectric Tensor Prediction for Inorganic Materials Using Latent Information from Preferred Potential

Authors: Zetian Mao, Wenwen Li, Jethro Tan

Abstract: Dielectrics are materials with widespread applications in flash memory, central processing units, photovoltaics, capacitors, etc. However, the availability of public dielectric data remains limited, hindering research and development efforts. Previously, machine learning models focused on predicting dielectric constants as scalars, overlooking the importance of dielectric tensors in understanding material properties under directional electric fields for material design and simulation. This study demonstrates the value of common equivariant structural embedding features derived from a universal neural network potential in enhancing the prediction of dielectric properties. To integrate channel information from various-rank latent features while preserving the desired SE(3) equivariance to the second-rank dielectric tensors, we design an equivariant readout decoder to predict the total, electronic, and ionic dielectric tensors individually, and compare our model with the state-of-the-art models. Finally, we evaluate our model by conducting virtual screening on thermodynamically stable structure candidates in Materials Project. The material Ba$_2$SmTaO$_6$ with large band gaps ($E_g=3.36 \mathrm{eV}$) and dielectric constants ($ε=93.81$) is successfully identified out of the 14k candidate set. The results show that our methods give good accuracy on predicting dielectric tensors of inorganic materials, emphasizing their potential in contributing to the discovery of novel dielectrics.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.05525 [pdf, other]

Ditto: Quantization-aware Secure Inference of Transformers upon MPC

Authors: Haoqi Wu, Wenjing Fang, Yancheng Zheng, Junming Ma, Jin Tan, Yinggui Wang, Lei Wang

Abstract: Due to the rising privacy concerns on sensitive client data and trained models like Transformers, secure multi-party computation (MPC) techniques are employed to enable secure inference despite attendant overhead. Existing works attempt to reduce the overhead using more MPC-friendly non-linear function approximations. However, the integration of quantization widely used in plaintext inference into the MPC domain remains unclear. To bridge this gap, we propose the framework named Ditto to enable more efficient quantization-aware secure Transformer inference. Concretely, we first incorporate an MPC-friendly quantization into Transformer inference and employ a quantization-aware distillation procedure to maintain the model utility. Then, we propose novel MPC primitives to support the type conversions that are essential in quantization and implement the quantization-aware MPC execution of secure quantized inference. This approach significantly decreases both computation and communication overhead, leading to improvements in overall efficiency. We conduct extensive experiments on Bert and GPT2 models to evaluate the performance of Ditto. The results demonstrate that Ditto is about $3.14\sim 4.40\times$ faster than MPCFormer (ICLR 2023) and $1.44\sim 2.35\times$ faster than the state-of-the-art work PUMA with negligible utility degradation.

Submitted 8 May, 2024; originally announced May 2024.

Comments: to be published in ICML 2024

arXiv:2405.05355 [pdf, other]

Geometry-Informed Distance Candidate Selection for Adaptive Lightweight Omnidirectional Stereo Vision with Fisheye Images

Authors: Conner Pulling, Je Hon Tan, Yaoyu Hu, Sebastian Scherer

Abstract: Multi-view stereo omnidirectional distance estimation usually needs to build a cost volume with many hypothetical distance candidates. The cost volume building process is often computationally heavy considering the limited resources a mobile robot has. We propose a new geometry-informed distance candidate selection method which enables the use of a very small number of candidates and reduces the computational cost. We demonstrate the use of the geometry-informed candidates in a set of model variants. We find that by adjusting the candidates during robot deployment, our geometry-informed distance candidates also improve a pre-trained model's accuracy if the extrinsics or the number of cameras changes. Without any re-training or fine-tuning, our models outperform models trained with evenly distributed distance candidates. Models are also released as hardware-accelerated versions with a new dedicated large-scale dataset. The project page, code, and dataset can be found at https://theairlab.org/gicandidates/ .

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.01548 [pdf]

doi 10.1109/JLT.2023.3304659

Foundry's perspective on laser and SOA module integration with silicon photonics

Authors: James Y. S. Tan, Shawn Xie Wu, Salih Yanikgonul, Chao Li, Patrick Guo-Qiang Lo

Abstract: Silicon photonic integrated circuit (PIC) builds on the demand for a low cost approach from established silicon-based manufacturing infrastructure traditionally built for electronics. Besides its natural abundance, silicon has desirable properties such as optically low loss (at certain critical wavelengths), and small form factor to enable high density scaled-up optical on-chip circuitry. However, given its indirect bandgap, the platform is typically integrated with other direct bandgap (e.g., III-V semiconductor) platforms for on-chip light source. An effective solution to integrating light source onto silicon photonics platform is integral to a practical scaled-up and full-fledged integrated photonics implementation. Here, we discuss the integration solutions, and present our foundry's perspective toward realizing it.

Submitted 20 February, 2024; originally announced May 2024.

Comments: 14 pages

Journal ref: IEEE J Lightwave Technol. vol. 42, no. 3, pp. 1062-1074, 2024

arXiv:2405.00846 [pdf, other]

Gameplay Filters: Safe Robot Walking through Adversarial Imagination

Authors: Duy P. Nguyen, Kai-Chieh Hsu, Wenhao Yu, Jie Tan, Jaime F. Fisac

Abstract: Ensuring the safe operation of legged robots in uncertain, novel environments is crucial to their widespread adoption. Despite recent advances in safety filters that can keep arbitrary task-driven policies from incurring safety failures, existing solutions for legged robot locomotion still rely on simplified dynamics and may fail when the robot is perturbed away from predefined stable gaits. This paper presents a general approach that leverages offline game-theoretic reinforcement learning to synthesize a highly robust safety filter for high-order nonlinear dynamics. This gameplay filter then maintains runtime safety by continually simulating adversarial futures and precluding task-driven actions that would cause it to lose future games (and thereby violate safety). Validated on a 36-dimensional quadruped robot locomotion task, the gameplay safety filter exhibits inherent robustness to the sim-to-real gap without manual tuning or heuristic designs. Physical experiments demonstrate the effectiveness of the gameplay safety filter under perturbations, such as tugging and unmodeled irregular terrains, while simulation studies shed light on how to trade off computation and conservativeness without compromising safety.

Submitted 31 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.16885 [pdf]

Adapting an Artificial Intelligence Sexually Transmitted Diseases Symptom Checker Tool for Mpox Detection: The HeHealth Experience

Authors: Rayner Kay Jin Tan, Dilruk Perera, Salomi Arasaratnam, Yudara Kularathne

Abstract: Artificial Intelligence applications have shown promise in the management of pandemics and have been widely used to assist the identification, classification, and diagnosis of medical images. In response to the global outbreak of Monkeypox (Mpox), the HeHealth.ai team leveraged an existing tool to screen for sexually transmitted diseases to develop a digital screening test for symptomatic Mpox through AI approaches. Prior to the global outbreak of Mpox, the team developed a smartphone app, where app users can use their own smartphone cameras to take pictures of their own penises to screen for symptomatic STD. The AI model was initially developed using 5000 cases and uses a modified convolutional neural network to output prediction scores across visually diagnosable penis pathologies including Syphilis, Herpes Simplex Virus, and Human Papilloma Virus. From June 2022 to October 2022, a total of about 22,000 users downloaded the HeHealth app, and about 21,000 images have been analyzed using HeHealth AI technology. We then engaged in formative research, stakeholder engagement, rapid consolidation images, a validation study, and implementation of the tool from July 2022. From July 2022 to October 2022, a total of 1000 Mpox related images had been used to train the Mpox symptom checker tool. Our digital symptom checker tool showed accuracy of 87% to rule in Mpox and 90% to rule out symptomatic Mpox. Several hurdles identified included issues of data privacy and security for app users, initial lack of data to train the AI tool, and the potential generalizability of input data. We offer several suggestions to help others get started on similar projects in emergency situations, including engaging a wide range of stakeholders, having a multidisciplinary team, prioritizing pragmatism, as well as the concept that big data in fact is made up of small data.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 15 pages, 4 figures

arXiv:2404.16808 [pdf, other]

Enhancing nanocrystal superlattice self-assembly near a metastable liquid binodal

Authors: Christian P. N. Tanner, Vivian R. K. Wall, Joshua Portner, Ahhyun Jeong, Avishek Das, James K. Utterback, Leo M. Hamerlynck, Jonathan G. Raybin, Matthew J. Hurley, Nicholas Leonard, Rebecca B. Wai, Jenna A. Tan, Mumtaz Gababa, Chenhui Zhu, Eric Schaible, Christopher J. Tassone, David T. Limmer, Samuel W. Teitelbaum, Dmitri V. Talapin, Naomi S. Ginsberg

Abstract: Bottom-up assembly of nanocrystals (NCs) into ordered arrays, or superlattices (SLs), is a promising route to design materials with new functionalities, but the degree of control over assembly into functional structures remains challenging. Using electrostatics, rather than density, to tune the interactions between semiconductor NCs, we watch self-assembly proceeding through a metastable liquid phase. We systematically investigate the phase behavior as a function of quench conditions in situ and in real time using small angle X-ray scattering (SAXS). Through quantitative fitting to colloid, liquid, and SL models, we extract the time evolution of each phase and the system phase diagram, which we find to be consistent with short-range attractive interactions. Using the phase diagram's predictive power, we establish control of the self-assembly rate over three orders of magnitude, and identify one- and two-step self-assembly regimes, with only the latter implicating the metastable liquid as an intermediate. Importantly, the presence of the metastable liquid increases SL formation rates relative to the equivalent one-step pathway, and SL order counterintuitively increases with the rate, revealing a highly desirable and generalizable kinetic strategy to promote and enhance ordered assembly.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 16 pages, 4 figures

arXiv:2404.16172 [pdf, other]

Mirror Construction for Nakajima Quiver Varieties

Authors: Jiawei Hu, Siu-Cheong Lau, Ju Tan

Abstract: In this paper, we construct the ADHM quiver representations and the corresponding sheaves as the mirror objects of formal deformations of the framed immersed Lagrangian sphere decorated with flat bundles. More generally, framed double quivers of Nakajima are constructed as localized mirrors of framed Lagrangian immersions in dimension two. This produces a localized mirror functor to the dg category of modules over the framed preprojective algebra. For affine ADE quivers in specific multiplicities, the corresponding (unframed) Lagrangian immersions are homological tori, whose moduli of stable deformations are asymptotically locally Euclidean (ALE) spaces. We show that framed stable Lagrangian branes are transformed into monadic complexes of framed torsion-free sheaves over the ALE spaces. A main ingredient is the notion of framed Lagrangian immersions. Moreover, it is important to note that the deformation space of a Lagrangian immersion with more than one component is stacky. Using the formalism of quiver algebroid stacks, we find isomorphisms between the moduli of stable Lagrangian immersions and that of special Lagrangian fibers of an SYZ fibration in the affine $A_n$ cases.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 50 pages. Comments are welcome!

MSC Class: 14J33; 53D37

arXiv:2404.13267 [pdf, other]

Demystify Adult Learning: A Social Network and Large Language Model Assisted Approach

Authors: Fang Liu, Bosheng Ding, Chong Guan, Zhang Wei, Dusit Niyato, Justina Tan

Abstract: Adult learning is increasingly recognized as a crucial way for personal development and societal progress. It however is challenging, and adult learners face unique challenges such as balancing education with other life responsibilities. Collecting feedback from adult learners is effective in understanding their concerns and improving learning experiences, and social networks provide a rich source of real-time sentiment data from adult learners. Machine learning technologies especially large language models (LLMs) perform well in automating sentiment analysis. However, none of such models is specialized for adult learning with accurate sentiment understanding. In this paper, we present A-Learn, which enhances adult learning sentiment analysis by customizing existing general-purpose LLMs with domain-specific datasets for adult learning. We collect adult learners' comments from social networks and label the sentiment of each comment with an existing LLM to form labelled datasets tailored for adult learning. The datasets are used to customize A-Learn from several base LLMs. We conducted experimental studies and the results reveal A-Learn's competitive sentiment analysis performance, achieving up to 91.3% accuracy with 20% improvement over the base LLM. A-Learn is also employed for word cloud analysis to identify key concerns of adult learners. The research outcome of this study highlights the importance of applying machine learning with educational expertise for teaching improvement and educational innovations that benefit adult learning and adult learners.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 6 pages, 3 figures

arXiv:2404.09292 [pdf, other]

Bridging Data Islands: Geographic Heterogeneity-Aware Federated Learning for Collaborative Remote Sensing Semantic Segmentation

Authors: Jieyi Tan, Yansheng Li, Sergey A. Bartalev, Bo Dang, Wei Chen, Yongjun Zhang, Liangqi Yuan

Abstract: Remote sensing semantic segmentation (RSS) is an essential task in Earth Observation missions. Due to data privacy concerns, high-quality remote sensing images with annotations cannot be well shared among institutions, making it difficult to fully utilize RSS data to train a generalized model. Federated Learning (FL), a privacy-preserving collaborative learning technology, is a potential solution. However, the current research on how to effectively apply FL in RSS is still scarce and requires further investigation. Remote sensing images in various institutions often exhibit strong geographical heterogeneity. More specifically, it is reflected in terms of class-distribution heterogeneity and object-appearance heterogeneity. Unfortunately, most existing FL studies show inadequate focus on geographical heterogeneity, thus leading to performance degradation in the global model. Considering the aforementioned issues, we propose a novel Geographic Heterogeneity-Aware Federated Learning (GeoFed) framework to address privacy-preserving RSS. Through Global Feature Extension and Tail Regeneration modules, class-distribution heterogeneity is alleviated. Additionally, we design an Essential Feature Mining strategy to alleviate object-appearance heterogeneity by constructing essential features. Extensive experiments on three datasets (i.e., FBP, CASID, Inria) show that our GeoFed consistently outperforms the current state-of-the-art methods. The code will be available publicly.

Submitted 14 April, 2024; originally announced April 2024.

Comments: 13 pages,9 figures, 4 tables

arXiv:2404.08395 [pdf]

Room-Temperature Polariton Lasing from CdSe core-only Nanoplatelets

Authors: Francisco Freire-Fernández, Nathan G. Sinai, Max J. H. Tan, Sang-Min Park, Eric Koessler, Todd D. Krauss, Pengfei Huo, Teri W. Odom

Abstract: This paper reports how CdSe core-only nanoplatelets coupled with plasmonic Al nanoparticle lattices can exhibit exciton-polariton lasing. By improving a procedure to synthesize monodisperse 4-monolayer CdSe nanoplatelets, we could resolve polariton decay dynamics and pathways. Experiment and theory confirmed that the system is in the strong coupling regime based on anti-crossings in the dispersion diagrams and magnitude of the Rabi splitting values. Notably, polariton lasing is observed only for cavity lattice periodicities that exhibit specific dispersive characteristics that enable polariton accumulation. The threshold of polariton lasing is 25-fold lower than reported photon lasing values from CdSe nanoplatelets in similar cavity designs. This open-cavity platform offers a simple approach to control exciton polaritons anticipated to benefit quantum information processing, optoelectronics, and chemical reactions.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.03402 [pdf, ps, other]

On steady solutions of the Hall-MHD system in Besov spaces

Authors: Jin Tan, Hiroyuki Tsurumi, Xin Zhang

Abstract: In this paper, we investigate the well-posedness and ill-posedness issues for the incompressible stationary Hall-magnetohydrodynamic (Hall-MHD) system in $\mathbb{R}^3.$ We first show the existence and uniqueness of solutions provided with the forces in $\dot B^{3/p-3}_{p,r}(\mathbb{R}^3)$ for $1\leq p <3$ and $r=1$. Moreover, this result can be extended to any $1\leq r\leq \infty$ whenever $p=2,$ without any additional assumption on the physical parameters. On the other hand, we establish some ill-posedness results for Hall-MHD system by using the discontinuity of the solution mapping of the three-dimensional stationary Navier-Stokes equations in \emph{critical} function spaces $\dot{B}^{3/p-1}_{p,r}(\mathbb{R}^3)$ ($p\geq 3$).

Submitted 4 April, 2024; originally announced April 2024.

Comments: 22 pages

arXiv:2404.00653 [pdf, other]

Dual DETRs for Multi-Label Temporal Action Detection

Authors: Yuhan Zhu, Guozhen Zhang, Jing Tan, Gangshan Wu, Limin Wang

Abstract: Temporal Action Detection (TAD) aims to identify the action boundaries and the corresponding category within untrimmed videos. Inspired by the success of DETR in object detection, several methods have adapted the query-based framework to the TAD task. However, these approaches primarily followed DETR to predict actions at the instance level (i.e., identify each action by its center point), leading to sub-optimal boundary localization. To address this issue, we propose a new Dual-level query-based TAD framework, namely DualDETR, to detect actions from both instance-level and boundary-level. Decoding at different levels requires semantics of different granularity, therefore we introduce a two-branch decoding structure. This structure builds distinctive decoding processes for different levels, facilitating explicit capture of temporal cues and semantics at each level. On top of the two-branch design, we present a joint query initialization strategy to align queries from both levels. Specifically, we leverage encoder proposals to match queries from each level in a one-to-one manner. Then, the matched queries are initialized using position and content prior from the matched action proposal. The aligned dual-level queries can refine the matched proposal with complementary cues during subsequent decoding. We evaluate DualDETR on three challenging multi-label TAD benchmarks. The experimental results demonstrate the superior performance of DualDETR to the existing state-of-the-art methods, achieving a substantial improvement under det-mAP and delivering impressive results under seg-mAP.

Submitted 31 March, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.19021 [pdf, other]

IDGenRec: LLM-RecSys Alignment with Textual ID Learning

Authors: Juntao Tan, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Zelong Li, Yongfeng Zhang

Abstract: Generative recommendation based on Large Language Models (LLMs) has transformed the traditional ranking-based recommendation style into a text-to-text generation paradigm. However, in contrast to standard NLP tasks that inherently operate on human vocabulary, current research in generative recommendations struggles to effectively encode recommendation items within the text-to-text framework using concise yet meaningful ID representations. To better align LLMs with recommendation needs, we propose IDGen, representing each item as a unique, concise, semantically rich, platform-agnostic textual ID using human language tokens. This is achieved by training a textual ID generator alongside the LLM-based recommender, enabling seamless integration of personalized recommendations into natural language generation. Notably, as user history is expressed in natural language and decoupled from the original dataset, our approach suggests the potential for a foundational generative recommendation model. Experiments show that our framework consistently surpasses existing models in sequential recommendation under standard experimental setting. Then, we explore the possibility of training a foundation recommendation model with the proposed method on data collected from 19 different datasets and test its recommendation performance on 6 unseen datasets across different platforms under a completely zero-shot setting. The results show that the zero-shot performance of the pre-trained foundation model is comparable to or even better than some traditional recommendation models based on supervised training, showing the potential of the IDGen paradigm serving as the foundation model for generative recommendation. Code and data are open-sourced at https://github.com/agiresearch/IDGenRec.

Submitted 17 May, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted in SIGIR 2024

arXiv:2403.18927 [pdf, other]

Optimal Coherent Quantum Phase Estimation via Tapering

Authors: Dhrumil Patel, Shi Jie Samuel Tan, Yigit Subasi, Andrew T. Sornborger

Abstract: Quantum phase estimation is one of the fundamental primitives that underpins many quantum algorithms, including quantum amplitude estimation, the HHL algorithm for solving linear systems of equations, and quantum principal component analysis. Due to its significance as a subroutine, in this work, we study the coherent version of the phase estimation problem, where given an arbitrary input state and black-box access to unitaries $U$ and controlled-$U$, the goal is to estimate the phases of $U$ in superposition. Unlike most existing phase estimation algorithms, which employ intermediary measurement steps that inevitably destroy coherence, only a couple of algorithms, including the well-known standard quantum phase estimation algorithm, consider this coherent setting. In this work, we propose an improved version of this standard algorithm that utilizes tapering/window functions. Our algorithm, which we call tapered quantum phase estimation algorithm, achieves the optimal query complexity (total number of calls to $U$ and controlled-$U$) without requiring the use of a computationally expensive quantum sorting network for median computation, which the standard algorithm uses to boost the success probability arbitrarily close to one. We also show that the tapering functions that we use are optimal by formulating optimization problems with different optimization criteria. Beyond the asymptotic regime, we also provide non-asymptotic query complexity of our algorithm, as it is crucial for practical implementation. Finally, we also propose an efficient algorithm that prepares the quantum state corresponding to the optimal tapering function.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 23 pages, 6 figures

Report number: LA-UR-23-30410

arXiv:2403.18197 [pdf, other]

LocoMan: Advancing Versatile Quadrupedal Dexterity with Lightweight Loco-Manipulators

Authors: Changyi Lin, Xingyu Liu, Yuxiang Yang, Yaru Niu, Wenhao Yu, Tingnan Zhang, Jie Tan, Byron Boots, Ding Zhao

Abstract: Quadrupedal robots have emerged as versatile agents capable of locomoting and manipulating in complex environments. Traditional designs typically rely on the robot's inherent body parts or incorporate top-mounted arms for manipulation tasks. However, these configurations may limit the robot's operational dexterity, efficiency and adaptability, particularly in cluttered or constrained spaces. In this work, we present LocoMan, a dexterous quadrupedal robot with a novel morphology to perform versatile manipulation in diverse constrained environments. By equipping a Unitree Go1 robot with two low-cost and lightweight modular 3-DoF loco-manipulators on its front calves, LocoMan leverages the combined mobility and functionality of the legs and grippers for complex manipulation tasks that require precise 6D positioning of the end effector in a wide workspace. To harness the loco-manipulation capabilities of LocoMan, we introduce a unified control framework that extends the whole-body controller (WBC) to integrate the dynamics of loco-manipulators. Through experiments, we validate that the proposed whole-body controller can accurately and stably follow desired 6D trajectories of the end effector and torso, which, when combined with the large workspace from our design, facilitates a diverse set of challenging dexterous loco-manipulation tasks in confined spaces, such as opening doors, plugging into sockets, picking objects in narrow and low-lying spaces, and bimanual manipulation.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Project page: https://linchangyi1.github.io/LocoMan

arXiv:2403.16138 [pdf, other]

Glimmers in the Cosmic Dawn: A Census of the Youngest Supermassive Black Holes by Photometric Variability

Authors: Matthew J. Hayes, Jonathan C. Tan, Richard S. Ellis, Alice R. Young, Vieri Cammelli, Jasbir Singh, Axel Runnholm, Aayush Saxena, Ragnhild Lunnan, Benjamin W. Keller, Pierluigi Monaco, Nicolas Laporte, Jens Melinder

Abstract: We report first results from a deep near infrared campaign with the Hubble Space Telescope to obtain late-epoch images of the Hubble Ultra-Deep Field (HUDF), 10-15 years after the first epoch data were obtained. The main objectives are to search for faint active galactic nuclei (AGN) at high redshifts by virtue of their photometric variability, and measure (or constrain) the comoving number density of supermassive black holes (SMBHs), n_SMBH, at early times. In this Letter we present a brief overview of the program and preliminary results regarding eight objects. Three variables are supernovae, two of which are apparently hostless with indeterminable redshifts, although one has previously been recorded at a z\approx 6 galaxy. Two further objects are clear AGN candidates at z=2.0 and 3.2, based on morphology and/or spectroscopy, in particular infrared spectroscopy from JWST. Three variable targets are identified at z=6-7, which are also likely AGN candidates. These sources provide a first measure of n_SMBH in the reionization epoch by photometric variability, which places a firm lower limit of 3x10^{-4} cMpc^{-3}. After accounting for variability and luminosity incompleteness, we estimate n_SMBH \gtrsim 8x10^{-3} cMpc^{-3}, which is the largest value so far reported at these redshifts. This SMBH abundance is also strikingly similar to estimates of n_SMBH in the local Universe. We discuss how these results test various theories for SMBH formation.

Submitted 24 March, 2024; originally announced March 2024.

Comments: Submitted to ApJ

arXiv:2403.15951 [pdf, other]

MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping

Authors: Jiacheng Chen, Yuefan Wu, Jiaqi Tan, Hang Ma, Yasutaka Furukawa

Abstract: This paper presents a vector HD-mapping algorithm that formulates the mapping as a tracking task and uses a history of memory latents to ensure consistent reconstructions over time. Our method, MapTracker, accumulates a sensor stream into memory buffers of two latent representations: 1) Raster latents in the bird's-eye-view (BEV) space and 2) Vector latents over the road elements (i.e., pedestrian-crossings, lane-dividers, and road-boundaries). The approach borrows the query propagation paradigm from the tracking literature that explicitly associates tracked road elements from the previous frame to the current, while fusing a subset of memory latents selected with distance strides to further enhance temporal consistency. A vector latent is decoded to reconstruct the geometry of a road element. The paper further makes benchmark contributions by 1) Improving processing code for existing datasets to produce consistent ground truth with temporal alignments and 2) Augmenting existing mAP metrics with consistency checks. MapTracker significantly outperforms existing methods on both nuScenes and Argoverse2 datasets by over 8% and 19% on the conventional and the new consistency-aware metrics, respectively. The code will be available on our project page: https://map-tracker.github.io.

Submitted 23 March, 2024; originally announced March 2024.

Comments: Project page: https://map-tracker.github.io

arXiv:2403.15637 [pdf, other]

CoNVOI: Context-aware Navigation using Vision Language Models in Outdoor and Indoor Environments

Authors: Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Mohamed Elnoor, Anuj Zore, Brian Ichter, Fei Xia, Jie Tan, Wenhao Yu, Dinesh Manocha

Abstract: We present ConVOI, a novel method for autonomous robot navigation in real-world indoor and outdoor environments using Vision Language Models (VLMs). We employ VLMs in two ways: first, we leverage their zero-shot image classification capability to identify the context or scenario (e.g., indoor corridor, outdoor terrain, crosswalk, etc.) of the robot's surroundings, and formulate context-based navigation behaviors as simple text prompts (e.g. "stay on the pavement"). Second, we utilize their state-of-the-art semantic understanding and logical reasoning capabilities to compute a suitable trajectory given the identified context. To this end, we propose a novel multi-modal visual marking approach to annotate the obstacle-free regions in the RGB image used as input to the VLM with numbers, by correlating it with a local occupancy map of the environment. The marked numbers ground image locations in the real-world, direct the VLM's attention solely to navigable locations, and elucidate the spatial relationships between them and terrains depicted in the image to the VLM. Next, we query the VLM to select numbers on the marked image that satisfy the context-based behavior text prompt, and construct a reference path using the selected numbers. Finally, we propose a method to extrapolate the reference trajectory when the robot's environmental context has not changed to prevent unnecessary VLM queries. We use the reference trajectory to guide a motion planner, and demonstrate that it leads to human-like behaviors (e.g. not cutting through a group of people, using crosswalks, etc.) in various real-world indoor and outdoor scenarios.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 9 pages, 4 figures

arXiv:2403.14877 [pdf, other]

TEeVTOL: Balancing Energy and Time Efficiency in eVTOL Aircraft Path Planning Across City-Scale Wind Fields

Authors: Songyang Liu, Shuai Li, Haochen Li, Weizi Li, Jindong Tan

Abstract: Electric vertical-takeoff and landing (eVTOL) aircraft, recognized for their maneuverability and flexibility, offer a promising alternative to our transportation system. However, the operational effectiveness of these aircraft faces many challenges, such as the delicate balance between energy and time efficiency, stemming from unpredictable environmental factors, including wind fields. Mathematical modeling-based approaches have been adopted to plan aircraft flight path in urban wind fields with the goal to save energy and time costs. While effective, they are limited in adapting to dynamic and complex environments. To optimize energy and time efficiency in eVTOL's flight through dynamic wind fields, we introduce a novel path planning method leveraging deep reinforcement learning. We assess our method with extensive experiments, comparing it to Dijkstra's algorithm -- the theoretically optimal approach for determining shortest paths in a weighted graph, where weights represent either energy or time cost. The results show that our method achieves a graceful balance between energy and time efficiency, closely resembling the theoretically optimal values for both objectives.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.14531 [pdf, other]

doi 10.1093/jrsssb/qkae031

Green's matching: an efficient approach to parameter estimation in complex dynamic systems

Authors: Jianbin Tan, Guoyu Zhang, Xueqin Wang, Hui Huang, Fang Yao

Abstract: Parameters of differential equations are essential to characterize intrinsic behaviors of dynamic systems. Numerous methods for estimating parameters in dynamic systems are computationally and/or statistically inadequate, especially for complex systems with general-order differential operators, such as motion dynamics. This article presents Green's matching, a computationally tractable and statistically efficient two-step method, which only needs to approximate trajectories in dynamic systems but not their derivatives due to the inverse of differential operators by Green's function. This yields a statistically optimal guarantee for parameter estimation in general-order equations, a feature not shared by existing methods, and provides an efficient framework for broad statistical inferences in complex dynamic systems.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 40 pages, 4 figures

Journal ref: Journal of the Royal Statistical Society: Series B, 2024

Showing 1–50 of 922 results for author: Tan, J