
The discovery of a nearby 421~s transient with CHIME/FRB/Pulsar

Authors: Fengqiu Adam Dong, Tracy Clarke, Alice P. Curtin, Ajay Kumar, Ingrid Stairs, Shami Chatterjee, Amanda M. Cook, Emmanuel Fonseca, B. M. Gaensler, Jason W. T. Hessels, Victoria M. Kaspi, Mattias Lazda, Kiyoshi W. Masui, James W. McKee, Bradley W. Meyers, Aaron B. Pearlman, Scott M. Ransom, Paul Scholz, Kaitlyn Shin, Kendrick M. Smith, Chia Min Tan

Abstract: Neutron stars and white dwarfs are both dense remnants of post-main-sequence stars. Pulsars, magnetars and strongly magnetised white dwarfs have all been observed to exhibit coherent, pulsed radio emission tied to their rotational periods. Recently, a new type of radio long-period transient (LPT) has been discovered. The bright radio emission of LPTs resembles that of radio pulsars and magnetars. However, they pulse on timescales (minutes) much longer than previously seen. While minute timescales are common rotation periods for white dwarfs, LPTs are much brighter than the known pulsating white dwarfs, and dipolar radiation from isolated (as opposed to binary) magnetic white dwarfs has yet to be observed. Here, we report the discovery of a new $\sim$421~s LPT, CHIME J0630+25, using the CHIME/FRB and CHIME/Pulsar instruments. We used standard pulsar timing techniques and obtained a phase-coherent timing solution which yielded limits on the inferred magnetic field and characteristic age. CHIME J0630+25 is remarkably nearby ($170 \pm 80$~pc), making it the closest LPT discovered to date.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Submitted
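The timing solution is said to yield limits on the inferred magnetic field and characteristic age. A minimal sketch of that conversion follows, assuming the standard spin-down relations used for radio pulsars: surface dipole field $B \approx 3.2 \times 10^{19} (P\dot{P})^{1/2}$ G (which bakes in a canonical neutron-star radius and moment of inertia) and characteristic age $\tau_c = P/(2\dot{P})$. The $\dot{P}$ value below is a hypothetical placeholder, not the limit measured for CHIME J0630+25. An upper limit on $\dot{P}$ maps to an upper limit on $B$ and a lower limit on $\tau_c$.

```python
import math

SECONDS_PER_YEAR = 365.25 * 86400.0


def dipole_field_gauss(period_s: float, pdot: float) -> float:
    """Surface dipole field B ~ 3.2e19 * sqrt(P * Pdot) gauss (canonical neutron-star parameters)."""
    return 3.2e19 * math.sqrt(period_s * pdot)


def characteristic_age_yr(period_s: float, pdot: float) -> float:
    """Characteristic age tau_c = P / (2 * Pdot), converted from seconds to years."""
    return period_s / (2.0 * pdot) / SECONDS_PER_YEAR


if __name__ == "__main__":
    period = 421.0      # ~421 s period quoted in the abstract
    pdot_limit = 1e-12  # hypothetical upper limit on Pdot (s/s), for illustration only
    print(f"B   <= {dipole_field_gauss(period, pdot_limit):.2e} G")
    print(f"tau >= {characteristic_age_yr(period, pdot_limit):.2e} yr")
```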

arXiv:2407.06406 [pdf, other]

Unveiling mussel plaque core ductility: the role of pore distribution and hierarchical structure

Authors: Yulan Lyu, Mengting Tan, Yong Pang, Wei Sun, Shuguang Li, Tao Liu

Abstract: The mussel thread-plaque system exhibits strong adhesion and high ductility, allowing it to adhere to various surfaces. While the microstructure of plaques has been thoroughly studied, the effect of their unique porous structure on ductility remains unclear. This study first investigated the porous structure of mussel plaque cores using scanning electron microscopy (SEM). Two-dimensional (2D) porous representative volume elements (RVEs) with scaled distribution parameters were generated, and the calibrated phase-field modelling method was applied to analyse the effect of the pore distribution and multi-scale porous structure on the failure mechanism of porous RVEs. The SEM analysis revealed that large-scale pores exhibited a lognormal size distribution and a uniform spatial distribution. Simulations showed that increasing the normalised mean radius of the large-scale pore distribution can statistically lead to a decreasing trend in ductility, strength and strain energy, but cannot solely determine their values. The interaction between pores can lead to two different failure modes under the same pore distribution: a progressive failure mode and a sudden failure mode. Additionally, the hierarchical structure of multi-scale porous RVEs can further increase ductility by 40%-60% compared with single-scale porous RVEs by reducing stiffness, highlighting that the hierarchical structure could be another key factor contributing to the high ductility.
These findings deepen our understanding of how the pore distribution and multi-scale porous structure in mussel plaques contribute to their high ductility and affect other mechanical properties, providing valuable insights for the future design of highly ductile biomimetic materials.

Submitted 8 July, 2024; originally announced July 2024.
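The RVE generation step (lognormal pore sizes, uniform spatial distribution) can be sketched as follows. This is a minimal illustration, assuming circular, non-overlapping pores placed by random sequential addition in a unit square; the function names and parameter values are hypothetical, and the authors' generator and phase-field model are not reproduced here.

```python
import math
import random


def generate_porous_rve(n_pores, mu, sigma, size=1.0, max_tries=100_000, seed=0):
    """Place non-overlapping circular pores in a square [0, size]^2 cell.

    Radii are lognormal (mu, sigma are the mean and std of log-radius);
    centres are uniform over the cell (random sequential addition).
    """
    rng = random.Random(seed)
    pores = []  # list of (x, y, r)
    for _ in range(max_tries):
        if len(pores) == n_pores:
            break
        r = rng.lognormvariate(mu, sigma)
        if 2.0 * r >= size:
            continue  # pore would not fit inside the cell
        x = rng.uniform(r, size - r)
        y = rng.uniform(r, size - r)
        # Reject candidates that overlap any pore already placed
        if all(math.hypot(x - px, y - py) > r + pr for px, py, pr in pores):
            pores.append((x, y, r))
    return pores


def porosity(pores, size=1.0):
    """Area fraction occupied by pores (a plain sum is valid because pores do not overlap)."""
    return sum(math.pi * r * r for _, _, r in pores) / size**2


pores = generate_porous_rve(30, mu=math.log(0.03), sigma=0.3)
print(len(pores), f"porosity = {porosity(pores):.3f}")
```

Rescaling `mu` plays the role of the normalised mean radius that the abstract varies; a real study would then mesh the cell for the fracture simulation.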

arXiv:2406.16964 [pdf, other]

Are Language Models Actually Useful for Time Series Forecasting?

Authors: Mingtian Tan, Mike A. Merrill, Vinayak Gupta, Tim Althoff, Thomas Hartvigsen

Abstract: Large language models (LLMs) are being applied to time series tasks, particularly time series forecasting. However, are language models actually useful for time series? After a series of ablation studies on three recent and popular LLM-based time series forecasting methods, we find that removing the LLM component or replacing it with a basic attention layer does not degrade the forecasting results -- in most cases the results even improved. We also find that despite their significant computational cost, pretrained LLMs do no better than models trained from scratch, do not represent the sequential dependencies in time series, and do not assist in few-shot settings. Additionally, we explore time series encoders and reveal that patching and attention structures perform similarly to state-of-the-art LLM-based forecasters.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 25 pages, 8 figures and 20 tables
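The "patching and attention" encoder mentioned above can be illustrated with a small NumPy sketch: split the series into overlapping patches, embed each patch linearly, and apply one self-attention layer over the patch tokens. This is a hypothetical, untrained illustration assuming a PatchTST-style design; the patch length, stride, embedding size, and random weights are placeholders, and it only shows the data flow, not the models evaluated in the paper.

```python
import numpy as np


def make_patches(series: np.ndarray, patch_len: int, stride: int) -> np.ndarray:
    """Split a 1D series into (possibly overlapping) patches of length patch_len."""
    if len(series) < patch_len:
        raise ValueError("series is shorter than one patch")
    n = (len(series) - patch_len) // stride + 1
    idx = np.arange(patch_len)[None, :] + stride * np.arange(n)[:, None]
    return series[idx]


def self_attention(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Single-head scaled dot-product self-attention with random (untrained) projections."""
    d = x.shape[1]
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
series = np.sin(np.linspace(0.0, 8.0 * np.pi, 100))
patches = make_patches(series, patch_len=16, stride=8)  # shape (11, 16)
w_embed = rng.standard_normal((16, 32)) / 4.0           # hypothetical linear patch embedding
tokens = patches @ w_embed                              # shape (11, 32)
out = self_attention(tokens, rng)                       # shape (11, 32)
print(patches.shape, out.shape)
```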

arXiv:2406.10447 [pdf, other]

The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences

Authors: Bria Long, Violet Xiang, Stefan Stojanov, Robert Z. Sparks, Zi Yin, Grace E. Keene, Alvin W. M. Tan, Steven Y. Feng, Chengxu Zhuang, Virginia A. Marchman, Daniel L. K. Yamins, Michael C. Frank

Abstract: Human children far exceed modern machine learning algorithms in their sample efficiency, achieving high performance in key domains with much less data than current models. This "data gap" is a key challenge both for building intelligent artificial systems and for understanding human development. Egocentric video capturing children's experience -- their "training data" -- is a key ingredient for comparison of humans and models and for the development of algorithmic innovations to bridge this gap. Yet there are few such datasets available, and extant data are low-resolution, have limited metadata, and importantly, represent only a small set of children's experiences. Here, we provide the first release of the largest developmental egocentric video dataset to date -- the BabyView dataset -- recorded using a high-resolution camera with a large vertical field-of-view and gyroscope/accelerometer data. This 493-hour dataset includes egocentric videos from children aged 6 months to 5 years in both longitudinal, at-home contexts and in a preschool environment. We provide gold-standard annotations for the evaluation of speech transcription, speaker diarization, and human pose estimation, and evaluate models in each of these domains. We train self-supervised language and vision models and evaluate their transfer to out-of-distribution tasks including syntactic structure learning, object recognition, depth estimation, and image segmentation.
Although performance in each domain scales with dataset size, overall performance is lower than when models are trained on curated datasets, especially in the visual domain. Our dataset stands as an open challenge for robust, humanlike AI systems: how can such systems achieve human-level success on the same scale and distribution of training data as humans?

Submitted 14 June, 2024; originally announced June 2024.

Comments: 9 pages, 2 figures, 4 tables and SI. Submitted to NeurIPS Datasets and Benchmarks

arXiv:2406.10215 [pdf, other]

DevBench: A multimodal developmental benchmark for language learning

Authors: Alvin Wei Ming Tan, Sunny Yu, Bria Long, Wanjing Anya Ma, Tonya Murray, Rebecca D. Silverman, Jason D. Yeatman, Michael C. Frank

Abstract: How (dis)similar are the learning trajectories of vision-language models and children? Recent modeling work has attempted to understand the gap between models' and humans' data efficiency by constructing models trained on less data, especially multimodal naturalistic data. However, such models are often evaluated on adult-level benchmarks, with limited breadth in language abilities tested, and without direct comparison to behavioral data. We introduce DevBench, a multimodal benchmark comprising seven language evaluation tasks spanning the domains of lexical, syntactic, and semantic ability, with behavioral data from both children and adults. We evaluate a set of vision-language models on these tasks, comparing models and humans not only on accuracy but also on their response patterns. Across tasks, models exhibit variation in their closeness to human response patterns, and models that perform better on a task also more closely resemble human behavioral responses. We also examine the developmental trajectory of OpenCLIP over training, finding that more training results in closer approximations to adult response patterns. DevBench thus provides a benchmark for comparing models to human language development. These comparisons highlight ways in which model and human language learning processes diverge, providing insight into entry points for improving language models.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.07054 [pdf, other]

CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation

Authors: Renhao Li, Minghuan Tan, Derek F. Wong, Min Yang

Abstract: In recent years, instruction fine-tuning (IFT) on large language models (LLMs) has garnered considerable attention as a way to enhance model performance on unseen tasks. Attempts have been made at the automatic construction and effective selection of IFT data. However, we posit that previous methods have not fully harnessed the potential of LLMs for enhancing data quality. The responses within IFT data could be further enhanced by leveraging the capabilities of LLMs themselves. In this paper, we propose CoEvol, an LLM-based multi-agent cooperation framework for the improvement of responses to instructions. To effectively refine the responses, we develop an iterative framework following a debate-advise-edit-judge paradigm. A two-stage multi-agent debate strategy is further devised to ensure the diversity and reliability of editing suggestions within the framework. Empirically, models equipped with CoEvol outperform competitive baselines on MT-Bench and AlpacaEval, demonstrating its effectiveness in enhancing instruction-following capabilities for LLMs.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06063 [pdf, other]

Enabling Large-Scale and High-Precision Fluid Simulations on Near-Term Quantum Computers

Authors: Zhao-Yun Chen, Teng-Yang Ma, Chuang-Chao Ye, Liang Xu, Ming-Yang Tan, Xi-Ning Zhuang, Xiao-Fan Xu, Yun-Jie Wang, Tai-Ping Sun, Yong Chen, Lei Du, Liang-Liang Guo, Hai-Feng Zhang, Hao-Ran Tao, Tian-Le Wang, Xiao-Yan Yang, Ze-An Zhao, Peng Wang, Sheng Zhang, Chi Zhang, Ren-Ze Zhao, Zhi-Long Jia, Wei-Cheng Kong, Meng-Han Dou, Jun-Chao Wang, et al. (7 additional authors not shown)

Abstract: Quantum computational fluid dynamics (QCFD) offers a promising alternative to classical computational fluid dynamics (CFD) by leveraging quantum algorithms for higher efficiency. This paper introduces a comprehensive QCFD method, including an iterative method, "Iterative-QLS", that suppresses error in the quantum linear solver, and a subspace method to scale the solution to larger problem sizes. We implement our method on a superconducting quantum computer, demonstrating successful simulations of steady Poiseuille flow and unsteady acoustic wave propagation. The Poiseuille flow simulation achieved a relative error of less than $0.2\%$, and the unsteady acoustic wave simulation solved a linear system with a 5043-dimensional matrix. We emphasize the utilization of the quantum-classical hybrid approach in applications of near-term quantum computers. By adapting to quantum hardware constraints and offering scalable solutions for large-scale CFD problems, our method paves the way for practical applications of near-term quantum computers in computational science.

Submitted 19 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: 31 pages, 10 figures
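The idea of suppressing the error of an imprecise linear solver by iterating on the residual can be illustrated classically. The sketch below is a hypothetical stand-in, not the paper's Iterative-QLS: it assumes the inner solver returns the exact solution corrupted by about 5% random relative noise (mimicking a noisy quantum linear solver) and applies classical iterative refinement, $x \leftarrow x + \mathrm{solve}(A, b - Ax)$, to a finite-difference model of steady plane Poiseuille flow, $\mu u''(y) = \partial p/\partial x$ with no-slip walls at $y = 0$ and $y = 1$.

```python
import numpy as np


def poiseuille_system(n, dpdx=-1.0, mu=1.0):
    """Second-order finite differences for mu * u'' = dpdx on (0, 1), u(0) = u(1) = 0."""
    h = 1.0 / (n + 1)
    y = np.linspace(h, 1.0 - h, n)  # interior grid points
    # Matrix of the operator -u'' (symmetric positive definite, tridiagonal)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    b = -dpdx / mu * np.ones(n)
    return A, b, y


def noisy_solve(A, r, rng, eps=0.05):
    """Hypothetical imprecise solver: exact solve plus ~eps relative random noise."""
    x = np.linalg.solve(A, r)
    return x * (1.0 + eps * rng.standard_normal(x.shape))


def iterative_refinement(A, b, rng, iters=30, eps=0.05):
    """Repeatedly correct the estimate by solving for the current residual."""
    x = np.zeros_like(b)
    for _ in range(iters):
        r = b - A @ x                          # residual of the current estimate
        x = x + noisy_solve(A, r, rng, eps)    # correction from the imprecise solver
    return x


def relative_error(u, u_ref):
    return np.linalg.norm(u - u_ref) / np.linalg.norm(u_ref)


rng = np.random.default_rng(0)
A, b, y = poiseuille_system(50)
u_exact = 0.5 * y * (1.0 - y)  # analytic profile for dpdx = -1, mu = 1
one_shot = noisy_solve(A, b, rng)
refined = iterative_refinement(A, b, rng)
print(f"one-shot error: {relative_error(one_shot, u_exact):.1e}")
print(f"refined error:  {relative_error(refined, u_exact):.1e}")
```

Each pass multiplies the remaining error by roughly the solver's relative noise level, so the error shrinks geometrically until floating-point round-off dominates. Second-order finite differences are exact for the quadratic Poiseuille profile, so the analytic solution is a fair reference here.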

arXiv:2406.02864 [pdf, other]

NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models

Authors: Ancheng Xu, Minghuan Tan, Lei Wang, Min Yang, Ruifeng Xu

Abstract: Numeral systems and units of measurement are two intertwined topics in human activities and interact with the languages that express them. Currently, the evaluation of Large Language Models (LLMs) often involves mathematical reasoning, yet little attention is given to how minor changes in numbers or units can drastically alter the complexity of problems and the performance of LLMs. In this paper, we scrutinize existing LLMs on their processing of numerals and units of measurement by constructing datasets with perturbations. We first decompose the reasoning of math word problems into sub-procedures such as numeral conversion from language to numbers and measurement conversion based on units. Then we further annotate math word problems from ancient Chinese arithmetic works, which are challenging in their numerals and units of measurement. Experiments on perturbed datasets demonstrate that LLMs still encounter difficulties in handling numeral and measurement conversions.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Findings of ACL 2024

arXiv:2406.02425 [pdf, other]

CoNav: A Benchmark for Human-Centered Collaborative Navigation

Authors: Changhao Li, Xinyu Sun, Peihao Chen, Jugang Fan, Zixu Wang, Yanxia Liu, Jinhui Zhu, Chuang Gan, Mingkui Tan

Abstract: Human-robot collaboration, in which the robot intelligently assists the human with the upcoming task, is an appealing objective. To achieve this goal, the agent needs to be equipped with a fundamental collaborative navigation ability, where the agent should reason about human intention by observing human activities and then navigate to the human's intended destination ahead of the human. However, this vital ability has not been well studied in previous literature. To fill this gap, we propose a collaborative navigation (CoNav) benchmark. Our CoNav tackles the critical challenge of constructing a 3D navigation environment with realistic and diverse human activities. To achieve this, we design a novel LLM-based humanoid animation generation framework, which is conditioned on both text descriptions and environmental context. The generated humanoid trajectory obeys the environmental context and can be easily integrated into popular simulators. We empirically find that existing navigation methods struggle in the CoNav task since they neglect the perception of human intention. To solve this problem, we propose an intention-aware agent for reasoning about both long-term and short-term human intention. The agent predicts navigation actions based on the predicted intention and panoramic observation. The emergent agent behavior, including observing humans, avoiding collisions with humans, and navigating, reveals the efficiency of the proposed datasets and agents.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01054 [pdf, other]

Confidence-Based Task Prediction in Continual Disease Classification Using Probability Distribution

Authors: Tanvi Verma, Lukas Schwemer, Mingrui Tan, Fei Gao, Yong Liu, Huazhu Fu

Abstract: Deep learning models are widely recognized for their effectiveness in identifying medical image findings in disease classification. However, their limitations become apparent in the dynamic and ever-changing clinical environment, characterized by the continuous influx of newly annotated medical data from diverse sources. In this context, continual learning becomes paramount, not only to adapt to evolving medical scenarios but also to ensure the privacy of healthcare data. In our research, we emphasize the use of a network comprising expert classifiers, where a new expert classifier is added each time a new task is introduced. We present CTP, a task-id predictor that utilizes confidence scores, leveraging the probability distribution (logits) of the classifier to accurately determine the task-id at inference time. Logits are adjusted to ensure that classifiers yield a high-entropy distribution for data associated with tasks other than their own. By defining a noise region in the distribution and computing confidence scores, CTP achieves superior performance compared with other relevant continual learning methods. Additionally, the performance of CTP can be further improved by providing it with a continuum of data at inference time.

Submitted 3 June, 2024; originally announced June 2024.
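The core intuition, that each expert is trained to be uncertain (high-entropy) on data from other tasks, suggests routing a test sample to the expert whose output is most confident. The sketch below is a simplified, hypothetical stand-in that picks the expert with the lowest softmax entropy; CTP's noise-region confidence score is not reproduced, and the logits are made-up values.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def predict_task(expert_logits: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return (task_id, class_id), choosing the expert with the lowest predictive entropy."""
    entropies = [entropy(softmax(logits)) for logits in expert_logits]
    task = min(range(len(entropies)), key=entropies.__getitem__)
    probs = softmax(expert_logits[task])
    cls = max(range(len(probs)), key=probs.__getitem__)
    return task, cls


# Hypothetical logits from three task experts for one test image
expert_logits = [
    [0.2, 0.1, 0.3],  # expert 0: nearly uniform, i.e. "not my task"
    [4.0, 0.5, 0.1],  # expert 1: confident
    [0.4, 0.6, 0.5],  # expert 2: nearly uniform
]
task, cls = predict_task(expert_logits)
print(task, cls)
```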

arXiv:2405.16725 [pdf, other]

doi 10.3847/1538-4357/ad509e

Discovery and follow-up of a quasiperiodically nulling and sub-pulse drifting pulsar with the Murchison Widefield Array

Authors: G. Grover, N. D. R. Bhat, S. McSweeney, C. P. Lee, B. W. Meyers, C. M. Tan, S. S. Kudale

Abstract: The phenomenon of pulsar nulling, where pulsars temporarily and stochastically cease their radio emission, is thought to be indicative of a `dying' pulsar whose radio emission will eventually cease entirely. Here we report the discovery of a long-period pulsar, PSR J0452-3418, from the ongoing Southern-sky MWA Rapid Two-meter (SMART) pulsar survey. The pulsar has a rotation period of ${\sim}$1.67\,s and a dispersion measure of 19.8\,pc\,cm$^{-3}$, and it exhibits both quasi-periodic nulling and sub-pulse drifting. Periodic nulling is uncommon, only reported in $<1$\% of the pulsar population, with an even smaller fraction showing both periodic nulling and sub-pulse drifting. We describe the discovery and follow-up of the pulsar, including a positional determination using high-resolution imaging with the upgraded Giant Metrewave Radio Telescope (uGMRT), initial timing analysis using the combination of MWA and uGMRT data, and detailed characterisation of the nulling and drifting properties in the MWA's frequency band (140-170\,MHz). Our analysis suggests a nulling fraction of $34 \pm 6$\% and a nulling periodicity of $42^{+1.5}_{-1.3}$ pulses. We measure the phase ($P_2$) and time modulation ($P_3$) caused by the sub-pulse drifting, with an average $P_2$ of $7.1^{+26.3}_{-3.1}$ degrees and a $P_3$ of $4.8^{+1.5}_{-0.9}$ pulses.
We compare and contrast the observed properties with those of other pulsars that exhibit sub-pulse drifting and quasi-periodic nulling phenomena, and find that most of these objects lie in the `death valley' of the period-period derivative ($P$-$\dot{P}$) diagram. We also discuss some broader implications for pulsar emission physics and the detectability of similar objects using next-generation pulsar surveys.

Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

Comments: 16 pages, 9 Figures, 4 Tables, Accepted for ApJ
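The two nulling statistics quoted above, nulling fraction and nulling periodicity, can be illustrated on a toy on/off pulse sequence. This is a minimal sketch assuming a noiseless, already-classified sequence (real analyses classify nulls from pulse-energy distributions and work with noisy fluctuation spectra); the synthetic values, a 1/3 nulling fraction and a 42-pulse cycle, are chosen to resemble the reported numbers and are not the paper's data or method.

```python
import numpy as np


def nulling_fraction(is_on: np.ndarray) -> float:
    """Fraction of pulses classified as nulls."""
    return 1.0 - is_on.mean()


def nulling_periodicity(is_on: np.ndarray) -> float:
    """Period (in pulses) of the strongest non-zero Fourier component of the on/off sequence."""
    x = is_on.astype(float) - is_on.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    k = int(np.argmax(power[1:])) + 1  # skip the DC bin
    return len(x) / k


# Hypothetical noiseless sequence: the first 14 pulses of every 42-pulse cycle are nulls
n_pulses, cycle, null_len = 840, 42, 14
is_on = (np.arange(n_pulses) % cycle) >= null_len
print(f"nulling fraction: {nulling_fraction(is_on):.3f}")
print(f"nulling periodicity: {nulling_periodicity(is_on):.1f} pulses")
```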

arXiv:2405.16433 [pdf, other]

CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling

Authors: Chenhao Zhang, Renhao Li, Minghuan Tan, Min Yang, Jingwei Zhu, Di Yang, Jiahao Zhao, Guancheng Ye, Chengming Li, Xiping Hu

Abstract: Using large language models (LLMs) to assist psychological counseling is a significant but challenging task at present. Attempts have been made at improving empathetic conversations or at using LLMs as effective assistants in treatment. However, the existing datasets lack consulting knowledge, resulting in LLMs lacking professional consulting competence. Moreover, how to automatically evaluate multi-turn dialogues within the counseling process remains an understudied area. To bridge the gap, we propose CPsyCoun, a report-based multi-turn dialogue reconstruction and evaluation framework for Chinese psychological counseling. To fully exploit psychological counseling reports, a two-phase approach is devised to construct high-quality dialogues, while a comprehensive evaluation benchmark is developed for the effective automatic evaluation of multi-turn psychological consultations. Competitive experimental results demonstrate the effectiveness of our proposed framework in psychological counseling. We open-source the datasets and model for future research at https://github.com/CAS-SIAT-XinHai/CPsyCoun

Submitted 10 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

Comments: Accepted to Findings of ACL 2024

arXiv:2405.13860 [pdf, other]

MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling

Authors: Diwei Huang, Kunyang Lin, Peihao Chen, Qing Du, Mingkui Tan

Abstract: Few-shot audio-visual acoustics modeling seeks to synthesize the room impulse response (RIR) at arbitrary locations from few-shot observations. To sufficiently exploit the provided few-shot data for accurate acoustic modeling, we present a *map-guided* framework by constructing acoustic-related visual semantic feature maps of the scenes. Visual features preserve semantic details related to sound, and maps provide explicit structural regularities of sound propagation, both of which are valuable for modeling environment acoustics. We thus extract pixel-wise semantic features derived from observations and project them into a top-down map, namely the **observation semantic map**. This map contains the relative positional information among points and the semantic feature information associated with each point. Yet, the limited information extracted from few-shot observations on the map is not sufficient for understanding and modeling the whole scene. We address the challenge by generating a **scene semantic map** via diffusing features and anticipating the observation semantic map. The scene semantic map then interacts with echo encoding through a transformer-based encoder-decoder to predict the RIR for arbitrary speaker-listener query pairs. Extensive experiments on the Matterport3D and Replica datasets verify the efficacy of our framework.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 17 pages, 12 pages for main paper, 5 pages for supplementary
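Projecting pixel-wise features into a top-down map, as described for the observation semantic map, can be sketched with a scatter-mean over grid cells. This minimal sketch assumes the pixels have already been back-projected to 3D world points using depth and camera pose, that height is simply discarded, and that features landing in the same cell are averaged; the grid size, cell size, and example values are hypothetical and the paper's exact projection may differ.

```python
import numpy as np


def project_to_topdown(points: np.ndarray, feats: np.ndarray, grid_size: int, cell: float):
    """Average per-point features into a top-down (x, y) grid, ignoring height z.

    points: (N, 3) world coordinates; feats: (N, C) features.
    Returns a (grid_size, grid_size, C) feature map and (grid_size, grid_size) counts.
    """
    ix = np.floor(points[:, 0] / cell).astype(int)
    iy = np.floor(points[:, 1] / cell).astype(int)
    valid = (ix >= 0) & (ix < grid_size) & (iy >= 0) & (iy < grid_size)
    ix, iy, f = ix[valid], iy[valid], feats[valid]
    sums = np.zeros((grid_size, grid_size, feats.shape[1]))
    counts = np.zeros((grid_size, grid_size))
    np.add.at(sums, (ix, iy), f)   # unbuffered accumulation handles repeated cells
    np.add.at(counts, (ix, iy), 1.0)
    fmap = np.divide(sums, counts[..., None], out=np.zeros_like(sums),
                     where=counts[..., None] > 0)
    return fmap, counts


points = np.array([[0.1, 0.1, 0.5], [0.2, 0.3, 1.0], [1.5, 0.5, 0.2]])
feats = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]])
fmap, counts = project_to_topdown(points, feats, grid_size=2, cell=1.0)
print(fmap[0, 0], fmap[1, 0], counts)
```

Empty cells stay at zero, which is where a step like the paper's feature diffusion would have to fill in the unobserved parts of the scene.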

arXiv:2405.10212 [pdf, other]

CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations

Authors: Jiahao Zhao, Jingwei Zhu, Minghuan Tan, Min Yang, Di Yang, Chenhao Zhang, Guancheng Ye, Chengming Li, Xiping Hu

Abstract: In this paper, we introduce a novel psychological benchmark, CPsyExam, constructed from questions sourced from Chinese language examinations. CPsyExam is designed to prioritize psychological knowledge and case analysis separately, recognizing the significance of applying psychological knowledge to real-world scenarios. From the pool of 22k questions, we utilize 4k to create the benchmark, which offers balanced coverage of subjects and incorporates a diverse range of case analysis techniques. Furthermore, we evaluate a range of existing large language models (LLMs), spanning from open-sourced to API-based models. Our experiments and analysis demonstrate that CPsyExam serves as an effective benchmark for enhancing the understanding of psychology within LLMs and enables the comparison of LLMs across various granularities.

Submitted 18 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.02811 [pdf, other]

PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection

Authors: Zhaoqi Leng, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan

Abstract: 3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars. In this paper, we identify that the common PointNet design introduces an information bottleneck that limits 3D object detection accuracy and scalability. To address this limitation, we propose PVTransformer: a transformer-based point-to-voxel architecture for 3D detection. Our key idea is to replace the PointNet pooling operation with an attention module, leading to a better point-to-voxel aggregation function. Our design respects the permutation invariance of sparse 3D points while being more expressive than the pooling-based PointNet. Experimental results show our PVTransformer achieves much better performance compared to the latest 3D object detectors. On the widely used Waymo Open Dataset, our PVTransformer achieves state-of-the-art 76.5 mAPH L2, outperforming the prior art of SWFormer by +1.7 mAPH L2.

Submitted 5 May, 2024; originally announced May 2024.
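The contrast between PointNet-style pooling and attention-based point-to-voxel aggregation can be sketched in a few lines of NumPy. This is a hypothetical, minimal reading of the key idea: a single query vector (learned in practice, random here) scores each point in a voxel, and the voxel feature is the softmax-weighted sum of point features. It omits the key/value projections, multiple heads, and other details of PVTransformer's actual module. Both aggregators are permutation invariant, which the example checks.

```python
import numpy as np


def max_pool(point_feats: np.ndarray) -> np.ndarray:
    """PointNet-style aggregation: channel-wise max over the points in one voxel."""
    return point_feats.max(axis=0)


def attention_pool(point_feats: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Single-query attention aggregation over the points in one voxel.

    point_feats: (num_points, d); query: (d,). Returns a (d,) voxel feature.
    """
    d = point_feats.shape[1]
    scores = point_feats @ query / np.sqrt(d)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ point_feats


rng = np.random.default_rng(0)
feats = rng.standard_normal((7, 16))  # 7 points in a voxel, 16-dim features (hypothetical)
query = rng.standard_normal(16)       # stands in for a learned query vector
perm = rng.permutation(7)
voxel_attn = attention_pool(feats, query)
voxel_max = max_pool(feats)
print(voxel_attn.shape, np.allclose(voxel_attn, attention_pool(feats[perm], query)))
```

Unlike the max, which passes only one point's value per channel, the attention weights let every point contribute in proportion to its relevance, which is the kind of expressiveness gain the abstract attributes to replacing pooling.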

arXiv:2405.00236 [pdf, other]

STT: Stateful Tracking with Transformers for Autonomous Driving

Authors: Longlong Jing, Ruichi Yu, Xu Chen, Zhengli Zhao, Shiwei Sheng, Colin Graber, Qi Chen, Qinru Li, Shangxuan Wu, Han Deng, Sangjin Lee, Chris Sweeney, Qiurui He, Wei-Chih Hung, Tong He, Xingyi Zhou, Farshid Moussavi, Zijian Guo, Yin Zhou, Mingxing Tan, Weilong Yang, Congcong Li

Abstract: Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their present states, such as velocity and acceleration. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through a long-term history of detections and is jointly optimized for both data association and state estimation tasks. Since the standard tracking metrics like MOTA and MOTP do not capture the combined performance of the two tasks in the wider spectrum of object states, we extend them with new metrics called S-MOTA and MOTPS that address this limitation. STT achieves competitive real-time performance on the Waymo Open Dataset.

Submitted 30 April, 2024; originally announced May 2024.

Comments: ICRA 2024
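S-MOTA and MOTPS are defined in the STT paper and are not reproduced here. For reference, the sketch below computes the standard CLEAR-MOT accuracy that they extend, MOTA = 1 - (FN + FP + IDSW) / GT, accumulated over frames. It assumes the per-frame matching between tracks and ground-truth objects has already been done, so only the error counts are needed; `FrameCounts` is a hypothetical container.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameCounts:
    false_negatives: int   # ground-truth objects with no matched track
    false_positives: int   # tracks with no matched ground-truth object
    id_switches: int       # matched objects whose track ID changed
    num_ground_truth: int  # ground-truth objects in the frame


def mota(frames: Iterable[FrameCounts]) -> float:
    """Multi-object tracking accuracy: 1 - (FN + FP + IDSW) / GT over all frames."""
    fn = fp = idsw = gt = 0
    for f in frames:
        fn += f.false_negatives
        fp += f.false_positives
        idsw += f.id_switches
        gt += f.num_ground_truth
    if gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - (fn + fp + idsw) / gt


frames = [FrameCounts(1, 0, 0, 10), FrameCounts(0, 2, 1, 10)]
print(mota(frames))  # 1 - 4/20
```

MOTA only counts association and detection errors; it says nothing about how accurately velocity or acceleration are estimated, which is the gap the abstract's state-aware metrics are meant to fill.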

arXiv:2404.19199 [pdf, other]

Near-ultrastrong nonlinear light-matter coupling in superconducting circuits

Authors: Yufeng Ye, Jeremy B. Kline, Alec Yen, Gregory Cunningham, Max Tan, Alicia Zang, Michael Gingras, Bethany M. Niedzielski, Hannah Stickler, Kyle Serniak, Mollie E. Schwartz, Kevin P. O'Brien

Abstract: The interaction between an atom and an electromagnetic mode of a resonator is both of fundamental interest and ubiquitous in quantum technologies. Most prior work studies a linear light-matter coupling of the form $g \widehat{\sigma}_x (\widehat{a} + \widehat{a}^\dagger)$, where $g$, measured relative to the photonic ($\omega_a$) and atomic ($\omega_b$) mode frequencies, can reach the ultrastrong regime ($g/\omega_{a} > 10^{-1}$). In contrast, a nonlinear light-matter coupling of the form $\frac{\chi}{2} \widehat{\sigma}_z \widehat{a}^\dagger \widehat{a}$ has the advantage of commuting with the atomic $\widehat{\sigma}_z$ and photonic $\widehat{a}^\dagger\widehat{a}$ Hamiltonians, allowing for fundamental operations such as quantum-non-demolition measurement. However, due to the perturbative nature of nonlinear coupling, the state-of-the-art $\chi/\max(\omega_a, \omega_b)$ is limited to $< 10^{-2}$. Here, we use a superconducting circuit architecture featuring a quarton coupler to experimentally demonstrate, for the first time, a near-ultrastrong $\chi/\max(\omega_a, \omega_b) = (4.852 \pm 0.006)\times 10^{-2}$ nonlinear coupling of a superconducting artificial atom and a nearly-linear resonator. We also show signatures of light-light nonlinear coupling ($\chi \widehat{a}^\dagger\widehat{a}\widehat{b}^\dagger\widehat{b}$) and $\chi/2\pi = 580.3 \pm 0.4$ MHz matter-matter nonlinear coupling ($\frac{\chi}{4}\widehat{\sigma}_{z,a}\widehat{\sigma}_{z,b}$), which represents the largest reported $ZZ$ interaction between two coherent qubits. Such advances in the nonlinear coupling strength of light and matter modes enable new physical regimes and could lead to applications such as orders-of-magnitude faster qubit readout and gates.

Submitted 2 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.16745 [pdf, other]

Statistical Inference for Covariate-Adjusted and Interpretable Generalized Factor Model with Application to Testing Fairness

Authors: Jing Ouyang, Chengyu Cui, Kean Ming Tan, Gongjun Xu

Abstract: In the era of data explosion, statisticians have been developing interpretable and computationally efficient statistical methods to measure latent factors (e.g., skills, abilities, and personalities) using large-scale assessment data. In addition to understanding the latent information, the covariate effect on responses controlling for latent factors is also of great scientific interest and has wide applications, such as evaluating the fairness of educational testing, where the covariate effect reflects whether a test question is biased toward certain individual characteristics (e.g., gender and race), taking into account their latent abilities. However, large sample sizes, high covariate dimensions, and long tests pose challenges to developing efficient methods and drawing valid inferences. Moreover, to accommodate the commonly encountered discrete types of responses, nonlinear latent factor models are often assumed, adding further complexity to the problem. To address these challenges, we consider a covariate-adjusted generalized factor model and develop novel and interpretable conditions to address the identifiability issue. Based on the identifiability conditions, we propose a joint maximum likelihood estimation method and establish estimation consistency and asymptotic normality results for the covariate effects under a practical yet challenging asymptotic regime. Furthermore, we derive estimation and inference results for latent factors and the factor loadings. We illustrate the finite sample performance of the proposed method through extensive numerical studies and an application to an educational assessment dataset obtained from the Programme for International Student Assessment (PISA).

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.15348 [pdf]

High-Linearity PAM-4 Silicon Micro-ring Transmitter Architecture with Electronic-Photonic Hybrid DAC

Authors: Zheng Li, Chengyang Lv, Min Tan

Abstract: This paper presents a high-linearity PAM-4 transmitter (TX) architecture consisting of a three-segment micro-ring modulator (MRM) and a matched CMOS driver. This architecture can drive a high-linearity 4-level pulse amplitude modulation (PAM-4) signal, thereby extending the tunable operating wavelength range for achieving linear PAM-4 output. We use the three-segment MRM to increase design flexibility, so that the linearity of the PAM-4 output can be optimized with an additional degree of freedom. Each phase-shift region is directly driven by an independently amplitude-tunable Non-Return-to-Zero (NRZ) signal. The three-segment modulator can achieve an adjustable wavelength range of approximately 0.037 nm within the high-linearity PAM-4 output limit when the driving voltage varies from 1.5 V to 3 V, while simultaneously achieving an adjustable insertion loss (IL) range of approximately 2 dB, roughly four times that of a two-segment MRM with a similar design. The driver circuit with adjustable driving voltage is co-designed to adjust the eye height and improve PAM-4 linearity. The proposed high-linearity PAM-4 silicon micro-ring architecture can be employed in optical transmitters to adjust the PAM-4 eye-opening size and maximize PAM-4 output linearity, offering the potential for high-performance transmitters with low power overhead.

Submitted 14 April, 2024; originally announced April 2024.

Comments: 14 pages, 11 figures
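The PAM-4 transmitter abstract does not say which linearity figure of merit the authors report. A common one for PAM-4 eyes is the ratio of level mismatch (RLM) defined in IEEE 802.3; the sketch below computes it from four measured mean levels, under the assumption that this is the kind of linearity being optimized. Equally spaced levels give RLM = 1, and any compression or expansion of the inner eye lowers it.

```python
from typing import Sequence


def ratio_level_mismatch(levels: Sequence[float]) -> float:
    """RLM of a PAM-4 signal from its four mean levels (any order).

    With V0 < V1 < V2 < V3 and Vmid = (V0 + V3) / 2:
      ES1 = (V1 - Vmid) / (V0 - Vmid)
      ES2 = (V2 - Vmid) / (V3 - Vmid)
      RLM = min(3*ES1, 3*ES2, 2 - 3*ES1, 2 - 3*ES2)
    """
    if len(levels) != 4:
        raise ValueError("PAM-4 needs exactly four levels")
    v0, v1, v2, v3 = sorted(levels)
    if v3 == v0:
        raise ValueError("levels must not all be equal")
    v_mid = (v0 + v3) / 2
    es1 = (v1 - v_mid) / (v0 - v_mid)
    es2 = (v2 - v_mid) / (v3 - v_mid)
    return min(3 * es1, 3 * es2, 2 - 3 * es1, 2 - 3 * es2)


print(ratio_level_mismatch([0.0, 1.0, 2.0, 3.0]))  # ideal spacing, ~1.0
print(ratio_level_mismatch([0.0, 1.2, 1.8, 3.0]))  # compressed inner eye, ~0.6
```

In a segmented-modulator design like the one described, the tunable per-segment drive amplitudes are the knobs one would adjust to push a metric of this kind toward 1.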

arXiv:2404.11757 [pdf, other]

Language Models Still Struggle to Zero-shot Reason about Time Series

Authors: Mike A. Merrill, Mingtian Tan, Vinayak Gupta, Tom Hartvigsen, Tim Althoff

Abstract: Time series are critical for decision-making in fields like finance and healthcare. Their importance has driven a recent influx of works passing time series into language models, leading to non-trivial forecasting on some datasets. But it remains unknown whether non-trivial forecasting implies that language models can reason about time series. To address this gap, we generate a first-of-its-kind evaluation framework for time series reasoning, including formal tasks and a corresponding dataset of multi-scale time series paired with text captions across ten domains. Using these data, we probe whether language models achieve three forms of reasoning: (1) Etiological Reasoning: given an input time series, can the language model identify the scenario that most likely created it? (2) Question Answering: can a language model answer factual questions about time series? (3) Context-Aided Forecasting: does highly relevant textual context improve a language model's time series forecasts? We find that otherwise highly capable language models demonstrate surprisingly limited time series reasoning: they score marginally above random on the etiological and question answering tasks (up to 30 percentage points worse than humans) and show modest success in using context to improve forecasting. These weaknesses show that time series reasoning is an impactful, yet deeply underdeveloped, direction for language model research. We also make our datasets and code public at https://github.com/behavioral-data/TSandLanguage to support further research in this direction.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.08395 [pdf]

Room-Temperature Polariton Lasing from CdSe core-only Nanoplatelets

Authors: Francisco Freire-Fernández, Nathan G. Sinai, Max J. H. Tan, Sang-Min Park, Eric Koessler, Todd D. Krauss, Pengfei Huo, Teri W. Odom

Abstract: This paper reports how CdSe core-only nanoplatelets coupled with plasmonic Al nanoparticle lattices can exhibit exciton-polariton lasing. By improving a procedure to synthesize monodisperse 4-monolayer CdSe nanoplatelets, we could resolve polariton decay dynamics and pathways. Experiment and theory confirmed that the system is in the strong coupling regime based on anti-crossings in the dispersion diagrams and the magnitude of the Rabi splitting values. Notably, polariton lasing is observed only for cavity lattice periodicities that exhibit specific dispersive characteristics that enable polariton accumulation. The threshold of polariton lasing is 25-fold lower than reported photon lasing values from CdSe nanoplatelets in similar cavity designs. This open-cavity platform offers a simple approach to controlling exciton polaritons, which is anticipated to benefit quantum information processing, optoelectronics, and chemical reactions.

Submitted 12 April, 2024; originally announced April 2024.
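The strong-coupling evidence cited in the nanoplatelet abstract (anti-crossings and Rabi splitting) is usually interpreted through a two-coupled-oscillator model. The sketch below diagonalizes the 2×2 Hamiltonian [[E_c, g], [g, E_x]] for a cavity (here, lattice plasmon) mode E_c and an exciton E_x with coupling g: the two polariton branches never cross, and their minimum separation, reached at zero detuning, is the Rabi splitting 2g. The energies are illustrative values, not numbers from the paper, and the authors' own fitting model may differ.

```python
import math
from typing import Tuple


def polariton_branches(e_cavity: float, e_exciton: float, g: float) -> Tuple[float, float]:
    """Lower and upper polariton energies from the 2x2 coupled-oscillator model.

    Eigenvalues of [[E_c, g], [g, E_x]]:
      E_pm = (E_c + E_x) / 2 +/- sqrt(g**2 + ((E_c - E_x) / 2)**2)
    """
    mean = (e_cavity + e_exciton) / 2
    half_split = math.sqrt(g ** 2 + ((e_cavity - e_exciton) / 2) ** 2)
    return mean - half_split, mean + half_split


# Illustrative values in eV (not from the paper): sweep the cavity mode across the exciton.
E_X, G = 2.4, 0.05
for e_c in (2.2, 2.3, 2.4, 2.5, 2.6):
    lower, upper = polariton_branches(e_c, E_X, G)
    print(f"E_c={e_c:.2f}  LP={lower:.4f}  UP={upper:.4f}  split={upper - lower:.4f}")
```

Far from resonance each branch approaches one of the bare energies; near resonance the branches repel, which is the anti-crossing seen in measured dispersion diagrams.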

arXiv:2404.07474 [pdf, other]

G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images

Authors: Zixiong Huang, Qi Chen, Libo Sun, Yifan Yang, Naizhou Wang, Mingkui Tan, Qi Wu

Abstract: Novel view synthesis aims to generate images of new views from a given collection of view images. Recent attempts address this problem by relying on 3D geometry priors (e.g., shapes, sizes, and positions) learned from multi-view images. However, such methods encounter the following limitations: 1) they require a set of multi-view images as training data for a specific scene (e.g., face, car or chair), which is often unavailable in many real-world scenarios; 2) they fail to extract the geometry priors from single-view images due to the lack of multi-view supervision. In this paper, we propose a Geometry-enhanced NeRF (G-NeRF), which seeks to enhance the geometry priors by a geometry-guided multi-view synthesis approach, followed by a depth-aware training. In the synthesis process, inspired by the fact that existing 3D GAN models can unconditionally synthesize high-fidelity multi-view images, we adopt off-the-shelf 3D GAN models, such as EG3D, as a free source of geometry priors by synthesizing multi-view data. In addition, to further improve the geometry quality of the synthetic data, we introduce a truncation method to effectively sample latent codes within 3D GAN models. To tackle the absence of multi-view supervision for single-view images, we design a depth-aware training approach, incorporating a depth-aware discriminator to guide geometry priors through depth maps. Experiments demonstrate the effectiveness of our method in terms of both qualitative and quantitative results.

Submitted 11 April, 2024; originally announced April 2024.

Comments: CVPR 2024 Accepted Paper
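The G-NeRF abstract mentions a truncation method for sampling latent codes from the 3D GAN but does not define it. For orientation, the sketch below shows the standard truncation trick used with StyleGAN-family generators (EG3D builds on StyleGAN2), which pulls a latent code toward the average latent. Whether G-NeRF's method is this trick or a variant of it is an assumption here, and the 4-dimensional vectors are toy values.

```python
from typing import List, Sequence


def truncate_latent(w: Sequence[float], w_avg: Sequence[float], psi: float) -> List[float]:
    """Truncation trick: w' = w_avg + psi * (w - w_avg).

    psi = 1 keeps w unchanged; psi = 0 collapses to the average latent.
    Smaller psi trades sample diversity for typicality and fidelity.
    """
    if len(w) != len(w_avg):
        raise ValueError("latent codes must have the same dimension")
    if not 0.0 <= psi <= 1.0:
        raise ValueError("psi is expected to lie in [0, 1]")
    return [a + psi * (x - a) for x, a in zip(w, w_avg)]


w_avg = [0.0, 0.5, -0.2, 1.0]  # toy average latent (in practice, a running mean of mapped codes)
w = [2.0, -1.5, 0.8, 3.0]      # toy sampled latent
print(truncate_latent(w, w_avg, 0.7))
```

The design intent is that latents near the average tend to decode to better-formed shapes, which is why truncation is a natural lever when synthetic multi-view data must supply reliable geometry.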

arXiv:2404.04876 [pdf, other]

HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models

Authors: Yifan Yang, Dong Liu, Shuhai Zhang, Zeshuai Deng, Zixiong Huang, Mingkui Tan

Abstract: Reconstructing 3D clothed humans involves creating detailed geometry of individuals in clothing, with applications ranging from virtual try-on to movies and games. To enable practical and widespread applications, recent advances propose generating a clothed human from an RGB image. However, they struggle to reconstruct detailed and robust avatars simultaneously. We empirically find that the high-frequency (HF) and low-frequency (LF) information from a parametric model has the potential to enhance geometry details and improve robustness to noise, respectively. Based on this, we propose HiLo, namely clothed human reconstruction with high- and low-frequency information, which contains two components. 1) To recover detailed geometry using HF information, we propose a progressive HF Signed Distance Function to enhance the detailed 3D geometry of a clothed human. Our analysis indicates that this progressive learning alleviates large gradients that hinder model convergence. 2) To achieve robust reconstruction against inaccurate estimation of the parametric model by using LF information, we propose a spatial interaction implicit function. This function effectively exploits the complementary spatial information from a low-resolution voxel grid of the parametric model. Experimental results demonstrate that HiLo outperforms the state-of-the-art methods by 10.43% and 9.54% in terms of Chamfer distance on the Thuman2.0 and CAPE datasets, respectively. Additionally, HiLo demonstrates robustness to noise from the parametric model, challenging poses, and various clothing styles.

Submitted 19 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: CVPR 2024 Accepted Paper
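HiLo's headline result is reported in Chamfer distance. Implementations differ (squared versus unsquared distances, mean versus sum, whether the two directions are averaged), and the abstract does not state which variant is used. The brute-force sketch below uses the mean squared nearest-neighbour distance in each direction, summed over both directions, which is one common convention; real evaluations sample many surface points and use a spatial index instead of the O(N·M) loop.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest_sq_dist(p: Point, cloud: Sequence[Point]) -> float:
    """Squared distance from p to its nearest neighbour in cloud (brute force)."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in cloud)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean NN squared distance A->B plus B->A."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    a_to_b = sum(_nearest_sq_dist(p, b) for p in a) / len(a)
    b_to_a = sum(_nearest_sq_dist(q, a) for q in b) / len(b)
    return a_to_b + b_to_a


reconstruction = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.0)]
print(chamfer_distance(reconstruction, ground_truth))
```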

arXiv:2403.15851 [pdf, other]

Local operator quench induced by two-dimensional inhomogeneous and homogeneous CFT Hamiltonians

Authors: Weibo Mao, Masahiro Nozaki, Kotaro Tamaoka, Mao Tian Tan

Abstract: We explore non-equilibrium processes in two-dimensional conformal field theories (2d CFTs) due to the growth of operators induced by inhomogeneous and homogeneous Hamiltonians by investigating the time dependence of the partition function, energy density, and entanglement entropy. The non-equilibrium processes considered in this paper are constructed out of the Lorentzian and Euclidean time evolution governed by different Hamiltonians. We explore the effect of time ordering on entanglement dynamics and find that in the free boson CFT and RCFTs this time ordering does not affect the entanglement entropy, while in holographic CFTs it does. Our main finding is that in holographic CFTs, the non-unitary time evolution induced by the inhomogeneous Hamiltonian can retain the initial-state information longer than unitary time evolution.

Submitted 2 April, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

Comments: 37 pages+appendices, 6 figures. v2: references added

Report number: RIKEN-iTHEMS-Report-24

arXiv:2403.12582 [pdf, other]

AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework

Authors: Xiang Li, Zhenyu Li, Chen Shi, Yong Xu, Qing Du, Mingkui Tan, Jun Huang, Wei Lin

Abstract: The task of financial analysis primarily encompasses two key areas: stock trend prediction and the corresponding financial question answering. Currently, machine learning and deep learning algorithms (ML&DL) have been widely applied for stock trend predictions, leading to significant progress. However, these methods fail to provide reasons for predictions, lacking interpretability and reasoning processes. Also, they cannot integrate textual information such as financial news or reports. Meanwhile, large language models (LLMs) have remarkable textual understanding and generation abilities. However, due to the scarcity of financial training datasets and limited integration with real-time knowledge, LLMs still suffer from hallucinations and cannot keep up with the latest information. To tackle these challenges, we first release the AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data, which have a positive impact on training LLMs for financial analysis. We then use the AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, which integrates retrieval-augmented generation (RAG) techniques to tackle the financial analysis task effectively. Extensive experiments demonstrate the effectiveness of our framework on financial analysis.

Submitted 19 March, 2024; originally announced March 2024.

Comments: COLING 2024. The first three authors contributed equally. Project website: https://github.com/AlphaFin-proj/AlphaFin

arXiv:2403.11491 [pdf, other]

Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting

Authors: Mingkui Tan, Guohao Chen, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Peilin Zhao, Shuaicheng Niu

Abstract: Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample. Although recent TTA has shown promising performance, we still face two key challenges: 1) prior methods perform backpropagation for each test sample, resulting in prohibitive optimization costs for many applications; 2) while existing TTA methods can significantly improve test performance on out-of-distribution data, they often suffer from severe performance degradation on in-distribution data after TTA (known as forgetting). To this end, we have proposed an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method, which develops an active sample selection criterion to identify reliable and non-redundant samples for test-time entropy minimization. To alleviate forgetting, EATA introduces a Fisher regularizer estimated from test samples to constrain important model parameters from drastic changes. However, in EATA, the adopted entropy loss consistently assigns higher confidence to predictions even for samples that are inherently uncertain, leading to overconfident predictions. To tackle this, we further propose EATA with Calibration (EATA-C) to separately exploit the reducible model uncertainty and the inherent data uncertainty for calibrated TTA. Specifically, we measure model uncertainty by the divergence between predictions from the full network and its sub-networks, on which we propose a divergence loss to encourage consistent predictions instead of overconfident ones. To further recalibrate prediction confidence, we use the disagreement among predicted labels as an indicator of data uncertainty, and then devise a min-max entropy regularizer to selectively increase and decrease prediction confidence for different samples. Experiments on image classification and semantic segmentation verify the effectiveness of our methods.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 20 pages, 14 tables, 11 figures. arXiv admin note: substantial text overlap with arXiv:2204.02610
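Two ingredients of EATA named in the abstract above can be sketched in a few lines: entropy-based selection of reliable samples for test-time entropy minimization, and a Fisher-weighted penalty that keeps important parameters close to their source values. The sketch assumes an entropy threshold E0 = margin × ln(C) for C classes and a per-sample weight exp(E0 - E), following our reading of the original EATA formulation; the `margin` default, the redundancy filter, and EATA-C's calibration terms (sub-network divergence loss, min-max entropy regularizer) are not specified in this abstract and are not shown.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_reliable(batch: Sequence[Sequence[float]], num_classes: int,
                    margin: float = 0.4) -> List[Tuple[int, float]]:
    """Keep samples with entropy below E0 = margin * ln(C); weight them by exp(E0 - E).

    Returns (index, weight) pairs; more confident samples receive larger weights.
    `margin` is an assumed hyperparameter.
    """
    e0 = margin * math.log(num_classes)
    selected = []
    for i, probs in enumerate(batch):
        e = entropy(probs)
        if e < e0:
            selected.append((i, math.exp(e0 - e)))
    return selected


def fisher_penalty(params: Sequence[float], anchor: Sequence[float],
                   fisher: Sequence[float], lam: float) -> float:
    """Anti-forgetting term: lam * sum_i F_i * (theta_i - theta0_i) ** 2."""
    return lam * sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor, fisher))


batch = [[0.25, 0.25, 0.25, 0.25], [0.97, 0.01, 0.01, 0.01]]
print(select_reliable(batch, num_classes=4))                     # only sample 1 is kept
print(fisher_penalty([1.0, 2.0], [0.0, 2.0], [0.5, 3.0], lam=2.0))  # 1.0
```

Only selected samples contribute to the weighted entropy loss, so near-uniform predictions, whose gradients are noisy, never drive the update; the Fisher term then penalizes moving parameters that mattered most for the source task.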

arXiv:2403.04519 [pdf]

Qualitative analysis of a class of SIRS infectious disease models with nonlinear infection rate

Authors: Mengqi Tan

Abstract: The existence and local stability of some non-negative equilibrium points of a class of SIRS infectious disease models with nonlinear infection and treatment rates are investigated under the condition that the total population is constant. The qualitative theory of differential equations is used to show that the endemic equilibrium point of the system is a stable, unstable, or degenerate equilibrium under different circumstances. Subsequently, the local stability of the non-negative equilibrium points of the system is analyzed. Finally, bifurcation theory is used to prove that the system undergoes a saddle-node bifurcation with the natural recovery growth rate as the bifurcation parameter, and the conditions for the existence of this saddle-node bifurcation are given.

Submitted 7 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2303.17293 by other authors
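The SIRS abstract does not give the model's equations. As a hedged illustration of the setting it describes (constant total population N, nonlinear infection rate, loss of immunity), the sketch below integrates one commonly studied SIRS variant with saturated incidence βSI/(1 + αI) and omits the treatment term; the functional form and all parameter values are assumptions. With births and deaths at equal rate μ, dS/dt + dI/dt + dR/dt = μ(N - S - I - R), so the total stays at N, and linearizing at the disease-free equilibrium (S = N, I = R = 0) gives the threshold R0 = βN/(μ + γ).

```python
from typing import Dict, Tuple

State = Tuple[float, float, float]  # (S, I, R)


def sirs_rhs(state: State, p: Dict[str, float]) -> State:
    """Right-hand side of an SIRS model with saturated incidence (assumed form)."""
    s, i, r = state
    infection = p["beta"] * s * i / (1.0 + p["alpha"] * i)  # nonlinear infection rate
    ds = p["mu"] * p["n"] - infection - p["mu"] * s + p["delta"] * r
    di = infection - (p["mu"] + p["gamma"]) * i
    dr = p["gamma"] * i - (p["mu"] + p["delta"]) * r
    return ds, di, dr


def simulate(state: State, p: Dict[str, float], dt: float, steps: int) -> State:
    """Forward-Euler integration; adequate for a qualitative sketch with small dt."""
    for _ in range(steps):
        ds, di, dr = sirs_rhs(state, p)
        state = (state[0] + dt * ds, state[1] + dt * di, state[2] + dt * dr)
    return state


def basic_reproduction_number(p: Dict[str, float]) -> float:
    """R0 = beta * N / (mu + gamma), from linearizing at the disease-free equilibrium."""
    return p["beta"] * p["n"] / (p["mu"] + p["gamma"])


# Hypothetical parameters (population normalized to N = 1).
endemic = {"beta": 0.5, "alpha": 1.0, "mu": 0.01, "gamma": 0.1, "delta": 0.05, "n": 1.0}
dying_out = dict(endemic, beta=0.05)

for p in (endemic, dying_out):
    s, i, r = simulate((0.99, 0.01, 0.0), p, dt=0.1, steps=20_000)
    print(f"R0={basic_reproduction_number(p):.2f}  S={s:.4f}  I={i:.6f}  R={r:.4f}  total={s + i + r:.6f}")
```

Numerical runs like this only illustrate the R0 < 1 versus R0 > 1 behaviour; the stability and saddle-node results in the paper come from analysis of its own model, not from simulation.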

arXiv:2403.04391 [pdf, other]

doi 10.1103/PhysRevLett.132.066501

$π$ Phase Interlayer Shift and Stacking Fault in the Kagome Superconductor CsV$_3$Sb$_5$

Authors: Feng Jin, Wei Ren, Mingshu Tan, Mingtai Xie, Bingru Lu, Zheng Zhang, Jianting Ji, Qingming Zhang

Abstract: The stacking degree of freedom is a crucial factor in tuning material properties and has been extensively investigated in layered materials. The kagome superconductor CsV$_3$Sb$_5$ was recently discovered to exhibit a three-dimensional charge density wave (CDW) phase below $T_{\mathrm{CDW}} \approx 94$ K. Although the in-plane modulation has been thoroughly investigated, the out-of-plane modulation has remained ambiguous. Here, our polarization- and temperature-dependent Raman measurements reveal the breaking of C$_6$ rotational symmetry and the presence of three distinct domains oriented at approximately 120° to each other. The observations demonstrate that the CDW phase can be naturally explained as a 2c staggered order phase with adjacent layers exhibiting a relative $\pi$ phase shift. Further, we discover a first-order structural phase transition at approximately 65 K and suggest that it is a stacking order-disorder phase transition due to stacking fault, supported by the thermal hysteresis behavior of a Cs-related phonon mode. Our findings highlight the significance of the stacking degree of freedom in CsV$_3$Sb$_5$ and offer structural insights into the entanglement between superconductivity and CDW.

Submitted 7 March, 2024; originally announced March 2024.

Comments: This manuscript was published in Phys. Rev. Lett.

Journal ref: Physical Review Letters 132, 066501 (2024)

arXiv:2402.17316 [pdf, other]

Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation

Authors: Yaofo Chen, Shuaicheng Niu, Yaowei Wang, Shoukai Xu, Hengjie Song, Mingkui Tan

Abstract: The conventional deep learning paradigm often involves training a deep model on a server and then deploying the model or its distilled versions to resource-limited edge devices. Usually, the models remain fixed once deployed (at least for some period) due to the potentially high cost of model adaptation on both the server and edge sides. However, in many real-world scenarios, the test environments may change dynamically (known as distribution shifts), which often results in degraded performance. Thus, one has to adapt the edge models promptly to maintain good performance. Moreover, with the increasing data collected at the edge, this paradigm also fails to further adapt the cloud model for better performance. In addressing these issues, we face two primary challenges: 1) the edge model has limited computation power and may only support forward propagation; 2) the data transmission budget between cloud and edge devices is limited in latency-sensitive scenarios. In this paper, we establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation and can be adapted online. In CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from being uploaded to the cloud, i.e., dynamic unreliable and low-informative sample exclusion. Based on the uploaded samples, we update and distribute the affine parameters of normalization layers by distilling from the stronger foundation model to the edge model with a sample replay strategy. Extensive experimental results on ImageNet-C and ImageNet-R verify the effectiveness of CEMA.

Submitted 6 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Published in ICLR 2024
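The two upload criteria named in the CEMA abstract, excluding unreliable and low-informative samples, can be illustrated with an entropy band: very uncertain predictions are treated as unreliable, and very confident ones as carrying little new information for adaptation. The fixed thresholds `low` and `high` below are hypothetical; the abstract describes the exclusion as dynamic, and CEMA's exact criteria may differ from this sketch.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def samples_to_upload(batch: Sequence[Sequence[float]], low: float, high: float) -> List[int]:
    """Indices of edge predictions worth sending to the cloud.

    Excludes unreliable samples (entropy >= high) and low-informative samples
    (entropy <= low); only predictions in the band low < H < high are uploaded.
    """
    if not low < high:
        raise ValueError("expected low < high")
    return [i for i, probs in enumerate(batch) if low < entropy(probs) < high]


batch = [
    [0.25, 0.25, 0.25, 0.25],      # near-uniform: unreliable
    [0.70, 0.10, 0.10, 0.10],      # moderately confident: uploaded
    [0.997, 0.001, 0.001, 0.001],  # very confident: low-informative
]
print(samples_to_upload(batch, low=0.05, high=1.0))  # [1]
```

Filtering on the edge needs only the forward pass the device already runs, which fits the constraint in the abstract that edge models may support forward propagation only.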

arXiv:2402.16389 [pdf, other]

MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property

Authors: Shiwen Ni, Minghuan Tan, Yuelin Bai, Fuqiang Niu, Min Yang, Bowen Zhang, Ruifeng Xu, Xiaojun Chen, Chengming Li, Xiping Hu, Ye Li, Jianping Fan

Abstract: Large language models (LLMs) have demonstrated impressive performance in various natural language processing (NLP) tasks. However, there is limited understanding of how well LLMs perform in specific domains (e.g., the intellectual property (IP) domain). In this paper, we contribute a new benchmark, the first Multilingual-oriented quiZ on Intellectual Property (MoZIP), for the evaluation of LLMs in the IP domain. The MoZIP benchmark includes three challenging tasks: IP multiple-choice quiz (IPQuiz), IP question answering (IPQA), and patent matching (PatentMatch). In addition, we also develop a new IP-oriented multilingual large language model (called MoZi), which is a BLOOMZ-based model that has been supervised fine-tuned with multilingual IP-related text data. We evaluate our proposed MoZi model and four well-known LLMs (i.e., BLOOMZ, BELLE, ChatGLM and ChatGPT) on the MoZIP benchmark. Experimental results demonstrate that MoZi outperforms BLOOMZ, BELLE and ChatGLM by a noticeable margin, while scoring lower than ChatGPT. Notably, current LLMs leave much room for improvement on the MoZIP benchmark, and even the most powerful, ChatGPT, does not reach the passing level. Our source code, data, and models are available at https://github.com/AI-for-Science/MoZi.

Submitted 26 February, 2024; originally announced February 2024.

Journal ref: LREC-COLING 2024

arXiv:2402.16041 [pdf, other]

Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy

Authors: Shuhai Zhang, Yiliao Song, Jiahao Yang, Yuanqing Li, Bo Han, Mingkui Tan

Abstract: Large language models (LLMs) such as ChatGPT have exhibited remarkable performance in generating human-like texts. However, machine-generated texts (MGTs) may carry critical risks, such as plagiarism issues, misleading information, or hallucination issues. Therefore, detecting MGTs is urgent and important in many situations. Unfortunately, it is challenging to distinguish MGTs from human-written texts because the distributional discrepancy between them is often very subtle, owing to the remarkable performance of LLMs. In this paper, we seek to exploit maximum mean discrepancy (MMD) to address this issue, since MMD can identify distributional discrepancies well. However, directly training a detector with MMD using diverse MGTs significantly increases the variance of MMD, since MGTs may contain multiple text populations due to various LLMs. This severely impairs MMD's ability to measure the difference between two samples. To tackle this, we propose a novel multi-population aware optimization method for MMD called MMD-MP, which can avoid variance increases and thus improve the stability of measuring the distributional discrepancy. Relying on MMD-MP, we develop two methods for paragraph-based and sentence-based detection, respectively. Extensive experiments on various LLMs, e.g., GPT2 and ChatGPT, show the superior detection performance of MMD-MP. The source code is available at https://github.com/ZSHsh98/MMD-MP.

Submitted 29 February, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

Comments: Accepted at ICLR 2024

arXiv:2402.11082 [pdf, other]

The AI Security Pyramid of Pain

Authors: Chris M. Ward, Josh Harguess, Julia Tao, Daniel Christman, Paul Spicer, Mike Tan

Abstract: We introduce the AI Security Pyramid of Pain, a framework that adapts the cybersecurity Pyramid of Pain to categorize and prioritize AI-specific threats. This framework provides a structured approach to understanding and addressing various levels of AI threats. Starting at the base, the pyramid emphasizes Data Integrity, which is essential for the accuracy and reliability of datasets and AI models, including their weights and parameters. Ensuring data integrity is crucial, as it underpins the effectiveness of all AI-driven decisions and operations. The next level, AI System Performance, focuses on MLOps-driven metrics such as model drift, accuracy, and false positive rates. These metrics are crucial for detecting potential security breaches, allowing for early intervention and maintenance of AI system integrity. Advancing further, the pyramid addresses the threat posed by Adversarial Tools, identifying and neutralizing tools used by adversaries to target AI systems. This layer is key to staying ahead of evolving attack methodologies. At the Adversarial Input layer, the framework addresses the detection and mitigation of inputs designed to deceive or exploit AI models. This includes techniques like adversarial patterns and prompt injection attacks, which are increasingly used in sophisticated attacks on AI systems. Data Provenance is the next critical layer, ensuring the authenticity and lineage of data and models. This layer is pivotal in preventing the use of compromised or biased data in AI systems.
At the apex is the tactics, techniques, and procedures (TTPs) layer, dealing with the most complex and challenging aspects of AI security. This involves a deep understanding and strategic approach to counter advanced AI-targeted attacks, requiring comprehensive knowledge and planning.

Submitted 16 February, 2024; originally announced February 2024.

Comments: SPIE DCS 2024

arXiv:2402.09288 [pdf, other]

EcoVal: An Efficient Data Valuation Framework for Machine Learning

Authors: Ayush K Tarun, Vikram S Chundawat, Murari Mandal, Hong Ming Tan, Bowei Chen, Mohan Kankanhalli

Abstract: Quantifying the value of data within a machine learning workflow can play a pivotal role in making more strategic decisions in machine learning initiatives. The existing Shapley-value-based frameworks for data valuation in machine learning are computationally expensive, as they require a considerable amount of repeated model training to obtain the Shapley value. In this paper, we introduce an efficient data valuation framework, EcoVal, to estimate the value of data for machine learning models in a fast and practical manner. Instead of directly working with individual data samples, we determine the value of a cluster of similar data points. This value is further propagated amongst all the member cluster points. We show that the overall value of the data can be determined by estimating the intrinsic and extrinsic value of each data point. This is enabled by formulating the performance of a model as a \textit{production function}, a concept which is popularly used to estimate the amount of output based on factors like labor and capital in a traditional free economic market. We provide a formal proof of our valuation technique and elucidate the principles and mechanisms that enable its accelerated performance. We demonstrate the real-world applicability of our method by showcasing its effectiveness for both in-distribution and out-of-sample data. This work addresses one of the core challenges of efficient data valuation at scale in machine learning models. The code is available at \underline{https://github.com/respai-lab/ecoval}.

Submitted 9 July, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: KDD-2024

arXiv:2402.08188 [pdf, other]

High-cadence Timing of Binary Pulsars with CHIME

Authors: Chia Min Tan, Emmanuel Fonseca, Kathryn Crowter, Fengqiu Adam Dong, Victoria M. Kaspi, Kiyoshi W. Masui, James W. McKee, Bradley W. Meyers, Scott M. Ransom, Ingrid H. Stairs

Abstract: We performed near-daily observations on the binary pulsars PSR J0218+4232, PSR J1518+4904 and PSR J2023+2853 with the Canadian Hydrogen Intensity Mapping Experiment (CHIME). For the first time, we detected the Shapiro time delay in all three pulsar-binary systems, using only 2--4 years of CHIME/Pulsar timing data. We measured the pulsar masses to be $1.49^{+0.23}_{-0.20}$ M$_\odot$, $1.470^{+0.030}_{-0.034}$ M$_\odot$ and $1.50^{+0.49}_{-0.38}$ M$_\odot$ respectively. The companion mass to PSR J0218+4232 was found to be $0.179^{+0.018}_{-0.016}$ M$_\odot$. We constrained the mass of the neutron-star companion of PSR J1518+4904 to be $1.248^{+0.035}_{-0.029}$ M$_\odot$, using the observed apsidal motion as a constraint on mass estimation. The binary companion to PSR J2023+2853 was found to have a mass of $0.93^{+0.17}_{-0.14}$ M$_\odot$; in the context of the near-circular orbit, this mass estimate suggests that the companion to PSR J2023+2853 is likely a high-mass white dwarf. By comparing the timing model obtained for PSR J0218+4232 with previous observations, we found a significant change in the observed orbital period of the system of $\dot{P_{\rm b}} = 0.14(2) \times 10^{-12}$; we determined that this variation arises from ``Shklovskii acceleration" due to relative motion of the binary system, and used this measurement to estimate a distance of $d=(6.7 \pm 1.0)$ kpc to PSR J0218+4232.
This work demonstrates the capability of high-cadence observations, enabled by the CHIME/Pulsar system, to detect and refine general-relativistic effects of binary pulsars over short observing timescales.
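The Shklovskii argument in this abstract rests on the kinematic relation $\dot{P}_{\rm b}/P_{\rm b} = \mu^2 d / c$, where $\mu$ is the proper motion. Below is a minimal sketch of inverting that relation for distance. As a simplification, it assumes that the whole observed $\dot{P}_{\rm b}$ is kinematic. A real analysis would first subtract the Galactic-acceleration and general-relativistic contributions. The orbital period and proper motion in the usage lines are placeholder values, not the measured parameters of PSR J0218+4232.

```python
import math

C_M_PER_S = 299_792_458.0               # speed of light
KPC_IN_M = 3.085677581e19               # 1 kiloparsec in metres
SECONDS_PER_DAY = 86_400.0
SECONDS_PER_JULIAN_YEAR = 365.25 * SECONDS_PER_DAY
MAS_IN_RAD = math.pi / (180.0 * 3600.0 * 1000.0)


def proper_motion_rad_per_s(mu_mas_per_yr):
    """Convert a proper motion from mas/yr to rad/s."""
    return mu_mas_per_yr * MAS_IN_RAD / SECONDS_PER_JULIAN_YEAR


def shklovskii_pbdot(pb_days, mu_mas_per_yr, d_kpc):
    """Apparent (dimensionless) Pbdot produced by transverse motion: Pb * mu^2 * d / c."""
    mu = proper_motion_rad_per_s(mu_mas_per_yr)
    return pb_days * SECONDS_PER_DAY * mu**2 * d_kpc * KPC_IN_M / C_M_PER_S


def shklovskii_distance_kpc(pbdot, pb_days, mu_mas_per_yr):
    """Distance implied if the whole observed Pbdot is Shklovskii: d = c * Pbdot / (Pb * mu^2)."""
    mu = proper_motion_rad_per_s(mu_mas_per_yr)
    d_m = C_M_PER_S * pbdot / (pb_days * SECONDS_PER_DAY * mu**2)
    return d_m / KPC_IN_M


# Placeholder inputs for illustration only.
pb_days, mu = 2.0, 6.0
pbdot = shklovskii_pbdot(pb_days, mu, d_kpc=5.0)
print(f"Pbdot for d = 5 kpc: {pbdot:.3e}")
print(f"recovered distance: {shklovskii_distance_kpc(pbdot, pb_days, mu):.3f} kpc")
```

The distance $d$ scales linearly with $\dot{P}_{\rm b}$. The roughly 14% fractional uncertainty on $\dot{P}_{\rm b} = 0.14(2)\times10^{-12}$ therefore propagates directly into the distance, before any proper-motion uncertainty is added.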

Submitted 12 February, 2024; originally announced February 2024.

Comments: 14 pages, 4 figures, accepted for publication in ApJ

arXiv:2402.01935 [pdf, other]

Code Representation Learning At Scale

Authors: Dejiao Zhang, Wasi Ahmad, Ming Tan, Hantian Ding, Ramesh Nallapati, Dan Roth, Xiaofei Ma, Bing Xiang

Abstract: Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, e.g., code generation. However, most of the existing works on code representation learning train models at a hundred-million-parameter scale using very limited pretraining corpora. In this work, we fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme. We first train the encoders via a mix that leverages both randomness in masked language modeling and the structural aspects of programming languages. We then enhance the representations via contrastive learning with hard negatives and hard positives constructed in an unsupervised manner. We establish an off-the-shelf encoder model that consistently outperforms the existing models on a wide variety of downstream tasks by large margins. To comprehend the factors contributing to successful code representation learning, we conduct detailed ablations and share our findings on (i) a customized and effective token-level denoising scheme for source code; (ii) the importance of hard negatives and hard positives; (iii) how the proposed bimodal contrastive learning boosts cross-lingual semantic search performance; and (iv) how the pretraining scheme determines the way downstream task performance scales with model size.
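The contrastive stage described above can be illustrated with a generic InfoNCE loss. In this version, each query has one positive, the other in-batch positives act as negatives, and an extra set of hard negatives is appended to the candidate pool. This sketches the general technique under assumed shapes and temperature. It is not the authors' bimodal objective or their unsupervised construction of hard positives and negatives. The "hard" negatives here are random stand-ins; in practice they would be semantically close but incorrect matches.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (cosine-similarity space)."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def info_nce_with_hard_negatives(queries, positives, hard_negatives, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    queries, positives: (B, D); row i of `positives` is the positive for query i.
    hard_negatives: (H, D); every hard negative is a negative for every query.
    """
    q = l2_normalize(queries)
    candidates = l2_normalize(np.concatenate([positives, hard_negatives], axis=0))
    logits = q @ candidates.T / temperature              # (B, B + H)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(q))
    # The target for query i is column i, i.e. its own positive.
    return float(-log_probs[idx, idx].mean())


rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
hard_negs = rng.normal(size=(8, 16))  # random stand-ins for mined hard negatives
aligned_loss = info_nce_with_hard_negatives(anchors, anchors, hard_negs)
unrelated_loss = info_nce_with_hard_negatives(anchors, rng.normal(size=(8, 16)), hard_negs)
print(f"aligned: {aligned_loss:.4f}, unrelated: {unrelated_loss:.4f}")
```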

Submitted 2 February, 2024; originally announced February 2024.

Comments: 10 pages

Journal ref: ICLR 2024

arXiv:2401.11991 [pdf, other]

Tight Bounds on the Message Complexity of Distributed Tree Verification

Authors: Shay Kutten, Peter Robinson, Ming Ming Tan

Abstract: We consider the message complexity of verifying whether a given subgraph of the communication network forms a tree with specific properties, both in the KT-$ρ$ model (nodes know their $ρ$-hop neighborhood, including node IDs) and in the KT-$0$ model (nodes do not have this knowledge). We develop a rather general framework that helps in establishing tight lower bounds for various tree verification problems. We also consider two different verification requirements: namely, that every node detects when the input is incorrect, and that at least one node detects it. The results are stronger than previous ones in the sense that we assume that each node knows the number $n$ of nodes in the graph (in some cases) or an $α$-approximation of $n$ (in other cases). For spanning tree verification, we show that the message complexity inherently depends on the quality of the given approximation of $n$: we show a tight lower bound of $Ω(n^2)$ for the case $α\ge \sqrt{2}$ and a much better upper bound (i.e., $O(n \log n)$) when nodes are given a tighter approximation. On the other hand, our framework also yields an $Ω(n^2)$ lower bound on the message complexity of verifying a minimum spanning tree (MST), which reveals a polynomial separation between ST verification and MST verification. This result holds for randomized algorithms with perfect knowledge of the network size, and even when just one node detects illegal inputs, thus improving over the work of Kor, Korman, and Peleg (2013).
For verifying a $d$-approximate BFS tree, we show that the same lower bound holds even if nodes know $n$ exactly; however, the lower bound is sensitive to the stretch parameter $d$.
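To make the verified property concrete, the sketch below checks centrally whether a set of edges forms a spanning tree. The edges must number exactly $n-1$ and contain no cycle, and these two conditions together imply connectivity. The sketch only illustrates what is being verified. The paper's subject is how many messages a distributed algorithm needs to reach the same verdict, and this single-machine union-find check says nothing about that.

```python
def is_spanning_tree(n, edges):
    """Return True if `edges` (pairs of node ids in range(n)) form a spanning tree on n nodes."""
    if len(edges) != n - 1:
        return False
    parent = list(range(n))  # union-find forest

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    # n - 1 edges with no cycle leave exactly one component, so the tree spans all nodes.
    return True


print(is_spanning_tree(4, [(0, 1), (1, 2), (2, 3)]))  # True
print(is_spanning_tree(4, [(0, 1), (1, 2), (2, 0)]))  # False: cycle, node 3 unreached
print(is_spanning_tree(4, [(0, 1), (2, 3)]))          # False: too few edges
```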

Submitted 22 January, 2024; originally announced January 2024.

Comments: Appeared at OPODIS 2023

arXiv:2401.11338 [pdf, other]

doi 10.1063/5.0199112

ENN's Roadmap for Proton-Boron Fusion Based on Spherical Torus

Authors: Min-sheng Liu, Hua-sheng Xie, Yu-min Wang, Jia-qi Dong, Kai-ming Feng, Xiang Gu, Xian-li Huang, Xin-chen Jiang, Ying-ying Li, Zhi Li, Bing Liu, Wen-jun Liu, Di Luo, Yueng-Kay Martin Peng, Yue-jiang Shi, Shao-dong Song, Xian-ming Song, Tian-tian Sun, Mu-zhi Tan, Xue-yun Wang, Yuan-ming Yang, Gang Yin, Han-yue Zhao, ENN fusion team

Abstract: ENN Science and Technology Development Co., Ltd. (ENN) is committed to generating fusion energy in an environmentally friendly and cost-effective manner, which requires abundant aneutronic fuel. Proton-boron (p-$^{11}$B or p-B) fusion is considered an ideal choice for this purpose. Recent studies have suggested that p-B fusion, although challenging, is feasible based on new cross-section data, provided that a hot ion mode and high wall reflection can be achieved to reduce electron radiation loss. The high beta and good confinement of the spherical torus (ST) make it an ideal candidate for p-B fusion. By utilizing the new spherical torus energy confinement scaling law, a reactor with a major radius $R_0=4$ m, central magnetic field $B_0=6$ T, central temperature $T_{i0}=150$ keV, plasma current $I_p=30$ MA, and hot ion mode $T_i/T_e=4$ can yield p-B fusion with $Q>10$. A roadmap for p-B fusion has been developed, with the next-generation device named EHL-2. EHL stands for ENN He-Long, which literally means ``peaceful Chinese Loong". The main target parameters include $R_0\simeq1.05$ m, $A\simeq1.85$, $B_0\simeq3$ T, $T_{i0}\simeq30$ keV, $I_p\simeq3$ MA, and $T_i/T_e\geq2$. The existing ST device EXL-50 was simultaneously upgraded to provide experimental support for the new roadmap, involving the installation and upgrading of the central solenoid, vacuum chamber, and magnetic systems. The construction of the upgraded ST fusion device, EXL-50U, was completed at the end of 2023, and it achieved its first plasma in January 2024.
The construction of EHL-2 is expected to be completed by 2026.

Submitted 10 June, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

Comments: 16 pages, 8 figures

Journal ref: Phys. Plasmas 31, 062507 (2024)

arXiv:2401.09698 [pdf]

Photonic RF Channelization Based on Microcombs

Authors: Weiwei Han, Zhihui Liu, Mengxi Tan, Chaoran Huang, Jiayang Wu, Kun Xu, David J. Moss, Xingyuan Xu

Abstract: In recent decades, microwave photonic channelization techniques have developed significantly. Characterized by low loss, high versatility, large instantaneous bandwidth, and immunity to electromagnetic interference, microwave photonic channelization addresses the requirements of modern radar and electronic warfare for receivers. Microresonator-based optical frequency combs are promising devices for photonic channelized receivers: they make full use of multiple carriers and large bandwidths, and they accelerate the integration of microwave photonic channelized receivers. In this paper, we review the research progress and trends in microwave photonic channelization, focusing on schemes that utilize integrated microcombs. We discuss the potential of microcomb-based RF channelization, as well as its challenges and limitations, and provide perspectives for its future development in the context of on-chip silicon-based photonics.

Submitted 17 January, 2024; originally announced January 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2401.08703 [pdf, other]

Decoupled Prototype Learning for Reliable Test-Time Adaptation

Authors: Guowei Wang, Changxing Ding, Wentao Tan, Mingkui Tan

Abstract: Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference. One popular approach involves fine-tuning the model with cross-entropy loss according to estimated pseudo-labels. However, its performance is significantly affected by noisy pseudo-labels. This study reveals that minimizing the classification error of each sample causes the cross-entropy loss's vulnerability to label noise. To address this issue, we propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation. First, we decouple the optimization of class prototypes. For each class prototype, we reduce its distance to positive samples and enlarge its distance to negative samples in a contrastive manner. This strategy prevents the model from overfitting to noisy pseudo-labels. Second, we propose a memory-based strategy to enhance DPL's robustness for the small batch sizes often encountered in TTA. We update each class's pseudo-feature from a memory in a momentum manner and insert an additional DPL loss. Finally, we introduce a consistency regularization-based approach to leverage samples with unconfident pseudo-labels. This approach transfers feature styles of samples with unconfident pseudo-labels to those with confident pseudo-labels. Thus, more reliable samples for TTA are created.
The experimental results demonstrate that our methods achieve state-of-the-art performance on domain generalization benchmarks, and reliably improve the performance of self-training-based methods on image corruption benchmarks. The code will be released.
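One way to read "prototype-centric loss computation" is to treat each class prototype as the anchor of a contrastive softmax over the batch samples. The samples pseudo-labelled as that class serve as positives, and per-class features are maintained with an exponential moving average. The sketch below follows that reading. The cosine similarity, temperature, momentum coefficient and function names are assumptions. It also omits DPL's consistency-regularization and style-transfer steps.

```python
import numpy as np


def normalize(x):
    """L2-normalize the rows of x."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def prototype_centric_loss(features, pseudo_labels, prototypes, temperature=0.1):
    """Contrastive loss with each class prototype as the anchor.

    For prototype c, samples pseudo-labelled c are positives and all other batch
    samples are negatives; the softmax runs over samples, not over classes.
    """
    sims = normalize(features) @ normalize(prototypes).T / temperature  # (N, C)
    per_class = []
    for c in range(prototypes.shape[0]):
        positives = pseudo_labels == c
        if not positives.any():
            continue  # class absent from this batch
        column = sims[:, c]
        log_denominator = column.max() + np.log(np.exp(column - column.max()).sum())
        per_class.append(-(column[positives] - log_denominator).mean())
    return float(np.mean(per_class)) if per_class else 0.0


def momentum_update(memory, features, pseudo_labels, beta=0.9):
    """EMA of per-class mean features, a stand-in for DPL's memory of class pseudo-features."""
    memory = memory.copy()
    for c in np.unique(pseudo_labels):
        class_mean = features[pseudo_labels == c].mean(axis=0)
        memory[c] = beta * memory[c] + (1.0 - beta) * class_mean
    return memory


prototypes = np.eye(2)
features = np.array([[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]])
correct = np.array([0, 0, 1, 1])
flipped = np.array([1, 1, 0, 0])
correct_loss = prototype_centric_loss(features, correct, prototypes)
flipped_loss = prototype_centric_loss(features, flipped, prototypes)
print(f"consistent pseudo-labels: {correct_loss:.3f}, flipped: {flipped_loss:.3f}")

memory = momentum_update(np.zeros((2, 2)), features, correct)
print(memory)
```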

Submitted 25 January, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

Comments: 12 pages, 5 figures

arXiv:2401.08174 [pdf, other]

An Efficient Instance Segmentation Framework Based on Oriented Bounding Boxes

Authors: Zhen Zhou, Junfeng Fan, Yunkai Ma, Sihan Zhao, Fengshui Jing, Min Tan

Abstract: Instance segmentation for completely occluded objects and dense objects in robot vision measurement are two challenging tasks. To uniformly deal with them, this paper proposes a unified coarse-to-fine instance segmentation framework, CFNet, which uses box prompt-based segmentation foundation models (BSMs), e.g., Segment Anything Model. Specifically, CFNet first detects oriented bounding boxes (OBBs) to distinguish instances and provide coarse localization information. Then, it predicts OBB prompt-related masks for fine segmentation. CFNet performs instance segmentation with OBBs that only contain partial object boundaries on occluders to predict occluded object instances, which overcomes the difficulty of existing amodal instance segmentation methods in directly predicting occluded objects. In addition, since OBBs only serve as prompts, CFNet alleviates the over-dependence on bounding box detection performance of current instance segmentation methods using OBBs for dense objects. Moreover, to enable BSMs to handle OBB prompts, we propose a novel OBB prompt encoder. To make CFNet more lightweight, we perform knowledge distillation on it and introduce a Gaussian label smoothing method for teacher model outputs. Experiments demonstrate that CFNet outperforms current instance segmentation methods on both industrial and public datasets. The code is available at https://github.com/zhen6618/OBBInstanceSegmentation.
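Oriented bounding boxes are commonly parameterized by centre, width, height and rotation angle. The sketch below converts that assumed $(c_x, c_y, w, h, \theta)$ form into its four corners and into the enclosing axis-aligned box. This is one simple way to relate an OBB to the axis-aligned box prompts accepted by models such as Segment Anything. It is not CFNet's OBB prompt encoder. The angle convention (radians, counter-clockwise in a y-up frame) is also an assumption.

```python
import math


def obb_to_corners(cx, cy, w, h, theta):
    """Four corners of an oriented box rotated by theta radians (counter-clockwise, y-up) about its centre."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t) for dx, dy in offsets]


def obb_to_aabb(cx, cy, w, h, theta):
    """Tightest axis-aligned box (x_min, y_min, x_max, y_max) enclosing the oriented box."""
    xs, ys = zip(*obb_to_corners(cx, cy, w, h, theta))
    return min(xs), min(ys), max(xs), max(ys)


# A 4 x 2 box centred at the origin, rotated by 90 degrees.
print(obb_to_corners(0.0, 0.0, 4.0, 2.0, math.pi / 2))
print(obb_to_aabb(0.0, 0.0, 4.0, 2.0, math.pi / 2))
```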

Submitted 1 July, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.07263 [pdf, other]

BET: Explaining Deep Reinforcement Learning through The Error-Prone Decisions

Authors: Xiao Liu, Jie Zhao, Wubing Chen, Mao Tan, Yongxing Su

Abstract: Despite the impressive capabilities of Deep Reinforcement Learning (DRL) agents in many challenging scenarios, their black-box decision-making process significantly limits their deployment in safety-sensitive domains. Several previous self-interpretable works focus on revealing the critical states of the agent's decision. However, they cannot pinpoint the error-prone states. To address this issue, we propose a novel self-interpretable structure, named Backbone Extract Tree (BET), to better explain the agent's behavior by identifying the error-prone states. At a high level, BET hypothesizes that states in which the agent consistently executes uniform decisions exhibit a reduced propensity for errors. To effectively model this phenomenon, BET expresses these states within neighborhoods, each defined by a curated set of representative states. Therefore, states positioned at a greater distance from these representative benchmarks are more prone to error. We evaluate BET in various popular RL environments and show its superiority over existing self-interpretable models in terms of explanation fidelity. Furthermore, we demonstrate a use case for providing explanations for the agents in StarCraft II, a sophisticated multi-agent cooperative game. To the best of our knowledge, we are the first to explain such complex scenarios using a fully transparent structure.
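BET's core hypothesis is that states far from a curated set of representative states are more error-prone. It can be illustrated with a nearest-representative distance score. The Euclidean metric and the hand-picked representatives below are assumptions for illustration. BET's tree structure and its procedure for selecting representative states are not reproduced.

```python
import numpy as np


def error_proneness(states, representatives):
    """Distance from each state to its nearest representative state.

    Under BET's hypothesis, a larger distance suggests a more error-prone state.
    states: (N, D); representatives: (R, D).
    """
    diffs = states[:, None, :] - representatives[None, :, :]  # (N, R, D)
    return np.linalg.norm(diffs, axis=-1).min(axis=1)        # (N,)


representatives = np.array([[0.0, 0.0], [10.0, 10.0]])
states = np.array([[0.0, 1.0], [5.0, 5.0], [10.0, 10.0]])
scores = error_proneness(states, representatives)
print(scores)
print("most error-prone state:", states[np.argmax(scores)])
```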

Submitted 14 January, 2024; originally announced January 2024.

Comments: This is an early version of a paper submitted to IJCAI 2024; 8 pages, 4 figures and 1 table

arXiv:2401.07197 [pdf]

doi 10.1038/s44172-023-00135-7

Photonic real time video image signal processor at 17Tb/s based on a Kerr microcomb

Authors: Mengxi Tan, Xingyuan Xu, Andreas Boes, Bill Corcoran, Thach G. Nguyen, Sai T. Chu, Brent E. Little, Roberto Morandotti, Jiayang Wu, Arnan Mitchell, David J. Moss

Abstract: Signal processing has become central to many fields, from coherent optical telecommunications, where it is used to compensate signal impairments, to video image processing. Image processing is particularly important for observational astronomy, medical diagnosis, autonomous driving, big data and artificial intelligence. For these applications, signal processing has traditionally been performed mainly electronically. However, these and new applications, particularly those involving real-time video image processing, are creating unprecedented demand for ultrahigh performance, including high bandwidth and reduced energy consumption. Here, we demonstrate a photonic signal processor operating at 17 Terabits/s and use it to process video image signals in real time. The system processes 400,000 video signals concurrently, performing 34 functions simultaneously that are key to object edge detection, edge enhancement and motion blur. Compared with spatial-light devices used for image processing, our system is not only ultra-high speed but also highly reconfigurable and programmable, able to perform many different functions without any change to the physical hardware. Our approach is based on an integrated Kerr soliton crystal microcomb, and opens up new avenues for ultrafast robotic vision and machine learning.
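In signal-processing terms, functions such as edge detection are linear filters applied to the serialized pixel stream. The sketch below is a purely digital analogue using a transversal (FIR) filter, $y[n] = \sum_k h_k\, x[n-k]$. As we understand the general approach, microcomb-based processors of this kind carry each tap weight $h_k$ and its delay on a separate comb line. The tap values and the one-row "image" here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def transversal_filter(signal, taps):
    """FIR / transversal filter: y[n] = sum_k taps[k] * signal[n - k], same length as the input."""
    return np.convolve(signal, taps, mode="full")[: len(signal)]


# One row of a toy image: a bright bar on a dark background.
row = np.array([0, 0, 0, 1, 1, 1, 0, 0], dtype=float)

# First-order difference taps act as a simple edge detector.
edges = transversal_filter(row, taps=[1.0, -1.0])
print(edges)  # rising edge at index 3, falling edge at index 6
```

Changing only the tap list changes the function, for example, from edge detection to smoothing. This mirrors the abstract's point about reconfiguring without changing the physical hardware.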

Submitted 13 January, 2024; originally announced January 2024.

Comments: 20 pages, 5 figures, 71 references

Journal ref: Nature Communications Engineering Volume 2 page 94 (2023)

arXiv:2401.04345 [pdf, other]

RomniStereo: Recurrent Omnidirectional Stereo Matching

Authors: Hualie Jiang, Rui Xu, Minglang Tan, Wenjie Jiang

Abstract: Omnidirectional stereo matching (OSM) is an essential and reliable means for $360^{\circ}$ depth sensing. However, following earlier works on conventional stereo matching, prior state-of-the-art (SOTA) methods rely on a 3D encoder-decoder block to regularize the cost volume, which makes the whole system complicated and yields sub-optimal results. Recently, the Recurrent All-pairs Field Transforms (RAFT) based approach employs the recurrent update in 2D and has efficiently improved image-matching tasks, i.e., optical flow and stereo matching. To bridge the gap between OSM and RAFT, we mainly propose an opposite adaptive weighting scheme to seamlessly transform the outputs of spherical sweeping of OSM into the required inputs for the recurrent update, thus creating a recurrent omnidirectional stereo matching (RomniStereo) algorithm. Furthermore, we introduce two techniques, i.e., grid embedding and adaptive context feature generation, which also contribute to RomniStereo's performance. Our best model improves the average MAE metric by 40.7\% over the previous SOTA baseline across five datasets. When visualizing the results, our models demonstrate clear advantages on both synthetic and realistic examples. The code is available at \url{https://github.com/HalleyJiang/RomniStereo}.

Submitted 25 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: accepted by IEEE RA-L, https://github.com/HalleyJiang/RomniStereo

arXiv:2401.03173 [pdf, other]

doi 10.4108/eetcasa.v10i1.4681

UGGNet: Bridging U-Net and VGG for Advanced Breast Cancer Diagnosis

Authors: Tran Cao Minh, Nguyen Kim Quoc, Phan Cong Vinh, Dang Nhu Phu, Vuong Xuan Chi, Ha Minh Tan

Abstract: In the field of medical imaging, breast ultrasound has emerged as a crucial diagnostic tool for early detection of breast cancer. However, the accuracy of diagnosing the location of the affected area and the extent of the disease depends on the experience of the physician. In this paper, we propose a novel model called UGGNet, combining the power of the U-Net and VGG architectures to enhance the performance of breast ultrasound image analysis. The U-Net component of the model helps accurately segment the lesions, while the VGG component utilizes deep convolutional layers to extract features. The fusion of these two architectures in UGGNet aims to optimize both segmentation and feature representation, providing a comprehensive solution for accurate diagnosis in breast ultrasound images. Experimental results have demonstrated that the UGGNet model achieves a notable accuracy of 78.2% on the "Breast Ultrasound Images Dataset."

Submitted 6 January, 2024; originally announced January 2024.

Comments: Submitted to the journal "EAI Endorsed Transactions on Context-aware Systems and Applications", 2 images, 5 data tables

Journal ref: EAI Endorsed Transactions on Context-aware Systems and Applications, 10(1), 2024

arXiv:2312.07471 [pdf, other]

The Green Bank North Celestial Cap Survey IX: Timing Follow-up for 128 Pulsars

Authors: A. E. McEwen, J. K. Swiggum, D. L. Kaplan, C. M. Tan, B. W. Meyers, E. Fonseca, G. Y. Agazie, P. Chawla, K. Crowter, M. E. DeCesar, T. Dolch, F. A. Dong, W. Fiore, E. Fonseca, D. C. Good, A. G. Istrate, V. M. Kaspi, V. I. Kondratiev, J. van Leeuwen, L. Levin, E. F. Lewis, R. S. Lynch, K. W. Masui, J. W. McKee, M. A. McLaughlin, et al. (6 additional authors not shown)

Abstract: The Green Bank North Celestial Cap survey is one of the largest and most sensitive searches for pulsars and transient radio objects. Observations for the survey have finished; priorities have shifted toward long-term monitoring of its discoveries. In this study, we have developed a pipeline to handle large datasets of archival observations and connect them to recent, high-cadence observations taken using the Canadian Hydrogen Intensity Mapping Experiment (CHIME) telescope. This pipeline handles data for 128 pulsars and has produced measurements of spin, positional, and orbital parameters that connect data over observation gaps as large as 2000 days. We have also measured glitches in the timing residuals for five of the pulsars included and proper motion for 19 sources (13 new). We include updates to orbital parameters for 19 pulsars, including 9 previously unpublished binaries. For two of these binaries, we provide updated measurements of post-Keplerian binary parameters, which result in much more precise estimates of the total masses of both systems. For PSR J0509+3801, the much-improved measurement of the Einstein delay yields substantially better mass measurements for the pulsar and its companion, 1.399(6) $M_\odot$ and 1.412(6) $M_\odot$, respectively. For this system, we have also obtained a measurement of the orbital decay due to the emission of gravitational waves: $\dot{P}_{\rm B} = -1.37(7)\times10^{-12}$, which is in agreement with the rate predicted by general relativity for these masses.
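The general-relativistic prediction used for comparison in this abstract is the quadrupole-formula orbital decay rate. That rate depends only on the orbital period, the eccentricity and the two masses. The sketch below expresses the solar mass in time units, $T_\odot = GM_\odot/c^3 \approx 4.925490947\,\mu$s. The orbital period and eccentricity of PSR J0509+3801 are not given here. The usage line therefore plugs in approximate published parameters of the Hulse-Taylor pulsar, PSR B1913+16, whose predicted decay is about $-2.40\times10^{-12}$.

```python
import math

T_SUN_S = 4.925490947e-6   # G * M_sun / c^3, in seconds
SECONDS_PER_DAY = 86_400.0


def gr_pbdot(pb_days, ecc, m_p, m_c):
    """Orbital period derivative (dimensionless) from gravitational-wave emission.

    Pbdot = -(192*pi/5) * (2*pi*T_sun/Pb)^(5/3) * f(e) * m_p * m_c / (m_p + m_c)^(1/3),
    with f(e) = (1 + 73/24 e^2 + 37/96 e^4) / (1 - e^2)^(7/2); masses in solar masses.
    """
    pb_s = pb_days * SECONDS_PER_DAY
    f_e = (1 + 73 / 24 * ecc**2 + 37 / 96 * ecc**4) / (1 - ecc**2) ** 3.5
    return (
        -192 * math.pi / 5
        * (2 * math.pi * T_SUN_S / pb_s) ** (5 / 3)
        * f_e
        * m_p * m_c / (m_p + m_c) ** (1 / 3)
    )


# Approximate published parameters of PSR B1913+16 (Hulse-Taylor pulsar).
hulse_taylor = gr_pbdot(0.322997, 0.6171, 1.438, 1.390)
print(f"{hulse_taylor:.3e}")
```

To reproduce a check like the one in the abstract, one would pass the measured $P_{\rm b}$ and $e$ of PSR J0509+3801 together with the two masses quoted above. The result would then be compared with the measured $\dot{P}_{\rm B}$.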

Submitted 12 December, 2023; originally announced December 2023.

Comments: accepted for publication in The Astrophysical Journal

arXiv:2312.05783 [pdf, other]

DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning

Authors: Kunyang Lin, Yufeng Wang, Peihao Chen, Runhao Zeng, Siyuan Zhou, Mingkui Tan, Chuang Gan

Abstract: Learning an optimal behavior policy for each agent in multi-agent systems is an essential yet difficult problem. Despite fruitful progress in multi-agent reinforcement learning, the challenge of addressing the dynamics of whether two agents should exhibit consistent behaviors is still under-explored. In this paper, we propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents by utilizing intrinsic rewards to learn the optimal policy for each agent. We begin by defining behavior consistency as the divergence in output actions between two agents when provided with the same observation. Subsequently, we introduce dynamic consistency intrinsic reward (DCIR) to stimulate agents to be aware of others' behaviors and determine whether to be consistent with them. Lastly, we devise a dynamic scale network (DSN) that provides learnable scale factors for the agent at every time step to dynamically ascertain whether to award consistent behavior and the magnitude of rewards. We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement, demonstrating its efficacy.
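The abstract defines behavior consistency as the divergence between two agents' output actions for the same observation. The sketch below is a minimal illustration under several assumptions: discrete actions, softmax policies, and KL divergence as the divergence measure (the paper may use a different one). The hypothetical `scale` argument stands in for the learnable factor that the DSN would produce. The linear policies are hypothetical toys.

```python
import numpy as np


def softmax(logits):
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def behavior_divergence(policy_i, policy_j, observation, eps=1e-12):
    """KL(pi_i(.|o) || pi_j(.|o)) between two agents' action distributions on one observation."""
    p = softmax(policy_i(observation))
    q = softmax(policy_j(observation))
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


def consistency_intrinsic_reward(policy_i, policy_j, observation, scale):
    """Hypothetical DCIR-style reward: a positive `scale` rewards agreeing with agent j,
    a negative `scale` rewards diverging from it."""
    return -scale * behavior_divergence(policy_i, policy_j, observation)


# Toy linear policies over 3 actions (hypothetical weights).
rng = np.random.default_rng(0)
w_a, w_b = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))


def agent_a(obs):
    return w_a @ obs


def agent_b(obs):
    return w_b @ obs


obs = rng.normal(size=4)
print(behavior_divergence(agent_a, agent_a, obs))  # identical policies
print(consistency_intrinsic_reward(agent_a, agent_b, obs, scale=0.5))
```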

Submitted 10 December, 2023; originally announced December 2023.

Comments: 15 pages, 11 pages for main paper, 4 pages for supplementary
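The abstract defines behavior consistency as the divergence between two agents' output actions on the same observation, with a learned scale deciding whether consistency is rewarded. A minimal sketch of that idea follows, assuming discrete actions and a symmetric KL divergence (the abstract does not say which divergence is used). The `intrinsic_reward` helper and its sign convention are hypothetical stand-ins for the DSN output, not the paper's formula.

```python
import math
from typing import Callable, Sequence

# A policy maps an observation to a probability distribution over discrete actions.
Policy = Callable[[Sequence[float]], Sequence[float]]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete action distributions over the same action set."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def behavior_consistency(policy_a: Policy, policy_b: Policy, observation: Sequence[float]) -> float:
    """Symmetric KL between two agents' action distributions on the SAME observation.

    0.0 means the agents act identically; larger values mean less consistent behaviour.
    (Symmetric KL is an assumption; the abstract only says "divergence".)
    """
    p = policy_a(observation)
    q = policy_b(observation)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def intrinsic_reward(scale: float, divergence: float) -> float:
    """Hypothetical scaled reward: scale > 0 rewards consistency, scale < 0 rewards divergence.

    In DCIR the scale would come from the learned dynamic scale network (DSN);
    here it is simply a number supplied by the caller.
    """
    return -scale * divergence


if __name__ == "__main__":
    obs = [0.0]
    agent_a = lambda o: [0.7, 0.2, 0.1]
    agent_b = lambda o: [0.1, 0.2, 0.7]
    d = behavior_consistency(agent_a, agent_b, obs)
    print(f"divergence={d:.3f}")
    print(f"reward if consistency is encouraged: {intrinsic_reward(1.0, d):.3f}")
    print(f"reward if divergence is encouraged:  {intrinsic_reward(-1.0, d):.3f}")
```

Letting the sign of the scale flip the reward is one simple way to express "whether to award consistent behavior" and "the magnitude of rewards" with a single learnable number.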

arXiv:2312.01892 [pdf, other]

PSR J0210+5845: An ultra wide binary pulsar with a B6V main-sequence star companion

Authors: E. van der Wateren, C. G. Bassa, G. H. Janssen, I. V. Yanes-Rizo, J. Casares, G. Nelemans, B. W. Stappers, C. M. Tan

Abstract: We report on radio timing observations of PSR J0210+5845 which reveal large deviations from typical pulsar spin-down behaviour. We interpret these deviations as being due to binary motion around the $V=13.5$ star 2MASS J02105640$+$5845176, which is coincident in celestial position and distance with the pulsar. Archival observations and new optical spectroscopy identify this star as a B6V star with a temperature of $T_\mathrm{eff}\approx 14\,000$K and a mass of $M_\mathrm{c}= 3.5$ to $3.8$M$_\odot$, making it the lowest mass main-sequence star known orbiting a non-recycled pulsar. We found that the timing observations constrain the binary orbit to be wide and moderately eccentric, with an orbital period of $P_\mathrm{b}=47^{+40}_{-14}$yr and eccentricity $e=0.46^{+0.10}_{-0.07}$. We predict that the next periastron passage will occur between 2030 and 2034. Due to the low companion mass, we find that the probability for a system with the properties of PSR J0210+5845 and its binary companion to survive the supernova is low. We show that a low velocity and fortuitously directed natal kick is required for the binary to remain bound during the supernova explosion, and argue that an electron-capture supernova is a plausible formation scenario for the pulsar.

Submitted 4 December, 2023; originally announced December 2023.

Comments: Accepted to A&A

arXiv:2311.18302 [pdf, ps, other]

Topological 5d $\mathcal {N} = 2$ Gauge Theory: Novel Floer Homologies, their Dualities, and an $A_\infty$-category of Three-Manifolds

Authors: Arif Er, Zhi-Cong Ong, Meng-Chwan Tan

Abstract: We show how one can define novel gauge-theoretic Floer homologies of four, three and two-manifolds from the physics of a certain topologically-twisted 5d ${\cal N}=2$ gauge theory via its supersymmetric quantum mechanics interpretation. They are associated with Vafa-Witten, Hitchin and $G_{\mathbb{C}}$-BF configurations on the four, three and two-manifolds, respectively. We also show how one can define novel symplectic Floer homologies of Hitchin spaces, which in turn will allow us to derive novel Atiyah-Floer correspondences that relate our gauge-theoretic Floer homologies to symplectic intersection Floer homologies of Higgs bundles. Furthermore, topological invariance and 5d "S-duality" suggest a web of relations and a Langlands duality amongst these novel Floer homologies and their loop/toroidal group generalizations. Last but not least, via a 2d gauged Landau-Ginzburg model interpretation of the 5d theory, we derive, from the soliton string theory that it defines and the 5d partition function, a Fukaya-Seidel type $A_\infty$-category of Hitchin configurations on three-manifolds and its novel Atiyah-Floer correspondence. Our work therefore furnishes purely physical proofs and generalizations of the mathematical conjectures of Haydys [1], Abouzaid-Manolescu [2], and Bousseau [3], and more.

Submitted 30 May, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: 61 pp. Additional material in Section 9 which physically proves and generalizes Bousseau's mathematical conjecture in [3]

arXiv:2311.17945 [pdf, other]

Contrastive Vision-Language Alignment Makes Efficient Instruction Learner

Authors: Lizhao Liu, Xinyu Sun, Tianhang Xiang, Zhuangwei Zhuang, Liuren Yin, Mingkui Tan

Abstract: We study the task of extending the large language model (LLM) into a vision-language instruction-following model. This task is crucial but challenging since the LLM is trained on the text modality only, making it hard to effectively digest the visual modality. To address this, existing methods typically train a visual adapter to align the representation between a pre-trained vision transformer (ViT) and the LLM by a generative image captioning loss. However, we find that the generative objective can only produce weak alignment for vision and language, making the aligned vision-language model very hungry for instruction fine-tuning data. In this paper, we propose CG-VLM, which applies both Contrastive and Generative alignment objectives to effectively align the representations of the ViT and the LLM. Different from image-level and sentence-level alignment in common contrastive learning settings, CG-VLM aligns image-patch-level features and text-token-level embeddings, which, however, is very hard to achieve as no explicit patch-token grounding relation is provided in standard image captioning datasets. To address this issue, we propose to maximize the averaged similarity between pooled image-patch features and text-token embeddings. Extensive experiments demonstrate that the proposed CG-VLM produces strong vision-language alignment and is an efficient instruction learner. For example, using only 10% of the instruction tuning data, we reach 95% of the performance of the state-of-the-art method LLaVA [29] on the zero-shot ScienceQA-Image benchmark.

Submitted 28 November, 2023; originally announced November 2023.

Comments: 17 pages, 10 pages for main paper, 7 pages for supplementary
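The abstract's key step is scoring an image-caption pair by the averaged similarity between pooled image-patch features and text-token embeddings, then training contrastively. The sketch below is one possible reading under stated assumptions: mean pooling over patches, cosine similarity, and an image-to-text InfoNCE loss with a temperature of 0.07. None of these choices are confirmed by the abstract, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a set of image-patch feature vectors into one vector (assumed pooling)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def patch_token_similarity(patches: List[Vector], tokens: List[Vector]) -> float:
    """Cosine similarity of the pooled patch feature with each text token, averaged over tokens."""
    pooled = mean_pool(patches)
    return sum(cosine(pooled, t) for t in tokens) / len(tokens)


def contrastive_loss(images: List[List[Vector]], captions: List[List[Vector]],
                     temperature: float = 0.07) -> float:
    """Image-to-text InfoNCE: image i should score highest against caption i in the batch."""
    n = len(images)
    total = 0.0
    for i in range(n):
        logits = [patch_token_similarity(images[i], captions[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # negative log-probability of the matched caption
    return total / n
```

Pooling the patches before comparing with tokens avoids needing an explicit patch-to-token grounding, which is the gap the abstract says standard captioning datasets leave open.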

arXiv:2311.15649 [pdf, other]

RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks

Authors: Yaran Chen, Wenbo Cui, Yuanwen Chen, Mining Tan, Xinyao Zhang, Dongbin Zhao, He Wang

Abstract: Robotic agents must master common sense and long-term sequential decisions to solve daily tasks through natural language instruction. The developments in Large Language Models (LLMs) in natural language processing have inspired efforts to use LLMs in complex robot planning. Despite LLMs' great generalization and comprehension of instruction tasks, LLM-generated task plans sometimes lack feasibility and correctness. To address the problem, we propose a RoboGPT agent (our code and dataset will be released soon) for making embodied long-term decisions for daily tasks, with two modules: 1) LLM-based planning with re-plan to break the task into multiple sub-goals; 2) RoboSkill, individually designed for sub-goals to learn better navigation and manipulation skills. The LLM-based planning is enhanced with a new robotic dataset and re-plan, called RoboGPT. The new robotic dataset of 67k daily instruction tasks is gathered for fine-tuning the Llama model and obtaining RoboGPT. The RoboGPT planner, with strong generalization, can plan hundreds of daily instruction tasks. Additionally, a low-computational Re-Plan module is designed to allow plans to flexibly adapt to the environment, thereby addressing the nomenclature diversity challenge. The proposed RoboGPT agent outperforms SOTA methods on the ALFRED daily tasks. Moreover, the RoboGPT planner exceeds SOTA LLM-based planners like ChatGPT in task-planning rationality for hundreds of unseen daily tasks, and even other domain tasks, while keeping the large model's original broad application and generality.

Submitted 30 June, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Showing 1–50 of 485 results for author: Tan, M