-
The discovery of a nearby 421~s transient with CHIME/FRB/Pulsar
Authors:
Fengqiu Adam Dong,
Tracy Clarke,
Alice P. Curtin,
Ajay Kumar,
Ingrid Stairs,
Shami Chatterjee,
Amanda M. Cook,
Emmanuel Fonseca,
B. M. Gaensler,
Jason W. T. Hessels,
Victoria M. Kaspi,
Mattias Lazda,
Kiyoshi W. Masui,
James W. McKee,
Bradley W. Meyers,
Aaron B. Pearlman,
Scott M. Ransom,
Paul Scholz,
Kaitlyn Shin,
Kendrick M. Smith,
Chia Min Tan
Abstract:
Neutron stars and white dwarfs are both dense remnants of post-main-sequence stars. Pulsars, magnetars and strongly magnetised white dwarfs have all been observed to exhibit coherent, pulsed radio emission in relation to their rotational period. Recently, a new type of radio long period transient (LPT) has been discovered. The bright radio emission of LPTs resembles that of radio pulsars and magnetars. However, they pulse on timescales (minutes) much longer than previously seen. While minute timescales are common rotation periods for white dwarfs, LPTs are much brighter than the known pulsating white dwarfs, and dipolar radiation from isolated (as opposed to binary) magnetic white dwarfs has yet to be observed. Here, we report the discovery of a new $\sim$421~s LPT, CHIME J0630+25, using the CHIME/FRB and CHIME/Pulsar instruments. We used standard pulsar timing techniques and obtained a phase-coherent timing solution which yielded limits on the inferred magnetic field and characteristic age. CHIME J0630+25 is remarkably nearby ($170 \pm 80$~pc), making it the closest LPT discovered to date.
Submitted 10 July, 2024;
originally announced July 2024.
-
Unveiling mussel plaque core ductility: the role of pore distribution and hierarchical structure
Authors:
Yulan Lyu,
Mengting Tan,
Yong Pang,
Wei Sun,
Shuguang Li,
Tao Liu
Abstract:
The mussel thread-plaque system exhibits strong adhesion and high ductility, allowing it to adhere to various surfaces. While the microstructure of plaques has been thoroughly studied, the effect of their unique porous structure on ductility remains unclear. This study first investigated the porous structure of mussel plaque cores using scanning electron microscopy (SEM). Two-dimensional (2D) porous representative volume elements (RVEs) with scaled distribution parameters were generated, and the calibrated phase-field modelling method was applied to analyse the effect of the pore distribution and multi-scale porous structure on the failure mechanism of porous RVEs. The SEM analysis revealed that large-scale pores exhibited a lognormal size distribution and a uniform spatial distribution. Simulations showed that increasing the normalised mean radius value of the large-scale pore distribution can statistically lead to a decreasing trend in ductility, strength and strain energy, but cannot solely determine their values. The interaction between pores can lead to two different failure modes under the same pore distribution: progressive failure mode and sudden failure mode. Additionally, the hierarchical structure of multi-scale porous RVEs can further increase ductility by 40%-60% compared to single-scale porous RVEs by reducing stiffness, highlighting that the hierarchical structure could be another key factor contributing to the high ductility. These findings deepen our understanding of how the pore distribution and multi-scale porous structure in mussel plaques contribute to their high ductility and affect other mechanical properties, providing valuable insights for the future design of highly ductile biomimetic materials.
Submitted 8 July, 2024;
originally announced July 2024.
-
Are Language Models Actually Useful for Time Series Forecasting?
Authors:
Mingtian Tan,
Mike A. Merrill,
Vinayak Gupta,
Tim Althoff,
Thomas Hartvigsen
Abstract:
Large language models (LLMs) are being applied to time series tasks, particularly time series forecasting. However, are language models actually useful for time series? After a series of ablation studies on three recent and popular LLM-based time series forecasting methods, we find that removing the LLM component or replacing it with a basic attention layer does not degrade the forecasting results -- in most cases the results even improved. We also find that despite their significant computational cost, pretrained LLMs do no better than models trained from scratch, do not represent the sequential dependencies in time series, and do not assist in few-shot settings. Additionally, we explore time series encoders and reveal that patching and attention structures perform similarly to state-of-the-art LLM-based forecasters.
Submitted 21 June, 2024;
originally announced June 2024.
-
The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences
Authors:
Bria Long,
Violet Xiang,
Stefan Stojanov,
Robert Z. Sparks,
Zi Yin,
Grace E. Keene,
Alvin W. M. Tan,
Steven Y. Feng,
Chengxu Zhuang,
Virginia A. Marchman,
Daniel L. K. Yamins,
Michael C. Frank
Abstract:
Human children far exceed modern machine learning algorithms in their sample efficiency, achieving high performance in key domains with much less data than current models. This ''data gap'' is a key challenge both for building intelligent artificial systems and for understanding human development. Egocentric video capturing children's experience -- their ''training data'' -- is a key ingredient for comparison of humans and models and for the development of algorithmic innovations to bridge this gap. Yet there are few such datasets available, and extant data are low-resolution, have limited metadata, and importantly, represent only a small set of children's experiences. Here, we provide the first release of the largest developmental egocentric video dataset to date -- the BabyView dataset -- recorded using a high-resolution camera with a large vertical field-of-view and gyroscope/accelerometer data. This 493 hour dataset includes egocentric videos from children spanning 6 months - 5 years of age in both longitudinal, at-home contexts and in a preschool environment. We provide gold-standard annotations for the evaluation of speech transcription, speaker diarization, and human pose estimation, and evaluate models in each of these domains. We train self-supervised language and vision models and evaluate their transfer to out-of-distribution tasks including syntactic structure learning, object recognition, depth estimation, and image segmentation. Although performance in each scales with dataset size, overall performance is relatively lower than when models are trained on curated datasets, especially in the visual domain. Our dataset stands as an open challenge for robust, humanlike AI systems: how can such systems achieve human-levels of success on the same scale and distribution of training data as humans?
Submitted 14 June, 2024;
originally announced June 2024.
-
DevBench: A multimodal developmental benchmark for language learning
Authors:
Alvin Wei Ming Tan,
Sunny Yu,
Bria Long,
Wanjing Anya Ma,
Tonya Murray,
Rebecca D. Silverman,
Jason D. Yeatman,
Michael C. Frank
Abstract:
How (dis)similar are the learning trajectories of vision-language models and children? Recent modeling work has attempted to understand the gap between models' and humans' data efficiency by constructing models trained on less data, especially multimodal naturalistic data. However, such models are often evaluated on adult-level benchmarks, with limited breadth in language abilities tested, and without direct comparison to behavioral data. We introduce DevBench, a multimodal benchmark comprising seven language evaluation tasks spanning the domains of lexical, syntactic, and semantic ability, with behavioral data from both children and adults. We evaluate a set of vision-language models on these tasks, comparing models and humans not only on accuracy but on their response patterns. Across tasks, models exhibit variation in their closeness to human response patterns, and models that perform better on a task also more closely resemble human behavioral responses. We also examine the developmental trajectory of OpenCLIP over training, finding that greater training results in closer approximations to adult response patterns. DevBench thus provides a benchmark for comparing models to human language development. These comparisons highlight ways in which model and human language learning processes diverge, providing insight into entry points for improving language models.
Submitted 14 June, 2024;
originally announced June 2024.
-
CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation
Authors:
Renhao Li,
Minghuan Tan,
Derek F. Wong,
Min Yang
Abstract:
In recent years, instruction fine-tuning (IFT) on large language models (LLMs) has garnered considerable attention to enhance model performance on unseen tasks. Attempts have been made on automatic construction and effective selection for IFT data. However, we posit that previous methods have not fully harnessed the potential of LLMs for enhancing data quality. The responses within IFT data could be further enhanced by leveraging the capabilities of LLMs themselves. In this paper, we propose CoEvol, an LLM-based multi-agent cooperation framework for the improvement of responses to instructions. To effectively refine the responses, we develop an iterative framework following a debate-advise-edit-judge paradigm. A two-stage multi-agent debate strategy is further devised to ensure the diversity and reliability of editing suggestions within the framework. Empirically, models equipped with CoEvol outperform competitive baselines evaluated by MT-Bench and AlpacaEval, demonstrating its effectiveness in enhancing instruction-following capabilities for LLMs.
Submitted 11 June, 2024;
originally announced June 2024.
-
Enabling Large-Scale and High-Precision Fluid Simulations on Near-Term Quantum Computers
Authors:
Zhao-Yun Chen,
Teng-Yang Ma,
Chuang-Chao Ye,
Liang Xu,
Ming-Yang Tan,
Xi-Ning Zhuang,
Xiao-Fan Xu,
Yun-Jie Wang,
Tai-Ping Sun,
Yong Chen,
Lei Du,
Liang-Liang Guo,
Hai-Feng Zhang,
Hao-Ran Tao,
Tian-Le Wang,
Xiao-Yan Yang,
Ze-An Zhao,
Peng Wang,
Sheng Zhang,
Chi Zhang,
Ren-Ze Zhao,
Zhi-Long Jia,
Wei-Cheng Kong,
Meng-Han Dou,
Jun-Chao Wang
, et al. (7 additional authors not shown)
Abstract:
Quantum computational fluid dynamics (QCFD) offers a promising alternative to classical computational fluid dynamics (CFD) by leveraging quantum algorithms for higher efficiency. This paper introduces a comprehensive QCFD method, including an iterative method "Iterative-QLS" that suppresses error in quantum linear solver, and a subspace method to scale the solution to a larger size. We implement our method on a superconducting quantum computer, demonstrating successful simulations of steady Poiseuille flow and unsteady acoustic wave propagation. The Poiseuille flow simulation achieved a relative error of less than $0.2\%$, and the unsteady acoustic wave simulation solved a 5043-dimensional matrix. We emphasize the utilization of the quantum-classical hybrid approach in applications of near-term quantum computers. By adapting to quantum hardware constraints and offering scalable solutions for large-scale CFD problems, our method paves the way for practical applications of near-term quantum computers in computational science.
Submitted 19 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models
Authors:
Ancheng Xu,
Minghuan Tan,
Lei Wang,
Min Yang,
Ruifeng Xu
Abstract:
Numeral systems and units of measurement are two conjoined topics in activities of human beings and have mutual effects with the languages expressing them. Currently, the evaluation of Large Language Models (LLMs) often involves mathematical reasoning, yet little attention is given to how minor changes in numbers or units can drastically alter the complexity of problems and the performance of LLMs. In this paper, we scrutinize existing LLMs on processing of numerals and units of measurement by constructing datasets with perturbations. We first anatomize the reasoning of math word problems to different sub-procedures like numeral conversions from language to numbers and measurement conversions based on units. Then we further annotate math word problems from ancient Chinese arithmetic works which are challenging in numerals and units of measurement. Experiments on perturbed datasets demonstrate that LLMs still encounter difficulties in handling numeral and measurement conversions.
Submitted 4 June, 2024;
originally announced June 2024.
-
CoNav: A Benchmark for Human-Centered Collaborative Navigation
Authors:
Changhao Li,
Xinyu Sun,
Peihao Chen,
Jugang Fan,
Zixu Wang,
Yanxia Liu,
Jinhui Zhu,
Chuang Gan,
Mingkui Tan
Abstract:
Human-robot collaboration, in which the robot intelligently assists the human with the upcoming task, is an appealing objective. To achieve this goal, the agent needs to be equipped with a fundamental collaborative navigation ability, where the agent should reason human intention by observing human activities and then navigate to the human's intended destination in advance of the human. However, this vital ability has not been well studied in previous literature. To fill this gap, we propose a collaborative navigation (CoNav) benchmark. Our CoNav tackles the critical challenge of constructing a 3D navigation environment with realistic and diverse human activities. To achieve this, we design a novel LLM-based humanoid animation generation framework, which is conditioned on both text descriptions and environmental context. The generated humanoid trajectory obeys the environmental context and can be easily integrated into popular simulators. We empirically find that the existing navigation methods struggle in CoNav task since they neglect the perception of human intention. To solve this problem, we propose an intention-aware agent for reasoning both long-term and short-term human intention. The agent predicts navigation action based on the predicted intention and panoramic observation. The emergent agent behavior including observing humans, avoiding human collision, and navigation reveals the efficiency of the proposed datasets and agents.
Submitted 4 June, 2024;
originally announced June 2024.
-
Confidence-Based Task Prediction in Continual Disease Classification Using Probability Distribution
Authors:
Tanvi Verma,
Lukas Schwemer,
Mingrui Tan,
Fei Gao,
Yong Liu,
Huazhu Fu
Abstract:
Deep learning models are widely recognized for their effectiveness in identifying medical image findings in disease classification. However, their limitations become apparent in the dynamic and ever-changing clinical environment, characterized by the continuous influx of newly annotated medical data from diverse sources. In this context, the need for continual learning becomes particularly paramount, not only to adapt to evolving medical scenarios but also to ensure the privacy of healthcare data. In our research, we emphasize the utilization of a network comprising expert classifiers, where a new expert classifier is added each time a new task is introduced. We present CTP, a task-id predictor that utilizes confidence scores, leveraging the probability distribution (logits) of the classifier to accurately determine the task-id at inference time. Logits are adjusted to ensure that classifiers yield a high-entropy distribution for data associated with tasks other than their own. By defining a noise region in the distribution and computing confidence scores, CTP achieves superior performance when compared to other relevant continual learning methods. Additionally, the performance of CTP can be further improved by providing it with a continuum of data at the time of inference.
Submitted 3 June, 2024;
originally announced June 2024.
-
Discovery and follow-up of a quasiperiodically nulling and sub-pulse drifting pulsar with the Murchison Widefield Array
Authors:
G. Grover,
N. D. R. Bhat,
S. McSweeney,
C. P. Lee,
B. W. Meyers,
C. M. Tan,
S. S. Kudale
Abstract:
The phenomenon of pulsar nulling, where pulsars temporarily and stochastically cease their radio emission, is thought to be indicative of a `dying' pulsar, where radio emission ceases entirely. Here we report the discovery of a long-period pulsar, PSR J0452-3418, from the ongoing Southern-sky MWA Rapid Two-meter (SMART) pulsar survey. The pulsar has a rotation period of ${\sim}$1.67\,s and a dispersion measure of 19.8\,pc\,cm$^{-3}$, and it exhibits both quasi-periodic nulling and sub-pulse drifting. Periodic nulling is uncommon, only reported in $<1$\% of the pulsar population, with an even smaller fraction showing periodic nulling and sub-pulse drifting. We describe the discovery and follow-up of the pulsar, including a positional determination using high-resolution imaging with the upgraded Giant Metrewave Radio Telescope (uGMRT), initial timing analysis using the combination of MWA and uGMRT data, and detailed characterisation of the nulling and drifting properties in the MWA's frequency band (140-170\,MHz). Our analysis suggests a nulling fraction of 34$\pm6$\% and a nulling periodicity of 42$^{+1.5}_{-1.3}$ pulses. We measure the phase ($P_2$) and time modulation ($P_3$) caused by the sub-pulse drifting, with an average $P_2$ of 7.1$^{+26.3}_{-3.1}$ degrees and a $P_3$ of 4.8$^{+1.5}_{-0.9}$ pulses. We compare and contrast the observed properties with those of other pulsars that exhibit sub-pulse drifting and quasi-periodic nulling phenomena, and find that the majority of these objects tend to be in the `death valley' in the period-period derivative ($P$-$\dot{P}$) diagram. We also discuss some broader implications for pulsar emission physics and the detectability of similar objects using next-generation pulsar surveys.
Submitted 28 May, 2024; v1 submitted 26 May, 2024;
originally announced May 2024.
-
CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling
Authors:
Chenhao Zhang,
Renhao Li,
Minghuan Tan,
Min Yang,
Jingwei Zhu,
Di Yang,
Jiahao Zhao,
Guancheng Ye,
Chengming Li,
Xiping Hu
Abstract:
Using large language models (LLMs) to assist psychological counseling is a significant but challenging task at present. Attempts have been made on improving empathetic conversations or acting as effective assistants in the treatment with LLMs. However, the existing datasets lack consulting knowledge, resulting in LLMs lacking professional consulting competence. Moreover, how to automatically evaluate multi-turn dialogues within the counseling process remains an understudied area. To bridge the gap, we propose CPsyCoun, a report-based multi-turn dialogue reconstruction and evaluation framework for Chinese psychological counseling. To fully exploit psychological counseling reports, a two-phase approach is devised to construct high-quality dialogues while a comprehensive evaluation benchmark is developed for the effective automatic evaluation of multi-turn psychological consultations. Competitive experimental results demonstrate the effectiveness of our proposed framework in psychological counseling. We open-source the datasets and model for future research at https://github.com/CAS-SIAT-XinHai/CPsyCoun
Submitted 10 June, 2024; v1 submitted 26 May, 2024;
originally announced May 2024.
-
MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling
Authors:
Diwei Huang,
Kunyang Lin,
Peihao Chen,
Qing Du,
Mingkui Tan
Abstract:
Few-shot audio-visual acoustics modeling seeks to synthesize the room impulse response in arbitrary locations with few-shot observations. To sufficiently exploit the provided few-shot data for accurate acoustic modeling, we present a *map-guided* framework by constructing acoustic-related visual semantic feature maps of the scenes. Visual features preserve semantic details related to sound and maps provide explicit structural regularities of sound propagation, which are valuable for modeling environment acoustics. We thus extract pixel-wise semantic features derived from observations and project them into a top-down map, namely the **observation semantic map**. This map contains the relative positional information among points and the semantic feature information associated with each point. Yet, limited information extracted by few-shot observations on the map is not sufficient for understanding and modeling the whole scene. We address the challenge by generating a **scene semantic map** via diffusing features and anticipating the observation semantic map. The scene semantic map then interacts with echo encoding by a transformer-based encoder-decoder to predict RIR for arbitrary speaker-listener query pairs. Extensive experiments on Matterport3D and Replica dataset verify the efficacy of our framework.
Submitted 22 May, 2024;
originally announced May 2024.
-
CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations
Authors:
Jiahao Zhao,
Jingwei Zhu,
Minghuan Tan,
Min Yang,
Di Yang,
Chenhao Zhang,
Guancheng Ye,
Chengming Li,
Xiping Hu
Abstract:
In this paper, we introduce a novel psychological benchmark, CPsyExam, constructed from questions sourced from Chinese language examinations. CPsyExam is designed to prioritize psychological knowledge and case analysis separately, recognizing the significance of applying psychological knowledge to real-world scenarios. From the pool of 22k questions, we utilize 4k to create the benchmark that offers balanced coverage of subjects and incorporates a diverse range of case analysis techniques. Furthermore, we evaluate a range of existing large language models (LLMs), spanning from open-sourced to API-based models. Our experiments and analysis demonstrate that CPsyExam serves as an effective benchmark for enhancing the understanding of psychology within LLMs and enables the comparison of LLMs across various granularities.
Submitted 18 May, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection
Authors:
Zhaoqi Leng,
Pei Sun,
Tong He,
Dragomir Anguelov,
Mingxing Tan
Abstract:
3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars. In this paper, we identify that the common PointNet design introduces an information bottleneck that limits 3D object detection accuracy and scalability. To address this limitation, we propose PVTransformer: a transformer-based point-to-voxel architecture for 3D detection. Our key idea is to replace the PointNet pooling operation with an attention module, leading to a better point-to-voxel aggregation function. Our design respects the permutation invariance of sparse 3D points while being more expressive than the pooling-based PointNet. Experimental results show our PVTransformer achieves much better performance compared to the latest 3D object detectors. On the widely used Waymo Open Dataset, our PVTransformer achieves state-of-the-art 76.5 mAPH L2, outperforming the prior art of SWFormer by +1.7 mAPH L2.
Submitted 5 May, 2024;
originally announced May 2024.
-
STT: Stateful Tracking with Transformers for Autonomous Driving
Authors:
Longlong Jing,
Ruichi Yu,
Xu Chen,
Zhengli Zhao,
Shiwei Sheng,
Colin Graber,
Qi Chen,
Qinru Li,
Shangxuan Wu,
Han Deng,
Sangjin Lee,
Chris Sweeney,
Qiurui He,
Wei-Chih Hung,
Tong He,
Xingyi Zhou,
Farshid Moussavi,
Zijian Guo,
Yin Zhou,
Mingxing Tan,
Weilong Yang,
Congcong Li
Abstract:
Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through long term history of detections and is jointly optimized for both data association and state estimation tasks. Since the standard tracking metrics like MOTA and MOTP do not capture the combined performance of the two tasks in the wider spectrum of object states, we extend them with new metrics called S-MOTA and MOTPS that address this limitation. STT achieves competitive real-time performance on the Waymo Open Dataset.
Submitted 30 April, 2024;
originally announced May 2024.
-
Near-ultrastrong nonlinear light-matter coupling in superconducting circuits
Authors:
Yufeng Ye,
Jeremy B. Kline,
Alec Yen,
Gregory Cunningham,
Max Tan,
Alicia Zang,
Michael Gingras,
Bethany M. Niedzielski,
Hannah Stickler,
Kyle Serniak,
Mollie E. Schwartz,
Kevin P. O'Brien
Abstract:
The interaction between an atom and an electromagnetic mode of a resonator is both of fundamental interest and ubiquitous in quantum technologies. Most prior work studies a linear light-matter coupling of the form $g \widehat{σ}_x (\widehat{a} + \widehat{a}^\dagger)$, where $g$, measured relative to the photonic ($ω_a$) and atomic ($ω_b$) mode frequencies, can reach the ultrastrong regime ($g/ω_{a}\!>\!10^{-1}$). In contrast, a nonlinear light-matter coupling of the form $\frac{χ}{2} \widehat{σ}_z \widehat{a}^\dagger \widehat{a}$ has the advantage of commuting with the atomic $\widehat{σ}_z$ and photonic $\widehat{a}^\dagger\widehat{a}$ Hamiltonian, allowing for fundamental operations such as quantum-non-demolition measurement. However, due to the perturbative nature of nonlinear coupling, the state-of-the-art $χ/\text{max}(ω_a, ω_b)$ is limited to $\!<\!10^{-2}$. Here, we use a superconducting circuit architecture featuring a quarton coupler to experimentally demonstrate, for the first time, a near-ultrastrong $χ/\text{max}(ω_a, ω_b)= (4.852\pm0.006)\times10^{-2}$ nonlinear coupling of a superconducting artificial atom and a nearly-linear resonator. We also show signatures of light-light nonlinear coupling ($χ\widehat{a}^\dagger\widehat{a}\widehat{b}^\dagger\widehat{b}$), and $χ/2π= 580.3 \pm 0.4 $ MHz matter-matter nonlinear coupling ($\frac{χ}{4}\widehat{σ}_{z,a}\widehat{σ}_{z,b}$) which represents the largest reported $ZZ$ interaction between two coherent qubits. Such advances in the nonlinear coupling strength of light and matter modes enable new physical regimes and could lead to applications such as orders-of-magnitude faster qubit readout and gates.
Submitted 2 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Statistical Inference for Covariate-Adjusted and Interpretable Generalized Factor Model with Application to Testing Fairness
Authors:
Jing Ouyang,
Chengyu Cui,
Kean Ming Tan,
Gongjun Xu
Abstract:
In the era of data explosion, statisticians have been developing interpretable and computationally efficient statistical methods to measure latent factors (e.g., skills, abilities, and personalities) using large-scale assessment data. In addition to understanding the latent information, the covariate effect on responses controlling for latent factors is also of great scientific interest and has wide applications, such as evaluating the fairness of educational testing, where the covariate effect reflects whether a test question is biased toward certain individual characteristics (e.g., gender and race) taking into account their latent abilities. However, the large sample size, substantial covariate dimension, and great test length pose challenges to developing efficient methods and drawing valid inferences. Moreover, to accommodate the commonly encountered discrete types of responses, nonlinear latent factor models are often assumed, bringing further complexity to the problem. To address these challenges, we consider a covariate-adjusted generalized factor model and develop novel and interpretable conditions to address the identifiability issue. Based on the identifiability conditions, we propose a joint maximum likelihood estimation method and establish estimation consistency and asymptotic normality results for the covariate effects under a practical yet challenging asymptotic regime. Furthermore, we derive estimation and inference results for latent factors and the factor loadings. We illustrate the finite sample performance of the proposed method through extensive numerical studies and an application to an educational assessment dataset obtained from the Programme for International Student Assessment (PISA).
Submitted 25 April, 2024;
originally announced April 2024.
-
High-Linearity PAM-4 Silicon Micro-ring Transmitter Architecture with Electronic-Photonic Hybrid DAC
Authors:
Zheng Li,
Chengyang Lv,
Min Tan
Abstract:
This paper presents a high-linearity PAM-4 transmitter (TX) architecture, consisting of a three-segment micro-ring modulator (MRM) and a matched CMOS driver. This architecture can drive a high-linearity 4-level pulse amplitude modulation (PAM-4) signal, thereby extending the tunable operating wavelength range for achieving linear PAM-4 output. We use the three-segment MRM to increase design flexibility so that the linearity of the PAM-4 output can be optimized with another degree of freedom. Each phase shift region is directly driven by an independently amplitude-tunable Non-Return-to-Zero (NRZ) signal. The three-segment modulator can achieve an adjustable wavelength range of approximately 0.037 nm within the high-linearity PAM-4 output limit when the driving voltage varies from 1.5 V to 3 V, simultaneously achieving an adjustable insertion loss (IL) range of approximately 2 dB, roughly four times that of the two-segment MRM with a similar design. The driver circuit with adjustable driving voltage is co-designed to adjust the eye height to improve PAM-4 linearity. The proposed high-linearity PAM-4 silicon micro-ring architecture can be employed in optical transmitters to adjust the PAM-4 eye opening and maximize PAM-4 output linearity, thus offering the potential for high-performance transmitters with low power overhead.
Submitted 14 April, 2024;
originally announced April 2024.
-
Language Models Still Struggle to Zero-shot Reason about Time Series
Authors:
Mike A. Merrill,
Mingtian Tan,
Vinayak Gupta,
Tom Hartvigsen,
Tim Althoff
Abstract:
Time series are critical for decision-making in fields like finance and healthcare. Their importance has driven a recent influx of works passing time series into language models, leading to non-trivial forecasting on some datasets. But it remains unknown whether non-trivial forecasting implies that language models can reason about time series. To address this gap, we generate a first-of-its-kind evaluation framework for time series reasoning, including formal tasks and a corresponding dataset of multi-scale time series paired with text captions across ten domains. Using these data, we probe whether language models achieve three forms of reasoning: (1) Etiological Reasoning - given an input time series, can the language model identify the scenario that most likely created it? (2) Question Answering - can a language model answer factual questions about time series? (3) Context-Aided Forecasting - does highly relevant textual context improve a language model's time series forecasts?
We find that otherwise highly-capable language models demonstrate surprisingly limited time series reasoning: they score marginally above random on etiological and question answering tasks (up to 30 percentage points worse than humans) and show modest success in using context to improve forecasting. These weaknesses show that time series reasoning is an impactful, yet deeply underdeveloped direction for language model research. To support further research in this direction, we make our datasets and code public at https://github.com/behavioral-data/TSandLanguage
Submitted 17 April, 2024;
originally announced April 2024.
-
Room-Temperature Polariton Lasing from CdSe core-only Nanoplatelets
Authors:
Francisco Freire-Fernández,
Nathan G. Sinai,
Max J. H. Tan,
Sang-Min Park,
Eric Koessler,
Todd D. Krauss,
Pengfei Huo,
Teri W. Odom
Abstract:
This paper reports how CdSe core-only nanoplatelets coupled with plasmonic Al nanoparticle lattices can exhibit exciton-polariton lasing. By improving a procedure to synthesize monodisperse 4-monolayer CdSe nanoplatelets, we could resolve polariton decay dynamics and pathways. Experiment and theory confirmed that the system is in the strong coupling regime based on anti-crossings in the dispersion diagrams and magnitude of the Rabi splitting values. Notably, polariton lasing is observed only for cavity lattice periodicities that exhibit specific dispersive characteristics that enable polariton accumulation. The threshold of polariton lasing is 25-fold lower than reported photon lasing values from CdSe nanoplatelets in similar cavity designs. This open-cavity platform offers a simple approach to control exciton polaritons anticipated to benefit quantum information processing, optoelectronics, and chemical reactions.
Submitted 12 April, 2024;
originally announced April 2024.
-
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
Authors:
Zixiong Huang,
Qi Chen,
Libo Sun,
Yifan Yang,
Naizhou Wang,
Mingkui Tan,
Qi Wu
Abstract:
Novel view synthesis aims to generate images from new viewpoints given a collection of view images. Recent attempts address this problem by relying on 3D geometry priors (e.g., shapes, sizes, and positions) learned from multi-view images. However, such methods encounter the following limitations: 1) they require a set of multi-view images as training data for a specific scene (e.g., face, car or chair), which is often unavailable in many real-world scenarios; 2) they fail to extract the geometry priors from single-view images due to the lack of multi-view supervision. In this paper, we propose a Geometry-enhanced NeRF (G-NeRF), which seeks to enhance the geometry priors by a geometry-guided multi-view synthesis approach, followed by depth-aware training. In the synthesis process, inspired by the fact that existing 3D GAN models can unconditionally synthesize high-fidelity multi-view images, we adopt off-the-shelf 3D GAN models, such as EG3D, as a free source of geometry priors by synthesizing multi-view data. Simultaneously, to further improve the geometry quality of the synthetic data, we introduce a truncation method to effectively sample latent codes within 3D GAN models. To tackle the absence of multi-view supervision for single-view images, we design a depth-aware training approach, incorporating a depth-aware discriminator to guide geometry priors through depth maps. Experiments demonstrate the effectiveness of our method in terms of both qualitative and quantitative results.
Submitted 11 April, 2024;
originally announced April 2024.
-
HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models
Authors:
Yifan Yang,
Dong Liu,
Shuhai Zhang,
Zeshuai Deng,
Zixiong Huang,
Mingkui Tan
Abstract:
Reconstructing a 3D clothed human involves creating a detailed geometry of individuals in clothing, with applications ranging from virtual try-on and movies to games. To enable practical and widespread applications, recent advances propose to generate a clothed human from an RGB image. However, they struggle to reconstruct detailed and robust avatars simultaneously. We empirically find that the high-frequency (HF) and low-frequency (LF) information from a parametric model has the potential to enhance geometry details and improve robustness to noise, respectively. Based on this, we propose HiLo, namely clothed human reconstruction with high- and low-frequency information, which contains two components. 1) To recover detailed geometry using HF information, we propose a progressive HF Signed Distance Function to enhance the detailed 3D geometry of a clothed human. Our analysis shows that this progressive learning scheme alleviates large gradients that hinder model convergence. 2) To achieve robust reconstruction against inaccurate estimation of the parametric model by using LF information, we propose a spatial interaction implicit function. This function effectively exploits the complementary spatial information from a low-resolution voxel grid of the parametric model. Experimental results demonstrate that HiLo outperforms the state-of-the-art methods by 10.43% and 9.54% in terms of Chamfer distance on the Thuman2.0 and CAPE datasets, respectively. Additionally, HiLo demonstrates robustness to noise from the parametric model, challenging poses, and various clothing styles.
Submitted 19 April, 2024; v1 submitted 7 April, 2024;
originally announced April 2024.
-
Local operator quench induced by two-dimensional inhomogeneous and homogeneous CFT Hamiltonians
Authors:
Weibo Mao,
Masahiro Nozaki,
Kotaro Tamaoka,
Mao Tian Tan
Abstract:
We explore non-equilibrium processes in two-dimensional conformal field theories (2d CFTs) due to the growth of operators induced by inhomogeneous and homogeneous Hamiltonians by investigating the time dependence of the partition function, energy density, and entanglement entropy. The non-equilibrium processes considered in this paper are constructed out of the Lorentzian and Euclidean time evolution governed by different Hamiltonians. We explore the effect of the time ordering on entanglement dynamics and find that in a free boson CFT and RCFTs, this time ordering does not affect the entanglement entropy, while in the holographic CFTs, it does. Our main finding is that in the holographic CFTs, the non-unitary time evolution induced by the inhomogeneous Hamiltonian can retain the initial state information longer than the unitary time evolution.
Submitted 2 April, 2024; v1 submitted 23 March, 2024;
originally announced March 2024.
-
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework
Authors:
Xiang Li,
Zhenyu Li,
Chen Shi,
Yong Xu,
Qing Du,
Mingkui Tan,
Jun Huang,
Wei Lin
Abstract:
The task of financial analysis primarily encompasses two key areas: stock trend prediction and the corresponding financial question answering. Currently, machine learning and deep learning algorithms (ML&DL) have been widely applied for stock trend predictions, leading to significant progress. However, these methods fail to provide reasons for predictions, lacking interpretability and reasoning processes. Also, they cannot integrate textual information such as financial news or reports. Meanwhile, large language models (LLMs) have remarkable textual understanding and generation ability. But due to the scarcity of financial training datasets and limited integration with real-time knowledge, LLMs still suffer from hallucinations and are unable to keep up with the latest information. To tackle these challenges, we first release the AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. These datasets have a positive impact on training LLMs for completing financial analysis. We then use the AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task, which integrates retrieval-augmented generation (RAG) techniques. Extensive experiments are conducted to demonstrate the effectiveness of our framework on financial analysis.
Submitted 19 March, 2024;
originally announced March 2024.
-
Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting
Authors:
Mingkui Tan,
Guohao Chen,
Jiaxiang Wu,
Yifan Zhang,
Yaofo Chen,
Peilin Zhao,
Shuaicheng Niu
Abstract:
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample. Although recent TTA has shown promising performance, we still face two key challenges: 1) prior methods perform backpropagation for each test sample, resulting in prohibitive optimization costs for many applications; 2) while existing TTA methods can significantly improve the test performance on out-of-distribution data, they often suffer from severe performance degradation on in-distribution data after TTA (known as forgetting). To this end, we have proposed an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples for test-time entropy minimization. To alleviate forgetting, EATA introduces a Fisher regularizer estimated from test samples to constrain important model parameters from drastic changes. However, in EATA, the adopted entropy loss consistently assigns higher confidence to predictions even for samples that are inherently uncertain, leading to overconfident predictions. To tackle this, we further propose EATA with Calibration (EATA-C) to separately exploit the reducible model uncertainty and the inherent data uncertainty for calibrated TTA. Specifically, we measure the model uncertainty by the divergence between predictions from the full network and its sub-networks, on which we propose a divergence loss to encourage consistent predictions instead of overconfident ones. To further recalibrate prediction confidence, we utilize the disagreement among predicted labels as an indicator of the data uncertainty, and then devise a min-max entropy regularizer to selectively increase and decrease prediction confidence for different samples. Experiments on image classification and semantic segmentation verify the effectiveness of our methods.
Submitted 18 March, 2024;
originally announced March 2024.
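The idea of selecting reliable samples for test-time entropy minimization, mentioned in the abstract above, can be illustrated with a minimal sketch. It assumes that "reliable" means the prediction entropy falls below a fixed threshold; the actual EATA criterion (including its non-redundancy test) is not specified in the abstract, and all function names and the threshold value here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_low_entropy(batch_logits, threshold):
    """Return indices of samples whose prediction entropy is below threshold.

    Assumption for this sketch: only these samples would be used for the
    entropy-minimization update; the rest are skipped, saving a backward pass.
    """
    return [
        i for i, logits in enumerate(batch_logits)
        if entropy(softmax(logits)) < threshold
    ]


if __name__ == "__main__":
    batch = [[10.0, 0.0, 0.0],   # confident prediction, low entropy
             [0.0, 0.0, 0.0]]    # uniform prediction, entropy = ln(3)
    print(select_low_entropy(batch, threshold=0.5))
```

Skipping high-entropy samples is one plausible way to reduce the per-sample backpropagation cost that the abstract lists as the first challenge.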
-
Qualitative analysis of a class of SIRS infectious disease models with nonlinear infection rate
Authors:
Mengqi Tan
Abstract:
The existence and local stability of the non-negative equilibrium points of a class of SIRS infectious disease models with non-linear infection and treatment rates are investigated under the condition that the total population is constant. The qualitative theory of differential equations is used to demonstrate that the endemic equilibrium point of the system is a stable equilibrium, an unstable equilibrium, or a degenerate equilibrium under different circumstances. Subsequently, the local stability of the non-negative equilibrium point of the system is analyzed. Finally, bifurcation theory is used to prove that the system undergoes a saddle-node bifurcation with the natural recovery growth rate as the bifurcation parameter, and the conditions for the existence of the saddle-node bifurcation are given.
Submitted 7 March, 2024;
originally announced March 2024.
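To make the model class above concrete, here is a minimal numerical sketch of a SIRS system with constant total population. The abstract does not give the specific infection or treatment terms, so the saturated incidence $βSI/(1+αI)$, the omission of a separate treatment term, the parameter names, and the forward-Euler integrator are all assumptions made for illustration.

```python
def sirs_step(s, i, r, beta, alpha, gamma, delta, dt):
    """One forward-Euler step of an illustrative SIRS model.

    Assumed equations (not taken from the paper):
        dS/dt = -beta*S*I/(1 + alpha*I) + delta*R
        dI/dt =  beta*S*I/(1 + alpha*I) - gamma*I
        dR/dt =  gamma*I - delta*R
    The right-hand sides sum to zero, so S + I + R stays constant.
    """
    infection = beta * s * i / (1.0 + alpha * i)
    ds = -infection + delta * r
    di = infection - gamma * i
    dr = gamma * i - delta * r
    return s + dt * ds, i + dt * di, r + dt * dr


def simulate(s0, i0, r0, beta, alpha, gamma, delta, dt=0.01, steps=10000):
    """Integrate the system for `steps` Euler steps and return the final state."""
    s, i, r = s0, i0, r0
    for _ in range(steps):
        s, i, r = sirs_step(s, i, r, beta, alpha, gamma, delta, dt)
    return s, i, r


if __name__ == "__main__":
    final = simulate(0.99, 0.01, 0.0, beta=0.5, alpha=1.0, gamma=0.1, delta=0.05)
    print(final, sum(final))
```

Equilibria of such a system are the states where all three right-hand sides vanish; the abstract's stability and bifurcation analysis concerns how those states change as a parameter varies.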
-
$π$ Phase Interlayer Shift and Stacking Fault in the Kagome Superconductor CsV$_3$Sb$_5$
Authors:
Feng Jin,
Wei Ren,
Mingshu Tan,
Mingtai Xie,
Bingru Lu,
Zheng Zhang,
Jianting Ji,
Qingming Zhang
Abstract:
The stacking degree of freedom is a crucial factor in tuning material properties and has been extensively investigated in layered materials. The kagome superconductor CsV$_3$Sb$_5$ was recently discovered to exhibit a three-dimensional CDW phase below $T_{\mathrm{CDW}} \sim 94$ K. Despite the thorough investigation of in-plane modulation, the out-of-plane modulation has remained ambiguous. Here, our polarization- and temperature-dependent Raman measurements reveal the breaking of C$_6$ rotational symmetry and the presence of three distinct domains oriented at approximately 120° to each other. The observations demonstrate that the CDW phase can be naturally explained as a 2c staggered order phase with adjacent layers exhibiting a relative $π$ phase shift. Further, we discover a first-order structural phase transition at approximately 65 K and suggest that it is a stacking order-disorder phase transition due to stacking fault, supported by the thermal hysteresis behavior of a Cs-related phonon mode. Our findings highlight the significance of the stacking degree of freedom in CsV$_3$Sb$_5$ and offer structural insights to comprehend the entanglement between superconductivity and CDW.
Submitted 7 March, 2024;
originally announced March 2024.
-
Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation
Authors:
Yaofo Chen,
Shuaicheng Niu,
Yaowei Wang,
Shoukai Xu,
Hengjie Song,
Mingkui Tan
Abstract:
The conventional deep learning paradigm often involves training a deep model on a server and then deploying the model or its distilled ones to resource-limited edge devices. Usually, the models shall remain fixed once deployed (at least for some period) due to the potential high cost of model adaptation for both the server and edge sides. However, in many real-world scenarios, the test environments may change dynamically (known as distribution shifts), which often results in degraded performance. Thus, one has to adapt the edge models promptly to attain promising performance. Moreover, with the increasing data collected at the edge, this paradigm also fails to further adapt the cloud model for better performance. To address these, we encounter two primary challenges: 1) the edge model has limited computation power and may only support forward propagation; 2) the data transmission budget between cloud and edge devices is limited in latency-sensitive scenarios. In this paper, we establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation and the edge models can be adapted online. In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud, i.e., dynamic unreliable and low-informative sample exclusion. Based on the uploaded samples, we update and distribute the affine parameters of normalization layers by distilling from the stronger foundation model to the edge model with a sample replay strategy. Extensive experimental results on ImageNet-C and ImageNet-R verify the effectiveness of our CEMA.
Submitted 6 June, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property
Authors:
Shiwen Ni,
Minghuan Tan,
Yuelin Bai,
Fuqiang Niu,
Min Yang,
Bowen Zhang,
Ruifeng Xu,
Xiaojun Chen,
Chengming Li,
Xiping Hu,
Ye Li,
Jianping Fan
Abstract:
Large language models (LLMs) have demonstrated impressive performance in various natural language processing (NLP) tasks. However, there is limited understanding of how well LLMs perform in specific domains (e.g., the intellectual property (IP) domain). In this paper, we contribute a new benchmark, the first Multilingual-oriented quiZ on Intellectual Property (MoZIP), for the evaluation of LLMs in the IP domain. The MoZIP benchmark includes three challenging tasks: IP multiple-choice quiz (IPQuiz), IP question answering (IPQA), and patent matching (PatentMatch). In addition, we also develop a new IP-oriented multilingual large language model (called MoZi), which is a BLOOMZ-based model that has been supervised fine-tuned with multilingual IP-related text data. We evaluate our proposed MoZi model and four well-known LLMs (i.e., BLOOMZ, BELLE, ChatGLM and ChatGPT) on the MoZIP benchmark. Experimental results demonstrate that MoZi outperforms BLOOMZ, BELLE and ChatGLM by a noticeable margin, while scoring lower than ChatGPT. Notably, the performance of current LLMs on the MoZIP benchmark has much room for improvement, and even the most powerful ChatGPT does not reach the passing level. Our source code, data, and models are available at https://github.com/AI-for-Science/MoZi.
Submitted 26 February, 2024;
originally announced February 2024.
-
Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy
Authors:
Shuhai Zhang,
Yiliao Song,
Jiahao Yang,
Yuanqing Li,
Bo Han,
Mingkui Tan
Abstract:
Large language models (LLMs) such as ChatGPT have exhibited remarkable performance in generating human-like texts. However, machine-generated texts (MGTs) may carry critical risks, such as plagiarism issues, misleading information, or hallucination issues. Therefore, it is urgent and important to detect MGTs in many situations. Unfortunately, it is challenging to distinguish MGTs from human-written texts because the distributional discrepancy between them is often very subtle due to the remarkable performance of LLMs. In this paper, we seek to exploit maximum mean discrepancy (MMD) to address this issue, since MMD can identify distributional discrepancies well. However, directly training a detector with MMD using diverse MGTs will incur a significantly increased variance of MMD since MGTs may contain multiple text populations due to various LLMs. This will severely impair MMD's ability to measure the difference between two samples. To tackle this, we propose a novel multi-population aware optimization method for MMD called MMD-MP, which can avoid variance increases and thus improve the stability of measuring the distributional discrepancy. Relying on MMD-MP, we develop two methods for paragraph-based and sentence-based detection, respectively. Extensive experiments on various LLMs, e.g., GPT2 and ChatGPT, show superior detection performance of our MMD-MP. The source code is available at https://github.com/ZSHsh98/MMD-MP.
Submitted 29 February, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
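As background for the abstract above, the plain maximum mean discrepancy between two samples can be estimated as in the following sketch. This is the standard biased estimator of squared MMD with a Gaussian kernel, not the paper's MMD-MP objective; treating texts as fixed-length feature vectors, the bandwidth value, and the function names are assumptions for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


if __name__ == "__main__":
    human = [[0.0], [1.0]]      # hypothetical 1-D features of human-written texts
    machine = [[5.0], [6.0]]    # hypothetical features of machine-generated texts
    print(mmd2_biased(human, human), mmd2_biased(human, machine))
```

A value near zero suggests the two samples come from similar distributions, while a larger value indicates a discrepancy; the abstract's concern is that mixing several machine-text populations inflates the variance of such an estimate.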
-
The AI Security Pyramid of Pain
Authors:
Chris M. Ward,
Josh Harguess,
Julia Tao,
Daniel Christman,
Paul Spicer,
Mike Tan
Abstract:
We introduce the AI Security Pyramid of Pain, a framework that adapts the cybersecurity Pyramid of Pain to categorize and prioritize AI-specific threats. This framework provides a structured approach to understanding and addressing various levels of AI threats. Starting at the base, the pyramid emphasizes Data Integrity, which is essential for the accuracy and reliability of datasets and AI models, including their weights and parameters. Ensuring data integrity is crucial, as it underpins the effectiveness of all AI-driven decisions and operations. The next level, AI System Performance, focuses on MLOps-driven metrics such as model drift, accuracy, and false positive rates. These metrics are crucial for detecting potential security breaches, allowing for early intervention and maintenance of AI system integrity. Advancing further, the pyramid addresses the threat posed by Adversarial Tools, identifying and neutralizing tools used by adversaries to target AI systems. This layer is key to staying ahead of evolving attack methodologies. At the Adversarial Input layer, the framework addresses the detection and mitigation of inputs designed to deceive or exploit AI models. This includes techniques like adversarial patterns and prompt injection attacks, which are increasingly used in sophisticated attacks on AI systems. Data Provenance is the next critical layer, ensuring the authenticity and lineage of data and models. This layer is pivotal in preventing the use of compromised or biased data in AI systems. At the apex is the tactics, techniques, and procedures (TTPs) layer, dealing with the most complex and challenging aspects of AI security. This involves a deep understanding and strategic approach to counter advanced AI-targeted attacks, requiring comprehensive knowledge and planning.
Submitted 16 February, 2024;
originally announced February 2024.
-
EcoVal: An Efficient Data Valuation Framework for Machine Learning
Authors:
Ayush K Tarun,
Vikram S Chundawat,
Murari Mandal,
Hong Ming Tan,
Bowei Chen,
Mohan Kankanhalli
Abstract:
Quantifying the value of data within a machine learning workflow can play a pivotal role in making more strategic decisions in machine learning initiatives. The existing Shapley value based frameworks for data valuation in machine learning are computationally expensive as they require a considerable amount of repeated training of the model to obtain the Shapley value. In this paper, we introduce an efficient data valuation framework, EcoVal, to estimate the value of data for machine learning models in a fast and practical manner. Instead of directly working with individual data samples, we determine the value of a cluster of similar data points. This value is further propagated amongst all the member cluster points. We show that the overall value of the data can be determined by estimating the intrinsic and extrinsic value of each data point. This is enabled by formulating the performance of a model as a production function, a concept which is popularly used to estimate the amount of output based on factors like labor and capital in a traditional free economic market. We provide a formal proof of our valuation technique and elucidate the principles and mechanisms that enable its accelerated performance. We demonstrate the real-world applicability of our method by showcasing its effectiveness for both in-distribution and out-of-sample data. This work addresses one of the core challenges of efficient data valuation at scale in machine learning models. The code is available at https://github.com/respai-lab/ecoval.
Submitted 9 July, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
High-cadence Timing of Binary Pulsars with CHIME
Authors:
Chia Min Tan,
Emmanuel Fonseca,
Kathryn Crowter,
Fengqiu Adam Dong,
Victoria M. Kaspi,
Kiyoshi W. Masui,
James W. McKee,
Bradley W. Meyers,
Scott M. Ransom,
Ingrid H. Stairs
Abstract:
We performed near-daily observations on the binary pulsars PSR J0218+4232, PSR J1518+4904 and PSR J2023+2853 with the Canadian Hydrogen Intensity Mapping Experiment (CHIME). For the first time, we detected the Shapiro time delay in all three pulsar-binary systems, using only 2--4 years of CHIME/Pulsar timing data. We measured the pulsar masses to be $1.49^{+0.23}_{-0.20}$ M$_\odot$, $1.470^{+0.030}_{-0.034}$ M$_\odot$ and $1.50^{+0.49}_{-0.38}$ M$_\odot$ respectively. The companion mass to PSR J0218+4232 was found to be $0.179^{+0.018}_{-0.016}$ M$_\odot$. We constrained the mass of the neutron-star companion of PSR J1518+4904 to be $1.248^{+0.035}_{-0.029}$ M$_\odot$, using the observed apsidal motion as a constraint on mass estimation. The binary companion to PSR J2023+2853 was found to have a mass of $0.93^{+0.17}_{-0.14}$ M$_\odot$; in the context of the near-circular orbit, this mass estimate suggests that the companion to PSR J2023+2853 is likely a high-mass white dwarf. By comparing the timing model obtained for PSR J0218+4232 with previous observations, we found a significant change in the observed orbital period of the system of $\dot{P_{\rm b}} = 0.14(2) \times 10^{-12}$; we determined that this variation arises from ``Shklovskii acceleration" due to relative motion of the binary system, and used this measurement to estimate a distance of $d=(6.7 \pm 1.0)$ kpc to PSR J0218+4232. This work demonstrates the capability of high-cadence observations, enabled by the CHIME/Pulsar system, to detect and refine general-relativistic effects of binary pulsars over short observing timescales.
Submitted 12 February, 2024;
originally announced February 2024.
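For context on the distance estimate in the abstract above, the relation usually used for the Shklovskii effect can be written out as a short sketch. It assumes the observed orbital-period derivative is dominated by the Shklovskii term, that is, it neglects Galactic acceleration and intrinsic contributions, which the abstract does not quantify. The total proper motion $\mu$ and the speed of light $c$ are symbols introduced here for illustration; the abstract gives no value for $\mu$ or $P_{\rm b}$, so no numbers are substituted.

```latex
% Shklovskii (apparent) orbital-period derivative for a binary at distance d
% moving with total proper motion \mu (an assumed symbol, not from the abstract):
\[
  \left(\frac{\dot{P}_{\rm b}}{P_{\rm b}}\right)_{\rm Shk} = \frac{\mu^{2} d}{c}
  \quad\Longrightarrow\quad
  d = \frac{c}{\mu^{2}}\,\frac{\dot{P}_{\rm b}}{P_{\rm b}}
\]
```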
-
Code Representation Learning At Scale
Authors:
Dejiao Zhang,
Wasi Ahmad,
Ming Tan,
Hantian Ding,
Ramesh Nallapati,
Dan Roth,
Xiaofei Ma,
Bing Xiang
Abstract:
Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, i.e., code generation. However, most of the existing works on code representation learning train models at a hundred million parameter scale using very limited pretraining corpora. In this work, we fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme. We first train the encoders via a mix that leverages both randomness in masking language modeling and the structural aspect of programming languages. We then enhance the representations via contrastive learning with hard negatives and hard positives constructed in an unsupervised manner. We establish an off-the-shelf encoder model that persistently outperforms the existing models on a wide variety of downstream tasks by large margins. To comprehend the factors contributing to successful code representation learning, we conduct detailed ablations and share our findings on (i) a customized and effective token-level denoising scheme for source code; (ii) the importance of hard negatives and hard positives; (iii) how the proposed bimodal contrastive learning boosts the cross-lingual semantic search performance; and (iv) how the pretraining schemes determine how downstream task performance scales with the model size.
Submitted 2 February, 2024;
originally announced February 2024.
-
Tight Bounds on the Message Complexity of Distributed Tree Verification
Authors:
Shay Kutten,
Peter Robinson,
Ming Ming Tan
Abstract:
We consider the message complexity of verifying whether a given subgraph of the communication network forms a tree with specific properties both in the KT-$ρ$ (nodes know their $ρ$-hop neighborhood, including node IDs) and the KT-$0$ (nodes do not have this knowledge) models. We develop a rather general framework that helps in establishing tight lower bounds for various tree verification problems. We also consider two different verification requirements: namely that every node detects in the case the input is incorrect, as well as the requirement that at least one node detects. The results are stronger than previous ones in the sense that we assume that each node knows the number $n$ of nodes in the graph (in some cases) or an $α$ approximation of $n$ (in other cases). For spanning tree verification, we show that the message complexity inherently depends on the quality of the given approximation of $n$: We show a tight lower bound of $Ω(n^2)$ for the case $α\ge \sqrt{2}$ and a much better upper bound (i.e., $O(n \log n)$) when nodes are given a tighter approximation. On the other hand, our framework also yields an $Ω(n^2)$ lower bound on the message complexity of verifying a minimum spanning tree (MST), which reveals a polynomial separation between ST verification and MST verification. This result holds for randomized algorithms with perfect knowledge of the network size, and even when just one node detects illegal inputs, thus improving over the work of Kor, Korman, and Peleg (2013). For verifying a $d$-approximate BFS tree, we show that the same lower bound holds even if nodes know $n$ exactly, however, the lower bound is sensitive to $d$, which is the stretch parameter.
Submitted 22 January, 2024;
originally announced January 2024.
-
ENN's Roadmap for Proton-Boron Fusion Based on Spherical Torus
Authors:
Min-sheng Liu,
Hua-sheng Xie,
Yu-min Wang,
Jia-qi Dong,
Kai-ming Feng,
Xiang Gu,
Xian-li Huang,
Xin-chen Jiang,
Ying-ying Li,
Zhi Li,
Bing Liu,
Wen-jun Liu,
Di Luo,
Yueng-Kay Martin Peng,
Yue-jiang Shi,
Shao-dong Song,
Xian-ming Song,
Tian-tian Sun,
Mu-zhi Tan,
Xue-yun Wang,
Yuan-ming Yang,
Gang Yin,
Han-yue Zhao,
ENN fusion team
Abstract:
ENN Science and Technology Development Co., Ltd. (ENN) is committed to generating fusion energy in an environmentally friendly and cost-effective manner, which requires abundant aneutronic fuel. Proton-boron ( p-$^{11}$B or p-B) fusion is considered an ideal choice for this purpose. Recent studies have suggested that p-B fusion, although challenging, is feasible based on new cross-section data, provided that a hot ion mode and high wall reflection can be achieved to reduce electron radiation loss. The high beta and good confinement of the spherical torus (ST) make it an ideal candidate for p-B fusion. By utilizing the new spherical torus energy confinement scaling law, a reactor with a major radius $R_0=4$ m, central magnetic field $B_0=6$ T, central temperature $T_{i0}=150$ keV, plasma current $I_p=30$ MA, and hot ion mode $T_i/T_e=4$ can yield p-B fusion with $Q>10$. A roadmap for p-B fusion has been developed, with the next-generation device named EHL-2. EHL stands for ENN He-Long, which literally means ``peaceful Chinese Loong". The main target parameters include $R_0\simeq1.05$ m, $A\simeq1.85$, $B_0\simeq3$ T, $T_{i0}\simeq30$ keV, $I_p\simeq3$ MA, and $T_i/T_e\geq2$. The existing ST device EXL-50 was simultaneously upgraded to provide experimental support for the new roadmap, involving the installation and upgrading of the central solenoid, vacuum chamber, and magnetic systems. The construction of the upgraded ST fusion device, EXL-50U, was completed at the end of 2023, and it achieved its first plasma in January 2024. The construction of EHL-2 is estimated to be completed by 2026.
Submitted 10 June, 2024; v1 submitted 20 January, 2024;
originally announced January 2024.
-
Photonic RF Channelization Based on Microcombs
Authors:
Weiwei Han,
Zhihui Liu,
Mengxi Tan,
Chaoran Huang,
Jiayang Wu,
Kun Xu,
David J. Moss,
Xingyuan Xu
Abstract:
In recent decades, microwave photonic channelization techniques have developed significantly. Characterized by low loss, high versatility, large instantaneous bandwidth, and immunity to electromagnetic interference, microwave photonic channelization addresses the requirements of modern radar and electronic warfare for receivers. Microresonator-based optical frequency combs are promising devices for photonic channelized receivers, enabling full advantage of multicarriers, large bandwidths, and accelerating the integration process of microwave photonic channelized receivers. In this paper, we review the research progress and trends in microwave photonic channelization, focusing on schemes that utilize integrated microcombs. We discuss the potential of microcomb-based RF channelization, as well as their challenges and limitations, and provide perspectives for their future development in the context of on-chip silicon-based photonics.
Submitted 17 January, 2024;
originally announced January 2024.
-
Decoupled Prototype Learning for Reliable Test-Time Adaptation
Authors:
Guowei Wang,
Changxing Ding,
Wentao Tan,
Mingkui Tan
Abstract:
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference. One popular approach involves fine-tuning the model with cross-entropy loss according to estimated pseudo-labels. However, its performance is significantly affected by noisy pseudo-labels. This study reveals that minimizing the classification error of each sample causes the cross-entropy loss's vulnerability to label noise. To address this issue, we propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation. First, we decouple the optimization of class prototypes. For each class prototype, we reduce its distance to positive samples and enlarge its distance from negative samples in a contrastive manner. This strategy prevents the model from overfitting to noisy pseudo-labels. Second, we propose a memory-based strategy to enhance DPL's robustness for the small batch sizes often encountered in TTA. We update each class's pseudo-feature from a memory in a momentum manner and insert an additional DPL loss. Finally, we introduce a consistency regularization-based approach to leverage samples with unconfident pseudo-labels. This approach transfers feature styles of samples with unconfident pseudo-labels to those with confident pseudo-labels. Thus, more reliable samples for TTA are created. The experimental results demonstrate that our methods achieve state-of-the-art performance on domain generalization benchmarks, and reliably improve the performance of self-training-based methods on image corruption benchmarks. The code will be released.
Submitted 25 January, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
-
An Efficient Instance Segmentation Framework Based on Oriented Bounding Boxes
Authors:
Zhen Zhou,
Junfeng Fan,
Yunkai Ma,
Sihan Zhao,
Fengshui Jing,
Min Tan
Abstract:
Instance segmentation for completely occluded objects and dense objects in robot vision measurement are two challenging tasks. To uniformly deal with them, this paper proposes a unified coarse-to-fine instance segmentation framework, CFNet, which uses box prompt-based segmentation foundation models (BSMs), e.g., Segment Anything Model. Specifically, CFNet first detects oriented bounding boxes (OBBs) to distinguish instances and provide coarse localization information. Then, it predicts OBB prompt-related masks for fine segmentation. CFNet performs instance segmentation with OBBs that only contain partial object boundaries on occluders to predict occluded object instances, which overcomes the difficulty of existing amodal instance segmentation methods in directly predicting occluded objects. In addition, since OBBs only serve as prompts, CFNet alleviates the over-dependence on bounding box detection performance of current instance segmentation methods using OBBs for dense objects. Moreover, to enable BSMs to handle OBB prompts, we propose a novel OBB prompt encoder. To make CFNet more lightweight, we perform knowledge distillation on it and introduce a Gaussian label smoothing method for teacher model outputs. Experiments demonstrate that CFNet outperforms current instance segmentation methods on both industrial and public datasets. The code is available at https://github.com/zhen6618/OBBInstanceSegmentation.
Submitted 1 July, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
BET: Explaining Deep Reinforcement Learning through The Error-Prone Decisions
Authors:
Xiao Liu,
Jie Zhao,
Wubing Chen,
Mao Tan,
Yongxing Su
Abstract:
Despite the impressive capabilities of Deep Reinforcement Learning (DRL) agents in many challenging scenarios, their black-box decision-making process significantly limits their deployment in safety-sensitive domains. Several previous self-interpretable works focus on revealing the critical states of the agent's decision. However, they cannot pinpoint the error-prone states. To address this issue, we propose a novel self-interpretable structure, named Backbone Extract Tree (BET), to better explain the agent's behavior by identifying the error-prone states. At a high level, BET hypothesizes that states in which the agent consistently executes uniform decisions exhibit a reduced propensity for errors. To effectively model this phenomenon, BET expresses these states within neighborhoods, each defined by a curated set of representative states. Therefore, states positioned at a greater distance from these representative benchmarks are more prone to error. We evaluate BET in various popular RL environments and show its superiority over existing self-interpretable models in terms of explanation fidelity. Furthermore, we demonstrate a use case for providing explanations for the agents in StarCraft II, a sophisticated multi-agent cooperative game. To the best of our knowledge, we are the first to explain such a complex scenario using a fully transparent structure.
Submitted 14 January, 2024;
originally announced January 2024.
-
Photonic real time video image signal processor at 17Tb/s based on a Kerr microcomb
Authors:
Mengxi Tan,
Xingyuan Xu,
Andreas Boes,
Bill Corcoran,
Thach G. Nguyen,
Sai T. Chu,
Brent E. Little,
Roberto Morandotti,
Jiayang Wu,
Arnan Mitchell,
David J. Moss
Abstract:
Signal processing has become central to many fields, from coherent optical telecommunications, where it is used to compensate signal impairments, to video image processing. Image processing is particularly important for observational astronomy, medical diagnosis, autonomous driving, big data and artificial intelligence. For these applications, signal processing traditionally has mainly been performed electronically. However, these, as well as new applications, particularly those involving real time video image processing, are creating unprecedented demand for ultrahigh performance, including high bandwidth and reduced energy consumption. Here, we demonstrate a photonic signal processor operating at 17 Terabits/s and use it to process video image signals in real-time. The system processes 400,000 video signals concurrently, performing 34 functions simultaneously that are key to object edge detection, edge enhancement and motion blur. As compared with spatial-light devices used for image processing, our system is not only ultra-high speed but highly reconfigurable and programmable, able to perform many different functions without any change to the physical hardware. Our approach is based on an integrated Kerr soliton crystal microcomb, and opens up new avenues for ultrafast robotic vision and machine learning.
Submitted 13 January, 2024;
originally announced January 2024.
-
RomniStereo: Recurrent Omnidirectional Stereo Matching
Authors:
Hualie Jiang,
Rui Xu,
Minglang Tan,
Wenjie Jiang
Abstract:
Omnidirectional stereo matching (OSM) is an essential and reliable means for $360^{\circ}$ depth sensing. However, following earlier works on conventional stereo matching, prior state-of-the-art (SOTA) methods rely on a 3D encoder-decoder block to regularize the cost volume, making the whole system complicated and yielding sub-optimal results. Recently, the Recurrent All-pairs Field Transforms (RAFT) based approach employs the recurrent update in 2D and has efficiently improved image-matching tasks, i.e., optical flow and stereo matching. To bridge the gap between OSM and RAFT, we mainly propose an opposite adaptive weighting scheme to seamlessly transform the outputs of spherical sweeping of OSM into the required inputs for the recurrent update, thus creating a recurrent omnidirectional stereo matching (RomniStereo) algorithm. Furthermore, we introduce two techniques, i.e., grid embedding and adaptive context feature generation, which also contribute to RomniStereo's performance. Our best model improves the average MAE metric by 40.7% over the previous SOTA baseline across five datasets. When visualizing the results, our models demonstrate clear advantages on both synthetic and realistic examples. The code is available at https://github.com/HalleyJiang/RomniStereo.
Submitted 25 January, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
UGGNet: Bridging U-Net and VGG for Advanced Breast Cancer Diagnosis
Authors:
Tran Cao Minh,
Nguyen Kim Quoc,
Phan Cong Vinh,
Dang Nhu Phu,
Vuong Xuan Chi,
Ha Minh Tan
Abstract:
In the field of medical imaging, breast ultrasound has emerged as a crucial diagnostic tool for early detection of breast cancer. However, the accuracy of diagnosing the location of the affected area and the extent of the disease depends on the experience of the physician. In this paper, we propose a novel model called UGGNet, combining the power of the U-Net and VGG architectures to enhance the performance of breast ultrasound image analysis. The U-Net component of the model helps accurately segment the lesions, while the VGG component utilizes deep convolutional layers to extract features. The fusion of these two architectures in UGGNet aims to optimize both segmentation and feature representation, providing a comprehensive solution for accurate diagnosis in breast ultrasound images. Experimental results have demonstrated that the UGGNet model achieves a notable accuracy of 78.2% on the "Breast Ultrasound Images Dataset."
Submitted 6 January, 2024;
originally announced January 2024.
-
The Green Bank North Celestial Cap Survey IX: Timing Follow-up for 128 Pulsars
Authors:
A. E. McEwen,
J. K. Swiggum,
D. L. Kaplan,
C. M. Tan,
B. W. Meyers,
E. Fonseca,
G. Y. Agazie,
P. Chawla,
K. Crowter,
M. E. DeCesar,
T. Dolch,
F. A. Dong,
W. Fiore,
E. Fonseca,
D. C. Good,
A. G. Istrate,
V. M. Kaspi,
V. I. Kondratiev,
J. van Leeuwen,
L. Levin,
E. F. Lewis,
R. S. Lynch,
K. W. Masui,
J. W. McKee,
M. A. McLaughlin
, et al. (6 additional authors not shown)
Abstract:
The Green Bank North Celestial Cap survey is one of the largest and most sensitive searches for pulsars and transient radio objects. Observations for the survey have finished; priorities have shifted toward long-term monitoring of its discoveries. In this study, we have developed a pipeline to handle large datasets of archival observations and connect them to recent, high-cadence observations taken using the Canadian Hydrogen Intensity Mapping Experiment (CHIME) telescope. This pipeline handles data for 128 pulsars and has produced measurements of spin, positional, and orbital parameters that connect data over observation gaps as large as 2000 days. We have also measured glitches in the timing residuals for five of the pulsars included and proper motion for 19 sources (13 new). We include updates to orbital parameters for 19 pulsars, including 9 previously unpublished binaries. For two of these binaries, we provide updated measurements of post-Keplerian binary parameters, which result in much more precise estimates of the total masses of both systems. For PSR J0509+3801, the much improved measurement of the Einstein delay yields much improved mass measurements for the pulsar and its companion, 1.399(6) M$_\odot$ and 1.412(6) M$_\odot$, respectively. For this system, we have also obtained a measurement of the orbital decay due to the emission of gravitational waves: $\dot{P}_{\rm B} = -1.37(7)\times10^{-12}$, which is in agreement with the rate predicted by general relativity for these masses.
Submitted 12 December, 2023;
originally announced December 2023.
-
DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning
Authors:
Kunyang Lin,
Yufeng Wang,
Peihao Chen,
Runhao Zeng,
Siyuan Zhou,
Mingkui Tan,
Chuang Gan
Abstract:
Learning optimal behavior policy for each agent in multi-agent systems is an essential yet difficult problem. Despite fruitful progress in multi-agent reinforcement learning, the challenge of addressing the dynamics of whether two agents should exhibit consistent behaviors is still under-explored. In this paper, we propose a new approach that enables agents to learn whether their behaviors should be consistent with that of other agents by utilizing intrinsic rewards to learn the optimal policy for each agent. We begin by defining behavior consistency as the divergence in output actions between two agents when provided with the same observation. Subsequently, we introduce dynamic consistency intrinsic reward (DCIR) to stimulate agents to be aware of others' behaviors and determine whether to be consistent with them. Lastly, we devise a dynamic scale network (DSN) that provides learnable scale factors for the agent at every time step to dynamically ascertain whether to award consistent behavior and the magnitude of rewards. We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement, demonstrating its efficacy.
Submitted 10 December, 2023;
originally announced December 2023.
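The behavior-consistency definition in the DCIR abstract above (the divergence in output actions between two agents given the same observation) can be illustrated with a minimal sketch. The abstract does not say which divergence is used, so the KL divergence here is an assumption, and the function names and toy policies are hypothetical and not taken from the paper.

```python
import math

def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same action set."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def behavior_consistency(policy_i, policy_j, observation):
    """Divergence between two agents' action distributions on ONE shared
    observation. Lower values mean more consistent behavior.
    Assumption: KL divergence stands in for the paper's unspecified choice."""
    p = softmax(policy_i(observation))
    q = softmax(policy_j(observation))
    return kl_divergence(p, q)

# Hypothetical toy policies: each maps an observation to action logits.
agent_a = lambda obs: [obs[0], obs[1], 0.0]
agent_b = lambda obs: [obs[1], obs[0], 0.0]

same = behavior_consistency(agent_a, agent_a, [1.0, 2.0])  # identical policies
diff = behavior_consistency(agent_a, agent_b, [1.0, 2.0])  # differing policies
```

An intrinsic reward built on such a quantity could reward or penalize the divergence depending on a learned scale factor, which is the role the abstract assigns to the dynamic scale network; that part is not sketched here.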
-
PSR J0210+5845; An ultra wide binary pulsar with a B6V main-sequence star companion
Authors:
E. van der Wateren,
C. G. Bassa,
G. H. Janssen,
I. V. Yanes-Rizo,
J. Casares,
G. Nelemans,
B. W. Stappers,
C. M. Tan
Abstract:
We report on radio timing observations of PSR J0210+5845 which reveal large deviations from typical pulsar spin-down behaviour. We interpret these deviations as being due to binary motion around the $V=13.5$ star 2MASS J02105640$+$5845176, which is coincident in celestial position and distance with the pulsar. Archival observations and new optical spectroscopy identify this star as a B6V star with a temperature of $T_\mathrm{eff}\approx 14\,000$K and a mass of $M_\mathrm{c}= 3.5$ to $3.8$M$_\odot$, making it the lowest mass main-sequence star known orbiting a non-recycled pulsar. We found that the timing observations constrain the binary orbit to be wide and moderately eccentric, with an orbital period of $P_\mathrm{b}=47^{+40}_{-14}$yr and eccentricity $e=0.46^{+0.10}_{-0.07}$. We predict that the next periastron passage will occur between 2030 and 2034. Due to the low companion mass, we find that the probability for a system with the properties of PSR J0210+5845 and its binary companion to survive the supernova is low. We show that a low velocity and fortuitously directed natal kick is required for the binary to remain bound during the supernova explosion, and argue that an electron-capture supernova is a plausible formation scenario for the pulsar.
Submitted 4 December, 2023;
originally announced December 2023.
-
Topological 5d $\mathcal {N} = 2$ Gauge Theory: Novel Floer Homologies, their Dualities, and an $A_\infty$-category of Three-Manifolds
Authors:
Arif Er,
Zhi-Cong Ong,
Meng-Chwan Tan
Abstract:
We show how one can define novel gauge-theoretic Floer homologies of four, three and two-manifolds from the physics of a certain topologically-twisted 5d ${\cal N}=2$ gauge theory via its supersymmetric quantum mechanics interpretation. They are associated with Vafa-Witten, Hitchin and $G_{\mathbb{C}}$-BF configurations on the four, three and two-manifolds, respectively. We also show how one can define novel symplectic Floer homologies of Hitchin spaces, which in turn will allow us to derive novel Atiyah-Floer correspondences that relate our gauge-theoretic Floer homologies to symplectic intersection Floer homologies of Higgs bundles. Furthermore, topological invariance and 5d "S-duality" suggest a web of relations and a Langlands duality amongst these novel Floer homologies and their loop/toroidal group generalizations. Last but not least, via a 2d gauged Landau-Ginzburg model interpretation of the 5d theory, we derive, from the soliton string theory that it defines and the 5d partition function, a Fukaya-Seidel type $A_\infty$-category of Hitchin configurations on three-manifolds and its novel Atiyah-Floer correspondence. Our work therefore furnishes purely physical proofs and generalizations of the mathematical conjectures of Haydys [1], Abouzaid-Manolescu [2], and Bousseau [3], and more.
Submitted 30 May, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner
Authors:
Lizhao Liu,
Xinyu Sun,
Tianhang Xiang,
Zhuangwei Zhuang,
Liuren Yin,
Mingkui Tan
Abstract:
We study the task of extending a large language model (LLM) into a vision-language instruction-following model. This task is crucial but challenging since the LLM is trained on the text modality only, making it hard to effectively digest the visual modality. To address this, existing methods typically train a visual adapter to align the representation between a pre-trained vision transformer (ViT) and the LLM by a generative image captioning loss. However, we find that the generative objective can only produce weak alignment for vision and language, making the aligned vision-language model very hungry for instruction fine-tuning data. In this paper, we propose CG-VLM, which applies both Contrastive and Generative alignment objectives to effectively align the representation of ViT and LLM. Unlike the image-level and sentence-level alignment in common contrastive learning settings, CG-VLM aligns the image-patch level features and text-token level embeddings, which, however, is very hard to achieve as no explicit grounding patch-token relation is provided in standard image captioning datasets. To address this issue, we propose to maximize the averaged similarity between pooled image-patch features and text-token embeddings. Extensive experiments demonstrate that the proposed CG-VLM produces strong vision-language alignment and is an efficient instruction learner. For example, using only 10% of the instruction tuning data, we reach 95% of the performance of the state-of-the-art method LLaVA [29] on the zero-shot ScienceQA-Image benchmark.
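As a rough illustration of the idea of maximizing an averaged patch-token similarity under a contrastive objective, the following is a minimal NumPy sketch. It is not the authors' implementation: the function names, the choice of a plain mean over all patch-token cosine similarities, the symmetric InfoNCE-style loss, and the temperature value are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def averaged_similarity(patches, tokens):
    """Mean patch-token cosine similarity for every image-caption pair.

    patches: (B, P, D) image-patch features (hypothetical adapter output)
    tokens:  (B, T, D) text-token embeddings
    returns: (B, B) matrix; entry [i, j] scores image i against caption j
    """
    p = l2_normalize(patches)
    t = l2_normalize(tokens)
    # sim[i, j, a, b] = cosine(patch a of image i, token b of caption j)
    sim = np.einsum("ipd,jtd->ijpt", p, t)
    # Assumption: a plain average over all patch-token pairs.
    return sim.mean(axis=(2, 3))


def contrastive_loss(patches, tokens, temperature=0.07):
    """Symmetric cross-entropy over the averaged-similarity matrix.

    Matched image-caption pairs sit on the diagonal and act as positives;
    all other pairs in the batch act as negatives.
    """
    logits = averaged_similarity(patches, tokens) / temperature

    def cross_entropy_diag(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image_patches = rng.normal(size=(4, 16, 32))  # 4 images, 16 patches each
    text_tokens = rng.normal(size=(4, 10, 32))    # 4 captions, 10 tokens each
    print(contrastive_loss(image_patches, text_tokens))
```

Averaging over patch-token pairs sidesteps the missing patch-token grounding mentioned in the abstract, since only the image-caption pairing is needed as supervision in this sketch.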
Submitted 28 November, 2023;
originally announced November 2023.
-
RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
Authors:
Yaran Chen,
Wenbo Cui,
Yuanwen Chen,
Mining Tan,
Xinyao Zhang,
Dongbin Zhao,
He Wang
Abstract:
Robotic agents must master common sense and long-term sequential decisions to solve daily tasks through natural language instruction. The developments in Large Language Models (LLMs) in natural language processing have inspired efforts to use LLMs in complex robot planning. Despite LLMs' great generalization and comprehension of instruction tasks, LLM-generated task plans sometimes lack feasibility and correctness. To address the problem, we propose a RoboGPT agent (our code and dataset will be released soon) for making embodied long-term decisions for daily tasks, with two modules: 1) LLM-based planning with re-plan to break the task into multiple sub-goals; 2) RoboSkill, individually designed for sub-goals to learn better navigation and manipulation skills. The LLM-based planning is enhanced with a new robotic dataset and re-plan, called RoboGPT. The new robotic dataset of 67k daily instruction tasks is gathered for fine-tuning the Llama model and obtaining RoboGPT. The RoboGPT planner, with strong generalization, can plan hundreds of daily instruction tasks. Additionally, a low-computational Re-Plan module is designed to allow plans to flexibly adapt to the environment, thereby addressing the nomenclature diversity challenge. The proposed RoboGPT agent outperforms SOTA methods on the ALFRED daily tasks. Moreover, the RoboGPT planner exceeds SOTA LLM-based planners like ChatGPT in task-planning rationality for hundreds of unseen daily tasks, and even tasks from other domains, while keeping the large model's original broad application and generality.
Submitted 30 June, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.