Exploring Camera Encoder Designs for Autonomous Driving Perception

Authors: Barath Lakshmanan, Joshua Chen, Shiyi Lan, Maying Shen, Zhiding Yu, Jose M. Alvarez

Abstract: The cornerstone of autonomous vehicles (AV) is a solid perception system, where camera encoders play a crucial role. Existing works usually leverage pre-trained Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) designed for general vision tasks, such as image classification, segmentation, and 2D detection. Although those well-known architectures have achieved state-of-the-art accuracy in AV-related tasks, e.g., 3D Object Detection, there remains significant potential for improvement in network design due to the nuanced complexities of industrial-level AV datasets. Moreover, existing public AV benchmarks usually contain insufficient data, which might lead to inaccurate evaluation of those architectures. To reveal the AV-specific model insights, we start from a standard general-purpose encoder, ConvNeXt, and progressively transform the design. We adjust different design parameters including width and depth of the model, stage compute ratio, attention mechanisms, and input resolution, supported by systematic analysis of each modification. This customization yields an architecture optimized for AV camera encoder achieving 8.79% mAP improvement over the baseline. We believe our effort could become a sweet cookbook of image encoders for AV and pave the way to the next-level drive system.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2406.12079 [pdf, other]

Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint

Authors: Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose Alvarez

Abstract: As we push the boundaries of performance in various vision tasks, the models grow in size correspondingly. To keep up with this growth, we need very aggressive pruning techniques for efficient inference and deployment on edge devices. Existing pruning approaches are limited to channel pruning and struggle with aggressive parameter reductions. In this paper, we propose a novel multi-dimensional pruning framework that jointly optimizes pruning across channels, layers, and blocks while adhering to latency constraints. We develop a latency modeling technique that accurately captures model-wide latency variations during pruning, which is crucial for achieving optimal latency-accuracy trade-offs at high pruning ratios. We reformulate pruning as a Mixed-Integer Nonlinear Program (MINLP) to efficiently determine the optimal pruned structure with only a single pass. Our extensive results demonstrate substantial improvements over previous methods, particularly at large pruning ratios. In classification, our method significantly outperforms prior art HALP with a Top-1 accuracy of 70.0 (vs. 68.6) and an FPS of 5262 im/s (vs. 4101 im/s). In 3D object detection, we establish a new state-of-the-art by pruning StreamPETR at a 45% pruning ratio, achieving higher FPS (37.3 vs. 31.7) and mAP (0.451 vs. 0.449) than the dense baseline.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Under Review

arXiv:2406.06978 [pdf, other]

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

Authors: Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez

Abstract: We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model. This approach uses knowledge distillation from both human and rule-based teachers to train the student model, which features a multi-head decoder to learn diverse trajectory candidates tailored to various evaluation metrics. With the knowledge of rule-based teachers, Hydra-MDP learns how the environment influences the planning in an end-to-end manner instead of resorting to non-differentiable post-processing. This method achieves 1st place in the Navsim challenge, demonstrating significant improvements in generalization across diverse driving environments and conditions. Code will be available at https://github.com/NVlabs/Hydra-MDP.

Submitted 19 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: The 1st place solution of End-to-end Driving at Scale at the CVPR 2024 Autonomous Grand Challenge

arXiv:2405.01533 [pdf, other]

OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

Authors: Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez

Abstract: The advances in multimodal large language models (MLLMs) have led to growing interest in LLM-based autonomous driving agents to leverage their strong reasoning capabilities. However, capitalizing on MLLMs' strong reasoning capabilities for improved planning behavior is challenging since planning requires full 3D situational awareness beyond 2D reasoning. To address this challenge, our work proposes a holistic framework for strong alignment between agent models and 3D driving tasks. Our framework starts with a novel 3D MLLM architecture that uses sparse queries to lift and compress visual representations into 3D before feeding them into an LLM. This query-based representation allows us to jointly encode dynamic objects and static map elements (e.g., traffic lanes), providing a condensed world model for perception-action alignment in 3D. We further propose OmniDrive-nuScenes, a new visual question-answering dataset challenging the true 3D situational awareness of a model with comprehensive visual question-answering (VQA) tasks, including scene description, traffic regulation, 3D grounding, counterfactual reasoning, decision making and planning. Extensive studies show the effectiveness of the proposed architecture as well as the importance of the VQA tasks for reasoning and planning in complex 3D scenes.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.01990 [pdf, other]

What is Point Supervision Worth in Video Instance Segmentation?

Authors: Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez, Abhinav Shrivastava, Anima Anandkumar

Abstract: Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos. Conventional VIS methods rely on densely-annotated object masks which are expensive. We reduce the human annotations to only one point for each object in a video frame during training, and obtain high-quality mask predictions close to fully supervised models. Our proposed training method consists of a class-agnostic proposal generation module to provide rich negative samples and a spatio-temporal point-based matcher to match the object queries with the provided point annotations. Comprehensive experiments on three VIS benchmarks demonstrate competitive performance of the proposed framework, nearly matching fully supervised methods.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2402.12177 [pdf, ps, other]

Mafin: Enhancing Black-Box Embeddings with Model Augmented Fine-Tuning

Authors: Mingtian Zhang, Shawn Lan, Peter Hayes, David Barber

Abstract: Retrieval Augmented Generation (RAG) has emerged as an effective solution for mitigating hallucinations in Large Language Models (LLMs). The retrieval stage in RAG typically involves a pre-trained embedding model, which converts queries and passages into vectors to capture their semantics. However, a standard pre-trained embedding model may exhibit sub-optimal performance when applied to specific domain knowledge, necessitating fine-tuning. This paper addresses scenarios where the embeddings are only available from a black-box model. We introduce Model augmented fine-tuning (Mafin) -- a novel approach for fine-tuning a black-box embedding model by augmenting it with a trainable embedding model. Our results demonstrate that Mafin significantly enhances the performance of the black-box embeddings by only requiring the training of a small augmented model. We validate the effectiveness of our method on both labeled and unlabeled datasets, illustrating its broad applicability and efficiency.

Submitted 12 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.00892 [pdf, other]

EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks

Authors: Shijia Liao, Shiyi Lan, Arun George Zachariah

Abstract: The advent of Large Models marks a new era in machine learning, significantly outperforming smaller models by leveraging vast datasets to capture and synthesize complex patterns. Despite these advancements, the exploration into scaling, especially in the audio generation domain, remains limited, with previous efforts not extending into the high-fidelity (HiFi) 44.1kHz domain and suffering from both spectral discontinuities and blurriness in the high-frequency domain, alongside a lack of robustness against out-of-domain data. These limitations restrict the applicability of models to diverse use cases, including music and singing generation. Our work introduces Enhanced Various Audio Generation via Scalable Generative Adversarial Networks (EVA-GAN), which yields significant improvements over previous state-of-the-art in spectral and high-frequency reconstruction and robustness in out-of-domain data performance, enabling the generation of HiFi audio by employing an extensive dataset of 36,000 hours of 44.1kHz audio, a context-aware module, and a Human-In-The-Loop artifact measurement toolkit, and by expanding the model to approximately 200 million parameters. Demonstrations of our work are available at https://double-blind-eva-gan.cc.

Submitted 30 January, 2024; originally announced February 2024.

arXiv:2401.03844 [pdf, other]

Fully Attentional Networks with Self-emerging Token Labeling

Authors: Bingyin Zhao, Zhiding Yu, Shiyi Lan, Yutao Cheng, Anima Anandkumar, Yingjie Lao, Jose M. Alvarez

Abstract: Recent studies indicate that Vision Transformers (ViTs) are robust against out-of-distribution scenarios. In particular, the Fully Attentional Network (FAN), a family of ViT backbones, has achieved state-of-the-art robustness. In this paper, we revisit the FAN models and improve their pre-training with a self-emerging token labeling (STL) framework. Our method contains a two-stage training framework. Specifically, we first train a FAN token labeler (FAN-TL) to generate semantically meaningful patch token labels, followed by a FAN student model training stage that uses both the token labels and the original class label. With the proposed STL framework, our best model based on FAN-L-Hybrid (77.3M parameters) achieves 84.8% Top-1 accuracy and 42.1% mCE on ImageNet-1K and ImageNet-C, and sets a new state-of-the-art for ImageNet-A (46.1%) and ImageNet-R (56.6%) without using extra data, outperforming the original FAN counterpart by significant margins. The proposed framework also demonstrates significantly enhanced performance on downstream tasks such as semantic segmentation, with up to 1.7% improvement in robustness over the counterpart model. Code is available at https://github.com/NVlabs/STL.

Submitted 8 January, 2024; originally announced January 2024.

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 5585-5595

arXiv:2312.13764 [pdf, other]

A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties

Authors: Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Alan Yuille, Yuyin Zhou, Cihang Xie

Abstract: This paper introduces ProLab, a novel approach using property-level label space for creating strong interpretable segmentation models. Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models. It is based on two core designs. First, we employ Large Language Models (LLMs) and carefully crafted prompts to generate descriptions of all involved categories that carry meaningful common sense knowledge and follow a structured format. Second, we introduce a description embedding model preserving semantic correlation across descriptions and then cluster them into a set of descriptive properties (e.g., 256) using K-Means. These properties are based on interpretable common sense knowledge consistent with theories of human recognition. We empirically show that our approach makes segmentation models perform stronger on five classic benchmarks (e.g., ADE20K, COCO-Stuff, Pascal Context, Cityscapes, and BDD). Our method also shows better scalability with extended training steps than category-level supervision. Our interpretable segmentation framework also emerges with the generalization ability to segment out-of-domain or unknown categories using only in-domain descriptive properties. Code is available at https://github.com/lambert-x/ProLab.

Submitted 21 December, 2023; originally announced December 2023.

Comments: Preprint. Code is available at https://github.com/lambert-x/ProLab

arXiv:2312.11799 [pdf, other]

Scaling Up Bayesian Neural Networks with Neural Networks

Authors: Zahra Moslemi, Yang Meng, Shiwei Lan, Babak Shahbaba

Abstract: Bayesian Neural Networks (BNNs) offer a more principled, robust, and interpretable framework for analyzing high-dimensional data. They address the typical challenges associated with conventional deep learning methods, such as data insatiability, ad-hoc nature, and susceptibility to overfitting. However, their implementation typically relies on Markov chain Monte Carlo (MCMC) methods that are characterized by their computational intensity and inefficiency in a high-dimensional space. To address this issue, we propose a novel Calibration-Emulation-Sampling (CES) strategy to significantly enhance the computational efficiency of BNNs. In this CES framework, during the initial calibration stage, we collect a small set of samples from the parameter space. These samples serve as training data for the emulator. Here, we employ a Deep Neural Network (DNN) emulator to approximate the forward mapping, i.e., the process by which input data pass through various layers to generate predictions. The trained emulator is then used for sampling from the posterior distribution at substantially higher speed compared to the original BNN. Using simulated and real data, we demonstrate that our proposed method improves computational efficiency of BNNs, while maintaining similar performance in terms of prediction accuracy and uncertainty quantification.

Submitted 18 December, 2023; originally announced December 2023.

Comments: 13 pages

arXiv:2312.03031 [pdf, other]

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

Authors: Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez

Abstract: End-to-end autonomous driving recently emerged as a promising research direction to target autonomy from a full-stack perspective. Along this line, many of the latest works follow an open-loop evaluation setting on nuScenes to study the planning behavior. In this paper, we delve deeper into the problem by conducting thorough analyses and demystifying more devils in the details. We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity. These models tend to rely predominantly on the ego vehicle's status for future path planning. Beyond the limitations of the dataset, we also note that current metrics do not comprehensively assess the planning quality, leading to potentially biased conclusions drawn from existing benchmarks. To address this issue, we introduce a new metric to evaluate whether the predicted trajectories adhere to the road. We further propose a simple baseline able to achieve competitive results without relying on perception annotations. Given the current limitations on the benchmark and metrics, we suggest the community reassess relevant prevailing research and be cautious whether the continued pursuit of state-of-the-art would yield convincing and universal conclusions. Code and models are available at https://github.com/NVlabs/BEV-Planner.

Submitted 2 June, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: Accepted to CVPR 2024

arXiv:2312.01696 [pdf, other]

BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection

Authors: Zhenxin Li, Shiyi Lan, Jose M. Alvarez, Zuxuan Wu

Abstract: Recently, the rise of query-based Transformer decoders is reshaping camera-based 3D object detection. These query-based decoders are surpassing the traditional dense BEV (Bird's Eye View)-based methods. However, we argue that dense BEV frameworks remain important due to their outstanding abilities in depth estimation and object localization, depicting 3D scenes accurately and comprehensively. This paper aims to address the drawbacks of the existing dense BEV-based 3D object detectors by introducing our proposed enhanced components, including a CRF-modulated depth estimation module enforcing object-level consistencies, a long-term temporal aggregation module with extended receptive fields, and a two-stage object decoder combining perspective techniques with CRF-modulated depth embedding. These enhancements lead to a "modernized" dense BEV framework dubbed BEVNeXt. On the nuScenes benchmark, BEVNeXt outperforms both BEV-based and query-based frameworks under various settings, achieving a state-of-the-art result of 64.2 NDS on the nuScenes test set. Code will be available at https://github.com/woxihuanjiangguo/BEVNeXt.

Submitted 24 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.00081 [pdf, other]

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

Authors: Wujian Peng, Sicheng Xie, Zuyao You, Shiyi Lan, Zuxuan Wu

Abstract: Vision language models (VLMs) have demonstrated remarkable performance across various downstream tasks. However, understanding fine-grained visual-linguistic concepts, such as attributes and inter-object relationships, remains a significant challenge. While several benchmarks aim to evaluate VLMs in finer granularity, their primary focus remains on the linguistic aspect, neglecting the visual dimension. Here, we highlight the importance of evaluating VLMs from both a textual and visual perspective. We introduce a progressive pipeline to synthesize images that vary in a specific attribute while ensuring consistency in all other aspects. Utilizing this data engine, we carefully design a benchmark, SPEC, to diagnose the comprehension of object size, position, existence, and count. Subsequently, we conduct a thorough evaluation of four leading VLMs on SPEC. Surprisingly, their performance is close to random guess, revealing significant limitations. With this in mind, we propose a simple yet effective approach to optimize VLMs in fine-grained understanding, achieving significant improvements on SPEC without compromising the zero-shot performance. Results on two additional fine-grained benchmarks also show consistent improvements, further validating the transferability of our approach. Code and data are available at https://github.com/wjpoom/SPEC.

Submitted 30 March, 2024; v1 submitted 29 November, 2023; originally announced December 2023.

Comments: Accepted by CVPR 2024

arXiv:2311.14671 [pdf, other]

SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

Authors: Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang

Abstract: In-context segmentation aims at segmenting novel images using a few labeled example images, termed as "in-context examples", exploring content similarities between examples and the target. The resulting models can be generalized seamlessly to novel segmentation tasks, significantly reducing the labeling and training costs compared with conventional pipelines. However, in-context segmentation is more challenging than classic segmentation, as it requires the model to learn segmentation rules conditioned on a few samples. Unlike previous work with ad-hoc or non-end-to-end designs, we propose SEGIC, an end-to-end segment-in-context framework built upon a single vision foundation model (VFM). In particular, SEGIC leverages the emergent correspondence within VFM to capture dense relationships between target images and in-context samples. As such, information from in-context samples is then extracted into three types of instructions, i.e., geometric, visual, and meta instructions, serving as explicit conditions for the final mask prediction. SEGIC is a straightforward yet effective approach that yields state-of-the-art performance on one-shot segmentation benchmarks. Notably, SEGIC can be easily generalized to diverse tasks, including video object segmentation and open-vocabulary segmentation. Code will be available at https://github.com/MengLcool/SEGIC.

Submitted 29 March, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

arXiv:2311.05175 [pdf, other]

doi 10.1103/PhysRevLett.131.193601

Creation of Two-Mode Squeezed States in Atomic Mechanical Oscillators

Authors: Wui Seng Leong, Mingjie Xin, Zilong Chen, Yu Wang, Shau-Yu Lan

Abstract: Two-mode squeezed states, which are entangled states with bipartite quantum correlations in continuous-variable systems, are crucial in quantum information processing and metrology. Recently, continuous-variable quantum computing with the vibrational modes of trapped atoms has emerged with significant progress, featuring a high degree of control in hybridizing with spin qubits. Creating two-mode squeezed states in such a platform could enable applications that are only viable with photons. Here, we experimentally demonstrate two-mode squeezed states by employing atoms in a two-dimensional optical lattice as quantum registers. The states are generated by a controlled projection conditioned on the relative phase of two independent squeezed states. The individual squeezing is created by sudden jumps of the oscillators' frequencies, allowing generation of the two-mode squeezed states at a rate within a fraction of the oscillation frequency. We validate the states by entanglement steering criteria and Fock state analysis. Our results can be applied in other mechanical oscillators for quantum sensing and continuous-variable quantum information.

Submitted 9 November, 2023; originally announced November 2023.

Journal ref: PhysRevLett.131.193601 (2023)

arXiv:2311.03695 [pdf, other]

Context Shift Reduction for Offline Meta-Reinforcement Learning

Authors: Yunkai Gao, Rui Zhang, Jiaming Guo, Fan Wu, Qi Yi, Shaohui Peng, Siming Lan, Ruizhi Chen, Zidong Du, Xing Hu, Qi Guo, Ling Li, Yunji Chen

Abstract: Offline meta-reinforcement learning (OMRL) utilizes pre-collected offline datasets to enhance the agent's generalization ability on unseen tasks. However, the context shift problem arises due to the distribution discrepancy between the contexts used for training (from the behavior policy) and testing (from the exploration policy). The context shift problem leads to incorrect task inference and further deteriorates the generalization ability of the meta-policy. Existing OMRL methods either overlook this problem or attempt to mitigate it with additional information. In this paper, we propose a novel approach called Context Shift Reduction for OMRL (CSRO) to address the context shift problem with only offline datasets. The key insight of CSRO is to minimize the influence of policy in context during both the meta-training and meta-test phases. During meta-training, we design a max-min mutual information representation learning mechanism to diminish the impact of the behavior policy on task representation. In the meta-test phase, we introduce the non-prior context collection strategy to reduce the effect of the exploration policy. Experimental results demonstrate that CSRO significantly reduces the context shift and improves the generalization ability, surpassing previous methods across various challenging domains.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.03466 [pdf]

Spatial Correlation at the Boson Peak Frequency in Amorphous Materials

Authors: X. Y. Li, H. P. Zhang, S. Lan, D. L. Abernathy, C. H. Hu, L. R. Fan, M. Z. Li, X. -L. Wang

Abstract: The Boson peak (BP), an excess of vibrational density of states, is ubiquitous for amorphous materials and is believed to hold the key to understanding the dynamics of glass and glass transition. Previous studies have established an energy scale for the BP, which is ~1-10 meV or ~THz in frequency. However, so far, little is known about the momentum dependence or spatial correlation of the BP. Here, we report the observation of the BP in model Zr-Cu-Al metallic glasses over a wide range of momentum transfer, using inelastic neutron scattering, heat capacity, Raman scattering measurements, and molecular dynamics (MD) simulations. The BP energy is largely dispersionless; however, the BP intensity was found to scale with the static structure factor. Additional MD simulations with a generic Lennard-Jones potential confirmed the same. Based on these results, an analytical expression for the dynamic structure factor was formulated for the BP excitation. Further analysis of the simulated disordered structures suggests that the BP is related to local structure fluctuations (e.g., in shear strain). Our results offer insights into the nature of the BP and provide guidance for the development of theories of amorphous materials.

Submitted 6 November, 2023; originally announced November 2023.

Comments: 19 pages, 4 figures

arXiv:2311.01316 [pdf, other]

doi 10.1103/PhysRevLett.131.221602

Heating up quadruply quantized vortices: Splitting patterns and dynamical transitions

Authors: Shanquan Lan, Xin Li, Yu Tian, Peng Yang, Hongbao Zhang

Abstract: Using holographic duality, we investigate the impact of finite temperature on the instability and splitting patterns of quadruply quantized vortices, providing the first-ever analysis in this context. Through linear stability analysis, we reveal the occurrence of two consecutive dynamical transitions. At a specific low temperature, the dominant unstable mode transitions from the $2$-fold rotational symmetry mode to the $3$-fold one, followed by a transition from the $3$-fold one to the $4$-fold one at a higher temperature. As the temperature is increased, we also observe the $5$ and $6$-fold rotational symmetry unstable modes get excited successively. Employing the full non-linear numerical simulations, we further demonstrate that these two novel dynamical transitions, along with the temperature-induced instabilities for the $5$ and $6$-fold rotational symmetry modes, can be identified by examining the resulting distinct splitting patterns, which offers a promising route for the experimental verification in the cold atom gases.

Submitted 19 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: 10 pages, 8 figures, version to appear in Physical Review Letters

arXiv:2311.01075 [pdf, other]

Contrastive Modules with Temporal Attention for Multi-Task Reinforcement Learning

Authors: Siming Lan, Rui Zhang, Qi Yi, Jiaming Guo, Shaohui Peng, Yunkai Gao, Fan Wu, Ruizhi Chen, Zidong Du, Xing Hu, Xishan Zhang, Ling Li, Yunji Chen

Abstract: In the field of multi-task reinforcement learning, the modular principle, which involves specializing functionalities into different modules and combining them appropriately, has been widely adopted as a promising approach to prevent the negative transfer problem, i.e., performance degradation due to conflicts between tasks. However, most of the existing multi-task RL methods only combine shared modules at the task level, ignoring that there may be conflicts within the task. In addition, these methods do not take into account that, without constraints, some modules may learn similar functions, which restricts the expressiveness and generalization capability of modular methods. In this paper, we propose the Contrastive Modules with Temporal Attention (CMTA) method to address these limitations. CMTA constrains the modules to be different from each other by contrastive learning and combines shared modules at a finer granularity than the task level with temporal attention, alleviating the negative transfer within the task and improving the generalization ability and the performance for multi-task RL. We conducted the experiment on Meta-World, a multi-task RL benchmark containing various robotics manipulation tasks. Experimental results show that CMTA outperforms learning each task individually for the first time and achieves substantial performance improvements over the baselines.

Submitted 2 November, 2023; originally announced November 2023.

Comments: This paper has been accepted at NeurIPS 2023 as a poster

arXiv:2310.19731 [pdf, other]

ViR: Towards Efficient Vision Retention Backbones

Authors: Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz

Abstract: Vision Transformers (ViTs) have gained considerable popularity in recent years, due to their exceptional capabilities in modeling long-range spatial dependencies and scalability for large-scale training. Although the training parallelism of the self-attention mechanism plays an important role in retaining great performance, its quadratic complexity hinders the application of ViTs in many scenarios which demand fast inference. This effect is even more pronounced in applications in which autoregressive modeling of input features is required. In Natural Language Processing (NLP), a new stream of efforts has proposed parallelizable models with a recurrent formulation that allows for efficient inference in generative applications. Inspired by this trend, we propose a new class of computer vision models, dubbed Vision Retention Networks (ViR), with dual parallel and recurrent formulations, which strikes an optimal balance between fast inference and parallel training with competitive performance. In particular, ViR scales favorably for image throughput and memory consumption in tasks that require higher-resolution images due to its flexible formulation in processing large sequence lengths. The ViR is the first attempt to realize dual parallel and recurrent equivalency in a general vision backbone for recognition tasks. We have validated the effectiveness of ViR through extensive experiments with different dataset sizes and various image resolutions and achieved competitive performance. Code: https://github.com/NVlabs/ViR

Submitted 26 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: Introduction of Vision Retention Networks (ViR) for Efficient Visual Modeling

arXiv:2309.16963 [pdf, other]

doi 10.3847/2041-8213/acfedf

Effect of irradiation on the spin of millisecond pulsars

Authors: Shunyi Lan, Xiangcun Meng

Abstract: A millisecond pulsar (MSP) is an old neutron star (NS) that has accreted material from its companion star, causing it to spin up, which is known as the recycling scenario. During the mass transfer phase, the system manifests itself as an X-ray binary. PSR J1402+13 is an MSP with a spin period of $5.89~{\rm ms}$ and a spin period derivative of $\log\dot{P}_{\rm spin}=-16.32$. These properties make it a notable object within the pulsar population, as MSPs typically exhibit low spin period derivatives. In this paper, we aim to explain how an MSP can possess a high spin period derivative through binary evolution. By utilizing the stellar evolution code MESA, we examine the effects of irradiation on the companion star and the propeller effect on the NS during binary evolution. We demonstrate that irradiation can modify the spin period and mass of an MSP, resulting in a higher spin period derivative. These results suggest that the irradiation effect may serve as a key factor in explaining MSPs with high spin period derivatives.

Submitted 29 September, 2023; originally announced September 2023.

Comments: Accepted for publication in ApJL. Compiled in AASTEX62

Journal ref: ApJL 956 L24 (2023)

arXiv:2308.04556 [pdf, other]

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

Authors: Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez

Abstract: False negatives (FN) in 3D object detection, e.g., missing predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. While being fatal, this issue is understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner and guides the models to focus on excavating difficult instances. For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall. FocalFormer3D features a multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from massive object candidates. Experimental results on the nuScenes and Waymo datasets validate the superior performance of FocalFormer3D. The advantage leads to strong performance on both detection and tracking, in both LiDAR and multi-modal settings. Notably, FocalFormer3D achieves a 70.5 mAP and 73.9 NDS on nuScenes detection benchmark, while the nuScenes tracking benchmark shows 72.1 AMOTA, both ranking 1st place on the nuScenes LiDAR leaderboard. Our code is available at https://github.com/NVlabs/FocalFormer3D.

Submitted 8 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV 2023

arXiv:2308.03666 [pdf, other]

Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness

Authors: Shide Du, Zihan Fang, Shiyang Lan, Yanchao Tan, Manuel Günther, Shiping Wang, Wenzhong Guo

Abstract: As researchers strive to narrow the gap between machine intelligence and human through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in open-world, which has become ubiquitous in all aspects of daily life for everyone. However, several challenges may create a crisis of trust in current artificial intelligence systems that need to be bridged: 1) Insufficient explanation of predictive results; 2) Inadequate generalization for learning models; 3) Poor adaptability to uncertain environments. Consequently, we explore a neural program to bridge trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios for readers. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) We then design environmental well-being task-interfaces via flexible learning regularizers for improving the generalization of trustworthy learning; 3) We propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Eventually, we enhance various trustworthy properties through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. These designed open-world protocols are applicable across a wide range of surroundings, under open-world multimedia recognition scenarios with significant performance improvements observed.

Submitted 18 October, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2307.01492 [pdf, other]

FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation

Authors: Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez

Abstract: This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, which is held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric Autonomous Driving. Our proposed solution FB-OCC builds upon FB-BEV, a cutting-edge camera-based bird's-eye view perception design using forward-backward projection. On top of FB-BEV, we further study novel designs and optimization tailored to the 3D occupancy prediction task, including joint depth-semantic pre-training, joint voxel-BEV representation, model scaling up, and effective post-processing strategies. These designs and optimization result in a state-of-the-art mIoU score of 54.19% on the nuScenes dataset, ranking 1st place in the challenge track. Code and models will be released at: https://github.com/NVlabs/FB-BEV.

Submitted 4 July, 2023; originally announced July 2023.

Comments: Outstanding Champion and Innovation Award in the 3D Occupancy Prediction Challenge (CVPR23)

arXiv:2306.16378 [pdf, other]

Spatiotemporal Besov Priors for Bayesian Inverse Problems

Authors: Shiwei Lan, Mirjeta Pasha, Shuyi Li, Weining Shen

Abstract: Fast development in science and technology has driven the need for proper statistical tools to capture special data features such as abrupt changes or sharp contrast. Many inverse problems in data science require spatiotemporal solutions derived from a sequence of time-dependent objects with these spatial features, e.g., dynamic reconstruction of computerized tomography (CT) images with edges. Conventional methods based on Gaussian processes (GP) often fall short in providing satisfactory solutions since they tend to offer over-smooth priors. Recently, the Besov process (BP), defined by wavelet expansions with random coefficients, has emerged as a more suitable prior for Bayesian inverse problems of this nature. While BP excels in handling spatial inhomogeneity, it does not automatically incorporate the temporal correlation inherent in the dynamically changing objects. In this paper, we generalize BP to a novel spatiotemporal Besov process (STBP) by replacing the random coefficients in the series expansion with stochastic time functions modeled as a Q-exponential process (Q-EP), which governs the temporal correlation structure. We thoroughly investigate the mathematical and statistical properties of STBP. A white-noise representation of STBP is also proposed to facilitate the inference. Simulations, two limited-angle CT reconstruction examples and a highly non-linear inverse problem involving the Navier-Stokes equation are used to demonstrate the advantage of the proposed STBP in preserving spatial features while accounting for temporal changes compared with the classic STGP and a time-uncorrelated approach.

Submitted 26 March, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: 47 pages, 15 figures

arXiv:2306.07307 [pdf, other]

Online Prototype Alignment for Few-shot Policy Transfer

Authors: Qi Yi, Rui Zhang, Shaohui Peng, Jiaming Guo, Yunkai Gao, Kaizhao Yuan, Ruizhi Chen, Siming Lan, Xing Hu, Zidong Du, Xishan Zhang, Qi Guo, Yunji Chen

Abstract: Domain adaptation in reinforcement learning (RL) mainly deals with the changes of observation when transferring the policy to a new environment. Many traditional approaches of domain adaptation in RL manage to learn a mapping function between the source and target domain in explicit or implicit ways. However, they typically require access to abundant data from the target domain. Besides, they often rely on visual clues to learn the mapping function and may fail when the source domain looks quite different from the target domain. To address these problems, we propose a novel framework, Online Prototype Alignment (OPA), which learns the mapping function based on the functional similarity of elements and is able to achieve few-shot policy transfer within only several episodes. The key insight of OPA is to introduce an exploration mechanism that can interact with the unseen elements of the target domain in an efficient and purposeful manner, and then connect them with the seen elements in the source domain according to their functionalities (instead of visual clues). Experimental results show that when the target domain looks visually different from the source domain, OPA can achieve better transfer performance even with much fewer samples from the target domain, outperforming prior methods.

Submitted 12 June, 2023; originally announced June 2023.

Comments: This paper has been accepted at ICML2023

arXiv:2305.18312 [pdf, other]

Balancing Test Accuracy and Security in Computerized Adaptive Testing

Authors: Wanyong Feng, Aritra Ghosh, Stephen Sireci, Andrew S. Lan

Abstract: Computerized adaptive testing (CAT) is a form of personalized testing that accurately measures students' knowledge levels while reducing test length. Bilevel optimization-based CAT (BOBCAT) is a recent framework that learns a data-driven question selection algorithm to effectively reduce test length and improve test accuracy. However, it suffers from high question exposure and test overlap rates, which potentially affect test security. This paper introduces a constrained version of BOBCAT (C-BOBCAT) to address these problems by changing its optimization setup and enabling us to trade off test accuracy for question exposure and test overlap rates. We show that C-BOBCAT is effective through extensive experiments on two real-world adult testing datasets.

Submitted 18 May, 2023; originally announced May 2023.

Comments: The 24th International Conference on Artificial Intelligence in Education (AIED 2023)

arXiv:2305.17569 [pdf, other]

doi 10.1109/TMM.2023.3275853

Collaborative Multi-Agent Video Fast-Forwarding

Authors: Shuyue Lan, Zhilu Wang, Ermin Wei, Amit K. Roy-Chowdhury, Qi Zhu

Abstract: Multi-agent applications have recently gained significant popularity. In many computer vision tasks, a network of agents, such as a team of robots with cameras, could work collaboratively to perceive the environment for efficient and accurate situation awareness. However, these agents often have limited computation, communication, and storage resources. Thus, reducing resource consumption while still providing an accurate perception of the environment becomes an important goal when deploying multi-agent systems. To achieve this goal, we identify and leverage the overlap among different camera views in multi-agent systems for reducing the processing, transmission and storage of redundant/unimportant video frames. Specifically, we have developed two collaborative multi-agent video fast-forwarding frameworks in distributed and centralized settings, respectively. In these frameworks, each individual agent can selectively process or skip video frames at adjustable paces based on multiple strategies via reinforcement learning. Multiple agents then collaboratively sense the environment via either 1) a consensus-based distributed framework called DMVF that periodically updates the fast-forwarding strategies of agents by establishing communication and consensus among connected neighbors, or 2) a centralized framework called MFFNet that utilizes a central controller to decide the fast-forwarding strategies for agents based on collected data. We demonstrate the efficacy and efficiency of our proposed frameworks on a real-world surveillance video dataset VideoWeb and a new simulated driving dataset CarlaSim, through extensive simulations and deployment on an embedded platform with TCP communication. We show that compared with other approaches in the literature, our frameworks achieve better coverage of important frames, while significantly reducing the number of frames processed at each agent.

Submitted 27 May, 2023; originally announced May 2023.

Comments: IEEE Transactions on Multimedia, 2023. arXiv admin note: text overlap with arXiv:2008.04437

arXiv:2305.00274 [pdf]

Evolution of medium-range order and its correlation with magnetic nanodomains in Fe-Dy-B-Nb bulk metallic glasses

Authors: Jiacheng Ge, Yao Gu, Zhongzhen Yao, Sinan Liu, Huiqiang Ying, Chenyu Lu, Zhenduo Wu, Yang Ren, Jun-ichi Suzuki, Zhenhua Xie, Yubin Ke, He Zhu, Song Tang, Xun-Li Wang, Si Lan

Abstract: Fe-based metallic glasses are promising functional materials for advanced magnetism and sensor fields. Tailoring magnetic performance in amorphous materials requires a thorough knowledge of the correlation between structural disorder and magnetic order, which remains ambiguous. Two practical difficulties remain: the first is directly observing subtle magnetic structural changes on multiple scales, and the second is precisely regulating the various amorphous states. Here we propose a novel approach to tailor the amorphous structure through the liquid-liquid phase transition. In-situ synchrotron diffraction has unraveled a medium-range ordering process dominated by edge-sharing cluster connectivity during the liquid-liquid phase transition. Moreover, nanodomains with topological order have been found to exist in compositions with a liquid-liquid phase transition, manifesting as hexagonal patterns in small-angle neutron scattering profiles. The liquid-liquid phase transition can induce the nanodomains to be more locally ordered, generating stronger exchange interactions due to the reduced Fe-Fe bond and the enhanced structural order, leading to the increment of saturation magnetization. Furthermore, the increased local heterogeneity in the medium-range scale enhances the magnetic anisotropy, promoting the permeability response under applied stress and leading to a better stress-impedance effect. These experimental results pave the way to tailor the magnetic structure and performance through the liquid-liquid phase transition.

Submitted 29 April, 2023; originally announced May 2023.

Comments: 31 pages, 14 figures, including the Supplementary Material

arXiv:2302.04858 [pdf, other]

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

Authors: Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar

Abstract: Augmenting pretrained language models (LMs) with a vision encoder (e.g., Flamingo) has obtained state-of-the-art results in image-to-text generation. However, these models store all the knowledge within their parameters, thus often requiring enormous model parameters to model the abundant visual concepts and very rich textual descriptions. Additionally, they are inefficient in incorporating new data, requiring a computationally expensive fine-tuning process. In this work, we introduce a Retrieval-augmented Visual Language Model, Re-ViLM, built upon Flamingo, that supports retrieving the relevant knowledge from the external database for zero and in-context few-shot image-to-text generations. By storing certain knowledge explicitly in the external database, our approach reduces the number of model parameters and can easily accommodate new data during evaluation by simply updating the database. We also construct interleaved image and text data that facilitates in-context few-shot learning capabilities. We demonstrate that Re-ViLM significantly boosts performance for image-to-text generation tasks, especially for zero-shot and few-shot generation in out-of-domain settings with 4 times fewer parameters compared with baseline methods.

Submitted 22 October, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

Comments: Findings of EMNLP 2023

arXiv:2301.03992 [pdf, other]

Vision Transformers Are Good Mask Auto-Labelers

Authors: Shiyi Lan, Xitong Yang, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez, Anima Anandkumar

Abstract: We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations. MAL takes box-cropped images as inputs and conditionally generates their mask pseudo-labels. We show that Vision Transformers are good mask auto-labelers. Our method significantly reduces the gap between auto-labeling and human annotation regarding mask quality. Instance segmentation models trained using the MAL-generated masks can nearly match the performance of their fully-supervised counterparts, retaining up to 97.4% performance of fully supervised models. The best model achieves 44.1% mAP on COCO instance segmentation (test-dev 2017), outperforming state-of-the-art box-supervised methods by significant margins. Qualitative results indicate that masks produced by MAL are, in some cases, even better than human annotations.

Submitted 10 January, 2023; originally announced January 2023.

arXiv:2301.03203 [pdf, other]

doi 10.1007/JHEP05(2023)223

Splitting of doubly quantized vortices in holographic superfluid of finite temperature

Authors: Shanquan Lan, Xin Li, Jiexiong Mo, Yu Tian, Yu-Kun Yan, Peng Yang, Hongbao Zhang

Abstract: The temperature effect on the linear instability and the splitting process of a doubly quantized vortex is studied. Using the linear perturbation theory to calculate the quasi-normal modes of the doubly quantized vortex, we find that the imaginary part of the unstable mode increases with the temperature till some turning temperature, after which the imaginary part of the unstable mode decreases with the temperature. On the other hand, by the fully non-linear numerical simulations, we also examine the real time splitting process of the doubly quantized vortex, where not only do the split singly quantized vortex pair depart from each other, but also revolve around each other. In particular, the characteristic time scale for the splitting process is identified and its temperature dependence is found to be in good agreement with the linear instability analysis in the sense that the larger the imaginary part of the unstable mode is, the shorter the splitting time is. Such a temperature effect is expected to be verified in the cold atom experiments in the near future.

Submitted 16 May, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

Comments: 21 pages, 9 figures, references added, clarifications made, typos corrected, version to appear in JHEP

arXiv:2210.12852 [pdf, other]

1st Place Solution of The Robust Vision Challenge 2022 Semantic Segmentation Track

Authors: Junfei Xiao, Zhichao Xu, Shiyi Lan, Zhiding Yu, Alan Yuille, Anima Anandkumar

Abstract: This report describes the winning solution to the Robust Vision Challenge (RVC) semantic segmentation track at ECCV 2022. Our method adopts the FAN-B-Hybrid model as the encoder and uses SegFormer as the segmentation framework. The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with a simple dataset balancing strategy. All the original labels are projected to a 256-class unified label space, and the model is trained using a cross-entropy loss. Without significant hyperparameter tuning or any specific loss weighting, our solution ranks first on all the testing semantic segmentation benchmarks from multiple domains (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, and WildDash 2). The proposed method can serve as a strong baseline for the multi-domain segmentation task and benefit future works. Code will be available at https://github.com/lambert-x/RVC_Segmentation.

Submitted 7 November, 2022; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: The Winning Solution to The Robust Vision Challenge 2022 Semantic Segmentation Track

arXiv:2210.07987 [pdf, other]

Bayesian Learning via Q-Exponential Process

Authors: Shuyi Li, Michael O'Connor, Shiwei Lan

Abstract: Regularization is one of the most fundamental topics in optimization, statistics and machine learning. To get sparsity in estimating a parameter $u\in\mathbb{R}^d$, an $\ell_q$ penalty term, $\Vert u\Vert_q$, is usually added to the objective function. What is the probability distribution corresponding to such an $\ell_q$ penalty? What is the correct stochastic process corresponding to $\Vert u\Vert_q$ when we model functions $u\in L^q$? This is important for statistically modeling large dimensional objects, e.g. images, with a penalty to preserve certain properties, e.g. edges in the image. In this work, we generalize the $q$-exponential distribution (with density proportional to $\exp{(- \frac{1}{2}|u|^q)}$) to a stochastic process named the $Q$-exponential (Q-EP) process that corresponds to the $L_q$ regularization of functions. The key step is to specify consistent multivariate $q$-exponential distributions by choosing from a large family of elliptic contour distributions. The work is closely related to the Besov process, which is usually defined by the expanded series. Q-EP can be regarded as a definition of the Besov process with explicit probabilistic formulation and direct control on the correlation length. From the Bayesian perspective, Q-EP provides a flexible prior on functions with sharper penalty ($q<2$) than the commonly used Gaussian process (GP). We compare GP, Besov and Q-EP in modeling functional data, reconstructing images, and solving inverse problems and demonstrate the advantage of our proposed methodology.

Submitted 15 November, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: 21 pages, 15 figures

Journal ref: Proceedings of 37th Conference on Neural Information Processing Systems, 2023 @ New Orleans

arXiv:2210.02909 [pdf, other]

doi 10.1103/PhysRevC.107.034907

$K^{*0}$ production in Au+Au collisions at $\sqrt{s_{\rm NN}}$ = 7.7, 11.5, 14.5, 19.6, 27 and 39 GeV from RHIC beam energy scan

Authors: STAR Collaboration, M. S. Abdallah, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, J. K. Adkins, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, D. M. Anderson, E. C. Aschenauer, J. Atchison, V. Bairathi, W. Baker, J. G. Ball Cap, K. Barish, R. Bellwied, P. Bhagat, A. Bhasin, S. Bhatta, J. Bielcik, J. Bielcikova, J. D. Brandenburg, X. Z. Cai , et al. (350 additional authors not shown)

Abstract: We report the measurement of $K^{*0}$ meson at midrapidity ($|y|<$ 1.0) in Au+Au collisions at $\sqrt{s_{\rm NN}}$ = 7.7, 11.5, 14.5, 19.6, 27 and 39 GeV collected by the STAR experiment during the RHIC beam energy scan (BES) program. The transverse momentum spectra, yield, and average transverse momentum of $K^{*0}$ are presented as functions of collision centrality and beam energy. The $K^{*0}/K$ yield ratios are presented for different collision centrality intervals and beam energies. The $K^{*0}/K$ ratio in heavy-ion collisions is observed to be smaller than that in small system collisions (e+e and p+p). The $K^{*0}/K$ ratio follows a similar centrality dependence to that observed in previous RHIC and LHC measurements. The data favor the scenario of the dominance of hadronic re-scattering over regeneration for $K^{*0}$ production in the hadronic phase of the medium.

Submitted 5 April, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

Comments: 17 pages, 12 figures

Journal ref: Phys. Rev. C 107, 034907 (2023)

arXiv:2209.12403 [pdf, other]

doi 10.1002/wics.1608

Sampling Constrained Continuous Probability Distributions: A Review

Authors: Shiwei Lan, Lulu Kang

Abstract: The problem of sampling constrained continuous distributions has frequently appeared in many machine/statistical learning models. Many Markov chain Monte Carlo (MCMC) sampling methods have been adapted to handle different types of constraints on the random variables. Among these methods, Hamiltonian Monte Carlo (HMC) and the related approaches have shown significant advantages in terms of computational efficiency compared to other counterparts. In this article, we first review HMC and some extended sampling methods, and then we concretely explain three constrained HMC-based sampling methods: reflection, reformulation, and spherical HMC. For illustration, we apply these methods to solve three well-known constrained sampling problems: truncated multivariate normal distributions, Bayesian regularized regression, and nonparametric density estimation. In this review, we also connect constrained sampling with another similar problem in the statistical design of experiments of constrained design space.

Submitted 25 September, 2022; originally announced September 2022.

Journal ref: WIREs Computational Statistics 2023

arXiv:2209.11940 [pdf, other]

doi 10.1103/PhysRevC.107.024908

Higher-Order Cumulants and Correlation Functions of Proton Multiplicity Distributions in $\sqrt{s_{\mathrm{NN}}}$ = 3 GeV Au+Au Collisions at the RHIC STAR Experiment

Authors: STAR Collaboration, M. S. Abdallah, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, J. K. Adkins, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, D. M. Anderson, E. C. Aschenauer, J. Atchison, V. Bairathi, W. Baker, J. G. Ball Cap, K. Barish, R. Bellwied, P. Bhagat, A. Bhasin, S. Bhatta, J. Bielcik, J. Bielcikova, J. D. Brandenburg, X. Z. Cai , et al. (349 additional authors not shown)

Abstract: We report a measurement of cumulants and correlation functions of event-by-event proton multiplicity distributions from fixed-target Au+Au collisions at $\sqrt{s_{\rm NN}}$ = 3 GeV measured by the STAR experiment. Protons are identified within the rapidity ($y$) and transverse momentum ($p_{\rm T}$) region $-0.9 < y<0$ and $0.4 < p_{\rm T} <2.0 $ GeV/$c$ in the center-of-mass frame. A systematic analysis of the proton cumulants and correlation functions up to sixth-order as well as the corresponding ratios as a function of the collision centrality, $p_{\rm T}$, and $y$ are presented. The effect of pileup and initial volume fluctuations on these observables and the respective corrections are discussed in detail. The results are compared to calculations from the hadronic transport UrQMD model as well as a hydrodynamic model. In the most central 5% collisions, the value of proton cumulant ratio $C_4/C_2$ is negative, drastically different from the values observed in Au+Au collisions at higher energies. Compared to model calculations including Lattice QCD, a hadronic transport model, and a hydrodynamic model, the strong suppression in the ratio of $C_4/C_2$ at 3 GeV Au+Au collisions indicates an energy regime dominated by hadronic interactions.

Submitted 22 February, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

Comments: 25 pages, 20 figures, 4 tables

Journal ref: Phys. Rev. C 107, 024908 (2023)

arXiv:2209.08479 [pdf, other]

doi 10.1116/5.0126745

Bi-color atomic beam slower and magnetic field compensation for ultracold gases

Authors: Jianing Li, Kelvin Lim, Swarup Das, Thomas Zanon-Willette, Chen-Hao Feng, Paul Robert, Andrea Bertoldi, Philippe Bouyer, Chang Chi Kwong, Shau-Yu Lan, David Wilkowski

Abstract: Transversely loaded bidimensional-magneto-optical-traps (2D-MOT) have been recently developed as high flux sources for cold strontium atoms to realize a new generation of compact experimental setups. Here, we discuss the implementation of a cross-polarized bi-color slower for a strontium atomic beam improving the 2D-MOT loading, and increasing the number of atoms in a final MOT by eleven times. Our slowing scheme addresses simultaneously two excited Zeeman substates of the 88Sr 1S0->1P1 transition at 461 nm. We also realized a 3-axis active feedback control of the magnetic field down to the microgauss regime. Such a compensation is performed thanks to a network of eight magnetic field probes arranged in a cuboid configuration around the atomic cold sample, and a pair of coils in Helmholtz configuration along each of three Cartesian directions. Our active feedback is capable of efficiently suppressing most of the magnetically-induced position fluctuations of the 689 nm intercombination-line MOT.

Submitted 5 January, 2023; v1 submitted 18 September, 2022; originally announced September 2022.

Comments: 8 pages, 6 figures

Journal ref: AVS Quantum Sci. 4, 046801 (2022)

arXiv:2208.00653 [pdf, ps, other]

doi 10.1103/PhysRevC.107.024901

Pion, kaon, and (anti-)proton production in U+U Collisions at $\sqrt{s_{NN}}$ = 193 GeV measured with the STAR detector

Authors: STAR Collaboration, M. S. Abdallah, B. E. Aboona, J. Adam, J. R. Adams, J. K. Adkins, G. Agakishiev, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, A. Aitbaev, I. Alekseev, D. M. Anderson, A. Aparin, J. Atchison, G. S. Averichev, V. Bairathi, W. Baker, J. G. Ball Cap, K. Barish, P. Bhagat, A. Bhasin, S. Bhatta, I. G. Bordyuzhin, J. D. Brandenburg , et al. (330 additional authors not shown)

Abstract: We present the first measurements of transverse momentum spectra of $π^{\pm}$, $K^{\pm}$, $p(\bar{p})$ at midrapidity ($|y| < 0.1$) in U+U collisions at $\sqrt{s_{NN}}$ = 193 GeV with the STAR detector at the Relativistic Heavy Ion Collider (RHIC). The centrality dependence of particle yields, average transverse momenta, particle ratios and kinetic freeze-out parameters are discussed. The results are compared with the published results from Au+Au collisions at $\sqrt{s_{NN}} =$ 200 GeV in STAR. The results are also compared to those from A Multi Phase Transport (AMPT) model.

Submitted 11 February, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

Comments: 17 pages, 14 figures and 7 tables; Replaced with the updated version published in Physical Review C

Journal ref: Phys. Rev. C 107 (2023) 024901

arXiv:2207.02814 [pdf, other]

doi 10.1103/PhysRevD.107.L121901

Holographic dissipation prefers the Landau over the Keldysh form

Authors: Yu-Kun Yan, Shanquan Lan, Yu Tian, Peng Yang, Shunhui Yao, Hongbao Zhang

Abstract: Although holographic duality has been regarded as a complementary tool in helping understand the non-equilibrium dynamics of strongly coupled many-body systems, it still remains a remarkable challenge how to confront its predictions quantitatively with the real experimental scenarios. By matching the holographic vortex dynamics with the phenomenological dissipative Gross-Pitaevskii models, we find that the holographic dissipation mechanism can be well captured by the Landau form rather than the Keldysh one, although the latter is much more widely used in numerical simulations. Our finding is expected to open up novel avenues for facilitating the quantitative test of the holographic predictions against the upcoming experimental data. Our result also provides a prime example of how holographic duality can help select proper phenomenological models to describe far-from-equilibrium nonlinear dynamics beyond the hydrodynamic regime.

Submitted 26 May, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

Comments: some changes in presentation, version to appear in PRD as a Letter

Journal ref: Phys. Rev. D 107, L121901 (2023)

arXiv:2207.00778 [pdf, other]

doi 10.1016/j.physletb.2022.137449

Measurement of $\rm ^4_ΛH$ and $\rm ^4_ΛHe$ binding energy in Au+Au collisions at $\sqrt{s_\mathrm{NN}}$ = 3 GeV

Authors: STAR Collaboration, M. S. Abdallah, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, J. K. Adkins, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, D. M. Anderson, E. C. Aschenauer, M. U. Ashraf, J. Atchison, V. Bairathi, W. Baker, J. G. Ball Cap, K. Barish, A. Behera, R. Bellwied, P. Bhagat, A. Bhasin, J. Bielcik, J. Bielcikova, J. D. Brandenburg , et al. (348 additional authors not shown)

Abstract: Measurements of mass and $Λ$ binding energy of $\rm ^4_ΛH$ and $\rm ^4_ΛHe$ in Au+Au collisions at $\sqrt{s_{_{\rm NN}}}=3$ GeV are presented, with an aim to address the charge symmetry breaking (CSB) problem in hypernuclei systems with mass number A = 4. The $Λ$ binding energies are measured to be $\rm 2.22\pm0.06(stat.) \pm0.14(syst.)$ MeV and $\rm 2.38\pm0.13(stat.) \pm0.12(syst.)$ MeV for $\rm ^4_ΛH$ and $\rm ^4_ΛHe$, respectively. The measured $Λ$ binding-energy difference is $\rm 0.16\pm0.14(stat.)\pm0.10(syst.)$ MeV for ground states. Combined with the $γ$-ray transition energies, the binding-energy difference for excited states is $\rm -0.16\pm0.14(stat.)\pm0.10(syst.)$ MeV, which is negative and comparable to the value of the ground states within uncertainties. These new measurements on the $Λ$ binding-energy difference in A = 4 hypernuclei systems are consistent with the theoretical calculations that result in $\rm ΔB_Λ^4(1_{exc}^{+})\approx -ΔB_Λ^4(0_{g.s.}^{+})<0$ and present a new method for the study of CSB effect using relativistic heavy-ion collisions.

Submitted 3 October, 2022; v1 submitted 2 July, 2022; originally announced July 2022.

Comments: 8 pages, 5 figures

Journal ref: M. Abdallah et al., STAR Collaboration, Physics Letters B 834 (2022) 137449

arXiv:2205.11800 [pdf, other]

doi 10.1103/PhysRevD.106.072010

Azimuthal transverse single-spin asymmetries of inclusive jets and identified hadrons within jets from polarized $pp$ collisions at $\sqrt{s}$ = 200 GeV

Authors: STAR Collaboration, M. S. Abdallah, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, J. K. Adkins, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, D. M. Anderson, E. C. Aschenauer, M. U. Ashraf, J. Atchison, V. Bairathi, W. Baker, J. G. Ball Cap, K. Barish, A. Behera, R. Bellwied, P. Bhagat, A. Bhasin, J. Bielcik, J. Bielcikova, J. D. Brandenburg , et al. (348 additional authors not shown)

Abstract: The STAR Collaboration reports measurements of the transverse single-spin asymmetries, $A_N$, for inclusive jets and identified 'hadrons within jets' production at midrapidity from transversely polarized $pp$ collisions at $\sqrt{s}$ = 200 GeV, based on data recorded in 2012 and 2015. The inclusive jet asymmetry measurements include $A_N$ for inclusive jets and $A_N$ for jets containing a charged pion carrying a momentum fraction $z>0.3$ of the jet momentum. The identified hadron within jet asymmetry measurements include the Collins effect for charged pions, kaons and protons, and the Collins-like effect for charged pions. The measured asymmetries are determined for several distinct kinematic regions, characterized by the jet transverse momentum $p_{T}$ and pseudorapidity $η$, as well as the hadron momentum fraction $z$ and momentum transverse to the jet axis $j_{T}$. These results probe higher momentum scales ($Q^{2}$ up to $\sim$900 GeV$^{2}$) than current, semi-inclusive deep inelastic scattering measurements, and they provide new constraints on quark transversity in the proton and enable tests of evolution, universality and factorization breaking in the transverse-momentum-dependent formalism.

Submitted 19 September, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: 24 pages, 24 figures, Accepted by PRD

arXiv:2205.11073 [pdf, ps, other]

doi 10.1103/PhysRevC.107.024912

Azimuthal anisotropy measurement of (multi-)strange hadrons in Au+Au collisions at $\sqrt{s_{\text{NN}}}$ = 54.4 GeV

Authors: STAR Collaboration, M. S. Abdallah, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, J. K. Adkins, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, D. M. Anderson, E. C. Aschenauer, J. Atchison, V. Bairathi, W. Baker, J. G. Ball Cap, K. Barish, R. Bellwied, P. Bhagat, A. Bhasin, S. Bhatta, J. Bielcik, J. Bielcikova, J. D. Brandenburg, X. Z. Cai , et al. (347 additional authors not shown)

Abstract: Azimuthal anisotropy of produced particles is one of the most important observables used to access the collective properties of the expanding medium created in relativistic heavy-ion collisions. In this paper, we present second ($v_{2}$) and third ($v_{3}$) order azimuthal anisotropies of $K_{S}^{0}$, $φ$, $Λ$, $Ξ$ and $Ω$ at mid-rapidity ($|y|<$1) in Au+Au collisions at $\sqrt{s_{\text{NN}}}$ = 54.4 GeV measured by the STAR detector. The $v_{2}$ and $v_{3}$ are measured as a function of transverse momentum and centrality. Their energy dependence is also studied. $v_{3}$ is found to be more sensitive to the change in the center-of-mass energy than $v_{2}$. Scaling by constituent quark number is found to hold for $v_{2}$ within 10%. This observation could be evidence for the development of partonic collectivity in 54.4 GeV Au+Au collisions. Differences in $v_{2}$ and $v_{3}$ between baryons and anti-baryons are presented, and ratios of $v_{3}$/$v_{2}^{3/2}$ are studied and motivated by hydrodynamical calculations. The ratio of $v_{2}$ of $φ$ mesons to that of anti-protons ($v_{2}(φ)/v_{2}(\bar{p})$) shows centrality dependence at low transverse momentum, presumably resulting from the larger effects from hadronic interactions on anti-proton $v_{2}$.

Submitted 23 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: 12 pages, 14 figures

Journal ref: Phys. Rev. C 107, 024912 (2023)

arXiv:2205.08994 [pdf]

Interfacial properties of 2D WS2 on SiO2 substrate from x-ray photoelectron spectroscopy and first-principles calculations

Authors: Changjie Zhou, Huili Zhu, Weifeng Yang, Qiubao Lin, Tongchang Zheng, Lan Yang, Shuqiong Lan

Abstract: Two-dimensional (2D) WS2 films were deposited on SiO2 wafers, and the related interfacial properties were investigated by high-resolution x-ray photoelectron spectroscopy (XPS) and first-principles calculations. Using the direct (indirect) method, the valence band offset (VBO) at the monolayer WS2/SiO2 interface was found to be 3.97 eV (3.86 eV), and the conduction band offset (CBO) was 2.70 eV (2.81 eV). Furthermore, the VBO (CBO) at the bulk WS2/SiO2 interface is found to be about 0.48 eV (0.33 eV) larger due to the interlayer orbital coupling and splitting of valence and conduction band edges. Therefore, the WS2/SiO2 heterostructure has a Type I energy-band alignment. The band offsets obtained experimentally and theoretically are consistent except for the narrower theoretical bandgap of SiO2. The theoretical calculations further reveal a binding energy of 75 meV per S atom and the totally separated partial density of states, indicating a weak interaction and negligible Fermi level pinning effect between the WS2 monolayer and the SiO2 surface. Our combined experimental and theoretical results provide proof of the sufficient VBOs and CBOs and weak interaction in 2D WS2/SiO2 heterostructures.

Submitted 18 May, 2022; originally announced May 2022.

arXiv:2204.11661 [pdf, ps, other]

Two-particle correlations on transverse rapidity in Au+Au collisions at $\sqrt{s_{\rm NN}}=200$ GeV at STAR

Authors: STAR Collaboration, M. S. Abdallah, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, J. K. Adkins, G. Agakishiev, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, A. Aitbaev, I. Alekseev, D. M. Anderson, A. Aparin, E. C. Aschenauer, M. U. Ashraf, F. G. Atetalla, G. S. Averichev, V. Bairathi, W. Baker, J. G. Ball Cap, K. Barish, A. Behera, R. Bellwied , et al. (370 additional authors not shown)

Abstract: Two-particle correlation measurements projected onto two-dimensional, transverse rapidity coordinates ($y_{T1},y_{T2}$), allow access to dynamical properties of the QCD medium produced in relativistic heavy-ion collisions that angular correlation measurements are not sensitive to. We report non-identified charged-particle correlations for Au + Au minimum-bias collisions at $\sqrt{s_{\rm NN}}$ = 200 GeV taken by the STAR experiment at the Relativistic Heavy-Ion Collider (RHIC). Correlations are presented as 2D functions of transverse rapidity for like-sign, unlike-sign and all charged-particle pairs, as well as for particle pairs whose relative azimuthal angles lie on the near-side, the away-side, or at all relative azimuth. The correlations are constructed using charged particles with transverse momentum $p_T \geq 0.15$ GeV/$c$, pseudorapidity from $-$1 to 1, and azimuthal angles from $-π$ to $π$. The significant correlation structures that are observed evolve smoothly with collision centrality. The major correlation features include a saddle shape plus a broad peak with maximum near $y_T \approx 3$, corresponding to $p_T \approx$ 1.5 GeV/$c$. The broad peak is observed in both like- and unlike-sign charge combinations and in near- and away-side relative azimuthal angles. The all-charge, all-azimuth correlation measurements are compared with the theoretical predictions of HIJING and EPOS. The results indicate that the correlations for peripheral to mid-central collisions can be approximately described as a superposition of nucleon + nucleon collisions with minimal effects from the QCD medium. Strong medium effects are indicated in mid- to most-central collisions.

Submitted 25 April, 2022; originally announced April 2022.

arXiv:2204.10929 [pdf, other]

Bayesian Spatiotemporal Modeling for Inverse Problems

Authors: Shiwei Lan, Shuyi Li, Mirjeta Pasha

Abstract: Inverse problems with spatiotemporal observations are ubiquitous in scientific studies and engineering applications. In these spatiotemporal inverse problems, observed multivariate time series are used to infer parameters of physical or biological interest. Traditional solutions for these problems often ignore the spatial or temporal correlations in the data (static model), or simply model the data summarized over time (time-averaged model). In either case, the data information that contains the spatiotemporal interactions is not fully utilized for parameter learning, which leads to insufficient modeling in these problems. In this paper, we apply Bayesian models based on spatiotemporal Gaussian processes (STGP) to the inverse problems with spatiotemporal data and show that the spatial and temporal information provides more effective parameter estimation and uncertainty quantification (UQ). We demonstrate the merit of Bayesian spatiotemporal modeling for inverse problems compared with traditional static and time-averaged approaches using a time-dependent advection-diffusion partial differential equation (PDE) and three chaotic ordinary differential equations (ODE). We also provide theoretical justification for the superiority of spatiotemporal modeling to fit the trajectories even if it appears cumbersome (e.g. for chaotic dynamics).

Submitted 22 April, 2022; originally announced April 2022.

Comments: 38 pages, 23 figures

arXiv:2204.02302 [pdf, other]

doi 10.1038/s41586-022-05557-5

Pattern of Global Spin Alignment of $φ$ and $K^{*0}$ mesons in Heavy-Ion Collisions

Authors: STAR Collaboration, M. S. Abdallah, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, J. K. Adkins, G. Agakishiev, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, A. Aitbaev, I. Alekseev, D. M. Anderson, A. Aparin, E. C. Aschenauer, M. U. Ashraf, F. G. Atetalla, G. S. Averichev, V. Bairathi, W. Baker, J. G. Ball Cap, K. Barish, A. Behera, R. Bellwied , et al. (368 additional authors not shown)

Abstract: Notwithstanding decades of progress since Yukawa first developed a description of the force between nucleons in terms of meson exchange, a full understanding of the strong interaction remains a major challenge in modern science. One remaining difficulty arises from the non-perturbative nature of the strong force, which leads to the phenomenon of quark confinement at distances on the order of the size of the proton. Here we show that in relativistic heavy-ion collisions, where quarks and gluons are set free over an extended volume, two species of produced vector (spin-1) mesons, namely $φ$ and $K^{*0}$, emerge with a surprising pattern of global spin alignment. In particular, the global spin alignment for $φ$ is unexpectedly large, while that for $K^{*0}$ is consistent with zero. The observed spin-alignment pattern and magnitude for the $φ$ cannot be explained by conventional mechanisms, while a model with a connection to strong force fields, i.e. an effective proxy description within the Standard Model and Quantum Chromodynamics, accommodates the current data. This connection, if fully established, will open a potential new avenue for studying the behaviour of strong force fields.

Submitted 18 January, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

arXiv:2204.01625 [pdf, other]

doi 10.1126/sciadv.abq3903

Tomography of Ultra-relativistic Nuclei with Polarized Photon-gluon Collisions

Authors: STAR Collaboration, M. S. Abdallah, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, J. K. Adkins, G. Agakishiev, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, A. Aitbaev, I. Alekseev, D. M. Anderson, A. Aparin, E. C. Aschenauer, M. U. Ashraf, F. G. Atetalla, G. S. Averichev, V. Bairathi, W. Baker, J. G. Ball Cap, K. Barish, A. Behera, R. Bellwied , et al. (370 additional authors not shown)

Abstract: A linearly polarized photon can be quantized from the Lorentz-boosted electromagnetic field of a nucleus traveling at ultra-relativistic speed. When two relativistic heavy nuclei pass one another at a distance of a few nuclear radii, the photon from one nucleus may interact through a virtual quark-antiquark pair with gluons from the other nucleus forming a short-lived vector meson (e.g. ${ρ^0}$). In this experiment, the polarization was utilized in diffractive photoproduction to observe a unique spin interference pattern in the angular distribution of ${ρ^0\rightarrowπ^+π^-}$ decays. The observed interference is a result of an overlap of two wave functions at a distance an order of magnitude larger than the ${ρ^0}$ travel distance within its lifetime. The strong-interaction nuclear radii were extracted from these diffractive interactions, and found to be $6.53\pm 0.06$ fm ($^{197} {\rm Au }$) and $7.29\pm 0.08$ fm ($^{238} {\rm U}$), larger than the nuclear charge radii. The observable is demonstrated to be sensitive to the nuclear geometry and quantum interference of non-identical particles.

Submitted 4 April, 2022; originally announced April 2022.

Journal ref: STAR Collaboration, Sci. Adv. 9, abq3903 (2023)

arXiv:2203.07204 [pdf, ps, other]

Centrality and transverse momentum dependence of higher-order flow harmonics of identified hadrons in Au+Au collisions at $\sqrt{s_{\rm NN}}$ = 200 GeV

Authors: STAR Collaboration, M. S. Abdallah, B. E. Aboona, J. Adam, L. Adamczyk, J. R. Adams, J. K. Adkins, G. Agakishiev, I. Aggarwal, M. M. Aggarwal, Z. Ahammed, A. Aitbaev, I. Alekseev, D. M. Anderson, A. Aparin, E. C. Aschenauer, M. U. Ashraf, F. G. Atetalla, G. S. Averichev, V. Bairathi, W. Baker, J. G. Ball Cap, K. Barish, A. Behera, R. Bellwied, et al. (373 additional authors not shown)

Abstract: We present high-precision measurements of elliptic, triangular, and quadrangular flow $v_{2}$, $v_{3}$, and $v_{4}$, respectively, at midrapidity ($|η|<1.0$) for identified hadrons $π$, $p$, $K$, $\varphi$, $K_s$, $Λ$ as a function of centrality and transverse momentum in Au+Au collisions at the center-of-mass energy $\sqrt{s_{\rm NN}}=$ 200 GeV. We observe similar $v_{n}$ trends between light and strange mesons which indicates that the heavier strange quarks flow as strongly as the lighter up and down quarks. The number-of-constituent-quark scaling for $v_{2}$, $v_{3}$, and $v_{4}$ is found to hold within statistical uncertainty for 0-10$\%$, 10-40$\%$ and 40-80$\%$ collision centrality intervals. The results are compared to several viscous hydrodynamic calculations with varying initial conditions, and could serve as an additional constraint to the development of hydrodynamic models.

Submitted 14 March, 2022; originally announced March 2022.

Comments: 15 pages, 11 figures, submitted for publication

arXiv:2203.00721 [pdf, other]

Validation of the Reduced Unified Continuum Formulation Against In Vitro 4D-Flow MRI

Authors: Ingrid S. Lan, Ju Liu, Weiguang Yang, Judith Zimmermann, Daniel B. Ennis, Alison L. Marsden

Abstract: In our recent work, we introduced the reduced unified continuum formulation for vascular fluid-structure interaction (FSI) and demonstrated enhanced solver accuracy, scalability, and performance compared to conventional approaches. We further verified the formulation against Womersley's deformable wall theory. In this study, we assessed its performance in a compliant patient-specific aortic model by leveraging 3D printing, 2D magnetic resonance imaging (MRI), and 4D-flow MRI to extract high-resolution anatomical and hemodynamic information from an in vitro flow circuit. To accurately reflect experimental conditions, we additionally enabled in-plane vascular motion at each inlet and outlet, and implemented viscoelastic external tissue support and vascular tissue prestressing. Validation of our formulation is achieved through close quantitative agreement in pressures, lumen area changes, pulse wave velocity, and early systolic velocities, as well as qualitative agreement in late systolic flow structures. Our validated suite of FSI techniques can be used to investigate vascular disease initiation, progression, and treatment at a computational cost on the same order as that of rigid-walled simulations. This study is the first to validate a cardiovascular FSI formulation against an in vitro flow circuit involving a compliant vascular phantom of complex patient-specific anatomy.

Submitted 1 March, 2022; originally announced March 2022.

Showing 1–50 of 227 results for author: Lan, S