Showing 1–50 of 620 results for author: Lu, T

  1. arXiv:2407.08941  [pdf, other]

    cs.IT

    Two Classes of Optimal Multi-Input Structures for Node Computations in Message Passing Algorithms

    Authors: Teng Lu, Xuan He, Xiaohu Tang

    Abstract: In this paper, we delve into the computations performed at a node within a message-passing algorithm. We investigate low complexity/latency multi-input structures that can be adopted by the node for computing outgoing messages $y = (y_1, y_2, \ldots, y_n)$ from incoming messages $x = (x_1, x_2, \ldots, x_n)$, where each $y_j$, $j = 1, 2, \ldots, n$, is computed via a multi-way tree with leaves $x$ excluding $x_j$. S…

    Submitted 11 July, 2024; originally announced July 2024.

  2. arXiv:2406.18796  [pdf, other]

    quant-ph

    Protecting three-dimensional entanglement from correlated amplitude damping channel

    Authors: Xing Xiao, Wen-Rui Huang, Tian-Xiang Lu, Yan-Ling Li

    Abstract: Quantum entanglement is a crucial resource in quantum information processing, and protecting it against noise poses a significant challenge. This paper introduces two strategies for preserving qutrit-qutrit entanglement in the presence of correlated amplitude damping (CAD) noise: weak measurement (WM) and environment-assisted measurement (EAM), both combined with quantum measurement reversal (QMR)…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 14 pages, 6 figures, comments are welcome!

  3. arXiv:2406.18070  [pdf, other]

    cs.CV

    EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation

    Authors: Baoqi Pei, Guo Chen, Jilan Xu, Yuping He, Yicheng Liu, Kanghua Pan, Yifei Huang, Yali Wang, Tong Lu, Limin Wang, Yu Qiao

    Abstract: In this report, we present our solutions to the EgoVis Challenges in CVPR 2024, including five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge. Building upon the video-language two-tower model and leveraging our meticulously organized egocentric video data, we introduce a novel foundation model called EgoVideo. This model is specifically designed to cater to the uniqu…

    Submitted 30 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Champion solutions in the EgoVis CVPR 2024 workshop

  4. arXiv:2406.18036  [pdf, other]

    quant-ph

    Operating Single-Photon Circulator by Spinning Optical Resonators

    Authors: Jing Li, Tian-Xiang Lu, Meiyu Peng, Le-Man Kuang, Hui Jing, Lan Zhou

    Abstract: A circulator is one of the crucial devices in quantum networks and simulations. We propose a four-port circulator that regulates the flow of single photons at multi-frequency points by studying the coherent transmission of a single photon in a coupled system of two resonators and two waveguides. When both resonators are static or rotate at the same angular velocity, single-photon transport demonstra…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures

  5. arXiv:2406.14673  [pdf, other]

    cs.CL

    Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell

    Authors: Taiming Lu, Muhan Gao, Kuai Yu, Adam Byerly, Daniel Khashabi

    Abstract: Large Language Models (LLMs) exhibit positional bias, struggling to utilize information from the middle or end of long contexts. Our study explores LLMs' long-context reasoning by probing their hidden representations. We find that while LLMs encode the position of target information, they often fail to leverage this in generating accurate responses. This reveals a disconnect between information re…

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.13748  [pdf, other]

    cs.CL cs.LG

    Every Language Counts: Learn and Unlearn in Multilingual LLMs

    Authors: Taiming Lu, Philipp Koehn

    Abstract: This paper investigates the propagation of harmful information in multilingual large language models (LLMs) and evaluates the efficacy of various unlearning methods. We demonstrate that fake information, regardless of the language it is in, once introduced into these models through training data, can spread across different languages, compromising the integrity and reliability of the generated con…

    Submitted 19 June, 2024; originally announced June 2024.

  7. arXiv:2406.08418  [pdf, other]

    cs.CV cs.AI

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Authors: Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, et al. (15 additional authors not shown)

    Abstract: Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale an…

    Submitted 12 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  8. arXiv:2406.08394  [pdf, other]

    cs.CV

    VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

    Authors: Jiannan Wu, Muyan Zhong, Sen Xing, Zeqiang Lai, Zhaoyang Liu, Wenhai Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Ping Luo, Yu Qiao, Jifeng Dai

    Abstract: We present VisionLLM v2, an end-to-end generalist multimodal large model (MLLM) that unifies visual perception, understanding, and generation within a single framework. Unlike traditional MLLMs limited to text output, VisionLLM v2 significantly broadens its application scope. It excels not only in conventional visual question answering (VQA) but also in open-ended, cross-domain vision tasks such a…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 43 pages

  9. arXiv:2406.07971  [pdf, other]

    cs.CL cs.AI cs.LG

    It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF

    Authors: Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao

    Abstract: Reinforcement Learning from Human Feedback (RLHF) involves training policy models (PMs) and reward models (RMs) to align language models with human preferences. Instead of focusing solely on PMs and RMs independently, we propose to examine their interactions during fine-tuning, introducing the concept of seamlessness. Our study starts with observing the saturation phenomenon, where continual impro…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  10. arXiv:2406.03530  [pdf, other]

    cond-mat.str-el

    Fractional Chern Insulators in Twisted Bilayer MoTe$_2$: A Composite Fermion Perspective

    Authors: Tianhong Lu, Luiz H. Santos

    Abstract: The discovery of Fractional Chern Insulators (FCIs) in twisted bilayer MoTe$_2$ has sparked significant interest in fractional topological matter without external magnetic fields. Unlike the flat dispersion of Landau levels, moiré electronic states are influenced by lattice effects within a nanometer-scale superlattice. This study examines the impact of these lattice effects on the topological pha…

    Submitted 10 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Main text: 5 pages and 4 figures. Updated version with improved figures and enhanced text presentation

  11. arXiv:2406.02039  [pdf, other]

    cs.AR

    LMB: Augmenting PCIe Devices with CXL-Linked Memory Buffer

    Authors: Jiapin Wang, Xiangping Zhang, Chenlei Tang, Xiang Chen, Tao Lu

    Abstract: PCIe devices, such as SSDs and GPUs, are pivotal in modern data centers, and their value is set to grow amidst the emergence of AI and large models. However, these devices face an onboard DRAM shortage due to internal space limitations, which prevent accommodation of sufficient DRAM modules alongside flash or GPU processing chips. Current solutions either curb device-internal memory usage or supple…

    Submitted 4 June, 2024; originally announced June 2024.

  12. arXiv:2405.19511  [pdf, other]

    astro-ph.EP

    Planet-Planet Scattering and ZLK Migration -- The Dynamical History of HAT-P-11

    Authors: Tiger Lu, Qier An, Gongjie Li, Sarah C. Millholland, G. Mirek Brandt, Timothy D. Brandt

    Abstract: The two planets of the HAT-P-11 system represent fascinating dynamical puzzles due to their significant eccentricities and orbital misalignments. In particular, HAT-P-11 b is on a close-in orbit that tides should have circularized well within the age of the system. Here we propose a two-step dynamical process that can reproduce all intriguing aspects of the system. We first invoke planet-planet sc…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 17 pages, 8 figures, submitted to ApJ

  13. arXiv:2405.19510  [pdf, other]

    astro-ph.EP

    Significant mutual inclinations between the stellar spin and the orbits of both planets in the HAT-P-11 system

    Authors: Qier An, Tiger Lu, G. Mirek Brandt, Timothy D Brandt, Gongjie Li

    Abstract: Planet-star and planet-planet obliquity encode a planetary system's dynamical history, but both obliquities are hard to measure for misaligned systems with close-in companions. HAT-P-11 is a K4 star with two known planets: a close-in, misaligned super-Neptune with an approximately 5-day orbit, and an outer super-Jupiter with an approximately 10-year orbit. In this work we present a joint orbit fit of HAT-P-11 sys…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 12 pages, 8 figures, submitted to AJ

  14. arXiv:2405.07527  [pdf, other]

    cs.LG cs.AI

    Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models

    Authors: Yubin Shi, Yixuan Chen, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert P. Dick, Qin Lv, Yingying Zhao, Fan Yang, Tun Lu, Ning Gu, Li Shang

    Abstract: Despite their prevalence in deep-learning communities, over-parameterized models convey high demands of computational costs for proper training. This work studies the fine-grained, modular-level learning dynamics of over-parameterized models to attain a more efficient and fruitful training strategy. Empirical evidence reveals that when scaling down into network modules, such as heads in self-atten…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted at NeurIPS 2023

  15. arXiv:2405.03800  [pdf, other]

    astro-ph.EP astro-ph.IM physics.comp-ph

    TRACE: a Time-Reversible Algorithm for Close Encounters

    Authors: Tiger Lu, David M. Hernandez, Hanno Rein

    Abstract: We present TRACE, a time-reversible hybrid integrator for the planetary N-body problem. Like hybrid symplectic integrators, TRACE can resolve close encounters between particles while retaining many of the accuracy and speed advantages of a fixed timestep symplectic method such as the Wisdom-Holman map. TRACE switches methods time-reversibly during close encounters following the prescription of Hernan…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Submitted to MNRAS

  16. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  17. arXiv:2404.14316  [pdf, other]

    cs.CL

    Automated Long Answer Grading with RiceChem Dataset

    Authors: Shashank Sonkar, Kangqi Ni, Lesa Tran Lu, Kristi Kincaid, John S. Hutchinson, Richard G. Baraniuk

    Abstract: We introduce a new area of study in the field of educational Natural Language Processing: Automated Long Answer Grading (ALAG). Distinguishing itself from Automated Short Answer Grading (ASAG) and Automated Essay Grading (AEG), ALAG presents unique challenges due to the complexity and multifaceted nature of fact-based long answers. To study ALAG, we introduce RiceChem, a dataset derived from a col…

    Submitted 22 April, 2024; originally announced April 2024.

  18. arXiv:2404.13680  [pdf, other]

    cs.CV cs.AI

    Zero-shot High-fidelity and Pose-controllable Character Animation

    Authors: Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Guo-Jun Qi, Yu-Gang Jiang

    Abstract: Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity. However, existing approaches suffer from inconsistency of character appearances and poor preservation of fine details. Moreover, they require a large amount of video data for training, which can be computationally demanding. To address these limitations,…

    Submitted 5 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: 10 pages, 5 figures

  19. arXiv:2404.11044  [pdf, other]

    cs.AR

    Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access

    Authors: Luming Wang, Xu Zhang, Songyue Wang, Zhuolun Jiang, Tianyue Lu, Mingyu Chen, Siwei Luo, Keji Huang

    Abstract: The growing memory demands of modern applications have driven the adoption of far memory technologies in data centers to provide cost-effective, high-capacity memory solutions. However, far memory presents new performance challenges because its access latencies are significantly longer and more variable than local DRAM. For applications to achieve acceptable performance on far memory, a high degre…

    Submitted 16 April, 2024; originally announced April 2024.

  20. arXiv:2404.06514  [pdf, other]

    quant-ph cond-mat.stat-mech cond-mat.str-el

    Disentangling transitions in topological order induced by boundary decoherence

    Authors: Tsung-Cheng Lu

    Abstract: We study the entanglement structure of topological orders subject to decoherence on the bipartition boundary. Focusing on the toric codes in $d$ space dimensions for $d=2,3,4$, we explore whether the boundary decoherence may be able to induce a disentangling transition, characterized by the destruction of mixed-state long-range entanglement across the bipartition, measured by topological entanglem…

    Submitted 26 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 16 pages, 5 figures; typos fixed

  21. arXiv:2404.00723  [pdf, other]

    quant-ph physics.optics

    Quantum Weak Force Sensing with Squeezed Magnomechanics

    Authors: Qian Zhang, Jie Wang, Tian-Xiang Lu, Franco Nori, Hui Jing

    Abstract: Cavity magnomechanics, exhibiting remarkable experimental tunability, rich magnonic nonlinearities, and compatibility with various quantum systems, has witnessed considerable advances in recent years. However, the potential benefits of using cavity magnomechanical (CMM) systems in further improving the performance of quantum-enhanced sensing for weak forces remain largely unexplored. Here we show…

    Submitted 31 March, 2024; originally announced April 2024.

  22. arXiv:2403.17898  [pdf, other]

    cs.CV

    Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians

    Authors: Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, Bo Dai

    Abstract: The recent 3D Gaussian splatting (3D-GS) has shown remarkable rendering fidelity and efficiency compared to NeRF-based neural scene representations. While demonstrating the potential for real-time rendering, 3D-GS encounters rendering bottlenecks in large scenes with complex details due to an excessive number of Gaussian primitives located within the viewing frustum. This limitation is particularl…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Project page: https://city-super.github.io/octree-gs/

  23. arXiv:2403.16964  [pdf, other]

    cs.CV

    GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction

    Authors: Mulin Yu, Tao Lu, Linning Xu, Lihan Jiang, Yuanbo Xiangli, Bo Dai

    Abstract: Presenting a 3D scene from multiview images remains a core and long-standing challenge in computer vision and computer graphics. Two main requirements lie in rendering and reconstruction. Notably, SOTA rendering quality is usually achieved with neural volumetric rendering techniques, which rely on aggregated point/primitive-wise color and neglect the underlying scene geometry. Learning of neural i…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Project page: https://city-super.github.io/GSDF

  24. arXiv:2403.12995  [pdf, other]

    q-bio.BM cs.CE cs.LG

    ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling

    Authors: Kangjie Zheng, Siyu Long, Tianyu Lu, Junwei Yang, Xinyu Dai, Ming Zhang, Zaiqing Nie, Wei-Ying Ma, Hao Zhou

    Abstract: Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small mole…

    Submitted 12 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: ICML2024 camera-ready, update some experimental results, add github url, fix some typos

  25. arXiv:2403.09979  [pdf, other]

    quant-ph

    Quantum Advantage of One-Way Squeezing in Enhancing Weak-Force Sensing

    Authors: Jie Wang, Qian Zhang, Ya-Feng Jiao, Sheng-Dian Zhang, Tian-Xiang Lu, Zhipeng Li, Cheng-Wei Qiu, Hui Jing

    Abstract: Cavity optomechanical (COM) sensors, featuring efficient light-motion couplings, have been widely used for ultra sensitive measurements of various physical quantities ranging from displacements to accelerations or weak forces. Previous works, however, have mainly focused on reciprocal COM systems. Here, we propose how to further improve the performance of quantum COM sensors by breaking reciprocal…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 7 pages, 3 figures

  26. arXiv:2403.09626  [pdf, other]

    cs.CV

    Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

    Authors: Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, Limin Wang

    Abstract: Understanding videos is one of the fundamental directions in computer vision research, with extensive efforts dedicated to exploring various architectures such as RNN, 3D CNN, and Transformers. The newly proposed architecture of state space model, e.g., Mamba, shows promising traits to extend its success in long sequence modeling to video modeling. To assess whether Mamba can be a viable alternati…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Technical Report

  27. arXiv:2403.04247  [pdf, other]

    cs.CL

    UltraWiki: Ultra-fine-grained Entity Set Expansion with Negative Seed Entities

    Authors: Yangning Li, Qingsong Lv, Tianyu Yu, Yinghui Li, Shulin Huang, Tingwei Lu, Xuming Hu, Wenhao Jiang, Hai-Tao Zheng, Hui Wang

    Abstract: Entity Set Expansion (ESE) aims to identify new entities belonging to the same semantic class as a given set of seed entities. Traditional methods primarily relied on positive seed entities to represent a target semantic class, which poses a challenge for the representation of ultra-fine-grained semantic classes. Ultra-fine-grained semantic classes are defined based on fine-grained semantic classes…

    Submitted 23 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Initial Version

  28. arXiv:2403.03419  [pdf, other]

    cs.CL cs.AI

    Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization

    Authors: Shitong Duan, Xiaoyuan Yi, Peng Zhang, Tun Lu, Xing Xie, Ning Gu

    Abstract: Large language models (LLMs) have revolutionized the role of AI, yet also pose potential risks of propagating unethical content. Alignment technologies have been introduced to steer LLMs towards human preference, gaining increasing attention. Despite notable breakthroughs in this direction, existing methods heavily rely on high-quality positive-negative training pairs, suffering from noisy labels…

    Submitted 5 March, 2024; originally announced March 2024.

  29. arXiv:2403.02308  [pdf, other]

    cs.CV

    Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

    Authors: Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang

    Abstract: Transformers have revolutionized computer vision and natural language processing, but their high computational complexity limits their application in high-resolution image processing and long-context analysis. This paper introduces Vision-RWKV (VRWKV), a model adapted from the RWKV model used in the NLP field with necessary modifications for vision tasks. Similar to the Vision Transformer (ViT), o…

    Submitted 7 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  30. arXiv:2402.19327  [pdf]

    cond-mat.mtrl-sci

    GPTFF: A high-accuracy out-of-the-box universal AI force field for arbitrary inorganic materials

    Authors: Fankai Xie, Tenglong Lu, Sheng Meng, Miao Liu

    Abstract: This study introduces a novel AI force field, namely graph-based pre-trained transformer force field (GPTFF), which can simulate arbitrary inorganic systems with good precision and generalizability. Harnessing a large trove of the data and the attention mechanism of transformer algorithms, the model can accurately predict energy, atomic forces, and stress with Mean Absolute Error (MAE) values of 3…

    Submitted 29 February, 2024; originally announced February 2024.

  31. arXiv:2402.15991  [pdf, other]

    cs.CL

    $C^3$: Confidence Calibration Model Cascade for Inference-Efficient Cross-Lingual Natural Language Understanding

    Authors: Taixi Lu, Haoyu Wang, Huajie Shao, Jing Gao, Huaxiu Yao

    Abstract: Cross-lingual natural language understanding (NLU) is a critical task in natural language processing (NLP). Recent advancements have seen multilingual pre-trained language models (mPLMs) significantly enhance the performance of these tasks. However, mPLMs necessitate substantial resources and incur high computational costs during inference, posing challenges for deployment in real-world and real-t…

    Submitted 25 February, 2024; originally announced February 2024.

  32. arXiv:2402.08426  [pdf, other]

    cs.IR cs.LG

    Frequency-aware Graph Signal Processing for Collaborative Filtering

    Authors: Jiafeng Xia, Dongsheng Li, Hansu Gu, Tun Lu, Peng Zhang, Li Shang, Ning Gu

    Abstract: Graph Signal Processing (GSP) based recommendation algorithms have recently attracted lots of attention due to their high efficiency. However, these methods failed to consider the importance of various interactions that reflect unique user/item characteristics and failed to utilize user and item high-order neighborhood information to model user preference, thus leading to sub-optimal performance. To…

    Submitted 13 February, 2024; originally announced February 2024.

  33. arXiv:2402.02374  [pdf, other]

    cs.CV

    PromptRR: Diffusion Models as Prompt Generators for Single Image Reflection Removal

    Authors: Tao Wang, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Tae-Kyun Kim, Tong Lu, Hongdong Li, Ming-Hsuan Yang

    Abstract: Existing single image reflection removal (SIRR) methods using deep learning tend to miss key low-frequency (LF) and high-frequency (HF) differences in images, affecting their effectiveness in removing reflections. To address this problem, this paper proposes a novel prompt-guided reflection removal (PromptRR) framework that uses frequency information as new visual prompts for better reflection per…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 10 pages, 10 figures

  34. InteractOut: Leveraging Interaction Proxies as Input Manipulation Strategies for Reducing Smartphone Overuse

    Authors: Tao Lu, Hongxiao Zheng, Tianying Zhang, Xuhai Xu, Anhong Guo

    Abstract: Smartphone overuse poses risks to people's physical and mental health. However, current intervention techniques mainly focus on explicitly changing screen content (i.e., output) and often fail to persistently reduce smartphone overuse due to being over-restrictive or over-flexible. We present the design and implementation of InteractOut, a suite of implicit input manipulation techniques that lever…

    Submitted 19 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: CHI 2024

  35. arXiv:2401.15261  [pdf, other]

    cs.CV

    Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

    Authors: Diandian Guo, Deng-Ping Fan, Tongyu Lu, Christos Sakaridis, Luc Van Gool

    Abstract: The estimation of implicit cross-frame correspondences and the high computational cost have long been major challenges in video semantic segmentation (VSS) for driving scenes. Prior works utilize keyframes, feature propagation, or cross-frame attention to address these issues. By contrast, we are the first to harness vanishing point (VP) priors for more effective segmentation. Intuitively, objects…

    Submitted 25 April, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: CVPR 2024 highlight

  36. arXiv:2401.13723  [pdf, other]

    astro-ph.IM astro-ph.EP physics.soc-ph

    Emerging Researchers in Exoplanetary Science (ERES): Lessons Learned in Conference Organization for Early-Career Researchers

    Authors: W. Garrett Levine, Konstantin Gerbig, Emma M. Louden, Tiger Lu, Cheng-Han Hsieh, Christopher O'Connor, Rixin Li, Jiayin Dong

    Abstract: Since 2015, the Emerging Researchers in Exoplanetary Science (ERES) conference has provided a venue for early-career researchers in exoplanetary astronomy, astrophysics, and planetary science to share their research, network, and build new collaborations. ERES stands out in that it is spearheaded by early-career researchers, providing a unique attendance experience for the participants and a profe…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: To appear in the Bulletin of the American Astronomical Society (see DOI); 13 pages, 6 figures

  37. arXiv:2401.10529  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

    Authors: Xiyao Wang, Yuhang Zhou, Xiaoyu Liu, Hongjin Lu, Yuancheng Xu, Feihong He, Jaehong Yoon, Taixi Lu, Gedas Bertasius, Mohit Bansal, Huaxiu Yao, Furong Huang

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated proficiency in handling a variety of visual-language tasks. However, current MLLM benchmarks are predominantly designed to evaluate reasoning based on static information about a single image, and the ability of modern MLLMs to extrapolate from image sequences, which is essential for understanding our ever-changing world, has been less inve…

    Submitted 24 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: 27 pages, 23 figures

  38. arXiv:2401.10208  [pdf, other]

    cs.CV cs.CL

    MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

    Authors: Changyao Tian, Xizhou Zhu, Yuwen Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Yuntao Chen, Lewei Lu, Tong Lu, Jie Zhou, Hongsheng Li, Yu Qiao, Jifeng Dai

    Abstract: Developing generative models for interleaved image-text data has both research and practical value. It requires models to understand the interleaved sequences and subsequently generate images and text. However, existing attempts are limited by the issue that the fixed number of visual tokens cannot efficiently capture image details, which is particularly problematic in the multi-image scenarios. T…

    Submitted 2 April, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 20 pages, 9 figures, 17 tables

  39. arXiv:2401.08614  [pdf, ps, other]

    math.QA

    Computing the Haar state of $\mathcal{O}(SL_q(3))$ using value preserving (anti)homomorphisms

    Authors: Ting Lu

    Abstract: In this paper, we introduce two (anti)homomorphisms that preserve the Haar state values of monomials. Together with the modular automorphism, the three (anti)homomorphisms are used in our new algorithm to compute the Haar states of monomials on $\mathcal{O}(SL_q(3))$. Comparing with the algorithm proposed in the author's previous work \cite{lu2023}, the new algorithm reduces the linear relations u…

    Submitted 26 April, 2024; v1 submitted 1 December, 2023; originally announced January 2024.

    Comments: Removed text overlap with arXiv:2301.12683. Updated the introduction section and changed title for section 2

    MSC Class: 20G42(Primary) 46L53(Secondary)

  40. arXiv:2401.08036  [pdf, other]

    cs.CV

    3D Lane Detection from Front or Surround-View using Joint-Modeling & Matching

    Authors: Haibin Zhou, Huabing Zhou, Jun Chang, Tao Lu, Jiayi Ma

    Abstract: 3D lanes offer a more comprehensive understanding of the road surface geometry than 2D lanes, thereby providing crucial references for driving decisions and trajectory planning. While many efforts aim to improve prediction accuracy, we recognize that an efficient network can bring results closer to lane modeling. However, if the modeling data is imprecise, the results might not accurately capture…

    Submitted 28 May, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). 13 pages with 9 figures and 6 tables

  41. arXiv:2401.06197  [pdf, other]

    cs.CV

    Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

    Authors: Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai

    Abstract: We introduce Deformable Convolution v4 (DCNv4), a highly efficient and effective operator designed for a broad spectrum of vision applications. DCNv4 addresses the limitations of its predecessor, DCNv3, with two key enhancements: 1. removing softmax normalization in spatial aggregation to enhance its dynamic property and expressive power and 2. optimizing memory access to minimize redundant operat…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: Tech report; Code: https://github.com/OpenGVLab/DCNv4

  42. CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers

    Authors: Yi Rong, Haoran Zhou, Lixin Yuan, Cheng Mei, Jiahao Wang, Tong Lu

    Abstract: Point cloud completion is an indispensable task for recovering complete point clouds due to incompleteness caused by occlusion, limited sensor resolution, etc. The family of coarse-to-fine generation architectures has recently exhibited great success in point cloud completion and gradually became mainstream. In this work, we unveil one of the key ingredients behind these methods: meticulously devi…

    Submitted 14 February, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted to AAAI 2024

  43. arXiv:2312.17235  [pdf, other

    cs.CV

    A Simple LLM Framework for Long-Range Video Question-Answering

    Authors: Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius

    Abstract: We present LLoVi, a language-based framework for long-range video question-answering (LVQA). Unlike prior long-range video understanding methods, which are often costly and require specialized long-range video modeling design (e.g., memory queues, state-space layers, etc.), our approach uses a frame/clip-level visual captioner (e.g., BLIP2, LaViLa, LLaVA) coupled with a Large Language Model (GPT-3…

    Submitted 26 February, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  44. arXiv:2312.15690  [pdf, other

    cs.CV

    Word length-aware text spotting: Enhancing detection and recognition in dense text image

    Authors: Hao Wang, Huabing Zhou, Yanduo Zhang, Tao Lu, Jiayi Ma

    Abstract: Scene text spotting is essential in various computer vision applications, enabling extracting and interpreting textual information from images. However, existing methods often neglect the spatial semantics of word images, leading to suboptimal detection recall rates for long and short words within long-tailed word length distributions that exist prominently in dense scenes. In this paper, we prese…

    Submitted 25 December, 2023; originally announced December 2023.

  45. arXiv:2312.14238  [pdf, other

    cs.CV

    InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

    Authors: Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai

    Abstract: The exponential growth of large language models (LLMs) has opened up numerous possibilities for multimodal AGI systems. However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs. In this work, we design a large-scale vision-language foundation model (InternVL), which scales up the vision foundation model…

    Submitted 15 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 25 pages, 5 figures, 28 tables

  46. arXiv:2312.11968  [pdf, other

    physics.optics

    Multi-color nonreciprocal optical amplifier with spinning active optomechanics

    Authors: Ru-Ting Sun, Mei-Yu Peng, Tian-Xiang Lu, Ya-Feng Jiao, Jie Wang, Qian Zhang, Hui Jing

    Abstract: We propose to achieve a multi-color nonreciprocal optical amplifier, a crucial device in optical communication and information processing, by spinning an active resonator. We show that in such a device, due to the interplay of the Sagnac effect and the optical gain, nonreciprocal signal amplification can be realized, accompanied by a giant enhancement of optical group delay from…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 8 pages, 4 figures

  47. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  48. arXiv:2312.06896  [pdf, other

    physics.optics

    Quantum squeezing induced nonreciprocal phonon laser

    Authors: Tian-Xiang Lu, Yan Wang, Keyu Xia, Xing Xiao, Le-Man Kuang, Hui Jing

    Abstract: Phonon lasers or coherent amplifications of mechanical oscillations have provided powerful tools for both fundamental studies of coherent acoustics and diverse applications ranging from ultrasensitive force sensing to phononic information processing. Here, we propose how to achieve directional phonon lasing with an optomechanical resonator coupled to a nonlinear optical resonator. We find that, by…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 11 pages, 4 figures

  49. arXiv:2312.03988  [pdf, other

    quant-ph

    Enhanced high-dimensional teleportation in correlated amplitude damping noise by weak measurement and environment-assisted measurement

    Authors: Xing Xiao, Tian-Xiang Lu, Yan-Ling Li

    Abstract: High-dimensional teleportation provides various benefits in quantum networks and repeaters, but all these advantages rely on the high-quality distribution of high-dimensional entanglement over a noisy channel. It is essential to consider correlation effects when two entangled qutrits travel consecutively through the same channel. In this paper, we present two strategies for enhancing qutrit telepo…

    Submitted 2 July, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: 18 pages, 5 figures. Figure 1 has been replaced

  50. arXiv:2312.03031  [pdf, other

    cs.CV

    Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

    Authors: Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez

    Abstract: End-to-end autonomous driving recently emerged as a promising research direction to target autonomy from a full-stack perspective. Along this line, many of the latest works follow an open-loop evaluation setting on nuScenes to study the planning behavior. In this paper, we delve deeper into the problem by conducting thorough analyses and demystifying more devils in the details. We initially observ…

    Submitted 2 June, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2024