Showing 1–50 of 650 results for author: Kong, L

  1. arXiv:2407.06190  [pdf, other]

    cs.CV cs.LG cs.RO

    4D Contrastive Superflows are Dense 3D Representation Learners

    Authors: Xiang Xu, Lingdong Kong, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu

    Abstract: In the realm of autonomous driving, accurate 3D perception is the foundation. However, developing such models relies on extensive human annotations -- a process that is both costly and labor-intensive. To address this challenge from a data representation learning perspective, we introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing spatiotempora…

    Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; 36 pages, 11 figures, 11 tables; Code at https://github.com/Xiangxu-0103/SuperFlow

  2. arXiv:2407.02641  [pdf, other]

    cs.LG cs.AI

    Learning Graph Structures and Uncertainty for Accurate and Calibrated Time-series Forecasting

    Authors: Harshavardhan Kamarthi, Lingkai Kong, Alexander Rodriguez, Chao Zhang, B Aditya Prakash

    Abstract: Multi-variate time series forecasting is an important problem with a wide range of applications. Recent works model the relations between time-series as graphs and have shown that propagating information over the relation graph can improve time series forecasting. However, in many cases, relational information is not available or is noisy and unreliable. Moreover, most works ignore the underlying un…

    Submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2406.18201  [pdf, other]

    eess.IV cs.CV

    EFCNet: Every Feature Counts for Small Medical Object Segmentation

    Authors: Lingjie Kong, Qiaoling Wei, Chengming Xu, Han Chen, Yanwei Fu

    Abstract: This paper explores the segmentation of very small medical objects with significant clinical value. While Convolutional Neural Networks (CNNs), particularly UNet-like models, and recent Transformers have shown substantial progress in image segmentation, our empirical findings reveal their poor performance in segmenting the small medical objects and lesions concerned in this paper. This limitation…

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.16976  [pdf, other]

    cs.NE cs.AI cs.LG physics.chem-ph

    Efficient Evolutionary Search Over Chemical Space with Large Language Models

    Authors: Haorui Wang, Marta Skreta, Cher-Tian Ser, Wenhao Gao, Lingkai Kong, Felix Strieth-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang

    Abstract: Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations…

    Submitted 2 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  5. arXiv:2406.16896  [pdf, other]

    eess.SP cs.LG

    f-GAN: A frequency-domain-constrained generative adversarial network for PPG to ECG synthesis

    Authors: Nathan C. L. Kong, Dae Lee, Huyen Do, Dae Hoon Park, Cong Xu, Hongda Mao, Jonathan Chung

    Abstract: Electrocardiograms (ECGs) and photoplethysmograms (PPGs) are generally used to monitor an individual's cardiovascular health. In clinical settings, ECGs and fingertip PPGs are the main signals used for assessing cardiovascular health, but the equipment necessary for their collection precludes their use in daily monitoring. Although PPGs obtained from wrist-worn devices are susceptible to noise due…

    Submitted 15 May, 2024; originally announced June 2024.

  6. arXiv:2406.16708  [pdf, other]

    cs.LG stat.ME

    CausalFormer: An Interpretable Transformer for Temporal Causal Discovery

    Authors: Lingbai Kong, Wengen Li, Hanchen Yang, Yichao Zhang, Jihong Guan, Shuigeng Zhou

    Abstract: Temporal causal discovery is a crucial task aimed at uncovering the causal relations within time series data. The latest temporal causal discovery methods usually train deep learning models on prediction tasks to uncover the causality between time series. They capture causal relations by analyzing the parameters of some components of the trained models, e.g., attention weights and convolution weig…

    Submitted 24 June, 2024; originally announced June 2024.

  7. arXiv:2406.16567  [pdf, other]

    cs.CL

    Data Augmentation of Multi-turn Psychological Dialogue via Knowledge-driven Progressive Thought Prompting

    Authors: Jiyue Jiang, Liheng Chen, Sheng Wang, Lingpeng Kong, Yu Li, Chuan Wu

    Abstract: Existing dialogue data augmentation (DA) techniques predominantly focus on augmenting utterance-level dialogues, which makes it difficult to take dialogue contextual information into account. The advent of large language models (LLMs) has simplified the implementation of multi-turn dialogues. Due to the absence of professional understanding and knowledge, it remains challenging to deliver satisfactory…

    Submitted 24 June, 2024; originally announced June 2024.

  8. arXiv:2406.16500  [pdf, other]

    cs.NE

    A Dual-Channel Particle Swarm Optimization Algorithm Based on Adaptive Balance Search

    Authors: Zhenxing Zhang, Tianxian Zhang, Xiangliang Xu, Lingjiang Kong, Yi Han, Zicheng Wang

    Abstract: The balance between exploration (Er) and exploitation (Ei) determines the generalization performance of the particle swarm optimization (PSO) algorithm on different problems. Although the insufficient balance caused by global best being located near a local minimum has been widely researched, few scholars have systematically paid attention to two behaviors about personal best position (P) and glob…

    Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  9. arXiv:2406.16064  [pdf, other]

    cond-mat.mes-hall physics.comp-ph

    MicroMagnetic.jl: A Julia package for micromagnetic and atomistic simulations with GPU support

    Authors: Weiwei Wang, Boyao Lyu, Lingyao Kong, Hans Fangohr, Haifeng Du

    Abstract: MicroMagnetic.jl is an open-source Julia package for micromagnetic and atomistic simulations. Using the features of the Julia programming language, MicroMagnetic.jl supports CPU and various GPU platforms, including NVIDIA, AMD, Intel, and Apple GPUs. Moreover, MicroMagnetic.jl supports Monte Carlo simulations for atomistic models and implements the Nudged-Elastic-Band method for energy barrier com…

    Submitted 23 June, 2024; originally announced June 2024.

  10. arXiv:2406.14683  [pdf, other]

    cs.LG cs.CL

    TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models

    Authors: Jiarui Feng, Hao Liu, Lecheng Kong, Yixin Chen, Muhan Zhang

    Abstract: In this report, we present TAGLAS, an atlas of text-attributed graph (TAG) datasets and benchmarks. TAGs are graphs with node and edge features represented in text, which have recently gained wide applicability in training graph-language or graph foundation models. In TAGLAS, we collect and integrate more than 23 TAG datasets with domains ranging from citation graphs to molecule graphs and tasks f…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  11. arXiv:2406.14393  [pdf, other]

    cs.LG cs.CL

    Jailbreaking as a Reward Misspecification Problem

    Authors: Zhihui Xie, Jiahui Gao, Lei Li, Zhenguo Li, Qi Liu, Lingpeng Kong

    Abstract: The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric ReGap to quantify the extent of reward misspecification and d…

    Submitted 12 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: GitHub URL added

  12. arXiv:2406.12214  [pdf, other]

    cs.RO cs.CV

    Is Your HD Map Constructor Reliable under Sensor Corruptions?

    Authors: Xiaoshuai Hao, Mengchuan Wei, Yifan Yang, Haimei Zhao, Hui Zhang, Yi Zhou, Qiang Wang, Weiming Li, Lingdong Kong, Jing Zhang

    Abstract: Driving systems often rely on high-definition (HD) maps for precise environmental information, which is crucial for planning and navigation. While current HD map constructors perform well under ideal conditions, their resilience to real-world challenges, e.g., adverse weather and sensor failures, is not well understood, raising safety concerns. This work introduces MapBench, the first comprehensive…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: project url: https://mapbench.github.io/

  13. arXiv:2406.11643  [pdf, other]

    cs.CV

    AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection

    Authors: Lingjie Kong, Kai Wu, Xiaobin Hu, Wenhui Han, Jinlong Peng, Chengming Xu, Donghao Luo, Jiangning Zhang, Chengjie Wang, Yanwei Fu

    Abstract: Text-to-image based object customization, aiming to generate images with the same identity (ID) as objects of interest in accordance with text prompts and reference images, has made significant progress. However, recent customizing research is dominated by specialized tasks, such as human customization or virtual try-on, leaving a gap in general object customization. To this end, we introduce AnyM…

    Submitted 5 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  14. arXiv:2406.09387  [pdf, other]

    stat.ML stat.CO stat.ME

    Oblivious subspace embeddings for compressed Tucker decompositions

    Authors: Matthew Pietrosanu, Bei Jiang, Linglong Kong

    Abstract: Emphasis in the tensor literature on random embeddings (tools for low-distortion dimension reduction) for the canonical polyadic (CP) tensor decomposition has left analogous results for the more expressive Tucker decomposition comparatively lacking. This work establishes general Johnson-Lindenstrauss (JL) type guarantees for the estimation of Tucker decompositions when an oblivious random embeddin…

    Submitted 13 June, 2024; originally announced June 2024.

    MSC Class: 68W20; ACM Class: G.3

  15. arXiv:2406.09130  [pdf, other]

    cs.LG cs.AI

    Time-Series Forecasting for Out-of-Distribution Generalization Using Invariant Learning

    Authors: Haoxin Liu, Harshavardhan Kamarthi, Lingkai Kong, Zhiyuan Zhao, Chao Zhang, B. Aditya Prakash

    Abstract: Time-series forecasting (TSF) finds broad applications in real-world scenarios. Due to the dynamic nature of time-series data, it is crucial to equip TSF models with out-of-distribution (OOD) generalization abilities, as historical training data and future test data can have different distributions. In this paper, we aim to alleviate the inherent OOD problem in TSF via invariant learning. We ident…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 14 pages

    ACM Class: H.0

  16. arXiv:2406.08627  [pdf, other]

    cs.LG cs.CL

    Time-MMD: A New Multi-Domain Multimodal Dataset for Time Series Analysis

    Authors: Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Kamarthi, Aditya B. Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, B. Aditya Prakash

    Abstract: Time series data are ubiquitous across a wide range of real-world domains. While real-world time series analysis (TSA) requires human experts to integrate numerical series data with multimodal domain-specific knowledge, most existing TSA models rely solely on numerical data, overlooking the significance of information beyond numerical series. This oversight is due to the untapped potential of text…

    Submitted 12 June, 2024; originally announced June 2024.

  17. arXiv:2406.06649  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    2DQuant: Low-bit Post-Training Quantization for Image Super-Resolution

    Authors: Kai Liu, Haotong Qin, Yong Guo, Xin Yuan, Linghe Kong, Guihai Chen, Yulun Zhang

    Abstract: Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively. However, it is notorious that low-bit quantization degrades the accuracy of SR models compared to their ful…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures. The code and models will be available at https://github.com/Kai-Liu001/2DQuant

  18. arXiv:2406.05954  [pdf, other]

    cs.AI cs.LG eess.SY

    Aligning Large Language Models with Representation Editing: A Control Perspective

    Authors: Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

    Abstract: Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabi…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: fix typos

  19. arXiv:2406.05723  [pdf, other]

    cs.CV

    Binarized Diffusion Model for Image Super-Resolution

    Authors: Zheng Chen, Haotong Qin, Yong Guo, Xiongfei Su, Xin Yuan, Linghe Kong, Yulun Zhang

    Abstract: Advanced diffusion models (DMs) perform impressively in image super-resolution (SR), but the high memory and computational costs hinder their deployment. Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating DMs. Nonetheless, due to the model structure and the multi-step iterative attribute of DMs, existing binarization methods result in significant perfor…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Code is available at https://github.com/zhengchen1999/BI-DiffSR

  20. arXiv:2406.05316  [pdf, other]

    cs.LG

    C-Mamba: Channel Correlation Enhanced State Space Models for Multivariate Time Series Forecasting

    Authors: Chaolv Zeng, Zhanyu Liu, Guanjie Zheng, Linghe Kong

    Abstract: In recent years, significant progress has been made in multivariate time series forecasting using Linear-based, Transformer-based, and Convolution-based models. However, these approaches face notable limitations: linear forecasters struggle with representation capacities, attention mechanisms suffer from quadratic complexity, and convolutional models have a restricted receptive field. These constr…

    Submitted 7 June, 2024; originally announced June 2024.

  21. arXiv:2406.04744  [pdf, other]

    cs.CL

    CRAG -- Comprehensive RAG Benchmark

    Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, et al. (2 additional authors not shown)

    Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering bench…

    Submitted 7 June, 2024; originally announced June 2024.

  22. arXiv:2406.02131  [pdf, other]

    cs.LG cs.AI

    CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting

    Authors: Jianrong Ding, Zhanyu Liu, Guanjie Zheng, Haiming Jin, Linghe Kong

    Abstract: Dataset condensation is a newborn technique that generates a small dataset that can be used in training deep neural networks to lower training costs. The objective of dataset condensation is to ensure that the model trained with the synthetic dataset can perform comparably to the model trained with full datasets. However, existing methods predominantly concentrate on classification tasks, posing c…

    Submitted 11 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 23 pages, 13 figures

  23. arXiv:2406.01876  [pdf, other]

    cs.DB cs.AI cs.CL cs.IR cs.LG

    GRAM: Generative Retrieval Augmented Matching of Data Schemas in the Context of Data Security

    Authors: Xuanqing Liu, Luyang Kong, Runhui Wang, Patrick Song, Austin Nevins, Henrik Johnson, Nimish Amlathe, Davor Golac

    Abstract: Schema matching constitutes a pivotal phase in the data ingestion process for contemporary database systems. Its objective is to discern pairwise similarities between two sets of attributes, each associated with a distinct data table. This challenge emerges at the initial stages of data analytics, such as when incorporating a third-party table into existing databases to inform business insights. G…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: KDD 2024 Camera Ready; 11 pages, 8 figures

  24. arXiv:2406.00519  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Discrete Concepts in Latent Hierarchical Models

    Authors: Lingjing Kong, Guangyi Chen, Biwei Huang, Eric P. Xing, Yuejie Chi, Kun Zhang

    Abstract: Learning concepts from natural high-dimensional data (e.g., images) holds potential in building human-aligned and interpretable machine learning models. Despite its encouraging prospect, formalization and theoretical insights into this crucial task are still lacking. In this work, we formalize concepts as discrete latent causal variables that are related via a hierarchical causal model that encode…

    Submitted 1 June, 2024; originally announced June 2024.

  25. arXiv:2405.20390  [pdf, other]

    cs.LG math.NA math.OC stat.ML

    Quantitative Convergences of Lie Group Momentum Optimizers

    Authors: Lingkai Kong, Molei Tao

    Abstract: Explicit, momentum-based dynamics that optimize functions defined on Lie groups can be constructed via variational optimization and momentum trivialization. Structure preserving time discretizations can then turn this dynamics into optimization algorithms. This article investigates two types of discretization, Lie Heavy-Ball, which is a known splitting scheme, and Lie NAG-SC, which is newly propos…

    Submitted 30 May, 2024; originally announced May 2024.

  26. arXiv:2405.20357  [pdf]

    eess.IV physics.app-ph physics.optics

    Encryption in ghost imaging with Kronecker products of random matrices

    Authors: Yi-Ning Zhao, Lin-Shan Chen, Lingxin Kong, Chong Wang, Cheng Ren, De-Zhong Cao

    Abstract: By forming measurement matrices with the Kronecker product of two random matrices, image encryption in computational ghost imaging is investigated. The two-dimensional images are conveniently reconstructed with the pseudo-inverse matrices of the two random matrices. To suppress the noise, the method of truncated singular value decomposition can be applied to either or both of the two pseudo-invers…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 5 pages, 4 figures

  27. Searching for the highest energy of pulsation and critical luminosity of Swift J0243.6+6124 observed by Insight-HXMT

    Authors: Qing-Xia Zhao, Xian Hou, Ming-Yu Ge, Shuang-Nan Zhang, Yun-Xiang Xiao, You-Li Tuo, Zi-Xu Yang, Ling-Da Kong, Jin-Lu Qu, Shu Zhang, Jian-Cheng Wang

    Abstract: Owing to the broad energy coverage of Insight-HXMT in the hard X-ray band, we detected the highest energy of pulsation exceeding 200 keV around the 2017-2018 outburst peak of the first Galactic pulsating ultraluminous X-ray source (PULX) Swift J0243.6+6124, which is the highest energy detected from PULXs to date. We also obtained the highest energy of pulsation of every exposure during the outburs…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 9 pages, 9 figures, published in RAA

  28. arXiv:2405.17741  [pdf, other]

    cs.AI

    LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design

    Authors: Rui Kong, Qiyang Li, Xinyu Fang, Qingtian Feng, Qingfeng He, Yazhu Dong, Weijun Wang, Yuanchun Li, Linghe Kong, Yunxin Liu

    Abstract: Recent literature has found that an effective method to customize or further improve large language models (LLMs) is to add dynamic adapters, such as low-rank adapters (LoRA) with Mixture-of-Experts (MoE) structures. Though such dynamic adapters incur modest computational complexity, they surprisingly lead to huge inference latency overhead, slowing down the decoding speed by 2.5+ times. In this p…

    Submitted 27 May, 2024; originally announced May 2024.

  29. arXiv:2405.17426  [pdf, other]

    cs.CV cs.RO

    Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving

    Authors: Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu

    Abstract: Recent advancements in bird's eye view (BEV) representations have shown remarkable promise for in-vehicle 3D perception. However, while these methods have achieved impressive results on standard benchmarks, their robustness in varied conditions remains insufficiently assessed. In this study, we present RoboBEV, an extensive benchmark suite designed to evaluate the resilience of BEV algorithms. Thi…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 13 figures, 11 tables; Code at https://github.com/Daniel-xsy/RoboBEV

  30. arXiv:2405.16381  [pdf, other]

    cs.LG cs.AI stat.ML

    Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups

    Authors: Yuchen Zhu, Tianrong Chen, Lingkai Kong, Evangelos A. Theodorou, Molei Tao

    Abstract: The generative modeling of data on manifold is an important task, for which diffusion models in flat spaces typically need nontrivial adaptations. This article demonstrates how a technique called 'trivialization' can transfer the effectiveness of diffusion models in Euclidean spaces to Lie groups. In particular, an auxiliary momentum variable was algorithmically introduced to help transport the po…

    Submitted 25 May, 2024; originally announced May 2024.

  31. arXiv:2405.15756  [pdf, other]

    cs.LG cs.AI

    Sparse Expansion and Neuronal Disentanglement

    Authors: Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir Shavit

    Abstract: We show how to improve the inference efficiency of an LLM by expanding it into a mixture of sparse experts, where each expert is a copy of the original weights, one-shot pruned for a specific cluster of input values. We call this approach Sparse Expansion. We show that, for models such as Llama 2 70B, as we increase the number of sparse experts, Sparse Expansion outperforms all other on…

    Submitted 24 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures

  32. arXiv:2405.14870  [pdf, other]

    cs.CV cs.RO

    An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models

    Authors: Jiahao Sun, Chunmei Qing, Xiang Xu, Lingdong Kong, Youquan Liu, Li Li, Chenming Zhu, Jingwei Zhang, Zeqi Xiao, Runnan Chen, Tai Wang, Wenwei Zhang, Kai Chen

    Abstract: In the rapidly evolving field of autonomous driving, precise segmentation of LiDAR data is crucial for understanding complex 3D environments. Traditional approaches often rely on disparate, standalone codebases, hindering unified advancements and fair benchmarking across models. To address these challenges, we introduce MMDetection3D-lidarseg, a comprehensive toolbox designed for the efficient tra…

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 4 figures, 7 tables; Code at https://github.com/open-mmlab/mmdetection3d

  33. arXiv:2405.14343  [pdf, other]

    cs.CV

    Efficient Visual State Space Model for Image Deblurring

    Authors: Lingshun Kong, Jiangxin Dong, Ming-Hsuan Yang, Jinshan Pan

    Abstract: Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. ViTs typically yield superior results in image restoration compared to CNNs due to their ability to capture long-range dependencies and input-dependent characteristics. However, the computational complexity of Transformer-based models grows quadratically with the image reso…

    Submitted 23 May, 2024; originally announced May 2024.

  34. arXiv:2405.14295  [pdf, other]

    cs.CV

    Focus Anywhere for Fine-grained Multi-page Document Understanding

    Authors: Chenglong Liu, Haoran Wei, Jinyue Chen, Lingyu Kong, Zheng Ge, Zining Zhu, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

    Abstract: Modern LVLMs still struggle to achieve fine-grained document understanding, such as OCR/translation/caption for regions of interest to the user, tasks that require the context of the entire page, or even multiple pages. Accordingly, this paper proposes Fox, an effective pipeline, hybrid data, and tuning strategy, that catalyzes LVLMs to focus anywhere on single/multi-page documents. We introduce a…

    Submitted 23 May, 2024; originally announced May 2024.

  35. arXiv:2405.08816  [pdf, other]

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  36. arXiv:2405.06889  [pdf, other]

    stat.ME math.OC

    Tuning parameter selection for the adaptive nuclear norm regularized trace regression

    Authors: Pan Shang, Lingchen Kong, Yiting Ma

    Abstract: Regularized models have been applied in many areas, with high-dimensional data sets being popular. Because the tuning parameter decides the theoretical performance and computational efficiency of the regularized models, tuning parameter selection is a basic and important issue. We consider the tuning parameter selection for adaptive nuclear norm regularized trace regression, which achieves by the B…

    Submitted 10 May, 2024; originally announced May 2024.

  37. arXiv:2405.05259  [pdf, other]

    cs.CV cs.RO

    OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies

    Authors: Lingdong Kong, Youquan Liu, Lai Xing Ng, Benoit R. Cottereau, Wei Tsang Ooi

    Abstract: Event-based semantic segmentation (ESS) is a fundamental yet challenging task for event camera sensing. The difficulties in interpreting and annotating event data limit its scalability. While domain adaptation from images to event data can help to mitigate this issue, there exist data representational differences that require additional effort to resolve. In this work, for the first time, we syner…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: CVPR 2024 (Highlight); 26 pages, 12 figures, 11 tables; Code at https://github.com/ldkong1205/OpenESS

  38. arXiv:2405.05258  [pdf, other]

    cs.CV cs.LG cs.RO

    Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

    Authors: Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu

    Abstract: Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the effi…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 6 figures, 8 tables; Code at https://github.com/ldkong1205/LaserMix

  39. arXiv:2405.04967  [pdf, other]

    cond-mat.mtrl-sci

    MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures

    Authors: Han Yang, Chenxi Hu, Yichi Zhou, Xixian Liu, Yu Shi, Jielan Li, Guanzhi Li, Zekun Chen, Shuizhou Chen, Claudio Zeni, Matthew Horton, Robert Pinsler, Andrew Fowler, Daniel Zügner, Tian Xie, Jake Smith, Lixin Sun, Qian Wang, Lingyu Kong, Chang Liu, Hongxia Hao, Ziheng Lu

    Abstract: Accurate and fast prediction of materials properties is central to the digital transformation of materials design. However, the vast design space and diverse operating conditions pose significant challenges for accurately modeling arbitrary material candidates and forecasting their properties. We present MatterSim, a deep learning model actively learned from large-scale first-principles computatio…

    Submitted 10 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  40. arXiv:2405.04902  [pdf, other]

    eess.IV cs.CV

    HAGAN: Hybrid Augmented Generative Adversarial Network for Medical Image Synthesis

    Authors: Zhihan Ju, Wanting Zhou, Longteng Kong, Yu Chen, Yi Li, Zhenan Sun, Caifeng Shan

    Abstract: Medical Image Synthesis (MIS) plays an important role in the intelligent medical field, which greatly saves the economic and time costs of medical diagnosis. However, due to the complexity of medical images and similar characteristics of different tissue cells, existing methods face great challenges in meeting their biological consistency. To this end, we propose the Hybrid Augmented Generative Ad…

    Submitted 8 May, 2024; originally announced May 2024.

  41. arXiv:2405.03729  [pdf]

    eess.IV physics.optics quant-ph

    Computational ghost imaging with hybrid transforms by integrating Hadamard, discrete cosine, and Haar matrices

    Authors: Yi-Ning Zhao, Lin-Shan Chen, Liu-Ya Chen, Lingxin Kong, Chong Wang, Cheng Ren, Su-Heng Zhang, De-Zhong Cao

    Abstract: A scenario of ghost imaging with hybrid transform approach is proposed by integrating Hadamard, discrete cosine, and Haar matrices. The measurement matrix is formed by the Kronecker product of the two different transform matrices. The image information can be conveniently reconstructed by the corresponding inverse matrices. In experiment, six hybridization sets are performed in computational ghost…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 5 pages, 4 figures

  42. arXiv:2405.01538  [pdf, other]

    cs.CV cs.LG cs.RO

    Multi-Space Alignments Towards Universal LiDAR Segmentation

    Authors: Youquan Liu, Lingdong Kong, Xiaoyang Wu, Runnan Chen, Xin Li, Liang Pan, Ziwei Liu, Yuexin Ma

    Abstract: A unified and versatile LiDAR segmentation model with strong robustness and generalizability is desirable for safe autonomous driving perception. This work presents M3Net, a one-of-a-kind framework for fulfilling multi-task, multi-dataset, multi-modality LiDAR segmentation in a universal manner using just a single set of parameters. To better exploit data volume and diversity, we first combine lar…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: CVPR 2024; 33 pages, 14 figures, 14 tables; Code at https://github.com/youquanl/M3Net

  43. arXiv:2404.14542  [pdf, other]

    cs.CV

    UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement

    Authors: Yaofeng Xie, Lingwei Kong, Kai Chen, Ziqiang Zheng, Xiao Yu, Zhibin Yu, Bing Zheng

    Abstract: Learning-based underwater image enhancement (UIE) methods have made great progress. However, the lack of large-scale and high-quality paired training samples has become the main bottleneck hindering the development of UIE. The inter-frame information in underwater videos can accelerate or optimize the UIE process. Thus, we constructed the first large-scale high-resolution underwater video enhancem…

    Submitted 27 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages, CVPR 2024 accepted

    ACM Class: I.4

  44. arXiv:2404.10262  [pdf, other]

    stat.CO math.OC

    Safe Feature Identification Rule for Fused Lasso by An Extra Dual Variable

    Authors: Pan Shang, Huangyue Chen, Lingchen Kong

    Abstract: Fused Lasso was proposed to characterize the sparsity of the coefficients and the sparsity of their successive differences for the linear regression. Due to its wide applications, there are many existing algorithms to solve fused Lasso. However, the computation of this model is time-consuming in high-dimensional data sets. To accelerate the calculation of fused Lasso in high-dimension data sets, w…

    Submitted 15 April, 2024; originally announced April 2024.

  45. arXiv:2404.10046  [pdf, other]

    cond-mat.supr-con cond-mat.str-el

    Observation of Cooper-pair density modulation state

    Authors: Lingyuan Kong, Michał Papaj, Hyunjin Kim, Yiran Zhang, Eli Baum, Hui Li, Kenji Watanabe, Takashi Taniguchi, Genda Gu, Patrick A. Lee, Stevan Nadj-Perge

    Abstract: Superconducting states that break space-group symmetries of the underlying crystal can exhibit nontrivial spatial modulation of the order parameter. Previously, such remarkable states were intimately associated with the breaking of translational symmetry, giving rise to the density-wave orders, with wavelengths spanning several unit cells. However, a related basic concept has been long overlooked:…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Full submission including supplementary information, 4 main figures

  46. arXiv:2404.09987  [pdf, other]

    cs.CV

    OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

    Authors: Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

    Abstract: Chart parsing poses a significant challenge due to the diversity of styles, values, texts, and so forth. Even advanced large vision-language models (LVLMs) with billions of parameters struggle to handle such tasks satisfactorily. To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information. Similar to popular LVLMs, OneChart incorpo…

    Submitted 25 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 14 pages, 9 figures and 6 tables

  47. arXiv:2404.07459  [pdf, other]

    stat.ME math.OC

    Safe subspace screening for the adaptive nuclear norm regularized trace regression

    Authors: Pan Shang, Lingchen Kong

    Abstract: Matrix form data sets arise in many areas, so there are lots of works about the matrix regression models. One special model of these models is the adaptive nuclear norm regularized trace regression, which has been proven to have good statistical performance. In order to accelerate the computation of this model, we consider the technique called screening rule. According to matrix decomposition and op…

    Submitted 15 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  48. arXiv:2403.18528  [pdf, other]

    math.OC q-fin.MF

    Limited Attention Allocation in a Stochastic Linear Quadratic System with Multiplicative Noise

    Authors: Xiangyu Cui, Jianjun Gao, Lingjie Kong

    Abstract: This study addresses limited attention allocation in a stochastic linear quadratic system with multiplicative noise. Our approach enables strategic resource allocation to enhance noise estimation and improve control decisions. We provide analytical optimal control and propose a numerical method for optimal attention allocation. Additionally, we apply our findings to dynamic mean-variance portfolio…

    Submitted 27 March, 2024; originally announced March 2024.

  49. arXiv:2403.18272  [pdf, other]

    astro-ph.HE

    Recovery of High-energy Low-frequency Quasi-periodic Oscillations from Black Hole X-ray Binary MAXI J1535-571 with a Hilbert-Huang Transform Method

    Authors: Qingcang Shui, Shu Zhang, Shuangnan Zhang, Yupeng Chen, Lingda Kong, Jingqiang Peng, Long Ji, Pengju Wang, Zhi Chang, Zhuoli Yu, Hongxing Yin, Jinlu Qu, Lian Tao, Mingyu Ge, Xiang Ma, Liang Zhang, Wei Yu, Jian Li

    Abstract: We propose a method based on the Hilbert-Huang transform (HHT) to recover the high-energy waveform of low-frequency quasi-periodic oscillations (LFQPOs). Based on the method, we successfully obtain the modulation of the phase-folded light curve above 170 keV using the QPO phase reconstructed at lower energies in MAXI J1535-571 with Insight-HXMT observations. A comprehensive simulation study is con…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 21 pages, 15 figures, accepted for publication in ApJL

  50. arXiv:2403.17010  [pdf, other]

    cs.CV cs.LG cs.RO

    Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding

    Authors: Lingdong Kong, Xiang Xu, Jun Cen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

    Abstract: Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models. This study introduces Calib3D, a pioneering effort to benchmark and scrutinize the reliability of 3D scene understanding models from an uncertainty estimation viewpoint. We comprehensively evaluate 28 state-of-the-art models across 10 diverse 3D datasets, uncovering…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Preprint; 37 pages, 8 figures, 11 tables; Code at https://github.com/ldkong1205/Calib3D