Showing 1–50 of 334 results for author: Fu, C

  1. arXiv:2407.08916  [pdf]

    cs.LG cs.IR

    Transforming Movie Recommendations with Advanced Machine Learning: A Study of NMF, SVD, and K-Means Clustering

    Authors: Yubing Yan, Camille Moreau, Zhuoyue Wang, Wenhan Fan, Chengqian Fu

    Abstract: This study develops a robust movie recommendation system using various machine learning techniques, including Non-Negative Matrix Factorization (NMF), Truncated Singular Value Decomposition (SVD), and K-Means clustering. The primary objective is to enhance user experience by providing personalized movie recommendations. The research encompasses data preprocessing, model training, and evaluation,…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by 2024 4th International Symposium on Computer Technology and Information Science, IEEE

  2. OMR-NET: a two-stage octave multi-scale residual network for screen content image compression

    Authors: Shiqi Jiang, Ting Ren, Congrui Fu, Shuai Li, Hui Yuan

    Abstract: Screen content (SC) differs from natural scene (NS) with unique characteristics such as noise-free, repetitive patterns, and high contrast. Aiming at addressing the inadequacies of current learned image compression (LIC) methods for SC, we propose an improved two-stage octave convolutional residual blocks (IToRB) for high and low-frequency feature extraction and a cascaded two-stage multi-scale re…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 7 figures, 2 tables

    Journal ref: IEEE Signal Processing Letters, 2024

  3. arXiv:2407.08466  [pdf, other]

    eess.IV cs.CV

    Global Spatial-Temporal Information-based Residual ConvLSTM for Video Space-Time Super-Resolution

    Authors: Congrui Fu, Hui Yuan, Shiqi Jiang, Guanghui Zhang, Liquan Shen, Raouf Hamzaoui

    Abstract: By converting low-frame-rate, low-resolution videos into high-frame-rate, high-resolution ones, space-time video super-resolution techniques can enhance visual experiences and facilitate more efficient information dissemination. We propose a convolutional neural network (CNN) for space-time video super-resolution, namely GIRNet. To generate highly accurate features and thus improve performance, th…

    Submitted 11 July, 2024; originally announced July 2024.

  4. arXiv:2406.17419  [pdf, other]

    cs.CL cs.AI

    Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

    Authors: Minzheng Wang, Longze Chen, Cheng Fu, Shengyi Liao, Xinghua Zhang, Bingli Wu, Haiyang Yu, Nan Xu, Lei Zhang, Run Luo, Yunshui Li, Min Yang, Fei Huang, Yongbin Li

    Abstract: Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows. Meanwhile, benchmarks for evaluating long-context LLMs are gradually catching up. However, existing benchmarks employ irrelevant noise texts to artificially extend the length of test cases, diverging from the real-world scenarios of long-contex…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: We release our code and data publicly at https://github.com/MozerWang/Loong

  5. arXiv:2406.11519  [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  6. arXiv:2406.10261  [pdf, other]

    cs.CL cs.AI

    FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination

    Authors: Pengfei Zhou, Weiqing Min, Chaoran Fu, Ying Jin, Mingyu Huang, Xiangyang Li, Shuhuan Mei, Shuqiang Jiang

    Abstract: Food is foundational to human life, serving not only as a source of nourishment but also as a cornerstone of cultural identity and social interaction. As the complexity of global dietary needs and preferences grows, food intelligence is needed to enable food perception and reasoning for various tasks, ranging from recipe generation and dietary recommendation to diet-disease correlation discovery a…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 32 pages, 19 figures

  7. arXiv:2406.10228  [pdf, other]

    cs.CV cs.AI cs.CL

    VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

    Authors: Chenyu Zhou, Mengdan Zhang, Peixian Chen, Chaoyou Fu, Yunhang Shen, Xiawu Zheng, Xing Sun, Rongrong Ji

    Abstract: The swift progress of Multi-modal Large Models (MLLMs) has showcased their impressive ability to tackle tasks blending vision and language. Yet, most current models and benchmarks cater to scenarios with a narrow scope of visual and textual contexts. These models often fall short when faced with complex comprehension tasks, which involve navigating through a plethora of irrelevant and potentially…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project Page: https://zhourax.github.io/VEGA/

  8. arXiv:2406.08487  [pdf, other]

    cs.CV

    Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

    Authors: Yi-Fan Zhang, Qingsong Wen, Chaoyou Fu, Xue Wang, Zhang Zhang, Liang Wang, Rong Jin

    Abstract: Seeing clearly with high resolution is a foundation of Large Multimodal Models (LMMs), which has been proven to be vital for visual perception and reasoning. Existing works usually employ a straightforward resolution upscaling method, where the image consists of global and local branches, with the latter being the sliced image patches but resized to the same resolution as the former. This means th…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Project page: https://github.com/yfzhang114/SliME

  9. arXiv:2406.07706  [pdf, other]

    cs.CV

    Object-level Scene Deocclusion

    Authors: Zhengzhe Liu, Qing Liu, Chirui Chang, Jianming Zhang, Daniil Pakhomov, Haitian Zheng, Zhe Lin, Daniel Cohen-Or, Chi-Wing Fu

    Abstract: Deoccluding the hidden portions of objects in a scene is a formidable task, particularly when addressing real-world scenes. In this paper, we present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, a foundation model for object-level scene deocclusion. Leveraging the rich prior of pre-trained models, we first design the parallel variational autoencoder, which pr…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: SIGGRAPH 2024. A foundation model for category-agnostic object deocclusion

  10. arXiv:2406.04025  [pdf]

    cs.CL

    The syntax-semantics interface in a child's path: A study of 3- to 11-year-olds' elicited production of Mandarin recursive relative clauses

    Authors: Caimei Yang, Qihang Yang, Xingzhi Su, Chenxi Fu, Xiaoyi Wang, Ying Yan, Zaijiang Man

    Abstract: There have been apparently conflicting claims over the syntax-semantics relationship in child acquisition. However, few of them have assessed the child's path toward the acquisition of recursive relative clauses (RRCs). The authors of the current paper did experiments to investigate 3- to 11-year-olds' most-structured elicited production of eight Mandarin RRCs in a 4 (syntactic types)*2 (semantic…

    Submitted 6 June, 2024; originally announced June 2024.

  11. arXiv:2405.21075  [pdf, other]

    cs.CV cs.CL

    Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

    Authors: Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun

    Abstract: In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding. The potential of MLLMs in processing sequential visual data is still insufficiently explored, highlighting the absence of a comprehensive, high-quality…

    Submitted 16 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Project Page: https://video-mme.github.io

  12. arXiv:2405.05941  [pdf, other]

    cs.RO cs.CV cs.LG

    Evaluating Real-World Robot Manipulation Policies in Simulation

    Authors: Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao

    Abstract: The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policies broaden the spectrum of tasks they can perform. We identify control and visual disparities between real and simulated environments as key challenges for reliab…

    Submitted 9 May, 2024; originally announced May 2024.

  13. arXiv:2405.02933  [pdf, other]

    cs.CL

    Relay Decoding: Concatenating Large Language Models for Machine Translation

    Authors: Chengpeng Fu, Xiaocheng Feng, Yichong Huang, Wenshuai Huo, Baohang Li, Hui Wang, Bin Qin, Ting Liu

    Abstract: Leveraging large language models for machine translation has demonstrated promising results. However, it does require the large language models to possess the capability of handling both the source and target languages in machine translation. When it is challenging to find large models that support the desired languages, resorting to continuous learning methods becomes a costly endeavor. To mitiga…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Work in progress

  14. arXiv:2404.18612  [pdf]

    cs.RO

    Enhancing Prosthetic Safety and Environmental Adaptability: A Visual-Inertial Prosthesis Motion Estimation Approach on Uneven Terrains

    Authors: Chuheng Chen, Xinxing Chen, Shucong Yin, Yuxuan Wang, Binxin Huang, Yuquan Leng, Chenglong Fu

    Abstract: Environment awareness is crucial for enhancing walking safety and stability of amputee wearing powered prosthesis when crossing uneven terrains such as stairs and obstacles. However, existing environmental perception systems for prosthesis only provide terrain types and corresponding parameters, which fails to prevent potential collisions when crossing uneven terrains and may lead to falls and oth…

    Submitted 29 April, 2024; originally announced April 2024.

  15. arXiv:2404.18067  [pdf, other]

    cs.LO

    Type Inference for Isabelle2Cpp

    Authors: Dongchen Jiang, Chenxi Fu

    Abstract: Isabelle2Cpp is a code generation framework that supports automatic generation of C++ code from Isabelle/HOL specifications. However, if some type information of Isabelle/HOL specification is missing, Isabelle2Cpp may not complete the code generation automatically. In order to solve this problem, this paper provides a type system for Isabelle2Cpp, which is used to perform type inference and type u…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 22 pages, 4 figures

    MSC Class: 68N30 ACM Class: D.2.4

  16. arXiv:2404.16033  [pdf, other]

    cs.CV cs.CL

    Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

    Authors: Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

    Abstract: With the advent of large language models (LLMs) enhanced by the chain-of-thought (CoT) methodology, visual reasoning problem is usually decomposed into manageable sub-tasks and tackled sequentially with various external tools. However, such a paradigm faces the challenge of the potential "determining hallucinations" in decision-making due to insufficient visual information and the limitation of low-…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: The project page is available at https://ggg0919.github.io/cantor/

  17. arXiv:2404.08224   

    cs.LG cs.AI cs.CR cs.IT eess.SY

    HCL-MTSAD: Hierarchical Contrastive Consistency Learning for Accurate Detection of Industrial Multivariate Time Series Anomalies

    Authors: Haili Sun, Yan Huang, Lansheng Han, Cai Fu, Chunjie Zhou

    Abstract: Multivariate Time Series (MTS) anomaly detection focuses on pinpointing samples that diverge from standard operational patterns, which is crucial for ensuring the safety and security of industrial applications. The primary challenge in this domain is to develop representations capable of discerning anomalies effectively. The prevalent methods for anomaly detection in the literature are predominant…

    Submitted 18 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: This paper is a manuscript that is still in the process of revision, including Table 1, Figure 2, problem definition in section III.B and method description proposed in section IV. In addition, the submitter has not been authorized by the first author and other co-authors to post the paper to arXiv

  18. arXiv:2404.05350  [pdf, other]

    cs.LG cs.CR

    Certified PEFTSmoothing: Parameter-Efficient Fine-Tuning with Randomized Smoothing

    Authors: Chengyan Fu, Wenjie Wang

    Abstract: Randomized smoothing is the primary certified robustness method for accessing the robustness of deep learning models to adversarial perturbations in the l2-norm, by adding isotropic Gaussian noise to the input image and returning the majority votes over the base classifier. Theoretically, it provides a certified norm bound, ensuring predictions of adversarial examples are stable within this bound.…

    Submitted 8 April, 2024; originally announced April 2024.

  19. arXiv:2404.04997  [pdf, other]

    cs.LG cs.AI cs.CL

    Adapting LLMs for Efficient Context Processing through Soft Prompt Compression

    Authors: Cangqing Wang, Yutian Yang, Ruisi Li, Dan Sun, Ruicong Cai, Yuzhu Zhang, Chengqian Fu, Lillian Floyd

    Abstract: The rapid advancement of Large Language Models (LLMs) has inaugurated a transformative epoch in natural language processing, fostering unprecedented proficiency in text generation, comprehension, and contextual scrutiny. Nevertheless, effectively handling extensive contexts, crucial for myriad applications, poses a formidable obstacle owing to the intrinsic constraints of the models' context windo…

    Submitted 18 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by the 2024 International Conference on Image Processing and Computer Applications (IPCA 2024)

  20. arXiv:2404.04050  [pdf, other]

    cs.CV

    No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

    Authors: Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao

    Abstract: To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot segmentation methods first pre-train models on 'seen' classes, and then evaluate their generalization performance on 'unseen' classes. However, the prior pre-training stage not only introduces excessive time overhead but also incurs a significant domain gap on 'unseen' c…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR Highlight. Code is available at https://github.com/yangyangyang127/Seg-NN. arXiv admin note: text overlap with arXiv:2308.12961

  21. Creating synthetic energy meter data using conditional diffusion and building metadata

    Authors: Chun Fu, Hussain Kazmi, Matias Quintana, Clayton Miller

    Abstract: Advances in machine learning and increased computational power have driven progress in energy-related research. However, limited access to private energy data from buildings hinders traditional regression models relying on historical data. While generative models offer a solution, previous studies have primarily focused on short-term generation periods (e.g., daily profiles) and a limited number o…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 17 pages, 11 figures, submitted to journal "Energy and Buildings"

    Journal ref: Energy Build. 2024;312: 114216

  22. arXiv:2403.19507  [pdf, other]

    cs.LG

    SineNet: Learning Temporal Dynamics in Time-Dependent Partial Differential Equations

    Authors: Xuan Zhang, Jacob Helwig, Yuchao Lin, Yaochen Xie, Cong Fu, Stephan Wojtowytsch, Shuiwang Ji

    Abstract: We consider using deep neural networks to solve time-dependent partial differential equations (PDEs), where multi-scale processing is crucial for modeling complex, time-evolving dynamics. While the U-Net architecture with skip connections is commonly used by prior studies to enable multi-scale processing, our analysis shows that the need for features to evolve across layers results in temporally m…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: The Twelfth International Conference on Learning Representations

  23. arXiv:2403.19374  [pdf, other]

    cs.ET eess.SY

    A noise-tolerant, resource-saving probabilistic binary neural network implemented by the SOT-MRAM compute-in-memory system

    Authors: Yu Gu, Puyang Huang, Tianhao Chen, Chenyi Fu, Aitian Chen, Shouzhong Peng, Xixiang Zhang, Xufeng Kou

    Abstract: We report a spin-orbit torque (SOT) magnetoresistive random-access memory (MRAM)-based probabilistic binary neural network (PBNN) for resource-saving and hardware noise-tolerant computing applications. With the presence of thermal fluctuation, the non-destructive SOT-driven magnetization switching characteristics lead to a random weight matrix with controllable probability distribution. In the meanwh…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 5 pages, 10 figures

    MSC Class: 94C60 ACM Class: B.2.4; B.3.0

  24. arXiv:2403.18575  [pdf, other]

    cs.CV

    HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions

    Authors: Hao Xu, Haipeng Li, Yinqiao Wang, Shuaicheng Liu, Chi-Wing Fu

    Abstract: Reconstructing 3D hand mesh robustly from a single image is very challenging, due to the lack of diversity in existing real-world datasets. While data synthesis helps relieve the issue, the syn-to-real gap still hinders its usage. In this work, we present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance by training a conditional generat…

    Submitted 27 March, 2024; originally announced March 2024.

  25. arXiv:2403.18351  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Generating Diverse Agricultural Data for Vision-Based Farming Applications

    Authors: Mikolaj Cieslak, Umabharathi Govindarajan, Alejandro Garcia, Anuradha Chandrashekar, Torsten Hädrich, Aleksander Mendoza-Drosik, Dominik L. Michels, Sören Pirk, Chia-Chun Fu, Wojciech Pałubicki

    Abstract: We present a specialized procedural model for generating synthetic agricultural scenes, focusing on soybean crops, along with various weeds. This model is capable of simulating distinct growth stages of these plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions. The integration of real-world textures and environmental factors into the procedural gene…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 10 pages, 8 figures, 3 tables

    MSC Class: 68T07; 68T45 ACM Class: I.2.10; I.4.6

  26. arXiv:2403.16004  [pdf, other]

    cs.LG cs.AI

    A Federated Parameter Aggregation Method for Node Classification Tasks with Different Graph Network Structures

    Authors: Hao Song, Jiacheng Yao, Zhengxi Li, Shaocong Xu, Shibo Jin, Jiajun Zhou, Chenbo Fu, Qi Xuan, Shanqing Yu

    Abstract: Over the past few years, federated learning has become widely used in various classical machine learning fields because of its collaborative ability to train data from multiple sources without compromising privacy. However, in the area of graph neural networks, the nodes and network structures of graphs held by clients are different in many practical applications, and the aggregation method that d…

    Submitted 24 March, 2024; originally announced March 2024.

  27. arXiv:2403.11857  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Complete and Efficient Graph Transformers for Crystal Material Property Prediction

    Authors: Keqiang Yan, Cong Fu, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji

    Abstract: Crystal structures are characterized by atomic bases within a primitive unit cell that repeats along a regular lattice throughout 3D space. The periodic and infinite nature of crystals poses unique challenges for geometric graph representation learning. Specifically, constructing graphs that effectively capture the complete geometric information of crystals and handle chiral crystals remains an un…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted by ICLR 2024

  28. arXiv:2403.11624  [pdf, other]

    cs.IR cs.LG

    Dual-Channel Multiplex Graph Neural Networks for Recommendation

    Authors: Xiang Li, Chaofan Fu, Zhongying Zhao, Guanjie Zheng, Chao Huang, Junyu Dong, Yanwei Yu

    Abstract: Efficient recommender systems play a crucial role in accurately capturing user and item attributes that mirror individual preferences. Some existing recommendation techniques have started to shift their focus towards modeling various types of interaction relations between users and items in real-world recommendation scenarios, such as clicks, marking favorites, and purchases on online shopping pla…

    Submitted 29 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  29. arXiv:2403.11186  [pdf, other]

    cs.CV

    NetTrack: Tracking Highly Dynamic Objects with a Net

    Authors: Guangze Zheng, Shijie Lin, Haobo Zuo, Changhong Fu, Jia Pan

    Abstract: The complex dynamicity of open-world objects presents non-negligible challenges for multi-object tracking (MOT), often manifested as severe deformations, fast motion, and occlusions. Most methods that solely depend on coarse-grained object cues, such as boxes and the overall appearance of the object, are susceptible to degradation due to distorted internal relationships of dynamic objects. To addr…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  30. arXiv:2403.02616  [pdf]

    cs.LG cs.AI cs.CR cs.NI eess.SY

    Unsupervised Spatio-Temporal State Estimation for Fine-grained Adaptive Anomaly Diagnosis of Industrial Cyber-physical Systems

    Authors: Haili Sun, Yan Huang, Lansheng Han, Cai Fu, Chunjie Zhou

    Abstract: Accurate detection and diagnosis of abnormal behaviors such as network attacks from multivariate time series (MTS) are crucial for ensuring the stable and effective operation of industrial cyber-physical systems (CPS). However, existing researches pay little attention to the logical dependencies among system working states, and have difficulties in explaining the evolution mechanisms of abnormal s…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 23 pages, 7 figures

  31. arXiv:2403.00868  [pdf, other]

    cs.CL cs.AI

    SoftTiger: A Clinical Foundation Model for Healthcare Workflows

    Authors: Ye Chen, Igor Couto, Wei Cai, Cong Fu, Bruno Dorneles

    Abstract: We introduce SoftTiger, a clinical large language model (CLaM) designed as a foundation model for healthcare workflows. The narrative and unstructured nature of clinical notes is a major obstacle for healthcare intelligentization. We address a critical problem of structuring clinical notes into clinical data, according to international interoperability standards. We collect and annotate data for t…

    Submitted 26 March, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

  32. arXiv:2403.00801  [pdf, other]

    cs.IR cs.AI cs.CL

    Self-Retrieval: Building an Information Retrieval System with One Large Language Model

    Authors: Qiaoyu Tang, Jiawei Chen, Bowen Yu, Yaojie Lu, Cheng Fu, Haiyang Yu, Hongyu Lin, Fei Huang, Ben He, Xianpei Han, Le Sun, Yongbin Li

    Abstract: The rise of large language models (LLMs) has transformed the role of information retrieval (IR) systems in the way to humans accessing information. Due to the isolated architecture and the limited interaction, existing IR systems are unable to fully accommodate the shift from directly providing information to humans to indirectly serving large language models. In this paper, we propose Self-Retrie…

    Submitted 23 February, 2024; originally announced March 2024.

  33. arXiv:2402.18318  [pdf]

    cs.RO

    SD-SLAM: A Semantic SLAM Approach for Dynamic Scenes Based on LiDAR Point Clouds

    Authors: Feiya Li, Chunyun Fu, Dongye Sun, Jian Li, Jianwen Wang

    Abstract: Point cloud maps generated via LiDAR sensors using extensive remotely sensed data are commonly used by autonomous vehicles and robots for localization and navigation. However, dynamic objects contained in point cloud maps not only downgrade localization accuracy and navigation performance but also jeopardize the map quality. In response to this challenge, we propose in this paper a novel semantic…

    Submitted 28 February, 2024; originally announced February 2024.

  34. arXiv:2402.18117  [pdf, other]

    cs.CV cs.LG

    PRCL: Probabilistic Representation Contrastive Learning for Semi-Supervised Semantic Segmentation

    Authors: Haoyu Xie, Changqi Wang, Jian Zhao, Yang Liu, Jun Dan, Chong Fu, Baigui Sun

    Abstract: Tremendous breakthroughs have been developed in Semi-Supervised Semantic Segmentation (S4) through contrastive learning. However, due to limited annotations, the guidance on unlabeled images is generated by the model itself, which inevitably exists noise and disturbs the unsupervised training process. To address this issue, we propose a robust contrastive-based S4 framework, termed the Probabilist…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 19 pages, 11 figures

  35. arXiv:2402.11450  [pdf, other]

    cs.RO

    Learning to Learn Faster from Human Feedback with Language Model Predictive Control

    Authors: Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Kelly Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humplik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michaely, Joss Moore, et al. (25 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for o…

    Submitted 31 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  36. arXiv:2402.10512  [pdf, other]

    cs.AR

    A Novel Computing Paradigm for MobileNetV3 using Memristor

    Authors: Jiale Li, Longyu Ma, Chiu-Wing Sham, Chong Fu

    Abstract: The advancement in the field of machine learning is inextricably linked with the concurrent progress in domain-specific hardware accelerators such as GPUs and TPUs. However, the rapidly growing computational demands necessitated by larger models and increased data have become a primary bottleneck in further advancing machine learning, especially in mobile and edge devices. Currently, the neuromorp…

    Submitted 16 February, 2024; originally announced February 2024.

  37. arXiv:2402.10457  [pdf, other]

    cs.DS cs.LG

    Learning-Augmented Skip Lists

    Authors: Chunkai Fu, Jung Hoon Seo, Samson Zhou

    Abstract: We study the integration of machine learning advice into the design of skip lists to improve upon traditional data structure design. Given access to a possibly erroneous oracle that outputs estimated fractional frequencies for search queries on a set of items, we construct a skip list that provably provides the optimal expected search time, within nearly a factor of two. In fact, our learning-augm…

    Submitted 16 February, 2024; originally announced February 2024.

  38. arXiv:2402.05589  [pdf, other]

    cs.CV

    RESMatch: Referring Expression Segmentation in a Semi-Supervised Manner

    Authors: Ying Zang, Chenglong Fu, Runlong Cao, Didi Zhu, Min Zhang, Wenjun Hu, Lanyun Zhu, Tianrun Chen

    Abstract: Referring expression segmentation (RES), a task that involves localizing specific instance-level objects based on free-form linguistic descriptions, has emerged as a crucial frontier in human-AI interaction. It demands an intricate understanding of both visual and textual contexts and often requires extensive training data. This paper introduces RESMatch, the first semi-supervised learning (SSL) a…

    Submitted 11 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  39. arXiv:2402.02313  [pdf, other]

    cs.CV cs.GR

    CNS-Edit: 3D Shape Editing via Coupled Neural Shape Optimization

    Authors: Jingyu Hu, Ka-Hei Hui, Zhengzhe Liu, Hao Zhang, Chi-Wing Fu

    Abstract: This paper introduces a new approach based on a coupled representation and a neural volume optimization to implicitly perform 3D shape editing in latent space. This work has three innovations. First, we design the coupled neural shape (CNS) representation for supporting 3D shape editing. This representation includes a latent code, which captures high-level global semantics of the shape, and a 3D n…

    Submitted 3 February, 2024; originally announced February 2024.

  40. arXiv:2402.01389  [pdf, other]

    cs.CV

    SiMA-Hand: Boosting 3D Hand-Mesh Reconstruction by Single-to-Multi-View Adaptation

    Authors: Yinqiao Wang, Hao Xu, Pheng-Ann Heng, Chi-Wing Fu

    Abstract: Estimating 3D hand mesh from RGB images is a longstanding track, in which occlusion is one of the most challenging problems. Existing attempts towards this task often fail when the occlusion dominates the image space. In this paper, we propose SiMA-Hand, aiming to boost the mesh reconstruction performance by Single-to-Multi-view Adaptation. First, we design a multi-view hand reconstructor to fuse…

    Submitted 2 February, 2024; originally announced February 2024.

  41. arXiv:2401.11067  [pdf, other]

    cs.CV cs.GR

    Make-A-Shape: a Ten-Million-scale 3D Shape Model

    Authors: Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Kamal Rahimi Malekshan, Zhengzhe Liu, Hooman Shayani, Chi-Wing Fu

    Abstract: Significant progress has been made in training large generative models for natural language and images. Yet, the advancement of 3D generative models is hindered by their substantial resource demands for training, along with inefficient, non-compact, and less expressive representations. This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training on a vast scale, ca…

    Submitted 19 January, 2024; originally announced January 2024.

  42. arXiv:2401.05778  [pdf, other]

    cs.CL cs.AI

    Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

    Authors: Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, Xinhao Deng, Yunpeng Liu, Qinglin Zhang, Ziyi Qiu, Peiyang Li, Zhixing Tan, Junwu Xiong, Xinyu Kong, Zujie Wen, Ke Xu, Qi Li

    Abstract: Large language models (LLMs) have strong capabilities in solving diverse natural language processing tasks. However, the safety and security issues of LLM systems have become the major obstacle to their widespread application. Many studies have extensively investigated risks in LLM systems and developed the corresponding mitigation strategies. Leading-edge enterprises such as OpenAI, Google, Meta,…

    Submitted 11 January, 2024; originally announced January 2024.

  43. arXiv:2401.05072  [pdf, other]

    cs.CL

    Aligning Translation-Specific Understanding to General Understanding in Large Language Models

    Authors: Yichong Huang, Xiaocheng Feng, Baohang Li, Chengpeng Fu, Wenshuai Huo, Ting Liu, Bing Qin

    Abstract: Although large language models (LLMs) have shown surprising language understanding and generation capabilities, they have yet to gain a revolutionary advancement in the field of machine translation. One potential cause of the limited performance is the misalignment between the translation-specific understanding and general understanding inside LLMs. To align the translation-specific understanding…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: work in progress

  44. arXiv:2401.03914  [pdf, other]

    cs.CV

    D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement

    Authors: Danqi Yan, Qing Gao, Yuepeng Qian, Xinxing Chen, Chenglong Fu, Yuquan Leng

    Abstract: Three-dimensional (3D) human pose estimation using a monocular camera has gained increasing attention due to its ease of implementation and the abundance of data available from daily life. However, owing to the inherent depth ambiguity in images, the accuracy of existing monocular camera-based 3D pose estimation methods remains unsatisfactory, and the estimated 3D poses usually include much noise.…

    Submitted 8 January, 2024; originally announced January 2024.

  45. arXiv:2312.16787  [pdf]

    cs.RO

    L-LO: Enhancing Pose Estimation Precision via a Landmark-Based LiDAR Odometry

    Authors: Feiya Li, Chunyun Fu, Dongye Sun

    Abstract: The majority of existing LiDAR odometry solutions are based on simple geometric features such as points, lines or planes which cannot fully reflect the characteristics of surrounding environments. In this study, we propose a novel LiDAR odometry which effectively utilizes the overall exterior characteristics of environmental landmarks. The vehicle pose estimation is accomplished by means of two se…

    Submitted 27 December, 2023; originally announced December 2023.

  46. arXiv:2312.12436  [pdf, other]

    cs.CV cs.AI cs.CL cs.MM

    A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

    Authors: Chaoyou Fu, Renrui Zhang, Zihan Wang, Yubo Huang, Zhengye Zhang, Longtian Qiu, Gaoxiang Ye, Yunhang Shen, Mengdan Zhang, Peixian Chen, Sirui Zhao, Shaohui Lin, Deqiang Jiang, Di Yin, Peng Gao, Ke Li, Hongsheng Li, Xing Sun

    Abstract: The surge of interest towards Multi-modal Large Language Models (MLLMs), e.g., GPT-4V(ision) from OpenAI, has marked a significant trend in both academia and industry. They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks. Very recently, Google released Gemini, its newest and most capable MLLM built from the gr…

    Submitted 20 December, 2023; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: Total 120 pages. See our project at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models

  47. arXiv:2312.08688  [pdf, other]

    cs.CL cs.AI

    TigerBot: An Open Multilingual Multitask LLM

    Authors: Ye Chen, Wei Cai, Liangmin Wu, Xiaowei Li, Zhanxuan Xin, Cong Fu

    Abstract: We release and introduce the TigerBot family of large language models (LLMs), consisting of base and chat models, sized from 7, 13, 70 and 180 billion parameters. We develop our models embarking from Llama-2 and BLOOM, and push the boundary further in data, training algorithm, infrastructure, and application tools. Our models yield meaningful performance gain over SOTA open-source models, e.g., Ll…

    Submitted 14 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

  48. arXiv:2312.04435  [pdf, other]

    cs.MM

    Deep3DSketch: 3D modeling from Free-hand Sketches with View- and Structural-Aware Adversarial Training

    Authors: Tianrun Chen, Chenglong Fu, Lanyun Zhu, Papa Mao, Jia Zhang, Ying Zang, Lingyun Sun

    Abstract: This work aims to investigate the problem of 3D modeling using single free-hand sketches, which is one of the most natural ways we humans express ideas. Although sketch-based 3D modeling can drastically make the 3D modeling process more accessible, the sparsity and ambiguity of sketches bring significant challenges for creating high-fidelity 3D models that reflect the creators' ideas. In this work…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: ICASSP 2023. arXiv admin note: substantial text overlap with arXiv:2310.18148

  49. Analysis of Coding Gain Due to In-Loop Reshaping

    Authors: Chau-Wai Wong, Chang-Hong Fu, Mengting Xu, Guan-Ming Su

    Abstract: Reshaping, a point operation that alters the characteristics of signals, has been shown capable of improving the compression ratio in video coding practices. Out-of-loop reshaping that directly modifies the input video signal was first adopted as the supplemental enhancement information (SEI) for the HEVC/H.265 without the need to alter the core design of the video codec. VVC/H.266 further improve…

    Submitted 19 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Published in IEEE Transactions on Image Processing

  50. arXiv:2312.02153  [pdf, other]

    cs.CV

    Aligning and Prompting Everything All at Once for Universal Visual Perception

    Authors: Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji

    Abstract: Vision foundation models have been explored recently to build general-purpose vision systems. However, predominant paradigms, driven by casting instance-level tasks as an object-word alignment, bring heavy cross-modality interaction, which is not effective in prompting object detection and visual grounding. Another line of work that focuses on pixel-level tasks often encounters a large annotation…

    Submitted 4 December, 2023; originally announced December 2023.