Showing 1–50 of 232 results for author: Feng, H

  1. arXiv:2407.12014  [pdf, other]

    cs.HC cs.CY

    Surprising Performances of Students with Autism in Classroom with NAO Robot

    Authors: Qin Yang, Huan Lu, Dandan Liang, Shengrong Gong, Huanghao Feng

    Abstract: Autism is a developmental disorder that manifests in early childhood and persists throughout life, profoundly affecting social behavior and hindering the acquisition of learning and social skills in those diagnosed. As technological advancements progress, an increasing array of technologies is being utilized to support the education of students with Autism Spectrum Disorder (ASD), aiming to improv…

    Submitted 26 June, 2024; originally announced July 2024.

  2. arXiv:2407.12005  [pdf, other]

    cs.MM cs.CV

    VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It

    Authors: Xiaoxuan Zhu, Zhouhong Gu, Sihang Jiang, Zhixu Li, Hongwei Feng, Yanghua Xiao

    Abstract: Online courses have significantly lowered the barrier to accessing education, yet the varying content quality of these videos poses challenges. In this work, we focus on the task of automatically evaluating the quality of video course content. We have constructed a dataset with a substantial collection of video courses and teaching materials. We propose three evaluation principles and design a new…

    Submitted 15 June, 2024; originally announced July 2024.

  3. arXiv:2407.11678  [pdf, other]

    cs.LG math.ST stat.ML

    Theoretical Insights into CycleGAN: Analyzing Approximation and Estimation Errors in Unpaired Data Generation

    Authors: Luwei Sun, Dongrui Shen, Han Feng

    Abstract: In this paper, we focus on analyzing the excess risk of the unpaired data generation model, called CycleGAN. Unlike classical GANs, CycleGAN not only transforms data between two unpaired distributions but also ensures the mappings are consistent, which is encouraged by the cycle-consistency term unique to CycleGAN. The increasing complexity of model structure and the addition of the cycle-consiste…

    Submitted 16 July, 2024; originally announced July 2024.

  4. arXiv:2407.08949  [pdf, other]

    cs.CV

    One-Shot Pose-Driving Face Animation Platform

    Authors: He Feng, Donglin Di, Yongjia Ma, Wei Chen, Tonghua Su

    Abstract: The objective of face animation is to generate dynamic and expressive talking head videos from a single reference face, utilizing driving conditions derived from either video or audio inputs. Current approaches often require fine-tuning for specific identities and frequently fail to produce expressive videos due to the limited effectiveness of Wav2Pose modules. To facilitate the generation of one-…

    Submitted 11 July, 2024; originally announced July 2024.

  5. arXiv:2407.06494  [pdf, other]

    cs.LG cs.AI

    A Generative Approach to Control Complex Physical Systems

    Authors: Long Wei, Peiyan Hu, Ruiqi Feng, Haodong Feng, Yixuan Du, Tao Zhang, Rui Wang, Yue Wang, Zhi-Ming Ma, Tailin Wu

    Abstract: Controlling the evolution of complex physical systems is a fundamental task across science and engineering. Classical techniques suffer from limited applicability or huge computational costs. On the other hand, recent deep learning and reinforcement learning-based approaches often struggle to optimize long-term control sequences under the constraints of system dynamics. In this work, we introduce…

    Submitted 8 July, 2024; originally announced July 2024.

  6. arXiv:2407.04737  [pdf, other]

    eess.SP cs.AI

    Hierarchical Decoupling Capacitor Optimization for Power Distribution Network of 2.5D ICs with Co-Analysis of Frequency and Time Domains Based on Deep Reinforcement Learning

    Authors: Yuanyuan Duan, Haiyang Feng, Zhiping Yu, Hanming Wu, Leilai Shao, Xiaolei Zhu

    Abstract: With the growing need for higher memory bandwidth and computation density, 2.5D design, which involves integrating multiple chiplets onto an interposer, emerges as a promising solution. However, this integration introduces significant challenges due to increasing data rates and a large number of I/Os, necessitating advanced optimization of the power distribution networks (PDNs) both on-chip and on…

    Submitted 2 July, 2024; originally announced July 2024.

  7. arXiv:2407.04272  [pdf, other]

    cs.LG cs.DC

    Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression

    Authors: Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, Jiannan Tian, Chunxing Yin, Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao

    Abstract: DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices. To mitigate this, we…

    Submitted 11 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: accepted by SC '24

  8. arXiv:2407.01976  [pdf, other]

    cs.CL cs.AI cs.MM

    A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

    Authors: Jinghui Lu, Haiyang Yu, Yanjie Wang, Yongjie Ye, Jingqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, Hao Liu, Can Huang

    Abstract: Recently, many studies have demonstrated that exclusively incorporating OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks. However, existing methods that integrate spatial layouts with text have limitations, such as producing overly long text sequences or failing to fully leverage the autoregressive traits of LLMs. In th…

    Submitted 2 July, 2024; originally announced July 2024.

  9. arXiv:2407.01281  [pdf, other]

    cs.LG cs.AI math.FA

    Bridging Smoothness and Approximation: Theoretical Insights into Over-Smoothing in Graph Neural Networks

    Authors: Guangrui Yang, Jianfei Li, Ming Li, Han Feng, Ding-Xuan Zhou

    Abstract: In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon the approximation results derived from the $K$-functional. We establish a theoretical framework to assess the lower bounds of approximation for target functions using Graph Convolutional Networks (GCNs) and examine the over-smoothing phenomenon commonly observed in these networks. Initially, we…

    Submitted 1 July, 2024; originally announced July 2024.

  10. arXiv:2407.00697  [pdf, other]

    cs.CV cs.AI eess.SP

    CaFNet: A Confidence-Driven Framework for Radar Camera Depth Estimation

    Authors: Huawei Sun, Hao Feng, Julius Ott, Lorenzo Servadei, Robert Wille

    Abstract: Depth estimation is critical in autonomous driving for interpreting 3D scenes accurately. Recently, radar-camera depth estimation has attracted considerable interest due to the robustness and low cost of radar. Thus, this paper introduces a two-stage, end-to-end trainable Confidence-aware Fusion Net (CaFNet) for dense depth estimation, combining RGB imagery with sparse and noisy radar poi…

    Submitted 7 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted by IROS 2024

  11. arXiv:2406.18927  [pdf, other]

    cs.CV

    RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation

    Authors: Zhaokang Liao, Hao Feng, Shaokai Liu, Wengang Zhou, Houqiang Li

    Abstract: Fisheye images are categorized into central and deviated based on the optical center position. Existing rectification methods are limited to central fisheye images, while this paper proposes a novel method that extends to deviated fisheye image rectification. The challenge lies in the variant global distortion distribution pattern caused by the random optical center position. To address th…

    Submitted 27 June, 2024; originally announced June 2024.

  12. arXiv:2406.18360  [pdf, other]

    cs.CV

    XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis

    Authors: Hao Li, Ming Yuan, Yan Zhang, Chenming Wu, Chen Zhao, Chunyu Song, Haocheng Feng, Errui Ding, Dingwen Zhang, Jingdong Wang

    Abstract: Thoroughly testing autonomy systems is crucial in the pursuit of safe autonomous driving vehicles. It necessitates creating safety-critical scenarios that go beyond what can be safely collected from real-world data, as many of these scenarios occur infrequently on public roads. However, the evaluation of most existing NVS methods relies on sporadic sampling of image frames from the training data,…

    Submitted 26 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: project page: https://3d-aigc.github.io/XLD/

  13. arXiv:2406.18198  [pdf, other]

    cs.CV

    VDG: Vision-Only Dynamic Gaussian for Driving Simulation

    Authors: Hao Li, Jingfeng Li, Dingwen Zhang, Chenming Wu, Jieqi Shi, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han

    Abstract: Dynamic Gaussian splatting has led to impressive advances in scene reconstruction and novel-view image synthesis. Existing methods, however, heavily rely on pre-computed poses and Gaussian initialization by Structure from Motion (SfM) algorithms or expensive sensors. For the first time, this paper addresses this issue by integrating self-supervised VO into our pose-free dynamic Gaussian method (V…

    Submitted 26 June, 2024; originally announced June 2024.

  14. arXiv:2406.12641  [pdf, other]

    cs.CL

    DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?

    Authors: Zhouhong Gu, Lin Zhang, Xiaoxuan Zhu, Jiangjie Chen, Wenhao Huang, Yikai Zhang, Shusen Wang, Zheyu Ye, Yan Gao, Hongwei Feng, Yanghua Xiao

    Abstract: Detecting evidence within the context is a key step in reasoning tasks. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called DetectBench for verifying the ability to detect and piece together implicit evidence within a long context. DetectBench contains 3,928 multiple-choice…

    Submitted 18 June, 2024; originally announced June 2024.

  15. arXiv:2406.10621  [pdf, other]

    cs.CL cs.AI

    StrucText-Eval: An Autogenerated Benchmark for Evaluating Large Language Model's Ability in Structure-Rich Text Understanding

    Authors: Zhouhong Gu, Haoning Ye, Zeyang Zhou, Hongwei Feng, Yanghua Xiao

    Abstract: Given the substantial volumes of structured data held by many companies, enabling Large Language Models (LLMs) to directly understand structured text in non-structured forms could significantly enhance their capabilities across various business scenarios. To this end, we propose an evaluation data generation method for assessing LLMs' ability to understand structure-rich text, which generates…

    Submitted 30 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  16. arXiv:2406.07840  [pdf, other]

    cs.CV

    SynthForge: Synthesizing High-Quality Face Dataset with Controllable 3D Generative Models

    Authors: Abhay Rawat, Shubham Dokania, Astitva Srivastava, Shuaib Ahmed, Haiwen Feng, Rahul Tallamraju

    Abstract: Recent advancements in generative models have unlocked the capabilities to render photo-realistic data in a controllable fashion. Trained on the real data, these generative models are capable of producing realistic samples with minimal to no domain gap, as compared to the traditional graphics rendering. However, using the data generated using such models for training downstream tasks remains under…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 11 pages, 4 figures, 3 tables. Under Review

  17. arXiv:2406.02058  [pdf, other]

    cs.CV cs.RO

    OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding

    Authors: Yanmin Wu, Jiarui Meng, Haijie Li, Chenming Wu, Yahao Shi, Xinhua Cheng, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Jian Zhang

    Abstract: This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding. Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing. These methods struggle with 3D point-level tasks due to weak feature expressiveness and inaccurate 2D-3D feature associations.…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: technical report, 15 pages

  18. arXiv:2406.01326  [pdf, other]

    cs.CV

    TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

    Authors: Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shu Wei, Binghong Wu, Lei Liao, Yongjie Ye, Hao Liu, Houqiang Li, Can Huang

    Abstract: Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in modal isolation and intricate workflows. In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy me…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 20 pages, 8 figures

  19. arXiv:2406.00227  [pdf, other]

    cs.CV

    ImplicitTerrain: a Continuous Surface Model for Terrain Data Analysis

    Authors: Haoan Feng, Xin Xu, Leila De Floriani

    Abstract: Digital terrain models (DTMs) are pivotal in remote sensing, cartography, and landscape management, requiring accurate surface representation and topological information restoration. While topology analysis traditionally relies on smooth manifolds, the absence of an easy-to-use continuous surface model for a large terrain results in a preference for discrete meshes. Structural representation based…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 10 pages, CVPR 2024 Workshop INRV

  20. arXiv:2405.20786  [pdf, other]

    cs.CV cs.HC

    Stratified Avatar Generation from Sparse Observations

    Authors: Han Feng, Wenchao Ma, Quankai Gao, Xianwei Zheng, Nan Xue, Huijuan Xu

    Abstract: Estimating 3D full-body avatars from AR/VR devices is essential for creating immersive experiences in AR/VR applications. This task is challenging due to the limited input from Head Mounted Devices, which capture only sparse observations from the head and hands. Predicting the full-body avatars, particularly the lower body, from these sparse observations presents significant difficulties. In this…

    Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024 (Oral)

  21. arXiv:2405.19338  [pdf, other]

    eess.SP cs.AI cs.CV

    Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images

    Authors: Yuzhen Ding, Jason M. Holmes, Hongying Feng, Baoxin Li, Lisa A. McGee, Jean-Claude M. Rwigema, Sujay A. Vora, Daniel J. Ma, Robert L. Foote, Samir H. Patel, Wei Liu

    Abstract: In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D on-board imaging (OBI) is unavailable. However, tumor visibility is constrained by the projection of the patient's anatomy onto a 2D plane, potentially leading to substantial setup errors. In treatment rooms with 3D-OBI such as cone beam CT (CBCT), the field of view (FOV) of CBCT is limited with unnecessarily high imag…

    Submitted 1 April, 2024; originally announced May 2024.

    Comments: 17 pages, 8 figures and tables

  22. arXiv:2405.14251  [pdf, other]

    cs.RO eess.SY

    Efficient Navigation of a Robotic Fish Swimming Across the Vortical Flow Field

    Authors: Haodong Feng, Dehan Yuan, Jiale Miao, Jie You, Yue Wang, Yi Zhu, Dixia Fan

    Abstract: Navigating efficiently across vortical flow fields presents a significant challenge in various robotic applications. The dynamic and unsteady nature of vortical flows often disturbs the control of underwater robots, complicating their operation in hydrodynamic environments. Conventional control methods, which depend on accurate modeling, fail in these settings due to the complexity of fluid-struct…

    Submitted 23 May, 2024; originally announced May 2024.

  23. arXiv:2405.13800  [pdf, other]

    cs.CV cs.AI

    Dense Connector for MLLMs

    Authors: Huanjin Yao, Wenhao Wu, Taojiannan Yang, YuXin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang

    Abstract: Do we fully leverage the potential of the visual encoder in Multimodal Large Language Models (MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has garnered broad attention from both academia and industry. In the current MLLM rat race, the focus seems to be predominantly on the linguistic side. We witness the rise of larger and higher-quality instruction datasets, as well…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Technical report. 25 pages

  24. arXiv:2405.11985  [pdf, other]

    cs.CV

    MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

    Authors: Jingqun Tang, Qi Liu, Yongjie Ye, Jinghui Lu, Shu Wei, Chunhui Lin, Wanqing Li, Mohamad Fitri Faiz Bin Mahmood, Hao Feng, Zhen Zhao, Yanjie Wang, Yuliang Liu, Hao Liu, Xiang Bai, Can Huang

    Abstract: Text-Centric Visual Question Answering (TEC-VQA) in its proper format not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy to evaluate AI models in the domain of text-centric scene understanding. Nonetheless, most existing TEC-VQA benchmarks have focused on high-resource languages like English and Chinese. Despite pioneering wo…

    Submitted 11 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  25. arXiv:2405.11165  [pdf, other]

    cs.CV

    Automated Multi-level Preference for MLLMs

    Authors: Mengxi Zhang, Wenhao Wu, Yu Lu, Yuxin Song, Kang Rong, Huanjin Yao, Jianbo Zhao, Fanglong Liu, Yifan Sun, Haocheng Feng, Jingdong Wang

    Abstract: Current multimodal Large Language Models (MLLMs) suffer from "hallucination", occasionally generating responses that are not grounded in the input images. To tackle this challenge, one promising path is to utilize reinforcement learning from human feedback (RLHF), which steers MLLMs towards learning superior responses while avoiding inferior ones. We rethink the common practice of using binary p…

    Submitted 28 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: Preprint

  26. arXiv:2404.15228  [pdf, other]

    cs.CV cs.CL

    Re-Thinking Inverse Graphics With Large Language Models

    Authors: Peter Kulits, Haiwen Feng, Weiyang Liu, Victoria Abrevaya, Michael J. Black

    Abstract: Inverse graphics -- the task of inverting an image into physical variables that, when rendered, enable reproduction of the observed scene -- is a fundamental challenge in computer vision and graphics. Disentangling an image into its constituent elements, such as the shape, color, and material properties of the objects of the 3D scene that produced it, requires a comprehensive understanding of the…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 31 pages; project page: https://ig-llm.is.tue.mpg.de/

  27. arXiv:2404.13639  [pdf]

    cs.NI

    Performance Analysis for Deterministic System using Time Sensitive Network

    Authors: Md Mehedi Hasan, He Feng

    Abstract: Modern technology necessitates the use of dependable, fast, and inexpensive networks as the backbone for data transmission. Switched Ethernet coupled with the Time Sensitive Networking

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: No Comment

    MSC Class: Review

  28. arXiv:2404.12803  [pdf, other]

    cs.CV cs.LG

    TextSquare: Scaling up Text-Centric Visual Instruction Tuning

    Authors: Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang

    Abstract: Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data. To this end, we introduce a new approach for creating a massive, high-quality instruction-tuning dataset, Square…

    Submitted 19 April, 2024; originally announced April 2024.

  29. arXiv:2404.11864  [pdf, other]

    cs.CV

    Progressive Multi-modal Conditional Prompt Tuning

    Authors: Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li

    Abstract: Pre-trained vision-language models (VLMs) have shown remarkable generalization capabilities via prompting, which leverages VLMs as knowledge bases to extract information beneficial for downstream tasks. However, existing methods primarily employ uni-modal prompting, which only engages a uni-modal branch, failing to simultaneously adjust vision-language (V-L) features. Additionally, the one-pass fo…

    Submitted 24 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  30. arXiv:2404.10405  [pdf, other]

    cs.CV cs.AI cs.LG

    Integration of Self-Supervised BYOL in Semi-Supervised Medical Image Recognition

    Authors: Hao Feng, Yuanzhe Jia, Ruijia Xu, Mukesh Prasad, Ali Anaissi, Ali Braytee

    Abstract: Image recognition techniques heavily rely on abundant labeled data, particularly in medical contexts. Addressing the challenges associated with obtaining labeled data has led to the prominence of self-supervised learning and semi-supervised learning, especially in scenarios with limited annotated data. In this paper, we propose an innovative approach by integrating self-supervised learning into s…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by ICCS 2024

  31. arXiv:2404.09797  [pdf, other]

    cs.CV

    TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding

    Authors: Bozhi Luan, Hao Feng, Hong Chen, Yonghui Wang, Wengang Zhou, Houqiang Li

    Abstract: The advent of Large Multimodal Models (LMMs) has sparked a surge in research aimed at harnessing their remarkable reasoning abilities. However, for understanding text-rich images, challenges persist in fully leveraging the potential of LMMs, and existing methods struggle with effectively processing high-resolution images. In this work, we propose TextCoT, a novel Chain-of-Thought framework for tex…

    Submitted 15 April, 2024; originally announced April 2024.

  32. arXiv:2404.07833  [pdf]

    cs.CV cs.LG

    Streamlined Photoacoustic Image Processing with Foundation Models: A Training-Free Solution

    Authors: Handi Deng, Yucheng Zhou, Jiaxuan Xiang, Liujie Gu, Yan Luo, Hai Feng, Mingyuan Liu, Cheng Ma

    Abstract: Foundation models have rapidly evolved and have achieved significant accomplishments in computer vision tasks. Specifically, the prompt mechanism conveniently allows users to integrate image prior information into the model, making it possible to apply models without any training. Therefore, we propose a method based on foundation models and zero training to solve the tasks of photoacoustic (PA) i…

    Submitted 11 April, 2024; originally announced April 2024.

  33. arXiv:2404.06165  [pdf, other]

    cs.CV cs.MM eess.IV eess.SP

    Enhanced Radar Perception via Multi-Task Learning: Towards Refined Data for Sensor Fusion Applications

    Authors: Huawei Sun, Hao Feng, Gianfranco Mauro, Julius Ott, Georg Stettinger, Lorenzo Servadei, Robert Wille

    Abstract: Radar and camera fusion yields robustness in perception tasks by leveraging the strength of both sensors. The typical extracted radar point cloud is 2D without height information due to insufficient antennas along the elevation axis, which challenges the network performance. This work introduces a learning-based approach to infer the height of radar points associated with 3D objects. A novel robus…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Intelligent Vehicles Symposium (IV 2024)

  34. arXiv:2404.04221  [pdf, other]

    cs.CL

    How Lexical is Bilingual Lexicon Induction?

    Authors: Harsh Kohli, Helian Feng, Nicholas Dronen, Calvin McCarter, Sina Moeini, Ali Kebarighotbi

    Abstract: In contemporary machine learning approaches to bilingual lexicon induction (BLI), a model learns a mapping between the embedding spaces of a language pair. Recently, the retrieve-and-rank approach to BLI has achieved state-of-the-art results on the task. However, the problem remains challenging in low-resource settings, due to the paucity of data. The task is complicated by factors such as lexical var…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 8 pages, 4 figures. Paper accepted at NAACL Findings 2024

  35. arXiv:2403.16459  [pdf, other]

    cs.LG math.ST stat.ML

    On the rates of convergence for learning with convolutional neural networks

    Authors: Yunfei Yang, Han Feng, Ding-Xuan Zhou

    Abstract: We study approximation and learning capacities of convolutional neural networks (CNNs) with one-side zero-padding and multiple channels. Our first result proves a new approximation bound for CNNs with certain constraints on the weights. Our second result gives a new analysis of the covering number of feed-forward neural networks with CNNs as special cases. The analysis carefully takes into account th…

    Submitted 8 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  36. arXiv:2403.15009  [pdf, other]

    cs.CV

    TexRO: Generating Delicate Textures of 3D Models by Recursive Optimization

    Authors: Jinbo Wu, Xing Liu, Chenming Wu, Xiaobo Gao, Jialun Liu, Xinqi Liu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang

    Abstract: This paper presents TexRO, a novel method for generating delicate textures of a known 3D mesh by optimizing its UV texture. The key contributions are two-fold. We propose an optimal viewpoint selection strategy that finds the smallest set of viewpoints covering all the faces of a mesh. Our viewpoint selection strategy guarantees the completeness of a generated result. We propose a recursive…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Technical report. Project page: https://3d-aigc.github.io/TexRO

  37. arXiv:2403.14611  [pdf, other]

    cs.CV

    Explorative Inbetweening of Time and Space

    Authors: Haiwen Feng, Zheng Ding, Zhihao Xia, Simon Niklaus, Victoria Abrevaya, Michael J. Black, Xuaner Zhang

    Abstract: We introduce bounded generation as a generalized task to control video generation to synthesize arbitrary camera and subject motion based only on a given start and end frame. Our objective is to fully leverage the inherent generalization capability of an image-to-video model without additional training or fine-tuning of the original model. This is achieved through the proposed new sampling strateg…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: project page at https://time-reversal.github.io

  38. arXiv:2403.13433  [pdf, other]

    cs.AI cs.CL cs.CY

    AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior

    Authors: Zhouhong Gu, Xiaoxuan Zhu, Haoran Guo, Lin Zhang, Yin Cai, Hao Shen, Jiangjie Chen, Zheyu Ye, Yifei Dai, Yan Gao, Yao Hu, Hongwei Feng, Yanghua Xiao

    Abstract: Language significantly influences the formation and evolution of human emergent behavior, which is crucial in understanding collective intelligence within human societies. Considering that the study of how language affects human behavior needs to put it into the dynamic scenarios in which it is used, we introduce AgentGroupChat in this paper, a simulation that delves into the complex role of langu…

    Submitted 4 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  39. arXiv:2403.13352  [pdf, other]

    cs.CV

    AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation

    Authors: Jingkun An, Yinghao Zhu, Zongjian Li, Haoran Feng, Bohua Chen, Yemin Shi, Chengwei Pan

    Abstract: Text-to-Image (T2I) diffusion models have achieved remarkable success in image generation. Despite their progress, challenges remain in prompt-following ability, image quality, and the lack of high-quality datasets, which are essential for refining these models. As acquiring labeled data is costly, we introduce AGFSync, a framework that enhances T2I diffusion models through Direct Preference Optim…

    Submitted 3 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  40. arXiv:2403.10147  [pdf, other]

    cs.CV

    GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

    Authors: Hao Li, Yuanyuan Gao, Chenming Wu, Dingwen Zhang, Yalun Dai, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han

    Abstract: This paper presents GGRt, a novel approach to generalizable novel view synthesis that alleviates the need for real camera poses, complexity in processing high-resolution images, and lengthy optimization processes, thus facilitating stronger applicability of 3D Gaussian Splatting (3D-GS) in real-world scenarios. Specifically, we design a novel joint learning framework that consists of an Iterative…

    Submitted 18 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Project page: https://3d-aigc.github.io/GGRt

  41. arXiv:2403.07825  [pdf, other]

    cs.CL

    The Missing Piece in Model Editing: A Deep Dive into the Hidden Damage Brought By Model Editing

    Authors: Jianchen Wang, Zhouhong Gu, Xiaoxuan Zhu, Lin Zhang, Haoning Ye, Zhuozhi Xiong, Hongwei Feng, Yanghua Xiao

    Abstract: Large Language Models have revolutionized numerous tasks with their remarkable efficacy. However, editing these models, crucial for rectifying outdated or erroneous information, often leads to a complex issue known as the ripple effect in the hidden space. While difficult to detect, this effect can significantly impede the efficacy of model editing tasks and deteriorate model performance. This pap…

    Submitted 2 July, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  42. arXiv:2402.19108  [pdf, other]

    cs.CV

    DeepEraser: Deep Iterative Context Mining for Generic Text Eraser

    Authors: Hao Feng, Wendi Wang, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

    Abstract: In this work, we present DeepEraser, an effective deep network for generic text removal. DeepEraser utilizes a recurrent architecture that erases the text in an image via iterative operations. Our idea comes from the process of erasing pencil script, where the text area designated for removal is subject to continuous monitoring and the text is attenuated progressively, ensuring a thorough and clea…

    Submitted 29 February, 2024; originally announced February 2024.

  43. arXiv:2402.18784  [pdf, other]

    cs.AI q-bio.NC

    Brain-inspired and Self-based Artificial Intelligence

    Authors: Yi Zeng, Feifei Zhao, Yuxuan Zhao, Dongcheng Zhao, Enmeng Lu, Qian Zhang, Yuwei Wang, Hui Feng, Zhuoya Zhao, Jihang Wang, Qingqun Kong, Yinqian Sun, Yang Li, Guobin Shen, Bing Han, Yiting Dong, Wenxuan Pan, Xiang He, Aorigele Bao, Jin Wang

    Abstract: The question "Can machines think?" and the Turing Test to assess whether machines could achieve human-level intelligence is one of the roots of AI. With the philosophical argument "I think, therefore I am", this paper challenges the idea of a "thinking machine" supported by current AIs since there is no sense of self in them. Current artificial intelligence is only seemingly intelligent information…

    Submitted 28 February, 2024; originally announced February 2024.

  44. arXiv:2402.16607  [pdf, other]

    cs.CV

    GVA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos

    Authors: Xinqi Liu, Chenming Wu, Jialun Liu, Xing Liu, Jinbo Wu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang

    Abstract: In this paper, we present a novel method that facilitates the creation of vivid 3D Gaussian avatars from monocular video inputs (GVA). Our innovation lies in addressing the intricate challenges of delivering high-fidelity human body reconstructions and aligning 3D Gaussians with human skin surfaces accurately. The key contributions of this paper are twofold. Firstly, we introduce a pose refinement…

    Submitted 19 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  45. arXiv:2402.15153  [pdf, other]

    cs.CL cs.LG

    Self-Adaptive Reconstruction with Contrastive Learning for Unsupervised Sentence Embeddings

    Authors: Junlong Liu, Xichen Shang, Huawen Feng, Junhao Zheng, Qianli Ma

    Abstract: The unsupervised sentence embedding task aims to convert sentences to semantic vector representations. Most previous works directly use the sentence representations derived from pretrained language models. However, due to the token bias in pretrained language models, the models cannot capture the fine-grained semantics in sentences, which leads to poor predictions. To address this issue, we propose…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 8 pages, 3 figures

  46. arXiv:2402.13669  [pdf, other]

    cs.CL

    Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning

    Authors: Zhaorui Yang, Tianyu Pang, Haozhe Feng, Han Wang, Wei Chen, Minfeng Zhu, Qian Liu

    Abstract: The surge in Large Language Models (LLMs) has revolutionized natural language processing, but fine-tuning them for specific tasks often encounters challenges in balancing performance and preserving general instruction-following abilities. In this paper, we posit that the distribution gap between task datasets and the LLMs serves as the primary underlying cause. To address the problem, we introduce…

    Submitted 28 May, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: ACL 2024

  47. arXiv:2402.10063  [pdf, other]

    cs.LG

    Balancing the Causal Effects in Class-Incremental Learning

    Authors: Junhao Zheng, Ruiyan Wang, Chongzhi Zhang, Huawen Feng, Qianli Ma

    Abstract: Class-Incremental Learning (CIL) is a practical and challenging problem for achieving general artificial intelligence. Recently, Pre-Trained Models (PTMs) have led to breakthroughs in both visual and natural language processing tasks. Despite recent studies showing PTMs' potential ability to learn sequentially, a plethora of work indicates the necessity of alleviating the catastrophic forgetting o…

    Submitted 15 February, 2024; originally announced February 2024.

  48. arXiv:2401.18050  [pdf]

    cs.ET physics.optics

    Hypermultiplexed Integrated Tensor Optical Processor

    Authors: Shaoyuan Ou, Alexander Sludds, Ryan Hamerly, Ke Zhang, Hanke Feng, Eric Zhong, Cheng Wang, Dirk Englund, Mengjie Yu, Zaijun Chen

    Abstract: The escalating data volume and complexity resulting from the rapid expansion of artificial intelligence (AI), internet of things (IoT) and 5G/6G mobile networks is creating an urgent need for energy-efficient, scalable computing hardware. Here we demonstrate a hypermultiplexed integrated tensor optical processor (HITOP) that performs trillions of operations per second (TeraOPS) at the energy cost…

    Submitted 13 February, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

  49. Alleviating Structural Distribution Shift in Graph Anomaly Detection

    Authors: Yuan Gao, Xiang Wang, Xiangnan He, Zhenguang Liu, Huamin Feng, Yongdong Zhang

    Abstract: Graph anomaly detection (GAD) is a challenging binary classification problem due to its different structural distribution between anomalies and normal nodes -- abnormal nodes are a minority, therefore holding high heterophily and low homophily compared to normal nodes. Furthermore, due to various time factors and the annotation preferences of human experts, the heterophily and homophily can change…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted to WSDM 2023

  50. arXiv:2401.11181  [pdf, other]

    cs.DC

    Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads

    Authors: Cunchen Hu, Heyang Huang, Liangliang Xu, Xusheng Chen, Jiang Xu, Shuang Chen, Hao Feng, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan

    Abstract: Transformer-based large language model (LLM) inference serving is now the backbone of many cloud services. LLM inference consists of a prefill phase and a decode phase. However, existing LLM deployment practices often overlook the distinct characteristics of these phases, leading to significant interference. To mitigate interference, our insight is to carefully schedule and group inference request…

    Submitted 20 January, 2024; originally announced January 2024.