Skip to main content

Showing 1–50 of 160 results for author: Zhai, G

  1. arXiv:2407.13719  [pdf, other

    cs.CV

    HazeCLIP: Towards Language Guided Real-World Image Dehazing

    Authors: Ruiyi Wang, Wenhao Li, Xiaohong Liu, Chunyi Li, Zicheng Zhang, Xiongkuo Min, Guangtao Zhai

    Abstract: Existing methods have achieved remarkable performance in single image dehazing, particularly on synthetic datasets. However, they often struggle with real-world hazy images due to domain shift, limiting their practical applicability. This paper introduces HazeCLIP, a language-guided adaptation framework designed to enhance the real-world performance of pre-trained dehazing networks. Inspired by th… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  2. arXiv:2407.12431  [pdf, other

    cs.CV

    GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval

    Authors: Han Zhou, Wei Dong, Xiaohong Liu, Shuaicheng Liu, Xiongkuo Min, Guangtao Zhai, Jun Chen

    Abstract: Most existing Low-light Image Enhancement (LLIE) methods either directly map Low-Light (LL) to Normal-Light (NL) images or use semantic or illumination maps as guides. However, the ill-posed nature of LLIE and the difficulty of semantic retrieval from impaired inputs limit these methods, especially in extremely low-light conditions. To address this issue, we present a new LLIE network via Generati… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  3. arXiv:2407.10459  [pdf, other

    cs.CV

    DiffStega: Towards Universal Training-Free Coverless Image Steganography with Diffusion Models

    Authors: Yiwei Yang, Zheyuan Liu, Jun Jia, Zhongpai Gao, Yunhao Li, Wei Sun, Xiaohong Liu, Guangtao Zhai

    Abstract: Traditional image steganography focuses on concealing one image within another, aiming to avoid steganalysis by unauthorized entities. Coverless image steganography (CIS) enhances imperceptibility by not using any cover image. Recent works have utilized text prompts as keys in CIS through diffusion models. However, this approach faces three challenges: invalidated when private prompt is guessed, c… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures; reference added; accepted at IJCAI2024 main track

  4. arXiv:2407.05169  [pdf, other

    cs.CV

    DehazeDCT: Towards Effective Non-Homogeneous Dehazing via Deformable Convolutional Transformer

    Authors: Wei Dong, Han Zhou, Ruiyi Wang, Xiaohong Liu, Guangtao Zhai, Jun Chen

    Abstract: Image dehazing, a pivotal task in low-level vision, aims to restore the visibility and detail from hazy images. Many deep learning methods with powerful representation learning capability demonstrate advanced performance on non-homogeneous dehazing, however, these methods usually struggle with processing high-resolution images (e.g., $4000 \times 6000$) due to their heavy computational demands. To… ▽ More

    Submitted 24 May, 2024; originally announced July 2024.

  5. arXiv:2406.16641  [pdf, other

    cs.CV cs.AI

    Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment

    Authors: Jun Fu, Wei Zhou, Qiuping Jiang, Hantao Liu, Guangtao Zhai

    Abstract: Recently, textual prompt tuning has shown inspirational performance in adapting Contrastive Language-Image Pre-training (CLIP) models to natural image quality assessment. However, such uni-modal prompt learning method only tunes the language branch of CLIP models. This is not enough for adapting CLIP models to AI generated image quality assessment (AGIQA) since AGIs visually differ from natural im… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Signal Processing Letter

  6. arXiv:2406.15848  [pdf, other

    cs.CV

    Quality-guided Skin Tone Enhancement for Portrait Photography

    Authors: Shiqi Gao, Huiyu Duan, Xinyue Li, Kang Fu, Yicong Peng, Qihang Xu, Yuanyuan Chang, Jia Wang, Xiongkuo Min, Guangtao Zhai

    Abstract: In recent years, learning-based color and tone enhancement methods for photos have become increasingly popular. However, most learning-based image enhancement methods just learn a mapping from one distribution to another based on one dataset, lacking the ability to adjust images continuously and controllably. It is important to enable the learning-based enhancement models to adjust an image contin… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  7. arXiv:2406.09356  [pdf, other

    cs.CV eess.IV

    CMC-Bench: Towards a New Paradigm of Visual Signal Compression

    Authors: Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin

    Abstract: Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1\% or even lower, which has strong potential applications. However, CMC has certain defects in… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  8. arXiv:2406.06087  [pdf, other

    cs.CV

    GAIA: Rethinking Action Quality Assessment for AI-Generated Videos

    Authors: Zijian Chen, Wei Sun, Yuan Tian, Jun Jia, Zicheng Zhang, Jiarui Wang, Ru Huang, Xiongkuo Min, Guangtao Zhai, Wenjun Zhang

    Abstract: Assessing action quality is both imperative and challenging due to its significant impact on the quality of AI-generated videos, further complicated by the inherently ambiguous nature of actions within AI-generated video (AIGV). Current action quality assessment (AQA) algorithms predominantly focus on actions from real specific scenarios and are pre-trained with normative action features, thus ren… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 28 pages, 13 figures

  9. arXiv:2406.04765  [pdf, other

    cs.CV cs.MM

    SMC++: Masked Learning of Unsupervised Video Semantic Compression

    Authors: Yuan Tian, Guo Lu, Guangtao Zhai

    Abstract: Most video compression methods focus on human visual perception, neglecting semantic preservation. This leads to severe semantic loss during the compression, hampering downstream video analysis tasks. In this paper, we propose a Masked Video Modeling (MVM)-powered compression framework that particularly preserves video semantics, by jointly mining and compressing the semantics in a self-supervised… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  10. arXiv:2406.03070  [pdf, other

    cs.CV cs.AI

    A-Bench: Are LMMs Masters at Evaluating AI-generated Images?

    Authors: Zicheng Zhang, Haoning Wu, Chunyi Li, Yingjie Zhou, Wei Sun, Xiongkuo Min, Zijian Chen, Xiaohong Liu, Weisi Lin, Guangtao Zhai

    Abstract: How to accurately and efficiently assess AI-generated images (AIGIs) remains a critical challenge for generative models. Given the high costs and extensive time commitments required for user studies, many researchers have turned towards employing large multi-modal models (LMMs) as AIGI evaluators, the precision and validity of which are still questionable. Furthermore, traditional benchmarks often… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  11. arXiv:2406.02559  [pdf, other

    cs.CV

    ShadowRefiner: Towards Mask-free Shadow Removal via Fast Fourier Transformer

    Authors: Wei Dong, Han Zhou, Yuqiong Tian, Jingke Sun, Xiaohong Liu, Guangtao Zhai, Jun Chen

    Abstract: Shadow-affected images often exhibit pronounced spatial discrepancies in color and illumination, consequently degrading various vision applications including object detection and segmentation systems. To effectively eliminate shadows in real-world images while preserving intricate details and producing visually compelling outcomes, we introduce a mask-free Shadow Removal and Refinement network (Sh… ▽ More

    Submitted 2 July, 2024; v1 submitted 17 April, 2024; originally announced June 2024.

    Comments: Accepted by CVPR workshop 2024 (NTIRE 2024); Corrected references

  12. arXiv:2405.19298  [pdf, other

    cs.CV eess.IV

    Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

    Authors: Hanwei Zhu, Haoning Wu, Yixuan Li, Zicheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

    Abstract: While recent advancements in large multimodal models (LMMs) have significantly improved their abilities in image quality assessment (IQA) relying on absolute quality rating, how to transfer reliable relative quality comparison outputs to continuous perceptual quality scores remains largely unexplored. To address this gap, we introduce Compare2Score-an all-around LMM-based no-reference IQA (NR-IQA)… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  13. arXiv:2405.09286  [pdf, other

    cs.MM cs.CV

    MVBIND: Self-Supervised Music Recommendation For Videos Via Embedding Space Binding

    Authors: Jiajie Teng, Huiyu Duan, Yucheng Zhu, Sijing Wu, Guangtao Zhai

    Abstract: Recent years have witnessed the rapid development of short videos, which usually contain both visual and audio modalities. Background music is important to the short videos, which can significantly influence the emotions of the viewers. However, at present, the background music of short videos is generally chosen by the video producer, and there is a lack of automatic music recommendation methods… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

  14. arXiv:2405.08745  [pdf, other

    eess.IV cs.CV cs.MM

    Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

    Authors: Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: In this paper, we present a simple but effective method to enhance blind video quality assessment (BVQA) models for social media videos. Motivated by previous researches that leverage pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQ… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  15. arXiv:2405.08555  [pdf, other

    cs.CV cs.MM

    Dual-Branch Network for Portrait Image Quality Assessment

    Authors: Wei Sun, Weixia Zhang, Yanwei Jiang, Haoning Wu, Zicheng Zhang, Jun Jia, Yingjie Zhou, Zhongpeng Ji, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: Portrait images typically consist of a salient person against diverse backgrounds. With the development of mobile devices and image processing techniques, users can conveniently capture portrait images anytime and anywhere. However, the quality of these portraits may suffer from the degradation caused by unfavorable environmental conditions, subpar photography techniques, and inferior capturing de… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  16. arXiv:2405.07346  [pdf, other

    cs.CV

    Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning

    Authors: Jiarui Wang, Huiyu Duan, Guangtao Zhai, Xiongkuo Min

    Abstract: Artificial Intelligence Generated Content (AIGC) has grown rapidly in recent years, among which AI-based image generation has gained widespread attention due to its efficient and imaginative image creation ability. However, AI-generated Images (AIGIs) may not satisfy human preferences due to their unique distortions, which highlights the necessity to understand and evaluate human preferences for A… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  17. arXiv:2405.06342  [pdf, other

    cs.CV eess.IV

    Compression-Realized Deep Structural Network for Video Quality Enhancement

    Authors: Hanchi Sun, Xiaohong Liu, Xinyang Jiang, Yifei Shen, Dongsheng Li, Xiongkuo Min, Guangtao Zhai

    Abstract: This paper focuses on the task of quality enhancement for compressed videos. Although deep network-based video restorers achieve impressive progress, most of the existing methods lack a structured design to optimally leverage the priors within compression codecs. Since the quality degradation of the video is primarily induced by the compression algorithm, a new paradigm is urgently needed for a mo… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

  18. arXiv:2405.03333  [pdf, other

    cs.CV

    Light-VQA+: A Video Quality Assessment Model for Exposure Correction with Vision-Language Guidance

    Authors: Xunchu Zhou, Xiaohong Liu, Yunlong Dong, Tengchuan Kou, Yixuan Gao, Zicheng Zhang, Chunyi Li, Haoning Wu, Guangtao Zhai

    Abstract: Recently, User-Generated Content (UGC) videos have gained popularity in our daily lives. However, UGC videos often suffer from poor exposure due to the limitations of photographic equipment and techniques. Therefore, Video Exposure Correction (VEC) algorithms have been proposed, Low-Light Video Enhancement (LLVE) and Over-Exposed Video Recovery (OEVR) included. Equally important to the VEC is the… ▽ More

    Submitted 13 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  19. arXiv:2405.00915  [pdf, other

    cs.CV cs.AI cs.LG

    EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion

    Authors: Guangyao Zhai, Evin Pınar Örnek, Dave Zhenyu Chen, Ruotong Liao, Yan Di, Nassir Navab, Federico Tombari, Benjamin Busam

    Abstract: We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes on scene graphs. EchoScene leverages a dual-branch diffusion model that dynamically adapts to scene graphs. Existing methods struggle to handle scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations. EchoScene overcomes this by assoc… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 25 pages. 10 figures

  20. arXiv:2404.18343  [pdf, other

    cs.MM cs.CV

    G-Refine: A General Quality Refiner for Text-to-Image Generation

    Authors: Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchaun Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai

    Abstract: With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compro… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  21. arXiv:2404.18203  [pdf, other

    cs.CV cs.AI

    LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM

    Authors: Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Wei Sun, Chaofeng Chen, Xiongkuo Min, Xiaohong Liu, Weisi Lin, Guangtao Zhai

    Abstract: Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' exceptional performance and robustness in low-level vision and quality assessment tasks, this study aims to investigate the feasibility of imparting PCQA knowledge to LMMs thro… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  22. arXiv:2404.18162  [pdf, other

    cs.MM q-bio.NC

    fMRI Exploration of Visual Quality Assessment

    Authors: Yiming Zhang, Ying Hu, Xiongkuo Min, Yan Zhou, Guangtao Zhai

    Abstract: Despite significant strides in visual quality assessment, the neural mechanisms underlying visual quality perception remain insufficiently explored. This study employed fMRI to examine brain activity during image quality assessment and identify differences in human processing of images with varying quality. Fourteen healthy participants underwent tasks assessing both image quality and content clas… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  23. arXiv:2404.17762  [pdf, other

    cs.CV

    Large Multi-modality Model Assisted AI-Generated Image Quality Assessment

    Authors: Puyi Wang, Wei Sun, Zicheng Zhang, Jun Jia, Yanwei Jiang, Zhichao Zhang, Xiongkuo Min, Guangtao Zhai

    Abstract: Traditional deep neural network (DNN)-based image quality assessment (IQA) models leverage convolutional neural networks (CNN) or Transformer to learn the quality-aware feature representation, achieving commendable performance on natural scene images. However, when applied to AI-Generated images (AGIs), these DNN-based IQA models exhibit subpar performance. This situation is largely due to the sem… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  24. arXiv:2404.16687  [pdf, other

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte… ▽ More

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  25. arXiv:2404.16205  [pdf, other

    cs.CV cs.MM

    AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

    Authors: Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai , et al. (11 additional authors not shown)

    Abstract: This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed met… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

  26. arXiv:2404.11313  [pdf, other

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  27. arXiv:2404.09003  [pdf, other

    cs.CV eess.IV

    THQA: A Perceptual Quality Assessment Database for Talking Heads

    Authors: Yingjie Zhou, Zicheng Zhang, Wei Sun, Xiaohong Liu, Xiongkuo Min, Zhihua Wang, Xiao-Ping Zhang, Guangtao Zhai

    Abstract: In the realm of media technology, digital humans have gained prominence due to rapid advancements in computer technology. However, the manual modeling and control required for the majority of digital humans pose significant obstacles to efficient development. The speech-driven methods offer a novel avenue for manipulating the mouth shape and expressions of digital humans. Despite the proliferation… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

  28. arXiv:2404.07537  [pdf, other

    cs.CV

    How is Visual Attention Influenced by Text Guidance? Database and Model

    Authors: Yinan Sun, Xiongkuo Min, Huiyu Duan, Guangtao Zhai

    Abstract: The analysis and prediction of visual attention have long been crucial tasks in the fields of computer vision and image processing. In practical applications, images are generally accompanied by various text descriptions, however, few studies have explored the influence of text descriptions on visual attention, let alone developed visual saliency prediction models considering text guidance. In thi… ▽ More

    Submitted 12 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  29. arXiv:2404.03407  [pdf, other

    cs.CV

    AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment

    Authors: Chunyi Li, Tengchuan Kou, Yixuan Gao, Yuqin Cao, Wei Sun, Zicheng Zhang, Yingjie Zhou, Zhichao Zhang, Weixia Zhang, Haoning Wu, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai

    Abstract: With the rapid advancements in AI-Generated Content (AIGC), AI-Generated Images (AIGIs) have been widely applied in entertainment, education, and social media. However, due to the significant variance in quality among different AIGIs, there is an urgent need for models that consistently match human subjective ratings. To address this issue, we organized a challenge towards AIGC quality assessment… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

  30. arXiv:2404.01024  [pdf, other

    cs.CV eess.IV

    AIGCOIQA2024: Perceptual Quality Assessment of AI Generated Omnidirectional Images

    Authors: Liu Yang, Huiyu Duan, Long Teng, Yucheng Zhu, Xiaohong Liu, Menghan Hu, Xiongkuo Min, Guangtao Zhai, Patrick Le Callet

    Abstract: In recent years, the rapid advancement of Artificial Intelligence Generated Content (AIGC) has attracted widespread attention. Among the AIGC, AI generated omnidirectional images hold significant potential for Virtual Reality (VR) and Augmented Reality (AR) applications, hence omnidirectional AIGC techniques have also been widely studied. AI-generated omnidirectional images exhibit unique distorti… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  31. arXiv:2403.17881  [pdf, other

    cs.CV

    Deepfake Generation and Detection: A Benchmark and Survey

    Authors: Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, Dacheng Tao

    Abstract: Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, digital human creation, to name a few. With the advancements in deep learning, techniques primarily represented by Variational Autoencoders and Generative Adversarial Networks have achieved… ▽ More

    Submitted 16 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: We closely follow the latest developments in https://github.com/flyingby/Awesome-Deepfake-Generation-and-Detection

  32. arXiv:2403.11956  [pdf, other

    cs.CV

    Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment

    Authors: Tengchuan Kou, Xiaohong Liu, Zicheng Zhang, Chunyi Li, Haoning Wu, Xiongkuo Min, Guangtao Zhai, Ning Liu

    Abstract: With the rapid development of generative models, Artificial Intelligence-Generated Contents (AIGC) have exponentially increased in daily lives. Among them, Text-to-Video (T2V) generation has received widespread attention. Though many T2V models have been released for generating high perceptual quality videos, there is still lack of a method to evaluate the quality of these videos quantitatively. T… ▽ More

    Submitted 18 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  33. arXiv:2403.11324  [pdf, other

    cs.CV

    GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering

    Authors: Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari

    Abstract: During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces. This degradation significantly affects the rendering quality of novel views that deviate significantly from the viewpoints in the training data. To mitigate this issue,… ▽ More

    Submitted 17 July, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: accepted to ECCV 2024

  34. arXiv:2403.06452  [pdf, other

    cs.CV

    Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation

    Authors: Guangyang Wu, Xiaohong Liu, Jun Jia, Xuehao Cui, Guangtao Zhai

    Abstract: In the digital era, QR codes serve as a linchpin connecting virtual and physical realms. Their pervasive integration across various applications highlights the demand for aesthetically pleasing codes without compromised scannability. However, prevailing methods grapple with the intrinsic challenge of balancing customization and scannability. Notably, stable-diffusion models have ushered in an epoc… ▽ More

    Submitted 12 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  35. arXiv:2403.06421  [pdf, other

    cs.CV

    A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos

    Authors: Weixia Zhang, Chengguang Zhu, Jingnan Gao, Yichao Yan, Guangtao Zhai, Xiaokang Yang

    Abstract: The rapid advancement of Artificial Intelligence Generated Content (AIGC) technology has propelled audio-driven talking head generation, gaining considerable research attention for practical applications. However, performance evaluation research lags behind the development of talking head generation techniques. Existing literature relies on heuristic quantitative metrics without human validation,… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  36. arXiv:2403.06406  [pdf, other

    cs.CV

    Comparison of No-Reference Image Quality Models via MAP Estimation in Diffusion Latents

    Authors: Weixia Zhang, Dingquan Li, Guangtao Zhai, Xiaokang Yang, Kede Ma

    Abstract: Contemporary no-reference image quality assessment (NR-IQA) models can effectively quantify the perceived image quality, with high correlations between model predictions and human perceptual scores on fixed test sets. However, little progress has been made in comparing NR-IQA models from a perceptual optimization perspective. Here, for the first time, we demonstrate that NR-IQA models can be plugg… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

  37. arXiv:2402.16749  [pdf, other

    cs.CV cs.AI eess.IV

    MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

    Authors: Chunyi Li, Guo Lu, Donghui Feng, Haoning Wu, Zicheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, Wenjun Zhang

    Abstract: With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic. However, existing compression algorithms must sacrifice either consistency with the ground truth or perceptual quality at ultra-low bitrate. In recent years, the rapid development of the Large Multimodal Model (LMM) has made it possible to balance these two goals. To… ▽ More

    Submitted 17 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 13 page, 11 figures, 4 tables

  38. arXiv:2402.16641  [pdf, other

    cs.CV

    Towards Open-ended Visual Quality Comparison

    Authors: Haoning Wu, Hanwei Zhu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Annan Wang, Wenxiu Sun, Qiong Yan, Xiaohong Liu, Guangtao Zhai, Shiqi Wang, Weisi Lin

    Abstract: Comparative settings (e.g. pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as it inherently standardizes the evaluation criteria across different observers and offer more clear-cut responses. In this work, we extend the edge of emerging large multi-modality models (LMMs) to further advance visual quality comparison into… ▽ More

    Submitted 4 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Fix typos

  39. arXiv:2402.16599  [pdf, other

    cs.CV eess.IV

    Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields

    Authors: Yifei Li, Xiaohong Liu, Yicong Peng, Guangtao Zhai, Jun Zhou

    Abstract: Video conferencing has caught much more attention recently. High fidelity and low bandwidth are two major objectives of video compression for video conferencing applications. Most pioneering methods rely on classic video compression codec without high-level feature embedding and thus can not reach the extremely low bandwidth. Recent works instead employ model-based neural compression to acquire ul… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 16 pages, 5 figures, accepted by IFTC2023

  40. arXiv:2402.07116  [pdf, other

    cs.CV

    A Benchmark for Multi-modal Foundation Models on Low-level Vision: from Single Images to Pairs

    Authors: Zicheng Zhang, Haoning Wu, Erli Zhang, Guangtao Zhai, Weisi Lin

    Abstract: The rapid development of Multi-modality Large Language Models (MLLMs) has navigated a paradigm shift in computer vision, moving towards versatile foundational models. However, evaluating MLLMs in low-level visual perception and understanding remains a yet-to-explore domain. To this end, we design benchmark settings to emulate human language responses related to low-level vision: the low-level visu… ▽ More

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2309.14181

  41. arXiv:2402.03413  [pdf, other

    cs.MM cs.CV eess.IV

    Perceptual Video Quality Assessment: A Survey

    Authors: Xiongkuo Min, Huiyu Duan, Wei Sun, Yucheng Zhu, Guangtao Zhai

    Abstract: Perceptual video quality assessment plays a vital role in the field of video processing due to the existence of quality degradations introduced in various stages of video signal acquisition, compression, transmission and display. With the advancement of internet communication and cloud service technology, video content and traffic are growing exponentially, which further emphasizes the requirement… ▽ More

    Submitted 5 February, 2024; originally announced February 2024.

  42. arXiv:2402.01201  [pdf, other

    cs.LG cs.AI

    Few-Shot Class-Incremental Learning with Prior Knowledge

    Authors: Wenhao Jiang, Duo Li, Menghan Hu, Guangtao Zhai, Xiaokang Yang, Xiao-Ping Zhang

    Abstract: To tackle the issues of catastrophic forgetting and overfitting in few-shot class-incremental learning (FSCIL), previous work has primarily concentrated on preserving the memory of old knowledge during the incremental phase. The role of pre-trained model in shaping the effectiveness of incremental learning is frequently underestimated in these studies. Therefore, to enhance the generalization abil… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

  43. arXiv:2401.04435  [pdf, other

    cs.CV

    Uncertainty-aware Sampling for Long-tailed Semi-supervised Learning

    Authors: Kuo Yang, Duo Li, Menghan Hu, Guangtao Zhai, Xiaokang Yang, Xiao-Ping Zhang

    Abstract: For semi-supervised learning with imbalance classes, the long-tailed distribution of data will increase the model prediction bias toward dominant classes, undermining performance on less frequent classes. Existing methods also face challenges in ensuring the selection of sufficiently reliable pseudo-labels for model training and there is a lack of mechanisms to adjust the selection of more reliabl… ▽ More

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: Submitted to TPAMI

  44. arXiv:2401.01569  [pdf, other

    cs.CV

    AttentionLut: Attention Fusion-based Canonical Polyadic LUT for Real-time Image Enhancement

    Authors: Kang Fu, Yicong Peng, Zicheng Zhang, Qihang Xu, Xiaohong Liu, Jia Wang, Guangtao Zhai

    Abstract: Recently, many algorithms have employed image-adaptive lookup tables (LUTs) to achieve real-time image enhancement. Nonetheless, a prevailing trend among existing methods has been the employment of linear combinations of basic LUTs to formulate image-adaptive LUTs, which limits the generalization ability of these methods. To address this limitation, we propose a novel framework named AttentionLut… ▽ More

    Submitted 3 January, 2024; originally announced January 2024.

  45. arXiv:2401.01117  [pdf, other

    cs.CV eess.IV

    Q-Refine: A Perceptual Quality Refiner for AI-Generated Image

    Authors: Chunyi Li, Haoning Wu, Zicheng Zhang, Hongkun Hao, Kaiwei Zhang, Lei Bai, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai

    Abstract: With the rapid evolution of the Text-to-Image (T2I) model in recent years, their unsatisfactory generation result has become a challenge. However, uniformly refining AI-Generated Images (AIGIs) of different qualities not only limited optimization capabilities for low-quality AIGIs but also brought negative optimization to high-quality AIGIs. To address this issue, a quality-award refiner named Q-R… ▽ More

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: 6 pages, 5 figures

  46. arXiv:2312.17090  [pdf, other

    cs.CV cs.CL cs.LG

    Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

    Authors: Haoning Wu, Zicheng Zhang, Weixia Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Yixuan Gao, Annan Wang, Erli Zhang, Wenxiu Sun, Qiong Yan, Xiongkuo Min, Guangtao Zhai, Weisi Lin

    Abstract: The explosion of visual content available online underscores the requirement for an accurate machine assessor to robustly evaluate scores across diverse types of visual contents. While recent studies have demonstrated the exceptional potentials of large multi-modality models (LMMs) on a wide range of related fields, in this work, we explore how to teach them for visual rating aligned with human op… ▽ More

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Technical Report

  47. arXiv:2312.15300  [pdf, other

    cs.CV

    Q-Boost: On Visual Quality Assessment Ability of Low-level Multi-Modality Foundation Models

    Authors: Zicheng Zhang, Haoning Wu, Zhongpeng Ji, Chunyi Li, Erli Zhang, Wei Sun, Xiaohong Liu, Xiongkuo Min, Fengyu Sun, Shangling Jui, Weisi Lin, Guangtao Zhai

    Abstract: Recent advancements in Multi-modality Large Language Models (MLLMs) have demonstrated remarkable capabilities in complex high-level vision tasks. However, the exploration of MLLM potential in visual quality assessment, a vital aspect of low-level vision, remains limited. To address this gap, we introduce Q-Boost, a novel strategy designed to enhance low-level MLLMs in image quality assessment (IQA… ▽ More

    Submitted 23 December, 2023; originally announced December 2023.

  48. arXiv:2312.05476  [pdf, other

    cs.CV

    Exploring the Naturalness of AI-Generated Images

    Authors: Zijian Chen, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Xiongkuo Min, Guangtao Zhai, Wenjun Zhang

    Abstract: The proliferation of Artificial Intelligence-Generated Images (AGIs) has greatly expanded the Image Naturalness Assessment (INA) problem. Different from early definitions that mainly focus on tone-mapped images with limited distortions (e.g., exposure, contrast, and color reproduction), INA on AI-generated images is especially challenging as it has more diverse contents and could be affected by fa… ▽ More

    Submitted 4 March, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: 33 pages

  49. arXiv:2312.04369  [pdf, other

    cs.CV

    SingingHead: A Large-scale 4D Dataset for Singing Head Animation

    Authors: Sijing Wu, Yunhao Li, Weitian Zhang, Jun Jia, Yucheng Zhu, Yichao Yan, Guangtao Zhai, Xiaokang Yang

    Abstract: Singing, as a common facial movement second only to talking, can be regarded as a universal language across ethnicities and cultures, plays an important role in emotional communication, art, and entertainment. However, it is often overlooked in the field of audio-driven facial animation due to the lack of singing head datasets and the domain gap between singing and talking in rhythm and amplitude.… ▽ More

    Submitted 14 July, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Project page: https://wsj-sjtu.github.io/SingingHead/

  50. arXiv:2311.18216  [pdf, other

    cs.CV cs.MM eess.IV

    FS-BAND: A Frequency-Sensitive Banding Detector

    Authors: Zijian Chen, Wei Sun, Zicheng Zhang, Ru Huang, Fangfang Lu, Xiongkuo Min, Guangtao Zhai, Wenjun Zhang

    Abstract: Banding artifact, as known as staircase-like contour, is a common quality annoyance that happens in compression, transmission, etc. scenarios, which largely affects the user's quality of experience (QoE). The banding distortion typically appears as relatively small pixel-wise variations in smooth backgrounds, which is difficult to analyze in the spatial domain but easily reflected in the frequency… ▽ More

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2311.17752