Showing 1–50 of 99 results for author: Lan, C

  1. arXiv:2407.13108  [pdf, other]

    cs.CV

    UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt

    Authors: Xin Li, Bingchen Li, Yeying Jin, Cuiling Lan, Hanxin Zhu, Yulin Ren, Zhibo Chen

    Abstract: Compressed Image Super-resolution (CSR) aims to simultaneously super-resolve the compressed images and tackle the challenging hybrid distortions caused by compression. However, existing works on CSR usually focus on a single compression codec, i.e., JPEG, ignoring the diverse traditional or learning-based codecs in practical applications, e.g., HEVC, VVC, HIFIC, etc. In this work, we propose…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  2. arXiv:2405.15222  [pdf, other]

    cs.CV cs.AI cs.RO

    Leveraging Unknown Objects to Construct Labeled-Unlabeled Meta-Relationships for Zero-Shot Object Navigation

    Authors: Yanwei Zheng, Changrui Li, Chuanlin Lan, Yaling Li, Xiao Zhang, Yifei Zou, Dongxiao Yu, Zhipeng Cai

    Abstract: Zero-shot object navigation (ZSON) addresses the situation where an agent navigates to an unseen object that is not present in the training set. Previous works mainly train the agent using seen objects with known labels, and ignore the seen objects without labels. In this paper, we introduce seen objects without labels, herein termed "unknown objects", into the training procedure to enrich the agent's…

    Submitted 26 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2405.07481  [pdf, other]

    cs.CV

    Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis

    Authors: Tianci Bi, Xiaoyi Zhang, Zhizheng Zhang, Wenxuan Xie, Cuiling Lan, Yan Lu, Nanning Zheng

    Abstract: Significant progress has been made in scene text detection models since the rise of deep learning, but scene text layout analysis, which aims to group detected text instances as paragraphs, has not kept pace. Previous works either treat text detection and grouping using separate models, or train a unified model from scratch. None of them has yet made full use of the already…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  4. arXiv:2403.15691  [pdf, other]

    cs.CV

    Temporal-Spatial Object Relations Modeling for Vision-and-Language Navigation

    Authors: Bowen Huang, Yanwei Zheng, Chuanlin Lan, Xinpeng Zhao, Yifei Zou, Dongxiao Yu

    Abstract: Vision-and-Language Navigation (VLN) is a challenging task where an agent is required to navigate to a natural language described location via vision observations. The navigation abilities of the agent can be enhanced by the relations between objects, which are usually learned using internal objects or external datasets. The relationships between internal objects are modeled employing graph convol…

    Submitted 16 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  5. arXiv:2403.08635  [pdf, other]

    cs.LG cs.AI stat.ML

    Human Alignment of Large Language Models through Online Preference Optimisation

    Authors: Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

    Abstract: Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human alignment has been extensively studied recently and several methods such as Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimisation (DPO) and Sequence Likelihood Calibration (SLiC) have emerged. In this paper, our contributio…

    Submitted 13 March, 2024; originally announced March 2024.

  6. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  7. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  8. arXiv:2402.13088  [pdf, other]

    cs.CV

    Slot-VLM: SlowFast Slots for Video-Language Modeling

    Authors: Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu

    Abstract: Video-Language Models (VLMs), powered by the advancements in Large Language Models (LLMs), are charting new frontiers in video understanding. A pivotal challenge is the development of an efficient method to encapsulate video content into a set of representative tokens to align with LLMs. In this work, we introduce Slot-VLM, a novel framework designed to generate semantically decomposed video token…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 16 pages, 10 figures

  9. arXiv:2402.09712  [pdf, other]

    cs.CV cs.AI

    Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement

    Authors: Tao Yang, Cuiling Lan, Yan Lu, Nanning Zheng

    Abstract: Disentangled representation learning strives to extract the intrinsic factors within observed data. Factorizing these representations in an unsupervised manner is notably challenging and usually requires tailored loss functions or specific structural designs. In this paper, we introduce a new perspective and framework, demonstrating that diffusion models with cross-attention can serve as a powerfu…

    Submitted 12 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  10. arXiv:2401.10011  [pdf, other]

    cs.CV

    CPCL: Cross-Modal Prototypical Contrastive Learning for Weakly Supervised Text-based Person Re-Identification

    Authors: Yanwei Zheng, Xinpeng Zhao, Chuanlin Lan, Xiaowei Zhang, Bowen Huang, Jibin Yang, Dongxiao Yu

    Abstract: Weakly supervised text-based person re-identification (TPRe-ID) seeks to retrieve images of a target person using textual descriptions without relying on identity annotations, which makes it more challenging and more practical. The primary challenge is the intra-class differences, encompassing intra-modal feature variations and cross-modal semantic gaps. Prior works have focused on instance-level samples and i…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 9 pages, 6 figures

  11. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  12. arXiv:2312.04931  [pdf, other]

    cs.CV

    Retrieval-based Video Language Model for Efficient Long Video Question Answering

    Authors: Jiaqi Xu, Cuiling Lan, Wenxuan Xie, Xuejin Chen, Yan Lu

    Abstract: The remarkable natural language understanding, reasoning, and generation capabilities of large language models (LLMs) have made them attractive for application to video question answering (Video QA) tasks, utilizing video tokens as contextual input. However, employing LLMs for long video understanding presents significant challenges and remains under-explored. The extensive number of video tokens…

    Submitted 8 December, 2023; originally announced December 2023.

  13. arXiv:2310.02674  [pdf, other]

    cs.CV cs.AI cs.CY cs.MM

    ObjFormer: Learning Land-Cover Changes From Paired OSM Data and Optical High-Resolution Imagery via Object-Guided Transformer

    Authors: Hongruixuan Chen, Cuiling Lan, Jian Song, Clifford Broni-Bediako, Junshi Xia, Naoto Yokoya

    Abstract: Optical high-resolution imagery and OSM data are two important data sources of change detection (CD). Previous related studies focus on utilizing the information in OSM data to aid the CD on optical high-resolution images. This paper pioneers the direct detection of land-cover changes utilizing paired OSM data and optical imagery, thereby expanding the scope of CD tasks. To this end, we propose an…

    Submitted 26 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by IEEE TGRS

  14. arXiv:2308.15512  [pdf, other]

    cs.CV

    Shatter and Gather: Learning Referring Image Segmentation with Text Supervision

    Authors: Dongwon Kim, Namyup Kim, Cuiling Lan, Suha Kwak

    Abstract: Referring image segmentation, the task of segmenting any arbitrary entities described in free-form texts, opens up a variety of vision applications. However, manual labeling of training data for this task is prohibitively costly, leading to a lack of labeled data for training. We address this issue by a weakly supervised learning approach using text descriptions of training images as the only source…

    Submitted 24 October, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023, Project page: https://southflame.github.io/sag/

  15. arXiv:2308.09388  [pdf, other]

    cs.CV

    Diffusion Models for Image Restoration and Enhancement -- A Comprehensive Survey

    Authors: Xin Li, Yulin Ren, Xin Jin, Cuiling Lan, Xingrui Wang, Wenjun Zeng, Xinchao Wang, Zhibo Chen

    Abstract: Image restoration (IR) has been an indispensable and challenging task in the low-level vision field, which strives to improve the subjective quality of images distorted by various forms of degradation. Recently, the diffusion model has achieved significant advancements in the visual generation of AIGC, thereby raising an intuitive question, "whether diffusion model can boost image restoration". To…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: 34 pages

  16. arXiv:2307.14008  [pdf, other]

    cs.CV

    Adaptive Frequency Filters As Efficient Global Token Mixers

    Authors: Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Zheng-Jun Zha, Yan Lu, Baining Guo

    Abstract: Recent vision transformers, large-kernel CNNs and MLPs have attained remarkable successes in broad vision tasks thanks to their effective information fusion in the global scope. However, their efficient deployments, especially on mobile devices, still suffer from noteworthy challenges due to the heavy computational costs of self-attention mechanisms, large kernels, or fully connected layers. In th…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV2023

  17. arXiv:2306.10171  [pdf, other]

    cs.LG cs.AI stat.ML

    Bootstrapped Representations in Reinforcement Learning

    Authors: Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh Agarwal, Marc G. Bellemare, Will Dabney

    Abstract: In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated i…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  18. arXiv:2306.00008  [pdf, other]

    cs.LG cs.CL

    Brainformers: Trading Simplicity for Efficiency

    Authors: Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, Da Huang, Siamak Shakeri, David So, Andrew Dai, Yifeng Lu, Zhifeng Chen, Quoc Le, Claire Cui, James Laudon, Jeff Dean

    Abstract: Transformers are central to recent successes in natural language processing and computer vision. Transformers have a mostly uniform backbone where layers alternate between feed-forward and self-attention in order to build a deep network. Here we investigate this design choice and find that more complex blocks that have different permutations of layer primitives can be more efficient. Using this in…

    Submitted 25 April, 2024; v1 submitted 29 May, 2023; originally announced June 2023.

  19. arXiv:2305.18063  [pdf, other]

    cs.CV

    Vector-based Representation is the Key: A Study on Disentanglement and Compositional Generalization

    Authors: Tao Yang, Yuwang Wang, Cuiling Lan, Yan Lu, Nanning Zheng

    Abstract: Recognizing elementary underlying concepts from observations (disentanglement) and generating novel combinations of these concepts (compositional generalization) are fundamental abilities for humans to support rapid knowledge learning and generalize to new tasks, with which the deep learning models struggle. Towards human-like intelligence, various works on disentangled representation learning hav…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Preprint

  20. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  21. arXiv:2304.12567  [pdf, other]

    cs.LG cs.AI stat.ML

    Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

    Authors: Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare

    Abstract: Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treate…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: ICLR 2023. 22 pages, 8 figures. Code and models are available at https://github.com/google-research/google-research/tree/master/pvn

  22. arXiv:2303.06859  [pdf, other]

    cs.CV cs.MM eess.IV

    Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective

    Authors: Xin Li, Bingchen Li, Xin Jin, Cuiling Lan, Zhibo Chen

    Abstract: In recent years, we have witnessed the great advancement of deep neural networks (DNNs) in image restoration. However, a critical limitation is that they cannot generalize well to real-world degradations with different degrees or types. In this paper, we are the first to propose a novel training strategy for image restoration from the causality perspective, to improve the generalization ability of…

    Submitted 31 March, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR2023

  23. arXiv:2302.14430  [pdf, other]

    cs.CV

    Tracking Fast by Learning Slow: An Event-based Speed Adaptive Hand Tracker Leveraging Knowledge in RGB Domain

    Authors: Chuanlin Lan, Ziyuan Yin, Arindam Basu, Rosa H. M. Chan

    Abstract: 3D hand tracking methods based on monocular RGB videos are easily affected by motion blur, while the event camera, a sensor with high temporal resolution and dynamic range, is naturally suitable for this task with sparse output and low power consumption. However, obtaining 3D annotations of fast-moving hands is difficult for constructing event-based hand-tracking datasets. In this paper, we provided a…

    Submitted 28 February, 2023; originally announced February 2023.

  24. arXiv:2302.11866  [pdf, other]

    cs.NI

    DCNetBench: Scaleable Data Center Network Benchmarking

    Authors: Ke Liu, Wanling Gao, Chunjie Luo, Cheng Huang, Chunxin Lan, Zhenxing Zhang, Lei Wang, Xiwen He, Nan Li, Jianfeng Zhan

    Abstract: Data center networking is the central infrastructure of the modern information society. However, benchmarking it is very challenging as the real-world network traffic is difficult to model, and Internet service giants treat the network traffic as confidential. Several industries have published a few publicly available network traces. However, these traces are collected from specific data center…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: 19 pages, 15 figures

  25. arXiv:2301.08883  [pdf, other]

    cs.LG eess.SP

    Versatile Neural Processes for Learning Implicit Neural Representations

    Authors: Zongyu Guo, Cuiling Lan, Zhizheng Zhang, Yan Lu, Zhibo Chen

    Abstract: Representing a signal as a continuous function parameterized by a neural network (a.k.a. Implicit Neural Representations, INRs) has attracted increasing attention in recent years. Neural Processes (NPs), which model the distributions over functions conditioned on partial observations (context set), provide a practical solution for fast inference of continuous functions. However, existing NP architec…

    Submitted 21 February, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

    Comments: Camera-ready version for ICLR2023

  26. arXiv:2301.01069  [pdf, other]

    eess.IV cs.CV cs.IR

    Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment

    Authors: Liqun Lin, Yang Zheng, Weiling Chen, Chengdong Lan, Tiesong Zhao

    Abstract: Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and…

    Submitted 3 January, 2023; originally announced January 2023.

  27. arXiv:2212.04025  [pdf, other]

    cs.LG cs.AI stat.ML

    A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces

    Authors: Charline Le Lan, Joshua Greaves, Jesse Farebrother, Mark Rowland, Fabian Pedregosa, Rishabh Agarwal, Marc G. Bellemare

    Abstract: Many machine learning problems encode their data as a matrix with a possibly very large number of rows and columns. In several applications like neuroscience, image compression or deep reinforcement learning, the principal subspace of such a matrix provides a useful, low-dimensional representation of individual data. Here, we are interested in determining the $d$-dimensional principal subspace of…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: 8 pages in main content, 2 pages of bibliography and 5 pages in Appendix

  28. arXiv:2212.03319  [pdf, other]

    cs.LG cs.AI

    Understanding Self-Predictive Learning for Reinforcement Learning

    Authors: Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

    Abstract: We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite their recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirabl…

    Submitted 6 December, 2022; originally announced December 2022.

  29. arXiv:2212.02739  [pdf, other]

    cs.CV cs.AI

    Semantic-aware Message Broadcasting for Efficient Unsupervised Domain Adaptation

    Authors: Xin Li, Cuiling Lan, Guoqiang Wei, Zhibo Chen

    Abstract: Vision transformer has demonstrated great potential in abundant vision tasks. However, it also inevitably suffers from poor generalization capability when the distribution shift occurs in testing (i.e., out-of-distribution data). To mitigate this issue, we propose a novel method, Semantic-aware Message Broadcasting (SAMB), which enables more informative and flexible feature alignment for unsupervi…

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: 13 pages, 5 figures

  30. arXiv:2208.04173  [pdf, other]

    cs.CV cs.AI

    SIAD: Self-supervised Image Anomaly Detection System

    Authors: Jiawei Li, Chenxi Lan, Xinyi Zhang, Bolin Jiang, Yuqiu Xie, Naiqi Li, Yan Liu, Yaowei Li, Enze Huo, Bin Chen

    Abstract: Recent trends in AIGC effectively boosted the application of visual inspection. However, most of the available systems work in a human-in-the-loop manner and cannot provide long-term support to the online application. To make a step forward, this paper outlines an automatic annotation system called SsaA, working in a self-supervised learning manner, for continuously making the online visual inspe…

    Submitted 8 October, 2023; v1 submitted 8 August, 2022; originally announced August 2022.

    Comments: 4 pages, 3 figures, ICCV 2023 Demo Track

  31. Unified Normalization for Accelerating and Stabilizing Transformers

    Authors: Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, Shiliang Pu

    Abstract: Solid results from Transformers have made them prevailing architectures in various natural language and vision tasks. As a default component in Transformers, Layer Normalization (LN) normalizes activations within each token to boost the robustness. However, LN requires on-the-fly statistics calculation in inference as well as division and square root operations, leading to inefficiency on hardware…

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: ACM MM'22

  32. arXiv:2205.03599  [pdf, other]

    eess.IV cs.CV

    GAN-Based Multi-View Video Coding with Spatio-Temporal EPI Reconstruction

    Authors: Chengdong Lan, Hao Yan, Cheng Luo, Tiesong Zhao

    Abstract: The introduction of multiple viewpoints in video scenes inevitably increases the bitrates required for storage and transmission. To reduce bitrates, researchers have developed methods to skip intermediate viewpoints during compression and delivery, and ultimately reconstruct them using Side Information (SI). Typically, depth maps are used to construct SI. However, their methods suffer from inaccur…

    Submitted 5 May, 2023; v1 submitted 7 May, 2022; originally announced May 2022.

  33. arXiv:2203.16768  [pdf, other]

    cs.CV cs.AI

    ReSTR: Convolution-free Referring Image Segmentation Using Transformers

    Authors: Namyup Kim, Dongwon Kim, Cuiling Lan, Wenjun Zeng, Suha Kwak

    Abstract: Referring image segmentation is an advanced semantic segmentation task where the target is not a predefined class but is described in natural language. Most existing methods for this task rely heavily on convolutional neural networks, which however have trouble capturing long-range dependencies between entities in the language expression and are not flexible enough for modeling interactions between…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: CVPR 2022 accepted

  34. arXiv:2203.12198  [pdf, other]

    cs.CV

    Deep Frequency Filtering for Domain Generalization

    Authors: Shiqi Lin, Zhizheng Zhang, Zhipeng Huang, Yan Lu, Cuiling Lan, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Amey Parulkar, Viraj Navkal, Zhibo Chen

    Abstract: Improving the generalization ability of Deep Neural Networks (DNNs) is critical for their practical uses, which has been a longstanding challenge. Some theoretical studies have uncovered that DNNs have preferences for some frequency components in the learning process and indicated that this may affect the robustness of learned features. In this paper, we propose Deep Frequency Filtering (DFF) for…

    Submitted 25 March, 2023; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Accepted by CVPR2023

  35. arXiv:2203.06108  [pdf, other]

    cs.CV

    Active Token Mixer

    Authors: Guoqiang Wei, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen

    Abstract: The three existing dominant network families, i.e., CNNs, Transformers, and MLPs, differ from each other mainly in the ways of fusing spatial contextual information, leaving designing more effective token-mixing mechanisms at the core of backbone architecture development. In this work, we propose an innovative token-mixer, dubbed Active Token Mixer (ATM), to actively incorporate flexible contextua…

    Submitted 23 December, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

    Comments: Accepted by AAAI2023

  36. arXiv:2203.00543  [pdf, other]

    cs.LG cs.AI stat.ML

    On the Generalization of Representations in Reinforcement Learning

    Authors: Charline Le Lan, Stephen Tu, Adam Oberman, Rishabh Agarwal, Marc G. Bellemare

    Abstract: In reinforcement learning, state representations are used to tractably deal with large problem spaces. State representations serve both to approximate the value function with few parameters and to generalize to newly encountered states. Their features may be learned implicitly (as part of a neural network) or explicitly (for example, the successor representation of Dayan (1993)…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: Accepted at AISTATS22

  37. arXiv:2201.12096  [pdf, other]

    cs.LG

    Mask-based Latent Reconstruction for Reinforcement Learning

    Authors: Tao Yu, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen

    Abstract: For deep reinforcement learning (RL) from pixels, learning effective state representations is crucial for achieving high performance. However, in practice, limited experience and high-dimensional inputs prevent effective representation learning. To address this, motivated by the success of mask-based modeling in other research fields, we introduce mask-based reconstruction to promote state represe…

    Submitted 9 October, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: Accepted to NeurIPS 2022

  38. arXiv:2112.06632  [pdf, other]

    cs.CV

    Lifelong Unsupervised Domain Adaptive Person Re-identification with Coordinated Anti-forgetting and Adaptation

    Authors: Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Zheng-Jun Zha

    Abstract: Unsupervised domain adaptive person re-identification (ReID) has been extensively investigated to mitigate the adverse effects of domain gaps. Those works assume the target domain data is accessible all at once. However, for the real-world streaming data, this hinders the timely adaptation to changing data statistics and sufficient exploitation of increasing samples. In this paper, to address…

    Submitted 29 March, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted by CVPR2022

  39. arXiv:2111.13420  [pdf, other]

    cs.LG cs.CV

    Confounder Identification-free Causal Visual Feature Learning

    Authors: Xin Li, Zhizheng Zhang, Guoqiang Wei, Cuiling Lan, Wenjun Zeng, Xin Jin, Zhibo Chen

    Abstract: Confounders in deep learning are in general detrimental to a model's generalization where they infiltrate feature representations. Therefore, learning causal features that are free of interference from confounders is important. Most previous causal learning based approaches employ the back-door criterion to mitigate the adverse effect of a specific confounder, which requires the explicit identifica…

    Submitted 9 October, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

    Comments: 21 pages

  40. arXiv:2111.07046  [pdf, other]

    cs.LG

    Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

    Authors: Cheng-Chou Lan

    Abstract: In low-latency or mobile applications, lower computation complexity, lower memory footprint and better energy efficiency are desired. Many prior works address this need by removing redundant parameters. Parameter quantization replaces floating-point arithmetic with lower precision fixed-point arithmetic, further reducing complexity. Typical training of quantized weight neural networks starts fro…

    Submitted 13 November, 2021; originally announced November 2021.

    Comments: 10 pages, 7 figures

  41. arXiv:2111.03993  [pdf, other]

    cs.CV

    Multi-Scale Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition

    Authors: Pengfei Zhang, Cuiling Lan, Wenjun Zeng, Junliang Xing, Jianru Xue, Nanning Zheng

    Abstract: Skeleton data is of low dimension. In recent years, however, there has been a trend of using very deep and complicated feedforward neural networks to model the skeleton sequence without considering complexity. In this paper, a simple yet effective multi-scale semantics-guided neural network (MS-SGN) is proposed for skeleton-based action recognition. We explicitly introduce the high-level semantics of…

    Submitted 6 November, 2021; originally announced November 2021.

  42. arXiv:2110.14994  [pdf, other]

    cs.CV

    Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition

    Authors: Liang Xu, Cuiling Lan, Wenjun Zeng, Cewu Lu

    Abstract: Skeleton data carries valuable motion information and is widely explored in human action recognition. However, not only the motion information but also the interaction with the environment provides discriminative cues to recognize the action of persons. In this paper, we propose a joint learning framework for mutually assisted "interacted object localization" and "human action recognition" based o…

    Submitted 10 May, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: Accepted to the IEEE Transactions on Multimedia 2022

  43. arXiv:2109.14196  [pdf, other]

    cs.CV cs.AI

    WEDGE: Web-Image Assisted Domain Generalization for Semantic Segmentation

    Authors: Namyup Kim, Taeyoung Son, Jaehyun Pahk, Cuiling Lan, Wenjun Zeng, Suha Kwak

    Abstract: Domain generalization for semantic segmentation is highly demanded in real applications, where a trained model is expected to work well in previously unseen domains. One challenge lies in the lack of data which could cover the diverse distributions of the possible unseen domains for training. In this paper, we propose a WEb-image assisted Domain GEneralization (WEDGE) scheme, which is the first to…

    Submitted 2 May, 2023; v1 submitted 29 September, 2021; originally announced September 2021.

  44. Two Souls in an Adversarial Image: Towards Universal Adversarial Example Detection using Multi-view Inconsistency

    Authors: Sohaib Kiani, Sana Awan, Chao Lan, Fengjun Li, Bo Luo

    Abstract: In evasion attacks against deep neural networks (DNN), the attacker generates adversarial instances that are visually indistinguishable from benign samples and sends them to the target DNN to trigger misclassifications. In this paper, we propose a novel multi-view adversarial image detector, namely Argos, based on a novel observation. That is, there exist two "souls" in an adversarial instance…

    Submitted 11 October, 2021; v1 submitted 25 September, 2021; originally announced September 2021.

    Journal ref: Annual Computer Security Applications Conference (ACSAC '21), December 6–10, 2021, Virtual Event, USA

  45. arXiv:2108.00139  [pdf, ps, other]

    cs.CV cs.MM

    Pose-Guided Feature Learning with Knowledge Distillation for Occluded Person Re-Identification

    Authors: Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Jiawei Liu, Zhizheng Zhang, Zheng-Jun Zha

    Abstract: Occluded person re-identification (ReID) aims to match person images with occlusion. It is fundamentally challenging because of the serious occlusion which aggravates the misalignment problem between images. At the cost of incorporating a pose estimator, many works introduce pose information to alleviate the misalignment in both training and testing. To achieve high accuracy while preserving low i…

    Submitted 23 August, 2021; v1 submitted 30 July, 2021; originally announced August 2021.

    Comments: ACM MM 2021

  46. arXiv:2107.12719  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    The CORSMAL benchmark for the prediction of the properties of containers

    Authors: Alessio Xompero, Santiago Donaher, Vladimir Iashin, Francesca Palermo, Gökhan Solak, Claudio Coppola, Reina Ishikawa, Yuichi Nagao, Ryo Hachiuma, Qi Liu, Fan Feng, Chuanlin Lan, Rosa H. M. Chan, Guilherme Christmann, Jyun-Ting Song, Gonuguntla Neeharika, Chinnakotla Krishna Teja Reddy, Dinesh Jain, Bakhtawar Ur Rehman, Andrea Cavallaro

    Abstract: The contactless estimation of the weight of a container and the amount of its content manipulated by a person are key pre-requisites for safe human-to-robot handovers. However, opaqueness and transparencies of the container and the content, and variability of materials, shapes, and sizes, make this estimation difficult. In this paper, we present a range of methods and an open framework to benchmar…

    Submitted 21 April, 2022; v1 submitted 27 July, 2021; originally announced July 2021.

    Comments: Authors' post-print accepted for publication in IEEE Access, see https://doi.org/10.1109/ACCESS.2022.3166906 . 14 pages, 6 tables, 7 figures

    Journal ref: IEEE Access, vol. 10, 2022, 1-15

  47. arXiv:2106.10812  [pdf, other]

    cs.CV

    ToAlign: Task-oriented Alignment for Unsupervised Domain Adaptation

    Authors: Guoqiang Wei, Cuiling Lan, Wenjun Zeng, Zhizheng Zhang, Zhibo Chen

    Abstract: Unsupervised domain adaptive classification intends to improve the classification performance on an unlabeled target domain. To alleviate the adverse effect of domain shift, many approaches align the source and target domains in the feature space. However, a feature is usually taken as a whole for alignment without explicitly making domain alignment proactively serve the classification task, leading to…

    Submitted 26 October, 2021; v1 submitted 20 June, 2021; originally announced June 2021.

    Comments: Accepted to NeurIPS 2021

  48. arXiv:2106.04152  [pdf, other]

    cs.LG

    PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning

    Authors: Tao Yu, Cuiling Lan, Wenjun Zeng, Mingxiao Feng, Zhizheng Zhang, Zhibo Chen

    Abstract: Learning good feature representations is important for deep reinforcement learning (RL). However, with limited experience, RL often suffers from data inefficiency for training. For un-experienced or less-experienced trajectories (i.e., state-action sequences), the lack of data limits the use of them for better feature learning. In this work, we propose a novel method, dubbed PlayVirtual, which aug…

    Submitted 27 October, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted to NeurIPS 2021

  49. arXiv:2103.13917  [pdf, other]

    cs.CV

    Disentanglement-based Cross-Domain Feature Augmentation for Effective Unsupervised Domain Adaptive Person Re-identification

    Authors: Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Quanzeng You, Zicheng Liu, Kecheng Zheng, Zhibo Chen

    Abstract: Unsupervised domain adaptive (UDA) person re-identification (ReID) aims to transfer the knowledge from the labeled source domain to the unlabeled target domain for person matching. One challenge is how to generate target domain samples with reliable labels for training. To address this problem, we propose a Disentanglement-based Cross-Domain Feature Augmentation (DCDFA) strategy, where the augment…

    Submitted 25 March, 2021; originally announced March 2021.

  50. arXiv:2103.13575  [pdf, other]

    cs.CV

    MetaAlign: Coordinating Domain Alignment and Classification for Unsupervised Domain Adaptation

    Authors: Guoqiang Wei, Cuiling Lan, Wenjun Zeng, Zhibo Chen

    Abstract: For unsupervised domain adaptation (UDA), to alleviate the effect of domain shift, many approaches align the source and target domains in the feature space by adversarial learning or by explicitly aligning their statistics. However, the optimization objective of such domain alignment is generally not coordinated with that of the object classification task itself such that their descent directions…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted by CVPR2021