Showing 1–50 of 80 results for author: Alvarez, J M

  1. arXiv:2407.07276  [pdf, other]

    cs.CV cs.AI

    Exploring Camera Encoder Designs for Autonomous Driving Perception

    Authors: Barath Lakshmanan, Joshua Chen, Shiyi Lan, Maying Shen, Zhiding Yu, Jose M. Alvarez

    Abstract: The cornerstone of autonomous vehicles (AV) is a solid perception system, where camera encoders play a crucial role. Existing works usually leverage pre-trained Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) designed for general vision tasks, such as image classification, segmentation, and 2D detection. Although those well-known architectures have achieved state-of-the-art accur…

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2406.06978  [pdf, other]

    cs.CV

    Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

    Authors: Zhenxin Li, Kailin Li, Shihao Wang, Shiyi Lan, Zhiding Yu, Yishen Ji, Zhiqi Li, Ziyue Zhu, Jan Kautz, Zuxuan Wu, Yu-Gang Jiang, Jose M. Alvarez

    Abstract: We propose Hydra-MDP, a novel paradigm employing multiple teachers in a teacher-student model. This approach uses knowledge distillation from both human and rule-based teachers to train the student model, which features a multi-head decoder to learn diverse trajectory candidates tailored to various evaluation metrics. With the knowledge of rule-based teachers, Hydra-MDP learns how the environment…

    Submitted 19 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: The 1st place solution of End-to-end Driving at Scale at the CVPR 2024 Autonomous Grand Challenge

  3. arXiv:2406.04484  [pdf, ps, other]

    cs.CV

    Step Out and Seek Around: On Warm-Start Training with Incremental Data

    Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jose M. Alvarez

    Abstract: Data often arrives in sequence over time in real-world deep learning applications such as autonomous driving. When new training data is available, training the model from scratch undermines the benefit of leveraging the learned knowledge, leading to significant training costs. Warm-starting from a previously trained checkpoint is the most intuitive way to retain knowledge and advance learning. How…

    Submitted 6 June, 2024; originally announced June 2024.

  4. arXiv:2405.18902  [pdf, other]

    cs.LG cs.AI stat.ML

    A Causal Framework for Evaluating Deferring Systems

    Authors: Filippo Palomba, Andrea Pugnana, José Manuel Alvarez, Salvatore Ruggieri

    Abstract: Deferring systems extend supervised Machine Learning (ML) models with the possibility to defer predictions to human experts. However, evaluating the impact of a deferring strategy on system accuracy is still an overlooked area. This paper fills this gap by evaluating deferring systems through a causal lens. We link the potential outcomes framework for causal inference with deferring systems. This…

    Submitted 29 May, 2024; originally announced May 2024.

  5. arXiv:2405.17187  [pdf, other]

    cs.CV cs.AI cs.RO

    Memorize What Matters: Emergent Scene Decomposition from Multitraverse

    Authors: Yiming Li, Zehong Wang, Yue Wang, Zhiding Yu, Zan Gojcic, Marco Pavone, Chen Feng, Jose M. Alvarez

    Abstract: Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitr…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Project page: https://3d-gaussian-mapping.github.io; Code and data: https://github.com/NVlabs/3DGM

  6. arXiv:2405.13693  [pdf, ps, other]

    cs.LG

    Uncovering Algorithmic Discrimination: An Opportunity to Revisit the Comparator

    Authors: Jose M. Alvarez, Salvatore Ruggieri

    Abstract: Causal reasoning, in particular counterfactual reasoning, plays a central role in testing for discrimination. Counterfactual reasoning materializes when testing for discrimination, what is known as the counterfactual model of discrimination, when we compare the discrimination comparator with the discrimination complainant, where the comparator is a similar (or similarly situated) profile to that o…

    Submitted 22 May, 2024; originally announced May 2024.

  7. arXiv:2405.01533  [pdf, other]

    cs.CV

    OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

    Authors: Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez

    Abstract: The advances in multimodal large language models (MLLMs) have led to growing interest in LLM-based autonomous driving agents to leverage their strong reasoning capabilities. However, capitalizing on MLLMs' strong reasoning capabilities for improved planning behavior is challenging since planning requires full 3D situational awareness beyond 2D reasoning. To address this challenge, our work propos…

    Submitted 2 May, 2024; originally announced May 2024.

  8. arXiv:2404.14908  [pdf, other]

    cs.CV

    Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation

    Authors: Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez, Miaomiao Liu

    Abstract: This paper focuses on self-supervised monocular depth estimation in dynamic scenes trained on monocular videos. Existing methods jointly estimate pixel-wise depth and motion, relying mainly on an image reconstruction loss. Dynamic regions remain a critical challenge for these methods due to the inherent ambiguity in depth and motion estimation, resulting in inaccurate depth estimation. This paper…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  9. arXiv:2404.01990  [pdf, other]

    cs.CV

    What is Point Supervision Worth in Video Instance Segmentation?

    Authors: Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, Jose M. Alvarez, Abhinav Shrivastava, Anima Anandkumar

    Abstract: Video instance segmentation (VIS) is a challenging vision task that aims to detect, segment, and track objects in videos. Conventional VIS methods rely on densely-annotated object masks which are expensive. We reduce the human annotations to only one point for each object in a video frame during training, and obtain high-quality mask predictions close to fully supervised models. Our proposed train…

    Submitted 1 April, 2024; originally announced April 2024.

  10. arXiv:2403.09230  [pdf, other]

    cs.CV

    Improving Distant 3D Object Detection Using 2D Box Supervision

    Authors: Zetong Yang, Zhiding Yu, Chris Choy, Renhao Wang, Anima Anandkumar, Jose M. Alvarez

    Abstract: Improving the detection of distant 3D objects is an important yet challenging task. For camera-based 3D perception, the annotation of 3D bounding boxes relies heavily on LiDAR for accurate depth information. As such, the distance of annotation is often limited due to the sparsity of LiDAR points on distant objects, which hampers the capability of existing detectors for long-range scenarios. We address t…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  11. arXiv:2401.13408  [pdf, other]

    cs.AI cs.CY cs.HC

    Causal Perception

    Authors: Jose M. Alvarez, Salvatore Ruggieri

    Abstract: Perception occurs when two individuals interpret the same information differently. Despite being a known phenomenon with implications for bias in decision-making, as individual experience determines interpretation, perception remains largely overlooked in machine learning (ML) research. Modern decision flows, whether partially or fully automated, involve human experts interacting with ML applicati…

    Submitted 22 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.09535 by other authors

  12. arXiv:2401.03844  [pdf, other]

    cs.CV

    Fully Attentional Networks with Self-emerging Token Labeling

    Authors: Bingyin Zhao, Zhiding Yu, Shiyi Lan, Yutao Cheng, Anima Anandkumar, Yingjie Lao, Jose M. Alvarez

    Abstract: Recent studies indicate that Vision Transformers (ViTs) are robust against out-of-distribution scenarios. In particular, the Fully Attentional Network (FAN), a family of ViT backbones, has achieved state-of-the-art robustness. In this paper, we revisit the FAN models and improve their pre-training with a self-emerging token labeling (STL) framework. Our method contains a two-stage training framew…

    Submitted 8 January, 2024; originally announced January 2024.

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 5585-5595

  13. arXiv:2312.03031  [pdf, other]

    cs.CV

    Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

    Authors: Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez

    Abstract: End-to-end autonomous driving recently emerged as a promising research direction to target autonomy from a full-stack perspective. Along this line, many of the latest works follow an open-loop evaluation setting on nuScenes to study the planning behavior. In this paper, we delve deeper into the problem by conducting thorough analyses and demystifying more devils in the details. We initially observ…

    Submitted 2 June, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2024

  14. arXiv:2312.01696  [pdf, other]

    cs.CV

    BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection

    Authors: Zhenxin Li, Shiyi Lan, Jose M. Alvarez, Zuxuan Wu

    Abstract: Recently, the rise of query-based Transformer decoders is reshaping camera-based 3D object detection. These query-based decoders are surpassing the traditional dense BEV (Bird's Eye View)-based methods. However, we argue that dense BEV frameworks remain important due to their outstanding abilities in depth estimation and object localization, depicting 3D scenes accurately and comprehensively. This…

    Submitted 24 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  15. arXiv:2311.14671  [pdf, other]

    cs.CV

    SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

    Authors: Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang

    Abstract: In-context segmentation aims at segmenting novel images using a few labeled example images, termed as "in-context examples", exploring content similarities between examples and the target. The resulting models can be generalized seamlessly to novel segmentation tasks, significantly reducing the labeling and training costs compared with conventional pipelines. However, in-context segmentation is mo…

    Submitted 29 March, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  16. arXiv:2310.19731  [pdf, other]

    cs.CV cs.AI cs.LG

    ViR: Towards Efficient Vision Retention Backbones

    Authors: Ali Hatamizadeh, Michael Ranzinger, Shiyi Lan, Jose M. Alvarez, Sanja Fidler, Jan Kautz

    Abstract: Vision Transformers (ViTs) have attracted a lot of popularity in recent years, due to their exceptional capabilities in modeling long-range spatial dependencies and scalability for large scale training. Although the training parallelism of self-attention mechanism plays an important role in retaining great performance, its quadratic complexity baffles the application of ViTs in many scenarios whic…

    Submitted 26 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Introduction of Vision Retention Networks (ViR) for Efficient Visual Modeling

  17. arXiv:2309.05192  [pdf, other]

    cs.CV

    Towards Viewpoint Robustness in Bird's Eye View Segmentation

    Authors: Tzofi Klinghoffer, Jonah Philion, Wenzheng Chen, Or Litany, Zan Gojcic, Jungseock Joo, Ramesh Raskar, Sanja Fidler, Jose M. Alvarez

    Abstract: Autonomous vehicles (AV) require that neural networks used for perception be robust to different viewpoints if they are to be deployed across many types of vehicles without the repeated cost of data collection and labeling for each. AV companies typically focus on collecting data from diverse scenarios and locations, but not camera rig configurations, due to cost. As a result, only a small number…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: ICCV 2023. Project Page: https://nvlabs.github.io/viewpoint-robustness

  18. arXiv:2308.02236  [pdf, other]

    cs.CV

    FB-BEV: BEV Representation from Forward-Backward View Transformations

    Authors: Zhiqi Li, Zhiding Yu, Wenhai Wang, Anima Anandkumar, Tong Lu, Jose M. Alvarez

    Abstract: View Transformation Module (VTM), where transformations happen between multi-view image features and Bird-Eye-View (BEV) representation, is a crucial step in camera-based BEV perception systems. Currently, the two most prominent VTM paradigms are forward projection and backward projection. Forward projection, represented by Lift-Splat-Shoot, leads to sparsely projected BEV features without post-pr…

    Submitted 17 August, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023, camera-ready version

  19. arXiv:2307.15398  [pdf, other]

    cs.LG cs.CY

    The Initial Screening Order Problem

    Authors: Jose M. Alvarez, Antonio Mastropietro, Salvatore Ruggieri

    Abstract: We investigate the role of the initial screening order (ISO) in candidate screening processes, such as employee hiring and academic admissions. The ISO refers to the order in which the screener evaluates the candidate pool. It has been largely overlooked in the literature, despite its potential impact on the optimality and fairness of the chosen set, especially under a human screener. We define tw…

    Submitted 24 April, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

  20. arXiv:2307.04106  [pdf, other]

    cs.CV

    Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View

    Authors: Jiayu Yang, Enze Xie, Miaomiao Liu, Jose M. Alvarez

    Abstract: Recent vision-only perception models for autonomous driving achieved promising results by encoding multi-view image features into Bird's-Eye-View (BEV) space. A critical step and the main bottleneck of these methods is transforming image features into the BEV coordinate frame. This paper focuses on leveraging geometry information, such as depth, to model such feature transformation. Existing works…

    Submitted 11 July, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

  21. arXiv:2307.01492  [pdf, other]

    cs.CV cs.RO

    FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation

    Authors: Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez

    Abstract: This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, which is held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric Autonomous Driving. Our proposed solution FB-OCC builds upon FB-BEV, a cutting-edge camera-based bird's-eye view perception design using forward-backward projection.…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: Outstanding Champion and Innovation Award in the 3D Occupancy Prediction Challenge (CVPR23)

  22. arXiv:2306.06189  [pdf, other]

    cs.CV cs.AI cs.LG

    FasterViT: Fast Vision Transformers with Hierarchical Attention

    Authors: Ali Hatamizadeh, Greg Heinrich, Hongxu Yin, Andrew Tao, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov

    Abstract: We design a new family of hybrid CNN-ViT neural networks, named FasterViT, with a focus on high image throughput for computer vision (CV) applications. FasterViT combines the benefits of fast local representation learning in CNNs and global modeling properties in ViT. Our newly introduced Hierarchical Attention (HAT) approach decomposes global self-attention with quadratic complexity into a multi-…

    Submitted 1 April, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: ICLR'24 Accepted Paper

  23. GHz sample excitation at the ALBA-PEEM

    Authors: Muhammad Waqas Khaliq, José M. Álvarez, Antonio Camps, Nahikari González, José Ferrer, Ana Martinez-Carboneres, Jordi Prat, Sandra Ruiz-Gómez, Miguel Angel Niño, Ferran Macià, Lucia Aballe, Michael Foerster

    Abstract: We describe a setup that is used for high-frequency electrical sample excitation in a cathode lens electron microscope with the sample stage at high voltage as used in many synchrotron light sources. Electrical signals are transmitted by dedicated high-frequency components to the printed circuit board supporting the sample. Sub-miniature push-on connectors (SMP) are used to realize the connection…

    Submitted 11 May, 2023; originally announced May 2023.

    Journal ref: Ultramicroscopy 2023

  24. Domain Adaptive Decision Trees: Implications for Accuracy and Fairness

    Authors: Jose M. Alvarez, Kristen M. Scott, Salvatore Ruggieri, Bettina Berendt

    Abstract: In uses of pre-trained machine learning models, it is a known issue that the target population in which the model is being deployed may not have been reflected in the source population with which the model was trained. This can result in a biased model when deployed, leading to a reduction in model performance. One risk is that, as the population changes, certain demographic groups will be under-s…

    Submitted 31 May, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: *Both authors contributed equally to this work. Accepted at FAccT '23

    Journal ref: FAccT '23: the 2023 ACM Conference on Fairness, Accountability, and Transparency Chicago IL USA June 12 - 15, 2023

  25. arXiv:2302.12251  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion

    Authors: Yiming Li, Zhiding Yu, Christopher Choy, Chaowei Xiao, Jose M. Alvarez, Sanja Fidler, Chen Feng, Anima Anandkumar

    Abstract: Humans can easily imagine the complete 3D geometry of occluded objects and scenes. This appealing ability is vital for recognition and understanding. To enable such capability in AI systems, we propose VoxFormer, a Transformer-based semantic scene completion framework that can output complete 3D volumetric semantics from only 2D images. Our framework adopts a two-stage design where we start from a…

    Submitted 25 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: CVPR 2023 Highlight (10% of accepted papers, 2.5% of submissions)

  26. arXiv:2302.11944  [pdf, other]

    stat.ML cs.CY cs.LG

    Counterfactual Situation Testing: Uncovering Discrimination under Fairness given the Difference

    Authors: Jose M. Alvarez, Salvatore Ruggieri

    Abstract: We present counterfactual situation testing (CST), a causal data mining framework for detecting discrimination in classifiers. CST aims to answer in an actionable and meaningful way the intuitive question "what would have been the model outcome had the individual, or complainant, been of a different protected status?" It extends the legally-grounded situation testing of Thanh et al. (2011) by oper…

    Submitted 16 October, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Journal ref: Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization; Boston, USA; October 30 - November 1, 2023

  27. arXiv:2301.03992  [pdf, other]

    cs.CV cs.LG cs.MM

    Vision Transformers Are Good Mask Auto-Labelers

    Authors: Shiyi Lan, Xitong Yang, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez, Anima Anandkumar

    Abstract: We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations. MAL takes box-cropped images as inputs and conditionally generates their mask pseudo-labels. We show that Vision Transformers are good mask auto-labelers. Our method significantly reduces the gap between auto-labeling and human annotation regarding…

    Submitted 10 January, 2023; originally announced January 2023.

  28. arXiv:2211.02206  [pdf, other]

    cs.CV

    Soft Masking for Cost-Constrained Channel Pruning

    Authors: Ryan Humble, Maying Shen, Jorge Albericio Latorre, Eric Darve, Jose M. Alvarez

    Abstract: Structured channel pruning has been shown to significantly accelerate inference time for convolutional neural networks (CNNs) on modern hardware, with a relatively minor loss of network accuracy. Recent works permanently zero these channels during training, which we observe to significantly hamper final accuracy, particularly as the fraction of the network being pruned increases. We propose Soft Mas…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: Accepted by ECCV 2022

  29. arXiv:2210.06659  [pdf, other]

    cs.CV

    Structural Pruning via Latency-Saliency Knapsack

    Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jianna Liu, Jose M. Alvarez

    Abstract: Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a global resource allocation optimization problem, aiming at maximizing the accuracy while constraining latency under a predefined budget on the target device. For filter importance ranking, HALP leverages a latency lookup table to tr…

    Submitted 18 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted by NeurIPS 2022. arXiv admin note: substantial text overlap with arXiv:2110.10811

  30. arXiv:2210.01234  [pdf, other]

    cs.LG cs.AI cs.CV

    Optimizing Data Collection for Machine Learning

    Authors: Rafid Mahmood, James Lucas, Jose M. Alvarez, Sanja Fidler, Marc T. Law

    Abstract: Modern deep learning systems require huge data sets to achieve impressive performance, but there is little guidance on how much or what kind of data to collect. Over-collecting data incurs unnecessary present costs, while under-collecting may incur future costs and delay workflows. We propose a new paradigm for modeling the data collection workflow as a formal optimal data collection problem that…

    Submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022

  31. arXiv:2207.01778  [pdf, other]

    cs.CV

    Object-Level Targeted Selection via Deep Template Matching

    Authors: Suraj Kothawade, Donna Roy, Michele Fenzi, Elmar Haussmann, Jose M. Alvarez, Christoph Angerer

    Abstract: Retrieving images with objects that are semantically similar to objects of interest (OOI) in a query image has many practical use cases. A few examples include fixing failures like false negatives/positives of a learned model or mitigating class imbalance in a dataset. The targeted selection task requires finding the relevant data from a large-scale pool of unlabeled data. Manual mining at this sc…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: In Proceedings of the Intelligent Vehicles Symposium, IV 2022

  32. arXiv:2207.01725  [pdf, other]

    cs.CV cs.LG

    How Much More Data Do I Need? Estimating Requirements for Downstream Tasks

    Authors: Rafid Mahmood, James Lucas, David Acuna, Daiqing Li, Jonah Philion, Jose M. Alvarez, Zhiding Yu, Sanja Fidler, Marc T. Law

    Abstract: Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance? This question is of critical importance in applications such as autonomous driving or medical imaging where collecting data is expensive and time-consuming. Overestimating or underestimating data requirements incurs substantial costs that could be avoided with…

    Submitted 13 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted to CVPR 2022

  33. arXiv:2205.14971  [pdf, other]

    cs.CV cs.LG

    Knowledge Distillation for 6D Pose Estimation by Aligning Distributions of Local Predictions

    Authors: Shuxuan Guo, Yinlin Hu, Jose M. Alvarez, Mathieu Salzmann

    Abstract: Knowledge distillation facilitates the training of a compact student network by using a deep teacher one. While this has achieved great success in many tasks, it remains completely unstudied for image-based 6D object pose estimation. In this work, we introduce the first knowledge distillation method driven by the 6D pose estimation task. To this end, we observe that most modern 6D pose estimation…

    Submitted 28 November, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

  34. arXiv:2205.03783  [pdf, other]

    cs.CV

    Non-parametric Depth Distribution Modelling based Depth Inference for Multi-view Stereo

    Authors: Jiayu Yang, Jose M. Alvarez, Miaomiao Liu

    Abstract: Recent cost volume pyramid based deep neural networks have unlocked the potential of efficiently leveraging high-resolution images for depth inference from multi-view stereo. In general, those approaches assume that the depth of each pixel follows a unimodal distribution. Boundary pixels usually follow a multi-modal distribution as they represent different depths; therefore, the assumption results…

    Submitted 8 May, 2022; originally announced May 2022.

    Comments: CVPR 2022

  35. arXiv:2204.12451  [pdf, other]

    cs.CV

    Understanding The Robustness in Vision Transformers

    Authors: Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez

    Abstract: Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against various corruptions. Although this property is partly attributed to the self-attention mechanism, there is still a lack of systematic understanding. In this paper, we examine the role of self-attention in learning robust representations. Our study is motivated by the intriguing properties of the emerging visual gr…

    Submitted 8 November, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

  36. arXiv:2204.05088  [pdf, other]

    cs.CV

    M$^2$BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation

    Authors: Enze Xie, Zhiding Yu, Daquan Zhou, Jonah Philion, Anima Anandkumar, Sanja Fidler, Ping Luo, Jose M. Alvarez

    Abstract: In this paper, we propose M$^2$BEV, a unified framework that jointly performs 3D object detection and map segmentation in the Birds Eye View (BEV) space with multi-camera image inputs. Unlike the majority of previous works which separately process detection and segmentation, M$^2$BEV infers both tasks with a unified model and improves efficiency. M$^2$BEV efficiently transforms multi-view 2D image…

    Submitted 19 April, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Tech Report

  37. arXiv:2203.06640  [pdf]

    stat.AP cs.LG econ.EM

    Measuring anomalies in cigarette sales by using official data from Spanish provinces: Are there only the anomalies detected by the Empty Pack Surveys (EPS) used by Transnational Tobacco Companies (TTCs)?

    Authors: Pedro Cadahia, Antonio A. Golpe, Juan M. Martín Álvarez, E. Asensio

    Abstract: There is literature that questions the veracity of the studies commissioned by the transnational tobacco companies (TTC) to measure the illicit tobacco trade. Furthermore, there are studies that indicate that the Empty Pack Surveys (EPS) ordered by the TTCs are oversized. The novelty of this study is that, in addition to detecting the anomalies analyzed in the EPSs, there are provinces in which ci…

    Submitted 13 March, 2022; originally announced March 2022.

    Journal ref: Cadahia, P., Golpe, A., Martín-Álvarez, J. M., Asensio, E. (2021). Tobacco Induced Diseases, 19(December), 98

  38. arXiv:2202.12181  [pdf, other]

    cs.CV

    FreeSOLO: Learning to Segment Objects without Annotations

    Authors: Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez

    Abstract: Instance segmentation is a fundamental vision task that aims to recognize and segment each object in an image. However, it requires costly annotations such as bounding boxes and segmentation masks for learning. In this work, we propose a fully unsupervised learning method that learns class-agnostic instance segmentation without any annotations. We present FreeSOLO, a self-supervised instance segme…

    Submitted 25 April, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

    Comments: 13 pages. Accepted to IEEE/CVF Conf. Comp. Vision Pattern Recognition (CVPR) 2022

  39. arXiv:2201.11358  [pdf, other]

    cs.LG cs.CY cs.DS stat.ML

    Fairness Implications of Encoding Protected Categorical Attributes

    Authors: Carlos Mougan, Jose M. Alvarez, Salvatore Ruggieri, Steffen Staab

    Abstract: Past research has demonstrated that the explicit use of protected attributes in machine learning can improve both performance and fairness. Many machine learning algorithms, however, cannot directly process categorical attributes, such as country of birth or ethnicity. Because protected attributes frequently are categorical, they must be encoded as features that can be input to a chosen machine le…

    Submitted 5 May, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: AIES'23 6th AAAI/ACM Conference on AI, Ethics, and Society 22 pages

  40. Boosting Supervised Learning Performance with Co-training

    Authors: Xinnan Du, William Zhang, Jose M. Alvarez

    Abstract: Deep learning perception models require a massive amount of labeled training data to achieve good performance. While unlabeled data is easy to acquire, the cost of labeling is prohibitive and could create a tremendous burden on companies or individuals. Recently, self-supervision has emerged as an alternative to leveraging unlabeled data. In this paper, we propose a new light-weight self-supervise…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: 2021 IEEE Intelligent Vehicles Symposium

  41. arXiv:2110.12007  [pdf, other]

    cs.CV cs.LG

    When to Prune? A Policy towards Early Structural Pruning

    Authors: Maying Shen, Pavlo Molchanov, Hongxu Yin, Jose M. Alvarez

    Abstract: Pruning enables appealing reductions in network memory footprint and time complexity. Conventional post-training pruning techniques lean towards efficient inference while overlooking the heavy computation for training. Recent exploration of pre-training pruning at initialization hints at training cost reduction via pruning, but suffers from noticeable performance degradation. We attempt to combine the…

    Submitted 22 October, 2021; originally announced October 2021.

  42. arXiv:2110.10811  [pdf, ps, other]

    cs.CV cs.LG

    HALP: Hardware-Aware Latency Pruning

    Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jianna Liu, Jose M. Alvarez

    Abstract: Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a global resource allocation optimization problem, aiming at maximizing the accuracy while constraining latency under a predefined budget. For filter importance ranking, HALP leverages a latency lookup table to track latency reductio…

    Submitted 20 October, 2021; originally announced October 2021.

  43. arXiv:2109.03814  [pdf, other]

    cs.CV

    Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

    Authors: Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo, Tong Lu

    Abstract: Panoptic segmentation involves a combination of joint semantic segmentation and instance segmentation, where image contents are divided into two types: things and stuff. We present Panoptic SegFormer, a general framework for panoptic segmentation with transformers. It contains three innovative components: an efficient deeply-supervised mask decoder, a query decoupling strategy, and an improved pos…

    Submitted 18 March, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: Accepted to CVPR 2022

  44. arXiv:2107.06304  [pdf, other]

    cs.LG cs.CV

    Privacy Vulnerability of Split Computing to Data-Free Model Inversion Attacks

    Authors: Xin Dong, Hongxu Yin, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov, H. T. Kung

    Abstract: Mobile edge devices see increasing demand for deep neural network (DNN) inference while suffering from stringent constraints in computing resources. Split computing (SC) emerges as a popular approach to the issue by executing only initial layers on devices and offloading the rest to the cloud. Prior works usually assume that SC offers privacy benefits as only intermediate features, instead o…

    Submitted 24 October, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: A new data-free inversion method to reverse neural networks and get input from intermediate feature maps. BMVC'22

  45. arXiv:2106.11921  [pdf, other]

    cs.CV cs.LG

    Not All Labels Are Equal: Rationalizing The Labeling Costs for Training Object Detection

    Authors: Ismail Elezi, Zhiding Yu, Anima Anandkumar, Laura Leal-Taixe, Jose M. Alvarez

    Abstract: Deep neural networks have reached high accuracy on object detection but their success hinges on large amounts of labeled data. To reduce the label dependency, various active learning strategies have been proposed, typically based on the confidence of the detector. However, these methods are biased towards high-performing classes and can lead to acquired datasets that are not good representatives…

    Submitted 29 November, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: Includes supplementary material

  46. arXiv:2106.05209  [pdf, other]

    cs.CV cs.AI cs.LG

    Distilling Image Classifiers in Object Detectors

    Authors: Shuxuan Guo, Jose M. Alvarez, Mathieu Salzmann

    Abstract: Knowledge distillation constitutes a simple yet effective way to improve the performance of a compact student network by exploiting the knowledge of a more powerful teacher. Nevertheless, the knowledge distillation literature remains limited to the scenario where the student and the teacher tackle the same task. Here, we investigate the problem of transferring knowledge not only across architectur…

    Submitted 9 February, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

  47. arXiv:2105.15203  [pdf, other]

    cs.CV cs.LG

    SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

    Authors: Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo

    Abstract: We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel hierarchically structured Transformer encoder which outputs multiscale features. It does not need positional encoding, thereby avoiding the interpolation of posit…

    Submitted 28 October, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: Accepted by NeurIPS 2021

  48. arXiv:2104.07586  [pdf, other]

    cs.LG cs.CV

    See through Gradients: Image Batch Recovery via GradInversion

    Authors: Hongxu Yin, Arun Mallya, Arash Vahdat, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov

    Abstract: Training deep neural networks requires gradient estimation from data batches to update parameters. Gradients per parameter are averaged over a set of data and this has been presumed to be safe for privacy-preserving training in joint, collaborative, and federated learning applications. Prior work only showed the possibility of recovering input data given gradients under very restrictive conditions…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 accepted paper

  49. arXiv:2104.05702  [pdf, other]

    cs.CV cs.LG

    Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection

    Authors: Nadine Chang, Zhiding Yu, Yu-Xiong Wang, Anima Anandkumar, Sanja Fidler, Jose M. Alvarez

    Abstract: Training on datasets with long-tailed distributions has been challenging for major recognition tasks such as classification and detection. To deal with this challenge, image resampling is typically introduced as a simple but effective approach. However, we observe that long-tailed detection differs from classification since multiple classes may be present in one image. As a result, image resamplin…

    Submitted 18 October, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: Accepted to ICML 2021

  50. arXiv:2104.02972  [pdf, other]

    cs.CV

    Self-supervised Learning of Depth Inference for Multi-view Stereo

    Authors: Jiayu Yang, Jose M. Alvarez, Miaomiao Liu

    Abstract: Recent supervised multi-view depth estimation networks have achieved promising results. Similar to all supervised approaches, these networks require ground-truth data during training. However, collecting a large amount of multi-view depth data is very challenging. Here, we propose a self-supervised learning framework for multi-view stereo that exploits pseudo labels from the input data. We start by…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: CVPR 2021