research-article
Open access

Drivable Avatar Clothing: Faithful Full-Body Telepresence with Dynamic Clothing Driven by Sparse RGB-D Input

Published: 11 December 2023
Abstract

    Clothing is an important part of human appearance but challenging to model in photorealistic avatars. In this work we present avatars with dynamically moving loose clothing that can be faithfully driven by sparse RGB-D inputs as well as body and face motion. We propose a Neural Iterative Closest Point (N-ICP) algorithm that can efficiently track the coarse garment shape given sparse depth input. Given the coarse tracking results, the input RGB-D images are then remapped to texel-aligned features, which are fed into the drivable avatar models to faithfully reconstruct appearance details. We evaluate our method against recent image-driven synthesis baselines, and conduct a comprehensive analysis of the N-ICP algorithm. We demonstrate that our method can generalize to a novel testing environment, while preserving the ability to produce high-fidelity and faithful clothing dynamics and appearance.
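    The abstract describes a tracking-then-texturing pipeline: a Neural Iterative Closest Point (N-ICP) step registers a coarse garment shape to sparse depth input, and the RGB-D observations are then remapped into texel-aligned features for the drivable avatar model. As context only, the sketch below shows one Gauss-Newton step of classical point-to-plane ICP in NumPy/SciPy, the family of iterative alignment that N-ICP extends with learned components. The function name, array shapes, and the restriction to a single rigid transform are illustrative assumptions, not the authors' algorithm.

    ```python
    # Illustrative sketch of one classical point-to-plane ICP step (NumPy/SciPy).
    # This is NOT the paper's Neural ICP (N-ICP); it only shows the kind of
    # iterative-closest-point update that N-ICP generalizes for garment tracking.
    import numpy as np
    from scipy.spatial import cKDTree

    def icp_point_to_plane_step(src, tgt, tgt_normals):
        """One Gauss-Newton update of a rigid transform aligning src to tgt.

        src:          (N, 3) source points (e.g. a coarse garment template)
        tgt:          (M, 3) target points (e.g. a sparse depth point cloud)
        tgt_normals:  (M, 3) unit normals of the target points
        Returns a 4x4 incremental rigid transform (small-angle approximation).
        """
        # 1. Correspondences: nearest target point for every source point.
        _, idx = cKDTree(tgt).query(src)
        q, n = tgt[idx], tgt_normals[idx]

        # 2. Linearized point-to-plane residuals:
        #    r_i = n_i . (p_i + w x p_i + t - q_i), unknowns x = [w, t] (6 DoF).
        A = np.hstack([np.cross(src, n), n])       # (N, 6) Jacobian rows
        b = np.einsum('ij,ij->i', n, q - src)      # (N,) residual targets
        x, *_ = np.linalg.lstsq(A, b, rcond=None)

        # 3. Assemble the incremental rigid transform from [w, t].
        w, t = x[:3], x[3:]
        W = np.array([[0.0, -w[2], w[1]],
                      [w[2], 0.0, -w[0]],
                      [-w[1], w[0], 0.0]])
        T = np.eye(4)
        T[:3, :3] = np.eye(3) + W                  # small-angle rotation
        T[:3, 3] = t
        return T
    ```

    In the paper's setting the garment deforms non-rigidly, so a single rigid solve like the one above would not suffice; the sketch is only the classical baseline from which ICP-style garment tracking departs.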

    Supplemental Material

    MP4 File: supplementary video "Aug17-voice.mp4".
    PDF File: supplementary document "supp.pdf".



        Published In

        SA '23: SIGGRAPH Asia 2023 Conference Papers
        December 2023
        1113 pages
        ISBN:9798400703157
        DOI:10.1145/3610548
        This work is licensed under a Creative Commons Attribution International 4.0 License.


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 11 December 2023


        Author Tags

        1. Telepresence
        2. clothing capture
        3. photorealistic avatars

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Data Availability

        Supplementary video "Aug17-voice.mp4" and supplementary document "supp.pdf": https://dl.acm.org/doi/10.1145/3610548.3618136#supp.pdf

        Conference

        SA '23
        Sponsor:
        SA '23: SIGGRAPH Asia 2023
        December 12 - 15, 2023
        Sydney, NSW, Australia

        Acceptance Rates

        Overall Acceptance Rate 178 of 869 submissions, 20%
