Showing 1–50 of 337 results for author: Abbeel, P

  1. arXiv:2407.12282  [pdf, other]

    cs.LG cs.AI cs.AR

    Chip Placement with Diffusion

    Authors: Vint Lee, Chun Deng, Leena Elzeiny, Pieter Abbeel, John Wawrzynek

    Abstract: Macro placement is a vital step in digital circuit design that defines the physical location of large collections of components, known as macros, on a 2-dimensional chip. The physical layout obtained during placement determines key performance metrics of the chip, such as power consumption, area, and performance. Existing learning-based methods typically fall short because of their reliance on rei…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2406.07398  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Visual Representation Learning with Stochastic Frame Prediction

    Authors: Huiwon Jang, Dongyoung Kim, Junsu Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo

    Abstract: Self-supervised learning of image representations by predicting future frames is a promising direction but still remains a challenge. This is because of the under-determined nature of frame prediction; multiple potential futures can arise from a single current frame. To tackle this challenge, in this paper, we revisit the idea of stochastic video generation that learns to capture uncertainty in fr…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: International Conference on Machine Learning (ICML) 2024

  3. arXiv:2405.04798  [pdf, other]

    cs.RO cs.AI

    From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control

    Authors: Yide Shentu, Philipp Wu, Aravind Rajeswaran, Pieter Abbeel

    Abstract: Hierarchical control for robotics has long been plagued by the need to have a well defined interface layer to communicate between high-level task planners and low-level policies. With the advent of LLMs, language has been emerging as a prospective interface layer. However, this has several limitations. Not all tasks can be decomposed into steps that are easily expressible in natural language (e.g.…

    Submitted 8 July, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  4. arXiv:2403.10506  [pdf, other]

    cs.RO cs.AI cs.LG

    HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation

    Authors: Carmelo Sferrazza, Dun-Ming Huang, Xingyu Lin, Youngwoon Lee, Pieter Abbeel

    Abstract: Humanoid robots hold great promise in assisting humans in diverse environments and tasks, due to their flexibility and adaptability leveraging human-like morphology. However, research in humanoid robots is often bottlenecked by the costly and fragile hardware setups. To accelerate algorithmic research in humanoid robots, we present a high-dimensional, simulated robot learning benchmark, HumanoidBe…

    Submitted 18 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  5. arXiv:2403.04114  [pdf, other]

    cs.RO cs.CV cs.LG

    Closing the Visual Sim-to-Real Gap with Object-Composable NeRFs

    Authors: Nikhil Mishra, Maximilian Sieb, Pieter Abbeel, Xi Chen

    Abstract: Deep learning methods for perception are the cornerstone of many robotic systems. Despite their potential for impressive performance, obtaining real-world training data is expensive, and can be impractically difficult for some tasks. Sim-to-real transfer with domain randomization offers a potential workaround, but often requires extensive manual tuning and results in models that are brittle to dis…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: ICRA 2024

  6. arXiv:2403.03174  [pdf, other]

    cs.RO cs.AI

    MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting

    Authors: Fangchen Liu, Kuan Fang, Pieter Abbeel, Sergey Levine

    Abstract: Open-vocabulary generalization requires robotic systems to perform tasks involving complex and diverse environments and task goals. While the recent advances in vision language models (VLMs) present unprecedented opportunities to solve unseen problems, how to utilize their emergent capabilities to control robots in the physical world remains an open question. In this paper, we present MOKA (Markin…

    Submitted 5 March, 2024; originally announced March 2024.

  7. arXiv:2403.02338  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Twisting Lids Off with Two Hands

    Authors: Toru Lin, Zhao-Heng Yin, Haozhi Qi, Pieter Abbeel, Jitendra Malik

    Abstract: Manipulating objects with two multi-fingered hands has been a long-standing challenge in robotics, attributed to the contact-rich nature of many manipulation tasks and the complexity inherent in coordinating a high-dimensional bimanual system. In this work, we consider the problem of twisting lids of various bottle-like objects with two hands, and demonstrate that policies trained in simulation us…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Project page can be found at https://toruowo.github.io/bimanual-twist

  8. arXiv:2402.17139  [pdf, other]

    cs.CV cs.AI

    Video as the New Language for Real-World Decision Making

    Authors: Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

    Abstract: Both text and video data are abundant on the internet and support large-scale self-supervised learning through next token or frame prediction. However, they have not been equally leveraged: language models have had significant real-world impact, whereas video generation has remained largely limited to media entertainment. Yet video data captures important information about the physical world that…

    Submitted 26 February, 2024; originally announced February 2024.

  9. arXiv:2402.17135  [pdf, other]

    cs.LG cs.AI

    Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings

    Authors: Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine

    Abstract: Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner? In this work, we present a functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of any arbitrary tasks by encoding their sta…

    Submitted 26 February, 2024; originally announced February 2024.

  10. arXiv:2402.10260  [pdf, other]

    cs.LG cs.CL cs.CR

    A StrongREJECT for Empty Jailbreaks

    Authors: Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

    Abstract: The rise of large language models (LLMs) has drawn attention to the existence of "jailbreaks" that allow the models to be used maliciously. However, there is no standard benchmark for measuring the severity of a jailbreak, leaving authors of jailbreak papers to create their own. We show that these benchmarks often include vague or unanswerable questions and use grading criteria that are biased tow…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Code and data at https://github.com/alexandrasouly/strongreject

  11. arXiv:2402.08268  [pdf, other]

    cs.LG

    World Model on Million-Length Video And Language With Blockwise RingAttention

    Authors: Hao Liu, Wilson Yan, Matei Zaharia, Pieter Abbeel

    Abstract: Current language models fall short in understanding aspects of the world not easily described in words, and struggle with complex, long-form tasks. Video sequences offer valuable temporal information absent in language and static images, making them attractive for joint modeling with language. Such models could develop an understanding of both human textual knowledge and the physical world, enablin…

    Submitted 14 March, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  12. arXiv:2401.16889  [pdf, other]

    cs.RO cs.AI eess.SY

    Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control

    Authors: Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

    Abstract: This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a n…

    Submitted 30 January, 2024; originally announced January 2024.

  13. arXiv:2401.08553  [pdf, other]

    cs.RO

    FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning

    Authors: Jianlan Luo, Charles Xu, Fangchen Liu, Liam Tan, Zipeng Lin, Jeffrey Wu, Pieter Abbeel, Sergey Levine

    Abstract: In this paper, we propose a real-world benchmark for studying robotic learning in the context of functional manipulation: a robot needs to accomplish complex long-horizon behaviors by composing individual manipulation skills in functionally relevant ways. The core design principles of our Functional Manipulation Benchmark (FMB) emphasize a harmonious balance between complexity and accessibility. T…

    Submitted 16 January, 2024; originally announced January 2024.

  14. arXiv:2401.05442  [pdf, other]

    cs.LG cs.AI

    Functional Graphical Models: Structure Enables Offline Data-Driven Optimization

    Authors: Jakub Grudzien Kuba, Masatoshi Uehara, Pieter Abbeel, Sergey Levine

    Abstract: While machine learning models are typically trained to solve prediction problems, we might often want to use them for optimization problems. For example, given a dataset of proteins and their corresponding fluorescence levels, we might want to optimize for a new protein with the highest possible fluorescence. This kind of data-driven optimization (DDO) presents a range of challenges beyond those i…

    Submitted 11 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  15. arXiv:2401.00025  [pdf, other]

    cs.RO cs.CV

    Any-point Trajectory Modeling for Policy Learning

    Authors: Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel

    Abstract: Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning. However, the high cost of collecting demonstration data is a significant bottleneck. Videos, as a rich data source, contain knowledge of behaviors, physics, and semantics, but extracting control-specific information from them is challenging due to the…

    Submitted 12 July, 2024; v1 submitted 28 December, 2023; originally announced January 2024.

    Comments: 18 pages, 15 figures

  16. arXiv:2312.11752  [pdf, other]

    cs.LG cs.AI

    Learning a Diffusion Model Policy from Rewards via Q-Score Matching

    Authors: Michael Psenka, Alejandro Escontrela, Pieter Abbeel, Yi Ma

    Abstract: Diffusion models have become a popular choice for representing actor policies in behavior cloning and offline reinforcement learning. This is due to their natural ability to optimize an expressive class of distributions over a continuous space. However, previous works fail to exploit the score-based structure of diffusion models, and instead utilize a simple behavior cloning term to train the acto…

    Submitted 16 July, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: ICML 2024. 20 pages, 9 figures

  17. arXiv:2311.18827  [pdf, other]

    cs.GR cs.AI cs.CV cs.LG cs.MM

    Motion-Conditioned Image Animation for Video Editing

    Authors: Wilson Yan, Andrew Brown, Pieter Abbeel, Rohit Girdhar, Samaneh Azadi

    Abstract: We introduce MoCA, a Motion-Conditioned Image Animation approach for video editing. It leverages a simple decomposition of the video editing problem into image editing followed by motion-conditioned image animation. Furthermore, given the lack of robust evaluation datasets for video editing, we introduce a new benchmark that measures edit capability across a wide variety of tasks, such as object r…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: Project page: https://facebookresearch.github.io/MoCA

  18. arXiv:2311.09235  [pdf, other]

    cs.LG cs.AI

    Scalable Diffusion for Materials Generation

    Authors: Sherry Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk

    Abstract: Generative models trained on internet-scale data are capable of generating novel and realistic texts, images, and videos. A natural next question is whether these models can advance science, for example by generating novel stable materials. Traditionally, models with explicit structures (e.g., graphs) have been used in modeling structural relationships in scientific data (e.g., atoms and bonds in…

    Submitted 3 June, 2024; v1 submitted 18 October, 2023; originally announced November 2023.

    Comments: https://unified-materials.github.io/

  19. arXiv:2311.02194  [pdf, other]

    cs.LG cs.AI

    AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation

    Authors: Daiki E. Matsunaga, Jongmin Lee, Jaeseok Yoon, Stefanos Leonardos, Pieter Abbeel, Kee-Eung Kim

    Abstract: One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy. This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation. This challenge is amplified in the offline Multi-Agent RL (MARL) s…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: 31 pages, 12 figures, Accepted at NeurIPS 2023

  20. arXiv:2311.01450  [pdf, other]

    cs.LG cs.AI cs.RO

    DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing

    Authors: Vint Lee, Pieter Abbeel, Youngwoon Lee

    Abstract: Model-based reinforcement learning (MBRL) has gained much attention for its ability to learn complex behaviors in a sample-efficient way: planning actions by generating imaginary trajectories with predicted rewards. Despite its success, we found that surprisingly, reward prediction is often a bottleneck of MBRL, especially for sparse rewards that are challenging (or even ambiguous) to predict. Mot…

    Submitted 17 February, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: For code and website, see https://vint-1.github.io/dreamsmooth/

  21. arXiv:2311.01011  [pdf, other]

    cs.LG cs.CR

    Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

    Authors: Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell

    Abstract: While Large Language Models (LLMs) are increasingly being used in real-world applications, they remain vulnerable to prompt injection attacks: malicious third party prompts that subvert the intent of the system designer. To help researchers study this problem, we present a dataset of over 126,000 prompt injection attacks and 46,000 prompt-based "defenses" against prompt injection, all created by p…

    Submitted 2 November, 2023; originally announced November 2023.

  22. arXiv:2311.00924  [pdf, other]

    cs.RO cs.AI

    The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning

    Authors: Carmelo Sferrazza, Younggyo Seo, Hao Liu, Youngwoon Lee, Pieter Abbeel

    Abstract: Humans rely on the synergy of their senses for most essential tasks. For tasks requiring object manipulation, we seamlessly and effectively exploit the complementarity of our senses of vision and touch. This paper draws inspiration from such capabilities and aims to find a systematic approach to fuse visual and tactile information in a reinforcement learning setting. We propose Masked Multimodal L…

    Submitted 1 November, 2023; originally announced November 2023.

  23. arXiv:2310.17688  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Managing extreme AI risks amid rapid progress

    Authors: Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann

    Abstract: Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although rese…

    Submitted 22 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Published in Science: https://www.science.org/doi/10.1126/science.adn0117

  24. arXiv:2310.10645  [pdf, other]

    cs.RO cs.AI cs.CL cs.HC

    Interactive Task Planning with Language Models

    Authors: Boyi Li, Philipp Wu, Pieter Abbeel, Jitendra Malik

    Abstract: An interactive robot framework accomplishes long-horizon task planning and can easily generalize to new goals or distinct tasks, even during execution. However, most traditional methods require predefined module design, which makes it hard to generalize to different goals. Recent large language model based approaches can allow for more open-ended planning but often require heavy prompt engineering…

    Submitted 16 October, 2023; originally announced October 2023.

  25. arXiv:2310.10625  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Video Language Planning

    Authors: Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson

    Abstract: We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data. To this end, we present video language planning (VLP), an algorithm that consists of a tree search procedure, where we train (i) vision-language models to serve as both policies and value…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: https://video-language-planning.github.io/

  26. arXiv:2310.08899  [pdf, other]

    cs.CL

    Exploration with Principles for Diverse AI Supervision

    Authors: Hao Liu, Matei Zaharia, Pieter Abbeel

    Abstract: Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI. While this generative AI approach has produced impressive results, it heavily leans on human supervision. Even state-of-the-art AI models like ChatGPT depend on fine-tuning through human demonstrations, demanding extensive human input and domain expertise. This strong reliance on human over…

    Submitted 23 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

  27. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  28. arXiv:2310.06114  [pdf, other]

    cs.AI

    Learning Interactive Real-World Simulators

    Authors: Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel

    Abstract: Generative models trained on internet data have revolutionized how text, image, and video content can be created. Perhaps the next milestone for generative models is to simulate realistic experience in response to actions taken by humans, robots, and other interactive agents. Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied a…

    Submitted 12 January, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: https://universal-simulator.github.io

  29. arXiv:2310.02635  [pdf, other]

    cs.RO cs.AI cs.LG

    Foundation Reinforcement Learning: towards Embodied Generalist Agents with Foundation Prior Assistance

    Authors: Weirui Ye, Yunsheng Zhang, Mengchen Wang, Shengjie Wang, Xianfan Gu, Pieter Abbeel, Yang Gao

    Abstract: Recently, people have shown that large-scale pre-training from internet-scale data is the key to building generalist models, as witnessed in NLP. To build embodied generalist agents, we and many other researchers hypothesize that such foundation prior is also an indispensable component. However, it is unclear what is the proper concrete form to represent those embodied foundation priors and how th…

    Submitted 10 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

  30. arXiv:2310.01889  [pdf, other]

    cs.CL

    Ring Attention with Blockwise Transformers for Near-Infinite Context

    Authors: Hao Liu, Matei Zaharia, Pieter Abbeel

    Abstract: Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands imposed by Transformers limit their ability to handle long sequences, thereby posing challenges in utilizing videos, actions, and other long-form sequences and modalities in complex environments. We prese…

    Submitted 27 November, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Code: https://github.com/lhao499/llm_large_context

  31. arXiv:2309.13942  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

    Authors: Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu

    Abstract: This work aims to improve unsupervised audio-visual pre-training. Inspired by the efficacy of data augmentation in visual contrastive learning, we propose a novel speed co-augmentation method that randomly changes the playback speeds of both audio and video data. Despite its simplicity, the speed co-augmentation method possesses two compelling attributes: (1) it increases the diversity of audio-vi…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Published at the CVPR 2023 Sight and Sound workshop

  32. arXiv:2309.13037  [pdf, other]

    cs.RO

    GELLO: A General, Low-Cost, and Intuitive Teleoperation Framework for Robot Manipulators

    Authors: Philipp Wu, Yide Shentu, Zhongke Yi, Xingyu Lin, Pieter Abbeel

    Abstract: Humans can teleoperate robots to accomplish complex manipulation tasks. Imitation learning has emerged as a powerful framework that leverages human teleoperated demonstrations to teach robots new skills. However, the performance of the learned policies is bottlenecked by the quality, scale, and variety of the demonstration data. In this paper, we aim to lower the barrier to collecting large and hi…

    Submitted 18 July, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

  33. arXiv:2308.16893  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Language-Conditioned Path Planning

    Authors: Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James

    Abstract: Contact is at the core of robotic manipulation. At times, it is desired (e.g. manipulation and grasping), and at times, it is harmful (e.g. when avoiding obstacles). However, traditional path planning algorithms focus solely on collision-free paths, limiting their applicability in contact-rich tasks. To address this limitation, we propose the domain of Language-Conditioned Path Planning, where con…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: Conference on Robot Learning, 2023

  34. arXiv:2308.12270  [pdf, other]

    cs.LG cs.AI

    Language Reward Modulation for Pretraining Reinforcement Learning

    Authors: Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel

    Abstract: Using learned reward functions (LRFs) as a means to solve sparse-reward reinforcement learning (RL) tasks has yielded some steady progress in task-complexity through the years. In this work, we question whether today's LRFs are best-suited as a direct replacement for task rewards. Instead, we propose leveraging the capabilities of LRFs as a pretraining signal for RL. Concretely, we propose…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Code available at https://github.com/ademiadeniji/lamp

  35. arXiv:2308.06036  [pdf, ps, other]

    cs.RO eess.SY

    The Impact of Overall Optimization on Warehouse Automation

    Authors: Hiroshi Yoshitake, Pieter Abbeel

    Abstract: In this study, we propose a novel approach for investigating optimization performance by flexible robot coordination in automated warehouses with multi-agent reinforcement learning (MARL)-based control. Automated systems using robots are expected to achieve efficient operations compared with manual systems in terms of overall optimization performance. However, the impact of overall optimization on…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: 8 pages, 4 figures, accepted at International Conference on Intelligent Robots and Systems (IROS2023)

  36. arXiv:2308.01399  [pdf, other]

    cs.CL cs.AI cs.LG

    Learning to Model the World with Language

    Authors: Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, Anca Dragan

    Abstract: To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world. While current agents can learn to execute simple language instructions, we aim to build agents that leverage diverse language -- language like "this button turns on the TV" or "I put the bowls away" -- that conveys general knowledge, describes the state o…

    Submitted 31 May, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: ICML 2024. Website: https://dynalang.github.io/

  37. arXiv:2308.00091  [pdf, other]

    cs.RO cs.CV cs.LG

    Convolutional Occupancy Models for Dense Packing of Complex, Novel Objects

    Authors: Nikhil Mishra, Pieter Abbeel, Xi Chen, Maximilian Sieb

    Abstract: Dense packing in pick-and-place systems is an important feature in many warehouse and logistics applications. Prior work in this space has largely focused on planning algorithms in simulation, but real-world packing performance is often bottlenecked by the difficulty of perceiving 3D object geometry in highly occluded, partially observed scenes. In this work, we present a fully-convolutional shape…

    Submitted 31 July, 2023; originally announced August 2023.

    Comments: In IROS 2023. Code and dataset are available at https://sites.google.com/view/fcon-packing/

  38. arXiv:2307.03567  [pdf, other]

    cs.RO cs.CV

    SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks

    Authors: Xingyu Lin, John So, Sashwat Mahalingam, Fangchen Liu, Pieter Abbeel

    Abstract: The existing internet-scale image and video datasets cover a wide range of everyday objects and tasks, bringing the potential of learning policies that generalize in diverse scenarios. Prior works have explored visual pre-training with different self-supervised objectives. Still, the generalization capabilities of the learned policies and the advantages over well-tuned baselines remain unclear fro…

    Submitted 21 October, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

  39. arXiv:2306.12554  [pdf, other]

    cs.LG cs.AI

    Improving Long-Horizon Imitation Through Instruction Prediction

    Authors: Joey Hejna, Pieter Abbeel, Lerrel Pinto

    Abstract: Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. Difficulties in such settings are exacerbated in low data regimes where over-fitting stifles generalization and compounding errors hurt accuracy. In this work, we explore the use of an often unused source of auxiliary supervision: language. Inspired by recent advances in transformer-based m…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: Published at AAAI 2023

  40. arXiv:2306.10190  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    ALP: Action-Aware Embodied Learning for Perception

    Authors: Xinran Liang, Anthony Han, Wilson Yan, Aditi Raghunathan, Pieter Abbeel

    Abstract: Current methods in training and benchmarking vision models exhibit an over-reliance on passive, curated datasets. Although models trained on these datasets have shown strong performance in a wide variety of tasks such as classification, detection, and segmentation, they fundamentally are unable to generalize to an ever-evolving world due to constant out-of-distribution shifts of input data. Theref…

    Submitted 17 October, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: project website available at https://xinranliang.github.io/alp/

  41. arXiv:2306.01872  [pdf, other]

    cs.AI

    Probabilistic Adaptation of Text-to-Video Models

    Authors: Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel

    Abstract: Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions. However, adapting these models to tasks with limited domain-specific data, such as animation or robotics videos, poses a significant computational challenge, since finetuning a pretrained large model can be prohibitively expens…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Project website https://video-adapter.github.io/. First two authors contributed equally

  42. arXiv:2306.00942  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Train Offline, Test Online: A Real Robot Learning Benchmark

    Authors: Gaoyue Zhou, Victoria Dean, Mohan Kumar Srirama, Aravind Rajeswaran, Jyothish Pari, Kyle Hatch, Aryan Jain, Tianhe Yu, Pieter Abbeel, Lerrel Pinto, Chelsea Finn, Abhinav Gupta

    Abstract: Three challenges limit the progress of robot learning research: robots are expensive (few labs can participate), everyone uses different robots (findings do not generalize across labs), and we lack internet-scale robotics data. We take on these challenges via a new benchmark: Train Offline, Test Online (TOTO). TOTO provides remote users with access to shared robotic hardware for evaluating methods…

    Submitted 30 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted to ICRA 2023

  43. arXiv:2305.19476  [pdf, other]

    cs.LG cs.AI

    Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration

    Authors: Dongyoung Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo

    Abstract: A promising technique for exploration is to maximize the entropy of visited state distribution, i.e., state entropy, by encouraging uniform coverage of visited state space. While it has been effective for an unsupervised setup, it tends to struggle in a supervised setup with a task reward, where an agent prefers to visit high-value states to exploit the task reward. Such a preference can cause an…

    Submitted 30 June, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023. Project webpage: https://sites.google.com/view/rl-vcse

  44. arXiv:2305.19370  [pdf, other]

    cs.CL cs.LG

    Blockwise Parallel Transformer for Large Context Models

    Authors: Hao Liu, Pieter Abbeel

    Abstract: Transformers have emerged as the cornerstone of state-of-the-art natural language processing models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands posed by the self-attention mechanism and the large feedforward network in Transformers limit their ability to handle long sequences, thereby creating challenges for tasks involving multiple long…

    Submitted 28 August, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

  45. arXiv:2305.16554  [pdf, other]

    cs.LG

    Emergent Agentic Transformer from Chain of Hindsight Experience

    Authors: Hao Liu, Pieter Abbeel

    Abstract: Large transformer models powered by diverse data and model scale have dominated natural language modeling and computer vision and pushed the frontier of multiple AI areas. In reinforcement learning (RL), despite many efforts on transformer-based policies, a key limitation is that current transformer-based policies cannot learn by directly combining information from multiple sub-optimal…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: International Conference on Machine Learning (ICML) 2023

  46. arXiv:2305.16381  [pdf, other]

    cs.LG cs.CV

    DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

    Authors: Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee

    Abstract: Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. Even though relatively simple approaches (e.g., rejection sampling based on reward scores) have been investigated, fine-tuning text-to-image models with the rewa…

    Submitted 1 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  47. arXiv:2305.15717  [pdf, other]

    cs.CL

    The False Promise of Imitating Proprietary LLMs

    Authors: Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song

    Abstract: An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model, such as a proprietary system like ChatGPT (e.g., Alpaca, Self-Instruct, and others). This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model. In this work, we critically analyze this approach. We first finetune a series of LMs that i…

    Submitted 25 May, 2023; originally announced May 2023.

  48. arXiv:2305.14343  [pdf, other]

    cs.LG cs.AI cs.CV

    Video Prediction Models as Rewards for Reinforcement Learning

    Authors: Alejandro Escontrela, Ademi Adeniji, Wilson Yan, Ajay Jain, Xue Bin Peng, Ken Goldberg, Youngwoon Lee, Danijar Hafner, Pieter Abbeel

    Abstract: Specifying reward signals that allow agents to learn complex behaviors is a long-standing challenge in reinforcement learning. A promising approach is to extract preferences for behaviors from unlabeled videos, which are widely available on the internet. We present Video Prediction Rewards (VIPER), an algorithm that leverages pretrained video prediction models as action-free reward signals for rei…

    Submitted 30 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 22 pages, 18 figures, 4 tables. Under review

  49. arXiv:2305.06305  [pdf, other]

    cs.CV cs.AI cs.RO

    Self-Supervised Instance Segmentation by Grasping

    Authors: YuXuan Liu, Xi Chen, Pieter Abbeel

    Abstract: Instance segmentation is a fundamental skill for many robotic applications. We propose a self-supervised method that uses grasp interactions to collect segmentation supervision for an instance segmentation model. When a robot grasps an item, the mask of that grasped item can be inferred from the images of the scene before and after the grasp. Leveraging this insight, we learn a grasp segmentation…

    Submitted 10 May, 2023; originally announced May 2023.

  50. arXiv:2305.02968  [pdf, other]

    cs.LG cs.AI

    Masked Trajectory Models for Prediction, Representation, and Control

    Authors: Philipp Wu, Arjun Majumdar, Kevin Stone, Yixin Lin, Igor Mordatch, Pieter Abbeel, Aravind Rajeswaran

    Abstract: We introduce Masked Trajectory Models (MTM) as a generic abstraction for sequential decision making. MTM takes a trajectory, such as a state-action sequence, and aims to reconstruct the trajectory conditioned on random subsets of the same trajectory. By training with a highly randomized masking pattern, MTM learns versatile networks that can take on different roles or capabilities, by simply choos…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted for publication at ICML 2023. Project webpage: https://wuphilipp.github.io/mtm/