Showing 1–50 of 116 results for author: Yin, P

  1. arXiv:2407.10956  [pdf, other]

    cs.AI cs.CL

    Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

    Authors: Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu

    Abstract: Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivit…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 34 pages, 14 figures, 10 tables

  2. arXiv:2407.09522  [pdf, other]

    cs.DB cs.AI cs.LG stat.ML

    UQE: A Query Engine for Unstructured Databases

    Authors: Hanjun Dai, Bethany Yixin Wang, Xingchen Wan, Bo Dai, Sherry Yang, Azade Nova, Pengcheng Yin, Phitchaya Mangpo Phothilimthana, Charles Sutton, Dale Schuurmans

    Abstract: Analytics on structured data is a mature field with many successful methods. However, most real world data exists in unstructured form, such as images and conversations. We investigate the potential of Large Language Models (LLMs) to enable unstructured data analytics. In particular, we propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data…

    Submitted 23 June, 2024; originally announced July 2024.

  3. arXiv:2407.03347  [pdf, other]

    math.NA cs.LG math-ph

    Chebyshev Spectral Neural Networks for Solving Partial Differential Equations

    Authors: Pengsong Yin, Shuo Ling, Wenjun Ying

    Abstract: The purpose of this study is to utilize the Chebyshev spectral method neural network (CSNN) model to solve differential equations. This approach employs a single-layer neural network wherein Chebyshev spectral methods are used to construct neurons satisfying boundary conditions. The study uses a feedforward neural network model and error backpropagation principles, utilizing automatic differentiati…

    Submitted 6 June, 2024; originally announced July 2024.

  4. arXiv:2407.01013  [pdf, other]

    cs.RO

    Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization

    Authors: Ruofei Bai, Shenghai Yuan, Hongliang Guo, Pengyu Yin, Wei-Yun Yau, Lihua Xie

    Abstract: This paper considers the collaborative graph exploration problem in GPS-denied environments, where a group of robots are required to cover a graph environment while maintaining reliable pose estimations in collaborative simultaneous localization and mapping (SLAM). Considering both objectives presents challenges for multi-robot pathfinding, as it involves the expensive covariance inference for SLA…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 9 pages, 13 figures, accepted by IEEE/RSJ IROS(2024)

  5. arXiv:2406.00800  [pdf, other]

    cs.LG cs.AI

    MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization

    Authors: Aozhong Zhang, Naigang Wang, Yanxia Deng, Xin Li, Zi Yang, Penghang Yin

    Abstract: In this paper, we present a simple optimization-based preprocessing technique called Weight Magnitude Reduction (MagR) to improve the performance of post-training quantization. For each linear layer, we adjust the pre-trained floating-point weights by solving an $\ell_\infty$-regularized optimization problem. This process greatly diminishes the maximum magnitude of the weights and smooths out outl…

    Submitted 2 June, 2024; originally announced June 2024.

  6. arXiv:2405.12514  [pdf]

    cs.HC cs.AI

    Future You: A Conversation with an AI-Generated Future Self Reduces Anxiety, Negative Emotions, and Increases Future Self-Continuity

    Authors: Pat Pataranutaporn, Kavin Winson, Peggy Yin, Auttasak Lapapirojn, Pichayoot Ouppaphan, Monchai Lertsutthiwong, Pattie Maes, Hal Hershfield

    Abstract: We introduce "Future You," an interactive, brief, single-session, digital chat intervention designed to improve future self-continuity--the degree of connection an individual feels with a temporally distant future self--a characteristic that is positively related to mental health and wellbeing. Our system allows users to chat with a relatable yet AI-powered virtual version of their future selves t…

    Submitted 9 July, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  7. arXiv:2405.04812  [pdf, other]

    cs.RO cs.CV

    General Place Recognition Survey: Towards Real-World Autonomy

    Authors: Peng Yin, Jianhao Jiao, Shiqi Zhao, Lingyun Xu, Guoquan Huang, Howie Choset, Sebastian Scherer, Jianda Han

    Abstract: In the realm of robotics, the quest for achieving real-world autonomy, capable of executing large-scale and long-term operations, has positioned place recognition (PR) as a cornerstone technology. Despite the PR community's remarkable strides over the past two decades, garnering attention from fields like computer vision and robotics, the development of PR methods that sufficiently support real-wo…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 20 pages, 12 figures, under review

  8. arXiv:2404.14662  [pdf, other]

    cs.LG cs.CL cs.PL cs.SE

    NExT: Teaching Large Language Models to Reason about Code Execution

    Authors: Ansong Ni, Miltiadis Allamanis, Arman Cohan, Yinlin Deng, Kensen Shi, Charles Sutton, Pengcheng Yin

    Abstract: A fundamental skill among human developers is the ability to understand and reason about program execution. As an example, a programmer can mentally simulate code execution in natural language to debug and repair code (a.k.a. rubber duck debugging). However, large language models (LLMs) of code are typically trained on the surface textual form of programs, thus may lack a semantic understanding of h…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 35 pages

  9. arXiv:2404.12308  [pdf, other]

    cs.RO cs.LG eess.SY

    ASID: Active Exploration for System Identification in Robotic Manipulation

    Authors: Marius Memmel, Andrew Wagenmaker, Chuning Zhu, Patrick Yin, Dieter Fox, Abhishek Gupta

    Abstract: Model-free control strategies such as reinforcement learning have shown the ability to learn control strategies without requiring an accurate model or simulator of the world. While this is appealing due to the lack of modeling requirements, such methods can be sample inefficient, making them impractical in many real-world domains. On the other hand, model-based control techniques leveraging accura…

    Submitted 26 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Project website at https://weirdlabuw.github.io/asid

  10. arXiv:2403.14734  [pdf, other]

    cs.SE cs.AI cs.CL cs.PL

    A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond

    Authors: Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, Xiaoli Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu

    Abstract: Neural Code Intelligence -- leveraging deep learning to understand, generate, and optimize code -- holds immense potential for transformative impacts on the whole society. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both research communities over the past few years. This survey presents a systematic and chronol…

    Submitted 23 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: 64 pages, 6 figures, 10 tables, 692 references

  11. arXiv:2403.12945  [pdf, other]

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  12. arXiv:2403.11496  [pdf, other]

    cs.RO cs.AI

    MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception

    Authors: Thien-Minh Nguyen, Shenghai Yuan, Thien Hoang Nguyen, Pengyu Yin, Haozhi Cao, Lihua Xie, Maciej Wozniak, Patric Jensfelt, Marko Thiel, Justin Ziegenbein, Noel Blunder

    Abstract: Perception plays a crucial role in various robot applications. However, existing well-annotated datasets are biased towards autonomous driving scenarios, while unlabelled SLAM datasets are quickly over-fitted, and often lack environment and domain variations. To expand the frontier of these fields, we introduce a comprehensive dataset named MCD (Multi-Campus Dataset), featuring a wide range of sen…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

  13. arXiv:2403.07134  [pdf, other]

    cs.LG cs.CV

    COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization

    Authors: Aozhong Zhang, Zi Yang, Naigang Wang, Yingyong Qin, Jack Xin, Xin Li, Penghang Yin

    Abstract: Post-training quantization (PTQ) has emerged as a practical approach to compress large neural networks, making them highly efficient for deployment. However, effectively reducing these models to their low-bit counterparts without compromising the original accuracy remains a key challenge. In this paper, we propose an innovative PTQ algorithm termed COMQ, which sequentially conducts coordinate-wise…

    Submitted 4 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  14. arXiv:2403.06461  [pdf, other]

    cs.CV

    Reliable Spatial-Temporal Voxels For Multi-Modal Test-Time Adaptation

    Authors: Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Xingyu Ji, Shenghai Yuan, Lihua Xie

    Abstract: Multi-modal test-time adaptation (MM-TTA) is proposed to adapt models to an unlabeled target domain by leveraging the complementary multi-modal inputs in an online manner. Previous MM-TTA methods rely on predictions of cross-modal information in each input frame, while they ignore the fact that predictions of geometric neighborhoods within consecutive frames are highly correlated, leading to unsta…

    Submitted 15 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  15. arXiv:2403.05124  [pdf, other]

    cs.CV

    CLIP-Gaze: Towards General Gaze Estimation via Visual-Linguistic Model

    Authors: Pengwei Yin, Guanzhong Zeng, Jingjing Wang, Di Xie

    Abstract: Gaze estimation methods often experience significant performance degradation when evaluated across different domains, due to the domain gap between the testing and training data. Existing methods try to address this issue using various domain generalization approaches, but with little success because of the limited diversity of gaze datasets, such as appearance, wearable, and image quality. To ove…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted to AAAI 2024

  16. arXiv:2402.08699  [pdf, other]

    cs.SE cs.LG

    Unsupervised Evaluation of Code LLMs with Round-Trip Correctness

    Authors: Miltiadis Allamanis, Sheena Panthaplackel, Pengcheng Yin

    Abstract: To evaluate code large language models (LLMs), research has relied on a few small manually curated benchmarks, such as HumanEval and MBPP, which represent a narrow part of the real-world software domains. In this work, we introduce round-trip correctness (RTC) as an alternative evaluation method. RTC allows Code LLM evaluation on a broader spectrum of real-world software domains without the need f…

    Submitted 27 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Published in ICML 2024

  17. arXiv:2402.08073  [pdf, other]

    cs.LG cs.PL cs.SE

    Grounding Data Science Code Generation with Input-Output Specifications

    Authors: Yeming Wen, Pengcheng Yin, Kensen Shi, Henryk Michalewski, Swarat Chaudhuri, Alex Polozov

    Abstract: Large language models (LLMs) have recently demonstrated a remarkable ability to generate code from natural language (NL) prompts. However, in the real world, NL is often too ambiguous to capture the true intent behind programming problems, requiring additional input-output (I/O) specifications. Unfortunately, LLMs can have difficulty aligning their outputs with both the NL prompt and the I/O speci…

    Submitted 14 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  18. arXiv:2401.11140  [pdf, other]

    cs.CV cs.AI

    Stability Plasticity Decoupled Fine-tuning For Few-shot end-to-end Object Detection

    Authors: Yuantao Yin, Ping Yin

    Abstract: Few-shot object detection (FSOD) aims to design methods to adapt object detectors efficiently with only a few annotated samples. Fine-tuning has been shown to be an effective and practical approach. However, previous works often take the classical base-novel two-stage fine-tuning procedure but ignore the implicit stability-plasticity contradiction among different modules. Specifically, the random re-…

    Submitted 20 January, 2024; originally announced January 2024.

  19. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  20. arXiv:2312.10365  [pdf, other]

    cs.DC cs.AI

    SPT: Fine-Tuning Transformer-based Language Models Efficiently with Sparsification

    Authors: Yuntao Gui, Xiao Yan, Peiqi Yin, Han Yang, James Cheng

    Abstract: Transformer-based large language models (e.g., BERT and GPT) achieve great success, and fine-tuning, which tunes a pre-trained model on a task-specific dataset, is the standard practice to utilize these models for downstream tasks. However, Transformer fine-tuning has a long running time and high memory consumption due to the large size of the models. We propose the SPT system to fine-tune Transform…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: First submitted to VLDB on November 1, 2023; rejection received on December 15, 2023

  21. arXiv:2311.17311  [pdf, other]

    cs.CL cs.AI

    Universal Self-Consistency for Large Language Model Generation

    Authors: Xinyun Chen, Renat Aksitov, Uri Alon, Jie Ren, Kefan Xiao, Pengcheng Yin, Sushant Prakash, Charles Sutton, Xuezhi Wang, Denny Zhou

    Abstract: Self-consistency with chain-of-thought prompting (CoT) has demonstrated remarkable performance gains on various challenging tasks, by utilizing multiple reasoning paths sampled from large language models (LLMs). However, self-consistency relies on the answer extraction process to aggregate multiple solutions, which is not applicable to free-form answers. In this work, we propose Universal Self-Con…

    Submitted 28 November, 2023; originally announced November 2023.

  22. arXiv:2311.16450  [pdf, other]

    cs.CV cs.AI

    Typhoon Intensity Prediction with Vision Transformer

    Authors: Huanxin Chen, Pengshuai Yin, Huichou Huang, Qingyao Wu, Ruirui Liu, Xiatian Zhu

    Abstract: Predicting typhoon intensity accurately across space and time is crucial for issuing timely disaster warnings and facilitating emergency response. This has vast potential for minimizing life losses and property damages as well as reducing economic and environmental impacts. Leveraging satellite imagery for scenario analysis is effective but also introduces additional challenges due to the complex…

    Submitted 4 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 8 pages, 2 figures, accepted by Tackling Climate Change with Machine Learning: workshop at NeurIPS 2023

  23. arXiv:2311.02883  [pdf, other]

    cs.CL

    SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data

    Authors: Ruoxi Sun, Sercan Ö. Arik, Rajarishi Sinha, Hootan Nakhost, Hanjun Dai, Pengcheng Yin, Tomas Pfister

    Abstract: Text-to-SQL aims to automate the process of generating SQL queries on a database from natural language text. In this work, we propose "SQLPrompt", tailored to improve the few-shot prompting capabilities of Text-to-SQL for Large Language Models (LLMs). Our methods include innovative prompt design, execution-based consistency decoding strategy which selects the SQL with the most consistent execution…

    Submitted 6 November, 2023; originally announced November 2023.

  24. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  25. arXiv:2309.17446  [pdf, other]

    cs.CL cs.LG cs.PL cs.SE

    L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

    Authors: Ansong Ni, Pengcheng Yin, Yilun Zhao, Martin Riddell, Troy Feng, Rui Shen, Stephen Yin, Ye Liu, Semih Yavuz, Caiming Xiong, Shafiq Joty, Yingbo Zhou, Dragomir Radev, Arman Cohan

    Abstract: Recently, large language models (LLMs), especially those that are pretrained on code, have demonstrated strong capabilities in generating programs from natural language inputs in a few-shot or even zero-shot manner. Despite promising results, there is a notable lack of a comprehensive evaluation of these models' language-to-code generation capabilities. Existing studies often focus on specific task…

    Submitted 2 October, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Project Website: https://l2c-eval.github.io/

  26. arXiv:2309.11839  [pdf, other]

    cs.CV cs.RO

    MoPA: Multi-Modal Prior Aided Domain Adaptation for 3D Semantic Segmentation

    Authors: Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Shenghai Yuan, Lihua Xie

    Abstract: Multi-modal unsupervised domain adaptation (MM-UDA) for 3D semantic segmentation is a practical solution to embed semantic understanding in autonomous systems without expensive point-wise annotations. While previous MM-UDA methods can achieve overall improvement, they suffer from significant class-imbalanced performance, restricting their adoption in real applications. This imbalanced performance…

    Submitted 21 September, 2023; originally announced September 2023.

  27. arXiv:2309.08914  [pdf, other]

    cs.RO

    Outram: One-shot Global Localization via Triangulated Scene Graph and Global Outlier Pruning

    Authors: Pengyu Yin, Haozhi Cao, Thien-Minh Nguyen, Shenghai Yuan, Shuyang Zhang, Kangcheng Liu, Lihua Xie

    Abstract: One-shot LiDAR localization refers to the ability to estimate the robot pose from one single point cloud, which yields significant advantages in initialization and relocalization processes. In the point cloud domain, the topic has been extensively studied as a global descriptor retrieval (i.e., loop closure detection) and pose refinement (i.e., point cloud registration) problem both in isolation o…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: 8 pages, 5 figures

  28. arXiv:2308.13922  [pdf, other]

    cs.AR

    An Efficient FPGA-Based Accelerator for Swin Transformer

    Authors: Zhiyang Liu, Pengyu Yin, Zhenhua Ren

    Abstract: Since its introduction, Swin Transformer has achieved remarkable results in the field of computer vision, sparking the need for dedicated hardware accelerators that specifically cater to edge computing demands. Owing to their flexibility and low power consumption, FPGAs have been widely employed to accelerate the inference of convolutional neural networks (CNNs) and show potential in Transforme…

    Submitted 26 August, 2023; originally announced August 2023.

  29. arXiv:2307.13883  [pdf, other]

    cs.LG cs.PL

    ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis

    Authors: Kensen Shi, Joey Hong, Yinlin Deng, Pengcheng Yin, Manzil Zaheer, Charles Sutton

    Abstract: When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, we can measure whether they compositionally generalize, that is, whether a model that has been trained on the simpler subtasks is subsequently able to solve more co…

    Submitted 6 May, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: ICLR 2024

  30. arXiv:2306.17436  [pdf, other]

    cs.RO

    LIO-GVM: an Accurate, Tightly-Coupled Lidar-Inertial Odometry with Gaussian Voxel Map

    Authors: Xingyu Ji, Shenghai Yuan, Pengyu Yin, Lihua Xie

    Abstract: This letter presents an accurate and robust Lidar Inertial Odometry framework. We fuse LiDAR scans with IMU data using a tightly-coupled iterative error state Kalman filter for robust and fast localization. To achieve robust correspondence matching, we represent the points as a set of Gaussian distributions and evaluate the divergence in variance for outlier rejection. Based on the fitted distribu…

    Submitted 6 May, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

  31. arXiv:2306.03346  [pdf, other]

    cs.LG cs.AI

    Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data

    Authors: Chongyi Zheng, Benjamin Eysenbach, Homer Walke, Patrick Yin, Kuan Fang, Ruslan Salakhutdinov, Sergey Levine

    Abstract: Robotic systems that rely primarily on self-supervised learning have the potential to decrease the amount of human annotation and engineering effort required to learn control strategies. In the same way that prior robotic systems have leveraged self-supervised techniques from computer vision (CV) and natural language processing (NLP), our work builds on prior work showing that the reinforcement le…

    Submitted 25 February, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: ICLR 2024 Spotlight (< 5%). Website (https://chongyi-zheng.github.io/stable_contrastive_rl) and code (https://github.com/chongyi-zheng/stable_contrastive_rl)

  32. arXiv:2306.00739  [pdf, other]

    cs.CL cs.AI cs.DB

    SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended)

    Authors: Ruoxi Sun, Sercan Ö. Arik, Alex Muzio, Lesly Miculicich, Satya Gundabathula, Pengcheng Yin, Hanjun Dai, Hootan Nakhost, Rajarishi Sinha, Zifeng Wang, Tomas Pfister

    Abstract: Text-to-SQL, the process of translating natural language into Structured Query Language (SQL), represents a transformative application of large language models (LLMs), potentially revolutionizing how humans interact with data. This paper introduces the SQL-PaLM framework, a comprehensive solution for understanding and enhancing Text-to-SQL using LLMs in the learning regimes of few-shot prom…

    Submitted 30 March, 2024; v1 submitted 26 May, 2023; originally announced June 2023.

  33. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  34. Incorporating Experts' Judgment into Machine Learning Models

    Authors: Hogun Park, Aly Megahed, Peifeng Yin, Yuya Ong, Pravar Mahajan, Pei Guo

    Abstract: Machine learning (ML) models have been quite successful in predicting outcomes in many applications. However, in some cases, domain experts might have a judgment about the expected outcome that might conflict with the prediction of ML models. One main reason for this is that the training data might not be totally representative of the population. In this paper, we present a novel framework that ai…

    Submitted 29 April, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: Accepted to Expert Systems with Applications Journal, 2023

  35. arXiv:2303.10457  [pdf, other]

    cs.CV cs.RO

    Multi-Modal Continual Test-Time Adaptation for 3D Semantic Segmentation

    Authors: Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Shenghai Yuan, Lihua Xie

    Abstract: Continual Test-Time Adaptation (CTTA) generalizes conventional Test-Time Adaptation (TTA) by assuming that the target domain is dynamic over time rather than stationary. In this paper, we explore Multi-Modal Continual Test-Time Adaptation (MM-CTTA) as a new extension of CTTA for 3D semantic segmentation. The key to MM-CTTA is to adaptively attend to the reliable modality while avoiding catastrophi…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: 15 pages, 6 tables, 7 figures

  36. arXiv:2302.10899  [pdf, other]

    cs.LG cs.AI cs.IT math.NA

    Feature Affinity Assisted Knowledge Distillation and Quantization of Deep Neural Networks on Label-Free Data

    Authors: Zhijian Li, Biao Yang, Penghang Yin, Yingyong Qi, Jack Xin

    Abstract: In this paper, we propose a feature affinity (FA) assisted knowledge distillation (KD) method to improve quantization-aware training of deep neural networks (DNN). The FA loss on intermediate feature maps of DNNs plays the role of teaching middle steps of a solution to a student instead of only giving final answers in the conventional KD where the loss acts on the network logits at the output leve…

    Submitted 18 August, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

  37. arXiv:2301.08857  [pdf, other]

    cs.RO

    CoBigICP: Robust and Precise Point Set Registration using Correntropy Metrics and Bidirectional Correspondence

    Authors: Pengyu Yin, Di Wang, Shaoyi Du, Shihui Ying, Yue Gao, Nanning Zheng

    Abstract: In this paper, we propose a novel probabilistic variant of iterative closest point (ICP) dubbed CoBigICP. The method leverages both local geometrical information and global noise characteristics. Locally, the 3D structures of both target and source clouds are incorporated into the objective function through bidirectional correspondence. Globally, the error metric of correntropy is introduced as nois…

    Submitted 20 January, 2023; originally announced January 2023.

    Comments: 6 pages, 4 figures. Accepted to IROS2020

  38. arXiv:2301.07425  [pdf, other]

    cs.RO

    Segregator: Global Point Cloud Registration with Semantic and Geometric Cues

    Authors: Pengyu Yin, Shenghai Yuan, Haozhi Cao, Xingyu Ji, Shuyang Zhang, Lihua Xie

    Abstract: This paper presents Segregator, a global point cloud registration framework that exploits both semantic information and geometric distribution to efficiently build up outlier-robust correspondences and search for inliers. Current state-of-the-art algorithms rely on point features to set up putative correspondences and refine them by employing pair-wise distance consistency checks. However, such a…

    Submitted 28 February, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: 6 pages, 5 figures. Accepted to ICRA2023

  39. arXiv:2212.14710  [pdf, other]

    cs.CV

    NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation

    Authors: Pengwei Yin, Jiawu Dai, Jingjing Wang, Di Xie, Shiliang Pu

    Abstract: Gaze estimation is the fundamental basis for many visual tasks. Yet, the high cost of acquiring gaze datasets with 3D annotations hinders the optimization and application of gaze estimation models. In this work, we propose a novel Head-Eye redirection parametric model based on Neural Radiance Field, which allows dense gaze data generation with view consistency and accurate gaze direction. Moreover…

    Submitted 30 December, 2022; originally announced December 2022.

    Comments: 10 pages, 8 figures, submitted to CVPR 2023

  40. arXiv:2212.09248  [pdf, other]

    cs.CL cs.SE

    Natural Language to Code Generation in Interactive Data Science Notebooks

    Authors: Pengcheng Yin, Wen-Ding Li, Kefan Xiao, Abhishek Rao, Yeming Wen, Kensen Shi, Joshua Howland, Paige Bailey, Michele Catasta, Henryk Michalewski, Alex Polozov, Charles Sutton

    Abstract: Computational notebooks, such as Jupyter notebooks, are interactive computing environments that are ubiquitous among data scientists to perform data wrangling and analytic tasks. To measure the performance of AI pair programmers that automatically synthesize programs for those tasks given natural language (NL) intents from users, we build ARCADE, a benchmark of 1082 code generation problems using… ▽ More

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: 46 pages, 32 figures

  41. arXiv:2211.15082  [pdf, other

    cs.LG

    DGI: Easy and Efficient Inference for GNNs

    Authors: Peiqi Yin, Xiao Yan, Jinjing Zhou, Qiang Fu, Zhenkun Cai, James Cheng, Bo Tang, Minjie Wang

    Abstract: While many systems have been developed to train Graph Neural Networks (GNNs), efficient model inference and evaluation remain to be addressed. For instance, using the widely adopted node-wise approach, model evaluation can account for up to 94% of the time in the end-to-end training process due to neighbor explosion, which means that a node accesses its multi-hop neighbors. On the other hand, laye… ▽ More

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: 10 pages, 10 figures

  42. arXiv:2210.06601  [pdf, other

    cs.RO cs.AI cs.LG

    Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks

    Authors: Kuan Fang, Patrick Yin, Ashvin Nair, Homer Walke, Gengchen Yan, Sergey Levine

    Abstract: The utilization of broad datasets has proven to be crucial for generalization for a wide range of fields. However, how to effectively make use of diverse multi-task data for novel downstream tasks still remains a grand challenge in robotics. To tackle this challenge, we introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement lear… ▽ More

    Submitted 18 April, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: CoRL 2022

  43. arXiv:2209.14265  [pdf, other

    cs.CV

    360FusionNeRF: Panoramic Neural Radiance Fields with Joint Guidance

    Authors: Shreyas Kulkarni, Peng Yin, Sebastian Scherer

    Abstract: We present a method to synthesize novel views from a single $360^\circ$ panorama image based on the neural radiance field (NeRF). Prior studies in a similar setting rely on the neighborhood interpolation capability of multi-layer perceptrons to complete missing regions caused by occlusion, which leads to artifacts in their predictions. We propose 360FusionNeRF, a semi-supervised learning framework… ▽ More

    Submitted 3 October, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: 8 pages, Fig 3, Submitted to IEEE RAL. arXiv admin note: text overlap with arXiv:2106.10859, arXiv:2104.00677, arXiv:2203.09957, arXiv:2204.00928 by other authors

  44. arXiv:2209.10848  [pdf, other

    cs.SD cs.AI eess.AS

    MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline

    Authors: Yifan Hu, Pengkai Yin, Rui Liu, Feilong Bao, Guanglai Gao

    Abstract: This paper introduces a high-quality open-source text-to-speech (TTS) synthesis dataset for Mongolian, a low-resource language spoken by over 10 million people worldwide. The dataset, named MnTTS, consists of about 8 hours of transcribed audio recordings spoken by a 22-year-old professional female Mongolian announcer. It is the first publicly available dataset developed to promote Mongolian TTS ap… ▽ More

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: Accepted at the 2022 International Conference on Asian Language Processing (IALP2022)

  45. arXiv:2209.10775  [pdf, other

    cs.RO cs.AI

    MUI-TARE: Multi-Agent Cooperative Exploration with Unknown Initial Position

    Authors: Jingtian Yan, Xingqiao Lin, Zhongqiang Ren, Shiqi Zhao, Jieqiong Yu, Chao Cao, Peng Yin, Ji Zhang, Sebastian Scherer

    Abstract: Multi-agent exploration of a bounded 3D environment with unknown initial positions of agents is a challenging problem. It requires quickly exploring the environments as well as robustly merging the sub-maps built by the agents. We take the view that the existing approaches are either aggressive or conservative: Aggressive strategies merge two sub-maps built by different agents together when overla… ▽ More

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: 8 pages, 8 figures, Submitted to IEEE RAL

  46. arXiv:2209.06376  [pdf, other

    cs.CV cs.RO

    iSimLoc: Visual Global Localization for Previously Unseen Environments with Simulated Images

    Authors: Peng Yin, Ivan Cisneros, Ji Zhang, Howie Choset, Sebastian Scherer

    Abstract: The visual camera is an attractive device in beyond visual line of sight (B-VLOS) drone operation, since it is low in size, weight, power, and cost, and can provide a redundant modality in case of GPS failure. However, state-of-the-art visual localization algorithms are unable to match visual data that have a significantly different appearance due to illuminations or viewpoints. This paper presents iSim… ▽ More

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 17 pages, 16 figures. Conditionally accepted by IEEE Transactions on Robotics

  47. arXiv:2209.04497  [pdf, other

    cs.RO cs.CV

    General Place Recognition Survey: Towards the Real-world Autonomy Age

    Authors: Peng Yin, Shiqi Zhao, Ivan Cisneros, Abulikemu Abuduweili, Guoquan Huang, Micheal Milford, Changliu Liu, Howie Choset, Sebastian Scherer

    Abstract: Place recognition is the fundamental module that can assist Simultaneous Localization and Mapping (SLAM) in loop-closure detection and re-localization for long-term navigation. The place recognition community has made astonishing progress over the last $20$ years, and this has attracted widespread research interest and application in multiple fields such as computer vision and robotics. However, f… ▽ More

    Submitted 9 September, 2022; originally announced September 2022.

    Comments: 20 pages, 10 figures. Submitted to IEEE T-RO survey paper

  48. arXiv:2208.14543  [pdf, other

    cs.RO cs.CV cs.LG

    BioSLAM: A Bio-inspired Lifelong Memory System for General Place Recognition

    Authors: Peng Yin, Abulikemu Abuduweili, Shiqi Zhao, Changliu Liu, Sebastian Scherer

    Abstract: We present BioSLAM, a lifelong SLAM framework for learning various new appearances incrementally and maintaining accurate place recognition for previously visited areas. Unlike humans, artificial neural networks suffer from catastrophic forgetting and may forget the previously visited areas when trained with new arrivals. For humans, researchers discover that there exists a memory replay mechanism… ▽ More

    Submitted 30 August, 2022; originally announced August 2022.

    Comments: 19 pages, 18 figures, submitted to IEEE T-RO

  49. Bridging the gap between target-based and cell-based drug discovery with a graph generative multi-task model

    Authors: Fan Hu, Dongqi Wang, Huazhen Huang, Yishen Hu, Peng Yin

    Abstract: Drug discovery is vitally important for protecting humans against disease. Target-based screening has been one of the most popular methods for developing new drugs in the past several decades. This method efficiently screens candidate drugs inhibiting target protein in vitro, but it often fails due to inadequate activity of the selected drugs in vivo. Accurate computational methods are needed to bridge this… ▽ More

    Submitted 8 August, 2022; originally announced August 2022.

    Journal ref: Journal of Chemical Information and Modeling, 2022

  50. arXiv:2207.12317  [pdf, other

    cs.CV cs.RO

    ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization

    Authors: Ivan Cisneros, Peng Yin, Ji Zhang, Howie Choset, Sebastian Scherer

    Abstract: We present the ALTO dataset, a vision-focused dataset for the development and benchmarking of Visual Place Recognition and Localization methods for Unmanned Aerial Vehicles. The dataset is composed of two long (approximately 150km and 260km) trajectories flown by a helicopter over Ohio and Pennsylvania, and it includes high precision GPS-INS ground truth location data, high precision accelerometer… ▽ More

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: UAV Localization dataset paper