Showing 1–50 of 334 results for author: Chai, J

  1. arXiv:2407.07035  [pdf, other]

    cs.CL cs.CV

    Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

    Authors: Yue Zhang, Ziqiao Ma, Jialu Li, Yanyuan Qiao, Zun Wang, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi

    Abstract: Vision-and-Language Navigation (VLN) has gained increasing attention over recent years and many approaches have emerged to advance its development. The remarkable achievements of foundation models have shaped the challenges and proposed methods for VLN research. In this survey, we provide a top-down review that adopts a principled framework for embodied planning and reasoning, and emphasizes the…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Authors contributed equally to this work, and supervisors contributed equal advising to this work

  2. arXiv:2407.06192  [pdf, other]

    cs.CV cs.AI cs.CL

    Multi-Object Hallucination in Vision-Language Models

    Authors: Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai

    Abstract: Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather than individual entities, this work systematically investigates multi-object hallucination, examining how models misperceive (e.g., invent nonexistent o…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to ALVR @ ACL 2024 | Project page: https://multi-object-hallucination.github.io/

  3. arXiv:2406.17044  [pdf, other]

    quant-ph

    Fault-tolerant embedding of quantum circuits on hardware architectures via swap gates

    Authors: Shao-Hen Chiew, Ezequiel Ignacio Rodriguez Chiacchio, Vishal Sharma, Jing Hao Chai, Hui Khoon Ng

    Abstract: In near-term quantum computing devices, connectivity between qubits remains limited by architectural constraints. A computational circuit with given connectivity requirements necessary for multi-qubit gates has to be embedded within physical hardware with fixed connectivity. Long-distance gates have to be done by first routing the relevant qubits together. The simplest routing strategy involves th…

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.15478  [pdf]

    cond-mat.mtrl-sci physics.app-ph

    Impact of the Top SiO2 Interlayer Thickness on Memory Window of Si Channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) Gate Structure

    Authors: Tao Hu, Xianzhou Shao, Mingkai Bai, Xinpei Jia, Saifei Dai, Xiaoqing Sun, Runhao Han, Jia Yang, Xiaoyu Ke, Fengbin Tian, Shuai Yang, Junshuai Chai, Hao Xu, Xiaolei Wang, Wenwu Wang, Tianchun Ye

    Abstract: We study the impact of top SiO2 interlayer thickness on the memory window (MW) of Si channel ferroelectric field-effect transistor (FeFET) with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) gate structure. We find that the MW increases with the increasing thickness of the top SiO2 interlayer, and such an increase exhibits a two-stage linear dependence. The physical origin is the presence of the different…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 6 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2404.15825

  5. arXiv:2406.10630  [pdf, other]

    cs.CL cs.AI cs.CR cs.MA

    Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models

    Authors: Rui Ye, Jingyi Chai, Xiangrui Liu, Yaodong Yang, Yanfeng Wang, Siheng Chen

    Abstract: Federated learning (FL) enables multiple parties to collaboratively fine-tune a large language model (LLM) without the need for direct data sharing. Ideally, by training on decentralized data that is aligned with human preferences and safety principles, federated instruction tuning can result in an LLM that could behave in a helpful and safe manner. In this paper, we for the first time reveal the…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 18 pages

  6. arXiv:2406.09264  [pdf, other]

    cs.HC cs.AI cs.CL

    Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

    Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

    Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve th…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 56 pages

  7. arXiv:2406.05132  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

    Authors: Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David F. Fouhey, Joyce Chai

    Abstract: The integration of language and 3D perception is crucial for developing embodied agents and robots that comprehend and interact with the physical world. While large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, their adaptation to 3D environments (3D-LLMs) remains in its early stages. A primary challenge is the absence of large-scale datase…

    Submitted 12 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Project website: https://3d-grand.github.io

  8. arXiv:2406.04845  [pdf, other]

    cs.CL cs.AI cs.DC cs.LG cs.MA

    FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models

    Authors: Rui Ye, Rui Ge, Xinyu Zhu, Jingyi Chai, Yaxin Du, Yang Liu, Yanfeng Wang, Siheng Chen

    Abstract: Federated learning has enabled multiple parties to collaboratively train large language models without directly sharing their data (FedLLM). Following this training paradigm, the community has put massive efforts from diverse aspects including framework, performance, and privacy. However, an unpleasant fact is that there are currently no realistic datasets and benchmarks for FedLLM and previous wo…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 22 pages

  9. arXiv:2406.04640  [pdf, other]

    cs.LG

    LinkGPT: Teaching Large Language Models To Predict Missing Links

    Authors: Zhongmou He, Jing Zhu, Shengyi Qian, Joyce Chai, Danai Koutra

    Abstract: Large Language Models (LLMs) have shown promising results on various language and vision tasks. Recently, there has been growing interest in applying LLMs to graph-based tasks, particularly on Text-Attributed Graphs (TAGs). However, most studies have focused on node classification, while the use of LLMs for link prediction (LP) remains understudied. In this work, we propose a new task on LLMs, whe…

    Submitted 7 June, 2024; originally announced June 2024.

  10. arXiv:2406.03008  [pdf, other]

    cs.CV cs.AI cs.CL

    DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

    Authors: Yidong Huang, Jacob Sansom, Ziqiao Ma, Felix Gervits, Joyce Chai

    Abstract: Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are preliminary, over-simplified, and fail to capture the complexity of real-world driving scenarios in human environments. It remains under-explored whether FM agents can handle long-horizon navigation tasks with free-form dialogue and deal with unexpect…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: First Vision and Language for Autonomous Driving and Robotics Workshop (VLADR @ CVPR 2024)

  11. arXiv:2405.18256  [pdf]

    cond-mat.mtrl-sci

    Electrical Control Grain Dimensionality with Multilevel Magnetic Anisotropy

    Authors: Shengyao Li, Sabpreet Bhatti, Siew Lang Teo, Ming Lin, Xinyue Pan, Zherui Yang, Peng Song, Wanghao Tian, Xinyu He, Jianwei Chai, Xian Jun Loh, Qiang Zhu, S. N. Piramanayagam, Xiao Renshaw Wang

    Abstract: In alignment with the increasing demand for larger storage capacity and longer data retention, electrical control of magnetic anisotropy has been a research focus in the realm of spintronics. Typically, magnetic anisotropy is determined by grain dimensionality, which is set during the fabrication of magnetic thin films. Despite the intrinsic correlation between magnetic anisotropy and grain dimens…

    Submitted 28 May, 2024; originally announced May 2024.

  12. arXiv:2405.13828  [pdf, other]

    cs.CL cs.AI

    Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations

    Authors: Ziqiao Ma, Zekun Wang, Joyce Chai

    Abstract: Humans are efficient language learners and inherently social creatures. Our language development is largely shaped by our social interactions, for example, the demonstration and feedback from caregivers. Contrary to human language learning, recent advancements in large language models have primarily adopted a non-interactive training paradigm, and refined pre-trained models through feedback afterw…

    Submitted 22 May, 2024; originally announced May 2024.

  13. arXiv:2405.09187  [pdf, ps, other]

    physics.chem-ph cond-mat.mtrl-sci physics.comp-ph quant-ph

    Spin Symmetry in Thermally-Assisted-Occupation Density Functional Theory

    Authors: Yu-Yang Wang, Jeng-Da Chai

    Abstract: For electronic systems with multi-reference (MR) character, Kohn-Sham density functional theory (KS-DFT) with the conventional exchange-correlation (xc) energy functionals can lead to incorrect spin densities and related properties. For example, for H2 dissociation, the spin-restricted and spin-unrestricted solutions obtained with the same xc energy functional in KS-DFT can be distinctly different…

    Submitted 29 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: accepted for publication in Phys. Rev. A, 23 pages, 5 figures

    Journal ref: Phys. Rev. A 109, 062808 (2024)

  14. arXiv:2404.15825  [pdf]

    physics.app-ph

    Impact of Top SiO2 interlayer Thickness on Memory Window of Si Channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) Gate Structure

    Authors: Tao Hu, Xianzhou Shao, Mingkai Bai, Xinpei Jia, Saifei Dai, Xiaoqing Sun, Runhao Han, Jia Yang, Xiaoyu Ke, Fengbin Tian, Shuai Yang, Junshuai Chai, Hao Xu, Xiaolei Wang, Wenwu Wang, Tianchun Ye

    Abstract: We study the impact of top SiO2 interlayer thickness on the memory window of Si channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) gate structure. The memory window increases with thicker top SiO2. We achieve a memory window of 6.3 V for 3.4 nm top SiO2. Moreover, we find that the endurance characteristic degrades as the initial memory window increases.

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 4 pages, 7 figures

  15. Magneto-optical properties of a quantum dot array interacting with a far-infrared photon mode of a cylindrical cavity

    Authors: Vidar Gudmundsson, Vram Mughnetsyan, Hsi-Sheng Goan, Jeng-Da Chai, Nzar Rauf Abdullah, Chi-Shung Tang, Valeriu Moldoveanu, Andrei Manolescu

    Abstract: We model the equilibrium properties of a two-dimensional electron gas in a square lateral superlattice of quantum dots in a GaAs heterostructure subject to an external homogeneous perpendicular magnetic field and a far-infrared circular cylindrical photon cavity with one quantized mode, the TE011 mode. In a truncated linear basis constructed by a tensor product of the single-electron states of the…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: RevTeX - pdfLaTeX, 12 pages with 8 included pdf figures

    Journal ref: Phys. Rev. B 109, 235306 (2024)

  16. arXiv:2402.16846  [pdf, other]

    cs.CV cs.AI cs.CL

    GROUNDHOG: Grounding Large Language Models to Holistic Segmentation

    Authors: Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai

    Abstract: Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens. This paradigm lacks pixel-level representations that are important for fine-grained visual understanding and diagnosis. In this work, we introduce GROUNDHOG, an MLLM developed by grounding Large Lang…

    Submitted 16 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024. Website: https://groundhog-mllm.github.io/

  17. arXiv:2402.06954  [pdf, other]

    cs.LG cs.CL cs.DC cs.MA

    OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning

    Authors: Rui Ye, Wenhao Wang, Jingyi Chai, Dihan Li, Zexi Li, Yinda Xu, Yaxin Du, Yanfeng Wang, Siheng Chen

    Abstract: Trained on massive publicly available data, large language models (LLMs) have demonstrated tremendous success across various fields. While more data contributes to better performance, a disconcerting reality is that high-quality public data will be exhausted in a few years. In this paper, we offer a potential next step for contemporary LLMs: collaborative and privacy-preserving LLM training on the…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: 28 pages, 3 figures, 16 tables

  18. arXiv:2401.02520  [pdf, other]

    stat.ML cs.LG math.ST

    Structured Matrix Learning under Arbitrary Entrywise Dependence and Estimation of Markov Transition Kernel

    Authors: Jinhang Chai, Jianqing Fan

    Abstract: The problem of structured matrix estimation has been studied mostly under strong noise dependence assumptions. This paper considers a general framework of noisy low-rank-plus-sparse matrix recovery, where the noise matrix may come from any joint distribution with arbitrary dependence across entries. We propose an incoherent-constrained least-square estimator and prove its tightness both in the sen…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: 55 pages, 4 figures

  19. arXiv:2312.16829  [pdf]

    cond-mat.mtrl-sci physics.app-ph physics.soc-ph

    Enlargement of Memory Window of Si Channel FeFET by Inserting Al2O3 Interlayer on Ferroelectric Hf0.5Zr0.5O2

    Authors: Tao Hu, Xiaoqing Sun, Mingkai Bai, Xinpei Jia, Saifei Dai, Tingting Li, Runhao Han, Yajing Ding, Hongyang Fan, Yuanyuan Zhao, Junshuai Chai, Hao Xu, Mengwei Si, Xiaolei Wang, Wenwu Wang

    Abstract: In this work, we demonstrate the enlargement of the memory window of Si channel FeFET with ferroelectric Hf0.5Zr0.5O2 by gate-side dielectric interlayer engineering. By inserting an Al2O3 dielectric interlayer between TiN gate metal and ferroelectric Hf0.5Zr0.5O2, we achieve a memory window of 3.2 V with endurance of ~10^5 cycles and retention over 10 years. The physical origin of memory window enl…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 3 pages, 6 figures

  20. arXiv:2312.05807  [pdf, other]

    cs.LG cs.CV

    Federated Learning Empowered by Generative Content

    Authors: Rui Ye, Xinyu Zhu, Jingyi Chai, Siheng Chen, Yanfeng Wang

    Abstract: Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way. However, data heterogeneity significantly limits the performance of current FL methods. In this paper, we propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content. FedGC is a simple-to-implement fra…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 19 pages

  21. arXiv:2312.05437  [pdf, other]

    cs.IT cs.AI cs.NI

    Rate-Distortion-Perception Theory for Semantic Communication

    Authors: Jingxuan Chai, Yong Xiao, Guangming Shi, Walid Saad

    Abstract: Semantic communication has attracted significant interest recently due to its capability to meet the fast-growing demand for user-defined and human-oriented communication services such as holographic communications, eXtended reality (XR), and human-to-machine interactions. Unfortunately, recent studies suggest that the traditional Shannon information theory, focusing mainly on delivering semantic-ag…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: accepted at IEEE International Conference on Network Protocols (ICNP) Workshop, Reykjavik, Iceland, October 10-13, 2023

  22. arXiv:2312.04965  [pdf, other]

    cs.CV cs.AI cs.CL

    Inversion-Free Image Editing with Natural Language

    Authors: Sihan Xu, Yidong Huang, Jiayi Pan, Ziqiao Ma, Joyce Chai

    Abstract: Despite recent advances in inversion-based editing, text-guided image manipulation remains challenging for diffusion models. The primary bottlenecks include 1) the time-consuming nature of the inversion process; 2) the struggle to balance consistency with accuracy; 3) the lack of compatibility with efficient consistency sampling methods used in consistency models. To address the above issues, we s…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Project Page: https://sled-group.github.io/InfEdit/

  23. arXiv:2311.17041  [pdf, other]

    cs.CV cs.AI cs.CL

    Efficient In-Context Learning in Vision-Language Models for Egocentric Videos

    Authors: Keunwoo Peter Yu, Zheyuan Zhang, Fengyuan Hu, Joyce Chai

    Abstract: Recent advancements in text-only large language models (LLMs) have highlighted the benefit of in-context learning for adapting to new tasks with a few demonstrations. However, extending in-context learning to large vision-language models (VLMs) using a huge amount of naturalistic vision-language data has shown limited success, particularly for egocentric videos, due to high data collection costs.…

    Submitted 29 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: 10 pages, LaTeX; added acknowledgments

  24. arXiv:2311.05729  [pdf, other]

    cs.CV

    GIPCOL: Graph-Injected Soft Prompting for Compositional Zero-Shot Learning

    Authors: Guangyue Xu, Joyce Chai, Parisa Kordjamshidi

    Abstract: Pre-trained vision-language models (VLMs) have achieved promising success in many fields, especially with the prompt learning paradigm. In this work, we propose GIP-COL (Graph-Injected Soft Prompting for COmpositional Learning) to better explore the compositional zero-shot learning (CZSL) ability of VLMs within the prompt-based learning framework. The soft prompt in GIPCOL is structured and consists o…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: WACV24

  25. arXiv:2311.01580  [pdf, other]

    cs.CL cs.AI

    MetaReVision: Meta-Learning with Retrieval for Visually Grounded Compositional Concept Acquisition

    Authors: Guangyue Xu, Parisa Kordjamshidi, Joyce Chai

    Abstract: Humans have the ability to learn novel compositional concepts by recalling and generalizing primitive concepts acquired from past experiences. Inspired by this observation, in this paper, we propose MetaReVision, a retrieval-enhanced meta-learning model to address the visually grounded compositional concept learning problem. The proposed MetaReVision consists of a retrieval module and a meta-learn…

    Submitted 2 November, 2023; originally announced November 2023.

    Journal ref: EMNLP-Finding(2023)

  26. arXiv:2311.00738  [pdf, other]

    cs.AI cs.HC

    Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?

    Authors: Yuwei Bao, Keunwoo Peter Yu, Yichi Zhang, Shane Storks, Itamar Bar-Yossef, Alexander De La Iglesia, Megan Su, Xiao Lin Zheng, Joyce Chai

    Abstract: Despite tremendous advances in AI, it remains a significant challenge to develop interactive task guidance systems that can offer situated, personalized guidance and assist humans in various tasks. These systems need to have a sophisticated understanding of the user as well as the environment, and make timely accurate decisions on when and what to say. To address this issue, we created a new multi…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023 Findings

  27. arXiv:2311.00047  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?

    Authors: Yichi Zhang, Jiayi Pan, Yuchen Zhou, Rui Pan, Joyce Chai

    Abstract: Vision-Language Models (VLMs) are trained on vast amounts of data captured by humans emulating our understanding of the world. However, as visual illusions show, human perception of reality isn't always faithful to the physical world. This raises a key question: do VLMs have similar kinds of illusions as humans do, or do they faithfully learn to represent reality? To investigate this questio…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP 2023 main conference

  28. arXiv:2310.19619  [pdf, other]

    cs.CL cs.AI

    Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models

    Authors: Ziqiao Ma, Jacob Sansom, Run Peng, Joyce Chai

    Abstract: Large Language Models (LLMs) have generated considerable interest and debate regarding their potential emergence of Theory of Mind (ToM). Several recent inquiries reveal a lack of robust ToM in these models and pose a pressing demand to develop new benchmarks, as current ones primarily focus on different aspects of ToM and are prone to shortcuts and data leakage. In this position paper, we seek to…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Theme Track, Findings of EMNLP 2023

  29. arXiv:2310.18364  [pdf, other]

    cs.CL cs.AI

    From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning

    Authors: Zheyuan Zhang, Shane Storks, Fengyuan Hu, Sungryull Sohn, Moontae Lee, Honglak Lee, Joyce Chai

    Abstract: Pre-trained language models (PLMs) have shown impressive performance in various language tasks. However, they are prone to spurious correlations, and often generate illusory information. In real-world applications, PLMs should justify decisions with formalized, coherent reasoning chains, but this challenge remains under-explored. Cognitive psychology theorizes that humans are capable of utilizing…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Main Conference

  30. arXiv:2310.13165  [pdf, other]

    cs.CV cs.AI cs.LG

    CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation

    Authors: Sihan Xu, Ziqiao Ma, Yidong Huang, Honglak Lee, Joyce Chai

    Abstract: Diffusion models (DMs) have enabled breakthroughs in image synthesis tasks but lack an intuitive interface for consistent image-to-image (I2I) translation. Various methods have been explored to address this issue, including mask-based methods, attention-based methods, and image-conditioning. However, it remains a critical challenge to enable unpaired I2I translation with pre-trained DMs while main…

    Submitted 9 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  31. arXiv:2310.07968  [pdf, other]

    cs.RO cs.CL cs.HC

    Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation

    Authors: Yinpei Dai, Run Peng, Sikai Li, Joyce Chai

    Abstract: Zero-Shot Object Navigation (ZSON) enables agents to navigate towards open-vocabulary objects in unknown environments. The existing works of ZSON mainly focus on following individual instructions to find generic object classes, neglecting the utilization of natural language interaction and the complexities of identifying user-specific objects. To address these limitations, we introduce Zero-shot I…

    Submitted 29 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Video URL: https://www.youtube.com/watch?v=rN5S8QIhhQc

  32. arXiv:2310.05316  [pdf, other]

    cs.CV

    Understanding the Feature Norm for Out-of-Distribution Detection

    Authors: Jaewoo Park, Jacky Chen Long Chai, Jaeho Yoon, Andrew Beng Jin Teoh

    Abstract: A neural network trained on a classification dataset often exhibits a higher vector norm of hidden layer features for in-distribution (ID) samples, while producing relatively lower norm values on unseen instances from out-of-distribution (OOD). Despite this intriguing phenomenon being utilized in many applications, the underlying cause has not been thoroughly investigated. In this study, we demyst…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to ICCV2023

  33. arXiv:2309.12311  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

    Authors: Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David F. Fouhey, Joyce Chai

    Abstract: 3D visual grounding is a critical skill for household robots, enabling them to navigate, manipulate objects, and answer questions based on their environment. While existing approaches often rely on extensive labeled data or exhibit limitations in handling complex language queries, we propose LLM-Grounder, a novel zero-shot, open-vocabulary, Large Language Model (LLM)-based 3D visual grounding pipe…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: Project website: https://chat-with-nerf.github.io/

  34. Natural Language Instructions for Intuitive Human Interaction with Robotic Assistants in Field Construction Work

    Authors: Somin Park, Xi Wang, Carol C. Menassa, Vineet R. Kamat, Joyce Y. Chai

    Abstract: The introduction of robots is widely considered to have significant potential of alleviating the issues of worker shortage and stagnant productivity that afflict the construction industry. However, it is challenging to use fully automated robots in complex and unstructured construction sites. Human-Robot Collaboration (HRC) has shown promise of combining human workers' flexibility and robot assist…

    Submitted 11 July, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

  35. arXiv:2307.02615  [pdf, other]

    cs.CL cs.AI cs.LG

    Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition

    Authors: Yuwei Bao, Barrett Martin Lattimer, Joyce Chai

    Abstract: Human language acquisition is an efficient, supervised, and continual process. In this work, we took inspiration from how human babies acquire their first language, and developed a computational process for word acquisition through comparative learning. Motivated by cognitive findings, we generated a small dataset that enables the computation models to compare the similarities and differences of v…

    Submitted 5 July, 2023; originally announced July 2023.

    Journal ref: ACL 2023

  36. arXiv:2306.08685  [pdf, other]

    cs.CL cs.AI cs.CV

    World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models

    Authors: Ziqiao Ma, Jiayi Pan, Joyce Chai

    Abstract: The ability to connect language units to their referents in the physical world, referred to as grounding, is crucial to learning and understanding grounded meanings of words. While humans demonstrate fast mapping in new word learning, it remains unclear whether modern vision-language models can truly represent language with their grounded meanings and how grounding may further bootstrap new word l…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: ACL 2023

  37. arXiv:2305.17626  [pdf, other]

    cs.AI cs.CL cs.LG

    In-Context Analogical Reasoning with Pre-Trained Language Models

    Authors: Xiaoyang Hu, Shane Storks, Richard L. Lewis, Joyce Chai

    Abstract: Analogical reasoning is a fundamental capacity of human cognition that allows us to reason abstractly about novel situations by relating them to past experiences. While it is thought to be essential for robust reasoning in AI systems, conventional approaches require significant training and/or hard-coding of domain knowledge to be applied to benchmark tasks. Inspired by cognitive science research…

    Submitted 5 June, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

  38. arXiv:2305.16579  [pdf, other]

    cs.CL cs.AI

    NLP Reproducibility For All: Understanding Experiences of Beginners

    Authors: Shane Storks, Keunwoo Peter Yu, Ziqiao Ma, Joyce Chai

    Abstract: As natural language processing (NLP) has recently seen an unprecedented level of excitement, and more people are eager to enter the field, it is unclear whether current research reproducibility efforts are sufficient for this group of beginners to apply the latest developments. To understand their needs, we conducted a study with 93 students in an introductory NLP course, where students reproduced…

    Submitted 3 June, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Theme Track

  39. arXiv:2305.11271  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue

    Authors: Cristian-Paul Bara, Ziqiao Ma, Yingzhuo Yu, Julie Shah, Joyce Chai

    Abstract: Collaborative tasks often begin with partial task knowledge and incomplete initial plans from each partner. To complete these tasks, agents need to engage in situated communication with their partners and coordinate their partial plans towards a complete plan to achieve a joint task goal. While such collaboration seems effortless in a human-human team, it is highly challenging for human-AI collabo…

    Submitted 18 May, 2023; originally announced May 2023.

    Journal ref: International Joint Conferences on Artificial Intelligence (IJCAI 2023)

  40. arXiv:2305.10407  [pdf, other]

    cs.CL

    BAD: BiAs Detection for Large Language Models in the context of candidate screening

    Authors: Nam Ho Koh, Joseph Plata, Joyce Chai

    Abstract: Application Tracking Systems (ATS) have allowed talent managers, recruiters, and college admissions committees to process large volumes of potential candidate applications efficiently. Traditionally, this screening process was conducted manually, creating major bottlenecks due to the quantity of applications and introducing many instances of human bias. The advent of large language models (LLMs) s…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: 12 pages, 6 figures

    MSC Class: I.2; I.2.7

    ACM Class: F.2.2, I.2.7

  41. arXiv:2304.10066  [pdf, other]

    cs.CV

    Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality Estimation

    Authors: Jacky Chen Long Chai, Tiong-Sik Ng, Cheng-Yaw Low, Jaewoo Park, Andrew Beng Jin Teoh

    Abstract: Very low-resolution face recognition (VLRFR) poses unique challenges, such as tiny regions of interest and poor resolution due to extreme standoff distance or wide viewing angle of the acquisition devices. In this paper, we study principled approaches to elevate the recognizability of a face in the embedding space instead of the visual quality. We first formulate a robust learning-based face recog…

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR23

  42. arXiv:2304.00061  [pdf, other]

    cs.LG cs.AI

    To be Robust and to be Fair: Aligning Fairness with Robustness

    Authors: Junyi Chai, Xiaoqian Wang

    Abstract: Adversarial training has been shown to be reliable in improving robustness against adversarial samples. However, the problem of adversarial training in terms of fairness has not yet been properly studied, and the relationship between fairness and accuracy attack still remains unclear. Can we simultaneously improve robustness w.r.t. both fairness and accuracy? To tackle this topic, in this paper, w…

    Submitted 31 March, 2023; originally announced April 2023.

  43. arXiv:2303.06593  [pdf, other]

    eess.SP

    Domain-Knowledge-Aided Airborne Ground Moving Targets Tracking

    Authors: Jianduo Chai, Shaoming He, Hyo-Sang Shin

    Abstract: This paper investigates the problem of traffic surveillance using an unmanned aerial vehicle (UAV) and proposes a domain-knowledge-aided airborne ground moving targets tracking algorithm. To improve the accuracy of multiple targets tracking, the proposed algorithm incorporates domain knowledge into the joint probabilistic data association (JPDA) filter as state constraints. The domain knowledge co…

    Submitted 12 March, 2023; originally announced March 2023.

  44. arXiv:2302.10518  [pdf, other]

    cs.CV

    USR: Unsupervised Separated 3D Garment and Human Reconstruction via Geometry and Semantic Consistency

    Authors: Yue Shi, Yuxuan Xiong, Jingyi Chai, Bingbing Ni, Wenjun Zhang

    Abstract: Dressed people reconstruction from images is a popular task with promising applications in the creative media and game industry. However, most existing methods reconstruct the human body and garments as a whole with the supervision of 3D models, which hinders the downstream interaction tasks and requires hard-to-obtain data. To address these issues, we propose an unsupervised separated 3D garments…

    Submitted 2 March, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

  45. arXiv:2301.03205  [pdf, other]

    cond-mat.mtrl-sci physics.app-ph

    Enhanced photovoltaic effect in graphene-silicon Schottky junction under mechanical manipulation

    Authors: Dong Pu, Muhammad Abid Anwar, Jiachao Zhou, Renwei Mao, Xin Pan, Jian Chai, Feng Tian, Hua Wang, Huan Hu, Yang Xu

    Abstract: The graphene-silicon Schottky junction (GSJ), which has the potential for large-scale manufacturing and integration, can bring new opportunities to Schottky solar cells for photovoltaic (PV) power conversion. However, the essential power conversion limitation for these devices lies in the small open-circuit voltage ($V_{oc}$), which depends on the Schottky barrier height (SBH). In this study, we introdu…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: 7 pages, 4 figures. The following article has been accepted by Applied Physics Letters

  46. arXiv:2212.03830  [pdf, other]

    cs.AI eess.SY

    A Hierarchical Deep Reinforcement Learning Framework for 6-DOF UCAV Air-to-Air Combat

    Authors: Jiajun Chai, Wenzhang Chen, Yuanheng Zhu, Zong-xin Yao, Dongbin Zhao

    Abstract: Unmanned combat air vehicle (UCAV) combat is a challenging scenario with continuous action space. In this paper, we propose a general hierarchical framework to resolve the within-vision-range (WVR) air-to-air combat problem under six-degree-of-freedom (6-DOF) dynamics. The core idea is to divide the whole decision process into two loops and use reinforcement learning (RL) to solve them separately…

    Submitted 5 December, 2022; originally announced December 2022.

  47. arXiv:2211.05077  [pdf, other]

    cs.CV

    Prompting Large Pre-trained Vision-Language Models For Compositional Concept Learning

    Authors: Guangyue Xu, Parisa Kordjamshidi, Joyce Chai

    Abstract: This work explores the zero-shot compositional learning ability of large pre-trained vision-language models (VLMs) within the prompt-based learning framework and proposes a model (PromptCompVL) to solve the compositional zero-shot learning (CZSL) problem. PromptCompVL makes two design choices: first, it uses soft-prompting instead of hard-prompting to inject learnable parameters t…

    Submitted 9 November, 2022; originally announced November 2022.

  48. arXiv:2210.14426  [pdf]

    cond-mat.mtrl-sci cond-mat.mes-hall physics.app-ph physics.chem-ph

    Liquid Metal Printed Ultrathin Oxides for Monolayer WS2 Top-Gate Transistors

    Authors: Yiyu Zhang, Dasari Venkatakrishnarao, Michel Bosman, Wei Fu, Sarthak Das, Fabio Bussolotti, Rainer Lee, Siew Lang Teo, Ding Huang, Ivan Verzhbitskiy, Zhuojun Jiang, Zhuoling Jiang, Jian Wei Chai, Shi Wun Tong, Zi-En Ooi, Calvin Pei Yu Wong, Yee Sin Ang, Kuan Eng Johnson Goh, Chit Siong Lau

    Abstract: Two-dimensional (2D) semiconductors are promising channel materials for continued downscaling of complementary metal-oxide-semiconductor (CMOS) logic circuits. However, their full potential continues to be limited by a lack of scalable high-k dielectrics that can achieve atomically smooth interfaces, small equivalent oxide thicknesses (EOT), excellent gate control, and low leakage currents. Here,…

    Submitted 25 October, 2022; originally announced October 2022.

  49. arXiv:2210.12511  [pdf, other]

    cs.AI cs.CL cs.CV cs.RO

    DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents

    Authors: Ziqiao Ma, Ben VanDerPloeg, Cristian-Paul Bara, Huang Yidong, Eui-In Kim, Felix Gervits, Matthew Marge, Joyce Chai

    Abstract: In the real world, autonomous driving agents navigate in highly dynamic environments full of unexpected situations where pre-trained models are unreliable. In these situations, what is immediately available to vehicles is often only human operators. Empowering autonomous driving agents with the ability to navigate in a continuous and dynamic environment and to communicate with humans through senso…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP, 2022

  50. arXiv:2210.12485  [pdf, other]

    cs.AI cs.CL cs.RO

    DANLI: Deliberative Agent for Following Natural Language Instructions

    Authors: Yichi Zhang, Jianing Yang, Jiayi Pan, Shane Storks, Nikhil Devraj, Ziqiao Ma, Keunwoo Peter Yu, Yuwei Bao, Joyce Chai

    Abstract: Recent years have seen an increasing amount of work on embodied AI agents that can perform tasks by following human language instructions. However, most of these agents are reactive, meaning that they simply learn and imitate behaviors encountered in the training data. These reactive agents are insufficient for long-horizon complex tasks. To address this limitation, we propose a neuro-symbolic del…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: Accepted in EMNLP 2022