-
AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval
Authors:
Shirley Wu,
Shiyu Zhao,
Qian Huang,
Kexin Huang,
Michihiro Yasunaga,
Kaidi Cao,
Vassilis N. Ioannidis,
Karthik Subbian,
Jure Leskovec,
James Zou
Abstract:
Large language model (LLM) agents have demonstrated impressive capability in utilizing external tools and knowledge to boost accuracy and reduce hallucinations. However, developing the prompting techniques that make LLM agents able to effectively use external tools and knowledge is a heuristic and laborious task. Here, we introduce AvaTaR, a novel and automatic framework that optimizes an LLM agent to effectively use the provided tools and improve its performance on a given task/domain. During optimization, we design a comparator module to iteratively provide insightful and holistic prompts to the LLM agent via reasoning between positive and negative examples sampled from training data. We demonstrate AvaTaR on four complex multimodal retrieval datasets featuring textual, visual, and relational information. We find AvaTaR consistently outperforms state-of-the-art approaches across all four challenging tasks and exhibits strong generalization ability when applied to novel cases, achieving an average relative improvement of 14% on the Hit@1 metric. Code and dataset are available at https://github.com/zou-group/avatar.
Submitted 17 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
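The Hit@1 figure quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual retrieval definition (a query scores 1 if any ground-truth item appears in the top k ranked results) and defines relative improvement as (new - old) / old. The function names and the toy data are hypothetical and are not taken from the AvaTaR repository.

```python
from typing import Hashable, Sequence, Set


def hit_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 1) -> float:
    """Return 1.0 if any relevant item appears in the top-k results, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def mean_hit_at_k(
    rankings: Sequence[Sequence[Hashable]],
    relevants: Sequence[Set[Hashable]],
    k: int = 1,
) -> float:
    """Average Hit@k over a set of queries."""
    if len(rankings) != len(relevants):
        raise ValueError("rankings and relevants must have the same length")
    if not rankings:
        raise ValueError("at least one query is required")
    total = sum(hit_at_k(r, rel, k) for r, rel in zip(rankings, relevants))
    return total / len(rankings)


def relative_improvement(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, e.g. 0.14 means 14%."""
    return (new - old) / old


# Toy data (hypothetical): four queries, each with a ranked list of item IDs.
toy_rankings = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]]
toy_relevants = [{"a"}, {"d"}, {"e", "x"}, {"z"}]
toy_hit1 = mean_hit_at_k(toy_rankings, toy_relevants, k=1)
toy_hit2 = mean_hit_at_k(toy_rankings, toy_relevants, k=2)
```

On the toy data, two of the four queries have a relevant item in first place, and a third gains a hit once the second position is counted.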
-
Auger photoemission as a laser-like coherent cathode
Authors:
Yushan Zeng,
Bin Zhang,
Kecheng Cao,
Xiao-jing Liu,
Yiming Pan
Abstract:
In pursuit of quantum advancements across disciplines, a bright and coherent electron source is expected to be a cornerstone of diverse applications including electron microscopy, laser accelerators, and free electron lasers. Current cathodes, such as cold field and photoemission, can generate high-quality electron beams with different cathode materials, geometric configurations, and laser excitation profiles, but their maintenance of both quantum coherence and high beam brightness suffers from the space-charge repulsion of many electrons. Here, we propose a new mechanism to provide collective emission of coherent electrons based on Auger photoemission. Our approach leverages a photon-induced four-level Auger process that necessitates a combination of photoemission and Auger recombination. The Auger electrons, energized through a recycling process of photoelectrons, emit collectively into the vacuum as secondary electrons. We compare coherent and incoherent Auger photoemission, identifying that the working condition of the coherent photoemission requires population inversion, akin to the four-level laser system. Our work provides insights for experimental realization and nanofabrication of Auger photocathodes, addressing a critical need in advancing quantum technologies relating to correlated coherent sources.
Submitted 20 May, 2024;
originally announced May 2024.
-
Exploring Dynamical Phase Transitions in the XY Chain through Linear Quench: Early and Long-term Perspectives
Authors:
Kaiyuan Cao,
Peiqing Tong
Abstract:
We investigate the nonequilibrium dynamics induced by a finite-time linear quench in the XY chain. Initially, we examine the dynamical quantum phase transition (DQPT), characterized by the nonanalytic behavior of the Loschmidt amplitude. We find distinct behaviors of DQPTs during and following the ramp. After the ramp, crossing the critical point $h_{c}$ is a sufficient condition for the occurrence of a DQPT, but this is not the case during the ramp. Through AIA approximation analysis, we establish that adequate distancing from the critical point is crucial for DQPT manifestation during the ramp, elucidating the absence of DQPT as the ramp gets faster. Additionally, we explore another type of dynamical phase transition, describing the long-term relaxation behavior of the order parameter. Our findings indicate that the asymptotic behavior of the time-dependent part induced by the linear quench is equivalent to that following a sudden quench, i.e., the time-dependent part exhibits power-law decays of $\sim t^{-3/2}$ and $\sim t^{-1/2}$ for the ramp to the commensurate and incommensurate phases, respectively. Moreover, we also delve into the steady part, which showcases nonanalytic singularities at the critical point.
Submitted 14 May, 2024;
originally announced May 2024.
-
ContEvol formalism: possibly a new twist on computational physics
Authors:
Kaili Cao
Abstract:
We present the ContEvol (continuous evolution) formalism, a family of implicit numerical methods which only need to solve linear equations and are almost symplectic. Combining values and derivatives of functions, ContEvol outputs allow users to recover the full history and render full distributions. Using the classic harmonic oscillator as a prototype case, we show that ContEvol methods lead to lower-order errors than two commonly used Runge--Kutta methods. Applying first-order ContEvol to simple celestial mechanics problems, we demonstrate that the deviation of ContEvol tracks from the equation(s) of motion is still $\mathcal{O}(h^5)$ ($h$ is the step length) by our definition. Numerical experiments with an eccentric elliptical orbit indicate that first-order ContEvol is a viable alternative to classic Runge--Kutta or the symplectic leapfrog integrator. Solving the stationary Schrödinger equation in quantum mechanics, we demonstrate the ability of ContEvol to handle boundary value or eigenvalue problems. Important directions for future work, including mathematical foundation, higher dimensions, and technical improvements, are discussed at the end of this article.
Submitted 8 May, 2024;
originally announced May 2024.
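For context on the baselines named in the abstract, the following is a minimal sketch of the symplectic leapfrog (velocity Verlet) integrator applied to the harmonic oscillator $\ddot{x} = -\omega^2 x$. It does not reproduce ContEvol itself, whose update rules are not given in the abstract; the function names, step size, and initial conditions are illustrative assumptions.

```python
import math


def leapfrog_sho(x0: float, v0: float, omega: float, h: float, n_steps: int):
    """Integrate x'' = -omega^2 x with kick-drift-kick leapfrog steps."""
    x, v = x0, v0
    for _ in range(n_steps):
        v_half = v - 0.5 * h * omega**2 * x       # half kick
        x = x + h * v_half                        # full drift
        v = v_half - 0.5 * h * omega**2 * x       # half kick at new position
    return x, v


def energy(x: float, v: float, omega: float) -> float:
    """Energy per unit mass of the harmonic oscillator."""
    return 0.5 * v * v + 0.5 * omega**2 * x * x


# Illustrative run (assumed parameters): unit frequency, step length h = 0.01.
omega, h, n_steps = 1.0, 0.01, 10_000
x_end, v_end = leapfrog_sho(1.0, 0.0, omega, h, n_steps)
e0 = energy(1.0, 0.0, omega)
rel_energy_error = abs(energy(x_end, v_end, omega) - e0) / e0
x_exact = math.cos(omega * h * n_steps)
```

Because the scheme is symplectic, the energy error stays bounded at order $h^2$ over long runs instead of drifting, which is the property that makes leapfrog a natural point of comparison for an "almost symplectic" method.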
-
The augmented codes of a family of linear codes with locality 2
Authors:
Ziling Heng,
Keqing Cao
Abstract:
In this paper, we first generalize the class of linear codes by Ding and Ding (IEEE TIT, 61(11), pp. 5835-5842, 2015). Then we mainly study the augmented codes of this generalized class of linear codes. For one thing, we use Gaussian sums to determine the parameters and weight distributions of the augmented codes in some cases. It is shown that the augmented codes are self-orthogonal and have only a few nonzero weights. For another thing, the locality of the augmented codes is proved to be 2, which indicates the augmented codes are useful in distributed storage. Besides, the augmented codes are projective as the minimum distance of their duals is proved to be 3. In particular, we obtain several (almost) optimal linear codes and locally recoverable codes.
Submitted 29 April, 2024;
originally announced April 2024.
-
ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
Authors:
Xuanhua He,
Quande Liu,
Shengju Qian,
Xin Wang,
Tao Hu,
Ke Cao,
Keyu Yan,
Jie Zhang
Abstract:
Generating high-fidelity human video with specified identities has attracted significant attention in the content generation community. However, existing techniques struggle to strike a balance between training efficiency and identity preservation, either requiring tedious case-by-case fine-tuning or usually missing identity details in the video generation process. In this study, we present ID-Animator, a zero-shot human-video generation approach that can perform personalized video generation given a single reference facial image without further training. ID-Animator inherits existing diffusion-based video generation backbones with a face adapter to encode the ID-relevant embeddings from learnable facial latent queries. To facilitate the extraction of identity information in video generation, we introduce an ID-oriented dataset construction pipeline that incorporates unified human attributes and action captioning techniques from a constructed facial image pool. Based on this pipeline, a random reference training strategy is further devised to precisely capture the ID-relevant embeddings with an ID-preserving loss, thus improving the fidelity and generalization capacity of our model for ID-specific video generation. Extensive experiments demonstrate the superiority of ID-Animator over previous models in generating personalized human videos. Moreover, our method is highly compatible with popular pre-trained T2V models like animatediff and various community backbone models, showing high extendability in real-world applications for video generation where identity preservation is highly desired. Our codes and checkpoints are released at https://github.com/ID-Animator/ID-Animator.
Submitted 25 June, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
Authors:
Shirley Wu,
Shiyu Zhao,
Michihiro Yasunaga,
Kexin Huang,
Kaidi Cao,
Qian Huang,
Vassilis N. Ioannidis,
Karthik Subbian,
James Zou,
Jure Leskovec
Abstract:
Answering real-world complex queries, such as complex product search, often requires accurate retrieval from semi-structured knowledge bases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, previous works have mostly studied textual and relational retrieval tasks as separate topics. To address the gap, we develop STARK, a large-scale Semi-structured retrieval benchmark on Textual and Relational Knowledge Bases. Our benchmark covers three domains/datasets: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties, together with their ground-truth answers (items). We conduct rigorous human evaluation to validate the quality of our synthesized queries. We further enhance the benchmark with high-quality human-generated queries to provide an authentic reference. STARK serves as a comprehensive testbed for evaluating the performance of retrieval systems driven by large language models (LLMs). Our experiments suggest that STARK presents significant challenges to the current retrieval and LLM systems, indicating the demand for building more capable retrieval systems. The benchmark data and code are available at https://github.com/snap-stanford/stark.
Submitted 20 May, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
-
NAND-like SOT-MRAM-based Approximate Storage for Error-Tolerant Applications
Authors:
Min Wang,
Zhengyi Hou,
Chenyi Wang,
Zhengjie Yan,
Shixing Li,
Ao Du,
Wenlong Cai,
Jinhao Li,
Hongchao Zhang,
Kaihua Cao,
Kewen Shi,
Bi Wang,
Yuanfu Zhao,
Qingyi Xiang,
Zhaohao Wang,
Weisheng Zhao
Abstract:
We demonstrate approximate storage based on NAND-like spin-orbit torque (SOT) MRAM, through "device-modeling-architecture" explorations. We experimentally achieve selectivity down to the 1E-5 level. Selectivity and low-power solutions are established by a numerical calculation workflow. System-level power consumption is evaluated in a 512 KB last-level cache according to 5 quality levels. Error-tolerant applications, such as image processing, relax the demand for selectivity to the 5E-2 level, leading to 54% ~ 61% energy saving. Our proposal paves a novel and suitable path for high-density and low-power NAND-like SOT-MRAM.
Submitted 8 April, 2024;
originally announced April 2024.
-
PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning
Authors:
Weihua Hu,
Yiwen Yuan,
Zecheng Zhang,
Akihiro Nitta,
Kaidi Cao,
Vid Kocijan,
Jure Leskovec,
Matthias Fey
Abstract:
We present PyTorch Frame, a PyTorch-based framework for deep learning over multi-modal tabular data. PyTorch Frame makes tabular deep learning easy by providing a PyTorch-based data structure to handle complex tabular data, introducing a model abstraction to enable modular implementation of tabular models, and allowing external foundation models to be incorporated to handle complex columns (e.g., LLMs for text columns). We demonstrate the usefulness of PyTorch Frame by implementing diverse tabular models in a modular way, successfully applying these models to complex multi-modal tabular data, and integrating our framework with PyTorch Geometric, a PyTorch library for Graph Neural Networks (GNNs), to perform end-to-end learning over relational databases.
Submitted 31 March, 2024;
originally announced April 2024.
-
HCTO: Optimality-Aware LiDAR Inertial Odometry with Hybrid Continuous Time Optimization for Compact Wearable Mapping System
Authors:
Jianping Li,
Shenghai Yuan,
Muqing Cao,
Thien-Minh Nguyen,
Kun Cao,
Lihua Xie
Abstract:
Compact wearable mapping systems (WMS) have gained significant attention due to their convenience in various applications. Specifically, they provide an efficient way to collect prior maps for 3D structure inspection and robot-based "last-mile delivery" in complex environments. However, vibrations in human motion and the uneven distribution of point cloud features in complex environments often lead to rapid drift, which is a prevalent issue when applying existing LiDAR Inertial Odometry (LIO) methods on low-cost WMS. To address these limitations, we propose a novel LIO for WMSs based on Hybrid Continuous Time Optimization (HCTO) considering the optimality of LiDAR correspondences. First, HCTO recognizes patterns in human motion (high-frequency part, low-frequency part, and constant velocity part) by analyzing raw IMU measurements. Second, HCTO constructs hybrid IMU factors according to different motion states, which enables robust and accurate estimation against vibration-induced noise in the IMU measurements. Third, the best point correspondences are selected using optimal design to achieve real-time performance and better odometry accuracy. We conduct experiments on head-mounted WMS datasets to evaluate the performance of our system, demonstrating significant advantages over state-of-the-art methods. Video recordings of experiments can be found on the project page of HCTO: https://github.com/kafeiyin00/HCTO.
Submitted 21 March, 2024;
originally announced March 2024.
-
Suppression of flux jumps in high-$J_c$ Nb$_3$Sn conductors by ferromagnetic layer
Authors:
Cun Xue,
Kai-Wei Cao,
Tian He,
Chong Wei,
Wei Liu,
Jun-Yi Ge
Abstract:
Flux jumps observed in high-$J_c$ Nb$_3$Sn conductors are an urgent problem in constructing high-field superconducting magnets. The low-field instabilities usually reduce the current-carrying capability and thus cause the premature quench of Nb$_3$Sn coils at low magnetic field. In this paper, we explore suppressing the flux jumps with a ferromagnetic (FM) layer. Firstly, we experimentally and theoretically investigate the flux jumps of Nb$_3$Sn/FM hybrid wires exposed to a magnetic field loop with constant sweeping rate. Compared with bare Nb$_3$Sn and Nb$_3$Sn/Cu wires, we reveal two underlying mechanisms: the suppression of flux jumps is mainly attributed to the thermal effect of the FM layer at lower sweeping rates, whereas both thermal and electromagnetic effects play a crucial role at higher sweeping rates. Furthermore, we explore the flux jumps of Nb$_3$Sn/FM hybrid wires exposed to AC magnetic fields with amplitude $B_{a0}$ and frequency $\omega$. We build up the phase diagrams of flux jumps in the $\omega$-$B_{a0}$ plane for bare Nb$_{3}$Sn wire, Nb$_{3}$Sn/Cu wire and Nb$_{3}$Sn/FM wire, respectively. We stress that the region of flux jumps of the Nb$_{3}$Sn/FM wire is much smaller than that of the other two wires, which indicates that the Nb$_{3}$Sn/FM wire has a significant advantage over merely increasing the heat capacity. The findings shed light on suppression of the flux jumps by utilizing FM materials, which is useful for developing new types of high-$J_c$ Nb$_{3}$Sn conductors.
Submitted 4 June, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
Authors:
Omer Goldman,
Avi Caciularu,
Matan Eyal,
Kris Cao,
Idan Szpektor,
Reut Tsarfaty
Abstract:
Despite it being the cornerstone of BPE, the most common tokenization algorithm, the importance of compression in the tokenization process is still unclear. In this paper, we argue for the theoretical importance of compression, that can be viewed as 0-gram language modeling where equal probability is assigned to all tokens. We also demonstrate the empirical importance of compression for downstream success of pre-trained language models. We control the compression ability of several BPE tokenizers by varying the amount of documents available during their training: from 1 million documents to a character-based tokenizer equivalent to no training data at all. We then pre-train English language models based on those tokenizers and fine-tune them over several tasks. We show that there is a correlation between tokenizers' compression and models' downstream performance, suggesting that compression is a reliable intrinsic indicator of tokenization quality. These correlations are more pronounced for generation tasks (over classification) or for smaller models (over large ones). We replicated a representative part of our experiments on Turkish and found similar results, confirming that our results hold for languages with typological characteristics dissimilar to English. We conclude that building better compressing tokenizers is a fruitful avenue for further research and for improving overall model performance.
Submitted 22 June, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Nature vs. Nurture: Distinguishing Effects from Stellar Processing and Chemical Evolution on Carbon and Nitrogen in Red Giant Stars
Authors:
John D. Roberts,
Marc H. Pinsonneault,
Jennifer A. Johnson,
Joel C. Zinn,
David H. Weinberg,
Mathieu Vrard,
Jamie Tayar,
Dennis Stello,
Benoît Mosser,
James W. Johnson,
Kaili Cao,
Keivan G. Stassun,
Guy S. Stringfellow,
Aldo Serenelli,
Savita Mathur,
Saskia Hekker,
Rafael A. García,
Yvonne P. Elsworth,
Enrico Corsaro
Abstract:
The surface [C/N] ratios of evolved giants are strongly affected by the first dredge-up (FDU) of nuclear-processed material from stellar cores. C and N also have distinct nucleosynthetic origins and serve as diagnostics of mixing and mass loss. We use subgiants to find strong trends in the birth [C/N] with [Fe/H], which differ between the low-$α$ and high-$α$ populations. We demonstrate that these birth trends have a strong impact on the surface abundances after the FDU. This effect is neglected in current stellar models, which use solar-scaled C and N. We map out the FDU as a function of evolutionary state, mass, and composition using a large and precisely measured asteroseismic dataset in first-ascent red giant branch (RGB) and core He-burning, or red clump (RC), stars. We describe the domains where [C/N] is a useful mass diagnostic and find that the RC complements the RGB and extends the range of validity to higher mass. We find evidence for extra mixing on the RGB below [Fe/H]= -0.4, matching literature results, for high-$α$ giants, but there is no clear evidence of mixing in the low-$α$ giants. The predicted signal of mass loss is weak and difficult to detect in our sample. We discuss implications for stellar physics and stellar population applications.
Submitted 5 March, 2024;
originally announced March 2024.
-
Intrinsic supercurrent diode effect in NbSe2 nanobridge
Authors:
Yiwen Zhang,
Jiliang Cai,
Peng Dong,
Jiadian He,
Yifan Ding,
Jinghui Wang,
Xiang Zhou,
Kecheng Cao,
Yueshen Wu,
Jun Li
Abstract:
The significance of the superconducting diode effect lies in its potential application as a fundamental component in the development of next-generation superconducting circuit technology. The stringent operating conditions at low temperatures have posed challenges for the conventional semiconductor diode, primarily due to its exceptionally high resistivity. In response to this limitation, various approaches have emerged to achieve the superconducting diode effect, primarily involving the disruption of inversion symmetry in a two-dimensional superconductor through heterostructure fabrication. In this study, we present a direct observation of the supercurrent diode effect in a NbSe2 nanobridge with a length of approximately 15 nm, created using focused helium ion beam fabrication. Nonreciprocal supercurrents were identified, reaching a peak value of approximately 380 $μ$A for each bias polarity at $B_{z}^{max} =\pm 0.2$ mT. Notably, the nonreciprocal supercurrent can be toggled by altering the bias polarity. This discovery of the superconducting diode effect introduces a novel avenue and mechanism through nanofabrication on a superconducting flake, offering fresh perspectives for the development of superconducting devices and potential circuits.
Submitted 25 February, 2024;
originally announced February 2024.
-
PST-Bench: Tracing and Benchmarking the Source of Publications
Authors:
Fanjin Zhang,
Kun Cao,
Yukuo Cen,
Jifan Yu,
Da Yin,
Jie Tang
Abstract:
Tracing the source of research papers is a fundamental yet challenging task for researchers. The billion-scale citation relations between papers hinder researchers from understanding the evolution of science efficiently. To date, there is still a lack of an accurate and scalable dataset constructed by professional researchers to identify the direct source of their studied papers, based on which automatic algorithms can be developed to expand the evolutionary knowledge of science. In this paper, we study the problem of paper source tracing (PST) and construct PST-Bench, a high-quality and ever-growing dataset in computer science. Based on PST-Bench, we reveal several intriguing discoveries, such as the differing evolution patterns across various topics. An exploration of various methods underscores the difficulty of PST-Bench and pinpoints potential directions on this topic. The dataset and codes are available at https://github.com/THUDM/paper-source-trace.
Submitted 25 February, 2024;
originally announced February 2024.
-
OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining
Authors:
Fanjin Zhang,
Shijie Shi,
Yifan Zhu,
Bo Chen,
Yukuo Cen,
Jifan Yu,
Yelin Chen,
Lulu Wang,
Qingfei Zhao,
Yuqing Cheng,
Tianyi Han,
Yuwei An,
Dan Zhang,
Weng Lam Tam,
Kun Cao,
Yunhe Pang,
Xinyu Guan,
Huihui Yuan,
Jian Song,
Xiaoyan Li,
Yuxiao Dong,
Jie Tang
Abstract:
With the rapid proliferation of scientific literature, versatile academic knowledge services increasingly rely on comprehensive academic graph mining. Despite the availability of public academic graphs, benchmarks, and datasets, these resources often fall short in multi-aspect and fine-grained annotations, are constrained to specific task types and domains, or lack underlying real academic graphs. In this paper, we present OAG-Bench, a comprehensive, multi-aspect, and fine-grained human-curated benchmark based on the Open Academic Graph (OAG). OAG-Bench covers 10 tasks, 20 datasets, 70+ baselines, and 120+ experimental results to date. We propose new data annotation strategies for certain tasks and offer a suite of data pre-processing codes, algorithm implementations, and standardized evaluation protocols to facilitate academic graph mining. Extensive experiments reveal that even advanced algorithms like large language models (LLMs) encounter difficulties in addressing key challenges in certain tasks, such as paper source tracing and scholar profiling. We also introduce the Open Academic Graph Challenge (OAG-Challenge) to encourage community input and sharing. We envisage that OAG-Bench can serve as a common ground for the community to evaluate and compare algorithms in academic graph mining, thereby accelerating algorithm development and advancement in this field. OAG-Bench is accessible at https://www.aminer.cn/data/.
Submitted 20 June, 2024; v1 submitted 24 February, 2024;
originally announced February 2024.
-
A sufficient condition for the height function to be constant in $ I_g\times_ρ\mathbb{P}^n $
Authors:
Kaijian Cao
Abstract:
This paper makes some modifications to the warped product space. Based on the work of Alias, Impera and Rigoli, a warping function is added to the warped product space. This new function affects the Riemannian metric of the warped product space. In this new warped product space, we continue to discuss the sufficient condition for calculating the height of the immersed surface.
Submitted 21 February, 2024;
originally announced February 2024.
-
Pan-Mamba: Effective pan-sharpening with State Space Model
Authors:
Xuanhua He,
Ke Cao,
Keyu Yan,
Rui Li,
Chengjun Xie,
Jie Zhang,
Man Zhou
Abstract:
Pan-sharpening involves integrating information from low-resolution multi-spectral and high-resolution panchromatic images to generate high-resolution multi-spectral counterparts. While recent advancements in the state space model, particularly the efficient long-range dependency modeling achieved by Mamba, have revolutionized the computer vision community, its untapped potential in pan-sharpening motivates our exploration. Our contribution, Pan-Mamba, represents a novel pan-sharpening network that leverages the efficiency of the Mamba model in global information modeling. In Pan-Mamba, we customize two core components: channel swapping Mamba and cross-modal Mamba, strategically designed for efficient cross-modal information exchange and fusion. The former initiates a lightweight cross-modal interaction through the exchange of partial panchromatic and multi-spectral channels, while the latter facilitates the information representation capability by exploiting inherent cross-modal relationships. Through extensive experiments across diverse datasets, our proposed approach surpasses state-of-the-art methods, showcasing superior fusion results in pan-sharpening. To the best of our knowledge, this work is the first attempt at exploring the potential of the Mamba model and establishes a new frontier in pan-sharpening techniques. The source code is available at https://github.com/alexhe101/Pan-Mamba.
Submitted 8 March, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation
Authors:
Ruiping Liu,
Jiaming Zhang,
Kunyu Peng,
Yufan Chen,
Ke Cao,
Junwei Zheng,
M. Saquib Sarfraz,
Kailun Yang,
Rainer Stiefelhagen
Abstract:
Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, the modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level modality absence and sensor-level modality errors. To avoid the predominant modality reliance in multi-modal fusion, we introduce a Missing-aware Modal Switch (MMS) strategy to proactively manage missing modalities during training. Utilizing bit-level batch-wise sampling enhances the model's performance in both complete and incomplete testing scenarios. Furthermore, we introduce the Fourier Prompt Tuning (FPT) method to incorporate representative spectral information into a limited number of learnable prompts that maintain robustness against all MISS scenarios, achieving effects akin to fine-tuning but with fewer tunable parameters (1.1%). Extensive experiments prove the efficacy of our proposed approach, showcasing an improvement of 5.84% mIoU over the prior state-of-the-art parameter-efficient methods in modality missing. The source code is publicly available at https://github.com/RuipingL/MISS.
Submitted 10 April, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Neutron Scattering Studies on the High-$T_c$ Superconductor La$_3$Ni$_2$O$_{7-δ}$ at Ambient Pressure
Authors:
Tao Xie,
Mengwu Huo,
Xiaosheng Ni,
Feiran Shen,
Xing Huang,
Hualei Sun,
Helen C. Walker,
Devashibhai Adroja,
Dehong Yu,
Bing Shen,
Lunhua He,
Kun Cao,
Meng Wang
Abstract:
After several decades of studies of high-temperature superconductivity, there is no compelling theory for the mechanism yet; however, the spin fluctuations have been widely believed to play a crucial role in forming the superconducting Cooper pairs. The recent discovery of high-temperature superconductivity near 80 K in the bilayer nickelate La$_3$Ni$_2$O$_7$ under pressure provides a new platform to elucidate the origins of high-temperature superconductivity. We perform elastic and inelastic neutron scattering studies on a polycrystalline sample of La$_3$Ni$_2$O$_{7-δ}$ at ambient pressure. No magnetic order can be identified down to 10 K. The absence of long-range magnetic order in neutron diffraction measurements may be ascribed to the smallness of the magnetic moment. However, we observe a weak flat spin-fluctuation signal at $\sim$ 45 meV in the inelastic scattering spectra. The observed spin excitations could be interpreted as a result of strong interlayer and weak intralayer magnetic couplings for stripe-type antiferromagnetic orders. Our results provide crucial information on the spin dynamics and are thus important for understanding the superconductivity in La$_3$Ni$_2$O$_7$.
Submitted 4 April, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Towards End-to-End GPS Localization with Neural Pseudorange Correction
Authors:
Xu Weng,
KV Ling,
Haochen Liu,
Kun Cao
Abstract:
Pseudorange errors are the root cause of localization inaccuracy in GPS. Previous data-driven methods regress and eliminate pseudorange errors using handcrafted intermediate labels. Unlike them, we propose an end-to-end GPS localization framework, E2E-PrNet, to train a neural network for pseudorange correction (PrNet) directly using the final task loss calculated with the ground truth of GPS receiver states. The gradients of the loss with respect to learnable parameters are backpropagated through a differentiable nonlinear least squares optimizer to PrNet. The feasibility is verified with GPS data collected by Android phones, showing that E2E-PrNet outperforms the state-of-the-art end-to-end GPS localization methods.
Submitted 19 January, 2024;
originally announced January 2024.
-
Quantum phase transitions in the alternating XY chain with three-site interactions
Authors:
Kaiyuan Cao,
Hao Fu,
Xue Liu,
Ming Zhong,
Peiqing Tong
Abstract:
We investigate the quantum phase transition in the alternating XY chain with the XZX+YZY type of three-spin interactions. We present the exact solution derived by means of the Jordan-Wigner transformation and study the average magnetization, spin correlations, and von Neumann entropy to establish the phase diagram. The phase diagram consists of the ferromagnetic phases, the paramagnetic phases, and the phase with weak magnetization (WM). By examining the nearest-neighbor transverse spin correlation, we find that in the WM phase the spins within a supercell form a cluster with a small total spin, while the spins of nearest-neighbor supercells are distributed randomly. Especially for the dimerized limit case, the spins within a supercell tend to point to opposite directions of the transverse field. In addition, we investigate the influence of the three-site interaction and find that the WM phase is absent as the strength of the three-site interaction increases. Our findings shed light on the complex behavior of the alternating XY chain and provide valuable insights for future studies.
Submitted 4 January, 2024; v1 submitted 1 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts
Authors:
Shirley Wu,
Kaidi Cao,
Bruno Ribeiro,
James Zou,
Jure Leskovec
Abstract:
Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to complex non-synthetic distributional shifts naturally occurring in the real world. Here we develop GraphMETRO, a Graph Neural Network architecture, that reliably models natural diversity and captures complex distributional shifts. GraphMETRO employs a Mixture-of-Experts (MoE) architecture with a gating model and multiple expert models, where each expert model targets a specific distributional shift to produce a shift-invariant representation, and the gating model identifies shift components. Additionally, we design a novel objective that aligns the representations from different expert models to ensure smooth optimization. GraphMETRO achieves state-of-the-art results on four datasets from GOOD benchmark comprised of complex and natural real-world distribution shifts, improving by 67% and 4.2% on WebKB and Twitch datasets.
Submitted 5 February, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
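The gating-plus-experts combination described in the abstract above can be illustrated with a generic mixture-of-experts sketch. This is not the GraphMETRO implementation: the toy experts, the gate function, and the names `softmax` and `mixture_of_experts` are hypothetical stand-ins, and the experts operate on plain lists rather than graphs.

```python
import math

def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_experts(x, experts, gate):
    """Weighted sum of expert outputs, with weights produced by the gate."""
    gate_weights = softmax(gate(x))
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    mixed = [
        sum(w * out[d] for w, out in zip(gate_weights, outputs))
        for d in range(dim)
    ]
    return mixed, gate_weights

# Hypothetical toy experts and gate, chosen only for illustration.
experts = [
    lambda x: [2.0 * v for v in x],   # expert 0: scales the input
    lambda x: [v + 1.0 for v in x],   # expert 1: shifts the input
]
gate = lambda x: [sum(x), -sum(x)]    # scores one value per expert

mixed, weights = mixture_of_experts([0.0, 0.0], experts, gate)
```

With a zero input both gate scores are equal, so the two experts are mixed with equal weight.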
-
Fino-Vezzoni conjecture on Lie algebras with abelian ideals of codimension two
Authors:
Kexiang Cao,
Fangyang Zheng
Abstract:
In this paper, we confirm the Fino-Vezzoni Conjecture for unimodular Lie algebras which contain abelian ideals of codimension two, a natural generalization to the class of almost abelian Lie algebras. This provides new evidence towards the validity of the conjecture on a very special type of $3$-step solvmanifolds.
Submitted 18 March, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Relaxation dynamics in the alternating XY chain following a quantum quench
Authors:
Kaiyuan Cao,
Yayun Hu,
Peiqing Tong,
Guangwen Yang,
Peng Liu
Abstract:
We investigate the relaxation dynamics of the fermion two-point correlation function $C_{mn}(t)=\langle ψ(t)|c_{m}^{\dagger}c_{n}|ψ(t)\rangle$ in the XY chain with staggered nearest-neighbor hopping interaction after a quench. We find that the deviation $δC_{mn}(t)=C_{mn}(t)-C_{mn}(\infty)$ decays with time following the power law behavior $t^{-μ}$, where the exponent $μ$ depends on whether the quench is to the commensurate phase ($μ=1$) or the incommensurate phase ($μ=\frac{1}{2}$). This decay of $δC_{mn}(t)$ arises from the transient behavior of the double excited quasiparticle occupations and the transitions between different excitation spectra. Furthermore, we find that the steady value $C_{mn}(\infty)$, which is different from the ground state expectation value, only involves the average fermion occupation numbers (i.e. the average excited single particle). We also observe nonanalytic singularities in the steady value $C_{mn}(\infty)$ for the quench to the critical points of the quantum phase transitions (QPTs), suggesting its potential use as a signature of QPTs.
Submitted 4 January, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Topological phases of many-body non-Hermitian systems
Authors:
Kui Cao,
Su-Peng Kou
Abstract:
We show that many-body fermionic non-Hermitian systems require two distinct sets of topological invariants to describe the topology of energy bands and quantum states respectively, with the latter yet to be explored. We identify 10 symmetry classes -- determined by particle-hole, linearized time-reversal, and linearized chiral symmetries. Each class has a topological invariant associated with each dimension, dictating the topology of quantum states. These findings pave the way for a deeper understanding of the topological phases of many-body non-Hermitian systems.
Submitted 23 April, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Constructing the Fulde-Ferrell-Larkin-Ovchinnikov state in antiferromagnetic insulator CrOCl
Authors:
Yifan Ding,
Jiadian He,
Shihao Zhang,
Huakun Zuo,
Pingfan Gu,
Jiliang Cai,
Xiaohui Zeng,
Pu Yan,
Kecheng Cao,
Kenji Watanabe,
Takashi Taniguchi,
Peng Dong,
Yiwen Zhang,
Yueshen Wu,
Xiang Zhou,
Jinghui Wang,
Yulin Chen,
Yu Ye,
Jianpeng Liu,
Jun Li
Abstract:
Time reversal symmetry breaking in superconductors, resulting from external magnetic fields or spontaneous magnetization, often leads to unconventional superconducting properties. In this way, a conventional Fulde-Ferrell-Larkin-Ovchinnikov (FFLO) state, characterized by the Cooper pairs with nonzero total momentum, may be realized by the Zeeman effect caused by external magnetic fields. Here, we report the observation of superconductivity in a few-layer antiferromagnetic insulator CrOCl by utilizing the superconducting proximity effect with NbSe$_2$ flakes. The superconductivity demonstrates a considerably weak gap of about 0.12 meV, and the in-plane upper critical field exhibits behavior characteristic of the FFLO state at low temperature. Our first-principles calculations indicate that the proximitized superconductivity may exist in the CrOCl layer with Cr vacancies or line-defects. Moreover, the FFLO state could be induced by the inherent larger spin splitting in the CrOCl layer. Our findings not only demonstrate the fascinating interaction between superconductivity and magnetism, but also provide a possible path to construct the FFLO state by intrinsic time reversal symmetry breaking and the superconducting proximity effect.
Submitted 31 October, 2023;
originally announced November 2023.
-
How do users design scientific workflows? The Case of Snakemake
Authors:
Sebastian Pohl,
Nourhan Elfaramawy,
Kedi Cao,
Birte Kehr,
Matthias Weidlich
Abstract:
Scientific workflows automate the analysis of large-scale scientific data, fostering the reuse of data processing operators as well as the reproducibility and traceability of analysis results. In exploratory research, however, workflows are continuously adapted, utilizing a wide range of tools and software libraries, to test scientific hypotheses. Script-based workflow engines cater to the required flexibility through direct integration of programming primitives but lack abstractions for interactive exploration of the workflow design by a user during workflow execution. To derive requirements for such interactive workflows, we conduct an empirical study on the use of Snakemake, a popular Python-based workflow engine. Based on workflows collected from 1602 GitHub repositories, we present insights on common structures of Snakemake workflows, as well as the language features typically adopted in their specification.
Submitted 25 September, 2023;
originally announced September 2023.
-
PyPose v0.6: The Imperative Programming Interface for Robotics
Authors:
Zitong Zhan,
Xiangfu Li,
Qihang Li,
Haonan He,
Abhinav Pandey,
Haitao Xiao,
Yangmengfei Xu,
Xiangyu Chen,
Kuan Xu,
Kun Cao,
Zhipeng Zhao,
Zihan Wang,
Huan Xu,
Zihang Fang,
Yutian Chen,
Wentao Wang,
Xu Fang,
Yi Du,
Tianhao Wu,
Xiao Lin,
Yuheng Qiu,
Fan Yang,
Jingnan Shi,
Shaoshu Su,
Yiren Lu
, et al. (11 additional authors not shown)
Abstract:
PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. From its initial launch in early 2022, PyPose has experienced significant enhancements, incorporating a wide variety of new features into its platform. To satisfy the growing demand for understanding and utilizing the library and to reduce the learning curve for new users, we present the fundamental design principle of the imperative programming interface, and showcase the flexible usage of diverse functionalities and modules using an extremely simple Dubins car example. We also demonstrate that PyPose can be easily used to navigate a real quadruped robot with a few lines of code.
Submitted 22 September, 2023;
originally announced September 2023.
-
Dynamical relaxation behavior of extended XY chain with gapless phase following a quantum quench
Authors:
Kaiyuan Cao,
Yayun Hu,
Peiqing Tong,
Guangwen Yang
Abstract:
We investigate the dynamical relaxation behavior of the two-point correlation in extended XY models with a gapless phase after quenches from various initial states. Specifically, we study the XY chain with gapless phase induced by the additional interactions: Dzyaloshinskii-Moriya interaction and XZY-YZX type of three-site interaction. When quenching from the gapped phase, we observe that the additional interactions have no effect on the relaxation behavior. The relaxation behavior is $δC_{mn}(t)\sim t^{-3/2}$ and $\sim t^{-1/2}$ for the quench to the commensurate phase and the incommensurate phase, respectively. However, when quenching from the gapless phase, we demonstrate that the scaling behavior of $δC_{mn}(t)$ is changed to $\sim t^{-1}$ for the quench to the commensurate phase, and the decay of $δC_{mn}(t)$ follows $\sim t^{-1}$ or $\sim t^{-1/2}$ for the quench to the incommensurate phase depending on the parameters of pre-quench Hamiltonian. We also establish the dynamical phase diagrams based on the dynamical relaxation behavior of $δC_{mn}(t)$ in the extended XY models.
Submitted 5 September, 2023;
originally announced September 2023.
-
TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs
Authors:
Phitchaya Mangpo Phothilimthana,
Sami Abu-El-Haija,
Kaidi Cao,
Bahare Fatemi,
Mike Burrows,
Charith Mendis,
Bryan Perozzi
Abstract:
Precise hardware performance models play a crucial role in code optimizations. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the autotuner for XLA, a machine learning compiler, discovered 10-20% speedup on state-of-the-art models serving substantial production traffic at Google. Although there exist a few datasets for program performance prediction, they target small sub-programs such as basic blocks or kernels. This paper introduces TpuGraphs, a performance prediction dataset on full tensor programs, represented as computational graphs, running on Tensor Processing Units (TPUs). Each graph in the dataset represents the main computation of a machine learning workload, e.g., a training epoch or an inference step. Each data sample contains a computational graph, a compilation configuration, and the execution time of the graph when compiled with the configuration. The graphs in the dataset are collected from open-source machine learning programs, featuring popular model architectures, e.g., ResNet, EfficientNet, Mask R-CNN, and Transformer. TpuGraphs provides 25x more graphs than the largest graph property prediction dataset (with comparable graph sizes), and 770x larger graphs on average compared to existing performance prediction datasets on machine learning programs. This graph-level prediction task on large graphs introduces new challenges in learning, ranging from scalability, training efficiency, to model quality.
Submitted 5 December, 2023; v1 submitted 25 August, 2023;
originally announced August 2023.
-
Communication-Free Distributed GNN Training with Vertex Cut
Authors:
Kaidi Cao,
Rui Deng,
Shirley Wu,
Edward W Huang,
Karthik Subbian,
Jure Leskovec
Abstract:
Training Graph Neural Networks (GNNs) on real-world graphs consisting of billions of nodes and edges is quite challenging, primarily due to the substantial memory needed to store the graph and its intermediate node and edge features, and there is a pressing need to speed up the training process. A common approach to achieve speed up is to divide the graph into many smaller subgraphs, which are then distributed across multiple GPUs in one or more machines and processed in parallel. However, existing distributed methods require frequent and substantial cross-GPU communication, leading to significant time overhead and progressively diminishing scalability. Here, we introduce CoFree-GNN, a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training. The framework utilizes a Vertex Cut partitioning, i.e., rather than partitioning the graph by cutting the edges between partitions, the Vertex Cut partitions the edges and duplicates the node information to preserve the graph structure. Furthermore, the framework maintains high model accuracy by incorporating a reweighting mechanism to handle a distorted graph distribution that arises from the duplicated nodes. We also propose a modified DropEdge technique to further speed up the training process. Using an extensive set of experiments on real-world networks, we demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
Submitted 6 August, 2023;
originally announced August 2023.
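The Vertex Cut idea in the abstract above (partition the edges, duplicate the nodes) can be sketched in a few lines. This is a toy illustration, not the CoFree-GNN code: the round-robin edge assignment and the 1/replicas node weight are assumptions made for the example, and the paper's actual partitioning heuristic and reweighting mechanism may differ.

```python
from collections import defaultdict

def vertex_cut_partition(edges, num_parts):
    """Assign every edge to exactly one partition and copy its endpoints there."""
    part_edges = [[] for _ in range(num_parts)]
    part_nodes = [set() for _ in range(num_parts)]
    for idx, (u, v) in enumerate(edges):
        p = idx % num_parts  # hypothetical round-robin assignment
        part_edges[p].append((u, v))
        part_nodes[p].update((u, v))  # endpoints are replicated as needed
    return part_edges, part_nodes

def replication_counts(part_nodes):
    """Count in how many partitions each node appears."""
    counts = defaultdict(int)
    for nodes in part_nodes:
        for node in nodes:
            counts[node] += 1
    return dict(counts)

def node_weights(counts):
    """Illustrative reweighting: a node's copies together count as one node."""
    return {node: 1.0 / c for node, c in counts.items()}

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
part_edges, part_nodes = vertex_cut_partition(edges, num_parts=2)
counts = replication_counts(part_nodes)
weights = node_weights(counts)
```

No edge crosses a partition boundary in this scheme, so each partition can be processed without fetching neighbors from another one; the cost is that nodes appear in several partitions, which is why some form of reweighting is needed.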
-
Coincidence detection probability of $(γ, 2e)$ photoemission measurement
Authors:
Yuehua Su,
Kun Cao,
Chao Zhang
Abstract:
In the study of strongly correlated electrons, one of the challenging core tasks is to develop techniques for direct detection of the many-body correlations of the strongly correlated electrons. The $(γ, 2e)$ photoemission technique has been developed to investigate the two-body correlations of the target correlated electrons. In this article, we will focus on this technique for the correlated electrons near the Fermi energy. The coincidence detection probability of the two emitted electrons in the $(γ, 2e)$ photoemission measurement is shown to be relevant to a two-body Bethe-Salpeter wave function, which describes the dynamical two-body correlations of the target correlated electrons. As the coincidence detection probability involves an electron-electron interaction matrix element, the arbitrary momentum and/or energy transfer due to this electron-electron interaction makes the $(γ, 2e)$ photoemission technique fail to reveal the inner-pair structure of the two-body Bethe-Salpeter wave function. However, the center-of-mass momentum and energy of the two-body Bethe-Salpeter wave function can be distinctly resolved. Thus, the $(γ, 2e)$ photoemission technique can provide the center-of-mass physics of the two-body Bethe-Salpeter wave function of the target correlated electrons. It would be one potential technique to study the center-of-mass physics of the Cooper pairs in superconductors.
Submitted 24 July, 2023; v1 submitted 24 July, 2023;
originally announced July 2023.
-
Tightly-Coupled LiDAR-Visual SLAM Based on Geometric Features for Mobile Agents
Authors:
Ke Cao,
Ruiping Liu,
Ze Wang,
Kunyu Peng,
Jiaming Zhang,
Junwei Zheng,
Zhifeng Teng,
Kailun Yang,
Rainer Stiefelhagen
Abstract:
The mobile robot relies on SLAM (Simultaneous Localization and Mapping) to provide autonomous navigation and task execution in complex and unknown environments. However, it is hard to develop a dedicated algorithm for mobile robots due to dynamic and challenging situations, such as poor lighting conditions and motion blur. To tackle this issue, we propose a tightly-coupled LiDAR-visual SLAM based on geometric features, which includes two sub-systems (LiDAR and monocular visual SLAM) and a fusion framework. The fusion framework associates the depth and semantics of the multi-modal geometric features to complement the visual line landmarks and to add direction optimization in Bundle Adjustment (BA). This further constrains visual odometry. On the other hand, the entire line segment detected by the visual subsystem overcomes the limitation of the LiDAR subsystem, which can only perform the local calculation for geometric features. It adjusts the direction of linear feature points and filters out outliers, leading to a more accurate odometry system. Finally, we employ a module to detect the subsystem's operation, providing the LiDAR subsystem's output as a complementary trajectory to our system when visual subsystem tracking fails. The evaluation results on the public dataset M2DGR, gathered from ground robots across various indoor and outdoor scenarios, show that our system achieves more accurate and robust pose estimation compared to current state-of-the-art multi-modal methods.
Submitted 25 December, 2023; v1 submitted 15 July, 2023;
originally announced July 2023.
-
Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments
Authors:
Ruiping Liu,
Jiaming Zhang,
Kunyu Peng,
Junwei Zheng,
Ke Cao,
Yufan Chen,
Kailun Yang,
Rainer Stiefelhagen
Abstract:
Grounded Situation Recognition (GSR) is capable of recognizing and interpreting visual scenes in a contextually intuitive way, yielding salient activities (verbs) and the involved entities (roles) depicted in images. In this work, we focus on the application of GSR in assisting people with visual impairments (PVI). However, precise localization information of detected objects is often required to navigate their surroundings confidently and make informed decisions. For the first time, we propose an Open Scene Understanding (OpenSU) system that aims to generate pixel-wise dense segmentation masks of involved entities instead of bounding boxes. Specifically, we build our OpenSU system on top of GSR by additionally adopting an efficient Segment Anything Model (SAM). Furthermore, to enhance the feature extraction and interaction between the encoder-decoder structure, we construct our OpenSU system using a solid pure transformer backbone to improve the performance of GSR. In order to accelerate the convergence, we replace all the activation functions within the GSR decoders with GELU, thereby reducing the training duration. In quantitative analysis, our model achieves state-of-the-art performance on the SWiG dataset. Moreover, through field testing on dedicated assistive technology datasets and application demonstrations, the proposed OpenSU system can be used to enhance scene understanding and facilitate the independent mobility of people with visual impairments. Our code will be available at https://github.com/RuipingL/OpenSU.
Submitted 15 July, 2023;
originally announced July 2023.
-
Learning Large Graph Property Prediction via Graph Segment Training
Authors:
Kaidi Cao,
Phitchaya Mangpo Phothilimthana,
Sami Abu-El-Haija,
Dustin Zelle,
Yanqi Zhou,
Charith Mendis,
Jure Leskovec,
Bryan Perozzi
Abstract:
Learning to predict properties of large graphs is challenging because each prediction requires the knowledge of an entire graph, while the amount of memory available during training is bounded. Here we propose Graph Segment Training (GST), a general framework that utilizes a divide-and-conquer approach to allow learning large graph property prediction with a constant memory footprint. GST first divides a large graph into segments and then backpropagates through only a few segments sampled per training iteration. We refine the GST paradigm by introducing a historical embedding table to efficiently obtain embeddings for segments not sampled for backpropagation. To mitigate the staleness of historical embeddings, we design two novel techniques. First, we finetune the prediction head to fix the input distribution shift. Second, we introduce Stale Embedding Dropout to drop some stale embeddings during training to reduce bias. We evaluate our complete method GST-EFD (with all the techniques together) on two large graph property prediction benchmarks: MalNet and TpuGraphs. Our experiments show that GST-EFD is both memory-efficient and fast, while offering a slight boost on test accuracy over a typical full graph training regime.
Submitted 5 November, 2023; v1 submitted 20 May, 2023;
originally announced May 2023.
-
What is the best recipe for character-level encoder-only modelling?
Authors:
Kris Cao
Abstract:
This paper aims to benchmark recent progress in language understanding models that output contextualised representations at the character level. Many such modelling architectures and methods to train those architectures have been proposed, but it is currently unclear what the relative contributions of the architecture vs. the pretraining objective are to final model performance. We explore the design space of such models, comparing architectural innovations and a variety of different pretraining objectives on a suite of evaluation tasks with a fixed training procedure in order to find the currently optimal way to build and train character-level BERT-like models. We find that our best performing character-level model exceeds the performance of a token-based model trained with the same settings on the same data, suggesting that character-level models are ready for more widespread adoption. Unfortunately, the best method to train character-level models still relies on a subword-level tokeniser during pretraining, and final model performance is highly dependent on tokeniser quality. We believe our results demonstrate the readiness of character-level models for multilingual language representation, and encourage NLP practitioners to try them as drop-in replacements for token-based models.
Submitted 9 May, 2023;
originally announced May 2023.
-
Path Planning for Multiple Tethered Robots Using Topological Braids
Authors:
Muqing Cao,
Kun Cao,
Shenghai Yuan,
Kangcheng Liu,
Yan Loi Wong,
Lihua Xie
Abstract:
Path planning for multiple tethered robots is a challenging problem due to the complex interactions among the cables and the possibility of severe entanglements. Previous works on this problem either consider idealistic cable models or provide no guarantee for entanglement-free paths. In this work, we present a new approach to address this problem using the theory of braids. By establishing a topological equivalence between the physical cables and the space-time trajectories of the robots, and identifying particular braid patterns that emerge from the entangled trajectories, we obtain the key finding that all complex entanglements stem from a finite number of interaction patterns between 2 or 3 robots. Hence, non-entanglement can be guaranteed by avoiding these interaction patterns in the trajectories of the robots. Based on this finding, we present a graph search algorithm using the permutation grid to efficiently search for a feasible topology of paths and reject braid patterns that result in an entanglement. We demonstrate that the proposed algorithm can achieve 100% goal-reaching capability without entanglement for up to 10 drones with a slack cable model in a high-fidelity simulation platform. The practicality of the proposed approach is verified using three small tethered UAVs in indoor flight experiments.
Submitted 15 June, 2023; v1 submitted 29 April, 2023;
originally announced May 2023.
-
Pilgrimage to Pureland: Art, Perception and the Wutai Mural VR Reconstruction
Authors:
Rongxuan Mu,
Yuhe Nie,
Kent Cao,
Ruoxin You,
Yinzong Wei,
Xin Tong
Abstract:
Virtual reality (VR) supports audiences to engage with cultural heritage proactively. We designed an easy-to-access and guided Pilgrimage To Pureland VR reconstruction of Dunhuang Mogao Grottoes to offer the general public an accessible and engaging way to explore the Dunhuang murals. We put forward an immersive VR reconstruction paradigm that can efficiently convert complex 2D artwork into a VR environment. We reconstructed the Mt. Wutai pilgrimage mural in Cave 61, Mogao Grottoes, Dunhuang, into an immersive VR environment and created a plot-based and interactive experience that offers users a more accessible solution to visit, understand and appreciate the complex religious, historical, and artistic value of Dunhuang murals. Our system remarkably smoothed users' approaches to those elusive cultural heritages. Appropriate adaptation of plots and 3D VR transfer consistent with the original art style could enhance the accessibility of cultural heritages.
Submitted 15 April, 2023;
originally announced April 2023.
-
Statistical mechanics for non-Hermitian quantum systems
Authors:
Kui Cao,
Su-Peng Kou
Abstract:
We present a systematic study of statistical mechanics for non-Hermitian quantum systems. Our work reveals that the stability of a non-Hermitian system necessitates the existence of a single path-dependent conserved quantity, which, in conjunction with the system's Hamiltonian, dictates the equilibrium state. By elucidating the relationship between the Hamiltonian and the supported conserved quantity, we propose criteria for discerning equilibrium states with finite relaxation times. Although our findings indicate that only non-Hermitian systems with real energy spectrum precisely possess such conserved quantities, we also demonstrate that an effective conserved quantity can manifest in certain systems with complex energy spectra. The effective conserved quantity, alongside the effective transitions within their associated subspace, collectively determines the system's equilibrium state. Our results provide valuable insights into non-Hermitian systems across more realistic contexts and hold potential for applications in a diverse range of physical systems.
Submitted 1 December, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Revisiting Deep Learning for Variable Type Recovery
Authors:
Kevin Cao,
Kevin Leach
Abstract:
Compiled binary executables are often the only available artifact in reverse engineering, malware analysis, and software systems maintenance. Unfortunately, the lack of semantic information like variable types makes comprehending binaries difficult. In efforts to improve the comprehensibility of binaries, researchers have recently used machine learning techniques to predict semantic information contained in the original source code. Chen et al. implemented DIRTY, a Transformer-based Encoder-Decoder architecture capable of augmenting decompiled code with variable names and types by leveraging decompiler output tokens and variable size information. Chen et al. were able to demonstrate a substantial increase in name and type extraction accuracy on Hex-Rays decompiler outputs compared to existing static analysis and AI-based techniques. We extend the original DIRTY results by re-training the DIRTY model on a dataset produced by the open-source Ghidra decompiler. Although Chen et al. concluded that Ghidra was not a suitable decompiler candidate due to its difficulty in parsing and incorporating DWARF symbols during analysis, we demonstrate that straightforward parsing of variable data generated by Ghidra results in similar retyping performance. We hope this work inspires further interest and adoption of the Ghidra decompiler for use in research projects.
Submitted 7 April, 2023;
originally announced April 2023.
-
Non-Hermitian Chiral Skin Effect
Authors:
Xinran Ma,
Kui Cao,
Xiaoran Wang,
Zheng Wei,
Supeng Kou
Abstract:
The interplay between non-Hermitian effects and topological insulators has become a frontier of research in non-Hermitian physics. However, the existence of a non-Hermitian skin effect for topologically protected edge states remains controversial. In this paper, we discover an alternative form of the non-Hermitian skin effect called the non-Hermitian chiral skin effect (NHCSE). NHCSE is a non-Hermitian skin effect under periodic boundary conditions rather than open boundary conditions. Specifically, the chiral modes of the NHCSE localize around “topological defects” characterized by global dissipation rather than being confined to the system boundaries. We show its detailed physical properties by taking the non-Hermitian Haldane model as an example. As a result, the intrinsic mechanism of the hybrid skin-topological effect in Chern insulators is fully understood via NHCSE. Therefore, this progress will be helpful for resolving the controversy over the hybrid skin-topological effect and thus benefit the research on both non-Hermitian physics and topological quantum states.
Submitted 24 July, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Aperiodic dynamical quantum phase transitions in multi-band Bloch Hamiltonian and its origin
Authors:
Kaiyuan Cao,
Hao Guo,
Guangwen Yang
Abstract:
We investigate the dynamical quantum phase transition (DQPT) in the multi-band Bloch Hamiltonian of the one-dimensional periodic Kitaev model, focusing on quenches from a Bloch band. By analyzing the dynamical free energy and Pancharatnam geometric phase, we show that the critical times of DQPTs deviate from periodic spacing due to the multi-band effect, contrasting with results from two-band models. We propose a geometric interpretation to explain this non-uniform spacing. Additionally, we clarify the conditions needed for DQPT occurrence in the multi-band Bloch Hamiltonian, highlighting that a DQPT only arises when the quench from the Bloch states collapses the band gap at the critical point. Moreover, we establish that the dynamical topological order parameter, defined by the winding number of the Pancharatnam geometric phase, is not quantized but still exhibits discontinuous jumps at DQPT critical times due to periodic modulation. Additionally, we extend our analysis to mixed-state DQPT and find its absence at non-zero temperatures.
Submitted 27 July, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View
Authors:
Zhifeng Teng,
Jiaming Zhang,
Kailun Yang,
Kunyu Peng,
Hao Shi,
Simon Reiß,
Ke Cao,
Rainer Stiefelhagen
Abstract:
Seeing only a tiny part of the whole is not knowing the full circumstance. Bird's-eye-view (BEV) perception, a process of obtaining allocentric maps from egocentric views, is restricted when using a narrow Field of View (FoV) alone. In this work, mapping from 360° panoramas to BEV semantics, the 360BEV task, is established for the first time to achieve holistic representations of indoor scenes in a top-down view. Instead of relying on narrow-FoV image sequences, a panoramic image with depth information is sufficient to generate a holistic BEV semantic map. To benchmark 360BEV, we present two indoor datasets, 360BEV-Matterport and 360BEV-Stanford, both of which include egocentric panoramic images and semantic segmentation labels, as well as allocentric semantic maps. Besides delving deep into different mapping paradigms, we propose a dedicated solution for panoramic semantic mapping, namely 360Mapper. Through extensive experiments, our methods achieve 44.32% and 45.78% in mIoU on both datasets respectively, surpassing previous counterparts with gains of +7.60% and +9.70% in mIoU. Code and datasets are available at the project page: https://jamycheung.github.io/360BEV.html.
Submitted 4 September, 2023; v1 submitted 21 March, 2023;
originally announced March 2023.
-
Simulating image coaddition with the Nancy Grace Roman Space Telescope: II. Analysis of the simulated images and implications for weak lensing
Authors:
Masaya Yamamoto,
Katherine Laliotis,
Emily Macbeth,
Tianqing Zhang,
Christopher M. Hirata,
M. A. Troxel,
Kaili Cao,
Ami Choi,
Jahmour Givans,
Katrin Heitmann,
Mustapha Ishak,
Mike Jarvis,
Eve Kovacs,
Heyang Long,
Rachel Mandelbaum,
Andy Park,
Anna Porredon,
Christopher W. Walter,
W. Michael Wood-Vasey
Abstract:
One challenge for applying current weak lensing analysis tools to the Nancy Grace Roman Space Telescope is that individual images will be undersampled. Our companion paper presented an initial application of Imcom - an algorithm that builds an optimal mapping from input to output pixels to reconstruct a fully sampled combined image - on the Roman image simulations. In this paper, we measure the output noise power spectra, identify the sources of the major features in the power spectra, and show that simple analytic models that ignore sampling effects underestimate the power spectra of the coadded noise images. We compute the moments of both idealized injected stars and fully simulated stars in the coadded images, and their 1- and 2-point statistics. We show that the idealized injected stars have root-mean-square ellipticity errors of $(1 - 6) \times 10^{-4}$ per component depending on the band; the correlation functions are $\geq 2$ orders of magnitude below requirements, indicating that the image combination step itself is using a small fraction of the overall Roman 2nd moment error budget, although the 4th moments are larger and warrant further investigation. The stars in the simulated sky images, which include blending and chromaticity effects, have correlation functions near the requirement level (and below the requirement level in a wide-band image constructed by stacking all 4 filters). We evaluate the noise-induced biases in the ellipticities of injected stars, and explain the resulting trends with an analytical model. We conclude by enumerating the next steps in developing an image coaddition pipeline for Roman.
Submitted 12 January, 2024; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Simulating image coaddition with the Nancy Grace Roman Space Telescope: I. Simulation methodology and general results
Authors:
Christopher M. Hirata,
Masaya Yamamoto,
Katherine Laliotis,
Emily Macbeth,
M. A. Troxel,
Tianqing Zhang,
Kaili Cao,
Ami Choi,
Jahmour Givans,
Katrin Heitmann,
Mustapha Ishak,
Mike Jarvis,
Eve Kovacs,
Heyang Long,
Rachel Mandelbaum,
Andy Park,
Anna Porredon,
Christopher W. Walter,
W. Michael Wood-Vasey
Abstract:
The upcoming Nancy Grace Roman Space Telescope will carry out a wide-area survey in the near infrared. A key science objective is the measurement of cosmic structure via weak gravitational lensing. Roman data will be undersampled, which introduces new challenges in the measurement of source galaxy shapes; a potential solution is to use linear algebra-based coaddition techniques such as Imcom that combine multiple undersampled images to produce a single oversampled output mosaic with a desired "target" point spread function (PSF). We present here an initial application of Imcom to 0.64 square degrees of simulated Roman data, based on the Roman branch of the Legacy Survey of Space and Time (LSST) Dark Energy Science Collaboration (DESC) Data Challenge 2 (DC2) simulation. We show that Imcom runs successfully on simulated data that includes features such as plate scale distortions, chip gaps, detector defects, and cosmic ray masks. We simultaneously propagate grids of injected sources and simulated noise fields as well as the full simulation. We quantify the residual deviations of the PSF from the target (the "leakage"), as well as noise properties of the output images; we discuss how the overall tiling pattern as well as Moiré patterns appear in the final leakage and noise maps. We include appendices on interpolation algorithms and the interaction of undersampling with image processing operations that may be of broader applicability. The companion paper ("Paper II") explores the implications for weak lensing analyses.
Submitted 12 January, 2024; v1 submitted 15 March, 2023;
originally announced March 2023.
-
AutoTransfer: AutoML with Knowledge Transfer -- An Application to Graph Neural Networks
Authors:
Kaidi Cao,
Jiaxuan You,
Jiaju Liu,
Jure Leskovec
Abstract:
AutoML has demonstrated remarkable success in finding an effective neural architecture for a given machine learning task defined by a specific dataset and an evaluation metric. However, most present AutoML techniques consider each task independently from scratch, which requires exploring many architectures, leading to high computational cost. Here we propose AutoTransfer, an AutoML solution that improves search efficiency by transferring the prior architectural design knowledge to the novel task of interest. Our key innovation includes a task-model bank that captures the model performance over a diverse set of GNN architectures and tasks, and a computationally efficient task embedding that can accurately measure the similarity among different tasks. Based on the task-model bank and the task embeddings, we estimate the design priors of desirable models of the novel task, by aggregating a similarity-weighted sum of the top-K design distributions on tasks that are similar to the task of interest. The computed design priors can be used with any AutoML search algorithm. We evaluate AutoTransfer on six datasets in the graph machine learning domain. Experiments demonstrate that (i) our proposed task embedding can be computed efficiently, and that tasks with similar embeddings have similar best-performing architectures; (ii) AutoTransfer significantly improves search efficiency with the transferred design priors, reducing the number of explored architectures by an order of magnitude. Finally, we release GNN-Bank-101, a large-scale dataset of detailed GNN training information of 120,000 task-model combinations to facilitate and inspire future research.
Submitted 14 March, 2023;
originally announced March 2023.
-
Relational Multi-Task Learning: Modeling Relations between Data and Tasks
Authors:
Kaidi Cao,
Jiaxuan You,
Jure Leskovec
Abstract:
A key assumption in multi-task learning is that at inference time the multi-task model only has access to a given data point but not to the data point's labels from other tasks. This presents an opportunity to extend multi-task learning to utilize a data point's labels from other auxiliary tasks, and in this way improve performance on the new task. Here we introduce a novel relational multi-task learning setting where we leverage data point labels from auxiliary tasks to make more accurate predictions on the new task. We develop MetaLink, where our key innovation is to build a knowledge graph that connects data points and tasks and thus allows us to leverage labels from auxiliary tasks. The knowledge graph consists of two types of nodes: (1) data nodes, where node features are data embeddings computed by the neural network, and (2) task nodes, with the last layer's weights for each task as node features. The edges in this knowledge graph capture data-task relationships, and the edge label captures the label of a data point on a particular task. Under MetaLink, we reformulate the new task as a link label prediction problem between a data node and a task node. The MetaLink framework provides flexibility to model knowledge transfer from auxiliary task labels to the task of interest. We evaluate MetaLink on 6 benchmark datasets in both biochemical and vision domains. Experiments demonstrate that MetaLink can successfully utilize the relations among different tasks, outperforming the state-of-the-art methods under the proposed relational multi-task learning setting, with up to 27% improvement in ROC AUC.
Submitted 14 March, 2023;
originally announced March 2023.