-
Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting
Authors:
Jinning Li,
Jiachen Li,
Sangjae Bae,
David Isele
Abstract:
Deep learning-based trajectory prediction models for autonomous driving often struggle with generalization to out-of-distribution (OOD) scenarios, sometimes performing worse than simple rule-based models. To address this limitation, we propose a novel framework, Adaptive Prediction Ensemble (APE), which integrates deep learning and rule-based prediction experts. A learned routing function, trained concurrently with the deep learning model, dynamically selects the most reliable prediction based on the input scenario. Our experiments on large-scale datasets, including Waymo Open Motion Dataset (WOMD) and Argoverse, demonstrate improvement in zero-shot generalization across datasets. We show that our method outperforms individual prediction models and other variants, particularly in long-horizon prediction and scenarios with a high proportion of OOD data. This work highlights the potential of hybrid approaches for robust and generalizable motion prediction in autonomous driving.
Submitted 12 July, 2024;
originally announced July 2024.
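The routing idea in the abstract above (pick, per scenario, between a deep learning expert and a rule-based expert) can be sketched as follows. This is a minimal illustration, not the APE implementation: the constant-velocity rule, the `reliability` callable, and the fixed `threshold` are all hypothetical stand-ins for the learned routing function the abstract describes.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def constant_velocity_expert(history: Trajectory, horizon: int) -> Trajectory:
    """Rule-based expert (assumed here): extrapolate the last observed displacement."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def route(
    history: Trajectory,
    horizon: int,
    learned_expert: Callable[[Trajectory, int], Trajectory],
    reliability: Callable[[Trajectory], float],
    threshold: float = 0.5,
) -> Trajectory:
    """Hypothetical router: use the learned expert when its estimated
    reliability for this input reaches the threshold, otherwise fall back
    to the rule-based expert."""
    if reliability(history) >= threshold:
        return learned_expert(history, horizon)
    return constant_velocity_expert(history, horizon)
```

In this sketch the router makes a hard selection between two experts; the abstract only states that the routing function is learned and trained concurrently with the deep learning model, so the scoring and thresholding shown here are assumptions.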
-
Benchmarking Large Neighborhood Search for Multi-Agent Path Finding
Authors:
Jiaqi Tan,
Yudong Luo,
Jiaoyang Li,
Hang Ma
Abstract:
Multi-Agent Path Finding (MAPF) aims to arrange collision-free goal-reaching paths for a group of agents. Anytime MAPF solvers based on large neighborhood search (LNS) have gained prominence recently due to their flexibility and scalability. The neighborhood selection strategy is crucial to the success of MAPF-LNS, and a flurry of methods have been proposed. However, several pitfalls hinder a comprehensive evaluation of these new methods, mainly: 1) lower-than-actual or incorrect baseline performance; 2) lack of a unified evaluation setting and criterion; 3) lack of a codebase or executable model for supervised learning methods. To overcome these challenges, we conduct a fair comparison across prominent methods on the same benchmark and hyperparameter search settings. Additionally, we propose a simple neighborhood selection strategy which marks a clear advancement in terms of runtime efficiency in large maps with a large number of agents. Our benchmarking evaluation poses new challenges for existing learning-based methods and presents opportunities for future research when machine learning is integrated with MAPF-LNS. Code and data are available at https://github.com/ChristinaTan0704/mapf-lns-benchmark.
Submitted 12 July, 2024;
originally announced July 2024.
-
Detailed Mapping of the Galactic Disk Structure in the Solar Neighborhood through LAMOST K Dwarfs
Authors:
Xi-Can Tang,
Hao Tian,
Jing Li,
Bing-qiu Chen,
Yi-Rong Chen,
Chao Liu,
Dan Qiu
Abstract:
The Galactic disk is one of the main components of the Milky Way and contributes most of its luminosity. Its structure is essential for understanding the formation and evolution of the Milky Way. Using 174,443 K-type dwarf stars observed by both LAMOST and Gaia DR3, we study the disk density profile in the local volume within 1,200 pc. In the azimuthal dimension, we find a strong asymmetric signal in the thin disk. The surface density and the scale height of the southern disk change significantly with the azimuthal angle at the same galactocentric distance $R$. Meanwhile, in the vertical dimension, the scale height of the northern disk shows a quite different trend from that of the southern one. The scale height of the southern disk shows a decreasing trend with $\phi\sim-2.5^\circ$, and changes to an increasing one with $\phi\sim5.0^\circ$. Meanwhile, the scale height of the northern disk has a consistently smaller increase. Finally, we divide the entire sample into three subsamples based on metallicity, and all three subsamples show significant non-axisymmetric and north-south asymmetric signals in the Galactic disk. Furthermore, we find that the scale height of the metal-poor ([Fe/H] $<$ -0.4 dex) subsample in the northern disk is greater than that of the metal-rich ([Fe/H] $>$ -0.1 dex) subsample. However, in the southern disk, the scale height exhibits varying relationships across different metallicity slices.
Submitted 12 July, 2024;
originally announced July 2024.
-
Multi-objective Aerial Collaborative Secure Communication Optimization via Generative Diffusion Model-enabled Deep Reinforcement Learning
Authors:
Chuang Zhang,
Geng Sun,
Jiahui Li,
Qingqing Wu,
Jiacheng Wang,
Dusit Niyato,
Yuanwei Liu
Abstract:
Owing to their flexibility and low cost, unmanned aerial vehicles (UAVs) are increasingly crucial for enhancing the coverage and functionality of wireless networks. However, incorporating UAVs into next-generation wireless communication systems poses significant challenges, particularly in sustaining high-rate and long-range secure communications against eavesdropping attacks. In this work, we consider a UAV swarm-enabled secure surveillance network system, where a UAV swarm forms a virtual antenna array to transmit sensitive surveillance data to a remote base station (RBS) via collaborative beamforming (CB) so as to resist mobile eavesdroppers. Specifically, we formulate an aerial secure communication and energy efficiency multi-objective optimization problem (ASCEE-MOP) to maximize the secrecy rate of the system and to minimize the flight energy consumption of the UAV swarm. To address the non-convex, NP-hard and dynamic ASCEE-MOP, we propose a generative diffusion model-enabled twin delayed deep deterministic policy gradient (GDMTD3) method. GDMTD3 applies diffusion models to determine the optimal excitation current weights and position decisions of the UAVs. The diffusion models can better capture the complex dynamics and the trade-off of the ASCEE-MOP, thereby yielding promising solutions. Simulation results highlight the superior performance of the proposed approach compared with traditional deployment strategies and some other deep reinforcement learning (DRL) benchmarks. Moreover, performance analysis under various parameter settings of GDMTD3 and different numbers of UAVs verifies the robustness of the proposed approach.
Submitted 11 July, 2024;
originally announced July 2024.
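For readers unfamiliar with the first objective named in the abstract above, the secrecy rate is commonly defined as the non-negative gap between the achievable rate at the legitimate receiver and that at the eavesdropper. The form below is the standard textbook definition, given as an assumption; the symbols $\gamma_{\mathrm{B}}$ and $\gamma_{\mathrm{E}}$ (signal-to-noise ratios at the base station and at the eavesdropper) are introduced here for illustration, and the paper's exact formulation may differ.

```latex
% Standard secrecy-rate definition (assumed, not taken from the paper).
% \gamma_B: SNR at the remote base station; \gamma_E: SNR at the eavesdropper.
R_{\mathrm{sec}} = \Bigl[\log_2\bigl(1 + \gamma_{\mathrm{B}}\bigr)
                 - \log_2\bigl(1 + \gamma_{\mathrm{E}}\bigr)\Bigr]^{+},
\qquad [x]^{+} = \max(x, 0).
```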
-
Scalable microwave-to-optical transducers at single photon level with spins
Authors:
Tian Xie,
Rikuto Fukumori,
Jiahui Li,
Andrei Faraon
Abstract:
Microwave-to-optical transduction of single photons will play an essential role in interconnecting future superconducting quantum devices, with applications in distributed quantum computing and secure communications. Various transducers that couple microwave and optical modes via an optical drive have been developed, utilizing nonlinear phenomena such as the Pockels effect and a combination of electromechanical, piezoelectric, and optomechanical couplings. However, the limited strength of these nonlinearities, set by bulk material properties, requires the use of high quality factor resonators, often in conjunction with sophisticated nano-fabrication of suspended structures. Thus, an efficient and scalable transduction technology is still an outstanding goal. Rare-earth ion (REI) doped crystals provide high-quality atomic resonances that result in effective second-order nonlinearities stronger by many orders of magnitude compared to conventional materials. Here, we use ytterbium-171 ions doped in a YVO$_4$ crystal at 340 ppm with an effective resonant $\chi^{(2)}$ nonlinearity of $\sim 10^7$ pm/V to implement an on-chip microwave-to-optical transducer. Without an engineered optical cavity, we achieve percent-level efficiencies with an added noise as low as 1.24(9) photons. To showcase scalability, we demonstrate the interference of photons originating from two simultaneously operated transducers, enabled by the inherent absolute frequencies of the atomic transitions. These results establish REI-based transducers as a highly competitive transduction platform, provide existing REI-based quantum technologies a native link to various leading quantum microwave platforms, and pave the way toward remote transducer-assisted entanglement of superconducting quantum machines.
Submitted 11 July, 2024;
originally announced July 2024.
-
Autoregressive Speech Synthesis without Vector Quantization
Authors:
Lingwei Meng,
Long Zhou,
Shujie Liu,
Sanyuan Chen,
Bing Han,
Shujie Hu,
Yanqing Liu,
Jinyu Li,
Sheng Zhao,
Xixin Wu,
Helen Meng,
Furu Wei
Abstract:
We present MELLE, a novel continuous-valued token-based language modeling approach for text-to-speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from the text condition, bypassing the need for vector quantization, which was originally designed for audio compression and sacrifices fidelity compared to mel-spectrograms. Specifically, (i) instead of cross-entropy loss, we apply regression loss with a proposed spectrogram flux loss function to model the probability distribution of the continuous-valued tokens; (ii) we incorporate variational inference into MELLE to facilitate sampling mechanisms, thereby enhancing the output diversity and model robustness. Experiments demonstrate that, compared to the two-stage codec language model VALL-E and its variants, the single-stage MELLE mitigates robustness issues by avoiding the inherent flaws of sampling discrete codes, achieves superior performance across multiple metrics, and, most importantly, offers a more streamlined paradigm. See https://aka.ms/melle for demos of our work.
Submitted 11 July, 2024;
originally announced July 2024.
-
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization
Authors:
Jinlong Li,
Zequn Jie,
Elisa Ricci,
Lin Ma,
Nicu Sebe
Abstract:
Efficient finetuning of vision-language models (VLMs) like CLIP for specific downstream tasks is gaining significant attention. Previous works primarily focus on prompt learning to adapt CLIP to a variety of downstream tasks, but suffer from task overfitting when finetuned on a small data set. In this paper, we introduce an orthogonal finetuning method for efficiently updating pretrained weights, which enhances robustness and generalization, while a cross-regularization strategy is further exploited to maintain the stability of the VLMs' zero-shot generalization; we dub the method OrthCR. Specifically, trainable orthogonal matrices are injected seamlessly into the transformer architecture and enforced with an orthogonality constraint using Cayley parameterization, benefiting from the norm-preserving property and thus leading to stable and faster convergence. To alleviate deviation from the orthogonality constraint during training, a cross-regularization strategy is further employed with the initial pretrained weights in a bypass manner. In addition, to enrich the sample diversity for downstream tasks, we first explore Cutout data augmentation to boost the efficient finetuning and to understand how our approach improves the specific downstream performance and maintains generalizability from the perspective of orthogonality learning. Beyond existing prompt learning techniques, we conduct extensive experiments to demonstrate that our method explicitly steers the pretrained weight space to represent the task-specific knowledge and presents competitive generalizability under base-to-base/base-to-new, cross-dataset transfer and domain generalization evaluations.
Submitted 11 July, 2024;
originally announced July 2024.
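The Cayley parameterization mentioned in the abstract above maps an unconstrained skew-symmetric matrix to an orthogonal one, which is why the orthogonality constraint can be enforced by construction. A short derivation of the textbook form follows; the symbols $S$ and $Q$ are introduced here for illustration, and some references use the equivalent convention $Q = (I + S)(I - S)^{-1}$, so the exact form used in the paper is an assumption.

```latex
% Cayley transform: S skew-symmetric (S^\top = -S), so I + S is invertible.
Q = (I - S)(I + S)^{-1}
\;\Longrightarrow\;
Q^\top = (I + S^\top)^{-1}(I - S^\top) = (I - S)^{-1}(I + S),
% (I + S) and (I - S) commute, hence
Q^\top Q = (I - S)^{-1}(I + S)(I - S)(I + S)^{-1} = I,
\qquad \lVert Q x \rVert_2 = \lVert x \rVert_2 \ \text{for all } x .
```

The last identity is the norm-preserving property the abstract refers to.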
-
Chromosomal Structural Abnormality Diagnosis by Homologous Similarity
Authors:
Juren Li,
Fanzhe Fu,
Ran Wei,
Yifei Sun,
Zeyu Lai,
Ning Song,
Xin Chen,
Yang Yang
Abstract:
Pathogenic chromosome abnormalities are very common among the general population. While numerical chromosome abnormalities can be quickly and precisely detected, structural chromosome abnormalities are far more complex and typically require considerable efforts by human experts for identification. This paper focuses on investigating the modeling of chromosome features and the identification of chromosomes with structural abnormalities. Most existing data-driven methods concentrate on a single chromosome and consider each chromosome independently, overlooking the crucial aspect of homologous chromosomes. In normal cases, homologous chromosomes share identical structures, with the exception that one of them is abnormal. Therefore, we propose an adaptive method to align homologous chromosomes and diagnose structural abnormalities through homologous similarity. Inspired by the process of human expert diagnosis, we incorporate information from multiple pairs of homologous chromosomes simultaneously, aiming to reduce noise disturbance and improve prediction performance. Extensive experiments on real-world datasets validate the effectiveness of our model compared to baselines.
Submitted 11 July, 2024;
originally announced July 2024.
-
fairBERTs: Erasing Sensitive Information Through Semantic and Fairness-aware Perturbations
Authors:
Jinfeng Li,
Yuefeng Chen,
Xiangyu Liu,
Longtao Huang,
Rong Zhang,
Hui Xue
Abstract:
Pre-trained language models (PLMs) have revolutionized both the natural language processing research and applications. However, stereotypical biases (e.g., gender and racial discrimination) encoded in PLMs have raised negative ethical implications for PLMs, which critically limits their broader applications. To address the aforementioned unfairness issues, we present fairBERTs, a general framework for learning fair fine-tuned BERT series models by erasing the protected sensitive information via semantic and fairness-aware perturbations generated by a generative adversarial network. Through extensive qualitative and quantitative experiments on two real-world tasks, we demonstrate the great superiority of fairBERTs in mitigating unfairness while maintaining the model utility. We also verify the feasibility of transferring adversarial components in fairBERTs to other conventionally trained BERT-like models for yielding fairness improvements. Our findings may shed light on further research on building fairer fine-tuned PLMs.
Submitted 11 July, 2024;
originally announced July 2024.
-
Magnon squeezing via reservoir-engineered optomagnomechanics
Authors:
Zhi-Yuan Fan,
Huai-Bing Zhu,
Hao-Tian Li,
Jie Li
Abstract:
We show how to prepare magnonic squeezed states in an optomagnomechanical system, in which magnetostriction induced mechanical displacement couples to an optical cavity via radiation pressure. We discuss two scenarios depending on whether the magnomechanical coupling is linear or dispersive. We show that in both cases the strong mechanical squeezing obtained via two-tone driving of the optical cavity can be efficiently transferred to the magnon mode. In the linear coupling case, stationary magnon squeezing is achieved; while in the dispersive coupling case, a transient magnonic squeezed state is prepared in a two-step protocol. The proposed magnonic squeezed states find promising applications in quantum information processing and quantum sensing using magnons.
Submitted 11 July, 2024;
originally announced July 2024.
-
SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning
Authors:
Runmin Zhang,
Jun Ma,
Si-Yuan Cao,
Lun Luo,
Beinan Yu,
Shu-Jie Chen,
Junwei Li,
Hui-Liang Shen
Abstract:
We propose a novel unsupervised cross-modal homography estimation framework based on intra-modal Self-supervised learning, Correlation, and consistent feature map Projection, namely SCPNet. The concept of intra-modal self-supervised learning is first presented to facilitate the unsupervised cross-modal homography estimation. The correlation-based homography estimation network and the consistent feature map projection are combined to form the learnable architecture of SCPNet, boosting the unsupervised learning framework. SCPNet is the first to achieve effective unsupervised homography estimation on the satellite-map image pair cross-modal dataset, GoogleMap, under [-32,+32] offset on a 128x128 image, leading the supervised approach MHN by 14.0% in mean average corner error (MACE). We further conduct extensive experiments on several cross-modal/spectral and manually made inconsistent datasets, on which SCPNet achieves state-of-the-art (SOTA) performance among unsupervised approaches, with MACEs 49.0%, 25.2%, 36.4%, and 10.7% lower than those of the supervised approach MHN. Source code is available at https://github.com/RM-Zhang/SCPNet.
Submitted 10 July, 2024;
originally announced July 2024.
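The MACE metric quoted in the abstract above can be illustrated with a small sketch: warp the four image corners with the estimated and the ground-truth homography, average the four Euclidean distances, then average over image pairs. The function names and the corner convention (pixel coordinates 0 to `size - 1`) are assumptions made for this example, not taken from the SCPNet code.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Point = Tuple[float, float]


def warp_point(h: Matrix3, p: Point) -> Point:
    """Apply a 3x3 homography to a 2D point (with perspective division)."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def average_corner_error(h_est: Matrix3, h_gt: Matrix3, size: int = 128) -> float:
    """Mean distance between the four corners warped by the two homographies."""
    m = size - 1.0  # assumed corner convention: pixel centers 0 .. size-1
    corners: List[Point] = [(0.0, 0.0), (m, 0.0), (m, m), (0.0, m)]
    dists = [math.dist(warp_point(h_est, c), warp_point(h_gt, c)) for c in corners]
    return sum(dists) / len(dists)


def mean_average_corner_error(pairs, size: int = 128) -> float:
    """Average the per-pair corner error over (estimated, ground-truth) pairs."""
    errors = [average_corner_error(est, gt, size) for est, gt in pairs]
    return sum(errors) / len(errors)
```

For a pure translation error of (3, 4) pixels, every corner is displaced by 5 pixels, so the per-pair error is 5.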
-
RoCap: A Robotic Data Collection Pipeline for the Pose Estimation of Appearance-Changing Objects
Authors:
Jiahao Nick Li,
Toby Chong,
Zhongyi Zhou,
Hironori Yoshida,
Koji Yatani,
Xiang 'Anthony' Chen,
Takeo Igarashi
Abstract:
Object pose estimation plays a vital role in mixed-reality interactions when users manipulate tangible objects as controllers. Traditional vision-based object pose estimation methods leverage 3D reconstruction to synthesize training data. However, these methods are designed for static objects with diffuse colors and do not work well for objects that change their appearance during manipulation, such as deformable objects like plush toys, transparent objects like chemical flasks, reflective objects like metal pitchers, and articulated objects like scissors. To address this limitation, we propose RoCap, a robotic pipeline that emulates human manipulation of target objects while generating data labeled with ground truth pose information. The user first gives the target object to a robotic arm, and the system captures many pictures of the object in various 6D configurations. The system then trains a model using the captured images and their ground truth pose information, which is automatically calculated from the joint angles of the robotic arm. We showcase pose estimation for appearance-changing objects by training simple deep-learning models using the collected data and comparing the results with a model trained with synthetic data based on 3D reconstruction via quantitative and qualitative evaluation. The findings underscore the promising capabilities of RoCap.
Submitted 10 July, 2024;
originally announced July 2024.
-
Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models
Authors:
Yuji Zhang,
Sha Li,
Jiateng Liu,
Pengfei Yu,
Yi R. Fung,
Jing Li,
Manling Li,
Heng Ji
Abstract:
Hallucination is often regarded as a major impediment to using large language models (LLMs), especially for knowledge-intensive tasks. Even when the training corpus consists solely of true statements, language models still generate hallucinations in the form of amalgamations of multiple facts. We term this phenomenon "knowledge overshadowing": when we query knowledge from a language model with multiple conditions, some conditions overshadow others, leading to hallucinated outputs. This phenomenon partially stems from training data imbalance, which we verify on both pretrained models and fine-tuned models, over a wide range of LM model families and sizes. From a theoretical point of view, knowledge overshadowing can be interpreted as over-generalization of the dominant conditions (patterns). We show that the hallucination rate grows with both the imbalance ratio (between the popular and unpopular condition) and the length of the dominant condition description, consistent with our derived generalization bound. Finally, we propose to utilize overshadowing conditions as a signal to catch hallucination before it is produced, along with a training-free self-contrastive decoding method to alleviate hallucination during inference. Our proposed approach showcases up to 82% F1 for hallucination anticipation and 11.2% to 39.4% hallucination control, with different models and datasets.
Submitted 10 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946 GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6 GeV with a width of 50 MeV is observed for the first time with a statistical significance of $15\sigma$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75 GeV in both processes.
Submitted 10 July, 2024;
originally announced July 2024.
-
GLBench: A Comprehensive Benchmark for Graph with Large Language Models
Authors:
Yuhan Li,
Peisong Wang,
Xiao Zhu,
Aochuan Chen,
Haiyun Jiang,
Deng Cai,
Victor Wai Kin Chan,
Jia Li
Abstract:
The emergence of large language models (LLMs) has revolutionized the way we interact with graphs, leading to a new paradigm called GraphLLM. Despite the rapid development of GraphLLM methods in recent years, the progress and understanding of this field remain unclear due to the lack of a benchmark with consistent experimental protocols. To bridge this gap, we introduce GLBench, the first comprehensive benchmark for evaluating GraphLLM methods in both supervised and zero-shot scenarios. GLBench provides a fair and thorough evaluation of different categories of GraphLLM methods, along with traditional baselines such as graph neural networks. Through extensive experiments on a collection of real-world datasets with consistent data processing and splitting strategies, we have uncovered several key findings. Firstly, GraphLLM methods outperform traditional baselines in supervised settings, with LLM-as-enhancers showing the most robust performance. However, using LLMs as predictors is less effective and often leads to uncontrollable output issues. We also notice that no clear scaling laws exist for current GraphLLM methods. In addition, both structures and semantics are crucial for effective zero-shot transfer, and our proposed simple baseline can even outperform several models tailored for zero-shot scenarios. The data and code of the benchmark can be found at https://github.com/NineAbyss/GLBench.
Submitted 11 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Question-Score Identity Detection (Q-SID): A Statistical Algorithm to Detect Collusion Groups with Error Quantification from Exam Question Scores
Authors:
Guanao Yan,
Jingyi Jessica Li,
Mark D. Biggin
Abstract:
Collusion between students in online exams is a major problem that undermines the integrity of the exam results. Although there exist methods that use exam data to identify pairs of students who have likely copied each other's answers, these methods are restricted to specific formats of multiple-choice exams. Here we present a statistical algorithm, Q-SID, that efficiently detects groups of students who likely have colluded, i.e., collusion groups, with error quantification. Q-SID uses graded numeric question scores only, so it works for many formats of multiple-choice and non-multiple-choice exams. Q-SID reports two false-positive rates (FPRs) for each collusion group: (1) empirical FPR, whose null data are from 36 strictly proctored exam datasets independent of the user-input exam data and (2) synthetic FPR, whose null data are simulated from a copula-based probabilistic model, which is first fitted to the user-input exam data and then modified to have no collusion. On 34 unproctored exam datasets, including two benchmark datasets with true positives and negatives verified by textural analysis, we demonstrate that Q-SID is a collusion detection algorithm with powerful and robust performance across exam formats, numbers of questions and students, and exam complexity.
Submitted 12 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
High-Resolution Cloud Detection Network
Authors:
Jingsheng Li,
Tianxiang Xue,
Jiayi Zhao,
Jingmin Ge,
Yufang Min,
Wei Su,
Kun Zhan
Abstract:
The complexity of clouds, particularly in terms of texture detail at high resolutions, has not been well explored by most existing cloud detection networks. This paper introduces the High-Resolution Cloud Detection Network (HR-cloud-Net), which utilizes a hierarchical high-resolution integration approach. HR-cloud-Net integrates a high-resolution representation module, layer-wise cascaded feature fusion module, and multi-resolution pyramid pooling module to effectively capture complex cloud features. This architecture preserves detailed cloud texture information while facilitating feature exchange across different resolutions, thereby enhancing overall performance in cloud detection. Additionally, a novel approach is introduced wherein a student view, trained on noisy augmented images, is supervised by a teacher view processing normal images. This setup enables the student to learn from cleaner supervisions provided by the teacher, leading to improved performance. Extensive evaluations on three optical satellite image cloud detection datasets validate the superior performance of HR-cloud-Net compared to existing methods.The source code is available at \url{https://github.com/kunzhan/HR-cloud-Net}.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
Event-Aided Time-to-Collision Estimation for Autonomous Driving
Authors:
Jinghang Li,
Bangyan Liao,
Xiuyuan LU,
Peidong Liu,
Shaojie Shen,
Yi Zhou
Abstract:
Predicting a potential collision with leading vehicles is an essential functionality of any autonomous/assisted driving system. One bottleneck of existing vision-based solutions is that their updating rate is limited to the frame rate of standard cameras used. In this paper, we present a novel method that estimates the time to collision using a neuromorphic event-based camera, a biologically inspired visual sensor that can sense at exactly the same rate as scene dynamics. The core of the proposed algorithm consists of a two-step approach for efficient and accurate geometric model fitting on event data in a coarse-to-fine manner. The first step is a robust linear solver based on a novel geometric measurement that overcomes the partial observability of event-based normal flow. The second step further refines the resulting model via a spatio-temporal registration process formulated as a nonlinear optimization problem. Experiments on both synthetic and real data demonstrate the effectiveness of the proposed method, outperforming other alternative methods in terms of efficiency and accuracy.
Submitted 9 July, 2024;
originally announced July 2024.
-
Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken
Authors:
Peifu Liu,
Tingfa Xu,
Jie Wang,
Huan Chen,
Huiyan Bai,
Jianan Li
Abstract:
Hyperspectral image classification, a task that assigns pre-defined classes to each pixel in a hyperspectral image of remote sensing scenes, often faces challenges due to the neglect of correlations between spectrally similar pixels. This oversight can lead to inaccurate edge definitions and difficulties in managing minor spectral variations in contiguous areas. To address these issues, we introduce the novel Dual-stage Spectral Supertoken Classifier (DSTC), inspired by superpixel concepts. DSTC employs spectrum-derivative-based pixel clustering to group pixels with similar spectral characteristics into spectral supertokens. By projecting the classification of these tokens onto the image space, we achieve pixel-level results that maintain regional classification consistency and precise boundary. Moreover, recognizing the diversity within tokens, we propose a class-proportion-based soft label. This label adaptively assigns weights to different categories based on their prevalence, effectively managing data distribution imbalances and enhancing classification performance. Comprehensive experiments on WHU-OHS, IP, KSC, and UP datasets corroborate the robust classification capabilities of DSTC and the effectiveness of its individual components. Code will be publicly available at https://github.com/laprf/DSTC.
Submitted 9 July, 2024;
originally announced July 2024.
-
Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis
Authors:
Jian-Qing Zheng,
Yuanhan Mo,
Yang Sun,
Jiahua Li,
Fuping Wu,
Ziyang Wang,
Tonia Vincent,
Bartłomiej W. Papież
Abstract:
In medical imaging, the diffusion models have shown great potential in synthetic image generation tasks. However, these models often struggle with the interpretable connections between the generated and existing images and could create illusions. To address these challenges, our research proposes a novel diffusion-based generative model based on deformation diffusion and recovery. This model, named Deformation-Recovery Diffusion Model (DRDM), diverges from traditional score/intensity and latent feature-based approaches, emphasizing morphological changes through deformation fields rather than direct image synthesis. This is achieved by introducing a topological-preserving deformation field generation method, which randomly samples and integrates a set of multi-scale Deformation Vector Fields (DVF). DRDM is trained to learn to recover unreasonable deformation components, thereby restoring each randomly deformed image to a realistic distribution. These innovations facilitate the generation of diverse and anatomically plausible deformations, enhancing data augmentation and synthesis for further analysis in downstream tasks, such as few-shot learning and image registration. Experimental results in cardiac MRI and pulmonary CT show DRDM is capable of creating diverse, large (over 10% image size deformation scale), and high-quality (negative ratio of folding rate is lower than 1%) deformation fields. The further experimental results in downstream tasks, 2D image segmentation and 3D image registration, indicate significant improvements resulting from DRDM, showcasing the potential of our model to advance image manipulation and synthesis in medical imaging and beyond.
Our implementation will be available at https://github.com/jianqingzheng/def_diff_rec.
Submitted 9 July, 2024;
originally announced July 2024.
-
Trichotomy for the orbits of a hypercyclic operator on a Banach space
Authors:
Jian Li
Abstract:
We obtain a trichotomy for the orbits of a hypercyclic operator $T$ on a separable Banach space $X$: (1) every vector is mean asymptotic to zero; (2) generic vectors are absolutely mean irregular; (3) every hypercyclic vector is mean divergent to infinity. Examples of weighted backward shifts on $\ell^p$ show that all three cases can happen.
Submitted 9 July, 2024;
originally announced July 2024.
-
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
Authors:
Yue Zhang,
Ziqiao Ma,
Jialu Li,
Yanyuan Qiao,
Zun Wang,
Joyce Chai,
Qi Wu,
Mohit Bansal,
Parisa Kordjamshidi
Abstract:
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years and many approaches have emerged to advance its development. The remarkable achievements of foundation models have shaped the challenges and proposed methods for VLN research. In this survey, we provide a top-down review that adopts a principled framework for embodied planning and reasoning, and emphasizes the current methods and future opportunities leveraging foundation models to address VLN challenges. We hope our in-depth discussions could provide valuable resources and insights: on one hand, to milestone the progress and explore opportunities and potential roles for foundation models in this field, and on the other, to organize different challenges and solutions in VLN for foundation model researchers.
Submitted 9 July, 2024;
originally announced July 2024.
-
SKYCASTLE: Taming LEO Mobility to Facilitate Seamless and Low-latency Satellite Internet Services
Authors:
Jihao Li,
Hewu Li,
Zeqi Lai,
Qian Wu,
Weisen Liu,
Xiaomo Wang,
Yuanjie Li,
Jun Liu,
Qi Zhang
Abstract:
Emerging integrated space and terrestrial networks (ISTN) built upon low earth orbit (LEO) satellite constellations aim at providing planet-wide Internet services, not only for residential users, but also for mobile users (e.g., in airplane and cruise scenarios). Efficiently managing global mobility and keeping connections active for mobile users is critical for ISTN operators. However, our quantitative analysis identifies that existing mobility management (MM) schemes suffer from frequent connection interruptions and long latency in ISTN scenarios. The fundamental challenge stems from a unique characteristic of ISTNs: not only users are mobile, but also core network infrastructures (i.e., LEO satellites) are frequently changing their locations in the network. To facilitate seamless and low-latency satellite Internet services, this paper presents SKYCASTLE, a novel network-based global mobility management mechanism. SKYCASTLE incorporates two key techniques to address frequent connection interruptions in ISTNs. First, to reduce the interruption time, SKYCASTLE adopts distributed satellite anchors to track the location changes of mobile nodes, manage handovers and avoid routing convergence. Second, SKYCASTLE leverages an anchor manager to schedule MM functionalities at satellites to reduce deployment costs while guaranteeing low latency. Extensive evaluations combining real constellation information and mobile user trajectories show that: SKYCASTLE can improve up to 55.8% uninterrupted time and reduce 47.8% latency as compared to other existing MM solutions.
Submitted 9 July, 2024;
originally announced July 2024.
-
Implicit Regression in Subspace for High-Sensitivity CEST Imaging
Authors:
Chu Chen,
Yang Liu,
Se Weon Park,
Jizhou Li,
Kannie W. Y. Chan,
Raymond H. F. Chan
Abstract:
Chemical Exchange Saturation Transfer (CEST) MRI demonstrates its capability in significantly enhancing the detection of proteins and metabolites with low concentrations through exchangeable protons. The clinical application of CEST, however, is constrained by its low contrast and low signal-to-noise ratio (SNR) in the acquired data. Denoising, as one of the post-processing stages for CEST data, can effectively improve the accuracy of CEST quantification. In this work, by modeling spatial variant z-spectrums into low-dimensional subspace, we introduce Implicit Regression in Subspace (IRIS), which is an unsupervised denoising algorithm utilizing the excellent property of implicit neural representation for continuous mapping. Experiments conducted on both synthetic and in-vivo data demonstrate that our proposed method surpasses other CEST denoising methods regarding both qualitative and quantitative performance.
Submitted 9 July, 2024;
originally announced July 2024.
-
Exploring the Causality of End-to-End Autonomous Driving
Authors:
Jiankun Li,
Hao Li,
Jiangjiang Liu,
Zhikang Zou,
Xiaoqing Ye,
Fan Wang,
Jizhou Huang,
Hua Wu,
Haifeng Wang
Abstract:
Deep learning-based models are widely deployed in autonomous driving areas, especially the increasingly noticed end-to-end solutions. However, the black-box property of these models raises concerns about their trustworthiness and safety for autonomous driving, and how to debug the causality has become a pressing concern. Despite some existing research on the explainability of autonomous driving, there is currently no systematic solution to help researchers debug and identify the key factors that lead to the final predicted action of end-to-end autonomous driving. In this work, we propose a comprehensive approach to explore and analyze the causality of end-to-end autonomous driving. First, we validate the essential information that the final planning depends on by using controlled variables and counterfactual interventions for qualitative analysis. Then, we quantitatively assess the factors influencing model decisions by visualizing and statistically analyzing the response of key model inputs. Finally, based on the comprehensive study of the multi-factorial end-to-end autonomous driving system, we have developed a strong baseline and a tool for exploring causality in the closed-loop simulator CARLA. It leverages the essential input sources to obtain a well-designed model, resulting in highly competitive capabilities. As far as we know, our work is the first to unveil the mystery of end-to-end autonomous driving and turn the black box into a white one. Thorough closed-loop experiments demonstrate that our method can be applied to end-to-end autonomous driving solutions for causality debugging. Code will be available at https://github.com/bdvisl/DriveInsight.
Submitted 9 July, 2024;
originally announced July 2024.
-
Improving Speech Enhancement by Integrating Inter-Channel and Band Features with Dual-branch Conformer
Authors:
Jizhen Li,
Xinmeng Xu,
Weiping Tu,
Yuhong Yang,
Rong Zhu
Abstract:
Recent speech enhancement methods based on convolutional neural networks (CNNs) and transformers have been shown to capture time-frequency (T-F) information on spectrograms effectively. However, the correlation among the channels of speech features remains unexplored. Theoretically, each channel map of speech features obtained by different convolution kernels contains information at different scales, and these maps are strongly correlated. To fill this gap, we propose a novel dual-branch architecture named channel-aware dual-branch conformer (CADB-Conformer), which effectively explores the long-range time and frequency correlations among different channels, respectively, to extract channel-relation-aware time-frequency information. Ablation studies conducted on the DNS-Challenge 2020 dataset demonstrate the importance of leveraging channel features while showing the significance of channel-relation-aware T-F information for speech enhancement. Extensive experiments also show that the proposed model outperforms recent methods at an attractive computational cost.
Submitted 11 July, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Not all explicit cues help communicate: Pedestrians' perceptions, fixations, and decisions toward automated vehicles with varied appearance
Authors:
Wei Lyu,
Yaqin Cao,
Yi Ding,
Jingyu Li,
Kai Tian,
Hui Zhang
Abstract:
Given pedestrians' vulnerability in road traffic, it remains unclear how novel AV appearances will impact pedestrians' crossing behaviour. To address this gap, this study pioneers an investigation into the influence of AVs' exterior design, correlated with their kinematics, on pedestrians' road-crossing perception and decision-making. A video-based eye-tracking experimental study was conducted with 61 participants who responded to video stimuli depicting a manipulated vehicle approaching a predefined road-crossing location on an unsignalized, two-way road. The vehicle's kinematic pattern was manipulated into yielding and non-yielding, and its external appearances were varied across five types: with a human driver (as a conventional vehicle), with no driver (as an AV), with text-based identity indications, with roof radar sensors, and with dynamic eHMIs adjusted to vehicle kinematics. Participants' perceived clarity, crossing initiation distance (CID), crossing decision time (CDT), and gaze behaviour during interactions were recorded and reported. The results indicated that AVs' kinematic profiles play a dominant role in pedestrians' road-crossing decisions, supported by their subjective evaluations, CID, CDT, and gaze patterns during interactions. Moreover, the use of clear eHMI, such as dynamic pedestrian icons, reduced pedestrians' visual load, enhanced their perceptual clarity, expedited road-crossing decisions, and thereby improved overall crossing efficiency. However, it was found that both textual identity indications and roof radar sensors have no significant effect on pedestrians' decisions but negatively impact pedestrians' visual attention, as evidenced by heightened fixation counts and prolonged fixation durations, particularly under yielding conditions. Excessive visual and cognitive resource occupation suggests that not all explicit cues facilitate human-vehicle communication.
Submitted 8 July, 2024;
originally announced July 2024.
-
Enhancing spatial auditory attention decoding with neuroscience-inspired prototype training
Authors:
Zelin Qiu,
Jianjun Gu,
Dingding Yao,
Junfeng Li
Abstract:
The spatial auditory attention decoding (Sp-AAD) technology aims to determine the direction of auditory attention in multi-talker scenarios via neural recordings. Despite the success of recent Sp-AAD algorithms, their performance is hindered by trial-specific features in EEG data. This study aims to improve decoding performance against these features. Studies in neuroscience indicate that spatial auditory attention can be reflected in the topological distribution of EEG energy across different frequency bands. This insight motivates us to propose Prototype Training, a neuroscience-inspired method for Sp-AAD. This method constructs prototypes with enhanced energy distribution representations and reduced trial-specific characteristics, enabling the model to better capture auditory attention features. To implement prototype training, an EEGWaveNet that employs the wavelet transform of EEG is further proposed. Detailed experiments indicate that the EEGWaveNet with prototype training outperforms other competitive models on various datasets, and the effectiveness of the proposed method is also validated. As a training method independent of model architecture, prototype training offers new insights into the field of Sp-AAD.
Submitted 8 July, 2024;
originally announced July 2024.
-
Soli-enabled Noncontact Heart Rate Detection for Sleep and Meditation Tracking
Authors:
Luzhou Xu,
Jaime Lien,
Haiguang Li,
Nicholas Gillian,
Rajeev Nongpiur,
Jihan Li,
Qian Zhang,
Jian Cui,
David Jorgensen,
Adam Bernstein,
Lauren Bedal,
Eiji Hayashi,
Jin Yamanaka,
Alex Lee,
Jian Wang,
D Shin,
Ivan Poupyrev,
Trausti Thormundsson,
Anupam Pathak,
Shwetak Patel
Abstract:
Heart rate (HR) is a crucial physiological signal that can be used to monitor health and fitness. Traditional methods for measuring HR require wearable devices, which can be inconvenient or uncomfortable, especially during sleep and meditation. Noncontact HR detection methods employing microwave radar can be a promising alternative. However, the existing approaches in the literature usually use high-gain antennas and require the sensor to face the user's chest or back, making them difficult to integrate into a portable device and unsuitable for sleep and meditation tracking applications. This study presents a novel approach for noncontact HR detection using a miniaturized Soli radar chip embedded in a portable device (Google Nest Hub). The chip has a $6.5 \mbox{ mm} \times 5 \mbox{ mm} \times 0.9 \mbox{ mm}$ dimension and can be easily integrated into various devices. The proposed approach utilizes advanced signal processing and machine learning techniques to extract HRs from radar signals. The approach is validated on a sleep dataset (62 users, 498 hours) and a meditation dataset (114 users, 1131 minutes). The approach achieves a mean absolute error (MAE) of $1.69$ bpm and a mean absolute percentage error (MAPE) of $2.67\%$ on the sleep dataset. On the meditation dataset, the approach achieves an MAE of $1.05$ bpm and a MAPE of $1.56\%$. The recall rates for the two datasets are $88.53\%$ and $98.16\%$, respectively. This study represents the first application of the noncontact HR detection technology to sleep and meditation tracking, offering a promising alternative to wearable devices for HR monitoring during sleep and meditation.
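For readers unfamiliar with the two error metrics quoted above, the following is a minimal sketch of how MAE and MAPE are conventionally computed. It uses the textbook definitions, which the study may refine (for example by windowing or by excluding low-confidence readings); the function names and the sample readings are hypothetical and are not taken from the paper.

```python
def mae(reference, estimate):
    """Mean absolute error, in the same unit as the inputs (bpm here)."""
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def mape(reference, estimate):
    """Mean absolute percentage error relative to the reference, in percent."""
    return 100.0 * sum(abs(r - e) / r for r, e in zip(reference, estimate)) / len(reference)


# Hypothetical heart-rate readings in bpm (illustrative only, not study data).
reference_bpm = [60.0, 50.0, 80.0]
radar_bpm = [63.0, 49.0, 80.0]

print(f"MAE  = {mae(reference_bpm, radar_bpm):.2f} bpm")
print(f"MAPE = {mape(reference_bpm, radar_bpm):.2f} %")
```

MAE reports the typical error in beats per minute, whereas MAPE normalizes each error by the reference heart rate, so the same absolute error weighs more at a low resting rate.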
Submitted 8 July, 2024;
originally announced July 2024.
-
Data-driven Nucleus Subclassification on Colon H&E using Style-transferred Digital Pathology
Authors:
Lucas W. Remedios,
Shunxing Bao,
Samuel W. Remedios,
Ho Hin Lee,
Leon Y. Cai,
Thomas Li,
Ruining Deng,
Nancy R. Newlin,
Adam M. Saunders,
Can Cui,
Jia Li,
Qi Liu,
Ken S. Lau,
Joseph T. Roland,
Mary K Washington,
Lori A. Coburn,
Keith T. Wilson,
Yuankai Huo,
Bennett A. Landman
Abstract:
Understanding the way cells communicate, co-locate, and interrelate is essential to furthering our understanding of how the body functions. H&E is widely available, however, cell subtyping often requires expert knowledge and the use of specialized stains. To reduce the annotation burden, AI has been proposed for the classification of cells on H&E. For example, the recent Colon Nucleus Identification and Classification (CoNIC) Challenge focused on labeling 6 cell types on H&E of the colon. However, the CoNIC Challenge was unable to classify epithelial subtypes (progenitor, enteroendocrine, goblet), lymphocyte subtypes (B, helper T, cytotoxic T), and connective subtypes (fibroblasts). We use inter-modality learning to label previously un-labelable cell types on H&E. We take advantage of multiplexed immunofluorescence (MxIF) histology to label 14 cell subclasses. We performed style transfer on the same MxIF tissues to synthesize realistic virtual H&E which we paired with the MxIF-derived cell subclassification labels. We evaluated the efficacy of using a supervised learning scheme where the input was realistic-quality virtual H&E and the labels were MxIF-derived cell subclasses. We assessed our model on private virtual H&E and public real H&E. On virtual H&E, we were able to classify helper T cells and epithelial progenitors with positive predictive values of $0.34 \pm 0.15$ (prevalence $0.03 \pm 0.01$) and $0.47 \pm 0.1$ (prevalence $0.07 \pm 0.02$) respectively, when using ground truth centroid information. On real H&E we could classify helper T cells and epithelial progenitors with upper bound positive predictive values of $0.43 \pm 0.03$ (parent class prevalence 0.21) and $0.94 \pm 0.02$ (parent class prevalence 0.49) when using ground truth centroid information. This is the first work to provide cell type classification for helper T and epithelial progenitor nuclei on H&E.
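The abstract reports each positive predictive value (PPV) next to a prevalence because a classifier that assigned a label at random would be expected to reach a PPV roughly equal to that prevalence. A minimal sketch of the two quantities follows, using the standard definitions; the function names and the counts are hypothetical, chosen only so that the numbers land near the helper T figures quoted above, and are not the study's data.

```python
def positive_predictive_value(true_positives, false_positives):
    """Fraction of predicted positives that are actually positive."""
    return true_positives / (true_positives + false_positives)


def prevalence(class_count, total_count):
    """Fraction of all nuclei that truly belong to the class."""
    return class_count / total_count


# Hypothetical counts (illustrative only): 1000 nuclei, 30 of them helper T cells;
# the model labels 50 nuclei as helper T, and 17 of those labels are correct.
ppv = positive_predictive_value(true_positives=17, false_positives=33)
prev = prevalence(class_count=30, total_count=1000)

print(f"PPV = {ppv:.2f}, prevalence = {prev:.2f}, ratio = {ppv / prev:.1f}x")
```

Read this way, a PPV of 0.34 at a prevalence of 0.03 is far above the chance level even though it is low in absolute terms.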
Submitted 15 May, 2024;
originally announced July 2024.
-
OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos
Authors:
Ziyang Song,
Jinxi Li,
Bo Yang
Abstract:
It has long been challenging to recover the underlying dynamic 3D scene representations from a monocular RGB video. Existing works formulate this problem into finding a single most plausible solution by adding various constraints such as depth priors and strong geometry constraints, ignoring the fact that there could be infinitely many 3D scene representations corresponding to a single dynamic video. In this paper, we aim to learn all plausible 3D scene configurations that match the input video, instead of just inferring a specific one. To achieve this ambitious goal, we introduce a new framework, called OSN. The key to our approach is a simple yet innovative object scale network together with a joint optimization module to learn an accurate scale range for every dynamic 3D object. This allows us to sample as many faithful 3D scene configurations as possible. Extensive experiments show that our method surpasses all baselines and achieves superior accuracy in dynamic novel view synthesis on multiple synthetic and real-world datasets. Most notably, our method demonstrates a clear advantage in learning fine-grained 3D scene geometry. Our code and data are available at https://github.com/vLAR-group/OSN
Submitted 8 July, 2024;
originally announced July 2024.
-
LLMBox: A Comprehensive Library for Large Language Models
Authors:
Tianyi Tang,
Yiwen Hu,
Bingqian Li,
Wenyang Luo,
Zijing Qin,
Haoxiang Sun,
Jiapeng Wang,
Shiyi Xu,
Xiaoxue Cheng,
Geyang Guo,
Han Peng,
Bowen Zheng,
Yiru Tang,
Yingqian Min,
Yushuo Chen,
Jie Chen,
Yuanqian Zhao,
Luran Ding,
Yuhao Wang,
Zican Dong,
Chunxuan Xia,
Junyi Li,
Kun Zhou,
Wayne Xin Zhao,
Ji-Rong Wen
Abstract:
To facilitate the research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library is featured with three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets, and models, and (3) more practical consideration, especially on user-friendliness and efficiency. With our library, users can easily reproduce existing methods, train new models, and conduct comprehensive performance comparisons. To rigorously test LLMBox, we conduct extensive experiments in a diverse coverage of evaluation settings, and experimental results demonstrate the effectiveness and efficiency of our library in supporting various implementations related to LLMs. The detailed introduction and usage guidance can be found at https://github.com/RUCAIBox/LLMBox.
Submitted 7 July, 2024;
originally announced July 2024.
-
A Survey of Models for Cognitive Diagnosis: New Developments and Future Directions
Authors:
Fei Wang,
Weibo Gao,
Qi Liu,
Jiatong Li,
Guanhao Zhao,
Zheng Zhang,
Zhenya Huang,
Mengxiao Zhu,
Shijin Wang,
Wei Tong,
Enhong Chen
Abstract:
Cognitive diagnosis has been developed for decades as an effective measurement tool to evaluate human cognitive status such as ability level and knowledge mastery. It has been applied to a wide range of fields including education, sport, psychological diagnosis, etc. By providing better awareness of cognitive status, it can serve as the basis for personalized services such as well-designed medical treatment, teaching strategy and vocational training. This paper aims to provide a survey of current models for cognitive diagnosis, with more attention on new developments using machine learning-based methods. By comparing the model structures, parameter estimation algorithms, model evaluation methods and applications, we provide a relatively comprehensive review of the recent trends in cognitive diagnosis models. Further, we discuss future directions that are worthy of exploration. In addition, we release two Python libraries: EduData for easy access to some relevant public datasets we have collected, and EduCDM that implements popular CDMs to facilitate both applications and research purposes.
Submitted 7 July, 2024;
originally announced July 2024.
-
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Authors:
Haorui He,
Zengqiang Shang,
Chaoren Wang,
Xuyuan Li,
Yicheng Gu,
Hua Hua,
Liwei Liu,
Chen Yang,
Jiaqi Li,
Peiyang Shi,
Yuancheng Wang,
Kai Chen,
Pengyuan Zhang,
Zhizheng Wu
Abstract:
Recently, speech generation models have made significant progress by using large-scale training data. However, the research community struggles to produce highly spontaneous and human-like speech due to the lack of large-scale, diverse, and spontaneous speech data. This paper presents Emilia, the first multilingual speech generation dataset from in-the-wild speech data, and Emilia-Pipe, the first open-source preprocessing pipeline designed to transform in-the-wild speech data into high-quality training data with annotations for speech generation. Emilia starts with over 101k hours of speech in six languages and features diverse speech with varied speaking styles. To facilitate the scale-up of Emilia, the open-source pipeline Emilia-Pipe can process one hour of raw speech data into a form ready for model training in a few minutes, which enables the research community to collaborate on large-scale speech generation research. Experimental results validate the effectiveness of Emilia. Demos are available at: https://emilia-dataset.github.io/Emilia-Demo-Page/.
Submitted 7 July, 2024;
originally announced July 2024.
-
Unfolding a Hopf bifurcation in a linear reaction-diffusion equation with strongly localized impurity: existence of breathing pulses
Authors:
Ji Li,
Qing Yu,
Qian Zhang
Abstract:
This paper presents a general framework to derive the weakly nonlinear stability near a Hopf bifurcation in a special class of multi-scale reaction-diffusion equations. The main focus is on how the linearity and nonlinearity of the fast variables in the system influence the emergence of breathing pulses when the slow variables are linear and the bifurcation parameter is near the Hopf bifurcation point. By applying the matching principle to the fast and slow changing quantities and using the relevant theory of singular perturbation, we obtain explicit expressions for the stationary pulses. Then, normal form theory and center manifold theory are applied to give Hopf normal form expressions. Finally, one of these expressions is verified by numerical simulation.
Submitted 7 July, 2024;
originally announced July 2024.
-
The multiple birth properties of multi-type Markov branching processes
Authors:
Junping Li,
Wanting Zhang
Abstract:
The main purpose of this paper is to consider the multiple birth properties of multi-type Markov branching processes. We first construct a new multi-dimensional Markov process based on the multi-type Markov branching process, which can reveal the multiple birth characteristics. Then the joint probability distribution of multiple births of the multi-type Markov branching process up to any time $t$ is obtained by using the new process. Furthermore, the probability distribution of multiple births up to the extinction of the process is also given.
Submitted 6 July, 2024;
originally announced July 2024.
-
Fixed-point properties of the Mordukhovich differential operator
Authors:
Jinlu Li
Abstract:
In this paper, we investigate some fixed-point properties of the Mordukhovich differential operator of set-valued mappings (or single-valued mappings) on Banach spaces. In particular, we study the fixed-point properties of the Mordukhovich differential operator for the metric projection operator onto some closed and convex subsets in Banach spaces, such as closed balls in Banach spaces, positive cones in the real spaces l2 and l1, and sets of polynomials in C[0, 1].
Submitted 6 July, 2024;
originally announced July 2024.
-
A Survey of Datasets for Information Diffusion Tasks
Authors:
Fuxia Guo,
Xiaowen Wang,
Yanwei Xie,
Zehao Wang,
Jingqiu Li,
Lanjun Wang
Abstract:
Information diffusion across various new media platforms gradually influences the perceptions, decisions, and social behaviors of individual users. In communication studies, the famous Five W's of Communication model (5W Model) describes the process of information diffusion clearly. At present, although plenty of studies and corresponding datasets about information diffusion have emerged, a systematic categorization of tasks and an integration of datasets are still lacking. To address this gap, we survey a systematic taxonomy of information diffusion tasks and datasets based on the "5W Model" framework. We first categorize the information diffusion tasks into ten subtasks, with definitions and dataset analyses, drawn from three main tasks: information diffusion prediction, social bot detection, and misinformation detection. We also collect a repository of publicly available datasets for information diffusion tasks, with links, and compare them based on six attributes related to users and content: user information, social network, bot label, propagation content, propagation network, and veracity label. In addition, we discuss the limitations and future directions of current datasets and research topics to advance the future development of information diffusion. The dataset repository can be accessed at our website https://github.com/fuxiaG/Information-Diffusion-Datasets.
Submitted 6 July, 2024;
originally announced July 2024.
-
Regularity of powers of edge ideals of edge-weighted integrally closed cycles
Authors:
Guangjun Zhu,
Yijun Cui,
Jiaxin Li,
Yi Yang
Abstract:
This paper gives exact formulas for the regularity of powers of edge ideals of an edge-weighted integrally closed cycle.
Submitted 6 July, 2024;
originally announced July 2024.
-
Activity-Induced Stiffness, Entanglement Network and Dynamic Slowdown in Unentangled Semidilute Polymer Solutions
Authors:
Jing Li,
Bokai Zhang,
Zhi-Yong Wang
Abstract:
Active polymers possess numerous unique properties that are quite different from those observed in systems of small active molecules, due to the intricate interplay between their activity and topological constraints. This study focuses on the conformational changes induced by activity, which affect effective stiffness and crucially influence entanglement and dynamics. When the two terminals of a linear chain undergo active modification through coupling to a high-temperature thermal bath, there is a substantial increase in chain size, indicating a notable enhancement in effective stiffness. Unlike in passive semiflexible chains, where stiffness predominantly affects local bond angles, activity-induced stiffness manifests at the scale of tens of monomers. While activity raises the ambient temperature, it decreases diffusion by over an order of magnitude. The observed slowdown of dynamics can be attributed to increased entanglement due to chain elongation.
Submitted 6 July, 2024;
originally announced July 2024.
-
LoRA-GA: Low-Rank Adaptation with Gradient Approximation
Authors:
Shaowen Wang,
Linxi Yu,
Jian Li
Abstract:
Fine-tuning large-scale pretrained models is prohibitively expensive in terms of computational and memory costs. LoRA, as one of the most popular Parameter-Efficient Fine-Tuning (PEFT) methods, offers a cost-effective alternative by fine-tuning an auxiliary low-rank model that has significantly fewer parameters. Although LoRA reduces the computational and memory requirements significantly at each iteration, extensive empirical evidence indicates that it converges at a considerably slower rate compared to full fine-tuning, ultimately leading to increased overall compute and often worse test performance. In our paper, we perform an in-depth investigation of the initialization method of LoRA and show that careful initialization (without any change of the architecture and the training algorithm) can significantly enhance both efficiency and performance. In particular, we introduce a novel initialization method, LoRA-GA (Low Rank Adaptation with Gradient Approximation), which aligns the gradients of low-rank matrix product with those of full fine-tuning at the first step. Our extensive experiments demonstrate that LoRA-GA achieves a convergence rate comparable to that of full fine-tuning (hence being significantly faster than vanilla LoRA as well as various recent improvements) while simultaneously attaining comparable or even better performance. For example, on the subset of the GLUE dataset with T5-Base, LoRA-GA outperforms LoRA by 5.69% on average. On larger models such as Llama 2-7B, LoRA-GA shows performance improvements of 0.34, 11.52%, and 5.05% on MT-bench, GSM8K, and Human-eval, respectively. Additionally, we observe up to 2-4 times convergence speed improvement compared to vanilla LoRA, validating its effectiveness in accelerating convergence and enhancing model performance. Code is available at https://github.com/Outsider565/LoRA-GA.
Submitted 6 July, 2024;
originally announced July 2024.
-
Congestion-Approximators from the Bottom Up
Authors:
Jason Li,
Satish Rao,
Di Wang
Abstract:
We develop a novel algorithm to construct a congestion-approximator with polylogarithmic quality on a capacitated, undirected graph in nearly-linear time. Our approach is the first *bottom-up* hierarchical construction, in contrast to previous *top-down* approaches including that of Racke, Shah, and Taubig (SODA 2014), the only other construction achieving polylogarithmic quality that is implementable in nearly-linear time (Peng, SODA 2016). Similar to Racke, Shah, and Taubig, our construction at each hierarchical level requires calls to an approximate max-flow/min-cut subroutine. However, the main advantage to our bottom-up approach is that these max-flow calls can be implemented directly *without recursion*. More precisely, the previously computed levels of the hierarchy can be converted into a *pseudo-congestion-approximator*, which then translates to a max-flow algorithm that is sufficient for the particular max-flow calls used in the construction of the next hierarchical level. As a result, we obtain the first non-recursive algorithms for congestion-approximator and approximate max-flow that run in nearly-linear time, a conceptual improvement to the aforementioned algorithms that recursively alternate between the two problems.
Submitted 6 July, 2024;
originally announced July 2024.
-
MetaFruit Meets Foundation Models: Leveraging a Comprehensive Multi-Fruit Dataset for Advancing Agricultural Foundation Models
Authors:
Jiajia Li,
Kyle Lammers,
Xunyuan Yin,
Xiang Yin,
Long He,
Renfu Lu,
Zhaojian Li
Abstract:
Fruit harvesting poses a significant labor and financial burden for the industry, highlighting the critical need for advancements in robotic harvesting solutions. Machine vision-based fruit detection has been recognized as a crucial component for robust identification of fruits to guide robotic manipulation. Despite considerable progress in leveraging deep learning and machine learning techniques for fruit detection, a common shortfall is the inability to swiftly extend the developed models across different orchards and/or various fruit species. Additionally, the limited availability of pertinent data further compounds these challenges. In this work, we introduce MetaFruit, the largest publicly available multi-class fruit dataset, comprising 4,248 images and 248,015 manually labeled instances across diverse U.S. orchards. Furthermore, this study proposes an innovative open-set fruit detection system leveraging advanced Vision Foundation Models (VFMs) that can adeptly identify a wide array of fruit types under varying orchard conditions. This system not only demonstrates remarkable adaptability in learning from minimal data through few-shot learning but also shows the ability to interpret human instructions for subtle detection tasks. The developed foundation model is comprehensively evaluated using several metrics and outperforms the existing state-of-the-art algorithms on both our MetaFruit dataset and other open-sourced fruit datasets, thereby setting a new benchmark in the field of agricultural technology and robotic harvesting. The MetaFruit dataset and detection framework are open-sourced to foster future research in vision-based fruit harvesting, marking a significant stride toward addressing the urgent needs of the agricultural sector.
Submitted 13 May, 2024;
originally announced July 2024.
-
Spectroscopy of deeply bound orbitals in neutron-rich Ca isotopes
Authors:
P. J. Li,
J. Lee,
P. Doornenbal,
S. Chen,
S. Wang,
A. Obertelli,
Y. Chazono,
J. D. Holt,
B. S. Hu,
K. Ogata,
Y. Utsuno,
K. Yoshida,
N. L. Achouri,
H. Baba,
F. Browne,
D. Calvet,
F. Château,
N. Chiga,
A. Corsi,
M. L. Cortés,
A. Delbart,
J-M. Gheller,
A. Giganon,
A. Gillibert,
C. Hilaire
, et al. (63 additional authors not shown)
Abstract:
The calcium isotopes are an ideal system to investigate the evolution of shell structure and magic numbers. Although the properties of surface nucleons in calcium have been well studied, probing the structure of deeply bound nucleons remains a challenge. Here, we report on the first measurement of unbound states in $^{53}$Ca and $^{55}$Ca, populated from $^{54,56}$Ca($p,pn$) reactions at a beam energy of around 216 MeV/nucleon at the RIKEN Radioactive Isotopes Beam Factory. The resonance properties, partial cross sections, and momentum distributions of these unbound states were analyzed. Orbital angular momentum $l$ assignments were extracted from momentum distributions based on calculations using the distorted wave impulse approximation (DWIA) reaction model. The resonances at excitation energies of 5516(41) keV in $^{53}$Ca and 6000(250) keV in $^{55}$Ca indicate a significant $l = 3$ component, providing the first experimental evidence for the $ν0f_{7/2}$ single-particle strength of unbound hole states in the neutron-rich Ca isotopes. The observed excitation energies and cross-sections point towards extremely localized and well separated strength distributions, with some fragmentation for the $ν0f_{7/2}$ orbital in $^{55}$Ca. These results are in good agreement with predictions from shell-model calculations using the effective GXPF1Bs interaction and ab initio calculations, and diverge markedly from the experimental distributions in the nickel isotones at $Z=28$.
Submitted 5 July, 2024;
originally announced July 2024.
-
PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation
Authors:
Yinghua Yao,
Yuangang Pan,
Jing Li,
Ivor Tsang,
Xin Yao
Abstract:
Recent advancements in the realm of deep generative models focus on generating samples that satisfy multiple desired properties. However, prevalent approaches optimize these property functions independently, thus omitting the trade-offs among them. In addition, the property optimization is often improperly integrated into the generative models, resulting in an unnecessary compromise on generation quality (i.e., the quality of generated samples). To address these issues, we formulate a constrained optimization problem. It seeks to optimize generation quality while ensuring that generated samples reside at the Pareto front of multiple property objectives. Such a formulation enables the generation of samples that cannot be further improved simultaneously on the conflicting property functions and preserves good quality of generated samples. Building upon this formulation, we introduce the PaRetO-gUided Diffusion model (PROUD), wherein the gradients in the denoising process are dynamically adjusted to enhance generation quality while the generated samples adhere to Pareto optimality. Experimental evaluations on image generation and protein generation tasks demonstrate that our PROUD consistently maintains superior generation quality while approaching Pareto optimality across multiple property functions compared to various baselines.
Submitted 5 July, 2024;
originally announced July 2024.
-
Towards Context-aware Support for Color Vision Deficiency: An Approach Integrating LLM and AR
Authors:
Shogo Morita,
Yan Zhang,
Takuto Yamauchi,
Sinan Chen,
Jialong Li,
Kenji Tei
Abstract:
People with color vision deficiency often face challenges in distinguishing colors such as red and green, which can complicate daily tasks and require the use of assistive tools or environmental adjustments. Current support tools mainly focus on presentation-based aids, like the color vision modes found in iPhone accessibility settings. However, offering context-aware support, like indicating the doneness of meat, remains a challenge since task-specific solutions are not cost-effective for all possible scenarios. To address this, our paper proposes an application that provides contextual and autonomous assistance. This application is mainly composed of: (i) an augmented reality interface that efficiently captures context; and (ii) a multi-modal large language model-based reasoner that interprets the context and then reasons about the appropriate support content. Preliminary user experiments with two color vision deficient users across five different scenarios have demonstrated the effectiveness and universality of our application.
Submitted 5 July, 2024;
originally announced July 2024.
-
Energy Efficient Knapsack Optimization Using Probabilistic Memristor Crossbars
Authors:
Jinzhan Li,
Suhas Kumar,
Su-in Yi
Abstract:
Constrained optimization underlies crucial societal problems (for instance, stock trading and bandwidth allocation), but is often computationally hard (complexity grows exponentially with problem size). The big-data era urgently demands low-latency and low-energy optimization at the edge, which cannot be handled by digital processors due to their non-parallel von Neumann architecture. Recent efforts using massively parallel hardware (such as memristor crossbars and quantum processors) employing annealing algorithms, while promising, have handled relatively easy and stable problems with sparse or binary representations (such as the max-cut or traveling salesman problems). However, most real-world applications embody three features, which are encoded in the knapsack problem and cannot be handled by annealing algorithms: dense and non-binary representations, with destabilizing self-feedback. Here we demonstrate a post-digital-hardware-friendly randomized competitive Ising-inspired (RaCI) algorithm performing knapsack optimization, experimentally implemented on a foundry-manufactured CMOS-integrated probabilistic analog memristor crossbar. Our solution outperforms digital and quantum approaches by over 4 orders of magnitude in energy efficiency.
Submitted 5 July, 2024;
originally announced July 2024.
-
Glass formation in mechanically interlocked ring polymers: the role of induced chain stiffness
Authors:
Jian Li,
Bokai Zhang,
Yushan Li
Abstract:
Polymer-related materials exhibit rich glassy behaviors at different length scales due to their various molecular structures and topological constraints. Recent studies have identified transient interpenetration of long-chain rings as contributing to dynamic arrest at the center-of-mass level. Interpenetration of rings is proposed as an approach to facilitate glass formation in polymer melts. In this work, inspired by recent advances in the synthesis of mechanically interlocked polymers, we use molecular dynamics simulations to investigate how permanent interpenetration of rings influences the glass transition of nanometer-scale segments. We find that decreasing chain length in the mechanically interlocked system is equivalent to inducing an effective chain stiffness on the sub-rings. The induced stiffness provides a unified explanation for the unique structural features and transient dynamic arrest in the system of interlocked rings with rather short chains. Further, a crossover is observed in the scaling relation between localization and glassy depth upon cooling. Our work reveals a dynamic transition from weak to strong caging at the crossover temperature. According to the localization model, we demonstrate that the chain stiffness increases the critical temperature and oscillation distance, and therefore leads to more fragile dynamics and a deeper glassy state. These findings are consistent with the predictions of molecular simulations and theories for polymers with real local stiffness. Our work deepens the understanding of the role of induced stiffness in the glass transition, and opens up a new direction for designing rich glass materials by manipulating stiffness through mechanical bonds.
Submitted 5 July, 2024;
originally announced July 2024.
-
Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation
Authors:
Laiyan Ding,
Hualie Jiang,
Jie Li,
Yongquan Chen,
Rui Huang
Abstract:
Depth estimation is a cornerstone for autonomous driving, yet acquiring per-pixel depth ground truth for supervised learning is challenging. Self-Supervised Surround Depth Estimation (SSSDE) from consecutive images offers an economical alternative. While previous SSSDE methods have proposed different mechanisms to fuse information across images, few of them explicitly consider the cross-view constraints, leading to inferior performance, particularly in overlapping regions. This paper proposes an efficient and consistent pose estimation design and two loss functions to enhance cross-view consistency for SSSDE. For pose estimation, we propose to use only front-view images to reduce training memory and sustain pose estimation consistency. The first loss function is the dense depth consistency loss, which penalizes the difference between predicted depths in overlapping regions. The second one is the multi-view reconstruction consistency loss, which aims to maintain consistency between reconstruction from spatial and spatial-temporal contexts. Additionally, we introduce a novel flipping augmentation to improve the performance further. Our techniques enable a simple neural model to achieve state-of-the-art performance on the DDAD and nuScenes datasets. Last but not least, our proposed techniques can be easily applied to other methods. The code will be made public.
Submitted 4 July, 2024;
originally announced July 2024.
-
LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking
Authors:
Amy Xin,
Yunjia Qi,
Zijun Yao,
Fangwei Zhu,
Kaisheng Zeng,
Xu Bin,
Lei Hou,
Juanzi Li
Abstract:
Entity Linking (EL) models are well-trained at mapping mentions to their corresponding entities according to a given context. However, EL models struggle to disambiguate long-tail entities due to their limited training data. Meanwhile, large language models (LLMs) are more robust at interpreting uncommon mentions. Yet, due to a lack of specialized training, LLMs struggle to generate correct entity IDs. Furthermore, training an LLM to perform EL is cost-intensive. Building upon these insights, we introduce LLM-Augmented Entity Linking (LLMAEL), a plug-and-play approach to enhance entity linking through LLM data augmentation. We leverage LLMs as knowledgeable context augmenters, generating mention-centered descriptions as additional input, while preserving traditional EL models for task-specific processing. Experiments on 6 standard datasets show that the vanilla LLMAEL outperforms baseline EL models in most cases, while the fine-tuned LLMAEL sets new state-of-the-art results across all 6 benchmarks.
Submitted 4 July, 2024;
originally announced July 2024.