-
MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss
Authors:
Yangyang Shu,
Haiming Xu,
Ziqin Zhou,
Anton van den Hengel,
Lingqiao Liu
Abstract:
Automatically generating symbolic music (music scores tailored to specific human needs) can be highly beneficial for musicians and enthusiasts. Recent studies have shown promising results using extensive datasets and advanced transformer architectures. However, these state-of-the-art models generally offer only basic control over aspects like tempo and style for the entire composition, lacking the ability to manage finer details, such as control at the level of individual bars. While fine-tuning a pre-trained symbolic music generation model might seem like a straightforward method for achieving this finer control, our research indicates challenges in this approach. The model often fails to respond adequately to new, fine-grained bar-level control signals. To address this, we propose two innovative solutions. First, we introduce a pre-training task designed to link control signals directly with corresponding musical tokens, which helps in achieving a more effective initialization for subsequent fine-tuning. Second, we implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts. Together, these techniques significantly enhance our ability to control music generation at the bar level, showing a 13.06% improvement over conventional methods. Our subjective evaluations also confirm that this enhanced control does not compromise the musical quality of the original pre-trained generative model.
Submitted 5 July, 2024;
originally announced July 2024.
-
Data-Centric AI in the Age of Large Language Models
Authors:
Xinyi Xu,
Zhaoxuan Wu,
Rui Qiao,
Arun Verma,
Yao Shu,
Jingtan Wang,
Xinyuan Niu,
Zhenfeng He,
Jiangwei Chen,
Zijian Zhou,
Gregory Kang Ruey Lau,
Hieu Dao,
Lucas Agussurja,
Rachael Hwee Ling Sim,
Xiaoqiang Lin,
Wenyang Hu,
Zhongxiang Dai,
Pang Wei Koh,
Bryan Kian Hsiang Low
Abstract:
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization. In each scenario, we underscore the importance of data, highlight promising research directions, and articulate the potential impacts on the research community and, where applicable, the society as a whole. For instance, we advocate for a suite of data-centric benchmarks tailored to the scale and complexity of data for LLMs. These benchmarks can be used to develop new data curation methods and document research efforts and results, which can help promote openness and transparency in AI and LLM research.
Submitted 20 June, 2024;
originally announced June 2024.
-
Dynamic Asset Allocation with Asset-Specific Regime Forecasts
Authors:
Yizhan Shu,
Chenyu Yu,
John M. Mulvey
Abstract:
This article introduces a novel hybrid regime identification-forecasting framework designed to enhance multi-asset portfolio construction by integrating asset-specific regime forecasts. Unlike traditional approaches that focus on broad economic regimes affecting the entire asset universe, our framework leverages both unsupervised and supervised learning to generate tailored regime forecasts for individual assets. Initially, we use the statistical jump model, a robust unsupervised regime identification model, to derive regime labels for historical periods, classifying them into bullish or bearish states based on features extracted from an asset return series. Following this, a supervised gradient-boosted decision tree classifier is trained to predict these regimes using a combination of asset-specific return features and cross-asset macro-features. We apply this framework individually to each asset in our universe. Subsequently, return and risk forecasts which incorporate these regime predictions are input into Markowitz mean-variance optimization to determine optimal asset allocation weights. We demonstrate the efficacy of our approach through an empirical study on a multi-asset portfolio comprising twelve risky assets, including global equity, bond, real estate, and commodity indexes spanning from 1991 to 2023. The results consistently show outperformance across various portfolio models, including minimum-variance, mean-variance, and naive-diversified portfolios, highlighting the advantages of integrating asset-specific regime forecasts into dynamic asset allocation.
Submitted 13 June, 2024;
originally announced June 2024.
-
DeformTime: Capturing Variable Dependencies with Deformable Attention for Time Series Forecasting
Authors:
Yuxuan Shu,
Vasileios Lampos
Abstract:
In multivariate time series (MTS) forecasting, existing state-of-the-art deep learning approaches tend to focus on autoregressive formulations and overlook the information within exogenous indicators. To address this limitation, we present DeformTime, a neural network architecture that attempts to capture correlated temporal patterns from the input space, and hence, improve forecasting accuracy. It deploys two core operations performed by deformable attention blocks (DABs): learning dependencies across variables from different time steps (variable DAB), and preserving temporal dependencies in data from previous time steps (temporal DAB). Input data transformation is explicitly designed to enhance learning from the deformed series of information while passing through a DAB. We conduct extensive experiments on 6 MTS data sets, using previously established benchmarks as well as challenging infectious disease modelling tasks with more exogenous variables. The results demonstrate that DeformTime improves accuracy against previous competitive methods across the vast majority of MTS forecasting tasks, reducing the mean absolute error by 10% on average. Notably, performance gains remain consistent across longer forecasting horizons.
Submitted 18 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding
Authors:
Junjie Zhou,
Yan Shu,
Bo Zhao,
Boya Wu,
Shitao Xiao,
Xi Yang,
Yongping Xiong,
Bo Zhang,
Tiejun Huang,
Zheng Liu
Abstract:
The evaluation of Long Video Understanding (LVU) performance poses an important but challenging research problem. Despite previous efforts, the existing video understanding benchmarks are severely constrained by several issues, especially the insufficient lengths of videos, a lack of diversity in video types and evaluation tasks, and the inappropriateness for evaluating LVU performances. To address the above problems, we propose a new benchmark, called MLVU (Multi-task Long Video Understanding Benchmark), for the comprehensive and in-depth evaluation of LVU. MLVU presents the following critical values: 1) The substantial and flexible extension of video lengths, which enables the benchmark to evaluate LVU performance across a wide range of durations. 2) The inclusion of various video genres, e.g., movies, surveillance footage, egocentric videos, cartoons, game videos, etc., which reflects the models' LVU performances in different scenarios. 3) The development of diversified evaluation tasks, which enables a comprehensive examination of MLLMs' key abilities in long-video understanding. The empirical study with 20 latest MLLMs reveals significant room for improvement in today's technique, as all existing methods struggle with most of the evaluation tasks and exhibit severe performance degradation when handling longer videos. Additionally, it suggests that factors such as context length, image-understanding quality, and the choice of LLM backbone can play critical roles in future advancements. We anticipate that MLVU will advance the research of long video understanding by providing a comprehensive and in-depth analysis of MLLMs.
Submitted 19 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing
Authors:
Youwei Shu,
Xi Xiao,
Derui Wang,
Yuxin Cao,
Siji Chen,
Jason Xue,
Linyi Li,
Bo Li
Abstract:
Randomized Smoothing (RS) is currently a scalable certified defense method providing robustness certification against adversarial examples. Although significant progress has been achieved in providing defenses against $\ell_p$ adversaries, the interaction between the smoothing distribution and the robustness certification still remains vague. In this work, we comprehensively study the effect of two families of distributions, named Exponential Standard Gaussian (ESG) and Exponential General Gaussian (EGG) distributions, on Randomized Smoothing and Double Sampling Randomized Smoothing (DSRS). We derive an analytic formula for ESG's certified radius, which converges to the original formula of RS as the dimension $d$ increases. Additionally, we prove that EGG can provide tighter constant factors than DSRS in providing $Ω(\sqrt{d})$ lower bounds of $\ell_2$ certified radius, and thus further addresses the curse of dimensionality in RS. Our experiments on real-world datasets confirm our theoretical analysis of the ESG distributions, that they provide almost the same certification under different exponents $η$ for both RS and DSRS. In addition, EGG brings a significant improvement to the DSRS certification, but the mechanism can be different when the classifier properties are different. Compared to the primitive DSRS, the increase in certified accuracy provided by EGG is prominent, up to 6.4% on ImageNet.
Submitted 5 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Pt nanoparticles dispersed in a metal-organic framework as peroxidase mimics for colorimetric detection of GSH
Authors:
Yanzheng Shu,
Yanwei Chen,
Guiye Shan
Abstract:
Metal-organic frameworks (MOFs) have been widely used in catalysis owing to their porous structure and adsorption properties. Noble metal nanoparticles have good catalytic properties. If noble metal nanoparticles are adsorbed on the MOF surface, the number of active sites can be increased and the catalytic effect of the material can be greatly improved. We successfully synthesized Pt@ZIF-8 in two steps; the average particle size of the Pt nanoparticles is about 3 nm. Pt@ZIF-8 possesses peroxidase activity and can oxidize colorless TMB to oxTMB in the presence of hydrogen peroxide. The peroxidase-like activity of Pt@ZIF-8 is consistent with Michaelis-Menten kinetics. Glutathione is a reducing substance that reduces blue oxTMB to colorless TMB. This colorimetric method achieves simple, sensitive and intuitive detection of glutathione. The detection limit is low, which makes the method promising for biomolecular detection.
Submitted 3 June, 2024;
originally announced June 2024.
-
HOLISMOKES XIII: Strong-lens candidates at all mass scales and their environments from the Hyper-Suprime Cam and deep learning
Authors:
Stefan Schuldt,
Raoul Canameras,
Irham T. Andika,
Satadru Bag,
Alejandra Melo,
Yiping Shu,
Sherry H. Suyu,
Stefan Taubenberger,
Claudio Grillo
Abstract:
We have performed a systematic search for galaxy-scale strong lenses using Hyper Suprime-Cam imaging data, focusing on lenses in overdense environments. To identify these lens candidates, we exploit our neural network from HOLISMOKES VI, which is trained on realistic gri mock-images as positive examples, and real images as negative examples. Compared to our previous work, we lower the i-Kron radius limit to >0.5". This results in an increase by around 73 million sources to more than 135 million images. During our visual multi-stage grading of the network candidates, we now inspect simultaneously larger stamps (80"x80") to identify large, extended arcs cropped in the 10"x10" cutouts, and classify additionally their overall environment. Here we also reinspect our previous lens candidates and classify their environment. Using these 546 visually identified lens candidates, we further define various criteria by exploiting extensive and complementary photometric redshift catalogs, to select the candidates in overdensities. In total, we identified 24 grade-A and 138 grade-B candidates with either spatially-resolved multiple images or extended, distorted arcs in the new sample. Furthermore, with our different techniques, we identify in total 237/546 lens candidates in a cluster-like or overdense environment, containing only 49 group- or cluster-scale re-discoveries. These results demonstrate the feasibility of downloading and applying network classifiers to hundreds of million cutouts, necessary in the upcoming era of big data from deep, wide-field imaging surveys like Euclid and the Rubin Observatory Legacy Survey of Space and Time, while leading to a sample size that can be inspected by humans. These networks, with false-positive rates of ~0.01%, are very powerful tools to identify such rare galaxy-scale strong lensing systems, while also aiding in the discovery of new strong lensing clusters.
Submitted 30 May, 2024;
originally announced May 2024.
-
Learning Interpretable Scheduling Algorithms for Data Processing Clusters
Authors:
Zhibo Hu,
Chen Wang,
Helen Paik,
Yanfeng Shu,
Liming Zhu
Abstract:
Workloads in data processing clusters are often represented in the form of DAG (Directed Acyclic Graph) jobs. Scheduling DAG jobs is challenging. Simple heuristic scheduling algorithms are often adopted in practice in production data centres. There is much room for scheduling performance optimisation for cost saving. Recently, reinforcement learning approaches (like Decima) have been applied to optimise DAG job scheduling and demonstrate clear performance gains in comparison to traditional algorithms. However, reinforcement learning (RL) approaches face their own problems in real-world deployment. In particular, their black-box decision-making processes and limited generalizability to unseen workloads may add a non-trivial burden to cluster administrators. Moreover, adapting RL models to unseen workloads often requires a significant amount of training data, which leaves edge cases running in a sub-optimal mode. To fill the gap, we propose a new method to distill a simple scheduling policy based on observations of the behaviours of a complex deep learning model. The simple model not only provides interpretability of scheduling decisions, but is also easily adapted to edge cases through tuning. We show that our method achieves high fidelity to the decisions made by deep learning models and outperforms these models when additional heuristics are taken into account.
Submitted 29 May, 2024;
originally announced May 2024.
-
ROSE: Register Assisted General Time Series Forecasting with Decomposed Frequency Learning
Authors:
Yihang Wang,
Yuying Qiu,
Peng Chen,
Kai Zhao,
Yang Shu,
Zhongwen Rao,
Lujia Pan,
Bin Yang,
Chenjuan Guo
Abstract:
With the increasing collection of time series data from various domains, there arises a strong demand for general time series forecasting models pre-trained on a large number of time-series datasets to support a variety of downstream prediction tasks. Enabling general time series forecasting faces two challenges: how to obtain unified representations from multi-domain time series data, and how to capture domain-specific features from time series data across various domains for adaptive transfer in downstream tasks. To address these challenges, we propose a Register Assisted General Time Series Forecasting Model with Decomposed Frequency Learning (ROSE), a novel pre-trained model for time series forecasting. ROSE employs Decomposed Frequency Learning for the pre-training task, which decomposes coupled semantic and periodic information in time series with frequency-based masking and reconstruction to obtain unified representations across domains. We also equip ROSE with a Time Series Register, which learns to generate a register codebook to capture domain-specific representations during pre-training and enhances domain-adaptive transfer by selecting related register tokens on downstream tasks. After pre-training on large-scale time series data, ROSE achieves state-of-the-art forecasting performance on 8 real-world benchmarks. Remarkably, even in few-shot scenarios, it demonstrates competitive or superior performance compared to existing methods trained with full data.
Submitted 24 May, 2024;
originally announced May 2024.
-
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars
Authors:
Zhaoxuan Wu,
Xiaoqiang Lin,
Zhongxiang Dai,
Wenyang Hu,
Yao Shu,
See-Kiong Ng,
Patrick Jaillet,
Bryan Kian Hsiang Low
Abstract:
Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of in-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar selection method. Recent studies have explored retrieval-based approaches to select exemplars tailored to individual test queries, which can be undesirable due to extra test-time computation and an increased risk of data exposure. Moreover, existing methods fail to adequately account for the impact of exemplar ordering on the performance. On the other hand, the impact of the instruction, another essential component in the prompt given to the LLM, is often overlooked in existing exemplar selection methods. To address these challenges, we propose a novel method named EASE, which leverages the hidden embedding from a pre-trained language model to represent ordered sets of exemplars and uses a neural bandit algorithm to optimize the sets of exemplars while accounting for exemplar ordering. Our EASE can efficiently find an ordered set of exemplars that performs well for all test queries from a given task, thereby eliminating test-time computation. Importantly, EASE can be readily extended to jointly optimize both the exemplars and the instruction. Through extensive empirical evaluations (including novel tasks), we demonstrate the superiority of EASE over existing methods, and reveal practical insights about the impact of exemplar selection on ICL, which may be of independent interest. Our code is available at https://github.com/ZhaoxuanWu/EASE-Prompt-Optimization.
Submitted 25 May, 2024;
originally announced May 2024.
-
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
Authors:
Qichao Shentu,
Beibu Li,
Kai Zhao,
Yang Shu,
Zhongwen Rao,
Lujia Pan,
Bin Yang,
Chenjuan Guo
Abstract:
Time series anomaly detection plays a vital role in a wide range of applications. Existing methods require training one specific model for each dataset, which exhibits limited generalization capability across different target datasets, hindering anomaly detection performance in various scenarios with scarce training data. To address this problem, we propose constructing a general time series anomaly detection model, which is pre-trained on extensive multi-domain datasets and can subsequently be applied to a multitude of downstream scenarios. The significant divergence of time series data across different domains presents two primary challenges in building such a general model: (1) meeting the diverse requirements of appropriate information bottlenecks tailored to different datasets in one unified model, and (2) enabling distinction between multiple normal and abnormal patterns, both of which are crucial for effective anomaly detection in various target scenarios. To tackle these two challenges, we propose a General time series anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders (DADA), which enables flexible selection of bottlenecks based on different data and explicitly enhances clear differentiation between normal and abnormal series. We conduct extensive experiments on nine target datasets from different domains. After pre-training on multi-domain data, DADA, serving as a zero-shot anomaly detector for these datasets, still achieves competitive or even superior results compared to those models tailored to each specific dataset.
Submitted 2 June, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
Authors:
Bernal Jiménez Gutiérrez,
Yiheng Shu,
Yu Gu,
Michihiro Yasunaga,
Yu Su
Abstract:
In order to thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting. Despite the impressive accomplishments, large language models (LLMs), even with retrieval-augmented generation (RAG), still struggle to efficiently and effectively integrate a large amount of new experiences after pre-training. In this work, we introduce HippoRAG, a novel retrieval framework inspired by the hippocampal indexing theory of human long-term memory to enable deeper and more efficient knowledge integration over new experiences. HippoRAG synergistically orchestrates LLMs, knowledge graphs, and the Personalized PageRank algorithm to mimic the different roles of neocortex and hippocampus in human memory. We compare HippoRAG with existing RAG methods on multi-hop question answering and show that our method outperforms the state-of-the-art methods remarkably, by up to 20%. Single-step retrieval with HippoRAG achieves comparable or better performance than iterative retrieval like IRCoT while being 10-30 times cheaper and 6-13 times faster, and integrating HippoRAG into IRCoT brings further substantial gains. Finally, we show that our method can tackle new types of scenarios that are out of reach of existing methods. Code and data are available at https://github.com/OSU-NLP-Group/HippoRAG.
Submitted 23 May, 2024;
originally announced May 2024.
-
Systematic comparison of neural networks used in discovering strong gravitational lenses
Authors:
Anupreeta More,
Raoul Canameras,
Anton T. Jaelani,
Yiping Shu,
Yuichiro Ishida,
Kenneth C. Wong,
Kaiki Taro Inoue,
Stefan Schuldt,
Alessandro Sonnenfeld
Abstract:
Efficient algorithms are being developed to search for strong gravitational lens systems owing to increasingly large imaging surveys. Neural networks have been successfully used to discover galaxy-scale lens systems in imaging surveys such as the Kilo Degree Survey, Hyper-Suprime Cam (HSC) Survey and Dark Energy Survey over the last few years. Thus, it has become imperative to understand how some of these networks compare, their strengths and the role of the training datasets, as most of the networks make use of supervised learning algorithms. In this work, we present the first-of-its-kind systematic comparison and benchmarking of networks from four teams that have analysed the HSC Survey data. Each team has designed their training samples and developed neural networks independently but coordinated a priori in reserving specific datasets strictly for test purposes. The test sample consists of mock lenses, real (candidate) lenses and real non-lenses gathered from various sources to benchmark and characterise the performance of each of the networks. While each team's network performed much better on their own constructed test samples compared to those from others, all networks performed comparably on the test sample with real (candidate) lenses and non-lenses. We also investigate the impact of swapping the training samples amongst the teams while retaining the same network architecture. We find that this resulted in improved performance for some networks. These results have direct implications on measures to be taken for lens searches with upcoming imaging surveys such as the Rubin-Legacy Survey of Space and Time, Roman and Euclid.
Submitted 21 May, 2024;
originally announced May 2024.
-
Batched Stochastic Bandit for Nondegenerate Functions
Authors:
Yu Liu,
Yunlu Shu,
Tianyu Wang
Abstract:
This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{\mathcal{O}} ( A_{+}^d \sqrt{T} )$. In addition, GN only needs $\mathcal{O} (\log \log T)$ batches to achieve this regret. We also provide a lower bound analysis for this problem. More specifically, we prove that over some (compact) doubling metric space of doubling dimension $d$: 1. For any policy $π$, there exists a problem instance on which $π$ admits a regret of order $Ω ( A_-^d \sqrt{T})$; 2. No policy can achieve a regret of order $ A_-^d \sqrt{T} $ over all problem instances, using fewer than $ Ω( \log \log T ) $ rounds of communication. Our lower bound analysis shows that the GN algorithm achieves near-optimal regret with a minimal number of batches.
Submitted 9 May, 2024;
originally announced May 2024.
-
Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
Authors:
Yong Shu,
Liquan Shen,
Xiangyu Hu,
Mengyao Li,
Zihao Zhou
Abstract:
As an important and practical way to obtain high dynamic range (HDR) video, HDR video reconstruction from sequences with alternating exposures is still less explored, mainly due to the lack of large-scale real-world datasets. Existing methods are mostly trained on synthetic datasets and perform poorly in real scenes. In this work, to facilitate the development of real-world HDR video reconstruction, we present Real-HDRV, a large-scale real-world benchmark dataset for HDR video reconstruction, featuring various scenes, diverse motion patterns, and high-quality labels. Specifically, our dataset contains 500 LDRs-HDRs video pairs, comprising about 28,000 LDR frames and 4,000 HDR labels, covering daytime, nighttime, indoor, and outdoor scenes. To the best of our knowledge, our dataset is the largest real-world HDR video reconstruction dataset. Correspondingly, we propose an end-to-end network for HDR video reconstruction, where a novel two-stage strategy is designed to perform alignment sequentially. Specifically, the first stage performs global alignment with the adaptively estimated global offsets, reducing the difficulty of subsequent alignment. The second stage implicitly performs local alignment in a coarse-to-fine manner at the feature level using the adaptive separable convolution. Extensive experiments demonstrate that: (1) models trained on our dataset can achieve better performance on real scenes than those trained on synthetic datasets; (2) our method outperforms previous state-of-the-art methods. Our dataset is available at https://github.com/yungsyu99/Real-HDRV.
Submitted 30 April, 2024;
originally announced May 2024.
-
Measuring the refractive index and thickness of multilayer samples by Fourier domain optical coherence tomography
Authors:
Yu-Lin Ku,
Yao-Gen Shu
Abstract:
Non-contact measurement of the refractive index and thickness of multilayer biological tissues is of great significance for biomedical applications and can greatly improve medical diagnosis and treatment. In this work, we introduce a theoretical method to simultaneously extract the above information using a Fourier domain optical coherence tomography (FD-OCT) system, in which no additional arrangement and prior information about the object is required other than the OCT interference spectrum. The single reflection components can be extracted from the observed spectrum by isolating the primary spikes in the sample reflectance profile, and then the refractive index and thickness can be obtained by fitting the actual and modeled values of the single reflection spectrum. In a two-layer sample example, the simulation results show that our method can reconstruct the results with high accuracy. The relative error is within 0.01%. The complexity of our approach grows linearly with the number of sample layers, making it well-adapted to multilayer situations. Our method takes into account both single and multiple reflections in multilayer samples and is therefore equally applicable to samples with high refractive index contrast.
Submitted 17 April, 2024;
originally announced April 2024.
-
Minimizing End-to-End Latency for Joint Source-Channel Coding Systems
Authors:
Kaiyi Chi,
Qianqian Yang,
Yuanchao Shu,
Zhaohui Yang,
Zhiguo Shi
Abstract:
While existing studies have highlighted the advantages of deep learning (DL)-based joint source-channel coding (JSCC) schemes in enhancing transmission efficiency, they often overlook the crucial aspect of resource management during the deployment phase. In this paper, we propose an approach to minimize the transmission latency in an uplink JSCC-based system. We first analyze the correlation between end-to-end latency and task performance, based on which the end-to-end delay model for each device is established. Then, we formulate a non-convex optimization problem aiming at minimizing the maximum end-to-end latency across all devices, which is proved to be NP-hard. We then transform the original problem into a more tractable one, from which we derive the closed-form solution for the optimal compression ratio, truncation threshold selection policy, and resource allocation strategy. We further introduce a heuristic algorithm with low complexity, leveraging insights from the structure of the optimal solution. Simulation results demonstrate that both the proposed optimal algorithm and the heuristic algorithm significantly reduce end-to-end latency. Notably, the proposed heuristic algorithm achieves nearly the same performance as the optimal solution but with considerably lower computational complexity.
Submitted 29 March, 2024;
originally announced March 2024.
-
Retina Vision Transformer (RetinaViT): Introducing Scaled Patches into Vision Transformers
Authors:
Yuyang Shu,
Michael E. Bain
Abstract:
Humans see low and high spatial frequency components at the same time, and combine the information from both to form a visual scene. Drawing on this neuroscientific inspiration, we propose an altered Vision Transformer architecture where patches from scaled down versions of the input image are added to the input of the first Transformer Encoder layer. We name this model Retina Vision Transformer (RetinaViT) due to its inspiration from the human visual system. Our experiments show that when trained on the ImageNet-1K dataset with a moderate configuration, RetinaViT achieves a 3.3% performance improvement over the original ViT. We hypothesize that this improvement can be attributed to the inclusion of low spatial frequency components in the input, which improves the ability to capture structural features, and to select and forward important features to deeper layers. RetinaViT thereby opens doors to further investigations into vertical pathways and attention patterns.
Submitted 20 March, 2024;
originally announced March 2024.
-
Imaginary-time relaxation quantum critical dynamics in two-dimensional dimerized Heisenberg model
Authors:
Jia-Qi Cai,
Yu-Rong Shu,
Xue-Qing Rao,
Shuai Yin
Abstract:
We study the imaginary-time relaxation critical dynamics of the Neel-paramagnetic quantum phase transition in the two-dimensional (2D) dimerized S = 1/2 Heisenberg model. We focus on the scaling correction in the short-time region. A unified scaling form including both short-time and finite-size corrections is proposed. According to this full scaling form, improved short-imaginary-time scaling relations are obtained. We numerically verify the scaling form and the improved short-time scaling relations for different initial states using the projector quantum Monte Carlo algorithm.
Submitted 14 March, 2024;
originally announced March 2024.
-
Robustifying and Boosting Training-Free Neural Architecture Search
Authors:
Zhenfeng He,
Yao Shu,
Zhongxiang Dai,
Bryan Kian Hsiang Low
Abstract:
Neural architecture search (NAS) has become a key component of AutoML and a standard tool to automate the design of deep neural networks. Recently, training-free NAS as an emerging paradigm has successfully reduced the search costs of standard training-based NAS by estimating the true architecture performance with only training-free metrics. Nevertheless, the estimation ability of these metrics typically varies across different tasks, making it challenging to achieve robust and consistently good search performance on diverse tasks with only a single training-free metric. Meanwhile, the estimation gap between training-free metrics and the true architecture performances prevents training-free NAS from achieving superior performance. To address these challenges, we propose the robustifying and boosting training-free NAS (RoBoT) algorithm which (a) employs the optimized combination of existing training-free metrics explored from Bayesian optimization to develop a robust and consistently better-performing metric on diverse tasks, and (b) applies greedy search, i.e., the exploitation, on the newly developed metric to bridge the aforementioned gap and consequently to boost the search performance of standard training-free NAS further. Remarkably, the expected performance of our RoBoT can be theoretically guaranteed, which improves over the existing training-free NAS under mild conditions with additional interesting insights. Our extensive experiments on various NAS benchmark tasks yield substantial empirical evidence to support our theoretical results.
Submitted 12 March, 2024;
originally announced March 2024.
-
van Hove Singularity-Driven Emergence of Multiple Flat Bands in Kagome Superconductors
Authors:
Hailan Luo,
Lin Zhao,
Zhen Zhao,
Haitao Yang,
Yun-Peng Huang,
Hongxiong Liu,
Yuhao Gu,
Feng Jin,
Hao Chen,
Taimin Miao,
Chaohui Yin,
Chengmin Shen,
Xiaolin Ren,
Bo Liang,
Yingjie Shu,
Yiwen Chen,
Fengfeng Zhang,
Feng Yang,
Shenjin Zhang,
Qinjun Peng,
Hanqing Mao,
Guodong Liu,
Jiangping Hu,
Youguo Shi,
Zuyan Xu
, et al. (5 additional authors not shown)
Abstract:
The newly discovered Kagome superconductors AV$_3$Sb$_5$ (A=K, Rb and Cs) continue to bring surprises in generating unusual phenomena and physical properties, including anomalous Hall effect, unconventional charge density wave, electronic nematicity and time-reversal symmetry breaking. Here we report an unexpected emergence of multiple flat bands in the AV$_3$Sb$_5$ superconductors. By performing high-resolution angle-resolved photoemission (ARPES) measurements, we observed four branches of flat bands that span the entire momentum space. The appearance of the flat bands is not anticipated from the band structure calculations and cannot be accounted for by the known mechanisms of flat band generation. It is intimately related to the evolution of van Hove singularities. This is the first observation of such an emergence of multiple flat bands in solid materials. Our findings provide new insights into the underlying mechanism that governs the unusual behaviors in the Kagome superconductors. They also provide a new pathway for producing flat bands and set a platform to study flat-band-related physics.
Submitted 9 March, 2024;
originally announced March 2024.
-
Localized Zeroth-Order Prompt Optimization
Authors:
Wenyang Hu,
Yao Shu,
Zongmin Yu,
Zhaoxuan Wu,
Xiangqiang Lin,
Zhongxiang Dai,
See-Kiong Ng,
Bryan Kian Hsiang Low
Abstract:
The efficacy of large language models (LLMs) in understanding and generating natural language has aroused wide interest in developing prompt-based methods to harness the power of black-box LLMs. Existing methodologies usually prioritize a global optimization for finding the global optimum, which however will perform poorly in certain tasks. This motivates us to re-think the necessity of finding a global optimum in prompt optimization. To answer this, we conduct a thorough empirical study on prompt optimization and draw two major insights. In contrast to the rarity of the global optimum, local optima are usually prevalent and well-performing, which can be more worthwhile for efficient prompt optimization (Insight I). The choice of the input domain, covering both the generation and the representation of prompts, affects the identification of well-performing local optima (Insight II). Inspired by these insights, we propose a novel algorithm, namely localized zeroth-order prompt optimization (ZOPO), which incorporates a Neural Tangent Kernel-based derived Gaussian process into standard zeroth-order optimization for an efficient search of well-performing local optima in prompt optimization. Remarkably, ZOPO outperforms existing baselines in terms of both the optimization performance and the query efficiency, which we demonstrate through extensive experiments.
Submitted 5 March, 2024;
originally announced March 2024.
-
GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features
Authors:
Yunzhuo Sun,
Yifang Xu,
Zien Xie,
Yukun Shu,
Sidan Du
Abstract:
Moment retrieval (MR) and highlight detection (HD) aim to identify relevant moments and highlights in a video from a corresponding natural language query. Large language models (LLMs) have demonstrated proficiency in various computer vision tasks. However, existing methods for MR&HD have not yet been integrated with LLMs. In this letter, we propose a novel two-stage model that takes the output of LLMs as the input to the second-stage transformer encoder-decoder. First, MiniGPT-4 is employed to generate the detailed description of the video frame and rewrite the query statement, fed into the encoder as new features. Then, semantic similarity is computed between the generated description and the rewritten queries. Finally, continuous high-similarity video frames are converted into span anchors, serving as prior position information for the decoder. Experiments demonstrate that our approach achieves a state-of-the-art result, and by using only span anchors and similarity scores as outputs, positioning accuracy outperforms traditional methods, like Moment-DETR.
Submitted 10 March, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments
Authors:
Yu Gu,
Yiheng Shu,
Hao Yu,
Xiao Liu,
Yuxiao Dong,
Jie Tang,
Jayanth Srinivasa,
Hugo Latapie,
Yu Su
Abstract:
The applications of large language models (LLMs) have expanded well beyond the confines of text processing, signaling a new era where LLMs are envisioned as generalist language agents capable of operating within complex real-world environments. These environments are often highly expansive, making it impossible for the LLM to process them within its short-term memory. Motivated by recent research on extending the capabilities of LLMs with tools, this paper investigates the intriguing potential of tools to augment LLMs in handling such complexity. To this end, we design customized tools to aid in the proactive exploration within these massive environments. Such tools can serve as a middleware layer shielding the LLM from environmental complexity. In two representative complex environments -- knowledge bases (KBs) and databases -- we demonstrate the significant potential of augmenting language agents with tools in complex environments. Notably, equipped with these tools, GPT-4 achieves 2.8X the performance of the best baseline in tasks requiring access to database content and 2.2X in KB tasks. Our findings illuminate the path for advancing language agents in complex real-world applications.
Submitted 22 February, 2024;
originally announced February 2024.
-
OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations
Authors:
Yao Shu,
Jiongfeng Fang,
Ying Tiffany He,
Fei Richard Yu
Abstract:
First-order optimization (FOO) algorithms are pivotal in numerous computational domains such as machine learning and signal denoising. However, their application to complex tasks like neural network training often entails significant inefficiencies due to the need for many sequential iterations for convergence. In response, we introduce first-order optimization expedited with approximately parallelized iterations (OptEx), the first framework that enhances the efficiency of FOO by leveraging parallel computing to mitigate its iterative bottleneck. OptEx employs kernelized gradient estimation to make use of gradient history for future gradient prediction, enabling parallelization of iterations -- a strategy once considered impractical because of the inherent iterative dependency in FOO. We provide theoretical guarantees for the reliability of our kernelized gradient estimation and the iteration complexity of SGD-based OptEx, confirming that estimation errors diminish to zero as historical gradients accumulate and that SGD-based OptEx enjoys an effective acceleration rate of $\Omega(\sqrt{N})$ over standard SGD given parallelism of $N$. We also use extensive empirical studies, including synthetic functions, reinforcement learning tasks, and neural network training across various datasets, to underscore the substantial efficiency improvements achieved by OptEx.
Submitted 17 February, 2024;
originally announced February 2024.
-
Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models
Authors:
Zhibo Hu,
Chen Wang,
Yanfeng Shu,
Helen Paik,
Liming Zhu
Abstract:
The robustness of large language models (LLMs) becomes increasingly important as their use rapidly grows in a wide range of domains. Retrieval-Augmented Generation (RAG) is considered as a means to improve the trustworthiness of text generation from LLMs. However, how the outputs from RAG-based LLMs are affected by slightly different inputs is not well studied. In this work, we find that the insertion of even a short prefix to the prompt leads to the generation of outputs far away from factually correct answers. We systematically evaluate the effect of such prefixes on RAG by introducing a novel optimization technique called Gradient Guided Prompt Perturbation (GGPP). GGPP achieves a high success rate in steering outputs of RAG-based LLMs to targeted wrong answers. It can also cope with instructions in the prompts requesting to ignore irrelevant context. We also exploit LLMs' neuron activation difference between prompts with and without GGPP perturbations to give a method that improves the robustness of RAG-based LLMs through a highly effective detector trained on neuron activation triggered by GGPP generated prompts. Our evaluation on open-sourced LLMs demonstrates the effectiveness of our methods.
Submitted 20 June, 2024; v1 submitted 11 February, 2024;
originally announced February 2024.
-
Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting
Authors:
Peng Chen,
Yingying Zhang,
Yunyao Cheng,
Yang Shu,
Yihang Wang,
Qingsong Wen,
Bin Yang,
Chenjuan Guo
Abstract:
Transformers for time series forecasting mainly model time series from limited or fixed scales, making it challenging to capture different characteristics spanning various scales. We propose Pathformer, a multi-scale Transformer with adaptive pathways. It integrates both temporal resolution and temporal distance for multi-scale modeling. Multi-scale division divides the time series into different temporal resolutions using patches of various sizes. Based on the division of each scale, dual attention is performed over these patches to capture global correlations and local details as temporal dependencies. We further enrich the multi-scale Transformer with adaptive pathways, which adaptively adjust the multi-scale modeling process based on the varying temporal dynamics of the input, improving the accuracy and generalization of Pathformer. Extensive experiments on eleven real-world datasets demonstrate that Pathformer not only achieves state-of-the-art performance by surpassing all current models but also exhibits stronger generalization abilities under various transfer scenarios. The code is made available at https://github.com/decisionintelligence/pathformer.
Submitted 6 March, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Downside Risk Reduction Using Regime-Switching Signals: A Statistical Jump Model Approach
Authors:
Yizhan Shu,
Chenyu Yu,
John M. Mulvey
Abstract:
This article investigates a regime-switching investment strategy aimed at mitigating downside risk by reducing market exposure during anticipated unfavorable market regimes. We highlight the statistical jump model (JM) for market regime identification, a recently developed robust model that distinguishes itself from traditional Markov-switching models by enhancing regime persistence through a jump penalty applied at each state transition. Our JM utilizes a feature set comprising risk and return measures derived solely from the return series, with the optimal jump penalty selected through a time-series cross-validation method that directly optimizes strategy performance. Our empirical analysis evaluates the realistic out-of-sample performance of various strategies on major equity indices from the US, Germany, and Japan from 1990 to 2023, in the presence of transaction costs and trading delays. The results demonstrate the consistent outperformance of the JM-guided strategy in reducing risk metrics such as volatility and maximum drawdown, and enhancing risk-adjusted returns like the Sharpe ratio, when compared to both hidden Markov model-guided strategy and the buy-and-hold strategy. These findings underline the enhanced persistence, practicality, and versatility of strategies utilizing JMs for regime-switching signals.
Submitted 10 July, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
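The jump penalty described in this abstract can be illustrated with a small dynamic program. This is a minimal sketch, not the authors' implementation: it assumes the per-step cost of assigning each time step to each regime is already given (the `losses` table is made-up data), and the names `fit_state_sequence` and `jump_penalty` are hypothetical. It only shows how a fixed penalty per state transition makes the inferred regime sequence more persistent.

```python
def fit_state_sequence(losses, jump_penalty):
    """Return the state sequence minimizing total loss plus a penalty per switch.

    losses[t][k] is the (assumed, precomputed) cost of placing time step t
    in state k; jump_penalty is charged each time the state changes.
    """
    n_states = len(losses[0])
    cost = list(losses[0])  # best cost of any path ending in each state
    back = []               # back-pointers, one list per step after the first
    for step_losses in losses[1:]:
        new_cost, pointers = [], []
        for k in range(n_states):
            # Cheapest predecessor, paying the penalty only on a switch.
            best_prev = min(
                range(n_states),
                key=lambda j: cost[j] + (jump_penalty if j != k else 0.0),
            )
            switch = jump_penalty if best_prev != k else 0.0
            new_cost.append(step_losses[k] + cost[best_prev] + switch)
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)
    # Trace the optimal path backwards from the cheapest final state.
    state = min(range(n_states), key=lambda k: cost[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Illustrative data: state 1 looks slightly better for one step only.
losses = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.5], [0.0, 1.0], [0.0, 1.0]]
no_penalty = fit_state_sequence(losses, jump_penalty=0.0)
with_penalty = fit_state_sequence(losses, jump_penalty=1.0)
```

With no penalty the sequence follows the one-step blip and switches regime twice; with a penalty of 1.0 the two switches cost more than the blip saves, so the sequence stays in the first regime throughout.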
-
Improvement of Frequency Source Phase Noise Reduction Design under Vibration Condition
Authors:
Liwei Yin,
Yongjiang Shu,
Heng Zhang,
Yuefei Dai,
Xiaopeng Lu,
Yunlong Lian,
Zhonghua Wang,
Yong Ding
Abstract:
Reasonable vibration reduction design is an important way to achieve low phase noise index of airborne frequency source output signal. Aiming at the problem of phase noise deterioration of an airborne frequency source under random condition, this paper proposes to improve the vibration reduction mode crystal oscillator and reduce the distance between the barycenter of frequency source and crystal oscillator vibration based on the analysis of the relationship between the frequency source and the phase noise of output signal. Experimental results show that the active noise control system achieves 62dB phase noise compensation under the random vibration of 0.04-0.1g*g/Hz amplitude range and 5-2000 Hz frequency range.
Submitted 6 February, 2024;
originally announced February 2024.
-
Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing
Authors:
Yan Shu,
Weichao Zeng,
Zhenhang Li,
Fangmin Zhao,
Yu Zhou
Abstract:
Visual text, a pivotal element in both document and scene images, speaks volumes and attracts significant attention in the computer vision domain. Beyond visual text detection and recognition, the field of visual text processing has experienced a surge in research, driven by the advent of fundamental generative models. However, challenges persist due to the unique properties and features that distinguish text from general objects. Effectively leveraging these unique textual characteristics is crucial in visual text processing, as observed in our study. In this survey, we present a comprehensive, multi-perspective analysis of recent advancements in this field. Initially, we introduce a hierarchical taxonomy encompassing areas ranging from text image enhancement and restoration to text image manipulation, followed by different learning paradigms. Subsequently, we conduct an in-depth discussion of how specific textual features such as structure, stroke, semantics, style, and spatial context are seamlessly integrated into various tasks. Furthermore, we explore available public datasets and benchmark the reviewed methods on several widely-used datasets. Finally, we identify principal challenges and potential avenues for future research. Our aim is to establish this survey as a fundamental resource, fostering continued exploration and innovation in the dynamic area of visual text processing.
Submitted 5 February, 2024;
originally announced February 2024.
-
Source-Free Unsupervised Domain Adaptation with Hypothesis Consolidation of Prediction Rationale
Authors:
Yangyang Shu,
Xiaofeng Cao,
Qi Chen,
Bowen Zhang,
Ziqin Zhou,
Anton van den Hengel,
Lingqiao Liu
Abstract:
Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data. The primary difficulty in this task is that the model's predictions may be inaccurate, and using these inaccurate predictions for model adaptation can lead to misleading results. To address this issue, this paper proposes a novel approach that considers multiple prediction hypotheses for each sample and investigates the rationale behind each hypothesis. By consolidating these hypothesis rationales, we identify the most likely correct hypotheses, which we then use as a pseudo-labeled set to support a semi-supervised learning procedure for model adaptation. To achieve the optimal performance, we propose a three-step adaptation process: model pre-adaptation, hypothesis consolidation, and semi-supervised learning. Extensive experimental results demonstrate that our approach achieves state-of-the-art performance in the SFUDA task and can be easily integrated into existing approaches to improve their performance. The codes are available at https://github.com/GANPerf/HCPR.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Depth-agnostic Single Image Dehazing
Authors:
Honglei Xu,
Yan Shu,
Shaohui Liu
Abstract:
Single image dehazing is a challenging ill-posed problem. Existing datasets for training deep learning-based methods can be generated by hand-crafted or synthetic schemes. However, the former often suffers from small scales, while the latter forces models to learn scene depth instead of haze distribution, decreasing their dehazing ability. To overcome the problem, we propose a simple yet novel synthetic method to decouple the relationship between haze density and scene depth, by which a depth-agnostic dataset (DA-HAZE) is generated. Meanwhile, a Global Shuffle Strategy (GSS) is proposed for generating differently scaled datasets, thereby enhancing the generalization ability of the model. Extensive experiments indicate that models trained on DA-HAZE achieve significant improvements on real-world benchmarks, with less discrepancy between SOTS and DA-SOTS (the test set of DA-HAZE). Additionally, depth-agnostic dehazing is a more complicated task because of the lack of depth prior. Therefore, an efficient architecture with stronger feature modeling ability and fewer computational costs is necessary. We revisit the U-Net-based architectures for dehazing, in which dedicatedly designed blocks are incorporated. However, the performances of blocks are constrained by limited feature fusion methods. To this end, we propose a Convolutional Skip Connection (CSC) module, allowing vanilla feature fusion methods to achieve promising results with minimal costs. Extensive experimental results demonstrate that current state-of-the-art methods equipped with CSC can achieve better performance and reasonable computational expense, whether or not the haze distribution is relevant to the scene depth.
Submitted 14 January, 2024;
originally announced January 2024.
-
Unsupervised hard Negative Augmentation for contrastive learning
Authors:
Yuxuan Shu,
Vasileios Lampos
Abstract:
We present Unsupervised hard Negative Augmentation (UNA), a method that generates synthetic negative instances based on the term frequency-inverse document frequency (TF-IDF) retrieval model. UNA uses TF-IDF scores to ascertain the perceived importance of terms in a sentence and then produces negative samples by replacing terms with respect to that. Our experiments demonstrate that models trained with UNA improve the overall performance in semantic textual similarity tasks. Additional performance gains are obtained when combining UNA with the paraphrasing augmentation. Further results show that our method is compatible with different backbone models. Ablation studies also support the choice of having a TF-IDF-driven control on negative augmentation.
Submitted 4 January, 2024;
originally announced January 2024.
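The TF-IDF-driven replacement described in the UNA abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names `tfidf_scores` and `make_negative` are hypothetical, the smoothed IDF formula is one common variant, and replacing only the single highest-scoring term is a simplification of "replacing terms with respect to" their scores.

```python
import math
import random
from collections import Counter


def tfidf_scores(sentence, corpus):
    """Return {term: TF-IDF score} for one tokenised sentence.

    `corpus` is a list of tokenised sentences used to estimate document
    frequencies. The smoothed IDF used here is an assumed variant.
    """
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    term_freq = Counter(sentence)
    scores = {}
    for term, count in term_freq.items():
        tf = count / len(sentence)
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
        scores[term] = tf * idf
    return scores


def make_negative(sentence, corpus, rng):
    """Build a synthetic negative by swapping out the highest-scoring term.

    Hypothetical simplification: only one term is replaced, and the
    replacement is drawn uniformly from corpus terms absent in the sentence.
    """
    scores = tfidf_scores(sentence, corpus)
    target = max(scores, key=scores.get)
    vocab = sorted({t for doc in corpus for t in doc} - set(sentence))
    replacement = rng.choice(vocab)
    negative = [replacement if tok == target else tok for tok in sentence]
    return negative, target


corpus = [
    "a cat sat on the mat".split(),
    "a dog sat on the rug".split(),
    "a bird flew over the house".split(),
]
sentence = corpus[0]
negative, replaced = make_negative(sentence, corpus, random.Random(0))
print(replaced, negative)
```

In this toy corpus the rare terms score highest, so the swapped term is a content word rather than a function word such as "a" or "the", which is the intuition behind using TF-IDF to pick what to perturb.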
-
The survival of scientific stylization
Authors:
Yuanyuan Shu,
Tianxing Pan
Abstract:
This study elaborates a text-based metric to quantify the unique position of stylized scientific research, characterized by its innovative integration of diverse knowledge components and potential to pivot established scientific paradigms. Our analysis reveals a concerning decline in stylized research, highlighted by its comparative undervaluation in terms of citation counts and protracted peer-review duration. Despite facing these challenges, the disruptive potential of stylized research remains robust, consistently introducing groundbreaking questions and theories. This paper posits that substantive reforms are necessary to incentivize and recognize the value of stylized research, including optimizations to the peer-review process and the criteria for evaluating scientific impact. Embracing these changes may be imperative to halt the downturn in stylized research and ensure enduring scholarly exploration in endless frontiers.
Submitted 10 December, 2023;
originally announced December 2023.
-
A framework for mining lifestyle profiles through multi-dimensional and high-order mobility feature clustering
Authors:
Yeshuo Shu,
Gangcheng Zhang,
Keyi Liu,
Jintong Tang,
Liyan Xu
Abstract:
Human mobility demonstrates a high degree of regularity, which facilitates the discovery of lifestyle profiles. Existing research has yet to fully utilize the regularities embedded in high-order features extracted from human mobility records in such profiling. This study proposes a progressive feature extraction strategy that mines high-order mobility features from users' moving trajectory records from the spatial, temporal, and semantic dimensions. Specific features are extracted such as travel motifs, rhythms decomposed by discrete Fourier transform (DFT) of mobility time series, and vectorized place semantics by word2vec, respectively to the three dimensions, and they are further clustered to reveal the users' lifestyle characteristics. An experiment using a trajectory dataset of over 500k users in Shenzhen, China yields seven user clusters with different lifestyle profiles that can be well interpreted by common sense. The results suggest the possibility of fine-grained user profiling through cross-order trajectory feature engineering and clustering.
Submitted 1 December, 2023;
originally announced December 2023.
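The abstract above mentions decomposing mobility time series into rhythms with the discrete Fourier transform. A minimal sketch of that idea follows, assuming a synthetic hourly activity series with a daily cycle. The helper names `dft_magnitudes` and `dominant_period` are hypothetical, and the direct O(n²) DFT is used only to keep the example dependency-free; it is not the study's pipeline.

```python
import cmath
import math


def dft_magnitudes(series):
    """Magnitudes of the DFT coefficients for frequencies 0..n/2.

    The mean is removed first so the zero-frequency term does not
    dominate the spectrum.
    """
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        mags.append(abs(coeff))
    return mags


def dominant_period(series):
    """Period (in samples) of the strongest non-zero frequency component."""
    mags = dft_magnitudes(series)
    k = max(range(1, len(mags)), key=mags.__getitem__)
    return len(series) / k


# Synthetic assumption: one week of hourly activity with a 24-hour rhythm.
hours = 24 * 7
series = [10 + 5 * math.sin(2 * math.pi * t / 24) for t in range(hours)]
period = dominant_period(series)
print(period)
```

The magnitudes at a few chosen frequencies (daily, weekly) could then serve as a per-user feature vector for clustering, which is one plausible reading of "rhythms" in the abstract.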
-
Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training
Authors:
Yuhao Chen,
Yuxuan Yan,
Qianqian Yang,
Yuanchao Shu,
Shibo He,
Jiming Chen
Abstract:
Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks. Nonetheless, it is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. In this paper, we propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices like smartphones. Confidant partitions an LLM into several sub-models so that each fits into a mobile device's memory. A pipeline parallel training mechanism is further developed to ensure fast and efficient distributed training. In addition, we propose a novel backend scheduler to allocate different attention heads to heterogeneous compute hardware, including mobile CPU and GPUs, to maximize the compute resource utilization on each edge device. Our preliminary experimental results show that Confidant achieves at most 45.3% memory reduction and 8.03x inference speedup in practical settings.
Submitted 22 November, 2023;
originally announced November 2023.
-
Cryogenic quasi-static embedded DRAM for energy-efficient compute-in-memory applications
Authors:
Yuhao Shu,
Hongtu Zhang,
Hao Sun,
Mengru Zhang,
Wenfeng Zhao,
Qi Deng,
Zhidong Tang,
Yumeng Yuan,
Yongqi Hu,
Yu Gu,
Xufeng Kou,
Yajun Ha
Abstract:
Compute-in-memory (CIM) presents an attractive approach for energy-efficient computing in data-intensive applications. However, the development of suitable memory designs to achieve high-performance CIM remains a challenging task. Here, we propose a cryogenic quasi-static embedded DRAM to address the logic-memory mismatch of CIM. Guided by the re-calibrated cryogenic device model, the designed four-transistor bit-cell achieves full-swing data storage, low power consumption, and extended retention time at cryogenic temperatures. Combined with the adoption of cryogenic write bitline biasing technique and readout circuitry optimization, our 4Kb cryogenic eDRAM chip demonstrates a 1.37$\times$10$^6$ times improvement in retention time, while achieving a 75 times improvement in retention variability, compared to room-temperature operation. Moreover, it also achieves outstanding power performance with a retention power of 112 fW and a dynamic power of 108 $μ$W at 4.2 K, which can be further decreased by 7.1% and 13.6% using the dynamic voltage scaling technique. This work reveals the great potential of cryogenic CMOS for high-density data storage and lays a solid foundation for energy-efficient CIM implementations.
Submitted 20 November, 2023;
originally announced November 2023.
-
Modeling biases from constant stellar mass-to-light ratio assumption in galaxy dynamics and strong lensing
Authors:
Yan Liang,
Dandan Xu,
Dominique Sluse,
Alessandro Sonnenfeld,
Yiping Shu
Abstract:
A constant stellar-mass to light ratio $M_{\star}/L$ has been widely-used in studies of galaxy dynamics and strong lensing, which aim at disentangling the mass density distributions of dark matter and baryons. In this work, we take early-type galaxies from the cosmological hydrodynamic IllustrisTNG-100 simulation to investigate possible systematic bias in the inferences due to a constant $M_{\star}/L$ assumption. To do so, we construct two-component matter density models, where one component describes the dark matter distribution, the other one for the stellar mass, which is made to follow the light profile by assuming a constant factor of $M_{\star}/L$. Specifically, we adopt multiple commonly used dark matter models and light distributions. We fit the two-component models directly to the {\it total} matter density distributions of simulated galaxies to eliminate systematics from other modelling procedures. We find that galaxies in general have more centrally-concentrated stellar mass profile than their light distribution. This is more significant among more massive galaxies, for which the $M_{\star}/L$ profile rises up markedly towards the centre and may often exhibit a dented feature due to on-going star formation at about one effective radius, encompassing a quenched bulge region. As a consequence, a constant $M_{\star}/L$ causes a model degeneracy to be artificially broken under specific model assumptions, resulting in strong and model-dependent biases on estimated properties, such as the central dark matter fraction and the initial mass function. Either a steeper dark matter profile with an over-predicted density fraction, or an over-predicted stellar mass normalization ($M_{\star}/L$) is often obtained through model fitting. The exact biased behaviour depends on the slope difference between mass and light, as well as on the adopted models for dark matter and light.
Submitted 13 November, 2023;
originally announced November 2023.
-
CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings
Authors:
Yachun Mi,
Yu Li,
Yan Shu,
Chen Hui,
Puchao Zhou,
Shaohui Liu
Abstract:
Video Quality Assessment (VQA) aims to simulate the process of perceiving video quality by the human visual system (HVS). The judgments made by HVS are always influenced by human subjective feelings. However, most of the current VQA research focuses on capturing various distortions in the spatial and temporal domains of videos, while ignoring the impact of human feelings. In this paper, we propose CLiF-VQA, which considers both features related to human feelings and spatial features of videos. In order to effectively extract features related to human feelings from videos, we explore the consistency between CLIP and human feelings in video perception for the first time. Specifically, we design multiple objective and subjective descriptions closely related to human feelings as prompts. Further we propose a novel CLIP-based semantic feature extractor (SFE) which extracts features related to human feelings by sliding over multiple regions of the video frame. In addition, we further capture the low-level-aware features of the video through a spatial feature extraction module. The two different features are then aggregated thereby obtaining the quality score of the video. Extensive experiments show that the proposed CLiF-VQA exhibits excellent performance on several VQA datasets.
Submitted 13 November, 2023;
originally announced November 2023.
-
Relaxation Critical Dynamics with Emergent Symmetry
Authors:
Yu-Rong Shu,
Shuai Yin
Abstract:
Different from usual critical point characterized by a single length scale, critical point with emergent symmetry exhibits intriguing critical properties characterized by two relevant length scales, attracting long-term investigations from both theoretical and experimental aspects. A natural question is how the critical dynamics is affected by the presence of two relevant length scales. Here we study the relaxation critical dynamics in the three-dimensional ($3$D) clock model, whose critical point has emergent $U(1)$ symmetry. We find that in contrast to the magnetization $M$, whose relaxation process is described by the usual dynamic exponent $z$ of the $3$D $XY$ universality class, the angular order parameter $φ_q$ shows a two-stage evolution characterized by different dynamic critical exponents. While in the short-time stage the relaxation dynamics is governed by $z$, in the long-time stage the dynamics is controlled by a new dynamic exponent $z'$. We also show the off-critical-point effects in the critical relaxation. Our results may be experimentally detected in the hexagonal RMnO$_3$ (R$=$rare earth) materials.
Submitted 10 November, 2023;
originally announced November 2023.
-
AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training
Authors:
Yuhao Chen,
Yuxuan Yan,
Qianqian Yang,
Yuanchao Shu,
Shibo He,
Zhiguo Shi,
Jiming Chen
Abstract:
It is usually infeasible to fit and train an entire large deep neural network (DNN) model using a single edge device due to the limited resources. To facilitate intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models, and deploying each of them to a different edge device to collaboratively train a DNN model. However, the communication overhead caused by the large amount of data transmitted from one device to another during training, as well as the sub-optimal partition point due to the inaccurate latency prediction of computation at each edge device can significantly slow down training. In this paper, we propose AccEPT, an acceleration scheme for accelerating the edge collaborative pipeline-parallel training. In particular, we propose a light-weight adaptive latency predictor to accurately estimate the computation latency of each layer at different devices, which also adapts to unseen devices through continuous learning. Therefore, the proposed latency predictor leads to better model partitioning which balances the computation loads across participating devices. Moreover, we propose a bit-level computation-efficient data compression scheme to compress the data to be transmitted between devices during training. Our numerical results demonstrate that our proposed acceleration approach is able to significantly speed up edge pipeline parallel training up to 3 times faster in the considered experimental settings.
Submitted 9 November, 2023;
originally announced November 2023.
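The AccEPT abstract links per-layer latency prediction to a model partition that balances computation across devices. The sketch below shows one generic way such a partition could be computed, assuming the predicted latencies are already available as a table. It is not the paper's algorithm: the function name `partition_layers`, the fixed device order, the requirement that each device receives at least one contiguous block of layers, and the omission of communication cost are all illustrative assumptions.

```python
from itertools import accumulate


def partition_layers(latency):
    """Split consecutive layers across devices to minimise the slowest stage.

    latency[d][l] is the (assumed, already predicted) latency of layer l
    on device d. Devices are used in the given order and each gets at
    least one layer. Returns (bottleneck, [(start, end), ...]) with
    half-open layer ranges per device.
    """
    n_dev = len(latency)
    n_layers = len(latency[0])
    # prefix[d][j] = total latency of layers 0..j-1 on device d
    prefix = [[0.0] + list(accumulate(row)) for row in latency]
    inf = float("inf")
    # best[d][j]: minimal bottleneck when the first d devices run the first j layers
    best = [[inf] * (n_layers + 1) for _ in range(n_dev + 1)]
    choice = [[0] * (n_layers + 1) for _ in range(n_dev + 1)]
    best[0][0] = 0.0
    for d in range(1, n_dev + 1):
        for j in range(d, n_layers + 1):
            for i in range(d - 1, j):
                stage = prefix[d - 1][j] - prefix[d - 1][i]
                cand = max(best[d - 1][i], stage)
                if cand < best[d][j]:
                    best[d][j] = cand
                    choice[d][j] = i
    # Walk the recorded choices backwards to recover the cut points.
    cuts = []
    j = n_layers
    for d in range(n_dev, 0, -1):
        i = choice[d][j]
        cuts.append((i, j))
        j = i
    cuts.reverse()
    return best[n_dev][n_layers], cuts


# Hypothetical latencies: device 0 is twice as fast as device 1.
bottleneck, cuts = partition_layers([[1, 1, 1, 1], [2, 2, 2, 2]])
print(bottleneck, cuts)
```

With these made-up numbers the faster device takes three of the four layers, so the slowest pipeline stage costs 3 time units instead of 4 under an even split. Inaccurate latency predictions would shift the cut points, which is why the abstract ties partition quality to predictor accuracy.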
-
Exploiting Correlated Auxiliary Feedback in Parameterized Bandits
Authors:
Arun Verma,
Zhongxiang Dai,
Yao Shu,
Bryan Kian Hsiang Low
Abstract:
We study a novel variant of the parameterized bandits problem in which the learner can observe additional auxiliary feedback that is correlated with the observed reward. The auxiliary feedback is readily available in many real-life applications, e.g., an online platform that wants to recommend the best-rated services to its users can observe the user's rating of service (rewards) and collect additional information like service delivery time (auxiliary feedback). In this paper, we first develop a method that exploits auxiliary feedback to build a reward estimator with tight confidence bounds, leading to a smaller regret. We then characterize the regret reduction in terms of the correlation coefficient between reward and its auxiliary feedback. Experimental results in different settings also verify the performance gain achieved by our proposed method.
Submitted 5 November, 2023;
originally announced November 2023.
-
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models
Authors:
Mingwei Zhu,
Leigang Sha,
Yu Shu,
Kangjia Zhao,
Tiancheng Zhao,
Jianwei Yin
Abstract:
Multimodal large language models (MLLMs) have shown great potential in perception and interpretation tasks, but their capabilities in predictive reasoning remain under-explored. To address this gap, we introduce a novel benchmark that assesses the predictive reasoning capabilities of MLLMs across diverse scenarios. Our benchmark targets three important domains: abstract pattern reasoning, human activity prediction, and physical interaction prediction. We further develop three evaluation methods powered by large language model to robustly quantify a model's performance in predicting and reasoning the future based on multi-visual context. Empirical experiments confirm the soundness of the proposed benchmark and evaluation methods via rigorous testing and reveal pros and cons of current popular MLLMs in the task of predictive reasoning. Lastly, our proposed benchmark provides a standardized evaluation framework for MLLMs and can facilitate the development of more advanced models that can reason and predict over complex long sequence of multimodal input.
Submitted 20 October, 2023;
originally announced October 2023.
-
Searching for strong gravitational lenses
Authors:
Cameron Lemon,
Frédéric Courbin,
Anupreeta More,
Paul Schechter,
Raoul Cañameras,
Ludovic Delchambre,
Calvin Leung,
Yiping Shu,
Chiara Spiniello,
Yashar Hezaveh,
Jonas Klüter,
Richard McMahon
Abstract:
Strong gravitational lenses provide unique laboratories for cosmological and astrophysical investigations, but they must first be discovered - a task that can be met with significant contamination by other astrophysical objects and asterisms. Here we review strong lens searches, covering various sources (quasars, galaxies, supernovae, FRBs, GRBs, and GWs), lenses (early- and late-type galaxies, groups, and clusters), datasets (imaging, spectra, and lightcurves), and wavelengths. We first present the physical characteristics of the lens and source populations, highlighting relevant details for constructing targeted searches. Search techniques are described based on the main lensing feature that is required for the technique to work, namely one of: (i) an associated magnification, (ii) multiple spatially-resolved images, (iii) multiple redshifts, or (iv) a non-zero time delay between images. To use the current lens samples for science, and for the design of future searches, we list several selection biases that exist due to these discovery techniques. We conclude by discussing the future of lens searches in upcoming surveys and the new population of lenses that will be discovered.
Submitted 27 October, 2023; v1 submitted 20 October, 2023;
originally announced October 2023.
-
Nonequilibrium dynamics in Dirac quantum criticality
Authors:
Yin-Kai Yu,
Zhi Zeng,
Yu-Rong Shu,
Zi-Xiang Li,
Shuai Yin
Abstract:
Quantum criticality within Dirac fermions harbors a plethora of exotic phenomena, attracting sustained attention in the past decades. Nevertheless, the nonequilibrium dynamics therein has rarely been studied. To fill in the gap, we explore the imaginary-time relaxation dynamics in a typical Dirac quantum criticality belonging to chiral Heisenberg universality class. Performing large-scale quantum Monte Carlo simulation, we unveil rich nonequilibrium critical phenomena from different initial states. Particularly, a new dynamic exponent characterizing the non-stationary evolution in the short-time state is determined as $θ=-0.84(4)$, in sharp contrast with the prevalent belief that $θ$ is positive as demonstrated in classical cases. Furthermore, we propose a universal dynamic scaling theory governing the fruitful nonequilibrium properties in Dirac quantum criticality. Armed with the scaling theory, we develop a new framework to investigate fermionic quantum criticality based on short-time dynamics, paving a promising avenue to fathoming quantum criticality in diverse fermionic systems with high efficiency.
Submitted 16 October, 2023;
originally announced October 2023.
-
Quantum Bayesian Optimization
Authors:
Zhongxiang Dai,
Gregory Kang Ruey Lau,
Arun Verma,
Yao Shu,
Bryan Kian Hsiang Low,
Patrick Jaillet
Abstract:
Kernelized bandits, also known as Bayesian optimization (BO), has been a prevalent method for optimizing complicated black-box reward functions. Various BO algorithms have been theoretically shown to enjoy upper bounds on their cumulative regret which are sub-linear in the number T of iterations, and a regret lower bound of Omega(sqrt(T)) has been derived which represents the unavoidable regrets for any classical BO algorithm. Recent works on quantum bandits have shown that with the aid of quantum computing, it is possible to achieve tighter regret upper bounds better than their corresponding classical lower bounds. However, these works are restricted to either multi-armed or linear bandits, and are hence not able to solve sophisticated real-world problems with non-linear reward functions. To this end, we introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm. To the best of our knowledge, our Q-GP-UCB is the first BO algorithm able to achieve a regret upper bound of O(polylog T), which is significantly smaller than its regret lower bound of Omega(sqrt(T)) in the classical setting. Moreover, thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm from the previous work. We use simulations, as well as an experiment using a real quantum computer, to verify that the theoretical quantum speedup achieved by our Q-GP-UCB is also potentially relevant in practice.
Submitted 8 October, 2023;
originally announced October 2023.
-
Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers
Authors:
Xiaoqiang Lin,
Zhaoxuan Wu,
Zhongxiang Dai,
Wenyang Hu,
Yao Shu,
See-Kiong Ng,
Patrick Jaillet,
Bryan Kian Hsiang Low
Abstract:
Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performances in various applications. However, the performances of LLMs depend heavily on the instructions given to them, which are typically manually tuned with substantial human efforts. Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimize the instructions given to black-box LLMs. However, BO usually falls short when optimizing highly sophisticated (e.g., high-dimensional) objective functions, such as the functions mapping an instruction to the performance of an LLM. This is mainly due to the limited expressive power of the Gaussian process (GP) which is used by BO as a surrogate to model the objective function. Meanwhile, it has been repeatedly shown that neural networks (NNs), especially pre-trained transformers, possess strong expressive power and can model highly complex functions. So, we adopt a neural bandit algorithm which replaces the GP in BO by an NN surrogate to optimize instructions for black-box LLMs. More importantly, the neural bandit algorithm allows us to naturally couple the NN surrogate with the hidden representation learned by a pre-trained transformer (i.e., an open-source LLM), which significantly boosts its performance. These motivate us to propose our INSTruction optimization usIng Neural bandits Coupled with Transformers (INSTINCT) algorithm. We perform instruction optimization for ChatGPT and use extensive experiments to show that INSTINCT consistently outperforms baselines in different tasks, e.g., various instruction induction tasks and the task of improving zero-shot chain-of-thought instructions. Our code is available at https://github.com/xqlin98/INSTINCT.
Submitted 23 June, 2024; v1 submitted 1 October, 2023;
originally announced October 2023.
-
AutoAgents: A Framework for Automatic Agent Generation
Authors:
Guangyao Chen,
Siwei Dong,
Yu Shu,
Ge Zhang,
Jaward Sesay,
Börje F. Karlsson,
Jie Fu,
Yemin Shi
Abstract:
Large language models (LLMs) have enabled remarkable advances in automated task-solving with multi-agent systems. However, most existing LLM-based multi-agent approaches rely on predefined agents to handle simple tasks, limiting the adaptability of multi-agent collaboration to different scenarios. Therefore, we introduce AutoAgents, an innovative framework that adaptively generates and coordinates multiple specialized agents to build an AI team according to different tasks. Specifically, AutoAgents couples the relationship between tasks and roles by dynamically generating multiple required agents based on task content and planning solutions for the current task based on the generated expert agents. Multiple specialized agents collaborate with each other to efficiently accomplish tasks. Concurrently, an observer role is incorporated into the framework to reflect on the designated plans and agents' responses and improve upon them. Our experiments on various benchmarks demonstrate that AutoAgents generates more coherent and accurate solutions than the existing multi-agent methods. This underscores the significance of assigning different roles to different tasks and of team cooperation, offering new perspectives for tackling complex tasks. The repository of this project is available at https://github.com/Link-AGI/AutoAgents.
Submitted 29 April, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
Data Distribution Bottlenecks in Grounding Language Models to Knowledge Bases
Authors:
Yiheng Shu,
Zhiwei Yu
Abstract:
Language models (LMs) have already demonstrated remarkable abilities in understanding and generating both natural and formal language. Despite these advances, their integration with real-world environments such as large-scale knowledge bases (KBs) remains an underdeveloped area, affecting applications such as semantic parsing and indulging in "hallucinated" information. This paper is an experimental investigation aimed at uncovering the robustness challenges that LMs encounter when tasked with knowledge base question answering (KBQA). The investigation covers scenarios with inconsistent data distribution between training and inference, such as generalization to unseen domains, adaptation to various language variations, and transferability across different datasets. Our comprehensive experiments reveal that even when employed with our proposed data augmentation techniques, advanced small and large language models exhibit poor performance in various dimensions. While the LM is a promising technology, the robustness of the current form in dealing with complex environments is fragile and of limited practicality because of the data distribution issue. This calls for future research on data collection and LM learning paradigms.
Submitted 9 February, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.