Hallucination Detection and Hallucination Mitigation: An Investigation

Authors: Junliang Luo, Tianyu Li, Di Wu, Michael Jenkin, Steve Liu, Gregory Dudek

Abstract: Large language models (LLMs), including ChatGPT, Bard, and Llama, have achieved remarkable successes over the last two years in a range of different applications. In spite of these successes, there exist concerns that limit the wide application of LLMs. A key problem is the problem of hallucination. Hallucination refers to the fact that in addition to correct responses, LLMs can also generate seemingly correct but factually incorrect responses. This report aims to present a comprehensive review of the current literature on both hallucination detection and hallucination mitigation. We hope that this report can serve as a good reference for both engineers and researchers who are interested in LLMs and applying them to real world tasks.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.07261 [pdf, other]

LookAhead: Preventing DeFi Attacks via Unveiling Adversarial Contracts

Authors: Shoupeng Ren, Tianyu Tu, Jian Liu, Di Wu, Kui Ren

Abstract: DeFi incidents stemming from various smart contract vulnerabilities have culminated in financial damages exceeding 3 billion USD. The attacks causing such incidents commonly commence with the deployment of adversarial contracts, subsequently leveraging these contracts to execute adversarial transactions that exploit vulnerabilities in victim contracts. Existing defense mechanisms leverage heuristic or machine learning algorithms to detect adversarial transactions, but they face significant challenges in detecting private adversarial transactions. Namely, attackers can send adversarial transactions directly to miners, evading visibility within the blockchain network and effectively bypassing the detection. In this paper, we propose a new direction for detecting DeFi attacks, i.e., detecting adversarial contracts instead of adversarial transactions, allowing us to proactively identify potential attack intentions, even if they employ private adversarial transactions. Specifically, we observe that most adversarial contracts follow a similar pattern, e.g., anonymous fund source, closed-source, frequent token-related function calls. Based on this observation, we build a machine learning classifier that can effectively distinguish adversarial contracts from benign ones. We build a dataset consisting of features extracted from 269 adversarial contracts and 13,000 benign contracts. Based on this dataset, we evaluate different classifiers, the results of which show that our method for identifying DeFi adversarial contracts performs exceptionally well. For example, the F1-Score for the LightGBM-based classifier is 0.9541, with a remarkably low false positive rate of only 0.15%.

Submitted 2 February, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

Comments: 14 pages, 11 figures

arXiv:2401.05409 [pdf, other]

Image-based Data Representations of Time Series: A Comparative Analysis in EEG Artifact Detection

Authors: Aaron Maiwald, Leon Ackermann, Maximilian Kalcher, Daniel J. Wu

Abstract: Alternative data representations are powerful tools that augment the performance of downstream models. However, there is an abundance of such representations within the machine learning toolbox, and the field lacks a comparative understanding of the suitability of each representation method. In this paper, we propose artifact detection and classification within EEG data as a testbed for profiling image-based data representations of time series data. We then evaluate eleven popular deep learning architectures on each of six commonly-used representation methods. We find that, while the choice of representation entails a choice within the tradeoff between bias and variance, certain representations are practically more effective in highlighting features which increase the signal-to-noise ratio of the data. We present our results on EEG data, and open-source our testing framework to enable future comparative analyses in this vein.

Submitted 21 December, 2023; originally announced January 2024.

Comments: 13 pages, 4 figures

arXiv:2401.04921 [pdf, other]

Diffusion-based Pose Refinement and Muti-hypothesis Generation for 3D Human Pose Estimaiton

Authors: Hongbo Kang, Yong Wang, Mengyuan Liu, Doudou Wu, Peng Liu, Xinlin Yuan, Wenming Yang

Abstract: Previous probabilistic models for 3D Human Pose Estimation (3DHPE) aimed to enhance pose accuracy by generating multiple hypotheses. However, most of the hypotheses generated deviate substantially from the true pose. Compared to deterministic models, the excessive uncertainty in probabilistic models leads to weaker performance in single-hypothesis prediction. To address these two challenges, we propose a diffusion-based refinement framework called DRPose, which refines the output of deterministic models by reverse diffusion and achieves more suitable multi-hypothesis prediction for the current pose benchmark by multi-step refinement with multiple noises. To this end, we propose a Scalable Graph Convolution Transformer (SGCT) and a Pose Refinement Module (PRM) for denoising and refining. Extensive experiments on Human3.6M and MPI-INF-3DHP datasets demonstrate that our method achieves state-of-the-art performance on both single and multi-hypothesis 3DHPE. Code is available at https://github.com/KHB1698/DRPose.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.03287 [pdf, ps, other]

Advancing Stepped Wedge Cluster Randomized Trials Analysis: Bayesian Hierarchical Penalized Spline Models for Immediate and Time-Varying Intervention Effects

Authors: Danni Wu, Hyung G. Park, Corita R. Grudzen, Keith S. Goldfeld

Abstract: Stepped wedge cluster randomized trials (SWCRTs) often face challenges with potential confounding by time trends. Traditional frequentist methods can fail to provide adequate coverage of the intervention's true effect using confidence intervals, whereas Bayesian approaches show potential for better coverage of intervention effects. However, Bayesian methods have seen limited development in SWCRTs. We propose two novel Bayesian hierarchical penalized spline models for SWCRTs. The first model is for SWCRTs involving many clusters and time periods, focusing on immediate intervention effects. To evaluate its efficacy, we compared this model to traditional frequentist methods. We further developed the model to estimate time-varying intervention effects. We conducted a comparative analysis of this Bayesian spline model against an existing Bayesian monotone effect curve model. The proposed models are applied in the Primary Palliative Care for Emergency Medicine stepped wedge trial to evaluate the effectiveness of primary palliative care intervention. Extensive simulations and a real-world application demonstrate the strengths of the proposed Bayesian models. The Bayesian immediate effect model consistently achieves near the frequentist nominal coverage probability for true intervention effect, providing more reliable interval estimations than traditional frequentist models, while maintaining high estimation accuracy. The proposed Bayesian time-varying effect model exhibits advancements over the existing Bayesian monotone effect curve model in terms of improved accuracy and reliability. To the best of our knowledge, this is the first development of Bayesian hierarchical spline modeling for SWCRTs. The proposed models offer an accurate and robust analysis of intervention effects. Their application could lead to effective adjustments in intervention strategies.

Submitted 1 February, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

arXiv:2401.02901 [pdf, other]

Charged-current non-standard neutrino interactions at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, Y. C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, X. Y. Ding, et al. (177 additional authors not shown)

Abstract: The full data set of the Daya Bay reactor neutrino experiment is used to probe the effect of the charged current non-standard interactions (CC-NSI) on neutrino oscillation experiments. Two different approaches are applied and constraints on the corresponding CC-NSI parameters are obtained with the neutrino flux taken from the Huber-Mueller model with a $5\%$ uncertainty. For the quantum mechanics-based approach (QM-NSI), the constraints on the CC-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ are extracted with and without the assumption that the effects of the new physics are the same in the production and detection processes, respectively. The approach based on the weak effective field theory (WEFT-NSI) deals with four types of CC-NSI represented by the parameters $[\varepsilon_{X}]_{eα}$. For both approaches, the results for the CC-NSI parameters are shown for cases with various fixed values of the CC-NSI and the Dirac CP-violating phases, and when they are allowed to vary freely. We find that constraints on the QM-NSI parameters $ε_{eα}$ and $ε_{eα}^{s}$ from the Daya Bay experiment alone can reach the order $\mathcal{O}(0.01)$ for the former and $\mathcal{O}(0.1)$ for the latter, while for WEFT-NSI parameters $[\varepsilon_{X}]_{eα}$, we obtain $\mathcal{O}(0.1)$ for both cases.

Submitted 19 March, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 25 pages, 16 figures, 6 tables; 36 pages, format changed, references added

arXiv:2401.02888 [pdf, other]

On the Selection of Intermediate Length Representative Periods for Capacity Expansion

Authors: Osten Anderson, Nanpeng Yu, Konstantinos Oikonomou, Di Wu

Abstract: As the decarbonization of power systems accelerates, there has been increasing interest in capacity expansion models for their role in guiding this transition. Representative period selection is an important component of capacity expansion modeling, enabling computational tractability of optimization while ensuring fidelity between the representative periods and the full year. However, little attention has been devoted to selecting representative periods longer than a single day. This prevents the capacity expansion model from directly simulating interday energy sharing, which is of key importance as energy generation becomes more variable and storage more important. To this end, we propose a novel method for selecting representative periods of any length. The method is validated using a capacity expansion model and production cost model based on California's decarbonization goals. We demonstrate that the representative period length has a substantial impact on the results of the capacity expansion investment plan.

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2401.02435 [pdf, other]

doi 10.1109/TVCG.2023.3262039

Image Collage on Arbitrary Shape via Shape-Aware Slicing and Optimization

Authors: Dong-Yi Wu, Thi-Ngoc-Hanh Le, Sheng-Yi Yao, Yun-Chen Lin, Tong-Yee Lee

Abstract: Image collage is a very useful tool for visualizing an image collection. Most of the existing methods and commercial applications for generating image collages are designed on simple shapes, such as rectangular and circular layouts. This greatly limits the use of image collages in some artistic and creative settings. Although there are some methods that can generate irregularly-shaped image collages, they often suffer from severe image overlapping and excessive blank space. This prevents such methods from being effective information communication tools. In this paper, we present a shape slicing algorithm and an optimization scheme that can create image collages of arbitrary shapes in an informative and visually pleasing manner given an input shape and an image collection. To overcome the challenge of irregular shapes, we propose a novel algorithm, called Shape-Aware Slicing, which partitions the input shape into cells based on medial axis and binary slicing tree. Shape-Aware Slicing, which is designed specifically for irregular shapes, takes human perception and shape structure into account to generate visually pleasing partitions. Then, the layout is optimized by analyzing input images with the goal of maximizing the total salient regions of the images. To evaluate our method, we conduct extensive experiments and compare our results against previous work. The evaluations show that our proposed algorithm can efficiently arrange image collections on irregular shapes and create results visually superior to those of prior work and existing commercial tools.

Submitted 17 November, 2023; originally announced January 2024.

Comments: This paper has been accepted for publication on IEEE Transactions on Visualization and Computer Graphics (TVCG), March 2023. Project website http://graphics.csie.ncku.edu.tw/shapedimagecollage

arXiv:2401.01589 [pdf, other]

The Security and Privacy of Mobile Edge Computing: An Artificial Intelligence Perspective

Authors: Cheng Wang, Zenghui Yuan, Pan Zhou, Zichuan Xu, Ruixuan Li, Dapeng Oliver Wu

Abstract: Mobile Edge Computing (MEC) is a new computing paradigm that enables cloud computing and information technology (IT) services to be delivered at the network's edge. By shifting the load of cloud computing to individual local servers, MEC helps meet the requirements of ultralow latency and localized data processing, and extends the potential of Internet of Things (IoT) for end-users. However, the crosscutting nature of MEC and the multidisciplinary components necessary for its deployment have presented additional security and privacy concerns. Fortunately, Artificial Intelligence (AI) algorithms can cope with excessively unpredictable and complex data, which offers a distinct advantage in dealing with sophisticated and developing adversaries in the security industry. Hence, in this paper we comprehensively provide a survey of security and privacy in MEC from the perspective of AI. On the one hand, we use the European Telecommunications Standards Institute (ETSI) MEC reference architecture as our base framework while merging the Software Defined Network (SDN) and Network Function Virtualization (NFV) to better illustrate a serviceable platform of MEC. On the other hand, we focus on new security and privacy issues, as well as potential solutions from the viewpoints of AI. Finally, we comprehensively discuss the opportunities and challenges associated with applying AI to MEC security and privacy as possible future research directions.

Submitted 3 January, 2024; originally announced January 2024.

Comments: Accepted at IEEE IoTJ

arXiv:2401.00897 [pdf, other]

Masked Modeling for Self-supervised Representation Learning on Vision and Beyond

Authors: Siyuan Li, Luyuan Zhang, Zedong Wang, Di Wu, Lirong Wu, Zicheng Liu, Jun Xia, Cheng Tan, Yang Liu, Baigui Sun, Stan Z. Li

Abstract: As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and the low dependence on labeled data. Among these varied self-supervised techniques, masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training. This paradigm enables deep models to learn robust representations and has demonstrated exceptional performance in the context of computer vision, natural language processing, and other modalities. In this survey, we present a comprehensive review of the masked modeling framework and its methodology. We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more. Then, we systematically investigate its wide-ranging applications across domains. Furthermore, we also explore the commonalities and differences between masked modeling methods in different fields. Toward the end of this paper, we conclude by discussing the limitations of current techniques and point out several potential avenues for advancing masked modeling research. A paper list project with this survey is available at https://github.com/Lupin1998/Awesome-MIM.

Submitted 9 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

Comments: Preprint v2 (fix typos and citations). GitHub project at https://github.com/Lupin1998/Awesome-MIM

arXiv:2401.00187 [pdf]

Anomalous size effects of effective stiffnesses in bistable counter-rotating mechanical metamaterials

Authors: Zehuan Tang, Tingfeng Ma, Boyue Su, Pengfei Kang, Bowei Wu, Hui Chen, Shuanghuizhi Li, Decai Wu, Yujie Zhang, Gen Zhao

Abstract: Counter-rotating mechanical metamaterials have previously been found to have anomalous characteristics or functions such as auxetic effects, shape changers, and soliton transports, which are all under monostable conditions. The properties of counter-rotating mechanical metamaterials under bistable conditions have not yet been explored. Here, we found that for a bistable counter-rotating metamaterial chain, the effective stiffnesses of the two steady states are different in the chain with even-numbered nodes. For the chain with odd-numbered nodes, the effective stiffnesses corresponding to the two steady states are exactly the same. This special property is not characterized by the characteristic attenuation lengths of the underlying mechanism, but depends on the different symmetries of the underlying mechanism of the chains with odd and even nodes. In addition, the relationship between the abnormal non-monotonic size effect and equilibrium angle is clarified. More interestingly, for one-dimensional chains with even-numbered nodes, the size effect of effective stiffness bifurcates at a specific equilibrium angle, and the corresponding mechanisms are revealed.

Submitted 30 December, 2023; originally announced January 2024.

arXiv:2312.17346 [pdf, other]

STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction

Authors: Dennis Wu, Jerry Yao-Chieh Hu, Weijian Li, Bo-Yu Chen, Han Liu

Abstract: We present STanHop-Net (Sparse Tandem Hopfield Network) for multivariate time series prediction with memory-enhanced capabilities. At the heart of our approach is STanHop, a novel Hopfield-based neural network block, which sparsely learns and stores both temporal and cross-series representations in a data-dependent fashion. In essence, STanHop sequentially learns temporal representation and cross-series representation using two tandem sparse Hopfield layers. In addition, STanHop incorporates two additional external memory modules: a Plug-and-Play module and a Tune-and-Play module for train-less and task-aware memory-enhancements, respectively. They allow STanHop-Net to swiftly respond to certain sudden events. Methodologically, we construct the STanHop-Net by stacking STanHop blocks in a hierarchical fashion, enabling multi-resolution feature extraction with resolution-specific sparsity. Theoretically, we introduce a sparse extension of the modern Hopfield model (Generalized Sparse Modern Hopfield Model) and show that it endows a tighter memory retrieval error compared to the dense counterpart without sacrificing memory capacity. Empirically, we validate the efficacy of our framework on both synthetic and real-world settings.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.17071 [pdf, other]

SCTNet: Single-Branch CNN with Transformer Semantic Information for Real-Time Segmentation

Authors: Zhengze Xu, Dongyue Wu, Changqian Yu, Xiangxiang Chu, Nong Sang, Changxin Gao

Abstract: Recent real-time semantic segmentation methods usually adopt an additional semantic branch to pursue rich long-range context. However, the additional branch incurs undesirable computational overhead and slows inference speed. To eliminate this dilemma, we propose SCTNet, a single branch CNN with transformer semantic information for real-time segmentation. SCTNet enjoys the rich semantic representations of an inference-free semantic branch while retaining the high efficiency of a lightweight single branch CNN. SCTNet utilizes a transformer as the training-only semantic branch considering its superb ability to extract long-range context. With the help of the proposed transformer-like CNN block CFBlock and the semantic information alignment module, SCTNet could capture the rich semantic information from the transformer branch in training. During the inference, only the single branch CNN needs to be deployed. We conduct extensive experiments on Cityscapes, ADE20K, and COCO-Stuff-10K, and the results show that our method achieves the new state-of-the-art performance. The code and model are available at https://github.com/xzz777/SCTNet

Submitted 15 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024; typos corrected; code and models have been released at https://github.com/xzz777/SCTNet

arXiv:2312.15603 [pdf, other]

A Split-and-Privatize Framework for Large Language Model Fine-Tuning

Authors: Xicong Shen, Yang Liu, Huiqi Liu, Jue Hong, Bing Duan, Zirui Huang, Yunlong Mao, Ye Wu, Di Wu

Abstract: Fine-tuning is a prominent technique to adapt a pre-trained language model to downstream scenarios. In parameter-efficient fine-tuning, only a small subset of modules are trained over the downstream datasets, while leaving the rest of the pre-trained model frozen to save computation resources. In recent years, a popular productization form arises as Model-as-a-Service (MaaS), in which vendors provide abundant pre-trained language models, server resources and core functions, and customers can fine-tune, deploy and invoke their customized model by accessing the one-stop MaaS with their own private dataset. In this paper, we identify the model and data privacy leakage risks in MaaS fine-tuning, and propose a Split-and-Privatize (SAP) framework, which mitigates the privacy issues by adapting the existing split learning architecture. The proposed SAP framework is sufficiently investigated by experiments, and the results indicate that it can enhance the empirical privacy by 62% at the cost of 1% model performance degradation on the Stanford Sentiment Treebank dataset.

Submitted 24 December, 2023; originally announced December 2023.

arXiv:2312.15320 [pdf]

GestaltMML: Enhancing Rare Genetic Disease Diagnosis through Multimodal Machine Learning Combining Facial Images and Clinical Texts

Authors: Da Wu, Jingye Yang, Cong Liu, Tzung-Chien Hsieh, Elaine Marchi, Justin Blair, Peter Krawitz, Chunhua Weng, Wendy Chung, Gholson J. Lyon, Ian D. Krantz, Jennifer M. Kalish, Kai Wang

Abstract: Individuals with suspected rare genetic disorders often undergo multiple clinical evaluations, imaging studies, laboratory tests and genetic tests, to find a possible answer over a prolonged period of time. Addressing this "diagnostic odyssey" thus has substantial clinical, psychosocial, and economic benefits. Many rare genetic diseases have distinctive facial features, which can be used by artificial intelligence algorithms to facilitate clinical diagnosis, in prioritizing candidate diseases to be further examined by lab tests or genetic assays, or in helping the phenotype-driven reinterpretation of genome/exome sequencing data. Existing methods using frontal facial photos were built on conventional Convolutional Neural Networks (CNNs), rely exclusively on facial images, and cannot capture non-facial phenotypic traits and demographic information essential for guiding accurate diagnoses. Here we introduce GestaltMML, a multimodal machine learning (MML) approach solely based on the Transformer architecture. It integrates facial images, demographic information (age, sex, ethnicity), and clinical notes (optionally, a list of Human Phenotype Ontology terms) to improve prediction accuracy. Furthermore, we also evaluated GestaltMML on a diverse range of datasets, including 528 diseases from the GestaltMatcher Database, several in-house datasets of Beckwith-Wiedemann syndrome (BWS, over-growth syndrome with distinct facial features), Sotos syndrome (overgrowth syndrome with overlapping features with BWS), NAA10-related neurodevelopmental syndrome, Cornelia de Lange syndrome (multiple malformation syndrome), and KBG syndrome (multiple malformation syndrome). Our results suggest that GestaltMML effectively incorporates multiple modalities of data, greatly narrowing candidate genetic diagnoses of rare diseases and may facilitate the reinterpretation of genome/exome sequencing data.

Submitted 21 April, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

Comments: Significant revisions

arXiv:2312.14409 [pdf, other]

Probing scalar induced gravitational waves with PTA and LISA: The Importance of third order correction

Authors: Zhe Chang, Yu-Ting Kuang, Di Wu, Jing-Zhi Zhou

Abstract: We revisit the calculation of third order scalar induced gravitational waves (SIGWs) and extend it from a monochromatic primordial power spectrum to a more general log-normal one. We investigate the impact of third order SIGWs on the signal-to-noise ratio (SNR) of Laser Interferometer Space Antenna (LISA) and pulsar timing array (PTA) observations, and find that third order SIGWs significantly contribute to the total energy density spectrum of gravitational waves (GWs) in the high-frequency region. For a primordial power spectrum amplitude of $A_ζ=10^{-2}\sim 10^{-1}$, the effects of third order SIGWs lead to a $40\%$ to $400\%$ increase in the SNR for LISA. Additionally, our PTA data analysis reveals that third order SIGWs diminish both the amplitude $A_ζ$ and the peak frequency $f_*$ of the primordial power spectrum.

Submitted 26 February, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.11863 [pdf, other]

Neural Network Approximation for Pessimistic Offline Reinforcement Learning

Authors: Di Wu, Yuling Jiao, Li Shen, Haizhao Yang, Xiliang Lu

Abstract: Deep reinforcement learning (RL) has shown remarkable success in specific offline decision-making scenarios, yet its theoretical guarantees are still under development. Existing works on offline RL theory primarily emphasize a few trivial settings, such as linear MDP or general function approximation with strong assumptions and independent data, which lack guidance for practical use. The coupling of deep learning and Bellman residuals makes this problem challenging, in addition to the difficulty of data dependence. In this paper, we establish a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation with $\mathcal{C}$-mixing data regarding the structure of networks, the dimension of datasets, and the concentrability of data coverage, under mild assumptions. Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight. This result demonstrates the explicit efficiency of deep adversarial offline RL frameworks. We utilize the empirical process tool for $\mathcal{C}$-mixing sequences and the neural network approximation theory for the Hölder class to achieve this. We also develop methods to bound the Bellman estimation error caused by function approximation with empirical Bellman constraint perturbations. Additionally, we present a result that lessens the curse of dimensionality using data with low intrinsic dimensionality and function classes with low complexity. Our estimation provides valuable insights into the development of deep offline RL and guidance for algorithm model design.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Full version of the paper accepted to the 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.10317 [pdf, other]

Spatial-Temporal DAG Convolutional Networks for End-to-End Joint Effective Connectivity Learning and Resting-State fMRI Classification

Authors: Rui Yang, Wenrui Dai, Huajun She, Yiping P. Du, Dapeng Wu, Hongkai Xiong

Abstract: Building comprehensive brain connectomes has proved of fundamental importance in resting-state fMRI (rs-fMRI) analysis. Based on the foundation of brain network, spatial-temporal-based graph convolutional networks have dramatically improved the performance of deep learning methods in rs-fMRI time series classification. However, existing works either pre-define the brain network as the correlation matrix derived from the raw time series or jointly learn the connectome and model parameters without any topology constraint. These methods could suffer from degraded classification performance caused by the deviation from the intrinsic brain connectivity and lack biological interpretability of demonstrating the causal structure (i.e., effective connectivity) among brain regions. Moreover, most existing methods for effective connectivity learning are unaware of the downstream classification task and cannot sufficiently exploit useful rs-fMRI label information. To address these issues in an end-to-end manner, we model the brain network as a directed acyclic graph (DAG) to discover direct causal connections between brain regions and propose Spatial-Temporal DAG Convolutional Network (ST-DAGCN) to jointly infer effective connectivity and classify rs-fMRI time series by learning brain representations based on nonlinear structural equation model. The optimization problem is formulated into a continuous program and solved with score-based learning method via gradient descent. We evaluate ST-DAGCN on two public rs-fMRI databases. Experiments show that ST-DAGCN outperforms existing models by evident margins in rs-fMRI classification and simultaneously learns meaningful edges of effective connectivity that help understand brain activity patterns and pathological mechanisms in brain disease.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by NeurIPS 2023 Temporal Graph Learning Workshop

arXiv:2312.10310 [pdf, other]

scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data

Authors: Rui Yang, Wenrui Dai, Chenglin Li, Junni Zou, Dapeng Wu, Hongkai Xiong

Abstract: Single-cell RNA sequencing (scRNA-seq) technology provides high-throughput gene expression data to study the cellular heterogeneity and dynamics of complex organisms. Graph neural networks (GNNs) have been widely used for automatic cell type classification, which is a fundamental problem to solve in scRNA-seq analysis. However, existing methods do not sufficiently exploit both gene-gene and cell-cell relationships, and thus the true potential of GNNs is not realized. In this work, we propose a bilevel graph representation learning method, named scBiGNN, to simultaneously mine the relationships at both gene and cell levels for more accurate single-cell classification. Specifically, scBiGNN comprises two GNN modules to identify cell types. A gene-level GNN is established to adaptively learn gene-gene interactions and cell representations via the self-attention mechanism, and a cell-level GNN builds on the cell-cell graph that is constructed from the cell representations generated by the gene-level GNN. To tackle the scalability issue for processing a large number of cells, scBiGNN adopts an Expectation Maximization (EM) framework in which the two modules are alternately trained via the E-step and M-step to learn from each other. Through this interaction, the gene- and cell-level structural information is integrated to gradually enhance the classification performance of both GNN modules. Experiments on benchmark datasets demonstrate that our scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by NeurIPS 2023 AI for Science Workshop

arXiv:2312.07795 [pdf, other]

Traffic Signal Control Using Lightweight Transformers: An Offline-to-Online RL Approach

Authors: Xingshuai Huang, Di Wu, Benoit Boulet

Abstract: Efficient traffic signal control is critical for reducing traffic congestion and improving overall transportation efficiency. The dynamic nature of traffic flow has prompted researchers to explore Reinforcement Learning (RL) for traffic signal control (TSC). Compared with traditional methods, RL-based solutions have shown preferable performance. However, the application of RL-based traffic signal controllers in the real world is limited by the low sample efficiency and high computational requirements of these solutions. In this work, we propose DTLight, a simple yet powerful lightweight Decision Transformer-based TSC method that can learn policy from easily accessible offline datasets. DTLight novelly leverages knowledge distillation to learn a lightweight controller from a well-trained larger teacher model to reduce implementation computation. Additionally, it integrates adapter modules to mitigate the expenses associated with fine-tuning, which makes DTLight practical for online adaptation with minimal computation and only a few fine-tuning steps during real deployment. Moreover, DTLight is further enhanced to be more applicable to real-world TSC problems. Extensive experiments on synthetic and real-world scenarios show that DTLight pre-trained purely on offline datasets can outperform state-of-the-art online RL-based methods in most scenarios. Experiment results also show that online fine-tuning further improves the performance of DTLight by up to 42.6% over the best online RL baseline methods. In this work, we also introduce Datasets specifically designed for TSC with offline RL (referred to as DTRL). Our datasets and code are publicly available.

Submitted 12 December, 2023; originally announced December 2023.
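
Note: the knowledge distillation mentioned in the DTLight abstract above can be illustrated with a generic teacher-student loss. The sketch below is the standard temperature-softened KL formulation, not necessarily the loss used in the paper; the function names, logits, and temperature are hypothetical values chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic distillation loss (assumption: not taken from the DTLight paper).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical action logits for a four-phase signal controller.
teacher = [2.0, 0.5, -1.0, 0.1]
student = [1.2, 0.7, -0.3, 0.0]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's softened distribution and positive otherwise, so minimizing it pulls the smaller model toward the teacher's action preferences.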

arXiv:2312.06655 [pdf, other]

Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Authors: Fangfu Liu, Diankun Wu, Yi Wei, Yongming Rao, Yueqi Duan

Abstract: Recently, 3D content creation from text prompts has demonstrated remarkable progress by utilizing 2D and 3D diffusion models. While 3D diffusion models ensure great multi-view consistency, their ability to generate high-quality and diverse 3D assets is hindered by the limited 3D data. In contrast, 2D diffusion models find a distillation approach that achieves excellent generalization and rich details without any 3D data. However, 2D lifting methods suffer from inherent view-agnostic ambiguity thereby leading to serious multi-face Janus issues, where text prompts fail to provide sufficient guidance to learn coherent 3D results. Instead of retraining a costly viewpoint-aware model, we study how to fully exploit easily accessible coarse 3D knowledge to enhance the prompts and guide 2D lifting optimization for refinement. In this paper, we propose Sherpa3D, a new text-to-3D framework that achieves high-fidelity, generalizability, and geometric consistency simultaneously. Specifically, we design a pair of guiding strategies derived from the coarse 3D prior generated by the 3D diffusion model: a structural guidance for geometric fidelity and a semantic guidance for 3D coherence. Employing the two types of guidance, the 2D diffusion model enriches the 3D content with diversified and high-quality results. Extensive experiments show the superiority of our Sherpa3D over the state-of-the-art text-to-3D methods in terms of quality and 3D consistency.

Submitted 11 December, 2023; originally announced December 2023.

Comments: Project page: https://liuff19.github.io/Sherpa3D/

arXiv:2312.06234 [pdf]

Delivery of nanosecond laser pulses by multimode anti-resonant hollow core fiber at 1 um wavelength

Authors: Meng Zhao, Fei Yu, Dakun Wu, Xinyue Zhu, Si Chen, Meng Wang, MinZhe Liu, Kun Zhao, RuiZhan Zhai, Zhongqing Jia, Jonathan Knight

Abstract: In this paper we explore the application of low-loss multimode anti-resonant hollow-core fiber (MM-AR-HCF) in the delivery of nanosecond laser pulses at 1 um wavelength. MM-AR-HCF with a large core offers a rich content of low-loss higher-order modes, which plays a key role in the efficient coupling and transmission of high-power laser light of degraded beam quality. In the experiment, laser pulses with an average pulse energy of 21.8 mJ and a 14.6 ns pulse width (corresponding to a peak power of 1.49 MW) are transmitted through 9.8 m of MM-AR-HCF without damage. Up to 94 % coupling efficiency is achieved where the incident laser beam has a degraded beam quality, with beam quality factors of 2.18 and 1.99 respectively. The laser-induced damage threshold (LIDT) of MM-AR-HCF measures 22.6 mJ for 94 % coupling efficiency, which is 7 times higher than that of multimode silica optical fiber with a core diameter of 200 um.

Submitted 11 December, 2023; originally announced December 2023.

Comments: 8 pages, 9 figures

arXiv:2312.04087 [pdf, other]

VRPTEST: Evaluating Visual Referring Prompting in Large Multimodal Models

Authors: Zongjie Li, Chaozheng Wang, Chaowei Liu, Pingchuan Ma, Daoyuan Wu, Shuai Wang, Cuiyun Gao

Abstract: With recent advancements in Large Multimodal Models (LMMs) across various domains, a novel prompting method called visual referring prompting has emerged, showing significant potential in enhancing human-computer interaction within multimodal systems. This method offers a more natural and flexible approach to human interaction with these systems compared to traditional text descriptions or coordinates. However, the categorization of visual referring prompting remains undefined, and its impact on the performance of LMMs has yet to be formally examined. In this study, we conduct the first comprehensive analysis of LMMs using a variety of visual referring prompting strategies. We introduce a benchmark dataset called VRPTEST, comprising 3 different visual tasks and 2,275 images, spanning diverse combinations of prompt strategies. Using VRPTEST, we conduct a comprehensive evaluation of eight versions of prominent open-source and proprietary foundation models, including two early versions of GPT-4V. We develop an automated assessment framework based on software metamorphic testing techniques to evaluate the accuracy of LMMs without the need for human intervention or manual labeling. We find that the current proprietary models generally outperform the open-source ones, showing an average accuracy improvement of 22.70%; however, there is still potential for improvement. Moreover, our quantitative analysis shows that the choice of prompt strategy significantly affects the accuracy of LMMs, with variations ranging from -17.5% to +7.3%. Further case studies indicate that an appropriate visual referring prompting strategy can improve LMMs' understanding of context and location information, while an unsuitable one might lead to answer rejection. We also provide insights on minimizing the negative impact of visual referring prompting on LMMs.

Submitted 7 December, 2023; originally announced December 2023.

Comments: 13 pages

arXiv:2312.04054 [pdf, other]

Queueing Delay Minimization in Overloaded Networks via Rate Control

Authors: Xinyu Wu, Dan Wu, Eytan Modiano

Abstract: We develop link rate control policies to minimize the queueing delay of packets in overloaded networks. We show that increasing link rates does not guarantee delay reduction during overload. We consider a fluid queueing model that facilitates explicit characterization of the queueing delay of packets, and establish explicit conditions on link rates that can minimize the average and maximum queueing delay in both single-hop and multi-stage (switching) networks. These min-delay conditions require maintaining an identical ratio between the ingress and egress rates of different nodes at the same layer of the network. We term the policies that follow these conditions rate-proportional policies. We further generalize the rate-proportional policies to queue-proportional policies, which minimize the queueing delay asymptotically based on the time-varying queue length while remaining agnostic of packet arrival rates. We validate that the proposed policies lead to minimum queueing delay under various network topologies and settings, compared with benchmarks including the backpressure policy that maximizes network throughput and the max-link-rate policy that fully utilizes bandwidth. We further remark that the explicit min-delay policy design in multi-stage networks facilitates co-optimization with other metrics, such as minimizing total bandwidth, balancing link utilization and node buffer usage. This demonstrates the wider utility of our main results in data center network optimization in practice.

Submitted 7 December, 2023; originally announced December 2023.
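
Note: the rate-proportional condition stated in the abstract above (an identical ratio between ingress and egress rates for nodes at the same layer) can be made concrete with a small sketch. This is only an illustration of what "identical ratio" means for a single overloaded layer, not the paper's policy; the helper names, the shared egress budget, and all numbers are hypothetical.

```python
def rate_proportional_egress(ingress_rates, total_egress_capacity):
    """Split a shared egress budget so every node has the same ingress/egress ratio.

    Hypothetical helper for illustration; not the paper's algorithm.
    """
    total_ingress = sum(ingress_rates)
    if total_ingress <= 0:
        raise ValueError("total ingress rate must be positive")
    return [total_egress_capacity * r / total_ingress for r in ingress_rates]


def fluid_queue_growth(ingress_rates, egress_rates):
    """Fluid-model backlog growth per node: max(ingress - egress, 0)."""
    return [max(a - m, 0.0) for a, m in zip(ingress_rates, egress_rates)]


ingress = [6.0, 3.0, 1.0]  # assumed arrival rates (packets/s)
capacity = 5.0             # assumed shared egress budget; 10 > 5, so the layer is overloaded
egress = rate_proportional_egress(ingress, capacity)
ratios = [a / m for a, m in zip(ingress, egress)]
growth = fluid_queue_growth(ingress, egress)
```

With these assumed numbers every node ends up with an ingress-to-egress ratio of 2, so backlog builds at each node in proportion to its arrival rate rather than concentrating at one node.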

arXiv:2312.03633 [pdf]

Exploring the Reversal Curse and Other Deductive Logical Reasoning in BERT and GPT-Based Large Language Models

Authors: Da Wu, Jingye Yang, Kai Wang

Abstract: The term "Reversal Curse" refers to the scenario where auto-regressive decoder large language models (LLMs), such as ChatGPT, trained on "A is B" fail to learn "B is A," assuming that B and A are distinct and can be uniquely identified from each other, demonstrating a basic failure of logical deduction. This raises a red flag in the use of GPT models for certain general tasks such as constructing knowledge graphs, considering their adherence to this symmetric principle. In our study, we examined a bidirectional LLM, BERT, and found that it is immune to the reversal curse. Driven by ongoing efforts to construct biomedical knowledge graphs with LLMs, we also embarked on evaluating more complex but essential deductive reasoning capabilities. This process included first training encoder and decoder language models to master the intersection and union operations on two sets and then moving on to assess their capability to infer different combinations of union and intersection operations on three newly created sets. The findings showed that while both encoder and decoder language models, trained for tasks involving two sets (union/intersection), were proficient in such scenarios, they encountered difficulties when dealing with operations that included three sets (various combinations of union and intersection). Our research highlights the distinct characteristics of encoder and decoder models in simple and complex logical reasoning. In practice, the choice between BERT and GPT should be guided by the specific requirements and nature of the task at hand, leveraging their respective strengths in bidirectional context comprehension and sequence prediction.

Submitted 1 July, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Final revision. To appear in Patterns
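
Note: the three-set evaluation described in the abstract above asks models to infer combinations of union and intersection. As a minimal sketch of what the ground-truth answers for such combinations look like, the snippet below enumerates every `(A op1 B) op2 C` result with Python's built-in sets. The set contents and the helper name are made up for illustration and are not the paper's data or code.

```python
import operator
from itertools import product

# Map operation names to Python's set operators.
OPS = {"union": operator.or_, "intersection": operator.and_}


def three_set_targets(a, b, c):
    """Ground-truth results of (a op1 b) op2 c for every union/intersection pair."""
    return {
        (name1, name2): OPS[name2](OPS[name1](a, b), c)
        for name1, name2 in product(OPS, repeat=2)
    }


# Hypothetical sets; any hashable items would work.
A = {1, 2, 3}
B = {2, 3, 4}
C = {3, 4, 5}
targets = three_set_targets(A, B, C)
```

For these sets, `targets[("union", "intersection")]` is `{3, 4}` and `targets[("intersection", "union")]` is `{2, 3, 4, 5}`, which shows that the order of operations changes the answer a model would have to produce.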

arXiv:2312.03277 [pdf, other]

Anomaly Detection for Scalable Task Grouping in Reinforcement Learning-based RAN Optimization

Authors: Jimmy Li, Igor Kozlov, Di Wu, Xue Liu, Gregory Dudek

Abstract: The use of learning-based methods for optimizing cellular radio access networks (RAN) has received increasing attention in recent years. This coincides with a rapid increase in the number of cell sites worldwide, driven largely by dramatic growth in cellular network traffic. Training and maintaining learned models that work well across a large number of cell sites has thus become a pertinent problem. This paper proposes a scalable framework for constructing a reinforcement learning policy bank that can perform RAN optimization across a large number of cell sites with varying traffic patterns. Central to our framework is a novel application of anomaly detection techniques to assess the compatibility between sites (tasks) and the policy bank. This allows our framework to intelligently identify when a policy can be reused for a task, and when a new policy needs to be trained and added to the policy bank. Our results show that our approach to compatibility assessment leads to an efficient use of computational resources, by allowing us to construct a performant policy bank without exhaustively training on all tasks, which makes it applicable under real-world constraints.

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2312.03226 [pdf, other]

doi 10.1109/TIP.2023.3341332

Rethinking Object Saliency Ranking: A Novel Whole-flow Processing Paradigm

Authors: Mengke Song, Linfeng Li, Dunquan Wu, Wenfeng Song, Chenglizhao Chen

Abstract: Existing salient object detection methods are capable of predicting binary maps that highlight visually salient regions. However, these methods are limited in their ability to differentiate the relative importance of multiple objects and the relationships among them, which can lead to errors and reduced accuracy in downstream tasks that depend on the relative importance of multiple objects. To overcome this limitation, this paper proposes a new paradigm for saliency ranking, which aims to focus completely on ranking salient objects by their "importance order". While previous works have shown promising performance, they still face ill-posed problems. First, the methods for generating saliency ranking ground truth (GT) orders are unreasonable, since determining the correct ranking order is not well-defined, resulting in false alarms. Second, training a ranking model remains challenging because most saliency ranking methods follow the multi-task paradigm, leading to conflicts and trade-offs among different tasks. Third, existing regression-based saliency ranking methods are complex for saliency ranking models due to their reliance on instance mask-based saliency ranking orders. These methods require a significant amount of data to perform accurately and can be challenging to implement effectively. To solve these problems, this paper conducts an in-depth analysis of the causes and proposes a whole-flow processing paradigm for the saliency ranking task from the perspective of "GT data generation", "network structure design" and "training protocol". The proposed approach outperforms existing state-of-the-art methods on the widely-used SALICON set, as demonstrated by extensive experiments with fair and reasonable comparisons. The saliency ranking task is still in its infancy, and our proposed unified framework can serve as a fundamental strategy to guide future work.

Submitted 5 December, 2023; originally announced December 2023.

Comments: 16 pages, 14 figures, accepted by IEEE Transactions on Image Processing

arXiv:2312.00746 [pdf, other]

Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games

Authors: Dekun Wu, Haochen Shi, Zhiyuan Sun, Bang Liu

Abstract: In this study, we explore the application of Large Language Models (LLMs) in Jubensha, a Chinese detective role-playing game and a novel area in Artificial Intelligence (AI) driven gaming. We introduce the first dataset specifically for Jubensha, including character scripts and game rules, to foster AI agent development in this complex narrative environment. Our work also presents a unique multi-agent interaction framework using LLMs, allowing AI agents to autonomously engage in this game. To evaluate the gaming performance of these AI agents, we developed novel methods measuring their mastery of case information and reasoning skills. Furthermore, we incorporated the latest advancements in in-context learning to improve the agents' performance in information gathering, murderer identification, and logical reasoning. The experimental results validate the effectiveness of our proposed methods. This work aims to offer a novel perspective on understanding LLM capabilities and establish a new benchmark for evaluating large language model-based agents.

Submitted 29 February, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

ACM Class: I.2.0; I.2.1; I.2.7

arXiv:2312.00589 [pdf, other]

Merlin: Empowering Multimodal LLMs with Foresight Minds

Authors: En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao

Abstract: Humans possess the remarkable ability to foresee the future to a certain extent based on present observations, a skill we term foresight minds. However, this capability remains largely underexplored within existing Multimodal Large Language Models (MLLMs), hindering their capacity to learn the fundamental principles of how things operate and the intentions behind the observed subjects. To address this issue, we introduce the integration of future modeling into the existing learning frameworks of MLLMs. By utilizing the subject trajectory, a highly structured representation of a consecutive frame sequence, as a learning objective, we aim to bridge the gap between the past and the future. We propose two innovative methods to empower MLLMs with foresight minds, Foresight Pre-Training (FPT) and Foresight Instruction-Tuning (FIT), which are inspired by the modern learning paradigm of LLMs. Specifically, FPT jointly trains various tasks centered on trajectories, enabling MLLMs to learn how to attend to and predict entire trajectories from a given initial observation. Then, FIT requires MLLMs to first predict trajectories of related objects and then reason about potential future events based on them. Aided by FPT and FIT, we build a novel and unified MLLM named Merlin that supports multi-image input and analysis of the potential actions of multiple objects for future reasoning. Experimental results show that Merlin exhibits powerful foresight minds, with impressive performance on both future reasoning and visual comprehension tasks.

Submitted 3 July, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

Comments: Accepted by ECCV2024. Project page: https://ahnsun.github.io/merlin

arXiv:2312.00324 [pdf, other]

Machine Learning for Actionable Warning Identification: A Comprehensive Survey

Authors: Xiuting Ge, Chunrong Fang, Xuanye Li, Weisong Sun, Daoyuan Wu, Juan Zhai, Shangwei Lin, Zhihong Zhao, Yang Liu, Zhenyu Chen

Abstract: Actionable Warning Identification (AWI) plays a crucial role in improving the usability of static code analyzers. With recent advances in Machine Learning (ML), various approaches have been proposed to incorporate ML techniques into AWI. These ML-based AWI approaches, benefiting from ML's strong ability to learn subtle and previously unseen patterns from historical data, have demonstrated superior performance. However, a comprehensive overview of these approaches is missing, which could hinder researchers/practitioners from understanding the current process and discovering potential for future improvement in the ML-based AWI community. In this paper, we systematically review the state-of-the-art ML-based AWI approaches. First, we employ a meticulous survey methodology and gather 50 primary studies from 2000/01/01 to 2023/09/01. Then, we outline the typical ML-based AWI workflow, including warning dataset preparation, preprocessing, AWI model construction, and evaluation stages. In such a workflow, we categorize ML-based AWI approaches based on the warning output format. Besides, we analyze the techniques used in each stage, along with their strengths, weaknesses, and distribution. Finally, we provide practical research directions for future ML-based AWI approaches, focusing on aspects like data improvement (e.g., enhancing the warning labeling strategy) and model exploration (e.g., exploring large language models for AWI).

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.17819 [pdf, ps, other]

Weak Solar Radio Bursts from the Solar Wind Acceleration Region Observed by Parker Solar Probe and Its Probable Emission Mechanism

Authors: Ling Chen, Bing Ma, Dejin Wu, Xiaowei Zhou, Marc Pulupa, PeiJin Zhang, Pietro Zucca, Stuart D. Bale, Justin C. Kasper, SuPing Duan

Abstract: The Parker Solar Probe (PSP) provides unprecedentedly close observations of the Sun, and hence the possibility of directly understanding the "elementary process" that occurs at the kinetic scale of collective particle interactions in solar coronal plasmas. We report a kind of weak solar radio burst (SRB) detected by PSP when it passed a low-density magnetic channel during its second encounter phase. These weak SRBs have a low starting frequency of $\sim 20$ MHz and a narrow frequency range from a few tens of MHz to a few hundred kHz. Their dynamic spectra display a strongly evolving feature of the intermediate relative drift rate decreasing rapidly from above 0.01/s to below 0.01/s. Analyses based on common empirical models of solar coronal plasmas indicate that these weak SRBs originate from the heliocentric distance $\sim 1.1-6.1~R_S$ (the solar radius), a typical solar wind acceleration region with a low-$β$ plasma, and indicate that their sources have a typical motion velocity $\sim v_A$ (Alfvén velocity), obviously lower than that of the fast electrons required to effectively excite SRBs. We propose that solitary kinetic Alfvén waves with kinetic scales can be responsible for the generation of these small-scale weak SRBs, called solitary wave radiation (SWR).

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.17354 [pdf]

A natural language processing-based approach: mapping human perception by understanding deep semantic features in street view images

Authors: Haoran Ma, Dongdong Wu

Abstract: In the past decade, using Street View images and machine learning to measure human perception has become a mainstream research approach in urban science. However, this approach, which uses only shallow image information, makes it difficult to comprehensively understand the deep semantic features of human perception of a scene. In this study, we proposed a new framework based on a pre-trained natural language model to understand the relationship between human perception and the sense of a scene. Firstly, Place Pulse 2.0 was used as our base dataset, which contains a variety of human-perceived labels, namely, beautiful, safe, wealthy, depressing, boring, and lively. An image captioning network was used to extract the description information of each street view image. Secondly, a pre-trained BERT model was fine-tuned, with a regression function added for the six human perceptual dimensions. Furthermore, we compared the performance of five traditional regression methods with our approach and conducted a migration experiment in Hong Kong. Our results show that human perception scoring by deep semantic features performed better than previous studies by machine learning methods with shallow features. The use of deep scene semantic features provides new ideas for subsequent human perception research, as well as better explanatory power in the face of spatial heterogeneity.

Submitted 29 November, 2023; originally announced November 2023.

Comments: 11 pages, 8 figures

arXiv:2311.15772 [pdf, other]

Attend Who is Weak: Enhancing Graph Condensation via Cross-Free Adversarial Training

Authors: Xinglin Li, Kun Wang, Hanhui Deng, Yuxuan Liang, Di Wu

Abstract: In this paper, we study the \textit{graph condensation} problem by compressing the large, complex graph into a concise, synthetic representation that preserves the most essential and discriminative information of structure and features. We seminally propose the concept of Shock Absorber (a type of perturbation) that enhances the robustness and stability of the original graphs against changes in an adversarial training fashion. Concretely, (I) we forcibly match the gradients between pre-selected graph neural networks (GNNs) trained on a synthetic, simplified graph and the original training graph at regularly spaced intervals. (II) Before each update synthetic graph point, a Shock Absorber serves as a gradient attacker to maximize the distance between the synthetic dataset and the original graph by selectively perturbing the parts that are underrepresented or insufficiently informative. We iteratively repeat the above two processes (I and II) in an adversarial training fashion to maintain the highly-informative context without losing correlation with the original dataset. More importantly, our shock absorber and the synthesized graph parallelly share the backward process in a free training manner. Compared to the original adversarial training, it introduces almost no additional time overhead. We validate our framework across 8 datasets (3 graph and 5 node classification datasets) and achieve prominent results: for example, on Cora, Citeseer and Ogbn-Arxiv, we can gain nearly 1.13% to 5.03% improvements compared with SOTA models. Moreover, our algorithm adds only about 0.2% to 2.2% additional time overhead over Flicker, Citeseer and Ogbn-Arxiv. Compared to the general adversarial training, our approach improves time efficiency by nearly 4-fold.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.15214 [pdf, other]

doi 10.1109/TPAMI.2023.3279394

A Novel Normalized-Cut Solver with Nearest Neighbor Hierarchical Initialization

Authors: Feiping Nie, Jitao Lu, Danyang Wu, Rong Wang, Xuelong Li

Abstract: Normalized-Cut (N-Cut) is a famous model of spectral clustering. The traditional N-Cut solvers are two-stage: 1) calculating the continuous spectral embedding of normalized Laplacian matrix; 2) discretization via $K$-means or spectral rotation. However, this paradigm brings two vital problems: 1) two-stage methods solve a relaxed version of the original problem, so they cannot obtain good solutions for the original N-Cut problem; 2) solving the relaxed problem requires eigenvalue decomposition, which has $\mathcal{O}(n^3)$ time complexity ($n$ is the number of nodes). To address the problems, we propose a novel N-Cut solver designed based on the famous coordinate descent method. Since the vanilla coordinate descent method also has $\mathcal{O}(n^3)$ time complexity, we design various accelerating strategies to reduce the time complexity to $\mathcal{O}(|E|)$ ($|E|$ is the number of edges). To avoid reliance on random initialization which brings uncertainties to clustering, we propose an efficient initialization method that gives deterministic outputs. Extensive experiments on several benchmark datasets demonstrate that the proposed solver can obtain larger objective values of N-Cut, meanwhile achieving better clustering performance compared to traditional solvers.

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.15031 [pdf, other]

Robust and Efficient Semi-supervised Learning for Ising Model

Authors: Daiqing Wu, Molei Liu

Abstract: In biomedical studies, it is often desirable to characterize the interactive mode of multiple disease outcomes beyond their marginal risk. Ising model is one of the most popular choices serving for this purpose. Nevertheless, learning efficiency of Ising models can be impeded by the scarcity of accurate disease labels, which is a prominent problem in contemporary studies driven by electronic health records (EHR). Semi-supervised learning (SSL) leverages the large unlabeled sample with auxiliary EHR features to assist the learning with labeled data only and is a potential solution to this issue. In this paper, we develop a novel SSL method for efficient inference of Ising model. Our method first models the outcomes against the auxiliary features, then uses it to project the score function of the supervised estimator onto the EHR features, and incorporates the unlabeled sample to augment the supervised estimator for variance reduction without introducing bias. For the key step of conditional modeling, we propose strategies that can effectively leverage the auxiliary EHR information while maintaining moderate model complexity. In addition, we introduce approaches including intrinsic efficient updates and ensemble, to overcome the potential misspecification of the conditional model that may cause efficiency loss. Our method is justified by asymptotic theory and shown to outperform existing SSL methods through simulation studies. We also illustrate its utility in a real example about several key phenotypes related to frequent ICU admission on MIMIC-III data set.

Submitted 25 November, 2023; originally announced November 2023.

arXiv:2311.14673 [pdf, other]

doi 10.1002/adom.202301863

Pump-induced terahertz conductivity response and peculiar bound state in Mn3Si2Te6

Authors: Qiong Wu, Qiangwei Yin, Sijie Zhang, Tianchen Hu, Dong Wu, Li Yue, Bohan Li, Shuxiang Xu, Rongsheng Li, Qiaomei Liu, Hechang Lei, Tao Dong, Nanlin Wang

Abstract: We report the significant enhancement on ultrafast terahertz optical conductivity and the unexpected formation of a polaronic-like state in semiconductor Mn3Si2Te6 at room temperature. With the absorption of pump photons, the low-frequency terahertz photoconductivity spectrum exhibits a significant rise, quickly forming a broad peak and subsequently shifting to higher energy. The short-lived nature of the broad peak, as well as the distribution of optical constants, strongly points towards a transient polaron mechanism. Our study not only provides profound insights into the remarkable photoelectric response of Mn3Si2Te6 but also highlights its significant potential for future photoelectric applications.

Submitted 25 October, 2023; originally announced November 2023.

arXiv:2311.06812 [pdf, other]

MANSY: Generalizing Neural Adaptive Immersive Video Streaming With Ensemble and Representation Learning

Authors: Duo Wu, Panlong Wu, Miao Zhang, Fangxin Wang

Abstract: The popularity of immersive videos has prompted extensive research into neural adaptive tile-based streaming to optimize video transmission over networks with limited bandwidth. However, the diversity of users' viewing patterns and Quality of Experience (QoE) preferences has not been fully addressed yet by existing neural adaptive approaches for viewport prediction and bitrate selection. Their performance can significantly deteriorate when users' actual viewing patterns and QoE preferences differ considerably from those observed during the training phase, resulting in poor generalization. In this paper, we propose MANSY, a novel streaming system that embraces user diversity to improve generalization. Specifically, to accommodate users' diverse viewing patterns, we design a Transformer-based viewport prediction model with an efficient multi-viewport trajectory input output architecture based on implicit ensemble learning. Besides, we for the first time combine the advanced representation learning and deep reinforcement learning to train the bitrate selection model to maximize diverse QoE objectives, enabling the model to generalize across users with diverse preferences. Extensive experiments demonstrate that MANSY outperforms state-of-the-art approaches in viewport prediction accuracy and QoE improvement on both trained and unseen viewing patterns and QoE preferences, achieving better generalization.

Submitted 12 November, 2023; originally announced November 2023.

Comments: This work has been submitted to the IEEE Transactions on Mobile Computing for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2311.06529 [pdf, ps, other]

doi 10.1007/s41605-023-00409-w

Insight-HXMT on-orbit thermal control status and thermal deformation impact analysis

Authors: Aimei Zhang, Yifan Zhang, Jinyuan Liao, Yupeng Xu, Yusa Wang, Wenbo Luo, Yupeng Zhou, Zhiying Qian, Xiaobo Li, Fangjun Lu, Shuangnan Zhang, Liming Song, Congzhan Liu, Fan Zhang, Jianyin Nie, Juan Wang, Sheng Yang, Tong Zhang, Xiaojing Liu, Ruijie Wang, Xufang Li, Yifei Zhang, Zhengwei Li, Xuefeng Lu, He Xu, et al. (1 additional author not shown)

Abstract: Purpose: The Hard X-ray Modulation Telescope is China's first X-ray astronomy satellite launched on June 15th, 2017, dubbed Insight-HXMT. Active and passive thermal control measures are employed to keep devices at suitable temperatures. In this paper, we analyzed the on-orbit thermal monitoring data of the first 5 years and investigated the effect of thermal deformation on the point spread function (PSF) of the telescopes. Methods: We examined the data of the on-orbit temperatures measured using 157 thermistors placed on the collimators, detectors and their support structures and compared the results with the thermal control requirements. The thermal deformation was evaluated by the relative orientation of the two star sensors installed on the main support structure. Its effect was estimated with the evolution of the PSF obtained with calibration scanning observations of the Crab nebula. Conclusion: The on-orbit temperatures met the thermal control requirements thus far, and the effect of thermal deformation on the PSF was negligible after the on-orbit pointing calibration.

Submitted 11 November, 2023; originally announced November 2023.

Comments: 25 pages, 35 figures, submitted

arXiv:2311.05102 [pdf, other]

New constraints on primordial non-Gaussianity from missing two-loop contributions of scalar induced gravitational waves

Authors: Zhe Chang, Yu-Ting Kuang, Di Wu, Jing-Zhi Zhou, Qing-Hua Zhu

Abstract: We analyze the energy density spectrum of scalar induced gravitational waves (SIGWs) using the NANOGrav 15-year data set, thereby constraining the primordial non-Gaussian parameter $f_{\mathrm{NL}}$. For the first time, we calculate the seventeen missing two-loop diagrams proportional to $f_{\mathrm{NL}}A_ζ^3$ that correspond to the two-point correlation function $\langle h^{λ,(3)}_{\mathbf{k}} h^{λ',(2)}_{\mathbf{k}'} \rangle$ for local-type primordial non-Gaussianity. The total energy density spectrum of SIGWs can be significantly suppressed by these two-loop diagrams. If SIGWs dominate the stochastic gravitational wave background (SGWB) observed in pulsar timing array (PTA) experiments, the parameter interval $f_{\mathrm{NL}}\in [-5,-1]$ is notably excluded based on the NANOGrav 15-year data set. After taking into account the abundance of primordial black holes (PBHs) and the convergence of the cosmological perturbation expansion, we find that the only possible parameter range for $f_{\mathrm{NL}}$ might be $-1\le f_{\mathrm{NL}}< 0$.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 8 pages, 4 figures

arXiv:2311.03922 [pdf, other]

Riemann-Hilbert problems from rank 3 WKB spectral networks

Authors: Dongjian Wu

Abstract: We extract cluster structures and establish spectral coordinates from rank 3 WKB spectral networks $\mathcal W(\varphi,\vartheta)$ when zeros of $\varphi(z)$ are almost on a line in the complex plane. Then, we provide solutions to the Riemann-Hilbert problems (cf. arXiv:1611.03697) defined by these WKB spectral networks, using the spectral coordinates. As an application, we embed spaces of framed polynomial cubic differentials, associated with these WKB spectral networks, into spaces of stability conditions, adopting the approach of arXiv:1302.7030.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 58 pages, 22 figures, comments welcome!

arXiv:2311.00370 [pdf]

Discovery of four pulsars in a pilot survey at intermediate Galactic latitudes with FAST

Authors: Q. J. Zhi, J. T. Bai, S. Dai, X. Xu, S. J. Dang, L. H. Shang, R. S. Zhao, D. Li, W. W. Zhu, N. Wang, J. P. Yuan, P. Wang, L. Zhang, Y. Feng, J. B. Wang, S. Q. Wang, Q. D. Wu, A. J. Dong, H. Yang, J. Tian, W. Q. Zhong, X. H. Luo, Miroslav D. Filipovi, G. J. Qiao

Abstract: We present the discovery and timing results of four pulsars discovered in a pilot survey at intermediate Galactic latitudes with the Five-hundred Aperture Spherical Telescope (FAST). Among these pulsars, two belong to the category of millisecond pulsars (MSPs) with spin periods of less than 20 ms. The other two fall under the classification of "mildly recycled" pulsars, with massive white dwarfs as companions. Remarkably, this small survey, covering an area of 4.7 $deg^2$, led to the discovery of four recycled pulsars. Such success underscores the immense potential of future surveys at intermediate Galactic latitudes. In order to assess the potential yield of MSPs, we conducted population simulations and found that both FAST and Parkes new phased array feed surveys, focusing on intermediate Galactic latitudes, have the capacity to uncover several hundred new MSPs.

Submitted 28 December, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: 7 pages, 4 figures, 2 tables, accepted to ApJ

arXiv:2311.00288 [pdf, other]

Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks

Authors: Po-Nien Kung, Fan Yin, Di Wu, Kai-Wei Chang, Nanyun Peng

Abstract: Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions. However, how to select new tasks to improve the performance and generalizability of IT models remains an open question. Training on all existing tasks is impractical due to prohibitive computation requirements, and randomly selecting tasks can lead to suboptimal performance. In this work, we propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks. We represent the informativeness of new tasks with the disagreement of the current model outputs over perturbed prompts. Our experiments on NIV2 and Self-Instruct datasets demonstrate that our method consistently outperforms other baseline strategies for task selection, achieving better out-of-distribution generalization with fewer training tasks. Additionally, we introduce a task map that categorizes and diagnoses tasks based on prompt uncertainty and prediction probability. We discover that training on ambiguous (prompt-uncertain) tasks improves generalization while training on difficult (prompt-certain and low-probability) tasks offers no benefit, underscoring the importance of task selection for instruction tuning.

Submitted 1 November, 2023; originally announced November 2023.

Comments: EMNLP 2023 Main

arXiv:2311.00285 [pdf, ps, other]

Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach

Authors: Zhenbang Du, Jiayu An, Yunlu Tu, Jiahao Hong, Dongrui Wu

Abstract: Open Set Domain Adaptation (OSDA) aims to cope with the distribution and label shifts between the source and target domains simultaneously, performing accurate classification for known classes while identifying unknown class samples in the target domain. Most existing OSDA approaches, depending on the final image feature space of deep models, require manually-tuned thresholds, and may easily misclassify unknown samples as known classes. Mixture-of-Experts (MoE) could be a remedy. Within a MoE, different experts handle distinct input features, producing unique expert routing patterns for various classes in a routing feature space. As a result, unknown class samples may display different expert routing patterns to known classes. In this paper, we propose Dual-Space Detection, which exploits the inconsistencies between the image feature space and the routing feature space to detect unknown class samples without any threshold. Graph Router is further introduced to better make use of the spatial information among image patches. Experiments on three different datasets validated the effectiveness and superiority of our approach.

Submitted 3 July, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2310.20709 [pdf, other]

Quadratic Differentials as Stability Conditions of Graded Skew-gentle Algebras

Authors: Suiqi Lu, Yu Qiu, Dongjian Wu

Abstract: We prove that the principal component of the exchange graph of hearts of a graded skew-gentle algebra can be identified with the corresponding exchange graph of S-graphs, using the geometric models and $\operatorname{Int}=\operatorname{dim}\operatorname{Hom}$ formula in Qiu-Zhang-Zhou. Using the same argument in Bridgeland-Smith, Barbieri-Möller-Qiu-So and Christ-Haiden-Qiu, we extend this identification to an isomorphism between the spaces of stability conditions and of quadratic differentials.

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.17997 [pdf]

Deep Learning Enables Large Depth-of-Field Images for Sub-Diffraction-Limit Scanning Superlens Microscopy

Authors: Hui Sun, Hao Luo, Feifei Wang, Qingjiu Chen, Meng Chen, Xiaoduo Wang, Haibo Yu, Guanglie Zhang, Lianqing Liu, Jianping Wang, Dapeng Wu, Wen Jung Li

Abstract: Scanning electron microscopy (SEM) is indispensable in diverse applications ranging from microelectronics to food processing because it provides large depth-of-field images with a resolution beyond the optical diffraction limit. However, the technology requires coating conductive films on insulator samples and a vacuum environment. We use deep learning to obtain the mapping relationship between optical super-resolution (OSR) images and SEM domain images, which enables the transformation of OSR images into SEM-like large depth-of-field images. Our custom-built scanning superlens microscopy (SSUM) system, which requires neither coating samples by conductive films nor a vacuum environment, is used to acquire the OSR images with features down to ~80 nm. The peak signal-to-noise ratio (PSNR) and structural similarity index measure values indicate that the deep learning method performs excellently in image-to-image translation, with a PSNR improvement of about 0.74 dB over the optical super-resolution images. The proposed method provides a high level of detail in the reconstructed results, indicating that it has broad applicability to chip-level defect detection, biological sample analysis, forensics, and various other fields.

Submitted 27 October, 2023; originally announced October 2023.

Comments: 13 pages, 7 figures
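
Note on the metric quoted in the abstract above: PSNR is conventionally defined as 10·log10(MAX² / MSE), where MSE is the mean squared pixel error between a reference image and an estimate. The sketch below illustrates only that standard definition; it is not the authors' evaluation code. The function name `psnr`, the flattened pixel lists, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences.

    Assumes flattened images and an 8-bit peak value by default; both are
    illustrative choices, not taken from the paper.
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between corresponding pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [100.0, 120.0, 140.0, 160.0]
    est = [r + 25.5 for r in ref]  # constant error of 10% of the peak value
    print(round(psnr(ref, est), 6))
```

Because the scale is logarithmic, a gain of a fraction of a dB corresponds to a modest relative reduction in MSE.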

arXiv:2310.16333 [pdf, other]

Scalable Optimal Power Management for Large-Scale Battery Energy Storage Systems

Authors: Amir Farakhor, Di Wu, Yebin Wang, Huazhen Fang

Abstract: Large-scale battery energy storage systems (BESS) are helping transition the world towards sustainability with their broad use, among others, in electrified transportation, power grid, and renewables. However, optimal power management for them is often computationally formidable. To overcome this challenge, we develop a scalable approach in the paper. The proposed approach partitions the constituting cells of a large-scale BESS into clusters based on their state-of-charge (SoC), temperature, and internal resistance. Each cluster is characterized by a representative model that approximately captures its collective SoC and temperature dynamics, as well as its overall power losses in charging/discharging. Based on the clusters, we then formulate a problem of receding-horizon optimal power control to minimize the power losses while promoting SoC and temperature balancing. The cluster-based power optimization will decide the power quota for each cluster, and then every cluster will split the quota among the constituent cells. Since the number of clusters is much smaller than the number of cells, the proposed approach significantly reduces the computational costs, allowing optimal power management to scale up to large-scale BESS. Extensive simulations are performed to evaluate the proposed approach. The obtained results highlight a significant computational overhead reduction by more than 60% for a small-scale and 98% for a large-scale BESS compared to the conventional cell-level optimization. Experimental validation based on a 20-cell prototype further demonstrates its effectiveness and utility.

Submitted 6 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: IEEE Transactions on Transportation Electrification

arXiv:2310.16125 [pdf]

Online Two-stage Thermal History Prediction Method for Metal Additive Manufacturing of Thin Walls

Authors: Yifan Tang, M. Rahmani Dehaghani, Pouyan Sajadi, Shahriar Bakrani Balani, Akshay Dhalpe, Suraj Panicker, Di Wu, Eric Coatanea, G. Gary Wang

Abstract: This paper aims to propose an online two-stage thermal history prediction method, which could be integrated into a metal AM process for performance control. Based on the similarity of temperature curves (curve segments of a temperature profile of one point) between any two successive layers, the first stage of the proposed method designs a layer-to-layer prediction model to estimate the temperature curves of the yet-to-print layer from measured temperatures of certain points on the previously printed layer. With measured/predicted temperature profiles of several points on the same layer, the second stage proposes a reduced order model (ROM) (intra-layer prediction model) to decompose and construct the temperature profiles of all points on the same layer, which could be used to build the temperature field of the entire layer. The training of ROM is performed with an extreme learning machine (ELM) for computational efficiency. Fifteen wire arc AM experiments and nine simulations are designed for thin walls with a fixed length and unidirectional printing of each layer. The test results indicate that the proposed prediction method could construct the thermal history of a yet-to-print layer within 0.1 seconds on a low-cost desktop computer. Meanwhile, the method has acceptable generalization capability in most cases from lower layers to higher layers in the same simulation, as well as from one simulation to a new simulation on different AM process parameters. More importantly, after fine-tuning the proposed method with limited experimental data, the relative errors of all predicted temperature profiles on a new experiment are smaller than 0.09, which demonstrates the applicability and generalization of the proposed two-stage thermal history prediction method in online applications for metal AM.

Submitted 17 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 30 pages, 21 figures, 2 tables

arXiv:2310.15840 [pdf, ps, other]

Cotorsion pairs in comma categories

Authors: Yuan Yuan, Jian He, Dejun Wu

Abstract: Let A and B be abelian categories with enough projective and injective objects, and T : A → B a left exact additive functor. Then one has a comma category (B*T). It is shown that if T : A → B is X-exact, then (*X, X) is a (hereditary) cotorsion pair in A and (*Y, Y) is a (hereditary) cotorsion pair in B if and only if ((*X, Y ), <h(X, Y)> ) is a (hereditary) cotorsion pair in (B*T) and X and Y are closed under extensions. Furthermore, we characterize when special preenveloping classes in abelian categories A and B can induce special preenveloping classes in (B*T).

Submitted 24 October, 2023; originally announced October 2023.

Comments: arXiv admin note: text overlap with arXiv:1911.03345 by other authors

arXiv:2310.13810 [pdf]

A Better Match for Drivers and Riders: Reinforcement Learning at Lyft

Authors: Xabi Azagirre, Akshay Balwally, Guillaume Candeli, Nicholas Chamandy, Benjamin Han, Alona King, Hyungjun Lee, Martin Loncaric, Sebastien Martin, Vijay Narasiman, Zhiwei Qin, Baptiste Richard, Sara Smoot, Sean Taylor, Garrett van Ryzin, Di Wu, Fei Yu, Alex Zamoshchin

Abstract: To better match drivers to riders in our ridesharing application, we revised Lyft's core matching algorithm. We use a novel online reinforcement learning approach that estimates the future earnings of drivers in real time and use this information to find more efficient matches. This change was the first documented implementation of a ridesharing matching algorithm that can learn and improve in real time. We evaluated the new approach during weeks of switchback experimentation in most Lyft markets, and estimated how it benefited drivers, riders, and the platform. In particular, it enabled our drivers to serve millions of additional riders each year, leading to more than $30 million per year in incremental revenue. Lyft rolled out the algorithm globally in 2021.

Submitted 13 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

Showing 151–200 of 1,481 results for author: Wu, D