doi 10.1609/aaai.v38i5.28249

Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification

Authors: Wenshuo Peng, Kaipeng Zhang, Yue Yang, Hao Zhang, Yu Qiao

Abstract: Vision-language foundation models have been incredibly successful in a wide range of downstream computer vision tasks using adaptation methods. However, due to the high cost of obtaining pre-training datasets, pairs with weak image-text correlation exist in the data in large numbers. We call them weak-paired samples. Due to the limitations of these weak-paired samples, pre-trained models are unable to mine all the knowledge from the pre-training data. Existing adaptation methods do not consider the missing knowledge, which may lead to crucial task-related knowledge for the downstream tasks being ignored. To address this issue, we propose a new adaptation framework called Data Adaptive Traceback (DAT). Specifically, we utilize a zero-shot-based method to extract the most downstream task-related subset of the pre-training data to enable the downstream tasks. Furthermore, we adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning. We conduct extensive experiments that show our proposed DAT approach meaningfully improves performance on various benchmark datasets over traditional adaptation methods by simply.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 9 pages, 4 figures

arXiv:2407.08085 [pdf, other]

Light Dark Matter Constraints from SuperCDMS HVeV Detectors Operated Underground with an Anticoincidence Event Selection

Authors: SuperCDMS Collaboration, M. F. Albakry, I. Alkhatib, D. Alonso-González, D. W. P. Amaral, J. Anczarski, T. Aralis, T. Aramaki, I. J. Arnquist, I. Ataee Langroudy, E. Azadbakht, C. Bathurst, R. Bhattacharyya, A. J. Biffl, P. L. Brink, M. Buchanan, R. Bunker, B. Cabrera, R. Calkins, R. A. Cameron, C. Cartaro, D. G. Cerdeño, Y. -Y. Chang, M. Chaudhuri, J. -H. Chen , et al. (115 additional authors not shown)

Abstract: This article presents constraints on dark-matter-electron interactions obtained from the first underground data-taking campaign with multiple SuperCDMS HVeV detectors operated in the same housing. An exposure of 7.63 g-days is used to set upper limits on the dark-matter-electron scattering cross section for dark matter masses between 0.5 and 1000 MeV/$c^2$, as well as upper limits on dark photon kinetic mixing and axion-like particle axioelectric coupling for masses between 1.2 and 23.3 eV/$c^2$. Compared to an earlier HVeV search, sensitivity was improved as a result of an increased overburden of 225 meters of water equivalent, an anticoincidence event selection, and better pile-up rejection. In the case of dark-matter-electron scattering via a heavy mediator, an improvement by up to a factor of 25 in cross-section sensitivity was achieved.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 7 pages + title and references, 4 figures, and 1 table

arXiv:2407.03245 [pdf, other]

TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach

Authors: Weikun Peng, Jun Lv, Yuwei Zeng, Haonan Chen, Siheng Zhao, Jichen Sun, Cewu Lu, Lin Shao

Abstract: The tie-knotting task is highly challenging due to the tie's high deformation and long-horizon manipulation actions. This work presents TieBot, a Real-to-Sim-to-Real learning-from-visual-demonstration system for robots to learn to knot a tie. We introduce the Hierarchical Feature Matching approach to estimate a sequence of tie meshes from the demonstration video. With these estimated meshes used as subgoals, we first learn a teacher policy using privileged information. Then, we learn a student policy with point cloud observation by imitating the teacher policy. Lastly, our pipeline learns a residual policy when the learned policy is applied to real-world execution, mitigating the Sim2Real gap. We demonstrate the effectiveness of TieBot in simulation and the real world. In the real-world experiment, a dual-arm robot successfully knots a tie, achieving a 50% success rate over 10 trials. Videos can be found at https://tiebots.github.io/.

Submitted 3 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: fix few typos

arXiv:2406.18129 [pdf, other]

CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection

Authors: Meiying Zhang, Weiyuan Peng, Guangyao Ding, Chenyang Lei, Chunlin Ji, Qi Hao

Abstract: Simulation data can be accurately labeled and have been expected to improve the performance of data-driven algorithms, including object detection. However, due to the various domain inconsistencies from simulation to reality (sim-to-real), cross-domain object detection algorithms usually suffer from dramatic performance drops. While numerous unsupervised domain adaptation (UDA) methods have been developed to address cross-domain tasks between real-world datasets, progress in sim-to-real remains limited. This paper presents a novel Complex-to-Simple (CTS) framework to transfer models from labeled simulation (source) to unlabeled reality (target) domains. Based on a two-stage detector, the novelty of this work is threefold: 1) developing fixed-size anchor heads and RoI augmentation to address size bias and feature diversity between the two domains, thereby improving the quality of pseudo-labels; 2) developing a novel corner-format representation of aleatoric uncertainty (AU) for the bounding box, to uniformly quantify pseudo-label quality; 3) developing a noise-aware mean teacher domain adaptation method based on AU, as well as object-level and frame-level sampling strategies, to mitigate the impact of noisy labels. Experimental results demonstrate that our proposed approach significantly enhances the sim-to-real domain adaptation capability of 3D object detection models, outperforming state-of-the-art cross-domain algorithms, which are usually developed for real-to-real UDA tasks.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18116 [pdf, other]

BADGE: BADminton report Generation and Evaluation with LLM

Authors: Shang-Hsuan Chiang, Lin-Wei Chao, Kuang-Da Wang, Chih-Chuan Wang, Wen-Chih Peng

Abstract: Badminton enjoys widespread popularity, and reports on matches generally include details such as player names, game scores, and ball types, providing audiences with a comprehensive view of the games. However, writing these reports can be a time-consuming task. This challenge led us to explore whether a Large Language Model (LLM) could automate the generation and evaluation of badminton reports. We introduce a novel framework named BADGE, designed for this purpose using LLMs. Our method consists of two main phases: Report Generation and Report Evaluation. Initially, badminton-related data is processed by the LLM, which then generates a detailed report of the match. We tested different Input Data Types, In-Context Learning (ICL) settings, and LLMs, finding that GPT-4 performs best when using the CSV data type and Chain of Thought prompting. Following report generation, the LLM evaluates and scores the reports to assess their quality. Our comparisons between the scores evaluated by GPT-4 and human judges show a tendency to prefer GPT-4 generated reports. Since the application of LLMs in badminton reporting remains largely unexplored, our research serves as a foundational step for future advancements in this area. Moreover, our method can be extended to other sports games, thereby enhancing sports promotion. For more details, please refer to https://github.com/AndyChiangSH/BADGE.

Submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted by IJCAI 2024 Workshop: The 2nd International Workshop on Intelligent Technologies for Precision Sports Science (IT4PSS)

arXiv:2406.11483 [pdf]

Analysis of water injection heat recovery potential of abandoned oil wells to geothermal wells in northern Shaanxi

Authors: Yu Huagui, Liu Shi, Pang Yanyan, Wang Peng, Gao Qian

Abstract: The Chang 2 bottom water reservoir area in the western part of northern Shaanxi is one of the core oil-producing areas in the Ordos Basin. One of the main reservoirs is the Chang 2 reservoir of the Triassic Yanchang Formation, which has good physical conditions, active edge and bottom water, and a high geothermal gradient. In this paper, the reservoir numerical simulation software CMG is used to simulate water intake and heat recovery in the target study area, and the heat recovery rate and heat recovery of three water production methods (direct water production, four-injection-one-production, and one-injection-four-production) are analyzed under different injection pressures. The results show that direct water extraction from the bottom water reservoir is difficult to realize. The annual heat recovery of a single well under the four-injection-one-production and one-injection-four-production schemes is equivalent to between 190 and 420 t of standard coal, so the Chang 2 reservoir in the western part of northern Shaanxi has potential for water injection and heat recovery.

Submitted 17 June, 2024; originally announced June 2024.

Journal ref: Modern Electric Power, 2023, 1-9

arXiv:2406.11176 [pdf, other]

Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement

Authors: Weimin Xiong, Yifan Song, Xiutian Zhao, Wenhao Wu, Xun Wang, Ke Wang, Cheng Li, Wei Peng, Sujian Li

Abstract: Large language model agents have exhibited exceptional performance across a range of complex interactive tasks. Recent approaches have utilized tuning with expert trajectories to enhance agent performance, yet they primarily concentrate on outcome rewards, which may lead to errors or suboptimal actions due to the absence of process supervision signals. In this paper, we introduce the Iterative step-level Process Refinement (IPR) framework, which provides detailed step-by-step guidance to enhance agent training. Specifically, we adopt the Monte Carlo method to estimate step-level rewards. During each iteration, the agent explores along the expert trajectory and generates new actions. These actions are then evaluated against the corresponding step of the expert trajectory using step-level rewards. Such comparison helps identify discrepancies, yielding contrastive action pairs that serve as training data for the agent. Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines. Moreover, our analytical findings highlight the effectiveness of IPR in augmenting action efficiency and its applicability to diverse models.

Submitted 16 June, 2024; originally announced June 2024.
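
The step-level reward estimation and contrastive pairing summarized in the abstract above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `estimate_step_reward`, `build_contrastive_pair`, the `rollout` callback, and the `margin` threshold are all hypothetical, and the toy rollout stands in for actually running an agent to the end of an episode.

```python
import random

def estimate_step_reward(state, rollout, n_samples=8, seed=0):
    """Monte Carlo estimate: average the final outcome reward of
    `n_samples` rollouts that all start from `state`.

    `rollout(state, rng)` is a hypothetical callback that finishes the
    episode from `state` and returns its outcome reward as a float.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += rollout(state, rng)
    return total / n_samples

def build_contrastive_pair(expert_action, agent_action,
                           expert_reward, agent_reward, margin=0.0):
    """Return (preferred, rejected) when the expert step scores higher
    than the agent step by more than `margin`; otherwise return None."""
    if expert_reward - agent_reward > margin:
        return (expert_action, agent_action)
    return None

# Toy rollout (assumption): states tagged "good" always succeed.
def toy_rollout(state, rng):
    return 1.0 if state == "good" else 0.0

expert_r = estimate_step_reward("good", toy_rollout)
agent_r = estimate_step_reward("bad", toy_rollout)
pair = build_contrastive_pair("open drawer", "close drawer", expert_r, agent_r)
```

In this toy setting the expert step receives the higher estimate, so the pair keeps the expert action as preferred; pairs of this kind are what the abstract describes as training data for the agent.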

arXiv:2406.10744 [pdf, other]

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held at the CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches.

Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

arXiv:2406.09265 [pdf, other]

Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs

Authors: Weixuan Wang, Barry Haddow, Wei Peng, Alexandra Birch

Abstract: Multilingual large language models (LLMs) have greatly increased the ceiling of performance on non-English tasks. However, the mechanisms behind multilingualism in these LLMs are poorly understood. Of particular interest is the degree to which internal representations are shared between languages. Recent work on neuron analysis of LLMs has focused on the monolingual case, and the limited work on the multilingual case has not considered the interaction between tasks and linguistic representations. In our work, we investigate how neuron activation is shared across languages by categorizing neurons into four distinct groups according to their responses across different languages for a particular input: all-shared, partial-shared, specific, and non-activated. This categorization is combined with a study of neuron attribution, i.e. the importance of a neuron w.r.t. an output. Our analysis reveals the following insights: (i) the linguistic sharing patterns are strongly affected by the type of task, but neuron behaviour changes across different inputs even for the same task; (ii) all-shared neurons play a key role in generating correct responses; (iii) boosting multilingual alignment by increasing all-shared neurons can enhance accuracy on multilingual tasks. The code is available at https://github.com/weixuan-wang123/multilingual-neurons.

Submitted 13 June, 2024; originally announced June 2024.
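
The four-way neuron grouping named in the abstract above could be sketched as follows. This is a minimal illustration, not the authors' code (their repository is linked in the abstract): the function name `categorize_neurons`, the per-language boolean layout, and the reading of "specific" as "active in exactly one language" are assumptions made here.

```python
from collections import Counter

def categorize_neurons(active):
    """Label each neuron by how many languages activate it for one input.

    `active` maps a language code to a list of booleans, one per neuron
    (hypothetical layout; True means the neuron fired for that language).
    """
    languages = list(active)
    n_neurons = len(active[languages[0]])
    labels = []
    for i in range(n_neurons):
        # Count the languages in which neuron i is active.
        count = sum(1 for lang in languages if active[lang][i])
        if count == len(languages):
            labels.append("all-shared")
        elif count > 1:
            labels.append("partial-shared")
        elif count == 1:
            labels.append("specific")
        else:
            labels.append("non-activated")
    return labels

# Toy activations for four neurons across three languages (made-up data).
example = {
    "en": [True, True, True, False],
    "de": [True, True, False, False],
    "zh": [True, False, False, False],
}
labels = categorize_neurons(example)
summary = Counter(labels)
```

How "activated" is decided (for example, a threshold on the activation value) is left outside this sketch, since the abstract does not specify it.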

arXiv:2405.15299 [pdf, other]

Transparent Object Depth Completion

Authors: Yifan Zhou, Wanli Peng, Zhongyu Yang, He Liu, Yi Sun

Abstract: The perception of transparent objects for grasp and manipulation remains a major challenge, because existing robotic grasp methods which heavily rely on depth maps are not suitable for transparent objects due to their unique visual properties. These properties lead to gaps and inaccuracies in the depth maps of the transparent objects captured by depth sensors. To address this issue, we propose an end-to-end network for transparent object depth completion that combines the strengths of single-view RGB-D based depth completion and multi-view depth estimation. Moreover, we introduce a depth refinement module based on confidence estimation to fuse predicted depth maps from single-view and multi-view modules, which further refines the restored depth map. The extensive experiments on the ClearPose and TransCG datasets demonstrate that our method achieves superior accuracy and robustness in complex scenarios with significant occlusion compared to the state-of-the-art methods.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.10305 [pdf, other]

4D Panoptic Scene Graph Generation

Authors: Jingkang Yang, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu

Abstract: We are living in a three-dimensional space while moving forward through a fourth dimension: time. To allow artificial intelligence to develop a comprehensive understanding of such a 4D environment, we introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding. Specifically, PSG-4D abstracts rich 4D sensory data into nodes, which represent entities with precise location and status information, and edges, which capture the temporal relations. To facilitate research in this new area, we build a richly annotated PSG-4D dataset consisting of 3K RGB-D videos with a total of 1M frames, each of which is labeled with 4D panoptic segmentation masks as well as fine-grained, dynamic scene graphs. To solve PSG-4D, we propose PSG4DFormer, a Transformer-based model that can predict panoptic segmentation masks, track masks along the time axis, and generate the corresponding scene graphs via a relation component. Extensive experiments on the new dataset show that our method can serve as a strong baseline for future research on PSG-4D. In the end, we provide a real-world application example to demonstrate how we can achieve dynamic scene understanding by integrating a large language model into our PSG-4D system.

Submitted 16 May, 2024; originally announced May 2024.

Comments: Accepted at NeurIPS 2023. Code: https://github.com/Jingkang50/PSG4D. Previous series: PSG https://github.com/Jingkang50/OpenPSG and PVSG https://github.com/Jingkang50/OpenPVSG

arXiv:2405.07464 [pdf]

Atomic-scale tunable phonon transport at tailored grain boundaries

Authors: Xiaowang Wang, Chaitanya A. Gadre, Runqing Yang, Wanjuan Zou, Xing Bin, Christopher Addiego, Toshihiro Aoki, Yujie Quan, Wei-Tao Peng, Yifeng Huang, Chaojie Du, Mingjie Xu, Xingxu Yan, Ruqian Wu, Shyue Ping Ong, Bolin Liao, Penghui Cao, Xiaoqing Pan

Abstract: Manipulating thermal properties in materials has been of fundamental importance for advancing innovative technologies. Heat carriers such as phonons are impeded by breaking crystal symmetry or periodicity. Notable methods of impeding phonon propagation include varying the density of defects, interfaces, and nanostructures, as well as changing composition. However, a robust link between the individual nanoscale defect structures, phonon states, and macroscopic thermal conductivity is lacking. Here we reveal nanoscale structure-phonon mechanisms showing how the grain boundary (GB) tilt and twist angles fundamentally drive the changes in atom rearrangements, exotic vibrational states, and finally macroscopic heat transport at different bicrystal strontium titanate GBs, using emerging atomic resolution vibrational spectroscopy. The 10 deg and 22 deg tilt GBs exhibit reduced phonon populations by 54% and 16% compared to the bulk value, respectively, consistent with measured thermal conductivities. A tiny twist angle further introduces a fine and local tuning of thermal conductivity by introducing twist-induced defects periodically embedded with the tilt-induced GB defects. Our results demonstrate that varying the tilt angle coarsely modifies the phonon population along the entire GB while varying the twist angle incurs a finer adjustment at periodic locations on the GB. Our study offers a systematic approach to understanding and manipulating cross-GB thermal transport of arbitrary GBs predictably and precisely.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.06964 [pdf, other]

ManiFoundation Model for General-Purpose Robotic Manipulation of Contact Synthesis with Arbitrary Objects and Robots

Authors: Zhixuan Xu, Chongkai Gao, Zixuan Liu, Gang Yang, Chenrui Tie, Haozhuo Zheng, Haoyu Zhou, Weikun Peng, Debang Wang, Tianyi Chen, Zhouliang Yu, Lin Shao

Abstract: To substantially enhance robot intelligence, there is a pressing need to develop a large model that enables general-purpose robots to proficiently undertake a broad spectrum of manipulation tasks, akin to the versatile task-planning ability exhibited by LLMs. The vast diversity in objects, robots, and manipulation tasks presents huge challenges. Our work introduces a comprehensive framework to develop a foundation model for general robotic manipulation that formalizes a manipulation task as contact synthesis. Specifically, our model takes as input object and robot manipulator point clouds, object physical attributes, target motions, and manipulation region masks. It outputs contact points on the object and associated contact forces or post-contact motions for robots to achieve the desired manipulation task. We perform extensive experiments in both simulation and real-world settings, manipulating articulated rigid objects, rigid objects, and deformable objects that vary in dimensionality, ranging from one-dimensional objects like ropes to two-dimensional objects like cloth and extending to three-dimensional objects such as plasticine. Our model achieves average success rates of around 90%. Supplementary materials and videos are available on our project website at https://manifoundationmodel.github.io/.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2404.18527 [pdf]

Bridging Data Barriers among Participants: Assessing the Potential of Geoenergy through Federated Learning

Authors: Weike Peng, Jiaxin Gao, Yuntian Chen, Shengwei Wang

Abstract: Machine learning algorithms are emerging as a promising approach in energy fields, but their practical application is hindered by data barriers stemming from high collection costs and privacy concerns. This study introduces a novel federated learning (FL) framework based on XGBoost models, enabling safe collaborative modeling with accessible yet concealed data from multiple parties. Hyperparameter tuning of the models is achieved through Bayesian Optimization. To ascertain the merits of the proposed FL-XGBoost method, a comparative analysis is conducted between separate and centralized models to address a classical binary classification problem in the geoenergy sector. The results reveal that the proposed FL framework strikes an optimal balance between privacy and accuracy. FL models demonstrate superior accuracy and generalization capabilities compared to separate models, particularly for participants with limited data or low-correlation features, and offer significant privacy benefits compared to the centralized model. The aggregated optimization approach within the FL agreement proves effective in tuning hyperparameters. This study opens new avenues for assessing unconventional reservoirs through collaborative and privacy-preserving FL techniques.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.16469 [pdf, ps, other]

From weak to strong-coupling superconductivity tuned by substrate in TiN films

Authors: Yixin Liu, Zulei Xu, Aobo Yu, Xiaoni Wang, Wei Peng, Yu Wu, Gang Mu, Zhi-Rong Lin

Abstract: The interplay between substrates and superconducting thin films has attracted increasing attention. Here, we report an in-depth investigation of the superconducting properties of epitaxial TiN thin films grown on two different substrates by dc reactive magnetron sputtering. The TiN films grown on (0001) sapphire exhibit (111) crystal orientation, while those grown on (100) Si substrates exhibit (100) orientation. Moreover, the samples grown on Si reveal a relatively lower level of disorder, accompanied by a higher critical transition temperature $T_c$ and a smaller magnitude of the upper critical field slope near $T_c$. Remarkably, we uncovered a rather high value of the superconducting gap (with $\Delta_0/k_BT_c$ = 3.05) in the TiN film on Si, indicating very strong-coupling superconductivity, in sharp contrast to the case using sapphire as the substrate, which reveals a weak-coupling feature. Further analysis shows that the weakened electronic screening effect due to the high level of disorder and the suppressed electronic density of states may be the underlying reasons for the occurrence of weak-coupling superconductivity in the TiN films on the sapphire substrate.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 6 pages, 5 figures

arXiv:2404.10413 [pdf, other]

VDTuner: Automated Performance Tuning for Vector Data Management Systems

Authors: Tiannuo Yang, Wen Hu, Wangqi Peng, Yusen Li, Jianguo Li, Gang Wang, Xiaoguang Liu

Abstract: Vector data management systems (VDMSs) have become an indispensable cornerstone in large-scale information retrieval and machine learning systems like large language models. To enhance the efficiency and flexibility of similarity search, a VDMS exposes many tunable index parameters and system parameters for users to specify. However, due to the inherent characteristics of VDMSs, automatic performance tuning for VDMSs faces several critical challenges, which cannot be well addressed by existing auto-tuning methods. In this paper, we introduce VDTuner, a learning-based automatic performance tuning framework for VDMSs, leveraging multi-objective Bayesian optimization. VDTuner overcomes the challenges associated with VDMSs by efficiently exploring a complex multi-dimensional parameter space without requiring any prior knowledge. Moreover, it is able to achieve a good balance between search speed and recall rate, delivering an optimal configuration. Extensive evaluations demonstrate that VDTuner can markedly improve VDMS performance (14.12% in search speed and 186.38% in recall rate) compared with the default setting, and is more efficient compared with state-of-the-art baselines (up to 3.57 times faster in terms of tuning time). In addition, VDTuner is scalable to specific user preferences and cost-aware optimization objectives. VDTuner is available online at https://github.com/tiannuo-yang/VDTuner.

Submitted 16 April, 2024; originally announced April 2024.

Comments: Accepted by ICDE 2024

arXiv:2404.10229 [pdf, other]

Generative Text Steganography with Large Language Model

Authors: Jiaxuan Wu, Zhengxian Wu, Yiming Xue, Juan Wen, Wanli Peng

Abstract: Recent advances in large language models (LLMs) have blurred the boundary of high-quality text generation between humans and machines, which is favorable for generative text steganography. However, current advanced steganographic mapping is not suitable for LLMs since most users are restricted to accessing only the black-box API or user interface of the LLMs, thereby lacking access to the training vocabulary and its sampling probabilities. In this paper, we explore a black-box generative text steganographic method based on the user interfaces of large language models, which is called LLM-Stega. The main goal of LLM-Stega is that the secure covert communication between Alice (sender) and Bob (receiver) is conducted by using the user interfaces of LLMs. Specifically, we first construct a keyword set and design a new encrypted steganographic mapping to embed secret messages. Furthermore, to guarantee accurate extraction of secret messages and rich semantics of generated stego texts, an optimization mechanism based on reject sampling is proposed. Comprehensive experiments demonstrate that the proposed LLM-Stega outperforms current state-of-the-art methods.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.07200 [pdf, other]

Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective

Authors: Shaoxiang Qin, Fuyuan Lyu, Wenhui Peng, Dingyang Geng, Ju Wang, Naiping Gao, Xue Liu, Liangzhu Leon Wang

Abstract: In solving partial differential equations (PDEs), Fourier Neural Operators (FNOs) have exhibited notable effectiveness compared to Convolutional Neural Networks (CNNs). This paper presents clear empirical evidence through spectral analysis to elucidate the superiority of FNO over CNNs: FNO is significantly more capable of learning low frequencies. This empirical evidence also unveils FNO's distinct low-frequency bias, which limits FNO's effectiveness in learning high-frequency information from PDE data. To tackle this challenge, we introduce SpecBoost, an ensemble learning framework that employs multiple FNOs to better capture high-frequency information. Specifically, a secondary FNO is utilized to learn the overlooked high-frequency information from the prediction residual of the initial FNO. Experiments demonstrate that SpecBoost noticeably enhances FNO's prediction accuracy on diverse PDE applications, achieving up to a 71% improvement.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.05834 [pdf, other]

Fourier neural operator for large eddy simulation of compressible Rayleigh-Taylor turbulence

Authors: Tengfei Luo, Zhijie Li, Zelong Yuan, Wenhui Peng, Tianyuan Liu, Liangzhu Wang, Jianchun Wang

Abstract: The Fourier neural operator (FNO) framework is applied to the large eddy simulation (LES) of three-dimensional compressible Rayleigh-Taylor (RT) turbulence with miscible fluids at Atwood number $A_t=0.5$, stratification parameter $Sr=1.0$, and Reynolds numbers $Re=10000$ and 30000. The FNO model is first used for predicting three-dimensional compressible turbulence. The different magnitudes of physical fields are normalized using root mean square values for an easier training of FNO models. In the a posteriori tests, the FNO model outperforms the velocity gradient model (VGM), the dynamic Smagorinsky model (DSM), and implicit large eddy simulation (ILES) in predicting various statistical quantities and instantaneous structures, and is particularly superior to traditional LES methods in predicting temperature fields and velocity divergence. Moreover, the computational efficiency of the FNO model is much higher than that of traditional LES methods. FNO models trained with short-time, low Reynolds number data exhibit a good generalization performance on longer-time predictions and higher Reynolds numbers in the a posteriori tests.

Submitted 2 July, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.16235 [pdf, other]

The Next Generation Virgo Cluster Survey (NGVS). III. A Catalog of Surface Brightness Fluctuation Distances and the Three-Dimensional Distribution of Galaxies in the Virgo Cluster

Authors: Michele Cantiello, John P. Blakeslee, Patrick Côté, Gabriella Raimondo, Jean-Charles Cuillandre, Patrick R. Durrell, Stephen Gwyn, Nandini Hazra, Eric W. Peng, Joel C. Roediger, Rúben Sánchez-Janssen, Max Kurzner

Abstract: The surface brightness fluctuation (SBF) method is a robust and efficient way of measuring distances to galaxies containing evolved stellar populations. Although many recent applications of the method have used space-based imaging, SBF remains a powerful technique for ground-based telescopes. Deep, wide-field imaging surveys with subarcsecond seeing enable SBF measurements for numerous nearby galaxies. Using a preliminary calibration, Cantiello et al. (2018) presented SBF distances for 89 bright, mainly early-type galaxies observed in the Next Generation Virgo Cluster Survey (NGVS). Here, we present a refined calibration and SBF distances for 278 galaxies extending several magnitudes fainter than in previous work. The derived distances have uncertainties of 5-12% depending on the properties of the individual galaxies, and our sample is more than three times larger than any previous SBF study of this region. Virgo has a famously complex structure with numerous subclusters, clouds and groups; we associate individual galaxies with the various substructures and map their three-dimensional spatial distribution. Curiously, subcluster A, centered around M87, appears to have two peaks in distance: the main peak at $\sim$16.5 Mpc and a smaller one at $\sim$19.4 Mpc. Subclusters B and C have distances of $\sim$15.8 Mpc. The W and W' groups form a filament-like structure, extending more than 15 Mpc behind the cluster with a commensurate velocity increase of $\sim$1000 km s$^{-1}$ along its length. These measurements are a valuable resource for future studies of the relationship between galaxy properties and local environment within a dynamic and evolving region.

Submitted 24 March, 2024; originally announced March 2024.

Comments: 30 pages, 15 figures, accepted for publication in ApJ

arXiv:2403.16026 [pdf, other]

A transformer-based neural operator for large-eddy simulation of turbulence

Authors: Zhijie Li, Tianyuan Liu, Wenhui Peng, Zelong Yuan, Jianchun Wang

Abstract: Predicting the large-scale dynamics of three-dimensional (3D) turbulence is challenging for machine learning approaches. This paper introduces a transformer-based neural operator (TNO) to achieve precise and efficient predictions in the large-eddy simulation (LES) of 3D turbulence. The performance of the proposed TNO model is systematically tested and compared with LES using classical sub-grid scale (SGS) models, including the dynamic Smagorinsky model (DSM) and the dynamic mixed model (DMM), as well as the original Fourier neural operator (FNO) model, in homogeneous isotropic turbulence (HIT) and free-shear turbulent mixing layer. The numerical simulations comprehensively evaluate the performance of these models on a variety of flow statistics, including the velocity spectrum, the probability density functions (PDFs) of vorticity, the PDFs of velocity increments, the evolution of turbulent kinetic energy, and the iso-surface of the Q-criterion. The results indicate that the accuracy of the TNO model is comparable to the LES with DSM model, and outperforms the FNO model and LES using DMM in HIT. In the free-shear turbulence, the TNO model exhibits superior accuracy compared to other models. Moreover, the TNO model has fewer parameters than the FNO model and enables long-term stable predictions, which the FNO model cannot achieve. The well-trained TNO model is significantly faster than traditional LES with DSM and DMM models, and can be generalized to higher Taylor-Reynolds number cases, indicating its strong potential for 3D nonlinear engineering applications.

Submitted 6 June, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

Comments: 45 pages, 21 figures. arXiv admin note: text overlap with arXiv:2305.10215

arXiv:2403.12851 [pdf, other]

doi 10.1007/s11433-023-2381-0

Observation of spectral lines in the exceptional GRB 221009A

Authors: Yan-Qiu Zhang, Shao-Lin Xiong, Ji-Rong Mao, Shuang-Nan Zhang, Wang-Chen Xue, Chao Zheng, Jia-Cong Liu, Zhen Zhang, Xi-Lu Wang, Ming-Yu Ge, Shu-Xu Yi, Li-Ming Song, Zheng-Hua An, Ce Cai, Xin-Qiao Li, Wen-Xi Peng, Wen-Jun Tan, Chen-Wei Wang, Xiang-Yang Wen, Yue Wang, Shuo Xiao, Fan Zhang, Peng Zhang, Shi-Jie Zheng

Abstract: As the brightest gamma-ray burst ever observed, GRB 221009A provided a precious opportunity to explore spectral line features. In this paper, we performed a comprehensive spectroscopy analysis of GRB 221009A jointly with GECAM-C and Fermi/GBM data to search for emission and absorption lines. For the first time we investigated the line feature throughout this GRB including the brightest part where many instruments suffered problems, and identified prominent emission lines in multiple time intervals. The central energy of the Gaussian emission line evolves from about 37 MeV to 6 MeV, with a nearly constant ratio (about 10%) between the line width and central energy. Particularly, we find that both the central energy and the energy flux of the emission line evolve with time as a power law decay with power law index of -1 and -2 respectively. We suggest that the observed emission lines most likely originate from the blue-shifted electron positron pair annihilation 511 keV line. We find that a standard high latitude emission scenario cannot fully interpret the observation, thus we propose that the emission line comes from some dense clumps with electron positron pairs traveling together with the jet. In this scenario, we can use the emission line to directly, for the first time, measure the bulk Lorentz factor of the jet ($\Gamma$) and reveal its time evolution (i.e. $\Gamma \sim t^{-1}$) during the prompt emission. Interestingly, we find that the flux of the annihilation line in the co-moving frame keeps constant. These discoveries of the spectral line features shed new and important light on the physics of GRB and relativistic jet.

Submitted 28 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: Accepted by SCIENCE CHINA Physics, Mechanics & Astronomy (SCPMA)

Journal ref: Observation of spectral lines in the exceptional GRB 221009A. Sci. China-Phys. Mech. Astron. 67, 289511 (2024)

arXiv:2403.12406 [pdf, other]

Offline Imitation of Badminton Player Behavior via Experiential Contexts and Brownian Motion

Authors: Kuang-Da Wang, Wei-Yao Wang, Ping-Chun Hsieh, Wen-Chih Peng

Abstract: In the dynamic and rapid tactic involvements of turn-based sports, badminton stands out as an intrinsic paradigm that requires alter-dependent decision-making of players. While the advancement of learning from offline expert data in sequential decision-making has been witnessed in various domains, how to rally-wise imitate the behaviors of human players from offline badminton matches has remained underexplored. Replicating opponents' behavior benefits players by allowing them to undergo strategic development with direction before matches. However, directly applying existing methods suffers from the inherent hierarchy of the match and the compounding effect due to the turn-based nature of players alternately taking actions. In this paper, we propose RallyNet, a novel hierarchical offline imitation learning model for badminton player behaviors: (i) RallyNet captures players' decision dependencies by modeling decision-making processes as a contextual Markov decision process. (ii) RallyNet leverages the experience to generate context as the agent's intent in the rally. (iii) To generate more realistic behavior, RallyNet leverages Geometric Brownian Motion (GBM) to model the interactions between players by introducing a valuable inductive bias for learning player behaviors. In this manner, RallyNet links player intents with interaction models with GBM, providing an understanding of interactions for sports analytics. We extensively validate RallyNet with the largest available real-world badminton dataset consisting of men's and women's singles, demonstrating its ability to imitate player behaviors. Results reveal RallyNet's superiority over offline imitation learning methods and state-of-the-art turn-based approaches, outperforming them by at least 16% in mean rule-based agent normalization score. Furthermore, we discuss various practical use cases to highlight RallyNet's applicability.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Preprint

arXiv:2403.10281 [pdf, other]

Team Trifecta at Factify5WQA: Setting the Standard in Fact Verification with Fine-Tuning

Authors: Shang-Hsuan Chiang, Ming-Chih Lo, Lin-Wei Chao, Wen-Chih Peng

Abstract: In this paper, we present Pre-CoFactv3, a comprehensive framework comprised of Question Answering and Text Classification components for fact verification. Leveraging In-Context Learning, Fine-tuned Large Language Models (LLMs), and the FakeNet model, we address the challenges of fact verification. Our experiments explore diverse approaches, comparing different Pre-trained LLMs, introducing FakeNet, and implementing various ensemble methods. Notably, our team, Trifecta, secured first place in the AAAI-24 Factify 3.0 Workshop, surpassing the baseline accuracy by 103% and maintaining a 70% lead over the second competitor. This success underscores the efficacy of our approach and its potential contributions to advancing fact verification research.

Submitted 15 March, 2024; originally announced March 2024.

Comments: Accepted by AAAI 2024 Workshop: FACTIFY 3.0 - Workshop Series on Multimodal Fact-Checking and Hate Speech Detection

arXiv:2403.09926 [pdf, other]

The Next Generation Virgo Cluster Survey (NGVS). XXVII. The Size and Structure of Globular Cluster Systems and their Connection to Dark Matter Halos

Authors: Sungsoon Lim, Eric W. Peng, Patrick Côté, Laura Ferrarese, Joel C. Roediger, Chengze Liu, Chelsea Spengler, Elisabeth Sola, Pierre-Alain Duc, Laura V. Sales, John P. Blakeslee, Jean-Charles Cuillandre, Patrick R. Durrell, Eric Emsellem, Stephen D. J. Gwyn, Ariane Lançon, Francine R. Marleau, J. Christopher Mihos, Oliver Müller, Thomas H. Puzia, Rubén Sánchez-Janssen

Abstract: We study the size and structure of globular cluster (GC) systems of 118 early-type galaxies from the NGVS, MATLAS, and ACSVCS surveys. Fitting Sérsic profiles, we investigate the relationship between effective radii of GC systems ($R_{e, \rm gc}$) and galaxy properties. GC systems are 2--4 times more extended than host galaxies across the entire stellar mass range of our sample ($10^{8.3} < M_* < 10^{11.6}~M_{\odot}$). The relationship between $R_{e, \rm gc}$ and galaxy stellar mass exhibits a characteristic "knee" at a stellar mass of $M_p \simeq 10^{10.8}$, similar to galaxy $R_e$--stellar mass relationship. We present a new characterization of the traditional blue and red GC color sub-populations, describing them with respect to host galaxy $(g'-i')$ color ($\Delta_{gi}$): GCs with similar colors to their hosts have a "red" $\Delta_{gi}$, and those significantly bluer GCs have a "blue" $\Delta_{gi}$. The GC populations with red $\Delta_{gi}$, even in dwarf galaxies, are twice as extended as the stars, suggesting that formation or survival mechanisms favor the outer regions. We find a tight correlation between $R_{e, \rm gc}$ and the total number of GCs, with intrinsic scatter $\lesssim 0.1$ dex spanning two and three orders of magnitude in size and number, respectively. This holds for both red and blue subpopulations, albeit with different slopes. Assuming that $N_{GC, Total}$ correlates with $M_{200}$, we find that the red GC systems have effective radii of roughly 1-5% $R_{\rm 200}$, while the blue GC systems in massive galaxies can have sizes as large as $\sim$10% $R_{\rm 200}$. Environmental dependence on $R_{e, \rm gc}$ is also found, with lower density environments exhibiting more extended GC systems at fixed mass.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 28 pages, 18 Figures, 3 tables, accepted for publication in ApJ

arXiv:2403.04785 [pdf, other]

Large Language Multimodal Models for 5-Year Chronic Disease Cohort Prediction Using EHR Data

Authors: Jun-En Ding, Phan Nguyen Minh Thao, Wen-Chih Peng, Jian-Zhe Wang, Chun-Cheng Chug, Min-Chen Hsieh, Yun-Chien Tseng, Ling Chen, Dongsheng Luo, Chi-Te Wang, Pei-fu Chen, Feng Liu, Fang-Ming Hung

Abstract: Chronic diseases such as diabetes are the leading causes of morbidity and mortality worldwide. Numerous research studies have been attempted with various deep learning models in diagnosis. However, most previous studies had certain limitations, including using publicly available datasets (e.g. MIMIC), and imbalanced data. In this study, we collected five-year electronic health records (EHRs) from the Taiwan hospital database, including 1,420,596 clinical notes, 387,392 laboratory test results, and more than 1,505 laboratory test items, focusing on research pre-training large language models. We proposed a novel Large Language Multimodal Models (LLMMs) framework incorporating multimodal data from clinical notes and laboratory test results for the prediction of chronic disease risk. Our method combined a text embedding encoder and multi-head attention layer to learn laboratory test values, utilizing a deep neural network (DNN) module to merge blood features with chronic disease semantics into a latent space. In our experiments, we observe that clinicalBERT and PubMed-BERT, when combined with attention fusion, can achieve an accuracy of 73% in multiclass chronic diseases and diabetes prediction. By transforming laboratory test values into textual descriptions and employing the Flan T-5 model, we achieved a 76% Area Under the ROC Curve (AUROC), demonstrating the effectiveness of leveraging numerical text data for training and inference in language models. This approach significantly improves the accuracy of early-stage diabetes prediction.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.03051 [pdf, other]

Prediction of turbulent channel flow using Fourier neural operator-based machine-learning strategy

Authors: Yunpeng Wang, Zhijie Li, Zelong Yuan, Wenhui Peng, Tianyuan Liu, Jianchun Wang

Abstract: Fast and accurate predictions of turbulent flows are of great importance in the science and engineering field. In this paper, we investigate the implicit U-Net enhanced Fourier neural operator (IUFNO) in the stable prediction of long-time dynamics of three-dimensional (3D) turbulent channel flows. The trained IUFNO models are tested in the large-eddy simulations (LES) at coarse grids for three friction Reynolds numbers: $Re_\tau \approx 180$, $395$ and $590$. The adopted near-wall mesh grids are tangibly coarser than the general requirements for wall-resolved LES. The numerical experiments show that the IUFNO framework outperforms the traditional dynamic Smagorinsky model (DSM) and the wall-adapted local eddy-viscosity (WALE) model in the predictions of a variety of flow statistics and structures, including the mean and fluctuating velocities, the probability density functions (PDFs) and joint PDF of velocity fluctuations, the Reynolds stress profile, the kinetic energy spectrum, and the Q-criterion (vortex structures). Meanwhile, the trained IUFNO models are computationally much faster than the traditional LES models. Thus, the IUFNO is a promising approach for the fast prediction of wall-bounded turbulent flow.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.17271 [pdf, other]

Capacitive coupling study of the HERD SCD prototype: preliminary results

Authors: Ruo-Si Lu, Rui Qiao, Ke Gong, Wen-Xi Peng, Wei-Shuai Zhang, Dong-Ya Guo, Jia-Ju Wei, Yi-Ming Hu, Jian-Hua Guo, Qi Wu, Peng Hu, Xuan Liu, Bing Lu, Yi-Rong Zhang

Abstract: The Silicon Charge Detector (SCD) is a subdetector of the High Energy Cosmic Radiation Detection payload. The dynamic range of the silicon microstrip detector can be extended by the capacitive coupling effect, which is related to the interstrip capacitance and the coupling capacitance. A detector prototype with several sets of parameters was designed and tested in the ion beams at the CERN Super Proton Synchrotron. The capacitive coupling fractions with readout strip and floating strip incidences were studied using the beam test data and SPICE simulation.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.15741 [pdf, other]

Observation of the In-plane Anomalous Hall Effect induced by Octupole in Magnetization Space

Authors: Wenzhi Peng, Zheng Liu, Haolin Pan, Peng Wang, Yulong Chen, Jiachen Zhang, Xuhao Yu, Jinhui Shen, Mingmin Yang, Qian Niu, Yang Gao, Dazhi Hou

Abstract: The Anomalous Hall Effect (AHE) manifests as a transverse voltage proportional to magnetization in ferromagnetic materials under the application of a charge current, being an indispensable tool for probing magnetism, especially in nanoscale devices. However, the AHE is primarily sensitive to out-of-plane magnetization, thereby hindering its capacity to discern the in-plane magnetization, a characteristic prevalent in ferromagnetic films. Here we challenge this conventional understanding by demonstrating the in-plane magnetization-induced AHE in iron and nickel, two ubiquitous ferromagnets. This observation of the in-plane AHE is remarkable as it contradicts existing theories that forbid such phenomena in cubic crystal systems. We trace the origin of this unanticipated phenomenon to a hitherto unconsidered octupole of the anomalous Hall conductivity in the magnetization space, a mechanism we propose could enable the detection of in-plane AHE in a wide range of ferromagnetic materials. This work realizes the in-plane AHE in common ferromagnets by exploiting the anomalous Hall conductivity octupole, revealing a new physical origin of the AHE and promising to revolutionize the design of magnetic devices and sensors.

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.12888 [pdf, other]

Transformer-based Learned Image Compression for Joint Decoding and Denoising

Authors: Yi-Hsin Chen, Kuan-Wei Ho, Shiau-Rung Tsai, Guan-Hsun Lin, Alessandro Gnutti, Wen-Hsiao Peng, Riccardo Leonardi

Abstract: This work introduces a Transformer-based image compression system. It has the flexibility to switch between the standard image reconstruction and the denoising reconstruction from a single compressed bitstream. Instead of training separate decoders for these tasks, we incorporate two add-on modules to adapt a pre-trained image decoder from performing the standard image reconstruction to joint decoding and denoising. Our scheme adopts a two-pronged approach. It features a latent refinement module to refine the latent representation of a noisy input image for reconstructing a noise-free image. Additionally, it incorporates an instance-specific prompt generator that adapts the decoding process to improve on the latent refinement. Experimental results show that our method achieves a similar level of denoising quality to training a separate decoder for joint decoding and denoising at the expense of only a modest increase in the decoder's model size and computational complexity.

Submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted to PCS 2024

arXiv:2402.12816 [pdf, other]

OMRA: Online Motion Resolution Adaptation to Remedy Domain Shift in Learned Hierarchical B-frame Coding

Authors: Zong-Lin Gao, Sang NguyenQuang, Wen-Hsiao Peng, Xiem HoangVan

Abstract: Learned hierarchical B-frame coding aims to leverage bi-directional reference frames for better coding efficiency. However, the domain shift between training and test scenarios due to dataset limitations poses a challenge. This issue arises from training the codec with small groups of pictures (GOP) but testing it on large GOPs. Specifically, the motion estimation network, when trained on small GOPs, is unable to handle large motion at test time, incurring a negative impact on compression performance. To mitigate the domain shift, we present an online motion resolution adaptation (OMRA) method. It adapts the spatial resolution of video frames on a per-frame basis to suit the capability of the motion estimation network in a pre-trained B-frame codec. Our OMRA is an online, inference-time technique. It does not require re-training the codec and is readily applicable to existing B-frame codecs that adopt hierarchical bi-directional prediction. Experimental results show that OMRA significantly enhances the compression performance of two state-of-the-art learned B-frame codecs on commonly used datasets.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 7 pages, submitted to IEEE ICIP 2024

arXiv:2402.05418 [pdf, other]

The Next Generation Virgo Cluster Survey. XXXVII. Distant RR Lyrae Stars and the Milky Way Stellar Halo out to 300 kpc

Authors: Yuting Feng, Puragra Guhathakurta, Eric W. Peng, Stephen D. J. Gwyn, Laura Ferrarese, Patrick Côté, Jean-Charles Cuillandre, Jeffrey Munsell, Manjima Talukdar

Abstract: RR Lyrae stars are standard candles with characteristic photometric variability and serve as powerful tracers of Galactic structure, substructure, accretion history, and dark matter content. Here we report the discovery of distant RR Lyrae stars, including some of the most distant stars known in the Milky Way halo, with Galactocentric distances of approximately 300 kpc. We use time-series u*g'i'z' Canada-France-Hawaii Telescope/MegaCam photometry from the Next Generation Virgo Cluster Survey (NGVS). We employ a template light curve fitting method based on empirical Sloan Digital Sky Survey (SDSS) Stripe 82 RR Lyrae data to identify RR Lyrae candidates in the NGVS data set. We eliminate several hundred suspected quasars and identify 180 RR Lyrae candidates, with heliocentric distances of approximately 20--300 kpc. The halo stellar density distribution is consistent with an $r^{-4.09 \pm 0.10}$ power-law radial profile over most of this distance range with no signs of a break. The distribution of ab-type RR Lyrae in a period-amplitude plot (Bailey diagram) suggests that the mean metallicity of the halo decreases outwards. Compared to other recent RR Lyrae surveys, like Pan-STARRS1 (PS1), the High Cadence Transient Survey (HiTS), and the Dark Energy Survey (DES), our NGVS study has better single-epoch photometric precision and a comparable number of epochs but smaller sky coverage. At large distances, our RR Lyrae sample appears to be relatively pure and complete, with well-measured periods and amplitudes. These newly discovered distant RR Lyrae stars are important additions to the few secure stellar tracers beyond 150 kpc in the Milky Way halo.

Submitted 8 February, 2024; originally announced February 2024.

Comments: Accepted by ApJ

arXiv:2402.01204 [pdf, other]

A Survey on Self-Supervised Learning for Non-Sequential Tabular Data

Authors: Wei-Yao Wang, Wei-Wei Du, Derek Xu, Wei Wang, Wen-Chih Peng

Abstract: Self-supervised learning (SSL) has been incorporated into many state-of-the-art models in various domains, where SSL defines pretext tasks based on unlabeled datasets to learn contextualized and robust representations. Recently, SSL has been a new trend in exploring the representation learning capability in the realm of tabular data, which is more challenging due to not having explicit relations for learning descriptive representations. This survey aims to systematically review and summarize the recent progress and challenges of SSL for non-sequential tabular data (SSL4NS-TD). We first present a formal definition of NS-TD and clarify its correlation to related studies. Then, these approaches are categorized into three groups -- predictive learning, contrastive learning, and hybrid learning, with their motivations and strengths of representative methods within each direction. On top of this, application issues of SSL4NS-TD are presented, including automatic data engineering, cross-table transferability, and domain knowledge integration. In addition, we elaborate on existing benchmarks and datasets for NS-TD applications to discuss the performance of existing tabular models. Finally, we discuss the challenges of SSL4NS-TD and provide potential directions for future research. We expect our work to be useful in terms of encouraging more research on lowering the barrier to entry for SSL in the tabular domain and improving the foundations for implicit tabular data.

Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: The paper list can be found at https://github.com/wwweiwei/awesome-self-supervised-learning-for-tabular-data

arXiv:2402.01140 [pdf, other]

Root Cause Analysis In Microservice Using Neural Granger Causal Discovery

Authors: Cheng-Ming Lin, Ching Chang, Wei-Yao Wang, Kuang-Da Wang, Wen-Chih Peng

Abstract: In recent years, microservices have gained widespread adoption in IT operations due to their scalability, maintenance, and flexibility. However, it becomes challenging for site reliability engineers (SREs) to pinpoint the root cause due to the complex relationships in microservices when facing system malfunctions. Previous research employed structured learning methods (e.g., PC-algorithm) to establish causal relationships and derive root causes from causal graphs. Nevertheless, they ignored the temporal order of time series data and failed to leverage the rich information inherent in the temporal relationships. For instance, in cases where there is a sudden spike in CPU utilization, it can lead to an increase in latency for other microservices. However, in this scenario, the anomaly in CPU utilization occurs before the latency increase, rather than simultaneously. As a result, the PC-algorithm fails to capture such characteristics. To address these challenges, we propose RUN, a novel approach for root cause analysis using neural Granger causal discovery with contrastive learning. RUN enhances the backbone encoder by integrating contextual information from time series, and leverages a time series forecasting model to conduct neural Granger causal discovery. In addition, RUN incorporates Pagerank with a personalization vector to efficiently recommend the top-k root causes. Extensive experiments conducted on the synthetic and real-world microservice-based datasets demonstrate that RUN noticeably outperforms the state-of-the-art root cause analysis methods. Moreover, we provide an analysis scenario for the sock-shop case to showcase the practicality and efficacy of RUN in microservice-based applications. Our code is publicly available at https://github.com/zmlin1998/RUN.

Submitted 1 February, 2024; originally announced February 2024.

Comments: AAAI 2024 Main Track

arXiv:2402.00253 [pdf, other]

A Survey on Hallucination in Large Vision-Language Models

Authors: Hanchao Liu, Wenyuan Xue, Yifei Chen, Dapeng Chen, Xiutian Zhao, Ke Wang, Liping Hou, Rongjun Li, Wei Peng

Abstract: Recent development of Large Vision-Language Models (LVLMs) has attracted growing attention within the AI landscape for its practical implementation potential. However, "hallucination", or more specifically, the misalignment between factual visual content and corresponding textual generation, poses a significant challenge of utilizing LVLMs. In this comprehensive survey, we dissect LVLM-related hallucinations in an attempt to establish an overview and facilitate future mitigation. Our scrutiny starts with a clarification of the concept of hallucinations in LVLMs, presenting a variety of hallucination symptoms and highlighting the unique challenges inherent in LVLM hallucinations. Subsequently, we outline the benchmarks and methodologies tailored specifically for evaluating hallucinations unique to LVLMs. Additionally, we delve into an investigation of the root causes of these hallucinations, encompassing insights from the training data and model components. We also critically review existing methods for mitigating hallucinations. The open questions and future directions pertaining to hallucinations within LVLMs are discussed to conclude this survey.

Submitted 5 May, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

arXiv:2401.15509 [pdf, other]

Style-News: Incorporating Stylized News Generation and Adversarial Verification for Neural Fake News Detection

Authors: Wei-Yao Wang, Yu-Chieh Chang, Wen-Chih Peng

Abstract: With the improvements in generative models, the issues of producing hallucinations in various domains (e.g., law, writing) have been brought to people's attention due to concerns about misinformation. In this paper, we focus on neural fake news, which refers to content generated by neural networks aiming to mimic the style of real news to deceive people. To prevent harmful disinformation spreading fallaciously from malicious social media (e.g., content farms), we propose a novel verification framework, Style-News, using publisher metadata to imply a publisher's template with the corresponding text types, political stance, and credibility. Based on threat modeling aspects, a style-aware neural news generator is introduced as an adversary for generating news content conditioning for a specific publisher, and style and source discriminators are trained to defend against this attack by identifying which publisher the style corresponds with, and discriminating whether the source of the given news is human-written or machine-generated. To evaluate the quality of the generated content, we integrate various dimensional metrics (language fluency, content preservation, and style adherence) and demonstrate that Style-News significantly outperforms the previous approaches by a margin of 0.35 for fluency, 15.24 for content, and 0.38 for style at most. Moreover, our discriminative model outperforms state-of-the-art baselines in terms of publisher prediction (up to 4.64%) and neural fake news detection (+6.94% to 31.72%).

Submitted 27 January, 2024; originally announced January 2024.

Comments: EACL 2024 Main Track

arXiv:2401.09025 [pdf, other]

Exploring the Diversity of Music Experiences for Deaf and Hard of Hearing People

Authors: Kyrie Zhixuan Zhou, Weirui Peng, Yuhan Liu, Rachel F. Adler

Abstract: Sensory substitution or enhancement techniques have been proposed to enable deaf or hard of hearing (DHH) people to listen to and even compose music. However, little is known about how such techniques enhance DHH people's music experience. Since deafness is a spectrum -- as are DHH people's preferences and perceptions of music -- a more situated understanding of their interaction with music is needed. To understand the music experience of this population, we conducted social media analyses, both qualitatively and quantitatively, in the deaf and hard of hearing Reddit communities. Our content analysis revealed that DHH people leveraged sign language and visual/haptic cues to feel the music and preferred familiar, non-lyrical, instrument-heavy, and loud music. In addition, hearing aids were not customized for music, and the visual/haptic techniques developed were not widely adopted by DHH people, leading to their suboptimal music experiences. The DHH community embodied mutual support among music lovers, evidenced by active information sharing and Q&A around music and hearing loss. We reflect on design justice for DHH people's music experience and propose practical design implications to create a more accessible music experience for them.

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.08053 [pdf, other]

SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

Authors: Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu, Wenxuan Peng, Youngsik Yun, Andrew Hundt, Jihie Kim, Jean Oh

Abstract: Accurate representation in media is known to improve the well-being of the people who consume it. Generative image models trained on large web-crawled datasets such as LAION are known to produce images with harmful stereotypes and misrepresentations of cultures. We improve inclusive representation in generated images by (1) engaging with communities to collect a culturally representative dataset that we call the Cross-Cultural Understanding Benchmark (CCUB) and (2) proposing a novel Self-Contrastive Fine-Tuning (SCoFT) method that leverages the model's known biases to self-improve. SCoFT is designed to prevent overfitting on small datasets, encode only high-level information from the data, and shift the generated distribution away from misrepresentations encoded in a pretrained model. Our user study conducted on 51 participants from 5 different countries based on their self-selected national cultural affiliation shows that fine-tuning on CCUB consistently generates images with higher cultural relevance and fewer stereotypes when compared to the Stable Diffusion baseline, which is further improved with our SCoFT technique.

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2401.06775 [pdf, other]

Large language models in healthcare and medical domain: A review

Authors: Zabir Al Nazi, Wei Peng

Abstract: The deployment of large language models (LLMs) within the healthcare sector has sparked both enthusiasm and apprehension. These models exhibit the remarkable capability to provide proficient responses to free-text queries, demonstrating a nuanced understanding of professional medical knowledge. This comprehensive survey delves into the functionalities of existing LLMs designed for healthcare applications, elucidating the trajectory of their development, starting from traditional Pretrained Language Models (PLMs) to the present state of LLMs in the healthcare sector. First, we explore the potential of LLMs to amplify the efficiency and effectiveness of diverse healthcare applications, particularly focusing on clinical language understanding tasks. These tasks encompass a wide spectrum, ranging from named entity recognition and relation extraction to natural language inference, multi-modal medical applications, document classification, and question-answering. Additionally, we conduct an extensive comparison of the most recent state-of-the-art LLMs in the healthcare domain, while also assessing the utilization of various open-source LLMs and highlighting their significance in healthcare applications. Furthermore, we present the essential performance metrics employed to evaluate LLMs in the biomedical domain, shedding light on their effectiveness and limitations. Finally, we summarize the prominent challenges and constraints faced by large language models in the healthcare sector, offering a holistic perspective on their potential benefits and shortcomings. This review provides a comprehensive exploration of the current landscape of LLMs in healthcare, addressing their role in transforming medical applications and the areas that warrant further research and development.

Submitted 8 July, 2024; v1 submitted 12 December, 2023; originally announced January 2024.

arXiv:2401.06517 [pdf, other]

LiDAR Depth Map Guided Image Compression Model

Authors: Alessandro Gnutti, Stefano Della Fiore, Mattia Savardi, Yi-Hsin Chen, Riccardo Leonardi, Wen-Hsiao Peng

Abstract: The incorporation of LiDAR technology into some high-end smartphones has unlocked numerous possibilities across various applications, including photography, image restoration, augmented reality, and more. In this paper, we introduce a novel direction that harnesses LiDAR depth maps to enhance the compression of the corresponding RGB camera images. To the best of our knowledge, this represents the initial exploration in this particular research direction. Specifically, we propose a Transformer-based learned image compression system capable of achieving variable-rate compression using a single model while utilizing the LiDAR depth map as supplementary information for both the encoding and decoding processes. Experimental results demonstrate that integrating LiDAR yields an average PSNR gain of 0.83 dB and an average bitrate reduction of 16% as compared to its absence.

Submitted 27 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

arXiv:2401.02074 [pdf, other]

On the boundary of the central quadratic hyperbolic component

Authors: Guizhen Cui, Wenjuan Peng

Abstract: We give a concrete description for the boundary of the central quadratic hyperbolic component. The connectedness of the Julia sets of the boundary maps is also considered.

Submitted 4 January, 2024; originally announced January 2024.

Comments: 9 pages, 3 figures

MSC Class: 37F10; 37F20

arXiv:2401.00652 [pdf, other]

From Covert Hiding to Visual Editing: Robust Generative Video Steganography

Authors: Xueying Mao, Xiaoxiao Hu, Wanli Peng, Zhenliang Gan, Qichao Ying, Zhenxing Qian, Sheng Li, Xinpeng Zhang

Abstract: Traditional video steganography methods are based on modifying the covert space for embedding, whereas we propose an innovative approach that embeds secret messages within semantic features for steganography during the video editing process. Although existing traditional video steganography methods display a certain level of security and embedding capacity, they lack adequate robustness against common distortions in online social networks (OSNs). In this paper, we introduce an end-to-end robust generative video steganography network (RoGVS), which achieves visual editing by modifying the semantic features of videos to embed secret messages. We employ a face-swapping scenario to showcase the visual editing effects. We first design a secret message embedding module to adaptively hide secret messages in the semantic features of videos. Extensive experiments show that the proposed RoGVS method, applied to facial video datasets, is superior to existing video and image steganography techniques in terms of both robustness and capacity.

Submitted 31 December, 2023; originally announced January 2024.

Comments: Under Review

arXiv:2312.17617 [pdf, other]

Large Language Models for Generative Information Extraction: A Survey

Authors: Derong Xu, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, Yang Wang, Enhong Chen

Abstract: Information extraction (IE) aims to extract structural knowledge (such as entities, relations, and events) from plain natural language texts. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation, allowing for generalization across various domains and tasks. As a result, numerous works have been proposed to harness the abilities of LLMs and offer viable solutions for IE tasks based on a generative paradigm. To conduct a comprehensive systematic review and exploration of LLM efforts for IE tasks, in this study, we survey the most recent advancements in this field. We first present an extensive overview by categorizing these works in terms of various IE subtasks and learning paradigms, then we empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs. Based on the thorough review conducted, we identify several insights in technique and promising research directions that deserve further exploration in future studies. We maintain a public repository and consistently update related resources at: https://github.com/quqxui/Awesome-LLM4IE-Papers.

Submitted 4 June, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

Comments: v2: Updated 100+ new papers, 5 technical categories

arXiv:2312.15829 [pdf, other]

MaskCRT: Masked Conditional Residual Transformer for Learned Video Compression

Authors: Yi-Hsin Chen, Hong-Sheng Xie, Cheng-Wei Chen, Zong-Lin Gao, Martin Benjak, Wen-Hsiao Peng, Jörn Ostermann

Abstract: Conditional coding has lately emerged as the mainstream approach to learned video compression. However, a recent study shows that it may perform worse than residual coding when the information bottleneck arises. Conditional residual coding was thus proposed, creating a new school of thought to improve on conditional coding. Notably, conditional residual coding relies heavily on the assumption that the residual frame has a lower entropy rate than that of the intra frame. Recognizing that this assumption is not always true due to dis-occlusion phenomena or unreliable motion estimates, we propose a masked conditional residual coding scheme. It learns a soft mask to form a hybrid of conditional coding and conditional residual coding in a pixel adaptive manner. We introduce a Transformer-based conditional autoencoder. Several strategies are investigated with regard to how to condition a Transformer-based autoencoder for inter-frame coding, a topic that is largely under-explored. Additionally, we propose a channel transform module (CTM) to decorrelate the image latents along the channel dimension, with the aim of using the simple hyperprior to approach similar compression performance to the channel-wise autoregressive model. Experimental results confirm the superiority of our masked conditional residual transformer (termed MaskCRT) to both conditional coding and conditional residual coding. On commonly used datasets, MaskCRT shows comparable BD-rate results to VTM-17.0 under the low delay P configuration in terms of PSNR-RGB and outperforms VTM-17.0 in terms of MS-SSIM-RGB. It also opens up a new research direction for advancing learned video compression.

Submitted 10 July, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

Comments: Accepted for Publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

arXiv:2312.13629 [pdf, other]

$PT$ Symmetric PINN for integrable nonlocal equations: Forward and inverse problems

Authors: Wei-Qi Peng, Yong Chen

Abstract: Since the $PT$-symmetric nonlocal equations contain the physical information of $PT$ symmetry, it is very appropriate to embed this physical information into the loss function of PINN; the resulting method is named PTS-PINN. For general $PT$-symmetric nonlocal equations, especially those equations involving the derivation of nonlocal terms, due to the existence of nonlocal terms, directly using the original PINN method to solve such nonlocal equations will face certain challenges. This problem can be solved by the PTS-PINN method, which can be illustrated in two aspects. First, we treat the nonlocal term of the equation as a new local component, so that the equation is coupled at this time. In this way, we successfully avoid differentiating nonlocal terms in neural networks. Second, in order to improve the accuracy, we embed the physical information of $PT$ symmetry into the loss function. Through a series of independent numerical experiments, we evaluate the efficacy of PTS-PINN in tackling the forward and inverse problems for the nonlocal nonlinear Schrödinger (NLS) equation, the nonlocal derivative NLS equation, the nonlocal (2+1)-dimensional NLS equation, and the nonlocal three wave interaction systems. The numerical experiments demonstrate that PTS-PINN has good performance. In particular, PTS-PINN has also demonstrated an extraordinary ability to learn large space-time scale rogue waves for nonlocal equations.

Submitted 11 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.11887 [pdf, other]

Searching for Intermediate Mass Black Holes in Globular Clusters Through Tidal Disruption Events

Authors: Vivian L. Tang, Piero Madau, Elisa Bortolas, Eric W. Peng

Abstract: Intermediate mass black holes (IMBHs) may be the link between stellar mass holes and the supermassive variety in the nuclei of galaxies, and globular clusters (GCs) may be one of the most promising environments for their formation. Here we carry out a pilot study of the observability of tidal disruption events (TDEs) from 10^3 Msun < M_BH < 10^5 Msun IMBHs embedded in stellar cusps at the center of GCs. We model the long super-Eddington accretion phase and ensuing optical flare, and derive the disruption rate of main-sequence stars as a function of black hole mass and GC properties with the help of a 1D Fokker-Planck approach. The photospheric emission of the adiabatically expanding outflow dominates the observable radiation and peaks in the NUV/optical bands, outshining the brightness of the (old) stellar population of GCs in Virgo for a period of months to years. A search for TDE events in a sample of nearly 4,000 GCs observed at multiple epochs by the Next Generation Virgo Cluster Survey (NGVS) yields null results. Given our model predictions, this sample is too small to set stringent constraints on the present-day occupation fraction of GCs hosting IMBHs. Naturally, better simulations of the properties of the cluster central stellar distribution, TDE light curves and rates, together with larger surveys of GCs are all needed to gain deeper insights into the presence of IMBHs in GCs.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 12 pages, 9 figures, submitted for publication in The Astrophysical Journal

arXiv:2312.11553 [pdf, other]

SeGA: Preference-Aware Self-Contrastive Learning with Prompts for Anomalous User Detection on Twitter

Authors: Ying-Ying Chang, Wei-Yao Wang, Wen-Chih Peng

Abstract: In the dynamic and rapidly evolving world of social media, detecting anomalous users has become a crucial task to address malicious activities such as misinformation and cyberbullying. As the increasing number of anomalous users improves the ability to mimic normal users and evade detection, existing methods only focusing on bot detection are ineffective in terms of capturing subtle distinctions between users. To address these challenges, we propose SeGA, preference-aware self-contrastive learning for anomalous user detection, which leverages heterogeneous entities and their relations in the Twittersphere to detect anomalous users with different malicious strategies. SeGA utilizes the knowledge of large language models to summarize user preferences via posts. In addition, integrating user preferences with prompts as pseudo-labels for preference-aware self-contrastive learning enables the model to learn multifaceted aspects for describing the behaviors of users. Extensive experiments on the proposed TwBNT benchmark demonstrate that SeGA significantly outperforms the state-of-the-art methods (+3.5% to 27.6%) and empirically validate the effectiveness of the model design and pre-training strategies. Our code and data are publicly available at https://github.com/ying0409/SeGA.

Submitted 17 December, 2023; originally announced December 2023.

Comments: AAAI 2024 Main Track

arXiv:2312.10942 [pdf, other]

ShuttleSHAP: A Turn-Based Feature Attribution Approach for Analyzing Forecasting Models in Badminton

Authors: Wei-Yao Wang, Wen-Chih Peng, Wei Wang, Philip S. Yu

Abstract: Agent forecasting systems have been explored to investigate agent patterns and improve decision-making in various domains, e.g., pedestrian predictions and marketing bidding. Badminton represents a fascinating example of a multifaceted turn-based sport, requiring both sophisticated tactic developments and alternate-dependent decision-making. Recent deep learning approaches for player tactic forecasting in badminton show promising performance partially attributed to effective reasoning about rally-player interactions. However, a critical obstacle lies in the unclear functionality of which features are learned for simulating players' behaviors by black-box models, where existing explainers are not equipped with turn-based and multi-output attributions. To bridge this gap, we propose a turn-based feature attribution approach, ShuttleSHAP, for analyzing forecasting models in badminton based on variants of Shapley values. ShuttleSHAP is a model-agnostic explainer that aims to quantify contribution by not only temporal aspects but also player aspects in terms of multifaceted cues. Incorporating the proposed analysis tool into the state-of-the-art turn-based forecasting model on the benchmark dataset reveals that it is, in fact, insignificant to reason about past strokes, while conventional sequential models have greater impacts. Instead, players' styles influence the models for the future simulation of a rally. On top of that, we investigate and discuss the causal analysis of these findings and demonstrate the practicability with local analysis.

Submitted 18 December, 2023; originally announced December 2023.

Comments: Preprint

arXiv:2312.09827 [pdf, other]

doi 10.1103/PhysRevC.109.054910

Identified charged-hadron production in $p$$+$Al, $^3$He$+$Au, and Cu$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV and in U$+$U collisions at $\sqrt{s_{_{NN}}}=193$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, J. Alexander, M. Alfred, V. Andrieux, K. Aoki, N. Apadula, H. Asano, E. T. Atomssa, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, X. Bai, N. S. Bandara, B. Bannier, K. N. Barish, S. Bathe, V. Baublis , et al. (456 additional authors not shown)

Abstract: The PHENIX experiment has performed a systematic study of identified charged-hadron ($π^\pm$, $K^\pm$, $p$, $\bar{p}$) production at midrapidity in $p$$+$Al, $^3$He$+$Au, Cu$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV and U$+$U collisions at $\sqrt{s_{_{NN}}}=193$ GeV. Identified charged-hadron invariant transverse-momentum ($p_T$) and transverse-mass ($m_T$) spectra are presented and interpreted in terms of radially expanding thermalized systems. The particle ratios of $K/π$ and $p/π$ have been measured in different centrality ranges of large (Cu$+$Au, U$+$U) and small ($p$$+$Al, $^3$He$+$Au) collision systems. The values of $K/π$ ratios measured in all considered collision systems were found to be consistent with those measured in $p$$+$$p$ collisions. However the values of $p/π$ ratios measured in large collision systems reach the values of $\approx0.6$, which is $\approx2$ times larger than in $p$$+$$p$ collisions. These results can be qualitatively understood in terms of the baryon enhancement expected from hadronization by recombination. Identified charged-hadron nuclear-modification factors ($R_{AB}$) are also presented. Enhancement of proton $R_{AB}$ values over meson $R_{AB}$ values was observed in central $^3$He$+$Au, Cu$+$Au, and U$+$U collisions. The proton $R_{AB}$ values measured in $p$$+$Al collision system were found to be consistent with $R_{AB}$ values of $φ$, $π^\pm$, $K^\pm$, and $π^0$ mesons, which may indicate that the size of the system produced in $p$$+$Al collisions is too small for recombination to cause a noticeable increase in proton production.

Submitted 22 May, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 480 authors from 78 institutions, 18 pages, 6 tables, 16 figures. v2 is version accepted for publication in Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

Journal ref: Phys. Rev. C 109, 054910 (2024)

arXiv:2312.06372 [pdf, other]

Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks

Authors: Yufei Guo, Yuanpei Chen, Xiaode Liu, Weihang Peng, Yuhan Zhang, Xuhui Huang, Zhe Ma

Abstract: The Spiking Neural Network (SNN), as one of the biologically inspired neural network infrastructures, has drawn increasing attention recently. It adopts binary spike activations to transmit information, thus the multiplications of activations and weights can be substituted by additions, which brings high energy efficiency. However, in the paper, we theoretically and experimentally prove that the binary spike activation map cannot carry enough information, thus causing information loss and resulting in accuracy decreasing. To handle the problem, we propose a ternary spike neuron to transmit information. The ternary spike neuron can also enjoy the event-driven and multiplication-free operation advantages of the binary spike neuron but will boost the information capacity. Furthermore, we also embed a trainable factor in the ternary spike neuron to learn the suitable spike amplitude, thus our SNN will adopt different spike amplitudes along layers, which can better suit the phenomenon that the membrane potential distributions are different along layers. To retain the efficiency of the vanilla ternary spike, the trainable ternary spike SNN will be converted to a standard one again via a re-parameterization technique in the inference. Extensive experiments with several popular network structures over static and dynamic datasets show that the ternary spike can consistently outperform state-of-the-art methods. Our code is open-sourced at https://github.com/yfguo91/Ternary-Spike.

Submitted 16 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI2024

Showing 1–50 of 604 results for author: Peng, W