Novel clustered federated learning based on local loss

Authors: Endong Gu, Yongxin Chen, Hao Wen, Xingju Cai, Deren Han

Abstract: This paper proposes LCFL, a novel clustering metric for evaluating clients' data distributions in federated learning. LCFL aligns with federated learning requirements, accurately assessing client-to-client variations in data distribution. It offers advantages over existing clustered federated learning methods, addressing privacy concerns, improving applicability to non-convex models, and providing more accurate classification results. LCFL does not require prior knowledge of clients' data distributions. We provide a rigorous mathematical analysis, demonstrating the correctness and feasibility of our framework. Numerical experiments with neural network instances highlight the superior performance of LCFL over baselines on several clustered federated learning benchmarks.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08547 [pdf]

Necklace-like pattern of vortex bound states

Authors: Zhiyong Hou, Kailun Chen, Wenshan Hong, Da Wang, Wen Duan, Huan Yang, Shiliang Li, Huiqian Luo, Qiang-Hua Wang, Tao Xiang, Hai-Hu Wen

Abstract: Vortex is a topological defect in the superconducting condensate when a magnetic field is applied to a type-II superconductor, as elucidated by the Ginzburg-Landau theory. Due to the confinement of the quasiparticles by a vortex, it exhibits a circular shaped pattern of bound states with discrete energy levels, as predicted by the Caroli-de Gennes-Matricon theory in 1964. Here, however, we report a completely new type of vortex pattern which is necklace-like in an iron-based superconductor KCa2Fe4As4F2. Our theoretical analysis shows that this necklace-like vortex pattern arises from selective off-shell interference between vortex bound states of opposite angular momenta in the presence of rotational symmetry breaking due to disorders. This fascinating effect can be observed in a system with a small Fermi energy and wave vector, conditions fortuitously met in our samples. Our results not only disclose a novel vortex structure but also provide insights into comprehending the physics of the superconducting condensate.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 29 pages total; 16 pages of main text with 5 figures, 13 pages of supplementary materials with 10 figures

arXiv:2407.08532 [pdf, other]

Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models

Authors: Ying Zhang, Xiaoyan Zhou, Hui Wen, Wenjia Niu, Jiqiang Liu, Haining Wang, Qiang Li

Abstract: Nowadays, the open-source software (OSS) ecosystem suffers from security threats of software supply chain (SSC) attacks. Interpreted OSS malware plays a vital role in SSC attacks, as criminals have an arsenal of attack vectors to deceive users into installing malware and executing malicious activities. In this paper, we introduce tactics, techniques, and procedures (TTPs) proposed by MITRE ATT&CK into the interpreted malware analysis to characterize different phases of an attack lifecycle. Specifically, we propose GENTTP, a zero-shot approach to extracting a TTP of an interpreted malware package. GENTTP leverages large language models (LLMs) to automatically generate a TTP, where the input is a malicious package, and the output is a deceptive tactic and an execution tactic of attack vectors. To validate the effectiveness of GENTTP, we collect two datasets for evaluation: a dataset with ground truth labels and a large dataset in the wild. Experimental results show that GENTTP can generate TTPs with high accuracy and efficiency. To demonstrate GENTTP's benefits, we build an LLM-based Chatbot from 3,700+ PyPI malware's TTPs. We further conduct a quantitative analysis of malware's TTPs at a large scale. Our main findings include: (1) many OSS malicious packages share a relatively stable TTP, even with the increasing emergence of malware and attack campaigns, (2) a TTP reflects characteristics of a malware-based attack, and (3) an attacker's intent behind the malware is linked to a TTP.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 19 pages, 11 figures

arXiv:2407.06853 [pdf, other]

TimeTravel: Real-time Timing Drift Attack on System Time Using Acoustic Waves

Authors: Jianshuo Liu, Hong Li, Haining Wang, Mengjie Sun, Hui Wen, Jinfa Wang, Limin Sun

Abstract: Real-time Clock (RTC) has been widely used in various real-time systems to provide precise system time. In this paper, we reveal a new security vulnerability of the RTC circuit, where the internal storage time or timestamp can be arbitrarily modified forward or backward. The security threat of dynamic modifications of system time caused by this vulnerability is called TimeTravel. Based on acoustic resonance and piezoelectric effects, TimeTravel applies acoustic guide waves to the quartz crystal, thereby adjusting the characteristics of the oscillating signal transmitted into the RTC circuit. By manipulating the parameters of acoustic waves, TimeTravel can accelerate or decelerate the timing speed of system time at an adjustable rate, resulting in the relative drift of the timing, which can pose serious safety threats. To assess the severity of TimeTravel, we examine nine modules and two commercial devices under the RTC circuit. The experimental results show that TimeTravel can drift system time forward and backward at a chosen speed with a maximum 93% accuracy. Our analysis further shows that TimeTravel can maintain an attack success rate of no less than 77% under environments with typical obstacle items.

Submitted 9 July, 2024; originally announced July 2024.

Comments: Accepted by USENIX Security 2024 winter cycle and will appear in USENIX Security 2025

arXiv:2407.06348 [pdf, other]

FORAY: Towards Effective Attack Synthesis against Deep Logical Vulnerabilities in DeFi Protocols

Authors: Hongbo Wen, Hanzhi Liu, Jiaxin Song, Yanju Chen, Wenbo Guo, Yu Feng

Abstract: Blockchain adoption has surged with the rise of Decentralized Finance (DeFi) applications. However, the significant value of digital assets managed by DeFi protocols makes them prime targets for attacks. Current smart contract vulnerability detection tools struggle with DeFi protocols due to deep logical bugs arising from complex financial interactions between multiple smart contracts. These tools primarily analyze individual contracts and resort to brute-force methods for DeFi protocols crossing numerous smart contracts, leading to inefficiency. We introduce Foray, a highly effective attack synthesis framework against deep logical bugs in DeFi protocols. Foray proposes a novel attack sketch generation and completion framework. Specifically, instead of treating DeFis as regular programs, we design a domain-specific language (DSL) to lift the low-level smart contracts into their high-level financial operations. Based on our DSL, we first compile a given DeFi protocol into a token flow graph (TFG), our graphical representation of DeFi protocols. Then, we design an efficient sketch generation method to synthesize attack sketches for a certain attack goal (e.g., price manipulation, arbitrage, etc.). This algorithm strategically identifies candidate sketches by finding reachable paths in the TFG, which is much more efficient than random enumeration. For each candidate sketch written in our DSL, Foray designs a domain-specific symbolic compilation to compile it into SMT constraints.
Our compilation simplifies the constraints by removing redundant smart contract semantics. It maintains the usability of symbolic compilation, yet scales to problems orders of magnitude larger. Finally, the candidates are completed via existing solvers and are transformed into concrete attacks via direct syntax transformation.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.06061 [pdf]

Superconductivity up to 14.2 K in MnB$_4$ under pressure

Authors: Zhe-Ning Xiang, Ying-Jie Zhang, Qing Lu, Qing Li, Yiwen Li, Tianheng Huang, Yijie Zhu, Yongze Ye, Jian Sun, Hai-Hu Wen

Abstract: The discovery of superconductivity in 3$d$-transition metal compounds with strong magnetism is interesting but rare. Especially for Mn-based compounds, there exist only very limited materials that show superconductivity. Here, we report the discovery of superconductivity up to 14.2 K in a Mn-based material MnB$_4$. By applying high pressures, we found the continuous suppression of a weak insulating behavior and the occurrence of superconductivity after about 30 GPa. With further increasing pressure, $T_\text{c}$ is gradually enhanced and reaches the maximum value of about 14.2 K at 150 GPa with a Fermi-Liquid behavior in the normal states. The synchrotron X-ray diffraction data reveal the unchanged monoclinic (S.G: $P2_1/c$) symmetry but an unusual crossover of the lattice parameters $b$ and $c$. Theoretical calculations based on the electron-phonon coupling picture reveal a very low $T_\text{c}$ (less than 1 K), manifesting an exotic pairing mechanism beyond the Bardeen-Cooper-Schrieffer (BCS) theory. Our findings show a promising way to explore high $T_\text{c}$ superconductivity by combining the 3d-transition metal magnetic elements and light elements.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 36 pages total; 24 pages of main text with 5 figures, 12 pages of supplement with 1 table and 8 figures

arXiv:2407.06001 [pdf, other]

Pseudo-triplet Guided Few-shot Composed Image Retrieval

Authors: Bohan Hou, Haoqiang Lin, Haokun Wen, Meng Liu, Xuemeng Song

Abstract: Composed Image Retrieval (CIR) is a challenging task that aims to retrieve the target image based on a multimodal query, i.e., a reference image and its corresponding modification text. While previous supervised or zero-shot learning paradigms all fail to strike a good trade-off between time-consuming annotation cost and retrieval performance, recent researchers introduced the task of few-shot CIR (FS-CIR) and proposed a textual inversion-based network based on pretrained CLIP model to realize it. Despite its promising performance, the approach suffers from two key limitations: insufficient multimodal query composition training and indiscriminative training triplet selection. To address these two limitations, in this work, we propose a novel two-stage pseudo triplet guided few-shot CIR scheme, dubbed PTG-FSCIR. In the first stage, we employ a masked training strategy and advanced image caption generator to construct pseudo triplets from pure image data to enable the model to acquire primary knowledge related to multimodal query composition. In the second stage, based on active learning, we design a pseudo modification text-based query-target distance metric to evaluate the challenging score for each unlabeled sample. Meanwhile, we propose a robust top range-based random sampling strategy according to the 3-$σ$ rule in statistics, to sample the challenging samples for fine-tuning the pretrained model. Notably, our scheme is plug-and-play and compatible with any existing supervised CIR models.
We tested our scheme across three backbones on three public datasets (i.e., FashionIQ, CIRR, and Birds-to-Words), achieving maximum improvements of 26.4%, 25.5% and 21.6% respectively, demonstrating our scheme's effectiveness.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 15 pages, 5 figures

arXiv:2406.17662 [pdf]

A piezoelectric ski-jump laser beam scanning chip-to-free space photonic link

Authors: Matt Saha, Y. Henry Wen, Andrew S. Greenspon, Matthew Zimmermann, Kevin J. Palm, Alex Witte, Mark Dong, Andrew J. Leenheer, Genevieve Clark, Gerald Gilbert, Matt Eichenfield, Dirk Englund

Abstract: A seamless interface between integrated photonic processors and targets in free-space enables wide-ranging advancements in telescopy, free-space communication, optical ranging, materials processing, biomedical imaging, near eye display, machine optical intelligence and quantum control. An optimal solution allows for 2D scanning from anywhere on a photonic chip over a large number of diffraction limited spots in the far field. Leading approaches rely on scanners where the numerical aperture and actuator size are linked, resulting in a trade-off between resolution, speed and footprint, whereas scanning fibers have been limited to bulk optical and mechanical components. Here, we introduce a CMOS fabricated photonic "ski-jump" composed of a broadband, single mode silicon nitride waveguide monolithically integrated atop a piezo-actuated cantilever. The ski-jumps passively curl 90 degrees out-of-plane via mechanical meta-stress engineering in a footprint of less than 0.1 mm squared and emit submicron diffraction-limited optical modes with piezoelectric steering. They also exhibit kHz-rate longitudinal and lateral mechanical resonances with displacement ranges exceeding 400 micron and 180 micron, respectively, and quality factors Q>10,000 under vacuum. These resonances enable 2D beam scanning at footprint-adjusted spot-rates of 68.6 Megaspot/s-mm squared surpassing state-of-the-art MEMS mirrors by more than 50.
Using these devices, we demonstrate arbitrary 2D image projection and the repeatable initialization and readout of single photons from silicon vacancies in diamond waveguides. Based on current device performance, we identify pathways for achieving >1 Giga-spots in a square cm area to provide a seamless, scalable optical pipeline between integrated photonic processors and the free-space world.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 12 pages main text, 1 page methods, 10 pages supplementary information

arXiv:2406.17067 [pdf]

Optical Control of Adaptive Nanoscale Domain Networks

Authors: Marc Zajac, Tao Zhou, Tiannan Yang, Sujit Das, Yue Cao, Burak Guzelturk, Vladimir Stoica, Mathew Cherukara, John W. Freeland, Venkatraman Gopalan, Ramamoorthy Ramesh, Lane W. Martin, Long-Qing Chen, Martin Holt, Stephan Hruszkewycz, Haidan Wen

Abstract: Adaptive networks can sense and adjust to dynamic environments to optimize their performance. Understanding their nanoscale responses to external stimuli is essential for applications in nanodevices and neuromorphic computing. However, it is challenging to image such responses on the nanoscale with crystallographic sensitivity. Here, the evolution of nanodomain networks in (PbTiO3)n/(SrTiO3)n superlattices was directly visualized in real space as the system adapts to ultrafast repetitive optical excitations that emulate controlled neural inputs. The adaptive response allows the system to explore a wealth of metastable states that were previously inaccessible. Their reconfiguration and competition were quantitatively measured by scanning x-ray nanodiffraction as a function of the number of applied pulses, in which crystallographic characteristics were quantitatively assessed by assorted diffraction patterns using unsupervised machine-learning methods. The corresponding domain boundaries and their connectivity were drastically altered by light, holding promise for light-programmable nanocircuits in analogy to neuroplasticity. Phase-field simulations elucidate that the reconfiguration of the domain networks is a result of the interplay between photocarriers and transient lattice temperature. The demonstrated optical control scheme and the uncovered nanoscopic insights open opportunities for remote control of adaptive nanoscale domain networks.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.15423 [pdf]

Fast and accurate extraction of ultra-high quality factor from cavity ring-down measurement

Authors: Yanping Yang, Shihan Liu, Yong Geng, Huashun Wen, Heng Zhou

Abstract: Cavity ring-down is an essential test to measure ultra-high quality factor (UHQ) optical cavities, which is, however, frequently misinterpreted due to the lack of a specified analysis guideline. Here we clarify the basic property of cavity ring-down and present a step-by-step method that enables extraction of the overall quality factor, as well as the intrinsic loss and coupling state of UHQ cavities with better fidelity and simplicity than prior schemes. Our work can facilitate accurate design and characterization of UHQ cavities for ultra-low noise lasers, high finesse reference cavities, and ultra-narrow optical filters.

Submitted 21 May, 2024; originally announced June 2024.

arXiv:2406.11824 [pdf, other]

Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation

Authors: Alexander Raistrick, Lingjie Mei, Karhan Kayan, David Yan, Yiming Zuo, Beining Han, Hongyu Wen, Meenal Parakh, Stamatis Alexandropoulos, Lahav Lipson, Zeyu Ma, Jia Deng

Abstract: We introduce Infinigen Indoors, a Blender-based procedural generator of photorealistic indoor scenes. It builds upon the existing Infinigen system, which focuses on natural scenes, but expands its coverage to indoor scenes by introducing a diverse library of procedural indoor assets, including furniture, architecture elements, appliances, and other day-to-day objects. It also introduces a constraint-based arrangement system, which consists of a domain-specific language for expressing diverse constraints on scene composition, and a solver that generates scene compositions that maximally satisfy the constraints. We provide an export tool that allows the generated 3D objects and scenes to be directly used for training embodied agents in real-time simulators such as Omniverse and Unreal. Infinigen Indoors is open-sourced under the BSD license. Please visit https://infinigen.org for code and videos.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted to CVPR 2024

arXiv:2406.11569 [pdf, other]

Pre-Training and Personalized Fine-Tuning via Over-the-Air Federated Meta-Learning: Convergence-Generalization Trade-Offs

Authors: Haifeng Wen, Hong Xing, Osvaldo Simeone

Abstract: For modern artificial intelligence (AI) applications such as large language models (LLMs), the training paradigm has recently shifted to pre-training followed by fine-tuning. Furthermore, owing to dwindling open repositories of data and thanks to efforts to democratize access to AI models, pre-training is expected to increasingly migrate from the current centralized deployments to federated learning (FL) implementations. Meta-learning provides a general framework in which pre-training and fine-tuning can be formalized. Meta-learning-based personalized FL (meta-pFL) moves beyond basic personalization by targeting generalization to new agents and tasks. This paper studies the generalization performance of meta-pFL for a wireless setting in which the agents participating in the pre-training phase, i.e., meta-learning, are connected via a shared wireless channel to the server. Adopting over-the-air computing, we study the trade-off between generalization to new agents and tasks, on the one hand, and convergence, on the other hand. The trade-off arises from the fact that channel impairments may enhance generalization, while degrading convergence. Extensive numerical results validate the theory.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 37 pages, 7 figures, submitted for possible journal publication

arXiv:2406.09530 [pdf]

Emergent Atomic Scale Polarization Vortices

Authors: Boyang Zhao, Gwan Yeong Jung, Huandong Chen, Shantanu Singh, Zhengyu Du, Claire Wu, Guodong Ren, Qinai Zhao, Nicholas S. Settineri, Simon J. Teat, Haidan Wen, Rohan Mishra, Jayakanth Ravichandran

Abstract: Topological defects, such as vortices and skyrmions in magnetic and dipolar systems, can give rise to properties that are not observed in typical magnets or dielectrics. Here, we report the discovery of an atomic-scale dipolar vortex lattice in the charge-density-wave (CDW) phase of BaTiS3, a quasi-one-dimensional (quasi-1D) hexagonal chalcogenide, using X-ray synchrotron single-crystal diffraction studies. The vortex lattice consists of a periodic array of vortex-vortex-antivortex patterns composed of electric dipoles from off-center displacements of octahedrally coordinated Ti atoms. Using first-principles calculations and phenomenological modeling, we show that the dipolar vortex lattice in BaTiS3 arises from the coupling between multiple lattice instabilities arising from flat, soft phonon bands. This mechanism contrasts with classical dipolar textures in ferroelectric heterostructures that emerge from the competition between electrostatic and strain energies, and necessitate a dimensional reduction in the form of thin films and heterostructures to stabilize the textures. The observation of dipolar vortices in BaTiS3 brings the ultimate scaling limit for dipolar topologies down to about a nanometer and unveils the intimate connection between crystal symmetry and real-space topology. Our work sets up zero-filling triangular lattice materials with instabilities as a playground for realizing and understanding quantum polarization topologies.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08789 [pdf, ps, other]

Growth and characterization of the La$_{3}$Ni$_{2}$O$_{7-δ}$ thin films: dominant contribution of the $d_{x^{2}-y^{2}}$ orbital at ambient pressure

Authors: Yuecong Liu, Mengjun Ou, Haifeng Chu, Huan Yang, Qing Li, Yingjie Zhang, Hai-Hu Wen

Abstract: By using the pulsed-laser-ablation technique, we have successfully grown the La$_{3}$Ni$_{2}$O$_{7-δ}$ thin films with $c$-axis orientation perpendicular to the film surface. X-ray diffraction shows that the (00l) peaks can be well indexed to the La$_{3}$Ni$_{2}$O$_{7-δ}$ phase. Resistive measurements show that the samples can be tuned from weak insulating to metallic behavior through adjusting the growth conditions. Surprisingly, all curves of $ρ-T$ in the temperature region of 2$\sim$300~K do not show the anomalies corresponding to either the spin density wave or the charge density wave orders as seen in bulk samples. Hall effect measurements show a linear field dependence with the dominant hole charge carriers, but the Hall coefficient $R_{H}=ρ_{xy}/H$ exhibits strong temperature dependence. The magnetoresistance above about 50~K is positive but very weak, indicating the absence of multiband effect. However, a negative magnetoresistance is observed at low temperatures, which shows the delocalization effect. Detailed analysis on the magnetoresistance suggests that the delocalization effect at low temperatures is due to the Kondo-like effect, rather than the Anderson weak localization. Our transport results suggest that the electronic conduction is fulfilled by the $d_{x^{2}-y^{2}}$ orbital with holes as the dominant charge carriers, while the interaction through Hund's coupling with the localized $d_{z^{2}}$ orbital plays an important role in the charge dynamics.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.06832 [pdf, other]

Hidden correlations in stochastic photoinduced dynamics of a solid-state electrolyte

Authors: Jackson McClellan, Alfred Zong, Kim H. Pham, Hanzhe Liu, Zachery W. B. Iton, Burak Guzelturk, Donald A. Walko, Haidan Wen, Scott K. Cushing, Michael W. Zuerch

Abstract: Photoexcitation by ultrashort laser pulses plays a crucial role in controlling reaction pathways, creating nonequilibrium material properties, and offering a microscopic view of complex dynamics at the molecular level. The photo response following a laser pulse is, in general, non-identical between multiple exposures due to spatiotemporal fluctuations in a material or the stochastic nature of dynamical pathways. However, most ultrafast experiments using a stroboscopic pump-probe scheme struggle to distinguish intrinsic sample fluctuations from extrinsic apparatus noise, often missing seemingly random deviations from the averaged shot-to-shot response. Leveraging the stability and high photon-flux of time-resolved X-ray micro-diffraction at a synchrotron, we developed a method to quantitatively characterize the shot-to-shot variation of the photoinduced dynamics in a solid-state electrolyte. By analyzing temporal evolutions of the lattice parameter of a single grain in a powder ensemble, we found that the sample responses after different shots contain random fluctuations that are, however, not independent. Instead, there is a correlation between the nonequilibrium lattice trajectories following adjacent laser shots with a characteristic "correlation length" of approximately 1,500 shots, which represents an energy barrier of 0.38~eV for switching the photoinduced pathway, a value interestingly commensurate with the activation energy of lithium ion diffusion.
Not only does our nonequilibrium noise correlation spectroscopy provide a new strategy for studying fluctuations that are central to phase transitions in both condensed matter and molecular systems, it also paves the way for discovering hidden correlations and novel metastable states buried in oft-presumed random, uncorrelated fluctuating dynamics.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.04837 [pdf, other]

Normal and superconducting properties of La$_3$Ni$_2$O$_7$

Authors: Meng Wang, Hai-Hu Wen, Tao Wu, Dao-Xin Yao, Tao Xiang

Abstract: This review provides a comprehensive overview of current research on the structural, electronic, and magnetic characteristics of the recently discovered high-temperature superconductor La$_3$Ni$_2$O$_7$ under high pressures. We present the experimental results for synthesizing and characterizing this material, derived from measurements of transport, thermodynamics, and various spectroscopic techniques, and discuss their physical implications. We also explore theoretical models proposed to describe the electronic structures and superconducting pairing symmetry in La$_3$Ni$_2$O$_7$, highlighting the intricate interplay between electronic correlations and magnetic interactions. Despite these advances, challenges remain in growing high-quality samples free of extrinsic phases and oxygen deficiencies and in developing reliable measurement tools for determining diamagnetism and other physical quantities under high pressures. Further investigations in these areas are essential to deepening our understanding of the physical properties of La$_3$Ni$_2$O$_7$ and unlocking its superconducting pairing mechanism.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 15 pages, 11 figures

arXiv:2406.03184 [pdf, other]

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

Authors: Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Yu Qiao, Lu Sheng

Abstract: Existing single image-to-3D creation methods typically involve a two-stage process, first generating multi-view images, and then using these images for 3D reconstruction. However, training these two stages separately leads to significant data bias in the inference phase, thus affecting the quality of reconstructed results. We introduce a unified 3D generation framework, named Ouroboros3D, which integrates diffusion-based multi-view image generation and 3D reconstruction into a recursive diffusion process. In our framework, these two modules are jointly trained through a self-conditioning mechanism, allowing them to adapt to each other's characteristics for robust inference. During the multi-view denoising process, the multi-view diffusion model uses the 3D-aware maps rendered by the reconstruction module at the previous timestep as additional conditions. The recursive diffusion framework with 3D-aware feedback unites the entire process and improves geometric consistency. Experiments show that our framework outperforms separation of these two stages and existing methods that combine them at the inference phase. Project page: https://costwen.github.io/Ouroboros3D/

Submitted 5 June, 2024; originally announced June 2024.

Comments: See our project page at https://costwen.github.io/Ouroboros3D/

arXiv:2405.19818 [pdf, other]

WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark

Authors: Chunhui Zhang, Li Liu, Guanjie Huang, Hao Wen, Xi Zhou, Yanfeng Wang

Abstract: Underwater object tracking (UOT) is a foundational task for identifying and tracing submerged entities in underwater video sequences. However, current UOT datasets suffer from limitations in scale, diversity of target categories and scenarios covered, hindering the training and evaluation of modern tracking algorithms. To bridge this gap, we take the first step and introduce WebUOT-1M, i.e., the largest public UOT benchmark to date, sourced from complex and realistic underwater environments. It comprises 1.1 million frames across 1,500 video clips filtered from 408 target categories, largely surpassing previous UOT datasets, e.g., UVOT400. Through meticulous manual annotation and verification, we provide high-quality bounding boxes for underwater targets. Additionally, WebUOT-1M includes language prompts for video sequences, expanding its application areas, e.g., underwater vision-language tracking. Most existing trackers are tailored for open-air environments, leading to performance degradation when applied to UOT due to domain gaps. Retraining and fine-tuning these trackers are challenging due to sample imbalances and limited real-world underwater datasets. To tackle these challenges, we propose a novel omni-knowledge distillation framework based on WebUOT-1M, incorporating various strategies to guide the learning of the student Transformer. To the best of our knowledge, this framework is the first to effectively transfer open-air domain knowledge to the UOT model through knowledge distillation, as demonstrated by results on both existing UOT datasets and the newly proposed WebUOT-1M. Furthermore, we comprehensively evaluate WebUOT-1M using 30 deep trackers, showcasing its value as a benchmark for UOT research by presenting new challenges and opportunities for future studies. The complete dataset, codes and tracking results will be made publicly available.

Submitted 30 May, 2024; originally announced May 2024.

Comments: GitHub project: https://github.com/983632847/Awesome-Multimodal-Object-Tracking

arXiv:2405.14200 [pdf, other]

Awesome Multi-modal Object Tracking

Authors: Chunhui Zhang, Li Liu, Hao Wen, Xi Zhou, Yanfeng Wang

Abstract: Multi-modal object tracking (MMOT) is an emerging field that combines data from various modalities, e.g., vision (RGB), depth, thermal infrared, event, language and audio, to estimate the state of an arbitrary object in a video sequence. It is of great significance for many applications such as autonomous driving and intelligent surveillance. In recent years, MMOT has received more and more attention. However, existing MMOT algorithms mainly focus on two modalities (e.g., RGB+depth, RGB+thermal infrared, and RGB+language). To leverage more modalities, some recent efforts have been made to learn a unified visual object tracking model for any modality. Additionally, some large-scale multi-modal tracking benchmarks have been established by simultaneously providing more than two modalities, such as vision-language-audio (e.g., WebUAV-3M) and vision-depth-language (e.g., UniMod1K). To track the latest progress in MMOT, we conduct a comprehensive investigation in this report. Specifically, we first divide existing MMOT tasks into five main categories, i.e., RGBL tracking, RGBE tracking, RGBD tracking, RGBT tracking, and miscellaneous (RGB+X), where X can be any modality, such as language, depth, and event. Then, we analyze and summarize each MMOT task, focusing on widely used datasets and mainstream tracking algorithms based on their technical paradigms (e.g., self-supervised learning, prompt learning, knowledge distillation, generative models, and state space models). Finally, we maintain a continuously updated paper list for MMOT at https://github.com/983632847/Awesome-Multimodal-Object-Tracking.

Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: A continuously updated project to track the latest progress in multi-modal object tracking

arXiv:2405.14135 [pdf, other]

Learning Geospatial Region Embedding with Heterogeneous Graph

Authors: Xingchen Zou, Jiani Huang, Xixuan Hao, Yuhao Yang, Haomin Wen, Yibo Yan, Chao Huang, Yuxuan Liang

Abstract: Learning effective geospatial embeddings is crucial for a series of geospatial applications such as city analytics and earth monitoring. However, learning comprehensive region representations presents two significant challenges: first, the deficiency of effective intra-region feature representation; and second, the difficulty of learning from intricate inter-region dependencies. In this paper, we present GeoHG, an effective heterogeneous graph structure for learning comprehensive region embeddings for various downstream tasks. Specifically, we tailor satellite image representation learning through geo-entity segmentation and point-of-interest (POI) integration for expressive intra-regional features. Furthermore, GeoHG unifies informative spatial interdependencies and socio-environmental attributes into a powerful heterogeneous graph to encourage explicit modeling of higher-order inter-regional relationships. The intra-regional features and inter-regional correlations are seamlessly integrated by a model-agnostic graph learning framework for diverse downstream tasks. Extensive experiments demonstrate the effectiveness of GeoHG in geo-prediction tasks compared to existing methods, even under extreme data scarcity (with just 5% of training data). With interpretable region representations, GeoHG exhibits strong generalization capabilities across regions. We will release code and data upon paper notification.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13745 [pdf, other]

NeurCross: A Self-Supervised Neural Approach for Representing Cross Fields in Quad Mesh Generation

Authors: Qiujie Dong, Huibiao Wen, Rui Xu, Xiaokang Yu, Jiaran Zhou, Shuangmin Chen, Shiqing Xin, Changhe Tu, Wenping Wang

Abstract: Quadrilateral mesh generation plays a crucial role in numerical simulations within Computer-Aided Design and Engineering (CAD/E). The quality of the cross field is essential for generating a quadrilateral mesh. In this paper, we propose a self-supervised neural representation of the cross field, named NeurCross, comprising two modules: one to fit the signed distance function (SDF) and another to predict the cross field. Unlike most existing approaches that operate directly on the given polygonal surface, NeurCross takes the SDF as a bridge to allow for SDF overfitting and the prediction of the cross field to proceed simultaneously. By utilizing a neural SDF, we achieve a smooth representation of the base surface, minimizing the impact of piecewise planar discretization and minor surface variations. Moreover, the principal curvatures and directions are fully encoded by the Hessian of the SDF, enabling the regularization of the overall cross field through minor adjustments to the SDF. Compared to state-of-the-art methods, NeurCross significantly improves the placement of singular points and the approximation accuracy between the input triangular surface and the output quad mesh, as demonstrated in the teaser figure.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.12459 [pdf, other]

PLM4Traj: Cognizing Movement Patterns and Travel Purposes from Trajectories with Pre-trained Language Models

Authors: Zeyu Zhou, Yan Lin, Haomin Wen, Shengnan Guo, Jilin Hu, Youfang Lin, Huaiyu Wan

Abstract: Spatio-temporal trajectories play a vital role in various spatio-temporal data mining tasks. Developing a versatile trajectory learning approach that can adapt to different tasks while ensuring high accuracy is crucial. This requires effectively extracting movement patterns and travel purposes embedded in trajectories. However, this task is challenging due to limitations in the size and quality of available trajectory datasets. On the other hand, pre-trained language models (PLMs) have shown great success in adapting to different tasks by training on large-scale, high-quality corpus datasets. Given the similarities between trajectories and sentences, there is potential in leveraging PLMs to enhance the development of a versatile and effective trajectory learning method. Nevertheless, vanilla PLMs are not tailored to handle the unique spatio-temporal features present in trajectories and lack the capability to extract movement patterns and travel purposes from them. To overcome these obstacles, we propose a model called PLM4Traj that effectively utilizes PLMs to model trajectories. PLM4Traj leverages the strengths of PLMs to create a versatile trajectory learning approach while addressing the limitations of vanilla PLMs in modeling trajectories. Firstly, PLM4Traj incorporates a novel trajectory semantic embedder that enables PLMs to process spatio-temporal features in trajectories and extract movement patterns and travel purposes from them. Secondly, PLM4Traj introduces a novel trajectory prompt that integrates movement patterns and travel purposes into PLMs, while also allowing the model to adapt to various tasks. Extensive experiments conducted on two real-world datasets and two representative tasks demonstrate that PLM4Traj successfully achieves its design goals. Codes are available at https://github.com/Zeru19/PLM4Traj.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.09004 [pdf, other]

Improving Sequential Market Clearing via Value-oriented Renewable Energy Forecasting

Authors: Yufan Zhang, Honglin Wen, Yuexin Bian, Yuanyuan Shi

Abstract: Large penetration of renewable energy sources (RESs) brings huge uncertainty into the electricity markets. While existing deterministic market clearing fails to accommodate the uncertainty, the recently proposed stochastic market clearing struggles to achieve desirable market properties. In this work, we propose a value-oriented forecasting approach, which tactically determines the RESs generation that enters the day-ahead market. With such a forecast, the existing deterministic market clearing framework can be maintained, and the day-ahead and real-time overall operation cost is reduced. At the training phase, the forecast model parameters are estimated to minimize expected day-ahead and real-time overall operation costs, instead of minimizing forecast errors in a statistical sense. Theoretically, we derive the exact form of the loss function for training the forecast model that aligns with such a goal. For market clearing modeled by linear programs, this loss function is a piecewise linear function. Additionally, we derive the analytical gradient of the loss function with respect to the forecast, which inspires an efficient training strategy. A numerical study shows that our forecasts can significantly reduce the overall cost of deterministic market clearing, compared to a quality-oriented forecasting approach.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.04547 [pdf, other]

Constraints on typical relic gravitational waves based on data of LIGO

Authors: Minghui Zhang, Hao Wen

Abstract: Relic gravitational waves (RGWs) from the early universe carry fundamental information, so it is extraordinarily important to search for RGW signals in data from observatories like the LIGO-Virgo network. Here, focusing on typical RGWs from inflation and first-order phase transition (by sound waves and bubble collisions), effective and targeted deep learning neural networks are established to search for RGW signals in real LIGO data (O2, O3a and O3b). We construct a Convolutional Neural Network (CNN) to estimate the likelihood (by quantitative values and distributions) of the existence of the focused RGW signals in LIGO data, or to provide constraints on their strengths. We find that if the built CNN properly estimates the parameters of RGWs, it can accurately (about 94% to 99%) determine whether the samples contain RGW signals, and if not, the likelihood given by the CNN is not reliable. After testing a large amount of LIGO datasets, the results indicate no evidence of RGWs from inflation, sound waves, or bubble collisions predicted by the focused theories, and provide upper limits on their GW spectral energy densities $h^2\Omega_{gw}$ of $10^{-5}$ (for various orders of GW amplitude given specific parameter regions by reverse mapping). In short, null results and upper limits are acquired; the methods and neural networks we develop to search for RGWs in LIGO data could be effective and reliable, and can be applied not only to current data but also to upcoming O4 data or other observational datasets, to establish an available scheme for exploring potential RGW signals or to provide constraints on relevant theoretical models.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.03644 [pdf, other]

When LLMs Meet Cybersecurity: A Systematic Literature Review

Authors: Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu

Abstract: The rapid advancements in large language models (LLMs) have opened new avenues across various fields, including cybersecurity, which faces an ever-evolving threat landscape and need for innovative technologies. Despite initial explorations into the application of LLMs in cybersecurity, there is a lack of a comprehensive overview of this research area. This paper bridges this gap by providing a systematic literature review, encompassing an analysis of over 180 works, spanning across 25 LLMs and more than 10 downstream scenarios. Our comprehensive overview addresses three critical research questions: the construction of cybersecurity-oriented LLMs, LLMs' applications in various cybersecurity tasks, and the existing challenges and further research in this area. This study aims to shed light on the extensive potential of LLMs in enhancing cybersecurity practices, and serve as a valuable resource for applying LLMs in this domain. We also maintain a regularly updated list of practical guides on LLMs for cybersecurity at https://github.com/tmylla/Awesome-LLM4Cybersecurity.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 36 pages, 7 figures

arXiv:2405.02508 [pdf, other]

Rasterized Edge Gradients: Handling Discontinuities Differentiably

Authors: Stanislav Pidhorskyi, Tomas Simon, Gabriel Schwartz, He Wen, Yaser Sheikh, Jason Saragih

Abstract: Computing the gradients of a rendering process is paramount for diverse applications in computer vision and graphics. However, accurate computation of these gradients is challenging due to discontinuities and rendering approximations, particularly for surface-based representations and rasterization-based rendering. We present a novel method for computing gradients at visibility discontinuities for rasterization-based differentiable renderers. Our method elegantly simplifies the traditionally complex problem through a carefully designed approximation strategy, allowing for a straightforward, effective, and performant solution. We introduce a novel concept of micro-edges, which allows us to treat the rasterized images as outcomes of a differentiable, continuous process aligned with the inherently non-differentiable, discrete-pixel rasterization. This technique eliminates the necessity for rendering approximations or other modifications to the forward pass, preserving the integrity of the rendered image, which makes it applicable to rasterized masks, depth, and normals images where filtering is prohibitive. Utilizing micro-edges simplifies gradient interpretation at discontinuities and enables handling of geometry intersections, offering an advantage over the prior art. We showcase our method in dynamic human head scene reconstruction, demonstrating effective handling of camera images and segmentation masks.

Submitted 16 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.18886 [pdf, other]

A Survey on Diffusion Models for Time Series and Spatio-Temporal Data

Authors: Yiyuan Yang, Ming Jin, Haomin Wen, Chaoli Zhang, Yuxuan Liang, Lintao Ma, Yi Wang, Chenghao Liu, Bin Yang, Zenglin Xu, Jiang Bian, Shirui Pan, Qingsong Wen

Abstract: The study of time series is crucial for understanding trends and anomalies over time, enabling predictive insights across various sectors. Spatio-temporal data, on the other hand, is vital for analyzing phenomena in both space and time, providing a dynamic perspective on complex system interactions. Recently, diffusion models have seen widespread application in time series and spatio-temporal data mining. Not only do they enhance the generative and inferential capabilities for sequential and temporal data, but they also extend to other downstream tasks. In this survey, we comprehensively and thoroughly review the use of diffusion models in time series and spatio-temporal data, categorizing them by model category, task type, data modality, and practical application domain. In detail, we categorize diffusion models into unconditioned and conditioned types and discuss time series and spatio-temporal data separately. Unconditioned models, which operate unsupervised, are subdivided into probability-based and score-based models, serving predictive and generative tasks such as forecasting, anomaly detection, classification, and imputation. Conditioned models, on the other hand, utilize extra information to enhance performance and are similarly divided for both predictive and generative tasks. Our survey extensively covers their application in various fields, including healthcare, recommendation, climate, energy, audio, and transportation, providing a foundational understanding of how these models analyze and generate data. Through this structured overview, we aim to provide researchers and practitioners with a comprehensive understanding of diffusion models for time series and spatio-temporal data analysis, aiming to direct future innovations and applications by addressing traditional challenges and exploring innovative solutions within the diffusion model framework.

Submitted 11 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: Ongoing work & Under review; 27 pages, 8 figures, 2 tables; Github Repo: https://github.com/yyysjz1997/Awesome-TimeSeries-SpatioTemporal-Diffusion-Model

arXiv:2404.18191 [pdf, other]

Exploring the Robustness of In-Context Learning with Noisy Labels

Authors: Chen Cheng, Xinzhi Yu, Haodong Wen, Jingsong Sun, Guanzhang Yue, Yihao Zhang, Zeming Wei

Abstract: Recently, the mysterious In-Context Learning (ICL) ability exhibited by Transformer architectures, especially in large language models (LLMs), has sparked significant research interest. However, the resilience of Transformers' in-context learning capabilities in the presence of noisy samples, prevalent in both training corpora and prompt demonstrations, remains underexplored. In this paper, inspired by prior research that studies ICL ability using simple function classes, we take a closer look at this problem by investigating the robustness of Transformers against noisy labels. Specifically, we first conduct a thorough evaluation and analysis of the robustness of Transformers against noisy labels during in-context learning and show that they exhibit notable resilience against diverse types of noise in demonstration labels. Furthermore, we delve deeper into this problem by exploring whether introducing noise into the training set, akin to a form of data augmentation, enhances such robustness during inference, and find that such noise can indeed improve the robustness of ICL. Overall, our fruitful analysis and findings provide a comprehensive understanding of the resilience of Transformer models against label noises during ICL and provide valuable insights into the research on Transformers in natural language processing. Our code is available at https://github.com/InezYu0928/in-context-learning.

Submitted 1 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

Comments: ICLR 2024 Workshop on Reliable and Responsible Foundation Models

arXiv:2404.15875 [pdf, other]

doi 10.1145/3626772.3657727

Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval

Authors: Haokun Wen, Xuemeng Song, Xiaolin Chen, Yinwei Wei, Liqiang Nie, Tat-Seng Chua

Abstract: Composed image retrieval (CIR) aims to retrieve the target image based on a multimodal query, i.e., a reference image paired with corresponding modification text. Recent CIR studies leverage vision-language pre-trained (VLP) methods as the feature extraction backbone, and perform nonlinear feature-level multimodal query fusion to retrieve the target image. Despite the promising performance, we argue that their nonlinear feature-level multimodal fusion may lead to the fused feature deviating from the original embedding space, potentially hurting the retrieval performance. To address this issue, in this work, we propose shifting the multimodal fusion from the feature level to the raw-data level to fully exploit the VLP model's multimodal encoding and cross-modal alignment abilities. In particular, we introduce a Dual Query Unification-based Composed Image Retrieval framework (DQU-CIR), whose backbone simply involves a VLP model's image encoder and a text encoder. Specifically, DQU-CIR first employs two training-free query unification components: text-oriented query unification and vision-oriented query unification, to derive a unified textual and visual query based on the raw data of the multimodal query, respectively. The unified textual query is derived by concatenating the modification text with the extracted reference image's textual description, while the unified visual query is created by writing the key modification words onto the reference image. Ultimately, to address diverse search intentions, DQU-CIR linearly combines the features of the two unified queries encoded by the VLP model to retrieve the target image. Extensive experiments on four real-world datasets validate the effectiveness of our proposed method.

Submitted 24 April, 2024; originally announced April 2024.

Comments: ACM SIGIR 2024

arXiv:2404.14941 [pdf, other]

Delayed Bottlenecking: Alleviating Forgetting in Pre-trained Graph Neural Networks

Authors: Zhe Zhao, Pengkun Wang, Xu Wang, Haibin Wen, Xiaolong Xie, Zhengyang Zhou, Qingfu Zhang, Yang Wang

Abstract: Pre-training GNNs to extract transferable knowledge and apply it to downstream tasks has become the de facto standard of graph representation learning. Recent works focused on designing self-supervised pre-training tasks to extract useful and universal transferable knowledge from large-scale unlabeled data. However, they have to face an inevitable question: traditional pre-training strategies that aim at extracting useful information about pre-training tasks may not extract all useful information about the downstream task. In this paper, we reexamine the pre-training process within traditional pre-training and fine-tuning frameworks from the perspective of Information Bottleneck (IB) and confirm that the forgetting phenomenon in the pre-training phase may cause detrimental effects on downstream tasks. Therefore, we propose a novel Delayed Bottlenecking Pre-training (DBP) framework, which maintains as much mutual information as possible between latent representations and training data during the pre-training phase by suppressing the compression operation, and delays the compression operation to the fine-tuning phase so that the compression can be guided by labeled fine-tuning data and downstream tasks. To achieve this, we design two information control objectives that can be directly optimized and further integrate them into the actual model design. Extensive experiments on both chemistry and biology domains demonstrate the effectiveness of DBP.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14182 [pdf]

Record high superconducting transition temperature in Ti$_{1-x}$Mn$_x$ alloy with rich magnetic element Mn

Authors: Ying-Jie Zhang, Yijie Zhu, Qing Li, Zhe-Ning Xiang, Tianheng Huang, Jian Sun, Hai-Hu Wen

Abstract: It is well-known that magnetic moments are very harmful to superconductivity. A typical example is the element Mn whose compounds usually exhibit strong magnetism. Thus, it is very hard to achieve superconductivity in materials containing Mn. Here, we report enhanced superconductivity with the superconducting transition temperature ($T_\text{c}$) up to a record high-value of about 26 K in a beta-phase Ti$_{1-x}$Mn$_x$ alloy containing rich magnetic element Mn under high pressures. This is contrary to the intuition that the magnetic moments always suppress superconductivity. Under high pressures, we also found that in the middle-pressure regime, the Pauli limit of the upper critical field is surpassed. The synchrotron X-ray diffraction data shows an unchanged beta-phase with a continuous contraction of the cell volume, which is well supported by the first-principles calculations. Although the theoretical results based on electron-phonon coupling (EPC) can interpret the $T_\text{c}$ value in a certain pressure region, the monotonic enhancement of superconductivity by pressure cannot seek support from the theory. Our results show a surprising enhancement of superconductivity in Ti$_{1-x}$Mn$_x$ alloy with a considerable Mn content.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 28 pages, 5 figures. Comments are welcome and appreciated

arXiv:2404.13405 [pdf]

Field-free switching of perpendicular magnetization by cooperation of planar Hall and orbital Hall effects

Authors: Zelalem Abebe Bekele, Yuan-Yuan Jiang, Kun Lei, Xiukai Lan, Xiangyu Liu, Hui Wen, Ding-Fu Shao, Kaiyou Wang

Abstract: Spin-orbit torques (SOTs) generated through the conventional spin Hall effect and/or Rashba-Edelstein effect are promising for manipulating magnetization. However, this approach typically exhibits non-deterministic and inefficient behaviour when it comes to switching perpendicular ferromagnets. This limitation posed a challenge for write-in operations in high-density magnetic memory devices. Here, we determine an effective solution to overcome this challenge by simultaneously leveraging both a planar Hall effect (PHE) and an orbital Hall effect (OHE). Using a representative Co/PtGd/Mo trilayer SOT device, we demonstrate that the PHE of Co is enhanced by the interfacial coupling of Co/PtGd, giving rise to a finite out-of-plane damping-like torque within the Co layer. Simultaneously, the OHE in Mo layer induces a strong out-of-plane orbital current, significantly amplifying the in-plane damping-like torque through orbital-to-spin conversion. While either the PHE or OHE alone proves insufficient for reversing the perpendicular magnetization of Co, their collaborative action enables high-efficiency field-free deterministic switching. Our work provides a straightforward strategy to realize high-speed and low-power spintronics.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 13 pages, 3 figures, submitted to Nat. Commun

arXiv:2404.12708 [pdf, ps, other]

Magnetic-field driven evolution of zero-energy mode on Bi islands deposited on Fe(Te,Se)

Authors: Kailun Chen, Chuanhao Wen, Zhiyong Hou, Huan Yang, Hai-Hu Wen

Abstract: We investigate the magnetic-field dependent evolution of the zero-bias conductance peaks (ZBCPs) on the nanoscale bismuth islands grown on the FeTe$_{0.55}$Se$_{0.45}$ substrate. The ZBCPs can be observed throughout the entire region on these islands, and their characteristics align with the signatures of Majorana zero modes. Remarkably, the evolution of ZBCPs on these islands exhibits anomalous behavior under varying magnetic fields: The magnitude of ZBCPs is first enhanced at weak fields lower than 2 T and then suppressed as the fields further increase. We attribute the non-monotonic evolution of the ZBCPs to the magnetic-field-enhanced topological edge states on these Bi islands. Our findings provide valuable insights into the probable origin of the Majorana zero modes in the Bi-island platform and the magnetic-field response of topological edge states.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 6 pages, 4 figures

arXiv:2404.11131 [pdf, ps, other]

doi 10.1103/PhysRevMaterials.8.014801

Local pairing versus bulk superconductivity intertwined by the charge density wave order in Cs(V$_{1-x}$Ta$_{x}$)$_{3}$Sb$_{5}$

Authors: Jinyulin Li, Qing Li, Jinjin Liu, Ying Xiang, Huan Yang, Zhiwei Wang, Yugui Yao, Hai-Hu Wen

Abstract: There is a common belief that superconductivity and charge density wave (CDW) order accommodate homogeneously in real space but compete with each other for the effective density of states in momentum space in CDW superconductors. By measuring resistivity along the $c$-axis in Cs(V$_{1-x}$Ta$_{x}$)$_{3}$Sb$_{5}$, we observe strong superconducting fluctuation behavior coexisting with the CDW order in the pristine CsV$_{3}$Sb$_{5}$, and the fluctuation region becomes narrowed when the Ta doping suppresses the CDW order. The onset transition temperature barely changes with the Ta doping. Therefore, the bulk superconductivity may be established by a doping-independent local pairing, and it can be suppressed in some regions by the spatially variable CDW order along the $c$-axis. Our results violate the above-mentioned belief about CDW superconductors and demonstrate the intricate interaction between superconductivity and CDW order in this kagome superconductor.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 9 pages, 5 figures

Journal ref: Phys. Rev. Materials 8, 014801 (2024)

arXiv:2404.11115 [pdf, ps, other]

doi 10.1103/PhysRevB.106.214529

Strong-coupling superconductivity and weak vortex pinning in Ta-doped CsV$_{3}$Sb$_{5}$ single crystals

Authors: Jinyulin Li, Wei Xie, Jinjin Liu, Qing Li, Xiang Li, Huan Yang, Zhiwei Wang, Yugui Yao, Hai-Hu Wen

Abstract: By measuring magnetizations of pristine and Ta-doped CsV$_{3}$Sb$_{5}$ single crystals, we have carried out systematic studies on the lower critical field, critical current density, and equilibrium magnetization of this kagome system. The lower critical field has been investigated in the two typical samples, and the temperature dependent lower critical field obtained in Ta-doped sample can be fitted by using the model with two $s$-wave superconducting gaps yielding the larger gap of $2\Delta_{s1}/k_\mathrm{B}T_\mathrm{c}=7.9\;(\pm1.8)$. This indicates a strong-coupling feature of the V-based superconductors. The measured magnetization hysteresis loops allow us to calculate the critical current density, which shows a very weak bulk vortex pinning. The magnetization hysteresis loops measured in these two kinds of samples can be well described by a recently proposed generalized phenomenological model, which leads to the determination of many fundamental parameters for these superconductors. Our systematic results and detailed analysis conclude that this V-based kagome system has features of strong-coupling superconductivity, relatively large Ginzburg-Landau parameter and weak vortex coupling.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 11 pages, 6 figures

Journal ref: Phys. Rev. B 106, 214529 (2022)

arXiv:2404.10490 [pdf, other]

Enhancing Sign Language Teaching: A Mixed Reality Approach for Immersive Learning and Multi-Dimensional Feedback

Authors: Hongli Wen, Yang Xu, Lin Li, Xudong Ru, Xingce Wang, Zhongke Wu

Abstract: Traditional sign language teaching methods face challenges such as limited feedback and diverse learning scenarios. Although 2D resources lack real-time feedback, classroom teaching is constrained by a scarcity of teachers. Methods based on VR and AR have relatively primitive interaction feedback mechanisms. This study proposes an innovative teaching model that uses real-time monocular vision and mixed reality technology. First, we introduce an improved hand-posture reconstruction method to achieve sign language semantic retention and real-time feedback. Second, a ternary system evaluation algorithm is proposed for a comprehensive assessment, maintaining good consistency with experts in sign language. Furthermore, we use mixed reality technology to construct a scenario-based 3D sign language classroom and explore the user experience of scenario teaching. Overall, this paper presents a novel teaching method that provides an immersive learning experience, advanced posture reconstruction, and precise feedback, achieving positive feedback on user experience and learning effectiveness.

Submitted 6 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 8 pages, 6 figures

arXiv:2404.10383 [pdf, other]

Learning to Score Sign Language with Two-stage Method

Authors: Hongli Wen, Yang Xu

Abstract: Human action recognition and performance assessment have been hot research topics in recent years. Recognition problems have mature solutions in the field of sign language, but past research in performance analysis has focused on competitive sports and medical training, overlooking the scoring assessment, which is an important part of sign language teaching digitalization. In this paper, we analyze the existing technologies for performance assessment and adopt methods that perform well in human pose reconstruction tasks combined with motion rotation embedded expressions, proposing a two-stage sign language performance evaluation pipeline. Our analysis shows that choosing reconstruction tasks in the first stage can provide more expressive features, and using smoothing methods can provide an effective reference for assessment. Experiments show that our method provides good score feedback mechanisms and high consistency with professional assessments compared to end-to-end evaluations.

Submitted 16 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 9 pages, 7 figures

arXiv:2404.10353 [pdf, other]

Rethinking the Graph Polynomial Filter via Positive and Negative Coupling Analysis

Authors: Haodong Wen, Bodong Du, Ruixun Liu, Deyu Meng, Xiangyong Cao

Abstract: Recently, the optimization of polynomial filters within Spectral Graph Neural Networks (GNNs) has emerged as a prominent research focus. Existing spectral GNNs mainly emphasize polynomial properties in filter design, introducing computational overhead and neglecting the integration of crucial graph structure information. We argue that incorporating graph information into basis construction can enhance understanding of polynomial basis, and further facilitate simplified polynomial filter design. Motivated by this, we first propose a Positive and Negative Coupling Analysis (PNCA) framework, where the concepts of positive and negative activation are defined and their respective and mixed effects are analysed. Then, we explore PNCA from the message propagation perspective, revealing the subtle information hidden in the activation process. Subsequently, PNCA is used to analyze the mainstream polynomial filters, and a novel simple basis that decouples the positive and negative activation and fully utilizes graph structure information is designed. Finally, a simple GNN (called GSCNet) is proposed based on the new basis. Experimental results on the benchmark datasets for node classification verify that our GSCNet obtains better or comparable results compared with existing state-of-the-art GNNs while demanding relatively less computational time.

Submitted 16 April, 2024; originally announced April 2024.

Comments: 13 pages, 8 figures, 6 tables

arXiv:2404.08964 [pdf, other]

Understanding Multimodal Deep Neural Networks: A Concept Selection View

Authors: Chenming Shang, Hengyuan Zhang, Hao Wen, Yujiu Yang

Abstract: The multimodal deep neural networks, represented by CLIP, have generated rich downstream applications owing to their excellent performance, thus making understanding the decision-making process of CLIP an essential research topic. Due to the complex structure and the massive pre-training data, it is often regarded as a black-box model that is too difficult to understand and interpret. Concept-based models map the black-box visual representations extracted by deep neural networks onto a set of human-understandable concepts and use the concepts to make predictions, enhancing the transparency of the decision-making process. However, these methods involve the datasets labeled with fine-grained attributes by expert knowledge, which incur high costs and introduce excessive human prior knowledge and bias. In this paper, we observe the long-tail distribution of concepts, based on which we propose a two-stage Concept Selection Model (CSM) to mine core concepts without introducing any human priors. The concept greedy rough selection algorithm is applied to extract head concepts, and then the concept mask fine selection method performs the extraction of core concepts. Experiments show that our approach achieves comparable performance to end-to-end black-box models, and human evaluation demonstrates that the concepts discovered by our method are interpretable and comprehensible for humans.

Submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.07960 [pdf, other]

Content Knowledge Identification with Multi-Agent Large Language Models (LLMs)

Authors: Kaiqi Yang, Yucheng Chu, Taylor Darwin, Ahreum Han, Hang Li, Hongzhi Wen, Yasemin Copur-Gencturk, Jiliang Tang, Hui Liu

Abstract: Teachers' mathematical content knowledge (CK) is of vital importance and need in teacher professional development (PD) programs. Computer-aided asynchronous PD systems are the most recently proposed PD techniques, which aim to help teachers improve their PD equally with fewer concerns about costs and limitations of time or location. However, current automatic CK identification methods, which serve as one of the core techniques of asynchronous PD systems, face challenges such as diversity of user responses, scarcity of high-quality annotated data, and low interpretability of the predictions. To tackle these challenges, we propose a Multi-Agent LLMs-based framework, LLMAgent-CK, to assess the user responses' coverage of identified CK learning goals without human annotations. By taking advantage of multi-agent LLMs in strong generalization ability and human-like discussions, our proposed LLMAgent-CK presents promising CK identifying performance on a real-world mathematical CK dataset MaCKT. Moreover, our case studies further demonstrate the working of the multi-agent framework.

Submitted 21 March, 2024; originally announced April 2024.

arXiv:2404.02495 [pdf, other]

On Covering Simplices by Dilations in Dimensions 3 and 4

Authors: Lei Song, Huanqi Wen, Zhixian Zhu

Abstract: We propose a conjecture regarding the integrally closedness of lattice polytopes with large lattice lengths. We demonstrate that a lattice simplex in dimension 3 (resp. 4) with lattice length of at least 2 (resp. 3 and no edge has lattice length 5) can be covered by dilated simplices of the form $sQ$, where integer $s\ge 2$ (resp. 3) and $Q$ is a lattice simplex. The covering property implies these simplices are integrally closed. As an application, we derive a simple criterion for the projective normality of ample line bundles on weighted projective spaces of dimension 3 (resp. 4). Along the way, we discover certain unexpected phenomenon.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Comments are welcome

arXiv:2403.18341 [pdf, other]

IterAlign: Iterative Constitutional Alignment of Large Language Models

Authors: Xiusi Chen, Hongzhi Wen, Sreyashi Nag, Chen Luo, Qingyu Yin, Ruirui Li, Zheng Li, Wei Wang

Abstract: With the rapid development of large language models (LLMs), aligning LLMs with human values and societal norms to ensure their reliability and safety has become crucial. Reinforcement learning with human feedback (RLHF) and Constitutional AI (CAI) have been proposed for LLM alignment. However, these methods require either heavy human annotations or explicitly pre-defined constitutions, which are labor-intensive and resource-consuming. To overcome these drawbacks, we study constitution-based LLM alignment and propose a data-driven constitution discovery and self-alignment framework called IterAlign. IterAlign leverages red teaming to unveil the weaknesses of an LLM and automatically discovers new constitutions using a stronger LLM. These constitutions are then used to guide self-correction of the base LLM. Such a constitution discovery pipeline can be run iteratively and automatically to discover new constitutions that specifically target the alignment gaps in the current LLM. Empirical results on several safety benchmark datasets and multiple base LLMs show that IterAlign successfully improves truthfulness, helpfulness, harmlessness and honesty, improving the LLM alignment by up to $13.5\%$ in harmlessness.

Submitted 27 March, 2024; originally announced March 2024.

Comments: NAACL 2024

arXiv:2403.14735 [pdf, other]

doi 10.1145/3637528.3671451

Foundation Models for Time Series Analysis: A Tutorial and Survey

Authors: Yuxuan Liang, Haomin Wen, Yuqi Nie, Yushan Jiang, Ming Jin, Dongjin Song, Shirui Pan, Qingsong Wen

Abstract: Time series analysis stands as a focal point within the data mining community, serving as a cornerstone for extracting valuable insights crucial to a myriad of real-world applications. Recent advances in Foundation Models (FMs) have fundamentally reshaped the paradigm of model design for time series analysis, boosting various downstream tasks in practice. These innovative approaches often leverage pre-trained or fine-tuned FMs to harness generalized knowledge tailored for time series analysis. This survey aims to furnish a comprehensive and up-to-date overview of FMs for time series analysis. While prior surveys have predominantly focused on either application or pipeline aspects of FMs in time series analysis, they have often lacked an in-depth understanding of the underlying mechanisms that elucidate why and how FMs benefit time series analysis. To address this gap, our survey adopts a methodology-centric classification, delineating various pivotal elements of time-series FMs, including model architectures, pre-training techniques, adaptation methods, and data modalities. Overall, this survey serves to consolidate the latest advancements in FMs pertinent to time series analysis, accentuating their theoretical underpinnings, recent strides in development, and avenues for future exploration.

Submitted 18 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'24)

arXiv:2403.14151 [pdf, other]

Deep Learning for Trajectory Data Management and Mining: A Survey and Beyond

Authors: Wei Chen, Yuxuan Liang, Yuanshao Zhu, Yanchuan Chang, Kang Luo, Haomin Wen, Lei Li, Yanwei Yu, Qingsong Wen, Chao Chen, Kai Zheng, Yunjun Gao, Xiaofang Zhou, Yu Zheng

Abstract: Trajectory computing is a pivotal domain encompassing trajectory data management and mining, garnering widespread attention due to its crucial role in various practical applications such as location services, urban traffic, and public safety. Traditional methods, focusing on simplistic spatio-temporal features, face challenges of complex calculations, limited scalability, and inadequate adaptability to real-world complexities. In this paper, we present a comprehensive review of the development and recent advances in deep learning for trajectory computing (DL4Traj). We first define trajectory data and provide a brief overview of widely-used deep learning models. Systematically, we explore deep learning applications in trajectory management (pre-processing, storage, analysis, and visualization) and mining (trajectory-related forecasting, trajectory-related recommendation, trajectory classification, travel time estimation, anomaly detection, and mobility generation). Notably, we encapsulate recent advancements in Large Language Models (LLMs) that hold the potential to augment trajectory computing. Additionally, we summarize application scenarios, public datasets, and toolkits. Finally, we outline current challenges in DL4Traj research and propose future directions. Relevant papers and open-source resources have been collated and are continuously updated at the DL4Traj Repo: https://github.com/yoshall/Awesome-Trajectory-Computing.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 25 pages, 12 figures, 5 tables

arXiv:2403.09733 [pdf, other]

OverleafCopilot: Empowering Academic Writing in Overleaf with Large Language Models

Authors: Haomin Wen, Zhenjie Wei, Yan Lin, Jiyuan Wang, Yuxuan Liang, Huaiyu Wan

Abstract: The rapid development of Large Language Models (LLMs) has facilitated a variety of applications from different domains. In this technical report, we explore the integration of LLMs and the popular academic writing tool, Overleaf, to enhance the efficiency and quality of academic writing. To achieve the above goal, there are three challenges: i) achieving seamless interaction between Overleaf and LLMs, ii) establishing reliable communication with the LLM provider, and iii) ensuring user privacy. To address these challenges, we present OverleafCopilot, the first-ever tool (i.e., a browser extension) that seamlessly integrates LLMs and Overleaf, enabling researchers to leverage the power of LLMs while writing papers. Specifically, we first propose an effective framework to bridge LLMs and Overleaf. Then, we developed PromptGenius, a website for researchers to easily find and share high-quality up-to-date prompts. Thirdly, we propose an agent command system to help researchers quickly build their customizable agents. OverleafCopilot (https://chromewebstore.google.com/detail/overleaf-copilot/eoadabdpninlhkkbhngoddfjianhlghb) has been released on the Chrome Extension Store and now serves thousands of researchers. Additionally, the code of PromptGenius is released at https://github.com/wenhaomin/ChatGPT-PromptGenius. We believe our work has the potential to revolutionize academic writing practices, empowering researchers to produce higher-quality papers in less time.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.03631 [pdf, other]

Tackling Missing Values in Probabilistic Wind Power Forecasting: A Generative Approach

Authors: Honglin Wen, Pierre Pinson, Jie Gu, Zhijian Jin

Abstract: Machine learning techniques have been successfully used in probabilistic wind power forecasting. However, the issue of missing values within datasets due to sensor failure, for instance, has been overlooked for a long time. Although it is natural to consider addressing this issue by imputing missing values before model estimation and forecasting, we suggest treating missing values and forecasting targets indifferently and predicting all unknown values simultaneously based on observations. In this paper, we offer an efficient probabilistic forecasting approach by estimating the joint distribution of features and targets based on a generative model. It is free of preprocessing, and thus avoids introducing potential errors. Compared with the traditional "impute, then predict" pipeline, the proposed approach achieves better performance in terms of continuous ranked probability score.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 8 pages, to be presented at Power Systems Computation Conference (PSCC) 2024

arXiv:2403.02914 [pdf, ps, other]

DynST: Dynamic Sparse Training for Resource-Constrained Spatio-Temporal Forecasting

Authors: Hao Wu, Haomin Wen, Guibin Zhang, Yutong Xia, Kai Wang, Yuxuan Liang, Yu Zheng, Kun Wang

Abstract: The ever-increasing sensor service, though opening a precious path and providing a deluge of earth system data for deep-learning-oriented earth science, sadly introduces a daunting obstacle to its industrial-level deployment. Concretely, earth science systems rely heavily on the extensive deployment of sensors; however, the data collection from sensors is constrained by complex geographical and social factors, making it challenging to achieve comprehensive coverage and uniform deployment. To alleviate the obstacle, traditional approaches to sensor deployment utilize specific algorithms to design and deploy sensors. These methods dynamically adjust the activation times of sensors to optimize the detection process across each sub-region. Regrettably, formulating an activation strategy is generally based on historical observations and geographic characteristics, which makes the methods and resultant models neither simple nor practical. Worse still, the complex technical design may ultimately lead to a model with weak generalizability. In this paper, we introduce for the first time the concept of spatio-temporal data dynamic sparse training and are committed to adaptively, dynamically filtering important sensor distributions. To our knowledge, this is the first proposal (termed DynST) of an industry-level deployment optimization concept at the data level. However, due to the existence of the temporal dimension, pruning of spatio-temporal data may lead to conflicts at different timestamps. To achieve this goal, we employ dynamic merge technology, along with ingenious dimensional mapping to mitigate potential impacts caused by the temporal aspect. During the training process, DynST utilizes iterative pruning and sparse training, repeatedly identifying and dynamically removing sensor perception areas that contribute the least to future predictions.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.19348 [pdf, other]

Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook

Authors: Xingchen Zou, Yibo Yan, Xixuan Hao, Yuehong Hu, Haomin Wen, Erdong Liu, Junbo Zhang, Yong Li, Tianrui Li, Yu Zheng, Yuxuan Liang

Abstract: As cities continue to burgeon, Urban Computing emerges as a pivotal discipline for sustainable development by harnessing the power of cross-domain data fusion from diverse sources (e.g., geographical, traffic, social media, and environmental data) and modalities (e.g., spatio-temporal, visual, and textual modalities). Recently, we are witnessing a rising trend that utilizes various deep-learning methods to facilitate cross-domain data fusion in smart cities. To this end, we propose the first survey that systematically reviews the latest advancements in deep learning-based data fusion methods tailored for urban computing. Specifically, we first delve into data perspective to comprehend the role of each modality and data source. Secondly, we classify the methodology into four primary categories: feature-based, alignment-based, contrast-based, and generation-based fusion methods. Thirdly, we further categorize multi-modal urban applications into seven types: urban planning, transportation, economy, public safety, society, environment, and energy. Compared with previous surveys, we focus more on the synergy of deep learning methods with urban computing applications. Furthermore, we shed light on the interplay between Large Language Models (LLMs) and urban computing, postulating future research directions that could revolutionize the field. We firmly believe that the taxonomy, progress, and prospects delineated in our survey stand poised to significantly enrich the research community. The summary of the comprehensive and up-to-date paper list can be found at https://github.com/yoshall/Awesome-Multimodal-Urban-Computing.

Submitted 16 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.12620 [pdf, other]

Are Large Language Models (LLMs) Good Social Predictors?

Authors: Kaiqi Yang, Hang Li, Hongzhi Wen, Tai-Quan Peng, Jiliang Tang, Hui Liu

Abstract: The prediction has served as a crucial scientific method in modern social studies. With the recent advancement of Large Language Models (LLMs), efforts have been made to leverage LLMs to predict the human features in social life, such as presidential voting. These works suggest that LLMs are capable of generating human-like responses. However, we find that the promising performance achieved by previous studies is because of the existence of input shortcut features to the response. In fact, by removing these shortcuts, the performance is reduced dramatically. To further revisit the ability of LLMs, we introduce a novel social prediction task, Soc-PRF Prediction, which utilizes general features as input and simulates real-world social study settings. With the comprehensive investigations on various LLMs, we reveal that LLMs cannot work as expected on social prediction when given general input features without shortcuts. We further investigate possible reasons for this phenomenon that suggest potential ways to enhance LLMs for social prediction.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.11634 [pdf]

Non-equilibrium pathways to emergent polar supertextures

Authors: Vladimir A. Stoica, Tiannan Yang, Sujit Das, Yue Cao, Huaiyu Wang, Yuya Kubota, Cheng Dai, Hari Padmanabhan, Yusuke Sato, Anudeep Mangu, Quynh L. Nguyen, Zhan Zhang, Disha Talreja, Marc E. Zajac, Donald A. Walko, Anthony D. DiChiara, Shigeki Owada, Kohei Miyanishi, Kenji Tamasaku, Takahiro Sato, James M. Glownia, Vincent Esposito, Silke Nelson, Matthias C. Hoffmann, Richard D. Schaller, et al. (9 additional authors not shown)

Abstract: Ultrafast stimuli can stabilize metastable states of matter inaccessible by equilibrium means. Establishing the spatiotemporal link between ultrafast excitation and metastability is crucial to understanding these phenomena. Here, we use single-shot optical-pump, X-ray-probe measurements to provide snapshots of the emergence of a persistent polar vortex supercrystal in a heterostructure that hosts a fine balance between built-in electrostatic and elastic frustrations by design. By perturbing this balance with photoinduced charges, a starting heterogenous mixture of polar phases disorders within a few picoseconds, resulting in a soup state composed of disordered ferroelectric and suppressed vortex orders. On the pico-to-nanosecond timescales, transient labyrinthine fluctuations form in this soup along with a recovering vortex order. On longer timescales, these fluctuations are progressively quenched by dynamical strain modulations, which drive the collective emergence of a single supercrystal phase. Our results, corroborated by dynamical phase-field modeling, reveal how ultrafast excitation of designer systems generates pathways for persistent metastability.

Submitted 18 February, 2024; originally announced February 2024.

Showing 1–50 of 739 results for author: Wen, H