-
Spectroastrometry and Reverberation Mapping (SARM) of Active Galactic Nuclei. I. The H$β$ Broad-line Region Structure and Black Hole Mass of Five Quasars
Authors:
Yan-Rong Li,
Chen Hu,
Zhu-Heng Yao,
Yong-Jie Chen,
Hua-Rui Bai,
Sen Yang,
Pu Du,
Feng-Na Fang,
Yi-Xin Fu,
Jun-Rong Liu,
Yue-Chang Peng,
Yu-Yang Songsheng,
Yi-Lin Wang,
Ming Xiao,
Shuo Zhai,
Hartmut Winkler,
Jin-Ming Bai,
Luis C. Ho,
Romain G. Petrov,
Jesus Aceituno,
Jian-Min Wang
Abstract:
We conduct a reverberation mapping (RM) campaign to spectroscopically monitor a sample of selected bright active galactic nuclei with large anticipated broad-line region (BLR) sizes adequate for spectroastrometric observations by the GRAVITY instrument on the Very Large Telescope Interferometer. We report the first results for five objects, IC 4329A, Mrk 335, Mrk 509, Mrk 1239, and PDS 456, among which Mrk 1239 and PDS 456 are for the first time spectroscopically monitored. We obtain multi-year monitoring data and perform multi-component spectral decomposition to extract the broad H$β$ profiles. We detect significant time lags between the H$β$ and continuum variations, generally obeying the previously established BLR size-luminosity relation. Velocity-resolved H$β$ time lags illustrate diverse, possibly evolving BLR kinematics. We further measure the H$β$ line widths from mean and rms spectra and the resulting virial products show good consistency among different seasons. Adopting a unity virial factor and the full width at half maximum of the broad H$β$ line from the mean spectrum as the measure of velocity, the obtained black hole mass averaged over seasons is $\log M_\bullet/M_\odot=8.02_{-0.14}^{+0.09}$, $6.92_{-0.12}^{+0.12}$, $8.01_{-0.25}^{+0.16}$, $7.44_{-0.14}^{+0.13}$, and $8.59_{-0.11}^{+0.07}$ for the five objects, respectively. The black hole mass estimations using other line width measures are also reported (up to the virial factors). For objects with previous RM campaigns, our mass estimates are in agreement with earlier results. In a companion paper, we will employ BLR dynamical modeling to directly infer the black hole mass and thereby determine the virial factors.
Submitted 10 July, 2024;
originally announced July 2024.
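The virial product quoted in the abstract above follows the standard reverberation-mapping relation $M_\bullet = f\,c\,\tau\,V^2/G$, where $\tau$ is the time lag, $V$ a line-width measure, and $f$ the virial factor. A minimal sketch of that arithmetic is given below; the function name and the input values are hypothetical and are not taken from the paper.

```python
import math

# Physical constants in SI units (rounded).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg
DAY = 86400.0      # seconds per day


def log_virial_mass(lag_days, velocity_kms, f=1.0):
    """Return log10(M / M_sun) from M = f * c * tau * V^2 / G."""
    r_blr = C * lag_days * DAY     # BLR size in metres (light-travel distance)
    v = velocity_kms * 1.0e3       # velocity in m/s
    mass_kg = f * r_blr * v ** 2 / G
    return math.log10(mass_kg / M_SUN)


# Hypothetical inputs for illustration only, not values from the paper.
example = log_virial_mass(lag_days=20.0, velocity_kms=3000.0)
print(round(example, 2))
```

With a unity virial factor, as adopted in the abstract, changing `f` later only shifts the result by `log10(f)`.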
-
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
Authors:
Jinliang Lu,
Ziliang Pang,
Min Xiao,
Yaochen Zhu,
Rui Xia,
Jiajun Zhang
Abstract:
The remarkable success of Large Language Models (LLMs) has ushered natural language processing (NLP) research into a new era. Despite their diverse capabilities, LLMs trained on different corpora exhibit varying strengths and weaknesses, leading to challenges in maximizing their overall efficiency and versatility. To address these challenges, recent studies have explored collaborative strategies for LLMs. This paper provides a comprehensive overview of this emerging research area, highlighting the motivation behind such collaborations. Specifically, we categorize collaborative strategies into three primary approaches: Merging, Ensemble, and Cooperation. Merging involves integrating multiple LLMs in the parameter space. Ensemble combines the outputs of various LLMs. Cooperation leverages different LLMs to give full play to their diverse capabilities for specific tasks. We provide in-depth introductions to these methods from different perspectives and discuss their potential applications. Additionally, we outline future research directions, hoping this work will catalyze further studies on LLM collaborations and pave the way for advanced NLP applications.
Submitted 8 July, 2024;
originally announced July 2024.
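The distinction the abstract above draws between Merging (combining models in parameter space) and Ensemble (combining model outputs) can be illustrated with a toy example. The sketch below uses two hypothetical one-neuron ReLU "models" rather than LLMs, and plain averaging as the combination rule; both choices are assumptions made for illustration and are not methods from the survey. Cooperation is not shown.

```python
def relu_model(w, b):
    """Return a toy model f(x) = max(0, w * x + b)."""
    return lambda x: max(0.0, w * x + b)


# Two hypothetical "models" with different parameters.
params_a = (1.0, 0.0)
params_b = (-1.0, 0.0)

# Merging: average the parameters, then evaluate a single model.
merged_params = tuple((pa + pb) / 2 for pa, pb in zip(params_a, params_b))
merged = relu_model(*merged_params)

# Ensemble: evaluate both models, then average their outputs.
model_a = relu_model(*params_a)
model_b = relu_model(*params_b)


def ensemble(x):
    return (model_a(x) + model_b(x)) / 2


print(merged(2.0), ensemble(2.0))  # prints: 0.0 1.0
```

Because the toy model is nonlinear, the two strategies give different answers for the same input, which is why they are treated as separate categories.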
-
CA-FedRC: Codebook Adaptation via Federated Reservoir Computing in 5G NR
Authors:
Ziqiang Ye,
Sikai Liao,
Yulan Gao,
Shu Fang,
Yue Xiao,
Ming Xiao,
Saviour Zammit
Abstract:
With the burgeoning deployment of fifth-generation new radio (5G NR) networks, the codebook plays a crucial role in enabling the base station (BS) to acquire the channel state information (CSI). Different 5G NR codebooks incur varying overheads and exhibit performance disparities under diverse channel conditions, necessitating codebook adaptation based on channel conditions to reduce feedback overhead while enhancing performance. However, existing methods for 5G NR codebook adaptation require significant overhead for model training and feedback or fall short in performance. To address these limitations, this letter introduces a federated reservoir computing framework designed for efficient codebook adaptation in computationally and feedback resource-constrained mobile devices. This framework utilizes a novel series of indicators as input training data, striking an effective balance between performance and feedback overhead. Compared to conventional models, the proposed codebook adaptation via federated reservoir computing (CA-FedRC) achieves rapid convergence and significant loss reduction in both speed and accuracy. Extensive simulations under various channel conditions demonstrate that our algorithm not only reduces resource consumption of users but also accurately identifies channel types, thereby optimizing the trade-off between spectrum efficiency, computational complexity, and feedback overhead.
Submitted 8 July, 2024;
originally announced July 2024.
-
MEEG and AT-DGNN: Advancing EEG Emotion Recognition with Music and Graph Learning
Authors:
Minghao Xiao,
Zhengxi Zhu,
Wenyu Wang,
Meixia Qu
Abstract:
Recent advances in neuroscience have elucidated the crucial role of coordinated brain region activities during cognitive tasks. To explore this complexity, we introduce the MEEG dataset, a comprehensive multi-modal music-induced electroencephalogram (EEG) dataset, and the Attention-based Temporal Learner with Dynamic Graph Neural Network (AT-DGNN), a novel framework for EEG-based emotion recognition. The MEEG dataset captures a wide range of emotional responses to music, enabling an in-depth analysis of brainwave patterns in musical contexts. The AT-DGNN combines an attention-based temporal learner with a dynamic graph neural network (DGNN) to accurately model the local and global graph dynamics of EEG data across varying brain network topology. Our evaluations show that AT-DGNN achieves superior performance, with an accuracy (ACC) of 83.06\% in arousal and 85.31\% in valence, outperforming state-of-the-art (SOTA) methods on the MEEG dataset. Comparative analyses with traditional datasets like DEAP highlight the effectiveness of our approach and underscore the potential of music as a powerful medium for emotion induction. This study not only advances our understanding of the brain's emotional processing, but also enhances the accuracy of emotion recognition technologies in brain-computer interfaces (BCI), leveraging both graph-based learning and the emotional impact of music. The source code and dataset are available at \textit{https://github.com/xmh1011/AT-DGNN}.
Submitted 7 July, 2024;
originally announced July 2024.
-
Prolonged Phase Segregation of Mixed-Halide Perovskite Nanocrystals in the Dark
Authors:
Xueying Ma,
Yuhui Ye,
Yang Xiao,
Shengnan Feng,
Chunfeng Zhang,
Keyu Xia,
Fengrui Hu,
Min Xiao,
Xiaoyong Wang
Abstract:
A critical issue hindering the potential applications of semiconductor mixed-halide perovskites is the phase segregation effect, wherein localized regions enriched with one type of halide anion form upon continuous photogeneration of excited-state charge carriers. These unexpected phases can remix in the dark under the entropic driving force, a process that has so far been studied exclusively after the mixed-halide perovskites have reached the final stage of complete phase segregation. Here we show that after the removal of laser excitation from a solid film of mixed-halide perovskite nanocrystals with partial phase segregation, the iodide- and bromide-rich regions can continue to grow in the dark for a prolonged period of several minutes. We propose that this dark phase segregation is sustained by the local electric fields associated with the surface-trapped charge carriers, whose slow dissipation out of the mixed-halide perovskite nanocrystals delays the onset of the reverse phase-remixing process.
Submitted 6 July, 2024;
originally announced July 2024.
-
Controlling quasi-parametric amplifications: From multiple PT-symmetry phase transitions to non-Hermitian sensing
Authors:
Xiaoxiong Wu,
Kai Bai,
Penghong Yu,
Zhaohui Dong,
Yanyan He,
Jingui Ma,
Vladislav V. Yakovlev,
Meng Xiao,
Xianfeng Chen,
Luqi Yuan
Abstract:
Quasi-parametric amplification (QPA) is a nonlinear interaction in which the idler wave is depleted through some loss mechanism. QPA plays an important role in signal amplification in ultrafast photonics and quantum light generation. The QPA process has a number of features characterized by the non-Hermitian parity-time ($\mathcal{PT}$) symmetry. In this report, we explore new interaction regimes and uncover multiple $\mathcal{PT}$-symmetry phase transitions in such a QPA process, where the transitions are particularly sensitive to external parameters. In particular, we demonstrate the feasibility of detecting $10^{-11}$ inhomogeneities of the doped absorber, which is an order of magnitude more sensitive than similar measurements performed in a linear absorption regime. In doing so, we reveal a family of $\mathcal{PT}$-symmetry phase transitions appearing in the QPA process and provide a novel nonlinear optical sensing mechanism for precise optical measurements.
Submitted 3 July, 2024;
originally announced July 2024.
-
NOEMA formIng Cluster survEy (NICE): Characterizing eight massive galaxy groups at $1.5 < z < 4$ in the COSMOS field
Authors:
Nikolaj B. Sillassen,
Shuowen Jin,
Georgios E. Magdis,
Emanuele Daddi,
Tao Wang,
Shiying Lu,
Hanwen Sun,
Vinod Arumugam,
Daizhong Liu,
Malte Brinch,
Chiara D'Eugenio,
Raphael Gobat,
Carlos Gómez-Guijarro,
Michael Rich,
Eva Schinnerer,
Veronica Strazzullo,
Qinghua Tan,
Francesco Valentino,
Yijun Wang,
Mengyuan Xiao,
Luwenjia Zhou,
David Blánquez-Sesé,
Zheng Cai,
Yanmei Chen,
Laure Ciesla
, et al. (19 additional authors not shown)
Abstract:
The NOEMA formIng Cluster survEy (NICE) is a large program targeting 69 massive galaxy group candidates at $z>2$ in six deep fields. We report spectroscopic confirmation of eight groups at $1.65\leq z\leq3.61$ in COSMOS. Homogeneously selected as significant overdensities of red IRAC sources with red Herschel colors, four groups are confirmed by CO and [CI] with NOEMA 3mm observations, three are confirmed with ALMA, and one is confirmed by H$α$ from Subaru/FMOS. We constructed the integrated FIR SEDs for the eight groups, obtaining total IR SFR $=260-1300~{\rm M_\odot}$~yr$^{-1}$. We adopted six methods to estimate the dark matter masses, including stellar mass to halo mass relations, overdensity with galaxy bias, and NFW profile fitting to radial stellar mass density. We found that the radial stellar mass densities are consistent with an NFW profile, supporting that they are collapsed structures hosted by a single dark matter halo. The best halo mass estimates are $\log(M_{\rm h}/{\rm M_\odot})=12.8-13.7$ with an uncertainty of 0.3 dex. From the halo mass estimates, we derive baryonic accretion rates ${\rm BAR}=(1-8)\times10^{3}\,{\rm M_{\odot}/yr}$ for this sample. We find a quasi-linear correlation between the integrated SFR/BAR and the theoretical halo mass limit for cold streams, $M_{\rm stream}/M_{\rm h}$, with ${\rm SFR/BAR}=10^{-0.46\pm0.22}\left({M_{\rm stream}/M_{\rm h}}\right)^{0.71\pm0.16}$ and a scatter of $0.40\,{\rm dex}$. Further, we compare halo masses and stellar masses with simulations, and find that all structures are consistent with being progenitors of $M_{\rm h}(z=0)>10^{14}\,{\rm M_{\odot}}$ galaxy clusters, and that the most massive central galaxies have stellar masses consistent with brightest cluster galaxy (BCG) progenitors in the TNG300 simulation. The results strongly suggest these structures are forming massive galaxy clusters via baryonic and dark matter accretion.
Submitted 5 July, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
Coded Cooperative Networks for Semi-Decentralized Federated Learning
Authors:
Shudi Weng,
Ming Xiao,
Mikael Skoglund
Abstract:
To enhance straggler resilience in federated learning (FL) systems, a semi-decentralized approach has been recently proposed, enabling collaboration between clients. Unlike the existing semi-decentralized schemes, which adaptively adjust the collaboration weight according to the network topology, this letter proposes a deterministic coded network that leverages wireless diversity for semi-decentralized FL without requiring prior information about the entire network. Furthermore, the theoretical analyses of the outage and the convergence rate of the proposed scheme are provided. Finally, the superiority of our proposed method over benchmark methods is demonstrated through comprehensive simulations.
Submitted 27 June, 2024;
originally announced June 2024.
-
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Authors:
Zirui Wang,
Mengzhou Xia,
Luxi He,
Howard Chen,
Yitao Liu,
Richard Zhu,
Kaiqu Liang,
Xindi Wu,
Haotian Liu,
Sadhika Malladi,
Alexis Chevalier,
Sanjeev Arora,
Danqi Chen
Abstract:
Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homogeneous charts with template-based questions, leading to an over-optimistic measure of progress. We demonstrate that although open-source models can appear to outperform strong proprietary models on these benchmarks, a simple stress test with slightly different charts or questions can deteriorate performance by up to 34.5%. In this work, we propose CharXiv, a comprehensive evaluation suite involving 2,323 natural, challenging, and diverse charts from arXiv papers. CharXiv includes two types of questions: 1) descriptive questions about examining basic chart elements and 2) reasoning questions that require synthesizing information across complex visual elements in the chart. To ensure quality, all charts and questions are handpicked, curated, and verified by human experts. Our results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary model (i.e., GPT-4o), which achieves 47.1% accuracy, and the strongest open-source model (i.e., InternVL Chat V1.5), which achieves 29.2%. All models lag far behind human performance of 80.5%, underscoring weaknesses in the chart understanding capabilities of existing MLLMs. We hope CharXiv facilitates future research on MLLM chart understanding by providing a more realistic and faithful measure of progress. Project page and leaderboard: https://charxiv.github.io/
Submitted 26 June, 2024;
originally announced June 2024.
-
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models
Authors:
Haonan Qiu,
Zhaoxi Chen,
Zhouxia Wang,
Yingqing He,
Menghan Xia,
Ziwei Liu
Abstract:
Diffusion models have demonstrated remarkable capability in video generation, which has sparked interest in introducing trajectory control into the generation process. While existing works mainly focus on training-based methods (e.g., conditional adapter), we argue that the diffusion model itself allows decent control over the generated content without requiring any training. In this study, we introduce a tuning-free framework to achieve trajectory-controllable video generation, by imposing guidance on both noise construction and attention computation. Specifically, 1) we first show several instructive phenomena and analyze how initial noises influence the motion trajectory of generated content. 2) Subsequently, we propose FreeTraj, a tuning-free approach that enables trajectory control by modifying noise sampling and attention mechanisms. 3) Furthermore, we extend FreeTraj to facilitate longer and larger video generation with controllable trajectories. Equipped with these designs, users have the flexibility to provide trajectories manually or opt for trajectories automatically generated by the LLM trajectory planner. Extensive experiments validate the efficacy of our approach in enhancing the trajectory controllability of video diffusion models.
Submitted 24 June, 2024;
originally announced June 2024.
-
Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter
Authors:
M. Aamir,
B. Acar,
G. Adamov,
T. Adams,
C. Adloff,
S. Afanasiev,
C. Agrawal,
C. Agrawal,
A. Ahmad,
H. A. Ahmed,
S. Akbar,
N. Akchurin,
B. Akgul,
B. Akgun,
R. O. Akpinar,
E. Aktas,
A. AlKadhim,
V. Alexakhin,
J. Alimena,
J. Alison,
A. Alpana,
W. Alshehri,
P. Alvarez Dominguez,
M. Alyari,
C. Amendola
, et al. (550 additional authors not shown)
Abstract:
A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles read out by SiPMs, and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadronic section. The shower reconstruction method is based on graph neural networks and makes use of a dynamic reduction network architecture. It is shown that the algorithm is able to capture and mitigate the main effects that normally hinder the reconstruction of hadronic showers using classical reconstruction methods, by compensating for fluctuations in the multiplicity, energy, and spatial distributions of the shower's constituents. The performance of the algorithm is evaluated using test beam data collected in 2018 with a prototype of the CMS HGCAL accompanied by a section of the CALICE AHCAL prototype. The capability of the method to mitigate the impact of energy leakage from the calorimeter is also demonstrated.
Submitted 30 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Extracting $α_\mathrm{S}$ at future $e^+e^{-}$ Higgs factory with energy correlators
Authors:
Zhen Lin,
Manqi Ruan,
Meng Xiao,
Zhen Xu
Abstract:
The projected sensitivity of an $α_\mathrm{S}$ determination using an event-shape observable, the ratio of energy correlators, at a future electron-positron collider is presented. The study focuses on the collinear region, which has suffered from large theoretical and hadronization uncertainties in the past. The ratio effectively reduces the impact of these uncertainties. With the amount of data that a future electron-positron collider could produce in 1 minute (40 $\text{pb}^{-1}$) and 0.5 hour (1 $\text{fb}^{-1}$), a 1% and 0.2% precision on $α_\mathrm{S}$ could be reached, respectively.
Submitted 16 June, 2024;
originally announced June 2024.
-
Solving Co-Path/Cycle Packing Faster than $3^k$
Authors:
Yuxi Liu,
Mingyu Xiao
Abstract:
The \textsc{Co-Path/Cycle Packing} problem asks whether we can delete at most $k$ vertices from the input graph such that the remaining graph is a collection of induced paths and cycles. \textsc{Co-Path/Cycle Packing} is a fundamental graph problem that has important applications in bioinformatics. Although this problem has been extensively studied in parameterized algorithms, it seems hard to break the running time bound $3^k$. In 2015, Feng et al. provided an $O^*(3^k)$-time randomized algorithm. Recently, Tsur showed that this problem can be solved in $O^*(3^k)$ time deterministically. In this paper, by combining several techniques such as path decomposition, dynamic programming, and branch-and-search methods, we show that \textsc{Co-Path/Cycle Packing} can be solved in $O^*(2.8192^k)$ time. As a by-product, we also show that the \textsc{$d$-Bounded-Degree Vertex Deletion} problem, a generalization of \textsc{Co-Path/Cycle Packing}, can be solved in $O^*((d + 2)^p)$ time if a path decomposition of width $p$ is given, which implies that \textsc{$d$-Bounded-Degree Vertex Deletion} is FPT with parameter $p+d$.
Submitted 16 June, 2024;
originally announced June 2024.
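The problem statement in the abstract above can be made concrete with a small checker. A graph is a collection of paths and cycles exactly when every vertex has degree at most 2, so the sketch below tries every deletion set of size at most $k$ and tests that condition. This is a brute-force illustration only, not the $O^*(2.8192^k)$ algorithm of the paper; the function names and the example graph are hypothetical.

```python
from itertools import combinations


def is_paths_and_cycles(adj, removed):
    """True if every remaining vertex has at most two remaining neighbours."""
    return all(
        len(adj[v] - removed) <= 2
        for v in adj
        if v not in removed
    )


def co_path_cycle_packing(adj, k):
    """Brute force: try every set of at most k vertices to delete."""
    vertices = list(adj)
    for size in range(k + 1):
        for subset in combinations(vertices, size):
            if is_paths_and_cycles(adj, set(subset)):
                return True
    return False


# Hypothetical input: a star whose centre (vertex 0) has four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(co_path_cycle_packing(star, 0), co_path_cycle_packing(star, 1))
```

Replacing the bound 2 with a parameter $d$ gives the same brute-force check for \textsc{$d$-Bounded-Degree Vertex Deletion}, the generalization mentioned in the abstract.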
-
Photonic realization of chiral hinge states in a Chern-insulator stack
Authors:
Han-Rong Xia,
Jia-Zheng Li,
Si-Yu Yuan,
Meng Xiao
Abstract:
Higher-order topological insulators, as a novel family of topological phases, are a hot frontier in condensed matter physics due to their adherence to unconventional bulk-boundary correspondence. A three-dimensional second-order topological insulator can support one-dimensional modes along its hinges (dubbed as hinge states). Here, we present a simple and direct method to construct chiral hinge modes based on a Chern-insulator stack. We analyze the existence of the hinge modes through the nontrivial quadrupole indices, and then design a photonic crystal to realize the specific flowing pattern of the hinge mode in our model. The experimental results align well with full-wave simulations, clearly demonstrating the existence of chiral hinge states. We also verify the robustness of these hinge states against defects in our photonic system.
Submitted 15 June, 2024;
originally announced June 2024.
-
Discovery of a new N-emitter in the epoch of reionization
Authors:
D. Schaerer,
R. Marques-Chaves,
M. Xiao,
D. Korber
Abstract:
We report the discovery of a compact star-forming galaxy at $z=9.380$ in the GOODS-North field (named GN-z9p4) which shows numerous strong UV-optical emission lines and a single UV line, NIV] 1486. This makes GN-z9p4 the third-highest redshift N-emitter known to date. We determine the nebular abundances of H, C, N, O and Ne, size, and other physical properties of this object, and compare them to those of the other N-emitters known so far and to other star-forming galaxies. Using the direct method we find a metallicity 12+log(O/H)$=7.37 \pm 0.15$, one of the lowest among the N-emitters. The N/O abundance ratio is highly super-solar, and C/O and Ne/O are normal compared to other galaxies at low metallicity. We show that the compactness of GN-z9p4 (with effective radius $118\pm16$ pc at 2 micron) and other N-emitters translates into very high stellar mass and SFR surface densities, which could be a criterion to identify other N-emitters. Future studies and larger samples are needed to understand these recently discovered, rare, and enigmatic objects.
Submitted 12 June, 2024;
originally announced June 2024.
-
2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction
Authors:
Tianqi Chen,
Jun Hou,
Yinchi Zhou,
Huidong Xie,
Xiongchao Chen,
Qiong Liu,
Xueqi Guo,
Menghua Xia,
James S. Duncan,
Chi Liu,
Bo Zhou
Abstract:
Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET with high noise and bias. Thus, it is desirable to develop 3D methods to translate the non-attenuation-corrected low-dose PET (NAC-LDPET) into attenuation-corrected standard-dose PET (AC-SDPET). Recently, diffusion models have emerged as a new state-of-the-art deep learning method for image-to-image translation, better than traditional CNN-based methods. However, due to their high computation cost and memory burden, they are largely limited to 2D applications. To address these challenges, we developed a novel 2.5D Multi-view Averaging Diffusion Model (MADM) for 3D image-to-image translation with application to NAC-LDPET to AC-SDPET translation. Specifically, MADM employs separate diffusion models for axial, coronal, and sagittal views, whose outputs are averaged in each sampling step to ensure the 3D generation quality from multiple views. To accelerate the 3D sampling process, we also proposed a strategy to use the CNN-based 3D generation as a prior for the diffusion model. Our experimental results on human patient studies suggested that MADM can generate high-quality 3D translation images, outperforming previous CNN-based and diffusion-based baseline methods.
Submitted 15 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Enhanced Gene Selection in Single-Cell Genomics: Pre-Filtering Synergy and Reinforced Optimization
Authors:
Weiliang Zhang,
Zhen Meng,
Dongjie Wang,
Min Wu,
Kunpeng Liu,
Yuanchun Zhou,
Meng Xiao
Abstract:
Recent advancements in single-cell genomics necessitate precision in gene panel selection to interpret complex biological data effectively. Those methods aim to streamline the analysis of scRNA-seq data by focusing on the most informative genes that contribute significantly to the specific analysis task. Traditional selection methods, which often rely on expert domain knowledge, embedded machine learning models, or heuristic-based iterative optimization, are prone to biases and inefficiencies that may obscure critical genomic signals. Recognizing the limitations of traditional methods, we aim to transcend these constraints with a refined strategy. In this study, we introduce an iterative gene panel selection strategy that is applicable to clustering tasks in single-cell genomics. Our method uniquely integrates results from other gene selection algorithms, providing valuable preliminary boundaries or prior knowledge as initial guides in the search space to enhance the efficiency of our framework. Furthermore, we incorporate the stochastic nature of the exploration process in reinforcement learning (RL) and its capability for continuous optimization through reward-based feedback. This combination mitigates the biases inherent in the initial boundaries and harnesses RL's adaptability to refine and target gene panel selection dynamically. To illustrate the effectiveness of our method, we conducted detailed comparative experiments, case studies, and visualization analysis.
Submitted 11 June, 2024;
originally announced June 2024.
-
Holistic Memory Diversification for Incremental Learning in Growing Graphs
Authors:
Ziyue Qiao,
Junren Xiao,
Qingqiang Sun,
Meng Xiao,
Hui Xiong
Abstract:
This paper addresses the challenge of incremental learning in growing graphs with increasingly complex tasks. The goal is to continually train a graph model to handle new tasks while retaining its inference ability on previous tasks. Existing methods usually neglect the importance of memory diversity, which limits their ability to select high-quality memory from previous tasks and to retain broad previous knowledge within the scarce memory on graphs. To address this, we introduce a novel holistic Diversified Memory Selection and Generation (DMSG) framework for incremental learning in graphs, which first introduces a buffer selection strategy that considers both intra-class and inter-class diversities, employing an efficient greedy algorithm for sampling representative training nodes from graphs into memory buffers after learning each new task. Then, to adequately rememorize the knowledge preserved in the memory buffer when learning new tasks, we propose a diversified memory generation replay method. This method first utilizes a variational layer to generate the distribution of buffer node embeddings and sample synthesized ones for replaying. Furthermore, an adversarial variational embedding learning method and a reconstruction-based decoder are proposed to maintain the integrity and consolidate the generalization of the synthesized node embeddings, respectively. Finally, we evaluate our model on node classification tasks involving increasing class numbers. Extensive experimental results on publicly accessible datasets demonstrate the superiority of DMSG over state-of-the-art methods.
Submitted 11 June, 2024;
originally announced June 2024.
-
Enhancing Tabular Data Optimization with a Flexible Graph-based Reinforced Exploration Strategy
Authors:
Xiaohan Huang,
Dongjie Wang,
Zhiyuan Ning,
Ziyue Qiao,
Qingqing Long,
Haowei Zhu,
Min Wu,
Yuanchun Zhou,
Meng Xiao
Abstract:
Tabular data optimization methods aim to automatically find an optimal feature transformation process that generates high-value features and improves the performance of downstream machine learning tasks. Current frameworks for automated feature transformation rely on iterative sequence generation tasks, optimizing decision strategies through performance feedback from downstream tasks. However, these approaches fail to effectively utilize historical decision-making experiences and overlook potential relationships among generated features, thus limiting the depth of knowledge extraction. Moreover, the granularity of the decision-making process lacks dynamic backtracking capabilities for individual features, leading to insufficient adaptability when encountering inefficient pathways, adversely affecting overall robustness and exploration efficiency. To address the limitations observed in current automatic feature engineering frameworks, we introduce a novel method that utilizes a feature-state transformation graph to effectively preserve the entire feature transformation journey, where each node represents a specific transformation state. During exploration, three cascading agents iteratively select nodes and ideal mathematical operations to generate new transformation states. This strategy leverages the inherent properties of the graph structure, allowing for the preservation and reuse of valuable transformations. It also enables backtracking capabilities through graph pruning techniques, which can rectify inefficient transformation paths. To validate the efficacy and flexibility of our approach, we conducted comprehensive experiments and detailed case studies, demonstrating superior performance in diverse scenarios.
Submitted 11 June, 2024;
originally announced June 2024.
-
A local squared Wasserstein-2 method for efficient reconstruction of models with uncertainty
Authors:
Mingtao Xia,
Qijing Shen
Abstract:
In this paper, we propose a local squared Wasserstein-2 (W_2) method to solve the inverse problem of reconstructing models with uncertain latent variables or parameters. A key advantage of our approach is that it does not require prior information on the distribution of the latent variables or parameters in the underlying models. Instead, our method can efficiently reconstruct the distributions of the output associated with different inputs based on empirical distributions of observation data. We demonstrate the effectiveness of our proposed method across several uncertainty quantification (UQ) tasks, including linear regression with coefficient uncertainty, training neural networks with weight uncertainty, and reconstructing ordinary differential equations (ODEs) with a latent random variable.
Submitted 10 June, 2024;
originally announced June 2024.
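For readers unfamiliar with the quantity named in the entry above, the following is a minimal sketch of the squared Wasserstein-2 distance between two one-dimensional empirical distributions with equal sample counts. It is not the authors' local method; it only shows the standard one-dimensional case, where the optimal coupling pairs sorted samples. The function name `squared_w2_1d` is hypothetical.

```python
def squared_w2_1d(xs, ys):
    """Squared W_2 distance between two 1D empirical distributions.

    Assumes both samples have the same size and uniform weights, so the
    optimal transport plan simply matches the i-th smallest value of xs
    with the i-th smallest value of ys.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    # Mean squared displacement under the sorted (monotone) matching.
    total = sum((a - b) ** 2 for a, b in zip(xs_sorted, ys_sorted))
    return total / len(xs)
```

Because the matching is done on sorted values, the result does not depend on the order in which the observations were recorded.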
-
Element-wise Multiplication Based Physics-informed Neural Networks
Authors:
Feilong Jiang,
Xiaonan Hou,
Min Xia
Abstract:
As a promising framework for resolving partial differential equations (PDEs), physics-informed neural networks (PINNs) have received widespread attention from industrial and scientific fields. However, lack of expressive ability and initialization pathology issues are found to prevent the application of PINNs in complex PDEs. In this work, we propose Element-wise Multiplication Based Physics-informed Neural Networks (EM-PINNs) to resolve these issues. The element-wise multiplication operation is adopted to transform features into high-dimensional, non-linear spaces, which effectively enhances the expressive capability of PINNs. Benefiting from the element-wise multiplication operation, EM-PINNs can eliminate the initialization pathologies of PINNs. The proposed structure is verified on various benchmarks. The results show that EM-PINNs have strong expressive ability.
Submitted 16 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
An efficient Wasserstein-distance approach for reconstructing jump-diffusion processes using parameterized neural networks
Authors:
Mingtao Xia,
Xiangting Li,
Qijing Shen,
Tom Chou
Abstract:
We analyze the Wasserstein distance ($W$-distance) between two probability distributions associated with two multidimensional jump-diffusion processes. Specifically, we analyze a temporally decoupled squared $W_2$-distance, which provides both upper and lower bounds associated with the discrepancies in the drift, diffusion, and jump amplitude functions between the two jump-diffusion processes. Then, we propose a temporally decoupled squared $W_2$-distance method for efficiently reconstructing unknown jump-diffusion processes from data using parameterized neural networks. We further show its performance can be enhanced by utilizing prior information on the drift function of the jump-diffusion process. The effectiveness of our proposed reconstruction method is demonstrated across several examples and applications.
Submitted 3 June, 2024;
originally announced June 2024.
-
Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng
, et al. (177 additional authors not shown)
Abstract:
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overline{ν}_{e}$ rates and energy spectra variation among the near and far detectors give $\sin^2 2θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\sin^2 2θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision over the Daya Bay full 3158-day capture-on-gadolinium result.
Submitted 3 June, 2024;
originally announced June 2024.
-
Prospect of measuring the top quark mass through energy correlators
Authors:
Meng Xiao,
Yulei Ye,
Xinyu Zhu
Abstract:
Reaching a high precision of the top quark mass is an important task of the Large Hadron Collider. We perform a feasibility study of measuring the top quark mass through the three-point energy correlator. The expected sensitivity of the top quark mass in the boosted regime is presented. We further introduce its application to the low top $p_\text{T}$ regime and demonstrate that both the W boson and the top quark masses could be extracted from this single observable. Compared to traditional observables, the energy correlator shows robustness to uncertainties that usually dominate experimental measurements and provides a promising way to improve experimental precision.
Submitted 30 May, 2024;
originally announced May 2024.
-
ToonCrafter: Generative Cartoon Interpolation
Authors:
Jinbo Xing,
Hanyuan Liu,
Menghan Xia,
Yong Zhang,
Xintao Wang,
Ying Shan,
Tien-Tsin Wong
Abstract:
We introduce ToonCrafter, a novel approach that transcends traditional correspondence-based cartoon video interpolation, paving the way for generative interpolation. Traditional methods, that implicitly assume linear motion and the absence of complicated phenomena like dis-occlusion, often struggle with the exaggerated non-linear and large motions with occlusion commonly found in cartoons, resulting in implausible or even failed interpolation results. To overcome these limitations, we explore the potential of adapting live-action video priors to better suit cartoon interpolation within a generative framework. ToonCrafter effectively addresses the challenges faced when applying live-action video motion priors to generative cartoon interpolation. First, we design a toon rectification learning strategy that seamlessly adapts live-action video priors to the cartoon domain, resolving the domain gap and content leakage issues. Next, we introduce a dual-reference-based 3D decoder to compensate for lost details due to the highly compressed latent prior spaces, ensuring the preservation of fine details in interpolation results. Finally, we design a flexible sketch encoder that empowers users with interactive control over the interpolation results. Experimental results demonstrate that our proposed method not only produces visually convincing and more natural dynamics, but also effectively handles dis-occlusion. The comparative evaluation demonstrates the notable superiority of our approach over existing competitors.
Submitted 28 May, 2024;
originally announced May 2024.
-
Temporal Spiking Neural Networks with Synaptic Delay for Graph Reasoning
Authors:
Mingqing Xiao,
Yixin Zhu,
Di He,
Zhouchen Lin
Abstract:
Spiking neural networks (SNNs) are investigated as biologically inspired models of neural computation, distinguished by their computational capability and energy efficiency due to precise spiking times and sparse spikes with event-driven computation. A significant question is how SNNs can emulate human-like graph-based reasoning of concepts and relations, especially leveraging the temporal domain optimally. This paper reveals that SNNs, when amalgamated with synaptic delay and temporal coding, are proficient in executing (knowledge) graph reasoning. It is elucidated that spiking time can function as an additional dimension to encode relation properties via a neural-generalized path formulation. Empirical results highlight the efficacy of temporal delay in relation processing and showcase exemplary performance in diverse graph reasoning tasks. The spiking model is theoretically estimated to achieve $20\times$ energy savings compared to non-spiking counterparts, deepening insights into the capabilities and potential of biologically inspired SNNs for efficient reasoning. The code is available at https://github.com/pkuxmq/GRSNN.
Submitted 27 May, 2024;
originally announced May 2024.
-
Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence
Authors:
Minheng Xiao,
Xian Yu,
Lei Ying
Abstract:
Risk-sensitive reinforcement learning (RL) is crucial for maintaining reliable performance in many high-stakes applications. While most RL methods aim to learn a point estimate of the random cumulative cost, distributional RL (DRL) seeks to estimate the entire distribution of it. The distribution provides all necessary information about the cost and leads to a unified framework for handling various risk measures in a risk-sensitive setting. However, developing policy gradient methods for risk-sensitive DRL is inherently more complex as it pertains to finding the gradient of a probability measure. This paper introduces a policy gradient method for risk-sensitive DRL with general coherent risk measures, where we provide an analytical form of the probability measure's gradient. We further prove the local convergence of the proposed algorithm under mild smoothness assumptions. For practical use, we also design a categorical distributional policy gradient algorithm (CDPG) based on categorical distributional policy evaluation and trajectory-based gradient estimation. Through experiments on a stochastic cliff-walking environment, we illustrate the benefits of considering a risk-sensitive setting in DRL.
Submitted 23 May, 2024;
originally announced May 2024.
-
SimPO: Simple Preference Optimization with a Reference-Free Reward
Authors:
Yu Meng,
Mengzhou Xia,
Danqi Chen
Abstract:
Direct Preference Optimization (DPO) is a widely used offline preference optimization algorithm that reparameterizes reward functions in reinforcement learning from human feedback (RLHF) to enhance simplicity and training stability. In this work, we propose SimPO, a simpler yet more effective approach. The effectiveness of SimPO is attributed to a key design: using the average log probability of a sequence as the implicit reward. This reward formulation better aligns with model generation and eliminates the need for a reference model, making it more compute and memory efficient. Additionally, we introduce a target reward margin to the Bradley-Terry objective to encourage a larger margin between the winning and losing responses, further enhancing the algorithm's performance. We compare SimPO to DPO and its latest variants across various state-of-the-art training setups, including both base and instruction-tuned models like Mistral and Llama3. We evaluated on extensive instruction-following benchmarks, including AlpacaEval 2, MT-Bench, and the recent challenging Arena-Hard benchmark. Our results demonstrate that SimPO consistently and significantly outperforms existing approaches without substantially increasing response length. Specifically, SimPO outperforms DPO by up to 6.4 points on AlpacaEval 2 and by up to 7.5 points on Arena-Hard. Our top-performing model, built on Llama3-8B-Instruct, achieves a remarkable 53.7 length-controlled win rate on AlpacaEval 2 -- surpassing Claude 3 Opus on the leaderboard, and a 36.5 win rate on Arena-Hard -- making it the strongest 8B open-source model.
Submitted 8 July, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
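The SimPO entry above describes two ingredients: a length-normalized (average) log probability used as the implicit reward, and a target margin added to the Bradley-Terry objective. A minimal sketch of a per-pair loss built from those two ingredients follows. It assumes per-token log probabilities of each response under the policy are already available. The names `avg_logprob`, `simpo_loss`, `beta`, and `gamma`, and the default values, are illustrative assumptions and are not taken from this listing.

```python
import math


def avg_logprob(token_logprobs):
    """Average per-token log probability of one response (implicit reward before scaling)."""
    return sum(token_logprobs) / len(token_logprobs)


def simpo_loss(chosen_logprobs, rejected_logprobs, beta=2.0, gamma=1.0):
    """Bradley-Terry style loss with a target reward margin.

    beta scales the average log probabilities into rewards (assumed name);
    gamma is the margin the winning response should exceed the losing one by
    (assumed name). No reference model is involved.
    """
    margin = (
        beta * avg_logprob(chosen_logprobs)
        - beta * avg_logprob(rejected_logprobs)
        - gamma
    )
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

Dividing by the response length means a longer response does not receive a lower reward just because it has more tokens, which is consistent with the abstract's remark about not substantially increasing response length.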
-
Dose-aware Diffusion Model for 3D Low-dose PET: Multi-institutional Validation with Reader Study and Real Low-dose Data
Authors:
Huidong Xie,
Weijie Gan,
Bo Zhou,
Ming-Kai Chen,
Michal Kulon,
Annemarie Boustani,
Benjamin A. Spencer,
Reimund Bayerlein,
Xiongchao Chen,
Qiong Liu,
Xueqi Guo,
Menghua Xia,
Yinchi Zhou,
Hui Liu,
Liang Guo,
Hongyu An,
Ulugbek S. Kamilov,
Hanzhong Wang,
Biao Li,
Axel Rominger,
Kuangyu Shi,
Ge Wang,
Ramsey D. Badawi,
Chi Liu
Abstract:
As PET imaging is accompanied by radiation exposure and potentially increased cancer risk, reducing radiation dose in PET scans without compromising the image quality is an important topic. Deep learning (DL) techniques have been investigated for low-dose PET imaging. However, existing models have often resulted in compromised image quality when achieving low-dose PET and have limited generalizability to different image noise levels, acquisition protocols, patient populations, and hospitals. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for medical imaging tasks. However, for low-dose PET imaging, existing diffusion models failed to generate consistent 3D reconstructions, were unable to generalize across varying noise levels, often produced visually appealing but distorted image details, and produced images with biased tracer uptake. Here, we develop DDPET-3D, a dose-aware diffusion model for 3D low-dose PET imaging to address these challenges. Using data collected from 4 medical centers globally with different scanners and clinical protocols, we extensively evaluated the proposed model on a total of 9,783 18F-FDG studies (1,596 patients) with low-dose/low-count levels ranging from 1% to 50%. With a cross-center, cross-scanner validation, the proposed DDPET-3D demonstrated its potential to generalize to different low-dose levels, different scanners, and different clinical protocols. As confirmed with reader studies performed by nuclear medicine physicians, the proposed method produced superior denoised results that are comparable to or even better than the 100% full-count images as well as previous DL baselines. The presented results show the potential of achieving low-dose PET while maintaining image quality. Lastly, a group of real low-dose scans was also included for evaluation.
Submitted 2 May, 2024;
originally announced May 2024.
-
Spatio-temporal Attention-based Hidden Physics-informed Neural Network for Remaining Useful Life Prediction
Authors:
Feilong Jiang,
Xiaonan Hou,
Min Xia
Abstract:
Predicting the Remaining Useful Life (RUL) is essential in Prognostic Health Management (PHM) for industrial systems. Although deep learning approaches have achieved considerable success in predicting RUL, issues such as low prediction accuracy and limited interpretability remain significant challenges, hindering their practical implementation. In this work, we introduce a Spatio-temporal Attention-based Hidden Physics-informed Neural Network (STA-HPINN) for RUL prediction, which can utilize the associated physics of the system degradation. The spatio-temporal attention mechanism can extract important features from the input data. With the self-attention mechanism on both the sensor dimension and time step dimension, the proposed model can effectively extract degradation information. The hidden physics-informed neural network is utilized to capture the physics mechanisms that govern the evolution of RUL. With the constraint of physics, the model can achieve higher accuracy and reasonable predictions. The approach is validated on a benchmark dataset, demonstrating exceptional performance when compared to cutting-edge methods, especially in the case of complex conditions.
Submitted 20 May, 2024;
originally announced May 2024.
-
Revisiting the Robust Generalization of Adversarial Prompt Tuning
Authors:
Fan Yang,
Mingxuan Xia,
Sangzhou Xia,
Chicheng Ma,
Hui Hui
Abstract:
Understanding the vulnerability of large-scale pre-trained vision-language models like CLIP against adversarial attacks is key to ensuring zero-shot generalization capacity on various downstream tasks. State-of-the-art defense mechanisms generally adopt prompt learning strategies for adversarial fine-tuning to improve the adversarial robustness of the pre-trained model while keeping the efficiency of adapting to downstream tasks. Such a setup leads to the problem of over-fitting, which impedes further improvement of the model's generalization capacity on both clean and adversarial examples. In this work, we propose an adaptive Consistency-guided Adversarial Prompt Tuning (i.e., CAPT) framework that utilizes multi-modal prompt learning to enhance the alignment of image and text features for adversarial examples and leverages the strong generalization of pre-trained CLIP to guide the model, enhancing its robust generalization on adversarial examples while maintaining its accuracy on clean ones. We also design a novel adaptive consistency objective function to balance the consistency of adversarial inputs and clean inputs between the fine-tuning model and the pre-trained model. We conduct extensive experiments across 14 datasets and 4 data sparsity schemes (from 1-shot to full training data settings) to show the superiority of CAPT over other state-of-the-art adaptation methods. CAPT demonstrated excellent performance in terms of the in-distribution performance and the generalization under input distribution shift and across datasets.
Submitted 17 May, 2024;
originally announced May 2024.
-
Deciding regular games: a playground for exponential time algorithms
Authors:
Zihui Liang,
Bakh Khoussainov,
Mingyu Xiao
Abstract:
Regular games form a well-established class of games for analysis and synthesis of reactive systems. They include coloured Muller games, McNaughton games, Muller games, Rabin games, and Streett games. These games are played on directed graphs $\mathcal G$ where Player 0 and Player 1 play by generating an infinite path $ρ$ through the graph. The winner is determined by specifications put on the set $X$ of vertices in $ρ$ that occur infinitely often. These games are determined, enabling the partitioning of $\mathcal G$ into two sets $W_0$ and $W_1$ of winning positions for Player 0 and Player 1, respectively. Numerous algorithms exist that decide specific instances of regular games, e.g., Muller games, by computing $W_0$ and $W_1$. In this paper we aim to find general principles for designing uniform algorithms that decide all regular games. For this we utilise various recursive and dynamic programming algorithms that leverage standard notions such as subgames and traps. Importantly, we show that our techniques improve or match the performances of existing algorithms for many instances of regular games.
Submitted 12 May, 2024;
originally announced May 2024.
-
FedGCS: A Generative Framework for Efficient Client Selection in Federated Learning via Gradient-based Optimization
Authors:
Zhiyuan Ning,
Chunlin Tian,
Meng Xiao,
Wei Fan,
Pengyang Wang,
Li Li,
Pengfei Wang,
Yuanchun Zhou
Abstract:
Federated Learning faces significant challenges in statistical and system heterogeneity, along with high energy consumption, necessitating efficient client selection strategies. Traditional approaches, including heuristic and learning-based methods, fall short of addressing these complexities holistically. In response, we propose FedGCS, a novel generative client selection framework that innovatively recasts the client selection process as a generative task. Drawing inspiration from the methodologies used in large language models, FedGCS efficiently encodes abundant decision-making knowledge within a continuous representation space, enabling efficient gradient-based optimization to search for the optimal client selection, which is finally output via generation. The framework comprises four steps: (1) automatic collection of diverse "selection-score" pair data using classical client selection methods; (2) training an encoder-evaluator-decoder framework on this data to construct a continuous representation space; (3) employing gradient-based optimization in this space for optimal client selection; (4) generating the final optimal client selection using beam search with the well-trained decoder. FedGCS outperforms traditional methods by being more comprehensive, generalizable, and efficient, simultaneously optimizing for model performance, latency, and energy consumption. The effectiveness of FedGCS is proven through extensive experimental analyses.
Submitted 10 May, 2024;
originally announced May 2024.
-
JWST FRESCO: a comprehensive census of H$β$+[OIII] emitters at 6.8<z<9.0 in the GOODS fields
Authors:
R. A. Meyer,
P. A. Oesch,
E. Giovinazzo,
A. Weibel,
G. Brammer,
J. Matthee,
R. P. Naidu,
R. J. Bouwens,
J. Chisholm,
A. Covelo-Paz,
Y. Fudamoto,
M. Maseda,
E. Nelson,
I. Shivaei,
M. Xiao,
T. Herard-Demanche,
G. D. Illingworth,
J. Kerutt,
I. Kramarenko,
I. Labbe,
E. Leonova,
D. Magee,
J. Matharu,
G. Prieto Lyon,
N. Reddy
, et al. (5 additional authors not shown)
Abstract:
We present the census of H$β$+[O III] $4960,5008$ Åemitters at $6.8<z<9.0$ from the JWST FRESCO survey over 124 arcmin$^2$ in the GOODS-North and GOODS-South fields. Our unbiased spectroscopic search results in 137 spectroscopically-confirmed galaxies at $6.8<z<9.0$ with observed [O III] fluxes $f_{[O III]}\gtrsim 1\times 10^{-18}\ \rm{erg}\ \rm{s}^{-1} \ \rm{cm}^{-2}$. The rest-frame optical line ratios of the median stacked spectrum indicate negligible dust attenuation, low metallicity ($12+\log(\rm{O/H})= 7.2-7.7$) and a high ionisation parameter $\log_{10}U \simeq -2.5$ at a median UV magnitude $M_{\rm{UV}}=-19.65^{+0.59}_{-1.05}$. We find a factor $\times\ 1.3$ difference in the number density of $6.8<z<9.0$ galaxies between GOODS-South and GOODS-North, which is caused by single overdensity at $7.0<z<7.2$ in GOODS-North. The bright end of the UV luminosity function of spectroscopically-confirmed [O III] emitters is in good agreement with that from pre-JWST dropout-selected samples. Discrepancies between the observed [O III] LF, [O III] /UV ratio and [O III] equivalent widths distribution and that predicted by theoretical models suggest burstier star-formation histories and/or more heterogeneous metallicity and ionising conditions in $z>7$ galaxies. We report a rapid decline of the [O III] luminosity density at $z\gtrsim 6-7$ which cannot be explained solely by the evolution of the cosmic star-formation rate density. Finally, we find that FRESCO, in only $2$h, captures star-forming galaxies likely accounting for $\sim 10-20\%$ of the ionising budget at $z=7$ and $z=8$, raising the prospect of detecting directly all the sources of reionisation with JWST.
△ Less
Submitted 16 May, 2024; v1 submitted 8 May, 2024;
originally announced May 2024.
-
sc-OTGM: Single-Cell Perturbation Modeling by Solving Optimal Mass Transport on the Manifold of Gaussian Mixtures
Authors:
Andac Demir,
Elizaveta Solovyeva,
James Boylan,
Mei Xiao,
Fabrizio Serluca,
Sebastian Hoersch,
Jeremy Jenkins,
Murthy Devarakonda,
Bulent Kiziltan
Abstract:
Influenced by breakthroughs in LLMs, single-cell foundation models are emerging. While these models show successful performance in cell type clustering, phenotype classification, and gene perturbation response prediction, it remains to be seen if a simpler model could achieve comparable or better results, especially with limited data. This is important, as the quantity and quality of single-cell data typically fall short of the standards in textual data used for training LLMs. Single-cell sequencing often suffers from technical artifacts, dropout events, and batch effects. These challenges are compounded in a weakly supervised setting, where the labels of cell states can be noisy, further complicating the analysis. To tackle these challenges, we present sc-OTGM, streamlined with less than 500K parameters, making it approximately 100x more compact than the foundation models, offering an efficient alternative. sc-OTGM is an unsupervised model grounded in the inductive bias that the scRNAseq data can be generated from a combination of the finite multivariate Gaussian distributions. The core function of sc-OTGM is to create a probabilistic latent space utilizing a GMM as its prior distribution and distinguish between distinct cell populations by learning their respective marginal PDFs. It uses a Hit-and-Run Markov chain sampler to determine the OT plan across these PDFs within the GMM framework. We evaluated our model against a CRISPR-mediated perturbation dataset, called CROP-seq, consisting of 57 one-gene perturbations. Our results demonstrate that sc-OTGM is effective in cell state classification, aids in the analysis of differential gene expression, and ranks genes for target identification through a recommender system. It also predicts the effects of single-gene perturbations on downstream gene regulation and generates synthetic scRNA-seq data conditioned on specific cell states.
Submitted 6 May, 2024;
originally announced May 2024.
-
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Authors:
Zexuan Zhong,
Mengzhou Xia,
Danqi Chen,
Mike Lewis
Abstract:
Mixture-of-experts (MoE) models facilitate efficient scaling; however, training the router network introduces the challenge of optimizing a non-differentiable, discrete objective. Recently, a fully-differentiable MoE architecture, SMEAR, was proposed (Muqeeth et al., 2023), which softly merges experts in the parameter space; nevertheless, its effectiveness was only demonstrated in downstream fine-tuning on classification tasks. In this paper, we present Lory, the first approach that scales such architectures to autoregressive language model pre-training. Lory introduces two key techniques: (1) a causal segment routing strategy that achieves high efficiency for expert merging operations while preserving the autoregressive nature of language models; (2) a similarity-based data batching method that encourages expert specialization by grouping similar documents in training instances. We pre-train a series of Lory models on 150B tokens from scratch, with up to 32 experts and 30B (1.5B active) parameters. Experimental results show significant performance gains over parameter-matched dense models on both perplexity (+13.9%) and a variety of downstream tasks (+1.5%-11.1%). Despite segment-level routing, Lory models achieve competitive performance compared to state-of-the-art MoE models with token-level routing. We further demonstrate that the trained experts in Lory capture domain-level specialization without supervision. Our work highlights the potential of fully-differentiable MoE architectures for language model pre-training and advocates future research in this area.
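The soft merging of experts in parameter space that the abstract attributes to SMEAR can be sketched in a few lines. This is only an illustration of the general idea under stated assumptions: the linear experts, the router logits, and the helper names (`softmax`, `merge_experts`, `apply_linear`) are hypothetical, and the sketch does not reproduce Lory's causal segment routing or its data batching.

```python
import math

def softmax(logits):
    """Turn router logits into routing probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def merge_experts(expert_weights, routing_probs):
    """Weighted average of same-shaped expert weight matrices.

    Because the merge is a smooth function of the routing probabilities,
    gradients can flow back into the router (no discrete expert choice).
    """
    rows = len(expert_weights[0])
    cols = len(expert_weights[0][0])
    return [[sum(p * w[r][c] for p, w in zip(routing_probs, expert_weights))
             for c in range(cols)]
            for r in range(rows)]

def apply_linear(weight, x):
    """Apply a linear expert (matrix-vector product)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

# Two hypothetical 2x2 linear experts and equal router logits.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
probs = softmax([0.0, 0.0])
merged = merge_experts(experts, probs)
output = apply_linear(merged, [1.0, 2.0])
```

For linear experts, applying the merged matrix gives the same result as averaging the individual expert outputs with the routing probabilities; for nonlinear experts the two generally differ.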
Submitted 5 May, 2024;
originally announced May 2024.
-
Superconductivity of Bulk Abnormal Magic-stoichiometric Na3Cl Salt Crystals at Normal Pressure
Authors:
Shuqiang He,
Yi-Feng Zheng,
Guosheng Shi,
Yi-Jie Xiang,
Meihui Xiao,
Qituan Zhang,
Yue-Yu Zhang,
Haiping Fang
Abstract:
The identification of new superconducting materials is a central pursuit of superconductivity research. Here, based on first-principles calculations, we show that the simplest salt in daily use can be made a superconductor at normal pressure solely by adjusting its Na:Cl stoichiometry to Na3Cl. This stable bulk crystal with an abnormal Na-Cl stoichiometry of 3:1, the first 'magic' ratio, includes metallic (Na) atoms in the core as well as hybridization of ionic and metallic bonding, facilitating the electron-phonon coupling for superconductivity with a critical temperature Tc of 0.13 K. The flat bands and van Hove singularities near the Fermi level produce large densities of states, similar to H3S and LaH10, which is beneficial for the emergence of superconductivity. The crystal with abnormal Na-Cl magic stoichiometry is a precisely tunable, purely sodium- and chloride-based, three-dimensional bulk superconductor, and is therefore an ideal material for designing and understanding abnormal stoichiometric crystals. The methodology for constructing this bulk abnormal crystal may be general to almost all elements, which could lead to insights into the physics of other conventional superconductors and even high-critical-temperature superconductors.
Submitted 17 April, 2024;
originally announced May 2024.
-
Exact Universal Characterization of Chiral-Symmetric Higher-Order Topological Phases
Authors:
Jia-Zheng Li,
Xun-Jiang Luo,
Fengcheng Wu,
Meng Xiao
Abstract:
Utilizing a series of Bott indices formulated through polynomials of position operators, we establish a comprehensive framework for characterizing topological zero-energy corner states in systems with chiral symmetry. Our framework covers systems with arbitrary shape, including topological phases that are not characterizable by previously proposed invariants such as multipole moments or multipole chiral numbers. A key feature of our framework is its ability to capture the real-space pattern of zero-energy corner states. We provide a rigorous analytical proof of its higher-order correspondence. To demonstrate the effectiveness of our theory, we examine several model systems with representative patterns of zero-energy corner states that previous frameworks fail to classify.
Submitted 30 April, 2024;
originally announced April 2024.
-
LpQcM: Adaptable Lesion-Quantification-Consistent Modulation for Deep Learning Low-Count PET Image Denoising
Authors:
Menghua Xia,
Huidong Xie,
Qiong Liu,
Bo Zhou,
Hanzhong Wang,
Biao Li,
Axel Rominger,
Kuangyu Shi,
Georges El Fakhri,
Chi Liu
Abstract:
Deep learning-based positron emission tomography (PET) image denoising offers the potential to reduce radiation exposure and scanning time by transforming low-count images into high-count equivalents. However, existing methods typically blur crucial details, leading to inaccurate lesion quantification. This paper proposes a lesion-perceived and quantification-consistent modulation (LpQcM) strategy for enhanced PET image denoising, via employing downstream lesion quantification analysis as auxiliary tools. The LpQcM is a plug-and-play design adaptable to a wide range of model architectures, modulating the sampling and optimization procedures of model training without adding any computational burden to the inference phase. Specifically, the LpQcM consists of two components, the lesion-perceived modulation (LpM) and the multiscale quantification-consistent modulation (QcM). The LpM enhances lesion contrast and visibility by allocating higher sampling weights and stricter loss criteria to lesion-present samples determined by an auxiliary segmentation network than lesion-absent ones. The QcM further emphasizes accuracy of quantification for both the mean and maximum standardized uptake value (SUVmean and SUVmax) across multiscale sub-regions throughout the entire image, thereby enhancing the overall image quality. Experiments conducted on large PET datasets from multiple centers and vendors, and varying noise levels demonstrated the LpQcM efficacy across various denoising frameworks. Compared to frameworks without LpQcM, the integration of LpQcM reduces the lesion SUVmean bias by 2.92% on average and increases the peak signal-to-noise ratio (PSNR) by 0.34 on average, for denoising images of extremely low-count levels below 10%.
Submitted 27 April, 2024;
originally announced April 2024.
-
A First Look at Spatially Resolved Star Formation at $4.8<z<6.5$ with JWST FRESCO NIRCam Slitless Spectroscopy
Authors:
Jasleen Matharu,
Erica J. Nelson,
Gabriel Brammer,
Pascal A. Oesch,
Natalie Allen,
Irene Shivaei,
Rohan P. Naidu,
John Chisholm,
Alba Covelo-Paz,
Yoshinobu Fudamoto,
Emma Giovinazzo,
Thomas Herard-Demanche,
Josephine Kerutt,
Ivan Kramarenko,
Danilo Marchesini,
Romain A. Meyer,
Gonzalo Prieto-Lyon,
Naveen Reddy,
Marko Shuntov,
Andrea Weibel,
Stijn Wuyts,
Mengyuan Xiao
Abstract:
We present the first results on the spatial distribution of star formation in 454 star-forming galaxies at $4.8<z<6.5$ using H-Alpha emission-line maps and F444W imaging tracing the stellar continuum from the JWST FRESCO NIRCam Slitless Spectroscopy Survey. Star-forming galaxies with stellar masses $6.8\leq$log($M_{*}/\mathrm{M}_{\odot}$)$<11.1$ have positive H-Alpha equivalent width profiles, providing direct evidence for the inside-out growth of galaxies just after the epoch of reionisation. GALFIT is used to calculate half-light radii, $R_{\mathrm{eff}}$ and central surface densities within 1 kiloparsec, $Σ_{1\mathrm{kpc}}$ of H-Alpha and the continuum. At a fixed stellar mass of log$(M_{*}/\mathrm{M}_{\odot})=9.5$, $Σ_{1\mathrm{kpc, H}α}$ is $1.3\pm0.1$ times higher than $Σ_{1\mathrm{kpc, C}}$ with consistent H-Alpha and continuum $R_{\mathrm{eff}}$ that are both less than 1 kpc. These measurements suggest the rapid build up of compact bulges without a significant increase in their radii. By comparing to work done at lower redshifts with HST WFC3 Slitless Spectroscopy as part of the 3D-HST ($z=1$) and CLEAR ($z=0.5$) surveys, we find that $R_{\mathrm{eff}}(z)$ and $Σ_{1\mathrm{kpc}}(z)$ evolve faster with redshift for H-Alpha. As a function of the Hubble parameter, $\frac{R_{\mathrm{eff, H}α}}{R_{\mathrm{eff, C}}}=1.1h(z)^{-0.1}$ and $\frac{Σ_{1\mathrm{kpc,H}α}}{Σ_{1\mathrm{kpc,C}}}=h(z)^{0.3}$. These functions indicate there is a transition point at $2<z<4$ where the inside-out growth of the disk starts to dominate over the inside-out growth of the bulge towards lower redshifts. This is supported by the redshift evolution in EW(H$α$) profiles, where there is rapid increase in EW(H$α$) with radius within the half-light radius at $z=5.3$ but only significantly increasing EW(H$α$) with radius in the outer disk at $z=0.5$.
Submitted 26 April, 2024;
originally announced April 2024.
-
An order analysis of hyperfinite Borel equivalence relations
Authors:
Su Gao,
Ming Xiao
Abstract:
In this paper we first consider hyperfinite Borel equivalence relations with a pair of Borel $\mathbb{Z}$-orderings. We define a notion of compatibility between such pairs, and prove a dichotomy theorem which characterizes exactly when a pair of Borel $\mathbb{Z}$-orderings are compatible with each other. We show that, if a pair of Borel $\mathbb{Z}$-orderings are incompatible, then a canonical incompatible pair of Borel $\mathbb{Z}$-orderings of $E_0$ can be Borel embedded into the given pair. We then consider hyperfinite-over-finite equivalence relations, which are countable Borel equivalence relations admitting Borel $\mathbb{Z}^2$-orderings. We show that if a hyperfinite-over-hyperfinite equivalence relation $E$ admits a Borel $\mathbb{Z}^2$-ordering which is self-compatible, then $E$ is hyperfinite.
Submitted 6 May, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System
Authors:
Robin Schmucker,
Meng Xia,
Amos Azaria,
Tom Mitchell
Abstract:
Conversational tutoring systems (CTSs) offer learning experiences through interactions based on natural language. They are recognized for promoting cognitive engagement and improving learning outcomes, especially in reasoning tasks. Nonetheless, the cost associated with authoring CTS content is a major obstacle to widespread adoption and to research on effective instructional design. In this paper, we discuss and evaluate a novel type of CTS that leverages recent advances in large language models (LLMs) in two ways: First, the system enables AI-assisted content authoring by inducing an easily editable tutoring script automatically from a lesson text. Second, the system automates the script orchestration in a learning-by-teaching format via two LLM-based agents (Ruffle&Riley) acting as a student and a professor. The system allows for free-form conversations that follow the ITS-typical inner and outer loop structure. We evaluate Ruffle&Riley's ability to support biology lessons in two between-subject online user studies (N = 200) comparing the system to simpler QA chatbots and reading activity. Analyzing system usage patterns, pre/post-test scores and user experience surveys, we find that Ruffle&Riley users report high levels of engagement, understanding and perceive the offered support as helpful. Even though Ruffle&Riley users require more time to complete the activity, we did not find significant differences in short-term learning gains over the reading activity. Our system architecture and user study provide various insights for designers of future CTSs. We further open-source our system to support ongoing research on effective instructional design of LLM-based learning technologies.
Submitted 26 April, 2024;
originally announced April 2024.
-
Scalable cyclic transformation of orbital angular momentum modes based on a nonreciprocal Mach-Zehnder interferometer
Authors:
Y. F. Yang,
M. Y. Chen,
F. P. Li,
Y. P. Ruan,
Z. X. Li,
M. Xiao,
H. Zhang,
K. Y. Xia
Abstract:
The orbital angular momentum (OAM) of photons provides a pivotal resource for carrying out high-dimensional classical and quantum information processing due to its unique discrete high-dimensional nature. The cyclic transformation of a set of orthogonal OAM modes is an essential building block for universal high-dimensional information processing. Its realization in the quantum domain is the universal quantum Pauli-X gate. In this work, we experimentally demonstrate a cyclic transformation of six OAM modes with an average efficiency higher than 96% by exploiting a nonreciprocal Mach-Zehnder interferometer. Our system is simple and can, in principle, be scaled to more modes. With improved phase stabilization and quantum photonic input states, this method can perform a universal single-photon quantum Pauli-X gate, thus paving the way for scalable high-dimensional quantum computation.
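The cyclic transformation described in this abstract acts on a set of d orthogonal modes as a shift, X|l> = |(l + 1) mod d>, which is the high-dimensional Pauli-X gate. The sketch below builds that matrix for d = 6 in plain Python. The mode labels 0 to 5 and the helper names are hypothetical and chosen for illustration; the abstract does not state which OAM values were used, and the code models only the ideal transformation, not the interferometer.

```python
def cyclic_shift_matrix(d):
    """Return the d x d permutation matrix X with X|l> = |(l + 1) mod d>."""
    return [[1 if row == (col + 1) % d else 0 for col in range(d)]
            for row in range(d)]

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_power(m, exponent):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(exponent):
        result = matmul(result, m)
    return result

def apply(m, state):
    """Apply a matrix to a state vector."""
    return [sum(m[i][j] * state[j] for j in range(len(state)))
            for i in range(len(m))]

d = 6
X = cyclic_shift_matrix(d)
mode_0 = [1, 0, 0, 0, 0, 0]
shifted = apply(X, mode_0)     # mode 0 is mapped to mode 1
wrapped = apply(X, [0, 0, 0, 0, 0, 1])  # mode 5 wraps around to mode 0
```

Applying X six times returns every mode to itself, which is what makes the transformation cyclic.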
Submitted 22 April, 2024;
originally announced April 2024.
-
The Traveling Tournament Problem: Improved Algorithms Based on Cycle Packing
Authors:
Jingyang Zhao,
Mingyu Xiao,
Chao Xu
Abstract:
The Traveling Tournament Problem (TTP) is a well-known benchmark problem in the field of tournament timetabling, which asks us to design a double round-robin schedule such that each pair of teams plays one game in each other's home venue, minimizing the total distance traveled by all $n$ teams ($n$ is even). TTP-$k$ is the problem with one more constraint that each team can have at most $k$-consecutive home games or away games. In this paper, we investigate schedules for TTP-$k$ and analyze the approximation ratio of the solutions. Most previous schedules were constructed based on a Hamiltonian cycle of the graph. We will propose a novel construction based on a $k$-cycle packing. Then, combining our $k$-cycle packing schedule with the Hamiltonian cycle schedule, we obtain improved approximation ratios for TTP-$k$ with deep analysis. The case where $k=3$, TTP-3, is one of the most investigated cases. We improve the approximation ratio of TTP-3 from $(1.667+\varepsilon)$ to $(1.598+\varepsilon)$, for any $\varepsilon>0$. For TTP-$4$, we improve the approximation ratio from $(1.750+\varepsilon)$ to $(1.700+\varepsilon)$. By a refined analysis of the Hamiltonian cycle construction, we also improve the approximation ratio of TTP-$k$ from $(\frac{5k-7}{2k}+\varepsilon)$ to $(\frac{5k^2-4k+3}{2k(k+1)}+\varepsilon)$ for any constant $k\geq 5$. Our methods can be extended to solve a variant called LDTTP-$k$ (TTP-$k$ where all teams are allocated on a straight line). We show that the $k$-cycle packing construction can achieve an approximation ratio of $(\frac{3k-3}{2k-1}+\varepsilon)$, which improves the approximation ratio of LDTTP-3 from $4/3$ to $(6/5+\varepsilon)$.
Submitted 16 April, 2024;
originally announced April 2024.
-
FRESCO: The Paschen-$α$ Star Forming Sequence at Cosmic Noon
Authors:
Chloe Neufeld,
Pieter van Dokkum,
Yasmeen Asali,
Alba Covelo-Paz,
Joel Leja,
Jamie Lin,
Jorryt Matthee,
Pascal A. Oesch,
Naveen A. Reddy,
Irene Shivaei,
Katherine E. Whitaker,
Stijn Wuyts,
Gabriel Brammer,
Danilo Marchesini,
Michael V. Maseda,
Rohan P. Naidu,
Erica J. Nelson,
Anna Velichko,
Andrea Weibel,
Mengyuan Xiao
Abstract:
We present results from the JWST First Reionization Epoch Spectroscopically Complete Observations survey (FRESCO) on the star forming sequence of galaxies at $1.0<z<1.7$, around the peak of the cosmic star formation history. Star formation rates (SFRs) are measured from the redshifted, nearly dust-insensitive Paschen-$α$ emission line, and stellar mass measurements include the F444W (4.4 $μ$m; rest-frame H) band. We find SFRs of galaxies with $M*>9.5 M_\odot$ that are lower than found in many earlier studies by up to 0.6 dex, but in good agreement with recent results obtained with the Prospector fitting framework. The difference log(SFR(Pa$α$)-SFR(Prospector)) is -0.09 $\pm$ 0.04 dex at $10^{10-11} M_\odot$. We also measure the empirical relation between Paschen-$α$ luminosity and rest-frame H band magnitude and find that the scatter is only 0.04 dex lower than that of the SFR-M* relation and is much lower than the systematic differences among relations in the literature due to various methods of converting observed measurements to physical properties. We additionally identify examples of sources -- that, with standard cutoffs via the UVJ diagram, would be deemed quiescent -- with significant, typically extended, Paschen-$α$ emission. Our results may be indicative of the potential unification of methods used to derive the star forming sequence with careful selection of star forming galaxies and independent star formation rate and stellar mass indicators.
Submitted 10 July, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Exact and Efficient Unlearning for Large Language Model-based Recommendation
Authors:
Zhiyu Hu,
Yang Zhang,
Minghao Xiao,
Wenjie Wang,
Fuli Feng,
Xiangnan He
Abstract:
The evolving paradigm of Large Language Model-based Recommendation (LLMRec) customizes Large Language Models (LLMs) through parameter-efficient fine-tuning (PEFT) using recommendation data. The inclusion of user data in LLMs raises privacy concerns. To protect users, the unlearning process in LLMRec, specifically removing unusable data (e.g., historical behaviors) from established LLMRec models, becomes crucial. However, existing unlearning methods are insufficient for the unique characteristics of LLMRec, mainly due to high computational costs or incomplete data erasure. In this study, we introduce the Adapter Partition and Aggregation (APA) framework for exact and efficient unlearning while maintaining recommendation performance. APA achieves this by establishing distinct adapters for partitioned training data shards and retraining only the adapters impacted by unusable data for unlearning. To preserve recommendation performance and mitigate considerable inference costs, APA employs parameter-level adapter aggregation with sample-adaptive attention for individual testing samples. Extensive experiments substantiate the effectiveness and efficiency of our proposed framework.
Submitted 16 April, 2024;
originally announced April 2024.
-
Breast Cancer Image Classification Method Based on Deep Transfer Learning
Authors:
Weimin Wang,
Min Gao,
Mingxuan Xiao,
Xu Yan,
Yufeng Li
Abstract:
To address the issues of limited samples, time-consuming feature design, and low accuracy in the detection and classification of breast cancer pathological images, a breast cancer image classification algorithm combining deep learning and transfer learning is proposed. The algorithm is based on the DenseNet deep neural network structure, constructs a network model by introducing attention mechanisms, and is trained on the enhanced dataset using multi-level transfer learning. Experimental results demonstrate that the algorithm achieves an efficiency of over 84.0% on the test set, with significantly improved classification accuracy compared to previous models, making it applicable to medical breast cancer detection tasks.
Submitted 14 April, 2024;
originally announced April 2024.
-
Facility Assignment with Fair Cost Sharing: Equilibrium and Mechanism Design
Authors:
Mengfan Ma,
Mingyu Xiao,
Tian Bai,
Xin Cheng
Abstract:
In the one-dimensional facility assignment problem, m facilities and n agents are positioned along the real line. Each agent will be assigned to a single facility to receive service. Each facility incurs a building cost, which is shared equally among the agents utilizing it. Additionally, each agent independently bears a connection cost to access a facility. Thus, an agent's cost is the sum of the connection cost and her portion of the building cost. The social cost is the total cost of all agents. Notably, the optimal assignment that minimizes the social cost can be found in polynomial time. In this paper, we study the problem in two game-theoretic settings that differ in the strategy space of the agents and the assignment rule. In both settings, agents act strategically to minimize their individual costs.
In our first setting, the strategy space of agents is the set of facilities, granting agents the freedom to select any facility. Consequently, the self-formed assignment can exhibit instability, as agents may deviate to other facilities. We focus on the computation of an equilibrium assignment, where no agent has an incentive to unilaterally change her choice. We show that we can compute a pure Nash equilibrium in polynomial time.
In our second setting, agents report their positions to a mechanism for assignment to facilities. The strategy space of agents becomes the set of all positions. Our interest lies in strategyproof mechanisms. It is essential to note that the preference induced by the agents' cost function is more complex as it depends on how other agents are assigned. We establish a strong lower bound against all strategyproof and anonymous mechanisms: none can achieve a bounded social cost approximation ratio. Nonetheless, we identify a class of non-trivial strategyproof mechanisms for any n and m that is unanimous and anonymous.
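The cost model and the equilibrium notion from the first setting can be made concrete with a small checker. This is a sketch under stated assumptions, not the paper's polynomial-time algorithm: the connection cost is assumed to be the distance between agent and facility on the line, and the function names and the example instance are hypothetical.

```python
def agent_cost(agent, facility, assignment, agent_pos, facility_pos, build_cost):
    """Connection cost plus an equal share of the facility's building cost.

    Assumption: connection cost = distance on the real line.
    """
    users = sum(1 for f in assignment if f == facility)
    connection = abs(agent_pos[agent] - facility_pos[facility])
    return connection + build_cost[facility] / users

def social_cost(assignment, agent_pos, facility_pos, build_cost):
    """Total cost over all agents."""
    return sum(
        agent_cost(a, f, assignment, agent_pos, facility_pos, build_cost)
        for a, f in enumerate(assignment)
    )

def is_pure_nash(assignment, agent_pos, facility_pos, build_cost):
    """True if no agent can lower her own cost by switching facility alone."""
    for a, current in enumerate(assignment):
        cost_now = agent_cost(a, current, assignment,
                              agent_pos, facility_pos, build_cost)
        for f in range(len(facility_pos)):
            if f == current:
                continue
            deviated = list(assignment)
            deviated[a] = f  # unilateral deviation of agent a
            cost_new = agent_cost(a, f, deviated,
                                  agent_pos, facility_pos, build_cost)
            if cost_new < cost_now - 1e-12:
                return False
    return True

# Hypothetical instance: three agents, two facilities.
agent_pos = [0.0, 1.0, 10.0]
facility_pos = [0.0, 10.0]
build_cost = [4.0, 4.0]

stable = is_pure_nash([0, 0, 1], agent_pos, facility_pos, build_cost)
unstable = is_pure_nash([0, 1, 1], agent_pos, facility_pos, build_cost)
total = social_cost([0, 0, 1], agent_pos, facility_pos, build_cost)
```

In the second assignment, the agent at position 1 pays 9 in connection cost plus 2 in shared building cost, and would pay only 3 after moving to the facility at 0, so that assignment is not an equilibrium.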
Submitted 13 April, 2024;
originally announced April 2024.
-
Survival Prediction Across Diverse Cancer Types Using Neural Networks
Authors:
Xu Yan,
Weimin Wang,
MingXuan Xiao,
Yufeng Li,
Min Gao
Abstract:
Gastric cancer and Colon adenocarcinoma represent widespread and challenging malignancies with high mortality rates and complex treatment landscapes. In response to the critical need for accurate prognosis in cancer patients, the medical community has embraced the 5-year survival rate as a vital metric for estimating patient outcomes. This study introduces a pioneering approach to enhance survival prediction models for gastric and Colon adenocarcinoma patients. Leveraging advanced image analysis techniques, we sliced whole slide images (WSI) of these cancers, extracting comprehensive features to capture nuanced tumor characteristics. Subsequently, we constructed patient-level graphs, encapsulating intricate spatial relationships within tumor tissues. These graphs served as inputs for a sophisticated 4-layer graph convolutional neural network (GCN), designed to exploit the inherent connectivity of the data for comprehensive analysis and prediction. By integrating patients' total survival time and survival status, we computed C-index values for gastric cancer and Colon adenocarcinoma, yielding 0.57 and 0.64, respectively. Significantly surpassing previous convolutional neural network models, these results underscore the efficacy of our approach in accurately predicting patient survival outcomes. This research holds profound implications for both the medical and AI communities, offering insights into cancer biology and progression while advancing personalized treatment strategies. Ultimately, our study represents a significant stride in leveraging AI-driven methodologies to revolutionize cancer prognosis and improve patient outcomes on a global scale.
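The C-index reported in this abstract is computed from survival times, survival status, and predicted risk. A minimal sketch of Harrell's concordance index follows, assuming higher predicted risk should correspond to shorter survival. The function name and the toy data are hypothetical, and the paper may handle ties and censoring differently.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed survival or censoring time per patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's observed or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]

perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
reversed_order = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking.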
Submitted 11 April, 2024;
originally announced April 2024.
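The abstract above reports C-index values computed from survival time and survival status but does not say which estimator was used. As a minimal sketch, assuming Harrell's concordance index, the calculation could look like the following. The function name `concordance_index` and its arguments are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index (assumed estimator; the abstract does not specify one).

    times  -- observed survival or censoring time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- predicted risk score; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # the earlier time is censored, so the order is unknown
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0  # higher risk matches the earlier event
        elif risks[first] == risks[second]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Under this definition, 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.57 and 0.64 would be read.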
-
Convolutional neural network classification of cancer cytopathology images: taking breast cancer as an example
Authors:
MingXuan Xiao,
Yufeng Li,
Xu Yan,
Min Gao,
Weimin Wang
Abstract:
Breast cancer is relatively common among gynecological cancers. Its diagnosis often relies on the pathology of cells in the lesion. Pathological diagnosis of breast cancer not only requires trained professionals and time, but also sometimes involves subjective judgment. To address the dependence on pathologists' expertise and the time-consuming nature of accurate breast pathological image classification, this paper introduces an approach that uses convolutional neural networks (CNNs) to rapidly and automatically classify pathological images into benign and malignant groups, aiming to improve the efficiency of breast pathological image detection. The method uses a CNN model based on the Inceptionv3 architecture and transfer learning to extract features from pathological images, followed by a neural network with fully connected layers and the SoftMax function for image classification. Additionally, image partitioning is introduced to handle high-resolution images. To obtain the final classification, the classification probabilities of the image blocks are aggregated using three algorithms: summation, product, and maximum. Experimental validation on the public BreaKHis dataset yielded accuracy above 0.92 at all four magnification factors (40X, 100X, 200X, and 400X). These results demonstrate that the proposed method effectively improves accuracy in classifying pathological images of breast cancer.
Submitted 12 April, 2024;
originally announced April 2024.
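The abstract above says per-block classification probabilities are aggregated by summation, product, and maximum, without giving details. A minimal sketch of those three rules, assuming each image block yields one SoftMax probability vector and the final label is the class with the highest aggregated score, might look like this. The function name `aggregate_blocks` and the two-class layout `[benign, malignant]` are hypothetical.

```python
import math


def aggregate_blocks(block_probs, rule="sum"):
    """Combine per-block class probabilities into a single image-level label.

    block_probs -- list of per-block probability vectors, e.g. [p_benign, p_malignant]
    rule        -- "sum", "product", or "max"
    Returns the index of the winning class.
    """
    n_classes = len(block_probs[0])
    # Regroup so that each entry holds one class's probability across all blocks.
    per_class = [[probs[c] for probs in block_probs] for c in range(n_classes)]
    if rule == "sum":
        scores = [sum(col) for col in per_class]
    elif rule == "product":
        scores = [math.prod(col) for col in per_class]
    elif rule == "max":
        scores = [max(col) for col in per_class]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(range(n_classes), key=scores.__getitem__)


# Illustrative values only: three blocks lean malignant, one leans benign.
blocks = [[0.4, 0.6], [0.4, 0.6], [0.4, 0.6], [0.7, 0.3]]
labels = {rule: aggregate_blocks(blocks, rule) for rule in ("sum", "product", "max")}
```

With these made-up inputs, the sum and product rules follow the majority of blocks, while the max rule follows the single most confident block, which shows how the three rules can disagree.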