
FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding

Authors: Huitong Pan, Qi Zhang, Cornelia Caragea, Eduard Dragut, Longin Jan Latecki

Abstract: Flowcharts are graphical tools for representing complex concepts in concise visual form. This paper introduces the FlowLearn dataset, a resource tailored to enhance the understanding of flowcharts. FlowLearn contains complex scientific flowcharts and simulated flowcharts. The scientific subset contains 3,858 flowcharts sourced from scientific literature, and the simulated subset contains 10,000 flowcharts created using a customizable script. The dataset is enriched with annotations for visual components, OCR, Mermaid code representations, and VQA question-answer pairs. Despite the proven capabilities of Large Vision-Language Models (LVLMs) in various visual understanding tasks, their effectiveness in decoding flowcharts, a crucial element of scientific communication, has yet to be thoroughly investigated. The FlowLearn test set is crafted to assess the performance of LVLMs in flowchart comprehension. Our study thoroughly evaluates state-of-the-art LVLMs, identifying existing limitations and establishing a foundation for future enhancements in this relatively underexplored domain. For instance, on simulated flowcharts, GPT-4V achieved the highest accuracy (58%) in counting the number of nodes, while Claude achieved the highest accuracy (83%) on OCR tasks. Notably, no single model excels in all tasks within the FlowLearn framework, highlighting significant opportunities for further development.

Submitted 9 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

Comments: ECAI 2024
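
FlowLearn pairs each flowchart with a Mermaid code representation, so a node-count answer can in principle be checked against that code. The following is a minimal sketch, assuming only a small subset of Mermaid flowchart syntax (solid `-->` and `---` links, optional `|label|` edge text, and `[]`, `()`, `{}` node shapes); the function `count_mermaid_nodes` is hypothetical and is not part of the FlowLearn release.

```python
import re

# A node token: an identifier optionally followed by a shaped label,
# e.g. "A", "A[Start]", "B{Valid?}", "C(Process)". Illustrative subset only.
NODE = re.compile(r"([A-Za-z_]\w*)\s*(?:\[[^\]]*\]|\([^)]*\)|\{[^}]*\})?")


def count_mermaid_nodes(code: str) -> int:
    """Count distinct node ids in a simple Mermaid flowchart (hypothetical helper)."""
    nodes = set()
    lines = code.strip().splitlines()
    for line in lines[1:]:  # skip the "flowchart TD" header line
        line = re.sub(r"\|[^|]*\|", "", line)  # drop edge labels such as |yes|
        for part in re.split(r"-->|---", line):  # split on solid links
            match = NODE.match(part.strip())
            if match:
                nodes.add(match.group(1))
    return len(nodes)


example = """
flowchart TD
    A[Start] --> B{Valid?}
    B -->|yes| C[Process]
    B -->|no| D[Reject]
    C --> E[End]
    D --> E
"""
print(count_mermaid_nodes(example))  # 5
```

A model's predicted node count could then be scored against this value; real FlowLearn Mermaid files may use syntax this sketch does not parse.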

arXiv:2407.04911 [pdf, other]

Enhanced Long-Tailed Recognition with Contrastive CutMix Augmentation

Authors: Haolin Pan, Yong Guo, Mianjie Yu, Jian Chen

Abstract: Real-world data often follows a long-tailed distribution, where a few head classes occupy most of the data and a large number of tail classes contain only very limited samples. In practice, deep models often generalize poorly on tail classes due to the imbalanced distribution. To tackle this, data augmentation has become an effective remedy that synthesizes new samples for tail classes. One popular approach is CutMix, which explicitly mixes images of tail classes with those of other classes and constructs labels according to the ratio of the areas cropped from the two images. However, the area-based labels entirely ignore the inherent semantic information of the augmented samples, often leading to misleading training signals. To address this issue, we propose Contrastive CutMix (ConCutMix), which constructs augmented samples with semantically consistent labels to boost the performance of long-tailed recognition. Specifically, we compute the similarities between samples in the semantic space learned by contrastive learning and use them to rectify the area-based labels. Experiments show that ConCutMix significantly improves the accuracy on tail classes as well as the overall performance. For example, based on ResNeXt-50, we improve the overall accuracy on ImageNet-LT by 3.0%, thanks to a significant improvement of 3.3% on tail classes. We highlight that the improvement also generalizes well to other benchmarks and models. Our code and pretrained models are available at https://github.com/PanHaulin/ConCutMix.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 16 pages and 13 figures
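
To make the contrast between area-based and semantically rectified labels concrete, here is a minimal NumPy sketch. The area-based label follows the standard CutMix rule described in the abstract. The rectification step is a hypothetical stand-in: it assumes class prototypes in a contrastively learned embedding space, a softmax over cosine similarities with an illustrative `temperature`, and a fixed blending weight `alpha`; none of these details are taken from the ConCutMix paper.

```python
import numpy as np


def area_label(box, img_hw, y_a, y_b, num_classes):
    """Standard CutMix label: weight each class by the area it occupies."""
    x1, y1, x2, y2 = box
    h, w = img_hw
    lam_b = (x2 - x1) * (y2 - y1) / (h * w)  # fraction of pixels pasted from image b
    label = np.zeros(num_classes)
    label[y_a] += 1.0 - lam_b
    label[y_b] += lam_b
    return label


def semantic_label(feature, prototypes, temperature=0.1):
    """Hypothetical: softmax over cosine similarities to per-class prototypes."""
    f = feature / np.linalg.norm(feature)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = p @ f / temperature
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()


def rectified_label(area, semantic, alpha=0.5):
    """Hypothetical convex blend of the two labels (alpha is illustrative)."""
    return (1.0 - alpha) * area + alpha * semantic


# A 16x16 patch from an image of class 1 pasted into a 32x32 image of class 0.
area = area_label((0, 0, 16, 16), (32, 32), y_a=0, y_b=1, num_classes=3)
# Suppose the mixed sample's embedding lies closest to the class-1 prototype.
sem = semantic_label(np.array([0.0, 1.0, 0.0]), np.eye(3))
print(area, rectified_label(area, sem))
```

Because the blend is convex, the rectified label still sums to one, while shifting weight toward the class the embedding actually resembles.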

arXiv:2407.04852 [pdf, other]

Small $x$ asymptotics for special function solutions of Painlevé III equation

Authors: Hao Pan, Andrei Prokhorov

Abstract: In this paper we compute the small $x$ asymptotics of the special function solutions of the Painlevé-III equation. We use a representation of the solutions in terms of Hankel determinants of Bessel functions, which seems to be new. The Hankel determinants can be rewritten as multiple contour integrals using the Andréief identity. Finally, the small $x$ asymptotics are obtained using elementary asymptotic methods.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 25 pages, 7 figures

MSC Class: 33C10; 33E17; 34E05; 34M55; 34M56
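
The step from Hankel determinants to multiple integrals rests on the Andréief identity. The sketch below numerically checks its best-known special case, Heine's formula $\det[m_{i+j}]_{i,j=0}^{n-1} = \frac{1}{n!}\int \prod_{i<j}(x_i-x_j)^2\,d\mu(x_1)\cdots d\mu(x_n)$ with moments $m_k=\int x^k\,d\mu$, on an arbitrary discrete measure chosen for illustration. It does not reproduce the paper's Bessel-function determinants or its contour integrals.

```python
import itertools
import math

import numpy as np


def hankel_det(x, w, n):
    """det[m_{i+j}], i, j = 0..n-1, with moments m_k = sum_l w_l * x_l**k."""
    m = [np.sum(w * x**k) for k in range(2 * n - 1)]
    H = np.array([[m[i + j] for j in range(n)] for i in range(n)])
    return np.linalg.det(H)


def heine_multiple_sum(x, w, n):
    """(1/n!) * sum over all n-tuples of support points of
    (product of weights) * (Vandermonde determinant)**2."""
    total = 0.0
    for idx in itertools.product(range(len(x)), repeat=n):
        pts = x[list(idx)]
        vdm = np.prod([pts[j] - pts[i] for i in range(n) for j in range(i + 1, n)])
        total += np.prod(w[list(idx)]) * vdm**2
    return total / math.factorial(n)


x = np.array([0.3, 1.1, 2.0, -0.7])  # support points of an arbitrary discrete measure
w = np.array([0.5, 1.0, 0.2, 0.8])   # positive weights
for n in (1, 2, 3):
    print(n, hankel_det(x, w, n), heine_multiple_sum(x, w, n))
```

For a discrete measure the "multiple integral" is a finite multiple sum, which makes the identity easy to verify directly.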

arXiv:2407.04531 [pdf, other]

Neutral atomic and molecular gas dynamics in the nearby spiral galaxies NGC 1512, NGC 4535, and NGC 7496

Authors: Sebastian Laudage, Cosima Eibensteiner, Frank Bigiel, Adam K. Leroy, Sharon Meidt, Eva Schinnerer, W. J. G. de Blok, Miguele Querejeta, Sophia Stuber, Dario Colombo, Erik Rosolowsky, D. J. Pisano, Dyas Utomo, Rebecca C. Levy, Ralf Klessen, Yixian Cao, Eric W. Koch, Sushma Kurapati, Patricia Sanchez-Blazquez, Justus Neumann, Lukas Neumann, Hsi-An Pan, Thomas G. Williams

Abstract: Neutral atomic gas (HI) effectively traces galactic dynamics across mid to large galactocentric radii. However, its limited ability to capture small-scale changes within the central few kiloparsecs, coupled with the HI deficit often observed in galactic centers, makes molecular gas emission the preferred tracer in these regions. Understanding the dynamics of both neutral atomic and molecular gas is crucial for a more complete understanding of how galaxies evolve, funnel gas from the outer disk into their central parts, and eventually form stars. In this work we aim to quantify the dynamics of both the neutral atomic and the molecular gas in the nearby spiral galaxies NGC 1512, NGC 4535, and NGC 7496 using new MeerKAT HI observations together with ALMA CO (2-1) observations from the PHANGS collaboration. We use the analysis tool 3D-Barolo to fit tilted ring models to the HI and CO observations. We test a combined approach in which the HI is used to constrain the true disk orientation parameters before these are applied to the CO datasets. This paper sets expectations for the upcoming high-resolution HI coverage of many galaxies in the PHANGS-ALMA sample with MeerKAT or the VLA, aiming to establish a robust methodology for characterizing galaxy orientations and deriving dynamics by combining new HI with existing CO data.

Submitted 5 July, 2024; originally announced July 2024.

Comments: accepted for publication in A&A; 13 pages, 9 Figures (+2 appendix pages)
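
3D-Barolo fits tilted ring models directly to data cubes; to first order, what each ring predicts is a projected circular velocity. The sketch below evaluates that projection for a single flat disk (one position angle and inclination, no radial or vertical motions). The angle convention (position angle measured from the +y axis toward +x), the rotation curve, and all numbers are assumptions for illustration, not 3D-Barolo's code or conventions.

```python
import numpy as np


def los_velocity(x, y, v_sys, v_rot, pa_deg, inc_deg):
    """Line-of-sight velocity of a thin, purely rotating disk at sky offsets (x, y).
    PA is measured from +y toward +x here (assumed convention); needs inc < 90 deg."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    pa, inc = np.radians(pa_deg), np.radians(inc_deg)
    u = x * np.sin(pa) + y * np.cos(pa)  # offset along the projected major axis
    v = x * np.cos(pa) - y * np.sin(pa)  # offset along the projected minor axis
    X, Y = u, v / np.cos(inc)            # deprojected disk-plane coordinates
    R = np.hypot(X, Y)
    cos_theta = np.divide(X, R, out=np.zeros_like(R), where=R > 0)
    return v_sys + v_rot(R) * cos_theta * np.sin(inc)


def flat_curve(R):
    """Illustrative rotation curve rising to 200 km/s."""
    return 200.0 * np.tanh(R / 2.0)


# Points on the receding major axis, the minor axis, and the approaching major axis.
vals = los_velocity([0.0, 3.0, 0.0], [3.0, 0.0, -3.0], 1000.0, flat_curve, 0.0, 60.0)
print(vals)
```

Points on the minor axis return the systemic velocity, and the major-axis points sit symmetrically above and below it, which is the pattern a tilted ring fit exploits to recover the orientation parameters.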

arXiv:2407.01716 [pdf, other]

PHANGS-MeerKAT and MHONGOOSE HI observations of nearby spiral galaxies: physical drivers of the molecular gas fraction, $R_{\mathrm{mol}}$

Authors: Cosima Eibensteiner, Jiayi Sun, Frank Bigiel, Adam K. Leroy, Eva Schinnerer, Erik Rosolowsky, Sushma Kurapati, D. J. Pisano, W. J. G de Blok, Ashley T. Barnes, Mallory Thorp, Dario Colombo, Eric W. Koch, I-Da Chiang, Eve C. Ostriker, Eric J. Murphy, Nikki Zabel, Sebstian Laudage, Filippo M. Maccagni, Julia Healy, Srikrishna Sekhar, Dyas Utomo, Jakob den Brok, Yixian Cao, Mélanie Chevance , et al. (14 additional authors not shown)

Abstract: The molecular-to-atomic gas ratio is crucial to the evolution of the interstellar medium in galaxies. We investigate the balance between the atomic ($\Sigma_{\rm HI}$) and molecular ($\Sigma_{\rm H2}$) gas surface densities in eight nearby star-forming galaxies using new high-quality observations from MeerKAT and ALMA (for HI and CO, respectively). We define the molecular gas ratio as $R_{\rm mol} = \Sigma_{\rm H2} / \Sigma_{\rm HI}$ and measure how it depends on local conditions in the galaxy disks using multi-wavelength observations. We find that, depending on the galaxy, HI is detected at $>3\sigma$ out to 20-120 kpc in galactocentric radius ($r_{\rm gal}$). The typical radius at which $\Sigma_{\rm HI}$ reaches $1~\rm M_\odot~pc^{-2}$ is $r_{\rm HI}\approx 22$ kpc, which corresponds to 1-3 times the optical radius ($r_{25}$). Among the potential drivers studied, $R_{\rm mol}$ correlates best with the dynamical equilibrium pressure, $P_{\rm DE}$, with a median correlation coefficient of $\langle\rho\rangle = 0.89$. Correlations between $R_{\rm mol}$ and star formation rate, total gas and stellar surface density, metallicity, and $\Sigma_{\rm SFR}/P_{\rm DE}$ are present but somewhat weaker. Our results also show a direct correlation between $P_{\rm DE}$ and $\Sigma_{\rm SFR}$, supporting self-regulation models. Quantitatively, we measure scalings similar to those of previous works and attribute the modest differences that we find to varying resolution and sensitivity.
At $r_{\rm gal} \gtrsim 0.4\,r_{25}$, atomic gas dominates over molecular gas, and where the two gas phases are in balance, we find that the baryon mass is dominated by stars, with $\Sigma_{*} > 5\,\Sigma_{\rm gas}$. Our study constitutes an important step in the statistical investigation of how local galaxy properties impact the conversion from atomic to molecular gas in nearby galaxies.

Submitted 1 July, 2024; originally announced July 2024.

Comments: accepted for publication in A&A; 20 pages, 12 Figures (+4 appendix pages)
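
The two radial quantities defined above, $R_{\rm mol}$ and the radius $r_{\rm HI}$ at which $\Sigma_{\rm HI}$ drops to $1~\rm M_\odot~pc^{-2}$, can be computed from radial profiles as in this minimal sketch. The profile values are made up for illustration, and linear interpolation between radial bins is an assumption, not necessarily the paper's procedure.

```python
import numpy as np


def molecular_ratio(sigma_h2, sigma_hi):
    """R_mol = Sigma_H2 / Sigma_HI, element-wise over radial bins."""
    return np.asarray(sigma_h2, float) / np.asarray(sigma_hi, float)


def crossing_radius(r, profile, threshold):
    """First radius at which a decreasing profile reaches `threshold`,
    linearly interpolated between bins; None if the threshold is not bracketed."""
    r, profile = np.asarray(r, float), np.asarray(profile, float)
    below = np.nonzero(profile <= threshold)[0]
    if below.size == 0 or below[0] == 0:
        return None
    j = below[0]
    r0, r1, p0, p1 = r[j - 1], r[j], profile[j - 1], profile[j]
    return r0 + (threshold - p0) * (r1 - r0) / (p1 - p0)


r_kpc = [0.0, 10.0, 20.0, 30.0]     # illustrative radial bins
sigma_hi = [8.0, 5.0, 2.0, 0.5]     # Msun pc^-2, illustrative
sigma_h2 = [40.0, 5.0, 0.5, 0.05]   # Msun pc^-2, illustrative

r_hi = crossing_radius(r_kpc, sigma_hi, 1.0)                                  # HI "edge"
r_balance = crossing_radius(r_kpc, molecular_ratio(sigma_h2, sigma_hi), 1.0)  # R_mol = 1
print(r_hi, r_balance)
```

The second call locates where atomic and molecular gas are in balance ($R_{\rm mol}=1$), the radius at which the abstract compares stellar and gas surface densities.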

arXiv:2407.00978 [pdf, other]

Hybrid RAG-empowered Multi-modal LLM for Secure Healthcare Data Management: A Diffusion-based Contract Theory Approach

Authors: Cheng Su, Jinbo Wen, Jiawen Kang, Yonghua Wang, Hudan Pan, M. Shamim Hossain

Abstract: Secure data management and effective data sharing have become paramount in the rapidly evolving healthcare landscape. The advancement of generative artificial intelligence has positioned Multi-modal Large Language Models (MLLMs) as crucial tools for managing healthcare data. MLLMs can support multi-modal inputs and generate diverse types of content by leveraging large-scale training on vast amounts of multi-modal data. However, critical challenges persist in developing medical MLLMs, including healthcare data security and data freshness, which affect the output quality of MLLMs. In this paper, we propose a hybrid Retrieval-Augmented Generation (RAG)-empowered medical MLLM framework for healthcare data management. This framework leverages a hierarchical cross-chain architecture to facilitate secure data training. Moreover, it enhances the output quality of MLLMs through hybrid RAG, which employs multi-modal metrics to filter various unimodal RAG results and incorporates these retrieval results as additional inputs to MLLMs. Additionally, we employ the age of information (AoI) to indirectly evaluate the impact of data freshness on MLLMs and utilize contract theory to incentivize healthcare data holders to share fresh data, mitigating information asymmetry in data sharing. Finally, we utilize a generative diffusion model-based reinforcement learning algorithm to identify the optimal contract for efficient data sharing. Numerical results demonstrate the effectiveness of the proposed schemes, which achieve secure and efficient healthcare data management.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 12 pages, 6 figures
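
The age of information mentioned above has a standard definition: at time $t$, it equals $t$ minus the generation time of the freshest update received so far. A minimal sketch of that definition follows; the update log is hypothetical, and how the paper maps AoI onto MLLM output quality or contract design is not modeled here.

```python
def age_of_information(t, updates):
    """AoI at time t: t minus the generation time of the freshest update
    received by t. `updates` holds (generated_at, received_at) pairs."""
    delivered = [gen for gen, rcv in updates if rcv <= t]
    if not delivered:
        raise ValueError("no update has been received by time t")
    return t - max(delivered)


def average_aoi(times, updates):
    """Mean AoI over a list of sampling times (a discrete approximation)."""
    return sum(age_of_information(t, updates) for t in times) / len(times)


log = [(0.0, 2.0), (3.0, 4.0), (5.0, 9.0)]  # hypothetical (generated, received) times
print(age_of_information(5.0, log))          # 2.0: freshest delivered update was made at 3.0
print(average_aoi([2.0, 4.0, 6.0, 9.0], log))
```

Taking the maximum generation time, rather than the latest arrival, keeps the measure correct when updates arrive out of order.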

arXiv:2407.00031 [pdf, other]

Supercharging Federated Learning with Flower and NVIDIA FLARE

Authors: Holger R. Roth, Daniel J. Beutel, Yan Cheng, Javier Fernandez Marques, Heng Pan, Chester Chen, Zhihong Zhang, Yuhong Wen, Sean Yang, Isaac Yang, Yuan-Ting Hsieh, Ziyue Xu, Daguang Xu, Nicholas D. Lane, Andrew Feng

Abstract: Several open-source systems, such as Flower and NVIDIA FLARE, have been developed in recent years, each focusing on different aspects of federated learning (FL). Flower is dedicated to implementing a cohesive approach to FL, analytics, and evaluation. Over time, Flower has cultivated an extensive set of strategies and algorithms tailored for FL application development, fostering a vibrant FL community in research and industry. Conversely, FLARE has prioritized the creation of an enterprise-ready, resilient runtime environment explicitly designed for FL applications in production environments. In this paper, we describe our initial integration of both frameworks and show how they can work together to supercharge the FL ecosystem as a whole. Through the seamless integration of Flower and FLARE, applications crafted within the Flower framework can operate within the FLARE runtime environment without any modifications. This initial integration streamlines the process, eliminating complexities and ensuring smooth interoperability between the two platforms, thus enhancing the overall efficiency and accessibility of FL applications.

Submitted 21 May, 2024; originally announced July 2024.

arXiv:2406.20026 [pdf, other]

FAST survey of H I and OH absorption towards extragalactic radio sources

Authors: Yogesh Chandola, D. J. Saikia, Yin-Zhe Ma, Zheng Zheng, Chao-Wei Tsai, Di Li, Denis Tramonte, Hengxing Pan

Abstract: Neutral atomic hydrogen and molecular gas in the host galaxies of radio active galactic nuclei (AGN) can be traced using the H I 21-cm and OH 1667-MHz absorption lines to understand the fueling and feedback processes. We present the results of an H I and OH absorption survey with the Five-hundred-meter Aperture Spherical radio Telescope (FAST) towards 40 radio sources of low-intermediate radio luminosity ($\sim 10^{23}$-$10^{26}$ W Hz$^{-1}$ at 1.4 GHz), red mid-infrared color (W2[4.6 $\mu$m] $-$ W3[12 $\mu$m] $> 2.5$ mag), and redshift up to 0.35. From 13 sources with good data at H I observing frequencies, we report the detection of H I absorption towards 8 sources, 5 of which are new detections, including 4 in the redshift range 0.25 to 0.35. Our detection rates are consistent with our previous results, showing a dependence on the star-formation history of the host galaxy, as reflected in the mid-infrared WISE W2$-$W3 colors, and on the compactness of the radio source. We find no significant dependence of detection rates on radio luminosity or redshift. We also find that H I column densities are anti-correlated with the low-frequency spectral indices ($\alpha_{\rm 150\,MHz}^{\rm 1.4\,GHz}$, with $S_\nu \propto \nu^{-\alpha}$). We do not detect OH absorption in any of the 23 sources with good data at OH observing frequencies. However, by stacking the spectra we estimate the 3$\sigma$ upper limit of the OH column density to be $2.27\times10^{14}\,(T_{\rm ex}/10\,{\rm K})\,(1/f_{\rm c})$ cm$^{-2}$.
By stacking the OH spectra for 7 associated H I absorbers, we obtain a 3$\sigma$ upper limit of $3.47\times10^{14}\,(T_{\rm ex}/10\,{\rm K})\,(1/f_{\rm c})$ cm$^{-2}$ on the OH column density and of $1.78\times10^{-7}$ on the [OH]/[H I] ratio.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 20 pages, 8 figures, accepted for publication in The Astrophysical Journal (ApJ)
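
The column densities quoted above scale with the excitation temperature and covering factor because, for optically thin absorption, they are proportional to the velocity-integrated optical depth. The sketch below uses the commonly quoted conversions $N_{\rm HI} = 1.823\times10^{18}\,(T_{\rm s}/f_{\rm c})\int\tau\,dv$ and, for the OH 1667 MHz line, $N_{\rm OH} = 2.24\times10^{14}\,(T_{\rm ex}/f_{\rm c})\int\tau\,dv$ (in cm$^{-2}$, with $v$ in km s$^{-1}$), together with an assumed top-hat line spanning $n$ independent channels for the $3\sigma$ limit. The paper's exact limit-setting procedure, line width, and noise are not reproduced; all inputs are illustrative.

```python
import math

HI_CONST = 1.823e18     # cm^-2 (K km/s)^-1, H I 21-cm line
OH1667_CONST = 2.24e14  # cm^-2 (K km/s)^-1, OH 1667 MHz line (commonly quoted value)


def column_density(const, temperature_k, tau_int_kms, covering_factor=1.0):
    """Optically thin column density: const * (T / f_c) * integral of tau dv."""
    return const * temperature_k / covering_factor * tau_int_kms


def tau_int_upper_limit(rms_tau, channel_kms, n_channels, n_sigma=3.0):
    """n-sigma limit on the integral of tau dv for a top-hat line over n independent
    channels; the noise on the summed optical depth grows as sqrt(n)."""
    return n_sigma * rms_tau * channel_kms * math.sqrt(n_channels)


tau_lim = tau_int_upper_limit(rms_tau=0.002, channel_kms=1.6, n_channels=16)  # illustrative
n_oh_lim = column_density(OH1667_CONST, 10.0, tau_lim)  # T_ex = 10 K, f_c = 1
n_hi = column_density(HI_CONST, 100.0, 1.0)             # T_s = 100 K, f_c = 1 (illustrative)
print(f"N_OH < {n_oh_lim:.2e} cm^-2, [OH]/[HI] < {n_oh_lim / n_hi:.2e}")
```

Because both limits carry the factor $T_{\rm ex}/f_{\rm c}$, results are often quoted, as in the abstract, scaled to a reference $T_{\rm ex}$ of 10 K.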

arXiv:2406.18227 [pdf, other]

GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension

Authors: Jiafeng Liang, Shixin Jiang, Zekun Wang, Haojie Pan, Zerui Chen, Zheng Chu, Ming Liu, Ruiji Fu, Zhongyuan Wang, Bing Qin

Abstract: There are abundant instructional videos on the Internet that provide tutorials for completing various tasks. Existing instructional video datasets focus only on specific steps at the video level and lack experiential guidelines at the task level, which can leave beginners struggling to learn new tasks for lack of relevant experience. Moreover, specific steps without guidelines are trivial and unsystematic, making it difficult to provide a clear tutorial. To address these problems, we present the GUIDE (Guideline-Guided) dataset, which contains 3.5K videos of 560 instructional tasks in 8 domains related to daily life. Specifically, we annotate each instructional task with a guideline, representing a common pattern shared by all task-related videos. On this basis, we annotate systematic specific steps, including their associated guideline steps, specific step descriptions, and timestamps. Our proposed benchmark consists of three sub-tasks to evaluate the comprehension ability of models: (1) Step Captioning: models have to generate captions for specific steps from videos. (2) Guideline Summarization: models have to mine the common pattern in task-related videos and summarize a guideline from them. (3) Guideline-Guided Captioning: models have to generate captions for specific steps under the guidance of the guideline. We evaluate a wide range of foundation models on GUIDE and perform an in-depth analysis. Given the diversity and practicality of GUIDE, we believe that it can serve as a better benchmark for instructional video comprehension.

Submitted 26 June, 2024; originally announced June 2024.

Comments: IJCAI 2024

arXiv:2406.14756 [pdf, other]

SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions

Authors: Huitong Pan, Qi Zhang, Cornelia Caragea, Eduard Dragut, Longin Jan Latecki

Abstract: We present SciDMT, an enhanced and expanded corpus for scientific mention detection, offering a significant advancement over existing related resources. SciDMT contains annotated scientific documents for datasets (D), methods (M), and tasks (T). The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly annotated mentions in the form of in-text spans, and 2) an evaluation set, which comprises 100 scientific articles manually annotated for evaluation purposes. To the best of our knowledge, SciDMT is the largest corpus for scientific entity mention detection. The corpus's scale and diversity are instrumental in developing and refining models for tasks such as indexing scientific papers, enhancing information retrieval, and improving the accessibility of scientific knowledge. We demonstrate the corpus's utility through experiments with advanced deep learning architectures like SciBERT and GPT-3.5. Our findings establish performance baselines and highlight unresolved challenges in scientific mention detection. SciDMT serves as a robust benchmark for the research community, encouraging the development of innovative models to further the field of scientific information extraction.

Submitted 20 June, 2024; originally announced June 2024.

Comments: LREC/COLING 2024

ACM Class: I.2.7

Journal ref: LREC-COLING. (2024) 14407-14417
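
Mentions in SciDMT are annotated as in-text spans typed as dataset (D), method (M), or task (T). One common way to score predicted spans against a manually annotated evaluation set is exact-match precision, recall, and F1. The minimal sketch below assumes that protocol and a `(start, end, type)` span representation, both of which may differ from the paper's evaluation setup.

```python
def span_prf(gold, predicted):
    """Exact-match precision/recall/F1 over (start, end, type) mention spans."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # a hit needs identical boundaries and type
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = [(0, 5, "D"), (10, 15, "M"), (20, 25, "T")]
pred = [(0, 5, "D"), (10, 15, "T"), (30, 35, "M")]  # one hit, one type error, one spurious span
print(span_prf(gold, pred))
```

Relaxed variants (for example, counting overlapping spans as partial matches) are also used in mention detection; exact match is simply the strictest choice.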

arXiv:2406.12333 [pdf]

Permeability distribution of gas drainage of borehole with the different moisture content caused polar permeability effect

Authors: Lei Zhang, Yao Zhang, Hongyu Pan, Yan Cao, Yuhang Chu, Shihua Yang

Abstract: To study the permeability characteristics of regions with different water contents and stress distributions along the radial direction of a borehole after hydraulic measures, an improved LFTD1812 triaxial permeability meter was used to measure the polar permeability characteristics of coal samples with different combinations of water content, and the porosity, permeability, pressure gradient, and seepage velocity of the samples were analyzed. The relationships among porosity, permeability, pressure gradient, and seepage velocity were discussed, as was the influence of moisture content on permeability, and a directivity and polarization effect of permeability was identified. The results show that the relationship between permeability and porosity follows two trends, exponential and logarithmic, and that the porosity-permeability (φ-k) plane can be divided into three regions of influence: super-exponential (I), exponential (II), and logarithmic (III).

Submitted 18 June, 2024; originally announced June 2024.

Comments: 12 pages, 10 figures
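
The exponential and logarithmic porosity-permeability trends can be fitted with ordinary least squares after a change of variables, as in the minimal sketch below. The forms $k = a\,e^{b\varphi}$ and $k = a + b\ln\varphi$ are assumed generic versions of the two trends, and the synthetic data are illustrative; the paper's fitted coefficients and region boundaries are not given here.

```python
import numpy as np


def fit_exponential(phi, k):
    """Fit k = a * exp(b * phi) via a straight-line fit of ln k against phi."""
    b, ln_a = np.polyfit(phi, np.log(k), 1)  # polyfit returns [slope, intercept]
    return np.exp(ln_a), b


def fit_logarithmic(phi, k):
    """Fit k = a + b * ln(phi) via a straight-line fit of k against ln phi."""
    b, a = np.polyfit(np.log(phi), k, 1)
    return a, b


phi = np.linspace(0.04, 0.12, 9)   # illustrative porosities
k_exp = 0.02 * np.exp(40.0 * phi)  # synthetic exponential trend
k_log = 5.0 + 1.5 * np.log(phi)    # synthetic logarithmic trend
print(fit_exponential(phi, k_exp), fit_logarithmic(phi, k_log))
```

Fitting $\ln k$ linearly weights relative rather than absolute errors, which is often reasonable for permeability data spanning orders of magnitude but is a modeling choice, not something the abstract specifies.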

arXiv:2406.12225 [pdf, other]

The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge

Authors: Hongpeng Pan, Shifeng Yi, Shouwei Yang, Lei Qi, Bing Hu, Yi Xu, Yang Yang

Abstract: This report introduces an enhanced method for the Foundational Few-Shot Object Detection (FSOD) task, leveraging a vision-language model (VLM) for object detection. However, on specific datasets, the targets detected by the VLM may be misaligned with the target concepts of interest. This misalignment hinders the zero-shot performance of the VLM and the application of fine-tuning methods based on pseudo-labels. To address this issue, we propose the VLM+ framework, which integrates a multimodal large language model (MM-LLM). Specifically, we use the MM-LLM to generate a series of referential expressions for each category. Based on the VLM predictions and the given annotations, we select the best referential expression for each category by matching the maximum IoU. Subsequently, we use these referential expressions to generate pseudo-labels for all images in the training set and then combine them with the original labeled data to fine-tune the VLM. Additionally, we employ iterative pseudo-label generation and optimization to further enhance the performance of the VLM. Our approach achieves 32.56 mAP in the final test.

Submitted 17 June, 2024; originally announced June 2024.

Comments: CVPR2024 Foundational Few-Shot Object Detection Challenge
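
The VLM+ abstract selects, for each category, the referential expression whose VLM predictions best match the given annotations by maximum IoU. A minimal sketch of that selection step follows. How per-box IoUs are aggregated into one score per expression (here, the mean best-match IoU over annotated boxes) is an assumption, and the expressions and boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def expression_score(pred_boxes, gt_boxes):
    """Hypothetical score: mean, over annotated boxes, of the best IoU with any prediction."""
    if not gt_boxes:
        return 0.0
    best = [max((iou(g, p) for p in pred_boxes), default=0.0) for g in gt_boxes]
    return sum(best) / len(best)


def select_expression(predictions, gt_boxes):
    """Pick the expression whose VLM detections best match the annotations.
    `predictions` maps each candidate expression to the boxes detected for it."""
    return max(predictions, key=lambda e: expression_score(predictions[e], gt_boxes))


gt = [(0, 0, 10, 10)]
candidates = {  # hypothetical MM-LLM expressions and the VLM boxes each one produced
    "small red car": [(0, 0, 10, 10)],
    "vehicle": [(5, 5, 15, 15)],
    "truck": [],
}
print(select_expression(candidates, gt))  # small red car
```

The selected expression would then be used as the text prompt when generating pseudo-labels for the remaining training images.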

arXiv:2406.12025 [pdf, other]

A 260 pc resolution ALMA map of HCN(1-0) in the galaxy NGC 4321

Authors: Lukas Neumann, Frank Bigiel, Ashley T. Barnes, Molly J. Gallagher, Adam Leroy, Antonio Usero, Erik Rosolowsky, Ivana Bešlić, Médéric Boquien, Yixian Cao, Mélanie Chevance, Dario Colombo, Daniel A. Dale, Cosima Eibensteiner, Kathryn Grasha, Jonathan D. Henshaw, María J. Jiménez-Donaire, Sharon Meidt, Shyam H. Menon, Eric J. Murphy, Hsi-An Pan, Miguel Querejeta, Toshiki Saito, Eva Schinnerer, Sophia K. Stuber , et al. (2 additional authors not shown)

Abstract: The star formation rate (SFR) is tightly connected to the amount of dense gas in molecular clouds. However, it is not fully understood how the relationship between dense molecular gas and star formation varies within galaxies and in different morphological environments. In this work, we study dense gas and star formation in the nearby spiral galaxy NGC 4321 to test how the amount of dense gas and its ability to form stars vary with environmental properties at 260 pc scales. We present new ALMA observations of HCN(1-0) line emission. Combining these with existing CO(2-1) observations from ALMA, and with H-alpha from MUSE and F2100W from JWST to trace the SFR, we measure the HCN/CO line ratio, a proxy for the dense gas fraction, and SFR/HCN, a proxy for the star formation efficiency of the dense gas. Towards the centre of the galaxy, HCN/CO systematically increases while SFR/HCN decreases, but these ratios stay roughly constant throughout the disc. Spiral arms, interarm regions, and bar ends show similar HCN/CO and SFR/HCN. On the bar, SFR/HCN is significantly lower at a similar HCN/CO. We conclude that the centres of galaxies show the strongest environmental influence on dense gas and star formation, suggesting either that clouds couple strongly to the surrounding pressure or that HCN traces more of the bulk molecular gas that is less efficiently converted into stars.
In contrast, across the disc of NGC 4321, where the ISM pressure is typically low, SFR/HCN does not show large variations (< 0.3 dex), in agreement with Galactic observations of molecular clouds. Despite the large variations across environments and physical conditions, HCN/CO is a good predictor of the mean molecular gas surface density at 260 pc scales.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 18 pages, 9 figures, accepted for pub in A&A, Jun 13, 2024
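
The HCN/CO and SFR/HCN proxies are ratios that are usually compared in logarithmic (dex) units, as in the "< 0.3 dex" variation quoted above. The minimal sketch below computes per-sightline log ratios and a robust scatter (half the 16th-84th percentile range), an assumed statistic; the input values are made up, and converting line intensities or SFR tracers into physical units is outside its scope.

```python
import numpy as np


def log_ratio(numerator, denominator):
    """log10 of a line or SFR ratio, keeping only sightlines where both are positive."""
    num = np.asarray(numerator, float)
    den = np.asarray(denominator, float)
    good = (num > 0) & (den > 0)
    return np.log10(num[good] / den[good])


def spread_dex(log_values):
    """Robust scatter in dex: half the 16th-84th percentile range."""
    p16, p84 = np.percentile(log_values, [16, 84])
    return 0.5 * (p84 - p16)


hcn = [0.1, 1.0, 10.0, -0.2]  # illustrative intensities; the last is a noise dip
co = [10.0, 10.0, 10.0, 10.0]
r = log_ratio(hcn, co)        # log10(HCN/CO) for the valid sightlines
print(r, spread_dex(r))
```

SFR/HCN can be handled with the same two functions by passing SFR surface densities as the numerator and HCN intensities as the denominator.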

arXiv:2406.10907 [pdf, other]

SparseDet: A Simple and Effective Framework for Fully Sparse LiDAR-based 3D Object Detection

Authors: Lin Liu, Ziying Song, Qiming Xia, Feiyang Jia, Caiyan Jia, Lei Yang, Hongyu Pan

Abstract: LiDAR-based sparse 3D object detection plays a crucial role in autonomous driving applications due to its computational efficiency. Existing methods either use the features of a single central voxel as an object proxy or treat an aggregated cluster of foreground points as an object proxy. However, the former lacks the ability to aggregate contextual information, so the object proxies carry insufficient information. The latter relies on multi-stage pipelines and auxiliary tasks, which reduce the inference speed. To maintain the efficiency of the sparse framework while fully aggregating contextual information, in this work we propose SparseDet, which designs sparse queries as object proxies. It introduces two key modules, the Local Multi-scale Feature Aggregation (LMFA) module and the Global Feature Aggregation (GFA) module, which aim to fully capture contextual information and thereby enhance the ability of the proxies to represent objects. The LMFA module fuses features across different scales for sparse key voxels via coordinate transformations and nearest-neighbor relationships, capturing object-level details and local contextual information, while the GFA module uses self-attention mechanisms to selectively aggregate the features of the key voxels across the entire scene, capturing scene-level contextual information. Experiments on nuScenes and KITTI demonstrate the effectiveness of our method.
Specifically, on nuScenes, SparseDet surpasses the previous best sparse detector VoxelNeXt by 2.2% mAP at 13.5 FPS, and on KITTI, it surpasses VoxelNeXt by 1.12% AP$_{3D}$ on the hard difficulty level at 17.9 FPS.

Submitted 16 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.02702
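
The GFA module is described as aggregating key-voxel features across the whole scene with self-attention. For reference, a generic single-head scaled dot-product self-attention is sketched below in NumPy; the feature sizes and random projection matrices are assumptions, and this is not SparseDet's implementation, which the abstract does not specify.

```python
import numpy as np


def self_attention(feats, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over N key-voxel features."""
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))  # 6 key voxels with 8-dim features (hypothetical sizes)
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, attn = self_attention(feats, w_q, w_k, w_v)
print(out.shape, attn.sum(axis=1))
```

Each output row is a weighted mixture of all voxels' value vectors, which is what lets every key voxel draw on scene-level context regardless of spatial distance.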

arXiv:2406.02907 [pdf]

doi 10.1002/inf2.12504

Room-temperature tunable tunneling magnetoresistance in Fe3GaTe2/WSe2/Fe3GaTe2 van der Waals heterostructures

Authors: Haiyang Pan, Anil Kumar Singh, Chusheng Zhang, Xueqi Hu, Jiayu Shi, Liheng An, Naizhou Wang, Ruihuan Duan, Zheng Liu, Stuart S. P. Parkin, Pritam Deb, Weibo Gao

Abstract: The exceptional properties of two-dimensional (2D) magnetic materials offer a novel approach to fabricating functional magnetic tunnel junctions (MTJs) by constructing full van der Waals (vdW) heterostructures with atomically sharp and clean interfaces. The exploration of vdW MTJ devices with high working temperatures and adjustable functionalities holds great potential for advancing the application of 2D materials in magnetic sensing and data storage. Here, we report the observation of highly tunable room-temperature tunneling magnetoresistance through electronic means in a full vdW Fe3GaTe2/WSe2/Fe3GaTe2 MTJ. The spin valve effect of the MTJ can be detected even at currents below 1 nA, at both low and room temperatures, yielding a tunneling magnetoresistance (TMR) of 340% at 2 K and 50% at 300 K. Importantly, the magnitude and sign of the TMR can be modulated by a DC bias current, even at room temperature, a capability that was previously unrealized in full vdW MTJs. This tunable TMR arises from the contribution of energy-dependent localized spin states in the metallic ferromagnet Fe3GaTe2 during tunnel transport when a finite electrical bias is applied. Our work offers a new perspective for designing and exploring room-temperature tunable spintronic devices based on vdW magnet heterostructures.

Submitted 5 June, 2024; originally announced June 2024.

Journal ref: InfoMat.2023;e12504

arXiv:2406.02783

High-resolution Observation of Blowout Jets Regulated by Sunspot Rotation

Authors: Tingyu Gou, Rui Liu, Yang Su, Astrid M. Veronig, Hanya Pan, Runbin Luo, Weiqun Gan

Abstract: Coronal jets are believed to be the miniature version of large-scale solar eruptions. In particular, the eruption of a mini-filament inside the base arch is suggested to be the trigger and even driver of blowout jets. Here we propose an alternative triggering mechanism, based on high-resolution H-alpha observations of a blowout jet associated with a mini-filament and an M1.2-class flare. The mini-filament remains largely stationary during the blowout jet, except that it is straddled by flare loops connecting two flare ribbons, indicating that the magnetic arcade embedding the mini-filament has been torn into two parts, with the upper part escaping with the blowout jet. In the wake of the flare, the southern end of the mini-filament fans out like neighboring fibrils, indicative of mass and field exchanges between the mini-filament and the fibrils. The blowout jet is preceded by a standard jet. With H-alpha fibrils moving toward the single-strand spire in a sweeping fashion, the standard jet transitions to the blowout jet. A similar pattern of standard-to-blowout jet transition occurs in an earlier C-class flare before the mini-filament forms. The spiraling morphology and sweeping direction of these fibrils are suggestive of their footpoints being dragged by the leading sunspot, which undergoes clockwise rotation for over two days. Soon after the sunspot rotation reaches a peak angular speed as fast as 10 deg/hr, the dormant active region becomes flare-productive, and the mini-filament forms through the interaction of moving magnetic features from the rotating sunspot with satellite spots/pores. Hence, we suggest that the sunspot rotation plays a key role in building up free energy for flares and jets and in triggering blowout jets by inducing sweeping motions of fibrils.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 16 pages, 10 figures, accepted in Solar Physics

arXiv:2406.01900

Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

Authors: Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng Chen

Abstract: We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which animates a reference portrait with target landmark sequences. The main challenge of portrait animation is to preserve the identity of the reference portrait and transfer the target expression to this portrait while maintaining temporal consistency and fidelity. To address these challenges, Follow-Your-Emoji equips the powerful Stable Diffusion model with two well-designed techniques. Specifically, we first adopt a new explicit motion signal, namely expression-aware landmark, to guide the animation process. We find that this landmark not only ensures accurate motion alignment between the reference portrait and target motion during inference but also improves the ability to portray exaggerated expressions (e.g., large pupil movements) and avoids identity leakage. Then, we propose a facial fine-grained loss that uses both expression and facial masks to improve the model's perception of subtle expressions and its reconstruction of the reference portrait's appearance. Accordingly, our method performs strongly in controlling the expression of freestyle portraits, including real humans, cartoons, sculptures, and even animals. By leveraging a simple and effective progressive generation strategy, we extend our model to stable long-term animation, thus increasing its potential application value. To address the lack of a benchmark for this field, we introduce EmojiBench, a comprehensive benchmark comprising diverse portrait images, driving videos, and landmarks. We show extensive evaluations on EmojiBench to verify the superiority of Follow-Your-Emoji.
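The abstract describes the facial fine-grained loss only at a high level. The sketch below shows one plausible reading under stated assumptions: a plain pixel-wise MSE plus additional MSE terms averaged over a face mask and an expression mask, so that errors inside those regions are penalized more heavily. The function name, the weights `w_face` and `w_expr`, and the choice of MSE are all hypothetical; the paper's actual loss, and whether it operates on latents or pixels, is not given here.

```python
import numpy as np


def masked_fine_grained_loss(pred, target, face_mask, expr_mask,
                             w_face=1.0, w_expr=1.0):
    """Global MSE plus extra MSE terms restricted to face and expression regions.

    pred, target: arrays of the same shape, e.g. (H, W) or (C, H, W).
    face_mask, expr_mask: binary arrays broadcastable to pred.shape.
    w_face, w_expr: hypothetical weights for the two region terms.
    """
    sq_err = (np.asarray(pred, float) - np.asarray(target, float)) ** 2

    def region_mse(mask):
        # Average squared error over the pixels selected by the mask.
        weight = np.broadcast_to(np.asarray(mask, float), sq_err.shape)
        total = weight.sum()
        return float((sq_err * weight).sum() / total) if total > 0 else 0.0

    return (float(sq_err.mean())
            + w_face * region_mse(face_mask)
            + w_expr * region_mse(expr_mask))


target = np.zeros((4, 4))
face = np.zeros((4, 4)); face[1:3, :] = 1      # 8-pixel face region
expr = np.zeros((4, 4)); expr[1, 1:3] = 1      # 2-pixel expression region
err_in_expr = target.copy(); err_in_expr[1, 1] = 1.0   # error inside both masks
err_outside = target.copy(); err_outside[3, 3] = 1.0   # same error outside them
print(masked_fine_grained_loss(err_in_expr, target, face, expr))  # 0.6875
print(masked_fine_grained_loss(err_outside, target, face, expr))  # 0.0625
```

The same unit error costs eleven times more inside the expression region than outside both masks, which is the kind of emphasis on subtle facial detail the abstract motivates.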

Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: Project Page: https://follow-your-emoji.github.io/

arXiv:2405.18530

doi 10.18429/JACoW-IPAC2024-THYN1

First results of AUP Nb3Sn quadrupole horizontal tests

Authors: M. Baldini, G. Ambrosio, G. Apollinari, J. Blowers, R. Bossert, R. Carcagno, G. Chlachidze, J. DiMarco, S. Feher, S. Krave, V. Lombardo, L. Martin, C. Narug, T. H. Nicol, V. Nikolic, A. Nobrega, V. Marinozzi, C. Orozco, T. Page, S. Stoynev, T. Strauss, M. Turenne, D. Turrioni, A. Vouris, M. Yu, et al. (26 additional authors not shown)

Abstract: The Large Hadron Collider will soon undergo an upgrade to increase its luminosity by a factor of ~10 [1]. A crucial part of this upgrade will be the replacement of the NbTi focusing magnets with Nb3Sn magnets that achieve a ~50% increase in field strength. This will be the first-ever large-scale implementation of Nb3Sn magnets in a particle accelerator. The High-Luminosity LHC Upgrade (HL-LHC) is a CERN project with a worldwide collaboration. It is under construction and uses Nb3Sn magnets (named MQXF) as key ingredients to increase tenfold the integrated luminosity delivered to the CMS and ATLAS experiments in the next decade. The HL-LHC AUP is the US effort to contribute approximately 50% of the low-beta focusing magnets and crab cavities for the HL-LHC. This paper presents the program to fabricate the Nb3Sn superconducting magnets. We report the status of the HL-LHC AUP project and present the results from horizontal tests of the first fully assembled cryo-assembly.

Submitted 28 May, 2024; originally announced May 2024.

Comments: IPAC'24 - 15th International Particle Accelerator Conference

Report number: FERMILAB-CONF-24-0273-TD

Journal ref: JACoW IPAC2024 (2024) THYN1

arXiv:2405.17004

Efficient Visual Fault Detection for Freight Train via Neural Architecture Search with Data Volume Robustness

Authors: Yang Zhang, Mingying Li, Huilin Pan, Moyun Liu, Yang Zhou

Abstract: Deep learning-based fault detection methods have achieved significant success. In visual fault detection of freight trains, components of different classes differ greatly in scale (scale variance), whereas components of the same class do not, which calls for scale-aware detectors. Moreover, the design of task-specific networks heavily relies on human expertise. As a consequence, neural architecture search (NAS), which automates the model design process, has gained considerable attention because of its promising performance. However, NAS is computationally intensive due to the large search space and huge data volume. In this work, we propose an efficient NAS-based framework for visual fault detection of freight trains that searches for a task-specific detection head with multi-scale representation capacity. First, we design a scale-aware search space for discovering an effective receptive field in the head. Second, we explore robustness to data volume to reduce search costs based on the specifically designed search space, and we propose a novel sharing strategy to reduce memory and further improve search efficiency. Extensive experimental results demonstrate the effectiveness of our method with data volume robustness, which achieves 46.8 and 47.9 mAP on the Bottom View and Side View datasets, respectively. Our framework outperforms the state-of-the-art approaches and linearly decreases the search costs with reduced data volumes.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 11 pages, 8 figures

arXiv:2405.16873

ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection

Authors: Ziying Song, Feiyang Jia, Hongyu Pan, Yadan Luo, Caiyan Jia, Guoxin Zhang, Lin Liu, Yang Ji, Lei Yang, Li Wang

Abstract: In the field of 3D object detection tasks, fusing heterogeneous features from LiDAR and camera sensors into a unified Bird's Eye View (BEV) representation is a widely adopted paradigm. However, existing methods are often compromised by imprecise sensor calibration, resulting in feature misalignment in LiDAR-camera BEV fusion. Moreover, such inaccuracies result in errors in depth estimation for the camera branch, ultimately causing misalignment between LiDAR and camera BEV features. In this work, we propose a novel ContrastAlign approach that utilizes contrastive learning to enhance the alignment of heterogeneous modalities, thereby improving the robustness of the fusion process. Specifically, our approach includes the L-Instance module, which directly outputs LiDAR instance features within LiDAR BEV features. Then, we introduce the C-Instance module, which predicts camera instance features through RoI (Region of Interest) pooling on the camera BEV features. We propose the InstanceFusion module, which utilizes contrastive learning to generate similar instance features across heterogeneous modalities. We then use graph matching to calculate the similarity between the neighboring camera instance features and the similarity instance features to complete the alignment of instance features. Our method achieves state-of-the-art performance, with an mAP of 70.3%, surpassing BEVFusion by 1.8% on the nuScenes validation set. Importantly, our method outperforms BEVFusion by 7.3% under conditions with misalignment noise.
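ContrastAlign's exact InstanceFusion loss is not given in the abstract. As a generic illustration of how contrastive learning pulls matching cross-modal instance features together, here is a minimal NumPy sketch of the standard symmetric InfoNCE loss, assuming LiDAR and camera instance features have already been paired row by row. The function name, the temperature, and the toy data are assumptions, not the paper's implementation.

```python
import numpy as np


def info_nce(lidar_feats, camera_feats, temperature=0.07):
    """Symmetric InfoNCE loss for paired instance features.

    lidar_feats, camera_feats: (N, D) arrays where row i of each array is
    assumed to describe the same object instance (the positive pair); every
    other row acts as a negative.
    """
    a = lidar_feats / np.linalg.norm(lidar_feats, axis=1, keepdims=True)
    b = camera_feats / np.linalg.norm(camera_feats, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (N, N) scaled cosine similarities

    def diag_cross_entropy(l):
        # Row-wise softmax cross-entropy with the diagonal entry as the target.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the LiDAR->camera and camera->LiDAR directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


rng = np.random.default_rng(0)
lidar = rng.normal(size=(8, 16))
camera_aligned = lidar + 0.05 * rng.normal(size=(8, 16))  # well-aligned pairs
camera_misaligned = np.roll(camera_aligned, 1, axis=0)     # pairs shifted by one row
print(info_nce(lidar, camera_aligned), info_nce(lidar, camera_misaligned))
```

The loss is low when each LiDAR instance is most similar to its own camera counterpart and high when the pairing is scrambled, which is the pressure a contrastive alignment module relies on.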

Submitted 5 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.15769

FastDrag: Manipulate Anything in One Step

Authors: Xuanjia Zhao, Jian Guan, Congyi Fan, Dongli Xu, Youtian Lin, Haiwei Pan, Pengming Feng

Abstract: Drag-based image editing using generative models provides precise control over image contents, enabling users to manipulate anything in an image with a few clicks. However, prevailing methods typically adopt $n$-step iterations for latent semantic optimization to achieve drag-based image editing, which is time-consuming and limits practical applications. In this paper, we introduce a novel one-step drag-based image editing method, FastDrag, to accelerate the editing process. Central to our approach is a latent warpage function (LWF), which simulates the behavior of a stretched material to adjust the location of individual pixels within the latent space. This innovation achieves one-step latent semantic optimization and hence significantly increases editing speed. Meanwhile, null regions emerging after applying LWF are addressed by our proposed bilateral nearest neighbor interpolation (BNNI) strategy. This strategy interpolates these regions using similar features from neighboring areas, thus enhancing semantic integrity. Additionally, a consistency-preserving strategy is introduced to maintain the consistency between the edited and original images by adopting semantic information from the original image, saved as key and value pairs in the self-attention module during diffusion inversion, to guide the diffusion sampling. Our FastDrag is validated on the DragBench dataset, demonstrating substantial improvements in processing time over existing methods, while achieving enhanced editing performance. Project page: https://fastdrag-site.github.io/ .

Submitted 6 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 13 pages, 13 figures, Project page: https://fastdrag-site.github.io/

arXiv:2405.15188

PS-CAD: Local Geometry Guidance via Prompting and Selection for CAD Reconstruction

Authors: Bingchen Yang, Haiyong Jiang, Hao Pan, Peter Wonka, Jun Xiao, Guosheng Lin

Abstract: Reverse engineering CAD models from raw geometry is a classic but challenging research problem. In particular, reconstructing the CAD modeling sequence from point clouds provides great interpretability and convenience for editing. To address this problem, we introduce geometric guidance into the reconstruction network. Our proposed model, PS-CAD, reconstructs the CAD modeling sequence one step at a time. At each step, we provide two forms of geometric guidance. First, we provide, as a point cloud, the geometry of surfaces where the current reconstruction differs from the complete model. This helps the framework to focus on regions that still need work. Second, we use geometric analysis to extract a set of planar prompts that correspond to candidate surfaces where a CAD extrusion step could be started. Our framework has three major components. Geometric guidance computation extracts the two types of geometric guidance. Single-step reconstruction computes a single candidate CAD modeling step for each provided prompt. Single-step selection selects among the candidate CAD modeling steps. The process continues until the reconstruction is completed. Our quantitative results show a significant improvement across all metrics. For example, on the DeepCAD dataset, PS-CAD improves upon the best published SOTA method by reducing the geometry errors (CD and HD) by 10%, and the structural error (ECD metric) by about 15%.
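The geometry errors quoted above (CD and HD) are distances between point sets sampled from the reconstructed and ground-truth shapes. A minimal sketch, assuming plain (non-squared) Euclidean distances and a mean-based Chamfer distance; published CAD benchmarks vary in these conventions (squared distances, sums versus means, sampling density), so values from this sketch are not directly comparable to the paper's numbers. The function names and toy point sets are hypothetical.

```python
import numpy as np


def _pairwise_dists(p, q):
    # (N, M) Euclidean distances between the rows of p (N, 3) and q (M, 3).
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)


def chamfer_distance(p, q):
    # Mean nearest-neighbour distance in both directions, summed.
    d = _pairwise_dists(p, q)
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def hausdorff_distance(p, q):
    # Worst-case nearest-neighbour distance over both directions.
    d = _pairwise_dists(p, q)
    return max(d.min(axis=1).max(), d.min(axis=0).max())


reconstruction = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
ground_truth = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
print(chamfer_distance(reconstruction, ground_truth))    # 0.666...
print(hausdorff_distance(reconstruction, ground_truth))  # 2.0
```

Chamfer distance averages the error, while Hausdorff distance reports the single worst miss (here the ground-truth point at x = 3 that the reconstruction lacks), which is why papers usually report both.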

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14936

Local and nonlocal stochastic control of quantum chaos: Measurement- and control-induced criticality

Authors: Haining Pan, Sriram Ganeshan, Thomas Iadecola, Justin H. Wilson, J. H. Pixley

Abstract: We theoretically study the topology of the phase diagram of a family of quantum models inspired by the classical Bernoulli map under stochastic control. The quantum models inherit a control-induced phase transition from the classical model and also manifest an entanglement phase transition intrinsic to the quantum setting. This measurement-induced phase transition has been shown in various settings to either coincide with or split off from the control transition, but a systematic understanding of the necessary and sufficient conditions for the two transitions to coincide in this case has so far been lacking. In this work, we generalize the control map to allow for either local or global control action. While this does not affect the classical aspects of the control transition, which is described by a random walk, it significantly influences the quantum dynamics, making the universality class of the measurement-induced transition depend on the locality of the control operation. In the presence of a global control map, the two transitions coincide and the control-induced phase transition dominates the measurement-induced phase transition. By contrast, the two transitions split in the presence of the local control map or additional projective measurements and generically take on distinct universality classes. For local control, the measurement-induced phase transition recovers the Haar logarithmic conformal field theory universality class found in feedback-free models. However, for global control, a novel universality class with correlation length exponent $\nu \approx 0.7$ emerges from the interplay of control and projective measurements. This work provides a more refined understanding of the relationship between the control- and measurement-induced phase transitions.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 16 pages, 10 figures

arXiv:2405.13901

DCT-Based Decorrelated Attention for Vision Transformers

Authors: Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Koushik Biswas, Ahmet Enis Cetin, Ulas Bagci

Abstract: Central to the effectiveness of Transformer architectures is the self-attention mechanism, a function that maps queries, keys, and values into a high-dimensional vector space. However, training the attention weights of queries, keys, and values from random initialization is non-trivial. In this paper, we propose two methods. (i) We first address the initialization problem of Vision Transformers by introducing a simple yet highly innovative initialization approach utilizing Discrete Cosine Transform (DCT) coefficients. Our proposed DCT-based attention initialization yields a significant gain over traditional initialization strategies, offering a robust foundation for the attention mechanism. Our experiments reveal that the DCT-based initialization enhances the accuracy of Vision Transformers in classification tasks. (ii) We also recognize that since DCT effectively decorrelates image information in the frequency domain, this decorrelation is useful for compression because it allows the quantization step to discard many of the higher-frequency components. Based on this observation, we propose a novel DCT-based compression technique for the attention function of Vision Transformers. Since high-frequency DCT coefficients usually correspond to noise, we truncate the high-frequency DCT components of the input patches. Our DCT-based compression reduces the size of the weight matrices for queries, keys, and values. While maintaining the same level of accuracy, our DCT-compressed Swin Transformers obtain a considerable decrease in computational overhead.
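To make the truncation step concrete, here is a minimal NumPy sketch, assuming square patches, the orthonormal 2-D DCT-II, and that "truncating high-frequency components" means keeping only the top-left `keep` x `keep` block of coefficients. The function names and the `keep` parameter are hypothetical, and how the paper then shrinks the query/key/value weight matrices is not shown.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C: for a length-n signal x, C @ x gives its DCT."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different scale
    return c


def truncate_patch_dct(patch, keep):
    """2-D DCT of a square patch, keeping only the keep x keep lowest frequencies."""
    n = patch.shape[0]
    c = dct_matrix(n)
    coeffs = c @ patch @ c.T      # separable 2-D DCT-II (rows, then columns)
    return coeffs[:keep, :keep]   # drop the high-frequency rows and columns


patch = np.ones((8, 8))            # a flat patch has only a DC component
low = truncate_patch_dct(patch, 4)
print(low.shape)                   # (4, 4): 64 values compressed to 16
```

Because the transform is orthonormal, nothing is lost until the slice; the compression comes entirely from discarding the high-frequency block, which downstream layers then no longer need weights for.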

Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13729

ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models

Authors: Rui Xu, Jiepeng Wang, Hao Pan, Yang Liu, Xin Tong, Shiqing Xin, Changhe Tu, Taku Komura, Wenping Wang

Abstract: In this paper, we study an under-explored but important factor of diffusion generative models, i.e., combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, there are additional attributes which are combined to associate with data samples. We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by existing training schemes of diffusion generative models, causing degraded test-time performance. We present a simple fix to this problem by constructing stochastic processes that fully exploit the combinatorial structures, hence the name ComboStoc. Using this simple strategy, we show that network training is significantly accelerated across diverse data modalities, including images and 3D structured shapes. Moreover, ComboStoc enables a new way of test-time generation which uses unsynchronized time steps for different dimensions and attributes, thus allowing for varying degrees of control over them.
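The contrast between synchronized and unsynchronized time steps can be illustrated with a toy noising path. A minimal sketch, assuming a flow-matching-style linear interpolation between noise and data (the abstract does not specify ComboStoc's stochastic process or network parameterization); all names are hypothetical. The only change from standard training in this sketch is the shape of `t`.

```python
import numpy as np


def interpolate(x_noise, x_data, t):
    """Linear path from noise (t = 0) to data (t = 1).

    t may be a scalar, one value per sample (shape (B, 1)), or one value per
    dimension of every sample (shape (B, D)); NumPy broadcasting handles all three.
    """
    return (1.0 - t) * x_noise + t * x_data


rng = np.random.default_rng(0)
batch, dims = 4, 6
x_data = rng.normal(size=(batch, dims))
x_noise = rng.normal(size=(batch, dims))

# Synchronized: one time step per sample, shared by every dimension.
t_sync = rng.uniform(size=(batch, 1))
# Unsynchronized: an independent time step for each dimension of each sample,
# so training also visits states where some dimensions are nearly clean while
# others are still mostly noise.
t_async = rng.uniform(size=(batch, dims))

x_sync = interpolate(x_noise, x_data, t_sync)
x_async = interpolate(x_noise, x_data, t_async)
print(x_sync.shape, x_async.shape)  # (4, 6) (4, 6)
```

With synchronized steps, a sample in which one attribute is already fixed while another is still noisy never appears during training; per-dimension steps cover those mixed states, which is also what makes partial control at test time possible.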

Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.12367

Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning

Authors: Zheyuan Zhang, Elif Keles, Gorkem Durak, Yavuz Taktak, Onkar Susladkar, Vandan Gorade, Debesh Jha, Asli C. Ormeci, Alpay Medetalibeyoglu, Lanhong Yao, Bin Wang, Ilkin Sevgi Isler, Linkai Peng, Hongyi Pan, Camila Lopes Vendrami, Amir Borhani, Yury Velichko, Boqing Gong, Concetto Spampinato, Ayis Pyrros, Pallavi Tiwari, Derk C. F. Klatte, Megan Engels, Sanne Hoogenboom, Candice W. Bolan, et al. (13 additional authors not shown)

Abstract: Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective study, we collected a large dataset (767 scans from 499 participants) of T1-weighted (T1W) and T2-weighted (T2W) abdominal MRI series from five centers between March 2004 and November 2022. We also collected CT scans of 1,350 patients from publicly available sources for benchmarking purposes. We developed a new pancreas segmentation method, called PanSegNet, combining the strengths of nnUNet and a Transformer network with a new linear attention module enabling volumetric computation. We tested PanSegNet's accuracy in cross-modality (a total of 2,117 scans) and cross-center settings with Dice and Hausdorff distance (HD95) evaluation metrics. We used Cohen's kappa statistics for intra and inter-rater agreement evaluation and paired t-tests for volume and Dice comparisons, respectively. For segmentation accuracy, we achieved Dice coefficients of 88.3% (std: 7.2%, at case level) with CT, 85.0% (std: 7.9%) with T1W MRI, and 86.3% (std: 6.4%) with T2W MRI. There was a high correlation for pancreas volume prediction with R^2 of 0.91, 0.84, and 0.85 for CT, T1W, and T2W, respectively. We found moderate inter-observer (0.624 and 0.638 for T1W and T2W MRI, respectively) and high intra-observer agreement scores. All MRI data is made available at https://osf.io/kysnj/. Our source code is available at https://github.com/NUBagciLab/PaNSegNet.
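The Dice scores above measure voxel overlap between predicted and reference masks. A minimal sketch of the per-case computation, assuming binary masks; the handling of two empty masks is a convention chosen here, not something stated in the abstract, and the function name is hypothetical. Case-level statistics such as the reported mean and standard deviation would be computed over the per-scan values.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice = 2 |A ∩ B| / (|A| + |B|) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: scored as perfect agreement (a convention)
    return 2.0 * np.logical_and(pred, truth).sum() / denom


# Toy 1-D "volumes": 2 overlapping voxels, 3 voxels in each mask.
print(dice_coefficient([1, 1, 1, 0], [0, 1, 1, 1]))  # 0.666...
```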

Submitted 25 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: under review version

arXiv:2405.09056

CTS: A Consistency-Based Medical Image Segmentation Model

Authors: Kejia Zhang, Lan Zhang, Haiwei Pan, Baolong Yu

Abstract: In medical image segmentation tasks, diffusion models have shown significant potential. However, mainstream diffusion models require many sampling steps, which makes prediction slow. Recently, consistency models, as a standalone generative network, have resolved this issue. Compared to diffusion models, consistency models can reduce sampling to a single step, not only achieving similar generative effects but also significantly speeding up training and prediction. However, they are not directly suited to image segmentation tasks, and their application in the medical imaging field has not yet been explored. Therefore, this paper applies the consistency model to medical image segmentation tasks, designing multi-scale feature signal supervision modes and loss function guidance to achieve model convergence. Experiments verify that the CTS model can obtain better medical image segmentation results with a single sampling step during the test phase.
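For context on why a consistency model needs only one sampling step: in the original consistency-model formulation (Song et al., 2023), the network output is wrapped in a skip connection that forces f(x, ε) = x, so a single evaluation at the largest time maps noise directly to a clean estimate. A minimal sketch of that parameterization follows; `SIGMA_DATA`, `EPS`, and the helper names are assumptions, and for segmentation CTS would presumably also condition the network on the input image, which the abstract does not detail.

```python
import math

SIGMA_DATA = 0.5  # assumed standard deviation of the (normalized) data
EPS = 0.002       # assumed smallest diffusion time


def c_skip(t):
    return SIGMA_DATA ** 2 / ((t - EPS) ** 2 + SIGMA_DATA ** 2)


def c_out(t):
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA ** 2 + t ** 2)


def consistency_fn(x_t, t, network):
    """f(x_t, t) = c_skip(t) * x_t + c_out(t) * F(x_t, t).

    At t = EPS, c_skip = 1 and c_out = 0, so f returns x_t unchanged: the
    boundary condition that lets a single call map noise to a clean sample.
    """
    return c_skip(t) * x_t + c_out(t) * network(x_t, t)


def one_step_sample(x_T, T, network):
    # Single-step generation: evaluate the consistency function once at time T.
    return consistency_fn(x_T, T, network)


def dummy_network(x, t):
    # Stand-in for a trained network F; always predicts zero.
    return 0.0


print(consistency_fn(3.0, EPS, dummy_network))  # 3.0 (boundary condition)
```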

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.08589

Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets

Authors: Wei Lian, Zhesen Cui, Fei Ma, Hang Pan, Wangmeng Zuo

Abstract: In many applications, the demand arises for algorithms capable of aligning partially overlapping point sets while remaining invariant to the corresponding transformations. This research presents a method designed to meet such requirements through minimization of the objective function of the robust point matching (RPM) algorithm. First, we show that the RPM objective is a cubic polynomial. Then, through variable substitution, we transform the RPM objective to a quadratic function. Leveraging the convex envelope of bilinear monomials, we proceed to relax the resulting objective function, thus obtaining a lower bound problem that can be conveniently decomposed into distinct linear assignment and low-dimensional convex quadratic program components, both amenable to efficient optimization. Furthermore, a branch-and-bound (BnB) algorithm is devised, which solely branches over the transformation parameters, thereby boosting convergence rate. Empirical evaluations demonstrate better robustness of the proposed methodology against non-rigid deformation, positional noise, and outliers, particularly in scenarios where outliers remain distinct from inliers, when compared with prevailing state-of-the-art approaches.
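The lower-bounding step rests on a standard fact: over a box, the convex envelope of a single bilinear monomial xy is the maximum of two supporting planes (the McCormick under-estimators). The sketch below shows only that building block, not the paper's full relaxation of the RPM objective. It also illustrates why branching helps: halving the box in each variable shrinks the gap at the box centre by a factor of four. The function and variable names are hypothetical.

```python
def bilinear_lower_envelope(x, y, x_lo, x_hi, y_lo, y_hi):
    """Convex envelope of w = x * y over the box [x_lo, x_hi] x [y_lo, y_hi].

    It is the pointwise maximum of two supporting planes (the McCormick
    under-estimators); it never exceeds x * y and matches it at the corners.
    """
    plane_lo = x_lo * y + x * y_lo - x_lo * y_lo
    plane_hi = x_hi * y + x * y_hi - x_hi * y_hi
    return max(plane_lo, plane_hi)


# Gap between x*y and its envelope at the box centre, before and after branching.
gap_full = 0.5 * 0.5 - bilinear_lower_envelope(0.5, 0.5, 0.0, 1.0, 0.0, 1.0)
gap_sub = 0.75 * 0.75 - bilinear_lower_envelope(0.75, 0.75, 0.5, 1.0, 0.5, 1.0)
print(gap_full, gap_sub)  # 0.25 0.0625
```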

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.07303

Search for solar axions by Primakoff effect with the full dataset of the CDEX-1B Experiment

Authors: L. T. Yang, S. K. Liu, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, J. R. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar, et al. (61 additional authors not shown)

Abstract: We present the first limit on the $g_{A\gamma}$ coupling constant using the Bragg-Primakoff conversion based on an exposure of 1107.5 kg days of data from the CDEX-1B experiment at the China Jinping Underground Laboratory. The data are consistent with the null signal hypothesis, and no excess signals are observed. Limits of the coupling $g_{A\gamma}<2.08\times10^{-9}$ GeV$^{-1}$ (95\% C.L.) are derived for axions with mass up to 100 eV/$c^2$. Within the hadronic model of KSVZ, our results exclude axion mass $>5.3~\rm{eV}/c^2$ at 95\% C.L.

Submitted 12 May, 2024; originally announced May 2024.

Comments: 7 pages, 5 figures

arXiv:2405.06166

MDNet: Multi-Decoder Network for Abdominal CT Organs Segmentation

Authors: Debesh Jha, Nikhil Kumar Tomar, Koushik Biswas, Gorkem Durak, Matthew Antalek, Zheyuan Zhang, Bin Wang, Md Mostafijur Rahman, Hongyi Pan, Alpay Medetalibeyoglu, Yury Velichko, Daniela Ladner, Amir Borhani, Ulas Bagci

Abstract: Accurate segmentation of organs from abdominal CT scans is essential for clinical applications such as diagnosis, treatment planning, and patient monitoring. To handle the challenges posed by heterogeneity in organ shapes and sizes and by complex anatomical relationships, we propose a Multi-Decoder Network (MDNet), an encoder-decoder network that uses the pre-trained MiT-B2 as the encoder and multiple different decoder networks. Each decoder network is connected to a different part of the encoder via a multi-scale feature enhancement dilated block. With each decoder, we increase the depth of the network iteratively and refine segmentation masks, enriching feature maps by integrating previous decoders' feature maps. To refine the feature map further, we also pass the predicted mask from the previous decoder to the current decoder to provide spatial attention across foreground and background regions. MDNet effectively refines the segmentation mask, achieving high Dice similarity coefficients (DSC) of 0.9013 and 0.9169 on the Liver Tumor Segmentation (LiTS) and MSD Spleen datasets, respectively. Additionally, it reduces the Hausdorff distance (HD) to 3.79 for the LiTS dataset and 2.26 for the spleen segmentation dataset, underscoring the precision of MDNet in capturing complex contours. Moreover, MDNet is more interpretable and robust than the other baseline models.
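The abstract says only that the previous decoder's predicted mask provides spatial attention across foreground and background regions. One common way to realize this, shown here as a hypothetical NumPy sketch, is to weight the current decoder's features by the sigmoid of the previous mask and by its complement, then concatenate the two. The function names and the concatenation are assumptions, not MDNet's documented design.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mask_guided_attention(features, prev_mask_logits):
    """Weight decoder features by the previous decoder's mask prediction.

    features: (C, H, W) feature map of the current decoder.
    prev_mask_logits: (H, W) mask logits from the previous decoder.
    Returns a (2C, H, W) array: a foreground-weighted copy of the features
    stacked on a background-weighted copy.
    """
    fg = sigmoid(prev_mask_logits)[None, :, :]  # foreground probability per pixel
    bg = 1.0 - fg                                # complementary background weight
    return np.concatenate([features * fg, features * bg], axis=0)


rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 32, 32))
logits = rng.normal(size=(32, 32))
out = mask_guided_attention(feats, logits)
print(out.shape)  # (32, 32, 32)
```

Because the two weights sum to one at every pixel, the two halves add back to the original features; the next layer can learn how much to emphasize organ versus surrounding tissue.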

Submitted 9 May, 2024; originally announced May 2024.
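The MDNet abstract above reports two standard segmentation metrics, DSC and HD. A minimal sketch of both, assuming binary masks are represented as sets of (row, col) pixel coordinates; the function names are illustrative, and this is the plain symmetric Hausdorff distance in pixel units (the abstract does not say which HD variant or physical spacing the authors used).

```python
from math import dist

Pixel = tuple[int, int]  # (row, col) coordinate of a foreground pixel


def dice(pred: set[Pixel], gt: set[Pixel]) -> float:
    """Dice similarity coefficient 2|P ∩ G| / (|P| + |G|)."""
    if not pred and not gt:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(pred & gt) / (len(pred) + len(gt))


def hausdorff(pred: set[Pixel], gt: set[Pixel]) -> float:
    """Symmetric Hausdorff distance (in pixels) between two non-empty masks."""

    def directed(a: set[Pixel], b: set[Pixel]) -> float:
        # worst case, over points of a, of the distance to the nearest point of b
        return max(min(dist(p, q) for q in b) for p in a)

    return max(directed(pred, gt), directed(gt, pred))


pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
gt = pred | {(2, 0)}  # ground truth has one extra pixel below the prediction
print(dice(pred, gt), hausdorff(pred, gt))  # Dice = 8/9, HD = 1.0
```

The brute-force nearest-neighbour search is quadratic in the number of pixels; it is meant to make the definitions concrete, not to evaluate full CT volumes.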

arXiv:2405.05364

Do spiral arms enhance star formation efficiency?

Authors: Miguel Querejeta, Adam K. Leroy, Sharon E. Meidt, Eva Schinnerer, Francesco Belfiore, Eric Emsellem, Ralf S. Klessen, Jiayi Sun, Mattia Sormani, Ivana Bešlic, Yixian Cao, Mélanie Chevance, Dario Colombo, Daniel A. Dale, Santiago García-Burillo, Simon C. O. Glover, Kathryn Grasha, Brent Groves, Eric. W. Koch, Lukas Neumann, Hsi-An Pan, Ismael Pessa, Jérôme Pety, Francesca Pinna, Lise Ramambason , et al. (10 additional authors not shown)

Abstract: Spiral arms are some of the most spectacular features in disc galaxies, and are also present in our own Milky Way. It has been argued that star formation should proceed more efficiently in spiral arms as a result of gas compression. Yet observational studies have so far yielded contradictory results. Here we examine arm/interarm surface density contrasts at ~100 pc resolution in 28 spiral galaxies from the PHANGS survey. We find that the arm/interarm contrast in stellar mass surface density (Sigma_*) is very modest, typically a few tens of percent. This is much smaller than the contrasts measured for molecular gas (Sigma_mol) or star formation rate (Sigma_SFR) surface density, which typically reach a factor of ~2-3. Yet, Sigma_mol and Sigma_SFR contrasts show a significant correlation with the enhancement in Sigma_*, suggesting that the small stellar contrast largely dictates the stronger accumulation of gas and star formation. All these contrasts increase for grand-design spirals compared to multi-armed and flocculent systems (and for galaxies with high stellar mass). The median star formation efficiency (SFE) of the molecular gas is 16% higher in spiral arms than in interarm regions, with a large scatter, and the contrast increases significantly (median SFE contrast 2.34) for regions of particularly enhanced stellar contrast (Sigma_* contrast >1.97). The molecular-to-atomic gas ratio (Sigma_mol/Sigma_atom) is higher in spiral arms, pointing to a transformation of atomic to molecular gas.
In conclusion, the boost in the star formation efficiency of molecular gas in spiral arms is generally modest or absent, except for locations with exceptionally large stellar contrasts. (abridged)

Submitted 8 May, 2024; originally announced May 2024.

Comments: 26 pages, 16 figures. Accepted for publication in A&A
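The contrasts in the spiral-arm abstract combine in a simple way. A minimal sketch, assuming each contrast is the ratio of the arm value to the interarm value and that SFE is Sigma_SFR / Sigma_mol (the usual convention; the paper's region selection and averaging are not reproduced): per region, the SFE contrast equals the Sigma_SFR contrast divided by the Sigma_mol contrast. Function names and input numbers are illustrative, not measured values.

```python
def contrast(arm: float, interarm: float) -> float:
    """Arm/interarm contrast of a surface density (dimensionless ratio)."""
    return arm / interarm


def sfe(sigma_sfr: float, sigma_mol: float) -> float:
    """Star formation efficiency of molecular gas, Sigma_SFR / Sigma_mol."""
    return sigma_sfr / sigma_mol


def sfe_contrast(sfr_arm: float, sfr_inter: float,
                 mol_arm: float, mol_inter: float) -> float:
    """SFE_arm / SFE_interarm, identical to contrast(SFR) / contrast(mol)."""
    return sfe(sfr_arm, mol_arm) / sfe(sfr_inter, mol_inter)


# Illustrative numbers only: Sigma_SFR contrast 2.32, Sigma_mol contrast 2.0
print(sfe_contrast(2.32, 1.0, 2.0, 1.0))  # ~1.16, i.e. SFE 16% higher in the arm
```

The identity holds region by region; since the paper quotes medians over many regions, and a median of ratios is generally not a ratio of medians, it should not be applied directly to the summary numbers.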

arXiv:2405.04132

Large Bulk Photovoltaic Effect of Nitride Perovskite LaWN3 as Photocatalyst for Hydrogen Evolution Reaction: First Principles Calculation

Authors: Keyu An, Zhuang Qian, Haoqiang Ai, Zhichao Yu, Shuangpeng Wang, Hui Pan

Abstract: The bulk photovoltaic effect in noncentrosymmetric materials is a fundamental and significant property that holds potential for high-efficiency energy harvesting, such as photoelectric applications and photocatalysis. Here, based on first-principles calculations, we explore the electronic structure, dielectric properties, shift current, and photocatalytic performance of the novel nitride perovskite LaWN3. Our calculations show that LaWN3 possesses large dielectric constants and a large shift current. The shift current can be enhanced by considering spin-orbit coupling and is switchable by ferroelectric polarization, which suggests that LaWN3 is a promising candidate for logic and neuromorphic photovoltaic devices driven by ferroelectric polarization. Additionally, LaWN3 shows strong photocatalytic performance for the hydrogen evolution reaction. In particular, the (110) surface has low surface energy and low Gibbs free energy, implying that it may be exposed as the active surface. Our findings highlight potential applications of the novel polar nitride perovskite LaWN3 not only in photoelectric devices but also in photocatalysis.

Submitted 13 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: The manuscript needs to update the data

arXiv:2405.01503

PAM-UNet: Shifting Attention on Region of Interest in Medical Images

Authors: Abhijit Das, Debesh Jha, Vandan Gorade, Koushik Biswas, Hongyi Pan, Zheyuan Zhang, Daniela P. Ladner, Yury Velichko, Amir Borhani, Ulas Bagci

Abstract: Computer-aided segmentation methods can assist medical personnel in improving diagnostic outcomes. While recent advancements like UNet and its variants have shown promise, they face a critical challenge: balancing accuracy with computational efficiency. Shallow encoder architectures in UNets often struggle to capture crucial spatial features, leading to inaccurate and sparse segmentation. To address this limitation, we propose a novel Progressive Attention based Mobile UNet (PAM-UNet) architecture. The inverted residual (IR) blocks in PAM-UNet help maintain a lightweight framework, while layerwise Progressive Luong Attention ($\mathcal{PLA}$) promotes precise segmentation by directing attention toward regions of interest during synthesis. Our approach prioritizes both accuracy and speed, achieving a commendable balance with a mean IoU of 74.65 and a Dice score of 82.87, while requiring only 1.32 floating-point operations per second (FLOPS) on the Liver Tumor Segmentation Benchmark (LiTS) 2017 dataset. These results highlight the importance of developing efficient segmentation models to accelerate the adoption of AI in clinical practice.

Submitted 2 May, 2024; originally announced May 2024.

Comments: Accepted at 2024 IEEE EMBC
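The PAM-UNet abstract reports both mean IoU and Dice. For a single prediction/ground-truth pair the two are linked by Dice = 2·IoU / (1 + IoU), which follows from |A| + |B| = |A ∪ B| + |A ∩ B|. A minimal sketch checking this on pixel-coordinate sets (function names illustrative):

```python
Pixel = tuple[int, int]  # (row, col) coordinate of a foreground pixel


def iou(a: set[Pixel], b: set[Pixel]) -> float:
    """Intersection over union |A ∩ B| / |A ∪ B| (1.0 for two empty masks)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a: set[Pixel], b: set[Pixel]) -> float:
    """Dice score 2|A ∩ B| / (|A| + |B|) (1.0 for two empty masks)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def dice_from_iou(j: float) -> float:
    # Divide 2|A ∩ B| / (|A ∪ B| + |A ∩ B|) through by |A ∪ B|.
    return 2 * j / (1 + j)


a = {(0, 0), (0, 1), (1, 0)}
b = {(0, 1), (1, 0), (1, 1)}
print(iou(a, b), dice(a, b), dice_from_iou(iou(a, b)))  # 0.5, then 2/3 twice
```

Because 2j/(1 + j) is concave, a mean of per-case Dice scores is at most 2·mIoU/(1 + mIoU); for a mean IoU of 74.65 that bound is about 85.5, consistent with the reported 82.87 if both means are taken over the same cases (an assumption, since the abstract does not say how the averages were formed).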

arXiv:2404.19515

The earthquake metric on Teichmüller space

Authors: Yi Huang, Ken'ichi Ohshika, Huiping Pan, Athanase Papadopoulos

Abstract: This is the first paper to systematically study the earthquake metric, an asymmetric Finsler metric on Teichmüller space introduced by Thurston. We provide proofs for several assertions of Thurston and establish new properties of this metric, among which are incompleteness, the asymptotic distance to the boundary, and comparisons with the Thurston metric and the Weil--Petersson metric. In doing so, we propose a novel asymmetric generalisation of the notion of completion for symmetric metrics, which we call the FD-completion, and prove that for the earthquake metric the FD-completion and various symmetrised metric completions coincide with the Weil--Petersson completion. We also answer a question of Thurston by giving an interpretation of this metric arising from a global minimisation problem, namely, the earthquake magnitude minimisation problem. At several points in this paper, we formulate open problems which suggest that the earthquake metric is a promising subject of study.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.16425

Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X. -F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J. -W. Hu, A. Li, C. -K. Li, J. -D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja , et al. (170 additional authors not shown)

Abstract: Long gamma-ray bursts (GRBs) are believed to originate from the core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field of view ($\sim$3600 deg$^2$) and high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential for EP to study the early universe via GRBs.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 41 pages, 8 figures, 7 tables
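As a worked illustration of what $z=4.859$ implies for EP240315a (standard cosmology, not a result from the paper): observed photon energies are lower than emitted ones by a factor (1 + z), and observed durations are longer by the same factor. The 0.5--4 keV WXT trigger band therefore samples roughly 2.9 to 23.4 keV in the burst's rest frame. The helper names below are illustrative.

```python
def rest_frame_energy(e_obs_kev: float, z: float) -> float:
    """Photon energy in the source frame: E_rest = E_obs * (1 + z)."""
    return e_obs_kev * (1 + z)


def rest_frame_duration(t_obs_s: float, z: float) -> float:
    """Cosmological time dilation: T_rest = T_obs / (1 + z)."""
    return t_obs_s / (1 + z)


Z = 4.859  # redshift reported for EP240315a
lo, hi = rest_frame_energy(0.5, Z), rest_frame_energy(4.0, Z)
print(f"0.5-4 keV observed -> {lo:.1f}-{hi:.1f} keV rest frame")
```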

arXiv:2404.15921

Algebraic intersection for hyperbolic surfaces

Authors: Manman Jiang, Huiping Pan

Abstract: We show that the algebraic intersection form of hyperbolic surfaces of genus $g$ has a minimum in the moduli space and that this minimum is of order $(\log g)^{-2}$ in terms of the genus. We also describe the asymptotic behavior of the algebraic intersection form in the moduli space as the homological systolic length goes to zero.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 33 pages, 4 figures. All comments are welcome!

MSC Class: 30F60; 32G15; 30F45

arXiv:2404.13701

Semantic-Rearrangement-Based Multi-Level Alignment for Domain Generalized Segmentation

Authors: Guanlong Jiao, Chenyangguang Zhang, Haonan Yin, Yu Mo, Biqing Huang, Hui Pan, Yi Luo, Jingxian Liu

Abstract: Domain generalized semantic segmentation is an essential computer vision task in which models leverage only source data to learn semantic segmentation that generalizes to unseen target domains. Previous works typically address this challenge by global style randomization or feature regularization. In this paper, we observe that different local semantic regions exhibit different visual characteristics from the source domain to the target domain, and we argue that methods focusing on global operations struggle to capture such regional discrepancies, thus failing to construct domain-invariant representations that are consistent from the local to the global level. Therefore, we propose Semantic-Rearrangement-based Multi-Level Alignment (SRMA) to overcome this problem. SRMA first incorporates a Semantic Rearrangement Module (SRM), which conducts semantic region randomization to sufficiently enhance the diversity of the source domain. A Multi-Level Alignment module (MLA) is subsequently proposed that uses this diversity to establish global-regional-local consistent domain-invariant representations. By aligning features across randomized samples with domain-neutral knowledge at multiple levels, SRMA provides a more robust way to handle the source-target domain gap. Extensive experiments demonstrate the superiority of SRMA over current state-of-the-art works on various benchmarks.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.09793

First Search for Light Fermionic Dark Matter Absorption on Electrons Using Germanium Detector in CDEX-10 Experiment

Authors: J. X. Liu, L. T. Yang, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, J. R. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (61 additional authors not shown)

Abstract: We present the first results of the search for sub-MeV fermionic dark matter absorbed by electron targets of germanium using 205.4~kg$\cdot$day of data collected by the CDEX-10 experiment, with an analysis threshold of 160~eVee. No significant dark matter (DM) signals over the background are observed. Results are presented as limits on the cross section of DM--electron interaction. We present new constraints on the cross section in the DM mass range of 0.1--10 keV/$c^2$ for vector and axial-vector interactions. The upper limit on the cross section is set to $\rm 5.5\times10^{-46}~cm^2$ for vector interaction and $\rm 1.8\times10^{-46}~cm^2$ for axial-vector interaction at a DM mass of 5 keV/$c^2$.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 6 pages, 4 figures

arXiv:2404.08406

MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion

Authors: Zhe Li, Haiwei Pan, Kejia Zhang, Yuhua Wang, Fengming Yu

Abstract: Multi-modality image fusion (MMIF) aims to integrate complementary information from different modalities into a single fused image that comprehensively represents the imaging scene and facilitates downstream visual tasks. In recent years, significant progress has been made in MMIF tasks due to advances in deep neural networks. However, constrained by the inherent local inductive bias of CNNs or the quadratic computational complexity of Transformers, existing methods cannot effectively and efficiently extract modality-specific and modality-fused features. To overcome this issue, we propose a Mamba-based Dual-phase Fusion (MambaDFuse) model. First, a dual-level feature extractor is designed to capture long-range features from single-modality images by extracting low- and high-level features from CNN and Mamba blocks. Then, a dual-phase feature fusion module is proposed to obtain fusion features that combine complementary information from different modalities. It uses the channel exchange method for shallow fusion and enhanced Multi-modal Mamba (M3) blocks for deep fusion. Finally, the fused image reconstruction module uses the inverse transformation of the feature extraction to generate the fused result. Through extensive experiments, our approach achieves promising fusion results in infrared-visible image fusion and medical image fusion. Additionally, in a unified benchmark, MambaDFuse demonstrates improved performance in downstream tasks such as object detection. Code with checkpoints will be available after the peer-review process.

Submitted 12 April, 2024; originally announced April 2024.
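The MambaDFuse abstract names channel exchange as its shallow-fusion step but gives no details. The sketch below is a hypothetical simplification, not the paper's actual rule: it swaps a fixed subset of channels (every `every`-th one, an illustrative parameter) between two same-shaped feature maps, so each modality branch receives some of the other's features while keeping its own shape. Nested Python lists stand in for tensors.

```python
Feature = list[list[float]]  # [channel][flattened spatial position]


def channel_exchange(feat_a: Feature, feat_b: Feature,
                     every: int = 2) -> tuple[Feature, Feature]:
    """Swap every `every`-th channel between two same-shaped feature maps."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature maps must have the same number of channels")
    out_a: Feature = []
    out_b: Feature = []
    for c, (ch_a, ch_b) in enumerate(zip(feat_a, feat_b)):
        if c % every == 0:
            # exchanged channel: each modality receives the other's features
            out_a.append(list(ch_b))
            out_b.append(list(ch_a))
        else:
            # kept channel: each modality keeps its own features
            out_a.append(list(ch_a))
            out_b.append(list(ch_b))
    return out_a, out_b


ir = [[1.0], [2.0], [3.0]]      # toy infrared features: 3 channels, 1 position
vis = [[10.0], [20.0], [30.0]]  # toy visible features, same shape
print(channel_exchange(ir, vis))
```

Channels 0 and 2 are swapped and channel 1 is kept, so each output mixes both modalities at no extra parameter cost, which is the usual appeal of exchange-based shallow fusion.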

arXiv:2404.08347

Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks

Authors: Yang Yang, Hongpeng Pan, Qing-Yuan Jiang, Yi Xu, Jinghui Tang

Abstract: Multi-modal learning aims to enhance performance by unifying models from various modalities, but it often faces the "modality imbalance" problem in real data, which leads to a bias towards dominant modalities and neglect of the others, thereby limiting its overall effectiveness. To address this challenge, the core idea is to balance the optimization of each modality to achieve a joint optimum. Existing approaches often employ a modal-level control mechanism for adjusting the update of each modal parameter. However, such a global-wise updating mechanism ignores the different importance of each parameter. Inspired by subnetwork optimization, we explore a uniform sampling-based optimization strategy and find it more effective than global-wise updating. Based on these findings, we further propose a novel importance sampling-based, element-wise joint optimization method, called Adaptively Mask Subnetworks Considering Modal Significance (AMSS). Specifically, we incorporate mutual information rates to determine the modal significance and employ non-uniform adaptive sampling to select foreground subnetworks from each modality for parameter updates, thereby rebalancing multi-modal learning. Additionally, we demonstrate the reliability of the AMSS strategy through convergence analysis. Building upon theoretical insights, we further enhance the multi-modal mask subnetwork strategy using unbiased estimation, referred to as AMSS+. Extensive experiments reveal the superiority of our approach over comparison methods.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 17 pages, 6 figures
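The AMSS abstract combines non-uniform sampling of parameters with unbiased estimation. The sketch below shows a generic, swapped-in textbook form of that idea (inverse-probability weighting), not the AMSS or AMSS+ algorithm itself: each parameter is updated with probability p_i derived from a hypothetical importance score, and surviving gradient entries are rescaled by 1/p_i so the masked gradient equals the full gradient in expectation. The importance-to-probability mapping and `p_min` are assumptions.

```python
import random


def sampling_probs(importance: list[float], budget: float,
                   p_min: float = 0.01) -> list[float]:
    """Map non-negative (not all zero) importance scores to update probabilities.

    Probabilities are proportional to importance and scaled so that roughly a
    `budget` fraction of parameters is selected; they are clipped to [p_min, 1]
    so that every parameter keeps a non-zero chance of being updated.
    """
    total = sum(importance)
    n = len(importance)
    return [min(1.0, max(p_min, budget * n * s / total)) for s in importance]


def masked_gradient(grad: list[float], probs: list[float],
                    rng: random.Random) -> list[float]:
    """Bernoulli-mask a gradient; survivors are rescaled by 1/p (unbiased)."""
    out = []
    for g, p in zip(grad, probs):
        if rng.random() < p:
            out.append(g / p)  # selected: inverse-probability weighting
        else:
            out.append(0.0)    # masked: this parameter is not updated
    return out


probs = sampling_probs([1.0, 1.0, 2.0], budget=0.5)  # [0.375, 0.375, 0.75]
print(masked_gradient([0.3, 0.6, 0.8], probs, random.Random(0)))
```

Without the 1/p rescaling, rarely sampled parameters would be systematically under-updated; the floor `p_min` keeps the rescaling finite.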

arXiv:2404.08016

ONNXPruner: ONNX-Based General Model Pruning Adapter

Authors: Dongdong Ren, Wenbin Li, Tianyu Ding, Lei Wang, Qi Fan, Jing Huo, Hongbing Pan, Yang Gao

Abstract: Recent advancements in model pruning have focused on developing new algorithms and improving upon benchmarks. However, the practical application of these algorithms across various models and platforms remains a significant challenge. To address this challenge, we propose ONNXPruner, a versatile pruning adapter designed for models in the ONNX format. ONNXPruner streamlines the adaptation process across diverse deep learning frameworks and hardware platforms. A novel aspect of ONNXPruner is its use of node association trees, which automatically adapt to various model architectures. These trees clarify the structural relationships between nodes and guide the pruning process, in particular by highlighting the impact on interconnected nodes. Furthermore, we introduce a tree-level evaluation method. By leveraging node association trees, this method allows for a comprehensive analysis beyond traditional single-node evaluations, enhancing pruning performance without the need for extra operations. Experiments across multiple models and datasets confirm ONNXPruner's strong adaptability and increased efficacy. Our work aims to advance the practical application of model pruning.

Submitted 10 April, 2024; originally announced April 2024.
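The ONNXPruner abstract does not specify how node association trees are built. As a hypothetical sketch over a toy graph (not the ONNX Python API and not ONNXPruner's code), one can start from a node whose output channels are pruned and walk its consumers: channel-preserving operators pass the change on, while an operator such as Conv or Gemm absorbs it (its input channels shrink, its output does not). The operator categories are assumptions, and the sketch ignores that an Add also couples its other input branch.

```python
from collections import deque

# ONNX op types assumed to keep their input's channel layout, so a pruned
# channel must also be removed from them and from their consumers.
PASS_THROUGH = {"BatchNormalization", "Relu", "Add", "Mul", "Sigmoid"}


def association_tree(root: str, consumers: dict[str, list[str]],
                     op_type: dict[str, str]) -> dict[str, list[str]]:
    """Parent -> children edges of nodes affected by pruning `root`'s outputs."""
    tree: dict[str, list[str]] = {root: []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in consumers.get(node, []):
            if child in tree:
                continue  # already reached through another path
            tree[node].append(child)
            tree[child] = []
            if op_type.get(child, "") in PASS_THROUGH:
                queue.append(child)  # the channel change propagates further
            # other ops (e.g. Conv, Gemm) absorb it: only their inputs change
    return tree


consumers = {"conv1": ["bn1"], "bn1": ["relu1"], "relu1": ["conv2"], "conv2": ["relu2"]}
op_type = {"conv1": "Conv", "bn1": "BatchNormalization", "relu1": "Relu",
           "conv2": "Conv", "relu2": "Relu"}
print(association_tree("conv1", consumers, op_type))
# {'conv1': ['bn1'], 'bn1': ['relu1'], 'relu1': ['conv2'], 'conv2': []}
```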

arXiv:2404.01379

A new theoretical approach to disordered Majorana nanowires: Studying disorder without any disorder

Authors: Haining Pan, Sankar Das Sarma

Abstract: The interplay of disorder and short finite wire length is the crucial physics hindering progress in the semiconductor-superconductor nanowire platform for realizing non-Abelian Majorana zero modes (MZMs). Disorder effectively segments the nanowire into isolated patches of quantum dots (QDs), which act as subgap Andreev bound states that often mimic MZMs. In this work, we propose and develop a new theoretical approach to model disorder, effectively a spatially varying effective mass model, which does not rely on incorporating unknown microscopic details of disorder into the Hamiltonian. Because the effective mass is highly enhanced at impurity sites, this model segments the wire into multiple effective random QDs. We find that this model can reproduce disorder physics, providing a clear way to understand the effects of disorder by comparing the mean free path to the superconducting coherence length. In addition, this model allows precise control over the disorder regime, enabling us to evaluate the reliability of topological invariants (TIs) in predicting MZMs. We find that, with increasing disorder strength, TIs alone may yield a significant false positive rate as indicators of topology in the actual wire. Therefore, we propose new indicators to characterize the spatial distribution of the zero-energy state, emphasizing the key necessity for isolated MZMs localized at the wire ends.
Employing this set of new indicators for stringent characterization, we explore their experimental relevance to the measured differential conductance spectra. Our findings highlight the critical role of isolated localized states, beyond the TI, in identifying topological MZMs. We believe that this approach is a powerful tool for studying realistic Majorana nanowires, where disorder and short wire length obscure the underlying topological physics.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 61 pages, 78 figures

arXiv:2403.20276

Constraints on the Blazar-Boosted Dark Matter from the CDEX-10 Experiment

Authors: R. Xu, L. T. Yang, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, S. M. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (59 additional authors not shown)

Abstract: We report new constraints on light dark matter (DM) boosted by blazars using 205.4 kg$\cdot$day of data from the CDEX-10 experiment located at the China Jinping Underground Laboratory. Two representative blazars, TXS 0506+056 and BL Lacertae, are studied. The results derived from TXS 0506+056 exclude DM-nucleon elastic scattering cross sections from $4.6\times 10^{-33}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for DM masses between 10 keV and 1 GeV, and the results derived from BL Lacertae exclude DM-nucleon elastic scattering cross sections from $2.4\times 10^{-34}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for the same range of DM masses. The constraints correspond to the best sensitivities among solid-state detector experiments in the sub-MeV mass range.

Submitted 29 March, 2024; originally announced March 2024.

Comments: 7 pages, 4 figures

arXiv:2403.20263

Probing Dark Matter Particles from Evaporating Primordial Black Holes via Electron Scattering in the CDEX-10 Experiment

Authors: Z. H. Zhang, L. T. Yang, Q. Yue, K. J. Kang, Y. J. Li, H. P. An, Greeshma C., J. P. Chang, Y. H. Chen, J. P. Cheng, W. H. Dai, Z. Deng, C. H. Fang, X. P. Geng, H. Gong, Q. J. Guo, T. Guo, X. Y. Guo, L. He, S. M. He, J. W. Hu, H. X. Huang, T. C. Huang, L. Jiang, S. Karmakar , et al. (59 additional authors not shown)

Abstract: Dark matter (DM) is a major constituent of the Universe. However, no definite evidence of DM particles (denoted as $χ$) has been found in DM direct detection (DD) experiments to date. A novel approach is to detect $χ$ emitted from evaporating primordial black holes (PBHs). We search for $χ$ emitted from PBHs by investigating their interaction with target electrons. The examined PBH masses range from 1$\times$10$^{15}$ to 7$\times$10$^{16}$ g under the current limits on the PBH abundance $f_{PBH}$. Using 205.4 kg$\cdot$day of data obtained from the CDEX-10 experiment conducted in the China Jinping Underground Laboratory, we exclude a $χ$--electron ($χ$--$e$) elastic-scattering cross section $σ_{χe} \sim 5\times10^{-29}$ cm$^2$ for $χ$ with a mass $m_χ\lesssim$ 0.1 keV. If ($m_χ$, $σ_{χe}$) can be determined in the future, DD experiments are expected to impose strong constraints on $f_{PBH}$ for large $M_{PBH}$s.

Submitted 29 March, 2024; originally announced March 2024.

Comments: 8 pages, 6 figures
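For scale (textbook physics, not a result of the PBH paper above): a black hole of mass M radiates with Hawking temperature k_B T_H = ħc³/(8πGM), so lighter PBHs are hotter and emit more energetic particles. A minimal sketch with CODATA constants (function name illustrative) gives about 10.6 MeV at the lower end of the examined range (10^15 g) and about 0.15 MeV at the upper end (7×10^16 g). It ignores greybody factors and the χ mass, which shape the actual emitted spectrum.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
C = 2.99792458e8        # speed of light, m / s
G = 6.67430e-11         # gravitational constant, m^3 / (kg s^2)
EV = 1.602176634e-19    # joules per electronvolt


def hawking_temperature_mev(mass_g: float) -> float:
    """k_B * T_H = hbar c^3 / (8 pi G M), returned in MeV for a mass in grams."""
    mass_kg = mass_g * 1e-3
    energy_j = HBAR * C**3 / (8 * math.pi * G * mass_kg)
    return energy_j / EV / 1e6


print(f"{hawking_temperature_mev(1e15):.1f} MeV, {hawking_temperature_mev(7e16):.2f} MeV")
```

Since T_H scales as 1/M, doubling the PBH mass halves the characteristic energy of the emitted particles.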

arXiv:2403.17994

Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023

Authors: Hongpeng Pan, Yang Yang, Zhongtian Fu, Yuxuan Zhang, Shian Du, Yi Xu, Xiangyang Ji

Abstract: This report proposes an improved method for the Tracking Any Point (TAP) task, which tracks any physical surface through a video. Several existing approaches have explored TAP by considering temporal relationships to obtain smooth point motion trajectories; however, they still suffer from the cumulative error caused by temporal prediction. To address this issue, we propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of static points in videos shot by a static camera. Our approach contains two key components: (1) Multi-granularity Camera Motion Detection, which identifies video sequences shot by a static camera; and (2) CMR-based point trajectory prediction with a moving object segmentation approach to isolate static points from the moving object. Our approach ranked first in the final test with a score of 0.46.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.15828

TJCCT: A Two-timescale Approach for UAV-assisted Mobile Edge Computing

Authors: Zemin Sun, Geng Sun, Qingqing Wu, Long He, Shuang Liang, Hongyang Pan, Dusit Niyato, Chau Yuen, Victor C. M. Leung

Abstract: Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) is emerging as a promising paradigm to provide aerial-terrestrial computing services in close proximity to mobile devices (MDs). However, meeting the demands of computation-intensive and delay-sensitive tasks for MDs poses several challenges, including the demand-supply contradiction between MDs and MEC servers, the demand-supply heterogeneity between MDs and MEC servers, the trajectory control requirements on energy efficiency and timeliness, and the different time-scale dynamics of the network. To address these issues, we first present a hierarchical architecture that incorporates terrestrial-aerial computing capabilities and leverages UAV flexibility. Furthermore, we formulate a joint computing resource allocation, computation offloading, and trajectory control problem to maximize the system utility. Since the problem is a non-convex and NP-hard mixed-integer nonlinear programming (MINLP) problem, we propose a two-timescale joint computing resource allocation, computation offloading, and trajectory control (TJCCT) approach to solve it. In the short timescale, we propose a price-incentive model for on-demand computing resource allocation and a matching mechanism-based method for computation offloading. In the long timescale, we propose a convex optimization-based method for UAV trajectory control. Besides, we theoretically prove the stability, optimality, and polynomial complexity of TJCCT.
Extensive simulation results demonstrate that the proposed TJCCT outperforms the comparative algorithms in terms of system utility, average processing rate, average completion delay, and average completion ratio.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.13751

The properties and kinematics of HCN emission across the closest starburst galaxy NGC 253 observed with ALMA

Authors: Ivana Beslic, Ashley T. Barnes, Frank Bigiel, Maria Jesus Jimenez-Donaire, Antonio Usero, Jonathan D. Henshaw, Christopher Faesi, Adam K. Leroy, Erik Rosolowsky, Jakob S. den Brok, Melanie Chevance, Cosima Eibensteiner, Kathryn Grasha, Ralf S. Klessen, J. M. Diedrerik Kruijssen, Daizhong Liu, Sharon Meidt, Justus Neumann, Lukas Neumann, Hsi-An Pan, Johannes Puschnig, Miguel Querejeta, Eva Schinnerer, Thomas G. Williams

Abstract: Studying molecular gas in nearby galaxies using hydrogen cyanide (HCN) as a tracer of higher densities than CO emission still poses a significant challenge. Even though several galaxies have HCN maps on scales of a few kpc, higher-resolution maps are still required. Our goal is to examine the contrast in intensity between two tracers that probe different density regimes (the HCN(1-0)/CO(2-1) ratio) and their kinematics across NGC 253. By utilizing the advanced capabilities of the Atacama Large Millimeter/submillimeter Array (ALMA), we can map these features at high resolution across a large field of view and uncover the nature of such dense gas in extragalactic systems. We present new ALMA Atacama Compact Array and Total Power (ACA+TP) observations of the HCN emission across NGC 253, covering the inner 8.6' of the galaxy disk at 300 pc scales. We analyze the integrated intensity and mean velocity of HCN and CO along each line of sight and use the SCOUSE software to perform spectral decomposition, which considers each velocity component separately. Molecular gas traced by HCN piles up in a ring-like structure at a radius of 2 kpc. The HCN emission is enhanced by 2 orders of magnitude in the central 2 kpc region, beyond which its intensity decreases with increasing galactocentric distance. The number of components in the HCN spectra shows a robust environmental dependence, with multiple velocity features across the center and bar. We have identified an increase in the HCN/CO ratio in these regions, corresponding to a velocity component likely associated with a molecular outflow. We have also found that the ratio between the total infrared luminosity and the dense gas mass, which indicates the star formation efficiency of dense gas, is anti-correlated with the molecular gas surface density up to approximately 200 M☉/pc^2; beyond this point, the ratio starts to increase.
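For reference, the quantities discussed above are commonly written as follows. The notation is an assumption, since the abstract does not define symbols: W denotes integrated line intensity, L'_HCN the HCN line luminosity, and α_HCN an adopted HCN-to-dense-gas conversion factor whose value is not stated in this abstract.

```latex
R_{\mathrm{HCN/CO}} = \frac{W_{\mathrm{HCN(1-0)}}}{W_{\mathrm{CO(2-1)}}}, \qquad
M_{\mathrm{dense}} = \alpha_{\mathrm{HCN}}\, L'_{\mathrm{HCN}}, \qquad
\mathrm{SFE}_{\mathrm{dense}} \propto \frac{L_{\mathrm{TIR}}}{M_{\mathrm{dense}}}
```

Because the total infrared luminosity is used as a proxy for the star formation rate, the last ratio tracks how efficiently dense gas forms stars. This is the quantity reported to anti-correlate with molecular gas surface density below about 200 M☉/pc^2.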

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted for publication in Astronomy and Astrophysics

arXiv:2403.12985

Multi-objective Optimization for Data Collection in UAV-assisted Agricultural IoT

Authors: Lingling Liu, Aimin Wang, Geng Sun, Jiahui Li, Hongyang Pan, Tony Q. S. Quek

Abstract: Fixed ground base stations (BSs) are often deployed inflexibly, incur high overheads, and are susceptible to damage from natural disasters, making it impractical for them to continuously collect data from sensor devices. To improve network coverage and wireless communication performance, unmanned aerial vehicles (UAVs) have been introduced in diverse wireless networks. In this work, we therefore consider employing a UAV as an aerial BS to acquire data from agricultural Internet of Things (IoT) devices. To this end, we first formulate a UAV-assisted data collection multi-objective optimization problem (UDCMOP) to efficiently collect data from agricultural sensing devices. Specifically, we jointly optimize the UAV's hovering positions, visit sequence, and speed, as well as the transmit power of the devices, to simultaneously maximize the minimum transmit rate of the devices, minimize the total energy consumption of the devices, and minimize the total energy consumption of the UAV. Second, the proposed UDCMOP is a non-convex mixed integer nonlinear optimization problem with both continuous and discrete decision variables, which makes it intractable to solve. We therefore propose an improved multi-objective artificial hummingbird algorithm (IMOAHA) with several specific improvements: a hybrid initialization operator, a Cauchy mutation foraging operator, and a discrete mutation operator. Finally, simulations show that the proposed IMOAHA effectively improves system performance compared to other benchmarks.
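Because UDCMOP has three competing objectives, a multi-objective algorithm such as IMOAHA returns a set of non-dominated (Pareto-optimal) trade-offs rather than a single optimum. The minimal sketch below shows only the generic Pareto-dominance test and filter. It assumes each candidate solution has already been scored as a tuple (minimum device transmit rate, total device energy, total UAV energy). It is not the authors' IMOAHA, and the numbers in the usage example are made up.

```python
from typing import List, Tuple

# Hypothetical per-solution scores:
# (minimum device transmit rate [maximize], total device energy [minimize], total UAV energy [minimize])
Objectives = Tuple[float, float, float]


def to_minimization(o: Objectives) -> Objectives:
    # Negate the rate so that every objective is "smaller is better".
    return (-o[0], o[1], o[2])


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    fa, fb = to_minimization(a), to_minimization(b)
    no_worse = all(x <= y for x, y in zip(fa, fb))
    strictly_better = any(x < y for x, y in zip(fa, fb))
    return no_worse and strictly_better


def pareto_front(candidates: List[Objectives]) -> List[Objectives]:
    """Keep candidates not dominated by any other candidate (O(n^2) comparisons)."""
    # A solution never dominates itself, so comparing c with c is harmless.
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]
```

For instance, among `(5.0, 10.0, 200.0)`, `(4.0, 8.0, 180.0)`, `(3.0, 9.0, 190.0)` and `(5.0, 12.0, 210.0)`, only the first two survive: the third has a lower rate and higher energies than the second, and the fourth matches the first's rate but uses more energy.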

Submitted 3 March, 2024; originally announced March 2024.

Comments: 13 pages, 7 figures, 4 tables

arXiv:2403.11284

Fast Personalized Text-to-Image Syntheses With Attention Injection

Authors: Yuxuan Zhang, Yiren Song, Jinpeng Yu, Han Pan, Zhongliang Jing

Abstract: Most current personalized image generation methods require considerable time for fine-tuning and often overfit the concept, resulting in generated images that resemble the custom concept but are difficult to edit with prompts. We propose an effective and fast approach that can balance text-image consistency and identity consistency between the generated image and the reference image. Our method can generate personalized images without any fine-tuning while maintaining the inherent text-to-image generation ability of diffusion models. Given a prompt and a reference image, we merge the custom concept into generated images by manipulating the cross-attention and self-attention layers of the original diffusion model, producing personalized images that match the text description. Comprehensive experiments highlight the superiority of our method.
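The abstract does not detail how the reference image is injected. One common training-free pattern, shown here only as a hypothetical sketch and not as the authors' method, lets the generated image's self-attention queries also attend to keys and values computed from the reference image. This is done by concatenating them along the token axis. Plain Python lists stand in for tensors, and the toy shapes are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are tokens, columns are feature dimensions


def matmul_t(a: Matrix, b: Matrix) -> Matrix:
    """Compute a @ b^T for row-major lists of lists."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)                            # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    scores = matmul_t(q, k)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    # Each output row is a weighted sum of the value rows.
    return [[sum(w * v_row[j] for w, v_row in zip(w_row, v)) for j in range(len(v[0]))]
            for w_row in weights]


def injected_self_attention(q_gen: Matrix, k_gen: Matrix, v_gen: Matrix,
                            k_ref: Matrix, v_ref: Matrix) -> Matrix:
    """Hypothetical injection: generated tokens attend to their own and the reference tokens."""
    return attention(q_gen, k_gen + k_ref, v_gen + v_ref)
```

With empty `k_ref`/`v_ref` this reduces to ordinary self-attention. When a reference key scores as high as a generated key, the output blends the two value vectors evenly. That is the sense in which reference features can be "merged" without any fine-tuning.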

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.07571

doi 10.1145/3589335.3651548

Proactive Recommendation with Iterative Preference Guidance

Authors: Shuxian Bi, Wenjie Wang, Hang Pan, Fuli Feng, Xiangnan He

Abstract: Recommender systems mainly tailor personalized recommendations according to user interests learned from user feedback. However, such recommender systems passively cater to user interests and even reinforce existing interests in the feedback loop, leading to problems like filter bubbles and opinion polarization. To counteract this, proactive recommendation actively steers users towards developing new interests in a target item or topic by strategically modulating recommendation sequences. Existing work on proactive recommendation faces significant hurdles: 1) overlooking user feedback in the guidance process; 2) lacking explicit modeling of the guiding objective; and 3) insufficient flexibility for integration into existing industrial recommender systems. To address these issues, we introduce an Iterative Preference Guidance (IPG) framework. IPG performs proactive recommendation in a flexible post-processing manner by ranking items according to their IPG scores, which consider both interaction probability and guiding value. These scores are explicitly estimated with an iteratively updated user representation that considers the most recent user interactions. Extensive experiments validate that IPG can effectively guide user interests toward target interests with a reasonable trade-off in recommendation accuracy. The code is available at https://github.com/GabyUSTC/IPG-Rec.
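IPG is described as a post-processing step that re-ranks candidate items by a score combining interaction probability and guiding value. The abstract does not give the scoring formula, so the sketch below assumes a hypothetical linear combination `p + lam * g` with a user-chosen weight `lam`. The item names and numbers are illustrative only; the actual scoring is in the authors' code at https://github.com/GabyUSTC/IPG-Rec.

```python
from typing import Dict, List


def rerank(candidates: Dict[str, Dict[str, float]], lam: float, k: int) -> List[str]:
    """Return the top-k item ids under a hypothetical score p + lam * g.

    candidates: item id -> {"p": interaction probability, "g": guiding value}
    lam: weight trading off recommendation accuracy (p) against guidance toward the target (g)
    """
    scored = {item: f["p"] + lam * f["g"] for item, f in candidates.items()}
    # Sort by score descending; ties are broken by item id for a deterministic order.
    return sorted(scored, key=lambda item: (-scored[item], item))[:k]
```

With `lam = 0` the ranking is purely accuracy-driven. Raising `lam` promotes items that move the user toward the target interest even if they are less likely to be clicked, which mirrors the trade-off the abstract reports.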

Submitted 12 March, 2024; originally announced March 2024.

Comments: Accepted by WWW 2024 (Short)

Showing 1–50 of 802 results for author: Pan, H