-
Symmetric Second-Harmonic Generation in Sub-wavelength Periodically Poled Thin Film Lithium Niobate
Authors:
Fengyan Yang,
Juanjuan Lu,
Mohan Shen,
Guangcanlan Yang,
Hong X. Tang
Abstract:
Second harmonic generation (SHG) extensively employs periodically poled nonlinear crystals through forward quasi-phase-matching to achieve efficient frequency conversion. As poling periods approach the sub-micrometer scale, backward quasi-phase-matching has also been demonstrated, albeit with pulsed laser drives. The realization of symmetric second harmonic generation, characterized by counterpropagating pumps, however, has remained elusive despite theoretical predictions. The main challenge lies in achieving strong nonlinear coupling with a poling period below half the wavelength of the second-harmonic light. The recent emergence of high-quality ferroelectric lithium niobate thin films provides an opportunity for precise domain control at submicron dimensions. In this article, we demonstrate reliable control of ferroelectric domains in thin-film lithium niobate waveguides with a poling period down to 370 nm, thereby realizing highly efficient continuous-wave pumped symmetric SHG. This demonstration not only validates the feasibility of subwavelength periodic poling in waveguides but also opens new avenues for leveraging submicron ferroelectric domain structures in integrated photonics and nonlinear optics research.
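Why symmetric SHG demands such a short period can be seen from first-order quasi-phase-matching (QPM): the poling grating vector $2\pi/\Lambda$ must supply the phase mismatch of each geometry. The Python sketch below is illustrative only. The 1550 nm pump and the effective indices n_fund = 1.9 and n_sh = 2.1 are hypothetical values, not taken from the paper, and dispersion and mode structure are ignored.

```python
"""First-order QPM poling periods for three SHG geometries (illustrative sketch)."""


def qpm_periods(pump_wavelength_nm: float, n_fund: float, n_sh: float) -> dict:
    """Return first-order QPM periods (nm) for a pump of vacuum wavelength lam.

    Wave numbers: k_fund = 2*pi*n_fund/lam and k_sh = 4*pi*n_sh/lam.
    The grating must supply K = 2*pi/period equal to the residual mismatch.
    """
    lam = pump_wavelength_nm
    return {
        # Co-propagating pumps, SH emitted forward: K = k_sh - 2*k_fund
        "forward": lam / (2.0 * (n_sh - n_fund)),
        # Co-propagating pumps, SH emitted backward: K = k_sh + 2*k_fund
        "backward": lam / (2.0 * (n_sh + n_fund)),
        # Counterpropagating pumps (net pump momentum zero): K = k_sh
        "symmetric": lam / (2.0 * n_sh),
    }


periods = qpm_periods(1550.0, n_fund=1.9, n_sh=2.1)  # hypothetical indices
for name, period in periods.items():
    print(f"{name:>9}: {period:8.1f} nm")
```

With these assumed indices the symmetric period comes out near 369 nm, i.e. $\lambda/(2n_{2\omega}) = \lambda_{SH}/n_{2\omega}$, which is below half of the 775 nm second-harmonic wavelength and of the same order as the 370 nm period reported above, while the forward period is several micrometers.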
Submitted 12 July, 2024;
originally announced July 2024.
-
DeCE: Deceptive Cross-Entropy Loss Designed for Defending Backdoor Attacks
Authors:
Guang Yang,
Yu Zhou,
Xiang Chen,
Xiangyu Zhang,
Terry Yue Zhuo,
David Lo,
Taolue Chen
Abstract:
Code Language Models (CLMs), particularly those leveraging deep learning, have achieved significant success in the code intelligence domain. However, the issue of security, particularly with respect to backdoor attacks, is often overlooked in this process. Previous research has focused on designing backdoor attacks for CLMs, while effective defenses have not been adequately addressed. In particular, existing defense methods from natural language processing, when applied directly to CLMs, are not effective enough and lack generality: they work well for some models and scenarios but fail in others, and thus fall short of consistently mitigating backdoor attacks. To bridge this gap, we first confirm that "early learning" is a general phenomenon during the training of CLMs. In early learning, a model initially focuses on the main features of the training data but may become increasingly sensitive to backdoor triggers over time, leading to overfitting and susceptibility to backdoor attacks. Our analysis then shows that overfitting to backdoor triggers results from the use of the cross-entropy loss function, whose unboundedness leads the model to concentrate increasingly on the features of the poisoned data. Based on this insight, we propose DeCE (Deceptive Cross-Entropy), a general and effective loss function that blends deceptive distributions and applies label smoothing to keep the gradient bounded, which prevents the model from overfitting to backdoor triggers and thereby enhances the security of CLMs against backdoor attacks. To verify the effectiveness of our defense method, we select code synthesis tasks as our experimental scenarios. Our experiments across various code synthesis datasets, models, and poisoning ratios demonstrate the applicability and effectiveness of DeCE in enhancing the security of CLMs.
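The abstract's argument hinges on plain cross-entropy being unbounded, while blending and label smoothing bound it. The NumPy sketch below illustrates only that general mechanism and is NOT the DeCE formula: the blend with a uniform distribution, the weight `alpha`, and the `smoothing` value are hypothetical choices made for the illustration.

```python
import numpy as np


def cross_entropy(probs: np.ndarray, target: int) -> float:
    """Plain cross-entropy for one example; grows without bound as probs[target] -> 0."""
    return float(-np.log(probs[target]))


def blended_smoothed_ce(probs: np.ndarray, target: int,
                        alpha: float = 0.9, smoothing: float = 0.1) -> float:
    """Hypothetical bounded variant for illustration (not the DeCE loss).

    1. Blend the model distribution with the uniform distribution, so every
       q_k = alpha * p_k + (1 - alpha) / K is at least (1 - alpha) / K.
    2. Take cross-entropy against a label-smoothed target distribution.
    Since the target weights sum to 1, the loss is at most -log((1 - alpha) / K).
    """
    k = probs.shape[0]
    q = alpha * probs + (1.0 - alpha) / k
    y = np.full(k, smoothing / k)
    y[target] += 1.0 - smoothing
    return float(-(y * np.log(q)).sum())


K = 4
for p_target in (0.5, 1e-3, 1e-12):
    # Remaining probability mass spread evenly over the other classes.
    rest = (1.0 - p_target) / (K - 1)
    probs = np.array([p_target] + [rest] * (K - 1))
    print(f"p={p_target:g}  CE={cross_entropy(probs, 0):.2f}  "
          f"bounded={blended_smoothed_ce(probs, 0):.2f}")
print(f"upper bound: {-np.log((1 - 0.9) / K):.2f}")
```

As the target probability shrinks, plain cross-entropy keeps climbing, whereas the blended loss saturates below the printed bound. Intuitively, a poisoned example can then contribute only a limited amount of loss.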
Submitted 11 July, 2024;
originally announced July 2024.
-
Toward Efficient Deep Spiking Neuron Networks: A Survey On Compression
Authors:
Hui Xie,
Ge Yang,
Wenjuan Gao
Abstract:
With the rapid development of deep learning, Deep Spiking Neural Networks (DSNNs) have emerged as a promising approach due to their unique spike-event processing and asynchronous computation. When deployed on neuromorphic chips, DSNNs offer significant power advantages over Deep Artificial Neural Networks (DANNs) and eliminate time- and energy-consuming multiplications owing to the binary nature of spikes (0 or 1). Additionally, DSNNs excel at processing temporal information, making them potentially superior to DANNs for handling temporal data. However, their deep network structure and numerous parameters result in high computational costs and energy consumption, limiting real-life deployment. To enhance DSNN efficiency, researchers have adapted methods from DANNs, such as pruning, quantization, and knowledge distillation, and developed specific techniques such as reducing spike firing and pruning time steps. While previous surveys have covered DSNN algorithms, hardware deployment, and general overviews, focused research on DSNN compression and efficiency has been lacking. This survey addresses this gap by concentrating on efficient DSNNs and their compression methods. It begins with an exploration of DSNNs' biological background and computational units, highlighting differences from DANNs. It then delves into various compression methods, including pruning, quantization, knowledge distillation, and reducing spike firing, and concludes with suggestions for future research directions.
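The claim that binary spikes eliminate multiplications can be checked arithmetically. The NumPy sketch below compares a standard matrix-vector product with an accumulate-only version that just sums the weight columns of neurons that fired. The layer sizes and random weights are arbitrary, and NumPy only demonstrates the equivalence; it does not reproduce the event-driven execution of neuromorphic hardware.

```python
import numpy as np


def layer_with_multiplies(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Conventional ANN layer: one multiply-accumulate per weight."""
    return weights @ x


def layer_with_spikes(weights: np.ndarray, spikes: np.ndarray) -> np.ndarray:
    """Same product for 0/1 spike inputs, computed with additions only:
    W @ s equals the sum of the columns of W whose input neuron fired."""
    fired = np.flatnonzero(spikes)  # indices of presynaptic neurons with s_j = 1
    return weights[:, fired].sum(axis=1)


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))
s = (rng.random(8) < 0.3).astype(np.int8)  # sparse binary spike vector
print(np.allclose(layer_with_multiplies(W, s), layer_with_spikes(W, s)))
```

The sparser the spike train, the fewer columns are accumulated. This is why reducing spike firing, one of the compression techniques listed above, translates directly into fewer operations.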
Submitted 3 June, 2024;
originally announced July 2024.
-
DSCENet: Dynamic Screening and Clinical-Enhanced Multimodal Fusion for MPNs Subtype Classification
Authors:
Yuan Zhang,
Yaolei Qi,
Xiaoming Qi,
Yongyue Wei,
Guanyu Yang
Abstract:
The precise subtype classification of myeloproliferative neoplasms (MPNs) based on multimodal information, which assists clinicians in diagnosis and long-term treatment planning, is of great clinical significance. However, it remains a highly challenging task because local patches lack diagnostic representativeness and diagnostically relevant features are absent from any single modality. In this paper, we propose a Dynamic Screening and Clinical-Enhanced Network (DSCENet) for the subtype classification of MPNs based on the multimodal fusion of whole slide images (WSIs) and clinical information. (1) A dynamic screening module is proposed to flexibly adapt the feature learning of local patches, reducing the interference of irrelevant features and enhancing their diagnostic representativeness. (2) A clinical-enhanced fusion module is proposed to integrate clinical indicators and explore complementary features across modalities, providing comprehensive diagnostic information. Our approach has been validated on real clinical data, improving AUC by 7.91% and accuracy by 16.89% compared with previous state-of-the-art (SOTA) methods. The code is available at https://github.com/yuanzhang7/DSCENet.
Submitted 11 July, 2024;
originally announced July 2024.
-
Drantal-NeRF: Diffusion-Based Restoration for Anti-aliasing Neural Radiance Field
Authors:
Ganlin Yang,
Kaidong Zhang,
Jingjing Fu,
Dong Liu
Abstract:
Aliasing artifacts in renderings produced by Neural Radiance Fields (NeRF) are a long-standing but complex issue in the field of 3D implicit representation; they arise from a multitude of intricate causes and have previously been mitigated by designing more advanced but complex scene parameterization methods. In this paper, we present a Diffusion-based restoration method for anti-aliasing Neural Radiance Field (Drantal-NeRF). We consider the anti-aliasing issue from a low-level restoration perspective, viewing aliasing artifacts as a kind of degradation added to clean ground truths. By leveraging the powerful prior knowledge encapsulated in diffusion models, we restore high-realism anti-aliased renderings conditioned on their aliased low-quality counterparts. We further employ a feature-wrapping operation to ensure multi-view restoration consistency and fine-tune the VAE decoder to better adapt to the scene-specific data distribution. Our proposed method is easy to implement and agnostic to various NeRF backbones. We conduct extensive experiments on challenging large-scale urban scenes as well as unbounded 360-degree scenes and achieve substantial qualitative and quantitative improvements.
Submitted 10 July, 2024;
originally announced July 2024.
-
Differentiable Optimization of Similarity Scores Between Models and Brains
Authors:
Nathan Cloos,
Moufan Li,
Markus Siegel,
Scott L. Brincat,
Earl K. Miller,
Guangyu Robert Yang,
Christopher J. Cueva
Abstract:
What metrics should guide the development of more realistic models of the brain? One proposal is to quantify the similarity between models and brains using methods such as linear regression, Centered Kernel Alignment (CKA), and angular Procrustes distance. To better understand the limitations of these similarity measures, we analyze neural activity recorded in five experiments on nonhuman primates and optimize synthetic datasets to become more similar to these neural recordings. How similar can these synthetic datasets be to neural activity while failing to encode task-relevant variables? We find that some measures, like linear regression and CKA, differ from angular Procrustes and yield high similarity scores even when task-relevant variables cannot be linearly decoded from the synthetic datasets. Synthetic datasets optimized to maximize similarity scores initially learn the first principal component of the target dataset, but angular Procrustes captures higher variance dimensions much earlier than methods like linear regression and CKA. We show in both theory and simulations how these scores change when different principal components are perturbed. Finally, we jointly optimize multiple similarity scores to find their allowed ranges, and show that a high angular Procrustes similarity, for example, implies a high CKA score, but not the converse.
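For readers unfamiliar with the two scores being compared, the NumPy sketch below computes linear CKA and an angular Procrustes distance between two condition-by-feature matrices. These are textbook formulations, and the paper's exact preprocessing or normalization may differ. The random "neural" matrix is hypothetical data.

```python
import numpy as np


def _center(x: np.ndarray) -> np.ndarray:
    """Subtract the mean over conditions (rows) from every feature (column)."""
    return x - x.mean(axis=0, keepdims=True)


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between x (n x p) and y (n x q); rows are matched conditions."""
    x, y = _center(x), _center(y)
    num = np.linalg.norm(y.T @ x, ord="fro") ** 2
    den = np.linalg.norm(x.T @ x, ord="fro") * np.linalg.norm(y.T @ y, ord="fro")
    return float(num / den)


def angular_procrustes_distance(x: np.ndarray, y: np.ndarray) -> float:
    """arccos of the best correlation over orthogonal alignments of y to x.

    After centering and scaling to unit Frobenius norm, the maximum of
    <x, y Q> over orthogonal Q (zero-padding the narrower matrix if p != q)
    equals the nuclear norm (sum of singular values) of x^T y.
    """
    x, y = _center(x), _center(y)
    x = x / np.linalg.norm(x, ord="fro")
    y = y / np.linalg.norm(y, ord="fro")
    score = np.linalg.norm(x.T @ y, ord="nuc")
    return float(np.arccos(np.clip(score, -1.0, 1.0)))


rng = np.random.default_rng(0)
neural = rng.normal(size=(100, 20))             # hypothetical conditions x neurons
q, _ = np.linalg.qr(rng.normal(size=(20, 20)))  # random orthogonal matrix
rotated = 3.0 * neural @ q + 1.0               # rotated, scaled, shifted copy
print(linear_cka(neural, rotated), angular_procrustes_distance(neural, rotated))
```

Both scores are invariant to rotation, isotropic scaling, and translation, so the transformed copy gets CKA close to 1 and distance close to 0. The two differ in how they weight low- versus high-variance dimensions, which is the behavior the abstract probes.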
Submitted 9 July, 2024;
originally announced July 2024.
-
LGRNet: Local-Global Reciprocal Network for Uterine Fibroid Segmentation in Ultrasound Videos
Authors:
Huihui Xu,
Yijun Yang,
Angelica I Aviles-Rivero,
Guang Yang,
Jing Qin,
Lei Zhu
Abstract:
Regular screening and early discovery of uterine fibroids are crucial for preventing potential malignant transformation and ensuring timely, life-saving interventions. To this end, we collect and annotate the first ultrasound video dataset for uterine fibroid segmentation (UFUV), comprising 100 videos. We also present the Local-Global Reciprocal Network (LGRNet) to efficiently and effectively propagate long-term temporal context, which is crucial for distinguishing uninformative, noisy surrounding tissue from target lesion regions. Specifically, Cyclic Neighborhood Propagation (CNP) is introduced to propagate inter-frame local temporal context in a cyclic manner. Moreover, to aggregate global temporal context, we first condense each frame into a set of frame bottleneck queries and devise Hilbert Selective Scan (HilbertSS), which both connects the frames along an efficient path and preserves the locality bias. A distribute layer then disseminates the global context back for reciprocal refinement. Extensive experiments on UFUV and three public Video Polyp Segmentation (VPS) datasets demonstrate consistent improvements over state-of-the-art segmentation methods, indicating the effectiveness and versatility of LGRNet. Code, checkpoints, and dataset are available at https://github.com/bio-mlhui/LGRNet
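HilbertSS is only named in the abstract. The sketch below shows the generic ingredient its name suggests: ordering grid positions along a Hilbert curve so that consecutive tokens in a 1-D scan remain spatial neighbors, unlike a raster scan. How LGRNet actually arranges frame bottleneck queries on such a curve is not described here, and the helper names are hypothetical.

```python
def hilbert_d2xy(n: int, d: int) -> tuple[int, int]:
    """Map index d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2).

    Standard iterative construction: at each scale s, read two bits of d to pick
    the quadrant, then rotate/flip the sub-curve so the pieces join up.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/flip the current s x s quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scan(n: int) -> list[tuple[int, int]]:
    """Visiting order of all n*n grid cells along the Hilbert curve."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]


def max_step(order: list[tuple[int, int]]) -> int:
    """Largest Manhattan jump between consecutive cells of a scan order."""
    return max(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(order, order[1:]))


n = 8
raster = [(x, y) for y in range(n) for x in range(n)]
print(max_step(hilbert_scan(n)), max_step(raster))
```

Every step along the Hilbert order moves to an adjacent cell, while the raster order jumps across the whole row at each line end. A sequential scan over Hilbert-ordered tokens therefore keeps nearby positions close in the sequence.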
Submitted 8 July, 2024;
originally announced July 2024.
-
Probing axion-like particles in leptonic decays of heavy mesons
Authors:
Gang Yang,
Tianhong Wang,
Guo-Li Wang
Abstract:
We study the possibility of finding axion-like particles (ALPs) through the leptonic decays of heavy mesons. There are some deviations between the Standard Model (SM) predictions of the branching ratios of the leptonic decays of mesons and the experimental data, which leaves room for decay channels in which an ALP is one of the products. Three scenarios are considered: first, the ALP couples to only a single charged fermion, namely the quark, the antiquark, or the charged lepton; second, the ALP couples only to the quark and antiquark, with equal strength; third, the ALP couples to all the charged fermions with equal strength. Constraints on the coupling strength in the different scenarios are obtained by comparing the experimental branching ratios of the leptonic decays of $B^-$, $D^+$, and $D_s^+$ mesons with theoretical predictions obtained using the Bethe-Salpeter (BS) method. These constraints are further applied to predict upper limits on the leptonic decays of the $B_c^-$ meson in which an ALP participates.
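For reference, the tree-level SM width of the purely leptonic decay $P^- \to \ell^- \bar\nu_\ell$ of a pseudoscalar meson (and likewise for the charge-conjugate $D^+$ and $D_s^+$ modes) takes the standard form below. Here $f_P$ is the meson decay constant, $M_P$ and $\tau_P$ are the meson mass and lifetime, $m_\ell$ is the lepton mass, and $V_{q_1 q_2}$ is the relevant CKM element. In a Bethe-Salpeter treatment, $f_P$ is the quantity computed from the meson wave function. The exact conventions used in the paper are not reproduced here.

$$
\Gamma(P^- \to \ell^- \bar\nu_\ell) = \frac{G_F^2}{8\pi}\,|V_{q_1 q_2}|^2\, f_P^2\, m_\ell^2\, M_P \left(1 - \frac{m_\ell^2}{M_P^2}\right)^2,
\qquad
\mathcal{B}(P^- \to \ell^- \bar\nu_\ell) = \frac{\Gamma\,\tau_P}{\hbar}.
$$

The explicit $m_\ell^2$ factor reflects the helicity suppression of these modes in the SM.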
Submitted 7 July, 2024;
originally announced July 2024.
-
Probing Perfection: The Relentless Art of Meddling for Pulmonary Airway Segmentation from HRCT via a Human-AI Collaboration Based Active Learning Method
Authors:
Shiyi Wang,
Yang Nan,
Sheng Zhang,
Federico Felder,
Xiaodan Xing,
Yingying Fang,
Javier Del Ser,
Simon L F Walsh,
Guang Yang
Abstract:
Scarcity of annotated data is a prevalent issue in medical image segmentation, and pulmonary tracheal segmentation is no exception. Additionally, Deep Learning (DL) methods face two challenges: the opacity of 'black box' models and the need for performance enhancement. Our Human-Computer Interaction (HCI)-based models (RS_UNet, LC_UNet, UUNet, and WD_UNet) address these challenges by combining diverse query strategies with various DL models. We train four HCI models and repeat the following steps: (1) Query strategy: in each iteration, the HCI models select the samples that would provide the most additional representative information if labeled, identifying unlabeled samples with the greatest predictive disparity using Wasserstein Distance, Least Confidence, Entropy Sampling, and Random Sampling. (2) Central line correction: the selected samples are used for expert correction of system-generated tracheal central lines in each training round. (3) Training dataset update: experts update the training dataset after each DL model's training epoch, enhancing the trustworthiness and performance of the models. (4) Model training: the HCI model is trained using the updated dataset and an enhanced UNet version. Experimental results confirm the effectiveness of these HCI-based approaches, showing that WD_UNet, LC_UNet, UUNet, and RS_UNet achieve comparable or superior performance to state-of-the-art DL models. Notably, WD_UNet achieves this with only 15%-35% of the training data, reducing physician annotation time by 65%-85%.
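Step (1) can be illustrated with three of the named query strategies. The NumPy sketch below scores unlabeled scans by Least Confidence and by predictive entropy (each averaged over voxels), or picks them at random, and returns the indices to send for expert correction. The Wasserstein Distance strategy is not sketched. Averaging over voxels, the function names, and the toy probabilities are all assumptions, not the paper's implementation.

```python
import numpy as np


def least_confidence(probs: np.ndarray) -> np.ndarray:
    """Per-sample score: 1 - max class probability, averaged over voxels.

    probs has shape (num_samples, num_voxels, num_classes).
    """
    return 1.0 - probs.max(axis=-1).mean(axis=-1)


def entropy_score(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-sample score: predictive entropy, averaged over voxels."""
    return -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=-1)


def select_queries(probs, budget, strategy, rng=None):
    """Indices of the `budget` unlabeled samples to send to the expert."""
    if strategy == "random":
        rng = rng if rng is not None else np.random.default_rng()
        return rng.choice(probs.shape[0], size=budget, replace=False)
    scorers = {"least_confidence": least_confidence, "entropy": entropy_score}
    scores = scorers[strategy](probs)
    return np.argsort(-scores, kind="stable")[:budget]  # most uncertain first


# Three hypothetical unlabeled scans, 2 voxels each, binary (airway vs background).
probs = np.array([
    [[0.99, 0.01], [0.98, 0.02]],  # confident
    [[0.50, 0.50], [0.55, 0.45]],  # very uncertain
    [[0.80, 0.20], [0.70, 0.30]],  # moderately uncertain
])
print(select_queries(probs, 2, "least_confidence"), select_queries(probs, 2, "entropy"))
```

Both uncertainty strategies skip the confident scan and spend the annotation budget on the two uncertain ones. Random sampling serves as the baseline.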
Submitted 3 July, 2024;
originally announced July 2024.
-
Reverse time-to-death as time-scale in time-to-event analysis for studies of advanced illness and palliative care
Authors:
Yin Bun Cheung,
Xiangmei Ma,
Isha Chaudhry,
Nan Liu,
Qingyuan Zhuang,
Grace Meijuan Yang,
Chetna Malhotra,
Eric Andrew Finkelstein
Abstract:
Background: The incidence of adverse outcome events rises as patients with advanced illness approach the end of life. Exposures that tend to occur near the end of life, e.g., use of a wheelchair, oxygen therapy, and palliative care, may therefore appear to be associated with the incidence of adverse outcomes. We propose a strategy for time-to-event analysis to mitigate this time-varying confounding. Methods: We propose the concept of reverse time-to-death (rTTD) and its use as the time-scale in time-to-event analysis. To illustrate, we used data on community-based palliative care uptake (exposure) and emergency department visits (outcome) among patients with advanced cancer in Singapore. We compare the results against those of the common practice of using time-on-study (TOS) as the time-scale. Results: Graphical analysis demonstrated that cancer patients receiving palliative care had a higher rate of emergency department visits than non-recipients mainly because they were closer to the end of life, and that rTTD analysis compared patients at the same time-to-death. Analysis of emergency department visits in relation to palliative care using the TOS time-scale showed a significant increase in the hazard ratio estimate when observed time-varying covariates were omitted from statistical adjustment (change-in-estimate=0.38; 95% CI 0.15 to 0.60). There was no such change in an otherwise identical analysis using rTTD (change-in-estimate=0.04; 95% CI -0.02 to 0.11), demonstrating the ability of the rTTD time-scale to mitigate confounding that intensifies in relation to time-to-death. Conclusion: Use of rTTD as the time-scale in time-to-event analysis provides a simple and robust approach to controlling time-varying confounding in studies of advanced illness, even if the confounders are unmeasured.
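The change of time-scale itself is simple bookkeeping. The Python sketch below converts one patient's follow-up into a (start, stop] interval on either the TOS scale or an rTTD scale. rTTD is expressed as minus the time remaining until death, so that time still increases during follow-up as counting-process survival software expects. The abstract does not say how patients alive at the end of follow-up are handled, so this sketch assumes every patient's date of death is known. The names and day values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One (start, stop] risk interval in counting-process format."""
    patient_id: str
    start: float
    stop: float
    event: int  # 1 if the outcome (e.g. an ED visit) occurs at `stop`


def time_on_study(pid: str, entry: float, exit_: float, event: int) -> Interval:
    """Conventional TOS time-scale: the clock starts at study entry."""
    return Interval(pid, 0.0, exit_ - entry, event)


def reverse_time_to_death(pid: str, entry: float, exit_: float, event: int,
                          death: float) -> Interval:
    """rTTD time-scale (sketch): the clock reads minus the time remaining until death.

    Negating keeps time increasing during follow-up, as (start, stop] survival
    software expects; risk sets then align patients at equal time-to-death.
    """
    if not entry <= exit_ <= death:
        raise ValueError("expected entry <= exit <= death")
    return Interval(pid, -(death - entry), -(death - exit_), event)


# Two hypothetical decedents (days since an arbitrary origin).
a = reverse_time_to_death("A", entry=10, exit_=40, event=1, death=100)
b = reverse_time_to_death("B", entry=0, exit_=20, event=1, death=80)
print(a, b)  # both events occur 60 days before death
print(time_on_study("A", 10, 40, 1), time_on_study("B", 0, 20, 1))
```

On the TOS scale the two events fall at different times (30 and 20 days after entry). On the rTTD scale they coincide, so a Cox model fitted on this scale compares patients who are equally close to death.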
Submitted 2 July, 2024;
originally announced July 2024.
-
SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules
Authors:
Suyi Li,
Lingyun Yang,
Xiaoxiao Jiang,
Hanfeng Lu,
Zhipeng Di,
Weiyi Lu,
Jiawei Chen,
Kan Liu,
Yinghao Yu,
Tao Lan,
Guodong Yang,
Lin Qu,
Liping Zhang,
Wei Wang
Abstract:
This paper documents our characterization study and practices for serving text-to-image requests with stable diffusion models in production. We first comprehensively analyze inference request traces for commercial text-to-image applications. Our starting observation is that add-on modules, i.e., ControlNets and LoRAs, which augment the base stable diffusion models, are ubiquitous in generating images for commercial applications. Despite their efficacy, these add-on modules incur high loading overhead, prolong serving latency, and consume expensive GPU resources. Driven by our characterization study, we present SwiftDiffusion, a system that efficiently generates high-quality images using stable diffusion models and add-on modules. To achieve this, SwiftDiffusion restructures the existing text-to-image serving workflow by identifying opportunities for parallel computation and distributing ControlNet computations across multiple GPUs. Further, SwiftDiffusion thoroughly analyzes the dynamics of image generation and develops techniques to eliminate the overhead associated with LoRA loading and patching while preserving image quality. Last, SwiftDiffusion proposes specialized optimizations in the backbone architecture of the stable diffusion models, which are also compatible with the efficient serving of add-on modules. Compared to state-of-the-art text-to-image serving systems, SwiftDiffusion reduces serving latency by up to 5x and improves serving throughput by up to 2x without compromising image quality.
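To make "LoRA patching" concrete: in the standard LoRA formulation, an adapter adds a low-rank update $(\alpha/r)\,BA$ to a base weight matrix. Patching merges that update into the weights, while an unmerged forward pass computes it as a cheap side path. The NumPy sketch below shows the two are numerically equivalent. It illustrates generic LoRA arithmetic only and is not SwiftDiffusion's technique. The sizes and `alpha` value are hypothetical.

```python
import numpy as np


def lora_delta(a: np.ndarray, b: np.ndarray, alpha: float) -> np.ndarray:
    """Low-rank update (alpha / r) * B @ A with A: (r, d_in) and B: (d_out, r)."""
    r = a.shape[0]
    return (alpha / r) * (b @ a)


def patch(weight: np.ndarray, a: np.ndarray, b: np.ndarray, alpha: float) -> np.ndarray:
    """Merge ('patch') the adapter into the base weight: rewrites all d_out x d_in entries."""
    return weight + lora_delta(a, b, alpha)


def forward_unmerged(x, weight, a, b, alpha):
    """Same outputs without patching: base path plus a cheap rank-r side path."""
    r = a.shape[0]
    return x @ weight.T + (alpha / r) * ((x @ a.T) @ b.T)


rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
x = rng.normal(size=(5, d_in))

merged = patch(W, A, B, alpha=8.0)
print(np.allclose(x @ merged.T, forward_unmerged(x, W, A, B, alpha=8.0)))
# Unpatching (to swap in a different LoRA) subtracts the same delta again.
print(np.allclose(merged - lora_delta(A, B, 8.0), W))
```

Merging touches every entry of every adapted weight matrix each time a request switches LoRAs. This is the kind of loading-and-patching overhead the abstract targets.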
Submitted 2 July, 2024;
originally announced July 2024.
-
Open-TeleVision: Teleoperation with Immersive Active Visual Feedback
Authors:
Xuxin Cheng,
Jialong Li,
Shiqi Yang,
Ge Yang,
Xiaolong Wang
Abstract:
Teleoperation serves as a powerful method for collecting on-robot data essential for robot learning from demonstrations. The intuitiveness and ease of use of the teleoperation system are crucial for ensuring high-quality, diverse, and scalable data. To achieve this, we propose an immersive teleoperation system, Open-TeleVision, that allows operators to actively perceive the robot's surroundings in a stereoscopic manner. Additionally, the system mirrors the operator's arm and hand movements on the robot, creating an immersive experience as if the operator's mind were transmitted to a robot embodiment. We validate the effectiveness of our system by collecting data and training imitation learning policies on four long-horizon, precise tasks (Can Sorting, Can Insertion, Folding, and Unloading) for two different humanoid robots, and by deploying these policies in the real world. The system is open-sourced at: https://robot-tv.github.io/
Submitted 8 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Bridging Smoothness and Approximation: Theoretical Insights into Over-Smoothing in Graph Neural Networks
Authors:
Guangrui Yang,
Jianfei Li,
Ming Li,
Han Feng,
Ding-Xuan Zhou
Abstract:
In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon approximation results derived from the $K$-functional. We establish a theoretical framework to assess lower bounds on the approximation of target functions by Graph Convolutional Networks (GCNs) and examine the over-smoothing phenomenon commonly observed in these networks. We first introduce the concept of a $K$-functional on graphs, establishing its equivalence to the modulus of smoothness. We then analyze a typical type of GCN to demonstrate how the high-frequency energy of the output decays, an indicator of over-smoothing. This analysis provides theoretical insights into the nature of over-smoothing within GCNs. Furthermore, we establish a lower bound for the approximation of target functions by GCNs, governed by the modulus of smoothness of these functions. This finding offers a new perspective on the approximation capabilities of GCNs. In our numerical experiments, we analyze several widely applied GCNs and observe the phenomenon of energy decay. These observations corroborate our theoretical results on the exponential order of this decay.
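The energy decay described above can be observed on a toy graph. The NumPy sketch below repeatedly applies the standard GCN propagation matrix $\tilde D^{-1/2}(A+I)\tilde D^{-1/2}$ to random node features and tracks the Dirichlet-type energy $\mathrm{tr}(x^\top (I-P)\,x)$. Weights and nonlinearities are omitted, which is a simplification, and this is not the paper's $K$-functional analysis. The 6-node cycle graph and features are hypothetical.

```python
import numpy as np


def gcn_propagation(adj: np.ndarray) -> np.ndarray:
    """Normalized propagation matrix D~^{-1/2} (A + I) D~^{-1/2} of a standard GCN."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def high_freq_energy(x: np.ndarray, prop: np.ndarray) -> float:
    """Dirichlet-type energy trace(x^T (I - P) x): zero for the smoothest signal
    (P's eigenvalue-1 direction), larger for high-frequency signals."""
    lap = np.eye(prop.shape[0]) - prop
    return float(np.trace(x.T @ lap @ x))


# Hypothetical 6-node cycle graph with 3 random feature channels.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
P = gcn_propagation(adj)
x = np.random.default_rng(0).normal(size=(n, 3))

energies = []
for layer in range(11):
    energies.append(high_freq_energy(x, P))
    x = P @ x  # linear GCN layer: weights and nonlinearity omitted for clarity
print([round(e, 4) for e in energies])
```

Each non-constant eigencomponent is multiplied by its eigenvalue $\lambda$ with $|\lambda| < 1$ at every layer, so the energy shrinks roughly like $\lambda^{2k}$. This is an exponential decay toward an over-smoothed output.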
Submitted 1 July, 2024;
originally announced July 2024.
-
Data-driven methods for flow and transport in porous media: a review
Authors:
Guang Yang,
Ran Xu,
Yusong Tian,
Songyuan Guo,
Jingyi Wu,
Xu Chu
Abstract:
This review examined current advancements in data-driven methods for analyzing flow and transport in porous media, which have various applications in energy, chemical engineering, environmental science, and beyond. Despite recent progress, current experimental and high-fidelity numerical approaches still face challenges, such as high computational costs and difficulty in accurately representing complex, heterogeneous structures, which state-of-the-art data-driven methods could potentially address. We analyzed the synergistic potential of these methods, addressed their limitations, and suggested how they can be effectively integrated to improve both the fidelity and efficiency of current research. We also discussed future research directions in this field, emphasizing the need for collaborative efforts that combine domain expertise in physics with advanced computational and data-driven methodologies.
Submitted 28 June, 2024;
originally announced June 2024.
-
CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI
Authors:
Zi Wang,
Fanwen Wang,
Chen Qin,
Jun Lyu,
Ouyang Cheng,
Shuo Wang,
Yan Li,
Mengyao Yu,
Haoyu Zhang,
Kunyuan Guo,
Zhang Shi,
Qirong Li,
Ziqiang Xu,
Yajing Zhang,
Hao Li,
Sha Hua,
Binghua Chen,
Longyu Sun,
Mengting Sun,
Qin Li,
Ying-Hua Chu,
Wenjia Bai,
Jing Qin,
Xiahai Zhuang,
Claudia Prieto
, et al. (7 additional authors not shown)
Abstract:
Cardiac magnetic resonance imaging (MRI) has emerged as a clinical gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information through multiple modalities and anatomical views. Accelerated cardiac MRI is expected to enable time-efficient and patient-friendly imaging, which in turn requires advanced image reconstruction approaches to recover high-quality, clinically interpretable images from undersampled measurements. However, the lack of publicly available cardiac MRI k-space datasets of sufficient quantity and diversity has severely hindered substantial technological progress, particularly for data-driven artificial intelligence. Here, we provide a standardized, diverse, and high-quality CMRxRecon2024 dataset to facilitate the technical development, fair evaluation, and clinical transfer of cardiac MRI reconstruction approaches, and to promote universal frameworks that enable fast and robust reconstruction across different cardiac MRI protocols in clinical practice. To the best of our knowledge, the CMRxRecon2024 dataset is the largest and most diverse publicly available cardiac k-space dataset. It was acquired from 330 healthy volunteers and covers commonly used modalities, anatomical views, and acquisition trajectories in clinical cardiac MRI workflows. In addition, an open platform with tutorials, benchmarks, and data processing tools is provided to facilitate data usage, advanced method development, and fair performance evaluation.
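To make "recovering images from undersampled measurements" concrete, the NumPy sketch below simulates single-coil Cartesian undersampling of k-space and the zero-filled inverse-FFT reconstruction, the simple baseline that learned reconstruction methods aim to improve on. CMRxRecon2024 itself contains multi-coil data and several trajectories. The mask layout, the acceleration factor, and the phantom here are hypothetical and unrelated to the dataset's own tooling.

```python
import numpy as np


def cartesian_mask(ny: int, acceleration: int, center_lines: int) -> np.ndarray:
    """Keep every `acceleration`-th phase-encode line plus a fully sampled center."""
    mask = np.zeros(ny, dtype=bool)
    mask[::acceleration] = True
    c = ny // 2
    mask[c - center_lines // 2 : c + center_lines // 2] = True
    return mask


def zero_filled_recon(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Simulate single-coil Cartesian undersampling and reconstruct by inverse FFT,
    treating unsampled k-space lines as zeros (the usual baseline)."""
    kspace = np.fft.fftshift(np.fft.fft2(image))  # centered k-space
    undersampled = kspace * mask[:, None]          # drop phase-encode rows
    return np.abs(np.fft.ifft2(np.fft.ifftshift(undersampled)))


# Hypothetical 64 x 64 phantom: a bright rectangle on a dark background.
img = np.zeros((64, 64))
img[20:44, 24:40] = 1.0
for accel in (1, 4):
    m = cartesian_mask(64, accel, center_lines=8)
    err = np.linalg.norm(zero_filled_recon(img, m) - img) / np.linalg.norm(img)
    print(f"R={accel}: sampled {m.mean():.0%} of lines, relative error {err:.3f}")
```

With full sampling the reconstruction is exact. At fourfold acceleration, the missing lines produce aliasing that a learned reconstruction, trained on data such as this dataset, must remove.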
Submitted 27 June, 2024;
originally announced June 2024.
-
The neutron array of the compact spectrometer for heavy ion experiments in Fermi energy region
Authors:
Dawei Si,
Sheng Xiao,
Yuhao Qin,
Yijie Wang,
Junhuai Xu,
Baiting Tian,
Boyuan Zhang,
Dong Guo,
Qin Zhi,
Xiaobao Wei,
Yibo Hao,
Zengxiang Wang,
Tianren Zhuo,
Yuansheng Yang,
Xianglun Wei,
Herun Yang,
Peng Ma,
Limin Duan,
Fangfang Duan,
Junbing Ma,
Shiwei Xu,
Zhen Bai,
Guo Yang,
Yanyun Yang,
Zhigang Xiao
Abstract:
The emission of neutrons from heavy ion reactions is an important observable for studying the asymmetric nuclear equation of state and the reaction dynamics. A 20-unit neutron array has been developed and mounted on the compact spectrometer for heavy ion experiments (CSHINE) to measure neutron spectra and neutron-neutron and neutron-proton correlation functions. Each unit consists of a $\rm 15\times 15\times 15~cm^3$ plastic scintillator coupled to a $\phi = 52~\rm mm$ photomultiplier. Geant4 simulations including optical processes are performed to investigate the time resolution and the neutron detection efficiency. An intrinsic time resolution of 212 ps is obtained from a cosmic-ray coincidence test. The n-$\gamma$ discrimination and time-of-flight performance are characterized with a $\rm ^{252}Cf$ radioactive source test and a beam test. Neutron energy spectra have been obtained in the angular range $30^\circ \le \theta_{\rm lab} \le 51^\circ$ in the beam experiment of $^{124}$Sn+$^{124}$Sn at 25 MeV/u with CSHINE.
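The neutron energy spectra follow from time of flight. The Python sketch below converts a flight path and time of flight into relativistic kinetic energy. It also propagates a timing uncertainty into an energy resolution, $\sigma_E/E = \gamma(\gamma+1)\,\sigma_t/t$, which reduces to $2\sigma_t/t$ for slow neutrons. The 1.5 m flight path is hypothetical, not the CSHINE geometry. The estimate also ignores start-time uncertainty and the flight-path spread from the 15 cm scintillator depth.

```python
import math

NEUTRON_REST_ENERGY_MEV = 939.565  # m_n c^2
C_M_PER_NS = 0.299792458           # speed of light in m/ns


def neutron_kinetic_energy(flight_path_m: float, tof_ns: float) -> float:
    """Relativistic kinetic energy (MeV) from flight path and time of flight."""
    beta = flight_path_m / (tof_ns * C_M_PER_NS)
    if not 0.0 < beta < 1.0:
        raise ValueError("unphysical flight path or time of flight")
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return NEUTRON_REST_ENERGY_MEV * (gamma - 1.0)


def relative_energy_resolution(flight_path_m: float, tof_ns: float,
                               sigma_t_ns: float) -> float:
    """sigma_E / E from timing jitter alone: gamma * (gamma + 1) * sigma_t / t.

    Reduces to 2 * sigma_t / t in the non-relativistic limit.
    """
    beta = flight_path_m / (tof_ns * C_M_PER_NS)
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (gamma + 1.0) * sigma_t_ns / tof_ns


# Hypothetical 1.5 m flight path; timing jitter set to the 212 ps quoted above.
for tof in (30.0, 60.0, 120.0):
    e = neutron_kinetic_energy(1.5, tof)
    res = relative_energy_resolution(1.5, tof, 0.212)
    print(f"t={tof:5.1f} ns  E={e:6.2f} MeV  sigma_E/E={res:.2%}")
```

Faster (higher-energy) neutrons have shorter flight times, so a fixed timing jitter costs proportionally more energy resolution for them.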
Submitted 20 June, 2024;
originally announced June 2024.
-
Negative Prototypes Guided Contrastive Learning for WSOD
Authors:
Yu Zhang,
Chuang Zhu,
Guoqing Yang,
Siqi Chen
Abstract:
Weakly Supervised Object Detection (WSOD) with only image-level annotation has recently attracted wide attention. Many existing methods ignore the inter-image relationship of instances that share similar characteristics but can be determined with certainty not to belong to the same category. Therefore, to make full use of the weak labels, we propose the Negative Prototypes Guided Contrastive learning (NPGC) architecture. First, we define a Negative Prototype as the highest-confidence proposal that is misclassified into a category that does not appear in the image label. Unlike other methods that only utilize positive category features, we construct an online-updated global feature bank to store both positive and negative prototypes. Meanwhile, we propose a pseudo-label sampling module that mines reliable instances and discards easily misclassified ones based on their feature similarity to the corresponding prototypes in the global feature bank. Finally, we follow the contrastive learning paradigm to optimize the proposals' feature representations by pulling samples of the same class closer together and pushing samples of different classes apart in the embedding space. Extensive experiments on the VOC07 and VOC12 datasets show that our proposed method achieves state-of-the-art performance.
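Pulling a proposal toward its own class and pushing it away from others is often written as an InfoNCE-style loss over prototypes. A minimal sketch follows, assuming cosine similarity, a temperature `tau`, and a plain prototype matrix. It is a generic formulation, not NPGC's exact objective; how negative prototypes from the feature bank enter the denominator is an assumption here (every non-target row simply acts as a negative).

```python
import numpy as np

def prototype_contrastive_loss(z, prototypes, label, tau=0.1):
    """InfoNCE-style loss: -log softmax over cosine similarities to each prototype.

    z: (d,) proposal embedding; prototypes: (K, d), one row per prototype;
    label: index of the target prototype. All other rows act as negatives.
    """
    z = z / np.linalg.norm(z)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = (p @ z) / tau
    logits = logits - logits.max()  # shift for numerical stability; softmax is unchanged
    log_prob = logits[label] - np.log(np.exp(logits).sum())
    return -log_prob

prototypes = np.eye(3)          # three toy prototypes
z = np.array([0.9, 0.1, 0.0])   # embedding close to prototype 0
print(prototype_contrastive_loss(z, prototypes, label=0))  # small: z matches its prototype
print(prototype_contrastive_loss(z, prototypes, label=1))  # large: z is far from prototype 1
```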
Submitted 4 June, 2024;
originally announced June 2024.
-
Decoding Decision Reasoning: A Counterfactual-Powered Model for Knowledge Discovery
Authors:
Yingying Fang,
Zihao Jin,
Xiaodan Xing,
Simon Walsh,
Guang Yang
Abstract:
In medical imaging, particularly in early disease detection and prognosis tasks, discerning the rationale behind an AI model's predictions is crucial for evaluating the reliability of its decisions. Conventional explanation methods struggle to identify discernible decisive features in medical image classification, where discriminative features are subtle or not immediately apparent. To bridge this gap, we propose an explainable model equipped with both decision reasoning and feature identification capabilities. Our approach not only detects influential image patterns but also uncovers the decisive features that drive the model's final predictions. With our method, we can efficiently identify and visualise class-specific features leveraged by the data-driven model, providing insights into the decision-making processes of deep learning models. We validated our model on demanding medical prognosis tasks, demonstrating its efficacy and its potential to enhance the reliability of AI in healthcare and to discover new knowledge about diseases where prognostic understanding is limited.
Submitted 23 May, 2024;
originally announced June 2024.
-
SimsChat: A Customisable Persona-Driven Role-Playing Agent
Authors:
Bohao Yang,
Dong Liu,
Chen Tang,
Chenghao Xiao,
Kun Zhao,
Chao Li,
Lin Yuan,
Guang Yang,
Lanxiao Huang,
Chenghua Lin
Abstract:
Large Language Models (LLMs) possess the remarkable capability to understand human instructions and generate high-quality text, enabling them to act as agents that simulate human behaviours. This capability allows LLMs to emulate human beings in a more advanced manner, beyond merely replicating simple human behaviours. However, little work has explored leveraging LLMs to craft characters from multiple aspects. In this work, we introduce the Customisable Conversation Agent Framework, which employs LLMs to simulate real-world characters that can be freely customised according to different user preferences. The framework supports the design of customisable characters and role-playing agents according to human preferences. We first propose the SimsConv dataset, which comprises 68 different customised characters and 1,360 multi-turn role-playing dialogues, encompassing 13,971 interaction dialogues in total. The characters are created from several real-world elements, such as career, aspiration, trait, and skill. Building on these foundations, we present SimsChat, a freely customisable role-playing agent. It incorporates different real-world scenes and topic-specific character interaction dialogues, simulating characters' life experiences in various scenarios and topic-specific interactions with specific emotions. Experimental results show that our proposed framework achieves desirable performance and provides helpful guidelines for building better simulacra of human beings in the future. Our data and code are available at https://github.com/Bernard-Yang/SimsChat.
Submitted 30 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
DiffusionPDE: Generative PDE-Solving Under Partial Observation
Authors:
Jiahe Huang,
Guandao Yang,
Zichen Wang,
Jeong Joon Park
Abstract:
We introduce a general framework for solving partial differential equations (PDEs) using generative diffusion models. In particular, we focus on scenarios where we do not have the full knowledge of the scene necessary to apply classical solvers. Most existing forward or inverse PDE approaches perform poorly when the observations of the data or the underlying coefficients are incomplete, which is common for real-world measurements. In this work, we propose DiffusionPDE, which can simultaneously fill in the missing information and solve a PDE by modeling the joint distribution of the solution and coefficient spaces. We show that the learned generative priors lead to a versatile framework for accurately solving a wide range of PDEs under partial observation, significantly outperforming state-of-the-art methods in both the forward and inverse directions.
Submitted 25 June, 2024;
originally announced June 2024.
-
Diff3Dformer: Leveraging Slice Sequence Diffusion for Enhanced 3D CT Classification with Transformer Networks
Authors:
Zihao Jin,
Yingying Fang,
Jiahao Huang,
Caiwen Xu,
Simon Walsh,
Guang Yang
Abstract:
The symptoms associated with lung diseases can manifest at different depths in different patients, highlighting the significance of 3D information in CT scans for medical image classification. While Vision Transformers (ViTs) have shown superior performance over convolutional neural networks in image classification tasks, their effectiveness is often demonstrated on sufficiently large 2D datasets, and they easily overfit on small medical image datasets. To address this limitation, we propose a Diffusion-based 3D Vision Transformer (Diff3Dformer), which utilizes the latent space of the diffusion model to form the slice sequence for 3D analysis and incorporates clustering attention into ViT to aggregate repetitive information within 3D CT scans, thereby harnessing the power of the advanced transformer in 3D classification tasks on small datasets. Our method exhibits improved performance on two small 3D lung CT datasets of different scales, surpassing state-of-the-art 3D methods and other transformer-based approaches that emerged during the COVID-19 pandemic, demonstrating robust and superior performance across different scales of data. Experimental results underscore the superiority of our proposed method, indicating its potential for enhancing medical image classification tasks in real-world scenarios.
Submitted 26 June, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
Fuzzy Attention-based Border Rendering Network for Lung Organ Segmentation
Authors:
Sheng Zhang,
Yang Nan,
Yingying Fang,
Shiyi Wang,
Xiaodan Xing,
Zhifan Gao,
Guang Yang
Abstract:
Automatic lung organ segmentation on CT images is crucial for lung disease diagnosis. However, the unlimited voxel values and class imbalance of lung organs can lead to false-negative/positive and leakage issues in advanced methods. Additionally, some slender lung organs, e.g., bronchioles and arterioles, are easily lost during the recycled down/up-sampling procedure, causing severe discontinuity issues. Motivated by these issues, this paper introduces an effective lung organ segmentation method called the Fuzzy Attention-based Border Rendering (FABR) network. Since fuzzy logic can handle the uncertainty in feature extraction, the fusion of deep networks and fuzzy sets should be a viable route to better performance. Meanwhile, unlike prior top-tier methods that operate on all regular dense points, our FABR depicts lung organ regions as cube-trees, focusing only on recycle-sampled border vulnerable points and rendering the severely discontinuous, false-negative/positive organ regions with a novel Global-Local Cube-tree Fusion (GLCF) module. Experimental results on four challenging airway and artery datasets demonstrate that our method can achieve significantly favorable performance.
Submitted 1 July, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
Authors:
Yakun Song,
Zhuo Chen,
Xiaofei Wang,
Ziyang Ma,
Guanrou Yang,
Xie Chen
Abstract:
Neural codec language models (LMs) have demonstrated strong capability in zero-shot text-to-speech (TTS) synthesis. However, codec LMs often suffer from limited inference speed and stability, due to their auto-regressive nature and the implicit alignment between text and audio. In this work, to handle these challenges, we introduce a new variant of neural codec LM, namely TacoLM. Specifically, TacoLM introduces a gated attention mechanism to improve training and inference efficiency and reduce the model size. Meanwhile, an additional gated cross-attention layer is included in each decoder layer, which improves the efficiency and content accuracy of the synthesized speech. In the evaluation on the LibriSpeech corpus, the proposed TacoLM achieves better word error rate, speaker similarity, and mean opinion score than VALL-E, with 90% fewer parameters and a 5.2 times speedup. Demo and code are available at https://ereboas.github.io/TacoLM/.
Submitted 22 June, 2024;
originally announced June 2024.
-
DiffExplainer: Unveiling Black Box Models Via Counterfactual Generation
Authors:
Yingying Fang,
Shuang Wu,
Zihao Jin,
Caiwen Xu,
Shiyi Wang,
Simon Walsh,
Guang Yang
Abstract:
In the field of medical imaging, particularly in tasks related to early disease detection and prognosis, understanding the reasoning behind AI model predictions is imperative for assessing their reliability. Conventional explanation methods struggle to identify decisive features in medical image classification, especially when discriminative features are subtle or not immediately evident. To address this limitation, we propose an agent model capable of generating counterfactual images that prompt different decisions when plugged into a black box model. By employing this agent model, we can uncover influential image patterns that impact the black box model's final predictions. Through our methodology, we efficiently identify features that influence the decisions of the deep black box model. We validated our approach in the rigorous domain of medical prognosis tasks, showcasing its efficacy and potential to enhance the reliability of deep learning models in medical image classification compared to existing interpretation methods. The code will be publicly available at https://github.com/ayanglab/DiffExplainer.
Submitted 26 June, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
-
LayerMatch: Do Pseudo-labels Benefit All Layers?
Authors:
Chaoqi Liang,
Guanglei Yang,
Lifeng Qiao,
Zitong Huang,
Hongliang Yan,
Yunchao Wei,
Wangmeng Zuo
Abstract:
Deep neural networks have achieved remarkable performance across various tasks when supplied with large-scale labeled data. However, collecting labeled data can be time-consuming and labor-intensive. Semi-supervised learning (SSL), particularly through pseudo-labeling algorithms that iteratively assign pseudo-labels for self-training, offers a promising solution to mitigate the dependency on labeled data. Previous research generally applies a uniform pseudo-labeling strategy across all model layers, assuming that pseudo-labels exert uniform influence throughout. In contrast, our theoretical analysis and empirical experiments demonstrate that the feature extraction layer and the linear classification layer have distinct learning behaviors in response to pseudo-labels. Based on these insights, we develop two layer-specific pseudo-label strategies, termed Grad-ReLU and Avg-Clustering. Grad-ReLU mitigates the impact of noisy pseudo-labels by removing the detrimental gradient effects of pseudo-labels in the linear classification layer. Avg-Clustering accelerates the convergence of the feature extraction layer towards stable clustering centers by integrating consistent outputs. Our approach, LayerMatch, which integrates these two strategies, can avoid severe interference from noisy pseudo-labels in the linear classification layer while accelerating the clustering capability of the feature extraction layer. Through extensive experimentation, our approach consistently demonstrates exceptional performance on standard semi-supervised learning benchmarks, achieving a significant improvement of 10.38% over the baseline method and a 2.44% increase compared to state-of-the-art methods.
Submitted 27 June, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Groupwise Deformable Registration of Diffusion Tensor Cardiovascular Magnetic Resonance: Disentangling Diffusion Contrast, Respiratory and Cardiac Motions
Authors:
Fanwen Wang,
Yihao Luo,
Ke Wen,
Jiahao Huang,
Pedro F. Ferreira,
Yaqing Luo,
Yinzhe Wu,
Camila Munoz,
Dudley J. Pennell,
Andrew D. Scott,
Sonia Nielles-Vallespin,
Guang Yang
Abstract:
Diffusion tensor based cardiovascular magnetic resonance (DT-CMR) offers a non-invasive method to visualize the myocardial microstructure. Under the assumption that the heart is stationary, frames are acquired with multiple repetitions for different diffusion encoding directions. However, motion from poor breath-holding and imprecise cardiac triggering complicates DT-CMR analysis, which is further challenged by its inherently low SNR, varied contrasts, and diffusion-induced textures. Our solution is a novel framework employing groupwise registration with an implicit template to isolate respiratory and cardiac motions, while a tensor-embedded branch preserves diffusion contrast textures. We have devised a loss refinement tailored for non-linear least squares fitting and low SNR conditions. Additionally, we introduce new physics-based and clinical metrics for performance evaluation. Code and supplementary materials are available at: https://github.com/ayanglab/DTCMR-Reg
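The abstract refers to fitting diffusion tensors by non-linear least squares. For orientation, the underlying signal model is $S = S_0 \exp(-b\, \mathbf{g}^\top D \mathbf{g})$. A minimal sketch of the simpler log-linear least-squares fit follows; this is a swapped-in technique, not the authors' non-linear fit or loss refinement, and it assumes one non-weighted reference $S_0$, a single hypothetical b-value, six non-collinear unit directions, and noise-free synthetic signals.

```python
import numpy as np

def fit_tensor_loglinear(signals, s0, bvals, bvecs):
    """Log-linear least-squares diffusion tensor fit.

    ln(S/S0) = -b * g^T D g is linear in (Dxx, Dyy, Dzz, Dxy, Dxz, Dyz).
    """
    g = np.asarray(bvecs, dtype=float)
    b = np.asarray(bvals, dtype=float)
    design = -b[:, None] * np.column_stack([
        g[:, 0] ** 2, g[:, 1] ** 2, g[:, 2] ** 2,
        2 * g[:, 0] * g[:, 1], 2 * g[:, 0] * g[:, 2], 2 * g[:, 1] * g[:, 2],
    ])
    y = np.log(np.asarray(signals, dtype=float) / s0)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    dxx, dyy, dzz, dxy, dxz, dyz = coef
    return np.array([[dxx, dxy, dxz], [dxy, dyy, dyz], [dxz, dyz, dzz]])

# Hypothetical noise-free example: six unit directions, b = 500 s/mm^2.
s = 1 / np.sqrt(2)
bvecs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [s, s, 0], [s, 0, s], [0, s, s]])
bvals = np.full(6, 500.0)
d_true = np.array([[1.5e-3, 1e-4, 0.0], [1e-4, 0.8e-3, 0.0], [0.0, 0.0, 0.5e-3]])  # mm^2/s
signals = 1000.0 * np.exp(-bvals * np.einsum("ni,ij,nj->n", bvecs, d_true, bvecs))
d_fit = fit_tensor_loglinear(signals, 1000.0, bvals, bvecs)
print(np.linalg.eigvalsh(d_fit))  # negative eigenvalues would flag a non-physical fit
```

With noisy, low-SNR data the log transform distorts the noise distribution, which is one reason non-linear fitting is often preferred.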
Submitted 3 July, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Low-rank based motion correction followed by automatic frame selection in DT-CMR
Authors:
Fanwen Wang,
Pedro F. Ferreira,
Camila Munoz,
Ke Wen,
Yaqing Luo,
Jiahao Huang,
Yinzhe Wu,
Dudley J. Pennell,
Andrew D. Scott,
Sonia Nielles-Vallespin,
Guang Yang
Abstract:
Motivation: Post-processing of in-vivo diffusion tensor CMR (DT-CMR) is challenging due to the low SNR and the variation in contrast between frames, which make image registration difficult, and due to the need to manually reject frames corrupted by motion. Goals: To develop a semi-automatic post-processing pipeline for robust DT-CMR registration and automatic frame selection. Approach: We used low-intrinsic-rank averaged frames as the reference to register the other low-rank frames. A myocardium-guided frame selection step rejected frames with signal loss, through-plane motion, or poor registration. Results: The proposed method outperformed our previous noise-robust rigid registration in terms of helix angle data quality and reduced the number of negative eigenvalues in healthy volunteers.
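A low-rank approximation of a frame series is commonly computed with a truncated SVD of the Casorati matrix (one vectorized frame per row). A minimal sketch follows, assuming a stack of frames on a common grid and a hypothetical rank of 2; the rank selection, registration step, and frame-selection criteria of the described pipeline are not reproduced here.

```python
import numpy as np

def low_rank_frames(frames, rank):
    """Truncated-SVD approximation of a frame stack of shape (n_frames, H, W)."""
    n, h, w = frames.shape
    casorati = frames.reshape(n, h * w)            # one vectorized frame per row
    u, s, vt = np.linalg.svd(casorati, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]  # keep the top-`rank` components
    return approx.reshape(n, h, w)

rng = np.random.default_rng(1)
frames = rng.random((12, 32, 32))   # stand-in for 12 diffusion-weighted frames
lr = low_rank_frames(frames, rank=2)
reference = lr.mean(axis=0)         # hypothetical registration target: mean of low-rank frames
print(reference.shape)
```

Keeping only a few components suppresses frame-specific noise and contrast changes, which is why a low-rank average can serve as a steadier registration reference than any single raw frame.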
Submitted 19 June, 2024;
originally announced June 2024.
-
The association of domain-specific physical activity and sedentary activity with stroke: A prospective cohort study
Authors:
Xinyi He,
Shidi Wang,
Yi Li,
Jiucun Wang,
Guangrui Yang,
Jun Chen,
Zixin Hu
Abstract:
Background: The incidence of stroke places a heavy burden on both society and individuals. Activity is closely related to cardiovascular health. This study aimed to investigate the relationship of stroke with the various domains of physical activity (PA), namely occupation-related physical activity (OPA), transportation-related physical activity (TPA), and leisure-time physical activity (LTPA), as well as with sedentary activity (SA). Methods: Our analysis included 30,400 participants aged 20 years or older from the 2007 to 2018 National Health and Nutrition Examination Survey (NHANES). Stroke was identified based on participants' self-reported diagnoses from previous medical consultations, and PA and SA were self-reported. Multivariable logistic and restricted cubic spline models were used to assess the associations. Results: Participants meeting PA guidelines (performing PA for more than 150 min/week) were 35.7% less likely to have a stroke based on both total PA (odds ratio [OR] 0.643, 95% confidence interval [CI] 0.523-0.790) and LTPA (OR 0.643, 95% CI 0.514-0.805), while OPA and TPA did not show a lower stroke risk. Furthermore, participants with less than 7.5 h/day of SA were 21.6% (OR 0.784, 95% CI 0.665-0.925) less likely to have a stroke. The intensities of total PA and LTPA exhibited nonlinear U-shaped associations with stroke risk. In contrast, those of OPA and TPA showed negative linear associations, while SA intensity was positively and linearly correlated with stroke risk. Conclusions: LTPA, but not OPA or TPA, was associated with a lower risk of stroke at any amount, suggesting that cardiovascular health would benefit significantly from increased PA. Additionally, the positive association between SA and stroke indicated that prolonged sitting was detrimental to cardiovascular health. Overall, increased PA within a reasonable range reduces the risk of stroke, while increased SA elevates it.
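The percentages quoted in the results follow from the odds ratios as $(1 - \mathrm{OR}) \times 100$. Strictly, this is a reduction in the odds of stroke, which approximates a reduction in risk only when the outcome is rare. A minimal sketch of the arithmetic using the reported point estimates and confidence intervals:

```python
def percent_lower_odds(odds_ratio):
    """Percent reduction in odds implied by an odds ratio (positive when OR < 1)."""
    return (1.0 - odds_ratio) * 100.0

# Point estimates and 95% CIs as reported in the abstract: (OR, CI low, CI high).
reported = {
    "total PA guideline": (0.643, 0.523, 0.790),
    "LTPA guideline": (0.643, 0.514, 0.805),
    "SA < 7.5 h/day": (0.784, 0.665, 0.925),
}
for label, (or_point, ci_low, ci_high) in reported.items():
    # The upper OR bound gives the smallest reduction, the lower OR bound the largest.
    print(f"{label}: {percent_lower_odds(or_point):.1f}% lower odds "
          f"(95% CI {percent_lower_odds(ci_high):.1f}% to {percent_lower_odds(ci_low):.1f}%)")
```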
Submitted 19 June, 2024;
originally announced June 2024.
-
Reparameterizable Dual-Resolution Network for Real-time Semantic Segmentation
Authors:
Guoyu Yang,
Yuan Wang,
Daming Shi
Abstract:
Semantic segmentation plays a key role in applications such as autonomous driving and medical imaging. Although existing real-time semantic segmentation models achieve a commendable balance between accuracy and speed, their multi-path blocks still limit overall speed. To address this issue, this study proposes a Reparameterizable Dual-Resolution Network (RDRNet) dedicated to real-time semantic segmentation. Specifically, RDRNet employs a two-branch architecture, utilizing multi-path blocks during training and reparameterizing them into single-path blocks during inference, thereby enhancing both accuracy and inference speed simultaneously. Furthermore, we propose the Reparameterizable Pyramid Pooling Module (RPPM) to enhance the feature representation of the pyramid pooling module without increasing its inference time. Experimental results on the Cityscapes, CamVid, and Pascal VOC 2012 datasets demonstrate that RDRNet outperforms existing state-of-the-art models in terms of both performance and speed. The code is available at https://github.com/gyyang23/RDRNet.
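Training-time multi-path blocks can be collapsed into a single path at inference because convolution is linear: a parallel 1x1 branch and an identity branch can be folded into the center tap of a 3x3 kernel. A minimal single-channel sketch of this RepVGG-style structural reparameterization follows; it omits batch-norm folding and multi-channel kernels, and it illustrates the general technique rather than RDRNet's exact block design.

```python
import numpy as np

def conv2d_same(x, k):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def merge_branches(k3, k1, identity=True):
    """Fold a parallel 1x1 branch (and optionally an identity branch) into one 3x3 kernel."""
    merged = np.array(k3, dtype=float)  # copy so the training kernel is untouched
    merged[1, 1] += k1[0, 0]            # a 1x1 kernel equals a 3x3 kernel with only its center set
    if identity:
        merged[1, 1] += 1.0             # the identity map equals a 3x3 kernel with center 1
    return merged

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))
k1 = rng.standard_normal((1, 1))

train_out = conv2d_same(x, k3) + conv2d_same(x, k1) + x  # three parallel paths
infer_out = conv2d_same(x, merge_branches(k3, k1))       # one merged 3x3 path
print(np.max(np.abs(train_out - infer_out)))             # only floating-point rounding remains
```

The merged kernel produces the same output with one convolution, which is where the inference-time speedup of single-path blocks comes from.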
Submitted 18 June, 2024;
originally announced June 2024.
-
MegaScenes: Scene-Level View Synthesis at Scale
Authors:
Joseph Tung,
Gene Chou,
Ruojin Cai,
Guandao Yang,
Kai Zhang,
Gordon Wetzstein,
Bharath Hariharan,
Noah Snavely
Abstract:
Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications. Recently, pose-conditioned diffusion models have led to significant progress by extracting 3D information from 2D foundation models, but these methods are limited by the lack of scene-level training data. Common dataset choices either consist of isolated objects (Objaverse), or of object-centric scenes with limited pose distributions (DTU, CO3D). In this paper, we create a large-scale scene-level dataset from Internet photo collections, called MegaScenes, which contains over 100K structure from motion (SfM) reconstructions from around the world. Internet photos represent a scalable data source but come with challenges such as lighting and transient objects. We address these issues to further create a subset suitable for the task of NVS. Additionally, we analyze failure cases of state-of-the-art NVS methods and significantly improve generation consistency. Through extensive experiments, we validate the effectiveness of both our dataset and method on generating in-the-wild scenes. For details on the dataset and code, see our project page at https://megascenes.github.io .
Submitted 17 June, 2024;
originally announced June 2024.
-
Harmonizing Feature Maps: A Graph Convolutional Approach for Enhancing Adversarial Robustness
Authors:
Kejia Zhang,
Juanjuan Weng,
Junwei Wu,
Guoqing Yang,
Shaozi Li,
Zhiming Luo
Abstract:
The vulnerability of Deep Neural Networks to adversarial perturbations presents significant security concerns, as the imperceptible perturbations can contaminate the feature space and lead to incorrect predictions. Recent studies have attempted to calibrate contaminated features by either suppressing or over-activating particular channels. Despite these efforts, we claim that adversarial attacks exhibit varying disruption levels across individual channels. Furthermore, we argue that harmonizing feature maps via a graph and employing graph convolution can calibrate contaminated features. To this end, we introduce an innovative plug-and-play module called Feature Map-based Reconstructed Graph Convolution (FMR-GC). FMR-GC harmonizes feature maps in the channel dimension to reconstruct the graph, then employs graph convolution to capture neighborhood information, effectively calibrating contaminated features. Extensive experiments have demonstrated the superior performance and scalability of FMR-GC. Moreover, our model can be combined with advanced adversarial training methods to considerably enhance robustness without compromising the model's clean accuracy.
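One common way to build a graph over channels is to treat each channel's feature map as a node, weight edges by feature similarity, and propagate with a symmetrically normalized adjacency, as in standard graph convolution. A minimal sketch under those assumptions follows (non-negative cosine similarity as edge weights, self-loops, and no learnable weight matrix). It is a generic illustration, not the FMR-GC module.

```python
import numpy as np

def channel_graph_propagate(features):
    """One normalized graph-propagation step with channels as nodes.

    features: (C, H, W) feature map. Edge weights are non-negative cosine
    similarities between flattened channel maps; A_hat = D^-1/2 (A) D^-1/2.
    """
    c = features.shape[0]
    x = features.reshape(c, -1)
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    adj = np.clip(xn @ xn.T, 0.0, None)  # similarity graph between channels
    np.fill_diagonal(adj, 1.0)           # self-loops keep each channel's own signal
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    a_hat = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return (a_hat @ x).reshape(features.shape)  # each channel mixes with similar channels

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8, 8))  # hypothetical (channels, H, W) feature map
smoothed = channel_graph_propagate(feat)
print(smoothed.shape)
```

A learned module would typically add a trainable transform after the propagation; it is left out here to keep the graph step itself visible.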
Submitted 17 June, 2024;
originally announced June 2024.
-
Ultra-low noise laser and optical frequency comb-based timing system for the Black Hole Explorer (BHEX) mission
Authors:
Hannah Tomio,
Guangning Yang,
Holly F. Leopardi,
Kenji Numata,
Anthony W. Yu,
Andrew Attar,
Xiaozhen Xu,
Wei Lu,
Cheryl Gramling,
T. K. Sridharan,
Peter Kurczynski
Abstract:
In this effort, we demonstrate the performance of a highly stable time reference for the proposed Black Hole Explorer (BHEX) mission, a space-based extension of the Event Horizon Telescope (EHT) Very Long Baseline Interferometry (VLBI) project. This precision timing system uses a space-qualified, ultra-low noise laser, developed as part of the Laser Interferometer Space Antenna (LISA) mission, as the timing reference, and an optical frequency comb to transfer the stability of this laser to the microwave regime for instrumentation use. We describe the implementation of this system and the experimental setup used to characterize its stability. We present experimental results demonstrating that the performance of this system meets the requirements of the BHEX mission.
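The optical-to-microwave transfer relies on the standard frequency-comb relation between optical mode frequencies and the repetition rate. As a sketch of the general principle (the BHEX system's specific locking scheme is not described in the abstract), assume comb mode $n$ is phase-locked to the laser frequency $\nu_L$ with beat frequency $f_b$, and $f_0$ is the carrier-envelope offset frequency:

```latex
\nu_n = n\, f_{\mathrm{rep}} + f_0,
\qquad
f_{\mathrm{rep}} = \frac{\nu_L - f_0 - f_b}{n},
\qquad
\frac{\delta f_{\mathrm{rep}}}{f_{\mathrm{rep}}} \approx \frac{\delta \nu_L}{\nu_L}
\quad \text{(with } f_0, f_b \text{ stabilized and } f_0 + f_b \ll \nu_L\text{)}
```

The laser's fractional frequency stability is thus divided down to the microwave repetition rate and its harmonics.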
Submitted 14 June, 2024;
originally announced June 2024.
-
Low-Overhead Channel Estimation via 3D Extrapolation for TDD mmWave Massive MIMO Systems Under High-Mobility Scenarios
Authors:
Binggui Zhou,
Xi Yang,
Shaodan Ma,
Feifei Gao,
Guanghua Yang
Abstract:
In TDD mmWave massive MIMO systems, the downlink CSI can be obtained through uplink channel estimation thanks to uplink-downlink channel reciprocity. However, channel aging is significant under high-mobility scenarios and thus necessitates frequent uplink channel estimation. In addition, large numbers of antennas and subcarriers lead to high-dimensional CSI matrices, aggravating the pilot training overhead. To systematically reduce the pilot overhead, a spatial, frequency, and temporal domain (3D) channel extrapolation framework is proposed in this paper. Considering the marginal effects of pilots in the spatial and frequency domains and the effectiveness of traditional knowledge-driven channel estimation methods, we first propose a knowledge-and-data driven spatial-frequency channel extrapolation network (KDD-SFCEN) for uplink channel estimation, which exploits the least squares estimator for coarse channel estimation and joint spatial-frequency channel extrapolation to reduce the spatial-frequency domain pilot overhead. Then, leveraging uplink-downlink channel reciprocity and the temporal dependencies of downlink channels, a temporal uplink-downlink channel extrapolation network (TUDCEN) is proposed for slot-level channel extrapolation, aiming to lengthen the pilot signal period and thus reduce the temporal domain pilot overhead under high-mobility scenarios. Specifically, we propose a spatial-frequency sampling embedding module to reduce the representation dimension and the resulting computational complexity, and we propose exploiting an autoregressive generative Transformer to generate downlink channels autoregressively. Numerical results demonstrate the superiority of the proposed framework in significantly reducing the pilot training overhead, by a factor of more than 16, and in improving the system's spectral efficiency under high-mobility scenarios.
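The coarse first stage, the least squares (LS) estimator, is simple when each pilot subcarrier carries one known symbol: with $y = h x + n$ per antenna and subcarrier, $\hat h_{\mathrm{LS}} = y / x$. A minimal sketch follows, assuming a single-antenna user, unit-modulus pilots, and hypothetical array and pilot sizes; the KDD-SFCEN and TUDCEN extrapolation networks that build on this estimate are not shown.

```python
import numpy as np

def ls_channel_estimate(received, pilots):
    """Per-subcarrier least-squares estimate h_ls = y / x for known pilot symbols x."""
    return received / pilots

rng = np.random.default_rng(0)
n_ant, n_sc = 4, 16  # hypothetical numbers of BS antennas and pilot subcarriers
h_true = (rng.standard_normal((n_ant, n_sc)) + 1j * rng.standard_normal((n_ant, n_sc))) / np.sqrt(2)
pilots = np.exp(1j * np.pi / 2 * rng.integers(0, 4, n_sc))  # unit-modulus QPSK-like pilots
noise = 0.05 * (rng.standard_normal((n_ant, n_sc)) + 1j * rng.standard_normal((n_ant, n_sc)))
received = h_true * pilots + noise  # pilots broadcast across the antenna axis

h_ls = ls_channel_estimate(received, pilots)
nmse = np.sum(np.abs(h_ls - h_true) ** 2) / np.sum(np.abs(h_true) ** 2)
print(f"NMSE of LS estimate: {10 * np.log10(nmse):.1f} dB")
```

LS needs one pilot per estimated coefficient, which is why the pilot overhead grows with the number of antennas and subcarriers and why extrapolating from fewer pilots is attractive.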
Submitted 13 June, 2024;
originally announced June 2024.
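The coarse stage of KDD-SFCEN relies on the classical least-squares (LS) estimator. A minimal sketch follows, assuming a single-antenna model with one known pilot symbol $x_k$ per pilot subcarrier, $y_k = h_k x_k + n_k$, so that $\hat{h}_k = y_k / x_k$. The linear interpolation to non-pilot subcarriers is a naive baseline added only for illustration; it is not the paper's learned spatial-frequency extrapolation, and all function names are hypothetical.

```python
from typing import List, Sequence


def ls_estimate(received: Sequence[complex], pilots: Sequence[complex]) -> List[complex]:
    """Per-subcarrier least-squares estimate h_k = y_k / x_k (pilots must be nonzero)."""
    return [y / x for y, x in zip(received, pilots)]


def interpolate_linear(pilot_idx: Sequence[int], pilot_est: Sequence[complex],
                       num_subcarriers: int) -> List[complex]:
    """Naive linear interpolation between sorted pilot subcarriers (hypothetical baseline).

    Subcarriers outside the first/last pilot reuse the nearest edge estimate.
    """
    out: List[complex] = []
    for k in range(num_subcarriers):
        if k <= pilot_idx[0]:
            out.append(pilot_est[0])
        elif k >= pilot_idx[-1]:
            out.append(pilot_est[-1])
        else:
            # Find the two pilots bracketing subcarrier k and blend their estimates.
            for j in range(len(pilot_idx) - 1):
                a, b = pilot_idx[j], pilot_idx[j + 1]
                if a <= k <= b:
                    t = (k - a) / (b - a)
                    out.append((1 - t) * pilot_est[j] + t * pilot_est[j + 1])
                    break
    return out


# Usage: a noiseless channel that varies linearly across 9 subcarriers.
h = [complex(1 + 0.1 * k, -0.05 * k) for k in range(9)]
pilot_idx = [0, 4, 8]
x = [1 + 0j, -1 + 0j, 1j]
y = [h[i] * xi for i, xi in zip(pilot_idx, x)]
full = interpolate_linear(pilot_idx, ls_estimate(y, x), len(h))
```

In this noiseless, linearly varying case the interpolated LS estimates recover every subcarrier exactly; noise and stronger frequency selectivity degrade such a naive baseline.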
-
ODIN: Identifying Protoclusters and Cosmic Filaments Traced by Ly$α$-emitting Galaxies
Authors:
Vandana Ramakrishnan,
Kyoung-Soo Lee,
Maria Celeste Artale,
Eric Gawiser,
Yujin Yang,
Changbom Park,
Robin Ciardullo,
Lucia Guaita,
Sang Hyeok Im,
Seongjae Kim,
Ankit Kumar,
Jaehyun Lee,
Seong-Kook Lee,
Byeongha Moon,
Nelson Padilla,
Alexandra Pope,
Roxana Popescu,
Hyunmi Song,
Paulina Troncoso,
Francisco Valdes,
Ann Zabludoff
Abstract:
To understand the formation and evolution of massive cosmic structures, it is essential to study them at high redshift, in the epoch when they formed the majority of their mass. The One-hundred-deg$^2$ DECam Imaging in Narrowbands (ODIN) survey is undertaking the widest-area narrowband program to date, using Ly$α$-emitting galaxies (LAEs) to trace the large-scale structure (LSS) of the Universe at three cosmic epochs. In this work, we present results at $z$ = 3.1 based on early ODIN data in the COSMOS field. We identify and characterize protoclusters and cosmic filaments using multiple methods and discuss their strengths and weaknesses. We then compare our observations against the IllustrisTNG suite of cosmological hydrodynamical simulations. The two are in excellent agreement, with a similar number and angular size of structures identified above a specified density threshold. We are able to recover the simulated protoclusters with $\log$(M$_{z=0}$/$M_\odot$) $\gtrsim$ 14.4 in $\sim$ 60\% of the cases. With these objects, we show that the descendant masses of the protoclusters in our sample can be estimated purely from our 2D measurements, finding a median $z$ = 0 mass of $\sim10^{14.5}$M$_\odot$. The lack of information on the radial extent of each protocluster introduces a $\sim$0.4~dex uncertainty in its descendant mass. Finally, we show that the recovery of the cosmic web in the vicinity of protoclusters is both efficient and accurate. The similarity between our observations and the simulations implies that our structure selection is likewise robust and efficient, demonstrating that LAEs are reliable tracers of the LSS.
Submitted 12 June, 2024;
originally announced June 2024.
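To make the 0.4 dex figure concrete, a worked conversion follows, assuming the uncertainty is applied symmetrically in $\log_{10}$ mass around the quoted median (the abstract states it per protocluster, so this is only illustrative):

```latex
\[
  \log_{10}\!\left(M_{z=0}/M_\odot\right) = 14.5 \pm 0.4
  \;\Longrightarrow\;
  M_{z=0} \in \left[10^{14.1},\,10^{14.9}\right] M_\odot
  \approx \left[1.3\times10^{14},\,7.9\times10^{14}\right] M_\odot ,
\]
```

i.e. a multiplicative factor of $10^{0.4}\approx 2.5$ in either direction.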
-
CodeScore-R: An Automated Robustness Metric for Assessing the Functional Correctness of Code Synthesis
Authors:
Guang Yang,
Yu Zhou,
Xiang Chen,
Xiangyu Zhang
Abstract:
Evaluation metrics are crucial in the field of code synthesis. Commonly used code evaluation metrics can be classified into three types: match-based, semantic-based, and execution-based. Among them, the execution-based Pass@k metric accurately assesses the functionality of predicted code by executing test cases. However, calculating this metric requires significant overhead, necessitating the design of an automated evaluation metric that can assess the functionality of predicted code without the need for test cases. Additionally, a good evaluation metric should be robust; that is, the metric should maintain its accuracy even when the predicted code undergoes minor changes. To address these challenges, we propose an automated robust metric, called CodeScore-R, based on UniXcoder and contrastive learning, for evaluating the functionality of code synthesis. CodeScore-R employs techniques such as sketch-based processing, syntactic-equivalent transformations, and mutation testing to effectively mitigate the interference caused by identifiers, syntax structures, and operators on evaluation results. Experimental results demonstrate that in the tasks of code generation and migration in Java and Python, CodeScore-R outperforms other evaluation metrics and is more closely aligned with the Pass@k metric, while exhibiting stronger robustness.
Submitted 10 June, 2024;
originally announced June 2024.
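The abstract uses Pass@k as the reference metric without defining it. For context, a commonly used unbiased estimator (introduced alongside the HumanEval benchmark) draws $n$ samples per problem, counts the $c$ that pass all tests, and computes $1 - \binom{n-c}{k}/\binom{n}{k}$. The sketch below implements that estimator; it is background for the comparison, not part of CodeScore-R.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples, drawn without
    replacement from n generations of which c pass, is correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets containing no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging `pass_at_k` over all problems gives the benchmark-level score. Because `math.comb` returns 0 when `k > n - c`, the estimate becomes exactly 1.0 when fewer than `k` samples fail.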
-
Generalized W-Net: Arbitrary-style Chinese Character Synthesization
Authors:
Haochuan Jiang,
Guanyu Yang,
Fei Cheng,
Kaizhu Huang
Abstract:
Synthesizing Chinese characters with consistent style using few stylized examples is challenging. Existing models struggle to generate arbitrary style characters with limited examples. In this paper, we propose the Generalized W-Net, a novel class of W-shaped architectures that addresses this. By incorporating Adaptive Instance Normalization and introducing multi-content, our approach can synthesize Chinese characters in any desired style, even with limited examples. It handles seen and unseen styles during training and can generate new character contents. Experimental results demonstrate the effectiveness of our approach.
Submitted 10 June, 2024;
originally announced June 2024.
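Adaptive Instance Normalization is a standard, parameter-free operation from the style-transfer literature: each content feature channel is normalized to zero mean and unit variance and then rescaled to the mean and standard deviation of the corresponding style channel. The dependency-free sketch below shows only that operation, with channels given as flat lists; how Generalized W-Net combines AdaIN with its multi-content inputs is not shown, and the `eps` value is an assumption.

```python
import math
from typing import List, Tuple

Channel = List[float]  # one feature map, flattened over spatial positions


def _mean_std(values: Channel, eps: float) -> Tuple[float, float]:
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var + eps)  # eps keeps the division below well defined


def adain(content: List[Channel], style: List[Channel], eps: float = 1e-5) -> List[Channel]:
    """Align each content channel's mean/std to those of the matching style channel."""
    out = []
    for c, s in zip(content, style):
        mu_c, sd_c = _mean_std(c, eps)
        mu_s, sd_s = _mean_std(s, eps)
        out.append([sd_s * (v - mu_c) / sd_c + mu_s for v in c])
    return out
```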
-
Survey for Landing Generative AI in Social and E-commerce Recsys -- the Industry Perspectives
Authors:
Da Xu,
Danqing Zhang,
Guangyu Yang,
Bo Yang,
Shuyuan Xu,
Lingling Zheng,
Cindy Liang
Abstract:
Recently, generative AI (GAI), with its emerging capabilities, has presented unique opportunities for augmenting and revolutionizing industrial recommender systems (Recsys). Despite growing research efforts at the intersection of these fields, the integration of GAI into industrial Recsys remains in its infancy, largely due to the intricate nature of modern industrial Recsys infrastructure, operations, and product sophistication. Drawing upon our experiences in successfully integrating GAI into several major social and e-commerce platforms, this survey aims to comprehensively examine the underlying system and AI foundations, solution frameworks, and connections to key research advancements, as well as to summarize the practical insights and challenges encountered in the endeavor to integrate GAI into industrial Recsys. As pioneering work in this domain, we hope to outline the representative developments of relevant fields, shed light on practical GAI adoption in the industry, and motivate future research.
Submitted 10 June, 2024;
originally announced June 2024.
-
Large Out-of-Plane Piezoelectric Effect in Janus Ferromagnetic Semiconductor Monolayer of CrOFBr
Authors:
Qiuyue Ma,
Guochun Yang,
Busheng Wang,
Yong Liu
Abstract:
The exploitation of piezoelectric ferromagnetism (PFM) in two-dimensional (2D) materials with a large out-of-plane piezoelectric response is motivated not only by technological applications but also by scientific interest. In this study, the CrONM monolayer family (N=F, Cl; M=Br, Cl) was investigated using first-principles calculations, revealing that the Janus CrOFBr monolayer exhibits intrinsic ferromagnetic semiconductor behavior along with a significant out-of-plane piezoelectric effect. The calculated out-of-plane piezoelectric strain coefficients d$_{31}$ and d$_{32}$ are up to 1.21 and 0.63 pm/V, respectively, exceeding those of most 2D materials. Furthermore, our findings demonstrate that applying tensile strain can enhance the out-of-plane piezoelectric response, increasing d$_{31}$ and d$_{32}$ by 27% and 67%, respectively, compared to the unstrained configuration. This discovery holds great potential for advancing nanoelectronics and facilitating the development of multifunctional semiconductor spintronic applications. Finally, by comparing d$_{31}$ and d$_{32}$ across the CrONM monolayer family (N=F, Cl; M=Br, Cl), we find that their magnitudes are correlated with the electronegativity difference between the M and N atoms. These findings provide valuable insights for the design of 2D piezoelectric materials with enhanced vertical piezoelectric responses.
Submitted 10 June, 2024;
originally announced June 2024.
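For 2D sheets, out-of-plane strain coefficients are usually obtained from the computed stress coefficients $e_{3j}$ and the in-plane elastic constants. Assuming an orthorhombic ($mm2$) sheet, $e_{31} = d_{31}C_{11} + d_{32}C_{12}$ and $e_{32} = d_{31}C_{12} + d_{32}C_{22}$; the abstract states neither the symmetry nor this relation, so treat the sketch below as an illustration of the standard approach rather than the authors' workflow. Inputs must use consistent 2D units (e.g. C/m for $e$ and N/m for $C$, giving $d$ in m/V).

```python
from typing import Tuple


def out_of_plane_d(e31: float, e32: float, c11: float, c22: float,
                   c12: float) -> Tuple[float, float]:
    """Solve e31 = d31*c11 + d32*c12 and e32 = d31*c12 + d32*c22 for (d31, d32).

    Assumes an mm2 (orthorhombic) 2D sheet; all quantities in consistent 2D units.
    """
    det = c11 * c22 - c12 ** 2
    if c11 <= 0 or det <= 0:
        # Mechanical stability requires a positive-definite in-plane elastic matrix.
        raise ValueError("elastic constants are not positive definite")
    d31 = (e31 * c22 - e32 * c12) / det
    d32 = (e32 * c11 - e31 * c12) / det
    return d31, d32
```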
-
W-Net: One-Shot Arbitrary-Style Chinese Character Generation with Deep Neural Networks
Authors:
Haochuan Jiang,
Guanyu Yang,
Kaizhu Huang,
Rui Zhang
Abstract:
Owing to the huge number of categories, the sophisticated combinations of strokes and radicals, and the free writing or printing styles, generating Chinese characters with diverse styles has long been considered a difficult task. In this paper, an efficient and generalized deep framework, namely the W-Net, is introduced for the one-shot arbitrary-style Chinese character generation task. Specifically, given a single character (one-shot) with a specific style (e.g., a printed font or handwriting style), the proposed W-Net model is capable of learning and generating arbitrary characters sharing a style similar to that of the given character. Such an appealing property has rarely been seen in the literature. We have compared the proposed W-Net framework with many other competitive methods. Experimental results show that the proposed method is significantly superior in the one-shot setting.
Submitted 10 June, 2024;
originally announced June 2024.
-
InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping
Authors:
Yunchao Zhang,
Guandao Yang,
Leonidas Guibas,
Yanchao Yang
Abstract:
3D Gaussians, as a low-level scene representation, typically involve thousands to millions of Gaussians. This makes it difficult to control the scene in ways that reflect the underlying dynamic structure, where the number of independent entities is typically much smaller. In particular, it can be challenging to animate and move objects in the scene, which requires coordination among many Gaussians. To address this issue, we develop a mutual information shaping technique that enforces movement resonance between correlated Gaussians in a motion network. Such correlations can be learned from putative 2D object masks in different views. By approximating the mutual information with the Jacobians of the motions, our method ensures consistent movements of the Gaussians composing different objects under various perturbations. In particular, we develop an efficient contrastive training pipeline with lightweight optimization to shape the motion network, avoiding the need for re-shaping throughout the motion sequence. Notably, our training only touches a small fraction of all Gaussians in the scene yet attains the desired compositional behavior according to the underlying dynamic structure. The proposed technique is evaluated on challenging scenes and demonstrates significant performance improvement in promoting consistent movements and 3D object segmentation while inducing low computation and memory requirements.
Submitted 9 June, 2024;
originally announced June 2024.
-
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Authors:
Guanrou Yang,
Ziyang Ma,
Fan Yu,
Zhifu Gao,
Shiliang Zhang,
Xie Chen
Abstract:
As more and more information-rich data such as video become available, utilizing multi-modal auxiliary information to enhance audio tasks has sparked widespread research interest. The recent surge in research on LLM-based audio models provides fresh perspectives for tackling audio tasks. Given that LLMs can flexibly ingest multiple inputs, we propose MaLa-ASR, an LLM-based ASR model that can integrate textual keywords extracted from presentation slides to improve recognition of conference content. MaLa-ASR yields average WERs of 9.4% and 11.7% on the L95 and S95 subsets of the SlideSpeech corpus, representing significant relative WER reductions of 27.9% and 44.7%, respectively, over the baseline model reported in SlideSpeech. MaLa-ASR underscores the strong performance of LLMs in speech tasks and their capability to integrate auxiliary information conveniently. By adding keywords to the input prompt, the biased word error rate (B-WER) is reduced by 46.0% and 44.2% relative, establishing a new SOTA on this dataset.
Submitted 13 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Gentle-CLIP: Exploring Aligned Semantic In Low-Quality Multimodal Data With Soft Alignment
Authors:
Zijia Song,
Zelin Zang,
Yelin Wang,
Guozheng Yang,
Jiangbin Zheng,
Kaicheng Yu,
Wanyu Chen,
Stan Z. Li
Abstract:
Multimodal fusion breaks through the barriers between diverse modalities and has already yielded numerous impressive results. However, in various specialized fields it is difficult to obtain sufficient aligned data for training, which severely limits the use of previously elegant models. Semi-supervised learning therefore attempts to achieve multimodal alignment with fewer matched pairs, but traditional methods such as pseudo-labeling are difficult to apply in domains with no label information. To address these problems, we transform semi-supervised multimodal alignment into a manifold matching problem and propose a new method based on CLIP, named Gentle-CLIP. Specifically, we design a novel semantic density distribution loss to explore implicit semantic alignment information from unpaired multimodal data by constraining the latent representation distribution with fine granularity, thus eliminating the need for numerous strictly matched pairs. Meanwhile, we introduce multi-kernel maximum mean discrepancy as well as a self-supervised contrastive loss to pull separate modality distributions closer and enhance the stability of the representation distribution. In addition, the contrastive loss used in CLIP is applied to the supervised matched data to prevent negative optimization. Extensive experiments on a range of tasks in various fields, including protein, remote sensing, and the general vision-language field, demonstrate the effectiveness of our proposed Gentle-CLIP.
Submitted 9 June, 2024;
originally announced June 2024.
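Multi-kernel maximum mean discrepancy measures the distance between two sample distributions through a mixture of kernels; minimizing it pulls the embeddings of the two modalities toward a common distribution. The sketch computes the biased (V-statistic) estimate of squared MMD with an equal-weight sum of Gaussian kernels. The bandwidths and equal weights are assumptions; Gentle-CLIP's actual kernel choice and loss weighting are not given in the abstract.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def multi_kernel(a: Vec, b: Vec, bandwidths: Sequence[float]) -> float:
    """Equal-weight sum of Gaussian RBF kernels, one per bandwidth (assumed weighting)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return sum(math.exp(-d2 / (2.0 * s * s)) for s in bandwidths)


def mk_mmd2(xs: Sequence[Vec], ys: Sequence[Vec],
            bandwidths: Sequence[float] = (0.5, 1.0, 2.0)) -> float:
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    def mean_k(us: Sequence[Vec], vs: Sequence[Vec]) -> float:
        return sum(multi_kernel(u, v, bandwidths) for u in us for v in vs) / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
```

In training, this value would be computed on the two modalities' embedding batches and added to the loss; a framework with autodiff would be used instead of plain Python lists.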
-
Thermalization dynamics in photonic lattices of different geometries
Authors:
Guowen Yang,
Domenico Bongiovanni,
Daohong Song,
Roberto Morandotti,
Zhigang Chen,
Nikolaos K. Efremidis
Abstract:
The statistical mechanical behavior of weakly nonlinear multimoded optical settings has attracted increasing interest over the last few years. The main purpose of this work is to numerically investigate the main factors that affect the thermalization process in photonic lattices. In particular, we find that lattices with identically selected properties (such as temperature, coupling coefficient, lattice size, and excitation conditions) can exhibit very different thermalization dynamics and thus thermalization distances. Our investigation focuses on two different two-dimensional lattices: the honeycomb lattice and the triangular lattice. Our numerical results show that, independently of the excitation conditions, the honeycomb lattice always thermalizes faster than the triangular lattice. We attribute this behavior mainly to the quasilinear spectrum of the honeycomb lattice, which promotes wave mixing, in comparison to the power-like spectrum of the triangular lattice. In addition, we investigate the combined effects of temperature and of the sign and magnitude of the nonlinearity. Switching either the sign of the Kerr nonlinear coefficient or the sign of the temperature can lead to significant differences in the thermalization dynamics, a phenomenon that can be physically explained in terms of wave instabilities. Larger absolute values of the temperature |T| result in more uniform distributions of the power occupation numbers and faster thermalization. Finally, as expected, increasing the magnitude of the nonlinearity accelerates thermalization. Our findings provide valuable insights into optical thermalization in discrete systems, where experimental realization may bring about new possibilities for light manipulation and applications.
Submitted 8 June, 2024;
originally announced June 2024.
-
An investigation of anisotropy in the bubbly turbulent flow via direct numerical simulations
Authors:
Xuanwei Zhang,
Yanchao Liu,
Wenkang Wang,
Guang Yang,
Xu Chu
Abstract:
This study explores the dynamics of dispersed bubbly turbulent flow in a channel using interface-resolved direct numerical simulation (DNS) with an efficient Coupled Level-Set Volume-of-Fluid (CLSVOF) solver. The influence of the number of bubbles (96 and 192), the flow direction, and the Eotvos number was examined across eight distinct cases. The results indicate that in upward flows, bubbles tend to accumulate near the wall, with smaller Eotvos numbers bringing them closer to the wall and enhancing energy dissipation through increased turbulence and vorticity. This proximity attenuates the liquid-phase velocity, and the bubbles, being more spherical, induce more isotropic turbulence. Conversely, in downward flows, bubbles cluster in the middle of the channel, where they induce additional pseudo-turbulence and hence additional turbulent kinetic energy. The study further examines the budget of turbulent kinetic energy (TKE) and the exact balance equation for the Reynolds stresses, revealing that near-wall bubble motion generates substantial velocity gradients, particularly in the wall-normal direction, which significantly affect the turbulence structure.
Submitted 6 June, 2024;
originally announced June 2024.
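A common way to quantify the anisotropy discussed here is the Reynolds-stress anisotropy tensor $b_{ij} = \langle u_i' u_j' \rangle / (2k) - \delta_{ij}/3$, with turbulent kinetic energy $k = \langle u_i' u_i' \rangle / 2$; it vanishes for isotropic turbulence. The sketch computes it from velocity samples at one location. Using this particular measure, and averaging over samples rather than over the homogeneous directions of a channel DNS, are assumptions for illustration.

```python
from typing import List, Sequence

Matrix3 = List[List[float]]


def reynolds_stress(samples: Sequence[Sequence[float]]) -> Matrix3:
    """<u_i' u_j'> from 3-component velocity samples, fluctuations about the sample mean."""
    n = len(samples)
    mean = [sum(s[i] for s in samples) / n for i in range(3)]
    R = [[0.0] * 3 for _ in range(3)]
    for s in samples:
        f = [s[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                R[i][j] += f[i] * f[j] / n
    return R


def anisotropy_tensor(R: Matrix3) -> Matrix3:
    """b_ij = R_ij / (2k) - delta_ij / 3, with TKE k = R_ii / 2."""
    k = 0.5 * (R[0][0] + R[1][1] + R[2][2])
    if k == 0.0:
        raise ValueError("zero turbulent kinetic energy: anisotropy undefined")
    return [[R[i][j] / (2.0 * k) - (1.0 / 3.0 if i == j else 0.0) for j in range(3)]
            for i in range(3)]
```

For isotropic samples every entry is zero; for motion confined to one component, $b_{11} = 2/3$ and $b_{22} = b_{33} = -1/3$, the one-component limit.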
-
ShadowBound: Efficient Heap Memory Protection Through Advanced Metadata Management and Customized Compiler Optimization
Authors:
Zheng Yu,
Ganxiang Yang,
Xinyu Xing
Abstract:
In software development, the prevalence of unsafe languages such as C and C++ introduces potential vulnerabilities, especially within the heap, a pivotal component for dynamic memory allocation. Despite its significance, the complexity of heap management has made heap corruption pervasive, posing severe threats to system security. While prior solutions aiming for temporal and spatial memory safety exhibit overheads deemed impractical, we present ShadowBound, a unique heap memory protection design. At its core, ShadowBound is an efficient out-of-bounds defense that can work with various use-after-free defenses (e.g., MarkUs, FFMalloc, PUMM) without compatibility constraints. We harness a shadow memory-based metadata management mechanism to store heap chunk boundaries and apply customized compiler optimizations tailored for boundary checking. We implemented ShadowBound atop the LLVM framework and integrated it with three state-of-the-art use-after-free defenses. Our evaluations show that ShadowBound provides robust heap protection with minimal time and memory overhead, suggesting its effectiveness and efficiency in safeguarding real-world programs against prevalent heap vulnerabilities.
Submitted 4 June, 2024;
originally announced June 2024.
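To make the idea of shadow-memory bounds metadata concrete, here is a toy model under stated assumptions: every 8-byte granule of a live chunk records that chunk's `[base, end)`, and an access through pointer `p` is checked against the bounds recorded for `p`'s own granule rather than the target address's. This does not reproduce ShadowBound's actual metadata encoding or its LLVM optimizations; the class and method names are hypothetical.

```python
from typing import Dict, Tuple

GRANULE = 8  # bytes covered by one shadow entry (assumed for this toy)


class ToyShadowHeap:
    """Toy model: each granule of a live chunk maps to that chunk's [base, end)."""

    def __init__(self) -> None:
        self.shadow: Dict[int, Tuple[int, int]] = {}  # granule index -> (base, end)
        self.next_addr = 0x1000                        # granule-aligned bump allocator

    def malloc(self, size: int) -> int:
        if size <= 0:
            raise ValueError("size must be positive")
        base, end = self.next_addr, self.next_addr + size
        # Round up so that two chunks never share a granule.
        self.next_addr = (end + GRANULE - 1) // GRANULE * GRANULE
        for g in range(base // GRANULE, (end - 1) // GRANULE + 1):
            self.shadow[g] = (base, end)
        return base

    def free(self, base: int) -> None:
        start, end = self.shadow[base // GRANULE]
        if start != base:
            raise ValueError("free of a non-base pointer")
        for g in range(base // GRANULE, (end - 1) // GRANULE + 1):
            del self.shadow[g]

    def check(self, ptr: int, offset: int, size: int) -> bool:
        """Is an access of `size` bytes at ptr + offset inside ptr's own chunk?"""
        bounds = self.shadow.get(ptr // GRANULE)
        if bounds is None:
            return True  # untracked pointer: out of scope for this toy
        base, end = bounds
        addr = ptr + offset
        return base <= addr and addr + size <= end
```

Looking up bounds through the originating pointer is the design choice that matters here: checking only the target address would accept a write that overflows from one chunk into a neighbouring live chunk.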
-
Deep asymmetric mixture model for unsupervised cell segmentation
Authors:
Yang Nan,
Guang Yang
Abstract:
Automated cell segmentation has become increasingly crucial for disease diagnosis and drug discovery, as manual delineation is excessively laborious and subjective. To address this issue with limited manual annotation, researchers have developed semi-supervised and unsupervised segmentation approaches. Among these approaches, the deep Gaussian mixture model plays a vital role owing to its capacity to model complex data distributions. However, these models assume that the data follow symmetric normal distributions, an assumption that does not hold for asymmetrically distributed data. These models also suffer from weak generalization capacity and are sensitive to outliers. To address these issues, this paper presents a novel asymmetric mixture model for unsupervised cell segmentation. This asymmetric mixture model is built by aggregating certain multivariate Gaussian mixture models with log-likelihood and self-supervised-based optimization functions. The proposed asymmetric mixture model outperforms (nearly 2-30% gain in Dice coefficient, p<0.05) the existing state-of-the-art unsupervised models on cell segmentation, including Segment Anything.
Submitted 3 June, 2024;
originally announced June 2024.
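The reported gains are in the Dice coefficient, $2|A \cap B| / (|A| + |B|)$ for a predicted mask $A$ and a reference mask $B$. A minimal implementation over flat binary masks follows; returning 1.0 when both masks are empty is a convention assumed here, not something the abstract specifies.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as flat 0/1 sequences."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * inter / total
```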
-
An Empirical Study of Excitation and Aggregation Design Adaptions in CLIP4Clip for Video-Text Retrieval
Authors:
Xiaolun Jing,
Genke Yang,
Jian Chu
Abstract:
The CLIP4Clip model, transferred from CLIP, has become the de facto standard for solving the video clip retrieval task from frame-level input, triggering a surge of CLIP4Clip-based models in the video-text retrieval domain. In this work, we rethink the inherent limitation of the widely used mean pooling operation in frame feature aggregation and investigate adaptions of the excitation and aggregation design for discriminative video representation generation. We present a novel excitation-and-aggregation design, including (1) an excitation module that captures non-mutually-exclusive relationships among frame features and achieves frame-wise feature recalibration, and (2) an aggregation module that learns exclusiveness for aggregating frame representations. Similarly, we cascade the sequential module and the aggregation design to generate discriminative video representations in the sequential type. Besides, we adopt the excitation design in the tight type to obtain representative frame features for multi-modal interaction. The proposed modules are evaluated on three benchmark datasets, MSR-VTT, ActivityNet and DiDeMo, achieving 43.9 R@1 on MSR-VTT, 44.1 R@1 on ActivityNet and 31.0 R@1 on DiDeMo. They outperform the CLIP4Clip results by +1.2% (+0.5%), +4.5% (+1.9%) and +9.5% (+2.7%) relative (absolute) improvements, respectively, demonstrating the superiority of our proposed excitation and aggregation designs. We hope our work will serve as an alternative for frame representation aggregation and facilitate future research.
Submitted 8 June, 2024; v1 submitted 25 May, 2024;
originally announced June 2024.
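The abstract quotes each improvement both relative and absolute. Recovering the implied CLIP4Clip baseline as (R@1 minus absolute gain) shows the two sets of figures are mutually consistent: the implied baselines are 43.4, 42.2 and 28.3 R@1, and the relative gains round to the quoted 1.2%, 4.5% and 9.5%. The baselines are derived from the abstract's numbers, not taken from the paper's tables.

```python
# (dataset, proposed R@1, absolute gain over CLIP4Clip), as quoted in the abstract
RESULTS = [("MSR-VTT", 43.9, 0.5), ("ActivityNet", 44.1, 1.9), ("DiDeMo", 31.0, 2.7)]


def implied_baseline(new: float, abs_gain: float) -> float:
    """Baseline R@1 implied by the proposed score and its absolute gain."""
    return new - abs_gain


def relative_gain(new: float, abs_gain: float) -> float:
    """Relative improvement in percent over the implied baseline."""
    return 100.0 * abs_gain / implied_baseline(new, abs_gain)


for name, new, gain in RESULTS:
    print(f"{name}: baseline {implied_baseline(new, gain):.1f}, "
          f"relative +{relative_gain(new, gain):.1f}%")
```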
-
Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng
, et al. (177 additional authors not shown)
Abstract:
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen, using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overline{ν}_{e}$ rate and energy-spectrum variations among the near and far detectors give $\mathrm{sin}^22θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent of the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\mathrm{sin}^22θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision over the Daya Bay full 3158-day capture-on-gadolinium result.
Submitted 3 June, 2024;
originally announced June 2024.
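The amplitude $\sin^2 2θ_{13}$ and the mass splitting enter the reactor $\overline{ν}_e$ survival probability. A minimal sketch keeps only the $θ_{13}$-driven term, $P \approx 1 - \sin^2 2θ_{13} \sin^2(1.267\,Δm^2 L/E)$ with $Δm^2$ in eV$^2$, $L$ in m and $E$ in MeV. It neglects the solar ($θ_{12}$) term and treats the quoted $Δm^2_{32}$ as the effective splitting; both are simplifications, not the collaboration's fit model.

```python
import math


def survival_probability(sin2_2theta13: float, dm2_ev2: float,
                         baseline_m: float, energy_mev: float) -> float:
    """Reactor anti-nu_e survival probability, theta13 term only:
    P = 1 - sin^2(2 theta13) * sin^2(1.267 * dm2[eV^2] * L[m] / E[MeV]).
    The solar (theta12) term is deliberately neglected in this sketch."""
    phase = 1.267 * dm2_ev2 * baseline_m / energy_mev
    return 1.0 - sin2_2theta13 * math.sin(phase) ** 2


def first_maximum_m(dm2_ev2: float, energy_mev: float) -> float:
    """Baseline (m) of the first oscillation maximum, where the phase equals pi/2."""
    return (math.pi / 2) * energy_mev / (1.267 * dm2_ev2)
```

With $Δm^2 = 2.72\times10^{-3}$ eV$^2$ and an illustrative 4 MeV antineutrino, the first oscillation maximum falls near $L \approx 1.8$ km.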
-
Ta2Pd3Te5 topological thermometer
Authors:
Yupeng Li,
Anqi Wang,
Senyang Pan,
Dayu Yan,
Guang Yang,
Xingchen Guo,
Yu Hong,
Guangtong Liu,
Fanming Qu,
Zhijun Wang,
Tian Qian,
Jinglei Zhang,
Youguo Shi,
Li Lu,
Jie Shen
Abstract:
In recent decades, there has been a persistent pursuit of applications for surface/edge states in topological systems, driven by their dissipationless transport. However, there have been few tangible breakthroughs in this field. This work demonstrates the remarkable properties of the topological insulator Ta2Pd3Te5 as a thermometer. This material exhibits a power-law temperature dependence of its resistance at low temperatures, stemming from the Luttinger-liquid behavior of its edge states, while exhibiting semiconductor behavior at high temperatures. The power-law behavior effectively addresses the problem of diverging resistance in semiconductor thermometers at ultra-low temperatures, and thus plays a crucial role in enabling efficient thermometry in refrigerators reaching millikelvin temperatures or below. Through chemical doping, thickness adjustment, and gate-voltage control, its power-law and semiconductor behaviors can be effectively modulated. This enables efficient thermometry spanning from millikelvin temperatures to room temperature, and allows precise local temperature measurement. Furthermore, this thermometer exhibits excellent temperature sensitivity and resolution, and can be tuned to show small magnetoresistance. In summary, the Ta2Pd3Te5 thermometer, also referred to as a topological thermometer, exhibits outstanding performance and significant potential for measuring a wider range of temperatures than conventional low-temperature thermometers.
Submitted 2 June, 2024;
originally announced June 2024.
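Using a power-law regime for thermometry amounts to calibrating $R(T) = A\,T^{-α}$ and inverting it. The sketch fits $\log R$ against $\log T$ by ordinary least squares and then inverts the fit. The prefactor, exponent and calibration points are assumptions (in practice they would come from measurements against a reference thermometer), and the crossover to semiconductor behavior at higher temperature is not modeled.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(temps: Sequence[float], resistances: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log R = log A - alpha * log T; returns (A, alpha)."""
    xs = [math.log(t) for t in temps]
    ys = [math.log(r) for r in resistances]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(my - slope * mx), -slope


def temperature_from_resistance(r: float, A: float, alpha: float) -> float:
    """Invert R = A * T**(-alpha)  ->  T = (R / A) ** (-1 / alpha)."""
    return (r / A) ** (-1.0 / alpha)


# Usage with synthetic (not measured) calibration data: A = 1000 ohm at 1 K, alpha = 0.3.
temps = [0.01, 0.05, 0.1, 0.5, 1.0]
res = [1000.0 * t ** -0.3 for t in temps]
A, alpha = fit_power_law(temps, res)
```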
-
MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning
Authors:
Junjie Wang,
Guangjing Yang,
Wentao Chen,
Huahui Yi,
Xiaohu Wu,
Qicheng Lao
Abstract:
In response to the challenges posed by the extensive parameter updates required for full fine-tuning of large-scale pre-trained models, parameter-efficient fine-tuning (PEFT) methods, exemplified by Low-Rank Adaptation (LoRA), have emerged. LoRA simplifies the fine-tuning process but may still struggle with a certain level of redundancy in low-rank matrices and limited effectiveness from merely increasing their rank. To address these issues, a natural idea is to enhance the independence and diversity of the learning process for the low-rank matrices. Therefore, we propose Masked LoRA Experts (MLAE), an innovative approach that applies the concept of masking to PEFT. Our method incorporates a cellular decomposition strategy that transforms a low-rank matrix into independent rank-1 submatrices, or ``experts'', thus enhancing independence. Additionally, we introduce a binary mask matrix that selectively activates these experts during training to promote more diverse and anisotropic learning, based on expert-level dropout strategies. Our investigations reveal that this selective activation not only enhances performance but also fosters a more diverse acquisition of knowledge with a marked decrease in parameter similarity among MLAE, significantly boosting the quality of the model while barely increasing the parameter count. Remarkably, MLAE achieves new SOTA performance with an average accuracy score of 78.8% on the VTAB-1k benchmark and 90.9% on the FGVC benchmark, demonstrating superior performance. Our code is available at https://github.com/jie040109/MLAE.
Submitted 29 May, 2024;
originally announced May 2024.
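The cellular decomposition and masking described above can be sketched directly: the LoRA update $ΔW = BA$ with rank $r$ equals the sum of $r$ rank-1 experts $b_i a_i^\top$, and a binary mask decides which experts contribute. In this sketch each expert is dropped independently with a fixed probability during training. The drop probability, the absence of rescaling of surviving experts, and the function names are assumptions, since the abstract does not give those details.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def rank1_experts(B: Matrix, A: Matrix) -> List[Matrix]:
    """Split the LoRA update B @ A (B: d x r, A: r x k) into r rank-1 experts b_i a_i^T."""
    d, r, k = len(B), len(A), len(A[0])
    return [[[B[row][i] * A[i][col] for col in range(k)] for row in range(d)]
            for i in range(r)]


def masked_update(experts: List[Matrix], mask: Sequence[int]) -> Matrix:
    """Delta W = sum_i mask[i] * expert_i, with mask entries 0 or 1."""
    d, k = len(experts[0]), len(experts[0][0])
    out = [[0.0] * k for _ in range(d)]
    for m, e in zip(mask, experts):
        if m:
            for row in range(d):
                for col in range(k):
                    out[row][col] += e[row][col]
    return out


def sample_mask(r: int, drop_prob: float, rng: random.Random) -> List[int]:
    """Expert-level dropout: drop each expert independently with probability drop_prob."""
    return [0 if rng.random() < drop_prob else 1 for _ in range(r)]
```

With an all-ones mask, `masked_update` reproduces the ordinary LoRA product $BA$, so the decomposition itself adds no parameters; only the masking changes which experts receive gradient in a given step.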