arXiv:2407.09021 [pdf, other]

Squeeze-and-Excite ResNet-Conformers for Sound Event Localization, Detection, and Distance Estimation for DCASE 2024 Challenge

Authors: Jun Wei Yeow, Ee-Leng Tan, Jisheng Bai, Santi Peksi, Woon-Seng Gan

Abstract: This technical report details our systems submitted for Task 3 of the DCASE 2024 Challenge: Audio and Audiovisual Sound Event Localization and Detection (SELD) with Source Distance Estimation (SDE). We address only the audio-only SELD with SDE (SELDDE) task in this report. We propose to improve the existing ResNet-Conformer architectures with Squeeze-and-Excitation blocks in order to introduce additional forms of channel- and spatial-wise attention. In order to improve SELD performance, we also utilize the Spatial Cue-Augmented Log-Spectrogram (SALSA) features over the commonly used log-mel spectra features for polyphonic SELD. We complement the existing Sony-TAu Realistic Spatial Soundscapes 2023 (STARSS23) dataset with the audio channel swapping technique and synthesize additional data using the SpatialScaper generator. We also perform distance scaling in order to prevent large distance errors from contributing more towards the loss function. Finally, we evaluate our approach on the evaluation subset of the STARSS23 dataset.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Technical report for DCASE 2024 Challenge Task 3

arXiv:2407.08420 [pdf]

Skin Effect of Nonlinear Optical Responses in Antiferromagnets

Authors: Hang Zhou, Rui-Chun Xiao, Shu-Hui Zhang, Wei Gan, Hui Han, Hong-Miao Zhao, Wenjian Lu, Changjin Zhang, Yuping Sun, Hui Li, Ding-Fu Shao

Abstract: Nonlinear optics plays important roles in the research of fundamental physics and the applications of high-performance optoelectronic devices. The bulk nonlinear optical responses arise from the uniform light absorption in noncentrosymmetric crystals, and hence are usually considered to be the collective phenomena of all atoms. Here we show, in contrast to this common expectation, that the nonlinear optical responses in antiferromagnets can be selectively accumulated near the surfaces, representing a skin effect. This is because the inversion symmetry, despite being broken globally, is barely violated locally deep inside these antiferromagnets. Using A-type layered antiferromagnets as the representatives, we predict that the spatial-dependent nonlinear optical responses, such as bulk photovoltaic effect (BPVE) and second harmonic generation (SHG), are notable in the top- and bottom-most layers and decay rapidly when moving away from the surfaces. Such a phenomenon exists in a broad range of antiferromagnets composed of centrosymmetric sublattices, offering promising device applications using these antiferromagnets. Our work uncovers a previously overlooked property of nonlinear optical responses and opens new opportunities for high-performance antiferromagnetic optospintronics.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.05744 [pdf, other]

Automating Urban Soundscape Enhancements with AI: In-situ Assessment of Quality and Restorativeness in Traffic-Exposed Residential Areas

Authors: Bhan Lam, Zhen-Ting Ong, Kenneth Ooi, Wen-Hui Ong, Trevor Wong, Karn N. Watcharasupat, Vanessa Boey, Irene Lee, Joo Young Hong, Jian Kang, Kar Fye Alvin Lee, Georgios Christopoulos, Woon-Seng Gan

Abstract: Formalized in ISO 12913, the "soundscape" approach is a paradigmatic shift towards perception-based urban sound management, aiming to alleviate the substantial socioeconomic costs of noise pollution to advance the United Nations Sustainable Development Goals. Focusing on traffic-exposed outdoor residential sites, we implemented an automatic masker selection system (AMSS) utilizing natural sounds to mask (or augment) traffic soundscapes. We employed a pre-trained AI model to automatically select the optimal masker and adjust its playback level, adapting to changes over time in the ambient environment to maximize "Pleasantness", a perceptual dimension of soundscape quality in ISO 12913. Our validation study involving ($N=68$) residents revealed a significant 14.6 % enhancement in "Pleasantness" after intervention, correlating with increased restorativeness and positive affect. Perceptual enhancements at the traffic-exposed site matched those at a quieter control site with 6 dB(A) lower $L_\text{A,eq}$ and road traffic noise dominance, affirming the efficacy of AMSS as a soundscape intervention, while streamlining the labour-intensive assessment of "Pleasantness" with probabilistic AI prediction.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 41 pages, 4 figures. Preprint submitted to an Elsevier journal

arXiv:2406.11602 [pdf, other]

Association between a Failed Prominence Eruption and the Drainage of Mass from Another Prominence

Authors: Jianchao Xue, Li Feng, Hui Li, Ping Zhang, Jun Chen, Guanglu Shi, Kaifan Ji, Ye Qiu, Chuan Li, Lei Lu, Beili Ying, Ying Li, Yu Huang, Youping Li, Jingwei Li, Jie Zhao, Dechao Song, Shuting Li, Zhengyuan Tian, Yingna Su, Qingmin Zhang, Yunyi Ge, Jiahui Shan, Qiao Li, Gen Li, et al. (9 additional authors not shown)

Abstract: Sympathetic eruptions of solar prominences have been studied for decades; however, it is usually difficult to identify their causal links. Here we present two failed prominence eruptions on 26 October 2022 and explore their connections. Using stereoscopic observations, we find that the south prominence (PRO-S) erupts with untwisting motions, flare ribbons occur underneath, and new connections are formed during the eruption. The north prominence (PRO-N) rises up along with PRO-S, and its upper part disappears due to catastrophic mass draining along an elongated structure after the failed eruption of PRO-S. We suggest that the eruption of PRO-S initiates due to a kink instability, further rises up, and fails to erupt due to reconnection with surrounding fields. The elongated structure connecting PRO-N overlies PRO-S, which causes the rising up of PRO-N along with PRO-S and mass drainage after the PRO-S eruption. This study suggests that a prominence may end its life through mass drainage forced by an eruption underneath.

Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: 15 pages, 7 figures, has been accepted by Solar Physics

arXiv:2406.11297 [pdf, other]

doi 10.1007/s11207-024-02330-x

Parameter effects on the total intensity of H I Lyα line for a modelled coronal mass ejection and its driven shock

Authors: Beili Ying, Guanglu Shi, Li Feng, Lei Lu, Jianchao Xue, Shuting Li, Weiqun Gan, Hui Li

Abstract: The combination of the H I Lyα (121.6 nm) line formation mechanism with ultraviolet (UV) Lyα and white-light (WL) observations provides an effective method for determining the electron temperature of coronal mass ejections (CMEs). A key to ensuring the accuracy of this diagnostic technique is the precise calculation of theoretical Lyα intensities. This study models a CME and its driven shock via a 3D MHD simulation. We generate synthetic UV and WL images of the CME and shock to quantify the impact of different assumptions on theoretical Lyα intensities, such as the incident intensity of the Lyα line (Idisk), the geometric scattering function (p(θ)), and the kinetic temperature (Tn) assumed to be equal to the proton (Tp) or electron (Te) temperatures. By comparing differences of the Lyα intensities under these assumptions, we find that: (1) Using the uniform or Carrington maps of the disk Lyα emission underestimates the corona Lyα intensity (< 10%) compared to the synchronic map, except for a slight overestimate (< 4%) in the partial CME core. The Carrington map yields lower uncertainties than the uniform disk. (2) The geometric scattering process has a minor impact on the Lyα intensity, with a maximum relative uncertainty of < 5%. The Lyα intensity is underestimated for the most part but overestimated in the CME core. (3) Compared to the assumption Tn = Tp, using Tn = Te leads to more complex relative uncertainties in CME Lyα intensity. The CME core and void are both overestimated, with the maximum uncertainty in the core exceeding 50% and the void remaining below 35%. In the CME front, both over- and under-estimates exist with relative uncertainties of < 35%. The electron temperature assumption has a smaller impact on the shock, with an underestimated relative uncertainty of less than 20%.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 22 pages, 9 figures, accepted by Solar Physics

arXiv:2406.05070 [pdf, other]

Targeted Mining Precise-positioning Episode Rules

Authors: Jian Zhu, Xiaoye Chen, Wensheng Gan, Zefeng Chen, Philip S. Yu

Abstract: The era characterized by an exponential increase in data has led to the widespread adoption of data intelligence as a crucial task. Within the field of data mining, frequent episode mining has emerged as an effective tool for extracting valuable and essential information from event sequences. Various algorithms have been developed to discover frequent episodes and subsequently derive episode rules using the frequency function and anti-monotonicity principles. However, currently, there is a lack of algorithms specifically designed for mining episode rules that encompass user-specified query episodes. To address this challenge and enable the mining of target episode rules, we introduce the definition of targeted precise-positioning episode rules and formulate the problem of targeted mining precise-positioning episode rules. Most importantly, we develop an algorithm called Targeted Mining Precision Episode Rules (TaMIPER) to address the problem and optimize it using four proposed strategies, leading to significant reductions in both time and space resource requirements. As a result, TaMIPER offers high accuracy and efficiency in mining episode rules of user interest and holds promising potential for prediction tasks in various domains, such as weather observation, network intrusion, and e-commerce. Experimental results on six real datasets demonstrate the exceptional performance of TaMIPER.

Submitted 7 June, 2024; originally announced June 2024.

Comments: IEEE TETCI, 14 pages

arXiv:2406.02783 [pdf, other]

High-resolution Observation of Blowout Jets Regulated by Sunspot Rotation

Authors: Tingyu Gou, Rui Liu, Yang Su, Astrid M. Veronig, Hanya Pan, Runbin Luo, Weiqun Gan

Abstract: Coronal jets are believed to be the miniature version of large-scale solar eruptions. In particular, the eruption of a mini-filament inside the base arch is suggested to be the trigger and even driver of blowout jets. Here we propose an alternative triggering mechanism, based on high-resolution H-alpha observations of a blowout jet associated with a mini-filament and an M1.2-class flare. The mini-filament remains largely stationary during the blowout jet, except that it is straddled by flare loops connecting two flare ribbons, indicating that the magnetic arcade embedding the mini-filament has been torn into two parts, with the upper part escaping with the blowout jet. In the wake of the flare, the southern end of the mini-filament fans out like neighboring fibrils, indicative of mass and field exchanges between the mini-filament and the fibrils. The blowout jet is preceded by a standard jet. With H-alpha fibrils moving toward the single-strand spire in a sweeping fashion, the standard jet transitions to the blowout jet. A similar pattern of standard-to-blowout jet transition occurs in an earlier C-class flare before the mini-filament forms. The spiraling morphology and sweeping direction of these fibrils are suggestive of their footpoints being dragged by the leading sunspot that undergoes clockwise rotation for over two days. Soon after the sunspot rotation reaches a peak angular speed as fast as 10 deg/hr, the dormant active region becomes flare-productive, and the mini-filament forms through the interaction of moving magnetic features from the rotating sunspot with satellite spots/pores. Hence, we suggest that the sunspot rotation plays a key role in building up free energy for flares and jets and in triggering blowout jets by inducing sweeping motions of fibrils.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 16 pages, 10 figures, accepted in Solar Physics

arXiv:2405.18665 [pdf, other]

Refinement of global coronal and interplanetary magnetic field extrapolations constrained by remote-sensing and in-situ observations at the solar minimum

Authors: Guanglu Shi, Li Feng, Beili Ying, Shuting Li, Weiqun Gan

Abstract: Solar magnetic fields are closely related to various physical phenomena on the sun, which can be extrapolated with different models from photospheric magnetograms. However, the Open Flux Problem (OFP), the underestimation of the magnetic field derived from the extrapolated model, is still unsolved. To minimize the impact of the OFP, we propose three evaluation parameters to quantitatively evaluate magnetic field models and determine the optimal free parameters in the models by constraining the coronal magnetic fields (CMFs) and the interplanetary magnetic fields (IMFs) with real observations. Although the OFP still exists, we find that magnetic field lines traced from the coronal models effectively capture the intricate topological configurations observed in the corona, including streamers and plumes. The OFP is lessened by using the HMI synoptic map instead of the GONG daily synoptic maps, and the PFSS+PFCS model instead of the CSSS model. For Carrington Rotation (CR) 2231 at the solar minimum, we suggest that the optimal parameters for the PFSS+PFCS model are $R_{\mathrm{ss}} = 2.2-2.5\ R_{\mathrm{sun}}$ and $R_{\mathrm{scs}} = 10.5-14.0\ R_{\mathrm{sun}}$, and that those for the CSSS model are $R_{\mathrm{cs}} = 2.0 - 2.4\ R_{\mathrm{sun}}$, $R_{\mathrm{ss}} = 11.0 - 14.7\ R_{\mathrm{sun}}$ and $a = 1.0\ R_{\mathrm{sun}}$. Despite the IMFs at 1 AU being consistent with the measurements by artificially increasing the polar magnetic fields, the IMFs near the sun are still underestimated. The OFP might be advanced by improving the accuracy of both the weak magnetic fields and polar magnetic fields, especially considering magnetic activities arising from interplanetary physical processes.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 21 pages, 15 figures, accepted for publication in ApJ

arXiv:2405.16457 [pdf, other]

Entanglement island and Page curve for one-sided charged black hole

Authors: Yun-Feng Qu, Yi-Ling Lan, Hongwei Yu, Wen-Cong Gan, Fu-Wen Shu

Abstract: In this paper, we extend the method of calculating the entanglement entropy of Hawking radiation of black holes using the "in" vacuum state, which describes one-sided asymptotically flat neutral black hole formed by gravitational collapse, to dynamic charged black holes. We explore the influence of charge on the position of the boundary of island $\partial I$ and the Page time. Due to their distinct geometric structures, we discuss non-extremal and extremal charged black holes separately. In non-extremal cases, the emergence of island saves the bound of entropy at late times, and the entanglement entropy of Hawking radiation satisfies the Page curve. Moreover, we also find that the position of the boundary of island $\partial I$ depends on the position of the cutoff surface (observers), differing from the behavior in eternal charged black holes. In extremal black holes, when the island exists, the entanglement entropy is approximately equal to the Bekenstein-Hawking entropy, while the entanglement entropy becomes ill-defined when island is absent. Our analysis underscores how different geometric configurations significantly influence the behavior of entropy.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.14158 [pdf, other]

Computation-efficient Virtual Sensing Approach with Multichannel Adjoint Least Mean Square Algorithm

Authors: Boxiang Wang, Junwei Ji, Xiaoyi Shen, Dongyuan Shi, Woon-Seng Gan

Abstract: Multichannel active noise control (ANC) systems are designed to create a large zone of quietness (ZoQ) around the error microphones; however, the placement of these microphones often presents challenges due to physical limitations. The virtual sensing technique, which effectively suppresses the noise far from the physical error microphones, is one of the most promising solutions. Nevertheless, the conventional multichannel virtual sensing ANC (MVANC) system based on the multichannel filtered reference least mean square (MCFxLMS) algorithm often suffers from high computational complexity. This paper proposes a feedforward MVANC system that incorporates the multichannel adjoint least mean square (MCALMS) algorithm to overcome these limitations effectively. Computational analysis demonstrates the improvement of computational efficiency and numerical simulations exhibit comparable noise reduction performance at virtual locations compared to the conventional MCFxLMS algorithm. Additionally, the effects of varied tuning noises on system performance are also investigated, providing insightful findings on optimizing MVANC systems.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.13055 [pdf, other]

Large Language Models for Medicine: A Survey

Authors: Yanxin Zheng, Wensheng Gan, Zefeng Chen, Zhenlian Qi, Qian Liang, Philip S. Yu

Abstract: To address challenges in the digital economy's landscape of digital intelligence, large language models (LLMs) have been developed. Improvements in computational power and available resources have significantly advanced LLMs, allowing their integration into diverse domains for human life. Medical LLMs are essential application tools with potential across various medical scenarios. In this paper, we review LLM developments, focusing on the requirements and applications of medical LLMs. We provide a concise overview of existing models, aiming to explore advanced research directions and benefit researchers for future medical applications. We emphasize the advantages of medical LLMs in applications, as well as the challenges encountered during their development. Finally, we suggest directions for technical integration to mitigate challenges and potential research directions for the future of medical LLMs, aiming to meet the demands of the medical field better.

Submitted 19 May, 2024; originally announced May 2024.

Comments: Preprint. 5 figures, 5 tables

arXiv:2405.13001 [pdf, other]

Large Language Models for Education: A Survey

Authors: Hanyi Xu, Wensheng Gan, Zhenlian Qi, Jiayang Wu, Philip S. Yu

Abstract: Artificial intelligence (AI) has a profound impact on traditional education. In recent years, large language models (LLMs) have been increasingly used in various applications such as natural language processing, computer vision, speech recognition, and autonomous driving. LLMs have also been applied in many fields, including recommendation, finance, government, education, and legal affairs. As powerful auxiliary tools, LLMs incorporate various technologies such as deep learning, pre-training, fine-tuning, and reinforcement learning. The use of LLMs for smart education (LLMEdu) has been a significant strategic direction for countries worldwide. While LLMs have shown great promise in improving teaching quality, changing education models, and modifying teacher roles, the technologies are still facing several challenges. In this paper, we conduct a systematic review of LLMEdu, focusing on current technologies, challenges, and future developments. We first summarize the current state of LLMEdu and then introduce the characteristics of LLMs and education, as well as the benefits of integrating LLMs into education. We also review the process of integrating LLMs into the education industry, as well as the introduction of related technologies. Finally, we discuss the challenges and problems faced by LLMEdu, as well as prospects for future optimization of LLMEdu.

Submitted 11 May, 2024; originally announced May 2024.

Comments: Journal of Machine Learning and Cybernetics. 4 tables, 6 figures

arXiv:2405.12996 [pdf, other]

Dose-aware Diffusion Model for 3D Low-dose PET: Multi-institutional Validation with Reader Study and Real Low-dose Data

Authors: Huidong Xie, Weijie Gan, Bo Zhou, Ming-Kai Chen, Michal Kulon, Annemarie Boustani, Benjamin A. Spencer, Reimund Bayerlein, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, Yinchi Zhou, Hui Liu, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Ge Wang, Ramsey D. Badawi, Chi Liu

Abstract: As PET imaging is accompanied by radiation exposure and potentially increased cancer risk, reducing radiation dose in PET scans without compromising the image quality is an important topic. Deep learning (DL) techniques have been investigated for low-dose PET imaging. However, existing models have often resulted in compromised image quality when achieving low-dose PET and have limited generalizability to different image noise-levels, acquisition protocols, patient populations, and hospitals. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for medical imaging tasks. However, for low-dose PET imaging, existing diffusion models failed to generate consistent 3D reconstructions, were unable to generalize across varying noise-levels, often produced visually-appealing but distorted image details, and produced images with biased tracer uptake. Here, we develop DDPET-3D, a dose-aware diffusion model for 3D low-dose PET imaging to address these challenges. Using data collected from 4 medical centers globally with different scanners and clinical protocols, we extensively evaluated the proposed model using a total of 9,783 18F-FDG studies (1,596 patients) with low-dose/low-count levels ranging from 1% to 50%. With a cross-center, cross-scanner validation, the proposed DDPET-3D demonstrated its potential to generalize to different low-dose levels, different scanners, and different clinical protocols. As confirmed with reader studies performed by nuclear medicine physicians, the proposed method produced superior denoised results that are comparable to or even better than the 100% full-count images as well as previous DL baselines. The presented results show the potential of achieving low-dose PET while maintaining image quality. Lastly, a group of real low-dose scans was also included for evaluation.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 16 Pages, 15 Figures, 4 Tables. Paper under review. arXiv admin note: substantial text overlap with arXiv:2311.04248

arXiv:2405.12496 [pdf, other]

A Survey of Integrating Wireless Technology into Active Noise Control

Authors: Xiaoyi Shen, Dongyuan Shi, Zhengding Luo, Junwei Ji, Woon-Seng Gan

Abstract: Active Noise Control (ANC) is a widely adopted technology for reducing environmental noise across various scenarios. This paper focuses on enhancing noise reduction performance, particularly through the refinement of signal quality fed into ANC systems. We discuss the main wireless technique integrated into the ANC system, equipped with some innovative algorithms, in diverse environments. Instead of using microphone arrays, which increase the computation complexity of the ANC system, to isolate multiple noise sources to improve noise reduction performance, the application of the wireless technique avoids extra computation demand. Wireless transmissions of reference, error, and control signals are also applied to improve the convergence performance of the ANC system. Furthermore, this paper lists some wireless ANC applications, such as earbuds, headphones, windows, and headrests, underscoring their adaptability and efficiency in various settings.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.07536 [pdf, other]

Multi-AUV Kinematic Task Assignment based on Self-organizing Map Neural Network and Dubins Path Generator

Authors: Xin Li, Wenyang Gan, Pang Wen, Daqi Zhu

Abstract: To deal with the task assignment problem of multi-AUV systems under kinematic constraints, that is, steering capability constraints for underactuated AUVs or similar vehicles, an improved task assignment algorithm is proposed that combines the Dubins Path algorithm with an improved SOM neural network algorithm. At first, the targeted tasks are assigned to the AUVs by the improved SOM neural network method based on workload balance and a neighborhood function. When there exist kinematic constraints or obstacles which may cause failure of trajectory planning, task re-assignment is implemented by changing the weights of the SOM neurons until the AUVs have paths to reach all the targets. Then, the Dubins paths are generated in several limited cases. The AUV's yaw angle is limited, which results in new assignments to the targets. The computation flow is designed so that the algorithm in MATLAB and Python can realize the path planning to multiple targets. Finally, simulation results show that the proposed algorithm can effectively accomplish task assignment for a multi-AUV system.

Submitted 24 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07485 [pdf, other]

The Energy Sources, the Physical Properties, and the Mass-loss History of SN 2017dio

Authors: Deng-Wang Shi, Shan-Qin Wang, Wen-Pei Gan, En-Wei Liang

Abstract: We study the energy sources, the physical properties of the ejecta and the circumstellar medium (CSM), as well as the mass-loss history of the progenitor of SN 2017dio which is a broad-lined Ic (Ic-BL) supernova (SN) having unusual light curves (LCs) and signatures of hydrogen-rich CSM in its early spectrum. We find that the temperature of SN 2017dio began to increase linearly about 20 days after the explosion. We use the $^{56}$Ni plus the ejecta-CSM interaction (CSI) model to fit the LCs of SN 2017dio, finding that the masses of the ejecta, the $^{56}$Ni, and the CSM are $\sim$ 12.41 M$_\odot$, $\sim$ 0.17 M$_\odot$, and $\sim$ 5.82 M$_\odot$, respectively. The early-time photosphere velocity and the kinetic energy of the SN are respectively $\sim$ 1.89 $\times 10^4$ km s$^{-1}$ and $\sim$ 2.66 $\times 10^{52}$ erg, which are respectively comparable to those of SNe Ic-BL and hypernovae (HNe). We suggest that the CSM of SN 2017dio might be from a luminous-blue-variable-like outburst or pulsational pair instability $\sim$ 1.2$-$11.4 yr prior to the SN explosion, or binary mass transfer. Moreover, we find that its ejecta mass is larger than those of many SNe Ic-BL, and that its $^{56}$Ni mass ($M_{\rm Ni}$) is approximately equal to the mean (or median) value of $M_{\rm Ni}$ of SNe Ic-BL in the literature, but lower than $M_{\rm Ni}$ of prototype HNe (e.g., SN 1998bw and SN 2003dh).

Submitted 13 May, 2024; originally announced May 2024.

Comments: Accepted for publication in ApJ, 17 pages, 4 figures, 3 tables

arXiv:2405.01308 [pdf, ps, other]

Spectral and Imaging Observations of a C2.3 White-Light Flare from the Advanced Space-Based Solar Observatory (ASO-S) and the Chinese H$α$ Solar Explorer (CHASE)

Authors: Qiao Li, Ying Li, Yang Su, Dechao Song, Hui Li, Li Feng, Yu Huang, Youping Li, Jingwei Li, Jie Zhao, Lei Lu, Beili Ying, Jianchao Xue, Ping Zhang, Jun Tian, Xiaofeng Liu, Gen Li, Zhichen Jing, Shuting Li, Guanglu Shi, Zhengyuan Tian, Wei Chen, Yingna Su, Qingmin Zhang, Dong Li, et al. (5 additional authors not shown)

Abstract: Solar white-light flares are characterized by an enhancement in the optical continuum and are usually large flares (say, X- and M-class flares). Here we report a small C2.3 white-light flare (SOL2022-12-20T04:10) observed by the \emph{Advanced Space-based Solar Observatory} and the \emph{Chinese H$α$ Solar Explorer}. This flare exhibits an increase of $\approx$6.4\% in the photospheric Fe \textsc{i} line at 6569.2\,Å and $\approx$3.2\% in the nearby continuum. The continuum at 3600\,Å also shows an enhancement of $\approx$4.7\%. The white-light brightening kernels are mainly located at the flare ribbons and co-spatial with nonthermal hard X-ray sources, which implies that the enhanced white-light emissions are related to nonthermal electron-beam heating. At the brightening kernels, the Fe \textsc{i} line displays an absorption profile that has a good Gaussian shape, with a redshift up to $\approx$1.7 km s$^{-1}$, while the H$α$ line shows an emission profile though having a central reversal. The H$α$ line profile also shows a red or blue asymmetry caused by plasma flows with a velocity of several to tens of km s$^{-1}$. It is interesting to find that the H$α$ asymmetry is opposite at the conjugate footpoints. It is also found that the CHASE continuum increase seems to be related to the change of photospheric magnetic field. Our study provides comprehensive characteristics of a small white-light flare that help understand the energy release process of white-light flares.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 23 pages, 6 figures, accepted by Solar Physics

arXiv:2404.18428 [pdf, other]

Geospatial Big Data: Survey and Challenges

Authors: Jiayang Wu, Wensheng Gan, Han-Chieh Chao, Philip S. Yu

Abstract: In recent years, geospatial big data (GBD) has obtained attention across various disciplines, categorized into big earth observation data and big human behavior data. Identifying geospatial patterns from GBD has been a vital research focus in the fields of urban management and environmental sustainability. This paper reviews the evolution of GBD mining and its integration with advanced artificial intelligence (AI) techniques. GBD consists of data generated by satellites, sensors, mobile devices, and geographical information systems, and we categorize geospatial data based on different perspectives. We outline the process of GBD mining and demonstrate how it can be incorporated into a unified framework. Additionally, we explore new technologies like large language models (LLM), the Metaverse, and knowledge graphs, and how they could make GBD even more useful. We also share examples of GBD helping with city management and protecting the environment. Finally, we discuss the real challenges that come up when working with GBD, such as issues with data retrieval and security. Our goal is to give readers a clear view of where GBD mining stands today and where it might go next.

Submitted 29 April, 2024; originally announced April 2024.

Comments: IEEE JSTARS. 14 pages, 5 figures

arXiv:2403.18139 [pdf, other]

Pseudo-MRI-Guided PET Image Reconstruction Method Based on a Diffusion Probabilistic Model

Authors: Weijie Gan, Huidong Xie, Carl von Gall, Günther Platsch, Michael T. Jurkiewicz, Andrea Andrade, Udunna C. Anazodo, Ulugbek S. Kamilov, Hongyu An, Jorge Cabello

Abstract: Anatomically guided PET reconstruction using MRI information has been shown to have the potential to improve PET image quality. However, these improvements are limited to PET scans with paired MRI information. In this work we employed a diffusion probabilistic model (DPM) to infer T1-weighted-MRI (deep-MRI) images from FDG-PET brain images. We then use the DPM-generated T1w-MRI to guide the PET reconstruction. The model was trained with brain FDG scans, and tested in datasets containing multiple levels of counts. Deep-MRI images appeared somewhat degraded compared with the acquired MRI images. Regarding PET image quality, volume of interest analysis in different brain regions showed that both PET reconstructed images using the acquired and the deep-MRI images improved image quality compared to OSEM. The same conclusions were found when analysing the decimated datasets. A subjective evaluation performed by two physicians confirmed that OSEM scored consistently worse than the MRI-guided PET images and no significant differences were observed between the MRI-guided PET images. This proof of concept shows that it is possible to infer DPM-based MRI imagery to guide the PET reconstruction, enabling the possibility of changing reconstruction parameters such as the strength of the prior on anatomically guided PET reconstruction in the absence of MRI.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.01686 [pdf, other]

doi 10.3847/2041-8213/ad319

AT2023lli: A Tidal Disruption Event with Prominent Optical Early Bump and Delayed Episodic X-ray Emission

Authors: Shifeng Huang, Ning Jiang, Jiazheng Zhu, Yibo Wang, Tinggui Wang, Shan-Qin Wang, Wen-Pei Gan, En-Wei Liang, Yu-Jing Qin, Zheyu Lin, Lin-Na Xu, Min-Xuan Cai, Ji-An Jiang, Xu Kong, Jiaxun Li, Long Li, Jian-Guo Wang, Ze-Lin Xu, Yongquan Xue, Ye-Fei Yuan, Jingquan Cheng, Lulu Fan, Jie Gao, Lei Hu, Weida Hu, et al. (20 additional authors not shown)

Abstract: High-cadence, multiwavelength observations have continuously revealed the diversity of tidal disruption events (TDEs), thus greatly advancing our knowledge and understanding of TDEs. In this work, we conducted an intensive optical-UV and X-ray follow-up campaign of TDE AT2023lli, and found a remarkable month-long bump in its UV/optical light curve nearly two months prior to maximum brightness. The bump represents the longest separation time from the main peak among known TDEs to date. The main UV/optical outburst declines as $t^{-4.10}$, making it one of the fastest decaying optically selected TDEs. Furthermore, we detected sporadic X-ray emission 30 days after the UV/optical peak, accompanied by a reduction in the period of inactivity. It is proposed that the UV/optical bump could be caused by the self-intersection of the stream debris, whereas the primary peak is generated by the reprocessed emission of the accretion process. In addition, our results suggest that episodic X-ray radiation during the initial phase of decline may be due to the patched obscurer surrounding the accretion disk, a phenomenon associated with the inhomogeneous reprocessing process. The double TDE scenario, in which two stars are disrupted in sequence, is also a possible explanation for producing the observed early bump and main peak.
We anticipate that the multicolor light curves of TDEs, especially in the very early stages, and the underlying physics can be better understood in the near future with the assistance of dedicated surveys such as the deep high-cadence survey of the 2.5-meter Wide Field Survey Telescope (WFST).

Submitted 26 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 14 pages, 8 figures, accepted for publication by ApJL

arXiv:2402.09460 [pdf, other]

doi 10.1109/ICASSP48485.2024.10448277

Unsupervised learning based end-to-end delayless generative fixed-filter active noise control

Authors: Zhengding Luo, Dongyuan Shi, Xiaoyi Shen, Woon-Seng Gan

Abstract: Delayless noise control is achieved by our earlier generative fixed-filter active noise control (GFANC) framework through efficient coordination between the co-processor and real-time controller. However, the one-dimensional convolutional neural network (1D CNN) in the co-processor requires initial training using labelled noise datasets. Labelling noise data can be resource-intensive and may introduce some biases. In this paper, we propose an unsupervised-GFANC approach to simplify the 1D CNN training process and enhance its practicality. During training, the co-processor and real-time controller are integrated into an end-to-end differentiable ANC system. This enables us to use the accumulated squared error signal as the loss for training the 1D CNN. With this unsupervised learning paradigm, the unsupervised-GFANC method not only omits the labelling process but also exhibits better noise reduction performance compared to the supervised GFANC method in real noise experiments.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

arXiv:2402.07374 [pdf, ps, other]

The White-light Emissions in Two X-class Flares Observed by ASO-S and CHASE

Authors: Ying Li, Zhichen Jing, De-Chao Song, Qiao Li, Jun Tian, Xiaofeng Liu, Ya Wang, M. D. Ding, Andrea Francesco Battaglia, Li Feng, Hui Li, Weiqun Gan

Abstract: The white-light continuum emissions in solar flares (i.e., white-light flares) are usually observed on the solar disk but, in a few cases, off the limb. Here we present on-disk as well as off-limb continuum emissions at 3600 Å (in the Balmer continuum) in an X2.1 flare (SOL2023-03-03T17:52) and an X1.5 flare (SOL2023-08-07T20:46), respectively, observed by the White-light Solar Telescope (WST) on the Advanced Space-based Solar Observatory (ASO-S). These continuum emissions are seen at the ribbons for the X2.1 flare and on loops during the X1.5 event, in which the latter also appears in the decay phase. These emissions also show up in the pseudo-continuum images at Fe I λ6173 from the Helioseismic and Magnetic Imager (HMI) on the Solar Dynamics Observatory (SDO). In addition, the ribbon sources in the X2.1 flare exhibit significant enhancements in the Fe I line at 6569.2 Å and the nearby continuum observed by the Chinese Hα Solar Explorer (CHASE). It is found that the on-disk continuum emissions in the X2.1 flare are related to nonthermal electron-beam heating either directly or indirectly, while the off-limb emissions in the X1.5 flare are associated with thermal plasma cooling or due to Thomson scattering. These comprehensive continuum observations can provide good constraints on flare energy deposition models, which helps in understanding the physical mechanism of white-light flares.

Submitted 11 February, 2024; originally announced February 2024.

Comments: 13 pages, 1 table, 4 figures, accepted for publication in ApJL

arXiv:2402.02694 [pdf, other]

Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

Authors: Jisheng Bai, Mou Wang, Haohe Liu, Han Yin, Yafei Jia, Siwei Huang, Yutong Du, Dongzhe Zhang, Dongyuan Shi, Woon-Seng Gan, Mark D. Plumbley, Susanto Rahardja, Bin Xiang, Jianfeng Chen

Abstract: Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Although this task has achieved substantial progress in device generalization in recent years, the challenge of domain shift between different geographical regions, involving discrepancies such as time, space, culture, and language, remains insufficiently explored. In addition, considering the abundance of unlabeled acoustic scene data in the real world, it is important to study possible ways to utilize these unlabeled data. Therefore, we introduce the task Semi-supervised Acoustic Scene Classification under Domain Shift in the ICME 2024 Grand Challenge. We encourage participants to innovate with semi-supervised learning techniques, aiming to develop more robust ASC models under domain shift.

Submitted 28 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2401.13998 [pdf, other]

WAL-Net: Weakly supervised auxiliary task learning network for carotid plaques classification

Authors: Haitao Gan, Lingchao Fu, Ran Zhou, Weiyan Gan, Furong Wang, Xiaoyan Wu, Zhi Yang, Zhongwei Huang

Abstract: The classification of carotid artery ultrasound images is a crucial means for diagnosing carotid plaques, holding significant clinical relevance for predicting the risk of stroke. Recent research suggests that utilizing plaque segmentation as an auxiliary task for classification can enhance performance by leveraging the correlation between segmentation and classification tasks. However, this approach relies on obtaining a substantial amount of challenging-to-acquire segmentation annotations. This paper proposes a novel weakly supervised auxiliary task learning network model (WAL-Net) to explore the interdependence between carotid plaque classification and segmentation tasks. The plaque classification task is the primary task, while the plaque segmentation task serves as an auxiliary task, providing valuable information to enhance the performance of the primary task. Weakly supervised learning is adopted in the auxiliary task to completely break away from the dependence on segmentation annotations. Experiments and evaluations are conducted on a dataset comprising 1270 carotid plaque ultrasound images from Wuhan University Zhongnan Hospital. Results indicate that the proposed method achieved an approximately 1.3% improvement in carotid plaque classification accuracy compared to the baseline network. Specifically, the accuracy of mixed-echoic plaques classification increased by approximately 3.3%, demonstrating the effectiveness of our approach.

Submitted 27 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.08678 [pdf, other]

Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music

Authors: Han Yin, Mou Wang, Jisheng Bai, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

Abstract: This paper presents a detailed description of our proposed methods for the ICASSP 2024 Cadenza Challenge. Experimental results show that the proposed system can achieve better performance than official baselines.

Submitted 10 January, 2024; originally announced January 2024.

Comments: Submitted to ICASSP 2024

arXiv:2401.07275 [pdf, ps, other]

A Statistical Study of Solar White-Light Flares Observed by the White-light Solar Telescope of the Lyman-alpha Solar Telescope on the Advanced Space-based Solar Observatory (ASO-S/LST/WST) at 360 nm

Authors: Zhichen Jing, Ying Li, Li Feng, Hui Li, Yu Huang, Youping Li, Yang Su, Wei Chen, Jun Tian, Dechao Song, Jingwei Li, Jianchao Xue, Jie Zhao, Lei Lu, Beili Ying, Ping Zhang, Yingna Su, Qingmin Zhang, Dong Li, Yunyi Ge, Shuting Li, Qiao Li, Gen Li, Xiaofeng Liu, Guanglu Shi, et al. (4 additional authors not shown)

Abstract: Solar white-light flares (WLFs) are those accompanied by brightenings in the optical continuum or integrated light. The White-light Solar Telescope (WST), as an instrument of the Lyman-alpha Solar Telescope (LST) on the Advanced Space-based Solar Observatory (ASO-S), provides continuous solar full-disk images at 360 nm, which can be used to study WLFs. We analyze 205 major flares above M1.0 from October 2022 to May 2023 and identify 49 WLFs at 360 nm from WST observations, i.e. with an occurrence rate of 23.9%. The percentages of WLFs for M1 - M4 (31 out of 180), M5 - M9 (11 out of 18), and above X1 (7 for all) flares are 17.2%, 61.1%, and 100%, respectively, namely the larger the flares, the more likely they are WLFs at 360 nm. We further analyze 39 WLFs among the identified WLFs and investigate their properties such as white-light enhancement, duration, and brightening area. It is found that the relative enhancement of the white-light emission at 360 nm is mostly (>90%) less than 30% and the mean enhancement is 19.4%. The WLFs' duration at 360 nm is mostly (>80%) less than 20 minutes and its mean is 10.3 minutes. The brightening area at 360 nm is mostly (>75%) less than 500 arcsecond$^2$ and the median value is 225. We find that there exist good correlations between the white-light enhancement/duration/area and the peak soft X-ray (SXR) flux of the flare, with correlation coefficients of 0.68, 0.58, and 0.80, respectively.
In addition, the white-light emission in most WLFs peaks around the same time as the temporal derivative of SXR flux as well as the hard X-ray emission at 20 - 50 keV, indicative of the Neupert effect. It is also found that the limb WLFs are more likely to have a greater enhancement, which is consistent with numerical simulations.
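The occurrence rates quoted in the abstract follow directly from the per-class counts. The short sketch below recomputes them; the dictionary layout and labels are illustrative, and the X-class entry assumes that "7 for all" means 7 WLFs out of 7 flares, as the 100% figure implies.

```python
# Recompute the white-light-flare (WLF) occurrence rates from the quoted counts.
# Each entry is (number of WLFs, number of flares) for one GOES class bin.
counts = {
    "M1-M4": (31, 180),
    "M5-M9": (11, 18),
    ">=X1": (7, 7),  # ASSUMPTION: "7 for all" read as 7 WLFs out of 7 flares
}

total_wlf = sum(wlf for wlf, _ in counts.values())
total_flares = sum(flares for _, flares in counts.values())

# Percentage of flares in each bin that are WLFs, rounded to one decimal place.
rates = {label: round(100.0 * wlf / flares, 1) for label, (wlf, flares) in counts.items()}
overall_rate = round(100.0 * total_wlf / total_flares, 1)

print(total_wlf, total_flares, overall_rate)
print(rates)
```

The bins sum to the 49 WLFs and 205 flares stated in the abstract, so the per-class and overall percentages are mutually consistent.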

Submitted 14 January, 2024; originally announced January 2024.

arXiv:2401.06624 [pdf, ps, other]

Generalised Whittaker models as instances of relative Langlands duality II: Plancherel density and global periods

Authors: Wee Teck Gan, Bryan Wang Peng Jun

Abstract: In an earlier paper of the authors, a general family of instances of the relative Langlands duality of Ben-Zvi-Sakellaridis-Venkatesh [BZSV] were proposed and studied in the setting of branching problems for smooth representations. In this paper, we show the numerical conjectures of [BZSV] for the local Plancherel density, as well as an application to their conjectures on global periods, for this general family of instances.

Submitted 12 January, 2024; originally announced January 2024.

MSC Class: 22E50; 11F70

arXiv:2401.01599 [pdf, other]

Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay

Authors: Yicheng Li, Weiye Gan, Zuoqiang Shi, Qian Lin

Abstract: The generalization error curve of a certain kernel regression method aims at determining the exact order of generalization error under various source conditions, noise levels, and choices of the regularization parameter, rather than the minimax rate. In this work, under mild assumptions, we rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method (and a large class of analytic spectral algorithms) in kernel regression. Consequently, we could sharpen the near inconsistency of kernel interpolation and clarify the saturation effects of kernel regression algorithms with higher qualification, etc. Thanks to the neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training wide neural networks. A novel technical contribution, the analytic functional argument, might be of independent interest.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2312.14560 [pdf]

doi 10.1016/j.compositesb.2024.111287

Optical wood with switchable solar transmittance for all-round thermal management

Authors: He Gao, Ying Li, Yanjun Xie, Daxin Liang, Jian Li, Yonggui Wang, Zefang Xiao, Haigang Wang, Wentao Gan, Lorenzo Pattelli, Hongbo Xu

Abstract: Technologies enabling passive daytime radiative cooling and daylight harvesting are highly relevant for energy-efficient buildings. Despite recent progress demonstrated with passively cooling polymer coatings, it remains challenging to also combine a passive heat gain mechanism into a single substrate for all-round thermal management. Herein, we developed an optical wood (OW) with switchable transmittance of solar irradiation enabled by the hierarchically porous structure, ultralow absorption in solar spectrum and high infrared absorption of cellulose nanofibers. After delignification, the OW shows a high solar reflectance (94.9%) in the visible and high broadband emissivity (0.93) in the infrared region (2.5-25 $μ$m). Owing to the exceptional mass transport of its aligned cellulose nanofibers, OW can quickly switch to a new highly transparent state following phenylethanol impregnation. The solar transmittance of optical wood (OW-II state) can reach 68.4% from 250 to 2500 nm. The switchable OW exhibits efficient radiative cooling to 4.5 °C below ambient temperature in summer (81.4 W m$^{-2}$ cooling power), and daylight heating to 5.6 °C above the temperature of natural wood in winter (heating power 229.5 W m$^{-2}$), suggesting its promising role as a low-cost and sustainable solution to all-season thermal management applications.

Submitted 11 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: accepted version of the manuscript published in Composites Part B: Engineering

Journal ref: Composites Part B: Engineering, Volume 275, 2024, 111287, ISSN 1879-1069

arXiv:2312.10073 [pdf, other]

Data Scarcity in Recommendation Systems: A Survey

Authors: Zefeng Chen, Wensheng Gan, Jiayang Wu, Kaixia Hu, Hong Lin

Abstract: The prevalence of online content has led to the widespread adoption of recommendation systems (RSs), which serve diverse purposes such as news, advertisements, and e-commerce recommendations. Despite their significance, data scarcity issues have significantly impaired the effectiveness of existing RS models and hindered their progress. To address this challenge, the concept of knowledge transfer, particularly from external sources like pre-trained language models, emerges as a potential solution to alleviate data scarcity and enhance RS development. However, the practice of knowledge transfer in RSs is intricate. Transferring knowledge between domains introduces data disparities, and the application of knowledge transfer in complex RS scenarios can yield negative consequences if not carefully designed. Therefore, this article contributes to this discourse by addressing the implications of data scarcity on RSs and introducing various strategies, such as data augmentation, self-supervised learning, transfer learning, broad learning, and knowledge graph utilization, to mitigate this challenge. Furthermore, it delves into the challenges and future direction within the RS domain, offering insights that are poised to facilitate the development and implementation of robust RSs, particularly when confronted with data scarcity. We aim to provide valuable guidance and inspiration for researchers and practitioners, ultimately driving advancements in the field of RS.

Submitted 7 December, 2023; originally announced December 2023.

Comments: ACM Transactions on Recommender Systems, 32 pages

arXiv:2312.03718 [pdf, other]

Large Language Models in Law: A Survey

Authors: Jinqi Lai, Wensheng Gan, Jiayang Wu, Zhenlian Qi, Philip S. Yu

Abstract: The advent of artificial intelligence (AI) has significantly impacted the traditional judicial industry. Moreover, recently, with the development of AI-generated content (AIGC), AI and law have found applications in various domains, including image recognition, automatic text generation, and interactive chat. With the rapid emergence and growing popularity of large models, it is evident that AI will drive transformation in the traditional judicial industry. However, the application of legal large language models (LLMs) is still in its nascent stage. Several challenges need to be addressed. In this paper, we aim to provide a comprehensive survey of legal LLMs. We not only conduct an extensive survey of LLMs, but also expose their applications in the judicial system. We first provide an overview of AI technologies in the legal field and showcase the recent research in LLMs. Then, we discuss the practical implementation presented by legal LLMs, such as providing legal advice to users and assisting judges during trials. In addition, we explore the limitations of legal LLMs, including data, algorithms, and judicial practice. Finally, we summarize practical recommendations and propose future development directions to address these challenges.

Submitted 25 November, 2023; originally announced December 2023.

Comments: Preprint

arXiv:2311.18810 [pdf, other]

Convergence of Nonconvex PnP-ADMM with MMSE Denoisers

Authors: Chicago Park, Shirin Shoushtari, Weijie Gan, Ulugbek S. Kamilov

Abstract: Plug-and-Play Alternating Direction Method of Multipliers (PnP-ADMM) is a widely-used algorithm for solving inverse problems by integrating physical measurement models and convolutional neural network (CNN) priors. PnP-ADMM has been theoretically proven to converge for convex data-fidelity terms and nonexpansive CNNs. It has however been observed that PnP-ADMM often empirically converges even for expansive CNNs. This paper presents a theoretical explanation for the observed stability of PnP-ADMM based on the interpretation of the CNN prior as a minimum mean-squared error (MMSE) denoiser. Our explanation parallels a similar argument recently made for the iterative shrinkage/thresholding algorithm variant of PnP (PnP-ISTA) and relies on the connection between MMSE denoisers and proximal operators. We also numerically evaluate the performance gap between PnP-ADMM using a nonexpansive DnCNN denoiser and expansive DRUNet denoiser, thus motivating the use of expansive CNNs.
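For readers unfamiliar with the algorithm, the generic PnP-ADMM iteration can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a quadratic data-fidelity term 0.5*||Ax - y||^2, and it swaps in a soft-thresholding function as a stand-in denoiser where the paper uses CNNs (DnCNN, DRUNet). The names `pnp_admm`, `soft_threshold`, `gamma`, and `num_iters` are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Stand-in denoiser (ASSUMPTION): proximal operator of tau * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def pnp_admm(y, A, denoiser, gamma=1.0, num_iters=100):
    """Generic PnP-ADMM for the data term 0.5 * ||A x - y||^2.

    The prior enters only through `denoiser`, which replaces the proximal
    operator of the regulariser in standard ADMM.
    """
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable

    # The proximal step of the quadratic data term is a linear solve,
    # so the system matrix and A^T y are formed once up front.
    system = A.T @ A + np.eye(n) / gamma
    aty = A.T @ y

    for _ in range(num_iters):
        # Data-consistency step: argmin_x 0.5||Ax - y||^2 + ||x - (z - u)||^2 / (2 gamma)
        x = np.linalg.solve(system, aty + (z - u) / gamma)
        # Prior step: apply the plug-in denoiser.
        z = denoiser(x + u)
        # Dual update.
        u = u + x - z
    return x


# Toy usage: identity forward model, so the fixed point is soft_threshold(y, 1.0).
y = np.array([3.0, 0.5, -2.0])
x_hat = pnp_admm(y, np.eye(3), lambda v: soft_threshold(v, 1.0))
print(x_hat)
```

With this stand-in denoiser the scheme reduces to ordinary ADMM for an l1-regularised least-squares problem, which makes the toy case easy to verify; the convergence question studied in the paper arises when the denoiser is a learned, possibly expansive network.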

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.18073 [pdf, other]

DiffGEPCI: 3D MRI Synthesis from mGRE Signals using 2.5D Diffusion Model

Authors: Yuyang Hu, Satya V. V. N. Kothapalli, Weijie Gan, Alexander L. Sukstanskii, Gregory F. Wu, Manu Goyal, Dmitriy A. Yablonskiy, Ulugbek S. Kamilov

Abstract: We introduce a new framework called DiffGEPCI for cross-modality generation in magnetic resonance imaging (MRI) using a 2.5D conditional diffusion model. DiffGEPCI can synthesize high-quality Fluid Attenuated Inversion Recovery (FLAIR) and Magnetization Prepared-Rapid Gradient Echo (MPRAGE) images, without acquiring corresponding measurements, by leveraging multi-Gradient-Recalled Echo (mGRE) MRI signals as conditional inputs. DiffGEPCI operates in a two-step fashion: it initially estimates a 3D volume slice-by-slice using the axial plane and subsequently applies a refinement algorithm (referred to as 2.5D) to enhance the quality of the coronal and sagittal planes. Experimental validation on real mGRE data shows that DiffGEPCI achieves excellent performance, surpassing generative adversarial networks (GANs) and traditional diffusion models.

Submitted 18 April, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.15445 [pdf, other]

FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration

Authors: Zihao Zou, Jiaming Liu, Shirin Shoushtari, Yubo Wang, Weijie Gan, Ulugbek S. Kamilov

Abstract: Face video restoration (FVR) is a challenging but important problem where one seeks to recover perceptually realistic face videos from a low-quality input. While diffusion probabilistic models (DPMs) have been shown to achieve remarkable performance for face image restoration, they often fail to preserve temporally coherent, high-quality videos, compromising the fidelity of reconstructed faces. We present a new conditional diffusion framework called FLAIR for FVR. FLAIR ensures temporal consistency across frames in a computationally efficient fashion by converting a traditional image DPM into a video DPM. The proposed conversion uses a recurrent video refinement layer and a temporal self-attention at different scales. FLAIR also uses a conditional iterative refinement process to balance the perceptual and distortion quality during inference. This process consists of two key components: a data-consistency module that analytically ensures that the generated video precisely matches its degraded observation and a coarse-to-fine image enhancement module specifically for facial regions. Our extensive experiments show the superiority of FLAIR over the current state-of-the-art (SOTA) for video super-resolution, deblurring, JPEG restoration, and space-time frame interpolation on two high-quality face video datasets.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 32 pages, 27 figures

arXiv:2311.14068 [pdf, other]

Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection

Authors: Han Yin, Jisheng Bai, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

Abstract: Traditional binary hard labels for sound event detection (SED) lack details about the complexity and variability of sound event distributions. Recently, a novel annotation workflow was proposed to generate fine-grained non-binary soft labels, resulting in a new real-life dataset named MAESTRO Real for SED. In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-interaction mechanism is applied to effectively exploit the information from soft labels. In addition, a novel scene-inspired mask (SIM) based on soft labels is incorporated for more precise SED predictions. The SIM is initially generated through a statistical approach, referred to as SIM-V1. However, the fixed artificial mask may mismatch the SED model, resulting in limited effectiveness. Therefore, we further propose SIM-V2, which employs a word embedding model for adaptive SIM estimation. Experimental results show that the proposed IDC module can effectively utilize the information from soft labels, and the integration of SIM-V1 can further improve the accuracy. In addition, the impact of different word embedding dimensions on SIM-V2 is explored, and the results show that an appropriate dimension can enable SIM-V2 to achieve performance superior to that of SIM-V1. In DCASE 2023 Challenge Task 4B, the proposed system achieved the top-ranking performance on the evaluation dataset of MAESTRO Real.

Submitted 7 December, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

Comments: to be improved (unfinished)

arXiv:2311.13165 [pdf, other]

Multimodal Large Language Models: A Survey

Authors: Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, Philip S. Yu

Abstract: The exploration of multimodal language models integrates multiple data types, such as images, text, language, audio, and other heterogeneous data. While the latest large language models excel in text-based tasks, they often struggle to understand and process other data types. Multimodal models address this limitation by combining various modalities, enabling a more comprehensive understanding of diverse data. This paper begins by defining the concept of multimodal and examining the historical development of multimodal algorithms. Furthermore, we introduce a range of multimodal products, focusing on the efforts of major technology companies. A practical guide is provided, offering insights into the technical aspects of multimodal models. Moreover, we present a compilation of the latest algorithms and commonly used datasets, providing researchers with valuable resources for experimentation and evaluation. Lastly, we explore the applications of multimodal models and discuss the challenges associated with their development. By addressing these aspects, this paper aims to facilitate a deeper understanding of multimodal models and their potential in various domains.

Submitted 22 November, 2023; originally announced November 2023.

Comments: IEEE BigData 2023. 10 pages

arXiv:2311.13160 [pdf, other]

Large Language Models in Education: Vision and Opportunities

Authors: Wensheng Gan, Zhenlian Qi, Jiayang Wu, Jerry Chun-Wei Lin

Abstract: With the rapid development of artificial intelligence technology, large language models (LLMs) have become a hot research topic. Education plays an important role in human social development and progress. Traditional education faces challenges such as individual student differences, insufficient allocation of teaching resources, and assessment of teaching effectiveness. Therefore, the applications of LLMs in the field of digital/smart education have broad prospects. The research on educational large models (EduLLMs) is constantly evolving, providing new methods and approaches to achieve personalized learning, intelligent tutoring, and educational assessment goals, thereby improving the quality of education and the learning experience. This article aims to investigate and summarize the application of LLMs in smart education. It first introduces the research background and motivation of LLMs and explains the essence of LLMs. It then discusses the relationship between digital education and EduLLMs and summarizes the current research status of educational large models. The main contributions are the systematic summary and vision of the research background, motivation, and application of large models for education (LLM4Edu). By reviewing existing research, this article provides guidance and insights for educators, researchers, and policy-makers to gain a deep understanding of the potential and challenges of LLM4Edu. It further provides guidance for further advancing the development and application of LLM4Edu, while still facing technical, ethical, and practical challenges requiring further research and exploration.

Submitted 22 November, 2023; originally announced November 2023.

Comments: IEEE BigData 2023. 10 pages

arXiv:2311.12371 [pdf, other]

AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning

Authors: Jisheng Bai, Han Yin, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen, Susanto Rahardja

Abstract: Previous studies in automated audio captioning have faced difficulties in accurately capturing the complete temporal details of acoustic scenes and events within long audio sequences. This paper presents AudioLog, a large language model (LLM)-powered audio logging system with hybrid token-semantic contrastive learning. Specifically, we propose to fine-tune the pre-trained hierarchical token-semantic audio Transformer by incorporating contrastive learning between hybrid acoustic representations. We then leverage LLMs to generate audio logs that summarize textual descriptions of the acoustic environment. Finally, we evaluate the AudioLog system on two datasets with both scene and event annotations. Experiments show that the proposed system achieves exceptional performance in acoustic scene classification and sound event detection, surpassing existing methods in the field. Further analysis of the prompts to LLMs demonstrates that AudioLog can effectively summarize long audio sequences. To the best of our knowledge, this approach is the first attempt to leverage LLMs for summarizing long audio sequences.

Submitted 4 January, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.10945 [pdf, other]

An Empirical Bayes Framework for Open-Domain Dialogue Generation

Authors: Jing Yang Lee, Kong Aik Lee, Woon-Seng Gan

Abstract: To engage human users in meaningful conversation, open-domain dialogue agents are required to generate diverse and contextually coherent dialogue. Despite recent advancements, which can be attributed to the usage of pretrained language models, the generation of diverse and coherent dialogue remains an open research problem. A popular approach to address this issue involves the adaptation of variational frameworks. However, while these approaches successfully improve diversity, they tend to compromise on contextual coherence. Hence, we propose the Bayesian Open-domain Dialogue with Empirical Bayes (BODEB) framework, an empirical Bayes framework for constructing a Bayesian open-domain dialogue agent by leveraging pretrained parameters to inform the prior and posterior parameter distributions. Empirical results show that BODEB achieves better results in terms of both diversity and coherence compared to variational frameworks.

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.10943 [pdf, other]

Partially Randomizing Transformer Weights for Dialogue Response Diversity

Authors: Jing Yang Lee, Kong Aik Lee, Woon-Seng Gan

Abstract: Despite recent progress in generative open-domain dialogue, the issue of low response diversity persists. Prior works have addressed this issue via either novel objective functions, alternative learning approaches such as variational frameworks, or architectural extensions such as the Randomized Link (RL) Transformer. However, these approaches typically entail either additional difficulties during training/inference, or a significant increase in model size and complexity. Hence, we propose the Partially Randomized transFormer (PaRaFormer), a simple extension of the transformer which involves freezing the weights of selected layers after random initialization. Experimental results reveal that the performance of the PaRaFormer is comparable to that of the aforementioned approaches, despite not entailing any additional training difficulty or increase in model complexity.

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.07226 [pdf, other]

Large Language Models for Robotics: A Survey

Authors: Fanlong Zeng, Wensheng Gan, Yongheng Wang, Ning Liu, Philip S. Yu

Abstract: The human ability to learn, generalize, and control complex manipulation tasks through multi-modality feedback suggests a unique capability, which we refer to as dexterity intelligence. Understanding and assessing this intelligence is a complex task. Amidst the swift progress and extensive proliferation of large language models (LLMs), their applications in the field of robotics have garnered increasing attention. LLMs possess the ability to process and generate natural language, facilitating efficient interaction and collaboration with robots. Researchers and engineers in the field of robotics have recognized the immense potential of LLMs in enhancing robot intelligence, human-robot interaction, and autonomy. Therefore, this comprehensive review aims to summarize the applications of LLMs in robotics, delving into their impact and contributions to key areas such as robot control, perception, decision-making, and path planning. We first provide an overview of the background and development of LLMs for robotics, followed by a description of the benefits of LLMs for robotics and recent advancements in robotics models based on LLMs. We then delve into the various techniques used in the model, including those employed in perception, decision-making, control, and interaction. Finally, we explore the applications of LLMs in robotics and some potential challenges they may face in the near future. Embodied intelligence is the future of intelligent science, and LLM-based robotics is one of the promising but challenging paths to achieve this.

Submitted 13 November, 2023; originally announced November 2023.

Comments: Preprint. 4 figures, 3 tables

arXiv:2311.05804 [pdf, other]

Model-as-a-Service (MaaS): A Survey

Authors: Wensheng Gan, Shicheng Wan, Philip S. Yu

Abstract: Due to the increased number of parameters and data in the pre-trained model exceeding a certain level, a foundation model (e.g., a large language model) can significantly improve downstream task performance and emerge with some novel special abilities (e.g., deep learning, complex reasoning, and human alignment) that were not present before. Foundation models are a form of generative artificial intelligence (GenAI), and Model-as-a-Service (MaaS) has emerged as a groundbreaking paradigm that revolutionizes the deployment and utilization of GenAI models. MaaS represents a paradigm shift in how we use AI technologies and provides a scalable and accessible solution for developers and users to leverage pre-trained AI models without the need for extensive infrastructure or expertise in model training. In this paper, the introduction aims to provide a comprehensive overview of MaaS, its significance, and its implications for various industries. We provide a brief review of the development history of "X-as-a-Service" based on cloud computing and present the key technologies involved in MaaS. The development of GenAI models will become more democratized and flourish. We also review recent application studies of MaaS. Finally, we highlight several challenges and future issues in this promising area. MaaS is a new deployment and service paradigm for different AI-based models. We hope this review will inspire future research in the field of MaaS.

Submitted 9 November, 2023; originally announced November 2023.

Comments: Preprint. 3 figures, 1 table

arXiv:2311.04248 [pdf, other]

DDPET-3D: Dose-aware Diffusion Model for 3D Ultra Low-dose PET Imaging

Authors: Huidong Xie, Weijie Gan, Bo Zhou, Xiongchao Chen, Qiong Liu, Xueqi Guo, Liang Guo, Hongyu An, Ulugbek S. Kamilov, Ge Wang, Chi Liu

Abstract: As PET imaging is accompanied by substantial radiation exposure and cancer risk, reducing radiation dose in PET scans is an important topic. Recently, diffusion models have emerged as the new state-of-the-art generative model to generate high-quality samples and have demonstrated strong potential for various tasks in medical imaging. However, it is difficult to extend diffusion models for 3D image reconstructions due to the memory burden. Directly stacking 2D slices together to create 3D image volumes would result in severe inconsistencies between slices. Previous works tried to either apply a penalty term along the z-axis to remove inconsistencies or reconstruct the 3D image volumes with 2 pre-trained perpendicular 2D diffusion models. Nonetheless, these previous methods failed to produce satisfactory results in challenging cases for PET image denoising. In addition to administered dose, the noise levels in PET images are affected by several other factors in clinical settings, e.g., scan time, medical history, patient size, and weight. Therefore, a method to simultaneously denoise PET images with different noise levels is needed. Here, we proposed a Dose-aware Diffusion model for 3D low-dose PET imaging (DDPET-3D) to address these challenges. We extensively evaluated DDPET-3D on 100 patients with 6 different low-dose levels (a total of 600 testing studies), and demonstrated superior performance over previous diffusion models for 3D imaging problems as well as previous noise-aware medical image denoising models. The code is available at: https://github.com/xxx/xxx.

Submitted 28 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: Paper under review. 16 pages, 11 figures, 4 tables

arXiv:2311.02121 [pdf, other]

Enhancing Monocular Height Estimation from Aerial Images with Street-view Images

Authors: Xiaomou Hou, Wanshui Gan, Naoto Yokoya

Abstract: Accurate height estimation from monocular aerial imagery presents a significant challenge due to its inherently ill-posed nature. This limitation is rooted in the absence of adequate geometric constraints available to the model when training with monocular imagery. Without additional geometric information to supplement the monocular image data, the model's ability to provide reliable estimations is compromised. In this paper, we propose a method that enhances monocular height estimation by incorporating street-view images. Our insight is that street-view images provide a distinct viewing perspective and rich structural details of the scene, serving as geometric constraints to enhance the performance of monocular height estimation. Specifically, we aim to optimize an implicit 3D scene representation, density field, with geometry constraints from street-view images, thereby improving the accuracy and robustness of height estimation. Our experimental results demonstrate the effectiveness of our proposed method, outperforming the baseline and offering significant improvements in terms of accuracy and structural consistency.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2311.02003 [pdf, other]

A Structured Pruning Algorithm for Model-based Deep Learning

Authors: Chicago Park, Weijie Gan, Zihao Zou, Yuyang Hu, Zhixin Sun, Ulugbek S. Kamilov

Abstract: There is a growing interest in model-based deep learning (MBDL) for solving imaging inverse problems. MBDL networks can be seen as iterative algorithms that estimate the desired image using a physical measurement model and a learned image prior specified using a convolutional neural network (CNN). The iterative nature of MBDL networks increases the test-time computational complexity, which limits their applicability in certain large-scale applications. We address this issue by presenting the structured pruning algorithm for model-based deep learning (SPADE) as the first structured pruning algorithm for MBDL networks. SPADE reduces the computational complexity of CNNs used within MBDL networks by pruning their non-essential weights. We propose three distinct strategies to fine-tune the pruned MBDL networks to minimize the performance loss. Each fine-tuning strategy has a unique benefit that depends on the presence of a pre-trained model and a high-quality ground truth. We validate SPADE on two distinct inverse problems, namely compressed sensing MRI and image super-resolution. Our results highlight that MBDL models pruned by SPADE can achieve substantial speed-up in testing time while maintaining competitive performance.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2311.00456 [pdf, ps, other]

Partial Eruption of Solar Filaments. I. Configuration and Formation of Double-decker Filaments

Authors: Yijun Hou, Chuan Li, Ting Li, Jiangtao Su, Ye Qiu, Shuhong Yang, Liheng Yang, Leping Li, Yilin Guo, Zhengyong Hou, Qiao Song, Xianyong Bai, Guiping Zhou, Mingde Ding, Weiqun Gan, Yuanyong Deng

Abstract: Partial eruptions of solar filaments are the typical representative of solar eruptive behavior diversity. Here we investigate a typical filament partial eruption event and present integrated evidence for the configuration of the pre-eruption filament and its formation. The CHASE Hα observations reveal a structured Doppler velocity distribution within the pre-eruption filament, where distinct redshift only appeared in the east narrow part of the south filament region and then disappeared after the partial eruption while the north part dominated by blueshift remained. Combining the SDO, ASO-S observations, and NLFFF modeling results, we verify that there were two independent material flow systems within the pre-flare filament, whose magnetic topology is a special double-decker configuration consisting of two magnetic flux ropes (MFRs) with opposite magnetic twist. During the formation of this filament system, continuous magnetic flux cancellation and footpoint motion were observed around its north end. Therefore, we propose a new double-decker formation scenario in which the two MFRs composing such a double-decker configuration originated from two magnetic systems with different initial connections and opposite magnetic twist. Subsequent magnetic reconnection with surrounding newly-emerging fields resulted in the motion of the footpoint of the upper MFR to the region around the footpoint of the lower MFR, thus leading to the eventual formation of the double-decker configuration consisting of two MFRs with similar footpoints but opposite signs of magnetic twist. These results provide a potential way to determine unambiguously the progenitor configuration of a partial-eruptive filament and reveal a special type of double-decker MFR configuration and a new double-decker formation scenario.

Submitted 1 November, 2023; originally announced November 2023.

Comments: 16 pages, 8 figures, 1 table, accepted for publication in ApJ as part of the Focus Issue "Early results from the Chinese Hα Solar Explorer (CHASE)"

arXiv:2311.00230 [pdf, other]

DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing

Authors: Gaoshuang Huang, Yang Zhou, Xiaofei Hu, Chenglong Zhang, Luying Zhao, Wenjian Gan, Mingbo Hou

Abstract: Utilizing visual place recognition (VPR) technology to ascertain the geographical location of publicly available images is a pressing issue for real-world VPR applications. Although most current VPR methods achieve favorable results under ideal conditions, their performance in complex environments, characterized by lighting variations, seasonal changes, and occlusions caused by moving objects, is generally unsatisfactory. In this study, we utilize the DINOv2 model as the backbone network for trimming and fine-tuning to extract robust image features. We propose a novel VPR architecture called DINO-Mix, which combines a foundational vision model with feature aggregation. This architecture relies on the powerful image feature extraction capabilities of foundational vision models. We employ an MLP-Mixer-based mix module to aggregate image features, resulting in globally robust and generalizable descriptors that enable high-precision VPR. We experimentally demonstrate that the proposed DINO-Mix architecture significantly outperforms current state-of-the-art (SOTA) methods. In test sets having lighting variations, seasonal changes, and occlusions (Tokyo24/7, Nordland, SF-XL-Testv1), our proposed DINO-Mix architecture achieved Top-1 accuracy rates of 91.75%, 80.18%, and 82%, respectively. Compared with SOTA methods, our architecture exhibited an average accuracy improvement of 5.14%.

Submitted 5 December, 2023; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: Under review / Open source code

arXiv:2310.13699 [pdf, other]

Interaction in Metaverse: A Survey

Authors: Hong Lin, Zirun Gan, Wensheng Gan, Zhenlian Qi, Yuehua Wang, Philip S. Yu

Abstract: Human-computer interaction (HCI) emerged with the birth of the computer and has been upgraded through decades of development. Metaverse has attracted a lot of interest with its immersive experience, and HCI is the entrance to the Metaverse for people. It is predictable that HCI will determine the immersion of the Metaverse. However, the technologies of HCI in Metaverse are not mature enough. There are many issues that we should address for HCI in the Metaverse. To this end, the purpose of this paper is to provide a systematic literature review on the key technologies and applications of HCI in the Metaverse. This paper is a comprehensive survey of HCI for the Metaverse, focusing on current technology, future directions, and challenges. First, we provide a brief overview of HCI in the Metaverse and their mutually exclusive relationships. Then, we summarize the evolution of HCI and its future characteristics in the Metaverse. Next, we envision and present the key technologies involved in HCI in the Metaverse. We also review recent case studies of HCI in the Metaverse. Finally, we highlight several challenges and future issues in this promising area.

Submitted 27 September, 2023; originally announced October 2023.

Comments: Preprint. 3 figures, 3 tables

arXiv:2310.07504 [pdf, other]

PtychoDV: Vision Transformer-Based Deep Unrolling Network for Ptychographic Image Reconstruction

Authors: Weijie Gan, Qiuchen Zhai, Michael Thompson McCann, Cristina Garcia Cardona, Ulugbek S. Kamilov, Brendt Wohlberg

Abstract: Ptychography is an imaging technique that captures multiple overlapping snapshots of a sample, illuminated coherently by a moving localized probe. The image recovery from ptychographic data is generally achieved via an iterative algorithm that solves a nonlinear phase retrieval problem derived from measured diffraction patterns. However, these iterative approaches have high computational cost. In this paper, we introduce PtychoDV, a novel deep model-based network designed for efficient, high-quality ptychographic image reconstruction. PtychoDV comprises a vision transformer that generates an initial image from the set of raw measurements, taking into consideration their mutual correlations. This is followed by a deep unrolling network that refines the initial image using learnable convolutional priors and the ptychography measurement model. Experimental results on simulated data demonstrate that PtychoDV is capable of outperforming existing deep learning methods for this problem, and significantly reduces computational cost compared to iterative methodologies, while maintaining competitive performance.

Submitted 6 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.04297 [pdf, other]

A Plug-and-Play Image Registration Network

Authors: Junhao Hu, Weijie Gan, Zhixin Sun, Hongyu An, Ulugbek S. Kamilov

Abstract: Deformable image registration (DIR) is an active research topic in biomedical imaging. There is a growing interest in developing DIR methods based on deep learning (DL). A traditional DL approach to DIR is based on training a convolutional neural network (CNN) to estimate the registration field between two input images. While conceptually simple, this approach comes with a limitation that it exclusively relies on a pre-trained CNN without explicitly enforcing fidelity between the registered image and the reference. We present plug-and-play image registration network (PIRATE) as a new DIR method that addresses this issue by integrating an explicit data-fidelity penalty and a CNN prior. PIRATE pre-trains a CNN denoiser on the registration field and "plugs" it into an iterative method as a regularizer. We additionally present PIRATE+ that fine-tunes the CNN prior in PIRATE using deep equilibrium models (DEQ). PIRATE+ interprets the fixed-point iteration of PIRATE as a network with effectively infinite layers and then trains the resulting network end-to-end, enabling it to learn more task-specific information and boosting its performance. Our numerical results on OASIS and CANDI datasets show that our methods achieve state-of-the-art performance on DIR.

Submitted 19 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Showing 1–50 of 367 results for author: Gan, W