subscribe to arXiv mailings

A Calculus for Unreachable Code

Authors: Peter Zhong, Shu-Hung You, Simone Campanoni, Robert Bruce Findler, Matthew Flatt, Christos Dimoulas

Abstract: In Racket, the LLVM IR, Rust, and other modern languages, programmers and static analyses can hint, with special annotations, that certain parts of a program are unreachable. Same as other assumptions about undefined behavior; the compiler assumes these hints are correct and transforms the program aggressively. While compile-time transformations due to undefined behavior often perplex compiler w… ▽ More In Racket, the LLVM IR, Rust, and other modern languages, programmers and static analyses can hint, with special annotations, that certain parts of a program are unreachable. Same as other assumptions about undefined behavior; the compiler assumes these hints are correct and transforms the program aggressively. While compile-time transformations due to undefined behavior often perplex compiler writers and developers, we show that the essence of transformations due to unreachable code can be distilled in a surprisingly small set of simple formal rules. Specifically, following the well-established tradition of understanding linguistic phenomena through calculi, we introduce the first calculus for unreachable. Its term-rewriting rules that take advantage of unreachable fall into two groups. The first group allows the compiler to delete any code downstream of unreachable, and any effect-free code upstream of unreachable. The second group consists of rules that eliminate conditional expressions when one of their branches is unreachable. We show the correctness of the rules with a novel logical relation, and we examine how they correspond to transformations due to unreachable in Racket and LLVM. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2406.19627 [pdf]

Practical Power System Inertia Monitoring Based on Pumped Storage Hydropower Operation Signature

Authors: Hongyu Li, Chang Chen, Mark Baldwin, Shutang You, Wenpeng Yu, Lin Zhu, Yilu Liu

Abstract: This paper proposes a practical method to monitor power system inertia using Pumped Storage Hydropower (PSH) switching-off events. This approach offers real-time system-level inertia estimation with minimal expenses, no disruption, and the inclusion of behind-the-meter inertia. First, accurate inertia estimation is achieved through improved RoCoF calculation that accounts for pre-event RoCoF, redu… ▽ More This paper proposes a practical method to monitor power system inertia using Pumped Storage Hydropower (PSH) switching-off events. This approach offers real-time system-level inertia estimation with minimal expenses, no disruption, and the inclusion of behind-the-meter inertia. First, accurate inertia estimation is achieved through improved RoCoF calculation that accounts for pre-event RoCoF, reducing common random frequency fluctuations in practice. Second, PSH field data is analyzed, highlighting the benefits of using switching-off events for grid inertia estimation. Third, an event detection trigger is designed to capture pump switching-off events based on local and system features. Fourth, the method is validated on the U.S. Eastern Interconnection model with over 60,000 buses, demonstrating very high accuracy (3%-5% error rate). Finally, it is applied to the U.S. Western Interconnection, with field validation showing a 9.9% average absolute error rate. Despite challenges in practical power system inertia estimation, this method enhances decision-making for power grid reliability and efficiency, addressing challenges posed by renewable energy integration. △ Less

Submitted 1 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 8 pages, 15 figures

arXiv:2406.16822 [pdf, other]

A Multi-Party, Multi-Blockchain Atomic Swap Protocol with Universal Adaptor Secret

Authors: Shengewei You, Aditya Joshi, Andrey Kuehlkamp, Jarek Nabrzyski

Abstract: The increasing complexity of digital asset transactions across multiple blockchains necessitates a robust atomic swap protocol that can securely handle more than two participants. Traditional atomic swap protocols, including those based on adaptor signatures, are vulnerable to malicious dropout attacks, which break atomicity and compromise the security of the transaction. This paper presents a nov… ▽ More The increasing complexity of digital asset transactions across multiple blockchains necessitates a robust atomic swap protocol that can securely handle more than two participants. Traditional atomic swap protocols, including those based on adaptor signatures, are vulnerable to malicious dropout attacks, which break atomicity and compromise the security of the transaction. This paper presents a novel multi-party atomic swap protocol that operates almost entirely off-chain, requiring only a single on-chain transaction for finalization. Our protocol leverages Schnorr-like signature verification and a universal adaptor secret to ensure atomicity and scalability across any number of participants and blockchains without the need for smart contracts or trusted third parties. By addressing key challenges such as collusion attacks and malicious dropouts, our protocol significantly enhances the security and efficiency of multi-party atomic swaps. Our contributions include the first scalable, fully off-chain protocol for atomic swaps involving any number of participants, adding zero overhead to native blockchains, and providing a practical and cost-effective solution for decentralized asset exchanges. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.10744 [pdf, other]

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held in CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches. △ Less

Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

arXiv:2406.06337 [pdf, other]

System- and Sample-agnostic Isotropic 3D Microscopy by Weakly Physics-informed, Domain-shift-resistant Axial Deblurring

Authors: Jiashu Han, Kunzan Liu, Keith B. Isaacson, Kristina Monakhova, Linda G. Griffith, Sixian You

Abstract: Three-dimensional (3D) subcellular imaging is essential for biomedical research, but the diffraction limit of optical microscopy compromises axial resolution, hindering accurate 3D structural analysis. This challenge is particularly pronounced in label-free imaging of thick, heterogeneous tissues, where assumptions about data distribution (e.g. sparsity, label-specific distribution, and lateral-ax… ▽ More Three-dimensional (3D) subcellular imaging is essential for biomedical research, but the diffraction limit of optical microscopy compromises axial resolution, hindering accurate 3D structural analysis. This challenge is particularly pronounced in label-free imaging of thick, heterogeneous tissues, where assumptions about data distribution (e.g. sparsity, label-specific distribution, and lateral-axial similarity) and system priors (e.g. independent and identically distributed (i.i.d.) noise and linear shift-invariant (LSI) point-spread functions (PSFs)) are often invalid. Here, we introduce SSAI-3D, a weakly physics-informed, domain-shift-resistant framework for robust isotropic 3D imaging. SSAI-3D enables robust axial deblurring by generating a PSF-flexible, noise-resilient, sample-informed training dataset and sparsely fine-tuning a large pre-trained blind deblurring network. SSAI-3D was applied to label-free nonlinear imaging of living organoids, freshly excised human endometrium tissue, and mouse whisker pads, and further validated in publicly available ground-truth-paired experimental datasets of 3D heterogeneous biological tissues with unknown blurring and noise across different microscopy systems. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 27 pages, 6 figures

arXiv:2405.16144 [pdf, other]

GreenCOD: A Green Camouflaged Object Detection Method

Authors: Hong-Shuo Chen, Yao Zhu, Suya You, Azad M. Madni, C. -C. Jay Kuo

Abstract: We introduce GreenCOD, a green method for detecting camouflaged objects, distinct in its avoidance of backpropagation techniques. GreenCOD leverages gradient boosting and deep features extracted from pre-trained Deep Neural Networks (DNNs). Traditional camouflaged object detection (COD) approaches often rely on complex deep neural network architectures, seeking performance improvements through bac… ▽ More We introduce GreenCOD, a green method for detecting camouflaged objects, distinct in its avoidance of backpropagation techniques. GreenCOD leverages gradient boosting and deep features extracted from pre-trained Deep Neural Networks (DNNs). Traditional camouflaged object detection (COD) approaches often rely on complex deep neural network architectures, seeking performance improvements through backpropagation-based fine-tuning. However, such methods are typically computationally demanding and exhibit only marginal performance variations across different models. This raises the question of whether effective training can be achieved without backpropagation. Addressing this, our work proposes a new paradigm that utilizes gradient boosting for COD. This approach significantly simplifies the model design, resulting in a system that requires fewer parameters and operations and maintains high performance compared to state-of-the-art deep learning models. Remarkably, our models are trained without backpropagation and achieve the best performance with fewer than 20G Multiply-Accumulate Operations (MACs). This new, more efficient paradigm opens avenues for further exploration in green, backpropagation-free model training. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2404.11901 [pdf, other]

Deep and Dynamic Metabolic and Structural Imaging in Living Tissues

Authors: Kunzan Liu, Honghao Cao, Kasey Shashaty, Li-Yu Yu, Sarah Spitz, Francesca Michela Pramotton, Zhengpeng Wan, Ellen L. Kan, Erin N. Tevonian, Manuel Levy, Eva Lendaro, Roger D. Kamm, Linda G. Griffith, Fan Wang, Tong Qiu, Sixian You

Abstract: Label-free imaging through two-photon autofluorescence (2PAF) of NAD(P)H allows for non-destructive and high-resolution visualization of cellular activities in living systems. However, its application to thick tissues and organoids has been restricted by its limited penetration depth within 300 $μ$m, largely due to tissue scattering at the typical excitation wavelength (~750 nm) required for NAD(P… ▽ More Label-free imaging through two-photon autofluorescence (2PAF) of NAD(P)H allows for non-destructive and high-resolution visualization of cellular activities in living systems. However, its application to thick tissues and organoids has been restricted by its limited penetration depth within 300 $μ$m, largely due to tissue scattering at the typical excitation wavelength (~750 nm) required for NAD(P)H. Here, we demonstrate that the imaging depth for NAD(P)H can be extended to over 700 $μ$m in living engineered human multicellular microtissues by adopting multimode fiber (MMF)-based low-repetition-rate high-peak-power three-photon (3P) excitation of NAD(P)H at 1100 nm. This is achieved by having over 0.5 MW peak power at the band of 1100$\pm$25 nm through adaptively modulating multimodal nonlinear pulse propagation with a compact fiber shaper. Moreover, the 8-fold increase in pulse energy at 1100 nm enables faster imaging of monocyte behaviors in the living multicellular models. These results represent a significant advance for deep and dynamic metabolic and structural imaging of intact living biosystems. The modular design (MMF with a slip-on fiber shaper) is anticipated to allow wide adoption of this methodology for demanding in vivo and in vitro imaging applications, including cancer research, autoimmune diseases, and tissue engineering. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: 20 pages, 5 figures, under review in Science Advances

arXiv:2404.07421 [pdf, other]

Controllable transitions among phase-matching conditions in a single nonlinear crystal

Authors: Zi-Qi Zeng, Shi-Xin You, Zi-Xiang Yang, Chenzhi Yuan, Chenglong You, Rui-Bo Jin

Abstract: Entangled photon pairs are crucial resources for quantum information processing protocols. Via the process of spontaneous parametric down-conversion (SPDC), we can generate these photon pairs using bulk nonlinear crystals. Traditionally, the crystal is designed to satisfy specific type of phase-matching condition. Here, we report controllable transitions among different types of phase-matching in… ▽ More Entangled photon pairs are crucial resources for quantum information processing protocols. Via the process of spontaneous parametric down-conversion (SPDC), we can generate these photon pairs using bulk nonlinear crystals. Traditionally, the crystal is designed to satisfy specific type of phase-matching condition. Here, we report controllable transitions among different types of phase-matching in a single periodically poled potassium titanyl phosphate (PPKTP) crystal. By carefully selecting pump conditions, we can satisfy different phase-matching conditions. This allows us to observe first-order type-II, fifth-order type-I, third-order type-0, and fifth-order type-II SPDCs. The temperature-dependent spectra of our source were also analyzed in detail. Finally, we discussed the possibility of observing more than nine SPDCs in this crystal. Our work not only deepens the understanding of the physics behind phase-matching conditions, but also offers the potential for a highly versatile entangled biphoton source for quantum information research. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: 8 pages, 3 figures

Journal ref: Chinese Optics Letters, 22(2), 021901(2024)

arXiv:2404.06903 [pdf, other]

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Authors: Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi

Abstract: The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360$^{\circ}$ scene generation pipeline that facilitates the creation of comprehensive 360$^{\circ}$ scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement… ▽ More The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360$^{\circ}$ scene generation pipeline that facilitates the creation of comprehensive 360$^{\circ}$ scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement to create a high-quality and globally coherent panoramic image. This image acts as a preliminary "flat" (2D) scene representation. Subsequently, it is lifted into 3D Gaussians, employing splatting techniques to enable real-time exploration. To produce consistent 3D geometry, our pipeline constructs a spatially coherent structure by aligning the 2D monocular depth into a globally optimized point cloud. This point cloud serves as the initial state for the centroids of 3D Gaussians. In order to address invisible issues inherent in single-view inputs, we impose semantic and geometric constraints on both synthesized and input camera views as regularizations. These guide the optimization of Gaussians, aiding in the reconstruction of unseen regions. In summary, our method offers a globally consistent 3D scene within a 360$^{\circ}$ perspective, providing an enhanced immersive experience over existing techniques. Project website at: http://dreamscene360.github.io/ △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2403.20092 [pdf, other]

Modeling Weather Uncertainty for Multi-weather Co-Presence Estimation

Authors: Qi Bi, Shaodi You, Theo Gevers

Abstract: Images from outdoor scenes may be taken under various weather conditions. It is well studied that weather impacts the performance of computer vision algorithms and needs to be handled properly. However, existing algorithms model weather condition as a discrete status and estimate it using multi-label classification. The fact is that, physically, specifically in meteorology, weather are modeled as… ▽ More Images from outdoor scenes may be taken under various weather conditions. It is well studied that weather impacts the performance of computer vision algorithms and needs to be handled properly. However, existing algorithms model weather condition as a discrete status and estimate it using multi-label classification. The fact is that, physically, specifically in meteorology, weather are modeled as a continuous and transitional status. Instead of directly implementing hard classification as existing multi-weather classification methods do, we consider the physical formulation of multi-weather conditions and model the impact of physical-related parameter on learning from the image appearance. In this paper, we start with solid revisit of the physics definition of weather and how it can be described as a continuous machine learning and computer vision task. Namely, we propose to model the weather uncertainty, where the level of probability and co-existence of multiple weather conditions are both considered. A Gaussian mixture model is used to encapsulate the weather uncertainty and a uncertainty-aware multi-weather learning scheme is proposed based on prior-posterior learning. A novel multi-weather co-presence estimation transformer (MeFormer) is proposed. In addition, a new multi-weather co-presence estimation (MePe) dataset, along with 14 fine-grained weather categories and 16,078 samples, is proposed to benchmark both conventional multi-label weather classification task and multi-weather co-presence estimation task. Large scale experiments show that the proposed method achieves state-of-the-art performance and substantial generalization capabilities on both the conventional multi-label weather classification task and the proposed multi-weather co-presence estimation task. Besides, modeling weather uncertainty also benefits adverse-weather semantic segmentation. △ Less

Submitted 29 March, 2024; originally announced March 2024.

Comments: Work in progress

arXiv:2403.09338 [pdf, other]

LocalMamba: Visual State Space Model with Windowed Selective Scan

Authors: Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, Chang Xu

Abstract: Recent advancements in state space models, notably Mamba, have demonstrated significant progress in modeling long sequences for tasks like language understanding. Yet, their application in vision tasks has not markedly surpassed the performance of traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). This paper posits that the key to enhancing Vision Mamba (ViM) lies in… ▽ More Recent advancements in state space models, notably Mamba, have demonstrated significant progress in modeling long sequences for tasks like language understanding. Yet, their application in vision tasks has not markedly surpassed the performance of traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). This paper posits that the key to enhancing Vision Mamba (ViM) lies in optimizing scan directions for sequence modeling. Traditional ViM approaches, which flatten spatial tokens, overlook the preservation of local 2D dependencies, thereby elongating the distance between adjacent tokens. We introduce a novel local scanning strategy that divides images into distinct windows, effectively capturing local dependencies while maintaining a global perspective. Additionally, acknowledging the varying preferences for scan patterns across different network layers, we propose a dynamic method to independently search for the optimal scan choices for each layer, substantially improving performance. Extensive experiments across both plain and hierarchical models underscore our approach's superiority in effectively capturing image representations. For example, our model significantly outperforms Vim-Ti by 3.1% on ImageNet with the same 1.5G FLOPs. Code is available at: https://github.com/hunto/LocalMamba. △ Less

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.06517 [pdf, other]

Active Generation for Image Classification

Authors: Tao Huang, Jiaqi Liu, Shan You, Chang Xu

Abstract: Recently, the growing capabilities of deep generative models have underscored their potential in enhancing image classification accuracy. However, existing methods often demand the generation of a disproportionately large number of images compared to the original dataset, while having only marginal improvements in accuracy. This computationally expensive and time-consuming process hampers the prac… ▽ More Recently, the growing capabilities of deep generative models have underscored their potential in enhancing image classification accuracy. However, existing methods often demand the generation of a disproportionately large number of images compared to the original dataset, while having only marginal improvements in accuracy. This computationally expensive and time-consuming process hampers the practicality of such approaches. In this paper, we propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model. With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation. It aims to create images akin to the challenging or misclassified samples encountered by the current model and incorporates these generated images into the training set to augment model performance. ActGen introduces an attentive image guidance technique, using real images as guides during the denoising process of a diffusion model. The model's attention on class prompt is leveraged to ensure the preservation of similar foreground object while diversifying the background. Furthermore, we introduce a gradient-based generation guidance method, which employs two losses to generate more challenging samples and prevent the generated images from being too similar to previously generated ones. Experimental results on the CIFAR and ImageNet datasets demonstrate that our method achieves better performance with a significantly reduced number of generated images. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2401.08233 [pdf]

doi 10.17703/IJACT.2023.11.4.393

Enhancing Wind Speed and Wind Power Forecasting Using Shape-Wise Feature Engineering: A Novel Approach for Improved Accuracy and Robustness

Authors: Mulomba Mukendi Christian, Yun Seon Kim, Hyebong Choi, Jaeyoung Lee, SongHee You

Abstract: Accurate prediction of wind speed and power is vital for enhancing the efficiency of wind energy systems. Numerous solutions have been implemented to date, demonstrating their potential to improve forecasting. Among these, deep learning is perceived as a revolutionary approach in the field. However, despite their effectiveness, the noise present in the collected data remains a significant challeng… ▽ More Accurate prediction of wind speed and power is vital for enhancing the efficiency of wind energy systems. Numerous solutions have been implemented to date, demonstrating their potential to improve forecasting. Among these, deep learning is perceived as a revolutionary approach in the field. However, despite their effectiveness, the noise present in the collected data remains a significant challenge. This noise has the potential to diminish the performance of these algorithms, leading to inaccurate predictions. In response to this, this study explores a novel feature engineering approach. This approach involves altering the data input shape in both Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) and Autoregressive models for various forecasting horizons. The results reveal substantial enhancements in model resilience against noise resulting from step increases in data. The approach could achieve an impressive 83% accuracy in predicting unseen data up to the 24th steps. Furthermore, this method consistently provides high accuracy for short, mid, and long-term forecasts, outperforming the performance of individual models. These findings pave the way for further research on noise reduction strategies at different forecasting horizons through shape-wise feature engineering. △ Less

Submitted 16 January, 2024; originally announced January 2024.

Journal ref: International Journal of Advanced Culture Technology Vol.11 No.4 393-405 (2023)

arXiv:2401.04368 [pdf]

doi 10.7236/IJASC.2023.12.4.434

Enhancing Acute Kidney Injury Prediction through Integration of Drug Features in Intensive Care Units

Authors: Gabriel D. M. Manalu, Mulomba Mukendi Christian, Songhee You, Hyebong Choi

Abstract: The relationship between acute kidney injury (AKI) prediction and nephrotoxic drugs, or drugs that adversely affect kidney function, is one that has yet to be explored in the critical care setting. One contributing factor to this gap in research is the limited investigation of drug modalities in the intensive care unit (ICU) context, due to the challenges of processing prescription data into the c… ▽ More The relationship between acute kidney injury (AKI) prediction and nephrotoxic drugs, or drugs that adversely affect kidney function, is one that has yet to be explored in the critical care setting. One contributing factor to this gap in research is the limited investigation of drug modalities in the intensive care unit (ICU) context, due to the challenges of processing prescription data into the corresponding drug representations and a lack in the comprehensive understanding of these drug representations. This study addresses this gap by proposing a novel approach that leverages patient prescription data as a modality to improve existing models for AKI prediction. We base our research on Electronic Health Record (EHR) data, extracting the relevant patient prescription information and converting it into the selected drug representation for our research, the extended-connectivity fingerprint (ECFP). Furthermore, we adopt a unique multimodal approach, developing machine learning models and 1D Convolutional Neural Networks (CNN) applied to clinical drug representations, establishing a procedure which has not been used by any previous studies predicting AKI. The findings showcase a notable improvement in AKI prediction through the integration of drug embeddings and other patient cohort features. By using drug features represented as ECFP molecular fingerprints along with common cohort features such as demographics and lab test values, we achieved a considerable improvement in model performance for the AKI prediction task over the baseline model which does not include the drug representations as features, indicating that our distinct approach enhances existing baseline techniques and highlights the relevance of drug data in predicting AKI in the ICU setting △ Less

Submitted 9 January, 2024; originally announced January 2024.

Comments: 9 pages, 2 tables

Journal ref: International Journal of Advanced Smart Convergence Vol.12 No.4 434- 442 (2023)

arXiv:2312.13307 [pdf, other]

Not All Steps are Equal: Efficient Generation with Progressive Diffusion Models

Authors: Wenhao Li, Xiu Su, Shan You, Tao Huang, Fei Wang, Chen Qian, Chang Xu

Abstract: Diffusion models have demonstrated remarkable efficacy in various generative tasks with the predictive prowess of denoising model. Currently, these models employ a uniform denoising approach across all timesteps. However, the inherent variations in noisy latents at each timestep lead to conflicts during training, constraining the potential of diffusion models. To address this challenge, we propose… ▽ More Diffusion models have demonstrated remarkable efficacy in various generative tasks with the predictive prowess of denoising model. Currently, these models employ a uniform denoising approach across all timesteps. However, the inherent variations in noisy latents at each timestep lead to conflicts during training, constraining the potential of diffusion models. To address this challenge, we propose a novel two-stage training strategy termed Step-Adaptive Training. In the initial stage, a base denoising model is trained to encompass all timesteps. Subsequently, we partition the timesteps into distinct groups, fine-tuning the model within each group to achieve specialized denoising capabilities. Recognizing that the difficulties of predicting noise at different timesteps vary, we introduce a diverse model size requirement. We dynamically adjust the model size for each timestep by estimating task difficulty based on its signal-to-noise ratio before fine-tuning. This adjustment is facilitated by a proxy-based structural importance assessment mechanism, enabling precise and efficient pruning of the base denoising model. Our experiments validate the effectiveness of the proposed training strategy, demonstrating an improvement in the FID score on CIFAR10 by over 0.3 while utilizing only 80\% of the computational resources. This innovative approach not only enhances model performance but also significantly reduces computational costs, opening new avenues for the development and application of diffusion models. △ Less

Submitted 1 January, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.12471 [pdf, other]

Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion

Authors: Fan Zhang, Shaodi You, Yu Li, Ying Fu

Abstract: Monocular depth estimation has experienced significant progress on terrestrial images in recent years, largely due to deep learning advancements. However, it remains inadequate for underwater scenes, primarily because of data scarcity. Given the inherent challenges of light attenuation and backscattering in water, acquiring clear underwater images or precise depth information is notably difficult… ▽ More Monocular depth estimation has experienced significant progress on terrestrial images in recent years, largely due to deep learning advancements. However, it remains inadequate for underwater scenes, primarily because of data scarcity. Given the inherent challenges of light attenuation and backscattering in water, acquiring clear underwater images or precise depth information is notably difficult and costly. Consequently, learning-based approaches often rely on synthetic data or turn to unsupervised or self-supervised methods to mitigate this lack of data. Nonetheless, the performance of these methods is often constrained by the domain gap and looser constraints. In this paper, we propose a novel pipeline for generating photorealistic underwater images using accurate terrestrial depth data. This approach facilitates the training of supervised models for underwater depth estimation, effectively reducing the performance disparity between terrestrial and underwater environments. Contrary to prior synthetic datasets that merely apply style transfer to terrestrial images without altering the scene content, our approach uniquely creates vibrant, non-existent underwater scenes by leveraging terrestrial depth data through the innovative Stable Diffusion model. Specifically, we introduce a unique Depth2Underwater ControlNet, trained on specially prepared \{Underwater, Depth, Text\} data triplets, for this generation task. Our newly developed dataset enables terrestrial depth estimation models to achieve considerable improvements, both quantitatively and qualitatively, on unseen underwater images, surpassing their terrestrial pre-trained counterparts. Moreover, the enhanced depth accuracy for underwater scenes also aids underwater image restoration techniques that rely on depth maps, further demonstrating our dataset's utility. The dataset will be available at https://github.com/zkawfanx/Atlantis. △ Less

Submitted 19 December, 2023; originally announced December 2023.

Comments: 10 pages

arXiv:2312.03203 [pdf, other]

Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Authors: Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi

Abstract: 3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundat… ▽ More 3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. Project website at: https://feature-3dgs.github.io/ △ Less

Submitted 8 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.04944 [pdf, other]

Edge-assisted U-Shaped Split Federated Learning with Privacy-preserving for Internet of Things

Authors: Hengliang Tang, Zihang Zhao, Detian Liu, Yang Cao, Shiqiang Zhang, Siqing You

Abstract: In the realm of the Internet of Things (IoT), deploying deep learning models to process data generated or collected by IoT devices is a critical challenge. However, direct data transmission can cause network congestion and inefficient execution, given that IoT devices typically lack computation and communication capabilities. Centralized data processing in data centers is also no longer feasible d… ▽ More In the realm of the Internet of Things (IoT), deploying deep learning models to process data generated or collected by IoT devices is a critical challenge. However, direct data transmission can cause network congestion and inefficient execution, given that IoT devices typically lack computation and communication capabilities. Centralized data processing in data centers is also no longer feasible due to concerns over data privacy and security. To address these challenges, we present an innovative Edge-assisted U-Shaped Split Federated Learning (EUSFL) framework, which harnesses the high-performance capabilities of edge servers to assist IoT devices in model training and optimization process. In this framework, we leverage Federated Learning (FL) to enable data holders to collaboratively train models without sharing their data, thereby enhancing data privacy protection by transmitting only model parameters. Additionally, inspired by Split Learning (SL), we split the neural network into three parts using U-shaped splitting for local training on IoT devices. By exploiting the greater computation capability of edge servers, our framework effectively reduces overall training time and allows IoT devices with varying capabilities to perform training tasks efficiently. Furthermore, we proposed a novel noise mechanism called LabelDP to ensure that data features and labels can securely resist reconstruction attacks, eliminating the risk of privacy leakage. Our theoretical analysis and experimental results demonstrate that EUSFL can be integrated with various aggregation algorithms, maintaining good performance across different computing capabilities of IoT devices, and significantly reducing training time and local computation overhead. △ Less

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2311.03799 [pdf, other]

Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models

Authors: Yichao Cao, Qingfei Tang, Xiu Su, Chen Song, Shan You, Xiaobo Lu, Chang Xu

Abstract: Human-object interaction (HOI) detection aims to comprehend the intricate relationships between humans and objects, predicting $<human, action, object>$ triplets, and serving as the foundation for numerous computer vision tasks. The complexity and diversity of human-object interactions in the real world, however, pose significant challenges for both annotation and recognition, particularly in reco… ▽ More Human-object interaction (HOI) detection aims to comprehend the intricate relationships between humans and objects, predicting $<human, action, object>$ triplets, and serving as the foundation for numerous computer vision tasks. The complexity and diversity of human-object interactions in the real world, however, pose significant challenges for both annotation and recognition, particularly in recognizing interactions within an open world context. This study explores the universal interaction recognition in an open-world setting through the use of Vision-Language (VL) foundation models and large language models (LLMs). The proposed method is dubbed as \emph{\textbf{UniHOI}}. We conduct a deep analysis of the three hierarchical features inherent in visual HOI detectors and propose a method for high-level relation extraction aimed at VL foundation models, which we call HO prompt-based learning. Our design includes an HO Prompt-guided Decoder (HOPD), facilitates the association of high-level relation representations in the foundation model with various HO pairs within the image. Furthermore, we utilize a LLM (\emph{i.e.} GPT) for interaction interpretation, generating a richer linguistic understanding for complex HOIs. For open-category interaction recognition, our method supports either of two input types: interaction phrase or interpretive sentence. Our efficient architecture design and learning methods effectively unleash the potential of the VL foundation models and LLMs, allowing UniHOI to surpass all existing methods with a substantial margin, under both supervised and zero-shot settings. The code and pre-trained weights are available at: \url{https://github.com/Caoyichao/UniHOI}. △ Less

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.02535

TokenMotion: Motion-Guided Vision Transformer for Video Camouflaged Object Detection Via Learnable Token Selection

Authors: Zifan Yu, Erfan Bank Tavakoli, Meida Chen, Suya You, Raghuveer Rao, Sanjeev Agarwal, Fengbo Ren

Abstract: The area of Video Camouflaged Object Detection (VCOD) presents unique challenges in the field of computer vision due to texture similarities between target objects and their surroundings, as well as irregular motion patterns caused by both objects and camera movement. In this paper, we introduce TokenMotion (TMNet), which employs a transformer-based model to enhance VCOD by extracting motion-guide… ▽ More The area of Video Camouflaged Object Detection (VCOD) presents unique challenges in the field of computer vision due to texture similarities between target objects and their surroundings, as well as irregular motion patterns caused by both objects and camera movement. In this paper, we introduce TokenMotion (TMNet), which employs a transformer-based model to enhance VCOD by extracting motion-guided features using a learnable token selection. Evaluated on the challenging MoCA-Mask dataset, TMNet achieves state-of-the-art performance in VCOD. It outperforms the existing state-of-the-art method by a 12.8% improvement in weighted F-measure, an 8.4% enhancement in S-measure, and a 10.7% boost in mean IoU. The results demonstrate the benefits of utilizing motion-guided features via learnable token selection within a transformer-based framework to tackle the intricate task of VCOD. △ Less

Submitted 1 February, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

Comments: Revising Needed

arXiv:2310.20187 [pdf, other]

Self-Supervised Pre-Training for Precipitation Post-Processor

Authors: Sojung An, Junha Lee, Jiyeon Jang, Inchae Na, Wooyeon Park, Sujeong You

Abstract: Obtaining a sufficient forecast lead time for local precipitation is essential in preventing hazardous weather events. Global warming-induced climate change increases the challenge of accurately predicting severe precipitation events, such as heavy rainfall. In this paper, we propose a deep learning-based precipitation post-processor for numerical weather prediction (NWP) models. The precipitation… ▽ More Obtaining a sufficient forecast lead time for local precipitation is essential in preventing hazardous weather events. Global warming-induced climate change increases the challenge of accurately predicting severe precipitation events, such as heavy rainfall. In this paper, we propose a deep learning-based precipitation post-processor for numerical weather prediction (NWP) models. The precipitation post-processor consists of (i) employing self-supervised pre-training, where the parameters of the encoder are pre-trained on the reconstruction of the masked variables of the atmospheric physics domain; and (ii) conducting transfer learning on precipitation segmentation tasks (the target domain) from the pre-trained encoder. In addition, we introduced a heuristic labeling approach to effectively train class-imbalanced datasets. Our experiments on precipitation correction for regional NWP show that the proposed method outperforms other approaches. △ Less

Submitted 19 February, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: 7 pages, 3 figures, 1 table, accepted to NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning at [this http URL](https://www.climatechange.ai/papers/neurips2023/18)

arXiv:2310.18788 [pdf, other]

PrObeD: Proactive Object Detection Wrapper

Authors: Vishal Asnani, Abhinav Kumar, Suya You, Xiaoming Liu

Abstract: Previous research in $2D$ object detection focuses on various tasks, including detecting objects in generic and camouflaged images. These works are regarded as passive works for object detection as they take the input image as is. However, convergence to global minima is not guaranteed to be optimal in neural networks; therefore, we argue that the trained weights in the object detector are not opt… ▽ More Previous research in $2D$ object detection focuses on various tasks, including detecting objects in generic and camouflaged images. These works are regarded as passive works for object detection as they take the input image as is. However, convergence to global minima is not guaranteed to be optimal in neural networks; therefore, we argue that the trained weights in the object detector are not optimal. To rectify this problem, we propose a wrapper based on proactive schemes, PrObeD, which enhances the performance of these object detectors by learning a signal. PrObeD consists of an encoder-decoder architecture, where the encoder network generates an image-dependent signal termed templates to encrypt the input images, and the decoder recovers this template from the encrypted images. We propose that learning the optimum template results in an object detector with an improved detection performance. The template acts as a mask to the input images to highlight semantics useful for the object detector. Finetuning the object detector with these encrypted images enhances the detection performance for both generic and camouflaged. Our experiments on MS-COCO, CAMO, COD$10$K, and NC$4$K datasets show improvement over different detectors after applying PrObeD. Our models/codes are available at https://github.com/vishal3477/Proactive-Object-Detection. △ Less

Submitted 28 October, 2023; originally announced October 2023.

Comments: Accepted at Neurips 2023

arXiv:2310.16102 [pdf, other]

Learned, Uncertainty-driven Adaptive Acquisition for Photon-Efficient Multiphoton Microscopy

Authors: Cassandra Tong Ye, Jiashu Han, Kunzan Liu, Anastasios Angelopoulos, Linda Griffith, Kristina Monakhova, Sixian You

Abstract: Multiphoton microscopy (MPM) is a powerful imaging tool that has been a critical enabler for live tissue imaging. However, since most multiphoton microscopy platforms rely on point scanning, there is an inherent trade-off between acquisition time, field of view (FOV), phototoxicity, and image quality, often resulting in noisy measurements when fast, large FOV, and/or gentle imaging is needed. Deep… ▽ More Multiphoton microscopy (MPM) is a powerful imaging tool that has been a critical enabler for live tissue imaging. However, since most multiphoton microscopy platforms rely on point scanning, there is an inherent trade-off between acquisition time, field of view (FOV), phototoxicity, and image quality, often resulting in noisy measurements when fast, large FOV, and/or gentle imaging is needed. Deep learning could be used to denoise multiphoton microscopy measurements, but these algorithms can be prone to hallucination, which can be disastrous for medical and scientific applications. We propose a method to simultaneously denoise and predict pixel-wise uncertainty for multiphoton imaging measurements, improving algorithm trustworthiness and providing statistical guarantees for the deep learning predictions. Furthermore, we propose to leverage this learned, pixel-wise uncertainty to drive an adaptive acquisition technique that rescans only the most uncertain regions of a sample. We demonstrate our method on experimental noisy MPM measurements of human endometrium tissues, showing that we can maintain fine features and outperform other denoising methods while predicting uncertainty at each pixel. Finally, with our adaptive acquisition technique, we demonstrate a 120X reduction in acquisition time and total light dose while successfully recovering fine features in the sample. We are the first to demonstrate distribution-free uncertainty quantification for a denoising task with real experimental data and the first to propose adaptive acquisition based on reconstruction uncertainty △ Less

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.10879 [pdf, other]

BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling

Authors: Raphael Ruschel, A. S. M. Iftekhar, B. S. Manjunath, Suya You

Abstract: The increasing complexity of modern deep neural network models and the expanding sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we addressed the challenge of efficiently training neural network models using sequences of varying sizes. To address this challenge, we propose a novel training scheme that enables efficient distributed data… ▽ More The increasing complexity of modern deep neural network models and the expanding sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we addressed the challenge of efficiently training neural network models using sequences of varying sizes. To address this challenge, we propose a novel training scheme that enables efficient distributed data-parallel training on sequences of different sizes with minimal overhead. By using this scheme we were able to reduce the padding amount by more than 100$x$ while not deleting a single frame, resulting in an overall increased performance on both training time and Recall in our experiments. △ Less

Submitted 25 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.04995 [pdf, other]

SemST: Semantically Consistent Multi-Scale Image Translation via Structure-Texture Alignment

Authors: Ganning Zhao, Wenhui Cui, Suya You, C. -C. Jay Kuo

Abstract: Unsupervised image-to-image (I2I) translation learns cross-domain image mapping that transfers input from the source domain to output in the target domain while preserving its semantics. One challenge is that different semantic statistics in source and target domains result in content discrepancy known as semantic distortion. To address this problem, a novel I2I method that maintains semantic cons… ▽ More Unsupervised image-to-image (I2I) translation learns cross-domain image mapping that transfers input from the source domain to output in the target domain while preserving its semantics. One challenge is that different semantic statistics in source and target domains result in content discrepancy known as semantic distortion. To address this problem, a novel I2I method that maintains semantic consistency in translation is proposed and named SemST in this work. SemST reduces semantic distortion by employing contrastive learning and aligning the structural and textural properties of input and output by maximizing their mutual information. Furthermore, a multi-scale approach is introduced to enhance translation performance, thereby enabling the applicability of SemST to domain adaptation in high-resolution images. Experiments show that SemST effectively mitigates semantic distortion and achieves state-of-the-art performance. Also, the application of SemST to domain adaptation (DA) is explored. It is demonstrated by preliminary experiments that SemST can be utilized as a beneficial pre-training for the semantic segmentation task. △ Less

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2310.04750 [pdf, other]

DiffNAS: Bootstrapping Diffusion Models by Prompting for Better Architectures

Authors: Wenhao Li, Xiu Su, Shan You, Fei Wang, Chen Qian, Chang Xu

Abstract: Diffusion models have recently exhibited remarkable performance on synthetic data. After a diffusion path is selected, a base model, such as UNet, operates as a denoising autoencoder, primarily predicting noises that need to be eliminated step by step. Consequently, it is crucial to employ a model that aligns with the expected budgets to facilitate superior synthetic performance. In this paper, we… ▽ More Diffusion models have recently exhibited remarkable performance on synthetic data. After a diffusion path is selected, a base model, such as UNet, operates as a denoising autoencoder, primarily predicting noises that need to be eliminated step by step. Consequently, it is crucial to employ a model that aligns with the expected budgets to facilitate superior synthetic performance. In this paper, we meticulously analyze the diffusion model and engineer a base model search approach, denoted "DiffNAS". Specifically, we leverage GPT-4 as a supernet to expedite the search, supplemented with a search memory to enhance the results. Moreover, we employ RFID as a proxy to promptly rank the experimental outcomes produced by GPT-4. We also adopt a rapid-convergence training strategy to boost search efficiency. Rigorous experimentation corroborates that our algorithm can augment the search efficiency by 2 times under GPT-based scenarios, while also attaining a performance of 2.82 with 0.37 improvement in FID on CIFAR10 relative to the benchmark IDDPM algorithm. △ Less

Submitted 9 October, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

arXiv:2310.00730 [pdf]

EventLFM: Event Camera integrated Fourier Light Field Microscopy for Ultrafast 3D imaging

Authors: Ruipeng Guo, Qianwan Yang, Andrew S. Chang, Guorong Hu, Joseph Greene, Christopher V. Gabel, Sixian You, Lei Tian

Abstract: Ultrafast 3D imaging is indispensable for visualizing complex and dynamic biological processes. Conventional scanning-based techniques necessitate an inherent trade-off between acquisition speed and space-bandwidth product (SBP). Emerging single-shot 3D wide-field techniques offer a promising alternative but are bottlenecked by the synchronous readout constraints of conventional CMOS systems, thus… ▽ More Ultrafast 3D imaging is indispensable for visualizing complex and dynamic biological processes. Conventional scanning-based techniques necessitate an inherent trade-off between acquisition speed and space-bandwidth product (SBP). Emerging single-shot 3D wide-field techniques offer a promising alternative but are bottlenecked by the synchronous readout constraints of conventional CMOS systems, thus restricting data throughput to maintain high SBP at limited frame rates. To address this, we introduce EventLFM, a straightforward and cost-effective system that overcomes these challenges by integrating an event camera with Fourier light field microscopy (LFM), a state-of-the-art single-shot 3D wide-field imaging technique. The event camera operates on a novel asynchronous readout architecture, thereby bypassing the frame rate limitations inherent to conventional CMOS systems. We further develop a simple and robust event-driven LFM reconstruction algorithm that can reliably reconstruct 3D dynamics from the unique spatiotemporal measurements captured by EventLFM. Experimental results demonstrate that EventLFM can robustly reconstruct fast-moving and rapidly blinking 3D fluorescent samples at kHz frame rates. Furthermore, we highlight EventLFM's capability for imaging of blinking neuronal signals in scattering mouse brain tissues and 3D tracking of GFP-labeled neurons in freely moving C. elegans. We believe that the combined ultrafast speed and large 3D SBP offered by EventLFM may open up new possibilities across many biomedical applications. △ Less

Submitted 3 April, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

arXiv:2309.10421 [pdf, other]

Exploring Different Levels of Supervision for Detecting and Localizing Solar Panels on Remote Sensing Imagery

Authors: Maarten Burger, Rob Wijnhoven, Shaodi You

Abstract: This study investigates object presence detection and localization in remote sensing imagery, focusing on solar panel recognition. We explore different levels of supervision, evaluating three models: a fully supervised object detector, a weakly supervised image classifier with CAM-based localization, and a minimally supervised anomaly detector. The classifier excels in binary presence detection (0… ▽ More This study investigates object presence detection and localization in remote sensing imagery, focusing on solar panel recognition. We explore different levels of supervision, evaluating three models: a fully supervised object detector, a weakly supervised image classifier with CAM-based localization, and a minimally supervised anomaly detector. The classifier excels in binary presence detection (0.79 F1-score), while the object detector (0.72) offers precise localization. The anomaly detector requires more data for viable performance. Fusion of model results shows potential accuracy gains. CAM impacts localization modestly, with GradCAM, GradCAM++, and HiResCAM yielding superior results. Notably, the classifier remains robust with less data, in contrast to the object detector. △ Less

Submitted 19 September, 2023; originally announced September 2023.

Comments: Presented at the Netherlands Conference on Computer Vision (NCCV), The Hague, the Netherlands, September 14, 2023

arXiv:2309.09078 [pdf, other]

Unsupervised Green Object Tracker (GOT) without Offline Pre-training

Authors: Zhiruo Zhou, Suya You, C. -C. Jay Kuo

Abstract: Supervised trackers trained on labeled data dominate the single object tracking field for superior tracking accuracy. The labeling cost and the huge computational complexity hinder their applications on edge devices. Unsupervised learning methods have also been investigated to reduce the labeling cost but their complexity remains high. Aiming at lightweight high-performance tracking, feasibility w… ▽ More Supervised trackers trained on labeled data dominate the single object tracking field for superior tracking accuracy. The labeling cost and the huge computational complexity hinder their applications on edge devices. Unsupervised learning methods have also been investigated to reduce the labeling cost but their complexity remains high. Aiming at lightweight high-performance tracking, feasibility without offline pre-training, and algorithmic transparency, we propose a new single object tracking method, called the green object tracker (GOT), in this work. GOT conducts an ensemble of three prediction branches for robust box tracking: 1) a global object-based correlator to predict the object location roughly, 2) a local patch-based correlator to build temporal correlations of small spatial units, and 3) a superpixel-based segmentator to exploit the spatial information of the target frame. GOT offers competitive tracking accuracy with state-of-the-art unsupervised trackers, which demand heavy offline pre-training, at a lower computation cost. GOT has a tiny model size (<3k parameters) and low inference complexity (around 58M FLOPs per frame). Since its inference complexity is between 0.1%-10% of DL trackers, it can be easily deployed on mobile and edge devices. △ Less

Submitted 16 September, 2023; originally announced September 2023.

arXiv:2309.00237 [pdf, other]

Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes

Authors: Sunjun Kweon, Junu Kim, Jiyoun Kim, Sujeong Im, Eunbyeol Cho, Seongsu Bae, Jungwoo Oh, Gyubok Lee, Jong Hak Moon, Seng Chan You, Seungjin Baek, Chang Hoon Han, Yoon Bin Jung, Yohan Jo, Edward Choi

Abstract: The development of large language models tailored for handling patients' clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train… ▽ More The development of large language models tailored for handling patients' clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train our specialized clinical large language model, Asclepius. While Asclepius is trained on synthetic data, we assess its potential performance in real-world applications by evaluating it using real clinical notes. We benchmark Asclepius against several other large language models, including GPT-3.5-turbo and other open-source alternatives. To further validate our approach using synthetic notes, we also compare Asclepius with its variants trained on real clinical notes. Our findings convincingly demonstrate that synthetic clinical notes can serve as viable substitutes for real ones when constructing high-performing clinical language models. This conclusion is supported by detailed evaluations conducted by both GPT-4 and medical professionals. All resources including weights, codes, and data used in the development of Asclepius are made publicly accessible for future research. (https://github.com/starmpcc/Asclepius) △ Less

Submitted 13 June, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

Comments: ACL 2024 (Findings)

arXiv:2308.11880 [pdf, other]

SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets

Authors: Cody Simons, Dripta S. Raychaudhuri, Sk Miraj Ahmed, Suya You, Konstantinos Karydis, Amit K. Roy-Chowdhury

Abstract: Scene understanding using multi-modal data is necessary in many applications, e.g., autonomous navigation. To achieve this in a variety of situations, existing models must be able to adapt to shifting data distributions without arduous data annotation. Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data. Both these a… ▽ More Scene understanding using multi-modal data is necessary in many applications, e.g., autonomous navigation. To achieve this in a variety of situations, existing models must be able to adapt to shifting data distributions without arduous data annotation. Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data. Both these assumptions may be problematic for many applications. Source data may not be available due to privacy, security, or economic concerns. Assuming the existence of paired multi-modal data for training also entails significant data collection costs and fails to take advantage of widely available freely distributed pre-trained uni-modal models. In this work, we relax both of these assumptions by addressing the problem of adapting a set of models trained independently on uni-modal data to a target domain consisting of unlabeled multi-modal data, without having access to the original source dataset. Our proposed approach solves this problem through a switching framework which automatically chooses between two complementary methods of cross-modal pseudo-label fusion -- agreement filtering and entropy weighting -- based on the estimated domain gap. We demonstrate our work on the semantic segmentation problem. Experiments across seven challenging adaptation scenarios verify the efficacy of our approach, achieving results comparable to, and in some cases outperforming, methods which assume access to source data. Our method achieves an improvement in mIoU of up to 12% over competing baselines. Our code is publicly available at https://github.com/csimo005/SUMMIT. △ Less

Submitted 22 August, 2023; originally announced August 2023.

Comments: 12 pages, 5 figures, 9 tables, ICCV 2023

arXiv:2308.10870 [pdf, other]

Magnetic Phase Imaging using Lorentz Near-field Electron Ptychography

Authors: Shengbo You, Peng-Han Lu, András Kovács, Thomas Schachinger, Frederick Allars, Rafal E. Dunin-Borkowski, Andrew M. Maiden

Abstract: Over the past few years, the combination of diffuser and near-field electron ptychography has drawn more attention by its ability to recover large field of view with few diffraction patterns. In this paper, we purpose a novel design and implementation of amplitude diffuser. The amplitude diffuser introduces structures to the illumination while reducing the inelastic scattering. And the amplitude d… ▽ More Over the past few years, the combination of diffuser and near-field electron ptychography has drawn more attention by its ability to recover large field of view with few diffraction patterns. In this paper, we purpose a novel design and implementation of amplitude diffuser. The amplitude diffuser introduces structures to the illumination while reducing the inelastic scattering. And the amplitude diffuser is implemented at the condenser lens aperture, allowing us to vary the illumination size under the same microscope setup. We demonstrate the reconstruction results under both conventional Transmission Electron Microscopy (TEM) mode as well as Lorentz mode. △ Less

Submitted 26 July, 2023; originally announced August 2023.

Comments: Presented in ISCS23

Report number: ISCS23-17

arXiv:2308.10761 [pdf, other]

CoNe: Contrast Your Neighbours for Supervised Image Classification

Authors: Mingkai Zheng, Shan You, Lang Huang, Xiu Su, Fei Wang, Chen Qian, Xiaogang Wang, Chang Xu

Abstract: Image classification is a longstanding problem in computer vision and machine learning research. Most recent works (e.g. SupCon , Triplet, and max-margin) mainly focus on grouping the intra-class samples aggressively and compactly, with the assumption that all intra-class samples should be pulled tightly towards their class centers. However, such an objective will be very hard to achieve since it… ▽ More Image classification is a longstanding problem in computer vision and machine learning research. Most recent works (e.g. SupCon , Triplet, and max-margin) mainly focus on grouping the intra-class samples aggressively and compactly, with the assumption that all intra-class samples should be pulled tightly towards their class centers. However, such an objective will be very hard to achieve since it ignores the intra-class variance in the dataset. (i.e. different instances from the same class can have significant differences). Thus, such a monotonous objective is not sufficient. To provide a more informative objective, we introduce Contrast Your Neighbours (CoNe) - a simple yet practical learning framework for supervised image classification. Specifically, in CoNe, each sample is not only supervised by its class center but also directly employs the features of its similar neighbors as anchors to generate more adaptive and refined targets. Moreover, to further boost the performance, we propose ``distributional consistency" as a more informative regularization to enable similar instances to have a similar probability distribution. Extensive experimental results demonstrate that CoNe achieves state-of-the-art performance across different benchmark datasets, network architectures, and settings. Notably, even without a complicated training recipe, our CoNe achieves 80.8\% Top-1 accuracy on ImageNet with ResNet-50, which surpasses the recent Timm training recipe (80.4\%). Code and pre-trained models are available at \href{https://github.com/mingkai-zheng/CoNe}{https://github.com/mingkai-zheng/CoNe}. △ Less

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.06692 [pdf, other]

SimMatchV2: Semi-Supervised Learning with Graph Consistency

Authors: Mingkai Zheng, Shan You, Lang Huang, Chen Luo, Fei Wang, Chen Qian, Chang Xu

Abstract: Semi-Supervised image classification is one of the most fundamental problem in computer vision, which significantly reduces the need for human labor. In this paper, we introduce a new semi-supervised learning algorithm - SimMatchV2, which formulates various consistency regularizations between labeled and unlabeled data from the graph perspective. In SimMatchV2, we regard the augmented view of a sa… ▽ More Semi-Supervised image classification is one of the most fundamental problem in computer vision, which significantly reduces the need for human labor. In this paper, we introduce a new semi-supervised learning algorithm - SimMatchV2, which formulates various consistency regularizations between labeled and unlabeled data from the graph perspective. In SimMatchV2, we regard the augmented view of a sample as a node, which consists of a label and its corresponding representation. Different nodes are connected with the edges, which are measured by the similarity of the node representations. Inspired by the message passing and node classification in graph theory, we propose four types of consistencies, namely 1) node-node consistency, 2) node-edge consistency, 3) edge-edge consistency, and 4) edge-node consistency. We also uncover that a simple feature normalization can reduce the gaps of the feature norm between different augmented views, significantly improving the performance of SimMatchV2. Our SimMatchV2 has been validated on multiple semi-supervised learning benchmarks. Notably, with ResNet-50 as our backbone and 300 epochs of training, SimMatchV2 achieves 71.9\% and 76.2\% Top-1 Accuracy with 1\% and 10\% labeled examples on ImageNet, which significantly outperforms the previous methods and achieves state-of-the-art performance. Code and pre-trained models are available at \href{https://github.com/mingkai-zheng/SimMatchV2}{https://github.com/mingkai-zheng/SimMatchV2}. △ Less

Submitted 13 August, 2023; originally announced August 2023.

arXiv:2307.13529 [pdf, other]

Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection

Authors: Yichao Cao, Qingfei Tang, Feng Yang, Xiu Su, Shan You, Xiaobo Lu, Chang Xu

Abstract: Human-Object Interaction (HOI) detection is a challenging computer vision task that requires visual models to address the complex interactive relationship between humans and objects and predict HOI triplets. Despite the challenges posed by the numerous interaction combinations, they also offer opportunities for multimodal learning of visual texts. In this paper, we present a systematic and unified… ▽ More Human-Object Interaction (HOI) detection is a challenging computer vision task that requires visual models to address the complex interactive relationship between humans and objects and predict HOI triplets. Despite the challenges posed by the numerous interaction combinations, they also offer opportunities for multimodal learning of visual texts. In this paper, we present a systematic and unified framework (RmLR) that enhances HOI detection by incorporating structured text knowledge. Firstly, we qualitatively and quantitatively analyze the loss of interaction information in the two-stage HOI detector and propose a re-mining strategy to generate more comprehensive visual representation.Secondly, we design more fine-grained sentence- and word-level alignment and knowledge transfer strategies to effectively address the many-to-many matching problem between multiple interactions and multiple texts.These strategies alleviate the matching confusion problem that arises when multiple interactions occur simultaneously, thereby improving the effectiveness of the alignment process. Finally, HOI reasoning by visual features augmented with textual knowledge substantially improves the understanding of interactions. Experimental results illustrate the effectiveness of our approach, where state-of-the-art performance is achieved on public benchmarks. We further analyze the effects of different components of our approach to provide insights into its efficacy. △ Less

Submitted 18 September, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

Comments: ICCV2023

arXiv:2307.00371 [pdf, other]

Learning Content-enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation

Authors: Qi Bi, Shaodi You, Theo Gevers

Abstract: Domain-generalized urban-scene semantic segmentation (USSS) aims to learn generalized semantic predictions across diverse urban-scene styles. Unlike domain gap challenges, USSS is unique in that the semantic categories are often similar in different urban scenes, while the styles can vary significantly due to changes in urban landscapes, weather conditions, lighting, and other factors. Existing ap… ▽ More Domain-generalized urban-scene semantic segmentation (USSS) aims to learn generalized semantic predictions across diverse urban-scene styles. Unlike domain gap challenges, USSS is unique in that the semantic categories are often similar in different urban scenes, while the styles can vary significantly due to changes in urban landscapes, weather conditions, lighting, and other factors. Existing approaches typically rely on convolutional neural networks (CNNs) to learn the content of urban scenes. In this paper, we propose a Content-enhanced Mask TransFormer (CMFormer) for domain-generalized USSS. The main idea is to enhance the focus of the fundamental component, the mask attention mechanism, in Transformer segmentation models on content information. To achieve this, we introduce a novel content-enhanced mask attention mechanism. It learns mask queries from both the image feature and its down-sampled counterpart, as lower-resolution image features usually contain more robust content information and are less sensitive to style variations. These features are fused into a Transformer decoder and integrated into a multi-resolution content-enhanced mask attention learning scheme. Extensive experiments conducted on various domain-generalized urban-scene segmentation datasets demonstrate that the proposed CMFormer significantly outperforms existing CNN-based methods for domain-generalized semantic segmentation, achieving improvements of up to 14.00\% in terms of mIoU (mean intersection over union). The source code is publicly available at \url{https://github.com/BiQiWHU/CMFormer}. △ Less

Submitted 17 December, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

Comments: Accepted by AAAI 2024. Camera-ready version with available source code

arXiv:2306.05244 [pdf, other]

Spectral-temporal-spatial customization via modulating multimodal nonlinear pulse propagation

Authors: Tong Qiu, Honghao Cao, Kunzan Liu, Li-Yu Yu, Manuel Levy, Eva Lendaro, Fan Wang, Sixian You

Abstract: Multimode fibers (MMFs) have recently reemerged as attractive avenues for nonlinear effects due to their high-dimensional spatiotemporal nonlinear dynamics and scalability for high power. High-brightness MMF sources with effective control of the nonlinear processes would offer new possibilities for a wide range of applications from high-power fiber lasers, to bioimaging and chemical sensing, and t… ▽ More Multimode fibers (MMFs) have recently reemerged as attractive avenues for nonlinear effects due to their high-dimensional spatiotemporal nonlinear dynamics and scalability for high power. High-brightness MMF sources with effective control of the nonlinear processes would offer new possibilities for a wide range of applications from high-power fiber lasers, to bioimaging and chemical sensing, and to novel physics phenomena. Here we present a simple yet effective way of controlling nonlinear effects at high peak power levels: by leveraging not only the spatial but also the temporal degrees of freedom of the multimodal nonlinear pulse propagation in step-index MMFs using a programmable fiber shaper. This method represents the first method that enables modulation and optimization of multimodal nonlinear pulse propagation, achieving high tunability and broadband high peak power. Its potential as a nonlinear imaging source is further demonstrated by applying the MMF source to multiphoton microscopy, where widely tunable two-photon and three-photon imaging is achieved with adaptive optimization. These demonstrations highlight the effectiveness of directly modulating multimodal nonlinear pulse propagation to enhance the high-dimensional customization and optimize the high spectral brightness of MMF output. These advancements provide new possibilities for technology advances in nonlinear optics, bioimaging, spectroscopy, optical computing, and material processing. △ Less

Submitted 5 December, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 36 pages, 19 figures

arXiv:2305.15712 [pdf, other]

Knowledge Diffusion for Distillation

Authors: Tao Huang, Yuan Zhang, Mingkai Zheng, Shan You, Fei Wang, Chen Qian, Chang Xu

Abstract: The representation gap between teacher and student is an emerging topic in knowledge distillation (KD). To reduce the gap and improve the performance, current methods often resort to complicated training schemes, loss functions, and feature alignments, which are task-specific and feature-specific. In this paper, we state that the essence of these methods is to discard the noisy information and dis… ▽ More The representation gap between teacher and student is an emerging topic in knowledge distillation (KD). To reduce the gap and improve the performance, current methods often resort to complicated training schemes, loss functions, and feature alignments, which are task-specific and feature-specific. In this paper, we state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature, and propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models. Our approach is based on the observation that student features typically contain more noises than teacher features due to the smaller capacity of student model. To address this, we propose to denoise student features using a diffusion model trained by teacher features. This allows us to perform better distillation between the refined clean feature and teacher feature. Additionally, we introduce a light-weight diffusion model with a linear autoencoder to reduce the computation cost and an adaptive noise matching module to improve the denoising performance. Extensive experiments demonstrate that DiffKD is effective across various types of features and achieves state-of-the-art performance consistently on image classification, object detection, and semantic segmentation tasks. Code is available at https://github.com/hunto/DiffKD. △ Less

Submitted 3 December, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2304.12591 [pdf, other]

Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints

Authors: Ganning Zhao, Tingwei Shen, Suya You, C. -C. Jay Kuo

Abstract: Ensuring the realism of computer-generated synthetic images is crucial to deep neural network (DNN) training. Due to different semantic distributions between synthetic and real-world captured datasets, there exists semantic mismatch between synthetic and refined images, which in turn results in the semantic distortion. Recently, contrastive learning (CL) has been successfully used to pull correlat… ▽ More Ensuring the realism of computer-generated synthetic images is crucial to deep neural network (DNN) training. Due to different semantic distributions between synthetic and real-world captured datasets, there exists semantic mismatch between synthetic and refined images, which in turn results in the semantic distortion. Recently, contrastive learning (CL) has been successfully used to pull correlated patches together and push uncorrelated ones apart. In this work, we exploit semantic and structural consistency between synthetic and refined images and adopt CL to reduce the semantic distortion. Besides, we incorporate hard negative mining to improve the performance furthermore. We compare the performance of our method with several other benchmarking methods using qualitative and quantitative measures and show that our method offers the state-of-the-art performance. △ Less

Submitted 26 April, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.12463 [pdf, other]

A Study on Improving Realism of Synthetic Data for Machine Learning

Authors: Tingwei Shen, Ganning Zhao, Suya You

Abstract: Synthetic-to-real data translation using generative adversarial learning has achieved significant success in improving synthetic data. Yet, limited studies focus on deep evaluation and comparison of adversarial training on general-purpose synthetic data for machine learning. This work aims to train and evaluate a synthetic-to-real generative model that transforms the synthetic renderings into more… ▽ More Synthetic-to-real data translation using generative adversarial learning has achieved significant success in improving synthetic data. Yet, limited studies focus on deep evaluation and comparison of adversarial training on general-purpose synthetic data for machine learning. This work aims to train and evaluate a synthetic-to-real generative model that transforms the synthetic renderings into more realistic styles on general-purpose datasets conditioned with unlabeled real-world data. Extensive performance evaluation and comparison have been conducted through qualitative and quantitative metrics and a defined downstream perception task. △ Less

Submitted 28 April, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: 8 pages, 1 figure, 7 tables. Submit to the "SPIE Defense + Commercial Sensing" conference

arXiv:2304.10970 [pdf, other]

Can GPT-4 Perform Neural Architecture Search?

Authors: Mingkai Zheng, Xiu Su, Shan You, Fei Wang, Chen Qian, Chang Xu, Samuel Albanie

Abstract: We investigate the potential of GPT-4~\cite{gpt4} to perform Neural Architecture Search (NAS) -- the task of designing effective neural architectures. Our proposed approach, \textbf{G}PT-4 \textbf{E}nhanced \textbf{N}eural arch\textbf{I}tect\textbf{U}re \textbf{S}earch (GENIUS), leverages the generative capabilities of GPT-4 as a black-box optimiser to quickly navigate the architecture search spac… ▽ More We investigate the potential of GPT-4~\cite{gpt4} to perform Neural Architecture Search (NAS) -- the task of designing effective neural architectures. Our proposed approach, \textbf{G}PT-4 \textbf{E}nhanced \textbf{N}eural arch\textbf{I}tect\textbf{U}re \textbf{S}earch (GENIUS), leverages the generative capabilities of GPT-4 as a black-box optimiser to quickly navigate the architecture search space, pinpoint promising candidates, and iteratively refine these candidates to improve performance. We assess GENIUS across several benchmarks, comparing it with existing state-of-the-art NAS techniques to illustrate its effectiveness. Rather than targeting state-of-the-art performance, our objective is to highlight GPT-4's potential to assist research on a challenging technical problem through a simple prompting scheme that requires relatively limited domain expertise\footnote{Code available at \href{https://github.com/mingkai-zheng/GENIUS}{https://github.com/mingkai-zheng/GENIUS}.}. More broadly, we believe our preliminary results point to future research that harnesses general purpose language models for diverse optimisation tasks. We also highlight important limitations to our study, and note implications for AI safety. △ Less

Submitted 1 August, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

arXiv:2303.15790 [pdf, other]

doi 10.1007/s11467-023-1333-z

STCF Conceptual Design Report: Volume 1 -- Physics & Detector

Authors: M. Achasov, X. C. Ai, R. Aliberti, L. P. An, Q. An, X. Z. Bai, Y. Bai, O. Bakina, A. Barnyakov, V. Blinov, V. Bobrovnikov, D. Bodrov, A. Bogomyagkov, A. Bondar, I. Boyko, Z. H. Bu, F. M. Cai, H. Cai, J. J. Cao, Q. H. Cao, Z. Cao, Q. Chang, K. T. Chao, D. Y. Chen, H. Chen , et al. (413 additional authors not shown)

Abstract: The Super $τ$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about a factor of 100 larger than that by the present $τ$-Charm factory -- the BEPCII,… ▽ More The Super $τ$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about a factor of 100 larger than that by the present $τ$-Charm factory -- the BEPCII, providing a unique platform for exploring the asymmetry of matter-antimatter (charge-parity violation), in-depth studies of the internal structure of hadrons and the nature of non-perturbative strong interactions, as well as searching for exotic hadrons and physics beyond the Standard Model. The STCF project in China is under development with an extensive R\&D program. This document presents the physics opportunities at the STCF, describes conceptual designs of the STCF detector system, and discusses future plans for detector R\&D and physics case studies. △ Less

Submitted 5 October, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Journal ref: Front. Phys. 19(1), 14701 (2024)

arXiv:2303.03793 [pdf]

Roadmap on Deep Learning for Microscopy

Authors: Giovanni Volpe, Carolina Wählby, Lei Tian, Michael Hecht, Artur Yakimovich, Kristina Monakhova, Laura Waller, Ivo F. Sbalzarini, Christopher A. Metzler, Mingyang Xie, Kevin Zhang, Isaac C. D. Lenton, Halina Rubinsztein-Dunlop, Daniel Brunner, Bijie Bai, Aydogan Ozcan, Daniel Midtvedt, Hao Wang, Nataša Sladoje, Joakim Lindblad, Jason T. Smith, Marien Ochoa, Margarida Barroso, Xavier Intes, Tong Qiu , et al. (50 additional authors not shown)

Abstract: Through digital imaging, microscopy has evolved from primarily being a means for visual observation of life at the micro- and nano-scale, to a quantitative tool with ever-increasing resolution and throughput. Artificial intelligence, deep neural networks, and machine learning are all niche terms describing computational methods that have gained a pivotal role in microscopy-based research over the… ▽ More Through digital imaging, microscopy has evolved from primarily being a means for visual observation of life at the micro- and nano-scale, to a quantitative tool with ever-increasing resolution and throughput. Artificial intelligence, deep neural networks, and machine learning are all niche terms describing computational methods that have gained a pivotal role in microscopy-based research over the past decade. This Roadmap is written collectively by prominent researchers and encompasses selected aspects of how machine learning is applied to microscopy image data, with the aim of gaining scientific knowledge by improved image quality, automated detection, segmentation, classification and tracking of objects, and efficient merging of information from multiple imaging modalities. We aim to give the reader an overview of the key developments and an understanding of possibilities and limitations of machine learning for microscopy. It will be of interest to a wide cross-disciplinary audience in the physical sciences and life sciences. △ Less

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2302.10254 [pdf, other]

High-Fidelity and High-Speed Wavefront Shaping by Leveraging Complex Media

Authors: Li-Yu Yu, Sixian You

Abstract: Achieving high-precision light manipulation is crucial for delivering information through complex media with high fidelity. However, existing spatial light modulation devices face a fundamental tradeoff between speed and accuracy. Digital micromirror devices (DMDs) have emerged as a promising candidate as accessible high-speed wavefront shaping devices but at the cost of compromised fidelity, larg… ▽ More Achieving high-precision light manipulation is crucial for delivering information through complex media with high fidelity. However, existing spatial light modulation devices face a fundamental tradeoff between speed and accuracy. Digital micromirror devices (DMDs) have emerged as a promising candidate as accessible high-speed wavefront shaping devices but at the cost of compromised fidelity, largely due to the limited control degrees of freedom and the challenge of numerically optimizing a binary amplitude mask. Here we leverage the sparse-to-random transformation through complex media to overcome the dimensionality limitation of spatial light modulation devices. We demonstrate that pattern compression in the form of sparsity-constrained wavefront optimization allows sparse and robust wavefront representations of generic patterns in the random basis provided by the complex media, and thus effectively addresses the dimensionality limitation of DMDs, which significantly improves the projection fidelity without sacrificing the full frame rate (22 kHz), hardware complexity, or optimization time (0.5 s for 1000 frames). Since the dimensionality limitation is intrinsic to spatial light modulation devices and sparse-to-random transformation to complex media, our methods can be generalized to different pattern types, complex media, and device settings, supporting consistent superior performance across different types of complex media with up to an 89% increase in projection accuracy and a 126% improvement in speckle suppression. The proposed optimization framework has the potential to enhance existing holographic setups without any change to the hardware, enable high-fidelity and high-speed wavefront shaping through different scattering media and platforms, and directly facilitate a wide range of physics and real-world applications. △ Less

Submitted 19 September, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

arXiv:2302.08595 [pdf, other]

Frequency-domain Learning for Volumetric-based 3D Data Perception

Authors: Zifan Yu, Suya You, Fengbo Ren

Abstract: Frequency-domain learning draws attention due to its superior tradeoff between inference accuracy and input data size. Frequency-domain learning in 2D computer vision tasks has shown that 2D convolutional neural networks (CNN) have a stationary spectral bias towards low-frequency channels so that high-frequency channels can be pruned with no or little accuracy degradation. However, frequency-domai… ▽ More Frequency-domain learning draws attention due to its superior tradeoff between inference accuracy and input data size. Frequency-domain learning in 2D computer vision tasks has shown that 2D convolutional neural networks (CNN) have a stationary spectral bias towards low-frequency channels so that high-frequency channels can be pruned with no or little accuracy degradation. However, frequency-domain learning has not been studied in the context of 3D CNNs with 3D volumetric data. In this paper, we study frequency-domain learning for volumetric-based 3D data perception to reveal the spectral bias and the accuracy-input-data-size tradeoff of 3D CNNs. Our study finds that 3D CNNs are sensitive to a limited number of critical frequency channels, especially low-frequency channels. Experiment results show that frequency-domain learning can significantly reduce the size of volumetric-based 3D inputs (based on spectral bias) while achieving comparable accuracy with conventional spatial-domain learning approaches. Specifically, frequency-domain learning is able to reduce the input data size by 98% in 3D shape classification while limiting the average accuracy drop within 2%, and by 98% in the 3D point cloud semantic segmentation with a 1.48% mean-class accuracy improvement while limiting the mean-class IoU loss within 1.55%. Moreover, by learning from higher-resolution 3D data (i.e., 2x of the original image in the spatial domain), frequency-domain learning improves the mean-class accuracy and mean-class IoU by 3.04% and 0.63%, respectively, while achieving an 87.5% input data size reduction in 3D point cloud semantic segmentation. △ Less

Submitted 20 February, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

Comments: 13 pages

arXiv:2302.08594 [pdf, other]

TransUPR: A Transformer-based Uncertain Point Refiner for LiDAR Point Cloud Semantic Segmentation

Authors: Zifan Yu, Meida Chen, Zhikang Zhang, Suya You, Raghuveer Rao, Sanjeev Agarwal, Fengbo Ren

Abstract: Common image-based LiDAR point cloud semantic segmentation (LiDAR PCSS) approaches have bottlenecks resulting from the boundary-blurring problem of convolution neural networks (CNNs) and quantitation loss of spherical projection. In this work, we propose a transformer-based plug-and-play uncertain point refiner, i.e., TransUPR, to refine selected uncertain points in a learnable manner, which leads… ▽ More Common image-based LiDAR point cloud semantic segmentation (LiDAR PCSS) approaches have bottlenecks resulting from the boundary-blurring problem of convolution neural networks (CNNs) and quantitation loss of spherical projection. In this work, we propose a transformer-based plug-and-play uncertain point refiner, i.e., TransUPR, to refine selected uncertain points in a learnable manner, which leads to an improved segmentation performance. Uncertain points are sampled from coarse semantic segmentation results of 2D image segmentation where uncertain points are located close to the object boundaries in the 2D range image representation and 3D spherical projection background points. Following that, the geometry and coarse semantic features of uncertain points are aggregated by neighbor points in 3D space without adding expensive computation and memory footprint. Finally, the transformer-based refiner, which contains four stacked self-attention layers, along with an MLP module, is utilized for uncertain point classification on the concatenated features of self-attention layers. As the proposed refiner is independent of 2D CNNs, our TransUPR can be easily integrated into any existing image-based LiDAR PCSS approaches, e.g., CENet. Our TransUPR with the CENet achieves state-of-the-art performance, i.e., 68.2% mean Intersection over Union (mIoU) on the Semantic KITTI benchmark, which provides a performance improvement of 0.6% on the mIoU compared to the original CENet. △ Less

Submitted 12 October, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

Comments: 6 pages; Accepted by 2023 IROS

arXiv:2302.03368 [pdf]

doi 10.1038/s41467-023-38551-0

Melting Domain Size and Recrystallization Dynamics of Ice Revealed by Time-Resolved X-ray Scattering

Authors: Cheolhee Yang, Marjorie Ladd-Parada, Kyeongmin Nam, Sangmin Jeong, Seonju You, Alexander Späh, Harshad Pathak, Tobias Eklund, Thomas J. Lane, Jae Hyuk Lee, Intae Eom, Minseok Kim, Katrin Amann- Winkel, Fivos Perakis, Anders Nilsson, Kyung Hwan Kim

Abstract: The phase transition between water and ice is ubiquitous and one of the most important phenomena in nature. Here, we performed time-resolved x-ray scattering experiments capturing the melting and recrystallization dynamics of ice. The ultrafast heating of ice I is induced by an IR laser pulse and probed with an intense x-ray pulse, which provided us with direct structural information on different… ▽ More The phase transition between water and ice is ubiquitous and one of the most important phenomena in nature. Here, we performed time-resolved x-ray scattering experiments capturing the melting and recrystallization dynamics of ice. The ultrafast heating of ice I is induced by an IR laser pulse and probed with an intense x-ray pulse, which provided us with direct structural information on different length scales. From the wide-angle x-ray scattering (WAXS) patterns, the molten fraction, as well as the corresponding temperature at each delay, were determined. The small-angle x-ray scattering (SAXS) patterns, together with the information extracted from the WAXS analysis, provided the time-dependent change of the size and the number of the liquid domains. The results show partial melting (~13 %) and superheating of ice occurring at around 20 ns. After 100 ns, the average size of the liquid domains grows from about 2.5 nm to 4.5 nm by the coalescence of approximately six adjacent domains. Subsequently, we capture the recrystallization of the liquid domains, which occurs on microsecond timescales due to the cooling by heat dissipation and results to a decrease of the average liquid domain size. △ Less

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2301.07666 [pdf, other]

DDS: Decoupled Dynamic Scene-Graph Generation Network

Authors: A S M Iftekhar, Raphael Ruschel, Satish Kumar, Suya You, B. S. Manjunath

Abstract: Scene-graph generation involves creating a structural representation of the relationships between objects in a scene by predicting subject-object-relation triplets from input data. However, existing methods show poor performance in detecting triplets outside of a predefined set, primarily due to their reliance on dependent feature learning. To address this issue we propose DDS -- a decoupled dynam… ▽ More Scene-graph generation involves creating a structural representation of the relationships between objects in a scene by predicting subject-object-relation triplets from input data. However, existing methods show poor performance in detecting triplets outside of a predefined set, primarily due to their reliance on dependent feature learning. To address this issue we propose DDS -- a decoupled dynamic scene-graph generation network -- that consists of two independent branches that can disentangle extracted features. The key innovation of the current paper is the decoupling of the features representing the relationships from those of the objects, which enables the detection of novel object-relationship combinations. The DDS model is evaluated on three datasets and outperforms previous methods by a significant margin, especially in detecting previously unseen triplets. △ Less

Submitted 18 January, 2023; originally announced January 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2301.07310 [pdf, other]

doi 10.1145/3544548.3581041

MR.Brick: Designing A Remote Mixed-reality Educational Game System for Promoting Children's Social & Collaborative Skills

Authors: Yudan Wu, Shanhe You, Zixuan Guo, Xiangyang Li, Guyue Zhou, Jiangtao Gong

Abstract: Children are one of the groups most influenced by COVID-19-related social distancing, and a lack of contact with peers can limit their opportunities to develop social and collaborative skills. However, remote socialization and collaboration as an alternative approach is still a great challenge for children. This paper presents MR.Brick, a Mixed Reality (MR) educational game system that helps child… ▽ More Children are one of the groups most influenced by COVID-19-related social distancing, and a lack of contact with peers can limit their opportunities to develop social and collaborative skills. However, remote socialization and collaboration as an alternative approach is still a great challenge for children. This paper presents MR.Brick, a Mixed Reality (MR) educational game system that helps children adapt to remote collaboration. A controlled experimental study involving 24 children aged six to ten was conducted to compare MR.Brick with the traditional video game by measuring their social and collaborative skills and analyzing their multi-modal playing behaviours. The results showed that MR.Brick was more conducive to children's remote collaboration experience than the traditional video game. Given the lack of training systems designed for children to collaborate remotely, this study may inspire interaction design and educational research in related fields. △ Less

Submitted 26 January, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

Comments: 14 pages, 9 figures

MSC Class: H.5.2

Journal ref: CHI2023

arXiv:2212.14623 [pdf, other]

Essential Number of Principal Components and Nearly Training-Free Model for Spectral Analysis

Authors: Yifeng Bie, Shuai You, Xinrui Li, Xuekui Zhang, Tao Lu

Abstract: Through a study of multi-gas mixture datasets, we show that in multi-component spectral analysis, the number of functional or non-functional principal components required to retain the essential information is the same as the number of independent constituents in the mixture set. Due to the mutual in-dependency among different gas molecules, near one-to-one projection from the principal component… ▽ More Through a study of multi-gas mixture datasets, we show that in multi-component spectral analysis, the number of functional or non-functional principal components required to retain the essential information is the same as the number of independent constituents in the mixture set. Due to the mutual in-dependency among different gas molecules, near one-to-one projection from the principal component to the mixture constituent can be established, leading to a significant simplification of spectral quantification. Further, with the knowledge of the molar extinction coefficients of each constituent, a complete principal component set can be extracted from the coefficients directly, and few to none training samples are required for the learning model. Compared to other approaches, the proposed methods provide fast and accurate spectral quantification solutions with a small memory size needed. △ Less

Submitted 30 December, 2022; originally announced December 2022.

Showing 1–50 of 240 results for author: You, S