subscribe to arXiv mailings

Novel clustered federated learning based on local loss

Authors: Endong Gu, Yongxin Chen, Hao Wen, Xingju Cai, Deren Han

Abstract: This paper proposes LCFL, a novel clustering metric for evaluating clients' data distributions in federated learning. LCFL aligns with federated learning requirements, accurately assessing client-to-client variations in data distribution. It offers advantages over existing clustered federated learning methods, addressing privacy concerns, improving applicability to non-convex models, and providing… ▽ More This paper proposes LCFL, a novel clustering metric for evaluating clients' data distributions in federated learning. LCFL aligns with federated learning requirements, accurately assessing client-to-client variations in data distribution. It offers advantages over existing clustered federated learning methods, addressing privacy concerns, improving applicability to non-convex models, and providing more accurate classification results. LCFL does not require prior knowledge of clients' data distributions. We provide a rigorous mathematical analysis, demonstrating the correctness and feasibility of our framework. Numerical experiments with neural network instances highlight the superior performance of LCFL over baselines on several clustered federated learning benchmarks. △ Less

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.05045 [pdf, other]

Robust Skin Color Driven Privacy Preserving Face Recognition via Function Secret Sharing

Authors: Dong Han, Yufan Jiang, Yong Li, Ricardo Mendes, Joachim Denzler

Abstract: In this work, we leverage the pure skin color patch from the face image as the additional information to train an auxiliary skin color feature extractor and face recognition model in parallel to improve performance of state-of-the-art (SOTA) privacy-preserving face recognition (PPFR) systems. Our solution is robust against black-box attacking and well-established generative adversarial network (GA… ▽ More In this work, we leverage the pure skin color patch from the face image as the additional information to train an auxiliary skin color feature extractor and face recognition model in parallel to improve performance of state-of-the-art (SOTA) privacy-preserving face recognition (PPFR) systems. Our solution is robust against black-box attacking and well-established generative adversarial network (GAN) based image restoration. We analyze the potential risk in previous work, where the proposed cosine similarity computation might directly leak the protected precomputed embedding stored on the server side. We propose a Function Secret Sharing (FSS) based face embedding comparison protocol without any intermediate result leakage. In addition, we show in experiments that the proposed protocol is more efficient compared to the Secret Sharing (SS) based protocol. △ Less

Submitted 6 July, 2024; originally announced July 2024.

Comments: Accepted at ICIP2024

arXiv:2406.17226 [pdf, other]

Extended alternating structure-adapted proximal gradient algorithm for nonconvex nonsmooth problems

Authors: Ying Gao, Chunfeng Cui, Wenxing Zhang, Deren Han

Abstract: Alternating structure-adapted proximal (ASAP) gradient algorithm (M. Nikolova and P. Tan, SIAM J Optim, 29:2053-2078, 2019) has drawn much attention due to its efficiency in solving nonconvex nonsmooth optimization problems. However, the multiblock nonseparable structure confines the performance of ASAP to far-reaching practical problems, e.g., coupled tensor decomposition. In this paper, we propo… ▽ More Alternating structure-adapted proximal (ASAP) gradient algorithm (M. Nikolova and P. Tan, SIAM J Optim, 29:2053-2078, 2019) has drawn much attention due to its efficiency in solving nonconvex nonsmooth optimization problems. However, the multiblock nonseparable structure confines the performance of ASAP to far-reaching practical problems, e.g., coupled tensor decomposition. In this paper, we propose an extended ASAP (eASAP) algorithm for nonconvex nonsmooth optimization whose objective is the sum of two nonseperable functions and a coupling one. By exploiting the blockwise restricted prox-regularity, eASAP is capable of minimizing the objective whose coupling function is multiblock nonseparable. Moreover, we analyze the global convergence of eASAP by virtue of the Aubin property on partial subdifferential mapping and the Kurdyka-Łojasiewicz property on the objective. Furthermore, the sublinear convergence rate of eASAP is built upon the proximal point algorithmic framework under some mild conditions. Numerical simulations on multimodal data fusion demonstrate the compelling performance of the proposed method. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.08777 [pdf, other]

Finite Time Blowup of Integer- and Fractional-Order Time-Delayed Diffusion Equations

Authors: Christopher N. Angstmann, Stuart-James M. Burney, Daniel S. Han, Bruce I. Henry, Boris Z. Huang, Zhuang Xu

Abstract: In this work, exact solutions are derived for an integer- and fractional-order time-delayed diffusion equation with arbitrary initial conditions. The solutions are obtained using Fourier transform methods in conjunction with the known properties of delay functions. It is observed that the solutions do not exhibit infinite speed of propagation for smooth initial conditions that are bounded and posi… ▽ More In this work, exact solutions are derived for an integer- and fractional-order time-delayed diffusion equation with arbitrary initial conditions. The solutions are obtained using Fourier transform methods in conjunction with the known properties of delay functions. It is observed that the solutions do not exhibit infinite speed of propagation for smooth initial conditions that are bounded and positive. Sufficient conditions on the initial condition are also established such that the finite time blowup of the solutions can be explicitly calculated. Examples are provided that highlight the contrasting behaviours of these exact solutions with the known dynamics of solutions to the standard diffusion equation. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 2 figures

MSC Class: 35R25; 35C10; 34K06; 34K37; 33E20; 42A38

arXiv:2406.01349 [pdf, other]

Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation

Authors: Enhui Ma, Lijun Zhou, Tao Tang, Zhan Zhang, Dong Han, Junpeng Jiang, Kun Zhan, Peng Jia, Xianpeng Lang, Haiyang Sun, Di Lin, Kaicheng Yu

Abstract: Using generative models to synthesize new data has become a de-facto standard in autonomous driving to address the data scarcity issue. Though existing approaches are able to boost perception models, we discover that these approaches fail to improve the performance of planning of end-to-end autonomous driving models as the generated videos are usually less than 8 frames and the spatial and tempora… ▽ More Using generative models to synthesize new data has become a de-facto standard in autonomous driving to address the data scarcity issue. Though existing approaches are able to boost perception models, we discover that these approaches fail to improve the performance of planning of end-to-end autonomous driving models as the generated videos are usually less than 8 frames and the spatial and temporal inconsistencies are not negligible. To this end, we propose Delphi, a novel diffusion-based long video generation method with a shared noise modeling mechanism across the multi-views to increase spatial consistency, and a feature-aligned module to achieves both precise controllability and temporal consistency. Our method can generate up to 40 frames of video without loss of consistency which is about 5 times longer compared with state-of-the-art methods. Instead of randomly generating new data, we further design a sampling policy to let Delphi generate new data that are similar to those failure cases to improve the sample efficiency. This is achieved by building a failure-case driven framework with the help of pre-trained visual language models. Our extensive experiment demonstrates that our Delphi generates a higher quality of long videos surpassing previous state-of-the-art methods. Consequentially, with only generating 4% of the training dataset size, our framework is able to go beyond perception and prediction tasks, for the first time to the best of our knowledge, boost the planning performance of the end-to-end autonomous driving model by a margin of 25%. △ Less

Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: Project Page: https://westlake-autolab.github.io/delphi.github.io/, 8 figures

arXiv:2406.00988 [pdf, other]

ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation

Authors: Dengke Han, Meng Wu, Runzhen Xue, Mingyu Yan, Xiaochun Ye, Dongrui Fan

Abstract: Heterogeneous Graph Neural Networks (HGNNs) have recently demonstrated great power in handling heterogeneous graph data, rendering them widely applied in many critical real-world domains. Most HGNN models leverage attention mechanisms to significantly improvemodel accuracy, albeit at the cost of increased computational complexity and memory bandwidth requirements. Fortunately, the attention dispar… ▽ More Heterogeneous Graph Neural Networks (HGNNs) have recently demonstrated great power in handling heterogeneous graph data, rendering them widely applied in many critical real-world domains. Most HGNN models leverage attention mechanisms to significantly improvemodel accuracy, albeit at the cost of increased computational complexity and memory bandwidth requirements. Fortunately, the attention disparity from source vertices towards a common target vertex unveils an opportunity to boost the model execution performance by pruning unimportant source vertices during neighbor aggregation. In this study, we commence with a quantitative analysis of the attention disparity in HGNN models, where the importance of different source vertices varies for the same target vertex. To fully exploit this finding for inference acceleration, we propose a runtime pruning method based on min-heap and map it to a dedicated hardware pruner to discard unimportant vertices. Given that the pruning overhead itself is non-negligible and cannot be amortized by conventional staged execution paradigm, an operation-fusion execution fow of HGNNs is introduced to overlap the pruning overhead while harnessing inter-stage parallelism. Finally, we present the design of a novel HGNN accelerator, ADE-HGNN, tailored to support the proposed execution framework. Our experimental results demonstrate that ADE-HGNN achieves an average performance improvement of 28.21x over the NVIDIA GPU T4 platform and 7.98x over the advanced GPU A100, with the inference accuracy loss kept within a negligible range of 0.11%~1.47%. Furthermore, ADE-HGNN significantly reduces energy consumption to 1.97% and 5.37% of the two platforms, respectively. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: 15 pages, 9 figures, accepted by Euro-PAR 2024

arXiv:2406.00897 [pdf, other]

Exact Solutions of a Time-Delay Advection Equation and a Fractional Time-Delay Advection Equation

Authors: Christopher N. Angstmann, Stuart-James M. Burney, Daniel S. Han, Bruce I. Henry, Boris Z. Huang, Zhuang Xu

Abstract: Exact solutions are derived for a time-delay advection equation and a fractional-order time-delay advection equation with a time-delay in the spatial derivative. Solutions are obtained, for arbitrary separable initial conditions, by incorporating recently introduced delay functions in a separation of variables approach. Examples are provided showing oscillatory and translatory behaviours fundament… ▽ More Exact solutions are derived for a time-delay advection equation and a fractional-order time-delay advection equation with a time-delay in the spatial derivative. Solutions are obtained, for arbitrary separable initial conditions, by incorporating recently introduced delay functions in a separation of variables approach. Examples are provided showing oscillatory and translatory behaviours fundamentally different to standard propagating wave solutions. △ Less

Submitted 2 June, 2024; originally announced June 2024.

Comments: Letter

MSC Class: 35C10; 35F10; 34K06; 42A38; 33E20

arXiv:2405.19044 [pdf, ps, other]

On adaptive stochastic extended iterative methods for solving least squares

Authors: Yun Zeng, Deren Han, Yansheng Su, Jiaxin Xie

Abstract: In this paper, we propose a novel adaptive stochastic extended iterative method, which can be viewed as an improved extension of the randomized extended Kaczmarz (REK) method, for finding the unique minimum Euclidean norm least-squares solution of a given linear system. In particular, we introduce three equivalent stochastic reformulations of the linear least-squares problem: stochastic unconstrai… ▽ More In this paper, we propose a novel adaptive stochastic extended iterative method, which can be viewed as an improved extension of the randomized extended Kaczmarz (REK) method, for finding the unique minimum Euclidean norm least-squares solution of a given linear system. In particular, we introduce three equivalent stochastic reformulations of the linear least-squares problem: stochastic unconstrained and constrained optimization problems, and the stochastic multiobjective optimization problem. We then alternately employ the adaptive variants of the stochastic heavy ball momentum (SHBM) method, which utilize iterative information to update the parameters, to solve the stochastic reformulations. We prove that our method converges linearly in expectation, addressing an open problem in the literature related to designing theoretically supported adaptive SHBM methods. Numerical experiments show that our adaptive stochastic extended iterative method has strong advantages over the non-adaptive one. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.16605 [pdf, other]

Demystify Mamba in Vision: A Linear Attention Perspective

Authors: Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

Abstract: Mamba is an effective state space model with linear computation complexity. It has recently shown impressive efficiency in dealing with high-resolution inputs across various vision tasks. In this paper, we reveal that the powerful Mamba model shares surprising similarities with linear attention Transformer, which typically underperform conventional Transformer in practice. By exploring the similar… ▽ More Mamba is an effective state space model with linear computation complexity. It has recently shown impressive efficiency in dealing with high-resolution inputs across various vision tasks. In this paper, we reveal that the powerful Mamba model shares surprising similarities with linear attention Transformer, which typically underperform conventional Transformer in practice. By exploring the similarities and disparities between the effective Mamba and subpar linear attention Transformer, we provide comprehensive analyses to demystify the key factors behind Mamba's success. Specifically, we reformulate the selective state space model and linear attention within a unified formulation, rephrasing Mamba as a variant of linear attention Transformer with six major distinctions: input gate, forget gate, shortcut, no attention normalization, single-head, and modified block design. For each design, we meticulously analyze its pros and cons, and empirically evaluate its impact on model performance in vision tasks. Interestingly, the results highlight the forget gate and block design as the core contributors to Mamba's success, while the other four designs are less crucial. Based on these findings, we propose a Mamba-Like Linear Attention (MLLA) model by incorporating the merits of these two key designs into linear attention. The resulting model outperforms various vision Mamba models in both image classification and high-resolution dense prediction tasks, while enjoying parallelizable computation and fast inference speed. Code is available at https://github.com/LeapLabTHU/MLLA. △ Less

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.14745 [pdf, other]

AnyLoss: Transforming Classification Metrics into Loss Functions

Authors: Doheon Han, Nuno Moniz, Nitesh V Chawla

Abstract: Many evaluation metrics can be used to assess the performance of models in binary classification tasks. However, most of them are derived from a confusion matrix in a non-differentiable form, making it very difficult to generate a differentiable loss function that could directly optimize them. The lack of solutions to bridge this challenge not only hinders our ability to solve difficult tasks, suc… ▽ More Many evaluation metrics can be used to assess the performance of models in binary classification tasks. However, most of them are derived from a confusion matrix in a non-differentiable form, making it very difficult to generate a differentiable loss function that could directly optimize them. The lack of solutions to bridge this challenge not only hinders our ability to solve difficult tasks, such as imbalanced learning, but also requires the deployment of computationally expensive hyperparameter search processes in model selection. In this paper, we propose a general-purpose approach that transforms any confusion matrix-based metric into a loss function, \textit{AnyLoss}, that is available in optimization processes. To this end, we use an approximation function to make a confusion matrix represented in a differentiable form, and this approach enables any confusion matrix-based metric to be directly used as a loss function. The mechanism of the approximation function is provided to ensure its operability and the differentiability of our loss functions is proved by suggesting their derivatives. We conduct extensive experiments under diverse neural networks with many datasets, and we demonstrate their general availability to target any confusion matrix-based metrics. Our method, especially, shows outstanding achievements in dealing with imbalanced datasets, and its competitive learning speed, compared to multiple baseline models, underscores its efficiency. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14362 [pdf, other]

Advancing Spiking Neural Networks for Sequential Modeling with Central Pattern Generators

Authors: Changze Lv, Dongqi Han, Yansen Wang, Xiaoqing Zheng, Xuanjing Huang, Dongsheng Li

Abstract: Spiking neural networks (SNNs) represent a promising approach to developing artificial neural networks that are both energy-efficient and biologically plausible. However, applying SNNs to sequential tasks, such as text classification and time-series forecasting, has been hindered by the challenge of creating an effective and hardware-friendly spike-form positional encoding (PE) strategy. Drawing i… ▽ More Spiking neural networks (SNNs) represent a promising approach to developing artificial neural networks that are both energy-efficient and biologically plausible. However, applying SNNs to sequential tasks, such as text classification and time-series forecasting, has been hindered by the challenge of creating an effective and hardware-friendly spike-form positional encoding (PE) strategy. Drawing inspiration from the central pattern generators (CPGs) in the human brain, which produce rhythmic patterned outputs without requiring rhythmic inputs, we propose a novel PE technique for SNNs, termed CPG-PE. We demonstrate that the commonly used sinusoidal PE is mathematically a specific solution to the membrane potential dynamics of a particular CPG. Moreover, extensive experiments across various domains, including time-series forecasting, natural language processing, and image classification, show that SNNs with CPG-PE outperform their conventional counterparts. Additionally, we perform analysis experiments to elucidate the mechanism through which SNNs encode positional information and to explore the function of CPGs in the human brain. This investigation may offer valuable insights into the fundamental principles of neural computation. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.13549 [pdf, other]

Multi-Objective Optimization-Based Waveform Design for Multi-User and Multi-Target MIMO-ISAC Systems

Authors: Peng Wang, Dongsheng Han, Yashuai Cao, Wanli Ni, Dusit Niyato

Abstract: Integrated sensing and communication (ISAC) opens up new service possibilities for sixth-generation (6G) systems, where both communication and sensing (C&S) functionalities co-exist by sharing the same hardware platform and radio resource. In this paper, we investigate the waveform design problem in a downlink multi-user and multi-target ISAC system under different C&S performance preferences. The… ▽ More Integrated sensing and communication (ISAC) opens up new service possibilities for sixth-generation (6G) systems, where both communication and sensing (C&S) functionalities co-exist by sharing the same hardware platform and radio resource. In this paper, we investigate the waveform design problem in a downlink multi-user and multi-target ISAC system under different C&S performance preferences. The multi-user interference (MUI) may critically degrade the communication performance. To eliminate the MUI, we employ the constructive interference mechanism into the ISAC system, which saves the power budget for communication. However, due to the conflict between C&S metrics, it is intractable for the ISAC system to achieve the optimal performance of C&S objective simultaneously. Therefore, it is important to strike a tradeoff between C&S objectives. By virtue of the multi-objective optimization theory, we propose a weighted Tchebycheff-based transformation method to re-frame the C&S trade-off problem as a Pareto-optimal problem, thus effectively tackling the constraints in ISAC systems. Finally, simulation results reveal the trade-off relation between C&S performances, which provides insights for the flexible waveform design under different C&S performance preferences in MIMO-ISAC systems. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: 13 pages, submitted to IEEE TWC

arXiv:2405.05947 [pdf, other]

A Survey on Visualization Approaches in Political Science for Social and Political Factors: Progress to Date and Future Opportunities

Authors: Dongyun Han, Abdullah-Al-Raihan Nayeem, Jason Windett, Yaoyao Dai, Benjamin Radford, Isaac Cho

Abstract: Politics is the set of activities related to strategic decision-making in groups. Political scientists study the strategic interactions between states, institutions, politicians, and citizens; they seek to understand the causes and consequences of those decisions and interactions. While some decisions might alleviate social problems, others might lead to disasters such as war and conflict. Data vi… ▽ More Politics is the set of activities related to strategic decision-making in groups. Political scientists study the strategic interactions between states, institutions, politicians, and citizens; they seek to understand the causes and consequences of those decisions and interactions. While some decisions might alleviate social problems, others might lead to disasters such as war and conflict. Data visualization approaches have the potential to assist political scientists in their studies by providing visual contexts. However, political researchers' perspectives on data visualization are unclear. This paper examines political scientists' perspectives on visualization and how they apply data visualization in their research. We discovered a growing trend in the use of graphs in political science journals. However, we also found a knowledge gap between the political science and visualization domains, such as effective visualization techniques for tasks and the use of color studied by visualization researchers. To reduce this gap, we survey visualization techniques applicable to the political scientists' research and report the visual analytics systems implemented for and evaluated by political scientists. At the end of this paper, we present an outline of future opportunities, including research topics and methodologies, for multidisciplinary research in political science and data analytics. Through this paper, we expect visualization researchers to get a better grasp of the political science domain, as well as broaden the possibility of future visualization approaches from a multidisciplinary perspective. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.04091 [pdf, ps, other]

Randomized iterative methods for generalized absolute value equations: Solvability and error bounds

Authors: Jiaxin Xie, Houduo Qi, Deren Han

Abstract: Randomized iterative methods, such as the Kaczmarz method and its variants, have gained growing attention due to their simplicity and efficiency in solving large-scale linear systems. Meanwhile, absolute value equations (AVE) have attracted increasing interest due to their connection with the linear complementarity problem. In this paper, we investigate the application of randomized iterative meth… ▽ More Randomized iterative methods, such as the Kaczmarz method and its variants, have gained growing attention due to their simplicity and efficiency in solving large-scale linear systems. Meanwhile, absolute value equations (AVE) have attracted increasing interest due to their connection with the linear complementarity problem. In this paper, we investigate the application of randomized iterative methods to generalized AVE (GAVE). Our approach differs from most existing works in that we tackle GAVE with non-square coefficient matrices. We establish more comprehensive sufficient and necessary conditions for characterizing the solvability of GAVE and propose precise error bound conditions. Furthermore, we introduce a flexible and efficient randomized iterative algorithmic framework for solving GAVE, which employs sampling matrices drawn from user-specified distributions. This framework is capable of encompassing many well-known methods, including the Picard iteration method and the randomized Kaczmarz method. Leveraging our findings on solvability and error bounds, we establish both almost sure convergence and linear convergence rates for this versatile algorithmic framework. Finally, we present numerical examples to illustrate the advantages of the new algorithms. △ Less

Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.02760 [pdf, other]

GTFS2STN: Analyzing GTFS Transit Data by Generating Spatiotemporal Transit Network

Authors: Diyi Liu, Jing Guo, Yangsong Gu, Meredith King, Lee D. Han, Candace Brakewood

Abstract: GTFS, the General Transit Feed Specialization, is an open standard format to record transit information used by thousands of transit agencies across the world. By converting a static GTFS transit network to a spatiotemporal network connecting bus stops over space and time, a preliminary tool named GTFS2STN is implemented to analyze the accessibility of the transit system. Furthermore, a simple app… ▽ More GTFS, the General Transit Feed Specialization, is an open standard format to record transit information used by thousands of transit agencies across the world. By converting a static GTFS transit network to a spatiotemporal network connecting bus stops over space and time, a preliminary tool named GTFS2STN is implemented to analyze the accessibility of the transit system. Furthermore, a simple application is built for users to generate spatiotemporal network online. The online tool also supports some basic analysis including generate isochrone maps given origin, generate travel time variability over time given a pair of origin and destination, etc. Results show that the tool has a similar result compared with Mapnificent, another open source endeavour to generate isochrone maps given GTFS inputs. Compared with Mapnificent, the proposed GTFS2STN tool is suited for research and evaluation purposes because the users can upload any historical GTFS dataset by any transit agencies to evaluate the accessibility and travel time variability of transit networks over time. △ Less

Submitted 4 May, 2024; originally announced May 2024.

Comments: 8 pages, 8 figures

arXiv:2405.01113 [pdf, other]

Domain-Transferred Synthetic Data Generation for Improving Monocular Depth Estimation

Authors: Seungyeop Lee, Knut Peterson, Solmaz Arezoomandan, Bill Cai, Peihan Li, Lifeng Zhou, David Han

Abstract: A major obstacle to the development of effective monocular depth estimation algorithms is the difficulty in obtaining high-quality depth data that corresponds to collected RGB images. Collecting this data is time-consuming and costly, and even data collected by modern sensors has limited range or resolution, and is subject to inconsistencies and noise. To combat this, we propose a method of data g… ▽ More A major obstacle to the development of effective monocular depth estimation algorithms is the difficulty in obtaining high-quality depth data that corresponds to collected RGB images. Collecting this data is time-consuming and costly, and even data collected by modern sensors has limited range or resolution, and is subject to inconsistencies and noise. To combat this, we propose a method of data generation in simulation using 3D synthetic environments and CycleGAN domain transfer. We compare this method of data generation to the popular NYUDepth V2 dataset by training a depth estimation model based on the DenseDepth structure using different training sets of real and simulated data. We evaluate the performance of the models on newly collected images and LiDAR depth data from a Husky robot to verify the generalizability of the approach and show that GAN-transformed data can serve as an effective alternative to real-world data, particularly in depth estimation. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.18560 [pdf, other]

Non-convex Pose Graph Optimization in SLAM via Proximal Linearized Riemannian ADMM

Authors: Xin Chen, Chunfeng Cui, Deren Han, Liqun Qi

Abstract: Pose graph optimization (PGO) is a well-known technique for solving the pose-based simultaneous localization and mapping (SLAM) problem. In this paper, we represent the rotation and translation by a unit quaternion and a three-dimensional vector, and propose a new PGO model based on the von Mises-Fisher distribution. The constraints derived from the unit quaternions are spherical manifolds, and th… ▽ More Pose graph optimization (PGO) is a well-known technique for solving the pose-based simultaneous localization and mapping (SLAM) problem. In this paper, we represent the rotation and translation by a unit quaternion and a three-dimensional vector, and propose a new PGO model based on the von Mises-Fisher distribution. The constraints derived from the unit quaternions are spherical manifolds, and the projection onto the constraints can be calculated by normalization. Then a proximal linearized Riemannian alternating direction method of multipliers (PieADMM) is developed to solve the proposed model, which not only has low memory requirements, but also can update the poses in parallel. Furthermore, we establish the iteration complexity of $O(1/ε^{2})$ of PieADMM for finding an $ε$-stationary solution of our model. The efficiency of our proposed algorithm is demonstrated by numerical experiments on two synthetic and four 3D SLAM benchmark datasets. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.17955 [pdf, other]

A Survey of Third-Party Library Security Research in Application Software

Authors: Jia Zeng, Dan Han, Yaling Zhu, Yangzhong Wang, Fangchen Weng

Abstract: In the current software development environment, third-party libraries play a crucial role. They provide developers with rich functionality and convenient solutions, speeding up the pace and efficiency of software development. However, with the widespread use of third-party libraries, associated security risks and potential vulnerabilities are increasingly apparent. Malicious attackers can exploit… ▽ More In the current software development environment, third-party libraries play a crucial role. They provide developers with rich functionality and convenient solutions, speeding up the pace and efficiency of software development. However, with the widespread use of third-party libraries, associated security risks and potential vulnerabilities are increasingly apparent. Malicious attackers can exploit these vulnerabilities to infiltrate systems, execute unauthorized operations, or steal sensitive information, posing a severe threat to software security. Research on third-party libraries in software becomes paramount to address this growing security challenge. Numerous research findings exist regarding third-party libraries' usage, ecosystem, detection, and fortification defenses. Understanding the usage and ecosystem of third-party libraries helps developers comprehend the potential risks they bring and select trustworthy libraries. Third-party library detection tools aid developers in automatically discovering third-party libraries in software, facilitating their management. In addition to detection, fortification defenses are also indispensable. This article profoundly investigates and analyzes this literature, summarizing current research achievements and future development directions. It aims to provide practical and valuable insights for developers and researchers, jointly promoting the healthy development of software ecosystems and better-protecting software from security threats. △ Less

Submitted 27 April, 2024; originally announced April 2024.

Comments: 21 pages, 3 figures, one table

arXiv:2404.17507 [pdf, other]

HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts

Authors: Wonjae Kim, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, Sangdoo Yun

Abstract: In an era where the volume of data drives the effectiveness of self-supervised learning, the specificity and clarity of data semantics play a crucial role in model training. Addressing this, we introduce HYPerbolic Entailment filtering (HYPE), a novel methodology designed to meticulously extract modality-wise meaningful and well-aligned data from extensive, noisy image-text pair datasets. Our appr… ▽ More In an era where the volume of data drives the effectiveness of self-supervised learning, the specificity and clarity of data semantics play a crucial role in model training. Addressing this, we introduce HYPerbolic Entailment filtering (HYPE), a novel methodology designed to meticulously extract modality-wise meaningful and well-aligned data from extensive, noisy image-text pair datasets. Our approach leverages hyperbolic embeddings and the concept of entailment cones to evaluate and filter out samples with meaningless or underspecified semantics, focusing on enhancing the specificity of each data sample. HYPE not only demonstrates a significant improvement in filtering efficiency but also sets a new state-of-the-art in the DataComp benchmark when combined with existing filtering techniques. This breakthrough showcases the potential of HYPE to refine the data selection process, thereby contributing to the development of more accurate and efficient self-supervised learning models. Additionally, the image specificity $ε_{i}$ can be independently applied to induce an image-only dataset from an image-text or image-only data pool for training image-only self-supervised models and showed superior performance when compared to the dataset induced by CLIP score. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Comments: 28pages, 4.5MB

arXiv:2404.17281 [pdf]

Topological polarization singularities induced by the non-Hermitian Dirac points

Authors: Jun Wang, Jie Liu, Peng Hu, Qiao Jiang, Dezhuan Han

Abstract: A Dirac point in the Hermitian photonic system will split into a pair of exceptional points (EPs) or even spawn a ring of EPs if non-Hermiticity is involved. Here, we present a new type of non-Hermitian Dirac point which is situated in the complex plane of eigenfrequency. When there is differential loss, the Dirac point exhibits a dual behavior: it not only splits into a pair of EPs with opposite… ▽ More A Dirac point in the Hermitian photonic system will split into a pair of exceptional points (EPs) or even spawn a ring of EPs if non-Hermiticity is involved. Here, we present a new type of non-Hermitian Dirac point which is situated in the complex plane of eigenfrequency. When there is differential loss, the Dirac point exhibits a dual behavior: it not only splits into a pair of EPs with opposite chirality in the band structure but also induces a pair of circularly polarized states (C points) with opposite handedness in the far-field radiation. Furthermore, breaking the corresponding mirror symmetries enables independent control of these Dirac-point induced C points, facilitating the merging of two C points and generation of unidirectional guided resonances. Our results demonstrate an explicit relation between the band singularities and polarization singularities, and provide a new mechanism to generate unidirectional emission, which can be useful in the band engineering and polarization manipulation. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16659 [pdf, other]

ProbGate at EHRSQL 2024: Enhancing SQL Query Generation Accuracy through Probabilistic Threshold Filtering and Error Handling

Authors: Sangryul Kim, Donghee Han, Sehyun Kim

Abstract: Recently, deep learning-based language models have significantly enhanced text-to-SQL tasks, with promising applications in retrieving patient records within the medical domain. One notable challenge in such applications is discerning unanswerable queries. Through fine-tuning model, we demonstrate the feasibility of converting medical record inquiries into SQL queries. Additionally, we introduce a… ▽ More Recently, deep learning-based language models have significantly enhanced text-to-SQL tasks, with promising applications in retrieving patient records within the medical domain. One notable challenge in such applications is discerning unanswerable queries. Through fine-tuning model, we demonstrate the feasibility of converting medical record inquiries into SQL queries. Additionally, we introduce an entropy-based method to identify and filter out unanswerable results. We further enhance result quality by filtering low-confidence SQL through log probability-based distribution, while grammatical and schema errors are mitigated by executing queries on the actual database. We experimentally verified that our method can filter unanswerable questions, which can be widely utilized even when the parameters of the model are not accessible, and that it can be effectively utilized in practice. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: The 6th Clinical Natural Language Processing Workshop at NAACL 2024. Code is available at https://github.com/venzino-han/probgate_ehrsql

arXiv:2404.16425 [pdf, other]

Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X. -F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J. -W. Hu, A. Li, C. -K. Li, J. -D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja , et al. (170 additional authors not shown)

Abstract: Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a,… ▽ More Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field-of-view ($\sim$3600 deg$^2$) and a high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through a continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential for EP to study the early universe via GRBs. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: 41 pages, 8 figures, 7 tables

arXiv:2404.16263 [pdf, other]

New Timing Results of MSPs from NICER Observations

Authors: Shijie Zheng, Dawei Han, Heng Xu, Kejia Lee, Jianping Yuan, Haoxi Wang, Mingyu Ge, Liang Zhang, Yongye Li, Yitao Yin, Xiang Ma, Yong Chen, Shuangnan Zhang

Abstract: Millisecond pulsars (MSPs) are known for their long-term stability. Using six years of observations from the Neutron Star Interior Composition Explorer (NICER), we have conducted an in-depth analysis of the X-ray timing results for six MSPs: PSRs B1937+21, B1821$-$24, J0437$-$4715, J0030+0451, J0218+4232, and J2124$-$3358. The timing stability parameter $σ_z$ has been calculated, revealing remarka… ▽ More Millisecond pulsars (MSPs) are known for their long-term stability. Using six years of observations from the Neutron Star Interior Composition Explorer (NICER), we have conducted an in-depth analysis of the X-ray timing results for six MSPs: PSRs B1937+21, B1821$-$24, J0437$-$4715, J0030+0451, J0218+4232, and J2124$-$3358. The timing stability parameter $σ_z$ has been calculated, revealing remarkable timing precision on the order of $10^{-14}$ for PSRs B1937+21 and J0437$-$4715, and $10^{-13}$ for PSRs B1821$-$24, J0218+4232, and J0030+0451 over a timescale of 1000 days. These findings underscore the feasibility of autonomous in-orbit timekeeping using X-ray observations of MSPs. In addition, the consistency of long-term spin-down noise in the X-ray and radio bands has been investigated by comparison with IPTA radio data. △ Less

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.14285 [pdf, other]

LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots

Authors: Dongge Han, Trevor McInroe, Adam Jelley, Stefano V. Albrecht, Peter Bell, Amos Storkey

Abstract: Large language models (LLMs) have shown significant potential for robotics applications, particularly task planning, by harnessing their language comprehension and text generation capabilities. However, in applications such as household robotics, a critical gap remains in the personalization of these models to individual user preferences. We introduce LLM-Personalize, a novel framework with an opt… ▽ More Large language models (LLMs) have shown significant potential for robotics applications, particularly task planning, by harnessing their language comprehension and text generation capabilities. However, in applications such as household robotics, a critical gap remains in the personalization of these models to individual user preferences. We introduce LLM-Personalize, a novel framework with an optimization pipeline designed to personalize LLM planners for household robotics. Our LLM-Personalize framework features an LLM planner that performs iterative planning in multi-room, partially-observable household scenarios, making use of a scene graph constructed with local observations. The generated plan consists of a sequence of high-level actions which are subsequently executed by a controller. Central to our approach is the optimization pipeline, which combines imitation learning and iterative self-training to personalize the LLM planner. In particular, the imitation learning phase performs initial LLM alignment from demonstrations, and bootstraps the model to facilitate effective iterative self-training, which further explores and aligns the model to user preferences. We evaluate LLM-Personalize on Housekeep, a challenging simulated real-world 3D benchmark for household rearrangements, and show that LLM-Personalize achieves more than a 30 percent increase in success rate over existing LLM planners, showcasing significantly improved alignment with human preferences. Project page: https://donggehan.github.io/projectllmpersonalize/. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.11822 [pdf, ps, other]

A class of maximum-based iteration methods for the generalized absolute value equation

Authors: Shiliang Wu, Deren Han, Cuixia Li

Abstract: In this paper, by using $|x|=2\max\{0,x\}-x$, a class of maximum-based iteration methods is established to solve the generalized absolute value equation $Ax-B|x|=b$. Some convergence conditions of the proposed method are presented. By some numerical experiments, the effectiveness and feasibility of the proposed method are confirmed. In this paper, by using $|x|=2\max\{0,x\}-x$, a class of maximum-based iteration methods is established to solve the generalized absolute value equation $Ax-B|x|=b$. Some convergence conditions of the proposed method are presented. By some numerical experiments, the effectiveness and feasibility of the proposed method are confirmed. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09490 [pdf, other]

Leveraging Temporal Contextualization for Video Action Recognition

Authors: Minji Kim, Dongyoon Han, Taekyung Kim, Bohyung Han

Abstract: Pretrained vision-language models have shown effectiveness in video understanding. However, recent studies have not sufficiently leveraged essential temporal information from videos, simply averaging frame-wise representations or referencing consecutive frames. We introduce Temporally Contextualized CLIP (TC-CLIP), a pioneering framework for video understanding that effectively and efficiently lev… ▽ More Pretrained vision-language models have shown effectiveness in video understanding. However, recent studies have not sufficiently leveraged essential temporal information from videos, simply averaging frame-wise representations or referencing consecutive frames. We introduce Temporally Contextualized CLIP (TC-CLIP), a pioneering framework for video understanding that effectively and efficiently leverages comprehensive video information. We propose Temporal Contextualization (TC), a novel layer-wise temporal information infusion mechanism for video that extracts core information from each frame, interconnects relevant information across the video to summarize into context tokens, and ultimately leverages the context tokens during the feature encoding process. Furthermore, our Video-conditional Prompting (VP) module manufactures context tokens to generate informative prompts in text modality. We conduct extensive experiments in zero-shot, few-shot, base-to-novel, and fully-supervised action recognition to validate the superiority of our TC-CLIP. Ablation studies for TC and VP guarantee our design choices. Code is available at https://github.com/naver-ai/tc-clip △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: 24 pages, 10 figures, 12 tables

arXiv:2404.09460 [pdf, other]

Optimal Real-time Bidding Strategy For EV Aggregators in Wholesale Electricity Markets

Authors: Shihan Huang, Dongkun Han, John Zhen Fu Pang, Yue Chen

Abstract: With the rapid growth of electric vehicles (EVs), EV aggregators have been playing a increasingly vital role in power systems by not merely providing charging management but also participating in wholesale electricity markets. This work studies the optimal real-time bidding strategy for an EV aggregator. Since the charging process of EVs is time-coupled, it is necessary for EV aggregators to consi… ▽ More With the rapid growth of electric vehicles (EVs), EV aggregators have been playing a increasingly vital role in power systems by not merely providing charging management but also participating in wholesale electricity markets. This work studies the optimal real-time bidding strategy for an EV aggregator. Since the charging process of EVs is time-coupled, it is necessary for EV aggregators to consider future operational conditions (e.g., future EV arrivals) when deciding the current bidding strategy. However, accurately forecasting future operational conditions is challenging under the inherent uncertainties. Hence, there demands a real-time bidding strategy based solely on the up-to-date information, which is the main goal of this work. We start by developing an online optimal EV charging management algorithm for the EV aggregator via Lyapunov optimization. Based on this, an optimal real-time bidding strategy (bidding cost curve and bounds) for the aggregator is derived. Then, an efficient yet practical algorithm is proposed to obtain the bidding strategy. It shows that with the proposed bidding strategy, the aggregator's profit is nearly offline optimal. Moreover, the wholesale electricity market clearing result aligns with the individual aggregator's optimal charging strategy given the prices. Case studies against several benchmarks are conducted to evaluate the performance of the proposed method. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: 13 pages, 6 figures

arXiv:2404.08317 [pdf, other]

Technical Design Report of the Spin Physics Detector at NICA

Authors: The SPD Collaboration, V. Abazov, V. Abramov, L. Afanasyev, R. Akhunzyanov, A. Akindinov, I. Alekseev, A. Aleshko, V. Alexakhin, G. Alexeev, L. Alimov, A. Allakhverdieva, A. Amoroso, V. Andreev, V. Andreev, E. Andronov, Yu. Anikin, S. Anischenko, A. Anisenkov, V. Anosov, E. Antokhin, A. Antonov, S. Antsupov, A. Anufriev, K. Asadova , et al. (392 additional authors not shown)

Abstract: The Spin Physics Detector collaboration proposes to install a universal detector in the second interaction point of the NICA collider under construction (JINR, Dubna) to study the spin structure of the proton and deuteron and other spin-related phenomena using a unique possibility to operate with polarized proton and deuteron beams at a collision energy up to 27 GeV and a luminosity up to… ▽ More The Spin Physics Detector collaboration proposes to install a universal detector in the second interaction point of the NICA collider under construction (JINR, Dubna) to study the spin structure of the proton and deuteron and other spin-related phenomena using a unique possibility to operate with polarized proton and deuteron beams at a collision energy up to 27 GeV and a luminosity up to $10^{32}$ cm$^{-2}$ s$^{-1}$. As the main goal, the experiment aims to provide access to the gluon TMD PDFs in the proton and deuteron, as well as the gluon transversity distribution and tensor PDFs in the deuteron, via the measurement of specific single and double spin asymmetries using different complementary probes such as charmonia, open charm, and prompt photon production processes. Other polarized and unpolarized physics is possible, especially at the first stage of NICA operation with reduced luminosity and collision energy of the proton and ion beams. This document is dedicated exclusively to technical issues of the SPD setup construction. △ Less

Submitted 28 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.08003 [pdf, other]

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

Authors: Guangchen Lan, Dong-Jun Han, Abolfazl Hashemi, Vaneet Aggarwal, Christopher G. Brinton

Abstract: To improve the efficiency of reinforcement learning, we propose a novel asynchronous federated reinforcement learning framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To handle the challenge of lagged policies in asynchronous settings, we design delay-adaptive lookahead and normalized update techniques that can effe… ▽ More To improve the efficiency of reinforcement learning, we propose a novel asynchronous federated reinforcement learning framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To handle the challenge of lagged policies in asynchronous settings, we design delay-adaptive lookahead and normalized update techniques that can effectively handle the heterogeneous arrival times of policy gradients. We analyze the theoretical global convergence bound of AFedPG, and characterize the advantage of the proposed algorithm in terms of both the sample complexity and time complexity. Specifically, our AFedPG method achieves $\mathcal{O}(\frac{ε^{-2.5}}{N})$ sample complexity at each agent on average. Compared to the single agent setting with $\mathcal{O}(ε^{-2.5})$ sample complexity, it enjoys a linear speedup with respect to the number of agents. Moreover, compared to synchronous FedPG, AFedPG improves the time complexity from $\mathcal{O}(\frac{t_{\max}}{N})$ to $\mathcal{O}(\frac{1}{\sum_{i=1}^{N} \frac{1}{t_{i}}})$, where $t_{i}$ denotes the time consumption in each iteration at the agent $i$, and $t_{\max}$ is the largest one. The latter complexity $\mathcal{O}(\frac{1}{\sum_{i=1}^{N} \frac{1}{t_{i}}})$ is always smaller than the former one, and this improvement becomes significant in large-scale federated settings with heterogeneous computing powers ($t_{\max}\gg t_{\min}$). Finally, we empirically verify the improved performances of AFedPG in three MuJoCo environments with varying numbers of agents. We also demonstrate the improvements with different computing heterogeneity. △ Less

Submitted 14 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

ACM Class: I.2.6; I.2.11

arXiv:2404.04792 [pdf, other]

doi 10.1145/3649329.3656259

GDR-HGNN: A Heterogeneous Graph Neural Networks Accelerator Frontend with Graph Decoupling and Recoupling

Authors: Runzhen Xue, Mingyu Yan, Dengke Han, Yihan Teng, Zhimin Tang, Xiaochun Ye, Dongrui Fan

Abstract: Heterogeneous Graph Neural Networks (HGNNs) have broadened the applicability of graph representation learning to heterogeneous graphs. However, the irregular memory access pattern of HGNNs leads to the buffer thrashing issue in HGNN accelerators. In this work, we identify an opportunity to address buffer thrashing in HGNN acceleration through an analysis of the topology of heterogeneous graphs. To… ▽ More Heterogeneous Graph Neural Networks (HGNNs) have broadened the applicability of graph representation learning to heterogeneous graphs. However, the irregular memory access pattern of HGNNs leads to the buffer thrashing issue in HGNN accelerators. In this work, we identify an opportunity to address buffer thrashing in HGNN acceleration through an analysis of the topology of heterogeneous graphs. To harvest this opportunity, we propose a graph restructuring method and map it into a hardware frontend named GDR-HGNN. GDR-HGNN dynamically restructures the graph on the fly to enhance data locality for HGNN accelerators. Experimental results demonstrate that, with the assistance of GDR-HGNN, a leading HGNN accelerator achieves an average speedup of 14.6 times and 1.78 times compared to the state-of-the-art software framework running on A100 GPU and itself, respectively. △ Less

Submitted 6 April, 2024; originally announced April 2024.

Comments: 6 pages, 10 figures, accepted by DAC'61

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs. △ Less

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01745 [pdf, other]

Unleash the Potential of CLIP for Video Highlight Detection

Authors: Donghoon Han, Seunghyeon Seo, Eunhwan Park, Seong-Uk Nam, Nojun Kwak

Abstract: Multimodal and large language models (LLMs) have revolutionized the utilization of open-world knowledge, unlocking novel potentials across various tasks and applications. Among these domains, the video domain has notably benefited from their capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel in the video highlight detection task by leveraging the pre-train… ▽ More Multimodal and large language models (LLMs) have revolutionized the utilization of open-world knowledge, unlocking novel potentials across various tasks and applications. Among these domains, the video domain has notably benefited from their capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel in the video highlight detection task by leveraging the pre-trained knowledge embedded in multimodal models. By simply fine-tuning the multimodal encoder in combination with our innovative saliency pooling technique, we have achieved the state-of-the-art performance in the highlight detection task, the QVHighlight Benchmark, to the best of our knowledge. △ Less

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01604 [pdf, other]

WaveDH: Wavelet Sub-bands Guided ConvNet for Efficient Image Dehazing

Authors: Seongmin Hwang, Daeyoung Han, Cheolkon Jung, Moongu Jeon

Abstract: The surge in interest regarding image dehazing has led to notable advancements in deep learning-based single image dehazing approaches, exhibiting impressive performance in recent studies. Despite these strides, many existing methods fall short in meeting the efficiency demands of practical applications. In this paper, we introduce WaveDH, a novel and compact ConvNet designed to address this effic… ▽ More The surge in interest regarding image dehazing has led to notable advancements in deep learning-based single image dehazing approaches, exhibiting impressive performance in recent studies. Despite these strides, many existing methods fall short in meeting the efficiency demands of practical applications. In this paper, we introduce WaveDH, a novel and compact ConvNet designed to address this efficiency gap in image dehazing. Our WaveDH leverages wavelet sub-bands for guided up-and-downsampling and frequency-aware feature refinement. The key idea lies in utilizing wavelet decomposition to extract low-and-high frequency components from feature levels, allowing for faster processing while upholding high-quality reconstruction. The downsampling block employs a novel squeeze-and-attention scheme to optimize the feature downsampling process in a structurally compact manner through wavelet domain learning, preserving discriminative features while discarding noise components. In our upsampling block, we introduce a dual-upsample and fusion mechanism to enhance high-frequency component awareness, aiding in the reconstruction of high-frequency details. Departing from conventional dehazing methods that treat low-and-high frequency components equally, our feature refinement block strategically processes features with a frequency-aware approach. By employing a coarse-to-fine methodology, it not only refines the details at frequency levels but also significantly optimizes computational costs. The refinement is performed in a maximum 8x downsampled feature space, striking a favorable efficiency-vs-accuracy trade-off. Extensive experiments demonstrate that our method, WaveDH, outperforms many state-of-the-art methods on several image dehazing benchmarks with significantly reduced computational costs. Our code is available at https://github.com/AwesomeHwang/WaveDH. △ Less

Submitted 1 April, 2024; originally announced April 2024.

Comments: Submitted to TMM

MSC Class: 68T07 ACM Class: I.4.4; I.4.9

arXiv:2403.19588 [pdf, other]

DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs

Authors: Donghyun Kim, Byeongho Heo, Dongyoon Han

Abstract: This paper revives Densely Connected Convolutional Networks (DenseNets) and reveals the underrated effectiveness over predominant ResNet-style architectures. We believe DenseNets' potential was overlooked due to untouched training methods and traditional design elements not fully revealing their capabilities. Our pilot study shows dense connections through concatenation are strong, demonstrating t… ▽ More This paper revives Densely Connected Convolutional Networks (DenseNets) and reveals the underrated effectiveness over predominant ResNet-style architectures. We believe DenseNets' potential was overlooked due to untouched training methods and traditional design elements not fully revealing their capabilities. Our pilot study shows dense connections through concatenation are strong, demonstrating that DenseNets can be revitalized to compete with modern architectures. We methodically refine suboptimal components - architectural adjustments, block redesign, and improved training recipes towards widening DenseNets and boosting memory efficiency while keeping concatenation shortcuts. Our models, employing simple architectural elements, ultimately surpass Swin Transformer, ConvNeXt, and DeiT-III - key architectures in the residual learning lineage. Furthermore, our models exhibit near state-of-the-art performance on ImageNet-1K, competing with the very recent models and downstream tasks, ADE20k semantic segmentation, and COCO object detection/instance segmentation. Finally, we provide empirical analyses that uncover the merits of the concatenation over additive shortcuts, steering a renewed preference towards DenseNet-style designs. Our code is available at https://github.com/naver-ai/rdnet. △ Less

Submitted 28 March, 2024; originally announced March 2024.

Comments: Code at https://github.com/naver-ai/rdnet

arXiv:2403.19522 [pdf, other]

Model Stock: All we need is just a few fine-tuned models

Authors: Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han

Abstract: This paper introduces an efficient fine-tuning method for large pre-trained models, offering strong in-distribution (ID) and out-of-distribution (OOD) performance. Breaking away from traditional practices that need a multitude of fine-tuned models for averaging, our approach employs significantly fewer models to achieve final weights yet yield superior accuracy. Drawing from key insights in the we… ▽ More This paper introduces an efficient fine-tuning method for large pre-trained models, offering strong in-distribution (ID) and out-of-distribution (OOD) performance. Breaking away from traditional practices that need a multitude of fine-tuned models for averaging, our approach employs significantly fewer models to achieve final weights yet yield superior accuracy. Drawing from key insights in the weight space of fine-tuned weights, we uncover a strong link between the performance and proximity to the center of weight space. Based on this, we introduce a method that approximates a center-close weight using only two fine-tuned models, applicable during or after training. Our innovative layer-wise weight averaging technique surpasses state-of-the-art model methods such as Model Soup, utilizing only two fine-tuned models. This strategy can be aptly coined Model Stock, highlighting its reliance on selecting a minimal number of models to draw a more optimized-averaged model. We demonstrate the efficacy of Model Stock with fine-tuned models based upon pre-trained CLIP architectures, achieving remarkable performance on both ID and OOD tasks on the standard benchmarks, all while barely bringing extra computational demands. Our code and pre-trained models are available at https://github.com/naver-ai/model-stock. △ Less

Submitted 28 March, 2024; originally announced March 2024.

Comments: Code at https://github.com/naver-ai/model-stock

arXiv:2403.19218 [pdf, other]

A piecewise neural network method for solving large interval solution to initial value problem of ordinary differential equations

Authors: Dongpeng Han, Chaolu Temuer

Abstract: Various traditional numerical methods for solving initial value problems of differential equations often produce local solutions near the initial value point, despite the problems having larger interval solutions. Even current popular neural network algorithms or deep learning methods cannot guarantee yielding large interval solutions for these problems. In this paper, we propose a piecewise neura… ▽ More Various traditional numerical methods for solving initial value problems of differential equations often produce local solutions near the initial value point, despite the problems having larger interval solutions. Even current popular neural network algorithms or deep learning methods cannot guarantee yielding large interval solutions for these problems. In this paper, we propose a piecewise neural network approach to obtain a large interval numerical solution for initial value problems of differential equations. In this method, we first divide the solution interval, on which the initial problem is to be solved, into several smaller intervals. Neural networks with a unified structure are then employed on each sub-interval to solve the related sub-problems. By assembling these neural network solutions, a piecewise expression of the large interval solution to the problem is constructed, referred to as the piecewise neural network solution. The continuous differentiability of the solution over the entire interval, except for finite points, is proven through theoretical analysis and employing a parameter transfer technique. Additionally, a parameter transfer and multiple rounds of pre-training technique are utilized to enhance the accuracy of the approximation solution. Compared with existing neural network algorithms, this method does not increase the network size and training data scale for training the network on each sub-domain. Finally, several numerical experiments are presented to demonstrate the efficiency of the proposed algorithm. △ Less

Submitted 28 March, 2024; originally announced March 2024.

Comments: 26 pages,13 figures

arXiv:2403.15692 [pdf, other]

Block Orthogonal Sparse Superposition Codes for $ \sf{L}^3 $ Communications: Low Error Rate, Low Latency, and Low Power Consumption

Authors: Donghwa Han, Bowhyung Lee, Min Jang, Donghun Lee, Seho Myung, Namyoon Lee

Abstract: Block orthogonal sparse superposition (BOSS) code is a class of joint coded modulation methods, which can closely achieve the finite-blocklength capacity with a low-complexity decoder at a few coding rates under Gaussian channels. However, for fading channels, the code performance degrades considerably because coded symbols experience different channel fading effects. In this paper, we put forth n… ▽ More Block orthogonal sparse superposition (BOSS) code is a class of joint coded modulation methods, which can closely achieve the finite-blocklength capacity with a low-complexity decoder at a few coding rates under Gaussian channels. However, for fading channels, the code performance degrades considerably because coded symbols experience different channel fading effects. In this paper, we put forth novel joint demodulation and decoding methods for BOSS codes under fading channels. For a fast fading channel, we present a minimum mean square error approximate maximum a posteriori (MMSE-A-MAP) algorithm for the joint demodulation and decoding when channel state information is available at the receiver (CSIR). We also propose a joint demodulation and decoding method without using CSIR for a block fading channel scenario. We refer to this as the non-coherent sphere decoding (NSD) algorithm. Simulation results demonstrate that BOSS codes with MMSE-A-MAP decoding outperform CRC-aided polar codes, while NSD decoding achieves comparable performance to quasi-maximum likelihood decoding with significantly reduced complexity. Both decoding algorithms are suitable for parallelization, satisfying low-latency constraints. Additionally, real-time simulations on a software-defined radio testbed validate the feasibility of using BOSS codes for low-power transmission. △ Less

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.13977 [pdf, other]

Spectral Analysis of Lattice Schrödinger-Type Operators Associated with the Nonstationary Anderson Model and Intermittency

Authors: Dan Han, Stanislav Molchanov, Boris Vainberg

Abstract: The research explores a high irregularity, commonly referred to as intermittency, of the solution to the non-stationary parabolic Anderson problem: \begin{equation*} \frac{\partial u}{\partial t} = \varkappa \mathcal{L}u(t,x) + ξ_{t}(x)u(t,x) \end{equation*} with the initial condition $u(0,x) \equiv 1$, where $(t,x) \in [0,\infty)\times \mathbb{Z}^d$. Here, $\varkappa \mathcal{L}$ denotes… ▽ More The research explores a high irregularity, commonly referred to as intermittency, of the solution to the non-stationary parabolic Anderson problem: \begin{equation*} \frac{\partial u}{\partial t} = \varkappa \mathcal{L}u(t,x) + ξ_{t}(x)u(t,x) \end{equation*} with the initial condition $u(0,x) \equiv 1$, where $(t,x) \in [0,\infty)\times \mathbb{Z}^d$. Here, $\varkappa \mathcal{L}$ denotes a non-local Laplacian, and $ξ_{t}(x)$ is a correlated white noise potential. The observed irregularity is intricately linked to the upper part of the spectrum of the multiparticle Schrödinger equations for the moment functions $m_p(t,x_1,x_2,\cdots,x_p) = \langle u(t,x_1)u(t,x_2)\cdots u(t,x_p)\rangle$. In the first half of the paper, a weak form of intermittency is expressed through moment functions of order $p\geq 3$ and established for a wide class of operators $\varkappa \mathcal{L}$ with a positive-definite correlator $B=B(x))$ of the white noise. In the second half of the paper, the strong intermittency is studied. It relates to the existence of a positive eigenvalue for the lattice Schrödinger type operator with the potential $B$. This operator is associated with the second moment $m_2$. Now $B$ is not necessarily positive-definite, but $\sum B(x)\geq 0$. △ Less

Submitted 20 March, 2024; originally announced March 2024.

MSC Class: 60H25; 60H15; 81Q10; 37H15; 35B40

arXiv:2403.13298 [pdf, other]

Rotary Position Embedding for Vision Transformer

Authors: Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun

Abstract: Rotary Position Embedding (RoPE) performs remarkably on language models, especially for length extrapolation of Transformers. However, the impacts of RoPE on computer vision domains have been underexplored, even though RoPE appears capable of enhancing Vision Transformer (ViT) performance in a way similar to the language domain. This study provides a comprehensive analysis of RoPE when applied to… ▽ More Rotary Position Embedding (RoPE) performs remarkably on language models, especially for length extrapolation of Transformers. However, the impacts of RoPE on computer vision domains have been underexplored, even though RoPE appears capable of enhancing Vision Transformer (ViT) performance in a way similar to the language domain. This study provides a comprehensive analysis of RoPE when applied to ViTs, utilizing practical implementations of RoPE for 2D vision data. The analysis reveals that RoPE demonstrates impressive extrapolation performance, i.e., maintaining precision while increasing image resolution at inference. It eventually leads to performance improvement for ImageNet-1k, COCO detection, and ADE-20k segmentation. We believe this study provides thorough guidelines to apply RoPE into ViT, promising improved backbone performance with minimal extra computational overhead. Our code and pre-trained models are available at https://github.com/naver-ai/rope-vit △ Less

Submitted 20 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures

arXiv:2403.12404 [pdf, other]

Understanding and Improving Training-free Loss-based Diffusion Guidance

Authors: Yifei Shen, Xinyang Jiang, Yezhen Wang, Yifan Yang, Dongqi Han, Dongsheng Li

Abstract: Adding additional control to pretrained diffusion models has become an increasingly popular research area, with extensive applications in computer vision, reinforcement learning, and AI for science. Recently, several studies have proposed training-free loss-based guidance by using off-the-shelf networks pretrained on clean images. This approach enables zero-shot conditional generation for universa… ▽ More Adding additional control to pretrained diffusion models has become an increasingly popular research area, with extensive applications in computer vision, reinforcement learning, and AI for science. Recently, several studies have proposed training-free loss-based guidance by using off-the-shelf networks pretrained on clean images. This approach enables zero-shot conditional generation for universal control formats, which appears to offer a free lunch in diffusion guidance. In this paper, we aim to develop a deeper understanding of training-free guidance, as well as overcome its limitations. We offer a theoretical analysis that supports training-free guidance from the perspective of optimization, distinguishing it from classifier-based (or classifier-free) guidance. To elucidate their drawbacks, we theoretically demonstrate that training-free guidance is more susceptible to adversarial gradients and exhibits slower convergence rates compared to classifier guidance. We then introduce a collection of techniques designed to overcome the limitations, accompanied by theoretical rationale and empirical evidence. Our experiments in image and motion generation confirm the efficacy of these techniques. △ Less

Submitted 29 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11557 [pdf, other]

doi 10.1109/TAC.2024.3380710

Distributed Adaptive Gradient Algorithm with Gradient Tracking for Stochastic Non-Convex Optimization

Authors: Dongyu Han, Kun Liu, Yeming Lin, Yuanqing Xia

Abstract: This paper considers a distributed stochastic non-convex optimization problem, where the nodes in a network cooperatively minimize a sum of $L$-smooth local cost functions with sparse gradients. By adaptively adjusting the stepsizes according to the historical (possibly sparse) gradients, a distributed adaptive gradient algorithm is proposed, in which a gradient tracking estimator is used to handl… ▽ More This paper considers a distributed stochastic non-convex optimization problem, where the nodes in a network cooperatively minimize a sum of $L$-smooth local cost functions with sparse gradients. By adaptively adjusting the stepsizes according to the historical (possibly sparse) gradients, a distributed adaptive gradient algorithm is proposed, in which a gradient tracking estimator is used to handle the heterogeneity between different local cost functions. We establish an upper bound on the optimality gap, which indicates that our proposed algorithm can reach a first-order stationary solution dependent on the upper bound on the variance of the stochastic gradients. Finally, numerical examples are presented to illustrate the effectiveness of the algorithm. △ Less

Submitted 29 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: 14 pages, 8 figures

Journal ref: IEEE Transactions on Automatic Control (2024)

arXiv:2403.11079 [pdf, other]

Bridging Expert Knowledge with Deep Learning Techniques for Just-In-Time Defect Prediction

Authors: Xin Zhou, DongGyun Han, David Lo

Abstract: Just-In-Time (JIT) defect prediction aims to automatically predict whether a commit is defective or not, and has been widely studied in recent years. In general, most studies can be classified into two categories: 1) simple models using traditional machine learning classifiers with hand-crafted features, and 2) complex models using deep learning techniques to automatically extract features from co… ▽ More Just-In-Time (JIT) defect prediction aims to automatically predict whether a commit is defective or not, and has been widely studied in recent years. In general, most studies can be classified into two categories: 1) simple models using traditional machine learning classifiers with hand-crafted features, and 2) complex models using deep learning techniques to automatically extract features from commit contents. Hand-crafted features used by simple models are based on expert knowledge but may not fully represent the semantic meaning of the commits. On the other hand, deep learning-based features used by complex models represent the semantic meaning of commits but may not reflect useful expert knowledge. Simple models and complex models seem complementary to each other to some extent. To utilize the advantages of both simple and complex models, we propose a model fusion framework that adopts both early fusions on the feature level and late fusions on the decision level. We propose SimCom++ by adopting the best early and late fusion strategies. The experimental results show that SimCom++ can significantly outperform the baselines by 5.7--26.9\%. In addition, our experimental results confirm that the simple model and complex model are complementary to each other. △ Less

Submitted 17 March, 2024; originally announced March 2024.

Comments: 48 pages

arXiv:2403.09675 [pdf, other]

Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases

Authors: Rio Aguina-Kang, Maxim Gumin, Do Heon Han, Stewart Morris, Seung Jean Yoo, Aditya Ganeshan, R. Kenny Jones, Qiuhong Anna Wei, Kailiang Fu, Daniel Ritchie

Abstract: We present a system for generating indoor scenes in response to text prompts. The prompts are not limited to a fixed vocabulary of scene descriptions, and the objects in generated scenes are not restricted to a fixed set of object categories -- we call this setting indoor scene generation. Unlike most prior work on indoor scene generation, our system does not require a large training dataset of ex… ▽ More We present a system for generating indoor scenes in response to text prompts. The prompts are not limited to a fixed vocabulary of scene descriptions, and the objects in generated scenes are not restricted to a fixed set of object categories -- we call this setting indoor scene generation. Unlike most prior work on indoor scene generation, our system does not require a large training dataset of existing 3D scenes. Instead, it leverages the world knowledge encoded in pre-trained large language models (LLMs) to synthesize programs in a domain-specific layout language that describe objects and spatial relations between them. Executing such a program produces a specification of a constraint satisfaction problem, which the system solves using a gradient-based optimization scheme to produce object positions and orientations. To produce object geometry, the system retrieves 3D meshes from a database. Unlike prior work which uses databases of category-annotated, mutually-aligned meshes, we develop a pipeline using vision-language models (VLMs) to retrieve meshes from massive databases of un-annotated, inconsistently-aligned meshes. Experimental evaluations show that our system outperforms generative models trained on 3D data for traditional, closed-universe scene generation tasks; it also outperforms a recent LLM-based layout generation method on open-universe scene generation. △ Less

Submitted 4 February, 2024; originally announced March 2024.

Comments: See ancillary files for link to supplemental material

arXiv:2402.15265 [pdf, other]

CloChat: Understanding How People Customize, Interact, and Experience Personas in Large Language Models

Authors: Juhye Ha, Hyeon Jeon, DaEun Han, Jinwook Seo, Changhoon Oh

Abstract: Large language models (LLMs) have facilitated significant strides in generating conversational agents, enabling seamless, contextually relevant dialogues across diverse topics. However, the existing LLM-driven conversational agents have fixed personalities and functionalities, limiting their adaptability to individual user needs. Creating personalized agent personas with distinct expertise or trai… ▽ More Large language models (LLMs) have facilitated significant strides in generating conversational agents, enabling seamless, contextually relevant dialogues across diverse topics. However, the existing LLM-driven conversational agents have fixed personalities and functionalities, limiting their adaptability to individual user needs. Creating personalized agent personas with distinct expertise or traits can address this issue. Nonetheless, we lack knowledge of how people customize and interact with agent personas. In this research, we investigated how users customize agent personas and their impact on interaction quality, diversity, and dynamics. To this end, we developed CloChat, an interface supporting easy and accurate customization of agent personas in LLMs. We conducted a study comparing how participants interact with CloChat and ChatGPT. The results indicate that participants formed emotional bonds with the customized agents, engaged in more dynamic dialogues, and showed interest in sustaining interactions. These findings contribute to design implications for future systems with conversational agents using LLMs. △ Less

Submitted 23 February, 2024; originally announced February 2024.

Comments: In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '24)

arXiv:2402.15019 [pdf, other]

Consistency-Guided Temperature Scaling Using Style and Content Information for Out-of-Domain Calibration

Authors: Wonjeong Choi, Jungwuk Park, Dong-Jun Han, Younghyun Park, Jaekyun Moon

Abstract: Research interests in the robustness of deep neural networks against domain shifts have been rapidly increasing in recent years. Most existing works, however, focus on improving the accuracy of the model, not the calibration performance which is another important requirement for trustworthy AI systems. Temperature scaling (TS), an accuracy-preserving post-hoc calibration method, has been proven to… ▽ More Research interests in the robustness of deep neural networks against domain shifts have been rapidly increasing in recent years. Most existing works, however, focus on improving the accuracy of the model, not the calibration performance which is another important requirement for trustworthy AI systems. Temperature scaling (TS), an accuracy-preserving post-hoc calibration method, has been proven to be effective in in-domain settings, but not in out-of-domain (OOD) due to the difficulty in obtaining a validation set for the unseen domain beforehand. In this paper, we propose consistency-guided temperature scaling (CTS), a new temperature scaling strategy that can significantly enhance the OOD calibration performance by providing mutual supervision among data samples in the source domains. Motivated by our observation that over-confidence stemming from inconsistent sample predictions is the main obstacle to OOD calibration, we propose to guide the scaling process by taking consistencies into account in terms of two different aspects -- style and content -- which are the key components that can well-represent data samples in multi-domain settings. Experimental results demonstrate that our proposed strategy outperforms existing works, achieving superior OOD calibration performance on various datasets. This can be accomplished by employing only the source domains without compromising accuracy, making our scheme directly applicable to various trustworthy AI systems. △ Less

Submitted 22 February, 2024; originally announced February 2024.

Comments: Accepted at AAAI-24 (The 38th AAAI Conference on Artificial Intelligence, February 2024)

arXiv:2402.13851 [pdf, other]

VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models

Authors: Jiawei Liang, Siyuan Liang, Man Luo, Aishan Liu, Dongchen Han, Ee-Chien Chang, Xiaochun Cao

Abstract: Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context. Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities. However, we uncover the potential threat posed by backdoor attacks on autoregressive VLMs during instruction tuning. Adversaries can implant a backdoor by injecting pois… ▽ More Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context. Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities. However, we uncover the potential threat posed by backdoor attacks on autoregressive VLMs during instruction tuning. Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images, enabling malicious manipulation of the victim model's predictions with predefined triggers. Nevertheless, the frozen visual encoder in autoregressive VLMs imposes constraints on the learning of conventional image triggers. Additionally, adversaries may encounter restrictions in accessing the parameters and architectures of the victim model. To address these challenges, we propose a multimodal instruction backdoor attack, namely VL-Trojan. Our approach facilitates image trigger learning through an isolating and clustering strategy and enhance black-box-attack efficacy via an iterative character-level text trigger generation method. Our attack successfully induces target outputs during inference, significantly surpassing baselines (+62.52\%) in ASR. Moreover, it demonstrates robustness across various model scales and few-shot in-context reasoning scenarios. △ Less

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.08963 [pdf, other]

DUEL: Duplicate Elimination on Active Memory for Self-Supervised Class-Imbalanced Learning

Authors: Won-Seok Choi, Hyundo Lee, Dong-Sig Han, Junseok Park, Heeyeon Koo, Byoung-Tak Zhang

Abstract: Recent machine learning algorithms have been developed using well-curated datasets, which often require substantial cost and resources. On the other hand, the direct use of raw data often leads to overfitting towards frequently occurring class information. To address class imbalances cost-efficiently, we propose an active data filtering process during self-supervised pre-training in our novel fram… ▽ More Recent machine learning algorithms have been developed using well-curated datasets, which often require substantial cost and resources. On the other hand, the direct use of raw data often leads to overfitting towards frequently occurring class information. To address class imbalances cost-efficiently, we propose an active data filtering process during self-supervised pre-training in our novel framework, Duplicate Elimination (DUEL). This framework integrates an active memory inspired by human working memory and introduces distinctiveness information, which measures the diversity of the data in the memory, to optimize both the feature extractor and the memory. The DUEL policy, which replaces the most duplicated data with new samples, aims to enhance the distinctiveness information in the memory and thereby mitigate class imbalances. We validate the effectiveness of the DUEL framework in class-imbalanced environments, demonstrating its robustness and providing reliable results in downstream tasks. We also analyze the role of the DUEL policy in the training process through various metrics and visualizations. △ Less

Submitted 14 February, 2024; originally announced February 2024.

Comments: Accepted as a full paper at AAAI 2024: The 38th Annual AAAI Conference on Artificial Intelligence (Main Tech Track). 7 pages (main paper), 2 pages (references), 11 pages (appendix) each

arXiv:2402.04406 [pdf, other]

Regularized MIP Model for Optimal Power Flow with Energy Storage Systems and its Applications

Authors: Dahye Han, Nan Jiang, Santanu S. Dey, Weijun Xie

Abstract: Incorporating energy storage systems (ESS) into power systems has been studied in many recent works, where binary variables are often introduced to model the complementary nature of battery charging and discharging. A conventional approach for these ESS optimization problems is to relax binary variables and convert the problem into a linear program. However, such linear programming relaxation mode… ▽ More Incorporating energy storage systems (ESS) into power systems has been studied in many recent works, where binary variables are often introduced to model the complementary nature of battery charging and discharging. A conventional approach for these ESS optimization problems is to relax binary variables and convert the problem into a linear program. However, such linear programming relaxation models can yield unrealistic fractional solutions, such as simultaneous charging and discharging. In this paper, we develop a regularized Mixed-Integer Programming (MIP) model for the ESS optimal power flow (OPF) problem. We prove that under mild conditions, the proposed regularized model admits a zero integrality gap with its linear programming relaxation; hence, it can be solved efficiently. By studying the properties of the regularized MIP model, we show that its optimal solution is also near-optimal to the original ESS OPF problem, thereby providing a valid and tight upper bound for the ESS OPF problem. The use of the regularized MIP model allows us to solve two intractable problems: a two-stage stochastic ESS OPF problem and a trilevel network contingency problem. △ Less

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03448 [pdf, other]

Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees

Authors: Shahryar Zehtabi, Dong-Jun Han, Rohit Parasnis, Seyyedali Hosseinalipour, Christopher G. Brinton

Abstract: Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabi… ▽ More Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabilities. In this work, we propose Decentralized Sporadic Federated Learning (DSpodFL), a DFL methodology built on a generalized notion of sporadicity in both local gradient and aggregation processes. DSpodFL subsumes many existing decentralized optimization methods under a unified algorithmic framework by modeling the per-iteration (i) occurrence of gradient descent at each client and (ii) exchange of models between client pairs as arbitrary indicator random variables, thus capturing heterogeneous and time-varying computation/communication scenarios. We analytically characterize the convergence behavior of DSpodFL for both convex and non-convex models, for both constant and diminishing learning rates, under mild assumptions on the communication graph connectivity, data heterogeneity across clients, and gradient noises, and show how our bounds recover existing results as special cases. Experiments demonstrate that DSpodFL consistently achieves improved training speeds compared with baselines under various system settings. △ Less

Submitted 31 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.02442 [pdf, other]

A Momentum Accelerated Algorithm for ReLU-based Nonlinear Matrix Decomposition

Authors: Qingsong Wang, Chunfeng Cui, Deren Han

Abstract: Recently, there has been a growing interest in the exploration of Nonlinear Matrix Decomposition (NMD) due to its close ties with neural networks. NMD aims to find a low-rank matrix from a sparse nonnegative matrix with a per-element nonlinear function. A typical choice is the Rectified Linear Unit (ReLU) activation function. To address over-fitting in the existing ReLU-based NMD model (ReLU-NMD),… ▽ More Recently, there has been a growing interest in the exploration of Nonlinear Matrix Decomposition (NMD) due to its close ties with neural networks. NMD aims to find a low-rank matrix from a sparse nonnegative matrix with a per-element nonlinear function. A typical choice is the Rectified Linear Unit (ReLU) activation function. To address over-fitting in the existing ReLU-based NMD model (ReLU-NMD), we propose a Tikhonov regularized ReLU-NMD model, referred to as ReLU-NMD-T. Subsequently, we introduce a momentum accelerated algorithm for handling the ReLU-NMD-T model. A distinctive feature, setting our work apart from most existing studies, is the incorporation of both positive and negative momentum parameters in our algorithm. Our numerical experiments on real-world datasets show the effectiveness of the proposed model and algorithm. Moreover, the code is available at https://github.com/nothing2wang/NMD-TM. △ Less

Submitted 4 February, 2024; originally announced February 2024.

Comments: 5 pages, 7 figures

Showing 1–50 of 642 results for author: Han, D