
Grain boundaries control lithiation of solid solution substrates in lithium metal batteries

Authors: Leonardo Shoji Aota, Chanwon Jung, Siyuan Zhang, Ömer K. Büyükuslu, Poonam Yadav, Mahander Pratap Singh, Xinren Chen, Eric Woods, Christina Scheu, Se-Ho Kim, Dierk Raabe, Baptiste Gault

Abstract: The development of sustainable transportation and communication systems requires an increase in both energy density and capacity retention of Li-batteries. Using substrates forming a solid solution with body-centered cubic Li enhances the cycle stability of anode-less batteries. However, it remains unclear how the substrate microstructure affects the lithiation behavior. Here, we deploy a correlative, near-atomic scale probing approach through combined ion- and electron-microscopy to examine the distribution of Li in Li-Ag diffusion couples as a model system. We reveal that Li regions with over 93.8 at.% nucleate within Ag at random high-angle grain boundaries, whereas grain interiors are not lithiated. We evidence the role of kinetics and mechanical constraint from the microstructure over equilibrium thermodynamics in dictating the lithiation process. The findings suggest that grain size and grain boundary character are critical to enhance the electrochemical performance of interlayers/electrodes, particularly for improving lithiation kinetics and hence reducing dendrite formation.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09153 [pdf]

doi 10.1038/s41467-024-49841-6

Topological Fermi-arc surface state covered by floating electrons on a two-dimensional electride

Authors: Chan-young Lim, Min-Seok Kim, Dong Cheol Lim, Sunghun Kim, Yeonghoon Lee, Jaehoon Cha, Gyubin Lee, Sang Yong Song, Dinesh Thapa, Jonathan D. Denlinger, Seong-Gon Kim, Sung Wng Kim, Jungpil Seo, Yeongkwan Kim

Abstract: Two-dimensional electrides can acquire topologically non-trivial phases due to intriguing interplay between the cationic atomic layers and anionic electron layers. However, experimental evidence of topological surface states has yet to be verified. Here, via angle-resolved photoemission spectroscopy (ARPES) and scanning tunnelling microscopy (STM), we probe the magnetic Weyl states of the ferromagnetic electride $[\mathrm{Gd}_{2}\mathrm{C}]^{2+}\cdot 2e^{-}$. In particular, the presence of Weyl cones and Fermi-arc states is demonstrated through photon energy-dependent ARPES measurements, agreeing with theoretical band structure calculations. Notably, the STM measurements reveal that the Fermi-arc states exist underneath a floating quantum electron liquid on the top Gd layer, forming double-stacked surface states in a heterostructure. Our work thus not only unveils the non-trivial topology of the $[\mathrm{Gd}_{2}\mathrm{C}]^{2+}\cdot 2e^{-}$ electride but also realizes a surface heterostructure that can host phenomena distinct from the bulk.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 22 pages, 6 figures

Journal ref: Nat. Commun. 15 (2024) 5615

arXiv:2407.09033 [pdf, other]

Textual Query-Driven Mask Transformer for Domain Generalized Segmentation

Authors: Byeonghyun Pak, Byeongju Woo, Sunghwan Kim, Dae-hwan Kim, Hoseong Kim

Abstract: In this paper, we introduce a method to tackle Domain Generalized Semantic Segmentation (DGSS) by utilizing domain-invariant semantic knowledge from text embeddings of vision-language models. We employ the text embeddings as object queries within a transformer-based segmentation framework (textual object queries). These queries are regarded as a domain-invariant basis for pixel grouping in DGSS. To leverage the power of textual object queries, we introduce a novel framework named the textual query-driven mask transformer (tqdm). Our tqdm aims to (1) generate textual object queries that maximally encode domain-invariant semantics and (2) enhance the semantic clarity of dense visual features. Additionally, we suggest three regularization losses to improve the efficacy of tqdm by aligning visual and textual features. By utilizing our method, the model can comprehend inherent semantic information for classes of interest, enabling it to generalize to extreme domains (e.g., sketch style). Our tqdm achieves 68.9 mIoU on GTA5$\rightarrow$Cityscapes, outperforming the prior state-of-the-art method by 2.5 mIoU. The project page is available at https://byeonghyunpak.github.io/tqdm.

Submitted 12 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.08892 [pdf, other]

Characterizing Prompt Compression Methods for Long Context Inference

Authors: Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Kurt Keutzer, Amir Gholami

Abstract: Long context inference presents challenges at the system level with increased compute and memory requirements, as well as from an accuracy perspective in being able to reason over long contexts. Recently, several methods have been proposed to compress the prompt to reduce the context length. However, there has been little work on comparing the different proposed methods across different tasks through a standardized analysis. This has led to conflicting results. To address this, here we perform a comprehensive characterization and evaluation of different prompt compression methods. In particular, we analyze extractive compression, summarization-based abstractive compression, and token pruning methods. Surprisingly, we find that extractive compression often outperforms all the other approaches, and enables up to 10x compression with minimal accuracy degradation. Interestingly, we also find that despite several recent claims, token pruning methods often lag behind extractive compression. We only found marginal improvements on summarization tasks.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Es-FoMo @ ICML 2024

arXiv:2407.07438 [pdf, ps, other]

Near-order relation of power means

Authors: Jinmi Hwang, Sejong Kim

Abstract: In the setting of positive definite operators we study the near-order properties of power means such as the quasi-arithmetic mean (Hölder mean) and Rényi power mean. We see the monotonicity of the spectral geometric mean and the Wasserstein mean on parameters with respect to the near-order, and the near-order relationship between the spectral geometric mean and the Wasserstein mean. Furthermore, the monotonicity of the quasi-arithmetic mean on parameters and the convergence of the Rényi power mean to the log-Euclidean mean with respect to the near-order have been established.
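
As a point of reference only, the monotonicity in the parameter that the abstract studies for operators has a familiar scalar analogue: the weighted power (Hölder) mean of positive numbers is non-decreasing in its exponent p. The sketch below illustrates that scalar fact in Python; it is not the operator-valued construction or the near-order of the paper, and the function name `power_mean` and the sample values are assumptions made for illustration.

```python
import math


def power_mean(values, weights, p):
    """Weighted power (Hölder) mean of positive scalars.

    p = 0 is treated as the limiting case, the weighted geometric mean.
    Scalar illustration only; the paper works with positive definite operators.
    """
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    if p == 0:
        return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)))
    return sum(w * v ** p for v, w in zip(values, weights)) ** (1.0 / p)


# Hypothetical sample data: three positive numbers with equal weights.
values = [1.0, 4.0, 9.0]
weights = [1 / 3, 1 / 3, 1 / 3]

# Harmonic (p=-1), geometric (p=0), arithmetic (p=1), quadratic (p=2) means.
means = [power_mean(values, weights, p) for p in (-1, 0, 1, 2)]

# The scalar power mean is non-decreasing in p.
is_monotone = all(a <= b for a, b in zip(means, means[1:]))
```

For matrices the ordinary Loewner order generally does not give such clean comparisons, which is presumably why a weaker relation such as the near-order is of interest.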

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07024 [pdf, other]

Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization

Authors: Jeongseok Hyun, Su Ho Han, Hyolim Kang, Joon-Young Lee, Seon Joo Kim

Abstract: The vocabulary size in temporal action localization (TAL) is constrained by the scarcity of large-scale annotated datasets. To address this, recent works incorporate powerful pre-trained vision-language models (VLMs), such as CLIP, to perform open-vocabulary TAL (OV-TAL). However, unlike VLMs trained on extensive image/video-text pairs, existing OV-TAL methods still rely on small, fully labeled TAL datasets for training an action localizer. In this paper, we explore the scalability of self-training with unlabeled YouTube videos for OV-TAL. Our self-training approach consists of two stages. First, a class-agnostic action localizer is trained on a human-labeled TAL dataset and used to generate pseudo-labels for unlabeled videos. Second, the large-scale pseudo-labeled dataset is combined with the human-labeled dataset to train the localizer. Extensive experiments demonstrate that leveraging web-scale videos in self-training significantly enhances the generalizability of an action localizer. Additionally, we highlight issues with existing OV-TAL evaluation schemes and propose a new evaluation protocol. Code is released at https://github.com/HYUNJS/STOV-TAL

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06851 [pdf, other]

Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders

Authors: Jinseok Kim, Jaewon Jung, Sangyeop Kim, Sohyung Park, Sungzoon Cho

Abstract: Despite the impressive capabilities of Large Language Models (LLMs) in various tasks, their vulnerability to unsafe prompts remains a critical issue. These prompts can lead LLMs to generate responses on illegal or sensitive topics, posing a significant threat to their safe and ethical use. Existing approaches attempt to address this issue using classification models, but they have several drawbacks. With the increasing complexity of unsafe prompts, similarity search-based techniques that identify specific features of unsafe prompts provide a more robust and effective solution to this evolving problem. This paper investigates the potential of sentence encoders to distinguish safe from unsafe prompts, and the ability to classify various unsafe prompts according to a safety taxonomy. We introduce new pairwise datasets and the Categorical Purity (CP) metric to measure this capability. Our findings reveal both the effectiveness and limitations of existing sentence encoders, proposing directions to improve sentence encoders to operate as more robust safety detectors. Our code is available at https://github.com/JwdanielJung/Safe-Embed.

Submitted 9 July, 2024; originally announced July 2024.

Comments: ACL 2024 KnowledgeableLMs workshop paper

arXiv:2407.06204 [pdf, other]

A Survey on Mixture of Experts

Authors: Weilin Cai, Juyong Jiang, Fan Wang, Jing Tang, Sunghun Kim, Jiayi Huang

Abstract: Large language models (LLMs) have garnered unprecedented advancements across diverse fields, ranging from natural language processing to computer vision and beyond. The prowess of LLMs is underpinned by their substantial model size, extensive and diverse datasets, and the vast computational power harnessed during training, all of which contribute to the emergent abilities of LLMs (e.g., in-context learning) that are not present in small models. Within this context, the mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal computation overhead, gaining significant attention from academia and industry. Despite its growing prevalence, a systematic and comprehensive review of the literature on MoE is lacking. This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE. We first briefly introduce the structure of the MoE layer, followed by proposing a new taxonomy of MoE. Next, we overview the core designs for various MoE models including both algorithmic and systemic aspects, alongside collections of available open-source implementations, hyperparameter configurations and empirical evaluations. Furthermore, we delineate the multifaceted applications of MoE in practice, and outline some potential directions for future research. To facilitate ongoing updates and the sharing of cutting-edge developments in MoE research, we have established a resource repository accessible at https://github.com/withinmiaov/A-Survey-on-Mixture-of-Experts.

Submitted 26 June, 2024; originally announced July 2024.

arXiv:2407.05733 [pdf, other]

doi 10.1145/3657604.3664703

Is GPT-4 Alone Sufficient for Automated Essay Scoring?: A Comparative Judgment Approach Based on Rater Cognition

Authors: Seungju Kim, Meounggun Jo

Abstract: Large Language Models (LLMs) have shown promise in Automated Essay Scoring (AES), but their zero-shot and few-shot performance often falls short compared to state-of-the-art models and human raters. However, fine-tuning LLMs for each specific task is impractical due to the variety of essay prompts and rubrics used in real-world educational contexts. This study proposes a novel approach combining LLMs and Comparative Judgment (CJ) for AES, using zero-shot prompting to choose between two essays. We demonstrate that a CJ method surpasses traditional rubric-based scoring in essay scoring using LLMs.
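
To make the comparative-judgment idea concrete, here is a minimal Python sketch of one way pairwise "which essay is better" decisions could be turned into a ranking, using a plain win rate. This is an assumption for illustration, not the scoring model used in the paper (CJ studies often fit a Bradley-Terry-style model instead); the function name `rank_by_win_rate` and the essay ids are hypothetical, and the judgments would in practice come from the LLM's zero-shot choices.

```python
from collections import defaultdict


def rank_by_win_rate(comparisons):
    """Rank items from pairwise outcomes given as (winner_id, loser_id) tuples.

    Returns the ids sorted by win rate, best first; ties are broken by id.
    A simple stand-in for a fitted comparative-judgment model.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    # Materialise the ids first so looking up wins[...] cannot disturb iteration.
    ids = list(games)
    return sorted(ids, key=lambda essay: (-wins[essay] / games[essay], essay))


# Hypothetical judgments: each tuple means "first essay preferred over second".
judgments = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
ranking = rank_by_win_rate(judgments)
```

A win rate ignores the strength of the opponents an essay was compared against, which is the main reason a fitted pairwise model is usually preferred once the comparison graph is sparse.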

Submitted 8 July, 2024; originally announced July 2024.

Comments: 16 pages, 3 figures, Learning @ Scale 2024

arXiv:2407.05618 [pdf, other]

Improved limit on neutrinoless double beta decay of $^{100}$Mo from AMoRE-I

Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, O. Gileva , et al. (83 additional authors not shown)

Abstract: AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with a metallic magnetic calorimeter operate at milli-Kelvin temperatures to measure the energy of electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate crystals, at the Yangyang Underground Laboratory for over two years. The exposure was 8.02 kg$\cdot$year (or 3.89 kg$_{\mathrm{^{100}Mo}}\cdot$year) and the total background rate near the Q-value was 0.025 $\pm$ 0.002 counts/keV/kg/year. We observed no indication of $0\nu\beta\beta$ decay and report a new lower limit of the half-life of $^{100}$Mo $0\nu\beta\beta$ decay as $T^{0\nu}_{1/2} > 3.0\times10^{24}~\mathrm{years}$ at 90% confidence level. The effective Majorana mass limit range is $m_{\beta\beta} <$ (210--610) meV using nuclear matrix elements estimated in the framework of different models, including the recent shell model calculations.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 7 pages, 4 figures

arXiv:2407.05565 [pdf]

Exploring the role of nonlocal Coulomb interactions in perovskite transition metal oxides

Authors: Indukuru Ramesh Reddy, Chang-Jong Kang, Sooran Kim, Bongjae Kim

Abstract: Employing the density functional theory incorporating on-site and inter-site Coulomb interactions (DFT+U+V), we have investigated the role of the nonlocal interactions on the electronic structures of the transition metal oxide perovskites. Using constrained random phase approximation calculations, we derived screened Coulomb interaction parameters and revealed a competition between localization and screening effects, which results in nonmonotonic behavior with d-orbital occupation. We highlight the significant role and nonlocality of inter-site Coulomb interactions, V, comparable in magnitude to the local interaction, U. Our DFT+U+V results exemplarily show the representative band renormalization, and deviations from ideal extended Hubbard models due to increased hybridization between transition metal d and oxygen p orbitals as occupation increases. We further demonstrate that the inclusion of the inter-site V is essential for accurately reproducing the experimental magnetic order in transition metal oxides.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05186 [pdf, other]

Understanding Political Communication and Political Communicators on Twitch

Authors: Sangyeon Kim

Abstract: As new technologies rapidly reshape patterns of political communication, platforms like Twitch are transforming how people consume political information. This entertainment-oriented live streaming platform allows us to observe the impact of technologies such as "live-streaming" and "streaming-chat" on political communication. Despite its entertainment focus, Twitch hosts a variety of political actors, including politicians and pundits. This study explores Twitch politics by addressing three main questions: 1) Who are the political Twitch streamers? 2) What content is covered in political streams? 3) How do audiences of political streams interact with each other? To identify political streamers, I leveraged the Twitch API and supervised machine-learning techniques, identifying 574 political streamers. I used topic modeling to analyze the content of political streams, revealing seven broad categories of political topics and a unique pattern of communication involving context-specific "emotes." Additionally, I created user-reference networks to examine interaction patterns, finding that a small number of users dominate the communication network. This research contributes to our understanding of how new social media technologies influence political communication, particularly among younger audiences.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.04597 [pdf, other]

Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection

Authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Hyeong Seok Kim, Juneho Yi

Abstract: In unsupervised anomaly detection (UAD) research, while state-of-the-art models have reached a saturation point with extensive studies on public benchmark datasets, they adopt large-scale tailor-made neural networks (NNs) for detection performance or pursue unified models for various tasks. Towards edge computing, it is necessary to develop a computationally efficient and scalable solution that avoids large-scale complex NNs. Motivated by this, we aim to optimize the UAD performance with minimal changes to NN settings. Thus, we revisit the reconstruction-by-inpainting approach and seek to improve it by analyzing its strengths and weaknesses. The strength of the SOTA methods is a single deterministic masking approach that addresses the challenges of random multiple masking, namely inference latency and output inconsistency. Nevertheless, the failure to provide a mask that completely covers anomalous regions remains a weakness. To mitigate this issue, we propose Feature Attenuation of Defective Representation (FADeR), which employs only two MLP layers that attenuate feature information of anomaly reconstruction during decoding. By leveraging FADeR, features of unseen anomaly patterns are reconstructed into seen normal patterns, reducing false alarms. Experimental results demonstrate that FADeR achieves enhanced performance compared to similar-scale NNs. Furthermore, our approach exhibits scalability in performance enhancement when integrated with other single deterministic masking methods in a plug-and-play manner.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 11 pages, 6 figures, 5 tables

arXiv:2407.04280 [pdf, other]

LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech

Authors: Haechan Kim, Junho Myung, Seoyoung Kim, Sungpah Lee, Dongyeop Kang, Juho Kim

Abstract: Prevalent ungrammatical expressions and disfluencies in spontaneous speech from second language (L2) learners pose unique challenges to Automatic Speech Recognition (ASR) systems. However, few datasets are tailored to L2 learner speech. We publicly release LearnerVoice, a dataset consisting of 50.04 hours of audio and transcriptions of L2 learners' spontaneous speech. Our linguistic analysis reveals that transcriptions in our dataset contain L2S (L2 learner's Spontaneous speech) features, consisting of ungrammatical expressions and disfluencies (e.g., filler words, word repetitions, self-repairs, false starts), significantly more than native speech datasets. Fine-tuning whisper-small.en with LearnerVoice achieves a WER of 10.26%, 44.2% lower than vanilla whisper-small.en. Furthermore, our qualitative analysis indicates that 54.2% of errors from the vanilla model on LearnerVoice are attributable to L2S features, with 48.1% of them being reduced in the fine-tuned model.
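
The two WER figures in the abstract can be related by simple arithmetic. The Python sketch below assumes that "44.2% lower" is a relative reduction with respect to the vanilla model's WER; the abstract does not state the baseline WER itself, so the value computed here is an inference under that assumption, and the function name `implied_baseline_wer` is hypothetical.

```python
def implied_baseline_wer(finetuned_wer, relative_reduction):
    """Back out the baseline WER from a fine-tuned WER and a relative reduction.

    Assumes finetuned = baseline * (1 - relative_reduction).
    """
    if not 0.0 <= relative_reduction < 1.0:
        raise ValueError("relative_reduction must be in [0, 1)")
    return finetuned_wer / (1.0 - relative_reduction)


finetuned = 10.26          # WER in percent, from the abstract
reduction = 0.442          # 44.2%, read here as a relative reduction (assumption)

baseline = implied_baseline_wer(finetuned, reduction)   # roughly 18.4 (percent)
absolute_drop = baseline - finetuned                    # roughly 8.1 percentage points
```

Under this reading the vanilla model's WER would be about 18.4%; if the 44.2% were instead meant as an absolute difference, the baseline would be implausibly high, which is why the relative interpretation is assumed.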

Submitted 5 July, 2024; originally announced July 2024.

Comments: Accepted for INTERSPEECH 2024

arXiv:2407.04192 [pdf, other]

KAN-ODEs: Kolmogorov-Arnold Network Ordinary Differential Equations for Learning Dynamical Systems and Hidden Physics

Authors: Benjamin C. Koenig, Suyong Kim, Sili Deng

Abstract: Kolmogorov-Arnold Networks (KANs) as an alternative to Multi-layer perceptrons (MLPs) are a recent development demonstrating strong potential for data-driven modeling. This work applies KANs as the backbone of a Neural Ordinary Differential Equation framework, generalizing their use to the time-dependent and grid-sensitive cases often seen in scientific machine learning applications. The proposed KAN-ODEs retain the flexible dynamical system modeling framework of Neural ODEs while leveraging the many benefits of KANs, including faster neural scaling, stronger interpretability, and lower parameter counts when compared against MLPs. We demonstrate these benefits in three test cases: the Lotka-Volterra predator-prey model, Burgers' equation, and the Fisher-KPP PDE. We showcase the strong performance of parameter-lean KAN-ODE systems generally in reconstructing entire dynamical systems, and also in targeted applications to the inference of a source term in an otherwise known flow field. We additionally demonstrate the interpretability of KAN-ODEs via activation function visualization and symbolic regression of trained results. The successful training of KAN-ODEs and their improved performance when compared to traditional Neural ODEs implies significant potential in leveraging this novel network architecture in myriad scientific machine learning applications.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 12 pages, 5 figures plus 1 appendix figure, 1 table plus 1 appendix table. B.C.K. and S.K. contributed equally to this work

ACM Class: I.6.5; G.1.7

arXiv:2407.04007 [pdf, other]

Domain Wall Networks as Skyrmion Crystals in Chiral Magnets

Authors: Seungho Lee, Toshiaki Fujimori, Muneto Nitta, Se Kwon Kim

Abstract: We theoretically investigate the ground states of a chiral magnet with a square anisotropy and show that it supports domain wall networks as stable ground states. A domain wall junction in the domain wall network turns out to be a skyrmion with half topological charge and, therefore, the found domain wall network has a second topological nature, a skyrmion crystal. More specifically, we present a ground-state phase diagram of the chiral magnet with varying anisotropy parameters consisting of skyrmion lattices, chiral soliton lattices, and ferromagnetic states. In the presence of the square anisotropy, the skyrmion crystal forms a domain wall network. The size of domains in the domain wall network is shown to be tunable by an external magnetic field, offering a way to realize experimentally detectable domain wall networks.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 8 pages, 6 figures, 2 pages of supplemental material

arXiv:2407.03828 [pdf, other]

NuSTAR as an Axion Helioscope

Authors: J. Ruz, E. Todarello, J. K. Vogel, M. Giannotti, B. Grefenstette, H. S. Hudson, I. G. Hannah, I. G. Irastorza, C. S. Kim, T. O'Shea, M. Regis, D. M. Smith, M. Taoso, J. Trujillo Bueno

Abstract: The nature of dark matter in the Universe is still an open question in astrophysics and cosmology. Axions and axion-like particles (ALPs) offer a compelling solution, and traditionally ground-based experiments have eagerly, but to date unsuccessfully, searched for these hypothetical low-mass particles that are expected to be produced in large quantities in the strong electromagnetic fields in the interior of stars. This work offers a fresh look at axions and ALPs by leveraging their conversion into X-rays in the magnetic field of the Sun's atmosphere rather than a laboratory magnetic field. Unique data acquired with the Nuclear Spectroscopic Telescope Array (NuSTAR) during the solar minimum in 2020 allows us to set stringent limits on the coupling of axions to photons using state-of-the-art magnetic field models of the solar atmosphere. We report pioneering limits on the axion-photon coupling strength of $6.9\times 10^{-12}$ GeV$^{-1}$ at 95% confidence level for axion masses $m_a \lesssim 2\times 10^{-7}$ eV, surpassing current ground-based searches and further probing unexplored regions of the axion-photon coupling parameter space up to axion masses of $m_a \lesssim 5\times 10^{-4}$ eV.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 23 pages, 12 figures

arXiv:2407.03563 [pdf, other]

Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition

Authors: Sungnyun Kim, Kangwook Jang, Sangmin Bae, Hoirin Kim, Se-Young Yun

Abstract: Audio-visual speech recognition (AVSR) aims to transcribe human speech using both audio and video modalities. In practical environments with noise-corrupted audio, the role of video information becomes crucial. However, prior works have primarily focused on enhancing audio features in AVSR, overlooking the importance of video features. In this study, we strengthen the video features by learning three temporal dynamics in video data: context order, playback direction, and the speed of video frames. Cross-modal attention modules are introduced to enrich video features with audio information so that speech variability can be taken into account when training on the video temporal dynamics. Based on our approach, we achieve state-of-the-art performance on the LRS2 and LRS3 AVSR benchmarks for the noise-dominant settings. Our approach excels especially in scenarios with babble and speech noise, indicating the ability to distinguish the speech signal that should be recognized from lip movements in the video modality. We support the validity of our methodology with ablation experiments for the temporal dynamics losses and the cross-modal attention architecture design.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.03103 [pdf, other]

Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

Authors: Suyeon Lee, Sunghwan Kim, Minju Kim, Dongjin Kang, Dongil Yang, Harim Kim, Minseok Kang, Dayi Jung, Min Hee Kim, Seungbeen Lee, Kyoung-Mee Chung, Youngjae Yu, Dongha Lee, Jinyoung Yeo

Abstract: Recently, the demand for psychological counseling has significantly increased as more individuals express concerns about their mental health. This surge has accelerated efforts to improve the accessibility of counseling by using large language models (LLMs) as counselors. To ensure client privacy, training open-source LLMs faces a key challenge: the absence of realistic counseling datasets. To address this, we introduce Cactus, a multi-turn dialogue dataset that emulates real-life interactions using the goal-oriented and structured approach of Cognitive Behavioral Therapy (CBT). We create a diverse and realistic dataset by designing clients with varied, specific personas, and having counselors systematically apply CBT techniques in their interactions. To assess the quality of our data, we benchmark against established psychological criteria used to evaluate real counseling sessions, ensuring alignment with expert evaluations. Experimental results demonstrate that Camel, a model trained with Cactus, outperforms other models in counseling skills, highlighting its effectiveness and potential as a counseling agent. We make our data, model, and code publicly available.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Under Review

arXiv:2407.02750 [pdf, other]

Learning to Reduce: Towards Improving Performance of Large Language Models on Structured Data

Authors: Younghun Lee, Sungchul Kim, Ryan A. Rossi, Tong Yu, Xiang Chen

Abstract: Large Language Models (LLMs) have been achieving competent performance on a wide range of downstream tasks, yet existing work shows that inference on structured data is challenging for LLMs. This is because LLMs need to either understand long structured data or select the most relevant evidence before inference, and both approaches are not trivial. This paper proposes a framework, Learning to Reduce, that fine-tunes a language model with On-Policy Learning to generate a reduced version of an input structured data. When compared to state-of-the-art LLMs like GPT-4, Learning to Reduce not only achieves outstanding performance in reducing the input, but shows generalizability on different datasets. We further show that the model fine-tuned with our framework helps LLMs better perform on table QA tasks especially when the context is longer.

Submitted 2 July, 2024; originally announced July 2024.

Comments: ICML 2024 Workshop on Long-Context Foundation Models, Vienna, Austria 2024. arXiv admin note: substantial text overlap with arXiv:2402.14195

arXiv:2407.01622 [pdf, other]

doi 10.1145/3637528.3671969

Addressing Prediction Delays in Time Series Forecasting: A Continuous GRU Approach with Derivative Regularization

Authors: Sheo Yon Jhin, Seojin Kim, Noseong Park

Abstract: Time series forecasting has been an essential field in many different application areas, including economic analysis, meteorology, and so forth. The majority of time series forecasting models are trained using the mean squared error (MSE). However, this training based on MSE causes a limitation known as prediction delay. The prediction delay, which implies the ground-truth precedes the prediction, can cause serious problems in a variety of fields, e.g., finance and weather forecasting -- as a matter of fact, predictions succeeding ground-truth observations are not practically meaningful although their MSEs can be low. This paper proposes a new perspective on traditional time series forecasting tasks and introduces a new solution to mitigate the prediction delay. We introduce a continuous-time gated recurrent unit (GRU) based on the neural ordinary differential equation (NODE) which can supervise explicit time-derivatives. We generalize the GRU architecture in a continuous-time manner and minimize the prediction delay through our time-derivative regularization. Our method outperforms in metrics such as MSE, Dynamic Time Warping (DTW) and Time Distortion Index (TDI). In addition, we demonstrate the low prediction delay of our method in a variety of datasets.

Submitted 29 June, 2024; originally announced July 2024.

Comments: KDD 2024 accepted paper

arXiv:2407.01073 [pdf, other]

No More Potentially Dynamic Objects: Static Point Cloud Map Generation based on 3D Object Detection and Ground Projection

Authors: Soojin Woo, Donghwi Jung, Seong-Woo Kim

Abstract: In this paper, we propose an algorithm to generate a static point cloud map based on LiDAR point cloud data. Our proposed pipeline detects dynamic objects using 3D object detectors and projects points of dynamic objects onto the ground. Typically, point cloud data acquired in real-time serves as a snapshot of the surrounding areas containing both static objects and dynamic objects. The static objects include buildings and trees, otherwise, the dynamic objects contain objects such as parked cars that change their position over time. Removing dynamic objects from the point cloud map is crucial as they can degrade the quality and localization accuracy of the map. To address this issue, in this paper, we propose an algorithm that creates a map only consisting of static objects. We apply a 3D object detection algorithm to the point cloud data which are obtained from LiDAR to implement our pipeline. We then stack the points to create the map after performing ground segmentation and projection. As a result, not only we can eliminate currently dynamic objects at the time of map generation but also potentially dynamic objects such as parked vehicles. We validate the performance of our method using two kinds of datasets collected on real roads: KITTI and our dataset. The result demonstrates the capability of our proposal to create an accurate static map excluding dynamic objects from input point clouds. Also, we verified the improved performance of localization using a generated map based on our method.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00859 [pdf, other]

Statistical inference on partially shape-constrained function-on-scalar linear regression models

Authors: Kyunghee Han, Yeonjoo Park, Soo-Young Kim

Abstract: We consider functional linear regression models where functional outcomes are associated with scalar predictors by coefficient functions with shape constraints, such as monotonicity and convexity, that apply to sub-domains of interest. To validate the partial shape constraints, we propose testing a composite hypothesis of linear functional constraints on regression coefficients. Our approach employs kernel- and spline-based methods within a unified inferential framework, evaluating the statistical significance of the hypothesis by measuring an $L^2$-distance between constrained and unconstrained model fits. In the theoretical study of large-sample analysis under mild conditions, we show that both methods achieve the standard rate of convergence observed in the nonparametric estimation literature. Through numerical experiments of finite-sample analysis, we demonstrate that the type I error rate keeps the significance level as specified across various scenarios and that the power increases with sample size, confirming the consistency of the test procedure under both estimation methods. Our theoretical and numerical results provide researchers the flexibility to choose a method based on computational preference. The practicality of partial shape-constrained inference is illustrated by two data applications: one involving clinical trials of NeuroBloc in type A-resistant cervical dystonia and the other with the National Institute of Mental Health Schizophrenia Study.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 30 pages, 7 figures

arXiv:2407.00693 [pdf, other]

BAPO: Base-Anchored Preference Optimization for Personalized Alignment in Large Language Models

Authors: Gihun Lee, Minchan Jeong, Yujin Kim, Hojung Jung, Jaehoon Oh, Sangmook Kim, Se-Young Yun

Abstract: While learning to align Large Language Models (LLMs) with human preferences has shown remarkable success, aligning these models to meet the diverse user preferences presents further challenges in preserving previous knowledge. This paper examines the impact of personalized preference optimization on LLMs, revealing that the extent of knowledge loss varies significantly with preference heterogeneity. Although previous approaches have utilized the KL constraint between the reference model and the policy model, we observe that they fail to maintain general knowledge and alignment when facing personalized preferences. To this end, we introduce Base-Anchored Preference Optimization (BAPO), a simple yet effective approach that utilizes the initial responses of reference model to mitigate forgetting while accommodating personalized alignment. BAPO effectively adapts to diverse user preferences while minimally affecting global knowledge or general alignment. Our experiments demonstrate the efficacy of BAPO in various setups.

Submitted 30 June, 2024; originally announced July 2024.

Comments: under review

arXiv:2407.00061 [pdf, ps, other]

Probabilistic multi-Stirling numbers of the second kind and probabilistic multi-Lah numbers

Authors: Taekyun Kim, Dae san Kim

Abstract: Assume that the moment generating function of the random variable Y exists in a neighborhood of the origin. We introduce the probabilistic multi-Stirling numbers of the second kind associated with Y and the probabilistic multi-Lah numbers associated with Y, both of indices (k1,k2,...,kr), by means of the multiple logarithm. Those numbers are respectively probabilistic extensions of the multi-Stirling numbers of the second kind and the multi-Lah numbers which, for (k1,k2,...,kr) = (1,1,...,1), boil down respectively to the Stirling numbers of the second kind and the unsigned Lah numbers. The aim of this paper is to study some properties, related identities, recurrence relations and explicit expressions of those probabilistic extension numbers in connection with several other special numbers.

Submitted 17 June, 2024; originally announced July 2024.

Comments: 11 pages

MSC Class: 11B68; 11B73; 11B83

arXiv:2407.00006 [pdf, other]

Adaptive and Parallel Multiscale Framework for Modeling Cohesive Failure in Engineering Scale Systems

Authors: Sion Kim, Ezra Kissel, Karel Matous

Abstract: The high computational demands of multiscale modeling necessitate advanced parallel and adaptive strategies. To address this challenge, we introduce an adaptive method that utilizes two microscale models based on an offline database for multiscale modeling of curved interfaces (e.g., adhesive layers). This database employs nonlinear classifiers, developed using Support Vector Machines from microscale sampling data, as a preprocessing step for multiscale simulations. Next, we develop a new parallel network library that enables seamless model selection with customized communication layers, ensuring scalability in parallel computing environments. The correctness and effectiveness of the hierarchically parallel solver are verified on a crack propagation problem within the curved adhesive layer. Finally, we predict the ultimate bending moment and adhesive layer failure of a wind turbine blade and validate the solver on a difficult large-scale engineering problem.

Submitted 18 April, 2024; originally announced July 2024.

arXiv:2406.19848 [pdf, other]

3D Operation of Autonomous Excavator based on Reinforcement Learning through Independent Reward for Individual Joints

Authors: Yoonkyu Yoo, Donghwi Jung, Seong-Woo Kim

Abstract: In this paper, we propose a control algorithm based on reinforcement learning, employing independent rewards for each joint to control excavators in a 3D space. The aim of this research is to address the challenges associated with achieving precise control of excavators, which are extensively utilized in construction sites but prove challenging to control with precision due to their hydraulic structures. Traditional methods relied on operator expertise for precise excavator operation, occasionally resulting in safety accidents. Therefore, there have been endeavors to attain precise excavator control through equation-based control algorithms. However, these methods had the limitation of necessitating prior information related to physical values of the excavator, rendering them unsuitable for the diverse range of excavators used in the field. To overcome these limitations, we have explored reinforcement learning-based control methods that do not demand prior knowledge of specific equipment but instead utilize data to train models. Nevertheless, existing reinforcement learning-based methods overlooked cabin swing rotation and confined the bucket's workspace to a 2D plane. Control confined within such a limited area diminishes the applicability of the algorithm in construction sites. We address this issue by expanding the previous 2D plane workspace of the bucket operation into a 3D space, incorporating cabin swing rotation. By expanding the workspace into 3D, excavators can execute continuous operations without requiring human intervention. To accomplish this objective, distinct targets were established for each joint, facilitating the training of action values for each joint independently, regardless of the progress of other joint learning.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19618 [pdf]

Unstable Retention Behavior in MIFIS FEFET: Accurate Analysis of the Origin by Absolute Polarization Measurement

Authors: Song-Hyeon Kuk, Kyul Ko, Bong Ho Kim, Jae-Hoon Han, Sang-Hyeon Kim

Abstract: Ferroelectric field-effect-transistor (FEFET) has emerged as a scalable solution for 3D NAND and embedded flash (eFlash), with recent progress in achieving large memory window (MW) using metal-insulator-ferroelectric-insulator-semiconductor (MIFIS) gate stacks. Although the physical origin of the large MW in the MIFIS stack has already been discussed, its retention characteristics have not been explored yet. Here, we demonstrate MIFIS FEFET with a maximum MW of 9.7 V, and show that MIFIS FEFET has unstable retention characteristics, especially after erase. We discover the origin of the unstable retention characteristics and prove our hypothesis with absolute polarization measurement and different operation modes, showing that the unstable retention characteristics is a fundamental issue. Based on the understanding, we discuss a novel charge compensation model and promising engineering methodologies to achieve stable retention in MIFIS FEFET.

Submitted 27 June, 2024; originally announced June 2024.

Comments: We are submitting this to an IEEE journal but because of delays, we would like to share the information

arXiv:2406.19575 [pdf]

AR-PPF: Advanced Resolution-Based Pixel Preemption Data Filtering for Efficient Time-Series Data Analysis

Authors: Taewoong Kim, Kukjin Choi, Sungjun Kim

Abstract: With the advent of automation, many manufacturing industries have transitioned to data-centric methodologies, giving rise to an unprecedented influx of data during the manufacturing process. This data has become instrumental in analyzing the quality of manufacturing process and equipment. Engineers and data analysts, in particular, require extensive time-series data for seasonal cycle analysis. However, due to computational resource constraints, they are often limited to querying short-term data multiple times or resorting to the use of summarized data in which key patterns may be overlooked. This study proposes a novel solution to overcome these limitations: the advanced resolution-based pixel preemption data filtering (AR-PPF) algorithm. This technology allows for efficient visualization of time-series charts over long periods while significantly reducing the time required to retrieve data. We also demonstrate how this approach not only enhances the efficiency of data analysis but also ensures that key features are not lost, thereby providing a more accurate and comprehensive understanding of the data.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 7 pages, preprint, '24 Samsung Best Paper Awards

arXiv:2406.19328 [pdf, other]

Subtractive Training for Music Stem Insertion using Latent Diffusion Models

Authors: Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusion model to generate the missing instrument stem, guided by both the existing stems and the text instruction. Our results demonstrate Subtractive Training's efficacy in creating authentic drum stems that seamlessly blend with the existing tracks. We also show that we can use the text instruction to control the generation of the inserted stem in terms of rhythm, dynamics, and genre, allowing us to modify the style of a single instrument in a full song while keeping the remaining instruments the same. Lastly, we extend this technique to MIDI formats, successfully generating compatible bass, drum, and guitar parts for incomplete arrangements.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.19287 [pdf, other]

Isotropy of cosmic rays beyond $10^{20}$ eV favors their heavy mass composition

Authors: Telescope Array Collaboration, R. U. Abbasi, Y. Abe, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, I. Buckland, B. G. Cheon, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, N. Globus, R. Gonzalez, W. Hanlon, N. Hayashida, H. He, et al. (118 additional authors not shown)

Abstract: We report an estimation of the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The composition is inferred from an energy-dependent sky distribution of UHECR events observed by the Telescope Array surface detector by comparing it to the Large Scale Structure of the local Universe. In the case of negligible extra-galactic magnetic fields the results are consistent with a relatively heavy injected composition at E ~ 10 EeV that becomes lighter up to E ~ 100 EeV, while the composition at E > 100 EeV is very heavy. The latter is true even in the presence of highest experimentally allowed extra-galactic magnetic fields, while the composition at lower energies can be light if a strong EGMF is present. The effect of the uncertainty in the galactic magnetic field on these results is subdominant.

Submitted 3 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 8 pages, 3 figures, accepted for publication in PRL

arXiv:2406.19286 [pdf, other]

Mass composition of ultra-high energy cosmic rays from distribution of their arrival directions with the Telescope Array

Authors: Telescope Array Collaboration, R. U. Abbasi, Y. Abe, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, I. Buckland, B. G. Cheon, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, N. Globus, R. Gonzalez, W. Hanlon, N. Hayashida, H. He, et al. (118 additional authors not shown)

Abstract: We use a new method to estimate the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The method is based on comparison of the energy-dependent distribution of cosmic ray arrival directions as measured by the Telescope Array experiment (TA) with that calculated in a given putative model of UHECR under the assumption that sources trace the large-scale structure (LSS) of the Universe. As we report in the companion letter, the TA data show large deflections with respect to the LSS which can be explained, assuming small extra-galactic magnetic fields (EGMF), by an intermediate composition changing to a heavy one (iron) in the highest energy bin. Here we show that these results are robust to uncertainties in UHECR injection spectra, the energy scale of the experiment and galactic magnetic fields (GMF). The assumption of weak EGMF, however, strongly affects this interpretation at all but the highest energies E > 100 EeV, where the remarkable isotropy of the data implies a heavy injected composition even in the case of strong EGMF. This result also holds if UHECR sources are as rare as $2 \times 10^{-5}$ Mpc$^{-3}$, that is the conservative lower limit for the source number density.

Submitted 3 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 18 pages, 11 figures, accepted for publication in PRD

arXiv:2406.19135 [pdf, other]

DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

Authors: Hyun Joon Park, Jin Sob Kim, Wooseok Shin, Sung Won Han

Abstract: Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a general diffusion TTS framework, DEX-TTS includes encoders and adapters to handle styles extracted from reference speech. Key innovations contain the differentiation of styles into time-invariant and time-variant categories for effective style extraction, as well as the design of encoders and adapters with high generalization ability. In addition, we introduce overlapping patchify and convolution-frequency patch embedding strategies to improve DiT-based diffusion networks for TTS. DEX-TTS yields outstanding performance in terms of objective and subjective evaluation in English multi-speaker and emotional multi-speaker datasets, without relying on pre-training strategies. Lastly, the comparison results for the general TTS on a single-speaker dataset verify the effectiveness of our enhanced diffusion backbone. Demos are available here.

Submitted 27 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2406.19132 [pdf, other]

Origin of extended Main Sequence Turn Off in open cluster NGC 2355

Authors: Jayanand Maurya, M. R. Samal, Louis Amard, Yu Zhang, Hubiao Niu, Sang Chul Kim, Y. C. Joshi, B. Kumar

Abstract: The presence of extended Main Sequence Turn-Off (eMSTO) in the open clusters has been attributed to various factors, such as spread in rotation rates, binary stars, and dust-like extinction from stellar excretion discs. We present a comprehensive analysis of the eMSTO in the open cluster NGC 2355. Using spectra from the Gaia-ESO archives, we find that the stars in the red part of the eMSTO have a higher mean v sin i value of 135.3$\pm$4.6 km s$^{-1}$ compared to the stars in the blue part that have an average v sin i equal to 81.3$\pm$5.6 km s$^{-1}$. This suggests that the eMSTO in NGC 2355 is possibly caused by the spread in rotation rates of stars. We do not find any substantial evidence of the dust-like extinction from the eMSTO stars using ultraviolet data from the Swift survey. The estimated synchronization time for low mass ratio close binaries in the blue part of the eMSTO suggests that they would be mostly slow-rotating if present. However, the stars in the blue part of the eMSTO are preferentially located in the outer region of the cluster indicating that they may lack low mass ratio close binaries. The spread in rotation rates of eMSTO stars in NGC 2355 is most likely caused by the star-disc interaction mechanism. The stars in the lower main sequence beyond the eMSTO region of NGC 2355 are slow-rotating (mean v sin i = 26.5$\pm$1.3 km s$^{-1}$) possibly due to the magnetic braking of their rotations.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 11 pages, 12 figures, accepted for publication in MNRAS

arXiv:2406.18858 [pdf]

On-off switchable nonreciprocal negative refraction in non-Hermitian photon-magnon hybrid systems

Authors: Junyoung Kim, Bosung Kim, Bo-Jong Kim, Haechan Jeon, Sang-Koog Kim

Abstract: Photon-magnon coupling, where electromagnetic waves interact with spin waves, and negative refraction, which bends the direction of electromagnetic waves unnaturally, constitute critical foundations and advancements in the realms of optics, spintronics, and quantum information technology. Here, we explore a magnetic-field-controlled, on-off switchable, nonreciprocal negative refraction within a non-Hermitian photon-magnon hybrid system. By integrating an yttrium iron garnet film with an inverted split-ring resonator, we discover pronounced negative refraction driven by the system's non-Hermitian properties. This phenomenon exhibits unique nonreciprocal behavior dependent on the signal's propagation direction. Our analytical model sheds light on the crucial interplay between coherent and dissipative coupling, significantly altering permittivity and permeability's imaginary components, crucial for negative refraction's emergence. This work pioneers new avenues for employing negative refraction in photon-magnon hybrid systems, signaling substantial advancements in quantum hybrid systems.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 27 pages, 4 figures

arXiv:2406.18551 [pdf, other]

GFFE: G-buffer Free Frame Extrapolation for Low-latency Real-time Rendering

Authors: Songyin Wu, Deepak Vembar, Anton Sochenov, Selvakumar Panneer, Sungye Kim, Anton Kaplanyan, Ling-Qi Yan

Abstract: Real-time rendering has been embracing ever-demanding effects, such as ray tracing. However, rendering such effects in high resolution and high frame rate remains challenging. Frame extrapolation methods, which don't introduce additional latency as opposed to frame interpolation methods such as DLSS 3 and FSR 3, boost the frame rate by generating future frames based on previous frames. However, it is a more challenging task because of the lack of information in the disocclusion regions, and recent methods also have a high engine integration cost due to requiring G-buffers as input. We propose a \emph{G-buffer free} frame extrapolation, GFFE, with a novel heuristic framework and an efficient neural network, to plausibly generate new frames in real-time without introducing additional latency. We analyze the motion of dynamic fragments and different types of disocclusions, and design the corresponding modules of the extrapolation block to handle them. After filling disocclusions, a light-weight shading correction network is used to correct shading and improve overall quality. GFFE achieves comparable or better results compared to previous interpolation as well as G-buffer-dependent extrapolation methods, with more efficient performance and easier game integration.

Submitted 23 May, 2024; originally announced June 2024.

arXiv:2406.17869 [pdf, other]

Burst Image Super-Resolution with Base Frame Selection

Authors: Sanghyun Kim, Min Jung Lee, Woohyeok Kim, Deunsol Jung, Jaesung Rim, Sunghyun Cho, Minsu Cho

Abstract: Burst image super-resolution has been a topic of active research in recent years due to its ability to obtain a high-resolution image by using complementary information between multiple frames in the burst. In this work, we explore using burst shots with non-uniform exposures to confront real-world practical scenarios by introducing a new benchmark dataset, dubbed Non-uniformly Exposed Burst Image (NEBI), that includes the burst frames at varying exposure times to obtain a broader range of irradiance and motion characteristics within a scene. As burst shots with non-uniform exposures exhibit varying levels of degradation, fusing information of the burst shots into the first frame as a base frame may not result in optimal image quality. To address this limitation, we propose a Frame Selection Network (FSN) for non-uniform scenarios. This network seamlessly integrates into existing super-resolution methods in a plug-and-play manner with low computational costs. The comparative analysis reveals the effectiveness of the nonuniform setting for the practical scenario and our FSN on synthetic-/real- NEBI datasets.

Submitted 25 June, 2024; originally announced June 2024.

Comments: CVPR2024W NTIRE accepted

arXiv:2406.17310 [pdf, other]

High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

Authors: Joun Yeop Lee, Myeonghun Jeong, Minchan Kim, Ji-Hyun Lee, Hoon-Young Cho, Nam Soo Kim

Abstract: We propose a novel two-stage text-to-speech (TTS) framework with two types of discrete tokens, i.e., semantic and acoustic tokens, for high-fidelity speech synthesis. It features two core components: the Interpreting module, which processes text and a speech prompt into semantic tokens focusing on linguistic contents and alignment, and the Speaking module, which captures the timbre of the target voice to generate acoustic tokens from semantic tokens, enriching speech reconstruction. The Interpreting stage employs a transducer for its robustness in aligning text to speech. In contrast, the Speaking stage utilizes a Conformer-based architecture integrated with a Grouped Masked Language Model (G-MLM) to boost computational efficiency. Our experiments verify that this innovative structure surpasses the conventional models in the zero-shot scenario in terms of speech quality and speaker similarity.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech2024

arXiv:2406.17254 [pdf, other]

Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation

Authors: Youngmin Kim, Saejin Kim, Hoyeon Moon, Youngjae Yu, Junhyug Noh

Abstract: Scalp diseases and alopecia affect millions of people around the world, underscoring the urgent need for early diagnosis and management of the disease. However, the development of a comprehensive AI-based diagnosis system encompassing these conditions remains an underexplored domain due to the challenges associated with data imbalance and the costly nature of labeling. To address these issues, we propose ScalpVision, an AI-driven system for the holistic diagnosis of scalp diseases and alopecia. In ScalpVision, effective hair segmentation is achieved using pseudo image-label pairs and an innovative prompting method in the absence of traditional hair masking labels. This approach is crucial for extracting key features such as hair thickness and count, which are then used to assess alopecia severity. Additionally, ScalpVision introduces DiffuseIT-M, a generative model adept at dataset augmentation while maintaining hair information, facilitating improved predictions of scalp disease severity. Our experimental results affirm ScalpVision's efficiency in diagnosing a variety of scalp conditions and alopecia, showcasing its potential as a valuable tool in dermatological care.

Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: IEEE Transactions on Medical Imaging (Under Review)

arXiv:2406.17145 [pdf, other]

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Authors: Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

Abstract: Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only consider sequential pipeline stages and thus ignore the topology of a DNN, resulting in missed model-parallel opportunities. This paper presents graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes existing sequential pipeline parallelism and preserves the inherent topology of a DNN to enable concurrent execution of computationally-independent operators, resulting in reduced memory requirement and improved GPU performance. In addition, we develop GraphPipe, a distributed system that exploits GPP strategies to enable performant and scalable DNN training. GraphPipe partitions a DNN into a graph of stages, optimizes micro-batch schedules for these stages, and parallelizes DNN training using the discovered GPP strategies. Evaluation on a variety of DNNs shows that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6X. GraphPipe also reduces the search time by 9-21X compared to PipeDream and Piper.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16994 [pdf, other]

Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for providing cooperatively global access sustainability and energy efficiency. However, as the number of CubeSats and HALE-UAVs increases, the scheduling dimension of each ground station (GS) increases. As a result, each GS can fall into the curse of dimensionality, and this challenge becomes one major hurdle for efficient global access. Therefore, this paper provides a quantum multi-agent reinforcement learning (QMARL)-based method for scheduling between GSs and CubeSats/HALE-UAVs in order to improve global access availability and energy efficiency. The main reason why the QMARL-based scheduler can be beneficial is that the algorithm facilitates a logarithmic-scale reduction in scheduling action dimensions, which is one critical feature as the number of CubeSats and HALE-UAVs expands. Additionally, individual GSs have different traffic demands depending on their locations and characteristics; thus, it is essential to provide differentiated access services. The superiority of the proposed scheduler is validated through data-intensive experiments in realistic CubeSat/HALE-UAV settings.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 17 pages, 22 figures

arXiv:2406.16702 [pdf, other]

North-PHASE: Studying Periodicity, Hot Spots, Accretion Stability and Early Evolution in young stars in the northern hemisphere

Authors: A. Sicilia-Aguilar, R. S. Kahar, M. E. Pelayo-Baldárrago, V. Roccatagliata, D. Froebrich, F. J. Galindo-Guil, J. Campbell-White, J. S. Kim, I. Mendigutía, L. Schlueter, P. S. Teixeira, S. Matsumura, M. Fang, A. Scholz, P. Ábrahám, A. Frasca, A. Garufi, C. Herbert, Á. Kóspál, C. F. Manara

Abstract: We present the overview and first results from the North-PHASE Legacy Survey, which follows six young clusters for five years, using the 2 deg$^2$ FoV of the JAST80 telescope from the Javalambre Observatory (Spain). North-PHASE investigates stellar variability on timescales from days to years for thousands of young stars distributed over entire clusters. This allows us to find new YSO, characterise accretion and study inner disk evolution within the cluster context. Each region (Tr37, CepOB3, IC5070, IC348, NGC2264, and NGC1333) is observed in six filters (SDSS griz, u band, and J0660, which covers H$\alpha$), detecting cluster members as well as field variable stars. Tr37 is used to prove feasibility and optimise the variability analysis techniques. In Tr37, variability reveals 50 new YSO, most of them proper motion outliers. North-PHASE independently confirms the youth of astrometric members, efficiently distinguishes accreting and non-accreting stars, reveals the extent of the cluster populations along Tr37/IC1396 bright rims, and detects variability resulting from rotation, dips, and irregular bursts. The proper motion outliers unveil a more complex star formation history than inferred from Gaia alone, and variability highlights previously hidden proper motion deviations in the surrounding clouds. We also find that non-YSO variables identified by North-PHASE cover a different variability parameter space and include long-period variables, eclipsing binaries, RR Lyr, and $\delta$ Scuti stars. These early results also emphasize the power of variability to complete the picture of star formation where it is missed by astrometry.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted by MNRAS

arXiv:2406.16695 [pdf, other]

Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling

Authors: Min-Seop Kwak, Donghoon Ahn, Ines Hyeonsu Kim, Jin-Hwa Kim, Seungryong Kim

Abstract: Score distillation sampling (SDS), the methodology in which the score from pretrained 2D diffusion models is distilled into 3D representation, has recently brought significant advancements in text-to-3D generation task. However, this approach is still confronted with critical geometric inconsistency problems such as the Janus problem. Starting from a hypothesis that such inconsistency problems may be induced by multiview inconsistencies between 2D scores predicted from various viewpoints, we introduce GSD, a simple and general plug-and-play framework for incorporating 3D consistency and therefore geometry awareness into the SDS process. Our methodology is composed of three components: 3D consistent noising, designed to produce 3D consistent noise maps that perfectly follow the standard Gaussian distribution, geometry-based gradient warping for identifying correspondences between predicted gradients of different viewpoints, and novel gradient consistency loss to optimize the scene geometry toward producing more consistent gradients. We demonstrate that our method significantly improves performance, successfully addressing the geometric inconsistency problems in text-to-3D generation task with minimal computation cost and being compatible with existing score distillation-based models. Our project page is available at https://ku-cvlab.github.io/GSD/.

Submitted 30 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16175 [pdf, other]

The Persistence of Contrarianism on Twitter: Mapping users' sharing habits for the Ukraine war, COVID-19 vaccination, and the 2022 Midterm Elections

Authors: David Axelrod, Sangyeon Kim, John Paolillo

Abstract: Empirical studies of online disinformation emphasize matters of public concern such as the COVID-19 pandemic, foreign election interference, and the Russo-Ukraine war, largely in studies that treat the topics separately. Comparatively fewer studies attempt to relate such disparate topics and address the extent to which they share behaviors. In this study, we compare three samples of Twitter data on COVID-19 vaccination, the Ukraine war and the 2022 midterm elections, to ascertain how distinct ideological stances of users across the three samples might be related. Our results indicate the emergence of a broad contrarian stance that is defined by its opposition to public health narratives/policies along with the Biden administration's foreign policy stances. Sharing activity within the contrarian position falls on a spectrum with outright conspiratorial content on one end. We confirm the existence of ideologically coherent cross-subject stances among Twitter users, but in a manner not squarely aligned with right-left political orientations.

Submitted 28 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.16136 [pdf, other]

Distribution-Free Online Change Detection for Low-Rank Images

Authors: Tingnan Gong, Seong-Hee Kim, Yao Xie

Abstract: We present a distribution-free CUSUM procedure designed for online change detection in a time series of low-rank images, particularly when the change causes a mean shift. We represent images as matrix data and allow for temporal dependence, in addition to inherent spatial dependence, before and after the change. The marginal distributions are assumed to be general, not limited to any specific parametric distribution. We propose new monitoring statistics that utilize the low-rank structure of the in-control mean matrix. Additionally, we study the properties of the proposed detection procedure, assessing whether the monitoring statistics effectively capture a mean shift and evaluating the rate of increase in average run length relative to the control limit in both in-control and out-of-control cases. The effectiveness of our procedure is demonstrated through simulated and real data experiments.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 29 pages, 7 figures

arXiv:2406.16042 [pdf, other]

Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

Authors: Inès Hyeonsu Kim, JoungBin Lee, Soowon Son, Woojeong Jin, Kyusun Cho, Junyoung Seo, Min-Seop Kwak, Seokju Cho, JeongYeol Baek, Byeongwon Lee, Seungryong Kim

Abstract: Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. Previous methods have attempted to address these issues through data augmentation; however, they rely on human poses already present in the training dataset, failing to effectively reduce the human pose bias in the dataset. We propose Diff-ID, a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution. Our objective is to augment a training dataset that enables existing Re-ID models to learn features unbiased by human pose and camera viewpoint variations. To achieve this, we leverage the knowledge of pre-trained large-scale diffusion models. Using the SMPL model, we simultaneously capture both the desired human poses and camera viewpoints, enabling realistic human rendering. The depth information provided by the SMPL model indirectly conveys the camera viewpoints. By conditioning the diffusion model on both the human pose and camera viewpoint concurrently through the SMPL model, we generate realistic images with diverse human poses and camera viewpoints. Qualitative results demonstrate the effectiveness of our method in addressing human pose bias and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches. The performance gains achieved by training Re-ID models on our offline augmented dataset highlight the potential of our proposed framework in improving the scalability and generalizability of person Re-ID models.

Submitted 23 June, 2024; originally announced June 2024.

Comments: The project page is available at https://ku-cvlab.github.io/Diff-ID/

arXiv:2406.15664 [pdf, other]

Flat Posterior Does Matter For Bayesian Transfer Learning

Authors: Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song

Abstract: The large-scale pre-trained neural network has achieved notable success in enhancing performance for downstream tasks. Another promising approach for generalization is Bayesian Neural Network (BNN), which integrates Bayesian methods into neural network architectures, offering advantages such as Bayesian Model averaging (BMA) and uncertainty quantification. Despite these benefits, transfer learning for BNNs has not been widely investigated and shows limited improvement. We hypothesize that this issue arises from the inability to find flat minima, which is crucial for generalization performance. To address this, we evaluate the sharpness of BNNs in various settings, revealing their insufficiency in seeking flat minima and the influence of flatness on BMA performance. Therefore, we propose Sharpness-aware Bayesian Model Averaging (SA-BMA), a Bayesian-fitting flat posterior seeking optimizer integrated with Bayesian transfer learning. SA-BMA calculates the divergence between posteriors in the parameter space, aligning with the nature of BNNs, and serves as a generalized version of existing sharpness-aware optimizers. We validate that SA-BMA improves generalization performance in few-shot classification and distribution shift scenarios by ensuring flatness.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.15102 [pdf, other]

HLQ: Fast and Efficient Backpropagation via Hadamard Low-rank Quantization

Authors: Seonggon Kim, Eunhyeok Park

Abstract: With the rapid increase in model size and the growing importance of various fine-tuning applications, lightweight training has become crucial. Since the backward pass is twice as expensive as the forward pass, optimizing backpropagation is particularly important. However, modifications to this process can lead to suboptimal convergence, so training optimization should minimize perturbations, which is a highly challenging task. In this study, we introduce a novel optimization strategy called Hadamard Low-rank Quantization (HLQ), focusing on reducing the cost of backpropagation in convolutional and linear layers. We first analyze the sensitivity of gradient computation with respect to activation and weight, and judiciously design the HLQ pipeline to apply 4-bit Hadamard quantization to the activation gradient and Hadamard low-rank approximation to the weight gradient. This combination was found to be the best for maximizing benefits, and our extensive experiments demonstrate the outstanding performance of HLQ in both training from scratch and fine-tuning, achieving significant memory savings and acceleration on real GPUs with negligible quality degradation.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14888 [pdf, other]

Finding dusty AGNs from the JWST CEERS survey with mid-infrared photometry

Authors: Tom C. -C. Chien, Chih-Teng Ling, Tomotsugu Goto, Cossas K. -W. Wu, Seong Jin Kim, Tetsuya Hashimoto, Yu-Wei Lin, Ece Kilerci, Simon C. -C. Ho, Po-Ya Wang, Bjorn Jasper R. Raquel

Abstract: The nature of the interaction between active galactic nuclei (AGNs) and their host galaxies remains an unsolved question. Therefore, conducting an AGN census is valuable to AGN research. Nevertheless, a significant fraction of AGNs are obscured by their environment, which blocks UV and optical emissions due to the dusty torus surrounding the central supermassive black hole (SMBH). To overcome this challenge, mid-infrared (IR) surveys have emerged as a valuable tool for identifying obscured AGNs, as the obscured light is re-emitted in this range. With its high sensitivity, the James Webb Space Telescope (JWST) has uncovered fainter objects than previous telescopes. By applying SED fitting, this work investigates AGN candidates in JWST Cosmic Evolution Early Release Science (CEERS) fields. We identified 42 candidates: 30 are classified as composites ($0.2\leq f_{\rm AGN, IR}< 0.5$), and 12 are AGNs ($f_{\rm AGN, IR}\geq 0.5$). We report the AGN luminosity contributions and AGN number fractions as a function of redshift and total infrared luminosity, showing that previously reported increasing relations are not apparent in our sample due to the sample size. We also extend the previous results on ultra-luminous infrared galaxies (ULIRGs, $L_{\rm TIR}\geq 10^{12} L_{\odot}$) to less luminous AGNs, highlighting the power of JWST.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 15 pages, 20 figures, 4 tables. Accepted for publication in MNRAS. The 3 min summary: https://www.youtube.com/watch?v=mWUebbgUOh8

arXiv:2406.13214 [pdf, other]

Self-Explainable Temporal Graph Networks based on Graph Information Bottleneck

Authors: Sangwoo Seo, Sungwon Kim, Jihyeong Jung, Yoonho Lee, Chanyoung Park

Abstract: Temporal Graph Neural Networks (TGNN) have the ability to capture both the graph topology and dynamic dependencies of interactions within a graph over time. There has been a growing need to explain the predictions of TGNN models due to the difficulty in identifying how past events influence their predictions. Since the explanation model for a static graph cannot be readily applied to temporal graphs due to its inability to capture temporal dependencies, recent studies proposed explanation models for temporal graphs. However, existing explanation models for temporal graphs rely on post-hoc explanations, requiring separate models for prediction and explanation, which is limited in two aspects: efficiency and accuracy of explanation. In this work, we propose a novel built-in explanation framework for temporal graphs, called Self-Explainable Temporal Graph Networks based on Graph Information Bottleneck (TGIB). TGIB provides explanations for event occurrences by introducing stochasticity in each temporal event based on the Information Bottleneck theory. Experimental results demonstrate the superiority of TGIB in terms of both the link prediction performance and explainability compared to state-of-the-art methods. This is the first work that simultaneously performs prediction and explanation for temporal graphs in an end-to-end manner.

Submitted 19 June, 2024; originally announced June 2024.

Comments: KDD 2024

Showing 1–50 of 7,494 results for author: Kim, S