Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models

Authors: Sooyeon Go, Kyungmook Choi, Minjung Shin, Youngjung Uh

Abstract: As pretrained text-to-image diffusion models have become a useful tool for image synthesis, people want to specify the results in various ways. In this paper, we introduce a method to produce results with the same structure of a target image but painted with colors from a reference image, i.e., appearance transfer, especially following the semantic correspondence between the result and the reference. E.g., the result wing takes color from the reference wing, not the reference head. Existing methods rely on the query-key similarity within self-attention layer, usually producing defective results. To this end, we propose to find semantic correspondences and explicitly rearrange the features according to the semantic correspondences. Extensive experiments show the superiority of our method in various aspects: preserving the structure of the target and reflecting the color from the reference according to the semantic correspondences, even when the two images are not aligned.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Project page: https://sooyeon-go.github.io/eye_for_an_eye/

arXiv:2404.16721 [pdf, other]

Distilling Privileged Information for Dubins Traveling Salesman Problems with Neighborhoods

Authors: Min Kyu Shin, Su-Jeong Park, Seung-Keol Ryu, Heeyeon Kim, Han-Lim Choi

Abstract: This paper presents a novel learning approach for Dubins Traveling Salesman Problems (DTSP) with Neighborhood (DTSPN) to quickly produce a tour of a non-holonomic vehicle passing through neighborhoods of given task points. The method involves two learning phases: initially, a model-free reinforcement learning approach leverages privileged information to distill knowledge from expert trajectories generated by the Lin-Kernighan heuristic (LKH) algorithm. Subsequently, a supervised learning phase trains an adaptation network to solve problems independently of privileged information. Before the first learning phase, a parameter initialization technique using the demonstration data was also devised to enhance training efficiency. The proposed learning method produces a solution about 50 times faster than LKH and substantially outperforms other imitation learning and RL with demonstration schemes, most of which fail to sense all the task points.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures, double blind under review

arXiv:2404.11835 [pdf, other]

CAUS: A Dataset for Question Generation based on Human Cognition Leveraging Large Language Models

Authors: Minjung Shin, Donghyun Kim, Jeh-Kwang Ryu

Abstract: We introduce the Curious About Uncertain Scene (CAUS) dataset, designed to enable Large Language Models, specifically GPT-4, to emulate human cognitive processes for resolving uncertainties. Leveraging this dataset, we investigate the potential of LLMs to engage in questioning effectively. Our approach involves providing scene descriptions embedded with uncertainties to stimulate the generation of reasoning and queries. The queries are then classified according to multi-dimensional criteria. All procedures are facilitated by a collaborative system involving both LLMs and human researchers. Our results demonstrate that GPT-4 can effectively generate pertinent questions and grasp their nuances, particularly when given appropriate context and instructions. The study suggests that incorporating human-like questioning into AI models improves their ability to manage uncertainties, paving the way for future advancements in Artificial Intelligence (AI).

Submitted 19 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: 8 pages, 4 figures and 3 tables. This work has been accepted for presentation as a poster with full paper publication at CogSci 2024. This is the final submission

arXiv:2404.08611 [pdf, other]

Automatic Quantification of Serial PET/CT Images for Pediatric Hodgkin Lymphoma Patients Using a Longitudinally-Aware Segmentation Network

Authors: Xin Tie, Muheon Shin, Changhee Lee, Scott B. Perlman, Zachary Huemann, Amy J. Weisman, Sharon M. Castellino, Kara M. Kelly, Kathleen M. McCarten, Adina L. Alazraki, Junjie Hu, Steve Y. Cho, Tyler J. Bradshaw

Abstract: $\textbf{Purpose}$: Automatic quantification of longitudinal changes in PET scans for lymphoma patients has proven challenging, as residual disease in interim-therapy scans is often subtle and difficult to detect. Our goal was to develop a longitudinally-aware segmentation network (LAS-Net) that can quantify serial PET/CT images for pediatric Hodgkin lymphoma patients. $\textbf{Materials and Methods}$: This retrospective study included baseline (PET1) and interim (PET2) PET/CT images from 297 patients enrolled in two Children's Oncology Group clinical trials (AHOD1331 and AHOD0831). LAS-Net incorporates longitudinal cross-attention, allowing relevant features from PET1 to inform the analysis of PET2. Model performance was evaluated using Dice coefficients for PET1 and detection F1 scores for PET2. Additionally, we extracted and compared quantitative PET metrics, including metabolic tumor volume (MTV) and total lesion glycolysis (TLG) in PET1, as well as qPET and $Δ$SUVmax in PET2, against physician measurements. We quantified their agreement using Spearman's $ρ$ correlations and employed bootstrap resampling for statistical analysis. $\textbf{Results}$: LAS-Net detected residual lymphoma in PET2 with an F1 score of 0.606 (precision/recall: 0.615/0.600), outperforming all comparator methods (P<0.01). For baseline segmentation, LAS-Net achieved a mean Dice score of 0.772. In PET quantification, LAS-Net's measurements of qPET, $Δ$SUVmax, MTV and TLG were strongly correlated with physician measurements, with Spearman's $ρ$ of 0.78, 0.80, 0.93 and 0.96, respectively. The performance remained high, with a slight decrease, in an external testing cohort. $\textbf{Conclusion}$: LAS-Net achieved high performance in quantifying PET metrics across serial scans, highlighting the value of longitudinal awareness in evaluating multi-time-point imaging datasets.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 6 figures, 4 tables in the main text

arXiv:2403.18503 [pdf, other]

Distributional Treatment Effect with Latent Rank Invariance

Authors: Myungkou Shin

Abstract: Treatment effect heterogeneity is of great concern when evaluating a treatment. However, even in the simple case of a binary random treatment, the distribution of the treatment effect is difficult to identify due to the fundamental limitation that we cannot observe both the treated potential outcome and the untreated potential outcome for a given individual. This paper imposes a conditional independence assumption that the two potential outcomes are independent of each other given a scalar latent variable. Using two proxy variables, we identify the conditional distribution of the potential outcomes given the latent variable. To pin down the location of the latent variable, we assume strict monotonicity on some functional of the conditional distribution; with the specific example of a strictly increasing conditional expectation, we label the latent variable as 'latent rank' and motivate the identifying assumption as 'latent rank invariance.'

Submitted 6 June, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.03468 [pdf, other]

Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator

Authors: Wonhyeok Choi, Mingyu Shin, Hyukzae Lee, Jaehoon Cho, Jaehyeon Park, Sunghoon Im

Abstract: Real-time processing is crucial in autonomous driving systems due to the imperative of instantaneous decision-making and rapid response. In real-world scenarios, autonomous vehicles are continuously tasked with interpreting their surroundings, analyzing intricate sensor data, and making decisions within split seconds to ensure safety through numerous computer vision tasks. In this paper, we present a new real-time multi-task network adept at three vital autonomous driving tasks: monocular 3D object detection, semantic segmentation, and dense depth estimation. To counter the challenge of negative transfer, which is the prevalent issue in multi-task learning, we introduce a task-adaptive attention generator. This generator is designed to automatically discern interrelations across the three tasks and arrange the task-sharing pattern, all while leveraging the efficiency of the hard-parameter sharing approach. To the best of our knowledge, the proposed model is pioneering in its capability to concurrently handle multiple tasks, notably 3D object detection, while maintaining real-time processing speeds. Our rigorously optimized network, when tested on the Cityscapes-3D datasets, consistently outperforms various baseline models. Moreover, an in-depth ablation study substantiates the efficacy of the methodologies integrated into our framework.

Submitted 6 March, 2024; originally announced March 2024.

Comments: Accepted at ICRA 2024

arXiv:2402.18748 [pdf, other]

Fast Bootstrapping Nonparametric Maximum Likelihood for Latent Mixture Models

Authors: Shijie Wang, Minsuk Shin, Ray Bai

Abstract: Estimating the mixing density of a latent mixture model is an important task in signal processing. Nonparametric maximum likelihood estimation is one popular approach to this problem. If the latent variable distribution is assumed to be continuous, then bootstrapping can be used to approximate it. However, traditional bootstrapping requires repeated evaluations on resampled data and is not scalable. In this letter, we construct a generative process to rapidly produce nonparametric maximum likelihood bootstrap estimates. Our method requires only a single evaluation of a novel two-stage optimization algorithm. Simulations and real data analyses demonstrate that our procedure accurately estimates the mixing density with little computational cost even when there are a hundred thousand observations.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 6 pages (main article is 4 pages, one page of references, and one page Appendix). 5 figures and 4 tables. This paper supersedes a previously circulated technical report by S. Wang and M. Shin (arXiv:2006.00767v2.pdf)

arXiv:2401.07716 [pdf, other]

Layerwise Quantum Convolutional Neural Networks Provide a Unified Way for Estimating Fundamental Properties of Quantum Information Theory

Authors: Myeongjin Shin, Seungwoo Lee, Mingyu Lee, Donghwa Ji, Hyeonjun Yeo, Harrison J. Lee, Kabgyun Jeong

Abstract: The estimation of fundamental properties in quantum information theory, including von Neumann entropy, Rényi entropy, Tsallis entropy, quantum relative entropy, trace distance, and fidelity, has received significant attention. While various algorithms exist for individual property estimation, a unified approach is lacking. This paper proposes a unified methodology using Layerwise Quantum Convolutional Neural Networks (LQCNN). Recent studies exploring parameterized quantum circuits for property estimation face challenges such as barren plateaus and complexity issues in large qubit states. In contrast, our work overcomes these challenges, avoiding barren plateaus and providing a practical solution for large qubit states. Our first contribution offers a mathematical proof that the LQCNN structure preserves fundamental properties. Furthermore, our second contribution analyzes the algorithm's complexity, demonstrating its avoidance of barren plateaus through a structured local cost function.

Submitted 15 January, 2024; originally announced January 2024.

Comments: 9 pages, 1 figure

arXiv:2401.05400 [pdf]

Collaborative Learning with Artificial Intelligence Speakers (CLAIS): Pre-Service Elementary Science Teachers' Responses to the Prototype

Authors: Gyeong-Geon Lee, Seonyeong Mun, Myeong-Kyeong Shin, Xiaoming Zhai

Abstract: This research aims to demonstrate that AI can function not only as a tool for learning, but also as an intelligent agent with which humans can engage in collaborative learning (CL) to change epistemic practices in science classrooms. We adopted a design and development research approach, following the Analysis, Design, Development, Implementation and Evaluation (ADDIE) model, to prototype a tangible instructional system called Collaborative Learning with AI Speakers (CLAIS). The CLAIS system is designed to have 3-4 human learners join an AI speaker to form a small group, where humans and AI are considered as peers participating in the Jigsaw learning process. The development was carried out using the NUGU AI speaker platform. The CLAIS system was successfully implemented in a Science Education course session with 15 pre-service elementary science teachers. The participants evaluated the CLAIS system through mixed methods surveys as teachers, learners, peers, and users. Quantitative data showed that the participants' Intelligent-Technological, Pedagogical, And Content Knowledge was significantly increased after the CLAIS session, the perception of the CLAIS learning experience was positive, the peer assessment on AI speakers and human peers was different, and the user experience was ambivalent. Qualitative data showed that the participants anticipated future changes in the epistemic process in science classrooms, while acknowledging technical issues such as speech recognition performance and response latency. This study highlights the potential of Human-AI Collaboration for knowledge co-construction in authentic classroom settings and exemplifies how AI could shape the future landscape of epistemic practices in the classroom.

Submitted 19 December, 2023; originally announced January 2024.

arXiv:2401.02694 [pdf, other]

Nonconvex High-Dimensional Time-Varying Coefficient Estimation for Noisy High-Frequency Observations

Authors: Minseok Shin, Donggyu Kim

Abstract: In this paper, we propose a novel high-dimensional time-varying coefficient estimator for noisy high-frequency observations. In high-frequency finance, we often observe that noises dominate a signal of an underlying true process. Thus, we cannot apply usual regression procedures to analyze noisy high-frequency observations. To handle this issue, we first employ a smoothing method for the observed variables. However, the smoothed variables still contain non-negligible noises. To manage these non-negligible noises and the high dimensionality, we propose a nonconvex penalized regression method for each local coefficient. This method produces consistent but biased local coefficient estimators. To estimate the integrated coefficients, we propose a debiasing scheme and obtain a debiased integrated coefficient estimator using debiased local coefficient estimators. Then, to further account for the sparsity structure of the coefficients, we apply a thresholding scheme to the debiased integrated coefficient estimator. We call this scheme the Thresholded dEbiased Nonconvex LASSO (TEN-LASSO) estimator. Furthermore, this paper establishes the concentration properties of the TEN-LASSO estimator and discusses a nonconvex optimization algorithm.

Submitted 5 January, 2024; originally announced January 2024.

Comments: 54 pages, 5 figures

arXiv:2401.01075 [pdf, other]

Depth-discriminative Metric Learning for Monocular 3D Object Detection

Authors: Wonhyeok Choi, Mingyu Shin, Sunghoon Im

Abstract: Monocular 3D object detection poses a significant challenge due to the lack of depth information in RGB images. Many existing methods strive to enhance the object depth estimation performance by allocating additional parameters for object depth estimation, utilizing extra modules or data. In contrast, we introduce a novel metric learning scheme that encourages the model to extract depth-discriminative features regardless of the visual attributes without increasing inference time and model size. Our method employs the distance-preserving function to organize the feature space manifold in relation to ground-truth object depth. The proposed (K, B, eps)-quasi-isometric loss leverages predetermined pairwise distance restriction as guidance for adjusting the distance among object descriptors without disrupting the non-linearity of the natural feature manifold. Moreover, we introduce an auxiliary head for object-wise depth estimation, which enhances depth quality while maintaining the inference time. The broad applicability of our method is demonstrated through experiments that show improvements in overall performance when integrated into various baselines. The results show that our method consistently improves the performance of various baselines by 23.51% and 5.78% on average across KITTI and Waymo, respectively.

Submitted 2 January, 2024; originally announced January 2024.

Comments: Accepted at NeurIPS 2023

arXiv:2312.15964

Semantic Guidance Tuning for Text-To-Image Diffusion Models

Authors: Hyun Kang, Dohae Lee, Myungjin Shin, In-Kwon Lee

Abstract: Recent advancements in Text-to-Image (T2I) diffusion models have demonstrated impressive success in generating high-quality images with zero-shot generalization capabilities. Yet, current models struggle to closely adhere to prompt semantics, often misrepresenting or overlooking specific attributes. To address this, we propose a simple, training-free approach that modulates the guidance direction of diffusion models during inference. We first decompose the prompt semantics into a set of concepts, and monitor the guidance trajectory in relation to each concept. Our key observation is that deviations in the model's adherence to prompt semantics are highly correlated with divergence of the guidance from one or more of these concepts. Based on this observation, we devise a technique to steer the guidance direction towards any concept from which the model diverges. Extensive experimentation validates that our method improves the semantic alignment of images generated by diffusion models in response to prompts. Project page is available at: https://korguy.github.io/

Submitted 29 January, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

Comments: Rework is being done

arXiv:2312.13947 [pdf, other]

PhysRFANet: Physics-Guided Neural Network for Real-Time Prediction of Thermal Effect During Radiofrequency Ablation Treatment

Authors: Minwoo Shin, Minjee Seo, Seonaeng Cho, Juil Park, Joon Ho Kwon, Deukhee Lee, Kyungho Yoon

Abstract: Radiofrequency ablation (RFA) is a widely used minimally invasive technique for ablating solid tumors. Achieving precise personalized treatment necessitates feedback information on in situ thermal effects induced by the RFA procedure. While computer simulation facilitates the prediction of electrical and thermal phenomena associated with RFA, its practical implementation in clinical settings is hindered by high computational demands. In this paper, we propose a physics-guided neural network model, named PhysRFANet, to enable real-time prediction of thermal effect during RFA treatment. The networks, designed for predicting temperature distribution and the corresponding ablation lesion, were trained using biophysical computational models that integrated electrostatics, bio-heat transfer, and cell necrosis, alongside magnetic resonance (MR) images of breast cancer patients. Validation of the computational model was performed through experiments on ex vivo bovine liver tissue. Our model demonstrated a 96% Dice score in predicting the lesion volume and an RMSE of 0.4854 for temperature distribution when tested with foreseen tumor images. Notably, even with unforeseen images, it achieved a 93% Dice score for the ablation lesion and an RMSE of 0.6783 for temperature distribution. All networks were capable of inferring results within 10 ms. The presented technique, applied to optimize the placement of the electrode for a specific target region, holds significant promise in enhancing the safety and efficacy of RFA treatments.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2311.16466 [pdf, other]

Large language models can enhance persuasion through linguistic feature alignment

Authors: Minkyu Shin, Jin Kim

Abstract: Although large language models (LLMs) are reshaping various aspects of human life, our current understanding of their impacts remains somewhat constrained. Here we investigate the impact of LLMs on human communication, using data on consumer complaints in the financial industry. By employing an AI detection tool on more than 820K complaints gathered by the Consumer Financial Protection Bureau (CFPB), we find a sharp increase in the likely use of LLMs shortly after the release of ChatGPT. Moreover, the likely LLM usage was positively correlated with message persuasiveness (i.e., increased likelihood of obtaining relief from financial firms). Computational linguistic analyses suggest that the positive correlation may be explained by LLMs' enhancement of various linguistic features. Based on the results of these observational studies, we hypothesize that LLM usage may enhance a comprehensive set of linguistic features, increasing message persuasiveness to receivers with heterogeneous linguistic preferences (i.e., linguistic feature alignment). We test this hypothesis in preregistered experiments and find support for it. As an instance of early empirical demonstrations of LLM usage for enhancing persuasion, our research highlights the transformative potential of LLMs in human communication.

Submitted 12 February, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.04915 [pdf]

doi 10.19066/cogsci.2024.35.1.002

Chain of Empathy: Enhancing Empathetic Response of Large Language Models Based on Psychotherapy Models

Authors: Yoon Kyung Lee, Inju Lee, Minjung Shin, Seoyeon Bae, Sowon Hahn

Abstract: We present a novel method, the Chain of Empathy (CoE) prompting, that utilizes insights from psychotherapy to induce Large Language Models (LLMs) to reason about human emotional states. This method is inspired by various psychotherapy approaches including Cognitive Behavioral Therapy (CBT), Dialectical Behavior Therapy (DBT), Person Centered Therapy (PCT), and Reality Therapy (RT), each leading to different patterns of interpreting clients' mental states. LLMs without reasoning generated predominantly exploratory responses. However, when LLMs used CoE reasoning, we found a more comprehensive range of empathetic responses aligned with the different reasoning patterns of each psychotherapy model. The CBT based CoE resulted in the most balanced generation of empathetic responses. The findings underscore the importance of understanding the emotional context and how it affects human and AI communication. Our research contributes to understanding how psychotherapeutic models can be incorporated into LLMs, facilitating the development of context-specific, safer, and empathetic AI.

Submitted 13 December, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

Journal ref: Korean Journal of Cognitive Science. 2024, Vol. 35 Issue 1, p23-48. 26p

arXiv:2309.10066 [pdf, other]

doi 10.1007/s10278-024-00985-3

Automatic Personalized Impression Generation for PET Reports Using Large Language Models

Authors: Xin Tie, Muheon Shin, Ali Pirasteh, Nevein Ibrahim, Zachary Huemann, Sharon M. Castellino, Kara M. Kelly, John Garrett, Junjie Hu, Steve Y. Cho, Tyler J. Bradshaw

Abstract: In this study, we aimed to determine if fine-tuned large language models (LLMs) can generate accurate, personalized impressions for whole-body PET reports. Twelve language models were trained on a corpus of PET reports using the teacher-forcing algorithm, with the report findings as input and the clinical impressions as reference. An extra input token encodes the reading physician's identity, allowing models to learn physician-specific reporting styles. Our corpus comprised 37,370 retrospective PET reports collected from our institution between 2010 and 2022. To identify the best LLM, 30 evaluation metrics were benchmarked against quality scores from two nuclear medicine (NM) physicians, with the most aligned metrics selecting the model for expert evaluation. In a subset of data, model-generated impressions and original clinical impressions were assessed by three NM physicians according to 6 quality dimensions (3-point scale) and an overall utility score (5-point scale). Each physician reviewed 12 of their own reports and 12 reports from other physicians. Bootstrap resampling was used for statistical analysis. Of all evaluation metrics, domain-adapted BARTScore and PEGASUSScore showed the highest Spearman's rank correlations (0.568 and 0.563) with physician preferences. Based on these metrics, the fine-tuned PEGASUS model was selected as the top LLM. When physicians reviewed PEGASUS-generated impressions in their own style, 89% were considered clinically acceptable, with a mean utility score of 4.08 out of 5. Physicians rated these personalized impressions as comparable in overall utility to the impressions dictated by other physicians (4.03, P=0.41). In conclusion, personalized impressions generated by PEGASUS were clinically useful, highlighting its potential to expedite PET reporting.

Submitted 17 October, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: 25 pages in total. 6 figures and 3 tables in the main body. The manuscript has been submitted to a journal for potential publication

Journal ref: J Digit Imaging. Inform. Med. (2024)

arXiv:2309.01363 [pdf, other]

Mutual Information Maximizing Quantum Generative Adversarial Network and Its Applications in Finance

Authors: Mingyu Lee, Myeongjin Shin, Junseo Lee, Kabgyun Jeong

Abstract: One of the most promising applications in the era of NISQ (Noisy Intermediate-Scale Quantum) computing is quantum machine learning. Quantum machine learning offers significant quantum advantages over classical machine learning across various domains. Specifically, generative adversarial networks have been recognized for their potential utility in diverse fields such as image generation, finance, and probability distribution modeling. However, these networks necessitate solutions for inherent challenges like mode collapse. In this study, we capitalize on the concept that the estimation of mutual information between high-dimensional continuous random variables can be achieved through gradient descent using neural networks. We introduce a novel approach named InfoQGAN, which employs the Mutual Information Neural Estimator (MINE) within the framework of quantum generative adversarial networks to tackle the mode collapse issue. Furthermore, we elaborate on how this approach can be applied to a financial scenario, specifically addressing the problem of generating portfolio return distributions through dynamic asset allocation. This illustrates the potential practical applicability of InfoQGAN in real-world contexts.

Submitted 4 September, 2023; originally announced September 2023.

Comments: 15 pages, 15 figures

arXiv:2308.15839 [pdf, other]

Utilizing Task-Generic Motion Prior to Recover Full-Body Motion from Very Sparse Signals

Authors: Myungjin Shin, Dohae Lee, In-Kwon Lee

Abstract: The most popular type of devices used to track a user's posture in a virtual reality experience consists of a head-mounted display and two controllers held in both hands. However, due to the limited number of tracking sensors (three in total), faithfully recovering the user in full-body is challenging, limiting the potential for interactions among simulated user avatars within the virtual world. Therefore, recent studies have attempted to reconstruct full-body poses using neural networks that utilize previously learned human poses or accept a series of past poses over a short period. In this paper, we propose a method that utilizes information from a neural motion prior to improve the accuracy of reconstructed user's motions. Our approach aims to reconstruct user's full-body poses by predicting the latent representation of the user's overall motion from limited input signals and integrating this information with tracking sensor inputs. This is based on the premise that the ultimate goal of pose reconstruction is to reconstruct the motion, which is a series of poses. Our results show that this integration enables more accurate reconstruction of the user's full-body motion, particularly enhancing the robustness of lower body motion reconstruction from impoverished signals. Web: https://mjsh34.github.io/mp-sspe/

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2306.14566 [pdf, other]

doi 10.1007/s11128-023-04253-1

Estimating Quantum Mutual Information Through a Quantum Neural Network

Authors: Myeongjin Shin, Junseo Lee, Kabgyun Jeong

Abstract: We propose a method of quantum machine learning called quantum mutual information neural estimation (QMINE) for estimating von Neumann entropy and quantum mutual information, which are fundamental properties in quantum information theory. The QMINE proposed here basically utilizes a technique of quantum neural networks (QNNs), to minimize a loss function that determines the von Neumann entropy, and thus quantum mutual information, which is believed more powerful to process quantum datasets than conventional neural networks due to quantum superposition and entanglement. To create a precise loss function, we propose a quantum Donsker-Varadhan representation (QDVR), which is a quantum analog of the classical Donsker-Varadhan representation. By exploiting a parameter shift rule on parameterized quantum circuits, we can efficiently implement and optimize the QNN and estimate the quantum entropies using the QMINE technique. Furthermore, numerical observations support our predictions of QDVR and demonstrate the good performance of QMINE.

Submitted 11 February, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: 10 pages, 4 figures

Journal ref: Quantum Information Processing 23, 57 (2024)

arXiv:2306.11244 [pdf, ps, other]

A Collision-Based Hybrid Method for the BGK Equation

Authors: Minwoo Shin, Cory D. Hauck, Ryan G. McClarren

Abstract: We apply the collision-based hybrid introduced in \cite{hauck} to the Boltzmann equation with the BGK operator and a hyperbolic scaling. An implicit treatment of the source term is used to handle stiffness associated with the BGK operator. Although it helps the numerical scheme become stable with a large time step size, it is still not obvious how to achieve the desired order of accuracy due to the relationship between the size of the spatial cell and the mean free path. Without an asymptotic-preserving property, a very restricted grid size is required to resolve the mean free path, which is not practical. Our approaches are based on the noncollision-collision decomposition of the BGK equation. We introduce the arbitrary order of nodal discontinuous Galerkin (DG) discretization in space with a semi-implicit time-stepping method; we employ the backward Euler time integration for the uncollided equation and the 2nd order predictor-corrector scheme for the collided equation, i.e., both source terms in uncollided and collided equations are treated implicitly and only the streaming term in the collided equation is solved explicitly. This improves the computational efficiency without the complexity of the numerical implementation. Numerical results are presented for various Knudsen numbers to present the effectiveness and accuracy of our hybrid method. Also, we compare the solutions of the hybrid and non-hybrid schemes.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.04635 [pdf]

A New Paradigm Integrating the Concepts of Particle Abrasion and Breakage

Authors: Priya Tripathi, Seung Jae Lee, Moochul Shin, Chang Hoon Lee

Abstract: This paper introduces a new paradigm that integrates the concepts of particle abrasion and breakage. Both processes can co-occur under loading as soil particles are subjected to friction as well as collisions between particles. Therefore, the significance of this integrating paradigm lies in its ability to address both abrasion and breakage in a single framework. The new paradigm is mapped out in a framework called the 'particle geometry space.' The x-axis corresponds to the surface-area-to-volume ratio ($A/V$), while the y-axis represents volume ($V$). This space facilitates a holistic characterization of the four-particle geometry features, i.e., shape ($β$) and size ($D$) as well as surface area ($A$) and volume ($V$). Three distinct paths (abrasion, breakage, and equally-occurring abrasion and breakage processes), three limit lines (breakage line, sphere line, and average shape-conserving line), and five different zones are defined in the particle geometry space. Consequently, this approach enables us to systematically relate the extent of co-occurring abrasion and breakage to the particle geometry evolution.

Submitted 3 September, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: 10 pages, 5 figures; Introduced a new term 'particle geometry space'; Improved readability and fixed minor typos;

arXiv:2305.06464 [pdf, ps, other]

Brauertsch fields

Authors: Daniel Krashen, Max Lieblich, Minseon Shin

Abstract: We prove a local-to-global principle for Brauer classes: for any finite collection of non-trivial Brauer classes on a variety over a field of transcendence degree at least 3, there are infinitely many specializations where each class stays non-trivial. This is deduced from a Grothendieck--Lefschetz-type theorem for Brauer groups of certain smooth stacks. This also leads to the notion of a Brauertsch field.

Submitted 10 May, 2023; originally announced May 2023.

arXiv:2305.01148 [pdf, other]

PU-EdgeFormer: Edge Transformer for Dense Prediction in Point Cloud Upsampling

Authors: Dohoon Kim, Minwoo Shin, Joonki Paik

Abstract: Despite the recent development of deep learning-based point cloud upsampling, most MLP-based point cloud upsampling methods have limitations in that it is difficult to train the local and global structure of the point cloud at the same time. To solve this problem, we present a combined graph convolution and transformer for point cloud upsampling, denoted by PU-EdgeFormer. The proposed method constructs an EdgeFormer unit that consists of graph convolution and multi-head self-attention modules. We employ graph convolution using EdgeConv, which learns the local geometry and global structure of a point cloud better than the existing point-to-feature method. Through in-depth experiments, we confirmed that the proposed method has better point cloud upsampling performance than the existing state-of-the-art method in both subjective and objective aspects. The code is available at https://github.com/dohoon2045/PU-EdgeFormer.

Submitted 1 May, 2023; originally announced May 2023.

Comments: Accepted to ICASSP 2023

arXiv:2305.00108 [pdf, other]

doi 10.3847/1538-4365/acdee1

A data science platform to enable time-domain astronomy

Authors: Michael W. Coughlin, Joshua S. Bloom, Guy Nir, Sarah Antier, Theophile Jegou du Laz, Stéfan van der Walt, Arien Crellin-Quick, Thomas Culino, Dmitry A. Duev, Daniel A. Goldstein, Brian F. Healy, Viraj Karambelkar, Jada Lilleboe, Kyung Min Shin, Leo P. Singer, Tomas Ahumada, Shreya Anand, Eric C. Bellm, Richard Dekany, Matthew J. Graham, Mansi M. Kasliwal, Ivona Kostadinova, R. Weizmann Kiendrebeogo, Shrinivas R. Kulkarni, Sydney Jenkins, et al. (28 additional authors not shown)

Abstract: SkyPortal is an open-source software package designed to efficiently discover interesting transients, manage follow-up, perform characterization, and visualize the results. By enabling fast access to archival and catalog data, cross-matching heterogeneous data streams, and the triggering and monitoring of on-demand observations for further characterization, a SkyPortal-based platform has been operating at scale for 2 yr for the Zwicky Transient Facility Phase II community, with hundreds of users, containing tens of millions of time-domain sources, interacting with dozens of telescopes, and enabling community reporting. While SkyPortal emphasizes rich user experiences (UX) across common frontend workflows, recognizing that scientific inquiry is increasingly performed programmatically, SkyPortal also surfaces an extensive and well-documented API system. From backend and frontend software to data science analysis tools and visualization frameworks, the SkyPortal design emphasizes the re-use and leveraging of best-in-class approaches, with a strong extensibility ethos. For instance, SkyPortal now leverages ChatGPT large-language models (LLMs) to automatically generate and surface source-level human-readable summaries. With the imminent re-start of the next-generation of gravitational wave detectors, SkyPortal now also includes dedicated multi-messenger features addressing the requirements of rapid multi-messenger follow-up: multi-telescope management, team/group organizing interfaces, and cross-matching of multi-messenger data streams with time-domain optical surveys, with interfaces sufficiently intuitive for the newcomers to the field. (abridged)

Submitted 14 June, 2023; v1 submitted 28 April, 2023; originally announced May 2023.

Comments: Accepted to ApJS

arXiv:2303.07462 [pdf]

doi 10.1073/pnas.2214840120

Superhuman Artificial Intelligence Can Improve Human Decision Making by Increasing Novelty

Authors: Minkyu Shin, Jin Kim, Bas van Opheusden, Thomas L. Griffiths

Abstract: How will superhuman artificial intelligence (AI) affect human decision making? And what will be the mechanisms behind this effect? We address these questions in a domain where AI already exceeds human performance, analyzing more than 5.8 million move decisions made by professional Go players over the past 71 years (1950-2021). To address the first question, we use a superhuman AI program to estimate the quality of human decisions across time, generating 58 billion counterfactual game patterns and comparing the win rates of actual human decisions with those of counterfactual AI decisions. We find that humans began to make significantly better decisions following the advent of superhuman AI. We then examine human players' strategies across time and find that novel decisions (i.e., previously unobserved moves) occurred more frequently and became associated with higher decision quality after the advent of superhuman AI. Our findings suggest that the development of superhuman AI programs may have prompted human players to break away from traditional strategies and induced them to explore novel moves, which in turn may have improved their decision-making.

Submitted 14 April, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: This paper is published in PNAS: https://www.pnas.org/doi/10.1073/pnas.2214840120 Minor edits to v1 include the addition of watermark and link to the published paper in the footer

MSC Class: 68T01; 68T05; 68T35; 68T99 ACM Class: I.2.0; I.2.1; I.2.6; I.2.m

Journal ref: Proceedings of the National Academy of Sciences, 120 (12), e2214840120 (2023)

arXiv:2302.13658 [pdf, other]

Robust High-Dimensional Time-Varying Coefficient Estimation

Authors: Minseok Shin, Donggyu Kim

Abstract: In this paper, we develop a novel high-dimensional coefficient estimation procedure based on high-frequency data. Unlike usual high-dimensional regression procedures such as LASSO, we additionally handle the heavy-tailedness of high-frequency observations as well as time variations of coefficient processes. Specifically, we employ the Huber loss and a truncation scheme to handle heavy-tailed observations, while $\ell_{1}$-regularization is adopted to overcome the curse of dimensionality. To account for the time-varying coefficient, we estimate local coefficients which are biased due to the $\ell_{1}$-regularization. Thus, when estimating integrated coefficients, we propose a debiasing scheme to enjoy the law of large numbers property and employ a thresholding scheme to further accommodate the sparsity of the coefficients. We call this the Robust thrEsholding Debiased LASSO (RED-LASSO) estimator. We show that the RED-LASSO estimator can achieve a near-optimal convergence rate. In the empirical study, we apply the RED-LASSO procedure to the high-dimensional integrated coefficient estimation using high-frequency trading data.

Submitted 8 October, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 55 pages, 5 figures

arXiv:2301.09091 [pdf, other]

BallGAN: 3D-aware Image Synthesis with a Spherical Background

Authors: Minjung Shin, Yunji Seo, Jeongmin Bae, Young Sun Choi, Hyunsu Kim, Hyeran Byun, Youngjung Uh

Abstract: 3D-aware GANs aim to synthesize realistic 3D scenes such that they can be rendered in arbitrary perspectives to produce images. Although previous methods produce realistic images, they suffer from unstable training or degenerate solutions where the 3D geometry is unnatural. We hypothesize that the 3D geometry is underdetermined due to the insufficient constraint, i.e., being classified as real image to the discriminator is not enough. To solve this problem, we propose to approximate the background as a spherical surface and represent a scene as a union of the foreground placed in the sphere and the thin spherical background. It reduces the degree of freedom in the background field. Accordingly, we modify the volume rendering equation and incorporate dedicated constraints to design a novel 3D-aware GAN framework named BallGAN. BallGAN has multiple advantages as follows. 1) It produces more reasonable 3D geometry; the images of a scene across different viewpoints have better photometric consistency and fidelity than the state-of-the-art methods. 2) The training becomes much more stable. 3) The foreground can be separately rendered on top of different arbitrary backgrounds.

Submitted 24 August, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

Comments: ICCV 2023, Project Page: https://minjung-s.github.io/ballgan

arXiv:2301.08448 [pdf, other]

Source-free Subject Adaptation for EEG-based Visual Recognition

Authors: Pilhyeon Lee, Seogkyu Jeon, Sunhee Hwang, Minjung Shin, Hyeran Byun

Abstract: This paper focuses on subject adaptation for EEG-based visual recognition. It aims at building a visual stimuli recognition system customized for the target subject whose EEG samples are limited, by transferring knowledge from abundant data of source subjects. Existing approaches consider the scenario that samples of source subjects are accessible during training. However, it is often infeasible and problematic to access personal biological data like EEG signals due to privacy issues. In this paper, we introduce a novel and practical problem setup, namely source-free subject adaptation, where the source subject data are unavailable and only the pre-trained model parameters are provided for subject adaptation. To tackle this challenging problem, we propose classifier-based data generation to simulate EEG samples from source subjects using classifier responses. Using the generated samples and target subject data, we perform subject-independent feature learning to exploit the common knowledge shared across different subjects. Notably, our framework is generalizable and can adopt any subject-independent learning method. In the experiments on the EEG-ImageNet40 benchmark, our model brings consistent improvements regardless of the choice of subject-independent learning. Also, our method shows promising performance, recording top-1 test accuracy of 74.6% under the 5-shot setting even without relying on source data. Our code can be found at https://github.com/DeepBCI/Deep-BCI/tree/master/1_Intelligent_BCI/Source_Free_Subject_Adaptation_for_EEG.

Submitted 20 January, 2023; originally announced January 2023.

Comments: Accepted by the 11th IEEE International Winter Conference on Brain-Computer Interface (BCI 2023). Code is available at https://github.com/DeepBCI/Deep-BCI

arXiv:2301.03661 [pdf, other]

Generative Quantile Regression with Variability Penalty

Authors: Shijie Wang, Minsuk Shin, Ray Bai

Abstract: Quantile regression and conditional density estimation can reveal structure that is missed by mean regression, such as multimodality and skewness. In this paper, we introduce a deep learning generative model for joint quantile estimation called Penalized Generative Quantile Regression (PGQR). Our approach simultaneously generates samples from many random quantile levels, allowing us to infer the conditional distribution of a response variable given a set of covariates. Our method employs a novel variability penalty to avoid the problem of vanishing variability, or memorization, in deep generative models. Further, we introduce a new family of partial monotonic neural networks (PMNN) to circumvent the problem of crossing quantile curves. A major benefit of PGQR is that it can be fit using a single optimization, thus bypassing the need to repeatedly train the model at multiple quantile levels or use computationally expensive cross-validation to tune the penalty parameter. We illustrate the efficacy of PGQR through extensive simulation studies and analysis of real datasets. Code to implement our method is available at https://github.com/shijiew97/PGQR.

Submitted 13 November, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

Comments: 41 pages, 17 figures, 4 tables. New version includes more simulation studies, comparisons to competing methods, illustrations, real data applications, and discussion of the vanishing variability phenomenon and overparameterization in deep learning. The figures are higher-resolution, and the presentation and writing have improved

arXiv:2210.13529 [pdf, other]

Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement

Authors: Junuk Cha, Muhammad Saqlain, GeonU Kim, Mingyu Shin, Seungryul Baek

Abstract: Estimating 3D poses and shapes in the form of meshes from monocular RGB images is challenging. Obviously, it is more difficult than estimating 3D poses only in the form of skeletons or heatmaps. When interacting persons are involved, the 3D mesh reconstruction becomes more challenging due to the ambiguity introduced by person-to-person occlusions. To tackle the challenges, we propose a coarse-to-fine pipeline that benefits from 1) inverse kinematics from the occlusion-robust 3D skeleton estimation and 2) Transformer-based relation-aware refinement techniques. In our pipeline, we first obtain occlusion-robust 3D skeletons for multiple persons from an RGB image. Then, we apply inverse kinematics to convert the estimated skeletons to deformable 3D mesh parameters. Finally, we apply the Transformer-based mesh refinement that refines the obtained mesh parameters considering intra- and inter-person relations of 3D meshes. Via extensive experiments, we demonstrate the effectiveness of our method, outperforming state-of-the-art methods on the 3DPW, MuPoTS and AGORA datasets.

Submitted 30 October, 2022; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: Published at ECCV 2022

arXiv:2210.11870 [pdf, other]

LittleBird: Efficient Faster & Longer Transformer for Question Answering

Authors: Minchul Lee, Kijong Han, Myeong Cheol Shin

Abstract: BERT has shown a lot of success in a wide variety of NLP tasks. But it has a limitation dealing with long inputs due to its attention mechanism. Longformer, ETC and BigBird addressed this issue and effectively solved the quadratic dependency problem. However, we find that these models are not sufficient, and propose LittleBird, a novel model based on BigBird with improved speed and memory footprint while maintaining accuracy. In particular, we devise a more flexible and efficient position representation method based on Attention with Linear Biases (ALiBi). We also show that replacing the method of global information represented in the BigBird with pack and unpack attention is more effective. The proposed model can work on long inputs even after being pre-trained on short inputs, and can be trained efficiently reusing existing pre-trained language model for short inputs. This is a significant benefit for low-resource languages where large amounts of long text data are difficult to obtain. As a result, our experiments show that LittleBird works very well in a variety of languages, achieving high performance in question answering tasks, particularly in KorQuAD2.0, Korean Question Answering Dataset for long paragraphs.

Submitted 12 April, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted to EMNLP 2022. see https://aclanthology.org/2022.emnlp-main.352/

arXiv:2208.09412 [pdf, other]

Multigroup Neutron Transport using a Collision-Based Hybrid Method

Authors: Ben Whewell, Ryan G. McClarren, Cory D. Hauck, Minwoo Shin

Abstract: A collision-based hybrid algorithm for the discrete ordinates approximation of the neutron transport equation is extended to the multigroup setting. The algorithm uses discrete energy and angle grids at two different resolutions and approximates the fission and scattering sources on the coarser grids. The coupling of a collided transport equation, discretized on the coarse grid, with an uncollided transport equation, discretized on the fine grid, yields an algorithm that, in most cases, is more efficient than the traditional multigroup approach. The improvement over existing techniques is demonstrated for time-dependent problems with different materials, geometries, and energy groups.

Submitted 2 December, 2022; v1 submitted 19 August, 2022; originally announced August 2022.

arXiv:2205.15808 [pdf, other]

Volatility Models for Stylized Facts of High-Frequency Financial Data

Authors: Donggyu Kim, Minseok Shin

Abstract: This paper introduces novel volatility diffusion models to account for the stylized facts of high-frequency financial data such as volatility clustering, intra-day U-shape, and leverage effect. For example, the daily integrated volatility of the proposed volatility process has a realized GARCH structure with an asymmetric effect on log-returns. To further explain the heavy-tailedness of the financial data, we assume that the log-returns have a finite $2b$-th moment for $b \in (1,2]$. Then, we propose a Huber regression estimator which has an optimal convergence rate of $n^{(1-b)/b}$. We also discuss how to adjust bias coming from Huber loss and show its asymptotic properties.

Submitted 31 May, 2022; originally announced May 2022.

Comments: 53 pages, 3 figures

arXiv:2204.02346 [pdf, ps, other]

Finitely Heterogeneous Treatment Effect in Event-study

Authors: Myungkou Shin

Abstract: The key assumption of the differences-in-differences approach in the event-study design is that untreated potential outcome differences are mean independent of treatment timing: the parallel trend assumption. In this paper, we relax the parallel trend assumption by assuming a latent type variable and developing a type-specific parallel trend assumption. With a finite support assumption on the latent type variable, we show that an extremum classifier consistently estimates the type assignment. Based on the classification result, we propose a type-specific diff-in-diff estimator for type-specific CATT. By estimating the CATT with regard to the latent type, we study heterogeneity in treatment effect, in addition to heterogeneity in baseline outcomes.

Submitted 13 February, 2024; v1 submitted 5 April, 2022; originally announced April 2022.

arXiv:2203.16762 [pdf, other]

Mapping Topics in 100,000 Real-life Moral Dilemmas

Authors: Tuan Dung Nguyen, Georgiana Lyall, Alasdair Tran, Minjeong Shin, Nicholas George Carroll, Colin Klein, Lexing Xie

Abstract: Moral dilemmas play an important role in theorizing both about ethical norms and moral psychology. Yet thought experiments borrowed from the philosophical literature often lack the nuances and complexity of real life. We leverage 100,000 threads -- the largest collection to date -- from Reddit's r/AmItheAsshole to examine the features of everyday moral dilemmas. Combining topic modeling with evaluation from both expert and crowd-sourced workers, we discover 47 finer-grained, meaningful topics and group them into five meta-categories. We show that most dilemmas combine at least two topics, such as family and money. We also observe that the pattern of topic co-occurrence carries interesting information about the structure of everyday moral concerns: for example, the generation of moral dilemmas from nominally neutral topics, and interaction effects in which final verdicts do not line up with the moral concerns in the original stories in any simple way. Our analysis demonstrates the utility of a fine-grained data-driven approach to online moral dilemmas, and provides a valuable resource for researchers aiming to explore the intersection of practical and theoretical ethics.

Submitted 30 March, 2022; originally announced March 2022.

Comments: To be published in ICWSM 2022

arXiv:2203.14709 [pdf, other]

MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection

Authors: Bumsoo Kim, Jonghwan Mun, Kyoung-Woon On, Minchul Shin, Junhyun Lee, Eun-Sol Kim

Abstract: Human-Object Interaction (HOI) detection is the task of identifying a set of <human, object, interaction> triplets from an image. Recent work proposed transformer encoder-decoder architectures that successfully eliminated the need for many hand-designed components in HOI detection through end-to-end training. However, they are limited to single-scale feature resolution, providing suboptimal performance in scenes containing humans, objects and their interactions with vastly different scales and distances. To tackle this problem, we propose a Multi-Scale TRansformer (MSTR) for HOI detection powered by two novel HOI-aware deformable attention modules called Dual-Entity attention and Entity-conditioned Context attention. While existing deformable attention comes at a huge cost in HOI detection performance, our proposed attention modules of MSTR learn to effectively attend to sampling points that are essential to identify interactions. In experiments, we achieve the new state-of-the-art performance on two HOI detection benchmarks.

Submitted 28 March, 2022; originally announced March 2022.

Comments: CVPR 2022

arXiv:2203.03897 [pdf, other]

Geodesic Multi-Modal Mixup for Robust Fine-Tuning

Authors: Changdae Oh, Junhyuk So, Hoyoon Byun, YongTaek Lim, Minchul Shin, Jong-June Jeon, Kyungwoo Song

Abstract: Pre-trained multi-modal models, such as CLIP, provide transferable embeddings and show promising results in diverse applications. However, the analysis of learned multi-modal embeddings is relatively unexplored, and the embedding transferability can be improved. In this work, we observe that CLIP holds separated embedding subspaces for two different modalities, and then we investigate it through the lens of uniformity-alignment to measure the quality of learned representation. Both theoretically and empirically, we show that CLIP retains poor uniformity and alignment even after fine-tuning. Such a lack of alignment and uniformity might restrict the transferability and robustness of embeddings. To this end, we devise a new fine-tuning method for robust representation equipping better alignment and uniformity. First, we propose a Geodesic Multi-Modal Mixup that mixes the embeddings of image and text to generate hard negative samples on the hypersphere. Then, we fine-tune the model on hard negatives as well as original negatives and positives with contrastive loss. Based on the theoretical analysis about hardness guarantee and limiting behavior, we justify the use of our method. Extensive experiments on retrieval, calibration, few- or zero-shot classification (under distribution shift), embedding arithmetic, and image captioning further show that our method provides transferable representations, enabling robust model adaptation on diverse tasks. Code: https://github.com/changdaeoh/multimodal-mixup

Submitted 6 November, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: To appear at NeurIPS 2023

arXiv:2202.08419 [pdf, other]

High-Dimensional Time-Varying Coefficient Estimation

Authors: Donggyu Kim, Minseok Shin

Abstract: In this paper, we develop a novel high-dimensional time-varying coefficient estimation method, based on high-dimensional Ito diffusion processes. To account for high-dimensional time-varying coefficients, we first estimate local (or instantaneous) coefficients using a time-localized Dantzig selection scheme under a sparsity condition, which results in biased local coefficient estimators due to the regularization. To handle the bias, we propose a debiasing scheme, which provides well-performing unbiased local coefficient estimators. With the unbiased local coefficient estimators, we estimate the integrated coefficient, and to further account for the sparsity of the coefficient process, we apply thresholding schemes. We call this Thresholding dEbiased Dantzig (TED). We establish asymptotic properties of the proposed TED estimator. In the empirical analysis, we apply the TED procedure to analyzing high-dimensional factor models using high-frequency data.

Submitted 8 October, 2023; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: 50 pages, 5 figures

arXiv:2202.02901 [pdf, other]

Inter-subject Contrastive Learning for Subject Adaptive EEG-based Visual Recognition

Authors: Pilhyeon Lee, Sunhee Hwang, Jewook Lee, Minjung Shin, Seogkyu Jeon, Hyeran Byun

Abstract: This paper tackles the problem of subject adaptive EEG-based visual recognition. Its goal is to accurately predict the categories of visual stimuli based on EEG signals with only a handful of samples for the target subject during training. The key challenge is how to appropriately transfer the knowledge obtained from abundant data of source subjects to the subject of interest. To this end, we introduce a novel method that allows for learning subject-independent representation by increasing the similarity of features sharing the same class but coming from different subjects. With the dedicated sampling principle, our model effectively captures the common knowledge shared across different subjects, thereby achieving promising performance for the target subject even under harsh problem settings with limited data. Specifically, on the EEG-ImageNet40 benchmark, our model records the top-1 / top-3 test accuracy of 72.6% / 91.6% when using only five EEG samples per class for the target subject. Our code is available at https://github.com/DeepBCI/Deep-BCI/tree/master/1_Intelligent_BCI/Inter_Subject_Contrastive_Learning_for_EEG.

Submitted 6 February, 2022; originally announced February 2022.

Comments: Accepted by the 10th IEEE International Winter Conference on Brain-Computer Interface (BCI 2022). Code is available at https://github.com/DeepBCI/Deep-BCI

arXiv:2201.07372 [pdf, other]

Prospective Learning: Principled Extrapolation to the Future

Authors: Ashwin De Silva, Rahul Ramesh, Lyle Ungar, Marshall Hussain Shuler, Noah J. Cowan, Michael Platt, Chen Li, Leyla Isik, Seung-Eon Roh, Adam Charles, Archana Venkataraman, Brian Caffo, Javier J. How, Justus M Kebschull, John W. Krakauer, Maxim Bichuch, Kaleab Alemayehu Kinfu, Eva Yezerets, Dinesh Jayaraman, Jong M. Shin, Soledad Villar, Ian Phillips, Carey E. Priebe, Thomas Hartung, Michael I. Miller, et al. (18 additional authors not shown)

Abstract: Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenarios evolve over multiple spatiotemporal scales with partially predictable dynamics. Here we reformulate the learning problem to one that centers around this idea of dynamic futures that are partially learnable. We conjecture that certain sequences of tasks are not retrospectively learnable (in which the data distribution is fixed), but are prospectively learnable (in which distributions may be dynamic), suggesting that prospective learning is more difficult in kind than retrospective learning. We argue that prospective learning more accurately characterizes many real world problems that (1) currently stymie existing artificial intelligence solutions and/or (2) lack adequate explanations for how natural intelligences solve them. Thus, studying prospective learning will lead to deeper insights and solutions to currently vexing challenges in both natural and artificial intelligences.

Submitted 13 July, 2023; v1 submitted 18 January, 2022; originally announced January 2022.

Comments: Accepted at the 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023

arXiv:2201.05843 [pdf, other]

doi 10.1109/TII.2022.3143175

Cooperative Multi-Agent Deep Reinforcement Learning for Reliable Surveillance via Autonomous Multi-UAV Control

Authors: Won Joon Yun, Soohyun Park, Joongheon Kim, MyungJae Shin, Soyi Jung, David A. Mohaisen, Jae-Hyun Kim

Abstract: CCTV-based surveillance using unmanned aerial vehicles (UAVs) is considered a key technology for security in smart city environments. This paper creates a case where the UAVs with CCTV-cameras fly over the city area for flexible and reliable surveillance services. UAVs should be deployed to cover a large area while minimize overlapping and shadow areas for a reliable surveillance system. However, the operation of UAVs is subject to high uncertainty, necessitating autonomous recovery systems. This work develops a multi-agent deep reinforcement learning-based management scheme for reliable industry surveillance in smart city applications. The core idea this paper employs is autonomously replenishing the UAV's deficient network requirements with communications. Via intensive simulations, our proposed algorithm outperforms the state-of-the-art algorithms in terms of surveillance coverage, user support capability, and computational costs.

Submitted 15 January, 2022; originally announced January 2022.

Comments: 10 pages, 6 figures, Accepted for publication in IEEE Transactions on Industrial Informatics (TII)

arXiv:2201.05277 [pdf, other]

Boundary-aware Self-supervised Learning for Video Scene Segmentation

Authors: Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim

Abstract: Self-supervised learning has drawn attention through its effectiveness in learning in-domain representations with no ground-truth annotations; in particular, it is shown that properly designed pretext tasks (e.g., contrastive prediction task) bring significant performance gains for downstream tasks (e.g., classification task). Inspired from this, we tackle video scene segmentation, which is a task of temporally localizing scene boundaries in a video, with a self-supervised learning framework where we mainly focus on designing effective pretext tasks. In our framework, we discover a pseudo-boundary from a sequence of shots by splitting it into two continuous, non-overlapping sub-sequences and leverage the pseudo-boundary to facilitate the pre-training. Based on this, we introduce three novel boundary-aware pretext tasks: 1) Shot-Scene Matching (SSM), 2) Contextual Group Matching (CGM) and 3) Pseudo-boundary Prediction (PP); SSM and CGM guide the model to maximize intra-scene similarity and inter-scene discrimination while PP encourages the model to identify transitional moments. Through comprehensive analysis, we empirically show that pre-training and transferring contextual representation are both critical to improving the video scene segmentation performance. Lastly, we achieve the new state-of-the-art on the MovieNet-SSeg benchmark. The code is available at https://github.com/kakaobrain/bassl.

Submitted 13 January, 2022; originally announced January 2022.

Comments: The code is available at https://github.com/kakaobrain/bassl

arXiv:2112.14710 [pdf, other]

Parallelized and Randomized Adversarial Imitation Learning for Safety-Critical Self-Driving Vehicles

Authors: Won Joon Yun, MyungJae Shin, Soyi Jung, Sean Kwon, Joongheon Kim

Abstract: Self-driving cars and autonomous driving research has been receiving considerable attention as major promising prospects in modern artificial intelligence applications. According to the evolution of advanced driver assistance system (ADAS), the design of self-driving vehicle and autonomous driving systems becomes complicated and safety-critical. In general, the intelligent system simultaneously and efficiently activates ADAS functions. Therefore, it is essential to consider reliable ADAS function coordination to control the driving system, safely. In order to deal with this issue, this paper proposes a randomized adversarial imitation learning (RAIL) algorithm. The RAIL is a novel derivative-free imitation learning method for autonomous driving with various ADAS functions coordination; and thus it imitates the operation of decision maker that controls autonomous driving with various ADAS functions. The proposed method is able to train the decision maker that deals with the LIDAR data and controls the autonomous driving in multi-lane complex highway environments. The simulation-based evaluation verifies that the proposed method achieves desired performance.

Submitted 26 December, 2021; originally announced December 2021.

Comments: 13 pages, 8 figures

arXiv:2112.07104 [pdf, other]

doi 10.3847/1538-3881/ac4335

Estimation of Photometric Redshifts. II. Identification of Out-of-Distribution Data with Neural Networks

Authors: Joongoo Lee, Min-Su Shin

Abstract: In this study, we propose a three-stage training approach of neural networks for both photometric redshift estimation of galaxies and detection of out-of-distribution (OOD) objects. Our approach comprises supervised and unsupervised learning, which enables using unlabeled (UL) data for OOD detection in training the networks. Employing the UL data, which is the dataset most similar to the real-world data, ensures a reliable usage of the trained model in practice. We quantitatively assess the model performance of photometric redshift estimation and OOD detection using in-distribution (ID) galaxies and labeled OOD (LOOD) samples such as stars and quasars. Our model successfully produces photometric redshifts matched with spectroscopic redshifts for the ID samples and identifies well the LOOD objects with more than 98% accuracy. Although quantitative assessment with the UL samples is impracticable due to the lack of labels and spectroscopic redshifts, we also find that our model successfully estimates reasonable photometric redshifts for ID-like UL samples and filter OOD-like UL objects. The code for the model implementation is available at https://github.com/GooLee0123/MBRNN_OOD.

Submitted 13 December, 2021; originally announced December 2021.

Comments: 22 pages, accepted to AJ

arXiv:2111.12596 [pdf, other]

Characterizing Sparse Asteroid Light Curves with Gaussian Processes

Authors: Christina Willecke Lindberg, Daniela Huppenkothen, R. Lynne Jones, Bryce T. Bolin, Mario Juric, V. Zach Golkhou, Eric C. Bellm, Andrew J. Drake, Matthew J. Graham, Russ R. Laher, Ashish A. Mahabal, Frank J. Masci, Reed Riddle, Kyung Min Shin

Abstract: In the era of wide-field surveys like the Zwicky Transient Facility and the Rubin Observatory's Legacy Survey of Space and Time, sparse photometric measurements constitute an increasing percentage of asteroid observations, particularly for asteroids newly discovered in these large surveys. Follow-up observations to supplement these sparse data may be prohibitively expensive in many cases, so to overcome these sampling limitations, we introduce a flexible model based on Gaussian Processes to enable Bayesian parameter inference of asteroid time series data. This model is designed to be flexible and extensible, and can model multiple asteroid properties such as the rotation period, light curve amplitude, changing pulse profile, and magnitude changes due to the phase angle evolution at the same time. Here, we focus on the inference of rotation periods. Based on both simulated light curves and real observations from the Zwicky Transient Facility, we show that the new model reliably infers rotational periods from sparsely sampled light curves, and generally provides well-constrained posterior probability densities for the model parameters. We propose this framework as an intermediate method between fast, but very limited period detection algorithms and much more comprehensive, but computationally expensive shape modeling based on ray-tracing codes.

Submitted 24 November, 2021; originally announced November 2021.

Comments: 27 pages, 18 figures, accepted for publication in AJ, associated software available at https://github.com/dirac-institute/asterogap/tree/v0.1

arXiv:2110.14469 [pdf]

doi 10.1007/s10035-022-01240-8

Phenotypic Trait of Particle Geometries

Authors: Seung Jae Lee, Moochul Shin, Chang Hoon Lee, Priya Tripathi

Abstract: People of a race appear different but share a 'phenotypic trait' due to a common genetic origin. Mineral particles are like humans: they appear different despite having a same geological origin. Then, do the particles have some sort of 'phenotypic trait' in the geometries as we do? How can we characterize the phenotypic trait of particle geometries? This paper discusses a new perspective on how the phenotypic trait can be discovered in the particle geometries and how the 'variation' and 'average' of the geometry can be quantified. The key idea is using the power-law between particle surface-area-to-volume ratio ($A/V$) and the particle volume ($V$) that uncovers the phenotypic trait in terms of $α$ and $β^*$: From the log-transformed relation of $V = (A/V)^α {\times} β^*$, the power value $α$ represents the relation between shape and size, while the term $β^*$ (evaluated by fixing $α$ = -3) informs the angularity of the average shape in the granular material. In other words, $α$ represents the 'variation' of the geometry while $β^*$ is concerned with the 'average' geometry of a granular material. Furthermore, this study finds that $A/V$ and $V$ can be also used to characterize individual particle shape in terms of Wadell's true Sphericity ($S$). This paper also revisits the $M = A/V {\times} L/6$ concept originally introduced by Su et al. (2020) and finds the shape index $M$ is an extended form of $S$ providing additional information about the particle elongation. Therefore, the proposed method using $A/V$ and $V$ provides a unified approach that can characterize the particle geometry at multiple scales from granular material to a single particle. Ref.: Su, Y.F., Bhattacharya, S., Lee, S.J., Lee, C.H., Shin, M.: A new interpretation of three-dimensional particle geometry: M-A-V-L. Transp. Geotech. 23, 100328 (2020).

Submitted 27 October, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: 30 pages, 15 figures, 8 tables; No difference from arXiv:2110.14469v1 except the first page stamp and the header on the manuscript

Journal ref: Granular Matter 24, 79 (2022)

arXiv:2110.13531 [pdf, other]

Bayesian Estimation and Comparison of Conditional Moment Models

Authors: Siddhartha Chib, Minchul Shin, Anna Simoni

Abstract: We consider the Bayesian analysis of models in which the unknown distribution of the outcomes is specified up to a set of conditional moment restrictions. The nonparametric exponentially tilted empirical likelihood function is constructed to satisfy a sequence of unconditional moments based on an increasing (in sample size) vector of approximating functions (such as tensor splines based on the splines of each conditioning variable). For any given sample size, results are robust to the number of expanded moments. We derive Bernstein-von Mises theorems for the behavior of the posterior distribution under both correct and incorrect specification of the conditional moments, subject to growth rate conditions (slower under misspecification) on the number of approximating functions. A large-sample theory for comparing different conditional moment models is also developed. The central result is that the marginal likelihood criterion selects the model that is less misspecified. We also introduce sparsity-based model search for high-dimensional conditioning variables, and provide efficient MCMC computations for high-dimensional parameters. Along with clarifying examples, the framework is illustrated with real-data applications to risk-factor determination in finance, and causal inference under conditional ignorability.

Submitted 26 October, 2021; originally announced October 2021.

arXiv:2110.07870 [pdf, other]

doi 10.1051/0004-6361/202039551

A new approach to feature-based asteroid taxonomy in 3D color space: 1. SDSS photometric system

Authors: Dong-Goo Roh, Hong-Kyu Moon, Min-Su Shin, Francesca E. DeMeo

Abstract: The taxonomic classification of asteroids has been mostly based on spectroscopic observations with wavelengths spanning from the VIS to the NIR. VIS-NIR spectra of $\sim$2500 asteroids have been obtained since the 1970s; the SDSS MOC 4 was released with $\sim$4 $\times$ 10$^{5}$ measurements of asteroid positions and colors in the early 2000s. A number of works then devised methods to classify these data within the framework of existing taxonomic systems. Some of these works, however, used 2D parameter space that displayed a continuous distribution of clouds of data points resulting in boundaries that were artificially defined. We introduce here a more advanced method to classify asteroids based on existing systems. This approach is simply represented by a triplet of SDSS colors. The distributions and memberships of each taxonomic type are determined by machine learning methods in the form of both unsupervised and semi-supervised learning. We apply our scheme to MOC 4 calibrated with VIS-NIR reflectance spectra. We successfully separate seven different taxonomy classifications with which we have a sufficient number of spectroscopic datasets. We found the overlapping regions of taxonomic types in a 2D plane were separated with relatively clear boundaries in the 3D space newly defined in this work. Our scheme explicitly discriminates between different taxonomic types, which is an improvement over existing systems. This new method for taxonomic classification has a great deal of scalability for asteroid research, such as space weathering in the S-complex, and the origin and evolution of asteroid families. We present the structure of the asteroid belt, and describe the orbital distribution based on our newly assigned taxonomic classifications. It is also possible to extend the methods presented here to other photometric systems, such as the Johnson-Cousins and LSST filter systems.

Submitted 15 October, 2021; originally announced October 2021.

Comments: 13 pages, 7 figures, accepted for publication in Astronomy & Astrophysics

Journal ref: A&A 664, A51 (2022)

arXiv:2110.06476 [pdf, other]

Winning the ICCV'2021 VALUE Challenge: Task-aware Ensemble and Transfer Learning with Visual Concepts

Authors: Minchul Shin, Jonghwan Mun, Kyoung-Woon On, Woo-Young Kang, Gunsoo Han, Eun-Sol Kim

Abstract: The VALUE (Video-And-Language Understanding Evaluation) benchmark is newly introduced to evaluate and analyze multi-modal representation learning algorithms on three video-and-language tasks: Retrieval, QA, and Captioning. The main objective of the VALUE challenge is to train a task-agnostic model that is simultaneously applicable for various tasks with different characteristics. This technical report describes our winning strategies for the VALUE challenge: 1) single model optimization, 2) transfer learning with visual concepts, and 3) task-aware ensemble. The first and third strategies are designed to address heterogeneous characteristics of each task, and the second one is to leverage rich and fine-grained visual information. We provide a detailed and comprehensive analysis with extensive experimental results. Based on our approach, we ranked first place on the VALUE and QA phases for the competition.

Submitted 12 October, 2021; originally announced October 2021.

Comments: CLVL workshop at ICCV 2021

arXiv:2110.05726 [pdf, other]

doi 10.3847/1538-3881/ac2e96

Estimation of Photometric Redshifts. I. Machine Learning Inference for Pan-STARRS1 Galaxies Using Neural Networks

Authors: Joongoo Lee, Min-Su Shin

Abstract: We present a new machine learning model for estimating photometric redshifts with improved accuracy for galaxies in Pan-STARRS1 data release 1. Depending on the estimation range of redshifts, this model based on neural networks can handle the difficulty of inferring photometric redshifts. Moreover, to reduce bias induced by the new model's ability to deal with estimation difficulty, it exploits the power of ensemble learning. We extensively examine the mapping between input features and target redshift spaces to which the model is validly applicable to discover the strengths and weaknesses of the trained model. Because our trained model is well calibrated, it produces reliable confidence information about objects with non-catastrophic estimation. While our model is highly accurate for most test examples residing in the input space, where training samples are densely populated, its accuracy quickly diminishes for sparse samples and unobserved objects (i.e., unseen samples) in training. We report that out-of-distribution (OOD) samples for our model contain both physically OOD objects (i.e., stars and quasars) and galaxies with observed properties not represented by training data. The code for our model is available at https://github.com/GooLee0123/MBRNN for other uses of the model and retraining the model with different data.

Submitted 11 October, 2021; originally announced October 2021.

Comments: 26 pages, accepted to AJ

Showing 1–50 of 161 results for author: Shin, M