-
Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models
Authors:
Minseok Choi,
Kyunghyun Min,
Jaegul Choo
Abstract:
Pretrained language models memorize vast amounts of information, including private and copyrighted data, raising significant safety concerns. Retraining these models after excluding sensitive data is prohibitively expensive, making machine unlearning a viable, cost-effective alternative. Previous research has focused on machine unlearning for monolingual models, but we find that unlearning in one language does not necessarily transfer to others. This vulnerability makes models susceptible to low-resource language attacks, where sensitive information remains accessible in less dominant languages. This paper presents a pioneering approach to machine unlearning for multilingual language models, selectively erasing information across different languages while maintaining overall performance. Specifically, our method employs an adaptive unlearning scheme that assigns language-dependent weights to address different language performances of multilingual language models. Empirical results demonstrate the effectiveness of our framework compared to existing unlearning baselines, setting a new standard for secure and adaptable multilingual language models.
Submitted 18 June, 2024;
originally announced June 2024.
-
SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video
Authors:
Hector A. Valdez,
Kyle Min,
Subarna Tripathi
Abstract:
Pretraining egocentric vision-language models has become essential to improving downstream egocentric video-text tasks. These egocentric foundation models commonly use the transformer architecture. The memory footprint of these models during pretraining can be substantial. Therefore, we pretrain SViTT-Ego, the first sparse egocentric video-text transformer model integrating edge and node sparsification. We pretrain on the EgoClip dataset and incorporate the egocentric-friendly objective EgoNCE, instead of the frequently used InfoNCE. Most notably, SViTT-Ego obtains a +2.8% gain on EgoMCQ (intra-video) accuracy compared to LAVILA large, with no additional data augmentation techniques other than standard image augmentations, yet pretrainable on memory-limited devices.
Submitted 12 June, 2024;
originally announced June 2024.
-
Contrastive Language Video Time Pre-training
Authors:
Hengyue Liu,
Kyle Min,
Hector A. Valdez,
Subarna Tripathi
Abstract:
We introduce LAVITI, a novel approach to learning language, video, and temporal representations in long-form videos via contrastive learning. Different from pre-training on video-text pairs like EgoVLP, LAVITI aims to align language, video, and temporal features by extracting meaningful moments in untrimmed videos. Our model employs a set of learnable moment queries to decode clip-level visual, language, and temporal features. In addition to vision and language alignment, we introduce relative temporal embeddings (TE) to represent timestamps in videos, which enables contrastive learning of time. Significantly different from traditional approaches, the prediction of a particular timestamp is transformed by computing the similarity score between the predicted TE and all TEs. Furthermore, existing approaches for video understanding are mainly designed for short videos due to high computational complexity and memory footprint. Our method can be trained on the Ego4D dataset with only 8 NVIDIA RTX-3090 GPUs in a day. We validated our method on CharadesEgo action recognition, achieving state-of-the-art results.
Submitted 3 June, 2024;
originally announced June 2024.
-
R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
Authors:
Changhoon Kim,
Kyle Min,
Yezhou Yang
Abstract:
In the evolving landscape of text-to-image (T2I) diffusion models, the remarkable capability to generate high-quality images from textual descriptions faces challenges with the potential misuse of reproducing sensitive content. To address this critical issue, we introduce Robust Adversarial Concept Erase (RACE), a novel approach designed to mitigate these risks by enhancing the robustness of concept erasure methods for T2I models. RACE utilizes a sophisticated adversarial training framework to identify and mitigate adversarial text embeddings, significantly reducing the Attack Success Rate (ASR). Impressively, RACE achieves a 30 percentage point reduction in ASR for the "nudity" concept against the leading white-box attack method. Our extensive evaluations demonstrate RACE's effectiveness in defending against both white-box and black-box attacks, marking a significant advancement in protecting T2I diffusion models from generating inappropriate or misleading imagery. This work underlines the essential need for proactive defense measures in adapting to the rapidly advancing field of adversarial challenges.
Submitted 25 May, 2024;
originally announced May 2024.
-
Return of EM: Entity-driven Answer Set Expansion for QA Evaluation
Authors:
Dongryeol Lee,
Minwoo Lee,
Kyungmin Min,
Joonsuk Park,
Kyomin Jung
Abstract:
Recently, directly using large language models (LLMs) has been shown to be the most reliable method to evaluate QA models. However, it suffers from limited interpretability, high cost, and environmental harm. To address these, we propose to use soft EM with entity-driven answer set expansion. Our approach expands the gold answer set to include diverse surface forms, based on the observation that the surface forms often follow particular patterns depending on the entity type. The experimental results show that our method outperforms traditional evaluation methods by a large margin. Moreover, the reliability of our evaluation method is comparable to that of LLM-based ones, while offering the benefits of high interpretability and reduced environmental harm.
Submitted 11 June, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Self-Improving Interference Management Based on Deep Learning With Uncertainty Quantification
Authors:
Hyun-Suk Lee,
Do-Yup Kim,
Kyungsik Min
Abstract:
This paper presents a groundbreaking self-improving interference management framework tailored for wireless communications, integrating deep learning with uncertainty quantification to enhance overall system performance. Our approach addresses the computational challenges inherent in traditional optimization-based algorithms by harnessing deep learning models to predict optimal interference management solutions. A significant breakthrough of our framework is its acknowledgment of the limitations inherent in data-driven models, particularly in scenarios not adequately represented by the training dataset. To overcome these challenges, we propose a method for uncertainty quantification, accompanied by a qualifying criterion, to assess the trustworthiness of model predictions. This framework strategically alternates between model-generated solutions and traditional algorithms, guided by a criterion that assesses the prediction credibility based on quantified uncertainties. Experimental results validate the framework's efficacy, demonstrating its superiority over traditional deep learning models, notably in scenarios underrepresented in the training dataset. This work marks a pioneering endeavor in harnessing self-improving deep learning for interference management, through the lens of uncertainty quantification.
Submitted 23 January, 2024;
originally announced January 2024.
-
Action Scene Graphs for Long-Form Understanding of Egocentric Videos
Authors:
Ivan Rodin,
Antonino Furnari,
Kyle Min,
Subarna Tripathi,
Giovanni Maria Farinella
Abstract:
We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos. EASGs extend standard manually-annotated representations of egocentric videos, such as verb-noun action labels, by providing a temporally evolving graph-based description of the actions performed by the camera wearer, including interacted objects, their relationships, and how actions unfold in time. Through a novel annotation procedure, we extend the Ego4D dataset by adding manually labeled Egocentric Action Scene Graphs offering a rich set of annotations designed for long-form egocentric video understanding. We hence define the EASG generation task and provide a baseline approach, establishing preliminary benchmarks. Experiments on two downstream tasks, egocentric action anticipation and egocentric activity summarization, highlight the effectiveness of EASGs for long-form egocentric video understanding. We will release the dataset and the code to replicate experiments and annotations.
Submitted 6 December, 2023;
originally announced December 2023.
-
A human brain atlas of chi-separation for normative iron and myelin distributions
Authors:
Kyeongseon Min,
Beomseok Sohn,
Woo Jung Kim,
Chae Jung Park,
Soohwa Song,
Dong Hoon Shin,
Kyung Won Chang,
Na-Young Shin,
Minjun Kim,
Hyeong-Geol Shin,
Phil Hyu Lee,
Jongho Lee
Abstract:
Iron and myelin are primary susceptibility sources in the human brain. These substances are essential for a healthy brain, and their abnormalities are often related to various neurological disorders. Recently, an advanced susceptibility mapping technique, which is referred to as chi-separation, has been proposed, successfully disentangling paramagnetic iron from diamagnetic myelin. This method opened a potential for generating high resolution iron and myelin maps in the brain. Utilizing this technique, this study constructs a normative chi-separation atlas from 106 healthy human brains. The resulting atlas provides detailed anatomical structures associated with the distributions of iron and myelin, clearly delineating subcortical nuclei, thalamic nuclei, and white matter fiber bundles. Additionally, susceptibility values in a number of regions of interest are reported along with age-dependent changes. This atlas may have direct applications such as localization of subcortical structures for deep brain stimulation or high-intensity focused ultrasound and also serve as a valuable resource for future research.
Submitted 2 April, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.
-
Fuzzy Relational Databases via Associative Arrays
Authors:
Kevin Min,
Hayden Jananthan,
Jeremy Kepner
Abstract:
The increasing rise in artificial intelligence has made the use of imprecise language in computer programs like ChatGPT more prominent. Fuzzy logic addresses this form of imprecise language by introducing the concept of fuzzy sets, where elements belong to the set with a certain membership value (called the fuzzy value). This paper combines fuzzy data with relational algebra to provide the mathematical foundation for a fuzzy database querying language, describing various useful operations in the language of linear algebra and multiset operations, in addition to rigorously proving key identities.
Submitted 6 November, 2023;
originally announced November 2023.
-
Learning to Discover Skills through Guidance
Authors:
Hyunseung Kim,
Byungkun Lee,
Hojoon Lee,
Dongyoon Hwang,
Sejik Park,
Kyushik Min,
Jaegul Choo
Abstract:
In the field of unsupervised skill discovery (USD), a major challenge is limited exploration, primarily due to substantial penalties when skills deviate from their initial trajectories. To enhance exploration, recent methodologies employ auxiliary rewards to maximize the epistemic uncertainty or entropy of states. However, we have identified that the effectiveness of these rewards declines as the environmental complexity rises. Therefore, we present a novel USD algorithm, skill discovery with guidance (DISCO-DANCE), which (1) selects the guide skill that possesses the highest potential to reach unexplored states, (2) guides other skills to follow guide skill, then (3) the guided skills are dispersed to maximize their discriminability in unexplored states. Empirical evaluation demonstrates that DISCO-DANCE outperforms other USD baselines in challenging environments, including two navigation benchmarks and a continuous control benchmark. Qualitative visualizations and code of DISCO-DANCE are available at https://mynsng.github.io/discodance.
Submitted 1 November, 2023; v1 submitted 31 October, 2023;
originally announced October 2023.
-
Towards Validating Long-Term User Feedbacks in Interactive Recommendation Systems
Authors:
Hojoon Lee,
Dongyoon Hwang,
Kyushik Min,
Jaegul Choo
Abstract:
Interactive Recommender Systems (IRSs) have attracted a lot of attention, due to their ability to model interactive processes between users and recommender systems. Numerous approaches have adopted Reinforcement Learning (RL) algorithms, as these can directly maximize users' cumulative rewards. In IRS, researchers commonly utilize publicly available review datasets to compare and evaluate algorithms. However, user feedback provided in public datasets merely includes instant responses (e.g., a rating), with no inclusion of delayed responses (e.g., the dwell time and the lifetime value). Thus, the question remains whether these review datasets are an appropriate choice to evaluate the long-term effects of the IRS. In this work, we revisited experiments on IRS with review datasets and compared RL-based models with a simple reward model that greedily recommends the item with the highest one-step reward. Following extensive analysis, we can reveal three main findings: First, a simple greedy reward model consistently outperforms RL-based models in maximizing cumulative rewards. Second, applying higher weighting to long-term rewards leads to a degradation of recommendation performance. Third, user feedbacks have mere long-term effects on the benchmark datasets. Based on our findings, we conclude that a dataset has to be carefully verified and that a simple greedy baseline should be included for a proper evaluation of RL-based IRS approaches.
Submitted 21 August, 2023;
originally announced August 2023.
-
STHG: Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization
Authors:
Kyle Min
Abstract:
This report introduces our novel method named STHG for the Audio-Visual Diarization task of the Ego4D Challenge 2023. Our key innovation is that we model all the speakers in a video using a single, unified heterogeneous graph learning framework. Unlike previous approaches that require a separate component solely for the camera wearer, STHG can jointly detect the speech activities of all people including the camera wearer. Our final method obtains 61.1% DER on the test set of Ego4D, which significantly outperforms all the baselines as well as last year's winner. Our submission achieved 1st place in the Ego4D Challenge 2023. We additionally demonstrate that applying the off-the-shelf speech recognition system to the diarized speech segments by STHG produces a competitive performance on the Speech Transcription task of this challenge.
Submitted 31 October, 2023; v1 submitted 18 June, 2023;
originally announced June 2023.
-
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Authors:
Changhoon Kim,
Kyle Min,
Maitreya Patel,
Sheng Cheng,
Yezhou Yang
Abstract:
The rapid advancement of generative models, facilitating the creation of hyper-realistic images from textual descriptions, has concurrently escalated critical societal concerns such as misinformation. Although providing some mitigation, traditional fingerprinting mechanisms fall short in attributing responsibility for the malicious use of synthetic images. This paper introduces a novel approach to model fingerprinting that assigns responsibility for the generated images, thereby serving as a potential countermeasure to model misuse. Our method modifies generative models based on each user's unique digital fingerprint, imprinting a unique identifier onto the resultant content that can be traced back to the user. This approach, incorporating fine-tuning into Text-to-Image (T2I) tasks using the Stable Diffusion Model, demonstrates near-perfect attribution accuracy with a minimal impact on output quality. Through extensive evaluation, we show that our method outperforms baseline methods with an average improvement of 11% in handling image post-processes. Our method presents a promising and novel avenue for accountable model distribution and responsible use. Our code is available in https://github.com/kylemin/WOUAF.
Submitted 24 April, 2024; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Evaluation Strategy of Time-series Anomaly Detection with Decay Function
Authors:
Yongwan Gim,
Kyushik Min
Abstract:
Recent algorithms of time-series anomaly detection have been evaluated by applying a Point Adjustment (PA) protocol. However, the PA protocol has a problem of overestimating the performance of the detection algorithms because it only depends on the number of detected abnormal segments and their size. We propose a novel evaluation protocol called the Point-Adjusted protocol with decay function (PAdf) to evaluate the time-series anomaly detection algorithm by reflecting the following ideal requirements: detect anomalies quickly and accurately without false alarms. This paper theoretically and experimentally shows that the PAdf protocol solves the over- and under-estimation problems of existing protocols such as PA and PA%K. By conducting re-evaluations of SOTA models in benchmark datasets, we show that the PA protocol only focuses on finding many anomalous segments, whereas the score of the PAdf protocol considers not only finding many segments but also detecting anomalies quickly without delay.
Submitted 15 May, 2023;
originally announced May 2023.
-
SViTT: Temporal Learning of Sparse Video-Text Transformers
Authors:
Yi Li,
Kyle Min,
Subarna Tripathi,
Nuno Vasconcelos
Abstract:
Do video-text transformers learn to model temporal relationships across frames? Despite their immense capacity and the abundance of multimodal training data, recent work has revealed the strong tendency of video-text models towards frame-based spatial representations, while temporal reasoning remains largely unsolved. In this work, we identify several key challenges in temporal learning of video-text transformers: the spatiotemporal trade-off from limited network size; the curse of dimensionality for multi-frame modeling; and the diminishing returns of semantic information by extending clip length. Guided by these findings, we propose SViTT, a sparse video-text architecture that performs multi-frame reasoning with significantly lower cost than naive transformers with dense attention. Analogous to graph-based networks, SViTT employs two forms of sparsity: edge sparsity that limits the query-key communications between tokens in self-attention, and node sparsity that discards uninformative visual tokens. Trained with a curriculum which increases model sparsity with the clip length, SViTT outperforms dense transformer baselines on multiple video-text retrieval and question answering benchmarks, with a fraction of computational cost. Project page: http://svcl.ucsd.edu/projects/svitt.
Submitted 18 April, 2023;
originally announced April 2023.
-
Prediction of Protein Aggregation Propensity via Data-driven Approaches
Authors:
Seungpyo Kang,
Minseon Kim,
Jiwon Sun,
Myeonghun Lee,
Kyoungmin Min
Abstract:
Protein aggregation occurs when misfolded or unfolded proteins physically bind together, and can promote the development of various amyloid diseases. This study aimed to construct surrogate models for predicting protein aggregation via data-driven methods using two types of databases. First, an aggregation propensity score database was constructed by calculating the scores for protein structures in Protein Data Bank using Aggrescan3D 2.0. Moreover, feature- and graph-based models for predicting protein aggregation have been developed using this database. The graph-based regression model outperformed the feature-based model, resulting in R² of 0.95, although it intrinsically required protein structures. Second, for the experimental data, a feature-based model was built using Curated Protein Aggregation Database 2.0, to predict the aggregated intensity curves. In summary, this study suggests the approaches that are more effective in predicting protein aggregation, depending on the type of descriptor and the database.
Submitted 6 April, 2023;
originally announced April 2023.
-
Unbiased Scene Graph Generation in Videos
Authors:
Sayak Nag,
Kyle Min,
Subarna Tripathi,
Amit K. Roy Chowdhury
Abstract:
The task of dynamic scene graph generation (SGG) from videos is complicated and challenging due to the inherent dynamics of a scene, temporal fluctuation of model predictions, and the long-tailed distribution of the visual relationships in addition to the already existing challenges in image-based SGG. Existing methods for dynamic SGG have primarily focused on capturing spatio-temporal context using complex architectures without addressing the challenges mentioned above, especially the long-tailed distribution of relationships. This often leads to the generation of biased scene graphs. To address these challenges, we introduce a new framework called TEMPURA: TEmporal consistency and Memory Prototype guided UnceRtainty Attenuation for unbiased dynamic SGG. TEMPURA employs object-level temporal consistencies via transformer-based sequence modeling, learns to synthesize unbiased relationship representations using memory-guided training, and attenuates the predictive uncertainty of visual relations using a Gaussian Mixture Model (GMM). Extensive experiments demonstrate that our method achieves significant (up to 10% in some cases) performance gain over existing methods highlighting its superiority in generating more unbiased scene graphs.
Submitted 29 June, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
AmorProt: Amino Acid Molecular Fingerprints Repurposing based Protein Fingerprint
Authors:
Myeonghun Lee,
Kyoungmin Min
Abstract:
As protein therapeutics play an important role in almost all medical fields, numerous studies have been conducted on proteins using artificial intelligence. Artificial intelligence has enabled data driven predictions without the need for expensive experiments. Nevertheless, unlike the various molecular fingerprint algorithms that have been developed, protein fingerprint algorithms have rarely been studied. In this study, we proposed the amino acid molecular fingerprints repurposing based protein (AmorProt) fingerprint, a protein sequence representation method that effectively uses the molecular fingerprints corresponding to 20 amino acids. Subsequently, the performances of the tree based machine learning and artificial neural network models were compared using (1) amyloid classification and (2) isoelectric point regression. Finally, the applicability and advantages of the developed platform were demonstrated through a case study and the following experiments: (3) comparison of dataset dependence with feature based methods; (4) feature importance analysis; and (5) protein space analysis. Consequently, the significantly improved model performance and data set independent versatility of the AmorProt fingerprint were verified. The results revealed that the current protein representation method can be applied to various fields related to proteins, such as predicting their fundamental properties or interaction with ligands.
Submitted 27 March, 2023;
originally announced March 2023.
-
SHUNIT: Style Harmonization for Unpaired Image-to-Image Translation
Authors:
Seokbeom Song,
Suhyeon Lee,
Hongje Seong,
Kyoungwon Min,
Euntai Kim
Abstract:
We propose a novel solution for unpaired image-to-image (I2I) translation. To translate complex images with a wide range of objects to a different domain, recent approaches often use the object annotations to perform per-class source-to-target style mapping. However, there remains a point for us to exploit in the I2I. An object in each class consists of multiple components, and all the sub-object components have different characteristics. For example, a car in CAR class consists of a car body, tires, windows and head and tail lamps, etc., and they should be handled separately for realistic I2I translation. The simplest solution to the problem will be to use more detailed annotations with sub-object component annotations than the simple object annotations, but it is not possible. The key idea of this paper is to bypass the sub-object component annotations by leveraging the original style of the input image because the original style will include the information about the characteristics of the sub-object components. Specifically, for each pixel, we use not only the per-class style gap between the source and target domains but also the pixel's original style to determine the target style of a pixel. To this end, we present Style Harmonization for unpaired I2I translation (SHUNIT). Our SHUNIT generates a new style by harmonizing the target domain style retrieved from a class memory and an original source image style. Instead of direct source-to-target style mapping, we aim for source and target styles harmonization. We validate our method with extensive experiments and achieve state-of-the-art performance on the latest benchmark sets. The source code is available online: https://github.com/bluejangbaljang/SHUNIT.
Submitted 11 January, 2023;
originally announced January 2023.
-
Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization
Authors:
Kyle Min
Abstract:
This report describes our approach for the Audio-Visual Diarization (AVD) task of the Ego4D Challenge 2022. Specifically, we present multiple technical improvements over the official baselines. First, we improve the detection performance of the camera wearer's voice activity by modifying the training scheme of its model. Second, we discover that an off-the-shelf voice activity detection model can effectively remove false positives when it is applied solely to the camera wearer's voice activities. Lastly, we show that better active speaker detection leads to a better AVD outcome. Our final method obtains 65.9% DER on the test set of Ego4D, which significantly outperforms all the baselines. Our submission achieved 1st place in the Ego4D Challenge 2022.
Submitted 29 October, 2023; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
Authors:
Kyle Min,
Sourya Roy,
Subarna Tripathi,
Tanaya Guha,
Somdeb Majumdar
Abstract:
Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it requires learning effective audiovisual features and spatial-temporal correlations over long temporal windows. In this paper, we present SPELL, a novel spatial-temporal graph learning framework that can solve complex tasks such as ASD. To this end, each person in a video frame is first encoded in a unique node for that frame. Nodes corresponding to a single person across frames are connected to encode their temporal dynamics. Nodes within a frame are also connected to encode inter-person relationships. Thus, SPELL reduces ASD to a node classification task. Importantly, SPELL is able to reason over long temporal contexts for all nodes without relying on computationally expensive fully connected graph neural networks. Through extensive experiments on the AVA-ActiveSpeaker dataset, we demonstrate that learning graph-based representations can significantly improve the active speaker detection performance owing to its explicit spatial and temporal structure. SPELL outperforms all previous state-of-the-art approaches while requiring significantly lower memory and computational resources. Our code is publicly available at https://github.com/SRA2/SPELL
Submitted 12 October, 2022; v1 submitted 15 July, 2022;
originally announced July 2022.
-
Hierarchical Latent Structure for Multi-Modal Vehicle Trajectory Forecasting
Authors:
Dooseop Choi,
KyoungWook Min
Abstract:
Variational autoencoders (VAEs) have been widely used for modeling data distributions because they are theoretically elegant, easy to train, and have nice manifold representations. However, when applied to image reconstruction and synthesis tasks, VAEs show the limitation that the generated samples tend to be blurry. We observe that a similar problem, in which the generated trajectory is located between adjacent lanes, often arises in VAE-based trajectory forecasting models. To mitigate this problem, we introduce a hierarchical latent structure into the VAE-based forecasting model. Based on the assumption that the trajectory distribution can be approximated as a mixture of simple distributions (or modes), the low-level latent variable is employed to model each mode of the mixture and the high-level latent variable is employed to represent the weights for the modes. To model each mode accurately, we condition the low-level latent variable using two lane-level context vectors computed in novel ways: one corresponds to vehicle-lane interaction and the other to vehicle-vehicle interaction. The context vectors are also used to model the weights via the proposed mode selection network. To evaluate our forecasting model, we use two large-scale real-world datasets. Experimental results show that our model is not only capable of generating clear multi-modal trajectory distributions but also outperforms the state-of-the-art (SOTA) models in terms of prediction accuracy. Our code is available at https://github.com/d1024choi/HLSTrajForecast.
Submitted 11 July, 2022;
originally announced July 2022.
-
JORLDY: a fully customizable open source framework for reinforcement learning
Authors:
Kyushik Min,
Hyunho Lee,
Kwansu Shin,
Taehak Lee,
Hojoon Lee,
Jinwon Choi,
Sungho Son
Abstract:
Recently, Reinforcement Learning (RL) has been actively researched in both academic and industrial fields. However, only a few RL frameworks have been developed for researchers or students who want to study RL. In response, we propose an open-source RL framework, "Join Our Reinforcement Learning framework for Developing Yours" (JORLDY). JORLDY provides more than 20 widely used RL algorithms implemented with PyTorch. JORLDY also supports multiple RL environments, including OpenAI Gym, Unity ML-Agents, MuJoCo, Super Mario Bros, and Procgen. Moreover, algorithmic components such as the agent, network, and environment can be freely customized, so users can easily modify and append algorithmic components. We expect that JORLDY will support various RL research and contribute to further advancing the field of RL. The source code of JORLDY is available on GitHub: https://github.com/kakaoenterprise/JORLDY
Submitted 11 April, 2022;
originally announced April 2022.
-
Style-Guided Domain Adaptation for Face Presentation Attack Detection
Authors:
Young-Eun Kim,
Woo-Jeoung Nam,
Kyungseo Min,
Seong-Whan Lee
Abstract:
Domain adaptation (DA) or domain generalization (DG) for face presentation attack detection (PAD) has recently attracted attention for its robustness against unseen attack scenarios. Existing DA/DG-based PAD methods, however, have not yet fully explored the domain-specific style information that can provide knowledge regarding attack styles (e.g., materials, background, illumination, and resolution). In this paper, we introduce a novel Style-Guided Domain Adaptation (SGDA) framework for inference-time adaptive PAD. Specifically, Style-Selective Normalization (SSN) is proposed to explore the domain-specific style information within the high-order feature statistics. The proposed SSN enables the adaptation of the model to the target domain by reducing the style difference between the target and the source domains. Moreover, we carefully design Style-Aware Meta-Learning (SAML) to boost the adaptation ability; SAML simulates inference-time adaptation with a style selection process on a virtual test domain. In contrast to previous domain adaptation approaches, our method requires neither additional auxiliary models (e.g., domain adaptors) nor the unlabeled target domain during training, which makes it more practical for the PAD task. To validate our method, we use the public datasets MSU-MFSD, CASIA-FASD, OULU-NPU, and Idiap REPLAYATTACK. In most assessments, the results demonstrate a notable performance gap compared to conventional DA/DG-based PAD methods.
Submitted 19 June, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
ANNA: Enhanced Language Representation for Question Answering
Authors:
Changwook Jun,
Hansol Jang,
Myoseop Sim,
Hyun Kim,
Jooyoung Choi,
Kyungkoo Min,
Kyunghoon Bae
Abstract:
Pre-trained language models have brought significant performance improvements in a variety of natural language processing tasks. Most existing models achieving state-of-the-art results have presented their approaches from the separate perspectives of data processing, pre-training tasks, neural network modeling, or fine-tuning. In this paper, we demonstrate how these approaches affect performance individually, and that the language model achieves the best results on a specific question answering task when those approaches are jointly considered in pre-training. In particular, we propose an extended pre-training task and a new neighbor-aware mechanism that attends more to neighboring tokens to capture the richness of context for pre-training language modeling. Our best model achieves new state-of-the-art results of 95.7% F1 and 90.6% EM on SQuAD 1.1 and also outperforms existing pre-trained language models such as RoBERTa, ALBERT, ELECTRA, and XLNet on the SQuAD 2.0 benchmark.
Submitted 3 April, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
MGCVAE: Multi-objective Inverse Design via Molecular Graph Conditional Variational Autoencoder
Authors:
Myeonghun Lee,
Kyoungmin Min
Abstract:
The ultimate goal of various fields is to directly generate molecules with desired properties, such as finding water-soluble molecules in drug development and finding molecules suitable for organic light-emitting diodes (OLEDs) or photosensitizers in the development of new organic materials. In this respect, this study proposes a molecular graph generative model based on the autoencoder for de novo design. The performance of the molecular graph conditional variational autoencoder (MGCVAE) in generating molecules with specific desired properties is investigated by comparing it to the molecular graph variational autoencoder (MGVAE). Furthermore, multi-objective optimization was applied to MGCVAE to satisfy two selected properties simultaneously. In this study, two physical properties, logP and molar refractivity, were used as optimization targets for designing de novo molecules, especially in drug discovery. As a result, 25.89% of the molecules generated by MGCVAE were optimized, compared to 0.66% for MGVAE. This demonstrates that MGCVAE effectively produced drug-like molecules with two target properties. The results of this study suggest that these graph-based data-driven models are an effective method for designing new molecules that fulfill various physical properties, for example in drug discovery.
Submitted 14 February, 2022;
originally announced February 2022.
-
Machine Learning-Aided Discovery of Superionic Solid-State Electrolyte for Li-Ion Batteries
Authors:
Seungpyo Kang,
Minseon Kim,
Kyoungmin Min
Abstract:
Li-Ion Solid-State Electrolytes (Li-SSEs) are a promising solution that resolves the critical issues of conventional Li-Ion Batteries (LIBs) such as poor ionic conductivity, interfacial instability, and dendrite growth. In this study, a platform consisting of high-throughput screening and a machine-learning surrogate model is developed for discovering superionic Li-SSEs among 20,237 Li-containing materials. For the training database, the ionic conductivities of Na SuperIonic CONductor (NASICON) and Li SuperIonic CONductor (LISICON) type SSEs are obtained from the previous literature. Then, the chemical descriptor (CD) and additional structural properties are used as machine-readable features. Li-SSE candidates are selected through the screening criteria, and their ionic conductivity is then predicted. To reduce uncertainty in the surrogate model, an ensemble of the two best-performing models, whose mean prediction accuracies are 0.843 and 0.829, respectively, is employed. Furthermore, first-principles calculations are conducted to confirm the ionic conductivity of the strong candidates. Finally, six potential superionic Li-SSEs that have not previously been investigated are proposed. We believe that the constructed platform can accelerate the search for Li-SSEs with high ionic conductivity at minimum cost.
Submitted 14 February, 2022;
originally announced February 2022.
-
Korean-Specific Dataset for Table Question Answering
Authors:
Changwook Jun,
Jooyoung Choi,
Myoseop Sim,
Hyun Kim,
Hansol Jang,
Kyungkoo Min
Abstract:
Existing question answering systems mainly focus on dealing with text data. However, much of the data produced daily is stored in the form of tables that can be found in documents and relational databases, or on the web. For the task of question answering over tables, there exist many datasets written in English, but few Korean datasets. In this paper, we demonstrate how we construct Korean-specific datasets for table question answering. The Korean tabular dataset is a collection of 1.4M tables with corresponding descriptions for unsupervised pre-training of language models. The Korean table question answering corpus consists of 70k pairs of questions and answers created by crowd-sourced workers. We then build a pre-trained language model based on Transformer, fine-tune the model for table question answering with these datasets, and report the evaluation results of our model. We make our datasets publicly available via our GitHub repository and hope that those datasets will help further studies for question answering over tables, and for the transformation of table formats.
Submitted 1 May, 2022; v1 submitted 17 January, 2022;
originally announced January 2022.
-
Learning Spatial-Temporal Graphs for Active Speaker Detection
Authors:
Sourya Roy,
Kyle Min,
Subarna Tripathi,
Tanaya Guha,
Somdeb Majumdar
Abstract:
We address the problem of active speaker detection through a new framework, called SPELL, that learns long-range multimodal graphs to encode the inter-modal relationship between audio and visual data. We cast active speaker detection as a node classification task that is aware of longer-term dependencies. We first construct a graph from a video so that each node corresponds to one person. Nodes representing the same identity share edges between them within a defined temporal window. Nodes within the same video frame are also connected to encode inter-person interactions. Through extensive experiments on the AVA-ActiveSpeaker dataset, we demonstrate that learning graph-based representations, owing to their explicit spatial and temporal structure, significantly improves the overall performance. SPELL outperforms several relevant baselines and performs on par with state-of-the-art models while requiring an order of magnitude lower computation cost.
Submitted 3 December, 2021; v1 submitted 2 December, 2021;
originally announced December 2021.
-
ACNet: Mask-Aware Attention with Dynamic Context Enhancement for Robust Acne Detection
Authors:
Kyungseo Min,
Gun-Hee Lee,
Seong-Whan Lee
Abstract:
Computer-aided diagnosis has recently received attention for its advantage of low cost and time efficiency. Although deep learning played a major role in the recent success of acne detection, there are still several challenges such as color shift by inconsistent illumination, variation in scales, and high density distribution. To address these problems, we propose an acne detection network which consists of three components, specifically: Composite Feature Refinement, Dynamic Context Enhancement, and Mask-Aware Multi-Attention. First, Composite Feature Refinement integrates semantic information and fine details to enrich feature representation, which mitigates the adverse impact of imbalanced illumination. Then, Dynamic Context Enhancement controls different receptive fields of multi-scale features for context enhancement to handle scale variation. Finally, Mask-Aware Multi-Attention detects densely arranged and small acne by suppressing uninformative regions and highlighting probable acne regions. Experiments are performed on acne image dataset ACNE04 and natural image dataset PASCAL VOC 2007. We demonstrate how our method achieves the state-of-the-art result on ACNE04 and competitive performance with previous state-of-the-art methods on the PASCAL VOC 2007.
Submitted 17 December, 2021; v1 submitted 31 May, 2021;
originally announced May 2021.
-
Machine learning aided materials design platform for predicting the mechanical properties of Na-ion solid-state electrolytes
Authors:
Junho Jo,
Eunseong Choi,
Minseon Kim,
Kyoungmin Min
Abstract:
Na-ion solid-state electrolytes (Na-SSEs) exhibit high potential for electrical energy storage owing to their high energy densities and low manufacturing cost. However, their mechanical properties, which are critical to maintaining structural stability at the interface, are still insufficiently understood. In this study, a machine learning-based regression model was developed for predicting the mechanical properties of Na-SSEs. As a training set, 12,361 materials were obtained from a well-known materials database (Materials Project) and were represented with their respective chemical and structural descriptors. The developed surrogate model exhibited a remarkable accuracy ($R^2$ score) of 0.72 and 0.87, with a mean absolute error of 11.8 GPa and 15.3 GPa for the shear and bulk modulus, respectively. This model was then applied to predict the mechanical properties of 2,432 Na-SSEs, the properties of which have been validated with first-principles calculations. Finally, the optimization process was performed to develop an ideal materials screening platform by adding the new minimized dataset, wherein the prediction uncertainty is reduced. We believe that the platform proposed in this study can accelerate the search for Na-SSEs with ideal mechanical properties at minimum cost.
Submitted 22 April, 2021;
originally announced April 2021.
-
A study of the decoherence correction derived from the exact factorization approach for non-adiabatic dynamics
Authors:
Patricia Vindel-Zandbergen,
Lea M. Ibele,
Jong-Kwon Ha,
Seung Kyu Min,
Basile F. E. Curchod,
Neepa T. Maitra
Abstract:
We present a detailed study of the decoherence correction to surface-hopping that was recently derived from the exact factorization approach. Ab initio multiple spawning calculations that use the same initial conditions and same electronic structure method are used as a reference for three molecules: ethylene, methaniminium cation, and fulvene, for which non-adiabatic dynamics follows a photo-excitation. A comparison with the Granucci-Persico energy-based decoherence correction, and the augmented fewest-switches surface-hopping scheme shows that the three decoherence-corrected methods operate on individual trajectories in a qualitatively different way, but results averaged over trajectories are similar for these systems.
Submitted 8 April, 2021;
originally announced April 2021.
-
Construction of a far ultraviolet all sky map from an incomplete survey: Application of a deep learning algorithm
Authors:
Young-Soo Jo,
Yeon-Ju Choi,
Min-Gi Kim,
Chang-Ho Woo,
Kyoung-Wook Min,
Kwang-Il Seon
Abstract:
We constructed a far ultraviolet (FUV) all sky map based on observations from the Far Ultraviolet Imaging Spectrograph (FIMS) aboard the Korean microsatellite STSAT-1. For the ~20% of the sky not covered by FIMS observations, predictions from a deep artificial neural network were used. Seven datasets were chosen for input parameters, including five all sky maps of H-alpha, E(B-V), N(HI), and two X-ray bands, with Galactic longitudes and latitudes. 70% of the pixels of the observed FIMS dataset were randomly selected for training as target parameters and the remaining 30% were used for validation. A simple four-layer neural network architecture, which consisted of three convolution layers and a dense layer at the end, was adopted, with an individual activation function for each convolution layer; each convolution layer was followed by a dropout layer. The predicted FUV intensities exhibited good agreement with Galaxy Evolution Explorer (GALEX) observations made in a similar FUV wavelength band for high Galactic latitudes. As a sample application of the constructed map, a dust scattering simulation was conducted with model optical parameters and a Galactic dust model for a region that included observed and predicted pixels. Overall, FUV intensities in the observed and predicted regions were reproduced well.
Submitted 10 January, 2021;
originally announced January 2021.
-
Integrating Human Gaze into Attention for Egocentric Activity Recognition
Authors:
Kyle Min,
Jason J. Corso
Abstract:
It is well known that human gaze carries significant information about visual attention. However, there are three main difficulties in incorporating the gaze data in an attention mechanism of deep neural networks: 1) the gaze fixation points are likely to have measurement errors due to blinking and rapid eye movements; 2) it is unclear when and how much the gaze data is correlated with visual attention; and 3) gaze data is not always available in many real-world situations. In this work, we introduce an effective probabilistic approach to integrate human gaze into spatiotemporal attention for egocentric activity recognition. Specifically, we represent the locations of gaze fixation points as structured discrete latent variables to model their uncertainties. In addition, we model the distribution of gaze fixations using a variational method. The gaze distribution is learned during the training process so that the ground-truth annotations of gaze locations are no longer needed in testing situations since they are predicted from the learned gaze distribution. The predicted gaze locations are used to provide informative attentional cues to improve the recognition performance. Our method outperforms all the previous state-of-the-art approaches on EGTEA, which is a large-scale dataset for egocentric activity recognition provided with gaze measurements. We also perform an ablation study and qualitative analysis to demonstrate that our attention mechanism is effective.
Submitted 8 November, 2020;
originally announced November 2020.
-
Effects of transient non-thermal particles on the big bang nucleosynthesis
Authors:
Tae-Sun Park,
Kyung Joo Min,
Seung-Woo Hong
Abstract:
The effects of introducing a small amount of non-thermal distribution (NTD) of elements in big bang nucleosynthesis (BBN) are studied by allowing a fraction of the NTD to be time-dependent so that it contributes only during a certain period of the BBN evolution. The fraction is modeled as a Gaussian-shaped function of $\log(T)$, where $T$ is the temperature of the cosmos, and thus the function is specified by three parameters: the central temporal position, the width, and the magnitude. The change in the average nuclear reaction rates due to the presence of the NTD is assumed to be proportional to the Maxwellian reaction rates but with temperature $T_{\rm NTD} \equiv \zeta T$, $\zeta$ being another parameter of our model. By scanning a wide four-dimensional parametric space at about half a million points, we have found about 130 points with $\chi^2 < 1$, at which the predicted primordial abundances of light elements are consistent with the observations. The magnitude parameter $\varepsilon_0$ of these points turns out to be scattered over a very wide range from $\varepsilon_0 \sim 10^{-19}$ to $\sim 10^{-1}$, and the $\zeta$-parameter is found to be strongly correlated with the magnitude parameter $\varepsilon_0$. The temperature region with $0.3\times 10^9 \mbox{K} \lesssim T \lesssim 0.4\times 10^9 \mbox{K}$ or the temporal region $t\simeq 10^3$ s seems to play a central role in lowering $\chi^2$.
Submitted 10 September, 2020;
originally announced September 2020.
-
Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems
Authors:
Ravichander Vipperla,
Sangjun Park,
Kihyun Choo,
Samin Ishtiaq,
Kyoungbo Min,
Sourav Bhattacharya,
Abhinav Mehrotra,
Alberto Gil C. P. Ramos,
Nicholas D. Lane
Abstract:
LPCNet is an efficient vocoder that combines linear prediction and deep neural network modules to keep the computational complexity low. In this work, we present two techniques to further reduce its complexity, aiming for a low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) system. These techniques are: 1) Sample-bunching, which allows LPCNet to generate more than one audio sample per inference; and 2) Bit-bunching, which reduces the computations in the final layer of LPCNet. With the proposed bunching techniques, LPCNet, in conjunction with a Deep Convolutional TTS (DCTTS) acoustic model, shows a 2.19x improvement over the baseline run-time when running on a mobile device, with a decrease of less than 0.1 in TTS mean opinion score (MOS).
Submitted 11 August, 2020;
originally announced August 2020.
-
Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization
Authors:
Kyle Min,
Jason J. Corso
Abstract:
Temporally localizing activities within untrimmed videos has been extensively studied in recent years. Despite recent advances, existing methods for weakly-supervised temporal activity localization struggle to recognize when an activity is not occurring. To address this issue, we propose a novel method named A2CL-PT. Two triplets of the feature space are considered in our approach: one triplet is used to learn discriminative features for each activity class, and the other one is used to distinguish the features where no activity occurs (i.e. background features) from activity-related features for each video. To further improve the performance, we build our network using two parallel branches which operate in an adversarial way: the first branch localizes the most salient activities of a video and the second one finds other supplementary activities from non-localized parts of the video. Extensive experiments performed on THUMOS14 and ActivityNet datasets demonstrate that our proposed method is effective. Specifically, the average mAP of IoU thresholds from 0.1 to 0.9 on the THUMOS14 dataset is significantly improved from 27.9% to 30.0%.
Submitted 13 July, 2020;
originally announced July 2020.
-
PathGAN: Local Path Planning with Attentive Generative Adversarial Networks
Authors:
Dooseop Choi,
Seung-jun Han,
Kyoungwook Min,
Jeongdan Choi
Abstract:
To achieve autonomous driving without high-definition maps, we present a model capable of generating multiple plausible paths from egocentric images for autonomous vehicles. Our generative model comprises two neural networks: the feature extraction network (FEN) and path generation network (PGN). The FEN extracts meaningful features from an egocentric image, whereas the PGN generates multiple paths from the features, given a driving intention and speed. To ensure that the generated paths are plausible and consistent with the intention, we introduce an attentive discriminator and train it with the PGN under the generative adversarial network framework. We also devise an interaction model between the positions in the paths and the intentions hidden in the positions, and design a novel PGN architecture that reflects the interaction model, which improves the accuracy and diversity of the generated paths. Finally, we introduce ETRIDriving, a dataset for autonomous driving in which the recorded sensor data are labeled with discrete high-level driving actions, and demonstrate the state-of-the-art performance of the proposed model on ETRIDriving in terms of accuracy and diversity.
Submitted 2 March, 2021; v1 submitted 7 July, 2020;
originally announced July 2020.
-
TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection
Authors:
Kyle Min,
Jason J. Corso
Abstract:
TASED-Net is a 3D fully-convolutional network architecture for video saliency detection. It consists of two building blocks: first, the encoder network extracts low-resolution spatiotemporal features from an input clip of several consecutive frames, and then the following prediction network decodes the encoded features spatially while aggregating all the temporal information. As a result, a single prediction map is produced from an input clip of multiple frames. Frame-wise saliency maps can be predicted by applying TASED-Net in a sliding-window fashion to a video. The proposed approach assumes that the saliency map of any frame can be predicted by considering a limited number of past frames. The results of our extensive experiments on video saliency detection validate this assumption and demonstrate that our fully-convolutional model with temporal aggregation method is effective. TASED-Net significantly outperforms previous state-of-the-art approaches on all three major large-scale datasets of video saliency detection: DHF1K, Hollywood2, and UCFSports. After analyzing the results qualitatively, we observe that our model is especially better at attending to salient moving objects.
Submitted 15 August, 2019;
originally announced August 2019.
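The sliding-window inference described in the TASED-Net abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `predict_clip` is a hypothetical stand-in for the trained network (it maps a clip of `clip_len` consecutive frames to one saliency map for the last frame), and padding the first frames by repeating frame 0 is an assumption made only so that every frame receives a prediction.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
SaliencyMap = TypeVar("SaliencyMap")


def sliding_window_saliency(
    frames: Sequence[Frame],
    predict_clip: Callable[[Sequence[Frame]], SaliencyMap],
    clip_len: int,
) -> List[SaliencyMap]:
    """Return one saliency map per frame, each predicted from the
    current frame and the (clip_len - 1) frames before it."""
    if clip_len < 1:
        raise ValueError("clip_len must be at least 1")
    maps: List[SaliencyMap] = []
    for t in range(len(frames)):
        start = t - clip_len + 1
        if start >= 0:
            # Enough history: use the clip_len frames ending at frame t.
            clip = list(frames[start : t + 1])
        else:
            # Not enough history yet (assumption): pad by repeating frame 0.
            clip = [frames[0]] * (-start) + list(frames[: t + 1])
        # predict_clip is a hypothetical placeholder for the network.
        maps.append(predict_clip(clip))
    return maps


if __name__ == "__main__":
    # Dummy "model" that just echoes the clip it was given.
    demo = sliding_window_saliency([0, 1, 2, 3, 4], lambda c: tuple(c), 3)
    print(demo)
```

Each output depends only on a fixed number of past frames, which mirrors the assumption stated in the abstract that a limited temporal context is sufficient.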
-
Regularizing Neural Networks for Future Trajectory Prediction via Inverse Reinforcement Learning Framework
Authors:
Dooseop Choi,
Kyoungwook Min,
Jeongdan Choi
Abstract:
Predicting distant future trajectories of agents in a dynamic scene is not an easy problem because the future trajectory of an agent is affected not only by his/her past trajectory but also by the scene context. To tackle this problem, we propose a model based on recurrent neural networks (RNNs) and a novel method for training the model. The proposed model is based on an encoder-decoder architecture where the encoder encodes inputs (past trajectories and scene context information) while the decoder produces a trajectory from the context vector given by the encoder. We train the networks of the proposed model to produce a future trajectory that is as close as possible to the true trajectory while maximizing a reward from a reward function. The reward function is also trained at the same time to maximize the margin between the rewards from the ground-truth trajectory and its estimate. The reward function plays the role of a regularizer for the proposed model, so the trained networks are able to better utilize the scene context information for the prediction task. We evaluated the proposed model on several public datasets. Experimental results show that the prediction performance of the proposed model is much improved by the regularization and that the model outperforms state-of-the-art methods in terms of accuracy. The implementation code is available at https://github.com/d1024choi/traj-pred-irl/.
Submitted 25 December, 2019; v1 submitted 10 July, 2019;
originally announced July 2019.
-
Global distribution of far-ultraviolet emissions from highly ionized gas in the Milky Way
Authors:
Young-Soo Jo,
Kwang-il Seon,
Kyoung-Wook Min,
Jerry Edelstein,
Wonyong Han
Abstract:
We present all-sky maps of two major FUV cooling lines, C IV and O VI, of highly ionized gas to investigate the nature of the transition-temperature gas. From the extinction-corrected line intensities of C IV and O VI, we calculated the gas temperature and the emission measure of the transition-temperature gas assuming isothermal plasma in the collisional ionization equilibrium. The gas temperature was found to be more or less uniform throughout the Galaxy with a value of (1.89 $\pm$ 0.06) $\times$ $10^5$ K. The emission measure of the transition-temperature gas is described well by a disk-like model in which the scale height of the electron density is $z_0=6_{-2}^{+3}$ kpc. The total mass of the transition-temperature gas is estimated to be approximately $6.4_{-2.8}^{+5.2}\times10^9 M_{\odot}$. We also calculated the volume-filling fraction of the transition-temperature gas, which was estimated to be $f=0.26\pm0.09$, and varies from $f\sim0.37$ in the inner Galaxy to $f\sim0.18$ in the outer Galaxy. The spatial distribution of C IV and O VI cannot be explained by a simple supernova remnant model or a three-phase model. The combined effects of supernova remnants and turbulent mixing layers can explain the intensity ratio of C IV and O VI. Thermal conduction front models and high-velocity cloud models are also consistent with our observation.
Submitted 22 May, 2019; v1 submitted 19 May, 2019;
originally announced May 2019.
-
Comparison of the extraplanar H$α$ and UV emissions in the halos of nearby edge-on spiral galaxies
Authors:
Young-Soo Jo,
Kwang-il Seon,
Jong-Ho Shinn,
Yujin Yang,
Dukhang Lee,
Kyoung-Wook Min
Abstract:
We compare vertical profiles of the extraplanar H$α$ emission to those of the UV emission for 38 nearby edge-on late-type galaxies. It is found that detection of the "diffuse" extraplanar dust (eDust), traced by the vertically extended, scattered UV starlight, always coincides with the presence of the extraplanar H$α$ emission. A strong correlation between the scale heights of the extraplanar H$α$ and UV emissions is also found; the scale height at H$α$ is found to be $\sim0.74$ of the scale height at FUV. Our results may indicate the multiphase nature of the diffuse ionized gas and dust in the galactic halos. The existence of eDust in galaxies where the extraplanar H$α$ emission is detected suggests that a larger portion of the extraplanar H$α$ emission than that predicted in previous studies may be caused by H$α$ photons that originate from H II regions in the galactic plane and are subsequently scattered by the eDust. This possibility raise a in studying the eDIG. We also find that the scale heights of the extraplanar emissions normalized to the galaxy size correlate well with the star formation rate surface density of the galaxies. The properties of eDust in our galaxies are on a continuation line of those found through previous observations of the extraplanar polycyclic aromatic hydrocarbon emission in more active galaxies known to have galactic winds.
Submitted 18 June, 2018;
originally announced June 2018.
-
Idiosyncratic Approach to Visualize Degradation of Black Phosphorus
Authors:
Bilal Abbas Naqvi,
Muhammad Arslan Shehzad,
Janghwan Cha,
Kyung Ah Min,
M. Farooq Khan,
Sajjad Hussain,
Seo Yongho,
Suklyun Hong,
Eom Jonghwa,
Jung Jongwan
Abstract:
Black Phosphorus (BP) is an excellent material for the post-graphene era due to its layer-dependent band gap, high mobility, and high Ion/Ioff ratio. However, its poor stability under ambient conditions poses a great challenge to its practical and long-term use. Optical visualization of oxidized BP is the key first step toward successfully passivating it against the ambient environment. Here, we present a systematic study of the oxidation of BP and develop a technique to optically identify the oxidation of BP using Liquid Crystal (LC). Interestingly, we found that rapid oxidation of thin layers of BP makes them disappear, and that this can be visualized using the alignment of LC. Molecular dynamics simulations also confirmed the preferential alignment of LC on oxidized BP. We believe that this simple technique will be effective in passivation efforts for BP and will enable the exploitation of its properties in the field of electronics.
Submitted 7 June, 2018;
originally announced June 2018.
-
A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss
Authors:
Wan-Ting Hsu,
Chieh-Kai Lin,
Ming-Ying Lee,
Kerui Min,
Jing Tang,
Min Sun
Abstract:
We propose a unified model combining the strengths of extractive and abstractive summarization. On the one hand, a simple extractive model can obtain sentence-level attention with high ROUGE scores, but its output is less readable. On the other hand, a more complicated abstractive model can obtain word-level dynamic attention to generate a more readable paragraph. In our model, sentence-level attention is used to modulate the word-level attention such that words in less attended sentences are less likely to be generated. Moreover, a novel inconsistency loss function is introduced to penalize the inconsistency between the two levels of attention. By training our model end-to-end with the inconsistency loss and the original losses of the extractive and abstractive models, we achieve state-of-the-art ROUGE scores while producing the most informative and readable summaries on the CNN/Daily Mail dataset in a solid human evaluation.
Submitted 5 July, 2018; v1 submitted 16 May, 2018;
originally announced May 2018.
-
Hierarchical Novelty Detection for Visual Object Recognition
Authors:
Kibok Lee,
Kimin Lee,
Kyle Min,
Yuting Zhang,
Jinwoo Shin,
Honglak Lee
Abstract:
Deep neural networks have achieved impressive success in large-scale visual object recognition tasks with a predefined set of classes. However, recognizing objects of novel classes unseen during training still remains challenging. The problem of detecting such novel classes has been addressed in the literature, but most prior works have focused on providing simple binary or regressive decisions, e.g., the output would be "known," "novel," or corresponding confidence intervals. In this paper, we study more informative novelty detection schemes based on a hierarchical classification framework. For an object of a novel class, we aim to find its closest super class in the hierarchical taxonomy of known classes. To this end, we propose two different approaches, termed top-down and flatten methods, as well as their combination. The essential ingredients of our methods are confidence-calibrated classifiers, data relabeling, and the leave-one-out strategy for modeling novel classes under the hierarchical taxonomy. Furthermore, our method can generate a hierarchical embedding that leads to improved generalized zero-shot learning performance in combination with other commonly-used semantic embeddings.
Submitted 15 June, 2018; v1 submitted 2 April, 2018;
originally announced April 2018.
-
A unified image reconstruction framework for quantitative dual- and triple-energy CT imaging of material compositions
Authors:
Wei Zhao,
Don Vernekohl,
Fei Han,
Bin Han,
Hao Peng,
Lei Xing,
James K Min
Abstract:
Many clinical applications depend critically on the accurate differentiation and classification of different types of materials in patient anatomy. This work introduces a unified framework for accurate nonlinear material decomposition and applies it, for the first time, in the concept of triple-energy CT (TECT) for enhanced material differentiation and classification as well as dual-energy CT. The triple-energy data acquisition is implemented at the scales of micro-CT and clinical CT imaging with commercial "TwinBeam" dual-source DECT configuration and a fast kV switching DECT configuration. Material decomposition and quantitative comparison with a photon counting detector and with the presence of a bow-tie filter are also performed. The proposed method provides quantitative material- and energy-selective images examining realistic configurations for both dual- and triple-energy CT measurements. Compared to the polychromatic kV CT images, virtual monochromatic images show superior image quality. For the mouse phantom, quantitative measurements show that the differences between gadodiamide and iodine concentrations obtained using TECT and idealized photon counting CT (PCCT) are smaller than 8 mg/mL and 1 mg/mL, respectively. TECT outperforms DECT for multi-contrast CT imaging and is robust with respect to spectrum estimation. For the thorax phantom, the differences between the concentrations of the contrast map and the corresponding true reference values are smaller than 7 mg/mL for all of the realistic configurations. A unified framework for both dual- and triple-energy CT imaging has been established for the accurate extraction of material compositions, considering currently available commercial DECT configurations. The novel technique promises to provide an urgently needed solution for several CT-based diagnosis and therapy applications.
Submitted 15 March, 2018;
originally announced March 2018.
-
Controlled Electrochemical Intercalation of Graphene/h-BN van der Waals Heterostructures
Authors:
S. Y. Frank Zhao,
Giselle A. Elbaz,
D. Kwabena Bediako,
Cyndia Yu,
Dmitri K. Efetov,
Yinsheng Guo,
Jayakanth Ravichandran,
Kyung-Ah Min,
Suklyun Hong,
Takashi Taniguchi,
Kenji Watanabe,
Louis E. Brus,
Xavier Roy,
Philip Kim
Abstract:
Electrochemical intercalation is a powerful method for tuning the electronic properties of layered solids. In this work, we report an electrochemical strategy to controllably intercalate lithium ions into a series of van der Waals (vdW) heterostructures built by sandwiching graphene between hexagonal boron nitride (h-BN). We demonstrate that encapsulating graphene with h-BN eliminates parasitic surface side reactions while simultaneously creating a new hetero-interface that permits intercalation between the atomically thin layers. We employ the Hall effect to precisely monitor the intercalation reaction. We also simultaneously probe the spectroscopic and electrical transport properties of the resulting intercalation compounds at different stages of intercalation. We achieve the highest carrier density $> 5 \times 10^{13}$ cm$^{-2}$ with mobility $> 10^3$ cm$^2$/(Vs) in the most heavily intercalated samples, where Shubnikov-de Haas quantum oscillations are observed at low temperatures. These results set the stage for further studies that employ intercalation in modifying properties of vdW heterostructures.
Submitted 21 October, 2017;
originally announced October 2017.
-
Asymmetric Electron-Hole Decoherence in Ion-Gated Epitaxial Graphene
Authors:
Kil-Joon Min,
Jaesung Park,
Wan-Seop Kim,
Dong-Hun Chae
Abstract:
We report on asymmetric electron-hole decoherence in epitaxial graphene gated by an ionic liquid. The observed negative magnetoresistance near zero magnetic field for different gate voltages, analyzed in the framework of weak localization, reveals distinct electron and hole decoherence. The hole decoherence rate increases prominently with decreasing negative gate voltage while the electron decoherence rate does not exhibit any substantial gate dependence. Quantitatively, the hole decoherence rate is larger than the electron decoherence rate by a factor of two. We discuss possible microscopic origins including spin-exchange scattering consistent with our experimental observations.
Submitted 21 September, 2017;
originally announced September 2017.
-
EpiFi: An In-Home Sensor Network Architecture for Epidemiological Studies
Authors:
Philip Lundrigan,
Kyeong Min,
Neal Patwari,
Sneha Kasera,
Kerry Kelly,
Jimmy Moore,
Miriah Meyer,
Scott C. Collingwood,
Flory Nkoy,
Bryan Stone,
Katherine Sward
Abstract:
We design and build a system called EpiFi, which allows epidemiologists to easily design and deploy experiments in homes. The focus of EpiFi is reducing the barrier to entry for deploying and using an in-home sensor network. We present a novel architecture for in-home sensor networks configured using a single configuration file and provide: a fast and reliable method for device discovery when installed in the home, a new mechanism for sensors to authenticate over the air using a subject's home WiFi router, and data reliability mechanisms to minimize loss in the network through a long-term deployment. We work collaboratively with pediatric asthma researchers to design three studies and deploy EpiFi in homes.
Submitted 7 September, 2017;
originally announced September 2017.
-
Far-ultraviolet fluorescent molecular hydrogen emission map of the Milky Way Galaxy
Authors:
Young-Soo Jo,
Kwang-Il Seon,
Kyoung-Wook Min,
Jerry Edelstein,
Wonyong Han
Abstract:
We present the far-ultraviolet (FUV) fluorescent molecular hydrogen (H_2) emission map of the Milky Way Galaxy obtained with FIMS/SPEAR covering ~76% of the sky. The extinction-corrected intensity of the fluorescent H_2 emission has a strong linear correlation with the well-known tracers of the cold interstellar medium (ISM), including color excess E(B-V), neutral hydrogen column density N(H I), and H_alpha emission. The all-sky H_2 column density map was also obtained using a simple photodissociation region model and interstellar radiation fields derived from UV star catalogs. We estimated the fraction of H_2 (f_H2) and the gas-to-dust ratio (GDR) of the diffuse ISM. The f_H2 gradually increases from <1% at optically thin regions where E(B-V) < 0.1 to ~50% for E(B-V) = 3. The estimated GDR is ~5.1 x 10^21 atoms cm^-2 mag^-1, in agreement with the standard value of 5.8 x 10^21 atoms cm^-2 mag^-1.
Submitted 16 July, 2017;
originally announced July 2017.