-
AirSketch: Generative Motion to Sketch
Authors:
Hui Xian Grace Lim,
Xuanming Cui,
Yogesh S Rawat,
Ser-Nam Lim
Abstract:
Illustration is a fundamental mode of human expression and communication. Certain types of motion that accompany speech can provide this illustrative mode of communication. While Augmented and Virtual Reality technologies (AR/VR) have introduced tools for producing drawings with hand motions (air drawing), they typically require costly hardware and additional digital markers, thereby limiting their accessibility and portability. Furthermore, air drawing demands considerable skill to achieve aesthetic results. To address these challenges, we introduce the concept of AirSketch, aimed at generating faithful and visually coherent sketches directly from hand motions, eliminating the need for complicated headsets or markers. We devise a simple augmentation-based self-supervised training procedure, enabling a controllable image diffusion model to learn to translate from highly noisy hand tracking images to clean, aesthetically pleasing sketches, while preserving the essential visual cues from the original tracking data. We present two air drawing datasets to study this problem. Our findings demonstrate that beyond producing photo-realistic images from precise spatial inputs, controllable image diffusion can effectively produce a refined, clear sketch from a noisy input. Our work serves as an initial step towards marker-less air drawing and reveals distinct applications of controllable diffusion models to AirSketch and AR/VR in general.
Submitted 11 July, 2024;
originally announced July 2024.
-
Feasibility of Neural Radiance Fields for Crime Scene Video Reconstruction
Authors:
Shariq Nadeem Malik,
Min Hao Chee,
Dayan Mario Anthony Perera,
Chern Hong Lim
Abstract:
This paper aims to review and determine the feasibility of using variations of NeRF models in order to reconstruct crime scenes given input videos of the scene. We focus on three main innovations of NeRF when it comes to reconstructing crime scenes: Multi-object Synthesis, Deformable Synthesis, and Lighting. From there, we analyse its innovation progress against the requirements to be met in order to be able to reconstruct crime scenes with given videos of such scenes.
Submitted 11 July, 2024;
originally announced July 2024.
-
Centrality dependence of Lévy-stable two-pion Bose-Einstein correlations in $\sqrt{s_{_{NN}}}=200$ GeV Au$+$Au collisions
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
H. Al-Ta'ani,
J. Alexander,
A. Angerami,
K. Aoki,
N. Apadula,
Y. Aramaki,
H. Asano,
E. C. Aschenauer,
E. T. Atomssa,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
B. Bannier,
K. N. Barish,
B. Bassalleck,
S. Bathe
, et al. (377 additional authors not shown)
Abstract:
The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$ GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $λ$, the Lévy index of stability $α$, and the Lévy-scale parameter $R$ as a function of transverse mass $m_T$ and centrality. The $λ(m_T)$ parameter is constant at larger values of $m_T$, but decreases as $m_T$ decreases. The Lévy scale parameter $R(m_T)$ decreases with $m_T$ and exhibits proportionality to the length scale of the nuclear overlap region. The Lévy exponent $α(m_T)$ is independent of $m_T$ within uncertainties in each investigated centrality bin, but shows a clear centrality dependence. At all centralities, the Lévy exponent $α$ is significantly different from that of Gaussian ($α=2$) or Cauchy ($α=1$) source distributions. Comparisons to the predictions of Monte-Carlo simulations of resonance-decay chains show that in all but the most peripheral centrality class (50%-60%), the obtained results are inconsistent with the measurements, unless a significant reduction of the in-medium mass of the $η'$ meson is included. In each centrality class, the best value of the in-medium $η'$ mass is compared to the mass of the $η$ meson, as well as to several theoretical predictions that consider restoration of $U_A(1)$ symmetry in hot hadronic matter.
Submitted 11 July, 2024;
originally announced July 2024.
-
Identity-enabled CDMA LiDAR for massively parallel ranging with a single-element receiver
Authors:
Yixiu Shen,
Zi Heng Lim,
Guangya Zhou
Abstract:
Light detection and ranging (LiDAR) has emerged as a crucial tool for high-resolution 3D imaging, particularly in autonomous vehicles, remote sensing, and augmented reality. However, the increasing demand for faster acquisition speed and higher resolution in LiDAR systems has highlighted the limitations of traditional mechanical scanning methods. This study introduces a novel wavelength-multiplexed code-division multiple access (CDMA) parallel laser ranging approach with a single-pixel receiver to address these challenges. By leveraging the unique properties of Gold sequences in a direct-sequence spread spectrum (DSSS) framework, our design enables comprehensive parallelization in detection and ranging activities to significantly enhance system efficiency and user capacity. The proposed coaxial architecture simplifies hardware requirements using a single avalanche photodiode (APD) for multi-reception, reducing susceptibility to ambient noise and external interference. We demonstrate 3D imaging at 5 m and 10 m, and the experimental results highlight the capability of our CDMA LiDAR system to achieve 40 parallel ranging channels with centimeter-level depth resolution and an angular resolution of 0.03 degrees. Furthermore, our system allows for user identification modulation, enabling identity-based ranging among different users. The robustness of our proposed system against interference, speckle noise, and near-far signal problems, combined with its potential for miniaturization and integration into chip-scale optics, presents a promising avenue to develop high-performance, compact LiDAR systems suitable for commercial applications.
Submitted 10 July, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Improved limit on neutrinoless double beta decay of $^{100}$Mo from AMoRE-I
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (83 additional authors not shown)
Abstract:
AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with a metallic magnetic calorimeter operate at milli-Kelvin temperatures to measure the energy of electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate crystals, at the Yangyang Underground Laboratory for over two years. The exposure was 8.02 kg$\cdot$year (or 3.89 kg$_{\mathrm{^{100}Mo}}\cdot$year) and the total background rate near the Q-value was 0.025 $\pm$ 0.002 counts/keV/kg/year. We observed no indication of $0νββ$ decay and report a new lower limit of the half-life of $^{100}$Mo $0νββ$ decay as $T^{0ν}_{1/2}>3.0\times10^{24}~\mathrm{years}$ at 90% confidence level. The effective Majorana mass limit range is $m_{ββ}<$(210--610) meV using nuclear matrix elements estimated in the framework of different models, including the recent shell model calculations.
Submitted 8 July, 2024;
originally announced July 2024.
-
MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension
Authors:
Zekun Li,
Xianjun Yang,
Kyuri Choi,
Wanrong Zhu,
Ryan Hsieh,
HyeonJung Kim,
Jin Hyuk Lim,
Sungyoung Ji,
Byungju Lee,
Xifeng Yan,
Linda Ruth Petzold,
Stephen D. Wilson,
Woosang Lim,
William Yang Wang
Abstract:
The rapid advancement of Large Language Models (LLMs) and Large Multimodal Models (LMMs) has heightened the demand for AI-based scientific assistants capable of understanding scientific articles and figures. Despite progress, there remains a significant gap in evaluating models' comprehension of professional, graduate-level, and even PhD-level scientific content. Current datasets and benchmarks primarily focus on relatively simple scientific tasks and figures, lacking comprehensive assessments across diverse advanced scientific disciplines. To bridge this gap, we collected a multimodal, multidisciplinary dataset from open-access scientific articles published in Nature Communications journals. This dataset spans 72 scientific disciplines, ensuring both diversity and quality. We created benchmarks with various tasks and settings to comprehensively evaluate LMMs' capabilities in understanding scientific figures and content. Our evaluation revealed that these tasks are highly challenging: many open-source models struggled significantly, and even GPT-4V and GPT-4o faced difficulties. We also explored using our dataset as training resources by constructing visual instruction-following data, enabling the 7B LLaVA model to achieve performance comparable to GPT-4V/o on our benchmark. Additionally, we investigated the use of our interleaved article texts and figure images for pre-training LMMs, resulting in improvements on the material generation task. The source dataset, including articles, figures, constructed benchmarks, and visual instruction-following data, is open-sourced.
Submitted 5 July, 2024;
originally announced July 2024.
-
Stochastic Processes: From Classical to Quantum
Authors:
Soon Hoe Lim
Abstract:
The main goal of these notes is to give an introduction to the mathematics of quantum noise and some of its applications in non-equilibrium statistical mechanics. We start with some reminders from the theory of classical stochastic processes. We then provide a brief overview of quantum mechanics and quantum field theory, from the viewpoint of quantum probability and adopting the language of Hudson and Parthasarathy. We introduce quantum stochastic processes on a boson Fock space and their calculus. Whenever possible, we make connections with the relevant concepts in classical probability theory. As an application of the theory, we introduce the theory of open quantum systems, with emphasis on the physics and modeling aspects of these systems.
Submitted 4 July, 2024;
originally announced July 2024.
-
CLOi-Mapper: Consistent, Lightweight, Robust, and Incremental Mapper With Embedded Systems for Commercial Robot Services
Authors:
DongKi Noh,
Hyungtae Lim,
Gyuho Eoh,
Duckyu Choi,
Jeongsik Choi,
Hyunjun Lim,
SeungMin Baek,
Hyun Myung
Abstract:
In commercial autonomous service robots with several form factors, simultaneous localization and mapping (SLAM) is an essential technology for providing proper services such as cleaning and guidance. Such robots require SLAM algorithms suitable for specific applications and environments. Hence, several SLAM frameworks have been proposed to address various requirements in the past decade. However, we have encountered challenges in implementing recent innovative frameworks when handling service robots with low-end processors and insufficient sensor data, such as low-resolution 2D LiDAR sensors. Specifically, regarding commercial robots, consistent performance in different hardware configurations and environments is more crucial than the performance dedicated to specific sensors or environments. Therefore, we propose a) a multi-stage approach for global pose estimation in embedded systems; b) a graph generation method with zero constraints for synchronized sensors; and c) a robust and memory-efficient method for long-term pose-graph optimization. As verified in in-home and large-scale indoor environments, the proposed method yields consistent global pose estimation for services in commercial fields. Furthermore, the proposed method exhibits potential commercial viability considering the consistent performance verified via mass production and long-term (> 5 years) operation.
Submitted 27 June, 2024;
originally announced June 2024.
-
B-TMS: Bayesian Traversable Terrain Modeling and Segmentation Across 3D LiDAR Scans and Maps for Enhanced Off-Road Navigation
Authors:
Minho Oh,
Gunhee Shin,
Seoyeon Jang,
Seungjae Lee,
Dongkyu Lee,
Wonho Song,
Byeongho Yu,
Hyungtae Lim,
Jaeyoung Lee,
Hyun Myung
Abstract:
Recognizing traversable terrain from 3D point cloud data is critical, as it directly impacts the performance of autonomous navigation in off-road environments. However, existing segmentation algorithms often struggle with challenges related to changes in data distribution, environmental specificity, and sensor variations. Moreover, when encountering sunken areas, their performance is frequently compromised, and they may even fail to recognize them. To address these challenges, we introduce B-TMS, a novel approach that performs map-wise terrain modeling and segmentation by utilizing Bayesian generalized kernel (BGK) within the graph structure known as the tri-grid field (TGF). Our experiments encompass various data distributions, ranging from single scans to partial maps, utilizing both public datasets representing urban scenes and off-road environments, and our own dataset acquired from extremely bumpy terrains. Our results demonstrate notable contributions, particularly in terms of robustness to data distribution variations, adaptability to diverse environmental conditions, and resilience against the challenges associated with parameter changes.
Submitted 26 June, 2024;
originally announced June 2024.
-
Evolution Dynamics Toward the Limit Cycle of a Quantum Self-Sustained Oscillator
Authors:
Hendry M. Lim,
Donny Dwiputra,
M Shoufie Ukhtary,
Ahmad R. T. Nugraha
Abstract:
The dynamics of a quantum self-sustained oscillator as it evolves toward its limit cycle may be useful in solving related problems like those in quantum synchronization, yet is inadequately studied. Here we investigate the evolution of a quantum Rayleigh-van der Pol (RvdP) oscillator, the simplest form of a self-sustained oscillator exhibiting a quasiharmonic limit cycle, starting from Fock, thermal, and coherent states. We find that the phase-space dynamics significantly differ depending on the initial state -- one evolution toward the limit cycle may take much longer than another and a least-time parameter may be present. We describe the resulting dynamics in terms of the coherence decay and the redistribution of eigenstate occupation.
Submitted 26 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Post-hoc Utterance Refining Method by Entity Mining for Faithful Knowledge Grounded Conversations
Authors:
Yoonna Jang,
Suhyune Son,
Jeongwoo Lee,
Junyoung Son,
Yuna Hur,
Jungwoo Lim,
Hyeonseok Moon,
Kisu Yang,
Heuiseok Lim
Abstract:
Despite the striking advances in recent language generation performance, model-generated responses have suffered from the chronic problem of hallucinations that are either untrue or unfaithful to a given source. Especially in the task of knowledge grounded conversation, the models are required to generate informative responses, but hallucinated utterances lead to miscommunication. In particular, entity-level hallucination that causes critical misinformation and undesirable conversation is one of the major concerns. To address this issue, we propose a post-hoc refinement method called REM. It aims to enhance the quality and faithfulness of hallucinated utterances by refining them based on the source knowledge. If the generated utterance has a low source-faithfulness score with the given knowledge, REM mines the key entities in the knowledge and implicitly uses them for refining the utterances. We verify that our method reduces entity hallucination in the utterance. Also, we show the adaptability and efficacy of REM with extensive experiments and generative results. Our code is available at https://github.com/YOONNAJANG/REM.
Submitted 16 June, 2024;
originally announced June 2024.
-
Projected background and sensitivity of AMoRE-II
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (81 additional authors not shown)
Abstract:
AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach up to $T^{0νββ}_{1/2}$ $\sim$ 6 $\times$ 10$^{26}$ years, corresponding to an effective Majorana mass of 15-29 meV, covering all the inverted mass hierarchy regions. To achieve this, the background level of the experimental configurations and possible background sources of gamma and beta events should be well understood. We have intensively performed Monte Carlo simulations using the GEANT4 toolkit in all the experimental configurations with potential sources. We report the estimated background level that meets the 10$^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) requirement for AMoRE-II in the region of interest (ROI) and show the projected half-life sensitivity based on the simulation study.
Submitted 13 June, 2024;
originally announced June 2024.
-
Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
S. Afanasiev,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
H. Al-Bataineh,
J. Alexander,
M. Alfred,
K. Aoki,
N. Apadula,
L. Aphecetche,
J. Asai,
H. Asano,
E. T. Atomssa,
R. Averbeck,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
G. Baksay,
L. Baksay,
A. Baldisseri
, et al. (510 additional authors not shown)
Abstract:
High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12 GeV/$c$ and 0.5--7 GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.
Submitted 12 June, 2024;
originally announced June 2024.
-
ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction
Authors:
Jeiyoon Park,
Chanjun Park,
Heuiseok Lim
Abstract:
We explore and improve the capabilities of LLMs to generate data for grammatical error correction (GEC). When merely producing parallel sentences, their patterns are too simplistic to be valuable as a corpus. To address this issue, we propose an automated framework that includes a Subject Selector, Grammar Selector, Prompt Manager, and Evaluator. Additionally, we introduce a new dataset for GEC tasks, named ChatLang-8, which encompasses eight types of subject nouns and 23 types of grammar. It consists of 1 million pairs featuring human-like grammatical errors. Our experiments reveal that ChatLang-8 exhibits a more uniform pattern composition compared to existing GEC datasets. Furthermore, we observe improved model performance when using ChatLang-8 instead of existing GEC datasets. The experimental results suggest that our framework and ChatLang-8 are valuable resources for enhancing ChatGPT's data generation capabilities.
Submitted 11 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering
Authors:
ChaeHun Park,
Koanho Lee,
Hyesu Lim,
Jaeseok Kim,
Junmo Park,
Yu-Jung Heo,
Du-Seong Chang,
Jaegul Choo
Abstract:
Building a reliable visual question answering (VQA) system across different languages is a challenging problem, primarily due to the lack of abundant samples for training. To address this challenge, recent studies have employed machine translation systems for the cross-lingual VQA task. This involves translating the evaluation samples into a source language (usually English) and using monolingual models (i.e., translate-test). However, our analysis reveals that translated texts contain unique characteristics distinct from human-written ones, referred to as translation artifacts. We find that these artifacts can significantly affect the models, confirmed by extensive experiments across diverse models, languages, and translation processes. In light of this, we present a simple data augmentation strategy that can alleviate the adverse impacts of translation artifacts.
Submitted 4 June, 2024;
originally announced June 2024.
-
Enhancing Consistency and Role-Specific Knowledge Capturing by Rebuilding Fictional Character's Persona
Authors:
Jeiyoon Park,
Chanjun Park,
Heuiseok Lim
Abstract:
With the recent introduction of the Assistants API, it is expected that document-based language models will be actively used in various domains, especially role-playing. However, a key challenge lies in utilizing the protagonist's persona: the Assistants API often fails to do so with its search because the information extraction part is different each time and it often omits important information such as the protagonist's backstory or relationships. It is hard to maintain a consistent persona simply by using the persona document as input to the Assistants API. To address the challenge of achieving stable persona consistency, we propose CharacterGPT, a novel persona reconstruction framework to alleviate the shortcomings of the Assistants API. Our method involves Character Persona Training (CPT), an effective persona rebuilding process that updates the character persona by extracting the character's traits from a given summary of the novel for each character as if the story in a novel progresses. In our experiments, we ask each character to take the Big Five Inventory personality test in various settings and analyze the results. To assess whether it can think outside the box, we let each character generate short novels. Extensive experiments and human evaluation demonstrate that CharacterGPT presents new possibilities for role-playing agent research. Code and results are available at: https://github.com/Jeiyoon/charactergpt
Submitted 4 June, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
Designing Prompt Analytics Dashboards to Analyze Student-ChatGPT Interactions in EFL Writing
Authors:
Minsun Kim,
SeonGyeom Kim,
Suyoun Lee,
Yoosang Yoon,
Junho Myung,
Haneul Yoo,
Hyungseung Lim,
Jieun Han,
Yoonsu Kim,
So-Yeon Ahn,
Juho Kim,
Alice Oh,
Hwajung Hong,
Tak Yeon Lee
Abstract:
While ChatGPT has significantly impacted education by offering personalized resources for students, its integration into educational settings poses unprecedented risks, such as inaccuracies and biases in AI-generated content, plagiarism and over-reliance on AI, and privacy and security issues. To help teachers address such risks, we conducted a two-phase iterative design process that comprises surveys, interviews, and a prototype demonstration involving six EFL (English as a Foreign Language) teachers, who integrated ChatGPT into semester-long English essay writing classes. Based on the needs identified during the initial survey and interviews, we developed a prototype of the Prompt Analytics Dashboard (PAD) that integrates the essay editing history and chat logs between students and ChatGPT. Teachers' feedback on the prototype informs additional features and unmet needs for designing future PADs, which help them (1) perform contextual analysis of student behaviors, (2) design an overall learning loop, and (3) develop their teaching skills.
Submitted 30 May, 2024;
originally announced May 2024.
-
Exploring a Multimodal Fusion-based Deep Learning Network for Detecting Facial Palsy
Authors:
Nicole Heng Yim Oo,
Min Hun Lee,
Jeong Hoon Lim
Abstract:
Algorithmic detection of facial palsy offers the potential to improve current practices, which usually involve labor-intensive and subjective assessment by clinicians. In this paper, we present a multimodal fusion-based deep learning model that utilizes unstructured data (i.e. an image frame with facial line segments) and structured data (i.e. features of facial expressions) to detect facial palsy. We then contribute a study that analyzes the effect of different data modalities and the benefits of a multimodal fusion-based approach using videos of 21 facial palsy patients. Our experimental results show that among various data modalities (i.e. unstructured data: RGB images and images of facial line segments; structured data: coordinates of facial landmarks and features of facial expressions), the feed-forward neural network using features of facial expression achieved the highest precision of 76.22 while the ResNet-based model using images of facial line segments achieved the highest recall of 83.47. When we leveraged both images of facial line segments and features of facial expressions, our multimodal fusion-based deep learning model slightly improved the precision score to 77.05 at the expense of a decrease in the recall score.
Submitted 26 May, 2024;
originally announced May 2024.
-
Efficiently Parameterized Neural Metriplectic Systems
Authors:
Anthony Gruber,
Kookjin Lee,
Haksoo Lim,
Noseong Park,
Nathaniel Trask
Abstract:
Metriplectic systems are learned from data in a way that scales quadratically in both the size of the state and the rank of the metriplectic data. Besides being provably energy conserving and entropy stable, the proposed approach comes with approximation results demonstrating its ability to accurately learn metriplectic dynamics from data as well as an error estimate indicating its potential for generalization to unseen timescales when approximation error is low. Examples are provided which illustrate performance in the presence of both full state information as well as when entropic variables are unknown, confirming that the proposed approach exhibits superior accuracy and scalability without compromising on model expressivity.
Submitted 28 May, 2024; v1 submitted 25 May, 2024;
originally announced May 2024.
-
Implementation of CU gates and its application in a remote-controlled quantum operation
Authors:
Byungjoo Kim,
Seongjin Hong,
Yong-Su Kim,
Kyunghwan Oh,
Hyang-Tag Lim
Abstract:
Recently, remote-controlled quantum information processing has been proposed for its applications in secure quantum processing protocols and distributed quantum networks. For remote-controlled quantum gates, the experimental realization of controlled unitary (CU) gates between any quantum gates is an essential task. Here, we propose and experimentally demonstrate a scheme for implementing CU gates between arbitrary pairs of unitary gates using the polarization and time-bin degrees of freedom of single photons. Then, we experimentally implement remote-controlled single-qubit unitary gates with high process fidelities by controlling either the state preparation or the measurement of the control qubit. We believe that the proposed remote-controlled quantum gate model can pave the way for secure and efficient quantum information processing.
Submitted 24 May, 2024;
originally announced May 2024.
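To make the controlled unitary (CU) operation named in the abstract above concrete, the following minimal sketch builds the standard textbook matrix $|0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes U$ for a single-qubit $U$. It is not the authors' photonic scheme, which uses polarization and time-bin degrees of freedom; the function names `controlled_unitary` and `apply`, and the Pauli-X example, are assumptions chosen for illustration.

```python
def controlled_unitary(u):
    """Return the 4x4 matrix |0><0| (x) I + |1><1| (x) U for a 2x2 matrix U.

    Basis order is |control, target>: 00, 01, 10, 11.
    """
    if len(u) != 2 or any(len(row) != 2 for row in u):
        raise ValueError("u must be a 2x2 matrix")
    cu = [[0j] * 4 for _ in range(4)]
    # Control = 0: the target is left untouched (identity block).
    cu[0][0] = 1 + 0j
    cu[1][1] = 1 + 0j
    # Control = 1: the target is transformed by U (lower-right block).
    for i in range(2):
        for j in range(2):
            cu[2 + i][2 + j] = complex(u[i][j])
    return cu


def apply(matrix, state):
    """Multiply a square matrix by a state vector."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]


# Example: with U = Pauli-X the construction reduces to the CNOT gate.
pauli_x = [[0, 1], [1, 0]]
cnot = controlled_unitary(pauli_x)
```

With the control qubit in |1>, `apply(cnot, [0, 0, 1, 0])` flips the target; with the control in |0>, the state is returned unchanged.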
-
A finite time analysis of distributed Q-learning
Authors:
Han-Dong Lim,
Donghwan Lee
Abstract:
Multi-agent reinforcement learning (MARL) has witnessed a remarkable surge in interest, fueled by the empirical success achieved in applications of single-agent reinforcement learning (RL). In this study, we consider a distributed Q-learning scenario, wherein a number of agents cooperatively solve a sequential decision making problem without access to the central reward function which is an average of the local rewards. In particular, we study finite-time analysis of a distributed Q-learning algorithm, and provide a new sample complexity result of $\tilde{\mathcal{O}}\left( \min\left\{ \frac{1}{ε^2} \frac{t_{\text{mix}}}{(1-γ)^6 d_{\min}^4}, \frac{1}{ε} \frac{\sqrt{|\mathcal{S}||\mathcal{A}|}}{(1-σ_2(\boldsymbol{W}))(1-γ)^4 d_{\min}^3} \right\}\right)$ under tabular lookup
Submitted 22 May, 2024;
originally announced May 2024.
-
New Tight Wavelet Frame Constructions Sharing Responsibility
Authors:
Youngmi Hur,
Hyojae Lim
Abstract:
Tight wavelet frames (TWFs) in $L^2(\mathbb{R}^n)$ are versatile and practical structures that provide the perfect reconstruction property. Nevertheless, existing TWF construction methods exhibit limitations, including a lack of specific methods for generating mother wavelets in extension-based construction, and the necessity to address the sum of squares (SOS) problem even when specific methods for generating mother wavelets are provided in SOS-based construction. It is a common practice for current TWF constructions to begin with a given refinable function. However, this approach places the entire burden on finding suitable mother wavelets. In this paper, we introduce TWF construction methods that spread the burden between both types of functions: refinable functions and mother wavelets. These construction methods offer an alternative approach to circumvent the SOS problem while providing specific techniques for generating mother wavelets. We present examples to illustrate our construction methods.
Submitted 22 May, 2024;
originally announced May 2024.
-
Bridging the Intent Gap: Knowledge-Enhanced Visual Generation
Authors:
Yi Cheng,
Ziwei Xu,
Dongyun Lin,
Harry Cheng,
Yongkang Wong,
Ying Sun,
Joo Hwee Lim,
Mohan Kankanhalli
Abstract:
For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not fully captured by input prompts. The absence of such details makes it challenging for generative models to accurately reflect the intended meaning, leading to a mismatch between the desired and generated output. Second, generative models trained on visual-label pairs lack the comprehensive knowledge to accurately represent all aspects of the input data in their generated outputs. To address these challenges, we propose a knowledge-enhanced iterative refinement framework for visual content generation. We begin by analyzing and identifying the key challenges faced by existing generative models. Then, we introduce various knowledge sources, including human insights, pre-trained models, logic rules, and world knowledge, which can be leveraged to address these challenges. Furthermore, we propose a novel visual generation framework that incorporates a knowledge-based feedback module to iteratively refine the generation process. This module gradually improves the alignment between the generated content and user intentions. We demonstrate the efficacy of the proposed framework through preliminary results, highlighting the potential of knowledge-enhanced generative models for intention-aligned content generation.
Submitted 21 May, 2024;
originally announced May 2024.
-
Outlier-Robust Long-Term Robotic Mapping Leveraging Ground Segmentation
Authors:
Hyungtae Lim
Abstract:
Despite the remarkable advancements in deep learning-based perception technologies and simultaneous localization and mapping (SLAM), one can face the failure of these approaches when robots encounter scenarios outside their modeled experiences (here, the term modeling encompasses both conventional pattern finding and data-driven approaches). In particular, because learning-based methods are prone to catastrophic failure when operated in untrained scenes, there is still a demand for conventional yet robust approaches that work out of the box in diverse scenarios, such as real-world robotic services and SLAM competitions. In addition, the dynamic nature of real-world environments, characterized by changing surroundings over time and the presence of moving objects, leads to undesirable data points that hinder a robot from localization and path planning. Consequently, methodologies that enable long-term map management, such as multi-session SLAM and static map building, become essential. Therefore, to achieve a robust long-term robotic mapping system that can work out of the box, first, I propose (i) fast and robust ground segmentation to reject the ground points, which are featureless and thus not helpful for localization and mapping. Then, by employing the concept of graduated non-convexity (GNC), I propose (ii) outlier-robust registration with ground segmentation that overcomes the presence of gross outliers within the feature matching results, and (iii) hierarchical multi-session SLAM that not only uses our proposed GNC-based registration but also employs a GNC solver to be robust against outlier loop candidates. Finally, I propose (iv) instance-aware static map building that can handle the presence of moving objects in the environment based on the observation that most moving objects in urban environments are inevitably in contact with the ground.
Submitted 27 May, 2024; v1 submitted 18 May, 2024;
originally announced May 2024.
-
Enhancing Language Models for Financial Relation Extraction with Named Entities and Part-of-Speech
Authors:
Menglin Li,
Kwan Hui Lim
Abstract:
The Financial Relation Extraction (FinRE) task involves identifying the entities and their relation, given a piece of financial statement/text. To solve this FinRE problem, we propose a simple but effective strategy that improves the performance of pre-trained language models by augmenting them with Named Entity Recognition (NER) and Part-Of-Speech (POS), as well as different approaches to combine this information. Experiments on a financial relations dataset show promising results and highlight the benefits of incorporating NER and POS in existing models. Our dataset and code are available at https://github.com/kwanhui/FinRelExtract.
Submitted 2 May, 2024;
originally announced May 2024.
-
Entanglement swapping via lossy channels using photon-number-encoded states
Authors:
Wan Zo,
Bohdan Bilash,
Donghwa Lee,
Yosep Kim,
Hyang-Tag Lim,
Kyunghwan Oh,
Syed M. Assad,
Yong-Su Kim
Abstract:
Entanglement shared between distant parties is a key resource in quantum networks. However, photon losses in quantum channels significantly reduce the success probability of entanglement sharing, which scales quadratically with the channel transmission. Quantum repeaters using entanglement swapping can mitigate this effect, but usually require high-performance photonic quantum memories to synchronize photonic qubits. In this work, we theoretically and experimentally investigate an entanglement swapping protocol using photon-number-encoded states that can effectively alleviate quantum channel losses without requiring photonic quantum memories. We demonstrate that the protocol exhibits a success probability scaling linearly with the channel transmission. Furthermore, we show that while unbalanced channel losses can degrade the shared entanglement, this effect can be compensated by optimally adjusting the initial entangled states. Our results highlight the potential of photon-number encoding for realizing robust entanglement distribution in lossy quantum networks.
Submitted 6 May, 2024;
originally announced May 2024.
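The abstract above contrasts a success probability that scales quadratically with the channel transmission against one that scales linearly. As a rough numerical illustration of what that difference means, the sketch below converts a channel loss in dB to a transmission $\eta = 10^{-\text{loss}/10}$ and compares $\eta^2$ with $\eta$. Protocol-specific prefactors are not given in the abstract, so both rates are taken as exactly $\eta^2$ and $\eta$; that simplification and the function names are assumptions.

```python
def transmission_from_loss_db(loss_db):
    """Convert a channel loss in decibels to a power transmission in (0, 1]."""
    if loss_db < 0:
        raise ValueError("loss_db must be non-negative")
    return 10 ** (-loss_db / 10)


def scaling_advantage(loss_db):
    """Ratio of a linear-in-transmission rate to a quadratic one.

    Prefactors are unknown, so the rates are assumed to be exactly
    eta and eta**2; the ratio then equals 1 / eta.
    """
    eta = transmission_from_loss_db(loss_db)
    return eta / eta ** 2


# Print how the gap between the two scalings widens as loss grows.
for loss_db in (0, 10, 20, 30):
    eta = transmission_from_loss_db(loss_db)
    ratio = scaling_advantage(loss_db)
    print(f"{loss_db:>2} dB  eta={eta:.3g}  eta^2={eta ** 2:.3g}  ratio={ratio:.3g}")
```

Under these assumptions the advantage of linear scaling is simply $1/\eta$, so it grows by a factor of ten for every additional 10 dB of channel loss.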
-
Towards Precise Observations of Neural Model Robustness in Classification
Authors:
Wenchuan Mu,
Kwan Hui Lim
Abstract:
In deep learning applications, robustness measures the ability of neural models to handle slight changes in input data; a lack of robustness could lead to safety hazards, especially in safety-critical applications. Pre-deployment assessment of model robustness is essential, but existing methods often suffer from either high costs or imprecise results. To enhance safety in real-world scenarios, metrics that effectively capture the model's robustness are needed. To address this issue, we compare the rigour and usage conditions of various assessment methods based on different definitions. Then, we propose a straightforward and practical metric utilizing hypothesis testing for probabilistic robustness and have integrated it into the TorchAttacks library. Through a comparative analysis of diverse robustness assessment methods, our approach contributes to a deeper understanding of model robustness in safety-critical applications.
Submitted 25 April, 2024;
originally announced April 2024.
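The abstract above mentions a metric based on hypothesis testing for probabilistic robustness but does not spell out the test. One generic way such a check could look is sketched below: sample random perturbations of an input, count how many leave the prediction unchanged, and run an exact one-sided binomial test. This is an illustrative assumption, not the authors' metric and not the TorchAttacks API; the function names, the threshold `p0`, and the example counts are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p0):
    """P[X >= k] for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def is_probably_robust(num_unchanged, num_samples, p0=0.9, alpha=0.05):
    """One-sided exact binomial test.

    H0: the probability that a random perturbation leaves the prediction
    unchanged is at most p0. Rejecting H0 (p-value < alpha) supports the
    claim that this probability exceeds p0.
    """
    if not 0 <= num_unchanged <= num_samples:
        raise ValueError("num_unchanged must lie in [0, num_samples]")
    p_value = binomial_upper_tail(num_unchanged, num_samples, p0)
    return p_value < alpha, p_value


# Hypothetical counts: 99 of 100 sampled perturbations kept the label.
decision, p_value = is_probably_robust(99, 100)
```

The test trades cost for precision through `num_samples`: more sampled perturbations give a sharper decision at the price of more forward passes.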
-
Label-Free Topic-Focused Summarization Using Query Augmentation
Authors:
Wenchuan Mu,
Kwan Hui Lim
Abstract:
In today's data and information-rich world, summarization techniques are essential in harnessing vast text to extract key information and enhance decision-making and efficiency. In particular, topic-focused summarization is important due to its ability to tailor content to specific aspects of an extended text. However, this usually requires extensive labelled datasets and considerable computational power. This study introduces a novel method, Augmented-Query Summarization (AQS), for topic-focused summarization without the need for extensive labelled datasets, leveraging query augmentation and hierarchical clustering. This approach facilitates the transferability of machine learning models to the task of summarization, circumventing the need for topic-specific training. Through real-world tests, our method demonstrates the ability to generate relevant and accurate summaries, showing its potential as a cost-effective solution in data-rich environments. This innovation paves the way for broader application and accessibility in the field of topic-focused summarization technology, offering a scalable, efficient method for personalized content extraction.
Submitted 25 April, 2024;
originally announced April 2024.
-
Translation of Multifaceted Data without Re-Training of Machine Translation Systems
Authors:
Hyeonseok Moon,
Seungyoon Lee,
Seongtae Hong,
Seungjun Lee,
Chanjun Park,
Heuiseok Lim
Abstract:
Translating major language resources to build minor language resources has become a widely used approach. Particularly in translating complex data points composed of multiple components, it is common to translate each component separately. However, we argue that this practice often overlooks the interrelation between components within the same data point. To address this limitation, we propose a novel MT pipeline that considers the intra-data relation in implementing MT for training data. In our MT pipeline, all the components in a data point are concatenated to form a single translation sequence and subsequently reconstructed into the data components after translation. We introduce a Catalyst Statement (CS) to enhance the intra-data relation, and an Indicator Token (IT) to assist the decomposition of a translated sequence into its respective data components. Through our approach, we have achieved a considerable improvement in translation quality itself, along with its effectiveness as training data. Compared with the conventional approach that translates each data component separately, our method yields better training data that enhances the performance of the trained model by 2.690 points for the web page ranking (WPR) task, and 0.845 for the question generation (QG) task in the XGLUE benchmark.
Submitted 24 April, 2024;
originally announced April 2024.
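The concatenate, translate, and reconstruct steps described in the abstract above can be sketched as follows. The abstract does not give the actual Indicator Token strings or Catalyst Statement text, so the `[k]` token format, the `catalyst` argument, and the `translate` callable (any function from string to string) are hypothetical stand-ins.

```python
import re


def build_sequence(components, catalyst=""):
    """Concatenate components into one sequence, each prefixed by an indicator token.

    The token format "[k]" and the optional catalyst text are hypothetical.
    """
    body = " ".join(f"[{k}] {text}" for k, text in enumerate(components, start=1))
    return f"{catalyst} {body}".strip()


def split_sequence(translated, num_components):
    """Recover components from a translated sequence using the indicator tokens.

    Returns None when the tokens did not survive translation intact.
    """
    parts = re.split(r"\[(\d+)\]", translated)
    # parts looks like [prefix, "1", text1, "2", text2, ...]
    indices = parts[1::2]
    texts = [t.strip() for t in parts[2::2]]
    expected = [str(k) for k in range(1, num_components + 1)]
    if indices != expected:
        return None
    return texts


def translate_data_point(components, translate, catalyst=""):
    """Translate all components of one data point with a single call to `translate`."""
    sequence = build_sequence(components, catalyst)
    return split_sequence(translate(sequence), len(components))
```

Returning `None` on a failed split is a design choice of this sketch: it lets a caller detect data points whose tokens were lost and handle them separately, for example by translating those components one at a time.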
-
Photonic variational quantum eigensolver using entanglement measurements
Authors:
Jinil Lee,
Wooyeong Song,
Donghwa Lee,
Yosep Kim,
Seung-Woo Lee,
Hyang-Tag Lim,
Hojoong Jung,
Sang-Wook Han,
Yong-Su Kim
Abstract:
The variational quantum eigensolver (VQE), which combines quantum systems with classical computational power, has emerged as a promising candidate for near-term quantum computing applications. However, the experimental resources required to implement VQE, such as the number of measurements, rapidly increase as the Hamiltonian problem size grows. Applying entanglement measurements to reduce the number of measurement setups has been proposed to address this issue; however, entanglement measurements themselves can introduce additional resource demands. Here, we apply entanglement measurements to the photonic VQE utilizing polarization and path degrees of freedom of a single photon. In our photonic VQE, entanglement measurements can be deterministically implemented using linear optics, so it takes full advantage of introducing entanglement measurements without additional experimental demands. Moreover, we show that such a setup can mitigate errors in measurement apparatus for a certain Hamiltonian.
Submitted 23 April, 2024;
originally announced April 2024.
-
Ring-a-Pose: A Ring for Continuous Hand Pose Tracking
Authors:
Tianhong Catherine Yu,
Guilin Hu,
Ruidong Zhang,
Hyunchul Lim,
Saif Mahmud,
Chi-Jung Lee,
Ke Li,
Devansh Agarwal,
Shuyang Nie,
Jinseok Oh,
François Guimbretière,
Cheng Zhang
Abstract:
We present Ring-a-Pose, a single untethered ring that tracks continuous 3D hand poses. Located in the center of the hand, the ring emits an inaudible acoustic signal that each hand pose reflects differently. Ring-a-Pose imposes minimal obtrusions on the hand, unlike multi-ring or glove systems. It is not affected by the choice of clothing that may cover wrist-worn systems. In a series of three user studies with a total of 30 participants, we evaluate Ring-a-Pose's performance on pose tracking and micro-finger gesture recognition. Without collecting any training data from a user, Ring-a-Pose tracks continuous hand poses with a joint error of 14.1mm. The joint error decreases to 10.3mm for fine-tuned user-dependent models. Ring-a-Pose recognizes 7-class micro-gestures with a 90.60% and 99.27% accuracy for user-independent and user-dependent models, respectively. Furthermore, the ring exhibits promising performance when worn on any finger. Ring-a-Pose enables the future of smart rings to track and recognize hand poses using relatively low-power acoustic sensing.
Submitted 19 April, 2024;
originally announced April 2024.
-
Contextrast: Contextual Contrastive Learning for Semantic Segmentation
Authors:
Changki Sung,
Wanhee Kim,
Jungho An,
Wooju Lee,
Hyungtae Lim,
Hyun Myung
Abstract:
Despite great improvements in semantic segmentation, challenges persist because of the lack of local/global contexts and the relationship between them. In this paper, we propose Contextrast, a contrastive learning-based semantic segmentation method that captures local/global contexts and comprehends their relationships. Our proposed method comprises two parts: a) contextual contrastive learning (CCL) and b) boundary-aware negative (BANE) sampling. Contextual contrastive learning obtains local/global context from multi-scale feature aggregation and inter/intra-relationship of features for better discrimination capabilities. Meanwhile, BANE sampling selects embedding features along the boundaries of incorrectly predicted regions and employs them as harder negative samples in our contrastive learning, resolving segmentation issues along the boundary region by exploiting fine-grained details. We demonstrate that our Contextrast substantially enhances the performance of semantic segmentation networks, outperforming state-of-the-art contrastive learning approaches on diverse public datasets, e.g. Cityscapes, CamVid, PASCAL-C, COCO-Stuff, and ADE20K, without an increase in computational cost during inference.
Submitted 16 April, 2024;
originally announced April 2024.
-
FewUser: Few-Shot Social User Geolocation via Contrastive Learning
Authors:
Menglin Li,
Kwan Hui Lim
Abstract:
To address the challenges of scarcity in geotagged data for social user geolocation, we propose FewUser, a novel framework for Few-shot social User geolocation. We incorporate a contrastive learning strategy between users and locations to improve geolocation performance with no or limited training data. FewUser features a user representation module that harnesses a pre-trained language model (PLM) and a user encoder to process and fuse diverse social media inputs effectively. To bridge the gap between PLM's knowledge and geographical data, we introduce a geographical prompting module with hard, soft, and semi-soft prompts, to enhance the encoding of location information. Contrastive learning is implemented through a contrastive loss and a matching loss, complemented by a hard negative mining strategy to refine the learning process. We construct two datasets TwiU and FliU, containing richer metadata than existing benchmarks, to evaluate FewUser and the extensive experiments demonstrate that FewUser significantly outperforms state-of-the-art methods in both zero-shot and various few-shot settings, achieving absolute improvements of 26.95% and 41.62% on TwiU and FliU, respectively, with only one training sample per class. We further conduct a comprehensive analysis to investigate the impact of user representation on geolocation performance and the effectiveness of FewUser's components, offering valuable insights for future research in this area.
Submitted 28 March, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment
Authors:
Dongjae Shin,
Hyeonseok Lim,
Inho Won,
Changsu Choi,
Minjun Kim,
Seungwoo Song,
Hangyeol Yoo,
Sangmin Kim,
Kyungtae Lim
Abstract:
The impressive development of large language models (LLMs) is expanding into the realm of large multimodal models (LMMs), which incorporate multiple types of data beyond text. However, the nature of multimodal models leads to significant expenses in the creation of training data. Furthermore, constructing multilingual data for LMMs presents its own set of challenges due to language diversity and complexity. Therefore, in this study, we propose two cost-effective methods to solve this problem: (1) vocabulary expansion and pretraining of multilingual LLM for specific languages, and (2) automatic and elaborate construction of multimodal datasets using GPT4-V. Based on these methods, we constructed a 91K English-Korean-Chinese multilingual, multimodal training dataset. Additionally, we developed a bilingual multimodal model that exhibits excellent performance in both Korean and English, surpassing existing approaches.
Submitted 1 April, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean
Authors:
ChangSu Choi,
Yongbin Jeong,
Seoyoon Park,
InHo Won,
HyeonSeok Lim,
SangMin Kim,
Yejee Kang,
Chanhyuk Yoon,
Jaewan Park,
Yiseul Lee,
HyeJin Lee,
Younggyun Hahm,
Hansaem Kim,
KyungTae Lim
Abstract:
Large language models (LLMs) use pretraining to predict the subsequent word; however, their expansion requires significant computing resources. Numerous big tech companies and research institutes have developed multilingual LLMs (MLLMs) to meet current demands, overlooking less-resourced languages (LRLs). This study proposed three strategies to enhance the performance of LRLs based on the publicly available MLLMs. First, the MLLM vocabularies of LRLs were expanded to enhance expressiveness. Second, bilingual data were used for pretraining to align the high- and less-resourced languages. Third, a high-quality small-scale instruction dataset was constructed and instruction-tuning was performed to augment the LRL. The experiments employed the Llama2 model and Korean was used as the LRL, which was quantitatively evaluated against other developed LLMs across eight tasks. Furthermore, a qualitative assessment was performed based on human evaluation and GPT4. Experimental results showed that our proposed Bllossom model exhibited superior performance in qualitative analyses compared to previously proposed Korean monolingual models.
Submitted 21 March, 2024; v1 submitted 16 March, 2024;
originally announced March 2024.
-
Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting
Authors:
Pawel Knap,
Peter Hardy,
Alberto Tamajo,
Hwasup Lim,
Hansung Kim
Abstract:
Current human pose estimation systems focus on retrieving an accurate 3D global estimate of a single person. Therefore, this paper presents one of the first 3D multi-person human pose estimation systems that is able to work in real-time and is also able to handle basic forms of occlusion. First, we adjust an off-the-shelf 2D detector and an unsupervised 2D-3D lifting model for use with a 360$^\circ$ panoramic camera and mmWave radar sensors. We then introduce several contributions, including camera and radar calibrations, and the improved matching of people within the image and radar space. The system addresses both the depth and scale ambiguity problems by employing a lightweight 2D-3D pose lifting algorithm that is able to work in real-time while exhibiting accurate performance in both indoor and outdoor environments, offering an affordable and scalable solution. Notably, our system's time complexity remains nearly constant irrespective of the number of detected individuals, achieving a frame rate of approximately 7-8 fps on a laptop with a commercial-grade GPU.
Submitted 14 March, 2024;
originally announced March 2024.
-
FSViewFusion: Few-Shots View Generation of Novel Objects
Authors:
Rukhshanda Hussain,
Hui Xian Grace Lim,
Borchun Chen,
Mubarak Shah,
Ser Nam Lim
Abstract:
Novel view synthesis has observed tremendous developments since the arrival of NeRFs. However, NeRF models overfit on a single scene, lacking generalization to out-of-distribution objects. Recently, diffusion models have exhibited remarkable performance on introducing generalization in view synthesis. Inspired by these advancements, we explore the capabilities of a pretrained stable diffusion model for view synthesis without explicit 3D priors. Specifically, we base our method on a personalized text-to-image model, Dreambooth, given its strong ability to adapt to specific novel objects with a few shots. Our research reveals two interesting findings. First, we observe that Dreambooth can learn the high-level concept of a view, compared to arguably more complex strategies which involve finetuning diffusion models on large amounts of multi-view data. Second, we establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the identity of the original object from which the views are learnt. Motivated by this, we introduce a learning strategy, FSViewFusion, which inherits a specific view through only one image sample of a single scene, and transfers the knowledge to a novel object, learnt from few shots, using low rank adapters. Through extensive experiments we demonstrate that our method, albeit simple, is efficient in generating reliable view samples for in-the-wild images. Code and models will be released.
Submitted 12 March, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
KnowPhish: Large Language Models Meet Multimodal Knowledge Graphs for Enhancing Reference-Based Phishing Detection
Authors:
Yuexin Li,
Chengyu Huang,
Shumin Deng,
Mei Lin Lock,
Tri Cao,
Nay Oo,
Hoon Wei Lim,
Bryan Hooi
Abstract:
Phishing attacks have inflicted substantial losses on individuals and businesses alike, necessitating the development of robust and efficient automated phishing detection approaches. Reference-based phishing detectors (RBPDs), which compare the logos on a target webpage to a known set of logos, have emerged as the state-of-the-art approach. However, a major limitation of existing RBPDs is that they rely on a manually constructed brand knowledge base, making it infeasible to scale to a large number of brands, which results in false negative errors due to the insufficient brand coverage of the knowledge base. To address this issue, we propose an automated knowledge collection pipeline, using which we collect a large-scale multimodal brand knowledge base, KnowPhish, containing 20k brands with rich information about each brand. KnowPhish can be used to boost the performance of existing RBPDs in a plug-and-play manner. A second limitation of existing RBPDs is that they solely rely on the image modality, ignoring useful textual information present in the webpage HTML. To utilize this textual information, we propose a Large Language Model (LLM)-based approach to extract brand information of webpages from text. Our resulting multimodal phishing detection approach, KnowPhish Detector (KPD), can detect phishing webpages with or without logos. We evaluate KnowPhish and KPD on a manually validated dataset, and a field study under Singapore's local context, showing substantial improvements in effectiveness and efficiency compared to state-of-the-art baselines.
Submitted 15 June, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Leveraging Contrastive Learning for Few-shot Geolocation of Social Posts
Authors:
Menglin Li,
Kwan Hui Lim
Abstract:
Social geolocation is the important problem of predicting the originating locations of social media posts. However, this task is challenging due to the need for a substantial volume of training data, alongside well-annotated labels. These issues are exacerbated by new or less popular locations with insufficient labels, which lead to an imbalanced dataset. In this paper, we propose ContrastGeo, a Contrastive learning enhanced framework for few-shot social Geolocation. Specifically, a Tweet-Location Contrastive learning objective is introduced to align representations of tweets and locations within tweet-location pairs. To capture the correlations between tweets and locations, a Tweet-Location Matching objective is further adopted into the framework and refined via an online hard negative mining approach. We also develop three fusion strategies with various fusion encoders to better generate joint representations of tweets and locations. Comprehensive experiments on three social media datasets highlight ContrastGeo's superior performance over several state-of-the-art baselines in few-shot social geolocation.
Submitted 19 February, 2024;
originally announced March 2024.
-
Generation and optimization of entanglement between giant atoms chirally coupled to spin cavities
Authors:
Jia-Bin You,
Jian Feng Kong,
Davit Aghamalyan,
Wai-Keong Mok,
Kian Hwee Lim,
Jun Ye,
Ching Eng Png,
Francisco J. García-Vidal
Abstract:
We explore a scheme for entanglement generation and optimization in giant atoms by coupling them to finite one-dimensional arrays of spins that behave as cavities. We find that high values of the concurrence can be achieved in small-sized cavities, with very short generation times. When the system is excited by external means, optimal concurrence is obtained for very weak driving. We also analyze the effect of disorder in these systems, showing that although the average concurrence decreases with disorder, high concurrences can still be obtained even in scenarios presenting strong disorder. This result leads us to propose an optimization procedure in which, by engineering the on-site energies or hoppings in the cavity, concurrences close to 1 can be reached within an extremely short period of time.
Submitted 29 February, 2024;
originally announced March 2024.
-
Analysis of Multi-Source Language Training in Cross-Lingual Transfer
Authors:
Seong Hoon Lim,
Taejun Yun,
Jinhyeon Kim,
Jihun Choi,
Taeuk Kim
Abstract:
The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition. While cross-lingual transfer (XLT) methods have contributed to addressing this data scarcity problem, there still exists ongoing debate about the mechanisms behind their effectiveness. In this work, we focus on one of the promising assumptions about the inner workings of XLT, that it encourages multilingual LMs to place greater emphasis on language-agnostic or task-specific features. We test this hypothesis by examining how the patterns of XLT change with a varying number of source languages involved in the process. Our experimental findings show that the use of multiple source languages in XLT, a technique we term Multi-Source Language Training (MSLT), leads to increased mingling of embedding spaces for different languages, supporting the claim that XLT benefits from making use of language-independent information. On the other hand, we discover that using an arbitrary combination of source languages does not always guarantee better performance. We suggest simple heuristics for identifying effective language combinations for MSLT and empirically prove their effectiveness.
Submitted 4 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Finite-Time Error Analysis of Online Model-Based Q-Learning with a Relaxed Sampling Model
Authors:
Han-Dong Lim,
HyeAnn Lee,
Donghwan Lee
Abstract:
Reinforcement learning has witnessed significant advancements, particularly with the emergence of model-based approaches. Among these, $Q$-learning has proven to be a powerful algorithm in model-free settings. However, the extension of $Q$-learning to a model-based framework remains relatively unexplored. In this paper, we delve into the sample complexity of $Q$-learning when integrated with a model-based approach. Through theoretical analyses and empirical evaluations, we seek to elucidate the conditions under which model-based $Q$-learning excels in terms of sample efficiency compared to its model-free counterpart.
Submitted 19 February, 2024;
originally announced February 2024.
-
Exploring the Effects of Population and Employment Characteristics on Truck Flows: An Analysis of NextGen NHTS Origin-Destination Data
Authors:
Majbah Uddin,
Yuandong Liu,
Hyeonsup Lim
Abstract:
Truck transportation remains the dominant mode of US freight transportation because of its advantages, such as the flexibility of accessing pickup and drop-off points and faster delivery. Because of the massive freight volume transported by trucks, understanding the effects of population and employment characteristics on truck flows is critical for better transportation planning and investment decisions. The US Federal Highway Administration published a truck travel origin-destination data set as part of the Next Generation National Household Travel Survey program. This data set contains the total number of truck trips in 2020 within and between 583 predefined zones encompassing metropolitan and nonmetropolitan statistical areas within each state and Washington, DC. In this study, origin-destination-level truck trip flow data was augmented to include zone-level population and employment characteristics from the US Census Bureau. Census population and County Business Patterns data were included. The final data set was used to train a machine learning algorithm-based model, Extreme Gradient Boosting (XGBoost), where the target variable is the number of total truck trips. Shapley Additive ExPlanation (SHAP) was adopted to explain the model results. Results showed that the distance between the zones was the most important variable and had a nonlinear relationship with truck flows.
Submitted 2 February, 2024;
originally announced February 2024.
-
Improving the accuracy of freight mode choice models: A case study using the 2017 CFS PUF data set and ensemble learning techniques
Authors:
Diyi Liu,
Hyeonsup Lim,
Majbah Uddin,
Yuandong Liu,
Lee D. Han,
Ho-ling Hwang,
Shih-Miao Chin
Abstract:
The US Census Bureau has collected two rounds of experimental data from the Commodity Flow Survey, providing shipment-level characteristics of nationwide commodity movements, published in 2012 (i.e., Public Use Microdata) and in 2017 (i.e., Public Use File). With this information, data-driven methods have become increasingly valuable for understanding detailed patterns in freight logistics. In this study, we used the 2017 Commodity Flow Survey Public Use File data set to explore building a high-performance freight mode choice model, considering three main improvements: (1) constructing local models for each separate commodity/industry category; (2) extracting useful geographical features, particularly the derived distance of each freight mode between origin/destination zones; and (3) applying additional ensemble learning methods such as stacking or voting to combine results from local and unified models for improved performance. The proposed method achieved over 92% accuracy without incorporating external information, an over 19% increase compared to directly fitting Random Forests models over 10,000 samples. Furthermore, SHAP (Shapley Additive Explanations) values were computed to explain the outputs and major patterns obtained from the proposed model. The model framework could enhance the performance and interpretability of existing freight mode choice models.
Submitted 12 February, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Understanding Electric Vehicle Ownership Using Data Fusion and Spatial Modeling
Authors:
Meiyu Pan,
Majbah Uddin,
Hyeonsup Lim
Abstract:
The global shift toward electric vehicles (EVs) for climate sustainability lacks comprehensive insights into the impact of the built environment on EV ownership, especially in varying spatial contexts. This study, focusing on New York State, integrates data fusion techniques across diverse datasets to examine the influence of socioeconomic and built environmental factors on EV ownership. The utilization of spatial regression models reveals consistent coefficient values, highlighting the robustness of the results, with the Spatial Lag model better at capturing spatial autocorrelation. Results underscore the significance of charging stations within a 10-mile radius, indicative of a preference for convenient charging options influencing EV ownership decisions. Factors like higher education levels, lower rental populations, and concentrations of older population align with increased EV ownership. Utilizing publicly available data offers a more accessible avenue for understanding EV ownership across regions, complementing traditional survey approaches.
Submitted 30 January, 2024;
originally announced January 2024.
-
Hyperphosphorylation-Induced Phase Transition in Vesicle Delivery Dynamics of Motor Proteins in Neuronal Cells
Authors:
Eunsang Lee,
Donghee Kim,
Yo Han Song,
Kyujin Shin,
Sanggeun Song,
Minho Lee,
Yeongchang Goh,
Mi Hee Lim,
Ji-Hyun Kim,
Jaeyoung Sung,
Kang Taek Lee
Abstract:
Synaptic vesicle transport by motor proteins along microtubules is a crucial active process underlying neuronal communication. It is known that microtubules are destabilized by tau-hyperphosphorylation, which causes tau proteins to detach from microtubules and form neurofibrillary tangles. However, how tau-phosphorylation affects transport dynamics of motor proteins on the microtubule remains unknown. Here, we discover that long-distance unidirectional motion of vesicle-motor protein multiplexes (VMPMs) in living cells is suppressed under tau-hyperphosphorylation, with the consequent loss of fast vesicle-transport along the microtubule. The VMPMs in hyperphosphorylated cells exhibit seemingly bidirectional random motion, with dynamic properties far different from VMPM motion in normal cells. We establish a parsimonious physicochemical model of VMPM's active motion that provides a unified, quantitative explanation and predictions for our experimental results. Our analysis reveals that, under hyperphosphorylation conditions, motor-protein-multiplexes have both static and dynamic motility fluctuations. The loss of the fast vesicle-transport along the microtubule can be a mechanism of neurodegenerative disorders associated with tau-hyperphosphorylation.
Submitted 23 April, 2024; v1 submitted 27 January, 2024;
originally announced January 2024.
-
Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
Authors:
Seonmin Koo,
Chanjun Park,
Jinsung Kim,
Jaehyung Seo,
Sugyeong Eo,
Hyeonseok Moon,
Heuiseok Lim
Abstract:
Automatic speech recognition (ASR) outcomes serve as input for downstream tasks, substantially impacting the satisfaction level of end-users. Hence, the diagnosis and enhancement of the vulnerabilities present in the ASR model bear significant importance. However, traditional evaluation methodologies of ASR systems generate a singular, composite quantitative metric, which fails to provide comprehensive insight into specific vulnerabilities. This lack of detail extends to the post-processing stage, resulting in further obfuscation of potential weaknesses. Despite an ASR model's ability to recognize utterances accurately, subpar readability can negatively affect user satisfaction, giving rise to a trade-off between recognition accuracy and user-friendliness. To effectively address this, it is imperative to consider both the speech-level, crucial for recognition accuracy, and the text-level, critical for user-friendliness. Consequently, we propose the development of an Error Explainable Benchmark (EEB) dataset. This dataset, while considering both speech- and text-level, enables a granular understanding of the model's shortcomings. Our proposition provides a structured pathway for a more 'real-world-centric' evaluation, a marked shift away from abstracted, traditional methods, allowing for the detection and rectification of nuanced system weaknesses, ultimately aiming for an improved user experience.
Submitted 25 January, 2024;
originally announced January 2024.
-
Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse
Authors:
Seungyoon Lee,
Dahyun Jung,
Chanjun Park,
Seolhwa Lee,
Heuiseok Lim
Abstract:
We introduce the concept of "Alternative Speech" as a new way to directly combat hate speech and complement the limitations of counter-narrative. An alternative speech provides practical alternatives to hate speech in real-world scenarios by offering speech-level corrections to speakers while considering the surrounding context and promoting speakers to reform. Further, an alternative speech can combat hate speech alongside counter-narratives, offering a useful tool to address social issues such as racial discrimination and gender inequality. We propose the new concept and provide detailed guidelines for constructing the necessary dataset. Through discussion, we demonstrate that combining alternative speech and counter-narrative can be a more effective strategy for combating hate speech by complementing specificity and guiding capacity of counter-narrative. This paper presents another perspective for dealing with hate speech, offering viable remedies to complement the constraints of current approaches to mitigating harmful bias.
Submitted 25 January, 2024;
originally announced January 2024.