-
A Word Order Synchronization Metric for Evaluating Simultaneous Interpretation and Translation
Authors:
Mana Makinae,
Katsuhito Sudoh,
Mararu Yamada,
Satoshi Nakamura
Abstract:
Simultaneous interpretation (SI), the real-time translation of one language into another, begins before the original speech has finished. Its evaluation must consider both latency and quality. This trade-off is especially challenging for language pairs with distant word orders, such as English and Japanese. To handle this word order gap, interpreters keep the word order of the source language as much as possible so that they can keep up with the speaker, minimizing latency while maintaining quality, whereas in translation, reordering is used to preserve fluency in the target language. Outputs synchronized with the source language are therefore desirable in real SI settings, and such synchronization is key to further progress in computational SI and simultaneous machine translation (SiMT). In this work, we propose an automatic evaluation metric for SI and SiMT that focuses on word order synchronization. Our evaluation metric is based on rank correlation coefficients and leverages cross-lingual pre-trained language models. Our experimental results on NAIST-SIC-Aligned and JNPC show our metrics' effectiveness in measuring word order synchronization between the source and target languages.
Submitted 9 July, 2024;
originally announced July 2024.
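The rank-correlation idea in the abstract above can be illustrated with a minimal sketch. It assumes that each source word has already been aligned to one target position; the abstract says the alignment comes from cross-lingual pre-trained language models, which this sketch does not reproduce. The function name `kendall_tau` and the toy alignments are hypothetical, and Kendall's tau-a is only one possible rank correlation coefficient, not necessarily the one the authors use.

```python
from itertools import combinations


def kendall_tau(target_positions):
    """Kendall's tau-a between source order (0..n-1) and aligned target positions.

    target_positions[i] is the (assumed, pre-computed) target-side index of the
    i-th source word. Returns 1.0 for fully monotone output, -1.0 for fully
    reversed output.
    """
    pairs = list(combinations(range(len(target_positions)), 2))
    if not pairs:
        raise ValueError("need at least two aligned words")
    concordant = sum(1 for i, j in pairs if target_positions[i] < target_positions[j])
    discordant = sum(1 for i, j in pairs if target_positions[i] > target_positions[j])
    return (concordant - discordant) / len(pairs)


# Toy alignments (hypothetical): source-synchronized vs. fully reordered output.
synchronized = kendall_tau([0, 1, 2, 3])
reversed_order = kendall_tau([3, 2, 1, 0])
one_swap = kendall_tau([0, 2, 1, 3])
```

Under this sketch, a higher score means the output follows the source word order more closely, which is the property the abstract describes as desirable for SI.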
-
Comparison of optical spectra between asteroids Ryugu and Bennu: II. High-precision analysis for space weathering trends
Authors:
K. Yumoto,
E. Tatsumi,
T. Kouyama,
D. R. Golish,
Y. Cho,
T. Morota,
S. Kameda,
H. Sato,
B. Rizk,
D. N. DellaGiustina,
Y. Yokota,
H. Suzuki,
J. de León,
H. Campins,
J. Licandro,
M. Popescu,
J. L. Rizos,
R. Honda,
M. Yamada,
N. Sakatani,
C. Honda,
M. Matsuoka,
M. Hayakawa,
H. Sawada,
K. Ogawa,
et al. (3 additional authors not shown)
Abstract:
The influence of space weathering on the observed spectra of C-complex asteroids remains uncertain. This has long hindered our understanding of their composition through telescope observations. Multi-band imaging of Ryugu by ONC-T on Hayabusa2 and that of Bennu by MapCam on OSIRIS-REx found opposite spectral trends of space weathering; Ryugu darkened/reddened while Bennu brightened/blued. How the spectra of Ryugu and Bennu evolved relative to each other would place a constraint for understanding their origins and evolutions. In this study, we compared the space weathering trends on Ryugu and Bennu by applying the results of cross calibration between ONC-T and MapCam. We show that the average Bennu surface is brighter by 18.0 $\pm$ 1.5% at 550 nm and bluer by 0.18 $\pm$ 0.03 $μ$m$^{-1}$ (480-850 nm slope) than Ryugu. The spectral slopes of surface materials are more uniform on Bennu than on Ryugu at spatial scales $\gtrsim$1 m, but Bennu is more heterogeneous at $\lesssim$1 m. This suggests that lateral mixing due to resurfacing may have been more efficient on Bennu. The reflectance-spectral slope distributions of craters on Ryugu and Bennu appeared to follow two trend lines with an offset before cross calibration, but they converged to a single straight trend without a bend after cross calibration. We show that the spectra of the freshest craters on Ryugu and Bennu are indistinguishable within the uncertainty of cross calibration. These results suggest that Ryugu and Bennu initially had similar spectra before space weathering and that they evolved in completely opposite directions along the same trend line, subsequently evolving into asteroids with different disk-averaged spectra. These findings further suggest that space weathering likely expanded the spectral slope variation of C-complex asteroids, implying that they may have formed from materials with more uniform spectral slopes.
Submitted 7 July, 2024;
originally announced July 2024.
-
Semantic Graph Consistency: Going Beyond Patches for Regularizing Self-Supervised Vision Transformers
Authors:
Chaitanya Devaguptapu,
Sumukh Aithal,
Shrinivas Ramasubramanian,
Moyuru Yamada,
Manohar Kaul
Abstract:
Self-supervised learning (SSL) with vision transformers (ViTs) has proven effective for representation learning as demonstrated by the impressive performance on various downstream tasks. Despite these successes, existing ViT-based SSL architectures do not fully exploit the ViT backbone, particularly the patch tokens of the ViT. In this paper, we introduce a novel Semantic Graph Consistency (SGC) module to regularize ViT-based SSL methods and leverage patch tokens effectively. We reconceptualize images as graphs, with image patches as nodes and infuse relational inductive biases by explicit message passing using Graph Neural Networks into the SSL framework. Our SGC loss acts as a regularizer, leveraging the underexploited patch tokens of ViTs to construct a graph and enforcing consistency between graph features across multiple views of an image. Extensive experiments on various datasets including ImageNet, RESISC and Food-101 show that our approach significantly improves the quality of learned representations, resulting in a 5-10% increase in performance when limited labeled data is used for linear evaluation. These experiments coupled with a comprehensive set of ablations demonstrate the promise of our approach in various settings.
Submitted 18 June, 2024;
originally announced June 2024.
-
Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis
Authors:
Yuping Lin,
Pengfei He,
Han Xu,
Yue Xing,
Makoto Yamada,
Hui Liu,
Jiliang Tang
Abstract:
Large language models (LLMs) are susceptible to a type of attack known as jailbreaking, which misleads LLMs to output harmful contents. Although there are diverse jailbreak attack strategies, there is no unified understanding on why some methods succeed and others fail. This paper explores the behavior of harmful and harmless prompts in the LLM's representation space to investigate the intrinsic properties of successful jailbreak attacks. We hypothesize that successful attacks share some similar properties: They are effective in moving the representation of the harmful prompt towards the direction to the harmless prompts. We leverage hidden representations into the objective of existing jailbreak attacks to move the attacks along the acceptance direction, and conduct experiments to validate the above hypothesis using the proposed objective. We hope this study provides new insights into understanding how LLMs understand harmfulness information.
Submitted 26 June, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
Polyak Meets Parameter-free Clipped Gradient Descent
Authors:
Yuki Takezawa,
Han Bao,
Ryoma Sato,
Kenta Niwa,
Makoto Yamada
Abstract:
Gradient descent and its variants are de facto standard algorithms for training machine learning models. As gradient descent is sensitive to its hyperparameters, we need to tune the hyperparameters carefully using a grid search, which is time-consuming, especially when multiple hyperparameters exist. Recently, parameter-free methods that adjust the hyperparameters on the fly have been studied. However, the existing work only studied parameter-free methods for the stepsize, and parameter-free methods for other hyperparameters have not been explored. For instance, the gradient clipping threshold is also a crucial hyperparameter in addition to the stepsize to prevent gradient explosion issues, but none of the existing studies investigated the parameter-free methods for clipped gradient descent. In this work, we study the parameter-free methods for clipped gradient descent. Specifically, we propose Inexact Polyak Stepsize, which converges to the optimal solution without any hyperparameter tuning, and whose convergence rate is asymptotically independent of $L$ under the $L$-smooth and $(L_0, L_1)$-smooth assumptions on the loss function, as is that of clipped gradient descent with well-tuned hyperparameters. We numerically validated our convergence results using a synthetic function and demonstrated the effectiveness of our proposed methods using LSTM, Nano-GPT, and T5.
Submitted 23 May, 2024;
originally announced May 2024.
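For readers unfamiliar with the two ingredients named in the abstract above, the following minimal sketch shows norm-based gradient clipping and the classical Polyak stepsize on a toy quadratic. It is not the Inexact Polyak Stepsize proposed in the paper: the classical rule assumes the optimal value `f_star` is known, and the helper names `clip_gradient` and `polyak_stepsize`, the objective, and the starting point are all illustrative assumptions.

```python
import numpy as np


def clip_gradient(g, threshold):
    """Rescale g so that its Euclidean norm is at most `threshold`."""
    norm = np.linalg.norm(g)
    if norm <= threshold:
        return g
    return g * (threshold / norm)


def polyak_stepsize(f_x, f_star, g):
    """Classical Polyak stepsize (f(x) - f*) / ||g||^2; requires a known f*."""
    return (f_x - f_star) / np.dot(g, g)


def f(x):
    # Toy objective (assumption): f(x) = 0.5 * ||x||^2, minimized at 0 with f* = 0.
    return 0.5 * np.dot(x, x)


def grad(x):
    return x


x = np.array([3.0, -4.0])
for _ in range(20):
    g = grad(x)
    if not g.any():  # already at the minimizer
        break
    x = x - polyak_stepsize(f(x), 0.0, g) * g

final_distance = np.linalg.norm(x)
clipped = clip_gradient(np.array([3.0, 4.0]), threshold=1.0)
```

On this quadratic the Polyak rule yields a stepsize of 0.5 at every iteration without any tuning, which conveys why a Polyak-type rule is attractive as a parameter-free choice.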
-
PhiNets: Brain-inspired Non-contrastive Learning Based on Temporal Prediction Hypothesis
Authors:
Satoki Ishikawa,
Makoto Yamada,
Han Bao,
Yuki Takezawa
Abstract:
SimSiam is a prominent self-supervised learning method that achieves impressive results in various vision tasks under static environments. However, it has two critical issues: high sensitivity to hyperparameters, especially weight decay, and unsatisfactory performance in online and continual learning, where neuroscientists believe that powerful memory functions are necessary, as in brains. In this paper, we propose PhiNet, inspired by a hippocampal model based on the temporal prediction hypothesis. Unlike SimSiam, which aligns two augmented views of the original image, PhiNet integrates an additional predictor block that estimates the original image representation to imitate the CA1 region in the hippocampus. Moreover, we model the neocortex inspired by the Complementary Learning Systems theory with a momentum encoder block as a slow learner, which works as long-term memory. We demonstrate through analysing the learning dynamics that PhiNet benefits from the additional predictor to prevent the complete collapse of learned representations, a notorious challenge in non-contrastive learning. This dynamics analysis may partially corroborate why this hippocampal model is biologically plausible. Experimental results demonstrate that PhiNet is more robust to weight decay and performs better than SimSiam in memory-intensive tasks like online and continual learning.
Submitted 23 May, 2024;
originally announced May 2024.
-
Functional Renormalization Group Analysis of $O(3)$ Nonlinear Sigma Model and Non-Abelian Bosonization Duality
Authors:
Junichi Haruna,
Keito Shimizu,
Masatoshi Yamada
Abstract:
It is known that the $SU(2)$ Wess-Zumino-Witten model is dual to the free fermion theory in two dimensions via non-Abelian bosonization. Additionally, the $SU(2)$ Wess-Zumino-Witten model is believed to be equivalent to the $O(3)$ nonlinear sigma model with the theta term. In this work, we reexamine this duality through the lens of renormalization group (RG) flow. We analyze the RG flow structure of the $O(3)$ nonlinear sigma model with the theta term in two dimensions using the functional renormalization group. Our results reveal a nontrivial fixed point with a nonzero value of the topological coupling. The scaling dimensions (critical exponents) at this fixed point suggest the realization of duality between the $O(3)$ nonlinear sigma model with the theta term and the free fermion theory, indicating that these models belong to the same universality class.
Submitted 23 May, 2024;
originally announced May 2024.
-
Thermal Wash-in Leptogenesis via Heavy Higgs Decay
Authors:
Kyohei Mukaida,
Hidenaga Watanabe,
Masaki Yamada
Abstract:
We present a conceptually simple model to generate asymmetries that are not directly related to baryon nor lepton charges. The model employs a three-Higgs doublet framework, wherein the other two Higgs fields are significantly heavier than the Standard Model (SM) Higgs field. The decay of these heavier Higgs fields generates asymmetry for approximately conserved charges in the Standard Model at a high temperature. These asymmetries will be converted into baryon/lepton asymmetry through $B-L$ violating interactions associated with right-handed neutrinos via the wash-in mechanism.
Submitted 11 July, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Multifield Stochastic Dynamics in GUT Hybrid Inflation and Gravitational Wave Signatures of GUT Higgs Representation
Authors:
Yuichiro Tada,
Masaki Yamada
Abstract:
We revisit the hybrid inflation model within the framework of the Grand Unified Theory (GUT), focusing on cases where the waterfall phase transition extends over several e-foldings to dilute monopoles. Considering the stochastic effects of quantum fluctuations, we demonstrate that the waterfall fields (i.e., GUT Higgs) maintain a nonzero vacuum expectation value around the waterfall phase transition. By accurately accounting for the number of degrees of freedom of the GUT Higgs field, we establish that these fluctuations can produce observable gravitational waves without leading to an overproduction of primordial black holes. The amplitude of these gravitational waves is inversely proportional to the degrees of freedom of the waterfall fields, thereby providing a unique method to probe the representation of the GUT Higgs.
Submitted 14 May, 2024;
originally announced May 2024.
-
GLoD: Composing Global Contexts and Local Details in Image Generation
Authors:
Moyuru Yamada
Abstract:
Diffusion models have demonstrated their capability to synthesize high-quality and diverse images from textual prompts. However, simultaneous control over both global contexts (e.g., object layouts and interactions) and local details (e.g., colors and emotions) still remains a significant challenge. The models often fail to understand complex descriptions involving multiple objects and reflect specified visual attributes to wrong targets or ignore them. This paper presents Global-Local Diffusion (GLoD), a novel framework which allows simultaneous control over the global contexts and the local details in text-to-image generation without requiring training or fine-tuning. It assigns multiple global and local prompts to corresponding layers and composes their noises to guide a denoising process using pre-trained diffusion models. Our framework enables complex global-local compositions, conditioning objects in the global prompt with the local prompts while preserving other unspecified identities. Our quantitative and qualitative evaluations demonstrate that GLoD effectively generates complex images that adhere to both user-provided object interactions and object details.
Submitted 23 April, 2024;
originally announced April 2024.
-
Imaging quantum interference in a monolayer Kitaev quantum spin liquid candidate
Authors:
Y. Kohsaka,
S. Akutagawa,
S. Omachi,
Y. Iwamichi,
T. Ono,
I. Tanaka,
S. Tateishi,
H. Murayama,
S. Suetsugu,
K. Hashimoto,
T. Shibauchi,
M. O. Takahashi,
M. G. Yamada,
S. Nikolaev,
T. Mizushima,
S. Fujimoto,
T. Terashima,
T. Asaba,
Y. Kasahara,
Y. Matsuda
Abstract:
Single atomic defects are prominent windows to look into host quantum states because collective responses from the host states emerge as localized states around the defects. Friedel oscillations and Kondo clouds in Fermi liquids are quintessential examples. However, the situation is quite different for quantum spin liquid (QSL), an exotic state of matter with fractionalized quasiparticles and topological order arising from a profound impact of quantum entanglement. Elucidating the underlying local electronic property has been challenging due to the charge neutrality of fractionalized quasiparticles and the insulating nature of QSLs. Here, using spectroscopic-imaging scanning tunneling microscopy, we report atomically resolved images of monolayer $α$-RuCl$_3$, the most promising Kitaev QSL candidate, on metallic substrates. We find quantum interference in the insulator manifesting as incommensurate and decaying spatial oscillations of the local density of states around defects with a characteristic bias dependence. The oscillation differs from any known spatial structures in its nature and does not exist in other Mott insulators, implying it is an exotic oscillation involved with excitations unique to $α$-RuCl$_3$. Numerical simulations can reproduce the observed oscillation by assuming that itinerant Majorana fermions of Kitaev QSL are scattered across the Majorana Fermi surface. The oscillation provides a new approach to exploring Kitaev QSLs through the local response against defects like Friedel oscillations in metals.
Submitted 25 March, 2024;
originally announced March 2024.
-
The incompatibility of quantum channels in general probabilistic theories
Authors:
Masataka Yamada,
Takayuki Miyadera
Abstract:
In quantum theory, there exist sets of operations that cannot be performed simultaneously. These sets of operations are referred to as incompatible. While this definition of incompatibility extends to general probabilistic theories, the dependency of the set of compatible sets on the definition of composite systems has not been thoroughly investigated. In the context of quantum channels, compatibility is defined using the tensor product of Hilbert spaces, employing the usual composite system. However, in the context of general probabilistic theories, composite systems are not uniquely determined, and the set of states can range from min tensor to max tensor, forming various convex sets. In this paper, in addition to quantum compatibility using the usual composite system, we introduce min-tensor-compatibility using the min-tensor on the composite system of effect spaces and investigate their relationship using noisy identity channels on qubits. As a result, we found that the set of min-tensor-compatible channel pairs is strictly broader than the set of quantum-compatible channel pairs. Furthermore, we introduce the concept of almost quantum compatible channel pairs from an operational perspective. This concept corresponds to cases where the correlation functions appearing in the verification of compatibility can be realized through a channel and local reinterpretation of effects. We demonstrate that the set of all almost quantum compatible channel pairs is strictly narrower than the set of all min-tensor-compatible channel pairs.
Submitted 2 March, 2024;
originally announced March 2024.
-
Neutrino zeromodes on electroweak strings in light of topological insulators
Authors:
Minoru Eto,
Yu Hamada,
Ryusuke Jinno,
Muneto Nitta,
Masatoshi Yamada
Abstract:
We examine neutrino zeromode solutions on the electroweak $Z$-string and their effect on the stability of the string in the standard model and its extensions. We propose using topological invariants constructed from the momentum (and real) space topology of Green's functions, often used for investigating edge modes in condensed matter physics. We analyze the standard model and then examine type-I and type-II extensions of the neutrino sector as well as their hybrid. Based on this analysis, we also comment on proposals in the literature to stabilize the $Z$-string.
Submitted 7 June, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Perturbative reheating and thermalization of pure Yang-Mills plasma
Authors:
Kyohei Mukaida,
Masaki Yamada
Abstract:
We investigate the thermalization of high-energy particles injected from the perturbative decay of inflaton during the pre-thermal phase of reheating in detail. In general, thermalization takes a relatively long time in a low-temperature plasma; therefore, the instantaneous thermalization approximation is not justified, even for the reheating of the Standard Model (SM) sector. We consider a pure Yang-Mills (YM) theory as an approximation of the SM sector or a possible dark sector, considering the Landau-Pomeranchuk-Migdal effect, a quantum interference effect in a finite temperature plasma. We perform the first numerical calculation to solve the time evolution of the system, including the redshift due to the expansion of the Universe, and show the details of the temperature evolution near the maximum and the behavior of the quasi-attractors at later times. The maximal temperature $T_\text{max}$ and time scale $t_\text{max}$ are determined quantitatively, such as $T_\text{max} \simeq 0.05 \times (Γ_I M_\text{Pl}^2/m_I^3)^{2/5} m_I$ and $t_\text{max} \simeq 2 \times 10^3 \times (Γ_I M_\text{Pl}^2/m_I^3)^{-3/5} m_I^{-1}$ in the SM-like system, where $m_I$ and $Γ_I$ are the mass and decay rate of inflaton. We also provide a similar formula for pure $\operatorname*{SU}(N)$ and $\operatorname*{SO}(N)$ YM theories for general values of $N$ and coupling constant $α$, including $T_\text{max} \propto α^{4/5}$ and $t_\text{max} \propto N^{-2} α^{-16/5}$ behaviors and their numerical coefficients. The thermalization occurs in a finite time scale, resulting in a lower maximal temperature of the Universe after inflation than that under the instantaneous thermalization approximation.
Submitted 21 February, 2024;
originally announced February 2024.
-
Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching
Authors:
Akira Ito,
Masanori Yamada,
Atsutoshi Kumagai
Abstract:
Recently, Ainsworth et al. showed that using weight matching (WM) to minimize the $L_2$ distance in a permutation search of model parameters effectively identifies permutations that satisfy linear mode connectivity (LMC), in which the loss along a linear path between two independently trained models with different seeds remains nearly constant. This paper provides a theoretical analysis of LMC using WM, which is crucial for understanding stochastic gradient descent's effectiveness and its application in areas like model merging. We first experimentally and theoretically show that permutations found by WM do not significantly reduce the $L_2$ distance between two models and the occurrence of LMC is not merely due to distance reduction by WM in itself. We then provide theoretical insights showing that permutations can change the directions of the singular vectors, but not the singular values, of the weight matrices in each layer. This finding shows that permutations found by WM mainly align the directions of singular vectors associated with large singular values across models. This alignment brings the singular vectors with large singular values, which determine the model functionality, closer between pre-merged and post-merged models, so that the post-merged model retains functionality similar to the pre-merged models, making it easy to satisfy LMC. Finally, we analyze the difference between WM and straight-through estimator (STE), a dataset-dependent permutation search method, and show that WM outperforms STE, especially when merging three or more models.
Submitted 15 April, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Data Poisoning for In-context Learning
Authors:
Pengfei He,
Han Xu,
Yue Xing,
Hui Liu,
Makoto Yamada,
Jiliang Tang
Abstract:
In the domain of large language models (LLMs), in-context learning (ICL) has been recognized for its innovative ability to adapt to new tasks, relying on examples rather than retraining or fine-tuning. This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks, an area not yet fully explored. We wonder whether ICL is vulnerable, with adversaries capable of manipulating example data to degrade model performance. To address this, we introduce ICLPoison, a specialized attacking framework conceived to exploit the learning mechanisms of ICL. Our approach uniquely employs discrete text perturbations to strategically influence the hidden states of LLMs during the ICL process. We outline three representative strategies to implement attacks under our framework, each rigorously evaluated across a variety of models and tasks. Our comprehensive tests, including trials on the sophisticated GPT-4 model, demonstrate that ICL's performance is significantly compromised under our framework. These revelations indicate an urgent need for enhanced defense mechanisms to safeguard the integrity and reliability of LLMs in applications relying on in-context learning.
Submitted 27 March, 2024; v1 submitted 3 February, 2024;
originally announced February 2024.
-
On the phase structure of extra-dimensional gauge theories with fermions
Authors:
Álvaro Pastor-Gutiérrez,
Masatoshi Yamada
Abstract:
We study the phase structure of five-dimensional Yang-Mills theories coupled to Dirac fermions. In order to tackle their non-perturbative character, we derive the flow equations for the gauge coupling and the effective potential for the Aharonov-Bohm phases employing the Functional Renormalisation Group. We analyse the infrared and ultraviolet fixed-point solutions in the flow of the gauge coupling as a function of the compactification radius of the fifth dimension. We discuss various types of trajectories which smoothly connect both dimensional limits. Last, we investigate the phase diagram and vacuum structure of the gauge potential for different fermion content.
Submitted 7 May, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
Axion cogenesis without isocurvature perturbations
Authors:
Raymond T. Co,
Masaki Yamada
Abstract:
Axion rotations can simultaneously explain the dark matter abundance and the baryon asymmetry of the Universe by kinetic misalignment and axiogenesis. We consider a scenario in which the Peccei-Quinn symmetry breaking field is as large as the Planck scale during inflation and the axion rotation is initiated by the inflaton-induced potential immediately after the end of inflation. This is a realization of the cogenesis scenario that is free of problems with domain walls and isocurvature perturbations thanks to large explicit Peccei-Quinn symmetry breaking at the Planck scale during inflation. The baryon asymmetry can be more efficiently produced by lepto-axiogenesis, in which case the axion mass is predicted to be larger than $O(0.1)$ meV. We also discuss a UV complete model in supersymmetric theories.
Submitted 29 December, 2023;
originally announced December 2023.
-
Gradient Flow Exact Renormalization Group for Scalar Quantum Electrodynamics
Authors:
Junichi Haruna,
Masatoshi Yamada
Abstract:
Gradient Flow Exact Renormalization Group (GF-ERG) is a framework to define the renormalization group flow of Wilsonian effective action utilizing coarse-graining along the diffusion equations. We apply it for Scalar Quantum Electrodynamics and derive flow equations for the Wilsonian effective action with the perturbative expansion in the gauge coupling. We focus on the quantum corrections to the correlation functions up to the second order of the gauge coupling and discuss the gauge invariance of the GF-ERG flow. We demonstrate that the anomalous dimension of the gauge field agrees with the standard perturbative computation and that the mass of the photon keeps vanishing in general spacetime dimensions. The latter is a noteworthy fact that contrasts with the conventional Exact Renormalization Group formalism in which an artificial photon mass proportional to a cutoff scale is induced. Our results imply that the GF-ERG can give a gauge-invariant renormalization group flow in a non-perturbative way.
Submitted 27 May, 2024; v1 submitted 25 December, 2023;
originally announced December 2023.
-
Dialogue System of Team NTT-EASE for DRC2023
Authors:
Yuki Kubo,
Tomoya Yamashita,
Masanori Yamada
Abstract:
As team NTT-EASE, we developed a dialogue system, EASE-DRCBot, for the Dialogue Robot Competition 2023 (DRC2023). EASE-DRCBot incorporates a manually defined dialogue flow. The conditions for system utterances are based on keyword extraction, an example-based method, and sentiment analysis. To answer a user's question, EASE-DRCBot uses GPT-3.5 to generate responses. We analyze the results of the preliminary round and discuss future work.
Submitted 21 December, 2023;
originally announced December 2023.
-
A Reduced Ideal MHD System for Nonlinear Magnetic Field Turbulence in Plasmas with Approximate Flux Surfaces
Authors:
Naoki Sato,
Michio Yamada
Abstract:
This paper studies the nonlinear evolution of magnetic field turbulence in proximity of steady ideal MHD configurations characterized by a small electric current, a small plasma flow, and approximate flux surfaces, a physical setting that is relevant for plasma confinement in stellarators. The aim is to gather insight on magnetic field dynamics, to elucidate accessibility and stability of three-dimensional MHD equilibria, as well as to formulate practical methods to compute them. Starting from the ideal MHD equations, a reduced dynamical system of two coupled nonlinear PDEs for the flux function and the angle variable associated with the Clebsch representation of the magnetic field is obtained. It is shown that under suitable boundary and gauge conditions such reduced system preserves magnetic energy, magnetic helicity, and total magnetic flux. The noncanonical Hamiltonian structure of the reduced system is identified, and used to show the nonlinear stability of steady solutions against perturbations involving only one Clebsch potential. The Hamiltonian structure is also applied to construct a dissipative dynamical system through the method of double brackets. This dissipative system enables the computation of MHD equilibria by minimizing energy until a critical point of the Hamiltonian is reached. Finally, an iterative scheme based on the alternate solution of the two steady equations in the reduced system is proposed as a further method to compute MHD equilibria. A theorem is proven which states that the iterative scheme converges to a nontrivial MHD equilibrium as long as solutions exist at each step of the iteration.
Submitted 6 November, 2023;
originally announced November 2023.
-
An Empirical Study of Self-supervised Learning with Wasserstein Distance
Authors:
Makoto Yamada,
Yuki Takezawa,
Guillaume Houry,
Kira Michaela Dusterwald,
Deborah Sulem,
Han Zhao,
Yao-Hung Hubert Tsai
Abstract:
In this study, we delve into the problem of self-supervised learning (SSL) utilizing the 1-Wasserstein distance on a tree structure (a.k.a., Tree-Wasserstein distance (TWD)), where TWD is defined as the L1 distance between two tree-embedded vectors. In SSL methods, the cosine similarity is often utilized as an objective function; however, it has not been well studied when utilizing the Wasserstein distance. Training the Wasserstein distance is numerically challenging. Thus, this study empirically investigates a strategy for optimizing the SSL with the Wasserstein distance and finds a stable training procedure. More specifically, we evaluate the combination of two types of TWD (total variation and ClusterTree) and several probability models, including the softmax function, the ArcFace probability model, and simplicial embedding. We propose a simple yet effective Jeffrey divergence-based regularization method to stabilize optimization. Through empirical experiments on STL10, CIFAR10, CIFAR100, and SVHN, we find that a simple combination of the softmax function and TWD can obtain significantly lower results than the standard SimCLR. Moreover, a simple combination of TWD and SimSiam fails to train the model. We find that the model performance depends on the combination of TWD and probability model, and that the Jeffrey divergence regularization helps in model training. Finally, we show that the appropriate combination of the TWD and probability model outperforms cosine similarity-based representation learning.
Submitted 5 February, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Embarrassingly Simple Text Watermarks
Authors:
Ryoma Sato,
Yuki Takezawa,
Han Bao,
Kenta Niwa,
Makoto Yamada
Abstract:
We propose Easymark, a family of embarrassingly simple yet effective watermarks. Text watermarking is becoming increasingly important with the advent of Large Language Models (LLMs). LLMs can generate texts that cannot be distinguished from human-written texts. This is a serious problem for the credibility of the text. Easymark is a simple yet effective solution to this problem. Easymark can inject a watermark without changing the meaning of the text at all while a validator can detect if a text was generated from a system that adopted Easymark or not with high credibility. Easymark is extremely easy to implement so that it only requires a few lines of code. Easymark does not require access to LLMs, so it can be implemented on the user-side when the LLM providers do not offer watermarked LLMs. In spite of its simplicity, it achieves higher detection accuracy and BLEU scores than the state-of-the-art text watermarking methods. We also prove the impossibility theorem of perfect watermarking, which is valuable in its own right. This theorem shows that no matter how sophisticated a watermark is, a malicious user could remove it from the text, which motivates us to use a simple watermark such as Easymark. We carry out experiments with LLM-generated texts and confirm that Easymark can be detected reliably without any degradation of BLEU and perplexity, and outperforms state-of-the-art watermarks in terms of both quality and reliability.
Submitted 13 October, 2023;
originally announced October 2023.
-
Super-slow phase transition catalyzed by BHs and the birth of baby BHs
Authors:
Ryusuke Jinno,
Jun'ya Kume,
Masaki Yamada
Abstract:
We discuss the unique phenomenology of first-order phase transitions catalyzed by primordial black holes (BHs). If the number of BHs within one Hubble volume is smaller than unity at the time of bubble nucleation, each bubble catalyzed around them can expand to the Hubble size, and the universe is eventually filled with true vacuum much after nucleation. This super-slow transition predicts enhanced gravitational wave signals from bubble collisions and can be tested in future observations. Moreover, the remaining rare false vacuum patches give birth to baby BHs, which can account for the abundance of dark matter in our universe.
Submitted 23 February, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Confidence-driven Sampling for Backdoor Attacks
Authors:
Pengfei He,
Han Xu,
Yue Xing,
Jie Ren,
Yingqian Cui,
Shenglai Zeng,
Jiliang Tang,
Makoto Yamada,
Mohammad Sabokrou
Abstract:
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios. Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples. Our research highlights the overlooked drawbacks of random sampling, which make that attack detectable and defensible. The core idea of this paper is to strategically poison samples near the model's decision boundary and increase defense difficulty. We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks. Importantly, our method operates independently of existing trigger designs, providing versatility and compatibility with various backdoor attack techniques. We substantiate the effectiveness of our approach through a comprehensive set of empirical experiments, demonstrating its potential to significantly enhance resilience against backdoor attacks in DNNs.
Submitted 8 October, 2023;
originally announced October 2023.
-
Necessary and Sufficient Watermark for Large Language Models
Authors:
Yuki Takezawa,
Ryoma Sato,
Han Bao,
Kenta Niwa,
Makoto Yamada
Abstract:
In recent years, large language models (LLMs) have achieved remarkable performances in various NLP tasks. They can generate texts that are indistinguishable from those written by humans. Such remarkable performance of LLMs increases their risk of being used for malicious purposes, such as generating fake news articles. Therefore, it is necessary to develop methods for distinguishing texts written by LLMs from those written by humans. Watermarking is one of the most powerful methods for achieving this. Although existing watermarking methods have successfully detected texts generated by LLMs, they significantly degrade the quality of the generated texts. In this study, we propose the Necessary and Sufficient Watermark (NS-Watermark) for inserting watermarks into generated texts without degrading the text quality. More specifically, we derive minimum constraints required to be imposed on the generated texts to distinguish whether LLMs or humans write the texts. Then, we formulate the NS-Watermark as a constrained optimization problem and propose an efficient algorithm to solve it. Through the experiments, we demonstrate that the NS-Watermark can generate more natural texts than existing watermarking methods and distinguish more accurately between texts written by LLMs and those written by humans. Especially in machine translation tasks, the NS-Watermark can outperform the existing watermarking method by up to 30 BLEU scores.
Submitted 1 October, 2023;
originally announced October 2023.
-
Dissipation of axion energy via the Schwinger and Witten effects
Authors:
Kwang Sik Jeong,
Shota Nakagawa,
Fuminobu Takahashi,
Masaki Yamada
Abstract:
In the presence of an anomalous CP phase in a U(1) gauge theory, a monopole becomes a dyon via the Witten effect. When the anomalous CP phase is promoted to a dynamical field, the axion, the electric charge of the dyon changes according to the coherent motion of the axion oscillation. Once the electric charge exceeds a certain threshold, the Schwinger pair production of charged particles becomes efficient near the surface of the dyon. These non-perturbative effects lead to the back reaction of the axion dynamics by causing the dissipation of the axion oscillation energy and the change of the effective potential due to the Witten effect. Taking these effects into account, we consider the dynamics of the whole system, including the axion, monopole, and charged heavy vector bosons, and discuss to what extent the axion abundance is modified. We also discuss the electric dipole radiation from a bound state of a monopole-anti-monopole pair due to the axion coherent oscillations.
Submitted 28 September, 2023;
originally announced September 2023.
-
Irreversible vierbein postulate: Emergence of spacetime from quantum phase transition
Authors:
Yadikaer Maitiniyazi,
Shinya Matsuzaki,
Kin-ya Oda,
Masatoshi Yamada
Abstract:
We formulate a model for quantum gravity based on the local Lorentz symmetry and general coordinate invariance. A key idea is the irreversible vierbein postulate that a tree-level action for the model at a certain energy scale does not contain an inverse vierbein. Under this postulate, only the spinor becomes a dynamical field, and no gravitational background field is introduced in the tree-level action. In this paper, after explaining the transformation rules of the local Lorentz and general-coordinate transformations in detail, a tree-level action is defined. We show that fermionic fluctuations can induce a nonvanishing gravitational background field.
Submitted 13 May, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Nonperturbative aspects of two-dimensional $T\bar{T}$-deformed scalar theory from functional renormalization group
Authors:
Jie Liu,
Junichi Haruna,
Masatoshi Yamada
Abstract:
We study $T\bar{T}$-deformed $O(N)$ scalar field theory in two-dimensional spacetime using the functional renormalization group. We derive the $β$ functions for the couplings in the system and explore the fixed points. In addition to the Gaussian (trivial) fixed point, we find a nontrivial fixed point at which a new universality class exists. The deformation parameter becomes relevant at the nontrivial fixed point. Therefore, the $T\bar T$-deformed scalar field theory in two-dimensional spacetime could be defined as a nonperturbatively renormalizable theory.
Submitted 13 March, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
-
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering
Authors:
Amir Rahimi,
Vanessa D'Amario,
Moyuru Yamada,
Kentaro Takemoto,
Tomotake Sasaki,
Xavier Boix
Abstract:
Systematic generalization is a crucial aspect of intelligence, which refers to the ability to generalize to novel tasks by combining known subtasks and concepts. One critical factor that has been shown to influence systematic generalization is the diversity of training data. However, diversity can be defined in various ways, as data have many factors of variation. A more granular understanding of how different aspects of data diversity affect systematic generalization is lacking. We present new evidence in the problem of Visual Question Answering (VQA) that reveals that the diversity of simple tasks (i.e. tasks formed by a few subtasks and concepts) plays a key role in achieving systematic generalization. This implies that it may not be essential to gather a large and varied number of complex tasks, which could be costly to obtain. We demonstrate that this result is independent of the similarity between the training and testing data and applies to well-known families of neural network architectures for VQA (i.e. monolithic architectures and neural module networks). Additionally, we observe that neural module networks leverage all forms of data diversity we evaluated, while monolithic architectures require more extensive amounts of data to do so. These findings provide a first step towards understanding the interactions between data diversity design, neural network architectures, and systematic generalization capabilities.
Submitted 15 September, 2023;
originally announced September 2023.
-
Dynamics of Superconformal Axion: Quality and Scalegenesis
Authors:
Shota Nakagawa,
Yuichiro Nakai,
Masaki Yamada,
Yufei Zhang
Abstract:
We explore a dynamical mechanism to realize the emergence of a global $U(1)_{\rm PQ}$ symmetry and its spontaneous breaking at an intermediate scale for an axion solution to the strong CP problem. Such a dynamics is provided by a new supersymmetric QCD near the middle of conformal window that couples to fields spontaneously breaking the $U(1)_{\rm PQ}$ symmetry. A large anomalous dimension of the $U(1)_{\rm PQ}$ breaking fields leads to the suppression of explicit $U(1)_{\rm PQ}$-violating higher dimensional operators. The $U(1)_{\rm PQ}$ breaking vacuum is generated at a scale hierarchically smaller than the Planck scale by a non-perturbative effect. The $U(1)_{\rm PQ}$ breaking drives the conformal breaking, and all the new quarks become massive. The axion potential is generated by the ordinary color $SU(3)_C$ effect as the $U(1)_{\rm PQ}$ symmetry is only anomalous under the $SU(3)_C$. The saxion direction is stabilized by supersymmetry breaking and cosmologically harmless.
Submitted 30 January, 2024; v1 submitted 13 September, 2023;
originally announced September 2023.
-
Optimizing Machine Translation through Prompt Engineering: An Investigation into ChatGPT's Customizability
Authors:
Masaru Yamada
Abstract:
This paper explores the influence of integrating the purpose of the translation and the target audience into prompts on the quality of translations produced by ChatGPT. Drawing on previous translation studies, industry practices, and ISO standards, the research underscores the significance of the pre-production phase in the translation process. The study reveals that the inclusion of suitable prompts in large-scale language models like ChatGPT can yield flexible translations, a feat yet to be realized by conventional Machine Translation (MT). The research scrutinizes the changes in translation quality when prompts are used to generate translations that meet specific conditions. The evaluation is conducted from a practicing translator's viewpoint, both subjectively and qualitatively, supplemented by the use of OpenAI's word embedding API for cosine similarity calculations. The findings suggest that the integration of the purpose and target audience into prompts can indeed modify the generated translations, generally enhancing the translation quality by industry standards. The study also demonstrates the practical application of the "good translation" concept, particularly in the context of marketing documents and culturally dependent idioms.
Submitted 21 February, 2024; v1 submitted 2 August, 2023;
originally announced August 2023.
-
Implicit neural representation for change detection
Authors:
Peter Naylor,
Diego Di Carlo,
Arianna Traviglia,
Makoto Yamada,
Marco Fiorucci
Abstract:
Identifying changes in a pair of 3D aerial LiDAR point clouds, obtained during two distinct time periods over the same geographic region presents a significant challenge due to the disparities in spatial coverage and the presence of noise in the acquisition system. The most commonly used approaches to detecting changes in point clouds are based on supervised methods which necessitate extensive labelled data often unavailable in real-world applications. To address these issues, we propose an unsupervised approach that comprises two components: Implicit Neural Representation (INR) for continuous shape reconstruction and a Gaussian Mixture Model for categorising changes. INR offers a grid-agnostic representation for encoding bi-temporal point clouds, with unmatched spatial support that can be regularised to enhance high-frequency details and reduce noise. The reconstructions at each timestamp are compared at arbitrary spatial scales, leading to a significant increase in detection capabilities. We apply our method to a benchmark dataset comprising simulated LiDAR point clouds for urban sprawling. This dataset encompasses diverse challenging scenarios, varying in resolutions, input modalities and noise levels. This enables a comprehensive multi-scenario evaluation, comparing our method with the current state-of-the-art approach. We outperform the previous methods by a margin of 10% in the intersection over union metric. In addition, we put our techniques to practical use by applying them in a real-world scenario to identify instances of illicit excavation of archaeological sites and validate our results by comparing them with findings from field experts.
Submitted 30 August, 2023; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Laboratory Study of Collisionless Magnetic Reconnection
Authors:
H. Ji,
J. Yoo,
W. Fox,
M. Yamada,
M. Argall,
J. Egedal,
Y. -H. Liu,
R. Wilder,
S. Eriksson,
W. Daughton,
K. Bergstedt,
S. Bose,
J. Burch,
R. Torbert,
J. Ng,
L. -J. Chen
Abstract:
A concise review is given on the past two decades' results from laboratory experiments on collisionless magnetic reconnection in direct relation with space measurements, especially by the Magnetospheric Multiscale (MMS) mission. Highlights include spatial structures of electromagnetic fields in ion and electron diffusion regions as a function of upstream symmetry and guide field strength; energy conversion and partition from magnetic field to ions and electrons including particle acceleration; electrostatic and electromagnetic kinetic plasma waves with various wavelengths; and plasmoid-mediated multiscale reconnection. Combined with the progress in theoretical, numerical, and observational studies, the physics foundation of fast reconnection in collisionless plasmas has been largely established, at least within the parameter ranges and spatial scales that were studied. Immediate and long-term future opportunities based on multiscale experiments and space missions supported by exascale computation are discussed, including dissipation by kinetic plasma waves, particle heating and acceleration, and multiscale physics across fluid and kinetic scales.
Submitted 13 July, 2023;
originally announced July 2023.
-
Dark baryon from pure Yang-Mills theory and its GW signature from cosmic strings
Authors:
Masaki Yamada,
Kazuya Yonekura
Abstract:
We point out that SO($2N$) pure Yang-Mills theory provides a candidate for dark matter (DM) without the explicit need to impose any additional symmetry. The DM candidate is a particular type of glueball, which we refer to as a baryonic glueball, that is naturally stable and produced by a novel production mechanism for a moderately large $N$. In this case, the intercommutation probability of cosmic strings (or macroscopic color flux tubes) is quite low, which offers characteristic gravitational wave signals to test our model. In particular, our model can simultaneously account for both abundance of DM and the recently reported gravitational wave signals detected in pulsar timing array experiments, including NANOGrav.
Submitted 17 August, 2023; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Can baryon asymmetry be explained by a large initial value before inflation?
Authors:
Kai Murai,
Fuminobu Takahashi,
Masaki Yamada,
Wen Yin
Abstract:
We show that the baryon asymmetry of the Universe cannot be explained by a large initial value before inflation because it inevitably predicts correlated baryon isocurvature perturbations that are already excluded by cosmic microwave background observations. Similar arguments can generally be applied to some models of dark matter.
Submitted 6 July, 2023;
originally announced July 2023.
-
Defect-Induced Low-Energy Majorana Excitations in the Kitaev Magnet $α$-RuCl$_3$
Authors:
K. Imamura,
Y. Mizukami,
O. Tanaka,
R. Grasset,
M. Konczykowski,
N. Kurita,
H. Tanaka,
Y. Matsuda,
M. G. Yamada,
K. Hashimoto,
T. Shibauchi
Abstract:
The excitations in the Kitaev spin liquid (KSL) can be described by Majorana fermions, which have characteristic field dependence of bulk gap and topological edge modes. In the high-field state of layered honeycomb magnet $α$-RuCl$_3$, experimental results supporting these Majorana features have been reported recently. However, there are challenges due to sample dependence and the impact of inevitable disorder on the KSL is poorly understood. Here we study how low-energy excitations are modified by introducing point defects in $α$-RuCl$_3$ using electron irradiation, which induces site vacancies and exchange randomness. High-resolution measurements of the temperature dependence of specific heat $C(T)$ under in-plane fields $H$ reveal that while the field-dependent Majorana gap is almost intact, additional low-energy states with $C/T=A(H)T$ are induced by introduced defects. At low temperatures, we obtain the data collapse of $C/T\sim H^{-γ}(T/H)$ expected for a disordered quantum spin system, but with an anomalously large exponent $γ$. This leads us to find a power-law relationship between the coefficient $A(H)$ and the field-sensitive Majorana gap. These results are consistent with the picture that the disorder induces low-energy linear Majorana excitations, which may be considered as a weak localization effect of Majorana fermions in the KSL.
Submitted 15 February, 2024; v1 submitted 29 June, 2023;
originally announced June 2023.
-
Comparison of optical spectra between asteroids Ryugu and Bennu: I. Cross calibration between Hayabusa2/ONC-T and OSIRIS-REx/MapCam
Authors:
K. Yumoto,
E. Tatsumi,
T. Kouyama,
D. R. Golish,
Y. Cho,
T. Morota,
S. Kameda,
H. Sato,
B. Rizk,
D. N. DellaGiustina,
Y. Yokota,
H. Suzuki,
J. de Leon,
H. Campins,
J. Licandro,
M. Popescu,
J. L. Rizos,
R. Honda,
M. Yamada,
N. Sakatani,
C. Honda,
M. Matsuoka,
M. Hayakawa,
H. Sawada,
K. Ogawa
, et al. (3 additional authors not shown)
Abstract:
Asteroids (162173) Ryugu and (101955) Bennu observed by Hayabusa2 and OSIRIS-REx share many properties, but spectral observations by the telescopic Optical Navigation Camera (ONC-T) and MapCam detected subtle but significant differences, which may reflect differences in their origin and evolution. Comparing these differences on the same absolute scale is necessary for understanding their causes. However, ONC-T and MapCam have a large imager-to-imager systematic error of up to 15% caused by the difference in radiometric calibration targets. To resolve this problem, we cross calibrated albedo and color data between the two instruments using the Moon as the common calibration standard. The images of the Moon taken by ONC-T and MapCam were compared with those simulated using photometry models developed from lunar orbiter data. Our results show that the cross-calibrated reflectance of Ryugu and Bennu can be obtained by upscaling the pre-cross-calibrated reflectance of Bennu by 13.3 +/- 1.6% at b band, 13.2 +/- 1.5% at v band, 13.6 +/- 1.7% at w band, and 14.8 +/- 1.8% at x band, while those for Ryugu are kept the same. These factors compensate for the imager-to-imager bias caused by differences in targets used for radiometric calibration and solar irradiance models used for data reduction. The need for such large upscaling underscores the importance of using the cross-calibrated data for accurately comparing the Ryugu and Bennu data. The uncertainty in these factors shows that the reflectance of Ryugu and Bennu can be compared with <2% accuracy after applying our results. By applying our cross calibration, the geometric albedo of Bennu became consistent with those observed by ground-based telescopes and OVIRS. Our result can be applied simply by multiplying the publicly available data by a constant and enables accurate comparison of the optical spectra of Ryugu and Bennu in future studies.
Submitted 18 June, 2024; v1 submitted 23 June, 2023;
originally announced June 2023.
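The abstract above notes that the cross calibration can be applied by multiplying publicly available data by a constant. A minimal sketch of that step follows, assuming that "upscaling by p%" means multiplying the Bennu reflectance by (1 + p/100). The function name and the sample reflectance values are hypothetical and only illustrate the arithmetic; the quoted uncertainties are not propagated here.

```python
# Upscaling percentages for Bennu (MapCam) quoted in the abstract, per band.
# Ryugu (ONC-T) values are left unchanged.
BENNU_UPSCALE_PERCENT = {"b": 13.3, "v": 13.2, "w": 13.6, "x": 14.8}


def cross_calibrate_bennu(reflectance_by_band):
    """Return Bennu reflectances moved onto the cross-calibrated scale.

    Assumption: 'upscaling by p%' means multiplying by (1 + p / 100).
    """
    return {
        band: value * (1.0 + BENNU_UPSCALE_PERCENT[band] / 100.0)
        for band, value in reflectance_by_band.items()
    }


# Hypothetical pre-cross-calibrated reflectances, for illustration only.
sample = {"b": 0.045, "v": 0.044, "w": 0.043, "x": 0.042}
calibrated = cross_calibrate_bennu(sample)
```

Keeping the factors in one per-band table makes it easy to swap in different values if the interpretation of the percentages differs from the assumption stated above.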
-
Quantum decay of scalar and vector boson stars and oscillons into gravitons
Authors:
Kazunori Nakayama,
Fuminobu Takahashi,
Masaki Yamada
Abstract:
We point out that a soliton such as an oscillon or boson star inevitably decays into gravitons through gravitational interactions. These decay processes exist even if there are no apparent self-interactions of the constituent field, scalar or vector, since they are induced by gravitational interactions. Hence, our results provide a strict upper limit on the lifetime of oscillons and boson stars including the dilute axion star. We also calculate the spectrum of the graviton background from decay of solitons.
Submitted 22 June, 2023;
originally announced June 2023.
-
Stochastic dynamics of multi-waterfall hybrid inflation and formation of primordial black holes
Authors:
Yuichiro Tada,
Masaki Yamada
Abstract:
We show that a hybrid inflation model with multiple waterfall fields can result in the formation of primordial black holes (PBHs) of astrophysical size, by using an advanced algorithm to follow the stochastic dynamics of the waterfall fields. This is in contrast to the case with a single waterfall field, where the wavelength of density perturbations is usually too short to form PBHs of the astrophysical scale (or otherwise PBHs are overproduced and the model is ruled out) unless the inflaton potential is tuned. In particular, we demonstrate that PBHs with masses of order $10^{20}\, {\rm g}$ can form after hybrid inflation consistently with other cosmological observations if the number of waterfall fields is about 5 for the case of instantaneous reheating. Observable gravitational waves are produced from the second-order effect of large curvature perturbations as well as from the dynamics of texture or global defects that form after the waterfall phase transition.
Submitted 12 June, 2023;
originally announced June 2023.
-
One-Shot Machine Unlearning with Mnemonic Code
Authors:
Tomoya Yamashita,
Masanori Yamada,
Takashi Shibata
Abstract:
Deep learning has achieved significant improvements in accuracy and has been applied to various fields. With the spread of deep learning, a new problem has also emerged; deep learning models can sometimes have undesirable information from an ethical standpoint. This problem must be resolved if deep learning is to make sensitive decisions such as hiring and prison sentencing. Machine unlearning (MU) is the research area that responds to such demands. MU aims at forgetting about undesirable training data from a trained deep learning model. A naive MU approach is to re-train the whole model with the training data from which the undesirable data has been removed. However, re-training the whole model can take a huge amount of time and consumes significant computer resources. To make MU even more practical, a simple-yet-effective MU method is required. In this paper, we propose a one-shot MU method, which does not need additional training. To design one-shot MU, we add noise to the model parameters that are sensitive to undesirable information. In our proposed method, we use the Fisher information matrix (FIM) to estimate the sensitive model parameters. Training data were usually used to evaluate the FIM in existing methods. In contrast, we avoid the need to retain the training data for calculating the FIM by using class-specific synthetic signals called mnemonic code. Extensive experiments using artificial and natural datasets demonstrate that our method outperforms the existing methods.
Submitted 9 June, 2023;
originally announced June 2023.
-
Revisiting Permutation Symmetry for Merging Models between Different Datasets
Authors:
Masanori Yamada,
Tomoya Yamashita,
Shin'ya Yamaguchi,
Daiki Chijiwa
Abstract:
Model merging is a new approach to creating a new model by combining the weights of different trained models. Previous studies report that model merging works well for models trained on a single dataset with different random seeds, while model merging between different datasets is difficult. Merging knowledge from different datasets has practical significance, but it has not been well investigated. In this paper, we investigate the properties of merging models between different datasets. Through theoretical and empirical analyses, we find that the accuracy of the merged model decreases more significantly as the datasets diverge more and that the different loss landscapes for each dataset make model merging between different datasets difficult. We also show that merged models require datasets for merging in order to achieve a high accuracy. Furthermore, we show that condensed datasets created by dataset condensation can be used as substitutes for the original datasets when merging models. We conduct experiments for model merging between different datasets. When merging between MNIST and Fashion-MNIST models, the accuracy significantly improves by 28% using the dataset and 25% using the condensed dataset compared with not using the dataset.
Submitted 8 June, 2023;
originally announced June 2023.
-
Hamiltonian Structure and Nonlinear Stability of Steady Solutions of the Generalized Hasegawa-Mima Equation for Drift Wave Turbulence in Curved Magnetic Fields
Authors:
Naoki Sato,
Michio Yamada
Abstract:
The Generalized Hasegawa-Mima (GHM) equation, which generalizes the standard Hasegawa-Mima (HM) equation, is a nonlinear equation describing the evolution of drift wave turbulence in curved magnetic fields. The GHM equation can be obtained from a drift wave turbulence ordering that does not involve ordering conditions on spatial derivatives of the magnetic field or the plasma density, and it is therefore appropriate to describe the evolution of electrostatic turbulence in strongly inhomogeneous magnetized plasmas. In this work, we discuss the noncanonical Hamiltonian structure of the GHM equation, and obtain conditions for the nonlinear stability of steady solutions through the energy-Casimir stability criterion. These results are then applied to describe drift waves and infer the existence of stable toroidal zonal flows with radial shear in dipole magnetic fields.
Submitted 26 May, 2023;
originally announced May 2023.
-
Guiding Center Derivation of the Generalized Hasegawa-Mima Equation for Drift Wave Turbulence in Curved Magnetic Fields
Authors:
Naoki Sato,
Michio Yamada
Abstract:
Recently, a generalized Hasegawa-Mima (GHM) equation describing drift wave turbulence in curved magnetic fields has been derived in [N. Sato and M. Yamada, J. Plasma Phys. (2022), vol. 88, 905880319] for an ion-electron plasma modeled as a two-fluid system. In this work, we show that a mathematically equivalent GHM equation can be obtained within the kinetic framework of guiding center motion, and that the relevant drift wave turbulence ordering can be further relaxed, effectively generalizing the applicability of the equation to any magnetic field geometry and electron spatial density, in the sense that no ordering requirements involve spatial derivatives of the magnetic field or the electron spatial density.
Submitted 26 May, 2023;
originally announced May 2023.
-
Beyond Exponential Graph: Communication-Efficient Topologies for Decentralized Learning via Finite-time Convergence
Authors:
Yuki Takezawa,
Ryoma Sato,
Han Bao,
Kenta Niwa,
Makoto Yamada
Abstract:
Decentralized learning has recently been attracting increasing attention for its applications in parallel computation and privacy preservation. Many recent studies stated that the underlying network topology with a faster consensus rate (a.k.a. spectral gap) leads to a better convergence rate and accuracy for decentralized learning. However, a topology with a fast consensus rate, e.g., the exponential graph, generally has a large maximum degree, which incurs significant communication costs. Thus, seeking topologies with both a fast consensus rate and small maximum degree is important. In this study, we propose a novel topology combining both a fast consensus rate and small maximum degree called the Base-$(k + 1)$ Graph. Unlike the existing topologies, the Base-$(k + 1)$ Graph enables all nodes to reach the exact consensus after a finite number of iterations for any number of nodes and maximum degree k. Thanks to this favorable property, the Base-$(k + 1)$ Graph endows Decentralized SGD (DSGD) with both a faster convergence rate and more communication efficiency than the exponential graph. We conducted experiments with various topologies, demonstrating that the Base-$(k + 1)$ Graph enables various decentralized learning methods to achieve higher accuracy with better communication efficiency than the existing topologies.
Submitted 15 October, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
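The abstract above does not describe how the Base-$(k + 1)$ Graph is built, so the sketch below does not reproduce it. It only illustrates what finite-time exact consensus means, using the well-known pairwise averaging over a hypercube, which needs one neighbor per round but works only when the number of nodes is a power of two. The function names and node values are hypothetical.

```python
def hypercube_round(values, bit):
    """Each node i averages with its partner i XOR 2**bit (one neighbor per round)."""
    return [(values[i] + values[i ^ (1 << bit)]) / 2.0 for i in range(len(values))]


def exact_consensus(values):
    """Run log2(n) pairwise-averaging rounds; n must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch only handles a power-of-two number of nodes")
    for bit in range(n.bit_length() - 1):
        values = hypercube_round(values, bit)
    return values


# Hypothetical local parameters held by 8 nodes.
result = exact_consensus([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
```

After three rounds every node holds the global mean exactly. The power-of-two restriction in this sketch is the kind of limitation that, according to the abstract, the Base-$(k + 1)$ Graph removes for any number of nodes and maximum degree k.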
-
HICO-DET-SG and V-COCO-SG: New Data Splits for Evaluating the Systematic Generalization Performance of Human-Object Interaction Detection Models
Authors:
Kentaro Takemoto,
Moyuru Yamada,
Tomotake Sasaki,
Hisanao Akima
Abstract:
Human-Object Interaction (HOI) detection is a task to localize humans and objects in an image and predict the interactions in human-object pairs. In real-world scenarios, HOI detection models need systematic generalization, i.e., generalization to novel combinations of objects and interactions, because the training data are expected to cover a limited portion of all possible combinations. To evaluate the systematic generalization performance of HOI detection models, we created two new sets of HOI detection data splits named HICO-DET-SG and V-COCO-SG based on the HICO-DET and V-COCO datasets, respectively. When evaluated on the new data splits, HOI detection models with various characteristics performed much more poorly than when evaluated on the original splits. This shows that systematic generalization is a challenging goal in HOI detection. By analyzing the evaluation results, we also gain insights for improving the systematic generalization performance and identify four possible future research directions. We hope that our new data splits and presented analysis will encourage further research on systematic generalization in HOI detection.
Submitted 11 April, 2024; v1 submitted 17 May, 2023;
originally announced May 2023.
-
On the primordial black hole formation in hybrid inflation
Authors:
Yuichiro Tada,
Masaki Yamada
Abstract:
We revisit the scenario of primordial black hole (PBH) formation from large curvature perturbations generated during the waterfall phase transition in hybrid inflation models. In a minimal setup considered in the literature, the mass and abundance of PBHs are correlated and astrophysical size PBHs tend to be overproduced. This is because a longer length scale for curvature perturbations (or a larger PBH mass) requires a longer waterfall regime with a flatter potential, which results in overproduction of curvature perturbations. However, in this paper, we discuss how the higher-dimensional terms for the inflaton potential affect the dynamics during the waterfall phase transition and show that astrophysical size PBHs of order $10^{17\text{--}23} \, {\rm g}$ (which can explain the whole dark matter) can form in some parameter space consistently with any existing constraints. The scenario can be tested by observing the induced gravitational waves from scalar perturbations by future gravitational wave experiments, such as LISA.
Submitted 23 June, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Heliocentric Distance Dependence of Zodiacal Light Observed by Hayabusa2#
Authors:
Kohji Tsumura,
Shuji Matsuura,
Kei Sano,
Takahiro Iwata,
Hajime Yano,
Kohei Kitazato,
Kohji Takimoto,
Manabu Yamada,
Tomokatsu Morota,
Toru Kouyama,
Masahiko Hayakawa,
Yasuhiro Yokota,
Eri Tatsumi,
Moe Matsuoka,
Naoya Sakatani,
Rie Honda,
Shingo Kameda,
Hidehiko Suzuki,
Yuichiro Cho,
Kazuo Yoshioka,
Kazunori Ogawa,
Kei Shirai,
Hirotaka Sawada,
Seiji Sugita
Abstract:
Zodiacal light (ZL) is sunlight scattered by interplanetary dust particles (IDPs) at optical wavelengths. The spatial distribution of IDPs in the Solar System may hold an important key to understanding the evolution of the Solar System and material transportation within it. The number density of IDPs can be expressed as $n(r) \sim r^{-α}$, and the exponent $α\sim 1.3$ was obtained by previous observations from interplanetary space by Helios 1/2 and Pioneer 10/11 in the 1970s and 1980s. However, no direct measurements of $α$ based on ZL observations from interplanetary space outside Earth's orbit have been performed since then. Here, we introduce initial results for the radial profile of the ZL at optical wavelengths observed over the range 0.76-1.06 au by ONC-T aboard the Hayabusa2# mission in 2021-2022. The ZL brightness we obtained is well reproduced by a model brightness, although there is a small excess of the observed ZL brightness over the model brightness at around 0.9 au. The radial power-law index we obtained is $α= 1.30 \pm 0.08$, which is consistent with previous results based on ZL observations. The dominant source of uncertainty arises from the uncertainty in estimating the diffuse Galactic light (DGL).
Submitted 6 July, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
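To make the meaning of the index in $n(r) \sim r^{-α}$ concrete, the sketch below recovers $α$ from a synthetic, noise-free density profile by a least-squares fit in log-log space. This is only an illustration of the power-law relation: the study above infers $α$ from observed ZL brightness, which is not modeled here, and the function name and sample radii are hypothetical.

```python
import math


def fit_power_law_index(radii_au, densities):
    """Least-squares slope of log(n) versus log(r); returns alpha in n ~ r**(-alpha)."""
    xs = [math.log(r) for r in radii_au]
    ys = [math.log(n) for n in densities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic profile spanning the 0.76-1.06 au range quoted in the abstract,
# generated with alpha = 1.3 and no noise.
radii = [0.76, 0.82, 0.88, 0.94, 1.00, 1.06]
density = [r ** -1.3 for r in radii]
alpha = fit_power_law_index(radii, density)
```

With real data the fit would also need measurement uncertainties, which this sketch leaves out.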
-
Study and perspective on neutron beam divergence improvement achievable by the combination of two or more neutron collimating systems
Authors:
Oriol Sans-Planell,
Francesco Cantini,
Marco Costa,
Francesco Grazzi,
Manuel Morgano,
Masako Yamada
Abstract:
This communication presents the results obtained in an experimental campaign at the PSI BOA beamline using the combination of the ANET Compact Neutron Collimator (CNC) with the actual BOA pin-hole system. Through extensive resolution campaigns, it has been possible to quantify and understand the effects of improvement on the beam divergence when combining the two collimating systems. A new theoretical approach to this problem is described and discussed. The effect is expected not to be limited to the specific case that has been studied at PSI BOA but to have a more general validity for neutron collimation systems.
Submitted 24 February, 2023;
originally announced February 2023.
-
Nystrom Method for Accurate and Scalable Implicit Differentiation
Authors:
Ryuichiro Hataya,
Makoto Yamada
Abstract:
The essential difficulty of gradient-based bilevel optimization using implicit differentiation is to estimate the inverse Hessian vector product with respect to neural network parameters. This paper proposes to tackle this problem by the Nystrom method and the Woodbury matrix identity, exploiting the low-rankness of the Hessian. Compared to existing methods using iterative approximation, such as conjugate gradient and the Neumann series approximation, the proposed method avoids numerical instability and can be efficiently computed in matrix operations without iterations. As a result, the proposed method works stably in various tasks and is faster than iterative approximations. Through experiments including large-scale hyperparameter optimization and meta learning, we demonstrate that the Nystrom method consistently achieves comparable or even superior performance to other approaches. The source code is available from https://github.com/moskomule/hypergrad.
Submitted 19 February, 2023;
originally announced February 2023.
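One way the Nystrom approximation and the Woodbury identity can combine to give an inverse Hessian vector product without iterations is sketched below. The notation is an assumption and does not come from the abstract: $H$ is the $p \times p$ Hessian, $K$ is a set of $k$ sampled column indices, $ρ > 0$ is a damping term, and $v$ is the vector to be multiplied.

```latex
\begin{aligned}
H &\approx H_k = C\, W^{-1} C^{\top},
  \qquad C = H_{[:,K]} \in \mathbb{R}^{p \times k},
  \quad W = H_{[K,K]} \in \mathbb{R}^{k \times k}, \\
(H_k + \rho I_p)^{-1} v
  &= \frac{1}{\rho}\, v
   - \frac{1}{\rho^{2}}\, C \left( W + \frac{1}{\rho}\, C^{\top} C \right)^{-1} C^{\top} v .
\end{aligned}
```

The second line follows from the Woodbury identity applied to $ρ I_p + C W^{-1} C^{\top}$, so only a $k \times k$ linear system has to be solved, which is cheap when $k$ is much smaller than $p$.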