A Multivocal Review of MLOps Practices, Challenges and Open Issues

Authors: Beyza Eken, Samodha Pallewatta, Nguyen Khoi Tran, Ayse Tosun, Muhammad Ali Babar

Abstract: With the increasing trend of Machine Learning (ML) enabled software applications, the paradigm of ML Operations (MLOps) has gained tremendous attention from researchers and practitioners. MLOps encompasses the practices and technologies for streamlining the resources and monitoring needs of operationalizing ML models. Software development practitioners need access to detailed and easily understandable knowledge of MLOps workflows, practices, challenges and solutions to effectively and efficiently support the adoption of MLOps. Whilst the academic and industry literature on MLOps has been growing rapidly, there have been relatively few attempts at systematically synthesizing and analyzing the vast amount of existing MLOps literature for improving ease of access and understanding. We conducted a Multivocal Literature Review (MLR) of 150 relevant academic studies and 48 gray literature sources to provide a comprehensive body of knowledge on MLOps. Through this MLR, we identified the emerging MLOps practices, adoption challenges and solutions related to various areas, including development and operation of complex pipelines, managing production at scale, managing artifacts, and ensuring quality, security, governance, and ethical aspects. We also report the socio-technical aspect of MLOps relating to diverse roles involved and collaboration practices across them through the MLOps lifecycle. We assert that this MLR provides valuable insights to researchers and practitioners seeking to navigate the rapidly evolving landscape of MLOps. We also identify the open issues that need to be addressed in order to advance the current state of the art of MLOps.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 45 pages, 4 figures

arXiv:2405.20089 [pdf, other]

The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities

Authors: David Stap, Eva Hasler, Bill Byrne, Christof Monz, Ke Tran

Abstract: Fine-tuning large language models (LLMs) for machine translation has shown improvements in overall translation quality. However, it is unclear what impact fine-tuning has on desirable LLM behaviors that are not present in neural machine translation models, such as steerability, inherent document-level translation abilities, and the ability to produce less literal translations. We perform an extensive translation evaluation on the LLaMA and Falcon families of models, with model sizes ranging from 7 billion up to 65 billion parameters. Our results show that while fine-tuning improves the general translation quality of LLMs, several abilities degrade. In particular, we observe a decline in the ability to perform formality steering, to produce technical translations through few-shot examples, and to perform document-level translation. On the other hand, we observe that the model produces less literal translations after fine-tuning on parallel data. We show that by including monolingual data as part of the fine-tuning data we can maintain the abilities while simultaneously enhancing overall translation quality. Our findings emphasize the need for fine-tuning strategies that preserve the benefits of LLMs for machine translation.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted to ACL 2024 (long, main)

arXiv:2405.15427 [pdf, other]

doi 10.3847/1538-4357/ad4ce3

AGEL: Is the Conflict Real? Investigating Galaxy Evolution Models using Strong Lensing at 0.3 < z < 0.9

Authors: Nandini Sahu, Kim-Vy Tran, Sherry H. Suyu, Anowar J. Shajib, Sebastian Ertl, Glenn G. Kacprzak, Karl Glazebrook, Tucker Jones, Keerthi Vasan G. C., Tania M. Barone, A. Makai Baker, Hannah Skobe, Caro Derkenne, Geraint F. Lewis, Sarah M. Sweet, Sebastian Lopez

Abstract: Observed evolution of the total mass distribution with redshift is crucial to testing galaxy evolution theories. To measure the total mass distribution, strong gravitational lenses complement the resolved dynamical observations currently limited to $z \lesssim 0.5$. Here we present the lens models for a pilot sample of seven galaxy-scale lenses from the ASTRO3D Galaxy Evolution with Lenses (AGEL) survey. The AGEL lenses, modeled using HST/WFC3-F140W images with Gravitational Lens Efficient Explorer (GLEE) software, have deflector redshifts between $0.3 < z_{\rm defl} < 0.9$. Assuming a power-law density profile with slope $γ$, we measure the total density profile for the deflector galaxies via lens modeling. We also measure the stellar velocity dispersions ($σ_{\rm obs}$) for four lenses and obtain $σ_{\rm obs}$ from SDSS-BOSS for the remaining lenses to test our lens models by comparing observed and model-predicted velocity dispersions. For the seven AGEL lenses, we measure an average density profile slope of $-1.95 \pm 0.09$ and a $γ$--$z$ relation that does not evolve with redshift at $z<1$. Although our result is consistent with some observations and simulations, it differs from other studies at $z<1$ that suggest the $γ$--$z$ relation evolves with redshift. The apparent conflicts among observations and simulations may be due to a combination of 1) systematics in the lensing and dynamical modeling; 2) challenges in comparing observations with simulations; and 3) assuming a simple power-law for the total mass distribution. By providing more lenses at $z_{\rm defl} > 0.5$, the AGEL survey will provide stronger constraints on whether the mass profiles evolve with redshift as predicted by current theoretical models.

Submitted 21 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted for publication in the Astrophysical Journal (ApJ) 24 Pages, 8 Figures, 3 Tables
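
A worked example may help with the comparison of observed and model-predicted velocity dispersions. The sketch below covers only the singular isothermal sphere (SIS), the special case of a power-law profile whose density falls as r^-2, which the measured average slope of -1.95 is close to; it is not the GLEE power-law modeling used in the paper. For an SIS, theta_E = 4*pi*(sigma/c)^2 * D_ds/D_s, where D_ds and D_s are angular-diameter distances. The function names and the example distance ratio are illustrative assumptions.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s
ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)


def sis_velocity_dispersion(theta_e_arcsec: float, dds_over_ds: float) -> float:
    """Velocity dispersion (km/s) of a singular isothermal sphere.

    Inverts theta_E = 4*pi*(sigma/c)**2 * D_ds/D_s, where D_ds and D_s are the
    deflector-source and observer-source angular-diameter distances.
    """
    theta_e_rad = theta_e_arcsec * ARCSEC_TO_RAD
    return C_KM_S * math.sqrt(theta_e_rad / (4.0 * math.pi * dds_over_ds))


def sis_einstein_radius(sigma_km_s: float, dds_over_ds: float) -> float:
    """Einstein radius (arcsec) predicted for a given SIS velocity dispersion."""
    theta_e_rad = 4.0 * math.pi * (sigma_km_s / C_KM_S) ** 2 * dds_over_ds
    return theta_e_rad / ARCSEC_TO_RAD


# Hypothetical inputs: a 1'' Einstein radius with D_ds/D_s = 0.5.
print(round(sis_velocity_dispersion(1.0, 0.5), 1))
```

With these hypothetical inputs the sketch gives roughly 263 km/s; a steeper or shallower power-law slope changes the mapping between Einstein radius and dispersion, which is why the paper models the slope explicitly.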

arXiv:2405.13010 [pdf, other]

UCCIX: Irish-eXcellence Large Language Model

Authors: Khanh-Tung Tran, Barry O'Sullivan, Hoang D. Nguyen

Abstract: The development of Large Language Models (LLMs) has predominantly focused on high-resource languages, leaving extremely low-resource languages like Irish with limited representation. This work presents UCCIX, a pioneering effort on the development of an open-source Irish-based LLM. We propose a novel framework for continued pre-training of LLMs specifically adapted for extremely low-resource languages, requiring only a fraction of the textual data typically needed for training LLMs according to scaling laws. Our model, based on Llama 2-13B, outperforms much larger models on Irish language tasks with up to 12% performance improvement, showcasing the effectiveness and efficiency of our approach. We also contribute comprehensive Irish benchmarking datasets, including IrishQA, a question-answering dataset, and an Irish version of MT-bench. These datasets enable rigorous evaluation and facilitate future research in Irish LLM systems. Our work aims to preserve and promote the Irish language, knowledge, and culture of Ireland in the digital era while providing a framework for adapting LLMs to other indigenous languages.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2404.17570 [pdf, other]

A manufacturable platform for photonic quantum computing

Authors: Koen Alexander, Andrea Bahgat, Avishai Benyamini, Dylan Black, Damien Bonneau, Stanley Burgos, Ben Burridge, Geoff Campbell, Gabriel Catalano, Alex Ceballos, Chia-Ming Chang, CJ Chung, Fariba Danesh, Tom Dauer, Michael Davis, Eric Dudley, Ping Er-Xuan, Josep Fargas, Alessandro Farsi, Colleen Fenrich, Jonathan Frazer, Masaya Fukami, Yogeeswaran Ganesan, Gary Gibson, Mercedes Gimeno-Segovia, et al. (70 additional authors not shown)

Abstract: Whilst holding great promise for low noise, ease of operation and networking, useful photonic quantum computing has been precluded by the need for beyond-state-of-the-art components, manufactured by the millions. Here we introduce a manufacturable platform for quantum computing with photons. We benchmark a set of monolithically-integrated silicon photonics-based modules to generate, manipulate, network, and detect photonic qubits, demonstrating dual-rail photonic qubits with $99.98\% \pm 0.01\%$ state preparation and measurement fidelity, Hong-Ou-Mandel quantum interference between independent photon sources with $99.50\%\pm0.25\%$ visibility, two-qubit fusion with $99.22\%\pm0.12\%$ fidelity, and a chip-to-chip qubit interconnect with $99.72\%\pm0.04\%$ fidelity, not accounting for loss. In addition, we preview a selection of next generation technologies, demonstrating low-loss silicon nitride waveguides and components, fabrication-tolerant photon sources, high-efficiency photon-number-resolving detectors, low-loss chip-to-fiber coupling, and barium titanate electro-optic phase shifters.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 8 pages, 5 figures

arXiv:2404.09951 [pdf, other]

Unifying Global and Local Scene Entities Modelling for Precise Action Spotting

Authors: Kim Hoang Tran, Phuc Vuong Do, Ngoc Quoc Ly, Ngan Le

Abstract: Sports videos pose complex challenges, including cluttered backgrounds, camera angle changes, small action-representing objects, and imbalanced action class distribution. Existing methods for detecting actions in sports videos rely heavily on global features, utilizing a backbone network as a black box that encompasses the entire spatial frame. However, these approaches tend to overlook the nuances of the scene and struggle with detecting actions that occupy a small portion of the frame. In particular, they face difficulties when dealing with action classes involving small objects, such as balls or yellow/red cards in soccer, which only occupy a fraction of the screen space. To address these challenges, we introduce a novel approach that analyzes and models scene entities using an adaptive attention mechanism. Specifically, our model disentangles the scene content into the global environment feature and local relevant scene entities feature. To efficiently extract environmental features while considering temporal information with less computational cost, we propose the use of a 2D backbone network with a time-shift mechanism. To accurately capture relevant scene entities, we employ a Vision-Language model in conjunction with the adaptive attention mechanism. Our model has demonstrated outstanding performance, securing 1st place in the SoccerNet-v2 Action Spotting, FineDiving, and FineGym challenges with substantial performance improvements of 1.6, 2.0, and 1.3 points in avg-mAP compared to the runner-up methods. Furthermore, our approach offers interpretability capabilities in contrast to other deep learning models, which are often designed as black boxes. Our code and models are released at: https://github.com/Fsoft-AIC/unifying-global-local-feature.

Submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted to IJCNN 2024
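
The abstract names a 2D backbone with a time-shift mechanism. One widely used form of this idea is the temporal shift module (TSM), which moves a fraction of the channels one frame backward or forward in time so that a 2D network mixes information across frames without extra multiplications. The sketch below shows that operation on a toy [time][channel] list; whether the paper uses this exact variant and the default 1/8 fold fraction is an assumption, and real implementations apply the shift to 4D or 5D tensors inside the backbone.

```python
from typing import List

Frames = List[List[float]]  # frames[t][c]: value of channel c at time step t


def temporal_shift(frames: Frames, fold_div: int = 8) -> Frames:
    """TSM-style shift: mix neighbouring frames by moving channels in time.

    Channels [0, fold) take their value from frame t+1, channels
    [fold, 2*fold) take it from frame t-1, and the remaining channels are
    unchanged. Positions shifted in from outside the clip are zero-filled.
    """
    num_t = len(frames)
    num_c = len(frames[0]) if num_t else 0
    fold = num_c // fold_div
    out = [[0.0] * num_c for _ in range(num_t)]
    for t in range(num_t):
        for c in range(num_c):
            if c < fold:
                out[t][c] = frames[t + 1][c] if t + 1 < num_t else 0.0
            elif c < 2 * fold:
                out[t][c] = frames[t - 1][c] if t > 0 else 0.0
            else:
                out[t][c] = frames[t][c]
    return out


# Toy clip: 3 frames, 8 channels, value = 10*t + c.
clip = [[float(10 * t + c) for c in range(8)] for t in range(3)]
for row in temporal_shift(clip, fold_div=4):
    print(row)
```

Because the shift is pure data movement, a per-frame 2D convolution applied afterwards sees features from adjacent frames, which is the low-cost temporal modeling the abstract alludes to.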

arXiv:2404.02320 [pdf, ps, other]

On Properties of Adjoint Systems for Evolutionary PDEs

Authors: Brian K. Tran, Ben S. Southworth, Melvin Leok

Abstract: We investigate the geometric structure of adjoint systems associated with evolutionary partial differential equations at the fully continuous, semi-discrete, and fully discrete levels and the relations between these levels. We show that the adjoint system associated with an evolutionary partial differential equation has an infinite-dimensional Hamiltonian structure, which is useful for connecting the fully continuous, semi-discrete, and fully discrete levels. We subsequently address the question of discretize-then-optimize versus optimize-then-discretize for both semi-discretization and time integration, by characterizing the commutativity of discretize-then-optimize methods versus optimize-then-discretize methods uniquely in terms of an adjoint-variational quadratic conservation law. For Galerkin semi-discretizations and one-step time integration methods in particular, we explicitly construct these commuting methods by using structure-preserving discretization techniques.

Submitted 2 April, 2024; originally announced April 2024.
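
A toy scalar case illustrates the kind of quadratic conservation law the abstract refers to. For the linear ODE y' = a*y with adjoint lambda' = -a*lambda, the pairing lambda*y is constant in time. Applying the implicit midpoint rule (a Gauss Runge-Kutta method, which conserves quadratic invariants) to both equations preserves this pairing, while forward Euler does not: each Euler step multiplies it by 1 - h^2*a^2. This is only a minimal sketch with hypothetical function names; the paper treats evolutionary PDEs, Galerkin semi-discretizations, and general one-step methods.

```python
from typing import List


def adjoint_pairing(a: float, y0: float, lam0: float, h: float,
                    steps: int, method: str) -> List[float]:
    """Integrate y' = a*y and its adjoint lam' = -a*lam with one one-step method.

    Returns lam_n * y_n after each step; the exact flow keeps it constant.
    """
    if method == "midpoint":
        # Implicit midpoint on z' = b*z reduces to z_{n+1} = z_n*(1 + h*b/2)/(1 - h*b/2).
        growth_y = (1.0 + h * a / 2.0) / (1.0 - h * a / 2.0)
        growth_lam = (1.0 - h * a / 2.0) / (1.0 + h * a / 2.0)
    elif method == "euler":
        # Forward Euler: z_{n+1} = z_n*(1 + h*b).
        growth_y = 1.0 + h * a
        growth_lam = 1.0 - h * a
    else:
        raise ValueError(f"unknown method: {method}")
    y, lam = y0, lam0
    pairing = [lam * y]
    for _ in range(steps):
        y, lam = growth_y * y, growth_lam * lam
        pairing.append(lam * y)
    return pairing


mid = adjoint_pairing(-2.0, 1.0, 1.0, 0.1, 50, "midpoint")
eul = adjoint_pairing(-2.0, 1.0, 1.0, 0.1, 50, "euler")
print(mid[-1], eul[-1])  # midpoint stays at 1 up to rounding; Euler decays like 0.96**n
```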

arXiv:2403.13533 [pdf, ps, other]

On Sums of Practical Numbers and Polygonal Numbers

Authors: Sai Teja Somu, Duc Van Khanh Tran

Abstract: Practical numbers are positive integers $n$ such that every positive integer less than or equal to $n$ can be written as a sum of distinct positive divisors of $n$. In this paper, we show that all positive integers can be written as a sum of a practical number and a triangular number, resolving a conjecture by Sun. We also show that all sufficiently large natural numbers can be written as a sum of a practical number and two $s$-gonal numbers.

Submitted 20 March, 2024; originally announced March 2024.

MSC Class: 11B83; 11D85; 11A99

Journal ref: Journal of Integer Sequences, 27(5), Article 24.5.1, 2024
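
The definition of a practical number translates directly into a subset-sum check over the divisors of n. The sketch below implements that check and searches for a representation n = p + T_k, where T_k = k(k+1)/2 and T_0 = 0 is counted as triangular (an assumption needed for n = 1). It is a brute-force sanity check for small n with hypothetical function names, not the method of proof.

```python
from functools import lru_cache
from typing import Optional, Tuple


@lru_cache(maxsize=None)
def is_practical(n: int) -> bool:
    """True if every m in 1..n is a sum of distinct divisors of n."""
    if n < 1:
        return False
    reachable = 1  # bit k set <=> k is a sum of distinct divisors seen so far
    for d in range(1, n + 1):
        if n % d == 0:
            reachable |= reachable << d
    mask = (1 << (n + 1)) - 2  # bits 1..n
    return (reachable & mask) == mask


def practical_plus_triangular(n: int) -> Optional[Tuple[int, int]]:
    """Return (p, T) with p practical and T triangular, p + T = n, if one exists."""
    k = 0
    while k * (k + 1) // 2 < n:
        t = k * (k + 1) // 2
        if is_practical(n - t):
            return n - t, t
        k += 1
    return None


print([n for n in range(1, 25) if is_practical(n)])
print(practical_plus_triangular(10))  # 10 itself is not practical
```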

arXiv:2403.13285 [pdf, other]

MOSEL survey: Spatially offset Lyman-continuum emission in a new emitter at z=3.088

Authors: Anshu Gupta, Cathryn M. Trott, Ravi Jaiswar, E. V. Ryan-Weber, Andrew J. Bunker, Ayan Acharyya, Alex J. Cameron, Ben Forrest, Glenn G. Kacprzak, Themiya Nanayakkara, Kim-Vy Tran, Aman Chokshi

Abstract: We present the discovery of a unique Lyman-continuum (LyC) emitter at z=3.088. The LyC emission was detected using the Hubble Space Telescope (HST) WFC3/UVIS F336W filter, covering a rest-frame wavelength range of 760-900 Angstrom. The peak signal-to-noise ratio (SNR) of LyC emission is 3.9 in an r=0.24'' aperture, and the emission is spatially offset by 0.29''+/-0.04'' (~ 2.2+/-0.3 kpc) from the rest-UV emission peak (F606W). By combining imaging and spectroscopic data from the James Webb Space Telescope (JWST) JADES, FRESCO and JEMS surveys, along with VLT/MUSE data from the MXDF survey, we estimate that the probability of random alignment with an interloper galaxy causing the LyC emission is less than 6x10^-5. The interstellar medium (ISM) conditions in the galaxy are similar to those of other LyC emitters at high redshift (12+log(O/H)=7.79+/-0.06, log U = -3.27+/-0.14, O32 = 3.65+/-0.22), although the single-peaked Lyman-alpha profile and lack of rest-UV emission lines suggest an optically thick ISM. We think that LyC photons are leaking through a narrow cone of optically thin neutral ISM, most likely created by a past merger (as evidenced by medium-band F210M and F182M images). Using the escape fraction constraints from individual leakers and a simple model, we estimate that the opening half-angle of ionization cones can be as low as 16 degrees (2% ionised fraction) to reproduce some of the theoretical constraints on the average escape fraction for galaxies. The narrow opening angle required can explain the low number density of confirmed LyC leakers.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 12 pages, 8 figures, and 1 table. Submitted to The Astrophysical Journal
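
The quoted offset of 0.29'' ~ 2.2 kpc and the 760-900 Angstrom rest-frame coverage follow from standard redshift arithmetic. The sketch below assumes a flat LambdaCDM cosmology with H0 = 70 km/s/Mpc and Omega_m = 0.3; the abstract does not state the cosmology the authors adopted. It integrates the comoving distance with Simpson's rule, converts it to an angular-diameter distance, and divides wavelengths by (1 + z). Function names are illustrative.

```python
import math

C_KM_S = 299_792.458  # speed of light, km/s
ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)


def comoving_distance_mpc(z: float, h0: float = 70.0, omega_m: float = 0.3,
                          n: int = 2000) -> float:
    """Line-of-sight comoving distance (Mpc) in flat LambdaCDM.

    Uses composite Simpson's rule on c/H0 * integral_0^z dz'/E(z'); n must be even.
    """
    omega_l = 1.0 - omega_m

    def inv_e(zp: float) -> float:
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    dz = z / n
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * dz)
    return (C_KM_S / h0) * total * dz / 3.0


def kpc_per_arcsec(z: float, h0: float = 70.0, omega_m: float = 0.3) -> float:
    """Proper transverse scale: angular-diameter distance times one arcsecond."""
    d_a_mpc = comoving_distance_mpc(z, h0, omega_m) / (1.0 + z)
    return d_a_mpc * 1000.0 * ARCSEC_TO_RAD


def rest_frame(wavelength_obs: float, z: float) -> float:
    """Rest-frame wavelength for an observed wavelength at redshift z."""
    return wavelength_obs / (1.0 + z)


z = 3.088
print(f"offset: {0.29 * kpc_per_arcsec(z):.2f} kpc")
print(f"observed band for 760-900 A rest: {760 * (1 + z):.0f}-{900 * (1 + z):.0f} A")
```

With these assumed parameters the scale is about 7.6 kpc per arcsec, so a 0.29'' offset corresponds to roughly 2.2 kpc, consistent with the value in the abstract.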

arXiv:2403.06364 [pdf, other]

MP2-based composite extrapolation schemes can predict core-ionization energies for first-row elements with coupled-cluster level accuracy

Authors: Anton Morgunov, Henry K. Tran, Oinam Romesh Meitei, Yu-Che Chien, Troy Van Voorhis

Abstract: X-ray photoelectron spectroscopy (XPS) measures core-electron binding energies (CEBEs) to reveal element-specific insights into chemical environment and bonding. Accurate theoretical CEBE prediction aids XPS interpretation but requires proper modeling of orbital relaxation and electron correlation upon core-ionization. This work systematically investigates basis set selection for extrapolation to the complete basis set (CBS) limit of CEBEs from $Δ$MP2 and $Δ$CC energies across 94 K-edges in diverse organic molecules. We demonstrate that an alternative composite scheme using $Δ$MP2 in a large basis corrected by $Δ$CC-$Δ$MP2 difference in a small basis can quantitatively recover optimally extrapolated $Δ$CC CEBEs within 0.02 eV. Unlike $Δ$CC, MP2 calculations do not suffer from convergence issues and are computationally cheaper, and, thus, the composite $Δ$MP2/$Δ$CC scheme balances accuracy and cost, overcoming limitations of solely using either method. We conclude by providing a comprehensive analysis of the choice of small and large basis sets for the composite schemes and provide practical recommendations for highly accurate (within 0.10-0.15 eV MAE) ab initio prediction of XPS spectra.

Submitted 10 March, 2024; originally announced March 2024.
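
The composite scheme in the abstract combines a large-basis ΔMP2 value with a small-basis correction, E ≈ ΔMP2(large) + [ΔCC(small) - ΔMP2(small)]. The sketch below writes that out alongside a generic two-point CBS extrapolation assuming E(X) = E_CBS + A*X^-3, a widely used form for correlation energies. Which extrapolation formulas and basis sets the paper recommends is not stated in the abstract, the function names are hypothetical, and the numbers are synthetic.

```python
def composite_cebe(mp2_large: float, cc_small: float, mp2_small: float) -> float:
    """Composite estimate: large-basis ΔMP2 plus the small-basis (ΔCC - ΔMP2) gap."""
    return mp2_large + (cc_small - mp2_small)


def two_point_cbs(e_x: float, e_y: float, x: int, y: int, power: float = 3.0) -> float:
    """Two-point CBS limit assuming E(X) = E_CBS + A * X**(-power).

    x and y are the basis-set cardinal numbers belonging to e_x and e_y.
    Eliminating A from the two equations gives the closed form below.
    """
    return (x**power * e_x - y**power * e_y) / (x**power - y**power)


# Synthetic illustration (eV): values built from E_CBS = 290.0 and A = 5.0.
print(two_point_cbs(290.0 + 5.0 / 3**3, 290.0 + 5.0 / 4**3, 3, 4))
print(composite_cebe(291.0, 290.5, 290.9))
```

The composite form only needs the expensive ΔCC step in the small basis, which matches the abstract's point that it balances accuracy and cost.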

arXiv:2403.01339 [pdf, ps, other]

Uniform $\mathcal{C}^k$ Approximation of $G$-Invariant and Antisymmetric Functions, Embedding Dimensions, and Polynomial Representations

Authors: Soumya Ganguly, Khoa Tran, Rahul Sarkar

Abstract: For any subgroup $G$ of the symmetric group $\mathcal{S}_n$ on $n$ symbols, we present results for the uniform $\mathcal{C}^k$ approximation of $G$-invariant functions by $G$-invariant polynomials. For the case of totally symmetric functions ($G = \mathcal{S}_n$), we show that this gives rise to the sum-decomposition Deep Sets ansatz of Zaheer et al. (2018), where both the inner and outer functions can be chosen to be smooth, and moreover, the inner function can be chosen to be independent of the target function being approximated. In particular, we show that the embedding dimension required is independent of the regularity of the target function, the accuracy of the desired approximation, as well as $k$. Next, we show that a similar procedure allows us to obtain a uniform $\mathcal{C}^k$ approximation of antisymmetric functions as a sum of $K$ terms, where each term is a product of a smooth totally symmetric function and a smooth antisymmetric homogeneous polynomial of degree at most $\binom{n}{2}$. We also provide upper and lower bounds on $K$ and show that $K$ is independent of the regularity of the target function, the desired approximation accuracy, and $k$.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 38 pages

MSC Class: 05E10 ACM Class: I.2.4; I.2.6; I.2.0
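
The sum-decomposition ansatz mentioned in the abstract has the form f(x_1, ..., x_n) = rho(sum_i phi(x_i)), which is permutation invariant by construction. A classical choice of inner map is the power-sum embedding phi(x) = (x, x^2, ..., x^n): over the reals, the pooled power sums p_1, ..., p_n determine every symmetric polynomial in n variables via Newton's identities, and phi does not depend on the target function. The sketch below uses that choice with a hand-written rho for the elementary symmetric polynomial e_2; it illustrates the ansatz only, not the smooth constructions or dimension bounds proved in the paper, and all names are hypothetical.

```python
from itertools import permutations
from typing import Callable, List, Sequence


def phi(x: float, n: int) -> List[float]:
    """Inner map: power-sum features (x, x**2, ..., x**n)."""
    return [x ** k for k in range(1, n + 1)]


def deep_sets(xs: Sequence[float], rho: Callable[[List[float]], float]) -> float:
    """Sum-decomposition f(X) = rho(sum_i phi(x_i)) for a set of n scalars."""
    n = len(xs)
    pooled = [0.0] * n
    for x in xs:
        for k, value in enumerate(phi(x, n)):
            pooled[k] += value
    return rho(pooled)


def is_permutation_invariant(f: Callable[[List[float]], float],
                             xs: Sequence[float], tol: float = 1e-12) -> bool:
    """Check f on every reordering of xs (feasible only for small n)."""
    reference = f(list(xs))
    return all(abs(f(list(p)) - reference) <= tol for p in permutations(xs))


# Outer map for e2 = sum_{i<j} x_i x_j, via Newton's identity e2 = (p1**2 - p2) / 2.
def e2(p: List[float]) -> float:
    return (p[0] ** 2 - p[1]) / 2.0


print(deep_sets([1.0, 2.0, 3.0], e2))  # equals 1*2 + 1*3 + 2*3
```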

arXiv:2402.16214 [pdf, ps, other]

Products and powers of principal symmetric ideals

Authors: Eric Dannetun, Riccardo Formenti, Bo Y. Gao, Juliann Geraci, Ross Kogel, Yuelin Li, Shreya Mandal, Vinuge Rupasinghe, Alexandra Seceleanu, Duc Van Khank Tran, Noah Walker

Abstract: Principal symmetric ideals were recently introduced by Harada, Seceleanu, and Sega, with a focus on their homological properties. They are ideals generated by the orbit of a single polynomial under permutations of variables in a polynomial ring. In this paper we seek to determine when a product of two principal symmetric ideals is principal symmetric and when all the powers of a principal symmetric ideal are again principal symmetric ideals. We characterize the ideals that have the latter property as being generated by polynomials invariant up to a scalar multiple under permutation of variables. Recognizing principal symmetric ideals is an open question, toward which we produce certain obstructions. We also demonstrate that the Hilbert functions of symmetric monomial ideals are not all given by symmetric monomial ideals, in contrast to the non-symmetric case.

Submitted 17 June, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

Comments: A flaw in the proof of Theorem 3.5 in version 1 = Theorem 3.6 in version 2 was remedied

MSC Class: Primary: 13A50; 13C13; Secondary: 13D40; 13F20
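
A principal symmetric ideal is generated by the orbit of one polynomial under permutations of the variables. For a single monomial, permuting variables just permutes its exponent vector, so the generators can be listed as the distinct rearrangements of that vector. The sketch below handles only that monomial case; the function names are hypothetical, and a general polynomial would require permuting each of its terms.

```python
from itertools import permutations
from typing import Set, Tuple

Exponents = Tuple[int, ...]  # (a1, ..., an) encodes x1^a1 * ... * xn^an


def monomial_orbit(exponents: Exponents) -> Set[Exponents]:
    """S_n-orbit of a monomial: all distinct rearrangements of its exponent vector."""
    return set(permutations(exponents))


def format_monomial(exponents: Exponents) -> str:
    """Render an exponent vector such as (2, 1, 0) as 'x1^2*x2'."""
    factors = [f"x{i + 1}^{a}" if a > 1 else f"x{i + 1}"
               for i, a in enumerate(exponents) if a > 0]
    return "*".join(factors) or "1"


# Generators of the principal symmetric ideal of x1^2*x2 in three variables.
print(sorted(format_monomial(e) for e in monomial_orbit((2, 1, 0))))
```

A monomial whose exponents are all equal, such as x1*x2*x3, has an orbit of size one, matching the abstract's characterization of generators that are invariant (up to a scalar) under permutation.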

arXiv:2402.11041 [pdf, other]

doi 10.37190/e-inf220103

How good are my search strings? Reflections on using an existing review as a quasi-gold standard

Authors: Huynh Khanh Vi Tran, Jürgen Börstler, Nauman Bin Ali, Michael Unterkalmsteiner

Abstract: Background: Systematic literature studies (SLS) have become a core research methodology in Evidence-based Software Engineering (EBSE). Search completeness, i.e., finding all relevant papers on the topic of interest, has been recognized as one of the most commonly discussed validity issues of SLSs. Aim: This study aims to raise awareness of the issues related to search string construction and of search validation using a quasi-gold standard (QGS). Furthermore, we aim to provide guidelines for search string validation. Method: We use a recently completed tertiary study as a case and complement our findings with the observations from other researchers studying and advancing EBSE. Results: We found that the issue of assessing QGS quality has not received much attention in the literature, and the validation of automated searches in SLSs could be improved. Hence, we propose to extend the current search validation approach with an additional step that analyzes the automated search validation results, and we provide recommendations for the QGS construction. Conclusion: In this paper, we report on new issues which could affect search completeness in SLSs. Furthermore, the proposed guideline and recommendations could help researchers implement a more reliable search strategy in their SLSs.

Submitted 16 February, 2024; originally announced February 2024.

Journal ref: e Informatica Softw. Eng. J. 16(1) (2022)
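
Search validation with a quasi-gold standard is commonly summarized by quasi-sensitivity: the fraction of QGS papers that the automated search actually retrieves. The sketch below computes it from sets of paper identifiers and lists the QGS papers the search missed, which are the natural input to the extra analysis step the abstract proposes. Identifiers and function names are hypothetical, and any acceptance threshold is left to the reviewer.

```python
from typing import Iterable, Set


def quasi_sensitivity(retrieved: Iterable[str], qgs: Iterable[str]) -> float:
    """Fraction of quasi-gold-standard papers found by the automated search."""
    qgs_set = set(qgs)
    if not qgs_set:
        raise ValueError("the quasi-gold standard must not be empty")
    return len(qgs_set & set(retrieved)) / len(qgs_set)


def missed_qgs_papers(retrieved: Iterable[str], qgs: Iterable[str]) -> Set[str]:
    """QGS papers the search did not return, i.e. candidates for closer analysis."""
    return set(qgs) - set(retrieved)


# Hypothetical identifiers for a search result set and a QGS.
search_hits = {"p1", "p2", "p3", "x9"}
gold = {"p1", "p2", "p3", "p4"}
print(quasi_sensitivity(search_hits, gold), missed_qgs_papers(search_hits, gold))
```

A high score says little if the QGS itself is small or biased, which is why the abstract stresses assessing QGS quality alongside the metric.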

arXiv:2402.09541 [pdf, other]

doi 10.1016/j.infsof.2021.106620

Assessing test artifact quality -- A tertiary study

Authors: Huynh Khanh Vi Tran, Michael Unterkalmsteiner, Jürgen Börstler, Nauman bin Ali

Abstract: Context: Modern software development increasingly relies on software testing for an ever more frequent delivery of high quality software. This puts high demands on the quality of the central artifacts in software testing: test suites and test cases. Objective: We aim to develop a comprehensive model for capturing the dimensions of test case/suite quality, which are relevant for a variety of perspectives. Method: We have carried out a systematic literature review to identify and analyze existing secondary studies on quality aspects of software testing artifacts. Results: We identified 49 relevant secondary studies. Of these 49 studies, fewer than half did some form of quality appraisal of the included primary studies and only 3 took into account the quality of the primary study when synthesizing the results. We present an aggregation of the context dimensions and factors that can be used to characterize the environment in which the test case/suite quality is investigated. We also provide a comprehensive model of test case/suite quality with definitions for the quality attributes and measurements based on findings in the literature and ISO/IEC 25010:2011. Conclusion: The test artifact quality model presented in the paper can be used to support test artifact quality assessment and improvement initiatives in practice. Furthermore, the model can also be used as a framework for documenting context characteristics to make research results more accessible for research and practice.

Submitted 14 February, 2024; originally announced February 2024.

Journal ref: Information and Software Technology 139 (2021): 106620

arXiv:2402.09285 [pdf, other]

GraphiQ: Quantum circuit design for photonic graph states

Authors: Jie Lin, Benjamin MacLellan, Sobhan Ghanbari, Julie Belleville, Khuong Tran, Luc Robichaud, Roger G. Melko, Hoi-Kwong Lo, Piotr Roztocki

Abstract: GraphiQ is a versatile open-source framework for designing photonic graph state generation schemes, with a particular emphasis on photon-emitter hybrid circuits. Built in Python, GraphiQ consists of a suite of design tools, including multiple simulation backends and optimization methods. The library supports scheme optimization in the presence of circuit imperfections, as well as user-defined optimization goals. Our framework thus represents a valuable tool for the development of practical schemes adhering to experimentally-relevant constraints. As graph states are a key resource for measurement-based quantum computing, all-photonic quantum repeaters, and robust quantum metrology, among others, we envision GraphiQ's broad impact for advancing quantum technologies.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 8+10 pages, 4+5 figures
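
Graph states, the target of GraphiQ's generation schemes, have a compact textbook description: for a graph with vertex set V, the state is stabilized by K_a = X_a * prod_{b in N(a)} Z_b for every vertex a, where N(a) is the neighbourhood of a. The sketch below lists these generators as Pauli strings for a simple undirected graph (no self-loops). It is a standalone illustration of the definition with a hypothetical function name, not GraphiQ's API.

```python
from typing import Dict, List, Set


def graph_state_stabilizers(adjacency: Dict[int, Set[int]]) -> List[str]:
    """Stabilizer generators K_a = X_a * prod_{b in N(a)} Z_b as Pauli strings.

    Qubits are ordered by sorted vertex label; the graph must be simple and
    undirected (b in adjacency[a] iff a in adjacency[b]).
    """
    nodes = sorted(adjacency)
    index = {v: i for i, v in enumerate(nodes)}
    generators = []
    for a in nodes:
        paulis = ["I"] * len(nodes)
        paulis[index[a]] = "X"
        for b in adjacency[a]:
            paulis[index[b]] = "Z"
        generators.append("".join(paulis))
    return generators


# Three-qubit linear cluster state 0 - 1 - 2.
print(graph_state_stabilizers({0: {1}, 1: {0, 2}, 2: {1}}))
```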

arXiv:2402.00942 [pdf, other]

Spatially Resolved Galactic Winds at Cosmic Noon: Outflow Kinematics and Mass Loading in a Lensed Star-Forming Galaxy at $z=1.87$

Authors: Keerthi Vasan G. C., Tucker Jones, Anowar J. Shajib, Sunny Rhoades, Yuguang Chen, Ryan L. Sanders, Daniel P. Stark, Richard S. Ellis, Nicha Leethochawalit, Glenn G. Kacprzak, Tania M. Barone, Karl Glazebrook, Kim-Vy H. Tran, Hannah Skobe, Kris Mortensen, Ivana Barisic

Abstract: We study the spatially resolved outflow properties of CSWA13, an intermediate-mass ($M_*=10^{9}~\mathrm{M}_{\odot}$), gravitationally lensed star-forming galaxy at $z=1.87$. We use Keck/KCWI to map outflows in multiple rest-frame ultraviolet ISM absorption lines, along with fluorescent Si II$^*$ emission, and nebular emission from C III] tracing the local systemic velocity. The spatial structure of outflow velocity mirrors that of the nebular kinematics, which we interpret to be a signature of a young galactic wind that is pressurizing the ISM of the galaxy but has yet to burst out. From the radial extent of Si II$^*$ emission, we estimate that the outflow is largely encapsulated within $3.5$ kpc. We explore the geometry (e.g., patchiness) of the outflow by measuring the covering fraction at different velocities, finding that the maximum covering fraction is at velocities $v\simeq-150$ km$\,$s$^{-1}$. Using the outflow velocity ($v_{out}$), radius ($R$), column density ($N$), and solid angle ($\Omega$) based on the covering fraction, we measure the mass loss rate $\log\dot{m}_{out}/(\mathrm{M}_{\odot}\,\mathrm{yr}^{-1}) = 1.73\pm0.23$ and mass loading factor $\log\eta = 0.04\pm0.34$ for the low-ionization outflowing gas in this galaxy. These values are relatively large, and the bulk of the outflowing gas is moving with speeds below the escape velocity of the galaxy halo, suggesting that the majority of outflowing mass will remain in the circumgalactic medium and/or recycle back into the galaxy.
The results support a picture of high outflow rates transporting mass and metals into the inner circumgalactic medium, providing the gas reservoir for future star formation.
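As a rough illustration of how these quantities combine, a commonly used thin-shell approximation (an assumption here; the paper's exact prescription may include additional factors) writes the outflow rate as $\dot{m}_{out} \approx \Omega\,\mu\,m_p\,N\,R\,v_{out}$, with $N$ the hydrogen column density and $\mu \approx 1.4$ accounting for helium, and the mass loading factor as $\eta = \dot{m}_{out}/\mathrm{SFR}$. The helper names below are hypothetical, and the inputs in the usage note are illustrative rather than CSWA13 measurements.

```python
import math

# Physical constants in CGS units
M_P_G = 1.6726e-24   # proton mass [g]
KPC_CM = 3.0857e21   # 1 kpc [cm]
KM_CM = 1.0e5        # 1 km [cm]
MSUN_G = 1.989e33    # solar mass [g]
YR_S = 3.156e7       # 1 year [s]


def thin_shell_mdot(n_col_cm2, radius_kpc, v_kms, omega_sr, mu=1.4):
    """Outflow rate in Msun/yr under the assumed thin-shell form
    mdot = Omega * mu * m_p * N * R * |v|."""
    mdot_g_per_s = (omega_sr * mu * M_P_G * n_col_cm2
                    * (radius_kpc * KPC_CM) * (abs(v_kms) * KM_CM))
    return mdot_g_per_s * YR_S / MSUN_G


def mass_loading(mdot_msun_yr, sfr_msun_yr):
    """Mass loading factor eta = mdot_out / SFR."""
    return mdot_msun_yr / sfr_msun_yr
```

For example, $N = 10^{20}$ cm$^{-2}$, $R = 1$ kpc, $v = 100$ km s$^{-1}$ and $\Omega = 4\pi$ give roughly 1.4 M$_\odot$ yr$^{-1}$; the absolute value on $v$ lets blueshifted (negative) outflow velocities be passed directly.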

Submitted 1 February, 2024; originally announced February 2024.

Comments: Submitted to ApJ. Comments welcome (17 pages, 10 figures, 1 table)

arXiv:2401.12427

Order Conditions for Nonlinearly Partitioned Runge-Kutta Methods

Authors: Brian K. Tran, Ben S. Southworth, Tommaso Buvoli

Abstract: Recently a new class of nonlinearly partitioned Runge-Kutta (NPRK) methods was proposed for nonlinearly partitioned systems of ordinary differential equations, $y' = F(y,y)$. The target problems are those in which different scales, stiffnesses, or physics are coupled in a nonlinear way, so that the desired partition cannot be written in a classical additive or component-wise fashion. Here we use rooted-tree analysis to derive full order conditions for NPRK$_M$ methods, where $M$ denotes the number of nonlinear partitions. Due to the nonlinear coupling and the resulting mixed product differentials, the standard node-colored rooted-tree analysis used for ODE integrators does not naturally apply. Instead, we develop a new edge-colored rooted-tree framework to address the nonlinear coupling. The resulting order conditions are enumerated, given explicitly up to 4th order for $M=2$ and 3rd order for $M=3$, and related to existing order conditions of additive and partitioned RK methods.
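A minimal sketch of the simplest scheme in this family, assuming a first-order semi-implicit update $y_{n+1} = y_n + h\,F(y_n, y_{n+1})$ in which the first argument is frozen at the old step and the second is treated implicitly. The test problem $y' = -y^3$ with the hypothetical partition $F(u,v) = -u^2 v$ makes each step linear in $y_{n+1}$. Higher-order NPRK$_M$ methods add stages whose coefficients must satisfy the order conditions derived in the paper, which this sketch does not attempt.

```python
import math


def F(u, v):
    """Hypothetical nonlinear partition of y' = -y**3, so that F(y, y) = -y**3."""
    return -(u ** 2) * v


def semi_implicit_step(y, h):
    # y_new = y + h * F(y, y_new) = y - h * y**2 * y_new is linear in y_new.
    return y / (1.0 + h * y ** 2)


def integrate(y0, t_end, n_steps):
    h = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        y = semi_implicit_step(y, h)
    return y


def exact(y0, t):
    # Closed-form solution of y' = -y**3 with y(0) = y0.
    return y0 / math.sqrt(1.0 + 2.0 * y0 ** 2 * t)
```

Halving the step roughly halves the error at $t = 1$, consistent with first-order accuracy.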

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.03977

On the infinite time horizon approximation for Lévy-driven McKean-Vlasov SDEs with non-globally Lipschitz continuous and super-linearly growth drift and diffusion coefficients

Authors: Ngoc Khue Tran, Trung-Thuy Kieu, Duc-Trong Luong, Hoang-Long Ngo

Abstract: This paper studies the numerical approximation of McKean-Vlasov stochastic differential equations driven by Lévy processes. We propose a tamed-adaptive Euler-Maruyama scheme and study its strong convergence in both finite and infinite time horizons when applied to some classes of Lévy-driven McKean-Vlasov stochastic differential equations with non-globally Lipschitz continuous and super-linearly growing drift and diffusion coefficients.
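For readers unfamiliar with taming, the sketch below applies the classical tamed Euler-Maruyama drift correction, $h\,b/(1 + h\,|b|)$, to an interacting-particle approximation in which the empirical mean of the particles stands in for the law of $X_t$. It is not the paper's tamed-adaptive scheme: the fixed step size, the drift $b(x,\mu) = -x^3 + \tfrac12(\bar{x} - x)$, the constant diffusion, and the compound Poisson jumps with Gaussian sizes are all hypothetical choices.

```python
import numpy as np


def drift(x, mean_x):
    # Hypothetical drift: super-linear term (-x**3) plus a mean-field attraction.
    return -x ** 3 + 0.5 * (mean_x - x)


def tamed_em_particles(n_particles=1000, t_end=1.0, n_steps=200,
                       sigma=1.0, jump_rate=2.0, jump_std=0.5, seed=0):
    """Fixed-step tamed Euler-Maruyama for an interacting particle system with jumps."""
    rng = np.random.default_rng(seed)
    h = t_end / n_steps
    x = rng.normal(0.0, 1.0, size=n_particles)
    for _ in range(n_steps):
        # The empirical mean of the particles replaces E[X_t] in the drift.
        b = drift(x, x.mean())
        # Taming keeps each drift increment below 1 in absolute value.
        tamed = h * b / (1.0 + h * np.abs(b))
        dw = rng.normal(0.0, np.sqrt(h), size=n_particles)
        # Compound Poisson increment: k ~ Poisson(rate * h) jumps, each N(0, jump_std**2),
        # whose sum is distributed as N(0, k * jump_std**2).
        k = rng.poisson(jump_rate * h, size=n_particles)
        dj = jump_std * np.sqrt(k) * rng.normal(0.0, 1.0, size=n_particles)
        x = x + tamed + sigma * dw + dj
    return x
```

Without taming, the $-x^3$ term can make a plain Euler-Maruyama step overshoot for large $|x|$; the tamed increment stays bounded while agreeing with $h\,b$ to first order as $h \to 0$.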

Submitted 8 January, 2024; originally announced January 2024.

Comments: 40 pages, 1 figure

MSC Class: 60H35; 60H10

arXiv:2312.15576

Reducing LLM Hallucinations using Epistemic Neural Networks

Authors: Shreyas Verma, Kien Tran, Yusuf Ali, Guangyu Min

Abstract: Reducing and detecting hallucinations in large language models is an open research problem. In this project, we attempt to leverage recent advances in uncertainty estimation to reduce hallucinations in frozen large language models. Epistemic neural networks (ENNs) have recently been proposed to improve output joint distributions for large pre-trained models. ENNs are small networks attached to large, frozen models to improve the model's joint distributions and uncertainty estimates. In this work, we train an epistemic neural network on top of the Llama-2 7B model combined with a contrastive decoding feature enhancement technique. We are the first to train an ENN for the next token prediction task and explore the efficacy of this method in reducing hallucinations on the TruthfulQA dataset. In essence, we provide a method that leverages a pre-trained model's latent embeddings to reduce hallucinations.
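To make the ENN idea concrete, here is a toy, linear sketch in the spirit of an epinet (our simplification, not the authors' architecture): a small head reads frozen features $\phi(x)$ and a random epistemic index $z$, and adds index-dependent corrections to the frozen model's logits. A fixed random "prior" head makes outputs vary with $z$, and the spread across sampled indices serves as an epistemic-uncertainty signal. All class and function names are hypothetical, and the learnable head is left untrained.

```python
import numpy as np


class ToyEpinet:
    """Linear toy epinet: frozen features + epistemic index z -> logit corrections."""

    def __init__(self, feat_dim, n_classes, index_dim=8, prior_scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.c, self.dz = n_classes, index_dim
        # Learnable head (would be trained on top of the frozen model; left at zero here).
        self.w_learn = np.zeros((n_classes * index_dim, feat_dim))
        # Fixed random prior head: never trained, so outputs differ across indices z.
        self.w_prior = (prior_scale * rng.normal(size=(n_classes * index_dim, feat_dim))
                        / np.sqrt(feat_dim))

    def __call__(self, base_logits, features, z):
        # Each head maps features to a (classes x index_dim) matrix, then contracts with z.
        learn = (self.w_learn @ features).reshape(self.c, self.dz) @ z
        prior = (self.w_prior @ features).reshape(self.c, self.dz) @ z
        return base_logits + learn + prior


def epistemic_samples(net, base_logits, features, n_samples=64, seed=1):
    """Evaluate the network under many sampled indices z ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    zs = rng.normal(size=(n_samples, net.dz))
    return np.stack([net(base_logits, features, z) for z in zs])
```

Here `samples.std(axis=0)` gives a per-class spread; in a language-model setting the classes would be vocabulary tokens and the features would come from the frozen model's hidden states.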

Submitted 24 December, 2023; originally announced December 2023.

Comments: 12 pages, 9 figures, 4 tables

arXiv:2311.03527

Type II Hamiltonian Lie Group Variational Integrators with Applications to Geometric Adjoint Sensitivity Analysis

Authors: Brian K. Tran, Melvin Leok

Abstract: Variational integrators for Euler--Lagrange equations and Hamilton's equations are a class of structure-preserving numerical methods that respect the conservative properties of such systems. Lie group variational integrators are a particular class of these integrators that apply to systems which evolve over the tangent bundle and cotangent bundle of Lie groups. Traditionally, these are constructed from a variational principle which assumes fixed position endpoints. In this paper, we instead construct Lie group variational integrators with a novel Type II variational principle on the cotangent bundle of a Lie group which allows for Type II boundary conditions, i.e., fixed initial position and final momenta; these boundary conditions are particularly important for adjoint sensitivity analysis, which is the motivating application in our paper. In general, such Type II variational principles are only globally defined on vector spaces or locally defined on general manifolds; however, by left translation, we are able to define this variational principle globally on cotangent bundles of Lie groups. By developing the continuous and discrete Type II variational principles over Lie groups, we construct a structure-preserving Lie group variational integrator that is both symplectic and momentum-preserving. Subsequently, we introduce adjoint systems on Lie groups, and show how these adjoint systems can be used to perform geometric adjoint sensitivity analysis for optimization problems on Lie groups.
Finally, we conclude with two numerical examples to show how adjoint sensitivity analysis can be used to solve initial-value optimization problems and optimal control problems on Lie groups.
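The role of adjoints in sensitivity analysis can be seen on a much simpler case than the paper's: a vector space rather than a Lie group, and symplectic Euler rather than a Type II Lie group variational integrator (both simplifications are ours). For the harmonic oscillator $q' = p$, $p' = -q$, each step is a linear map $J$ with $\det J = 1$. The discrete adjoint runs backward from the terminal condition $\lambda_N = \partial C/\partial x_N$ via $\lambda_n = J^\top \lambda_{n+1}$, and $\lambda_0$ is the gradient of $C = \tfrac12 q_N^2$ with respect to the initial state. Function names are hypothetical.

```python
import numpy as np


def step_jacobian(h):
    # Symplectic Euler for q' = p, p' = -q with state (q, p):
    # p_new = p - h * q ; q_new = q + h * p_new  =>  one linear map J per step.
    return np.array([[1.0 - h * h, h],
                     [-h, 1.0]])


def forward(x0, h, n):
    J = step_jacobian(h)
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n):
        xs.append(J @ xs[-1])
    return xs


def adjoint_gradient(x0, h, n):
    """Gradient of C = 0.5 * q_N**2 w.r.t. x0 via a backward (adjoint) sweep."""
    J = step_jacobian(h)
    x_final = forward(x0, h, n)[-1]
    lam = np.array([x_final[0], 0.0])   # terminal condition dC/dx_N
    for _ in range(n):
        lam = J.T @ lam                 # lambda_n = J^T lambda_{n+1}
    return lam
```

The backward sweep costs one transposed step per forward step and agrees with a central finite-difference gradient of $C$.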

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.02158

doi 10.3847/2041-8213/ad0788

MOSEL survey: JWST reveals major mergers/strong interactions drive the extreme emission lines in the early universe

Authors: Anshu Gupta, Ravi Jaiswar, Vicente Rodriguez-Gomez, Ben Forrest, Kim-Vy Tran, Themiya Nanayakkara, Anishya Harshan, Elisabete da Cunha, Glenn G. Kacprzak, Michaela Hirschmann

Abstract: Extreme emission line galaxies (EELGs), where nebular emissions contribute 30-40% of the flux in certain photometric bands, are ubiquitous in the early universe (z>6). We utilise deep NIRCam imaging from the JWST Advanced Deep Extragalactic Survey (JADES) to investigate the properties of companion galaxies (projected distance <40 kpc, |dv|<10,000 km/s) around EELGs at z~3. Tests with the TNG100 simulation reveal that nearly all galaxies at z=3 will merge by z=0 with at least one companion galaxy selected using similar parameters. The median mass ratio of the most massive companion and the total mass ratio of all companions around EELGs are more than 10 times higher than in the control sample. Even compared with control samples matched in stellar mass, and in stellar mass plus specific SFR, EELGs have three-to-five times higher mass ratios of the brightest companion and total mass ratios of all companions. Our measurements suggest that EELGs are more likely to be experiencing strong interactions or undergoing major mergers, irrespective of their stellar mass or specific SFR. We suspect that gas cooling induced by strong interactions and/or major mergers could be triggering the extreme emission lines, and that the increased merger rate might be responsible for the over-abundance of EELGs at z>6.
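The companion selection and the two mass-ratio statistics can be sketched as follows, assuming projected separations already in physical kpc, line-of-sight velocity offsets computed as $\Delta v = c\,(z_c - z_h)/(1 + z_h)$, and mass ratios defined as companion stellar mass divided by host stellar mass. The `Galaxy` record and the function names are hypothetical; only the 40 kpc and 10,000 km/s cuts come from the abstract.

```python
import math
from dataclasses import dataclass

C_KMS = 299792.458  # speed of light [km/s]


@dataclass
class Galaxy:
    x_kpc: float  # projected position, assumed already in physical kpc
    y_kpc: float
    z: float      # redshift
    mass: float   # stellar mass [Msun]


def companions(host, others, r_max_kpc=40.0, dv_max_kms=10_000.0):
    """Galaxies within the projected-distance and velocity-offset cuts."""
    selected = []
    for g in others:
        r = math.hypot(g.x_kpc - host.x_kpc, g.y_kpc - host.y_kpc)
        dv = C_KMS * (g.z - host.z) / (1.0 + host.z)  # line-of-sight offset
        if r < r_max_kpc and abs(dv) < dv_max_kms:
            selected.append(g)
    return selected


def mass_ratios(host, comps):
    """(most massive companion / host, sum of companions / host)."""
    if not comps:
        return 0.0, 0.0
    most_massive = max(c.mass for c in comps) / host.mass
    total = sum(c.mass for c in comps) / host.mass
    return most_massive, total
```

Running the same two functions on EELGs and on a mass-matched control sample, then comparing medians, mirrors the comparison described in the abstract.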

Submitted 3 November, 2023; originally announced November 2023.

Comments: 7 pages, 5 figures, Accepted for publication in the Astrophysical Journal Letters

Journal ref: The Astrophysical Journal Letters, 2023

arXiv:2310.18046

ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese

Authors: Khiem Vinh Tran, Hao Phu Phan, Kiet Van Nguyen, Ngan Luu Thuy Nguyen

Abstract: In recent years, Visual Question Answering (VQA) has gained significant attention for its diverse applications, including intelligent car assistance, aiding visually impaired individuals, and document image information retrieval using natural language queries. VQA requires effective integration of information from questions and images to generate accurate answers. Neural models for VQA have made remarkable progress on large-scale datasets, with a primary focus on resource-rich languages like English. To address this gap for low-resource languages, we introduce the ViCLEVR dataset, a pioneering collection for evaluating various visual reasoning capabilities in Vietnamese while mitigating biases. The dataset comprises over 26,000 images and 30,000 question-answer pairs (QAs), each question annotated to specify the type of reasoning involved. Leveraging this dataset, we conduct a comprehensive analysis of contemporary visual reasoning systems, offering valuable insights into their strengths and limitations. Furthermore, we present PhoVIT, a comprehensive multimodal fusion model that identifies objects in images based on questions. The architecture employs transformers to enable simultaneous reasoning over textual and visual data, merging both modalities at an early model stage. The experimental findings demonstrate that our proposed model achieves state-of-the-art performance across four evaluation metrics. The accompanying code and dataset have been made publicly accessible at https://github.com/kvt0012/ViCLEVR.
This release seeks to stimulate advancements within the research community, fostering the development of more multimodal fusion algorithms tailored to the nuances of low-resource languages such as Vietnamese.

Submitted 27 October, 2023; originally announced October 2023.

Comments: Preprint; submitted to a journal

arXiv:2310.15356

Lie Group Variational Collision Integrators for a Class of Hybrid Systems

Authors: Khoa Tran, Melvin Leok

Abstract: The problem of 3-dimensional, convex rigid-body collision over a plane is fully investigated, including bodies with sharp corners, which are handled without the need for nonsmooth convex analysis of tangent and normal cones. In particular, using nonsmooth Lagrangian mechanics, the equations of motion and jump equations are derived; these depend largely on the collision detection function. Following the variational approach, a Lie group variational collision integrator (LGVCI) is systematically derived that is symplectic, momentum-preserving, and has excellent long-time, near energy conservation. Furthermore, systems with corner impacts are resolved using $\varepsilon$-rounding on the signed distance function (SDF) of the body. Extensive numerical experiments are conducted to demonstrate the conservation properties of the LGVCI.
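The idea behind $\varepsilon$-rounding can be illustrated on a box: subtracting $\varepsilon$ from an exact signed distance function moves the zero level set outward by $\varepsilon$, so sharp corners become spherical caps with a well-defined normal. The sketch below uses the standard closed-form SDF of an axis-aligned box centred at the origin; how the paper combines the rounded SDF with its collision detection function may differ, and the function names are hypothetical.

```python
import math


def sdf_box(p, half_extents):
    """Exact signed distance from point p to an axis-aligned box centred at the origin."""
    q = [abs(pi) - hi for pi, hi in zip(p, half_extents)]
    outside = math.sqrt(sum(max(qi, 0.0) ** 2 for qi in q))  # distance when outside
    inside = min(max(q), 0.0)                                # negative depth when inside
    return outside + inside


def sdf_rounded_box(p, half_extents, eps):
    """epsilon-rounding: the surface is pushed out by eps, smoothing edges and corners."""
    return sdf_box(p, half_extents) - eps
```

To keep the rounded body the same overall size, one can shrink `half_extents` by `eps` before rounding.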

Submitted 15 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: 40 pages, 8 figures

MSC Class: 37M15; 65P10; 70F35; 70G65; 34A38; 49J52

arXiv:2310.14602

Generative Pre-trained Transformer for Vietnamese Community-based COVID-19 Question Answering

Authors: Tam Minh Vo, Khiem Vinh Tran

Abstract: Recent studies have provided empirical evidence of the wide-ranging potential of Generative Pre-trained Transformer (GPT), a pretrained language model, in the field of natural language processing. GPT has been effectively employed as a decoder within state-of-the-art (SOTA) question answering systems, yielding exceptional performance across various tasks. However, research on applying GPT to Vietnamese remains limited. This paper aims to address this gap by presenting an implementation of GPT-2 for community-based question answering focused specifically on COVID-19 related queries in Vietnamese. We introduce a novel approach by conducting a comparative analysis of different Transformers versus SOTA models on the community-based COVID-19 question answering dataset. The experimental findings demonstrate that the GPT-2 models exhibit highly promising outcomes, outperforming other SOTA models as well as previous community-based COVID-19 question answering models developed for Vietnamese.

Submitted 31 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.14549

Multimodal Graph Learning for Modeling Emerging Pandemics with Big Data

Authors: Khanh-Tung Tran, Truong Son Hy, Lili Jiang, Xuan-Son Vu

Abstract: Accurate forecasting and analysis of emerging pandemics play a crucial role in effective public health management and decision-making. Traditional approaches primarily rely on epidemiological data, overlooking other valuable sources of information that could act as sensors or indicators of pandemic patterns. In this paper, we propose a novel framework called MGL4MEP that integrates temporal graph neural networks and multi-modal data for learning and forecasting. We incorporate big data sources, including social media content, by utilizing specific pre-trained language models and discovering the underlying graph structure among users. This integration provides rich indicators of pandemic dynamics through learning with temporal graph neural networks. Extensive experiments demonstrate the effectiveness of our framework in pandemic forecasting and analysis, outperforming baseline methods across different areas, pandemic situations, and prediction horizons. The fusion of temporal graph learning and multi-modal data enables a comprehensive understanding of the pandemic landscape with less time lag, lower cost, and more potential information indicators.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.14149

Some Results on Zumkeller Numbers

Authors: Sai Teja Somu, Andrzej Kukla, Duc Van Khanh Tran

Abstract: A positive integer $n$ is said to be a Zumkeller number or an integer-perfect number if the set of its positive divisors can be partitioned into two subsets of equal sums. In this paper, we prove several results regarding Zumkeller numbers. For any positive integer $m$, we prove that there are infinitely many positive integers $n$ for which $n+1,\cdots, n+m$ are all Zumkeller numbers. Additionally, we show that every positive integer greater than $94185$ can be expressed as a sum of two Zumkeller numbers and that all sufficiently large integers can be written as a sum of a Zumkeller number and a practical number. We also show that there are infinitely many positive integers that cannot be expressed as a sum of a Zumkeller number and a square or a prime.
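The definition translates directly into a subset-sum test: $n$ is Zumkeller exactly when $\sigma(n)$, the sum of its divisors, is even and some subset of divisors sums to $\sigma(n)/2$ (the complement then has the same sum). For $n = 12$, $\sigma(12) = 28$, and $\{2, 12\}$ and $\{1, 3, 4, 6\}$ both sum to 14. The sketch below (function names hypothetical) uses a Python integer as a bitset of reachable subset sums.

```python
def divisors(n):
    """Sorted positive divisors of n by trial division up to sqrt(n)."""
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d != n // d:
                large.append(n // d)
        d += 1
    return small + large[::-1]


def is_zumkeller(n):
    """True if the divisors of n split into two subsets with equal sums."""
    divs = divisors(n)
    total = sum(divs)
    if total % 2:              # an odd divisor sum cannot be halved
        return False
    target = total // 2
    reachable = 1              # bit k set <=> some subset of divisors sums to k
    for d in divs:
        reachable |= reachable << d
    return bool((reachable >> target) & 1)
```

The numbers up to 60 that it reports are 6, 12, 20, 24, 28, 30, 40, 42, 48, 54, 56 and 60.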

Submitted 27 November, 2023; v1 submitted 21 October, 2023; originally announced October 2023.

MSC Class: 11B13; 11B25; 11P99

arXiv:2310.11477

Robust-MBFD: A Robust Deep Learning System for Motor Bearing Faults Detection Using Multiple Deep Learning Training Strategies and A Novel Double Loss Function

Authors: Khoa Tran, Lam Pham, Hai-Canh Vu

Abstract: This paper presents a comprehensive analysis of motor bearing fault detection (MBFD), the task of identifying faults in a motor bearing from its vibration. To this end, we first propose and evaluate various machine learning based systems for the MBFD task. Furthermore, we propose three deep learning based systems for the MBFD task, each of which explores one of the following training strategies: supervised learning, semi-supervised learning, and unsupervised learning. The proposed machine learning based and deep learning based systems are evaluated and compared to identify the best model for the MBFD task. We conducted extensive experiments on various benchmark datasets of motor bearing faults, including those from the American Society for Mechanical Failure Prevention Technology (MFPT), Case Western Reserve University Bearing Center (CWRU), and the Condition Monitoring of Bearing Damage in Electromechanical Drive Systems from Paderborn University (PU). The experimental results on different datasets highlight two main contributions of this study. First, we show that deep learning based systems are more effective than machine learning based systems for the MBFD task. Second, we achieve a robust and general deep learning based system with a novel loss function for the MBFD task on several benchmark datasets, demonstrating its potential for real-life MBFD applications.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.10875

Filling the Holes on 3D Heritage Object Surface based on Automatic Segmentation Algorithm

Authors: Sinh Van Nguyen, Son Thanh Le, Minh Khai Tran, Le Thanh Sach

Abstract: Reconstructing and processing 3D objects are popular activities in computer graphics, image processing and computer vision. 3D objects are processed either with geometric modeling methods, drawn from applied mathematics and computational geometry, or with machine learning algorithms based on image processing. The computation of geometrical objects includes processing curves and surfaces, subdivision, simplification, meshing, hole filling, reconstruction, and refinement of 3D surface objects on both point cloud data and triangular meshes, while the machine learning methods are developed using deep learning models. With the support of 3D laser scanning devices and Lidar techniques, the acquired data is close to the original shape of the real objects. In addition, modern photographic techniques developed in recent years help us collect data and process 3D models more precisely. This article proposes an improved method for filling holes on a 3D object surface based on automatic segmentation. Instead of filling the hole directly as existing methods do, we subdivide the hole before filling it. The hole is first detected and segmented automatically based on its local curvature. Each part of the hole is then filled to match its local curvature shape. The method works on both 3D point cloud surfaces and triangular mesh surfaces. Compared with state-of-the-art methods, our proposed method obtains higher accuracy of the reconstructed 3D objects.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 20 pages, 11 figures, 37 references

arXiv:2310.06300

An Empirically Grounded Reference Architecture for Software Supply Chain Metadata Management

Authors: Nguyen Khoi Tran, Samodha Pallewatta, M. Ali Babar

Abstract: With the rapid rise in Software Supply Chain (SSC) attacks, organisations need thorough and trustworthy visibility over the entire SSC of their software inventory to detect risks early and identify compromised assets rapidly in the event of an SSC attack. One way to achieve such visibility is through SSC metadata, machine-readable and authenticated documents describing an artefact's lifecycle. Adopting SSC metadata requires organisations to procure or develop a Software Supply Chain Metadata Management system (SCM2), a suite of software tools for performing life cycle activities of SSC metadata documents such as creation, signing, distribution, and consumption. Selecting or developing an SCM2 is challenging due to the lack of a comprehensive domain model and architectural blueprint to aid practitioners in navigating the vast design space of SSC metadata terminologies, frameworks, and solutions. This paper addresses this challenge by presenting an empirically grounded Reference Architecture (RA) comprising a domain model and an architectural blueprint for SCM2 systems. Our proposed RA is constructed systematically on an empirical foundation built with industry-driven and peer-reviewed SSC security frameworks. Our theoretical evaluation, which consists of an architectural mapping of five prominent SSC security tools on the RA, ensures its validity and applicability, thus affirming the proposed RA as an effective framework for analysing existing SCM2 solutions and guiding the engineering of new SCM2 systems.
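The metadata life cycle named above (creation, signing, distribution, consumption) can be sketched in a few lines. This is a deliberately simplified, hypothetical example: it signs with a shared-key HMAC from the Python standard library, whereas SSC security frameworks typically rely on public-key signatures, and distribution is reduced to passing a dictionary around.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Deterministic JSON encoding so signer and consumer hash identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def create_metadata(artifact: bytes, builder: str) -> dict:
    # Creation: bind lifecycle facts to the artefact's SHA-256 digest.
    return {"subject": {"sha256": hashlib.sha256(artifact).hexdigest()},
            "builder": builder}


def sign(metadata: dict, key: bytes) -> dict:
    # Signing: HMAC-SHA256 here stands in for a real public-key signature.
    sig = hmac.new(key, _canonical(metadata), hashlib.sha256).hexdigest()
    return {"payload": metadata, "sig": sig}


def consume(envelope: dict, artifact: bytes, key: bytes) -> bool:
    # Consumption: authenticate the metadata, then check it describes this artefact.
    expected = hmac.new(key, _canonical(envelope["payload"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["sig"]):
        return False
    return envelope["payload"]["subject"]["sha256"] == hashlib.sha256(artifact).hexdigest()
```

Checking the signature before trusting any field, and comparing digests rather than names, are the two habits this sketch is meant to show.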

Submitted 8 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted for full paper presentation at EASE 2024 conference

arXiv:2310.02921

doi 10.1002/mame.202300375

Tuneable and biodegradable poly(ester amide)s for disposable facemasks

Authors: Esteban Alvarez Seoane, Alessandro Cattaneo, Fabien Neuenschwander, Lucien Blanchard, Tatiana Nogueira Matos, Laure Jeandupeux, Gianni Fiorucci, Maryam Tizgadam, Kelly Tran, Pierre-Louis Sciboz, Luce Albergati, Jérôme Charmet, Roger Marti, Stefan Hengsberger

Abstract: The widespread use of disposable facemasks during the COVID-19 pandemic has led to widespread environmental concern due to microplastic pollution. Biodegradable disposable facemasks are a first step towards reducing the environmental impact of pandemics. In this paper we present high-performance facemask components based on novel poly(ester amide) (PEA) grades synthesized from bio-sourced materials and processed into non-woven facemask components. PEA based polymers present an excellent compromise between mechanical performance and biodegradability. Importantly, the properties of the PEA can easily be tuned by changing the ratio of esters to amides, or by varying the diol and diacid parts. We synthesized seven polymers which we optimized for biodegradability and processability. Among them, two grades combined compatibility with the electrospinning process with full degradation within 35 days, using a normalized biodegradation test. The ultra-thin filters thus developed were evaluated for performance on a custom-made characterization bench. The filters achieved a microparticle capture efficiency and breathability comparable to commercial filters. Another PEA grade was optimized to reach visco-thermal properties compatible with a solvent-free melt-spinning process, as demonstrated by continuous fibre production. Overall, our environmentally friendly solution paves the way for the fabrication of high-performance fibres with excellent biodegradability for next-generation facemasks.

Submitted 4 October, 2023; originally announced October 2023.

Comments: 32 pages (manuscript: 21 pages, 6 figures) (SI: 11 pages, 6 figures)

arXiv:2310.00273

Safe Stabilizing Control for Polygonal Robots in Dynamic Elliptical Environments

Authors: Kehan Long, Khoa Tran, Melvin Leok, Nikolay Atanasov

Abstract: This paper addresses the challenge of safe navigation for rigid-body mobile robots in dynamic environments. We introduce an analytic approach to compute the distance between a polygon and an ellipse, and employ it to construct a control barrier function (CBF) for safe control synthesis. Existing CBF design methods for mobile robot obstacle avoidance usually assume point or circular robots, preventing their applicability to more realistic robot body geometries. Our work enables CBF designs that capture complex robot and obstacle shapes. We demonstrate the effectiveness of our approach in simulations highlighting real-time obstacle avoidance in constrained and dynamic environments for both mobile robots and multi-joint robot arms.
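The basic mechanics of a CBF safety filter are easiest to see for the geometry the abstract says existing methods assume: a point robot with single-integrator dynamics $\dot{x} = u$ and a circular obstacle, with $h(x) = \lVert x - c\rVert^2 - r^2$. The filter solves $\min_u \lVert u - u_{nom}\rVert^2$ subject to $\nabla h(x)^\top u + \alpha\,h(x) \ge 0$, which for a single constraint has a closed-form projection. The paper's contribution replaces this distance with an analytic polygon-ellipse distance; the function below is a hypothetical sketch of the point-circle case only.

```python
import numpy as np


def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Minimally modify u_nom so that h(x) = ||x - c||^2 - r^2 stays nonnegative
    for a point robot with dynamics x' = u (single-constraint CBF-QP)."""
    diff = x - center
    h = float(diff @ diff - radius ** 2)
    grad_h = 2.0 * diff
    # Constraint value grad_h . u + alpha * h; nonnegative means u_nom is already safe.
    slack = float(grad_h @ u_nom + alpha * h)
    if slack >= 0.0:
        return u_nom
    # Closed-form QP solution: project u_nom onto the half-space boundary.
    return u_nom - slack / float(grad_h @ grad_h) * grad_h
```

For a robot at $(-2, 0)$ heading towards a unit circle at the origin with $u_{nom} = (2, 0)$ and $\alpha = 1$, the filter slows it to $(0.75, 0)$, exactly where the constraint becomes active.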

Submitted 30 April, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

Comments: 2024 American Control Conference

arXiv:2309.16801

doi 10.1007/978-3-030-35333-9_3

Test-Case Quality -- Understanding Practitioners' Perspectives

Authors: Huynh Khanh Vi Tran, Nauman Bin Ali, Jürgen Börstler, Michael Unterkalmsteiner

Abstract: Background: Test-case quality has always been one of the major concerns in software testing. To improve test-case quality, it is important to better understand how practitioners perceive the quality of test-cases. Objective: Motivated by that need, we investigated how practitioners define test-case quality and which aspects of test-cases are important for quality assessment. Method: We conducted semi-structured interviews with professional developers, testers and test architects from a multinational software company in Sweden. Before the interviews, we asked participants for actual test cases (written in natural language) that they perceive as good, normal, and bad, respectively, together with rationales for their assessment. We also compared their opinions on shared test cases and contrasted their views with the relevant literature. Results: We present a quality model which consists of 11 test-case quality attributes. We also identify a misalignment in defining test-case quality among practitioners and between academia and industry, along with suggestions for improving test-case quality in industry. Conclusion: The results show that practitioners' backgrounds, including roles and working experience, are critical dimensions of how test-case quality is defined and assessed.

Submitted 28 September, 2023; originally announced September 2023.

Comments: PROFES 2019: 37-52

arXiv:2309.12972

License Plate Recognition Based On Multi-Angle View Model

Authors: Dat Tran-Anh, Khanh Linh Tran, Hoai-Nam Vu

Abstract: Detecting and recognizing text in images and videos captured by cameras is a highly challenging research problem. Despite certain advancements achieving high accuracy, current methods still require substantial improvements to be applicable in practical scenarios. Unlike general text detection in images and videos, this paper addresses text detection within license plates by combining multiple frames captured from distinct perspectives. For each viewpoint, the proposed method extracts descriptive features characterizing the text components of the license plate, specifically corner points and area. Concretely, we present three viewpoints, view-1, view-2, and view-3, to identify the nearest neighboring components, facilitating the restoration of text components from the same license plate line based on estimated similarity levels and distance metrics. Subsequently, we employ the CnOCR method for text recognition within license plates. Experimental results on the self-collected dataset (PTITPlates), comprising pairs of images in various scenarios, and on the publicly available Stanford Cars Dataset demonstrate the superiority of the proposed method over existing approaches.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.10550 [pdf, other]

Addressing the Scalability Bottleneck of Semantic Technologies at Bosch

Authors: Diego Rincon-Yanez, Mohamed H. Gad-Elrab, Daria Stepanova, Kien Trung Tran, Cuong Chu Xuan, Baifan Zhou, Evgeny Karlamov

Abstract: At the heart of smart manufacturing is real-time semi-automatic decision-making. Such decisions are vital for optimizing production lines, e.g., reducing resource consumption, improving the quality of discrete manufacturing operations, and optimizing the actual products, e.g., optimizing the sampling rate for measuring product dimensions during production. Such decision-making relies on massive industrial data, thus posing a real-time processing bottleneck.

Submitted 19 September, 2023; originally announced September 2023.

Journal ref: Industry Track - Extended Semantic Web Conference (ESWC2023)

arXiv:2309.08012 [pdf, other]

HUVECs-encapsulation via Millimeter-sized Alginate Droplets

Authors: Khanh Tran, Brenda A. A. B. Ametepe, Erika L. Gomez, Daniel Ramos, Clare Kim, Ga-Young Kelly Suh, Siavash Ahrar, Perla Ayala

Abstract: Droplet microfluidics are a powerful approach for hydrogel cell encapsulations. Much of the field has focused on single-cell encapsulations with pico-nanoliter droplet volumes necessary for single-cell sequencing or high-throughput screening. These small volumes, however, limit the use of hydrogel droplets for tissue engineering or cell therapies. We describe simple droplet microfluidics to generate millimeter-sized alginate droplets and demonstrate their use for cell encapsulations. This work builds on our recent efforts, specifically by replacing the glass slide forming the bottom layer of the chamber with a more hydrophobic acrylic (PMMA) layer to improve the alginate-in-oil droplet formation. Using glass layer and PMMA layer devices, we characterized the tunable production of water-in-oil droplets (average droplet lengths ranged from 0.8 to 3.7 mm). Next, PMMA layer devices were used to demonstrate the tunable generation of alginate-in-oil droplets (average droplet lengths ranged from 3 to 6 mm). Increasing the flow ratio (Q.ratio = Q.oil/Q.alginate) led to more uniform droplets as measured by the coefficient of variation, which was approximately 5%. Finally, a proof-of-use experiment used HUVEC-encapsulated alginate droplets as part of a scratch-healing assay. Specifically, HUVEC-encapsulated droplets (AH droplets) led to the recovery of 3T3 fibroblast monolayers compared to no droplets or cell-free droplets (A droplets).
Our results extend the use of simple microfluidics to generate and retrieve millimeter-sized alginate droplets for effective cell encapsulations.
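The abstract measures uniformity with the flow ratio Q.ratio = Q.oil/Q.alginate and with a coefficient of variation. The sketch below shows both calculations. It assumes the coefficient of variation is the sample standard deviation of droplet lengths divided by their mean, expressed as a percentage. The function names and input values are hypothetical and are not data from the paper.

```python
from statistics import mean, stdev


def flow_ratio(q_oil: float, q_alginate: float) -> float:
    """Q.ratio = Q.oil / Q.alginate; both rates must use the same units."""
    if q_alginate <= 0:
        raise ValueError("alginate flow rate must be positive")
    return q_oil / q_alginate


def coefficient_of_variation(lengths_mm: list[float]) -> float:
    """Sample standard deviation divided by the mean, as a percentage (assumed definition)."""
    if len(lengths_mm) < 2:
        raise ValueError("need at least two droplet measurements")
    return 100.0 * stdev(lengths_mm) / mean(lengths_mm)


# Hypothetical values for illustration only; not measurements from the paper.
ratio = flow_ratio(q_oil=30.0, q_alginate=10.0)
cv = coefficient_of_variation([4.0, 4.2, 3.8, 4.0])
print(f"Q.ratio = {ratio:.1f}, CV = {cv:.1f}%")
```

A lower CV means the droplet lengths are more uniform. The abstract reports that raising Q.ratio pushed this value to roughly 5%.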

Submitted 14 September, 2023; originally announced September 2023.

Comments: 13 pages, 6 figures

arXiv:2309.06157 [pdf, other]

Robust-MBDL: A Robust Multi-branch Deep Learning Based Model for Remaining Useful Life Prediction and Operational Condition Identification of Rotating Machines

Authors: Khoa Tran, Hai-Canh Vu, Lam Pham, Nassim Boudaoud

Abstract: In this paper, a Robust Multi-branch Deep learning-based system for remaining useful life (RUL) prediction and condition operations (CO) identification of rotating machines is proposed. In particular, the proposed system comprises three main components: (1) an LSTM-Autoencoder to denoise the vibration data; (2) a feature extraction module to generate time-domain, frequency-domain, and time-frequency based features from the denoised data; (3) a novel and robust multi-branch deep learning network architecture to exploit the multiple features. The performance of our proposed system was evaluated and compared to the state-of-the-art systems on two benchmark datasets, XJTU-SY and PRONOSTIA. The experimental results show that our proposed system outperforms the state-of-the-art systems and presents potential for real-life applications on bearing machines.
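The abstract says time-domain and frequency-domain features are extracted from the vibration data, but it does not list them. The sketch below computes a few features that are common in bearing condition monitoring: RMS, peak, crest factor, kurtosis, and dominant frequency. This particular feature set is an assumption and is not the paper's. The sketch leaves out the LSTM-Autoencoder denoising step and the time-frequency features.

```python
import numpy as np


def time_domain_features(x: np.ndarray) -> dict[str, float]:
    """A few common condition-monitoring statistics of a vibration signal."""
    rms = float(np.sqrt(np.mean(x ** 2)))
    peak = float(np.max(np.abs(x)))
    centered = x - x.mean()
    # Pearson (non-excess) kurtosis: 3.0 for Gaussian noise, 1.5 for a pure sine.
    kurtosis = float(np.mean(centered ** 4) / np.mean(centered ** 2) ** 2)
    return {"rms": rms, "peak": peak, "crest_factor": peak / rms, "kurtosis": kurtosis}


def dominant_frequency(x: np.ndarray, fs: float) -> float:
    """Frequency (Hz) of the largest-magnitude bin in the one-sided FFT."""
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])


# Synthetic 50 Hz test tone sampled at 1 kHz for one second (illustrative only).
fs = 1000.0
t = np.arange(1000) / fs
signal = np.sin(2 * np.pi * 50.0 * t)
features = time_domain_features(signal)
print(features, dominant_frequency(signal, fs))
```

Bearing faults often appear as impulsive spikes. Such spikes raise the kurtosis and the crest factor well above their values for a clean sinusoid.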

Submitted 14 December, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

arXiv:2308.13043 [pdf, ps, other]

Classifying Primitive Solvable Permutation Groups of Rank 5 and 6

Authors: Anakin Dey, Kolton O'Neal, Duc Van Khanh Tran, Camron Upshur, Yong Yang

Abstract: Let $G$ be a finite solvable permutation group acting faithfully and primitively on a finite set $Ω$. Let $G_0$ be the stabilizer of a point $α\in Ω$. The rank of $G$ is defined as the number of orbits of $G_0$ in $Ω$, including the trivial orbit $\{α\}$. In this paper, we completely classify the cases where $G$ has rank 5 and 6, continuing previous work on classifying groups of rank 4 or lower.
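The rank definition above can be checked by brute force on small examples. The sketch below builds a permutation group from generators, takes the stabilizer $G_0$ of a point, and counts its orbits on $Ω$, including $\{α\}$. It illustrates only the definition and is not the paper's classification method. The example groups act on $Ω = \mathbb{Z}/5\mathbb{Z}$. A transitive group of prime degree is primitive, and all three examples are solvable.

```python
Perm = tuple[int, ...]  # p[i] is the image of point i


def compose(p: Perm, q: Perm) -> Perm:
    """Permutation that applies q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def generate_group(gens: list[Perm]) -> set[Perm]:
    """Closure of the generators under composition (finite, so this is a group)."""
    identity = tuple(range(len(gens[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        next_frontier = []
        for g in frontier:
            for s in gens:
                h = compose(s, g)
                if h not in group:
                    group.add(h)
                    next_frontier.append(h)
        frontier = next_frontier
    return group


def permutation_rank(gens: list[Perm], alpha: int = 0) -> int:
    """Number of orbits of the stabilizer G_0 of alpha on Omega, counting {alpha}."""
    group = generate_group(gens)
    stabilizer = [g for g in group if g[alpha] == alpha]
    seen: set[int] = set()
    orbits = 0
    for point in range(len(gens[0])):
        if point not in seen:
            seen |= {g[point] for g in stabilizer}
            orbits += 1
    return orbits


# Omega = Z/5Z with affine maps x -> a*x + b.
n = 5
translate = tuple((i + 1) % n for i in range(n))  # x -> x + 1
negate = tuple((-i) % n for i in range(n))        # x -> -x
double = tuple((2 * i) % n for i in range(n))     # x -> 2x

print(permutation_rank([translate]))          # C_5: 5
print(permutation_rank([translate, negate]))  # D_5: 3
print(permutation_rank([translate, double]))  # AGL(1, 5): 2
```

Rank 2 corresponds to a 2-transitive action. The regular action of $C_5$ attains the largest possible rank, $|Ω|$, because its point stabilizer is trivial.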

Submitted 4 February, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.11596 [pdf, other]

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim , et al. (43 additional authors not shown)

Abstract: What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. By filtering this corpus and combining it with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech.
Tested for robustness, our system performs better against background noise and speaker variations in speech-to-text tasks than the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication

Submitted 24 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

ACM Class: I.2.7

arXiv:2308.07601 [pdf, ps, other]

VBD-MT Chinese-Vietnamese Translation Systems for VLSP 2022

Authors: Hai Long Trieu, Song Kiet Bui, Tan Minh Tran, Van Khanh Tran, Hai An Nguyen

Abstract: We present the systems we submitted to the VLSP 2022 machine translation shared task. This year, we participated in both translation tasks, i.e., Chinese-Vietnamese and Vietnamese-Chinese translation. We build our systems on the neural Transformer model together with the powerful multilingual denoising pre-trained model mBART. The systems are enhanced by a sampling method for back-translation, which leverages large-scale available monolingual data. Additionally, several other methods, including ensembling and post-processing, are applied to improve translation quality. We achieve 38.9 BLEU on Chinese-Vietnamese and 38.0 BLEU on Vietnamese-Chinese on the public test sets, outperforming several strong baselines.

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2308.05606 [pdf, other]

doi 10.1038/s41586-024-07191-9

A massive galaxy that formed its stars at $z \sim 11$

Authors: Karl Glazebrook, Themiya Nanayakkara, Corentin Schreiber, Claudia Lagos, Lalitwadee Kawinwanichakij, Colin Jacobs, Harry Chittenden, Gabriel Brammer, Glenn G. Kacprzak, Ivo Labbe, Danilo Marchesini, Z. Cemile Marsan, Pascal A. Oesch, Casey Papovich, Rhea-Silvia Remus, Kim-Vy H. Tran, James Esdaile, Angel Chandro Gomez

Abstract: The formation of galaxies by gradual hierarchical co-assembly of baryons and cold dark matter halos is a fundamental paradigm underpinning modern astrophysics and predicts a strong decline in the number of massive galaxies at early cosmic times. Extremely massive quiescent galaxies (stellar masses $>10^{11}$ M$_\odot$) have now been observed as early as 1-2 billion years after the Big Bang; these are extremely constraining on theoretical models as they form 300-500 Myr earlier and only some models can form massive galaxies this early. Here we report James Webb Space Telescope spectroscopic observations of ZF-UDS-7329, a massive quiescent galaxy at redshift 3.205 $\pm$ 0.005 that eluded deep ground-based spectroscopy, is significantly redder than typical, and whose spectrum reveals features typical of much older stellar populations. Detailed modelling shows the stellar population formed around 1.5 billion years earlier in time ($z \sim 11$) at an epoch when dark matter halos of sufficient hosting mass had not yet assembled in the standard scenario. This observation may point to the presence of undetected populations of early galaxies and the possibility of significant gaps in our understanding of early stellar populations, galaxy formation and/or the nature of dark matter.

Submitted 3 May, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: 21 pages, 6 figures, updated to reflect accepted version. v3 with figure caption clarified

Journal ref: Nature 2024

arXiv:2307.15335 [pdf, other]

BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering

Authors: Khiem Vinh Tran, Kiet Van Nguyen, Ngan Luu Thuy Nguyen

Abstract: Visual Question Answering (VQA) is an intricate and demanding task that integrates natural language processing (NLP) and computer vision (CV), capturing the interest of researchers. The English language, renowned for its wealth of resources, has witnessed notable advancements in both datasets and models designed for VQA. However, there is a lack of models that target specific countries such as Vietnam. To address this limitation, we introduce a transformer-based Vietnamese model named BARTPhoBEiT. This model includes pre-trained Sequence-to-Sequence and bidirectional encoder representation from Image Transformers in Vietnamese and is evaluated on Vietnamese VQA datasets. Experimental results demonstrate that our proposed model outperforms the strong baseline and improves the state-of-the-art in six metrics: Accuracy, Precision, Recall, F1-score, WUPS 0.0, and WUPS 0.9.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.13246 [pdf, other]

Vibrational heat-bath configuration interaction with semistochastic perturbation theory using harmonic oscillator or VSCF modals

Authors: Henry K. Tran, Timothy C. Berkelbach

Abstract: Vibrational heat-bath configuration interaction (VHCI) -- a selected configuration interaction technique for vibrational structure theory -- has recently been developed in two independent works [J. Chem. Phys. 154, 074104 (2021); Mol. Phys. 119, e1936250 (2021)], where it was shown to provide accuracy on par with the most accurate vibrational structure methods at a low computational cost. Here, we eliminate the memory bottleneck of the second-order perturbation theory (PT2) correction using the same (semi)stochastic approach developed previously for electronic structure theory. This allows us to treat, in an unbiased manner, much larger perturbative spaces, which are necessary for high accuracy in large systems. Stochastic errors are easily controlled to be less than 1 cm$^{-1}$. We also report two other developments: (i) we propose a new heat-bath criterion and an associated exact implicit sorting algorithm for potential energy surfaces expressible as a sum of products of one-dimensional potentials; (ii) we formulate VHCI to use a vibrational self-consistent field (VSCF) reference, as opposed to the harmonic oscillator reference configuration used in previous reports. Interestingly, we find that with VSCF, the minor improvements to the accuracy are outweighed by the much higher computational cost associated with the loss of sparsity in the Hamiltonian and the integral transformations needed for matrix element evaluation.
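The abstract builds on heat-bath selected CI. The sketch below shows the basic selection step of that family of methods, assuming the standard criterion from electronic-structure HCI: a configuration $a$ outside the current space is added when $\max_i |H_{ai} c_i| > \varepsilon$. The sketch then repeatedly diagonalizes the growing space. It uses a toy dense matrix with hypothetical function names. It does not implement the paper's new criterion, its implicit sorting algorithm, or the (semi)stochastic PT2 correction.

```python
import numpy as np


def heat_bath_select(H: np.ndarray, space: list[int], coeffs: np.ndarray, eps: float) -> list[int]:
    """Configurations a outside `space` with max_i |H[a, i] * c_i| > eps."""
    inside = set(space)
    selected = []
    for a in range(H.shape[0]):
        if a in inside:
            continue
        if np.max(np.abs(H[a, space] * coeffs)) > eps:
            selected.append(a)
    return selected


def selected_ci(H: np.ndarray, eps: float, start: int = 0, max_iter: int = 20):
    """Grow the variational space with the heat-bath rule, then diagonalize it."""
    space = [start]
    for _ in range(max_iter):
        _, vecs = np.linalg.eigh(H[np.ix_(space, space)])
        new = heat_bath_select(H, space, vecs[:, 0], eps)  # ground-state coefficients
        if not new:
            break
        space = sorted(space + new)
    energy = np.linalg.eigvalsh(H[np.ix_(space, space)])[0]
    return float(energy), space


# Toy symmetric "Hamiltonian" over 4 configurations (illustrative numbers).
H = np.array([
    [0.00, 0.50, 0.01, 0.20],
    [0.50, 1.00, 0.30, 0.00],
    [0.01, 0.30, 2.00, 0.00],
    [0.20, 0.00, 0.00, 3.00],
])
print(heat_bath_select(H, [0], np.array([1.0]), eps=0.1))  # [1, 3]
print(selected_ci(H, eps=1e-8))
```

The criterion only needs the largest single term $|H_{ai} c_i|$. That property makes sorted or implicitly sorted matrix elements effective, because the search can stop once the terms fall below $\varepsilon$.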

Submitted 25 July, 2023; originally announced July 2023.

arXiv:2307.09590 [pdf]

The FENIKS Survey: Spectroscopic Confirmation of Massive Quiescent Galaxies at z ~ 3-5

Authors: Jacqueline Antwi-Danso, Casey Papovich, James Esdaile, Themiya Nanayakkara, Karl Glazebrook, Taylor A. Hutchison, Katherine E. Whitaker, Z. Cemile Marsan, Ruben J. Diaz, Danilo Marchesini, Adam Muzzin, Kim-Vy H. Tran, David J. Setton, Yasha Kaushal, Joshua S. Speagle, Justin Cole

Abstract: The measured ages of massive, quiescent galaxies at $z\sim 3-4$ imply that massive galaxies quench as early as $z\sim 6$. While the number of spectroscopic confirmations of quiescent galaxies at $z < 3$ has increased over the years, there are only a handful at $z > 3.5$. We report spectroscopic redshifts of one secure ($z=3.757$) and two tentative ($z = 3.336$, $z=4.673$) massive ($\log(M_\ast/M_\odot) > 10.3$) quiescent galaxies with 11 hours of Keck/MOSFIRE $K$-band observations. Our candidates were selected from the FENIKS survey, which uses deep Gemini/Flamingos-2 $K_b$ and $K_r$ imaging optimized for increased sensitivity to the characteristic red colors of galaxies at $z > 3$ with strong Balmer/4000 Å breaks. The rest-frame $UVJ$ and $(ugi)_s$ colors of three of the four quiescent candidates are consistent with $1-2$ Gyr old stellar populations. This makes these galaxies the oldest objects at these redshifts, and challenges the notion that quiescent galaxies at $z > 3$ are all recently quenched, "post-starburst" galaxies. Our spectroscopy shows that the other quiescent-galaxy candidate is a broad-line AGN ($z = 3.594$) with strong, redshifted $Hβ$+[O III] emission with a velocity offset $>1000$ km/s, indicative of a powerful outflow. The star-formation history of our highest-redshift candidate suggests that its progenitor was already in place by $z \sim 7-11$, reaching $\sim 10^{11} M_{\odot}$ by $z \simeq 8$.
These observations reveal the limit of what is possible with deep near-infrared photometry and targeted spectroscopy from the ground and demonstrate that secure spectroscopic confirmation of quiescent galaxies at $z > 4$ is only feasible with JWST.

Submitted 1 May, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: 20 pages, 11 figures, submitted to ApJ

arXiv:2307.09290 [pdf, ps, other]

On Some Doubly Logarithmic Integrals

Authors: Duc Van Khanh Tran

Abstract: There have been many works on proving the integrals in the table of integrals compiled by Gradshteyn and Ryzhik, and in this paper we prove some doubly logarithmic integral identities in the Gradshteyn and Ryzhik table.

Submitted 24 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

MSC Class: 33B15; 11M06

arXiv:2306.11784 [pdf, other]

NANCY: Next-generation All-sky Near-infrared Community surveY

Authors: Jiwon Jesse Han, Arjun Dey, Adrian M. Price-Whelan, Joan Najita, Edward F. Schlafly, Andrew Saydjari, Risa H. Wechsler, Ana Bonaca, David J Schlegel, Charlie Conroy, Anand Raichoor, Alex Drlica-Wagner, Juna A. Kollmeier, Sergey E. Koposov, Gurtina Besla, Hans-Walter Rix, Alyssa Goodman, Douglas Finkbeiner, Abhijeet Anand, Matthew Ashby, Benedict Bahr-Kalus, Rachel Beaton, Jayashree Behera, Eric F. Bell, Eric C Bellm , et al. (184 additional authors not shown)

Abstract: The Nancy Grace Roman Space Telescope is capable of delivering an unprecedented all-sky, high-spatial-resolution, multi-epoch infrared map to the astronomical community. This opportunity arises in the midst of numerous ground- and space-based surveys that will provide extensive spectroscopy and imaging together covering the entire sky (such as Rubin/LSST, Euclid, UNIONS, SPHEREx, DESI, SDSS-V, GALAH, 4MOST, WEAVE, MOONS, PFS, UVEX, NEO Surveyor, etc.). Roman can uniquely provide uniform high-spatial-resolution (~0.1 arcsec) imaging over the entire sky, vastly expanding the science reach and precision of all of these near-term and future surveys. This imaging will not only enhance other surveys, but also facilitate completely new science. By imaging the full sky over two epochs, Roman can measure the proper motions of stars across the entire Milky Way, probing 100 times fainter than Gaia out to the very edge of the Galaxy. Here, we propose NANCY: a completely public, all-sky survey that will create a high-value legacy dataset benefiting innumerable ongoing and forthcoming studies of the universe. NANCY is a pure expression of Roman's potential: it images the entire sky, at high spatial resolution, in a broad infrared bandpass that collects as many photons as possible. Most ongoing astronomical surveys would benefit from incorporating observations of NANCY into their analyses, whether these surveys focus on nearby stars, the Milky Way, near-field cosmology, or the broader universe.

Submitted 20 June, 2023; originally announced June 2023.

Comments: Submitted to the call for white papers for the Roman Core Community Survey (June 16th, 2023), and to the Bulletin of the AAS

arXiv:2306.09630 [pdf, other]

doi 10.1093/mnras/stad1079

The MAGPI Survey: Impact of environment on the total internal mass distribution of galaxies in the last 5 Gyr

Authors: Caro Derkenne, Richard M. McDermid, Adriano Poci, J. Trevor Mendel, Francesco D'Eugenio, Seyoung Jeon, Rhea-Silvia Remus, Sabine Bellstedt, Andrew J. Battisti, Joss Bland-Hawthorn, Anna Ferre-Mateu, Caroline Foster, K. E. Harborne, Claudia D. P. Lagos, Yingjie Peng, Piyush Sharda, Gauri Sharma, Sarah Sweet, Kim-Vy H. Tran, Lucas M. Valenzuela, Sam Vaughan, Emily Wisnioski, Sukyoung K. Yi

Abstract: We investigate the impact of environment on the internal mass distribution of galaxies using the Middle Ages Galaxy Properties with Integral field spectroscopy (MAGPI) survey. We use 2D resolved stellar kinematics to construct Jeans dynamical models for galaxies at mean redshift $z \sim 0.3$, corresponding to a lookback time of $3-4$ Gyr. The internal mass distribution for each galaxy is parameterised by the combined mass density slope $γ$ (baryons $+$ dark matter), which is the logarithmic change of density with radius ($γ = \mathrm{d}\ln ρ / \mathrm{d}\ln r$). We use a MAGPI sample of 28 galaxies from low-to-mid-density environments and compare to density slopes derived from galaxies in the high-density Frontier Fields clusters in the redshift range $0.29 < z < 0.55$, corresponding to a lookback time of $\sim 5$ Gyr. We find a median density slope of $γ= -2.22 \pm 0.05$ for the MAGPI sample, which is significantly steeper than the Frontier Fields median slope ($γ= -2.01 \pm 0.04$), implying that the cluster galaxies are less centrally concentrated in their mass distribution than MAGPI galaxies. We also compare to the distribution of density slopes from galaxies in Atlas3D at $z \sim 0$, because that sample probes a similar environmental range to MAGPI. The Atlas3D median total slope is $γ= -2.25 \pm 0.02$, consistent with the MAGPI median. Our results indicate that environment plays a role in the internal mass distribution of galaxies, with no evolution of the slope in the last 3-4 Gyr. These results are in agreement with the predictions of cosmological simulations.
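The density slope $γ$ is the logarithmic change of density with radius. For a pure power law $ρ \propto r^{γ}$, it is the slope of a straight line in log-log space. The sketch below estimates $γ$ by a least-squares fit on a synthetic profile. It illustrates only the definition. The paper derives $γ$ from Jeans dynamical models of stellar kinematics, not from a direct fit like this one. The function name is hypothetical.

```python
import numpy as np


def density_slope(r: np.ndarray, rho: np.ndarray) -> float:
    """Least-squares estimate of gamma = d ln(rho) / d ln(r) over the given radii."""
    slope, _intercept = np.polyfit(np.log(r), np.log(rho), deg=1)
    return float(slope)


# Synthetic power-law profile rho ∝ r^-2.22 (arbitrary units), purely illustrative.
r = np.logspace(-1, 1, 50)
rho = 3.0 * r ** -2.22
print(round(density_slope(r, rho), 2))  # -2.22
```

An isothermal profile has $γ = -2$, so the MAGPI median of $-2.22$ is slightly steeper than isothermal. The Frontier Fields median of $-2.01$ is close to isothermal.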

Submitted 16 June, 2023; originally announced June 2023.

Comments: Accepted for publication in MNRAS

Journal ref: Monthly Notices of the Royal Astronomical Society, Volume 522, Issue 3, July 2023, Pages 3602 - 3626

arXiv:2306.06620 [pdf, other]

ARIST: An Effective API Argument Recommendation Approach

Authors: Son Nguyen, Cuong Tran Manh, Kien T. Tran, Tan M. Nguyen, Thu-Trang Nguyen, Kien-Tuan Ngo, Hieu Dinh Vo

Abstract: Learning and remembering how to use APIs is difficult. Several techniques have been proposed to assist developers in using APIs. Most existing techniques focus on recommending the right API methods to call, but very few techniques focus on recommending API arguments. In this paper, we propose ARIST, a novel automated argument recommendation approach which suggests arguments by predicting developers' expectations when they define and use API methods. To implement this idea in the recommendation process, ARIST combines program analysis (PA), language models (LMs), and several features specialized for the recommendation task which consider the functionality of formal parameters and the positional information of code elements (e.g., variables or method calls) in the given context. In ARIST, the LMs and the recommendation features are used to suggest the promising candidates identified by PA. Meanwhile, PA restricts the LMs and the features to the set of valid candidates that satisfy the syntax, accessibility, and type-compatibility constraints defined by the programming language in use. Our evaluation on a large dataset of real-world projects shows that ARIST improves the state-of-the-art approach by 19% and 18% in top-1 precision and recall for recommending arguments of frequently-used libraries. For the general argument recommendation task, i.e., recommending arguments for every method call, ARIST outperforms the baseline approaches by up to 125% top-1 accuracy.
Moreover, for newly encountered projects, ARIST achieves more than 60% top-3 accuracy when evaluated on a larger dataset. For working/maintaining projects, with a personalized LM to capture developers' coding practice, ARIST can productively rank the expected arguments at the top-1 position in 7/10 requests.
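The division of labour described above has two stages. Program analysis first narrows the candidates to valid ones, and scoring models then rank what remains. The sketch below shows that two-stage shape under simplifying assumptions. Type compatibility is reduced to exact type-name equality, with no subtyping. A plain scoring function stands in for ARIST's LMs and features. All names and scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    expr: str         # a variable or method call visible at the call site
    type_name: str    # its static type
    accessible: bool  # whether visibility rules allow the access


def recommend(candidates: list[Candidate], param_type: str,
              score: Callable[[str], float], k: int = 3) -> list[str]:
    """Filter to valid candidates (analysis step), then rank them by score."""
    # Simplification: exact type match stands in for real type-compatibility checks.
    valid = [c for c in candidates if c.accessible and c.type_name == param_type]
    ranked = sorted(valid, key=lambda c: score(c.expr), reverse=True)
    return [c.expr for c in ranked[:k]]


# Hypothetical call site expecting a String argument.
pool = [
    Candidate("path", "String", True),
    Candidate("file", "File", True),       # wrong type: filtered out
    Candidate("name", "String", True),
    Candidate("secret", "String", False),  # not accessible: filtered out
]
toy_scores = {"path": 0.7, "file": 0.95, "name": 0.2, "secret": 0.9}
print(recommend(pool, "String", lambda e: toy_scores[e]))  # ['path', 'name']
```

Filtering before scoring means that a high score cannot promote an invalid argument. In the example, "file" and "secret" are dropped despite having the top scores.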

Submitted 11 June, 2023; originally announced June 2023.

arXiv:2305.17648 [pdf, other]

Z-GMOT: Zero-shot Generic Multiple Object Tracking

Authors: Kim Hoang Tran, Anh Duy Le Dinh, Tien Phat Nguyen, Thinh Phan, Pha Nguyen, Khoa Luu, Donald Adjeroh, Gianfranco Doretto, Ngan Hoang Le

Abstract: Despite recent significant progress, Multi-Object Tracking (MOT) faces limitations such as reliance on prior knowledge and predefined categories, and it struggles with unseen objects. To address these issues, Generic Multiple Object Tracking (GMOT) has emerged as an alternative approach, requiring less prior information. However, current GMOT methods often rely on initial bounding boxes and struggle to handle variations in factors such as viewpoint, lighting, occlusion, and scale, among others. Our contributions commence with the introduction of the \textit{Referring GMOT dataset}, a collection of videos, each accompanied by detailed textual descriptions of its attributes. Subsequently, we propose $\mathtt{Z-GMOT}$, a cutting-edge tracking solution capable of tracking objects from \textit{never-seen categories} without the need for initial bounding boxes or predefined categories. Within our $\mathtt{Z-GMOT}$ framework, we introduce two novel components: (i) $\mathtt{iGLIP}$, an improved Grounded language-image pretraining, for accurately detecting unseen objects with specific characteristics; and (ii) $\mathtt{MA-SORT}$, a novel object association approach that adeptly integrates motion and appearance-based matching strategies to tackle the complex task of tracking objects with high similarity. Our contributions are benchmarked through extensive experiments conducted on the Referring GMOT dataset for the GMOT task.
Additionally, to assess the generalizability of the proposed $\mathtt{Z-GMOT}$, we conduct ablation studies on the DanceTrack and MOT20 datasets for the MOT task. Our dataset, code, and models are released at: https://fsoft-aic.github.io/Z-GMOT.
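The abstract says MA-SORT integrates motion-based and appearance-based matching, but it does not give the formulation. The sketch below shows one common way to combine the two cues. It assumes a weighted sum of a motion cost (1 − IoU) and an appearance cost (1 − cosine similarity of embeddings), followed by a simple greedy assignment. The weight `lam`, the threshold `max_cost`, and all function names are hypothetical. Many trackers use Hungarian matching instead of the greedy step. None of this should be read as MA-SORT itself.

```python
import numpy as np

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fused_cost(tracks: list[Box], dets: list[Box], track_feats: np.ndarray,
               det_feats: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """lam * (1 - IoU) + (1 - lam) * (1 - cosine similarity of appearance features)."""
    tf = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    df = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    motion = np.array([[1.0 - iou(t, d) for d in dets] for t in tracks])
    return lam * motion + (1.0 - lam) * (1.0 - tf @ df.T)


def greedy_match(cost: np.ndarray, max_cost: float = 0.7) -> list[tuple[int, int]]:
    """Repeatedly take the cheapest unused (track, detection) pair below max_cost."""
    pairs, used_t, used_d = [], set(), set()
    for flat in np.argsort(cost, axis=None):
        t, d = (int(v) for v in np.unravel_index(flat, cost.shape))
        if cost[t, d] > max_cost:
            break
        if t not in used_t and d not in used_d:
            pairs.append((t, d))
            used_t.add(t)
            used_d.add(d)
    return pairs


# Two tracks and two detections (hypothetical boxes and 2-D appearance embeddings).
tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(21, 21, 31, 31), (1, 1, 11, 11)]
cost = fused_cost(tracks, dets, np.array([[1.0, 0.0], [0.0, 1.0]]),
                  np.array([[0.0, 1.0], [1.0, 0.0]]))
print(sorted(greedy_match(cost)))  # [(0, 1), (1, 0)]
```

When objects look alike, as in the high-similarity scenes GMOT targets, the appearance term separates them poorly. The motion term then has to carry more of the association, which is one reason to combine the two cues.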

Submitted 13 June, 2024; v1 submitted 28 May, 2023; originally announced May 2023.

arXiv:2305.16474 [pdf, other]

FairDP: Certified Fairness with Differential Privacy

Authors: Khang Tran, Ferdinando Fioretto, Issa Khalil, My T. Thai, NhatHai Phan

Abstract: This paper introduces FairDP, a novel mechanism designed to achieve certified fairness with differential privacy (DP). FairDP independently trains models for distinct individual groups, using group-specific clipping terms to assess and bound the disparate impacts of DP. Throughout the training process, the mechanism progressively integrates knowledge from group models to formulate a comprehensive model that balances privacy, utility, and fairness in downstream tasks. Extensive theoretical and empirical analyses validate the efficacy of FairDP and show improved trade-offs between model utility, privacy, and fairness compared with existing methods.
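The group-specific clipping mentioned above builds on the clip-and-noise step of DP-SGD. The sketch below shows that generic step with a separate clipping bound per group. Each per-example gradient is clipped to its group's bound $C_g$, the clipped gradients are summed, Gaussian noise with standard deviation $σ C_g$ is added, and the result is averaged. It is not FairDP's algorithm. It omits privacy accounting, fairness certification, and the progressive integration of group models. Function and parameter names are hypothetical, and group sizes are treated as public.

```python
import numpy as np


def clip_to_norm(grad: np.ndarray, bound: float) -> np.ndarray:
    """Scale grad so its L2 norm is at most bound (unchanged if already within it)."""
    norm = np.linalg.norm(grad)
    return grad * (bound / norm) if norm > bound else grad


def group_private_means(per_example_grads: list[np.ndarray], groups: list[str],
                        clip_bounds: dict[str, float], noise_multiplier: float,
                        rng: np.random.Generator) -> dict[str, np.ndarray]:
    """Per group: clip each example at that group's bound, sum, add Gaussian noise, average.

    Noise std is noise_multiplier * bound (Gaussian mechanism on the clipped sum).
    Group sizes are treated as public here, which is an assumption.
    """
    out = {}
    for g, bound in clip_bounds.items():
        clipped = [clip_to_norm(x, bound) for x, grp in zip(per_example_grads, groups) if grp == g]
        if not clipped:
            continue
        total = np.sum(clipped, axis=0)
        noise = rng.normal(0.0, noise_multiplier * bound, size=total.shape)
        out[g] = (total + noise) / len(clipped)
    return out


grads = [np.array([3.0, 4.0]), np.array([0.0, 1.0]), np.array([6.0, 8.0])]
updates = group_private_means(grads, ["a", "a", "b"], {"a": 1.0, "b": 5.0},
                              noise_multiplier=0.0, rng=np.random.default_rng(0))
print({g: np.round(u, 3).tolist() for g, u in updates.items()})  # {'a': [0.3, 0.9], 'b': [3.0, 4.0]}
```

Setting `noise_multiplier=0.0` disables the noise, so the clipping effect is easy to see. Group "a" loses more of its gradient signal to its tighter bound than group "b" does. That kind of unequal distortion is what group-specific clipping is meant to expose and control.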

Submitted 21 August, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.15578 [pdf]

An Open-Access Database of Active-source and Passive-wavefield DAS and Nodal Station Measurements at the Newberry Florida Site

Authors: Aser Abbas, Brady R. Cox, Khiem T. Tran, Isabella Corey, Nishkarsha Dawadi

Abstract: This paper documents a comprehensive subsurface imaging experiment using stress waves in Newberry, Florida, at a site known for significant spatial variability, karstic voids, and underground anomalies. The experiment utilized advanced sensing technologies, including approximately two kilometers of distributed acoustic sensing (DAS) fiber optic cable, forming a dense 2D array of 1920 channels, and a 2D array of 144 three-component nodal stations, to sense active-source and passive-wavefield stress waves. The active-source data was generated using a vibroseis shaker truck and impact sources, and it was simultaneously sensed by both the DAS and the nodal stations. The vibroseis truck was used to excite the ground in the three directions at 260 locations inside and outside the instrumented array, while the impact sources were used at 268 locations within the instrumented array. The passive-wavefield data recorded using the nodal stations comprised 48 hours of ambient noise collected over a period of four days in four twelve-hour time blocks. Meanwhile, the passive-wavefield data collected using DAS consisted of four hours of ambient noise recordings. This paper aims to provide a comprehensive overview of the testing site, experiment layout, the DAS and nodal station acquisition parameters, implemented processing steps, and potential use cases of the dataset.
While potential use cases, such as surface wave testing, full waveform inversion, and ambient noise tomography, are discussed relative to example data, the focus of this paper is on documenting this unique dataset rather than on processing the data for detecting anomalies or generating subsurface 2D/3D imaging results. The raw and processed data, along with detailed documentation of the experiment and Python tools to aid in visualizing the DAS dataset, have been archived and made publicly available on DesignSafe under project PRJ-3521.

Submitted 24 May, 2023; originally announced May 2023.

Comments: 33 pages, 12 figures, dataset paper

Showing 1–50 of 357 results for author: Tran, K