-
Semantic Graphs for Syntactic Simplification: A Revisit from the Age of LLM
Authors:
Peiran Yao,
Kostyantyn Guzhva,
Denilson Barbosa
Abstract:
Symbolic sentence meaning representations, such as AMR (Abstract Meaning Representation), provide expressive and structured semantic graphs that act as intermediates that simplify downstream NLP tasks. However, the instruction-following capability of large language models (LLMs) offers a shortcut to effectively solve NLP tasks, questioning the utility of semantic graphs. Meanwhile, recent work has also shown the difficulty of using meaning representations merely as a helpful auxiliary for LLMs. We revisit the position of semantic graphs in syntactic simplification, the task of simplifying sentence structures while preserving their meaning, which requires semantic understanding, and evaluate it on a new complex and natural dataset. The AMR-based method that we propose, AMRS$^3$, demonstrates that state-of-the-art meaning representations can lead to easy-to-implement simplification methods with competitive performance and unique advantages in cost, interpretability, and generalization. With AMRS$^3$ as an anchor, we discover that syntactic simplification is a task where semantic graphs are helpful in LLM prompting. We propose AMRCoC prompting that guides LLMs to emulate graph algorithms for explicit symbolic reasoning on AMR graphs, and show its potential for improving LLMs on semantic-centered tasks like syntactic simplification.
Submitted 4 July, 2024;
originally announced July 2024.
-
Accurate and Nuanced Open-QA Evaluation Through Textual Entailment
Authors:
Peiran Yao,
Denilson Barbosa
Abstract:
Open-domain question answering (Open-QA) is a common task for evaluating large language models (LLMs). However, current Open-QA evaluations are criticized for the ambiguity in questions and the lack of semantic understanding in evaluators. Complex evaluators, powered by foundation models or LLMs and pertaining to semantic equivalence, still deviate from human judgments by a large margin. We propose to study the entailment relations of answers to identify more informative and more general system answers, offering a much closer evaluation to human judgment on both NaturalQuestions and TriviaQA while being learning-free. The entailment-based evaluation we propose allows the assignment of bonus or partial marks by quantifying the inference gap between answers, enabling a nuanced ranking of answer correctness that has higher AUC than current methods.
Submitted 26 May, 2024;
originally announced May 2024.
-
Emergent Ferromagnetism at LaFeO3/SrTiO3 Interface Arising from Strain-induced Spin-State Transition
Authors:
Menglin Zhu,
Joseph Lanier,
Sevim Polat Genlik,
Jose G. Flores,
Victor da Cruz Pinha Barbosa,
Mohit Randeria,
Patrick M. Woodward,
Maryam Ghazisaeidi,
Fengyuan Yang,
Jinwoo Hwang
Abstract:
Creating new interfacial magnetic states with desired functionalities is attractive for fundamental studies and spintronics applications. The emergence of interfacial magnetic phases demands the fabrication of pristine interfaces and the characterization and understanding of atomic structure as well as electronic, magnetic, and orbital degrees of freedom at the interface. Here, we report a novel interfacial insulating ferromagnetic order in antiferromagnetic LaFeO3 grown on SrTiO3, characterized by a combination of electron microscopy and spectroscopy, magnetometry, and density functional theory. The epitaxial strain drives a spin-state disproportionation in the interfacial layer of LaFeO3, which leads to a checkerboard arrangement of low- and high-spin Fe3+ ions inside smaller and larger FeO6 octahedra, respectively. Ferromagnetism at the interface arises from superexchange interactions between the low- and high-spin Fe3+. The detailed understanding of creation of emergent magnetism illustrates the potential of designing and controlling orbital degrees of freedom at the interface to realize novel phases and functionalities for future spin-electronic applications.
Submitted 21 May, 2024;
originally announced May 2024.
-
Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations
Authors:
José Luiz Nunes,
Guilherme F. C. F. Almeida,
Marcelo de Araujo,
Simone D. J. Barbosa
Abstract:
Large language models (LLMs) have taken centre stage in debates on Artificial Intelligence. Yet there remains a gap in how to assess LLMs' conformity to important human values. In this paper, we investigate whether state-of-the-art LLMs, GPT-4 and Claude 2.1 (Gemini Pro and LLAMA 2 did not generate valid results) are moral hypocrites. We employ two research instruments based on the Moral Foundations Theory: (i) the Moral Foundations Questionnaire (MFQ), which investigates which values are considered morally relevant in abstract moral judgements; and (ii) the Moral Foundations Vignettes (MFVs), which evaluate moral cognition in concrete scenarios related to each moral foundation. We characterise conflicts in values between these different abstractions of moral evaluation as hypocrisy. We found that both models displayed reasonable consistency within each instrument compared to humans, but they displayed contradictory and hypocritical behaviour when we compared the abstract values present in the MFQ to the evaluation of concrete moral violations of the MFV.
Submitted 17 May, 2024;
originally announced May 2024.
-
PASO -- Astronomy and Space Situational Awareness in a Dark Sky Destination
Authors:
Domingos Barbosa,
Bruno Coelho,
Miguel Bergano,
Constança Alves,
Alexandre C. M. Correia,
Luís Cupido,
José Freitas,
Luís Gonçalves,
Bruce Grossan,
Anna Guerman,
Allan K. de Almeida Jr.,
Dalmiro Maia,
Bruno Morgado,
João Pandeirada,
Valério Ribeiro,
Gonçalo Rosa,
George Smoot,
Timothée Vaillant,
Thyrso Villela,
Carlos Alexandre Wuensche
Abstract:
The Pampilhosa da Serra Space Observatory (PASO) is located in the center of the continental Portuguese territory, in the heart of a Dark Sky destination certified by the Starlight Foundation (Aldeias do Xisto), and has been an instrumental asset to advance science, education and astrotourism certifications. PASO hosts astronomy and Space Situational Awareness (SSA) activities including a node of the Portuguese Space Surveillance & Tracking (SST) infrastructure network, such as a space radar currently in test phase using the GEM radiotelescope, a double Wide Field of View (WFOV) telescope system, and a EUSST optical sensor telescope. These instruments allow surveillance of satellites and space debris in LEO, MEO and GEO orbits. The WFOV telescope offers spectroscopy capabilities enabling light curve analysis and cosmic sources monitoring. Instruments for Space Weather are being considered for installation to monitor solar activities and expand the range of SSA services.
Submitted 5 April, 2024;
originally announced April 2024.
-
Assessing the dark degeneracy through the gas mass fraction data
Authors:
Dinorah Barbosa,
Rodrigo von Marttens,
Javier Gonzalez,
Jailson Alcaniz
Abstract:
It is well-known that Einstein's equations constrain only the total energy-momentum tensor of the cosmic substratum, without specifying the characteristics of its individual constituents. Consequently, cosmological models featuring distinct decompositions within the dark sector, while sharing identical values for the sum of dark components' energy-momentum tensor, remain indistinguishable when assessed through observables based on distance measurements. Notably, it has been already demonstrated that cosmological models with dynamical descriptions of dark energy, characterized by a time-dependent equation of state (EoS), can always be mapped into a model featuring a decaying vacuum ($w=-1$) coupled with dark matter. We explore the possibility of breaking this degeneracy by using measurements of the gas mass fraction observed in massive and relaxed galaxy clusters. This data is particularly interesting for this purpose because it isolates the matter contribution, possibly allowing the degeneracy breaking. We study the particular case of the $w$CDM model with its interactive counterpart. We compare the results obtained from both descriptions with a non-parametric analysis obtained through Gaussian Process. Even though the degeneracy may be broken from the theoretical point of view, we find that current gas mass fraction data seems to be insufficient for a final conclusion about which approach is favored, even when combined with SNIa, BAO and CMB.
Submitted 18 March, 2024;
originally announced March 2024.
-
Tangent Velocity constraint for orbital maneuvers with Theory of Functional Connections
Authors:
A. K. de Almeida Jr.,
T. Vaillant,
V. M. de Oliveira,
D. Barbosa,
D. Maia,
S. Aljbaae,
B. Coelho,
M. Bergano,
J. Pandeirada,
A. F. B. A. Prado,
A. Guerman,
A. C. M. Correia
Abstract:
Maneuvering a spacecraft in the cislunar space is a complex problem, since it is highly perturbed by the gravitational influence of both the Earth and the Moon, and possibly also the Sun. Trajectories minimizing the needed fuel are generally preferred in order to decrease the mass of the payload. A classical method to constrain maneuvers is mathematically modelling them using the Two Point Boundary Value Problem (TPBVP), defining spacecraft positions at the start and end of the trajectory. Solutions to this problem can then be obtained with optimization techniques like the nonlinear least squares conjugated with the Theory of Functional Connections (TFC) to embed the constraints, which recently became an effective method for deducing orbit transfers. In this paper, we propose a tangential velocity (TV) type of constraints to design orbital maneuvers. We show that the technique presented in this paper can be used to transfer a spacecraft (e.g. from the Earth to the Moon) and perform rendezvous maneuvers (e.g. a swing-by with the Moon). In comparison with the TPBVP, solving the TV constraints via TFC offers several advantages, leading to a significant reduction in computational time. Hence, it proves to be an efficient technique to design these maneuvers.
Submitted 8 January, 2024;
originally announced January 2024.
-
Non-standard Wigner doublets
Authors:
F. A. da Silva Barbosa,
J. M. Hoff da Silva
Abstract:
Guided by a conservative formulation in investigating the physical content of quantum fields, we explore non-standard Wigner classes of particles that could provide the basis for self-interaction models to dark matter. We critically contrast the analysis with long-standing constraints to non-standard Wigner classes in the literature to discuss the model's viability.
Submitted 12 December, 2023;
originally announced December 2023.
-
Selection of powerful radio galaxies with machine learning
Authors:
R. Carvajal,
I. Matute,
J. Afonso,
R. P. Norris,
K. J. Luken,
P. Sánchez-Sáez,
P. A. C. Cunha,
A. Humphrey,
H. Messias,
S. Amarantidis,
D. Barbosa,
H. A. Cruz,
H. Miranda,
A. Paulino-Afonso,
C. Pappalardo
Abstract:
We developed and trained a pipeline of three machine learning (ML) models that can predict which sources are more likely to be an AGN and to be detected in specific radio surveys. Also, it can estimate redshift values for predicted radio-detectable AGNs. These models, which combine predictions from tree-based and gradient-boosting algorithms, have been trained with multi-wavelength data from near-infrared-selected sources in the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX) Spring field. Training, testing, calibration, and validation were carried out in the HETDEX field. Further validation was performed on near-infrared-selected sources in the Stripe 82 field. In the HETDEX validation subset, our pipeline recovers 96% of the initially labelled AGNs and, from AGN candidates, we recover 50% of previously detected radio sources. For Stripe 82, these numbers are 94% and 55%. Compared to random selection, these rates are two and four times better for HETDEX, and 1.2 and 12 times better for Stripe 82. The pipeline can also recover the redshift distribution of these sources with $\sigma_{\mathrm{NMAD}}$ = 0.07 for HETDEX ($\sigma_{\mathrm{NMAD}}$ = 0.09 for Stripe 82) and an outlier fraction of 19% (25% for Stripe 82), compatible with previous results based on broad-band photometry. Feature importance analysis stresses the relevance of near- and mid-infrared colours to select AGNs and identify their radio and redshift nature. Combining different algorithms in ML models shows an improvement in the prediction power of our pipeline over a random selection of sources. Tree-based ML models (in contrast to deep learning techniques) facilitate the analysis of the impact that features have on the predictions. This prediction can give insight into the potential physical interplay between the properties of radio AGNs (e.g. mass of black hole and accretion rate).
Submitted 1 December, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Theory of Functional Connections and Nelder-Mead optimization methods applied in satellite characterization
Authors:
Allan Kardec de Almeida Junior,
Safwan Aljbaae,
Timothée Vaillant,
Jhonathan M. Piñeros,
Bruno Coelho,
Domingos Barbosa,
Miguel Bergano,
João Pandeirada,
Francisco C. Carvalho,
Leonardo B. T. Santos,
Antonio F. B. A. Prado,
Anna Guerman,
Alexandre C. M. Correia
Abstract:
The growing population of man-made objects with the build-up of mega-constellations not only increases the potential danger to all space vehicles and in-space infrastructures (including space observatories), but above all poses a serious threat to astronomy and dark skies. Monitoring of this population requires precise satellite characterization, which is a challenging task that involves analyzing observational data such as position, velocity, and light curves using optimization methods. In this study, we propose and analyze the application of two optimization procedures to determine the parameters associated with the dynamics of a satellite: one based on the Theory of Functional Connections (TFC) and another one based on the Nelder-Mead heuristic optimization algorithm. The TFC performs linear functional interpolation to embed the constraints of the problem into a functional. In this paper, we propose to use this functional to analytically embed the observational data of a satellite into its equations of dynamics. After that, any solution will always satisfy the observational data. The second procedure proposed in this research takes advantage of the Nelder-Mead algorithm, which does not require the gradient of the objective function, as an alternative solution. The accuracy, efficiency, and dependency on the initial guess of each method are investigated, analyzed, and compared for several dynamical models. These methods can be used to obtain the physical parameters of a satellite from available observational data and for space debris characterization, contributing to follow-up monitoring activities in space and astronomical observatories.
Submitted 21 July, 2023;
originally announced July 2023.
-
Tukey reducibility for categories -- In search of the strongest statement in finite Ramsey theory
Authors:
Keegan Dasilva Barbosa,
Dragan Mašulović
Abstract:
Every statement of the Ramsey theory of finite structures corresponds to the fact that a particular category has the Ramsey property. We can, then, compare the strength of Ramsey statements by comparing the ``Ramsey strength'' of the corresponding categories. The main thesis of this paper is that establishing pre-adjunctions between pairs of categories is an appropriate way of comparing their ``Ramsey strength''. What comes as a pleasant surprise is that pre-adjunctions generalize the Tukey reducibility in the same way categories generalize preorders. In this paper we set forth a classification program of statements of finite Ramsey theory based on their relationship with respect to this generalized notion of Tukey reducibility for categories. After identifying the ``weakest'' Ramsey category, we prove that the Finite Dual Ramsey Theorem is as powerful as the full-blown version of the Graham-Rothschild Theorem, and conclude the paper with the hypothesis that the Finite Dual Ramsey Theorem is the ``strongest'' of all finite Ramsey statements.
Submitted 21 August, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Relational Extraction on Wikipedia Tables using Convolutional and Memory Networks
Authors:
Arif Shahriar,
Rohan Saha,
Denilson Barbosa
Abstract:
Relation extraction (RE) is the task of extracting relations between entities in text. Most RE methods extract relations from free-form running text and leave out other rich data sources, such as tables. We explore RE from the perspective of applying neural methods on tabularly organized data. We introduce a new model consisting of Convolutional Neural Network (CNN) and Bidirectional-Long Short Term Memory (BiLSTM) network to encode entities and learn dependencies among them, respectively. We evaluate our model on a large and recent dataset and compare results with previous neural methods. Experimental results show that our model consistently outperforms the previous model for the task of relation extraction on tabular data. We perform comprehensive error analyses and ablation study to show the contribution of various components of our model. Finally, we discuss the usefulness and trade-offs of our approach, and provide suggestions for fostering further research.
Submitted 11 July, 2023;
originally announced July 2023.
-
Spectral Analysis of Marine Debris in Simulated and Observed Sentinel-2/MSI Images using Unsupervised Classification
Authors:
Bianca Matos de Barros,
Douglas Galimberti Barbosa,
Cristiano Lima Hackmann
Abstract:
Marine litter poses significant threats to marine and coastal environments, with its impacts ever-growing. Remote sensing provides an advantageous supplement to traditional mitigation techniques, such as local cleaning operations and trawl net surveys, due to its capabilities for extensive coverage and frequent observation. In this study, we used Radiative Transfer Model (RTM) simulated data and data from the Multispectral Instrument (MSI) of the Sentinel-2 mission in combination with machine learning algorithms. Our aim was to study the spectral behavior of marine plastic pollution and evaluate the applicability of RTMs within this research area. The results from the exploratory analysis and unsupervised classification using the KMeans algorithm indicate that the spectral behavior of pollutants is influenced by factors such as the type of polymer and pixel coverage percentage. The findings also reveal spectral characteristics and trends of association and differentiation among elements. The applied methodology is strongly dependent on the data, and if reapplied in new, more diverse, and detailed datasets, it can potentially generate even better results. These insights can guide future research in remote sensing applications for detecting marine plastic pollution.
Submitted 26 June, 2023;
originally announced June 2023.
-
Box Ramsey and Canonical Colourings
Authors:
Keegan Dasilva Barbosa
Abstract:
This paper introduces the concept of a productive notion of big Ramsey degree and showcases its versatility through a handful of applications. The main focus is notably providing sufficient conditions for the existence of a finite canonical basis of equivalence relations, building upon the prior work of Laflamme, Sauer, and Vuksanovic. Additionally, a combinatorial analysis of indexed structures is conducted.
Submitted 13 June, 2023;
originally announced June 2023.
-
A Systematic Mapping Study and Practitioner Insights on the Use of Software Engineering Practices to Develop MVPs
Authors:
Silvio Alonso,
Marcos Kalinowski,
Bruna Ferreira,
Simone D. J. Barbosa,
Helio Lopes
Abstract:
[Background] The MVP concept has influenced the way in which development teams apply Software Engineering practices. However, the overall understanding of this influence of MVPs on SE practices is still poor. [Objective] Our goal is to characterize the publication landscape on practices that have been used in the context of software MVPs and to gather practitioner insights on the identified practices. [Method] We conducted a systematic mapping study and discussed its results in two focus group sessions involving twelve industry practitioners who extensively use MVPs in their projects to capture their perceptions on the findings of the mapping study. [Results] We identified 33 papers published between 2013 and 2020 and observed some trends related to MVP ideation and evaluation practices. For instance, regarding ideation, we found six different approaches and mainly informal end-user involvement practices. Regarding evaluation, there is an emphasis on end-user validations based on practices such as usability tests, A/B testing, and usage data analysis. However, there is still limited research related to MVP technical feasibility assessment and effort estimation. Practitioners of the focus group sessions reinforced the confidence in our results regarding ideation and evaluation practices, being aware of most of the identified practices. They also reported how they deal with the technical feasibility assessments and effort estimation in practice. [Conclusion] Our analysis suggests that there are opportunities for solution proposals and evaluation studies to address literature gaps concerning technical feasibility assessment and effort estimation. Overall, more effort needs to be invested into empirically evaluating the existing MVP-related practices.
Submitted 14 May, 2023;
originally announced May 2023.
-
A short note on the characterization of countable chains with finite big Ramsey spectra
Authors:
Keegan Dasilva Barbosa,
Dragan Mašulović,
Rajko Nenadov
Abstract:
In this short note we confirm the deep structural correspondence between the complexity of a countable scattered chain (= strict linear order) and its big Ramsey combinatorics: we show that a countable scattered chain has finite big Ramsey degrees if and only if it is of finite Hausdorff rank. This also provides a complete characterization of countable chains whose big Ramsey spectra are finite.
We expand the notion of big Ramsey spectrum to monomorphic structures and give a sufficient condition for a monomorphic countable structure to have finite big Ramsey spectrum.
Submitted 22 July, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Human-AI Co-Creation Approach to Find Forever Chemicals Replacements
Authors:
Juliana Jansen Ferreira,
Vinícius Segura,
Joana G. R. Souza,
Gabriel D. J. Barbosa,
João Gallas,
Renato Cerqueira,
Dmitry Zubarev
Abstract:
Generative models are a powerful tool in AI for material discovery. We are designing a software framework that supports a human-AI co-creation process to accelerate finding replacements for the ``forever chemicals'', chemicals that enable our modern lives but are harmful to the environment and human health. Our approach combines AI capabilities with the domain-specific tacit knowledge of subject matter experts to accelerate material discovery. Our co-creation process starts with the interaction between the subject matter experts and a generative model that can generate new molecule designs. In this position paper, we discuss our hypothesis that these subject matter experts can benefit from a more iterative interaction with the generative model, asking for smaller samples and ``guiding'' the exploration of the discovery space with their knowledge.
Submitted 11 April, 2023;
originally announced April 2023.
-
NLP Workbench: Efficient and Extensible Integration of State-of-the-art Text Mining Tools
Authors:
Peiran Yao,
Matej Kosmajac,
Abeer Waheed,
Kostyantyn Guzhva,
Natalie Hervieux,
Denilson Barbosa
Abstract:
NLP Workbench is a web-based platform for text mining that allows non-expert users to obtain semantic understanding of large-scale corpora using state-of-the-art text mining models. The platform is built upon latest pre-trained models and open source systems from academia that provide semantic analysis functionalities, including but not limited to entity linking, sentiment analysis, semantic parsing, and relation extraction. Its extensible design enables researchers and developers to smoothly replace an existing model or integrate a new one. To improve efficiency, we employ a microservice architecture that facilitates allocation of acceleration hardware and parallelization of computation. This paper presents the architecture of NLP Workbench and discusses the challenges we faced in designing it. We also discuss diverse use cases of NLP Workbench and the benefits of using it over other approaches. The platform is under active development, with its source code released under the MIT license. A website and a short video demonstrating our platform are also available.
Submitted 2 March, 2023;
originally announced March 2023.
-
Machine Learning based tool for CMS RPC currents quality monitoring
Authors:
E. Shumka,
A. Samalan,
M. Tytgat,
M. El Sawy,
G. A. Alves,
F. Marujo,
E. A. Coelho,
E. M. Da Costa,
H. Nogima,
A. Santoro,
S. Fonseca De Souza,
D. De Jesus Damiao,
M. Thiel,
K. Mota Amarilo,
M. Barroso Ferreira Filho,
A. Aleksandrov,
R. Hadjiiska,
P. Iaydjiev,
M. Rodozov,
M. Shopova,
G. Soultanov,
A. Dimitrov,
L. Litov,
B. Pavlov,
P. Petkov
, et al. (83 additional authors not shown)
Abstract:
The muon system of the CERN Compact Muon Solenoid (CMS) experiment includes more than a thousand Resistive Plate Chambers (RPC). They are gaseous detectors operated in the hostile environment of the CMS underground cavern on the Large Hadron Collider where pp luminosities of up to $2\times 10^{34}$ $\text{cm}^{-2}\text{s}^{-1}$ are routinely achieved. The CMS RPC system performance is constantly monitored and the detector is regularly maintained to ensure stable operation. The main monitorable characteristics are dark current, efficiency for muon detection, noise rate etc. Herein we describe an automated tool for CMS RPC current monitoring which uses Machine Learning techniques. We further elaborate on the dedicated generalized linear model proposed already and add autoencoder models for self-consistent predictions as well as hybrid models to allow for RPC current predictions in a distant future.
Submitted 6 February, 2023;
originally announced February 2023.
-
RPC based tracking system at CERN GIF++ facility
Authors:
K. Mota Amarilo,
A. Samalan,
M. Tytgat,
M. El Sawy,
G. A. Alves,
F. Marujo,
E. A. Coelho,
E. M. Da Costa,
H. Nogima,
A. Santoro,
S. Fonseca De Souza,
D. De Jesus Damiao,
M. Thiel,
M. Barroso Ferreira Filho,
A. Aleksandrov,
R. Hadjiiska,
P. Iaydjiev,
M. Rodozov,
M. Shopova,
G. Soultanov,
A. Dimitrov,
L. Litov,
B. Pavlov,
P. Petkov,
A. Petrov
, et al. (83 additional authors not shown)
Abstract:
With the HL-LHC upgrade of the LHC machine, an increase of the instantaneous luminosity by a factor of five is expected and the current detection systems need to be validated for such working conditions to ensure stable data taking. At the CERN Gamma Irradiation Facility (GIF++) many muon detectors undergo such studies, but the high gamma background can pose a challenge to the muon trigger system, which is exposed to many fake hits from the gamma background. A tracking system using RPCs is implemented to clean the fake hits, taking advantage of the high muon efficiency of these chambers. This work presents the tracking system configuration, the detector analysis algorithm used, and the results.
Submitted 29 November, 2022;
originally announced November 2022.
-
Lessons Learned to Improve the UX Practices in Agile Projects Involving Data Science and Process Automation
Authors:
Bruna Ferreira,
Silvio Marques,
Marcos Kalinowski,
Helio Lopes,
Simone D. J. Barbosa
Abstract:
Context: User-Centered Design and Agile methodologies focus on human issues. Nevertheless, agile methodologies focus on contact with contracting customers and generating value for them. Usually, the communication between end users and the agile team is mediated by customers. However, they do not know the problems end users face in their routines. Hence, UX issues are typically identified only after the implementation, during user testing and validation. Objective: Aiming to improve the understanding and definition of the problem in agile projects, this research investigates the practices and difficulties experienced by agile teams during the development of data science and process automation projects. Also, we analyze the benefits and the teams' perceptions regarding user participation in these projects. Method: We collected data from four agile teams in an academia-industry collaboration focusing on delivering data science and process automation solutions. Therefore, we applied a carefully designed questionnaire answered by developers, scrum masters, and UX designers. In total, 18 subjects answered the questionnaire. Results: From the results, we identify practices used by the teams to define and understand the problem and to represent the solution. The practices most often used are prototypes and meetings with stakeholders. Another practice that helped the team to understand the problem was using Lean Inceptions. Also, our results present some specific issues regarding data science projects. Conclusion: We observed that end-user participation can be critical to understanding and defining the problem. They help to define elements of the domain and barriers in the implementation. We identified a need for approaches that facilitate user-team communication in data science projects and the need for more detailed requirements representations to support data science solutions.
Submitted 24 November, 2022;
originally announced November 2022.
-
Developing a data fusion concept for radar and optical ground based SST station
Authors:
Bruno Coelho,
Domingos Barbosa,
Miguel Bergano,
João Pandeirada,
Paulo Marques,
Alexandre C. M. Correia,
José Matias de Freitas
Abstract:
As part of the Portuguese Space Surveillance and Tracking (SST) program, a tracking radar and a double Wide Field of View Telescope system (4.3° x 2.3°) are being installed at the Pampilhosa da Serra Space Observatory (PASO) in the centre of continental Portugal, complementing an already installed deployable optical sensor for MEO and GEO surveillance. The tracking radar will track space debris in Low Earth Orbit (LEO) up to 1000 km and at the same time the telescope will also have LEO tracking capabilities. This article intends to discuss possible ways to take advantage of having these two sensors at the same location. Using both types of sensors takes advantage of the radar measurements which give precise radial velocity and distance to the objects, while the telescope gives better sky coordinates measurements. With the installation of radar and optical sensors, PASO can extend observation time of space debris and correlate information from optical and radar provenances in real time. During twilight periods both sensors can be used simultaneously to rapidly compute new TLEs for LEO objects, eliminating the time delays involved in data exchange between sites in a large SST network. This concept will not replace the need for a SST network with sensors in multiple locations around the globe, but will provide a more complete set of measurements from a given object passage, and therefore increase the added value for initial orbit determination, or monitoring of reentry campaigns of a given location. PASO will contribute to the development of new solutions to better characterize the objects improving the overall SST capabilities and constitute a perfect site for the development and testing of new radar and optical data fusion algorithms and techniques for space debris monitoring.
Submitted 8 November, 2022;
originally announced November 2022.
-
ATLAS: Deployment, Control Platform and First RSO Measurements
Authors:
João Pandeirada,
Miguel Bergano,
Paulo Marques,
Domingos Barbosa,
Bruno Coelho,
José Freitas,
Domingos Nunes
Abstract:
The ever-increasing dependence of modern societies on space-based services results in a rising number of objects in orbit, which increases the probability of collisions between them. The increase in space debris is a threat to space assets and space-based operations, and has led to a common effort to develop programs for dealing with it. As part of the Portuguese Space Surveillance and Tracking (SST) project, led by the Portuguese Ministry of Defense (MoD), Instituto de Telecomunicações (IT) is developing the rAdio TeLescope pAmpilhosa Serra (ATLAS), a new monostatic radar tracking sensor located at the Pampilhosa da Serra Space Observatory (PASO), Portugal. The system operates at 5.56 GHz and aims to provide information on objects in low Earth orbit (LEO) with cross sections above 10 cm$^2$ at 1000~km. The sensor is tasked by the Portuguese Network Operations Center (NOC), located in the Azores, which interfaces with the EU-SST network.
Submitted 30 June, 2023; v1 submitted 7 November, 2022;
originally announced November 2022.
-
Probing a $\mathrm{Z}^{\prime}$ with non-universal fermion couplings through top quark fusion, decays to bottom quarks, and machine learning techniques
Authors:
Diego Barbosa,
Felipe Díaz,
Liliana Quintero,
Andrés Flórez,
Manuel Sanchez,
Alfredo Gurrola,
Elijah Sheridan,
Francesco Romeo
Abstract:
The production of heavy mass resonances has been widely studied theoretically and experimentally. Several extensions of the standard model (SM) of particle physics, naturally give rise to a new resonance, with neutral electric charge, commonly referred to as the $\textrm{Z}^{\prime}$ boson. The nature, mass, couplings, and associated quantum numbers of this hypothetical particle are yet to be determined. We present a feasibility study on the production of a vector like $\textrm{Z}^{\prime}$ boson at the LHC, with preferential couplings to third generation fermions, considering proton-proton collisions at $\sqrt{s} = 13$ $\mathrm{TeV}$ and 14 TeV. We work under two simplified phenomenological frameworks where the $\mathrm{Z}^{\prime}$ masses and couplings to the SM particles are free parameters, and consider final states of the $\textrm{Z}^{\prime}$ decaying to a pair of $\mathrm{b}$ quarks. The analysis is performed using machine learning techniques in order to maximize the experimental sensitivity. The proposed search methodology can be a key mode for discovery, complementary to the existing search strategies considered in literature, and extends the LHC sensitivity to the $\mathrm{Z}^{\prime}$ parameter space.
Submitted 27 October, 2022;
originally announced October 2022.
-
IRJIT: A Simple, Online, Information Retrieval Approach for Just-In-Time Software Defect Prediction
Authors:
Hareem Sahar,
Abdul Ali Bangash,
Abram Hindle,
Denilson Barbosa
Abstract:
Just-in-Time software defect prediction (JIT-SDP) prevents the introduction of defects into the software by identifying them at commit check-in time. Current software defect prediction approaches rely on manually crafted features such as change metrics and involve expensive to train machine learning or deep learning models. These models typically involve extensive training processes that may require significant computational resources and time. These characteristics can pose challenges when attempting to update the models in real-time as new examples become available, potentially impacting their suitability for fast online defect prediction. Furthermore, the reliance on a complex underlying model makes these approaches often less explainable, which means the developers cannot understand the reasons behind models' predictions. An approach that is not explainable might not be adopted in real-life development environments because of developers' lack of trust in its results. To address these limitations, we propose an approach called IRJIT that employs information retrieval on source code and labels new commits as buggy or clean based on their similarity to past buggy or clean commits. IRJIT approach is online and explainable as it can learn from new data without expensive retraining, and developers can see the documents that support a prediction, providing additional context. By evaluating 10 open-source datasets in a within project setting, we show that our approach is up to 112 times faster than the state-of-the-art ML and DL approaches, offers explainability at the commit and line level, and has comparable performance to the state-of-the-art.
Submitted 12 June, 2024; v1 submitted 5 October, 2022;
originally announced October 2022.
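The retrieval idea described in the IRJIT abstract above (label a new commit by its similarity to past buggy or clean commits, and show the supporting documents) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words tokenizer, cosine similarity, majority vote, and the names `CommitIndex`, `add`, and `predict` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: a plain identifier-level tokenizer stands in for real source-code indexing.
    return re.findall(r"[A-Za-z_]\w*", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CommitIndex:
    """Hypothetical online index of past commits with known labels."""

    def __init__(self):
        self.docs = []  # list of (term counts, label, original text)

    def add(self, diff_text, label):
        # Learning from a new example is just an append; nothing is retrained.
        self.docs.append((Counter(tokenize(diff_text)), label, diff_text))

    def predict(self, diff_text, k=3):
        if not self.docs:
            raise ValueError("index is empty")
        query = Counter(tokenize(diff_text))
        ranked = sorted(self.docs, key=lambda d: cosine(query, d[0]), reverse=True)[:k]
        votes = Counter(label for _, label, _ in ranked)
        # Return the majority label plus the retrieved commits that support it.
        return votes.most_common(1)[0][0], [text for _, _, text in ranked]


index = CommitIndex()
index.add("if ptr == null return ptr.value", "buggy")
index.add("free(buffer); use(buffer)", "buggy")
index.add("update readme docs typo", "clean")
index.add("rename variable count to total", "clean")

label, evidence = index.predict("fix typo in readme docs", k=1)
```

Returning the retrieved commits alongside the label is what makes this kind of prediction inspectable: a developer can read the past changes that drove the decision.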
-
Gender Representation in Brazilian Computer Science Conferences
Authors:
Natália Dal Pizzol,
Eduardo Dos Santos Barbosa,
Soraia Raupp Musse
Abstract:
This study presents an automated bibliometric analysis of 6569 research papers published in thirteen Brazilian Computer Science Society (SBC) conferences from 1999 to 2021. Our primary goal was to gather data to understand the gender representation in publications in the field of Computer Science. We applied a systematic assignment of gender to 23,573 listed paper authorships, finding that the gender gap for women is significant, with female authors being under-represented in all years of the study.
Submitted 23 August, 2022;
originally announced August 2022.
-
One-parameter dynamical dark-energy from the generalized Chaplygin gas
Authors:
Rodrigo von Marttens,
Dinorah Barbosa,
Jailson Alcaniz
Abstract:
The fact that Einstein's equations connect the space-time geometry to the total matter content of the cosmic substratum, but not to individual contributions of the matter species, can be translated into a degeneracy in the cosmological dark sector. Such degeneracy makes it impossible to distinguish cases where dark energy (DE) interacts with dark matter (DM) from a dynamical non-interacting scenario using observational data based only on time or distance measurements. In this paper, based on the non-adiabatic generalized Chaplygin gas (gCg) model, we derive and study some cosmological consequences of a varying one-parameter dynamical DE parameterization, which does not allow phantom crossing. We perform a parameter selection using the most recent publicly available data, such as the data from Planck 2018, eBOSS DR16, Pantheon and KiDS-1000. We find that current observations provide strong constraints on the model parameters, leading to values very close to the $\Lambda$CDM cosmology, at the same time that the well-known $\sigma_8$ tension is reduced from $\sim 3\sigma$ to $\sim 1\sigma$ level.
Submitted 12 August, 2022;
originally announced August 2022.
-
A note on higher rank descriptions of massless and massive spin-1 particles
Authors:
D. Dalmazi,
F. A. da Silva Barbosa,
A. L. R. dos Santos
Abstract:
The Maxwell theory can be written as a first order model with the help of a two-form auxiliary field; such a master action allows the proof of duality between $1$-form and $D-3$ forms. Here we show that the replacement of the two-form auxiliary field by an arbitrary (non symmetric) rank-2 tensor leads to a new massless spin-1 dual theory in terms of a partially antisymmetric rank-3 tensor. In the massive spin-1 case we have a non symmetric generalization of the massive two-form theory (Kalb-Ramond). The coupling of the massive non symmetric spin-1 model to matter fields is investigated via master actions.
We also show that massive models with severe discontinuity in their massless limit can also be obtained from Kaluza-Klein dimensional reduction of massless higher rank tensors which become Stueckelberg fields after the reduction.
Submitted 12 October, 2022; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Canonical analysis of Kalb-Ramond-Proca duality
Authors:
F. A. da Silva Barbosa
Abstract:
It is shown that the canonical quantization of the free massive Kalb-Ramond and Curtright-Freund Lagrangians leads to the same theory obtained from the canonical quantization of the free Proca and Klein-Gordon Lagrangians. The duality in the presence of interaction is explored in the context of the Feynman rules and beyond. It is pointed out that the equivalence between massive dual models without gauge symmetry is rooted in an ambiguity of coordinate choices.
Submitted 10 June, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
Atomic memory based on recoil-induced resonances
Authors:
Juan Carlos Chaves Capella,
Alvaro Mitchell Galvao de Melo,
Jesus Pavon Lopez,
Jose Wellington Rocha Tabosa,
Daniel Felinto Pires Barbosa
Abstract:
In this work we perform a detailed theoretical and experimental investigation of an atomic memory based on recoil-induced resonance in cold cesium atoms. We consider the interaction of nearly degenerate pump and probe beams with an ensemble of two-level atoms. A full theoretical density matrix calculation in the extended Hilbert space of the internal and external atomic degrees of freedom allows us to obtain, from first principles, the transient and stationary responses determining the probe transmission and the forward four-wave mixing spectra. These two signals are generated together at the same order of perturbation with respect to the intensities of pump and probe beams. However, during continuous excitation of the sample, they are detected in very different ways and the signal at the probe transmission appears to be considerably larger, being the main focus of investigation prior to this work. Moreover, we have investigated the storage of optical information in the atomic external degrees of freedom, which provided a simple interpretation for the previously-reported non-volatile character of this memory. The retrieved signals after storage reveal the equivalent role of probe transmission and four-wave mixing, as the two signals have similar amplitudes. Probe transmission and forward four-wave-mixing spectra were then experimentally measured for both continuous excitation and after storage. The experimental observations are in good agreement with the developed theory and open a new pathway for the reversible exchange of optical information with atomic systems.
Submitted 17 June, 2022; v1 submitted 29 December, 2021;
originally announced December 2021.
-
Cyber-Cosmos: A New Citizen Science Concept in A Dark Sky Destination
Authors:
Domingos Barbosa,
Bruno Coelho,
Miguel Bergano,
Catarina Magalhães,
David Mendonça,
Daniela Silva,
Alexandre C. M. Correia,
João Pandeirada,
Valério Ribeiro,
Thomas Esposito,
Franck Marchis
Abstract:
Astrotourism and related citizen science activities are becoming a major trend of a sustainable, high-quality tourism segment, core elements to the protection of dark skies in many countries. In the Summer of 2020, in the middle of the COVID pandemic, we started an initiative to train young students - Cyber-Cosmos - using an Unistellar eVscope, a smart, compact and user-friendly digital telescope that offers unprecedented opportunities for deep-sky observation and citizen science campaigns. Sponsored by the Ciência Viva Summer program, this was probably the first continuous application of this equipment in a pedagogical and citizen-science context, and in a pandemic context. Pampilhosa da Serra, home to a certified Dark Sky destination (Aldeias do Xisto) in central Portugal, was the chosen location for this project, where we expect astrotourism and citizen science to flourish and contribute to space sciences education.
Submitted 3 November, 2021;
originally announced November 2021.
-
A Portuguese radar tracking sensor for Space Debris monitoring
Authors:
João Pandeirada,
Miguel Bergano,
Paulo Marques,
Domingos Barbosa,
Bruno Coelho,
Valério Ribeiro,
José Freitas,
Domingos Nunes,
José Eduardo
Abstract:
The increase in space debris is a threat to space assets and space-based operations, and has led to a common effort to develop programs for dealing with this increase. As part of the Portuguese Space Surveillance and Tracking (SST) project, led by the Portuguese Ministry of Defense (MoD), the Instituto de Telecomunicações (IT) is developing the rAdio TeLescope pAmpilhosa Serra (ATLAS), a new monostatic radar tracking sensor located at the Pampilhosa da Serra Space Observatory (ErPoB), Portugal. The system operates at 5.56 GHz and aims to provide information on objects in low Earth orbit (LEO) with cross sections above 10 cm$^2$ at 1000 km. ErPoB houses all the necessary equipment to connect to the research and development team in IT-Aveiro and to the European Union Space Surveillance and Tracking (EU-SST) network through the Portuguese SST-PT network and operation center. The ATLAS system features digital waveform synthesis, power amplifiers using Gallium Nitride (GaN) technology, fully digital signal processing and a highly modular architecture that follows an Open Systems (OS) philosophy and uses Commercial-Off-The-Shelf (COTS) technologies. ATLAS establishes a modern and versatile platform for fast and easy development, research and innovation. The whole system (except antenna and power amplifiers) was tested in a setup with a major reflector of opportunity at a well defined range. The obtained range profiles show that the target can be easily detected. This marks a major step in the functional testing of the system and brings it closer to an operational system capable of detecting objects in orbit.
Submitted 7 January, 2022; v1 submitted 3 November, 2021;
originally announced November 2021.
-
Exploring New Redshift Indicators for Radio-Powerful AGN
Authors:
Rodrigo Carvajal,
Israel Matute,
José Afonso,
Stergios Amarantidis,
Davi Barbosa,
Pedro Cunha,
Andrew Humphrey
Abstract:
Active Galactic Nuclei (AGN) are relevant sources of radiation that might have helped reionise the Universe during its early epochs. The super-massive black holes (SMBHs) they host helped accrete material and emit large amounts of energy into the medium. Recent studies have shown that, for epochs earlier than $z~{\sim}~5$, the number density of SMBHs is on the order of a few hundred per square degree. Latest observations place this value below $300$ SMBHs at $z~{\gtrsim}~6$ for the full sky. To overcome this gap, it is necessary to detect large numbers of sources at the earliest epochs. Given the large areas needed to detect such quantities, using traditional redshift determination techniques -- spectroscopic and photometric redshift -- is no longer an efficient task. Machine Learning (ML) might help obtain precise redshifts for large samples in a fraction of the time used by other methods. We have developed and implemented an ML model which can predict redshift values for WISE-detected AGN in the HETDEX Spring Field. We obtained a median prediction error of $\sigma_{z}^{N} = 1.48 \times (z_{\mathrm{Predicted}} - z_{\mathrm{True}}) / (1 + z_{\mathrm{True}}) = 0.1162$ and an outlier fraction of $\eta = 11.58 \%$ at $(z_{\mathrm{Predicted}} - z_{\mathrm{True}}) / (1 + z_{\mathrm{True}}) > 0.15$, in line with previous applications of ML to AGN. We also applied the model to data from the Stripe 82 area obtaining a prediction error of $\sigma_{z}^{N} = 0.2501$.
Submitted 1 November, 2021;
originally announced November 2021.
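The two quality metrics quoted in the abstract above can be computed as in the following minimal sketch. It assumes the usual reading of the normalized median absolute deviation, that is, 1.48 times the median of the absolute normalized residuals, and an outlier test on the absolute residual; the abstract states neither the median nor the absolute value explicitly inside the formula, so both are assumptions. The function names and the toy redshift values are hypothetical.

```python
from statistics import median


def normalized_residuals(z_pred, z_true):
    # (z_Predicted - z_True) / (1 + z_True) for each source.
    return [(p - t) / (1.0 + t) for p, t in zip(z_pred, z_true)]


def sigma_nmad(z_pred, z_true):
    # Assumption: median of the absolute normalized residuals, scaled by 1.48.
    residuals = normalized_residuals(z_pred, z_true)
    return 1.48 * median(abs(r) for r in residuals)


def outlier_fraction(z_pred, z_true, threshold=0.15):
    # Fraction of sources whose absolute normalized residual exceeds the threshold.
    residuals = normalized_residuals(z_pred, z_true)
    return sum(abs(r) > threshold for r in residuals) / len(residuals)


# Toy values for illustration only; they are not data from the paper.
z_true = [0.5, 1.0, 2.0, 3.0]
z_pred = [0.5, 1.2, 2.0, 4.0]

sigma = sigma_nmad(z_pred, z_true)
eta = outlier_fraction(z_pred, z_true)
```

Dividing by $1 + z_{\mathrm{True}}$ keeps the error scale comparable across redshift, so a fixed threshold such as 0.15 means the same relative miss for nearby and distant sources.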
-
The weak Ramsey property and extreme amenability
Authors:
Adam Bartoš,
Tristan Bice,
Keegan Dasilva Barbosa,
Wiesław Kubiś
Abstract:
We extend the Kechris--Pestov--Todorčević correspondence to weak Fraïssé categories and automorphism groups of generic objects. The new ingredient is the weak Ramsey property. We demonstrate the theory on several examples including monoid categories, the category of almost linear orders, and categories of strong embeddings of trees.
Submitted 4 October, 2021;
originally announced October 2021.
-
Radio astronomy and Space science in Azores: enhancing the Atlantic VLBI infrastructure cluster
Authors:
Domingos Barbosa,
Bruno Coelho,
Sonia Antón,
Miguel Bergano,
Tjarda Boekholt,
Alexandre C. M. Correia,
Dalmiro Maia,
João Pandeirada,
Valério Ribeiro,
Jason Adams,
João Paulo Barraca,
Diogo Gomes,
Bruno Morgado
Abstract:
Radio astronomy and Space Infrastructures in the Azores have a great scientific and industrial interest because they benefit from a unique geographical location in the middle of the North Atlantic allowing a vast improvement in the sky coverage. This fact obviously has a very high added value: i) for the establishment of space tracking and communications networks for the emergent global small satellite fleets, ii) it is invaluable to connect the radio astronomy infrastructure networks in the African, European and American continents using Very Long Baseline Interferometry (VLBI) techniques, iii) it offers excellent potential for monitoring space debris and Near Earth Objects (NEOs). There is on S. Miguel island a 32-metre SATCOM antenna that could be integrated in advanced VLBI networks and be capable of additional Deep Space Network ground support. This paper explores the space science opportunities offered by the upgrade of the S. Miguel 32-metre SATCOM antenna into a world-class infrastructure for radio astronomy and space exploration: it would enable a Deep Space Network mode and would constitute a key space facility for data production, promoting local digital infrastructure investments and the testing of cutting-edge information technologies. Its Atlantic location also enables improvements in angular resolution and provides many baselines in East-West and North-South directions connecting the emergent VLBI stations in America to the European and African VLBI arrays, therefore contributing to greater array imaging capabilities, especially for sources or well-studied fields close to or below the celestial equator, where ESO facilities, ALMA, SKA and its precursors do or will operate and observe in the coming decades.
Submitted 7 July, 2021;
originally announced July 2021.
-
New SST Optical Sensor of Pampilhosa da Serra: studies on image processing algorithms and multi-filter characterization of Space Debris
Authors:
Bruno Coelho,
Domingos Barbosa,
Miguel Bergano,
A. C. M. Correia,
José Freitas,
Paulo Marques,
João Pandeirada,
Valério Ribeiro
Abstract:
As part of the Portuguese Space Surveillance and Tracking (SST) System, two new Wide Field of View (2.3deg x 2.3deg) small aperture (30cm) telescopes will be deployed in 2021 at the Pampilhosa da Serra Space Observatory (PASO), located in the center of the continental Portuguese territory, in the heart of a certified Dark Sky area. These optical systems will provide added-value capabilities to the Portuguese SST network, complementing the optical telescopes currently in commissioning in Madeira and the Azores. These telescopes are optimized for GEO and MEO survey operations and, besides the required SST operational capability, they will also provide an important development component to the Portuguese SST network. The telescopes will be equipped with filter wheels and will be able to perform observations in several optical bands, including white light, BVRI bands and narrow band filters such as H(alpha) and O[III], to study the potentially different albedos of objects. This configuration enables us to conduct a study on space debris classification/characterization using combinations of different colors, aiming at the production of improved color index schemes to be incorporated in the automatic pipelines for classification of space debris. This optical sensor will also be used to conduct studies on image processing algorithms, including source extraction and classification solutions through the application of machine learning techniques. Since SST dedicated telescopes produce a large quantity of data per observation night, fast, efficient and automatic image processing techniques are mandatory. A platform like this one, dedicated to the development of Space Surveillance studies, will add a critical capability to keep the Portuguese SST network updated, and as a consequence it may provide useful developments to the European SST network as well.
Submitted 5 July, 2021;
originally announced July 2021.
-
Design of pulsed waveforms for space debris detection with ATLAS
Authors:
João Pandeirada,
Miguel Bergano,
Paulo Marques,
Domingos Barbosa,
José Freitas,
Bruno Coelho,
Valério Ribeiro
Abstract:
ATLAS is the first Portuguese radar system that aims to detect space debris. The article introduces the system and provides a brief description of its capabilities. The system is capable of synthesizing arbitrary amplitude-modulated pulse shapes with a resolution of 10 ns. Given that degree of freedom, we decided to test an amplitude-modulated chirp signal developed by us and a nested Barker code. These waveforms are explained, as well as their advantages and drawbacks for space debris detection. An experimental setup was developed to test the system receiver, and waveforms are processed by digital matched filtering. The experiments test the system using different waveform shapes and noise levels. Experimental results are in agreement with simulation and show that the chirp signal is more resilient to Doppler shifts, has higher range resolution and lower peak-to-sidelobe ratio in comparison with the nested Barker code. Future work to increase detection capabilities is discussed at the end.
Submitted 5 July, 2021;
originally announced July 2021.
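The matched-filtering idea in the abstract above can be illustrated with a short pure-Python sketch. It is not the ATLAS processing chain: the choice of Barker-13 nested with itself is an assumption made only for illustration (the abstract does not state the code lengths used), and the helper names `nest`, `matched_filter` and `peak_and_sidelobe` are hypothetical.

```python
# Illustrative sketch only. Barker-13 nested with itself is an assumed example;
# the abstract does not say which code lengths ATLAS uses.

BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def nest(outer, inner):
    """Build a nested code: each chip of `outer` scales a full copy of `inner`."""
    return [o * i for o in outer for i in inner]


def matched_filter(received, template):
    """Correlate `received` with `template` at every lag (full correlation)."""
    n, m = len(received), len(template)
    out = []
    for lag in range(-(m - 1), n):
        acc = 0
        for k in range(m):
            idx = lag + k
            if 0 <= idx < n:
                acc += received[idx] * template[k]
        out.append(acc)
    return out


def peak_and_sidelobe(code):
    """Return (zero-lag peak, largest sidelobe magnitude) of the autocorrelation."""
    acf = matched_filter(code, code)
    peak_index = len(code) - 1  # position of the zero-lag sample
    peak = acf[peak_index]
    sidelobe = max(abs(v) for i, v in enumerate(acf) if i != peak_index)
    return peak, sidelobe


nested = nest(BARKER_13, BARKER_13)
barker_stats = peak_and_sidelobe(BARKER_13)
nested_stats = peak_and_sidelobe(nested)
```

In this noise-free, zero-Doppler setting the nested code raises the compressed peak from 13 to 169 while its largest sidelobe grows from 1 to 13, so the ratio between peak and sidelobe stays the same as for the plain Barker-13 code.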
-
An empirical evaluation of the usefulness of Tree Kernels for Commit-time Defect Detection in large software systems
Authors:
Hareem Sahar,
Yuxin Liu,
Abram Hindle,
Denilson Barbosa
Abstract:
Defect detection at commit check-in time prevents the introduction of defects into software systems. Current defect detection approaches rely on metric-based models which are not very accurate and whose results are not directly useful for developers. We propose a method to detect bug-inducing commits by comparing the incoming changes with all past commits in the project, considering both those that introduced defects and those that did not. Our method considers individual changes in the commit separately, at the method-level granularity. Doing so helps developers as they are informed of specific methods that need further attention instead of being told that the entire commit is problematic. Our approach represents source code as abstract syntax trees and uses tree kernels to estimate the similarity of the code with previous commits. We experiment with subtree kernels (STK), subset tree kernels (SSTK), and partial tree kernels (PTK). An incoming change is then classified using a K-NN classifier on the past changes. We evaluate our approach on the BigCloneBench benchmark and on the Technical Debt dataset, using the NiCad clone detector as the baseline. Our experiments with the BigCloneBench benchmark show that the tree kernel approach can detect clones with a comparable MAP to that of NiCad. Also, on defect detection with the Technical Debt dataset, tree kernels are at least as effective as NiCad, with MRR, F-score, and Accuracy of 0.87, 0.80, and 0.82 respectively.
Submitted 20 June, 2021;
originally announced June 2021.
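The combination of a tree kernel with a K-NN classifier described in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' implementation: trees are hand-written `(label, child, ...)` tuples rather than parsed source code, the kernel only counts identical complete subtrees (a basic subtree-kernel variant), and the toy trees, labels and function names are all hypothetical.

```python
from collections import Counter


def subtrees(tree, bag=None):
    """Count every complete subtree of a (label, child, ...) tuple tree."""
    if bag is None:
        bag = Counter()
    bag[tree] += 1
    for child in tree[1:]:
        subtrees(child, bag)
    return bag


def subtree_kernel(t1, t2):
    """Number of pairs of identical complete subtrees shared by t1 and t2."""
    b1, b2 = subtrees(t1), subtrees(t2)
    return sum(count * b2[sub] for sub, count in b1.items())


def similarity(t1, t2):
    """Kernel value normalized so that similarity(t, t) equals 1."""
    return subtree_kernel(t1, t2) / (subtree_kernel(t1, t1) * subtree_kernel(t2, t2)) ** 0.5


def knn_label(query, history, k=3):
    """Majority label among the k past changes most similar to `query`."""
    ranked = sorted(history, key=lambda item: similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical method-level ASTs of past changes, labeled by outcome.
history = [
    (("method", ("return", ("name",))), "clean"),
    (("method", ("assign", ("name",), ("literal",)), ("return", ("name",))), "clean"),
    (("method", ("call", ("name",), ("null",))), "buggy"),
    (("method", ("assign", ("name",), ("null",)), ("call", ("name",), ("null",))), "buggy"),
]
query = ("method", ("assign", ("name",), ("literal",)), ("call", ("name",), ("null",)))
prediction = knn_label(query, history, k=3)
```

Because the vote is taken per method-level tree, a sketch like this reports which individual method resembles past bug-inducing changes rather than flagging a whole commit.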
-
Development of the first Portuguese radar tracking sensor for Space Debris
Authors:
João Pandeirada,
Miguel Bergano,
João Neves,
Paulo Marques,
Domingos Barbosa,
Bruno Coelho,
Valério Ribeiro
Abstract:
Currently, space debris represents a threat to satellites and space-based operations, both in orbit and during the launching process. The yearly increase in space debris represents a serious concern to major space agencies, leading to the development of dedicated space programs to deal with this issue. Ground-based radars can detect Earth-orbiting debris down to a few square centimeters and therefore constitute a major building block of a space debris monitoring system. New radar sensors are required in Europe to enhance the capabilities and availability of its small radar network capable of tracking and surveying space objects, and to respond to the debris increase expected from New Space economy activities. This article presents ATLAS, a new tracking radar system for debris detection located in Portugal. It starts with an extensive technical description of all the system components, followed by a study that estimates its future performance. A section dedicated to waveform design is also presented, since the system allows the usage of several types of pulse modulation schemes, such as LFM and phase-coded modulations, while enabling the development and testing of more advanced ones. By presenting an architecture that is highly modular with fully digital signal processing, ATLAS establishes a platform for fast and easy development, research and innovation. The system relies on Commercial-Off-The-Shelf technologies and Open Systems, which is unique among current radar systems.
Submitted 20 February, 2021;
originally announced February 2021.
-
Typing Errors in Factual Knowledge Graphs: Severity and Possible Ways Out
Authors:
Peiran Yao,
Denilson Barbosa
Abstract:
Factual knowledge graphs (KGs) such as DBpedia and Wikidata have served as part of various downstream tasks and are also widely adopted by artificial intelligence research communities as benchmark datasets. However, we found these KGs to be surprisingly noisy. In this study, we question the quality of these KGs, where the typing error rate is estimated to be 27% for coarse-grained types on average, and even 73% for certain fine-grained types. In pursuit of solutions, we propose an active typing error detection algorithm that maximizes the utilization of both gold and noisy labels. We also comprehensively discuss and compare unsupervised, semi-supervised, and supervised paradigms to deal with typing errors in factual KGs. The outcomes of this study provide guidelines for researchers to use noisy factual KGs. To help practitioners deploy the techniques and conduct further research, we published our code and data.
Submitted 3 February, 2021;
originally announced February 2021.
-
Borel Colouring Bad Sequences
Authors:
Keegan Dasilva Barbosa
Abstract:
Every better quasi-order codifies a Borel graph that does not contain a copy of the shift graph. It is known that there is a better quasi-order that codes a Borel graph with infinite Borel chromatic number, though one has yet to be explicitly constructed. In this paper, we show that examples cannot be constructed via standard methods. Moreover, we show that most of the known better quasi-orders are non-examples, suggesting there is still a class of better quasi-orders with interesting combinatorial properties whose members remain unknown.
Submitted 14 January, 2021;
originally announced January 2021.
-
Determining the position of a single spin relative to a metallic nanowire
Authors:
J. F. da Silva Barbosa,
M. Lee,
P. Campagne-Ibarcq,
P. Jamonneau,
Y. Kubo,
S. Pezzagna,
J. Meijer,
T. Teraji,
D. Vion,
D. Esteve,
R. W. Heeres,
P. Bertet
Abstract:
The nanoscale localization of individual paramagnetic defects near an electrical circuit is an important step for realizing hybrid quantum devices with strong spin-microwave photon coupling. Here, we demonstrate the fabrication of an array of individual NV centers in diamond near a metallic nanowire deposited on top of the substrate. We determine the relative position of each NV center with $\sim$10\,nm accuracy, using it as a vector magnetometer to measure the field generated by passing a dc current through the wire.
Submitted 19 November, 2020;
originally announced November 2020.
-
Square Kilometre Array Science Data Challenge 1: analysis and results
Authors:
A. Bonaldi,
T. An,
M. Bruggen,
S. Burkutean,
B. Coelho,
H. Goodarzi,
P. Hartley,
P. K. Sandhu,
C. Wu,
L. Yu,
M. H. Zhoolideh Haghighi,
S. Anton,
Z. Bagheri,
D. Barbosa,
J. P. Barraca,
D. Bartashevich,
M. Bergano,
M. Bonato,
J. Brand,
F. de Gasperin,
A. Giannetti,
R. Dodson,
P. Jain,
S. Jaiswal,
B. Lao, et al. (20 additional authors not shown)
Abstract:
As the largest radio telescope in the world, the Square Kilometre Array (SKA) will lead the next generation of radio astronomy. The feats of engineering required to construct the telescope array will be matched only by the techniques developed to exploit the rich scientific value of the data. To drive forward the development of efficient and accurate analysis methods, we are designing a series of data challenges that will provide the scientific community with high-quality datasets for testing and evaluating new techniques. In this paper we present a description of and results from the first such Science Data Challenge (SDC1). Based on SKA MID continuum simulated observations and covering three frequencies (560 MHz, 1400 MHz and 9200 MHz) at three depths (8 h, 100 h and 1000 h), SDC1 asked participants to apply source detection, characterization and classification methods to simulated data. The challenge opened in November 2018, with nine teams submitting results by the deadline of April 2019. In this work we analyse the results for eight of those teams, showcasing the variety of approaches that can be successfully used to find, characterise and classify sources in a deep, crowded field. The results also demonstrate the importance of building domain knowledge and expertise in this kind of analysis to obtain the best performance. As high-resolution observations begin revealing the true complexity of the sky, one of the outstanding challenges emerging from this analysis is the ability to deal with highly resolved and complex sources as effectively as the unresolved source population.
Submitted 28 September, 2020;
originally announced September 2020.
-
An Annotated Corpus of Webtables for Information Extraction Tasks
Authors:
Erin Macdonald,
Denilson Barbosa
Abstract:
Information Extraction is a well-researched area of Natural Language Processing, with applications in web search and question answering, concerned with identifying entities and relationships between them as expressed in a given context, usually a sentence or a paragraph of running text. Given the importance of the task, several datasets and benchmarks have been curated over the years. However, focusing on running text alone leaves out tables, which are common in many structured documents and in which pairs of entities also co-occur in context (e.g., the same row of the table). While there are recent papers on relation extraction from tables in the literature, their experimental evaluations have been on ad-hoc datasets for lack of a standard benchmark. This paper helps close that gap. We introduce an annotation framework and a dataset of 217,834 tables from Wikipedia which are annotated with 28 relations, using both classifiers and carefully designed queries over a reference knowledge graph. Binary classifiers are then applied to the resulting dataset to remove false positives, resulting in an average annotation accuracy of 94%. The resulting dataset is the first of its kind to be made publicly available.
Submitted 16 November, 2020; v1 submitted 17 August, 2020;
originally announced August 2020.
-
Portuguese SKA White Book
Authors:
Domingos Barbosa,
Sonia Antón,
João Paulo Barraca,
Miguel Bergano,
Alexandre C. M. Correia,
Dalmiro Maia,
Valério A. R. M. Ribeiro
Abstract:
This white book stems from the contributions presented at the Portuguese SKA Days, held on the 6th and 7th February 2018 with the presence of the SKA Deputy Director General Alistair McPherson and the SKA Science Director Robert Braun. This initiative was held to promote the Square Kilometre Array (SKA) - the world's largest radio telescope - among the Portuguese scientific and business communities, with support from the Portuguese Science and Technology Foundation (FCT) and with the contribution of Portuguese policy makers and researchers. The meeting was very successful in providing a detailed overview of the SKA status, vision and goals, and this white book describes most of the Portuguese contributions to science, technology and the related industry aspirations.
Submitted 3 May, 2020;
originally announced May 2020.
-
A Decomposition Theorem for Aronszajn Lines
Authors:
Keegan Dasilva Barbosa
Abstract:
We show that under the proper forcing axiom the class of all Aronszajn lines behaves like $σ$-scattered orders under the embeddability relation. In particular, we are able to show that the class of better quasi order labeled fragmented Aronszajn lines is itself a better quasi order. Moreover, we show that every better quasi order labeled Aronszajn line can be expressed as a finite sum of labeled types which are algebraically indecomposable. By encoding lines with finite labeled trees, we are also able to deduce a decomposition result: for every Aronszajn line $L$ there is an integer $n$ such that for any finite colouring of $L$ there is a subset $L^\prime$ of $L$ isomorphic to $L$ which uses no more than $n$ colours.
Submitted 26 March, 2020;
originally announced March 2020.
-
Brazilian Lyrics-Based Music Genre Classification Using a BLSTM Network
Authors:
Raul de Araújo Lima,
Rômulo César Costa de Sousa,
Simone Diniz Junqueira Barbosa,
Hélio Cortês Vieira Lopes
Abstract:
Organizing songs, albums, and artists into groups with shared similarity can be done with the help of genre labels. In this paper, we present a novel approach for automatically classifying musical genre in Brazilian music using only the song lyrics. This kind of classification remains a challenge in the field of Natural Language Processing. We construct a dataset of 138,368 Brazilian song lyrics distributed across 14 genres. We apply SVM, Random Forest and a Bidirectional Long Short-Term Memory (BLSTM) network combined with different word embedding techniques to address this classification task. Our experiments show that the BLSTM method outperforms the other models with an average F1-score of 0.48. Some genres like "gospel", "funk-carioca" and "sertanejo", which obtained F1-scores of 0.89, 0.70 and 0.69, respectively, can be defined as the most distinct and easy to classify in the Brazilian musical genres context.
Submitted 6 March, 2020;
originally announced March 2020.
-
A Categorical Notion of Precompact Expansion
Authors:
Keegan Dasilva Barbosa
Abstract:
We generalize the notion of relational precompact expansions of Fraïssé classes via functorial means, inspired by the technique outlined by Laflamme, Nguyen Van Thé and Sauer in their paper "Partition properties of the dense local order and a colored version of Milliken's theorem" (arXiv:0710.2885). We also generalize the expansion property and prove that categorical precompact expansions grant upper bounds for Ramsey degrees. Moreover, we show that, under strict conditions, we can also compute big Ramsey degrees. We also apply our methodology to calculate the big and little Ramsey degrees of the objects in Age$(\mathbf{S}(n))$ for all $n\geq 2$.
Submitted 26 February, 2020;
originally announced February 2020.
-
VisMaker: a Question-Oriented Visualization Recommender System for Data Exploration
Authors:
Raul de Araújo Lima,
Simone Diniz Junqueira Barbosa
Abstract:
The increasingly rapid growth of data production and the consequent need to explore data to obtain answers to the most varied questions have promoted the development of tools to facilitate the manipulation and construction of data visualizations. However, building useful data visualizations is not a trivial task: it may involve a large number of subtle decisions that require experience from their designer. In this paper, we present VisMaker, a visualization recommender tool that uses a set of rules to present visualization recommendations organized and described through questions, in order to facilitate the understanding of the recommendations and assist the visual exploration process. We carried out two studies comparing our tool with Voyager 2 and analyzed some aspects of the use of the tools. We collected feedback from participants to identify the advantages and disadvantages of our recommendation approach. As a result, we gathered comments to help improve the development of tools in this domain.
Submitted 14 February, 2020;
originally announced February 2020.
-
Knowledge Graph Embedding for Link Prediction: A Comparative Analysis
Authors:
Andrea Rossi,
Donatella Firmani,
Antonio Matinata,
Paolo Merialdo,
Denilson Barbosa
Abstract:
Knowledge Graphs (KGs) have found many applications in industry and academic settings, which, in turn, have motivated considerable research efforts towards large-scale information extraction from a variety of sources. Despite such efforts, it is well known that even state-of-the-art KGs suffer from incompleteness. Link Prediction (LP), the task of predicting missing facts among entities already in a KG, is a promising and widely studied task aimed at addressing KG incompleteness. Among the recent LP techniques, those based on KG embeddings have achieved very promising performances in some benchmarks. Despite the fast-growing literature on the subject, insufficient attention has been paid to the effect of the various design choices in those methods. Moreover, the standard practice in this area is to report accuracy by aggregating over a large number of test facts in which some entities are over-represented; this allows LP methods to exhibit good performance by just attending to structural properties that include such entities, while ignoring the remaining majority of the KG. This analysis provides a comprehensive comparison of embedding-based LP methods, extending the dimensions of analysis beyond what is commonly available in the literature. We experimentally compare effectiveness and efficiency of 16 state-of-the-art methods, consider a rule-based baseline, and report detailed analysis over the most popular benchmarks in the literature.
Submitted 21 January, 2021; v1 submitted 3 February, 2020;
originally announced February 2020.
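The concern raised in the abstract above, that aggregating over test facts lets over-represented entities dominate the reported score, can be made concrete with a small sketch. The ranks and entity names below are invented for illustration and do not come from the paper; `mrr`, `hits_at` and `per_entity_mrr` are hypothetical helper names, and per-entity averaging is just one possible way to expose the effect.

```python
from collections import defaultdict


def mrr(ranks):
    """Mean reciprocal rank over a list of 1-based ranks."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks, k):
    """Fraction of test facts whose correct answer is ranked in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def per_entity_mrr(results):
    """Average the MRR of each entity so every entity counts equally."""
    by_entity = defaultdict(list)
    for entity, rank in results:
        by_entity[entity].append(rank)
    return sum(mrr(ranks) for ranks in by_entity.values()) / len(by_entity)


# Invented test results: (entity to predict, rank of the correct answer).
# One frequent entity is always predicted well; two rare ones are not.
results = [("frequent_entity", 1)] * 8 + [("rare_entity_a", 10), ("rare_entity_b", 10)]

all_ranks = [rank for _, rank in results]
aggregate_mrr = mrr(all_ranks)             # dominated by the frequent entity
balanced_mrr = per_entity_mrr(results)     # each entity weighted equally
aggregate_hits1 = hits_at(all_ranks, 1)
```

On this toy data the aggregate MRR is about 0.82, while the per-entity average is 0.40, which shows how a model can look strong overall while performing poorly on most distinct entities.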