Data

20 pages, 17344 KiB

Open Access Data Descriptor

Multi-Scale Earthquake Damaged Building Feature Set

by Guorui Gao, Futao Wang, Zhenqing Wang, Qing Zhao, Litao Wang, Jinfeng Zhu, Wenliang Liu, Gang Qin and Yanfang Hou

Data 2024, 9(7), 88; https://doi.org/10.3390/data9070088 - 28 Jun 2024

Abstract

Earthquake disasters are marked by their unpredictability and potential for extreme destructiveness. Accurate information on building damage, captured in post-earthquake remote sensing images, is critical for an effective post-disaster emergency response. The foundational features within these images are essential for the accurate extraction of building damage data following seismic events. Presently, the availability of publicly accessible datasets tailored specifically to earthquake-damaged buildings is limited, and existing collections of post-earthquake building damage characteristics are insufficient. To address this gap and foster research advancement in this domain, this paper introduces a new, large-scale, publicly available dataset named the Major Earthquake Damage Building Feature Set (MEDBFS). This dataset comprises image data sourced from five significant global earthquakes and captured by various optical remote sensing satellites, featuring diverse scale characteristics and multiple spatial resolutions. It includes over 7000 images of buildings pre- and post-disaster, each subjected to stringent quality control and expert validation. The images are categorized into three primary groups: intact/slightly damaged, severely damaged, and completely collapsed. This paper develops a comprehensive feature set encompassing five dimensions: spectral, texture, edge detection, building index, and temporal sequencing, resulting in 16 distinct classes of feature images. This dataset is poised to significantly enhance the capabilities for data-driven identification and analysis of earthquake-induced building damage, thereby supporting the advancement of scientific and technological efforts for emergency earthquake response.

(This article belongs to the Section Spatial Data Science and Digital Earth)

23 pages, 9558 KiB

Open Access Data Descriptor

A Point Cloud Dataset of Vehicles Passing through a Toll Station for Use in Training Classification Algorithms

by Alexander Campo-Ramírez, Eduardo F. Caicedo-Bravo and Eval B. Bacca-Cortes

Data 2024, 9(7), 87; https://doi.org/10.3390/data9070087 - 27 Jun 2024

Abstract

This work presents a point cloud dataset of vehicles passing through a toll station in Colombia, intended for training artificial vision and computational intelligence algorithms. This article details the process of creating the dataset, covering initial data acquisition, range information preprocessing, point cloud validation, and vehicle labeling. It also provides a detailed description of the structure and content of the dataset, along with some potential applications. The dataset consists of 36,026 objects divided into 6 classes: 31,432 cars, campers, vans, and 2-axle trucks with a single tire on the rear axle; 452 minibuses with a single tire on the rear axle; 1158 buses; 1179 small 2-axle trucks; 797 large 2-axle trucks; and 1008 trucks with 3 or more axles. The point clouds were captured using a LiDAR sensor and Doppler-effect speed sensors. The dataset can be used to train and evaluate algorithms for range data processing, vehicle classification, vehicle counting, and traffic flow analysis, as well as to develop new applications for intelligent transportation systems.
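
The six class counts above sum to the stated 36,026 objects and are strongly imbalanced, which matters when training a classifier on this data. A quick check in Python follows. The dictionary keys are shorthand labels invented for this sketch, not the dataset's own class identifiers.

```python
# Class counts as quoted in the abstract; the short keys are hypothetical
# labels chosen for this sketch, not the dataset's class identifiers.
CLASS_COUNTS = {
    "car_camper_van_light_truck": 31_432,
    "minibus": 452,
    "bus": 1_158,
    "truck_2axle_small": 1_179,
    "truck_2axle_large": 797,
    "truck_3plus_axles": 1_008,
}


def class_shares(counts):
    """Return each class's fraction of the total object count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(CLASS_COUNTS.values())
shares = class_shares(CLASS_COUNTS)
print(f"total objects: {total}")
for name, share in shares.items():
    print(f"{name:>28}: {share:6.2%}")
# Largest-to-smallest class ratio, a rough measure of imbalance.
print(f"max/min ratio: {max(CLASS_COUNTS.values()) / min(CLASS_COUNTS.values()):.1f}")
```

The first class makes up about 87% of the objects and outnumbers the minibus class roughly 70 to 1, so plain accuracy alone may say little about performance on the rarer vehicle types.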

25 pages, 686 KiB

Open Access Article

Tuning Data Mining Models to Predict Secondary School Academic Performance

by William Hoyos and Isaac Caicedo-Castro

Data 2024, 9(7), 86; https://doi.org/10.3390/data9070086 - 26 Jun 2024

Abstract

In recent years, educational data mining has emerged as a growing discipline focused on developing models for predicting academic performance. The primary objective of this research was to tune classification models to predict academic performance in secondary school. The dataset employed for this study encompassed information from 19,545 high school students. We used descriptive statistics to characterise information contained in personal, school, and socioeconomic variables. We implemented two data mining techniques, namely artificial neural networks (ANN) and support vector machines (SVM). Parameter optimisation was conducted through five-fold cross-validation, and model performance was assessed using accuracy and F1-score. The results indicate a functional dependence between predictor variables and academic performance. Both algorithms achieved an average accuracy above 80%, and ANN outperformed SVM on the dataset analysed. This type of methodology could help educational institutions predict academic underachievement and thus develop strategies to improve students’ academic performance.

(This article belongs to the Special Issue Data Mining and Computational Intelligence for E-Learning and Education—2nd Edition)
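
The five-fold cross-validation used for parameter tuning in the study above can be sketched generically as follows. This is not the authors' code. The names `k_fold_indices`, `tune`, and `evaluate` are hypothetical, and fitting the actual ANN or SVM is left to the caller. With 19,545 students, each fold holds exactly 3,909 records.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return (train_idx, val_idx) pairs for k-fold cross-validation.

    Indices are shuffled once, then split into k folds whose sizes differ
    by at most one; each fold serves exactly once as the validation set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        folds.append((train, val))
        start += size
    return folds


def tune(candidates, evaluate, n_samples, k=5):
    """Pick the candidate parameter set with the best mean validation score.

    `evaluate(params, train_idx, val_idx)` is a hypothetical user-supplied
    function that fits a model on train_idx and scores it on val_idx.
    """
    folds = k_fold_indices(n_samples, k)

    def mean_score(params):
        return sum(evaluate(params, tr, va) for tr, va in folds) / k

    return max(candidates, key=mean_score)


folds = k_fold_indices(19_545, k=5)
print([len(val) for _, val in folds])  # [3909, 3909, 3909, 3909, 3909]
```

Averaging over all five folds reduces the chance that a parameter choice wins only because of one lucky validation split.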

14 pages, 563 KiB

Open Access Data Descriptor

Evaluation of Online Inquiry Competencies of Chilean Elementary School Students: A Dataset

by Luz Chourio-Acevedo and Roberto González-Ibañez

Data 2024, 9(7), 85; https://doi.org/10.3390/data9070085 - 25 Jun 2024

Abstract

In the age of abundant digital content, children and adolescents face the challenge of developing new information literacy competencies, particularly those pertaining to online inquiry, in order to thrive academically and personally. This article addresses the challenge encountered by Chilean students in developing online inquiry competencies (OICs) essential for completing school assignments, particularly in natural science education. A diagnostic study was conducted with 279 elementary school students (from fourth to eighth grade) from four educational institutions in Chile, representing diverse socioeconomic backgrounds. An instrument aligned with the national curriculum, featuring questions related to natural sciences, was administered through a game named NEURONE-Trivia, which integrates a search engine and a logging component to record students’ search behavior. The primary outcome of this study is a dataset comprising demographic information, self-perception, and information-seeking behavior data collected during students’ online search sessions for natural science research tasks. This dataset serves as a valuable resource for researchers, educators, and practitioners interested in investigating the interplay between demographic characteristics, self-perception, and information-seeking behaviors among elementary students within the context of OIC development. It also enables further examination of students’ search behaviors concerning source evaluation, information retrieval, and information utilization.

(This article belongs to the Special Issue Data Mining and Computational Intelligence for E-Learning and Education—2nd Edition)

8 pages, 1164 KiB

Open Access Data Descriptor

Gender Distribution of Scientific Prizes Is Associated with Naming of Awards after Men, Women or Neutral

by Katja Gehmlich and Stefan Krause

Data 2024, 9(7), 84; https://doi.org/10.3390/data9070084 - 25 Jun 2024

Abstract

Women scientists have long been under-represented as recipients of academic prizes. The reasons for this lack of recognition are manifold, including potential gender bias among award panels and in nomination practices. This dataset of the gender distribution of 8747 recipients of 345 scientific medals and prizes, awarded between 1731 and 2021 by 11 General Scientific Societies as well as by subject-specific societies in the Earth and Environmental Sciences and in Cardiology, explores the magnitude, temporal trends, and potential drivers of the observed gender imbalances. Our analysis revealed that women were particularly under-represented among recipients of awards named after men, whereas awards not named after a person, or named after a woman, were more frequently awarded to women scientists. Time-series analysis confirmed persisting trends that have only started to change since the early 2000s, indicating that much remains to be accomplished to achieve true equity. We encourage the scientific community to extend our data and analysis to other under-represented groups and to include nomination information, as such data represent important evidence of how academic achievements are recognized.

20 pages, 995 KiB

Open Access Article

Leveraging Sports Analytics and Association Rule Mining to Uncover Recovery and Economic Impacts in NBA Basketball

by Vangelis Sarlis, George Papageorgiou and Christos Tjortjis

Data 2024, 9(7), 83; https://doi.org/10.3390/data9070083 - 24 Jun 2024

Abstract

This study examines the multifaceted field of injuries and their impacts on performance in the National Basketball Association (NBA), leveraging a blend of Data Science, Data Mining, and Sports Analytics. Our research is driven by three questions. First, we explore how Association Rule Mining can elucidate the complex interplay between players’ salaries, physical attributes, and health conditions and their influence on team performance, including team losses and recovery times. Second, we investigate the relationship between players’ recovery times and their teams’ financial performance, probing interdependencies with players’ salaries and career trajectories. Lastly, we examine how insights gleaned from Data Mining and Sports Analytics on player recovery times and financial influence can inform strategic financial management and salary negotiations in basketball. Harnessing extensive datasets detailing player demographics, injuries, and contracts, we employ advanced analytic techniques to categorize injuries and transform contract data into a format suitable for in-depth analysis. Our anomaly detection methodology, an ensemble of DBSCAN, isolation forest, and Z-score algorithms, highlights patterns and outliers in recovery times, revealing the intricate relationships among player health, performance, and financial outcomes. This nuanced understanding emphasizes the economic stakes of sports injuries. The findings of this study provide a rich, data-driven foundation for teams and stakeholders, advocating for more effective injury management and strategic planning. By addressing these research questions, our work not only contributes to the academic discourse in Sports Analytics but also offers practical frameworks for enhancing player welfare and team financial health, thereby shaping the future of strategic decisions in professional sports.

(This article belongs to the Special Issue Machine Learning and Data Mining in Exercise, Sports and Health Research)
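
Of the three anomaly detectors named in the abstract above, the Z-score member is the simplest. A minimal sketch of it follows. DBSCAN and isolation forest are not shown, and the abstract does not say how the three detectors are combined. The threshold of 3 is a common convention, not a value from the study, and the recovery times are made-up illustrative numbers.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return indices whose absolute Z-score exceeds `threshold`.

    Uses the population standard deviation. The default of 3 is a common
    convention, not a parameter taken from the study.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values, mean)
    if sd == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


# Hypothetical recovery times in days (illustrative numbers, not NBA data).
recovery_days = [14, 9, 21, 12, 7, 15, 10, 18, 11, 13,
                 8, 16, 12, 10, 19, 14, 9, 11, 13, 120]
print(zscore_outliers(recovery_days))  # [19]
```

With the population standard deviation, the largest attainable |z| among n values is √(n−1). For ten or fewer recovery times, a threshold of 3 can therefore never be exceeded. This is one reason to pair the Z-score with density- or isolation-based detectors.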

15 pages, 866 KiB

Open Access Data Descriptor

Hardware Trojan Dataset of RISC-V and Web3 Generated with ChatGPT-4

by Victor Takashi Hayashi and Wilson Vicente Ruggiero

Data 2024, 9(6), 82; https://doi.org/10.3390/data9060082 - 19 Jun 2024

Abstract

Although hardware trojans pose a significant threat to the hardware security of RISC-V and Web3 applications, existing datasets offer a limited set of examples: the best-known hardware trojan dataset, TrustHub, contains 106 different trojans. For RISC-V specifically, existing case studies cover three and four different hardware trojans, and no research was found on Web3 hardware trojans in modules such as hardware wallets. This research presents a dataset of 290 Verilog examples generated with the ChatGPT-4 Large Language Model (LLM), based on 29 golden models and the TrustHub taxonomy. This dataset is expected to support future research on defense mechanisms against hardware trojans in RISC-V, hardware wallets, and hardware Proof of Work (PoW) miners.

(This article belongs to the Section Information Systems and Data Management)

17 pages, 1331 KiB

Open Access Data Descriptor

Beyond the Classroom: An Analysis of Internal and External Factors Related to Students’ Love of Learning and Educational Outcomes

by Charles M. Burke, Lori P. Montross and Vera G. Dianova

Data 2024, 9(6), 81; https://doi.org/10.3390/data9060081 - 16 Jun 2024

Abstract

This study explores the multifaceted factors influencing student learning motivations and educational outcomes. Drawing on a diverse student body at Franklin University Switzerland, the study emphasizes the impact of internal factors, such as the psychological state of flow and a self-reported love of learning, alongside GPA and student cohort influences such as year of study, academic discipline, country of origin, and academic travel. Through a cross-sectional survey of 112 students, the study evaluates how these factors correlate with and diverge from each other and from student GPAs, aiming to dissect the influences of intrinsic motivations, demographic variables, and educational experiences. Our analysis revealed significant correlations between students’ self-reported love of learning, experiences of flow, and academic performance. Conversely, academic travel did not show a significant direct impact, suggesting that while such experiences are enriching, they do not necessarily translate into a greater love of learning, flow, or higher academic achievement in the short term. However, demographic factors, particularly discipline of study and country of origin, significantly influenced students’ love of learning, indicating varied motivational drives across different cultural and educational backgrounds. This study provides valuable insights for educational policymakers and institutions aiming to cultivate more engaging and fulfilling learning environments.

(This article belongs to the Special Issue Data Mining and Computational Intelligence for E-Learning and Education—2nd Edition)

8 pages, 500 KiB

Open Access Data Descriptor

Data for Optimal Estimation of Under-Frequency Load Shedding Scheme Parameters by Considering Virtual Inertia Injection

by Santiago Bustamante-Mesa, Jorge W. Gonzalez-Sanchez, Sergio D. Saldarriaga-Zuluaga, Jesús M. López-Lezama and Nicolás Muñoz-Galeano

Data 2024, 9(6), 80; https://doi.org/10.3390/data9060080 - 13 Jun 2024

Abstract

The data presented in this paper are related to the paper entitled “Optimal Estimation of Under-Frequency Load Shedding Scheme Parameters by Considering Virtual Inertia Injection”, available in the journal Energies. The data show the results of an Under-Frequency Load Shedding (UFLS) scheme that considers the injection of virtual inertia by a VSC-HVDC link. Data are shown for the six cases that were considered and analyzed. Each case represents a different frequency response configuration in the event of a generation loss, taking into account the presence or absence of a VSC-HVDC link, traditional and optimized UFLS schemes, and the injection of virtual inertia by the VSC-HVDC link. The data for each case contain the relay state, threshold, position in every delay, load shed, and relay configuration parameters. The data were obtained through DIgSILENT PowerFactory and Python simulations. The purpose of this dataset is to allow other researchers to reproduce the results reported in our paper.
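
To make the relay fields listed above (threshold, delay, load shed) concrete, here is a minimal sketch of a generic definite-time, multi-stage UFLS relay. It is not the authors' scheme. The stage settings, the 60 Hz nominal frequency, and the `Stage`/`simulate_ufls` names are hypothetical. The frequency trace is supplied open-loop, so shedding does not feed back into the frequency as it would in a PowerFactory simulation.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    threshold_hz: float   # relay picks up when frequency drops below this
    delay_s: float        # time the frequency must stay below threshold
    shed_fraction: float  # fraction of total load disconnected on trip


def simulate_ufls(freq_hz, dt_s, stages):
    """Return ({stage index: trip time in s}, total fraction of load shed).

    `freq_hz` is a pre-computed frequency trace sampled every `dt_s` seconds.
    Each stage trips at most once, after its threshold has been violated
    for `delay_s` worth of consecutive samples.
    """
    needed = [max(1, round(s.delay_s / dt_s)) for s in stages]
    below = [0] * len(stages)
    trips = {}
    for k, f in enumerate(freq_hz):
        for i, s in enumerate(stages):
            if i in trips:
                continue  # a stage operates at most once
            below[i] = below[i] + 1 if f < s.threshold_hz else 0
            if below[i] >= needed[i]:
                trips[i] = k * dt_s
    shed = sum(stages[i].shed_fraction for i in trips)
    return trips, shed


# Hypothetical three-stage scheme for a 60 Hz system.
stages = [Stage(59.3, 0.2, 0.10), Stage(59.0, 0.2, 0.10), Stage(58.7, 0.2, 0.10)]
trace = [60.0, 59.6, 59.2, 59.1, 59.05, 59.1, 59.2, 59.3, 59.4]
trips, shed = simulate_ufls(trace, 0.1, stages)
print(sorted(trips), round(shed, 2))  # [0] 0.1
```

In this made-up trace, only the first stage operates, because the frequency dips to 59.05 Hz and never crosses the second threshold.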

18 pages, 2112 KiB

Open Access Data Descriptor

CrazyPAD: A Dataset for Assessing the Impact of Structural Defects on Nano-Quadcopter Performance

by Kamil Masalimov, Tagir Muslimov, Evgeny Kozlov and Rustem Munasypov

Data 2024, 9(6), 79; https://doi.org/10.3390/data9060079 - 13 Jun 2024

Abstract

This article presents a novel dataset focused on structural damage in quadcopters, addressing a significant gap in unmanned aerial vehicle (UAV or drone) research. The dataset is called CrazyPAD (Crazyflie Propeller Anomaly Data), after the Crazyflie 2.1 nano-quadcopter used to collect the data. Although datasets on UAV anomalies and behavior exist, none of them covers structural damage specifically in nano-quadcopters. Our dataset therefore provides critical data for developing predictive models for defect detection in nano-quadcopters. This work details the data collection methodology, which involves rigorous simulations of structural damage and its effects on UAV performance. The ultimate goal is to enhance UAV safety by enabling accurate defect diagnosis and predictive maintenance, contributing substantially to the field of UAV technology and its practical applications.

19 pages, 2321 KiB

Open Access Data Descriptor

In Vivo and In Vitro Electrochemical Impedance Spectroscopy of Acute and Chronic Intracranial Electrodes

by Kyle P. O’Sullivan, Brian J. Philip, Jonathan L. Baker, John D. Rolston, Mark E. Orazem, Kevin J. Otto and Christopher R. Butson

Data 2024, 9(6), 78; https://doi.org/10.3390/data9060078 - 6 Jun 2024

Abstract

Invasive intracranial electrodes are used in both clinical and research applications for recording and stimulation of brain tissue, providing essential data in acute and chronic contexts. The impedance characteristics of the electrode–tissue interface (ETI) evolve over time and can change dramatically relative to the pre-implantation baseline. Understanding how ETI properties contribute to the recording and stimulation characteristics of an electrode can provide valuable insights for users, who often do not have access to complex impedance characterizations of their devices. In contrast to the typical method of characterizing electrical impedance at a single frequency, we demonstrate a method for using electrochemical impedance spectroscopy (EIS) to investigate complex characteristics of the ETI of several commonly used acute and chronic electrodes. We also describe precise modeling strategies for verifying the accuracy of our instrumentation and understanding device–solution interactions, both in vivo and in vitro. Included with this publication is a dataset containing both in vitro and in vivo device characterizations, as well as some examples of modeling and error structure analysis results. These data can be used for more detailed interpretation of neural recordings performed on common electrode types, providing a more complete picture of their properties than is often available to users.
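
To show why the single-frequency impedance reading described above can be incomplete, the following sketch computes the spectrum of a simplified Randles circuit: a series resistance in series with a charge-transfer resistance that is in parallel with a double-layer capacitance. This circuit and its component values are hypothetical illustrations, not the authors' models or fitted parameters.

```python
import cmath
import math


def simplified_randles(f_hz, r_s, r_ct, c_dl):
    """Impedance (ohms) of R_s in series with (R_ct parallel to C_dl)."""
    omega = 2 * math.pi * f_hz
    return r_s + r_ct / (1 + 1j * omega * r_ct * c_dl)


def log_spaced(f_min, f_max, points_per_decade=5):
    """Frequencies spaced evenly on a log10 axis, including both ends."""
    n = int(round(math.log10(f_max / f_min) * points_per_decade))
    return [f_min * 10 ** (i / points_per_decade) for i in range(n + 1)]


# Hypothetical component values, not fitted to any electrode in the dataset.
R_S, R_CT, C_DL = 1e3, 1e6, 1e-8

for f in log_spaced(1e-1, 1e5, points_per_decade=1):
    z = simplified_randles(f, R_S, R_CT, C_DL)
    mag, phase = abs(z), math.degrees(cmath.phase(z))
    print(f"{f:>9.1f} Hz  |Z| = {mag:>10.1f} ohm  phase = {phase:6.1f} deg")
```

With these assumed values, |Z| falls from about 1 MΩ at low frequency to about 1 kΩ at high frequency. The transition lies near 1/(2πR_ct·C_dl) ≈ 16 Hz, so a single 1 kHz reading would capture only one point on this curve.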

6 pages, 530 KiB

Open Access Data Descriptor

Data on Stark Broadening of N VI Spectral Lines

by Milan S. Dimitrijević, Magdalena D. Christova and Sylvie Sahal-Bréchot

Data 2024, 9(6), 77; https://doi.org/10.3390/data9060077 - 29 May 2024

Abstract

Data on Stark broadening parameters (spectral line widths and shifts) for 15 multiplets of N VI, whose spectral lines are broadened by collisions with electrons, protons, alpha particles (He III), and B III, B IV, B V, and B VI ions, are presented. They have been calculated using the semiclassical perturbation theory for temperatures from 50,000 K to 2,000,000 K and perturber densities from 10¹⁶ cm⁻³ up to 10²⁴ cm⁻³. The data for electrons, protons, and He III are of particular interest for the analysis and modelling of the atmospheres of hot and dense stars, such as white dwarfs, and for the investigation of their spectra, while the data for boron ions are used for the analysis and modelling of laser-driven plasma in proton–boron fusion research.

(This article belongs to the Section Information Systems and Data Management)

7 pages, 614 KiB

Open Access Data Descriptor

The China Historical Christian Database: A Dataset Quantifying Christianity in China from 1550 to 1950

by Alex Mayfield, Margaret Frei, Daryl Ireland and Eugenio Menegon

Data 2024, 9(6), 76; https://doi.org/10.3390/data9060076 - 29 May 2024

Abstract

The era of digitization is revolutionizing traditional humanities research, presenting both novel methodologies and challenges. This field harnesses quantitative techniques to yield groundbreaking insights, which depend on comprehensive datasets on historical subjects. The China Historical Christian Database (CHCD) exemplifies this trend, furnishing researchers with a rich repository of historical, relational, and geographical data about Christianity in China from 1550 to 1950. The study of Christianity in China confronts formidable obstacles, including the mobility of historical agents, fluctuating relational networks, and linguistic disparities among scattered sources. The CHCD addresses these challenges by curating an open-access database built in Neo4j that records information about Christian institutions in China and the people who worked within them. Drawing on historical sources, the CHCD contains temporal, relational, and geographic data. The database currently has over 40,000 nodes and 200,000 relationships and continues to grow. Beyond its utility for religious studies, the CHCD supports broader interdisciplinary inquiries, including social network analysis, geospatial visualization, and economic modeling. This article introduces the CHCD’s structure and explains the data collection and curation process.

27 pages, 512 KiB

Open Access Article

De-Anonymizing Users across Rating Datasets via Record Linkage and Quasi-Identifier Attacks

by Nicolás Torres and Patricio Olivares

Data 2024, 9(6), 75; https://doi.org/10.3390/data9060075 - 27 May 2024

Abstract

The widespread availability of pseudonymized user datasets has enabled personalized recommendation systems. However, recent studies have shown that users can be de-anonymized by exploiting the uniqueness of their data patterns, raising significant privacy concerns. This paper presents a novel approach that tackles the challenging task of linking user identities across multiple rating datasets from diverse domains, such as movies, books, and music, by leveraging the consistency of users’ rating patterns as high-dimensional quasi-identifiers. The proposed method combines probabilistic record linkage techniques with quasi-identifier attacks, employing the Fellegi–Sunter model to compute the likelihood of two records referring to the same user based on the similarity of their rating vectors. Through extensive experiments on three publicly available rating datasets, we demonstrate the effectiveness of the proposed approach in achieving high precision and recall in cross-dataset de-anonymization tasks, outperforming existing techniques, with F1-scores ranging from 0.72 to 0.79 for pairwise de-anonymization tasks. The novelty of this research lies in the unique integration of record linkage techniques with quasi-identifier attacks, enabling the effective exploitation of the uniqueness of rating patterns as high-dimensional quasi-identifiers to link user identities across diverse datasets, addressing a limitation of existing methodologies. We thoroughly investigate the impact of various factors, including similarity metrics, dataset combinations, data sparsity, and user demographics, on the de-anonymization performance. This work highlights the potential privacy risks associated with the release of anonymized user data across diverse contexts and underscores the critical need for stronger anonymization techniques and tailored privacy-preserving mechanisms for rating datasets and recommender systems.

(This article belongs to the Section Information Systems and Data Management)
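
The Fellegi–Sunter step mentioned in the abstract above can be illustrated with its classic binary-agreement form. Each comparison feature contributes log2(m/u) when it agrees and log2((1 − m)/(1 − u)) when it does not. Here m is the probability of agreement for true matches and u the probability of agreement for non-matches. The summed weight is then compared against two thresholds. This is a textbook sketch that assumes conditionally independent features. It is not the authors' pipeline, which scores the similarity of full rating vectors, and the features, m/u values, and thresholds below are hypothetical.

```python
import math


def fs_weight(agreements, m_probs, u_probs):
    """Fellegi–Sunter composite match weight (sum of log2 likelihood ratios).

    agreements[i] is True if comparison feature i agrees for the record pair;
    m_probs[i] = P(agree | same user), u_probs[i] = P(agree | different users).
    """
    if not (len(agreements) == len(m_probs) == len(u_probs)):
        raise ValueError("inputs must have equal length")
    w = 0.0
    for agree, m, u in zip(agreements, m_probs, u_probs):
        w += math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))
    return w


def classify(weight, upper, lower):
    """Three-way Fellegi–Sunter decision using two thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


# Hypothetical features derived from two users' rating vectors, e.g.
# "both rated item X with similar scores"; the m/u values are invented.
m = [0.90, 0.80, 0.95]
u = [0.10, 0.20, 0.50]
w = fs_weight([True, True, False], m, u)
print(round(w, 3), classify(w, upper=4.0, lower=0.0))  # 1.848 possible link
```

Pairs in the "possible link" band are the ones a real attack would need extra evidence for, such as additional co-rated items.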

16 pages, 1931 KiB

Open Access Article

CVs Classification Using Neural Network Approaches Combined with BERT and Gensim: CVs of Moroccan Engineering Students

by Aniss Qostal, Aniss Moumen and Younes Lakhrissi

Data 2024, 9(6), 74; https://doi.org/10.3390/data9060074 - 24 May 2024

Abstract

Deep learning (DL)-oriented document processing is widely used in different fields for extraction, recognition, and classification tasks on raw data corpora. The article examines the application of deep learning approaches based on different neural network methods, including Gated Recurrent Unit (GRU), long short-term memory (LSTM), and convolutional neural networks (CNNs). The compared models were combined with two different word embedding techniques: Bidirectional Encoder Representations from Transformers (BERT) and Gensim Word2Vec. The models are designed to evaluate the performance of neural network architectures for the classification of CVs of Moroccan engineering students at ENSAK (National School of Applied Sciences of Kenitra, Ibn Tofail University). The dataset comprised CVs collected from engineering students at ENSAK in 2023 for a project on the employability of Moroccan engineers that applied new approaches, especially machine learning, deep learning, and big data. In total, 867 resumes were collected from five specialties of study: Electrical Engineering (ELE), Networks and Systems Telecommunications (NST), Computer Engineering (CE), Automotive Mechatronics Engineering (AutoMec), and Industrial Engineering (Indus). The results showed that the models based on BERT embeddings achieved higher accuracy than those based on Gensim Word2Vec embeddings. The CNN-GRU/BERT model achieved the best accuracy, 0.9351, slightly ahead of the other hybrid models. Single (non-hybrid) models also performed well, especially with BERT embeddings, where the CNN achieved the best accuracy, 0.9188.

22 pages, 1226 KiB

Open Access Article

Comparative Analysis of the Predictive Performance of an ANN and Logistic Regression for the Acceptability of Eco-Mobility Using the Belgrade Data Set

by Jelica Komarica, Draženko Glavić and Snežana Kaplanović

Data 2024, 9(5), 73; https://doi.org/10.3390/data9050073 - 19 May 2024

Abstract

To solve the problem of environmental pollution caused by road traffic, alternatives to vehicles with internal combustion engines are often proposed. Eco-mobility microvehicles have significant potential in the fight against environmental pollution, but only if they are widely accepted and replace the vehicles that predominantly pollute the environment. With this in mind, this study aims to elucidate the main variables that influence the acceptability of these vehicles, using prediction models based on binary logistic regression and a multilayer artificial neural network, specifically a multilayer perceptron (ANN). Data from a random sample of 503 inhabitants of Belgrade (Serbia), collected via an online questionnaire, were used to train and test the models. A multilayer perceptron with 9 and 7 neurons in its two hidden layers, a hyperbolic tangent activation function in the hidden layers, and an identity function in the output layer performed slightly better than the binary logistic regression model. With an accuracy of 85%, a precision of 79%, a recall of 81%, and an area under the ROC curve of 0.9, the multilayer perceptron model identified the influential variables in predicting acceptability. The results indicate that a respondent’s relationship to current environmental pollution, their frequency of use of modes of transport such as bicycles and motorcycles, their commuting mileage, and their personal income have the greatest influence on the acceptability of eco-mobility vehicles.
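
The reported architecture, two hidden layers of 9 and 7 tanh units feeding an identity output, corresponds to the forward pass sketched below. The weights are random placeholders rather than trained values, and the number of inputs is a hypothetical choice. Using a single output unit scored against a 0.5 cut-off is also an assumption, since the abstract does not describe how the output is encoded.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights in [-0.5, 0.5] and zero biases (untrained placeholders)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases, activation):
    """Fully connected layer: activation(W x + b), one value per output unit."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


class MLP:
    """Multilayer perceptron: tanh hidden layers, identity output layer."""

    def __init__(self, n_inputs, hidden=(9, 7), seed=0):
        rng = random.Random(seed)
        sizes = [n_inputs, *hidden, 1]
        self.layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

    def forward(self, x):
        *hidden_layers, (w_out, b_out) = self.layers
        for weights, biases in hidden_layers:
            x = dense(x, weights, biases, math.tanh)
        return dense(x, w_out, b_out, lambda z: z)[0]  # identity activation


net = MLP(n_inputs=4)  # hypothetical: 4 survey-derived input features
score = net.forward([0.2, -1.0, 0.5, 0.3])
print(f"{score:.3f}", "accept" if score >= 0.5 else "reject")
```

Training, for example by gradient descent on a squared-error or cross-entropy loss, would replace the placeholder weights. It is omitted here because the abstract gives no training details.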

13 pages, 2681 KiB

Open Access Article

A Benchmark Data Set for Long-Term Monitoring in the eLTER Site Gesäuse-Johnsbachtal

by Florian Lippl, Alexander Maringer, Margit Kurka, Jakob Abermann, Wolfgang Schöner and Manuela Hirschmugl

Data 2024, 9(5), 72; https://doi.org/10.3390/data9050072 - 18 May 2024

Abstract

This paper gives an overview of all currently available data sets for the European Long-term Ecosystem Research (eLTER) monitoring site Gesäuse-Johnsbachtal. The site is part of the LTSER platform Eisenwurzen in the Alps of the province of Styria, Austria, and contains both protected (National Park Gesäuse) and non-protected (Johnsbachtal) areas. Although the main research focus of the site is on inland surface running waters, forests and other wooded land, the eLTER whole system (WAILS) approach was followed for data selection: all available data were systematically screened for their suitability as eLTER Standard Observations (SOs). Data from all system strata were therefore included, covering the Geosphere, Atmosphere, Hydrosphere, Biosphere and Sociosphere. In the WAILS approach, these SOs are the key data for a whole system approach to long-term ecosystem research. Altogether, 54 data sets have been collected for the eLTER monitoring site Gesäuse-Johnsbachtal and included in the Dynamical Ecological Information Management System – Site and Data Registry (DEIMS-SDR), the eLTER data platform. All of these data sets are provided through dedicated data repositories for FAIR use. This paper gives an overview of all compiled data sets and their main properties. Additionally, a concluding gap analysis evaluates the available data against the observation data required by WAILS, followed by an outlook on how to fill these gaps.

24 pages, 545 KiB

Open Access Article

Neural Architecture Comparison for Bibliographic Reference Segmentation: An Empirical Study

by Rodrigo Cuéllar Hidalgo, Raúl Pinto Elías, Juan-Manuel Torres-Moreno, Osslan Osiris Vergara Villegas, Gerardo Reyes Salgado and Andrea Magadán Salazar

Data 2024, 9(5), 71; https://doi.org/10.3390/data9050071 - 18 May 2024

Abstract

In digital libraries, efficiently managing and accessing scientific publications requires automated bibliographic reference segmentation. This study addresses the challenge of accurately segmenting bibliographic references, a task complicated by their varied formats and styles. Focusing on the empirical evaluation of Conditional Random Fields (CRF), Bidirectional Long Short-Term Memory with CRF (BiLSTM + CRF), and Transformer Encoder with CRF (Transformer + CRF) architectures, this research employs Byte Pair Encoding and Character Embeddings for vector representation. The models were trained on the large Giant corpus and then evaluated on the Cora corpus; embedding layers, normalization techniques and Dropout strategies were kept uniform across models to ensure a balanced and rigorous comparison. Results indicate that the BiLSTM + CRF architecture outperforms its counterparts by handling the syntactic structures prevalent in bibliographic data well, achieving an F1-Score of 0.96. This outcome highlights the need to align model architecture with the specific syntactic demands of bibliographic reference segmentation tasks. Consequently, the study establishes the BiLSTM + CRF model as a superior approach within the current state of the art, offering a robust solution to the challenges faced in digital library management and scholarly communication.

(This article belongs to the Special Issue Advances in Text Mining Techniques and Applications for Knowledge Discovery)
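All three architectures compared in this study end in a CRF layer, which picks the best label sequence for a reference string given per-token scores and label-to-label transition scores. The sketch below shows standard Viterbi decoding for a linear-chain CRF under that reading; it is an illustration only, the label names (`AUTHOR`, `TITLE`, `YEAR`) and score values are hypothetical, and the emission scores would in practice come from the BiLSTM or Transformer encoder (or hand-crafted features for the plain CRF).

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][j]   -- score of label j for token t (e.g. from an encoder)
    transitions[i][j] -- score of label i being followed by label j
    Assumes at least one token.
    """
    n = len(labels)
    score = list(emissions[0])  # best path score ending in each label at token 0
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            # Best previous label i to transition into label j at token t.
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best-scoring final label.
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[j] for j in path]
```

With hypothetical emissions `[[2, 0, 0], [1, 1.1, 0], [0, 0, 2]]` and all-zero transitions, the decoder returns `["AUTHOR", "TITLE", "YEAR"]`; adding a bonus of 1 for `AUTHOR` followed by `AUTHOR` flips the middle token to `AUTHOR`, which illustrates how the transition scores let the CRF enforce plausible field orderings.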

17 pages, 6833 KiB

Open Access Data Descriptor

Continuous Wave Measurements Collected in Intermediate Depth throughout the North Sea Storm Season during the RealDune/REFLEX Experiments

by Jantien Rutten, Marion Tissier, Paul van Wiechen, Xinyi Zhang, Sierd de Vries, Ad Reniers and Jan-Willem Mol

Data 2024, 9(5), 70; https://doi.org/10.3390/data9050070 - 17 May 2024

Cited by 1

Abstract

High-resolution wave measurements at intermediate water depth are required to improve coastal impact modeling. Specifically, such data sets are needed to calibrate and validate models and to broaden insight into the boundary conditions that force them. Here, we present a wave data set collected in the North Sea at three stations in intermediate water depth (6–14 m) during the 2021/2022 storm season as part of the RealDune/REFLEX experiments. Continuous, synchronized measurements of surface elevation, velocity and pressure were recorded at 2–4 Hz by Acoustic Doppler Profilers and an Acoustic Doppler Velocimeter over a 5-month period. The time series were quality-controlled, directional-frequency energy spectra were calculated, and common bulk parameters were derived. Measured wave conditions range from calm to energetic, with sea-swell wave heights of 0.1–5.0 m, mean wave periods of 5–16 s and directions between W and NNW. Nine storms, defined as wave height exceeding 2.5 m for at least six hours, were recorded, including the triple storms Dudley, Eunice and Franklin. This unique data set can be used to investigate wave transformation, wave nonlinearity and wave directionality at higher and lower frequencies (e.g., sea-swell and infragravity waves) and to compare them with theoretical and empirical descriptions. Furthermore, the data can serve to force, calibrate and validate models under storm conditions.
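The storm criterion quoted in this abstract (wave height above 2.5 m for at least six hours) can be applied to a bulk wave-height time series roughly as follows. This is a minimal sketch under stated assumptions: the function name `find_storms` is hypothetical, the input is assumed to be a regularly sampled series of bulk wave heights, duration is measured from the first to the last sample above the threshold, and any short drop below 2.5 m is treated as ending the storm. The authors' exact handling of gaps and sampling intervals is not given in the abstract.

```python
from datetime import datetime, timedelta


def find_storms(times, wave_heights, threshold_m=2.5, min_duration=timedelta(hours=6)):
    """Return (start, end) pairs for contiguous runs where wave height exceeds
    threshold_m and the run lasts at least min_duration (first to last sample)."""
    storms = []
    start = None  # time of the first exceeding sample in the current run
    last = None   # time of the most recent exceeding sample
    for t, h in zip(times, wave_heights):
        if h > threshold_m:
            if start is None:
                start = t
            last = t
        else:
            # Run ended: keep it only if it was long enough.
            if start is not None and last - start >= min_duration:
                storms.append((start, last))
            start = None
    # A run may still be open at the end of the record.
    if start is not None and last - start >= min_duration:
        storms.append((start, last))
    return storms
```

With hourly samples, for instance, seven consecutive values above 2.5 m span six hours and count as one storm, whereas a three-hour exceedance is ignored.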