subscribe to arXiv mailings

Can social media shape the security of next-generation connected vehicles?

Authors: Nicola Scarano, Luca Mannella, Alessandro Savino, Stefano Di Carlo

Abstract: The increasing adoption of connectivity and electronic components in vehicles makes these systems valuable targets for attackers. While automotive vendors prioritize safety, there remains a critical need for comprehensive assessment and analysis of cyber risks. In this context, this paper proposes a Social Media Automotive Threat Intelligence (SOCMATI) framework, specifically designed for the emer… ▽ More The increasing adoption of connectivity and electronic components in vehicles makes these systems valuable targets for attackers. While automotive vendors prioritize safety, there remains a critical need for comprehensive assessment and analysis of cyber risks. In this context, this paper proposes a Social Media Automotive Threat Intelligence (SOCMATI) framework, specifically designed for the emerging field of automotive cybersecurity. The framework leverages advanced intelligence techniques and machine learning models to extract valuable insights from social media. Four use cases illustrate the framework's potential by demonstrating how it can significantly enhance threat assessment procedures within the automotive industry. △ Less

Submitted 10 July, 2024; originally announced July 2024.

Comments: Position paper, four pages, two images

arXiv:2407.03111 [pdf, other]

Compressed Latent Replays for Lightweight Continual Learning on Spiking Neural Networks

Authors: Alberto Dequino, Alessio Carpegna, Davide Nadalini, Alessandro Savino, Luca Benini, Stefano Di Carlo, Francesco Conti

Abstract: Rehearsal-based Continual Learning (CL) has been intensely investigated in Deep Neural Networks (DNNs). However, its application in Spiking Neural Networks (SNNs) has not been explored in depth. In this paper we introduce the first memory-efficient implementation of Latent Replay (LR)-based CL for SNNs, designed to seamlessly integrate with resource-constrained devices. LRs combine new samples wit… ▽ More Rehearsal-based Continual Learning (CL) has been intensely investigated in Deep Neural Networks (DNNs). However, its application in Spiking Neural Networks (SNNs) has not been explored in depth. In this paper we introduce the first memory-efficient implementation of Latent Replay (LR)-based CL for SNNs, designed to seamlessly integrate with resource-constrained devices. LRs combine new samples with latent representations of previously learned data, to mitigate forgetting. Experiments on the Heidelberg SHD dataset with Sample and Class-Incremental tasks reach a Top-1 accuracy of 92.5% and 92%, respectively, without forgetting the previously learned information. Furthermore, we minimize the LRs' requirements by applying a time-domain compression, reducing by two orders of magnitude their memory requirement, with respect to a naive rehearsal setup, with a maximum accuracy drop of 4%. On a Multi-Class-Incremental task, our SNN learns 10 new classes from an initial set of 10, reaching a Top-1 accuracy of 78.4% on the full test set. △ Less

Submitted 4 July, 2024; v1 submitted 8 May, 2024; originally announced July 2024.

arXiv:2407.00483 [pdf, other]

Navigating the road to automotive cybersecurity compliance

Authors: Franco Oberti, Fabrizio Abrate, Alessandro Savino, Filippo Parisi, Stefano Di Carlo

Abstract: The automotive industry has evolved significantly since the introduction of the Ford Model T in 1908. Today's vehicles are not merely mechanical constructs; they are integral components of a complex digital ecosystem, equipped with advanced connectivity features powered by Artificial Intelligence and cloud computing technologies. This evolution has enhanced vehicle safety, efficiency, and the over… ▽ More The automotive industry has evolved significantly since the introduction of the Ford Model T in 1908. Today's vehicles are not merely mechanical constructs; they are integral components of a complex digital ecosystem, equipped with advanced connectivity features powered by Artificial Intelligence and cloud computing technologies. This evolution has enhanced vehicle safety, efficiency, and the overall driving experience. However, it also introduces new challenges, notably in cybersecurity. With the increasing integration of digital technologies, vehicles have become more susceptible to cyber-attacks, prompting significant cybersecurity concerns. These concerns include securing sensitive data, protecting vehicles from unauthorized access, and ensuring user privacy. In response, the automotive industry is compelled to adopt robust cybersecurity measures to safeguard both vehicles and data against potential threats. Legislative frameworks such as UNR155 and UNR156 by the United Nations, along with other international regulations, aim to establish stringent cybersecurity mandates. These regulations require compliance with comprehensive cybersecurity management systems and necessitate regular updates and testing to cope with the evolving nature of cyber threats. The introduction of such regulations highlights the growing recognition of cybersecurity as a critical component of automotive safety and functionality. The future of automotive cybersecurity lies in the continuous development of advanced protective measures and collaborative efforts among all stakeholders, including manufacturers, policymakers, and cybersecurity professionals. Only through such concerted efforts can the industry hope to address the dual goals of innovation in vehicle functionality and stringent security measures against the backdrop of an increasingly interconnected digital landscape. △ Less

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00052 [pdf]

Vitamin-V: Expanding Open-Source RISC-V Cloud Environments

Authors: Ramon Canal, Stefano Di Carlo, Dimitris Gizopoulos, Alberto Scionti, Francesco Lubrano, Josep-Lluís Berral, Aaron Call, Diego Marron, Konstantinos Nikas, Dionisios Pnevmatikatos, Daniel Raho, Alvise Rigo, Yannis Papaefstathiou, José María Arnau, Angelos Arelakis

Abstract: Among the key contributions of Vitamin-V (2023-2025 Horizon Europe project), we develop a complete RISC-V open-source software stack for cloud services with comparable performance to the cloud-dominant x86 counterpart. In this paper, we detail the software suites and applications ported plus the three cloud setups under evaluation. Among the key contributions of Vitamin-V (2023-2025 Horizon Europe project), we develop a complete RISC-V open-source software stack for cloud services with comparable performance to the cloud-dominant x86 counterpart. In this paper, we detail the software suites and applications ported plus the three cloud setups under evaluation. △ Less

Submitted 12 June, 2024; originally announced July 2024.

Comments: RISC-V Summit Europe 2024, 24-28 June 2024

arXiv:2406.10282 [pdf, other]

Hardware-based stack buffer overflow attack detection on RISC-V architectures

Authors: Cristiano Pegoraro Chenet, Ziteng Zhang, Alessandro Savino, Stefano Di Carlo

Abstract: This work evaluates how well hardware-based approaches detect stack buffer overflow (SBO) attacks in RISC-V systems. We conducted simulations on the PULP platform and examined micro-architecture events using semi-supervised anomaly detection techniques. The findings showed the challenge of detection performance. Thus, a potential solution combines software and hardware-based detectors concurrently… ▽ More This work evaluates how well hardware-based approaches detect stack buffer overflow (SBO) attacks in RISC-V systems. We conducted simulations on the PULP platform and examined micro-architecture events using semi-supervised anomaly detection techniques. The findings showed the challenge of detection performance. Thus, a potential solution combines software and hardware-based detectors concurrently, with hardware as the primary defense. The hardware-based approaches present compelling benefits that could enhance RISC-V-based architectures. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 2 pages, 2 figures

arXiv:2406.07125 [pdf, other]

CARACAS: vehiCular ArchitectuRe for detAiled Can Attacks Simulation

Authors: Sadek Misto Kirdi, Nicola Scarano, Franco Oberti, Luca Mannella, Stefano Di Carlo, Alessandro Savino

Abstract: Modern vehicles are increasingly vulnerable to attacks that exploit network infrastructures, particularly the Controller Area Network (CAN) networks. To effectively counter such threats using contemporary tools like Intrusion Detection Systems (IDSs) based on data analysis and classification, large datasets of CAN messages become imperative. This paper delves into the feasibility of generating syn… ▽ More Modern vehicles are increasingly vulnerable to attacks that exploit network infrastructures, particularly the Controller Area Network (CAN) networks. To effectively counter such threats using contemporary tools like Intrusion Detection Systems (IDSs) based on data analysis and classification, large datasets of CAN messages become imperative. This paper delves into the feasibility of generating synthetic datasets by harnessing the modeling capabilities of simulation frameworks such as Simulink coupled with a robust representation of attack models to present CARACAS, a vehicular model, including component control via CAN messages and attack injection capabilities. CARACAS showcases the efficacy of this methodology, including a Battery Electric Vehicle (BEV) model, and focuses on attacks targeting torque control in two distinct scenarios. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: 6 pages, 8 figures, TrustAICyberSec workshop - IEEE ISCC 2024

arXiv:2406.04227 [pdf, other]

R-CONV: An Analytical Approach for Efficient Data Reconstruction via Convolutional Gradients

Authors: Tamer Ahmed Eltaras, Qutaibah Malluhi, Alessandro Savino, Stefano Di Carlo, Adnan Qayyum, Junaid Qadir

Abstract: In the effort to learn from extensive collections of distributed data, federated learning has emerged as a promising approach for preserving privacy by using a gradient-sharing mechanism instead of exchanging raw data. However, recent studies show that private training data can be leaked through many gradient attacks. While previous analytical-based attacks have successfully reconstructed input da… ▽ More In the effort to learn from extensive collections of distributed data, federated learning has emerged as a promising approach for preserving privacy by using a gradient-sharing mechanism instead of exchanging raw data. However, recent studies show that private training data can be leaked through many gradient attacks. While previous analytical-based attacks have successfully reconstructed input data from fully connected layers, their effectiveness diminishes when applied to convolutional layers. This paper introduces an advanced data leakage method to efficiently exploit convolutional layers' gradients. We present a surprising finding: even with non-fully invertible activation functions, such as ReLU, we can analytically reconstruct training samples from the gradients. To the best of our knowledge, this is the first analytical approach that successfully reconstructs convolutional layer inputs directly from the gradients, bypassing the need to reconstruct layers' outputs. Prior research has mainly concentrated on the weight constraints of convolution layers, overlooking the significance of gradient constraints. Our findings demonstrate that existing analytical methods used to estimate the risk of gradient attacks lack accuracy. In some layers, attacks can be launched with less than 5% of the reported constraints. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.04102 [pdf, other]

Chromatic Topological Data Analysis

Authors: Sebastiano Cultrera di Montesano, Ondrej Draganov, Herbert Edelsbrunner, Morteza Saghafian

Abstract: Exploring the shape of point configurations has been a key driver in the evolution of TDA (short for topological data analysis) since its infancy. This survey illustrates the recent efforts to broaden these ideas to model spatial interactions among multiple configurations, each distinguished by a color. It describes advances in this area and prepares the ground for further exploration by mentionin… ▽ More Exploring the shape of point configurations has been a key driver in the evolution of TDA (short for topological data analysis) since its infancy. This survey illustrates the recent efforts to broaden these ideas to model spatial interactions among multiple configurations, each distinguished by a color. It describes advances in this area and prepares the ground for further exploration by mentioning unresolved questions and promising research avenues while focusing on the overlap with discrete geometry. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2405.17920 [pdf, other]

Banana Trees for the Persistence in Time Series Experimentally

Authors: Lara Ost, Sebastiano Cultrera di Montesano, Herbert Edelsbrunner

Abstract: In numerous fields, dynamic time series data require continuous updates, necessitating efficient data processing techniques for accurate analysis. This paper examines the banana tree data structure, specifically designed to efficiently maintain persistent homology -- a multi-scale topological descriptor -- for dynamically changing time series data. We implement this data structure and conduct an e… ▽ More In numerous fields, dynamic time series data require continuous updates, necessitating efficient data processing techniques for accurate analysis. This paper examines the banana tree data structure, specifically designed to efficiently maintain persistent homology -- a multi-scale topological descriptor -- for dynamically changing time series data. We implement this data structure and conduct an experimental study to assess its properties and runtime for update operations. Our findings indicate that banana trees are highly effective with unbiased random data, outperforming state-of-the-art static algorithms in these scenarios. Additionally, our results show that real-world time series share structural properties with unbiased random walks, suggesting potential practical utility for our implementation. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.15231 [pdf, other]

Cardinality Estimation on Hyper-relational Knowledge Graphs

Authors: Fei Teng, Haoyang Li, Shimin Di, Lei Chen

Abstract: Cardinality Estimation (CE) for query is to estimate the number of results without execution, which is an effective index in query optimization. Recently, CE over has achieved great success in knowledge graphs (KGs) that consist of triple facts. To more precisely represent facts, current researchers propose hyper-relational KGs (HKGs) to represent a triple fact with qualifiers, where qualifiers pr… ▽ More Cardinality Estimation (CE) for query is to estimate the number of results without execution, which is an effective index in query optimization. Recently, CE over has achieved great success in knowledge graphs (KGs) that consist of triple facts. To more precisely represent facts, current researchers propose hyper-relational KGs (HKGs) to represent a triple fact with qualifiers, where qualifiers provide additional context to the fact. However, existing CE methods over KGs achieve unsatisfying performance on HKGs due to the complexity of qualifiers in HKGs. Also, there is only one dataset for HKG query cardinality estimation, i.e., WD50K-QE, which is not comprehensive and only covers limited patterns. The lack of querysets over HKG also becomes a bottleneck to comprehensively investigate CE problems on HKGs. In this work, we first construct diverse and unbiased hyper-relational querysets over three popular HKGs for investigating CE. Besides, we also propose a novel qualifier-attached graph neural network (GNN) model that effectively incorporates qualifier information and adaptively combines outputs from multiple GNN layers, to accurately predict the cardinality. Our experiments illustrate that the proposed hyper-relational query encoder outperforms all state-of-the-art CE methods over three popular HKGs on the diverse and unbiased benchmark. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.09541 [pdf, other]

Spectral complexity of deep neural networks

Authors: Simmaco Di Lillo, Domenico Marinucci, Michele Salvi, Stefano Vigogna

Abstract: It is well-known that randomly initialized, push-forward, fully-connected neural networks weakly converge to isotropic Gaussian processes, in the limit where the width of all layers goes to infinity. In this paper, we propose to use the angular power spectrum of the limiting field to characterize the complexity of the network architecture. In particular, we define sequences of random variables ass… ▽ More It is well-known that randomly initialized, push-forward, fully-connected neural networks weakly converge to isotropic Gaussian processes, in the limit where the width of all layers goes to infinity. In this paper, we propose to use the angular power spectrum of the limiting field to characterize the complexity of the network architecture. In particular, we define sequences of random variables associated with the angular power spectrum, and provide a full characterization of the network complexity in terms of the asymptotic distribution of these sequences as the depth diverges. On this basis, we classify neural networks as low-disorder, sparse, or high-disorder; we show how this classification highlights a number of distinct features for standard activation functions, and in particular, sparsity properties of ReLU networks. Our theoretical results are also validated by numerical simulations. △ Less

Submitted 27 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

MSC Class: 68T07; 60G60; 33C55; 62M15

arXiv:2405.08122 [pdf, other]

doi 10.1109/TCST.2024.3400571

Equivariant Deep Learning of Mixed-Integer Optimal Control Solutions for Vehicle Decision Making and Motion Planning

Authors: Rudolf Reiter, Rien Quirynen, Moritz Diehl, Stefano Di Cairano

Abstract: Mixed-integer quadratic programs (MIQPs) are a versatile way of formulating vehicle decision making and motion planning problems, where the prediction model is a hybrid dynamical system that involves both discrete and continuous decision variables. However, even the most advanced MIQP solvers can hardly account for the challenging requirements of automotive embedded platforms. Thus, we use machine… ▽ More Mixed-integer quadratic programs (MIQPs) are a versatile way of formulating vehicle decision making and motion planning problems, where the prediction model is a hybrid dynamical system that involves both discrete and continuous decision variables. However, even the most advanced MIQP solvers can hardly account for the challenging requirements of automotive embedded platforms. Thus, we use machine learning to simplify and hence speed up optimization. Our work builds on recent ideas for solving MIQPs in real-time by training a neural network to predict the optimal values of integer variables and solving the remaining problem by online quadratic programming. Specifically, we propose a recurrent permutation equivariant deep set that is particularly suited for imitating MIQPs that involve many obstacles, which is often the major source of computational burden in motion planning problems. Our framework comprises also a feasibility projector that corrects infeasible predictions of integer variables and considerably increases the likelihood of computing a collision-free trajectory. We evaluate the performance, safety and real-time feasibility of decision-making for autonomous driving using the proposed approach on realistic multi-lane traffic scenarios with interactive agents in SUMO simulations. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.06290

Path Planning and Motion Control for Accurate Positioning of Car-like Robots

Authors: Jin Dai, Zejiang Wang, Yebin Wang, Rien Quirynen, Stefano Di Cairano

Abstract: This paper investigates the planning and control for accurate positioning of car-like robots. We propose a solution that integrates two modules: a motion planner, facilitated by the rapidly-exploring random tree algorithm and continuous-curvature (CC) steering technique, generates a CC trajectory as a reference; and a nonlinear model predictive controller (NMPC) regulates the robot to accurately t… ▽ More This paper investigates the planning and control for accurate positioning of car-like robots. We propose a solution that integrates two modules: a motion planner, facilitated by the rapidly-exploring random tree algorithm and continuous-curvature (CC) steering technique, generates a CC trajectory as a reference; and a nonlinear model predictive controller (NMPC) regulates the robot to accurately track the reference trajectory. Based on the $μ$-tangency conditions in prior art, we derive explicit existence conditions and develop associated computation methods for a special class of CC paths which not only admit the same driving patterns as Reeds-Shepp paths but also consist of cusp-free clothoid turns. Afterwards, we create an autonomous vehicle parking scenario where the NMPC endeavors to follow the reference trajectory. Feasibility and computational efficiency of the CC steering are validated by numerical simulation. CarSim-Simulink joint simulations statistically verify that with exactly same NMPC, the closed-loop system with CC trajectories as references substantially outperforms the case where Reeds-Shepp trajectories are used as references. △ Less

Submitted 8 June, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: The paper needs further revision to guarantee technical correctness and conciseness

arXiv:2405.02520 [pdf, other]

TurboFFT: A High-Performance Fast Fourier Transform with Fault Tolerance on GPU

Authors: Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Zizhong Chen, Franck Cappello

Abstract: The Fast Fourier Transform (FFT), as a core computation in a wide range of scientific applications, is increasingly threatened by reliability issues. In this paper, we introduce TurboFFT, a high-performance FFT implementation equipped with a two-sided checksum scheme that detects and corrects silent data corruptions at computing units efficiently. The proposed two-sided checksum addresses the erro… ▽ More The Fast Fourier Transform (FFT), as a core computation in a wide range of scientific applications, is increasingly threatened by reliability issues. In this paper, we introduce TurboFFT, a high-performance FFT implementation equipped with a two-sided checksum scheme that detects and corrects silent data corruptions at computing units efficiently. The proposed two-sided checksum addresses the error propagation issue by encoding a batch of input signals with different linear combinations, which not only allows fast batched error detection but also enables error correction on-the-fly instead of recomputing. We explore two-sided checksum designs at the kernel, thread, and threadblock levels, and provide a baseline FFT implementation competitive to the state-of-the-art, closed-source cuFFT. We demonstrate a kernel fusion strategy to mitigate and overlap the computation/memory overhead introduced by fault tolerance with underlying FFT computation. We present a template-based code generation strategy to reduce development costs and support a wide range of input sizes and data types. Experimental results on an NVIDIA A100 server GPU and a Tesla Turing T4 GPU demonstrate TurboFFT offers a competitive or superior performance compared to the closed-source library cuFFT. TurboFFT only incurs a minimum overhead (7\% to 15\% on average) compared to cuFFT, even under hundreds of error injections per minute for both single and double precision. TurboFFT achieves a 23\% improvement compared to existing fault tolerance FFT schemes. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.11015 [pdf, other]

FedFa: A Fully Asynchronous Training Paradigm for Federated Learning

Authors: Haotian Xu, Zhaorui Zhang, Sheng Di, Benben Liu, Khalid Ayed Alharthi, Jiannong Cao

Abstract: Federated learning has been identified as an efficient decentralized training paradigm for scaling the machine learning model training on a large number of devices while guaranteeing the data privacy of the trainers. FedAvg has become a foundational parameter update strategy for federated learning, which has been promising to eliminate the effect of the heterogeneous data across clients and guaran… ▽ More Federated learning has been identified as an efficient decentralized training paradigm for scaling the machine learning model training on a large number of devices while guaranteeing the data privacy of the trainers. FedAvg has become a foundational parameter update strategy for federated learning, which has been promising to eliminate the effect of the heterogeneous data across clients and guarantee convergence. However, the synchronization parameter update barriers for each communication round during the training significant time on waiting, slowing down the training procedure. Therefore, recent state-of-the-art solutions propose using semi-asynchronous approaches to mitigate the waiting time cost with guaranteed convergence. Nevertheless, emerging semi-asynchronous approaches are unable to eliminate the waiting time completely. We propose a full asynchronous training paradigm, called FedFa, which can guarantee model convergence and eliminate the waiting time completely for federated learning by using a few buffered results on the server for parameter updating. Further, we provide theoretical proof of the convergence rate for our proposed FedFa. Extensive experimental results indicate our approach effectively improves the training performance of federated learning by up to 6x and 4x speedup compared to the state-of-the-art synchronous and semi-asynchronous strategies while retaining high accuracy in both IID and Non-IID scenarios. △ Less

Submitted 20 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Journal ref: IJCAI 2024: the 33rd International Joint Conference on Artificial Intelligence

arXiv:2404.03714 [pdf, other]

doi 10.3390/electronics13091744

SpikeExplorer: hardware-oriented Design Space Exploration for Spiking Neural Networks on FPGA

Authors: Dario Padovano, Alessio Carpegna, Alessandro Savino, Stefano Di Carlo

Abstract: One of today's main concerns is to bring Artificial Intelligence power to embedded systems for edge applications. The hardware resources and power consumption required by state-of-the-art models are incompatible with the constrained environments observed in edge systems, such as IoT nodes and wearable devices. Spiking Neural Networks (SNNs) can represent a solution in this sense: inspired by neuro… ▽ More One of today's main concerns is to bring Artificial Intelligence power to embedded systems for edge applications. The hardware resources and power consumption required by state-of-the-art models are incompatible with the constrained environments observed in edge systems, such as IoT nodes and wearable devices. Spiking Neural Networks (SNNs) can represent a solution in this sense: inspired by neuroscience, they reach unparalleled power and resource efficiency when run on dedicated hardware accelerators. However, when designing such accelerators, the amount of choices that can be taken is huge. This paper presents SpikExplorer, a modular and flexible Python tool for hardware-oriented Automatic Design Space Exploration to automate the configuration of FPGA accelerators for SNNs. Using Bayesian optimizations, SpikerExplorer enables hardware-centric multi-objective optimization, supporting factors such as accuracy, area, latency, power, and various combinations during the exploration process. The tool searches the optimal network architecture, neuron model, and internal and training parameters, trying to reach the desired constraints imposed by the user. It allows for a straightforward network configuration, providing the full set of explored points for the user to pick the trade-off that best fits the needs. The potential of SpikExplorer is showcased using three benchmark datasets. It reaches 95.8% accuracy on the MNIST dataset, with a power consumption of 180mW/image and a latency of 0.12 ms/image, making it a powerful tool for automatically optimizing SNNs. △ Less

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.02840 [pdf, ps, other]

A Survey on Error-Bounded Lossy Compression for Scientific Datasets

Authors: Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Robert Underwood, Zhaorui Zhang, Milan Shah, Yafan Huang, Jiajun Huang, Xiaodong Yu, Congrong Ren, Hanqi Guo, Grant Wilkins, Dingwen Tao, Jiannan Tian, Sian Jin, Zizhe Jian, Daoce Wang, MD Hasanur Rahman, Boyuan Zhang, Jon C. Calhoun, Guanpeng Li, Kazutomo Yoshii, Khalid Ayed Alharthi, Franck Cappello

Abstract: Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving the reconstructed data fidelity very well. Many error-bounded lossy compressors have been developed for a wide range of parallel and distributed use cases for years. These lossy compressors are designed with distinct compression models and design principles, such that each… ▽ More Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving the reconstructed data fidelity very well. Many error-bounded lossy compressors have been developed for a wide range of parallel and distributed use cases for years. These lossy compressors are designed with distinct compression models and design principles, such that each of them features particular pros and cons. In this paper we provide a comprehensive survey of emerging error-bounded lossy compression techniques for different use cases each involving big data to process. The key contribution is fourfold. (1) We summarize an insightful taxonomy of lossy compression into 6 classic compression models. (2) We provide a comprehensive survey of 10+ commonly used compression components/modules used in error-bounded lossy compressors. (3) We provide a comprehensive survey of 10+ state-of-the-art error-bounded lossy compressors as well as how they combine the various compression modules in their designs. (4) We provide a comprehensive survey of the lossy compression for 10+ modern scientific applications and use-cases. We believe this survey is useful to multiple communities including scientific applications, high-performance computing, lossy compression, and big data. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: submitted to ACM Computing journal, requited to be 35 pages including references

arXiv:2404.02826 [pdf, ps, other]

An Error-Bounded Lossy Compression Method with Bit-Adaptive Quantization for Particle Data

Authors: Congrong Ren, Sheng Di, Longtao Zhang, Kai Zhao, Hanqi Guo

Abstract: This paper presents error-bounded lossy compression tailored for particle datasets from diverse scientific applications in cosmology, fluid dynamics, and fusion energy sciences. As today's high-performance computing capabilities advance, these datasets often reach trillions of points, posing significant visualization, analysis, and storage challenges. While error-bounded lossy compression makes it… ▽ More This paper presents error-bounded lossy compression tailored for particle datasets from diverse scientific applications in cosmology, fluid dynamics, and fusion energy sciences. As today's high-performance computing capabilities advance, these datasets often reach trillions of points, posing significant visualization, analysis, and storage challenges. While error-bounded lossy compression makes it possible to represent floating-point values with strict pointwise accuracy guarantees, the lack of correlations in particle data's storage ordering often limits the compression ratio. Inspired by quantization-encoding schemes in SZ lossy compressors, we dynamically determine the number of bits to encode particles of the dataset to increase the compression ratio. Specifically, we utilize a k-d tree to partition particles into subregions and generate ``bit boxes'' centered at particles for each subregion to encode their positions. These bit boxes ensure error control while reducing the bit count used for compression. We comprehensively evaluate our method against state-of-the-art compressors on cosmology, fluid dynamics, and fusion plasma datasets. △ Less

Submitted 4 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.02163 [pdf, other]

FastqZip: An Improved Reference-Based Genome Sequence Lossy Compression Framework

Authors: Yuanjian Liu, Huihao Luo, Zhijun Han, Yao Hu, Yehui Yang, Kyle Chard, Sheng Di, Ian Foster, Jiesheng Wu

Abstract: Storing and archiving data produced by next-generation sequencing (NGS) is a huge burden for research institutions. Reference-based compression algorithms are effective in dealing with these data. Our work focuses on compressing FASTQ format files with an improved reference-based compression algorithm to achieve a higher compression ratio than other state-of-the-art algorithms. We propose FastqZip… ▽ More Storing and archiving data produced by next-generation sequencing (NGS) is a huge burden for research institutions. Reference-based compression algorithms are effective in dealing with these data. Our work focuses on compressing FASTQ format files with an improved reference-based compression algorithm to achieve a higher compression ratio than other state-of-the-art algorithms. We propose FastqZip, which uses a new method mapping the sequence to reference for compression, allows reads-reordering and lossy quality scores, and the BSC or ZPAQ algorithm to perform final lossless compression for a higher compression ratio and relatively fast speed. Our method ensures the sequence can be losslessly reconstructed while allowing lossless or lossy compression for the quality scores. We reordered the reads to get a higher compression ratio. We evaluate our algorithms on five datasets and show that FastqZip can outperform the SOTA algorithm Genozip by around 10% in terms of compression ratio while having an acceptable slowdown. △ Less

Submitted 22 February, 2024; originally announced April 2024.

arXiv:2404.00383 [pdf, other]

SpikingJET: Enhancing Fault Injection for Fully and Convolutional Spiking Neural Networks

Authors: Anil Bayram Gogebakan, Enrico Magliano, Alessio Carpegna, Annachiara Ruospo, Alessandro Savino, Stefano Di Carlo

Abstract: As artificial neural networks become increasingly integrated into safety-critical systems such as autonomous vehicles, devices for medical diagnosis, and industrial automation, ensuring their reliability in the face of random hardware faults becomes paramount. This paper introduces SpikingJET, a novel fault injector designed specifically for fully connected and convolutional Spiking Neural Network… ▽ More As artificial neural networks become increasingly integrated into safety-critical systems such as autonomous vehicles, devices for medical diagnosis, and industrial automation, ensuring their reliability in the face of random hardware faults becomes paramount. This paper introduces SpikingJET, a novel fault injector designed specifically for fully connected and convolutional Spiking Neural Networks (SNNs). Our work underscores the critical need to evaluate the resilience of SNNs to hardware faults, considering their growing prominence in real-world applications. SpikingJET provides a comprehensive platform for assessing the resilience of SNNs by inducing errors and injecting faults into critical components such as synaptic weights, neuron model parameters, internal states, and activation functions. This paper demonstrates the effectiveness of Spiking-JET through extensive software-level experiments on various SNN architectures, revealing insights into their vulnerability and resilience to hardware faults. Moreover, highlighting the importance of fault resilience in SNNs contributes to the ongoing effort to enhance the reliability and safety of Neural Network (NN)-powered systems in diverse domains. △ Less

Submitted 30 March, 2024; originally announced April 2024.

ACM Class: I.2

arXiv:2403.17546 [pdf]

Decoding excellence: Mapping the demand for psychological traits of operations and supply chain professionals through text mining

Authors: S. Di Luozzo, A. Fronzetti Colladon, M. M. Schiraldi

Abstract: The current study proposes an innovative methodology for the profiling of psychological traits of Operations Management (OM) and Supply Chain Management (SCM) professionals. We use innovative methods and tools of text mining and social network analysis to map the demand for relevant skills from a set of job descriptions, with a focus on psychological characteristics. The proposed approach aims to… ▽ More The current study proposes an innovative methodology for the profiling of psychological traits of Operations Management (OM) and Supply Chain Management (SCM) professionals. We use innovative methods and tools of text mining and social network analysis to map the demand for relevant skills from a set of job descriptions, with a focus on psychological characteristics. The proposed approach aims to evaluate the market demand for specific traits by combining relevant psychological constructs, text mining techniques, and an innovative measure, namely, the Semantic Brand Score. We apply the proposed methodology to a dataset of job descriptions for OM and SCM professionals, with the objective of providing a mapping of their relevant required skills, including psychological characteristics. In addition, the analysis is then detailed by considering the region of the organization that issues the job description, its organizational size, and the seniority level of the open position in order to understand their nuances. Finally, topic modeling is used to examine key components and their relative significance in job descriptions. By employing a novel methodology and considering contextual factors, we provide an innovative understanding of the attitudinal traits that differentiate professionals. This research contributes to talent management, recruitment practices, and professional development initiatives, since it provides new figures and perspectives to improve the effectiveness and success of Operations Management and Supply Chain Management professionals. △ Less

Submitted 26 March, 2024; originally announced March 2024.

ACM Class: I.2.7; J.4; H.4.0

arXiv:2403.15953 [pdf, other]

Understanding The Effectiveness of Lossy Compression in Machine Learning Training Sets

Authors: Robert Underwood, Jon C. Calhoun, Sheng Di, Franck Cappello

Abstract: Learning and Artificial Intelligence (ML/AI) techniques have become increasingly prevalent in high performance computing (HPC). However, these methods depend on vast volumes of floating point data for training and validation which need methods to share the data on a wide area network (WAN) or to transfer it from edge devices to data centers. Data compression can be a solution to these problems, bu… ▽ More Learning and Artificial Intelligence (ML/AI) techniques have become increasingly prevalent in high performance computing (HPC). However, these methods depend on vast volumes of floating point data for training and validation which need methods to share the data on a wide area network (WAN) or to transfer it from edge devices to data centers. Data compression can be a solution to these problems, but an in-depth understanding of how lossy compression affects model quality is needed. Prior work largely considers a single application or compression method. We designed a systematic methodology for evaluating data reduction techniques for ML/AI, and we use it to perform a very comprehensive evaluation with 17 data reduction methods on 7 ML/AI applications to show modern lossy compression methods can achieve a 50-100x compression ratio improvement for a 1% or less loss in quality. We identify critical insights that guide the future use and design of lossy compressors for ML/AI. △ Less

Submitted 23 March, 2024; originally announced March 2024.

Comments: 12 pages, 4 figures

ACM Class: I.2.6; E.2; C.4

arXiv:2403.13730 [pdf, other]

Projection-free computation of robust controllable sets with constrained zonotopes

Authors: Abraham P. Vinod, Avishai Weiss, Stefano Di Cairano

Abstract: We study the problem of computing robust controllable sets for discrete-time linear systems with additive uncertainty. We propose a tractable and scalable approach to inner- and outer-approximate robust controllable sets using constrained zonotopes, when the additive uncertainty set is a symmetric, convex, and compact set. Our least-squares-based approach uses novel closed-form approximations of t… ▽ More We study the problem of computing robust controllable sets for discrete-time linear systems with additive uncertainty. We propose a tractable and scalable approach to inner- and outer-approximate robust controllable sets using constrained zonotopes, when the additive uncertainty set is a symmetric, convex, and compact set. Our least-squares-based approach uses novel closed-form approximations of the Pontryagin difference between a constrained zonotopic minuend and a symmetric, convex, and compact subtrahend. Unlike existing approaches, our approach does not rely on convex optimization solvers, and is projection-free for ellipsoidal and zonotopic uncertainty sets. We also propose a least-squares-based approach to compute a convex, polyhedral outer-approximation to constrained zonotopes, and characterize sufficient conditions under which all these approximations are exact. We demonstrate the computational efficiency and scalability of our approach in several case studies, including the design of abort-safe rendezvous trajectories for a spacecraft in near-rectilinear halo orbit under uncertainty. Our approach can inner-approximate a 20-step robust controllable set for a 100-dimensional linear system in under 15 seconds on a standard computer. △ Less

Submitted 20 March, 2024; originally announced March 2024.

Comments: 22 pages, 6 figures

arXiv:2403.10204 [pdf, other]

The Euclidean MST-ratio for Bi-colored Lattices

Authors: Sebastiano Cultrera di Montesano, Ondřej Draganov, Herbert Edelsbrunner, Morteza Saghafian

Abstract: Given a finite set, $A \subseteq \mathbb{R}^2$, and a subset, $B \subseteq A$, the \emph{MST-ratio} is the combined length of the minimum spanning trees of $B$ and $A \setminus B$ divided by the length of the minimum spanning tree of $A$. The question of the supremum, over all sets $A$, of the maximum, over all subsets $B$, is related to the Steiner ratio, and we prove this sup-max is between… ▽ More Given a finite set, $A \subseteq \mathbb{R}^2$, and a subset, $B \subseteq A$, the \emph{MST-ratio} is the combined length of the minimum spanning trees of $B$ and $A \setminus B$ divided by the length of the minimum spanning tree of $A$. The question of the supremum, over all sets $A$, of the maximum, over all subsets $B$, is related to the Steiner ratio, and we prove this sup-max is between $2.154$ and $2.427$. Restricting ourselves to $2$-dimensional lattices, we prove that the sup-max is $2.0$, while the inf-max is $1.25$. By some margin the most difficult of these results is the upper bound for the inf-max, which we prove by showing that the hexagonal lattice cannot have MST-ratio larger than $1.25$. △ Less

Submitted 15 March, 2024; originally announced March 2024.

MSC Class: 52C05; 05C10 ACM Class: G.2

arXiv:2401.08397 [pdf]

doi 10.1109/LATS62223.2024.10534595

A Micro Architectural Events Aware Real-Time Embedded System Fault Injector

Authors: Enrico Magliano, Alessio Carpegna, Alessadro Savino, Stefano Di Carlo

Abstract: In contemporary times, the increasing complexity of the system poses significant challenges to the reliability, trustworthiness, and security of the SACRES. Key issues include the susceptibility to phenomena such as instantaneous voltage spikes, electromagnetic interference, neutron strikes, and out-of-range temperatures. These factors can induce switch state changes in transistors, resulting in b… ▽ More In contemporary times, the increasing complexity of the system poses significant challenges to the reliability, trustworthiness, and security of the SACRES. Key issues include the susceptibility to phenomena such as instantaneous voltage spikes, electromagnetic interference, neutron strikes, and out-of-range temperatures. These factors can induce switch state changes in transistors, resulting in bit-flipping, soft errors, and transient corruption of stored data in memory. The occurrence of soft errors, in turn, may lead to system faults that can propel the system into a hazardous state. Particularly in critical sectors like automotive, avionics, or aerospace, such malfunctions can have real-world implications, potentially causing harm to individuals. This paper introduces a novel fault injector designed to facilitate the monitoring, aggregation, and examination of micro-architectural events. This is achieved by harnessing the microprocessor's PMU and the debugging interface, specifically focusing on ensuring the repeatability of fault injections. The fault injection methodology targets bit-flipping within the memory system, affecting CPU registers and RAM. The outcomes of these fault injections enable a thorough analysis of the impact of soft errors and establish a robust correlation between the identified faults and the essential timing predictability demanded by SACRES. △ Less

Submitted 11 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Journal ref: 2024 IEEE 25th Latin American Test Symposium (LATS)

arXiv:2401.01141 [pdf, other]

Spiker+: a framework for the generation of efficient Spiking Neural Networks FPGA accelerators for inference at the edge

Authors: Alessio Carpegna, Alessandro Savino, Stefano Di Carlo

Abstract: Including Artificial Neural Networks in embedded systems at the edge allows applications to exploit Artificial Intelligence capabilities directly within devices operating at the network periphery. This paper introduces Spiker+, a comprehensive framework for generating efficient, low-power, and low-area customized Spiking Neural Networks (SNN) accelerators on FPGA for inference at the edge. Spiker+… ▽ More Including Artificial Neural Networks in embedded systems at the edge allows applications to exploit Artificial Intelligence capabilities directly within devices operating at the network periphery. This paper introduces Spiker+, a comprehensive framework for generating efficient, low-power, and low-area customized Spiking Neural Networks (SNN) accelerators on FPGA for inference at the edge. Spiker+ presents a configurable multi-layer hardware SNN, a library of highly efficient neuron architectures, and a design framework, enabling the development of complex neural network accelerators with few lines of Python code. Spiker+ is tested on two benchmark datasets, the MNIST and the Spiking Heidelberg Digits (SHD). On the MNIST, it demonstrates competitive performance compared to state-of-the-art SNN accelerators. It outperforms them in terms of resource allocation, with a requirement of 7,612 logic cells and 18 Block RAMs (BRAMs), which makes it fit in very small FPGA, and power consumption, draining only 180mW for a complete inference on an input image. The latency is comparable to the ones observed in the state-of-the-art, with 780us/img. To the authors' knowledge, Spiker+ is the first SNN accelerator tested on the SHD. In this case, the accelerator requires 18,268 logic cells and 51 BRAM, with an overall power consumption of 430mW and a latency of 54 us for a complete inference on input data. This underscores the significance of Spiker+ in the hardware-accelerated SNN landscape, making it an excellent solution to deploy configurable and tunable SNN architectures in resource and power-constrained edge applications. △ Less

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.17525 [pdf, other]

doi 10.1109/DSN-W58399.2023.00047

Design Space Exploration of Approximate Computing Techniques with a Reinforcement Learning Approach

Authors: Sepide Saeedi, Alessandro Savino, Stefano Di Carlo

Abstract: Approximate Computing (AxC) techniques have become increasingly popular in trading off accuracy for performance gains in various applications. Selecting the best AxC techniques for a given application is challenging. Among proposed approaches for exploring the design space, Machine Learning approaches such as Reinforcement Learning (RL) show promising results. In this paper, we proposed an RL-base… ▽ More Approximate Computing (AxC) techniques have become increasingly popular in trading off accuracy for performance gains in various applications. Selecting the best AxC techniques for a given application is challenging. Among proposed approaches for exploring the design space, Machine Learning approaches such as Reinforcement Learning (RL) show promising results. In this paper, we proposed an RL-based multi-objective Design Space Exploration strategy to find the approximate versions of the application that balance accuracy degradation and power and computation time reduction. Our experimental results show a good trade-off between accuracy degradation and decreased power and computation time for some benchmarks. △ Less

Submitted 29 December, 2023; originally announced December 2023.

ACM Class: C.1

Journal ref: 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Porto, Portugal, 2023, pp. 167-170

arXiv:2312.14220 [pdf, other]

Single-Cell RNA-seq Synthesis with Latent Diffusion Model

Authors: Yixuan Wang, Shuangyin Li, Shimin DI, Lei Chen

Abstract: The single-cell RNA sequencing (scRNA-seq) technology enables researchers to study complex biological systems and diseases with high resolution. The central challenge is synthesizing enough scRNA-seq samples; insufficient samples can impede downstream analysis and reproducibility. While various methods have been attempted in past research, the resulting scRNA-seq samples were often of poor quality… ▽ More The single-cell RNA sequencing (scRNA-seq) technology enables researchers to study complex biological systems and diseases with high resolution. The central challenge is synthesizing enough scRNA-seq samples; insufficient samples can impede downstream analysis and reproducibility. While various methods have been attempted in past research, the resulting scRNA-seq samples were often of poor quality or limited in terms of useful specific cell subpopulations. To address these issues, we propose a novel method called Single-Cell Latent Diffusion (SCLD) based on the Diffusion Model. This method is capable of synthesizing large-scale, high-quality scRNA-seq samples, including both 'holistic' or targeted specific cellular subpopulations within a unified framework. A pre-guidance mechanism is designed for synthesizing specific cellular subpopulations, while a post-guidance mechanism aims to enhance the quality of scRNA-seq samples. The SCLD can synthesize large-scale and high-quality scRNA-seq samples for various downstream tasks. Our experimental results demonstrate state-of-the-art performance in cell classification and data distribution distances when evaluated on two scRNA-seq benchmarks. Additionally, visualization experiments show the SCLD's capability in synthesizing specific cellular subpopulations. △ Less

Submitted 21 December, 2023; originally announced December 2023.

Comments: 13 pages, 5 figures

arXiv:2312.13461 [pdf, other]

FedSZ: Leveraging Error-Bounded Lossy Compression for Federated Learning Communications

Authors: Grant Wilkins, Sheng Di, Jon C. Calhoun, Zilinghan Li, Kibaek Kim, Robert Underwood, Richard Mortier, Franck Cappello

Abstract: With the promise of federated learning (FL) to allow for geographically-distributed and highly personalized services, the efficient exchange of model updates between clients and servers becomes crucial. FL, though decentralized, often faces communication bottlenecks, especially in resource-constrained scenarios. Existing data compression techniques like gradient sparsification, quantization, and p… ▽ More With the promise of federated learning (FL) to allow for geographically-distributed and highly personalized services, the efficient exchange of model updates between clients and servers becomes crucial. FL, though decentralized, often faces communication bottlenecks, especially in resource-constrained scenarios. Existing data compression techniques like gradient sparsification, quantization, and pruning offer some solutions, but may compromise model performance or necessitate expensive retraining. In this paper, we introduce FedSZ, a specialized lossy-compression algorithm designed to minimize the size of client model updates in FL. FedSZ incorporates a comprehensive compression pipeline featuring data partitioning, lossy and lossless compression of model parameters and metadata, and serialization. We evaluate FedSZ using a suite of error-bounded lossy compressors, ultimately finding SZ2 to be the most effective across various model architectures and datasets including AlexNet, MobileNetV2, ResNet50, CIFAR-10, Caltech101, and Fashion-MNIST. Our study reveals that a relative error bound 1E-2 achieves an optimal tradeoff, compressing model states between 5.55-12.61x while maintaining inference accuracy within <0.5% of uncompressed results. Additionally, the runtime overhead of FedSZ is <4.7% or between of the wall-clock communication-round time, a worthwhile trade-off for reducing network transfer times by an order of magnitude for networks bandwidths <500Mbps. Intriguingly, we also find that the error introduced by FedSZ could potentially serve as a source of differentially private noise, opening up new avenues for privacy-preserving FL. △ Less

Submitted 24 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: Appearing at 44th IEEE International Conference on Distributed Computing Systems (ICDCS)

arXiv:2312.11560 [pdf, other]

Learning from Emergence: A Study on Proactively Inhibiting the Monosemantic Neurons of Artificial Neural Networks

Authors: Jiachuan Wang, Shimin Di, Lei Chen, Charles Wang Wai Ng

Abstract: Recently, emergence has received widespread attention from the research community along with the success of large-scale models. Different from the literature, we hypothesize a key factor that promotes the performance during the increase of scale: the reduction of monosemantic neurons that can only form one-to-one correlations with specific features. Monosemantic neurons tend to be sparser and have… ▽ More Recently, emergence has received widespread attention from the research community along with the success of large-scale models. Different from the literature, we hypothesize a key factor that promotes the performance during the increase of scale: the reduction of monosemantic neurons that can only form one-to-one correlations with specific features. Monosemantic neurons tend to be sparser and have negative impacts on the performance in large models. Inspired by this insight, we propose an intuitive idea to identify monosemantic neurons and inhibit them. However, achieving this goal is a non-trivial task as there is no unified quantitative evaluation metric and simply banning monosemantic neurons does not promote polysemanticity in neural networks. Therefore, we first propose a new metric to measure the monosemanticity of neurons with the guarantee of efficiency for online computation, then introduce a theoretically supported method to suppress monosemantic neurons and proactively promote the ratios of polysemantic neurons in training neural networks. We validate our conjecture that monosemanticity brings about performance change at different model scales on a variety of neural networks and benchmark datasets in different areas, including language, image, and physics simulation tasks. Further experiments validate our analysis and theory regarding the inhibition of monosemanticity. △ Less

Submitted 19 June, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

Comments: 16 pages, 5 figures, KDD2024

arXiv:2312.09383 [pdf, other]

Security layers and related services within the Horizon Europe NEUROPULS project

Authors: Fabio Pavanello, Cedric Marchand, Paul Jimenez, Xavier Letartre, Ricardo Chaves, Niccolò Marastoni, Alberto Lovato, Mariano Ceccato, George Papadimitriou, Vasileios Karakostas, Dimitris Gizopoulos, Roberta Bardini, Tzamn Melendez Carmona, Stefano Di Carlo, Alessandro Savino, Laurence Lerch, Ulrich Ruhrmair, Sergio Vinagrero Gutierrez, Giorgio Di Natale, Elena Ioana Vatajelu

Abstract: In the contemporary security landscape, the incorporation of photonics has emerged as a transformative force, unlocking a spectrum of possibilities to enhance the resilience and effectiveness of security primitives. This integration represents more than a mere technological augmentation; it signifies a paradigm shift towards innovative approaches capable of delivering security primitives with key… ▽ More In the contemporary security landscape, the incorporation of photonics has emerged as a transformative force, unlocking a spectrum of possibilities to enhance the resilience and effectiveness of security primitives. This integration represents more than a mere technological augmentation; it signifies a paradigm shift towards innovative approaches capable of delivering security primitives with key properties for low-power systems. This not only augments the robustness of security frameworks, but also paves the way for novel strategies that adapt to the evolving challenges of the digital age. This paper discusses the security layers and related services that will be developed, modeled, and evaluated within the Horizon Europe NEUROPULS project. These layers will exploit novel implementations for security primitives based on physical unclonable functions (PUFs) using integrated photonics technology. Their objective is to provide a series of services to support the secure operation of a neuromorphic photonic accelerator for edge computing applications. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: 6 pages, 4 figures

arXiv:2312.06505 [pdf, other]

Grounded Question-Answering in Long Egocentric Videos

Authors: Shangzhe Di, Weidi Xie

Abstract: Existing approaches to video understanding, mainly designed for short videos from a third-person perspective, are limited in their applicability in certain fields, such as robotics. In this paper, we delve into open-ended question-answering (QA) in long, egocentric videos, which allows individuals or robots to inquire about their own past visual experiences. This task presents unique challenges, i… ▽ More Existing approaches to video understanding, mainly designed for short videos from a third-person perspective, are limited in their applicability in certain fields, such as robotics. In this paper, we delve into open-ended question-answering (QA) in long, egocentric videos, which allows individuals or robots to inquire about their own past visual experiences. This task presents unique challenges, including the complexity of temporally grounding queries within extensive video content, the high resource demands for precise data annotation, and the inherent difficulty of evaluating open-ended answers due to their ambiguous nature. Our proposed approach tackles these challenges by (i) integrating query grounding and answering within a unified model to reduce error propagation; (ii) employing large language models for efficient and scalable data synthesis; and (iii) introducing a close-ended QA task for evaluation, to manage answer ambiguity. Extensive experiments demonstrate the effectiveness of our method, which also achieves state-of-the-art performance on the QaEgo4D and Ego4D-NLQ benchmarks. Code, data, and models are available at https://github.com/Becomebright/GroundVQA. △ Less

Submitted 1 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: Accepted to CVPR 2024. Project website at https://dszdsz.cn/GroundVQA

arXiv:2312.05492 [pdf, other]

cuSZ-$i$: High-Ratio Scientific Lossy Compression on GPUs with Optimized Multi-Level Interpolation

Authors: Jinyang Liu, Jiannan Tian, Shixun Wu, Sheng Di, Boyuan Zhang, Robert Underwood, Yafan Huang, Jiajun Huang, Kai Zhao, Guanpeng Li, Dingwen Tao, Zizhong Chen, Franck Cappello

Abstract: Error-bounded lossy compression is a critical technique for significantly reducing scientific data volumes. Compared to CPU-based compressors, GPU-based compressors exhibit substantially higher throughputs, fitting better for today's HPC applications. However, the critical limitations of existing GPU-based compressors are their low compression ratios and qualities, severely restricting their appli… ▽ More Error-bounded lossy compression is a critical technique for significantly reducing scientific data volumes. Compared to CPU-based compressors, GPU-based compressors exhibit substantially higher throughputs, fitting better for today's HPC applications. However, the critical limitations of existing GPU-based compressors are their low compression ratios and qualities, severely restricting their applicability. To overcome these, we introduce a new GPU-based error-bounded scientific lossy compressor named cuSZ-$i$, with the following contributions: (1) A novel GPU-optimized interpolation-based prediction method significantly improves the compression ratio and decompression data quality. (2) The Huffman encoding module in cuSZ-$i$ is optimized for better efficiency. (3) cuSZ-$i$ is the first to integrate the NVIDIA Bitcomp-lossless as an additional compression-ratio-enhancing module. Evaluations show that cuSZ-$i$ significantly outperforms other latest GPU-based lossy compressors in compression ratio under the same error bound (hence, the desired quality), showcasing a 476% advantage over the second-best. This leads to cuSZ-$i$'s optimized performance in several real-world use cases. △ Less

Submitted 11 July, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

Comments: accepted by SC '24

arXiv:2311.12133 [pdf, other]

High-performance Effective Scientific Error-bounded Lossy Compression with Auto-tuned Multi-component Interpolation

Authors: Jinyang Liu, Sheng Di, Kai Zhao, Xin Liang, Sian Jin, Zizhe Jian, Jiajun Huang, Shixun Wu, Zizhong Chen, Franck Cappello

Abstract: Error-bounded lossy compression has been identified as a promising solution for significantly reducing scientific data volumes upon users' requirements on data distortion. For the existing scientific error-bounded lossy compressors, some of them (such as SPERR and FAZ) can reach fairly high compression ratios and some others (such as SZx, SZ, and ZFP) feature high compression speeds, but they rare… ▽ More Error-bounded lossy compression has been identified as a promising solution for significantly reducing scientific data volumes upon users' requirements on data distortion. For the existing scientific error-bounded lossy compressors, some of them (such as SPERR and FAZ) can reach fairly high compression ratios and some others (such as SZx, SZ, and ZFP) feature high compression speeds, but they rarely exhibit both high ratio and high speed meanwhile. In this paper, we propose HPEZ with newly-designed interpolations and quality-metric-driven auto-tuning, which features significantly improved compression quality upon the existing high-performance compressors, meanwhile being exceedingly faster than high-ratio compressors. The key contributions lie in the following points: (1) We develop a series of advanced techniques such as interpolation re-ordering, multi-dimensional interpolation, and natural cubic splines to significantly improve compression qualities with interpolation-based data prediction. (2) The auto-tuning module in HPEZ has been carefully designed with novel strategies, including but not limited to block-wise interpolation tuning, dynamic dimension freezing, and Lorenzo tuning. (3) We thoroughly evaluate HPEZ compared with many other compressors on six real-world scientific datasets. Experiments show that HPEZ outperforms other high-performance error-bounded lossy compressors in compression ratio by up to 140% under the same error bound, and by up to 360% under the same PSNR. In parallel data transfer experiments on the distributed database, HPEZ achieves a significant performance gain with up to 40% time cost reduction over the second-best compressor. △ Less

Submitted 13 December, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.01115 [pdf, other]

Dynamically Maintaining the Persistent Homology of Time Series

Authors: Sebastiano Cultrera di Montesano, Herbert Edelsbrunner, Monika Henzinger, Lara Ost

Abstract: We present a dynamic data structure for maintaining the persistent homology of a time series of real numbers. The data structure supports local operations, including the insertion and deletion of an item and the cutting and concatenating of lists, each in time $O(\log n + k)$, in which $n$ counts the critical items and $k$ the changes in the augmented persistence diagram. To achieve this, we desig… ▽ More We present a dynamic data structure for maintaining the persistent homology of a time series of real numbers. The data structure supports local operations, including the insertion and deletion of an item and the cutting and concatenating of lists, each in time $O(\log n + k)$, in which $n$ counts the critical items and $k$ the changes in the augmented persistence diagram. To achieve this, we design a tailor-made tree structure with an unconventional representation, referred to as banana tree, which may be useful in its own right. △ Less

Submitted 2 July, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: Corrected the statement and proof of Theorem 5.2; added a missing edge-case to the anti-cancellation algorithm

arXiv:2311.00149 [pdf, other]

A Knowledge Compilation Take on Binary Polynomial Optimization

Authors: Florent Capelli, Alberto Del Pia, Silvia Di Gregorio

Abstract: The Binary Polynomial Optimization (BPO) problem is defined as the problem of maximizing a given polynomial function over all binary points. The main contribution of this paper is to draw a novel connection between BPO and the problem of finding the maximal assignment for a Boolean function with weights on variables. This connection allows us to give a strongly polynomial algorithm that solves BPO… ▽ More The Binary Polynomial Optimization (BPO) problem is defined as the problem of maximizing a given polynomial function over all binary points. The main contribution of this paper is to draw a novel connection between BPO and the problem of finding the maximal assignment for a Boolean function with weights on variables. This connection allows us to give a strongly polynomial algorithm that solves BPO with a hypergraph that is either $β$-acyclic or with bounded incidence treewidth. This result unifies and significantly extends the known tractable classes of BPO. The generality of our technique allows us to deal also with extensions of BPO, where we enforce extended cardinality constraints on the set of binary points, and where we seek $k$ best feasible solutions. We also extend our results to the significantly more general problem where variables are replaced by literals. Preliminary computational results show that the resulting algorithms can be significantly faster than current state-of-the-art. △ Less

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2311.00063 [pdf, other]

Safe multi-agent motion planning under uncertainty for drones using filtered reinforcement learning

Authors: Sleiman Safaoui, Abraham P. Vinod, Ankush Chakrabarty, Rien Quirynen, Nobuyuki Yoshikawa, Stefano Di Cairano

Abstract: We consider the problem of safe multi-agent motion planning for drones in uncertain, cluttered workspaces. For this problem, we present a tractable motion planner that builds upon the strengths of reinforcement learning and constrained-control-based trajectory planning. First, we use single-agent reinforcement learning to learn motion plans from data that reach the target but may not be collision-… ▽ More We consider the problem of safe multi-agent motion planning for drones in uncertain, cluttered workspaces. For this problem, we present a tractable motion planner that builds upon the strengths of reinforcement learning and constrained-control-based trajectory planning. First, we use single-agent reinforcement learning to learn motion plans from data that reach the target but may not be collision-free. Next, we use a convex optimization, chance constraints, and set-based methods for constrained control to ensure safety, despite the uncertainty in the workspace, agent motion, and sensing. The proposed approach can handle state and control constraints on the agents, and enforce collision avoidance among themselves and with static obstacles in the workspace with high probability. The proposed approach yields a safe, real-time implementable, multi-agent motion planner that is simpler to train than methods based solely on learning. Numerical simulations and experiments show the efficacy of the approach. △ Less

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2310.14133 [pdf, other]

Dynamic Quality Metric Oriented Error-bounded Lossy Compression for Scientific Datasets

Authors: Jinyang Liu, Sheng Di, Kai Zhao, Xin Liang, Zizhong Chen, Franck Cappello

Abstract: With the ever-increasing execution scale of high performance computing (HPC) applications, vast amounts of data are being produced by scientific research every day. Error-bounded lossy compression has been considered a very promising solution to address the big-data issue for scientific applications because it can significantly reduce the data volume with low time cost meanwhile allowing users to… ▽ More With the ever-increasing execution scale of high performance computing (HPC) applications, vast amounts of data are being produced by scientific research every day. Error-bounded lossy compression has been considered a very promising solution to address the big-data issue for scientific applications because it can significantly reduce the data volume with low time cost meanwhile allowing users to control the compression errors with a specified error bound. The existing error-bounded lossy compressors, however, are all developed based on inflexible designs or compression pipelines, which cannot adapt to diverse compression quality requirements/metrics favored by different application users. In this paper, we propose a novel dynamic quality metric oriented error-bounded lossy compression framework, namely QoZ. The detailed contribution is three-fold. (1) We design a novel highly-parameterized multi-level interpolation-based data predictor, which can significantly improve the overall compression quality with the same compressed size. (2) We design the error-bounded lossy compression framework QoZ based on the adaptive predictor, which can auto-tune the critical parameters and optimize the compression result according to user-specified quality metrics during online compression. (3) We evaluate QoZ carefully by comparing its compression quality with multiple state-of-the-arts on various real-world scientific application datasets. Experiments show that, compared with the second-best lossy compressor, QoZ can achieve up to 70% compression ratio improvement under the same error bound, up to 150% compression ratio improvement under the same PSNR, or up to 270% compression ratio improvement under the same SSIM. △ Less

Submitted 21 October, 2023; originally announced October 2023.

arXiv:2309.16214 [pdf, other]

Canary: Congestion-Aware In-Network Allreduce Using Dynamic Trees

Authors: Daniele De Sensi, Edgar Costa Molero, Salvatore Di Girolamo, Laurent Vanbever, Torsten Hoefler

Abstract: The allreduce operation is an essential building block for many distributed applications, ranging from the training of deep learning models to scientific computing. In an allreduce operation, data from multiple hosts is aggregated together and then broadcasted to each host participating in the operation. Allreduce performance can be improved by a factor of two by aggregating the data directly in t… ▽ More The allreduce operation is an essential building block for many distributed applications, ranging from the training of deep learning models to scientific computing. In an allreduce operation, data from multiple hosts is aggregated together and then broadcasted to each host participating in the operation. Allreduce performance can be improved by a factor of two by aggregating the data directly in the network. Switches aggregate data coming from multiple ports before forwarding the partially aggregated result to the next hop. In all existing solutions, each switch needs to know the ports from which it will receive the data to aggregate. However, this forces packets to traverse a predefined set of switches, making these solutions prone to congestion. For this reason, we design Canary, the first congestion-aware in-network allreduce algorithm. Canary uses load balancing algorithms to forward packets on the least congested paths. Because switches do not know from which ports they will receive the data to aggregate, they use timeouts to aggregate the data in a best-effort way. We develop a P4 Canary prototype and evaluate it on a Tofino switch. We then validate Canary through simulations on large networks, showing performance improvements up to 40% compared to the state-of-the-art. △ Less

Submitted 28 September, 2023; originally announced September 2023.

ACM Class: C.2.1; C.2.2; C.2.4; C.5.1

arXiv:2309.10603 [pdf, other]

Asteroids co-orbital motion classification based on Machine Learning

Authors: Giulia Ciacci, Andrea Barucci, Sara Di Ruzza, Elisa Maria Alessi

Abstract: In this work, we explore how to classify asteroids in co-orbital motion with a given planet using Machine Learning. We consider four different kinds of motion in mean motion resonance with the planet, nominally Tadpole, Horseshoe and Quasi-satellite, building 3 datasets defined as Real (taking the ephemerides of real asteroids from the JPL Horizons system), Ideal and Perturbed (both simulated, obt… ▽ More In this work, we explore how to classify asteroids in co-orbital motion with a given planet using Machine Learning. We consider four different kinds of motion in mean motion resonance with the planet, nominally Tadpole, Horseshoe and Quasi-satellite, building 3 datasets defined as Real (taking the ephemerides of real asteroids from the JPL Horizons system), Ideal and Perturbed (both simulated, obtained by propagating initial conditions considering two different dynamical systems) for training and testing the Machine Learning algorithms in different conditions. The time series of the variable theta (angle related to the resonance) are studied with a data analysis pipeline defined ad hoc for the problem and composed by: data creation and annotation, time series features extraction thanks to the tsfresh package (potentially followed by selection and standardization) and the application of Machine Learning algorithms for Dimensionality Reduction and Classification. Such approach, based on features extracted from the time series, allows to work with a smaller number of data with respect to Deep Learning algorithms, also allowing to define a ranking of the importance of the features. Physical Interpretability of the features is another key point of this approach. In addition, we introduce the SHapley Additive exPlanations for Explainability technique. Different training and test sets are used, in order to understand the power and the limits of our approach. The results show how the algorithms are able to identify and classify correctly the time series, with a high degree of performance. △ Less

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.04037 [pdf, other]

SRN-SZ: Deep Leaning-Based Scientific Error-bounded Lossy Compression with Super-resolution Neural Networks

Authors: Jinyang Liu, Sheng Di, Sian Jin, Kai Zhao, Xin Liang, Zizhong Chen, Franck Cappello

Abstract: The fast growth of computational power and scales of modern super-computing systems have raised great challenges for the management of exascale scientific data. To maintain the usability of scientific data, error-bound lossy compression is proposed and developed as an essential technique for the size reduction of scientific data with constrained data distortion. Among the diverse datasets generate… ▽ More The fast growth of computational power and scales of modern super-computing systems have raised great challenges for the management of exascale scientific data. To maintain the usability of scientific data, error-bound lossy compression is proposed and developed as an essential technique for the size reduction of scientific data with constrained data distortion. Among the diverse datasets generated by various scientific simulations, certain datasets cannot be effectively compressed by existing error-bounded lossy compressors with traditional techniques. The recent success of Artificial Intelligence has inspired several researchers to integrate neural networks into error-bounded lossy compressors. However, those works still suffer from limited compression ratios and/or extremely low efficiencies. To address those issues and improve the compression on the hard-to-compress datasets, in this paper, we propose SRN-SZ, which is a deep learning-based scientific error-bounded lossy compressor leveraging the hierarchical data grid expansion paradigm implemented by super-resolution neural networks. SRN-SZ applies the most advanced super-resolution network HAT for its compression, which is free of time-costing per-data training. In experiments compared with various state-of-the-art compressors, SRN-SZ achieves up to 75% compression ratio improvements under the same error bound and up to 80% compression ratio improvements under the same PSNR than the second-best compressor. △ Less

Submitted 6 November, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

arXiv:2309.03628 [pdf, other]

OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs

Authors: Mikhail Khalilov, Marcin Chrapek, Siyuan Shen, Alessandro Vezzu, Thomas Benz, Salvatore Di Girolamo, Timo Schneider, Daniele De Sensi, Luca Benini, Torsten Hoefler

Abstract: Multi-tenancy is essential for unleashing SmartNIC's potential in datacenters. Our systematic analysis in this work shows that existing on-path SmartNICs have resource multiplexing limitations. For example, existing solutions lack multi-tenancy capabilities such as performance isolation and QoS provisioning for compute and IO resources. Compared to standard NIC data paths with a well-defined set o… ▽ More Multi-tenancy is essential for unleashing SmartNIC's potential in datacenters. Our systematic analysis in this work shows that existing on-path SmartNICs have resource multiplexing limitations. For example, existing solutions lack multi-tenancy capabilities such as performance isolation and QoS provisioning for compute and IO resources. Compared to standard NIC data paths with a well-defined set of offloaded functions, unpredictable execution times of SmartNIC kernels make conventional approaches for multi-tenancy and QoS insufficient. We fill this gap with OSMOSIS, a SmartNICs resource manager co-design. OSMOSIS extends existing OS mechanisms to enable dynamic hardware resource multiplexing of the on-path packet processing data plane. We integrate OSMOSIS within an open-source RISC-V-based 400Gbit/s SmartNIC. Our performance results demonstrate that OSMOSIS fully supports multi-tenancy and enables broader adoption of SmartNICs in datacenters with low overhead. △ Less

Submitted 13 March, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: 12 pages, 14 figures, 103 references

arXiv:2308.06960 [pdf, other]

Search to Fine-tune Pre-trained Graph Neural Networks for Graph-level Tasks

Authors: Zhili Wang, Shimin Di, Lei Chen, Xiaofang Zhou

Abstract: Recently, graph neural networks (GNNs) have shown its unprecedented success in many graph-related tasks. However, GNNs face the label scarcity issue as other neural networks do. Thus, recent efforts try to pre-train GNNs on a large-scale unlabeled graph and adapt the knowledge from the unlabeled graph to the target downstream task. The adaptation is generally achieved by fine-tuning the pre-traine… ▽ More Recently, graph neural networks (GNNs) have shown its unprecedented success in many graph-related tasks. However, GNNs face the label scarcity issue as other neural networks do. Thus, recent efforts try to pre-train GNNs on a large-scale unlabeled graph and adapt the knowledge from the unlabeled graph to the target downstream task. The adaptation is generally achieved by fine-tuning the pre-trained GNNs with a limited number of labeled data. Despite the importance of fine-tuning, current GNNs pre-training works often ignore designing a good fine-tuning strategy to better leverage transferred knowledge and improve the performance on downstream tasks. Only few works start to investigate a better fine-tuning strategy for pre-trained GNNs. But their designs either have strong assumptions or overlook the data-aware issue for various downstream datasets. Therefore, we aim to design a better fine-tuning strategy for pre-trained GNNs to improve the model performance in this paper. Given a pre-trained GNN, we propose to search to fine-tune pre-trained graph neural networks for graph-level tasks (S2PGNN), which adaptively design a suitable fine-tuning framework for the given labeled data on the downstream task. To ensure the improvement brought by searching fine-tuning strategy, we carefully summarize a proper search space of fine-tuning framework that is suitable for GNNs. The empirical studies show that S2PGNN can be implemented on the top of 10 famous pre-trained GNNs and consistently improve their performance. Besides, S2PGNN achieves better performance than existing fine-tuning strategies within and outside the GNN area. Our code is publicly available at \url{https://anonymous.4open.science/r/code_icde2024-A9CB/}. △ Less

Submitted 1 March, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

arXiv:2308.06266 [pdf, other]

$n$ Walks in the Fictional Woods

Authors: Victor Schetinger, Sara Di Bartolomeo, Edirlei Soares de Lima, Christofer Meinecke, Rudolf Rosa

Abstract: This paper presents a novel exploration of the interaction between generative AI models, visualization, and narrative generation processes, using OpenAI's GPT as a case study. We look at the question "Where Does Generativeness Comes From", which has a simple answer at the intersection of many domains. Drawing on Umberto Eco's "Six Walks in the Fictional Woods", we engender a speculative, transdisc… ▽ More This paper presents a novel exploration of the interaction between generative AI models, visualization, and narrative generation processes, using OpenAI's GPT as a case study. We look at the question "Where Does Generativeness Comes From", which has a simple answer at the intersection of many domains. Drawing on Umberto Eco's "Six Walks in the Fictional Woods", we engender a speculative, transdisciplinary scientific narrative using ChatGPT in different roles: as an information repository, a ghost writer, a scientific coach, among others. The paper is written as a piling of plateaus where the titling of each (sub-)section, the "teaser" images, the headers, and a biblock of text are strata forming a narrative about narratives. To enrich our exposition, we present a visualization prototype to analyze storyboarded narratives, and extensive conversations with ChatGPT. Each link to a ChatGPT conversation is an experiment on writing where we try to use different plugins and techniques to investigate the topics that, ultimately form the content of this portable document file. Our visualization uses a dataset of stories with scene descriptions, textual descriptions of scenes (both generated by ChatGPT), and images (generated by Stable Diffusion using scene descriptions as prompts). We employ a simple graph-node diagram to try to make a "forest of narratives" visible, an example of a vis4gen application that can be used to analyze the output of Large Languange + Image Models. △ Less

Submitted 23 August, 2023; v1 submitted 13 July, 2023; originally announced August 2023.

Comments: this is a submission for IEEE alt.vis 2023

arXiv:2308.05199 [pdf, other]

gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters

Authors: Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur

Abstract: GPU-aware collective communication has become a major bottleneck for modern computing platforms as GPU computing power rapidly rises. A traditional approach is to directly integrate lossy compression into GPU-aware collectives, which can lead to serious performance issues such as underutilized GPU devices and uncontrolled data distortion. In order to address these issues, in this paper, we propose… ▽ More GPU-aware collective communication has become a major bottleneck for modern computing platforms as GPU computing power rapidly rises. A traditional approach is to directly integrate lossy compression into GPU-aware collectives, which can lead to serious performance issues such as underutilized GPU devices and uncontrolled data distortion. In order to address these issues, in this paper, we propose gZCCL, a first-ever general framework that designs and optimizes GPU-aware, compression-enabled collectives with an accuracy-aware design to control error propagation. To validate our framework, we evaluate the performance on up to 512 NVIDIA A100 GPUs with real-world applications and datasets. Experimental results demonstrate that our gZCCL-accelerated collectives, including both collective computation (Allreduce) and collective data movement (Scatter), can outperform NCCL as well as Cray MPI by up to 4.5X and 28.7X, respectively. Furthermore, our accuracy evaluation with an image-stacking application confirms the high reconstructed data quality of our accuracy-aware framework. △ Less

Submitted 6 May, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: 12 pages, 13 figures, and 2 tables. ICS '24

arXiv:2307.09609 [pdf, other]

AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications

Authors: Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, Sian Jin, Houjun Tang, Jean Sexton, Sheng Di, Zarija Lukić, Kai Zhao, Bo Fang, Franck Cappello, James Ahrens, Dingwen Tao

Abstract: As supercomputers advance towards exascale capabilities, computational intensity increases significantly, and the volume of data requiring storage and transmission experiences exponential growth. Adaptive Mesh Refinement (AMR) has emerged as an effective solution to address these two challenges. Concurrently, error-bounded lossy compression is recognized as one of the most efficient approaches to… ▽ More As supercomputers advance towards exascale capabilities, computational intensity increases significantly, and the volume of data requiring storage and transmission experiences exponential growth. Adaptive Mesh Refinement (AMR) has emerged as an effective solution to address these two challenges. Concurrently, error-bounded lossy compression is recognized as one of the most efficient approaches to tackle the latter issue. Despite their respective advantages, few attempts have been made to investigate how AMR and error-bounded lossy compression can function together. To this end, this study presents a novel in-situ lossy compression framework that employs the HDF5 filter to improve both I/O costs and boost compression quality for AMR applications. We implement our solution into the AMReX framework and evaluate on two real-world AMR applications, Nyx and WarpX, on the Summit supercomputer. Experiments with 4096 CPU cores demonstrate that AMRIC improves the compression ratio by up to 81X and the I/O performance by up to 39X over AMReX's original compression solution. △ Less

Submitted 13 July, 2023; originally announced July 2023.

Comments: 12 pages, 18 figures, 3 tables, accepted by ACM/IEEE SC '23

arXiv:2307.06975 [pdf, other]

Neuro-symbolic Empowered Denoising Diffusion Probabilistic Models for Real-time Anomaly Detection in Industry 4.0

Authors: Luigi Capogrosso, Alessio Mascolini, Federico Girella, Geri Skenderi, Sebastiano Gaiardelli, Nicola Dall'Ora, Francesco Ponzio, Enrico Fraccaroli, Santa Di Cataldo, Sara Vinco, Enrico Macii, Franco Fummi, Marco Cristani

Abstract: Industry 4.0 involves the integration of digital technologies, such as IoT, Big Data, and AI, into manufacturing and industrial processes to increase efficiency and productivity. As these technologies become more interconnected and interdependent, Industry 4.0 systems become more complex, which brings the difficulty of identifying and stopping anomalies that may cause disturbances in the manufactu… ▽ More Industry 4.0 involves the integration of digital technologies, such as IoT, Big Data, and AI, into manufacturing and industrial processes to increase efficiency and productivity. As these technologies become more interconnected and interdependent, Industry 4.0 systems become more complex, which brings the difficulty of identifying and stopping anomalies that may cause disturbances in the manufacturing process. This paper aims to propose a diffusion-based model for real-time anomaly prediction in Industry 4.0 processes. Using a neuro-symbolic approach, we integrate industrial ontologies in the model, thereby adding formal knowledge on smart manufacturing. Finally, we propose a simple yet effective way of distilling diffusion models through Random Fourier Features for deployment on an embedded system for direct integration into the manufacturing process. To the best of our knowledge, this approach has never been explored before. △ Less

Submitted 18 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: Accepted at the 26th Forum on specification and Design Languages (FDL 2023)

arXiv:2307.05416 [pdf, other]

Optimizing Scientific Data Transfer on Globus with Error-bounded Lossy Compression

Authors: Yuanjian Liu, Sheng Di, Kyle Chard, Ian Foster, Franck Cappello

Abstract: The increasing volume and velocity of science data necessitate the frequent movement of enormous data volumes as part of routine research activities. As a result, limited wide-area bandwidth often leads to bottlenecks in research progress. However, in many cases, consuming applications (e.g., for analysis, visualization, and machine learning) can achieve acceptable performance on reduced-precision… ▽ More The increasing volume and velocity of science data necessitate the frequent movement of enormous data volumes as part of routine research activities. As a result, limited wide-area bandwidth often leads to bottlenecks in research progress. However, in many cases, consuming applications (e.g., for analysis, visualization, and machine learning) can achieve acceptable performance on reduced-precision data, and thus researchers may wish to compromise on data precision to reduce transfer and storage costs. Error-bounded lossy compression presents a promising approach as it can significantly reduce data volumes while preserving data integrity based on user-specified error bounds. In this paper, we propose a novel data transfer framework called Ocelot that integrates error-bounded lossy compression into the Globus data transfer infrastructure. We note four key contributions: (1) Ocelot is the first integration of lossy compression in Globus to significantly improve scientific data transfer performance over wide area network (WAN). (2) We propose an effective machine-learning based lossy compression quality estimation model that can predict the quality of error-bounded lossy compressors, which is fundamental to ensure that transferred data are acceptable to users. (3) We develop optimized strategies to reduce the compression time overhead, counter the compute-node waiting time, and improve transfer speed for compressed files. (4) We perform evaluations using many real-world scientific applications across different domains and distributed Globus endpoints. Our experiments show that Ocelot can improve dataset transfer performance substantially, and the quality of lossy compression (time, ratio and data distortion) can be predicted accurately for the purpose of quality assurance. △ Less

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2306.16639 [pdf]

Evaluating ChatGPT's Decimal Skills and Feedback Generation in a Digital Learning Game

Authors: Huy A. Nguyen, Hayden Stec, Xinying Hou, Sarah Di, Bruce M. McLaren

Abstract: While open-ended self-explanations have been shown to promote robust learning in multiple studies, they pose significant challenges to automated grading and feedback in technology-enhanced learning, due to the unconstrained nature of the students' input. Our work investigates whether recent advances in Large Language Models, and in particular ChatGPT, can address this issue. Using decimal exercise… ▽ More While open-ended self-explanations have been shown to promote robust learning in multiple studies, they pose significant challenges to automated grading and feedback in technology-enhanced learning, due to the unconstrained nature of the students' input. Our work investigates whether recent advances in Large Language Models, and in particular ChatGPT, can address this issue. Using decimal exercises and student data from a prior study of the learning game Decimal Point, with more than 5,000 open-ended self-explanation responses, we investigate ChatGPT's capability in (1) solving the in-game exercises, (2) determining the correctness of students' answers, and (3) providing meaningful feedback to incorrect answers. Our results showed that ChatGPT can respond well to conceptual questions, but struggled with decimal place values and number line problems. In addition, it was able to accurately assess the correctness of 75% of the students' answers and generated generally high-quality feedback, similar to human instructors. We conclude with a discussion of ChatGPT's strengths and weaknesses and suggest several venues for extending its use cases in digital teaching and learning. △ Less

Submitted 28 June, 2023; originally announced June 2023.

Comments: Be accepted as a Research Paper in 18th European Conference on Technology Enhanced Learning

arXiv:2306.13867 [pdf, other]

Physics-Informed Machine Learning for Modeling and Control of Dynamical Systems

Authors: Truong X. Nghiem, Ján Drgoňa, Colin Jones, Zoltan Nagy, Roland Schwan, Biswadip Dey, Ankush Chakrabarty, Stefano Di Cairano, Joel A. Paulson, Andrea Carron, Melanie N. Zeilinger, Wenceslao Shaw Cortez, Draguna L. Vrabie

Abstract: Physics-informed machine learning (PIML) is a set of methods and tools that systematically integrate machine learning (ML) algorithms with physical constraints and abstract mathematical models developed in scientific and engineering domains. As opposed to purely data-driven methods, PIML models can be trained from additional information obtained by enforcing physical laws such as energy and mass c… ▽ More Physics-informed machine learning (PIML) is a set of methods and tools that systematically integrate machine learning (ML) algorithms with physical constraints and abstract mathematical models developed in scientific and engineering domains. As opposed to purely data-driven methods, PIML models can be trained from additional information obtained by enforcing physical laws such as energy and mass conservation. More broadly, PIML models can include abstract properties and conditions such as stability, convexity, or invariance. The basic premise of PIML is that the integration of ML and physics can yield more effective, physically consistent, and data-efficient models. This paper aims to provide a tutorial-like overview of the recent advances in PIML for dynamical system modeling and control. Specifically, the paper covers an overview of the theory, fundamental concepts and methods, tools, and applications on topics of: 1) physics-informed learning for system identification; 2) physics-informed learning for control; 3) analysis and verification of PIML models; and 4) physics-informed digital twins. The paper is concluded with a perspective on open challenges and future research opportunities. △ Less

Submitted 24 June, 2023; originally announced June 2023.

Comments: 16 pages, 4 figures, to be published in 2023 American Control Conference (ACC)

Showing 1–50 of 150 results for author: Di, S