subscribe to arXiv mailings

GoSurf: Identifying Software Supply Chain Attack Vectors in Go

Authors: Carmine Cesarano, Vivi Andersson, Roberto Natella, Martin Monperrus

Abstract: In Go, the widespread adoption of open-source software has led to a flourishing ecosystem of third-party dependencies, which are often integrated into critical systems. However, the reuse of dependencies introduces significant supply chain security risks, as a single compromised package can have cascading impacts. Existing supply chain attack taxonomies overlook language-specific features that can… ▽ More In Go, the widespread adoption of open-source software has led to a flourishing ecosystem of third-party dependencies, which are often integrated into critical systems. However, the reuse of dependencies introduces significant supply chain security risks, as a single compromised package can have cascading impacts. Existing supply chain attack taxonomies overlook language-specific features that can be exploited by attackers to hide malicious code. In this paper, we propose a novel taxonomy of 12 distinct attack vectors tailored for the Go language and its package lifecycle. Our taxonomy identifies patterns in which language-specific Go features, intended for benign purposes, can be misused to propagate malicious code stealthily through supply chains. Additionally, we introduce GoSurf, a static analysis tool that analyzes the attack surface of Go packages according to our proposed taxonomy. We evaluate GoSurf on a corpus of widely used, real-world Go packages. Our work provides preliminary insights for securing the open-source software supply chain within the Go ecosystem, allowing developers and security analysts to prioritize code audit efforts and uncover hidden malicious behaviors. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.00125 [pdf, other]

A Survey on Failure Analysis and Fault Injection in AI Systems

Authors: Guangba Yu, Gou Tan, Haojia Huang, Zhenyu Zhang, Pengfei Chen, Roberto Natella, Zibin Zheng

Abstract: The rapid advancement of Artificial Intelligence (AI) has led to its integration into various areas, especially with Large Language Models (LLMs) significantly enhancing capabilities in Artificial Intelligence Generated Content (AIGC). However, the complexity of AI systems has also exposed their vulnerabilities, necessitating robust methods for failure analysis (FA) and fault injection (FI) to ens… ▽ More The rapid advancement of Artificial Intelligence (AI) has led to its integration into various areas, especially with Large Language Models (LLMs) significantly enhancing capabilities in Artificial Intelligence Generated Content (AIGC). However, the complexity of AI systems has also exposed their vulnerabilities, necessitating robust methods for failure analysis (FA) and fault injection (FI) to ensure resilience and reliability. Despite the importance of these techniques, there lacks a comprehensive review of FA and FI methodologies in AI systems. This study fills this gap by presenting a detailed survey of existing FA and FI approaches across six layers of AI systems. We systematically analyze 160 papers and repositories to answer three research questions including (1) what are the prevalent failures in AI systems, (2) what types of faults can current FI tools simulate, (3) what gaps exist between the simulated faults and real-world failures. Our findings reveal a taxonomy of AI system failures, assess the capabilities of existing FI tools, and highlight discrepancies between real-world and simulated failures. Moreover, this survey contributes to the field by providing a framework for fault diagnosis, evaluating the state-of-the-art in FI, and identifying areas for improvement in FI techniques to enhance the resilience of AI systems. △ Less

Submitted 27 June, 2024; originally announced July 2024.

arXiv:2404.12893 [pdf, other]

The Power of Words: Generating PowerShell Attacks from Natural Language

Authors: Pietro Liguori, Christian Marescalco, Roberto Natella, Vittorio Orbinato, Luciano Pianese

Abstract: As the Windows OS stands out as one of the most targeted systems, the PowerShell language has become a key tool for malicious actors and cybersecurity professionals (e.g., for penetration testing). This work explores an uncharted domain in AI code generation by automatically generating offensive PowerShell code from natural language descriptions using Neural Machine Translation (NMT). For training… ▽ More As the Windows OS stands out as one of the most targeted systems, the PowerShell language has become a key tool for malicious actors and cybersecurity professionals (e.g., for penetration testing). This work explores an uncharted domain in AI code generation by automatically generating offensive PowerShell code from natural language descriptions using Neural Machine Translation (NMT). For training and evaluation purposes, we propose two novel datasets with PowerShell code samples, one with manually curated descriptions in natural language and another code-only dataset for reinforcing the training. We present an extensive evaluation of state-of-the-art NMT models and analyze the generated code both statically and dynamically. Results indicate that tuning NMT using our dataset is effective at generating offensive PowerShell code. Comparative analysis against the most widely used LLM service ChatGPT reveals the specialized strengths of our fine-tuned models. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: 18th USENIX WOOT Conference on Offensive Technologies, GitHub Repo: https://github.com/dessertlab/powershell-offensive-code-generation

arXiv:2402.01219 [pdf, other]

doi 10.1109/MSEC.2024.3355713

AI Code Generators for Security: Friend or Foe?

Authors: Roberto Natella, Pietro Liguori, Cristina Improta, Bojan Cukic, Domenico Cotroneo

Abstract: Recent advances of artificial intelligence (AI) code generators are opening new opportunities in software security research, including misuse by malicious actors. We review use cases for AI code generators for security and introduce an evaluation benchmark. Recent advances of artificial intelligence (AI) code generators are opening new opportunities in software security research, including misuse by malicious actors. We review use cases for AI code generators for security and introduce an evaluation benchmark. △ Less

Submitted 2 February, 2024; originally announced February 2024.

Comments: Dataset available at: https://github.com/dessertlab/violent-python

Journal ref: IEEE Security & Privacy, Early Access, February 2024

arXiv:2401.05961 [pdf, other]

Securing an Application Layer Gateway: An Industrial Case Study

Authors: Carmine Cesarano, Roberto Natella

Abstract: Application Layer Gateways (ALGs) play a crucial role in securing critical systems, including railways, industrial automation, and defense applications, by segmenting networks at different levels of criticality. However, they require rigorous security testing to prevent software vulnerabilities, not only at the network level but also at the application layer (e.g., deep traffic inspection componen… ▽ More Application Layer Gateways (ALGs) play a crucial role in securing critical systems, including railways, industrial automation, and defense applications, by segmenting networks at different levels of criticality. However, they require rigorous security testing to prevent software vulnerabilities, not only at the network level but also at the application layer (e.g., deep traffic inspection components). This paper presents a vulnerability-driven methodology for the comprehensive security testing of ALGs. We present the methodology in the context of an industrial case study in the railways domain, and a simulation-based testing environment to support the methodology. △ Less

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2311.08274 [pdf, other]

doi 10.1109/TDSC.2024.3376129

Laccolith: Hypervisor-Based Adversary Emulation with Anti-Detection

Authors: Vittorio Orbinato, Marco Carlo Feliciano, Domenico Cotroneo, Roberto Natella

Abstract: Advanced Persistent Threats (APTs) represent the most threatening form of attack nowadays since they can stay undetected for a long time. Adversary emulation is a proactive approach for preparing against these attacks. However, adversary emulation tools lack the anti-detection abilities of APTs. We introduce Laccolith, a hypervisor-based solution for adversary emulation with anti-detection to fill… ▽ More Advanced Persistent Threats (APTs) represent the most threatening form of attack nowadays since they can stay undetected for a long time. Adversary emulation is a proactive approach for preparing against these attacks. However, adversary emulation tools lack the anti-detection abilities of APTs. We introduce Laccolith, a hypervisor-based solution for adversary emulation with anti-detection to fill this gap. We also present an experimental study to compare Laccolith with MITRE CALDERA, a state-of-the-art solution for adversary emulation, against five popular anti-virus products. We found that CALDERA cannot evade detection, limiting the realism of emulated attacks, even when combined with a state-of-the-art anti-detection framework. Our experiments show that Laccolith can hide its activities from all the tested anti-virus products, thus making it suitable for realistic emulations. △ Less

Submitted 29 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2310.18834 [pdf, other]

doi 10.1016/j.jss.2024.112113

Automating the Correctness Assessment of AI-generated Code for Security Contexts

Authors: Domenico Cotroneo, Alessio Foggia, Cristina Improta, Pietro Liguori, Roberto Natella

Abstract: Evaluating the correctness of code generated by AI is a challenging open problem. In this paper, we propose a fully automated method, named ACCA, to evaluate the correctness of AI-generated code for security purposes. The method uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation. We use ACCA to assess four state-of-the-art models trained to genera… ▽ More Evaluating the correctness of code generated by AI is a challenging open problem. In this paper, we propose a fully automated method, named ACCA, to evaluate the correctness of AI-generated code for security purposes. The method uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation. We use ACCA to assess four state-of-the-art models trained to generate security-oriented assembly code and compare the results of the evaluation with different baseline solutions, including output similarity metrics, widely used in the field, and the well-known ChatGPT, the AI-powered language model developed by OpenAI. Our experiments show that our method outperforms the baseline solutions and assesses the correctness of the AI-generated code similar to the human-based evaluation, which is considered the ground truth for the assessment in the field. Moreover, ACCA has a very strong correlation with the human evaluation (Pearson's correlation coefficient r=0.84 on average). Finally, since it is a fully automated solution that does not require any human intervention, the proposed method performs the assessment of every code snippet in ~0.17s on average, which is definitely lower than the average time required by human analysts to manually inspect the code, based on our experience. △ Less

Submitted 8 June, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

arXiv:2308.04451 [pdf, other]

doi 10.1145/3643916.3644416

Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks

Authors: Domenico Cotroneo, Cristina Improta, Pietro Liguori, Roberto Natella

Abstract: AI-based code generators have become pivotal in assisting developers in writing software starting from natural language (NL). However, they are trained on large amounts of data, often collected from unsanitized online sources (e.g., GitHub, HuggingFace). As a consequence, AI models become an easy target for data poisoning, i.e., an attack that injects malicious samples into the training data to ge… ▽ More AI-based code generators have become pivotal in assisting developers in writing software starting from natural language (NL). However, they are trained on large amounts of data, often collected from unsanitized online sources (e.g., GitHub, HuggingFace). As a consequence, AI models become an easy target for data poisoning, i.e., an attack that injects malicious samples into the training data to generate vulnerable code. To address this threat, this work investigates the security of AI code generators by devising a targeted data poisoning strategy. We poison the training data by injecting increasing amounts of code containing security vulnerabilities and assess the attack's success on different state-of-the-art models for code generation. Our study shows that AI code generators are vulnerable to even a small amount of poison. Notably, the attack success strongly depends on the model architecture and poisoning rate, whereas it is not influenced by the type of vulnerabilities. Moreover, since the attack does not impact the correctness of code generated by pre-trained models, it is hard to detect. Lastly, our work offers practical insights into understanding and potentially mitigating this threat. △ Less

Submitted 9 February, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: Accepted for publication at the International Conference on Program Comprehension 2024

arXiv:2306.05079 [pdf, other]

Enhancing Robustness of AI Offensive Code Generators via Data Augmentation

Authors: Cristina Improta, Pietro Liguori, Roberto Natella, Bojan Cukic, Domenico Cotroneo

Abstract: In this work, we present a method to add perturbations to the code descriptions to create new inputs in natural language (NL) from well-intentioned developers that diverge from the original ones due to the use of new words or because they miss part of them. The goal is to analyze how and to what extent perturbations affect the performance of AI code generators in the context of security-oriented c… ▽ More In this work, we present a method to add perturbations to the code descriptions to create new inputs in natural language (NL) from well-intentioned developers that diverge from the original ones due to the use of new words or because they miss part of them. The goal is to analyze how and to what extent perturbations affect the performance of AI code generators in the context of security-oriented code. First, we show that perturbed descriptions preserve the semantics of the original, non-perturbed ones. Then, we use the method to assess the robustness of three state-of-the-art code generators against the newly perturbed inputs, showing that the performance of these AI-based solutions is highly affected by perturbations in the NL descriptions. To enhance their robustness, we use the method to perform data augmentation, i.e., to increase the variability and diversity of the NL descriptions in the training data, proving its effectiveness against both perturbed and non-perturbed code descriptions. △ Less

Submitted 1 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

arXiv:2301.07422 [pdf, other]

doi 10.1016/j.jss.2023.111611

Run-time Failure Detection via Non-intrusive Event Analysis in a Large-Scale Cloud Computing Platform

Authors: Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella

Abstract: Cloud computing systems fail in complex and unforeseen ways due to unexpected combinations of events and interactions among hardware and software components. These failures are especially problematic when they are silent, i.e., not accompanied by any explicit failure notification, hindering the timely detection and recovery. In this work, we propose an approach to run-time failure detection tailor… ▽ More Cloud computing systems fail in complex and unforeseen ways due to unexpected combinations of events and interactions among hardware and software components. These failures are especially problematic when they are silent, i.e., not accompanied by any explicit failure notification, hindering the timely detection and recovery. In this work, we propose an approach to run-time failure detection tailored for monitoring multi-tenant and concurrent cloud computing systems. The approach uses a non-intrusive form of event tracing, without manual changes to the system's internals to propagate session identifiers (IDs), and builds a set of lightweight monitoring rules from fault-free executions. We evaluated the effectiveness of the approach in detecting failures in the context of the OpenStack cloud computing platform, a complex and "off-the-shelf" distributed system, by executing a campaign of fault injection experiments in a multi-tenant scenario. Our experiments show that the approach detects the failure with an F1 score (0.85) and accuracy (0.77) higher than the ones provided by the OpenStack failure logging mechanisms (0.53 and 0.50) and two non--session-aware run-time verification approaches (both lower than 0.15). Moreover, the approach significantly decreases the average time to detect failures at run-time (~114 seconds) compared to the OpenStack logging mechanisms. △ Less

Submitted 18 January, 2023; originally announced January 2023.

Comments: Paper accepted for publication in The Journal of Systems and Software

arXiv:2212.06008 [pdf, other]

doi 10.1016/j.eswa.2023.120073

Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators

Authors: Pietro Liguori, Cristina Improta, Roberto Natella, Bojan Cukic, Domenico Cotroneo

Abstract: AI-based code generators are an emerging solution for automatically writing programs starting from descriptions in natural language, by using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces seve… ▽ More AI-based code generators are an emerging solution for automatically writing programs starting from descriptions in natural language, by using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses output similarity metrics, i.e., automatic metrics that compute the textual similarity of generated code with ground-truth references. However, it is not clear what metric to use, and which metric is most suitable for specific contexts. This work analyzes a large set of output similarity metrics on offensive code generators. We apply the metrics on two state-of-the-art NMT models using two datasets containing offensive assembly and Python code with their descriptions in the English language. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations. △ Less

Submitted 13 April, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

arXiv:2208.14109 [pdf, other]

On Temporal Isolation Assessment in Virtualized Railway Signaling as a Service Systems

Authors: Domenico Cotroneo, Luigi De Simone, Roberto Natella

Abstract: Railway signaling systems provide numerous critical functions at different safety level, to correctly implement the entire transport ecosystem. Today, we are witnessing the increasing use of the cloud and virtualization technologies in such mixed-criticality systems, with the main goal of reducing costs, improving reliability, while providing orchestration capabilities. Unfortunately, virtualizati… ▽ More Railway signaling systems provide numerous critical functions at different safety level, to correctly implement the entire transport ecosystem. Today, we are witnessing the increasing use of the cloud and virtualization technologies in such mixed-criticality systems, with the main goal of reducing costs, improving reliability, while providing orchestration capabilities. Unfortunately, virtualization includes several issues for assessing temporal isolation, which is critical for safety-related standards like EN50128. In this short paper, we envision leveraging the real-time flavor of a general-purpose hypervisor, like Xen, to build the Railway Signaling as a Service (RSaaS) systems of the future. We provide a preliminary background, highlighting the need for a systematic evaluation of the temporal isolation to demonstrate the feasibility of using general-purpose hypervisors in the safety-critical context for certification purposes. △ Less

Submitted 30 August, 2022; originally announced August 2022.

Comments: 5 pages, The Twentieth International Workshop on Assurance in Distributed Systems and Networks (ADSN 2022)

arXiv:2208.12770 [pdf, ps, other]

doi 10.1109/TSC.2022.3183938

A Latency-driven Availability Assessment for Multi-Tenant Service Chains

Authors: Luigi De Simone, Mario Di Mauro, Roberto Natella, Fabio Postiglione

Abstract: Nowadays, most telecommunication services adhere to the Service Function Chain (SFC) paradigm, where network functions are implemented via software. In particular, container virtualization is becoming a popular approach to deploy network functions and to enable resource slicing among several tenants. The resulting infrastructure is a complex system composed by a huge amount of containers implement… ▽ More Nowadays, most telecommunication services adhere to the Service Function Chain (SFC) paradigm, where network functions are implemented via software. In particular, container virtualization is becoming a popular approach to deploy network functions and to enable resource slicing among several tenants. The resulting infrastructure is a complex system composed by a huge amount of containers implementing different SFC functionalities, along with different tenants sharing the same chain. The complexity of such a scenario lead us to evaluate two critical metrics: the steady-state availability (the probability that a system is functioning in long runs) and the latency (the time between a service request and the pertinent response). Consequently, we propose a latency-driven availability assessment for multi-tenant service chains implemented via Containerized Network Functions (CNFs). We adopt a multi-state system to model single CNFs and the queueing formalism to characterize the service latency. To efficiently compute the availability, we develop a modified version of the Multidimensional Universal Generating Function (MUGF) technique. Finally, we solve an optimization problem to minimize the SFC cost under an availability constraint. As a relevant example of SFC, we consider a containerized version of IP Multimedia Subsystem, whose parameters have been estimated through fault injection techniques and load tests. △ Less

Submitted 26 August, 2022; originally announced August 2022.

arXiv:2208.12144 [pdf, other]

Automatic Mapping of Unstructured Cyber Threat Intelligence: An Experimental Study

Authors: Vittorio Orbinato, Mariarosaria Barbaraci, Roberto Natella, Domenico Cotroneo

Abstract: Proactive approaches to security, such as adversary emulation, leverage information about threat actors and their techniques (Cyber Threat Intelligence, CTI). However, most CTI still comes in unstructured forms (i.e., natural language), such as incident reports and leaked documents. To support proactive security efforts, we present an experimental study on the automatic classification of unstructu… ▽ More Proactive approaches to security, such as adversary emulation, leverage information about threat actors and their techniques (Cyber Threat Intelligence, CTI). However, most CTI still comes in unstructured forms (i.e., natural language), such as incident reports and leaked documents. To support proactive security efforts, we present an experimental study on the automatic classification of unstructured CTI into attack techniques using machine learning (ML). We contribute with two new datasets for CTI analysis, and we evaluate several ML models, including both traditional and deep learning-based ones. We present several lessons learned about how ML can perform at this task, which classifiers perform best and under which conditions, which are the main causes of classification errors, and the challenges ahead for CTI analysis. △ Less

Submitted 25 August, 2022; originally announced August 2022.

Comments: 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE)

arXiv:2203.15319 [pdf, ps, other]

doi 10.1145/3528588.3528653

Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation

Authors: Pietro Liguori, Cristina Improta, Simona De Vivo, Roberto Natella, Bojan Cukic, Domenico Cotroneo

Abstract: Neural Machine Translation (NMT) has reached a level of maturity to be recognized as the premier method for the translation between different languages and aroused interest in different research areas, including software engineering. A key step to validate the robustness of the NMT models consists in evaluating the performance of the models on adversarial inputs, i.e., inputs obtained from the ori… ▽ More Neural Machine Translation (NMT) has reached a level of maturity to be recognized as the premier method for the translation between different languages and aroused interest in different research areas, including software engineering. A key step to validate the robustness of the NMT models consists in evaluating the performance of the models on adversarial inputs, i.e., inputs obtained from the original ones by adding small amounts of perturbation. However, when dealing with the specific task of the code generation (i.e., the generation of code starting from a description in natural language), it has not yet been defined an approach to validate the robustness of the NMT models. In this work, we address the problem by identifying a set of perturbations and metrics tailored for the robustness assessment of such models. We present a preliminary experimental evaluation, showing what type of perturbations affect the model the most and deriving useful insights for future directions. △ Less

Submitted 30 March, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: Paper accepted for publication in the proceedings of The 1st Intl. Workshop on Natural Language-based Software Engineering (NLBSE) to be held with ICSE 2022

arXiv:2202.03755 [pdf, other]

doi 10.1007/s10515-022-00331-3

Can We Generate Shellcodes via Natural Language? An Empirical Study

Authors: Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella, Bojan Cukic, Samira Shaikh

Abstract: Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, shellcodes are especially time-consuming and a technical challenge, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach ba… ▽ More Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, shellcodes are especially time-consuming and a technical challenge, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3,200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from the natural language with high accuracy and that in many cases can generate entire shellcodes with no errors. △ Less

Submitted 8 February, 2022; originally announced February 2022.

Comments: 33 pages, 5 figures, 9 tables. To be published in Automated Software Engineering journal

arXiv:2201.07521 [pdf, other]

ThorFI: A Novel Approach for Network Fault Injection as a Service

Authors: Domenico Cotroneo, Luigi De Simone, Roberto Natella

Abstract: In this work, we present a novel fault injection solution (ThorFI) for virtual networks in cloud computing infrastructures. ThorFI is designed to provide non-intrusive fault injection capabilities for a cloud tenant, and to isolate injections from interfering with other tenants on the infrastructure. We present the solution in the context of the OpenStack cloud management platform, and release thi… ▽ More In this work, we present a novel fault injection solution (ThorFI) for virtual networks in cloud computing infrastructures. ThorFI is designed to provide non-intrusive fault injection capabilities for a cloud tenant, and to isolate injections from interfering with other tenants on the infrastructure. We present the solution in the context of the OpenStack cloud management platform, and release this implementation as open-source software. Finally, we present two relevant case studies of ThorFI, respectively in an NFV IMS and of a high-availability cloud application. The case studies show that ThorFI can enhance functional tests with fault injection, as in 4%-34% of the test cases the IMS is unable to handle faults; and that despite redundancy in virtual networks, faults in one virtual network segment can propagate to other segments, and can affect the throughput and response time of the cloud application as a whole, by about 3 times in the worst case. △ Less

Submitted 20 January, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

Comments: 21 pages, accepted for publication in Elsevier Journal of Networking and Computer Applications

arXiv:2112.06874 [pdf, other]

doi 10.1016/j.jss.2021.111181

Software Micro-Rejuvenation for Android Mobile Systems

Authors: Domenico Cotroneo, Luigi De Simone, Roberto Natella, Roberto Pietrantuono, Stefano Russo

Abstract: Software aging -- the phenomenon affecting many long-running systems, causing performance degradation or an increasing failure rate over mission time, and eventually leading to failure - is known to affect mobile devices and their operating systems, too. Software rejuvenation -- the technique typically used to counteract aging -- may compromise the user's perception of availability and reliability… ▽ More Software aging -- the phenomenon affecting many long-running systems, causing performance degradation or an increasing failure rate over mission time, and eventually leading to failure - is known to affect mobile devices and their operating systems, too. Software rejuvenation -- the technique typically used to counteract aging -- may compromise the user's perception of availability and reliability of the personal device, if applied at a coarse grain, e.g., by restarting applications or, worse, rebooting the entire device. This article proposes a configurable micro-rejuvenation technique to counteract software aging in Android-based mobile devices, acting at a fine-grained level, namely on in-memory system data structures. The technique is engineered in two phases. Before releasing the (customized) Android version, a heap profiling facility is used by the manufacturer's developers to identify potentially bloating data structures in Android services and to instrument their code. After release, an aging detection and rejuvenation service will safely clean up the bloating data structures, with a negligible impact on user perception and device availability, as neither the device nor operating system's processes are restarted. The results of experiments show the ability of the technique to provide significant gains in aging mobile operating system responsiveness and time to failure. △ Less

Submitted 13 December, 2021; originally announced December 2021.

Comments: Accepted for publication in Elsevier Journal of Systems and Software

arXiv:2110.06253 [pdf, other]

doi 10.1007/s10664-022-10233-3

StateAFL: Greybox Fuzzing for Stateful Network Servers

Authors: Roberto Natella

Abstract: Fuzzing network servers is a technical challenge, since the behavior of the target server depends on its state over a sequence of multiple messages. Existing solutions are costly and difficult to use, as they rely on manually-customized artifacts such as protocol models, protocol parsers, and learning frameworks. The aim of this work is to develop a greybox fuzzer (StateaAFL) for network servers t… ▽ More Fuzzing network servers is a technical challenge, since the behavior of the target server depends on its state over a sequence of multiple messages. Existing solutions are costly and difficult to use, as they rely on manually-customized artifacts such as protocol models, protocol parsers, and learning frameworks. The aim of this work is to develop a greybox fuzzer (StateaAFL) for network servers that only relies on lightweight analysis of the target program, with no manual customization, in a similar way to what the AFL fuzzer achieved for stateless programs. The proposed fuzzer instruments the target server at compile-time, to insert probes on memory allocations and network I/O operations. At run-time, it infers the current protocol state of the target server by taking snapshots of long-lived memory areas, and by applying a fuzzy hashing algorithm (Locality-Sensitive Hashing) to map memory contents to a unique state identifier. The fuzzer incrementally builds a protocol state machine for guiding fuzzing. We implemented and released StateaAFL as open-source software. As a basis for reproducible experimentation, we integrated StateaAFL with a large set of network servers for popular protocols, with no manual customization to accomodate for the protocol. The experimental results show that the fuzzer can be applied with no manual customization on a large set of network servers for popular protocols, and that it can achieve comparable, or even better code coverage and bug detection than customized fuzzing. Moreover, our qualitative analysis shows that states inferred from memory better reflect the server behavior than only using response codes from messages. △ Less

Submitted 4 October, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: The tool is available at https://github.com/stateafl/stateafl

Journal ref: Empir Software Eng 27, 191 (2022)

arXiv:2109.00279 [pdf, other]

doi 10.1109/ISSRE52982.2021.00042

EVIL: Exploiting Software via Natural Language

Authors: Pietro Liguori, Erfan Al-Hossami, Vittorio Orbinato, Roberto Natella, Samira Shaikh, Domenico Cotroneo, Bojan Cukic

Abstract: Writing exploits for security assessment is a challenging task. The writer needs to master programming and obfuscation techniques to develop a successful exploit. To make the task easier, we propose an approach (EVIL) to automatically generate exploits in assembly/Python language from descriptions in natural language. The approach leverages Neural Machine Translation (NMT) techniques and a dataset… ▽ More Writing exploits for security assessment is a challenging task. The writer needs to master programming and obfuscation techniques to develop a successful exploit. To make the task easier, we propose an approach (EVIL) to automatically generate exploits in assembly/Python language from descriptions in natural language. The approach leverages Neural Machine Translation (NMT) techniques and a dataset that we developed for this work. We present an extensive experimental study to evaluate the feasibility of EVIL, using both automatic and manual analysis, and both at generating individual statements and entire exploits. The generated code achieved high accuracy in terms of syntactic and semantic correctness. △ Less

Submitted 1 September, 2021; originally announced September 2021.

Comments: Paper accepted at the 32nd International Symposium on Software Reliability Engineering (ISSRE 2021)

arXiv:2106.15182 [pdf, other]

doi 10.1016/j.jss.2021.111043

Enhancing the Analysis of Software Failures in Cloud Computing Systems with Deep Learning

Authors: Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella

Abstract: Identifying the failure modes of cloud computing systems is a difficult and time-consuming task, due to the growing complexity of such systems, and the large volume and noisiness of failure data. This paper presents a novel approach for analyzing failure data from cloud systems, in order to relieve human analysts from manually fine-tuning the data for feature engineering. The approach leverages De… ▽ More Identifying the failure modes of cloud computing systems is a difficult and time-consuming task, due to the growing complexity of such systems, and the large volume and noisiness of failure data. This paper presents a novel approach for analyzing failure data from cloud systems, in order to relieve human analysts from manually fine-tuning the data for feature engineering. The approach leverages Deep Embedded Clustering (DEC), a family of unsupervised clustering algorithms based on deep learning, which uses an autoencoder to optimize data dimensionality and inter-cluster variance. We applied the approach in the context of the OpenStack cloud computing platform, both on the raw failure data and in combination with an anomaly detection pre-processing algorithm. The results show that the performance of the proposed approach, in terms of purity of clusters, is comparable to, or in some cases even better than manually fine-tuned clustering, thus avoiding the need for deep domain knowledge and reducing the effort to perform the analysis. In all cases, the proposed approach provides better performance than unsupervised clustering when no feature engineering is applied to the data. Moreover, the distribution of failure modes from the proposed approach is closer to the actual frequency of the failure modes. △ Less

Submitted 29 June, 2021; originally announced June 2021.

Comments: Paper accepted to the Journal of Systems and Software on June 28th, 2021

arXiv:2104.13660 [pdf, other]

doi 10.1016/j.cose.2021.102307

Timing Covert Channel Analysis of the VxWorks MILS Embedded Hypervisor under the Common Criteria Security Certification

Authors: Domenico Cotroneo, Luigi De Simone, Roberto Natella

Abstract: Virtualization technology is nowadays adopted in security-critical embedded systems to achieve higher performance and more design flexibility. However, it also comes with new security threats, where attackers leverage timing covert channels to exfiltrate sensitive information from a partition using a trojan. This paper presents a novel approach for the experimental assessment of timing covert ch… ▽ More Virtualization technology is nowadays adopted in security-critical embedded systems to achieve higher performance and more design flexibility. However, it also comes with new security threats, where attackers leverage timing covert channels to exfiltrate sensitive information from a partition using a trojan. This paper presents a novel approach for the experimental assessment of timing covert channels in embedded hypervisors, with a case study on security assessment of a commercial hypervisor product (Wind River VxWorks MILS), in cooperation with a licensed laboratory for the Common Criteria security certification. Our experimental analysis shows that it is indeed possible to establish a timing covert channel, and that the approach is useful for system designers for assessing that their configuration is robust against this kind of information leakage. △ Less

Submitted 28 April, 2021; originally announced April 2021.

Comments: To appear on Computers & Security

arXiv:2104.13100 [pdf, other]

doi 10.18653/v1/2021.nlp4prog-1.7

Shellcode_IA32: A Dataset for Automatic Shellcode Generation

Authors: Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella, Bojan Cukic, Samira Shaikh

Abstract: We take the first step to address the task of automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability, starting from natural language comments. We assemble and release a novel dataset (Shellcode_IA32), consisting of challenging but common assembly instructions with their natural language descriptions. We experiment with stan… ▽ More We take the first step to address the task of automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability, starting from natural language comments. We assemble and release a novel dataset (Shellcode_IA32), consisting of challenging but common assembly instructions with their natural language descriptions. We experiment with standard methods in neural machine translation (NMT) to establish baseline performance levels on this task. △ Less

Submitted 18 March, 2022; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: Paper accepted to NLP4Prog Workshop 2021 co-located with ACL-IJCNLP 2021. Extended journal version of this work has been published in the Automated Software Engineering journal, Volume 29, Article no. 30, March 2022, DOI: 10.1007/s10515-022-00331-3

arXiv:2101.05102 [pdf, other]

ProFuzzBench: A Benchmark for Stateful Protocol Fuzzing

Authors: Roberto Natella, Van-Thuan Pham

Abstract: We present a new benchmark (ProFuzzBench) for stateful fuzzing of network protocols. The benchmark includes a suite of representative open-source network servers for popular protocols, and tools to automate experimentation. We discuss challenges and potential directions for future research based on this benchmark. We present a new benchmark (ProFuzzBench) for stateful fuzzing of network protocols. The benchmark includes a suite of representative open-source network servers for popular protocols, and tools to automate experimentation. We discuss challenges and potential directions for future research based on this benchmark. △ Less

Submitted 13 January, 2021; originally announced January 2021.

Comments: The source code of ProFuzzBench is available online on GitHub at: https://github.com/profuzzbench/profuzzbench

arXiv:2010.06607 [pdf, other]

doi 10.1007/978-3-030-76352-7_19

Towards Runtime Verification via Event Stream Processing in Cloud Computing Infrastructures

Authors: Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella, Angela Scibelli

Abstract: Software bugs in cloud management systems often cause erratic behavior, hindering detection, and recovery of failures. As a consequence, the failures are not timely detected and notified, and can silently propagate through the system. To face these issues, we propose a lightweight approach to runtime verification, for monitoring and failure detection of cloud computing systems. We performed a prel… ▽ More Software bugs in cloud management systems often cause erratic behavior, hindering detection, and recovery of failures. As a consequence, the failures are not timely detected and notified, and can silently propagate through the system. To face these issues, we propose a lightweight approach to runtime verification, for monitoring and failure detection of cloud computing systems. We performed a preliminary evaluation of the proposed approach in the OpenStack cloud management platform, an "off-the-shelf" distributed system, showing that the approach can be applied with high failure detection coverage. △ Less

Submitted 13 October, 2020; originally announced October 2020.

Comments: International Workshop on Artificial Intelligence for IT Operations, 14 December 2020

arXiv:2010.00331 [pdf, other]

doi 10.1109/TDSC.2020.3025289

Fault Injection Analytics: A Novel Approach to Discover Failure Modes in Cloud-Computing Systems

Authors: Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella

Abstract: Cloud computing systems fail in complex and unexpected ways due to unexpected combinations of events and interactions between hardware and software components. Fault injection is an effective means to bring out these failures in a controlled environment. However, fault injection experiments produce massive amounts of data, and manually analyzing these data is inefficient and error-prone, as the an… ▽ More Cloud computing systems fail in complex and unexpected ways due to unexpected combinations of events and interactions between hardware and software components. Fault injection is an effective means to bring out these failures in a controlled environment. However, fault injection experiments produce massive amounts of data, and manually analyzing these data is inefficient and error-prone, as the analyst can miss severe failure modes that are yet unknown. This paper introduces a new paradigm (fault injection analytics) that applies unsupervised machine learning on execution traces of the injected system, to ease the discovery and interpretation of failure modes. We evaluated the proposed approach in the context of fault injection experiments on the OpenStack cloud computing platform, where we show that the approach can accurately identify failure modes with a low computational cost. △ Less

Submitted 30 September, 2020; originally announced October 2020.

Comments: IEEE Transactions on Dependable and Secure Computing; 16 pages. arXiv admin note: text overlap with arXiv:1908.11640

arXiv:2008.06943 [pdf, other]

Dependability Evaluation of Middleware Technology for Large-scale Distributed Caching

Authors: Domenico Cotroneo, Roberto Natella, Stefano Rosiello

Abstract: Distributed caching systems (e.g., Memcached) are widely used by service providers to satisfy accesses by millions of concurrent clients. Given their large-scale, modern distributed systems rely on a middleware layer to manage caching nodes, to make applications easier to develop, and to apply load balancing and replication strategies. In this work, we performed a dependability evaluation of three… ▽ More Distributed caching systems (e.g., Memcached) are widely used by service providers to satisfy accesses by millions of concurrent clients. Given their large-scale, modern distributed systems rely on a middleware layer to manage caching nodes, to make applications easier to develop, and to apply load balancing and replication strategies. In this work, we performed a dependability evaluation of three popular middleware platforms, namely Twemproxy by Twitter, Mcrouter by Facebook, and Dynomite by Netflix, to assess availability and performance under faults, including failures of Memcached nodes and congestion due to unbalanced workloads and network link bandwidth bottlenecks. We point out the different availability and performance trade-offs achieved by the three platforms, and scenarios in which few faulty components cause cascading failures of the whole distributed system. △ Less

Submitted 18 August, 2020; v1 submitted 16 August, 2020; originally announced August 2020.

Comments: 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE 2020)

arXiv:2005.11523 [pdf, other]

A Comprehensive Study on Software Aging across Android Versions and Vendors

Authors: Domenico Cotroneo, Antonio Ken Iannillo, Roberto Natella, Roberto Pietrantuono

Abstract: This paper analyzes the phenomenon of software aging - namely, the gradual performance degradation and resource exhaustion in the long run - in the Android OS. The study intends to highlight if, and to what extent, devices from different vendors, under various usage conditions and configurations, are affected by software aging and which parts of the system are the main contributors. The results de… ▽ More This paper analyzes the phenomenon of software aging - namely, the gradual performance degradation and resource exhaustion in the long run - in the Android OS. The study intends to highlight if, and to what extent, devices from different vendors, under various usage conditions and configurations, are affected by software aging and which parts of the system are the main contributors. The results demonstrate that software aging systematically determines a gradual loss of responsiveness perceived by the user, and an unjustified depletion of physical memory. The analysis reveals differences in the aging trends due to the workload factors and to the type of running applications, as well as differences due to vendors' customization. Moreover, we analyze several system-level metrics to trace back the software aging effects to their main causes. We show that bloated Java containers are a significant contributor to software aging, and that it is feasible to mitigate aging through a micro-rejuvenation solution at the container level. △ Less

Submitted 23 May, 2020; originally announced May 2020.

arXiv:2005.04990 [pdf, other]

doi 10.1109/DSN48063.2020.00052

ProFIPy: Programmable Software Fault Injection as-a-Service

Authors: Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella

Abstract: In this paper, we present a new fault injection tool (ProFIPy) for Python software. The tool is designed to be programmable, in order to enable users to specify their software fault model, using a domain-specific language (DSL) for fault injection. Moreover, to achieve better usability, ProFIPy is provided as software-as-a-service and supports the user through the configuration of the faultload an… ▽ More In this paper, we present a new fault injection tool (ProFIPy) for Python software. The tool is designed to be programmable, in order to enable users to specify their software fault model, using a domain-specific language (DSL) for fault injection. Moreover, to achieve better usability, ProFIPy is provided as software-as-a-service and supports the user through the configuration of the faultload and workload, failure data analysis, and full automation of the experiments using container-based virtualization and parallelization. △ Less

Submitted 11 May, 2020; originally announced May 2020.

Comments: 50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2020)

arXiv:1912.03490 [pdf, other]

doi 10.1109/TR.2019.2954384

Dependability Assessment of the Android OS through Fault Injection

Authors: Domenico Cotroneo, Antonio Ken Iannillo, Roberto Natella, Stefano Rosiello

Abstract: The reliability of mobile devices is a challenge for vendors, since the mobile software stack has significantly grown in complexity. In this paper, we study how to assess the impact of faults on the quality of user experience in the Android mobile OS through fault injection. We first address the problem of identifying a realistic fault model for the Android OS, by providing to developers a set of… ▽ More The reliability of mobile devices is a challenge for vendors, since the mobile software stack has significantly grown in complexity. In this paper, we study how to assess the impact of faults on the quality of user experience in the Android mobile OS through fault injection. We first address the problem of identifying a realistic fault model for the Android OS, by providing to developers a set of lightweight and systematic guidelines for fault modeling. Then, we present an extensible fault injection tool (AndroFIT) to apply such fault model on actual, commercial Android devices. Finally, we present a large fault injection experimentation on three Android products from major vendors, and point out several reliability issues and opportunities for improving the Android OS. △ Less

Submitted 7 December, 2019; originally announced December 2019.

Journal ref: IEEE Transactions on Reliability, 2019

arXiv:1908.11640 [pdf, other]

doi 10.1109/ISSRE.2019.00023

Enhancing Failure Propagation Analysis in Cloud Computing Systems

Authors: Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella, Nematollah Bidokhti

Abstract: In order to plan for failure recovery, the designers of cloud systems need to understand how their system can potentially fail. Unfortunately, analyzing the failure behavior of such systems can be very difficult and time-consuming, due to the large volume of events, non-determinism, and reuse of third-party components. To address these issues, we propose a novel approach that joins fault injection… ▽ More In order to plan for failure recovery, the designers of cloud systems need to understand how their system can potentially fail. Unfortunately, analyzing the failure behavior of such systems can be very difficult and time-consuming, due to the large volume of events, non-determinism, and reuse of third-party components. To address these issues, we propose a novel approach that joins fault injection with anomaly detection to identify the symptoms of failures. We evaluated the proposed approach in the context of the OpenStack cloud computing platform. We show that our model can significantly improve the accuracy of failure analysis in terms of false positives and negatives, with a low computational cost. △ Less

Submitted 30 August, 2019; originally announced August 2019.

Comments: 12 pages, The 30th International Symposium on Software Reliability Engineering (ISSRE 2019)

arXiv:1908.11297 [pdf, other]

Analyzing the Context of Bug-Fixing Changes in the OpenStack Cloud Computing Platform

Authors: Domenico Cotroneo, Luigi De Simone, Antonio Ken Iannillo, Roberto Natella, Stefano Rosiello, Nematollah Bidokhti

Abstract: Many research areas in software engineering, such as mutation testing, automatic repair, fault localization, and fault injection, rely on empirical knowledge about recurring bug-fixing code changes. Previous studies in this field focus on what has been changed due to bug-fixes, such as in terms of code edit actions. However, such studies did not consider where the bug-fix change was made (i.e., th… ▽ More Many research areas in software engineering, such as mutation testing, automatic repair, fault localization, and fault injection, rely on empirical knowledge about recurring bug-fixing code changes. Previous studies in this field focus on what has been changed due to bug-fixes, such as in terms of code edit actions. However, such studies did not consider where the bug-fix change was made (i.e., the context of the change), but knowing about the context can potentially narrow the search space for many software engineering techniques (e.g., by focusing mutation only on specific parts of the software). Furthermore, most previous work on bug-fixing changes focused on C and Java projects, but there is little empirical evidence about Python software. Therefore, in this paper we perform a thorough empirical analysis of bug-fixing changes in three OpenStack projects, focusing on both the what and the where of the changes. We observed that all the recurring change patterns are not oblivious with respect to the surrounding code, but tend to occur in specific code contexts. △ Less

Submitted 29 August, 2019; originally announced August 2019.

Comments: 14 pages, The 30th International Symposium on Software Reliability Engineering (ISSRE 2019)

arXiv:1907.04055 [pdf, other]

doi 10.1145/3338906.3338916

How Bad Can a Bug Get? An Empirical Analysis of Software Failures in the OpenStack Cloud Computing Platform

Authors: Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella, Nematollah Bidokhti

Abstract: Cloud management systems provide abstractions and APIs for programmatically configuring cloud infrastructures. Unfortunately, residual software bugs in these systems can potentially lead to high-severity failures, such as prolonged outages and data losses. In this paper, we investigate the impact of failures in the context widespread OpenStack cloud management system, by performing fault injection… ▽ More Cloud management systems provide abstractions and APIs for programmatically configuring cloud infrastructures. Unfortunately, residual software bugs in these systems can potentially lead to high-severity failures, such as prolonged outages and data losses. In this paper, we investigate the impact of failures in the context widespread OpenStack cloud management system, by performing fault injection and by analyzing the impact of the resulting failures in terms of fail-stop behavior, failure detection through logging, and failure propagation across components. The analysis points out that most of the failures are not timely detected and notified; moreover, many of these failures can silently propagate over time and through components of the cloud management system, which call for more thorough run-time checks and fault containment. △ Less

Submitted 9 July, 2019; originally announced July 2019.

Comments: 12 pages, ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '19)

Journal ref: ESEC/FSE 2019 Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering Pages 200-211

arXiv:1906.00621 [pdf, other]

doi 10.1007/s10664-019-09725-6

Evolutionary Fuzzing of Android OS Vendor System Services

Authors: Domenico Cotroneo, Antonio Ken Iannillo, Roberto Natella

Abstract: Android devices are shipped in several flavors by more than 100 manufacturer partners, which extend the Android "vanilla" OS with new system services, and modify the existing ones. These proprietary extensions expose Android devices to reliability and security issues. In this paper, we propose a coverage-guided fuzzing platform (Chizpurfle) based on evolutionary algorithms to test proprietary Andr… ▽ More Android devices are shipped in several flavors by more than 100 manufacturer partners, which extend the Android "vanilla" OS with new system services, and modify the existing ones. These proprietary extensions expose Android devices to reliability and security issues. In this paper, we propose a coverage-guided fuzzing platform (Chizpurfle) based on evolutionary algorithms to test proprietary Android system services. A key feature of this platform is the ability to profile coverage on the actual, unmodified Android device, by taking advantage of dynamic binary re-writing techniques. We applied this solution on three high-end commercial Android smartphones. The results confirmed that evolutionary fuzzing is able to test Android OS system services more efficiently than blind fuzzing. Furthermore, we evaluate the impact of different choices for the fitness function and selection algorithm. △ Less

Submitted 3 June, 2019; originally announced June 2019.

Showing 1–34 of 34 results for author: Natella, R