-
AI Code Generators for Security: Friend or Foe?
Authors:
Roberto Natella,
Pietro Liguori,
Cristina Improta,
Bojan Cukic,
Domenico Cotroneo
Abstract:
Recent advances of artificial intelligence (AI) code generators are opening new opportunities in software security research, including misuse by malicious actors. We review use cases for AI code generators for security and introduce an evaluation benchmark.
Submitted 2 February, 2024;
originally announced February 2024.
-
Enhancing Robustness of AI Offensive Code Generators via Data Augmentation
Authors:
Cristina Improta,
Pietro Liguori,
Roberto Natella,
Bojan Cukic,
Domenico Cotroneo
Abstract:
In this work, we present a method to add perturbations to the code descriptions to create new inputs in natural language (NL) from well-intentioned developers that diverge from the original ones due to the use of new words or because they miss part of them. The goal is to analyze how and to what extent perturbations affect the performance of AI code generators in the context of security-oriented code. First, we show that perturbed descriptions preserve the semantics of the original, non-perturbed ones. Then, we use the method to assess the robustness of three state-of-the-art code generators against the newly perturbed inputs, showing that the performance of these AI-based solutions is highly affected by perturbations in the NL descriptions. To enhance their robustness, we use the method to perform data augmentation, i.e., to increase the variability and diversity of the NL descriptions in the training data, proving its effectiveness against both perturbed and non-perturbed code descriptions.
Submitted 1 October, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
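The abstract above names two kinds of perturbation to a natural-language description: using new words and missing part of the original words. The sketch below illustrates only that general idea. The synonym table and the function names `substitute_words` and `omit_words` are hypothetical, and the authors' actual perturbation rules are not given in the abstract.

```python
import random

# Hypothetical synonym table, for illustration only (not taken from the paper).
SYNONYMS = {"store": "save", "contents": "value", "jump": "go"}


def substitute_words(description, synonyms=SYNONYMS):
    """Word substitution: replace every word that has an entry in the table."""
    return " ".join(synonyms.get(word, word) for word in description.split())


def omit_words(description, drop, rng):
    """Word omission: remove `drop` randomly chosen words, keeping the order."""
    words = description.split()
    # Always leave at least one word in the description.
    drop = min(drop, max(len(words) - 1, 0))
    removed = set(rng.sample(range(len(words)), drop))
    return " ".join(w for i, w in enumerate(words) if i not in removed)


original = "store the contents of eax in the stack"
print(substitute_words(original))
print(omit_words(original, 2, random.Random(0)))
```

Perturbed variants produced this way could be added to the training descriptions alongside the originals, which is the sense of data augmentation described in the abstract.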
-
Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators
Authors:
Pietro Liguori,
Cristina Improta,
Roberto Natella,
Bojan Cukic,
Domenico Cotroneo
Abstract:
AI-based code generators are an emerging solution for automatically writing programs starting from descriptions in natural language, by using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses output similarity metrics, i.e., automatic metrics that compute the textual similarity of generated code with ground-truth references. However, it is not clear what metric to use, and which metric is most suitable for specific contexts. This work analyzes a large set of output similarity metrics on offensive code generators. We apply the metrics on two state-of-the-art NMT models using two datasets containing offensive assembly and Python code with their descriptions in the English language. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.
Submitted 13 April, 2023; v1 submitted 12 December, 2022;
originally announced December 2022.
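The abstract above describes output similarity metrics as automatic measures of textual similarity between generated code and a ground-truth reference. As a minimal sketch of what such a metric computes, the code below implements two common measures, token-level exact match and a normalized edit-distance similarity. The abstract does not list which metrics the paper analyzes, so these two are assumptions chosen for illustration, and the function names are hypothetical.

```python
def exact_match(generated, reference):
    """True when the two snippets have identical whitespace-separated tokens."""
    return generated.split() == reference.split()


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(generated, reference):
    """Token-level similarity in [0, 1]; 1.0 means identical token sequences."""
    g, r = generated.split(), reference.split()
    longest = max(len(g), len(r))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(g, r) / longest


print(exact_match("xor eax, eax", "xor eax, eax"))
print(edit_similarity("mov eax, ebx", "mov eax, ecx"))
```

Scores of this kind reward surface overlap only, so a snippet that is semantically equivalent but written differently can score low, which is one reason to compare such metrics against human evaluation.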
-
Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation
Authors:
Pietro Liguori,
Cristina Improta,
Simona De Vivo,
Roberto Natella,
Bojan Cukic,
Domenico Cotroneo
Abstract:
Neural Machine Translation (NMT) has reached a level of maturity to be recognized as the premier method for translation between different languages and has aroused interest in different research areas, including software engineering. A key step in validating the robustness of NMT models consists of evaluating the performance of the models on adversarial inputs, i.e., inputs obtained from the original ones by adding small amounts of perturbation. However, for the specific task of code generation (i.e., the generation of code starting from a description in natural language), no approach has yet been defined to validate the robustness of NMT models. In this work, we address the problem by identifying a set of perturbations and metrics tailored for the robustness assessment of such models. We present a preliminary experimental evaluation, showing what types of perturbations affect the model the most and deriving useful insights for future directions.
Submitted 30 March, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Can We Generate Shellcodes via Natural Language? An Empirical Study
Authors:
Pietro Liguori,
Erfan Al-Hossami,
Domenico Cotroneo,
Roberto Natella,
Bojan Cukic,
Samira Shaikh
Abstract:
Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, shellcodes are especially time-consuming and a technical challenge, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3,200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from the natural language with high accuracy and that in many cases can generate entire shellcodes with no errors.
Submitted 8 February, 2022;
originally announced February 2022.
-
Godot is not coming: when we will let innovations enter psychiatry?
Authors:
Milena B. Čukić
Abstract:
Current diagnostic practice in psychiatry does not rely on objective biophysical evidence. The recent pandemic emphasized the need to address the rising number of mood disorder cases (in particular, depression) in a more efficient way. We propose several already developed practices that can help improve the diagnostic process: detection based on electrophysiological signals (both electroencephalogram and electrocardiogram based) that were shown to be accurate for clinical practice, and several modalities of electromagnetic stimulation that were proven to ameliorate symptoms of depression. In this work, we connect the two with explanations coming from physiological complexity studies (and our own work) as well as advanced statistical methods like machine learning and the Bayesian inference approach. It is shown that fractal and nonlinear measures can adequately quantify previously undetected changes in the intrinsic dynamics of physiological systems, providing the basis for early detection of depression. We also advocate for early screening of cardiovascular risks in depression, which is connected to the previously described decomplexification of the autonomous nervous system resulting in symptoms recognized clinically. Additional information about the level of complexity can help clinicians make better decisions in the therapeutic process, increase the overall effectiveness of the treatment, and ultimately increase the quality of life of the patient.
Submitted 16 October, 2021;
originally announced October 2021.
-
EVIL: Exploiting Software via Natural Language
Authors:
Pietro Liguori,
Erfan Al-Hossami,
Vittorio Orbinato,
Roberto Natella,
Samira Shaikh,
Domenico Cotroneo,
Bojan Cukic
Abstract:
Writing exploits for security assessment is a challenging task. The writer needs to master programming and obfuscation techniques to develop a successful exploit. To make the task easier, we propose an approach (EVIL) to automatically generate exploits in assembly/Python language from descriptions in natural language. The approach leverages Neural Machine Translation (NMT) techniques and a dataset that we developed for this work. We present an extensive experimental study to evaluate the feasibility of EVIL, using both automatic and manual analysis, and both at generating individual statements and entire exploits. The generated code achieved high accuracy in terms of syntactic and semantic correctness.
Submitted 1 September, 2021;
originally announced September 2021.
-
Shellcode_IA32: A Dataset for Automatic Shellcode Generation
Authors:
Pietro Liguori,
Erfan Al-Hossami,
Domenico Cotroneo,
Roberto Natella,
Bojan Cukic,
Samira Shaikh
Abstract:
We take the first step to address the task of automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability, starting from natural language comments. We assemble and release a novel dataset (Shellcode_IA32), consisting of challenging but common assembly instructions with their natural language descriptions. We experiment with standard methods in neural machine translation (NMT) to establish baseline performance levels on this task.
Submitted 18 March, 2022; v1 submitted 27 April, 2021;
originally announced April 2021.
-
The comparison of Higuchi fractal dimension and Sample Entropy analysis of sEMG: effects of muscle contraction intensity and TMS
Authors:
Milena B. Cukic,
Mirjana M. Platisa,
Aleksandar Kalauzi,
Joji Oommen,
Milos R. Ljubisavljevic
Abstract:
The aim of the study was to examine how the complexity of the surface electromyogram (sEMG) signal, estimated by Higuchi fractal dimension (HFD) and Sample Entropy (SampEn), changes depending on muscle contraction intensity and external perturbation of the corticospinal activity during muscle contraction induced by single-pulse Transcranial Magnetic Stimulation (spTMS). HFD and SampEn were computed from the sEMG signal recorded at three different levels of voluntary contraction before and after spTMS. After spTMS, both HFD and SampEn decreased at medium compared to mild contraction. SampEn increased, while HFD did not change significantly, at strong compared to medium contraction. spTMS significantly decreased both parameters at all contraction levels. When the same parameters were computed from mathematically generated sine-wave calibration curves, the results showed that SampEn has better accuracy at lower (0-40 Hz) and HFD at higher (60-120 Hz) frequencies. Changes in sEMG complexity associated with increased muscle contraction intensity cannot be accurately depicted by a single complexity measure. Examination of sEMG should entail both SampEn and HFD, as they provide complementary information about different frequency components of sEMG. Further studies are needed to explain the implications of changes in nonlinear parameters and their relation to the underlying physiological processes of sEMG.
Submitted 28 March, 2018;
originally announced March 2018.