doi 10.1109/MSEC.2024.3355713

AI Code Generators for Security: Friend or Foe?

Authors: Roberto Natella, Pietro Liguori, Cristina Improta, Bojan Cukic, Domenico Cotroneo

Abstract: Recent advances of artificial intelligence (AI) code generators are opening new opportunities in software security research, including misuse by malicious actors. We review use cases for AI code generators for security and introduce an evaluation benchmark.

Submitted 2 February, 2024; originally announced February 2024.

Comments: Dataset available at: https://github.com/dessertlab/violent-python

Journal ref: IEEE Security & Privacy, Early Access, February 2024

arXiv:2306.05079 [pdf, other]

Enhancing Robustness of AI Offensive Code Generators via Data Augmentation

Authors: Cristina Improta, Pietro Liguori, Roberto Natella, Bojan Cukic, Domenico Cotroneo

Abstract: In this work, we present a method to add perturbations to the code descriptions to create new inputs in natural language (NL) from well-intentioned developers that diverge from the original ones due to the use of new words or because they miss part of them. The goal is to analyze how and to what extent perturbations affect the performance of AI code generators in the context of security-oriented code. First, we show that perturbed descriptions preserve the semantics of the original, non-perturbed ones. Then, we use the method to assess the robustness of three state-of-the-art code generators against the newly perturbed inputs, showing that the performance of these AI-based solutions is highly affected by perturbations in the NL descriptions. To enhance their robustness, we use the method to perform data augmentation, i.e., to increase the variability and diversity of the NL descriptions in the training data, proving its effectiveness against both perturbed and non-perturbed code descriptions.

Submitted 1 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

arXiv:2212.06008 [pdf, other]

doi 10.1016/j.eswa.2023.120073

Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators

Authors: Pietro Liguori, Cristina Improta, Roberto Natella, Bojan Cukic, Domenico Cotroneo

Abstract: AI-based code generators are an emerging solution for automatically writing programs starting from descriptions in natural language, by using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses output similarity metrics, i.e., automatic metrics that compute the textual similarity of generated code with ground-truth references. However, it is not clear what metric to use, and which metric is most suitable for specific contexts. This work analyzes a large set of output similarity metrics on offensive code generators. We apply the metrics on two state-of-the-art NMT models using two datasets containing offensive assembly and Python code with their descriptions in the English language. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.

Submitted 13 April, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

arXiv:2203.15319 [pdf, ps, other]

doi 10.1145/3528588.3528653

Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation

Authors: Pietro Liguori, Cristina Improta, Simona De Vivo, Roberto Natella, Bojan Cukic, Domenico Cotroneo

Abstract: Neural Machine Translation (NMT) has reached a level of maturity to be recognized as the premier method for the translation between different languages and aroused interest in different research areas, including software engineering. A key step to validate the robustness of the NMT models consists in evaluating the performance of the models on adversarial inputs, i.e., inputs obtained from the original ones by adding small amounts of perturbation. However, when dealing with the specific task of the code generation (i.e., the generation of code starting from a description in natural language), it has not yet been defined an approach to validate the robustness of the NMT models. In this work, we address the problem by identifying a set of perturbations and metrics tailored for the robustness assessment of such models. We present a preliminary experimental evaluation, showing what type of perturbations affect the model the most and deriving useful insights for future directions.

Submitted 30 March, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: Paper accepted for publication in the proceedings of The 1st Intl. Workshop on Natural Language-based Software Engineering (NLBSE) to be held with ICSE 2022

arXiv:2202.03755 [pdf, other]

doi 10.1007/s10515-022-00331-3

Can We Generate Shellcodes via Natural Language? An Empirical Study

Authors: Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella, Bojan Cukic, Samira Shaikh

Abstract: Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, shellcodes are especially time-consuming and a technical challenge, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3,200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from the natural language with high accuracy and that in many cases can generate entire shellcodes with no errors.

Submitted 8 February, 2022; originally announced February 2022.

Comments: 33 pages, 5 figures, 9 tables. To be published in Automated Software Engineering journal

arXiv:2110.12873 [pdf]

Godot is not coming: when we will let innovations enter psychiatry?

Authors: Milena B. Čukić

Abstract: Current diagnostic practice in psychiatry is not relying on objective biophysical evidence. Recent pandemic emphasized the need to address the rising number of mood disorders (in particular, depression) cases in a more efficient way. We are proposing several already developed practices that can help improve that diagnostic process: detection based on electrophysiological signals (both electroencephalogram and electrocardiogram based) that were shown to be accurate for clinical practice and several modalities of electromagnetic stimulation that were proven to ameliorate symptoms of depression. In this work, we are connecting the two with explanations coming from physiological complexity studies (and our own work) as well as advanced statistical methods like machine learning and the Bayesian inference approach. It is shown that fractal and nonlinear measures can adequately quantify previously undetected changes in intrinsic dynamics of physiological systems, providing the basis for early detection of depression. We are also advocating for early screening of cardiovascular risks in depression which is in connection to previously described decomplexification of the autonomous nervous system resulting in symptoms recognized clinically. All that said, additional information about the level of complexity can help clinicians make a better decisions in the therapeutic process, increase the overall effectiveness of the treatment, and finally increase the quality of life of the patient.

Submitted 16 October, 2021; originally announced October 2021.

Comments: 35 pages, 4 pictures

arXiv:2109.00279 [pdf, other]

doi 10.1109/ISSRE52982.2021.00042

EVIL: Exploiting Software via Natural Language

Authors: Pietro Liguori, Erfan Al-Hossami, Vittorio Orbinato, Roberto Natella, Samira Shaikh, Domenico Cotroneo, Bojan Cukic

Abstract: Writing exploits for security assessment is a challenging task. The writer needs to master programming and obfuscation techniques to develop a successful exploit. To make the task easier, we propose an approach (EVIL) to automatically generate exploits in assembly/Python language from descriptions in natural language. The approach leverages Neural Machine Translation (NMT) techniques and a dataset that we developed for this work. We present an extensive experimental study to evaluate the feasibility of EVIL, using both automatic and manual analysis, and both at generating individual statements and entire exploits. The generated code achieved high accuracy in terms of syntactic and semantic correctness.

Submitted 1 September, 2021; originally announced September 2021.

Comments: Paper accepted at the 32nd International Symposium on Software Reliability Engineering (ISSRE 2021)

arXiv:2104.13100 [pdf, other]

doi 10.18653/v1/2021.nlp4prog-1.7

Shellcode_IA32: A Dataset for Automatic Shellcode Generation

Authors: Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella, Bojan Cukic, Samira Shaikh

Abstract: We take the first step to address the task of automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability, starting from natural language comments. We assemble and release a novel dataset (Shellcode_IA32), consisting of challenging but common assembly instructions with their natural language descriptions. We experiment with standard methods in neural machine translation (NMT) to establish baseline performance levels on this task.

Submitted 18 March, 2022; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: Paper accepted to NLP4Prog Workshop 2021 co-located with ACL-IJCNLP 2021. Extended journal version of this work has been published in the Automated Software Engineering journal, Volume 29, Article no. 30, March 2022, DOI: 10.1007/s10515-022-00331-3

arXiv:1803.10753 [pdf]

The comparison of Higuchi fractal dimension and Sample Entropy analysis of sEMG: effects of muscle contraction intensity and TMS

Authors: Milena B. Cukic, Mirjana M. Platisa, Aleksandar Kalauzi, Joji Oommen, Milos R. Ljubisavljevic

Abstract: The aim of the study was to examine how the complexity of surface electromyogram (sEMG) signal, estimated by Higuchi fractal dimension (HFD) and Sample Entropy (SampEn), change depending on muscle contraction intensity and external perturbation of the corticospinal activity during muscle contraction induced by single-pulse Transcranial Magnetic Stimulation (spTMS). HFD and SampEn were computed from sEMG signal recorded at three various levels of voluntary contraction before and after spTMS. After spTMS, both HFD and SampEn decreased at medium compared to the mild contraction. SampEn increased, while HFD did not change significantly at strong compared to medium contraction. spTMS significantly decreased both parameters at all contraction levels. When same parameters were computed from the mathematically generated sine-wave calibration curves, the results show that SampEn has better accuracy at lower (0-40 Hz) and HFD at higher (60-120 Hz) frequencies. Changes in the sEMG complexity associated with increased muscle contraction intensity cannot be accurately depicted by a single complexity measure. Examination of sEMG should entail both SampEn and HFD as they provide complementary information about different frequency components of sEMG. Further studies are needed to explain the implication of changes in nonlinear parameters and their relation to underlying sEMG physiological processes.

Submitted 28 March, 2018; originally announced March 2018.

Comments: 21 pages, 3 Figures

Showing 1–9 of 9 results for author: Cukic, B