Dmitrijs Trizna

Prague, Czechia
1K followers, 500+ connections

About

My goal is to advance humanity's knowledge on how to properly do cybersecurity with the…

Experience and education

  • Microsoft

Licenses and certifications

Publications

  • Nebula: Self-Attention for Dynamic Malware Analysis

    arxiv.org

    Dynamic analysis enables detecting Windows malware by executing programs in a controlled environment, and storing their actions in log reports. Previous work has started training machine learning models on such reports to perform either malware detection or malware classification. However, most of the approaches (i) have only considered convolutional and long-short term memory networks, (ii) they have been built focusing only on APIs called at runtime, without considering other relevant though heterogeneous sources of information like network and file operations, and (iii) the code and pretrained models are hardly available, hindering reproducibility of results in this research area. In this work, we overcome these limitations by presenting Nebula, a versatile, self-attention transformer-based neural architecture that can generalize across different behavior representations and formats, combining heterogeneous information from dynamic log reports. We show the efficacy of Nebula on three distinct data collections from different dynamic analysis platforms, comparing its performance with previous state-of-the-art models developed for malware detection and classification tasks. We produce an extensive ablation study that showcases how the components of Nebula influence its predictive performance, while enabling it to outperform some competing approaches at very low false positive rates. We conclude our work by inspecting the behavior of Nebula through the application of explainability methods, which highlight that Nebula correctly focuses more on portions of reports that contain malicious activities. We release our code and models at github.com/dtrizna/nebula.

  • GPT-like Pre-Training on Unlabeled System Logs for Malware Detection

    Troopers23

    In recent years, self-supervised language modeling techniques, such as those used in GPT-like language models, have shown great success in natural language processing tasks, without requiring supervision from domain experts to learn language semantics. In this talk, we explore the transferability of these techniques to system logs and share a methodology for pre-training a Transformer model on unlabeled logs for malware detection.

    Infrastructures generate vast amounts of system logs suitable for cybersecurity needs, but only a fraction of these logs are labeled and annotated for specific events or anomalies. Our experiments demonstrate that pre-training the model on unlabeled system logs leads to improved performance on the task of malware detection, compared to training on labeled data alone. Moreover, we show that the pre-trained model learns patterns that are similar to what a human engineer would consider relevant in detecting malware.

    These findings highlight the potential of pre-training GPT-like models on system logs for cybersecurity applications, and demonstrate the benefits of self-supervised learning approaches in domains where labeled data is scarce. Overall, our work contributes to the growing body of literature on applying language modeling techniques beyond natural language processing and opens up new avenues for research in the field of cybersecurity.

  • Shell Language Processing: Unix command parsing for Machine Learning

    In Proceedings of the Conference on Applied Machine Learning for Information Security (CAMLIS), 2021

    Results presented at DefCon 29, AI Village.
