Master assignment
Cyber security: SecBERT
Type: Master CS (Cybersecurity)
Period: start date March 2022
Student: Liberato, M. (Matteo, Student M-CS)
If you are interested, please contact:
Description:
SecBERT
Recently, advances in neural networks and Natural Language Processing (NLP) techniques such as BERT and GPT-3 have opened up tremendous opportunities for automatic text processing tasks such as text classification and knowledge extraction. While existing techniques work well on generic text, specific subdomains such as "cyber security" often pose domain-specific challenges. Such domains may use specialised jargon that is absent from general vocabularies, or require domain-specific knowledge to understand the text and subsequently extract relevant information from it.
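To make the vocabulary problem concrete, here is a minimal sketch of the greedy longest-match-first WordPiece splitting that BERT's tokenizer uses. `TOY_VOCAB`, `wordpiece` and `tokenize` are hypothetical names, and the vocabulary is a made-up toy rather than BERT's real one. Real BERT tokenizers also split on punctuation and cap word length; both are omitted here. Domain terms missing from the vocabulary are fragmented into subword pieces, or collapsed to `[UNK]` if no piece matches.

```python
# Hypothetical toy vocabulary; "##" marks a piece that continues a word.
TOY_VOCAB = {
    "the", "attacker", "used", "a", "new",
    "drop", "##per", "ran", "##som", "##ware", "mal",
    "[UNK]",
}


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into subword pieces, greedy longest-match-first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first and shrink it until it is in the vocab.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word becomes the unknown token.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Lowercase, split on whitespace only (simplification), then apply WordPiece."""
    return [p for w in text.lower().split() for p in wordpiece(w, vocab)]


print(tokenize("The attacker used ransomware", TOY_VOCAB))
print(wordpiece("emotet", TOY_VOCAB))
```

With this toy vocabulary, "ransomware" is split into `ran`, `##som`, `##ware`, and an unseen malware family name ends up as `[UNK]`. A domain-specific vocabulary is one way a model such as SecBERT could keep such terms intact.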
This project aims to develop a new neural network inspired by BERT, called SecBERT, that is trained specifically to analyse cyber threat reports. We start by retraining SecBERT on generic NLP tasks such as masked language modelling (predicting masked words in a sentence) and next sentence prediction (predicting whether a candidate second sentence is likely to follow a given first sentence). After this initial phase, we will explore how SecBERT can be used to improve state-of-the-art analysis tools for cyber threat reports, looking into, among other things, improved word embeddings and knowledge extraction techniques. You are also free to explore your own interests in applying SecBERT to the analysis of cyber threat reports.
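The following minimal plain-Python sketch shows how training examples for the two BERT pre-training tasks could be built. It follows the BERT paper's recipe: a token selected for prediction is replaced by `[MASK]` 80% of the time, by a random token 10% of the time, and kept unchanged 10% of the time. For next sentence prediction, half of the pairs use the true next sentence and half a sentence from elsewhere. The function names, the independent per-token selection with probability 0.15, and the example sentences are assumptions for illustration, not the project's actual pipeline.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Build a masked-language-modelling example from a list of tokens.

    Each token is selected independently with probability mask_prob.
    A selected token is replaced by [MASK] 80% of the time, by a random
    token from vocab 10% of the time, and kept unchanged 10% of the time.
    Returns (inputs, labels): labels[i] is the original token at selected
    positions and None elsewhere (positions the loss should ignore).
    """
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)
            elif r < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels


def make_nsp_pair(doc_sentences, other_sentences, i, rng):
    """Build a next-sentence-prediction example (sentence_a, sentence_b, is_next).

    With probability 0.5, sentence_b is the sentence that actually follows
    doc_sentences[i]; otherwise it is drawn from other_sentences (e.g. taken
    from other documents), so it is unlikely to be a genuine continuation.
    """
    if not 0 <= i < len(doc_sentences) - 1:
        raise ValueError("i must index a sentence that has a successor")
    sentence_a = doc_sentences[i]
    if rng.random() < 0.5:
        return sentence_a, doc_sentences[i + 1], True
    return sentence_a, rng.choice(other_sentences), False


# Usage with a fixed seed for reproducibility (hypothetical example sentences).
rng = random.Random(0)
tokens = "the dropper contacts the c2 server".split()
inputs, labels = mask_tokens(tokens, vocab=tokens, rng=rng)
report = ["The dropper contacts the C2 server.", "It then downloads the payload."]
pair = make_nsp_pair(report, ["An unrelated sentence from another report."], 0, rng)
```

The random-token and keep-unchanged cases mean the model cannot assume that `[MASK]` always marks the positions it must predict. This matters because `[MASK]` never appears in the text seen during later fine-tuning.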
Difficulty:
medium to as difficult as you like :)
Requirements:
- Interest in neural networks, machine learning and natural language processing
- Ability to program in Python
- Experience with frameworks such as PyTorch, Hugging Face Transformers and/or TensorFlow (recommended)
Related papers:
- BERT: https://arxiv.org/abs/1810.04805
- Transformers (background for BERT): https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf (explained in code at http://nlp.seas.harvard.edu/2018/04/03/attention.html)
- Analysing Cyber Threat Reports: https://arxiv.org/abs/2111.07093
- Analysing Cyber Threat Reports: https://dl.acm.org/doi/abs/10.1145/3134600.3134646
Datasets:
- Cyber threat reports (available via Thijs)
- Data scraped from various Internet sources