
From Word Embedding to Cyber-Phrase Embedding: Comparison of Processing Cybersecurity Texts

  • Moumita Das Purba
  • Bill Chu
  • Ehab Al-Shaer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

Much of the vital information about emerging threats and the corresponding defensive measures is contained in large volumes of natural language text online. Capturing such actionable intelligence in real time is critical to preventing large-scale attacks automatically. The MITRE ATT&CK framework is a widely recognized standard for cataloging the technical details of cyber threats and deploying mitigating measures. A technique in ATT&CK specifies a set of adversary actions to achieve a particular goal, such as Exfiltration Over Command and Control Channel. Details of that technique include encrypted traffic and encoded data. A key challenge in identifying such cyber intelligence in natural language text is that a given action, such as encrypted traffic, can be expressed in many alternative ways (e.g., send using a self-signed certificate, send using HTTPS requests). It is not practical to manually provide an exhaustive list of all such variants. We demonstrate that using cyber-phrase embedding on a cybersecurity text corpus is a promising approach to overcoming such difficulties. Our evaluation demonstrates that our model outperforms existing models. We have created an open-source project to make our tools and data available to the cybersecurity research community.

Original language: English
Title of host publication: Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728188003
State: Published - 9 Nov 2020
Externally published: Yes
Event: 18th IEEE International Conference on Intelligence and Security Informatics, ISI 2020 - Virtual, Arlington, United States
Duration: 9 Nov 2020 → 10 Nov 2020

Publication series

Name: Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020

Conference

Conference: 18th IEEE International Conference on Intelligence and Security Informatics, ISI 2020
Country/Territory: United States
City: Virtual, Arlington
Period: 9/11/20 → 10/11/20

Bibliographical note

Publisher Copyright:
© 2020 IEEE.

Keywords

  • Cyber attack
  • Cyber threat intelligence
  • MITRE ATT&CK Framework
  • NLP
  • Text mining
  • Word Embedding

ASJC Scopus subject areas

  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Information Systems
