Abstract
Much of the vital information about emerging threats and the corresponding defensive measures is contained in large volumes of natural language text online. Capturing such actionable intelligence in real time is critical to preventing large-scale attacks automatically. The ATT&CK framework is a widely recognized standard for cataloging the technical details of cyber threats and deploying mitigating measures. A technique in ATT&CK specifies a set of adversary actions to achieve a particular goal, such as Exfiltration over Command and Control channel. Details of the technique include encrypted traffic and encoded data. A key challenge in identifying such cyber intelligence from natural language texts is that for a given action, such as encrypted traffic, many alternative expressions are possible (e.g., send using a self-signed certificate, send using HTTPS requests). It is not practical to manually provide an exhaustive list of all such variants. We demonstrate that using cyber-phrase embedding on a cybersecurity text corpus is a promising approach to overcome such difficulties. Our evaluation demonstrates that our model outperforms existing models. We have created an open-source project to make our tools and data available for the cybersecurity research community.
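The idea of matching variant expressions to a known action can be illustrated with a minimal sketch. It is not the authors' model: it assumes a simple baseline in which a phrase vector is the average of its word vectors, and phrases are compared by cosine similarity. The vectors in `TOY_VECTORS`, the helper names, and the second candidate action (`delete log files`) are all hypothetical, invented only for this illustration. A real system would learn its embeddings from a cybersecurity text corpus.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# They are NOT taken from the paper or from any trained model.
TOY_VECTORS = {
    "encrypted": [0.9, 0.1, 0.0],
    "traffic": [0.7, 0.3, 0.1],
    "https": [0.8, 0.2, 0.1],
    "requests": [0.6, 0.3, 0.2],
    "send": [0.5, 0.4, 0.1],
    "using": [0.3, 0.3, 0.3],
    "delete": [0.0, 0.2, 0.9],
    "log": [0.1, 0.1, 0.8],
    "files": [0.1, 0.3, 0.7],
}


def embed_phrase(phrase):
    """Average the vectors of the known words in a phrase (assumed baseline)."""
    vectors = [TOY_VECTORS[w] for w in phrase.lower().split() if w in TOY_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in phrase: {phrase!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def closest_action(phrase, actions):
    """Return the candidate action whose phrase vector is most similar."""
    query = embed_phrase(phrase)
    return max(actions, key=lambda action: cosine(query, embed_phrase(action)))


# Hypothetical candidate actions; only the first appears in the abstract.
actions = ["encrypted traffic", "delete log files"]
print(closest_action("send using HTTPS requests", actions))
```

With these toy vectors, the variant "send using HTTPS requests" lands closer to "encrypted traffic" than to the unrelated action, even though the two phrases share no words. That is the property an exhaustive list of variants cannot provide.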
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781728188003 |
| State | Published - 9 Nov 2020 |
| Externally published | Yes |
| Event | 18th IEEE International Conference on Intelligence and Security Informatics, ISI 2020 - Virtual, Arlington, United States. Duration: 9 Nov 2020 → 10 Nov 2020 |
Publication series
| Name | Proceedings - 2020 IEEE International Conference on Intelligence and Security Informatics, ISI 2020 |
|---|---|
Conference
| Conference | 18th IEEE International Conference on Intelligence and Security Informatics, ISI 2020 |
|---|---|
| Country/Territory | United States |
| City | Virtual, Arlington |
| Period | 9 Nov 2020 → 10 Nov 2020 |
Bibliographical note
Publisher Copyright: © 2020 IEEE.
Keywords
- Cyber attack
- Cyber threat intelligence
- MITRE ATT&CK Framework
- NLP
- Text mining
- Word Embedding
ASJC Scopus subject areas
- Information Systems and Management
- Safety, Risk, Reliability and Quality
- Computer Networks and Communications
- Computer Vision and Pattern Recognition
- Information Systems
Fingerprint
Dive into the research topics of 'From Word Embedding to Cyber-Phrase Embedding: Comparison of Processing Cybersecurity Texts'. Together they form a unique fingerprint.