Over the years, many software projects have failed because of poor requirements specifications. Specifying security requirements clearly and early in the system development process is an effective way to reduce security costs and avoid recurrent security issues. In many projects, however, security requirements are not written explicitly or are mixed with other requirements, which makes extracting and classifying them challenging. Natural language documents are the main source of requirements in a software project, so requirement extraction and classification are important steps towards automatically analysing natural language-based requirement artifacts. Manually classifying natural language security requirements is complex, error-prone, and time-consuming. Given the inherent complexity of natural language documents, there is a need to automate some aspects of the security requirements specification and analysis process. Researchers have used rule-based and machine learning-based techniques to automate security requirements extraction and classification. Existing techniques for natural language requirement documents achieve reasonable performance on heterogeneous datasets, but their performance diminishes when they are applied to unseen projects. How well these classifiers generalise across datasets from different sources has largely not been researched. Moreover, automatic security requirements classification and processing faces many challenges, including imbalanced and small datasets. The objective of this study is to automatically extract and classify security requirements and other categories (e.g. availability, integrity) using machine learning and NLP techniques. We also aim to investigate the problem of data scarcity in this area in order to achieve robust training of machine learning models.
Effective start/end date: 1/07/21 → 1/01/23