TY - JOUR
T1 - A Novel Metadata Based Multi-Label Document Classification Technique
AU - Sajid, Naseer Ahmed
AU - Ahmad, Munir
AU - Rahman, Atta Ur
AU - Zaman, Gohar
AU - Ahmed, Mohammed Salih
AU - Ibrahim, Nehad
AU - Ahmed, Mohammed Imran B.
AU - Krishnasamy, Gomathi
AU - Alzaher, Reem
AU - Alkharraa, Mariam
AU - AlKhulaifi, Dania
AU - AlQahtani, Maryam
AU - Salam, Asiya A.
AU - Saraireh, Linah
AU - Gollapalli, Mohammed
AU - Ahmed, Rashad
N1 - Publisher Copyright: © 2023 CRL Publishing. All rights reserved.
PY - 2023
Y1 - 2023
N2 - Research publication has grown steadily since its beginnings, and with the emergence of web technologies its growth rate has become overwhelming. By a rough estimate, more than thirty thousand research journals publish around four million papers annually. Search engines, indexing services, and digital libraries search the web for such publications. Nevertheless, retrieving the most relevant articles for a user request remains elusive, mainly because articles are not appropriately indexed against hierarchies of granular subject classification. To overcome this issue, researchers are investigating new techniques for classifying research articles, especially when the complete article text is not available (the case for non-open-access articles). This study investigates multilabel classification over the available metadata and assesses “to what extent metadata-based features can perform in contrast to content-based approaches.” To this end, novel techniques for multilabel classification have been proposed, developed, and evaluated on metadata such as the Title and Keywords of the articles. The proposed technique has been assessed on two diverse datasets: one from the Journal of Universal Computer Science (J.UCS) and a benchmark dataset comprising articles published by the Association for Computing Machinery (ACM). The proposed technique yields encouraging results compared with state-of-the-art techniques in the literature.
KW - Multilabel classification
KW - content/data mining
KW - indexing
KW - metadata
UR - http://www.scopus.com/inward/record.url?scp=85147744026&partnerID=8YFLogxK
U2 - 10.32604/csse.2023.033844
DO - 10.32604/csse.2023.033844
M3 - Article
AN - SCOPUS:85147744026
SN - 0267-6192
VL - 46
SP - 2195
EP - 2214
JO - Computer Systems Science and Engineering
JF - Computer Systems Science and Engineering
IS - 2
ER -