Sport fanaticism is a social problem. Studying it on social networking sites such as Twitter is important because these sites give people a means to communicate and share emotions. A huge amount of data is posted on social media every day, so text mining and sentiment analysis are essential for automatically analyzing this data and extracting the desired information and knowledge. This paper makes two main contributions. First, we generated twelve large-scale fanatic lexicons that can help in building fanatic classifiers, which automatically classify Arabic social text (e.g., tweets) as fanatic or non-fanatic. The generated fanatic lexicons can also help in building anti-fanatic tools and in automatically detecting and measuring sport fanaticism in Arabic social text. As far as we know, they are the first large-scale fanatic lexicons. The generated resources are publicly available for research purposes. Second, we propose a new method for automatically generating sentiment lexicons, called Term Frequency-Inverse Context Frequency (TFICF). We analyze the performance of the proposed TFICF method and compare it with Pointwise Mutual Information (PMI), a common method for this task. The proposed TFICF method performed better: its highest accuracy is 89%, compared with 82% for PMI.
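To make the PMI baseline mentioned in the abstract more concrete, the following is a minimal sketch of one common way to score lexicon terms with PMI: the difference between a term's PMI with the fanatic class and its PMI with the non-fanatic class, which reduces to a log ratio of class-conditional term probabilities. The abstract does not give the exact formulation used in the paper, so the function name `pmi_orientation`, the label strings, the add-k smoothing, and the English placeholder tokens are all assumptions made for illustration. TFICF is not sketched because the abstract does not define it.

```python
import math
from collections import Counter


def pmi_orientation(docs, pos_label="fanatic", neg_label="non_fanatic", k=1.0):
    """Score each term as PMI(term, pos_label) - PMI(term, neg_label).

    docs: iterable of (tokens, label) pairs (hypothetical input format).
    k: add-k smoothing constant (an assumption, not taken from the paper).
    Returns a dict mapping term -> score; positive scores lean toward pos_label.
    """
    term_counts = {pos_label: Counter(), neg_label: Counter()}
    for tokens, label in docs:
        if label in term_counts:
            term_counts[label].update(tokens)

    vocab = set(term_counts[pos_label]) | set(term_counts[neg_label])
    n_pos = sum(term_counts[pos_label].values())
    n_neg = sum(term_counts[neg_label].values())
    v = len(vocab)

    scores = {}
    for term in vocab:
        # Smoothed class-conditional probabilities P(term | class).
        p_pos = (term_counts[pos_label][term] + k) / (n_pos + k * v)
        p_neg = (term_counts[neg_label][term] + k) / (n_neg + k * v)
        # PMI(t, pos) - PMI(t, neg) simplifies to log2(P(t|pos) / P(t|neg)).
        scores[term] = math.log2(p_pos / p_neg)
    return scores


# Toy labeled corpus with placeholder tokens (not real data from the paper).
sample_docs = [
    (["team", "rival", "trash"], "fanatic"),
    (["rival", "trash", "losers"], "fanatic"),
    (["team", "match", "tonight"], "non_fanatic"),
    (["match", "great", "team"], "non_fanatic"),
]
lexicon = pmi_orientation(sample_docs)
```

In a sketch like this, terms with the highest positive scores would be candidates for a fanatic lexicon, and terms with the most negative scores would be candidates for a non-fanatic one. A score threshold or a top-n cutoff would be a further design choice.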
Bibliographical note
Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature.
Keywords
- Natural language processing
- Sentiment analysis
- Social text
ASJC Scopus subject areas
- Information Systems
- Media Technology
- Human-Computer Interaction
- Computer Science Applications