Abstract
Purpose – This study aims to predict the risk of Type 2 Diabetes Mellitus (T2DM) among social media users in Saudi Arabia using machine learning (ML) techniques. By analyzing Instagram posts, the research seeks to identify lifestyle and behavioral patterns indicative of T2DM risk.

Design/methodology/approach – A multi-stage methodology was used, incorporating data collection, preprocessing, topic modeling using latent Dirichlet allocation, sentiment analysis via TextBlob and feature engineering through Term Frequency–Inverse Document Frequency vectorization. A Random Forest classifier was developed and optimized through hyperparameter tuning. The model was evaluated against other classifiers (Logistic Regression, Support Vector Classifier, Naïve Bayes and K-Nearest Neighbors) using accuracy, precision, recall, F1-score and Receiver Operating Characteristic – Area Under the Curve (ROC-AUC).

Findings – The Random Forest classifier achieved 90% accuracy with an ROC-AUC of 0.92, outperforming other models. Linguistic features, sentiment polarity, and behavioral indicators, such as physical activity, food habits, and hashtag usage, significantly contributed to the predictive model. A five-topic model yielded the highest coherence score in topic modeling, demonstrating clarity in key health-related themes. The study underscores the potential of Instagram data in early T2DM risk detection.

Originality/value – This research is one of the first to utilize Instagram data in the Saudi context to predict T2DM risk, offering culturally contextualized, behavior-based insights for early detection. It demonstrates the effectiveness of integrating sentiment analysis, topic modeling and ML for digital health surveillance. The findings contribute to the growing field of health informatics by showing how user-generated content can enhance early intervention strategies.
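The evaluation criteria named in the abstract (accuracy, precision, recall, F1-score and ROC-AUC) can be illustrated with a minimal, standard-library sketch. This is not the study's code: the function names `binary_metrics` and `roc_auc`, the 0.5 decision threshold and the toy labels and scores are all assumptions made for illustration, and none of the numbers correspond to the reported results. Label 1 is assumed to mean "at risk".

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = at risk, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and classifier scores for eight users.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, y_score))
```

Accuracy, precision, recall and F1 depend on the chosen threshold, whereas ROC-AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which may be why both kinds of measure are reported when comparing classifiers. In practice a library such as scikit-learn provides equivalent functions (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`).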
| Original language | English |
|---|---|
| Pages (from-to) | 1-22 |
| Number of pages | 22 |
| Journal | Information Discovery and Delivery |
| State | Accepted/In press - 2025 |
Bibliographical note
Publisher Copyright: © 2025 Emerald Publishing Limited
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- Healthcare
- Machine learning
- Predictive modeling
- Sentiment analysis
- Social media
- Type 2 Diabetes
ASJC Scopus subject areas
- General Computer Science
- Library and Information Sciences
Fingerprint
Dive into the research topics of 'Predicting type 2 diabetes among social media users in Saudi Arabia using machine learning'. Together they form a unique fingerprint.