Abstract
Twitter is one of the most popular social networking services in the world. Over the past few years, people have used Twitter messages daily to express their views and share their feelings. As a result, the volume of data is growing dramatically, creating an opportunity for researchers to use these tweets as a source for data mining and to extract valuable information. Because Twitter is popular in Saudi Arabia, we believe that Twitter messages (tweets) are a good source for capturing people's sentiment. These messages can mostly be divided into two classes: positive and negative. Our goal is to design and implement a sentiment analyzer that classifies real tweets collected through the Twitter API into one of these two classes. We used a machine-learning-based sentiment analysis method and applied several supervised learning algorithms, such as Logistic Regression, Naïve Bayes, Support Vector Machine, and Decision Trees. To accomplish this, we preprocessed the dataset that we chose to train our classifiers. We used Bag-of-words and TF-IDF techniques to extract features from the preprocessed tweets. We also used unigram, bigram, and trigram features to identify the best predictive accuracy for the classifiers. Among the classification techniques that we used, Logistic Regression performed best in terms of accuracy on the validation and test data when using trigram features with stop words and the TF-IDF feature extraction technique.
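To make the feature-extraction step concrete, the following is a minimal sketch of n-gram features with TF-IDF weighting. It is not the authors' implementation: the whitespace tokenizer, the unsmoothed weighting formula (term frequency times log(N/df)), the function names `ngrams` and `tfidf`, and the two toy tweets are all assumptions made for illustration. Stop words are kept, which is one reading of the best-performing configuration described in the abstract.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_max=3):
    """Return one {feature: weight} dict per document (illustrative helper)."""
    # Assumed preprocessing: lowercase and split on whitespace, keep stop words.
    bags = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    # Document frequency: in how many documents each feature occurs.
    df = Counter(term for bag in bags for term in bag)
    n_docs = len(docs)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            # Term frequency times unsmoothed inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bag.items()
        })
    return weights


tweets = ["i love this phone", "i hate this phone"]  # made-up examples
vectors = tfidf(tweets)
```

Features shared by every document, such as "this phone" here, receive a weight of zero, while features unique to one tweet, such as "love" or "i hate this", receive positive weight. Vectors of this kind would then be passed to a classifier such as Logistic Regression.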
Original language | English |
---|---|
Title of host publication | 2021 International Conference of Women in Data Science at Taif University, WiDSTaif 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781665449489 |
DOIs | |
State | Published - 30 Mar 2021 |
Publication series
Name | 2021 International Conference of Women in Data Science at Taif University, WiDSTaif 2021 |
---|---|
Bibliographical note
Publisher Copyright: © 2021 IEEE.
Keywords
- Bag-of-words model
- Data Mining
- Machine Learning
- Sentiment Analysis
- Stemming
- Stop Words
- TF-IDF model
- Text Preprocessor
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Computer Science Applications