Adith Chandriah
My Project
Abstract
Sentiment analysis programs rely heavily on classification techniques to identify positive, negative, or neutral sentiment. A significant limitation of current sentiment analysis programs is the trade-off between accuracy and processing time. This study explores optimization strategies for sentiment analysis using a machine learning model built in three steps: data extraction, processing, and modeling. By testing classifiers such as Naive Bayes, J48, SVM, and OneR, the study identifies the strengths and weaknesses of each method. Naive Bayes proved to be the fastest learner, while OneR offered the highest accuracy, showing that the best choice of classifier depends on the needs of the application.
Introduction
Sentiment analysis focuses on identifying and classifying opinions within textual data. With the rapid growth of online social media, sentiment analysis has gained prominence as a tool for analyzing customer reviews, public opinions, and trends. Platforms like Twitter generate vast amounts of data daily, necessitating efficient methods for analyzing sentiment.
The goal of sentiment analysis is to extract actionable insights and feedback from text, benefiting industries such as retail and marketing. This study focuses on four classifiers (Naive Bayes, J48, SVM, and OneR) to identify the most efficient methods for sentiment classification. Preprocessing refines the input data so that the classifiers can strike a better balance between accuracy and speed.
Methodology
Data Processing
Preprocessing was conducted in Python 3.5 using libraries such as NLTK and bs4 (BeautifulSoup) to clean and prepare the text. The cleaned text was then split into document-level, sentence-level, and word-level inputs. Outdated datasets were excluded to ensure relevance.
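The cleaning and splitting steps can be sketched as follows. This is a minimal illustration, not the study's code: to stay dependency-free it uses Python's standard-library `html.parser` and regular expressions in place of bs4 and NLTK, so its sentence splitting and tokenization are cruder than NLTK's. Removing URLs and @mentions is an assumed cleaning step for Twitter-style text, and `preprocess` is a hypothetical helper name.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, discarding the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any buffered trailing text
    return " ".join(parser.parts)


def preprocess(raw):
    """Return document-, sentence-, and word-level views of one text (hypothetical helper)."""
    text = strip_html(raw)
    text = re.sub(r"https?://\S+|@\w+", " ", text)   # assumed step: drop URLs and @mentions
    document = re.sub(r"\s+", " ", text).strip()     # document level: one clean string
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document) if s]  # sentence level
    words = [re.findall(r"[a-z']+", s.lower()) for s in sentences]      # word level, lowercased
    return document, sentences, words


doc, sents, words = preprocess("<p>Great flight!</p> <b>Bad food.</b> @airline http://x.co")
print(doc)    # Great flight! Bad food.
print(words)  # [['great', 'flight'], ['bad', 'food']]
```

Swapping in NLTK's `sent_tokenize` and `word_tokenize` would handle abbreviations and punctuation more robustly than these regular expressions.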
Classifiers Evaluated
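Naive Bayes: Applies Bayes' theorem under the simplifying assumption that words occur independently of one another given the sentiment class, then picks the most probable class. Benefits include fast processing and minimal data requirements.
J48 Algorithm: Weka's Java implementation of the C4.5 decision-tree algorithm, which classifies text by repeatedly splitting on the most informative features. Though slower, it handles larger datasets effectively.
OneR Algorithm: Short for "One Rule"; builds a rule for each attribute and keeps the single attribute whose rule makes the fewest errors on the training data. Despite its simplicity, it achieved high accuracy.
SVM (Support Vector Machine): Finds the hyperplane that separates the classes with the widest margin. Demonstrates high accuracy but is more computationally expensive to train.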
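To make the Naive Bayes description concrete, here is a minimal pure-Python sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. The study does not say which implementation it used; the class name, toy documents, and "pos"/"neg" labels below are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words tokens, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        # log P(class): share of training documents carrying each label
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.word_counts[c].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, word, c):
        # Add-one smoothing keeps a word unseen in class c from zeroing its score
        return math.log((self.word_counts[c][word] + 1) / (self.totals[c] + len(self.vocab)))

    def predict(self, tokens):
        # Independence assumption: the class score is the prior plus a sum of per-word terms
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood(w, c) for w in tokens if w in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


docs = [["great", "flight"], ["loved", "the", "crew"], ["bad", "food"], ["delayed", "and", "rude"]]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["great", "crew"]))  # pos
```

Training is a single counting pass over the data, which is consistent with Naive Bayes being the fastest learner in the comparison.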
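The OneR procedure described above fits in a few lines. This sketch handles categorical features only (full OneR implementations also discretize numeric attributes), and the three binary features in the example (contains a positive word, contains a negation, contains "!") are hypothetical, not the study's feature set.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Learn a OneR model from rows of categorical features.

    For each attribute, every observed value is mapped to its most frequent class;
    the attribute whose rule makes the fewest training errors is kept.
    Returns (attribute_index, training_errors, {value: predicted_class}).
    """
    best = None
    for attr in range(len(rows[0])):
        counts_by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts_by_value[row[attr]][label] += 1
        rule = {value: counts.most_common(1)[0][0] for value, counts in counts_by_value.items()}
        errors = sum(sum(counts.values()) - counts[rule[value]]
                     for value, counts in counts_by_value.items())
        if best is None or errors < best[1]:  # on a tie, the earlier attribute is kept
            best = (attr, errors, rule)
    return best


def one_r_predict(model, row, default=None):
    attr, _, rule = model
    return rule.get(row[attr], default)


# Hypothetical features: (has_positive_word, has_negation, has_exclamation)
rows = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (0, 0, 0), (1, 1, 0)]
labels = ["pos", "pos", "neg", "neg", "neg"]
model = one_r(rows, labels)
print(model)  # (0, 1, {1: 'pos', 0: 'neg'})
```

Because the final model is a single lookup table, it is also easy to inspect, which is part of OneR's appeal as a baseline.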
Advanced Modeling
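LSTM (Long Short-Term Memory): A recurrent neural network whose gated memory cell decides, at each step, which information from earlier in the text to keep and which to discard. This helps it capture context across a sentence, such as a negation several words before an adjective.
ULMFiT (Universal Language Model Fine-tuning): Fine-tunes a pre-trained language model for sentiment classification. Trained with 28,595 entries, it achieved enhanced accuracy.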
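The keep-or-discard behavior of the LSTM comes from its gates, which can be written out for a single time step. This is a minimal pure-Python sketch of the standard LSTM cell (no peephole connections), using plain lists instead of a tensor library; the `(W, U, b)` parameter layout per gate is an assumption made for readability, and the all-zero weights in the example are illustrative only.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    params maps each gate name ('f', 'i', 'g', 'o') to (W, U, b): W multiplies
    the input x, U multiplies the previous hidden state, and b is the bias.
    """
    def pre(gate):
        W, U, b = params[gate]
        return [a + r + bb for a, r, bb in zip(matvec(W, x), matvec(U, h_prev), b)]

    f = [sigmoid(z) for z in pre("f")]    # forget gate: how much of c_prev to keep
    i = [sigmoid(z) for z in pre("i")]    # input gate: how much new content to write
    g = [math.tanh(z) for z in pre("g")]  # candidate content
    o = [sigmoid(z) for z in pre("o")]    # output gate: how much of the cell to expose
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Illustrative zero weights: hidden size 1, input size 2, so every sigmoid gate is 0.5
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {gate: zero for gate in "figo"}
h, c = lstm_step([1.0, -1.0], [0.0], [1.0], params)
print(c)  # [0.5]: the forget gate kept half of the previous cell state
```

Running `lstm_step` over each word vector in turn, feeding `h` and `c` forward, is what lets information from early words influence the final prediction.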
Optimization Strategies
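Results were improved through gradual unfreezing of the LSTM model's layers during fine-tuning (training the top layer first, then unfreezing lower layers one at a time) and through additional preprocessing, such as removing null values and handling incomplete data. For the SVM, a Radial Basis Function (RBF) kernel was applied, allowing it to learn non-linear decision boundaries.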
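Gradual unfreezing can be expressed as a schedule over layer groups. This framework-independent sketch assumes hypothetical layer-group names and a caller-supplied `train_one_epoch` callback; in a framework such as PyTorch, "trainable" would correspond to setting `requires_grad` on each group's parameters. In ULMFiT, training typically continues with all layers unfrozen after the last step of the schedule.

```python
def gradual_unfreezing_schedule(layer_groups):
    """Return, for each epoch, the layer groups that are trainable.

    layer_groups is ordered from input (first) to output (last). The first epoch
    trains only the last group; each later epoch unfreezes one more group from
    the top down, until every group is trainable.
    """
    return [layer_groups[-k:] for k in range(1, len(layer_groups) + 1)]


def run_gradual_unfreezing(layer_groups, train_one_epoch):
    for epoch, trainable in enumerate(gradual_unfreezing_schedule(layer_groups), start=1):
        # A real trainer would freeze every group not listed in `trainable` here
        train_one_epoch(epoch, trainable)


# Hypothetical layer groups for an LSTM-based classifier
groups = ["embedding", "lstm_1", "lstm_2", "lstm_3", "classifier"]
run_gradual_unfreezing(groups, lambda epoch, trainable: print(epoch, trainable))
```

Unfreezing from the top down lets the newly added classifier settle before the pre-trained lower layers are updated, which reduces the risk of overwriting what the language model already learned.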
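The RBF kernel measures similarity between two feature vectors as K(x, y) = exp(-γ‖x - y‖²), and a kernel SVM classifies a point with a weighted sum of kernel values against its support vectors plus a bias. The sketch below shows only that prediction step: the support vectors, the coefficients (each αᵢyᵢ), the bias, and γ = 0.5 are hypothetical values that SVM training would normally produce.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2); equals 1.0 when x == y."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Kernel SVM decision value: sum_i (alpha_i * y_i) * K(s_i, x) + b.

    dual_coefs holds the products alpha_i * y_i. A positive value predicts the
    positive class and a negative value the negative class.
    """
    return sum(a * rbf_kernel(s, x, gamma) for s, a in zip(support_vectors, dual_coefs)) + bias


# Hypothetical trained model: one positive and one negative support vector
svs = [[0.0, 0.0], [2.0, 2.0]]
coefs = [1.0, -1.0]
print(svm_decision([0.2, 0.1], svs, coefs, bias=0.0) > 0)  # True: nearer the positive vector
```

A larger γ makes each support vector's influence more local, so γ is usually tuned alongside the SVM's regularization parameter.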
Conclusion
According to findings from Fatima Jemai, Mohamed Hayouni, and Dr. Sahbi Baccar, Naive Bayes performed best for basic tasks because of its speed, while OneR achieved the highest accuracy. The best way to optimize sentiment analysis therefore depends on the specific application. For instance:
Naive Bayes: Ideal for fast, large-scale analyses.
OneR: Suitable for accuracy-focused tasks.
This study emphasizes the importance of preprocessing and selecting appropriate classifiers for efficiency. Further optimization could incorporate deep learning techniques to enhance performance across diverse datasets.
Acknowledgments
I would like to thank Fatima Jemai, Mohamed Hayouni, and Dr. Sahbi Baccar for their insights and mentorship throughout this project. Special thanks to Mohit Nadkarni and Delegates Beyond Borders for their support.
References
AlBadani, B., Shi, R., & Dong, J. (n.d.). A Novel Machine Learning Approach for Sentiment Analysis on Twitter Incorporating the Universal Language Model Fine-Tuning and SVM.
Jemai, F., Hayouni, M., & Baccar, S. (n.d.). Sentiment Analysis Using Machine Learning Algorithms.