Big Text Data Analysis for Sentiment Classification in Product Reviews Using Advanced Large Language Models

Authors

  • Ram Mohan Polam, University of Illinois at Springfield
  • Bhavana Kamarthapu, Fairleigh Dickinson University
  • Ajay Babu Kakani, Wright State University
  • Sri Krishna Kireeti Nandiraju, University of Illinois at Springfield
  • Sandeep Kumar Chundru, University of Central Missouri
  • Srikanth Reddy Vangala, University of Bridgeport

DOI:

https://doi.org/10.63282/3050-9416.IJAIBDCMS-V2I2P107

Keywords:

Sentiment Analysis, Product Reviews, BERT Model, Text Preprocessing, Machine Learning Classification

Abstract

Sentiment analysis uses natural language processing (NLP) to mine opinions, reviews, and other statements and to predict the sentiment they express. It typically assigns content to one of three classes: "positive," "negative," or "neutral." It affects a large number of individuals and companies worldwide, and it is especially important in the e-commerce sector, where understanding customer sentiment can significantly influence company decisions. This paper examines the application of Bidirectional Encoder Representations from Transformers (BERT) to deep learning-based sentiment classification, using a dataset of Amazon product reviews. Standard text normalization techniques, including tokenization, case folding, and stop word removal, are applied to the dataset. BERT is benchmarked against two baseline models. TF-IDF vectors served as the feature representation for both baselines, whereas the lexicon-based method relied on the SentiWordNet lexicon. The models are compared using accuracy, precision, recall, and F1-score. According to the experimental results, the proposed BERT model outperforms the traditional methods, achieving an accuracy of 89.84%, a precision of 88.87%, a recall of 89.67%, and an F1-score of 88.97%. These results suggest that transformer-based models outperform traditional approaches to sentiment analysis on large-scale review datasets by efficiently learning contextual knowledge for classification.
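
The preprocessing, TF-IDF weighting, and evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tokenizer, the stop-word list, the TF-IDF variant, or how metrics were averaged across classes. The stop-word list, the function names, and the tf * log(N / df) weighting below are all assumptions chosen for illustration, and the metrics are computed for a single "positive" class only.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption: the paper's
# actual list is not given in the abstract).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "i"}


def preprocess(text):
    """Case-fold, tokenize on runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: (term count / document length) * log(N / df).
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def binary_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall, and F1 with respect to one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = correct / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

In this sketch, a term that appears in every document receives a weight of zero, which is why TF-IDF tends to emphasize words that distinguish one review from another. A three-class evaluation would additionally need an averaging scheme (for example macro or weighted), which the abstract does not state.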

Published

2021-06-30

Section

Articles

How to Cite

1. Polam RM, Kamarthapu B, Kakani AB, Nandiraju SKK, Chundru SK, Vangala SR. Big Text Data Analysis for Sentiment Classification in Product Reviews Using Advanced Large Language Models. IJAIBDCMS [Internet]. 2021 Jun. 30 [cited 2025 Oct. 29];2(2):55-6. Available from: https://ijaibdcms.org/index.php/ijaibdcms/article/view/185