The Impact of Feature Extraction to Naïve Bayes Based Sentiment Analysis on Review Dataset of Indihome Services

Salsabila Mazya Permataning Tyas; Bagus Setya Rintyarna; Wiwik Suharso

doi:10.31849/digitalzone.v13i1.9158

Salsabila Mazya Permataning Tyas Universitas Muhammadiyah Jember
Bagus Setya Rintyarna Universitas Muhammadiyah Jember
Wiwik Suharso Universitas Muhammadiyah Jember

Keywords: sentiment analysis, IndiHome, TF-IDF, Word2Vec, naïve Bayes.

Abstract

IndiHome is an internet service provider (ISP) product of PT Telekomunikasi Indonesia. Like any product or service offered to the public, IndiHome has advantages and disadvantages, and public opinion about them can be examined with sentiment analysis. This study performs sentiment analysis on public responses and reviews of IndiHome services posted on Twitter. It compares two feature extraction methods, TF-IDF and Word2Vec, using a naïve Bayes classifier for classification. With TF-IDF features, the classifier reached an accuracy of 96%, and 92% when tested on randomly selected unseen data. With Word2Vec features, accuracy was 60%, and 44% when tested on randomly selected unseen data.
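
The TF-IDF branch of the comparison described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not state which naïve Bayes variant, IDF formula, or preprocessing was used, so the multinomial model, the smoothed IDF, the whitespace tokenizer, and the four toy English reviews are all assumptions made for illustration. The Word2Vec branch is not shown.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; the paper's preprocessing is not given.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    # Map a tokenized document to {term: tf * idf}; terms unseen in training are dropped.
    tf = Counter(term for term in doc if term in idf)
    return {term: count * idf[term] for term, count in tf.items()}

class MultinomialNB:
    """Multinomial naive Bayes with Laplace smoothing, fed TF-IDF weights as fractional counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels, vocab):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            # Sum the TF-IDF weight of every term over the documents of class c.
            totals = {}
            for vec, y in zip(vectors, labels):
                if y == c:
                    for term, w in vec.items():
                        totals[term] = totals.get(term, 0.0) + w
            denom = sum(totals.values()) + self.alpha * len(vocab)
            self.log_lik[c] = {
                t: math.log((totals.get(t, 0.0) + self.alpha) / denom) for t in vocab
            }
        return self

    def predict(self, vec):
        # Pick the class with the highest log posterior (up to a constant).
        def score(c):
            return self.log_prior[c] + sum(w * self.log_lik[c][t] for t, w in vec.items())
        return max(self.classes, key=score)

# Hypothetical toy reviews, invented for this sketch (not from the paper's dataset).
TRAIN = [
    ("internet fast and stable", "positive"),
    ("service fast great support", "positive"),
    ("connection slow and often down", "negative"),
    ("slow internet bad service", "negative"),
]

docs = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(docs)
model = MultinomialNB().fit([tfidf(d, idf) for d in docs], labels, vocab=list(idf))

def classify(text):
    return model.predict(tfidf(tokenize(text), idf))

print(classify("fast stable connection"))
print(classify("slow bad service"))
```

Feeding TF-IDF weights into a multinomial model treats them as fractional term counts, which is a common practical choice rather than something the abstract confirms. A Word2Vec variant would instead represent each review as a dense vector (for example, an average of word embeddings), which typically calls for a different likelihood such as a Gaussian one.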
