EXPLORATION OF FASTTEXT, TF-IDF, AND INDOBERT FEATURES WITH THE K-NEAREST NEIGHBOR METHOD FOR SENTIMENT CLASSIFICATION

Authors

  • Atika Putri, Universitas Islam Negeri Sultan Syarif Kasim Riau
  • Surya Agustian, Universitas Islam Negeri Sultan Syarif Kasim Riau
  • Jasril Jasril, Universitas Islam Negeri Sultan Syarif Kasim Riau
  • Iis Afrianty, Universitas Islam Negeri Sultan Syarif Kasim Riau

DOI:

https://doi.org/10.31849/zn.v7i1.24779

Keywords:

Sentiment Classification, Model Optimization, K-Nearest Neighbor, FastText and TF-IDF, IndoBERT

Abstract

Sentiment classification is essential for analyzing public opinion, particularly on issues discussed on social media. One of its main challenges is limited training data, which often reduces a model's ability to make accurate predictions. This study classifies sentiment toward Kaesang Pangarep's appointment as PSI chairman, using FastText, TF-IDF, and IndoBERT for feature extraction together with the K-Nearest Neighbor (KNN) algorithm. Optimization steps include adding external data, refining text preprocessing, applying data scaling, and tuning parameters. The baseline model, using FastText, achieved 44% accuracy and a 39% F1-score. After optimization and a switch to IndoBERT, the best model achieved 57% accuracy and a 49% F1-score, an improvement of 13 percentage points in accuracy and 10 percentage points in F1-score. These findings indicate that optimizations such as stronger feature extraction and parameter tuning substantially affect sentiment classification performance. Future research could explore more advanced optimization techniques to address data limitations and further improve sentiment analysis performance.

Keywords: Sentiment Classification, Model Optimisation, K-Nearest Neighbor, FastText, TF-IDF, IndoBERT.
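
The pipeline summarized in the abstract (feature extraction, scaling, KNN, parameter tuning) can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the TF-IDF variant, uses plain Python instead of a machine learning library, and every function name and all of the toy data are hypothetical. L2 normalization stands in for the scaling step, since the abstract does not say which scaler was used, and a leave-one-out search over k stands in for parameter tuning.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for illustration only (not the study's dataset).
TEXTS = [
    "great choice for chairman",
    "good leader great future",
    "support the new chairman good choice",
    "bad decision for the party",
    "terrible choice bad politics",
    "reject this bad decision",
]
LABELS = ["positive", "positive", "positive", "negative", "negative", "negative"]


def tokenize(text):
    """Minimal preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector scaled to unit length; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a plain dot product)."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def knn_predict(query, train_vecs, train_labels, k):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]), reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def classify(text, texts, labels, k):
    """Fit IDF on the training texts, vectorize everything, and predict one label."""
    idf = fit_idf(texts)
    vecs = [tfidf_vector(t, idf) for t in texts]
    return knn_predict(tfidf_vector(text, idf), vecs, labels, k)


def loo_accuracy(texts, labels, k):
    """Leave-one-out accuracy: each document is classified by all the others."""
    hits = 0
    for i in range(len(texts)):
        rest_texts = texts[:i] + texts[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        hits += classify(texts[i], rest_texts, rest_labels, k) == labels[i]
    return hits / len(texts)


def tune_k(texts, labels, candidates=(1, 3, 5)):
    """Tiny grid search over k; ties go to the earliest candidate."""
    return max(candidates, key=lambda k: loo_accuracy(texts, labels, k))


best_k = tune_k(TEXTS, LABELS)
print("best k:", best_k)
print(classify("good choice great leader", TEXTS, LABELS, k=3))
```

Swapping `tfidf_vector` for a function that returns FastText or IndoBERT sentence embeddings would leave the KNN and tuning steps unchanged, which is the kind of feature comparison the abstract describes.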

Published

2025-01-01

How to Cite

[1] “EKSPLORASI FITUR FASTTEXT, TF-IDF DAN INDOBERT PADA METODE K-NEAREST NEIGHBOR UNTUK KLASIFIKASI SENTIMEN”, zn, vol. 7, no. 1, pp. 49–60, Jan. 2025, doi: 10.31849/zn.v7i1.24779.
