Improvement of FPS and Efficiency of Parameters Mask R-CNN with MobileNetV3 Small for Cardboard Detection
DOI:
https://doi.org/10.31849/digitalzone.v16i1.26349
Keywords:
Mask R-CNN, MobileNetV3 Small, Cardboard Detection, Model Configuration
Abstract
Warehouse inventory management often suffers from discrepancies in the recorded number of cardboard boxes because of errors in manual recording. To address this problem, a cardboard detection method was developed using the default Mask R-CNN model and a modified (custom) model that uses MobileNetV3 Small. The training data were obtained from a collection of cardboard photos, which were then annotated. In the configuration stage, various anchor scales were applied to determine the bounding box parameters, and training used Stochastic Gradient Descent (SGD). The default model was trained with the initial Mask R-CNN settings, while the custom model modifies the backbone and adjusts the Feature Pyramid Network (FPN). The test results show that the custom model is more efficient, with a parameter count of 20,857,704 and an average FPS of 10.92. However, the accuracy of the custom model is lower than that of the default model.
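The two configuration and evaluation ideas named in the abstract, anchor scales and average FPS, can be illustrated with a minimal sketch. It is not the authors' code: the function names `anchor_shapes` and `average_fps`, the scale and ratio values, and the convention that an aspect ratio means height divided by width are all assumptions made for illustration.

```python
import math
import time


def anchor_shapes(scales, aspect_ratios):
    """Return a (width, height) pair for every scale/ratio combination.

    Assumed convention: ratio = height / width, and every anchor keeps an
    area of scale**2, so changing the ratio only changes the shape.
    """
    shapes = []
    for scale in scales:
        for ratio in aspect_ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


def average_fps(infer, frames, warmup=1):
    """Average frames per second of `infer` over a list of frames.

    `infer` is any callable that processes one frame (hypothetical stand-in
    for a model's forward pass). A few warm-up calls are run first and are
    not timed.
    """
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Illustrative values only; the abstract does not list the scales used.
    for width, height in anchor_shapes([32, 64], [0.5, 1.0, 2.0]):
        print(f"{width:.1f} x {height:.1f}")
    # A dummy "model" that takes about 10 ms per frame.
    print(average_fps(lambda frame: time.sleep(0.01), list(range(20))))
```

Measuring FPS as the number of frames divided by total wall-clock time, after a warm-up pass, avoids counting one-off start-up costs in the average.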
License
Copyright (c) 2025 Digital Zone: Jurnal Teknologi Informasi dan Komunikasi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.






