Optimization and Comparative Analysis of C4.5, Random Forest and XGBoost for Mapping SMK Muhammadiyah 2 Pekanbaru Alumni Career Pathways
Keywords:
Career Pathway, Data Mining Classification, C4.5, Random Forest, XGBoost
Abstract
Mapping the career pathways of vocational high school (SMK) alumni is essential to strengthening the link-and-match policy between vocational education and industry. However, most tracer studies rely on descriptive statistics rather than predictive modeling. This study optimizes and compares three classification algorithms (C4.5, Random Forest, and XGBoost) for mapping alumni career pathways from tracer study data, academic records, and internship (PKL) experience. The research adopts a quantitative, supervised learning approach within the CRISP-DM framework. The dataset consists of alumni data from 2021–2024, including academic performance, certification records, demographic variables, and post-graduation career status. Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. XGBoost achieved the highest classification performance, with an accuracy of 98%, outperforming both Random Forest and C4.5. Feature importance analysis showed that internship experience, competency certification, and productive subject scores were the dominant predictors of career outcomes. Theoretically, the results contribute to the application of machine learning in vocational education analytics; practically, they provide evidence-based recommendations for strengthening the curriculum, improving career guidance, and optimizing tracer studies.
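To make the evaluation metrics named above concrete, the following minimal sketch computes accuracy and macro-averaged precision, recall, and F1-score for a multi-class career-status prediction. It is illustrative only: the function name `macro_metrics`, the class labels, and the toy predictions are hypothetical, and macro averaging is an assumption, since the averaging scheme is not stated here. ROC-AUC is omitted because it requires predicted probabilities rather than hard labels.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; not the study's actual code.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1          # correct prediction for this class
        else:
            fp[pred] += 1           # predicted class gains a false positive
            fn[truth] += 1          # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data with hypothetical career-status labels (not from the study).
y_true = ["work", "work", "work", "study", "study", "entrepreneur"]
y_pred = ["work", "work", "study", "study", "study", "entrepreneur"]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

Macro averaging weights every career category equally, which may be preferable when some pathways are rare among alumni; a weighted average would instead favor the most common category.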
