Hierarchical Deep Language Models for Classifying Antibody Subtypes in Oncology Clinical Notes

Authors

  • Taslim Taslim (Universitas Lancang Kuning; Universiti Sultan Zainal Abidin)
  • Susi Handayani (Universitas Lancang Kuning)
  • Dafwen Toresa (Universitas Lancang Kuning)
  • Syahriatna Syahriatna (Universitas Lancang Kuning)

DOI:

https://doi.org/10.31849/digitalzone.v17i1.29945

Keywords:

ClinicalBERT, hierarchical text classification, antibody subtype classification, oncology clinical trials, imbalanced learning

Abstract

A hierarchical language model based on ClinicalBERT was developed to classify antibody subtypes in oncology clinical trial statements under severe class imbalance conditions. The study addresses the limitation of conventional flat multilabel classification approaches that ignore the hierarchical dependency between general antibody categories and their specific subtypes. The proposed framework applies a two-stage architecture in which antibody presence is detected before subtype identification, enabling conditional learning for monoclonal and bispecific antibodies. The model combines hierarchical optimization with imbalance-aware strategies including class weighting, focal loss, and limited undersampling to improve sensitivity toward rare therapeutic categories. Experiments were conducted using 17,701 annotated oncology trial statements derived from ClinicalTrials.gov protocols. Model performance was evaluated using macro F1, subtype-specific F1, precision–recall area under the curve, hierarchical accuracy, and calibration analysis across multiple random seeds. The hierarchical ClinicalBERT model consistently outperformed the flat multilabel baseline, with bispecific F1 increasing from 0.02 to 0.06 and precision–recall area improving from 0.01 to 0.03. Reliability analysis further demonstrated well-calibrated probability estimates suitable for evidence ranking and biomedical decision-support applications. The findings indicate that incorporating label hierarchy improves rare subtype recognition, robustness, interpretability, and classification stability in biomedical text mining tasks.
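The two-stage, imbalance-aware idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss formula or any hyperparameters, so the function names, the standard binary focal loss form, the values of `gamma` and `alpha`, the 0.5 decision threshold, and the gating of the subtype loss on the gold antibody label are all assumptions made for illustration. Here `alpha` stands in for class weighting, and the probabilities are assumed to come from an encoder such as ClinicalBERT.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.5):
    """Binary focal loss for one example (assumed standard form).

    p is the predicted probability of the positive class, y is 0 or 1.
    gamma down-weights easy examples; alpha acts as a class weight.
    """
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def hierarchical_loss(p_antibody, p_subtypes, y_antibody, y_subtypes,
                      gamma=2.0, alpha=0.5):
    """Stage 1 loss is always applied; stage 2 (subtype) losses are added
    only when the gold label says an antibody is present (an assumption)."""
    loss = focal_loss(p_antibody, y_antibody, gamma, alpha)
    if y_antibody == 1:
        for name, p in p_subtypes.items():
            loss += focal_loss(p, y_subtypes[name], gamma, alpha)
    return loss

def hierarchical_predict(p_antibody, p_subtypes, threshold=0.5):
    """Predict subtypes only if stage 1 detects an antibody, so the output
    never contains a subtype without its parent category."""
    if p_antibody < threshold:
        return {"antibody": 0, **{name: 0 for name in p_subtypes}}
    return {"antibody": 1,
            **{name: int(p >= threshold) for name, p in p_subtypes.items()}}

if __name__ == "__main__":
    # Hypothetical probabilities for one trial statement.
    probs = {"monoclonal": 0.9, "bispecific": 0.2}
    print(hierarchical_predict(0.8, probs))
    print(hierarchical_predict(0.3, probs))
```

Gating the subtype decision on the antibody decision is what keeps predictions consistent with the label hierarchy, which a flat multilabel classifier does not guarantee.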

Published

2026-06-06

How to Cite

Hierarchical Deep Language Models for Classifying Antibody Subtypes in Oncology Clinical Notes. (2026). Digital Zone: Jurnal Teknologi Informasi Dan Komunikasi, 17(1), 95-104. https://doi.org/10.31849/digitalzone.v17i1.29945
