A DNA Methylation–based Computational Framework for Tumour–microenvironment State Inference and Molecular Stratification

preprint OA: closed CC-BY-4.0
AI-generated summary by claude@2026-06, 2026-06-07

This paper introduces a computational framework that uses DNA methylation data to infer tumor-microenvironment states and stratify tumors based on their molecular profiles.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice.

AI-generated deep summary by claude@2026-06, 2026-06-07 · read from full text

This preprint studied how to reconstruct biologically meaningful tumour–microenvironment (TME) interaction states and molecularly stratify cervical cancer using DNA methylation profiles. The authors developed a computational framework that integrates epigenomic feature restriction, joint tumour–TME modelling, and machine-learning state reconstruction, aiming to infer tumour microenvironment states from bulk patient methylation data and link them to experimentally tractable cervical cancer cell line models, including consideration of HIV-associated programmes. They report the framework as a generalizable, reproducible, and interpretable “state inference engine,” with the stated caveat that the work is a preprint and has not been peer reviewed. This paper is not about endometriosis and does not explicitly discuss endometriosis or adenomyosis; it was included in the corpus via a keyword match in the upstream search index.

Read from the paper's body, not the abstract. Not a substitute for reading the paper. No clinical advice.

Full text 380,321 characters · extracted from preprint-html
A DNA Methylation–based Computational Framework for Tumour–microenvironment State Inference and Molecular Stratification | Research Square
Saltiel Hamese, Mutsa Takundwa, Earl Prinsloo, Deepak Balaji Thimiri Govindaraj
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8708919/v1
This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1 posted 7
Abstract
Cervical cancer remains a persistent yet preventable threat to women’s health worldwide, with a disproportionate burden borne by women in low- and middle-income countries. In sub-Saharan Africa, including South Africa, it continues to rank among the leading causes of cancer-related morbidity and mortality despite the availability of screening, vaccination, and treatment strategies.
Structural inequities in healthcare access, late-stage diagnosis, and the prevalence of biologically aggressive disease contribute to poor outcomes, underscoring the need for molecularly informed and context-sensitive precision medicine approaches. A central biological challenge in cervical cancer management is pronounced intra-tumour heterogeneity (ITH), arising from the coexistence of multiple tumour subclones shaped by genetic variation, epigenetic regulation, and dynamic tumour microenvironment (TME) pressures. This heterogeneity drives tumour adaptation, immune evasion, therapeutic resistance, and disease recurrence, complicating clinical decision-making and limiting the durability of standard treatments. These challenges are further intensified by persistent human papillomavirus (HPV) infection and, in many settings, HIV co-infection, which together impose distinct immune and stromal programmes that fundamentally shape tumour behaviour. Advances in computational biology and analytical programming have enabled the large-scale analysis of patient-derived omics data, including genomic, epigenomic, transcriptomic, and proteomic profiles, often through machine learning–based classification, clustering, and predictive modelling frameworks. However, despite the widespread availability of multi-omics datasets through resources such as The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO), their translation into clinically meaningful tumour stratification and experimentally actionable insight for cervical cancer remains limited. A key barrier is the lack of integrated, biologically grounded frameworks capable of reconstructing tumour–microenvironment interaction states directly from bulk patient data and systematically linking these inferred states to representative and experimentally tractable in vitro model systems. 
As a consequence, commonly used cervical cancer models frequently fail to capture critical immune and stromal dimensions, contributing to poor translatability of preclinical findings. To address this gap, we developed a DNA methylation–based computational framework for data-driven tumour stratification and tumour–microenvironment state inference. The framework integrates epigenomic feature restriction, joint tumour–TME modelling, and machine learning–based state reconstruction to infer biologically meaningful tumour microenvironment states directly from patient methylation profiles. It is designed not merely as a clustering pipeline, but as a generalizable epigenetic state inference engine that connects patient tumour states to experimentally controllable in vitro systems, with a specific focus on cervical cancer cell lines as the primary translational models. By explicitly modelling tumour-intrinsic, microenvironmental, and host-associated regulatory programmes—including those influenced by HIV infection—this framework enables the systematic selection and evaluation of cell line models that more faithfully recapitulate patient tumour biology. It advances precision oncology by providing a reproducible and interpretable approach to methylation-driven tumour stratification and cell line alignment in cervical cancer, with broader applicability to other immune-modulated malignancies and underserved disease contexts. 
Subjects: Biological sciences/Cancer; Biological sciences/Computational biology and bioinformatics; Health sciences/Oncology
Keywords: Cervical cancer; Women’s health; Intra-tumour heterogeneity; Tumour microenvironment; Epigenomics; DNA methylation; Data-driven stratification; Computational oncology; Machine learning; Tumour subtypes; Precision oncology; Experimental model selection
Introduction
CERVICAL CANCER: EPIDEMIOLOGY, VIRAL AETIOLOGY, AND CLINICAL CHALLENGES
Approximately 15–20% of human cancers are attributable to oncogenic viral infections, with cervical cancer (CC) representing one of the most prominent examples (Hewavisenti et al., 2023). Recent global analyses provide the first comprehensive quantification of the cancer burden attributed to HIV, estimating that 0.4% of all cancers diagnosed worldwide in 2022 (approximately 81,300 of 19 million cases) were directly attributable to HIV infection and theoretically preventable through improved HIV control measures. Cervical cancer accounts for the largest proportion of HIV-attributable malignancies globally, followed by Kaposi sarcoma, non-Hodgkin lymphoma, and Hodgkin lymphoma. Striking geographical disparities were observed, with the highest absolute and relative HIV-attributable cancer burden concentrated in sub-Saharan Africa, particularly eastern and southern Africa, where HIV contributed to more than 10% of all cancer cases. In Africa, cervical cancer alone accounts for approximately 41% of all HIV-attributable cancers (Huang et al., 2025). These data underscore the disproportionate intersection of HIV and cervical cancer in high-prevalence regions. Persistent infection with high-risk human papillomavirus (HR-HPV) remains the central aetiological driver of cervical carcinogenesis, but disease initiation and progression are strongly modulated by host immune competence.
Although HPV genotype distributions are often comparable between HIV-positive and HIV-negative women with cervical carcinoma, HIV-associated CD4+ T-cell depletion, particularly below 200 cells/µL, is consistently associated with HR-HPV persistence, multi-type infection, and accelerated progression from cervical intraepithelial neoplasia to invasive disease (Hewavisenti et al., 2023; Huang et al., 2025; Vuyst et al., 2012). Antiretroviral therapy (ART) has substantially reduced the incidence of several AIDS-defining malignancies, yet its impact on HPV-driven disease remains inconsistent, with persistent HPV infection and neoplasia frequently observed despite effective viral suppression. Incomplete or functionally dysregulated immune reconstitution (characterised by impaired antigen presentation, altered T-cell effector responses, and chronic inflammation) likely underlies sustained susceptibility to HPV-mediated oncogenesis. At the mechanistic level, the HIV–HPV interaction is driven primarily by disruption of epithelial integrity and immune-mediated tumour microenvironment (TME) remodelling rather than by direct co-infection of the same cells. HIV infection promotes mucosal inflammation, downregulation of E-cadherin and tight junction proteins, and increased epithelial permeability, facilitating HPV access to the basal epithelial layer. HIV-derived proteins, particularly Tat and gp120, further exacerbate HPV oncogenic potential by disrupting epithelial barriers, enhancing expression of viral E6/E7 oncogenes, suppressing p53, and reactivating latent HPV. Concurrent reductions in innate antiviral molecules, including β-defensin-2 and thrombospondin, weaken local immune defence and reinforce viral persistence. These interactions operate bidirectionally, as HPV-associated mucosal inflammation and immune cell recruitment increase susceptibility to HIV acquisition (Hewavisenti et al., 2023; Nazli et al., 2010; Strickler et al., 2005).
Collectively, these processes position cervical cancer, particularly in HIV-endemic settings, as a disease shaped by infection-driven immune dysfunction and tumour microenvironmental heterogeneity, providing a strong biological rationale for integrative analytical frameworks that explicitly model infection-associated epigenetic regulation and microenvironmental tumour states.
HETEROGENEITY OF TUMOUR MICROENVIRONMENTAL STATES
The tumour microenvironment (TME) in cervical cancer is a highly heterogeneous and functionally complex ecosystem composed of malignant epithelial cells and multiple non-malignant cellular compartments, each contributing distinct molecular programmes that shape tumour behaviour, progression, and therapeutic response. Epithelial tumour cells, defined by canonical markers such as EPCAM, KRT8/18/19, CDH1, and MUC1 (Chakravarthy et al., 2022; Richter et al., 2010; Ruan et al., 2020; Saha et al., 2018; Zhao et al., 2024), constitute the proliferative backbone of cervical cancer. Within this compartment, functional heterogeneity arises from variations in differentiation status and epithelial–mesenchymal transition (EMT) dynamics, which collectively dictate invasive potential, metastatic behaviour, and therapeutic sensitivity (Zhang et al., 2021). A distinct cancer stem–like population, characterised by the expression of LGR5, ALDH1A1, PROM1 (CD133), SOX2, NANOG, and POU5F1, provides self-renewal capacity and underpins tumour initiation, recurrence, and resistance to conventional therapies (Huang & Rofstad, 2016; Javed et al., 2021; Wang et al., 2025). These stem-like subpopulations coexist along an EMT gradient, exhibiting hybrid epithelial/mesenchymal phenotypes that confer plasticity and adaptive survival under therapeutic pressure. Surrounding the malignant compartment is a diverse stromal network dominated by cancer-associated fibroblasts (CAFs), marked by FAP, ACTA2, PDGFRB, COL1A1/2, NT5E, and THY1.
CAFs display pronounced functional heterogeneity encompassing myofibroblastic CAFs (myCAFs), which drive extracellular matrix (ECM) deposition and tissue stiffness; inflammatory CAFs (iCAFs), which secrete cytokines and chemokines that modulate immune cell infiltration; and antigen-presenting CAFs (apCAFs), which express immune-regulatory molecules that influence T-cell activation and tolerance. These phenotypically distinct subsets collectively orchestrate ECM remodelling, promote EMT, and modulate immune evasion, thereby serving as key regulators of stromal–tumour crosstalk and therapeutic resistance (Bueno-Urquiza et al., 2024; Y. Li et al., 2025). Adjacent to CAFs, endothelial cells (PECAM1, VWF, CDH5, KDR, FLT1) and pericytes (RGS5, MCAM, CSPG4, ANGPT1) coordinate angiogenesis, vascular integrity, and perfusion (Shim et al., 2002). Within this vascular niche, endothelial heterogeneity manifests through tip, stalk, and quiescent phenotypes that dynamically respond to hypoxia and VEGF signalling, while pericyte subsets regulate vessel maturation, permeability, and the delivery of chemotherapeutic agents (Amini et al., 2022; Brash et al., 2025; Dasgupta et al., 2022; Keleg et al., 2014; Kumar et al., 2024; Moro et al., 2024). Finally, the immune–inflammatory compartment introduces another axis of heterogeneity, encompassing cytotoxic and helper T-cell lineages, macrophages, dendritic cells, and myeloid-derived suppressor populations. Markers such as PTPRC, CD3D/E, CD8A, CD68, CD14, IL1B, TNF, and CXCL9/10 delineate immune subsets that can either enhance anti-tumour immunity or sustain immunosuppressive environments, depending on the balance of effector versus regulatory phenotypes (Chen et al., 2021; De Vos Van Steenwijk et al., 2013; Dimitrova et al., 2023; John-Olabode et al., 2025; Li et al., 2023; Litwin et al., 2020; Wang et al., 2022; Xia et al., 2023). 
In cervical cancer, this immune diversity is further shaped by HPV-driven antigenic stimuli and chronic inflammatory signalling, generating a finely tuned equilibrium between immune activation and evasion. Collectively, these cellular and molecular dimensions of the TME underscore its multilayered heterogeneity and its central role in dictating disease trajectory, therapeutic response, and clinical outcome in cervical cancer. Because bulk transcriptomic and methylomic datasets represent aggregate signals from multiple cellular populations, tumour microenvironment (TME)-derived signatures provide a powerful means to decompose patient heterogeneity (Ma et al., 2024). Marker-based scoring approaches—typically performed by z-scoring each gene across samples and averaging expression within predefined marker sets—allow quantitative approximation of cell-type abundance and functional state in bulk tissue (Busarello et al., 2025). These signatures are biologically informative: high CAF scores commonly associate with stromal activation, epithelial–mesenchymal transition (EMT), reduced tumour purity, and immunosuppressive phenotypes, whereas elevated immune signatures indicate cytokine signalling and lymphocytic infiltration, often correlating with enhanced treatment responsiveness in immunologically “hot” tumours (Peng et al., 2020; Ying et al., 2025). In cervical cancer, TME composition is further shaped by viral and host factors, including HPV genotype and HIV co-infection, each capable of inducing distinct immunologic, metabolic, and epigenetic states (Pavone et al., 2024). These multilayered influences are reflected not only in transcriptional programmes but also in DNA methylation patterns across the classical hallmarks of cancer—activating invasion and metastasis, avoiding immune destruction, and sustaining tumour-promoting inflammation—all of which contribute to dynamic TME remodelling (Song et al., 2025). 
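The marker-based scoring mentioned above (z-scoring each gene across samples, then averaging within a predefined marker set) can be illustrated with a minimal sketch. It is written in dependency-free Python purely for illustration and is not the authors' implementation; the expression values, the two marker-set names, and the function names are hypothetical, while the marker genes themselves are taken from the lists in this section. The sample standard deviation is an assumed choice.

```python
from statistics import mean, stdev

def zscore_rows(matrix):
    """Z-score each gene (row) across samples; constant genes become all zeros."""
    scaled = {}
    for gene, values in matrix.items():
        mu = mean(values)
        sd = stdev(values)  # sample standard deviation (assumed choice)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled

def marker_scores(matrix, marker_sets):
    """Average the z-scored values of each marker set, per sample."""
    z = zscore_rows(matrix)
    n_samples = len(next(iter(matrix.values())))
    scores = {}
    for name, genes in marker_sets.items():
        present = [g for g in genes if g in z]  # skip markers missing from the matrix
        if not present:
            continue
        scores[name] = [mean(z[g][j] for g in present) for j in range(n_samples)]
    return scores

# Hypothetical toy expression values: one list of three samples per gene.
expr = {
    "FAP":   [1.0, 2.0, 9.0],
    "ACTA2": [2.0, 3.0, 10.0],
    "CD8A":  [8.0, 1.0, 1.5],
    "CD3D":  [7.0, 2.0, 1.0],
}
marker_sets = {"CAF": ["FAP", "ACTA2", "THY1"], "T_cell": ["CD8A", "CD3D"]}
scores = marker_scores(expr, marker_sets)
```

In this toy input the third sample receives the highest CAF score and the first sample the highest T-cell score. Because each gene is centred before averaging, a score expresses relative enrichment within the cohort rather than an absolute cell fraction.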
Driver gene methylation introduces an additional regulatory axis for tumour stratification, offering insights into subtype-specific mechanisms of transcriptional control that may not be visible from gene expression alone (Chen et al., 2017). Understanding these interlinked TME-associated and driver-gene methylation programmes is essential for accurately comparing patient tumours with preclinical cervical cancer cell lines. Cell lines, while invaluable for mechanistic and drug-response studies, represent purified epithelial systems devoid of immune and stromal complexity. As a result, apparent molecular discrepancies between tumours and cell lines may arise not from intrinsic biological divergence but from the absence of microenvironmental influences such as immune infiltration, stromal remodelling, or EMT (Raghavan et al., 2021).
An integrated visual overview of the analytical design is provided in Fig. 1, which presents the framework as a three-part graphical workflow progressing from data preparation, through biological inference, to translational modelling. The figure illustrates how tumour microenvironment (TME) heterogeneity in cervical cancer is interrogated through DNA methylation landscapes in a structured and reproducible manner. Part 1 depicts data retrieval, harmonisation, and quality control. DNA methylation data are obtained from open-source patient cohorts, followed by rigorous sample- and probe-level quality control, probe filtering, and matrix harmonisation to ensure that downstream analyses are driven by biologically meaningful variation rather than technical artefacts. Part 2 represents the biological discovery core of the framework. Feature-restricted differential methylation analysis isolates high-variance and biologically informative CpG sites, including TME- and driver-associated features.
These epigenetic signatures are then used to reconstruct latent tumour states through unsupervised learning, with dimensionality reduction and clustering revealing discrete methylation endotypes. Functional enrichment analysis assigns biological meaning to these tumour states by linking them to coherent pathways and regulatory programmes. Part 3 focuses on tumour–model concordance, predictive modelling, and performance evaluation. Feature-restricted similarity metrics are used to quantitatively compare patient tumours with experimental systems, particularly cervical cancer cell lines, enabling assessment of molecular fidelity. Multi-feature integration combines TME- and driver-linked methylation signals to stabilise model ranking and support translational inference. Predictive modelling and performance metrics demonstrate that the integrated feature space provides robust, interpretable, and biologically grounded discrimination. Collectively, Fig. 1 emphasises the central principle of the framework: transforming tumour heterogeneity, epigenetic state reconstruction, and experimental model selection into a unified inference problem that directly links patient-specific molecular context to rational prioritisation of laboratory systems for downstream mechanistic and therapeutic investigation.
CERVICAL CANCER EXPERIMENTAL MODELS
Cervical cancer experimental models constitute essential platforms for mechanistic investigation, therapeutic screening, and biomarker development. However, accumulating evidence indicates that in vitro and ex vivo models differ markedly in their ability to recapitulate the molecular architecture of patient tumours.
Large-scale benchmarking studies have shown systematic discordance between primary tumours and commonly used cancer cell lines, even after rigorous normalisation, reflecting the dominant influence of tumour microenvironment (TME) composition, viral oncogenic context (HPV), and stromal and immune admixture on patient-derived molecular profiles. These discrepancies underscore the need for principled, data-driven approaches to evaluate model fidelity rather than assuming equivalence across available systems (Raghavan et al., 2021). At the same time, integrative pan-cancer analyses encompassing over a thousand human cancer cell lines have demonstrated that many models do retain disease-specific molecular programs, provided that similarity is assessed within biologically relevant feature spaces. A critical implication of these findings is that widely used drug-repurposing and pharmacogenomic frameworks—including perturbational matching approaches based on LINCS L1000 and Connectivity Map, as well as correlation-driven resources such as CTRP, GDSC, and PRISM—implicitly depend on the assumption that selected biomodels faithfully reflect the molecular state of the patient tumours to which therapeutic hypotheses are applied (Chawla et al., 2022; Eskra et al., 2023; Szalai et al., 2019). This assumption extends beyond cell lines to more complex systems such as patient-derived xenografts (PDXs) and organoids (PDOs), where accurate drug-response inference requires alignment of immune, stromal, and epithelial programs between models and clinically defined tumour subgroups (Fashemi et al., 2023; Liu et al., 2023; Zhao et al., 2021). Within this context, we introduce a rigorous computational framework for quantifying tumour–model concordance in a biologically constrained and reproducible manner. 
Rather than assuming equivalence across experimental systems or relying on global similarity metrics that are often dominated by housekeeping features, the framework explicitly evaluates molecular fidelity by restricting concordance analyses to disease-relevant and context-informative molecular signals, such as differentially methylated or transcriptionally variable loci that encode tumour microenvironment states, thereby enabling principled comparison between patient tumours and candidate experimental models. By reconstructing patient-specific TME-associated epigenetic programmes and systematically correlating these signatures with corresponding cervical cancer experimental models, the approach identifies those models that most faithfully capture the transcriptional and epigenetic landscape of clinical disease (Table 1). Importantly, this strategy transforms experimental model selection from a heuristic process into an explicit inference problem. Aligning individual tumours or tumour clusters with their closest-matching cell lines, patient-derived xenografts (PDXs), or patient-derived organoids (PDOs) enables biologically grounded prioritisation of therapeutic compounds whose molecular perturbation profiles are predicted to reverse, reinforce, or exploit the defining features of each tumour state. In doing so, this framework not only exposes the biological heterogeneity of cervical cancer but also establishes a rational and scalable basis for selecting experimental systems that maximise the translational relevance of downstream drug-repurposing and precision oncology analyses.
Table 1. Cervical cancer experimental models used in this study, including tumour origin, HPV status, morphology, typical experimental applications, and key literature references.
Cell line (Cellosaurus RRID) | Tumour Origin | HPV Status | Morphology | Typical Use / Notes | Citation
Ca Ski (RRID:CVCL_1100) | Metastatic cervical squamous carcinoma (epidermal origin → metastasis to small intestine) | HPV16, ~600 copies/cell | Epithelial-like, adherent | Classic HPV16⁺ model; used for metastasis, cisplatin resistance, and viral oncogene expression studies | (Koraneekit et al., 2018; Naidu et al., 2018)
CAL-39 (RRID:CVCL_1109) | Primary cervical squamous carcinoma | HPV18 positive | Epithelial, adherent | HPV18⁺ reference model; drug response assays | (Zięba et al., 2018; Gioanni et al., 1993)
SiHa (RRID:CVCL_0032) | Primary cervical squamous carcinoma | HPV16, low copy number (~1–2 copies/cell) | Epithelial, adherent | Low-copy HPV16 model; radiotherapy, DDR, and viral–host integration studies | (Filippova et al., 2014)
HeLa (RRID:CVCL_0030) | Cervical adenocarcinoma (primary tumour) | HPV18 positive | Epithelial, adherent | Widely used immortal cervical cell line; HPV18 oncogene biology, transcriptomics, and therapeutic testing | (Liu et al., 2012)
ME-180 (RRID:CVCL_1401) | Primary cervical epidermoid carcinoma | HPV39 positive | Epithelial, adherent | HPV39 variant model; EMT and drug testing studies | (Seshadri et al., 2021)
MS751 (RRID:CVCL_4996) | Metastatic cervical carcinoma (lung metastasis) | HPV18 positive | Epithelial | High HPV18 copy model; immune-related transcriptomic studies | (Lin et al., 2020)
SiSo (RRID:CVCL_2193) | Squamous cervical carcinoma | HPV18 positive | Epithelial | HPV18 model for drug and immune interaction studies | (Li et al., 2025)
C-33 A (RRID:CVCL_1094) | Cervical carcinoma, HPV negative | HPV-negative | Epithelial, adherent | HPV-independent cervical model; tumour suppressor and DDR gene studies | (Conlon et al., 2021)
HT-3 (RRID:CVCL_1293) | Primary cervical carcinoma | HPV-negative | Epithelial, adherent | Radio-resistant HPV-negative model; often used in chemoradiation and cisplatin sensitivity assays | (Sahu et al., 2024)
BOKU (RRID:CVCL_1089) | Cervical squamous carcinoma | HPV16 positive | Epithelial | Rarely used HPV16⁺ model; utilised in HPV gene regulation and immune microenvironment studies | (Hiramoto et al., 2018)
SKG-IIIa (RRID:CVCL_1704) | Cervical squamous carcinoma (Japanese origin) | HPV16 positive | Epithelial | Represents Asian HPV16⁺ squamous tumours; invasion and cell-cycle regulation studies | (Horikawa et al., 2015)
SW756 (RRID:CVCL_1727) | Cervical squamous cell carcinoma (primary tumour) | HPV18 positive | Epithelial, adherent | HPV18⁺ SCC model; EMT, TME, and chemotherapeutic response profiling | (Kamradt et al., 2000)
DoTc2 4510 (RRID:CVCL_1181) | Cervical carcinoma | HPV16 positive | Epithelial | HPV16⁺ model used for immuno-oncology and methylation concordance analyses | (Vučković et al., 2023)
Materials
PATIENT COHORTS
Two independent and complementary cervical cancer cohorts were analysed to ensure analytical robustness, biological generalisability, and cross-platform validation of tumour microenvironment (TME)–associated molecular states. These cohorts were selected to capture both transcriptomic and epigenetic dimensions of cervical cancer biology across distinct clinical and virological contexts. The first dataset, TCGA-CESC, was obtained from the NCI Genomic Data Commons, a rigorously standardised and widely validated resource for high-quality molecular and clinical cancer data (Weinstein et al., 2013). TCGA-CESC provides bulk RNA-sequencing profiles with extensive clinical annotation, including tumour purity estimates, disease stage, and HPV status, and serves as a reference cohort for cervical cancer genomic studies (Burk et al., 2017). In this study, TCGA-CESC was leveraged to define transcriptionally informed tumour- and TME-associated programmes, establish biologically grounded feature spaces, and support cross-modal interpretation of methylation-derived tumour states. The second dataset, GSE279982, was retrieved from the NCBI Gene Expression Omnibus, an internationally trusted repository for functional genomics data (Barrett et al., 2013).
Released in 2024, GSE279982 constitutes one of the largest cervical cancer methylome studies to date in an HIV-endemic setting, profiling 538 cervical samples from Nigerian women using the Infinium MethylationEPIC BeadChip (Kaur et al., 2023). The cohort includes HIV-positive cervical cancer, HIV-positive cervical intraepithelial neoplasia (CIN), HIV-positive cancer-free controls, and HIV-negative cervical cancer cases, accompanied by HPV genotyping and detailed clinical metadata. This dataset was originally generated to identify differentially methylated regions associated with cervical cancer progression under chronic HIV and HPV co-infection (Zheng et al., 2025). In this study, GSE279982 served as the primary epigenetic substrate for reconstructing TME-associated methylation states, performing feature-restricted differential analysis, and quantifying tumour–model concordance. The cohort’s depth, epidemiological relevance, and metadata completeness make it particularly well suited for evaluating how viral co-infection shapes epigenetic tumour states and for testing the robustness of tumour–model inference across clinically and biologically diverse patient populations.
REFERENCE BIOMODEL DATASETS
To enable systematic benchmarking of patient tumours against experimentally tractable cervical cancer models, complementary transcriptomic and epigenomic reference datasets were integrated based on data quality, coverage, and cross-study compatibility. These resources provide the molecular baselines required for quantitative tumour–model concordance analysis. Transcriptomic profiles of cervical cancer cell lines were obtained from the Genomics of Drug Sensitivity in Cancer (GDSC2) repository, which provides uniformly processed RNA-sequencing data across a large panel of deeply characterised human cancer cell lines (Yang et al., 2013).
GDSC2 offers harmonised expression profiles alongside extensive molecular and pharmacological annotations, making it a widely used reference resource for translational and drug-response studies. In this study, these data support the identification of transcriptionally defined tumour- and TME-associated programmes and enable cross-modal interpretation of epigenetically inferred tumour states. Corresponding DNA methylation profiles for cervical cancer cell lines were retrieved from GEO (GSE68379), ensuring platform-consistent CpG coverage for direct comparison with patient tumour methylomes (Iorio et al., 2016). The availability of matched epigenomic data enables feature-restricted, CpG-level concordance analyses that minimise technical bias and focus on biologically informative methylation signals. Together, these curated reference model datasets provide a robust molecular backbone for evaluating tumour–cell line fidelity across both transcriptional and epigenetic dimensions and support the rational selection of cervical cancer experimental systems for downstream mechanistic studies and therapeutic prioritisation.
ANALYTICAL FRAMEWORK AND IMPLEMENTATION
All analyses were conducted using a modular and reproducible bioinformatics framework designed to infer tumour microenvironment (TME)–associated molecular states and to quantitatively align patient tumours with experimentally tractable cervical cancer models. The framework is implemented as a stepwise inference workflow that integrates data quality control, biologically constrained feature selection, unsupervised learning, tumour–model concordance analysis, and functional interpretation within a unified analytical design. The workflow begins with rigorous data quality control (QC) tailored to each molecular modality.
For DNA methylation data, probe-level and sample-level QC metrics—including signal intensity distributions, detection p-values, and variance structure—are evaluated to exclude technical artefacts and ensure that downstream patterns reflect true biological variation rather than noise (Sun et al., 2022; Iorio et al., 2016). Only high-confidence CpG sites shared across patient tumours and reference model platforms are retained for downstream analyses, ensuring cross-dataset comparability. The framework then applies feature-restricted differential analysis, focusing on high-variance and biologically informative loci rather than genome-wide averages. For methylation data, this involves identifying differentially methylated or highly variable CpG sites associated with tumour state, clinical strata, or TME composition using moderated statistical testing (Ritchie et al., 2015). This biologically constrained feature selection reduces dimensionality, increases statistical power, and prioritises regulatory signals linked to tumour biology and microenvironmental context. Curated feature sets are subsequently used for tumour microenvironment reconstruction and unsupervised structure discovery. Dimensionality reduction and clustering approaches—including hierarchical clustering and graph-based community detection using Rphenograph—are employed to resolve latent tumour states that reflect coordinated epigenetic and TME-associated programmes rather than continuous variation (Levine et al., 2015; McInnes et al., 2018). Cluster robustness and biological coherence are evaluated through stability assessment and enrichment analyses. A central component of the framework is feature-restricted tumour–model concordance analysis. 
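The feature restriction that this concordance step builds on can be illustrated with a minimal sketch: keep only CpG probes present in both the tumour and the model matrices, then retain the most variable ones across tumours. Plain variance ranking is used here as a simplified stand-in for the moderated differential testing cited above, not as a reproduction of it. The sketch is in dependency-free Python for illustration only; the probe IDs, beta values, `top_n` parameter, and function name are all hypothetical.

```python
from statistics import pvariance

def restrict_features(tumour_betas, model_betas, top_n):
    """Keep probes shared by both datasets, then the top_n most variable across tumours."""
    shared = [probe for probe in tumour_betas if probe in model_betas]
    ranked = sorted(shared, key=lambda p: pvariance(tumour_betas[p]), reverse=True)
    return ranked[:top_n]

# Hypothetical beta values (0 = unmethylated, 1 = fully methylated).
tumour_betas = {
    "cg_a": [0.10, 0.90, 0.15, 0.85],
    "cg_b": [0.50, 0.52, 0.49, 0.51],   # nearly constant, so uninformative
    "cg_c": [0.20, 0.60, 0.30, 0.70],
    "cg_d": [0.05, 0.95, 0.05, 0.95],   # variable, but absent from the model data
}
model_betas = {
    "cg_a": [0.20, 0.80],
    "cg_b": [0.50, 0.50],
    "cg_c": [0.30, 0.60],
}
selected = restrict_features(tumour_betas, model_betas, top_n=2)
```

Here `cg_d` is dropped despite its high variance because it cannot be compared across datasets, and `cg_b` is dropped because it barely varies. Filtering on shared probes first is what keeps any later tumour-to-model comparison on a common feature space.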
Patient-derived molecular signatures are quantitatively compared with cervical cancer model profiles using Spearman rank correlation, computed within the restricted feature space to assess preservation of biologically meaningful methylation programmes across systems (Wisniewski & Brannan, 2024). This strategy explicitly accounts for the absence of immune and stromal compartments in in vitro models and enables biologically interpretable ranking of model fidelity at both patient and cluster levels. The framework further supports multi-feature integration and predictive modelling by combining tumour-intrinsic (driver-associated) and TME-linked molecular signals to enhance robustness and translational relevance. Model performance is evaluated using receiver operating characteristic (ROC) and precision–recall curves, calibration profiles, and feature-importance metrics to ensure discrimination, stability, and interpretability (Haynes, 2013; McKight & Najab, 2010). Finally, clinical enrichment and functional pathway analyses are performed to contextualise molecularly defined tumour states. Cluster-level signatures are tested for enrichment of clinical, virological, and phenotypic annotations, while pathway enrichment analyses using FGSEA and MSigDB map epigenetic programmes to underlying biological mechanisms, including immune regulation, stromal activation, proliferation, and stress-response pathways (Kanehisa et al., 2016; Liberzon et al., 2015; Korotkevich et al., 2016). Collectively, this framework operationalises tumour heterogeneity and experimental model selection as a unified inference problem, providing a transparent, extensible, and biologically grounded methodological basis for linking patient-specific molecular context to rational model prioritisation and downstream translational analyses.

STATISTICAL ANALYSES

All statistical workflows were performed in R (v4.3 or later) using reproducible, script-based pipelines.
Correlation analyses were conducted using Spearman’s rank correlation, chosen for its robustness to non-normal distributions and its suitability for cross-platform comparisons (methylation ↔ expression; tumour ↔ cell line) (Wisniewski & Brannan, 2024). Group-level comparisons across clinical strata, including HIV status, HPV genotype, and TME-defined subtypes, were assessed using Wilcoxon rank-sum tests for two-group contrasts or Kruskal–Wallis tests for multi-group comparisons (Haynes, 2013b; McKight & Najab, 2010). Multiple hypothesis testing was controlled using the Benjamini–Hochberg false discovery rate (FDR) procedure, with statistical significance defined at FDR < 0.05 unless otherwise specified (Haynes, 2013). For clustering analyses, both methylation-derived and expression-derived matrices were analysed using hierarchical clustering (complete linkage, Euclidean distance) implemented through base R, ggplot2 (Zhu et al., 2025; Wickham et al., 2025), and ComplexHeatmap (Kolde et al., 2025), supplemented by Rphenograph where high-dimensional TME signatures required graph-based community detection (Levine et al., 2015). Dimensionality reduction for TME profiling was performed using PCA and non-linear embeddings (t-SNE/UMAP) generated via Rtsne and uwot respectively (Krijthe, 2023; McInnes et al., 2018). Functional enrichment analyses used FGSEA for pathway-level ranking (KEGG) and MSigDB for ontology-based and cancer hallmark processes (Kanehisa et al., 2016; Liberzon et al., 2015). Differential methylation testing used limma for per-cluster contrasts (Ritchie et al., 2015), and DMP-to-gene linking was done through Illumina 450K/EPIC probe annotation. Collectively, these statistical procedures supported a unified pipeline to profile the TME and infer the most appropriate cancer models.
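The Benjamini–Hochberg adjustment referred to in this section can be illustrated with a minimal sketch. The authors report working in R, so the Python below is only an illustration of the procedure, not their implementation; the function names `benjamini_hochberg` and `significant` are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests, sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone (a step-up procedure); values are capped at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


def significant(p_values, fdr=0.05):
    """Boolean mask of tests whose adjusted p-value is below the FDR level."""
    return [q < fdr for q in benjamini_hochberg(p_values)]
```

Applied to a vector of probe-wise or group-wise p-values, `significant(p, fdr=0.05)` flags the tests that pass the FDR < 0.05 threshold stated above.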
Methods

APPLICATION TO THE GSE279982 METHYLOME

The framework was designed as a tumour-aware, feature-restricted integrative workflow that links patient epigenetic heterogeneity to rational selection of experimental models. Rather than treating quality control, clustering, and model mapping as independent analytical steps, the approach integrates these components into a single inference process in which tumour state discovery, biological interpretation, and model prioritisation are jointly optimised. Application to the GSE279982 DNA methylation dataset proceeded through a series of sequential stages: (i) data harmonisation and quality control (see Supplementary), (ii) biologically constrained feature restriction, (iii) unsupervised discovery of methylation-defined tumour states, (iv) functional characterisation of cluster-specific regulatory programmes, (v) tumour–model concordance analysis within restricted feature spaces, and (vi) clinical and virological contextualisation of inferred tumour states (Fig. 2). Together, these steps enable systematic reconstruction of tumour microenvironment–associated epigenetic states and provide a coherent framework for evaluating how well experimental systems capture the molecular diversity observed in patient tumours (see Supplementary Framework_R, Tutorials 1–9).

1. Data Sources and Harmonisation

Genome-wide DNA methylation profiles were obtained from GSE279982, generated using the Illumina Infinium MethylationEPIC BeadChip platform and accompanied by curated clinical metadata, including HIV status, HPV genotype, age, BMI, and cancer stage. Reference DNA methylation profiles for cervical cancer cell lines were obtained from GSE68379 to enable quantitative comparison between patient tumours and experimentally tractable in vitro systems. Processed β-value matrices provided by the original studies were used to maintain consistency with validated normalisation pipelines and to enable direct cross-dataset integration.
Sample identifiers were harmonised across molecular and clinical metadata, and only samples with concordant annotations were retained for downstream analyses (see Supplementary Framework_R, Tutorial 1).

2. Methylation Quality Control and Probe Filtering

To ensure that downstream tumour state discovery reflected biological signal rather than technical artefact, stringent DNA methylation–specific quality control was applied. Sample-level QC assessed global β-value distributions, variance structure, and signal consistency across arrays. No samples exhibited aberrant hybridisation profiles or extreme variance, supporting retention of the full cohort. Probe-level filtering removed CpG sites mapping to sex chromosomes, probes overlapping known single-nucleotide polymorphisms (SNPs) at the CpG or single-base extension site, cross-hybridising probes, and probes failing detection thresholds across samples. This filtering step yielded a high-confidence CpG set shared across patient tumours and cervical cancer cell line platforms, ensuring technical compatibility for downstream concordance and comparative analyses (see Supplementary Framework_R, Tutorial 2).

3. Gene-Level Aggregation and Feature Harmonisation

To improve biological interpretability and facilitate cross-system comparison, CpG-level methylation values were aggregated to the gene promoter level. CpGs annotated to promoter-proximal regions (TSS200 and TSS1500) were summarised by mean β-values per gene per sample. This representation reduces stochastic CpG-level noise while preserving regulatory signal relevant to transcriptional control. Only genes represented in both the patient and reference datasets were retained, generating a harmonised gene-by-sample methylation matrix used throughout subsequent analyses (see Supplementary Framework_R, Tutorials 2 and 3).

4.
Biologically Constrained Feature Restriction

A central design principle of the framework is that tumour heterogeneity is encoded across distinct biological axes and must be interrogated within appropriately restricted feature spaces. Accordingly, feature restriction was applied prior to clustering and comparative mapping rather than post hoc. Three complementary feature sets were constructed: (i) differentially methylated CpGs (DMPs) capturing disease- and endotype-specific epigenetic disruption, (ii) driver-associated CpGs reflecting tumour-intrinsic oncogenic programmes, and (iii) tumour microenvironment (TME)-associated CpGs representing immune and stromal context. Differential methylation analyses were performed using the limma framework, with moderated statistics and false discovery rate control. Features were ranked by effect size and statistical strength, yielding compact, biologically enriched feature spaces that maximised signal-to-noise and interpretability (see Supplementary Framework_R, Tutorials 3–6).

5. Unsupervised Discovery of Methylation-Defined Tumour States

To identify latent tumour structure without imposing clinical labels, unsupervised clustering was performed within TME-restricted methylation space using Rphenograph, a graph-based community detection algorithm well suited to high-dimensional molecular data. This analysis identified four robust tumour clusters (C1–C4), each representing a distinct epigenetic and microenvironmental state. Cluster robustness was assessed through inspection of intra-cluster coherence and stability across subsampling. Low-dimensional embedding using UMAP and t-SNE was employed to visualise tumour relationships and confirm that clusters represented discrete biological communities rather than technical artefacts or continuous gradients (see Supplementary Framework_R, Tutorial 7).

6.
Functional Characterisation of Tumour Endotypes

To assign biological meaning to methylation-defined tumour states, cluster-specific functional enrichment analysis was performed. Differentially methylated genes for each cluster were ranked and analysed using FGSEA against MSigDB Hallmark, GO Biological Process, and KEGG gene sets. Normalised enrichment scores (NES) were used to identify coherent biological programmes, including immune activation, myeloid differentiation, cell-cycle regulation, metabolic reprogramming, and stress-response pathways. These functional signatures provided mechanistic context for tumour stratification and served as an additional biological layer for model mapping (see Supplementary Framework_R, Tutorial 9).

7. Tumour–Model Concordance within Restricted Feature Spaces

To evaluate model fidelity, tumour–model similarity was quantified using Spearman rank correlation computed within restricted feature spaces. Concordance analyses were performed independently within differentially methylated position (DMP)–restricted, driver-restricted, and TME-restricted methylation matrices, as well as across integrated feature combinations. This strategy enabled systematic dissection of tumour-intrinsic versus contextual contributions to model similarity. Correlation-based similarity was selected for its robustness to cross-platform variability and non-normal data distributions. Concordance profiles were resolved at the individual patient level and further stratified by tumour cluster, ensuring that experimental model prioritisation reflected biologically defined tumour states rather than cohort-averaged signals (see Supplementary Framework_R, Tutorials 4–6).

8. Integrated Feature Modelling and Consensus Model Inference

To formalise experimental model selection across feature spaces, similarity scores derived from DMP-restricted, driver-restricted, and TME-restricted analyses were integrated into a unified predictive framework.
Patient-level top-ranked models from each feature space were combined using a consensus-label strategy, prioritising agreement across independent biological axes. A random forest classifier was trained to predict consensus model labels using feature-restricted similarity scores as predictors. Model performance was evaluated using repeated cross-validation, confusion matrices, variable importance analysis, and prediction probability distributions. This approach quantified the relative contribution of epigenetic disruption, oncogenic identity, and microenvironmental context to overall model fidelity and provided an objective basis for prioritising experimental systems that most closely recapitulate patient tumour states (see Supplementary Framework_R, Tutorials 4–6).

9. Clinical and Virological Contextualisation

Finally, methylation-defined tumour states and biomodel concordance results were evaluated for association with clinical and virological variables, including HIV status, HPV genotype, age, BMI, and cancer stage. Enrichment patterns were visualised on UMAP embeddings and cluster-resolved plots, and group-level differences were assessed using non-parametric statistical tests. This analysis established that the inferred tumour states capture clinically relevant heterogeneity while remaining independent of conventional staging alone (see Supplementary Framework_R, Tutorial 8).

DEVELOPMENT USING TCGA-CESC TRANSCRIPTOMES

The framework was initially developed and calibrated using transcriptomic data from the TCGA-CESC cohort to establish a biologically grounded and technically robust basis for tumour–model inference prior to its application to epigenetic data. TCGA-CESC was selected as the development dataset because it represents the most comprehensively annotated reference cohort for cervical cancer, providing high-quality RNA-sequencing profiles alongside detailed clinical, histological, and virological (HPV status) metadata.
In this phase, TCGA-CESC transcriptomes were used to define and validate core analytical components, including feature-restricted similarity estimation, tumour microenvironment (TME)–aware stratification, and patient–in vitro model concordance logic. By leveraging both global gene expression patterns and biologically informed gene sets, such as cancer-driver genes and TME-associated gene signatures, the framework was tuned to distinguish tumour-intrinsic transcriptional programmes from microenvironment-driven variation that is incompletely represented in vitro. The strong HPV dependence, well-characterised epithelial–mesenchymal gradients, and pronounced immune heterogeneity present in TCGA-CESC enabled systematic evaluation of the framework’s ability to recover known biological structure without explicit supervision. This transcriptome-based development phase established the conceptual and computational foundations of the analytical approach, ensuring that subsequent extension to DNA methylation data preserved biological interpretability, model discrimination, and translational relevance across molecular layers (Fig. 2).

Figure 2. Study design and development framework. Schematic overview of the analytical workflow illustrating data sources, core analytical modules, and the integration logic underpinning tumour–model inference. DNA methylation profiles from GSE279982 (patient tumours) and GSE68379 (cervical cancer cell lines) were harmonised, quality controlled, probe-filtered, and promoter-aggregated to generate biologically interpretable methylation matrices. Feature restriction was then applied to construct three complementary tumour representations: (i) tumour microenvironment (TME)–restricted CpGs, (ii) differentially methylated positions (DMPs), and (iii) cancer driver–associated CpGs.
TME-restricted features were used for unsupervised clustering via Rphenograph to define methylation-based tumour states, together with dimensionality reduction and pathway enrichment analyses to assign biological meaning to each endotype. In parallel, tumour–model similarity scores were computed independently within each restricted feature space to evaluate the fidelity of in vitro systems to patient tumours. Outputs from the DMP-, driver-, and TME-restricted concordance analyses were integrated within a consensus inference framework, calibrated using TCGA-CESC transcriptomic data, and formalised through a random forest classifier to derive robust prioritisation of experimental models. Final tumour states and model assignments were subsequently contextualised with clinical and virological variables to assess relevance to HIV status, HPV genotype, and disease characteristics, forming the complete development and application workflow.

Classifier architecture

A multi-feature, weakly supervised classification architecture was implemented to infer optimal experimental models by integrating complementary epigenetic signals (Fig. 3). The classifier operates on three biologically constrained DNA methylation feature spaces: (i) tumour microenvironment (TME)–restricted CpGs, (ii) cancer driver gene–restricted CpGs, and (iii) intra-cluster HIV-associated differentially methylated positions (DMPs). Each feature space is analysed independently to compute tumour–model similarity using Spearman rank correlation, ensuring sensitivity to monotonic epigenetic concordance while remaining robust to outliers and non-normal data distributions. To enable joint learning across heterogeneous feature spaces, within-patient normalisation of similarity scores is applied, transforming absolute correlations into patient-specific relative enrichment measures (Z-scores) (Andrade et al., 2021).
This harmonisation preserves the internal ranking of model preferences for each patient while removing scale-dependent biases between feature domains. For each patient, top-ranked experimental models are identified independently within each feature space, yielding multiple, potentially discordant model assignments that reflect distinct biological axes of tumour heterogeneity. In the absence of gold-standard labels for model fidelity, a weakly supervised consensus labelling strategy is adopted. Feature-specific top predictions are reconciled into a single consensus model label per patient, prioritising agreement across feature spaces and resolving conflicts using biologically motivated precedence rules. These consensus labels serve as training targets while retaining patient-specific molecular context. A Random Forest classifier is trained using the harmonised similarity features from all three feature spaces to predict the consensus model label. Random Forests were selected for their capacity to capture non-linear feature interactions, tolerate correlated predictors, and support robust rank-based decision boundaries. Model training is performed using repeated stratified cross-validation to minimise overfitting and ensure generalisability. The trained classifier outputs probabilistic model assignments for each patient, enabling interpretable and confidence-aware prioritisation of experimental systems. Feature importance analysis is used to quantify the relative contribution of TME, driver-associated, and HIV-linked epigenetic signals, ensuring that predictive performance reflects biologically meaningful integration rather than dominance by a single feature class (Fig. 3). See Supplementary Framework_R, Tutorials 4–6 for the core logic of the classifier and Tutorial 9 for the consensus and integration logic.
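The within-patient normalisation and consensus labelling described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the text does not state the precedence order used to resolve conflicts, so the default order below (TME, then driver, then DMP) is an assumption, as are the function names and the use of the population standard deviation.

```python
from collections import Counter
from statistics import mean, pstdev


def zscore_within_patient(similarities):
    """Z-score one patient's {model: correlation} mapping across models.

    Assumption: population standard deviation; a sample SD would only
    rescale the scores and would not change the ranking of models.
    """
    values = list(similarities.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        # All models equally similar: no relative preference.
        return {model: 0.0 for model in similarities}
    return {model: (r - mu) / sd for model, r in similarities.items()}


def top_model(scores):
    """Name of the model with the highest score."""
    return max(scores, key=scores.get)


def consensus_label(top_by_space, precedence=("TME", "driver", "DMP")):
    """Reconcile per-feature-space top models into one label.

    Agreement between at least two feature spaces wins; otherwise the
    hypothetical precedence order decides.
    """
    counts = Counter(top_by_space.values())
    label, votes = counts.most_common(1)[0]
    if votes >= 2:
        return label
    for space in precedence:
        if space in top_by_space:
            return top_by_space[space]
    raise ValueError("no feature space supplied a top-ranked model")
```

Because Z-scoring is a monotone transformation within a patient, `top_model` returns the same model before and after normalisation; the normalised scores matter only when similarities from different feature spaces are combined as classifier inputs.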
Results

OPEN-SOURCE DATA RETRIEVAL AND COHORT HARMONISATION

Publicly available DNA methylation datasets were programmatically retrieved and harmonised to establish an analysis-ready foundation for the study. The primary patient cohort (GSE279982) comprised genome-scale Illumina EPIC methylation profiles from cervical tumours collected from HIV-positive and HIV-negative women, accompanied by extensive clinical annotations including HIV status, HPV genotype, tumour stage, age, BMI, and tumour site. A complementary reference cohort (GSE68379) provided DNA methylation profiles for a curated panel of cervix-derived cancer cell lines. Systematic inspection confirmed high metadata completeness in both cohorts, with minimal missingness and no duplicated or unmapped sample identifiers. Processed patient β-value matrices and reconstructed cell line β-values from raw IDAT files exhibited biologically valid methylation ranges. Explicit alignment of assay matrices with clinical and experimental metadata ensured exact correspondence between samples and annotations. More than 440,000 CpG probes were shared between patient tumours and cell line datasets, establishing a substantial common feature space for integrative and comparative analyses. Together, these results demonstrate successful retrieval, curation, and harmonisation of patient and reference methylation datasets, providing a robust and reproducible input for downstream tumour–model concordance analysis, epigenetic state inference, and translational investigations.

DATA PREPROCESSING, NORMALISATION, AND QUALITY CONTROL

Rigorous preprocessing and quality control were applied to generate harmonised, high-confidence DNA methylation profiles suitable for integrative tumour–model analyses. Sample-level quality assessment using raw IDAT data from cervical cancer cell lines demonstrated uniformly high signal quality with no evidence of failed arrays, supporting retention of all model samples (Fig. 4).
Probe-level filtering removed low-confidence and biologically confounding CpGs, including probes failing detection thresholds, cross-reactive probes, probes overlapping common SNPs, sex chromosome probes, and non-autosomal loci, substantially reducing technical noise. Background and dye-bias correction using Noob normalisation yielded stable and biologically valid β-value distributions across all cancer models. These probe filters were then consistently applied to the patient tumour cohort, ensuring both tumours and models were represented within an identical CpG universe. After harmonisation, 430,885 high-confidence CpG sites were shared between patient tumours and cervical cancer models, with perfectly aligned assay matrices and metadata. This preprocessing framework effectively minimised technical artefacts while preserving biologically relevant variation, providing a robust and unbiased foundation for downstream clustering, differential methylation analysis, tumour–model matching, and translational inference.

FEATURE-RESTRICTED DIFFERENTIAL EXPRESSION ANALYSIS (DEA)

DEA-restricted Feature Space

1. Differentially Methylated Probes

Differential methylation analysis using limma revealed widespread and statistically robust HIV-associated epigenetic alterations across the cervical cancer methylome. Using QC-filtered, normalised, and harmonised methylation profiles, β-values were transformed to M-values to enable linear modelling, and probe-wise models were fitted to compare HIV-positive versus HIV-negative tumours with empirical Bayes moderation. This analysis identified 95,378 significantly differentially methylated positions (DMPs) after false discovery rate correction (FDR < 0.05) and effect-size filtering, indicating extensive HIV-linked epigenetic reprogramming in cervical cancer.
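The β-to-M transformation mentioned above is conventionally the logit on a base-2 scale, M = log2(β / (1 − β)). A minimal sketch follows; the clipping offset `eps` is an assumption introduced here to avoid infinite values at β = 0 or β = 1, and the text does not say whether or how the authors handled those boundaries.

```python
from math import log2


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta-value in [0, 1] to an M-value."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    # Clip away from the boundaries so the logarithm stays finite
    # (eps is an illustrative choice, not taken from the paper).
    clipped = min(max(beta, eps), 1.0 - eps)
    return log2(clipped / (1.0 - clipped))


def m_to_beta(m):
    """Inverse transformation, mapping an M-value back to a beta-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

M-values are unbounded and closer to homoscedastic than β-values, which is the usual motivation for fitting linear models on the M scale and reporting effect sizes back on the β scale.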
Both hypermethylated and hypomethylated CpGs were detected, with large effect sizes and strong statistical support, reflecting coordinated regulatory changes rather than stochastic variation (Fig. 5A). Annotation of significant DMPs demonstrated enrichment across promoters, gene bodies, CpG islands, shores, and open-sea regions, linking HIV status to broad regulatory and genomic contexts relevant to tumour biology. These DMPs constitute a high-confidence, disease-relevant epigenetic signature that directly informed downstream tumour stratification, tumour–model concordance analysis, and biomarker discovery. Inspection of the top 500 HIV-associated DMPs further confirmed that the restricted feature space captured structured and biologically coherent epigenetic variation across patients. Scaled β-value heatmaps revealed coordinated methylation patterns with clear separation by HIV status, alongside additional stratification by HPV group and tumour stage (Fig. 5B). Notably, these CpGs organised into coherent methylation blocks rather than exhibiting stochastic variation, supporting their relevance as markers of biologically distinct tumour states.

2. Tumour-Model Mapping

Restricting tumour–model similarity analysis to differentially methylated probes (DMPs) associated with HIV status and cervical cancer biology substantially reshaped the structure, dynamic range, and interpretability of tumour–model concordance. Following harmonisation, 430,885 CpGs were shared between patient tumours and reference cell line datasets; restriction to the disease-associated DMP set yielded a focused feature space of 95,378 CpGs. Similarity calculations within this restricted space produced a markedly expanded correlation range and enhanced contrast between high- and low-concordance models relative to genome-wide methylation similarity, indicating effective suppression of background and non-informative methylation signal.
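Feature-restricted similarity of the kind described above amounts to computing a Spearman correlation over only those CpGs that belong to the chosen feature set and are present in both profiles. The sketch below is a standard-library illustration under that reading; the function names and the minimum of three shared CpGs are assumptions, and any probe identifiers passed in are placeholders rather than probes from the study.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def restricted_similarity(tumour, model, feature_set):
    """Spearman correlation over CpGs in feature_set shared by both profiles.

    tumour and model map CpG identifiers to beta-values.
    """
    shared = sorted(c for c in feature_set if c in tumour and c in model)
    if len(shared) < 3:
        raise ValueError("too few shared CpGs for a meaningful correlation")
    return spearman([tumour[c] for c in shared], [model[c] for c in shared])
```

Calling `restricted_similarity` once per tumour and cell line, with the DMP, driver, or TME CpG set as `feature_set`, yields the per-patient similarity profiles that the restricted analyses compare against genome-wide correlation.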
DEA-restricted Spearman similarity profiles revealed pronounced inter-patient heterogeneity in model alignment, with correlation values varying systematically across tumours rather than remaining uniform (Fig. 6). Individual cell lines exhibited distinct trajectory shapes across the patient cohort, with some models showing consistently higher concordance across subsets of tumours, while others displayed lower or more variable alignment. This pattern indicates that experimental model suitability is not global but tumour-state dependent when similarity is evaluated within a disease-relevant epigenetic space. Annotation of these similarity profiles with clinical metadata revealed structured stratification by HIV status, HPV genotype, and cancer stage. Tumours stratified by HIV status demonstrated distinct correlation distributions, consistent with HIV-associated epigenetic programmes exerting a measurable influence on tumour–model similarity when analysis is constrained to biologically relevant CpGs. Additional stratification by HPV group and tumour stage further refined these profiles, uncovering genotype- and stage-dependent shifts in model concordance that were not apparent under genome-wide correlation. These structured patterns contrast sharply with the comparatively compressed and homogeneous similarity observed in global methylation analyses. Visual integration of DEA-restricted similarity profiles with heatmap-based representations of HIV-associated methylation structure (top DMPs; Fig. 5B) demonstrated that tumours sharing coherent epigenetic states also exhibited recurrent alignment with specific experimental models. Notably, subsets of cell lines consistently tracked together across patients, suggesting shared representation of underlying tumour states rather than averaged or nonspecific similarity.
This convergence supports a state-aware tumour–model matching paradigm, enabling rational prioritisation of experimental systems tailored to defined tumour contexts rather than one-size-fits-all selection.

3. ML-Based Tumour–Model Mapping Performance Evaluation

Machine-learning–based prioritisation of tumour–model similarity, built on DEA-restricted methylation concordance, enabled objective identification of high-confidence biomodels for each patient beyond naïve correlation ranking. Evaluation of the trained Random Forest classifier demonstrated excellent discriminative ability, highly consistent prioritisation behaviour, and biologically meaningful integration of absolute and relative similarity features. The Precision@K profile showed that the algorithm performs extremely well at the clinically relevant end of the ranking spectrum, with precision remaining at or near 1.0 for the top one to three predicted models before declining gradually thereafter (Fig. 7B). This structured decay indicates a confident ranking system that sharply distinguishes optimal from suboptimal biomodels rather than exhibiting diffused or random prioritisation. The ROC curve further supported this behaviour, demonstrating near-perfect separation between top-ranked and non-top-ranked biomodels across patients, with an AUC of 1.0 (Fig. 7A). Importantly, this evaluation treats prioritisation as a cross-patient classification problem; maintaining such high discrimination performance across heterogeneous tumours demonstrates strong generalisability of the learned similarity function and confirms that DEA-restricted similarity features provide sufficient signal for robust classification even under weak supervision. To assess translational robustness, we examined whether prioritised models were consistently selected across patients rather than arising from stochastic or unstable ranking.
Stability analysis of the Top-3 predictions revealed recurrent enrichment of specific biomodels, indicating convergence toward a small subset of highly representative cervical cancer models (Fig. 7C). Cell lines such as MS751, SKG-IIIA, and TC-YIK were repeatedly prioritised, suggesting that these models capture stable biological programmes recurrently observed across DEA-restricted tumour methylomes. This pattern supports the biological grounding of the framework, demonstrating systematic preference for reproducible tumour-relevant models rather than patient-specific noise. Finally, regression analysis formally tested whether relative DEA similarity contributes meaningful information beyond absolute correlation. A positive association between absolute DEA correlation and patient-normalised DEA-Z scores was observed (Fig. 7D), indicating that while relative performance within a patient correlates with absolute signal strength, it does not collapse into it. An R² of approximately 0.21 indicates that relative similarity explains about one fifth of the concordance variance while preserving independent information. This confirms the conceptual motivation of the framework: models with comparable absolute methylation concordance may differ markedly in their relative enrichment within a patient-specific context, and capturing this distinction is essential for biologically rational and clinically meaningful prioritisation.

Driver Gene-Restricted Feature Space

Driver genes directly shape tumour biology through effects on proliferation, DNA damage response, and oncogenic signalling; consequently, methylation changes at driver loci are more likely to reflect tumour lineage, oncogenic programmes, and actionable biology than genome-wide background methylation. Restricting similarity analysis to driver-linked CpGs therefore increases biological interpretability and enhances signal relevant for biomodel selection.
Restriction to this driver-associated feature space yielded a compact yet highly informative molecular representation that captured structured and biologically meaningful variation across patients. The heatmap of top driver-associated CpGs revealed coherent blocks of hyper- and hypomethylation with clear patient stratification aligned to HIV status, HPV genotype, and tumour stage, indicating that driver methylation states encode both oncogenic and context-specific tumour biology (Fig. 8A). Spearman correlation profiling between patient tumours and candidate biomodels within this driver-restricted space produced consistently elevated similarity values, with recurrent correlation peaks for specific models across the cohort (Fig. 8B), suggesting the presence of conserved driver-programme archetypes shared across subsets of tumours. Machine-learning evaluation further confirmed the robustness of this driver-restricted space for biomodel prioritisation. Precision@K analysis demonstrated extremely high precision among the top-ranked predictions (Fig. 8D), while ROC analysis showed near-perfect discriminative performance between top-ranked and non-top-ranked biomodels (Fig. 8C). Stability analysis of Top-3 selections revealed recurrent prioritisation of a small subset of biomodels, particularly MS751, HELASF, SKG-IIIA, TC-YIK, and OMC-1 (Fig. 8E), indicating convergence toward biologically representative and reproducible model choices rather than stochastic ranking behaviour. Finally, regression analysis demonstrated a strong linear relationship between relative (within-patient) driver enrichment and absolute driver concordance, with patient-normalised driver Z-scores explaining approximately 61% of the variance in absolute correlation values (Fig. 8F).
This indicates that relative driver-associated signal is a dominant determinant of experimental model fidelity in this feature space, capturing biologically meaningful information beyond absolute similarity alone. Collectively, these results demonstrate that driver gene–restricted methylation encodes stable and biologically grounded tumour identity, supports confident and reproducible model prioritisation, and provides a powerful framework for context-aware experimental system selection.

Tumour Microenvironment (TME)-restricted Feature Space

Restriction of the feature space to tumour microenvironment (TME)–associated differentially methylated CpGs further sharpened tumour stratification and biomodel prioritisation by explicitly focusing on epigenetic programmes linked to immune, stromal, and microenvironmental biology. Heatmap visualisation of the top 500 TME-associated CpGs revealed highly structured methylation blocks with clear segregation of patient tumours according to HIV status, HPV group, cancer stage, age, and BMI (Fig. 9A), indicating that TME-linked methylation captures clinically coherent tumour contexts rather than diffuse background signal. These CpGs exhibited coordinated hypo- and hypermethylation patterns across patients, consistent with stable microenvironmental states such as immune-inflamed versus immune-suppressed tumours.

DEA-restricted TME similarity profiles demonstrated pronounced and non-random concordance patterns between patient tumours and cervical cancer biomodels (Fig. 9B). Several cell lines displayed recurrently high Spearman correlations across large subsets of patients, supporting the existence of conserved TME archetypes that are reproducibly represented in vitro. Integration of TME-restricted similarity into the joint predictive framework (DEA + driver + TME) substantially improved model selection performance.
Precision@K analysis showed that precision remained near unity for the top-ranked biomodels and declined smoothly thereafter (Fig. 9D), indicating confident discrimination at clinically relevant ranks. Consistently, ROC analysis demonstrated excellent discriminative performance for the joint model (Fig. 9C; AUC ≈ 0.99), confirming robust separation of top versus non-top biomodels across heterogeneous tumours. Stability analysis further demonstrated convergence on a small subset of highly representative biomodels, most notably MS751, HELASF, and HeLa (Fig. 9E), highlighting their recurrent suitability for modelling TME-driven cervical cancer biology. Finally, regression analysis revealed a strong linear relationship between absolute TME-restricted correlation and relative (patient-normalised) TME enrichment (Fig. 9F; R² ≈ 0.60), demonstrating that relative concordance captures substantial additional signal beyond absolute similarity alone. Collectively, these results establish TME-restricted methylation as a powerful and complementary feature space that enhances biological interpretability, improves tumour–model alignment, and strengthens translational confidence in experimental model selection.

MULTI-FEATURE INTEGRATION AND PREDICTIVE MODELLING

Integrating multiple biologically constrained feature spaces further strengthened tumour–model prioritisation, demonstrating that complementary epigenetic signals jointly enhance predictive performance. When combining DEA-restricted and driver-restricted methylation features (DEA + Driver), the joint model achieved near-perfect discriminative ability, with the ROC curve indicating complete separation of top versus non-top models (AUC = 1.0; Fig. 8C). Precision@K analysis showed exceptionally high accuracy at clinically relevant ranks, with precision remaining close to unity for the top three to four prioritised models before gradually declining (Fig. 8D).
This pattern reflects confident and selective ranking behaviour rather than diffuse or ambiguous assignment. Stability analysis revealed recurrent selection of a small subset of experimental systems—most prominently MS751, HELASF, SKG-IIIA , and TC-YIK ( Fig. 8 E ) —indicating that driver-informed DEA features converge on reproducible tumour archetypes across patients. Consistent with this behaviour, regression of absolute versus relative driver-restricted correlations demonstrated a strong linear relationship (R² ≈ 0.61; Fig. 8 F ) , confirming that patient-normalised relative enrichment captures a dominant and biologically meaningful component of tumour–model concordance beyond absolute similarity alone. Extending this framework to jointly integrate DEA, driver, and tumour microenvironment–restricted features (DEA + Driver + TME) yielded the most balanced and biologically expressive model. The joint classifier maintained excellent discriminative performance (ROC AUC ≈ 0.99; Fig. 9 C ) while preserving high Precision@K across the top-ranked models ( Fig. 9 D ) , indicating robust prioritisation even under increased feature complexity. Importantly, stability analysis again showed convergence on a consistent core of experimental systems, with MS751, HELASF , and HeLa most frequently selected ( Fig. 9 E ) . This convergence mirrors the strong and recurrent tumour–model similarity structure observed in the TME-restricted similarity profiles ( Fig. 9 B ) , suggesting that integration of tumour-intrinsic, driver-linked, and microenvironmental methylation signals identifies models that best recapitulate both intrinsic tumour biology and contextual TME states. The strong association between relative and absolute TME-restricted correlations (R² ≈ 0.60; Fig. 9 F ) further confirmed that relative, within-patient enrichment is a critical determinant of model fidelity when microenvironmental features are considered. 
This is consistent with the structured TME-linked methylation states observed at the tumour level (Fig. 9A).

Finally, integration of driver and tumour microenvironment features alone (Driver + TME) retained high predictive power, with ROC analysis demonstrating excellent separation of top versus non-top models (AUC ≈ 0.99; Fig. 9C) and Precision@K curves showing strong performance at low K values (Fig. 9D). Stability profiling revealed a partially overlapping yet distinct subset of recurrently selected models compared with driver-inclusive frameworks, consistent with preferential selection of systems that capture immune and stromal tumour contexts rather than purely tumour-intrinsic programmes. Regression analysis of absolute versus relative Driver + TME-restricted correlations showed a strong linear association (R² ≈ 0.52), confirming that patient-normalised relative enrichment contributes substantial independent information beyond absolute concordance alone.

Collectively, these results demonstrate that multi-feature integration is synergistic rather than merely additive. DEA captures disease-specific epigenetic disruption, driver features encode oncogenic identity, and TME features reflect contextual tumour states. Their integrated use yields a robust, interpretable, and highly accurate predictive framework for tumour-aware experimental model selection, directly supporting translational and precision-modelling applications in cervical cancer.

UNSUPERVISED CLUSTERING AND LATENT STRUCTURE DISCOVERY

To determine whether tumour microenvironment (TME)–associated methylation patterns encode coherent latent tumour states, we performed unsupervised clustering and low-dimensional embedding using patient-wide similarity profiles derived exclusively from TME-restricted CpGs. This strategy enables discovery of biologically meaningful tumour subgroups without imposing clinical labels or outcome-driven bias.
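As a rough illustration of label-free grouping on similarity profiles, the sketch below builds a k-nearest-neighbour graph, weights edges by the Jaccard overlap of neighbourhoods (the first stage of PhenoGraph-style clustering, which this section later cites via Rphenograph), and then groups samples. It deliberately swaps the Louvain community-detection step for a much simpler rule, connected components over edges above a threshold, so it is a simplified stand-in rather than the method used in the paper. The profiles, `k`, and `min_weight` are hypothetical values chosen for the example.

```python
from math import dist

def knn(points, k):
    """Set of the k nearest neighbours (Euclidean) for each point."""
    neigh = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        neigh.append(set(others[:k]))
    return neigh

def jaccard_edges(neigh):
    """Edge weight = Jaccard overlap of the two closed neighbourhoods."""
    edges = {}
    for i, ni in enumerate(neigh):
        for j in ni:
            a, b = ni | {i}, neigh[j] | {j}
            edges[(min(i, j), max(i, j))] = len(a & b) / len(a | b)
    return edges

def components(n, edges, min_weight):
    """Union-find over edges whose weight is at least min_weight."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), w in edges.items():
        if w >= min_weight:
            parent[find(i)] = find(j)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]

# Hypothetical two-dimensional similarity profiles for eight tumours.
profiles = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
neighbours = knn(profiles, k=2)
clusters = components(len(profiles), jaccard_edges(neighbours), min_weight=0.3)
```

On this toy input the first four and last four profiles fall into two separate groups, because no nearest-neighbour edge crosses between them.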
Both UMAP and t-SNE embeddings demonstrated clear, compact, and well-separated clusters, indicating that TME-linked methylation patterns encode strong, non-random biological signal (Fig. 10A,B). When analysis was restricted to TME-associated CpGs alone, four highly distinct tumour clusters (C1–C4) emerged with minimal overlap (Fig. 10A,B). These clusters exhibited characteristic structural features, including compact communities consistent with relatively homogeneous immune or stromal states and elongated manifolds suggestive of gradual transitions in microenvironmental composition, such as immune infiltration or stromal activation. The close concordance between UMAP and t-SNE embeddings confirms that these structures are robust to dimensionality-reduction method and not driven by projection artefacts.

Importantly, these TME-driven tumour states emerged independently of tumour-intrinsic driver information, demonstrating that microenvironment-associated methylation represents a dominant and orthogonal axis of tumour heterogeneity. This observation is consistent with the strong explanatory power of relative TME-restricted enrichment observed in downstream modelling (R² ≈ 0.60), indicating that TME-linked signal captures substantial variance in tumour identity beyond absolute similarity alone.

Integration of driver and TME features preserved strong clustering while reducing background noise, yielding three coherent tumour communities in the joint Driver + TME model (Fig. 10C). These communities displayed smooth transitional boundaries, consistent with a continuum of tumour states shaped jointly by intrinsic oncogenic programmes and extrinsic microenvironmental context. Extending integration to include disease-associated CpGs (DEA + Driver + TME) retained this structured separation while further stabilising cluster geometry (Fig. 10D), mirroring the high predictive performance and biological grounding observed in the corresponding joint prioritisation models.

Across all embeddings, cluster assignments were concordant with Rphenograph-derived community structure, confirming that the observed tumour states reflect stable latent organisation rather than methodological artefacts. Notably, tumours with similar clinical annotations frequently segregated into different epigenetic states, while clinically distinct tumours often converged within shared TME-driven clusters. This highlights the limitations of conventional clinicopathological stratification and underscores the central role of the tumour microenvironment in defining functional tumour identity.

CLINICAL ENRICHMENT AND PHENOTYPIC STRATIFICATION

To evaluate the clinical relevance of methylation-defined tumour states identified by the integrated TME + driver framework, we examined the distribution of clinical, virological, and host-related variables across the three unsupervised UMAP clusters (C1–C3). Dimensionality reduction of the joint TME + driver methylation profiles revealed a highly structured embedding with three well-separated tumour communities, indicating that integration of tumour-extrinsic microenvironmental signals with driver-associated CpGs captures coherent biological organisation independent of clinical annotation (Fig. 11).

Stratification by HPV genotype demonstrated non-random enrichment patterns across the embedding (Fig. 11A). HPV16-positive tumours clustered tightly within a restricted region of the UMAP space, consistent with a relatively homogeneous epigenetic programme. In contrast, HPV18-positive and other high-risk HPV–associated tumours occupied partially overlapping but spatially distinct regions, while low-risk and unknown HPV types were more diffusely distributed.
These patterns indicate that viral genotype exerts a measurable and persistent influence on the integrated tumour–microenvironment methylation landscape. Host-related factors further stratified the embedding. When coloured by BMI group, tumours segregated along a principal UMAP axis, with overweight and obese cases preferentially localising to one cluster, whereas normal and underweight samples were more prominent in an opposing region ( Fig. 11 B ) . This gradient suggests that host metabolic state is reflected in tumour-associated epigenetic features when microenvironmental and driver-linked signals are jointly considered. Age stratification revealed a similar structured distribution ( Fig. 11 C ) , with younger patients clustering more tightly and older age groups showing increased dispersion, consistent with cumulative epigenetic divergence and greater tumour heterogeneity with advancing age. Stratification by HIV status revealed the most pronounced phenotypic organisation ( Fig. 11 D ) . HIV-positive cervical cancers segregated preferentially into specific regions of the UMAP space, with limited overlap with HIV-negative cases. Notably, HIV-positive tumours exhibited increased spread across the embedding, consistent with heightened epigenetic heterogeneity potentially driven by chronic immune activation and microenvironmental remodelling. The persistence of this separation within the TME + driver model indicates that HIV-associated immune and stromal methylation signatures act as a dominant axis of tumour stratification beyond tumour-intrinsic oncogenic features alone. Overlay of cluster assignments confirmed that clinical variables were not randomly distributed across C1–C3 but instead showed clear enrichment patterns. Individual clusters captured distinct combinations of viral genotype, host immune status, and host-related phenotypes, supporting the biological validity of the inferred tumour states. 
Collectively, these results demonstrate that the integrated TME + driver methylation framework defines clinically meaningful tumour communities that reflect coordinated interactions between oncogenic programmes, viral context, and host microenvironmental factors.

FUNCTIONAL ENRICHMENT ANALYSIS

To determine whether the methylation-defined tumour clusters represent biologically meaningful states rather than computational artefacts, functional enrichment analysis was performed on cluster-specific differentially methylated genes derived from the joint TME–Driver model (R² ≈ 0.52). Genes were ranked according to cluster-specific methylation deviation and interrogated using Gene Ontology (GO) Biological Processes and KEGG pathway enrichment. This framework enabled systematic characterisation of the biological programmes underlying each tumour state and inference of their immunological and microenvironmental context. Across all clusters, enrichment analyses revealed highly structured and biologically coherent signatures, confirming that the identified methylation-based clusters reflect distinct tumour phenotypes rather than stochastic epigenetic variation (Fig. 12A–F).

Cluster C1 was characterised by strong enrichment of biological processes related to immune surveillance, inflammatory signalling, and environmental sensing (Fig. 12A). Prominent GO terms included sensory perception of chemical stimulus, detection of external stimuli, immune and inflammatory response pathways, neuronal fate commitment, and cell adhesion. Consistent with this, KEGG pathway analysis highlighted olfactory transduction, NOD-like receptor signalling, natural killer cell–mediated cytotoxicity, IL-17 signalling, and complement and coagulation cascades (Fig. 12D). Together, these features define a tumour state marked by active immune sensing and innate immune engagement, consistent with an immune-inflamed microenvironment.
The enrichment of cytokine signalling and innate immune pathways suggests enhanced immune surveillance and inflammatory activity, indicating that tumours within this cluster may be particularly responsive to immune-modulating therapeutic strategies. Cluster C2 exhibited a distinct functional profile dominated by pathways involved in neuronal signalling, endocrine regulation, and cellular differentiation ( Fig. 12 B ) . Enriched biological processes included neuron fate commitment, dopaminergic and neurotransmitter signalling, neuroactive ligand–receptor interaction, cAMP signalling, calcium signalling, and hormonal regulation. KEGG analysis further supported this phenotype, revealing enrichment of neuroactive ligand–receptor interaction, endocrine signalling pathways, microRNAs in cancer, and pathways regulating pluripotency ( Fig. 12 E ) . Collectively, these findings indicate a neuroendocrine-like tumour state characterised by epigenetic regulation of intracellular signalling cascades rather than immune activation. This cluster is consistent with an immune-cold or immune-evasive phenotype marked by enhanced receptor-mediated signalling, increased transcriptional plasticity, and reduced immune engagement. Cluster C3 demonstrated strong enrichment for immune and stromal interaction pathways, including defence response to bacteria, cytokine–cytokine receptor interaction, NOD-like receptor signalling, and complement and coagulation cascades ( Fig. 12 C ) . GO terms further highlighted immune response regulation, pattern recognition receptor signalling, inflammatory signalling, and extracellular matrix–associated processes. KEGG enrichment revealed associations with viral infection pathways, including HIV and Kaposi sarcoma–associated herpesvirus, reflecting immune activation within a virally influenced tumour microenvironment ( Fig. 12 F ) . 
This profile is consistent with a tumour state shaped by chronic immune stimulation and inflammatory signalling, reflecting strong tumour–immune–stromal crosstalk. The prominence of viral response and cytokine-mediated pathways aligns with the biological context of HPV-driven cervical cancer and supports a central role for host–pathogen interactions in shaping tumour epigenetic states.

Collectively, these findings demonstrate that methylation-defined tumour clusters represent distinct and biologically interpretable endotypes rather than arbitrary computational groupings. The clusters capture immune-inflamed tumours characterised by active innate immunity (C1), neuroendocrine-like tumours dominated by signalling and differentiation programmes (C2), and immune–stromal reactive tumours enriched for inflammatory and viral-response pathways (C3). Importantly, these patterns were consistent across GO and KEGG analyses, aligned with established tumour–microenvironment biology, and largely independent of conventional clinical classifications. Together, these results validate joint TME–Driver methylation profiling as a robust framework for tumour stratification and provide mechanistic insight into how epigenetic regulation shapes immune engagement, tumour behaviour, and potential therapeutic vulnerability.

INTRA-CLUSTER DIFFERENTIAL EXPRESSION ANALYSIS (DEA)

Intra-cluster differential methylation analysis demonstrated that HIV infection induces endotype-specific epigenetic reprogramming within methylation-defined tumour states rather than a uniform effect across cervical cancer. By performing differential analysis separately within each cluster, this approach isolates HIV-associated methylation changes that are conditional on tumour epigenetic context, thereby avoiding signal dilution caused by cross-cluster heterogeneity.
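The signal-dilution argument can be made concrete with a small sketch. It compares a pooled HIV-positive versus HIV-negative methylation difference at one CpG with the same difference computed separately within each cluster. The mean beta difference used here is a simplified stand-in for the effect sizes and P values of a full differential methylation model, and the sample records, field names, and beta values are hypothetical.

```python
from statistics import mean

def delta_beta(samples, cpg):
    """Mean beta in HIV-positive minus mean beta in HIV-negative samples."""
    pos = [s["beta"][cpg] for s in samples if s["hiv"]]
    neg = [s["beta"][cpg] for s in samples if not s["hiv"]]
    return mean(pos) - mean(neg)

def per_cluster_delta(samples, cpg):
    """Run the same contrast separately inside each cluster."""
    clusters = sorted({s["cluster"] for s in samples})
    return {c: delta_beta([s for s in samples if s["cluster"] == c], cpg)
            for c in clusters}

def sample(cluster, hiv, beta):
    return {"cluster": cluster, "hiv": hiv, "beta": {"cg_example": beta}}

# Toy cohort: the HIV effect has opposite direction in the two clusters.
cohort = [
    sample("C1", True, 0.8), sample("C1", True, 0.8),
    sample("C1", False, 0.4), sample("C1", False, 0.4),
    sample("C3", True, 0.2), sample("C3", True, 0.2),
    sample("C3", False, 0.6), sample("C3", False, 0.6),
]

pooled = delta_beta(cohort, "cg_example")
within = per_cluster_delta(cohort, "cg_example")
```

In this toy cohort the pooled difference is close to zero while the within-cluster differences are about +0.4 in C1 and -0.4 in C3, which is the kind of cancellation that cluster-wise analysis is meant to avoid.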
Within Cluster C1, the volcano plot revealed a robust HIV-associated methylation signature characterised by a substantial number of significant differentially methylated CpGs (DMPs) and moderate-to-large effect sizes (Fig. 13A). The distribution was skewed toward hypermethylation in HIV-positive tumours, with a smaller but distinct hypomethylated component. Several loci exceeded −log₁₀(P) values of 5, indicating reproducible and biologically meaningful epigenetic perturbation. Compared with other clusters, the log fold-change range was moderately constrained, suggesting a regulated yet consistent HIV response within this tumour endotype.

Cluster C2 exhibited a markedly attenuated HIV-associated methylation profile (Fig. 13B). Both the number of significant DMPs and their effect sizes were reduced relative to C1 and C3, with log fold changes largely confined within ±0.3 and lower overall statistical significance. Nevertheless, clear bidirectional methylation changes were observed, indicating that HIV status remains biologically relevant even within this more epigenetically stable tumour state. The balanced presence of hyper- and hypomethylated CpGs suggests targeted modulation of specific regulatory loci rather than widespread epigenomic remodelling, consistent with a tumour endotype that is comparatively resistant to global HIV-driven epigenetic shifts.

In contrast, Cluster C3 displayed the strongest and most extensive HIV-associated methylation reprogramming (Fig. 13C). The volcano plot revealed a dense and asymmetric burden of DMPs, with pronounced enrichment of hypomethylated CpGs in HIV-positive tumours alongside a substantial hypermethylated component. Effect sizes spanned a wide log fold-change range (approximately −0.6 to +0.6), and top loci exceeded −log₁₀(P) values of 8, indicating strong and coordinated epigenetic disruption.
The magnitude and density of these signals suggest that C3 represents a highly HIV-responsive tumour state, potentially reflecting an immune-active or microenvironmentally plastic context in which viral co-infection profoundly reshapes regulatory methylation landscapes.

For comparison, a global differential methylation analysis pooling tumours across all clusters revealed extensive HIV-associated hypermethylation and hypomethylation across the cervical cancer methylome (Fig. 13D). While this global analysis confirms a broad HIV effect, it obscures the pronounced cluster-specific differences in signal strength, directionality, and regulatory scope observed in the intra-cluster analyses. Together, these findings demonstrate that HIV-associated epigenetic reprogramming is strongly modulated by underlying tumour endotype, supporting the existence of cluster-specific intra-TME–Driver sub-states rather than a single, uniform HIV-driven methylation programme.

Functional Enrichment Analysis

Integration of tumour microenvironment (TME)– and driver-restricted methylation features revealed stable and biologically interpretable tumour states that align with cluster-specific functional programmes. UMAP projection of the integrated feature space demonstrated clear separation of tumour groups, with partial but non-random stratification by HIV status and cervical cancer diagnostic category (Fig. 14A). Rather than forming a continuous gradient, tumours segregated into discrete communities, indicating that joint modelling of contextual (TME) and oncogenic (driver) methylation signals captures latent tumour states with defined biological identity. HIV-positive tumours were distributed non-uniformly across these clusters, preferentially occupying specific regions of the embedding, consistent with the endotype-specific methylation and functional signatures described above.
The preservation of cluster structure under multi-feature integration further supports the synergistic design of TOBI, demonstrating that combining tumour-intrinsic and microenvironmental signals enhances biological resolution beyond single-feature analyses. Functional enrichment analysis of cluster-specific differentially methylated genes confirmed that HIV-associated epigenetic reprogramming is profoundly endotype dependent. In Cluster C3 , Gene Ontology (GO) Biological Process enrichment revealed a strong predominance of negative normalised enrichment scores (NES), indicating preferential enrichment in HIV-negative tumours ( Fig. 14 D ) . Dominant pathways included protein catabolic processes, post-translational modification, phosphorylation, apoptotic signalling, and organophosphate and amide metabolic processes. This pattern suggests that C3 tumours in the absence of HIV maintain tightly regulated metabolic and proteostatic networks. The coordinated loss of these programmes in HIV-positive C3 tumours is consistent with extensive epigenetic disruption of core cellular homeostasis, identifying C3 as a highly HIV-responsive and epigenetically plastic endotype. These functional shifts align with the large-amplitude intra-cluster differential methylation observed previously, reinforcing the interpretation of C3 as particularly sensitive to viral perturbation. In contrast, Cluster C1 displayed strong positive NES values for immune-related biological processes, indicating enrichment in HIV-positive tumours ( Fig. 14 B ) . Enriched pathways included myeloid cell differentiation, leukocyte activation, immune effector processes, defence responses to bacteria, and regulation of immune responses. This immune-dominant signature suggests that HIV-positive C1 tumours reside within an activated or inflamed tumour microenvironment, with epigenetic regulation favouring immune cell recruitment and functional differentiation. 
The specificity of immune and myeloid programmes to C1 underscores the endotype dependence of HIV effects and supports the classification of C1 as an immune-reactive tumour state in which HIV amplifies microenvironmental signalling rather than suppressing intrinsic tumour biology.

Cluster C2 exhibited an intermediate and bidirectional functional profile (Fig. 14C). HIV-positive tumours showed enrichment of cell-cycle regulation, DNA metabolic processes, translation, and cell junction organisation, whereas HIV-negative tumours were enriched for small-molecule and monocarboxylic acid metabolic pathways. This pattern indicates selective rebalancing of proliferative and biosynthetic programmes in response to HIV infection, coupled with attenuation of metabolic flexibility. Compared with C1 and C3, C2 represents a transitional endotype in which HIV-associated methylation changes modulate growth and metabolic pathways without inducing widespread immune activation or global metabolic collapse.

Collectively, the concordance between the integrated UMAP structure (Fig. 14A) and the cluster-specific functional enrichment patterns (Fig. 14B–D) confirms that the framework captures biologically meaningful tumour states shaped by the interaction of viral infection, tumour-intrinsic programmes, and the tumour microenvironment. These results validate the core design principle of the study: that biologically constrained feature integration combined with intra-cluster resolution reveals mechanistic tumour endotypes that are not apparent from clinical annotation alone.

Intra-(DEA)-Cluster Tumour–Model Mapping

Restricting tumour–model similarity analysis to C3-specific HIV-associated DMPs produced a compact but highly informative feature space comprising 2,310 CpGs shared across patient tumours and cervical cancer cell lines. This represents a substantial refinement relative to the much larger global DEA feature set used in earlier analyses.
Heatmap visualisation of the top 500 C3 HIV-associated DMPs revealed highly structured and coordinated methylation blocks across tumours, with clear segregation driven by HIV status despite all samples belonging to the same methylation-defined endotype ( Fig. 15 C ) . This indicates that substantial epigenetic heterogeneity persists within C3 and that HIV infection imposes a coherent secondary regulatory layer superimposed on the core cluster identity. Importantly, these methylation patterns aligned with multiple clinical annotations, including HPV genotype, cancer stage, age, and BMI, demonstrating that intra-cluster HIV-associated methylation captures biologically meaningful tumour sub-contexts rather than residual technical or stochastic variation. DEA-restricted Spearman correlation profiling between C3 tumours and cervical cancer cell lines further revealed non-random and highly structured tumour–model concordance patterns ( Fig. 15 D ) . Multi-line similarity plots showed that individual cell lines exhibited recurrent peaks and troughs across patients, rather than flat or overlapping profiles, indicating selective alignment with specific tumour subsets within the same cluster. Several models consistently achieved higher concordance across large fractions of patients, while others displayed pronounced patient-specific variability or uniformly low similarity. These patterns support the existence of HIV-modulated sub-states within the C3 endotype that are differentially represented in vitro. Correlation values spanned weak to moderately strong positive ranges, highlighting that restriction to HIV-associated CpGs amplifies biologically relevant signal while suppressing background similarity driven by shared endotype structure alone ( Fig. 15 A–B ) . 
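Constructing such a restricted feature space amounts to intersecting the cluster-specific DMP list with the CpGs measured in both tumours and cell lines, then optionally keeping the top-ranked entries. The sketch below assumes DMPs are ranked by P value, which the text does not state explicitly; the function name, CpG identifiers, and P values are hypothetical.

```python
def restricted_feature_space(dmp_pvalues, tumour_cpgs, model_cpgs, top_n=None):
    """CpGs that are DMPs and measured in both datasets, most significant first."""
    shared = [c for c in dmp_pvalues if c in tumour_cpgs and c in model_cpgs]
    shared.sort(key=lambda c: dmp_pvalues[c])  # assumption: rank by P value
    return shared if top_n is None else shared[:top_n]

# Toy inputs (hypothetical identifiers and P values).
dmp_pvalues = {"cg01": 1e-8, "cg02": 3e-4, "cg03": 2e-6, "cg04": 5e-3}
tumour_cpgs = {"cg01", "cg02", "cg03", "cg04"}
model_cpgs = {"cg01", "cg03", "cg04", "cg09"}  # cg02 not measured in cell lines

full_space = restricted_feature_space(dmp_pvalues, tumour_cpgs, model_cpgs)
top_two = restricted_feature_space(dmp_pvalues, tumour_cpgs, model_cpgs, top_n=2)
```

The full set would feed the tumour–model correlation step, while the truncated list corresponds to the subset shown in a heatmap.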
Endotype-aware Model Performance

The random forest classifier trained on TME-, driver-, and DMP-based similarity scores achieved high predictive accuracy (≈ 97%) under repeated cross-validation, demonstrating that the integrated feature representation robustly encodes experimental model identity. Performance was stable across folds, which argues against overfitting and supports generalizability. The confusion matrix shows near-perfect classification of consensus labels, with MS751 correctly assigned in the overwhelming majority of cases and only minimal misclassification between HeLa and OMC-1 (Fig. 16A). Importantly, classification errors are sparse and asymmetric, indicating that misassignment occurs primarily among biologically adjacent models rather than at random. This supports the interpretation that the framework is resolving fine-grained biological similarity rather than merely separating trivial or extreme cases.

Feature importance analysis reveals that intra-cluster HIV-associated DMP similarity is the dominant predictor, followed by TME-restricted similarity, while driver-restricted similarity contributes minimally to the consensus setting (Fig. 16B). This is mechanistically consistent with earlier observations showing that HIV-linked epigenetic programmes and microenvironmental state are primary determinants of tumour identity in this cohort, whereas driver-associated methylation is more conserved across experimental models once consensus is enforced.

Predicted class distributions closely mirror the true consensus labels, indicating no systematic inflation or suppression of specific classes (Fig. 16C). The model therefore preserves class balance, confirming that the high accuracy is not driven by class imbalance or majority-class effects. Prediction probabilities are sharply peaked near 1.0 for the assigned class, particularly for MS751, indicating strong model confidence.
Slightly broader, yet still well-separated, probability distributions for HeLa and OMC-1 reflect smaller sample sizes and closer biological proximity rather than classifier uncertainty (Fig. 16D). Together, these probability profiles demonstrate that the framework yields decisive, stable, and biologically interpretable predictions rather than ambiguous or weakly differentiated rankings.

Stability Analysis of Experimental Model Selection Across Feature Spaces

Stability analysis quantified how consistently specific experimental models were prioritised as Top-3 matches across patients under different feature-integration regimes. The DEA + Driver configuration produced the least compact stability profile, with selections distributed across multiple models (MS751, HELASF, SKG-IIIA, TC-YIK, and OMC-1) (Fig. 17D). Although MS751 remained dominant, the broader tail of selected models indicates that tumour-intrinsic signals alone are insufficient to enforce a unique or consistent model match. This highlights the limitation of driver-centric or global differential analyses when divorced from tumour context and microenvironmental state.

Restricting the feature space to driver- and TME-associated CpGs preserved strong stability for MS751 and HeLa, with HELASF remaining prominent but slightly reduced relative to the full model (Fig. 17B). Importantly, the overall model set became more compact, with fewer low-frequency selections. This indicates that oncogenic and microenvironmental signals together are sufficient to define a robust core of representative experimental systems, while infection-linked epigenetic features (DMPs) further sharpen consensus and suppress marginal alternatives.

The joint model excluding intra-cluster DMP refinement still recovered MS751 as the most stable model but exhibited greater dispersion among secondary candidates, including HeLa, HELASF, and SKG-IIIA (Fig. 17C).
Compared with the fully integrated configuration, this setting showed increased ambiguity between biologically related cell lines, demonstrating that intra-cluster DMP stratification plays a critical role in resolving fine-grained tumour identity that is otherwise blurred when broader feature spaces are merged alone.

When all three feature spaces were jointly integrated, a highly stable and sharply ranked hierarchy of experimental models emerged. MS751 was the most consistently selected model across patients, followed by HELASF and HeLa, with a steep drop-off for remaining lines (Fig. 17A). This pattern indicates that full multi-feature integration converges on a small, dominant set of biologically representative systems rather than distributing selections diffusely across many candidates. The dominance of MS751, HELASF, and HeLa reflects their strong concordance with patient tumours across infection-associated epigenetic programmes (DMPs), tumour microenvironment context (TME), and oncogenic background (driver loci), establishing them as the most globally faithful experimental models in this cohort.

Taken together, this stability analysis demonstrates that maximal robustness and biological specificity are achieved only when DMP, TME, and driver features are jointly integrated. Full integration collapses model selection onto a small, reproducible set of experimental systems, led by MS751, with HELASF and HeLa as consistent secondary matches. Progressive removal of contextual or infection-associated features results in increased dispersion and reduced consensus, underscoring that tumour identity is fundamentally multi-axial. These results provide a final validation of the analytical framework, showing that its feature-restricted, intra-cluster, and integrative design yields stable, interpretable, and biologically grounded experimental model prioritisation.
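The Top-3 stability measure used in this analysis can be sketched as a selection frequency: for each patient, candidate models are ranked by similarity, and each model is scored by the fraction of patients for which it appears among the top three. The sketch below is a minimal illustration in Python under that assumption; the function name, the tie-breaking rule (alphabetical), and the toy scores are hypothetical and are not taken from the study.

```python
from collections import Counter


def top_k_stability(scores_by_patient, k=3):
    """scores_by_patient maps patient -> {model: similarity score}.
    Returns {model: fraction of patients with that model in their top k},
    ordered from most to least frequently selected."""
    counts = Counter()
    for scores in scores_by_patient.values():
        # Rank by descending score; break ties alphabetically (an assumption).
        ranked = sorted(scores, key=lambda model: (-scores[model], model))
        counts.update(ranked[:k])
    n_patients = len(scores_by_patient)
    ordered = sorted(counts, key=lambda model: (-counts[model], model))
    return {model: counts[model] / n_patients for model in ordered}


# Toy similarity scores, invented for illustration (NOT data from the study).
toy_scores = {
    "P1": {"MS751": 0.90, "HELASF": 0.80, "HeLa": 0.70, "SKG-IIIA": 0.20},
    "P2": {"MS751": 0.85, "HeLa": 0.80, "HELASF": 0.60, "SKG-IIIA": 0.50},
    "P3": {"MS751": 0.70, "SKG-IIIA": 0.65, "HELASF": 0.60, "HeLa": 0.30},
}

stability = top_k_stability(toy_scores, k=3)
```

Running the same function on score tables produced under different feature configurations (for example, with and without the DMP axis) would give one stability profile per configuration, so that a compact profile (few models with high frequency) can be distinguished from a dispersed one.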
The framework therefore offers a principled approach to tumour-aware model selection that outperforms single-feature or purely tumour-intrinsic strategies and is directly applicable to translational and precision-modelling studies in HIV-associated cervical cancer.

Extension to additional omics layers: transcriptomic foundations from TCGA-CESC

During the initial development of the framework, transcriptomic data from TCGA-CESC served as a critical proof-of-concept for tumour–model concordance across complementary biological dimensions. These analyses established the conceptual and analytical foundations that were later generalised to DNA methylation and multi-feature integration, demonstrating that model fidelity is inherently feature-dependent and context-specific rather than universal.

To systematically assess how well commonly used cervical cancer cell lines represent patient tumours, transcriptomic correlation analyses were performed between TCGA-CESC tumours and GDSC2 cell lines. Gene expression profiles were stratified into cancer-driver, tumour microenvironment (TME), and global transcriptome feature spaces, enabling explicit separation of tumour-intrinsic oncogenic programmes from contextual immune and stromal signals. This decomposition made it possible to interrogate model fidelity along biologically interpretable axes rather than relying on undifferentiated genome-wide similarity.

Across all gynaecological cancer models, cervical cancer–derived cell lines consistently showed the highest concordance with TCGA-CESC tumours, confirming lineage specificity and validating the analytical approach. This observation was biologically expected: TCGA-CESC tumours are predominantly HPV-driven epithelial cancers, and the corresponding cell lines (e.g., HeLa, SiHa, CaSki, MS751, SISO) share HPV-mediated transcriptional regulatory programmes.
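The feature-restricted correlation described above can be sketched as follows: one tumour profile and one cell line profile are compared separately within each gene set, giving one similarity value per feature space. This is a minimal illustration in Python, not the authors' R code. The choice of Spearman correlation, the minimum of three shared genes, the function names, and the toy expression values are all assumptions made for this example; the gene-to-set assignments are illustrative and are not the study's gene lists.

```python
def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def restricted_similarity(tumour, cell_line, gene_sets, min_genes=3):
    """Return {feature space: Spearman rho} using only genes in each set
    that are measured in both profiles."""
    result = {}
    for name, genes in gene_sets.items():
        shared = [g for g in genes if g in tumour and g in cell_line]
        if len(shared) < min_genes:
            result[name] = float("nan")
            continue
        result[name] = spearman([tumour[g] for g in shared],
                                [cell_line[g] for g in shared])
    return result


# Toy expression values, invented for illustration (NOT data from the study).
tumour_toy = {"TP53": 5.0, "PIK3CA": 7.0, "KRAS": 6.0,
              "CD8A": 8.0, "PTPRC": 9.0, "COL1A1": 4.0}
cell_line_toy = {"TP53": 4.5, "PIK3CA": 7.5, "KRAS": 6.1,
                 "CD8A": 0.3, "PTPRC": 0.1, "COL1A1": 0.2}
gene_sets_toy = {
    "driver": ["TP53", "PIK3CA", "KRAS"],
    "tme": ["CD8A", "PTPRC", "COL1A1"],
    "global": list(tumour_toy),
}

similarity = restricted_similarity(tumour_toy, cell_line_toy, gene_sets_toy)
```

In this toy case the driver-restricted similarity is high while the TME-restricted similarity is not, which illustrates why a single genome-wide correlation can hide disagreement between feature spaces.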
The recovery of this expected signal confirms that the correlation framework captures meaningful biological structure rather than reflecting spurious or platform-driven similarity.

Global transcriptomic similarity analyses revealed that tumour composition is a dominant determinant of experimental model representativeness. Tumour purity was strongly positively associated with cell line similarity, whereas epithelial–mesenchymal transition (EMT) scores were inversely correlated. High-purity tumours more closely resembled epithelial cell lines, while EMT-high tumours diverged because of increased stromal and immune gene expression. These findings anticipated later methylation-based observations and established that microenvironmental complexity systematically reduces tumour–cell line concordance.

At the model level, a subset of cell lines, most notably HeLa, CaSki, MS751, and SiHa, displayed consistently high median correlations, indicating preservation of core tumour-intrinsic transcriptional programmes. In contrast, lines such as HT-3, C-4I, and SW756 exhibited broader variance, suggesting selective alignment with specific tumour subtypes rather than general representativeness.

Extending the analysis beyond cervical cancer models demonstrated that TCGA-CESC tumours aligned most strongly with cervix-derived cell lines, whereas ovarian and endometrial models showed only partial similarity. Moderate correlations with selected ovarian and endometrial lines likely reflect shared epithelial differentiation programmes and conserved immune–stromal pathways across gynaecological tissues. However, the sharp decline in concordance outside cervical lineages reinforced a central principle of the study: experimental model suitability is tissue-specific and not transferable without substantial loss of biological fidelity.

Restriction to cancer-driver gene expression revealed a more heterogeneous concordance landscape.
While most cervical cancer lines retained moderate similarity to TCGA tumours, only a subset of HPV-positive squamous-derived models (particularly CAL-39, CaSki, ME-180, DoTc2 4510, and MS751) consistently captured HPV-driven oncogenic transcriptional programmes. The HPV-negative line C-33A reproducibly diverged, reflecting its fundamentally distinct regulatory architecture. This divergence provided early evidence that viral oncogenesis is a dominant axis of tumour–model alignment, a theme later reinforced by methylation-based HIV and HPV analyses.

Analysis of tumour microenvironment–associated gene expression demonstrated that immune and stromal programmes are only partially preserved in vitro. Nevertheless, several HPV-positive models, especially CAL-39, retained strong concordance with TCGA-CESC tumours across TME features. The consistent prominence of CAL-39 among HPV16/18-positive squamous tumours highlighted that certain cell lines preserve microenvironment-linked transcriptional states despite ex vivo propagation, whereas others do not. This observation directly motivated the later explicit incorporation of TME-restricted features into the integrative modelling framework.

Collectively, these transcriptomic analyses established a set of conceptual principles that directly informed the design and formalisation of the analytical framework. First, they demonstrated that feature restriction is essential, as cancer-driver genes, tumour microenvironment (TME) markers, and global expression profiles encode distinct and only partially overlapping biological axes that cannot be meaningfully collapsed into a single similarity metric. Second, the analyses showed that relative, patient-normalised similarity is more informative than absolute concordance, particularly for contextual features such as immune and stromal programmes, where inter-patient heterogeneity dominates signal structure.
Third, they established that experimental model suitability is endotype-dependent rather than universal, with different tumour subgroups preferentially aligning with distinct in vitro systems depending on viral status, histology, and microenvironmental state. Finally, the strong stratification by HPV genotype, epithelial–mesenchymal transition (EMT) state, and tumour purity underscored that viral and microenvironmental contexts are major drivers of cervical cancer heterogeneity, necessitating a multi-feature, integrative analytical strategy.

These principles were subsequently generalised to the epigenetic layer through DNA methylation analyses in HIV-associated Nigerian cervical cancer cohorts. Within the framework, they are operationalised through explicit feature restriction, intra-cluster differential resolution, and cross-layer integration, enabling tumour-aware prioritisation of experimental models that reflect both intrinsic oncogenic programmes and context-dependent regulatory states. By anchoring the approach in transcriptomic foundations and extending it across the methylome, the framework achieves biological continuity while remaining methodologically scalable across molecular profiling modalities.

Discussion

We applied a tumour-aware epigenomic pipeline to identify distinct DNA methylation endotypes in cervical cancer, shaped by tumour microenvironment (TME) context and underlying genomic programmes. Unsupervised clustering of patient methylomes enriched for TME-associated CpGs delineated endotypes with markedly divergent immune and stromal infiltration patterns and corresponding clinical outcomes. Notably, one endotype exhibited an immune-hot phenotype, characterized by elevated CD8⁺ T-cell and M1 macrophage infiltration, and was associated with improved prognosis. In contrast, another endotype displayed an immune-cold, stroma-rich profile linked to poorer survival.
These observations are concordant with independent studies demonstrating that cervical cancer subgroups with enhanced immune signatures have superior clinical outcomes (Liu et al., 2022; Zhu et al., 2022). For example, Wang et al. identified an HPV-positive subgroup enriched for T-cell infiltration and pro-inflammatory cytokine signaling that exhibited improved disease-free survival relative to immune-low counterparts. Similarly, Zhao et al. defined four DNA methylation–based cervical cancer subtypes with distinct immune landscapes, including a subtype marked by high effector and memory T-cell infiltration and another characterized by minimal immune activation (Zhao et al., 2025). Collectively, these findings indicate that TME-informed methylation states capture a central axis of cervical cancer biology, stratifying patients into biologically and clinically meaningful endotypes that align with established prognostic immune signatures. To dissect heterogeneous regulatory influences, we performed feature-restricted analyses focusing separately on (i) TME-related CpGs, (ii) driver gene–associated CpGs, and (iii) differentially methylated positions (DMPs) between tumours and experimental models. This strategy captures orthogonal axes of cancer regulation. The TME axis reflects epigenetic marks associated with immune and stromal infiltration, the driver axis targets CpGs linked to genes implicated in oncogenic pathways (including established tumour suppressors and oncogenes), and the DMP axis captures global epigenetic dysregulation independent of specific functional annotation. Collectively, this framework enables the disentanglement of cell-extrinsic microenvironmental influences from cell-intrinsic tumour programmes and broad methylome perturbations. 
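One way the three restricted axes could be combined without letting any single axis dominate is to standardise similarity scores within each patient and axis before concatenating them into a feature vector per candidate model. The sketch below is a minimal Python illustration of that idea under stated assumptions; the z-score normalisation, the function names, and the toy scores are hypothetical and should not be read as the authors' implementation.

```python
from statistics import mean, pstdev


def zscore_within_patient(scores):
    """scores maps model -> similarity for ONE patient and ONE axis.
    Returns z-scores across the candidate models (all zeros if the scores
    do not vary, so a flat axis contributes no ranking information)."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {model: 0.0 for model in scores}
    return {model: (value - mu) / sigma for model, value in scores.items()}


def build_feature_rows(similarity, axes=("tme", "driver", "dmp")):
    """similarity maps axis -> {model: score} for one patient.
    Returns {model: [z_tme, z_driver, z_dmp]} for models scored on all axes."""
    z = {axis: zscore_within_patient(similarity[axis]) for axis in axes}
    models = sorted(set.intersection(*(set(z[axis]) for axis in axes)))
    return {model: [z[axis][model] for axis in axes] for model in models}


# Toy similarity scores for one patient, invented for illustration
# (NOT data from the study).
similarity_toy = {
    "tme": {"MS751": 0.75, "HeLa": 0.50, "OMC-1": 0.25},
    "driver": {"MS751": 0.50, "HeLa": 0.50, "OMC-1": 0.50},
    "dmp": {"MS751": 0.75, "HeLa": 0.25, "OMC-1": 0.50},
}

rows = build_feature_rows(similarity_toy)
```

In this toy case the driver axis is identical across models and therefore maps to zero for every candidate, while the TME and DMP axes separate the models; rows of this form could then serve as inputs to a downstream classifier.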
Such compartmentalised analysis aligns with the prevailing view that cancer heterogeneity arises from multiple, interacting sources, including genetic drivers and immune context, each imprinting distinct molecular signatures (Yang et al., 2023). Moreover, epigenetic regulation in cancer is inherently dual in nature: tumour cell methylomes acquire aberrations in oncogenic loci while simultaneously shaping immune cell recruitment and immune checkpoint regulation. Consistent with this, Yang et al. reviewed mechanisms by which tumour DNA methylation modulates antitumour immunity. By parallelising analyses across TME-, driver-, and DMP-restricted feature sets, this approach preserves complementary regulatory signals that would otherwise be diluted in a global methylome analysis.

A key finding of this study is that HIV co-infection exerts endotype-specific epigenetic effects. Within each methylation-defined cluster, HIV-positive patients exhibited distinct DNA methylation alterations at both host and viral loci, indicating that HIV introduces an additional layer of regulatory reprogramming in cervical tumours. These findings are consistent with emerging epidemiological and molecular evidence showing that HIV/HPV co-infection is associated with aberrant DNA methylation in genes involved in viral pathogenesis and tumour progression (Zheng et al., 2025). In addition, increased methylation of high-risk HPV genomic regions, particularly the L1 and L2 loci, has been linked to disease progression among HIV-positive women (Gradissimo et al., 2018). Together, these observations support the conclusion that HIV infection reshapes both host and viral epigenomes during cervical neoplastic evolution. In the cohort analysed here, specific TME-defined endotypes appeared particularly sensitive to HIV status.
For example, HIV-positive patients within the immune-rich cluster displayed methylation changes not observed in HIV-negative counterparts, suggesting the presence of endotype-specific viral epigenetic programmes. Such programmes may reflect HIV-driven immunosuppression and viral integration effects that manifest differentially across tumour subtypes. These findings underscore the importance of incorporating HIV and HPV status into TME-aware molecular stratification frameworks, particularly in high-risk settings, and highlight potential biomarkers for viral-associated cervical cancer subgroups.

For translational modelling, experimental model fidelity was evaluated by quantifying multi-feature concordance between patient tumours and candidate systems, including cell lines and patient-derived xenografts (PDXs). Faithful experimental models are essential for cancer research; however, prolonged in vitro culture and model adaptation frequently induce widespread epigenetic and transcriptional drift. Indeed, systematic analyses have demonstrated that many cancer cell lines diverge substantially from their tumours of origin at both the transcriptomic and epigenomic levels (Salvadores et al., 2020). To address this limitation, DNA methylation and transcriptomic profiles were aligned between cervical cancer tumours and experimental models, with adjustment for global shifts, and similarity scores were computed across TME-, driver-, and DMP-restricted feature spaces. This approach parallels recent pan-cancer efforts that quantitatively assess tumour–model concordance, such as classifier-based strategies that identify models failing to recapitulate their annotated cancer types. For example, Kinker et al. integrated thousands of tumours and hundreds of cell lines within a unified transcriptomic framework to detect models with poor lineage fidelity.
Applying a comparable strategy in the epigenomic context, we identified which cervical cancer models most faithfully recapitulate the methylation signatures of each tumour endotype. The concordance analyses revealed that only a subset of available models captured the TME-driven epigenetic profiles observed in patients, underscoring important gaps in model representation for specific tumour contexts. This finding is consistent with prior work showing that many PDXs and cell lines exhibit limited transcriptional fidelity, whereas select organoid and engineered systems more accurately reflect native tumours (Peng et al., 2021). By extending model fidelity assessment into the epigenomic domain and integrating multiple biologically constrained feature spaces, this strategy ensures that selected experimental systems preserve both tumour-intrinsic regulatory programmes and microenvironment-associated epigenetic states characteristic of each tumour endotype. Finally, TME-, driver-, and DMP-derived features were integrated into a machine-learning classifier to recommend the most appropriate experimental model for each patient endotype. A Random Forest ensemble was trained on the combined epigenetic concordance scores, enabling non-linear integration of complementary regulatory signals across feature axes. Tree-based models are well suited to high-dimensional epigenomic data, and prior studies have demonstrated their strong performance in DNA methylation–based classification tasks. In cervical cancer, Random Forest approaches have achieved high accuracy in discriminating tumour-specific methylation patterns (Apoorva et al., 2024). More broadly, methylation-based Random Forest classifiers have been shown to accurately infer tumour tissue of origin across diverse cancer types (Duckett et al., 2025). 
Guided by these precedents, the integrated classifier robustly matched tumours to their most representative experimental systems in cross-validation analyses, substantially outperforming single-axis or naïve selection strategies. This framework provides a principled and systematic approach for matching patient tumours to in vitro and in vivo models, moving beyond ad hoc model selection. Importantly, it explicitly incorporates tumour microenvironmental context into experimental model choice, rather than relying on generic, one-size-fits-all cell line representations. Collectively, these findings advance tumour-aware patient stratification and precision modelling in cervical cancer. By integrating tumour microenvironment context with DNA methylation endotypes, we propose a refined classification framework that captures both immunological and genetic heterogeneity. This stratification has direct clinical relevance, as epigenetic subtypes have been shown to associate with prognosis and therapeutic response. In cervical cancer, methylation-defined subtypes differ in immune checkpoint expression and inferred responsiveness to immunotherapy (Zhao et al., 2025). Consistent with this, the immune-hot endotype identified here may represent patients more likely to benefit from immunomodulatory treatments, whereas immune-cold endotypes may require alternative therapeutic strategies. In addition, endotype-matched experimental models enable more predictive and context-aware preclinical drug screening. For example, integration of DNA methylation and tumour microenvironment features has previously been used to nominate targeted therapeutic agents for high-risk cervical cancer subgroups (Liu et al., 2022). Analogously, the experimental model assignments generated by our framework provide a rational basis for evaluating clinically relevant compounds within the appropriate epigenetic and microenvironmental context. 
Future integration of epigenetic endotyping with drug-sensitivity data may further enable identification of endotype-specific therapeutic vulnerabilities and opportunities for drug repurposing. Overall, this study demonstrates that incorporating tumour microenvironmental signals into epigenomic analyses yields biologically coherent patient clusters and informs principled selection of experimental systems. Such tumour-informed frameworks hold strong promise for improving personalised therapeutic prediction and enhancing the translational relevance of preclinical cancer research.

Conclusions

This study demonstrates that integrating tumour microenvironment–aware epigenomic features into patient stratification yields biologically coherent and clinically relevant cervical cancer endotypes. By explicitly disentangling microenvironment-associated regulatory programmes from tumour-intrinsic oncogenic and global epigenetic alterations, the framework captures key axes of tumour heterogeneity that are obscured in conventional genome-wide analyses. The identification of immune-hot and immune-cold methylation endotypes, together with their differential clinical and virological associations, underscores the central role of the tumour microenvironment in shaping disease progression and therapeutic vulnerability.

Importantly, this work moves beyond descriptive stratification by directly linking patient endotypes to experimentally tractable cancer models. Systematic tumour–model concordance analysis revealed that only a subset of commonly used cervical cancer models faithfully recapitulate patient-specific epigenetic and microenvironmental states, highlighting a major and underappreciated source of irreproducibility in preclinical research. The integration of multi-axis epigenetic features into a machine-learning classifier provides a principled and scalable approach for experimental model recommendation, replacing ad hoc model selection with tumour-informed inference.
Together, these findings establish a robust framework for tumour-aware precision modelling. By embedding microenvironmental context into epigenomic analysis and experimental model selection, this approach enhances the biological validity of preclinical studies and lays the foundation for endotype-matched therapeutic discovery. More broadly, the framework offers a generalisable strategy for improving translational fidelity across cancers, particularly in disease settings characterised by complex tumour–immune–viral interactions.

Declarations

Data Availability

All datasets used in this study are publicly available through internationally recognised genomic repositories that provide version-controlled, reproducible, and openly accessible molecular data. No restrictions or controlled-access permissions were required. All data generated in this study and all scripts used to implement the computational workflow are available from the authors upon request.

TCGA–CESC RNA-seq and Clinical Data

Transcriptomic and clinical data for the TCGA Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (TCGA-CESC) cohort were obtained from the National Cancer Institute’s Genomic Data Commons (GDC) portal (https://portal.gdc.cancer.gov/). The GDC is a high-integrity, rigorously standardised repository that hosts The Cancer Genome Atlas (TCGA) programme, widely regarded as the global reference resource for integrative cancer genomics. All RNA-seq HTSeq counts, FPKM/TPM values, purity estimates, and clinical annotations used in this study correspond to the most recent harmonised GDC release and remain freely downloadable under open-access terms.

Patient Methylation Dataset: GSE279982

DNA methylation data for HIV-positive and HIV-negative Nigerian women were retrieved from the NCBI Gene Expression Omnibus (GEO) under accession GSE279982 (https://www.ncbi.nlm.nih.gov/geo/). GEO is a long-standing, internationally curated archive for functional genomics data and enforces strict metadata standards, file integrity checks, and reproducible versioning. The dataset consists of Illumina Infinium MethylationEPIC BeadChip profiles (IDAT files), HPV genotyping, and accompanying clinical metadata. All files are available without access restrictions and were downloaded from their primary GEO FTP directory.

Reference Cell Line Methylation Data: GSE68379

Methylation profiles for cervical cancer cell lines used in tumour–cell line concordance analyses were downloaded from GEO under accession GSE68379. This dataset provides high-quality 450K methylation profiling for key cervical carcinoma models, including HeLa, SiHa, CaSki, C-33A, HT-3, SISO, and MS751. Raw IDAT files and series matrix files remain publicly accessible through the GEO portal.

Reference Cell Line Transcriptomic and Pharmacogenomic Profiles (GDSC2)

Bulk RNA-seq expression data for cervical cancer cell lines were obtained from the Genomics of Drug Sensitivity in Cancer (GDSC2) platform (https://www.cancerrxgene.org/), a collaborative resource maintained by the Wellcome Sanger Institute and Massachusetts General Hospital Cancer Center. GDSC2 is a globally recognised pharmacogenomic reference database, widely used for benchmarking drug-response prediction models and preclinical oncology research. Processed RNA-seq expression matrices and cell-line metadata are fully open-access and can be downloaded through the GDSC interface or its associated FTP repository.

Cancer Driver Gene Catalogue

The curated list of cancer driver genes used for driver-gene methylation concordance analyses was retrieved from Cell Model Passports (https://cellmodelpassports.sanger.ac.uk/), maintained by the Wellcome Sanger Institute. The resource provides harmonised genomic annotations, validated driver gene sets, and up-to-date gene identifiers mapped across multiple platforms. All driver gene lists used in this study correspond to the 2024-12-12 resource update and are openly available for download.

Pathway and Gene Set Resources

Functional enrichment analyses were conducted using publicly accessible and standardised gene set repositories. The Molecular Signatures Database (MSigDB, v2024.1) Hallmark and KEGG collections were accessed under standard academic use conditions and provided the primary curated pathways for over-representation and enrichment testing. Additional KEGG pathway definitions were retrieved programmatically through the KEGG REST API, ensuring consistent and up-to-date pathway annotations. All gene sets incorporated into FGSEA analyses were obtained directly from these open-access resources without modification, maintaining full reproducibility and transparency of the enrichment framework.

Supplementary Material

Supplementary Framework_R contains a fully annotated tutorial handbook (Tutorials 1–9) providing complete, reproducible implementations of all analytical steps described in this study, including data processing, feature restriction, tumour stratification, tumour–model concordance, and classifier construction.

Code and Reproducibility

All analyses were performed using open-source software in R (v4.3), relying exclusively on publicly accessible datasets. Scripts for data import, preprocessing, correlation computation, and visualisation can be shared upon request and will be released in the accompanying repository for the subsequent chapter. The full analytical workflow can be followed using the supplementary tutorial handbook provided with this study.

Author Contributions

Saltiel Hamese conducted the research and wrote the manuscript with contributions from all co-authors: Dr. Mutsa Takundwa, Prof. Earl Prinsloo, and Dr. Deepak B. Thimiri Govinda Raj. All authors have read and approved the manuscript.

Conflict of Interest

The authors declare no conflict of interest.
Acknowledgements

The author gratefully acknowledges the publicly available multi-omics resources that enabled all analyses in this chapter. TCGA and GEO provided the foundational transcriptomic and methylation datasets, including HPV-stratified and HIV-positive cohorts essential for TME and clustering analyses. The Genomics of Drug Sensitivity in Cancer (GDSC) and Cell Model Passports resources, maintained by the Wellcome Sanger Institute, supplied high-quality cell line molecular profiles and curated cancer driver gene catalogues used for tumour–cell line concordance and driver-level methylation evaluation. Functional annotation and enrichment analyses were supported by the Molecular Signatures Database (MSigDB) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), which offered rigorously curated pathway collections for FGSEA-based interpretation. Special thanks are extended to ChatGPT (OpenAI) for assistance in the development, optimisation, and troubleshooting of R-based workflows applied throughout this study.

Funding Declaration

Saltiel Hamese's doctoral studies are funded by the National Skills Development Fund (NSDF), which is administered by the National Research Foundation (NRF) of South Africa, under the grant reference: PMDS230530111412. The project (Principal Investigator: DBTGR) was funded by the National Research Foundation (NRF) Competitive Grant, ICGEB Early Career Grant, Department of Science, Technology, and Innovation (DSTI) Emerging Research Area (ERA) Funding, SAMRC-AMED Cancer Research funding, and CSIR Strategic Initiative funding. Mutsa Takundwa was funded by the NRF Thuthuka Rating Track.

References

Amini, A. P., Kirkpatrick, J. D., Wang, C. S., Jaeger, A. M., Su, S., Naranjo, S., Zhong, Q., Cabana, C. M., Jacks, T., & Bhatia, S. N. (2022). Multiscale profiling of protease activity in cancer. Nature Communications, 13(1), 1–16. https://doi.org/10.1038/s41467-022-32988-5
Andrade, C. (2021). Z Scores, Standard Scores, and Composite Test Scores Explained. Indian Journal of Psychological Medicine, 43(6), 555. https://doi.org/10.1177/02537176211046525
Apoorva, Handa, V., Batra, S., & Arora, V. (2024). Advancing epigenetic profiling in cervical cancer: machine learning techniques for classifying DNA methylation patterns. 3 Biotech, 14(11), 264. https://doi.org/10.1007/S13205-024-04107-2
Barrett, T., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim, I. F., Tomashevsky, M., Marshall, K. A., Phillippy, K. H., Sherman, P. M., Holko, M., Yefanov, A., Lee, H., Zhang, N., Robertson, C. L., Serova, N., Davis, S., & Soboleva, A. (2013). NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Research, 41(D1), D991–D995. https://doi.org/10.1093/NAR/GKS1193
Brash, J. T., Diez-Pinel, G., Rinaldi, L., Castellan, R. F. P., Fantin, A., & Ruhrberg, C. (2025). Endothelial transcriptomic, epigenomic and proteomic data challenge the proposed role for TSAd in vascular permeability. Angiogenesis, 28(2), 1–21. https://doi.org/10.1007/S10456-025-09971-X
Bueno-Urquiza, L. J., Godínez-Rubí, M., Villegas-Pineda, J. C., Vega-Magaña, A. N., Jave-Suárez, L. F., Puebla-Mora, A. G., Aguirre-Sandoval, G. E., Martínez-Silva, M. G., Ramírez-de-Arellano, A., & Pereira-Suárez, A. L. (2024). Phenotypic Heterogeneity of Cancer Associated Fibroblasts in Cervical Cancer Progression: FAP as a Central Activation Marker. Cells, 13(7), 560. https://doi.org/10.3390/CELLS13070560
Burk, R. D., Chen, Z., Saller, C., Tarvin, K., Carvalho, A. L., Scapulatempo-Neto, C., Silveira, H. C., Fregnani, J. H., Creighton, C. J., Anderson, M. L., Castro, P., Wang, S. S., Yau, C., Benz, C., Gordon Robertson, A., Mungall, K., Lim, L., Bowlby, R., Sadeghi, S., … Mutch, D. (2017). Integrated genomic and molecular characterization of cervical cancer. Nature, 543(7645), 378–384. https://doi.org/10.1038/nature21386
Busarello, E., Biancon, G., Cimignolo, I., Lauria, F., Ibnat, Z., Ramirez, C., Tomè, G., Ciuffreda, M., Bucciarelli, G., Pilli, A., Marino, S. M., Bontempi, V., Ress, F., Aass, K. R., VanOudenhove, J., Tiberi, L., Mione, M. C., Standal, T., Macchi, P., … Tebaldi, T. (2025). Cell Marker Accordion: interpretable single-cell and spatial omics annotation in health and disease. Nature Communications, 16(1), 1–18. https://doi.org/10.1038/s41467-025-60900-4
Chakravarthy, A., Reddin, I., Henderson, S., Dong, C., Kirkwood, N., Jeyakumar, M., Rodriguez, D. R., Martinez, N. G., McDermott, J., Su, X., Egawa, N., Fjeldbo, C. S., Skingen, V. E., Lyng, H., Halle, M. K., Krakstad, C., Soleiman, A., Sprung, S., Lechner, M., … Fenton, T. R. (2022). Integrated analysis of cervical squamous cell carcinoma cohorts from three continents reveals conserved subtypes of prognostic significance. Nature Communications, 13(1), 1–17. https://doi.org/10.1038/s41467-022-33544-x
Chawla, S., Rockstroh, A., Lehman, M., Ratther, E., Jain, A., Anand, A., Gupta, A., Bhattacharya, N., Poonia, S., Rai, P., Das, N., Majumdar, A., Jayadeva, Ahuja, G., Hollier, B. G., Nelson, C. C., & Sengupta, D. (2022). Gene expression based inference of cancer drug sensitivity. Nature Communications, 13(1), 5680. https://doi.org/10.1038/s41467-022-33291-z
Chen, K., Yong, J., Zauner, R., Wally, V., Whitelock, J., Sajinovic, M., Kopecki, Z., Liang, K., Scott, K. F., & Mellick, A. S. (2022). Chondroitin Sulfate Proteoglycan 4 as a Marker for Aggressive Squamous Cell Carcinoma. Cancers, 14(22), 5564. https://doi.org/10.3390/CANCERS14225564
Chen, X., He, H., Xiao, Y., Hasim, A., Yuan, J., Ye, M., Li, X., Hao, Y., & Guo, X. (2021). CXCL10 Produced by HPV-Positive Cervical Cancer Cells Stimulates Exosomal PDL1 Expression by Fibroblasts via CXCR3 and JAK-STAT Pathways. Frontiers in Oncology, 11, 629350. https://doi.org/10.3389/FONC.2021.629350
Conlon, N. T., Kooijman, J. J., van Gerwen, S. J. C., Mulder, W. R., Zaman, G. J. R., Diala, I., Eli, L. D., Lalani, A. S., Crown, J., & Collins, D. M. (2021). Comparative analysis of drug response and gene profiling of HER2-targeted tyrosine kinase inhibitors. British Journal of Cancer, 124(7), 1249–1259. https://doi.org/10.1038/s41416-020-01257-x
Dasgupta, S., Saha, A., Ganguly, N., Bhuniya, A., Dhar, S., Guha, I., Ghosh, T., Sarkar, A., Ghosh, S., Roy, K., Das, T., Banerjee, S., Pal, C., Baral, R., & Bose, A. (2022). NLGP regulates RGS5-TGFβ axis to promote pericyte-dependent vascular normalization during restricted tumor growth. FASEB Journal, 36(5), e22268. https://doi.org/10.1096/FJ.202101093R
De Vos Van Steenwijk, P. J., Ramwadhdoebe, T. H., Goedemans, R., Doorduijn, E. M., Van Ham, J. J., Gorter, A., Van Hall, T., Kuijjer, M. L., Van Poelgeest, M. I. E., Van Der Burg, S. H., & Jordanova, E. S. (2013). Tumor-infiltrating CD14-positive myeloid cells and CD8-positive T-cells prolong survival in patients with cervical carcinoma. International Journal of Cancer, 133(12), 2884–2894. https://doi.org/10.1002/IJC.28309
Desai, P., Takahashi, N., Kumar, R., Nichols, S., Malin, J., Hunt, A., Schultz, C., Cao, Y., Tillo, D., Nousome, D., Chauhan, L., Sciuto, L., Jordan, K., Rajapakse, V., Tandon, M., Lissa, D., Zhang, Y., Kumar, S., Pongor, L., … Thomas, A. (2024). Microenvironment shapes small-cell lung cancer neuroendocrine states and presents therapeutic opportunities. Cell Reports Medicine, 5(6). https://doi.org/10.1016/j.xcrm.2024.101610
Dimitrova, P., Vasileva-Slaveva, M., Shivarov, V., Hasan, I., & Yordanov, A. (2023). Infiltration by Intratumor and Stromal CD8 and CD68 in Cervical Cancer. Medicina, 59(4), 728. https://doi.org/10.3390/MEDICINA59040728
Duckett, D., Vormittag-Nocito, E. R., Jamshidi, P., Sukhanova, M., Parker, S., Brat, D. J., Jennings, L. J., & Santana-Santos, L. (2025). Accurate identification of primary site in tumors of unknown origin (TUO) using DNA methylation. npj Precision Oncology, 9(1), 8. https://doi.org/10.1038/s41698-025-00805-z
Eskra, J. N., Nguyen, E., Golabi, A., Nair, S., Masciotti, A., Fazio, A., Kocak, M., Ronan, M., Rees, M. G., & Roth, J. A. (2023). Abstract PR004: PRISM high-throughput screening of antibody-drug conjugates uncovers clinically relevant targets. Molecular Cancer Therapeutics, 22(12_Supplement), PR004–PR004. https://doi.org/10.1158/1535-7163.TARG-23-PR004
Fashemi, B. E., van Biljon, L., Rodriguez, J., Graham, O., Mullen, M., & Khabele, D. (2023). Ovarian Cancer Patient-Derived Organoid Models for Pre-Clinical Drug Testing. Journal of Visualized Experiments: JoVE, 2023(199). https://doi.org/10.3791/65068
Filippova, M., Filippov, V., Williams, V. M., Zhang, K., Kokoza, A., Bashkirova, S., & Duerksen-Hughes, P. (2014). Cellular Levels of Oxidative Stress Affect the Response of Cervical Cancer Cells to Chemotherapeutic Agents. BioMed Research International, 2014, 574659. https://doi.org/10.1155/2014/574659
Gioanni, J., Grosgeorge, J., Zanghellini, E., Mazeau, C., Gaudray, P., Ettore, F., Formento, P., & Demard, F. (1993). Characterization of CAL39, a new human cell line derived from a vulvar squamous cell carcinoma. International Journal of Oncology, 3(2), 293–297. https://doi.org/10.3892/IJO.3.2.293
Gradissimo, A., Lam, J., Attonito, J. D., Palefsky, J., Massad, L. S., Xie, X., Eltoum, I. E., Rahangdale, L., Fischl, M. A., Anastos, K., Minkoff, H., Xue, X., D’Souza, G., Flowers, L. C., Colie, C., Shrestha, S., Hessol, N. A., Strickler, H. D., & Burk, R. D. (2018). Methylation of high-risk human papillomavirus genomes are associated with cervical precancer in HIV-positive women. Cancer Epidemiology Biomarkers and Prevention, 27(12), 1407–1415. https://doi.org/10.1158/1055-9965.EPI-17-1051
Gu, Z. (2022). Complex heatmap visualization.
IMeta, 1(3), e43. https://doi.org/10.1002/IMT2.43;PAGEGROUP:STRING:PUBLICATION Haynes, W. (2013). Benjamini–Hochberg Method. Encyclopedia of Systems Biology, 78–78. https://doi.org/10.1007/978-1-4419-9863-7_1215 Haynes, W. (2013). Wilcoxon Rank Sum Test. Encyclopedia of Systems Biology, 2354–2355. https://doi.org/10.1007/978-1-4419-9863-7_1185/FIGURES/234 Hewavisenti, R. V., Arena, J., Ahlenstiel, C. L., & Sasson, S. C. (2023). Human papillomavirus in the setting of immunodeficiency: Pathogenesis and the emergence of next-generation therapies to reduce the high associated cancer risk. Frontiers in Immunology, 14, 1112513. https://doi.org/10.3389/FIMMU.2023.1112513 Hiramoto, S., Kato, K., Shoji, H., Okita, N., Takashima, A., Honma, Y., Iwasa, S., Hamaguchi, T., Yamada, Y., Shimada, Y., & Boku, N. (2018). A retrospective analysis of 5-fluorouracil plus cisplatin as first-line chemotherapy in the recent treatment strategy for patients with metastatic or recurrent esophageal squamous cell carcinoma. International Journal of Clinical Oncology, 23(3), 466–472. https://doi.org/10.1007/S10147-018-1239-X Horikawa, N., Baba, T., Matsumura, N., Murakami, R., Abiko, K., Hamanishi, J., Yamaguchi, K., Koshiyama, M., Yoshioka, Y., & Konishi, I. (2015). Genomic profile predicts the efficacy of neoadjuvant chemotherapy for cervical cancer patients. BMC Cancer 2015 15:1, 15(1), 739-. https://doi.org/10.1186/S12885-015-1703-1 Huang, R., & Rofstad, E. K. (2016). Cancer stem cells (CSCs), cervical CSCs and targeted therapies. Oncotarget, 8(21), 35351. https://doi.org/10.18632/ONCOTARGET.10169 Huang, Y., Georges, D., Rumgay, H., Soerjomataram, I., & Clifford, G. M. (2025). Global burden of cancer attributable to HIV: a worldwide incidence analysis. The Lancet Global Health, 13(9), e1525–e1532. https://doi.org/10.1016/S2214-109X(25)00264-5 Iorio, F., Knijnenburg, T. A., Vis, D. J., Bignell, G. R., Menden, M. 
P., Schubert, M., Aben, N., Gonçalves, E., Barthorpe, S., Lightfoot, H., Cokelaer, T., Greninger, P., van Dyk, E., Chang, H., de Silva, H., Heyn, H., Deng, X., Egan, R. K., Liu, Q., … Garnett, M. J. (2016a). A Landscape of Pharmacogenomic Interactions in Cancer. Cell, 166(3), 740–754. https://doi.org/10.1016/j.cell.2016.06.017 Iorio, F., Knijnenburg, T. A., Vis, D. J., Bignell, G. R., Menden, M. P., Schubert, M., Aben, N., Gonçalves, E., Barthorpe, S., Lightfoot, H., Cokelaer, T., Greninger, P., van Dyk, E., Chang, H., de Silva, H., Heyn, H., Deng, X., Egan, R. K., Liu, Q., … Garnett, M. J. (2016b). A Landscape of Pharmacogenomic Interactions in Cancer. Cell, 166(3), 740–754. https://doi.org/10.1016/j.cell.2016.06.017 Javed, S., Sood, S., Rai, B., Bhattacharyya, S., Bagga, R., & Srinivasan, R. (2021). ALDH1 & CD133 in invasive cervical carcinoma & their association with the outcome of chemoradiation therapy. The Indian Journal of Medical Research, 154(2), 367. https://doi.org/10.4103/IJMR.IJMR_709_20 John-Olabode, S. O., Udenze, I. C., Adejimi, A. A., Ajie, O., & Okunade, K. S. (2025). Association between tumour necrosis factor-a polymorphism and cervical cancer in Lagos State, Nigeria. Ecancermedicalscience, 19, 1845. https://doi.org/10.3332/ECANCER.2025.1845 Kamradt, MC, & al. (2000). Inhibition of radiation-induced apoptosis by dexamethasone in cervical carcinoma cell lines depends upon increased HPV E6/E7. https://doi.org/10.1054/bjoc.2000.1114 Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M., & Tanabe, M. (2016). KEGG as a reference resource for gene and protein annotation. Nucleic Acids Research, 44(D1), D457–D462. https://doi.org/10.1093/NAR/GKV1070 Kaur, D., Lee, S. M., Goldberg, D., Spix, N. J., Hinoue, T., Li, H.-T., Dwaraka, V. B., Smith, R., Shen, H., Liang, G., Renke, N., Laird, P. W., & Zhou, W. (2023). Comprehensive evaluation of the Infinium human MethylationEPIC v2 BeadChip. Epigenetics Communications, 3(1). 
https://doi.org/10.1186/S43682-023-00021-5 Keleg, S., Titov, A., Heller, A., Giese, T., Tjaden, C., Ahmad, S. S., Gaida, M. M., Bauer, A. S., Werner, J., & Giese, N. A. (2014). Chondroitin Sulfate Proteoglycan CSPG4 as a Novel Hypoxia-Sensitive Marker in Pancreatic Tumors. PLOS ONE, 9(6), e100178. https://doi.org/10.1371/JOURNAL.PONE.0100178 Kolde, R. (2025). Pretty Heatmaps [R package pheatmap version 1.0.13]. CRAN: Contributed Packages. https://doi.org/10.32614/CRAN.PACKAGE.PHEATMAP Koraneekit, A., Limpaiboon, T., Sangka, A., Boonsiri, P., Daduang, S., & Daduang, J. (2018). Synergistic effects of cisplatin-caffeic acid induces apoptosis in human cervical cancer cells via the mitochondrial pathways. Oncology Letters, 15(5), 7397–7402. https://doi.org/10.3892/OL.2018.8256/ABSTRACT Korotkevich, G., Sukhov, V., Budin, N., Shpak, B., Artyomov, M. N., & Sergushichev, A. (2016). Fast gene set enrichment analysis. BioRxiv. https://doi.org/10.1101/060012 Krijthe, J. (2023). T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation [R package Rtsne version 0.17]. CRAN: Contributed Packages. https://doi.org/10.32614/CRAN.PACKAGE.RTSNE Kumar, A., Khurana, U., Chowdhary, R., Halder, A., & Kapoor, N. (2024). Evaluation of the diagnostic utility of MCAM-1 (CD146) in a group of common gynecological cancers: A case-control study. Turkish Journal of Obstetrics and Gynecology, 21(1), 43. https://doi.org/10.4274/TJOD.GALENOS.2024.38265 Levine, J. H., Simonds, E. F., Bendall, S. C., Davis, K. L., Amir, E. A. D., Tadmor, M. D., Litvin, O., Fienberg, H. G., Jager, A., Zunder, E. R., Finck, R., Gedman, A. L., Radtke, I., Downing, J. R., Pe’er, D., & Nolan, G. P. (2015). Data-Driven Phenotypic Dissection of AML Reveals Progenitor-like Cells that Correlate with Prognosis. Cell, 162(1), 184–197. https://doi.org/10.1016/j.cell.2015.05.047 Li, D. Z., Yan, B., Liao, K., Huang, J., Zhang, J., Chen, Y. C., Zhu, J., Zhi, S., & Chen, L. (2025). 
Multi-omics modality completion and knowledge distillation for drug response prediction in cervical cancer. Frontiers in Oncology, 15, 1622600. https://doi.org/10.3389/FONC.2025.1622600/BIBTEX Li, X., Yue, Z., Wang, D., & Zhou, L. (2023). PTPRC functions as a prognosis biomarker in the tumor microenvironment of cutaneous melanoma. Scientific Reports 2023 13:1, 13(1), 1–15. https://doi.org/10.1038/s41598-023-46794-6 Li, Y., Liu, Q., Jing, X., Wang, Y., Jia, X., Yang, X., & Chen, K. (2025). Cancer‐Associated Fibroblasts: Heterogeneity, Cancer Pathogenesis, and Therapeutic Targets. MedComm, 6(7), e70292. https://doi.org/10.1002/MCO2.70292 Liberzon, A., Birger, C., Thorvaldsdóttir, H., Ghandi, M., Mesirov, J. P., & Tamayo, P. (2015). The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Systems, 1(6), 417. https://doi.org/10.1016/J.CELS.2015.12.004 Lin, M., Pan, C., Xu, W., Li, J., & Zhu, X. (2020). Leonurine promotes cisplatin sensitivity in human cervical cancer cells through increasing apoptosis and inhibiting drug-resistant proteins. Drug Design, Development and Therapy, 14, 1885–1895. https://doi.org/10.2147/DDDT.S252112;WEBSITE:WEBSITE:TFOPB;PAGEGROUP:STRING:PUBLICATION Litwin, T. R., Irvin, S. R., Chornock, R. L., Sahasrabuddhe, V. V., Stanley, M., & Wentzensen, N. (2020). Infiltrating T-cell markers in cervical carcinogenesis: a systematic review and meta-analysis. British Journal of Cancer 2020 124:4, 124(4), 831–841. https://doi.org/10.1038/s41416-020-01184-x Liu, B., Zhai, J., Wang, W., Liu, T., Liu, C., Zhu, X., Wang, Q., Tian, W., & Zhang, F. (2022). Identification of Tumor Microenvironment and DNA Methylation-Related Prognostic Signature for Predicting Clinical Outcomes and Therapeutic Responses in Cervical Cancer. Frontiers in Molecular Biosciences, 9, 872932. https://doi.org/10.3389/FMOLB.2022.872932/BIBTEX Liu, J., Yang, L., Zhang, J., Zhang, J., Chen, Y., Li, K., Li, Y., Li, Y., Yao, L., & Guo, G. (2012). 
Knock-down of NDRG2 sensitizes cervical cancer Hela cells to cisplatin through suppressing Bcl-2 expression. BMC Cancer 2012 12:1, 12(1), 370-. https://doi.org/10.1186/1471-2407-12-370 Liu, Y., Wu, W., Cai, C., Zhang, H., Shen, H., & Han, Y. (2023). Patient-derived xenograft models in cancer therapy: technologies and applications. Signal Transduction and Targeted Therapy 2023 8:1, 8(1), 160-. https://doi.org/10.1038/s41392-023-01419-2 Ma, W., Tang, W., Kwok, J. S. L., Tong, A. H. Y., Lo, C. W. S., Chu, A. T. W., & Chung, B. H. Y. (2024). A review on trends in development and translation of omics signatures in cancer. Computational and Structural Biotechnology Journal, 23, 954–971. https://doi.org/10.1016/J.CSBJ.2024.01.024 McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. https://arxiv.org/pdf/1802.03426 McKight, P. E., & Najab, J. (2010). Kruskal-Wallis Test. The Corsini Encyclopedia of Psychology, 1–1. https://doi.org/10.1002/9780470479216.CORPSY0491 Moro, M., Balestrero, F. C., & Grolla, A. A. (2024). Pericytes: jack-of-all-trades in cancer-related inflammation. Frontiers in Pharmacology, 15. https://doi.org/10.3389/FPHAR.2024.1426033 Naidu, R., Paulraj, F., Abas, F., Lajis, N., & Othman, I. (2018). Identification of Differentially Expressed Genes in CaSki Cervical Cancer Cells Treated with a Selected Diarylpentanoid. Frontiers in Pharmacology, 9. https://doi.org/10.3389/CONF.FPHAR.2018.63.00121/EVENT_ABSTRACT Nazli, A., Chan, O., Dobson-Belaire, W. N., Ouellet, M., Tremblay, M. J., Gray-Owen, S. D., Arsenault, A. L., & Kaushic, C. (2010). Exposure to HIV-1 Directly Impairs Mucosal Epithelial Barrier Integrity Allowing Microbial Translocation. PLoS Pathogens, 6(4), e1000852. https://doi.org/10.1371/JOURNAL.PPAT.1000852 Olukomogbon, T., Akpobome, B., Omole, A., Adebamowo, C. A., & Adebamowo, S. N. (2024). 
Association Between Cervical Inflammatory Mediators and Prevalent Cervical Human Papillomavirus Infection. JCO Global Oncology, 10(10), e2300380. https://doi.org/10.1200/GO.23.00380 Pavone, G., Marino, A., Fisicaro, V., Motta, L., Spata, A., Martorana, F., Spampinato, S., Celesia, B. M., Cacopardo, B., Vigneri, P., & Nunnari, G. (2024). Entangled Connections: HIV and HPV Interplay in Cervical Cancer—A Comprehensive Review. International Journal of Molecular Sciences, 25(19). https://doi.org/10.3390/IJMS251910358 Peng, D., Gleyzer, R., Tai, W. H., Kumar, P., Bian, Q., Isaacs, B., da Rocha, E. L., Cai, S., DiNapoli, K., Huang, F. W., & Cahan, P. (2021). Evaluating the transcriptional fidelity of cancer models. Genome Medicine, 13(1), 73. https://doi.org/10.1186/S13073-021-00888-W Peng, Y. X., Yu, B., Qin, H., Xue, L., Liang, Y. J., & Quan, Z. X. (2020). EMT-related gene expression is positively correlated with immunity and may be derived from stromal cells in osteosarcoma. PeerJ, 2020(2), e8489. https://doi.org/10.7717/PEERJ.8489/SUPP-4 Raghavan, S., Winter, P. S., Navia, A. W., Williams, H. L., DenAdel, A., Lowder, K. E., Galvez-Reyes, J., Kalekar, R. L., Mulugeta, N., Kapner, K. S., Raghavan, M. S., Borah, A. A., Liu, N., Väyrynen, S. A., Costa, A. D., Ng, R. W. S., Wang, J., Hill, E. K., Ragon, D. Y., … Shalek, A. K. (2021). Microenvironment drives cell state, plasticity, and drug response in pancreatic cancer. Cell, 184(25), 6119-6137.e26. https://doi.org/10.1016/J.CELL.2021.11.017 Richter, C. E., Cocco, E., Bellone, S., Bellone, M., Casagrande, F., Todeschini, P., Rüttinger, D., Silasi, D. A., Azodi, M., Schwartz, P. E., Rutherford, T. J., Pecorelli, S., & Santin, A. D. (2010). Primary Cervical Carcinoma Cell Lines Overexpress Epithelial Cell Adhesion Molecule (EpCAM) and Are Highly Sensitive to Immunotherapy With MT201, a Fully Human Monoclonal Anti-EpCAM Antibody. 
International Journal of Gynecological Cancer : Official Journal of the International Gynecological Cancer Society, 20(9), 1440. https://doi.org/10.1111/IGC.0b013e3181fb18a1 Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47. https://doi.org/10.1093/NAR/GKV007 Ruan, H., Zhou, Y., Shen, J., Zhai, Y., Xu, Y., Pi, L., Huang, R., Chen, K., Li, X., Ma, W., Wu, Z., Deng, X., Wang, X., Zhang, C., & Guan, M. (2020). Circulating tumor cell characterization of lung cancer brain metastases in the cerebrospinal fluid through single-cell transcriptome analysis. Clinical and Translational Medicine, 10(8), e246. https://doi.org/10.1002/CTM2.246 Saha, S. K., Kim, K., Yang, G. M., Choi, H. Y., & Cho, S. G. (2018a). Cytokeratin 19 (KRT19) has a Role in the Reprogramming of Cancer Stem Cell-Like Cells to Less Aggressive and More Drug-Sensitive Cells. International Journal of Molecular Sciences 2018, Vol. 19, Page 1423, 19(5), 1423. https://doi.org/10.3390/IJMS19051423 Saha, S. K., Kim, K., Yang, G. M., Choi, H. Y., & Cho, S. G. (2018b). Cytokeratin 19 (KRT19) has a Role in the Reprogramming of Cancer Stem Cell-Like Cells to Less Aggressive and More Drug-Sensitive Cells. International Journal of Molecular Sciences 2018, Vol. 19, Page 1423, 19(5), 1423. https://doi.org/10.3390/IJMS19051423 Sahu, D., Shi, J., Segura Rueda, I. A., Chatrath, A., & Dutta, A. (2024). Development of a polygenic score predicting drug resistance and patient outcome in breast cancer. Npj Precision Oncology 2024 8:1, 8(1), 219-. https://doi.org/10.1038/s41698-024-00714-7 Salvadores, M., Fuster-Tormo, F., & Supek, F. (2020). Matching cell lines with cancer type and subtype of origin via mutational, epigenomic, and transcriptomic patterns. Science Advances, 6(27), eaba1862. https://doi.org/10.1126/SCIADV.ABA1862 Seshadri, V. D. (2021). 
Brucine promotes apoptosis in cervical cancer cells (ME-180) via suppression of inflammation and cell proliferation by regulating PI3K/AKT/mTOR signaling pathway. Environmental Toxicology, 36(9), 1841–1847. https://doi.org/10.1002/TOX.23304;JOURNAL:JOURNAL:10982256A;WGROUP:STRING:PUBLICATION Shim, W. S. N., Teh, M., Bapna, A., Kim, I., Koh, G. Y., Mack, P. O. P., & Ge, R. (2002). Angiopoietin 1 Promotes Tumor Angiogenesis and Tumor Vessel Plasticity of Human Cervical Cancer in Mice. Experimental Cell Research, 279(2), 299–309. https://doi.org/10.1006/EXCR.2002.5597 Song, J., Yang, P., Chen, C., Ding, W., Tillement, O., Bai, H., & Zhang, S. (2025). Targeting epigenetic regulators as a promising avenue to overcome cancer therapy resistance. Signal Transduction and Targeted Therapy 2025 10:1, 10(1), 1–56. https://doi.org/10.1038/s41392-025-02266-z Strickler, H. D., Burk, R. D., Fazzari, M., Anastos, K., Minkoff, H., Massad, L. S., Hall, C., Bacon, M., Levine, A. M., Watts, D. H., Silverberg, M. J., Xue, X., Schlecht, N. F., Melnick, S., & Palefsky, J. M. (2005). Natural history and possible reactivation of human papillomavirus in human immunodeficiency virus-positive women. Journal of the National Cancer Institute, 97(8), 577–586. https://doi.org/10.1093/JNCI/DJI073 Szalai, B., Subramanian, V., Holland, C. H., Alföldi, R., Puskás, L. G., & Saez-Rodriguez, J. (2019). Signatures of cell death and proliferation in perturbation transcriptomics data-from confounding factor to effective prediction. Nucleic Acids Research, 47(19), 10010–10026. https://doi.org/10.1093/NAR/GKZ805 Vučković, N., Hoppe-Seyler, K., & Riemer, A. B. (2023). Characterization of DoTc2 4510—Identifying HPV16 Presence in a Cervical Carcinoma Cell Line Previously Considered to Be HPV-Negative. Cancers, 15(15). https://doi.org/10.3390/CANCERS15153810/S1 Vuyst, H. De, Ndirangu, G., Moodley, M., Tenet, V., Estambale, B., Meijer, C. J. L. M., Snijders, P. J. F., Clifford, G., & Franceschi, S. (2012). 
Prevalence of human papillomavirus in women with invasive cervical carcinoma by HIV status in Kenya and South Africa. International Journal of Cancer, 131(4), 949–955. https://doi.org/10.1002/IJC.26470 Wang, J., Gu, X., Cao, L., Ouyang, Y., Qi, X., Wang, Z., & Wang, J. (2022). A novel prognostic biomarker CD3G that correlates with the tumor microenvironment in cervical cancer. Frontiers in Oncology, 12, 979226. https://doi.org/10.3389/FONC.2022.979226/FULL Wang, W., Lokman, N. A., Barry, S. C., Oehler, M. K., & Ricciardelli, C. (2025). LGR5: An emerging therapeutic target for cancer metastasis and chemotherapy resistance. Cancer and Metastasis Reviews, 44(1). https://doi.org/10.1007/S10555-024-10239-X Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R. M., Ozenberger, B. A., Ellrott, K., Sander, C., Stuart, J. M., Chang, K., Creighton, C. J., Davis, C., Donehower, L., Drummond, J., Wheeler, D., Ally, A., Balasundaram, M., Birol, I., Butterfield, Y. S. N., Chu, A., … Kling, T. (2013). The Cancer Genome Atlas Pan-Cancer analysis project. Nature Genetics 2013 45:10, 45(10), 1113–1120. https://doi.org/10.1038/ng.2764 Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K., Yutani, H., Dunnington, D., & van den Brand, T. (2025). Create Elegant Data Visualisations Using the Grammar of Graphics [R package ggplot2 version 4.0.1]. CRAN: Contributed Packages. https://doi.org/10.32614/CRAN.PACKAGE.GGPLOT2 Wei, C. et al. (2024). Integrated machine learning identifies a cellular senescence-related prognostic model to improve outcomes in uterine corpus endometrial carcinoma. https://doi.org/10.3389/fimmu.2024.1418508. Wisniewski, S. J., & Brannan, G. D. (2024). Correlation (Coefficient, Partial, and Spearman Rank) and Regression Analysis. StatPearls. https://www.ncbi.nlm.nih.gov/books/NBK606101/ Xia, W. T., Qiu, W. R., Yu, W. K., Xu, Z. C., & Zhang, S. H. (2023). 
Identifying TME signatures for cervical cancer prognosis based on GEO and TCGA databases. Heliyon, 9(4), e15096. https://doi.org/10.1016/J.HELIYON.2023.E15096 Yang, J., Xu, J., Wang, W., Zhang, B., Yu, X., & Shi, S. (2023). Epigenetic regulation in the tumor microenvironment: molecular mechanisms and therapeutic targets. Signal Transduction and Targeted Therapy 2023 8:1, 8(1), 210-. https://doi.org/10.1038/s41392-023-01480-x Yang, W., Soares, J., Greninger, P., Edelman, E. J., Lightfoot, H., Forbes, S., Bindal, N., Beare, D., Smith, J. A., Thompson, I. R., Ramaswamy, S., Futreal, P. A., Haber, D. A., Stratton, M. R., Benes, C., McDermott, U., & Garnett, M. J. (2013). Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Research, 41(Database issue). https://doi.org/10.1093/NAR/GKS1111 Ying, L., Zhang, L., Chen, Y., Huang, C., Zhou, J., Xie, J., & Liu, L. (2025). Predicting immunotherapy prognosis and targeted therapy sensitivity of colon cancer based on a CAF-related molecular signature. Scientific Reports 2025 15:1, 15(1), 1–19. https://doi.org/10.1038/s41598-025-90899-z Zhang, S. Y., Ren, X. Y., Wang, C. Y., Chen, X. J., Cao, R. Y., Liu, Q., Pan, X., Zhou, J. Y., Zhang, W. L., Tang, X. R., Cheng, B., & Wu, T. (2021). Comprehensive Characterization of Immune Landscape Based on Epithelial-Mesenchymal Transition Signature in OSCC: Implication for Prognosis and Immunotherapy. Frontiers in Oncology, 11, 587862. https://doi.org/10.3389/FONC.2021.587862/BIBTEX Zhao, A., Pan, Y., Gao, Y., Zhi, Z., Lu, H., Dong, B., Zhang, X., Wu, M., Zhu, F., Zhou, S., & Ma, S. (2024). MUC1 promotes cervical squamous cell carcinoma through ERK phosphorylation-mediated regulation of ITGA2/ITGA3. BMC Cancer 2024 24:1, 24(1), 1–14. https://doi.org/10.1186/S12885-024-12314-6 Zhao, Y., Li, M. C., Konaté, M. M., Chen, L., Das, B., Karlovich, C., Williams, P. M., Evrard, Y. A., Doroshow, J. H., & McShane, L. M. (2021). 
TPM, FPKM, or Normalized Counts? A Comparative Study of Quantification Measures for the Analysis of RNA-seq Data from the NCI Patient-Derived Models Repository. Journal of Translational Medicine, 19(1). https://doi.org/10.1186/S12967-021-02936-W Zhao, Y., Zhao, C., Zhao, J., Ma, Y., Zhang, S., Liu, Y., Wang, Y., Liu, S., & Zhang, Y. (2025). Excavation of Molecular Subtypes of Cervical Cancer Based on DNA Methylation Patterns. Frontiers in Bioscience (Landmark Edition), 30(9). https://doi.org/10.31083/FBL45025 Zheng, Y., Han, J., Qu, Y., Wang, J., Joyce, B. T., Kim, K., Nannini, D. R., Musa, J., Imade, G. E., Anorlu, R., Maiga, M., Morhason-Bello, I., Simon, M. A., Silas, O., Abdulkareem, F. B., Badmos, K., Nyam, C. J., Gursel, D. B., Wei, J. J., … Hou, L. (2025). DNA methylation biomarkers for cervical cancer risk prediction in HIV-positive Nigerian women. International Journal of Cancer, 157(7), 1363–1375. https://doi.org/10.1002/IJC.35502;JOURNAL:JOURNAL:10970215;WGROUP:STRING:PUBLICATION Zhu, X., Li, S., Luo, J., Ying, X., Li, Z., Wang, Y., Zhang, M., Zhang, T., Jiang, P., & Wang, X. (2022). Subtyping of Human Papillomavirus-Positive Cervical Cancers Based on the Expression Profiles of 50 Genes. Frontiers in Immunology, 13, 801639. https://doi.org/10.3389/FIMMU.2022.801639/BIBTEX Zhu, Y. (2025). Leveraging Data Visualization with ggplot2 in Translation Pedagogy: Enhancing Learning Through Visual Insights. Lecture Notes in Computer Science, 15589 LNCS, 135–144. https://doi.org/10.1007/978-981-96-4407-0_11 Zięba, S., Kowalik, A., Zalewski, K., Rusetska, N., Goryca, K., Piaścik, A., Misiek, M., Bakuła-Zalewska, E., Kopczyński, J., Kowalski, K., Radziszewski, J., Bidziński, M., Góźdź, S., & Kowalewska, M. (2018). Somatic mutation profiling of vulvar cancer: Exploring therapeutic targets. Gynecologic Oncology, 150(3), 552–561. https://doi.org/10.1016/j.ygyno.2018.06.026 Additional Declarations No competing interests reported. 
Supplementary Files
DNAmethylationbasedcomputationalframeworkannexure.pdf

Status: Under Review
Reviewers agreed at journal 20 May, 2026
Reviewers agreed at journal 20 May, 2026
Reviewers agreed at journal 11 May, 2026
Reviewers invited by journal 07 May, 2026
Editor assigned by journal 06 Feb, 2026
Submission checks completed at journal 30 Jan, 2026
First submitted to journal 27 Jan, 2026
Author affiliations
Saltiel Hamese, Council for Scientific and Industrial Research
Mutsa Takundwa, Council for Scientific and Industrial Research
Earl Prinsloo, Rhodes University
Deepak Balaji Thimiri Govindaraj, Council for Scientific and Industrial Research (corresponding author)

DOI: https://doi.org/10.21203/rs.3.rs-8708919/v1

Figure 1. Graphical workflow of the analytical framework for tumour microenvironment–aware DNA methylation analysis and experimental model alignment in cervical cancer. The framework is organised into three sequential parts. Part 1 illustrates data retrieval from public repositories, followed by harmonisation and rigorous sample- and probe-level quality control to generate an analysis-ready methylation matrix. Part 2 shows feature-restricted differential methylation analysis, tumour state reconstruction by unsupervised learning, and functional enrichment, enabling identification and biological interpretation of discrete DNA methylation endotypes shaped by tumour microenvironment (TME) context and tumour-intrinsic programmes. Part 3 depicts tumour–model concordance, multi-feature integration, and predictive modelling, in which feature-restricted similarity metrics are used to align patient tumours with experimental systems and evaluate model performance.

Figure 2. Study design and development framework. Schematic overview of the analytical workflow illustrating data sources, core analytical modules, and the integration logic underpinning tumour–model inference. DNA methylation profiles from GSE279982 (patient tumours) and GSE68379 (cervical cancer cell lines) were harmonised, quality controlled, probe-filtered, and promoter-aggregated to generate biologically interpretable methylation matrices. Feature restriction was then applied to construct three complementary tumour representations: (i) tumour microenvironment (TME)–restricted CpGs, (ii) differentially methylated positions (DMPs), and (iii) cancer driver–associated CpGs. TME-restricted features were used for unsupervised clustering via Rphenograph to define methylation-based tumour states, together with dimensionality reduction and pathway enrichment analyses to assign biological meaning to each endotype. In parallel, tumour–model similarity scores were computed independently within each restricted feature space to evaluate the fidelity of in vitro systems to patient tumours. Outputs from the DMP-, driver-, and TME-restricted concordance analyses were integrated within a consensus inference framework, calibrated using TCGA-CESC transcriptomic data, and formalised through a random forest classifier to derive robust prioritisation of experimental models. Final tumour states and model assignments were subsequently contextualised with clinical and virological variables to assess relevance to HIV status, HPV genotype, and disease characteristics, forming the complete development and application workflow.

Figure 3. Classifier architecture for integrative epigenetic model inference. Schematic overview of the multi-feature, weakly supervised classification framework designed to infer optimal experimental models from DNA methylation data. The approach integrates three biologically constrained methylation feature spaces: tumour microenvironment (TME)–restricted CpGs, cancer driver gene–restricted CpGs, and intra-cluster HIV-associated differentially methylated positions (DMPs). Within each feature space, tumour–model similarity is computed independently using Spearman rank correlation to capture monotonic epigenetic concordance while remaining robust to outliers and non-normal distributions. Similarity scores are subsequently normalised within each patient to generate relative enrichment measures (Z-scores), enabling harmonisation across heterogeneous feature domains. Feature-specific top-ranked model predictions are then reconciled using a weakly supervised consensus labelling strategy that prioritises agreement across feature spaces and resolves conflicts using biologically informed precedence rules. These consensus labels are used to train a Random Forest classifier on the integrated similarity features, allowing modelling of non-linear interactions and robust decision boundaries. The trained model outputs probabilistic experimental model assignments for each patient, providing confidence-aware predictions and interpretable feature importance estimates that quantify the relative contribution of TME-, driver-, and HIV-associated epigenetic signals.

Figure 4. Sample-level quality control of cervical cancer cell-line methylation data. A scatter plot of median methylated versus unmethylated signal intensities derived from raw IDAT files demonstrates uniformly high array quality across all cervical cancer models. No samples fell below quality thresholds, supporting inclusion of all cell lines in downstream normalisation and integrative analyses.

Figure 5. HIV-associated differential DNA methylation in cervical cancer. (A) Volcano plot showing log₂ fold changes (M-values) versus −log₁₀ FDR-adjusted p-values for all tested CpG sites from limma differential methylation analysis comparing HIV-positive and HIV-negative tumours. Significant DMPs highlight extensive HIV-associated hypermethylation and hypomethylation across the cervical cancer methylome.
\u003cstrong\u003e(B)\u003c/strong\u003e Heatmap of scaled β-values for the top 500 HIV-associated DMPs, demonstrating coordinated methylation patterns with clear separation by HIV status and additional stratification by HPV group and tumour stage.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/4465a428abf6d7c54aefa7ae.png"},{"id":109765084,"identity":"b8a642e6-5150-4f46-914e-2d4c8f12f581","added_by":"auto","created_at":"2026-05-22 07:39:21","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":776276,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003ePatient-level DEA-restricted tumour–model similarity profiles.\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e \u003c/strong\u003eMulti-line plot showing Spearman correlation coefficients between individual patient tumours and cervical cancer biomodels calculated using DEA-restricted CpGs. Each line represents a distinct biomodel, with correlation trajectories plotted across patients. 
DEA restriction expands the dynamic range of tumour–model similarity and reveals structured inter-patient heterogeneity, highlighting biomodels that preferentially align with specific tumour subsets rather than exhibiting uniform global similarity.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/08b126282671282492b95c55.png"},{"id":109764141,"identity":"61c2ce35-92fe-4d1d-b9d0-739ce3369c99","added_by":"auto","created_at":"2026-05-22 07:36:35","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":5401107,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003ePerformance, stability, and biological grounding of DEA-restricted Random Forest model prioritisation.\u003c/strong\u003e\u003c/em\u003e Machine learning–based prioritisation of tumour–model similarity, trained on DEA-restricted methylation concordance, enabled objective identification of high-confidence biomodels beyond naïve correlation ranking. \u003cstrong\u003e(A)\u003c/strong\u003ePrecision@K demonstrated excellent ranking performance in the clinically relevant region, with Precision remaining ~1.0 for the top one to three predicted models and declining gradually thereafter. This structured decay indicates confident separation between optimal and suboptimal biomodel candidates rather than diffuse or random ranking behaviour. \u003cstrong\u003e(B)\u003c/strong\u003e ROC analysis showed near-perfect discrimination between top-ranked and non-top-ranked models across patients (AUC = 1.0), confirming that DEA-restricted similarity features provide sufficient and generalisable signal for robust classification across heterogeneous tumours. 
\u003cstrong\u003e(C)\u003c/strong\u003e Stability analysis of Top-3 predictions revealed recurrence of specific cell lines across patients, with \u003cem\u003eMS751, SKG-IIIA and TC-YIK\u003c/em\u003e most frequently prioritised. This indicates that the classifier converges toward a biologically meaningful subset of representative cervical cancer models, rather than producing unstable or stochastic selections. \u003cstrong\u003e(D)\u003c/strong\u003e Regression analysis demonstrated a positive association between absolute DEA correlation and patient-normalised DEA-Z scores, indicating that relative similarity contributes biologically informative signal beyond absolute concordance (R² ≈ 0.21).\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/9f04047007739498ffe94cf8.png"},{"id":109764139,"identity":"8951aa56-ef0a-4fdb-b960-6de2c6c6f4c2","added_by":"auto","created_at":"2026-05-22 07:36:35","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":8925433,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eDriver gene–restricted methylation space enables robust tumour stratification and biomodel prioritisation\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e (A)\u003c/strong\u003e Heatmap of top driver-associated CpGs across patient tumours showing coherent blocks of hyper- and hypomethylation with patient stratification aligned to HIV status, HPV genotype, tumour stage, age, and BMI. \u003cstrong\u003e(B)\u003c/strong\u003e Multi-cell-line Spearman similarity profiles in the driver-restricted feature space demonstrating consistently high patient–biomodel concordance with recurrent peaks for specific models, indicating conserved driver-programme archetypes. 
\u003cstrong\u003e(C)\u003c/strong\u003e ROC curve for the joint \u003cem\u003eDEA + driver\u003c/em\u003e model indicating near-perfect discriminative performance. \u003cstrong\u003e(D)\u003c/strong\u003e Precision@K analysis for the joint DEA + driver model showing high precision among top-ranked biomodel predictions. \u003cstrong\u003e(E)\u003c/strong\u003e Stability analysis of Top-3 model selections highlighting recurrent prioritisation of a small subset of biomodels, including \u003cem\u003eMS751, HELASF, SKG-IIIA, TC-YIK,\u003c/em\u003e and \u003cem\u003eOMC-1\u003c/em\u003e. \u003cstrong\u003e(F)\u003c/strong\u003e Scatter plot relating relative (patient-normalised) driver enrichment to absolute driver correlation, revealing a strong linear relationship \u003cstrong\u003e(R² ≈ 0.61)\u003c/strong\u003e and demonstrating the dominant contribution of relative driver signal to biomodel fidelity.\u003c/p\u003e","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/a14755317a0e5017d7af158e.png"},{"id":109763997,"identity":"69116998-bc94-4705-9059-c2cbaf5ba15a","added_by":"auto","created_at":"2026-05-22 07:36:22","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":5537113,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eTME-restricted methylation space enhances tumour stratification and experimental model prioritisation.\u003c/strong\u003e\u003c/em\u003e \u003cstrong\u003e(A)\u003c/strong\u003e Heatmap of the top 500 TME-associated differentially methylated CpGs across patient tumours. β-values are scaled per probe, and samples are hierarchically clustered. Tumours are annotated by HIV status, HPV group, cancer stage, age, and BMI, revealing coherent microenvironment-linked epigenetic structure. 
\u003cstrong\u003e(B)\u003c/strong\u003e DEA-restricted TME similarity profiles showing Spearman correlations between patient tumours and cervical cancer cell lines, highlighting structured and recurrent tumour–model concordance patterns. \u003cstrong\u003e(C)\u003c/strong\u003eReceiver operating characteristic (ROC) curve for the joint DEA + driver + TME predictive model, demonstrating excellent discriminative performance (AUC ≈ 0.99). \u003cstrong\u003e(D)\u003c/strong\u003e Precision@K analysis for the joint model, indicating high precision among the top-ranked experimental model predictions. \u003cstrong\u003e(E)\u003c/strong\u003eStability analysis of top-ranked models across patients, revealing recurrent prioritisation of a limited set of representative cell lines, including \u003cem\u003eMS751, HELASF,\u003c/em\u003e and \u003cem\u003eHeLa\u003c/em\u003e. \u003cstrong\u003e(F)\u003c/strong\u003e Scatter plot relating relative (patient-normalised Z-score) to absolute TME-restricted correlation, demonstrating a strong linear association \u003cstrong\u003e(R² ≈ 0.60)\u003c/strong\u003e and supporting the added value of relative enrichment for biologically meaningful experimental model prioritisation.\u003c/p\u003e","description":"","filename":"floatimage9.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/fb6f9e8d529b347f04b87715.png"},{"id":109764170,"identity":"49fbe4b4-2bd9-455a-bbe9-66a21eef7eb1","added_by":"auto","created_at":"2026-05-22 07:36:39","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":1853999,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eUnsupervised discovery of latent tumour states using TME-restricted and integrated methylation models.\u003c/strong\u003e\u003c/em\u003e\u003cem\u003e \u003c/em\u003e\u003cstrong\u003e(A)\u003c/strong\u003e UMAP embedding of \u003cem\u003eTME\u003c/em\u003e-restricted methylation similarity profiles 
reveals four highly distinct tumour clusters (C1–C4), indicating strong microenvironment-driven epigenetic stratification. \u003cstrong\u003e(B)\u003c/strong\u003e t-SNE embedding of the same \u003cem\u003eTME\u003c/em\u003e-restricted profiles confirms consistent cluster separation across dimensionality-reduction methods and concordance with unsupervised cluster assignments. \u003cstrong\u003e(C)\u003c/strong\u003e UMAP embedding of the joint \u003cem\u003eDriver + TME\u003c/em\u003e model preserves robust separation while reducing background variation, yielding three interpretable tumour communities reflecting combined intrinsic and microenvironmental influences. \u003cstrong\u003e(D)\u003c/strong\u003e UMAP embedding of the full joint model \u003cem\u003e(DEA + Driver + TME)\u003c/em\u003e demonstrates stable and coherent tumour communities with smooth transitional structure, consistent with a continuum of tumour states shaped by disease-associated, oncogenic, and microenvironmental epigenetic programmes.\u003c/p\u003e","description":"","filename":"floatimage10.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/c5355395615850460907cd49.png"},{"id":109763066,"identity":"86c62e57-1394-4644-bee9-b0f21f0c3a87","added_by":"auto","created_at":"2026-05-22 07:33:35","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":1549516,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eClinical enrichment and phenotypic stratification of TME + driver–defined tumour clusters\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e. \u003c/strong\u003eUMAP embedding of tumours based on integrated TME + driver methylation profiles reveals three well-defined unsupervised clusters \u003cem\u003e(C1–C3)\u003c/em\u003e. 
The same embedding is shown with samples coloured by different clinical and host-related variables: \u003cstrong\u003e(A)\u003c/strong\u003e HPV genotype, demonstrating structured segregation of HPV16-, HPV18-, and other HPV-associated tumours; \u003cstrong\u003e(B)\u003c/strong\u003e BMI group, revealing a gradient consistent with host metabolic state influencing tumour-associated methylation patterns; \u003cstrong\u003e(C)\u003c/strong\u003eAge group, showing increased dispersion and epigenetic heterogeneity with advancing age; \u003cstrong\u003e(D)\u003c/strong\u003e HIV status, highlighting clear separation of HIV-positive and HIV-negative cervical cancers, with HIV-positive tumours exhibiting greater spread across the embedding.\u003c/p\u003e","description":"","filename":"floatimage11.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/c41c44ab4408dda5d10a90d0.png"},{"id":109765086,"identity":"c707117d-b6a6-4fd7-8817-d9fd0445921b","added_by":"auto","created_at":"2026-05-22 07:39:22","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":1706882,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eFunctional enrichment analysis of TME–Driver model–derived methylation clusters.\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e \u003c/strong\u003eGene Ontology (GO) Biological Process \u003cstrong\u003e(A–C)\u003c/strong\u003eand KEGG pathway \u003cstrong\u003e(D–F)\u003c/strong\u003e enrichment analyses of cluster-specific differentially methylated genes identified from the joint TME–Driver methylation model. \u003cstrong\u003e(A, D)\u003c/strong\u003e Cluster \u003cem\u003eC1\u003c/em\u003e shows strong enrichment of immune sensing, innate immune signalling, cytokine-mediated inflammatory pathways, and immune surveillance, consistent with an immune-inflamed tumour microenvironment. 
\u003cstrong\u003e(B, E)\u003c/strong\u003e Cluster \u003cem\u003eC2\u003c/em\u003e is enriched for neuroendocrine, intracellular signalling, and differentiation-related pathways, including neuroactive ligand–receptor interaction and cAMP signalling, indicative of a signalling-driven, immune-cold tumour state. \u003cstrong\u003e(C, F)\u003c/strong\u003eCluster \u003cem\u003eC3\u003c/em\u003e is characterised by enrichment of immune–stromal interaction, cytokine signalling, and viral response pathways, reflecting chronic immune activation and tumour–microenvironment engagement. Together, these analyses demonstrate that TME–Driver–defined methylation clusters correspond to biologically distinct tumour endotypes with divergent immune and regulatory programmes.\u003c/p\u003e","description":"","filename":"floatimage12.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/5f38e5903e33d04efe23be69.png"},{"id":109763900,"identity":"c85a4b82-9e32-4def-8b79-19a80149754b","added_by":"auto","created_at":"2026-05-22 07:36:16","extension":"png","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":4125234,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eIntra-cluster differential methylation reveals endotype-specific HIV-associated epigenetic programmes.\u003c/strong\u003e\u003c/em\u003e Volcano plots depicting differential DNA methylation between HIV-positive and HIV-negative tumours. \u003cstrong\u003e(A–C)\u003c/strong\u003e Intra-cluster differential methylation analyses performed separately within methylation-defined tumour clusters \u003cem\u003eC1, C2\u003c/em\u003e, and \u003cem\u003eC3\u003c/em\u003e, respectively. Each point represents a CpG site plotted by log₂ fold change (HIV+ vs HIV−) and −log₁₀(FDR-adjusted P-value). \u003cstrong\u003e(D)\u003c/strong\u003e Global differential methylation analysis across all tumours irrespective of cluster assignment. 
Dashed vertical lines indicate effect-size thresholds, and the horizontal dashed line denotes the significance cutoff. Hypermethylated CpGs in HIV-positive tumours are shown in orange, hypomethylated CpGs in green, and non-significant sites in grey. The marked differences in number, magnitude, and directionality of HIV-associated DMPs across clusters demonstrate that HIV induces distinct, endotype-specific epigenetic programmes rather than a uniform methylation response across cervical cancer.\u003c/p\u003e","description":"","filename":"floatimage13.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/3cbce53dea85c87279af8468.png"},{"id":109763917,"identity":"46889a40-b2fb-4156-89f8-2563fe57f126","added_by":"auto","created_at":"2026-05-22 07:36:17","extension":"png","order_by":14,"title":"Figure 14","display":"","copyAsset":false,"role":"figure","size":698680,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eEndotype-resolved functional enrichment and HIV-stratified tumour structure.\u003c/strong\u003e\u003c/em\u003e \u003cstrong\u003e(A)\u003c/strong\u003e UMAP projection of cervical tumours based on integrated tumour microenvironment (TME)– and driver-restricted methylation features, stratified by HIV status, revealing discrete and stable tumour states rather than continuous gradients. \u003cstrong\u003e(B–D)\u003c/strong\u003e Gene Ontology (GO) Biological Process enrichment analyses comparing HIV-positive versus HIV-negative tumours within methylation-defined clusters C1 \u003cstrong\u003e(B)\u003c/strong\u003e, C2 \u003cstrong\u003e(C)\u003c/strong\u003e, and C3 \u003cstrong\u003e(D)\u003c/strong\u003e. Bar plots show normalised enrichment scores (NES), with positive NES indicating pathways enriched in HIV-positive tumours and negative NES indicating enrichment in HIV-negative tumours. 
Cluster C1 is dominated by immune and myeloid differentiation pathways, consistent with an immune-inflamed tumour microenvironment. Cluster C2 exhibits a mixed profile characterised by HIV-associated enrichment of proliferative and biosynthetic programmes alongside attenuation of metabolic pathways. Cluster C3 is enriched for metabolic, proteostasis, and apoptotic signalling pathways in HIV-negative tumours, indicating extensive HIV-associated disruption of core cellular homeostasis.\u003c/p\u003e","description":"","filename":"floatimage14.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/d6cc686c3a88c64b10c60423.png"},{"id":109763086,"identity":"47dc88b6-7d12-4749-8415-5b4f0cb1156d","added_by":"auto","created_at":"2026-05-22 07:33:38","extension":"png","order_by":15,"title":"Figure 15","display":"","copyAsset":false,"role":"figure","size":6338334,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eIntra-(DEA)-cluster tumour–model mapping within Cluster C3.\u003c/strong\u003e\u003c/em\u003e \u003cstrong\u003e(A)\u003c/strong\u003e Volcano plot of C3 HIV-associated differentially methylated CpGs (DMPs), highlighting significantly hyper- and hypomethylated sites. \u003cstrong\u003e(B)\u003c/strong\u003eGene Ontology (GO) enrichment analysis of C3 HIV-associated DMPs, showing the top biological processes enriched among cluster-specific methylation changes. \u003cstrong\u003e(C)\u003c/strong\u003eHeatmap of the top 500 HIV-associated DMPs within C3. β-values are scaled per CpG and hierarchically clustered across patient tumours. Columns are annotated by HIV status, HPV genotype, cancer stage, age group, and BMI, revealing coherent HIV-driven methylation substructure within a fixed tumour endotype. 
\u003cstrong\u003e(D)\u003c/strong\u003eDEA-restricted multi-line similarity profiles showing Spearman correlations between C3 patient tumours and cervical cancer experimental models, computed using cluster-specific HIV-associated DMPs (2,310 CpGs). Each coloured trajectory represents an experimental system, highlighting structured and non-random tumour–model concordance patterns and demonstrating that HIV-associated intra-cluster epigenetic variation materially influences model alignment.\u003c/p\u003e","description":"","filename":"floatimage15.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/c29bae25f3a340b58b920def.png"},{"id":109763077,"identity":"0f5bf496-9980-469b-a2df-792958ded3e8","added_by":"auto","created_at":"2026-05-22 07:33:37","extension":"png","order_by":16,"title":"Figure 16","display":"","copyAsset":false,"role":"figure","size":96082,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eMulti-feature consensus modelling enables accurate and interpretable experimental model selection. \u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e(A)\u003c/strong\u003e Confusion matrix from repeated cross-validated random forest classification shows near-perfect agreement between predicted and consensus model labels (\u003cem\u003eHeLa, MS751, OMC-1\u003c/em\u003e). \u003cstrong\u003e(B)\u003c/strong\u003e Variable importance analysis indicates that intra-cluster HIV-associated DMP similarity is the dominant predictor, followed by TME-restricted similarity, with minimal contribution from driver-restricted features.\u003cstrong\u003e (C)\u003c/strong\u003e Comparison of true versus predicted class counts demonstrates faithful preservation of class distributions without systematic bias. 
\u003cstrong\u003e(D)\u003c/strong\u003e Prediction probability distributions show high-confidence assignments, particularly for \u003cem\u003eMS751\u003c/em\u003e, confirming robust and decisive experimental model prioritisation.\u003c/p\u003e","description":"","filename":"floatimage16.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/e2adc5f5e88790b02853d26a.png"},{"id":109763169,"identity":"a1e3f43b-1554-4a6e-a325-2b0c884e3e51","added_by":"auto","created_at":"2026-05-22 07:33:48","extension":"png","order_by":17,"title":"Figure 17","display":"","copyAsset":false,"role":"figure","size":749078,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eStability of experimental model selection across feature-integration strategies.\u003c/strong\u003e\u003c/em\u003e Bar plots show the frequency with which each cervical cancer cell line is selected among the top three matches across patients under different feature-integration configurations: \u003cstrong\u003e(A)\u003c/strong\u003e C3_DMP + TME + Driver (fully integrated model) reveals a highly stable hierarchy dominated by \u003cem\u003eMS751\u003c/em\u003e, followed by \u003cem\u003eHELASF\u003c/em\u003e and \u003cem\u003eHeLa\u003c/em\u003e, indicating strong cross-patient consensus when infection-associated, microenvironmental, and oncogenic signals are jointly considered. \u003cstrong\u003e(B)\u003c/strong\u003e Driver + TME preserves a compact core of consistently selected models while reducing low-frequency selections, demonstrating that oncogenic and microenvironmental features together are sufficient to define a robust representative set. \u003cstrong\u003e(C)\u003c/strong\u003e Joint model with global DMP resolution shows increased dispersion among secondary models, reflecting reduced resolution of tumour identity when intra-cluster epigenetic structure is not explicitly incorporated. 
\u003cstrong\u003e(D)\u003c/strong\u003e Global DMP plus Driver exhibits the greatest spread of selections, highlighting the limitations of tumour-intrinsic epigenetic features alone in enforcing stable and unambiguous experimental model prioritisation.\u003c/p\u003e","description":"","filename":"floatimage17.png","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/48c7b2353f2009454d4d9584.png"},{"id":109766419,"identity":"cbeddd84-4847-469b-bb0f-62e0b12ec899","added_by":"auto","created_at":"2026-05-22 07:45:53","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":46083035,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/c3bb374a-546d-48e3-8152-b06a767b10e8.pdf"},{"id":109763897,"identity":"17a7f24a-8400-4b1c-a57e-6c818e4393f0","added_by":"auto","created_at":"2026-05-22 07:36:15","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":2169905,"visible":true,"origin":"","legend":"","description":"","filename":"DNAmethylationbasedcomputationalframeworkannexure.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8708919/v1/645c859a4aa9884e0167f109.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"A DNA Methylation–based Computational Framework for Tumour–microenvironment State Inference and Molecular Stratification","fulltext":[{"header":"Introduction","content":"\u003cdiv id=\"Sec2\" class=\"Section2\"\u003e \u003ch2\u003eCERVICAL CANCER: EPIDEMIOLOGY, VIRAL AETIOLOGY, AND CLINICAL CHALLENGES\u003c/h2\u003e \u003cp\u003eApproximately 15\u0026ndash;20% of human cancers are attributable to oncogenic viral infections, with cervical cancer (CC) representing one of the most prominent examples (Hewavisenti et al., 2023). 
Recent global analyses provide the first comprehensive quantification of the cancer burden attributed to HIV, estimating that 0.4% of all cancers diagnosed worldwide in 2022 (approximately 81 300 of 19\u0026nbsp;million cases) were directly attributable to HIV infection and theoretically preventable through improved HIV control measures. Cervical cancer accounts for the largest proportion of HIV-attributable malignancies globally, followed by Kaposi sarcoma, non-Hodgkin lymphoma, and Hodgkin lymphoma. Striking geographical disparities were observed, with the highest absolute and relative HIV-attributable cancer burden concentrated in sub-Saharan Africa, particularly eastern and southern Africa, where HIV contributed to more than 10% of all cancer cases. In Africa, cervical cancer alone accounts for approximately 41% of all HIV-attributable cancers (Huang et al., 2025). These data underscore the disproportionate intersection of HIV and cervical cancer in high-prevalence regions. Persistent infection with high-risk human papillomavirus (HR-HPV) remains the central aetiological driver of cervical carcinogenesis, but disease initiation and progression are strongly modulated by host immune competence. Although HPV genotype distributions are often comparable between HIV-positive and HIV-negative women with cervical carcinoma, HIV-associated CD4+ T-cell depletion, particularly below 200 cells/\u0026micro;L, is consistently associated with HR-HPV persistence, multi-type infection, and accelerated progression from cervical intraepithelial neoplasia to invasive disease (Hewavisenti et al., 2023; Huang et al., 2025; Vuyst et al., 2012). 
Antiretroviral therapy (ART) has substantially reduced the incidence of several AIDS-defining malignancies, yet its impact on HPV-driven disease remains inconsistent, with persistent HPV infection and neoplasia frequently observed despite effective viral suppression. Incomplete or functionally dysregulated immune reconstitution, characterised by impaired antigen presentation, altered T-cell effector responses, and chronic inflammation, likely underlies sustained susceptibility to HPV-mediated oncogenesis.\u003c/p\u003e \u003cp\u003eAt the mechanistic level, HIV\u0026ndash;HPV interplay is driven primarily by disruption of epithelial integrity and immune-mediated tumour microenvironment (TME) remodelling rather than direct co-infection of the same cells. HIV infection promotes mucosal inflammation, downregulation of E-cadherin and tight junction proteins, and increased epithelial permeability, facilitating HPV access to the basal epithelial layer. HIV-derived proteins, particularly Tat and gp120, further exacerbate HPV oncogenic potential by disrupting epithelial barriers, enhancing expression of viral E6/E7 oncogenes, suppressing p53, and reactivating latent HPV. Concurrent reductions in innate antiviral molecules, including \u0026beta;-defensin-2 and thrombospondin, weaken local immune defence and reinforce viral persistence. These interactions operate bidirectionally, as HPV-associated mucosal inflammation and immune cell recruitment increase susceptibility to HIV acquisition (Hewavisenti et al., 2023; Nazli et al., 2010; Strickler et al., 2005). 
Collectively, these processes position cervical cancer, particularly in HIV-endemic settings, as a disease shaped by infection-driven immune dysfunction and tumour microenvironmental heterogeneity, providing a strong biological rationale for integrative analytical frameworks that explicitly model infection-associated epigenetic regulation and microenvironmental tumour states.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eHETEROGENEITY OF TUMOUR MICROENVIRONMENTAL STATES\u003c/h2\u003e \u003cp\u003eThe tumour microenvironment (TME) in cervical cancer is a highly heterogeneous and functionally complex ecosystem composed of malignant epithelial cells and multiple non-malignant cellular compartments, each contributing distinct molecular programmes that shape tumour behaviour, progression, and therapeutic response. Epithelial tumour cells, defined by canonical markers such as EPCAM, KRT8/18/19, CDH1, and MUC1 (Chakravarthy et al., 2022; Richter et al., 2010; Ruan et al., 2020; Saha et al., 2018; Zhao et al., 2024), constitute the proliferative backbone of cervical cancer. Within this compartment, functional heterogeneity arises from variations in differentiation status and epithelial\u0026ndash;mesenchymal transition (EMT) dynamics, which collectively dictate invasive potential, metastatic behaviour, and therapeutic sensitivity (Zhang et al., 2021). A distinct cancer stem-like population, characterised by the expression of LGR5, ALDH1A1, PROM1 (CD133), SOX2, NANOG, and POU5F1, provides self-renewal capacity and underpins tumour initiation, recurrence, and resistance to conventional therapies (Huang \u0026amp; Rofstad, 2016; Javed et al., 2021; Wang et al., 2025). These stem-like subpopulations coexist along an EMT gradient, exhibiting hybrid epithelial/mesenchymal phenotypes that confer plasticity and adaptive survival under therapeutic pressure. 
Surrounding the malignant compartment is a diverse stromal network dominated by cancer-associated fibroblasts (CAFs), marked by FAP, ACTA2, PDGFRB, COL1A1/2, NT5E, and THY1. CAFs display pronounced functional heterogeneity encompassing myofibroblastic CAFs (myCAFs), which drive extracellular matrix (ECM) deposition and tissue stiffness; inflammatory CAFs (iCAFs), which secrete cytokines and chemokines that modulate immune cell infiltration; and antigen-presenting CAFs (apCAFs), which express immune-regulatory molecules that influence T-cell activation and tolerance. These phenotypically distinct subsets collectively orchestrate ECM remodelling, promote EMT, and modulate immune evasion, thereby serving as key regulators of stromal\u0026ndash;tumour crosstalk and therapeutic resistance (Bueno-Urquiza et al., 2024; Y. Li et al., 2025).\u003c/p\u003e \u003cp\u003eAdjacent to CAFs, endothelial cells (PECAM1, VWF, CDH5, KDR, FLT1) and pericytes (RGS5, MCAM, CSPG4, ANGPT1) coordinate angiogenesis, vascular integrity, and perfusion (Shim et al., 2002). Within this vascular niche, endothelial heterogeneity manifests through tip, stalk, and quiescent phenotypes that dynamically respond to hypoxia and VEGF signalling, while pericyte subsets regulate vessel maturation, permeability, and the delivery of chemotherapeutic agents (Amini et al., 2022; Brash et al., 2025; Dasgupta et al., 2022; Keleg et al., 2014; Kumar et al., 2024; Moro et al., 2024). Finally, the immune\u0026ndash;inflammatory compartment introduces another axis of heterogeneity, encompassing cytotoxic and helper T-cell lineages, macrophages, dendritic cells, and myeloid-derived suppressor populations. 
Markers such as PTPRC, CD3D/E, CD8A, CD68, CD14, IL1B, TNF, and CXCL9/10 delineate immune subsets that can either enhance anti-tumour immunity or sustain immunosuppressive environments, depending on the balance of effector versus regulatory phenotypes (Chen et al., 2021; De Vos Van Steenwijk et al., 2013; Dimitrova et al., 2023; John-Olabode et al., 2025; Li et al., 2023; Litwin et al., 2020; Wang et al., 2022; Xia et al., 2023). In cervical cancer, this immune diversity is further shaped by HPV-driven antigenic stimuli and chronic inflammatory signalling, generating a finely tuned equilibrium between immune activation and evasion. Collectively, these cellular and molecular dimensions of the TME underscore its multilayered heterogeneity and its central role in dictating disease trajectory, therapeutic response, and clinical outcome in cervical cancer.\u003c/p\u003e \u003cp\u003eBecause bulk transcriptomic and methylomic datasets represent aggregate signals from multiple cellular populations, tumour microenvironment (TME)-derived signatures provide a powerful means to decompose patient heterogeneity (Ma et al., 2024). Marker-based scoring approaches\u0026mdash;typically performed by z-scoring each gene across samples and averaging expression within predefined marker sets\u0026mdash;allow quantitative approximation of cell-type abundance and functional state in bulk tissue (Busarello et al., 2025). These signatures are biologically informative: high CAF scores commonly associate with stromal activation, epithelial\u0026ndash;mesenchymal transition (EMT), reduced tumour purity, and immunosuppressive phenotypes, whereas elevated immune signatures indicate cytokine signalling and lymphocytic infiltration, often correlating with enhanced treatment responsiveness in immunologically \u0026ldquo;hot\u0026rdquo; tumours (Peng et al., 2020; Ying et al., 2025). 
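The marker-based scoring described above (z-score each gene across samples, then average within a predefined marker set) can be sketched as follows. This is a minimal illustration in Python rather than the authors' R implementation; the function names `zscore_rows` and `marker_score` and the toy expression values are hypothetical, and only the CAF marker names are taken from the text.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Z-score each gene across samples.

    expr maps gene -> list of values (one per sample). Genes with zero
    variance are set to 0.0 so they do not dominate or break the score.
    """
    z = {}
    for gene, values in expr.items():
        mu = mean(values)
        sd = pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z


def marker_score(expr, markers):
    """Average the per-gene z-scores over the markers present in expr."""
    z = zscore_rows(expr)
    present = [g for g in markers if g in z]
    if not present:
        raise ValueError("none of the marker genes are in the matrix")
    n_samples = len(z[present[0]])
    return [mean(z[g][i] for g in present) for i in range(n_samples)]


# Toy values for three samples (illustrative only, not real data).
toy_expr = {
    "FAP": [1.0, 2.0, 3.0],
    "ACTA2": [2.0, 4.0, 6.0],
    "COL1A1": [5.0, 5.0, 8.0],
    "EPCAM": [9.0, 1.0, 1.0],
}
# PDGFRB is absent from the toy matrix and is simply skipped.
caf_markers = ["FAP", "ACTA2", "COL1A1", "PDGFRB"]
caf_scores = marker_score(toy_expr, caf_markers)
```

Because every gene is centred before averaging, the scores are relative within the cohort: a positive CAF score means above-average marker signal for that sample, not an absolute cell fraction.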
In cervical cancer, TME composition is further shaped by viral and host factors, including HPV genotype and HIV co-infection, each capable of inducing distinct immunologic, metabolic, and epigenetic states (Pavone et al., 2024). These multilayered influences are reflected not only in transcriptional programmes but also in DNA methylation patterns across the classical hallmarks of cancer\u0026mdash;activating invasion and metastasis, avoiding immune destruction, and sustaining tumour-promoting inflammation\u0026mdash;all of which contribute to dynamic TME remodelling (Song et al., 2025). Driver gene methylation introduces an additional regulatory axis for tumour stratification, offering insights into subtype-specific mechanisms of transcriptional control that may not be visible from gene expression alone (Chen et al., 2017).\u003c/p\u003e \u003cp\u003eUnderstanding these interlinked TME-associated and driver-gene methylation programmes is essential for accurately comparing patient tumours with preclinical cervical cancer cell lines. Cell lines, while invaluable for mechanistic and drug-response studies, represent purified epithelial systems devoid of immune and stromal complexity. As a result, apparent molecular discrepancies between tumours and cell lines may arise not from intrinsic biological divergence but from the absence of microenvironmental influences such as immune infiltration, stromal remodelling, or EMT (Raghavan et al., 2021). An integrated visual overview of the analytical design is provided in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, which presents the framework as a three-part graphical workflow progressing from data preparation, through biological inference, to translational modelling. The figure illustrates how tumour microenvironment (TME) heterogeneity in cervical cancer is interrogated through DNA methylation landscapes in a structured and reproducible manner. 
\u003cb\u003ePart 1\u003c/b\u003e depicts data retrieval, harmonisation, and quality control. DNA methylation data are obtained from open-source patient cohorts, followed by rigorous sample- and probe-level quality control, probe filtering, and matrix harmonisation to ensure that downstream analyses are driven by biologically meaningful variation rather than technical artefacts. \u003cb\u003ePart 2\u003c/b\u003e represents the biological discovery core of the framework. Feature-restricted differential methylation analysis isolates high-variance and biologically informative CpG sites, including TME- and driver-associated features. These epigenetic signatures are then used to reconstruct latent tumour states through unsupervised learning, with dimensionality reduction and clustering revealing discrete methylation endotypes. Functional enrichment analysis assigns biological meaning to these tumour states by linking them to coherent pathways and regulatory programmes. \u003cb\u003ePart 3\u003c/b\u003e focuses on tumour\u0026ndash;model concordance, predictive modelling, and performance evaluation. Feature-restricted similarity metrics are used to quantitatively compare patient tumours with experimental systems, particularly cervical cancer cell lines, enabling assessment of molecular fidelity. Multi-feature integration combines TME- and driver-linked methylation signals to stabilise model ranking and support translational inference. Predictive modelling and performance metrics demonstrate that the integrated feature space provides robust, interpretable, and biologically grounded discrimination. 
Collectively, Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e emphasises the central principle of the framework: transforming tumour heterogeneity, epigenetic state reconstruction, and experimental model selection into a unified inference problem that directly links patient-specific molecular context to rational prioritisation of laboratory systems for downstream mechanistic and therapeutic investigation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eCERVICAL CANCER EXPERIMENTAL MODELS\u003c/h3\u003e\n\u003cp\u003eCervical cancer experimental models constitute essential platforms for mechanistic investigation, therapeutic screening, and biomarker development. However, accumulating evidence indicates that \u003cem\u003ein vitro\u003c/em\u003e and \u003cem\u003eex vivo\u003c/em\u003e models differ markedly in their ability to recapitulate the molecular architecture of patient tumours. Large-scale benchmarking studies have shown systematic discordance between primary tumours and commonly used cancer cell lines, even after rigorous normalisation, reflecting the dominant influence of tumour microenvironment (TME) composition, viral oncogenic context (HPV), and stromal and immune admixture on patient-derived molecular profiles. These discrepancies underscore the need for principled, data-driven approaches to evaluate model fidelity rather than assuming equivalence across available systems (Raghavan et al., 2021). At the same time, integrative pan-cancer analyses encompassing over a thousand human cancer cell lines have demonstrated that many models do retain disease-specific molecular programs, provided that similarity is assessed within biologically relevant feature spaces. 
A critical implication of these findings is that widely used drug-repurposing and pharmacogenomic frameworks (including perturbational matching approaches based on LINCS L1000 and Connectivity Map, as well as correlation-driven resources such as CTRP, GDSC, and PRISM) implicitly depend on the assumption that selected biomodels faithfully reflect the molecular state of the patient tumours to which therapeutic hypotheses are applied (Chawla et al., 2022; Eskra et al., 2023; Szalai et al., 2019). This assumption extends beyond cell lines to more complex systems such as patient-derived xenografts (PDXs) and organoids (PDOs), where accurate drug-response inference requires alignment of immune, stromal, and epithelial programmes between models and clinically defined tumour subgroups (Fashemi et al., 2023; Liu et al., 2023; Zhao et al., 2021).\u003c/p\u003e \u003cp\u003eWithin this context, we introduce a computational framework for quantifying tumour\u0026ndash;model concordance in a biologically constrained and reproducible manner. Rather than assuming equivalence across experimental systems or relying on global similarity metrics that are often dominated by housekeeping features, the framework restricts concordance analyses to disease- and context-informative molecular signals, such as differentially methylated or transcriptionally variable loci that encode tumour microenvironment states, thereby enabling principled comparison between patient tumours and candidate experimental models. 
By reconstructing patient-specific TME-associated epigenetic programmes and systematically correlating these signatures with corresponding cervical cancer experimental models, the approach identifies those models that most faithfully capture the transcriptional and epigenetic landscape of clinical disease (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Importantly, this strategy transforms experimental model selection from a heuristic process into an explicit inference problem. Aligning individual tumours or tumour clusters with their closest-matching cell lines, patient-derived xenografts (\u003cem\u003ePDXs\u003c/em\u003e), or patient-derived organoids (\u003cem\u003ePDOs\u003c/em\u003e) enables biologically grounded prioritisation of therapeutic compounds whose molecular perturbation profiles are predicted to reverse, reinforce, or exploit the defining features of each tumour state. In doing so, this framework not only exposes the biological heterogeneity of cervical cancer but also establishes a rational and scalable basis for selecting experimental systems that maximise the translational relevance of downstream drug-repurposing and precision oncology analyses.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eCervical cancer experimental models used in this study, including tumour origin, HPV status, morphology, typical experimental applications, and key literature references.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" 
colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCellosaurus ID (RRID)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTumour Origin\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV Status\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMorphology\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTypical Use / Notes\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eCitation\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCa Ski (RRID:CVCL_1100)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMetastatic cervical squamous carcinoma (epidermal origin \u0026rarr; metastasis to small intestine)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV16, ~\u0026thinsp;600 copies/cell\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial-like, adherent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eClassic HPV16⁺ model; used for metastasis, cisplatin resistance, and viral oncogene expression studies\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Koraneekit et al., 2018; Naidu et al., 2018)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCAL-39 
(RRID:CVCL_1109)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrimary cervical squamous carcinoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV18 positive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial, adherent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHPV18⁺ reference model; drug response assays\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Zięba et al., 2018; Gioanni et al., 1993)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSiHa (RRID:CVCL_0032)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrimary cervical squamous carcinoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV16, low copy number (~\u0026thinsp;1\u0026ndash;2 copies/cell)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial, adherent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eLow-copy HPV16 model; radiotherapy, DDR, and viral\u0026ndash;host integration studies\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Filippova et al., 2014)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHeLa (RRID:CVCL_0030)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCervical adenocarcinoma (primary tumour)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV18 positive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial, adherent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003eWidely used immortal cervical cell line; HPV18 oncogene biology, transcriptomics, and therapeutic testing\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Liu et al., 2012)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eME-180 (RRID:CVCL_1401)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrimary cervical epidermoid carcinoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV39 positive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial, adherent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHPV39 variant model; EMT and drug testing studies\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Seshadri et al., 2021)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMS751 (RRID:CVCL_4996)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMetastatic cervical carcinoma (lung metastasis)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV18 positive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHigh HPV18 copy model; immune-related transcriptomic studies\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Lin et al., 2020)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSiSo (RRID:CVCL_2193)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSquamous cervical carcinoma\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV18 positive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHPV18 model for drug and immune interaction studies\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Li et al., 2025)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eC-33 A (RRID:CVCL_1094)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCervical carcinoma, HPV negative\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV-negative\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial, adherent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHPV-independent cervical model; tumour suppressor and DDR gene studies\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Conlon et al., 2021)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHT-3 (RRID:CVCL_1293)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePrimary cervical carcinoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV-negative\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial, adherent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRadio-resistant HPV-negative model; often used in chemoradiation and cisplatin sensitivity assays\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Sahu et al., 2024)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eBOKU (RRID:CVCL_1089)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCervical squamous carcinoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV16 positive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRarely used HPV16⁺ model; utilised in HPV gene regulation and immune microenvironment studies\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Hiramoto et al., 2018)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSKG-IIIa (RRID:CVCL_1704)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCervical squamous carcinoma (Japanese origin)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV16 positive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eRepresents Asian HPV16⁺ squamous tumours; invasion and cell-cycle regulation studies\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Horikawa et al., 2015)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSW756 (RRID:CVCL_1727)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCervical squamous cell carcinoma (primary tumour)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV18 positive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial, adherent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c5\"\u003e \u003cp\u003eHPV18⁺ SCC model; EMT, TME, and chemotherapeutic response profiling\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Kamradt et al., 2000)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDoTc2 4510 (RRID:CVCL_1181)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCervical carcinoma\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eHPV16 positive\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEpithelial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eHPV16⁺ model used for immuno-oncology and methylation concordance analyses\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e(Vučković et al., 2023)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e"},{"header":"Materials","content":"\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003ePATIENT COHORTS\u003c/h2\u003e \u003cp\u003eTwo independent and complementary cervical cancer cohorts were analysed to ensure analytical robustness, biological generalisability, and cross-platform validation of tumour microenvironment (TME)\u0026ndash;associated molecular states. These cohorts were selected to capture both transcriptomic and epigenetic dimensions of cervical cancer biology across distinct clinical and virological contexts. The first dataset, TCGA-CESC, was obtained from the NCI Genomic Data Commons, a rigorously standardised and widely validated resource for high-quality molecular and clinical cancer data (Weinstein et al., 2013). 
TCGA-CESC provides bulk RNA-sequencing profiles with extensive clinical annotation, including tumour purity estimates, disease stage, and HPV status, and serves as a reference cohort for cervical cancer genomic studies (Burk et al., 2017). In this study, TCGA-CESC was leveraged to define transcriptionally informed tumour- and TME-associated programmes, establish biologically grounded feature spaces, and support cross-modal interpretation of methylation-derived tumour states. The second dataset, GSE279982, was retrieved from the NCBI Gene Expression Omnibus, an internationally trusted repository for functional genomics data (Barrett et al., 2013). Released in 2024, GSE279982 constitutes one of the largest cervical cancer methylome studies to date in an HIV-endemic setting, profiling 538 cervical samples from Nigerian women using the Infinium MethylationEPIC BeadChip (Kaur et al., 2023). The cohort includes HIV-positive cervical cancer, HIV-positive cervical intraepithelial neoplasia (CIN), HIV-positive cancer-free controls, and HIV-negative cervical cancer cases, accompanied by HPV genotyping and detailed clinical metadata. This dataset was originally generated to identify differentially methylated regions associated with cervical cancer progression under chronic HIV and HPV co-infection (Zheng et al., 2025). In this study, GSE279982 served as the primary epigenetic substrate for reconstructing TME-associated methylation states, performing feature-restricted differential analysis, and quantifying tumour\u0026ndash;model concordance. 
The cohort\u0026rsquo;s depth, epidemiological relevance, and metadata completeness make it particularly well suited for evaluating how viral co-infection shapes epigenetic tumour states and for testing the robustness of tumour\u0026ndash;model inference across clinically and biologically diverse patient populations.\u003c/p\u003e \u003cp\u003e \u003cb\u003eREFERENCE BIOMODEL DATASETS\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo enable systematic benchmarking of patient tumours against experimentally tractable cervical cancer models, complementary transcriptomic and epigenomic reference datasets were integrated based on data quality, coverage, and cross-study compatibility. These resources provide the molecular baselines required for quantitative tumour\u0026ndash;model concordance analysis. Transcriptomic profiles of cervical cancer cell lines were obtained from the Genomics of Drug Sensitivity in Cancer (GDSC2) repository, which provides uniformly processed RNA-sequencing data across a large panel of deeply characterised human cancer cell lines (Yang et al., 2013). GDSC2 offers harmonised expression profiles alongside extensive molecular and pharmacological annotations, making it a widely used reference resource for translational and drug-response studies. In this study, these data support the identification of transcriptionally defined tumour- and TME-associated programmes and enable cross-modal interpretation of epigenetically inferred tumour states. Corresponding DNA methylation profiles for cervical cancer cell lines were retrieved from GEO (GSE68379), ensuring platform-consistent CpG coverage for direct comparison with patient tumour methylomes (Iorio et al., 2016). The availability of matched epigenomic data enables feature-restricted, CpG-level concordance analyses that minimise technical bias and focus on biologically informative methylation signals. 
Together, these curated reference model datasets provide a robust molecular backbone for evaluating tumour\u0026ndash;cell line fidelity across both transcriptional and epigenetic dimensions and support the rational selection of cervical cancer experimental systems for downstream mechanistic studies and therapeutic prioritisation.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eANALYTICAL FRAMEWORK AND IMPLEMENTATION\u003c/h3\u003e\n\u003cp\u003eAll analyses were conducted using a modular and reproducible bioinformatics framework designed to infer tumour microenvironment (TME)\u0026ndash;associated molecular states and to quantitatively align patient tumours with experimentally tractable cervical cancer models. The framework is implemented as a stepwise inference workflow that integrates data quality control, biologically constrained feature selection, unsupervised learning, tumour\u0026ndash;model concordance analysis, and functional interpretation within a unified analytical design. The workflow begins with rigorous data quality control (QC) tailored to each molecular modality. For DNA methylation data, probe-level and sample-level QC metrics\u0026mdash;including signal intensity distributions, detection p-values, and variance structure\u0026mdash;are evaluated to exclude technical artefacts and ensure that downstream patterns reflect true biological variation rather than noise (Sun et al., 2022; Iorio et al., 2016). Only high-confidence CpG sites shared across patient tumours and reference model platforms are retained for downstream analyses, ensuring cross-dataset comparability. The framework then applies feature-restricted differential analysis, focusing on high-variance and biologically informative loci rather than genome-wide averages. For methylation data, this involves identifying differentially methylated or highly variable CpG sites associated with tumour state, clinical strata, or TME composition using moderated statistical testing (Ritchie et al., 2015). 
This biologically constrained feature selection reduces dimensionality, increases statistical power, and prioritises regulatory signals linked to tumour biology and microenvironmental context. Curated feature sets are subsequently used for tumour microenvironment reconstruction and unsupervised structure discovery. Dimensionality reduction and clustering approaches\u0026mdash;including hierarchical clustering and graph-based community detection using Rphenograph\u0026mdash;are employed to resolve latent tumour states that reflect coordinated epigenetic and TME-associated programmes rather than continuous variation (Levine et al., 2015; McInnes et al., 2018). Cluster robustness and biological coherence are evaluated through stability assessment and enrichment analyses. A central component of the framework is feature-restricted tumour\u0026ndash;model concordance analysis. Patient-derived molecular signatures are quantitatively compared with cervical cancer model profiles using Spearman rank correlation, computed within the restricted feature space to assess preservation of biologically meaningful methylation programmes across systems (Wisniewski \u0026amp; Brannan, 2024). This strategy explicitly accounts for the absence of immune and stromal compartments in \u003cem\u003ein vitro\u003c/em\u003e models and enables biologically interpretable ranking of model fidelity at both patient and cluster levels. The framework further supports multi-feature integration and predictive modelling by combining tumour-intrinsic (driver-associated) and TME-linked molecular signals to enhance robustness and translational relevance. Model performance is evaluated using receiver operating characteristic (ROC) and precision\u0026ndash;recall curves, calibration profiles, and feature-importance metrics to ensure discrimination, stability, and interpretability (Haynes, 2013; McKight \u0026amp; Najab, 2010). 
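The feature-restricted concordance step described above can be illustrated with a short sketch: Spearman rank correlation is computed only over the CpG sites in the restricted feature set that are present in both the tumour and every candidate model, and models are then ranked by that correlation. This is a Python illustration under stated assumptions, not the authors' R code; the function names, CpG identifiers, model labels, and beta values below are all hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def rank_models(tumour, models, features):
    """Rank models by Spearman rho with the tumour over shared features."""
    shared = [f for f in features
              if f in tumour and all(f in m for m in models.values())]
    t = [tumour[f] for f in shared]
    scored = {name: spearman(t, [m[f] for f in shared])
              for name, m in models.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical beta values for one tumour and two candidate models.
tumour = {"cg_a": 0.1, "cg_b": 0.4, "cg_c": 0.7, "cg_d": 0.9}
models = {
    "model_A": {"cg_a": 0.2, "cg_b": 0.3, "cg_c": 0.8, "cg_d": 0.95},
    "model_B": {"cg_a": 0.9, "cg_b": 0.6, "cg_c": 0.3, "cg_d": 0.1},
}
features = ["cg_a", "cg_b", "cg_c", "cg_d", "cg_missing"]
ranking = rank_models(tumour, models, features)
```

Restricting to shared features before ranking keeps the comparison on CpG sites measured in both systems, which matters when tumour and cell-line data come from different array versions.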
Finally, clinical enrichment and functional pathway analyses are performed to contextualise molecularly defined tumour states. Cluster-level signatures are tested for enrichment of clinical, virological, and phenotypic annotations, while pathway enrichment analyses using FGSEA and MSigDB map epigenetic programmes to underlying biological mechanisms, including immune regulation, stromal activation, proliferation, and stress-response pathways (Kanehisa et al., 2016; Liberzon et al., 2015; Korotkevich et al., 2016). Collectively, this framework operationalises tumour heterogeneity and experimental model selection as a unified inference problem, providing a transparent, extensible, and biologically grounded methodological basis for linking patient-specific molecular context to rational model prioritisation and downstream translational analyses.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eSTATISTICAL ANALYSES\u003c/h2\u003e \u003cp\u003eAll statistical workflows were performed in R (v4.3 or later) using reproducible, script-based pipelines. Correlation analyses were conducted using Spearman\u0026rsquo;s rank correlation, chosen for its robustness to non-normal distributions and its suitability for cross-platform comparisons (methylation \u0026harr; expression; tumour \u0026harr; cell line) (Wisniewski \u0026amp; Brannan, 2024). Group-level comparisons across clinical strata\u0026mdash;including HIV status, HPV genotype, and TME-defined subtypes\u0026mdash;were assessed using Wilcoxon rank-sum tests for two-group contrasts or Kruskal\u0026ndash;Wallis tests for multi-group comparisons (Haynes, 2013b; McKight \u0026amp; Najab, 2010). Multiple hypothesis testing was controlled using the Benjamini\u0026ndash;Hochberg false discovery rate (FDR) procedure, with statistical significance defined at FDR\u0026thinsp;\u0026lt;\u0026thinsp;0.05 unless otherwise specified (Haynes, 2013). 
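For readers unfamiliar with the Benjamini-Hochberg procedure named above, the adjustment can be written in a few lines. The sketch below is a plain-Python illustration, not the authors' code; the function name `benjamini_hochberg` and the example p-values are hypothetical. Each p-value is scaled by m/rank, and a running minimum taken from the largest p-value downwards keeps the adjusted values monotone.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
# Tests passing the FDR < 0.05 threshold stated in the text.
significant = [i for i, q in enumerate(adj_p) if q < 0.05]
```

In this toy example, three raw p-values fall below 0.05 but only the first test survives adjustment, which shows why the threshold is applied to the adjusted values rather than the raw ones.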
For clustering analyses, both methylation-derived and expression-derived matrices were analysed using hierarchical clustering (complete linkage, Euclidean distance) implemented through base R, ggplot2 (Zhu et al., 2025; Wickham et al., 2025), and ComplexHeatmap (Kolde et al., 2025), supplemented by Rphenograph where high-dimensional TME signatures required graph-based community detection (Levine et al., 2015). Dimensionality reduction for TME profiling was performed using PCA and non-linear embeddings (t-SNE/UMAP) generated via Rtsne and uwot, respectively (Krijthe, 2023; McInnes et al., 2018). Functional enrichment analyses used FGSEA for pathway-level ranking (KEGG) and MSigDB for ontology-based and cancer hallmark processes (Kanehisa et al., 2016; Liberzon et al., 2015). Differential methylation testing used limma for per-cluster contrasts (Ritchie et al., 2015), and DMP-to-gene linking was done through Illumina 450K/EPIC probe annotation. Collectively, these statistical procedures supported a unified pipeline to profile the TME and infer the most appropriate cancer models.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eMethods\u003c/h3\u003e\n\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003eAPPLICATION TO THE GSE279982 METHYLOME\u003c/h2\u003e \u003cp\u003eThe framework was designed as a tumour-aware, feature-restricted integrative workflow that links patient epigenetic heterogeneity to rational selection of experimental models. Rather than treating quality control, clustering, and model mapping as independent analytical steps, the approach integrates these components into a single inference process in which tumour state discovery, biological interpretation, and model prioritisation are jointly optimised. 
Application to the GSE279982 DNA methylation dataset proceeded through a series of sequential stages: (i) data harmonisation and quality control, (ii) biologically constrained feature restriction, (iii) unsupervised discovery of methylation-defined tumour states, (iv) functional characterisation of cluster-specific regulatory programmes, (v) tumour\u0026ndash;model concordance analysis within restricted feature spaces, and (vi) clinical and virological contextualisation of inferred tumour states \u003cb\u003e(Fig.\u0026nbsp;2)\u003c/b\u003e. Together, these steps enable systematic reconstruction of tumour microenvironment\u0026ndash;associated epigenetic states and provide a coherent framework for evaluating how well experimental systems capture the molecular diversity observed in patient tumours \u003cem\u003e(see Supplementary Framework_R, Tutorials 1\u0026ndash;9).\u003c/em\u003e\u003c/p\u003e \u003cp\u003e \u003cb\u003e1. Data Sources and Harmonisation\u003c/b\u003e \u003c/p\u003e \u003cp\u003eGenome-wide DNA methylation profiles were obtained from GSE279982, generated using the Illumina Infinium MethylationEPIC BeadChip platform and accompanied by curated clinical metadata, including HIV status, HPV genotype, age, BMI, and cancer stage. Reference DNA methylation profiles for cervical cancer cell lines were obtained from GSE68379 to enable quantitative comparison between patient tumours and experimentally tractable in vitro systems. Processed β-value matrices provided by the original studies were used to maintain consistency with validated normalisation pipelines and to enable direct cross-dataset integration. Sample identifiers were harmonised across molecular and clinical metadata, and only samples with concordant annotations were retained for downstream analyses \u003cem\u003e(see Supplementary Framework_R, Tutorial 1)\u003c/em\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003e2. 
Methylation Quality Control and Probe Filtering\u003c/b\u003e \u003c/p\u003e\u003cp\u003eTo ensure that downstream tumour state discovery reflected biological signal rather than technical artefact, stringent DNA methylation\u0026ndash;specific quality control was applied. Sample-level QC assessed global β-value distributions, variance structure, and signal consistency across arrays. No samples exhibited aberrant hybridisation profiles or extreme variance, supporting retention of the full cohort. Probe-level filtering removed CpG sites mapping to sex chromosomes, probes overlapping known single-nucleotide polymorphisms (SNPs) at the CpG or single-base extension site, cross-hybridising probes, and probes failing detection thresholds across samples. This filtering step yielded a high-confidence CpG set shared across patient tumours and cervical cancer cell line platforms, ensuring technical compatibility for downstream concordance and comparative analyses \u003cem\u003e(see Supplementary Framework_R, Tutorial 2)\u003c/em\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003e3. Gene-Level Aggregation and Feature Harmonisation\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo improve biological interpretability and facilitate cross-system comparison, CpG-level methylation values were aggregated to the gene promoter level. CpGs annotated to promoter-proximal regions (TSS200 and TSS1500) were summarised by mean β-values per gene per sample. This representation reduces stochastic CpG-level noise while preserving regulatory signal relevant to transcriptional control. Only genes represented in both the patient and reference datasets were retained, generating a harmonised gene-by-sample methylation matrix used throughout subsequent analyses \u003cem\u003e(see Supplementary Framework_R, Tutorials 2 and 3).\u003c/em\u003e\u003c/p\u003e \u003cp\u003e \u003cb\u003e4. 
Biologically Constrained Feature Restriction\u003c/b\u003e \u003c/p\u003e \u003cp\u003eA central design principle of the framework is that tumour heterogeneity is encoded across distinct biological axes and must be interrogated within appropriately restricted feature spaces. Accordingly, feature restriction was applied prior to clustering and comparative mapping rather than post hoc. Three complementary feature sets were constructed: differentially methylated CpGs (DMPs) capturing disease- and endotype-specific epigenetic disruption, driver-associated CpGs reflecting tumour-intrinsic oncogenic programmes, and tumour microenvironment (TME)-associated CpGs representing immune and stromal context. Differential methylation analyses were performed using the limma framework, with moderated statistics and false discovery rate control. Features were ranked by effect size and statistical strength, yielding compact, biologically enriched feature spaces that maximised signal-to-noise and interpretability \u003cem\u003e(see Supplementary Framework_R, Tutorials 3\u0026ndash;6)\u003c/em\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003e5. Unsupervised Discovery of Methylation-Defined Tumour States\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo identify latent tumour structure without imposing clinical labels, unsupervised clustering was performed within TME-restricted methylation space using Rphenograph, a graph-based community detection algorithm well suited to high-dimensional molecular data. This analysis identified four robust tumour clusters (C1\u0026ndash;C4), each representing a distinct epigenetic and microenvironmental state. Cluster robustness was assessed through inspection of intra-cluster coherence and stability across subsampling. 
Low-dimensional embedding using UMAP and t-SNE was employed to visualise tumour relationships and confirm that clusters represented discrete biological communities rather than technical artefacts or continuous gradients \u003cem\u003e(see Supplementary Framework_R, Tutorial 7).\u003c/em\u003e\u003c/p\u003e \u003cp\u003e \u003cb\u003e6. Functional Characterisation of Tumour Endotypes\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo assign biological meaning to methylation-defined tumour states, cluster-specific functional enrichment analysis was performed. Differentially methylated genes for each cluster were ranked and analysed using FGSEA against MSigDB Hallmark, GO Biological Process, and KEGG gene sets. Normalised enrichment scores (NES) were used to identify coherent biological programmes, including immune activation, myeloid differentiation, cell-cycle regulation, metabolic reprogramming, and stress-response pathways. These functional signatures provided mechanistic context for tumour stratification and served as an additional biological layer for model mapping \u003cem\u003e(see Supplementary Framework_R, Tutorial 9)\u003c/em\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003e7. Tumour\u0026ndash;Model Concordance within Restricted Feature Spaces\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo evaluate model fidelity, tumour\u0026ndash;model similarity was quantified using Spearman rank correlation computed within restricted feature spaces. Concordance analyses were performed independently within differentially methylated position (DMP)\u0026ndash;restricted, driver-restricted, and TME-restricted methylation matrices, as well as across integrated feature combinations. This strategy enabled systematic dissection of tumour-intrinsic versus contextual contributions to model similarity. Correlation-based similarity was selected for its robustness to cross-platform variability and non-normal data distributions. 
Concordance profiles were resolved at the individual patient level and further stratified by tumour cluster, ensuring that experimental model prioritisation reflected biologically defined tumour states rather than cohort-averaged signals \u003cem\u003e(see Supplementary Framework_R, Tutorials 4\u0026ndash;6).\u003c/em\u003e\u003c/p\u003e \u003cp\u003e \u003cb\u003e8. Integrated Feature Modelling and Consensus Model Inference\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo formalise experimental model selection across feature spaces, similarity scores derived from DMP-restricted, driver-restricted, and TME-restricted analyses were integrated into a unified predictive framework. Patient-level top-ranked models from each feature space were combined using a consensus-label strategy, prioritising agreement across independent biological axes. A random forest classifier was trained to predict consensus model labels using feature-restricted similarity scores as predictors. Model performance was evaluated using repeated cross-validation, confusion matrices, variable importance analysis, and prediction probability distributions. This approach quantified the relative contribution of epigenetic disruption, oncogenic identity, and microenvironmental context to overall model fidelity and provided an objective basis for prioritising experimental systems that most closely recapitulate patient tumour states \u003cem\u003e(see Supplementary Framework_R, Tutorials 4\u0026ndash;6).\u003c/em\u003e\u003c/p\u003e\u003cp\u003e \u003cb\u003e9. Clinical and Virological Contextualisation\u003c/b\u003e \u003c/p\u003e \u003cp\u003eFinally, methylation-defined tumour states and biomodel concordance results were evaluated for association with clinical and virological variables, including HIV status, HPV genotype, age, BMI, and cancer stage. 
Enrichment patterns were visualised on UMAP embeddings and cluster-resolved plots, and group-level differences were assessed using non-parametric statistical tests. This analysis established that the inferred tumour states capture clinically relevant heterogeneity while remaining independent of conventional staging alone \u003cem\u003e(see Supplementary Framework_R, Tutorial 8).\u003c/em\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eDEVELOPMENT USING TCGA-CESC TRANSCRIPTOMES\u003c/h2\u003e \u003cp\u003eThe framework was initially developed and calibrated using transcriptomic data from the TCGA-CESC cohort to establish a biologically grounded and technically robust basis for tumour\u0026ndash;model inference prior to its application to epigenetic data. TCGA-CESC was selected as the development dataset because it represents the most comprehensively annotated reference cohort for cervical cancer, providing high-quality RNA-sequencing profiles alongside detailed clinical, histological, and virological (HPV status) metadata. In this phase, TCGA-CESC transcriptomes were used to define and validate core analytical components, including feature-restricted similarity estimation, tumour microenvironment (TME)\u0026ndash;aware stratification, and patient\u0026ndash;in vitro model concordance logic. By leveraging both global gene expression patterns and biologically informed gene sets, such as cancer-driver genes and TME-associated gene signatures, the framework was tuned to distinguish tumour-intrinsic transcriptional programmes from microenvironment-driven variation that is incompletely represented \u003cem\u003ein vitro\u003c/em\u003e. The strong HPV dependence, well-characterised epithelial\u0026ndash;mesenchymal gradients, and pronounced immune heterogeneity present in TCGA-CESC enabled systematic evaluation of the framework\u0026rsquo;s ability to recover known biological structure without explicit supervision. 
This transcriptome-based development phase established the conceptual and computational foundations of the analytical approach, ensuring that subsequent extension to DNA methylation data preserved biological interpretability, model discrimination, and translational relevance across molecular layers \u003cb\u003e(Fig.\u0026nbsp;2)\u003c/b\u003e.\u003c/p\u003e \u003cp\u003e\u003cb\u003eFigure 2.\u003c/b\u003e \u003cb\u003eStudy design and development framework.\u003c/b\u003e Schematic overview of the analytical workflow illustrating data sources, core analytical modules, and the integration logic underpinning tumour\u0026ndash;model inference. DNA methylation profiles from GSE279982 (patient tumours) and GSE68379 (cervical cancer cell lines) were harmonised, quality controlled, probe-filtered, and promoter-aggregated to generate biologically interpretable methylation matrices. Feature restriction was then applied to construct three complementary tumour representations: \u003cb\u003e(i)\u003c/b\u003e tumour microenvironment (TME)\u0026ndash;restricted CpGs, \u003cb\u003e(ii)\u003c/b\u003e differentially methylated positions (DMPs), and \u003cb\u003e(iii)\u003c/b\u003e cancer driver\u0026ndash;associated CpGs. TME-restricted features were used for unsupervised clustering via Rphenograph to define methylation-based tumour states, together with dimensionality reduction and pathway enrichment analyses to assign biological meaning to each endotype. In parallel, tumour\u0026ndash;model similarity scores were computed independently within each restricted feature space to evaluate the fidelity of in vitro systems to patient tumours. Outputs from the DMP-, driver-, and TME-restricted concordance analyses were integrated within a consensus inference framework, calibrated using TCGA-CESC transcriptomic data, and formalised through a random forest classifier to derive robust prioritisation of experimental models. 
Final tumour states and model assignments were subsequently contextualised with clinical and virological variables to assess relevance to HIV status, HPV genotype, and disease characteristics, forming the complete development and application workflow.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eClassifier architecture\u003c/h2\u003e \u003cp\u003eA multi-feature, weakly supervised classification architecture was implemented to infer optimal experimental models by integrating complementary epigenetic signals (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003e). The classifier operates on three biologically constrained DNA methylation feature spaces: \u003cb\u003e(i)\u003c/b\u003e tumour microenvironment (TME)\u0026ndash;restricted CpGs, \u003cb\u003e(ii)\u003c/b\u003e cancer driver gene\u0026ndash;restricted CpGs, and \u003cb\u003e(iii)\u003c/b\u003e intra-cluster HIV-associated differentially methylated positions (DMPs). Each feature space is analysed independently to compute tumour\u0026ndash;model similarity using Spearman rank correlation, ensuring sensitivity to monotonic epigenetic concordance while remaining robust to outliers and non-normal data distributions. To enable joint learning across heterogeneous feature spaces, within-patient normalisation of similarity scores is applied, transforming absolute correlations into patient-specific relative enrichment measures (Z-scores) (Andrade et al., 2021). This harmonisation preserves the internal ranking of model preferences for each patient while removing scale-dependent biases between feature domains. For each patient, top-ranked experimental models are identified independently within each feature space, yielding multiple, potentially discordant model assignments that reflect distinct biological axes of tumour heterogeneity. 
In the absence of gold-standard labels for model fidelity, a weakly supervised consensus labelling strategy is adopted. Feature-specific top predictions are reconciled into a single consensus model label per patient, prioritising agreement across feature spaces and resolving conflicts using biologically motivated precedence rules. These consensus labels serve as training targets while retaining patient-specific molecular context. A Random Forest classifier is trained using the harmonised similarity features from all three feature spaces to predict the consensus model label. Random Forests were selected for their capacity to capture non-linear feature interactions, tolerate correlated predictors, and support robust rank-based decision boundaries. Model training is performed using repeated stratified cross-validation to minimise overfitting and ensure generalisability. The trained classifier outputs probabilistic model assignments for each patient, enabling interpretable and confidence-aware prioritisation of experimental systems. Feature importance analysis is used to quantify the relative contribution of TME, driver-associated, and HIV-linked epigenetic signals, ensuring that predictive performance reflects biologically meaningful integration rather than dominance by a single feature class \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e3\u003c/span\u003e\u003cb\u003e)\u003c/b\u003e. 
See \u003cem\u003eSupplementary Framework_R\u003c/em\u003e, \u003cem\u003eTutorials 4\u0026ndash;6\u003c/em\u003e, for the core logic of the classifier, and \u003cem\u003eTutorial 9\u003c/em\u003e for the consensus and integration logic.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eOPEN-SOURCE DATA RETRIEVAL AND COHORT HARMONISATION\u003c/h2\u003e \u003cp\u003ePublicly available DNA methylation datasets were programmatically retrieved and harmonised to establish an analysis-ready foundation for the study. The primary patient cohort (GSE279982) comprised genome-scale Illumina EPIC methylation profiles from cervical tumours collected from HIV-positive and HIV-negative women, accompanied by extensive clinical annotations including HIV status, HPV genotype, tumour stage, age, BMI, and tumour site. A complementary reference cohort (GSE68379) provided DNA methylation profiles for a curated panel of cervix-derived cancer cell lines. Systematic inspection confirmed high metadata completeness in both cohorts, with minimal missingness and no duplicated or unmapped sample identifiers. Processed patient β-value matrices and reconstructed cell line β-values from raw IDAT files exhibited biologically valid methylation ranges. Explicit alignment of assay matrices with clinical and experimental metadata ensured exact correspondence between samples and annotations. More than 440,000 CpG probes were shared between patient tumours and cell line datasets, establishing a substantial common feature space for integrative and comparative analyses. 
Together, these results demonstrate successful retrieval, curation, and harmonisation of patient and reference methylation datasets, providing a robust and reproducible input for downstream tumour\u0026ndash;model concordance analysis, epigenetic state inference, and translational investigations.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eDATA PREPROCESSING, NORMALISATION, AND QUALITY CONTROL\u003c/h2\u003e \u003cp\u003eRigorous preprocessing and quality control were applied to generate harmonised, high-confidence DNA methylation profiles suitable for integrative tumour\u0026ndash;model analyses. Sample-level quality assessment using raw IDAT data from cervical cancer cell lines demonstrated uniformly high signal quality with no evidence of failed arrays, supporting retention of all model samples \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e4\u003c/span\u003e\u003cb\u003e)\u003c/b\u003e. Probe-level filtering removed low-confidence and biologically confounding CpGs, including probes failing detection thresholds, cross-reactive probes, probes overlapping common SNPs, sex chromosome probes, and non-autosomal loci, substantially reducing technical noise. Background and dye-bias correction using Noob normalisation yielded stable and biologically valid β-value distributions across all cancer models. These probe filters were then consistently applied to the patient tumour cohort, ensuring both tumours and models were represented within an identical CpG universe. After harmonisation, 430,885 high-confidence CpG sites were shared between patient tumours and cervical cancer models, with perfectly aligned assay matrices and metadata. 
This preprocessing framework effectively minimised technical artefacts while preserving biologically relevant variation, providing a robust and unbiased foundation for downstream clustering, differential methylation analysis, tumour\u0026ndash;model matching, and translational inference.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eFEATURE-RESTRICTED DIFFERENTIAL EXPRESSION ANALYSIS (DEA)\u003c/h2\u003e \u003cp\u003e \u003cb\u003eDEA-restricted Feature Space\u003c/b\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e1. Differentially Methylated Probes\u003c/h2\u003e \u003cp\u003eDifferential methylation analysis using \u003cem\u003elimma\u003c/em\u003e revealed widespread and statistically robust HIV-associated epigenetic alterations across the cervical cancer methylome. Using QC-filtered, normalised, and harmonised methylation profiles, β-values were transformed to M-values to enable linear modelling, and probe-wise models were fitted to compare HIV-positive versus HIV-negative tumours with empirical Bayes moderation. This analysis identified 95,378 significantly differentially methylated positions (DMPs) after false discovery rate correction (FDR\u0026thinsp;\u0026lt;\u0026thinsp;0.05) and effect-size filtering, indicating extensive HIV-linked epigenetic reprogramming in cervical cancer. Both hypermethylated and hypomethylated CpGs were detected, with large effect sizes and strong statistical support, reflecting coordinated regulatory changes rather than stochastic variation \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003eA\u003cb\u003e)\u003c/b\u003e. Annotation of significant DMPs demonstrated enrichment across promoters, gene bodies, CpG islands, shores, and open-sea regions, linking HIV status to broad regulatory and genomic contexts relevant to tumour biology. 
These DMPs constitute a high-confidence, disease-relevant epigenetic signature that directly informed downstream tumour stratification, tumour\u0026ndash;model concordance analysis, and biomarker discovery. Inspection of the top 500 HIV-associated DMPs further confirmed that the restricted feature space captured structured and biologically coherent epigenetic variation across patients. Scaled β-value heatmaps revealed coordinated methylation patterns with clear separation by HIV status, alongside additional stratification by HPV group and tumour stage \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003eB\u003cb\u003e)\u003c/b\u003e. Notably, these CpGs organised into coherent methylation blocks rather than exhibiting stochastic variation, supporting their relevance as markers of biologically distinct tumour states.\u003c/p\u003e \u003cp\u003e \u003cb\u003e2. Tumour-Model Mapping\u003c/b\u003e \u003c/p\u003e \u003cp\u003eRestricting tumour\u0026ndash;model similarity analysis to differentially methylated probes (DMPs) associated with HIV status and cervical cancer biology substantially reshaped the structure, dynamic range, and interpretability of tumour\u0026ndash;model concordance. Following harmonisation, 430,885 CpGs were shared between patient tumours and reference cell line datasets; restriction to the disease-associated DMP set yielded a focused feature space of 95,378 CpGs. Similarity calculations within this restricted space produced a markedly expanded correlation range and enhanced contrast between high- and low-concordance models relative to genome-wide methylation similarity, indicating effective suppression of background and non-informative methylation signal. 
DEA-restricted Spearman similarity profiles revealed pronounced inter-patient heterogeneity in model alignment, with correlation values varying systematically across tumours rather than remaining uniform \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e6\u003c/span\u003e\u003cb\u003e)\u003c/b\u003e. Individual cell lines exhibited distinct trajectory shapes across the patient cohort, with some models showing consistently higher concordance across subsets of tumours, while others displayed lower or more variable alignment. This pattern indicates that experimental model suitability is not global but tumour-state dependent when similarity is evaluated within a disease-relevant epigenetic space. Annotation of these similarity profiles with clinical metadata revealed structured stratification by HIV status, HPV genotype, and cancer stage. Tumours stratified by HIV status demonstrated distinct correlation distributions, consistent with HIV-associated epigenetic programmes exerting a measurable influence on tumour\u0026ndash;model similarity when analysis is constrained to biologically relevant CpGs. Additional stratification by HPV group and tumour stage further refined these profiles, uncovering genotype- and stage-dependent shifts in model concordance that were not apparent under genome-wide correlation. These structured patterns contrast sharply with the comparatively compressed and homogeneous similarity observed in global methylation analyses. Visual integration of DEA-restricted similarity profiles with heatmap-based representations of HIV-associated methylation structure \u003cb\u003e(top DMPs;\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e5\u003c/span\u003eB\u003cb\u003e)\u003c/b\u003e demonstrated that tumours sharing coherent epigenetic states also exhibited recurrent alignment with specific experimental models. 
Notably, subsets of cell lines consistently tracked together across patients, suggesting shared representation of underlying tumour states rather than averaged or nonspecific similarity. This convergence supports a state-aware tumour\u0026ndash;model matching paradigm, enabling rational prioritisation of experimental systems tailored to defined tumour contexts rather than one-size-fits-all selection.\u003c/p\u003e \u003cp\u003e \u003cb\u003e3. ML-Based Tumour\u0026ndash;Model Mapping Performance Evaluation\u003c/b\u003e \u003c/p\u003e\u003cp\u003eMachine-learning\u0026ndash;based prioritisation of tumour\u0026ndash;model similarity, built on DEA-restricted methylation concordance, enabled objective identification of high-confidence biomodels for each patient beyond na\u0026iuml;ve correlation ranking. Evaluation of the trained Random Forest classifier demonstrated excellent discriminative ability, highly consistent prioritisation behaviour, and biologically meaningful integration of absolute and relative similarity features. The Precision@K profile showed that the algorithm performs extremely well at the clinically relevant end of the ranking spectrum, with precision remaining at or near 1.0 for the top one to three predicted models before declining gradually thereafter \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e7\u003c/span\u003eB\u003cb\u003e)\u003c/b\u003e. This structured decay indicates a confident ranking system that sharply distinguishes optimal from suboptimal biomodels rather than exhibiting diffused or random prioritisation. The ROC curve further supported this behaviour, demonstrating near-perfect separation between top-ranked and non-top-ranked biomodels across patients, with an AUC of 1.0 \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e7\u003c/span\u003eA\u003cb\u003e)\u003c/b\u003e. 
Importantly, this evaluation treats prioritisation as a cross-patient classification problem; maintaining such high discrimination performance across heterogeneous tumours demonstrates strong generalisability of the learned similarity function and confirms that DEA-restricted similarity features provide sufficient signal for robust classification even under weak supervision. To assess translational robustness, we examined whether prioritised models were consistently selected across patients rather than arising from stochastic or unstable ranking. Stability analysis of the Top-3 predictions revealed recurrent enrichment of specific biomodels, indicating convergence toward a small subset of highly representative cervical cancer models \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e7\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. Cell lines such as \u003cem\u003eMS751\u003c/em\u003e, \u003cem\u003eSKG-IIIA\u003c/em\u003e, and \u003cem\u003eTC-YIK\u003c/em\u003e were repeatedly prioritised, suggesting that these models capture stable biological programmes recurrently observed across DEA-restricted tumour methylomes. This pattern supports the biological grounding of the framework, demonstrating systematic preference for reproducible tumour-relevant models rather than patient-specific noise. Finally, regression analysis formally tested whether relative DEA similarity contributes meaningful information beyond absolute correlation. A positive association between absolute DEA correlation and patient-normalised DEA-Z scores was observed \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e7\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e, indicating that while relative performance within a patient correlates with absolute signal strength, it does not collapse into it. 
An R\u0026sup2; of approximately 0.21 indicates that relative similarity shares only about one fifth of its variance with absolute concordance, so most of the information it carries is independent of absolute correlation. This supports the conceptual motivation of the framework: models with comparable absolute methylation concordance may differ markedly in their relative enrichment within a patient-specific context, and capturing this distinction is essential for biologically rational and clinically meaningful prioritisation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eDriver Gene-Restricted Feature Space\u003c/h2\u003e \u003cp\u003eDriver genes directly shape tumour biology through effects on proliferation, DNA damage response, and oncogenic signalling; consequently, methylation changes at driver loci are more likely to reflect tumour lineage, oncogenic programmes, and actionable biology than genome-wide background methylation. Restricting similarity analysis to driver-linked CpGs therefore increases biological interpretability and enhances signal relevant for biomodel selection. Restriction to this driver-associated feature space yielded a compact yet highly informative molecular representation that captured structured and biologically meaningful variation across patients. The heatmap of top driver-associated CpGs revealed coherent blocks of hyper- and hypomethylation with clear patient stratification aligned to HIV status, HPV genotype, and tumour stage, indicating that driver methylation states encode both oncogenic and context-specific tumour biology \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003eA\u003cb\u003e)\u003c/b\u003e. 
Spearman correlation profiling between patient tumours and candidate biomodels within this driver-restricted space produced consistently elevated similarity values, with recurrent correlation peaks for specific models across the cohort \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003eB\u003cb\u003e)\u003c/b\u003e, suggesting the presence of conserved driver-programme archetypes shared across subsets of tumours. Machine-learning evaluation further confirmed the robustness of this driver-restricted space for biomodel prioritisation. Precision@K analysis demonstrated extremely high precision among the top-ranked predictions \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e, while ROC analysis showed near-perfect discriminative performance between top-ranked and non-top-ranked biomodels \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. Stability analysis of Top-3 selections revealed recurrent prioritisation of a small subset of biomodels, particularly \u003cem\u003eMS751\u003c/em\u003e, \u003cem\u003eHELASF\u003c/em\u003e, \u003cem\u003eSKG-IIIA\u003c/em\u003e, \u003cem\u003eTC-YIK\u003c/em\u003e, and \u003cem\u003eOMC-1\u003c/em\u003e \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003eE\u003cb\u003e)\u003c/b\u003e, indicating convergence toward biologically representative and reproducible model choices rather than stochastic ranking behaviour. 
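The similarity and stability steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the beta values are synthetic, the model names (model_A to model_F) are hypothetical, and the rank-based Spearman helper ignores ties (a library routine such as SciPy's `spearmanr` would handle them).

```python
import numpy as np
from collections import Counter

def rank_columns(x):
    """Rank each column (0 = smallest). Ties are not averaged in this sketch."""
    return np.argsort(np.argsort(x, axis=0), axis=0).astype(float)

def spearman_matrix(patients, models):
    """Spearman correlation of every patient column with every model column.

    patients: (n_cpgs, n_patients) methylation values over a restricted CpG set
    models:   (n_cpgs, n_models)   methylation values for candidate cell lines
    """
    rp, rm = rank_columns(patients), rank_columns(models)
    rp -= rp.mean(axis=0)
    rm -= rm.mean(axis=0)
    rp /= np.linalg.norm(rp, axis=0)
    rm /= np.linalg.norm(rm, axis=0)
    return rp.T @ rm  # shape (n_patients, n_models)

def patient_z(corr):
    """Normalise each patient's correlations across models (within-patient Z)."""
    mu = corr.mean(axis=1, keepdims=True)
    sd = corr.std(axis=1, keepdims=True)
    return (corr - mu) / sd

def top_k_frequency(scores, names, k=3):
    """Count how often each model appears among a patient's top-k scores."""
    top = np.argsort(-scores, axis=1)[:, :k]
    return Counter(names[j] for row in top for j in row)

# Synthetic data (assumption): every tumour is a noisy copy of model_A.
rng = np.random.default_rng(0)
n_cpgs, n_patients = 500, 20
names = ["model_A", "model_B", "model_C", "model_D", "model_E", "model_F"]
models = rng.uniform(0, 1, size=(n_cpgs, len(names)))
patients = models[:, [0]] + rng.normal(0, 0.2, size=(n_cpgs, n_patients))

corr = spearman_matrix(patients, models)   # absolute concordance
z = patient_z(corr)                        # relative, within-patient enrichment
freq = top_k_frequency(z, names, k=3)      # Top-3 stability counts
print(freq.most_common(3))
```

In this toy setting model_A is selected for every patient, which is the kind of recurrent Top-3 pattern a stability analysis is designed to expose.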
Finally, regression analysis demonstrated a strong linear relationship between relative (within-patient) driver enrichment and absolute driver concordance, with patient-normalised driver Z-scores explaining approximately 61% of the variance in absolute correlation values \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003eF\u003cb\u003e)\u003c/b\u003e. This indicates that relative driver-associated signal is a dominant determinant of experimental model fidelity in this feature space, capturing biologically meaningful information beyond absolute similarity alone. Collectively, these results demonstrate that driver gene\u0026ndash;restricted methylation encodes stable and biologically grounded tumour identity, supports confident and reproducible model prioritisation, and provides a powerful framework for context-aware experimental system selection.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eTumour Microenvironment (TME)-restricted Feature Space\u003c/h2\u003e \u003cp\u003eRestriction of the feature space to tumour microenvironment (TME)\u0026ndash;associated differentially methylated CpGs further sharpened tumour stratification and biomodel prioritisation by explicitly focusing on epigenetic programmes linked to immune, stromal, and microenvironmental biology. Heatmap visualisation of the top 500 TME-associated CpGs revealed highly structured methylation blocks with clear segregation of patient tumours according to HIV status, HPV group, cancer stage, age, and BMI \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eA\u003cb\u003e)\u003c/b\u003e, indicating that TME-linked methylation captures clinically coherent tumour contexts rather than diffuse background signal. 
These CpGs exhibited coordinated hypo- and hypermethylation patterns across patients, consistent with stable microenvironmental states such as immune-inflamed versus immune-suppressed tumours. DEA-restricted TME similarity profiles demonstrated pronounced and non-random concordance patterns between patient tumours and cervical cancer biomodels \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eB\u003cb\u003e)\u003c/b\u003e. Several cell lines displayed recurrently high Spearman correlations across large subsets of patients, supporting the existence of conserved TME archetypes that are reproducibly represented in vitro. Integration of TME-restricted similarity into the joint predictive framework (DEA\u0026thinsp;+\u0026thinsp;driver\u0026thinsp;+\u0026thinsp;TME) substantially improved model selection performance. Precision@K analysis showed that prioritisation remained near unity for the top-ranked biomodels and declined smoothly thereafter \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e, indicating confident discrimination at clinically relevant ranks. Consistently, ROC analysis demonstrated excellent discriminative performance for the joint model \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eC; \u003cb\u003eAUC\u0026thinsp;\u0026asymp;\u0026thinsp;0.99)\u003c/b\u003e, confirming robust separation of top versus non-top biomodels across heterogeneous tumours. 
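As a hedged illustration of the two ranking metrics reported here, the sketch below computes Precision@K and a pooled ROC AUC from a patient-by-model score matrix. The scores and the binary "top versus non-top" labels are toy values, and pooling all patient-model pairs into one AUC is an assumption; this section does not state how labels or pooling were defined.

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Mean fraction of positives among each patient's k highest-scored models."""
    order = np.argsort(-scores, axis=1)[:, :k]
    hits = np.take_along_axis(labels, order, axis=1)
    return hits.mean()

def roc_auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney) identity; ties are not handled."""
    s = scores.ravel()
    y = labels.ravel().astype(bool)
    ranks = np.argsort(np.argsort(s)) + 1  # 1 = lowest score
    n_pos = y.sum()
    n_neg = y.size - n_pos
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy example: 2 patients x 4 candidate models.
scores = np.array([[0.9, 0.8, 0.3, 0.1],
                   [0.2, 0.7, 0.6, 0.4]])
labels = np.array([[1, 1, 0, 0],
                   [0, 1, 0, 1]])

print(precision_at_k(scores, labels, 1))  # 1.0
print(precision_at_k(scores, labels, 2))  # 0.75
print(roc_auc(scores, labels))            # 0.9375
```

Precision@K that stays near 1 for small K and then declines, as described above, corresponds to positives being concentrated at the head of each patient's ranking.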
Stability analysis further demonstrated convergence on a small subset of highly representative biomodels, most notably \u003cem\u003eMS751, HELASF\u003c/em\u003e, and \u003cem\u003eHELA\u003c/em\u003e \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eE\u003cb\u003e)\u003c/b\u003e, highlighting their recurrent suitability for modelling TME-driven cervical cancer biology. Finally, regression analysis revealed a strong linear relationship between absolute TME-restricted correlation and relative (patient-normalised) TME enrichment \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eF; \u003cb\u003eR\u0026sup2; \u0026asymp; 0.60)\u003c/b\u003e, demonstrating that relative concordance captures substantial additional signal beyond absolute similarity alone. Collectively, these results establish TME-restricted methylation as a powerful and complementary feature space that enhances biological interpretability, improves tumour\u0026ndash;model alignment, and strengthens translational confidence in experimental model selection.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eMULTI-FEATURE INTEGRATION AND PREDICTIVE MODELLING\u003c/h2\u003e \u003cp\u003eIntegrating multiple biologically constrained feature spaces further strengthened tumour\u0026ndash;model prioritisation, demonstrating that complementary epigenetic signals jointly enhance predictive performance. 
When combining DEA-restricted and driver-restricted methylation features (\u003cem\u003eDEA\u0026thinsp;+\u0026thinsp;Driver\u003c/em\u003e), the joint model achieved near-perfect discriminative ability, with the ROC curve indicating complete separation of top versus non-top models \u003cb\u003e(AUC\u0026thinsp;=\u0026thinsp;1.0;\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. Precision@K analysis showed exceptionally high accuracy at clinically relevant ranks, with precision remaining close to unity for the top three to four prioritised models before gradually declining \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e. This pattern reflects confident and selective ranking behaviour rather than diffuse or ambiguous assignment. Stability analysis revealed recurrent selection of a small subset of experimental systems\u0026mdash;most prominently \u003cem\u003eMS751, HELASF, SKG-IIIA\u003c/em\u003e, and \u003cem\u003eTC-YIK\u003c/em\u003e \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003eE\u003cb\u003e)\u003c/b\u003e\u0026mdash;indicating that driver-informed DEA features converge on reproducible tumour archetypes across patients. Consistent with this behaviour, regression of absolute versus relative driver-restricted correlations demonstrated a strong linear relationship \u003cb\u003e(R\u0026sup2; \u0026asymp; 0.61;\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e8\u003c/span\u003eF\u003cb\u003e)\u003c/b\u003e, confirming that patient-normalised relative enrichment captures a dominant and biologically meaningful component of tumour\u0026ndash;model concordance beyond absolute similarity alone. 
Extending this framework to jointly integrate DEA, driver, and tumour microenvironment\u0026ndash;restricted features (DEA\u0026thinsp;+\u0026thinsp;Driver\u0026thinsp;+\u0026thinsp;TME) yielded the most balanced and biologically expressive model. The joint classifier maintained excellent discriminative performance \u003cb\u003e(ROC AUC\u0026thinsp;\u0026asymp;\u0026thinsp;0.99;\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e while preserving high Precision@K across the top-ranked models \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e, indicating robust prioritisation even under increased feature complexity. Importantly, stability analysis again showed convergence on a consistent core of experimental systems, with \u003cem\u003eMS751, HELASF\u003c/em\u003e, and \u003cem\u003eHeLa\u003c/em\u003e most frequently selected \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eE\u003cb\u003e)\u003c/b\u003e. This convergence mirrors the strong and recurrent tumour\u0026ndash;model similarity structure observed in the TME-restricted similarity profiles \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eB\u003cb\u003e)\u003c/b\u003e, suggesting that integration of tumour-intrinsic, driver-linked, and microenvironmental methylation signals identifies models that best recapitulate both intrinsic tumour biology and contextual TME states. 
The strong association between relative and absolute TME-restricted correlations \u003cb\u003e(R\u0026sup2; \u0026asymp; 0.60;\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eF\u003cb\u003e)\u003c/b\u003e further confirmed that relative, within-patient enrichment is a critical determinant of model fidelity when microenvironmental features are considered. This is consistent with the structured TME-linked methylation states observed at the tumour level \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eA\u003cb\u003e)\u003c/b\u003e. Finally, integration of driver and tumour microenvironment features alone (Driver\u0026thinsp;+\u0026thinsp;TME) retained high predictive power, with ROC analysis demonstrating excellent separation of top versus non-top models \u003cb\u003e(AUC\u0026thinsp;\u0026asymp;\u0026thinsp;0.99;\u003c/b\u003e Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e and Precision@K curves showing strong performance at low K values \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e9\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e. Stability profiling revealed a partially overlapping yet distinct subset of recurrently selected models compared with driver-inclusive frameworks, consistent with preferential selection of systems that capture immune and stromal tumour contexts rather than purely tumour-intrinsic programmes. Regression analysis of absolute versus relative Driver\u0026thinsp;+\u0026thinsp;TME-restricted correlations showed a strong linear association \u003cb\u003e(R\u0026sup2; \u0026asymp; 0.52)\u003c/b\u003e, confirming that patient-normalised relative enrichment contributes substantial independent information beyond absolute concordance alone. 
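The R² values quoted for the absolute-versus-relative regressions can be read with the help of a short sketch. It assumes a single-predictor ordinary least-squares fit on synthetic data; the variable names and the simulated effect size are illustrative and do not come from the paper.

```python
import numpy as np

def linear_r2(x, y):
    """Fit y = slope * x + intercept by least squares and return (slope, intercept, R^2)."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Synthetic stand-ins (assumption): patient-normalised Z-scores and
# absolute correlations that share part, but not all, of their variance.
rng = np.random.default_rng(1)
z_scores = rng.normal(size=200)
abs_corr = 0.5 + 0.05 * z_scores + rng.normal(0, 0.05, size=200)

slope, intercept, r2 = linear_r2(z_scores, abs_corr)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.2f}")
```

For a single-predictor fit, R² equals the squared Pearson correlation, so R² ≈ 0.52 corresponds to |r| ≈ 0.72 and leaves about 48% of the variance unexplained by the other quantity.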
Collectively, these results demonstrate that multi-feature integration is synergistic rather than merely additive. DEA captures disease-specific epigenetic disruption, driver features encode oncogenic identity, and TME features reflect contextual tumour states. Their integrated use yields a robust, interpretable, and highly accurate predictive framework for tumour-aware experimental model selection, directly supporting translational and precision-modelling applications in cervical cancer.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eUNSUPERVISED CLUSTERING AND LATENT STRUCTURE DISCOVERY\u003c/h2\u003e \u003cp\u003eTo determine whether tumour microenvironment (TME)\u0026ndash;associated methylation patterns encode coherent latent tumour states, we performed unsupervised clustering and low-dimensional embedding using patient-wide similarity profiles derived exclusively from \u003cem\u003eTME\u003c/em\u003e-restricted CpGs. This strategy enables discovery of biologically meaningful tumour subgroups without imposing clinical labels or outcome-driven bias. Both UMAP and t-SNE embeddings demonstrated clear, compact, and well-separated clusters, indicating that TME-linked methylation patterns encode strong, non-random biological signal \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e10\u003c/span\u003eA,B\u003cb\u003e)\u003c/b\u003e. When analysis was restricted to TME-associated CpGs alone, four highly distinct tumour clusters \u003cem\u003e(C1\u0026ndash;C4)\u003c/em\u003e emerged with minimal overlap \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e10\u003c/span\u003eA,B\u003cb\u003e)\u003c/b\u003e. 
These clusters exhibited characteristic structural features, including compact communities consistent with relatively homogeneous immune or stromal states and elongated manifolds suggestive of gradual transitions in microenvironmental composition, such as immune infiltration or stromal activation. The close concordance between UMAP and t-SNE embeddings confirms that these structures are robust to dimensionality-reduction method and not driven by projection artefacts. Importantly, these TME-driven tumour states emerged independently of tumour-intrinsic driver information, demonstrating that microenvironment-associated methylation represents a dominant and orthogonal axis of tumour heterogeneity. This observation is consistent with the strong explanatory power of relative \u003cem\u003eTME\u003c/em\u003e-restricted enrichment observed in downstream modelling \u003cb\u003e(R\u0026sup2; \u0026asymp; 0.60)\u003c/b\u003e, indicating that TME-linked signal captures substantial variance in tumour identity beyond absolute similarity alone. Integration of driver and TME features preserved strong clustering while reducing background noise, yielding three coherent tumour communities in the joint \u003cem\u003eDriver\u0026thinsp;+\u0026thinsp;TME\u003c/em\u003e model \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e10\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. These communities displayed smooth transitional boundaries, consistent with a continuum of tumour states shaped jointly by intrinsic oncogenic programmes and extrinsic microenvironmental context. 
Extending integration to include disease-associated CpGs \u003cem\u003e(DEA\u0026thinsp;+\u0026thinsp;Driver\u0026thinsp;+\u0026thinsp;TME)\u003c/em\u003e retained this structured separation while further stabilising cluster geometry \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e10\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e, mirroring the high predictive performance and biological grounding observed in the corresponding joint prioritisation models. Across all embeddings, cluster assignments were concordant with Rphenograph-derived community structure, confirming that the observed tumour states reflect stable latent organisation rather than methodological artefacts. Notably, tumours with similar clinical annotations frequently segregated into different epigenetic states, while clinically distinct tumours often converged within shared \u003cem\u003eTME\u003c/em\u003e-driven clusters. This highlights the limitations of conventional clinicopathological stratification and underscores the central role of the tumour microenvironment in defining functional tumour identity.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003eCLINICAL ENRICHMENT AND PHENOTYPIC STRATIFICATION\u003c/h2\u003e \u003cp\u003eTo evaluate the clinical relevance of methylation-defined tumour states identified by the integrated \u003cem\u003eTME\u0026thinsp;+\u0026thinsp;driver\u003c/em\u003e framework, we examined the distribution of clinical, virological, and host-related variables across the three unsupervised UMAP clusters \u003cem\u003e(C1\u0026ndash;C3)\u003c/em\u003e. 
Dimensionality reduction of the joint \u003cem\u003eTME\u0026thinsp;+\u0026thinsp;driver\u003c/em\u003e methylation profiles revealed a highly structured embedding with three well-separated tumour communities, indicating that integration of tumour-extrinsic microenvironmental signals with driver-associated CpGs captures coherent biological organisation independent of clinical annotation \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e11\u003c/span\u003e\u003cb\u003e)\u003c/b\u003e. Stratification by HPV genotype demonstrated non-random enrichment patterns across the embedding \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e11\u003c/span\u003eA\u003cb\u003e)\u003c/b\u003e. HPV16-positive tumours clustered tightly within a restricted region of the UMAP space, consistent with a relatively homogeneous epigenetic programme. In contrast, HPV18-positive and other high-risk HPV\u0026ndash;associated tumours occupied partially overlapping but spatially distinct regions, while low-risk and unknown HPV types were more diffusely distributed. These patterns indicate that viral genotype exerts a measurable and persistent influence on the integrated tumour\u0026ndash;microenvironment methylation landscape. Host-related factors further stratified the embedding. When coloured by BMI group, tumours segregated along a principal UMAP axis, with overweight and obese cases preferentially localising to one cluster, whereas normal and underweight samples were more prominent in an opposing region \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e11\u003c/span\u003eB\u003cb\u003e)\u003c/b\u003e. This gradient suggests that host metabolic state is reflected in tumour-associated epigenetic features when microenvironmental and driver-linked signals are jointly considered. 
Age stratification revealed a similar structured distribution \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e11\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e, with younger patients clustering more tightly and older age groups showing increased dispersion, consistent with cumulative epigenetic divergence and greater tumour heterogeneity with advancing age. Stratification by HIV status revealed the most pronounced phenotypic organisation \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e11\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e. HIV-positive cervical cancers segregated preferentially into specific regions of the UMAP space, with limited overlap with HIV-negative cases. Notably, HIV-positive tumours exhibited increased spread across the embedding, consistent with heightened epigenetic heterogeneity potentially driven by chronic immune activation and microenvironmental remodelling. The persistence of this separation within the \u003cem\u003eTME\u0026thinsp;+\u0026thinsp;driver\u003c/em\u003e model indicates that HIV-associated immune and stromal methylation signatures act as a dominant axis of tumour stratification beyond tumour-intrinsic oncogenic features alone. Overlay of cluster assignments confirmed that clinical variables were not randomly distributed across \u003cem\u003eC1\u0026ndash;C3\u003c/em\u003e but instead showed clear enrichment patterns. Individual clusters captured distinct combinations of viral genotype, host immune status, and host-related phenotypes, supporting the biological validity of the inferred tumour states. 
Collectively, these results demonstrate that the integrated \u003cem\u003eTME\u0026thinsp;+\u0026thinsp;driver\u003c/em\u003e methylation framework defines clinically meaningful tumour communities that reflect coordinated interactions between oncogenic programmes, viral context, and host microenvironmental factors.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003eFUNCTIONAL ENRICHMENT ANALYSIS\u003c/h2\u003e \u003cp\u003eTo determine whether the methylation-defined tumour clusters represent biologically meaningful states rather than computational artefacts, functional enrichment analysis was performed on cluster-specific differentially methylated genes derived from the joint \u003cem\u003eTME\u0026ndash;Driver\u003c/em\u003e model \u003cb\u003e(R\u0026sup2; \u0026asymp; 0.52)\u003c/b\u003e. Genes were ranked according to cluster-specific methylation deviation and interrogated using Gene Ontology (GO) Biological Processes and KEGG pathway enrichment. This framework enabled systematic characterisation of the biological programmes underlying each tumour state and inference of their immunological and microenvironmental context. Across all clusters, enrichment analyses revealed highly structured and biologically coherent signatures, confirming that the identified methylation-based clusters reflect distinct tumour phenotypes rather than stochastic epigenetic variation \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e12\u003c/span\u003eA\u0026ndash;F\u003cb\u003e)\u003c/b\u003e. Cluster C1 was characterised by strong enrichment of biological processes related to immune surveillance, inflammatory signalling, and environmental sensing \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e12\u003c/span\u003eA\u003cb\u003e)\u003c/b\u003e. 
Prominent GO terms included sensory perception of chemical stimulus, detection of external stimuli, immune and inflammatory response pathways, neuronal fate commitment, and cell adhesion. Consistent with this, KEGG pathway analysis highlighted olfactory transduction, NOD-like receptor signalling, natural killer cell\u0026ndash;mediated cytotoxicity, IL-17 signalling, and complement and coagulation cascades \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e12\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e. Together, these features define a tumour state marked by active immune sensing and innate immune engagement, consistent with an immune-inflamed microenvironment. The enrichment of cytokine signalling and innate immune pathways suggests enhanced immune surveillance and inflammatory activity, indicating that tumours within this cluster may be particularly responsive to immune-modulating therapeutic strategies. Cluster C2 exhibited a distinct functional profile dominated by pathways involved in neuronal signalling, endocrine regulation, and cellular differentiation \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e12\u003c/span\u003eB\u003cb\u003e)\u003c/b\u003e. Enriched biological processes included neuron fate commitment, dopaminergic and neurotransmitter signalling, neuroactive ligand\u0026ndash;receptor interaction, cAMP signalling, calcium signalling, and hormonal regulation. KEGG analysis further supported this phenotype, revealing enrichment of neuroactive ligand\u0026ndash;receptor interaction, endocrine signalling pathways, microRNAs in cancer, and pathways regulating pluripotency \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e12\u003c/span\u003eE\u003cb\u003e)\u003c/b\u003e. 
Collectively, these findings indicate a neuroendocrine-like tumour state characterised by epigenetic regulation of intracellular signalling cascades rather than immune activation. This cluster is consistent with an immune-cold or immune-evasive phenotype marked by enhanced receptor-mediated signalling, increased transcriptional plasticity, and reduced immune engagement.\u003c/p\u003e \u003cp\u003eCluster C3 demonstrated strong enrichment for immune and stromal interaction pathways, including defence response to bacteria, cytokine\u0026ndash;cytokine receptor interaction, NOD-like receptor signalling, and complement and coagulation cascades \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e12\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. GO terms further highlighted immune response regulation, pattern recognition receptor signalling, inflammatory signalling, and extracellular matrix\u0026ndash;associated processes. KEGG enrichment revealed associations with viral infection pathways, including HIV and Kaposi sarcoma\u0026ndash;associated herpesvirus, reflecting immune activation within a virally influenced tumour microenvironment \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e12\u003c/span\u003eF\u003cb\u003e)\u003c/b\u003e. This profile is consistent with a tumour state shaped by chronic immune stimulation and inflammatory signalling, reflecting strong tumour\u0026ndash;immune\u0026ndash;stromal crosstalk. The prominence of viral response and cytokine-mediated pathways aligns with the biological context of HPV-driven cervical cancer and supports a central role for host\u0026ndash;pathogen interactions in shaping tumour epigenetic states. Collectively, these findings demonstrate that methylation-defined tumour clusters represent distinct and biologically interpretable endotypes rather than arbitrary computational groupings. 
The clusters capture immune-inflamed tumours characterised by active innate immunity \u003cem\u003e(C1)\u003c/em\u003e, neuroendocrine-like tumours dominated by signalling and differentiation programmes \u003cem\u003e(C2)\u003c/em\u003e, and immune\u0026ndash;stromal reactive tumours enriched for inflammatory and viral-response pathways \u003cem\u003e(C3)\u003c/em\u003e. Importantly, these patterns were consistent across GO and KEGG analyses, aligned with established tumour\u0026ndash;microenvironment biology, and largely independent of conventional clinical classifications. Together, these results validate joint TME\u0026ndash;Driver methylation profiling as a robust framework for tumour stratification and provide mechanistic insight into how epigenetic regulation shapes immune engagement, tumour behaviour, and potential therapeutic vulnerability.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003eINTRA-CLUSTER DIFFERENTIAL EXPRESSION ANALYSIS (DEA)\u003c/h2\u003e \u003cp\u003eIntra-cluster differential methylation analysis demonstrated that HIV infection induces endotype-specific epigenetic reprogramming within methylation-defined tumour states rather than a uniform effect across cervical cancer. By performing differential analysis separately within each cluster, this approach isolates HIV-associated methylation changes that are conditional on tumour epigenetic context, thereby avoiding signal dilution caused by cross-cluster heterogeneity. Within Cluster \u003cem\u003eC1\u003c/em\u003e, the volcano plot revealed a robust HIV-associated methylation signature characterised by a substantial number of significant differentially methylated CpGs (DMPs) and moderate-to-large effect sizes \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e13\u003c/span\u003eA\u003cb\u003e)\u003c/b\u003e. 
The distribution was skewed toward hypermethylation in HIV-positive tumours, with a smaller but distinct hypomethylated component. Several loci exceeded\u0026thinsp;\u0026minus;\u0026thinsp;log₁₀(P) values of 5, indicating reproducible and biologically meaningful epigenetic perturbation. Compared with other clusters, the log fold-change range was moderately constrained, suggesting a regulated yet consistent HIV response within this tumour endotype. Cluster \u003cem\u003eC2\u003c/em\u003e exhibited a markedly attenuated HIV-associated methylation profile \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e13\u003c/span\u003eB\u003cb\u003e)\u003c/b\u003e. Both the number of significant DMPs and their effect sizes were reduced relative to \u003cem\u003eC1\u003c/em\u003e and \u003cem\u003eC3\u003c/em\u003e, with log fold changes largely confined within \u0026plusmn;\u0026thinsp;0.3 and lower overall statistical significance. Nevertheless, clear bidirectional methylation changes were observed, indicating that HIV status remains biologically relevant even within this more epigenetically stable tumour state. The balanced presence of hyper- and hypomethylated CpGs suggests targeted modulation of specific regulatory loci rather than widespread epigenomic remodelling, consistent with a tumour endotype that is comparatively resistant to global HIV-driven epigenetic shifts. In contrast, Cluster \u003cem\u003eC3\u003c/em\u003e displayed the strongest and most extensive HIV-associated methylation reprogramming \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e13\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. The volcano plot revealed a dense and asymmetric burden of DMPs, with pronounced enrichment of hypomethylated CpGs in HIV-positive tumours alongside a substantial hypermethylated component. 
Effect sizes spanned a wide log fold-change range (approximately\u0026thinsp;\u0026minus;\u0026thinsp;0.6 to +\u0026thinsp;0.6), and top loci exceeded\u0026thinsp;\u0026minus;\u0026thinsp;log₁₀(P) values of 8, indicating strong and coordinated epigenetic disruption. The magnitude and density of these signals suggest that \u003cem\u003eC3\u003c/em\u003e represents a highly HIV-responsive tumour state, potentially reflecting an immune-active or microenvironmentally plastic context in which viral co-infection profoundly reshapes regulatory methylation landscapes. For comparison, a global differential methylation analysis pooling tumours across all clusters revealed extensive HIV-associated hypermethylation and hypomethylation across the cervical cancer methylome \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e13\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e. While this global analysis confirms a broad HIV effect, it obscures the pronounced cluster-specific differences in signal strength, directionality, and regulatory scope observed in the intra-cluster analyses. Together, these findings demonstrate that HIV-associated epigenetic reprogramming is strongly modulated by underlying tumour endotype, supporting the existence of cluster-specific intra-\u003cem\u003eTME-Driver\u003c/em\u003e sub-states rather than a single, uniform HIV-driven methylation programme.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec25\" class=\"Section3\"\u003e \u003ch2\u003eFunctional Enrichment Analysis\u003c/h2\u003e \u003cp\u003eIntegration of tumour microenvironment (TME)\u0026ndash; and driver-restricted methylation features revealed stable and biologically interpretable tumour states that align with cluster-specific functional programmes. 
UMAP projection of the integrated feature space demonstrated clear separation of tumour groups, with partial but non-random stratification by HIV status and cervical cancer diagnostic category \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e14\u003c/span\u003eA\u003cb\u003e).\u003c/b\u003e Rather than forming a continuous gradient, tumours segregated into discrete communities, indicating that joint modelling of contextual (TME) and oncogenic (driver) methylation signals captures latent tumour states with defined biological identity. HIV-positive tumours were distributed non-uniformly across these clusters, preferentially occupying specific regions of the embedding, consistent with the endotype-specific methylation and functional signatures described above. The preservation of cluster structure under multi-feature integration further supports the synergistic design of TOBI, demonstrating that combining tumour-intrinsic and microenvironmental signals enhances biological resolution beyond single-feature analyses. Functional enrichment analysis of cluster-specific differentially methylated genes confirmed that HIV-associated epigenetic reprogramming is profoundly endotype dependent. In Cluster \u003cem\u003eC3\u003c/em\u003e, Gene Ontology (GO) Biological Process enrichment revealed a strong predominance of negative normalised enrichment scores (NES), indicating preferential enrichment in HIV-negative tumours \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e14\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e. Dominant pathways included protein catabolic processes, post-translational modification, phosphorylation, apoptotic signalling, and organophosphate and amide metabolic processes. This pattern suggests that \u003cem\u003eC3\u003c/em\u003e tumours in the absence of HIV maintain tightly regulated metabolic and proteostatic networks. 
The coordinated loss of these programmes in HIV-positive \u003cem\u003eC3\u003c/em\u003e tumours is consistent with extensive epigenetic disruption of core cellular homeostasis, identifying \u003cem\u003eC3\u003c/em\u003e as a highly HIV-responsive and epigenetically plastic endotype. These functional shifts align with the large-amplitude intra-cluster differential methylation observed previously, reinforcing the interpretation of \u003cem\u003eC3\u003c/em\u003e as particularly sensitive to viral perturbation.\u003c/p\u003e \u003cp\u003eIn contrast, Cluster \u003cem\u003eC1\u003c/em\u003e displayed strong positive NES values for immune-related biological processes, indicating enrichment in HIV-positive tumours \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e14\u003c/span\u003eB\u003cb\u003e)\u003c/b\u003e. Enriched pathways included myeloid cell differentiation, leukocyte activation, immune effector processes, defence responses to bacteria, and regulation of immune responses. This immune-dominant signature suggests that HIV-positive C1 tumours reside within an activated or inflamed tumour microenvironment, with epigenetic regulation favouring immune cell recruitment and functional differentiation. The specificity of immune and myeloid programmes to \u003cem\u003eC1\u003c/em\u003e underscores the endotype dependence of HIV effects and supports the classification of \u003cem\u003eC1\u003c/em\u003e as an immune-reactive tumour state in which HIV amplifies microenvironmental signalling rather than suppressing intrinsic tumour biology. Cluster \u003cem\u003eC2\u003c/em\u003e exhibited an intermediate and bidirectional functional profile \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e14\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. 
HIV-positive tumours showed enrichment of cell-cycle regulation, DNA metabolic processes, translation, and cell junction organisation, whereas HIV-negative tumours were enriched for small-molecule and monocarboxylic acid metabolic pathways. This pattern indicates selective rebalancing of proliferative and biosynthetic programmes in response to HIV infection, coupled with attenuation of metabolic flexibility. Compared with \u003cem\u003eC1\u003c/em\u003e and \u003cem\u003eC3\u003c/em\u003e, \u003cem\u003eC2\u003c/em\u003e represents a transitional endotype in which HIV-associated methylation changes modulate growth and metabolic pathways without inducing widespread immune activation or global metabolic collapse. Collectively, the concordance between the integrated UMAP structure \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e14\u003c/span\u003eA\u003cb\u003e)\u003c/b\u003e and the cluster-specific functional enrichment patterns \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e14\u003c/span\u003eB\u0026ndash;D\u003cb\u003e)\u003c/b\u003e confirms that the framework captures biologically meaningful tumour states shaped by the interaction of viral infection, tumour-intrinsic programmes, and the tumour microenvironment. These results validate the core design principle of the study: that biologically constrained feature integration combined with intra-cluster resolution reveals mechanistic tumour endotypes that are not apparent from clinical annotation alone.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec26\" class=\"Section3\"\u003e \u003ch2\u003eIntra-(DEA)-Cluster Tumour\u0026ndash;Model Mapping\u003c/h2\u003e \u003cp\u003eRestricting tumour\u0026ndash;model similarity analysis to C3-specific HIV-associated DMPs produced a compact but highly informative feature space comprising 2,310 CpGs shared across patient tumours and cervical cancer cell lines. 
This represents a substantial refinement relative to the much larger global DEA feature set used in earlier analyses. Heatmap visualisation of the top 500 C3 HIV-associated DMPs revealed highly structured and coordinated methylation blocks across tumours, with clear segregation driven by HIV status despite all samples belonging to the same methylation-defined endotype \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig14\" class=\"InternalRef\"\u003e15\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. This indicates that substantial epigenetic heterogeneity persists within C3 and that HIV infection imposes a coherent secondary regulatory layer superimposed on the core cluster identity. Importantly, these methylation patterns aligned with multiple clinical annotations, including HPV genotype, cancer stage, age, and BMI, demonstrating that intra-cluster HIV-associated methylation captures biologically meaningful tumour sub-contexts rather than residual technical or stochastic variation. DEA-restricted Spearman correlation profiling between C3 tumours and cervical cancer cell lines further revealed non-random and highly structured tumour\u0026ndash;model concordance patterns \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig14\" class=\"InternalRef\"\u003e15\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e. Multi-line similarity plots showed that individual cell lines exhibited recurrent peaks and troughs across patients, rather than flat or overlapping profiles, indicating selective alignment with specific tumour subsets within the same cluster. Several models consistently achieved higher concordance across large fractions of patients, while others displayed pronounced patient-specific variability or uniformly low similarity. These patterns support the existence of HIV-modulated sub-states within the C3 endotype that are differentially represented in vitro. 
Correlation values spanned weak to moderately strong positive ranges, highlighting that restriction to HIV-associated CpGs amplifies biologically relevant signal while suppressing background similarity driven by shared endotype structure alone \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig14\" class=\"InternalRef\"\u003e15\u003c/span\u003eA\u0026ndash;B\u003cb\u003e)\u003c/b\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec27\" class=\"Section3\"\u003e \u003ch2\u003eEndotype-aware Model Performance\u003c/h2\u003e \u003cp\u003eThe random forest classifier trained on TME-, driver-, and DMP-based similarity scores achieved high predictive accuracy (\u0026asymp;\u0026thinsp;97%) under repeated cross-validation, demonstrating that the integrated feature representation robustly encodes experimental model identity. Performance was stable across folds, excluding overfitting and confirming generalizability. The confusion matrix shows near-perfect classification of consensus labels, with \u003cem\u003eMS751\u003c/em\u003e correctly assigned in the overwhelming majority of cases and only minimal misclassification between \u003cem\u003eHELA\u003c/em\u003e and \u003cem\u003eOMC-1\u003c/em\u003e \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig15\" class=\"InternalRef\"\u003e16\u003c/span\u003eA\u003cb\u003e)\u003c/b\u003e. Importantly, classification errors are sparse and asymmetric, indicating that misassignment occurs primarily among biologically adjacent models rather than at random. This supports the interpretation that the framework is resolving fine-grained biological similarity rather than merely separating trivial or extreme cases. 
Feature importance analysis reveals that intra-cluster HIV-associated DMP similarity is the dominant predictor, followed by TME-restricted similarity, while driver-restricted similarity contributes minimally to the consensus setting \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig15\" class=\"InternalRef\"\u003e16\u003c/span\u003eB\u003cb\u003e)\u003c/b\u003e. This is mechanistically consistent with earlier observations showing that HIV-linked epigenetic programmes and microenvironmental state are primary determinants of tumour identity in this cohort, whereas driver-associated methylation is more conserved across experimental models once consensus is enforced. Predicted class distributions closely mirror the true consensus labels, indicating no systematic inflation or suppression of specific classes \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig15\" class=\"InternalRef\"\u003e16\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. The model therefore preserves class balance, confirming that the high accuracy is not driven by class imbalance or majority-class effects. Prediction probabilities are sharply peaked near 1.0 for the assigned class, particularly for \u003cem\u003eMS751\u003c/em\u003e, indicating strong model confidence. Slightly broader, yet still well-separated, probability distributions for \u003cem\u003eHELA\u003c/em\u003e and \u003cem\u003eOMC-1\u003c/em\u003e reflect smaller sample sizes and closer biological proximity rather than classifier uncertainty \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig15\" class=\"InternalRef\"\u003e16\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e. 
Together, these probability profiles demonstrate that the framework yields decisive, stable, and biologically interpretable predictions rather than ambiguous or weakly differentiated rankings.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec28\" class=\"Section2\"\u003e \u003ch2\u003eStability Analysis of Experimental Model Selection Across Feature Spaces\u003c/h2\u003e \u003cp\u003eStability analysis quantified how consistently specific experimental models were prioritised as Top-3 matches across patients under different feature-integration regimes. The DEA\u0026thinsp;+\u0026thinsp;Driver configuration produced the least compact stability profile, with selections distributed across multiple models (\u003cem\u003eMS751, HELASF, SKG-IIIA, TC-YIK\u003c/em\u003e, and \u003cem\u003eOMC-1\u003c/em\u003e) \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e17\u003c/span\u003eD\u003cb\u003e)\u003c/b\u003e. Although \u003cem\u003eMS751\u003c/em\u003e remained dominant, the broader tail of selected models indicates that tumour-intrinsic signals alone are insufficient to enforce a unique or consistent model match. This highlights the limitation of driver-centric or global differential analyses when divorced from tumour context and microenvironmental state. Restricting the feature space to driver- and TME-associated CpGs preserved strong stability for \u003cem\u003eMS751\u003c/em\u003e and \u003cem\u003eHeLa\u003c/em\u003e, with \u003cem\u003eHELASF\u003c/em\u003e remaining prominent but slightly reduced relative to the full model \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e17\u003c/span\u003eB\u003cb\u003e)\u003c/b\u003e. Importantly, the overall model set became more compact, with fewer low-frequency selections. 
This indicates that oncogenic and microenvironmental signals together are sufficient to define a robust core of representative experimental systems, while infection-linked epigenetic features (DMPs) further sharpen consensus and suppress marginal alternatives. The joint model excluding intra-cluster DMP refinement still recovered MS751 as the most stable model but exhibited greater dispersion among secondary candidates, including \u003cem\u003eHeLa\u003c/em\u003e, \u003cem\u003eHELASF\u003c/em\u003e, and \u003cem\u003eSKG-IIIA\u003c/em\u003e \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e17\u003c/span\u003eC\u003cb\u003e)\u003c/b\u003e. Compared with the fully integrated configuration, this setting showed increased ambiguity between biologically related cell lines, demonstrating that intra-cluster DMP stratification plays a critical role in resolving fine-grained tumour identity that is otherwise blurred when broader feature spaces are merged alone. When all three feature spaces were jointly integrated, a highly stable and sharply ranked hierarchy of experimental models emerged. \u003cem\u003eMS751\u003c/em\u003e was the most consistently selected model across patients, followed by \u003cem\u003eHELASF\u003c/em\u003e and \u003cem\u003eHeLa\u003c/em\u003e, with a steep drop-off for remaining lines \u003cb\u003e(\u003c/b\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e17\u003c/span\u003eA\u003cb\u003e)\u003c/b\u003e. This pattern indicates that full multi-feature integration converges on a small, dominant set of biologically representative systems rather than distributing selections diffusely across many candidates. 
The dominance of \u003cem\u003eMS751, HELASF\u003c/em\u003e, and \u003cem\u003eHeLa\u003c/em\u003e reflects their strong concordance with patient tumours across infection-associated epigenetic programmes (DMPs), tumour microenvironment context (TME), and oncogenic background (driver loci), establishing them as the most globally faithful experimental models in this cohort.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTaken together, this stability analysis demonstrates that maximal robustness and biological specificity are achieved only when DMP, TME, and driver features are jointly integrated. Full integration collapses model selection onto a small, reproducible set of experimental systems, led by \u003cem\u003eMS751\u003c/em\u003e, with \u003cem\u003eHELASF\u003c/em\u003e and \u003cem\u003eHeLa\u003c/em\u003e as consistent secondary matches. Progressive removal of contextual or infection-associated features results in increased dispersion and reduced consensus, underscoring that tumour identity is fundamentally multi-axial. These results provide a final validation of the analytical framework, showing that its feature-restricted, intra-cluster, and integrative design yields stable, interpretable, and biologically grounded experimental model prioritisation. The framework therefore offers a principled approach to tumour-aware model selection that outperforms single-feature or purely tumour-intrinsic strategies and is directly applicable to translational and precision-modelling studies in HIV-associated cervical cancer.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec29\" class=\"Section2\"\u003e \u003ch2\u003eExtension to additional omics layers: transcriptomic foundations from TCGA-CESC\u003c/h2\u003e \u003cp\u003eDuring the initial development of the framework, transcriptomic data from TCGA-CESC served as a critical proof-of-concept for tumour\u0026ndash;model concordance across complementary biological dimensions. 
These analyses established the conceptual and analytical foundations that were later generalised to DNA methylation and multi-feature integration, demonstrating that model fidelity is inherently feature-dependent and context-specific rather than universal. To systematically assess how well commonly used cervical cancer cell lines represent patient tumours, transcriptomic correlation analyses were performed between TCGA-CESC tumours and GDSC2 cell lines. Gene expression profiles were stratified into cancer-driver, tumour microenvironment (TME), and global transcriptome feature spaces, enabling explicit separation of tumour-intrinsic oncogenic programmes from contextual immune and stromal signals. This decomposition made it possible to interrogate model fidelity along biologically interpretable axes rather than relying on undifferentiated genome-wide similarity. Across all gynaecological cancer models, cervical cancer\u0026ndash;derived cell lines consistently showed the highest concordance with TCGA-CESC tumours, confirming lineage specificity and validating the analytical approach. This observation was biologically expected: TCGA-CESC tumours are predominantly HPV-driven epithelial cancers, and the corresponding cell lines (e.g., \u003cem\u003eHeLa, SiHa, CaSki, MS751, SISO\u003c/em\u003e) share HPV-mediated transcriptional regulatory programmes. The recovery of this expected signal confirms that the correlation framework captures meaningful biological structure rather than reflecting spurious or platform-driven similarity.\u003c/p\u003e \u003cp\u003eGlobal transcriptomic similarity analyses revealed that tumour composition is a dominant determinant of experimental model representativeness. Tumour purity was strongly positively associated with cell line similarity, whereas epithelial\u0026ndash;mesenchymal transition (EMT) scores were inversely correlated. 
High-purity tumours more closely resembled epithelial cell lines, while EMT-high tumours diverged because of increased stromal and immune gene expression. These findings anticipated later methylation-based observations and established that microenvironmental complexity systematically reduces tumour\u0026ndash;cell line concordance. At the model level, a subset of cell lines\u0026mdash;most notably \u003cem\u003eHeLa, CaSki, MS751\u003c/em\u003e, and \u003cem\u003eSiHa\u003c/em\u003e\u0026mdash;displayed consistently high median correlations, indicating preservation of core tumour-intrinsic transcriptional programmes. In contrast, lines such as \u003cem\u003eHT-3, C-4I\u003c/em\u003e, and \u003cem\u003eSW756\u003c/em\u003e exhibited broader variance, suggesting selective alignment with specific tumour subtypes rather than general representativeness. Extending the analysis beyond cervical cancer models demonstrated that TCGA-CESC tumours aligned most strongly with cervix-derived cell lines, whereas ovarian and endometrial models showed only partial similarity. Moderate correlations with selected ovarian and endometrial lines likely reflect shared epithelial differentiation programmes and conserved immune\u0026ndash;stromal pathways across gynaecological tissues. However, the sharp decline in concordance outside cervical lineages reinforced a central principle of the study: experimental model suitability is tissue-specific and not transferable without substantial loss of biological fidelity. Restriction to cancer-driver gene expression revealed a more heterogeneous concordance landscape. While most cervical cancer lines retained moderate similarity to TCGA tumours, only a subset of HPV-positive squamous-derived models\u0026mdash;particularly \u003cem\u003eCAL-39, CaSki, ME-180, DoTc2 4510\u003c/em\u003e, and \u003cem\u003eMS751\u003c/em\u003e\u0026mdash;consistently captured HPV-driven oncogenic transcriptional programmes. 
The HPV-negative line \u003cem\u003eC-33A\u003c/em\u003e reproducibly diverged, reflecting its fundamentally distinct regulatory architecture. This divergence provided early evidence that viral oncogenesis is a dominant axis of tumour\u0026ndash;model alignment, a theme later reinforced by methylation-based HIV and HPV analyses. Analysis of tumour microenvironment\u0026ndash;associated gene expression demonstrated that immune and stromal programmes are only partially preserved in vitro. Nevertheless, several HPV-positive models, especially \u003cem\u003eCAL-39\u003c/em\u003e, retained strong concordance with TCGA-CESC tumours across TME features. The consistent prominence of \u003cem\u003eCAL-39\u003c/em\u003e among HPV16/18-positive squamous tumours highlighted that certain cell lines preserve microenvironment-linked transcriptional states despite ex vivo propagation, whereas others do not. This observation directly motivated the later explicit incorporation of TME-restricted features into the integrative modelling framework.\u003c/p\u003e \u003cp\u003eCollectively, these transcriptomic analyses established a set of conceptual principles that directly informed the design and formalisation of the analytical framework. First, they demonstrated that feature restriction is essential, as cancer-driver genes, tumour microenvironment (TME) markers, and global expression profiles encode distinct and only partially overlapping biological axes that cannot be meaningfully collapsed into a single similarity metric. Second, the analyses showed that relative, patient-normalised similarity is more informative than absolute concordance, particularly for contextual features such as immune and stromal programmes, where inter-patient heterogeneity dominates signal structure. 
Third, they established that experimental model suitability is endotype-dependent rather than universal, with different tumour subgroups preferentially aligning with distinct \u003cem\u003ein vitro\u003c/em\u003e systems depending on viral status, histology, and microenvironmental state. Finally, the strong stratification by HPV genotype, epithelial\u0026ndash;mesenchymal transition (EMT) state, and tumour purity underscored that viral and microenvironmental contexts are major drivers of cervical cancer heterogeneity, necessitating a multi-feature, integrative analytical strategy. These principles were subsequently generalised to the epigenetic layer through DNA methylation analyses in HIV-associated Nigerian cervical cancer cohorts. Within the framework, they are operationalised through explicit feature restriction, intra-cluster differential resolution, and cross-layer integration, enabling tumour-aware prioritisation of experimental models that reflect both intrinsic oncogenic programmes and context-dependent regulatory states. By anchoring the approach in transcriptomic foundations and extending it across the methylome, the framework achieves biological continuity while remaining methodologically scalable across molecular profiling modalities.\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eWe applied a tumour-aware epigenomic pipeline to identify distinct DNA methylation endotypes in cervical cancer, shaped by tumour microenvironment (TME) context and underlying genomic programmes. Unsupervised clustering of patient methylomes enriched for TME-associated CpGs delineated endotypes with markedly divergent immune and stromal infiltration patterns and corresponding clinical outcomes. Notably, one endotype exhibited an \u003cem\u003eimmune-hot\u003c/em\u003e phenotype, characterized by elevated CD8⁺ T-cell and M1 macrophage infiltration, and was associated with improved prognosis. 
In contrast, another endotype displayed an \u003cem\u003eimmune-cold\u003c/em\u003e, stroma-rich profile linked to poorer survival. These observations are concordant with independent studies demonstrating that cervical cancer subgroups with enhanced immune signatures have superior clinical outcomes (Liu et al., 2022; Zhu et al., 2022). For example, Wang et al. identified an HPV-positive subgroup enriched for T-cell infiltration and pro-inflammatory cytokine signaling that exhibited improved disease-free survival relative to immune-low counterparts. Similarly, Zhao et al. defined four DNA methylation\u0026ndash;based cervical cancer subtypes with distinct immune landscapes, including a subtype marked by high effector and memory T-cell infiltration and another characterized by minimal immune activation (Zhao et al., 2025). Collectively, these findings indicate that TME-informed methylation states capture a central axis of cervical cancer biology, stratifying patients into biologically and clinically meaningful endotypes that align with established prognostic immune signatures.\u003c/p\u003e \u003cp\u003eTo dissect heterogeneous regulatory influences, we performed feature-restricted analyses focusing separately on \u003cb\u003e(i)\u003c/b\u003e TME-related CpGs, \u003cb\u003e(ii)\u003c/b\u003e driver gene\u0026ndash;associated CpGs, and \u003cb\u003e(iii)\u003c/b\u003e differentially methylated positions (DMPs) between tumours and experimental models. This strategy captures orthogonal axes of cancer regulation. The TME axis reflects epigenetic marks associated with immune and stromal infiltration, the driver axis targets CpGs linked to genes implicated in oncogenic pathways (including established tumour suppressors and oncogenes), and the DMP axis captures global epigenetic dysregulation independent of specific functional annotation. 
Collectively, this framework enables the disentanglement of cell-extrinsic microenvironmental influences from cell-intrinsic tumour programmes and broad methylome perturbations. Such compartmentalised analysis aligns with the prevailing view that cancer heterogeneity arises from multiple, interacting sources\u0026mdash;including genetic drivers and immune context\u0026mdash;each imprinting a distinct molecular signature (Yang et al., 2023). Moreover, epigenetic regulation in cancer is inherently dual in nature: tumour cell methylomes acquire aberrations in oncogenic loci while simultaneously shaping immune cell recruitment and immune checkpoint regulation. Consistent with this, Yang et al. reviewed mechanisms by which tumour DNA methylation modulates antitumour immunity. By parallelising analyses across TME-, driver-, and DMP-restricted feature sets, this approach preserves complementary regulatory signals that would otherwise be diluted in a global methylome analysis.\u003c/p\u003e \u003cp\u003eA key finding of this study is that HIV co-infection exerts endotype-specific epigenetic effects. Within each methylation-defined cluster, HIV-positive patients exhibited distinct DNA methylation alterations at both host and viral loci, indicating that HIV introduces an additional layer of regulatory reprogramming in cervical tumours. These findings are consistent with emerging epidemiological and molecular evidence showing that HIV/HPV co-infection is associated with aberrant DNA methylation in genes involved in viral pathogenesis and tumour progression (Zheng et al., 2025). In addition, increased methylation of high-risk HPV genomic regions, particularly the L1 and L2 loci, has been linked to disease progression among HIV-positive women (Gradissimo et al., 2018). Together, these observations support the conclusion that HIV infection reshapes both host and viral epigenomes during cervical neoplastic evolution. 
In the cohort analysed here, specific TME-defined endotypes appeared particularly sensitive to HIV status. For example, HIV-positive patients within the immune-\u003cem\u003erich\u003c/em\u003e cluster displayed methylation changes not observed in HIV-negative counterparts, suggesting the presence of endotype-specific viral epigenetic programmes. Such programmes may reflect HIV-driven immunosuppression and viral integration effects that manifest differentially across tumour subtypes. These findings underscore the importance of incorporating HIV and HPV status into TME-aware molecular stratification frameworks, particularly in high-risk settings, and highlight potential biomarkers for viral-associated cervical cancer subgroups.\u003c/p\u003e \u003cp\u003eFor translational modelling, experimental model fidelity was rigorously evaluated by quantifying multi-feature concordance between patient tumours and candidate systems, including cell lines and patient-derived xenografts (PDXs). Faithful experimental models are essential for cancer research; however, prolonged \u003cem\u003ein vitro\u003c/em\u003e culture and model adaptation frequently induce widespread epigenetic and transcriptional drift. Indeed, systematic analyses have demonstrated that many cancer cell lines diverge substantially from their tumours of origin at both the transcriptomic and epigenomic levels (Salvadores et al., 2020). To address this limitation, DNA methylation and transcriptomic profiles were aligned between cervical cancer tumours and experimental models, with adjustment for global shifts, and similarity scores were computed across TME-, driver-, and DMP-restricted feature spaces. This approach parallels recent pan-cancer efforts that quantitatively assess tumour\u0026ndash;model concordance, such as classifier-based strategies that identify models failing to recapitulate their annotated cancer types. For example, Kinker et al. 
integrated thousands of tumours and hundreds of cell lines within a unified transcriptomic framework to detect models with poor lineage fidelity. Applying a comparable strategy in the epigenomic context, we identified which cervical cancer models most faithfully recapitulate the methylation signatures of each tumour endotype. The concordance analyses revealed that only a subset of available models captured the TME-driven epigenetic profiles observed in patients, underscoring important gaps in model representation for specific tumour contexts. This finding is consistent with prior work showing that many PDXs and cell lines exhibit limited transcriptional fidelity, whereas select organoid and engineered systems more accurately reflect native tumours (Peng et al., 2021). By extending model fidelity assessment into the epigenomic domain and integrating multiple biologically constrained feature spaces, this strategy ensures that selected experimental systems preserve both tumour-intrinsic regulatory programmes and microenvironment-associated epigenetic states characteristic of each tumour endotype.\u003c/p\u003e \u003cp\u003eFinally, TME-, driver-, and DMP-derived features were integrated into a machine-learning classifier to recommend the most appropriate experimental model for each patient endotype. A Random Forest ensemble was trained on the combined epigenetic concordance scores, enabling non-linear integration of complementary regulatory signals across feature axes. Tree-based models are well suited to high-dimensional epigenomic data, and prior studies have demonstrated their strong performance in DNA methylation\u0026ndash;based classification tasks. In cervical cancer, Random Forest approaches have achieved high accuracy in discriminating tumour-specific methylation patterns (Apoorva et al., 2024). 
More broadly, methylation-based Random Forest classifiers have been shown to accurately infer tumour tissue of origin across diverse cancer types (Duckett et al., 2025). Guided by these precedents, the integrated classifier robustly matched tumours to their most representative experimental systems in cross-validation analyses, substantially outperforming single-axis or na\u0026iuml;ve selection strategies. This framework provides a principled and systematic approach for matching patient tumours to in vitro and in vivo models, moving beyond ad hoc model selection. Importantly, it explicitly incorporates tumour microenvironmental context into experimental model choice, rather than relying on generic, one-size-fits-all cell line representations.\u003c/p\u003e \u003cp\u003eCollectively, these findings advance tumour-aware patient stratification and precision modelling in cervical cancer. By integrating tumour microenvironment context with DNA methylation endotypes, we propose a refined classification framework that captures both immunological and genetic heterogeneity. This stratification has direct clinical relevance, as epigenetic subtypes have been shown to associate with prognosis and therapeutic response. In cervical cancer, methylation-defined subtypes differ in immune checkpoint expression and inferred responsiveness to immunotherapy (Zhao et al., 2025). Consistent with this, the immune-hot endotype identified here may represent patients more likely to benefit from immunomodulatory treatments, whereas immune-cold endotypes may require alternative therapeutic strategies. In addition, endotype-matched experimental models enable more predictive and context-aware preclinical drug screening. For example, integration of DNA methylation and tumour microenvironment features has previously been used to nominate targeted therapeutic agents for high-risk cervical cancer subgroups (Liu et al., 2022). 
Analogously, the experimental model assignments generated by our framework provide a rational basis for evaluating clinically relevant compounds within the appropriate epigenetic and microenvironmental context. Future integration of epigenetic endotyping with drug-sensitivity data may further enable identification of endotype-specific therapeutic vulnerabilities and opportunities for drug repurposing. Overall, this study demonstrates that incorporating tumour microenvironmental signals into epigenomic analyses yields biologically coherent patient clusters and informs principled selection of experimental systems. Such tumour-informed frameworks hold strong promise for improving personalised therapeutic prediction and enhancing the translational relevance of preclinical cancer research.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eThis study demonstrates that integrating tumour microenvironment\u0026ndash;aware epigenomic features into patient stratification yields biologically coherent and clinically relevant cervical cancer endotypes. By explicitly disentangling microenvironment-associated regulatory programmes from tumour-intrinsic oncogenic and global epigenetic alterations, the framework captures key axes of tumour heterogeneity that are obscured in conventional genome-wide analyses. The identification of immune-\u003cem\u003ehot\u003c/em\u003e and immune-\u003cem\u003ecold\u003c/em\u003e methylation endotypes, together with their differential clinical and virological associations, underscores the central role of the tumour microenvironment in shaping disease progression and therapeutic vulnerability. Importantly, this work moves beyond descriptive stratification by directly linking patient endotypes to experimentally tractable cancer models. 
Systematic tumour\u0026ndash;model concordance analysis revealed that only a subset of commonly used cervical cancer models faithfully recapitulates patient-specific epigenetic and microenvironmental states, highlighting a major and underappreciated source of irreproducibility in preclinical research. The integration of multi-axis epigenetic features into a machine-learning classifier provides a principled and scalable approach for experimental model recommendation, replacing ad hoc model selection with tumour-informed inference. Together, these findings establish a robust framework for tumour-aware precision modelling. By embedding microenvironmental context into epigenomic analysis and experimental model selection, this approach enhances the biological validity of preclinical studies and lays the foundation for endotype-matched therapeutic discovery. More broadly, the framework offers a generalisable strategy for improving translational fidelity across cancers, particularly in disease settings characterised by complex tumour\u0026ndash;immune\u0026ndash;viral interactions.\u003c/p\u003e "},{"header":"Declarations","content":"\u003cp\u003eData Availability\u003c/p\u003e\n\u003cp\u003eAll datasets used in this study are publicly available through internationally recognised genomic repositories that provide version-controlled, reproducible, and openly accessible molecular data. No restrictions or controlled-access permissions were required. All data generated in this study and all scripts used to generate the computational workflow are available from the authors upon request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTCGA\u0026ndash;CESC RNA-seq and Clinical Data\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTranscriptomic and clinical data for the TCGA Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (TCGA-CESC) cohort were obtained from the National Cancer Institute\u0026rsquo;s Genomic Data Commons (GDC) portal (https://portal.gdc.cancer.gov/). 
The GDC is a high-integrity, rigorously standardised repository that hosts The Cancer Genome Atlas (TCGA) programme, widely regarded as the global reference resource for integrative cancer genomics. All RNA-seq HTSeq counts, FPKM/TPM values, purity estimates, and clinical annotations used in this study correspond to the most recent harmonised GDC release and remain freely downloadable under open-access terms.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePatient Methylation Dataset: GSE279982\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDNA methylation data for HIV-positive and HIV-negative Nigerian women were retrieved from the NCBI Gene Expression Omnibus (GEO) under accession \u003cstrong\u003eGSE279982\u003c/strong\u003e (https://www.ncbi.nlm.nih.gov/geo/). GEO is a long-standing, internationally curated archive for functional genomics data and enforces strict metadata standards, file integrity checks, and reproducible versioning. The dataset consists of Illumina Infinium MethylationEPIC BeadChip profiles (IDAT files), HPV genotyping, and accompanying clinical metadata. All files are available without access restrictions and were downloaded from their primary GEO FTP directory.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eReference Cell Line Methylation Data\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eGSE68379 \u003c/em\u003eMethylation profiles for cervical cancer cell lines used in tumour\u0026ndash;cell line concordance analyses were downloaded from GEO under accession GSE68379. This dataset provides high-quality 450K methylation profiling for key cervical carcinoma models, including HeLa, SiHa, CaSki, C-33A, HT-3, SISO, and MS751. 
Raw IDAT files and series matrix files remain publicly accessible through the GEO portal.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eReference Cell Line Transcriptomic and Pharmacogenomic Profiles (GDSC2)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBulk RNA-seq expression data for cervical cancer cell lines were obtained from the Genomics of Drug Sensitivity in Cancer (GDSC2) platform (https://www.cancerrxgene.org/), a collaborative resource maintained by the Wellcome Sanger Institute and Massachusetts General Hospital Cancer Center. GDSC2 is a globally recognised pharmacogenomic reference database, widely used for benchmarking drug-response prediction models and preclinical oncology research. Processed RNA-seq expression matrices and cell-line metadata are fully open-access and can be downloaded through the GDSC interface or its associated FTP repository.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCancer Driver Gene Catalogue\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe curated list of cancer driver genes used for driver-gene methylation concordance analyses was retrieved from \u003cstrong\u003eCell Model Passports\u003c/strong\u003e (https://cellmodelpassports.sanger.ac.uk/), maintained by the Wellcome Sanger Institute. The resource provides harmonised genomic annotations, validated driver gene sets, and up-to-date gene identifiers mapped across multiple platforms. All driver gene lists used in this study correspond to the 2024\u0026ndash;12\u0026ndash;12 resource update and are openly available for download.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePathway and Gene Set Resources\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFunctional enrichment analyses were conducted using publicly accessible and standardised gene set repositories. 
The Molecular Signatures Database (MSigDB, v2024.1) Hallmark and KEGG collections were accessed under standard academic use conditions and provided the primary curated pathways for over-representation and enrichment testing. Additional KEGG pathway definitions were retrieved programmatically through the KEGG REST API, ensuring consistent and up-to-date pathway annotations. All gene sets incorporated into FGSEA analyses were obtained directly from these open-access resources without modification, maintaining full reproducibility and transparency of the enrichment framework.\u003c/p\u003e\n\u003cp\u003eSupplementary Material\u003c/p\u003e\n\u003cp\u003eSupplementary \u003cem\u003eFramework_R\u003c/em\u003e contains a fully annotated tutorial handbook \u003cem\u003e(Tutorials 1\u0026ndash;9)\u003c/em\u003e providing complete, reproducible implementations of all analytical steps described in this study, including data processing, feature restriction, tumour stratification, tumour\u0026ndash;model concordance, and classifier construction.\u003c/p\u003e\n\u003cp\u003eCode and Reproducibility\u003c/p\u003e\n\u003cp\u003eAll analyses were performed using open-source software in R (v4.3), relying exclusively on publicly accessible datasets. Scripts for data import, preprocessing, correlation computation, and visualisation can be shared upon request and will be released in the accompanying repository for the subsequent chapter. The full analytical workflow can be followed using the supplementary tutorial handbook provided with this study.\u003c/p\u003e\n\u003cp\u003eAuthor Contributions\u003c/p\u003e\n\u003cp\u003eSaltiel Hamese conducted the research and wrote the paper, with contributions from all co-authors: Dr. Mutsa Takundwa, Prof. Earl Prinsloo and Dr. Deepak B. Thimiri Govinda Raj. 
All the authors have read and approved the manuscript.\u003c/p\u003e\n\u003cp\u003eConflict of Interest\u003c/p\u003e\n\u003cp\u003eThe authors declare no conflict of interest.\u003c/p\u003e\n\u003cp\u003eAcknowledgements\u003c/p\u003e\n\u003cp\u003eThe author gratefully acknowledges the publicly available multi-omics resources that enabled all analyses in this chapter. TCGA and GEO provided the foundational transcriptomic and methylation datasets, including HPV-stratified and HIV-positive cohorts essential for TME and clustering analyses. The Genomics of Drug Sensitivity in Cancer (GDSC) and Cell Model Passports resources, maintained by the Wellcome Sanger Institute, supplied high-quality cell line molecular profiles and curated cancer driver gene catalogues used for tumour\u0026ndash;cell line concordance and driver-level methylation evaluation. Functional annotation and enrichment analyses were supported by the Molecular Signatures Database (MSigDB) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), which offered rigorously curated pathway collections for FGSEA-based interpretation. Special thanks are extended to ChatGPT (OpenAI) for assistance in the development, optimisation, and troubleshooting of R-based workflows applied throughout this study.\u003c/p\u003e\n\u003cp\u003eFunding Declaration \u003c/p\u003e\n\u003cp\u003eSaltiel Hamese\u0026rsquo;s doctoral studies are funded by the National Skills Development Fund (NSDF), which is administered by the National Research Foundation (NRF) of South Africa, under grant reference PMDS230530111412. The project (Principal Investigator: DBTGR) was funded by the National Research Foundation (NRF) Competitive Grant, ICGEB Early Career Grant, Department of Science, Technology, and Innovation (DSTI) Emerging Research Area (ERA) Funding, SAMRC-AMED Cancer Research funding, and CSIR Strategic Initiative funding. Mutsa Takundwa was funded by the NRF Thuthuka Rating Track. 
\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAmini, A. P., Kirkpatrick, J. D., Wang, C. S., Jaeger, A. M., Su, S., Naranjo, S., Zhong, Q., Cabana, C. M., Jacks, T., \u0026amp; Bhatia, S. N. (2022). Multiscale profiling of protease activity in cancer. Nature Communications 2022 13:1, 13(1), 1\u0026ndash;16. https://doi.org/10.1038/s41467-022-32988-5 \u003c/li\u003e\n\u003cli\u003eAndrade, C. (2021). Z Scores, Standard Scores, and Composite Test Scores Explained. Indian Journal of Psychological Medicine, 43(6), 555. https://doi.org/10.1177/02537176211046525 \u003c/li\u003e\n\u003cli\u003eApoorva, Handa, V., Batra, S., \u0026amp; Arora, V. (2024). Advancing epigenetic profiling in cervical cancer: machine learning techniques for classifying DNA methylation patterns. 3 Biotech 2024 14:11, 14(11), 264-. https://doi.org/10.1007/S13205-024-04107-2 \u003c/li\u003e\n\u003cli\u003eBarrett, T., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim, I. F., Tomashevsky, M., Marshall, K. A., Phillippy, K. H., Sherman, P. M., Holko, M., Yefanov, A., Lee, H., Zhang, N., Robertson, C. L., Serova, N., Davis, S., \u0026amp; Soboleva, A. (2013). NCBI GEO: archive for functional genomics data sets\u0026mdash;update. Nucleic Acids Research, 41(D1), D991\u0026ndash;D995. https://doi.org/10.1093/NAR/GKS1193 \u003c/li\u003e\n\u003cli\u003eBrash, J. T., Diez-Pinel, G., Rinaldi, L., Castellan, R. F. P., Fantin, A., \u0026amp; Ruhrberg, C. (2025). Endothelial transcriptomic, epigenomic and proteomic data challenge the proposed role for TSAd in vascular permeability. Angiogenesis 2025 28:2, 28(2), 1\u0026ndash;21. https://doi.org/10.1007/S10456-025-09971-X \u003c/li\u003e\n\u003cli\u003eBueno-Urquiza, L. J., God\u0026iacute;nez-Rub\u0026iacute;, M., Villegas-Pineda, J. C., Vega-Maga\u0026ntilde;a, A. N., Jave-Su\u0026aacute;rez, L. F., Puebla-Mora, A. G., Aguirre-Sandoval, G. E., Mart\u0026iacute;nez-Silva, M. 
G., Ram\u0026iacute;rez-de-Arellano, A., \u0026amp; Pereira-Su\u0026aacute;rez, A. L. (2024). Phenotypic Heterogeneity of Cancer Associated Fibroblasts in Cervical Cancer Progression: FAP as a Central Activation Marker. Cells, 13(7), 560. https://doi.org/10.3390/CELLS13070560 \u003c/li\u003e\n\u003cli\u003eBurk, R. D., Chen, Z., Saller, C., Tarvin, K., Carvalho, A. L., Scapulatempo-Neto, C., Silveira, H. C., Fregnani, J. H., Creighton, C. J., Anderson, M. L., Castro, P., Wang, S. S., Yau, C., Benz, C., Gordon Robertson, A., Mungall, K., Lim, L., Bowlby, R., Sadeghi, S., \u0026hellip; Mutch, D. (2017). Integrated genomic and molecular characterization of cervical cancer. Nature 2017 543:7645, 543(7645), 378\u0026ndash;384. https://doi.org/10.1038/nature21386 \u003c/li\u003e\n\u003cli\u003eBusarello, E., Biancon, G., Cimignolo, I., Lauria, F., Ibnat, Z., Ramirez, C., Tom\u0026egrave;, G., Ciuffreda, M., Bucciarelli, G., Pilli, A., Marino, S. M., Bontempi, V., Ress, F., Aass, K. R., VanOudenhove, J., Tiberi, L., Mione, M. C., Standal, T., Macchi, P., \u0026hellip; Tebaldi, T. (2025). Cell Marker Accordion: interpretable single-cell and spatial omics annotation in health and disease. Nature Communications 2025 16:1, 16(1), 1\u0026ndash;18. https://doi.org/10.1038/s41467-025-60900-4 \u003c/li\u003e\n\u003cli\u003eChakravarthy, A., Reddin, I., Henderson, S., Dong, C., Kirkwood, N., Jeyakumar, M., Rodriguez, D. R., Martinez, N. G., McDermott, J., Su, X., Egawa, N., Fjeldbo, C. S., Skingen, V. E., Lyng, H., Halle, M. K., Krakstad, C., Soleiman, A., Sprung, S., Lechner, M., \u0026hellip; Fenton, T. R. (2022). Integrated analysis of cervical squamous cell carcinoma cohorts from three continents reveals conserved subtypes of prognostic significance. Nature Communications 2022 13:1, 13(1), 1\u0026ndash;17. 
https://doi.org/10.1038/s41467-022-33544-x \u003c/li\u003e\n\u003cli\u003eChawla, S., Rockstroh, A., Lehman, M., Ratther, E., Jain, A., Anand, A., Gupta, A., Bhattacharya, N., Poonia, S., Rai, P., Das, N., Majumdar, A., Jayadeva, Ahuja, G., Hollier, B. G., Nelson, C. C., \u0026amp; Sengupta, D. (2022). Gene expression based inference of cancer drug sensitivity. Nature Communications 2022 13:1, 13(1), 5680-. https://doi.org/10.1038/s41467-022-33291-z \u003c/li\u003e\n\u003cli\u003eChen, K., Yong, J., Zauner, R., Wally, V., Whitelock, J., Sajinovic, M., Kopecki, Z., Liang, K., Scott, K. F., \u0026amp; Mellick, A. S. (2022). Chondroitin Sulfate Proteoglycan 4 as a Marker for Aggressive Squamous Cell Carcinoma. Cancers, 14(22), 5564. https://doi.org/10.3390/CANCERS14225564/S1 \u003c/li\u003e\n\u003cli\u003eChen, X., He, H., Xiao, Y., Hasim, A., Yuan, J., Ye, M., Li, X., Hao, Y., \u0026amp; Guo, X. (2021). CXCL10 Produced by HPV-Positive Cervical Cancer Cells Stimulates Exosomal PDL1 Expression by Fibroblasts via CXCR3 and JAK-STAT Pathways. Frontiers in Oncology, 11, 629350. https://doi.org/10.3389/FONC.2021.629350/FULL \u003c/li\u003e\n\u003cli\u003eConlon, N. T., Kooijman, J. J., van Gerwen, S. J. C., Mulder, W. R., Zaman, G. J. R., Diala, I., Eli, L. D., Lalani, A. S., Crown, J., \u0026amp; Collins, D. M. (2021). Comparative analysis of drug response and gene profiling of HER2-targeted tyrosine kinase inhibitors. British Journal of Cancer 2021 124:7, 124(7), 1249\u0026ndash;1259. https://doi.org/10.1038/s41416-020-01257-x \u003c/li\u003e\n\u003cli\u003eDasgupta, S., Saha, A., Ganguly, N., Bhuniya, A., Dhar, S., Guha, I., Ghosh, T., Sarkar, A., Ghosh, S., Roy, K., Das, T., Banerjee, S., Pal, C., Baral, R., \u0026amp; Bose, A. (2022). NLGP regulates RGS5-TGF\u0026beta; axis to promote pericyte-dependent vascular normalization during restricted tumor growth. FASEB Journal, 36(5), e22268. 
https://doi.org/10.1096/FJ.202101093R \u003c/li\u003e\n\u003cli\u003eDe Vos Van Steenwijk, P. J., Ramwadhdoebe, T. H., Goedemans, R., Doorduijn, E. M., Van Ham, J. J., Gorter, A., Van Hall, T., Kuijjer, M. L., Van Poelgeest, M. I. E., Van Der Burg, S. H., \u0026amp; Jordanova, E. S. (2013). Tumor-infiltrating CD14-positive myeloid cells and CD8-positive T-cells prolong survival in patients with cervical carcinoma. International Journal of Cancer, 133(12), 2884\u0026ndash;2894. https://doi.org/10.1002/IJC.28309 \u003c/li\u003e\n\u003cli\u003eDesai, P., Takahashi, N., Kumar, R., Nichols, S., Malin, J., Hunt, A., Schultz, C., Cao, Y., Tillo, D., Nousome, D., Chauhan, L., Sciuto, L., Jordan, K., Rajapakse, V., Tandon, M., Lissa, D., Zhang, Y., Kumar, S., Pongor, L., \u0026hellip; Thomas, A. (2024). Microenvironment shapes small-cell lung cancer neuroendocrine states and presents therapeutic opportunities. Cell Reports Medicine, 5(6). https://doi.org/10.1016/j.xcrm.2024.101610 \u003c/li\u003e\n\u003cli\u003eDimitrova, P., Vasileva-Slaveva, M., Shivarov, V., Hasan, I., \u0026amp; Yordanov, A. (2023). Infiltration by Intratumor and Stromal CD8 and CD68 in Cervical Cancer. Medicina, 59(4), 728. https://doi.org/10.3390/MEDICINA59040728 \u003c/li\u003e\n\u003cli\u003eDuckett, D., Vormittag-Nocito, E. R., Jamshidi, P., Sukhanova, M., Parker, S., Brat, D. J., Jennings, L. J., \u0026amp; Santana-Santos, L. (2025). Accurate identification of primary site in tumors of unknown origin (TUO) using DNA methylation. Npj Precision Oncology 2025 9:1, 9(1), 8-. https://doi.org/10.1038/s41698-025-00805-z \u003c/li\u003e\n\u003cli\u003eEskra, J. N., Nguyen, E., Golabi, A., Nair, S., Masciotti, A., Fazio, A., Kocak, M., Ronan, M., Rees, M. G., \u0026amp; Roth, J. A. (2023). Abstract PR004: PRISM high-throughput screening of antibody-drug conjugates uncovers clinically relevant targets. 
Molecular Cancer Therapeutics, 22(12_Supplement), PR004\u0026ndash;PR004. https://doi.org/10.1158/1535-7163.TARG-23-PR004 \u003c/li\u003e\n\u003cli\u003eFashemi, B. E., van Biljon, L., Rodriguez, J., Graham, O., Mullen, M., \u0026amp; Khabele, D. (2023). Ovarian Cancer Patient-Derived Organoid Models for Pre-Clinical Drug Testing. Journal of Visualized Experiments : JoVE, 2023(199), 10.3791/65068. https://doi.org/10.3791/65068 \u003c/li\u003e\n\u003cli\u003eFilippova, M., Filippov, V., Williams, V. M., Zhang, K., Kokoza, A., Bashkirova, S., \u0026amp; Duerksen-Hughes, P. (2014). Cellular Levels of Oxidative Stress Affect the Response of Cervical Cancer Cells to Chemotherapeutic Agents. BioMed Research International, 2014, 574659. https://doi.org/10.1155/2014/574659 \u003c/li\u003e\n\u003cli\u003eGioanni, J., Grosgeorge, J., Zanghellini, E., Mazeau, C., Gaudray, P., Ettore, F., Formento, P., \u0026amp; Demard, F. (1993). Characterization of CAL39, a new human cell line derived from a vulvar squamous cell carcinoma. International Journal of Oncology, 3(2), 293\u0026ndash;297. https://doi.org/10.3892/IJO.3.2.293/ABSTRACT \u003c/li\u003e\n\u003cli\u003eGradissimo, A., Lam, J., Attonito, J. D., Palefsky, J., Massad, L. S., Xie, X., Eltoum, I. E., Rahangdale, L., Fischl, M. A., Anastos, K., Minkoff, H., Xue, X., D\u0026rsquo;Souza, G., Flowers, L. C., Colie, C., Shrestha, S., Hessol, N. A., Strickler, H. D., \u0026amp; Burk, R. D. (2018). Methylation of high-risk human papillomavirus genomes are associated with cervical precancer in HIV-positive women. Cancer Epidemiology Biomarkers and Prevention, 27(12), 1407\u0026ndash;1415. https://doi.org/10.1158/1055-9965.EPI-17-1051 \u003c/li\u003e\n\u003cli\u003eGu, Z. (2022). Complex heatmap visualization. IMeta, 1(3), e43. https://doi.org/10.1002/IMT2.43 \u003c/li\u003e\n\u003cli\u003eHaynes, W. (2013). Benjamini\u0026ndash;Hochberg Method. Encyclopedia of Systems Biology, 78\u0026ndash;78. 
https://doi.org/10.1007/978-1-4419-9863-7_1215 \u003c/li\u003e\n\u003cli\u003eHaynes, W. (2013). Wilcoxon Rank Sum Test. Encyclopedia of Systems Biology, 2354\u0026ndash;2355. https://doi.org/10.1007/978-1-4419-9863-7_1185/FIGURES/234 \u003c/li\u003e\n\u003cli\u003eHewavisenti, R. V., Arena, J., Ahlenstiel, C. L., \u0026amp; Sasson, S. C. (2023). Human papillomavirus in the setting of immunodeficiency: Pathogenesis and the emergence of next-generation therapies to reduce the high associated cancer risk. Frontiers in Immunology, 14, 1112513. https://doi.org/10.3389/FIMMU.2023.1112513 \u003c/li\u003e\n\u003cli\u003eHiramoto, S., Kato, K., Shoji, H., Okita, N., Takashima, A., Honma, Y., Iwasa, S., Hamaguchi, T., Yamada, Y., Shimada, Y., \u0026amp; Boku, N. (2018). A retrospective analysis of 5-fluorouracil plus cisplatin as first-line chemotherapy in the recent treatment strategy for patients with metastatic or recurrent esophageal squamous cell carcinoma. International Journal of Clinical Oncology, 23(3), 466\u0026ndash;472. https://doi.org/10.1007/S10147-018-1239-X \u003c/li\u003e\n\u003cli\u003eHorikawa, N., Baba, T., Matsumura, N., Murakami, R., Abiko, K., Hamanishi, J., Yamaguchi, K., Koshiyama, M., Yoshioka, Y., \u0026amp; Konishi, I. (2015). Genomic profile predicts the efficacy of neoadjuvant chemotherapy for cervical cancer patients. BMC Cancer 2015 15:1, 15(1), 739-. https://doi.org/10.1186/S12885-015-1703-1 \u003c/li\u003e\n\u003cli\u003eHuang, R., \u0026amp; Rofstad, E. K. (2016). Cancer stem cells (CSCs), cervical CSCs and targeted therapies. Oncotarget, 8(21), 35351. https://doi.org/10.18632/ONCOTARGET.10169 \u003c/li\u003e\n\u003cli\u003eHuang, Y., Georges, D., Rumgay, H., Soerjomataram, I., \u0026amp; Clifford, G. M. (2025). Global burden of cancer attributable to HIV: a worldwide incidence analysis. The Lancet Global Health, 13(9), e1525\u0026ndash;e1532. 
https://doi.org/10.1016/S2214-109X(25)00264-5 \u003c/li\u003e\n\u003cli\u003eIorio, F., Knijnenburg, T. A., Vis, D. J., Bignell, G. R., Menden, M. P., Schubert, M., Aben, N., Gon\u0026ccedil;alves, E., Barthorpe, S., Lightfoot, H., Cokelaer, T., Greninger, P., van Dyk, E., Chang, H., de Silva, H., Heyn, H., Deng, X., Egan, R. K., Liu, Q., \u0026hellip; Garnett, M. J. (2016a). A Landscape of Pharmacogenomic Interactions in Cancer. Cell, 166(3), 740\u0026ndash;754. https://doi.org/10.1016/j.cell.2016.06.017 \u003c/li\u003e\n\u003cli\u003eIorio, F., Knijnenburg, T. A., Vis, D. J., Bignell, G. R., Menden, M. P., Schubert, M., Aben, N., Gon\u0026ccedil;alves, E., Barthorpe, S., Lightfoot, H., Cokelaer, T., Greninger, P., van Dyk, E., Chang, H., de Silva, H., Heyn, H., Deng, X., Egan, R. K., Liu, Q., \u0026hellip; Garnett, M. J. (2016b). A Landscape of Pharmacogenomic Interactions in Cancer. Cell, 166(3), 740\u0026ndash;754. https://doi.org/10.1016/j.cell.2016.06.017 \u003c/li\u003e\n\u003cli\u003eJaved, S., Sood, S., Rai, B., Bhattacharyya, S., Bagga, R., \u0026amp; Srinivasan, R. (2021). ALDH1 \u0026amp; CD133 in invasive cervical carcinoma \u0026amp; their association with the outcome of chemoradiation therapy. The Indian Journal of Medical Research, 154(2), 367. https://doi.org/10.4103/IJMR.IJMR_709_20 \u003c/li\u003e\n\u003cli\u003eJohn-Olabode, S. O., Udenze, I. C., Adejimi, A. A., Ajie, O., \u0026amp; Okunade, K. S. (2025). Association between tumour necrosis factor-a polymorphism and cervical cancer in Lagos State, Nigeria. Ecancermedicalscience, 19, 1845. https://doi.org/10.3332/ECANCER.2025.1845 \u003c/li\u003e\n\u003cli\u003eKamradt, MC, \u0026amp; al. (2000). Inhibition of radiation-induced apoptosis by dexamethasone in cervical carcinoma cell lines depends upon increased HPV E6/E7. https://doi.org/10.1054/bjoc.2000.1114 \u003c/li\u003e\n\u003cli\u003eKanehisa, M., Sato, Y., Kawashima, M., Furumichi, M., \u0026amp; Tanabe, M. (2016). 
KEGG as a reference resource for gene and protein annotation. Nucleic Acids Research, 44(D1), D457\u0026ndash;D462. https://doi.org/10.1093/NAR/GKV1070 \u003c/li\u003e\n\u003cli\u003eKaur, D., Lee, S. M., Goldberg, D., Spix, N. J., Hinoue, T., Li, H.-T., Dwaraka, V. B., Smith, R., Shen, H., Liang, G., Renke, N., Laird, P. W., \u0026amp; Zhou, W. (2023). Comprehensive evaluation of the Infinium human MethylationEPIC v2 BeadChip. Epigenetics Communications, 3(1). https://doi.org/10.1186/S43682-023-00021-5 \u003c/li\u003e\n\u003cli\u003eKeleg, S., Titov, A., Heller, A., Giese, T., Tjaden, C., Ahmad, S. S., Gaida, M. M., Bauer, A. S., Werner, J., \u0026amp; Giese, N. A. (2014). Chondroitin Sulfate Proteoglycan CSPG4 as a Novel Hypoxia-Sensitive Marker in Pancreatic Tumors. PLOS ONE, 9(6), e100178. https://doi.org/10.1371/JOURNAL.PONE.0100178 \u003c/li\u003e\n\u003cli\u003eKolde, R. (2025). Pretty Heatmaps [R package pheatmap version 1.0.13]. CRAN: Contributed Packages. https://doi.org/10.32614/CRAN.PACKAGE.PHEATMAP \u003c/li\u003e\n\u003cli\u003eKoraneekit, A., Limpaiboon, T., Sangka, A., Boonsiri, P., Daduang, S., \u0026amp; Daduang, J. (2018). Synergistic effects of cisplatin-caffeic acid induces apoptosis in human cervical cancer cells via the mitochondrial pathways. Oncology Letters, 15(5), 7397\u0026ndash;7402. https://doi.org/10.3892/OL.2018.8256/ABSTRACT \u003c/li\u003e\n\u003cli\u003eKorotkevich, G., Sukhov, V., Budin, N., Shpak, B., Artyomov, M. N., \u0026amp; Sergushichev, A. (2016). Fast gene set enrichment analysis. BioRxiv. https://doi.org/10.1101/060012 \u003c/li\u003e\n\u003cli\u003eKrijthe, J. (2023). T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation [R package Rtsne version 0.17]. CRAN: Contributed Packages. https://doi.org/10.32614/CRAN.PACKAGE.RTSNE \u003c/li\u003e\n\u003cli\u003eKumar, A., Khurana, U., Chowdhary, R., Halder, A., \u0026amp; Kapoor, N. (2024). 
Evaluation of the diagnostic utility of MCAM-1 (CD146) in a group of common gynecological cancers: A case-control study. Turkish Journal of Obstetrics and Gynecology, 21(1), 43. https://doi.org/10.4274/TJOD.GALENOS.2024.38265 \u003c/li\u003e\n\u003cli\u003eLevine, J. H., Simonds, E. F., Bendall, S. C., Davis, K. L., Amir, E. A. D., Tadmor, M. D., Litvin, O., Fienberg, H. G., Jager, A., Zunder, E. R., Finck, R., Gedman, A. L., Radtke, I., Downing, J. R., Pe\u0026rsquo;er, D., \u0026amp; Nolan, G. P. (2015). Data-Driven Phenotypic Dissection of AML Reveals Progenitor-like Cells that Correlate with Prognosis. Cell, 162(1), 184\u0026ndash;197. https://doi.org/10.1016/j.cell.2015.05.047 \u003c/li\u003e\n\u003cli\u003eLi, D. Z., Yan, B., Liao, K., Huang, J., Zhang, J., Chen, Y. C., Zhu, J., Zhi, S., \u0026amp; Chen, L. (2025). Multi-omics modality completion and knowledge distillation for drug response prediction in cervical cancer. Frontiers in Oncology, 15, 1622600. https://doi.org/10.3389/FONC.2025.1622600/BIBTEX \u003c/li\u003e\n\u003cli\u003eLi, X., Yue, Z., Wang, D., \u0026amp; Zhou, L. (2023). PTPRC functions as a prognosis biomarker in the tumor microenvironment of cutaneous melanoma. Scientific Reports 2023 13:1, 13(1), 1\u0026ndash;15. https://doi.org/10.1038/s41598-023-46794-6 \u003c/li\u003e\n\u003cli\u003eLi, Y., Liu, Q., Jing, X., Wang, Y., Jia, X., Yang, X., \u0026amp; Chen, K. (2025). Cancer‐Associated Fibroblasts: Heterogeneity, Cancer Pathogenesis, and Therapeutic Targets. MedComm, 6(7), e70292. https://doi.org/10.1002/MCO2.70292 \u003c/li\u003e\n\u003cli\u003eLiberzon, A., Birger, C., Thorvaldsd\u0026oacute;ttir, H., Ghandi, M., Mesirov, J. P., \u0026amp; Tamayo, P. (2015). The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Systems, 1(6), 417. https://doi.org/10.1016/J.CELS.2015.12.004 \u003c/li\u003e\n\u003cli\u003eLin, M., Pan, C., Xu, W., Li, J., \u0026amp; Zhu, X. (2020). 
Leonurine promotes cisplatin sensitivity in human cervical cancer cells through increasing apoptosis and inhibiting drug-resistant proteins. Drug Design, Development and Therapy, 14, 1885\u0026ndash;1895. https://doi.org/10.2147/DDDT.S252112 \u003c/li\u003e\n\u003cli\u003eLitwin, T. R., Irvin, S. R., Chornock, R. L., Sahasrabuddhe, V. V., Stanley, M., \u0026amp; Wentzensen, N. (2020). Infiltrating T-cell markers in cervical carcinogenesis: a systematic review and meta-analysis. British Journal of Cancer 2020 124:4, 124(4), 831\u0026ndash;841. https://doi.org/10.1038/s41416-020-01184-x \u003c/li\u003e\n\u003cli\u003eLiu, B., Zhai, J., Wang, W., Liu, T., Liu, C., Zhu, X., Wang, Q., Tian, W., \u0026amp; Zhang, F. (2022). Identification of Tumor Microenvironment and DNA Methylation-Related Prognostic Signature for Predicting Clinical Outcomes and Therapeutic Responses in Cervical Cancer. Frontiers in Molecular Biosciences, 9, 872932. https://doi.org/10.3389/FMOLB.2022.872932/BIBTEX \u003c/li\u003e\n\u003cli\u003eLiu, J., Yang, L., Zhang, J., Zhang, J., Chen, Y., Li, K., Li, Y., Li, Y., Yao, L., \u0026amp; Guo, G. (2012). Knock-down of NDRG2 sensitizes cervical cancer Hela cells to cisplatin through suppressing Bcl-2 expression. BMC Cancer 2012 12:1, 12(1), 370-. https://doi.org/10.1186/1471-2407-12-370 \u003c/li\u003e\n\u003cli\u003eLiu, Y., Wu, W., Cai, C., Zhang, H., Shen, H., \u0026amp; Han, Y. (2023). Patient-derived xenograft models in cancer therapy: technologies and applications. Signal Transduction and Targeted Therapy 2023 8:1, 8(1), 160-. https://doi.org/10.1038/s41392-023-01419-2 \u003c/li\u003e\n\u003cli\u003eMa, W., Tang, W., Kwok, J. S. L., Tong, A. H. Y., Lo, C. W. S., Chu, A. T. W., \u0026amp; Chung, B. H. Y. (2024). A review on trends in development and translation of omics signatures in cancer. Computational and Structural Biotechnology Journal, 23, 954\u0026ndash;971. 
https://doi.org/10.1016/J.CSBJ.2024.01.024 \u003c/li\u003e\n\u003cli\u003eMcInnes, L., Healy, J., \u0026amp; Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. https://arxiv.org/pdf/1802.03426 \u003c/li\u003e\n\u003cli\u003eMcKight, P. E., \u0026amp; Najab, J. (2010). Kruskal-Wallis Test. The Corsini Encyclopedia of Psychology, 1\u0026ndash;1. https://doi.org/10.1002/9780470479216.CORPSY0491 \u003c/li\u003e\n\u003cli\u003eMoro, M., Balestrero, F. C., \u0026amp; Grolla, A. A. (2024). Pericytes: jack-of-all-trades in cancer-related inflammation. Frontiers in Pharmacology, 15. https://doi.org/10.3389/FPHAR.2024.1426033 \u003c/li\u003e\n\u003cli\u003eNaidu, R., Paulraj, F., Abas, F., Lajis, N., \u0026amp; Othman, I. (2018). Identification of Differentially Expressed Genes in CaSki Cervical Cancer Cells Treated with a Selected Diarylpentanoid. Frontiers in Pharmacology, 9. https://doi.org/10.3389/CONF.FPHAR.2018.63.00121/EVENT_ABSTRACT \u003c/li\u003e\n\u003cli\u003eNazli, A., Chan, O., Dobson-Belaire, W. N., Ouellet, M., Tremblay, M. J., Gray-Owen, S. D., Arsenault, A. L., \u0026amp; Kaushic, C. (2010). Exposure to HIV-1 Directly Impairs Mucosal Epithelial Barrier Integrity Allowing Microbial Translocation. PLoS Pathogens, 6(4), e1000852. https://doi.org/10.1371/JOURNAL.PPAT.1000852 \u003c/li\u003e\n\u003cli\u003eOlukomogbon, T., Akpobome, B., Omole, A., Adebamowo, C. A., \u0026amp; Adebamowo, S. N. (2024). Association Between Cervical Inflammatory Mediators and Prevalent Cervical Human Papillomavirus Infection. JCO Global Oncology, 10(10), e2300380. https://doi.org/10.1200/GO.23.00380 \u003c/li\u003e\n\u003cli\u003ePavone, G., Marino, A., Fisicaro, V., Motta, L., Spata, A., Martorana, F., Spampinato, S., Celesia, B. M., Cacopardo, B., Vigneri, P., \u0026amp; Nunnari, G. (2024). Entangled Connections: HIV and HPV Interplay in Cervical Cancer\u0026mdash;A Comprehensive Review. 
International Journal of Molecular Sciences, 25(19). https://doi.org/10.3390/IJMS251910358 \u003c/li\u003e\n\u003cli\u003ePeng, D., Gleyzer, R., Tai, W. H., Kumar, P., Bian, Q., Isaacs, B., da Rocha, E. L., Cai, S., DiNapoli, K., Huang, F. W., \u0026amp; Cahan, P. (2021). Evaluating the transcriptional fidelity of cancer models. Genome Medicine, 13(1), 73. https://doi.org/10.1186/S13073-021-00888-W\u003c/li\u003e\n\u003cli\u003ePeng, Y. X., Yu, B., Qin, H., Xue, L., Liang, Y. J., \u0026amp; Quan, Z. X. (2020). EMT-related gene expression is positively correlated with immunity and may be derived from stromal cells in osteosarcoma. PeerJ, 2020(2), e8489. https://doi.org/10.7717/PEERJ.8489/SUPP-4 \u003c/li\u003e\n\u003cli\u003eRaghavan, S., Winter, P. S., Navia, A. W., Williams, H. L., DenAdel, A., Lowder, K. E., Galvez-Reyes, J., Kalekar, R. L., Mulugeta, N., Kapner, K. S., Raghavan, M. S., Borah, A. A., Liu, N., V\u0026auml;yrynen, S. A., Costa, A. D., Ng, R. W. S., Wang, J., Hill, E. K., Ragon, D. Y., \u0026hellip; Shalek, A. K. (2021). Microenvironment drives cell state, plasticity, and drug response in pancreatic cancer. Cell, 184(25), 6119-6137.e26. https://doi.org/10.1016/J.CELL.2021.11.017 \u003c/li\u003e\n\u003cli\u003eRichter, C. E., Cocco, E., Bellone, S., Bellone, M., Casagrande, F., Todeschini, P., R\u0026uuml;ttinger, D., Silasi, D. A., Azodi, M., Schwartz, P. E., Rutherford, T. J., Pecorelli, S., \u0026amp; Santin, A. D. (2010). Primary Cervical Carcinoma Cell Lines Overexpress Epithelial Cell Adhesion Molecule (EpCAM) and Are Highly Sensitive to Immunotherapy With MT201, a Fully Human Monoclonal Anti-EpCAM Antibody. International Journal of Gynecological Cancer : Official Journal of the International Gynecological Cancer Society, 20(9), 1440. https://doi.org/10.1111/IGC.0b013e3181fb18a1 \u003c/li\u003e\n\u003cli\u003eRitchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., \u0026amp; Smyth, G. K. (2015). 
limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47. https://doi.org/10.1093/NAR/GKV007 \u003c/li\u003e\n\u003cli\u003eRuan, H., Zhou, Y., Shen, J., Zhai, Y., Xu, Y., Pi, L., Huang, R., Chen, K., Li, X., Ma, W., Wu, Z., Deng, X., Wang, X., Zhang, C., \u0026amp; Guan, M. (2020). Circulating tumor cell characterization of lung cancer brain metastases in the cerebrospinal fluid through single-cell transcriptome analysis. Clinical and Translational Medicine, 10(8), e246. https://doi.org/10.1002/CTM2.246 \u003c/li\u003e\n\u003cli\u003eSaha, S. K., Kim, K., Yang, G. M., Choi, H. Y., \u0026amp; Cho, S. G. (2018a). Cytokeratin 19 (KRT19) has a Role in the Reprogramming of Cancer Stem Cell-Like Cells to Less Aggressive and More Drug-Sensitive Cells. International Journal of Molecular Sciences 2018, Vol. 19, Page 1423, 19(5), 1423. https://doi.org/10.3390/IJMS19051423 \u003c/li\u003e\n\u003cli\u003eSaha, S. K., Kim, K., Yang, G. M., Choi, H. Y., \u0026amp; Cho, S. G. (2018b). Cytokeratin 19 (KRT19) has a Role in the Reprogramming of Cancer Stem Cell-Like Cells to Less Aggressive and More Drug-Sensitive Cells. International Journal of Molecular Sciences 2018, Vol. 19, Page 1423, 19(5), 1423. https://doi.org/10.3390/IJMS19051423 \u003c/li\u003e\n\u003cli\u003eSahu, D., Shi, J., Segura Rueda, I. A., Chatrath, A., \u0026amp; Dutta, A. (2024). Development of a polygenic score predicting drug resistance and patient outcome in breast cancer. Npj Precision Oncology 2024 8:1, 8(1), 219-. https://doi.org/10.1038/s41698-024-00714-7 \u003c/li\u003e\n\u003cli\u003eSalvadores, M., Fuster-Tormo, F., \u0026amp; Supek, F. (2020). Matching cell lines with cancer type and subtype of origin via mutational, epigenomic, and transcriptomic patterns. Science Advances, 6(27), eaba1862. https://doi.org/10.1126/SCIADV.ABA1862 \u003c/li\u003e\n\u003cli\u003eSeshadri, V. D. (2021). 
Brucine promotes apoptosis in cervical cancer cells (ME-180) via suppression of inflammation and cell proliferation by regulating PI3K/AKT/mTOR signaling pathway. Environmental Toxicology, 36(9), 1841\u0026ndash;1847. https://doi.org/10.1002/TOX.23304;JOURNAL:JOURNAL:10982256A;WGROUP:STRING:PUBLICATION \u003c/li\u003e\n\u003cli\u003eShim, W. S. N., Teh, M., Bapna, A., Kim, I., Koh, G. Y., Mack, P. O. P., \u0026amp; Ge, R. (2002). Angiopoietin 1 Promotes Tumor Angiogenesis and Tumor Vessel Plasticity of Human Cervical Cancer in Mice. Experimental Cell Research, 279(2), 299\u0026ndash;309. https://doi.org/10.1006/EXCR.2002.5597 \u003c/li\u003e\n\u003cli\u003eSong, J., Yang, P., Chen, C., Ding, W., Tillement, O., Bai, H., \u0026amp; Zhang, S. (2025). Targeting epigenetic regulators as a promising avenue to overcome cancer therapy resistance. Signal Transduction and Targeted Therapy 2025 10:1, 10(1), 1\u0026ndash;56. https://doi.org/10.1038/s41392-025-02266-z \u003c/li\u003e\n\u003cli\u003eStrickler, H. D., Burk, R. D., Fazzari, M., Anastos, K., Minkoff, H., Massad, L. S., Hall, C., Bacon, M., Levine, A. M., Watts, D. H., Silverberg, M. J., Xue, X., Schlecht, N. F., Melnick, S., \u0026amp; Palefsky, J. M. (2005). Natural history and possible reactivation of human papillomavirus in human immunodeficiency virus-positive women. Journal of the National Cancer Institute, 97(8), 577\u0026ndash;586. https://doi.org/10.1093/JNCI/DJI073 \u003c/li\u003e\n\u003cli\u003eSzalai, B., Subramanian, V., Holland, C. H., Alf\u0026ouml;ldi, R., Pusk\u0026aacute;s, L. G., \u0026amp; Saez-Rodriguez, J. (2019). Signatures of cell death and proliferation in perturbation transcriptomics data-from confounding factor to effective prediction. Nucleic Acids Research, 47(19), 10010\u0026ndash;10026. https://doi.org/10.1093/NAR/GKZ805 \u003c/li\u003e\n\u003cli\u003eVučković, N., Hoppe-Seyler, K., \u0026amp; Riemer, A. B. (2023). 
Characterization of DoTc2 4510\u0026mdash;Identifying HPV16 Presence in a Cervical Carcinoma Cell Line Previously Considered to Be HPV-Negative. Cancers, 15(15). https://doi.org/10.3390/CANCERS15153810/S1 \u003c/li\u003e\n\u003cli\u003eVuyst, H. De, Ndirangu, G., Moodley, M., Tenet, V., Estambale, B., Meijer, C. J. L. M., Snijders, P. J. F., Clifford, G., \u0026amp; Franceschi, S. (2012). Prevalence of human papillomavirus in women with invasive cervical carcinoma by HIV status in Kenya and South Africa. International Journal of Cancer, 131(4), 949\u0026ndash;955. https://doi.org/10.1002/IJC.26470 \u003c/li\u003e\n\u003cli\u003eWang, J., Gu, X., Cao, L., Ouyang, Y., Qi, X., Wang, Z., \u0026amp; Wang, J. (2022). A novel prognostic biomarker CD3G that correlates with the tumor microenvironment in cervical cancer. Frontiers in Oncology, 12, 979226. https://doi.org/10.3389/FONC.2022.979226/FULL \u003c/li\u003e\n\u003cli\u003eWang, W., Lokman, N. A., Barry, S. C., Oehler, M. K., \u0026amp; Ricciardelli, C. (2025). LGR5: An emerging therapeutic target for cancer metastasis and chemotherapy resistance. Cancer and Metastasis Reviews, 44(1). https://doi.org/10.1007/S10555-024-10239-X \u003c/li\u003e\n\u003cli\u003eWeinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R. M., Ozenberger, B. A., Ellrott, K., Sander, C., Stuart, J. M., Chang, K., Creighton, C. J., Davis, C., Donehower, L., Drummond, J., Wheeler, D., Ally, A., Balasundaram, M., Birol, I., Butterfield, Y. S. N., Chu, A., \u0026hellip; Kling, T. (2013). The Cancer Genome Atlas Pan-Cancer analysis project. Nature Genetics 2013 45:10, 45(10), 1113\u0026ndash;1120. https://doi.org/10.1038/ng.2764 \u003c/li\u003e\n\u003cli\u003eWickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., Woo, K., Yutani, H., Dunnington, D., \u0026amp; van den Brand, T. (2025). Create Elegant Data Visualisations Using the Grammar of Graphics [R package ggplot2 version 4.0.1]. CRAN: Contributed Packages. 
https://doi.org/10.32614/CRAN.PACKAGE.GGPLOT2 \u003c/li\u003e\n\u003cli\u003eWei, C. et al. (2024). Integrated machine learning identifies a cellular senescence-related prognostic model to improve outcomes in uterine corpus endometrial carcinoma. https://doi.org/10.3389/fimmu.2024.1418508. \u003c/li\u003e\n\u003cli\u003eWisniewski, S. J., \u0026amp; Brannan, G. D. (2024). Correlation (Coefficient, Partial, and Spearman Rank) and Regression Analysis. StatPearls. https://www.ncbi.nlm.nih.gov/books/NBK606101/ \u003c/li\u003e\n\u003cli\u003eXia, W. T., Qiu, W. R., Yu, W. K., Xu, Z. C., \u0026amp; Zhang, S. H. (2023). Identifying TME signatures for cervical cancer prognosis based on GEO and TCGA databases. Heliyon, 9(4), e15096. https://doi.org/10.1016/J.HELIYON.2023.E15096 \u003c/li\u003e\n\u003cli\u003eYang, J., Xu, J., Wang, W., Zhang, B., Yu, X., \u0026amp; Shi, S. (2023). Epigenetic regulation in the tumor microenvironment: molecular mechanisms and therapeutic targets. Signal Transduction and Targeted Therapy 2023 8:1, 8(1), 210-. https://doi.org/10.1038/s41392-023-01480-x \u003c/li\u003e\n\u003cli\u003eYang, W., Soares, J., Greninger, P., Edelman, E. J., Lightfoot, H., Forbes, S., Bindal, N., Beare, D., Smith, J. A., Thompson, I. R., Ramaswamy, S., Futreal, P. A., Haber, D. A., Stratton, M. R., Benes, C., McDermott, U., \u0026amp; Garnett, M. J. (2013). Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Research, 41(Database issue). https://doi.org/10.1093/NAR/GKS1111 \u003c/li\u003e\n\u003cli\u003eYing, L., Zhang, L., Chen, Y., Huang, C., Zhou, J., Xie, J., \u0026amp; Liu, L. (2025). Predicting immunotherapy prognosis and targeted therapy sensitivity of colon cancer based on a CAF-related molecular signature. Scientific Reports 2025 15:1, 15(1), 1\u0026ndash;19. https://doi.org/10.1038/s41598-025-90899-z \u003c/li\u003e\n\u003cli\u003eZhang, S. Y., Ren, X. Y., Wang, C. Y., Chen, X. 
J., Cao, R. Y., Liu, Q., Pan, X., Zhou, J. Y., Zhang, W. L., Tang, X. R., Cheng, B., \u0026amp; Wu, T. (2021). Comprehensive Characterization of Immune Landscape Based on Epithelial-Mesenchymal Transition Signature in OSCC: Implication for Prognosis and Immunotherapy. Frontiers in Oncology, 11, 587862. https://doi.org/10.3389/FONC.2021.587862/BIBTEX \u003c/li\u003e\n\u003cli\u003eZhao, A., Pan, Y., Gao, Y., Zhi, Z., Lu, H., Dong, B., Zhang, X., Wu, M., Zhu, F., Zhou, S., \u0026amp; Ma, S. (2024). MUC1 promotes cervical squamous cell carcinoma through ERK phosphorylation-mediated regulation of ITGA2/ITGA3. BMC Cancer 2024 24:1, 24(1), 1\u0026ndash;14. https://doi.org/10.1186/S12885-024-12314-6 \u003c/li\u003e\n\u003cli\u003eZhao, Y., Li, M. C., Konat\u0026eacute;, M. M., Chen, L., Das, B., Karlovich, C., Williams, P. M., Evrard, Y. A., Doroshow, J. H., \u0026amp; McShane, L. M. (2021). TPM, FPKM, or Normalized Counts? A Comparative Study of Quantification Measures for the Analysis of RNA-seq Data from the NCI Patient-Derived Models Repository. Journal of Translational Medicine, 19(1). https://doi.org/10.1186/S12967-021-02936-W \u003c/li\u003e\n\u003cli\u003eZhao, Y., Zhao, C., Zhao, J., Ma, Y., Zhang, S., Liu, Y., Wang, Y., Liu, S., \u0026amp; Zhang, Y. (2025). Excavation of Molecular Subtypes of Cervical Cancer Based on DNA Methylation Patterns. Frontiers in Bioscience (Landmark Edition), 30(9). https://doi.org/10.31083/FBL45025 \u003c/li\u003e\n\u003cli\u003eZheng, Y., Han, J., Qu, Y., Wang, J., Joyce, B. T., Kim, K., Nannini, D. R., Musa, J., Imade, G. E., Anorlu, R., Maiga, M., Morhason-Bello, I., Simon, M. A., Silas, O., Abdulkareem, F. B., Badmos, K., Nyam, C. J., Gursel, D. B., Wei, J. J., \u0026hellip; Hou, L. (2025). DNA methylation biomarkers for cervical cancer risk prediction in HIV-positive Nigerian women. International Journal of Cancer, 157(7), 1363\u0026ndash;1375. 
https://doi.org/10.1002/IJC.35502;JOURNAL:JOURNAL:10970215;WGROUP:STRING:PUBLICATION \u003c/li\u003e\n\u003cli\u003eZhu, X., Li, S., Luo, J., Ying, X., Li, Z., Wang, Y., Zhang, M., Zhang, T., Jiang, P., \u0026amp; Wang, X. (2022). Subtyping of Human Papillomavirus-Positive Cervical Cancers Based on the Expression Profiles of 50 Genes. Frontiers in Immunology, 13, 801639. https://doi.org/10.3389/FIMMU.2022.801639/BIBTEX \u003c/li\u003e\n\u003cli\u003eZhu, Y. (2025). Leveraging Data Visualization with ggplot2 in Translation Pedagogy: Enhancing Learning Through Visual Insights. Lecture Notes in Computer Science, 15589 LNCS, 135\u0026ndash;144. https://doi.org/10.1007/978-981-96-4407-0_11 \u003c/li\u003e\n\u003cli\u003eZięba, S., Kowalik, A., Zalewski, K., Rusetska, N., Goryca, K., Piaścik, A., Misiek, M., Bakuła-Zalewska, E., Kopczyński, J., Kowalski, K., Radziszewski, J., Bidziński, M., G\u0026oacute;źdź, S., \u0026amp; Kowalewska, M. (2018). Somatic mutation profiling of vulvar cancer: Exploring therapeutic targets. Gynecologic Oncology, 150(3), 552\u0026ndash;561. 
https://doi.org/10.1016/j.ygyno.2018.06.026\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"npj-womens-health","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"Learn more about [npj Women's Health](https://www.nature.com/npjwomenshealth/)","snPcode":"44294","submissionUrl":"https://submission.springernature.com/new-submission/44294/3","title":"npj Women's Health","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"NPJ","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Cervical cancer, Women’s health, Intra-tumour heterogeneity, Tumour microenvironment, Epigenomics, DNA methylation, Data-driven stratification, Computational oncology, Machine learning, Tumour subtypes, Precision oncology, Experimental model selection","lastPublishedDoi":"10.21203/rs.3.rs-8708919/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8708919/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eCervical cancer remains a persistent yet preventable threat to women\u0026rsquo;s health worldwide, with a disproportionate burden borne by women in low- and middle-income countries. In sub-Saharan Africa, including South Africa, it continues to rank among the leading causes of cancer-related morbidity and mortality despite the availability of screening, vaccination, and treatment strategies. 
Structural inequities in healthcare access, late-stage diagnosis, and the prevalence of biologically aggressive disease contribute to poor outcomes, underscoring the need for molecularly informed and context-sensitive precision medicine approaches. A central biological challenge in cervical cancer management is pronounced intra-tumour heterogeneity (ITH), arising from the coexistence of multiple tumour subclones shaped by genetic variation, epigenetic regulation, and dynamic tumour microenvironment (TME) pressures. This heterogeneity drives tumour adaptation, immune evasion, therapeutic resistance, and disease recurrence, complicating clinical decision-making and limiting the durability of standard treatments. These challenges are further intensified by persistent human papillomavirus (HPV) infection and, in many settings, HIV co-infection, which together impose distinct immune and stromal programmes that fundamentally shape tumour behaviour. Advances in computational biology and analytical programming have enabled the large-scale analysis of patient-derived omics data, including genomic, epigenomic, transcriptomic, and proteomic profiles, often through machine learning\u0026ndash;based classification, clustering, and predictive modelling frameworks. However, despite the widespread availability of multi-omics datasets through resources such as The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO), their translation into clinically meaningful tumour stratification and experimentally actionable insight for cervical cancer remains limited. A key barrier is the lack of integrated, biologically grounded frameworks capable of reconstructing tumour\u0026ndash;microenvironment interaction states directly from bulk patient data and systematically linking these inferred states to representative and experimentally tractable \u003cem\u003ein vitro\u003c/em\u003e model systems. 
As a consequence, commonly used cervical cancer models frequently fail to capture critical immune and stromal dimensions, contributing to poor translatability of preclinical findings. To address this gap, we developed a DNA methylation\u0026ndash;based computational framework for data-driven tumour stratification and tumour\u0026ndash;microenvironment state inference. The framework integrates epigenomic feature restriction, joint tumour\u0026ndash;TME modelling, and machine learning\u0026ndash;based state reconstruction to infer biologically meaningful tumour microenvironment states directly from patient methylation profiles. It is designed not merely as a clustering pipeline, but as a generalizable epigenetic state inference engine that connects patient tumour states to experimentally controllable \u003cem\u003ein vitro\u003c/em\u003e systems, with a specific focus on cervical cancer cell lines as the primary translational models. By explicitly modelling tumour-intrinsic, microenvironmental, and host-associated regulatory programmes\u0026mdash;including those influenced by HIV infection\u0026mdash;this framework enables the systematic selection and evaluation of cell line models that more faithfully recapitulate patient tumour biology. 
It advances precision oncology by providing a reproducible and interpretable approach to methylation-driven tumour stratification and cell line alignment in cervical cancer, with broader applicability to other immune-modulated malignancies and underserved disease contexts.\u003c/p\u003e","manuscriptTitle":"A DNA Methylation–based Computational Framework for Tumour–microenvironment State Inference and Molecular Stratification","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-18 09:13:45","doi":"10.21203/rs.3.rs-8708919/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"reviewerAgreed","content":"267813260004908356096927397873752116627","date":"2026-05-20T23:22:02+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"5929338549736952686967169787894467358","date":"2026-05-20T18:07:02+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"122675671771402837039327331007697783484","date":"2026-05-11T13:44:32+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-05-07T17:52:18+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-02-06T06:54:38+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-01-30T15:34:08+00:00","index":"","fulltext":""},{"type":"submitted","content":"npj Women's Health","date":"2026-01-27T09:24:56+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"npj-womens-health","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"Learn more about [npj Women's Health](https://www.nature.com/npjwomenshealth/)","snPcode":"44294","submissionUrl":"https://submission.springernature.com/new-submission/44294/3","title":"npj Women's 
Health","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"NPJ","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"d2b5ab25-a989-4052-b6f3-fca2cca9019a","owner":[],"postedDate":"May 18th, 2026","published":true,"recentEditorialEvents":[{"type":"reviewerAgreed","content":"267813260004908356096927397873752116627","date":"2026-05-20T23:22:02+00:00","index":34,"fulltext":""},{"type":"reviewerAgreed","content":"5929338549736952686967169787894467358","date":"2026-05-20T18:07:02+00:00","index":33,"fulltext":""},{"type":"reviewerAgreed","content":"122675671771402837039327331007697783484","date":"2026-05-11T13:44:32+00:00","index":24,"fulltext":""},{"type":"reviewersInvited","content":"17","date":"2026-05-07T17:52:18+00:00","index":"","fulltext":""}],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":68243942,"name":"Biological sciences/Cancer"},{"id":68243943,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":68243944,"name":"Health sciences/Oncology"}],"tags":[],"updatedAt":"2026-05-18T09:13:45+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-18 09:13:45","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8708919","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8708919","identity":"rs-8708919","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}


Outcome instruments

MUSA


Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall
last seen: 2026-05-22T02:00:06.705733+00:00
License: CC-BY-4.0