Keywords
peritoneal endometriosis, biomarkers, transcriptomics, machine learning, whole genome
RNA sequencing
All rights reserved. No reuse allowed without permission.
perpetuity.
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in
The copyright holder for thisthis version posted December 30, 2025. ; https://doi.org/10.64898/2025.12.23.25342915doi: medRxiv preprint
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
Abstract
Context. Peritoneal endometriosis (PE) remains challenging to diagnose, as it cannot be detected
using standard imaging modalities and no clinically validated biomarkers are available.
Objective. To identify novel blood-based biomarkers for PE using whole blood transcriptomics
combined with machine learning approaches.
Design, Setting, and Patients. This observational study enrolled 48 women undergoing laparoscopic
surgery for endometriosis-related symptoms at tertiary referral centres in Slovenia and Austria.
Patients were classified as having PE (n=20), peritoneal and ovarian endometriosis (PE and OE, n=8),
or no endometriosis (controls, n=20). Patients were further stratified by menstrual phase (proliferative
or secretory). Whole blood samples were collected preoperatively.
Methods. Whole-blood RNA sequencing was performed, and differentially expressed genes (DGEs)
and transcripts (DTEs) were identified. Sequencing data were processed using a machine learning
pipeline to select key features and develop support vector machine (SVM) classifiers for predicting
endometriosis status.
Results. In the proliferative group, no DGEs and only two DTEs distinguished PE from controls. In
contrast, in the secretory group, 1,035 DGEs and 922 DTEs were identified, with no overlap between
menstrual phases. Enrichment analysis of secretory phase DGEs indicated their involvement in
angiogenesis and immune-related pathways. Feature selection identified six transcripts that achieved
the best SVM classification performance in distinguishing cases from controls across both menstrual
phases (AUC = 0.92, sensitivity = 75%, specificity = 100%).
Conclusion. This study provides the first evidence that integrating whole-blood transcriptomics with
machine learning can identify potential blood-based biomarkers for PE and highlights the influence of
menstrual cycle phase. These findings require validation in larger, independent cohorts.
Introduction
Endometriosis is a complex chronic disease affecting approximately 10 % of women of
reproductive age 1. It is an oestrogen-dependent, chronic inflammatory condition associated with
infertility and chronic pelvic pain 2. Histologically, endometriosis is characterised by the ectopic
presence of endometrial stromal and epithelial cells, often accompanied by hemosiderin-containing
macrophages 3. Endometriotic lesions are primarily found on the surface of the peritoneum
(superficial peritoneal endometriosis, PE), on the ovaries (ovarian endometrioma, OE) or as nodules
that penetrate more than 5 mm beneath the peritoneum (deep endometriosis, DE) 3,4.
Patients with endometriosis present with a variety of non-specific symptoms that often overlap
with those of other gynaecological and gastrointestinal diseases, such as infertility, uterine fibroids,
inflammatory bowel syndrome, or pelvic inflammatory disease 5,6. Additionally, approximately 20–25 %
of patients with confirmed endometriosis are asymptomatic 7,8. Despite the high prevalence of this
disease, endometriosis symptoms are often trivialised, discouraging patients from seeking medical
help sooner 9. Consequently, there is a significant delay in diagnosis worldwide, ranging from 4 to 11
years after symptom onset 5,10,11.
Traditionally, laparoscopy, the surgical visualisation of endometriotic lesions, has been considered
the gold standard for diagnosing endometriosis. However, this invasive and costly procedure carries
surgical risks and can show variable results if not confirmed histologically 12,13. Studies have shown
that two-thirds of women undergoing laparoscopy are not diagnosed with endometriosis, suggesting
that many undergo unnecessary surgery 14. While OE and DE can be diagnosed with imaging modalities
such as transvaginal ultrasound (TVUS) and magnetic resonance imaging (MRI), these techniques are
not useful for diagnosing superficial PE, which accounts for approximately 80 % of all endometriosis
cases 2,15. Furthermore, common non-pigmented endometriotic lesions present in patients with
superficial PE may even be missed at laparoscopy 16,17. Recently, the European Society of Human
Reproduction and Embryology (ESHRE) guidelines have recommended that the diagnosis of
endometriosis should be considered in patients with related symptoms. Confirmation should be
achieved with imaging, reserving laparoscopy for cases where imaging results are negative or where
empirical treatment has been unsuccessful or is inappropriate. Clinicians are advised against using
biomarkers for the diagnosis of endometriosis, as there is currently no reliable or clinically validated
biomarker available for this condition 18. However, identification of accurate, reliable, and
appropriately validated non-invasive biomarkers for endometriosis is needed to reduce current
diagnostic delays and accelerate patient treatment 19,20.
To date, non-invasive biomarkers for endometriosis have been investigated in various biological
samples, including peripheral blood, urine, menstrual blood, saliva, faeces and cervical mucus. Among
these, peripheral blood, particularly serum and plasma, has been used most frequently 21,22. In
contrast to serum and plasma, whole blood does not require separation into its constituent
components, reducing potential inter-sample variability during processing and storage, and resulting
in more reproducible results 23-25. Furthermore, whole blood collection is a standardized and routine
procedure, making it ideal for rapid point-of-care tests 24. As blood perfuses all organs, it provides
insights into human physiology and health, acting as a reporter for both systemic and localised
diseases. Consequently, transcriptome analysis of blood can identify gene expression signatures and
biomarkers for diagnosis, prognosis, and treatment monitoring 26-28.
So far, different molecules have been considered as potential biomarkers for endometriosis,
including inflammatory markers, apoptosis and endothelial markers, glycoproteins, growth factors,
oxidative stress markers, miRNAs, circRNAs and lncRNAs 25,29,30. To identify new biomarkers for
endometriosis, studies use hypothesis-driven approaches, hypothesis-generating high-throughput
(‘omics’) techniques, or a combination of both 31. Over the past 25 years, high-throughput
screening technologies have been widely used to generate large-scale biological datasets to improve
understanding, diagnosis and treatment of endometriosis 32,33. Additionally, different machine
learning approaches are increasingly integrated into bioinformatics studies to analyse large datasets,
such as patients’ clinical and lifestyle data, imaging data, and the expression of proteins, genes,
metabolites and their combinations, to establish models for the diagnosis of endometriosis 34.
However, many of these studies do not specify the subtype of endometriosis present among the
enrolled patients but instead analyse all types of the disease as a single entity. As different
endometriosis subtypes exhibit different pathophysiological characteristics 35, it is unlikely that
biomarkers can be identified that diagnose all types of endometriosis with high sensitivity and specificity.
Furthermore, studies often lack a proper description of the modelling approach and are rarely
validated in larger, independent cohorts 34,36.
In our previous studies, we searched for biomarkers of endometriosis in serum, plasma, and
peritoneal fluid samples using different approaches, including targeted metabolomics 37,38 and
proteomics 39-42. In the present study, our aim was to identify a panel of blood biomarker candidates
for PE using whole genome transcriptomics combined with machine learning techniques.
Materials and methods
Study design and patient selection
This study was approved by the National Medical Ethics Committee of the Republic of Slovenia
(approval no. 120 -5412019-5), and the Republic of Austria (approval no. 545/2010). Patients were
recruited prospectively between January 2020 and December 2022 at the University Medical Centre
(UMC) Ljubljana, Slovenia, and between November 2019 and January 2020 at the Medical University
of Vienna, Austria (Figure 1). Inclusion criteria were symptoms suggestive of endometriosis (chronic
pelvic pain and/or infertility), reproductive age (18 to 39 years), and willingness to participate in the
study. All recruited patients signed a written informed consent form upon inclusion. Exclusion criteria
were pregnancy at the time of surgery, history or presence of malignant diseases and/or autoimmune
diseases such as rheumatoid arthritis, inflammatory bowel disease, or autoimmune thyroid disease.
Whole blood samples were collected from all patients. All women included underwent laparoscopic
surgery and were characterized by the presence (cases) or absence (controls) of endometriosis after
histological examination. Patients diagnosed with endometriosis were further classified according to
the revised American Society of Reproductive Medicine scoring system (rASRM). Each patient also
completed a questionnaire detailing general information about diet, lifestyle, smoking status,
recreational habits, and ethnic origin. The attending physician completed an additional questionnaire
covering clinical and gynaecological data, including use of oral contraception or hormonal therapy,
medication use, regularity of menstrual cycle, phase of menstrual cycle at the time of surgery, and a
surgical report with staging and scoring of endometriosis, previous gynaecological surgeries and
presence of other pathologies.
Case and control patients were carefully matched to ensure there were no significant differences
in mean BMI or mean age. Furthermore, none of the participants had taken oral contraceptives in the
three months preceding laparoscopy. Patients were divided into proliferative (n=24) and secretory
(n=24) groups based on their menstrual cycle phase at the time of surgery. In the proliferative group,
all cases (n=12) had PE, whereas no endometriosis was detected in the control patients (n=12). In the
secretory group, cases (n=16) were divided into patients with PE (n=8) and those with both PE and OE
(n=8). At the time of patient enrolment, there were insufficient numbers of patients in the secretory
phase with PE only. Therefore, patients with combined PE and OE who matched the control criteria
were also included. The absence of endometriosis was confirmed laparoscopically for controls (n=8).
The clinical characteristics of patients in the discovery phase of the study are shown in Table 1.
Figure 1. Flowchart of the patient selection. n- number, BMI – body mass index, PE - peritoneal endometriosis, OE – ovarian
endometriosis, OC – oral contraception. Created with BioRender.com.
The study used an untargeted transcriptomic approach for biomarker discovery, combined with
machine learning techniques. Total RNA was extracted from patients’ whole blood and subjected to
ribosomal RNA removal. Quantified cDNA libraries were then prepared, and whole-genome RNA was
sequenced using Illumina platforms. The RNA sequencing data were used for differential gene and
transcript expression analysis, as well as pathway enrichment analysis. Additionally, the sequencing
data were integrated into a machine learning pipeline for the selection of the most informative genes
and transcripts using feature selection techniques, followed by the construction of support vector
machine classifiers to predict the endometriosis status of patients. The flowchart of the study design
is shown in Figure 2.
Figure 2. Schematic representation of the study design. Eligible participants were recruited prospectively and underwent
laparoscopy, with histological confirmation of endometriotic lesions. Whole blood samples from patients were collected,
total RNA was extracted, quantified, and cDNA libraries were prepared and sequenced using Illumina platforms. The RNA
sequencing data were used for differential gene and transcript expression analysis, functional enrichment analysis, and were
integrated into a machine learning pipeline for feature selection and construction of a classifier to predict the endometriosis
status of patients. Created with BioRender.com.
Sample collection and processing
Sample collection and processing were carried out according to a strict standard operating
procedure 43. Between one day and one hour before surgery, 3 ml blood samples were collected into
Tempus Blood RNA tubes (Applied Biosystems, Waltham, MA, USA) at the Department of Obstetrics
and Gynaecology at the UMC Ljubljana, Slovenia and at the Department of Gynaecology and Medical
University Centre Vienna, Austria. Immediately after blood collection, the tubes were shaken
vigorously for at least 20 seconds. The tubes were stored at 4 °C for up to 5 days and then transferred
to -80 °C until further analysis, according to the manufacturer’s instructions.
RNA isolation and quality analysis
RNA was isolated according to the manufacturer’s instructions using the Tempus Spin RNA Isolation
kit (Applied Biosystems, Waltham, MA, USA). Briefly, stabilised blood was thawed at room
temperature and transferred into a 50 ml tube. Then, 3 to 4 ml of 1× phosphate-buffered saline (PBS)
was added to the tube to reach a total volume of 12 ml. The tube was vortexed vigorously for 1.5
minutes and centrifuged at 3000 × g for 30 minutes at 4 °C. After centrifugation, the tube contents
were carefully poured off, and the RNA pellet was resuspended in 400 µl of RNA Purification
Resuspension Solution. The resuspended RNA was transferred to a purification filter and purified with
Wash Solutions 1 and 2. Finally, the RNA was eluted in Nucleic Acid Purification Elution Solution,
aliquoted and stored at -80 °C for further analysis. The concentration of isolated RNA was determined
using a NanoDrop One (Thermo Fisher Scientific, Waltham, MA, USA), while RNA integrity was
assessed with the Agilent Bioanalyzer Instrument using the RNA 6000 Nano Kit (Agilent Technologies
Inc., Santa Clara, CA, USA). RNA samples were subjected to strict quality control before being sent for
sequencing. Samples with an RNA Integrity Number (RIN) > 7.5, a concentration of at least 20 ng/µl,
and a volume of 20 µl were used for whole genome RNA sequencing (Novogene, Cambridge, UK).
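The acceptance criteria above can be written as a simple filter. The following is a minimal sketch, not the authors' code: the `RnaSample` record, its field names, and the sample values are hypothetical, and the 20 µl volume is treated as a minimum, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RnaSample:
    sample_id: str         # hypothetical identifier
    rin: float             # RNA Integrity Number
    conc_ng_per_ul: float  # concentration in ng/µl
    volume_ul: float       # available volume in µl


def passes_qc(sample, min_rin=7.5, min_conc=20.0, min_volume=20.0):
    """RIN must be strictly above min_rin; concentration and volume
    must reach their minimums (volume-as-minimum is an assumption)."""
    return (
        sample.rin > min_rin
        and sample.conc_ng_per_ul >= min_conc
        and sample.volume_ul >= min_volume
    )


# Made-up samples for illustration only.
samples = [
    RnaSample("S01", rin=8.4, conc_ng_per_ul=35.0, volume_ul=20.0),
    RnaSample("S02", rin=7.5, conc_ng_per_ul=50.0, volume_ul=25.0),  # RIN not above 7.5
    RnaSample("S03", rin=9.0, conc_ng_per_ul=18.0, volume_ul=20.0),  # concentration too low
]
accepted = [s.sample_id for s in samples if passes_qc(s)]
```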
RNA sequencing
First, ribosomal RNA was removed and rRNA-free residues were cleaned using an rRNA removal kit
and ethanol precipitation. Subsequently, RNA fragmentation was performed, and first-strand cDNA
was synthesised. During second-strand cDNA synthesis, dTTPs were replaced by dUTPs in the reaction
buffer. The directional library was prepared after end repair, A-tailing, adapter ligation, size selection,
USER enzyme digestion, amplification, and purification. Quantified libraries were pooled and
sequenced on Illumina platforms. The samples were sequenced in two separate batches: the first
comprising samples from patients in the proliferative group, and the second from those in the
secretory group. For alignment and calculation of read counts, we used CLC Genomics Workbench
21.0.4 and 22.0.1 (QIAGEN Aarhus, Denmark) using the Identify and Annotate Differentially Expressed
Genes and Pathways 1.19 and RNA-Seq Analysis 2.7 tools with the default settings. We used the
reference Homo sapiens GRCh38.104 (Gene, RNA). For calculation of TPM values, we used the
Differential Expression for RNA-seq 2.7 and 2.8 tools with the default settings.
RNA sequencing analysis
We performed differential gene expression (DGE) and differential transcript expression (DTE)
analyses based on read count data using R 44 version 4.3.0 with DESeq2 45 version 1.42.1. Prior to
analysis, genes/transcripts with a non-zero read count in fewer than six samples were filtered out. The
analyses were conducted separately for patients in each of the two menstrual cycle phases. For both
phases, we compared patients without endometriosis (controls) to those with endometriosis,
grouping patients with PE only and PE and OE together. Additionally, for the secretory phase, we
performed a further analysis by dividing patients into three groups: (1) patients without
endometriosis, (2) patients with PE only, and (3) patients with both PE and OE. We then compared
each group against the other two groups separately. Genes/transcripts were considered differentially
expressed if their adjusted p-value was below 0.05. Volcano plots displaying the -log10 adjusted p-value
against the log2 fold change were created using the Matplotlib library. The workflow of the
experimental analysis is illustrated in Figure 3.
Figure 3. Scheme of experimental workflow. PE – peritoneal endometriosis, OE – ovarian endometriosis, DGE – differentially
expressed genes, DTE – differentially expressed transcripts, TPM – transcripts per million, SVM – support vector machine.
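Two steps of the differential expression analysis lend themselves to a short illustration: the pre-filter on non-zero counts and the adjusted p-value threshold. The sketch below is in plain Python and is not the DESeq2 workflow itself. It assumes the Benjamini-Hochberg procedure for p-value adjustment (the DESeq2 default; the text does not state which method was used), and the function names and input values are illustrative.

```python
import math


def keep_feature(counts, min_samples=6):
    """Keep a gene/transcript if at least `min_samples` samples have a non-zero count."""
    return sum(1 for c in counts if c > 0) >= min_samples


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_point(log2_fold_change, padj):
    """Coordinates of one feature on a volcano plot: (x, y) = (log2 FC, -log10 padj)."""
    return log2_fold_change, -math.log10(padj)


# Made-up raw p-values for four features.
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [i for i, p in enumerate(padj) if p < 0.05]
```

In this toy input only the first feature remains below 0.05 after adjustment, which shows why the number of differentially expressed features depends on the number of tests performed.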
Gene set enrichment analysis
Only DGEs identified in patients from the secretory group were included in the gene set enrichment
analysis. The DGEs were taken from four comparisons: controls vs PE only, controls vs PE + OE, controls
vs all cases (PE only and PE + OE), and PE only vs PE + OE (Figure 3). All DGEs were divided into two
subsets: upregulated and downregulated genes. Within each subset, genes were ranked in ascending
order according to their adjusted p-values. These ranked gene lists were then individually inputted
into the g:GOSt tool, a component of the g:Profiler web service 46, for gene set enrichment analysis.
The "ordered query" option was selected, while other settings remained at their default values. The
analysis was performed using g:Profiler version e113_eg59_p19_f6a03c19, with the database last
updated on Mon Jun 16 2025 46.
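Preparing the ordered queries described above amounts to splitting the significant genes by direction of change and sorting each subset by adjusted p-value. A minimal sketch, assuming each result row carries a gene identifier, a log2 fold change and an adjusted p-value; the dictionary keys and gene names are placeholders, not identifiers from the study.

```python
def ordered_queries(results, alpha=0.05):
    """Split significant genes into up- and downregulated lists,
    each sorted by ascending adjusted p-value."""
    significant = [r for r in results if r["padj"] < alpha]
    up = sorted((r for r in significant if r["log2_fc"] > 0), key=lambda r: r["padj"])
    down = sorted((r for r in significant if r["log2_fc"] < 0), key=lambda r: r["padj"])
    return [r["gene"] for r in up], [r["gene"] for r in down]


# Placeholder rows for illustration only.
results = [
    {"gene": "GENE_A", "log2_fc": 1.2, "padj": 0.030},
    {"gene": "GENE_B", "log2_fc": -0.8, "padj": 0.001},
    {"gene": "GENE_C", "log2_fc": 2.1, "padj": 0.004},
    {"gene": "GENE_D", "log2_fc": -1.5, "padj": 0.200},  # not significant
]
up_genes, down_genes = ordered_queries(results)
```

Each list would then be submitted separately as an ordered query, so that the most significant genes carry the most weight in the enrichment test.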
Principal component analysis
Genes/transcripts with zero entries across all samples were filtered out. Transcript per million
(TPM) values were centred based on genes/transcripts and log-transformed before conducting
principal component analysis (PCA). PCA was separately conducted for each menstrual cycle phase as
well as for the combined data of all patients across both phases. This analysis was performed using
the scikit-learn library for dimensionality reduction and with the Matplotlib library for visualization.
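The preprocessing that precedes PCA can be sketched in a few lines. The text does not specify the logarithm base, the pseudocount, or the exact order of centring and transformation, so the version below makes explicit assumptions: log2 with a pseudocount of 1, applied first, followed by centring each gene/transcript on its mean across samples.

```python
import math


def preprocess_for_pca(tpm):
    """tpm maps a feature name to its TPM values, one per sample.

    Assumptions (not stated in the text): log2(TPM + 1) first,
    then per-feature mean centring."""
    processed = {}
    for feature, values in tpm.items():
        if all(v == 0 for v in values):
            continue  # drop features with zero entries across all samples
        logged = [math.log2(v + 1.0) for v in values]
        mean = sum(logged) / len(logged)
        processed[feature] = [x - mean for x in logged]
    return processed


# Toy input: one all-zero feature and one expressed feature.
centred = preprocess_for_pca({"f1": [0, 0, 0], "f2": [1, 3, 7]})
```

The resulting matrix, with samples as rows, could then be passed to a PCA implementation such as `sklearn.decomposition.PCA`.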
Feature Selection, Model Training, and Predictions
Datasets
For this part, transcript per million (TPM) values were used. We constructed ten datasets: five at
the gene level and five at the transcript level. The gene-level datasets included: 1) all participants in
the proliferative phase, 2) all participants in the secretory phase, 3) all participants (both proliferative
and secretory phases), 4) participants in the secretory phase restricted to controls and cases with PE
only, and 5) participants in the proliferative phase restricted to controls and cases with PE only
(Figure 3). The transcript-level datasets were constructed using the same participant groupings as the
gene-level datasets.
First, the genes/transcripts with zero entries across all samples were removed. Second, each
dataset was divided into training and test sets. In datasets containing participants from only one
menstrual cycle phase, 25% of participants were allocated to the test set. In contrast, for datasets with
participants from both phases, 15% of participants were assigned to the test set. The test sets were
formed by randomly sampling an equal number of participants from each group to maintain the
original proportions between groups in the dataset. By 'group', we refer to patients in the same
menstrual phase and with the same endometriosis status (absence of endometriosis, PE, or PE + OE).
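A group-wise random split of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the participant identifiers and the fixed seed are hypothetical, and the per-group test size is obtained by rounding, which the text does not specify.

```python
import random
from collections import defaultdict


def stratified_split(groups, test_fraction, seed=0):
    """groups maps participant id -> group label (menstrual phase + status).

    Draws the same fraction of participants from every group so that
    the test set keeps the group proportions of the full dataset."""
    by_group = defaultdict(list)
    for participant, label in groups.items():
        by_group[label].append(participant)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_group):
        members = sorted(by_group[label])
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)  # rounding rule is an assumption
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Secretory-phase layout from the study: 8 controls, 8 PE, 8 PE + OE.
groups = {}
for label in ("control", "PE", "PE+OE"):
    for i in range(8):
        groups[f"{label}_{i}"] = f"secretory/{label}"
train_ids, test_ids = stratified_split(groups, test_fraction=0.25)
```

With 25 % held out, two participants per group go to the test set, giving 6 test and 18 training participants for a secretory-phase dataset.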
Feature selection on the training set
Consistent with the machine learning terminology, we refer here to each gene/transcript as a
feature. Feature standardization was performed to normalize the data, transforming it to have a mean
of zero and a standard deviation of one. Then, the importance of each feature was assessed by
applying three different techniques (separately to each of the ten training datasets): mutual
information, random forest importance and support vector machine (SVM) weights. All three were
implemented using the scikit-learn library. Mutual information assessed the mutual dependence
between each feature and the endometriosis status. For random forest importance, feature
importance was evaluated based on how effectively each feature improved the predictive accuracy of
a trained random forest classifier. For SVM weights, a linear SVM model was trained on all features,
and the weights of the features, indicating their influence on the decision boundary, were
extracted. From the results of each feature selection technique, a list of the 2000 most important
genes/transcripts was prepared. Using this shortlist, recursive feature elimination (RFE) with an SVM
classifier (RFE-SVM) 47 was then performed to further refine the selection of relevant features. Initially,
an SVM classifier was trained on data from genes/transcripts from the list, and its performance was
assessed through leave-one-out cross-validation on the training dataset, calculating the area under
the curve (AUC) of the receiver operating characteristic (ROC) curve. Subsequently, genes/transcripts
were sorted based on their contribution to distinguishing patients with endometriosis from those
without it, and the least informative gene/transcript was removed from the list. This iterative process
continued until the AUC started decreasing, i.e. at the end of this process, the minimal set of
genes/transcripts that achieved an AUC of 1.0 on the training set (indicating a perfect classifier capable
of accurately categorizing patients with or without endometriosis across all thresholds) was retrieved.
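The ranking and elimination steps described above can be sketched with scikit-learn. This is an illustrative reconstruction, not the authors' pipeline. Several details are assumptions: the absolute linear SVM weight is used as the per-feature contribution during elimination, the leave-one-out AUC is computed by pooling the held-out decision values into a single ROC curve, the random forest uses 100 trees, and the data are synthetic with a shortlist of 10 features instead of 2000.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def loo_auc(X, y, C=0.1):
    """Leave-one-out AUC: pool each held-out decision value, then score once."""
    scores = np.empty(len(y))
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = SVC(kernel="linear", C=C).fit(X[train_idx], y[train_idx])
        scores[test_idx] = model.decision_function(X[test_idx])
    return roc_auc_score(y, scores)


def shortlist(X, y, k, seed=0):
    """Top-k feature indices according to each of the three importance measures."""
    rf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)
    svm = SVC(kernel="linear", C=0.1).fit(X, y)
    importances = {
        "mutual_info": mutual_info_classif(X, y, random_state=seed),
        "random_forest": rf.feature_importances_,
        "svm_weights": np.abs(svm.coef_).ravel(),
    }
    return {name: [int(i) for i in np.argsort(imp)[::-1][:k]]
            for name, imp in importances.items()}


def rfe_svm(X, y, features, C=0.1):
    """Remove the lowest-weight feature until the leave-one-out AUC starts to drop."""
    features = list(features)
    best_auc = loo_auc(X[:, features], y, C)
    while len(features) > 1:
        weights = np.abs(SVC(kernel="linear", C=C).fit(X[:, features], y).coef_).ravel()
        weakest = int(np.argmin(weights))
        candidate = features[:weakest] + features[weakest + 1:]
        auc = loo_auc(X[:, candidate], y, C)
        if auc < best_auc:
            break  # AUC started decreasing: keep the previous feature set
        features, best_auc = candidate, auc
    return features, best_auc


# Synthetic training data: 20 samples, 30 features, two of them informative.
rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 30))
X[:, 0] += 3 * y
X[:, 1] -= 3 * y
X = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature

lists = shortlist(X, y, k=10)
start_auc = loo_auc(X[:, lists["svm_weights"]], y)
selected, final_auc = rfe_svm(X, y, lists["svm_weights"])
```

Note that both the ranking and the AUC estimate use the same training samples, so the training AUC is optimistic; this is why performance is reported separately on the held-out test set.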
Predictions on the test set
To evaluate the performance of the models on unseen data, each set of genes/transcripts obtained
from RFE was used for training the SVM model. The training set was filtered to exclusively include
genes/transcripts from the retrieved set, and likewise, the test set was filtered accordingly.
Subsequently, sensitivity, specificity, and the area under the curve (AUC) of the ROC curve were
computed. From the model’s decision function, ROC curves were calculated with the library pROC in
R and visualized with the Matplotlib library in Python.
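For reference, the three test-set metrics can be computed directly from the true labels and the SVM decision values. The sketch below is in plain Python (the study used pROC in R for the ROC curves). It assumes that a decision value above 0 is predicted as a case, and the labels and scores are made up for illustration; they are not data from this study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a random case scores above a random control
    (ties count as one half)."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical decision values for 4 cases (1) and 4 controls (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [1.2, 0.8, 0.3, -0.4, -0.2, -0.9, -1.1, -1.5]
y_pred = [1 if s > 0 else 0 for s in scores]  # threshold at 0 is an assumption

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarises the ranking of cases against controls across all thresholds.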
SVM model hyperparameter tuning
For the linear SVM, the only hyperparameter needing selection is C. We opted for tuning C using
the training set comprising participants in the secretory phase, filtered to retain solely differentially
expressed transcripts. This decision stemmed from the dataset's apparent linear separability observed
on the PCA plot. We evaluated C across a range of values: 0.001, 0.01, 0.1, 1, 10, and 100, calculating
the ROC AUC via leave-one-out cross-validation on the same dataset. From this analysis, a C value of 0.1
was selected for further use, being the second smallest value associated with an AUC of 1.0.
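The grid search over C can be sketched with scikit-learn as follows. This is an illustration on synthetic data, not the authors' code. The selection rule is generalised as an assumption: among the C values tied for the highest leave-one-out AUC, the second smallest is chosen (or the only one, if there is no tie), which mirrors the rule stated above when the best AUC is 1.0.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

C_GRID = (0.001, 0.01, 0.1, 1, 10, 100)


def loo_auc(X, y, C):
    """Leave-one-out AUC from pooled held-out decision values."""
    scores = np.empty(len(y))
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = SVC(kernel="linear", C=C).fit(X[train_idx], y[train_idx])
        scores[test_idx] = model.decision_function(X[test_idx])
    return roc_auc_score(y, scores)


def tune_c(X, y, grid=C_GRID):
    """Return the chosen C and the AUC obtained for every grid value."""
    aucs = {C: loo_auc(X, y, C) for C in grid}
    best = max(aucs.values())
    tied = sorted(C for C, auc in aucs.items() if auc == best)
    chosen = tied[1] if len(tied) > 1 else tied[0]  # second smallest among ties
    return chosen, aucs


# Synthetic data: 16 samples, 5 features, one strongly informative.
rng = np.random.default_rng(1)
y = np.array([0] * 8 + [1] * 8)
X = rng.normal(size=(16, 5))
X[:, 0] += 4 * y
chosen_c, aucs = tune_c(X, y)
```

Preferring a small C among equally performing values favours a wider margin and stronger regularisation, which is a common choice when the number of features far exceeds the number of samples.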
Statistical analysis of patients’ clinical data
Patients’ clinical data were analysed as follows. The normality of distribution was assessed using
the Shapiro-Wilk test, and the outliers were identified and excluded using the ROUT method (Q = 1,
FDR < 1 %). For continuous variables, the unpaired t-test or Mann-Whitney test was used. Categorical
clinical variables were compared using Fisher’s exact test, the Chi-squared test, or Chi-squared test for
trend. Statistical analysis was performed with GraphPad Prism 9.3 (GraphPad Software, San Diego, CA,
USA), with the significance level set at p < 0.05.
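As a worked example of the categorical comparisons, a two-sided Fisher's exact test can be computed from the hypergeometric distribution with the standard library alone. The sketch assumes that Fisher's exact test was the test applied to the "Medication in the last week" row of Table 1 (proliferative group: 0 of 12 patients with PE versus 6 of 12 controls); the text lists several candidate tests without assigning them to rows.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: patients with PE, controls. Columns: medication yes, medication no.
p_value = fisher_exact_two_sided(0, 12, 6, 6)
```

Under this assumption the result rounds to 0.014, in agreement with the p-value reported for that row.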
Results
Clinical characteristics of patients
The study included two groups: the proliferative group and the secretory group. In the proliferative
group, 12 patients with PE and 12 control subjects were included. The secretory group included 8
patients with PE, 8 patients with both PE and OE, and 8 controls (Figure 1).
In both the proliferative and secretory groups, there were no significant differences between cases
and controls regarding age, body mass index (BMI), smoking status or use of oral contraceptives or
hormonal therapy in the last 3 months before surgery. A statistically significant difference was
observed in medication use prior to surgery between controls and cases only in the proliferative group
(p=0.014). All endometriosis patients in the proliferative group were classified as rASRM stage I. In the
secretory group, 75 % of patients with PE were classified as stage I, 12.5 % as stage II, and 12.5 % as
stage III, while none were in stage IV. Among patients with PE and OE, 50 % were classified as
stage II and 50 % as stage IV. The clinical characteristics of patients included in the study are shown in
Table 1.
Table 1. Clinical characteristics of patients included in the study
PROLIFERATIVE GROUP
Characteristic                            Unit   Detail             Patients with PE  Controls      p-value
Number of patients                        n      -                  12                12
Age (mean ± SD)                           years  -                  30.33 ± 4.14      31.25 ± 4.47  0.608
BMI (mean ± SD)                           kg/m2  -                  21.98 ± 2.72      22.15 ± 3.18  0.886
Oral contraceptives in the last 3 months  n (%)  Yes                0 (0)             0 (0)         >0.999
                                                 No                 12 (100)          12 (100)
Hormonal therapy in the last 3 months     n (%)  Yes                0 (0)             0 (0)         >0.999
                                                 No                 12 (100)          12 (100)
Medication in the last week               n (%)  Yes                0 (0)             6 (50)        0.014
                                                 No                 12 (100)          6 (50)
Smoking status                            n (%)  Non-smoker         8 (67)            9 (75)        0.344
                                                 Smoker             3 (25)            3 (25)
                                                 Occasional smoker  0 (0)             0 (0)
                                                 Former smoker      1 (8)             0 (0)
rASRM                                     n (%)  I                  12 (100)          -             -
                                                 II                 0 (0)             -
                                                 III                0 (0)             -
                                                 IV                 0 (0)             -

SECRETORY GROUP
Characteristic                            Unit   Detail             Patients with PE  Patients with PE and OE  Controls      p-value
Number of patients                        n      -                  8                 8                        8
Age (mean ± SD)                           years  -                  29.13 ± 4.02      31.5 ± 2.62              30.75 ± 2.12  0.297
BMI (mean ± SD)                           kg/m2  -                  22.91 ± 2.07      21.45 ± 1.87             21.91 ± 2.86  0.448
Oral contraceptives in the last 3 months  n (%)  Yes                0 (0)             0 (0)                    0 (0)         >0.999*, >0.999**
                                                 No                 8 (100)           8 (100)                  8 (100)
Hormonal therapy in the last 3 months     n (%)  Yes                1 (12.5)          0 (0)                    0 (0)         >0.999*, >0.999**
                                                 No                 7 (87.5)          8 (100)                  8 (100)
Medication in the last week               n (%)  Yes                4 (50)            3 (38)                   2 (25)        0.608*, 0.999**
                                                 No                 4 (50)            5 (62)                   6 (75)
Smoking status                            n (%)  Non-smoker         5 (62.5)          5 (62)                   3 (37.5)      0.246*, 0.888**
                                                 Smoker             2 (25)            0 (0)                    1 (12.5)
                                                 Occasional smoker  0 (0)             0 (0)                    2 (25)
                                                 Former smoker      1 (12.5)          3 (38)                   2 (25)
rASRM                                     n (%)  I                  6 (75)            0 (0)                    -             -
                                                 II                 1 (12.5)          4 (50)                   -
                                                 III                1 (12.5)          0 (0)                    -
                                                 IV                 0 (0)             4 (50)                   -
Abbreviations: n, number; SD, standard deviation; BMI, body mass index; PE, peritoneal endometriosis; OE, ovarian
endometriosis; rASRM, revised American Society of Reproductive Medicine score; *PE versus controls; **PE and OE versus
controls.
PCA analysis based on all genes and transcripts clustered patients according to their menstrual
phase
PCA was carried out for each menstrual cycle phase individually and for the combined data from
all patients in both the proliferative and secretory phases. We visualized the first six principal
components against each other, with data points colored according to various factors including
endometriosis status, menstrual phase, and various metadata. The metadata encompassed potential
technical sources of variation such as recruitment location (Slovenia or Austria), RNA isolation date,
time between hospitalization date, as well as clinical and lifestyle information about patients. This
patient clinical and lifestyle data comprised the rASRM score, age, age at menarche, maternal and
paternal ethnic origins, smoking status, sport/recreation, reports about pelvic, abdominal or back
pain, menstrual pain frequency, menstrual pain intensity, pain during sexual intercourse (in general
and in the last 3 months), score of pain during sexual intercourse, pain during urination/defecation
(in the last 3 months), nausea and vomiting, regularity of menstrual cycle, partus, and miscarriage.
When PCA was performed on all genes/transcripts from both phases combined, samples
clustered by menstrual phase (Supplementary Figures 1 and
2). No clustering was observed based on endometriosis status or other metadata (Supplementary
Figures 1-3).
In the secretory group, most of the differentially expressed genes and transcripts were identified
between controls and patients with peritoneal endometriosis
DGE and DTE analyses were performed separately for each menstrual cycle phase. For the
proliferative phase, we compared controls to PE only cases. For the secretory phase, we compared
four groups: controls, PE only, PE + OE, and all cases. No DGEs were identified among participants in
the proliferative group (Supplementary Figure 4, left). In the secretory group, 48 DGEs were identified
between all controls and all cases (Figure 4, left), 1035 DGEs between controls and cases with PE only
(Figure 4, middle), 3 DGEs between controls and cases with PE + O E (Figure 4, right), and 16 DGEs
between cases with PE only and cases with both PE and PE + OE (Supplementary Figure 4, right).
Figure 4. Volcano plots of differentially expressed genes across participants in the secretory group. Volcano plots display
differentially expressed genes (DGEs) for the following group comparisons: all controls versus all cases (left), controls versus
cases with PE only (middle), and controls versus cases with both PE and OE (right). PE – peritoneal endometriosis, OE –
ovarian endometriosis. Differentially expressed genes are located above the dashed vertical line and are colored in blue.
In the proliferative group, only two differentially expressed transcripts were identified
(Supplementary Figure 5). In the secretory group, 110 DTEs were identified between all cases and
controls (Figure 5, left), 922 DTEs between controls and cases with PE only (Figure 5, middle), 29 DTEs
between controls and cases with both PE and OE (Figure 5, right), and 53 DTEs between the cases with
PE only and cases with PE and OE (Supplementary Figure 5).
Figure 5. Volcano plots of differentially expressed transcripts across participants in the secretory group. Volcano plots display
differentially expressed transcripts for the following group comparisons: all controls versus all cases (left), controls versus
cases with PE only (middle), and controls versus cases with both PE and OE (right). PE – peritoneal endometriosis; OE –
ovarian endometriosis. Differentially expressed transcripts are located above the dashed vertical line and colored in blue.
Gene set enrichment analysis highlights immune and inflammatory pathways in peritoneal
endometriosis
Gene set enrichment analysis was performed only for the secretory group, using the identified
upregulated and downregulated DGEs for comparisons between controls and defined case groups (all
cases, PE only, PE + OE).
The highest number of DGEs was identified between controls and cases with PE only. Functional
enrichment analysis of the upregulated genes in this group revealed significant enrichment across
Gene Ontology (GO) biological processes, Reactome, and KEGG pathways. The top enriched terms,
ranked by number of DGEs, were related to immune and inflammatory processes such as immune
system, innate immune system, neutrophil degranulation, cytokine receptor signalling, and regulation
of RAGE receptor binding. Pathways involved in intracellular signalling, cellular response
to stimulus or stress, and vesicle trafficking were also prominent (Figure 6A, B). Downregulated DGEs
for the same comparison were enriched in pathways related to T cell activation, a key
component of the cell-mediated immune response (Figure 6C). Enriched pathways identified for the up-
and downregulated genes in comparisons among the following groups: controls, PE only, PE + OE, and
all cases (PE and PE + OE), are listed in Supplementary Tables 1 and 2.
Figure 6. Top enriched pathways in secretory group participants with peritoneal endometriosis. Top enriched pathways are
ranked by the number of differentially expressed genes (DGEs) contributing to each term, identified between peritoneal
endometriosis patients and controls in the secretory group. Pathway enrichment was performed using g:Profiler. The Y-axis
lists the top 10–20 enriched terms, and the X-axis indicates the number of DGEs associated with each pathway. Pathways
are ranked by gene count, and only statistically significant terms (adjusted p-value < 0.05) are included. Enrichment analysis
is shown for A), B) upregulated and C) downregulated genes.
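The filtering and ranking rule described in the Figure 6 caption (keep terms with adjusted p-value < 0.05, rank by the number of contributing DGEs) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the function name top_enriched_terms, and all numeric values are assumptions; only the term names are taken from the text.

```python
from typing import Dict, List


def top_enriched_terms(terms: List[Dict], alpha: float = 0.05, top_n: int = 20) -> List[Dict]:
    """Keep terms with adjusted p-value below alpha, ranked by DGE count (descending)."""
    significant = [t for t in terms if t["adj_p"] < alpha]
    # Ties in gene count are broken by the smaller adjusted p-value.
    ranked = sorted(significant, key=lambda t: (-t["n_dges"], t["adj_p"]))
    return ranked[:top_n]


# Hypothetical enrichment records: term names come from the text, numbers are invented.
example_terms = [
    {"term": "neutrophil degranulation", "adj_p": 1e-6, "n_dges": 60},
    {"term": "innate immune system", "adj_p": 1e-8, "n_dges": 110},
    {"term": "immune system", "adj_p": 1e-7, "n_dges": 180},
    {"term": "hypothetical non-significant term", "adj_p": 0.20, "n_dges": 300},
]

for record in top_enriched_terms(example_terms):
    print(record["term"], record["n_dges"])
```

Ranking by gene count favours broad terms such as "immune system", so the adjusted p-value is used here only as a filter and a tie-breaker.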
Among the DGEs identified between all controls and all cases, three upregulated genes (CD93, CXCL8,
NINJ1) were enriched in pathways such as angiogenesis, a process known to contribute to the
development and progression of endometriosis (Supplementary Table 11). Additionally, two to three
upregulated genes (NINJ1, CD14, PTAFR) were associated with lipopolysaccharide (LPS) binding and
LPS immune receptor activity, which are linked to innate immune responses (Supplementary Table 2).
One of the three DGEs identified between controls and cases with both PE and OE was
CDO1, which is associated with taurine and hypotaurine metabolism (Supplementary Table 2). Among
the downregulated DGEs identified between cases with PE only and those with both PE and OE,
pathway enrichment analysis revealed associations with neutrophil degranulation (CAMP, CRISP3,
ARG1, KRT1), defence response (CAMP, CRISP3, ARG1, KRT1, OASL), and innate immune response
(CAMP, CRISP3, ARG1, KRT1, OASL) (Supplementary Table 1).
Venn diagram analysis revealed one downregulated gene (B3GAT1) that was common to three
group comparisons: all controls vs. all cases, all controls vs. PE cases, and all controls vs. PE + OE cases
(Supplementary Figure 6, left). Among upregulated genes, 26 were shared between all controls vs. all
cases and all controls vs. PE cases, while one downregulated gene (CDO1) overlapped between all
controls vs. all cases and all controls vs. PE + OE cases (Supplementary Figure 6, right).
These overlaps highlight subsets of genes that may be consistently dysregulated across clinical
presentations, in patients with PE only and in patients with both PE and OE, with potential relevance
to common disease mechanisms and biomarker development.
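The overlap analysis behind a Venn diagram reduces to set operations on the DGE lists of each comparison. The sketch below is illustrative only: the function name shared_genes, the comparison labels, and the placeholder symbols GENE_A and GENE_B are assumptions; B3GAT1 is used because the text reports it as common to all three comparisons.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def shared_genes(dge_sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Map each combination of two or more comparisons to the genes found in exactly those comparisons."""
    names = list(dge_sets)
    result: Dict[FrozenSet[str], Set[str]] = {}
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(dge_sets[name] for name in combo))
            # Genes that also occur in a comparison outside the combination belong to a larger region.
            outside = set().union(*(dge_sets[name] for name in names if name not in combo))
            exclusive = inside - outside
            if exclusive:
                result[frozenset(combo)] = exclusive
    return result


# Hypothetical downregulated gene sets; GENE_A and GENE_B are placeholders, not real results.
downregulated = {
    "controls_vs_all_cases": {"B3GAT1", "GENE_A"},
    "controls_vs_PE": {"B3GAT1", "GENE_B"},
    "controls_vs_PE_OE": {"B3GAT1"},
}

print(shared_genes(downregulated))
```

Each gene is assigned to exactly one Venn region, so the counts returned for the regions can be read directly against the numbers drawn in the diagram.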
Feature selection and model prediction analysis identified a set of six DTEs with the highest
discriminatory performance in distinguishing cases from controls in both menstrual phases
Table 2 and Table 3 show the sets of genes/transcripts for which the SVM models achieved the
highest ROC AUC values on the test set within each group. ROC curves for the overall best-performing
models are shown in Figures 7 and 8, while ROC curves for the remaining top models are provided in
Supplementary Figures 7 to 14. In Table 2, for all participants in the proliferative group, a ROC AUC of
0.67, sensitivity of 1.0, and specificity of 0.67 were achieved with the selected 3 genes. For all
participants in the secretory group, predictions based on the identified 3 genes resulted in a ROC AUC
of 0.88, sensitivity of 0.75, and specificity of 1.0. In the secretory group, for controls and cases with PE
only, performance with the selected single gene was slightly worse: ROC AUC of 0.75, sensitivity of
0.5, and specificity of 0.5. When all participants from the proliferative and secretory groups were
analysed together, the identified set of 8 genes resulted in a ROC AUC of 0.75, sensitivity of 1.0, and
specificity of 0.67. For the comparison between all controls and all cases with PE only, the identified
set of 7 genes resulted in a lower ROC AUC (0.67) and sensitivity (0.33), but a higher specificity of 1.0.
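For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and ROC AUC are obtained from test-set labels and classifier scores. It is a generic illustration with invented toy values, not the study's data or code; the function names and the 0.5 decision threshold are assumptions.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP). Label 1 = case, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen case scores higher than a randomly chosen control."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a case and a control count as half a win.
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy test set (invented): four cases followed by three controls.
labels = [1, 1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.4, 0.35, 0.1]
predictions = [int(s >= 0.5) for s in scores]  # assumed decision threshold of 0.5

sens, spec = sensitivity_specificity(labels, predictions)
print(sens, spec, round(roc_auc(labels, scores), 2))
```

With test sets of only a few samples per class, each metric can take only a handful of distinct values, which is worth keeping in mind when comparing the models in Tables 2 and 3.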
Table 2. Sets of genes identified by the feature selection pipelines for which the SVM models achieved the highest ROC AUC
values on the test set within each group.
| Group of participants | Feature selection method prior to RFE-SVM | Models based on genes | Performance on test set |
|---|---|---|---|
| all participants (PROLIFERATIVE GROUP) | mutual information | Gene model 1 (3 genes) | AUC: 0.67; Sensitivity: 1.0; Specificity: 0.67 |
| all participants (SECRETORY GROUP) | mutual information | Gene model 2 (3 genes) | AUC: 0.88; Sensitivity: 0.75; Specificity: 1.0 |
| all controls and cases with peritoneal endometriosis only (SECRETORY GROUP) | random forest importance | Gene model 3 (1 gene) | AUC: 0.75; Sensitivity: 0.5; Specificity: 0.5 |
| all participants (PROLIFERATIVE AND SECRETORY GROUP) | mutual information | Gene model 4 (8 genes) | AUC: 0.75; Sensitivity: 0.75; Specificity: 0.67 |
| all controls and cases with peritoneal endometriosis only (PROLIFERATIVE AND SECRETORY GROUP) | SVM weights | Gene model 5 (7 genes) | AUC: 0.67; Sensitivity: 0.33; Specificity: 0.67 |
*Model achieving the highest ROC AUC is highlighted in bold
Figure 7. Classification performance and PCA visualization based on selected genes in the secretory group. Left: Receiver
operating characteristic (ROC) curve showing the predictive performance of the SVM model for all participants in the
secretory group. The model was trained using the gene set identified through the feature-selection procedure initiated with
mutual information. Right: Principal component analysis (PCA) plot generated using the same selected gene set for all
participants in the secretory group.
As shown in Table 3, for all participants in the proliferative group, the single selected
transcript yielded a ROC AUC of 0.78, sensitivity of 0.67, and specificity of 0.67. For all participants in
the secretory group, the two identified transcripts achieved a ROC AUC of 0.75, sensitivity of 0.75, and
specificity of 0.5. For the comparison between controls and cases with PE only, the performance with
the selected single transcript was very good, with a ROC AUC of 1.0, sensitivity of 1.0, and specificity
of 0.5. For all participants from both the proliferative and secretory groups, the identified set of 6
transcripts resulted in a ROC AUC of 0.92, sensitivity of 0.75, and specificity of 1.0. For the comparison
between all controls and all cases with PE only, the set of 3 transcripts resulted
in a ROC AUC of 0.89, sensitivity of 0.67, and specificity of 1.0.
Table 3. Sets of transcripts identified by the feature selection pipelines for which the SVM models achieved the highest
ROC AUC values on the test set within each group.
| Group of participants | Feature selection method prior to RFE-SVM | Models based on transcripts | Performance on test set |
|---|---|---|---|
| all participants (PROLIFERATIVE GROUP) | mutual information / SVM weights | Transcript model 1 (1 transcript) | AUC: 0.78; Sensitivity: 0.67; Specificity: 0.67 |
| all participants (SECRETORY GROUP) | random forest importance | Transcript model 2 (2 transcripts) | AUC: 0.75; Sensitivity: 0.75; Specificity: 0.5 |
| all controls and cases with peritoneal endometriosis only (SECRETORY GROUP) | SVM weights | Transcript model 3 (1 transcript) | AUC: 1.0; Sensitivity: 1.0; Specificity: 0.5 |
| all participants (PROLIFERATIVE AND SECRETORY GROUP) | random forest importance | Transcript model 4 (6 transcripts) | AUC: 0.92; Sensitivity: 0.75; Specificity: 1.0 |
| all controls and cases with peritoneal endometriosis only (PROLIFERATIVE AND SECRETORY GROUP) | mutual information | Transcript model 5 (3 transcripts) | AUC: 0.89; Sensitivity: 0.67; Specificity: 1.0 |
*Model achieving the highest ROC AUC is highlighted in bold
Figure 8. Classification performance and PCA visualization based on selected transcripts for all participants from both the
proliferative and secretory groups. Left: Receiver operating characteristic (ROC) curve showing the predictive performance of
the SVM model for all participants. The model was trained using the set of transcripts identified through the feature-selection
procedure initiated with random forest importance. Right: Principal component analysis (PCA) plot generated using the same
selected transcript set for all participants.
When PCA was performed on the gene/transcript sets for which the SVM models achieved the
highest ROC AUC on the test set (Tables 2 and 3), that is, the sets identified by the feature
selection procedure, clustering of patients based on endometriosis status was observed (Figures 7 and
8 and Supplementary Figures 7 to 14).
Discussion
This is the first study to use whole-blood transcriptomics and machine learning methods to
identify novel biomarker candidates specific to PE. The analysis identified a set of six DTEs with the
highest discriminatory performance between controls and cases in the test dataset (ROC AUC = 0.92,
sensitivity = 75%, specificity = 100%). Similarly, a three-gene panel achieved high performance in the
secretory group (AUC = 0.88, sensitivity = 75%, specificity = 100%). These findings suggest that, after
validation, blood-based transcriptomic markers may offer promising non-invasive tools for the
detection of PE.
To date, most transcriptomics studies using machine learning approaches have focused on
analysing gene expression data derived from ectopic, eutopic, or healthy endometrial tissue samples,
obtained either from the Gene Expression Omnibus (GEO) database or from newly collected samples
subjected to microarray analysis 48-51. In contrast, fewer studies have explored non-invasive
approaches for biomarker discovery in endometriosis. These studies have examined levels of miRNA
in serum, plasma, and saliva 52-54, lncRNAs in plasma-derived extracellular vesicles 55, mRNA expression
in menstrual fluid 56, or lncRNAs in the serum of endometriosis patients 57. Notably, most studies that
searched for non-invasive biomarkers for endometriosis have focused on miRNAs; however, these
investigations have reported inconsistent results and limited overlap among the identified candidate
miRNA biomarkers 29,58. One such study used miRNA sequencing and a random forest machine
learning model to develop a salivary signature comprising 109 miRNAs and proposed it as
a potential diagnostic tool for endometriosis (commonly referred to as Endotest) 54,59. Although this
signature was recently validated in a multicentre study 60, it has not yet received approval from
national health technology assessment bodies for routine clinical implementation, as further
independent, real-world validation outside controlled research settings is still required 61-63. In
another study, Su et al. 64 performed biomarker discovery by analysing publicly available GEO datasets
and using machine learning algorithms to develop an integrative model for predicting endometriosis,
resulting in a nine-gene diagnostic panel. This model was subsequently validated on whole blood
samples from endometriosis patients (n=29) and controls (n=30). However, more than 65% of patients
in the validation cohort had rAFS stage III–IV disease, and two of the nine genes showed suboptimal
performance. Further validation in a larger, multicentre cohort is therefore required.
In our study, whole blood samples from patients were used to identify candidate biomarkers for
endometriosis through a non-targeted transcriptomics approach. Previous studies have primarily
employed targeted strategies, such as measuring the levels of specific mRNA molecules in peripheral
blood samples 36,65. When RNA sequencing techniques were applied, they were mostly performed on
separated blood components, such as serum and plasma 53,66, or on isolated peripheral blood
mononuclear cells 67.
As shown in Tables 2 and 3, different feature selection techniques applied prior to RFE-SVM
produced distinct sets of genes and transcripts that yielded the best SVM classifier performance across
groups on the test dataset. Interestingly, overlap between feature selection techniques was observed
only in the transcript-based analysis of all participants in the proliferative group, where mutual
information and SVM weights produced an identical final transcript set following RFE-SVM.
Furthermore, there was minimal overlap among the identified gene/transcript sets across different
groups, with only two genes being selected more than once, and no transcripts recurring between
analyses. Neither of these two genes has previously been linked to endometriosis.
In our study, two SVM models based on DGEs and three models based on DTEs demonstrated
strong diagnostic potential, achieving an AUC of up to 1.0, and/or meeting the criteria for rule-in or
rule-out tests (sensitivity ≥ 95% for rule-out, specificity ≥ 95% for rule-in) on the test set (Tables 2 and
3). The model based on three DGEs showed the best performance in discriminating endometriosis
patients from controls in the secretory group, reaching an AUC of 0.88, sensitivity of 75% and
specificity of 100%. Of these, two genes are long non-coding RNAs (lncRNAs) not previously
associated with endometriosis, while one is a small nucleolar RNA (snoRNA) found to be increased in
colorectal endometriosis compared to the eutopic endometrium of women with endometriosis 68.
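The rule-in and rule-out criteria quoted above can be applied mechanically to the test-set values. The sketch below does this using the sensitivity and specificity pairs as they are listed in Tables 2 and 3; the function name diagnostic_roles and the dictionary layout are assumptions made for illustration, and the AUC criterion is deliberately not modelled.

```python
from typing import Dict, List, Tuple


def diagnostic_roles(sensitivity: float, specificity: float, threshold: float = 0.95) -> List[str]:
    """Return the roles a test qualifies for: rule-out needs high sensitivity, rule-in high specificity."""
    roles = []
    if sensitivity >= threshold:
        roles.append("rule-out")
    if specificity >= threshold:
        roles.append("rule-in")
    return roles


# (sensitivity, specificity) on the test set, keyed by model number, as listed in Tables 2 and 3.
GENE_MODELS: Dict[int, Tuple[float, float]] = {
    1: (1.0, 0.67), 2: (0.75, 1.0), 3: (0.5, 0.5), 4: (0.75, 0.67), 5: (0.33, 0.67),
}
TRANSCRIPT_MODELS: Dict[int, Tuple[float, float]] = {
    1: (0.67, 0.67), 2: (0.75, 0.5), 3: (1.0, 0.5), 4: (0.75, 1.0), 5: (0.67, 1.0),
}

for label, models in (("gene", GENE_MODELS), ("transcript", TRANSCRIPT_MODELS)):
    for number, (sens, spec) in models.items():
        roles = diagnostic_roles(sens, spec)
        if roles:
            print(f"{label} model {number}: {', '.join(roles)}")
```

A model can qualify for one role while performing poorly on the other metric, so the role label on its own does not summarise overall discriminatory performance.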
Encouragingly, one model based on DTEs performed well even on datasets including participants
in both menstrual phases, effectively predicting endometriosis status regardless of the menstrual
phase at the time of blood collection. Such a menstrual-phase-independent test would be much
more practical for clinical use, as it simplifies and standardises sample collection, eliminating the need
to perform the test in a specific menstrual cycle phase. This model, based on a set of six
transcripts, showed the highest performance, reaching an AUC of 0.92, sensitivity of 75% and
specificity of 100%. Among the panel of six identified transcripts, five are protein-coding, while one is
a retained intron.
When performing PCA, we did not detect sources of variation in the data originating from technical
variation or clinical data, except for the menstrual phase, which in our case coincided with the
sequencing batch. Therefore, it was challenging to determine conclusively whether the observed
differences in PCA between patients in different phases stemmed from genuine biological differences
or were instead a result of technical variation between the two sequencing batches (referred to as
batch effects) 69, and to mitigate this effect computationally. Consequently, gene/transcript differential
expression analysis was conducted separately for each menstrual phase. Previous studies have shown
that both normal endometrial tissue and endometriotic lesions exhibit phase-dependent variations in
gene expression 70,71. Therefore, it is important to account for menstrual phase as a variable when
trying to identify reliable molecular biomarkers of endometriosis. Transcriptomic profiling of whole
blood across menstrual phases in our study revealed distinct, phase-specific expression patterns
associated with PE. No DGEs and only two DTEs were detected in samples from the proliferative group.
The observed difference in medication use prior to surgery between controls and cases in the
proliferative group may have influenced transcriptional profiles and contributed to the limited
number of differentially expressed genes and transcripts detected in this group. In contrast, in the
secretory group, tens to hundreds of DGEs and DTEs were identified between cases and controls.
Specifically, the highest numbers of DGEs (1035) and DTEs (922) were detected between all controls
and PE only cases. No overlapping DGEs or DTEs were observed between the two menstrual phases,
underscoring the phase-specific expression profiles.
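Whether a biological factor is inseparable from the sequencing batch can be checked from the sample sheet before any modelling. The sketch below is a generic illustration with invented records; the field names "phase" and "batch", the batch labels, and both function names are assumptions and do not describe the study's actual sample sheet.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def batches_per_level(samples: Iterable[Mapping[str, str]], factor: str, batch_key: str) -> Dict[str, Set[str]]:
    """For each level of the factor, collect the set of batches in which it occurs."""
    levels: Dict[str, Set[str]] = defaultdict(set)
    for sample in samples:
        levels[sample[factor]].add(sample[batch_key])
    return dict(levels)


def is_fully_confounded(samples: Iterable[Mapping[str, str]], factor: str, batch_key: str) -> bool:
    """True when every factor level sits in a single batch and no batch is shared between levels."""
    levels = batches_per_level(samples, factor, batch_key)
    if len(levels) < 2:
        return False
    if any(len(batches) != 1 for batches in levels.values()):
        return False
    single_batches = [next(iter(batches)) for batches in levels.values()]
    return len(set(single_batches)) == len(single_batches)


# Invented sample sheets: in the first, phase and batch cannot be separated.
confounded_sheet = [
    {"phase": "proliferative", "batch": "A"},
    {"phase": "proliferative", "batch": "A"},
    {"phase": "secretory", "batch": "B"},
]
mixed_sheet = confounded_sheet + [{"phase": "proliferative", "batch": "B"}]

print(is_fully_confounded(confounded_sheet, "phase", "batch"))
print(is_fully_confounded(mixed_sheet, "phase", "batch"))
```

When the check returns True, a batch term cannot be estimated separately from the factor, which is the situation in which analysing each level on its own is a common fallback.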
A common approach in biomarker discovery studies relies on selecting differentially expressed
genes based on p-values and/or fold changes. However, this method can miss biologically relevant
signals, especially when expression changes are subtle or phase-specific, or may yield long lists of genes
that are challenging and impractical to use for validation or diagnosis. In addition, analyses at the gene
level do not account for transcript-specific changes that may be critical in distinguishing groups. To
address these limitations, machine learning methods have been increasingly applied, as they handle
complex transcriptomic data and identify patterns that conventional methods might overlook.
Machine learning also has limitations, one being multiplicity, where two distinct models achieve
similar performance in the training dataset but vary greatly on independent/test datasets. This
instability, also observed in our study, indicates that model accuracy alone does not guarantee the
identification of biologically relevant features. Consequently, multiplicity can lead to inconsistent
biomarker sets, emphasising the need for cautious interpretation and experimental validation 72.
This study has several strengths. It analysed the whole blood transcriptome, providing a
non-invasive and technically robust approach that avoids pre-analytical variability introduced by plasma or
serum processing and captures the full range of RNA species, not limited to mRNA or miRNA. The
study focuses on the most common form of endometriosis, PE, which is still not detectable with
imaging techniques. Strict standard operating procedures for blood collection and RNA isolation were
followed, and cases and controls were carefully matched for age, BMI, hormonal therapy, and smoking
status to minimise confounding effects. Because endometriosis is an oestrogen-dependent disease
influenced by hormonal fluctuations throughout the menstrual cycle, patients were stratified by
menstrual phase. This approach enabled the identification of biomarkers that are independent of cycle
phase, as well as markers that are specific to phases of the cycle. The limitations of this study include
a relatively small sample size, a difference in medication intake between cases and controls in the
proliferative group, lack of technical replication across sequencing batches, and partial confounding
of menstrual phase with sequencing batch. To minimise technical variability, all samples were
processed at the same time by the same operator using identical protocols. Additionally, the findings
have not yet been validated in a larger, independent cohort or in populations of different ethnicities,
and work in progress will address this through qPCR validation of the six identified transcripts.
Conclusion
To the best of our knowledge, this is the first study to integrate whole-genome transcriptomics
with machine learning techniques using whole blood samples for discovery of candidate biomarkers
associated with PE. Our analysis identified a six-transcript panel that performed well in distinguishing
endometriosis patients from controls. However, validation in larger, independent cohorts is necessary
to confirm its diagnostic potential. The study revealed distinct gene expression profiles between
patients in the proliferative and secretory phases of the menstrual cycle, confirming the influence of
hormonal status on transcriptional patterns. The differentially expressed genes identified as
upregulated in PE patients compared to controls were associated with angiogenesis and innate
immune pathways, supporting an important role of these processes in the pathophysiology of PE.
AUTHOR CONTRIBUTION
Conceptualization, T.L.R.; investigation, M.P.N., T.R. and A.V.; resources, T.L.R. and H.B.F.; data
curation, H.B.F., R.W. and M.P.N.; writing (original draft preparation), M.P.N. and A.V.; writing (review
and editing), T.L.R. and T.R.; visualization, M.P.N., A.V. and T.R.; supervision, T.L.R.; project administration,
T.L.R.; funding acquisition, T.L.R. All authors have read and agreed to the published version of the
manuscript.
Acknowledgements
The authors thank the study participants, who kindly donated their samples and time, and the
personnel of the Department of Obstetrics and Gynaecology, University Medical Centre Ljubljana,
Ljubljana, Slovenia, especially Mrs. Tatjana Lončar.
References
1. Zondervan KT, Becker CM, Missmer SA. Endometriosis. N Engl J Med. Mar 26
2020;382(13):1244–1256. doi:10.1056/NEJMra1810764
2. Saunders PTK, Horne AW. Endometriosis: Etiology, pathobiology, and therapeutic prospects.
Cell. May 27 2021;184(11):2807–2824. doi:10.1016/j.cell.2021.04.041
3. Bulun SE, Yilmaz BD, Sison C, et al. Endometriosis. Endocr Rev. Aug 1 2019;40(4):1048–1079.
doi:10.1210/er.2018-00242
4. Taylor HS, Kotlyar AM, Flores VA. Endometriosis is a chronic systemic disease: clinical
challenges and novel innovations. Lancet. Feb 27 2021;397(10276):839–852. doi:10.1016/s0140-
6736(21)00389-5
5. Frayne J, Milroy T, Simonis M, Lam A. Challenges in diagnosing and managing endometriosis
in general practice: A Western Australian qualitative study. Aust J Gen Pract. Aug 2023;52(8):547–
555. doi:10.31128/ajgp-10-22-6579
6. Ellis K, Munro D, Clarke J. Endometriosis Is Undervalued: A Call to Action. Front Glob
Womens Health. 2022;3:902371. doi:10.3389/fgwh.2022.902371
7. Moradi Y, Shams-Beyranvand M, Khateri S, et al. A systematic review on the prevalence of
endometriosis in women. Indian J Med Res. Mar 2021;154(3):446–454.
doi:10.4103/ijmr.IJMR_817_18
8. Bulletti C, Coccia ME, Battistoni S, Borini A. Endometriosis and infertility. J Assist Reprod
Genet. Aug 2010;27(8):441–7. doi:10.1007/s10815-010-9436-1
9. Sims OT, Gupta J, Missmer SA, Aninye IO. Stigma and Endometriosis: A Brief Overview and
Recommendations to Improve Psychosocial Well-Being and Diagnostic Delay. Int J Environ Res Public
Health. Aug 3 2021;18(15). doi:10.3390/ijerph18158210
10. Surrey E, Soliman AM, Trenz H, Blauer-Peterson C, Sluis A. Impact of Endometriosis
Diagnostic Delays on Healthcare Resource Utilization and Costs. Adv Ther. Mar 2020;37(3):1087–
1099. doi:10.1007/s12325-019-01215-x
11. Singh S, Soliman AM, Rahal Y, et al. Prevalence, Symptomatic Burden, and Diagnosis of
Endometriosis in Canada: Cross-Sectional Survey of 30 000 Women. J Obstet Gynaecol Can. Jul
2020;42(7):829–838. doi:10.1016/j.jogc.2019.10.038
12. Brosens IA, Brosens JJ. Is laparoscopy the gold standard for the diagnosis of endometriosis?
Eur J Obstet Gynecol Reprod Biol. Feb 2000;88(2):117–9. doi:10.1016/s0301-2115(99)00184-0
13. Wykes CB, Clark TJ, Khan KS. Accuracy of laparoscopy in the diagnosis of endometriosis: a
systematic quantitative review. Bjog. Nov 2004;111(11):1204–12. doi:10.1111/j.1471-
0528.2004.00433.x
14. Pascoal E, Wessels JM, Aas-Eng MK, et al. Strengths and limitations of diagnostic tools for
endometriosis and relevance in diagnostic test accuracy research. Ultrasound Obstet Gynecol. Sep
2022;60(3):309–327. doi:10.1002/uog.24892
15. Nisenblat V, Bossuyt PM, Farquhar C, Johnson N, Hull ML. Imaging modalities for the non-
invasive diagnosis of endometriosis. Cochrane Database Syst Rev. Feb 26 2016;2(2):Cd009591.
doi:10.1002/14651858.CD009591.pub2
16. Berker B, Seval M. Problems with the diagnosis of endometriosis. Womens Health (Lond).
Aug 2015;11(5):597–601. doi:10.2217/whe.15.44
17. Jansen RP, Russell P. Nonpigmented endometriosis: clinical, laparoscopic, and pathologic
definition. Am J Obstet Gynecol. Dec 1986;155(6):1154–9. doi:10.1016/0002-9378(86)90136-5
18. Becker CM, Bokor A, Heikinheimo O, et al. ESHRE guideline: endometriosis. Hum Reprod
Open. 2022;2022(2):hoac009. doi:10.1093/hropen/hoac009
19. Horne AW, Saunders PTK, Abokhrais IM, Hogg L. Top ten endometriosis research priorities in
the UK and Ireland. Lancet. Jun 3 2017;389(10085):2191–2192. doi:10.1016/s0140-6736(17)31344-2
20. Rogers PA, Adamson GD, Al-Jefout M, et al. Research Priorities for Endometriosis. Reprod
Sci. Feb 2017;24(2):202–226. doi:10.1177/1933719116654991
21. Brulport A, Bourdon M, Vaiman D, et al. An integrated multi-tissue approach for
endometriosis candidate biomarkers: a systematic review. Reprod Biol Endocrinol. Feb 10
2024;22(1):21. doi:10.1186/s12958-023-01181-8
22. Rižner TL. Noninvasive biomarkers of endometriosis: myth or reality? Expert Rev Mol Diagn.
Apr 2014;14(3):365–85. doi:10.1586/14737159.2014.899905
23. Yu Z, Kastenmüller G, He Y, et al. Differences between human plasma and serum metabolite
profiles. PLoS One. 2011;6(7):e21230. doi:10.1371/journal.pone.0021230
24. May JE, Pemberton RM, Hart JP, McLeod J, Wilcock G, Doran O. Use of whole blood for
analysis of disease-associated biomarkers. Anal Biochem. Jun 1 2013;437(1):59–61.
doi:10.1016/j.ab.2013.02.024
25. Gibbons T, Rahmioglu N, Zondervan KT, Becker CM. Crimson clues: advancing endometriosis
detection and management with novel blood biomarkers. Fertil Steril. Feb 2024;121(2):145–163.
doi:10.1016/j.fertnstert.2023.12.018
26. Li S, Todor A, Luo R. Blood transcriptomics and metabolomics for personalized medicine.
Comput Struct Biotechnol J. 2016;14:1–7. doi:10.1016/j.csbj.2015.10.005
27. Mohr S, Liew CC. The peripheral-blood transcriptome: new insights into disease and risk
assessment. Trends Mol Med. Oct 2007;13(10):422–32. doi:10.1016/j.molmed.2007.08.003
28. Harrington CA, Fei SS, Minnier J, et al. RNA-Seq of human whole blood: Evaluation of globin
RNA depletion on Ribo-Zero library method. Sci Rep. Apr 14 2020;10(1):6271. doi:10.1038/s41598-
020-62801-6
29. Saare M, Peters M, Aints A, Laisk-Podar T, Salumets A, Altmäe S. OMICs Studies and
Endometriosis Biomarker Identification. In: D'Hooghe T, ed. Biomarkers for Endometriosis: State of
the Art. Springer International Publishing; 2017:227–258.
30. Dana PM, Taghavipour M, Mirzaei H, et al. Circular RNA as a potential diagnostic and/or
therapeutic target for endometriosis. Biomark Med. Sep 2020;14(13):1277–1287. doi:10.2217/bmm-
2020-0167
31. Hudson QJ, Perricos A, Wenzl R, Yotova I. Challenges in uncovering non-invasive biomarkers
of endometriosis. Exp Biol Med (Maywood). Mar 2020;245(5):437–447.
doi:10.1177/1535370220903270
32. Goulielmos GN, Matalliotakis M, Matalliotaki C, Eliopoulos E, Matalliotakis I, Zervou MI.
Endometriosis research in the -omics era. Gene. May 30 2020;741:144545.
doi:10.1016/j.gene.2020.144545
33. Samare-Najaf M, Razavinasab SA, Samareh A, Jamali N. Omics-based novel strategies in the
diagnosis of endometriosis. Crit Rev Clin Lab Sci. May 2024;61(3):205–225.
doi:10.1080/10408363.2023.2270736
34. Sivajohan B, Elgendi M, Menon C, Allaire C, Yong P, Bedaiwy MA. Clinical use of artificial
intelligence in endometriosis: a scoping review. npj Digital Medicine. 2022/08/04 2022;5(1):109.
doi:10.1038/s41746-022-00638-1
35. Imperiale L, Nisolle M, Noël JC, Fastrez M. Three Types of Endometriosis: Pathogenesis,
Diagnosis and Treatment. State of the Art. J Clin Med. Jan 28 2023;12(3). doi:10.3390/jcm12030994
36. Nisenblat V, Bossuyt PM, Shaikh R, et al. Blood biomarkers for the non-invasive diagnosis of
endometriosis. Cochrane Database Syst Rev. May 1 2016;2016(5):CD012179.
doi:10.1002/14651858.CD012179
37. Vouk K, Hevir N, Ribić-Pucelj M, et al. Discovery of phosphatidylcholines and sphingomyelins
as biomarkers for ovarian endometriosis. Hum Reprod. Oct 2012;27(10):2955–65.
doi:10.1093/humrep/des152
38. Vouk K, Ribič-Pucelj M, Adamski J, Rižner TL. Altered levels of acylcarnitines,
phosphatidylcholines, and sphingomyelins in peritoneal fluid from ovarian endometriosis patients. J
Steroid Biochem Mol Biol. May 2016;159:60–9. doi:10.1016/j.jsbmb.2016.02.023
39. Kocbek V, Vouk K, Bersinger NA, Mueller MD, Lanišnik Rižner T. Panels of cytokines and
other secretory proteins as potential biomarkers of ovarian endometriosis. J Mol Diagn. May
2015;17(3):325–34. doi:10.1016/j.jmoldx.2015.01.006
40. Janša V, Klančič T, Pušić M, et al. Proteomic analysis of peritoneal fluid identified COMP and
TGFBI as new candidate biomarkers for endometriosis. Sci Rep. Oct 22 2021;11(1):20870.
doi:10.1038/s41598-021-00299-2
41. Janša V, Pušić Novak M, Ban Frangež H, Rižner TL. TGFBI as a candidate biomarker for non-
invasive diagnosis of early-stage endometriosis. Hum Reprod. Jul 5 2023;38(7):1284–1296.
doi:10.1093/humrep/dead091
42. Knific T, Vouk K, Vogler A, et al. Models including serum CA-125, BMI, cyst pathology,
dysmenorrhea or dyspareunia for diagnosis of endometriosis. Biomark Med. Jul 2018;12(7):737–747.
doi:10.2217/bmm-2017-0426
43. Rizner TL, Adamski J. Paramount importance of sample quality in pre-clinical and clinical
research-Need for standard operating procedures (SOPs). J Steroid Biochem Mol Biol. Feb
2019;186:1–3. doi:10.1016/j.jsbmb.2018.09.017
44. R: A language and environment for statistical computing. R Foundation for Statistical
Computing; 2021. https://www.R-project.org/
45. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq
data with DESeq2. Genome Biology. 2014/12/05 2014;15(12):550. doi:10.1186/s13059-014-0550-8
46. Kolberg L, Raudvere U, Kuzmin I, Adler P, Vilo J, Peterson H. g:Profiler—interoperable web
service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids
Research. 2023;51(W1):W207–W212. doi:10.1093/nar/gkad347
47. Guyon I, Weston J, Barnhill S, Vapnik V. Gene Selection for Cancer Classification using
Support Vector Machines. Machine Learning. 2002/01/01 2002;46(1):389–422.
doi:10.1023/A:1012487302797
48. Hosseini M, Hammami B, Kazemi M. Identification of potential diagnostic biomarkers and
therapeutic targets for endometriosis based on bioinformatics and machine learning analysis. J Assist
Reprod Genet. Oct 2023;40(10):2439–2451. doi:10.1007/s10815-023-02903-y
49. Xie Z, Feng Y, He Y, Lin Y, Wang X. Identification of biomarkers for endometriosis based on
summary-data-based Mendelian randomization and machine learning. Medicine (Baltimore). Apr 4
2025;104(14):e41804. doi:10.1097/md.0000000000041804
50. Jiang H, Zhang X, Wu Y, et al. Bioinformatics identification and validation of biomarkers and
infiltrating immune cells in endometriosis. Front Immunol. 2022;13:944683.
doi:10.3389/fimmu.2022.944683
51. Zhang H, Zhang H, Yang H, Shuid AN, Sandai D, Chen X. Machine learning-based integrated
identification of predictive combined diagnostic biomarkers for endometriosis. Front Genet.
2023;14:1290036. doi:10.3389/fgene.2023.1290036
52. Dryja-Brodowska A, Obrzut B, Obrzut M, Darmochwal-Kolarz D. miRNA in Endometriosis-A
New Hope or an Illusion? J Clin Med. Jul 8 2025;14(14). doi:10.3390/jcm14144849
53. Bendifallah S, Dabi Y, Suisse S, et al. MicroRNome analysis generates a blood-based
signature for endometriosis. Sci Rep. Mar 8 2022;12(1):4051. doi:10.1038/s41598-022-07771-7
54. Bendifallah S, Suisse S, Puchar A, et al. Salivary MicroRNA Signature for Diagnosis of
Endometriosis. J Clin Med. Jan 26 2022;11(3). doi:10.3390/jcm11030612
55. Shan S, Yang Y, Jiang J, et al. Extracellular vesicle-derived long non-coding RNA as circulating
biomarkers for endometriosis. Reprod Biomed Online. May 2022;44(5):923–933.
doi:10.1016/j.rbmo.2021.11.019
56. Amanda CR, Asmarinah, Hestiantoro A, Tulandi T, Febriyeni. Gene expression of aromatase,
SF-1, and HSD17B2 in menstrual blood as noninvasive diagnostic biomarkers for endometriosis.
European Journal of Obstetrics & Gynecology and Reproductive Biology. 2024/10/01 2024;301:95–101.
doi:10.1016/j.ejogrb.2024.07.061
57. Wang WT, Sun YM, Huang W, He B, Zhao YN, Chen YQ. Genome-wide Long Non-coding RNA
Analysis Identified Circulating LncRNAs as Novel Non-invasive Diagnostic Biomarkers for
Gynecological Disease. Sci Rep. Mar 18 2016;6:23343. doi:10.1038/srep23343
58. Vanhie A, Caron E, Vermeersch E, et al. Circulating microRNAs as Non-Invasive Biomarkers in
Endometriosis Diagnosis—A Systematic Review. Biomedicines. 2024;12(4):888.
59. Bendifallah S, Dabi Y, Suisse S, et al. Validation of a Salivary miRNA Signature of
Endometriosis - Interim Data. NEJM Evid. Jul 2023;2(7):EVIDoa2200282. doi:10.1056/EVIDoa2200282
60. Bendifallah S, Roman H, Suisse S, et al. Validation of a Saliva Micro-RNA Signature for
Endometriosis. NEJM Evid. Nov 2025;4(11):EVIDoa2400195. doi:10.1056/EVIDoa2400195
61. Viganò P, Vercellini P, Somigliana E, et al. “I’m looking through you”: What consumers and
manufacturers need to know about non-invasive diagnostic tests for endometriosis. Journal of
Endometriosis and Uterine Disorders. 2023;2. doi:10.1016/j.jeud.2023.100031
62. Scheck SM, Henry C, Bedford N, et al. Non-invasive tests for endometriosis are here; how
reliable are they, and what should we do with the results? Aust N Z J Obstet Gynaecol. Apr
2024;64(2):168–170. doi:10.1111/ajo.13765
63. Kliber-Galuszka M, Kulczynska-Figurny K, Jagodzinski PP, Plawski A. Potential biomarkers for
early detection of endometriosis: current state of art (what we know so far). J Appl Genet. Oct 13
2025. doi:10.1007/s13353-025-01021-y
64. Su D, Guo Y, Yang R, et al. Identifying a panel of nine genes as novel specific model in
endometriosis noninvasive diagnosis. Fertility and Sterility. 2024/02/01 2024;121(2):323–333.
doi:10.1016/j.fertnstert.2023.11.019
65. Fassbender A, Burney RO, O DF, D'Hooghe T, Giudice L. Update on Biomarkers for the
Detection of Endometriosis. Biomed Res Int. 2015;2015:130854. doi:10.1155/2015/130854
66. Papari E, Noruzinia M, Kashani L, Foster WG. Identification of candidate microRNA markers
of endometriosis with the use of next-generation sequencing and quantitative real-time polymerase
chain reaction. Fertil Steril. Jun 2020;113(6):1232–1241. doi:10.1016/j.fertnstert.2020.01.026
67. Andrieu T, Duo A, Duempelmann L, et al. Single-Cell RNA Sequencing of PBMCs Identified
Junction Plakoglobin (JUP) as Stratification Biomarker for Endometriosis. Int J Mol Sci. Dec 5
2024;25(23). doi:10.3390/ijms252313071
68. Ballester M, Gonin J, Rodenas A, et al. Eutopic endometrium and peritoneal, ovarian and
colorectal endometriotic tissues express a different profile of Nectin-1, -3, -4 and nectin-like
molecule 2. Human Reproduction. 2012;27(11):3179–3186. doi:10.1093/humrep/des304
69. Leek JT, Scharpf RB, Bravo HC, et al. Tackling the widespread and critical impact of batch
effects in high-throughput data. Nature Reviews Genetics. 2010/10/01 2010;11(10):733–739.
doi:10.1038/nrg2825
70. Burney RO. Biomarker development in endometriosis. Scand J Clin Lab Invest Suppl.
2014;244:75–81; discussion 80. doi:10.3109/00365513.2014.936692
71. Burney RO, Talbi S, Hamilton AE, et al. Gene expression analysis of endometrium reveals
progesterone resistance and candidate susceptibility genes in women with endometriosis.
Endocrinology. Aug 2007;148(8):3814–26. doi:10.1210/en.2006-1692
72. Heljakka A, Trapp M, Kannala J, Solin A. Disentangling model multiplicity in deep learning.
arXiv preprint arXiv:2206.08890. 2022.