Screening colorectal cancer associated autoantigens through multi-omics analysis and diagnostic performance evaluation of corresponding autoantibodies

Qiu, Zan; Cheng, Yifan; Liu, Haiyan; Li, Tiandong; Jiang, Yinan; Lu, Yin; Jiang, Donglin; Zhang, Xiaoyue; Wang, Xinwei; Kang, Zirui; Peng, Lei; Wang, Keyan; Dai, Liping; Ye, Hua; Wang, Peng; Shi, Jianxiang

doi:10.1186/s12885-025-14080-5

Research
Open access
Published: 16 April 2025

Zan Qiu^1,3,
Yifan Cheng^2,3,
Haiyan Liu^2,3,
Tiandong Li^2,3,
Yinan Jiang⁴,
Yin Lu^2,3,
Donglin Jiang^2,3,
Xiaoyue Zhang^2,3,
Xinwei Wang^1,3,
Zirui Kang^1,3,
Lei Peng^1,3,
Keyan Wang^1,3,
Liping Dai^1,3,
Hua Ye^2,3,
Peng Wang^2,3 &
Jianxiang Shi^1,3

BMC Cancer volume 25, Article number: 713 (2025)

Abstract

Background

This study aims to screen and validate novel biomarkers and to develop a user-friendly online tool for the detection of colorectal cancer (CRC).

Methods

A multi-omics approach, comprising proteomic and single-cell transcriptomic analyses, was used to discover candidate tumor-associated antigens (TAAs). The presence of tumor-associated autoantibodies (TAAbs) in serum was subsequently assessed using enzyme-linked immunosorbent assays (ELISA) in 300 CRC patients and 300 healthy controls. Ten machine learning algorithms were used to develop diagnostic models, and the optimal one was integrated into an R Shiny-based GUI to enhance usability and accessibility.

Results

We identified twelve potential TAAs: HMGA1, NPM1, EIF1AX, CKS1B, HSP90AB1, ACTG1, S100A11, maspin, ANXA3, eEF2, P4HB, and HKDC1. ELISA results showed that five TAAbs (anti-CKS1B, anti-S100A11, anti-maspin, anti-ANXA3, and anti-eEF2) were potential diagnostic biomarkers in the diagnostic evaluation phase (all P < 0.05). The Random Forest model yielded an AUC of 0.82 (95% CI: 0.78–0.88) on the training set and 0.75 (95% CI: 0.68–0.82) on the test set, demonstrating the robustness of the results. A web-based implementation of the CRC diagnostic tool is publicly accessible at https://qzan.shinyapps.io/CRCPred/.

Conclusions

A five-biomarker panel can serve as a complementary biomarker to CEA and CA19-9 in CRC detection.

Introduction

Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide and the second leading contributor to cancer-related mortality [1]. While CRC predominantly affects individuals over 50 years, a notable increase in incidence among younger populations has been reported [2]. Due to the poor prognosis associated with advanced-stage CRC, where five-year survival rates drop below 15%, early detection through regular screening programs is critical [3,4,5]. In clinical practice, sigmoidoscopy and colonoscopy are currently the standard tests for CRC diagnosis because of their high sensitivity and ability to detect visible precancerous lesions [6]. Additionally, stool-based tests can increase the efficiency of colonoscopy utilization [7]. The commercially available fecal immunochemical test shows a moderate sensitivity of 67.3% for CRC detection, while the multi-target stool DNA test demonstrates superior diagnostic performance with a sensitivity of 93.9% [8]. Although methylated SEPT9 is the only FDA-approved blood-based biomarker for CRC screening, its clinical utility is hampered by limited sensitivity, detecting only 44.7% of early-stage CRC and 11.2% of advanced adenomas [9]. Conventional clinical tumor markers, such as CEA and CA19-9, show limited sensitivity, further highlighting the need for more efficient and patient-friendly diagnostic options [10,11,12].

Tumor-associated autoantibodies (TAAbs) have attracted attention as potential biomarkers for cancer diagnosis due to their stable presence in the bloodstream, even when corresponding antigen levels are low [13]. Autoantibodies can be detected earlier than the clinical onset of cancer, highlighting their value for early diagnosis [14]. Individual TAAbs have limited sensitivity and specificity, necessitating the combination of multiple TAAbs to increase diagnostic accuracy [15]. Anti‑p53 antibodies, the most extensively researched autoantibodies in CRC, may serve as biomarkers to distinguish CRC patients from healthy individuals or patients with benign disease, a potential supported by a summary receiver operating characteristic curve with an AUC of 0.78 (95% CI: 0.76–0.81) [16].

However, the significance of identifying new tumor-associated antigens (TAAs) cannot be neglected, as autoantibodies, antibodies that target self-antigens, play a crucial role in modulating inflammatory responses, maintaining immune system homeostasis, and distinguishing between healthy individuals and cancer patients in certain contexts [17]. Previous studies have used various techniques to identify TAAs in CRC, such as serological analysis of recombinant tumor cDNA expression libraries [18], phage cDNA libraries [19], serological proteome analysis (SERPA) [20], and protein microarrays [21].

Data mining is a valuable tool for identifying potentially useful patterns within large datasets, providing a more precise and reliable estimate of the efficiency of autoantibodies in CRC detection [14, 16]. The consensus molecular subtypes (CMS) of CRC, defined by integrating multi-omics data including genomics, epigenomics, transcriptomics, and immune-related proteomics, provide a comprehensive classification system that enables the identification of molecular markers with broad generalizability for CRC diagnosis [22]. Proteomic analyses elucidate distinct protein expression profiles, and previous studies also provide data for each CMS subtype [23]. Moreover, single-cell transcriptomics allows for a comprehensive analysis of the heterogeneity of CRC and modifications within the immune microenvironment at the single-cell level, facilitating the investigation of potential changes originating from epithelial cells [24, 25]. Integrating multi-omics approaches facilitates the identification of novel TAAs, thereby providing a more comprehensive foundation for CRC diagnosis [26].

Our study aims to identify TAAs using proteomic and single-cell transcriptomic analyses, evaluate the diagnostic performance of their corresponding autoantibodies, and provide a scalable, cost-effective, and minimally invasive alternative to facilitate the detection of CRC.

Materials and methods

Participating patients and sample collection

This study included two groups: 300 CRC patients (CRC group) and 300 healthy controls (HCs; HC group). Participants were matched by age (± 5 years) and gender and were randomly divided in a 7:3 ratio into a training set and a test set. The serum samples used in the study were obtained from the Biological Specimen Bank of Henan Key Laboratory of Tumor Epidemiology (Henan, China) and were collected between October 2020 and December 2023. All enrolled primary CRC cases were verified through pathological examination and were treatment-naïve. HCs were confirmed by reviewing their medical records to ensure they were free from malignancies or immune-related diseases. This study was approved by the Institutional Review Board of Zhengzhou University (Approval number: ZZURIB 2019-002). Written informed consent was obtained from all participants. All procedures were conducted in accordance with the relevant guidelines and regulations, as well as the Declaration of Helsinki. Early stages were categorized as stages 0 through II, and late stages were categorized as stages III and IV.
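The 7:3 division described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the sample IDs, the seeds, and the choice to split each group separately (rather than by matched pair) are assumptions made for the example.

```python
import random

def split_7_3(ids, seed=0):
    """Shuffle IDs and split them 70/30 into training and test lists."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical sample identifiers for the two groups of 300 participants
crc_ids = [f"CRC_{i:03d}" for i in range(300)]
hc_ids = [f"HC_{i:03d}" for i in range(300)]

crc_train, crc_test = split_7_3(crc_ids, seed=1)
hc_train, hc_test = split_7_3(hc_ids, seed=2)
print(len(crc_train), len(crc_test), len(hc_train), len(hc_test))
```

With 300 participants per group, this yields 210 training and 90 test samples in each group, which matches the set sizes reported in the Results.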

Blood samples were centrifuged at 3000 g for 5 min, and the serum was aliquoted for long-term storage at -80 ℃.

Identification of candidate TAAs based on multi-omics

The single-cell transcriptome data in the study were obtained from the Gene Expression Omnibus (GEO) database, including GSE132465, GSE144735 [24] and GSE200997 [27]. The proteomic data for colon adenocarcinoma (COAD) were downloaded from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) database [23]. Tumor and normal epithelial cells were compared to identify abnormally highly expressed genes. Proteomic data were used to verify the overexpression of the identified genes. Participant information for the relevant studies is shown in Supplementary Table S1.

Single-cell transcriptome data analysis

The R package Seurat (v4.3.2) was used to process the count matrix of each sample [28]. Genes expressed in fewer than three cells were then removed. Low-quality cells were eliminated based on the following criteria: fewer than 200 expressed genes, an erythrocyte ratio exceeding 10%, or mitochondrial content above 20%. The remaining steps followed standard procedures [28]. Batch effect correction was performed using Harmony during the integration of the three datasets [29]. Cell identities were assigned to individual clusters based on the expression of established marker genes and verified using CellTypist [30]. Subclusters of cells with comparable gene expression profiles were then assigned to the same cell type.
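The quality-control thresholds above can be illustrated with a small standalone sketch. It does not use Seurat; the toy count matrix, the `HB_GENES` marker set used for the erythrocyte ratio, the "MT-" prefix for mitochondrial genes, and the choice to compute per-cell metrics after gene filtering are all assumptions made for the example.

```python
HB_GENES = {"HBA1", "HBA2", "HBB"}  # assumed erythrocyte (hemoglobin) marker genes

def qc_filter(counts, genes, min_cells=3, min_genes=200, max_hb=0.10, max_mt=0.20):
    """counts: one list of gene counts per cell, aligned with `genes`.
    Returns (kept gene names, indices of kept cells)."""
    # Step 1: drop genes detected in fewer than `min_cells` cells
    keep_gene = [
        j for j in range(len(genes))
        if sum(1 for cell in counts if cell[j] > 0) >= min_cells
    ]
    # Step 2: drop low-quality cells based on per-cell metrics
    kept_cells = []
    for i, cell in enumerate(counts):
        total = sum(cell[j] for j in keep_gene)
        if total == 0:
            continue
        detected = sum(1 for j in keep_gene if cell[j] > 0)
        mt = sum(cell[j] for j in keep_gene if genes[j].startswith("MT-")) / total
        hb = sum(cell[j] for j in keep_gene if genes[j] in HB_GENES) / total
        if detected >= min_genes and hb <= max_hb and mt <= max_mt:
            kept_cells.append(i)
    return [genes[j] for j in keep_gene], kept_cells

def make_cell(n_on, mt, hb, rare=0):
    """Toy cell: `n_on` ordinary genes with one count each, plus three special genes."""
    return [1] * n_on + [0] * (250 - n_on) + [mt, hb, rare]

genes = [f"G{i}" for i in range(250)] + ["MT-CO1", "HBB", "RARE"]
counts = [
    make_cell(250, 10, 1, rare=1),  # good cell; RARE is seen only here
    make_cell(250, 10, 1),          # good cell
    make_cell(250, 10, 0),          # good cell
    make_cell(250, 100, 0),         # mitochondrial fraction above 20%
    make_cell(100, 5, 0),           # fewer than 200 detected genes
    make_cell(250, 10, 50),         # erythrocyte ratio above 10%
]

kept_genes, kept_cells = qc_filter(counts, genes)
print(len(kept_genes), kept_cells)
```

In this toy matrix the gene seen in a single cell is dropped, and only the first three cells pass all three cell-level criteria.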

Differential expression analysis for single cell transcriptomic data

The wilcox.test algorithm within the FindMarkers function of the Seurat package was employed to identify differentially expressed genes in epithelial cells, using thresholds of Log₂FC > 0.25 and adj.P < 0.05. Subsequently, the subset function was utilized to segregate cells based on CMS classification. The same criteria of Log₂FC > 0.25 and adj.P < 0.05 were applied to obtain up-regulated differentially expressed genes across various CMS subtypes.

Differential expression analysis for proteomic data

The log-ratio normalized proteomic data were downloaded directly from the CPTAC database. Differentially expressed proteins were screened with log₂FC > 0.6 and adj.P < 0.05 using the limma package [31].
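As an illustration of how the two sets of thresholds combine, the sketch below filters hypothetical differential-expression results with the single-cell cutoff (log₂FC > 0.25) and the proteomic cutoff (log₂FC > 0.6), both with adj.P < 0.05, and keeps the genes up-regulated at both levels. The gene names and values are invented, and the helper `upregulated` is not part of Seurat or limma.

```python
def upregulated(results, min_log2fc, max_adj_p=0.05):
    """Return the genes whose log2 fold change and adjusted P pass both cutoffs."""
    return {
        gene
        for gene, (log2fc, adj_p) in results.items()
        if log2fc > min_log2fc and adj_p < max_adj_p
    }

# Hypothetical results: gene -> (log2 fold change, adjusted P value)
single_cell = {
    "GENE_A": (1.10, 0.001),
    "GENE_B": (0.30, 0.020),
    "GENE_C": (0.20, 0.001),  # fold change too small
    "GENE_D": (0.90, 0.200),  # not significant
}
proteome = {
    "GENE_A": (0.80, 0.010),
    "GENE_B": (0.40, 0.010),  # below the proteomic cutoff of 0.6
    "GENE_C": (1.50, 0.001),
}

sc_up = upregulated(single_cell, min_log2fc=0.25)
prot_up = upregulated(proteome, min_log2fc=0.6)
candidates = sorted(sc_up & prot_up)  # over-expressed at both RNA and protein level
print(candidates)
```

Only GENE_A passes both screens here, which mirrors the idea of using proteomic data to verify overexpression found in the single-cell data.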

Additional screening strategies are used to narrow down the candidate TAAs

Driver genes were collected from the IntOGen (CRC) [32] and OncoKB [33] databases. Genes listed in the CTDatabase [34] were labeled CT-related. Genes belonging to a fetal gene expression signature reported in previous articles [35,36,37] were labeled Fetal_related.

In a separate study, the proteomic screening criteria were set at adj.P < 0.05 and log₂FC > 0.3. Genes that were down-regulated in small intestine carcinoma (SBA) epithelial cells compared with controls and also up-regulated in CRC epithelial cells compared with controls were designated SBA_related.

All the genes listed above can be found in Supplementary Table S2.

Function analysis of differentially expressed genes or proteins

Gene ontology (GO) annotation was performed to better understand the biological functions of these differentially expressed proteins. GO over-representation analysis of the selected genes was performed by using the clusterProfiler package [38].

Recombinant proteins and the detection of TAAbs by ELISA

Eight proteins (NPM1, EIF1AX, CKS1B, eEF2 (encoded by EEF2), P4HB, ANXA3, S100A11, HSP90AB1) were purchased from CUSABIO (Wuhan, China), and four proteins (HMGA1, ACTG1, maspin (encoded by SERPINB5), HKDC1) were purchased from Cloud-clone Corporation (Wuhan, China). The concentration, purity, and molecular weight of all proteins were confirmed by SDS-PAGE. The enzyme-linked immunosorbent assay was performed with a coating concentration of 0.125 μg/ml for HMGA1, NPM1, and S100A11 and 0.25 μg/ml for EIF1AX, HSP90AB1, ACTG1, CKS1B, maspin, ANXA3, eEF2, P4HB, and HKDC1. The ELISA procedures were described in our previous studies [39, 40]. The Specific Binding Index (SBI), which represents the degree of binding between antigen and antibody, was used to evaluate the level of autoantibodies in peripheral serum. SBI = (OD_TBD - OD_Blank) / (OD_QC - OD_Blank). OD_TBD refers to the optical density (OD) value of the sample to be determined, OD_QC represents the average OD value of the quality control (QC) samples, and OD_Blank denotes the average OD value of the blank wells.
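The SBI formula can be worked through with a short sketch. The OD readings below are invented for illustration, and the function name is hypothetical; only the formula itself comes from the text.

```python
from statistics import mean

def specific_binding_index(od_sample, od_qc_wells, od_blank_wells):
    """SBI = (OD_TBD - OD_Blank) / (OD_QC - OD_Blank), using well averages."""
    blank = mean(od_blank_wells)
    qc = mean(od_qc_wells)
    if qc == blank:
        raise ValueError("QC and blank averages are equal; SBI is undefined")
    return (od_sample - blank) / (qc - blank)

# Hypothetical plate readings: one serum sample, two QC wells, two blank wells
sbi = specific_binding_index(0.85, od_qc_wells=[1.20, 1.30], od_blank_wells=[0.05, 0.05])
print(round(sbi, 3))
```

Here the blank average is 0.05 and the QC average is 1.25, so the SBI is 0.80 / 1.20, or about 0.667. Normalizing to the plate's own QC and blank wells is what makes values comparable across plates.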

Diagnostic model development

Ten machine learning algorithms were applied to the training set using the “tidymodels” R package, based on the SBI values of the five TAAbs. These models include Logistic Regression, Decision Tree, Elastic Net, K-Nearest Neighbours, Light Gradient Boosting Machine, Random Forest (RF), eXtreme Gradient Boosting, Support Vector Machine, Multilayer Perceptron via nnet, and Stacking ensemble, chosen for their diverse methodologies and robust performance in identifying complex patterns within the data. A 10-fold cross-validation was performed to evaluate the predictive ability of the models. Precision-recall (PR) curves were used to evaluate the discriminative ability of the models, and decision curve analysis (DCA) was used to further confirm their clinical effectiveness. Model performance was compared using AUCs on both training and test datasets. Statistical significance between datasets was evaluated using DeLong tests. The same statistical method was applied to assess differences in AUCs between models with equivalent sample sizes.
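Because the models are compared by AUC, it may help to recall what the statistic measures: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison on invented scores; it is a didactic illustration and not the routine used in the study.

```python
def auc(case_scores, control_scores):
    """Empirical AUC: fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical predicted probabilities from a classifier
cases = [0.9, 0.8, 0.4, 0.7]
controls = [0.3, 0.5, 0.6, 0.2]
print(auc(cases, controls))
```

In this toy example 14 of the 16 case/control pairs are ordered correctly, giving an AUC of 0.875.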

Statistical analysis

Data analysis and visualization were performed using SPSS Statistics 26.0 and R 4.3.2. Sample size calculation was performed using PASS software (Version 15, Confidence Intervals for One Proportion). Based on this analysis, the minimum sample size of the test set used in model development was 85 CRC patients and 58 HC participants. ROC analysis and the AUC with 95% CI were used to evaluate the diagnostic performance of the biomarkers and the model. Sensitivity and specificity were determined at the cutoff value, defined as the SBI value at the maximum Youden index under the condition that specificity exceeded 85%. For combined testing, an individual was classified as positive if either the TAAbs model or the clinical biomarker test yielded a positive outcome, and the corresponding positivity rate was then calculated. A three-component chi-squared test was used in SPSS to compare diagnostic performance. Single-cell plots were generated using scRNAtoolVis (https://github.com/junjunlab/scRNAtoolVis) and plot1cell [41]. Results were considered statistically significant at a two-sided P-value of < 0.05.
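The cutoff rule (maximum Youden index among cutoffs whose specificity exceeds 85%) can be sketched as below. The SBI values are invented, and treating a value at or above the cutoff as positive is an assumption of this example rather than a detail stated in the text.

```python
def youden_cutoff(case_values, control_values, min_specificity=0.85):
    """Return (cutoff, Youden J, sensitivity, specificity) maximizing J
    among cutoffs whose specificity is strictly above `min_specificity`."""
    best = None
    for cut in sorted(set(case_values) | set(control_values)):
        sens = sum(v >= cut for v in case_values) / len(case_values)
        spec = sum(v < cut for v in control_values) / len(control_values)
        if spec <= min_specificity:
            continue  # constraint from the text: specificity must exceed 85%
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cut, j, sens, spec)
    return best

# Hypothetical SBI values for ten controls and ten CRC cases
controls = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.6, 0.9]
cases = [0.3, 0.5, 0.6, 0.7, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]

cutoff, j, sens, spec = youden_cutoff(cases, controls)
print(cutoff, round(j, 2), sens, spec)
```

In this toy data, lower cutoffs such as 0.5 reach the same Youden index but with a specificity of only 70%, so the constraint pushes the chosen cutoff up to 0.7 (sensitivity 70%, specificity 90%). This trade-off is consistent with the high-specificity, moderate-sensitivity profile reported for the individual TAAbs.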

Results

Study design and sample characteristics

This study was conducted in two steps: discovery of candidate TAAs (Step 1) and evaluation of TAAbs (Step 2) (Fig. 1). Proteomic data from 100 normal controls and 97 tumor cases and single-cell transcriptome data from 23 normal controls and 51 tumor cases were included in the current study. More detailed information can be found in Supplementary Table S1. A primary CRC single-cell transcriptome atlas was constructed, revealing an increased percentage of epithelial cells. Candidate TAAs were selected using the screening strategies shown in Fig. 1.

Subsequently, serum autoantibodies against 12 candidate TAAs were further evaluated by ELISA. The verification and validation phases were employed to assess the diagnostic performance of TAAbs. During the verification phase, diagnostic models were constructed in the training set, which included 210 CRC cases and 210 HC participants. In the validation phase, a test set comprising 90 CRC cases and 90 HC individuals was used to verify the potential diagnostic value of these TAAbs. Demographic and clinical characteristics of the study participants are presented in Table 1. The majority of patients in this study were diagnosed with stage II CRC, including 118 (56.2%) in the training set and 53 (58.9%) in the test set. In accordance with routine clinical practice, CEA and CA19-9 were considered elevated above cutoff values of 5 ng/ml and 35 ng/ml, respectively. In the training and test sets, the positive rates for CEA were 31.2% and 32.2%, respectively, while those for CA19-9 were 18.6% and 12.2%. In the model phase, 10 machine learning models were used for model selection and training, and hyperparameter tuning was used to improve the performance of the model.

Table 1 Baseline characteristics of participants in the training set and test set

Identification of candidate TAAs based on multi-omics

Epithelial cells were identified using markers (EPCAM and KRT19), and the subtypes of the epithelial cells were defined based on the reference map from CellTypist [42] (Fig. 2a). Odds ratio analysis showed that stem-like cells and colonocytes were enriched in tumor compared to adjacent controls (Fig. 2b). Stem-like cells showed upregulated LGR5, which is consistent with the accepted histological model of intestinal epithelium [43]. Compared to adjacent normal cells, tumor colonocytes overexpressed chemokines (CXCL1, CXCL2, CXCL3, and CCL20), showing significant effects on inflammatory processes and immune cell recruitment (Fig. 2c). Eight hundred upregulated genes were identified in epithelial cells for all CMS subtypes combined. Furthermore, 1,226 genes over-expressed across four CMS subtypes were identified (Fig. 2d and e). Two hundred and eighty-four up-regulated proteins were identified from proteomic data (Fig. 2f).

As shown in Fig. 2g, 72 genes were thought to play an important role in the development of CRC. After further intersection with the gene sets of interest (listed in Supplementary Table S2), six driver genes in OncoKB and three genes from a fetal gene-expression signature were identified. In addition, EEF2, P4HB and HKDC1 were derived from SBA-related genes screened by slightly different strategies.

Finally, twelve potential TAAs, derived from the proteins encoded by the genes HMGA1, NPM1, EIF1AX, CKS1B, HSP90AB1, ACTG1, S100A11, SERPINB5, ANXA3, EEF2, P4HB, and HKDC1, were selected for subsequent experimental validation and verification (Supplementary Table S3). These genes are enriched in pathways closely related to cancer initiation and progression (Fig. 2h).

Diagnostic performance in the verification phase and validation phase

In the verification phase, six TAAbs, including anti-CKS1B, anti-ACTG1, anti-S100A11, anti-maspin, anti-ANXA3, and anti-eEF2, showed significant differences between CRC patients and HCs (all P < 0.05) (Fig. 3a). The AUC values ranged from 0.58 to 0.64. Specifically, the AUC values were as follows: anti-CKS1B (AUC = 0.62, 95% CI: 0.57–0.67, P < 0.01), anti-ACTG1 (AUC = 0.59, 95% CI: 0.54–0.65, P < 0.01), anti-S100A11 (AUC = 0.64, 95% CI: 0.58–0.69, P < 0.01), anti-maspin (AUC = 0.58, 95% CI: 0.52–0.63, P < 0.01), anti-ANXA3 (AUC = 0.62, 95% CI: 0.57–0.67, P < 0.01), and anti-eEF2 (AUC = 0.60, 95% CI: 0.54–0.65, P < 0.01) (Fig. 4a).

During validation, the diagnostic performance of the six TAAbs that were significant in the training set was further assessed, with five of them demonstrating potential diagnostic value. The results showed that the AUCs for these TAAbs ranged from 0.53 to 0.69, the sensitivity ranged from 22.22% to 37.78%, and the specificity ranged from 86.67% to 91.11% (Table 2). Among them, anti-maspin exhibited the highest diagnostic potential with an AUC of 0.69 (95% CI: 0.62–0.77), a sensitivity of 35.56%, and a specificity of 86.67%. Anti-ACTG1 exhibited the lowest AUC of 0.53 (95% CI: 0.45–0.62) for CRC (P = 0.439) (Fig. 4b).

Table 2 Diagnostic performance of the 12 candidate TAAbs

The results from the validation phase were consistent with those from the verification phase except for the autoantibody against ACTG1 (Fig. 3b). Consequently, data from the five TAAbs (anti-CKS1B, anti-S100A11, anti-maspin, anti-ANXA3, and anti-eEF2) were used for diagnostic model construction. No strong correlation was observed among these indicators in either the training set or the test set.

Diagnostic performance of the immunodiagnostic model based on machine learning

The AUCs of the ten models in the training set varied from 0.67 to 0.84, and their accuracy ranged from 61.90% to 72.62% (Fig. 5a; Table 3). Similarly, in the test set, the AUC ranged from 0.63 to 0.77 and the accuracy ranged from 61.11% to 68.33% (Fig. 5d; Table 3). The DeLong test showed no significant difference in AUCs between the training and test sets for each model (Table 3).

Table 3 Diagnostic performance of the 10 machine learning algorithms in the training and test sets

The stacking model showed the best diagnostic performance in the training set (AUC: 0.84, 95% CI: 0.80–0.87), followed by the RF model (AUC: 0.82, 95% CI: 0.78–0.86) (Fig. 5a and d). The DeLong test showed no significant difference between the two models. Additionally, the PR curve showed that the stacking model performed better than the RF model (Fig. 5b and e), whereas DCA indicated the clinical effectiveness of the RF model in both the training set and the test set (Fig. 5c and f). To ensure a robust comparison of the RF and stacking models, we evaluated their performance on both the training and test datasets using the DeLong test. In the test set, the AUC comparison showed P = 0.204, suggesting that the two models exhibit comparable generalization performance on test data. Therefore, the simpler RF model was considered more suitable for clinical diagnosis.

The RF model exhibited consistent performance across training and test sets. It achieved an AUC of 0.82 (95% CI: 0.78–0.86) with 68.10% accuracy in the training set (Fig. 6a and b; Table 4), and an AUC of 0.75 (95% CI: 0.68–0.82) with 67.22% accuracy in the test set (Fig. 6c and d; Table 3).

Table 4 Subgroup performance of the Five-TAAbs model in clinical diagnostics

Interpretation and application of RF model

An online application for this RF model was developed, and SHAP was integrated to enhance the interpretability of the machine learning model, thereby increasing its utility in clinical settings. The bee swarm plot illustrates how the key characteristics in the dataset influence the model’s output, highlighting that anti-S100A11 has the highest SHAP values among all features (Fig. 6e).

To enhance its clinical utility, the RF model was deployed as a user-friendly web application accessible through a Shiny server (https://qzan.shinyapps.io/CRCPred/). Users can input TAAb values for one sample using the web interface, and the predicted likelihood of CRC diagnosis will be returned. The application leverages machine learning algorithms for real-time online computation.

Enhanced diagnostic performance of the 5-TAAbs immunodiagnostic model combined with CEA and CA19-9

Analysis of patients with CEA and CA19-9 test results demonstrated that combining the RF model with these biomarkers significantly improved the positive rate. In the training set, the RF model’s positive rate ranged from 50.3% to 50.5%, while CEA and CA19-9 achieved rates of 39.6% and 20.2%, respectively (Fig. 6f and h). Combining CEA and CA19-9 increased the positive rate to 44.3%. Notably, incorporating the RF model with CEA and CA19-9 further boosted the positive rate to 75.0%, an improvement of 30.7 percentage points over the combination of CEA and CA19-9 alone (Fig. 6j).

Similar trends were observed in the test set. The RF model achieved a positive rate of 51.8%, while the rates for CEA and CA19-9 were 34.1% and 12.9%, respectively (Fig. 6g and i). The combination of CEA and CA19-9 yielded a positive rate of 36.5%. When combined with the RF model, the positive rate increased to 65.9%, an improvement of 29.4 percentage points over the CEA and CA19-9 combination (Fig. 6k).
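The combined positive rates follow the "positive if any test is positive" rule stated in the statistical analysis. The sketch below applies that rule to ten hypothetical patients (the per-patient results are invented) and then checks the reported gains as simple differences between the published rates.

```python
def positive_rate(flags):
    """Percentage of patients flagged positive."""
    return 100.0 * sum(flags) / len(flags)

def combine_or(*tests):
    """A patient is positive when at least one of the tests is positive."""
    return [any(results) for results in zip(*tests)]

# Hypothetical results for ten CRC patients (1 = positive, 0 = negative)
rf_model = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
cea      = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
ca19_9   = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]

markers = combine_or(cea, ca19_9)
all_three = combine_or(rf_model, cea, ca19_9)
print(positive_rate(markers), positive_rate(all_three))

# Gains reported in the text, as differences between published rates
gain_train = round(75.0 - 44.3, 1)  # training set
gain_test = round(65.9 - 36.5, 1)   # test set
print(gain_train, gain_test)
```

The toy patients give 40% for the two serum markers and 70% once the RF model is added. The gain comes from patients who are negative for both markers but positive in the model, which is why the combination helps only when the tests flag partly different patients.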

Subgroup analysis of the 5-TAAbs immunodiagnostic model for clinical application

Due to the limited sample size for certain clinical characteristics, the diagnostic value of the model in subgroups was primarily assessed in the training dataset. A subgroup analysis was performed for clinical features such as age, gender, lesion location, and disease stage. The findings revealed that the model demonstrated significantly better diagnostic efficacy for individuals aged 50 years and above compared to those under 50 years (P = 0.017) (Table 4). CRC diagnosed in individuals under 50 years is considered early-onset, while diagnosis at age 50 and above is termed late-onset CRC. Given the well-established positive correlation between age and the incidence risk of late-onset CRC, the model’s superior diagnostic performance in individuals aged 50 and over suggests its potential for future CRC screening programs targeting this high-risk population. However, the DeLong test revealed no significant difference in model performance between early and advanced stages of the disease (P = 0.085).

Discussion

CRC poses a significant global health burden, emphasizing the critical need for early diagnosis and improved patient outcomes [1]. Multi-omics analysis provided a comprehensive view of molecular changes in CRC, facilitating the identification of antigens from various biological pathways. By integrating single-cell transcriptome data from 74 individuals and proteome data from 197 individuals, this study screened 12 candidate TAAs and identified 5 TAAbs for CRC diagnosis through a two-phase ELISA validation. While the individual AUCs of these five TAAbs ranged from 0.58 to 0.64 in the training set and 0.62 to 0.69 in the test set, their combination within the RF model significantly improved diagnostic performance. Notably, combining the 5-TAAb RF model with established biomarkers like CEA or CA19-9 demonstrated superior performance in CRC detection.

The identification of TAAs is crucial for understanding CRC pathogenesis. Mitochondrial DNA transfer contributes to cancer progression by inducing epithelial cells to produce pro-cancer cytokines [44]. Furthermore, the accumulation of somatic mutations in cancer genomes results in the presence of multiple driver gene mutations within a single tumor [45]. The driver genes cataloged in IntOGen and OncoKB may serve as potential TAAs [46]. Notably, stage-specific embryonic antigens, often present on both pluripotent stem cells and cancer stem cells, are considered potential diagnostic markers and therapeutic targets [47, 48]. By selecting genes commonly upregulated across different subtypes of CRC, we aimed to ensure broader applicability of the identified candidate genes.

In the training set, the AUC for these five TAAbs ranged from 0.58 to 0.64, while in the test set, the AUC ranged from 0.62 to 0.69. Notably, the diagnostic performance of these TAAbs in CRC has not been confirmed in previous studies. Yang et al. employed the SERPA method to identify anti-maspin and anti-ANXA3 as potential biomarkers for colon cancer, demonstrating differential expression patterns in a limited cohort of eight patients with colon adenocarcinoma and liver metastasis at various stages [49]. However, these findings lack validation in larger, population-based studies. Moreover, maspin expression is known to be downregulated during the early stages of tumorigenesis, and CEA may potentially influence its expression levels in CRC [50]. Yusuke et al. found that CRC patients exhibit significantly higher levels of eEF2 IgG antibodies compared to healthy individuals (P < 0.01) [51]. Studies have shown that eEF2 protein levels are significantly elevated in many different cancer types compared to normal tissues [51]. Research indicates that CRC overexpresses S100A11, and that S100A11 undergoes nucleocytoplasmic translocation during cancer development, potentially impacting cancer cell proliferation [52]. Moreover, another study shows that serum S100A11 levels are significantly elevated in CRC patients [53]. CKS1B promotes cell proliferation by regulating the activity of cyclins and cyclin-dependent kinases, playing a key role in controlling the transition from the G1 to S phase of the cell cycle [54].

IgG isotype autoantibodies are abundant and diverse in human serum, and their levels can be influenced by disease conditions, including cancer [55]. An integrative analysis has demonstrated a significant increase in IgG-secreting plasma cells within CRC tissues compared to normal and adjacent tissues [56]. The presence of specific IgG antibodies in plasma offers a valuable resource for clinical detection [55]. Liu et al., after reviewing multiple studies, found that p53 autoantibody levels are significantly elevated in the blood of CRC patients compared to healthy controls. Furthermore, patients with negative serum p53 antibody detection exhibited longer disease-free and overall survival, highlighting its potential as a prognostic biomarker for early detection and clinical prognosis [14]. Studies have shown that although 69 CRC-associated TAAbs demonstrated high specificity (> 85%), their individual sensitivity was generally low (< 30%) [57, 58]. However, combining multiple autoantibodies significantly enhances sensitivity without compromising specificity, with combined sensitivity increasing from 18.1%–35.1% to 58.5% [57, 59]. In our study, the sensitivity of individual markers ranged from 20.48% to 36.67% in the training set and from 22.22% to 37.78% in the test set, with specificity exceeding 85% in both sets. Training an RF model using five TAAbs resulted in a sensitivity of 50.95% in the training set and 51.11% in the test set. Notably, combining this RF model with CEA and CA19-9 significantly enhanced diagnostic accuracy for CRC, emphasizing the valuable complementary role of these biomarkers alongside traditional clinical biomarkers.

The application of machine learning algorithms in the field of oncology has significant advantages in terms of accuracy and efficiency [16, 60]. Yin et al. developed an extracellular vesicle–related RF model for CRC diagnosis, achieving an AUC of 0.960 [61]. In this study, models based on the decision tree ensemble family, such as RF, LightGBM, and XGBoost, demonstrated superior performance. In comparison, while logistic regression serves as a baseline model, its performance was notably inferior to that of the tree-based models. Notably, the RF and stacking models emerged as the top two performers in terms of diagnostic accuracy, showing comparable generalization capability on the test set (P = 0.204). Therefore, this study selected the simpler RF model, allowing for interpretable results through SHAP.

This study presents several key strengths. Firstly, the integration of multi-omics data and CMS classification enhances the robustness of TAA selection. Secondly, rigorous validation of candidate TAAs ensures the reliability of the identified biomarkers. Thirdly, the application of machine learning algorithms improves diagnostic accuracy. Furthermore, the optimal RF model has been deployed as a Shiny app to enhance practical usability.

However, certain limitations need to be acknowledged. The sample size, while adequate for initial analysis, may limit the power of subgroup analyses. Additionally, the study population was predominantly from central China, which may limit the generalizability of the findings. Furthermore, the impact of potential confounding factors, such as smoking and alcohol consumption, was not assessed. Future large-scale, multi-center studies are warranted to further validate the diagnostic performance of the developed model and its generalizability across diverse populations.

Conclusions

In conclusion, this study has developed and validated a novel diagnostic method for CRC, utilizing a panel of TAAbs, and implementing a robust machine-learning model. This user-friendly model is accessible through a web application (qzan.shinyapps.io/CRCPred/), offering a promising tool for CRC diagnosis. The optimal model proposed in our study can significantly enhance the diagnostic performance of CEA and CA19-9.

Data availability

The datasets supporting the conclusions of this article are included within the article. All data utilized in this study are accessible from the corresponding authors upon reasonable request.

Abbreviations

CRC: Colorectal Cancer
TAAs: Tumor-associated Antigens
TAAbs: Tumor-associated Autoantibodies
ELISA: Enzyme-linked Immunosorbent Assays
SERPA: Serological Proteome Analysis
CMS: Consensus Molecular Subtypes
HC: Healthy Control
GEO: Gene Expression Omnibus
CPTAC: Clinical Proteomic Tumor Analysis Consortium
SBA: Small Intestine Carcinoma
GO: Gene Ontology
SBI: Specific Binding Index
OD: Optical Density
QC: Quality Control
RF: Random Forest
PR: Precision-Recall
DCA: Decision Curve Analysis

References

Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71:209–49.
Spaander MCW, Zauber AG, Syngal S, Blaser MJ, Sung JJ, You YN, et al. Young-onset colorectal cancer. Nat Reviews Disease Primers. 2023;9:21.
Morgan E, Arnold M, Gini A, Lorenzoni V, Cabasag CJ, Laversanne M, et al. Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from GLOBOCAN. Gut. 2023;72:338–44.
Hamdy NM. Unraveling the NcRNA landscape that governs colorectal cancer: A roadmap to personalized therapeutics. Life Sci. 2024;354:122946.
Rizk NI. Revealing the role of serum Exosomal novel long non-coding RNA NAMPT-AS as a promising diagnostic/prognostic biomarker in colorectal cancer patients. Life Sci. 2024;352:122850.
Zaika V, Prakash MK, Cheng C-Y, Schlander M, Lang BM, Beerenwinkel N, et al. Optimal timing of a colonoscopy screening schedule depends on adenoma detection, adenoma risk, adherence to screening and the screening objective: A microsimulation study. PLoS ONE. 2024;19:e0304374.
Coronado GD, Bienen L, Burnett-Hartman A, Lee JK, Rutter CM. Maximizing scarce colonoscopy resources: the crucial role of stool-based tests. JNCI: J Natl Cancer Inst. 2024;116:647–52.
Imperiale TF, Gagrat ZD, Garces J, Brinberg D, Limburg PJ. Next-Generation multitarget stool DNA test for colorectal cancer screening. N Engl J Med. 2024;390:984–93.
Church TR, Wandell M, Lofton-Day C, Mongin SJ, Burger M, Payne SR, et al. Prospective evaluation of methylated SEPT9 in plasma for detection of asymptomatic colorectal cancer. Gut. 2014;63:317–25.
Beniwal SS, Lamo P, Kaushik A, Lorenzo-Villegas DL, Liu Y, MohanaSundaram A. Current status and emerging trends in colorectal cancer screening and diagnostics. Biosensors. 2023;13:926.
Stiksma J, Grootendorst DC, Van Der Linden PWG. CA 19-9 as a marker in addition to CEA to monitor colorectal cancer. Clin Colorectal Cancer. 2014;13:239–44.
Luo H, Shen K, Li B, Li R, Wang Z, Xie Z. Clinical significance and diagnostic value of serum NSE, CEA, CA19–9, CA125 and CA242 levels in colorectal cancer. Oncol Lett. 2020;20:742–50.
Liu W, Peng B, Lu Y, Xu W, Qian W, Zhang J-Y. Autoantibodies to tumor-associated antigens as biomarkers in cancer immunodiagnosis. Autoimmun Rev. 2011;10:331–5.
Liu S, Tan Q, Song Y, Shi Y, Han X. Anti-p53 autoantibody in blood as a diagnostic biomarker for colorectal cancer: A meta-analysis. Scand J Immunol. 2020;91:e12829.
Chen H, Werner S, Tao S, Zörnig I, Brenner H. Blood autoantibodies against tumor-associated antigens as biomarkers in early detection of colorectal cancer. Cancer Lett. 2014;346:178–87.
Wang H, Li X, Zhou D, Huang J. Autoantibodies as biomarkers for colorectal cancer: A systematic review, meta-analysis, and bioinformatics analysis. Int J Biol Markers. 2019;34:334–47.
Poletaev A, Pukhalenko A, Kukushkin A, Sviridov P. Detection of early cancer: genetics or immunology? Serum autoantibody profiles as markers of malignancy. ACAMC. 2015;15:1260–3.
Ran Y, Hu H, Zhou Z, Yu L, Sun L, Pan J, et al. Profiling Tumor-Associated autoantibodies for the detection of colon cancer. Clin Cancer Res. 2008;14:2696–700.
Chang W, Wu L, Cao F, Liu Y, Ma L, Wang M, et al. Development of autoantibody signatures as biomarkers for early detection of colorectal carcinoma. Clin Cancer Res. 2011;17:5715–24.
Wang H, Zhang B, Li X, Zhou D, Li Y, Jia S, et al. Identification and validation of novel serum autoantibody biomarkers for early detection of colorectal cancer and advanced adenoma. Front Oncol. 2020;10.
Barpanda A, Tuckley C, Ray A, Banerjee A, Duttagupta SP, Kantharia C, et al. A protein microarray-based serum proteomic investigation reveals distinct autoantibody signature in colorectal cancer. Proteom Clin Apps. 2023;17:2200062.
Guinney J, Dienstmann R, Wang X, de Reyniès A, Schlicker A, Soneson C, et al. The consensus molecular subtypes of colorectal cancer. Nat Med. 2015;21:1350–6.
Vasaikar S, Huang C, Wang X, Petyuk VA, Savage SR, Wen B, et al. Proteogenomic analysis of human colon cancer reveals new therapeutic opportunities. Cell. 2019;177:1035–1049.e19.
Lee H-O, Hong Y, Etlioglu HE, Cho YB, Pomella V, Van den Bosch B, et al. Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer. Nat Genet. 2020;52:594–603.
Khaliq AM, Erdogan C, Kurt Z, Turgut SS, Grunvald MW, Rand T, et al. Refining colorectal cancer classification and clinical stratification through a single-cell atlas. Genome Biol. 2022;23:113.
Alorda-Clara M, Torrens-Mas M, Morla-Barcelo PM, Martinez-Bernabe T, Sastre-Serra J, Roca P, et al. Use of omics technologies for the detection of colorectal cancer biomarkers. Cancers. 2022;14:817.
Joanito I, Wirapati P, Zhao N, Nawaz Z, Yeo G, Lee F, et al. Single-cell and bulk transcriptome sequencing identifies two epithelial tumor cell States and refines the consensus molecular classification of colorectal cancer. Nat Genet. 2022;54:963–75.
Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184:3573–3587.e29.
Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, et al. Fast, sensitive and accurate integration of single-cell data with harmony. Nat Methods. 2019;16:1289–96.
Domínguez Conde C, Xu C, Jarvis LB, Rainbow DB, Wells SB, Gomes T, et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science. 2022;376:eabl5197.
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47.
Gonzalez-Perez A, Perez-Llamas C, Deu-Pons J, Tamborero D, Schroeder MP, Jene-Sanz A, et al. IntOGen-mutations identifies cancer drivers across tumor types. Nat Methods. 2013;10:1081–2.
Chakravarty D, Gao J, Phillips S, Kundra R, Zhang H, Wang J, et al. OncoKB: A precision oncology knowledge base. JCO Precision Oncol. 2017;1:PO.17.00011.
Almeida LG, Sakabe NJ, deOliveira AR, Silva MCC, Mundstein AS, Cohen T, et al. CTdatabase: a knowledge-base of high-throughput and curated data on cancer-testis antigens. Nucleic Acids Res. 2009;37(Database issue):D816–9.
Chen B, Scurrah CR, McKinley ET, Simmons AJ, Ramirez-Solano MA, Zhu X, et al. Differential pre-malignant programs and microenvironment chart distinct paths to malignancy in human colorectal polyps. Cell. 2021;184:6262–6280.e26.
Park Y-K, Franklin JL, Settle SH, Levy SE, Chung E, Jeyakumar LH, et al. Gene expression profile analysis of mouse colon embryonic development. Genesis. 2005;41:1–12.
Mustata RC, Vasile G, Fernandez-Vallone V, Strollo S, Lefort A, Libert F, et al. Identification of Lgr5-independent spheroid-generating progenitors of the mouse fetal intestinal epithelium. Cell Rep. 2013;5:421–32.
Yu G, Wang L-G, Han Y, He Q-Y. ClusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16:284–7.
Sun G, Ye H, Yang Q, Zhu J, Qiu C, Shi J, et al. Using proteome microarray and gene expression omnibus database to screen Tumour-Associated antigens to construct the optimal diagnostic model of oesophageal squamous cell carcinoma. Clin Oncol. 2023;35:e582–92.
Li T, Sun G, Ye H, Song C, Shen Y, Cheng Y, et al. ESCCPred: a machine learning model for diagnostic prediction of early esophageal squamous cell carcinoma using autoantibody profiles. Br J Cancer. 2024;131:883–94.
Wu H, Gonzalez Villalobos R, Yao X, Reilly D, Chen T, Rankin M, et al. Mapping the single-cell transcriptomic response of murine diabetic kidney disease to therapies. Cell Metab. 2022;34:1064–1078.e6.
Xu C, Prete M, Webb S, Jardine L, Stewart B, He P, et al. Automatic cell type harmonization and integration across Human Cell Atlas datasets.
Malagola E, Vasciaveo A, Ochiai Y, Kim W, Zheng B, Zanella L, et al. Isthmus progenitor cells contribute to homeostatic cellular turnover and support regeneration following intestinal injury. Cell. 2024;187:3056–3071.e17.
Guan B, Liu Y, Xie B, Zhao S, Yalikun A, Chen W, et al. Mitochondrial genome transfer drives metabolic reprogramming in adjacent colonic epithelial cells promoting TGFβ1-mediated tumor progression. Nat Commun. 2024;15:3653.
Gires O, Pan M, Schinke H, Canis M, Baeuerle PA. Expression and function of epithelial cell adhesion molecule EpCAM: where are we after 40 years? Cancer Metastasis Rev. 2020;39:969–87.
Wang T, Liu H, Pei L, Wang K, Song C, Wang P, et al. Screening of tumor-associated antigens based on Oncomine database and evaluation of diagnostic value of autoantibodies in lung cancer. Clin Immunol. 2020;210:108262.
Andrews PW, Gokhale PJ. A short history of pluripotent stem cells markers. Stem Cell Rep. 2024;19:1–10.
Long Y-Y, Wang Y, Huang Q-R, Zheng G-S, Jiao S-C. Measurement of serum antibodies against NY-ESO-1 by ELISA: A guide for the treatment of specific immunotherapy for patients with advanced colorectal cancer. Exp Ther Med. 2014;8:1279–84.
Yang Q, Roehrl MH, Wang JY. Proteomic profiling of antibody-inducing immunogens in tumor tissue identifies PSMA1, LAP3, ANXA3, and Maspin as colon cancer markers. Oncotarget. 2018;9:3996–4019.
Baek JY, Yeo HY, Chang HJ, Kim K, Kim SY, Park JW, et al. Serpin B5 is a CEA-interacting biomarker for colorectal cancer. Int J Cancer. 2014;134:1595–604.
Oji Y, Tatsumi N, Fukuda M, Nakatsuka S-I, Aoyagi S, Hirata E, et al. The translation elongation factor eEF2 is a novel tumor-associated antigen overexpressed in various types of cancers. Int J Oncol. 2014;44:1461–9.
Cross SS, Hamdy FC, Deloulme JC, Rehman I. Expression of S100 proteins in normal human tissues and common cancers using tissue microarrays: S100A6, S100A8, S100A9 and S100A11 are all overexpressed in common cancers. Histopathology. 2005;46:256–69.
Moravkova P, Kohoutova D, Vavrova J, Bures J. Serum S100A6, S100A8, S100A9 and S100A11 proteins in colorectal neoplasia: results of a single centre prospective study. Scand J Clin Lab Invest. 2020;80:173–8.
Kashkin KN, Chernov IP, Stukacheva EA, Kopantzev EP, Monastyrskaya GS, Uspenskaya NY, et al. Cancer specificity of promoters of the genes involved in cell proliferation control. Acta Naturae. 2013;5:79–83.
Nagele EP, Han M, Acharya NK, DeMarshall C, Kosciuk MC, Nagele RG. Natural IgG autoantibodies are abundant and ubiquitous in human Sera, and their number is influenced by age, gender, and disease. PLoS ONE. 2013;8:e60726.
Chu X, Li X, Zhang Y, Dang G, Miao Y, Xu W, et al. Integrative single-cell analysis of human colorectal cancer reveals patient stratification with distinct immune evasion mechanisms. Nat Cancer. 2024;5:1409–26.
Niloofa R, De Zoysa MI, Seneviratne SL. Autoantibodies in the diagnosis, prognosis, and prediction of colorectal cancer. J Cancer Res Ther. 2021;17:819–33.
Nikolaou S, Qiu S, Fiorentino F, Rasheed S, Tekkis P, Kontovounisios C. Systematic review of blood diagnostic markers in colorectal cancer. Tech Coloproctol. 2018;22:481–98.
Chan C, Fan C, Kuo Y, Chen Y, Chang P, Chen K, et al. Multiple serological biomarkers for colorectal cancer detection. Intl J Cancer. 2010;126:1683–90.
Singh G. Artificial intelligence in colorectal cancer: a review. Siberian J Oncol. 2023;22:99–107.
Yin H, Xie J, Xing S, Lu X, Yu Y, Ren Y, et al. Machine learning-based analysis identifies and validates serum Exosomal proteomic signatures for the diagnosis of colorectal cancer. Cell Rep Med. 2024;5:101689.

Acknowledgements

The authors would like to thank all the staff and patients who participated in our study from the First Affiliated Hospital of Zhengzhou University and the Biological Specimen Bank of Henan Key Laboratory of Tumor Epidemiology (Henan, China) for their cooperation and collaboration. Data analysis was supported by the Supercomputing Center in Zhengzhou University (Zhengzhou).

Funding

This work was funded by the Project of Basic Research Fund of Henan Institute of Medical and Pharmaceutical Sciences (2022BP0112, 2024BP0207). This study was also supported by the International Cultivation of Henan Advanced Talents Program.

Author information

Authors and Affiliations

State Key Laboratory of Metabolic Dysregulation & Prevention and Treatment of Esophageal Cancer, Henan Institute of Medical and Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, 450052, Henan, China
Zan Qiu, Xinwei Wang, Zirui Kang, Lei Peng, Keyan Wang, Liping Dai & Jianxiang Shi
College of Public Health, Zhengzhou University, Zhengzhou, 450001, Henan, China
Yifan Cheng, Haiyan Liu, Tiandong Li, Yin Lu, Donglin Jiang, Xiaoyue Zhang, Hua Ye & Peng Wang
Henan Key Laboratory of Tumor Epidemiology, Zhengzhou University, Zhengzhou, 450052, Henan, China
Zan Qiu, Yifan Cheng, Haiyan Liu, Tiandong Li, Yin Lu, Donglin Jiang, Xiaoyue Zhang, Xinwei Wang, Zirui Kang, Lei Peng, Keyan Wang, Liping Dai, Hua Ye, Peng Wang & Jianxiang Shi
Division of Pediatric Surgery, Department of Surgery, Children’s Hospital of Pittsburgh, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15224, USA
Yinan Jiang

Contributions

Conceptualization, JS, ZQ, LD and KW; Project administration & Supervision, JS, PW and HY; Formal analysis, ZQ, YC, HL and TL; Methodology & Experiment, ZQ, HL, YL, DJ and XZ; Investigation, XW, ZK and LP; Data curation & Data analysis, ZQ, JS and YJ; Original draft, ZQ; Review & Editing, All authors.

Corresponding author

Correspondence to Jianxiang Shi.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Institutional Review Board of Zhengzhou University (Approval number: ZZURIB 2019-002). All participants were informed of the purpose of the study and provided written consent by signing consent forms.

Consent for publication

All authors have read the manuscript and have agreed to submit it in its current form for consideration for publication.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Qiu, Z., Cheng, Y., Liu, H. et al. Screening colorectal cancer associated autoantigens through multi-omics analysis and diagnostic performance evaluation of corresponding autoantibodies. BMC Cancer 25, 713 (2025). https://doi.org/10.1186/s12885-025-14080-5

Received: 19 November 2024
Accepted: 03 April 2025
Published: 16 April 2025
DOI: https://doi.org/10.1186/s12885-025-14080-5
