Introduction
Cancer is a complex disease. It is controlled not only by individual genes and other genetic factors but is also related to the environment and lifestyle habits. These factors affect gene expression and thereby influence the occurrence and development of cancer. Biomarkers, such as genes, miRNAs, proteins, and metabolites, are biological entities that can indicate whether cells, tissues, or individuals are normal or diseased (Ideker & Sharan, 2008). In medicine, biomarkers can help diagnose diseases, predict how a disease will progress, and predict how patients will respond to treatment, thereby enabling precise and effective treatment. For many types of cancer, effective methods of diagnosis and treatment have yet to be established. Identifying biomarkers that capture the early characteristics of cancer, and elucidating the mechanisms of cancer occurrence and development, are therefore vital.
Traditional cancer biomarkers, such as carcinoembryonic antigen (CEA) and tumor tissue images, can detect cancer only at late stages and are therefore of limited use in treating patients with cancer. Cure and survival rates among patients with cancer remain relatively low, so early detection and timely treatment are necessary to improve them.
The emergence of next-generation sequencing technology has greatly accelerated cancer research. Using gene expression data to identify cancer-related genes and biomarkers has advanced individualized treatment (Dancik, 2015). Some studies have used gene expression data to distinguish normal samples from tumor samples (Nannini et al., 2009), and others have used it to detect different states of cancer development (van’t Veer et al., 2002; Klahan et al., 2016). However, because gene expression datasets often have small sample sizes and contain considerable noise, relying on gene expression data alone limits the discovery of new candidate cancer genes.
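To illustrate the simplest form of such an analysis (not the method used in the cited studies), the minimal sketch below ranks genes by a per-gene Welch t-statistic that compares mean expression in tumor versus normal samples. The gene names and expression values are hypothetical, and the helper names `welch_t` and `t_scores` are introduced only for this example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(tumor, normal):
    """Welch's t-statistic for tumor vs. normal expression of one gene.

    Each group needs at least two samples, because statistics.variance
    computes the sample (n - 1) variance.
    """
    se = sqrt(variance(tumor) / len(tumor) + variance(normal) / len(normal))
    return (mean(tumor) - mean(normal)) / se


def t_scores(tumor, normal):
    """Compute a t-statistic for every gene present in the tumor table."""
    return {gene: welch_t(tumor[gene], normal[gene]) for gene in tumor}


# Hypothetical log-scale expression values: 4 tumor and 4 normal samples per gene.
tumor = {
    "GENE_A": [8.1, 8.4, 7.9, 8.6],
    "GENE_B": [5.0, 5.2, 4.9, 5.1],
    "GENE_C": [3.0, 2.7, 3.2, 2.9],
}
normal = {
    "GENE_A": [6.0, 6.3, 5.8, 6.1],
    "GENE_B": [5.1, 4.9, 5.0, 5.2],
    "GENE_C": [4.1, 4.4, 3.9, 4.2],
}

scores = t_scores(tumor, normal)
# Rank genes by the magnitude of the statistic (up- or down-regulated alike).
ranking = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
print(ranking)
```

With only four samples per group, a few noisy measurements can noticeably reorder such a ranking. This is one concrete face of the small-sample and noise problem described above. Real analyses also test thousands of genes at once and therefore apply multiple-testing correction.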
Gene expression is regulated at multiple levels by heterogeneous factors such as copy number, DNA methylation, transcription factors, and miRNAs (Cancer Genome Atlas Research N, 2012; 2013). High-throughput sequencing makes it possible to obtain diverse biological data accurately at different stages of an organism's development. These data are collectively referred to as multi-omics data (Reuter et al., 2015) and include multiple types of datasets, such as genomics, epigenomics, transcriptomics, proteomics, metabolomics, and microbiomics data. Different omics techniques allow diseases to be studied from complementary perspectives. Many studies have used DNA methylation, microRNA (miRNA), protein-protein interaction network (PPIN), or other data to identify cancer-related biomarkers (Zhao et al., 2017; Capper et al., 2018; Liu et al., 2017; Zhou et al., 2016; Wu et al., 2014). However, most methods do not effectively integrate multi-omics data to identify cancer-related genes and biomarkers. Although single-omics data have yielded many valuable results in identifying cancer-related genes, a single data source does not provide complete information about a gene, and the results are strongly affected by noise.
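As a rough illustration of why combining data sources can help, the sketch below applies one simple, generic integration scheme: it standardizes each omics layer's per-gene scores to z-scores and averages them across layers. This is not the integration method of this article. The layer names, gene names, and scores are hypothetical, as are the helper names `zscores` and `integrate`.

```python
from statistics import mean, pstdev


def zscores(layer):
    """Standardize one omics layer's gene scores to mean 0 and SD 1."""
    values = list(layer.values())
    mu, sd = mean(values), pstdev(values)
    return {gene: (v - mu) / sd for gene, v in layer.items()}


def integrate(layers):
    """Average the z-scores across layers for genes present in every layer."""
    normalized = [zscores(layer) for layer in layers]
    shared = set.intersection(*(set(z) for z in normalized))
    return {gene: mean([z[gene] for z in normalized]) for gene in shared}


# Hypothetical per-gene evidence scores (higher = stronger association with cancer).
expression = {"GENE_A": 2.0, "GENE_B": 0.1, "GENE_C": 1.8, "GENE_D": 0.0}
methylation = {"GENE_A": 1.5, "GENE_B": 0.2, "GENE_C": 0.1, "GENE_D": 0.3}
copy_number = {"GENE_A": 1.2, "GENE_B": 0.0, "GENE_C": 0.2, "GENE_D": 0.1}

integrated = integrate([expression, methylation, copy_number])
ranking = sorted(integrated, key=integrated.get, reverse=True)
print(ranking)
```

Here GENE_A, which is supported consistently by all three layers, ranks first. GENE_C scores highly in the expression layer alone, and its integrated score is pulled toward the average. This reflects the intuition that agreement across independent data sources is less likely to be a product of noise in any single one.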