If you don't remember your password, you can reset it by entering your email address and clicking the Reset Password button. You will then receive an email that contains a secure link for resetting your password
If the address matches a valid account an email will be sent to __email__ with instructions for resetting your password
We aim to identify the potential genes and signaling pathways associated with the nasopharyngeal carcinoma (NPC) prognosis using Weighted Gene Co-Expression Network Analysis (WGCNA).
Gene Expression Omnibus (GEO) query was utilized to download two NPC mRNA microarray data. WGCNA was conducted on differentially expressed genes (DEGs) to obtain tumor-associated gene modules. Genes in core modules were intersected with DEGs for gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analysis. GSE102349 dataset was devoted to identifying prognostic hub genes by survival analysis and the results were confirmed by quantitative polymerase chain reaction (qPCR).
Co-expression networks were built, and we detected 12 gene modules. The Brown module and Magenta module were extremely associated with NPC samples. GO functional analysis and KEGG pathway analysis was carried out to the genes in the Brown and Magenta modules. Our data indicated that DEGs in Brown module and Magenta module were correlated with the biological regulation, metabolic process, reproduction, and cellular proliferation. Twenty-six hub genes were obtained and were considered to be closely related to NPC. GSE102349 dataset was devoted to identifying prognostic hub genes by survival analysis. The expression of IL33, MPP3 and SLC16A7 in GSE102349 dataset was significantly correlated with the progression-free survival (PFS). The results of qPCR indicated a strong correlation between SLC16A7 expression and the overall survival (OS).
WGCNA contributed to the detection of gene modules and identification of hub genes and crucial genes. These crucial genes might be potential targets for pharmaceutic therapies with potential clinical significance.
] revealed that real-time monitoring of liquid biopsy response added prognostic information. This suggested that pretreatment EBV-DNA may serve as an important index for establishing a treatment regimen. In contrast, post-treatment EBV-DNA rather than pre-treatment EBV-DNA was correlated with the overall survival (OS) among the NPC patients, which may serve as a predictable factor for the prognosis [
]. Nevertheless, there is still a lack of clinical trials focusing on the prediction efficiency of these biomarkers in NPC patients.
High-throughput gene expression profiling techniques (e.g. microarray and high-throughput gene sequencing) provide new bioinformatic methods for identification of genes and pathological types that are associated with the onset of certain diseases. Meanwhile, they promote the identification of gene signatures related to the disease prognostic prediction and treatment. To our best knowledge, Weighted Gene Co-Expression Network Analysis (WGCNA) approach is proposed to provide a functional interpretation that is biologically significant and efficient in constructing global network structures [
]. Therefore, it is considered as the most straightforward method for investigating the gene co-expression networks. In this study, WGCNA was utilized to identify the molecular markers associated with NPC prognosis.
2. Materials and methods
2.1 Tissues samples
A total of 35 NPC patients admitted to our department between January 2014 and December 2015 were included in this study. All NPC patients had signed their written informed consent to participate this study. Clinical stages of NPC patients were classified by the TNM system of the American Joint Committee on Cancer. The patients with other malignancy and incomplete clinical or prognostic information were excluded from this study. The tissues samples were collected within 15 min after sample collection. The study protocols were approved by the Institutional Review Board of our institution.
2.2 GEO download and pre-processing
GEO query was utilized to download two NPC mRNA microarray datasets including GSE12452 and GSE53819 from GEO. The datasets were shown in Table 1. The study flow chart was added in Fig. 1. GSE53819 [
] dataset included 18 NPC tissue samples and 18 non-cancer tissue samples. One third of population was female. The samples were obtained from each subject before treatment. Agilent-014850 human genome microarray (4 × 44K G4112F, Agilent Tech, CA, USA) served as the sequencing platform. GSE12452 [
] was based on 10 normal nasopharyngeal tissues and 31 NPC tissues obtained before therapy. The sequencing platform was U133 Plus 2.0 array (Affymetrix, CA, USA). For the pre-processing of mRNA expression spectrum datasets, probes were projected to corresponding gene symbols. In cases of multiple probes on the same gene, expression level was calculated as the median, together with screening of mRNA dataset.
Table 1Datasets in the NPC expressing spectrum of GEO.
] was utilized for the differential analysis of GSE12452 and GSE53819. The screening for the mRNA expression spectrum was based on Benjamini-Hochberg adjusted P value of less than 0.05.
For the WGCNA, the gene expression was used to establish the similar matrix based on the absolute value of the Pearson correlation coefficient of both genes. The Pearson correlation coefficient was calculated according to the following formula:
where i and j represented two genes, and xi and xj represented the expression values, respectively. Then the similar matrix was transmitted to the adjacency matrix (signed type) using the following formula:
where i and j represented two genes and β represented the soft threshold.
Afterwards, the adjacency matrix was transmitted to the topological matrix, and the topological overlap measure (TOM) was utilized to describe the correlation between two genes:
where 1-TOM represented the variation between genei and genej.
The 1-TOM served as the distance for the clustering analysis of the genes, and then the module recognition was performed using the dynamic splicing trees. The most representative gene in each module was the module eigengene (ME), which reflected the whole level of the gene expression in the modules. It was the first main constituent of each module, and the ME was calculated using the following formula:
where i represented the gene in the module q, and 1 represented the chip sample in module q.
The module membership (MM) was evaluated based on the expression spectrum of a certain gene in all the samples and the ME, using the formula as follows:
Where xi represented the expression spectrum of genei, MEq represented the ME of the module q. MMqi represented the membership of genei in the module q.
In cases of MMqi=0, the genei was not in the module q; in the presence of a MMqi of approximately +1 or -1, there was a highly positive or negative correlation between genei and module q.
] was utilized to establish weighted gene co-expression network according to expression profile of the candidate genes. Then NPC-related modules were screened, and the interacted sub-networks were established using the co-expression data sets in the modules.
2.5 Functional analysis of modules
Candidate hub genes of specific target modules were subjected to KEGG pathway and GO functional enrichment analysis. KEGG pathways were analyzed using KEGG database (http://www.KEGG.jp) and GO functional enrichment analysis was carried out using GO database (http://geneontology.org/).
2.6 Prognostic analysis based on hub genes
For prognostic analysis, NPC dataset GSE102349 in GEO database was utilized. Disease progression and no progression in the last follow-up were considered as outcome of events. Median was used as boundary for evaluating high or low expression of genes. In the follow-up, Kaplan-Meier Plotter was utilized to depict progression-free survival (PFS) curvature. There were 113 samples in the GSE102349 dataset. The sequencing platform was GPL11154 Illumina HiSeq 2000 (Homo sapiens). After excluding the 25 samples involving data without event prognosis, 72 samples with no progression and 16 samples with disease progression were identified in the last follow-up.
2.7 qRT-PCR verification
Total RNA of tissues were extracted by TRIzol reagent (Invitrogen, Carlsbad, CA, USA). Approximately 1.2 μg RNA was reversely transcribed to cDNA by using Fast Quant RT Kit (TaKaRa, Dalian, China). Subsequently, qRT-PCR was carried out by SYBR Premix Ex Taq II kit (TaKaRa) and Roche Light Cycler® 480 System (Roche, Chicago, USA). Relative expression change of targets was analyzed by 2−ΔΔCt method [
Statistical analysis was performed using GraphPad Prism v.7.0 (GraphPad Software Inc, San Diego, CA, USA). A P value of <0.05 was considered to be statistically significant.
2.9 Patient and public involvement
Patients and the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
3.1 DEGs screening
Two mRNA microarray data of NPC including GSE12452 and GSE53819 were downloaded from GEO. The screening standard for mRNA expression spectrum was Benjamini-Hochberg adjusted P value of less than 0.05 and |logFC|>1). A total of 739 DEGs were identified in Supplementary File 1, including 457 that were up-regulated and 282 that were down-regulated in GSE12452 (Fig. 2a). For GSE53819, 1,756 DEGs (Supplementary File 2) were identified including 694 that were up-regulated and 1,062 that were down-regulated (Fig. 2b).
] was utilized to establish WGCNA for candidate gene set (GSE53819). Co-expression network was in line with scale network, which demonstrated a negative correlation between the log(k) and log(P(k)). The correlation coefficient was larger than 0.9. To ensure scale-free network, we selected optimal β value (β=6) (Fig. 3a). Subsequently, the expression matrix was transmitted to the adjacency matrix, while adjacency matrix was transmitted to topological matrix. Based on TOM, average linkage clustering algorithm was used for clustering analysis of genes. Gene network module should contain at least 30 genes. Samples underwent analysis showed satisfactory consensus with no outlier. Therefore, the process for the outlier was not deleted.
Adjacency modules were merged into new modules with a height set at 0.25, and then 12 modules were obtained (Fig. 3b). Gene list for each module was provided in Supplementary File 3. Correlation coefficient for the Brown, Magenta, and Turquoise modules was -0.87, 0.69, and -0.68, respectively. Both the positively and negatively highest absolute value for the correlation coefficients, namely Brown and Magenta, were analyzed. As shown in Fig. 3c, Brown module and Magenta module were extremely associated with NPC samples. Gene network with all genes were shown in Fig. 3d.
3.3 Functional analysis of modules
GO functional analysis was carried out to genes in the Brown module (Supplementary File 4) and Magenta module (Supplementary File 5). Brown module was associated with positive regulation of mitotic nuclear division, positive regulation of nuclear division, protein localization to microtubule cytoskeleton, endoplasmic reticulum to Golgi vesicle-mediated transport, cilium or flagellum dependent cell motility, cilium dependent cell motility, protein localization to cytoskeleton, protein localization to cell surface as well as cellular oxidant detoxification. Brown module and Magenta module were correlated with biological regulation, metabolic process, reproduction, as well as cellular proliferation (Fig. 4a and 4c).
KEGG pathway analysis was given to genes in Brown module (Supplementary File 6) and Magenta module (Supplementary File 7). For Brown module, KEGG pathway was mainly enriched in systemic lupus erythematosus, alcoholism, glutathione metabolism, tuberculosis, drug metabolism-cytochrome P450, metabolism of xenobiotics by cytochrome P450, apoptosis-multiple species, Huntington disease, and viral carcinogenesis. For Magenta module, KEGG pathway was mainly enriched in breast cancer, basal cell carcinoma, gastric cancer, arrhythmogenic right ventricular cardiomyopathy (ARVC), Hippo signaling pathway, Cushing syndrome, Wnt signaling pathway, Hepatocellular carcinoma, as well as thyroid cancer (Fig. 4b and 4d).
For Brown module and Magenta module, hub genes were further analyzed to identify their association with pathogenesis of cancer. Co-expression network was finally established based on selection of co-expression pairs with a weight of >0.4 in Brown module and a weight of >0.5 in Magenta module. Genes in two modules were subject to network visualization, followed by calculating degree of each gene node (Supplementary File 8). The genes with a screening degree ranked 10% among all the degrees were selected as the 26 hub genes for the modules, including FAM47E, C22orf15, WDR38, TSPAN1, AK7, IL33, RALA, DNAH12, PDSS1, TMEM190, EFCAB1, AKAP14, CLEC5A, GTF2H3, KIF6, TNFAIP6, OLR1, CELSR2, STAP2, RTKN, SPINT1, RARG, MPP3, SLC16A7, EPC1 and HOXC8.
3.4 Prognostic analysis based on hub genes
GSE102349 dataset was devoted to identifying prognostic hub genes by survival analysis. As shown in Fig. 5, PFS was significantly correlated with expression of IL33, MPP3 and SLC16A7. These indicated that these hub genes may be closely related to prognosis of NPC patients. Next, quantitative identification of hub genes by real-time fluorescence. Expression trend of SLC16A7 was strongly correlated with overall survival (OS, P=0.0366, Fig. 6).
NPC is characterized by distinct geographical distribution and is particularly prevalent in east and southeast Asia [
]. To date, the prognosis of NPC is highly relied on disease stage. Generally, patients with localized lesion and/or early-stage NPC (i.e. stage I and II) were reported to present good prognosis, with a longer five-year survival than the other counterparts [
]. To date, there is indeed a harsh demand for the screening technique, however, few have been developed due to technical challenges. This requests investigation on development of specific biomarkers for diagnosis and screening of NPC. In this study, we aim to identify molecular markers associated with pathogenesis of NPC.
WGCNA, an algorithm for distinguishing module information from chip, is a powerful tool for identifying possible regulatory mechanisms, which provides insights into possible pathways of certain diseases [
]. For the genes always with similar expression patterns in different tissues or in a physiological process, they were defined as a module. On this basis, candidate hub genes and/or treatment targets were selected using module connectivity to relationship between phenotype and module. Unlike conventional gene screening techniques only focusing on differential expression, WGCNA involves near 10,000 of variable genes to screen modules of interest [
]. In this study, WGCNA was conducted to explore gene expression modules related to NPC. Our results indicated that 2,110 genes were constructed by co-expression network and finally 12 modules were generated, in which Brown and Magenta modules were extremely associated with NPC samples.
KEGG pathway analysis contributes to screening of certain diseases based on gene structures and classifies pathways into 9 categories based on their targets. In this study, we screened candidate genes with biological functions in KEGG pathways and GO terms to investigate mechanisms of existing genes associated with NPC pathogenesis and prognosis, and mRNA interaction and future prediction of prognostic factors. Genes showed different enrichment levels in GO terms and KEGG pathways, implying diversity in biological function enrichment. In this study, KEGG pathway analysis was given to the genes in the Brown module and Magenta module, which indicated that Brown module and Magenta module were correlated with biological regulation, metabolic process, reproduction, and cellular proliferation.
For Brown module and Magenta module, hub genes of the two modules were further analyzed to identify hub genes associated with pathogenesis of cancer. Twenty-six hub genes were obtained and were considered to be closely related to NPC. Then we verified the prognostic value of these biomarkers based on prognostic information data (GSE102349 dataset), which enhanced our understanding on NPC prognosis. PFS was significantly correlated with expression of IL33, MPP3 and SLC16A7. These indicated that these hub genes may be closely related to prognosis of NPC patients. Next, quantitative identification of hub genes by real-time fluorescence. The expression of SLC16A7 was strongly correlated with OS in NPC patients.
SLC16A7 gene encoded protein is a member of monocarboxylate transporter family, which catalyzes proton-linked transport of monocarboxylates and has highest affinity for pyruvate. To our best knowledge, such protein was reported to be more highly expressed in prostate and colorectal cancer specimens. In a previous study, epigenetic and oncogenic regulation of SLC16A7 was associated with over-expression of SLC16A7 protein, which impacted in modulating signaling pathway and cellular phenotypes in prostate cancer patients [
]. Meanwhile, in a study designed to assess effects of single-nucleotide polymorphisms of SLC16A7 gene on prognosis of colorectal cancer, SLC16A7 gene polymorphism may affect clinical outcome of patients, and could be used to predict responses among patients underwent adjuvant chemotherapy [
]. In this study, we demonstrated that SLC16A7 expression was associated with prognosis of NPC patients, which needed further approval by clinical trials.
In this study, based on the analysis of WGCNA, we demonstrated that Brown module, Magenta module and 26 hub genes might be closely associated with the carcinogenesis of NPC, among which the expression trends of SLC16A7 were strongly correlated with OS in NPC. In the future, further studies are needed to verify the functions of the candidate core gene SLC16A7 in different level using methods such as western blot, and transgenic methods, in order to gain a deeper understanding of the relationship between the genes and modules identified in this study.
The authors declare that there is no conflict of interest.
Declaration of Competing Interest
The authors declare that there is no conflict of interest.
This work was supported by the Funding Project of Fujian Medical University College Student Innovation and Entrepreneurship Training Program [Grant No. C19071].
Ethics approval and informed consent
The research related to human use has been complied with all the relevant national regulations, institutional policies and in accordance the tenets of the Helsinki Declaration, and has been approved by the Ethics Committee of Fujian Provincial Hospital (K2021-04-039). All participants gave informed consent before taking part in the study.
Consent for participate
Patients and the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.