Identification of hub genes related to prognosis in glioma

Flow chart for bioinformatics analysis

Sample clustering of WGNCA and scale-free topology model fit

Figure 2

(A) Hierarchical sample clustering. Outlier samples are eliminated according to sample dendrogram. (B) Horizontal axis indicates power parameter; vertical axis indicates scale-free fit goodness. Red line indicates threshold 0.9 of fit goodness. Soft threshold with fit goodness being more than 0.9 is required to be selected. (C) Mean connectivity of network. Soft threshold with larger connectivity is required to be selected, helping explain the selection of 0.9 as the value of soft threshold.

Sample clustering of WGNCA and scale-free topology model fit

(A) Hierarchical sample clustering. Outlier samples are eliminated according to sample dendrogram. (B) Horizontal axis indicates power parameter; vertical axis indicates scale-free fit goodness. Red line indicates threshold 0.9 of fit goodness. Soft threshold with fit goodness being more than 0.9 is required to be selected. (C) Mean connectivity of network. Soft threshold with larger connectivity is required to be selected, helping explain the selection of 0.9 as the value of soft threshold.

Four samples (GSM405232, GSM405389, GSM405414, GSM405467) were excluded by thresholding height at 150 in the previous cluster diagram.

An appropriate soft threshold should be chosen by scale-free topology model fit. The soft threshold with R² (goodness of fit) >0.9 and largest mean connectivity would be applied to WGCNA. In this research, soft threshold was 9 (Figure 2B,C).

Thirteen modules were simulated (Figure 3A), and module eigenvectors (MEs) capable of reflecting the overall expression trend of genes in each module, of these modules were calculated. ME and clinical indicators were subject to Pearson correlation analysis to estimate the correlation between each module and clinical indicators.

Identification of top 20% of hub genes for WGCNA

Figure 3

(A) Distance matrix is made according to 1-TOM value for clustering and module division. (B) Pearson correlation coefficient between module eigenvalues and clinical indicators. ‘Diag_age’ indicates diagnostic age; ‘WHO value’ indicates WHO grading. Color of heat map indicates correlation coefficient. Numerical value in the bracket indicates P-value of correlation coefficient. It is shown in the figure that the module obviously related to WHO grading is yellow, purple, pink, turquoise. (C) MM vs GS plot was drawn, with the most representative top 20% genes (result A) in yellow, purple, pink, turquoise modules shown as red dots. (D) The KEGG pathway enrichment of the interesting modules.

Identification of top 20% of hub genes for WGCNA

(A) Distance matrix is made according to 1-TOM value for clustering and module division. (B) Pearson correlation coefficient between module eigenvalues and clinical indicators. ‘Diag_age’ indicates diagnostic age; ‘WHO value’ indicates WHO grading. Color of heat map indicates correlation coefficient. Numerical value in the bracket indicates P-value of correlation coefficient. It is shown in the figure that the module obviously related to WHO grading is yellow, purple, pink, turquoise. (C) MM vs GS plot was drawn, with the most representative top 20% genes (result A) in yellow, purple, pink, turquoise modules shown as red dots. (D) The KEGG pathway enrichment of the interesting modules.

Based on this result (Figure 3B), we found that no module had significant correlation with diagnostic age and that several modules were significantly correlated with WHO grading. Therefore, we chose the four modules (yellow, purple, pink and turquoise) most relevant to WHO grading for further analysis.

To identify the first 20% genes with the most representative nature in each module, the gene ‘Module Membership’ which could reflect the representativeness of the gene in each module was calculated. The value of gene ‘Module Membership’ referred to the Pearson correlation coefficient between the expression profile of each gene and ME while the gene ‘Trait Significance’ referred to the Pearson correlation coefficient between the expression profile of each gene and clinical traits. The relationship between the level of gene expression and clinical traits could be reflected by ‘Trait Significance’. The MM vs GS figure could reflect the most representative genes in the modules (Figure 3C). Twenty percent of the largest genes in x-axis and 20% of the largest genes in y-axis were subject to intersection. The number of genes in intersected areas was actually less than 20% of the total number of genes in the modules. The most representative genes in these four modules selected by us were regarded as Result A (Figure 3C, additionally shown in Supplementary ResultA.csv) and were subject to KEGG pathway enrichment, from which it was found that they were significantly enriched in cell cycle, ECM–receptor interaction, p53 signaling pathway, PI3K-Akt signaling pathway and small cell lung cancer with the nature as a pathway, which were all clearly associated with tumor function (Figure 3D).

Differential expression analysis (result B, result D)

GSE16011 and CGGA RNAseq FPKM data were subject to differential expression analysis, the former was identified to have 231 differential genes in total. However, genes with consistent trends were saved as result B (Figure 4A, additionally shown in Supplementary ResultB.csv), including 129 significantly up-regulated genes and 100 down-regulated genes. In total of 320 differential genes were identified in CGGA RNAseq FPKM data, which, thereby, were considered as result D (Figure 4C, additionally shown in Supplementary ResultD.csv), including 171 up-regulated and 149 down-regulated genes.

Briefing on result B, result C, result D

Figure 4

(A) DEGs (high-expression group vs low-expression group), DEGs (tumor vs normal). Chi-square test was conducted for expressing quantity. Three results intersected with each other, obtaining 231 genes. However, only 229 genes have consistent trends in two differential expression analysis. (B) Results of survival analysis for CGGA mRNA microarray data and CGGA RNAseq data. Chi-square test was conducted for expressing quantity. It is found that 2707 genes in total were significantly related to prognosis. However, 26 genes are inconsistent trends in two datasets. A total of 2681 genes were finally obtained. (C) Differential expression analysis was conducted for CGGA mRNAseq FPKM data. DEGs (WHO III vs II) intersected with DEGs (WHO IV vs III), obtaining 320 genes, all of which have consistent trends in two differential expression analysis.

Briefing on result B, result C, result D

(A) DEGs (high-expression group vs low-expression group), DEGs (tumor vs normal). Chi-square test was conducted for expressing quantity. Three results intersected with each other, obtaining 231 genes. However, only 229 genes have consistent trends in two differential expression analysis. (B) Results of survival analysis for CGGA mRNA microarray data and CGGA RNAseq data. Chi-square test was conducted for expressing quantity. It is found that 2707 genes in total were significantly related to prognosis. However, 26 genes are inconsistent trends in two datasets. A total of 2681 genes were finally obtained. (C) Differential expression analysis was conducted for CGGA mRNAseq FPKM data. DEGs (WHO III vs II) intersected with DEGs (WHO IV vs III), obtaining 320 genes, all of which have consistent trends in two differential expression analysis.

Survival analysis (result C)

Survival analysis was performed on all genes in CGGA RNAseq FPKM data and CGGA mRNA microarray data. Genes with likelihood test P<0.001 and consistent trends in two datasets were screened, with 2682 genes in total regarded as result C (Figure 4B, additionally shown in Supplementary ResultC.csv).

Unreported key genes

Result A, result B and result D intersected with each other so as to obtain 24 genes in total, including TUBB6, COL4A2, TIMP1, COL4A1, CLIC1, COL1A1, TACC3, COL3A1, VIM, SERPINH1, NALCN, COL1A2, JPH3, FAM19A2, KCNB1, GNAL, KIF20A, ANXA2, FN1, ANXA1, CACNG2, NRSN1, CDH18, SH3GL2. Coincidentally, all these 24 genes belong to result C, indicating that their relation with survival is significant (Table 1, additionally shown in Supplementary Report.csv). CGGA mRNA microarray data and CGGA mRNAseq FPKM data were used to draw heat map for these key genes. Although we only performed a difference analysis on GSE16011 and CGGA mRNAseq FPKM data, the 24 key genes also showed a consistent trend in CGGA mRNA microarray data (Figure 5A,B). Additionally, genetic mapping information showed that DEGs (high vs low in GSE16011) were associated with the negative log of the corresponding P-value between 1.0 × 10⁻³ and 0. Four key genes were between 1.0 × 10⁻³ and 1.0 × 10⁻⁶, 20 key genes at more than 1.0 × 10⁻⁶ (Figure 5C). Weighted gene co-expressed network of key genes was extracted from the entire weighted gene co-expressed network (Figure 5D).

The analyses of 24 key genes

Figure 5

(A) Heat map indicates the relation of expression of 24 key genes with WHO grades in CGGA mRNA microarray data. (B) Heat map indicates the relation of expression of 24 key genes with WHO grades in CGGA mRNAseq FPKM data. (C) Manhattan plots presents a visualization analysis of chromosome locus via mapping the initiation site information. This figure was built by plotting −log10 (P of DEGs between High and Low in GSE16011) on the y-axis and the chromosome locus on the x-axis, with the highlight of 24 key genes in red. The size of those highlighted points indicates the negative log P-value of survival analysis. (D) Genes belonging to four yellow, purple, pink, turquoise modules were used to construct the complete co-expressed network obtained through WGCNA. The connection with top 1% connection weights was kept, leading to the disconnection of lots of weak connections in the network. Twenty-four key genes were considered as starting points of searching adjacent nodes and building co-expressed network. Thirty percent of genes with the highest connectivity are shown in the figure (70% of nodes are concealed. Effective information fails to be provided due to excessive nodes and lots of connections). Nodes size in the figure indicates connectivity of nodes. Color indicates module of genes and line width indicates connection weight.

The analyses of 24 key genes

(A) Heat map indicates the relation of expression of 24 key genes with WHO grades in CGGA mRNA microarray data. (B) Heat map indicates the relation of expression of 24 key genes with WHO grades in CGGA mRNAseq FPKM data. (C) Manhattan plots presents a visualization analysis of chromosome locus via mapping the initiation site information. This figure was built by plotting −log10 (P of DEGs between High and Low in GSE16011) on the y-axis and the chromosome locus on the x-axis, with the highlight of 24 key genes in red. The size of those highlighted points indicates the negative log P-value of survival analysis. (D) Genes belonging to four yellow, purple, pink, turquoise modules were used to construct the complete co-expressed network obtained through WGCNA. The connection with top 1% connection weights was kept, leading to the disconnection of lots of weak connections in the network. Twenty-four key genes were considered as starting points of searching adjacent nodes and building co-expressed network. Thirty percent of genes with the highest connectivity are shown in the figure (70% of nodes are concealed. Effective information fails to be provided due to excessive nodes and lots of connections). Nodes size in the figure indicates connectivity of nodes. Color indicates module of genes and line width indicates connection weight.

Table 1

Report for 24 key genes (19 May 2018)

Symbol	Affy_ID	Connectivity*	logFC G3 vs G2*	logFC G4 vs G3*	Likelihood test P-val*	RR*	Reported in glioma
KIF20A	10112_at	48.96	1.58^†	1.22 †	7.56E-15	2.69	Yes
CLIC1	1192_at	39.53	1.12^†	1.19^†	1.04E-20	3.80	Yes
ANXA2	302_at	33.12	1.63^†	1.34^†	1.43E-19	3.63	Yes
ANXA1	301_at	32.26	1.94^†	1.27^†	1.03E-20	3.30	Yes
TIMP1	7076_at	26.38	2.16^†	2.17^†	6.81E-20	3.30	Yes
CACNG2	10369_at	24.99	−2.01^†	−1.77^†	1.21E-11	0.56	NO
JPH3	57338_at	24.60	−1.23^†	−1.78^†	1.90E-14	0.50	NO
VIM	7431_at	22.27	1.31^†	1.22^†	2.52E-28	7.29	Yes
TACC3	10460_at	21.63	1.07^†	1.08^†	3.10E-17	2.77	Yes
SERPINH1	871_at	14.72	1.32^†	1.10^†	2.08E-20	3.74	Yes
KCNB1	3745_at	14.40	−1.17^†	−1.26^†	4.82E-14	0.52	Yes*
SH3GL2	6456_at	13.58	−1.45^†	−1.20^†	1.02E-12	0.51	Yes
TUBB6	84617_at	12.74	1.13^†	1.31^†	2.69E-21	3.74	No
NRSN1	140767_at	12.64	−1.05^†	−1.09^†	1.11E-14	0.51	No
FAM19A2	338811_at	12.51	−1.22^†	−1.48^†	3.38E-16	0.44	No
NALCN	259232_at	11.23	−1.00^†	−1.08^†	1.74E-15	0.49	No
CDH18	1016_at	10.67	−1.10^†	−1.09^†	1.14E-13	0.52	No
GNAL	2774_at	10.43	−1.05^†	−1.11^†	8.82E-12	0.51	No
FN1	2335_at	9.33	1.50^†	1.18^†	1.46E-12	2.37	Yes*
COL1A2	1278_at	8.83	1.62^†	1.52^†	1.03E-17	3.48	Yes*
COL4A2	1284_at	7.89	1.81^†	1.72^†	1.46E-20	3.43	Yes*
COL1A1	1277_at	7.75	2.12 †	2.34^†	1.76E-17	3.25	Yes
COL3A1	1281_at	7.61	2.29 †	2.11 †	7.00E-18	3.09	Yes*
COL4A1	1282_at	7.52	2.06 †	1.84 †	1.62E-19	3.43	Yes*

Symbol	Affy_ID	Connectivity*	logFC G3 vs G2*	logFC G4 vs G3*	Likelihood test P-val*	RR*	Reported in glioma
KIF20A	10112_at	48.96	1.58^†	1.22 †	7.56E-15	2.69	Yes
CLIC1	1192_at	39.53	1.12^†	1.19^†	1.04E-20	3.80	Yes
ANXA2	302_at	33.12	1.63^†	1.34^†	1.43E-19	3.63	Yes
ANXA1	301_at	32.26	1.94^†	1.27^†	1.03E-20	3.30	Yes
TIMP1	7076_at	26.38	2.16^†	2.17^†	6.81E-20	3.30	Yes
CACNG2	10369_at	24.99	−2.01^†	−1.77^†	1.21E-11	0.56	NO
JPH3	57338_at	24.60	−1.23^†	−1.78^†	1.90E-14	0.50	NO
VIM	7431_at	22.27	1.31^†	1.22^†	2.52E-28	7.29	Yes
TACC3	10460_at	21.63	1.07^†	1.08^†	3.10E-17	2.77	Yes
SERPINH1	871_at	14.72	1.32^†	1.10^†	2.08E-20	3.74	Yes
KCNB1	3745_at	14.40	−1.17^†	−1.26^†	4.82E-14	0.52	Yes*
SH3GL2	6456_at	13.58	−1.45^†	−1.20^†	1.02E-12	0.51	Yes
TUBB6	84617_at	12.74	1.13^†	1.31^†	2.69E-21	3.74	No
NRSN1	140767_at	12.64	−1.05^†	−1.09^†	1.11E-14	0.51	No
FAM19A2	338811_at	12.51	−1.22^†	−1.48^†	3.38E-16	0.44	No
NALCN	259232_at	11.23	−1.00^†	−1.08^†	1.74E-15	0.49	No
CDH18	1016_at	10.67	−1.10^†	−1.09^†	1.14E-13	0.52	No
GNAL	2774_at	10.43	−1.05^†	−1.11^†	8.82E-12	0.51	No
FN1	2335_at	9.33	1.50^†	1.18^†	1.46E-12	2.37	Yes*
COL1A2	1278_at	8.83	1.62^†	1.52^†	1.03E-17	3.48	Yes*
COL4A2	1284_at	7.89	1.81^†	1.72^†	1.46E-20	3.43	Yes*
COL1A1	1277_at	7.75	2.12 †	2.34^†	1.76E-17	3.25	Yes
COL3A1	1281_at	7.61	2.29 †	2.11 †	7.00E-18	3.09	Yes*
COL4A1	1282_at	7.52	2.06 †	1.84 †	1.62E-19	3.43	Yes*

Connectivity*, connectivity of nodes (genes) in network (Figure 6C); logFC G3vsG2* indicates log2 fold change of DEGs (WHO III vs WHO II); logFC G4vsG3* indicates log2 fold change of DEGs (WHO IV vs WHO III); Likelihood test P-val^* indicates P-value obtained through survival analysis. Yes^* is just reported through data analysis, without experiment conducted to verify functions of genes in glioma. RR^*, relative risk of death was obtained through computing method described in the method.

†

Indicates that P-value of DEG is less than 0.001.

Twenty-four key genes were identified in the present study. Most of them were reported in glioma, CACNG2 JPH3, TUBB6, NRSN1, FAM19A2, NALCN, GNAL have not been reported yet. Based on our current results, the researchers believe that some of these genes may have important research implications for gliomas. The results indicated that the expression of CACNG2, JPH3, NRSN1, FAM19A2, NALCN, GNAL decreased as the WHO grade increased. Their higher expression levels indicate a better prognosis (Figure 6A,B). TUBB6 increased as the WHO grade increased, higher expression levels of TUBB6 indicates a poor prognosis (Figure 6A,B). We used qRT-PCR to detect differences in the expression of these genes. The results show that the differences in these genes exist objectively (Figure 6C). ROC analysis showed that the expression levels of these genes have a good diagnostic efficacy on gliomas (Figure 6D).

Expression of unreported genes and their clinical significance

Figure 6

(A) Boxplot for genes are not reported in glioma using CGGA mRNAseq FPKM data. (B) Survival plot for genes which was not reported in glioma using CGGA mRNAseq FPKM data. (C) Boxplot based on the relative expression of each gene, as detected by qPCR. (D) ROC curve for unreported genes in glioma using GSE16011. (E) A graphical display of a correlation matrix using data from GSE16011.

https://doi.org/10.1200/JCO.2013.49.6968

Expression of unreported genes and their clinical significance

(A) Boxplot for genes are not reported in glioma using CGGA mRNAseq FPKM data. (B) Survival plot for genes which was not reported in glioma using CGGA mRNAseq FPKM data. (C) Boxplot based on the relative expression of each gene, as detected by qPCR. (D) ROC curve for unreported genes in glioma using GSE16011. (E) A graphical display of a correlation matrix using data from GSE16011.

Co-expression of unreported key genes and known glioma biomarker

We selected nine markers associated with gliomas, TP53, BRAF, CDKN2A, PTEN, TERT, EGFR, MGMT, IDH1, IDH2. Pearson correlation analysis was performed on them. There was not a strong correlation between unreported key genes and known glioma biomarker (Figure 6E). But IDH1 have a significant correlation with each unreported key gene (P<0.05).

Discussion

Glioma has become an important factor threatening human life. Research on glioma pathogenesis is important. As an exploratory research, the study aims to to discover key genes related to tumor malignancy. Paving the way for the next project is our lab. Therefore, datasets GSE16011 and CGGA with larger sample size were first selected so as to carry out WGCNA, differential analysis and survival analysis. The most representative top 20% genes in WGCNA module significantly related to WHO grades were selected. KEGG pathway analysis suggests that these genes are enriched with Cell cycle, ECM–receptor interaction, p53 signaling pathway, DNA replication, Focal adhesion, Small cell lung cancer and PI3K-Akt signaling pathway related to tumor functions (Table 2) while GO enrichment analysis shows that they are enriched with platelet-derived growth factor binding, extracellular matrix structural constituent, structural molecule activity, phospholipase A2 inhibitor activity and protein binding pathway (Table 3). Genes mainly involved in GO enrichment analysis include ANXA1, COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, TUBB6, ANXA1 and ANXA2. It is obviously seen that various subtypes of collagen play important roles in glioma. COL1A1 has been researched in glioma.

Table 2

KEGG pathway enrichment analysis for 24 genes

Pathway description	Observed gene count	False discovery rate	Matching proteins
Amoebiasis	7	6.62E-09	COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, FN1, GNAL
ECM–receptor interaction	6	7.82E-08	COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, FN1
Protein digestion and absorption	6	3.96E-06	COL1A1, COL1A2, COL3A1, COL4A1, COL4A2
Focal adhesion	6	7.17E-06	COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, FN1
PI3K-Akt signaling pathway	6	1.13E-04	COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, FN1
Small cell lung cancer	3	6.08E-03	COL4A1, COL4A2, FN1
Platelet activation	3	1.72E-02	COL1A1, COL1A2, COL3A1

Pathway description	Observed gene count	False discovery rate	Matching proteins
Amoebiasis	7	6.62E-09	COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, FN1, GNAL
ECM–receptor interaction	6	7.82E-08	COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, FN1
Protein digestion and absorption	6	3.96E-06	COL1A1, COL1A2, COL3A1, COL4A1, COL4A2
Focal adhesion	6	7.17E-06	COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, FN1
PI3K-Akt signaling pathway	6	1.13E-04	COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, FN1
Small cell lung cancer	3	6.08E-03	COL4A1, COL4A2, FN1
Platelet activation	3	1.72E-02	COL1A1, COL1A2, COL3A1

Table 3

GO enrichment analysis for 24 genes

Pathway description	Observed gene count	False discovery rate	Matching proteins
Platelet-derived growth factor binding	4	1.97E-06	COL1A1, COL1A2, COL3A1, COL4A1
Extracellular matrix structural constituent	5	2.13E-05	COL1A1, COL1A2, COL3A1, COL4A1, COL4A2
Structural molecule activity	7	3.08E-03	ANXA1, COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, TUBB6
Phospholipase A2 inhibitor activity	2	6.50E-03	ANXA1, ANXA2
Protein binding	14	4.70E-02	ANXA1, ANXA2, COL1A1, COL1A2, COL3A1, COL4A1, FN1, GNAL, KCNB1, KIF20A, SERPINH1, SH3GL2, TIMP1, VIM

Pathway description	Observed gene count	False discovery rate	Matching proteins
Platelet-derived growth factor binding	4	1.97E-06	COL1A1, COL1A2, COL3A1, COL4A1
Extracellular matrix structural constituent	5	2.13E-05	COL1A1, COL1A2, COL3A1, COL4A1, COL4A2
Structural molecule activity	7	3.08E-03	ANXA1, COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, TUBB6
Phospholipase A2 inhibitor activity	2	6.50E-03	ANXA1, ANXA2
Protein binding	14	4.70E-02	ANXA1, ANXA2, COL1A1, COL1A2, COL3A1, COL4A1, FN1, GNAL, KCNB1, KIF20A, SERPINH1, SH3GL2, TIMP1, VIM

Twenty-four key genes were identified in the study. TOM matrix generated in WGCNA was used to build co-expressed network for 24 genes. Differential expression analysis was just made for GSE16011 and CGGA mRNA seq FPKM data in data analysis process. It is shown in Figure 5A,B that 24 key genes have obvious differential expression in mRNA microarray data, which is consistent with previous conclusion, proving that differential expression of 24 genes is of higher stability and credibility.

A total of 2682 genes identified in survival analysis study (shown in Supplementary ResultC.csv) are simultaneously expressed to be significantly related to prognoses of patients in two CGGA datasets, which is of great reference significance for other glioma studies. All 24 key genes identified in previous study belong to result C. Functions of KIF20A, CLIC1, ANXA2, ANXA1, TIMP1, VIM, TACC3, SH3GL2, COL1A1, CDH1,8 [14] SERPINH1/HSP47 [15] were reported via experimentation in glioma. KCNB1, FN1, COL1A2, COL4A2, COL3A1, COL4A1 was just reported by data analysis; however, functional experiments of those genes were not conducted. In addition, CACNG2, JPH3, TUBB6, NRSN1, FAM19A2, NALCN, GNAL were not reported in glioma. Researched and reported genes prove that conclusion of the study is of higher credibility. The results of ROC analysis with areas under the curve ranging from 0.85 to 0.99, reflecting a fair to excellent diagnostic efficiency of these unreported proteins.

To investigate the relationship between the unreported genes we obtained the important clinical markers. We selected nine markers associated with gliomas, TP53, BRAF, CDKN2A, PTEN, TERT, EGFR, MGMT, IDH1, IDH2. Pearson correlation analysis was performed on them. The results showed that these genes were significantly associated with IDH1 (Figure 6B).

We checked if these key genes have been reported by searching in PubMed. Results of searching in PubMed, with the summary of survival analysis and differential expression analysis of key genes is presented in Table 1. TUBB6 has higher RR in unreported key genes. Official full name of TUBB6 is tubulin β 6 class V, belonging to tubulins family. As a spherical molecule, Tubulin, divided into α-tubulin and β-tubulin, serves as the basic molecule constituting microtubule. As a part of cytoskeleton, microtubule is spread all over the whole cytoplasm. It is reported that subtype mutation of tubulin is involved in abnormalities of brain [16]. Participating in maintaining structure of cell, constituting cytoskeleton with microfilament and intermediate filament, microtubules plays a is very important role in many biological processes. They are also involved in constituting cilium and flagellin, which are applied to intracellular transport and participate in several cellular processes, including motion of secretory vesicles, organelle and substances in the cell. Moreover, they participate in cell division (mitosis and meiosis), for example, to form spindle apparatus used to pull eukaryotic chromosome.

There are lots of reports about key roles of microtubule in tumor such as lymphoblastic leukemia cells [17], glioma [18]. In this research, we found that TUBB6 is significantly related to glioma grades and poor prognosis, indicating that TUBB6 may play an important role in glioma genesis and development process.

Conclusions

Twenty-four key genes were hunted from hub genes enriched with key pathway relating to tumor of modules of WGCNA. Twenty-four key genes were potential prognostic markers for glioma. KIF20A, CLIC1, ANXA2, ANXA1, TIMP1, VIM, TACC3, SH3GL2, COL1A1, SERPINH1/HSP47, CDH18 have been reported in glioma, along with carrying out functional experiment. As to reports about KCNB1, FN1, COL1A2, COL4A2, COL3A1, COL4A1, only data analysis has been done; however, functional experiments were not conducted. CACNG2, JPH3, TUBB6, NRSN1, FAM19A2, NALCN, GNAL have not been reported in gliomas. They are novel biomarkers and independent prognostic factors in gliomas, but require experiments at the protein level.

Data Availability

The datasets analyzed during the current study are available in the CGGA repository [data3, data4 http://www.cgga.org.cn/download.jsp]. The datasets analyzed during the current study are available in the GEO repository [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16011].

Competing Interests

The authors declare that there are no competing interests associated with the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China [grant number 81802503 (to H.Y.)]; the ‘Peak’ Training Program for Scientific Research of Yijishan Hospital, Wannan Medical College [grant number GF2019G15 (to H.Y.)]; the Research Project Training Fund of Wannan Medical College [grant number WK2018ZF02 (to H.Y.)]; and the Natural Science Foundation of the Institutions of Higher Education from the Education Department of Anhui Province [grant number KJ2017A261 (to C.H.)].

Author Contribution

D.Z. contributed to the whole project from conception of the idea to writing the proposal and doing the statistical analysis. J.Z. and X.L. contributed in preparing the questions, and writing the manuscript. C.H. helped in writing the manuscript. J.L. mainly performed the statistical analysis. H.Y. supervised the implementation.

Abbreviations

ANXA1: annexin A1
ANXA2: annexin A2
CACNG2: calcium voltage-gated channel auxiliary subunit gamma 2
CDH18: cadherin 18
CGGA: Chinese Glioma Genome Atlas
circRNA: circular RNA
CLIC1: chloride intracellular channel 1
COL1A1: collagen type I alpha 1 chain
COL1A2: collagen type I alpha 2 chain
COL3A1: collagen type III alpha 1 chain
COL4A1: collagen type IV alpha 1 chain
COL4A2: collagen type IV alpha 2 chain
DEG: differential expressed gene
FAM19A2: family with sequence similarity 19 member A2
FN1: fibronectin 1
FPKM: Fragments per Kilobase Million
GEO: Gene Expression Omnibus
GNAL: G protein subunit alpha L
GO: Gene Ontology
JPH3: junctophilin 3
KCNB1: potassium voltage-gated channel subfamily B member 1
KEGG: Kyoto Encyclopedia of Genes and Genomes
KIF20A: kinesin family member 20A
lncRNA: long non-coding RNA
ME: module eigenvector
miRNA: MicroRNA
MM vs GS: Module Membership vs Gene significance
mRNA: messenger RNA
NALCN: sodium leak channel, non-selective
NRSN1: neurensin 1
qRT-PCR: Quantitative Real-time Polymerase Chain Reaction
ROC curve: receiver operating characteristic curve
SERPINH1: serpin family H member 1
SH3GL2: SH3 domain containing GRB2 like 2, endophilin A1
TACC3: transforming acidic coiled-coil containing protein 3
TIMP1: TIMP metallopeptidase inhibitor 1
TOM: topological overlap matrix
TUBB6: tubulin β 6 class V
VIM: vimentin
WGCNA: weighted correlation/co-expression network analysis
WHO: World Health Organization

References

1.

Gilbert

M.R.

,

Wang

M.

,

Aldape

K.D.

,

Stupp

R.

,

Hegi

M.E.

,

Jaeckle

K.A.

et al.

(

2013

)

Dose-dense temozolomide for newly diagnosed glioblastoma: a randomized phase III clinical trial

.

J. Clin. Oncol.

31

,

4085

–

4091

[PubMed]

2.

Supriya

M.

,

Kumar

G.A.

and

Kishor

R.G.

(

2015

)

Therapeutic approach beyond conventional temozolomide for newly diagnosed glioblastoma: review of the present evidence and future direction

.

Indian J. Med. Paediatr. Oncol.

36

,

229

–

237

https://doi.org/10.1016/j.cancergen.2012.10.009

3.

Goodenberger

M.L.

and

Jenkins

R.B.

(

2012

)

Genetics of adult glioma

.

Cancer Genet.

205

,

613

–

621

[PubMed]

https://doi.org/10.1007/s00401-016-1545-1

4.

Louis

D.N.

,

Perry

A.

,

Reifenberger

G.

,

von Deimling

A.

,

Figarella-Branger

D.

,

Cavenee

W.K.

et al.

(

2016

)

The 2016 World Health Organization Classification of Tumors of the Central Nervous System: a summary

.

Acta Neuropathol. (Berl)

131

,

803

–

820

https://doi.org/10.1126/science.1235122

5.

Vogelstein

B.

,

Papadopoulos

N.

,

Velculescu

V.E.

,

Zhou

S.

,

Diaz

L.A.J.

and

Kinzler

K.W.

(

2013

)

Cancer genome landscapes

.

Science

339

,

1546

–

1558

[PubMed]

https://doi.org/10.2202/1544-6115.1128

6.

Zhang

B.

and

Horvath

S.

(

2005

)

A general framework for weighted gene co-expression network analysis

.

Stat. Appl. Genet. Mol. Biol.

4

,

Article17

[PubMed]

https://doi.org/10.1186/1471-2105-9-559

7.

Langfelder

P.

and

Horvath

S.

(

2008

)

WGCNA: an R package for weighted correlation network analysis

.

BMC Bioinformatics

9

,

559

[PubMed]

https://doi.org/10.1111/j.1467-9868.2005.00496.x

8.

Storey

J.D.

,

Taylor

J.E.

and

Siegmund

D.

(

2005

)

Acknowledgement of priority: strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach

.

J. R. Stat. Soc. Ser. B

67

,

197

–

197

https://doi.org/10.1155/2017/7653101

9.

Long

H.

,

Liang

C.

,

Zhang

X.

,

Fang

L.

,

Wang

G.

,

Qi

S.

et al.

(

2017

)

Prediction and analysis of key genes in glioblastoma based on bioinformatics

.

Biomed Res. Int.

2017

,

7653101

[PubMed]

https://doi.org/10.1158/0008-5472.CAN-09-2307

10.

Gravendeel

L.A.M.

,

Kouwenhoven

M.C.M.

,

Gevaert

O.

,

de Rooi

J.J.

,

Stubbs

A.P.

,

Duijm

J.E.

et al.

(

2009

)

Intrinsic gene expression profiles of gliomas are a better predictor of survival than histology

.

Cancer Res.

69

,

9065

–

9072

[PubMed]

https://doi.org/10.1101/gr.165126.113

11.

Bao

Z.-S.

,

Chen

H.-M.

,

Yang

M.-Y.

,

Zhang

C.-B.

,

Yu

K.

,

Ye

W.-L.

et al.

(

2014

)

RNA-seq of 272 gliomas revealed a novel, recurrent PTPRZ1-MET fusion transcript in secondary glioblastomas

.

Genome Res.

24

,

1765

–

1773

[PubMed]

https://doi.org/10.1093/neuonc/nos263

12.

Yan

W.

,

Zhang

W.

,

You

G.

,

Zhang

J.

,

Han

L.

,

Bao

Z.

et al.

(

2012

)

Molecular classification of gliomas based on whole genome gene expression: a systematic report of 225 samples from the Chinese Glioma Cooperative Group

.

Neuro Oncol.

14

,

1432

–

1440

[PubMed]

https://cran.r-project.org/

13.

Core Team R.

(

2018

)

R: a language and environment for statistical computing

,

14.

Bai

Y.-H.

,

Zhan

Y.-B.

,

Yu

B.

,

Wang

W.-W.

,

Wang

L.

,

Zhou

J.-Q.

et al.

(

2018

)

A novel tumor-suppressor, CDH18, inhibits glioma cell invasiveness via UQCRC2 and correlates with the prognosis of glioma patients

.

Cell Physiol. Biochem.

48

,

1755

–

1770

https://doi.org/10.1159/000492317

https://doi.org/10.1111/bpa.12256

15.

Wu

Z.B.

,

Cai

L.

,

Lin

S.J.

,

Leng

Z.G.

,

Guo

Y.H.

,

Yang

W.L.

et al.

(

2016

)

Heat shock protein 47 promotes glioma angiogenesis

.

Brain Pathol.

26

,

31

–

42

https://doi.org/10.1016/j.braindev.2014.06.002

16.

Romaniello

R.

,

Arrigoni

F.

,

Bassi

M.T.

and

Borgatti

R.

(

2015

)

Mutations in alpha- and beta-tubulin encoding genes: implications in brain malformations

.

Brain Dev.

37

,

273

–

280

[PubMed]

https://doi.org/10.1080/15384101.2018.1496746

17.

Delgado

M.

and

Chambers

T.C.

(

2018

)

Microtubules play an essential role in the survival of primary acute lymphoblastic leukemia cells advancing through G₁ phase

.

Cell Cycle

17

,

1784

–

1796

https://doi.org/10.1186/s12885-018-4517-0

18.

Wang

Y.

,

Guan

G.

,

Cheng

W.

,

Jiang

Y.

,

Shan

F.

,

Wu

A.

et al.

(

2018

)

ARL2 overexpression inhibits glioma proliferation and tumorigenicity via down-regulating AXL

.

BMC Cancer

18

,

599

[PubMed]