Abstract
Breast cancer (BC) is the most common leading cause of cancer-related death in women worldwide. Gene expression profiling analysis for human BCs has been studied previously. However, co-expression analysis for BC cell lines is still devoid to date. The aim of the study was to identify key pathways and hub genes that may serve as a biomarker for BC and uncover potential molecular mechanism using weighted correlation network analysis. We analyzed microarray data of BC cell lines (GSE 48213) listed in the Gene Expression Omnibus database. Gene co-expression networks were used to construct and explore the biological function in hub modules using the weighted correlation network analysis algorithm method. Meanwhile, Gene ontology and KEGG pathway analysis were performed using Cytoscape plug-in ClueGo. The network of the key module was also constructed using Cytoscape. A total of 5000 genes were selected, 28 modules of co-expressed genes were identified from the gene co–expression network, one of which was found to be significantly associated with a subtype of BC lines. Functional enrichment analysis revealed that the brown module was mainly involved in the pathway of the autophagy, spliceosome, and mitophagy, the black module was mainly enriched in the pathway of colorectal cancer and pancreatic cancer, and genes in midnightblue module played critical roles in ribosome and regulation of lipolysis in adipocytes pathway. Three hub genes CBR3, SF3B6, and RHPN1 may play an important role in the development and malignancy of the disease. The findings of the present study could improve our understanding of the molecular pathogenesis of breast cancer.
Introduction
Breast cancer (BC) is one of the most common types of cancer in women [1]. This disease largely affects women in their 40–60s. BC accounts for 29% of all female cancer-related cancer worldwide that contributed the second most cancer now, after lung cancer both in developing and developed countries [2]. At present, the morbidity and mortality of BC are continuously growing. In 2017, approximately 63,990 new cases and 14,440 deaths were predicted to be associated with kidney cancer in the U.S.A. [3]. Nevertheless, BC is a complex disease with respect to molecular alterations, cellular composition, and clinical outcome six subtypes were defined approximately a decade ago based on transcriptional characteristics, and were designated luminal A, luminal B, ERBB2-enriched, basal-like, Claudin-low, and normal-like [4]. However, growing body studies indicated that BC is heterogeneous cancer in various aspects including clinical-pathological, molecular, and cellular heterogeneity [5]. In order to improve the prognosis and decrease mortality and morbidity of BC, diagnostic biomarkers are critical for early detection and risk stratification of BC, which could help us to choose the proper treatment.
Weighted gene co-expression network analysis (WGCNA), a comprehensive collection of R functions, is a prevalent and useful method in the correlation network analysis and the identification of disease-related gene modules and key genes that contribute to phenotypic traits [6,7]. WGCNA approach has provided functional interpretation tools in systems biology. Unlike conventional microarray-based expression profiling method, WGCA allows a global interpretation of gene expression data by constructing gene networks based on similarities in expression profiles amongst samples. To better understand and explore the complex mechanisms of BC cell lines, WGCNA method would be the best choice to study the disease. WGCNA has been successfully performed in various types of cancer, such coronary heart disease [8], uveal melanoma [9], gastric adenocarcinoma [10], prostate cancer [11], and lung cancer [12]. However, the analysis of microarray-based gene expression data by the WGCNA has so far not been applied to compare the co-expression network of BC cell lines.
In the present study, the WGCNA method was applied to analyze a gene expression dataset of the BC cell line to identify biologically relevant modules associated with BC cell line subtypes.
material and methods
Expression analysis of microarray data of BC cell line samples
Gene expression profiles of breast cell lines were accessible from the Gene Expression Omnibus (GEO) database using the accession number GSE 48216 [13]. The microarray gene data included 56 BC cell lines, which composed of five subtypes (basal, claudin-low, luminal, non-malignant, and unknown). Annotated files of microarray platform (GPL 10999) were also downloaded from GEO. Annotation information of microarray data was employed to match probes with corresponding gene information. The mean value was used to consider the expression level if one gene detected by multiple probes. We select the top 25% variance gene expression data as our study object. A matrix of pairwise correlations between all pairs of genes across all selected samples was constructed.
Construction of gene co-expression network analysis
The WGCNA implemented in the R software package (http://www.r-project.org/) is employed to construct the gene co-expression network and identify the co-expression modules. The process is summarized as follows. First, hierarchical clustering analysis was performed on the samples by the flashClust package in R software [14] and found the sample outlier. Second, WGCNA algorithm was used to calculate the correlations amongst genes across samples, and the appropriate soft threshold power was chosen and the standard scale-free network was established by the condition of scale independent as >0.8. Third, module identification was carried out with the dynamic tree cut method by hierarchically clustering the genes using 1- topological overlap matrix (TOM) as the distance measure with a deep split value of 2 and minimum module size (minClusterSize) of 50 for the resulting dendrogram. Highly similar modules were clustered and merged with a height cutoff of 0.25. Finally, WGCNA algorithm was then used to construct the co-expression modules and extract the gene information in each module.
Establishment of module related trait relationships
The correlation between gene expression modules and each phenotype of BC cell lines was determined by WGCNA. Spearman’s correlation analysis was carried out to determine the association individual module and subtype of BC cell lines, which was the most relevant module between the module eigengene (ME) and clinical traits. Each subtype of BC has a strongly related module, which can be used as its signature. Gene significance (GS) was calculated for each expression profiles, module membership (MM) was defined as the correlation of expression profile and each MM. The GS versus MM plot was also drawn.
Interaction analysis of co-expression modules of BC
Interaction relationships amongst different co-expression modules were imputed by WGCNA. Heatmap tool package in R software was used in the evaluation of the strength of the relationship. We correlate the clustering coefficient with connectivity by the module in the unweighted network. The heatmap of gene expression profiles in the individual module was also shown.
Functional enrichment analysis of genes in the hub modules
The number of genes in the critical modules was put in the order from high to low gene significance scores. Then, functional enrichment analysis was conducted on these genes in these modules. GO Biological Process term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were conducted using Cytoscape plug-in ClueGo [15]. Functional enrichment analysis was based on the cut-off value of P-value <0.05.
Module network visualization and hub genes
The interested module was constructed using the Cytoscape software, and Cytoscape plug-in molecular complex detection (MCODE) was used to analyze the most notable clustering module.
The Kaplan–Meier plotter
The prognostic value of hub gene was evaluated using an online database, Kaplan–Meier plotter (www.kmplot.com) [16], which integrated gene expression data with survival information of 4142 clinical BC patients [17]. For overall survival analysis, the log-rank test was employed to compare the survival difference between the BC group and non-tumor group. A two-sided P-value of <0.05 was regarded as statistically significant.
Validation of the hub gene expression with TCGA and CCLE database
To verify the hub genes from the TCGA database, the expression statuses of hub genes in the TCGA BRCA cohort were revealed for validation. The diagnostic performance was evaluated using receiver operating characteristic (ROC) curves. The hub genes in a different type of cancers and cell lines were determined through analysis in CCLE database (https://portals.broadinstitute.org/ccle/home), which is publicly a large-scale genomic dataset of human cancer cell lines to facilitate discovery and identification from genome-wide expression analyses [18]. P-value, less than 0.05, was regarded as significant. All analyses were performed using R/Bioconductor (version 3.3.2).
Results
Collection of gene expression data
The raw gene expression data (GSE 48213) were downloaded from the GEO data repository (http://www.ncbi.nlm.nih.gov/geo/). The dataset contained a total of 56 BC cell lines, consisting 27 luminal, 14 basal, six claudin-low, one Kbluc, five non-malignant, and three of unknown subtype. The detailed information is shown in Table 1. The sequencing platform was GPL10999 Illumina Genome Analyzer IIx (Homo sapiens). The gene expression matrix was downloaded from the GEO database. The mean value was used to represent the expression level if one gene detected by multiple probes. As a result, a total of 36,953 expression data of genes was obtained. To select the most varying genes, the top highest 25% variance expression profiles were chosen to conduct the WGCNA analysis.
GEO no. . | Cell line . | Subtype . |
---|---|---|
GSM1172844 | 184A1 | Non-malignant |
GSM1172845 | 184B5 | Non-malignant |
GSM1172846 | 21MT1 | Basal |
GSM1172847 | 21MT2 | Basal |
GSM1172848 | 21NT | Basal |
GSM1172849 | 21PT | Basal |
GSM1172850 | 600MPE | Luminal |
GSM1172851 | AU565 | Luminal |
GSM1172853 | BT474 | Luminal |
GSM1172854 | BT483 | Luminal |
GSM1172855 | BT549 | Claudin-low |
GSM1172856 | CAMA1 | Luminal |
GSM1172858 | EFM192A | Luminal |
GSM1172859 | EFM192B | Luminal |
GSM1172860 | EFM192C | Luminal |
GSM1172861 | HCC1143 | Basal |
GSM1172863 | HCC1395 | Claudin-low |
GSM1172864 | HCC1419 | Luminal |
GSM1172865 | HCC1428 | Luminal |
GSM1172867 | HCC1569 | Basal |
GSM1172868 | HCC1599 | Basal |
GSM1172869 | HCC1806 | Basal |
GSM1172870 | HCC1937 | Basal |
GSM1172871 | HCC1954 | Basal |
GSM1172872 | HCC202 | Luminal |
GSM1172873 | HCC2218 | Luminal |
GSM1172874 | HCC3153 | Basal |
GSM1172875 | HCC38 | Claudin-low |
GSM1172876 | HCC70 | Basal |
GSM1172877 | HS578T | Claudin-low |
GSM1172878 | JIMT1 | Basal |
GSM1172879 | LY2 | Luminal |
GSM1172881 | MB157 | unknown |
GSM1172882 | MCF10A | Non-malignant |
GSM1172883 | MCF10F | Non-malignant |
GSM1172884 | MCF12A | Non-malignant |
GSM1172885 | MCF7 | Luminal |
GSM1172886 | MDAMB134VI | Luminal |
GSM1172888 | MDAMB175 | Luminal |
GSM1172889 | MDAMB231 | Claudin-low |
GSM1172890 | MDAMB361 | Luminal |
GSM1172893 | MX1 | unknown |
GSM1172895 | SKBR3 | Luminal |
GSM1172896 | SUM1315 | Claudin-low |
GSM1172897 | SUM149PT | Basal |
GSM1172901 | SUM225CWN | Luminal |
GSM1172902 | SUM229PE | unknown |
GSM1172903 | SUM52PE | Luminal |
GSM1172904 | T47D | Luminal |
GSM1172905 | T47D | Kbluc |
GSM1172906 | UACC812 | Luminal |
GSM1172907 | UACC893 | Luminal |
GSM1172908 | ZR751 | Luminal |
GSM1172909 | ZR7530 | Luminal |
GSM1172910 | ZR75B | Luminal |
GSM1384316 | MDAMB453 | Luminal |
GEO no. . | Cell line . | Subtype . |
---|---|---|
GSM1172844 | 184A1 | Non-malignant |
GSM1172845 | 184B5 | Non-malignant |
GSM1172846 | 21MT1 | Basal |
GSM1172847 | 21MT2 | Basal |
GSM1172848 | 21NT | Basal |
GSM1172849 | 21PT | Basal |
GSM1172850 | 600MPE | Luminal |
GSM1172851 | AU565 | Luminal |
GSM1172853 | BT474 | Luminal |
GSM1172854 | BT483 | Luminal |
GSM1172855 | BT549 | Claudin-low |
GSM1172856 | CAMA1 | Luminal |
GSM1172858 | EFM192A | Luminal |
GSM1172859 | EFM192B | Luminal |
GSM1172860 | EFM192C | Luminal |
GSM1172861 | HCC1143 | Basal |
GSM1172863 | HCC1395 | Claudin-low |
GSM1172864 | HCC1419 | Luminal |
GSM1172865 | HCC1428 | Luminal |
GSM1172867 | HCC1569 | Basal |
GSM1172868 | HCC1599 | Basal |
GSM1172869 | HCC1806 | Basal |
GSM1172870 | HCC1937 | Basal |
GSM1172871 | HCC1954 | Basal |
GSM1172872 | HCC202 | Luminal |
GSM1172873 | HCC2218 | Luminal |
GSM1172874 | HCC3153 | Basal |
GSM1172875 | HCC38 | Claudin-low |
GSM1172876 | HCC70 | Basal |
GSM1172877 | HS578T | Claudin-low |
GSM1172878 | JIMT1 | Basal |
GSM1172879 | LY2 | Luminal |
GSM1172881 | MB157 | unknown |
GSM1172882 | MCF10A | Non-malignant |
GSM1172883 | MCF10F | Non-malignant |
GSM1172884 | MCF12A | Non-malignant |
GSM1172885 | MCF7 | Luminal |
GSM1172886 | MDAMB134VI | Luminal |
GSM1172888 | MDAMB175 | Luminal |
GSM1172889 | MDAMB231 | Claudin-low |
GSM1172890 | MDAMB361 | Luminal |
GSM1172893 | MX1 | unknown |
GSM1172895 | SKBR3 | Luminal |
GSM1172896 | SUM1315 | Claudin-low |
GSM1172897 | SUM149PT | Basal |
GSM1172901 | SUM225CWN | Luminal |
GSM1172902 | SUM229PE | unknown |
GSM1172903 | SUM52PE | Luminal |
GSM1172904 | T47D | Luminal |
GSM1172905 | T47D | Kbluc |
GSM1172906 | UACC812 | Luminal |
GSM1172907 | UACC893 | Luminal |
GSM1172908 | ZR751 | Luminal |
GSM1172909 | ZR7530 | Luminal |
GSM1172910 | ZR75B | Luminal |
GSM1384316 | MDAMB453 | Luminal |
Construction of co–expression module of BC cell lines
Co-expression modules were constructed by the expression values of 5000 genes in 56 BC cell lines using the WGCNA algorithm. On the whole, the sample hierarchical clustering plot in each sample was divided into two clusters using the flashClust tool package of WGCNA algorithm method. As can be seen in Figure 1, 56 BC samples were divided into two clusters, and we fail to detect outlier samples. Power values were screened out by WGCNA algorithm in the construction of co-expression modules. When the soft threshold power β was set at 6, the independence degree was up to 0.8 and the average connectivity degree was higher (Figure 2). After highly similar modules were merged, a total of 28 co-expression modules were identified in the co-expression network (Figure 3). Total 246 genes in the gray module did not belong to other modules, accounting for 0.492% in all total genes. The number of genes included in these modules was 167 (black module), 355 (blue module), 294 (brown module), 84 (cyan module), 44 (darkgreen module), 40 (darkgrey module), 38 (darkorange module), 43 (darkred module), 41 (darkturquoise), 270 (green module), 101 (greenyellow module), 246 (grey module), 67 (grey60 module), 77 (lightcyan module), 62 (lightgreen module), 61 (lightyellow module), 124 (magenta module), 84 (midnightblue module), 40 (orange module), 168 (pink module), 108 (purple module), 241 (red module), 60 (royalblue module), 87 (salmon module), 88 (tan module), 1671 (turquoise module), 37 (white module), and 279 (yellow module). The average number of genes in these 44 modules was 1671 and the median was 162. The information of genes belonged to four modules was listed in Supplementary Table S1.
Sample hierarchical clustering plot and trait heatmap
Analysis of network topology for various soft-thresholding powers
Clustering dendrograms of genes with dissimilarity based on the topological overlap, together with assigned module color
Correlation between modules and each subtype of BC cell lines in co-expression networks
The module-trait relationships were established by WGCNA algorithm. Modules with higher Spearman’s correlation coefficients are also the most important elements of modules associated with the trait. From Figure 4, the modules related to BC cell line classification were observed, including the five categories of ‘basal’, ‘claudin-low’, ‘luminal’, ‘non-malignant’, and ‘unknown’. The correlation between ME and traits shows that some modules are more important than others in the disease. A scatterplot of gene significance versus MM was also plotted. We found that the luminal phenotype highly correlated with the brown module (Spearmen correlation coefficients: 0.86, P<0.01), with a significantly significant correlation. The basal phenotype is highly related to the black module (Cor: 0.52, P<0.01). Claudin-low subtype highly correlated with darkred module (Cor: 0.67, P<0.01). Non-malignant subtype was highly associated with midnightblue module (Cor: 0.74, P<0.01) (Figure 5). The module with the greatest association with clinical traits was the brown-colored module. Therefore, this module should allow us to capture and summarize the most relevant mRNA expression profiles.
Module-trait associations
A scatterplot of gene significance versus MM
Interaction analysis of co-expression modules and trait
We further analyzed the interaction amongst 28 co-expression modules. On the whole, no significant difference amongst different modules was observed, suggesting the relative independence of gene expression in these modules (Figure 6). The higher scale independence amongst these modules was also detected. Connectivity of eigengenes analysis was performed in order to evaluate the interactions amongst these constructed co-expression modules (Figure 7). The dendrogram plot also indicated that the brown module was highly associated with luminal BC.
Heat map view of TOM of co-expressed genes in different modules in the top 1500 genes
Analysis of connectivity of eigengenes in a different module
Functional enrichment analysis of critical modules
To explore the potential biological function of critical modules, we performed the functional enrichment analysis of GO and KEGG category. There was a significant difference in the biological processes that different modules were enriched in Figure 8 (Table 2). The results of GO analysis for brown module revealed that GO categories enriched in the biological process (n=25), cellular component (n=3), and molecular function (n=3). Genes in the brown module were largely enriched in GO:0034248 (regulation of cellular amide metabolic process), GO:0010608 (post-transcriptional regulation of gene expression), and GO:0043487 (regulation of RNA stability). The GO enrichment categories are exhibited in Figure 8A. A total of 15 KEGG pathways focussing on the biological pathways were enriched, including autophagy, spliceosome, and mitophagy. The KEGG pathway enrichment categories are exhibited in Figure 9A. Gene in the black module was mainly enriched in GO:0000184 (nuclear-transcribed mRNA catabolic process, nonsense-mediated decay), GO:0033613 (activating transcription factor binding), GO:0022626 (cytosolic ribosome) (Figure 8B). Two KEGG pathways focussing on the biological pathways were enriched in the black module, including colorectal cancer and pancreatic cancer (Figure 9B). Gene in the dark red module was mainly enriched in GO:0003678 (DNA helicase activity), GO:0004003 (ATP-dependent DNA helicase activity), and GO:0070603 (SWI/SNF superfamily-type complex) (Figure 8C). However, genes in dark red module did not significantly enrich any pathways. Genes in midnight blue module were enriched in GO:0043628 (ncRNA 3′-end processing), GO:0006734 (NADH metabolic process), and GO:0045047(protein targeting to ER) (Figure 8D). Genes in midnight blue module played critical roles in ribosome and regulation of lipolysis in adipocytes pathway (Figure 9C).
GO terms shows as an interaction network using Cytoscape plug-in ClueGO
KEGG pathways displayed as an interaction network using Cytoscape plug-in ClueGO
Module . | Term . | count . | % . | P-value . |
---|---|---|---|---|
Brown | GO:0043487∼regulation of RNA stability | 10 | 3.40 | 0.00 |
GO:0034248∼regulation of cellular amide metabolic process | 19 | 6.46 | 0.00 | |
GO:0010608∼posttranscriptional regulation of gene expression | 25 | 8.50 | 0.00 | |
GO:0030529∼intracellular ribonucleoprotein complex | 37 | 12.59 | 0.00 | |
GO:0043487∼regulation of RNA stability | 10 | 3.40 | 0.00 | |
Black | GO:0033613∼activating transcription factor binding | 4 | 2.38 | 0.03 |
GO:0022626∼cytosolic ribosome | 6 | 3.57 | 0.00 | |
GO:0000184∼nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | 6 | 3.57 | 0.00 | |
GO:1904813∼ficolin-1-rich granule lumen | 6 | 3.57 | 0.00 | |
GO:0019083∼viral transcription | 8 | 4.76 | 0.00 | |
Dark red | GO:0003678∼DNA helicase activity | 3 | 6.82 | 0.00 |
GO:0004003∼ATP-dependent DNA helicase activity | 3 | 6.82 | 0.00 | |
GO:0031491∼nucleosome binding | 3 | 6.82 | 0.00 | |
GO:0043044∼ATP-dependent chromatin remodeling | 4 | 9.09 | 0.00 | |
GO:0070603∼SWI/SNF superfamily-type complex | 4 | 9.09 | 0.00 | |
Midnight blue | GO:0045047∼protein targeting to ER | 5 | 5.88 | 0.00 |
GO:0022626∼cytosolic ribosome | 5 | 5.88 | 0.00 | |
GO:0042274∼ribosomal small subunit biogenesis | 4 | 4.71 | 0.00 | |
GO:0006734∼NADH metabolic process | 3 | 3.53 | 0.00 | |
GO:0043628∼ncRNA 3′-end processing | 3 | 3.53 | 0.00 |
Module . | Term . | count . | % . | P-value . |
---|---|---|---|---|
Brown | GO:0043487∼regulation of RNA stability | 10 | 3.40 | 0.00 |
GO:0034248∼regulation of cellular amide metabolic process | 19 | 6.46 | 0.00 | |
GO:0010608∼posttranscriptional regulation of gene expression | 25 | 8.50 | 0.00 | |
GO:0030529∼intracellular ribonucleoprotein complex | 37 | 12.59 | 0.00 | |
GO:0043487∼regulation of RNA stability | 10 | 3.40 | 0.00 | |
Black | GO:0033613∼activating transcription factor binding | 4 | 2.38 | 0.03 |
GO:0022626∼cytosolic ribosome | 6 | 3.57 | 0.00 | |
GO:0000184∼nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | 6 | 3.57 | 0.00 | |
GO:1904813∼ficolin-1-rich granule lumen | 6 | 3.57 | 0.00 | |
GO:0019083∼viral transcription | 8 | 4.76 | 0.00 | |
Dark red | GO:0003678∼DNA helicase activity | 3 | 6.82 | 0.00 |
GO:0004003∼ATP-dependent DNA helicase activity | 3 | 6.82 | 0.00 | |
GO:0031491∼nucleosome binding | 3 | 6.82 | 0.00 | |
GO:0043044∼ATP-dependent chromatin remodeling | 4 | 9.09 | 0.00 | |
GO:0070603∼SWI/SNF superfamily-type complex | 4 | 9.09 | 0.00 | |
Midnight blue | GO:0045047∼protein targeting to ER | 5 | 5.88 | 0.00 |
GO:0022626∼cytosolic ribosome | 5 | 5.88 | 0.00 | |
GO:0042274∼ribosomal small subunit biogenesis | 4 | 4.71 | 0.00 | |
GO:0006734∼NADH metabolic process | 3 | 3.53 | 0.00 | |
GO:0043628∼ncRNA 3′-end processing | 3 | 3.53 | 0.00 |
Module visualization and hub genes
We chose gene pairs in the module which weight >0.05 and co-expression network was constructed using Cytoscape (Figure 10). The sub-network was also identified via Cytoscape plug-in MCODE. In total, three sub-co-expression networks (modules 1, 2, and 3) with score >6 were detected by MCODE (Figure 11). Each score was 34.979, 8.889, and 7.579, respectively. The three sub-co-expression networks belong to black module, midnightblue, and brown. Amongst these genes, CBR3, SF3B6, and RHPN1 were the hub gene in the sub-network.
The visualization of four key modules
Three significant modules identified from the whole network using Cytoscape plug-in MCODE with a score of >6.0
Validation of hub genes
The Kaplan–Meier curve and log-rank test analyses revealed that the mRNA level of three hub genes was significantly associated with OS (P<0.05) (Figure 12) in all BC patients. The BC patients with higher mRNA levels of CBR3 and SF3B6 or lower mRNA level of RHPN1 were predicted to have a better OS. The three hub gene expression level was extracted in TCGA BRCA cohort. Compared with normal samples, the three hub gene expression level was higher in the tumor samples (Figure 13). We employed ROC curves to evaluate the prognostic power of three hub genes. The combined AUC for the three biomarkers prognostic model was 0.927 (0.907−0.947) (Figure 14). The CCLE database analysis demonstrated that the two hub gene expression level of CBR3 and RHPN1 ranked higher in a variety of cancer cell line (Figure 15).
The prognostic value of mRNA level of hub genes in BC patients (OS in Kaplan–Meier plotter)
The mRNA expression level of three key genes between BRCA and adjunct noncancerous breast tissues based on TCGA data
The ROC curves analysis for diagnostic power prediction by the three key genes
The two mRNA (CBR3 and RHPN1) were high expressed in kidney cancer cell lines from CCLE analysis
Discussion
BC is still one of the most common and malignant tumors by complex molecular and cellular heterogeneity [5]. This disease largely influences women in their 40–60s. It is the second most cancer now, just after lung cancer, the principal cause of death from cancer amongst women both in developing and developed countries [2,3]. BC cell lines may mirror many of the molecular characteristics of the tumors from which they were derived, and are therefore a useful preclinical model in which to explore strategies for predictive marker development. However, the mechanisms of critical pathways and their interactions involved in the occurrence and development of BC, remain largely unknown. Therefore, understanding of BC biology may provide clinicians with new tools that can be used for the treatment of this disease.
Many successful efforts were employed to examine the complex molecular and cellular nature of BC through basic genetic and molecular study, and various novel genes that are involved in BC have been identified [19–21]. Although individual gene or protein alone can produce an important role in the development and metastasis of BC, determining individual gene expression levels does not facilitate a comprehensive understanding of cancer cell development [22]. Due to rapid technological breakthroughs of genome-wide sequencing, the research of clinical issues and related pathological mechanisms in various cancers has been developed [23]. Weighted gene co-expression network analysis (WGCNA) is a method frequently used in the co-expression module correlation analysis by microarray samples. WGCNA is also a powerful tool to examine the potential gene correlation structures within the gene expression data. Therefore, the WGCNA method was employed to construct robust gene co-expression networks. These modules were established in terms of large-scale gene expression profiles and the distinction of centrally located genes (hub genes) that drive key cellular signaling pathways [24]. The WGCNA approach has provided functional interpretation tools in systems biology and led to new insights into the molecular and pathological mechanisms in several diseases [25,26]. There are no reports applying WGCNA to systematically identify gene co-expression networks associated with BC cell lines. To fulfill this gap, the WGCNA was applied to construct a co-expression network amongst each phenotype of BC cell line and calculated module-trait correlations based on one public microarray datasets (GSE78000), which included 36 samples and 22,615 genes. A total of 28 co-expression modules were constructed by WGCNA method, which helps clinicians to detect the association between BC cell line transcriptome and phenotype. We found four co-expression modules that relate to phenotype (basal, claudin-low, luminal, unknown, and non-malignant). Using the enrichment and functional analysis of Cytoscape plug-in ClueGO, we found that the KEGG pathway of targetted genes in the brown module was mainly involved in the pathway of the autophagy, spliceosome, and mitophagy, the black module was mainly enriched in the pathway of colorectal cancer and pancreatic cancer, and genes in midnightblue module played critical roles in ribosome and regulation of lipolysis in adipocytes pathway. Three hub genes CBR3, SF3B6, and RHPN1 may play an important role in the development and malignancy of BC. The genes stood out for their association with the diffuse histological subtype.
CBR3 (carbonyl reductase 3) is a protein-coding gene. Amongst its related pathways are metabolism and cytochrome P450 – arranged by substrate type. GO annotations related to this gene include oxidoreductase activity and carbonyl reductase (NADPH) activity. An important paralog of this gene is CBR1. At present, CBR3 has become potential biomarkers in uterine leiomyomas disease [27], type 2 diabetes [28], oral squamous cell carcinomas [29], and BC [30]. CBR3 mRNA expression could regulate human cancer cell lines by the Nrf2 pathway [31]. CBR3 is also proven a novel target gene of inflammatory stimuli [32]. Hertz et al. demonstrate that SNPs in CBR3 was correlated with chronic cardiotoxicity in a cohort of BC patients treated with anthracyclines [33].
SF3B6 (splicing factor 3b subunit 6) is a protein-coding gene. Amongst its related pathways are gene expression and mRNA splicing - major pathway. GO annotations related to this gene include nucleic acid binding and nucleotide binding. Siebring-van et al. identified that silencing SF3A3 yielded much stronger cytotoxicity to non-small cell lung cancer cells than to lung fibroblasts, suggesting that the gene could represent useful therapeutic targets [34].
RHPN1 (Rhophilin Rho GTPase binding protein 1) is a protein-coding gene. Amongst its related pathways are ectoderm differentiation and RHO GTPases activate rhotekin and rhophilins. An important paralog of this gene is RHPN2. Lal MA reported that Rhophilin-1 is a key regulator of the podocyte cytoskeleton, and is essential for glomerular filtration [35].
In this work, our findings demonstrated that black module and brown module, midnight blue module, and dark red module were regarded as the most critical module in the development and pathological phenotype of BC. And the hub gene CBR3, SF3B6, and RHPN1 were found to be significantly in three modules, which may play as the potential diagnostic and prognostic biomarkers of BC, However, the biological function of these three genes needs to be further validated with more experiment.
Availability of data and material
Data availability could be obtained from TCGA website.
Competing Interests
The authors declare that there are no competing interests associated with the manuscript.
Funding
The work was supported by the Guizhou Science and Technology Cooperation Project (Qiankehe LH character [2017] 1110).
Author Contribution
Abbreviations
References
Author notes
These authors contributed equally to work.