The Cancer Genome Atlas (TCGA) targets more than 30 different cancer types, collecting hundreds of samples for each type. Each disease is studied individually by multiple groups across TCGA. Our Center is analyzing data collected for many of these diseases in order to understand each cancer more deeply. Our GDAC is also exploring associations between the various diseases to identify commonalities.

Our GDAC has participated in analysis for the following cancer types:

Our GDAC is participating in ongoing analysis for the following cancer types in TCGA: Adrenocortical Carcinoma, Cervical cancer, Cholangiocarcinoma, Liver Hepatocellular Carcinoma, Mesothelioma, Pancreas, Paraganglioma and Pheochromocytoma, Sarcoma, Stomach-Esophageal, Testicular Germ Cell cancer, Thymoma, and Uveal Melanoma. Our GDAC is also contributing to the pan-cancer analysis of 33 tumor types called the Pan-Cancer Atlas.

Our Center participated in the TCGA breast cancer analysis working group, contributing to working group discussions, analyses, presentation of results, and preparation of the TCGA breast cancer marker paper. Numerous analyses were performed by our GDAC, related in particular to the relationship between individual molecular features and various subtypes discovered through supervised and unsupervised methods. As a companion feature to the manuscript, our GDAC has provided a comprehensive feature matrix, including statistical pairwise analysis, that can be explored interactively via Regulome Explorer.
brca molecular features

Associations between molecular features. Statistically significant associations between features with genomic coordinates are indicated by arcs connecting pairs of dots which represent the features. Two examples are shown: significant associations between microRNA and mRNA expression levels (Left), and between copy-number and mRNA expression (Right).


The Cancer Genome Atlas Network
Comprehensive molecular portraits of human breast tumors
Nature 490, 61–70 (2012)

Our Center participated in the TCGA Colorectal Analysis Working Group, contributing to working group discussions, analyses, presentation of results, and the TCGA colorectal marker paper. Our GDAC performed numerous analyses, centered for example on microRNAs, DNA structural variation, signatures associated with anatomical position, and signatures associated with specific subgroupings of microsatellite instability categories. For the colorectal manuscript, we focused on six clinical variables associated with tumor aggressiveness and generated a score for the association of molecular features with those six variables. The aggressiveness score is a composite: the p-values from the individual comparisons with each of the six clinical variables are combined using the weighted Fisher’s method to derive an overall p-value. The score is the negative base-10 logarithm of this overall p-value, given a positive sign if the signature is higher in the more aggressive tumors and a negative sign if it is lower. This score is color-coded in the visual display on a blue-to-red scale from low to high score. To limit the extent of the display, the score is saturated at -10 and +10.
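The scoring steps can be illustrated with a minimal sketch. It makes several assumptions: it uses the standard unweighted Fisher's method as a stand-in, because the weights and the null-distribution treatment of the weighted variant are not given here, and it treats the six p-values as independent. The function names and the `floor` parameter are hypothetical.

```python
import math

def fisher_combined_p(p_values, floor=1e-300):
    """Combine independent p-values with the unweighted Fisher's method.

    Assumption: a stand-in for the weighted variant, whose weights are not
    specified in the text. `floor` guards against log(0).
    """
    k = len(p_values)
    # Half of Fisher's statistic; the full statistic is chi-square with 2k d.o.f.
    half_stat = -sum(math.log(max(p, floor)) for p in p_values)
    # Closed-form chi-square survival function for even degrees of freedom.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)

def aggressiveness_score(p_values, higher_in_aggressive, cap=10.0):
    """Signed -log10 of the combined p-value, saturated at +/- cap."""
    p = fisher_combined_p(p_values)
    magnitude = cap if p <= 0.0 else min(cap, -math.log10(p))
    return magnitude if higher_in_aggressive else -magnitude
```

For a feature that is higher in aggressive tumors, `aggressiveness_score(p_values, True)` returns a value between 0 and +10. The same p-values with `False` give the mirrored negative score.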
colorectal aggressiveness

A "hotspot" of CRC aggressiveness in region 20q13.12. Certain chromosomal regions are enriched in clinically associated molecular features. Region 20q13.12 includes a local amplification (orange) and 11 genes (blue), all of which are expressed more highly in aggressive tumors. A number of methylation probes (green) are also statistically associated with tumor aggression, nearly all (8/10) with decreased levels in aggressive tumors.


The Cancer Genome Atlas Network
Comprehensive molecular characterization of human colon and rectal cancer
Nature 487, 330–337 (2012)

TCGA interrogated a large set of endometrial carcinomas, including both serous (n=53) and mixed/endometrioid (n=280) types, to generate multi-dimensional data. Our Center performed data analysis on molecular classification and association with clinical and pathological variables. Using the RNASeq gene expression profiles, we identified three gene expression subtypes in TCGA endometrial carcinoma, which were termed ‘mitotic’, ‘hormonal’, and ‘immunoresponsive’ on the basis of pathway analysis and gene content. These clusters are significantly correlated with tumor histology, grade, stage, patient overall survival, and progression-free survival. As in serous ovarian carcinoma, the FOXM1 transcription factor network is significantly altered in the mitotic subtype. Association analysis indicates that the mitotic subtype is enriched for TP53 mutation/deletion and PIK3CA amplification, while PTEN/CTNNB1 mutations occur mainly in the hormonal and immunoresponsive subtypes. The results were presented at the TCGA Endometrial Cancer workshop (April 9-10, 2012, WashU in St. Louis) and at the TCGA Semi-Annual Steering Meeting (April 25-27, 2012, Houston). In addition, our Center contributed to the TCGA endometrial marker paper.
comprehensive endometrial

Figure 4: Gene Expression Profiling Identifies Three Gene Expression Subtypes. (A) Tumors from TCGA separated into three clusters on the basis of gene expression, namely mitotic, hormonal and immunoresponsive. (B) The three clusters are significantly correlated with patient overall survival and progression-free survival. (C) Association of the three clusters with clinical/pathological features, mutation, copy-number variation and cluster assignments from different data types. (D) Molecular and clinical features associated with tumor histology and FOXM1 transcription factor network are significantly activated in the mitotic subtype.


The Cancer Genome Atlas Network
Integrated genomic characterization of endometrial carcinoma
Nature 497, 67–73 (2013)

Our Center played a key role in TCGA’s analysis of gastric cancer in a cohort of 295 patients. This study identified four distinct molecular subtypes of gastric cancer, along with possible targeted treatments for some of the subtypes. An essential component of this was an analysis performed by the Center that integrated molecular patterns across the six molecular platforms of the study (which included DNA sequencing, RNA sequencing, and protein arrays) to identify sets of patients that shared molecular profiles. These molecular profiles were found to have strong associations with a limited set of key variables, which were subsequently used to classify gastric cancer into subtypes. The Center also identified distinct pathway-level differences among the subtypes.

gastric subtypes

Integrated Molecular Analysis Identifies Distinct Gastric Cancer Subtypes
Subsets of gastric cancer patients share molecular signatures reflected in multiple types of measurements. The central part of this figure (bordered by red line) indicates how a patient tumor sample (each corresponding to a column) falls into several possible patterns specific to molecular platforms as indicated by a blue tile. For example, in a single sample, copy number (SCNA) can be either High (blue in row 1) or Low (blue in row 16). Analysis by our Center played a role in revealing that four overall patterns are seen in the data, as indicated by the vertical red separation lines. Furthermore, these overall patterns were characterized by several key variables, as seen in the annotations below the box, the covariant tracks above the box, and the icons at the top of the figure representing DNA mismatch repair, diffuse cell type, Epstein-Barr virus, and aneuploidy, respectively. The key variables formed the basis of the classification of gastric molecular subtypes in the study.


The Cancer Genome Atlas Network
Comprehensive Molecular Characterization of Gastric Adenocarcinoma
Nature 513, 202–209 (2014)

Our Center contributed to the glioblastoma multiforme (GBM) Analysis Working Group. For this analysis, we inferred associations in the data using pairwise statistical analysis as well as the RF-ACE algorithm. These inference methods were applied to the entire GBM data set, as well as to subsets of the data partitioned by the four GBM subtypes, to identify subtype-specific associations (e.g., the impact of TP53 mutations in the classical, mesenchymal, neural, and proneural subtypes). The resulting associations have all been made available through Regulome Explorer and shared with other members of the Analysis Working Group. Key findings that have emerged from these analyses and data exploration tools include mutual exclusivity and co-occurrence of genomic events, associations between these genomic events and molecular features (e.g., mutations that impact gene and miRNA expression), and subtle relationships between molecular features and clinical data or sample characteristics (e.g., IDH1 mutation and the hypermethylation phenotype).
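One common way to test a pair of binary genomic events for co-occurrence or mutual exclusivity is a one-sided Fisher's exact test on the 2x2 contingency table. The text above does not name the specific pairwise tests used, so the following is only an illustrative sketch under that assumption. The function name and input layout (one 0/1 flag per sample) are hypothetical.

```python
from math import comb

def cooccurrence_test(event_a, event_b):
    """One-sided Fisher's exact tests for two binary events across samples.

    event_a, event_b: equal-length sequences of 0/1 flags, one per sample.
    Returns (n_both, p_cooccurrence, p_exclusivity).
    """
    n = len(event_a)
    n_a = sum(1 for x in event_a if x)
    n_b = sum(1 for y in event_b if y)
    n_both = sum(1 for x, y in zip(event_a, event_b) if x and y)

    def pmf(k):
        # Hypergeometric probability that exactly k samples carry both events,
        # given the marginal counts n_a and n_b.
        return comb(n_b, k) * comb(n - n_b, n_a - k) / comb(n, n_a)

    lowest = max(0, n_a + n_b - n)   # smallest possible overlap
    highest = min(n_a, n_b)          # largest possible overlap
    # Co-occurrence: overlap at least as large as observed.
    p_co = sum(pmf(k) for k in range(n_both, highest + 1))
    # Mutual exclusivity: overlap at most as large as observed.
    p_ex = sum(pmf(k) for k in range(lowest, n_both + 1))
    return n_both, p_co, p_ex
```

A small exclusivity p-value suggests the two events overlap less often than chance would predict given how frequent each is. When many event pairs are screened, the resulting p-values would need multiple-testing correction.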


The Cancer Genome Atlas Network
The Somatic Genomic Landscape of Glioblastoma
Cell 155, 462–477 (2013)

Our Center reviewed TCGA data for 316 patients with high-grade serous ovarian cancer, the most common form of the disease. Data for each patient included a genetic survey of the surgically resected primary tumor and comprehensive clinical data. Most patients in the study had stage III or IV disease and G2 or G3 tumors. BRCA2 mutations were found in 29 ovarian cancers and BRCA1 mutations in 37. All patients had undergone surgery followed by platinum-based chemotherapy. Patients with BRCA2 mutations in their tumors had a significantly higher 5-year overall survival rate (61%) than did patients without BRCA mutations in their tumors (25%). The 3-year progression-free survival rate was also significantly higher for patients whose tumors had BRCA2 mutations (44%) than for those whose tumors did not have BRCA mutations (16%). BRCA1 mutations in tumors were not significantly associated with survival. All patients whose ovarian cancers had BRCA2 mutations responded to primary platinum-based chemotherapy, compared with 82% of patients whose tumors did not have any BRCA mutation and 80% of patients whose tumors had BRCA1 mutations. The median platinum-free duration was 18.0 months for patients whose tumors had BRCA2 mutations, 11.7 months for patients whose tumors did not have any BRCA mutations, and 12.5 months for those whose tumors had BRCA1 mutations. We also found that tumors with BRCA2 mutations had a median of 84 mutations per tumor sample, compared with 52 mutations per tumor sample for tumors without BRCA mutations. This last feature, called the hypermutation/mutator phenotype of BRCA2-mutated ovarian cancers, might be a factor in the development and growth of a tumor and a sign of its vulnerability to DNA-damaging drugs.


D. Yang, S. Khan, Y. Sun, K. Hess, I. Shmulevich, A. K. Sood, W. Zhang
Association of BRCA1 and BRCA2 Mutations With Survival, Chemotherapy Sensitivity, and Gene Mutator Phenotype in Patients With Ovarian Cancer
JAMA 306, 1557–1565 (2011)

Our Center participated in the TCGA Prostate Analysis Working Group, contributing to working group discussions, analyses, presentation of results, and the TCGA prostate marker paper. Our Center worked on the analysis of the clinical data and integrated analysis of the molecular data for 333 primary prostate cancers. The study identified seven subtypes of prostate cancer reflecting the heterogeneity of this cancer.

prostate subtypes

Figure 1. The Molecular Taxonomy of Primary Prostate Cancer
Comprehensive molecular profiling of 333 primary prostate cancer samples revealed seven genomically distinct subtypes.


The Cancer Genome Atlas Network
The Molecular Taxonomy of Primary Prostate Cancer
Cell 163, 1011–1025 (2015)

Our Center participated in the TCGA Thyroid Analysis Working Group, contributing to working group discussions, analyses, presentation of results, and the TCGA thyroid marker paper. Our Center helped to analyze the largest cohort of Papillary Thyroid Cancer (PTC) samples studied to date (496 patients) by performing integrative analysis of DNA sequence, gene expression, microRNA expression, protein expression, and DNA methylation profiles of PTCs. We worked with clinicians to generate a risk of recurrence feature. We collaborated with other AWG members to analyze thyroid differentiation in PTCs. Our Center also played a key role in the identification of certain microRNAs, miR-21, miR-146b, and miR-204, in less differentiated subgroups of PTC. The identification of these microRNAs may lead to more precise surgical and medical therapy.

thyroid miRNA clustering

Figure 7. Unsupervised Clusters for miRNA-seq Data
Heatmap showing discriminatory miRs (5p or 3p mature strands) with the largest 6% of metagene matrix score, as well as miR-204-5p, 221-3p, and 222-3p, which were highlighted in correlations to BRS and TDS scores. The scalebar shows log2 normalized (reads-per-million, RPM), median centered miR abundance. miR names in red are discussed in the text. Gray vertical lines in the clinical information tracks mark samples without clinical data, and in the mutation tracks gray lines identify samples without sequence data.
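
The scale described in the caption (log2 of reads-per-million, median centered per miR) can be sketched as follows. This is a minimal illustration and not the pipeline used for the figure. The pseudocount, the function name, and the use of the supplied table's column sums as per-sample read totals are all assumptions.

```python
import math
from statistics import median

def log2_rpm_median_centered(counts, pseudocount=1.0):
    """counts: dict mapping miR name -> list of raw read counts, one per sample.

    Assumptions: per-sample totals are taken over the miRs supplied, and a
    pseudocount is added before the log to avoid log2(0).
    """
    names = list(counts)
    n_samples = len(counts[names[0]])
    # Total reads per sample, used to scale counts to reads-per-million.
    totals = [sum(counts[m][s] for m in names) for s in range(n_samples)]
    centered = {}
    for m in names:
        logged = [
            math.log2(counts[m][s] / totals[s] * 1e6 + pseudocount)
            for s in range(n_samples)
        ]
        # Subtract the miR's median across samples so each row is centered at 0.
        mid = median(logged)
        centered[m] = [v - mid for v in logged]
    return centered
```

After centering, positive values mark samples in which a miR is more abundant than in its median sample, and negative values mark lower abundance.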


The Cancer Genome Atlas Network
Integrated Genomic Characterization of Papillary Thyroid Carcinoma
Cell 159, 676–690 (2014)

The Center participated in the pan-cancer working group and Dr. Shmulevich was one of the cochairs. TCGA presents unprecedented opportunities to study molecular differences and similarities across multiple different cancers and their subtypes. The opportunity is to complement the traditional "tissue of origin" classification of cancers with multidimensional molecular characterization. Analytical tools such as random forest regression will be applied to multiple cancers to characterize subtypes, which may span multiple histological categories, at the level of molecular associations among genetic aberrations (mutations, translocations), expression, epigenetic and other measurements. The Center is also developing pathway level exploration within Regulome Explorer. This capability, already in prototype, allows the scientist to view a particular pathway at the level of associations and to identify enrichment of other pathways associated with the pathway of choice. These associations can be limited to specific datatypes or combinations thereof. Such capabilities will be important for pan-cancer analysis at the pathway level. Furthermore, PARADIGM (UCSC) integrated pathway levels are ingested as features into our feature matrices and analyzed jointly with all other features, providing an additional pathway-level view.


The Cancer Genome Atlas Network
The Cancer Genome Atlas Pan-Cancer analysis project
Nature Genetics 45, 1113–1120 (2013)

T. A. Knijnenburg, T. Bismeijer, L. F. Wessels, I. Shmulevich
A multilevel pan-cancer map links gene mutations to cancer hallmarks
Chin J Cancer 34, 48 (2015)