‘Milestone’ Catalogue of Mutations in the Cancer Genome
Scientists from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project have released unprecedented data from their analyses of mutational events in more than 2600 cancer genomes. They assembled a comprehensive portrait of cancer-related mutations in both the coding and the noncoding regions of the genome across 38 major tumor types and used this information to determine which biologic pathways are implicated.
In a nutshell:
On average, about five driver mutations were identified per cancer genome (about 5% of cancers had no identified driver mutations).
Most driver mutations were found in the coding region of the genome.
A “molecular clock” can determine which mutations occurred early in the evolution of the cancer.
Patterns of genetic mutations can help determine cancer type (which can help when a clinician is faced with a classification of carcinoma of unknown primary origin).
The mammoth effort, which involved a collaboration between more than 1300 scientists and clinicians from 37 countries, resulted in 23 peer-reviewed scientific articles that were published simultaneously in Nature and its affiliated journals.
“These six papers, together with companion papers being published elsewhere, represent a milestone in cancer and cloud genomics,” write Marcin Cieslik, PhD, and Arul M. Chinnaiyan, MD, PhD, in an accompanying comment in Nature. Both authors are affiliated with the Michigan Center for Translational Pathology at the Rogel Cancer Center, University of Michigan, Ann Arbor.
“The broad availability and quality of the PCAWG data set will almost certainly spur a wave of biological insights and methodological developments,” they add.
For instance, the work on molecular clocks, in which the researchers analyzed whether mutations occur earlier or later, could eventually be useful in the earlier diagnosis of cancer or in prevention strategies.
“Overall, the group’s findings suggest that driver mutations can occur years before cancer is diagnosed, which has implications for early detection and biomarker development,” they write.
However, they also note that although inferential analyses such as those undertaken by the PCAWG expand on observation-based sequencing studies and provide a deeper understanding of cancer, they often carry a higher degree of uncertainty.
Entire Cancer Genome Analyzed ― Coding and Noncoding Regions
At a Nature press conference, Lincoln D. Stein, MD, PhD, member of the project steering committee and head of adaptive oncology at the Ontario Institute for Cancer Research in Canada, explained why the information and new understanding gleaned from the PCAWG is different from what had been learned from exome sequencing.
“Fourteen years ago, the genomic community sequenced its first cancer exome [the protein-coding region], and it was able to identify mutations in roughly 20,000 protein-coding genes in the human cell. And in the next 10 years, the community sequenced nearly 20,000 cancer exomes,” he said.
Stein explained that although the portrait of sequences that emerged from exome sequencing was a gold mine that provided insights into cancer biology and advanced precision treatment through targeted therapy, it represented a mere 1% of the genome.
“Assembling an accurate portrait of the cancer genome using just the exome data is like putting together a 100,000 piece jigsaw puzzle when you’re missing 99% of the pieces and there is no picture box with the completed picture to guide you,” he said.
The PCAWG built on the work of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) cancer sequencing projects to uniformly analyze more than 2500 whole-cancer genomes, covering 100% of the cancer genome.
Stein pointed out that the PCAWG researchers have released all the raw, processed, and interpreted data to the research community as a legacy dataset that will continue to provide data for years to come. The PCAWG landing page provides links to several data resources for interactive online browsing.
The Pan-Cancer Analysis
For the pan-cancer analysis, whole-genome sequence data were available for 2605 primary tumors and 173 metastases or local recurrences, which corresponded with 2658 cancer genomes with matched normal tissues. After an alignment with the human genome, the data were analyzed for mutations using three pipelines ― somatic single-nucleotide variants (SNVs); small insertions and deletions (indels); and copy-number alterations and structural variants (SVs), which are rearrangements of large portions of the DNA.
Impressively, the cataloguing uncovered >43 million somatic SNVs, >400,000 somatic multinucleotide variants, >2.4 million somatic indels, close to 290,000 somatic SVs, >19,000 somatic retrotransposition events, and >8000 de novo mitochondrial DNA mutations. The analysis reported considerable heterogeneity in the burden of somatic mutations across patients as well as tumor types.
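To make the variant classes concrete, the following is a minimal Python sketch of how a small variant might be labeled from its reference and alternate alleles. It is an illustration only, not the PCAWG pipelines: the function name classify_variant and the 50-base cutoff separating small indels from structural variants are assumptions introduced here.

```python
def classify_variant(ref: str, alt: str, max_indel_len: int = 50) -> str:
    """Label a variant by comparing its reference and alternate alleles.

    max_indel_len is a hypothetical cutoff chosen for illustration; it is
    not a value taken from the PCAWG papers.
    """
    if not ref or not alt or ref == alt:
        raise ValueError("ref and alt must be non-empty and different")
    if len(ref) == len(alt):
        # Same length: a substitution of one base (SNV) or several (MNV).
        return "SNV" if len(ref) == 1 else "MNV"
    # Different lengths: bases were inserted or deleted.
    size = abs(len(ref) - len(alt))
    return "indel" if size <= max_indel_len else "SV"


examples = [("A", "G"), ("AC", "GT"), ("A", "ATT"), ("ACGT", "A")]
labels = [classify_variant(ref, alt) for ref, alt in examples]
print(labels)  # ['SNV', 'MNV', 'indel', 'indel']
```

Real structural-variant calling relies on read-pair and split-read evidence rather than allele lengths alone, so the last branch is only a placeholder for that class.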
“Cataloguing and understanding driver mutations has been a major goal of the cancer research community for the past 2 decades, and although there are a limited number of driver mutations, there are thousands or tens of thousands of combinations of these drivers,” Stein said.
The researchers found that 91% of cancers had at least one identifiable driver mutation; the average was 4.6 drivers per cancer. No drivers were reported for approximately 5% of the cancers.
Across tumor types, driver SVs were more prevalent in breast and ovary adenocarcinomas. Driver point mutations played a major role in colorectal adenocarcinomas and mature B-cell lymphomas. Many driver mutations that were associated with tumor suppressors were two-hit inactivation events: in 77% of mutations in TP53, both alleles were mutated.
No driver mutations were seen in a high proportion of patients with chromophobe renal cell carcinoma and pancreatic neuroendocrine cancers, tumors with considerable chromosomal aneuploidy. “Certain combinations of whole-chromosome gains and losses may be sufficient to initiate a cancer in the absence of more targeted driver events,” the PCAWG researchers write.
At the press conference, Peter Campbell, MD, PhD, member of the PCAWG steering committee and head of cancer, ageing, and somatic mutation at the Wellcome Sanger Institute in the United Kingdom, explained the significance of these findings. “When treating patients, I was amazed at how two patients would have tumors that looked the same, were treated the same, but have two completely opposite outcomes. One would survive and one would die,” he said.
He elaborated that the PCAWG has laid bare the reasons for the unpredictability of outcomes. “The reasons are written in the genome. The striking finding is just how different one person’s cancer genome is from another person’s — thousands of different combinations of mutations that can cause the cancer and more than 80 different underlying processes, leading to different patterns in the genome,” he said.
Knowledge of the patterns of mutations and of the genes they affect can help to identify which type of cancer a patient is likely to have. That is useful for the 5% of patients who receive an initial diagnosis of carcinoma of unknown primary origin, for whom standard tools cannot identify the tumor type, Campbell elaborated.
Mutations in Noncoding Regions
In their analysis of noncoding somatic drivers, the PCAWG researchers call into question previously reported noncoding drivers, such as the long noncoding RNAs NEAT1 and MALAT1, but also reveal new ones.
They found single-site hotspots in the TERT promoter, which were associated with higher TERT expression. Somatic mutations were found in the promoter and/or 5′ UTR (untranslated region) of MTG2 (coding for a GTPase) and in the 3′ UTR of TOB1 (important in gene regulation), NFKBIZ, and ALB.
TOB1 has implications for gastric and breast cancers, MTG2 codes for a GTPase, and NFKBIZ codes for a transcription factor. Mutations in the noncoding regions of these genes are associated with decreased or increased expression, which is in turn linked to tumorigenesis.
In cases in which promoter mutations were reported for TP53, there were no additional mutations in the coding region, suggesting for the first time a powerful inactivation of TP53 from noncoding mutations.
The researchers also reported novel structural-variant mutations that were candidates for driving oncogenesis, such as rearrangements involving AKR1C genes and BRD4.
However, noncoding drivers were found to be rare events. “Larger datasets and technological advances will continue to identify new non-coding drivers, albeit at considerably lower frequencies than protein-coding drivers,” the researchers comment.
“For cancer patients, this means that the vast majority of clinically relevant mutations in a cancer are likely to be found in protein-coding sequences, which will simplify efforts for the clinical use of genome sequencing in cancer,” said Iñigo Martincorena, PhD, of the Wellcome Sanger Institute and a PCAWG researcher.
The researchers also addressed whether events occurred early or late in the evolution of the cancer. The concept of molecular clocks was addressed by determining whether the mutations were clonal (occurring in all the cancer clones) or subclonal (occurring in a fraction of the cancer clones), with clonal events occurring before subclonal ones.
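The clonal versus subclonal distinction can be sketched in a few lines of Python. This is a toy illustration of the definition only: it assumes clone membership is already known, whereas real analyses must infer it from sequencing data. The clone and mutation names, and the function classify_clonality, are hypothetical.

```python
def classify_clonality(mutation_clones: dict[str, set[str]],
                       all_clones: set[str]) -> dict[str, str]:
    """Label each mutation as clonal or subclonal.

    A mutation is clonal if it is present in every clone of the tumor,
    and subclonal if it is present in only a fraction of them.
    """
    labels = {}
    for mutation, clones in mutation_clones.items():
        # Set comparison: does this mutation's clone set cover all clones?
        labels[mutation] = "clonal" if clones >= all_clones else "subclonal"
    return labels


# Hypothetical tumor with three clones and three mutations.
all_clones = {"clone_A", "clone_B", "clone_C"}
mutation_clones = {
    "mut_1": {"clone_A", "clone_B", "clone_C"},  # in every clone
    "mut_2": {"clone_B"},                        # in one clone only
    "mut_3": {"clone_A", "clone_C"},             # in two of three clones
}

clonality = classify_clonality(mutation_clones, all_clones)
print(clonality)
# {'mut_1': 'clonal', 'mut_2': 'subclonal', 'mut_3': 'subclonal'}
```

Because a mutation shared by every clone must have arisen before the clones diverged, the clonal label doubles as a statement about relative timing.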
Of 47 million point mutations in 2583 unique samples, the researchers noted that 22% were early clonal events; 7%, late clonal; 53%, unspecified clonal; and 17%, subclonal. In a panel of 453 cancer driver genes, they identified 5913 oncogenic point mutations, of which 29% were early clonal, 5% late clonal, 56% unspecified clonal, and 8% subclonal events.
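For a sense of scale, the reported percentages can be turned into approximate counts. The short Python sketch below uses only the rounded figures quoted above, so the results are rough estimates rather than numbers from the papers; the variable and function names are introduced here for illustration.

```python
# Rounded figures as quoted in the article; percentages do not sum to 100
# because of rounding in the source.
TOTAL_POINT_MUTATIONS = 47_000_000
ALL_SHARES = {"early clonal": 22, "late clonal": 7,
              "unspecified clonal": 53, "subclonal": 17}

DRIVER_POINT_MUTATIONS = 5913
DRIVER_SHARES = {"early clonal": 29, "late clonal": 5,
                 "unspecified clonal": 56, "subclonal": 8}


def approximate_counts(total: int, shares_percent: dict[str, int]) -> dict[str, int]:
    """Convert percentage shares of a total into rounded counts."""
    return {label: round(total * pct / 100)
            for label, pct in shares_percent.items()}


all_counts = approximate_counts(TOTAL_POINT_MUTATIONS, ALL_SHARES)
driver_counts = approximate_counts(DRIVER_POINT_MUTATIONS, DRIVER_SHARES)

print(all_counts["early clonal"])     # 10340000
print(driver_counts["early clonal"])  # 1715
```

On these rounded figures, early clonal events make up a larger share of driver point mutations (29%) than of point mutations overall (22%).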
Driver mutations in KRAS, TP53, and PIK3CA and noncoding driver mutations in TERT were deemed early events in the evolution of the cancers. In addition, 50% of early drivers were noted in nine genes; 50% of late and subclonal mutations were noted in as many as 35 genes. Indeed, the researchers noted that the mutational signature changes over time in 40% of the tumors.
How Does a Clinician Handle This Wealth of Information?
“The Pan-Cancer Project provides the roadmap to how the genomes should be collected and analyzed, and we developed standardized pipelines of high quality to analyze the genomes and present them in a unified fashion, which allows data from multiple centers to be analyzed in a uniform and harmonious way,” Stein commented at the press briefing.
In response to a question from Medscape Medical News about how this catalogue of mutations can inform treatment decisions, Stein said: “We found the same pathways that had been identified by exome sequencing, but many more ways to enter in and change those pathways.”
He further explained that with the increased ability to identify which pathways and genes are altered in a particular cancer, clinicians may now more accurately diagnose what changes have occurred in the patient’s tumor, identify the dysregulated pathway, and assign that patient to the therapy that is likely to be effective and least likely to have toxic effects. “We can use existing targeted therapy or targeted therapies that will be developed in the future,” he said.
For example, mutations in KRAS can be treated (and already are treated) with KRAS-targeted drugs, but the new information about mutations in the KRAS protein-coding region or amplification and structural rearrangements in KRAS suggests that tumors with these defects may also be treated with the same drug.
“Prior to this, looking just at the exome sequencing, we would be in the dark with roughly a third of the patients who came into the clinic. We would have no information to go on to do precision oncology,” he added.
However, a major limitation of these studies is the absence of clinical data correlating outcomes with treatments received by patients.
An ongoing ICGC ARGO project, which launched its website to coincide with these publications, aims to do just that: collect information from 100,000 cancer patients in clinical trials, as well as patients’ outcomes and genomic information.
Stein indicated that multiple national and international precision oncology projects will also be launched to collect genomic sequencing from patients in clinical trials or undergoing standard-of-care treatment. That information will be uploaded into databases that collect the key clinical information about the patient, tumor type, therapy received, and response, he continued.
“Once we collect enough data that correlate genomic data with clinical outcomes, we can begin to build machine learning systems and decision support systems that will help the clinician match the patient’s molecular and genomic information to therapeutic outcomes,” Stein said.
However, cancer genome testing is not going to be available at any cancer clinic anytime soon. Unlike gene panel testing, which is now available, the pipelines and standardizations of whole-genome sequencing will have to be imported to centers. “We can move analysis tools around for people to use; however, moving a research tool to a personalized diagnostic tool remains a challenge,” Campbell said.
In addition, there are costs associated with running these tools and analyzing the data that emerge. “It is reasonably expensive now, but…the costs will be coming down. This technology won’t be available tomorrow in a community setting, but in a couple years it will be more freely available,” Campbell said.
Funding information for the PCAWG and the authors’ relevant financial relationships are listed at the end of each report.
Nature. Published online February 5, 2020. PCAWG overview, Full text; Pan-cancer analysis, Full text; Mutational signatures, Full text; Structural variants, Full text; Noncoding somatic drivers, Full text; Molecular clocks, Full text