Genome Biol. 2011;12(4):R41.
doi: 10.1186/gb-2011-12-4-r41. Epub 2011 Apr 28.

GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers

Craig H Mermel et al. Genome Biol. 2011.

Abstract

We describe methods with enhanced power and specificity to identify genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth. By separating SCNA profiles into underlying arm-level and focal alterations, we improve the estimation of background rates for each category. We additionally describe a probabilistic method for defining the boundaries of selected-for SCNA regions with user-defined confidence. Here we detail this revised computational approach, GISTIC2.0, and validate its performance in real and simulated datasets.

Figures

Figure 1
Schematic overview of the copy-number analysis framework. High-level overview of our cancer copy-number analysis framework, highlighting specific differences between the original GISTIC algorithm [15] and the GISTIC 2.0 pipeline described in this manuscript. The first step, accurate identification of the copy-number profile in each sample, is common to GISTIC and GISTIC2.0.
Figure 2
Computational separation of arm-level and focal SCNAs. (a) Boxplot showing the distribution of copy-number changes for amplified focal (length < 98% of a chromosome arm) and arm-level (length > 98% of a chromosome arm) SCNAs across 178 GBM profiles from TCGA. The black dotted line denotes a typical low-level amplitude threshold used to eliminate artifactual SCNAs, while the green dotted line denotes a typical high-level amplitude threshold used in the previous version of GISTIC to eliminate arm-level SCNAs. (b) Histogram showing the frequency of observing SCNAs of a given length across 178 GBM samples. The high frequency of events occupying exactly one chromosome arm led us to distinguish between focal and arm-level SCNAs. (c) Heatmaps showing the total segmented copy-number profile of the TCGA GBM set (leftmost panel), and the results of computationally separating these samples into arm-level profiles (middle panel) and focal profiles (rightmost panel) by summing arm-level and focal SCNAs. In each heatmap, the chromosomes are arranged vertically from top to bottom and samples are arranged from left to right. Red and blue represent gain and loss, respectively.
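The length-based split described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the GISTIC2.0 implementation: the `Segment` type, the function names, and the choice to treat a segment of exactly 98% of an arm as focal are assumptions made for illustration. Only the 98% threshold comes from the caption.

```python
from dataclasses import dataclass

# Threshold taken from the Figure 2 caption: segments longer than 98% of a
# chromosome arm are treated as arm-level, shorter ones as focal.
ARM_LEVEL_FRACTION = 0.98


@dataclass
class Segment:
    """Hypothetical container for one copy-number segment on a single arm."""
    start: int         # start coordinate on the arm
    end: int           # end coordinate on the arm
    log2_ratio: float  # copy-number change of the segment


def classify_segment(seg: Segment, arm_length: int) -> str:
    """Label a segment by the fraction of the arm it covers."""
    fraction = (seg.end - seg.start) / arm_length
    # Assumption: a segment at exactly the threshold is counted as focal.
    return "arm-level" if fraction > ARM_LEVEL_FRACTION else "focal"


def split_profile(segments, arm_length):
    """Separate one arm's segments into (focal, arm_level) lists."""
    focal = [s for s in segments if classify_segment(s, arm_length) == "focal"]
    arm_level = [s for s in segments if classify_segment(s, arm_length) == "arm-level"]
    return focal, arm_level
```

Each category can then be summed across segments to obtain the separate focal and arm-level profiles that panel (c) displays.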
Figure 3
Effects of amplitude-based or length-based filtering of arm-level events on GISTIC results. (a-c) GISTIC amplification (top) and deletion (bottom) plots using all data and a low amplitude threshold (a), using all data and a high amplitude threshold (b), and using the focal data and a low amplitude threshold (c). The genome is oriented vertically from top to bottom, and GISTIC q-values at each locus are plotted from left to right on a log scale. The green line represents the significance threshold (q-value = 0.25). For each plot, known or interesting candidate genes are highlighted in black when identified by all three analyses, in red when identified by the high amplitude or focal length analyses, in purple when identified by the low amplitude or focal length analyses, and in green when identified only in the focal length analysis.
Figure 4
Sensitivity of peel-off to detect secondary driver events. The average fraction of secondary driver events recovered in independent (not containing the primary driver) peaks by GISTIC using the standard peel-off method (blue line) or arbitrated peel-off (red line) is shown for two simulated datasets. (a) The data are derived from 1,000 simulated chromosomes across 300 samples with a primary driver event present in 10% of samples and a secondary driver event a fixed distance away that is present in 5% of samples. (b) Data are derived from 10,000 simulated chromosomes across 300 samples with a primary driver event present in 10% of samples and a secondary driver event present in 5% of samples, where the fraction of the secondary driver events that overlapped with the primary driver event was varied between 100% (complete dependence; far left) and 0% (complete independence; far right). Error bars represent the mean ± standard error of the mean (some are too small to be visible).
Figure 5
Sensitivity of peak finding algorithms. (a) Schematic diagram demonstrating various peak finding methods. The left panel shows the GISTIC score profile for a simulated chromosome containing a mix of driver events covering the denoted target gene and passenger events randomly scattered across the chromosome. The inset at right shows the region around the maximal G-score (gray box in left panel) in higher detail. The MCR (red dotted lines) is defined as the region of maximal segment overlap, or the region of highest G-score. The leave-k-out procedure (blue dotted lines, here shown for k = 1) is obtained by repeatedly computing the MCR after leaving out each sample in turn and taking as the left and right boundaries the minimal and maximal extent of the MCR. RegBounder works by attempting to find a region (dotted green line) over which the variation between boundary and maximal peak score is within the gth percentile of the local range distribution (Supplementary Methods in Additional file 1). Here, RegBounder produces a wider region than either the MCR or leave-k-out procedures, but is the only method whose boundary contains the true driver gene. (b,c) The average fraction of driver events contained within the peak region (conditional on having found a GISTIC peak within 10 Mb) is plotted as a function of driver frequency (b) or sample size (c) for the MCR (red), leave-1-out (blue), and RegBounder algorithms (the latter at various confidence levels: 50%, magenta; 75%, green; 95%, black). In (b), data are derived from 10,000 simulated chromosomes across 500 samples in which the driver frequency varied from 1 to 10%. In (c), data are derived from 10,000 simulated chromosomes across a variable number of samples in which the driver frequency was fixed at 5%. Error bars represent the mean ± standard error of the mean (some are too small to be visible).
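The MCR and leave-1-out definitions in the Figure 5 caption can be sketched on toy data. This is an illustration under simplifying assumptions, not the GISTIC code: each sample contributes a single interval of marker indices, the number of overlapping samples stands in for the G-score, and ties between separate maxima are broken by taking the leftmost run. The function names `mcr` and `leave_one_out` are hypothetical. RegBounder is not sketched, because its details are in the Supplementary Methods.

```python
def mcr(intervals, n_markers):
    """Return the (left, right) inclusive bounds of the region of maximal overlap.

    `intervals` holds one (start, end) pair of inclusive marker indices per
    sample. Overlap count is a simplified stand-in for the G-score.
    """
    coverage = [0] * n_markers
    for start, end in intervals:
        for i in range(start, end + 1):
            coverage[i] += 1
    peak = max(coverage)
    left = coverage.index(peak)  # leftmost marker at the maximum
    right = left
    # Extend to the right while coverage stays at the maximum.
    while right + 1 < n_markers and coverage[right + 1] == peak:
        right += 1
    return left, right


def leave_one_out(intervals, n_markers):
    """Recompute the MCR with each sample removed in turn.

    The result spans the minimal left and maximal right boundary seen,
    following the leave-k-out description for k = 1.
    """
    lefts, rights = [], []
    for skip in range(len(intervals)):
        kept = intervals[:skip] + intervals[skip + 1:]
        left, right = mcr(kept, n_markers)
        lefts.append(left)
        rights.append(right)
    return min(lefts), max(rights)


samples = [(2, 8), (4, 9), (5, 7), (0, 5)]
print(mcr(samples, 10))            # (5, 5)
print(leave_one_out(samples, 10))  # (4, 7)
```

In this toy example a single marker forms the MCR, while removing one sample at a time widens the region to markers 4 through 7. This is the sense in which leave-1-out bounds are more conservative than the MCR.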
Figure 6
Comparison of RegBounder to MCR and leave-1-out procedures applied to primary lung adenocarcinomas. The advantages of RegBounder over previous peak-finding procedures are illustrated for two well-described oncogene peaks identified in GISTIC analysis of 371 lung adenocarcinoma samples characterized on the Affymetrix 250K StyI SNP array (as published in [16]). (a) A well-described amplification peak is identified on chromosome 12p12.1 with MCR (red dotted lines) near to but not containing the known lung cancer oncogene KRAS. Because there are more than two apparent passenger events in this region, the leave-1-out peak (blue dotted lines) also does not contain KRAS. However, RegBounder (green dotted lines) produces a wider peak that captures KRAS. (b) An amplification peak on chromosome 5p15.33 contains hTERT, the catalytic subunit of the human telomerase holoenzyme, within the MCR (red dotted lines). In this case, RegBounder (green dotted lines) produces a narrower peak region than the corresponding leave-1-out peak (blue dotted lines), demonstrating the ability of RegBounder to achieve a greater balance between peak region size and accuracy. In both (a) and (b), the y-axis depicts the amplification G-score and the x-axis denotes position along the corresponding chromosome.
Figure 7
Specificity of peak finding algorithms. (a,b) The median sizes of the peak regions produced by the MCR (red), leave-1-out (blue), and RegBounder (green, 75% confidence) are shown as a function of driver frequency (a) and sample size (b). In (a), data are derived from 10,000 simulated chromosomes across 500 samples in which the driver frequency varied from 1 to 10%. In (b), data are derived from 10,000 simulated chromosomes across a variable number of samples in which the driver frequency was fixed at 5%. (c) Comparison of the peak region sizes obtained by RegBounder (green line) with the theoretically minimal peak region sizes (black line) that could be obtained by any peak finding algorithm with a similar confidence level (Supplementary Methods in Additional file 1). Error bars represent the mean ± standard error of the mean (some are too small to be visible).

References

    1. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100:57–70. doi: 10.1016/S0092-8674(00)81683-9.
    2. Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature. 2009;458:719–724. doi: 10.1038/nature07943.
    3. Santarius T, Shipley J, Brewer D, Stratton MR, Cooper CS. A census of amplified and overexpressed human cancer genes. Nat Rev Cancer. 2010;10:59–64. doi: 10.1038/nrc2771.
    4. Beroukhim R, Mermel C, Porter D, Wei G, Raychaudhuri S, Donovan J, Barretina J, Boehm J, Dobson J, Urashima M. The landscape of somatic copy-number alteration across human cancers. Nature. 2010;463:899–905. doi: 10.1038/nature08822.
    5. Chiang D, Getz G, Jaffe D, O'Kelly M, Zhao X. High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods. 2009;6:99–103. doi: 10.1038/nmeth.1276.
