Extended Data Fig. 1: Validation of IPA isoforms by independent methods and identification of CLL-IPAs used for further analysis. | Nature


From: Widespread intronic polyadenylation inactivates tumour suppressor genes in leukaemia


a, RNA-seq data were used to validate the presence of IPA isoforms using a generalized linear model (GLM). RNA-seq reads were counted within two 100-nucleotide windows (green bars) located upstream and downstream of the IPA peak and separated by 51 nucleotides. An IPA peak was considered validated if adjusted P < 0.1 (see Methods). Of the n = 5,587 IPA isoforms tested, n = 1,662 were validated by this method. MGA is shown as a representative example.

b, Because only a fraction of IPA isoforms were validated by the method in a, additional methods were used to obtain independent evidence for their presence. These were untemplated adenosines in RNA-seq reads and detection of the IPA isoform by other 3′-seq protocols (ref. 10). Most of the immune cell types used in this study have not been investigated with other 3′-seq protocols, and IPA isoform expression is cell-type-specific (ref. 2). For these reasons, highly expressed IPA isoforms (>10 TPM) were retained for further analysis even if no supporting reads were found with other protocols.

c, Hierarchical clustering based on IPA site usage separates the 3′-seq dataset into four groups. CD5+ B samples are separated from CLL samples, and the CLL samples fall into three groups. Shown is the usage difference of the 20% most variable IPA isoforms across the dataset (n = 342). Four of the thirteen CLL samples cluster away from the rest of the samples and are characterized by a high number of IPA isoforms (CLL high).

d, The GLM (FDR-adjusted P < 0.1, IPA usage difference ≥ 0.05, IPA isoform expression in CD5+ B < 8 TPM) identified two sets of CLL-IPAs:

- 477 recurrent CLL-IPAs, significantly upregulated in at least 2 of 13 CLL samples by 3′-seq.
- 454 non-recurrent CLL-IPAs, significantly upregulated in only 1 of 13 CLL samples by 3′-seq.

These IPAs were validated in an independent RNA-seq dataset containing 46 new CLL samples, using the RNA-seq GLM described in a. Of the testable IPAs, 71% of recurrent and 64% of non-recurrent IPAs were verified.

e, Plotting the number of CLL-IPAs per sample separates the 13 CLL samples investigated by 3′-seq into two groups. The 4 CLL-high samples generate many CLL-IPAs (median, 100 CLL-IPAs per sample; range, 42–274). The remaining 9 CLL-low samples generate fewer (median, 9; range, 5–28). Centre bar denotes the median; error bars denote the interquartile range. **P = 0.003, two-sided Mann–Whitney U-test.
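The window geometry in panel a can be made concrete with a short sketch. It is an illustration under stated assumptions, not the authors' pipeline. It assumes the following:

- The gene is on the plus strand.
- The 51-nucleotide gap is centred on the peak nucleotide (25 nt on either side).
- Each read is represented by a single coordinate.

The GLM itself (see Methods) is not reproduced. The helper names `flanking_windows` and `count_reads` are hypothetical.

```python
from bisect import bisect_left

WINDOW = 100  # window length in nucleotides (stated in the legend)
GAP = 51      # nucleotides separating the two windows (stated in the legend)


def flanking_windows(peak: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Half-open (start, end) windows upstream and downstream of an IPA peak.

    Assumption: plus-strand gene, with the 51-nt gap centred on the peak
    nucleotide (25 nt on either side of it).
    """
    half = GAP // 2  # 25
    upstream = (peak - half - WINDOW, peak - half)
    downstream = (peak + half + 1, peak + half + 1 + WINDOW)
    return upstream, downstream


def count_reads(sorted_positions: list[int], window: tuple[int, int]) -> int:
    """Count reads whose single (assumed) coordinate lies in [start, end)."""
    start, end = window
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)


# Hypothetical toy read coordinates around a peak at position 1000.
reads = sorted([880, 900, 974, 975, 1000, 1025, 1026, 1125, 1126])
up, down = flanking_windows(1000)  # ((875, 975), (1026, 1126))
print(count_reads(reads, up), count_reads(reads, down))  # 3 2
```

The two counts per IPA peak are the kind of input a per-window count model would take. How the GLM compares them is described in the Methods and is not assumed here.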
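The selection rule in panel d combines three per-sample thresholds with a recurrence count. The sketch below expresses that rule in code. The record layout (`IpaCall` and its fields) is hypothetical. So is the reading of "usage difference" as CLL usage minus CD5+ B usage. Only the thresholds (FDR < 0.1, usage difference ≥ 0.05, CD5+ B expression < 8 TPM) and the recurrence definition come from the legend.

```python
from dataclasses import dataclass


@dataclass
class IpaCall:
    """One IPA isoform tested in one CLL sample (hypothetical record layout)."""
    ipa_id: str
    sample: str
    fdr: float         # FDR-adjusted P value from the GLM
    usage_diff: float  # assumed: IPA usage in the CLL sample minus usage in CD5+ B
    cd5_b_tpm: float   # IPA isoform expression in CD5+ B (TPM)


def is_significant(call: IpaCall) -> bool:
    """Apply the three thresholds stated in panel d."""
    return call.fdr < 0.1 and call.usage_diff >= 0.05 and call.cd5_b_tpm < 8


def classify(calls: list[IpaCall]) -> tuple[set[str], set[str]]:
    """Split significant IPAs into recurrent (>= 2 samples) and non-recurrent (1 sample)."""
    samples_by_ipa: dict[str, set[str]] = {}
    for call in calls:
        if is_significant(call):
            samples_by_ipa.setdefault(call.ipa_id, set()).add(call.sample)
    recurrent = {ipa for ipa, s in samples_by_ipa.items() if len(s) >= 2}
    non_recurrent = {ipa for ipa, s in samples_by_ipa.items() if len(s) == 1}
    return recurrent, non_recurrent
```

Counting distinct samples per IPA, rather than raw calls, keeps a duplicated record from turning a non-recurrent IPA into a recurrent one.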
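The P value in panel e can be checked arithmetically. The ranges of the two groups do not overlap (42–274 versus 5–28), so every CLL-high sample exceeds every CLL-low sample. The Mann–Whitney U for the CLL-high group is therefore its maximum, 4 × 9 = 36. The sketch enumerates all C(13, 4) = 715 equally likely rank assignments under the null hypothesis, assuming no ties (the legend does not report tie structure). Only the two most extreme assignments, U = 36 and U = 0, are at least as far from the mean as the observed value. The exact two-sided P is therefore 2/715 ≈ 0.0028, consistent with the reported P = 0.003. This is a consistency check, not a claim about the software the authors used. `exact_mwu_two_sided` is a hypothetical helper.

```python
from itertools import combinations


def exact_mwu_two_sided(n1: int, n2: int, observed_u: float) -> float:
    """Exact two-sided Mann-Whitney P for group 1's U, assuming no ties.

    Enumerates every way of assigning ranks 1..n1+n2 to group 1. Because the
    null distribution of U is symmetric about n1*n2/2, counting assignments at
    least as far from that centre as observed_u gives the two-sided P.
    """
    n = n1 + n2
    centre = n1 * n2 / 2
    extreme = 0
    total = 0
    for ranks in combinations(range(1, n + 1), n1):
        u = sum(ranks) - n1 * (n1 + 1) / 2  # U statistic for this assignment
        total += 1
        if abs(u - centre) >= abs(observed_u - centre):
            extreme += 1
    return extreme / total


# 4 CLL-high vs 9 CLL-low samples, complete separation (U = 36).
p = exact_mwu_two_sided(4, 9, 36)
print(round(p, 4))  # 0.0028
```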
