Cell Rep Methods. 2022 Sep 9;2(9):100294.
doi: 10.1016/j.crmeth.2022.100294. eCollection 2022 Sep 19.

Sensitive and reproducible cell-free methylome quantification with synthetic spike-in controls


Samantha L Wilson et al. Cell Rep Methods.

Abstract

Cell-free methylated DNA immunoprecipitation sequencing (cfMeDIP-seq) identifies methylated genomic regions. It uses a protocol adapted for low-input DNA samples and for cell-free DNA (cfDNA). We developed a set of synthetic spike-in DNA controls for cfMeDIP-seq that provides a simple and inexpensive reference for quantitative normalization. We designed 54 DNA fragments with combinations of four properties:

- methylation status (methylated and unmethylated)
- fragment length (80 bp, 160 bp, 320 bp)
- G + C content (35%, 50%, 65%)
- fraction of CpG dinucleotides within the fragment (1/80 bp, 1/40 bp, 1/20 bp)

Using 0.01 ng of spike-in controls enables training a generalized linear model that provides absolute quantification of methylated cfDNA in MeDIP-seq experiments. This normalization mitigates batch effects. It also corrects for enrichment biases caused by known biophysical properties of DNA fragments and for other technical biases.

Keywords: DNA methylation; absolute quantification; batch effects; cell-free DNA; cell-free methylated DNA immunoprecipitation; cfDNA; cfMeDIP; early detection of cancer; liquid biopsy; minimally invasive testing; reference standards; spike-in controls.
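The fragment count of 54 follows from the listed factor levels if every combination is made exactly once: 2 methylation states × 3 lengths × 3 G + C levels × 3 CpG densities = 54. The abstract does not explicitly say that the design is a full factorial, so treat the Python sketch below as an illustration under that assumption. The names `design_grid` and `expected_cpgs` are hypothetical and not part of the authors' software.

```python
from itertools import product

# Factor levels listed in the abstract.
METHYLATION = ("methylated", "unmethylated")
LENGTHS_BP = (80, 160, 320)
GC_PERCENT = (35, 50, 65)
CPG_FRACTIONS = ("1/80", "1/40", "1/20")  # one CpG per 80, 40 or 20 bp


def design_grid():
    """Enumerate every factor combination (assumes a full factorial design)."""
    return [
        {"methylation": m, "length_bp": length, "gc_percent": gc, "cpg_fraction": cpg}
        for m, length, gc, cpg in product(METHYLATION, LENGTHS_BP, GC_PERCENT, CPG_FRACTIONS)
    ]


def expected_cpgs(length_bp, cpg_fraction):
    """CpG dinucleotides implied by a fragment length and a '1/N' density."""
    _, per_bp = cpg_fraction.split("/")
    return length_bp // int(per_bp)


grid = design_grid()
print(len(grid))  # 54
print([expected_cpgs(n, "1/20") for n in LENGTHS_BP])  # [4, 8, 16]
```

At a fixed density of one CpG per 20 bp, the 80 bp, 160 bp and 320 bp fragments carry 4, 8 and 16 CpGs, respectively.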


Conflict of interest statement

S.L.W., S.Y.S., T.T., D.D.D.C., and M.M.H. are inventors on patent application PCT/CA2020/051507 related to the synthetic spike-in controls, licensed to Adela. S.Y.S., S.V.B., and D.D.D.C. are inventors on other patent applications related to cfDNA methylation analysis technologies, licensed to Adela, serve in leadership roles at Adela, and own equity in Adela. S.V.B. is an inventor on a patent related to cfDNA mutation analysis technologies, licensed to Roche Molecular Diagnostics. S.V.B. and D.D.D.C. have received research funding from Nektar Therapeutics.

Figures

Figure 1. Experimental design using synthetic spike-in control DNA. (A) Technical assessment of the spike-in controls with a cfDNA mimic. (Left) Assessment of technical bias in the spike-in controls alone. (Right) Optimization of the amount of synthetic DNA using a sheared HCT116 cfDNA mimic. (B) Clinical evaluation of acute myeloid leukemia (AML) patient samples with spike-in controls.
Figure 2. Assessing biases in fragment length, G + C content, and CpG fraction. (A) Input spike-in control DNA without cfMeDIP-seq. (B) Output spike-in control DNA after cfMeDIP-seq. (C) 0.01 ng of spike-in control DNA added to HCT116 replicates. Blue, methylated fragments; gray, unmethylated fragments. Circle, sample 1; triangle, sample 2. Solid line, mean of the two samples. Columns marked 1 and 2 represent alternative sets of fragments with identical properties but different sequences. See also Table S2.
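Figure 2 stratifies the spike-ins by fragment length, G + C content and CpG fraction. The Python helper below is a hedged illustration of how those three properties can be measured from a sequence, applied to a toy fragment. The spike-in sequences themselves are not reproduced here, and `fragment_properties` is a hypothetical name.

```python
def fragment_properties(seq):
    """Return length, G + C percentage and CpG dinucleotides per bp of a fragment."""
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    # A CpG dinucleotide is a C immediately followed by a G on the same strand.
    cpg = sum(1 for i in range(length - 1) if seq[i:i + 2] == "CG")
    return {
        "length_bp": length,
        "gc_percent": 100.0 * gc / length,
        "cpg_per_bp": cpg / length,
    }


props = fragment_properties("ACGTACGTAA")
print(props)  # {'length_bp': 10, 'gc_percent': 40.0, 'cpg_per_bp': 0.2}
```

Counting on one strand is enough for a double-stranded fragment, because CG is its own reverse complement.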
Figure 3. Two-dimensional histograms of the number of reads in 300 bp windows. (A and B) Windows binned by molar amount and either (A) standard deviation of molar amount or (B) Umap k100 multi-read mappability. Histograms include only windows that meet two criteria: they do not overlap UCSC simple repeats or the ENCODE blacklist, and they have Umap k100 multi-read mappability scores >0.5. Asterisks indicate 11 outlier genomic windows chosen for further examination.
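The window filter in Figure 3, which Figure 6D also uses, amounts to an interval-overlap test plus a mappability threshold. The Python sketch below makes two assumptions: half-open, BED-style coordinates and a single mappability score per window. The `Window` type, `filter_windows`, and the example intervals are hypothetical. The sketch scans the excluded intervals linearly, which is fine for illustration; genome-wide use would need an indexed interval structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    chrom: str
    start: int          # 0-based, inclusive
    end: int            # 0-based, exclusive (half-open, as in BED)
    mappability: float  # assumed: one Umap k100 summary score per window


def overlaps(window, intervals):
    """True if the window overlaps any (chrom, start, end) half-open interval."""
    return any(
        chrom == window.chrom and start < window.end and window.start < end
        for chrom, start, end in intervals
    )


def filter_windows(windows, excluded, min_mappability=0.5):
    """Keep windows outside excluded regions with mappability > min_mappability."""
    return [
        w for w in windows
        if not overlaps(w, excluded) and w.mappability > min_mappability
    ]


windows = [
    Window("chr1", 0, 300, 0.90),
    Window("chr1", 300, 600, 0.40),   # dropped: mappability <= 0.5
    Window("chr1", 600, 900, 0.95),   # dropped: overlaps an excluded region
    Window("chr2", 0, 300, 0.80),
]
# Hypothetical stand-in for merged simple-repeat and blacklist intervals.
excluded = [("chr1", 850, 1000)]
kept = filter_windows(windows, excluded)
print([(w.chrom, w.start) for w in kept])  # [('chr1', 0), ('chr2', 0)]
```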
Figure 4. Correlation between two measurements of fragment methylation for 300 bp genomic windows: cfMeDIP and EPIC array M-values. (A, C, E, and G) Molar amount calculated from HCT116 samples, correlated with EPIC array M-values. (B, D, F, and H) Read counts calculated from the same samples, ignoring the spike-in controls. (A and B) 37,714 windows with ≥3 CpG probes represented on the EPIC array. (C and D) 7,975 windows with ≥5 CpG probes represented on the EPIC array. (E and F) 2,066 windows with ≥7 CpG probes represented on the EPIC array. (G and H) 158 windows with ≥10 CpG probes represented on the EPIC array. Solid black line, linear model of best fit; dashed red line, loess (Cleveland 1979) local regression.
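Figure 4 compares cfMeDIP estimates with EPIC array M-values. For a probe with methylation β value, M = log2(β / (1 − β)). The caption does not say how probes within a window are combined or which correlation statistic underlies the plots. The Python sketch below therefore assumes the mean probe M-value per window and Pearson's r. The function names and the toy data are hypothetical.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """M = log2(beta / (1 - beta)); beta is clipped so the log stays finite."""
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


def window_m_value(probe_betas, min_probes=3):
    """Mean M-value of the probes in a window, or None if there are too few probes."""
    if len(probe_betas) < min_probes:
        return None
    return sum(beta_to_m(b) for b in probe_betas) / len(probe_betas)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data: (probe betas in a window, cfMeDIP molar estimate), illustrative only.
windows = [
    ([0.1, 0.2, 0.15], 0.02),
    ([0.5, 0.6, 0.55], 0.10),
    ([0.9, 0.85, 0.8], 0.30),
    ([0.7, 0.9], 0.25),  # only 2 probes: excluded at min_probes=3
]
pairs = []
for betas, molar in windows:
    m = window_m_value(betas)
    if m is not None:
        pairs.append((m, molar))
r = pearson_r([m for m, _ in pairs], [molar for _, molar in pairs])
```

Raising `min_probes` to 5, 7 or 10 mirrors the progressively stricter window sets in panels C through H.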
Figure 5. Mean absolute error between known and predicted molar amounts in test data consisting of held-out spike-ins not used for training. For each number of training spike-ins from 6 to 25 inclusive, we ran 100 iterations. In each iteration we randomly selected that number of spike-ins as training data and used the remaining spike-ins as test data. Each point shows the mean absolute error over all test spike-ins for one iteration. The vertical limits of the plot include at least 98 of 100 iterations in every case. Outliers for 6 or 7 training spike-ins are marked with a cross at the top of the plot and labeled with the mean absolute error for that case. Red line, median of the mean absolute errors. See also Table S6.
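Figure 5 describes a repeated random hold-out evaluation. The caption does not give the model formula, so the Python sketch below substitutes the simplest stand-in: an ordinary least-squares line predicting molar amount from read count alone. It is meant only to show the resampling scheme:

- training sizes from 6 to 25
- 100 random splits per size
- mean absolute error on the held-out spike-ins
- the median error per size

All function names and the toy data are hypothetical.

```python
import random
from statistics import mean, median


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (a Gaussian GLM with identity link)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("training read counts are all identical")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def holdout_mae(spikes, n_train, rng):
    """Train on n_train random spike-ins; return mean absolute error on the rest.

    spikes: list of (read_count, known_molar_amount) pairs.
    """
    chosen = set(rng.sample(range(len(spikes)), n_train))
    train = [spikes[i] for i in chosen]
    test = [s for i, s in enumerate(spikes) if i not in chosen]
    a, b = fit_line([r for r, _ in train], [m for _, m in train])
    return mean([abs(a + b * reads - molar) for reads, molar in test])


def evaluate(spikes, sizes=range(6, 26), iterations=100, seed=0):
    """For each training-set size, repeat the random split `iterations` times."""
    rng = random.Random(seed)
    return {k: [holdout_mae(spikes, k, rng) for _ in range(iterations)] for k in sizes}


# Toy, noise-free data: 54 spike-ins whose molar amount is linear in read count.
spikes = [(reads, 0.5 + 0.01 * reads) for reads in range(100, 154)]
results = evaluate(spikes)
median_mae = {k: median(v) for k, v in results.items()}
```

With the noise-free toy data, every error is essentially zero. With real spike-in counts, the spread of `results[k]` across iterations corresponds to the points in the figure, and `median_mae` corresponds to the red line.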
Figure 6. Principal component analyses of cfMeDIP results normalized by four different strategies, and associations with experimental variables. (Left) Proportion of variance explained by each principal component. (Right) Association between each principal component and known technical and clinical variables. Cohen's d is an effect size: the standardized difference between the means of two groups defined by a variable. ***p < 0.001. (A) Raw read counts without normalization. (B) Read counts normalized using QSEA. (C) Data normalized using spike-in controls. (D) Data normalized using spike-in controls, after removing regions in UCSC simple repeats, in the ENCODE blacklist, or with Umap k100 multi-read mappability scores ≤0.5. See also Tables S3, S4, and S5.
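Figure 6 summarizes each principal component's association with a variable using Cohen's d. The caption does not specify which variant was used. The Python sketch below assumes the common form for a two-level variable: the difference between the group means of the principal-component scores, divided by the pooled sample standard deviation. The PC scores are made up for illustration.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in group means divided by the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical PC1 scores for samples split by a two-level variable (e.g. batch).
pc1_batch1 = [1.0, 2.0, 3.0]
pc1_batch2 = [3.0, 4.0, 5.0]
d = cohens_d(pc1_batch1, pc1_batch2)
print(d)  # -2.0
```

The sign only reflects which group is listed first; the magnitude |d| is what indicates how strongly a component separates the two groups.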


References

1. Altschul S.F., Madden T.L., Schäffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
2. Amemiya H.M., Kundaje A., Boyle A.P. The ENCODE blacklist: identification of problematic regions of the genome. Sci. Rep. 2019;9:9354.
3. Blackburn J., Wong T., Madala B.S., Barker C., Hardwick S.A., Reis A.L.M., Deveson I.W., Mercer T.R. Use of synthetic DNA spike-in controls (sequins) for human genome sequencing. Nat. Protoc. 2019;14:2119–2151.
4. Chen K., Hu Z., Xia Z., Zhao D., Li W., Tyler J.K. The overlooked fact: fundamental need for spike-in control for virtually all genome-wide analyses. Mol. Cell Biol. 2015;36:662–667.
5. Chen S., Zhou Y., Chen Y., Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890.
