Abstract
CRISPR/Cas is a gene-editing technique that allows for the precise and specific introduction of a mutation into a DNA sequence. The effect of a mutation on the encoded protein depends on the type of mutation (deletion, insertion and/or substitution) and its position in the DNA sequence. This effect can be predicted once the mutation has been identified at nucleotide level by a suitable screening method. Here, several screening methods that differ in complexity, resolution, and scalability are discussed, and the results are interpreted by taking into account the central dogma of molecular biology. Two modules of the SMAP package, SMAP haplotype-window and SMAP effect-prediction, are proposed and implemented in a high-throughput screening workflow that allows for the automated and streamlined screening of CRISPR experiments.
1 Precision Gene Editing: Design Guided by Gene Structural Features
Gene editing, e.g., by CRISPR/Cas, is widely used for plant functional genomics research and has huge potential for targeted improvement of desired heritable traits in crops [1]. The precision and specificity of CRISPR/Cas allow for dedicated screening for induced mutant alleles at target sites. Combining the mutant alleles detected by molecular screens with the central dogma of molecular biology, which explains the relationship between the primary DNA sequence, gene structural features, and expression of the encoded protein (Fig. 5.1), allows prediction of the effect of a mutation on protein functionality. Thus, a comprehensive gene editing screening workflow to characterize the generated mutants ideally spans the entire path from "gene structure based" CRISPR/Cas and gRNA design, through molecular detection of the mutant alleles, to prediction of the effect of the mutation on the encoded protein sequence, hence capturing the actual outcome of gene editing. Here, we review the different types of mutations that can be introduced by variants of CRISPR/Cas gene editing, highlight several molecular screening and detection techniques, and place them in this overarching perspective.
2 Types of Mutations Introduced by CRISPR/Cas Mediated Gene Editing
In its basic form, CRISPR/Cas mediated gene editing introduces a double-stranded DNA break at a specific genomic position defined by the gRNA. Subsequent imperfect repair of the break via non-homologous end-joining (NHEJ) or homology-directed repair (HDR) introduces mutations in the target DNA sequence [3]. Mutations induced by CRISPR/Cas occur within a short range flanking the cleavage site near the protospacer adjacent motif (PAM) (e.g., 3–4 bp upstream of the PAM for Cas9). NHEJ creates allelic series of mutations, typically short deletions and insertions (from one up to tens of nucleotides) or a few substitutions. Potential target sites, cleavage efficiencies, and induced scarring patterns may be predicted from the gRNA sequence using machine learning, but accurate models for plants require further training on large-scale data sets [4, 5]. CRISPR/Cas is widely used to create targeted gene knockouts in several plant species. For example, Wang et al. [6] used CRISPR/Cas to knock out the susceptibility genes of the mildew-resistance locus (MLO) in wheat, generating wheat that is resistant to powdery mildew. Mutations that result in a non-functional encoded protein, as described for the mutations in the MLO genes, are called loss-of-function (LOF) mutations; they can occur when the open reading frame (ORF) downstream of the mutation is disrupted (out-of-frame indel or frameshift) [7].
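Two of the rules in this paragraph can be made concrete in a few lines of Python. The sketch below uses hypothetical function names and assumes SpCas9 with an NGG PAM; it scans only the strand that is passed in (the opposite strand would need a reverse complement) and places the blunt cut 3 bp upstream of the PAM. The frameshift test only looks at the net indel length within a coding sequence; it ignores splicing and in-frame indels that happen to create a STOP codon.

```python
from typing import List, Tuple


def spcas9_cut_sites(seq: str, protospacer_len: int = 20) -> List[Tuple[int, int]]:
    """Return (pam_start, cut_index) pairs for every NGG PAM on the given strand.

    cut_index is the 0-based index of the first base downstream of the blunt
    SpCas9 cut, which lies 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    sites = []
    # Scan for N-G-G, requiring a full-length protospacer upstream of the PAM.
    for pam_start in range(protospacer_len, len(seq) - 2):
        if seq[pam_start + 1:pam_start + 3] == "GG":
            sites.append((pam_start, pam_start - 3))
    return sites


def is_frameshift(deleted_bp: int, inserted_bp: int) -> bool:
    """A coding-sequence indel shifts the reading frame when the net length
    change is not a multiple of three."""
    return (inserted_bp - deleted_bp) % 3 != 0


target = "GATTACAGATTACAGATTAC" + "TGG" + "CA"  # toy 20-nt protospacer + PAM
print(spcas9_cut_sites(target))  # [(20, 17)]
print(is_frameshift(1, 0), is_frameshift(3, 0))  # True False
```

A 1-bp deletion therefore disrupts the downstream ORF, whereas a 3-bp deletion removes a single codon and keeps the frame intact.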
The CRISPR/gRNA complex may also be used as a location-specific vehicle to deliver DNA sequence modifiers to a given location and modify the primary sequence (as in base-editing or prime-editing) or the epigenetic state [8, 9]. In base-editing, a cytidine deaminase (C:G-to-T:A) or an adenosine deaminase (A:T-to-G:C) is linked to the CRISPR/gRNA complex and used to convert a single nucleotide at a specific position [8]. Base-editing can be used to create a specific point mutation, which may result in a single amino acid change in the protein sequence, a premature STOP codon, or an altered splice acceptor or donor site [10]. For example, in tomato and potato, base-editing was successfully used to convert a C into a T in the acetolactate synthase (ALS) gene, conferring resistance to herbicides [11]. Mutations that result in enhanced activity or functionality of the encoded protein, as described for the mutation in the ALS gene, are called gain-of-function (GOF) mutations.
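The possible outcomes of a single base edit (silent, missense, or nonsense) follow directly from the genetic code. The hypothetical helper below is a minimal sketch that assumes the edited C lies on the coding strand of an in-frame coding sequence starting at index 0. It does not model the editing window of a particular base editor, nor edits of a C on the template strand (which appear as G-to-A on the coding strand).

```python
from itertools import product

# Standard genetic code; codons are enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def c_to_t_effect(cds: str, pos: int) -> str:
    """Classify a C-to-T edit at 0-based `pos` of an in-frame coding sequence."""
    cds = cds.upper()
    if cds[pos] != "C":
        raise ValueError(f"position {pos} is {cds[pos]!r}, not C")
    edited = cds[:pos] + "T" + cds[pos + 1:]
    codon_start = pos - pos % 3  # first base of the affected codon
    ref_aa = CODON_TABLE[cds[codon_start:codon_start + 3]]
    alt_aa = CODON_TABLE[edited[codon_start:codon_start + 3]]
    if alt_aa == ref_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"  # premature STOP codon
    return "missense"


print(c_to_t_effect("ATGCAGTGG", 3))  # nonsense: CAG (Gln) -> TAG (STOP)
print(c_to_t_effect("ATGCCGTGG", 4))  # missense: CCG (Pro) -> CTG (Leu)
print(c_to_t_effect("ATGCTCTGG", 5))  # silent: CTC (Leu) -> CTT (Leu)
```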
While base-editing can only introduce two types of nucleotide substitution, prime-editing can introduce all kinds of predefined mutations, including the deletion, insertion, and/or substitution of specific nucleotides [12]. A prime-editing system consists of a Cas enzyme with nickase activity fused to a reverse transcriptase, and a prime-editing guide RNA (pegRNA) that contains a spacer specifying the genomic target site, a primer binding site, and an RNA template that encodes the desired edit [13]. Prime-editing has already been used successfully in rice to insert a fragment of up to 15 bp into the OsCDC48-T1 gene [12] and to introduce a triple amino acid substitution into the EPSPS gene, conferring a higher level of glyphosate resistance [14].
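The pegRNA 3' extension can be derived from the desired edited sequence. The sketch below is illustrative, with hypothetical function names, and assumes a Cas9 nickase that nicks the PAM-containing strand 3 bp upstream of the PAM, an edit located downstream of the nick, and primer binding site (PBS) and RT template lengths of 13 and 15 nt as placeholder defaults rather than recommended values. Sequences are written in DNA letters; the pegRNA itself is RNA, and its spacer and scaffold are not shown.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def pegrna_extension(edited_strand: str, nick: int,
                     pbs_len: int = 13, rtt_len: int = 15) -> Tuple[str, str]:
    """Return (RT template, primer binding site), each written 5'->3'.

    edited_strand: the PAM-containing strand *after* the desired edit.
    nick: 0-based index of the first base 3' of the nick (edit lies downstream).
    """
    if nick - pbs_len < 0 or nick + rtt_len > len(edited_strand):
        raise ValueError("PBS or RT template extends beyond the sequence")
    # The PBS pairs with the 3' end of the nicked strand, just upstream of the nick.
    pbs = revcomp(edited_strand[nick - pbs_len:nick])
    # The RT template encodes the edited sequence synthesized from the nick onward.
    rtt = revcomp(edited_strand[nick:nick + rtt_len])
    return rtt, pbs  # the 3' extension reads RT template followed by PBS


target = "GATTACAGATTACAGATTAC" + "TGG" + "CATGCATGCA"
edited = target[:19] + "G" + target[20:]  # hypothetical C>G at the third base 3' of the nick
rtt, pbs = pegrna_extension(edited, nick=17)
print(rtt + pbs == revcomp(edited[4:32]))  # True
```

Because the RT template and PBS together form the reverse complement of the edited sequence spanning the nick, the edit is copied into the nicked strand when the reverse transcriptase extends the primed 3' end.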
CRISPR/Cas and its variants can also target DNA sequences at gene regulatory sites, e.g., transcription factor binding sites, splice sites, and translation initiation and/or termination codons, thus changing gene structural features or coding potential. This affects the different processes driving transcription, mRNA maturation, and translation (Fig. 5.1). In addition, the epigenetic state can be modulated by fusing the CRISPR protein to an epigenetic modifier that affects the methylation state at DNA level, or the methylation and/or acetylation state at nucleosome level (histone modification) [15, 16]. For example, Gallego-Bartolomé et al. [17] reactivated transcription of the FWA gene in Arabidopsis by demethylating the FWA promoter using a dead Cas9 (dCas9) fused to the catalytic domain of human TET1 (TET1cd).
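Whether a substitution destroys a splice site can be checked on the canonical intron boundary dinucleotides. The minimal sketch below, with a hypothetical function name and a toy sequence, assumes U2-type GT-AG introns, intron coordinates given as 0-based start and exclusive end, and substitutions only (indels would shift the coordinates). It is not how any particular tool evaluates splice sites.

```python
from typing import Dict, List, Tuple


def splice_site_status(gene: str, introns: List[Tuple[int, int]]) -> List[Dict[str, bool]]:
    """Report, per intron (0-based start, exclusive end), whether the canonical
    GT donor and AG acceptor dinucleotides are intact."""
    gene = gene.upper()
    return [
        {"donor_GT": gene[start:start + 2] == "GT",
         "acceptor_AG": gene[end - 2:end] == "AG"}
        for start, end in introns
    ]


# Toy gene: exon 1 | intron (GT...AG) | exon 2
reference = "ATGAAA" + "GTAAGTTTTTTTCAG" + "GCTTAA"
introns = [(6, 21)]
# Hypothetical G-to-A substitution at the first intron base (donor GT -> AT)
mutant = reference[:6] + "A" + reference[7:]

print(splice_site_status(reference, introns))  # [{'donor_GT': True, 'acceptor_AG': True}]
print(splice_site_status(mutant, introns))     # [{'donor_GT': False, 'acceptor_AG': True}]
```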
In short, CRISPR/Cas and its variants can be used to introduce a range of mutations into a DNA sequence that have different effects on the encoded protein. LOF or GOF mutations can be generated to study the role of certain proteins in biological processes, to confer resistance to pathogens or herbicides, to divert the metabolic flux of biosynthesis pathways towards valuable compounds, etc.
3 Screening Methods: Complexity, Resolution, and Scalability
Different screening methods are available to detect the outcome of CRISPR/Cas gene editing, to identify which plant material contains a desired gene-edited sequence, and to evaluate the mutation efficiency. Screening methods differ in their detection principle (physical properties of an amplified allele vs sequencing), in whether they are targeted (local sequencing of the predicted edited site, e.g., by amplification or capture of the gRNA binding site and flanking regions) or untargeted (global sequencing, e.g., WGS or RNA-Seq), and in their level of throughput and automation (via locus and/or sample multiplexing).
Simply put, any standard molecular detection technique that can discriminate DNA sequence variants (alleles) can also be used to detect CRISPR-induced mutations (Fig. 5.2). PCR amplification of the target region, coupled to a detection method such as high-resolution melting (HRM) [18], fluorescent probe binding (qPCR or ddPCR [19]), or amplicon length polymorphism (detected by agarose gel electrophoresis, capillary fragment analysis, a mismatch detection assay [20], which is a variant of Cleaved Amplified Polymorphic Sequence (CAPS) markers, or indel detection by amplicon analysis (IDAA) [20]), can be used to identify mutated alleles (Fig. 5.2). In addition, Kompetitive Allele-Specific PCR (KASP [21]) or primer extension assays [22] may be used to screen for expected SNPs. These techniques are cheap, easy to implement, and allow for quick routine screening of gene-edited mutant collections [23]. However, they only indirectly show the presence of a mutation and do not reveal the exact mutant DNA sequence at nucleotide level, which is a prerequisite for interpreting the effect of the mutation on the encoded protein.
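The logic of a CAPS-type assay can be illustrated with an in-silico digest. In this sketch (hypothetical function name, toy amplicons), an edit that destroys a restriction site overlapping the Cas cut site leaves the mutant amplicon uncut, so it appears as a single longer band. EcoRI (G^AATTC, cut after the first base of the site) is used only as an example; the sketch scans one strand, which is sufficient for palindromic sites, and does not model the mismatch-specific nucleases used in mismatch detection assays.

```python
from typing import List


def digest(amplicon: str, site: str, cut_offset: int) -> List[int]:
    """Return fragment lengths after cutting `amplicon` at every occurrence of
    a palindromic recognition `site`, `cut_offset` bases into the site."""
    amplicon, site = amplicon.upper(), site.upper()
    cuts = []
    i = amplicon.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = amplicon.find(site, i + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


wild_type = "A" * 10 + "GAATTC" + "T" * 10  # intact EcoRI site
mutant = "A" * 10 + "GATTC" + "T" * 10      # hypothetical 1-bp deletion in the site
print(digest(wild_type, "GAATTC", 1))  # [11, 15]
print(digest(mutant, "GAATTC", 1))     # [25]
```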
Amplification and sequencing of the target loci of mutants provides information on the specific nucleotides that are deleted, inserted, and/or substituted (Fig. 5.2). Sanger (dideoxy) sequencing generates electropherograms that allow determination of the DNA sequence and identification of mutations [23, 24]. Interpreting the electropherogram can be challenging, as multiple nucleotides can be called at the same position in the case of heterozygous insertions, deletions, and/or substitutions. Therefore, several computational tools have been developed to deconvolute electropherograms, such as Tracking of Indels by DEcomposition (TIDE) [25], CRISP-ID [26], Deconvolution of Complex DNA Repair (DECODR) [27], and Inference of CRISPR Edits (ICE) [28]. These tools use distinct algorithms to analyze the electropherograms of a wild-type and a gene-edited sample and generate a list of predicted mutated sequences [24]. The sensitivity of Sanger sequencing for alternative alleles in a heterozygous or otherwise mixed sample is about 15% [29]; consequently, low-efficiency editing is likely to be overlooked. Furthermore, these methods typically require a separate amplification and detection reaction for each sample and each locus (simplex), which limits the scalability of mutation screens in large collections at multiple target loci.
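The ~15% detection limit can be pictured as a threshold on secondary peaks. The sketch below is a deliberate simplification: it works on hypothetical per-position peak heights rather than raw trace files, uses a hypothetical function name, and flags a position as mixed when the second-highest peak contributes at least 15% of the signal at that position. Tools such as TIDE, ICE, or DECODR instead decompose the trace downstream of the cut site against the wild-type trace.

```python
from typing import Dict, List


def mixed_positions(peaks: List[Dict[str, float]], min_fraction: float = 0.15) -> List[int]:
    """Return indices of trace positions whose second-highest peak accounts for
    at least `min_fraction` of the total signal at that position."""
    flagged = []
    for pos, signal in enumerate(peaks):
        heights = sorted(signal.values(), reverse=True)
        total = sum(heights)
        if total > 0 and len(heights) > 1 and heights[1] / total >= min_fraction:
            flagged.append(pos)
    return flagged


trace = [
    {"A": 1000, "C": 20, "G": 10, "T": 5},  # clean call (background noise)
    {"A": 600, "C": 0, "G": 400, "T": 0},   # ~40% minor allele
    {"A": 0, "C": 900, "G": 0, "T": 100},   # ~10% minor allele
]
print(mixed_positions(trace))  # [1]
```

Position 2 carries a minor allele at 10% and is not flagged, which mirrors how low-efficiency editing can be overlooked with Sanger data.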
Next-generation sequencing (NGS) allows for massively parallel sequencing and analysis of heterogeneous samples and substantially lowers the per-sample and per-locus costs of high-throughput mutation screens [3]. Because of its deep read coverage, NGS detects alternative alleles at frequencies as low as 0.1–1%, which enables screening of bulk samples (e.g., protoplasts after transfection) and efficient 1D, 2D, or 3D pooling schemes [28]. In addition, multiplex amplicon sequencing combined with sample-specific barcodes incorporated during library preparation enables parallel sequencing of hundreds of loci in hundreds of samples per sequencing run. The resulting targeted resequencing data can be analyzed with bioinformatics tools such as CRISPResso2 [30] and SMAP haplotype-window [31] (Fig. 5.3). In SMAP haplotype-window, sequencing reads are mapped to a reference, and the entire read sequence spanning the region between two borders (typically the amplicon primer binding sites) is considered as an allele [31]. All unique alleles are sorted and counted to calculate the relative allele frequency per locus per sample. A region of interest (ROI) can be defined to focus the analysis on mutations introduced by gene editing in a narrow nucleotide window and to ignore other sequence variants at a distance from the edit site. Every allele is compared to the reference in its entirety, which allows detection of any combination of insertions, deletions, and/or substitutions. SMAP haplotype-window generates an integrated genotype call table with all observed alleles per locus per sample. Since it is agnostic to the length of the deletion, insertion, or substitution, it can detect any mutation in the primary DNA sequence within a given window, as long as the amplicon or read spans the mutated allele.
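The core idea of counting haplotypes between two borders can be sketched in plain Python. This is not the SMAP haplotype-window implementation or API: the function name is hypothetical, reads are assumed to be plain strings already in the forward orientation, the borders are matched exactly by string search, and read mapping, the ROI, and frequency filters are omitted.

```python
from collections import Counter
from typing import Dict, Iterable


def window_haplotypes(reads: Iterable[str], upstream_border: str,
                      downstream_border: str) -> Dict[str, float]:
    """Count the sequences found between two border sequences and return their
    relative frequencies, most frequent first."""
    counts = Counter()
    for read in reads:
        start = read.find(upstream_border)
        if start == -1:
            continue  # read does not cover the upstream border
        start += len(upstream_border)
        end = read.find(downstream_border, start)
        if end == -1:
            continue  # read does not span the whole window
        counts[read[start:end]] += 1
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.most_common()}


up, down = "TTGACG", "CATGCA"  # hypothetical primer-derived borders
reads = ([up + "AAACCCGGG" + down] * 6     # reference haplotype
         + [up + "AAACGGG" + down] * 3     # 2-bp deletion
         + [up + "AAACCCAGGG" + down]      # 1-bp insertion
         + ["GGGGGG"])                     # read without borders, ignored
print(window_haplotypes(reads, up, down))
# {'AAACCCGGG': 0.6, 'AAACGGG': 0.3, 'AAACCCAGGG': 0.1}
```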
SMAP haplotype-window can also process probe-capture enriched, WGS, and RNA-Seq read data from global resequencing screens, for a given list of target loci. PacBio sequencing [32] and nanopore MinION sequencing [33] can be used to detect long-range insertions and deletions, as well as epigenetic DNA modifications introduced by CRISPR/Cas.
4 Mutation Screening in a Broader Perspective: From Nucleotide to Protein
The current repertoire of CRISPR/Cas DNA modifiers, combined with gRNA specificity, generates a huge array of design possibilities, especially when design is based on principles that predict how protein sequences may be altered by editing the genomic nucleotide sequence. A mutation screening workflow that builds on such clever CRISPR design should, in turn, be able to consider detected mutated alleles in their respective gene context and classify them based on predefined desired alleles (e.g., a unique base edit) or on percentage protein sequence similarity to the original wild-type allele [7].
The SMAP effect-prediction module of the SMAP package estimates the novel encoded protein sequence and is the final step in the mutation screening workflow (Fig. 5.3) [31]. SMAP haplotype-window generates a list of all observed haplotypes per locus per sample, which is used directly as input for SMAP effect-prediction, together with all positional information on gene structural features. SMAP effect-prediction replaces a segment of the original reference gene sequence with the observed mutated sequence and evaluates all splice sites, the translation initiation codon, the open reading frame, and the translation termination codon [31]. After translation of the most likely ORF in the mutated allele, the amino acid sequence is aligned to the original reference protein, and the percentage protein sequence similarity is estimated as a quantitative score for the remaining protein functionality. The encoded protein may no longer resemble the original reference protein (frameshift or nonsense mutation), may be identical to it (silent mutation), etc. (Fig. 5.3). A threshold value can be set for the percentage sequence similarity to the original reference protein that is still needed for the protein to perform its function. DNA mutations that result in a protein with a lower percentage similarity than this threshold can be defined as loss-of-function mutations [34].
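A strongly simplified version of this last step is sketched below; it is not the SMAP effect-prediction implementation. It assumes a single-exon coding sequence that starts with ATG at index 0, haplotype window coordinates given on that coding sequence, translation from the original start codon (splice sites and alternative start codons are not evaluated), and the standard genetic code. The matched residues reported by Python's difflib serve as a stand-in for a proper pairwise protein alignment, and the 80% loss-of-function threshold and all function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

BASES = "TCAG"  # standard genetic code, codons enumerated in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate from index 0 until the first STOP codon or the last full codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def percent_identity(reference: str, query: str) -> float:
    """Matched residues (difflib matching blocks) as % of the reference length."""
    matcher = SequenceMatcher(None, reference, query, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(reference)


def predict_effect(ref_cds: str, window_start: int, window_end: int,
                   haplotype: str, lof_threshold: float = 80.0) -> dict:
    """Replace ref_cds[window_start:window_end] by an observed haplotype and
    score the encoded protein against the reference protein."""
    mutant_cds = ref_cds[:window_start] + haplotype + ref_cds[window_end:]
    mut_protein = translate(mutant_cds)
    identity = percent_identity(translate(ref_cds), mut_protein)
    return {"protein": mut_protein, "identity": identity,
            "loss_of_function": identity < lof_threshold}


ref_cds = "ATGGCTGAAAAACTGCGTTGGTAA"  # toy CDS encoding MAEKLRW
window = (3, 18)                     # ref_cds[3:18] == "GCTGAAAAACTGCGT"
for hap in ["GCCGAAAAACTGCGT",       # silent: MAEKLRW, 100% identity
            "GCTGAAAAACTG",          # in-frame deletion: MAEKLW, ~85.7%
            "GCTTAAAAACTGCGT",       # nonsense: MA, loss of function
            "CTGAAAAACTGCGT"]:       # frameshift: MLKNCVG, loss of function
    print(hap, predict_effect(ref_cds, *window, hap))
```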
5 Conclusions
CRISPR/Cas and its variants are widely used to introduce mutations into a DNA sequence. However, mutations can have different effects on the function of a gene and its encoded protein. Here, we describe a molecular screening workflow that spans the path from CRISPR/Cas and gRNA design, through screening for mutant alleles, to prediction of the effect of the DNA sequence mutation on the encoded protein, all implemented in modules of the SMAP package. Using SMAP haplotype-window and SMAP effect-prediction, the detected mutated alleles are placed in their respective gene context and can be classified based on the percentage protein sequence similarity to the original wild-type allele. This high-throughput workflow allows for automated and streamlined screening of multiplex CRISPR experiments in large mutant collections (locus and/or sample multiplexing), enables fast and easy interpretation of the effect of mutant alleles on the protein sequence, and supports automated routine identification of carriers of desired alleles.
References
Mao, Y., Botella, J.R., Liu, Y., Zhu, J.-K.: Gene editing in plants: progress and challenges. Natl. Sci. Rev. 6, 421–437 (2019)
Roos, D., de Boer, M.: Mutations in cis that affect mRNA synthesis, processing and translation. Biochim. Biophys. Acta (BBA) - Mol. Basis Dis. 1867, 1–21 (2021)
Bell, C.C., Magor, G.W., Gillinder, K.R., Perkins, A.C.: A high-throughput screening strategy for detecting CRISPR-Cas9 induced mutations using next-generation sequencing. BMC Genomics. 15, 1–7 (2014)
Abadi, S., Yan, W.X., Amar, D., Mayrose, I.: A machine learning approach for predicting CRISPR-Cas9 cleavage efficiencies and patterns underlying its mechanism of action. PLoS Comput. Biol. 13, 1–24 (2017)
Allen, F., et al.: Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat. Biotechnol. 37, 64–82 (2019)
Wang, Y., et al.: Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew. Nat. Biotechnol. 32, 947–951 (2014)
Gerasimavicius, L., Livesey, B.J., Marsh, J.A.: Loss-of-function, gain-of-function and dominant-negative mutations have profoundly different effects on protein structure. Nat. Commun. 13, 1–15 (2022)
Azameti, M.K., Dauda, W.P.: Base editing in plants: applications, challenges, and future prospects. Front. Plant Sci. 12, 1–13 (2021)
Li, J., et al.: Development of a highly efficient prime editor 2 system in plants. Genome Biol. 23, 1–9 (2022)
Kluesner, M.G., et al.: CRISPR-Cas9 cytidine and adenosine base editing of splice-sites mediates highly-efficient disruption of proteins in primary and immortalized cells. Nat. Commun. 12, 1–12 (2021)
Veillet, F., et al.: Transgene-free genome editing in tomato and potato plants using agrobacterium-mediated delivery of a CRISPR/Cas9 cytidine base editor. Int. J. Mol. Sci. 20, 1–10 (2019)
Lin, Q., et al.: Prime genome editing in rice and wheat. Nat. Biotechnol. 38, 582–585 (2020)
Jin, S., Lin, Q., Gao, Q., Gao, C.: Optimized prime editing in monocot plants using PlantPegDesigner and engineered plant prime editors (ePPEs). Nat. Protoc. 18, 831–853 (2023)
Li, H., Li, J., Chen, J., Yan, L., Xia, L.: Precise modifications of both exogenous and endogenous genes in Rice by prime editing. Mol. Plant. 13, 671–674 (2020)
Xie, N., Zhou, Y., Sun, Q., Tang, B.: Novel epigenetic techniques provided by the CRISPR/Cas9 system. Stem Cells Int. 2018, 1–12 (2018)
Dubois, A., Roudier, F.: Deciphering plant chromatin regulation via CRISPR/dCas9-based epigenome engineering. Epigenomes. 5, 1–16 (2021)
Gallego-Bartolomé, J., et al.: Targeted DNA demethylation of the arabidopsis genome using the human TET1 catalytic domain. Proc. Natl. Acad. Sci. U.S.A. 115, E2125–E2134 (2018)
Li, R., et al.: Rapid and sensitive screening and identification of CRISPR/Cas9 edited rice plants using quantitative real-time PCR coupled with high resolution melting analysis. Food Control. 112, 1–6 (2020)
Peng, C., et al.: Accurate detection and evaluation of the gene-editing frequency in plants using droplet digital PCR. Front. Plant Sci. 11, 1–8 (2020)
Sentmanat, M.F., Peters, S.T., Florian, C.P., Connelly, J.P., Pruett-Miller, S.M.: A survey of validation strategies for CRISPR-Cas9 editing. Sci. Rep. 8, 1–8 (2018)
Kalendar, R., Shustov, A.V., Akhmetollayev, I., Kairov, U.: Designing allele-specific competitive-extension PCR-based assays for high-throughput genotyping and gene characterization. Front. Mol. Biosci. 9, 1–13 (2022)
Pereiro, I., et al.: Arrayed primer extension technology simplifies mutation detection in Bardet-Biedl and Alström syndrome. Eur. J. Hum. Genet. 19, 485–488 (2011)
Bennett, E.P., et al.: INDEL detection, the ‘Achilles heel’ of precise genome editing: a survey of methods for accurate profiling of gene editing induced indels. Nucleic Acids Res. 48, 11958–11981 (2020)
Carrington, B., Sood, R.: A comprehensive review of Indel detection methods for identification of Zebrafish knockout mutants generated by genome-editing nucleases. Genes (Basel). 13, 1–16 (2022)
Brinkman, E.K., Chen, T., Amendola, M., van Steensel, B.: Easy quantitative assessment of genome editing by sequence trace decomposition. Nucleic Acids Res. 42, e168–e168 (2014)
Dehairs, J., Talebi, A., Cherifi, Y., Swinnen, J.V.: CRISP-ID: decoding CRISPR mediated indels by Sanger sequencing. Sci. Rep. 6, 28973 (2016)
Bloh, K., et al.: Deconvolution of complex DNA repair (DECODR): establishing a novel deconvolution algorithm for comprehensive analysis of CRISPR-edited sanger sequencing data. CRISPR J. 4, 120–131 (2021)
Conant, D., et al.: Inference of CRISPR edits from sanger trace data. CRISPR J. 5, 123–130 (2022)
Hagemann, I.S.: Overview of technical aspects and chemistries of next-generation sequencing. In: Clinical Genomics, pp. 3–19. Elsevier Inc. (2015). https://doi.org/10.1016/B978-0-12-404748-8.00001-0
Clement, K., et al.: CRISPResso2 provides accurate and rapid genome editing analysis. Nat. Biotechnol. 37, 220–224 (2019)
Schaumont, D., et al.: Stack Mapping Anchor Points (SMAP): a versatile suite of tools for read-backed haplotyping. bioRxiv 2022.03.10.483555 (2022)
Rhoads, A., Au, K.F.: PacBio sequencing and its applications. Genomics Proteomics Bioinformatics. 13, 278–289 (2015)
Feng, Y., Zhang, Y., Ying, C., Wang, D., Du, C.: Nanopore-based fourth-generation DNA sequencing technology. Genomics Proteomics Bioinformatics. 13, 4–16 (2015)
Develtere, W., et al.: SMAP design: a multiplex PCR amplicon and gRNA design tool to screen for natural and CRISPR-induced genetic variation. Nucleic Acids Res. 51, e37–e37 (2023)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
About this chapter
Cite this chapter
Vereecke, E., Van Laere, K., Ruttink, T. (2024). CRISPR/Cas Mutation Screening: From Mutant Allele Detection to Prediction of Protein Coding Potential. In: Ricroch, A., Eriksson, D., Miladinović, D., Sweet, J., Van Laere, K., Woźniak-Gientka, E. (eds) A Roadmap for Plant Genome Editing. Springer, Cham. https://doi.org/10.1007/978-3-031-46150-7_5
DOI: https://doi.org/10.1007/978-3-031-46150-7_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46149-1
Online ISBN: 978-3-031-46150-7
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)