Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2018 Nov;563(7732):501-507.
doi: 10.1038/s41586-018-0692-z. Epub 2018 Nov 14.

Improved reference genome of Aedes aegypti informs arbovirus vector control

Benjamin J Matthews  1   2   3 Olga Dudchenko  4   5   6   7 Sarah B Kingan  8 Sergey Koren  9 Igor Antoshechkin  10 Jacob E Crawford  11 William J Glassford  12 Margaret Herre  13   14 Seth N Redmond  15   16 Noah H Rose  17   18 Gareth D Weedall  19   20 Yang Wu  21   22   23 Sanjit S Batra  4   5   6 Carlos A Brito-Sierra  24   25 Steven D Buckingham  26 Corey L Campbell  27 Saki Chan  28 Eric Cox  29 Benjamin R Evans  30 Thanyalak Fansiri  31 Igor Filipović  32 Albin Fontaine  33   34   35   36 Andrea Gloria-Soria  30   37 Richard Hall  8 Vinita S Joardar  29 Andrew K Jones  38 Raissa G G Kay  39 Vamsi K Kodali  29 Joyce Lee  28 Gareth J Lycett  19 Sara N Mitchell  11 Jill Muehling  8 Michael R Murphy  29 Arina D Omer  4   5   6 Frederick A Partridge  26 Paul Peluso  8 Aviva Presser Aiden  4   5   40   41 Vidya Ramasamy  38 Gordana Rašić  32 Sourav Roy  42 Karla Saavedra-Rodriguez  27 Shruti Sharan  24   25 Atashi Sharma  23   43 Melissa Laird Smith  8 Joe Turner  44 Allison M Weakley  11 Zhilei Zhao  17   18 Omar S Akbari  45   46 William C Black 4th  27 Han Cao  28 Alistair C Darby  44 Catherine A Hill  24   25 J Spencer Johnston  47 Terence D Murphy  29 Alexander S Raikhel  42 David B Sattelle  26 Igor V Sharakhov  23   43   48 Bradley J White  11 Li Zhao  49 Erez Lieberman Aiden  4   5   6   7   15 Richard S Mann  12 Louis Lambrechts  33   35 Jeffrey R Powell  30 Maria V Sharakhova  23   43   48 Zhijian Tu  22   23 Hugh M Robertson  50 Carolyn S McBride  17   18 Alex R Hastie  28 Jonas Korlach  8 Daniel E Neafsey  15   16 Adam M Phillippy  9 Leslie B Vosshall  13   51   14
Affiliations

Improved reference genome of Aedes aegypti informs arbovirus vector control

Benjamin J Matthews et al. Nature. 2018 Nov.

Abstract

Female Aedes aegypti mosquitoes infect more than 400 million people each year with dangerous viral pathogens including dengue, yellow fever, Zika and chikungunya. Progress in understanding the biology of mosquitoes and developing the tools to fight them has been slowed by the lack of a high-quality genome assembly. Here we combine diverse technologies to produce the markedly improved, fully re-annotated AaegL5 genome assembly, and demonstrate how it accelerates mosquito science. We anchored physical and cytogenetic maps, doubled the number of known chemosensory ionotropic receptors that guide mosquitoes to human hosts and egg-laying sites, provided further insight into the size and composition of the sex-determining M locus, and revealed copy-number variation among glutathione S-transferase genes that are important for insecticide resistance. Using high-resolution quantitative trait locus and population genomic analyses, we mapped new candidates for dengue vector competence and insecticide resistance. AaegL5 will catalyse new biological insights and intervention strategies to fight this deadly disease vector.

PubMed Disclaimer

Conflict of interest statement

P.P., M.L.S., J.M., S.B.K., R.H. and J.K. are employees of Pacific Biosciences, a company developing single-molecule sequencing technologies. J.L., S.C., H.C. and A.R.H. are employees of Bionano Genomics and own company stock options. O.D., S.S.B., A.D.O., A.P.A. and E.L.A. are inventors on a US provisional patent application 62/347,605, filed 8 June 2016, by the Baylor College of Medicine and the Broad Institute.

Figures

Fig. 1
Fig. 1. AaegL5 assembly statistics and annotation.
a, b, Treemap of AaegL3 (a) and AaegL4 (b) contigs scaled by length. c, Principal component analysis of allelic variation of the indicated strains at 11,229 SNP loci. n = 7 per genotype d, Flow cytometry analysis of LVP_AGWG genome size. n = 5 per sex. Box plot: median is indicated by the blue line; boxes show first to third quartiles, whiskers are the 1.5× interquartile interval (Extended Data Fig. 1b). e, Treemap of AaegL5 contigs scaled by length. f, Genome composition (Supplementary Data 2, 3). g, Gene set alignment BLASTp coverage is compared between AaegL3.4 and AaegL5.0, with D. melanogaster protein queries. h, Alignment of 253 RNA-seq libraries to AaegL3.4 and AaegL5.0 gene set annotations (Supplementary Data 4–9). LTR, long terminal repeat retrotransposon; MITES, miniature inverted-repeat transposable elements; SINES, short interspersed nuclear elements.
Fig. 2
Fig. 2. Chromosomal arrangement and increased number of chemosensory receptor genes.
a, Location of predicted chemoreceptors (odorant receptors (ORs), gustatory receptors (GRs) and ionotropic receptors (IRs)) by chromosome in AaegL5. The blunt end of the arrowheads marks gene position and the arrow indicates orientation. Filled and open arrowheads represent intact genes and pseudogenes, respectively (Supplementary Data 17–20 and Extended Data Fig. 3). b, Chemosensory receptor annotations are compared between AaegL5 and AaegL3.
Fig. 3
Fig. 3. Application of AaegL5 to resolve the sex-determining locus.
a, M locus structure indicating high alignment identity (grey-dashed boxes) and boundaries of myo-sex and Nix gene models (magenta and white boxes, arrowheads represent orientation). b, FISH of BAC clones containing myo-sex and Nix. Scale bar, 2 μm. Representative image of 10 samples. c, De novo optical map spanning the M locus and bridging the estimated 163-kb gap in the AaegL5 assembly. DNA molecules are cropped at the edges for clarity. d, Chromosome quotient (CQ) analysis of genomic DNA from male and female libraries aligned to AaegL5 chromosome 1. Each dot represents the CQ value of a repeat-masked 1-kb window with >20 reads aligned from male libraries.
Fig. 4
Fig. 4. Copy-number variation in the glutathione S-transferase epsilon gene cluster.
a, Glutathione S-transferase epsilon (GSTe) gene cluster structure in AaegL5 compared to AaegL3 (Supplementary Data 23). Arrowheads indicate gene orientation. b, Dot-plot alignment of AaegL5 GSTe region to itself. c, Optical mapping of DNA labelled with indicated enzymes. DNA molecules are cropped at the edges for clarity. d, Genomic sequencing coverage of AaegL3 GSTe genes (DNA read pairs mapped to each gene, normalized by gene length in kb) from one LVP_AGWG male and pooled mosquitoes from four other laboratory strains.
Fig. 5
Fig. 5. Using the AaegL5 genome for applied population genetics.
a, Heat map of linkage based on pairwise recombination fractions for 255 RAD markers ordered by AaegL5 physical coordinates. b, Significant QTLs on chromosome 2 underly systemic DENV dissemination in midgut-infected mosquitoes (Extended Data Fig. 10a). Curves represent log of the odds ratio (LOD) scores obtained by interval mapping. Dotted vertical lines indicate genome-wide statistical significance thresholds (α = 0.05). Confidence intervals of significant QTLs: bright colour, 1.5-LOD interval; light colour, 2-LOD interval with generalist effects (black, across DENV serotypes and isolates) and DENV isolate-specific effects (red, indicative of genotype-by-genotype interactions). c, d, Synteny between linkage map (in cM) and physical map (in Mb) for chromosome 2 (c) and chromosomes 1 and 3 (d). The orange color of chromosome 1 denotes uncertainty in the cM estimates because of deviations in Mendelian ratios surrounding the M locus. e, Chromosome 3 SNPs significantly correlated with deltamethrin survival. f, g, Magnified and inverted view of box in e, centred on the new gene model of voltage-gated sodium channel (VGSC, transcript variant X3; the chromosomal position is indicated in red). f, Non-coding genes are omitted for clarity, and other genes indicated with grey boxes. VGSC exons are represented by tall boxes and untranslated regions by short boxes. Arrowheads indicate gene orientation. Non-synonymous VGSC SNPs are marked with larger black and yellow circles: V1016I = 315,983,763; F1534C = 315,939,224; V410L = 316,080,722. g, Difference in expected heterozygosity (Hexp alive − Hexp dead) for all SNPs.
Extended Data Fig. 1
Extended Data Fig. 1. Project flowchart, measured genome size and assembly process.
a, Flowchart of LVP_AGWG strain inbreeding, data collection and experimental design of the AaegL5 assembly process. b, Estimated average 1C genome size for each strain for five Ae. aegypti strains and Ae. mascarensis, the sister taxon of Ae. aegypti, for which the genome size has not previously been measured. There were no significant differences between the sexes within and between the species and strains analysed (P > 0.2). Significant differences between strains were determined using Proc GLM in SAS with both a Tukey and a Scheffé option with the same outcome. Data labelled with different letters are significantly different (P < 0.01). c, Combining Hi-C maps with 2D annotations enabled efficient review of sequences identified as alternative haplotypes by sequence alignment. The figure depicts a roughly 24 Mb × 24 Mb fragment of a contact map generated by aligning a Hi-C dataset to an intermediate genome assembly generated during the process of creating AaegL5. This intermediate assembly was a sequence comprising error-corrected, ordered and oriented FALCON-Unzip contigs. The intensity of each pixel in the contact map correlates with how often pairs of loci co-locate in the nucleus. Maximum intensity is indicated in the lower left of each panel. These maps include reads that do not align uniquely (reads with zero mapping quality); such alignments are randomly assigned to one of the possible genomic locations. Three panels show three types of annotations that are overlaid on top of the contact map. Left, FALCON-Unzip contig boundaries are highlighted as black squares along the diagonal. Notably, large linear features appear above and below the diagonal. These are the result of sequence overlap among contigs, which can indicate the presence of undercollapsed heterozygosity in the contig set. Because reads that do not map uniquely are randomly assigned during the alignment step, Hi-C reads derived from a contig will sometimes be aligned to an overlapping contig. When this happens, the Hi-C read pair may contribute to the formation of a linear feature above and below the diagonal. Therefore, the linear stretches of enriched contact frequency parallel to the diagonal are brought about by the random assignment procedure, and can facilitate the detection of pairs of overlapping contigs. Note that, when the overlap between contigs is owing to undercollapsed heterozygosity, both contigs will exhibit similar long-range contact patterns. This aspect of Hi-C data also provides evidence for the presence of undercollapsed heterozygosity. Centre, LASTZ-alignment-based annotations for fully redundant contigs. The squares shown in blue are obtained by taking diagonal contig boundary annotations (in black) and shifting them up (respectively, left) when drawing above (respectively, below) the diagonal so that the overlapping sequences are horizontally (respectively, vertically) aligned. Note that, as expected, the squares typically span linear, off-diagonal features in the Hi-C data. When one contig is entirely contained in another contig, the redundant contig does not contribute sequence to the merged chromosome-length scaffolds. Right, LASTZ-alignment-based annotations for partially redundant contigs. Again, the squares shown in blue are obtained by taking diagonal contig boundary annotations (in black) and shifting them up and left. The overlaps shown in this panel correspond to contigs that only partially overlap in sequence with other contigs. Consequently, some of their sequence is incorporated in the final fasta. d, Comparison of chromosome lengths between AaegL4 and AaegL5. Numbers are given before post-Hi-C polishing and gap closing. e, Step-wise assembly statistics for Hi-C scaffolding, alternative haplotype removal and annotation. *Removed length, 779,073,495 bp. **The definition of scaffold groups can be found in a previously published study. ***Gaps between contigs were set to 500 bp for calculating scaffold statistics. N/A, not applicable.
Extended Data Fig. 2
Extended Data Fig. 2. Remaining assembly gaps, summary of geneset annotation improvement, chromatin accessibility analysis, physical genome map and gene strucures of biogenic amine-binding receptors and opsins in AaegL5.
a, Representation of structural variants identified at assembly gaps by alignment of Bionano optical maps. The estimated size of an insertion (blue) or deletion (red) relative to the reference is represented by the size of the circle. When the size or type of structural variants could not be determined or did not agree between the two optical maps, the location of the assembly gap is plotted in grey. Approximate locations of the centromeres (red triangles) and telomere-associated repeat sequences (blue triangles) are indicated. Raw data are available as Supplementary Data 1. b, Comparison of protein-coding genes and transcripts in AaegL5.0 (NCBI RefSeq Release 101) and gene set annotations from An. gambiae (Agam), Culex pipiens (Cpip) and D. melanogaster (Dmel). c, Sex peptide receptor structure in AaegL3.4 and AaegL5.0, and female brain RNA-seq and ATAC-seq reads aligned to AaegL5. Blue lines on the RNA-seq track indicate splice junctions, with the number of reads spanning a junction represented by line thickness. Exons are represented by tall filled boxes and introns by lines. Arrowheads indicate gene orientation. d, Average read profiles across promoter regions, defined as the transcription start site (TSS) ±2.5 kb. Solid lines represent Tn5-treated native chromatin using the ATAC-seq protocol (n = 4), dotted lines represent Tn5-treated naked genomic DNA (n = 1). Shaded regions represent s.d. e, A physical genome map was developed by localizing 500 BAC clones to chromosomes using FISH. For the development of a final chromosome map for the AaegL5 assembly, we assigned the coordinates of each outmost BAC clone within a band (Supplementary Data 12) to the boundaries between bands. The final resolution of this map varies on average between 5 and 10 Mb because of the differences in BAC mapping density in different regions of chromosomes. f, Schematic of predicted gene structures of the Ae. aegypti biogenic amine-binding receptors and opsins. Exons, cylindrical bars; introns, black lines; dopamine receptors, yellow bars; serotonin receptors, magenta bars; muscarinic acetylcholine receptors, green bars; octopamine receptors, blue bars; opsins, dark purple bars; predicted 3′ and 5′ non-coding sequence (dark shading). The ‘unclassified receptor’ GPRnna19 is not shown. Details on gene models compared to previous annotations and the predicted amino acid sequences of each gene are available in Supplementary Data 14–16.
Extended Data Fig. 3
Extended Data Fig. 3. Chromosomal arrangement of chemosensory receptor genes.
The location of predicted chemoreceptors (odorant receptors (ORs), gustatory receptors (GRs) and ionotropic receptors (IRs)) across all three chromosomes in AaegL5. The blunt end of each arrowhead plotted above each chromosome marks gene position and arrowhead indicates orientation. Filled and open arrowheads represent intact genes and pseudogenes, respectively (Supplementary Data 17–20). This figure is identical to Fig. 2a, but here includes gene names.
Extended Data Fig. 4
Extended Data Fig. 4. Phylogenetic tree of odorant receptor gene families from Ae. aegypti, An. gambiae and D. melanogaster.
Maximum likelihood odorant receptor tree was rooted with Orco proteins, which are both highly conserved and basal within the odorant receptor family. Support levels for nodes are indicated by the size of black circles—reflecting approximate likelihood ratio tests (aLRT values ranging from 0 to 1 from PhyML v.3.0 run with default parameters). Suffixes after protein names are C, minor assembly correction; F, major assembly modification; N, new model; P, pseudogene. Scale bar, amino acid substitutions per site.
Extended Data Fig. 5
Extended Data Fig. 5. Phylogenetic tree of the gustatory receptor gene families from Ae. aegypti, An. gambiae and D. melanogaster.
Maximum likelihood gustatory receptor tree was rooted with the highly conserved and distantly related carbon dioxide and sugar receptor subfamilies, which together form a basal clade within the arthropod gustatory receptor family. Subfamilies and lineages closely related to D. melanogaster gustatory receptors of known function are highlighted. Support levels for nodes are indicated by the size of black circles—reflecting approximate likelihood ratio tests (aLRT values ranging from 0 to 1 from PhyML v.3.0 run with default parameters). Suffixes after protein names are C, minor assembly correction; F, major assembly modification; N, new model; P, pseudogene. Scale bar, amino acid substitutions per site.
Extended Data Fig. 6
Extended Data Fig. 6. Phylogenetic tree of the ionotropic receptor gene families from Ae. aegypti, An. gambiae and D. melanogaster.
Maximum likelihood phylogenetic tree of ionotropic receptor protein sequences from the indicated species rooted with highly conserved Ir8a and Ir25a proteins. Conserved proteins with orthologues in all species are named outside the circle, and previously unannotated ionotropic receptors are highlighted with red lines. Support levels for nodes are indicated by the size of black circles—reflecting approximate likelihood ratio tests (aLRT values ranging from - to 1 from PhyML v.3.0 run with default parameters). Suffixes after protein names are C, minor assembly correction; F, major assembly modification; N, new model; P, pseudogene. Scale bar, amino acid substitutions per site.
Extended Data Fig. 7
Extended Data Fig. 7. Chemosensory receptor expression in adult Ae. aegypti tissues.
Previously published RNA-seq data were reanalysed using the new chemoreceptor annotations and genome assembly. Chemoreceptors have been clustered according to Euclidian distance of their expression vectors using the R function hclust. Expression is given for females at three stages of the gonotrophic cycle (0, 48 or 96 h after taking a blood-meal, for which 0 h indicates not blood-fed, 48 h indicates 48 h after the blood-meal, and 96 h indicates gravid). New genes are indicated by black bars on the right.
Extended Data Fig. 8
Extended Data Fig. 8. Structural variation, the Hox gene cluster and Hox cofactor motifs.
a, Linked-read sequencing of two individuals from the LVP_AGWG strain identified putative structural variants in the AaegL5 assembly. b, Comparative genomic arrangement of the Hox cluster (HOXC) in five species (Supplementary Data 22). Note the split of labial (lab) and proboscipedia (pb) between two chromosomes in Ae. aegypti and Cx. quinquefasciatus. Owing to chromosome arm exchange, chromosome 3p in Cx. quinquefasciatus is the homologue of chromosome 2p in Ae. aegypti. c, Repeats in putative telomere-associated sequences downstream of pb in both species. d, Motifs known to mediate protein–protein interactions with the Hox cofactor Extradenticle (Exd) from the five indicated species are aligned using Clustal-Omega. Perfectly aligned residues are coloured according to Hox gene identity, non-conserved residues are grey.
Extended Data Fig. 9
Extended Data Fig. 9. Population genomic structure and linkage disequilibrium analysis of Ae. aegypti strains.
a, Chromosomal patterns of nucleotide diversity (π) in four strains of Ae. aegypti measured in 100-kb non-overlapping windows and presented as a LOESS-smoothed curve. b, Mean nucleotide diversity in the strains in a, with s.d. indicated in parentheses. Nucleotide diversity (π) was measured in non-overlapping 100-kb windows. The Liverpool and Costa Rica colonies maintain extensive diversity despite being colonized in the laboratory more than a decade ago, but show reduced genome-wide diversity (on the order of 30–40%) relative to the more recently laboratory colonized Innisfail and Clovis. c, Pairwise linkage disequilibrium between SNPs located within the same chromosome estimated from 28 wild-caught individuals from the indicated populations. Each point represents the mean linkage disequilibrium for that set of binned SNP pairs. Solid lines are LOESS-smoothed curves, and dashed lines correspond to r2max/2. Inclusion of additional individuals available from the Amacuzac population (up to 137) had a minimal effect on the linkage disequilibrium estimations (∆R2 < 0.017; data not shown). d, Linkage disequilibrium (r2) values along the Ae. aegypti AaegL5 genome assembly based on pairwise SNP comparisons. Data were obtained from the average r2 of SNPs in 1-kb bins.
Extended Data Fig. 10
Extended Data Fig. 10. QTL analysis of DENV competence in Ae. aegypti and Cys-loop LGICs.
a, Schematic representation of the experimental workflow for testing DENV competence in Ae. aegypti, related to Fig. 5b–d. b, Comparison of QTL map density constructed against AaegL3 or AaegL5 assemblies. c, Concentration–response curves showing the effect on Ae. aegypti larval motility of insecticides currently used in veterinary and agricultural applications (mean ± s.e.m., n = 7). d, Phylogenetic tree of Cys-loop LGIC subunits for Ae. aegypti and D. melanogaster. The accession numbers of the D. melanogaster sequences used in constructing the tree are: Dα1 (CAA30172), Dα2 (CAA36517), Dα3 (CAA75688), Dα4(CAB77445), Dα5 (AAM13390), Dα6 (AAM13392), Dα7(AAK67257), Dβ1 (CAA27641), Dβ2 (CAA39211), Dβ3 (CAC48166), GluCl (AAG40735), GRD (Q24352), HisCl1 (AAL74413), HisCl2 (AAL74414), LCCH3 (AAB27090), Ntr (NP_651958), pHCl (NP_001034025), RDL (AAA28556). For Ae. aegypti sequences, see Supplementary Data 24. ELIC (Erwinia ligand-gated ion channel), which is an ancestral Cys-loop LGIC found in bacteria (accession number P0C7B7), was used as an outgroup. Scale bar, amino acid substitutions per site.

Comment in

Similar articles

Cited by

References

    1. Bhatt S, et al. The global distribution and burden of dengue. Nature. 2013;496:504–507. doi: 10.1038/nature12060. - DOI - PMC - PubMed
    1. Nene V, et al. Genome sequence of Aedes aegypti, a major arbovirus vector. Science. 2007;316:1718–1723. doi: 10.1126/science.1138878. - DOI - PMC - PubMed
    1. Timoshevskiy VA, et al. An integrated linkage, chromosome, and genome map for the yellow fever mosquito Aedes aegypti. PLoS Negl. Trop. Dis. 2013;7:e2052. doi: 10.1371/journal.pntd.0002052. - DOI - PMC - PubMed
    1. Waterhouse RM, et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 2017;35:543–548. doi: 10.1093/molbev/msx319. - DOI - PMC - PubMed
    1. Chin CS, et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods. 2016;13:1050–1054. doi: 10.1038/nmeth.4035. - DOI - PMC - PubMed

Publication types

MeSH terms

LinkOut - more resources